From 1 to 100 developers: Scaling for developer productivity at Spotify
  @dawhiting
  HUG UK Strata 11/11/2013
How do I scale my development?
  How many developers?
  How many teams?
  How many Hadoop jobs?
  How much code?
A brief history of Hadoop development at Spotify
  2008 - Spotify launches in Sweden
  2009 - First Hadoop cluster for royalties, 2 devs
  2010 - Up to 37 nodes, BI team formed of 3 devs/3 analysts
  2011 - Migrate to Amazon EMR
  2012 - Back to our own cluster, 60->190 nodes, Infrastructure/Insights/Tools split
  2013 - 6 teams just for data infrastructure. Loads of consumers. ~100 devs using cluster
What could possibly go wrong?
  Contention for resources
  Repetition of code, repetition of data, repetition of processing steps
  Poor code quality and technical debt
  Disorganised HDFS
  Unclear data catalogue
Contention for resources
  Priority and isolation
    What is important?
    Isolation > Prioritisation
  Hadoop scheduling
    Capacity scheduler
    YARN Resource Allocation
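One way to picture isolation with the Capacity scheduler: each team gets its own queue with a guaranteed share of the cluster. The sketch below generates the corresponding property names. The queue names and percentages are invented for illustration and are not Spotify's configuration; only the `yarn.scheduler.capacity.root...` property pattern and the rule that sibling queue capacities add up to 100 come from the YARN Capacity Scheduler.

```python
def capacity_scheduler_properties(shares):
    """Turn {queue_name: percent} into Capacity Scheduler property entries.

    `shares` is a hypothetical mapping; sibling queues under root must
    have capacities that add up to 100.
    """
    total = sum(shares.values())
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    # List the child queues of root, then give each its guaranteed share.
    props = {"yarn.scheduler.capacity.root.queues": ",".join(sorted(shares))}
    for queue, percent in sorted(shares.items()):
        props[f"yarn.scheduler.capacity.root.{queue}.capacity"] = str(percent)
    return props


# Hypothetical split: production jobs are isolated from ad hoc queries.
props = capacity_scheduler_properties(
    {"production": 60, "adhoc": 25, "analytics": 15}
)
```

Each entry in `props` would become one property in capacity-scheduler.xml. A guaranteed share means an ad hoc job cannot starve a production job, which is the isolation the slide ranks above prioritisation.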
Repetition repetition repetition
  Refactor data, not just code
    Make popular data available pre-joined
    Analyse code to find jobs with similar dependencies
  Work at a higher level
    MapReduce out, (S)Crunch in
    Automatic substitution of operations for cached data
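A minimal sketch of "find jobs with similar dependencies", assuming each job's input datasets can be extracted into a plain mapping. The job and dataset names are made up, and `shared_input_pairs` is a hypothetical helper, not a tool described in the talk. Pairs of jobs that read several of the same datasets are candidates for a shared pre-joined dataset.

```python
from collections import defaultdict
from itertools import combinations


def shared_input_pairs(job_inputs, min_shared=2):
    """Return job pairs that read at least `min_shared` of the same datasets."""
    # Invert the mapping: dataset -> jobs that read it.
    readers = defaultdict(set)
    for job, inputs in job_inputs.items():
        for dataset in inputs:
            readers[dataset].add(job)
    # Collect the datasets each pair of jobs has in common.
    shared = defaultdict(set)
    for dataset, jobs in readers.items():
        for a, b in combinations(sorted(jobs), 2):
            shared[(a, b)].add(dataset)
    return {pair: ds for pair, ds in shared.items() if len(ds) >= min_shared}


# Hypothetical jobs and their inputs.
jobs = {
    "top_tracks": {"plays", "track_metadata"},
    "top_artists": {"plays", "track_metadata", "artist_metadata"},
    "signup_funnel": {"registrations"},
}
candidates = shared_input_pairs(jobs)
```

Here `top_tracks` and `top_artists` both join `plays` with `track_metadata`, so that join is worth materialising once.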
Code Quality & Technical Debt
  Stable platform
    Python -> JVM
  Abolish custom infrastructure
    Off-the-shelf is often good enough
    E.g. Sqoop, Kafka, ...
  Testing
    Make testing easier than running
    Educate
    Enforced testing
HDFS
  Data retention
    Automatic deletion of old intermediate data
    Opt-out, not opt-in
  Establish convention
    Can you correctly guess the path to the data you need?
  Enforce convention
    Path literals are a code smell
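The two HDFS points can be sketched together: if every job builds paths through one helper instead of typing literals, the convention is enforced in code and a cleanup job can tell which data is intermediate. The layout (`/data/<stage>/<name>/<date>`), the 30-day default, and both function names are assumptions for illustration, not Spotify's actual scheme.

```python
from datetime import date, timedelta

DATA_ROOT = "/data"  # hypothetical root, not an actual cluster layout


def dataset_path(name, day, stage="final"):
    """Build the conventional path for one daily partition of a dataset."""
    if stage not in ("final", "intermediate"):
        raise ValueError(f"unknown stage: {stage}")
    return f"{DATA_ROOT}/{stage}/{name}/{day:%Y-%m-%d}"


def is_expired(stage, day, today, keep_days=30, opted_out=False):
    """Retention is opt-out: intermediate data expires unless explicitly kept."""
    if stage != "intermediate" or opted_out:
        return False
    return today - day > timedelta(days=keep_days)


# A job asks for data by name and date; it never spells out the path itself.
path = dataset_path("plays", date(2013, 11, 11))
```

Because the path is derived from the dataset name and date, anyone who knows the convention can guess where the data lives, and a change to the layout happens in one place.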
Data Catalogue
  Core datasets
    Identify
    Catalogue
    Document
    Monitor
  Data library as code library
    Easy to pick up
    Change history
    Synced with release cycles
You can have it easier than us
  Act now
    Big data technical debt is worse than normal technical debt
    Rewriting 10 jobs is easier than rewriting 300
  Plan to decentralise
    You can't review every job forever
    Write up guidelines, abolish folklore
  Make it simpler to do things the right way
    Example: build tools