How Spotify scaled its Hadoop cluster and the people working on it: from 1 developer to over 100, and from 1 node to over 690 nodes, making it the largest Hadoop cluster in Europe.
Spotify: From 1 to 100 Hadoop developers
1. From 1 to 100 developers
Scaling for developer productivity at Spotify
@dawhiting
HUG UK @ Strata
11/11/2013
2. How do I scale?
How many nodes?
How much data?
How many records?
3. How do I scale my development?
How many developers?
How many teams?
How many Hadoop jobs?
How much code?
Data Infrastructure - July 2013
4. A brief history of Hadoop development at Spotify
2008 - Spotify launches in Sweden
2009 - First Hadoop cluster for royalties, 2 developers
2010 - Up to 37 nodes, BI team formed, 3 devs/3 analysts
2011 - Move to Elastic MapReduce
2012 - Back to own cluster, 60 -> 190 nodes, Infrastructure/Insights/Tools team split
2013 - 6 teams just for data infrastructure, ~100 developers using the Hadoop cluster
5. Issues
What could possibly go wrong?
•Contention for resources
•Repetition of code, repetition of data
•Poor code quality / technical debt
•Disorganised HDFS
•Data cataloguing
7. Don’t Repeat Yourself
Refactor data, not just code
•Make popular data available pre-joined
•Analyse code to find jobs with the same dependencies
Work at a higher level
•MapReduce out, (S)Crunch in
•Allow substitution of operations for cached data
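One way to act on "analyse code to find jobs with the same dependencies" is to group jobs by their set of input datasets: any group with more than one job is a candidate for a shared, pre-joined dataset. The sketch below assumes the inputs of each job have already been extracted into a mapping. The job names and paths are hypothetical and are not taken from Spotify's cluster.

```python
from collections import defaultdict


def jobs_sharing_inputs(job_inputs):
    """Group job names by their exact set of input datasets.

    Returns only the groups in which more than one job reads the same inputs.
    """
    groups = defaultdict(list)
    for job, inputs in job_inputs.items():
        # frozenset makes the grouping independent of input order.
        groups[frozenset(inputs)].append(job)
    return {deps: sorted(jobs) for deps, jobs in groups.items() if len(jobs) > 1}


# Hypothetical job -> input datasets mapping, for illustration only.
JOB_INPUTS = {
    "top_tracks": ["/data/plays", "/data/track_metadata"],
    "top_artists": ["/data/track_metadata", "/data/plays"],
    "signup_funnel": ["/data/signups"],
}

if __name__ == "__main__":
    for deps, jobs in jobs_sharing_inputs(JOB_INPUTS).items():
        print(sorted(deps), "->", jobs)
```

Exact-set matching is the simplest choice; matching on overlapping subsets of inputs would find more candidates at the cost of more noise.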
8. Code Quality & Technical Debt
Stable platform
•Python -> JVM
Abolish custom infrastructure
•Off-the-shelf is often good enough
•E.g. Sqoop, Kafka, ...
Testing
•Make testing easier than running
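Testing tends to become easier than running when the job logic lives in plain functions that can be exercised in memory, with no cluster involved. The sketch below illustrates that idea only; the tab-separated `user<TAB>track` log format and all function names are assumptions made for the example, not Spotify's actual jobs.

```python
from collections import defaultdict


def map_play(line):
    """Mapper: emit (track_id, 1) for one 'user<TAB>track' log line (assumed format)."""
    _user, track_id = line.rstrip("\n").split("\t")
    yield track_id, 1


def reduce_counts(track_id, counts):
    """Reducer: sum the counts emitted for one track."""
    return track_id, sum(counts)


def run_locally(lines, mapper, reducer):
    """Run a mapper and reducer in memory, using a dict in place of the shuffle."""
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            shuffled[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(shuffled.items()))


if __name__ == "__main__":
    sample = ["u1\tt1", "u2\tt1", "u1\tt2"]
    print(run_locally(sample, map_play, reduce_counts))
```

Because `run_locally` needs only a list of strings, a unit test is a single function call, which is cheaper than submitting the job.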
9. HDFS
Retention policy
•Automatic deletion of old intermediate data
•Opt-out, not opt-in
Establish convention
•Can you correctly guess the path to the data you need?
Enforce structure
•Path literals are a code smell
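The three points above can be combined in one small helper module: jobs ask a function for a path instead of writing a literal, the function enforces the layout, and retention applies to intermediate data unless the owner opts out. This is a minimal sketch under assumed conventions. The `/data/<kind>/<dataset>/<date>` layout, the 30-day window and all names are hypothetical, not Spotify's.

```python
from datetime import date, timedelta
import posixpath

# Assumed conventions, for illustration only.
ROOT = "/data"
KINDS = ("raw", "intermediate", "published")
RETENTION_DAYS = 30


def dataset_path(kind, dataset, day):
    """Build the conventional path for a dataset partition, e.g. /data/raw/plays/2013-11-11."""
    if kind not in KINDS:
        raise ValueError("unknown dataset kind: %r" % (kind,))
    return posixpath.join(ROOT, kind, dataset, day.isoformat())


def should_delete(kind, day, today, keep_forever=False):
    """Opt-out retention: old intermediate data is deleted unless keep_forever is set."""
    if kind != "intermediate" or keep_forever:
        return False
    return today - day > timedelta(days=RETENTION_DAYS)


if __name__ == "__main__":
    day = date(2013, 10, 1)
    print(dataset_path("intermediate", "plays_by_country", day))
    print(should_delete("intermediate", day, today=date(2013, 11, 11)))
```

With a helper like this, anyone who knows the dataset name and date can guess the path, and a cleanup job only needs to walk the `intermediate` tree.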
11. You can have it easier than us
Act now
•Big Data technical debt is worse than normal technical debt
•Rewriting 10 jobs is easier than rewriting 300
Plan to decentralise
•At some point it won’t be enough to trust your developers
•You won’t be able to review every job forever
Make it simpler to do things the right way
•Example: build tools
12. Want to join the band?
We’re hiring for Stockholm and NYC
Check out http://www.spotify.com/jobs for more information.