1. Sharding Internals
Eliot Horowitz
@eliothorowitz
MongoSV
December 3, 2010
2. MongoDB Sharding
• Scale horizontally for data size, index size,
write and consistent read scaling
• Distribute databases, collections or the
objects in a collection
• Auto-balancing, migrations and management
happen with no downtime
3. • Choose how you partition data
• Can convert from single master to sharded
system with no downtime
• Same features as a non-sharded single
master
• Fully consistent
4. Range Based
MIN MAX LOCATION
A F shard1
F M shard1
M R shard2
R Z shard3
• collection is broken into chunks by range
• chunks default to 200 MB or 100,000
objects
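The range table above can be read as a lookup structure. The following is a minimal sketch of that lookup, not MongoDB's actual code: `findChunk` is a hypothetical helper, and treating each range as min-inclusive and max-exclusive is an assumption made for the example.

```javascript
// Chunk table from the slide: each chunk covers a key range and lives on one shard.
// Assumption of this sketch: min is inclusive, max is exclusive.
const chunks = [
  { min: "A", max: "F", shard: "shard1" },
  { min: "F", max: "M", shard: "shard1" },
  { min: "M", max: "R", shard: "shard2" },
  { min: "R", max: "Z", shard: "shard3" },
];

// findChunk (hypothetical): binary search over chunks sorted by their min key.
function findChunk(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c;
  }
  return null; // key falls outside every range
}

console.log(findChunk(chunks, "Horowitz").shard); // shard1
console.log(findChunk(chunks, "Nina").shard);     // shard2
```

Because the ranges are contiguous and sorted, one search per key is enough to decide which shard owns it.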
6. Shards
• Can be master, master/slave or replica sets
• Replica sets give sharding + full automatic
failover
• Regular mongod processes
7. Config Servers
• 3 of them
• changes are made with two-phase commit
• if any are down, metadata goes read-only
• system is online as long as 1 of the 3 is up
8. mongos
• Sharding Router
• Acts just like a mongod to clients
• Can have 1 or as many as you want
• Can run on appserver so no extra network
traffic
• Caches metadata from config servers
10. Queries
• By shard key: routed
• sorted by shard key: routed in order
• by non shard key: scatter gather
• sorted by non shard key: distributed merge
sort
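The two non-shard-key cases above can be illustrated with an in-process simulation. This is a rough sketch only: the `shards` object, `scatterGather` and `mergeSorted` are hypothetical names for the example, and the real router works over the network rather than over in-memory arrays.

```javascript
// Two simulated shards, partitioned by user_id (sample data invented for the sketch).
const shards = {
  shard1: [{ user_id: 1, age: 31 }, { user_id: 2, age: 19 }],
  shard2: [{ user_id: 7, age: 25 }, { user_id: 9, age: 42 }],
};

// Scatter-gather: send the same filter to every shard and concatenate the results.
function scatterGather(shards, predicate) {
  return Object.values(shards).flatMap((docs) => docs.filter(predicate));
}

// Distributed merge sort: each shard sorts locally, the router merges the sorted streams.
function mergeSorted(shards, field) {
  const streams = Object.values(shards).map((docs) =>
    [...docs].sort((a, b) => a[field] - b[field])
  );
  const heads = streams.map(() => 0); // next unread position in each stream
  const out = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (heads[i] >= streams[i].length) continue; // stream exhausted
      if (best === -1 || streams[i][heads[i]][field] < streams[best][heads[best]][field]) {
        best = i;
      }
    }
    if (best === -1) return out; // every stream exhausted
    out.push(streams[best][heads[best]++]);
  }
}

console.log(scatterGather(shards, (d) => d.age > 20).map((d) => d.user_id)); // [ 1, 7, 9 ]
console.log(mergeSorted(shards, "age").map((d) => d.age)); // [ 19, 25, 31, 42 ]
```

The merge step only ever compares the head of each shard's stream, so the router never has to re-sort the full result set.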
11. Splitting
• Take a chunk and split it in 2
• Splits on the median value
• Splits only change metadata; no data
is moved
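A split can be sketched as a pure metadata operation. In the example below, `splitChunk` is a hypothetical helper and `keys` stands in for the shard-key values currently stored in the chunk; how the real system finds the median is not covered by the slides.

```javascript
// A chunk is only metadata: a key range plus the shard that holds it.
// splitChunk (hypothetical) cuts the range at the median key; no documents are touched.
// Assumption: keys are strings, so the default lexicographic sort is appropriate.
function splitChunk(chunk, keys) {
  const sorted = [...keys].sort();
  const median = sorted[Math.floor(sorted.length / 2)];
  if (median === undefined || median <= chunk.min || median >= chunk.max) {
    return [chunk]; // nothing sensible to split on
  }
  return [
    { min: chunk.min, max: median, shard: chunk.shard },
    { min: median, max: chunk.max, shard: chunk.shard },
  ];
}

const t1 = { min: "A", max: "Z", shard: "shard1" };
const t2 = splitChunk(t1, ["B", "D", "G", "K", "S"]);
console.log(t2.map((c) => `${c.min} ${c.max} ${c.shard}`)); // [ 'A G shard1', 'G Z shard1' ]
```

Both halves stay on the original shard, which is why a split is cheap: only the range table grows by one row.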
12. Splitting
T1
MIN MAX LOCATION
A Z shard1
T2
MIN MAX LOCATION
A G shard1
G Z shard1
T3
MIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard1
13. Balancing
• Moves chunks from one shard to another
• Done online while system is running
• Balancing runs in the background
14. Migrating
T3
MIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard1
T4
MIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard2
T5
MIN MAX LOCATION
A D shard1
D G shard1
G S shard2
S Z shard2
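The T3 to T5 sequence above can be reproduced with a very simple balancing loop. This is a sketch under stated assumptions, not the actual balancer: `balance` is a hypothetical function, the threshold of one chunk is invented for the example, and always moving the highest-range chunk is an arbitrary choice.

```javascript
// Chunk table at T3 from the slide: everything on shard1, shard2 is empty.
const t3 = [
  { min: "A", max: "D", shard: "shard1" },
  { min: "D", max: "G", shard: "shard1" },
  { min: "G", max: "S", shard: "shard1" },
  { min: "S", max: "Z", shard: "shard1" },
];

// balance (hypothetical): move one chunk at a time from the busiest shard to the
// least busy one until their chunk counts differ by at most `threshold`.
function balance(chunks, shardNames, threshold = 1) {
  const table = chunks.map((c) => ({ ...c })); // work on a copy of the metadata
  for (;;) {
    const counts = Object.fromEntries(shardNames.map((s) => [s, 0]));
    for (const c of table) counts[c.shard] += 1;
    const byLoad = [...shardNames].sort((a, b) => counts[a] - counts[b]);
    const least = byLoad[0];
    const most = byLoad[byLoad.length - 1];
    if (counts[most] - counts[least] <= threshold) return table;
    // Arbitrary choice for this sketch: migrate the highest-range chunk first.
    const victim = table.filter((c) => c.shard === most).pop();
    victim.shard = least;
  }
}

const t5 = balance(t3, ["shard1", "shard2"]);
console.log(t5.map((c) => `${c.min} ${c.max} ${c.shard}`));
// [ 'A D shard1', 'D G shard1', 'G S shard2', 'S Z shard2' ]
```

The first pass moves S..Z (the T4 table) and the second moves G..S (the T5 table), after which both shards hold two chunks and the loop stops.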
15. Setting it Up
• Start servers
• add shards:
db.runCommand( { addshard : "10.1.1.5" } )
• turn on partitioning:
db.runCommand( { enablesharding : "test" } )
• shard a collection:
db.runCommand( { shardcollection : "test.data" ,
key : { num : 1 } } )
16. User profiles
• Partition by user_id
• Secondary indexes on location, dates, etc.
• Reads/writes know which shard to hit
17. User Activity Stream
• Shard by user_id
• Loading a user’s stream hits a single shard
• Writes are distributed across all shards
• Can index on activity for deleting
18. Photos
• Can shard by photo_id for best read/write
distribution
• Secondary index on tags, date