This presentation describes the thought process behind sharding in MongoDB for a social game that has, or will have, a large influx of users.
It covers how to set up a basic dev environment and provides some sample code to get you started.
2. Rick Chen
Rick Chen is a Technical Director at Cie
Games, overseeing the company's
engineering department. In 2012, Cie Games,
the leading developer and publisher of social
and mobile games, was recognized as one of
the Fastest Growing Companies in North
America on Deloitte's Technology Fast 500 list.
3. Eugene Kovshilovsky
Eugene Kovshilovsky is a Technical Director at
Cie Games, a leading developer and publisher
of social and mobile games. Its popular game,
Car Town, is the first Facebook game built
around brands and has attracted more than 50
million players worldwide. The company's first
iOS game, Car Town Streets, for iPhone, iPad
and iPod touch, is one of the Top 10 Games
on the App Store in multiple countries.
4. What is Sharding?
(Horizontal) partitioning of a database, where data is split by
rows (documents) instead of columns
5. Why Shard in Games?
Write Throughput Exceeds I/O
Each click updates the user's currency amount.
6. Why Shard in Games?
Memory isn't large enough to hold the active dataset
7. Why Shard in Games?
You want to run on commodity hardware and scale
horizontally vs. vertically
8. MongoDB Sharding
Distribute objects within collections automatically
Choose how data is partitioned
Single master to sharded cluster with zero downtime
Fully consistent
Works well in the cloud
Automated fail-over
Java support with Spring integration for easy JSON to Object
mapping
9. Shard Design
Share the load amongst all shards
equally
High Randomness
Have the smallest working set possible
Roll with time
Recreate shard key from ObjectId
Be careful how you shard; there is no way back
10. 99.999%
For most of our queries we already have the ObjectId
Client saves the ObjectId for re-login at a later time
"I'm this user and I want to do xyz with my car."
No range-based queries
We don't run "find this user in some range" queries against our
master db; we run those on secondary data stores that are
not sharded, or on additional reporting systems.
11. Shard Behavior
Attempt to insert data into pre-split chunks
Increase cluster throughput by reducing overhead
User Lifetime of 3 Days
Most new players expected to play for an average of 3 days
Unbalanced chunks will be rebalanced during off-peak hours
Rarely happens
12. Pre Split
Creates chunks and moves them to different shards ahead of
time
Creates 0-byte chunks spread equally across all shards
One chunk may grow to 5-6 chunks depending on how many
new users come in within a 3 day window
Used instead of waiting for the balancer to detect that there
are too many chunks in one shard
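The pre-split steps above can be sketched as a small mongo shell helper. This is only an illustration, not the presenters' actual script. The namespace, the field name `shardKey`, the shard names and the 2^18-wide key window are assumptions made for the example. `sh.splitAt` and `sh.moveChunk` are the standard shell helpers, passed in as `sh` so the function can also be exercised outside the shell.

```javascript
// Evenly spaced split points inside one key window of width 2^18.
// windowStart and chunkCount are illustrative parameters.
function splitPoints(windowStart, chunkCount) {
  var WINDOW = Math.pow(2, 18);
  var points = [];
  for (var i = 1; i < chunkCount; i++) {
    points.push(windowStart + Math.floor((i * WINDOW) / chunkCount));
  }
  return points;
}

// Split the still-empty window and spread the new chunks over the shards.
// `sh` is the mongo shell sharding helper; ns, field and shardNames are
// hypothetical. The chunk below the first split point is left where it is
// (assumed to be on shardNames[0]).
function preSplit(sh, ns, field, windowStart, shardNames) {
  var points = splitPoints(windowStart, shardNames.length);
  points.forEach(function (point, i) {
    var query = {};
    query[field] = point;
    sh.splitAt(ns, query);
    // The chunk that starts at `point` goes to the next shard in the list.
    sh.moveChunk(ns, query, shardNames[i + 1]);
  });
  return points;
}
```

In a shell connected to mongos this might be called as `preSplit(sh, "mydb.users", "shardKey", windowStart, ["shard0", "shard1", "shard2"])`, once per upcoming window and before new users arrive.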
13. 3 Day Shard Key
2^18 = 262144 seconds = 3.03 days
An ObjectId contains the timestamp of when the
_id was created
“Consistent random” value
Generate the MD5 of the string version of the ObjectId
Take the first 18 bits of the MD5
Append the 18-bit random value to the
timestamp
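The recipe above can be sketched in Node.js as follows. This is one possible reading, not the presenters' code. The slide does not say exactly how the 18-bit value is "appended", so the sketch assumes it replaces the low 18 bits of the timestamp, which keeps every key created in the same 2^18-second (about 3.03 day) window within one contiguous range. The function name is hypothetical.

```javascript
var crypto = require("crypto");

var WINDOW = Math.pow(2, 18); // 262144 seconds, about 3.03 days

// objectIdHex: the 24-character hex string form of an ObjectId.
function shardKeyFromObjectId(objectIdHex) {
  if (!/^[0-9a-f]{24}$/.test(objectIdHex)) {
    throw new Error("expected a 24-character lowercase hex ObjectId string");
  }
  // The first 4 bytes of an ObjectId are its creation time in seconds.
  var timestamp = parseInt(objectIdHex.slice(0, 8), 16);
  // MD5 of the string form yields the same "random" value on every call.
  var digest = crypto.createHash("md5").update(objectIdHex).digest();
  // First 18 bits of the digest: read 3 bytes (24 bits), drop the low 6.
  var random18 = digest.readUIntBE(0, 3) >>> 6;
  // ASSUMPTION: "append" means replacing the low 18 bits of the timestamp,
  // i.e. start of the 3-day window plus the 18-bit value.
  var windowStart = Math.floor(timestamp / WINDOW) * WINDOW;
  return windowStart + random18;
}
```

Because the key depends only on the ObjectId, a client that saved its ObjectId can recompute the shard key at login and send a query that targets a single shard.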
14. Mongo 2.4 and New Hash Sharding
No need for Pre-splitting
Easy setup of randomized shard key
Better distribution of chunks on shards for isolated document
writes and reads
Uses the first 64 bits of the MD5 of the field
Doesn't support compound indexes
Don't use floating point numbers
Range-based queries go to all shards
Working set is distributed to all shards
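For comparison with the hand-built key, a hashed shard key in MongoDB 2.4 or later can be requested as sketched below. The helper name and the namespace are hypothetical. `sh.enableSharding` and `sh.shardCollection` with a `"hashed"` key pattern are the standard shell helpers, passed in as `sh`.

```javascript
// Shard a collection on a hashed _id (MongoDB 2.4+).
// `sh` is the mongo shell sharding helper; ns is "<database>.<collection>".
function shardByHashedId(sh, ns) {
  var database = ns.split(".")[0];
  // Sharding must be enabled on the database before any of its collections.
  sh.enableSharding(database);
  // "hashed" tells MongoDB to partition on a hash of the field value.
  return sh.shardCollection(ns, { _id: "hashed" });
}
```

In a shell connected to mongos, `shardByHashedId(sh, "mydb.users")` would be enough. The trade-off listed above still applies: writes spread evenly, but range queries on `_id` are sent to every shard.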
16. Deploy
Start the Config Server (mongod) Instances
Start the mongos Instances
Add Shards to the Cluster
Enable Sharding for a Database
Enable Sharding for a Collection
17. Start the Config Server Database Instances
Create data dir for each of the 3 config server instances
Start the 3 config server instances
If starting on the same machine for study purposes then
configure each one to run on a different port
mkdir /data/configdb
mongod --configsvr --dbpath /data/configdb --port 27019
18. Start the mongos Instances
Ideally these lightweight instances run on the application
server(s)
Start the mongos instances
mongos --configdb <config server hostnames>
If connecting to the demo configs on localhost then specify the
config servers with the different ports.
mongos runs on port 27017 by default
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019
mongos --configdb localhost:27019,localhost:27020
19. Add Shards to the Cluster
Connect to mongos from mongo shell
Add each shard to the cluster using sh.addShard()
In production always specify the replica set
mongo --host localhost --port 27017
sh.addShard("rs1/localhost:27017")
20. Enable Sharding for a Database
To shard collections you need to enable this first
Add sharding to the database using
sh.enableSharding("<database>")
sh.enableSharding("mydb")
21. Enable Sharding for a Collection
Enable sharding using
sh.shardCollection("<database>.<collection>", shard-key-pattern)
sh.shardCollection("mydb.shardcollection", {userId: 1})
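The last three deploy steps can be collected into one mongo shell helper, sketched below using the example values from these slides. The function name and the shape of the `config` object are hypothetical. The `sh.addShard`, `sh.enableSharding` and `sh.shardCollection` calls are the shell helpers shown above.

```javascript
// Run the add-shard, enable-database and shard-collection steps in order.
// `sh` is the mongo shell sharding helper; `config` is an illustrative
// object, not a MongoDB structure.
function setUpSharding(sh, config) {
  // 1. Register every shard (replica set name / host:port) with the cluster.
  config.shards.forEach(function (shard) {
    sh.addShard(shard);
  });
  // 2. Enable sharding on the database.
  sh.enableSharding(config.database);
  // 3. Shard the collection on the chosen key pattern.
  sh.shardCollection(config.database + "." + config.collection, config.key);
}
```

From a shell connected to mongos, it could be called as `setUpSharding(sh, { shards: ["rs1/localhost:27017"], database: "mydb", collection: "shardcollection", key: { userId: 1 } })`.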