Care & Feeding of Large MongoDB Clusters

The Care & Feeding of a
Large MongoDB Cluster
2012

Chris Henry
@chrishnry

Who uses MongoDB?

In Production?

MongoDB @ Behance

Activity Feed v1
~ 40 Nodes
~ 250 Million Docs
~ ext3 Filesystem

Ran for around 3 months, then...

MongoDB @ Behance

Activity Feed v2
• ext4 Filesystem
• 2 TB data
• 3 Collections
• 15k Chunks
• 19 Shards
• 60 Nodes
• 400M Docs at peak
• Ranges ~120-250M

Why MongoDB?

• Easy to use.
• Easy to Iterate on.
• Devs like it.
• Fantastic Community built by 10Gen.
• “Fast.”

Why NOT MongoDB?

• Bleeding Edge.
• Not enough battle scars.
• Fewer tried and true fixes.
• No Transactional support.

Why MongoDB at Scale?

• Autosharding. (Stop hacking your app.)
• Smart Replica Sets / High Availability.
• Horizontal Scalability
• Easy to grow and shrink.
• Good fit for cloud*.

Why NOT MongoDB at Scale?

• Data can take up more space on disk.
• Disk IO in the cloud sucks.
• Database-level write lock.
• More Management than a MySQL cluster.

Behance’s Use Case + Fit

• Data is ephemeral.
• Denormalization of existing data.
• Fan Out Approach.
• Sharded by User.
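The fan-out approach combined with sharding by user can be sketched in plain JavaScript. This is only an in-memory illustration: the names `followers`, `feeds`, `publish`, and `readFeed`, along with the sample data, are assumptions made for the example and are not Behance's actual schema or code.

```javascript
// In-memory stand-ins for the follower graph and the per-user feeds (hypothetical data).
const followers = new Map([["alice", ["bob", "carol"]]]);
const feeds = new Map();

// Fan out on write: copy the activity into the feed of every follower of the actor.
function publish(actor, activity) {
  for (const follower of followers.get(actor) || []) {
    const doc = { user: follower, actor: actor, ...activity };
    if (!feeds.has(follower)) feeds.set(follower, []);
    feeds.get(follower).push(doc);
  }
}

// Reading a feed touches only one user's documents,
// which is what makes "user" a natural shard key for this pattern.
function readFeed(user) {
  return feeds.get(user) || [];
}

publish("alice", { verb: "appreciated", project: 123 });
console.log(readFeed("bob"));
```

The trade-off in this sketch is the usual one for denormalized feeds: writes are multiplied by the number of followers, while each read stays confined to a single user's data.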

Care & Feeding. Srsly?

• MongoDB is a less mature DB, so admins need to be a bit more aware.
• No different than MySQL (need to take into account memory, disk, usage patterns, indexes).
• Watch error logs, disk use, data size, # of chunks, old files, sharding status, padding factors...

MongoDB Basics

MongoDB Docs are great, and always improving.

http://docs.mongodb.org/manual/

Indexes

You need them.

duh and/or hello.

Profiling

• The profiler is equivalent to the slow log in MySQL.

• Logs all operations slower than X milliseconds to a collection (system.profile).
// log slow operations, slow threshold=50ms
> db.setProfilingLevel(1,50)

// get operations that were slow.
db.system.profile.find( { millis : { $gt : 5 } } )

http://www.mongodb.org/display/DOCS/Database+Profiler

Explain

• Equivalent to MySQL’s EXPLAIN
• From the profiler output, grab $query + $orderby and build them into a real query.

// Explain a query
db.collection.find({ x: 1 }).explain()

http://www.mongodb.org/display/DOCS/Explain
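Turning a profiler entry into something you can explain might look roughly like this. The helper name `profileEntryToShell` and the sample entry are made up for illustration, and the sketch assumes the profiler document stores the query in the wrapped `{ $query, $orderby }` form mentioned above; entries without `$query` are treated as a bare filter.

```javascript
// Hypothetical helper: build a shell command string from one system.profile document.
function profileEntryToShell(entry) {
  // ns is "<database>.<collection>"; the collection name may itself contain dots.
  const collection = entry.ns.slice(entry.ns.indexOf(".") + 1);
  const wrapped = entry.query || {};
  // Use the wrapped $query if present, otherwise treat the whole object as the filter.
  const filter = wrapped.$query !== undefined ? wrapped.$query : wrapped;
  let shell = "db.getCollection(" + JSON.stringify(collection) + ")" +
              ".find(" + JSON.stringify(filter) + ")";
  if (wrapped.$orderby !== undefined) {
    shell += ".sort(" + JSON.stringify(wrapped.$orderby) + ")";
  }
  return shell + ".explain()";
}

// Made-up profiler entry, for illustration only.
const sampleEntry = {
  op: "query",
  ns: "feed.activity",
  query: { $query: { user_id: 42 }, $orderby: { created: -1 } },
  millis: 120
};

console.log(profileEntryToShell(sampleEntry));
```

The printed string can be pasted into the shell while connected to the database named in `ns`.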

Replica Sets

• Equivalent to MySQL’s replication, but not quite.
• Resiliency and availability through cleverness.
• ReplicaSet setups
• rs.stepDown()
• rs.slaveOk()
• w parameter

Replica Sets

// mongod.conf
replSet = myreplica

// Initiate the Replica Set
> rs.initiate()

//Add a node
> rs.add("myreplica1:27017");

// Allow reads from the secondaries
> rs.slaveOk()

// Write something.
> db.replica.insert({x:1});

// Make sure the write propagates to a majority of servers.
> db.runCommand( { getlasterror : 1 , w : "majority" } )

http://www.mongodb.org/display/DOCS/Replica+Set+Commands
http://docs.mongodb.org/manual/applications/replication/#read-preference
http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError
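What "majority" means for a given set size is simple arithmetic, sketched here in plain JavaScript. The function names are illustrative, and the sketch assumes every member of the set is a voting member.

```javascript
// Smallest number of members that forms a majority of an n-member set.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// How many members can be unavailable while a majority can still acknowledge a write.
function tolerableFailures(memberCount) {
  return memberCount - majority(memberCount);
}

for (const n of [3, 4, 5]) {
  console.log(n + " members: majority=" + majority(n) +
              ", can lose " + tolerableFailures(n));
}
```

Note that going from 3 to 4 members does not buy any extra failure tolerance in this model, which is one reason odd-sized sets are the common choice.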

Sharding

• Goal: Distribute data across many nodes / replica sets.
• Provides baked-in horizontal scalability.

Sharding

• Chunks.
• Routing process - mongos (manages balancing, and query routing)
• Shards - mongod configured with shardsvr = 1.
• Config servers - 3 mongod servers listed in mongos.conf (configdb).
• Shards can be replica sets or standalone servers.
• Shard Key
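How chunks, the shard key, and routing fit together can be modeled in a few lines of plain JavaScript. This is a simplified model, not how mongos is implemented: the `chunks` table, its boundaries, and the shard names are invented for the example, whereas in a real cluster the chunk metadata lives on the config servers.

```javascript
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard0" }
];

// Route a shard-key value to the shard that owns the chunk containing it.
function shardFor(keyValue) {
  const chunk = chunks.find(c => keyValue >= c.min && keyValue < c.max);
  return chunk.shard;
}

console.log(shardFor(42));    // falls in the first chunk
console.log(shardFor(1000));  // lower bound is inclusive, so this is the second chunk
```

In this model, balancing amounts to changing the `shard` field of a chunk, and splitting amounts to replacing one range with two adjacent ones.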

Sharding

// mongos.conf
configdb = server,server,server

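// Initiate
// connect to mongos the same way you would connect to mongod
// these commands must be run against the admin database
> use admin
> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );

// Enable sharding on the database, then shard a collection
> db.runCommand( { enablesharding : "test" } )
> db.runCommand( { shardcollection : "test.fs.chunks", key : { files_id : 1 } } )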

http://www.mongodb.org/display/DOCS/Replica+Set+Commands

Indexing Big - Sharded Index

• Always run on mongos
• Background Indexing
• Sparse indexes

Maintenance

• Replica Sets
• Add Node / Remove Node
• rs.stepDown()

• Shards
• Drain Shard / Add Shard

Gotchas

• A sharded cluster with a shard down will return no results.

• If a chunk holds too much data for a single shard key value, it cannot be split. Balancing will become blocked, and the cluster will become unbalanced.
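Why such a chunk cannot be split can be seen in a toy model. The sketch below is not MongoDB's actual split algorithm; `findSplitPoint` is a hypothetical function that assumes numeric shard key values and looks for a key near the median that would leave documents on both sides.

```javascript
// Simplified model: choose a split key so that docs with key < split go left
// and docs with key >= split go right. Returns null when no such key exists.
function findSplitPoint(keyValues) {
  const sorted = [...keyValues].sort((a, b) => a - b);
  const lowest = sorted[0];
  // Walk forward from the median to the first value greater than the lowest key,
  // so that both halves of the split are non-empty.
  for (let i = Math.floor(sorted.length / 2); i < sorted.length; i++) {
    if (sorted[i] > lowest) return sorted[i];
  }
  return null;
}

console.log(findSplitPoint([1, 2, 3, 4])); // a usable split key exists
console.log(findSplitPoint([7, 7, 7, 7])); // every doc shares one key value: null
```

When every document in the chunk has the same shard key value, any candidate boundary puts all of them on the same side, so the chunk keeps growing where it is. Picking a shard key with enough distinct values avoids this.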

Hardware / OS

• No knobs in MongoDB.
• Filesystem. (ext4 or xfs)
• Memory.
• Unix distro. Linux kernel >= 2.6.23
http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-LinuxFileSystems

That’s it!

Thank you all for coming!

Feedback is welcome!
