2. Presentation Overview. Data >>> code: treat it appropriately. Manage and maintain Mongo. Mongo is young (and robust!): performance and features; the right hooks exist.
3. Who is Wordnik? Wordnik is the world’s largest English-language reference, with ~10M words! It maps every word based on real data and offers a (free) API to add word information everywhere.
4. Wordnik’s MongoDB Deployment. Over 12 months with Mongo. Holds corpus, UGC, structured data, and statistics. Master/slave. ~3 TB of data, ~12B records. We love Mongo’s performance. Read more: http://blog.wordnik.com/12-months-with-mongodb
5. Engineering + IT Ops. First, guiding principles: know your data; don’t rely on IT magic. Equal importance in WebApps / SaaS: hold hands and be friends. If you can’t manage it, don’t deploy it.
7. How? Replicate! Is that enough? Well, not if your company is on the line. Snapshot? Every minute??? Export often? Really???
8. Then What? Yes, Mongo can do incremental backups. Use the mongo slave mechanism: it’s exposed, it’s supported, it’s very easy, it’s extremely fast. How? Snapshot your data, stream write ops to disk, repeat.
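The "stream write ops to disk, repeat" loop on the slide above can be sketched in a few lines. This is a minimal illustration, not Wordnik's IncrementalBackupUtil: it uses plain Python dicts shaped like oplog entries (`ts`, `op`, `ns`, `o`, `o2`) instead of a live connection, and it simplifies `ts` to an integer. The function name `stream_ops` and its parameters are hypothetical.

```python
import io
import json


def stream_ops(oplog, out, after_ts, namespaces=None):
    """Write ops newer than after_ts to `out`, one JSON document per line.

    Assumes `oplog` is ordered by ts. Returns the ts of the last op seen,
    which is the resume point for the next pass.
    """
    resume_ts = after_ts
    for op in oplog:
        if op["ts"] <= after_ts:
            continue  # already captured by an earlier pass
        resume_ts = op["ts"]
        if namespaces is not None and op["ns"] not in namespaces:
            continue  # not one of the collections we want
        out.write(json.dumps(op) + "\n")
    return resume_ts


# Demo with made-up ops; a real tailer would read them from the oplog.
oplog = [
    {"ts": 1, "op": "i", "ns": "test.words", "o": {"_id": 1, "word": "cat"}},
    {"ts": 2, "op": "i", "ns": "test.logs", "o": {"_id": 9}},
    {"ts": 3, "op": "u", "ns": "test.words",
     "o2": {"_id": 1}, "o": {"_id": 1, "word": "dog"}},
]
backup = io.StringIO()  # stands in for a file on disk
resume_ts = stream_ops(oplog, backup, after_ts=0, namespaces={"test.words"})
```

Calling `stream_ops` again with `after_ts=resume_ts` picks up only the ops written since the previous pass, which is the "repeat" step.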
9. Better than Free. Take our tools. They work!!! SnapshotUtil: selectively snapshot in BSON; index info too! IncrementalBackupUtil: tail the oplog, stream to disk; only the collections you want! Compress & rotate. RestoreUtil: recover your snapshots; apply indexes yourself. ReplayUtil: apply your incremental backups.
10. What-if Scenarios. One collection gets corrupted? Restore it, then apply all operations to it. “My top developer dropped a collection!” Restore just that one, then apply operations to it until that point in time (POT). “We got hacked!” Restore it all, then apply operations until that POT.
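The "restore, then apply operations until that point in time" recipe can be sketched as follows. This is an illustration under simplifying assumptions, not the ReplayUtil itself: collections are in-memory dicts keyed by `_id`, `ts` is an integer, updates are treated as full-document replacements, and command ops are ignored. The name `replay` and its parameters are hypothetical.

```python
def replay(collections, ops, namespace=None, until_ts=None):
    """Apply ops in order to `collections` (dict: ns -> {_id: doc}).

    Stops at the first op with ts greater than until_ts, and skips ops
    for other namespaces when `namespace` is given.
    """
    for op in ops:
        if until_ts is not None and op["ts"] > until_ts:
            break  # everything after the cutoff is left unapplied
        if namespace is not None and op["ns"] != namespace:
            continue
        docs = collections.setdefault(op["ns"], {})
        if op["op"] == "i":
            docs[op["o"]["_id"]] = dict(op["o"])
        elif op["op"] == "u":
            # Simplification: treat `o` as the full replacement document.
            doc = dict(op["o"])
            doc.setdefault("_id", op["o2"]["_id"])
            docs[op["o2"]["_id"]] = doc
        elif op["op"] == "d":
            docs.pop(op["o"]["_id"], None)
    return collections


# Restored snapshot of one collection, then the ops recorded after it.
restored = {"test.words": {1: {"_id": 1, "word": "cat"}}}
ops = [
    {"ts": 2, "op": "i", "ns": "test.words", "o": {"_id": 2, "word": "emu"}},
    {"ts": 3, "op": "u", "ns": "test.words",
     "o2": {"_id": 1}, "o": {"_id": 1, "word": "dog"}},
    {"ts": 4, "op": "d", "ns": "test.words", "o": {"_id": 2}},  # the mistake
]
replay(restored, ops, namespace="test.words", until_ts=3)
```

Setting `until_ts` to the timestamp just before the bad operation recovers the collection as it stood at that moment.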
11. What else is possible? Replication. Why not use the built-in? Control, of course. Same logic as Incremental + Replay. Add some filters and it gets interesting.
12. Hot Datacenter. Create incremental backups, compress, push to the DC in batch, apply to master. [Diagram: Primary Datacenter master -> incremental backup files -> SCP -> Replay Util -> Hot Datacenter master]
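The "compress, push in batch" steps amount to writing each batch of ops to a compressed file that the other side can read back and replay. A minimal sketch, assuming ops are JSON-serializable dicts and using gzip from the Python standard library; the file name and the helpers `write_batch` and `read_batch` are hypothetical, and the transfer itself (SCP on the slide) is left out.

```python
import gzip
import json
import os
import tempfile


def write_batch(ops, path):
    """Write one batch of ops as gzip-compressed JSON lines."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for op in ops:
            f.write(json.dumps(op) + "\n")


def read_batch(path):
    """Read a batch back, in the order it was written."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]


batch = [
    {"ts": 1, "op": "i", "ns": "test.words", "o": {"_id": 1, "word": "cat"}},
    {"ts": 2, "op": "d", "ns": "test.words", "o": {"_id": 1}},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ops-000001.json.gz")  # hypothetical name
    write_batch(batch, path)
    received = read_batch(path)  # what the hot datacenter would replay
```

Numbering the batch files keeps them in order on the receiving side, so they can be replayed in the sequence they were produced.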
14. Multiple Upstream Masters. Aggregate to a single collection. The target can be a master! [Diagram: Master A, Master B, Master C; db.page_views -> db.page_views]
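Aggregating several upstream op streams into one target collection can be sketched as a timestamp-ordered merge. This assumes each stream is already sorted by an integer `ts` and that timestamps from different masters are comparable; the `src` tag and the function `merge_upstreams` are hypothetical additions for illustration. Colliding `_id` values across masters are not handled here.

```python
import heapq


def merge_upstreams(streams, target_ns):
    """Merge per-master op lists into one list ordered by ts.

    Each op is retargeted at `target_ns` and tagged with its source.
    """
    tagged = (
        [{**op, "ns": target_ns, "src": name} for op in ops]
        for name, ops in streams.items()
    )
    return list(heapq.merge(*tagged, key=lambda op: op["ts"]))


streams = {
    "A": [{"ts": 1, "op": "i", "ns": "a.page_views", "o": {"_id": "a1"}},
          {"ts": 4, "op": "i", "ns": "a.page_views", "o": {"_id": "a2"}}],
    "B": [{"ts": 2, "op": "i", "ns": "b.page_views", "o": {"_id": "b1"}},
          {"ts": 3, "op": "i", "ns": "b.page_views", "o": {"_id": "b2"}}],
}
merged = merge_upstreams(streams, "db.page_views")
```

The merged list can then be fed to the same replay step used for restores, with every op landing in the single target collection.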
15. Unblock MapReduce. MapReduce can lock up your server. Replicate source data to another mongod; replicate results back to master. [Diagram: Master and MR Server; db.source_data, db.summary_data]