The Care & Feeding of a
Large MongoDB Cluster
                          2012

Chris Henry
@chrishnry
hello.
Who uses MongoDB?

In Production?
MongoDB @ Behance


 Activity Feed v1
 ~ 40 Nodes
 ~ 250 Million Docs
 ~ ext3 Filesystem

 Ran for around 3 months, then...
MongoDB @ Behance


 Activity Feed v2
 • ext4 Filesystem
 • 2 TB data
 • 3 Collections
 • 15k Chunks
 • 19 Shards
 • 60 Nodes
 • 400M Docs at peak
 • Ranges ~120-250M
Why MongoDB?


  • Easy to use.
  • Easy to Iterate on.
  • Devs like it.
  • Fantastic Community built by 10Gen.
  • “Fast.”
Why NOT MongoDB?


  • Bleeding Edge.
  • Not enough battle scars.
  • Fewer tried and true fixes.
  • No Transactional support.
Why MongoDB at Scale?


  • Autosharding. (Stop hacking your app.)
  • Smart Replica Sets / High Availability.
  • Horizontal Scalability
  • Easy to grow and shrink.
  • Good fit for cloud*.
Why NOT MongoDB at Scale?


  • Data can take up more space on disk.
  • Disk IO in the cloud sucks.
  • Database-level write lock.
  • More Management than a MySQL cluster.
Behance’s Use Case + Fit


  • Data is ephemeral.
  • Denormalization of existing data.
  • Fan Out Approach.
  • Sharded by User.
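
The fan-out approach above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not Behance's actual code: the function name `fanOut` and the fields `user`, `actor`, `verb` and `ts` are hypothetical. One activity is denormalized into one document per follower, and each copy carries the feed owner's id, which is the value a "sharded by user" collection would use as its shard key.

```javascript
// Hypothetical fan-out-on-write: copy one activity into each follower's feed.
// All field names here are illustrative assumptions, not from the talk.
function fanOut(activity, followerIds) {
  return followerIds.map(function (followerId) {
    return {
      user: followerId,       // feed owner; the shard key when sharding by user
      actor: activity.actor,  // who performed the action
      verb: activity.verb,    // what they did
      ts: activity.ts         // when it happened
    };
  });
}

// One activity seen by three followers becomes three feed documents.
const feedDocs = fanOut({ actor: 7, verb: 300, ts: 1350000000 }, [1, 2, 3]);
console.log(feedDocs.length);
```

Because every copy is keyed by its reader, a feed read touches a single shard, at the cost of writing many documents per activity. That trade is easier to accept when the data is ephemeral.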
Care & Feeding. Srsly?


  • MongoDB is a less mature DB, so admins need
  to be a bit more aware.
  • No Different than MySQL (Need to take into
  account memory, disk, usage patterns,
  indexes)
  • Watch Error Logs, Disk Use, Data Size, # of
  Chunks, Old files, Sharding Status, Padding
  Factors...
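
One of the checks above, data and index size against available memory, can be sketched as a small helper. This is a rough illustration, not an official sizing rule: the function name `memoryPressure` is hypothetical, and it assumes a stats object with `dataSize` and `indexSize` in bytes, as returned by `db.stats()`.

```javascript
// Hypothetical helper: compare data + index size with available RAM.
// `stats` is assumed to look like the output of db.stats() (sizes in bytes).
function memoryPressure(stats, ramBytes) {
  const workingBytes = stats.dataSize + stats.indexSize;
  return {
    workingBytes: workingBytes,
    ratio: workingBytes / ramBytes,           // > 1 means data + indexes exceed RAM
    indexesFit: stats.indexSize <= ramBytes   // indexes alone should fit in memory
  };
}

// Example numbers are made up: 6 GB of data, 2 GB of indexes, 16 GB of RAM.
console.log(memoryPressure({ dataSize: 6e9, indexSize: 2e9 }, 16e9));
```

A ratio creeping toward or past 1 is a hint to look at the working set, indexes, or shard count before performance degrades.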
MongoDB Basics




    MongoDB Docs are great, and always improving.

    http://docs.mongodb.org/manual/
Indexes



  You need them.




  duh and/or hello.
Profiling


  • The profiler is equivalent to the slow log in
  MySQL.

  • Logs all operations slower than X milliseconds
  to a collection.
  // log slow operations, slow threshold=50ms
  > db.setProfilingLevel(1,50)

  // get operations that were slow.
  db.system.profile.find( { millis : { $gt : 5 } } )


  http://www.mongodb.org/display/DOCS/Database+Profiler
Explain


   • Equivalent to MySQL’s EXPLAIN
   • From Profiler, grab $query + $orderby, build
   into real query.

 // Explain a query
 db.collection.find({ x: 1 }).explain()


 http://www.mongodb.org/display/DOCS/Explain
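
The "grab $query + $orderby, build into real query" step can be sketched as a small helper that turns a profiler entry into a shell command string. This is an illustration only: the function name `toExplainCommand` is hypothetical, and it assumes the profiler entry has an `ns` field ("database.collection") and a `query` field that may wrap the filter in `$query` and the sort in `$orderby`.

```javascript
// Hypothetical helper: rebuild a find().sort().explain() command
// from a system.profile entry (entry shape is an assumption).
function toExplainCommand(entry) {
  // ns is "database.collection"; drop the database part.
  const collection = entry.ns.split(".").slice(1).join(".");
  const wrapped = entry.query || {};
  // When a sort was used, the filter is nested under $query.
  const filter = wrapped.$query || wrapped;
  let command = "db." + collection + ".find(" + JSON.stringify(filter) + ")";
  if (wrapped.$orderby) {
    command += ".sort(" + JSON.stringify(wrapped.$orderby) + ")";
  }
  return command + ".explain()";
}

// The database name "behance" and the entry below are made up for the example.
console.log(toExplainCommand({
  ns: "behance.activity",
  query: { $query: { user: 981122, type: 1 }, $orderby: { ts_mo: -1 } }
}));
```

Note that `JSON.stringify` does not preserve shell-only types such as `NumberLong`, so values like that would need to be restored by hand before pasting the command into the shell.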
Replica Sets


 • Equivalent to MySQL’s replication, but not quite.
 • Resiliency and availability through cleverness.
 • ReplicaSet setups
 • rs.stepDown()
 • rs.slaveOk()
 • w parameter
Replica Sets

// mongod.conf
replSet = myreplica

// Initiate the Replica Set
> rs.initiate()

// Add a node
> rs.add("myreplica1:27017");

// Allow reads from the secondaries
> rs.slaveOk()

// Write something.
> db.replica.insert({x:1});

// Make sure the write propagates to a majority of servers.
> db.runCommand( { getlasterror : 1 , w : "majority" } )


http://www.mongodb.org/display/DOCS/Replica+Set+Commands
http://docs.mongodb.org/manual/applications/replication/#read-preference
http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError
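
The idea of a "majority" in a replica set is plain arithmetic, sketched below. The function names `majority` and `tolerableFailures` are hypothetical, and the sketch only counts voting members; it does not model priorities, arbiters that hold no data, or other election details.

```javascript
// Smallest number of voting members that forms a strict majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a majority is still reachable.
function tolerableFailures(votingMembers) {
  return votingMembers - majority(votingMembers);
}

[2, 3, 5].forEach(function (n) {
  console.log(n + " members: majority=" + majority(n) +
              ", can lose " + tolerableFailures(n));
});
```

With only two voting members, losing either one leaves no majority, which is why a set is usually built with an odd number of voters.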
Sharding


  • Goal: Distribute data across many nodes / replica sets.
  • Provides baked-in horizontal scalability.
Sharding


  • Chunks.
  • Routing process - mongos (manages balancing, and query routing)
  • Shards - mongod configured with shardsvr = 1.
  • Config servers - 3 mongod servers, listed in mongos.conf
  • Shards can be replica sets, or stand alone servers.
  • Shard Key
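
How chunks, the shard key and query routing fit together can be sketched as a range lookup. This is a simplified model, not the mongos implementation: the function name `findChunk`, the chunk layout and the shard names are assumptions, and `-Infinity` / `Infinity` stand in for MongoDB's MinKey / MaxKey.

```javascript
// Each chunk owns a half-open range [min, max) of shard key values
// and lives on exactly one shard. Ranges and names below are made up.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" }
];

// Return the chunk whose range contains the given shard key value.
function findChunk(chunks, key) {
  for (const chunk of chunks) {
    if (key >= chunk.min && key < chunk.max) {
      return chunk;
    }
  }
  return null; // no chunk covers this key (should not happen with full coverage)
}

console.log(findChunk(chunkTable, 1234).shard);
```

A query that includes the shard key can be routed to one shard this way; a query without it has to be sent to every shard.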
Sharding


// mongos.conf
configdb = server,server,server

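// Initiate
// connect to mongos the same way you would connect to mongod;
// these commands must be run against the admin database
> use admin
> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );

// Enable sharding on the database, then shard a collection
> db.runCommand( { enablesharding : "test" } )
> db.runCommand( { shardcollection : "test.fs.chunks", key :
{ files_id : 1 } } )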


http://www.mongodb.org/display/DOCS/Replica+Set+Commands
Indexing Big - Sharded Index


 • Always run on mongos
 • Background Indexing
 • Sparse indexes
Maintenance


 • Replica Sets
 • Add Node / Remove Node
 • rs.stepDown()


 • Shards
 • Drain Shard / Add Shard
Gotchas


 • A Sharded cluster with a shard down will
 return no results.

 • If a chunk holds too much data for a single
 shard key value, it cannot be split. Balancing
 then becomes blocked, and the cluster becomes
 unbalanced.
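
Why such a chunk cannot be split can be shown with a toy split-point search. This is a sketch of the idea only, not MongoDB's actual splitting logic: the function name `findSplitPoint` is hypothetical and it assumes numeric shard key values.

```javascript
// Return a shard key value at which the chunk could be split in two,
// or null when every document shares the same key value.
function findSplitPoint(keys) {
  const sorted = keys.slice().sort(function (a, b) { return a - b; });
  const lowest = sorted[0];
  const median = sorted[Math.floor(sorted.length / 2)];
  // A split point must be greater than the lowest key,
  // otherwise one side of the split would be empty.
  if (median > lowest) {
    return median;
  }
  // Median equals the lowest key: fall back to the first larger key, if any.
  for (const key of sorted) {
    if (key > lowest) {
      return key;
    }
  }
  return null;
}

console.log(findSplitPoint([1, 2, 3, 4]));  // a usable split point exists
console.log(findSplitPoint([5, 5, 5, 5]));  // null: one key value, nothing to split on
```

All documents with the same shard key value must live in the same chunk, so a very active key (for example one extremely busy user) can pin an oversized chunk to a single shard.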
Hardware / OS


 • No knobs in MongoDB.
 • Filesystem. (ext4 or xfs)
 • Memory.
 • Unix distro. Linux kernel >= 2.6.23
 http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-LinuxFileSystems
That’s it!


 Thank you all for coming!

 Feedback is welcome!

Editor's Notes

  2. I’m Chris Henry, CTO of Behance.

     Goals of this class -> Learn a bit about MongoDB itself, learn whether MongoDB is the solution you want, and learn how to deal with some pitfalls that aren’t exactly clear in the docs... or, y’know, anywhere.

     This is meant to be a conversation; stop me any time something isn’t clear.

  3. Ask.

     I’ve been using MongoDB for 3 years now, for any number of failed projects no longer in existence.

  4. Show the Activity Feed

     Data Porn

  7. Easy to use -> Installation on most systems is a single line. Updating is easy. There’s a driver for basically every language. JSON-like storage makes modeling easy.

     Easy to Iterate on -> Once modeled, making changes is really easy. Just add the key you need to the document. To remove one, iterate through and use the $unset operator.

     Why is “Fast” in quotes? Just like any piece of software, the way you manage it / deploy it / write code for it really determines how “Fast” it is.

  8. Bleeding edge -> Will cut you. Definitely production ready, but beware that you will need to devote serious time and effort, and will potentially have problems scaling.

     Battle scars -> Software gets better by being in production for a long time. MySQL / Postgres both have the benefit of this. MongoDB is still the new kid in town.

     Tried and true -> Many paradigms of document design are still developing.

     Txn support -> Don’t store important data that requires transaction support.

     Good news: 10Gen is aware of most of Mongo’s flaws, and is doing a stellar job of listening to the community and making changes.

  9. Autosharding -> If your data gets too big, just add more capacity, without having to change anything about the way your app connects to mongo.

     Replica Sets -> Replicas of data that are smart enough to handle outages and members disappearing.

     Horizontal -> Too much data? Just add more shards / nodes. More Nodes = More Scalability.

     Cloud* -> Good fit for the on-demand, super fast provisioning of new instances. Why the asterisk? Shitty fit for Disk IO in the cloud.

  10. Data -> BSON format allows for much more flexibility, but as documents change size, they need to be moved, which takes up more space. Same problem as memcached.

      IO -> AWS has a bad neighbor problem. You are never sure who else on your virtualized machine will be thrashing the disk. When they do, writes take longer.

      Global lock -> Huge problem for write intensive standalone servers. In 2.2, this is changing to a database-level lock. In Behance’s use case, this isn’t really helpful, since we have one main collection.

      Mgmt. -> Debatable. However, in our case, we have to keep a much closer eye on data size and index size against available memory.

  12. Running any large database cluster takes some work. However, Mongo seems to be a bit more on the needy side than MySQL.

  14. Get the Sterling Archer image here.

  15. What’s nice about keeping slow operations in a collection is that you can query them the same way you would query your own collection.

      Shard10-3 has profiling enabled.

      db.system.profile.find( { millis : { $gt : 5 } } ).limit(1).pretty()

  16. db.activity.find({ "user" : NumberLong(981122), "verb" : { "$in" : [300] }, "type" : 1}).sort({"ts_mo" : -1}).explain()

  17. cleverness -> Unlike MySQL, MongoDB replica nodes keep track of each other’s state. If one goes down, an election is held between the rest of the nodes, and a new node is elected primary. Since all drivers will detect nodes in the set, writes are then directed there.

      setups -> 2 Nodes + Arbiter OR 3 nodes

      rs.stepDown() -> Force the primary to relinquish its role as primary, so that a secondary is elected primary.

      slaveOk -> Setting this parameter in the driver will send reads to the secondaries.

  22. Beware: Backgrounding will index in the background on the primary, but in the foreground on a secondary. Use only the primary when indexing. Do it at off-peak hours.