The Care & Feeding of a
Large MongoDB Cluster
                          2012

Chris Henry
@chrishnry
hello.
Who uses MongoDB?

In Production?
MongoDB @ Behance


 Activity Feed v1
 ~ 40 Nodes
 ~ 250 Million Docs
 ~ ext3 Filesystem

 Ran for around 3 months, then...
MongoDB @ Behance


 Activity Feed v2
 • ext4 Filesystem
 • 2 TB data
 • 3 Collections
 • 15k Chunks
 • 19 Shards
 • 60 Nodes
 • 400M Docs at peak
 • Ranges ~120-250M
Why MongoDB?


  • Easy to use.
  • Easy to Iterate on.
  • Devs like it.
  • Fantastic Community built by 10Gen.
  • “Fast.”
Why NOT MongoDB?


  • Bleeding Edge.
  • Not enough battle scars.
  • Fewer tried and true fixes.
  • No Transactional support.
Why MongoDB at Scale?


  • Autosharding. (Stop hacking your app.)
  • Smart Replica Sets / High Availability.
  • Horizontal Scalability
  • Easy to grow and shrink.
  • Good fit for cloud*.
Why NOT MongoDB at Scale?


  • Data can take up more space on disk.
  • Disk IO in the cloud sucks.
  • Database-level write lock.
  • More Management than a MySQL cluster.
Behance’s Use Case + Fit


  • Data is ephemeral.
  • Denormalization of existing data.
  • Fan Out Approach.
  • Sharded by User.
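
A minimal sketch of the fan-out idea, assuming one denormalized copy of each event is written per follower and the feed owner's id is the shard key. The function and field names are hypothetical and are not Behance's actual schema.

```javascript
// Hypothetical fan-out-on-write helper (illustrative names only).
// Each follower gets their own copy of the event, so reading a feed
// only ever touches documents that share one shard key value.
function fanOut(event, followerIds) {
  return followerIds.map((ownerId) => ({
    user: ownerId,      // assumed shard key: the feed owner
    actor: event.actor, // who performed the action
    verb: event.verb,   // what they did
    ts: event.ts,       // when they did it
  }));
}

const docs = fanOut({ actor: 7, verb: 300, ts: 1350000000 }, [11, 12, 13]);
console.log(docs.length); // 3
```

The trade-off is more writes and more disk in exchange for single-shard reads, which is tolerable when the data is ephemeral.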
Care & Feeding. Srsly?


  • As a less mature DB, admins need to be a
  bit more aware.
  • No Different than MySQL (Need to take into
  account memory, disk, usage patterns,
  indexes)
  • Watch Error Logs, Disk Use, Data Size, # of
  Chunks, Old files, Sharding Status, Padding
  Factors...
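
One way to keep an eye on data size against memory is to compare the numbers reported by db.stats() with the RAM on the node. This is a rough sketch: the `stats` object only mimics the `dataSize` and `indexSize` fields of db.stats(), and the sample numbers and function name are made up.

```javascript
// Rough check of data + index size against available RAM.
// `stats` mimics the dataSize / indexSize fields (bytes) of db.stats();
// the helper itself is hypothetical.
function memoryPressure(stats, ramBytes) {
  const hot = stats.dataSize + stats.indexSize;
  return {
    indexesFit: stats.indexSize <= ramBytes, // indexes alone fit in RAM?
    ratio: hot / ramBytes,                   // > 1 means not everything fits
  };
}

const report = memoryPressure({ dataSize: 100e9, indexSize: 20e9 }, 64e9);
console.log(report.indexesFit, report.ratio); // true 1.875
```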
MongoDB Basics




    MongoDB Docs are great, and always improving.

    http://docs.mongodb.org/manual/
Indexes



  You need them.




  duh and/or hello.
Profiling


  • The profiler is equivalent to the slow log in
  MySQL.

  • Logs all operations slower than X milliseconds
  to a collection.
  // log slow operations, slow threshold=50ms
  > db.setProfilingLevel(1,50)

  // get operations that were slow.
  db.system.profile.find( { millis : { $gt : 5 } } )


  http://www.mongodb.org/display/DOCS/Database+Profiler
Explain


   • Equivalent to MySQL’s EXPLAIN
   • From Profiler, grab $query + $orderby, build
   into real query.

 // Explain a query
 db.collection.find({ x: 1 }).explain()


 http://www.mongodb.org/display/DOCS/Explain
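
The "grab $query + $orderby" step can be sketched as a small helper that turns a profiler entry into a shell command to paste. It assumes the entry has an `ns` field ("db.collection") and a `query` field that may wrap the filter in `$query` / `$orderby`; the database name below is invented, and shell-only types such as NumberLong would not survive JSON.stringify.

```javascript
// Build an explain() shell command from a system.profile-style entry.
// Assumes entry.ns is "db.collection" and entry.query optionally wraps
// the filter and sort as { $query: ..., $orderby: ... }.
function explainCommandFor(entry) {
  const collection = entry.ns.slice(entry.ns.indexOf(".") + 1);
  const wrapped = Boolean(entry.query) && entry.query.$query !== undefined;
  const filter = wrapped ? entry.query.$query : (entry.query || {});
  const orderby = wrapped ? entry.query.$orderby : undefined;

  let cmd = "db." + collection + ".find(" + JSON.stringify(filter) + ")";
  if (orderby) {
    cmd += ".sort(" + JSON.stringify(orderby) + ")";
  }
  return cmd + ".explain()";
}

const slowOp = {
  ns: "behance.activity", // hypothetical database name
  millis: 120,
  query: {
    $query: { user: 981122, verb: { $in: [300] }, type: 1 },
    $orderby: { ts_mo: -1 },
  },
};
console.log(explainCommandFor(slowOp));
// db.activity.find({"user":981122,"verb":{"$in":[300]},"type":1}).sort({"ts_mo":-1}).explain()
```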
Replica Sets


 • Equivalent to MySQL’s replication, but not quite.
 • Resiliency and availability through cleverness.
 • ReplicaSet setups
 • rs.stepDown()
 • rs.slaveOk()
 • w parameter
Replica Sets

// mongod.conf
replSet = myreplica

// Initiate the Replica Set
> rs.initiate()

// Add a node
> rs.add("myreplica1:27017");

// Allow reads from the secondaries
> rs.slaveOk()

// Write something.
> db.replica.insert({x:1});

// Make sure write propagates to majority of servers.
> db.runCommand( { getlasterror : 1 , w : "majority" } )


http://www.mongodb.org/display/DOCS/Replica+Set+Commands
http://docs.mongodb.org/manual/applications/replication/#read-preference
http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError
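
The arithmetic behind "majority" is worth spelling out: a strict majority of voting members is needed to elect a primary. The helpers below are an illustrative sketch, not part of the shell.

```javascript
// Strict majority of voting members in a replica set.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary can only be elected while a majority can still see each other.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

console.log(majority(3));           // 2
console.log(canElectPrimary(3, 2)); // true: one member down, still a majority
console.log(canElectPrimary(3, 1)); // false: no primary, so no writes
console.log(majority(4));           // 3: a 4th member adds no extra fault tolerance
```

This is why sets are usually built with an odd number of voters, for example 3 nodes, or 2 nodes plus an arbiter.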
Sharding


  • Goal: Distribute data across many nodes / replica sets.
  • Provides baked-in horizontal scalability.
Sharding


  • Chunks.
  • Routing process - mongos (manages balancing, and query routing)
  • Shards - mongod configured with shardsvr = 1.
  • Config servers - 3 mongod servers configured on mongos.conf
  • Can be replica sets, or stand alone servers.
  • Shard Key
Sharding


// mongos.conf
configdb = server,server,server

// Initiate
// connect to mongos the same way you would connect to mongod
> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );

// Shard a collection
> db.runCommand( { shardcollection : "test.fs.chunks", key :
{ files_id : 1 } } )


http://www.mongodb.org/display/DOCS/Replica+Set+Commands
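
To make the routing step concrete, here is a toy model of how a router can pick a shard from the shard key. Chunk ranges include their lower bound and exclude their upper bound. The chunk table, shard names and function are invented for illustration; the real metadata lives on the config servers.

```javascript
// Toy chunk table: each chunk owns the key range [min, max) on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Return the shard whose chunk range contains the given shard key value.
function shardFor(key, chunkTable) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardFor(999, chunks));    // shard0
console.log(shardFor(1000, chunks));   // shard1 (lower bound is inclusive)
console.log(shardFor(981122, chunks)); // shard2
```

A query that includes the shard key can be sent to one shard this way; a query without it has to be sent to every shard.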
Indexing Big - Sharded Index


 • Always run on mongos
 • Background Indexing
 • Sparse indexes
Maintenance


 • Replica Sets
 • Add Node / Remove Node
 • rs.stepDown()


 • Shards
 • Drain Shard / Add Shard
Gotchas


 • A Sharded cluster with a shard down will
 return no results.

 • If a chunk holds too much data under a single
 shard key value and cannot be split, balancing
 becomes blocked and the cluster becomes
 unbalanced.
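
The second gotcha is easier to see with a toy splitter. A chunk can only be split at a key value that leaves documents on both sides, so a chunk whose documents all share one shard key value has no valid split point. This sketch is not MongoDB's actual split algorithm, and the function name is made up.

```javascript
// Pick a split point for a chunk, given its shard key values sorted ascending.
// The chunk would become [min, split) and [split, max); both halves must be
// non-empty, so the split value has to be greater than the smallest key.
function findSplitPoint(sortedKeys) {
  if (sortedKeys.length < 2) return null;
  const first = sortedKeys[0];
  // Start at the median and walk right until the key differs from the first.
  for (let i = Math.floor(sortedKeys.length / 2); i < sortedKeys.length; i++) {
    if (sortedKeys[i] > first) return sortedKeys[i];
  }
  return null; // every document has the same key: the chunk cannot be split
}

console.log(findSplitPoint([1, 2, 3, 4])); // 3
console.log(findSplitPoint([1, 1, 1, 9])); // 9
console.log(findSplitPoint([5, 5, 5, 5])); // null
```

Choosing a shard key with enough distinct values is what keeps chunks splittable.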
Hardware / OS


 • No knobs in MongoDB.
 • Filesystem. (ext4 or xfs)
 • Memory.
 • Unix distro. Linux kernel >= 2.6.23
 http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-LinuxFileSystems
That’s it!


 Thank you all for coming!

 Feedback is welcome!

Editor's Notes

  2. I’m Chris Henry, CTO of Behance.
     Goals of this class -> Learn a bit about MongoDB itself, learn whether MongoDB is the solution you want, and learn how to deal with some pitfalls that aren’t exactly clear in the docs... or, y’know, anywhere.
     This is meant to be a conversation; stop me any time something isn’t clear.

  3. Ask.
     I’ve been using MongoDB for 3 years now, for any number of failed projects no longer in existence.

  4. Show the Activity Feed.
     Data Porn.

  7. Easy to use -> Installation on most systems is a single line. Updating is easy. There’s a driver for basically every language. JSON-like storage makes modeling easy.
     Easy to Iterate on -> Once modeled, making changes is really easy. Just add the key you need to the document. To remove a key, iterate through the documents and use the $unset operator.
     Why is “Fast” in quotes? Just like any piece of software, the way you manage it / deploy it / write code for it really determines how “Fast” it is.

  8. Bleeding edge -> Will cut you. Definitely production ready, but beware that you will need to devote serious time and effort, and will potentially have problems scaling.
     Battle scars -> Software gets better by being in production for a long time. MySQL and Postgres have the benefit of this. MongoDB is still the new kid in town.
     Tried and true -> Many paradigms of document design are still developing.
     Txn support -> Don’t put important data that requires transaction support in MongoDB.
     Good news: 10Gen is aware of most of Mongo’s flaws, and is doing a stellar job of listening to the community and making changes.

  9. Autosharding -> If your data gets too big, just add more capacity, without having to change anything about the way your app connects to Mongo.
     Replica Sets -> Replicas of data that are smart enough to handle outages and members disappearing.
     Horizontal -> Too much data? Just add more shards / nodes. More Nodes = More Scalability.
     Cloud* -> Good fit for the on-demand, super fast provisioning of new instances. Why the asterisk? Shitty fit for Disk IO in the cloud.

  10. Data -> BSON format allows for much more flexibility, but as documents change size, they need to be moved, which takes up more space. Same problem as memcached.
      IO -> AWS has a bad neighbor problem. You are never sure who else on your virtualized machine will be thrashing the disk. When they do, writes take longer.
      Global lock -> Huge problem for write-intensive standalone servers. In 2.2, this is changing to a database-level lock. In Behance’s use case, this isn’t really helpful, since we have one main collection.
      Mgmt. -> Debatable. However, in our case, we have to keep a much closer eye on data size and index size against available memory.

  12. Running any large database cluster takes some work. However, Mongo seems to be a bit more on the needy side than MySQL.

  14. Get the Sterling Archer image here.

  15. What’s nice about keeping slow operations in a collection is that you can query them the same way you would query your own collection.
      Shard10-3 has profiling enabled.
      db.system.profile.find( { millis : { $gt : 5 } } ).limit(1).pretty()

  16. db.activity.find({ "user" : NumberLong(981122), "verb" : { "$in" : [300] }, "type" : 1}).sort({"ts_mo" : -1}).explain()

  17. cleverness -> Unlike MySQL, MongoDB replica nodes keep track of each other’s state. If the primary goes down, an election is held between the rest of the nodes, and a new node is elected primary. Since all drivers will detect nodes in the set, writes are then directed there.
      setups -> 2 Nodes + Arbiter OR 3 nodes.
      rs.stepDown() -> Force the primary to relinquish its role as primary, so that a secondary is elected primary.
      slaveOk -> Setting this parameter in the driver will send reads to the secondaries.

  22. Beware: Backgrounding will index in the background on the primary, but in the foreground on a secondary. Use only the primary when indexing. Do it at off-peak hours.