Charity Majors
 @mipsytipsy
Topics:
•   Replica sets

•   Resources and capacity planning

•   Provisioning with chef

•   Snapshotting

•   Scaling tips

•   Monitoring

•   Disaster mitigation
Replica sets
•   Always use replica sets

•   Distribute across Availability Zones

•   Avoid an even number of voters
    •   50% is not a majority!

•   More votes are better than fewer (max is 7)

•   Add an arbiter for more flexibility

•   Always explicitly set the priority of your nodes.
    Surprise elections are terrible.
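The voting arithmetic behind these bullets can be sketched in a few lines. This is an illustrative Python sketch, not MongoDB code; the function names are hypothetical.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def can_elect_primary(total_votes, reachable_votes):
    """True if the reachable voters form a strict majority."""
    return reachable_votes >= votes_needed(total_votes)


# With 4 voters split 2/2 by a partition, neither side can elect a primary.
print(can_elect_primary(4, 2))  # False
# With 5 voters, losing 2 still leaves a majority of 3.
print(can_elect_primary(5, 3))  # True
```

Adding a fourth voter to a three-voter set raises the votes needed from 2 to 3 without letting the set tolerate any more failures, which is why an even count buys nothing.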
Basic sane replica set config




•   Each node has one vote (default)
•   Snapshot node does not serve read queries, cannot become master
•   This configuration can survive any single node or Availability Zone outage
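The diagram for this slide is not reproduced here. As a rough stand-in, the sketch below assumes a three-member layout (two data nodes plus a hidden, priority-0 snapshot node, each in its own Availability Zone) and checks that any single AZ can be lost while a majority and an electable node remain. The hostnames, AZ names, and function name are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical layout; hosts and AZs are assumptions, not taken from the slide.
MEMBERS = [
    {"host": "db1", "az": "us-east-1a", "votes": 1, "priority": 2, "hidden": False},
    {"host": "db2", "az": "us-east-1b", "votes": 1, "priority": 1, "hidden": False},
    {"host": "snap1", "az": "us-east-1c", "votes": 1, "priority": 0, "hidden": True},
]


def survives_any_az_outage(members):
    """True if losing any one AZ leaves a voting majority and an electable node."""
    total_votes = sum(m["votes"] for m in members)
    needed = total_votes // 2 + 1
    votes_by_az = defaultdict(int)
    electable_by_az = defaultdict(int)
    for m in members:
        votes_by_az[m["az"]] += m["votes"]
        if m["priority"] > 0:  # priority 0 members can never become primary
            electable_by_az[m["az"]] += 1
    electable_total = sum(electable_by_az.values())
    for az in votes_by_az:
        remaining_votes = total_votes - votes_by_az[az]
        remaining_electable = electable_total - electable_by_az[az]
        if remaining_votes < needed or remaining_electable == 0:
            return False
    return True


print(survives_any_az_outage(MEMBERS))  # True
```

Putting db1 and db2 in the same AZ makes the check fail, since losing that AZ leaves only the snapshot node's single vote.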
Or manage votes with arbiters




•   Three separate arbiter processes on each AZ arbiter node, one per cluster
•   Maximum of seven votes per replica set
•   Now you can survive all secondaries dying, or an AZ outage
•   If you have even one healthy node, you can continue to serve traffic
•   Arbiters tend to be more reliable than nodes because they have less to do.
Provisioning

•   Memory is your primary constraint, spend your
    money there
    •   Especially for read-heavy workloads

•   Your working set should fit into RAM
    •   lots of page faults means it doesn’t fit
    •   2.4 has a working set estimator in db.serverStatus!

•   Your snapshot host can usually be smaller, if cost is
    a concern
Disk options

•   EBS -- just kidding, EBS is not an option

•   EBS with Provisioned IOPS

•   Ephemeral storage

•   SSD
EBS classic




EBS with PIOPS:
PIOPS
• Guaranteed # of IOPS, up to 2000/volume
• Variability of <0.1%
• RAID together multiple volumes for higher
   performance
• Supports EBS snapshots
• Costs 2x regular EBS
• Can only attach to certain instance types
Estimating PIOPS
• estimate how many IOPS to provision with the “tps”
  column of sar -d 1




• multiply that by 2-3x depending on your spikiness
• when you exceed your PIOPS limit, your disk stops
   for a few seconds. Avoid this.
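The arithmetic on this slide can be worked through as a small sketch. It assumes you have already collected tps samples from sar -d, uses their mean as the baseline (an assumption; the slide does not say whether to use the mean or the peak), and treats IOPS as adding up across striped volumes, which is only approximately true. The function names are hypothetical.

```python
import math

MAX_PIOPS_PER_VOLUME = 2000  # per-volume ceiling quoted on the PIOPS slide


def piops_to_provision(tps_samples, headroom=2.5):
    """Mean observed tps times a spikiness multiplier (2-3x per the slide)."""
    if not tps_samples:
        raise ValueError("need at least one tps sample")
    baseline = sum(tps_samples) / len(tps_samples)
    return math.ceil(baseline * headroom)


def volumes_needed(piops):
    """Number of PIOPS volumes to stripe together to reach the target."""
    return max(1, math.ceil(piops / MAX_PIOPS_PER_VOLUME))


samples = [900.0, 1100.0, 1000.0]  # made-up tps readings
target = piops_to_provision(samples)
print(target, volumes_needed(target))  # 2500 2
```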
Ephemeral storage
•   Cheap
•   Fast
•   No network latency
•   You can snapshot with LVM + S3
•   Data is lost forever if you stop or resize the instance
•   Can use EBS on your snapshot node to take
    advantage of EBS tools
     •     makes restore a little more complicated
Filesystem
•   Use ext4

•   Raise file descriptor limits
    (cat /proc/<mongo pid>/limits to verify)

•   If you’re using Ubuntu, use upstart

•   Set your readahead (blockdev --setra) to something sane, or
    you won’t use all your RAM

•   If you’re using mdadm, make sure your md device
    and its volumes have a small enough block size

•   RAID 10 is the safest and best-performing, RAID 0
    is fine if you understand the risks
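Two of the checks above are easy to script. The sketch below parses the "Max open files" row out of the text of /proc/<pid>/limits and converts a readahead size in KiB to the 512-byte sectors that blockdev --setra expects. The sample text and function names are illustrative assumptions.

```python
def max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def readahead_sectors(kib):
    """Convert a readahead size in KiB to 512-byte sectors."""
    return kib * 1024 // 512


# Made-up excerpt in the layout the kernel uses for /proc/<pid>/limits.
SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

print(max_open_files(SAMPLE))  # ('1024', '4096')
print(readahead_sectors(16))   # 32
```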
Chef everything
 •   Role attributes for backup volumes, cluster names

 •   Nodes are effectively disposable

 •   Provision and attach EBS RAID arrays via AWS
     cookbook

 •   Delete volumes and AWS attributes, run chef-client
     to re-provision

 •   Restore from snapshot automatically with our
     backup scripts
Our mongo cookbook and backup scripts: https://github.com/ParsePlatform/Ops/
Bringing up a new node from the most recent mongo
snapshot is as simple as this:




It’s faster for us to re-provision a node from scratch
than to repair a RAID array or fix most problems.
Each replica set has its own role, where it sets the
cluster name, the snapshot host name, and the EBS
volumes to snapshot.




When you provision a new node for this role,
mongodb::raid_data will build it off the most recent
completed set of snapshots for the volumes specified in
backups => mongo_volumes.
Snapshots

•   Snapshot often

•   Set snapshot node to priority = 0, hidden = 1

•   Lock Mongo OR stop mongod during snapshot

•   Snapshot all RAID volumes
    •   We use ec2-consistent-snapshot, with a wrapper script for chef to generate the backup volume ids:
        http://eric.lubow.org/2011/databases/mongodb/ec2-consistent-snapshot-with-mongo

•   Always warm up a snapshot before promoting
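The lock, snapshot every volume, unlock sequence is worth wrapping so that the unlock always runs, even if a snapshot call fails. The sketch below is not the ec2-consistent-snapshot tool; it is a minimal illustration with hypothetical callables that you would wire to your own lock step (for example the shell's db.fsyncLock() and db.fsyncUnlock()) and to your EC2 snapshot call.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(lock, unlock):
    """Hold the write lock for the duration of the block, always releasing it."""
    lock()
    try:
        yield
    finally:
        unlock()


def snapshot_all(volume_ids, snapshot_volume, lock, unlock):
    """Snapshot every RAID member volume while writes are locked."""
    snapshot_ids = []
    with writes_locked(lock, unlock):
        for volume_id in volume_ids:
            snapshot_ids.append(snapshot_volume(volume_id))
    return snapshot_ids


# Dry run with stand-in callables; volume ids are made up.
log = []
ids = snapshot_all(
    ["vol-a", "vol-b"],
    snapshot_volume=lambda v: "snap-" + v,
    lock=lambda: log.append("lock"),
    unlock=lambda: log.append("unlock"),
)
print(ids, log)  # ['snap-vol-a', 'snap-vol-b'] ['lock', 'unlock']
```

Taking all the snapshots inside one lock is what keeps the RAID members consistent with each other.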
Warming a secondary
• Warm up both indexes and data
• Use dd or vmtouch to load files from S3
• Scan for most commonly used collections on
  primary, read those into memory on secondary
• Read collections into memory
   •   Natural sort
   •   Full table scan
   •   Search for something that doesn’t exist

   http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
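The dd approach amounts to reading every data file from start to finish so its blocks are fetched and end up in the page cache. A rough Python equivalent, assuming you would point it at the files under your own data directory (the demo below uses a throwaway temp file instead), might look like this:

```python
import os
import tempfile


def warm_file(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demo on a temporary file standing in for a data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 3_000_000)
demo_path = tmp.name
print(warm_file(demo_path) == os.path.getsize(demo_path))  # True
os.remove(demo_path)
```

This only touches files. Reading the hot collections through mongod, as in the bullets above, is still needed to warm the data and indexes that queries actually use.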
Fragmentation
•   Your RAM gets fragmented too!

•   Leads to underuse of memory

•   Deletes are not the only source of fragmentation

•   db.<collection>.stats() shows the padding factor
    (between 1 and 2; the higher, the more
    fragmentation)

•   Repair, compact, or reslave regularly
    (db.printReplicationInfo() to get the length of your
    oplog to see if repair is a viable option)
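The oplog check in the last bullet is a simple comparison: a node taken offline for repair can only catch up if the downtime fits inside the oplog window. A minimal sketch, where the safety factor of 2 is an arbitrary assumption and the function name is hypothetical:

```python
def fits_in_oplog_window(oplog_hours, downtime_hours, safety_factor=2.0):
    """True if the node can be offline for downtime_hours and still catch up."""
    return downtime_hours * safety_factor <= oplog_hours


print(fits_in_oplog_window(48, 10))  # True
print(fits_in_oplog_window(12, 10))  # False
```

When the check fails, reslaving from a fresh snapshot is the safer route.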
Compaction: before and after
Compaction
•   We recommend running a continuous compaction
    script on your snapshot host

•   Every time you provision a new host, it will be
    freshly compacted.

•   Plan to rotate in a compacted primary regularly
    (quarterly, yearly depending on rate of decay)

•   If you also delete a lot of collections, you may need
    to periodically run db.repairDatabase() on each db

        http://blog.parse.com/2013/03/26/always-be-compacting/
Scaling strategies
•   Horizontal scaling

•   Query optimization, index optimization

•   Throw money at it (hardware)

•   Upgrade to 2.2 or later to get rid of the global lock

•   Read from secondaries

•   Put the journal on a different volume

•   Repair, compact, or reslave
Monitoring

•   MMS

•   Ganglia + Nagios
    •   correlate graphs with local metrics like disk i/o
    •   graph your own index ops
    •   graph your own aggregate lock percentages
    •   alert on replication lag, replication error
    •   alert if the primary changes, connection limit

•   Use chef! Generate all your monitoring from roles
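A replication lag alert can be derived from the document that rs.status() returns. The sketch below works on a plain dict shaped like that output (members with name, stateStr, and optimeDate); the hosts, timestamps, and 60-second threshold are made up, and hooking the result into Nagios is left out.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Map each SECONDARY's name to its lag (a timedelta) behind the PRIMARY."""
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise RuntimeError("no primary found; this deserves its own alert")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def lagging(status, threshold=timedelta(seconds=60)):
    """Names of secondaries whose lag exceeds the threshold."""
    return sorted(n for n, lag in replication_lag(status).items() if lag > threshold)


t0 = datetime(2013, 4, 1, 12, 0, 0)
STATUS = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": t0},
        {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": t0 - timedelta(seconds=5)},
        {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": t0 - timedelta(seconds=300)},
        {"name": "arb1:27017", "stateStr": "ARBITER"},
    ]
}
print(lagging(STATUS))  # ['db3:27017']
```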
fun with MMS

•   opcounters are color-coded by op type!

•   big bgflush spike means there was an EBS event

•   lots of page faults means reading lots of cold data into memory from disk

•   lock percentage is your single best gauge of fragility.
so ... what can go wrong?

•   Your queues are rising and queries are piling up

•   Everything seems to be getting vaguely slower

•   Your secondaries are in a crash loop

•   You run out of available connections

•   You can’t elect a primary

•   You have an AWS or EBS outage or degradation

•   You have terrible latency spikes

•   Replication stops
... when queries pile up ...
•   Know what your healthy cluster looks like

•   Don’t switch your primary or restart when
    overloaded

•   Do kill queries before the tipping point

•   Write your kill script before you need it

•   Read your mongodb.log. Enable profiling!

•   Check db.currentOp():
      •   check to see if you’re building any indexes
      •   check queries with a high numYields
      •   check for long running queries
      •   use explain() on them, check for full table scans
      •   sort by number of queries/write locks per namespace
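A kill script mostly comes down to deciding which entries in the db.currentOp() output to target. The sketch below works on a dict shaped like that output (an inprog list with opid, op, ns, and secs_running) and only selects candidates; the 30-second cutoff, the choice to target only client queries, and the function names are assumptions. The opids it returns would then be passed to db.killOp().

```python
from collections import Counter


def ops_to_kill(current_op, max_secs=30):
    """Pick opids of long-running client queries from currentOp-style output."""
    victims = []
    for op in current_op.get("inprog", []):
        if op.get("op") != "query":
            continue  # leave writes and internal operations alone
        ns = op.get("ns", "")
        if ns.startswith("local.") or ns.startswith("admin."):
            continue  # skip replication and administrative namespaces
        if op.get("secs_running", 0) >= max_secs:
            victims.append(op["opid"])
    return victims


def ops_per_namespace(current_op):
    """Count in-progress operations per namespace, busiest first."""
    counts = Counter(op.get("ns", "") for op in current_op.get("inprog", []))
    return counts.most_common()


# Made-up sample in the shape of db.currentOp() output.
CURRENT_OP = {
    "inprog": [
        {"opid": 1, "op": "query", "ns": "app.users", "secs_running": 45},
        {"opid": 2, "op": "query", "ns": "app.users", "secs_running": 2},
        {"opid": 3, "op": "insert", "ns": "app.events", "secs_running": 90},
        {"opid": 4, "op": "query", "ns": "local.oplog.rs", "secs_running": 600},
    ]
}
print(ops_to_kill(CURRENT_OP))  # [1]
```

Keeping selection separate from the kill itself makes it easy to dry-run the script against a healthy cluster before you need it.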
... everything getting slower ...
•   Is your RAID array degraded?

•   Do you need to compact your collections or databases?

•   Are you having EBS problems? Check bgflush

•   Are you reaching your PIOPS limit?

•   Are you snapshotting while serving traffic?


... terrible latency spikes ...
mongodb.log is your friend.
... AWS or EBS outage ...
•   Full outages are often less painful than degradation

•   Take down the degraded nodes

•   Stop mongodb to close all connections

•   Hopefully you have balanced across AZs and are
    coasting

•   If you are down and can’t elect a primary, bring up
    a new node with the same hostname and port as a
    downed node
that’s all folks :)




    Charity Majors
     @mipsytipsy

Webinar: Best Practices for MongoDB on AWS
