Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track
Angshuman Bagchi (angshuman@mongodb.com)
Technical Services Engineer
Agenda
• What is MMS Monitoring?
• What are Alerts?
• How to pick an Alert?
• Five recommended Alerts
• Wrap up
What is MMS Monitoring?
Who uses MMS?
What are MMS alerts?
Source:
http://www.cleanfunnypics.com/no-its-not-empty/#axzz2pqknJJbC
How to pick an Alert?
• Is there an absolute limit to alert on?
• What is normal (baseline)?
• What is worrying (warning)?
• What is a definite problem (critical)?
• Likelihood of false positives?

... there is no magic formula
Five recommended alerts
• Host Recovering (All, but by definition
Secondary)
• Replication Lag (Secondary)
• Connections (All mongos, mongod)
• Lock % (Primary, Secondary)
• Replica (Primary, Secondary)
Host Recovering
• General alert triggered if any instance enters RECOVERING mode
• Required for all use-cases
• All Replica Sets should have this
• During maintenance this may sometimes be expected
Host Recovering
Replication Lag
• No secondary should be behind
• Secondary reads affected
• All Replica Sets should have this
• Only exception is configured slaveDelay
Replication Lag
Absolute Limit? Yes, about 1 or 2s. To prevent false positives, alert on an absolute threshold > 240s.
Normal: Lag is ideally 0s
Worrying: < 60s, some false positives
Critical: > 240s
False positives: Above 240s the likelihood is low.
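
The thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not how MMS evaluates alerts: the function names are hypothetical, the optimes are made-up timestamps, and treating "worrying" as the band between 60s and 240s is one possible reading of the slide.

```python
from datetime import datetime, timedelta

WARNING_LAG_S = 60    # assumed start of the "worrying" band (interpretation of the slide)
CRITICAL_LAG_S = 240  # "critical" threshold from the slide


def replication_lag_seconds(primary_optime, secondary_optime):
    """Lag is how far the secondary's last applied op trails the primary's."""
    return max(0.0, (primary_optime - secondary_optime).total_seconds())


def classify_lag(lag_s):
    """Map a lag in seconds to an alert level using the thresholds above."""
    if lag_s > CRITICAL_LAG_S:
        return "critical"
    if lag_s > WARNING_LAG_S:
        return "worrying"
    return "normal"


# Hypothetical optimes: the secondary is 150 seconds behind the primary.
primary = datetime(2014, 1, 9, 11, 17, 0)
secondary = primary - timedelta(seconds=150)
lag = replication_lag_seconds(primary, secondary)
print(lag, classify_lag(lag))
```

A secondary configured with slaveDelay would need its configured delay subtracted before classification, since that lag is intentional.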
Example: replication lag
150,000s of lag ~ almost 2 days of lag!
Example: replication lag
• Secondaries under-specified relative to primaries
• Different access patterns on primary and secondaries
• Insufficient bandwidth
• Foreground index builds on secondaries
“…when you have eliminated the impossible, whatever remains,
however improbable, must be the truth…” -- Sherlock Holmes
Sir Arthur Conan Doyle, The Sign of the Four
Example: replication lag
Example:
• ~1500 ops per minute (opcounters)
• 0.1 MB per object (average object size, local db)
~1500 ops/min / 60 s/min * 0.1 MB/op * 8 bits/byte =~ 20 Mbps required bandwidth
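
The same arithmetic as a small Python helper, in case you want to plug in your own opcounters and object sizes. The function name is hypothetical; the inputs are the figures from the slide.

```python
def required_bandwidth_mbps(ops_per_minute, avg_object_mb):
    """Estimate replication bandwidth in megabits per second."""
    ops_per_second = ops_per_minute / 60
    megabytes_per_second = ops_per_second * avg_object_mb
    return megabytes_per_second * 8  # 8 bits per byte


# Slide figures: ~1500 ops/min at 0.1 MB per object.
print(round(required_bandwidth_mbps(1500, 0.1), 1))
```

If the link between primary and secondary offers less than this estimate, the secondary cannot keep up and lag will grow.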
Connections
• Each connection consumes ~ 1MB and
a file descriptor
• 5000 connections => 5GB of RAM
• Stability and predictability are key
Pro-Tip: know thyself
You have to recognize normal to know when it isn’t.

Source: http://www.flickr.com/photos/skippy/6853920/
Connections
Absolute Limit? Yes, but this is too high. We need to alert before that
Normal: TBD based on deployment, number of nodes, connection pool settings, app servers, load etc. Say, X during peak load
Worrying: 50% increase, so, 1.5X
Critical: Double, so 2X
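
As a rough sketch, the connection thresholds can be derived from a measured peak baseline X, together with the memory estimate of about 1 MB per connection mentioned earlier. The names below are hypothetical and the baseline of 2000 connections is only an example value.

```python
from dataclasses import dataclass

MB_PER_CONNECTION = 1  # approximate per-connection cost from the slide


@dataclass
class ConnectionThresholds:
    warning: int   # 1.5X
    critical: int  # 2X


def connection_thresholds(peak_baseline):
    """Derive alert thresholds from the normal peak connection count X."""
    return ConnectionThresholds(
        warning=int(peak_baseline * 1.5),
        critical=peak_baseline * 2,
    )


def estimated_ram_gb(connections):
    """Rough RAM consumed by connections, using 1 GB = 1000 MB as the slide does."""
    return connections * MB_PER_CONNECTION / 1000


# Example baseline (an assumption): 2000 connections at peak load.
limits = connection_thresholds(2000)
print(limits, estimated_ram_gb(limits.critical))
```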
Lock %
• Lock contention degrades performance
• High lock % starves replication, reads.
• Bounds need to be determined
Lock %
Absolute Limit? Yes, >80% occasional degraded performance, 90% major impact regularly
Normal: TBD. Write heavy loads see higher values. Normal, say X% during peak load
Worrying: Double, so approximately 2X%
Critical: TBD. For Prod > 80%
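
One way to turn these bounds into a check, sketched in Python. The names are hypothetical. Capping the warning level at the critical level is an assumption added here, so that a high baseline (X above 40%) does not produce a warning threshold beyond 80%.

```python
PROD_CRITICAL_LOCK_PCT = 80.0  # production critical level from the slide


def lock_thresholds(peak_baseline_pct, critical_pct=PROD_CRITICAL_LOCK_PCT):
    """Return (warning, critical) lock % thresholds for a baseline of X%."""
    # Warning is roughly 2X%, capped at the critical level (assumption).
    warning = min(2 * peak_baseline_pct, critical_pct)
    return warning, critical_pct


def classify_lock(lock_pct, peak_baseline_pct):
    """Map an observed lock % to an alert level."""
    warning, critical = lock_thresholds(peak_baseline_pct)
    if lock_pct > critical:
        return "critical"
    if lock_pct > warning:
        return "worrying"
    return "normal"


# Example baseline (an assumption): 15% lock at peak load.
print(lock_thresholds(15), classify_lock(35, 15))
```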
Replica
• Represents oplog window
• Depends on
– Rate of operations inserted into oplog
– Size of operations
– Size of oplog capped collection

• Aim for 3 x the normal maintenance window
• Resizing the oplog is non-trivial
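
The dependencies listed above suggest a back-of-the-envelope estimate of the oplog window. The sketch below assumes a steady write rate and a uniform operation size, which real workloads rarely have, so treat the result as a rough guide. The function names and example numbers are hypothetical.

```python
def oplog_window_hours(oplog_size_mb, ops_per_second, avg_op_size_mb):
    """Hours of operations the oplog can hold at a steady write rate."""
    mb_per_hour = ops_per_second * avg_op_size_mb * 3600
    return oplog_size_mb / mb_per_hour


def covers_maintenance(window_hours, maintenance_hours, factor=3):
    """Check the window against 3 x the normal maintenance window."""
    return window_hours >= factor * maintenance_hours


# Example values (assumptions): 50 GB oplog, 25 ops/s, 0.01 MB per op.
window = oplog_window_hours(51200, 25, 0.01)
print(round(window, 1), covers_maintenance(window, maintenance_hours=8))
```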
Replica
Absolute Limit? 50% below Normal
Normal: TBD. Say X hours during peak
Worrying: 25% below Normal
Critical: 50% below Normal
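
Unlike the other metrics, the oplog window becomes a problem when it shrinks, so the comparison runs the other way. A minimal sketch with a hypothetical function name and an example normal window of 48 hours:

```python
def classify_oplog_window(window_hours, normal_hours):
    """Alert when the oplog window falls well below its normal peak value X."""
    if window_hours <= 0.5 * normal_hours:   # 50% below normal
        return "critical"
    if window_hours <= 0.75 * normal_hours:  # 25% below normal
        return "worrying"
    return "normal"


# Example (an assumption): the normal window during peak is 48 hours.
for observed in (48, 36, 24):
    print(observed, classify_oplog_window(observed, normal_hours=48))
```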
Summary
• Use similar approach for other metrics
• Different audiences for alerts
– Worrying alerts go to the ops team
– Critical alerts go out to a wider audience

• Get started with MMS Monitoring and
alerts!
I got alerted … now what?
mms.mongodb.com

angshuman@mongodb.com

Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track


Editor's Notes

  1. A member of a replica set enters the RECOVERING state when it is not ready to accept reads. The RECOVERING state can occur during normal operation, and doesn’t necessarily reflect an error condition. Members in the RECOVERING state are eligible to vote in elections, but are not eligible to enter the PRIMARY state.
  2. A member of a replica set enters the RECOVERING state when it is not ready to accept reads. The RECOVERING state can occur during normal operation, and doesn’t necessarily reflect an error condition. Members in the RECOVERING state are eligible to vote in elections, but are not eligible to enter the PRIMARY state. ----- Meeting Notes (1/9/14 11:17) ----- Initial sync, rollback, stale
  3. Get the discount code from Meghan or Rhea