SlideShare a Scribd company logo
1 of 20
Download to read offline
Operational Best
   Practices
Tales from the field
The Plan
● Review support cases
  ○ Taken from real issues
  ○ Names/ips/dates changed to protect identities
● Analyze reported issues
● Distill best practices
● Summarize takeaways

● Repeat...
Scenario 1
● Fire, it is on fire!

● Users notice response time takes 1-3 sec
● App logs show timeouts
● Server log show socket exceptions
Scenario 1 - Diagnostics
● Logs

● Understanding the timeouts
   ○ Client read timeout set
   ○ Connection closed/discarded
   ○ Symptom not cause


● Server connection exceptions
  ○ Match timing of client timeouts
   ○ Symptom not cause
Scenario 1 - Monitoring
Graphs speak a thousand words
Scenario 1 - Takeaways
● Monitor Logs
  ○ Alert, escalate
  ○ Correlate
● Disk
   ○ Monitor
   ○ Moved to RAID (10)
● Instrument/Monitor App
● Know your application and application (write)
  characteristics
Scenario 2
● Alerts warn that server is running hot
● Random (small) slowdowns
● Increased traffic/queries
Scenario 2 - Symptoms
High use cpu




Similar query
pattern
Scenario 2 - Diagnostics
● Turn on DB Profiling
● Look at logs


Identify query patterns taking longest or with
highest frequency and run explain
Scenario 2 - Explain
db.scenario2.find({...}).sort({...}).explain() {
     "cursor" : "BtreeCursor ABC",
     "nscanned" : 160677,
     "nscannedObjects" : 12015,
     "n" : 55,
     "millis" : 99,
     "scanAndOrder" : true,
     "indexBounds" : {...} }
Scenario 2 - Diagnostics
● Create a compound index
  ○ Used for criteria and sort
  ○ Reduced CPU dramatically
Scenario 2 - Takeaways
●   Performance test/analyze system behavior
●   Load test before deployment
●   Alert on abnormal states
●   High CPU is a sign of poorly indexed
●   Rolling upgrade for indexes
Scenario 3
● General slowdown on login
● High disk utilization
Scenario 3 - Diagnostics
iostat
Device:   rrqm/s wrqm/s r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz await   svctm %util
sdp       0.00 0.00 0.50      0.00 27.86 0.00 56.00 149.58        20320.00 2010.00 100.00
Scenario 3
$ blockdev --report
RO RA SSZ BSZ StartSec       Size    Device
rw 8096 512 4048    0 1099494850560 /dev/sdp


Huge read-ahead of 4MB
Scenario 3 - Takeaways
●   Pay attention to disk configurations
●   Load testing would have found this early
●   MongoDB depends on the OS a lot
●   Connect the dots from disportionate effects
Best Practices Learned
● System provisioning
  ○   Capacity
  ○   Performance
  ○   Scale
  ○   Configuration
● Logs
  ○ Review
  ○ Alert
  ○ Rotate and collect (per cluster)
Best Practices Learned
● Query/Index Analysis
  ○ Database Profiler
  ○ Run explain periodically (sampled)
  ○ Instrument code, generate metrics
● Plan/test rollouts
  ○ Rolling upgrade for Replica Set
  ○ Generate indexes on secondaries first
  ○ Name services, use redirection
Thanks, more refs
Please take a look at http://mongodb.org (docs)

● Ask on mongodb-user group
● Use MMS or historic monitoring
   ○ Watch for trends
   ○ Create alerts
   ○ Forecast capacity for provisioning
● logrotate unix command
● monitor disk - munin or the like
● iostat, dstat, vmstat, free, netstat
Questions

More Related Content

Viewers also liked

Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDBScott Hernandez
 
Petty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsPetty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsDavid Olson
 
Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011lennartkoopmann
 
"Grand Challenges" of Log Management
"Grand Challenges" of Log Management"Grand Challenges" of Log Management
"Grand Challenges" of Log ManagementAnton Chuvakin
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsAmazon Web Services
 

Viewers also liked (8)

Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDB
 
Petty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsPetty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and Transactions
 
Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011
 
"Grand Challenges" of Log Management
"Grand Challenges" of Log Management"Grand Challenges" of Log Management
"Grand Challenges" of Log Management
 
Log Files
Log FilesLog Files
Log Files
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
 
Logs management
Logs managementLogs management
Logs management
 

Similar to MongoDB Operational Best Practices (mongosf2012)

How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...Red Hat Developers
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningPetar Djekic
 
Security Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetSecurity Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetJuan Berner
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafkaconfluent
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...NoNameCon
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Redis Labs
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseBig Data Spain
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Umair Shahid
 

Similar to MongoDB Operational Best Practices (mongosf2012) (20)

How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
Performance Whackamole (short version)
Performance Whackamole (short version)Performance Whackamole (short version)
Performance Whackamole (short version)
 
Security Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetSecurity Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budget
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 

More from Scott Hernandez

MongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherMongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherScott Hernandez
 
Advanced Replication Internals
Advanced Replication InternalsAdvanced Replication Internals
Advanced Replication InternalsScott Hernandez
 
Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Scott Hernandez
 
MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)Scott Hernandez
 
Mongo sf easy java persistence
Mongo sf   easy java persistenceMongo sf   easy java persistence
Mongo sf easy java persistenceScott Hernandez
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaScott Hernandez
 
MongoDB: Mastering the shell
MongoDB: Mastering the shellMongoDB: Mastering the shell
MongoDB: Mastering the shellScott Hernandez
 
MongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRMongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRScott Hernandez
 
What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?Scott Hernandez
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupScott Hernandez
 
MongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksMongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksScott Hernandez
 
Mastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellMastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellScott Hernandez
 

More from Scott Hernandez (13)

MongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherMongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all together
 
Advanced Replication Internals
Advanced Replication InternalsAdvanced Replication Internals
Advanced Replication Internals
 
Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)
 
MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)
 
Mongo sf easy java persistence
Mongo sf   easy java persistenceMongo sf   easy java persistence
Mongo sf easy java persistence
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
 
MongoDB: Mastering the shell
MongoDB: Mastering the shellMongoDB: Mastering the shell
MongoDB: Mastering the shell
 
MongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRMongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DR
 
A Brief MongoDB Intro
A Brief MongoDB IntroA Brief MongoDB Intro
A Brief MongoDB Intro
 
What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF Meetup
 
MongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksMongoDB: tips, trick and hacks
MongoDB: tips, trick and hacks
 
Mastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellMastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript Shell
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

MongoDB Operational Best Practices (mongosf2012)

  • 1. Operational Best Practices Tales from the field
  • 2. The Plan ● Review support cases ○ Taken from real issues ○ Names/ips/dates changed to protect identities ● Analyze reported issues ● Distill best practices ● Summarize takeaways ● Repeat...
  • 3. Scenario 1 ● Fire, it is on fire! ● Users notice response time takes 1-3 sec ● App logs show timeouts ● Server log show socket exceptions
  • 4. Scenario 1 - Diagnostics ● Logs ● Understanding the timeouts ○ Client read timeout set ○ Connection closed/discarded ○ Symptom not cause ● Server connection exceptions ○ Match timing of client timeouts ○ Symptom not cause
  • 5. Scenario 1 - Monitoring Graphs speak a thousand words
  • 6. Scenario 1 - Takeaways ● Monitor Logs ○ Alert, escalate ○ Correlate ● Disk ○ Monitor ○ Moved to RAID (10) ● Instrument/Monitor App ● Know your application and application (write) characteristics
  • 7. Scenario 2 ● Alerts warn that server is running hot ● Random (small) slowdowns ● Increased traffic/queries
  • 8. Scenario 2 - Symptoms High use cpu Similar query pattern
  • 9. Scenario 2 - Diagnostics ● Turn on DB Profiling ● Look at logs Identify query patterns taking longest or with highest frequency and run explain
  • 10. Scenario 2 - Explain db.scenario2.find({...}).sort({...}).explain() { "cursor" : "BtreeCursor ABC", "nscanned" : 160677, "nscannedObjects" : 12015, "n" : 55, "millis" : 99, "scanAndOrder" : true, "indexBounds" : {...} }
  • 11. Scenario 2 - Diagnostics ● Create a compound index ○ Used for criteria and sort ○ Reduced CPU dramatically
  • 12. Scenario 2 - Takeaways ● Performance test/analyze system behavior ● Load test before deployment ● Alert on abnormal states ● High CPU is a sign of poorly indexed ● Rolling upgrade for indexes
  • 13. Scenario 3 ● General slowdown on login ● High disk utilization
  • 14. Scenario 3 - Diagnostics iostat Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00
  • 15. Scenario 3 $ blockdev --report RO RA SSZ BSZ StartSec Size Device rw 8096 512 4048 0 1099494850560 /dev/sdp Huge read-ahead of 4MB
  • 16. Scenario 3 - Takeaways ● Pay attention to disk configurations ● Load testing would have found this early ● MongoDB depends on the OS a lot ● Connect the dots from disportionate effects
  • 17. Best Practices Learned ● System provisioning ○ Capacity ○ Performance ○ Scale ○ Configuration ● Logs ○ Review ○ Alert ○ Rotate and collect (per cluster)
  • 18. Best Practices Learned ● Query/Index Analysis ○ Database Profiler ○ Run explain periodically (sampled) ○ Instrument code, generate metrics ● Plan/test rollouts ○ Rolling upgrade for Replica Set ○ Generate indexes on secondaries first ○ Name services, use redirection
  • 19. Thanks, more refs Please take a look at http://mongodb.org (docs) ● Ask on mongodb-user group ● Use MMS or historic monitoring ○ Watch for trends ○ Create alerts ○ Forecast capacity for provisioning ● logrotate unix command ● monitor disk - munin or the like ● iostat, dstat, vmstat, free, netstat