MongoDB Monitoring - Become a MongoDB DBA

This presentation was given by Art van Scheppingen at Percona Live 2017 in Santa Clara, CA, and covers what you need to know to monitor MongoDB effectively.

  1. MongoDB Monitoring
     Become a MongoDB DBA - Monitoring Essentials
     Art van Scheppingen, Senior Support Engineer, Severalnines
     Copyright 2017 Severalnines AB
  2. Agenda
     ● Monitoring and trending
     ● Why do we collect data?
     ● What metrics to collect from MongoDB?
     ● Key MongoDB metrics in depth
     ● Available MongoDB monitoring tools
     ● How to monitor MongoDB using ClusterControl
  3. Monitoring and trending
  4. Do you need monitoring and trending?
  5. There is only one person who can land a plane without instruments
  6. Monitoring vs Trending
     ● Monitoring system (e.g. Nagios)
       ○ Checks if services are healthy
       ○ Sends pages
     ● Trending system (e.g. Cacti, Graphite, Prometheus)
       ○ Collects metrics
       ○ Generates graphs
  7. Monitoring: Availability
     ● Do more than just opening a connection
       ○ Measure true status of nodes and cluster
       ○ Test read/write
       ○ Open essential databases and collections
       ○ Keep an eye on the replication lag
         ■ Increase oplog size?
       ○ Check the full topology
  8. Trending: why do we need trends?
     ● Trending
       ○ Plot trends of key (performance) metrics
       ○ Create timelines of metrics
       ○ Correlate various metrics
       ○ Find problems before they arise
       ○ Pre-emptive problem management
     ● Trending tools
       ○ Granularity of sampling
       ○ More datapoints = better
  9. Why do we collect data?
  10. Why do we collect data?
      ● Periodical (daily/weekly) healthchecks
      ● Insight into all aspects of the database operations
      ● Post mortem and proactive monitoring
      ● Capacity planning
  11. Healthchecks
      ● Healthchecks are a pain
      ● You want to see aggregated data
      ● You want to be able to drill down to a particular host
      ● You want to see the most important data first and dig in later on
  12. Post mortem and proactive monitoring
      ● Ability to dig into past data
      ● Sampling intervals even below 5s (hardware-dependent)
      ● Fine granularity lets you catch an issue as it evolves, with no need to wait 5 minutes for a graph to refresh
  13. Insight into internals, capacity planning
      ● Graphs based on MongoDB status metrics
      ● Overall status and per-node graphs
      ● Ability to get time-shifted graphs, useful for comparing workload changes over time
  14. What metrics to collect from MongoDB?
  15. Type of metrics to collect
      ● Quite similar to other database systems
        ○ Host metrics
        ○ Operational metrics
        ○ Storage engine metrics
        ○ Replication metrics
        ○ Shard metrics
  16. Host metrics: what for?
      ● Similar to most other databases
      ● Understand the utilization of the machine
      ● Capacity planning
      ● Determine the type of an issue
        ○ I/O related?
        ○ CPU related?
        ○ Network related?
  17. Host metrics: what to look for?
      ● CPU utilization (should I add more nodes to the cluster?)
      ● Network utilization (am I running out of bandwidth?)
      ● Ping (how badly does latency affect my MongoDB cluster?)
      ● Disk throughput and IOPS (am I within my hardware limits?)
      ● Disk space (do I have to plan for larger disks?)
      ● Memory utilization (do I suffer from a memory leak?)
  18. Operational metrics
      ● Similar to most other databases
      ● Throughput of the cluster
      ● Relate throughput to cluster performance
      ● Determine the type of an issue
        ○ Request spikes?
        ○ Write amplification related?
        ○ Queueing?
  19. Storage engine metrics
      ● Storage engine specific
        ○ MMAP
        ○ WiredTiger
        ○ MongoRocks
      ● Insight into how the engine performs
      ● Internal congestion
  20. Replication metrics
      ● Throughput of the replication
      ● Durability of the oplog
      ● Replication lag
      ● Cluster replication acknowledgement
        ○ Quorum based
        ○ At least one secondary needs to acknowledge
  21. Eventual consistency
  22. Sharding related metrics
      ● Shard chunks and balancing
        ○ Chunks per shard
        ○ Disk usage
      ● Non-sharded collections
        ○ Sharding has to be enabled on collection level
        ○ Non-sharded collections get a primary shard assigned
        ○ Once the primary shard is full, no writes can happen
      ● Connection pool (mongos)
        ○ All queries will be sent to the primary in a shard
        ○ Range queries will block connections of the connection pool
  23. Key MongoDB metrics to know about
  24. Oplog
      ● Oplog: a special collection containing all transactions
        ○ Limited in size (configurable)
        ○ Eviction of transactions (FIFO)
        ○ Comparable to a ringbuffer
      ● Used for replication
        ○ Secondaries copy transactions from the oplog on other nodes
        ○ Full data sync necessary once the last executed transaction has been evicted
      ● Replication window
        ○ Time between first and last transaction in the oplog
        ○ Time that allows your secondary to be offline before performing a full sync
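
The ring-buffer behaviour of the oplog can be illustrated with a small model. This is a sketch only: the class `OplogModel`, its method names and the entry-count capacity are hypothetical, and the real oplog is capped by size in bytes rather than by number of entries.

```javascript
// Simplified model of the oplog as a FIFO ring buffer.
// Assumption: capacity is an entry count; MongoDB caps the oplog by size.
class OplogModel {
  constructor(capacity) {
    this.capacity = capacity; // assumed maximum number of entries
    this.entries = [];        // operation timestamps, oldest first
  }

  // Append a new operation and evict the oldest one when the buffer is full.
  append(ts) {
    this.entries.push(ts);
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  // A secondary can catch up incrementally only while the last operation
  // it applied is still present in the oplog; otherwise it needs a full sync.
  canCatchUp(lastAppliedTs) {
    return this.entries.length > 0 && lastAppliedTs >= this.entries[0];
  }
}

const oplog = new OplogModel(3);
[1, 2, 3, 4, 5].forEach(ts => oplog.append(ts));

console.log(oplog.entries);       // entries 1 and 2 have been evicted
console.log(oplog.canCatchUp(2)); // this secondary fell out of the window
console.log(oplog.canCatchUp(4)); // this secondary can still catch up
```

A secondary that last applied operation 2 has to perform a full sync in this model, because that entry was evicted while it was offline.
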
  25. Oplog: replication window
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.getReplicationInfo()
          {
              "logSizeMB" : 1895.7751951217651,
              "usedMB" : 419.86,
              "timeDiff" : 281419,
              "timeDiffHours" : 78.17,
              "tFirst" : "Fri Jul 08 2016 10:56:01 GMT+0000 (UTC)",
              "tLast" : "Mon Jul 11 2016 17:06:20 GMT+0000 (UTC)",
              "now" : "Mon Jul 11 2016 17:15:06 GMT+0000 (UTC)"
          }
  26. Oplog: replication window
      From the ClusterControl advisor:
          function getReplicationWindow(host) {
              var replwindow = {};
              replwindow['newset'] = false;
              // Fetch the first and last record from the oplog and take its timestamp
              var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
              replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
              if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
                  replwindow['newset'] = true;
              }
              res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
              replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
              replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
              return replwindow;
          }
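
Once the replication window is known, it can be turned into a simple health check. The sketch below runs on a plain object shaped like the `db.getReplicationInfo()` output on the slide; the helper name `checkReplicationWindow` and the 24-hour minimum are assumptions chosen for illustration, not a recommended value.

```javascript
// Sample values copied from the db.getReplicationInfo() output on the slide.
const replicationInfo = {
  logSizeMB: 1895.7751951217651,
  usedMB: 419.86,
  timeDiff: 281419,     // seconds between first and last oplog entry
  timeDiffHours: 78.17
};

// Hypothetical helper: summarize the replication window and flag it when it
// is shorter than a chosen minimum (24 hours is an assumed default).
function checkReplicationWindow(info, minHours = 24) {
  const windowHours = info.timeDiff / 3600;
  return {
    windowHours: windowHours,
    oplogUsedPct: (info.usedMB / info.logSizeMB) * 100,
    tooShort: windowHours < minHours
  };
}

const windowSummary = checkReplicationWindow(replicationInfo);
console.log(windowSummary.windowHours.toFixed(2) + " hours, too short: " + windowSummary.tooShort);
```

A window that shrinks over time while the oplog size stays the same usually means the write volume is growing, which ties back to the "increase oplog size?" question from the availability slide.
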
  27. Replication lag
      ● CPU, IO or lock related
      ● Outcome:
        ○ Secondary not used by Mongo client drivers
        ○ Puts larger strain on other secondaries
        ○ Less likely to be elected during a failover
          ■ If it is elected, it could be disastrous
        ○ Lagging too far behind could cause a full sync
  28. Replication lag
          my_mongodb_0:PRIMARY> db.runCommand( { replSetGetStatus: 1 } )
          {
              …
              "members" : [
                  {
                      "_id" : 0,
                      "name" : "10.10.32.11:27017",
                      "stateStr" : "PRIMARY",
                      "optime" : { "ts" : Timestamp(1466247801, 5), "t" : NumberLong(1) },
                  },
                  {
                      "_id" : 1,
                      "name" : "10.10.32.12:27017",
                      "stateStr" : "SECONDARY",
                      "optime" : { "ts" : Timestamp(1466247801, 5), "t" : NumberLong(1) },
                  },
                  …
              ],
              "ok" : 1
          }
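
Replication lag per secondary is the difference between the primary's optime and the secondary's optime. The sketch below works on a plain object shaped like the `replSetGetStatus` output; it uses the `optimeDate` field that the command reports next to `optime`. The helper name `replicationLagSeconds`, the third member and its timestamp are hypothetical.

```javascript
// Sample shaped like replSetGetStatus output. The first two members mirror
// the slide; the third member is a made-up lagging secondary.
const replStatus = {
  members: [
    { name: "10.10.32.11:27017", stateStr: "PRIMARY",   optimeDate: new Date("2016-06-18T11:03:21Z") },
    { name: "10.10.32.12:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-06-18T11:03:21Z") },
    { name: "10.10.32.13:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-06-18T11:02:51Z") }
  ]
};

// Hypothetical helper: return the lag in seconds for every secondary,
// keyed by member name, or null when no primary is visible.
function replicationLagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) {
    return null; // without a primary there is no reference optime
  }
  const lag = {};
  status.members
    .filter(m => m.stateStr === "SECONDARY")
    .forEach(m => {
      // Subtracting two Date objects yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    });
  return lag;
}

const lagBySecondary = replicationLagSeconds(replStatus);
console.log(lagBySecondary);
```

Comparing this lag with the replication window shows how close a secondary is to needing a full sync.
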
  29. Connections
      ● Like any other database: availability
      ● Client drivers may support connection pooling
        ○ Multiple non-blocking queries can use the same connection
        ○ Spawns new connections when the pool runs low
      ● Increase of connections
        ○ Locking issues
        ○ Application request bursts
  30. Connections
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.serverStatus().connections
          {
              "current" : 25,
              "available" : 794,
              "totalCreated" : NumberLong(122418)
          }
      From any mongo client:
          mongo_replica_0:PRIMARY> db.runCommand( { serverStatus: 1 } ).connections
          {
              "current" : 25,
              "available" : 794,
              "totalCreated" : NumberLong(122418)
          }
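
The `current` and `available` counters can be combined into a utilization percentage. This is a minimal sketch on a plain object with the values from the slide; the helper name `connectionUsage` and the 80% warning threshold are assumptions, and the sum of `current` and `available` is treated here as the effective connection limit.

```javascript
// Values copied from the db.serverStatus().connections output on the slide.
const connections = { current: 25, available: 794, totalCreated: 122418 };

// Hypothetical helper: express connection usage as a percentage of the
// effective limit (assumed to be current + available).
function connectionUsage(conn, warnPct = 80) {
  const limit = conn.current + conn.available;
  const usedPct = (conn.current / limit) * 100;
  return { limit: limit, usedPct: usedPct, warn: usedPct >= warnPct };
}

const connUsage = connectionUsage(connections);
console.log(connUsage.usedPct.toFixed(1) + "% of " + connUsage.limit + " connections in use");
```

Trending `usedPct` over time makes the bursts and locking-related pile-ups mentioned on the previous slide visible before the limit is reached.
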
  31. Transactions
      ● Atomicity on document level
        ○ WiredTiger and MongoRocks
      ● No “real” transactions
      ● Write data with the $isolated operator
        ○ Similar to READ UNCOMMITTED in MySQL (dirty reads in ANSI SQL)
        ○ No rollback
        ○ Does not work on shards
  32. Transactions
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.serverStatus().opcounters
          {
              "insert" : 1355272,
              "query" : 20712,
              "update" : 8995,
              "delete" : 0,
              "getmore" : 400791,
              "command" : 2405749
          }
      From any mongo client:
          mongo_replica_0:PRIMARY> db.runCommand({serverStatus: 1}).opcounters
          {
              "insert" : 1355272,
              "query" : 20712,
              "update" : 8995,
              "delete" : 0,
              "getmore" : 400791,
              "command" : 2405749
          }
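
The opcounters are cumulative since the server started, so a trending system has to sample them twice and divide the difference by the interval. The sketch below shows that calculation on plain objects; the first sample is from the slide, while the second sample, the 60-second interval and the helper name `opRates` are hypothetical.

```javascript
// First sample: values from the slide. Second sample: made-up values
// assumed to be collected 60 seconds later.
const sampleA = { insert: 1355272, query: 20712, update: 8995, delete: 0, getmore: 400791, command: 2405749 };
const sampleB = { insert: 1355872, query: 20742, update: 8995, delete: 0, getmore: 400851, command: 2406049 };

// Hypothetical helper: convert two cumulative samples into operations per second.
function opRates(prev, curr, intervalSeconds) {
  const rates = {};
  Object.keys(curr).forEach(op => {
    const delta = curr[op] - prev[op];
    // A negative delta means the counters were reset, for example by a restart.
    rates[op] = delta >= 0 ? delta / intervalSeconds : null;
  });
  return rates;
}

const rates = opRates(sampleA, sampleB, 60);
console.log(rates);
```
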
  33. Locks
      ● Three levels of (generic) locking
        ○ Global
        ○ Database
        ○ Collection
      ● Global lock hardly ever happens (full lock on MongoDB)
      ● Database locks occur when dropping a collection
      ● Collection locks occur mostly in MMAP
  34. Locks (generic)
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.serverStatus().locks
          {
              "Global" : {
                  "acquireCount" : {
                      "r" : NumberLong(6050583),
                      "w" : NumberLong(2416551),
                      "R" : NumberLong(1),
                      "W" : NumberLong(7)
                  },
                  "acquireWaitCount" : {
                      "r" : NumberLong(1),
                      "w" : NumberLong(1),
                      "W" : NumberLong(1)
                  },
                  …
          }
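
A useful number to derive from the lock counters is how often an acquisition had to wait. The sketch below computes that ratio per lock mode from a plain object with the values on the slide; the helper name `lockWaitRatios` is hypothetical.

```javascript
// Values copied from the "Global" section of db.serverStatus().locks on the slide.
// Lock modes: r = intent shared, w = intent exclusive, R = shared, W = exclusive.
const globalLock = {
  acquireCount: { r: 6050583, w: 2416551, R: 1, W: 7 },
  acquireWaitCount: { r: 1, w: 1, W: 1 }
};

// Hypothetical helper: fraction of lock acquisitions that had to wait, per mode.
function lockWaitRatios(lock) {
  const waitCounts = lock.acquireWaitCount || {};
  const ratios = {};
  Object.keys(lock.acquireCount).forEach(mode => {
    const waits = waitCounts[mode] || 0; // modes without waits are omitted by the server
    ratios[mode] = waits / lock.acquireCount[mode];
  });
  return ratios;
}

const waitRatios = lockWaitRatios(globalLock);
console.log(waitRatios);
```

As with the opcounters, these values are cumulative, so a rising ratio between two samples is more telling than the absolute number.
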
  35. Locks (WiredTiger)
      ● Optimistic concurrency control
        ○ If two write operations conflict, the transaction will be paused and retried
      ● Document level locking
      ● Tickets (threads)
        ○ Read
        ○ Write
  36. Locks (WiredTiger)
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.concurrentTransactions
          {
              "write" : {
                  "out" : 0,
                  "available" : 128,
                  "totalTickets" : 128
              },
              "read" : {
                  "out" : 0,
                  "available" : 128,
                  "totalTickets" : 128
              }
          }
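
Ticket usage tells you how close WiredTiger is to queueing operations. A minimal sketch on plain objects follows; the idle sample is from the slide, the saturated sample and the helper name `ticketUsage` are hypothetical.

```javascript
// Idle sample: values from the slide.
const idleTickets = {
  write: { out: 0, available: 128, totalTickets: 128 },
  read:  { out: 0, available: 128, totalTickets: 128 }
};

// Made-up sample in which all write tickets are handed out.
const busyTickets = {
  write: { out: 128, available: 0, totalTickets: 128 },
  read:  { out: 32, available: 96, totalTickets: 128 }
};

// Hypothetical helper: percentage of tickets in use and an exhaustion flag.
function ticketUsage(tickets) {
  const usage = {};
  ["read", "write"].forEach(kind => {
    const t = tickets[kind];
    usage[kind] = {
      usedPct: (t.out / t.totalTickets) * 100,
      exhausted: t.available === 0
    };
  });
  return usage;
}

console.log(ticketUsage(idleTickets));
console.log(ticketUsage(busyTickets));
```
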
  37. Cache
      ● MongoDB uses three tiers of cache
        ○ Filesystem
        ○ Active memory
        ○ Storage engine (WiredTiger / MongoRocks)
      ● Page faults
        ○ Cache miss
      ● Evictions
  38. Page faults and cache usage
      From the MongoDB CLI:
          mongo_replica_0:PRIMARY> db.serverStatus().extra_info.page_faults
          37912924
          mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.cache
          {
              "bytes currently in the cache" : 887889617,
              "modified pages evicted" : 561514,
              "tracked dirty pages in the cache" : 626,
              "unmodified pages evicted" : 15823118
          }
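
The cache counters become easier to read as ratios. The sketch below derives a fill percentage and the share of evictions that involved modified pages. The `maximum bytes configured` field is part of the WiredTiger cache statistics but is not shown on the slide, so its 1 GiB value here is an assumption, as is the helper name `cacheSummary`.

```javascript
// Values from the slide, plus an assumed cache size of 1 GiB.
const wiredTigerCache = {
  "bytes currently in the cache": 887889617,
  "maximum bytes configured": 1073741824, // assumption, not shown on the slide
  "modified pages evicted": 561514,
  "unmodified pages evicted": 15823118
};

// Hypothetical helper: cache fill level and share of modified-page evictions.
function cacheSummary(cache) {
  const evicted = cache["modified pages evicted"] + cache["unmodified pages evicted"];
  return {
    fillPct: (cache["bytes currently in the cache"] / cache["maximum bytes configured"]) * 100,
    modifiedEvictionPct: evicted === 0 ? 0 : (cache["modified pages evicted"] / evicted) * 100
  };
}

const cacheStats = cacheSummary(wiredTigerCache);
console.log(cacheStats.fillPct.toFixed(1) + "% full, " +
  cacheStats.modifiedEvictionPct.toFixed(1) + "% of evictions were modified pages");
```
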
  39. Shard metrics
      ● Shards make write scaling transparent
      ● Sharding can be solved with two methods:
        ○ Hash key distribution (limited)
        ○ Shard lookup table
      ● MongoDB uses a combination of hash key distribution and shard lookup table
        ○ Hash key (or range key) distribution gets divided into chunks (ranges)
        ○ The chunk metadata gets stored in the config server
      ● The config server is the most important component in a MongoDB sharded cluster!
      ● The shard router is the second most important component
      ● Shards can get out of balance
        ○ Non-sharded collections
        ○ Heavy / large writes on a single chunk
        ○ Auto balancing by the primary of the config server (3.4 and later) or mongos (3.2 and earlier)
  40. Shard chunks and balancing
      From the MongoDB CLI:
          mongos> sh.status()
          --- Sharding Status ---
          …
          databases:
              { "_id" : "shardtest", "primary" : "sh1", "partitioned" : true }
                  shardtest.collection
                      shard key: { "_id" : 1 }
                      unique: false
                      balancing: true
                      chunks:
                          sh1 1
                          sh2 2
                          sh3 1
      From any mongo client:
          mongos> use config
          switched to db config
          mongos> db.config.runCommand({aggregate: "chunks", pipeline: [{$group: {"_id": {"ns": "$ns", "shard": "$shard"}, "total_chunks": {$sum: 1}}}]})
          { "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_1" }, "total_chunks" : 330 }
          { "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_0" }, "total_chunks" : 328 }
          { "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_2" }, "total_chunks" : 335 }
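
The per-shard chunk counts from the aggregation can be reduced to one number per collection: the gap between the fullest and the emptiest shard. This is a sketch on plain objects using the rows from the slide; the helper name `chunkSpread` is hypothetical, and what counts as too large a spread is left to the reader.

```javascript
// Rows copied from the aggregation output on the slide.
const chunkCounts = [
  { _id: { ns: "test.usertable", shard: "mongo_replica_1" }, total_chunks: 330 },
  { _id: { ns: "test.usertable", shard: "mongo_replica_0" }, total_chunks: 328 },
  { _id: { ns: "test.usertable", shard: "mongo_replica_2" }, total_chunks: 335 }
];

// Hypothetical helper: per namespace, find the smallest and largest chunk
// count across shards and the difference between them.
function chunkSpread(rows) {
  const perNs = {};
  rows.forEach(row => {
    const ns = row._id.ns;
    if (!perNs[ns]) {
      perNs[ns] = { min: Infinity, max: -Infinity, spread: 0 };
    }
    perNs[ns].min = Math.min(perNs[ns].min, row.total_chunks);
    perNs[ns].max = Math.max(perNs[ns].max, row.total_chunks);
    perNs[ns].spread = perNs[ns].max - perNs[ns].min;
  });
  return perNs;
}

const spreadByNs = chunkSpread(chunkCounts);
console.log(spreadByNs);
```

Note that a shard missing from the aggregation output holds zero chunks for that collection, which this sketch does not detect on its own.
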
  41. Non-sharded collections
      From the ClusterControl non-sharded collection advisor:
          // Collect the namespaces of all sharded collections from the config database
          db = db.getSiblingDB("config");
          var shard_collections = db.collections.find();
          var sharded_names = {};
          while (shard_collections.hasNext()) {
              var shard = shard_collections.next();
              sharded_names[shard._id] = 1;
          }
          // Print every collection that is not registered as sharded
          var admin_db = db.getSiblingDB("admin");
          var dbs = admin_db.runCommand({ "listDatabases": 1 }).databases;
          dbs.forEach(function(database) {
              if (database.name != "config") {
                  db = db.getSiblingDB(database.name);
                  var cols = db.getCollectionNames();
                  cols.forEach(function(col) {
                      if (col != "system.indexes") {
                          if (sharded_names[database.name + "." + col] != 1) {
                              print(database.name + "." + col);
                          }
                      }
                  });
              }
          });
  42. Connection pool
      From the MongoDB CLI:
          mongos> db.runCommand( { "connPoolStats" : 1 } )
          {
              "numClientConnections" : 10,
              "numAScopedConnections" : 0,
              "totalInUse" : 4,
              "totalAvailable" : 8,
              "totalCreated" : 23,
              "hosts" : {
                  "10.10.34.11:27019" : { "inUse" : 1, "available" : 1, "created" : 1 },
                  "10.10.34.12:27018" : { "inUse" : 3, "available" : 1, "created" : 2 }
              },
              ...
              "ok" : 1
          }
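
To spot the host whose pooled connections are being held, for example by long range queries, the per-host counters can be ranked by the share that is in use. The sketch below works on a plain object with the `hosts` values from the slide; the helper name `poolUsageByHost` is hypothetical.

```javascript
// Per-host values copied from the connPoolStats output on the slide.
const connPoolStats = {
  totalInUse: 4,
  totalAvailable: 8,
  hosts: {
    "10.10.34.11:27019": { inUse: 1, available: 1, created: 1 },
    "10.10.34.12:27018": { inUse: 3, available: 1, created: 2 }
  }
};

// Hypothetical helper: share of pooled connections in use per host,
// sorted so that the busiest host comes first.
function poolUsageByHost(stats) {
  return Object.keys(stats.hosts)
    .map(host => {
      const h = stats.hosts[host];
      const total = h.inUse + h.available;
      return {
        host: host,
        inUsePct: total === 0 ? 0 : (h.inUse / total) * 100
      };
    })
    .sort((a, b) => b.inUsePct - a.inUsePct);
}

const poolUsage = poolUsageByHost(connPoolStats);
console.log(poolUsage);
```
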
  43. Available MongoDB monitoring tools
  44. Alerting solutions
      ● Open Source
        ○ Nagios
        ○ Zabbix
      ● Subscription based
        ○ MongoDB Cloud Manager
        ○ VividCortex
        ○ ClusterControl
  45. Nagios
      ● Nagios-MongoDB
        ○ https://github.com/mzupan/nagios-plugin-mongodb/
        ○ Performs some very important checks
          ■ Replication lag
          ■ Lock time percentage
          ■ Index miss ratio
  46. Zabbix
      ● MongoDB Zabbix monitoring plugin
        ○ https://github.com/nightw/mikoomi-zabbix-mongodb-monitoring
        ○ All the necessary metrics and more
          ■ Entries in oplog
        ○ Pre-canned triggers
  47. Trending solutions
      ● Trending tools
        ○ Statsd/Grafana
        ○ Cacti
        ○ Zabbix
      ● Subscription based
        ○ MongoDB Cloud Manager
        ○ VividCortex
        ○ ClusterControl
  48. Cacti
      ● Percona MongoDB Monitoring Templates
        ○ https://www.percona.com/doc/percona-monitoring-plugins/1.1/cacti/mongodb-templates.html
  49. Orchestration systems: Percona Monitoring & Management
      ● PMM
        ○ https://www.percona.com/doc/percona-monitoring-and-management/
        ○ Open Source Monitoring & Management framework
        ○ Can deploy, manage and monitor MySQL & MongoDB
        ○ Uses Prometheus and Grafana
  50. Orchestration systems: Percona Monitoring & Management
      Percona Monitoring & Management sessions:
      ● MySQL Monitoring with Percona Monitoring and Management, Tue 11:30 - 12:20 in Ballroom E
      ● Hipster MySQL Monitoring: Serving a deconstructed PMM, Tue 11:30 - 12:20 in Ballroom H
      ● Monitoring production environment with Percona Monitoring and Management (PMM), Thu 3:00 - 3:50 in room 209
  51. How to monitor MongoDB using ClusterControl
  52. Orchestration systems: ClusterControl
      ● ClusterControl
        ○ http://www.severalnines.com
        ○ Deploy Mongo shards & replicasets
        ○ Monitor and trend
        ○ Manage configuration and backups
        ○ Scale
        ○ Community edition
  53. Easily deploy and import MongoDB replicaSets and Shards
  54. Monitor and trend
  55. Cluster management
  56. Scale replicaSets and Shards
  57. Convert replicaSet into a Sharded cluster
  58. Q & A
  59. Additional resources
      ● Blog series: Become a MongoDB DBA
        ○ http://severalnines.com/blog-categories/mongodb
      ● Webinar series: Become a MongoDB DBA
        ○ http://severalnines.com/upcoming-webinars
      ● Visit our website for more resources!
        ○ http://www.severalnines.com
      ● Stop by our booth in the exhibit hall
      ● Other sessions by Severalnines at Percona Live 2017
        ○ MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - a close up look, Wed 11:10am - 12:00pm in Ballroom D
        ○ MySQL (NDB) Cluster Best Practices (Die Hard VIII), Wed 3:30pm - 4:20pm in Room 210
  60. Thank You!
