2. My Background
• For the past year, I’ve looked at MongoDB logs at least once every day.
• We routinely answer the question “how can I improve performance?”
3. Who’s this talk for?
• New to MongoDB
• Seeing some slow operations, and need help debugging
• Running database operations on a sizeable deployment
• I have a MongoDB deployment, and I’ve hit a performance wall
4. What should you learn?
Know where to look on a running MongoDB to uncover slowness, and discuss solutions.
MongoDB has performance “patterns”.
How to think about improving performance.
And . . .
8. Solution
We are fixing this query
{ $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } }
With this index
db.my_collection.ensureIndex({publisher: 1, publishedAt: -1}, {background: true})
I would show you the logs, but now they are silent.
Example 1
9. The Pattern
Inefficient read queries from in-memory table scans cause high CPU load.
Caused by not matching indexes to queries.
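The difference between a scan and an index that matches the query can be sketched with a toy model. This is plain JavaScript, not MongoDB internals: arrays stand in for the collection and for the { publisher: 1, publishedAt: -1 } index, and the sample documents and the function names scanQuery, buildIndex and indexQuery are made up for illustration. It only counts how many documents each approach has to examine.

```javascript
// Toy model only: plain arrays stand in for a collection and an index.
// This is not how MongoDB stores data; it only counts documents examined.

// Hypothetical sample documents shaped like the query in Example 1.
const docs = [
  { _id: 1, publisher: "US Weekly", publishedAt: 3 },
  { _id: 2, publisher: "Other Mag", publishedAt: 9 },
  { _id: 3, publisher: "US Weekly", publishedAt: 7 },
  { _id: 4, publisher: "Other Mag", publishedAt: 1 },
  { _id: 5, publisher: "US Weekly", publishedAt: 5 },
];

// Without an index: examine every document, then sort the matches in memory.
function scanQuery(collection, publisher) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.publisher === publisher) matches.push(doc);
  }
  matches.sort((a, b) => b.publishedAt - a.publishedAt);
  return { examined, ids: matches.map((d) => d._id) };
}

// Stand-in for an index on { publisher: 1, publishedAt: -1 }:
// entries ordered by publisher ascending, then publishedAt descending.
function buildIndex(collection) {
  return [...collection].sort((a, b) => {
    if (a.publisher !== b.publisher) return a.publisher < b.publisher ? -1 : 1;
    return b.publishedAt - a.publishedAt;
  });
}

// With the index: binary-search to the first matching entry, then walk
// forward while the publisher still matches. No separate sort is needed
// because the entries are already in the requested order.
function indexQuery(index, publisher) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].publisher < publisher) lo = mid + 1;
    else hi = mid;
  }
  let examined = 0;
  const ids = [];
  for (let i = lo; i < index.length && index[i].publisher === publisher; i++) {
    examined += 1;
    ids.push(index[i]._id);
  }
  return { examined, ids };
}

const scanned = scanQuery(docs, "US Weekly");
const indexed = indexQuery(buildIndex(docs), "US Weekly");
console.log(scanned.examined, indexed.examined);
```

Both functions return the same documents in the same order. The scan touches every document and sorts afterwards, while the index walk touches only the matching entries.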
12. Solution
This is the slow query
db.my_collection.remove({status: "some chunk of text"})
With this index
db.my_collection.ensureIndex({status: 1})
Example 2
13. The Pattern
Inefficient write queries cause a high lock percentage.
Caused by losing track of your indexes and queries.
22. Solution
Problem
Query is slow because it has multiple multi-value operators: $or, $gte, and $lte
Solution
Change the schema to add an “hour_created” attribute:
hour_created: "%Y-%m-%d %H"
Create an index on “hour_created” followed by the “$or” values. Query using the new “hour_created” attribute.
Example 4
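A minimal sketch of how the “hour_created” value might be computed when a document is written, so that a range of $gte / $lte on a timestamp becomes an equality match on one string. The helper name hourCreated, the created_at field and the choice of UTC are assumptions; the slides only give the "%Y-%m-%d %H" format.

```javascript
// Assumption: buckets are computed in UTC; the slides do not say which
// time zone to use. hourCreated and pad2 are hypothetical helper names.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Format a Date as "%Y-%m-%d %H", for example "2012-06-05 09".
function hourCreated(date) {
  return (
    date.getUTCFullYear() + "-" +
    pad2(date.getUTCMonth() + 1) + "-" + // getUTCMonth() is zero-based
    pad2(date.getUTCDate()) + " " +
    pad2(date.getUTCHours())
  );
}

// Hypothetical document: set the bucket once, at write time.
const doc = { created_at: new Date(Date.UTC(2012, 5, 5, 9, 30)) };
doc.hour_created = hourCreated(doc.created_at);

// The read side then matches the bucket by equality instead of a range.
const filter = { hour_created: hourCreated(new Date(Date.UTC(2012, 5, 5, 9, 0))) };
console.log(doc.hour_created, filter.hour_created === doc.hour_created);
```

Zero-padding keeps the strings a fixed width, so every document written within the same hour gets an identical value.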
23. Words of caution
2 of the 4 solutions were to add an index.
Adding indexes as a solution scales poorly: every index adds work to each write and competes for memory.
24. Sometimes . . .
It is best to do nothing, except add shards / add hardware.
Go back to the drawing board on the design.
25. Bad things happen to good databases?
• ORMs
• Manage your indexes and queries.
• Constraints will set you free.
26. Road Map for Refactoring
• Measure, measure, measure.
• Find your slowest queries and determine if they can be indexed
• Rephrase the problem you are solving by asking “How do I want to query my data?”
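The “find your slowest queries” step can be as simple as ranking timing records by duration. The sketch below assumes the timings have already been collected into records with op, ns and millis fields (a shape loosely modeled on profiler-style entries). The sample values and the function name slowest are made up for illustration.

```javascript
// Hypothetical timing records; the field names (op, ns, millis) and all
// values are assumptions for illustration, not real measurements.
const samples = [
  { op: "query", ns: "mydb.my_collection", millis: 1200 },
  { op: "remove", ns: "mydb.my_collection", millis: 4300 },
  { op: "query", ns: "mydb.my_collection", millis: 15 },
];

// Return the `limit` slowest records, slowest first, without mutating the input.
function slowest(records, limit) {
  return [...records].sort((a, b) => b.millis - a.millis).slice(0, limit);
}

const top = slowest(samples, 2);
console.log(top.map((r) => r.op + " " + r.millis + "ms"));
```

Starting from the top of this list keeps the indexing question focused on the operations that cost the most time.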