2. Who am I?
• Now a consultant/trainer, but formerly...
• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code in Python
3. The Inspiration
• MongoDB Monitoring Service (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all your servers
• Hardware cost is important, use it efficiently (remember it’s a free service!)
4. Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on 2 AWS small instances
• one MongoDB server (2.0.2)
• one “load generator”
5. Load Generator
• Increment each metric as many times as
possible during the course of a simulated
minute
• Record number of updates per second
• Occasionally call getLastError to prevent
disconnects
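The 300x speedup determines how much real time a simulated minute gets. A minimal sketch of that arithmetic, assuming a simple linear mapping from real to simulated time (the function names and the start date are hypothetical, not taken from the experiment's code):

```python
from datetime import datetime, timedelta

SPEEDUP = 300  # "simulation time" is 300x real time (from the slides)


def simulated_now(sim_start, real_elapsed_seconds, speedup=SPEEDUP):
    """Map real elapsed seconds onto the simulated clock."""
    return sim_start + timedelta(seconds=real_elapsed_seconds * speedup)


def real_seconds_per_simulated_minute(speedup=SPEEDUP):
    """Real time available to the load generator for one simulated minute."""
    return 60.0 / speedup


start = datetime(2010, 10, 10)  # hypothetical simulation start
print(real_seconds_per_simulated_minute())
print(simulated_now(start, 288))  # 288 real seconds later
```

Under this assumption a simulated minute lasts 0.2 real seconds, and a full simulated day passes in 288 real seconds.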
6. Schema v1
• One document per metric (per server) per day
• Per hour/minute statistics stored as documents
{
  _id: "20101010/metric-1",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    metric: "metric-1" },
  daily: 5468426,
  hourly: {
    "00": 227850,
    "01": 210231,
    ...
    "23": 20457 },
  minute: {
    "0000": 3612,
    "0001": 3241,
    ...
    "1439": 2819 }
}
7. Update v1
• Use $inc to update fields in-place
• Use upsert to create document if it’s missing
• Easy, correct, seems like a good idea....
increment = { daily: 1 }
increment['hourly.' + hour] = 1
increment['minute.' + minute] = 1
db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: increment },
  true) // upsert
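A minimal Python sketch of how the pieces of the v1 update could be assembled for one timestamp. The helper name `build_v1_update` is hypothetical, and the zero-padded key format is an assumption based on the keys shown in Schema v1 ("00" to "23", "0000" to "1439"):

```python
from datetime import datetime


def build_v1_update(metric, ts):
    """Return (doc_id, metadata, inc) for one increment of `metric` at `ts`."""
    day = datetime(ts.year, ts.month, ts.day)
    doc_id = "%s/%s" % (ts.strftime("%Y%m%d"), metric)
    metadata = {"date": day, "metric": metric}
    minute_of_day = ts.hour * 60 + ts.minute
    inc = {
        "daily": 1,
        "hourly.%02d" % ts.hour: 1,          # assumed key format "00".."23"
        "minute.%04d" % minute_of_day: 1,    # assumed key format "0000".."1439"
    }
    return doc_id, metadata, inc


doc_id, metadata, inc = build_v1_update("metric-1", datetime(2010, 10, 10, 23, 59))
print(doc_id)
print(sorted(inc))
```

With pymongo, these values would typically be passed as `update_one({"_id": doc_id, "metadata": metadata}, {"$inc": inc}, upsert=True)`.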
11. Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
12. Document movement problem
• MongoDB in-place updates are fast
• ... except when they’re not in place
• MongoDB adaptively pads documents
• ... but it’s better to know your doc size
ahead of time
13. Midnight problem
• Upserts are convenient, but what’s our key?
• date/metric
• At midnight, you get a huge spike in inserts
14. Fixing the document movement problem
• Preallocate documents with zeros
• Crontab (?)
• NO! (makes the midnight problem even worse)
db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: {
      daily: 0,
      "hourly.0": 0,
      "hourly.1": 0,
      ...
      "minute.0": 0,
      "minute.1": 0,
      ... } },
  true) // upsert
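The zero-valued `$inc` document above can be generated rather than typed out. A minimal sketch, assuming the zero-padded key format from Schema v1 (the function name is hypothetical):

```python
def build_preallocation_inc():
    """Build a $inc document that touches every counter with a zero."""
    inc = {"daily": 0}
    for hour in range(24):
        inc["hourly.%02d" % hour] = 0      # assumed keys "00".."23"
    for minute in range(24 * 60):
        inc["minute.%04d" % minute] = 0    # assumed keys "0000".."1439"
    return inc


inc = build_preallocation_inc()
print(len(inc))  # 1 daily + 24 hourly + 1440 minute counters
```

Upserting with this document creates every field up front, so later increments should not need to grow the document.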
18. Fixing the midnight problem
• Could schedule preallocation for different
metrics, staggered through the day
• Observation: Preallocation isn’t required for
correct operation
• Let’s just preallocate tomorrow’s docs
randomly as new stats are inserted (with
low probability).
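Random preallocation can be sketched as a small hook that runs on each stat insert. This is an illustration only: the probability value, the `preallocate` callback, and the function name are all hypothetical (the slides say only "low probability"):

```python
import random
from datetime import date, timedelta

PREALLOC_PROBABILITY = 1.0 / 2000  # hypothetical; the slides only say "low"


def maybe_preallocate(metric, today, preallocate, rng=random,
                      probability=PREALLOC_PROBABILITY):
    """With low probability, call `preallocate(doc_id)` for tomorrow's document.

    Returns True when preallocation was triggered.
    """
    if rng.random() >= probability:
        return False
    tomorrow = today + timedelta(days=1)
    preallocate("%s/%s" % (tomorrow.strftime("%Y%m%d"), metric))
    return True


created = []
maybe_preallocate("metric-1", date(2010, 10, 10), created.append, probability=1.0)
print(created)
```

Because preallocation is not required for correctness, a metric that is never picked simply falls back to the ordinary upsert at midnight.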
22. Performance with Preallocation
• Well, it’s better
• Still have decreasing performance through the day... WTF?
[Chart; annotation: "Experiment startup"]
23. Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
24. End-of-day problem
[Diagram: key/value pairs laid out in sequence: “0000” Value, “0001” Value, ... “1439” Value]
• BSON stores documents as an association list
• MongoDB must check each key for a match
• Load increases significantly at the end of the day
(MongoDB must scan 1439 keys to find the right minute!)
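The conclusion notes that hierarchy can help with key counts. A rough cost model, assuming a linear scan over keys and a hypothetical layout that groups minutes under their hour (both functions are illustrative and count the matching key as well):

```python
def flat_keys_scanned(minute_of_day):
    """Keys examined to reach a minute in a flat 1440-key subdocument."""
    return minute_of_day + 1


def nested_keys_scanned(minute_of_day):
    """Keys examined when minutes are grouped under 24 hour subdocuments."""
    hour, minute = divmod(minute_of_day, 60)
    return (hour + 1) + (minute + 1)


last_minute = 24 * 60 - 1  # minute 1439, the worst case
print(flat_keys_scanned(last_minute), nested_keys_scanned(last_minute))
```

Under this model the worst case drops from 1440 key comparisons to 84 (24 hour keys plus 60 minute keys).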
29. Historical Query Problem
• Intra-day queries are great
• What about “performance year to date”?
• Now you’re hitting a lot of “cold”
documents and causing page faults
30. Fixing the historical query problem
• Store multiple levels of granularity in different collections
• 2 updates rather than 1, but historical queries much faster
• Preallocate along with daily docs (only infrequently upserted)
{ _id: "201010/metric-1",
  metadata: {
    date: ISODate("2000-10-01T00:00:00Z"),
    metric: "metric-1" },
  daily: {
    "0": 5468426,
    "1": ...,
    ...
    "31": ... },
}
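The "2 updates rather than 1" can be sketched as one helper that builds both increments. The collection name `stats.monthly`, the function name, and the use of the day of month as the `daily` key are assumptions for illustration; `stats.daily` is the collection queried later in the deck:

```python
from datetime import datetime


def build_updates(metric, ts):
    """Return {collection_name: (doc_id, inc)} for one increment at `ts`."""
    minute_of_day = ts.hour * 60 + ts.minute
    daily = (
        "%s/%s" % (ts.strftime("%Y%m%d"), metric),
        {
            "daily": 1,
            "hourly.%02d" % ts.hour: 1,
            "minute.%04d" % minute_of_day: 1,
        },
    )
    monthly = (
        "%s/%s" % (ts.strftime("%Y%m"), metric),
        {"daily.%d" % ts.day: 1},  # assumed: keyed by day of month
    )
    return {"stats.daily": daily, "stats.monthly": monthly}


updates = build_updates("metric-1", datetime(2010, 10, 10, 1, 5))
for name, (doc_id, inc) in sorted(updates.items()):
    print(name, doc_id, sorted(inc))
```

A year-to-date query would then read about a dozen monthly documents instead of several hundred daily ones.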
31. Queries
• Updates are by _id, so no index needed there
• Chart queries are by metadata
• Your range/sort should be last in the compound index
db.stats.daily.find(
  { "metadata.date": { $gte: dt1, $lte: dt2 },
    "metadata.metric": "metric-1" },
  { "metadata.date": 1, "hourly": 1 }
).sort({ "metadata.date": 1 })
db.stats.daily.ensureIndex({
  'metadata.metric': 1,
  'metadata.date': 1 })
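The same chart query and index can be expressed as plain Python structures. A minimal sketch, with a hypothetical helper name and no database connection, showing the equality field first and the range/sort field last:

```python
from datetime import datetime

# Compound index spec: equality field first, range/sort field last.
CHART_INDEX = [("metadata.metric", 1), ("metadata.date", 1)]


def build_chart_query(metric, dt1, dt2):
    """Return (query, projection, sort) for hourly chart data of one metric."""
    query = {
        "metadata.metric": metric,                     # equality match
        "metadata.date": {"$gte": dt1, "$lte": dt2},   # range match
    }
    projection = {"metadata.date": 1, "hourly": 1}
    sort = [("metadata.date", 1)]
    return query, projection, sort


query, projection, sort = build_chart_query(
    "metric-1", datetime(2010, 10, 1), datetime(2010, 10, 31))
print(query["metadata.metric"], sort)
```

With pymongo, these would typically be used as `coll.find(query, projection).sort(sort)` and `coll.create_index(CHART_INDEX)`.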
32. Conclusion
• Monitor your performance. Watch out for
spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your
documents (hierarchy can help)
• Make sure your index is optimized for your
sorts
33. Questions?
MongoDB Monitoring Service
http://www.10gen.com/mongodb-monitoring-service
Rick Copeland
@rick446
http://arborian.com
MongoDB Consulting & Training