15. How do I create indexes?
// The client remembers the index and raises no errors
db.recipes.ensureIndex({ main_ingredient: 1 })
* 1 means ascending, -1 descending
16. What can be indexed?
// Multiple fields (compound indexes)
db.recipes.ensureIndex({
main_ingredient: 1,
calories: -1
})
// Arrays of values (multikey indexes)
{
name: 'Curry Wurst mit Pommes',
ingredients: ['pork', 'curry']
}
db.recipes.ensureIndex({ ingredients: 1 })
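One way to picture a multikey index: it holds one entry per array element, each pointing back to its document. The sketch below is plain JavaScript, not MongoDB's implementation; buildMultikeyIndex and the second sample recipe are made up for illustration.

```javascript
// Illustrative only: a toy "multikey index" as a Map from value to document names.
// buildMultikeyIndex is a hypothetical helper, not a MongoDB API.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    // An array contributes one index entry per element; a scalar contributes one entry.
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc.name);
    }
  }
  return index;
}

const recipes = [
  { name: 'Curry Wurst mit Pommes', ingredients: ['pork', 'curry'] },
  { name: 'Chicken Curry', ingredients: ['chicken', 'curry'] } // hypothetical second recipe
];

const ingredientIndex = buildMultikeyIndex(recipes, 'ingredients');
console.log(ingredientIndex.get('curry')); // both recipes
console.log(ingredientIndex.get('pork'));  // only the Currywurst
```

A lookup by a single ingredient then touches only the entries for that value instead of every document.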
Image: http://www.marions-kochbuch.com/
17. What can be indexed?
// Subdocuments
{
name: 'Curry Wurst mit Pommes',
contributor: {
name: 'Hans Wurst',
id: 'hawu36'
}
}
db.recipes.ensureIndex({ 'contributor.id': 1 })
db.recipes.ensureIndex({ 'contributor': 1 })
18. How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()
// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })
// Drop all indexes and recreate them
db.recipes.reIndex()
// Default (unique) index on _id
19. Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
{ ingredients: 1 },
{ background: true }
)
21. Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )
// Force a unique index on a collection with duplicate recipe names by dropping the duplicates
db.recipes.ensureIndex(
{ name: 1 },
{ unique: true, dropDups: true }
)
* dropDups is probably never what you want
image: www.idownloadblog.com
22. Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
{ calories: -1 },
{ sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
{ name: 1 , calories: -1 },
{ unique: true, sparse: true }
)
* Without sparse, missing fields are stored as null in the index
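A minimal sketch of why a unique index needs sparse when a field is optional. It models a single-field index only (the slide's compound case is not covered), and violatesUnique and the sample documents are hypothetical, not MongoDB APIs or data.

```javascript
// Illustrative only: why a unique index needs "sparse" when a field is optional.
// violatesUnique is a hypothetical helper, not a MongoDB API.
function violatesUnique(docs, field, sparse) {
  const seen = new Set();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    // A sparse index skips documents that lack the field entirely.
    if (!hasField && sparse) continue;
    // A non-sparse index stores a missing field as null.
    const key = hasField ? doc[field] : null;
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

const menu = [
  { name: 'Currywurst', calories: 900 },
  { name: 'Salad' }, // no calories field
  { name: 'Water' }  // no calories field
];

console.log(violatesUnique(menu, 'calories', false)); // true: two null entries collide
console.log(violatesUnique(menu, 'calories', true));  // false: both are left out of the index
```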
23. Geospatial Indexes
// Add longitude, latitude coordinates (in that order)
{
name: 'Curry 36 am Mehringdamm',
loc: [ 13.387764, 52.493442]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )
// Query for locations 'near' a particular coordinate
db.locations.find({
loc: { $near: [ -122.3, 37.4 ] }
})
image: NASA
24. TTL Collections
// Documents must have a BSON UTC Date field
{ 'submitted_date' :
ISODate('2012-10-12T05:24:07.211Z'), … }
// Documents are removed after
// 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
{ submitted_date: 1 },
{ expireAfterSeconds: 3600 }
)
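The expiry rule can be sketched in plain JavaScript. This only models the comparison; isExpired is a hypothetical helper, and actual removal happens in a background task, so a document may outlive its deadline by a short while.

```javascript
// Illustrative only: the expiry rule a TTL index applies to each document.
// isExpired is a hypothetical helper; MongoDB performs this check in a background task.
function isExpired(doc, dateField, expireAfterSeconds, now) {
  const value = doc[dateField];
  // Documents without a Date in the indexed field never expire.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() > expireAfterSeconds * 1000;
}

const recipe = { name: 'Currywurst', submitted_date: new Date('2012-10-12T05:24:07.211Z') };

// 30 minutes after submission: still within the 3600-second window.
console.log(isExpired(recipe, 'submitted_date', 3600, new Date('2012-10-12T05:54:07.211Z')));
// 2 hours after submission: eligible for removal.
console.log(isExpired(recipe, 'submitted_date', 3600, new Date('2012-10-12T07:24:07.211Z')));
```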
image: taylordsdn112.wordpress.com
25. Limitations
• A collection cannot have more than 64 indexes.
• Index keys cannot be larger than 1024 bytes (1 KB).
• The name of an index, including the namespace, must be shorter than 128 characters.
• A query can only use one index*
• Indexes have storage requirements and slow down writes.
• In-memory sorts (without an index) are limited to 32 MB of returned data.
27. Profiling Slow Ops
db.setProfilingLevel( n, slowms )   // slowms defaults to 100 ms
n=0 profiler off
n=1 record operations slower than slowms
n=2 record all operations
db.system.profile.find()
* The profile collection is a capped collection, and fixed in size
image: http://www.speareducation.com/
28. The Explain Plan (without Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BasicCursor",
"n" : 42,
"nscannedObjects" : 12345,
"nscanned" : 12345,
...
"millis" : 356,
...
}
* Doesn’t use cached plans, re-evals and resets cache
29. The Explain Plan (with Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BtreeCursor calories_-1",
"n" : 42,
"nscannedObjects" : 42,
"nscanned" : 42,
...
"millis" : 0,
...
}
* Doesn’t use cached plans, re-evals and resets cache
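A quick way to compare the two explain outputs is the ratio of n to nscanned. The helper below is a hypothetical sketch in plain JavaScript; its inputs simply copy the numbers from the two explain plans shown above.

```javascript
// Illustrative only: compare n (documents returned) with nscanned (items examined).
// scanEfficiency is a hypothetical helper; the inputs mirror the explain() output shown above.
function scanEfficiency(explainOutput) {
  if (explainOutput.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explainOutput.n / explainOutput.nscanned;
}

const withoutIndex = { cursor: 'BasicCursor', n: 42, nscanned: 12345 };
const withIndex = { cursor: 'BtreeCursor calories_-1', n: 42, nscanned: 42 };

console.log(scanEfficiency(withoutIndex).toFixed(4)); // about 0.0034
console.log(scanEfficiency(withIndex));               // 1
```

The closer the ratio is to 1, the less work was wasted on items that did not match.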
30. The Query Optimizer
• For each "type" of query, MongoDB
periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for
each “type” of query
31. Manually Select Index to Use
// Tell the database what index to use
db.recipes.find({
calories: { $lt: 1000 } }
).hint({ _id: 1 })
// Tell the database to NOT use an index
db.recipes.find(
{ calories: { $lt: 1000 } }
).hint({ $natural: 1 })
32. Use Indexes to Sort Query Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })
// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })
db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
33. Indexes that won't work for sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })
// These can not sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
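The sorting rule from the last two slides can be written down as a small check: every index field before the first sorted field must be an equality match, and the sorted fields must follow the index order. This is a simplified sketch, not the real query planner; canSortWithIndex is a hypothetical helper, and it ignores sort direction.

```javascript
// Illustrative only: a simplified check of whether a sort can be served by a compound index.
// canSortWithIndex is a hypothetical helper; sort direction is ignored (all ascending assumed).
function canSortWithIndex(indexFields, equalityFields, sortFields) {
  const start = indexFields.indexOf(sortFields[0]);
  if (start === -1) return false;
  // Every index field before the first sorted field must be an equality match.
  for (let i = 0; i < start; i++) {
    if (!equalityFields.includes(indexFields[i])) return false;
  }
  // The sorted fields must then follow the index order without gaps.
  return sortFields.every((field, i) => indexFields[start + i] === field);
}

const index = ['a', 'b', 'c', 'd'];

console.log(canSortWithIndex(index, [], ['a']));         // true
console.log(canSortWithIndex(index, ['a'], ['a', 'b'])); // true
console.log(canSortWithIndex(index, ['b'], ['a', 'b'])); // true
console.log(canSortWithIndex(index, [], ['b']));         // false
console.log(canSortWithIndex(index, ['b'], ['b']));      // false
```

The five calls correspond to the queries on the two slides, in order, skipping the repeated { a:1, b:1 } case without a filter.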
34. Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })
// Return only the name field
db.recipes.find(
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
)
// indexOnly will be true in the explain plan
db.recipes.find(
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
).explain()
{
"indexOnly": true,
}
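As a rough mental model, a query is covered when every field it filters on and every field it returns lives in the index. The check below is a simplified sketch under that assumption; isCovered is a hypothetical helper, not a MongoDB API.

```javascript
// Illustrative only: when can a query be answered from the index alone?
// isCovered is a hypothetical helper, not a MongoDB API.
function isCovered(indexFields, queryFields, projection) {
  const included = Object.keys(projection).filter((f) => f !== '_id' && projection[f] === 1);
  // Without an explicit list of included fields, whole documents are returned.
  if (included.length === 0) return false;
  // _id is returned by default, so it counts as a returned field unless excluded.
  const returned = projection._id === 0 ? included : included.concat('_id');
  const inIndex = (field) => indexFields.includes(field);
  return queryFields.every(inIndex) && returned.every(inIndex);
}

const recipeIndex = ['main_ingredient', 'name'];

console.log(isCovered(recipeIndex, ['main_ingredient'], { _id: 0, name: 1 })); // true
console.log(isCovered(recipeIndex, ['main_ingredient'], { name: 1 }));         // false: _id is not in the index
```

This is why the example projection explicitly sets _id to 0.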
37. Trying to Use Multiple Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })
// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
38. Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })
// But only if the query is a prefix of the index
// This query can't effectively use the index
db.collection.find({ c: 2 })
// …but this query can
db.collection.find({ a: 3, b: 5 })
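The prefix rule can be expressed as counting how many leading index fields the query constrains. A minimal sketch, with usablePrefixLength as a hypothetical helper rather than anything MongoDB exposes:

```javascript
// Illustrative only: how many leading fields of a compound index a query constrains.
// usablePrefixLength is a hypothetical helper, not a MongoDB API.
function usablePrefixLength(indexFields, query) {
  let length = 0;
  for (const field of indexFields) {
    // Stop at the first index field the query does not constrain.
    if (!(field in query)) break;
    length++;
  }
  return length;
}

const abc = ['a', 'b', 'c'];
console.log(usablePrefixLength(abc, { c: 2 }));       // 0: no usable prefix
console.log(usablePrefixLength(abc, { a: 3, b: 5 })); // 2: uses a and b
```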
40. Regular Expressions
db.users.ensureIndex({ username: 1 })
// Left anchored regex queries can use the index
db.users.find({ username: /^hans wurst/ })
// But not generic regexes
db.users.find({username: /wurst/ })
// Or case insensitive queries
db.users.find({ username: /Hans/i })
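Why the anchor matters: a fixed prefix gives a starting point in sorted keys and a place to stop, while an unanchored pattern does not. The sketch below uses a sorted array as a stand-in for the index; findByPrefix and the sample usernames are hypothetical.

```javascript
// Illustrative only: a left-anchored prefix lets us jump into sorted keys and stop early.
// findByPrefix is a hypothetical helper; a sorted array stands in for the index.
function findByPrefix(sortedKeys, prefix) {
  // Binary search for the first key >= prefix.
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  // Walk forward only while keys still start with the prefix.
  const matches = [];
  let examined = 0;
  for (let i = lo; i < sortedKeys.length; i++) {
    examined++;
    if (!sortedKeys[i].startsWith(prefix)) break;
    matches.push(sortedKeys[i]);
  }
  return { matches, examined };
}

const usernames = ['anna', 'hans meier', 'hans wurst', 'hans wurstmann', 'karl', 'zoe wurst'].sort();

// Like /^hans wurst/: only a small slice of the keys is examined.
const anchored = findByPrefix(usernames, 'hans wurst');
console.log(anchored.matches, anchored.examined);

// Like /wurst/: there is no starting point, so every key has to be tested.
console.log(usernames.filter((name) => /wurst/.test(name)));
```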
The first part of this talk is more conceptual. The second part is more detailed and MongoDB-specific.
Indexes are not just a database concept; they have been around for a long time. They serve the same purpose: accessing information quickly and efficiently.
Find a recipe by name: Currywurst. Here: a compound index on (Recipe Type, Name, Page Number).
Really depends on the order
Humans have the ability to quickly skim over the recipe page and find the name they are looking for. Move into the world of computers: linked lists.
Look at 7 documents
Binary trees. Queries, inserts and deletes: O(log(n)) time.
MongoDB's indexes are B-Trees. German invention: Prof. Rudolf Bayer. Lookups (queries), inserts and deletes happen in O(log(n)) time.
So this is helpful, and can speed up queries by a tremendous amount
So it’s imperative we understand them
As a Support Engineer, I get questions about performance all the time. Usually: "Everything got really slow." "We didn't change anything" – RIGHT. Last week: they didn't change anything, but a DBA ran a few queries without an index. Not only slow: the slowdown blocked reads and pulled 20M documents into RAM, ejecting the working set.
Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver). Check the manual.
reIndex drops *all* indexes (including the _id index) and rebuilds them. Expensive operation. You don't need to do this often. Use cases: corruption / fragmentation.
Caveats: still a resource-intensive operation; the index build is slower; indexes are still built in the foreground on secondaries.
Repeat what we just did: basic operations. Now looking at some of the options and different index types.
unique applies a uniqueness constraint on the indexed values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss!!!
MongoDB doesn't enforce a schema. Sparse indexes only contain entries for documents that have the indexed field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint. Useful? Imagine a high_priority tag on only 1% of your data: a sparse index can retrieve those documents quickly.
Only touch on geo indexes. A '2d' index is a geohash on top of the B-tree. It allows you to search for documents 'near' a long/lat position. Bounds queries are also possible using $within. Good for mobile apps: find the 50 nearest "Currywurstbuden" close to me. 2.4 brings even more geo features.
The index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. The reaper thread runs every 60 seconds.
Indexes are a really powerful feature of MongoDB, so you need to understand the limitations. *With the exception of $or queries. If an index key exceeds 1 KB, documents are silently dropped from (not included in) the index.
Another question we often get in Support: How do we know that our indexes are efficient? How do we know that our queries use the right indexes?
n=2 causes a lot of writes to the profile collection; usually use n=1.
cursor – the type of cursor used. BasicCursor means no index was used. TODO: Use a real example here instead of made up numbers…
n – the number of documents that match the query
nscannedObjects – the number of documents that had to be scanned
nscanned – the number of items (index entries or documents) examined
millis – how long the query took
The ratio of n to nscanned should be as close to 1 as possible.
Winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.).
Tells MongoDB exactly what index to use.
MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches. Add slide before this to explain sorting. Also: break.
Rework to go along with the cookbook example
Support: these things come up often. Understand them now so you don't make the same mistakes.
Better to use a compound index on the low selectivity field and some other more selective field.