MongoDB Indexing: The Details

MongoDB Indexing and Query Optimizer Details. Aaron Staple, MongoSV, December 3, 2010

What will we cover? Many details of how indexing and the query optimizer work. A full understanding of these details is not required to use Mongo, but this knowledge can be helpful when making optimizations. We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge). Much of the material will be presented through examples. Diagrams are to aid understanding; some details will be left out.

What will we cover? Basic index bounds; compound key index bounds; $or queries; automatic index selection.

How will we cover it? We’re going to try to cover this material interactively, so please volunteer your thoughts on what Mongo should do in given scenarios when I ask. Pertinent questions are welcome, but please hold off-topic or specialized questions until the end so we don’t lose momentum.

Btree (just a conceptual diagram) [Diagram: a btree holding keys 1 through 9; one key points to the document {_id:4,x:6}.]

Find One Document db.c.find( {x:6} ).limit( 1 ) Index {x:1}

Find One Document [Diagram: index keys 1 through 9; the lookup for 6 reaches the key that points to {_id:4,x:6}.]

Find One Document > db.c.find( {x:6} ).limit( 1 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }

Find One Document "indexBounds" : { "x" : [ [ 6, 6 ] ] }

Find One Document "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,

Find One Document Now we have duplicate x values. [Diagram: index keys 1 2 3 4 5 6 6 6 9; the lookup for 6 reaches {_id:4,x:6}.]

Equality Match db.c.find( {x:6} ) Index {x:1}

Equality Match [Diagram: index keys 1 2 3 4 5 6 6 6 9; the three keys equal to 6 point to {_id:5,x:6}, {_id:4,x:6}, and {_id:1,x:6}.]

Equality Match > db.c.find( {x:6} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }

Equality Match "indexBounds" : { "x" : [ [ 6, 6 ] ] }

Equality Match "nscanned" : 3, "nscannedObjects" : 3, "n" : 3,

Equality Match [Diagram: the same lookup for 6 shown on the btree, with keys 1 2 3 4 5 6 6 6 9.]

Full Document Matcher db.c.find( {x:6,y:1} ) Index {x:1}

Full Document Matcher [Diagram: index keys 1 2 3 4 5 6 6 6 9; the three keys equal to 6 point to {y:5,x:6}, {y:4,x:6}, and {y:1,x:6}.]

Full Document Matcher > db.c.find( {x:6,y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }

Full Document Matcher "indexBounds" : { "x" : [ [ 6, 6 ] ] }

Full Document Matcher "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, Documents for all matching keys are scanned, but only one document matches on the non-indexed keys.
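The difference between nscanned and n can be reproduced with a small conceptual model. This is only a sketch: a sorted array stands in for the {x:1} btree, and the helper name scanWithMatcher is hypothetical, not part of MongoDB.

```javascript
// Conceptual model only: a sorted array of {key, doc} entries stands in for the {x:1} btree.
function scanWithMatcher(indexEntries, lower, upper, matcher) {
  let nscanned = 0;
  const results = [];
  for (const entry of indexEntries) {
    if (entry.key < lower) continue; // a real btree would seek straight to the lower bound
    if (entry.key > upper) break;    // keys are sorted, so the scan can stop here
    nscanned++;                      // every key inside the bounds is scanned
    if (matcher(entry.doc)) results.push(entry.doc); // full document check on non-index fields
  }
  return { nscanned: nscanned, n: results.length, results: results };
}

// Data shaped like the slide: keys 1 2 3 4 5 6 6 6 9, with y set on the x:6 documents.
const matcherDocs = [
  { x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }, { x: 5 },
  { x: 6, y: 1 }, { x: 6, y: 4 }, { x: 6, y: 5 }, { x: 9 }
];
const matcherIndex = matcherDocs
  .map(function (d) { return { key: d.x, doc: d }; })
  .sort(function (a, b) { return a.key - b.key; });

const matcherResult = scanWithMatcher(matcherIndex, 6, 6, function (d) {
  return d.x === 6 && d.y === 1;
});
console.log(matcherResult.nscanned, matcherResult.n); // prints: 3 1
```

All three x:6 keys are visited, but the matcher keeps only the document with y:1.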

Range Match db.c.find( {x:{$gte:4,$lte:7}} ) Index {x:1}

Range Match [Diagram: index keys 1 through 9; the scan covers 4 <= ? <= 7.]

Range Match > db.c.find( {x:{$gte:4,$lte:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }

Range Match "indexBounds" : { "x" : [ [ 4, 7 ] ] }

Range Match "nscanned" : 4, "nscannedObjects" : 4, "n" : 4,

Exclusive Range Match db.c.find( {x:{$gt:4,$lt:7}} ) Index {x:1}

Exclusive Range Match [Diagram: index keys 1 through 9; the scan covers 4 < ? < 7.]

Exclusive Range Match > db.c.find( {x:{$gt:4,$lt:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }

Exclusive Range Match "indexBounds" : { "x" : [ [ 4, 7 ] ] } Explain doesn’t indicate that the range is exclusive.

Exclusive Range Match "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, But index keys matching the range bounds are not scanned because the bounds are exclusive.
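A minimal sketch of why the two range queries report different nscanned values even though explain prints the same bounds. The function scanRange is a hypothetical helper over a sorted key array, not the server's cursor code.

```javascript
// Counts the keys a scan would visit for bounds [lower, upper] or (lower, upper).
function scanRange(sortedKeys, lower, upper, inclusive) {
  let nscanned = 0;
  for (const key of sortedKeys) {
    const aboveLower = inclusive ? key >= lower : key > lower;
    const belowUpper = inclusive ? key <= upper : key < upper;
    if (!aboveLower) continue; // before the start of the range
    if (!belowUpper) break;    // past the end of the range
    nscanned++;
  }
  return nscanned;
}

const rangeKeys = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const inclusiveScanned = scanRange(rangeKeys, 4, 7, true);  // {$gte:4,$lte:7}
const exclusiveScanned = scanRange(rangeKeys, 4, 7, false); // {$gt:4,$lt:7}
console.log(inclusiveScanned, exclusiveScanned); // prints: 4 2
```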

Exclusive Range Match [Diagram: the same scan shown on the btree with keys 1 through 9.]

Multikeys db.c.find( {x:{$gt:7}} ) Index {x:1}

Multikeys [Diagram: index keys 1 through 9; the scan covers ? > 7, and keys 8 and 9 both point to {_id:4,x:[8,9]}.]

Multikeys > db.c.find( {x:{$gt:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] } }

Multikeys "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] }

Multikeys "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, All keys in valid range are scanned, but the matcher rejects duplicate documents making n == 1.
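The multikey behaviour can be sketched as follows. The documents other than {_id:4,x:[8,9]} are made up for illustration, and buildMultikeyIndex and multikeyScan are hypothetical helpers rather than MongoDB internals.

```javascript
// One index entry is created per array element, so a document can appear under several keys.
function buildMultikeyIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    const values = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const value of values) entries.push({ key: value, doc: doc });
  }
  return entries.sort(function (a, b) { return a.key - b.key; });
}

// Scans keys greater than lowerExclusive and drops documents that were already seen.
function multikeyScan(entries, lowerExclusive) {
  const seenIds = new Set();
  let nscanned = 0;
  const results = [];
  for (const entry of entries) {
    if (entry.key <= lowerExclusive) continue;
    nscanned++;
    if (seenIds.has(entry.doc._id)) continue; // same document reached through another element
    seenIds.add(entry.doc._id);
    results.push(entry.doc);
  }
  return { nscanned: nscanned, n: results.length };
}

const multikeyDocs = [
  { _id: 1, x: 2 }, { _id: 2, x: 5 }, { _id: 3, x: 7 }, // illustrative filler documents
  { _id: 4, x: [8, 9] }                                   // the document from the slide
];
const multikeyResult = multikeyScan(buildMultikeyIndex(multikeyDocs), 7);
console.log(multikeyResult.nscanned, multikeyResult.n); // prints: 2 1
```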

Range Types Explicit inequality: db.c.find( {x:{$ne:4}} ). Regular expression prefix: db.c.find( {x:/^a/} ). db.c.find( {x:/a/} )

Range Types db.c.find( {x:{$gt:4,$lt:7}} ) "indexBounds" : { "x" : [ [ 4, 7 ] ] }

Range Types db.c.find( {x:{$gt:4}} ) "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ] }

Range Types db.c.find( {x:{$ne:4}} ) "indexBounds" : { "x" : [ [ { "$minElement" : 1 }, 4 ], [ 4, { "$maxElement" : 1 } ] ] }

Range Types db.c.find( {x:/^a/} ) "indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] }

Range Types db.c.find( {x:/a/} ) "indexBounds" : { "x" : [ [ "", { } ], [ /a/, /a/ ] ] }

Set Match db.c.find( {x:{$in:[3,6]}} ) Index {x:1}

Set Match [Diagram: index keys 1 through 9; the scan targets keys 3 and 6.]

Set Match > db.c.find( {x:{$in:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 8, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] } }

Set Match "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] }

Set Match "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, Why is nscanned 3? This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key nscanned may be higher than the number of matching keys.

All Match db.c.find( {x:{$all:[3,6]}} ) Index {x:1}

All Match [Diagram: index keys 1 through 9; the lookup for 3 reaches {_id:4,x:[3,6]}.]

All Match > db.c.find( {x:{$all:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ] ] } }

All Match "indexBounds" : { "x" : [ [ 3, 3 ] ] } The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.

All Match "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,

Limit db.c.find( {x:{$lt:6},y:3} ).limit( 3 ) Index {x:1}

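Limit [Diagram: index keys 1 through 9; the scan covers ? < 6, and the documents under those keys are labeled with y values (four with y:3, one with y:1).]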

Limit > db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }

Limit "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }

Limit "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, Scan until three matches are found, then stop.

Skip db.c.find( {x:{$lt:6},y:3} ).skip( 3 ) Index {x:1}

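Skip [Diagram: index keys 1 through 9; the scan covers ? < 6, and the documents under those keys are labeled with y values (four with y:3, one with y:1).]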

Skip > db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }

Skip "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }

Skip "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, All skipped documents are scanned.
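The limit and skip counts can be reproduced with a small model. The arrangement of y values below is an assumption chosen to match the explain output (the extracted diagram does not preserve the order), and scanWithSkipLimit is a hypothetical helper.

```javascript
// Scans entries in index order, applying the matcher, then skip, then limit.
// A limit of 0 means "no limit" in this sketch.
function scanWithSkipLimit(entries, matcher, skip, limit) {
  let nscanned = 0;
  let matched = 0;
  const results = [];
  for (const entry of entries) {
    nscanned++;
    if (!matcher(entry)) continue;
    matched++;
    if (matched <= skip) continue; // skipped documents still had to be scanned
    results.push(entry);
    if (limit > 0 && results.length === limit) break; // stop once the limit is satisfied
  }
  return { nscanned: nscanned, n: results.length };
}

// Assumed layout for x < 6: four documents with y:3 and one with y:1.
const skipLimitEntries = [
  { x: 1, y: 3 }, { x: 2, y: 3 }, { x: 3, y: 1 }, { x: 4, y: 3 }, { x: 5, y: 3 }
];
const yIsThree = function (e) { return e.y === 3; };

const limitResult = scanWithSkipLimit(skipLimitEntries, yIsThree, 0, 3);
const skipResult = scanWithSkipLimit(skipLimitEntries, yIsThree, 3, 0);
console.log(limitResult.nscanned, limitResult.n); // prints: 4 3
console.log(skipResult.nscanned, skipResult.n);   // prints: 5 1
```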

Sort db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ) Index {x:1}

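Sort [Diagram: index keys 1 through 9; the scan covers ? < 6, and the documents under those keys are labeled with y values (four with y:3, one with y:1).]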

Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }

Sort "cursor" : "BtreeCursor x_1",

Sort db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ) Index {x:1}

Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }

Sort "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, Results are sorted on the fly to match requested order. The scanAndOrder field is only printed when its value is true.

Sort and scanAndOrder With “scanAndOrder” sort, all documents must be touched even if there is a limit spec. With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
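One way to picture both statements is a sort buffer that never grows beyond the limit while every document is still visited. This is an illustrative sketch with a hypothetical scanAndOrder function, not the server's sort implementation.

```javascript
// Touches every document but keeps at most `limit` of them in a sorted buffer.
// A limit of 0 means "no limit" in this sketch.
function scanAndOrder(docs, sortField, limit) {
  const buffer = [];
  let touched = 0;
  for (const doc of docs) {
    touched++;
    // Find the insertion point that keeps the buffer sorted ascending.
    let i = buffer.length;
    while (i > 0 && buffer[i - 1][sortField] > doc[sortField]) i--;
    buffer.splice(i, 0, doc);
    if (limit > 0 && buffer.length > limit) buffer.pop(); // discard the current largest
  }
  return { touched: touched, results: buffer };
}

const unsortedDocs = [{ y: 5 }, { y: 1 }, { y: 4 }, { y: 2 }, { y: 3 }];
const topTwo = scanAndOrder(unsortedDocs, "y", 2);
console.log(topTwo.touched, JSON.stringify(topTwo.results)); // prints: 5 [{"y":1},{"y":2}]
```

The buffer holds two documents at most, yet all five are touched because any later document could still belong in the result.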

Count db.c.count( {x:{$gte:4,$lte:7}} ) Index {x:1}

Count [Diagram: index keys 1 through 9; the scan covers 4 <= ? <= 7.]

Count We’re just counting keys here, not loading the full documents. [Diagram: btree with keys 1 through 9.]

Count With some operators the full document must be checked. Some of these cases: Negation ($ne, $nin, $not, etc.): with current semantics, all multikey elements must match negation constraints. Multikey de-duplication works without loading the full document.

Covered Indexes db.c.find( {x:6}, {x:1,_id:0} ) Index {x:1} _id would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.

Covered Indexes [Diagram: index keys 1 through 9; the lookup for 6 reaches the key for {_id:4,x:6}.]

Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }

Covered Indexes "isMultiKey" : false, "indexOnly" : true,

Covered Indexes [Diagram: index keys 1 through 9; keys 6 and 7 both point to {_id:4,x:[6,7]}.]

Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }

Covered Indexes "isMultiKey" : true, "indexOnly" : false, Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array. But when all multikey docs are removed we don’t reset isMultiKey. This can be improved.
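The conditions described on these slides can be summarized in a small check. The function canUseIndexOnly is hypothetical and assumes an inclusion-style projection; it only mirrors the rules stated above, not the server's actual decision logic.

```javascript
// Returns true when every returned field is available in the index keys.
function canUseIndexOnly(indexFields, projection, isMultiKey) {
  if (isMultiKey) return false; // per the slides, a multikey index reports indexOnly false
  const included = Object.keys(projection).filter(function (f) {
    return projection[f] === 1;
  });
  // _id is returned by default unless the projection excludes it explicitly.
  if (projection._id !== 0 && included.indexOf("_id") === -1) included.push("_id");
  return included.every(function (f) { return indexFields.indexOf(f) !== -1; });
}

const coveredPlain = canUseIndexOnly(["x"], { x: 1, _id: 0 }, false);   // true
const coveredMultikey = canUseIndexOnly(["x"], { x: 1, _id: 0 }, true); // false
const coveredWithId = canUseIndexOnly(["x"], { x: 1 }, false);          // false, _id is not in the index
```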

Update db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} ) Index {x:1}

Update [Diagram sequence: the scan covers 4 <= ? <= 7 over index keys 1 through 9 and reaches {_id:4,x:4}; after the update the document is {_id:4,x:2} and its key has moved from 4 to 2 in the btree.]

Update We track the set of documents that have been updated in the course of the current operation so they are only updated once.
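A sketch of why this tracking matters: when an update changes the indexed field, the document's key moves, and it can land ahead of the cursor inside the range still being scanned. The model below is an assumption-laden illustration (a live sorted array in place of the btree, a hypothetical updateRange function), not the server's update path.

```javascript
// Sets x to newX for every document with lower <= x <= upper, scanning a live index.
function updateRange(docs, lower, upper, newX) {
  const index = docs
    .map(function (d) { return { key: d.x, doc: d }; })
    .sort(function (a, b) { return a.key - b.key; });
  const updatedIds = new Set(); // documents already updated in this operation
  let updates = 0;
  let i = 0;
  while (i < index.length && index[i].key <= upper) {
    const entry = index[i];
    if (entry.key < lower || updatedIds.has(entry.doc._id)) { i++; continue; }
    index.splice(i, 1);          // remove the old key
    entry.doc.x = newX;
    entry.key = newX;
    let pos = 0;                 // find where the new key belongs
    while (pos < index.length && index[pos].key <= newX) pos++;
    index.splice(pos, 0, entry); // reinsert under the new key
    updatedIds.add(entry.doc._id);
    updates++;
    if (pos <= i) i++;           // entry landed at or behind the cursor, so step past the shifted slot
  }
  return updates;
}

// Moving keys forward within the range: each document is met again but updated only once.
const forwardDocs = [{ _id: 1, x: 4 }, { _id: 2, x: 5 }, { _id: 3, x: 6 }];
const forwardUpdates = updateRange(forwardDocs, 4, 7, 7);

// The slide's case: keys 1 through 9, range 4..7, x set to 2.
const slideDocs = [1, 2, 3, 4, 5, 6, 7, 8, 9].map(function (x) { return { _id: x, x: x }; });
const slideUpdates = updateRange(slideDocs, 4, 7, 2);
console.log(forwardUpdates, slideUpdates); // prints: 3 4
```

Without the updatedIds check, the first case would keep finding the relocated documents at key 7 and never finish.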

Two Equality Bounds db.c.find( {x:5,y:'c'} ) Index {x:1,y:1}

Two Equality Bounds [Diagram: compound index {x:1,y:1}; the lookup targets the key (5, c).]

Two Equality Bounds > db.c.find( {x:5,y:'c'} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ] } }

Two Equality Bounds "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ] } }

Two Equality Bounds "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,

Two Equality Bounds [Diagram: the same lookup for (5, c) shown on the btree.]

Equality and Set db.c.find( {x:5,y:{$in:['c','f']}} ) Index {x:1,y:1}

Equality and Set [Diagram: compound index {x:1,y:1}; the scan targets the keys (5, c) and (5, f).]

Equality and Set > db.c.find( {x:5,y:{$in:['c','f']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }

Equality and Set "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] }

Equality and Set "nscanned" : 3, "nscannedObjects" : 2, "n" : 2,

Equality and Set [Diagram: the same scan for (5, c) and (5, f) shown on the btree.]

Equality and Range db.c.find( {x:5,y:{$gte:'d'}} ) Index {x:1,y:1}

Equality and Range [Diagram: compound index {x:1,y:1}; the scan covers (5, d) <= ? <= (5, max string).]

Equality and Range > db.c.find( {x:5,y:{$gte:'d'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "d", { } ] ] } }

Equality and Range "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "d", { } ] ] }

Equality and Range "nscanned" : 2, "nscannedObjects" : 2, "n" : 2,

Equality and Range [Diagram: the same scan shown on the btree.]

Two Set Bounds db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ) Index {x:1,y:1}

Two Set Bounds [Diagram: compound index {x:1,y:1}; the scan targets the keys (5, c), (5, f), (9, c), and (9, f).]

Two Set Bounds > db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }

Two Set Bounds "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] }

Two Set Bounds "nscanned" : 5, "nscannedObjects" : 3, "n" : 3,

Two Set Bounds [Diagram: the same scan shown on the btree.]

Set and Range db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ) Index {x:1,y:1}

Set and Range [Diagram: compound index {x:1,y:1}; the scan covers (5, min string) <= ? <= (5, d) and (9, min string) <= ? <= (9, d).]

Set and Range > db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "", "d" ] ] } }

Set and Range "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "", "d" ] ] }

Set and Range "nscanned" : 5, "nscannedObjects" : 3, "n" : 3,

Range and Equality db.c.find( {x:{$gte:4},y:'c'} ) Index {x:1,y:1}

Range and Equality [Diagram: compound index {x:1,y:1}; the scan covers x >= 4 combined with y equal to c.]

Range and Equality > db.c.find( {x:{$gte:4},y:'c'} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 7, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "c", "c" ] ] } }

Range and Equality "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "c", "c" ] ] }

Range and Equality "nscanned" : 7, "nscannedObjects" : 2, "n" : 2, High nscanned because every distinct value of x must be checked.

Range and Equality [Diagram: the scan shown on the btree.] Every distinct value of x must be checked.
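A simplified model of this case, with loud assumptions: the (x, y) pairs below are one reading of the flattened diagram chosen so the counts match the explain output, the scan visits every key in the x range rather than modelling any skipping the server may do, and scanRangeThenEquality is a hypothetical name.

```javascript
// Walks compound keys [x, y] in index order for x >= minX and checks y on the key itself.
function scanRangeThenEquality(entries, minX, wantedY) {
  let nscanned = 0;
  let nscannedObjects = 0;
  for (const [x, y] of entries) {
    if (x < minX) continue;
    nscanned++;                  // the key falls inside the x bounds
    if (y !== wantedY) continue; // rejected from the index key, no document is loaded
    nscannedObjects++;           // only now would the document be fetched
  }
  return { nscanned: nscanned, nscannedObjects: nscannedObjects };
}

// Assumed contents of the {x:1,y:1} index, in key order.
const compoundEntries = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"], [6, "a"], [7, "e"], [8, "c"], [9, "f"]
];
const rangeEqualityResult = scanRangeThenEquality(compoundEntries, 4, "c");
console.log(rangeEqualityResult.nscanned, rangeEqualityResult.nscannedObjects); // prints: 7 2
```

Because x is the leading field, the y:'c' keys are scattered across the x range, which is why an index on {y:1,x:1} would serve this query better.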

Range and Set db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ) Index {x:1,y:1}

Range and Set [Diagram: compound index {x:1,y:1}; the scan covers x >= 4 combined with y equal to a or c.]

Range and Set > db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 7, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "a", "a" ], [ "c", "c" ] ] } }

Range and Set "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "a", "a" ], [ "c", "c" ] ] }

Range and Set "nscanned" : 7, "nscannedObjects" : 3, "n" : 3,

Range and Set [Diagram: the scan shown on the btree.] Every distinct value of x must be checked for y values 'a' and 'c'.

Two Ranges (2D Box) db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ) Index {x:1,y:1}

Two Ranges (2D Box) [Diagram: the query {x:{$gte:3,$lte:7}, y:{$gte:'c',$lte:'f'}} drawn as a box in the x-y plane, spanning 3 to 7 on the x axis and c to f on the y axis.]

Two Ranges (2D Box) [Diagram: compound index {x:1,y:1}; the scan covers 3 <= x <= 7 combined with c <= y <= f.]

Two Ranges (2D Box) > db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 6, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 7 ] ], "y" : [ [ "c", "f" ] ] } }

Two Ranges (2D Box) "indexBounds" : { "x" : [ [ 3, 7 ] ], "y" : [ [ "c", "f" ] ] }

Two Ranges (2D Box) "nscanned" : 6, "nscannedObjects" : 4, "n" : 4,

Two Ranges (2D Box) [Diagram: the scan shown on the btree.] For every distinct value of x in the range 3 <= x <= 7, scan for every value of y in the range c <= y <= f.

Disjoint $or Criteria db.c.find( {$or:[{x:5},{y:'d'}]} ) Indexes {x:1}, {y:1}

Disjoint $or Criteria [Diagram: index {x:1} is searched for x equal to 5, and index {y:1} is searched for y equal to d.]

Disjoint $or Criteria > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } }, { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } } ], "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1 }

Disjoint $or Criteria { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } },

Disjoint $or Criteria { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } } ], Only one document is returned for this clause.

Disjoint $or Criteria "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1

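Disjoint $or Criteria [Diagram: the first clause scans index {x:1} for 5; the matching documents are returned (✓).]

Disjoint $or Criteria [Diagram: the second clause scans index {y:1} for d; one of the matching documents is marked as already returned.] We have already scanned the x index for x:5, so this document was returned already. We don’t return it again.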
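The per-clause counts in the explain output can be imitated with a short sketch. It uses a set of already-returned ids as a simplification; the slides only state that the document is not returned again, not how the server detects this. The documents and the runOrClauses helper are made up for illustration.

```javascript
// Runs each $or clause in turn and suppresses documents an earlier clause returned.
function runOrClauses(docs, clauses) {
  const returnedIds = new Set();
  const perClause = [];
  for (const clause of clauses) {
    let nscanned = 0;
    let n = 0;
    for (const doc of docs) {
      if (!clause(doc)) continue; // stands in for an index scan over this clause's bounds
      nscanned++;
      if (returnedIds.has(doc._id)) continue; // already returned by an earlier clause
      returnedIds.add(doc._id);
      n++;
    }
    perClause.push({ nscanned: nscanned, n: n });
  }
  return perClause;
}

const orDocs = [
  { _id: 1, x: 5, y: "c" },
  { _id: 2, x: 5, y: "d" }, // matches both clauses
  { _id: 3, x: 7, y: "d" },
  { _id: 4, x: 1, y: "b" }
];
const orResult = runOrClauses(orDocs, [
  function (d) { return d.x === 5; },
  function (d) { return d.y === "d"; }
]);
console.log(JSON.stringify(orResult)); // prints: [{"nscanned":2,"n":2},{"nscanned":2,"n":1}]
```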

Unindexed $or Clause db.c.find( {$or:[{x:5},{y:'d'}]} ) Index {x:1} (no index on y)

Unindexed $or Clause > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "cursor" : "BasicCursor", "nscanned" : 9, "nscannedObjects" : 9, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.

Eliminated $or Clause db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ) Index {x:1}

Eliminated $or Clause [Diagram: index {x:1}; the first clause covers 2 < ? < 6 and the second clause looks up 5, which lies inside that range.]

Eliminated $or Clause > db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } } The index range of the second clause is included in the index range of the first clause, so we use the first index range only.

Eliminated $or Clause with Differing Unindexed Criteria db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ) Index {x:1}

Eliminated $or Clause with Differing Unindexed Criteria [Diagram: the first clause covers 2 < x < 6 with y equal to c; the second clause looks up x equal to 5 with y equal to d.]

Eliminated $or Clause with Differing Unindexed Criteria > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }

Eliminated $or Clause with Differing Unindexed Criteria [Diagram: a single scan over 2 < x < 6 on index {x:1}, combined with the y criteria c and d.] The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.

Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ) Index {x:1}

Overlapping $or Clauses [Diagram: index keys 1 through 9; the first clause covers 2 < ? < 6 and the second clause covers 4 < ? < 7.]

Overlapping $or Clauses > db.d.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }, { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ] } } ], "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1 } >

Overlapping $or Clauses { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } },

Overlapping $or Clauses { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ] } } The index range scanned for the previous clause is removed.

Overlapping $or Clauses [Diagram: the first clause scans 2 < ? < 6; the second clause then scans only 6 <= ? < 7.]
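The one-dimensional subtraction shown here can be sketched as below. The range representation and the subtractEarlierRange helper are assumptions for illustration, and only the case where the earlier range covers the start of the current range is handled.

```javascript
// Ranges are {lower, lowerInclusive, upper, upperInclusive}.
// Returns the part of `current` not already covered by `earlier`.
function subtractEarlierRange(current, earlier) {
  const coversStart =
    earlier.lower < current.lower ||
    (earlier.lower === current.lower &&
      (earlier.lowerInclusive || !current.lowerInclusive));
  const endsInside = earlier.upper > current.lower && earlier.upper < current.upper;
  if (!(coversStart && endsInside)) return current; // other overlaps are left alone in this sketch
  return {
    lower: earlier.upper,
    lowerInclusive: !earlier.upperInclusive, // an excluded endpoint was never scanned
    upper: current.upper,
    upperInclusive: current.upperInclusive
  };
}

const firstClause = { lower: 2, lowerInclusive: false, upper: 6, upperInclusive: false };  // 2 < x < 6
const secondClause = { lower: 4, lowerInclusive: false, upper: 7, upperInclusive: false }; // 4 < x < 7
const remainder = subtractEarlierRange(secondClause, firstClause);
// remainder describes 6 <= x < 7, matching the second clause's bounds in the explain output
```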

2D Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ) Index {x:1,y:1}

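2D Overlapping $or Clauses [Diagram: clause 1 and clause 2 drawn as overlapping boxes in the x-y plane; x axis marked at 2, 6, and 7, y axis marked at b, e, and f.]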

2D Overlapping $or Clauses > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ], "y" : [ [ "b", "f" ] ] } }, { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 0, "nscannedObjects" : 0, "n" : 0, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ], "y" : [ [ "b", "e" ] ] } } ], "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1 }

2D Overlapping $or Clauses { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ], "y" : [ [ "b", "f" ] ] } },

2D Overlapping $or Clauses { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 0, "nscannedObjects" : 0, "n" : 0, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ], "y" : [ [ "b", "e" ] ] } } ], The index range scanned for the previous clause is removed.

2D Overlapping $or Clauses We only have to scan the remainder here. [Diagram: the same two boxes, with the part of clause 2 that lies outside clause 1 marked.]

Overlapping $or Clauses Rule of thumb for n dimensions: We subtract earlier clause boxes from the current box when the result is one or more boxes. [Diagram: arrangements of boxes 1 and 2 where the subtraction applies (✓), and one arrangement where the remainder is not a box and it does not apply (✗).]

$or TODO: Use indexes on $or fields to satisfy a sort specification (SERVER-1205). Use the full query optimizer to select $or clause indexes in getMore (SERVER-1215). Improve index range elimination (handling some cases where the remainder is not a box).

Automatic Index Selection (Query Optimizer)

Optimal Index find( {x:5} ): Index {x:1} or Index {x:1,y:1}. find( {x:5} ).sort( {y:1} ): Index {x:1,y:1}. find( {} ).sort( {x:1} ): Index {x:1}. find( {x:{$gt:1,$lt:7}} ).sort( {x:1} ): Index {x:1}.

Optimal Index Rule of Thumb: No scanAndOrder. All fields with index-useful constraints are indexed. If there is a range or sort, it is on the last field of the index used to resolve the query. If multiple optimal indexes exist, one is chosen arbitrarily.

Optimal Index These same criteria are useful when you are designing your indexes.

Multiple Candidate Indexes find( {x:4,y:'a'} ): Index {x:1} or {y:1}? find( {x:4} ).sort( {y:1} ): Index {x:1} or {y:1}? Note: {x:1,y:1} is optimal. find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} ): Index {x:1,y:1} or {y:1,x:1}?

Multiple Candidate Indexes The only index selection criterion is nscanned. find( {x:4,y:'a'} ): Index {x:1} or {y:1}? If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}. find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} ): Index {x:1,y:1} or {y:1,x:1}? If there are fewer distinct values of x with 2 < x < 7 than distinct values of y with 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb).

Multiple Candidate Indexes The only index selection criterion is nscanned. Pretty good, but it doesn’t cover every case, e.g.: cost of scanAndOrder vs ordered index; cost of loading the full document vs just the index key; cost of scanning adjacent btree keys vs non-adjacent keys/documents.

Competing Indexes At most one query plan per index. Plans are run in interleaved fashion. Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.

Competing Indexes Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query). We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
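The interleaving described on these two slides can be sketched as a small race. Everything here is illustrative: each plan is reduced to a non-empty array of booleans (one per key it would scan, true when that key yields a result), a linear search stands in for the priority queue, and raceIndexPlans is a hypothetical name.

```javascript
// Advances whichever plan has the lowest nscanned until one plan finishes
// or has produced `wanted` results.
function raceIndexPlans(plans, wanted) {
  const states = plans.map(function (p) {
    return { name: p.name, scan: p.scan, nscanned: 0, n: 0 };
  });
  for (;;) {
    let best = states[0];
    for (const s of states) {
      if (s.nscanned < best.nscanned) best = s; // ties go to the earlier plan
    }
    if (best.scan[best.nscanned]) best.n++; // this key produced a matching document
    best.nscanned++;
    if (best.n >= wanted || best.nscanned === best.scan.length) return best;
  }
}

// find( {x:4,y:'a'} ): assume six documents have x:4 but only two of them have y:'a',
// while exactly two documents have y:'a' and both have x:4.
const winner = raceIndexPlans([
  { name: "x_1", scan: [false, false, true, false, false, true] },
  { name: "y_1", scan: [true, true] }
], 101);
console.log(winner.name, winner.nscanned); // prints: y_1 2
```

The {y:1} plan finishes after scanning two keys, so it wins even though the {x:1} plan was stepped first.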

“Learning” a Query Plan When an index is chosen for a query, the query’s “pattern” and nscanned are recorded. find( {x:3,y:'c'} ): {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}. find( {x:{$gt:5},y:{$lt:'z'}} ): {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}.

“Learning” a Query Plan When a new query matches the same pattern, the same query plan is used. find( {x:5,y:'z'} ): use index {x:1}. find( {x:{$gt:20},y:{$lt:'b'}} ): use index {y:1}.
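A rough sketch of the idea, assuming a pattern is just the set of fields plus the kind of constraint on each. The pattern labels follow the slides; the functions queryPattern, recordPlan, and lookupPlan and the Map-based cache are hypothetical, not the server's data structures.

```javascript
// Reduces a query to its shape: field names and constraint kinds, ignoring the values.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) {
    const cond = query[field];
    if (cond !== null && typeof cond === "object") {
      const ops = Object.keys(cond);
      const hasLower = ops.indexOf("$gt") !== -1 || ops.indexOf("$gte") !== -1;
      const hasUpper = ops.indexOf("$lt") !== -1 || ops.indexOf("$lte") !== -1;
      if (hasLower && hasUpper) pattern[field] = "gt and lt bounds";
      else if (hasLower) pattern[field] = "gt bound";
      else if (hasUpper) pattern[field] = "lt bound";
      else pattern[field] = "other";
    } else {
      pattern[field] = "equality";
    }
  }
  return JSON.stringify(pattern);
}

const planCache = new Map();
function recordPlan(query, index, nscanned) {
  planCache.set(queryPattern(query), { index: index, nscanned: nscanned });
}
function lookupPlan(query) {
  return planCache.get(queryPattern(query)); // undefined when the pattern is new
}

recordPlan({ x: 3, y: "c" }, "{x:1}", 50);
recordPlan({ x: { $gt: 5 }, y: { $lt: "z" } }, "{y:1}", 500);

const equalityPlan = lookupPlan({ x: 5, y: "z" });                  // same pattern, different values
const boundsPlan = lookupPlan({ x: { $gt: 20 }, y: { $lt: "b" } });
console.log(equalityPlan.index, boundsPlan.index); // prints: {x:1} {y:1}
```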
