MongoDB Indexing and Query Optimizer Details Aaron Staple MongoSV December 3, 2010
What will we cover? Many details of how indexing and the query optimizer work A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations. We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge). Much of the material will be presented through examples. Diagrams are to aid understanding – some details will be left out.
What will we cover? Basic index bounds Compound key index bounds Or queries Automatic index selection
How will we cover it? We’re going to try and cover this material interactively - please volunteer your thoughts on what mongo should do in given scenarios when I ask. Pertinent questions are welcome, but please keep off topic or specialized questions until the end so we don’t lose momentum.
Btree (just a conceptual diagram) 5 2 7 6 8 9 1 3 4 {_id:4,x:6}
Basic Index Bounds
Find One Document db.c.find( {x:6} ).limit( 1 ) Index {x:1}
Find One Document 6 ? 1 2 3 4 5 6 7 8 9 {_id:4,x:6}
Find One Document > db.c.find( {x:6} ).limit( 1 ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	} }
Find One Document 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	}
Find One Document 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1,
Find One Document 6 ? 5 2 7 6 8 9 1 3 4 {_id:4,x:6}
Find One Document 6 ? Now we have duplicate x values 1 2 3 4 5 6 6 6 9 {_id:4,x:6}
Find One Document 6 ? 5 2 6 6 6 9 1 3 4 {_id:4,x:6}
Equality Match db.c.find( {x:6} ) Index {x:1}
Equality Match [Diagram: index keys 1 2 3 4 5 6 6 6 9; the lookup for 6 reaches the three keys equal to 6, which point to {_id:5,x:6}, {_id:4,x:6}, and {_id:1,x:6}]
Equality Match > db.c.find( {x:6} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 3, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	} }
Equality Match 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	}
Equality Match 	"nscanned" : 3, 	"nscannedObjects" : 3, 	"n" : 3,
Equality Match 6 ? 5 2 6 6 6 9 1 3 4
Full Document Matcher db.c.find( {x:6,y:1} ) Index {x:1}
Full Document Matcher [Diagram: index keys 1 2 3 4 5 6 6 6 9; the three keys equal to 6 point to {y:5,x:6}, {y:4,x:6}, and {y:1,x:6}]
Full Document Matcher > db.c.find( {x:6,y:1} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 3, 	"nscannedObjects" : 3, 	"n" : 1, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	} }
Full Document Matcher 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	}
Full Document Matcher 	"nscanned" : 3, 	"nscannedObjects" : 3, 	"n" : 1, The documents for all three matching index keys are scanned, but only one of them also matches the non-indexed field y.
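To make the relationship between nscanned, nscannedObjects, and n concrete, here is a minimal plain-JavaScript sketch of an index scan followed by a full-document matcher. It is a simulation under assumptions, not the server's code: the `docs` array and the `indexScan` helper are hypothetical, and a sorted array stands in for the btree.

```javascript
// Hypothetical collection mirroring the slide: three documents share x:6.
const docs = [
  { x: 2, y: 7 }, { x: 5, y: 2 },
  { x: 6, y: 5 }, { x: 6, y: 4 }, { x: 6, y: 1 },
  { x: 9, y: 3 },
];

// Scan a simulated {x:1} index over [lo, hi], then apply the full-document matcher.
function indexScan(collection, lo, hi, matcher) {
  const index = collection
    .map((doc) => ({ key: doc.x, doc }))
    .sort((a, b) => a.key - b.key);
  const stats = { nscanned: 0, nscannedObjects: 0, n: 0 };
  for (const entry of index) {
    if (entry.key < lo) continue; // a real btree would seek here instead of iterating
    if (entry.key > hi) break;    // past the upper bound: stop
    stats.nscanned++;             // one index key examined
    stats.nscannedObjects++;      // its document is loaded for the matcher
    if (matcher(entry.doc)) stats.n++;
  }
  return stats;
}

// Equivalent in spirit to find( {x:6,y:1} ) with index {x:1}.
console.log(indexScan(docs, 6, 6, (d) => d.y === 1));
// → { nscanned: 3, nscannedObjects: 3, n: 1 }
```

The index bounds only narrow the scan to x:6; the y:1 condition is applied per document, which is why n can be lower than nscanned.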
Range Match db.c.find( {x:{$gte:4,$lte:7}} ) Index {x:1}
Range Match 4 7 <= ? <= 8 1 2 3 4 5 6 7 9
Range Match > db.c.find( {x:{$gte:4,$lte:7}} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 4, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				4, 				7 			] 		] 	} }
Range Match "indexBounds" : { 		"x" : [ 			[ 				4, 				7 			] 		]
Range Match 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 4,
Range Match 5 2 7 6 8 9 1 3 4
Exclusive Range Match db.c.find( {x:{$gt:4,$lt:7}} ) Index {x:1}
Exclusive Range Match 4 7 < ? < 8 1 2 3 4 5 6 7 9
Exclusive Range Match > db.c.find( {x:{$gt:4,$lt:7}} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 2, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				4, 				7 			] 		] 	} }
Exclusive Range Match 	"indexBounds" : { 		"x" : [ 			[ 				4, 				7 			] 		] 	} Explain doesn’t indicate that the range is exclusive.
Exclusive Range Match 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 2, But index keys matching the range bounds are not scanned because the bounds are exclusive.
Exclusive Range Match 5 2 7 6 8 9 1 3 4
Multikeys db.c.find( {x:{$gt:7}} ) Index {x:1}
Multikeys 7 ? > 1 2 3 4 5 6 7 9 8 {_id:4,x:[8,9]}
Multikeys > db.c.find( {x:{$gt:7}} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 1, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				7, 				1.7976931348623157e+308 			] 		] 	} }
Multikeys 	"indexBounds" : { 		"x" : [ 			[ 				7, 				1.7976931348623157e+308 			] 		] 	}
Multikeys 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 1, All keys in valid range are scanned, but the matcher rejects duplicate documents making n == 1.
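A minimal sketch of why a multikey index can scan more keys than it returns documents. This is a plain-JavaScript simulation; `buildIndex` and `findGreaterThan` are hypothetical helpers, and the example documents other than {_id:4,x:[8,9]} are assumptions.

```javascript
// One index entry is created per array element, so a document can appear
// under several keys.
function buildIndex(collection) {
  const entries = [];
  for (const doc of collection) {
    const values = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const v of values) entries.push({ key: v, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Simulates find( {x:{$gt:bound}} ): every key in range is scanned, but each
// document is returned at most once.
function findGreaterThan(collection, bound) {
  const returned = new Set();
  let nscanned = 0;
  for (const entry of buildIndex(collection)) {
    if (entry.key <= bound) continue;
    nscanned++;
    returned.add(entry.id); // duplicates collapse here
  }
  return { nscanned, n: returned.size };
}

const docs = [
  { _id: 1, x: 2 },
  { _id: 2, x: 7 },
  { _id: 4, x: [8, 9] }, // the multikey document from the slide
];
console.log(findGreaterThan(docs, 7)); // → { nscanned: 2, n: 1 }
```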
Multikeys 5 2 7 6 8 9 1 3 4
Range Types Explicit inequality
db.c.find( {x:{$gt:4,$lt:7}} )
db.c.find( {x:{$gt:4}} )
db.c.find( {x:{$ne:4}} )
Regular expression prefix
db.c.find( {x:/^a/} )
db.c.find( {x:/a/} )
Range Types db.c.find( {x:{$gt:4,$lt:7}} ) 	"indexBounds" : { 		"x" : [ 			[ 				4, 				7 			] 		] 	}
Range Types db.c.find( {x:{$gt:4}} ) 	"indexBounds" : { 		"x" : [ 			[ 				4, 				1.7976931348623157e+308 			] 		] 	}
Range Types db.c.find( {x:{$ne:4}} ) 	"indexBounds" : { 		"x" : [ 			[ 				{ 					"$minElement" : 1 				}, 				4 			], 			[ 				4, 				{ 					"$maxElement" : 1 				} 			] 		] 	}
Range Types db.c.find( {x:/^a/} ) 	"indexBounds" : { 		"x" : [ 			[ 				"a", 				"b" 			], 			[ 				/^a/, 				/^a/ 			] 		] 	}
Range Types db.c.find( {x:/a/} ) 	"indexBounds" : { 		"x" : [ 			[ 				"", 				{ 				} 			], 			[ 				/a/, 				/a/ 			] 		] 	}
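The string bounds shown for the two regular expressions can be reproduced with a small sketch. This is an assumption about how such bounds could be derived, not the server's implementation: `regexPrefixBounds` is a hypothetical helper that only recognizes a pattern of the form ^ followed by plain letters or digits, and falls back to the whole string range ("" to {}) otherwise.

```javascript
// Returns [lower, upper] string bounds for a regex source string.
function regexPrefixBounds(source) {
  // Only a fully literal, anchored prefix is handled; anything else
  // (quantifiers, alternation, no anchor) gets the full string range.
  const m = /^\^([A-Za-z0-9]+)$/.exec(source);
  if (!m) return ["", {}];
  const prefix = m[1];
  const last = prefix.charCodeAt(prefix.length - 1);
  // The upper bound is the prefix with its last character incremented.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

console.log(regexPrefixBounds("^a")); // → [ 'a', 'b' ]
console.log(regexPrefixBounds("a"));  // → [ '', {} ]
```

With an anchored prefix the scan is limited to strings between "a" and "b"; without the anchor every string key has to be considered.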
Set Match db.c.find( {x:{$in:[3,6]}} ) Index {x:1}
Set Match 3 6 , 8 1 2 3 4 5 6 7 9
Set Match > db.c.find( {x:{$in:[3,6]}} ).explain() { 	"cursor" : "BtreeCursor x_1 multi", 	"nscanned" : 3, 	"nscannedObjects" : 2, 	"n" : 2, 	"millis" : 8, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				3, 				3 			], 			[ 				6, 				6 			] 		] 	} }
Set Match 	"indexBounds" : { 		"x" : [ 			[ 				3, 				3 			], 			[ 				6, 				6 			] 		] 	}
Set Match 	"nscanned" : 3, 	"nscannedObjects" : 2, 	"n" : 2, Why is nscanned 3?  This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key nscanned may be higher than the number of matching keys.
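One model that reproduces the counts on these slides, offered as an assumption rather than a description of the server's actual algorithm: the cursor examines the key that follows a finished range, and if that key falls in a gap it counts it and seeks forward to the next range. The helpers `lowerBound` and `scanIntervals` are hypothetical.

```javascript
// First index i with keys[i] >= target (keys are sorted ascending).
function lowerBound(keys, target) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// intervals: sorted, disjoint, inclusive [start, end] pairs.
function scanIntervals(keys, intervals) {
  let nscanned = 0;
  const matched = [];
  let r = 0;
  let i = lowerBound(keys, intervals[0][0]);
  while (i < keys.length) {
    const key = keys[i];
    while (r < intervals.length && key > intervals[r][1]) r++;
    if (r === intervals.length) break; // beyond the last range: stop, not counted
    nscanned++;
    if (key >= intervals[r][0]) {
      matched.push(key);
      i++;
    } else {
      i = lowerBound(keys, intervals[r][0]); // key is in a gap: skip ahead
    }
  }
  return { nscanned, matched };
}

const keys = [1, 2, 3, 4, 5, 6, 7, 8, 9];
console.log(scanIntervals(keys, [[3, 3], [6, 6]])); // → { nscanned: 3, matched: [ 3, 6 ] }
console.log(scanIntervals(keys, [[4, 7]]));         // → { nscanned: 4, matched: [ 4, 5, 6, 7 ] }
```

In the $in:[3,6] case the key 4 is examined only to discover that the first range has ended, which accounts for the third scanned key.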
Set Match 5 2 7 6 8 9 1 3 4
All Match db.c.find( {x:{$all:[3,6]}} ) Index {x:1}
All Match [Diagram: index keys 1 2 3 4 5 6 7 8 9; the lookup for 3 reaches the key pointing to {_id:4,x:[3,6]}]
All Match > db.c.find( {x:{$all:[3,6]}} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				3, 				3 			] 		] 	} }
All Match 	"indexBounds" : { 		"x" : [ 			[ 				3, 				3 			] 		] 	} The first entry in the $all match array is always used for index bounds.  Note this may not be the least numerous indexed value in the $all array.
All Match 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1,
All Match 5 2 7 6 8 9 1 3 4
Limit db.c.find( {x:{$lt:6},y:3} ).limit( 3 ) Index {x:1}
Limit [Diagram: index keys 1 2 3 4 5 6 7 8 9 with the scan covering keys < 6; the documents in range are labeled y:3, y:3, y:3, y:1, y:3]
Limit > db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 3, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	} }
Limit 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	}
Limit 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 3, Scan until three matches are found, then stop.
Skip db.c.find( {x:{$lt:6},y:3} ).skip( 3 ) Index {x:1}
Skip [Diagram: index keys 1 2 3 4 5 6 7 8 9 with the scan covering keys < 6; the documents in range are labeled y:3, y:3, y:3, y:1, y:3]
Skip > db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 5, 	"nscannedObjects" : 5, 	"n" : 1, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	} }
Skip 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	}
Skip 	"nscanned" : 5, 	"nscannedObjects" : 5, 	"n" : 1, All skipped documents are scanned.
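The limit and skip counts can be reproduced with a short simulation. This is a sketch under assumptions: `scan` is a hypothetical helper, and the position of the single y:1 document (third in index order) is assumed so that the limit case matches the nscanned of 4 shown above.

```javascript
// Documents with x < 6, in index order. The placement of y:1 is an assumption.
const inRange = [
  { x: 1, y: 3 }, { x: 2, y: 3 }, { x: 3, y: 1 }, { x: 4, y: 3 }, { x: 5, y: 3 },
];

function scan(collection, matcher, { limit = Infinity, skip = 0 } = {}) {
  let nscanned = 0;
  let matchedSoFar = 0;
  const results = [];
  for (const doc of collection) {
    if (results.length >= limit) break; // limit satisfied: stop scanning
    nscanned++;
    if (!matcher(doc)) continue;
    matchedSoFar++;
    if (matchedSoFar > skip) results.push(doc); // skipped docs are still scanned
  }
  return { nscanned, n: results.length };
}

const yIs3 = (d) => d.y === 3;
console.log(scan(inRange, yIs3, { limit: 3 })); // → { nscanned: 4, n: 3 }
console.log(scan(inRange, yIs3, { skip: 3 }));  // → { nscanned: 5, n: 1 }
```

Limit lets the scan stop early; skip does not reduce the work, because each skipped document still has to be found and matched.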
Sort db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ) Index {x:1}
Sort [Diagram: index keys 1 2 3 4 5 6 7 8 9 with the scan covering keys < 6; the documents in range are labeled y:3, y:3, y:3, y:1, y:3]
Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 5, 	"nscannedObjects" : 5, 	"n" : 4, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	} }
Sort 	"cursor" : "BtreeCursor x_1",
Sort db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ) Index {x:1}
Sort [Diagram: index keys 1 2 3 4 5 6 7 8 9 with the scan covering keys < 6; the documents in range are labeled y:3, y:3, y:3, y:1, y:3]
Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 5, 	"nscannedObjects" : 5, 	"n" : 4, 	"scanAndOrder" : true, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				-1.7976931348623157e+308, 				6 			] 		] 	} }
Sort 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 5, 	"nscannedObjects" : 5, 	"n" : 4, 	"scanAndOrder" : true, Results are sorted on the fly to match requested order.  The scanAndOrder field is only printed when its value is true.
Sort and scanAndOrder With “scanAndOrder” sort, all documents must be touched even if there is a limit spec. With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
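A minimal sketch of the two properties stated above: every document is touched, but with a limit only that many documents need to be held in memory. This is an illustrative in-memory top-k insertion, assumed for the example; the server's actual sort implementation may differ, and `scanAndOrder` here is a hypothetical function name.

```javascript
function scanAndOrder(collection, sortField, limit = Infinity) {
  const buffer = []; // kept sorted ascending, never longer than `limit` after each step
  let touched = 0;
  for (const doc of collection) {
    touched++; // no early exit: a later document could still sort first
    let pos = buffer.findIndex((b) => b[sortField] > doc[sortField]);
    if (pos === -1) pos = buffer.length;
    buffer.splice(pos, 0, doc);
    if (buffer.length > limit) buffer.pop(); // drop the current worst candidate
  }
  return { touched, results: buffer };
}

const docs = [
  { x: 1, y: 3 }, { x: 2, y: 3 }, { x: 3, y: 1 }, { x: 4, y: 3 }, { x: 5, y: 3 },
];
const out = scanAndOrder(docs, "y", 2);
console.log(out.touched, out.results);
// → 5 [ { x: 3, y: 1 }, { x: 1, y: 3 } ]
```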
Count db.c.count( {x:{$gte:4,$lte:7}} ) Index {x:1}
Count 4 7 <= ? <= 8 1 2 3 4 5 6 7 9
Count We’re just counting keys here, not loading the full documents. 5 2 7 6 8 9 1 3 4
Count With some operators the full document must be checked. Some of these cases:
$size
array match
Negation - $ne, $nin, $not, etc.
With current semantics, all multikey elements must match negation constraints.
Multikey de-duplication works without loading the full document.
Covered Indexes db.c.find( {x:6}, {x:1,_id:0} ) Index {x:1} _id would be returned by default, but it isn't in the index, so we need to exclude it to return only indexed fields.
Covered Indexes [Diagram: index keys 1 2 3 4 5 6 7 8 9; the lookup for 6 reaches the key pointing to {_id:4,x:6}]
Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : true, 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	} }
Covered Indexes 	"isMultiKey" : false, 	"indexOnly" : true,
Covered Indexes [Diagram: index keys 1 2 3 4 5 6 7 8 9; the lookup for 6 reaches a key pointing to {_id:4,x:[6,7]}]
Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : true, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				6, 				6 			] 		] 	} }
Covered Indexes 	"isMultiKey" : true, 	"indexOnly" : false, Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array.  But when all multikey docs are removed we don’t reset isMultiKey.  This can be improved.
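The conditions for "indexOnly" seen in the two explain outputs can be summarized in a small predicate. This is a sketch of the rule as presented on these slides, with a hypothetical function name; it is not the server's check and it ignores cases the slides don't cover.

```javascript
// indexFields: field names in the index, e.g. ["x"] for {x:1}.
// projection: the second argument to find(), e.g. {x:1,_id:0}.
function canBeIndexOnly(indexFields, projection, isMultiKey) {
  if (isMultiKey) return false; // multikey indexes are not used index-only
  const wanted = Object.keys(projection).filter((f) => projection[f]);
  if (wanted.length === 0) return false; // no inclusion list: whole document needed
  // _id is returned by default unless explicitly excluded.
  if (projection._id !== 0 && !wanted.includes("_id")) wanted.push("_id");
  return wanted.every((f) => indexFields.includes(f));
}

console.log(canBeIndexOnly(["x"], { x: 1, _id: 0 }, false)); // → true
console.log(canBeIndexOnly(["x"], { x: 1 }, false));         // → false
console.log(canBeIndexOnly(["x"], { x: 1, _id: 0 }, true));  // → false
```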
Update db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} ) Index {x:1}
Update 4 7 <= ? <= 8 1 2 3 4 5 6 7 9 {_id:4,x:4}
Update 5 2 7 6 8 9 1 3 4 {_id:4,x:4}
Update 5 2 7 6 8 9 1 2 3 {_id:4,x:2}
Update We track the set of documents that have been updated in the course of the current operation so they are only updated once.
Compound Key Index Bounds
Two Equality Bounds db.c.find( {x:5,y:'c'} ) Index {x:1,y:1}
Two Equality Bounds ? 5 1 3 4 5 5 6 7 9 5 c b d g d f c a b c
Two Equality Bounds > db.c.find( {x:5,y:'c'} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1", 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			] 		] 	} }
Two Equality Bounds 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			] 		] 	} }
Two Equality Bounds 	"nscanned" : 1, 	"nscannedObjects" : 1, 	"n" : 1,
Two Equality Bounds ? 1 3 4 5 5 5 5 6 7 9 b d g c d f c c a b
Equality and Set db.c.find( {x:5,y:{$in:['c','f']}} ) Index {x:1,y:1}
Equality and Set , 5 1 3 4 5 5 6 7 9 5 5 c b d g d f c a b c f
Equality and Set > db.c.find( {x:5,y:{$in:['c','f']}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1 multi", 	"nscanned" : 3, 	"nscannedObjects" : 2, 	"n" : 2, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			], 			[ 				"f", 				"f" 			] 		] 	} }
Equality and Set 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			], 			[ 				"f", 				"f" 			] 		] 	}
Equality and Set 	"nscanned" : 3, 	"nscannedObjects" : 2, 	"n" : 2,
Equality and Set 1 3 4 5 5 5 6 7 9 b d g c d f c a b
Equality and Range db.c.find( {x:5,y:{$gte:'d'}} ) Index {x:1,y:1}
Equality and Range <= ? <=  1 3 4 5 5 6 7 9 5 5 5 b d g d f c a b c d max string
Equality and Range > db.c.find( {x:5,y:{$gte:'d'}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1", 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 2, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"d", 				{ 				} 			] 		] 	} }
Equality and Range 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			] 		], 		"y" : [ 			[ 				"d", 				{ 				} 			] 		] 	}
Equality and Range 	"nscanned" : 2, 	"nscannedObjects" : 2, 	"n" : 2,
Equality and Range 1 3 4 5 5 5 6 7 9 b d g c d f c a b
Two Set Bounds db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ) Index {x:1,y:1}
Two Set Bounds , , , 5 1 3 4 5 5 6 7 9 5 5 9 9 c b d g d f c a f c f c f
Two Set Bounds > db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1 multi", 	"nscanned" : 5, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			], 			[ 				9, 				9 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			], 			[ 				"f", 				"f" 			] 		] 	} }
Two Set Bounds 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			], 			[ 				9, 				9 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			], 			[ 				"f", 				"f" 			] 		] 	}
Two Set Bounds 	"nscanned" : 5, 	"nscannedObjects" : 3, 	"n" : 3,
Two Set Bounds 1 3 4 5 5 5 6 7 9 b d g c d f c a f
Set and Range db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ) Index {x:1,y:1}
Set and Range , <=?<= <=?<= 5 1 3 4 5 5 6 9 9 5 5 9 9 b d g d f c a f c d d min string min string
Set and Range > db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1 multi", 	"nscanned" : 5, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				5, 				5 			], 			[ 				9, 				9 			] 		], 		"y" : [ 			[ 				"", 				"d" 			] 		] 	} }
Set and Range 		"x" : [ 			[ 				5, 				5 			], 			[ 				9, 				9 			] 		], 		"y" : [ 			[ 				"", 				"d" 			] 		] 	}
Set and Range 	"nscanned" : 5, 	"nscannedObjects" : 3, 	"n" : 3,
Range and Equality db.c.find( {x:{$gte:4},y:'c'} ) Index {x:1,y:1}
Range and Equality ? and ? >= 4 1 3 4 5 6 7 9 5 8 b d g d a e f c c c
Range and Equality > db.c.find( {x:{$gte:4},y:'c'} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1", 	"nscanned" : 7, 	"nscannedObjects" : 2, 	"n" : 2, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				4, 				1.7976931348623157e+308 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			] 		] 	} }
Range and Equality 	"indexBounds" : { 		"x" : [ 			[ 				4, 				1.7976931348623157e+308 			] 		], 		"y" : [ 			[ 				"c", 				"c" 			] 		] 	}
Range and Equality 	"nscanned" : 7, 	"nscannedObjects" : 2, 	"n" : 2, High nscanned because every distinct value of x must be checked.
Range and Equality 1 3 4 5 5 9 6 7 8 b d g c d f a e c
Range and Equality 1 3 4 5 5 9 6 7 8 b d g c d f a e c Every distinct value of x must be checked.
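The nscanned of 7 can be reproduced with a skip-scan model over the compound keys from the diagram. This is an assumed model that happens to match the explain output, not a statement of the server's algorithm; `firstIndex` and `rangeThenEquality` are hypothetical helpers, and the key list is read off the slide (with the 9 restored to sorted position).

```javascript
// Binary search for the first key satisfying a monotone predicate.
function firstIndex(keys, pred) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (pred(keys[mid])) hi = mid; else lo = mid + 1;
  }
  return lo;
}

// Simulates find( {x:{$gte:minX}, y:y} ) on index {x:1,y:1}.
function rangeThenEquality(keys, minX, y) {
  let nscanned = 0;
  const matched = [];
  let i = firstIndex(keys, ([kx, ky]) => kx > minX || (kx === minX && ky >= y));
  while (i < keys.length) {
    const [kx, ky] = keys[i];
    nscanned++;
    if (ky === y) {
      matched.push(keys[i]);
      i++;
    } else if (ky < y) {
      // Same x, y too small: seek forward to (kx, y).
      i = firstIndex(keys, ([a, b]) => a > kx || (a === kx && b >= y));
    } else {
      // y already past the target for this x: move on to the next distinct x.
      i = firstIndex(keys, ([a]) => a > kx);
    }
  }
  return { nscanned, matched };
}

const keys = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"],
  [6, "a"], [7, "e"], [8, "c"], [9, "f"],
];
console.log(rangeThenEquality(keys, 4, "c"));
// → { nscanned: 7, matched: [ [ 5, 'c' ], [ 8, 'c' ] ] }
```

Because x is a range, the y:'c' constraint cannot narrow the scan to one contiguous run of keys; at least one key per distinct x value has to be examined.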
Range and Set db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ) Index {x:1,y:1}
Range and Set , and ? >= 4 1 3 4 5 6 7 9 5 8 b d g d a e f c c c a
Range and Set > db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1 multi", 	"nscanned" : 7, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				4, 				1.7976931348623157e+308 			] 		], 		"y" : [ 			[ 				"a", 				"a" 			], 			[ 				"c", 				"c" 			] 		] 	} }
Range and Set 	"indexBounds" : { 		"x" : [ 			[ 				4, 				1.7976931348623157e+308 			] 		], 		"y" : [ 			[ 				"a", 				"a" 			], 			[ 				"c", 				"c" 			] 		] 	}
Range and Set 	"nscanned" : 7, 	"nscannedObjects" : 3, 	"n" : 3,
Range and Set 1 3 4 5 5 9 6 7 8 b d g c d f a e c
Range and Set 1 3 4 5 5 9 6 7 8 b d g c d f a e c Every distinct value of x must be checked for y values ‘a’ and ‘c’.
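Two Ranges (2D Box) db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ) Index {x:1,y:1}
Two Ranges (2D Box) [Diagram: the query {x: {$gte:3,$lte:7}, y: {$gte:'c',$lte:'f'}} drawn as a box in the x-y plane, spanning 3 to 7 on the x axis and 'c' to 'f' on the y axis]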
Two Ranges (2D Box) & <=?<= <=?<= 7 1 3 4 5 6 7 9 5 7 3 b d g d a e f c c g f
Two Ranges (2D Box) > db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain() { 	"cursor" : "BtreeCursor x_1_y_1", 	"nscanned" : 6, 	"nscannedObjects" : 4, 	"n" : 4, 	"millis" : 1, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				3, 				7 			] 		], 		"y" : [ 			[ 				"c", 				"f" 			] 		] 	} }
Two Ranges (2D Box) 	"indexBounds" : { 		"x" : [ 			[ 				3, 				7 			] 		], 		"y" : [ 			[ 				"c", 				"f" 			] 		] 	}
Two Ranges (2D Box) 	"nscanned" : 6, 	"nscannedObjects" : 4, 	"n" : 4,
Two Ranges (2D Box) 1 3 4 5 5 9 6 7 7 b d g c d f a e g
Two Ranges (2D Box) 7 3 For every distinct value of x in this range c f <=?<= Scan for every value of y in this range <=?<=
$or
Disjoint $or Criteria db.c.find( {$or:[{x:5},{y:'d'}]} ) Indexes {x:1}, {y:1}
Disjoint $or Criteria ? 1 3 4 5 6 7 9 5 7 5 1 3 4 5 6 7 9 5 7 b d g d a e f c d g b d g d a e f c g ?
Disjoint $or Criteria	 > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { 	"clauses" : [ 		{ 			"cursor" : "BtreeCursor x_1", 			"nscanned" : 2, 			"nscannedObjects" : 2, 			"n" : 2, 			"millis" : 0, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						5, 						5 					] 				] 			} 		}, 		{ 			"cursor" : "BtreeCursor y_1", 			"nscanned" : 2, 			"nscannedObjects" : 2, 			"n" : 1, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"y" : [ 					[ 						"d", 						"d" 					] 				] 			} 		} 	], 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 3, 	"millis" : 1 }
Disjoint $or Criteria 		{ 			"cursor" : "BtreeCursor x_1", 			"nscanned" : 2, 			"nscannedObjects" : 2, 			"n" : 2, 			"millis" : 0, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						5, 						5 					] 				] 			} 		},
Disjoint $or Criteria 		{ 			"cursor" : "BtreeCursor y_1", 			"nscanned" : 2, 			"nscannedObjects" : 2, 			"n" : 1, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"y" : [ 					[ 						"d", 						"d" 					] 				] 			} 		} 	], Only return one document matching this clause.
Disjoint $or Criteria 	"nscanned" : 4, 	"nscannedObjects" : 4, 	"n" : 3, 	"millis" : 1
Disjoint $or Criteria ? 1 3 4 5 6 7 9 5 7 5 b d g d a e f c g ✓
Disjoint $or Criteria ? 1 3 4 5 6 7 9 5 7 d b d g d a e f c g ✓ We have already scanned the x index for x:5.  So this document was returned already.  We don’t return it again.
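The per-clause counts in the explain output can be reproduced with a short simulation. This is a sketch under assumptions: `runOr` is a hypothetical helper, each clause is modeled as a predicate whose matching documents are exactly the keys its index scan would visit, and the nine documents are read off the diagram.

```javascript
const docs = [
  { x: 1, y: "b" }, { x: 3, y: "d" }, { x: 4, y: "g" },
  { x: 5, y: "c" }, { x: 5, y: "d" }, { x: 6, y: "a" },
  { x: 7, y: "e" }, { x: 7, y: "g" }, { x: 9, y: "f" },
];

// Runs clauses in order; a document is returned by a clause only if no
// earlier clause would already have returned it.
function runOr(collection, clauses) {
  return clauses.map((clause, k) => {
    let nscanned = 0;
    let n = 0;
    for (const doc of collection) {
      if (!clause(doc)) continue; // outside this clause's index bounds
      nscanned++;
      const alreadyReturned = clauses.slice(0, k).some((prev) => prev(doc));
      if (!alreadyReturned) n++;
    }
    return { nscanned, n };
  });
}

console.log(runOr(docs, [(d) => d.x === 5, (d) => d.y === "d"]));
// → [ { nscanned: 2, n: 2 }, { nscanned: 2, n: 1 } ]
```

The document {x:5,y:'d'} is scanned by both clauses but returned only by the first, giving a total n of 3.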
Unindexed $or Clause db.c.find( {$or:[{x:5},{y:'d'}]} ) Index {x:1} (no index on y)
Unindexed $or Clause > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { 	"cursor" : "BasicCursor", 	"nscanned" : 9, 	"nscannedObjects" : 9, 	"n" : 3, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 	} } Since y is not indexed, we must do a full collection scan to match y:’d’.  Since a full scan is required, we don’t use the index on x to match x:5.
Eliminated $or Clause db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ) Index {x:1}
Eliminated $or Clause 2 6 < ? < 8 1 2 3 4 6 7 9 5 5 ? 8 1 2 3 4 6 7 9 5
Eliminated $or Clause > db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain() { 	"cursor" : "BtreeCursor x_1", 	"nscanned" : 3, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 0, 	"nYields" : 0, 	"nChunkSkips" : 0, 	"isMultiKey" : false, 	"indexOnly" : false, 	"indexBounds" : { 		"x" : [ 			[ 				2, 				6 			] 		] 	} } The index range of the second clause is included in the index range of the first clause, so we use the first index range only.
Eliminated $or Clause with Differing Unindexed Criteria db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ) Index {x:1}
Eliminated $or Clause with Differing Unindexed Criteria < ? < and 1 3 4 5 6 7 9 5 7 2 6 1 3 4 5 6 7 9 5 7 5 b d g d a e f c g c b d g d a e f c g d and
Eliminated $or Clause with Differing Unindexed Criteria > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }
Eliminated $or Clause with Differing Unindexed Criteria 1 3 4 5 6 7 9 5 7 2 6 < ? < and , b d g d a e f c g c d The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.
Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ) Index {x:1}
Overlapping $or Clauses 2 6 < ? < 8 1 2 3 4 6 7 9 5 4 7 < ? < 8 1 2 3 4 6 7 9 5
Overlapping $or Clauses > db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }, { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ] } } ], "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1 }
Overlapping $or Clauses 		{ 			"cursor" : "BtreeCursor x_1", 			"nscanned" : 3, 			"nscannedObjects" : 3, 			"n" : 3, 			"millis" : 0, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						2, 						6 					] 				] 			} 		},
Overlapping $or Clauses 		{ 			"cursor" : "BtreeCursor x_1", 			"nscanned" : 1, 			"nscannedObjects" : 1, 			"n" : 1, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						6, 						7 					] 				] 			} 		} The index range scanned for the previous clause is removed.
Overlapping $or Clauses 2 6 < ? < 8 1 2 3 4 6 7 9 5 6 7 <= ? < 8 1 2 3 4 7 9 5 6
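The range subtraction shown here, where (4,7) becomes [6,7) after removing (2,6), can be sketched in one dimension. This is a simplified illustration with a hypothetical `subtractEarlier` helper and interval shape; it only handles an earlier range that covers the lower end of the current one, which is the case on these slides.

```javascript
// Interval shape (an assumption for this sketch):
// { lo, hi, loInc, hiInc } where loInc/hiInc mark inclusive ends.
function subtractEarlier(current, earlier) {
  const disjoint =
    earlier.hi < current.lo ||
    (earlier.hi === current.lo && !(earlier.hiInc && current.loInc));
  const coversLowEnd =
    earlier.lo < current.lo ||
    (earlier.lo === current.lo && (earlier.loInc || !current.loInc));
  if (disjoint || !coversLowEnd) return current; // nothing this sketch can remove

  const endsInside =
    earlier.hi < current.hi ||
    (earlier.hi === current.hi && !earlier.hiInc && current.hiInc);
  if (!endsInside) return null; // fully covered: the clause is eliminated

  // The remainder starts where the earlier range stopped; an exclusive end
  // on the earlier range becomes an inclusive start on the remainder.
  return { lo: earlier.hi, loInc: !earlier.hiInc, hi: current.hi, hiInc: current.hiInc };
}

const first = { lo: 2, hi: 6, loInc: false, hiInc: false };  // 2 < x < 6
const second = { lo: 4, hi: 7, loInc: false, hiInc: false }; // 4 < x < 7
console.log(subtractEarlier(second, first));
// → { lo: 6, loInc: true, hi: 7, hiInc: false }
console.log(subtractEarlier({ lo: 5, hi: 5, loInc: true, hiInc: true }, first));
// → null
```

The second call corresponds to the eliminated {x:5} clause from the earlier example: its range lies entirely inside 2 < x < 6.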
2D Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ) Index {x:1,y:1}
2D Overlapping $or Clauses y f Clause 1 e Clause 2 b x 7 6 2
2D Overlapping $or Clauses > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain() { 	"clauses" : [ 		{ 			"cursor" : "BtreeCursor x_1_y_1", 			"nscanned" : 4, 			"nscannedObjects" : 3, 			"n" : 3, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						2, 						6 					] 				], 				"y" : [ 					[ 						"b", 						"f" 					] 				] 			} 		}, 		{ 			"cursor" : "BtreeCursor x_1_y_1", 			"nscanned" : 0, 			"nscannedObjects" : 0, 			"n" : 0, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						6, 						7 					] 				], 				"y" : [ 					[ 						"b", 						"e" 					] 				] 			} 		} 	], 	"nscanned" : 4, 	"nscannedObjects" : 3, 	"n" : 3, 	"millis" : 1 }
2D Overlapping $or Clauses 		{ 			"cursor" : "BtreeCursor x_1_y_1", 			"nscanned" : 4, 			"nscannedObjects" : 3, 			"n" : 3, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						2, 						6 					] 				], 				"y" : [ 					[ 						"b", 						"f" 					] 				] 			} 		},
2D Overlapping $or Clauses 		{ 			"cursor" : "BtreeCursor x_1_y_1", 			"nscanned" : 0, 			"nscannedObjects" : 0, 			"n" : 0, 			"millis" : 1, 			"nYields" : 0, 			"nChunkSkips" : 0, 			"isMultiKey" : false, 			"indexOnly" : false, 			"indexBounds" : { 				"x" : [ 					[ 						6, 						7 					] 				], 				"y" : [ 					[ 						"b", 						"e" 					] 				] 			} 		} 	], The index range scanned for the previous clause is removed.
2D Overlapping $or Clauses y We only have to scan the remainder here f Clause 1 e Clause 2 b x 7 6 2
Overlapping $or Clauses Rule of thumb for n dimensions: we subtract earlier clause boxes from the current box when the result is one or more boxes. [Diagram: overlaps of boxes 1 and 2 where the remainder is itself one or more boxes, marked ✓]
Overlapping $or Clauses [Diagram: an overlap of boxes 1 and 2 where the remainder is not a box, marked ✗]
$or TODO: Use indexes on $or fields to satisfy a sort specification (SERVER-1205). Use the full query optimizer to select $or clause indexes in getMore (SERVER-1215). Improve index range elimination (handling some cases where the remainder is not a box).
Automatic Index Selection (Query Optimizer)
Optimal Index find( {x:5} ): index {x:1} or index {x:1,y:1}. find( {x:5} ).sort( {y:1} ): index {x:1,y:1}. find( {} ).sort( {x:1} ): index {x:1}. find( {x:{$gt:1,$lt:7}} ).sort( {x:1} ): index {x:1}.
Optimal Index Rule of Thumb: No scanAndOrder. All fields with index-useful constraints are indexed. If there is a range or sort, it is on the last field of the index used to resolve the query. If multiple optimal indexes exist, one is chosen arbitrarily.
Optimal Index These same criteria are useful when you are designing your indexes.
Multiple Candidate Indexes find( {x:4,y:'a'} ): index {x:1} or {y:1}? find( {x:4} ).sort( {y:1} ): index {x:1} or {y:1}? Note: {x:1,y:1} is optimal. find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} ): index {x:1,y:1} or {y:1,x:1}?
Multiple Candidate Indexes The only index selection criterion is nscanned. find( {x:4,y:'a'} ): index {x:1} or {y:1}? If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}. find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} ): index {x:1,y:1} or {y:1,x:1}? If there are fewer distinct values of x in 2 < x < 7 than distinct values of y in 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb).
Multiple Candidate Indexes The only index selection criterion is nscanned. Pretty good, but it doesn't cover every case, e.g.: the cost of scanAndOrder vs. an ordered index; the cost of loading the full document vs. just the index key; the cost of scanning adjacent btree keys vs. non-adjacent keys/documents.
Competing Indexes At most one query plan per index. Plans run in interleaved fashion. Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.
Competing Indexes Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query). We only allow plans to compete in initial query.  In getMore, we continue reading from the index cursor established by the initial query.
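The interleaved competition described above can be sketched as a small scheduler. This is a simplified model under assumptions: `race` is a hypothetical function, each plan is an array with one entry per index key scanned (a document id when the key's document matches, null otherwise), and a linear search for the lowest nscanned stands in for the priority queue.

```javascript
// plans: [{ name, keys }], wanted: number of results that satisfies the
// initial query request.
function race(plans, wanted = Infinity) {
  const state = plans.map((p) => ({
    name: p.name,
    it: p.keys[Symbol.iterator](),
    nscanned: 0,
    results: [],
  }));
  for (;;) {
    // Always advance the plan that has scanned the fewest keys so far
    // (ties go to the plan listed first).
    const s = state.reduce((best, cur) => (cur.nscanned < best.nscanned ? cur : best));
    const step = s.it.next();
    if (step.done) return s; // this plan has returned all of its results
    s.nscanned++;
    if (step.value !== null) s.results.push(step.value);
    if (s.results.length >= wanted) return s; // enough for the initial request
  }
}

const winner = race([
  { name: "x_1", keys: [null, null, "d1", null, "d2"] }, // many non-matching keys
  { name: "y_1", keys: ["d1", "d2"] },                   // every key matches
]);
console.log(winner.name, winner.nscanned, winner.results);
// → y_1 2 [ 'd1', 'd2' ]
```

Because the cheaper plan finishes after two keys, the more expensive plan is abandoned after scanning only three of its five keys.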
“Learning” a Query Plan When an index is chosen for a query, the query's “pattern” and nscanned are recorded. find( {x:3,y:'c'} ) {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50} find( {x:{$gt:5},y:{$lt:'z'}} ) {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}
“Learning” a Query Plan When a new query matches the same pattern, the same query plan is used. find( {x:5,y:'z'} ): use index {x:1}. find( {x:{$gt:20},y:{$lt:'b'}} ): use index {y:1}.
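A minimal sketch of a pattern-keyed plan cache along these lines. The function names, the cache structure, and the 'gt and lt bounds' label are assumptions for illustration; only the 'equality', 'gt bound', and 'lt bound' labels come from the slides.

```javascript
// Reduce a query to its pattern: which fields are constrained and how,
// ignoring the actual values.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) {
    const v = query[field];
    if (v === null || typeof v !== "object") {
      pattern[field] = "equality";
      continue;
    }
    const hasLower = "$gt" in v || "$gte" in v;
    const hasUpper = "$lt" in v || "$lte" in v;
    pattern[field] =
      hasLower && hasUpper ? "gt and lt bounds" // hypothetical label
      : hasLower ? "gt bound"
      : hasUpper ? "lt bound"
      : "equality";
  }
  return JSON.stringify(pattern);
}

const planCache = new Map();

function recordPlan(query, index, nscanned) {
  planCache.set(queryPattern(query), { index, nscanned });
}

function cachedPlan(query) {
  return planCache.get(queryPattern(query));
}

recordPlan({ x: 3, y: "c" }, "x_1", 50);
recordPlan({ x: { $gt: 5 }, y: { $lt: "z" } }, "y_1", 500);

console.log(cachedPlan({ x: 5, y: "z" }).index);                     // → x_1
console.log(cachedPlan({ x: { $gt: 20 }, y: { $lt: "b" } }).index); // → y_1
console.log(cachedPlan({ x: { $lt: 1 } }));                         // → undefined
```

A query with an unseen pattern gets no cached plan, which is the point at which the candidate indexes would compete again.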

Mais conteúdo relacionado

Destaque

The Physical Interface
The Physical InterfaceThe Physical Interface
The Physical InterfaceJosh Clark
 
Mobile-First SEO - The Marketers Edition #3XEDigital
Mobile-First SEO - The Marketers Edition #3XEDigitalMobile-First SEO - The Marketers Edition #3XEDigital
Mobile-First SEO - The Marketers Edition #3XEDigitalAleyda Solís
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)Cloudera, Inc.
 
Using Big Data to Transform Your Customer’s Experience - Part 1

MongoDB Indexing: The Details

  • 1. MongoDB Indexing and Query Optimizer Details. Aaron Staple, MongoSV, December 3, 2010
  • 2. What will we cover? Many details of how indexing and the query optimizer work. A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations. We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge). Much of the material will be presented through examples. Diagrams are to aid understanding; some details will be left out.
  • 3. What will we cover? Basic index bounds; compound key index bounds; $or queries; automatic index selection.
  • 4. How will we cover it? We’re going to try to cover this material interactively - please volunteer your thoughts on what mongo should do in given scenarios when I ask. Pertinent questions are welcome, but please keep off-topic or specialized questions until the end so we don’t lose momentum.
  • 5. Btree (just a conceptual diagram) [Diagram: a btree holding index keys 1 through 9; key 6 points to the document {_id:4,x:6}.]
  • 7. Find One Document db.c.find( {x:6} ).limit( 1 ) Index {x:1}
  • 8. Find One Document [Diagram: search for 6 among index keys 1 2 3 4 5 6 7 8 9; key 6 points to {_id:4,x:6}.]
  • 9. Find One Document > db.c.find( {x:6} ).limit( 1 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 10. Find One Document "indexBounds" : { "x" : [ [ 6, 6 ] ] }
  • 11. Find One Document "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,
  • 12. Find One Document 6 ? 5 2 7 6 8 9 1 3 4 {_id:4,x:6}
  • 13. Find One Document 6 ? 5 2 7 6 8 9 1 3 4 {_id:4,x:6}
  • 14. Find One Document. Now we have duplicate x values. [Diagram: search for 6 among index keys 1 2 3 4 5 6 6 6 9; one of the 6 keys points to {_id:4,x:6}.]
  • 15. Find One Document 6 ? 5 2 6 6 6 9 1 3 4 {_id:4,x:6}
  • 16. Equality Match db.c.find( {x:6} ) Index {x:1}
  • 17. Equality Match [Diagram: search for 6 among index keys 1 2 3 4 5 6 6 6 9; the three 6 keys point to {_id:5,x:6}, {_id:4,x:6}, and {_id:1,x:6}.]
  • 18. Equality Match > db.c.find( {x:6} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 19. Equality Match "indexBounds" : { "x" : [ [ 6, 6 ] ] }
  • 20. Equality Match "nscanned" : 3, "nscannedObjects" : 3, "n" : 3,
  • 21. Equality Match 6 ? 5 2 6 6 6 9 1 3 4
  • 22. Full Document Matcher db.c.find( {x:6,y:1} ) Index {x:1}
  • 23. Full Document Matcher [Diagram: search for 6 among index keys 1 2 3 4 5 6 6 6 9; the three 6 keys point to {y:5,x:6}, {y:4,x:6}, and {y:1,x:6}.]
  • 24. Full Document Matcher > db.c.find( {x:6,y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 25. Full Document Matcher "indexBounds" : { "x" : [ [ 6, 6 ] ] }
  • 26. Full Document Matcher "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, The documents for all three matching index keys are scanned, but only one document also matches on the non-indexed key y.
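
The counts on this slide can be reproduced with a small model. The sketch below is a simplified illustration, not MongoDB's implementation: the index is a sorted array, `findWithMatcher` is a hypothetical helper, and the documents without a `y` field are filler added for the example.

```javascript
// Simplified model of an index on {x:1}: entries sorted by key, each pointing at a document.
// Only the three x:6 documents come from the slide; the rest are assumed filler.
const docs = [
  { x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }, { x: 5 },
  { x: 6, y: 5 }, { x: 6, y: 4 }, { x: 6, y: 1 }, { x: 9 },
];
const index = docs
  .map((doc) => ({ key: doc.x, doc }))
  .sort((a, b) => a.key - b.key);

// Scan the bounds [xValue, xValue] and run the full document matcher on each hit.
function findWithMatcher(index, xValue, matcher) {
  const stats = { nscanned: 0, nscannedObjects: 0, n: 0 };
  for (const entry of index) {
    if (entry.key < xValue) continue; // a real btree would seek here instead of iterating
    if (entry.key > xValue) break;    // past the upper bound, stop scanning
    stats.nscanned += 1;              // index key examined
    stats.nscannedObjects += 1;       // document loaded so the matcher can check y
    if (matcher(entry.doc)) stats.n += 1;
  }
  return stats;
}

const stats = findWithMatcher(index, 6, (doc) => doc.y === 1);
console.log(stats); // { nscanned: 3, nscannedObjects: 3, n: 1 }
```

The index narrows the scan to the three x:6 keys, but y is not in the index, so each of those documents has to be loaded before it can be accepted or rejected.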
  • 27. Range Match db.c.find( {x:{$gte:4,$lte:7}} ) Index {x:1}
  • 28. Range Match [Diagram: 4 <= ? <= 7 over index keys 1 2 3 4 5 6 7 8 9.]
  • 29. Range Match > db.c.find( {x:{$gte:4,$lte:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }
  • 30. Range Match "indexBounds" : { "x" : [ [ 4, 7 ] ] }
  • 31. Range Match "nscanned" : 4, "nscannedObjects" : 4, "n" : 4,
  • 32. Range Match 5 2 7 6 8 9 1 3 4
  • 33. Exclusive Range Match db.c.find( {x:{$gt:4,$lt:7}} ) Index {x:1}
  • 34. Exclusive Range Match [Diagram: 4 < ? < 7 over index keys 1 2 3 4 5 6 7 8 9.]
  • 35. Exclusive Range Match > db.c.find( {x:{$gt:4,$lt:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }
  • 36. Exclusive Range Match "indexBounds" : { "x" : [ [ 4, 7 ] ] } Explain doesn’t indicate that the range is exclusive.
  • 37. Exclusive Range Match "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, But index keys matching the range bounds are not scanned because the bounds are exclusive.
  • 38. Exclusive Range Match 5 2 7 6 8 9 1 3 4
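
The difference between the inclusive and exclusive scans can be sketched as follows. This is an illustrative model over a sorted array of keys, assuming a binary search stands in for the btree seek; `lowerPosition` and `rangeScan` are hypothetical names, not MongoDB functions.

```javascript
const keys = [1, 2, 3, 4, 5, 6, 7, 8, 9]; // sorted index keys on x

// Binary search: first position whose key is >= target (or > target when exclusive).
function lowerPosition(keys, target, exclusive) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const before = exclusive ? keys[mid] <= target : keys[mid] < target;
    if (before) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Count the keys visited between the lower and upper bound.
function rangeScan(keys, low, high, exclusive) {
  let nscanned = 0;
  for (let i = lowerPosition(keys, low, exclusive); i < keys.length; i++) {
    const pastEnd = exclusive ? keys[i] >= high : keys[i] > high;
    if (pastEnd) break;
    nscanned += 1;
  }
  return nscanned;
}

const inclusiveCount = rangeScan(keys, 4, 7, false); // models {$gte:4,$lte:7}
const exclusiveCount = rangeScan(keys, 4, 7, true);  // models {$gt:4,$lt:7}
console.log({ inclusiveCount, exclusiveCount }); // { inclusiveCount: 4, exclusiveCount: 2 }
```

Both scans report the same bounds [4, 7], yet the exclusive one never touches keys 4 and 7, which is why nscanned drops from 4 to 2.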
  • 40. Multikeys [Diagram: ? > 7 over index keys 1 2 3 4 5 6 7 8 9; keys 8 and 9 both point to {_id:4,x:[8,9]}.]
  • 41. Multikeys > db.c.find( {x:{$gt:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] } }
  • 42. Multikeys "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] }
  • 43. Multikeys "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
  • 44. Multikeys 5 2 7 6 8 9 1 3 4
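
A minimal sketch of the multikey behavior, assuming each array element becomes its own index key and that duplicates are rejected by remembering the `_id` values already returned. The helper names and the two non-array documents are assumptions made for the example.

```javascript
// Only {_id:4,x:[8,9]} comes from the slide; the other documents are assumed filler.
const docs = [
  { _id: 1, x: 5 },
  { _id: 2, x: 7 },
  { _id: 4, x: [8, 9] },
];

// Each array element gets its own index key pointing at the same document.
function buildIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    const values = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const key of values) entries.push({ key, doc });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Models {x:{$gt:bound}}: scan keys above the bound, returning each document once.
function findGreaterThan(index, bound) {
  const seen = new Set(); // _id values already returned
  const stats = { nscanned: 0, n: 0, results: [] };
  for (const entry of index) {
    if (entry.key <= bound) continue;
    stats.nscanned += 1;
    if (seen.has(entry.doc._id)) continue; // same document reached through another key
    seen.add(entry.doc._id);
    stats.results.push(entry.doc);
    stats.n += 1;
  }
  return stats;
}

const stats = findGreaterThan(buildIndex(docs), 7);
console.log(stats.nscanned, stats.n); // 2 1
```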
  • 48. Range Types db.c.find( {x:{$gt:4,$lt:7}} ) "indexBounds" : { "x" : [ [ 4, 7 ] ] }
  • 49. Range Types db.c.find( {x:{$gt:4}} ) "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ] }
  • 50. Range Types db.c.find( {x:{$ne:4}} ) "indexBounds" : { "x" : [ [ { "$minElement" : 1 }, 4 ], [ 4, { "$maxElement" : 1 } ] ] }
  • 51. Range Types db.c.find( {x:/^a/} ) "indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] }
  • 52. Range Types db.c.find( {x:/a/} ) "indexBounds" : { "x" : [ [ "", { } ], [ /a/, /a/ ] ] }
  • 53. Set Match db.c.find( {x:{$in:[3,6]}} ) Index {x:1}
  • 54. Set Match [Diagram: search for 3 and 6 among index keys 1 2 3 4 5 6 7 8 9.]
  • 55. Set Match > db.c.find( {x:{$in:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 8, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] } }
  • 56. Set Match "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] }
  • 57. Set Match "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, Why is nscanned 3? This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
  • 58. Set Match 5 2 7 6 8 9 1 3 4
  • 59. All Match db.c.find( {x:{$all:[3,6]}} ) Index {x:1}
  • 60. All Match [Diagram: search for 3 among index keys 1 2 3 4 5 6 7 8 9; key 3 points to {_id:4,x:[3,6]}.]
  • 61. All Match > db.c.find( {x:{$all:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ] ] } }
  • 62. All Match "indexBounds" : { "x" : [ [ 3, 3 ] ] } The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.
  • 63. All Match "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,
  • 64. All Match 5 2 7 6 8 9 1 3 4
  • 65. Limit db.c.find( {x:{$lt:6},y:3} ).limit( 3 ) Index {x:1}
  • 66. Limit [Diagram: ? < 6 over index keys 1 2 3 4 5 6 7 8 9; the five keys below 6 point to four documents with y:3 and one with y:1.]
  • 67. Limit > db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
  • 68. Limit "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
  • 69. Limit "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, Scan until three matches are found, then stop.
  • 70. Skip db.c.find( {x:{$lt:6},y:3} ).skip( 3 ) Index {x:1}
  • 71. Skip [Diagram: ? < 6 over index keys 1 2 3 4 5 6 7 8 9; the five keys below 6 point to four documents with y:3 and one with y:1.]
  • 72. Skip > db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
  • 73. Skip "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
  • 74. Skip "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, All skipped documents are scanned.
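
The limit and skip counts can be reproduced with one scan loop. This is a sketch under assumptions: the flattened diagram does not show which key holds the y:1 document, so its position at x:3 below is a guess, and `scan` is a hypothetical helper rather than anything in MongoDB.

```javascript
// Keys 1..5 satisfy x < 6. Placing the y:1 document at x:3 is an assumption.
const index = [
  { key: 1, doc: { x: 1, y: 3 } },
  { key: 2, doc: { x: 2, y: 3 } },
  { key: 3, doc: { x: 3, y: 1 } },
  { key: 4, doc: { x: 4, y: 3 } },
  { key: 5, doc: { x: 5, y: 3 } },
  { key: 6, doc: { x: 6, y: 3 } },
];

// Scan keys below `upper`, apply the matcher, then honor skip and limit.
function scan(index, { upper, matcher, skip = 0, limit = Infinity }) {
  const stats = { nscanned: 0, n: 0 };
  let toSkip = skip;
  for (const entry of index) {
    if (entry.key >= upper) break;   // end of the index bounds
    if (stats.n >= limit) break;     // enough results, stop early
    stats.nscanned += 1;
    if (!matcher(entry.doc)) continue;
    if (toSkip > 0) { toSkip -= 1; continue; } // skipped documents are still scanned
    stats.n += 1;
  }
  return stats;
}

const matchY3 = (doc) => doc.y === 3;
const limited = scan(index, { upper: 6, matcher: matchY3, limit: 3 });
const skipped = scan(index, { upper: 6, matcher: matchY3, skip: 3 });
console.log(limited, skipped); // { nscanned: 4, n: 3 } { nscanned: 5, n: 1 }
```

With a limit the scan stops as soon as the third match is found, while with a skip every skipped document is still visited before the first result is returned.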
  • 75. Sort db.c.find( {x:{$lt:6}} ).sort( {x:1} ) Index {x:1}
  • 76. 8 Sort 6 ? < 1 2 3 4 5 6 7 9 y:3 y:3 y:3 y:1 y:3
  • 77. Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
  • 78. Sort "cursor" : "BtreeCursor x_1",
  • 79. Sort db.c.find( {x:{$lt:6}} ).sort( {y:1} ) Index {x:1}
  • 80. 8 Sort 6 ? < 1 2 3 4 5 6 7 9 y:3 y:3 y:3 y:1 y:3
  • 81. Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
  • 82. Sort "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, Results are sorted on the fly to match requested order. The scanAndOrder field is only printed when its value is true.
  • 83. Sort and scanAndOrder With “scanAndOrder” sort, all documents must be touched even if there is a limit spec. With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
  • 85. Count [Diagram: 4 <= ? <= 7 over index keys 1 2 3 4 5 6 7 8 9.]
  • 86. Count. We’re just counting keys here, not loading the full documents. [Btree diagram over keys 1 through 9.]
  • 88. $size
  • 90. Negation - $ne, $nin, $not, etc. With current semantics, all multikey elements must match negation constraints. Multikey de-duplication works without loading the full document.
  • 91. Covered Indexes db.c.find( {x:6}, {x:1,_id:0} ) Index {x:1} The _id field would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.
  • 92. Covered Indexes [Diagram: search for 6 among index keys 1 2 3 4 5 6 7 8 9; key 6 points to {_id:4,x:6}.]
  • 93. Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 94. Covered Indexes "isMultiKey" : false, "indexOnly" : true,
  • 95. Covered Indexes [Diagram: search for 6 among index keys 1 2 3 4 5 6 7 8 9; key 6 points to {_id:4,x:[6,7]}.]
  • 96. Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 97. Covered Indexes "isMultiKey" : true, "indexOnly" : false, Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array. But when all multikey docs are removed we don’t reset isMultiKey. This can be improved.
  • 98. Update db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} ) Index {x:1}
  • 99. Update [Diagram: 4 <= ? <= 7 over index keys 1 2 3 4 5 6 7 8 9; key 4 points to {_id:4,x:4}.]
  • 100. Update 5 2 7 6 8 9 1 3 4 {_id:4,x:4}
  • 101. Update 5 2 7 6 8 9 1 3 4 {_id:4,x:4}
  • 102. Update 5 2 7 6 8 9 1 2 3 {_id:4,x:2}
  • 103. Update We track the set of documents that have been updated in the course of the current operation so they are only updated once.
  • 105. Two Equality Bounds db.c.find( {x:5,y:'c'} ) Index {x:1,y:1}
  • 106. Two Equality Bounds ? 5 1 3 4 5 5 6 7 9 5 c b d g d f c a b c
  • 107. Two Equality Bounds > db.c.find( {x:5,y:'c'} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ] } }
  • 108. Two Equality Bounds "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ] }
  • 109. Two Equality Bounds "nscanned" : 1, "nscannedObjects" : 1, "n" : 1,
  • 110. Two Equality Bounds ? 1 3 4 5 5 5 5 6 7 9 b d g c d f c c a b
  • 111. Equality and Set db.c.find( {x:5,y:{$in:['c','f']}} ) Index {x:1,y:1}
  • 112. Equality and Set , 5 1 3 4 5 5 6 7 9 5 5 c b d g d f c a b c f
  • 113. Equality and Set > db.c.find( {x:5,y:{$in:['c','f']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }
  • 114. Equality and Set "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] }
  • 115. Equality and Set "nscanned" : 3, "nscannedObjects" : 2, "n" : 2,
  • 116. Equality and Set 1 3 4 5 5 5 6 7 9 b d g c d f c a b
  • 117. Equality and Range db.c.find( {x:5,y:{$gte:'d'}} ) Index {x:1,y:1}
  • 118. Equality and Range [Diagram: (5, 'd') <= ? <= (5, max string) over the compound index keys.]
  • 119. Equality and Range > db.c.find( {x:5,y:{$gte:'d'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "d", { } ] ] } }
  • 120. Equality and Range "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "d", { } ] ] }
  • 121. Equality and Range "nscanned" : 2, "nscannedObjects" : 2, "n" : 2,
  • 122. Equality and Range 1 3 4 5 5 5 6 7 9 b d g c d f c a b
  • 123. Two Set Bounds db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ) Index {x:1,y:1}
  • 124. Two Set Bounds , , , 5 1 3 4 5 5 6 7 9 5 5 9 9 c b d g d f c a f c f c f
  • 125. Two Set Bounds > db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }
  • 126. Two Set Bounds "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] }
  • 127. Two Set Bounds "nscanned" : 5, "nscannedObjects" : 3, "n" : 3,
  • 128. Two Set Bounds 1 3 4 5 5 5 6 7 9 b d g c d f c a f
  • 129. Set and Range db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ) Index {x:1,y:1}
  • 130. Set and Range [Diagram: (5, min string) <= ? <= (5, 'd') and (9, min string) <= ? <= (9, 'd') over the compound index keys.]
  • 131. Set and Range > db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "", "d" ] ] } }
  • 132. Set and Range "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "", "d" ] ] }
  • 133. Set and Range "nscanned" : 5, "nscannedObjects" : 3, "n" : 3,
  • 134. Range and Equality db.c.find( {x:{$gte:4},y:'c'} ) Index {x:1,y:1}
  • 135. Range and Equality ? and ? >= 4 1 3 4 5 6 7 9 5 8 b d g d a e f c c c
  • 136. Range and Equality > db.c.find( {x:{$gte:4},y:'c'} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 7, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "c", "c" ] ] } }
  • 137. Range and Equality "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "c", "c" ] ] }
  • 138. Range and Equality "nscanned" : 7, "nscannedObjects" : 2, "n" : 2, High nscanned because every distinct value of x must be checked.
  • 139. Range and Equality 1 3 4 5 5 9 6 7 8 b d g c d f a e c
  • 140. Range and Equality 1 3 4 5 5 9 6 7 8 b d g c d f a e c Every distinct value of x must be checked.
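
Why nscanned reaches 7 here can be seen in a simplified model of the compound index. The sketch assumes the key pairs read positionally off the flattened diagram, which is a reconstruction, and it visits every key in the x range one by one; the real cursor's skipping strategy is not shown in this part of the talk, so `rangeThenEquality` is only an illustration.

```javascript
// Compound index {x:1,y:1}: entries sorted by x, then y.
// The (x, y) pairing is reconstructed from the diagram and should be treated as an assumption.
const index = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"],
  [6, "a"], [7, "e"], [8, "c"], [9, "f"],
].map(([x, y]) => ({ x, y }));

// Models {x:{$gte:minX}, y:yValue}.
function rangeThenEquality(index, minX, yValue) {
  const stats = { nscanned: 0, nscannedObjects: 0, n: 0 };
  for (const key of index) {
    if (key.x < minX) continue;     // a real btree would seek here instead of iterating
    stats.nscanned += 1;            // every key in the x range is examined
    if (key.y !== yValue) continue; // rejected from the key alone, no document load
    stats.nscannedObjects += 1;
    stats.n += 1;
  }
  return stats;
}

const stats = rangeThenEquality(index, 4, "c");
console.log(stats); // { nscanned: 7, nscannedObjects: 2, n: 2 }
```

Because x is the leading field and is only bounded from below, the matching y values are scattered across the whole x range, so the equality on y cannot narrow the scan to a single contiguous run of keys.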
  • 141. Range and Set db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ) Index {x:1,y:1}
  • 142. Range and Set , and ? >= 4 1 3 4 5 6 7 9 5 8 b d g d a e f c c c a
  • 143. Range and Set > db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 7, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "a", "a" ], [ "c", "c" ] ] } }
  • 144. Range and Set "indexBounds" : { "x" : [ [ 4, 1.7976931348623157e+308 ] ], "y" : [ [ "a", "a" ], [ "c", "c" ] ] }
  • 145. Range and Set "nscanned" : 7, "nscannedObjects" : 3, "n" : 3,
  • 146. Range and Set [Diagram: scan of the {x:1,y:1} index for x >= 4 and y in {'a', 'c'}.]
  • 147. Range and Set [Diagram: scan of the {x:1,y:1} index for x >= 4 and y in {'a', 'c'}.] Every distinct value of x must be checked for y values 'a' and 'c'.
  • 148. Two Ranges (2D Box) db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ) Index {x:1,y:1}
  • 149. Two Ranges (2D Box) [Diagram: the query {x:{$gte:3,$lte:7}, y:{$gte:'c',$lte:'f'}} drawn as a box in the x-y plane, spanning x from 3 to 7 and y from 'c' to 'f'.]
  • 150. Two Ranges (2D Box) [Diagram: the {x:1,y:1} index is scanned for 3 <= x <= 7 and 'c' <= y <= 'f'.]
  • 151. Two Ranges (2D Box) > db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 6, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 7 ] ], "y" : [ [ "c", "f" ] ] } }
  • 152. Two Ranges (2D Box) "indexBounds" : { "x" : [ [ 3, 7 ] ], "y" : [ [ "c", "f" ] ] }
  • 153. Two Ranges (2D Box) "nscanned" : 6, "nscannedObjects" : 4, "n" : 4,
  • 154. Two Ranges (2D Box) [Diagram: scan of the {x:1,y:1} index for 3 <= x <= 7 and 'c' <= y <= 'f'.]
  • 155. Two Ranges (2D Box) For every distinct value of x in the range 3 <= x <= 7, scan for every value of y in the range 'c' <= y <= 'f'.
  • 156. $or
  • 157. Disjoint $or Criteria db.c.find( {$or:[{x:5},{y:'d'}]} ) Indexes {x:1}, {y:1}
  • 158. Disjoint $or Criteria [Diagram: the {x:1} index is scanned for x = 5 and the {y:1} index is scanned for y = 'd'.]
  • 159. Disjoint $or Criteria > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } }, { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } } ], "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1 }
  • 160. Disjoint $or Criteria { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } },
  • 161. Disjoint $or Criteria { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } } ], Only one document matching this clause is returned.
  • 162. Disjoint $or Criteria "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1
  • 163. Disjoint $or Criteria [Diagram: scanning the {x:1} index for x = 5.]
  • 164. Disjoint $or Criteria [Diagram: scanning the {y:1} index for y = 'd'.] We have already scanned the x index for x:5, so this document was returned already. We don’t return it again.
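The per-clause counts in the explain output can be reproduced with a small simulation. This is a plain JavaScript sketch, not mongo shell code: the documents are hypothetical (values loosely follow the diagram), `runOr` is an invented helper, and the duplicate check used here (skip a document if an earlier clause matches it) is an assumption about one way to avoid returning a document twice.

```javascript
// Hypothetical documents; values loosely follow the slide's diagram.
const docs = [
  { _id: 1, x: 1, y: 'b' }, { _id: 2, x: 3, y: 'd' }, { _id: 3, x: 4, y: 'g' },
  { _id: 4, x: 5, y: 'd' }, { _id: 5, x: 6, y: 'a' }, { _id: 6, x: 7, y: 'e' },
  { _id: 7, x: 9, y: 'f' }, { _id: 8, x: 5, y: 'c' }, { _id: 9, x: 7, y: 'g' },
];

// Run each $or clause as its own equality scan on a single field.
// The filter over the array is a stand-in for an index range scan.
function runOr(collection, clauses) {
  const results = [];
  const perClause = [];
  clauses.forEach(([field, value], i) => {
    let nscanned = 0;
    let n = 0;
    for (const doc of collection) {
      if (doc[field] !== value) continue;  // outside this clause's index range
      nscanned += 1;
      // Skip documents that an earlier clause has already returned.
      const seen = clauses.slice(0, i).some(([f, v]) => doc[f] === v);
      if (seen) continue;
      results.push(doc);
      n += 1;
    }
    perClause.push({ field, nscanned, n });
  });
  return { results, perClause };
}

const out = runOr(docs, [['x', 5], ['y', 'd']]);
for (const c of out.perClause) {
  console.log(c.field, c.nscanned, c.n); // x 2 2, then y 2 1
}
console.log(out.results.length); // 3
```

The second clause scans two keys but contributes only one document, matching "nscanned" : 2 and "n" : 1 for the y_1 cursor.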
  • 165. Unindexed $or Clause db.c.find( {$or:[{x:5},{y:'d'}]} ) Index {x:1} (no index on y)
  • 166. Unindexed $or Clause > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "cursor" : "BasicCursor", "nscanned" : 9, "nscannedObjects" : 9, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.
  • 167. Eliminated $or Clause db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ) Index {x:1}
  • 168. Eliminated $or Clause [Diagram: the {x:1} index with the range 2 < x < 6 for the first clause and the point x = 5 for the second clause.]
  • 169. Eliminated $or Clause > db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } } The index range of the second clause is included in the index range of the first clause, so we use the first index range only.
  • 170. Eliminated $or Clause with Differing Unindexed Criteria db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ) Index {x:1}
  • 171. Eliminated $or Clause with Differing Unindexed Criteria [Diagram: the first clause scans 2 < x < 6 and matches y = 'c'; the second clause scans x = 5 and matches y = 'd'.]
  • 172. Eliminated $or Clause with Differing Unindexed Criteria > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 2, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }
  • 173. Eliminated $or Clause with Differing Unindexed Criteria [Diagram: a single scan of 2 < x < 6, matching y = 'c' or y = 'd'.] The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.
  • 174. Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ) Index {x:1}
  • 175. Overlapping $or Clauses [Diagram: the range 2 < x < 6 for the first clause and the range 4 < x < 7 for the second clause.]
  • 176. Overlapping $or Clauses > db.d.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } }, { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ] } } ], "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1 }
  • 177. Overlapping $or Clauses { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ] } },
  • 178. Overlapping $or Clauses { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ] } } The index range scanned for the previous clause is removed.
  • 179. Overlapping $or Clauses [Diagram: the first clause scans 2 < x < 6; the second clause scans only the remainder, 6 <= x < 7.]
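Removing an already-scanned range from a later clause is interval subtraction. The following plain JavaScript sketch shows one way to compute it; the `{ lo, loInc, hi, hiInc }` representation and the `subtract` helper are assumptions for illustration and say nothing about how the server stores index bounds internally.

```javascript
// An interval is { lo, loInc, hi, hiInc }; the *Inc flags mark inclusive ends.
function isEmpty(r) {
  return r.lo > r.hi || (r.lo === r.hi && !(r.loInc && r.hiInc));
}

// Remove from `cur` everything already covered by `prev`.
// Returns zero, one, or two remaining intervals.
function subtract(cur, prev) {
  const pieces = [];

  // Part of cur lying below prev: cap the upper end at prev.lo.
  const below = { ...cur };
  if (prev.lo < below.hi) {
    below.hi = prev.lo;
    below.hiInc = !prev.loInc;
  } else if (prev.lo === below.hi) {
    below.hiInc = below.hiInc && !prev.loInc;
  }
  if (!isEmpty(below)) pieces.push(below);

  // Part of cur lying above prev: raise the lower end to prev.hi.
  const above = { ...cur };
  if (prev.hi > above.lo) {
    above.lo = prev.hi;
    above.loInc = !prev.hiInc;
  } else if (prev.hi === above.lo) {
    above.loInc = above.loInc && !prev.hiInc;
  }
  if (!isEmpty(above)) pieces.push(above);

  return pieces;
}

const first = { lo: 2, loInc: false, hi: 6, hiInc: false };   // 2 < x < 6
const second = { lo: 4, loInc: false, hi: 7, hiInc: false };  // 4 < x < 7
const rest = subtract(second, first);
console.log(rest.length, rest[0].lo, rest[0].loInc, rest[0].hi, rest[0].hiInc);
// 1 6 true 7 false, i.e. 6 <= x < 7

// The eliminated-clause case: x = 5 lies entirely inside 2 < x < 6.
const point = { lo: 5, loInc: true, hi: 5, hiInc: true };
console.log(subtract(point, first).length); // 0
```

The first result corresponds to the [ 6, 7 ] bounds reported for the second clause; the empty result corresponds to a clause that is dropped entirely.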
  • 180. 2D Overlapping $or Clauses db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ) Index {x:1,y:1}
  • 181. 2D Overlapping $or Clauses [Diagram: clause 1 is the box 2 < x < 6, 'b' < y < 'f'; clause 2 is the box 4 < x < 7, 'b' < y < 'e'.]
  • 182. 2D Overlapping $or Clauses > db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ], "y" : [ [ "b", "f" ] ] } }, { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 0, "nscannedObjects" : 0, "n" : 0, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ], "y" : [ [ "b", "e" ] ] } } ], "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1 }
  • 183. 2D Overlapping $or Clauses { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 4, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 2, 6 ] ], "y" : [ [ "b", "f" ] ] } },
  • 184. 2D Overlapping $or Clauses { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 0, "nscannedObjects" : 0, "n" : 0, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 7 ] ], "y" : [ [ "b", "e" ] ] } } ], The index range scanned for the previous clause is removed.
  • 185. 2D Overlapping $or Clauses [Diagram: the clause 1 and clause 2 boxes in the x-y plane.] We only have to scan the part of clause 2 that lies outside clause 1.
  • 186. Overlapping $or Clauses Rule of thumb for n dimensions: we subtract earlier clause boxes from the current box when the result is one or more boxes. [Diagram: three arrangements of boxes 1 and 2 where subtraction applies.]
  • 187. Overlapping $or Clauses Rule of thumb for n dimensions: we subtract earlier clause boxes from the current box when the result is one or more boxes. [Diagram: an arrangement of boxes 1 and 2 where subtraction does not apply.]
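One reading of this rule of thumb: the remainder is box-shaped when the earlier box covers the current box on every dimension except at most one. The plain JavaScript sketch below encodes that reading. It is an interpretation, not the server's algorithm; the helper names are invented, and ranges are simplified to half-open numeric ranges [lo, hi).

```javascript
// A box is an array of half-open numeric ranges [lo, hi), one per dimension.
// (Half-open numeric ranges are a simplification chosen for this sketch.)
function contains(outer, inner) {
  return outer[0] <= inner[0] && inner[1] <= outer[1];
}

// Pieces of range `cur` not covered by `prev` (zero, one, or two ranges).
function subtractRange(cur, prev) {
  const pieces = [];
  const belowHi = Math.min(cur[1], prev[0]);
  if (cur[0] < belowHi) pieces.push([cur[0], belowHi]);
  const aboveLo = Math.max(cur[0], prev[1]);
  if (aboveLo < cur[1]) pieces.push([aboveLo, cur[1]]);
  return pieces;
}

// Returns the boxes left to scan for `cur`, or null when the remainder
// would not be box-shaped, in which case no subtraction is attempted.
function subtractBox(cur, prev) {
  const open = [];  // dimensions on which prev does not cover cur
  cur.forEach((range, d) => {
    if (!contains(prev[d], range)) open.push(d);
  });
  if (open.length === 0) return [];   // fully covered: clause eliminated
  if (open.length > 1) return null;   // remainder is not a box
  const d = open[0];
  return subtractRange(cur[d], prev[d]).map((piece) => {
    const box = cur.map((r) => r.slice());
    box[d] = piece;
    return box;
  });
}

const clause1 = [[2, 6], [1, 5]];
console.log(JSON.stringify(subtractBox([[4, 7], [1, 4]], clause1))); // [[[6,7],[1,4]]]
console.log(subtractBox([[4, 7], [3, 8]], clause1));                 // null
```

In the first call the y range of the current box is covered by clause 1, so only x is trimmed. In the second call the boxes overlap at a corner, the remainder would be L-shaped, and the current box is left as it is.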
  • 188. $or TODO: Use indexes on $or fields to satisfy a sort specification (SERVER-1205). Use the full query optimizer to select $or clause indexes in getMore (SERVER-1215). Improve index range elimination (handling some cases where the remainder is not a box).
  • 190. Optimal Index find( {x:5} ): index {x:1} or index {x:1,y:1}. find( {x:5} ).sort( {y:1} ): index {x:1,y:1}. find( {} ).sort( {x:1} ): index {x:1}. find( {x:{$gt:1,$lt:7}} ).sort( {x:1} ): index {x:1}.
  • 191. Optimal Index Rule of Thumb: No scanAndOrder. All fields with index-useful constraints are indexed. If there is a range or sort, it is the last field of the index used to resolve the query. If multiple optimal indexes exist, one is chosen arbitrarily.
  • 192. Optimal Index These same criteria are useful when you are designing your indexes.
  • 193. Multiple Candidate Indexes find( {x:4,y:'a'} ): index {x:1} or {y:1}? find( {x:4} ).sort( {y:1} ): index {x:1} or {y:1}? (Note: {x:1,y:1} is optimal.) find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} ): index {x:1,y:1} or {y:1,x:1}?
  • 194. Multiple Candidate Indexes The only index selection criterion is nscanned. find( {x:4,y:'a'} ): index {x:1} or {y:1}? If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}. find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} ): index {x:1,y:1} or {y:1,x:1}? If there are fewer distinct values of x with 2 < x < 7 than distinct values of y with 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb).
  • 195. Multiple Candidate Indexes The only index selection criterion is nscanned. Pretty good, but it doesn’t cover every case, e.g.: cost of scanAndOrder vs. an ordered index; cost of loading the full document vs. just the index key; cost of scanning adjacent btree keys vs. non-adjacent keys/documents.
  • 196. Competing Indexes At most one query plan per index. Plans run in interleaved fashion. Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.
  • 197. Competing Indexes Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query). We only allow plans to compete in the initial query. In getMore, we continue reading from the index cursor established by the initial query.
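The interleaving described above can be sketched as a small race between plans. This is plain JavaScript with invented names (`makePlan`, `step`, `race`); each plan is reduced to a list of flags saying whether the key at each position yields a result, and a sorted array stands in for the priority queue. It illustrates the scheduling idea only and makes no claim about the server's implementation.

```javascript
// Each hypothetical plan walks a fixed list of index keys; matchFlags[i]
// says whether the key at position i yields a result document.
function makePlan(name, matchFlags) {
  return { name, matchFlags, nscanned: 0, results: 0, done: false };
}

// Advance a plan by one index key.
function step(plan) {
  if (plan.matchFlags[plan.nscanned]) plan.results += 1;
  plan.nscanned += 1;
  if (plan.nscanned >= plan.matchFlags.length) plan.done = true;
}

// Interleave plans, always advancing the one with the lowest nscanned,
// until some plan finishes or has produced `wanted` results.
function race(plans, wanted) {
  for (;;) {
    // A sorted array stands in for the priority queue ordered by nscanned.
    plans.sort((a, b) => a.nscanned - b.nscanned);
    const plan = plans[0];
    step(plan);
    if (plan.done || plan.results >= wanted) return plan;
  }
}

// Plan on {x:1} needs 5 keys to find both results; plan on {y:1} needs 3.
const planX = makePlan('x_1', [false, true, false, false, true, false]);
const planY = makePlan('y_1', [true, false, true]);
const winner = race([planX, planY], 2);
console.log(winner.name, winner.nscanned); // y_1 3
```

Because the plan with the lowest nscanned always advances next, the losing plan never gets far ahead of the winner, so the cost of the competition stays bounded by roughly the winner's nscanned times the number of plans.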
  • 198. “Learning” a Query Plan When an index is chosen for a query, the query’s “pattern” and nscanned are recorded. find( {x:3,y:'c'} ): {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}. find( {x:{$gt:5},y:{$lt:'z'}} ): {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}.
  • 199. “Learning” a Query Plan When a new query matches the same pattern, the same query plan is used. find( {x:5,y:'z'} ): use index {x:1}. find( {x:{$gt:20},y:{$lt:'b'}} ): use index {y:1}.
  • 200. “Un-Learning” a Query Plan A recorded plan is discarded after 100 writes to the collection, or when indexes are added or removed.
  • 201. Bad Plan Insurance If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan. Currently “much worse” means 10x
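The learning, un-learning, and insurance rules fit into a small cache keyed by query pattern. The sketch below is plain JavaScript with invented names (`queryPattern`, `PlanCache`). The pattern labels for anything beyond the slide's examples ('gt and lt bound', 'other'), clearing the whole cache on the 100th write, and treating "10x" as a strict greater-than comparison are all assumptions made for illustration.

```javascript
// Reduce a query to its "pattern": the kind of constraint on each field.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) {
    const cond = query[field];
    if (cond === null || typeof cond !== 'object') {
      pattern[field] = 'equality';
      continue;
    }
    const ops = Object.keys(cond);
    const lower = ops.some((op) => op === '$gt' || op === '$gte');
    const upper = ops.some((op) => op === '$lt' || op === '$lte');
    if (lower && upper) pattern[field] = 'gt and lt bound';  // hypothetical label
    else if (lower) pattern[field] = 'gt bound';
    else if (upper) pattern[field] = 'lt bound';
    else pattern[field] = 'other';                           // hypothetical label
  }
  return JSON.stringify(pattern);
}

class PlanCache {
  constructor() {
    this.plans = new Map();
    this.writes = 0;
  }

  // Remember which index won for this query pattern, and at what cost.
  record(query, index, nscanned) {
    this.plans.set(queryPattern(query), { index, nscanned });
  }

  lookup(query) {
    return this.plans.get(queryPattern(query));
  }

  // "Un-learn" recorded plans after 100 writes to the collection.
  noteWrite() {
    this.writes += 1;
    if (this.writes >= 100) {
      this.plans.clear();
      this.writes = 0;
    }
  }

  // Bad plan insurance: is this run more than 10x the recorded nscanned?
  muchWorse(query, nscanned) {
    const plan = this.lookup(query);
    return plan !== undefined && nscanned > 10 * plan.nscanned;
  }
}

const cache = new PlanCache();
cache.record({ x: 3, y: 'c' }, 'x_1', 50);
console.log(cache.lookup({ x: 5, y: 'z' }).index);  // x_1
console.log(cache.muchWorse({ x: 5, y: 'z' }, 600)); // true
```

A query with different values but the same constraint shape reuses the recorded index, and a run that is far more expensive than the recorded one is the signal to start interleaving other plans again.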
  • 202. Query Planner Ad hoc heuristics in some cases. They seem to work decently in practice.
  • 203. Feedback Large and small scale optimizer features are generally prioritized based on user input. Please use Jira to request new features and vote on existing feature requests.
  • 204. Thanks! Feature requests: jira.mongodb.org; support: groups.google.com/group/mongodb-user. Next up: Sharding Details with Eliot