SlideShare uma empresa Scribd logo
1 de 66
Senior Solutions Architect, 10gen
James Kerr
2.4 Sharding Features
Agenda
• Mechanics of sharding
– Key space
– Chunks
– Balancing
• Types of requests
• Hashed shard keys
– Why use hashed shard keys
– How to enable hashed shard keys
– Limitations
Sharded Cluster
Sharding your data
What is a Shard Key
• Shard key is used to partition your collection
• Shard key must exist in every document
• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• Shard key is used to route requests to shards
The key space
{x: 10} {x: -5} {x: -9} {x: 7} {x: 6} {x: 0}
Inserting data
{x: 0}{x: 6}{x: 7}{x: -5}{x: 10} {x: -9}
Inserting data
{x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
Chunk range and size
{x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
Inserting further data
{x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
{x: 9}{x: -7} {x: 3}
Chunk splitting
{x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
0 0
• Achunk is split once it exceeds the maximum size
• There is no split point if all documents have the same shard key
• Chunk split is a logical operation (no data is moved)
• If split creates too large of a discrepancy of chunk count across cluster
a balancing round starts
Data distribution
• MinKey to 0 lives on Shard1
• 0 to MaxKey lives on Shard2
• Mongos routes queries appropriately
Mongos routes data
minKey  0 0  maxKey
db.test.insert({ x: -1000 })
Mongos routes data
minKey  0 0  maxKey
db.test.insert({ x: -1000 })
Unbalanced shards
minKey  0 0  maxKey
Balancing
• Migration threshold
• Number of chunks less than 20, migration threshold of 2
• 21-80, migration threshold 4
• >80, migration threshold 8
Moving the chunk
• One chunk of data is copied from Shard 1 to Shard 2
Committing Migration
• Once everyone agrees the data has moved, that chunk gets
deleted from Shard 1.
Cleanup
• Other mongos' have to find out about new configuration
Migrations' effect
• Expensive
• Can take a long time
• Competes for limited resources
Picking a shard key
• Cardinality
• Optimize routing
• Minimize (unnecessary) traffic
• Allow best scaling
What about Object Id?
ObjectId("51597ca8e28587b86528edfd”)
• Used for _id
• 12 byte value
• Generated by the driver if not specified
• Theoretically globally unique
What about Object Id?
ObjectId("51597ca8e28587b86528edfd”)
12 Bytes
Timestamp
MAC
PID
Counter
// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }
// sharding the test collection
mongos> sh.shardCollection("test.test",{_id:1})
{ "collectionsharded" : "test.test", "ok" : 1 }
// create a loop inserting data
mongos> for (x=0; x<10000; x++) {
... db.test.insert({value:x})
... }
Sharding on ObjectId
shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }
databases:
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
test.test
shard key: { "_id" : 1 }
chunks:
shard0001 3
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId(”...") }
on : shard0001 { "t" : 1000, "i" : 1 }
{ "_id" : ObjectId(”...”) } -->> { "_id" : { "$maxKey" : 1 } }
on : shard0001 { "t" : 1000, "i" : 2 }
ObjectId chunk distribution
ObjectId gives a hot shard
minKey  0 0  maxKey
Sharding on incremental
values like timestamp is
not optimum for even
distribution
Hashed Shard Keys
Hashed Shard Keys
{x:2} md5 c81e728d9d4c2f636f067f89cc14862c
{x:3} md5 eccbc87e4b5ce2fe28308fd9f2a7baf3
{x:1} md5 c4ca4238a0b923820dcc509a6f75849b
Hashed shard key eliminates hot
shards
minKey  0 0  maxKey
Under the hood
• Create a hashed index used for sharding
• Uses the first 64-bits of md5 hash of field
• Uses existing hash index, or creates a new one
on a collection
• Hash both data and BSON type
• Represented as a NumberLong in the JS shell
// hash on 1 as an integer
> db.runCommand({_hashBSONElement:1})
{
"key" : 1,
"seed" : 0,
"out" : NumberLong("5902408780260971510"),
"ok" : 1
}
// hash on “1” as a string
> db.runCommand({_hashBSONElement:"1"})
{
"key" : "1",
"seed" : 0,
"out" : NumberLong("-2448670538483119681"),
"ok" : 1
}
Hash on simple or embedded BSON
values
Enabling hashed indexes
• Create index
– db.collection.ensureIndex( {field : ”hashed”} )
• Options
– Seed, specify a different seed to use
– hashVersion, at the moment only version 0 (md5).
Using hash shard keys
• Enable sharding on collection
– sh.shardCollection(“test.collection”, {field: “hashed”})
• Options
– numInitialChunks, specifies the number of initial chunks
per shard. Default is two chunks per shard (use
“sh._adminCommand” to specify options)
// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }
// shard by hashed _id field
mongos> sh.shardCollection("test.hash",{_id:"hashed"})
{ "collectionsharded" : "test.hash", "ok" : 1 }
Sharding on hashed ObjectId
databases:
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
test.hash
shard key: { "_id" : "hashed" }
chunks:
shard0000 2
shard0001 2
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on :
shard0000 { "t" : 2000, "i" : 2 }
{ "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000
{ "t" : 2000, "i" : 3 }
{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 {
"t" : 2000, "i" : 4 }
{ "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001
{ "t" : 2000, "i" : 5 }
Pre-splitting the data
// create a loop inserting data
mongos> for (x=0; x<10000; x++) {
... db.hash.insert({value:x})
... }
Inserting into hashed shard key
collection
test.hash
shard key: { "_id" : "hashed" }
chunks:
shard0000 4
shard0001 4
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-
7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8 }
{ "_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-
4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9 }
{ "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-
2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6 }
{ "_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0)
} on : shard0000 { "t" : 2000, "i" : 7 }
{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743")
} on : shard0001 { "t" : 2000, "i" : 12 }
Even distribution of chunks
Routing Requests
Cluster Request Routing
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
Cluster Request Routing: Targeted
Query
Routable request received
Request routed to appropriate shard
Shard returns results
Mongos returns results to client
Cluster Request Routing: Non-Targeted
Query
Non-Targeted Request Received
Request sent to all shards
Shards return results to mongos
Mongos returns results to client
Cluster Request Routing: Non-Targeted
Query with Sort
Non-Targeted request with sort
received
Request sent to all shards
Query and sort performed locally
Shards return results to mongos
Mongos merges sorted results
Mongos returns results to client
Hash keys are great for equality
queries
• Equality queries directed to a specific shard
• Will use the index
• Most efficient query possible
mongos> db.hash.find({x:1}).explain()
{
"cursor" : "BtreeCursor x_hashed",
"n" : 1,
"nscanned" : 1,
"nscannedObjects" : 1,
"millisShardTotal" : 0,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"x" : [
[
NumberLong("5902408780260971510"),
NumberLong("5902408780260971510")
]
]
},
"millis" : 0
}
Explain plan of an equality query
But not so good for a range query
• Range queries scatter gather
• Won’t use index
mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain()
{
"cursor" : "BasicCursor",
"n" : 97,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 1000,
"nscannedAllPlans" : 1000,
"nscannedObjects" : 1000,
"nscannedObjectsAllPlans" : 1000,
"millisShardTotal" : 0,
"millisShardAvg" : 0,
"numQueries" : 2,
"numShards" : 2,
"millis" : 3
}
Explain plan of a range query
Other limitations
• Cannot use a compound key
• Key cannot have an array value
• Tag-aware sharding
– Only makes sense to assign the full hashed shard key
collection to particular shards
– By design, there’s no real way to know or control what
data is in what range
• Key with poor cardinality is going to give a hash
with poor cardinality
– Floating point numbers are squashed. E.g. 100.4 will be
hashed as 100
Summary
• Range-based Sharding
– Most efficient for applications that operate on ranges
– Requires careful shard key selection
• Hash-based Sharding
– Uniform writes,
– No routed range queries
• TagAware Sharding
– That’s another talk!
Resources
• Tutorial:Sharding (laptop friendly)
• Tutorial:Converting a Replica Set to a Replicated Sharded Cluster(laptop
friendly)
• Manual:Select a Shard Key
• Manual:Hash-based Sharding
• Manual:TagAware Sharding
• Manual:Strategiesfor Bulk Inserts in ShardedClusters
• Manual:Manage Sharded Cluster Balancer
Questions?
James Kerr
Thank You
Senior Solutions Architect, 10gen

Mais conteúdo relacionado

Mais procurados

Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query Optimization
MongoDB
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
MongoDB
 
Di web tech mail (no subject)
Di web tech mail   (no subject)Di web tech mail   (no subject)
Di web tech mail (no subject)
shubhamvcs
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
MongoDB
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
MongoDB
 

Mais procurados (20)

Mythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDBMythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDB
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
 
Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query Optimization
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...
MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...
MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329
 
Di web tech mail (no subject)
Di web tech mail   (no subject)Di web tech mail   (no subject)
Di web tech mail (no subject)
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
 
MongoDB World 2016: Deciphering .explain() Output
MongoDB World 2016: Deciphering .explain() OutputMongoDB World 2016: Deciphering .explain() Output
MongoDB World 2016: Deciphering .explain() Output
 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
 
Reducing Development Time with MongoDB vs. SQL
Reducing Development Time with MongoDB vs. SQLReducing Development Time with MongoDB vs. SQL
Reducing Development Time with MongoDB vs. SQL
 
Getting Started with MongoDB and NodeJS
Getting Started with MongoDB and NodeJSGetting Started with MongoDB and NodeJS
Getting Started with MongoDB and NodeJS
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
 

Semelhante a Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
MongoDB
 
Sharding
ShardingSharding
Sharding
MongoDB
 
Sharding - Seoul 2012
Sharding - Seoul 2012Sharding - Seoul 2012
Sharding - Seoul 2012
MongoDB
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
MongoDB
 
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
Amazon Web Services Korea
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
MongoDB
 

Semelhante a Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding (20)

2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
 
Mongo indexes
Mongo indexesMongo indexes
Mongo indexes
 
Sharding
ShardingSharding
Sharding
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Sharding
ShardingSharding
Sharding
 
Mysql handle socket
Mysql handle socketMysql handle socket
Mysql handle socket
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Sharding - Seoul 2012
Sharding - Seoul 2012Sharding - Seoul 2012
Sharding - Seoul 2012
 
MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
 
Keccak
KeccakKeccak
Keccak
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Шардинг в MongoDB, Henrik Ingo (MongoDB)
Шардинг в MongoDB, Henrik Ingo (MongoDB)Шардинг в MongoDB, Henrik Ingo (MongoDB)
Шардинг в MongoDB, Henrik Ingo (MongoDB)
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
Philipp Krenn "Make Your Data FABulous"
Philipp Krenn "Make Your Data FABulous"Philipp Krenn "Make Your Data FABulous"
Philipp Krenn "Make Your Data FABulous"
 
Scalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンScalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーン
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross Lawley
 
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
AWS IoT 핸즈온 워크샵 - 실습 5. DynamoDB에 센서 데이터 저장하기 (김무현 솔루션즈 아키텍트)
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo Seattle
 
Sharding
ShardingSharding
Sharding
 

Mais de MongoDB

Mais de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

  • 1. Senior Solutions Architect, 10gen James Kerr 2.4 Sharding Features
  • 2. Agenda • Mechanics of sharding – Key space – Chunks – Balancing • Types of requests • Hashed shard keys – Why use hashed shard keys – How to enable hashed shard keys – Limitations
  • 5. What is a Shard Key • Shard key is used to partition your collection • Shard key must exist in every document • Shard key is immutable • Shard key values are immutable • Shard key must be indexed • Shard key is used to route requests to shards
  • 6. The key space {x: 10} {x: -5} {x: -9} {x: 7} {x: 6} {x: 0}
  • 7. Inserting data {x: 0}{x: 6}{x: 7}{x: -5}{x: 10} {x: -9}
  • 8. Inserting data {x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
  • 9. Chunk range and size {x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9}
  • 10. Inserting further data {x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9} {x: 9}{x: -7} {x: 3}
  • 11. Chunk splitting {x: 0} {x: 6} {x: 7}{x: -5} {x: 10}{x: -9} 0 0 • Achunk is split once it exceeds the maximum size • There is no split point if all documents have the same shard key • Chunk split is a logical operation (no data is moved) • If split creates too large of a discrepancy of chunk count across cluster a balancing round starts
  • 12. Data distribution • MinKey to 0 lives on Shard1 • 0 to MaxKey lives on Shard2 • Mongos routes queries appropriately
  • 13. Mongos routes data minKey  0 0  maxKey db.test.insert({ x: -1000 })
  • 14. Mongos routes data minKey  0 0  maxKey db.test.insert({ x: -1000 })
  • 16. Balancing • Migration threshold • Number of chunks less than 20, migration threshold of 2 • 21-80, migration threshold 4 • >80, migration threshold 8
  • 17. Moving the chunk • One chunk of data is copied from Shard 1 to Shard 2
  • 18. Committing Migration • Once everyone agrees the data has moved, that chunk gets deleted from Shard 1.
  • 19. Cleanup • Other mongos' have to find out about new configuration
  • 20. Migrations' effect • Expensive • Can take a long time • Competes for limited resources
  • 21. Picking a shard key • Cardinality • Optimize routing • Minimize (unnecessary) traffic • Allow best scaling
  • 22. What about Object Id? ObjectId("51597ca8e28587b86528edfd”) • Used for _id • 12 byte value • Generated by the driver if not specified • Theoretically globally unique
  • 23. What about Object Id? ObjectId("51597ca8e28587b86528edfd”) 12 Bytes Timestamp MAC PID Counter
  • 24. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok" : 1 } // sharding the test collection mongos> sh.shardCollection("test.test",{_id:1}) { "collectionsharded" : "test.test", "ok" : 1 } // create a loop inserting data mongos> for (x=0; x<10000; x++) { ... db.test.insert({value:x}) ... } Sharding on ObjectId
  • 25. shards: { "_id" : "shard0000", "host" : "localhost:30000" } { "_id" : "shard0001", "host" : "localhost:30001" } databases: { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } test.test shard key: { "_id" : 1 } chunks: shard0001 3 { "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId(”...") } on : shard0001 { "t" : 1000, "i" : 1 } { "_id" : ObjectId(”...”) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 } ObjectId chunk distribution
  • 26. ObjectId gives a hot shard minKey  0 0  maxKey
  • 27. Sharding on incremental values like timestamp is not optimum for even distribution
  • 29. Hashed Shard Keys {x:2} md5 c81e728d9d4c2f636f067f89cc14862c {x:3} md5 eccbc87e4b5ce2fe28308fd9f2a7baf3 {x:1} md5 c4ca4238a0b923820dcc509a6f75849b
  • 30. Hashed shard key eliminates hot shards minKey  0 0  maxKey
  • 31. Under the hood • Create a hashed index used for sharding • Uses the first 64-bits of md5 hash of field • Uses existing hash index, or creates a new one on a collection • Hash both data and BSON type • Represented as a NumberLong in the JS shell
  • 32. // hash on 1 as an integer > db.runCommand({_hashBSONElement:1}) { "key" : 1, "seed" : 0, "out" : NumberLong("5902408780260971510"), "ok" : 1 } // hash on “1” as a string > db.runCommand({_hashBSONElement:"1"}) { "key" : "1", "seed" : 0, "out" : NumberLong("-2448670538483119681"), "ok" : 1 } Hash on simple or embedded BSON values
  • 33. Enabling hashed indexes • Create index – db.collection.ensureIndex( {field : ”hashed”} ) • Options – Seed, specify a different seed to use – hashVersion, at the moment only version 0 (md5).
  • 34. Using hash shard keys • Enable sharding on collection – sh.shardCollection(“test.collection”, {field: “hashed”}) • Options – numInitialChunks, specifies the number of initial chunks per shard. Default is two chunks per shard (use “sh._adminCommand” to specify options)
  • 35. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok" : 1 } // shard by hashed _id field mongos> sh.shardCollection("test.hash",{_id:"hashed"}) { "collectionsharded" : "test.hash", "ok" : 1 } Sharding on hashed ObjectId
  • 36. databases: { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } test.hash shard key: { "_id" : "hashed" } chunks: shard0000 2 shard0001 2 { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 } { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 } { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 } { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 } Pre-splitting the data
  • 37. // create a loop inserting data mongos> for (x=0; x<10000; x++) { ... db.hash.insert({value:x}) ... } Inserting into hashed shard key collection
  • 38. test.hash shard key: { "_id" : "hashed" } chunks: shard0000 4 shard0001 4 { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("- 7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8 } { "_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("- 4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9 } { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("- 2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6 } { "_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7 } { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12 } Even distribution of chunks
  • 40. Cluster Request Routing • Targeted Queries • Scatter Gather Queries • Scatter Gather Queries with Sort
  • 41. Cluster Request Routing: Targeted Query
  • 43. Request routed to appropriate shard
  • 46. Cluster Request Routing: Non-Targeted Query
  • 48. Request sent to all shards
  • 51. Cluster Request Routing: Non-Targeted Query with Sort
  • 52. Non-Targeted request with sort received
  • 53. Request sent to all shards
  • 54. Query and sort performed locally
  • 58. Hash keys are great for equality queries • Equality queries directed to a specific shard • Will use the index • Most efficient query possible
  • 59. mongos> db.hash.find({x:1}).explain() { "cursor" : "BtreeCursor x_hashed", "n" : 1, "nscanned" : 1, "nscannedObjects" : 1, "millisShardTotal" : 0, "numQueries" : 1, "numShards" : 1, "indexBounds" : { "x" : [ [ NumberLong("5902408780260971510"), NumberLong("5902408780260971510") ] ] }, "millis" : 0 } Explain plan of an equality query
  • 60. But not so good for a range query • Range queries scatter gather • Won’t use index
  • 61. mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain() { "cursor" : "BasicCursor", "n" : 97, "nChunkSkips" : 0, "nYields" : 0, "nscanned" : 1000, "nscannedAllPlans" : 1000, "nscannedObjects" : 1000, "nscannedObjectsAllPlans" : 1000, "millisShardTotal" : 0, "millisShardAvg" : 0, "numQueries" : 2, "numShards" : 2, "millis" : 3 } Explain plan of a range query
  • 62. Other limitations • Cannot use a compound key • Key cannot have an array value • Tag-aware sharding – Only makes sense to assign the full hashed shard key collection to particular shards – By design, there’s no real way to know or control what data is in what range • Key with poor cardinality is going to give a hash with poor cardinality – Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
  • 63. Summary • Range-based Sharding – Most efficient for applications that operate on ranges – Requires careful shard key selection • Hash-based Sharding – Uniform writes, – No routed range queries • TagAware Sharding – That’s another talk!
  • 64. Resources • Tutorial:Sharding (laptop friendly) • Tutorial:Converting a Replica Set to a Replicated Sharded Cluster(laptop friendly) • Manual:Select a Shard Key • Manual:Hash-based Sharding • Manual:TagAware Sharding • Manual:Strategiesfor Bulk Inserts in ShardedClusters • Manual:Manage Sharded Cluster Balancer
  • 66. James Kerr Thank You Senior Solutions Architect, 10gen

Notas do Editor

  1. BIO: I live and work in the Washington D.C. area and focus on supporting the MongoDB community and delivering solutions using MongoDB to the Federal government. I’ve spent the last 7 years or so working with NoSQL database. I’ve worked in a variety of industries including precise timing systems, military command and control and digital mapping, e-commerce, distributed computing platforms, search, system integration and Big Data applications.
  2. Remind everyone what a sharded cluster is. We will take a close look at some how sharded clusters work and at the new hashed shard key feature of 2.4
  3. Isolating queries (to a few shards)Scatter -- gather ( high latency but not bad )hash keys
  4. Min value includedMax value not included
  5. Balancer is running on mongosOnce the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
  6. Moved chunk on shard2 should be gray
  7. Source shard deletes moved dataMust wait for open cursors to either close or time outNoTimeout cursors may prevent the release of the lockMongos releases the balancer lock after old chunks are deleted
  8. Moving data is expensive (i/o, network bandwidth)Moving many chunks takes a long time (can only move one chunk at a time)Balancing and migrations compete for resources with your application
  9. What’s the solution to sharding on incremental values as a shard key?
  10. mongod --setParameter=enableTestCommands=1
  11. Seed and hashVersion are undocumented at this pointLet people know we are at least thinking about these things (especially the ability to change the hash algorithm)
  12. When sharding a new collection using Hash-based shard keys, MongoDB will take care of the presplitting for you. Similarly sized ranges of the Hash-based key are distributed to each existing shard, which means that no initial balancing is needed (unless of course new shards are added).
  13. Only happens on new collections
  14. Query contains the shard key field
  15. The mongos does not have to load the whole set into memory since each shard sorts locally. The mongos can just getMore from the shards as needed and incrementally return the results to the client.
  16. Uses the hashed index
  17. Assuming only a hashed index on “x”
  18. Tag-aware note: it doesn’t usually make a lot of sense to tag anything other than the full hashed shard key collection to particular shards - by design, there’s no real way to know or control what data is in what range.since the chunk ranges are based on the value of the randomized hash of the shard key instead of the shard key itself, this is usually only useful for tagging the whole range to a specific set of shards