SlideShare uma empresa Scribd logo
1 de 19
Introduction to sharding
                                Christos Soulios
                     Software Architect, Persado
                       (christos.soulios@persado.com)



15 Jan 2013                                             0
Lets start with an example:

 We have launched our latest and greatest web application
 We use MongoDB database which is fast and cool
 We even have setup replication for high availability
 Our application turns out to be popular and we are already
  planning our next project
 Cool!




                                                               1
Unfortunately, our website becomes too popular too fast.
And this causes problems




                                                           2
MongoDB problems when dataset grows

 Dataset does not fit on local disks.
Solution: Let’s buy more disks
 Database indexes do not fit in memory. They have to be paged
   in and out. Database becomes sluggish
Solution: Let’s buy more memory
 High throughput writing operations cause high contention on
   the infamous MongoDB locks
Now what?

       We need to scale horizontally. We need sharding



                                                                 3
What is sharding?

 Shardingis automatic data partitioning
 Distributes data evenly across cluster nodes (called shards)
 Allows for seamless querying. Almost no functionality lost
  over single master
 Keeps database consistent




                                                                 4
How sharding works

 Collection data is broken into chunks based on the range of a
  selected collection field. This field is called the shard key
 Chunks are evenly distributed across shards. Each data chunk is
  controlled by a single shard
 Special config servers are responsible for storing which shard
  controls which chunks
 Database clients communicate with the shards through the mongos
  router process
 mongos router behaves to the client just as a normal mongod
  server. Sharding is transparent to the client
 For each database operation, the mongosrouter queries the config
  servers using the shard key and redirects the operation to the
  correct shards
 While more data is inserted, ranges are split into more chunks

                                                                     5
Example (Users collection)
{„user_id‟ : 45,
     „username‟: „asterix‟,
     „email‟ : asterix@google.com
 „last_login‟: ‟11/11/2012‟
},
{„user_id‟ : 4503,
     „username‟: „gandalf‟,
     „email‟ : gandalf_rules@yahoo.com
 „last_login‟: ‟01/14/2013‟
},
{„user_id‟ : 1153,
     „username‟: „superman‟,
     „email‟ : superman@superdomain.com
  „last_login‟: ‟10/30/2012‟
},
{„user_id‟ : 5434,
     „username‟: „darth_vader‟,
     „email‟ : darth@stardestroyer.org
 „last_login‟: ‟07/01/2012‟
}



          >db.runCommand( { shardcollection: “test.users”, key: { username: 1 }} )


                                                                                     6
Shard architecture (sharding by user_id)




                                           7
Database operations

 All queries are routed through the mongosprocess
 Insert operations are routed by shard key. Shard key is
  required
 Querying by shard key routes the query to shards
 Querying by non-shard key scatters the query to all shards
  and gathers results
 Updates and deletes behave like queries




                                                               8
Data balancing

 System becomes unbalanced when one shard stores more
  data chunks than others
 Data is automatically balanced without intervention from the
  client application or the administrator




                                                                 9
Data balancing

 The range of the loaded shard is split and chunks are migrated
  to other shards




                                                                   10
Data balancing

 Config servers are updated using a 2phase commit process to
  ensure database consistency
 System ends up balanced




                                                                11
Choosing a shard key

 Choosing a good shard key is critical
 Once chosen, we are stuck with it
 Shard key must be immutable
 Should distribute data load evenly across shards
 Should be of high cardinality. Enumerated values are not good
  shard keys
 Should not be monotically increasing. ObjectIds, dates or database
  sequences are not good shard keys, because they create hotspots
 Should be used by most critical queries to provide query isolation.
  Avoid scatter-gather queries
 Should provide good data affinity to avoid disk to memory transfers
  (random values are not good shard keys)


                                                                        13
Choosing a shard key

Know your data. It is important
 What is the expected dataset size?
 What is the write throughput?
 How do data look like? Which fields are random or increasing?
  Are there low cardinality fields?
 Can we identify any access patterns for reads?
 What data is indexed?
 What is the active working set? Are there historical data that
  are not used after sometime?




                                                                   14
Choosing a shard key

 It is not trivial
 Most of the times there is no single field that can be used as
  shard key
 We have to invent one




                                                                   15
Choosing a shard key

 Usually applications access lately inserted data more often
 What about a compound shard key?
 What about a combination of a coarsely ascending field and a
  commonly queried search key?
 Coarsely ascending key should have a few hundreds of chunks
  per value. This provides good data locality and even
  distribution
 Search key provides query isolation

         Rule of thumb: {coarseLocality: 1, search : 1}



                                                                 16
Example (Tweets collection)
{user: „asterix‟,
ts: ISODate(“01/14/2013Z22:53:33.123”),
 month: „2013-01‟
retweets: 45,
 client: „TweetDeck‟,
 text: „Mongodbsharding is super cool!‟
}


We are typically looking for the latest tweets of a user.

 Therefore, a combination of „month + user‟ fields would create a
  good shard key
 monthfield is coarsely ascending, allowing to transfer only
  latest tweets to memory
 user field is a commonly searched key


                                                                     17
Conclusion

   Sharding allows MongoDB databases to scale horizontally
   Shard balancing is performed automatically by the system
   Sharding is transparent to the client application
   Choosing a good shard key is critical
   Choosing a good shard key is not trivial
   Be creative and experiment with your data before choosing
    the shard key




                                                                18
Questions ?

Mais conteúdo relacionado

Mais procurados

When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMongoDB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...DataStax Academy
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidImply
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsImply
 
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...MongoDB
 
Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2MongoDB
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkMongoDB
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Apache Druid Vision and Roadmap
Apache Druid Vision and RoadmapApache Druid Vision and Roadmap
Apache Druid Vision and RoadmapImply
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed ProcessesImply
 
Globally Distributed RESTful Object Storage
Globally Distributed RESTful Object Storage Globally Distributed RESTful Object Storage
Globally Distributed RESTful Object Storage MongoDB
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB
 
Schema Design Best Practices with Buzz Moschetti
Schema Design Best Practices with Buzz MoschettiSchema Design Best Practices with Buzz Moschetti
Schema Design Best Practices with Buzz MoschettiMongoDB
 
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...DataStax Academy
 
Big Data Tutorial - Marko Grobelnik - 25 May 2012
Big Data Tutorial - Marko Grobelnik - 25 May 2012Big Data Tutorial - Marko Grobelnik - 25 May 2012
Big Data Tutorial - Marko Grobelnik - 25 May 2012Marko Grobelnik
 

Mais procurados (20)

When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache Druid
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
 
Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & Spark
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Apache Druid Vision and Roadmap
Apache Druid Vision and RoadmapApache Druid Vision and Roadmap
Apache Druid Vision and Roadmap
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed Processes
 
Globally Distributed RESTful Object Storage
Globally Distributed RESTful Object Storage Globally Distributed RESTful Object Storage
Globally Distributed RESTful Object Storage
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
 
Schema Design Best Practices with Buzz Moschetti
Schema Design Best Practices with Buzz MoschettiSchema Design Best Practices with Buzz Moschetti
Schema Design Best Practices with Buzz Moschetti
 
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...
C* Summit 2013: Suicide Risk Prediction Using Social Media and Cassandra by K...
 
Big Data Tutorial - Marko Grobelnik - 25 May 2012
Big Data Tutorial - Marko Grobelnik - 25 May 2012Big Data Tutorial - Marko Grobelnik - 25 May 2012
Big Data Tutorial - Marko Grobelnik - 25 May 2012
 

Semelhante a Hellenic MongoDB user group - Introduction to sharding

Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbsonalighai
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
Scaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTPScaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTPdarkdata
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera, Inc.
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really DoingDave Stokes
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureObjectRocket
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopMark Ginnebaugh
 
Privacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataPrivacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataPapitha Velumani
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014Dylan Tong
 
Big data and computing grid
Big data and computing gridBig data and computing grid
Big data and computing gridThang Nguyen
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at HikeBharvi Dixit
 
Privacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataPrivacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataShakas Technologies
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 

Semelhante a Hellenic MongoDB user group - Introduction to sharding (20)

Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsb
 
Hadoop bank
Hadoop bankHadoop bank
Hadoop bank
 
NOSQL
NOSQLNOSQL
NOSQL
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Scaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTPScaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTP
 
Spark
SparkSpark
Spark
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really Doing
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the future
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Privacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataPrivacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud data
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014
 
Big data and computing grid
Big data and computing gridBig data and computing grid
Big data and computing grid
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at Hike
 
Privacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud dataPrivacy preserving multi-keyword ranked search over encrypted cloud data
Privacy preserving multi-keyword ranked search over encrypted cloud data
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
No sql database
No sql databaseNo sql database
No sql database
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Último (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Hellenic MongoDB user group - Introduction to sharding

  • 1. Introduction to sharding Christos Soulios Software Architect, Persado (christos.soulios@persado.com) 15 Jan 2013 0
  • 2. Lets start with an example:  We have launched our latest and greatest web application  We use MongoDB database which is fast and cool  We even have setup replication for high availability  Our application turns out to be popular and we are already planning our next project  Cool! 1
  • 3. Unfortunately, our website becomes too popular too fast. And this causes problems 2
  • 4. MongoDB problems when dataset grows  Dataset does not fit on local disks. Solution: Let’s buy more disks  Database indexes do not fit in memory. They have to be paged in and out. Database becomes sluggish Solution: Let’s buy more memory  High throughput writing operations cause high contention on the infamous MongoDB locks Now what? We need to scale horizontally. We need sharding 3
  • 5. What is sharding?  Shardingis automatic data partitioning  Distributes data evenly across cluster nodes (called shards)  Allows for seamless querying. Almost no functionality lost over single master  Keeps database consistent 4
  • 6. How sharding works  Collection data is broken into chunks based on the range of a selected collection field. This field is called the shard key  Chunks are evenly distributed across shards. Each data chunk is controlled by a single shard  Special config servers are responsible for storing which shard controls which chunks  Database clients communicate with the shards through the mongos router process  mongos router behaves to the client just as a normal mongod server. Sharding is transparent to the client  For each database operation, the mongosrouter queries the config servers using the shard key and redirects the operation to the correct shards  While more data is inserted, ranges are split into more chunks 5
  • 7. Example (Users collection) {„user_id‟ : 45, „username‟: „asterix‟, „email‟ : asterix@google.com „last_login‟: ‟11/11/2012‟ }, {„user_id‟ : 4503, „username‟: „gandalf‟, „email‟ : gandalf_rules@yahoo.com „last_login‟: ‟01/14/2013‟ }, {„user_id‟ : 1153, „username‟: „superman‟, „email‟ : superman@superdomain.com „last_login‟: ‟10/30/2012‟ }, {„user_id‟ : 5434, „username‟: „darth_vader‟, „email‟ : darth@stardestroyer.org „last_login‟: ‟07/01/2012‟ } >db.runCommand( { shardcollection: “test.users”, key: { username: 1 }} ) 6
  • 9. Database operations  All queries are routed through the mongosprocess  Insert operations are routed by shard key. Shard key is required  Querying by shard key routes the query to shards  Querying by non-shard key scatters the query to all shards and gathers results  Updates and deletes behave like queries 8
  • 10. Data balancing  System becomes unbalanced when one shard stores more data chunks than others  Data is automatically balanced without intervention from the client application or the administrator 9
  • 11. Data balancing  The range of the loaded shard is split and chunks are migrated to other shards 10
  • 12. Data balancing  Config servers are updated using a 2phase commit process to ensure database consistency  System ends up balanced 11
  • 13. Choosing a shard key  Choosing a good shard key is critical  Once chosen, we are stuck with it  Shard key must be immutable  Should distribute data load evenly across shards  Should be of high cardinality. Enumerated values are not good shard keys  Should not be monotically increasing. ObjectIds, dates or database sequences are not good shard keys, because they create hotspots  Should be used by most critical queries to provide query isolation. Avoid scatter-gather queries  Should provide good data affinity to avoid disk to memory transfers (random values are not good shard keys) 13
  • 14. Choosing a shard key Know your data. It is important  What is the expected dataset size?  What is the write throughput?  How do data look like? Which fields are random or increasing? Are there low cardinality fields?  Can we identify any access patterns for reads?  What data is indexed?  What is the active working set? Are there historical data that are not used after sometime? 14
  • 15. Choosing a shard key  It is not trivial  Most of the times there is no single field that can be used as shard key  We have to invent one 15
  • 16. Choosing a shard key  Usually applications access lately inserted data more often  What about a compound shard key?  What about a combination of a coarsely ascending field and a commonly queried search key?  Coarsely ascending key should have a few hundreds of chunks per value. This provides good data locality and even distribution  Search key provides query isolation Rule of thumb: {coarseLocality: 1, search : 1} 16
  • 17. Example (Tweets collection) {user: „asterix‟, ts: ISODate(“01/14/2013Z22:53:33.123”), month: „2013-01‟ retweets: 45, client: „TweetDeck‟, text: „Mongodbsharding is super cool!‟ } We are typically looking for the latest tweets of a user.  Therefore, a combination of „month + user‟ fields would create a good shard key  monthfield is coarsely ascending, allowing to transfer only latest tweets to memory  user field is a commonly searched key 17
  • 18. Conclusion  Sharding allows MongoDB databases to scale horizontally  Shard balancing is performed automatically by the system  Sharding is transparent to the client application  Choosing a good shard key is critical  Choosing a good shard key is not trivial  Be creative and experiment with your data before choosing the shard key 18