Challenges with MongoDB
                            Stone Gao




                                        MongoDB Beijing 2012

Monday, April 2, 2012
About Me
                        Tech Lead at Umeng.com




MongoDB is Awesome
                 •      Document-oriented storage

                 •      Full Index Support

                 •      Replication & High Availability

                 •      Auto-Sharding

                 •      Querying

                 •      Fast In-Place Updates

                 •      Map/Reduce

                 •      GridFS

But...
                        This talk is not yet another talk about its
                        awesomeness,
                        but about the
                        challenges with MongoDB




Outline
                 1. Global Write Lock Sucks
                 2. Auto-Sharding is not that Reliable
                 3. Schema-less is Overrated
                 4. Community Contribution is Quite Low
                 5. Attitude Matters


1. Global Write Lock Sucks
                                 single global write lock for the entire server (process)


                        [Diagram: mongod with databases db-1 ... db-n, each holding collection1 and collection2 with doc1, doc2
                         VS. mysqld with databases db-1 ... db-n, each holding table1 and table2 with doc1, doc2]

                                             DB Process Lock VS. Row Lock


1. Global Write Lock Sucks
                         Intel SSD 320 RAID10 & mongostat
                         39.5K read IOPS / 23K write IOPS

              Nearly all data is in RAM, yet the lock ratio is pretty high and
                      a bunch of writes are queued (qw)

Possible Solutions/Workarounds #1
                        Wait for the lock-related issues on JIRA

                        • SERVER-2563 : When hitting disk, yield lock - phase 1
                          https://jira.mongodb.org/browse/SERVER-2563
                          Fixed in 1.9.1, Vote (25)
                          Yield the lock any time we actually have to hit disk, i.e. whenever a memory-mapped page is not in RAM.
                          Cases: update by _id, remove, long cursor iteration

                        • SERVER-1240 : Collection level locking
                          https://jira.mongodb.org/browse/SERVER-1240
                          Planning Bucket A, Vote (154)

                        • SERVER-1241 : Intra collection locking (maybe extent)
                          https://jira.mongodb.org/browse/SERVER-1241
                          Planning Bucket A, Vote (25)

                        • SERVER-1169 : Record level locking
                          https://jira.mongodb.org/browse/SERVER-1169
                          Rejected, Vote (1)

                        and more ...




Possible Solutions/Workarounds #2
                        One collection per DB to reduce the lock ratio


                              But you can go no further than that

                          Auto-Sharding to the rescue?
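
A rough way to see why this workaround helps and where it stops helping. The sketch below assumes that writes to different databases do not block each other while writes inside one database are fully serialized; whether that holds depends on the MongoDB version. The collection names and timings are made up for illustration and are not measurements.

```javascript
// Hypothetical workload: milliseconds of write-lock time each collection
// needs during a 1000 ms window. These numbers are invented.
const writeMsPerCollection = { events: 400, sessions: 300, users: 200 };

// All collections share one lock: the lock is held for the SUM of the
// write times, so every collection waits on every other one.
function lockRatioSharedLock(workload, windowMs) {
  const total = Object.values(workload).reduce((a, b) => a + b, 0);
  return Math.min(1, total / windowMs);
}

// One collection per database, assuming each database locks independently:
// the worst lock is only as busy as the busiest single collection.
function lockRatioOneDbPerCollection(workload, windowMs) {
  const busiest = Math.max(...Object.values(workload));
  return Math.min(1, busiest / windowMs);
}

const shared = lockRatioSharedLock(writeMsPerCollection, 1000);
const split = lockRatioOneDbPerCollection(writeMsPerCollection, 1000);
console.log({ shared, split });
```

Under these assumptions the busiest collection sets the floor: once every collection has its own database there is nothing left to split, which is the "you can go no further" point.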




2. Auto-Sharding is not that Reliable




Problems with Auto-Sharding
                 •      MongoDB can’t figure out how many docs are in a collection after sharding

                 •      Balancer deadlock
                        [Balancer] skipping balancing round during ongoing split or move activity.
                        [Balancer] dist_lock lock failed because taken by....
                        [Balancer] Assertion failure cm s/balance.cpp...

                 •      Uneven shard load distribution

                 •      ...

                        (Note: I ran this experiment before 2.0, so some of these issues might be fixed
                        or improved in newer versions of MongoDB, because it is evolving very fast.)




Possible Solutions/Workarounds #1
                                              Manual Chunk Pre-Splitting
                                 http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
                                 https://groups.google.com/d/msg/mongodb-user/tYBFKSMM3cU/TiYtoOiNMgEJ
                                 http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/


                 0) Turn off the balancer (balancing won't understand your locations, but it shouldn't matter b/c
                    you're using hashed shard keys)

                 1) Shard the empty collection over the shard key { location : 1, hash : 1 }

                 2) run db.runCommand({ split : "<coll>", middle : { "location":"DEN", "hash": "8000...0" }})

                 3) run db.runCommand({ split : "<coll>", middle : { "location":"SC", "hash": "0000...0" }})

                 4) move those empty chunks to whatever shards you want

                 - Greg Studer
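
The split commands in steps 2 and 3 are repetitive enough to generate. The sketch below only builds the command documents; it does not talk to a server. The namespace `mydb.events`, the 4-digit hash width and both helper names are placeholders invented for this example, not part of the quoted advice.

```javascript
// Build one { split, middle } command document per split point.
// `ns` is the full "<db>.<coll>" namespace of the sharded collection.
function buildSplitCommands(ns, splitPoints) {
  return splitPoints.map((middle) => ({ split: ns, middle }));
}

// Evenly spaced hex boundaries inside a hash space of `hexDigits` digits.
// Keep hexDigits <= 13 so the arithmetic stays exact in a JS number.
function hexBoundaries(parts, hexDigits) {
  const space = 16 ** hexDigits;
  const out = [];
  for (let i = 1; i < parts; i++) {
    const value = Math.floor((space * i) / parts);
    out.push(value.toString(16).padStart(hexDigits, "0"));
  }
  return out;
}

// Three boundaries inside "DEN", then one at the very start of "SC".
const splitPoints = [
  ...hexBoundaries(4, 4).map((hash) => ({ location: "DEN", hash })),
  { location: "SC", hash: "0000" },
];

const commands = buildSplitCommands("mydb.events", splitPoints);
console.log(JSON.stringify(commands, null, 2));
```

Each resulting object has the same shape as the argument passed to `db.runCommand` in steps 2 and 3, so the list could be pasted into a shell session or replayed by a driver before step 4.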




Possible Solutions/Workarounds #2
                        SERVER-2001 : Option to hash shard key
                        https://jira.mongodb.org/browse/SERVER-2001
                        Unresolved, Fix Version/s: 2.1.1, Vote (27)


                               “The lack of hashing based read/write distribution
                               amongst available shards is a huge issue for us now.
                               We're actually considering implementing an app-side
                               layer to do this but that obviously has a number of
                               serious drawbacks.”
                               - Remon van Vliet

                               “Seems like a good idea : we implemented hashed
                               shard key on client-side : operation rate sky rocked
                               ( x3 and less variability). Balancing is moreover
                               quicker and done during our very heavy insertion
                               process : perfect !”
                               - Grégoire Seux
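
A client-side hashed shard key, as described in the second quote, might look roughly like this. It is a sketch under assumptions: the choice of MD5, the 8-digit prefix and the field name `hash` are all invented here, and the quoted users do not say how their implementation works.

```javascript
const crypto = require("crypto");

// Derive a short hex prefix from the natural key. Consecutive natural keys
// (counters, timestamps) map to unrelated prefixes, so inserts spread over
// the whole key range instead of piling onto the last chunk.
function hashedShardKey(naturalKey, prefixHexDigits = 8) {
  const digest = crypto
    .createHash("md5")
    .update(String(naturalKey))
    .digest("hex");
  return digest.slice(0, prefixHexDigits);
}

// Attach the hash as an extra field before inserting. The collection would
// then be sharded on a key that starts with this field (an assumption).
function withShardKey(doc) {
  return { ...doc, hash: hashedShardKey(doc._id) };
}

console.log(withShardKey({ _id: "user:1001", name: "example" }));
console.log(withShardKey({ _id: "user:1002", name: "example" }));
```

The drawback raised in the first quote still applies: every reader and writer has to compute the same hash, and range queries on the natural key no longer map to a contiguous chunk range.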

Possible Solutions/Workarounds #3
                        Plain-old Application Level Sharding




                                  https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
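
The forwarding-table idea referenced by the image can be sketched in a few lines: hash the key, then find the range that owns the hash. This is a minimal illustration, not Gizzard's implementation; the shard names, the range boundaries and the choice of a 32-bit FNV-1a hash are assumptions made for the example.

```javascript
// Each entry owns the hash range from its lowerBound (inclusive) up to the
// next entry's lowerBound. Entries must be sorted and start at 0.
const forwardingTable = [
  { lowerBound: 0x00000000, shard: "shard-a" },
  { lowerBound: 0x55555555, shard: "shard-b" },
  { lowerBound: 0xaaaaaaaa, shard: "shard-c" },
];

// 32-bit FNV-1a over the key's character codes (intended for ASCII keys).
// Any stable hash would work; this one needs no dependencies.
function fnv1a(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Binary search for the last entry whose lowerBound <= hash.
function lookupShard(table, hash) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (table[mid].lowerBound <= hash) lo = mid;
    else hi = mid - 1;
  }
  return table[lo].shard;
}

function shardFor(key) {
  return lookupShard(forwardingTable, fnv1a(key));
}

console.log(shardFor("user:1001"), shardFor("user:1002"));
```

Moving data then means editing the table (splitting a range or pointing it at another shard) rather than changing the hash function, which keeps existing keys where they are.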




3. Schema-less is Overrated








                               Schema-free (schema-less) is not free.
                        It means repeating the schema in every document (record)!




Possible Solutions/Workarounds #1
                                           Use Short Key Names

                        1.6 billion documents

                        Original key names: 243 GB
                        {"sequence":"AHAHSPGPGSAVKLPAPHSVGKSALR",
                         "location":{
                             "chromosome":"19",
                             "strand":"-",
                             "begin":"51067007",
                             "end":"51067085"
                         }}

                        Short key names: 183 GB
                        {"s":"AHAHSPGPGSAVKLPAPHSVGKSALR",
                         "l":{
                             "c":"19",
                             "s":"-",
                             "b":"51067007",
                             "e":"51067085"
                         }}

                        60 GB saved!
                        ref : http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
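
A back-of-the-envelope check of that saving, counting only the bytes removed from the key names themselves. It assumes one byte per character and ignores padding, indexes and any other storage overhead, so it is an estimate rather than a reproduction of the measured figures.

```javascript
// Long key name -> short key name, as in the two documents above.
const renames = [
  ["sequence", "s"],
  ["location", "l"],
  ["chromosome", "c"],
  ["strand", "s"],
  ["begin", "b"],
  ["end", "e"],
];

// Every document stores each key name once, so the per-document saving is
// the summed difference in name length.
const savedBytesPerDoc = renames.reduce(
  (sum, [longName, shortName]) => sum + (longName.length - shortName.length),
  0
);

const documents = 1.6e9;
const savedGB = (savedBytesPerDoc * documents) / 1e9; // decimal gigabytes

console.log(`${savedBytesPerDoc} bytes per document, about ${savedGB} GB in total`);
```

The estimate lands in the same range as the 60 GB reported on the slide; the sketch does not attempt to account for the remaining difference.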


Possible Solutions/Workarounds #2
                        SERVER-863 : Tokenize the field names
                        https://jira.mongodb.org/browse/SERVER-863
                        planned but not scheduled, Vote (66)



                        “Most collections, even if they don’t contain the same
                        structure , they contain similar. So it would make a
                        lot of sense and save a lot of space to tokenize the field
                        names.”
                        “The overall benefit as mentioned by other users is that
                        you reduce the amount of storage/RAM taken up by
                        redundant data in each document (so you can use
                        less resources per request, hence gain more throughput
                        and capacity), while importantly also freeing the
                        developer from having to pick short and hard to read
                        field names as a workaround for a technical limitation.”

                        - Andrew Armstrong
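
To make the proposal concrete, here is a minimal application-side sketch of field-name tokenization: a shared dictionary maps each field name to a small integer, and documents are stored with the integers as keys. The class and function names are invented for this example, and the ticket does not specify how the server would implement it. Sub-documents are handled; arrays of sub-documents are left as-is to keep the sketch short.

```javascript
// Dictionary shared by all documents: field name <-> integer token.
class FieldDictionary {
  constructor() {
    this.tokens = new Map(); // name -> token
    this.names = [];         // token -> name
  }
  tokenFor(name) {
    if (!this.tokens.has(name)) {
      this.tokens.set(name, this.names.length);
      this.names.push(name);
    }
    return this.tokens.get(name);
  }
  nameFor(token) {
    return this.names[token];
  }
}

function isSubDocument(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Replace every field name with its token, recursing into sub-documents.
function encode(doc, dict) {
  const out = {};
  for (const [name, value] of Object.entries(doc)) {
    out[dict.tokenFor(name)] = isSubDocument(value) ? encode(value, dict) : value;
  }
  return out;
}

// Inverse of encode: turn tokens back into readable field names.
function decode(doc, dict) {
  const out = {};
  for (const [token, value] of Object.entries(doc)) {
    out[dict.nameFor(Number(token))] = isSubDocument(value) ? decode(value, dict) : value;
  }
  return out;
}

const dict = new FieldDictionary();
const original = {
  sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  location: { chromosome: "19", strand: "-", begin: "51067007", end: "51067085" },
};
const stored = encode(original, dict);
console.log(JSON.stringify(stored));
console.log(JSON.stringify(decode(stored, dict)));
```

The developer keeps readable names in code while the stored form carries only short tokens, which is the benefit the second quote describes.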



Possible Solutions/Workarounds #3
                        SERVER-164 : Option to store data compressed
                        https://jira.mongodb.org/browse/SERVER-164
                        planned but not scheduled, Vote (126)



                                     “The way oracle handles this is transparent to the
                                     database server at the block engine level. They
                                     compress the blocks similar to how SAN store's handle
                                     it rather than at a record level. They use zlib type
                                     compression and the overhead is less than 5 percent.
                                     Due to the IO access reduction in both number of
                                     blocks touched, and amount of data transferred, the
                                     overall effect is a cumulative speed increase.
                                     Should MongoDB do it this way? Maybe? But at the end
                                     of the day, the architecture must make Mongo more
                                     scalable, as well as increase the ability limit the storage
                                     footprint.”
                                     - Michael D. Joy
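
The block-level idea in the quote can be tried out with Node's built-in zlib. The sketch compresses a "block" of many similar documents in one go, so the repeated field names are shared across records. JSON stands in for the on-disk format here, and the 1000 generated documents are synthetic, so the resulting sizes say nothing about real MongoDB data.

```javascript
const zlib = require("zlib");

// Build a block of similar documents, the way a storage layer would see
// many records of one collection sitting next to each other.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({
    sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
    location: {
      chromosome: "19",
      strand: "-",
      begin: String(51067007 + i),
      end: String(51067085 + i),
    },
  });
}
const block = Buffer.from(JSON.stringify(docs), "utf8");

// Compress the whole block at once rather than record by record.
const compressed = zlib.deflateSync(block);
const restored = zlib.inflateSync(compressed);

console.log(`raw: ${block.length} bytes, compressed: ${compressed.length} bytes`);
console.log(`round trip intact: ${restored.equals(block)}`);
```

Because the field names repeat in every record, block-level compression removes most of the overhead that short key names work around by hand.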


4. Community Contribution is Quite Low

                              https://github.com/mongodb/mongo/graphs/impact
                              https://github.com/mongodb/mongo/contributors




5. Attitude Matters




                         http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart




    MongoDB already has the sweetest API in the
                 NoSQL world.



   I wish more effort were invested in fixing the hard
   problems: locking, sharding, storage engine...



We are hiring
                              We are doing bigdata analytics
                        • Backend Engineer (MongoDB, Hadoop,
                          HBase, Storm, Scala, Java, Ruby, Clojure)
                        • Data Mining Engineer
                        • DevOps Engineer
                        • Front End Engineer
                                    hr@umeng.com
Contact
                        • Email : stones.gao@gmail.com
                                gaolei@umeng.com
                        • Twitter: @stonegao



Q & A

                        Thanks


Monday, April 2, 2012

Mais conteúdo relacionado

Mais procurados

Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónMongoDB
 
MongoDB basics & Introduction
MongoDB basics & IntroductionMongoDB basics & Introduction
MongoDB basics & IntroductionJerwin Roy
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 
Connecting NodeJS & MongoDB
Connecting NodeJS & MongoDBConnecting NodeJS & MongoDB
Connecting NodeJS & MongoDBEnoch Joshua
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMongoDB
 
MongoDB : The Definitive Guide
MongoDB : The Definitive GuideMongoDB : The Definitive Guide
MongoDB : The Definitive GuideWildan Maulana
 
Exploring the replication in MongoDB
Exploring the replication in MongoDBExploring the replication in MongoDB
Exploring the replication in MongoDBIgor Donchovski
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hansonhungarianhc
 
Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL databaseTobias Lindaaker
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo dbRohit Bishnoi
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)Uwe Printz
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationMongoDB
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDBTim Callaghan
 

Mais procurados (20)

Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
 
MongoDB basics & Introduction
MongoDB basics & IntroductionMongoDB basics & Introduction
MongoDB basics & Introduction
 
Mongo db
Mongo dbMongo db
Mongo db
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
Connecting NodeJS & MongoDB
Connecting NodeJS & MongoDBConnecting NodeJS & MongoDB
Connecting NodeJS & MongoDB
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best Practices
 
Mongo db dhruba
Mongo db dhrubaMongo db dhruba
Mongo db dhruba
 
Mongo db report
Mongo db reportMongo db report
Mongo db report
 
MongoDB : The Definitive Guide
MongoDB : The Definitive GuideMongoDB : The Definitive Guide
MongoDB : The Definitive Guide
 
Exploring the replication in MongoDB
Exploring the replication in MongoDBExploring the replication in MongoDB
Exploring the replication in MongoDB
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB Tutorial
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
 
Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL database
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + Optimization
 
Mongo DB
Mongo DB Mongo DB
Mongo DB
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
MongoDB
MongoDBMongoDB
MongoDB
 

Destaque

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015Vidyasagar Machupalli
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBAmar Das
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
How to monitor MongoDB
How to monitor MongoDBHow to monitor MongoDB
How to monitor MongoDBServer Density
 

Destaque (6)

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDB
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
How to monitor MongoDB
How to monitor MongoDBHow to monitor MongoDB
How to monitor MongoDB
 

Semelhante a Challenges with MongoDB

MongoSF: MongoDB Concurrency Internals in v2.2
MongoSF: MongoDB Concurrency Internals in v2.2MongoSF: MongoDB Concurrency Internals in v2.2
MongoSF: MongoDB Concurrency Internals in v2.2MongoDB
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!Mark Hillick
 
Adapt to2012 oak - the new repository
Adapt to2012  oak - the new repositoryAdapt to2012  oak - the new repository
Adapt to2012 oak - the new repositorymichid
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformMaris Elsins
 
Oracle GoldenGate DB2 to Oracle11gR2 Configuration
Oracle GoldenGate DB2 to Oracle11gR2 ConfigurationOracle GoldenGate DB2 to Oracle11gR2 Configuration
Oracle GoldenGate DB2 to Oracle11gR2 Configurationgrigorianvlad
 
Eouc 12 on 12c osama mustafa
Eouc 12 on 12c osama mustafaEouc 12 on 12c osama mustafa
Eouc 12 on 12c osama mustafaOsama Mustafa
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Scaling with mongo db (with notes)
Scaling with mongo db (with notes)Scaling with mongo db (with notes)
Scaling with mongo db (with notes)emiltamas
 
In-memory Database and MySQL Cluster
In-memory Database and MySQL ClusterIn-memory Database and MySQL Cluster
In-memory Database and MySQL Clustergrandis_au
 
A critique of snapshot isolation: eurosys 2012
A critique of snapshot isolation: eurosys 2012A critique of snapshot isolation: eurosys 2012
A critique of snapshot isolation: eurosys 2012Maysam Yabandeh
 
Know Your Competitor - Oracle 10g Express Edition
Know Your Competitor - Oracle 10g Express EditionKnow Your Competitor - Oracle 10g Express Edition
Know Your Competitor - Oracle 10g Express EditionRonald Bradford
 
Oracle Active Data Guard 12cR2. Is it the best option?
Oracle Active Data Guard 12cR2. Is it the best option?Oracle Active Data Guard 12cR2. Is it the best option?
Oracle Active Data Guard 12cR2. Is it the best option?Ludovico Caldara
 
Optimizing LAMPhp Applications
Optimizing LAMPhp ApplicationsOptimizing LAMPhp Applications
Optimizing LAMPhp ApplicationsPiyush Goel
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structureZhichao Liang
 
Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningSeveralnines
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARNArinto Murdopo
 

Semelhante a Challenges with MongoDB (20)

MongoSF: MongoDB Concurrency Internals in v2.2
MongoSF: MongoDB Concurrency Internals in v2.2MongoSF: MongoDB Concurrency Internals in v2.2
MongoSF: MongoDB Concurrency Internals in v2.2
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!
 
Adapt to2012 oak - the new repository
Adapt to2012  oak - the new repositoryAdapt to2012  oak - the new repository
Adapt to2012 oak - the new repository
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
Oracle GoldenGate DB2 to Oracle11gR2 Configuration
Oracle GoldenGate DB2 to Oracle11gR2 ConfigurationOracle GoldenGate DB2 to Oracle11gR2 Configuration
Oracle GoldenGate DB2 to Oracle11gR2 Configuration
 
Eouc 12 on 12c osama mustafa
Eouc 12 on 12c osama mustafaEouc 12 on 12c osama mustafa
Eouc 12 on 12c osama mustafa
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Scaling with mongo db (with notes)
Scaling with mongo db (with notes)Scaling with mongo db (with notes)
Scaling with mongo db (with notes)
 
In-memory Database and MySQL Cluster
In-memory Database and MySQL ClusterIn-memory Database and MySQL Cluster
In-memory Database and MySQL Cluster
 
A critique of snapshot isolation: eurosys 2012
A critique of snapshot isolation: eurosys 2012A critique of snapshot isolation: eurosys 2012
A critique of snapshot isolation: eurosys 2012
 
Know Your Competitor - Oracle 10g Express Edition
Know Your Competitor - Oracle 10g Express EditionKnow Your Competitor - Oracle 10g Express Edition
Know Your Competitor - Oracle 10g Express Edition
 
Oracle Active Data Guard 12cR2. Is it the best option?
Oracle Active Data Guard 12cR2. Is it the best option?Oracle Active Data Guard 12cR2. Is it the best option?
Oracle Active Data Guard 12cR2. Is it the best option?
 
Optimizing LAMPhp Applications
Optimizing LAMPhp ApplicationsOptimizing LAMPhp Applications
Optimizing LAMPhp Applications
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance Tuning
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
Super hybrid2016 tdc
Super hybrid2016 tdcSuper hybrid2016 tdc
Super hybrid2016 tdc
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Spark architechure.pptx
Spark architechure.pptxSpark architechure.pptx
Spark architechure.pptx
 

Último

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Challenges with MongoDB

  • 1. Challenges with MongoDB Stone Gao MongoDB Beijing 2012 Monday, April 2, 2012
  • 2. About Me: Tech Lead at Umeng.com
  • 3. MongoDB is Awesome
      • Document-oriented storage
      • Full Index Support
      • Replication & High Availability
      • Auto-Sharding
      • Querying
      • Fast In-Place Updates
      • Map/Reduce
      • GridFS
  • 4. But... This talk is not Yet Another Talk about its Awesomeness, but about challenges with MongoDB.
  • 5. Outline
      1. Global Write Lock Sucks
      2. Auto-Sharding is not that Reliable
      3. Schema-less is Over Rated
      4. Community Contribution is Quite Low
      5. Attitude Matters
  • 6. 1. Global Write Lock Sucks (image: http://www.clker.com/cliparts/3/3/5/D/X/b/locked-exclamation-mark-padlock-hi.png)
  • 7. 1. Global Write Lock Sucks: single global write lock for the entire server (process). [Diagram: a mongod process vs. a mysqld process, each holding db-1 ... db-n; every mongod database contains collection1 and collection2 with doc1 and doc2, every mysqld database contains table1 and table2 with doc1 and doc2.] DB Process Lock vs. Row Lock
  • 8-11. 1. Global Write Lock Sucks: Intel SSD 320 RAID10 & mongostat. 39.5K Read IOPS / 23K Write IOPS. Nearly all data is in RAM, but the lock ratio is pretty high and there are a bunch of queued writes (qw).
  • 12. Possible Solutions/Workarounds #1: Wait for lock-related issues on JIRA
      • SERVER-2563 : When hitting disk, yield lock - phase 1. https://jira.mongodb.org/browse/SERVER-2563 Fixed in 1.9.1. Vote (25). Yield any time we actually have to hit disk: if a memory-mapped page is not in RAM, then we should yield (update by _id, remove, long cursor iteration).
      • SERVER-1240 : Collection level locking. https://jira.mongodb.org/browse/SERVER-1240 Planning Bucket A. Vote (154)
      • SERVER-1241 : Intra collection locking (maybe extent). https://jira.mongodb.org/browse/SERVER-1241 Planning Bucket A. Vote (25)
      • SERVER-1169 : Record level locking. https://jira.mongodb.org/browse/SERVER-1169 Rejected. Vote (1)
      • and more ...
  • 13. Possible Solutions/Workarounds #2: One Collection per DB to Reduce Lock Ratio. But you can go no further. Use Auto-Sharding to the rescue?
  • 14. 2. Auto-Sharding is not that Reliable (image: http://www.autoinsurancecompanies.com/wp-content/uploads/2011/11/reliable.jpg)
  • 15. Auto-Sharding is not that Reliable
  • 16. Problems with Auto-Sharding
      • MongoDB can’t figure out how many docs are in a collection after sharding
      • Balancer deadlock:
          [Balancer] skipping balancing round during ongoing split or move activity.)
          [Balancer] dist_lock lock failed because taken by....
          [Balancer] Assertion failure cm s/balance.cpp...
      • Uneven shard load distribution
      • ...
      (Note: I did the experiment before 2.0, so some of the issues might be fixed or improved in newer versions of MongoDB, because it’s evolving very fast.)
  • 17. Possible Solutions/Workarounds #1: Manual Chunk Pre-Splitting
      http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
      https://groups.google.com/d/msg/mongodb-user/tYBFKSMM3cU/TiYtoOiNMgEJ
      http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
      0) Turn off the balancer (balancing won't understand your locations, but it shouldn't matter b/c you're using hashed shard keys)
      1) Shard the empty collection over the shard key { location : 1, hash : 1 }
      2) run db.runCommand({ split : "<coll>", middle : { "location":"DEN", "hash": "8000...0" }})
      3) run db.runCommand({ split : "<coll>", middle : { "location":"SC", "hash": "0000...0" }})
      4) move those empty chunks to whatever shards you want
      - Greg Studer
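The split commands in the pre-splitting steps above could be generated rather than typed by hand. The following is a minimal sketch, not code from the talk: `buildSplitCommands`, the namespace `test.events`, and the 8-character hash width are hypothetical, and it assumes the `hash` field is a fixed-width lowercase hex string for which evenly spaced boundaries are acceptable.

```javascript
// Hypothetical helper: builds `split` command documents for pre-splitting.
// Assumption: `hash` is a fixed-width lowercase hex string of `hexWidth` characters.
function buildSplitCommands(ns, locations, chunksPerLocation, hexWidth) {
  const space = Math.pow(16, hexWidth); // size of the hash space
  const commands = [];
  for (const location of locations) {
    for (let i = 0; i < chunksPerLocation; i++) {
      // Evenly spaced lower boundary of the i-th chunk for this location.
      const boundary = Math.floor((space / chunksPerLocation) * i);
      const hash = boundary.toString(16).padStart(hexWidth, "0");
      commands.push({ split: ns, middle: { location: location, hash: hash } });
    }
  }
  return commands;
}

// Example: two locations, two chunks each, 8 hex characters of hash.
const exampleCommands = buildSplitCommands("test.events", ["DEN", "SC"], 2, 8);
exampleCommands.forEach(function (cmd) {
  console.log(JSON.stringify(cmd));
});
```

Each generated document has the same shape as the `split` commands in steps 2 and 3, so it could be passed to `db.runCommand` in the same way once the empty collection is sharded and the balancer is off.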
  • 18. Possible Solutions/Workarounds #2: SERVER-2001 : Option to hash shard key. https://jira.mongodb.org/browse/SERVER-2001 Unresolved. Fix Version/s: 2.1.1. Vote (27)
      “The lack of hashing based read/write distribution amongst available shards is a huge issue for us now. We're actually considering implementing an app-side layer to do this but that obviously has a number of serious drawbacks.” - Remon van Vliet
      “Seems like a good idea : we implemented hashed shard key on client-side : operation rate sky rocked ( x3 and less variability). Balancing is moreover quicker and done during our very heavy insertion process : perfect !” - Grégoire Seux
      https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
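A client-side hashed shard key of the kind the commenters mention might look roughly like this. It is only an illustration, not their implementation: the FNV-1a hash, the `hash` field name, and the `withHashedKey` helper are assumptions chosen for brevity, and any well-distributed hash would serve.

```javascript
// 32-bit FNV-1a hash. Assumption: keys are ASCII, so char codes are treated as bytes.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h >>> 0;
}

// Hypothetical helper: returns a copy of `doc` with a `hash` field derived
// from `keyField`, formatted as 8 lowercase hex characters.
function withHashedKey(doc, keyField) {
  const hash = fnv1a32(String(doc[keyField])).toString(16).padStart(8, "0");
  return Object.assign({}, doc, { hash: hash });
}

// Example: the application adds the hash before inserting the document.
console.log(JSON.stringify(withHashedKey({ user_id: "user-42", event: "launch" }, "user_id")));
```

Sharding on the derived `hash` field, instead of on a monotonically increasing key, is what spreads consecutive inserts across chunks; the cost is that range queries on the original key no longer map to a single shard.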
  • 19. Possible Solutions/Workarounds #3: Plain-old Application Level Sharding (image: https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png)
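Application-level sharding usually comes down to a lookup the application performs before each read or write. The sketch below is loosely modeled on the idea of a forwarding table that maps id ranges to shards; the table contents, shard names, and the `pickShard` function are hypothetical and not taken from the talk or from Gizzard's code.

```javascript
// Hypothetical forwarding table: each entry owns ids from `min` up to the
// next entry's `min`. Entries must be sorted by `min`; shard names are placeholders.
const forwardingTable = [
  { min: 0, shard: "shard-a" },
  { min: 1000000, shard: "shard-b" },
  { min: 2000000, shard: "shard-c" },
];

// Binary search for the last entry whose `min` is <= id.
function pickShard(table, id) {
  if (table.length === 0 || id < table[0].min) {
    throw new RangeError("no shard covers id " + id);
  }
  let lo = 0;
  let hi = table.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // bias upward so the loop terminates
    if (table[mid].min <= id) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return table[lo].shard;
}

// Example: route a document id to the shard that should store it.
console.log(pickShard(forwardingTable, 1500000));
```

The application then opens its connection to whichever server the returned name stands for. Rebalancing means editing the table and moving the affected range yourself, which is the price of not relying on the balancer.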
  • 20. 3. Schema-less is Over Rated (image: http://images.sodahead.com/polls/001635729/1863780_overrated_answer_2_xlarge.jpeg)
  • 21. Schema-less is Over Rated: Schema-free (schema-less) is not free. It means repeating the schema in every doc (record)!
  • 22. Possible Solutions/Workarounds #1: Use Short Key Names (1.6 billion documents)
      • Long key names, 243 GB: {"sequence":"AHAHSPGPGSAVKLPAPHSVGKSALR", "location":{ "chromosome":"19", "strand":"-", "begin":"51067007", "end":"51067085" }}
      • Short key names, 183 GB: {"s":"AHAHSPGPGSAVKLPAPHSVGKSALR", "l":{ "c":"19", "s":"-", "b":"51067007", "e":"51067085" }}
      • 60 GB saved!
      ref : http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
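Short key names do not have to leak into application code if the renaming happens in one place. A minimal sketch, assuming a hand-maintained mapping kept per nesting level (needed here because "s" stands for both `sequence` and `strand`); `keyMap` and `shortenKeys` are hypothetical names, not part of any MongoDB driver.

```javascript
// Hypothetical per-level mapping: long name -> [short name, nested mapping or null].
const keyMap = {
  sequence: ["s", null],
  location: ["l", {
    chromosome: ["c", null],
    strand: ["s", null],
    begin: ["b", null],
    end: ["e", null],
  }],
};

// Returns a copy of `doc` with keys renamed according to `map`.
// Keys without a mapping are kept unchanged.
function shortenKeys(doc, map) {
  const out = {};
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    if (!Object.prototype.hasOwnProperty.call(map, key)) {
      out[key] = value;
      continue;
    }
    const shortName = map[key][0];
    const nested = map[key][1];
    const isPlainObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    out[shortName] = nested && isPlainObject ? shortenKeys(value, nested) : value;
  }
  return out;
}

// Example: the long-form document from the slide.
const exampleDoc = {
  sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  location: { chromosome: "19", strand: "-", begin: "51067007", end: "51067085" },
};
console.log(JSON.stringify(shortenKeys(exampleDoc, keyMap)));
```

Reading data back needs the inverse mapping, and every query or index definition has to use the short names, so this workaround trades storage for an extra translation layer.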
  • 23. Possible Solutions/Workarounds #2: SERVER-863 : Tokenize the field names. https://jira.mongodb.org/browse/SERVER-863 planned but not scheduled. Vote (66)
      “Most collections, even if they don’t contain the same structure , they contain similar. So it would make a lot of sense and save a lot of space to tokenize the field names.”
      “The overall benefit as mentioned by other users is that you reduce the amount of storage/RAM taken up by redundant data in each document (so you can use less resources per request, hence gain more throughput and capacity), while importantly also freeing the developer from having to pick short and hard to read field names as a workaround for a technical limitation.” - Andrew Armstrong
  • 24. Possible Solutions/Workarounds #3: SERVER-164 : Option to store data compressed. https://jira.mongodb.org/browse/SERVER-164 planned but not scheduled. Vote (126)
      “The way oracle handles this is transparent to the database server at the block engine level. They compress the blocks similar to how SAN store's handle it rather than at a record level. They use zlib type compression and the overhead is less than 5 percent. Due to the IO access reduction in both number of blocks touched, and amount of data transferred, the overall effect is a cumulative speed increase. Should MongoDB do it this way? Maybe? But at the end of the day, the architecture must make Mongo more scalable, as well as increase the ability limit the storage footprint.” - Michael D. Joy
  • 25. 4. Community Contribution is Quite Low (image: http://www.thompsoncrg.com/wp-content/themes/zoomtechnic/images/slide/img3.jpg)
  • 26. Community Contribution is Quite Low: https://github.com/mongodb/mongo/graphs/impact https://github.com/mongodb/mongo/contributors
  • 28. 5. Attitude Matters: http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart MongoDB already has the sweetest API in the NoSQL world. I wish more effort were invested in fixing the hard problems: locking, sharding, storage engine...
  • 29. We are hiring. We are doing big data analytics.
      • Backend Engineer (MongoDB, Hadoop, HBase, Storm, Scala, Java, Ruby, Clojure)
      • Data Mining Engineer
      • DevOps Engineer
      • Front End Engineer
      hr@umeng.com
  • 30. Contact
      • Email: stones.gao@gmail.com, gaolei@umeng.com
      • Twitter: @stonegao
  • 31. Q & A. Thanks