Scaling with MongoDB
       Eliot Horowitz
       @eliothorowitz
          MongoSV
      December 3, 2010
Scaling

• Storage needs only go up
• Operations/sec only go up
• Complexity only goes up
Scaling by Optimization

• Schema Design
• Index Design
• Hardware Configuration
Horizontal Scaling

• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
Schema

• Modeling the same data in different ways
  can change performance by orders of
  magnitude
• Very often performance problems can be
  solved by changing Schema
Embedding

• Great for read performance
• One seek to load entire object
• One roundtrip to database
• Writes can be slow if adding to objects all
  the time
Should you embed comments?
             {
                 title : "MongoDB is fun" ,
                 author : "eliot" ,
                 date : "2010-12-03" ,
                 comments : [
                   { author : "bob" , text : "..." } ,
                   { author : "joe" , text : "..." }
                 ]
             }

db.posts.update( { title : "MongoDB is fun" } ,
                 { $push : { comments : { author : "sam" , text : "..." } } } )
Indexes

• Index common queries
• Make sure there aren’t redundant indexes: you don’t
  need an index on (A) if you already have one on (A,B)
• Right-balanced indexes keep working set
  small
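
The redundancy rule above can be sketched as a small check. This is only an illustration, not a MongoDB API: the array-of-pairs index format and the function names `isPrefixIndex` and `redundantIndexes` are assumptions made for the example, and the check is deliberately conservative (it requires matching sort directions and ignores options such as uniqueness).

```javascript
// Returns true when index `a` is a leading prefix of index `b`,
// so queries that could use `a` can use `b` instead.
// Indexes are arrays of [field, direction] pairs (an illustrative format).
function isPrefixIndex(a, b) {
  if (a.length > b.length) return false;
  return a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir);
}

// Lists the indexes made redundant by a longer index in the same set.
function redundantIndexes(indexes) {
  return indexes.filter((a, i) =>
    indexes.some((b, j) => i !== j && a.length < b.length && isPrefixIndex(a, b))
  );
}

const indexes = [
  [["name", 1]],                // (A)
  [["name", 1], ["email", 1]],  // (A,B)
  [["email", 1]],               // (B) is not a prefix of (A,B)
];
const redundant = redundantIndexes(indexes);
console.log(JSON.stringify(redundant));
```

Only the single-field index on `name` is reported; the index on `email` stays, because `email` is not the leading field of the compound index.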
Random Index Access


                 Have to keep entire index in RAM
Right-Balanced Index Access


              Only have to keep small portion in RAM
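
The difference between the two access patterns can be seen in a toy model. The sketch below treats an index as a sorted key space cut into fixed-size pages and counts how many distinct pages a batch of inserts touches. `PAGE_SIZE`, the key counts, and the small pseudo-random generator are assumptions chosen for illustration; none of this reflects MongoDB's actual B-tree layout.

```javascript
// Model an index as a sorted key space split into fixed-size pages.
// PAGE_SIZE and the key ranges are illustrative, not MongoDB internals.
const PAGE_SIZE = 100;

// Number of distinct index pages a set of keys falls into.
function pagesTouched(keys) {
  return new Set(keys.map((k) => Math.floor(k / PAGE_SIZE))).size;
}

// Small deterministic generator so the run is repeatable.
function makeLcg(seed) {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

const existingKeys = 1000000;
const inserts = 1000;

// Increasing keys (timestamps, counters) always land at the right edge.
const increasing = Array.from({ length: inserts }, (_, i) => existingKeys + i);

// Random keys (hashes, UUIDs) land anywhere in the key space.
const rand = makeLcg(42);
const random = Array.from({ length: inserts }, () => Math.floor(rand() * existingKeys));

const increasingPages = pagesTouched(increasing);
const randomPages = pagesTouched(random);
console.log({ increasingPages, randomPages });
```

The increasing keys stay within a handful of pages at the right edge, while the random keys spread over most of the pages they could possibly hit, which is why the random case needs the whole index in RAM.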
Covered Indexes

    db.users.find( { name: "joe" } , { name: 1 , email: 1 , _id: 0 } )
•   Add email to the index so the query can be answered from the index alone
    db.users.ensureIndex( { name : 1 , email : 1 } )
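
A rough way to reason about whether a query like the one above is covered: every field it filters on and every field it returns must be in the index, and `_id` must be excluded unless it is indexed too. The helper below is a simplified sketch of that rule, not how the server decides; `isCovered` and its argument shapes are made up for illustration.

```javascript
// Rough check: a query can be covered when every filtered field and every
// returned field is part of the index.
function isCovered(indexFields, query, projection) {
  const inIndex = (f) => indexFields.includes(f);
  const filterFields = Object.keys(query);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default unless the projection sets _id: 0.
  const returnsId = projection._id !== 0;
  if (returnsId && !inIndex("_id")) return false;
  return filterFields.every(inIndex) && returned.every(inIndex);
}

const index = ["name", "email"];
const withIdExcluded = isCovered(index, { name: "joe" }, { name: 1, email: 1, _id: 0 });
const withIdReturned = isCovered(index, { name: "joe" }, { name: 1, email: 1 });
const emailNotIndexed = isCovered(["name"], { name: "joe" }, { name: 1, email: 1, _id: 0 });
console.log({ withIdExcluded, withIdReturned, emailNotIndexed });
```

Only the first case is covered. Dropping `_id: 0` or leaving `email` out of the index forces a fetch of the full document.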
RAM Requirements

• Understand working set
• What percentage of your data has to fit in
  RAM?
• How do you figure this out?
Hardware

• Disk performance
• How many drives
• What about EC2?
• Network performance
Read Scaling

• One master at any time
• Programmer determines if read hits master
  or a slave
• Pro: easy to set up, can scale reads very well
• Con: reads are inconsistent on a slave
• Writes don’t scale
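
The routing decision described above can be sketched in a few lines. This is a hypothetical model of what an application or driver does, not a real driver API: the node list, the `pickNode` function, and the `slaveOk` option name are assumptions for the example.

```javascript
// Hypothetical node list; host names are illustrative only.
const nodes = [
  { host: "db1", role: "master" },
  { host: "db2", role: "slave" },
  { host: "db3", role: "slave" },
];

let next = 0;

// Writes and consistent reads go to the master; reads that tolerate
// stale data are spread round-robin over the slaves.
function pickNode(op, { slaveOk = false } = {}) {
  const master = nodes.find((n) => n.role === "master");
  if (op === "write" || !slaveOk) return master;
  const slaves = nodes.filter((n) => n.role === "slave");
  if (slaves.length === 0) return master;
  return slaves[next++ % slaves.length];
}

console.log(pickNode("write").host);
console.log(pickNode("read").host);
console.log(pickNode("read", { slaveOk: true }).host);
console.log(pickNode("read", { slaveOk: true }).host);
```

Adding slaves raises read capacity only for reads that opt in to possibly stale data; every write still lands on the single master.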
One Master, Many Slaves


• Custom Master/Slave setup
• Have as many slaves as you want
• Can put them local to application servers
• Good for 90+% read heavy applications
  (Wikipedia)
Replica Sets
• High Availability Cluster
• One master at any time, up to 6 slaves
• A slave automatically promoted to master if
  failure
• Drivers support auto routing of reads to
  slaves if programmer allows
• Good for applications that need high write
  availability but mostly reads (Commenting
  System)
Sharding

• Many masters, even more slaves
• Can scale reads and writes in two
  dimensions
• Add slaves for inconsistent read scaling and
  redundancy
• Add Shards for write and data size scaling
Architecture

  Shards            mongod    mongod    mongod    ...
                    mongod    mongod    mongod

  Config Servers    mongod    mongod    mongod

  Routers           mongos    mongos    ...

                      client
Common Setup
• Typical setup is 3 shards with 3 servers per
  shard: 3 masters, 6 slaves
• One massive sharded collection, a dozen
  non-sharded ones
• Can add sharding later to an existing replica
  set with no downtime
• Can have sharded and non-sharded
  collections
Choosing a Shard Key

• Shard key determines how data is
  partitioned
• Hard to change
• Most important performance decision
Range Based
       MIN          MAX        LOCATION
        A            F           shard1
        F            M           shard1
        M            R           shard2
        R            Z           shard3




• Collection is broken into chunks by range
• Chunks default to 200 MB or 100,000
  objects
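
The range table above can be read as a lookup structure: find the chunk whose range contains the key, then go to that chunk's shard. The sketch below uses the table from the slide and assumes half-open ranges (min inclusive, max exclusive) and plain string comparison; `shardFor` and `shardsForRange` are illustrative names, not part of MongoDB.

```javascript
// Chunk table from the slide: each chunk covers [min, max) and lives on one shard.
const chunks = [
  { min: "A", max: "F", shard: "shard1" },
  { min: "F", max: "M", shard: "shard1" },
  { min: "M", max: "R", shard: "shard2" },
  { min: "R", max: "Z", shard: "shard3" },
];

// Returns the shard that owns `key`, or null when no chunk covers it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// Returns every shard a range query [lo, hi) has to visit.
function shardsForRange(lo, hi) {
  const hit = chunks.filter((c) => lo < c.max && hi > c.min).map((c) => c.shard);
  return [...new Set(hit)];
}

console.log(shardFor("Eliot"));
console.log(shardFor("Mongo"));
console.log(shardsForRange("K", "S"));
```

A lookup on the shard key touches one shard, while a range that spans several chunks fans out to every shard that holds one of them.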
Use Case: User Profiles
  { email : "eliot@10gen.com" ,
      addresses : [ { state : "NY" } ]
  }
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state" : 1 }
Use Case: Activity Stream
  { user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writing events is distributed
• Index on { "event_id" : 1 } for deletes
Use Case: Photos
  { photo_id : ???? , data : <binary> }
  What’s the right key?
• auto increment
• MD5( data )
• now() + MD5(data)
• month() + MD5(data)
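
To make the four candidates concrete, the sketch below builds each one for a single photo. It is an illustration under assumptions: the MD5 hex digest is taken as an input computed elsewhere, the `_` separator and ISO date formatting are arbitrary choices, and `candidateKeys` is a made-up helper. The comments on each key describe the general behaviour of increasing versus random keys, not a recommendation from the slide.

```javascript
// Builds the candidate keys from the slide for one photo.
// `md5Hex` is assumed to be the MD5 hex digest of the photo data.
function candidateKeys(md5Hex, uploadedAt, counter) {
  const iso = uploadedAt.toISOString(); // UTC timestamp string
  return {
    autoIncrement: counter,                       // always increasing
    md5: md5Hex,                                  // effectively random
    nowPlusMd5: iso + "_" + md5Hex,               // increasing, hash breaks ties
    monthPlusMd5: iso.slice(0, 7) + "_" + md5Hex, // random within each month
  };
}

const keys = candidateKeys(
  "9e107d9d372bb6826bd81d3542a419d6", // placeholder digest
  new Date(Date.UTC(2010, 11, 3, 10, 15)),
  42
);
console.log(keys);
```

Increasing keys keep index access at the right edge but send new writes to the same chunk; random keys spread writes across shards at the cost of random index access. The month prefix sits between the two.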
Use Case: Logging
    { machine : "app.foo.com" , app : "apache" ,
     when : "2010-12-02:11:33:14" , data : XXX }
    Possible Shard keys
•   { machine : 1 }
•   { when : 1 }
•   { machine : 1 , app : 1 }
•   { app : 1 }
Right-Balanced Index Access


              Only have to keep small portion in RAM
Download MongoDB
      http://www.mongodb.org

   and let us know what you think
    @eliothorowitz  @mongodb

       10gen is hiring!
http://www.10gen.com/jobs

Editor's Notes

  3. What is scaling? Well - hopefully for everyone here.
  5. EC2 goes up to 64 GB; maybe mention a 256 GB box here? ($30-40k) Maybe you can buy a 256 GB box, but I can spin up 10 EC2 64 GB boxes in 10 minutes.
  7. Not schema-less: dynamic schema. Schema is just as important as in relational databases, or more so. Understand write vs. read tradeoffs.
  8. Compare to MySQL here.
  10. Most common performance problem. Why the _id index can be ignored.
  14. Data looked at per second/minute/hour/day. Are your indexes accessed randomly?
  15. 256 GB RAM: $30-40k.
  21. Don't pre-emptively shard; it's easy to add later.