SlideShare uma empresa Scribd logo
1 de 33
MongoDB: Schema
 Design at Scale


     Rick Copeland
       @rick446
    http://arborian.com
Who am I?

• Now a consultant/trainer, but formerly...
 • Software engineer at SourceForge
 • Author of Essential SQLAlchemy
 • Author of MongoDB with Python and Ming
 • Primarily code Python
The Inspiration
• MongoDB monitoring service
  (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all
  your servers
• Hardware cost is important,
  use it efficiently (remember it’s
  a free service!)
Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-
  minute values
• “Simulation time” is 300x real time
• Run on 2x AWS small instance
 • one MongoDB server (2.0.2)
 • one “load generator”
Load Generator

• Increment each metric as many times as
  possible during the course of a simulated
  minute
• Record number of updates per second
• Occasionally call getLastError to prevent
  disconnects
Schema v1
{
    _id: "20101010/metric-1",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        metric: "metric-1" },                    •   One document
    daily: 5468426,
    hourly: {
                                                     per metric (per
        "00": 227850,                                server) per day
        "01": 210231,
        ...
        "23": 20457 },
    minute: {                                    •   Per hour/minute
        "0000": 3612,                                statistics stored as
        "0001": 3241,
        ...                                          documents
        "1439": 2819 }
}
Update v1
                                     •   Use $inc to
                                         update fields in-
                                         place
increment = { daily: 1 }
increment['hourly.' + hour] = 1
increment['minute.' + minute] = 1
                                     •   Use upsert to
db.stats.update(
  { _id: id, metadata: metadata },
                                         create document
  { $inc: update },                      if it’s missing
  true) // upsert


                                     •   Easy, correct,
                                         seems like a good
                                         idea....
Performance of v1
Performance of v1

              Experiment startup
Performance of v1

                Experiment startup




            OUCH!
Problems with v1

• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
Document movement
     problem
• MongoDB in-place updates are fast
 • ... except when they’re not in place
• MongoDB adaptively pads documents
 • ... but it’s better to know your doc size
    ahead of time
Midnight problem

• Upserts are convenient, but what’s our key?
 • date/metric
• At midnight, you get a huge spike in inserts
Fixing the document
          movement problem
                                     •   Preallocate
db.stats.update(
                                         documents with
  { _id: id, metadata: metadata },       zeros
  { $inc: {
    daily: 0,
    hourly.0: 0,
    hourly.1: 0,
    ...
                                     •   Crontab (?)
    minute.0: 0,
    minute.1: 0,
    ... }                                • NO! (makes
  true) // upsert                          the midnight
                                           problem even
                                           worse)
Fixing the midnight
      problem
Fixing the midnight
         problem
• Could schedule preallocation for different
  metrics, staggered through the day
Fixing the midnight
         problem
• Could schedule preallocation for different
  metrics, staggered through the day
• Observation: Preallocation isn’t required for
  correct operation
Fixing the midnight
         problem
• Could schedule preallocation for different
  metrics, staggered through the day
• Observation: Preallocation isn’t required for
  correct operation
• Let’s just preallocate tomorrow’s docs
  randomly as new stats are inserted (with
  low probability).
Performance with
  Preallocation


     Experiment startup
Performance with
  Preallocation

                          • Well, it’s better
     Experiment startup
Performance with
  Preallocation

                          • Well, it’s better
                          • Still have
     Experiment startup     decreasing
                            performance
                            through the day...
                            WTF?
Performance with
  Preallocation

                          • Well, it’s better
                          • Still have
     Experiment startup     decreasing
                            performance
                            through the day...
                            WTF?
Problems with v1

• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
End-of-day problem
“0000” Value “0001” Value           “1439” Value




•   Bson stores documents as an association list

•   MongoDB must check each key for a match

•   Load increases significantly at the end of the day
    (MongoDB must scan 1439 keys to find the right minute!)
Fixing the end-of-day
                problem
                                             •
{ _id: "20101010/metric-1",
  metadata: {                                    Split up our
    date: ISODate("2000-10-10T00:00:00Z"),
    metric: "metric-1" },                        ‘minute’ property
  daily: 5468426,
  hourly: {                                      by hour
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },                           •   Better worst-case
  minute: {                                      keys scanned:
    "00": {
        "0000": 3612,
        "0100": 3241,
        ...
    }, ...,
                                                 •   Old: 1439
    "23": { ..., "1439": 2819 }
}
                                                 •   New: 82
“Hierarchical minutes”
    Performance
Performance
Comparision
Performance
Comparision (2.2)
Historical Query
         Problem

• Intra-day queries are great
• What about “performance year to date”?
 • Now you’re hitting a lot of “cold”
    documents and causing page faults
Fixing the historical
              query problem
                                             •   Store multiple levels
{ _id: "201010/metric-1",                        of granularity in
  metadata: {
    date: ISODate("2000-10-01T00:00:00Z"),       different collections
    metric: "metric-1" },

                                             •
  daily: {
    "0": 5468426,                                2 updates rather than
    "1": ...,
    ...                                          1, but historical
}
    "31": ... },
                                                 queries much faster

                                             •   Preallocate along with
                                                 daily docs (only
                                                 infrequently upserted)
Queries
db.stats.daily.find( {                           •   Updates are by
    "metadata.date": { $gte: dt1, $lte: dt2 },
    "metadata.metric": "metric-1"},
                                                     _id, so no index
{ "metadata.date": 1, "hourly": 1 } },               needed there
sort=[("metadata.date", 1)])


                                                 •   Chart queries are
                                                     by metadata

db.stats.daily.ensureIndex({
  'metadata.metric': 1,
                                                 •   Your range/sort
  'metadata.date': 1 })                              should be last in
                                                     the compound
                                                     index
Conclusion
• Monitor your performance. Watch out for
  spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your
  documents (hierarchy can help)
• Make sure your index is optimized for your
  sorts
Questions?
          MongoDB Monitoring Service
http://www.10gen.com/mongodb-monitoring-service




                Rick Copeland
                   @rick446
              http://arborian.com
         MongoDB Consulting & Training

Mais conteúdo relacionado

Mais procurados

MySQL Performance Monitoring
MySQL Performance MonitoringMySQL Performance Monitoring
MySQL Performance Monitoringspil-engineering
 
Retaining globally distributed high availability
Retaining globally distributed high availabilityRetaining globally distributed high availability
Retaining globally distributed high availabilityspil-engineering
 
Storing metrics at scale with Gnocchi
Storing metrics at scale with GnocchiStoring metrics at scale with Gnocchi
Storing metrics at scale with GnocchiGordon Chung
 
A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0Krishna Sankar
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB
 
Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)Ontico
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 

Mais procurados (10)

MySQL Performance Monitoring
MySQL Performance MonitoringMySQL Performance Monitoring
MySQL Performance Monitoring
 
Retaining globally distributed high availability
Retaining globally distributed high availabilityRetaining globally distributed high availability
Retaining globally distributed high availability
 
Storing metrics at scale with Gnocchi
Storing metrics at scale with GnocchiStoring metrics at scale with Gnocchi
Storing metrics at scale with Gnocchi
 
Disco workshop
Disco workshopDisco workshop
Disco workshop
 
A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
 
Database TCO
Database TCODatabase TCO
Database TCO
 
Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 

Destaque

MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema DesignMongoDB
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMike Friedman
 
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsys
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsysUsing communication and messaging API in the HTML5 world - GIl Fink, sparXsys
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsysCodemotion Tel Aviv
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Designaaronheckmann
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesHadi Ariawan
 
10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCupWebGeek Philippines
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB MongoDB
 
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsReal World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsMongoDB
 
How Retail Banks Use MongoDB
How Retail Banks Use MongoDBHow Retail Banks Use MongoDB
How Retail Banks Use MongoDBMongoDB
 
How Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBHow Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBMongoDB
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsJared Rosoff
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDBrogerbodamer
 
MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesJared Rosoff
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDBlehresman
 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB
 

Destaque (17)

MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsys
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsysUsing communication and messaging API in the HTML5 world - GIl Fink, sparXsys
Using communication and messaging API in the HTML5 world - GIl Fink, sparXsys
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by Examples
 
10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB
 
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsReal World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
 
How Retail Banks Use MongoDB
How Retail Banks Use MongoDBHow Retail Banks Use MongoDB
How Retail Banks Use MongoDB
 
How Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBHow Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDB
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on Rails
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor Management
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema Design
 

Semelhante a Schema Design at Scale

MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataStefano Dindo
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...festival ICT 2016
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsMassimo Brignoli
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
 
What we do with Go
What we do with GoWhat we do with Go
What we do with GoMarcelLanz
 
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesMongoDB
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB ClusterMongoDB
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without InterferenceTony Tam
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_DataNAVER D2
 
Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Nathan Bijnens
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at ParseMongoDB
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learningIvo Andreev
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBMongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best PracticesLewis Lin 🦊
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of ThingsSam_Francis
 

Semelhante a Schema Design at Scale (20)

MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big Data
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
What we do with Go
What we do with GoWhat we do with Go
What we do with Go
 
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB Cluster
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without Interference
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
 
Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of Things
 

Mais de Rick Copeland

Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Rick Copeland
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB ApplicationRick Copeland
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Chef on MongoDB and Pyramid
Chef on MongoDB and PyramidChef on MongoDB and Pyramid
Chef on MongoDB and PyramidRick Copeland
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBRick Copeland
 
Chef on Python and MongoDB
Chef on Python and MongoDBChef on Python and MongoDB
Chef on Python and MongoDBRick Copeland
 
Real-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioReal-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioRick Copeland
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRick Copeland
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeRick Copeland
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBRick Copeland
 

Mais de Rick Copeland (12)

Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB Application
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Chef on MongoDB and Pyramid
Chef on MongoDB and PyramidChef on MongoDB and Pyramid
Chef on MongoDB and Pyramid
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
Chef on Python and MongoDB
Chef on Python and MongoDBChef on Python and MongoDB
Chef on Python and MongoDB
 
Real-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioReal-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.io
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and Python
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForge
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDB
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Schema Design at Scale

  • 1. MongoDB: Schema Design at Scale Rick Copeland @rick446 http://arborian.com
  • 2. Who am I? • Now a consultant/trainer, but formerly... • Software engineer at SourceForge • Author of Essential SQLAlchemy • Author of MongoDB with Python and Ming • Primarily code Python
  • 3. The Inspiration • MongoDB monitoring service (MMS) • Free to all MongoDB users • Minute-by-minute stats on all your servers • Hardware cost is important, use it efficiently (remember it’s a free service!)
  • 4. Our Experiment • Similar to MMS but not identical • Collection of 100 metrics, each with per- minute values • “Simulation time” is 300x real time • Run on 2x AWS small instance • one MongoDB server (2.0.2) • one “load generator”
  • 5. Load Generator • Increment each metric as many times as possible during the course of a simulated minute • Record number of updates per second • Occasionally call getLastError to prevent disconnects
  • 6. Schema v1 { _id: "20101010/metric-1", metadata: { date: ISODate("2000-10-10T00:00:00Z"), metric: "metric-1" }, • One document daily: 5468426, hourly: { per metric (per "00": 227850, server) per day "01": 210231, ... "23": 20457 }, minute: { • Per hour/minute "0000": 3612, statistics stored as "0001": 3241, ... documents "1439": 2819 } }
  • 7. Update v1 • Use $inc to update fields in- place increment = { daily: 1 } increment['hourly.' + hour] = 1 increment['minute.' + minute] = 1 • Use upsert to db.stats.update( { _id: id, metadata: metadata }, create document { $inc: update }, if it’s missing true) // upsert • Easy, correct, seems like a good idea....
  • 9. Performance of v1 Experiment startup
  • 10. Performance of v1 Experiment startup OUCH!
  • 11. Problems with v1 • The document movement problem • The midnight problem • The end-of-the-day problem • The historical query problem
  • 12. Document movement problem • MongoDB in-place updates are fast • ... except when they’re not in place • MongoDB adaptively pads documents • ... but it’s better to know your doc size ahead of time
  • 13. Midnight problem • Upserts are convenient, but what’s our key? • date/metric • At midnight, you get a huge spike in inserts
  • 14. Fixing the document movement problem • Preallocate db.stats.update( documents with { _id: id, metadata: metadata }, zeros { $inc: { daily: 0, hourly.0: 0, hourly.1: 0, ... • Crontab (?) minute.0: 0, minute.1: 0, ... } • NO! (makes true) // upsert the midnight problem even worse)
  • 16. Fixing the midnight problem • Could schedule preallocation for different metrics, staggered through the day
  • 17. Fixing the midnight problem • Could schedule preallocation for different metrics, staggered through the day • Observation: Preallocation isn’t required for correct operation
  • 18. Fixing the midnight problem • Could schedule preallocation for different metrics, staggered through the day • Observation: Preallocation isn’t required for correct operation • Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
  • 19. Performance with Preallocation Experiment startup
  • 20. Performance with Preallocation • Well, it’s better Experiment startup
  • 21. Performance with Preallocation • Well, it’s better • Still have Experiment startup decreasing performance through the day... WTF?
  • 22. Performance with Preallocation • Well, it’s better • Still have Experiment startup decreasing performance through the day... WTF?
  • 23. Problems with v1 • The document movement problem • The midnight problem • The end-of-the-day problem • The historical query problem
  • 24. End-of-day problem “0000” Value “0001” Value “1439” Value • Bson stores documents as an association list • MongoDB must check each key for a match • Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
  • 25. Fixing the end-of-day problem • { _id: "20101010/metric-1", metadata: { Split up our date: ISODate("2000-10-10T00:00:00Z"), metric: "metric-1" }, ‘minute’ property daily: 5468426, hourly: { by hour "0": 227850, "1": 210231, ... "23": 20457 }, • Better worst-case minute: { keys scanned: "00": { "0000": 3612, "0100": 3241, ... }, ..., • Old: 1439 "23": { ..., "1439": 2819 } } • New: 82
  • 29. Historical Query Problem • Intra-day queries are great • What about “performance year to date”? • Now you’re hitting a lot of “cold” documents and causing page faults
  • 30. Fixing the historical query problem • Store multiple levels { _id: "201010/metric-1", of granularity in metadata: { date: ISODate("2000-10-01T00:00:00Z"), different collections metric: "metric-1" }, • daily: { "0": 5468426, 2 updates rather than "1": ..., ... 1, but historical } "31": ... }, queries much faster • Preallocate along with daily docs (only infrequently upserted)
  • 31. Queries db.stats.daily.find( { • Updates are by "metadata.date": { $gte: dt1, $lte: dt2 }, "metadata.metric": "metric-1"}, _id, so no index { "metadata.date": 1, "hourly": 1 } }, needed there sort=[("metadata.date", 1)]) • Chart queries are by metadata db.stats.daily.ensureIndex({ 'metadata.metric': 1, • Your range/sort 'metadata.date': 1 }) should be last in the compound index
  • 32. Conclusion • Monitor your performance. Watch out for spikes. • Preallocate to prevent document copying • Pay attention to the number of keys in your documents (hierarchy can help) • Make sure your index is optimized for your sorts
  • 33. Questions? MongoDB Monitoring Service http://www.10gen.com/mongodb-monitoring-service Rick Copeland @rick446 http://arborian.com MongoDB Consulting & Training

Notas do Editor

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n