Scaling Rails @ Yottaa
  Jared Rosoff (@forjared, jrosoff@yottaa.com)
  September 20th 2010
From zero to humongous
  • About our application
  • How we chose MongoDB
  • How we use MongoDB
About our application
  • We collect lots of data
      • 6000+ URLs
      • 300 samples per URL per day
      • Some samples are >1MB (Firebug)
  • Missing a sample isn’t a big deal
  • We visualize data in real time
      • No delay when showing data
  • “On-demand” samples
      • The “check now” button
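As a rough back-of-the-envelope check (not from the slides), the figures above translate into an average sample rate as follows. The numbers are the slide's lower bounds, and real traffic is unlikely to be perfectly even across the day.

```javascript
// Figures taken from the slide; "6000+" means these are lower bounds.
const urls = 6000;
const samplesPerUrlPerDay = 300;

const samplesPerDay = urls * samplesPerUrlPerDay; // total samples collected per day
const secondsPerDay = 24 * 60 * 60;
const samplesPerSecond = samplesPerDay / secondsPerDay; // average rate, ignoring bursts

console.log(samplesPerDay);               // 1800000
console.log(samplesPerSecond.toFixed(1)); // 20.8
```

Each sample can trigger several document updates, so the update rate on the database can be a multiple of this figure.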
The Yottaa Network
How we chose MongoDB
Requirements
  • Our data set is going to grow very quickly
      • Scalable by default
  • We have a very small team
      • Focus on application, not infrastructure
  • We are a startup
      • Requirements change hourly
  • Operations
      • We’re 100% in the cloud
Rails default architecture
  [Diagram: Data Source, Collection Server, MySQL, Reporting Server, User]
  • “Just” a Rails app
  • Performance bottleneck: too much load
Let’s add replication!
  [Diagram: Data Source, Collection Server, replicated MySQL servers, Reporting Server, User]
  • Off the shelf!
  • Scalable reads!
  • Performance bottleneck: still can’t scale writes
What about sharding?
  [Diagram: Data Source, Collection Server, sharding layer, three MySQL masters, sharding layer, Reporting Server, User]
  • Scalable writes!
  • Development bottleneck: need to write custom code
Key-value stores to the rescue?
  [Diagram: Data Source, Collection Server, Cassandra or Voldemort, MySQL masters, Reporting Server, User]
  • Scalable writes!
  • Development bottleneck: reporting is limited / hard
Can I Hadoop my way out of this?
  [Diagram: Data Source, Collection Server, Cassandra or Voldemort, Hadoop, MySQL masters and a MySQL slave, Reporting Server, User]
  • Scalable writes!
  • Flexible reports!
  • “Just” a Rails app
  • Development bottleneck: too many systems!
MongoDB!
  [Diagram: Data Source, Collection Server, MongoDB, MySQL masters, Reporting Server, User]
  • Scalable writes!
  • Flexible reporting!
  • “Just” a Rails app
Deployment architecture
  [Diagram: Data Source, Load Balancer, App Server, Nginx, Passenger, Collection, Reporting, mongos, three mongod servers, User]
  • Sharding!
  • High concurrency
  • Scale-out
Sharding is critical
  • Distribute write load across servers
  • Decentralize data storage
  • Scale out!
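To make "distribute write load across servers" concrete, here is a minimal sketch of routing writes by key. It is a generic hash-modulo illustration, not how mongos routes requests (MongoDB partitions a collection into chunks by shard-key range). The shard names and the `shardFor` helper are hypothetical.

```javascript
// Hypothetical shard list; the names are illustrative only.
const shards = ["shard-a", "shard-b", "shard-c"];

// FNV-1a 32-bit string hash, kept in the unsigned 32-bit range.
function hashKey(key) {
  let h = 2166136261;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

// Pick the shard that owns a given URL; the same URL always maps to the same shard.
function shardFor(url) {
  return shards[hashKey(url) % shards.length];
}

// Count how writes for many (made-up) URLs spread across the shards.
const counts = Object.fromEntries(shards.map((s) => [s, 0]));
for (let i = 0; i < 6000; i++) {
  counts[shardFor(`www.example${i}.com`)] += 1;
}
console.log(counts);
```

Adding a server to a plain modulo scheme like this reshuffles most keys, which is one reason real systems use ranges or consistent hashing instead.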
Before sharding
  [Diagram: three App Servers]
  • Need higher write volume? Buy a bigger database
  • Need more storage volume? Buy a bigger database
After sharding
  [Diagram: three App Servers]
  • Need higher write volume? Add more servers
  • Need more storage volume? Add more servers
Scale out is the new scale up
  [Diagram: three App Servers]
How we’re using MongoDB
Our data model
  • Document per URL we track
      • Metadata
      • Summary data
      • Most recent measurements
  • Document per URL per day
      • Detailed metrics
      • Pre-aggregated data
Thinking in rows
    { url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
    { url: 'www.google.com', location: 'NYC', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }
  • What was the average connect time for Google on Friday?
      • From SFO?
      • From NYC?
      • Between 1 AM and 2 AM?
  [Diagram: the samples for Day 1, Day 2, Day 3, ... are each averaged (AVG) into the result, over a 30-day query range]
  • Up to hundreds of samples per URL per day!
  • An “average” chart had to hit 600 rows
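A minimal sketch of what answering those questions looks like in the row model: every matching sample has to be read and averaged at query time. The two rows are the ones from the slide. `averageConnect` is a hypothetical helper that runs over an in-memory array rather than a database.

```javascript
// The two sample rows shown on the slide.
const rows = [
  { url: "www.google.com", location: "SFO", connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 },
  { url: "www.google.com", location: "NYC", connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 },
];

// Average connect time over all rows accepted by `filter`; null when nothing matches.
function averageConnect(allRows, filter) {
  const matching = allRows.filter(filter);
  if (matching.length === 0) return null;
  const sum = matching.reduce((acc, r) => acc + r.connect, 0);
  return sum / matching.length;
}

const all = averageConnect(rows, (r) => r.url === "www.google.com");
const sfo = averageConnect(rows, (r) => r.url === "www.google.com" && r.location === "SFO");
console.log(all, sfo); // 23 23
```

The cost of each question grows with the number of raw rows in the range, which is the problem the document model addresses.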
Thinking in documents
  [Annotated example document; the document itself was not captured in this transcript]
  • This document contains all data for www.google.com collected during 9/20/2010
  • This tells us the average value for this metric for this URL / time period
  • Average value from SFO
  • Average value from NYC
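The example document from this slide did not survive extraction. As a sketch of its likely shape, the field names below follow the `$inc` keys used on the "Storing a sample" slide. The numbers and the `nyc` bucket are made up for illustration, and `average` is a hypothetical helper.

```javascript
// Hypothetical daily document: one per URL per day, holding running sums and counts.
const daily = {
  url: "www.google.com",
  day: "9/20/2010",
  connect: {
    sum: 4600,
    count: 200,
    sfo: { sum: 2500, count: 100 },
    nyc: { sum: 2100, count: 100 },
  },
};

// An average is derived as sum / count; null when the bucket has no samples.
function average(bucket) {
  return bucket && bucket.count > 0 ? bucket.sum / bucket.count : null;
}

console.log(average(daily.connect));     // 23
console.log(average(daily.connect.sfo)); // 25
console.log(average(daily.connect.nyc)); // 21
```

Storing sums and counts rather than the average itself means each new sample can be folded in with a plain increment.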
Storing a sample
    db.metrics.dailies.update(
      { url: 'www.google.com', day: '9/20/2010' },  // which document we're updating
      { '$inc': {                                   // atomically update the document
          'connect.sum': 1234,                      // update the aggregate value
          'connect.count': 1,
          'connect.sfo.sum': 1234,                  // update the location-specific value
          'connect.sfo.count': 1 } },
      { upsert: true }                              // create the document if it doesn't already exist
    );
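To show what repeated upserts do to the document, here is a small in-memory sketch. `buildInc` and `applyUpsertInc` are hypothetical helpers, and the `Map` is only a stand-in for the collection, not the MongoDB driver. It models the create-if-missing and increment behavior but not the atomicity guarantee.

```javascript
// Build the "$inc" document for one sample of one metric (hypothetical helper).
function buildInc(metric, location, value) {
  const loc = location.toLowerCase();
  return {
    $inc: {
      [`${metric}.sum`]: value,
      [`${metric}.count`]: 1,
      [`${metric}.${loc}.sum`]: value,
      [`${metric}.${loc}.count`]: 1,
    },
  };
}

// In-memory stand-in for update(query, update, { upsert: true }).
// Seeds a missing document from the query, then adds each increment at its dotted path.
function applyUpsertInc(store, query, update) {
  const key = `${query.url}|${query.day}`;
  const doc = store.has(key) ? store.get(key) : { ...query };
  for (const [path, amount] of Object.entries(update.$inc)) {
    const parts = path.split(".");
    const leaf = parts.pop();
    let node = doc;
    for (const part of parts) {
      if (node[part] === undefined) node[part] = {};
      node = node[part];
    }
    node[leaf] = (node[leaf] || 0) + amount;
  }
  store.set(key, doc);
  return doc;
}

const store = new Map();
const query = { url: "www.google.com", day: "9/20/2010" };
applyUpsertInc(store, query, buildInc("connect", "SFO", 1234)); // creates the document
const doc = applyUpsertInc(store, query, buildInc("connect", "NYC", 1000)); // increments it
console.log(doc.connect.sum, doc.connect.count); // 2234 2
console.log(doc.connect.sfo, doc.connect.nyc);
```

The first call creates the document and the second only increments it, so the writer never needs to read before writing.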
Putting it together
    { url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
  For each incoming sample:
  1. Atomically update the daily data
  2. Atomically update the weekly data
  3. Atomically update the monthly data
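A sketch of how one sample could fan out into the three updates. Only `metrics.dailies` appears in the slides. The `metrics.weeklies` and `metrics.monthlies` collection names, the key formats, the Monday-based week, and the millisecond timestamp are all assumptions made for illustration.

```javascript
// Derive day / week / month keys (UTC) from a Date.
// The key formats are an assumption; the slides use strings like '9/20/2010'.
function periodKeys(date) {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  const month = day.slice(0, 7);               // "YYYY-MM"
  // Week key: the Monday (UTC) that starts the week, an arbitrary convention.
  const daysSinceMonday = (date.getUTCDay() + 6) % 7;
  const monday = new Date(Date.UTC(
    date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate() - daysSinceMonday));
  const week = monday.toISOString().slice(0, 10);
  return { day, week, month };
}

// Describe the three upserts for one sample; nothing is sent to a database here.
function updatesForSample(sample) {
  const keys = periodKeys(new Date(sample.timestamp));
  const update = { $inc: { "connect.sum": sample.connect, "connect.count": 1 } };
  return [
    { collection: "metrics.dailies", query: { url: sample.url, day: keys.day }, update },
    { collection: "metrics.weeklies", query: { url: sample.url, week: keys.week }, update },
    { collection: "metrics.monthlies", query: { url: sample.url, month: keys.month }, update },
  ];
}

const sample = {
  url: "www.google.com", location: "SFO", connect: 23,
  timestamp: Date.UTC(2010, 8, 22, 12), // Wednesday 22 September 2010, in milliseconds
};
const keys = periodKeys(new Date(sample.timestamp));
console.log(keys.day, keys.week, keys.month); // 2010-09-22 2010-09-20 2010-09
const updates = updatesForSample(sample);
for (const u of updates) console.log(u.collection, JSON.stringify(u.query));
```

Each of the three updates is atomic on its own document. They are not atomic as a group, so a failure partway through can leave the three levels briefly out of step.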
Drawing connect time graph
    db.metrics.dailies.find(
      { url: 'www.google.com',                               // data for Google
        day: { '$gte': '9/1/2010', '$lte': '9/20/2010' } },  // the range of dates for the chart
      { 'connect': true }                                    // we just want connect time data
    );
  Compound index to make this query fast:
    db.metrics.dailies.ensureIndex({ url: 1, day: -1 })
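One caveat worth checking, which is not from the slides: if `day` really is stored as an 'M/D/YYYY' string, a range query compares strings character by character, and that order is not chronological. The sketch below shows the effect with plain JavaScript string comparison and then turns a couple of made-up daily documents into chart points. `toSortableDay` is a hypothetical helper.

```javascript
// Character-by-character comparison: "3" sorts after "2", so 9/3 falls outside the range.
console.log("9/3/2010" <= "9/20/2010");    // false
console.log("2010-09-03" <= "2010-09-20"); // true

// Convert the slide's M/D/YYYY form into a sortable YYYY-MM-DD key.
function toSortableDay(mdy) {
  const [m, d, y] = mdy.split("/");
  return `${y}-${m.padStart(2, "0")}-${d.padStart(2, "0")}`;
}
console.log(toSortableDay("9/3/2010")); // 2010-09-03

// Made-up query results, one document per day, mapped to chart points.
const docs = [
  { day: "2010-09-01", connect: { sum: 300, count: 10 } },
  { day: "2010-09-02", connect: { sum: 500, count: 20 } },
];
const points = docs.map((d) => ({ day: d.day, avg: d.connect.sum / d.connect.count }));
console.log(points.map((p) => p.avg)); // [ 30, 25 ]
```

Storing the day as a zero-padded ISO string or as a real date value would keep the `$gte` / `$lte` range and the `{ url: 1, day: -1 }` index in chronological order.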
More efficient charts
  [Diagram: one document each for Day 1, Day 2, Day 3, ... averaged (AVG) into the result]
  • 1 document per URL per day
  • 30 days == 30 documents
  • Average chart hits 30 documents instead of 600 rows: 20x fewer
Real-time updates
  • Single query to fetch all metric data for a URL
  • Fast enough that the browser can poll constantly for updated data without impacting the server
Final thoughts
  • Mongo has been a great choice
  • 80 GB of data and counting
      • Majorly compressed after moving from a table-oriented to a document-oriented data model
  • Hundreds of updates per second, 24x7
  • Not using sharding in production yet, but planning on it soon
  • You are using replication, right?

Realtime Analytics with MongoDB
