Yottaa Inc.
2 Canal Park 5th Floor
Cambridge MA 02141
http://www.yottaa.com
Realtime Analytics with MongoDB & Rails
Jared Rosoff
@forjared
forjared@gmail.com
Overview
• About Yottaa
• Engineering challenges
• Approaches we considered
• How we did it
• How it works
©Yottaa Confidential. Do Not Distribute.
Who’s driving your website?
http://stop-the-damage.com/2010/08/276/
We can help you make it faster
OMG!! 15 seconds? WTF?
Knowing is half the battle
San Francisco
Washington DC
London
Data data everywhere
• We collect lots of data
– 14,000+ URLs being tracked
– Up to 300 samples per URL per day
– Some samples are >1 MB (Firebug)
– Missing a sample isn’t a big deal
• We try to make everything real time
– No batch jobs, everything displayed as it happens
– “Check Now” button runs tests on demand
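As a rough upper bound, the figures above imply the sample rate below. This is only back-of-the-envelope arithmetic on the slide's numbers: it treats "14,000+" as exactly 14,000 and assumes every URL is sampled the full 300 times a day. The function name is made up for illustration.

```javascript
// Upper-bound sample rate implied by the numbers on this slide.
function maxSampleRate(urlCount, samplesPerUrlPerDay) {
  const perDay = urlCount * samplesPerUrlPerDay;
  const perSecond = perDay / (24 * 60 * 60);
  return { perDay, perSecond };
}

const rate = maxSampleRate(14000, 300);
console.log(rate.perDay);                // 4200000
console.log(rate.perSecond.toFixed(1));  // 48.6
```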
Demo!
Engineering Challenges
• High write volume from day 1
– Sample collection is like having millions of users on the first day
– After 60 days, we have > 150GB of data
– Adding about 5 GB/day today
• Small engineering team
– 1 built data warehouse & portal, 1 built monitoring agents
– Bigger team now, but this was how we started
• Must be Agile
– We didn’t know exactly what features we’d need
– Requirements change daily
• Limited operations budget
– No full time operations staff
– 100% in the cloud
Rails default architecture
[Diagram: Data Source, Collection Server, MySQL, Reporting Server, User]
“Just” a Rails App
Performance Bottleneck: Too much load
Let’s add replication!
[Diagram: Data Source, Collection Server, MySQL Master, Replication, MySQL Slave, Reporting Server, User]
Off the shelf!
Scalable Reads!
Performance Bottleneck: Still can’t scale writes
What about sharding?
[Diagram: Data Source, Collection Server, sharded MySQL Masters, Reporting Server, User]
Sharding
Scalable Writes!
Development Bottleneck: Need to write custom code
Key Value stores to the rescue?
[Diagram: Data Source, Collection Server, Cassandra or Voldemort, Reporting Server, User]
Scalable Writes!
Development Bottleneck: Reporting is limited / hard
Can I Hadoop my way out of this?
[Diagram: Data Source, Collection Server, Cassandra or Voldemort, Hadoop, MySQL Master and Slave, Reporting Server, User]
Scalable Writes!
Flexible Reports!
“Just” a Rails App
Development Bottleneck: Too many systems!
MongoDB!
[Diagram: Data Source, Collection Server, MongoDB, Reporting Server, User]
Scalable Writes!
“Just” a Rails app
Flexible Reporting!
[Diagram: Data Source and User, Load Balancer, App Server (Nginx, Passenger, Mongos; Collection and Reporting), three MongoD processes]
Sharding!
High Concurrency
Scale-Out
Easy as Rails!
3 Steps to Real Time Analytics
1. Collect data
2. Store Data
3. Display Reports
Collecting Data
[Diagram: several Data Sources, a Load Balancer, several Collection Servers]
POST http://collector.com/samples
We use Amazon ELB
We use Amazon EC2
Collecting Data
- Sample data is passed in the body of the POST request
- Rails makes it really easy to parse JSON, XML, YAML (we use JSON)
- We have a bunch of other stuff that happens when data arrives, but all you really need to do is write the data
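A minimal sketch of the "parse the body, then write the data" step, in plain JavaScript rather than the Rails code the talk refers to. The field names come from the sample shown in this talk; `parseSample` and its validation rules are assumptions made for illustration, not the collector's actual code.

```javascript
// Hypothetical helper: turn a POST body into a sample object, or throw.
const REQUIRED_NUMERIC = ['connect', 'first_byte', 'last_byte', 'timestamp'];

function parseSample(body) {
  const sample = JSON.parse(body); // throws SyntaxError on malformed JSON
  if (sample === null || typeof sample !== 'object') {
    throw new Error('sample must be a JSON object');
  }
  if (typeof sample.url !== 'string' || typeof sample.location !== 'string') {
    throw new Error('sample needs string "url" and "location" fields');
  }
  for (const field of REQUIRED_NUMERIC) {
    if (typeof sample[field] !== 'number') {
      throw new Error(`sample needs a numeric "${field}" field`);
    }
  }
  return sample;
}

const body = '{"url":"www.google.com","location":"SFO","connect":23,' +
             '"first_byte":123,"last_byte":245,"timestamp":1234}';
console.log(parseSample(body).connect); // 23
```

Because missing a sample isn't a big deal here, a collector built this way could simply drop any body that fails to parse instead of retrying.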
A Sample Sample!
{
  url: 'www.google.com',
  location: 'SFO',
  connect: 23,
  first_byte: 123,
  last_byte: 245,
  timestamp: 1234
}
A more complicated example
"{"location":"aws-us-east","timestamp":"08/05/2010 07:11:54","http_archive":{"log":{"creator":{"name":"Firebug","version":"1.4.3"},"version":"1.1",
"pages":[{"title":"\u4e2d\u56fd\u7f51\u7edc\u7535\u89c6\u53f0-CNTV","id":"page_0","startedDateTime":"2010-08-05T08:11:51.897 01:00","pageTimings":{"onContentLoad":1883,"onLoad":2828}}],
"entries":[
{"timings":{"connect":null,"wait":561,"blocked":null,"receive":19,"send":0,"dns":0},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":2067,"content":{"size":4467,"mimeType":"text/html"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":580,"startedDateTime":"2010-08-05T08:11:51.897 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cntv.cn/","httpVersion":"HTTP/1.1","bodySize":-1}},
{"timings":{"connect":null,"wait":188,"blocked":null,"receive":1,"send":0,"dns":0},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":740,"content":{"size":740,"mimeType":"image/jpeg"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":370,"startedDateTime":"2010-08-05T08:11:52.481 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_bg.jpg","httpVersion":"HTTP/1.1","bodySize":-1}},
{"timings":{"connect":null,"wait":3,"blocked":null,"receive":1,"send":0,"dns":1280},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":2933,"content":{"size":7377,"mimeType":"application/x-javascript"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":1285,"startedDateTime":"2010-08-05T08:11:52.483 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cctv.com/Library/a2.js","httpVersion":"HTTP/1.1","bodySize":-1}},
{"timings":{"connect":null,"wait":171,"blocked":null,"receive":83,"send":0,"dns":363},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":76508,"content":{"size":76508,"mimeType":"image/png"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":716,"startedDateTime":"2010-08-05T08:11:52.489 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_top.png","httpVersion":"HTTP/1.1","bodySize":-1}},
{"timings":{"connect":null,"wait":156,"blocked":null,"receive":1,"send":0,"dns":472},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":5351,"content":{"size":5351,"mimeType":"image/png"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":629,"startedDateTime":"2010-08-05T08:11:52.490 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_link.png","httpVersion":"HTTP/1.1","bodySize":-1}},
{"timings":{"connect":null,"wait":147,"blocked":null,"receive":0,"send":0,"dns":470},"response":{"statusText":"OK","headersSize":-1,"httpVersion":"HTTP/1.1","bodySize":2068,"content":{"size":2068,"mimeType":"image/png"},"status":200,"redirectURL":""},"cache":{},"pageref":"page_0","time":617,"startedDateTime":"2010-08-05T08:11:52.492 01:00","request":{"headersSize":-1,"method":"GET","url":"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_bottom.png","httpVersion":"HTTP/1.1","bodySize":-
Thinking in rows
{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
{ url: 'www.google.com', location: 'NYC', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }

URL             Location  Connect  First Byte  Last Byte  Timestamp
www.google.com  SFO       23       123         245        1234
www.google.com  NYC       23       123         245        2345
Thinking in rows
What was the average connect time for Google on Friday?
From SFO?
From NYC?
Between 1AM-2AM?
Thinking in rows
[Diagram: table rows grouped into Day 1, Day 2, Day 3; an AVG over each day's rows feeds the Result]
Up to 100s of samples per URL per day!!
30 days average query range
An “average” chart had to hit 600 rows
Thinking in Documents
URL: www.google.com
Day: 9/20/2010
Last Byte:
  Sum: 2312, Count: 12 (this tells us the average value for this metric for this URL / time period)
  SFO: Sum: 1200, Count: 5 (average value from SFO)
  NYC: Sum: 1112, Count: 7 (average value from NYC)
This document contains all data for www.google.com collected during 9/20/2010
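Reading an average out of such a document is a single division, since the sums and counts are already stored. The sketch below uses the numbers from this slide; the nested field names (`last_byte`, `sfo`, `nyc`) are an assumed layout modelled on the `connect.sfo.sum` keys used later in the talk, and `average` is a hypothetical helper.

```javascript
// The daily document from this slide, written as a plain object.
const daily = {
  url: 'www.google.com',
  day: '9/20/2010',
  last_byte: {
    sum: 2312, count: 12,          // all locations
    sfo: { sum: 1200, count: 5 },  // San Francisco only
    nyc: { sum: 1112, count: 7 },  // New York only
  },
};

// Average of a { sum, count } pair; null when nothing was recorded.
function average(bucket) {
  return bucket && bucket.count > 0 ? bucket.sum / bucket.count : null;
}

console.log(average(daily.last_byte).toFixed(1));     // 192.7
console.log(average(daily.last_byte.sfo));            // 240
console.log(average(daily.last_byte.nyc).toFixed(1)); // 158.9
```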
Storing a sample
Atomically update the document:
db.metrics.dailies.update(
  // which document we're updating
  { url: 'www.google.com',
    day: new Date(2010, 9, 2) },    // note: JS months are 0-based, so 9 is October
  { '$inc': {
      'connect.sum': 1234,          // update the aggregate value
      'connect.count': 1,
      'connect.sfo.sum': 1234,      // update the location specific value
      'connect.sfo.count': 1 } },
  true  // upsert: create the document if it doesn't already exist
);
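To make the effect of that update concrete, here is a sketch that builds the `$inc` document from a sample and applies it to a plain object. It runs without a database: `applyIncrement` is an in-memory stand-in for what an upserting `$inc` does (a missing field starts at the increment amount), and both function names and the metric list are assumptions for illustration.

```javascript
const METRICS = ['connect', 'first_byte', 'last_byte'];

// Build the { $inc: ... } document for one sample.
function buildIncrement(sample) {
  const loc = sample.location.toLowerCase();
  const inc = {};
  for (const metric of METRICS) {
    inc[`${metric}.sum`] = sample[metric];          // aggregate value
    inc[`${metric}.count`] = 1;
    inc[`${metric}.${loc}.sum`] = sample[metric];   // location specific value
    inc[`${metric}.${loc}.count`] = 1;
  }
  return { $inc: inc };
}

// In-memory stand-in for an upserting $inc, to show the effect on a document.
function applyIncrement(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split('.');
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (!(key in node)) node[key] = {};
      node = node[key];
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const doc = {};
applyIncrement(doc, buildIncrement({ location: 'SFO', connect: 23, first_byte: 123, last_byte: 245 }));
applyIncrement(doc, buildIncrement({ location: 'NYC', connect: 27, first_byte: 130, last_byte: 250 }));
console.log(doc.connect.sum, doc.connect.count); // 50 2
console.log(doc.connect.sfo.sum);                // 23
```

Because each sample only ever adds to counters, the order in which samples arrive does not change the final totals.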
An example document
{
  "_id": ObjectId("4bb55c59c3666e02fc000001"),
  "url": "http://www.google.com/",
  "date": "Mon Jun 07 2010 00:00:00 GMT",
  "connect": {
    "sum": 999,               // sum of all the locations
    "sum_of_squares": 99999,
    "count": 99,
    "san_francisco": {
      "sum": 555,             // sum of this location
      "sum_of_squares": 55555,
      "count": 55,
      "values": [
        ["Mon Jun 07 2010 20:00:00 GMT", 12],
        ["Mon Jun 07 2010 20:10:00 GMT", 13],
        .........
      ]
    },
    ...
  }
}
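The example document stores `sum_of_squares` next to `sum` and `count`. The talk does not say what it is used for; one plausible use is deriving a standard deviation without re-reading the raw samples, which the sketch below shows. `stats` is a hypothetical helper, and it computes the population (not sample) standard deviation.

```javascript
// Mean and population standard deviation from running totals only.
function stats({ sum, sum_of_squares, count }) {
  if (count === 0) return null;
  const mean = sum / count;
  // Var(X) = E[X^2] - (E[X])^2; clamp at 0 to absorb floating-point error.
  const variance = Math.max(0, sum_of_squares / count - mean * mean);
  return { mean, stddev: Math.sqrt(variance) };
}

// Values 2, 4, 4, 4, 5, 5, 7, 9: sum = 40, sum of squares = 232, count = 8.
console.log(stats({ sum: 40, sum_of_squares: 232, count: 8 })); // { mean: 5, stddev: 2 }
```

All three totals can be maintained with `$inc`, so this fits the same atomic-update pattern as the sums and counts.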
Putting it together
{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
1. Atomically update the daily data
2. Atomically update the weekly data
3. Atomically update the monthly data
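Each of the three updates needs a key that says which day, week, or month the sample belongs to. The sketch below derives those keys from a sample's time. It assumes UTC buckets and weeks that start on Monday; the talk specifies neither, and the function names are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Midnight (UTC) of the day the sample was taken.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Midnight (UTC) of the Monday that starts the sample's week.
function weekBucket(date) {
  const day = dayBucket(date);
  const daysSinceMonday = (day.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return new Date(day.getTime() - daysSinceMonday * MS_PER_DAY);
}

// Midnight (UTC) of the first day of the sample's month.
function monthBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1));
}

const t = new Date('2010-09-22T20:10:00Z'); // a Wednesday
console.log(dayBucket(t).toISOString());   // 2010-09-22T00:00:00.000Z
console.log(weekBucket(t).toISOString());  // 2010-09-20T00:00:00.000Z
console.log(monthBucket(t).toISOString()); // 2010-09-01T00:00:00.000Z
```

The same `$inc` document can then be sent three times, once per bucket, each with its own key in the query part of the update.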
Sharding our Data
[Diagram: Collection Server and Reporting Server in front of Shard 1 through Shard 4; URL 1 through URL 8 are spread across the shards]
Shard by URL
Write load evenly distributed
Most reads hit a single shard
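Why sharding by URL gives both properties can be seen with a toy router. This is an illustration only: MongoDB itself splits the shard key range into chunks and lets mongos route requests, and it does not use the function below. `shardFor` and its hash are assumptions made for the sketch.

```javascript
// Illustration only: a toy hash router, NOT how mongos picks a shard.
function shardFor(url, shardCount) {
  let hash = 0;
  for (const ch of url) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % shardCount;
}

// Every document for one URL maps to the same shard, so a chart for that
// URL reads from one place, while different URLs spread across shards.
const shard = shardFor('www.google.com', 4);
console.log(shard === shardFor('www.google.com', 4)); // true
```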
Drawing connect time graph
// Compound index to make this query fast
db.metrics.dailies.ensureIndex({ url: 1, day: -1 })

db.metrics.dailies.find(
  { url: 'www.google.com',                      // data for google
    day: { '$gte': new Date(2010, 9, 1),        // the range of dates for the chart
           '$lte': new Date(2010, 9, 30) } },   // note: JS months are 0-based, so 9 is October
  { 'connect': true }  // we just want connect time data, but we can include as many metrics as we want
);
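Once the daily documents come back, turning them into chart points is one division per document. The sketch below assumes documents shaped like the ones in this talk (`day` plus a metric holding `sum`, `count`, and optional per-location buckets); `toSeries` is a hypothetical helper and the numbers are made up.

```javascript
// Turn daily documents into chart points, oldest day first.
function toSeries(dailyDocs, metric, location) {
  return dailyDocs
    .map((doc) => {
      const bucket = location ? (doc[metric] || {})[location] : doc[metric];
      if (!bucket || !bucket.count) return null; // no samples that day
      return { day: doc.day, avg: bucket.sum / bucket.count };
    })
    .filter((point) => point !== null)
    .sort((a, b) => a.day - b.day);
}

const docs = [
  { day: new Date(Date.UTC(2010, 8, 2)), connect: { sum: 90, count: 3, sfo: { sum: 40, count: 2 } } },
  { day: new Date(Date.UTC(2010, 8, 1)), connect: { sum: 50, count: 2, sfo: { sum: 50, count: 2 } } },
];
console.log(toSeries(docs, 'connect').map((p) => p.avg));        // [ 25, 30 ]
console.log(toSeries(docs, 'connect', 'sfo').map((p) => p.avg)); // [ 25, 20 ]
```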
More efficient charts
[Diagram: one document (URL, Day, <data>) each for Day 1, Day 2, Day 3; an AVG from each document feeds the Result]
1 document per URL per day
30 days == 30 documents
Average chart hits 30 documents. 20x fewer
Real Time Updates
[Diagram: URL, Most Recent Data]
Single query to fetch all metric data for a URL
Fast enough that the browser can poll constantly for updated data without impacting the server
Evaluation
• High write volume
– Currently handling 1000s of db writes per second on a single MongoDB server
– Adding ~5GB per day
• Small Engineering Team
– Core system built by 2 engineers in <1 month
• Agile
– BDD using Rails
• Limited operations budget
– Runs on a handful of EC2 instances
– No major issues
Final thoughts
• Love MongoDB. (It’s now my default when starting a new project)
• Using MongoMapper as ORM, but think there must be a better way, more in tune with the document model rather than a port of AR
• There’s magic in documents, but it requires thinking about your data in new ways.
Q & A
Thank you for viewing

Realtime Analytics with MongoDB - MongoDB Meetup NYC