Appboy Analytics
Jon Hyman
NY MongoDB User Group, November 19, 2013
eBay NYC

@appboy @jon_hyman
A LITTLE BIT ABOUT US & APPBOY
(who we are and what we do)

Appboy is a mobile relationship management platform for apps
Jon Hyman
CIO :: @jon_hyman
Harvard
Bridgewater
Appboy improves engagement by helping you understand your app users
• IDENTIFY - Understand demographics, social and behavioral data
• SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
• ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
Use Case: Customer engagement begins with onboarding

Urban Outfitters

textPlus

Shape Magazine
Agenda
• How to quickly store time series data in MongoDB using flexible schemas
• Learn how flexible schemas can easily provide breakdowns across dimensions
• Counting quickly: statistical analysis on top of MongoDB queries
What kinds of analytics does Appboy track?
• Lots of time series data
  • App opens over time
  • Events over time
  • Revenue over time
  • Marketing campaign stats and efficacy over time
What kinds of analytics does Appboy track?
• Breakdowns*
  • Device types
  • Device OS versions
  • Screen resolutions
  • Revenue by product

* We also care about this over time!
What kinds of analytics does Appboy track?
• User segment membership
  • How many users are in each segment?
  • How many can be emailed or reached via push notifications?
  • What is the average revenue per user in the segment?
  • Per paying user?
Pre-aggregated Analytics:

APP OPENS OVER TIME
Typical time series collection
Log a new row for each open received

{
  timestamp: 2013-11-14 00:00:00 UTC,
  app_id: App identifier
}

db.app_opens.find({app_id: A, timestamp: {$gte: date}})

Pro: Really, really simple. Easy to add attribution to users.
Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
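
The "Con" above is the per-chart aggregation pass over raw rows. A minimal sketch of that pass in plain JavaScript, run over an in-memory array rather than a cursor. The `hourlyCounts` name and the sample rows are illustrative assumptions, not from the talk.

```javascript
// Hypothetical helper: bucket raw app-open rows into 24 hourly counts (UTC).
// Every row has to be read, which is the cost the slide warns about.
function hourlyCounts(rows) {
  const counts = new Array(24).fill(0);
  for (const row of rows) {
    counts[row.timestamp.getUTCHours()] += 1;
  }
  return counts;
}

// Illustrative rows shaped like the documents above.
const sampleRows = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
];
const hourly = hourlyCounts(sampleRows);
console.log(hourly[0], hourly[13]);
```

The work grows with the number of opens in the charted window, which is what the pre-aggregated layouts that follow try to avoid.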
Fewer documents with pre-aggregation iteration 1
Create a document that groups by the time period

{
  app_id: App identifier,
  date: Date of the document,
  hour: 0-23 based hour this document represents,
  opens: Number of opens this hour
}

db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})

Pro: Really easy to draw histograms
Con: We never care about an hour by itself. We lose attribution.
Fewer documents with pre-aggregation iteration 2
Create a document by day and have each hour be a field

{
  app_id: App identifier,
  date: Date of the document,
  total_opens: Total number of opens this day,
  0: Number of opens at midnight,
  1: Number of opens at 1am,
  ...
  23: Number of opens at 11pm
}

db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {"0": 1, total_opens: 1}}
)

Pro: Document count is low, easy to use aggregation framework for longer spans, fast: document should be in working set
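
A minimal sketch of how one open could be turned into the query/update pair for the per-day document. `buildOpenUpdate` is a hypothetical helper, and bucketing by UTC day and hour is an assumption; the counter is named after the schema's `total_opens` field.

```javascript
// Hypothetical helper: turn one app open into the query/update pair for the
// per-day document. Assumes days and hours are bucketed in UTC.
function buildOpenUpdate(appId, openedAt) {
  const day = new Date(Date.UTC(
    openedAt.getUTCFullYear(), openedAt.getUTCMonth(), openedAt.getUTCDate()
  ));
  const hourField = String(openedAt.getUTCHours()); // "0" .. "23"
  return {
    query: { date: day, app_id: appId },
    update: { $inc: { [hourField]: 1, total_opens: 1 } },
  };
}

const op = buildOpenUpdate("A", new Date("2013-11-14T13:15:00Z"));
console.log(JSON.stringify(op));
// In the mongo shell this could be applied as an upsert, so the first open
// of the day creates the document:
//   db.app_opens.update(op.query, op.update, {upsert: true})
```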
Fewer documents with pre-aggregation iteration 2
• What about looking at different dimensions?
  • App opens by device type (e.g., how do iPads compare to iPhones?)
  • Demographics (gender, age group)
Solution!

FLEXIBLE SCHEMAS!
Fewer documents with pre-aggregation iteration 3
Dynamically add dimensions in the document

{
  app_id: App identifier,
  date: Date of the document,
  totals: {
    app_opens: Total number of opens this day,
    devices: {
      "iPad Air": Total number of opens on the iPad Air,
      "iPhone 4": Total number of opens on the iPhone 4
    },
    genders: {
      male: Total number of opens from male users,
      female: Total number of opens from female users
    },
    ...
  },
  0: {
    app_opens: Number of opens at midnight,
    devices: {
      "iPad Air": Number of opens on the iPad Air at midnight,
      "iPhone 4": Number of opens on the iPhone 4 at midnight
    },
    ...
  },
  ...
}

db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {
    "totals.app_opens": 1, "totals.devices.iPad Air": 1,
    "0.app_opens": 1, "0.devices.iPad Air": 1
  }}
)
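
Because the dimensions are just nested field paths, the `$inc` document can be assembled at write time from whatever is known about the open. A minimal sketch, assuming the nested layout shown in the schema above; `buildDimensionInc` and its arguments are hypothetical names.

```javascript
// Hypothetical helper: build a $inc document that bumps the daily totals and
// the hourly sub-document for every dimension supplied with the open.
// `dimensions` maps a dimension name (e.g. "devices") to this open's value.
// Note: values containing "." would be read as nested paths and need escaping.
function buildDimensionInc(hour, dimensions) {
  const inc = { "totals.app_opens": 1, [`${hour}.app_opens`]: 1 };
  for (const [dimension, value] of Object.entries(dimensions)) {
    inc[`totals.${dimension}.${value}`] = 1;
    inc[`${hour}.${dimension}.${value}`] = 1;
  }
  return { $inc: inc };
}

const update = buildDimensionInc(0, { devices: "iPad Air", genders: "female" });
console.log(Object.keys(update.$inc));
```

Adding a new dimension then only means passing one more key in `dimensions`; no schema change is involved.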
Pre-aggregated analytics
Pros
• Easily extensible to add other dimensions
• Still only using one document, therefore you can create charts very quickly
• You get breakdowns over a time period for free

Cons
• Pre-aggregated data has no attribution
• Have to know questions ahead of time

Follow up: What if we wanted to look at a graph by age group?
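
To illustrate breakdowns coming "for free", here is a minimal sketch that reads an hourly series for one dimension value out of a single per-day document. `hourlySeries` and the sample counts are illustrative assumptions.

```javascript
// Hypothetical helper: pull an hourly series for one value of one dimension
// out of a single per-day document. Missing hours or values count as 0.
function hourlySeries(dayDoc, dimension, value) {
  const series = [];
  for (let hour = 0; hour < 24; hour++) {
    const hourDoc = dayDoc[String(hour)] || {};
    const breakdown = hourDoc[dimension] || {};
    series.push(breakdown[value] || 0);
  }
  return series;
}

// Illustrative document with made-up counts.
const exampleDay = {
  app_id: "A",
  totals: { app_opens: 5, devices: { "iPad Air": 3, "iPhone 4": 2 } },
  0: { app_opens: 2, devices: { "iPad Air": 2 } },
  9: { app_opens: 3, devices: { "iPad Air": 1, "iPhone 4": 2 } },
};
console.log(hourlySeries(exampleDay, "devices", "iPad Air"));
```

The same read would work for an age-group dimension, but only for days written after that dimension started being recorded; earlier days would read as zeros, which is the "know your questions ahead of time" cost.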
Pre-aggregated analytics summary
• Get started tracking time series data quickly
• You get breakdowns for free
• Adding dimensions is super simple
• No attribution, need to know questions ahead of time
• Don't just rely on pre-aggregated analytics
Counting quickly:

USER SEGMENTATION &
STATISTICAL ANALYSIS
User Segmentation
• A group of users who match some set of filters
Counting quickly
Appboy shows you segment membership in real-time as you add/edit/remove filters.

How do we do it quickly?

We estimate the population sizes of segments when using our web UI.
Counting quickly

Goal: Quickly get the count() of an arbitrary query

Problem: MongoDB counts are slow, especially unindexed ones
Counting quickly
10 million documents that represent people:
{
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}
• How many people like blue?
• How many live in NYC and love pizza?
• How many men have a shoe size less than 10?
Big Question: How do you estimate counts?

Answer: The same way news networks do it.

With confidence.
Counting quickly
Add a random number in a known range to each document. Say, between 0 and 9999.
{
  random: 4583,
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}

Add an index on the random number:

db.users.ensureIndex({random: 1})
Counting quickly
Step 1: Get a random sample

I have 10 million documents. Of my 10,000 random "buckets", I should expect each "bucket" to hold about 1,000 users.

E.g.,

db.users.find({random: 123}).count()   // ~1000
db.users.find({random: 9043}).count()  // ~1000
db.users.find({random: 4982}).count()  // ~1000
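
A minimal sketch of how the `random` field could be assigned at write time, and of the arithmetic behind the "about 1,000 per bucket" expectation. Both function names are hypothetical; the uniform draw via `Math.random` is an assumption about how the number is generated.

```javascript
// Hypothetical helper: pick the value stored in each document's `random`
// field. `rng` is injectable so the function can be tested deterministically.
function randomBucket(buckets = 10000, rng = Math.random) {
  return Math.floor(rng() * buckets); // integer in [0, buckets - 1]
}

// Expected number of documents per bucket if values are uniform.
function expectedPerBucket(population, buckets = 10000) {
  return population / buckets;
}

console.log(randomBucket());
console.log(expectedPerBucket(10000000));
```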
Counting quickly
Step 1: Get a random sample

Let's take a random 100,000 users. Grab a random range that "holds" those users. These all work:

db.users.find({random: {$gt: 0, $lt: 101}})
db.users.find({random: {$gt: 503, $lt: 604}})
db.users.find({random: {$gt: 8938, $lt: 9039}})
db.users.find({$or: [
  {random: {$gt: 9955}},
  {random: {$lt: 56}}
]})
Tip: Limit $maxScan to 100,000 just to be safe
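
The range filters above, including the wrap-around `$or` case, can be generated from a random starting bucket. A minimal sketch; `sampleRangeFilter` is a hypothetical helper, and the exclusive `$gt`/`$lt` bounds are chosen only to mirror the queries on this slide.

```javascript
// Hypothetical helper: build a filter on `random` covering `width` consecutive
// buckets starting at `start`, wrapping past the top bucket when needed.
// Uses exclusive $gt/$lt bounds to mirror the queries above.
function sampleRangeFilter(start, width = 100, buckets = 10000) {
  const end = start + width; // exclusive upper bound before wrapping
  if (end <= buckets) {
    return { random: { $gt: start - 1, $lt: end } };
  }
  return {
    $or: [
      { random: { $gt: start - 1 } },
      { random: { $lt: end - buckets } },
    ],
  };
}

// Pick a random starting bucket, then build the filter for 100 buckets.
const start = Math.floor(Math.random() * 10000);
console.log(JSON.stringify(sampleRangeFilter(start)));
```

With 10,000 buckets of roughly 1,000 users each, a width of 100 buckets targets roughly 100,000 users.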
Counting quickly
Step 2: Learn about that random sample

db.users.find(
  {
    random: {$gt: 0, $lt: 101},
    gender: "M",
    favorite_color: "blue",
    shoe_size: {$gt: 10}
  }
)
._addSpecial("$maxScan", 100000)
.explain()

Explain Result:

{
  nscannedObjects: 100000,
  n: 11302,
  ...
}
Counting quickly
Step 3: Do the math

Population: 10,000,000

Sample size: 100,000

Num matches: 11,302

Percentage of users who matched: 11.3%

Estimated total count: 1,130,000 +/- 0.2% with 95% confidence
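
The slide does not say how the margin was computed. One standard approach that lands close to the figure above is the normal approximation for a sample proportion with z = 1.96; the sketch below assumes that method, and `estimateCount` is a hypothetical name.

```javascript
// Hypothetical helper: scale a sample match count up to the population and
// attach a 95% margin of error using the normal approximation for a
// proportion (z = 1.96). Ignores the finite population correction.
function estimateCount(population, sampleSize, matches) {
  const p = matches / sampleSize;
  const margin = 1.96 * Math.sqrt((p * (1 - p)) / sampleSize);
  return {
    proportion: p,
    marginOfError: margin, // in proportion units, not percent
    estimate: Math.round(p * population),
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

const result = estimateCount(10000000, 100000, 11302);
console.log(result);
```

For these inputs the margin comes out just under 0.002, that is, about 0.2 percentage points on the 11.3% match rate.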
Counting quickly
Step 4: Optimize
• Limit $maxScan to (100,000/numShards) to be even faster
• Cache the random range for a few hours
• Add more RAM (or shards)
• Cache results to not hit the database for the same query
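
A minimal sketch of the result-caching idea, using an in-memory map with a time-to-live. The talk does not describe its cache, so everything here (`makeCountCache`, keying on the serialized query, the TTL) is an assumption for illustration.

```javascript
// Hypothetical in-memory cache keyed by the serialized query, with a TTL.
// `now` is injectable so expiry can be tested without waiting.
// Note: JSON.stringify is key-order sensitive, so {a: 1, b: 2} and
// {b: 2, a: 1} are cached as different queries.
function makeCountCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(query) {
      const hit = entries.get(JSON.stringify(query));
      if (!hit || now() - hit.storedAt > ttlMs) return undefined;
      return hit.value;
    },
    set(query, value) {
      entries.set(JSON.stringify(query), { value, storedAt: now() });
    },
  };
}

const cache = makeCountCache(60 * 60 * 1000); // one hour
cache.set({ gender: "M", favorite_color: "blue" }, 1130200);
console.log(cache.get({ gender: "M", favorite_color: "blue" }));
```

The same structure could hold the chosen random range for a few hours, so repeated estimates touch the same, already warm, slice of the index.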
Counting quickly
Step 5: Improve
• Get more than one count: use the aggregation framework on top of the population's sample size
• Work around all sorts of Mongo bugs :-(
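
A minimal sketch of getting several counts from one sample: a `$match` on the random range followed by a `$group`, with the per-group counts scaled up afterwards. The pipeline shape and both helper names are assumptions; the talk only names the aggregation framework.

```javascript
// Hypothetical sketch: a pipeline that groups one random sample by a field,
// plus a helper that scales each group's count up to the full population.
function samplePipeline(sampleFilter, groupField) {
  return [
    { $match: sampleFilter },
    { $group: { _id: "$" + groupField, count: { $sum: 1 } } },
  ];
}

function scaleGroups(groups, population, sampleSize) {
  const factor = population / sampleSize;
  return groups.map((g) => ({
    _id: g._id,
    estimate: Math.round(g.count * factor),
  }));
}

const pipeline = samplePipeline({ random: { $gt: 0, $lt: 101 } }, "gender");
console.log(JSON.stringify(pipeline));
// In the mongo shell: db.users.aggregate(pipeline)
// Made-up group counts, only to show the scaling step:
console.log(scaleGroups([{ _id: "M", count: 52000 }, { _id: "F", count: 48000 }], 10000000, 100000));
```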
Summarize
• Pre-aggregated analytics
  • Create a document that represents event occurrences in some time period
  • Takes full advantage of MongoDB's flexible schemas
  • Not a catch-all for analytics, you should still store event data
Summarize
• Counting quickly
  • Estimate results of arbitrary queries using population sample sizes
  • Depending on your app, this could be a great way to keep response time predictable as you scale
Thanks! Questions?
jon@appboy.com

@appboy @jon_hyman

Powerpoint exploring the locations used in television show Time Clash
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Appboy analytics - NYC MUG 11/19/13

  • 1. Appboy Analytics Jon Hyman NY MongoDB User Group, November 19, 2013 eBay NYC @appboy @jon_hyman
  • 2. A LITTLE BIT ABOUT US & APPBOY (who we are and what we do) Appboy is a mobile relationship management platform for apps. Jon Hyman, CIO :: @jon_hyman. Harvard, Bridgewater
  • 3. Appboy improves engagement by helping you understand your app users • IDENTIFY - Understand demographics, social and behavioral data • SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location • ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
  • 4. Use Case: Customer engagement begins with onboarding Urban Outfitters textPlus Shape Magazine
  • 5. Agenda • How to quickly store time series data in MongoDB using flexible schemas • Learn how flexible schemas can easily provide breakdowns across dimensions • Counting quickly: statistical analysis on top of MongoDB queries
  • 6. What kinds of analytics does Appboy track? • Lots of time series data • App opens over time • Events over time • Revenue over time • Marketing campaign stats and efficacy over time
  • 7. What kinds of analytics does Appboy track? • Breakdowns* • Device types • Device OS versions • Screen resolutions • Revenue by product * We also care about this over time!
  • 8. What kinds of analytics does Appboy track? • User segment membership • How many users are in each segment? • How many can be emailed or reached via push notifications? • What is the average revenue per user in the segment? • Per paying user?
  • 10. Typical time series collection
    Log a new row for each open received:
      {
        timestamp: 2013-11-14 00:00:00 UTC,
        app_id: App identifier
      }
      db.app_opens.find({app_id: A, timestamp: {$gte: date}})
    Pro: Really, really simple. Easy to add attribution to users.
    Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages.
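To make the "Con" concrete: with one document per open, something has to roll the raw events up into buckets before a chart can be drawn. The following is a minimal in-memory sketch of that roll-up, not Appboy's code; `hourlyOpens` and `sampleEvents` are hypothetical names, and the events stand in for documents returned by the `find` shown on the slide.

```javascript
// Hypothetical helper: bucket raw open events (one document per open) into
// 24 hourly counts for a single UTC day, as a chart would need.
function hourlyOpens(events, dayStartMs) {
  const HOUR_MS = 60 * 60 * 1000;
  const DAY_MS = 24 * HOUR_MS;
  const counts = new Array(24).fill(0);
  for (const e of events) {
    const offset = e.timestamp.getTime() - dayStartMs;
    // Ignore events that fall outside the requested day.
    if (offset >= 0 && offset < DAY_MS) {
      counts[Math.floor(offset / HOUR_MS)] += 1;
    }
  }
  return counts;
}

const day = Date.UTC(2013, 10, 14); // 2013-11-14 00:00:00 UTC (month is 0-based)
const sampleEvents = [
  { app_id: "A", timestamp: new Date(day) },
  { app_id: "A", timestamp: new Date(day + 30 * 60 * 1000) },
  { app_id: "A", timestamp: new Date(day + 5 * 60 * 60 * 1000) },
];
const counts = hourlyOpens(sampleEvents, day);
console.log(counts[0], counts[5]);
```

Every event in the range has to be read to produce the 24 numbers, which is the cost the pre-aggregated designs in the following slides avoid.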
  • 11. Fewer documents with pre-aggregation: iteration 1
    Create a document that groups by the time period:
      {
        app_id: App identifier,
        date: Date of the document,
        hour: 0-23 based hour this document represents,
        opens: Number of opens this hour
      }
      db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})
    Pro: Really easy to draw histograms.
    Con: We never care about an hour by itself. We lose attribution.
  • 12. Fewer documents with pre-aggregation: iteration 2
    Create a document by day and have each hour be a field:
      {
        app_id: App identifier,
        date: Date of the document,
        total_opens: Total number of opens this day,
        0: Number of opens at midnight,
        1: Number of opens at 1am,
        ...
        23: Number of opens at 11pm
      }
      db.app_opens.update(
        {date: D, app_id: A},
        {$inc: {"0": 1, total_opens: 1}}
      )
    Pro: Document count is low; easy to use the aggregation framework for longer spans; fast, since the document should be in the working set.
  • 13. Fewer documents with pre-aggregation iteration 2 • What about looking at different dimensions? • App opens by device type (e.g., how do iPads compare to iPhones?) • Demographics (gender, age group)
  • 15. Fewer documents with pre-aggregation: iteration 3
    Dynamically add dimensions in the document:
      {
        app_id: App identifier,
        date: Date of the document,
        totals: {
          app_opens: Total number of opens this day,
          devices: {
            "iPad Air": Total number of opens on the iPad Air,
            "iPhone 4": Total number of opens on the iPhone 4
          },
          genders: {
            male: Total number of opens from male users,
            female: Total number of opens from female users
          },
          ...
        },
        0: {
          app_opens: Number of opens at midnight,
          devices: {
            "iPad Air": Number of opens on the iPad Air at midnight,
            "iPhone 4": Number of opens on the iPhone 4 at midnight
          },
          ...
        },
        ...
      }
      db.app_opens.update(
        {date: D, app_id: A},
        {$inc: {"totals.app_opens": 1, "totals.devices.iPad Air": 1, "0.app_opens": 1, "0.devices.iPad Air": 1}}
      )
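One way to build an `$inc` that touches both the daily totals and the hourly sub-document of the schema above can be sketched as follows. This is an illustration under assumptions, not the speaker's code: `buildOpenIncrement` and the shape of its `dims` argument are hypothetical, and the resulting object is only printed here rather than sent to a database.

```javascript
// Hypothetical helper: build the update for one app open at a given hour.
// `dims` maps a dimension name (e.g. "devices") to this user's value.
function buildOpenIncrement(hour, dims) {
  const inc = {};
  // Increment the same counters under the day-level "totals" sub-document
  // and under the sub-document for the hour ("0" through "23").
  for (const prefix of ["totals", String(hour)]) {
    inc[`${prefix}.app_opens`] = 1;
    for (const [dimension, value] of Object.entries(dims)) {
      inc[`${prefix}.${dimension}.${value}`] = 1;
    }
  }
  return { $inc: inc };
}

const update = buildOpenIncrement(0, { devices: "iPad Air", genders: "female" });
console.log(JSON.stringify(update, null, 2));
// Would be passed as: db.app_opens.update({date: D, app_id: A}, update)
```

Adding a dimension means adding one more key to `dims`; no schema change is needed. Because MongoDB treats `.` in an update key as a path separator, dimension values containing a dot would need to be escaped before being used this way.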
  • 16. Pre-aggregated analytics Pros: • Easily extensible to add other dimensions • Still only using one document, therefore you can create charts very quickly • You get breakdowns over a time period for free Cons: • Pre-aggregated data has no attribution • Have to know questions ahead of time Follow up: What if we wanted to look at a graph by age group?
  • 17. Pre-aggregated analytics summary • Get started tracking time series data quickly • You get breakdowns for free • Adding dimensions is super simple • No attribution, need to know questions ahead of time • Don’t just rely on pre-aggregated analytics
  • 18. Counting quickly: USER SEGMENTATION & STATISTICAL ANALYSIS
  • 19. User Segmentation • A group of users who match some set of filters
  • 20. Counting quickly Appboy shows you segment membership in real time as you add/edit/remove filters. How do we do it quickly? We estimate the population sizes of segments when using our web UI.
  • 21. Counting quickly Goal: Quickly get the count() of an arbitrary query. Problem: MongoDB counts are slow, especially unindexed ones.
  • 22. Counting quickly
    10 million documents that represent people:
      {
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
  • 23. Counting quickly
    10 million documents that represent people:
      {
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
    • How many people like blue? • How many live in NYC and love pizza? • How many men have a shoe size less than 10?
  • 24. Big Question: How do you estimate counts? Answer: The same way news networks do it. With confidence.
  • 25. Counting quickly
    Add a random number in a known range to each document. Say, between 0 and 9999.
      {
        random: 4583,
        favorite_color: "blue",
        age: 27,
        gender: "M",
        favorite_food: "pizza",
        city: "NYC",
        shoe_size: 11,
        attractiveness: 10,
        ...
      }
    Add an index on the random number:
      db.users.ensureIndex({random: 1})
  • 26. Counting quickly
    Step 1: Get a random sample
    I have 10 million documents. Of my 10,000 random "buckets", I should expect each "bucket" to hold about 1,000 users. E.g.,
      db.users.find({random: 123}).count() == ~1000
      db.users.find({random: 9043}).count() == ~1000
      db.users.find({random: 4982}).count() == ~1000
  • 27. Counting quickly
    Step 1: Get a random sample
    Let's take a random 100,000 users. Grab a random range that "holds" those users (100 consecutive buckets of about 1,000 users each). These all work:
      db.users.find({random: {$gt: 0, $lt: 101}})
      db.users.find({random: {$gt: 503, $lt: 604}})
      db.users.find({random: {$gt: 8938, $lt: 9039}})
      db.users.find({$or: [
        {random: {$gt: 9955}},
        {random: {$lt: 56}}
      ]})
    Tip: Limit $maxScan to 100,000 just to be safe.
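Choosing the range programmatically, including the wrap-around case covered by the `$or` example, might look like the sketch below. The helper name `randomRangeQuery` and its parameters are assumptions for illustration; it uses inclusive `$gte`/`$lte` bounds, which select the same buckets as the exclusive `$gt`/`$lt` bounds on the slide.

```javascript
// Hypothetical helper: select `width` consecutive buckets starting at `start`
// out of buckets 0..numBuckets-1, wrapping past the end if needed.
function randomRangeQuery(start, width, numBuckets) {
  const end = start + width - 1; // last bucket, before wrapping
  if (end < numBuckets) {
    return { random: { $gte: start, $lte: end } };
  }
  // The range runs off the top: take the tail plus the remainder from 0.
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - numBuckets } },
    ],
  };
}

const NUM_BUCKETS = 10000;
const WIDTH = 100; // 100 buckets * ~1,000 users each = ~100,000 users
const start = Math.floor(Math.random() * NUM_BUCKETS);
console.log(JSON.stringify(randomRangeQuery(start, WIDTH, NUM_BUCKETS)));
```

With `start = 9956` this produces the same two halves as the last query on the slide: buckets 9956 through 9999 and buckets 0 through 55.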
  • 28. Counting quickly
    Step 2: Learn about that random sample
      db.users.find({
        random: {$gt: 0, $lt: 101},
        gender: "M",
        favorite_color: "blue",
        shoe_size: {$gt: 10}
      })
      ._addSpecial("$maxScan", 100000)
      .explain()
    Explain result:
      {
        nscannedObjects: 100000,
        n: 11302,
        ...
      }
  • 29. Counting quickly
    Step 3: Do the math
    Population: 10,000,000
    Sample size: 100,000
    Num matches: 11,302
    Percentage of users who matched: 11.3%
    Estimated total count: 1,130,000 +/- 0.2% with 95% confidence (0.2 percentage points of the population, or roughly +/- 20,000 users)
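The arithmetic on this slide can be reproduced with a short sketch. It assumes the sample behaves like a simple random sample and uses the normal approximation for a proportion (z = 1.96 for 95% confidence, no finite-population correction); the deck does not say which formula Appboy uses, and `estimateCount` is a hypothetical name.

```javascript
// Sketch: scale a sample count up to the population and attach a
// normal-approximation 95% margin of error (z = 1.96).
function estimateCount(population, sampleSize, matches) {
  const p = matches / sampleSize; // matched fraction of the sample
  const margin = 1.96 * Math.sqrt((p * (1 - p)) / sampleSize);
  return {
    proportion: p,
    marginOfError: margin, // expressed as a fraction of the population
    estimate: Math.round(p * population),
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

// Numbers from the slide: 10M users, 100,000 sampled, 11,302 matches.
const result = estimateCount(10000000, 100000, 11302);
console.log(result);
```

For the slide's numbers the margin comes out just under 0.002, which is the "+/- 0.2%" quoted above. Quadrupling the sample size would only halve that margin, which is why a 100,000-document scan is already a reasonable trade-off.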
  • 30. Counting quickly Step 4: Optimize • Limit $maxScan to (100,000/numShards) to be even faster • Cache the random range for a few hours • Add more RAM (or shards) • Cache results to not hit the database for the same query
  • 31. Counting quickly Step 5: Improve • Get more than one count: use the aggregation framework on top of the population's sample size • Work around all sorts of Mongo bugs :-(
  • 32. Summarize • Pre-aggregated analytics • Create a document that represents event occurrences in some time period • Takes full advantage of MongoDB’s flexible schemas • Not a catch-all for analytics, you should still store event data
  • 33. Summarize • Counting quickly • Estimate results of arbitrary queries using population sample sizes • Depending on your app, this could be a great way to keep response time predictable as you scale