ANALYTICS WITH MONGODB


      ROGER BODAMER
YOU WANT TO ANALYZE THIS
LIKE THIS
BUT HOW ?



• These graphs are the end result of a process

• To get here, there are a few things you need to do and explore
A WORD ON NON-NATIVE APPROACHES
•   Yes, you can

    •   map your document schema to a relational schema

    •   then export your data from MongoDB to a relational db

        •   and set up a cron job to do this every day

    •   then use your BI tool to map relational to “objects”

    •   and then Report and do Analytics
BUT THAT WOULD BE NO FUN


• Analytics using native queries

• A simple process
PROCESS: NAIVE

• Take a sample document

• Develop query

• Put on chart

• Done!

  • and a gold star from your boss!
PROCESS: REALITY
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
WHY ITERATE ?
UNDERSTAND YOUR SCHEMA

{
    "name" : "Mario",
    "games" : [{"game" : "WoW",
                "duration" : 130},
               {"game" : "Tetris",
                "duration" : 130}]
}
BUT ALSO:
• Schemas can be polymorphic

{
    "name" : "Bob",
    "location" : "us",
    "games" : [{"game" : "WoW",
                "duration" : 2910},
               {"game" : "Tetris",
                "duration" : 593}]
}
SO NOW WHAT ?
•   Only report on common attributes

    •   probably missing the most recent / interesting data
SO NOW WHAT ?
•   Write 2 programs, one for each schema

    •   2 graphs / reports

    •   2 programs writing to 1 graph (basically merging instance data in 2 places)
SO NOW WHAT ?

•   Unify Schema

    •   deal with absent, null values

    •   translate(NULL, “EU”);
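
A minimal sketch of the "unify schema" option in plain JavaScript. It assumes, as the translate(NULL, “EU”) line suggests, that a document with an absent or null location should count as "EU". The helper name unifyLocation and the sample documents are illustrative only:

```javascript
// Hypothetical helper: fill in a default location for documents that lack one.
function unifyLocation(doc, fallback = "EU") {
  // Treat both a missing field (undefined) and an explicit null as absent.
  const location = doc.location == null ? fallback : doc.location;
  // Return a copy so the original document is left untouched.
  return { ...doc, location };
}

const gamers = [
  { name: "Mario", games: [{ game: "WoW", duration: 130 }] }, // no location
  { name: "Bob", location: "us", games: [{ game: "WoW", duration: 2910 }] },
];

const unified = gamers.map((doc) => unifyLocation(doc));
console.log(unified.map((d) => `${d.name}: ${d.location}`));
```

Inside a pipeline, the $ifNull operator can express the same default, for example location : { $ifNull : ["$location", "EU"] } in a $project stage.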
ITERATE



• Total time and number of games people play in the US vs. the EU?
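QUERY
// MongoDB 3.6+ also requires a cursor option on this command, e.g. cursor : {}
db.runCommand(
{ aggregate : "gamers", pipeline : [
    // keep only the fields the aggregation needs
    { $project : {
        location : 1,
        games : 1
    }},
    // one document per element of the games array
    { $unwind : "$games" },
    // group on the value of the location field
    { $group : {
        _id : { location : "$location" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$games.duration" }
    }},
    // flatten _id into a plain location field
    { $project : {
        _id : 0,
        location : "$_id.location",
        number_games : 1,
        total_duration : 1
    }}
]})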
SIDEBAR: WRITING AGGREGATION QUERIES
•   Prepare Data
    •   Extract relevant properties from collection documents
    •   Unwind a sub-collection (array) if its documents contribute to the aggregation
•   Aggregate data
    •   determine the key (_id) on which the aggregates should be done
    •   name aggregates
•   Project Data
    •   For final results
EXAMPLE
{
    "name" : "Alice",
    "location" : "us",
    "games" : [{
        "game" : "WoW",
        "duration" : 200
      }, {
        "game" : "Tetris",
        "duration" : 100
      }]
}
PREPARE
• Only use location and games:

{ $project : {
    location : 1,
    games : 1
}}


• Unwind games, because properties of its documents are aggregated over:

{ $unwind : "$games" }
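
To make the effect of $unwind concrete, here is a small plain-JavaScript imitation applied to the Alice example. The unwind function is a hypothetical stand-in written for this illustration, not MongoDB code:

```javascript
// In-memory illustration of what $unwind does to one document.
function unwind(doc, field) {
  // Emit one copy of the document per array element,
  // with the array field replaced by that single element.
  // A missing array yields no output documents.
  return (doc[field] || []).map((element) => ({ ...doc, [field]: element }));
}

const alice = {
  location: "us",
  games: [
    { game: "WoW", duration: 200 },
    { game: "Tetris", duration: 100 },
  ],
};

const unwound = unwind(alice, "games");
console.log(JSON.stringify(unwound));
```

Each output document now carries a single game, so a later stage can refer to games.duration as one value.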
AGGREGATE DATA
• Aggregate on number of games (add 1 per game)
  and total duration (add duration per game)
  using location as key


{ $group : {
    _id : { location : "$location" },
    number_games : { $sum : 1 },
    total_duration : { $sum : "$games.duration" }
}}
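
The grouping step can be imitated in plain JavaScript to see how the two sums accumulate. This is a sketch for illustration only: groupByLocation is a hypothetical name, and the input is the Alice example as it would look after unwinding:

```javascript
// Plain-JavaScript imitation of grouping by location (not MongoDB code).
// Input: documents as they look after $unwind, one per game played.
function groupByLocation(unwoundDocs) {
  const groups = new Map();
  for (const doc of unwoundDocs) {
    const key = doc.location;
    const acc = groups.get(key) ||
      { _id: { location: key }, number_games: 0, total_duration: 0 };
    acc.number_games += 1;                    // like { $sum : 1 }
    acc.total_duration += doc.games.duration; // like { $sum : "$games.duration" }
    groups.set(key, acc);
  }
  return [...groups.values()];
}

const unwoundAlice = [
  { location: "us", games: { game: "WoW", duration: 200 } },
  { location: "us", games: { game: "Tetris", duration: 100 } },
];

const grouped = groupByLocation(unwoundAlice);
console.log(JSON.stringify(grouped));
```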
PROJECT
• Only show location and aggregates, do not show _id


{ $project : {
    _id : 0,
    location : "$_id.location",
    number_games : 1,
    total_duration : 1
}}
RESULT 1




• People spend a little more time playing in the US
• More games played in the EU
RING....
CHALLENGE 2


• Since we found that the EU and US play a similar amount and the same number of games, the new challenge is:


• Let’s see what the distribution of different games is in the 2 locations
QUERY 2
db.runCommand(
{ aggregate : "gamers", pipeline : [
    // location, games
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    // location, game, duration
    { $project : {
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    // key: aggregate on location and game
    { $group : {
        _id : { location : "$location", game : "$game" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$duration" }
    }},
    // project: location, game, total(#games), sum(duration)
    { $project : {
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        number_games : 1,
        total_duration : 1
    }}
]})
RESULT 2




• Count: EU - WoW, US - Tetris
• The EU spends more time on WoW; in the US it’s more evenly spread
RING....
CHALLENGE 3:



• How do I compare Bob to everyone else in the EU?
QUERY

• 2 aggregations happening at the same time:

  • 1 by user

  • 1 by location

• This query needs to be broken up into several queries

• Fairly complex

• Currently easiest to process in Ruby/Java/Python/...
// 1: total duration per player, location and game
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        name : 1,
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        name : 1,
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location", name : "$name", game : "$game" },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {
        name : "$_id.name",
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        total_duration : 1
    }}
]})

// 2: total duration per location
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        location : 1,
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location" },
        total_duration : { $sum : "$duration" }
    }},
    // the location is reported under the name key
    { $project : {
        name : "$_id.location",
        _id : 0,
        total_duration : 1
    }}
]})
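
One possible shape for the client-side step, sketched in plain JavaScript. It assumes the first result set has rows with name, location, game and total_duration, and the second has rows with name (holding the location) and total_duration. The function name shareOfLocation and all sample numbers are made up for illustration and are not results from the talk:

```javascript
// Hypothetical client-side post-processing of the two result sets.
function shareOfLocation(playerRows, locationRows, player) {
  // Index the location totals by location name.
  const totals = new Map(locationRows.map((row) => [row.name, row.total_duration]));
  return playerRows
    .filter((row) => row.name === player)
    .map((row) => ({
      game: row.game,
      // Fraction of the location's total play time that this
      // player spent on this game.
      share: row.total_duration / totals.get(row.location),
    }));
}

// Made-up sample rows.
const playerRows = [
  { name: "Bob", location: "eu", game: "WoW", total_duration: 50 },
  { name: "Bob", location: "eu", game: "Tetris", total_duration: 150 },
  { name: "Ann", location: "eu", game: "WoW", total_duration: 800 },
];
const locationRows = [{ name: "eu", total_duration: 1000 }];

const bobShares = shareOfLocation(playerRows, locationRows, "Bob");
console.log(bobShares);
```

Keeping the join in application code avoids forcing two different group keys into one pipeline, at the cost of two round trips.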
RESULT 3




• Bob plays >20% WoW in comparison to the Europeans, but plays 200% more Tetris
A NOTE ON QUERIES


• There’s no notion of a declared schema

• The augmented schema is coded in queries

• Reuse is very hard; it happens at the query-language level
DIMENSIONS
• Most questions / graphs have a dimension

 • Time, Geo

 • Categories

 • Relative: what’s X’s contribution to total revenue

• You will need to be able to pass in dimensions as a
 predicate for your queries

 • or cache the result and post-process client-side
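
A sketch of passing a dimension in as a predicate: a small JavaScript function that puts an optional $match stage in front of the location/game aggregation. The name buildPipeline and the shape of its dimensions argument are assumptions made for this example:

```javascript
// Hypothetical helper: build the location/game pipeline with an optional
// dimension filter passed in as a $match predicate.
function buildPipeline(dimensions = {}) {
  const pipeline = [];
  // Only add a $match stage when at least one dimension is given.
  if (Object.keys(dimensions).length > 0) {
    pipeline.push({ $match: dimensions });
  }
  pipeline.push(
    { $unwind: "$games" },
    { $group: {
        _id: { location: "$location", game: "$games.game" },
        number_games: { $sum: 1 },
        total_duration: { $sum: "$games.duration" },
    } }
  );
  return pipeline;
}

const usOnly = buildPipeline({ location: "us" });
console.log(JSON.stringify(usOnly, null, 2));
```

The resulting array could be used as the pipeline value of the aggregate command. Filtering first also keeps the later stages working on fewer documents.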
A WORD ON RENDERING GRAPHS / REPORTS
• Several libraries available for Ruby / Python / Java

  • Gruff, Scruffy, StockCharts, D3, JRafael, jQuery Visualize,
   MooCharts, etc.

• Also some services: John Nunemaker’s work (http://get.gaug.es/)

• But basically:

  • you know how to program, right?
REVIEW
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
PUNCHLINES

• We have described a software engineering process

  • but requirements will be very fluid

• When you know how to write Ruby / Java / Python etc., life is good

• If you’re a business analyst, you have a problem

  • better be BFF with some engineer :)
PLUG

• We’ve been working on a declarative analytics product

• (initially) uses Excel as its presentation layer

• Reach out to me if you’re interested

  @rogerb
  roger@norellan.com
THANK YOU / QUESTIONS

Thoughts on MongoDB Analytics

  • 1. ANALYTICS WITH MONGODB ROGER BODAMER
  • 2. YOU WANT TO ANALYZE THIS
  • 4. BUT HOW ? • These graphs are the end result of a process • In order get here there’s a few things you need to do and explore
  • 5. A WORD ON NON-NATIVE APPROACHES • Yes, you can • map your document schema to a relational schema • then export your data from MongoDB to a relational db • and set up a cron job to do this every day • then use your BI tool to map relational to “objects” • and then Report and do Analytics
  • 6. BUT THAT WOULD BE NO FUN • Analytics using Native Queries •A simple process
  • 7. PROCESS: NAIVE • Take a sample document • Develop query • Put on chart • Done ! • and a gold star from your boss !
  • 8. PROCESS: REALITY • Understand your schema • multiple schema’s in single collection • multiple collections / multiple data sources • Iterate: • define metric • develop query and report on metrics • understand and drill down or discard • repeat • Operationalize metrics: dashboard • Dimensions • Plotting
  • 10. UNDERSTAND YOUR SCHEMA { "name" : "Mario", "games" : [{"game" : "WoW", "duration" : 130}, {"game" : "Tetris", "duration" : 130}] }
  • 11. BUT ALSO: • Schema’s can be Polymorphic { "name" : "Bob", "location" : "us", "games" : [{"game" : "WoW", "duration" : 2910}, {"game" : "Tetris", "duration" : 593}] }
  • 12. SO NOW WHAT ? • Only report on common attributes • probably missing the most recent / interesting data
  • 13. SO NOW WHAT ? • Write 2 programs, one for each schema • 2 graphs / reports • 2 programs writing to 1 graph (basically merging instance data in 2 places)
  • 14. SO NOW WHAT ? • Unify Schema • deal with absent, null values • translate(NULL, “EU”);
  • 15. ITERATE • total time and how many games people play in the us vs eu ?
  • 16. QUERY db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games: 1 }}, { $unwind : "$games" }, { $group : { _id : { location : 1}, number_games: { $sum : 1 }, total_duration: {$sum : "$games.duration"} }}, { $project : { _id : 0, location : "$_id.location", number_games : 1, total_duration : 1 }} ]})
  • 17. SIDEBAR: WRITING AGGREGATION QUERIES • Prepare Data • Extract relevant properties from collection documents • Unwind a sub-collection if its documents contribute to the aggregation • Aggregate data • determine the key (_id) on which the aggregates should be computed • name aggregates • Project Data • For final results
  • 18. EXAMPLE { "name" : "Alice", "location" : "us", "games" : [{ "game" : "WoW", "duration" : 200 }, { "game" : "Tetris", "duration" : 100 }] }
  • 19. PREPARE • Only use location and games: { $project : { location : 1, games: 1 }} • Unwind games, because properties of its sub-documents are aggregated over: { $unwind : "$games" }
  • 20. AGGREGATE DATA • Aggregate on number of games (add 1 per game) and total duration (add duration per game) using location as key { $group : { _id : { location : "$location" }, number_games: { $sum : 1 }, total_duration: {$sum : "$games.duration"} }}
  • 21. PROJECT • Only show location and aggregates, do not show _id { $project : { _id : 0, location : "$_id.location", number_games : 1, total_duration : 1 }}
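To see what each stage does to the Alice document from slide 18, the prepare / aggregate / project steps can be imitated in plain JavaScript. This is only an in-memory stand-in for illustration, not MongoDB code; the variable names are made up, and the real work would be done by the `$project`, `$unwind` and `$group` stages shown on the slides.

```javascript
// Plain-JavaScript stand-ins for the pipeline stages (not MongoDB code).
const gamers = [
  {
    name: "Alice",
    location: "us",
    games: [{ game: "WoW", duration: 200 }, { game: "Tetris", duration: 100 }],
  },
];

// PREPARE: keep only location and games (the first $project) ...
const projected = gamers.map(({ location, games }) => ({ location, games }));

// ... then emit one document per element of games ($unwind).
const unwound = projected.flatMap(({ location, games }) =>
  games.map((g) => ({ location, games: g }))
);

// AGGREGATE: group on location, add 1 per game and add each duration ($group).
const groups = new Map();
for (const doc of unwound) {
  const acc = groups.get(doc.location) || { number_games: 0, total_duration: 0 };
  acc.number_games += 1;
  acc.total_duration += doc.games.duration;
  groups.set(doc.location, acc);
}

// PROJECT: flatten the grouping key back into a plain location field.
const result = [...groups].map(([location, acc]) => ({ location, ...acc }));
console.log(result);
```

For the single Alice document, the unwind step yields two documents (one per game) and the group step folds them back into one row for "us" with 2 games and a total duration of 300.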
  • 22. RESULT 1 • People spend a little more time playing in the US • More games played in the EU
  • 24. CHALLENGE 2 • Since we found that the EU and US play a similar amount and the same number of games, the new challenge is: • Let's see what the distribution of the different games is in the 2 locations
  • 25. QUERY 2 db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { location : 1, game : "$games.game", duration : "$games.duration" }}, { $group : { _id : { location: "$location", game: "$game"}, number_games: { $sum : 1 }, total_duration: {$sum : "$duration"} }}, { $project : { _id : 0, location : "$_id.location", game : "$_id.game", number_games : 1, total_duration : 1 }} ]})
  • 26-30. QUERY 2 (annotated) db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games : 1 }}, /* location, games */ { $unwind : "$games" }, { $project : { location : 1, game : "$games.game", duration : "$games.duration" }}, /* location, game, duration */ { $group : { _id : { location: "$location", game: "$game"}, /* key: aggregate on location and game */ number_games: { $sum : 1 }, total_duration: {$sum : "$duration"} }}, { $project : { _id : 0, location : "$_id.location", game : "$_id.game", number_games : 1, total_duration : 1 }} /* project: location, game, total(#games), sum(duration) */ ]})
  • 31. RESULT 2 • Count: EU - WoW, US - Tetris • EU spends more time on WoW; in the US it's more evenly spread
  • 33. CHALLENGE 3: • How do I compare Bob to everyone else in the EU ?
  • 34. QUERY • 2 aggregations happening at the same time: • 1 by user • 1 by location • This query needs to be broken up into several queries • Fairly complex • Currently easiest to process in Ruby/Java/Python/...
  • 35. Aggregation by location: db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { location : 1, duration : "$games.duration" }}, { $group : { _id : { location: "$location" }, total_duration: {$sum : "$duration"} }}, { $project : { name : "$_id.location", _id : 0, total_duration : 1 }} ]}) Aggregation by user: db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { name : 1, location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { name: 1, location : 1, game : "$games.game", duration : "$games.duration" }}, { $group : { _id : { location: "$location", name: "$name", game: "$game"}, total_duration: {$sum : "$duration"} }}, { $project : { name : "$_id.name", _id : 0, location : "$_id.location", game : "$_id.game", total_duration : 1 }} ]})
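The client-side step mentioned on slide 34 might look roughly like the sketch below. It is written in JavaScript rather than Ruby/Java/Python and rests on several assumptions: the function name `compareToLocation` is hypothetical, the input rows mimic the shape returned by the by-user aggregation (name, location, game, total_duration), and the numbers are invented for illustration, not the data behind the slides.

```javascript
// Hypothetical post-processing of by-user aggregation rows: for one user,
// compare time per game against the average of the other users in a location.
function compareToLocation(rows, userName, location) {
  const perGame = new Map(); // game -> { user, othersTotal, othersCount }
  for (const row of rows) {
    if (row.location !== location) continue; // keep only the chosen location
    const entry =
      perGame.get(row.game) || { user: 0, othersTotal: 0, othersCount: 0 };
    if (row.name === userName) {
      entry.user += row.total_duration;
    } else {
      entry.othersTotal += row.total_duration;
      entry.othersCount += 1; // one row per (user, game), so this counts users
    }
    perGame.set(row.game, entry);
  }
  return [...perGame].map(([game, e]) => ({
    game,
    user_duration: e.user,
    others_average: e.othersCount === 0 ? null : e.othersTotal / e.othersCount,
  }));
}

// Made-up rows in the shape of the by-user query output.
const rows = [
  { name: "Bob", location: "eu", game: "WoW", total_duration: 120 },
  { name: "Bob", location: "eu", game: "Tetris", total_duration: 300 },
  { name: "Ann", location: "eu", game: "WoW", total_duration: 100 },
  { name: "Cid", location: "eu", game: "WoW", total_duration: 100 },
  { name: "Ann", location: "eu", game: "Tetris", total_duration: 100 },
  { name: "Dee", location: "us", game: "WoW", total_duration: 999 },
];

const comparison = compareToLocation(rows, "Bob", "eu");
console.log(comparison);
```

Doing the comparison after the queries return keeps each aggregation simple, at the price of moving part of the analytics logic out of the database and into application code.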
  • 36. RESULT 3 • Bob plays >20% WoW in comparison to the Europeans, but plays 200% more Tetris
  • 37. A NOTE ON QUERIES • There’s no notion of a declared schema • The augmented schema is encoded in the queries • Reuse is very hard; it happens at the query-language level
  • 38. DIMENSIONS • Most questions / graphs have a dimension • Time, Geo • Categories • Relative: what’s X’s contribution of revenue to total • You will need to be able to pass in dimensions as a predicate for your queries • or cache the result and post-process client-side
  • 39. A WORD ON RENDERING GRAPHS / REPORTS • Several libraries available for ruby / python / java • Gruff, Scruffy, StockCharts, D3, JRafael, jQuery Visualize, MooCharts, etc. • Also some services: John Nunemaker’s work (http://get.gaug.es/) • But basically: • you know how to program, right !
  • 40. REVIEW • Understand your schema • multiple schemas in a single collection • multiple collections / multiple data sources • Iterate: • define metric • develop query and report on metrics • understand and drill down or discard • repeat • Operationalize metrics: dashboard • Dimensions • Plotting
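Passing a dimension in as a predicate could be sketched as a small function that builds the pipeline with a leading `$match` stage. The function name `buildLocationPipeline` is hypothetical, and the stages after `$match` are modelled on the location query from slide 16; treat it as one possible shape, not the presenter's code.

```javascript
// Hypothetical helper: build a per-location pipeline restricted by a
// dimension predicate (any filter document accepted by $match).
function buildLocationPipeline(predicate = {}) {
  return [
    { $match: predicate }, // an empty predicate matches every document
    { $unwind: "$games" },
    {
      $group: {
        _id: { location: "$location" },
        number_games: { $sum: 1 },
        total_duration: { $sum: "$games.duration" },
      },
    },
    {
      $project: {
        _id: 0,
        location: "$_id.location",
        number_games: 1,
        total_duration: 1,
      },
    },
  ];
}

// Restrict the report to one value of the geo dimension.
const usOnly = buildLocationPipeline({ location: "us" });

// In the mongo shell the array would be handed to db.gamers.aggregate(usOnly).
console.log(JSON.stringify(usOnly[0]));
```

Placing `$match` first lets the server discard documents before the more expensive unwind and group stages run.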
  • 41. PUNCHLINES • We have described a software engineering process • but requirements will be very fluid • When you know how to write ruby / java / python etc. - life is good • If you’re a business analyst you have a problem • better be BFF with some engineer :)
  • 42. PLUG • We’ve been working on a declarative analytics product • (initially) uses Excel as its presentation layer • Reach out to me if you’re interested @rogerb roger@norellan.com
  • 43. THANK YOU / QUESTIONS
