ANALYTICS WITH MONGODB
ROGER BODAMER
YOU WANT TO ANALYZE THIS
LIKE THIS
BUT HOW?

• These graphs are the end result of a process

• In order to get here, there are a few things you need to do and explore
A WORD ON NON-NATIVE APPROACHES
• Yes, you can

    • map your document schema to a relational schema

    • then export your data from MongoDB to a relational DB

        • and set up a cron job to do this every day

    • then use your BI tool to map the relational data to “objects”

    • and then report and do analytics
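To make the first step concrete, here is a minimal sketch in plain JavaScript (no MongoDB driver) of mapping one gamer document onto a relational shape. The function name `toRelationalRows`, the two-table layout and the use of `name` as the foreign key are assumptions for illustration only.

```javascript
// Hypothetical mapping of one gamer document onto two relational "tables":
// one parent row per gamer and one child row per embedded game session.
function toRelationalRows(doc) {
  const gamerRow = { name: doc.name, location: doc.location ?? null };
  const sessionRows = (doc.games ?? []).map((g) => ({
    name: doc.name, // assumed foreign key back to the gamer row
    game: g.game,
    duration: g.duration,
  }));
  return { gamerRow, sessionRows };
}

const bob = {
  name: "Bob",
  location: "us",
  games: [
    { game: "WoW", duration: 2910 },
    { game: "Tetris", duration: 593 },
  ],
};

const { gamerRow, sessionRows } = toRelationalRows(bob);
console.log(gamerRow); // { name: 'Bob', location: 'us' }
console.log(sessionRows.length); // 2
```

Every schema variant forces a decision in this mapping (here, a missing `location` becomes `null`), and the export job has to be kept in sync with it.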
BUT THAT WOULD BE NO FUN


• Analytics using native queries

• A simple process
PROCESS: NAIVE

• Take a sample document

• Develop a query

• Put it on a chart

• Done!

  • and a gold star from your boss!
PROCESS: REALITY
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
WHY ITERATE?
UNDERSTAND YOUR SCHEMA

{
    "name" : "Mario",
    "games" : [{"game" : "WoW",
                "duration" : 130},
               {"game" : "Tetris",
                "duration" : 130}]
}
BUT ALSO:
• Schemas can be polymorphic

{
    "name" : "Bob",
    "location" : "us",
    "games" : [{"game" : "WoW",
                "duration" : 2910},
               {"game" : "Tetris",
                "duration" : 593}]
}
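One way to see how polymorphic a collection is, sketched under the assumption that you have already pulled documents into memory (in the shell they would come from something like `db.gamers.find()`): group documents by their set of top-level keys and count each variant. `schemaVariants` is a hypothetical helper, and it only looks at top-level keys, so variation inside `games` goes unnoticed.

```javascript
// Hypothetical helper: group documents by their sorted set of top-level keys,
// so each distinct "schema" in one collection shows up with a count.
function schemaVariants(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const signature = Object.keys(doc).sort().join(",");
    counts.set(signature, (counts.get(signature) ?? 0) + 1);
  }
  return counts;
}

// The two slide documents, trimmed to one game each.
const sample = [
  { name: "Mario", games: [{ game: "WoW", duration: 130 }] },
  { name: "Bob", location: "us", games: [{ game: "WoW", duration: 2910 }] },
];

console.log(schemaVariants(sample)); // two signatures, one per schema variant
```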
SO NOW WHAT?
• Only report on common attributes

    • probably missing the most recent / interesting data
SO NOW WHAT?
• Write 2 programs, one for each schema

    • 2 graphs / reports

    • 2 programs writing to 1 graph (basically merging instance data in 2 places)
SO NOW WHAT?

• Unify schema

    • deal with absent and null values

    • translate(NULL, “EU”);
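The `translate(NULL, “EU”)` idea can be sketched client-side as a normalizer that fills in a default location. The default value `"eu"` (lowercase, to match the stored `"us"`) and the helper name `unifyLocation` are assumptions. Server-side, the same default can be expressed inside a `$project` stage with the `$ifNull` expression, e.g. `location : { $ifNull : ["$location", "eu"] }`.

```javascript
// Hypothetical normalizer: treat an absent or null location as "eu",
// mirroring translate(NULL, "EU"). The default value is an assumption.
function unifyLocation(doc, defaultLocation = "eu") {
  // ?? replaces both undefined (absent field) and null with the default
  return { ...doc, location: doc.location ?? defaultLocation };
}

const mario = { name: "Mario", games: [{ game: "WoW", duration: 130 }] };
const bob = { name: "Bob", location: "us", games: [] };

console.log(unifyLocation(mario).location); // eu
console.log(unifyLocation(bob).location); // us
```

The function returns a copy, so the original documents are left untouched.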
ITERATE

• What is the total time played, and how many games do people play, in the US vs. the EU?
QUERY
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $group : {
        _id : { location : "$location" },   // group on the value of the location field
        number_games : { $sum : 1 },
        total_duration : { $sum : "$games.duration" }
    }},
    { $project : {
        _id : 0,
        location : "$_id.location",
        number_games : 1,
        total_duration : 1
    }}
],
  cursor : {}   // required by the aggregate command on MongoDB 3.6+
})
SIDEBAR: WRITING AGGREGATION QUERIES
• Prepare data
    • Extract the relevant properties from the collection's documents
    • Unwind an embedded array if its elements contribute to the aggregation
• Aggregate data
    • Determine the key (_id) on which to aggregate
    • Name the aggregates
• Project data
    • Shape the final results
EXAMPLE
{
    "name" : "Alice",
    "location" : "us",
    "games" : [{
        "game" : "WoW",
        "duration" : 200
      }, {
        "game" : "Tetris",
        "duration" : 100
      }]
}
PREPARE
• Only use location and games:

{ $project : {
    location : 1,
    games : 1
}}


• Unwind games, because the properties of its elements are aggregated over:

{ $unwind : "$games" }
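To show what these two stages hand to `$group`, here is an in-memory sketch in plain JavaScript applied to Alice's document. `projectAndUnwind` is a hypothetical name and only imitates the server's behaviour for this simple case.

```javascript
// In-memory sketch of { $project : { location : 1, games : 1 } } followed by
// { $unwind : "$games" }. _id is left out for brevity (the real $project keeps
// it unless excluded), and documents without a games array produce no rows.
function projectAndUnwind(docs) {
  const rows = [];
  for (const doc of docs) {
    for (const game of doc.games ?? []) {
      // Each output row repeats the parent's location and holds ONE game.
      rows.push({ location: doc.location, games: game });
    }
  }
  return rows;
}

const alice = {
  name: "Alice",
  location: "us",
  games: [
    { game: "WoW", duration: 200 },
    { game: "Tetris", duration: 100 },
  ],
};

const rows = projectAndUnwind([alice]);
console.log(rows.length); // 2
console.log(rows[1].games.game); // Tetris
```

After the unwind, `games` holds a single object, which is why the next stage can refer to `"$games.duration"`.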
AGGREGATE DATA
• Aggregate on number of games (add 1 per game)
  and total duration (add duration per game)
  using location as key


{ $group : {
    _id : { location : "$location" },   // group on the value of the location field
    number_games : { $sum : 1 },
    total_duration : { $sum : "$games.duration" }
}}
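The grouping step can be sketched the same way: keep one accumulator per key and add to it row by row. `groupByLocation` is a hypothetical helper, and the sample rows (Alice's two games plus one made-up EU row) are illustrative only, not the talk's data.

```javascript
// In-memory sketch of the $group stage: key on location, add 1 per game
// for number_games and the game's duration for total_duration.
// Input rows have the $unwind output shape: { location, games: { game, duration } }.
function groupByLocation(rows) {
  const groups = new Map(); // Map keeps first-seen key order
  for (const row of rows) {
    const key = row.location;
    const acc = groups.get(key) ?? { _id: { location: key }, number_games: 0, total_duration: 0 };
    acc.number_games += 1; // { $sum : 1 }
    acc.total_duration += row.games.duration; // { $sum : "$games.duration" }
    groups.set(key, acc);
  }
  return [...groups.values()];
}

const unwound = [
  { location: "us", games: { game: "WoW", duration: 200 } },
  { location: "us", games: { game: "Tetris", duration: 100 } },
  { location: "eu", games: { game: "WoW", duration: 130 } },
];

console.log(groupByLocation(unwound)); // us: 2 games, 300 total; eu: 1 game, 130 total
```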
PROJECT
• Only show location and the aggregates; do not show _id


{ $project : {
    _id : 0,
    location : "$_id.location",
    number_games : 1,
    total_duration : 1
}}
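The final reshaping amounts to lifting the grouping key out of `_id` and dropping `_id` itself. A minimal sketch, where `projectResult` is a hypothetical name:

```javascript
// In-memory sketch of the final $project: hide _id and lift _id.location
// to a top-level location field, keeping the two aggregates.
function projectResult(groups) {
  return groups.map(({ _id, number_games, total_duration }) => ({
    location: _id.location,
    number_games,
    total_duration,
  }));
}

const grouped = [{ _id: { location: "us" }, number_games: 2, total_duration: 300 }];
console.log(projectResult(grouped)); // one flat row per location
```

Flat rows like these are what most charting code expects as input.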
RESULT 1

• People spend a little more time playing in the US
• More games are played in the EU
RING....
CHALLENGE 2


• Since we found that the EU and the US play a similar amount and a similar number of games, the new challenge is:


• Let's see what the distribution of the different games is in the 2 locations
QUERY 2
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {                      // outputs: location, games
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {                      // outputs: location, game, duration
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {                        // key: aggregate on location and game
        _id : { location : "$location", game : "$game" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {                      // project: location, game, total(#games), sum(duration)
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        number_games : 1,
        total_duration : 1
    }}
],
  cursor : {}   // required by the aggregate command on MongoDB 3.6+
})
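To chart a distribution, the Query 2 rows usually need one more client-side step: each game's share of its location's total time. A sketch under assumptions: `durationShareByLocation` is a hypothetical helper, and the sample rows are made up, not the figures behind the talk's charts.

```javascript
// Hypothetical post-processing of QUERY 2 output rows
// ({ location, game, number_games, total_duration }): compute each game's
// share of its location's total playing time, ready for a per-location chart.
function durationShareByLocation(rows) {
  const totals = {};
  for (const r of rows) {
    totals[r.location] = (totals[r.location] ?? 0) + r.total_duration;
  }
  return rows.map((r) => ({
    location: r.location,
    game: r.game,
    share: r.total_duration / totals[r.location],
  }));
}

// Made-up sample rows in the shape QUERY 2 returns.
const query2Rows = [
  { location: "eu", game: "WoW", number_games: 3, total_duration: 600 },
  { location: "eu", game: "Tetris", number_games: 1, total_duration: 200 },
  { location: "us", game: "WoW", number_games: 2, total_duration: 400 },
  { location: "us", game: "Tetris", number_games: 4, total_duration: 400 },
];

console.log(durationShareByLocation(query2Rows)); // eu: 0.75 / 0.25, us: 0.5 / 0.5
```

The same function works on `number_games` instead of `total_duration` if you want the distribution by count.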
RESULT 2

• By count: WoW leads in the EU, Tetris leads in the US
• The EU spends more time on WoW; in the US, time is more evenly spread
RING....
CHALLENGE 3

• How do I compare Bob to everyone else in the EU?
QUERY

• 2 aggregations happening at the same time:

  • 1 by user

  • 1 by location

• This query needs to be broken up into several queries

• Fairly complex

• Currently easiest to process in Ruby/Java/Python/...
// Aggregation 1, by user: total duration per location, name and game
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        name : 1,
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        name : 1,
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location", name : "$name", game : "$game" },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {
        _id : 0,
        name : "$_id.name",
        location : "$_id.location",
        game : "$_id.game",
        total_duration : 1
    }}
],
  cursor : {}   // required by the aggregate command on MongoDB 3.6+
})

// Aggregation 2, by location: total duration per location
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        location : 1,
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location" },   // group on the location value, not a constant
        total_duration : { $sum : "$duration" }
    }},
    { $project : {
        _id : 0,
        location : "$_id.location",
        total_duration : 1
    }}
],
  cursor : {}
})
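The client-side step that combines the two result sets might look like the sketch below. It is one possible interpretation, not necessarily how the numbers in Result 3 were computed: it compares one player's per-game share of their own time with that game's share of the whole location's time (the location figures include the player). `compareToLocation` and the sample rows, including placing Bob in the EU, are assumptions.

```javascript
// Hypothetical client-side step: combine the per-user rows (aggregation 1:
// { name, location, game, total_duration }) with the per-location totals
// (aggregation 2: { location, total_duration }) to compare one player's
// per-game share of playing time with the share across the whole location.
function compareToLocation(userRows, locationTotals, name, location) {
  const locTotal = locationTotals.find((t) => t.location === location).total_duration;
  const mine = userRows.filter((r) => r.name === name && r.location === location);
  const myTotal = mine.reduce((sum, r) => sum + r.total_duration, 0);
  return mine.map((r) => {
    // Location-wide time for this game, summed over every player's row.
    const locGame = userRows
      .filter((u) => u.location === location && u.game === r.game)
      .reduce((sum, u) => sum + u.total_duration, 0);
    const myShare = r.total_duration / myTotal;
    const locShare = locGame / locTotal;
    return { game: r.game, myShare, locShare, ratio: myShare / locShare };
  });
}

// Made-up sample output of the two aggregations.
const userRows = [
  { name: "Bob", location: "eu", game: "WoW", total_duration: 200 },
  { name: "Bob", location: "eu", game: "Tetris", total_duration: 200 },
  { name: "Eve", location: "eu", game: "WoW", total_duration: 1000 },
  { name: "Eve", location: "eu", game: "Tetris", total_duration: 200 },
];
const locationTotals = [{ location: "eu", total_duration: 1600 }];

console.log(compareToLocation(userRows, locationTotals, "Bob", "eu"));
```

A ratio of 2 means the player devotes twice the location-wide share of their time to that game; excluding the player from the location figures would mean subtracting their rows first.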
RESULT 3

• Bob plays >20% WoW in comparison to the Europeans, but plays 200% more Tetris
A NOTE ON QUERIES


• There’s no notion of a declared schema

• The augmented schema is encoded in the queries

• Reuse is very hard and only happens at the query-language level
DIMENSIONS
• Most questions / graphs have a dimension

 • Time, Geo

 • Categories

 • Relative: what is X’s contribution to total revenue?

• You will need to be able to pass in dimensions as a predicate for your queries

 • or cache the result and post-process it client-side
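Passing a dimension in as a predicate can be as simple as prepending a `$match` stage to a query template. In this sketch, `gamesByLocationCommand` is a hypothetical builder that returns a command document you would hand to `db.runCommand(...)`; only the location dimension is shown, since the sample documents carry no timestamp field.

```javascript
// Hypothetical builder: take a dimension filter (e.g. { location: "eu" })
// and prepend it as a $match stage, so one query template serves many
// slices of the data. The returned object is meant for db.runCommand(...).
function gamesByLocationCommand(dimensionFilter = {}) {
  const pipeline = [
    { $project: { location: 1, games: 1 } },
    { $unwind: "$games" },
    {
      $group: {
        _id: { location: "$location" },
        number_games: { $sum: 1 },
        total_duration: { $sum: "$games.duration" },
      },
    },
  ];
  if (Object.keys(dimensionFilter).length > 0) {
    pipeline.unshift({ $match: dimensionFilter }); // filter first, before $unwind
  }
  return { aggregate: "gamers", pipeline, cursor: {} };
}

const cmd = gamesByLocationCommand({ location: "eu" });
console.log(JSON.stringify(cmd.pipeline[0])); // {"$match":{"location":"eu"}}
```

Putting `$match` first keeps the later stages working on fewer documents.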
A WORD ON RENDERING GRAPHS / REPORTS
• Several libraries are available for Ruby / Python / Java / JavaScript

  • Gruff, Scruffy, StockCharts, D3, JRafael, jQuery Visualize, MooCharts, etc.

• Also some services: John Nunemaker’s work (http://get.gaug.es/)

• But basically:

  • you know how to program, right?
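As a reminder that plotting can start very small, here is a dependency-free text bar chart. `textBarChart` is a hypothetical helper, not one of the libraries listed above, and it assumes at least one positive value.

```javascript
// Hypothetical minimal "plotting": render label/value pairs as a text bar
// chart, scaled so the largest value gets `width` characters.
function textBarChart(rows, width = 20) {
  const max = Math.max(...rows.map((r) => r.value));
  const labelWidth = Math.max(...rows.map((r) => r.label.length));
  return rows
    .map((r) => {
      const bar = "#".repeat(Math.round((r.value / max) * width));
      return `${r.label.padEnd(labelWidth)} | ${bar} ${r.value}`;
    })
    .join("\n");
}

console.log(textBarChart([
  { label: "us", value: 300 },
  { label: "eu", value: 150 },
]));
// us | #################### 300
// eu | ########## 150
```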
REVIEW
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
PUNCHLINES

• We have described a software engineering process

  • but requirements will be very fluid

• When you know how to write Ruby / Java / Python etc., life is good

• If you’re a business analyst, you have a problem

  • you’d better be BFF with some engineer :)
PLUG

• We’ve been working on a declarative analytics product

• It (initially) uses Excel as its presentation layer

• Reach out to me if you’re interested

  @rogerb
  roger@norellan.com
THANK YOU / QUESTIONS

Thoughts on MongoDB Analytics

  • 1. ANALYTICS WITH MONGODB ROGER BODAMER
  • 2. YOU WANT TO ANALYZE THIS
  • 4. BUT HOW ? • These graphs are the end result of a process • In order get here there’s a few things you need to do and explore
  • 5. A WORD ON NON-NATIVE APPROACHES • Yes, you can • map your document schema to a relational schema • then export your data from MongoDB to a relational db • and set up a cron job to do this every day • then use your BI tool to map relational to “objects” • and then Report and do Analytics
  • 6. BUT THAT WOULD BE NO FUN • Analytics using Native Queries •A simple process
  • 7. PROCESS: NAIVE • Take a sample document • Develop query • Put on chart • Done ! • and a gold star from your boss !
  • 8. PROCESS: REALITY • Understand your schema • multiple schema’s in single collection • multiple collections / multiple data sources • Iterate: • define metric • develop query and report on metrics • understand and drill down or discard • repeat • Operationalize metrics: dashboard • Dimensions • Plotting
  • 10. UNDERSTAND YOUR SCHEMA { "name" : "Mario", "games" : [{"game" : "WoW", "duration" : 130}, {"game" : "Tetris", "duration" : 130}] }
  • 11. BUT ALSO: • Schema’s can be Polymorphic { "name" : "Bob", "location" : "us", "games" : [{"game" : "WoW", "duration" : 2910}, {"game" : "Tetris", "duration" : 593}] }
  • 12. SO NOW WHAT ? • Only report on common attributes • probably missing the most recent / interesting data
  • 13. SO NOW WHAT ? • Write 2 programs, one for each schema • 2 graphs / reports • 2 programs writing to 1 graph (basically merging instance data in 2 places)
  • 14. SO NOW WHAT ? • Unify Schema • deal with absent, null values • translate(NULL, “EU”);
  • 15. ITERATE • total time and how many games people play in the us vs eu ?
  • 16. QUERY db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games: 1 }}, { $unwind : "$games" }, { $group : { _id : { location : 1}, number_games: { $sum : 1 }, total_duration: {$sum : "$games.duration"} }}, { $project : { _id : 0, location : "$_id.location", number_games : 1, total_duration : 1 }} ]})
  • 17. SIDEBAR: WRITING AGGREGATION QUERIES • Prepare Data • Extract relevant properties from collection documents • Unwind sub collection if its document is contributing to aggregation • Aggregate data • determine the key (_id) on which the aggregates should be done • name aggregates • Project Data • For final results
  • 18. EXAMPLE { "name" : "Alice", "location" : "us", "games" : [{ "game" : "WoW", "duration" : 200 }, { "game" : "Tetris", "duration" : 100 }] }
  • 19. PREPARE • Only use location and games: { $project : { location : 1, games: 1 }} • Unwind games, since the properties of its embedded documents are aggregated over: { $unwind : "$games" }
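`$unwind` emits one output document per element of the named array, copying the other fields. A plain JavaScript emulation (the `unwind` function is a hypothetical helper, not a MongoDB API) applied to the Alice example shows the two documents that reach the next stage.

```javascript
// Emulate { $unwind : "$<field>" } for a top-level array field.
// Documents without the array produce no output, as $unwind does by default.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Alice after { $project : { location : 1, games : 1 } }: only these fields remain.
const alice = {
  location: "us",
  games: [{ game: "WoW", duration: 200 }, { game: "Tetris", duration: 100 }],
};

// Two documents: one carrying the WoW entry, one carrying the Tetris entry.
const unwound = unwind([alice], "games");
console.log(unwound);
```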
  • 20. AGGREGATE DATA • Aggregate on number of games (add 1 per game) and total duration (add duration per game), using location as the key { $group : { _id : { location : "$location" }, number_games: { $sum : 1 }, total_duration: {$sum : "$games.duration"} }}
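The `_id` of `$group` decides which documents land in the same bucket. A field path such as `"$location"` takes each document's own value, whereas a literal such as `1` is identical for every document and collapses everything into a single group. The emulation below is a simplified sketch: the `groupBy` and `resolve` helpers are hypothetical, handle only top-level field paths and literals, and run on already unwound and flattened game entries for Alice (us) and Mario (location assumed to be "eu").

```javascript
// Resolve one _id component: "$field" reads the document, anything else is a literal.
function resolve(doc, expr) {
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

// Simplified $group with the two accumulators used on the slide.
function groupBy(docs, idSpec) {
  const groups = new Map();
  for (const doc of docs) {
    const id = {};
    for (const [k, expr] of Object.entries(idSpec)) id[k] = resolve(doc, expr);
    const key = JSON.stringify(id);
    const acc = groups.get(key) || { _id: id, number_games: 0, total_duration: 0 };
    acc.number_games += 1;              // { $sum : 1 }
    acc.total_duration += doc.duration; // { $sum : "$duration" } on flattened entries
    groups.set(key, acc);
  }
  return [...groups.values()];
}

// Flattened game entries: Alice (us) and Mario (location assumed "eu").
const entries = [
  { location: "us", duration: 200 },
  { location: "us", duration: 100 },
  { location: "eu", duration: 130 },
  { location: "eu", duration: 130 },
];

console.log(groupBy(entries, { location: 1 }));           // a single group for everything
console.log(groupBy(entries, { location: "$location" })); // one group per location
```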
  • 21. PROJECT • Only show location and aggregates, do not show _id { $project : { _id : 0, location : "$_id.location", number_games : 1, total_duration : 1 }}
  • 22. RESULT 1 • People spend a little more time playing in the US • More games played in the EU
  • 24. CHALLENGE 2 • Since we found that the EU and the US play a similar amount of time and a similar number of games, the new challenge is: • Let’s see what the distribution of the different games is in the 2 locations
  • 25. QUERY 2 db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { location : 1, game : "$games.game", duration : "$games.duration" }}, { $group : { _id : { location: "$location", game: "$game"}, number_games: { $sum : 1 }, total_duration: {$sum : "$duration"} }}, { $project : { _id : 0, location : "$_id.location", game : "$_id.game", number_games : 1, total_duration : 1 }} ]})
  • 26-30. QUERY 2 (annotated builds of the same query) • First $project: keep location, games • $unwind games, then a second $project: flatten to location, game, duration • $group key: aggregate on location and game • Final $project: location, game, total(#games), sum(duration)
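Query 2 differs from Query 1 only in its compound grouping key, and the "distribution" question is easiest to read as each game's share of a location's playing time. The sketch below emulates the grouping in plain JavaScript and adds that share as client-side post-processing. The helper names and the `time_share` field are hypothetical, Mario's location is assumed to be "eu", and the figures reflect only the three sample documents from the slides, not the talk's data.

```javascript
// Sample documents from the slides; Mario's location is assumed to be "eu".
const gamers = [
  { name: "Mario", location: "eu", games: [{ game: "WoW", duration: 130 }, { game: "Tetris", duration: 130 }] },
  { name: "Bob", location: "us", games: [{ game: "WoW", duration: 2910 }, { game: "Tetris", duration: 593 }] },
  { name: "Alice", location: "us", games: [{ game: "WoW", duration: 200 }, { game: "Tetris", duration: 100 }] },
];

// Emulates Query 2: unwind the games, then group on the compound key (location, game).
function perLocationAndGame(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const entry of doc.games) {
      const key = JSON.stringify([doc.location, entry.game]); // stands in for the compound _id
      const acc = groups.get(key) ||
        { location: doc.location, game: entry.game, number_games: 0, total_duration: 0 };
      acc.number_games += 1;
      acc.total_duration += entry.duration;
      groups.set(key, acc);
    }
  }
  return [...groups.values()];
}

// Each game's share of its location's total playing time, in percent (1 decimal).
function withTimeShare(rows) {
  const totals = {};
  for (const r of rows) totals[r.location] = (totals[r.location] || 0) + r.total_duration;
  return rows.map((r) => ({
    ...r,
    time_share: Math.round((1000 * r.total_duration) / totals[r.location]) / 10,
  }));
}

console.table(withTimeShare(perLocationAndGame(gamers)));
```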
  • 31. RESULT 2 • Count: EU - WoW, US - Tetris • Time: the EU spends more time on WoW; in the US it’s more evenly spread
  • 33. CHALLENGE 3: • How do I compare Bob to everyone else in the EU ?
  • 34. QUERY • 2 aggregations happening at the same time: • 1 by user • 1 by location • This query needs to be broken up into several queries • Fairly complex • Currently easiest to process in Ruby/Java/Python/...
  • 35. • By location: db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { location : 1, duration : "$games.duration" }}, { $group : { _id : { location : "$location" }, total_duration: {$sum : "$duration"} }}, { $project : { _id : 0, location : "$_id.location", total_duration : 1 }} ]}) • By user, location and game: db.runCommand( { aggregate : "gamers", pipeline : [ { $project : { name : 1, location : 1, games : 1 }}, { $unwind : "$games" }, { $project : { name : 1, location : 1, game : "$games.game", duration : "$games.duration" }}, { $group : { _id : { location : "$location", name : "$name", game : "$game" }, total_duration: {$sum : "$duration"} }}, { $project : { _id : 0, name : "$_id.name", location : "$_id.location", game : "$_id.game", total_duration : 1 }} ]})
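The deck suggests combining the two aggregations client-side. A minimal sketch in JavaScript, assuming the per-user query's output (rows of `name`, `location`, `game`, `total_duration`) has already been fetched; the rows below are simply the sample documents from the slides summarised by hand, and the helpers `shares` and `compareToPeers` are hypothetical. Because Bob's sample document has location "us", the example compares him with the other US gamers. Comparing time shares rather than raw totals keeps one heavy player from dominating the comparison.

```javascript
// Rows shaped like the per-user query output, derived from the sample documents.
const perUserGame = [
  { name: "Bob", location: "us", game: "WoW", total_duration: 2910 },
  { name: "Bob", location: "us", game: "Tetris", total_duration: 593 },
  { name: "Alice", location: "us", game: "WoW", total_duration: 200 },
  { name: "Alice", location: "us", game: "Tetris", total_duration: 100 },
  { name: "Mario", location: "eu", game: "WoW", total_duration: 130 },
  { name: "Mario", location: "eu", game: "Tetris", total_duration: 130 },
];

// Fraction of playing time spent on each game within a set of rows.
function shares(rows) {
  const total = rows.reduce((sum, r) => sum + r.total_duration, 0);
  const byGame = {};
  for (const r of rows) byGame[r.game] = (byGame[r.game] || 0) + r.total_duration;
  for (const game of Object.keys(byGame)) byGame[game] /= total;
  return byGame;
}

// Compare one user's per-game time share with everyone else in the same location.
function compareToPeers(rows, name) {
  const mine = rows.filter((r) => r.name === name);
  if (mine.length === 0) throw new Error(`no rows for user ${name}`);
  const location = mine[0].location;
  const peers = rows.filter((r) => r.location === location && r.name !== name);
  const me = shares(mine);
  const them = shares(peers);
  const result = {};
  for (const game of Object.keys(me)) {
    result[game] = { user: me[game], peers: them[game] || 0 };
  }
  return result;
}

console.log(compareToPeers(perUserGame, "Bob"));
```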
  • 36. RESULT 3 • Bob plays >20% WoW in comparison to the Europeans, but plays 200% more Tetris
  • 37. A NOTE ON QUERIES • There’s no notion of a declared schema • The augmented schema is encoded in the queries • Reuse is very hard; it happens at the query-language level
  • 38. DIMENSIONS • Most questions / graphs have a dimension • Time, Geo • Categories • Relative: what is X’s contribution to total revenue • You will need to be able to pass dimensions in as a predicate to your queries • or cache results and post-process them client-side
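One way to pass a dimension in as a predicate is to prepend a `$match` stage to an existing pipeline; a `$match` at the start of a pipeline filters before any grouping and can use an index. The `withDimension` helper below is hypothetical, and so is the `played_at` timestamp field used for the time dimension, since the sample documents carry no dates.

```javascript
// Hypothetical helper: restrict a pipeline to one slice of a dimension
// by prepending a $match stage, so the filter runs before any grouping.
function withDimension(pipeline, field, predicate) {
  return [{ $match: { [field]: predicate } }, ...pipeline];
}

// The per-location totals pipeline, grouped on each document's location.
const perLocation = [
  { $project: { location: 1, games: 1 } },
  { $unwind: "$games" },
  { $group: { _id: { location: "$location" }, number_games: { $sum: 1 }, total_duration: { $sum: "$games.duration" } } },
  { $project: { _id: 0, location: "$_id.location", number_games: 1, total_duration: 1 } },
];

// Geo dimension: only US gamers.
const usOnly = withDimension(perLocation, "location", "us");

// Time dimension on a hypothetical "played_at" field: one month of data.
const january = withDimension(perLocation, "played_at", {
  $gte: new Date("2012-01-01"),
  $lt: new Date("2012-02-01"),
});

console.log(JSON.stringify(usOnly[0]));
```

Building predicates in code keeps the base pipeline in one place, which partly addresses the reuse problem noted above.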
  • 39. A WORD ON RENDERING GRAPHS / REPORTS • Several libraries are available for Ruby / Python / Java / JavaScript • Gruff, Scruffy, StockCharts, D3, JRafael, jQuery Visualize, MooCharts, etc. • Also some services: John Nunemaker’s work (http://get.gaug.es/) • But basically: • you know how to program, right?
  • 40. REVIEW • Understand your schema • multiple schemas in a single collection • multiple collections / multiple data sources • Iterate: • define metric • develop query and report on metrics • understand and drill down or discard • repeat • Operationalize metrics: dashboard • Dimensions • Plotting
  • 41. PUNCHLINES • We have described a software engineering process • but requirements will be very fluid • When you know how to write ruby / java / python etc. - life is good • If you’re a business analyst you have a problem • better be BFF with some engineer :)
  • 42. PLUG • We’ve been working on a declarative analytics product • (initially) uses Excel as its presentation layer • Reach out to me if you’re interested @rogerb roger@norellan.com
  • 43. THANK YOU / QUESTIONS
