mongoDB




                    advanced analytics and
                    statistics with mongodb
                         John A. De Goes @jdegoes




 http://precog.io                                   04/30/2012




          what do you want
           from your data?




          I want to get and     I want aggregates     I want deep insight
          put data

          MongoDB Query         MongoDB Aggregation   ???
          Language              Framework

                          SQL

          data storage                                data intelligence




          I want to get and     I want aggregates     I want deep insight
          put data

          MongoDB Query         MongoDB Aggregation   MapReduce
          Language              Framework

                          SQL

          data storage                                data intelligence

          function map() {
              emit(1, // Or put a GROUP BY key here
                   {sum: this.value, // the field you want stats for
                    min: this.value,
                    max: this.value,
                    count: 1,
                    diff: 0 // M2,n: sum((val-mean)^2)
              });
          }

          function reduce(key, values) {
              var a = values[0]; // will reduce into here
              // start at 1: values[0] is already the accumulator 'a'
              for (var i = 1; i < values.length; i++) {
                  var b = values[i]; // will merge 'b' into 'a'

                  // temp helpers
                  var delta = a.sum/a.count - b.sum/b.count; // a.mean - b.mean
                  var weight = (a.count * b.count)/(a.count + b.count);

                  // do the reducing
                  a.diff += b.diff + delta*delta*weight;
                  a.sum += b.sum;
                  a.count += b.count;
                  a.min = Math.min(a.min, b.min);
                  a.max = Math.max(a.max, b.max);
              }

              return a;
          }

          function finalize(key, value) {
              value.avg = value.sum / value.count;
              value.variance = value.diff / value.count; // population variance
              value.stddev = Math.sqrt(value.variance);
              return value;
          }
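
The merge step in reduce can be checked outside MongoDB. Below is a minimal sketch in plain JavaScript, assuming made-up sample values; the helper names `toPartial`, `merge`, and `finalizeStats` are hypothetical and only mirror the map, reduce, and finalize functions above. It reduces two chunks separately and then merges them, which is the situation the `delta*delta*weight` term is there to handle.

```javascript
// Hypothetical stand-ins for map/reduce/finalize, runnable without MongoDB.

// Equivalent of what map() emits for one document.
function toPartial(value) {
  return { sum: value, min: value, max: value, count: 1, diff: 0 };
}

// Equivalent of one step of reduce(): merge partials 'a' and 'b' into a new object.
function merge(a, b) {
  var delta = a.sum / a.count - b.sum / b.count; // a.mean - b.mean
  var weight = (a.count * b.count) / (a.count + b.count);
  return {
    sum: a.sum + b.sum,
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    count: a.count + b.count,
    diff: a.diff + b.diff + delta * delta * weight
  };
}

// Equivalent of finalize(): derive mean, population variance, and stddev.
function finalizeStats(p) {
  var variance = p.diff / p.count;
  return {
    avg: p.sum / p.count,
    variance: variance,
    stddev: Math.sqrt(variance),
    min: p.min,
    max: p.max,
    count: p.count
  };
}

// Sample data (made up for illustration).
var durations = [2, 4, 4, 4, 5, 5, 7, 9];

// Reduce two chunks independently, then merge the partial results.
var left = durations.slice(0, 3).map(toPartial).reduce(merge);
var right = durations.slice(3).map(toPartial).reduce(merge);
var stats = finalizeStats(merge(left, right));

console.log(stats);
```

Because each partial carries `sum`, `count`, and `diff`, partial results can be combined in any grouping, which is what lets the job run in a single pass.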




          what if there were
           another way?




                 introducing


          • Statistical query language for JSON data
          • Purely declarative
          • Implicitly parallel
          • Inherently composable




          a taste of quirrel
          pageViews := //pageViews

          bound := 1.5 * stdDev(pageViews.duration)

          avg := mean(pageViews.duration)

          lengthyPageViews := 
            pageViews where pageViews.duration > (avg + bound)

          lengthyPageViews.userId
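
For comparison, here is a rough sketch of the same computation in plain JavaScript over an in-memory array. The sample data is made up, and the `mean` and `stdDev` helpers are hypothetical stand-ins for the Quirrel functions of the same name; whether Quirrel's `stdDev` uses the population or sample form is not stated in the slides, so the population form is assumed here.

```javascript
// Sample page views (made-up data for illustration).
var pageViews = [
  { userId: 1, duration: 10 },
  { userId: 2, duration: 12 },
  { userId: 3, duration: 11 },
  { userId: 4, duration: 9 },
  { userId: 5, duration: 13 },
  { userId: 6, duration: 60 }
];

function mean(xs) {
  return xs.reduce(function (s, x) { return s + x; }, 0) / xs.length;
}

// Population standard deviation (assumption, see lead-in).
function stdDev(xs) {
  var m = mean(xs);
  var squares = xs.map(function (x) { return (x - m) * (x - m); });
  return Math.sqrt(mean(squares));
}

var durations = pageViews.map(function (pv) { return pv.duration; });
var bound = 1.5 * stdDev(durations);
var avg = mean(durations);

// pageViews where pageViews.duration > (avg + bound)
var lengthyPageViews = pageViews.filter(function (pv) {
  return pv.duration > avg + bound;
});

// lengthyPageViews.userId
var lengthyUserIds = lengthyPageViews.map(function (pv) { return pv.userId; });

console.log(lengthyUserIds);
```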




          a taste of quirrel
          pageViews := //pageViews

          bound := 1.5 * stdDev(pageViews.duration)

          avg := mean(pageViews.duration)

          lengthyPageViews :=
            pageViews where pageViews.duration > (avg + bound)

          lengthyPageViews.userId

          Users who spend an unusually long time looking at a page!




          quirrel in 10 minutes




          set-oriented
          in Quirrel everything is
          a set of events




          event
          an event is a JSON value
          paired with an identity




          (really) basic queries
          quirrel> 1
          [1]

          quirrel> true
          [true]

          quirrel> {userId: 1239823, name: "John Doe"}
          [{userId: 1239823, name: "John Doe"}]

          quirrel> 1 + 2
          [3]

          quirrel> (sqrt(16) * 4 - 1) / 3
          [5]




          loading data
          quirrel> //payments

          [{"amount":5,"date":1329741127233,"recipients":
          ["research","marketing"]}, ...]


          quirrel> load("/payments")

          [{"amount":5,"date":1329741127233,"recipients":
          ["research","marketing"]}, ...]




          variables
          quirrel> payments := //payments
                 | payments

          [{"amount":5,"date":1329741127233,"recipients":
          ["research","marketing"]}, ...]


          quirrel> five := 5
                 | five * 2
          [10]




          filtered descent
          quirrel> //users.userId

          [9823461231, 916727123, 23987183, ...]


          quirrel> //payments.recipients[0]

          ["engineering","operations","research", ...]




          reductions
          quirrel> count(//users)
          24185132

          quirrel> mean(//payments.amount)
          87.39

          quirrel> sum(//payments.amount)
          921541.29

          quirrel> stdDev(//payments.amount)
          31.84




          identity matching
          a * b

             a          b
            e1    ?
            e2    *    e8
            e3    *    e9
            e4    *    e10
            e5    *    e11
            e6    *    e12
            e7    ?




          identity matching
          quirrel> orders := //orders
                 | orders.subTotal +
                 | orders.subTotal *
                 | orders.taxRate +
                 | orders.shipping + orders.handling 
          [153.54805, 152.7618, 80.38365, ...]
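
One way to picture identity matching is sketched below in plain JavaScript. It assumes each event is modeled as an `{id, value}` pair, which is a hypothetical representation (the slides do not say how Quirrel stores identities), and `descend` and `matchById` are hypothetical helpers. The order data is made up.

```javascript
// Hypothetical model: an event is a JSON value paired with an identity.

// Field descent keeps each event's identity and narrows its value.
function descend(events, field) {
  return events.map(function (e) {
    return { id: e.id, value: e.value[field] };
  });
}

// Combine two sets event by event, pairing only events that share an identity.
function matchById(a, b, op) {
  var byId = new Map(b.map(function (e) { return [e.id, e.value]; }));
  return a
    .filter(function (e) { return byId.has(e.id); })
    .map(function (e) {
      return { id: e.id, value: op(e.value, byId.get(e.id)) };
    });
}

// Made-up orders for illustration.
var orders = [
  { id: "o1", value: { subTotal: 100, taxRate: 0.5, shipping: 5, handling: 1 } },
  { id: "o2", value: { subTotal: 40, taxRate: 0.25, shipping: 3, handling: 2 } }
];

var add = function (x, y) { return x + y; };
var mul = function (x, y) { return x * y; };

// subTotal + subTotal * taxRate + shipping + handling, matched per order.
var subTotal = descend(orders, "subTotal");
var tax = matchById(subTotal, descend(orders, "taxRate"), mul);
var fees = matchById(descend(orders, "shipping"), descend(orders, "handling"), add);
var totals = matchById(matchById(subTotal, tax, add), fees, add);

console.log(totals.map(function (e) { return e.value; }));
```

Each arithmetic step only ever combines fields that came from the same order, which is why the query above needs no explicit join condition.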




          values
          quirrel> payments.amount * 0.10
          [6.1, 27.842, 29.084, 50, 0.5, 16.955, ...]




          filtering
          quirrel> users := //users
                 | segment := users.age > 19 & 
                 | users.age < 53 & users.income > 60000
                 | count(users where segment)
          [15]
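
The filter above can be read as a boolean set lined up against `users`. A minimal sketch in plain JavaScript, with made-up users, array position standing in for identity (an assumption), and a hypothetical `where` helper:

```javascript
// Made-up users for illustration; array position stands in for identity.
var users = [
  { age: 25, income: 75000 },
  { age: 18, income: 90000 },
  { age: 40, income: 30000 },
  { age: 52, income: 61000 },
  { age: 53, income: 100000 }
];

// 'segment' is a set of booleans, one per user,
// like users.age > 19 & users.age < 53 & users.income > 60000.
var segment = users.map(function (u) {
  return u.age > 19 && u.age < 53 && u.income > 60000;
});

// 'users where segment' keeps the users whose matching boolean is true.
function where(events, mask) {
  return events.filter(function (_, i) { return mask[i]; });
}

var matched = where(users, segment);
console.log(matched.length);
```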




          chaining
          pageViews := //pageViews

          bound := 1.5 * stdDev(pageViews.duration)

          avg := mean(pageViews.duration)

          lengthyPageViews := 
            pageViews where pageViews.duration > (avg + bound)

          lengthyPageViews.userId




          user functions
          quirrel> pageViews := //pageViews
                 |
                 | statsForUser('userId) :=
                 |   {userId:      'userId, 
                 |    meanPageView: mean(pageViews.duration 
                 |                       where pageViews.userId =  'userId)}
                 |
                 | statsForUser

          [{"userId":12353,"meanPageView":100.66666666666667},{"userId":
          12359,"meanPageView":83}, ...]
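
Calling `statsForUser` without an argument yields one result per distinct `userId`, which behaves much like a group-by. A rough plain-JavaScript sketch of that reading, using made-up page views; the function body is a hypothetical equivalent, not how Quirrel evaluates it:

```javascript
// Made-up page views for illustration.
var pageViews = [
  { userId: 12353, duration: 100 },
  { userId: 12353, duration: 110 },
  { userId: 12359, duration: 83 },
  { userId: 12353, duration: 90 }
];

// Hypothetical equivalent of statsForUser with 'userId left unbound:
// one result object per distinct userId.
function statsForUser(views) {
  var groups = new Map();
  views.forEach(function (pv) {
    var g = groups.get(pv.userId) || { sum: 0, count: 0 };
    g.sum += pv.duration;
    g.count += 1;
    groups.set(pv.userId, g);
  });
  return Array.from(groups, function (entry) {
    return { userId: entry[0], meanPageView: entry[1].sum / entry[1].count };
  });
}

var stats = statsForUser(pageViews);
console.log(stats);
```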




          lots more!
          • Cross-joins
          • Self-joins
          • Augmentation
          • Power-packed standard library




          quirrel -> mongodb
          • Quirrel is extremely expressive
          • Aggregation framework insufficient
          • Working with 10gen on new primitives
          • Backup plan: AF + MapReduce




          quirrel -> mongodb
          pageViews := //pageViews

          bound := 1.5 * stdDev(pageViews.duration)
          avg := mean(pageViews.duration)
              [one-pass map/reduce]

          lengthyPageViews :=
            pageViews where pageViews.duration > (avg + bound)
          lengthyPageViews.userId
              [one-pass mongo filter]
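
The second pass can be sketched as building an ordinary MongoDB query document from the first pass's output. This is a minimal illustration, not the translation Precog actually generates: the `stats` numbers are made up, `lengthyPageViewsFilter` is a hypothetical helper, and the collection and field names are assumed to match the Quirrel query.

```javascript
// Pass 1 result, e.g. the finalized output of a map/reduce job over
// pageViews.duration (numbers made up for illustration).
var stats = { avg: 20, stddev: 4 };

// Build the filter for pass 2: duration > avg + 1.5 * stdDev.
function lengthyPageViewsFilter(s) {
  return { duration: { $gt: s.avg + 1.5 * s.stddev } };
}

var filter = lengthyPageViewsFilter(stats);
console.log(JSON.stringify(filter));

// In the mongo shell the second pass could then be run as, for example:
//   db.pageViews.find(filter, { userId: 1 })
```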




                            qa
                    John A. De Goes @jdegoes





More Related Content

What's hot

Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichNorberto Leite
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014MongoDB
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation Amit Ghosh
 
Geospatial Indexing and Querying with MongoDB
Geospatial Indexing and Querying with MongoDBGeospatial Indexing and Querying with MongoDB
Geospatial Indexing and Querying with MongoDBGrant Goodale
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
3D + MongoDB = 3D Repo
3D + MongoDB = 3D Repo3D + MongoDB = 3D Repo
3D + MongoDB = 3D RepoMongoDB
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2MongoDB
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolatorMichael Limansky
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipelinezahid-mian
 
Data Governance with JSON Schema
Data Governance with JSON SchemaData Governance with JSON Schema
Data Governance with JSON SchemaMongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"MongoDB
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
Getting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBGetting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBMongoDB
 
When to Use MongoDB
When to Use MongoDB When to Use MongoDB
When to Use MongoDB MongoDB
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDBKishor Parkhe
 

What's hot (19)

Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014
 
Querying mongo db
Querying mongo dbQuerying mongo db
Querying mongo db
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
Geospatial Indexing and Querying with MongoDB
Geospatial Indexing and Querying with MongoDBGeospatial Indexing and Querying with MongoDB
Geospatial Indexing and Querying with MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
3D + MongoDB = 3D Repo
3D + MongoDB = 3D Repo3D + MongoDB = 3D Repo
3D + MongoDB = 3D Repo
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolator
 
Web Development
Web DevelopmentWeb Development
Web Development
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipeline
 
Data Governance with JSON Schema
Data Governance with JSON SchemaData Governance with JSON Schema
Data Governance with JSON Schema
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Getting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBGetting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDB
 
When to Use MongoDB
When to Use MongoDB When to Use MongoDB
When to Use MongoDB
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDB
 

Viewers also liked

Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseMongoDB
 
Rise of the scientific database
Rise of the scientific databaseRise of the scientific database
Rise of the scientific databaseJohn De Goes
 
In-Database Predictive Analytics
In-Database Predictive AnalyticsIn-Database Predictive Analytics
In-Database Predictive AnalyticsJohn De Goes
 
Post-Free: Life After Free Monads
Post-Free: Life After Free MonadsPost-Free: Life After Free Monads
Post-Free: Life After Free MonadsJohn De Goes
 
Analytics Maturity Model
Analytics Maturity ModelAnalytics Maturity Model
Analytics Maturity ModelJohn De Goes
 
Фотоматериалы
Фотоматериалы Фотоматериалы
Фотоматериалы Yerdos
 
Universidad nacional de chimbor
Universidad nacional de chimborUniversidad nacional de chimbor
Universidad nacional de chimborDoris Aguagallo
 
Product Management and Systems Thinking
Product Management and Systems ThinkingProduct Management and Systems Thinking
Product Management and Systems ThinkingDr. Arne Roock
 
How emotional abuse is wrecking your mental health
How emotional abuse is wrecking your mental healthHow emotional abuse is wrecking your mental health
How emotional abuse is wrecking your mental healthRivka Levy
 
Tulevaisuutemme verkossa
Tulevaisuutemme verkossaTulevaisuutemme verkossa
Tulevaisuutemme verkossaKaroliina Luoto
 
Grafico diario del dax perfomance index para el 10 05-2012
Grafico diario del dax perfomance index para el 10 05-2012Grafico diario del dax perfomance index para el 10 05-2012
Grafico diario del dax perfomance index para el 10 05-2012Experiencia Trading
 
7 câu mẹ nào cũng muốn hỏi khi mang bầu
7 câu mẹ nào cũng muốn hỏi khi mang bầu7 câu mẹ nào cũng muốn hỏi khi mang bầu
7 câu mẹ nào cũng muốn hỏi khi mang bầucuongdienbaby05
 
Got centerpiece? (#hewebar 2013 edition)
Got centerpiece? (#hewebar 2013 edition)Got centerpiece? (#hewebar 2013 edition)
Got centerpiece? (#hewebar 2013 edition)Michael Fienen
 
Mobile is your friend, not enemy.
Mobile is your friend, not enemy. Mobile is your friend, not enemy.
Mobile is your friend, not enemy. Edith Yeung
 
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015School of Fish: The MSC End of Term Report on sustainable fish in schools 2015
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015Marine Stewardship Council
 
Ponencia experiencia e learning y web 2.0
Ponencia experiencia e learning y web 2.0Ponencia experiencia e learning y web 2.0
Ponencia experiencia e learning y web 2.0Elizabeth Huisa Veria
 

Viewers also liked (20)

Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick Database
 
Rise of the scientific database
Rise of the scientific databaseRise of the scientific database
Rise of the scientific database
 
In-Database Predictive Analytics
In-Database Predictive AnalyticsIn-Database Predictive Analytics
In-Database Predictive Analytics
 
Post-Free: Life After Free Monads
Post-Free: Life After Free MonadsPost-Free: Life After Free Monads
Post-Free: Life After Free Monads
 
Analytics Maturity Model
Analytics Maturity ModelAnalytics Maturity Model
Analytics Maturity Model
 
Фотоматериалы
Фотоматериалы Фотоматериалы
Фотоматериалы
 
Universidad nacional de chimbor
Universidad nacional de chimborUniversidad nacional de chimbor
Universidad nacional de chimbor
 
Product Management and Systems Thinking
Product Management and Systems ThinkingProduct Management and Systems Thinking
Product Management and Systems Thinking
 
Barometrul mediului de afaceri romanesc 2016
Barometrul mediului de afaceri romanesc 2016Barometrul mediului de afaceri romanesc 2016
Barometrul mediului de afaceri romanesc 2016
 
How emotional abuse is wrecking your mental health
How emotional abuse is wrecking your mental healthHow emotional abuse is wrecking your mental health
How emotional abuse is wrecking your mental health
 
Tulevaisuutemme verkossa
Tulevaisuutemme verkossaTulevaisuutemme verkossa
Tulevaisuutemme verkossa
 
servo press P2113 BA for press fit
servo press P2113 BA for press fitservo press P2113 BA for press fit
servo press P2113 BA for press fit
 
Teoría de las relaciones humanas
Teoría de las relaciones humanasTeoría de las relaciones humanas
Teoría de las relaciones humanas
 
Grafico diario del dax perfomance index para el 10 05-2012
Grafico diario del dax perfomance index para el 10 05-2012Grafico diario del dax perfomance index para el 10 05-2012
Grafico diario del dax perfomance index para el 10 05-2012
 
7 câu mẹ nào cũng muốn hỏi khi mang bầu
7 câu mẹ nào cũng muốn hỏi khi mang bầu7 câu mẹ nào cũng muốn hỏi khi mang bầu
7 câu mẹ nào cũng muốn hỏi khi mang bầu
 
Got centerpiece? (#hewebar 2013 edition)
Got centerpiece? (#hewebar 2013 edition)Got centerpiece? (#hewebar 2013 edition)
Got centerpiece? (#hewebar 2013 edition)
 
Mobile is your friend, not enemy.
Mobile is your friend, not enemy. Mobile is your friend, not enemy.
Mobile is your friend, not enemy.
 
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015School of Fish: The MSC End of Term Report on sustainable fish in schools 2015
School of Fish: The MSC End of Term Report on sustainable fish in schools 2015
 
Ponencia experiencia e learning y web 2.0
Ponencia experiencia e learning y web 2.0Ponencia experiencia e learning y web 2.0
Ponencia experiencia e learning y web 2.0
 
Vanvasa
VanvasaVanvasa
Vanvasa
 

Similar to Advanced Analytics & Statistics with MongoDB

Shankar's mongo db presentation
Shankar's mongo db presentationShankar's mongo db presentation
Shankar's mongo db presentationShankar Kamble
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDBNorberto Leite
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Railsrfischer20
 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWAnkur Raina
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & AggregationMongoDB
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBMongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb introchristkv
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
mongodb-introduction
mongodb-introductionmongodb-introduction
mongodb-introductionTse-Ching Ho
 
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON SchemaMongoDB
 
Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineMongoDB
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB
 

Similar to Advanced Analytics & Statistics with MongoDB (20)

Shankar's mongo db presentation
Shankar's mongo db presentationShankar's mongo db presentation
Shankar's mongo db presentation
 
MongoDB and Python
MongoDB and PythonMongoDB and Python
MongoDB and Python
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDB
 
Mongo db dla administratora
Mongo db dla administratoraMongo db dla administratora
Mongo db dla administratora
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Rails
 
Python and MongoDB
Python and MongoDB Python and MongoDB
Python and MongoDB
 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUW
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
MongoDB.pdf
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
 
mongodb-introduction
mongodb-introductionmongodb-introduction
mongodb-introduction
 
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema
[MongoDB.local Bengaluru 2018] Just in Time Validation with JSON Schema
 
MongoDB
MongoDBMongoDB
MongoDB
 
Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane Fine
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
MongoDb and NoSQL
MongoDb and NoSQLMongoDb and NoSQL
MongoDb and NoSQL
 

More from John De Goes

Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type ClassesJohn De Goes
 
One Monad to Rule Them All
One Monad to Rule Them AllOne Monad to Rule Them All
One Monad to Rule Them AllJohn De Goes
 
Error Management: Future vs ZIO
Error Management: Future vs ZIOError Management: Future vs ZIO
Error Management: Future vs ZIOJohn De Goes
 
Atomically { Delete Your Actors }
Atomically { Delete Your Actors }Atomically { Delete Your Actors }
Atomically { Delete Your Actors }John De Goes
 
The Death of Final Tagless
The Death of Final TaglessThe Death of Final Tagless
The Death of Final TaglessJohn De Goes
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: RebirthJohn De Goes
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: RebirthJohn De Goes
 
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional ProgrammingZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional ProgrammingJohn De Goes
 
Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018John De Goes
 
Scalaz 8: A Whole New Game
Scalaz 8: A Whole New GameScalaz 8: A Whole New Game
Scalaz 8: A Whole New GameJohn De Goes
 
Scalaz 8 vs Akka Actors
Scalaz 8 vs Akka ActorsScalaz 8 vs Akka Actors
Scalaz 8 vs Akka ActorsJohn De Goes
 
Orthogonal Functional Architecture
Orthogonal Functional ArchitectureOrthogonal Functional Architecture
Orthogonal Functional ArchitectureJohn De Goes
 
The Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect SystemThe Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect SystemJohn De Goes
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsJohn De Goes
 
Streams for (Co)Free!
Streams for (Co)Free!Streams for (Co)Free!
Streams for (Co)Free!John De Goes
 
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...John De Goes
 
Halogen: Past, Present, and Future
Halogen: Past, Present, and FutureHalogen: Past, Present, and Future
Halogen: Past, Present, and FutureJohn De Goes
 
All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!John De Goes
 

More from John De Goes (20)

Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type Classes
 
One Monad to Rule Them All
One Monad to Rule Them AllOne Monad to Rule Them All
One Monad to Rule Them All
 
Error Management: Future vs ZIO
Error Management: Future vs ZIOError Management: Future vs ZIO
Error Management: Future vs ZIO
 
Atomically { Delete Your Actors }
Atomically { Delete Your Actors }Atomically { Delete Your Actors }
Atomically { Delete Your Actors }
 
The Death of Final Tagless
The Death of Final TaglessThe Death of Final Tagless
The Death of Final Tagless
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: Rebirth
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: Rebirth
 
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional ProgrammingZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
 
ZIO Queue
ZIO QueueZIO Queue
ZIO Queue
 
Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018
 
Scalaz 8: A Whole New Game
Scalaz 8: A Whole New GameScalaz 8: A Whole New Game
Scalaz 8: A Whole New Game
 
Scalaz 8 vs Akka Actors
Scalaz 8 vs Akka ActorsScalaz 8 vs Akka Actors
Scalaz 8 vs Akka Actors
 
Orthogonal Functional Architecture
Orthogonal Functional ArchitectureOrthogonal Functional Architecture
Orthogonal Functional Architecture
 
The Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect SystemThe Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect System
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
 
Streams for (Co)Free!
Streams for (Co)Free!Streams for (Co)Free!
Streams for (Co)Free!
 
MTL Versus Free
MTL Versus FreeMTL Versus Free
MTL Versus Free
 
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
 
Halogen: Past, Present, and Future
Halogen: Past, Present, and FutureHalogen: Past, Present, and Future
Halogen: Past, Present, and Future
 
All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!
 


Advanced Analytics & Statistics with MongoDB

  • 1. mongoDB advanced analytics and statistics with mongodb John A. De Goes @jdegoes http://precog.io 04/30/2012
  • 2. what do you want from your data?
  • 3. I want to get and put data: MongoDB Query Language. I want aggregates: MongoDB Aggregation Framework. I want deep insight: ??? [diagram labels: SQL; axis running from data storage to data intelligence]
  • 4. I want to get and put data: MongoDB Query Language. I want aggregates: MongoDB Aggregation Framework. I want deep insight: Map Reduce. [diagram labels: SQL; axis running from data storage to data intelligence]
  • 5. map/reduce statistics
        function map() {
            emit(1, // Or put a GROUP BY key here
                 {sum: this.value, // the field you want stats for
                  min: this.value,
                  max: this.value,
                  count:1,
                  diff: 0, // M2,n: sum((val-mean)^2)
            });
        }

        function reduce(key, values) {
            var a = values[0]; // will reduce into here
            for (var i=1/*!*/; i < values.length; i++){
                var b = values[i]; // will merge 'b' into 'a'

                // temp helpers
                var delta = a.sum/a.count - b.sum/b.count; // a.mean - b.mean
                var weight = (a.count * b.count)/(a.count + b.count);

                // do the reducing
                a.diff += b.diff + delta*delta*weight;
                a.sum += b.sum;
                a.count += b.count;
                a.min = Math.min(a.min, b.min);
                a.max = Math.max(a.max, b.max);
            }
            return a;
        }

        function finalize(key, value){
            value.avg = value.sum / value.count;
            value.variance = value.diff / value.count;
            value.stddev = Math.sqrt(value.variance);
            return value;
        }
  • 6. what if there were another way?
  • 7. introducing Quirrel: a statistical query language for JSON data; purely declarative; implicitly parallel; inherently composable
  • 8. a taste of quirrel
        pageViews := //pageViews
        bound := 1.5 * stdDev(pageViews.duration)
        avg := mean(pageViews.duration)
        lengthyPageViews :=
          pageViews where pageViews.duration > (avg + bound)
        lengthyPageViews.userId
  • 9. a taste of quirrel (same query, annotated): Users who spend an unusually long time looking at a page!
  • 10. quirrel in 10 minutes
  • 11. set-oriented: in Quirrel everything is a set of events
  • 12. event: an event is a JSON value paired with an identity
  • 13. (really) basic queries
        quirrel> 1
        [1]
        quirrel> true
        [true]
        quirrel> {userId: 1239823, name: "John Doe"}
        [{userId: 1239823, name: "John Doe"}]
        quirrel> 1 + 2
        [3]
        quirrel> (sqrt(16) * 4 - 1) / 3
        [5]
  • 14. loading data
        quirrel> //payments
        [{"amount":5,"date":1329741127233,"recipients":["research","marketing"]}, ...]
        quirrel> load("/payments")
        [{"amount":5,"date":1329741127233,"recipients":["research","marketing"]}, ...]
  • 15. variables
        quirrel> payments := //payments
               | payments
        [{"amount":5,"date":1329741127233,"recipients":["research","marketing"]}, ...]
        quirrel> five := 5
               | five * 2
        [10]
  • 16. filtered descent
        quirrel> //users.userId
        [9823461231, 916727123, 23987183, ...]
        quirrel> //payments.recipients[0]
        ["engineering","operations","research", ...]
  • 17. reductions
        quirrel> count(//users)
        24185132
        quirrel> mean(//payments.amount)
        87.39
        quirrel> sum(//payments.amount)
        921541.29
        quirrel> stdDev(//payments.amount)
        31.84
  • 18. identity matching: a * b [diagram: sets a and b holding events e1 to e12, combined with *]
  • 19. identity matching
        quirrel> orders := //orders
               | orders.subTotal +
               | orders.subTotal *
               | orders.taxRate +
               | orders.shipping + orders.handling
        [153.54805, 152.7618, 80.38365, ...]
  • 20. values
        quirrel> payments.amount * 0.10
        [6.1, 27.842, 29.084, 50, 0.5, 16.955, ...]
  • 21. filtering
        quirrel> users := //users
               | segment := users.age > 19 &
               | users.age < 53 & users.income > 60000
               | count(users where segment)
        [15]
  • 22. chaining
        pageViews := //pageViews
        bound := 1.5 * stdDev(pageViews.duration)
        avg := mean(pageViews.duration)
        lengthyPageViews :=
          pageViews where pageViews.duration > (avg + bound)
        lengthyPageViews.userId
  • 23. user functions
        quirrel> pageViews := //pageViews
               |
               | statsForUser('userId) :=
               |   {userId: 'userId,
               |    meanPageView: mean(pageViews.duration
               |      where pageViews.userId = 'userId)}
               |
               | statsForUser
        [{"userId":12353,"meanPageView":100.66666666666667},{"userId":12359,"meanPageView":83}, ...]
  • 24. lots more! Cross-joins; self-joins; augmentation; power-packed standard library
  • 25. quirrel -> mongodb: Quirrel is extremely expressive; the aggregation framework is insufficient; working with 10gen on new primitives; backup plan: AF + MapReduce
  • 26. quirrel -> mongodb
        pageViews := //pageViews
        bound := 1.5 * stdDev(pageViews.duration)              <- one-pass map/reduce
        avg := mean(pageViews.duration)                        <- one-pass map/reduce
        lengthyPageViews :=
          pageViews where pageViews.duration > (avg + bound)   <- one-pass mongo filter
        lengthyPageViews.userId
  • 27. q&a: John A. De Goes @jdegoes http://precog.io 04/30/2012
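
The two-step plan on slide 26 (one map/reduce pass for the statistics, then one filter pass) can be tried outside MongoDB. The following is a minimal sketch in plain JavaScript, not the deck's actual translation: it reuses the merge formulas from slide 5, the names mapValue, reduceStats and finalizeStats are hypothetical, the sample pageViews data is invented for illustration, and it assumes Quirrel's stdDev is the population standard deviation (the deck does not say).

```javascript
// Map step: turn one value into a partial-statistics record (shape from slide 5).
function mapValue(value) {
  return { sum: value, min: value, max: value, count: 1, diff: 0 };
}

// Reduce step: merge partial records pairwise using the slide 5 formulas.
// diff accumulates the sum of squared deviations from the mean.
function reduceStats(values) {
  const a = Object.assign({}, values[0]); // copy so the inputs are not mutated
  for (let i = 1; i < values.length; i++) {
    const b = values[i];
    const delta = a.sum / a.count - b.sum / b.count; // a.mean - b.mean
    const weight = (a.count * b.count) / (a.count + b.count);
    a.diff += b.diff + delta * delta * weight;
    a.sum += b.sum;
    a.count += b.count;
    a.min = Math.min(a.min, b.min);
    a.max = Math.max(a.max, b.max);
  }
  return a;
}

// Finalize step: derive mean, population variance and standard deviation.
function finalizeStats(value) {
  const avg = value.sum / value.count;
  const variance = value.diff / value.count;
  return Object.assign({}, value, { avg, variance, stddev: Math.sqrt(variance) });
}

// Hypothetical sample data (not from the deck).
const pageViews = [
  { userId: 1, duration: 2 }, { userId: 2, duration: 4 },
  { userId: 3, duration: 4 }, { userId: 4, duration: 4 },
  { userId: 5, duration: 5 }, { userId: 6, duration: 5 },
  { userId: 7, duration: 7 }, { userId: 8, duration: 9 },
];

// Pass 1: map every document, reduce two chunks separately, then re-reduce
// the partial results, as a map/reduce engine may do.
const mapped = pageViews.map((pv) => mapValue(pv.duration));
const partials = [reduceStats(mapped.slice(0, 4)), reduceStats(mapped.slice(4))];
const stats = finalizeStats(reduceStats(partials));

// Pass 2: plain filter, mirroring
// "pageViews where pageViews.duration > (avg + bound)".
const bound = 1.5 * stats.stddev;
const lengthyUserIds = pageViews
  .filter((pv) => pv.duration > stats.avg + bound)
  .map((pv) => pv.userId);

console.log(stats.avg, stats.stddev, lengthyUserIds);
```

For this sample the mean is 5 and the standard deviation is 2 (up to floating-point rounding), so the threshold is about 8 and only the user with a duration of 9 is selected. Reducing the two chunks separately and then merging them gives the same statistics as a single pass, which is the property that lets the reduce function run in parallel.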