Back to Basics Webinar 5: Introduction to the Aggregation Framework

MongoDBEurope2016
Old Billingsgate, London
15th November
mongodb.com/europe
Register with code JD20 for a 20% discount

Back to Basics 2016 : Webinar 5
Introduction to the Aggregation Framework
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1

3
Recap
• Webinar 1 – Introduction to NoSQL
– The different types of NoSQL databases
– MongoDB is a document database
• Webinar 2 – My First Application
– Creating databases and collections
– CRUD, Indexes and Explain
• Webinar 3 – Schema Design
– Dynamic schema
– Embedding approaches
• Webinar 4 – GeoSpatial and Text Indexes

4
The Aggregation Framework
• An analytics engine for MongoDB
• What is analytics?
• Think of the two types of database, OLTP, OLAP
• OLTP : Online Transaction Processing
– airline booking,
– ATMs,
– Taxi booking
• OLAP : Online Analytical Processing
– Which tickets make us most money?
– When do we need to refill our ATMs?
– How many cabs do we need to service the West End of London?

5
What Does This Look Like?
OLTP OLAP

6
OLAP - There Be (Hadoop) Dragons Here
• OLAP queries are often table scans
• The output of the queries is often stored for future analysis and comparison
• Many customers are looking at Spark and Hadoop for OLAP but:
– Complexity is astronomical
– Focused on algorithmic analysis of data (you gotta write a program)
– Requires some significant knowledge of parallel alogrithms and parallel
processing
• The aggregation framework is a kinder gentler tool 
• May do what you want in less time with less anguish

7
The Aggregation Framework – A Processing Pipeline
Match Project Group SortLimit
• Think unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• The input is a single collection

8
Pipeline Operators
• $match
Filter documents
• $project
Reshape documents
• $group
Summarize documents
• $out
Create new collections
• $sort
Order documents
• $limit/$skip
Paginate documents
• $lookup
Join two collections togther
• $unwind
Expand an array

10
Example Document
{ "_id" : ObjectId("5759ee6e8684975e1098af68"),
"TestID" : 400,
"VehicleID" : "278",
"TestDate" : ISODate("2013-04-23T00:00:00Z"),
"TestClassID" : "4",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 99284,
"Postcode" : "E",
"Make" : "AUDI",
"Model" : "A3",
"Colour" : "BLACK",
"FuelType" : "P",
"CylinderCapacity" : 1598,
"FirstUseDate" : ISODate("2003-11-11T00:00:00Z“) }

11
Lets Use The Shell
MongoDB Enterprise > use vosa
switched to db vosa
MongoDB Enterprise > db.results2013.findOne()
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestType" : "N",
"TestResult" : "P",
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
"Colour" : "GREEN",
"FuelType" : "P",
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}

12
$limit
MongoDB Enterprise > db.results2013.aggregate([ { "$limit" :2 } ] )
{
"TestID" : 17,
"VehicleID" : 28,
"TestType" : "N",
"TestResult" : "P",
"Postcode" : "BN",
"Make" : "SUZUKI",
…

13
Let’s Make a Small Collection
> db.results2013.aggregate( [ { "$limit" : 10000 }, {"$out" : "results10k" } ] )
> db.results10k.count()
10000
> db.results10k.findOne()
{
"TestID" : 17,
"VehicleID" : 28,
"TestType" : "N",
"TestResult" : "P",
"Postcode" : "BN",
"Make" : "SUZUKI",
"Colour" : "GREEN",
"FuelType" : "P",
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}

14
$match
…aggregate([ { "$limit" :2000 },
{ "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}} ])
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "TestID" : 17, "VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"), "TestClassID" : "2", "TestType" :
"N", "TestResult" : "P", "TestMileage" : 46414, "Postcode" : "BN", "Make" :
"SUZUKI", "Model" : "UNCLASSIFIED", "Colour" : "GREEN", "FuelType" : "P",
"CylinderCapacity" : 398, "FirstUseDate" : ISODate("1993-08-11T00:00:00Z") }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "TestID" : 22, "VehicleID" : 33,
"N", "TestResult" : "P", "TestMileage" : 15605, "Postcode" : "PE", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "TestID" : 44, "VehicleID" : 49,
"N", "TestResult" : "PRS", "TestMileage" : 72694, "Postcode" : "SO", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
...

15
$project (1 of 2)
ageinusecs = { "$subtract" : [ "$TestDate", "$FirstUseDate" ] }
ageinyears = { "$divide" :[ ageinusecs , (1000*3600*24*365) ] }
floorage = { "$floor" : ageinyears }
ispass = { "$cond" : [{"$eq": ["$TestResult","P"]},1,0]}
project = { "$project" : { "Make” :1,
"Model” :1,
"VehicleID" :1,
"TestResult” :1,
"Age” :floorage,
"pass” :ispass }}

16
$project (2 of 2)
MongoDB Enterprise > db.nonulldates.aggregate( [ project ] )
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "VehicleID" : 28, "TestResult" : "P", "Make" :
"SUZUKI", "Model" : "UNCLASSIFIED", "Age" : 19, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "VehicleID" : 33, "TestResult" : "P", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 51, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "VehicleID" : 49, "TestResult" : "PRS", "Make"
: "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1b"), "VehicleID" : 54, "TestResult" : "P", "Make" :
"NISSAN", "Model" : "MICRA GX", "Age" : 13, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1c"), "VehicleID" : 54, "TestResult" : "F", "Make" :
{ "_id" : ObjectId("577294020cb23533dfbaac1d"), "VehicleID" : 63, "TestResult" : "P", "Make" :
{ "_id" : ObjectId("577294020cb23533dfbaac1e"), "VehicleID" : 63, "TestResult" : "F", "Make" :
{ "_id" : ObjectId("577294020cb23533dfbaac1f"), "VehicleID" : 93, "TestResult" : "P", "Make" :
"BMW", "Model" : "318ti SE COMPACT", "Age" : 12, "pass" : 1 }
…

17
$group
countMakes = { "$group" : {{ "_id" : "$Make"} , "total" : {"$sum" : 1 }}}
db.nonulldates.aggregate( [ countMakes ])
{ "_id" : "IVECO", "total" : 1 }
{ "_id" : "ISUZU", "total" : 1 }
{ "_id" : "YAMAHA", "total" : 1 }
{ "_id" : "OLDSMOBILE", "total" : 1 }
{ "_id" : "KAWASAKI", "total" : 1 }
{ "_id" : "MASERATI", "total" : 1 }
{ "_id" : "BENELLI", "total" : 1 }
{ "_id" : "BENTLEY", "total" : 3 }
{ "_id" : "AUDI", "total" : 26 }
{ "_id" : "SMART", "total" : 2 }
{ "_id" : "HARLEY-DAVIDSON", "total" : 1 }
…

18
Summary
• A pipeline of operations
• Select, project, group, sort
• $out must appear last in an aggregation pipeline
• There are a range of accumulators (see the group by
documentation)
• Very powerful way to reshape and analyse data
• Shard aware to gain maxinum performance for large clusters

LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
Sollicitudin VenenatisLOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
Graphic Element Examples

Porta Ultricies
Commodo Porta
Graph Examples
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Category 1 Category 2 Category 3 Category 4
Series 1
Series 2

0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Category 1 Category 2 Category 3 Category 4
Series 1
Series 2

{
_id : ObjectId("4c4ba5e5e8aabf3"),
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: “C",
benefits : [
{ type : "Health",
plan : "PPO Plus" },
{ type : "Dental",
plan : "Standard" }
]
}
Code/Highlight Example

Aggregation Framework Agility Backup Big Data Briefcase
Buildings Business Intelligence Camera Cash Register Catalog
Chat Checkmark Checkmark Cloud Commercial Contract
Computer Content Continuous Development Credit Card Customer Success

Data Center Data Variety Data Velocity Data Volume Data Warehouse Database
Dialogue Directory Documents Downloads Drivers Dynamic Schema
EDW Integration Faster Time to Market File Transfer Flexible Gear Hadoop
Health Check High Availability Horizontal Scaling Integrating into Infrastructure Internet of Things Iterative Development

Life Preserver Line Graph Lock Log Data Lower Cost Magnifying Glass
Man Mobile Phone Meter Monitoring Music New Apps
New Data Types Online Open Source Parachute Personalization Pin
Platform Certification Product Catalog Puzzle Pieces RDBMS Realtime Analytics Rich Querying

Life Preserver RSS Scalability Scale Secondary Indexing Steering Wheel
Stopwatch Text Search Tick Data Training Transmission Tower Trophy
Woman World

Back to Basics Webinar 5: Introduction to the Aggregation Framework

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (18)

Destaque

Destaque (11)

Semelhante a Back to Basics Webinar 5: Introduction to the Aggregation Framework

Semelhante a Back to Basics Webinar 5: Introduction to the Aggregation Framework (20)

Mais de MongoDB

Mais de MongoDB (20)

Último

Último (20)

Back to Basics Webinar 5: Introduction to the Aggregation Framework

Notas do Editor