The document provides information about an upcoming webinar on the MongoDB aggregation framework. Key details include:
- The webinar will introduce the aggregation framework and provide an overview of its capabilities for analytics.
- Examples will use a real-world vehicle testing dataset to demonstrate aggregation pipeline stages like $match, $project, and $group.
- Attendees will learn how the aggregation framework provides a simpler way to perform analytics compared to other tools like Spark and Hadoop.
2. Back to Basics 2016 : Webinar 5
Introduction to the Aggregation Framework
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
3. Recap
• Webinar 1 – Introduction to NoSQL
– The different types of NoSQL databases
– MongoDB is a document database
• Webinar 2 – My First Application
– Creating databases and collections
– CRUD, Indexes and Explain
• Webinar 3 – Schema Design
– Dynamic schema
– Embedding approaches
• Webinar 4 – GeoSpatial and Text Indexes
4. The Aggregation Framework
• An analytics engine for MongoDB
• What is analytics?
• Think of the two types of database: OLTP and OLAP
• OLTP : Online Transaction Processing
– Airline booking
– ATMs
– Taxi booking
• OLAP : Online Analytical Processing
– Which tickets make us most money?
– When do we need to refill our ATMs?
– How many cabs do we need to service the West End of London?
6. OLAP - There Be (Hadoop) Dragons Here
• OLAP queries are often table scans
• The output of the queries is often stored for future analysis and comparison
• Many customers are looking at Spark and Hadoop for OLAP but:
– Complexity is astronomical
– Focused on algorithmic analysis of data (you gotta write a program)
– Requires significant knowledge of parallel algorithms and parallel processing
• The aggregation framework is a kinder, gentler tool
• May do what you want in less time with less anguish
7. The Aggregation Framework – A Processing Pipeline
Match → Project → Group → Sort → Limit
• Think Unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• The input is a single collection
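The Unix-pipeline idea can be illustrated with a minimal sketch in plain JavaScript. This is not how MongoDB implements aggregation; it only shows each stage taking a list of documents and handing its output to the next stage. The sample documents, field names, and helper functions (`match`, `project`, `group`, `sort`, `limit`, `runPipeline`) are all hypothetical.

```javascript
// Hypothetical vehicle-test documents; the fields are assumptions for illustration.
const tests = [
  { make: "FORD", result: "PASS", mileage: 42000 },
  { make: "FORD", result: "FAIL", mileage: 98000 },
  { make: "FIAT", result: "PASS", mileage: 30000 },
  { make: "FIAT", result: "PASS", mileage: 50000 },
  { make: "SAAB", result: "PASS", mileage: 120000 },
];

// Each stage builder returns a function from an array of documents to a new array.
const match = (pred) => (docs) => docs.filter(pred);
const project = (shape) => (docs) => docs.map(shape);
const group = (keyOf, init, step) => (docs) => {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    const current = groups.has(key) ? groups.get(key) : init(key);
    groups.set(key, step(current, doc));
  }
  return [...groups.values()];
};
const sort = (cmp) => (docs) => [...docs].sort(cmp);
const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next, like `a | b | c` in a shell.
const runPipeline = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(tests, [
  match((d) => d.result === "PASS"),
  project((d) => ({ make: d.make, mileage: d.mileage })),
  group(
    (d) => d.make,
    (key) => ({ _id: key, count: 0, totalMileage: 0 }),
    (g, d) => ({
      _id: g._id,
      count: g.count + 1,
      totalMileage: g.totalMileage + d.mileage,
    })
  ),
  sort((a, b) => b.count - a.count),
  limit(2),
]);

console.log(result);
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or repeated without changing the runner.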
8. Pipeline Operators
• $match
Filter documents
• $project
Reshape documents
• $group
Summarize documents
• $out
Create new collections
• $sort
Order documents
• $limit/$skip
Paginate documents
• $lookup
Join two collections together
• $unwind
Expand an array
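The operators listed above might be combined as in the following sketch. The collections (`orders`, `products`, `top_categories`) and every field name are assumptions made up for illustration; they do not come from the webinar dataset. The pipeline itself is an ordinary JavaScript array, so the block runs outside the shell and only calls `aggregate` when a `db` object exists.

```javascript
// Hypothetical schema: each order has a status and an items array of { sku, qty };
// each product has a sku and a category.
const pipeline = [
  // Filter documents
  { $match: { status: "shipped" } },
  // Expand the items array: one document per item
  { $unwind: "$items" },
  // Join each item to its product document
  {
    $lookup: {
      from: "products",
      localField: "items.sku",
      foreignField: "sku",
      as: "product",
    },
  },
  // $lookup produces an array, so expand it as well (stages can be repeated)
  { $unwind: "$product" },
  // Reshape documents
  { $project: { _id: 0, category: "$product.category", qty: "$items.qty" } },
  // Summarize documents
  { $group: { _id: "$category", totalQty: { $sum: "$qty" } } },
  // Order documents
  { $sort: { totalQty: -1 } },
  // Paginate documents
  { $skip: 0 },
  { $limit: 10 },
  // Create a new collection; $out has to be the final stage
  { $out: "top_categories" },
];

// In the mongo shell, where `db` is defined, the pipeline would be run like this.
if (typeof db !== "undefined") {
  db.orders.aggregate(pipeline);
}
```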
18. Summary
• A pipeline of operations
• Select, project, group, sort
• $out must appear last in an aggregation pipeline
• There is a range of accumulators (see the $group documentation)
• Very powerful way to reshape and analyse data
• Shard aware to gain maximum performance for large clusters
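As a rough illustration of the accumulator and $out points in the summary, the sketch below groups with several accumulators and writes the result to a new collection. The collection names (`tests`, `tests_by_make`) and the fields (`make`, `model`, `mileage`) are hypothetical, not taken from the webinar's dataset.

```javascript
// A $group stage using several accumulators; all field names are assumptions.
const groupStage = {
  $group: {
    _id: "$make",                        // one output document per make
    tests: { $sum: 1 },                  // count documents in the group
    avgMileage: { $avg: "$mileage" },
    minMileage: { $min: "$mileage" },
    maxMileage: { $max: "$mileage" },
    models: { $addToSet: "$model" },     // distinct models seen in the group
  },
};

const summaryPipeline = [
  groupStage,
  { $sort: { tests: -1 } },
  { $out: "tests_by_make" },             // $out is placed last
];

// In the mongo shell, where `db` is defined, this would be run as:
if (typeof db !== "undefined") {
  db.tests.aggregate(summaryPipeline);
}
```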
30. {
_id : ObjectId("4c4ba5e5e8aabf3"),
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: "C",
benefits : [
{ type : "Health",
plan : "PPO Plus" },
{ type : "Dental",
plan : "Standard" }
]
}
Code/Highlight Example
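The `benefits` array in the document above is a natural candidate for $unwind. The sketch below shows the stage as it would appear in a pipeline, followed by a plain-JavaScript imitation of its effect on this one document. The imitation is an illustration only, not MongoDB's implementation, and the `ObjectId` wrapper is replaced by a plain string so the block runs outside the shell.

```javascript
const employee = {
  _id: "4c4ba5e5e8aabf3", // ObjectId wrapper omitted so this runs outside the shell
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" },
  ],
};

// The stage as it would be written in an aggregation pipeline.
const unwindStage = { $unwind: "$benefits" };

// Imitation of the stage's effect: one output document per array element,
// with the array field replaced by that single element.
const unwound = employee.benefits.map((benefit) => ({
  ...employee,
  benefits: benefit,
}));

console.log(unwound);
```

Each output document keeps the other fields unchanged, which is what lets a later $group stage summarize by `benefits.type` or `benefits.plan`.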