SlideShare uma empresa Scribd logo
1 de 73
Baixar para ler offline
Graph Operations
With MongoDB
Charles Sarrazin
Senior Consulting Engineer, MongoDB
Charles Sarrazin
Senior Consulting Engineer, MongoDB
Graph Operations
With MongoDB
Agenda
MongoDB
Introduction
01 New Lookup
Operators
03Graph Use &
Concepts
02
Example Scenarios
04 Wrap-up
06Design &
Performance
Considerations
05
MongoDB Introduction
Documents
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Fields can contain an array
of sub-documents
Fields
Typed field values
Fields can contain arrays
Number
Query Language
db.collection.find({'city':'London'})
db.collection.find({'profession':{'$in':['banking','trader']}},{'surname':1,'profession':1})
db.collection.find({'cars.year':{'$lte':1968}}).sort({'surname':1}).limit(10)
db.collection.find({'cars.model':'Bentley','cars.year':{'$lt':1966}})
db.collection.find({'cars':{'$elemMatch':{'model':'Bentley','year':{'$lt':1966}}}})
db.collection.find({'location':{'$geoWithin': { '$geometry': {
'type': 'Polygon',
coordinates: [ <array-of-coordinates> ]
}}}})
SecondaryIndexes
compound, geospatial, text, multikey, hashed,
unique, sparse, partial, TTL
Query Language
db.collection.aggregate ( [
{$match:{'profession':{'$in':['banking','trader']}}},
{$addFields:{'surnameLower':{$toLower:"$surname"},'prof':{$ifNull:["$prof","Unknown"]}},
{$group: { ... } },
{$sort: { ... } },
{$limit: { ... } },
{$match: { ... } },
...
] )
Aggregation pipeline
Schema Design
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Embed
same
document
Schema Design
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Embed
same
document
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
profession: [‘banking’, ‘finance’, ‘trader’]
}
cars:
{ owner_id: 146
model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ owner_id: 146
model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
Separate
Collection
with reference
Functionality Timeline
2.0 – 2.2
Geospatial Polygon support
Aggregation Framework
New 2dsphere index
Aggregation Framework
efficiency optimisations
Full text search
2.4 – 2.6
3.0 – 3.2
Join functionality
Increased geo accuracy
New Aggregation operators
Improved case insensitivity
Recursive graph traversal
Faceted search
Multiple collations
3.4
MongoDB 3.4 - Multi-Model Database
Document
Rich	JSON	Data	Structures
Flexible	Schema
Global	Scale
Relational
Left-Outer	Join
Views
Schema	Validation
Key/Value
Horizontal	Scale
In-Memory
Search
Text	Search
Multiple	Languages
Faceted	Search
Binaries
Files	&	Metadata
Encrypted
Graph
Graph	&	Hierarchical
Recursive	Lookups
GeoSpatial
GeoJSON
2D	&	2DSphere
Graph Use & Concepts
Common Use Cases
• Networks
• Social – circle of friends/colleagues
• Computer network – physical/virtual/application layer
• Mapping / Routes
• Shortest route A to B
• Cybersecurity & Fraud Detection
• Real-time fraud/scam recognition
• Personalisation/Recommendation Engine
• Product, social, service, professional etc.
Graph Key Concepts
• Vertices (nodes)
• Edges (relationships)
• Nodes have properties
• Relationships have name & direction
Relational DBs Lack Relationships
• “Relationships” are actually JOINs
• Raw business or storage logic and constraints – not semantic
• JOIN tables, sparse columns, null-checks
• More JOINS = degraded performance and flexibility
Relational DBs Lack Relationships
• How expensive/complex is:
– Find my friends?
– Find friends of my friends?
– Find mutual friends?
– Find friends of my friends of my friends?
– And so on…
Native Graph Database Strengths
• Relationships are first class citizens of the database
• Index-free adjacency
• Nodes “point” directly to other nodes
• Efficient relationship traversal
Native Graph Database Challenges
• Complex query languages
• Poorly optimized for non-traversal queries
• Difficult to express
• May be memory intensive
• Less often used as System Of Record
• Synchronisation with SOR required
• Increased operational complexity
• Consistency concerns
NoSQL DBs Lack Relationships
• “Flat” disconnected documents or key/value pairs
• “Foreign keys” inferred at application layer
• Data integrity/quality onus is on the application
• Suggestions re difficulty of modeling ANY relationships efficiently with
aggregate stores.
• However…
Friends Network – Document Style
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Schema Design – before $graphLookup
• Options
• Store an array of direct children in each node
• Store parent in each node
• Store parent and array of ancestors
• Trade-offs
• Simple queries…
• …vs simple updates
5 13 14 16 176
3 15121094
2 7 8 11
1
Why MongoDB For Graph?
Lookup Operators
$lookup
Syntax
$lookup: {
from: <target lookup collection>,
localField: <field from the input document>,
foreignField: <field from the target collection to connect to>,
as: <field name for resulting array>
}
$graphLookup
Syntax
$graphLookup: {
from: <target lookup collection>,
startWith: <expression for value to start from>,
connectToField: <field name in target collection to connect to>,
connectFromField: <field name in target collection to connect from – recurse from here>,
as: <field name for resulting array>,
maxDepth: <max number of iterations to perform>,
depthField: <field name for number of recursive iterations required to reach this node>,
restrictSearchWithMatch: <match condition to apply to lookup>
}
Things To Note
• startWith value is an expression
• Referencing value of a field requires the ‘$’ prefix
• Can do things like {$toLower: "$name" }
• Handles array fields automatically
• connectToField and connectFromField take field names
• restrictSearchWithMatch takes a standard query expressions
Things To Note
• Cycles are automatically detected
• Can be used with 3.4 views:
• Define a view
• Recurse across existing view (‘base’ or ‘from’)
• Can be used multiple times per Aggregation pipeline
Schema Design – before $graphLookup
• Options
• Store an array of direct children in each node
• Store parent in each node
• Store parent and array of ancestors
• Trade-offs
• Simple queries…
• …vs simple updates
5 13 14 16 176
3 15121094
2 7 8 11
1
• Options
• Store immediate parent in each node
• Store immediate children in each node
• Traverse in multiple directions
• Recurse in same collection
• Join/recurse into another collection
5 13 14 16 176
3 15121094
2 7 8 11
1
Schema Design – with $graphLookup
75%
of use cases*
*based on beta test user feedback
So just how suitable is MongoDB for
the many varied graph use cases I
have then?”
Example Scenarios
Scenario: Calculate Friend Network
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Scenario: Calculate Friend Network
[
{
$match: { "name": "Bob Smith" }
},
{
$graphLookup: {
from: "contacts",
startWith: "$friends",
connectToField: "name",
connectFromField: "friends”,
as: "socialNetwork"
}
},
{
$project: { name: 1, friends:1, socialNetwork: "$socialNetwork.name"}
}
]
This field is an array
No maxDepth set
Scenario: Calculate Friend Network
{
"_id" : 0,
"name" : "Bob Smith",
"friends" : [
"Anna Jones",
"Chris Green"
],
"socialNetwork" : [
"Joe Lee",
"Fred Brown",
"Bob Smith",
"Chris Green",
"Anna Jones"
]
}
Array
Friends Network - Social
Bob
Smith
Chris
Greenfriends
Anna
Jones
Joe Lee
Recommendation ?
Friends Network - Social
Bob
Smith
Chris
Greenfriends
Anna
Jones
Joe Lee
Recommendation ?
Acme
Soda
Scenario: Determine Air Travel Options
ORD
JFK
BOS
PWM
LHR
{ "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] }
{ "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] }
{ "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] }
{ "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] }
{ "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
Scenario: Determine Air Travel Options
Meet Lucy
{ "_id" : 0, "name" : "Lucy", "nearestAirport" : "JFK" }
[
{
"$match": {"name":"Lucy"}
},
{
"$graphLookup": {
from: "airports",
startWith: "$nearestAirport",
connectToField: "airport",
connectFromField: "connects",
maxDepth: 2,
depthField: "numFlights",
as: "destinations”
}
}
]
Scenario: Determine Air Travel Options
Record the number of
recursions
{
name: "Lucy”,
nearestAirport: "JFK",
destinations: [
{ _id: 0, airport: "JFK", connects: ["BOS", "ORD"], numFlights: 0 },
{ _id: 1, airport: "BOS", connects: ["JFK", "PWM"], numFlights: 1 },
{ _id: 2, airport: "ORD", connects: ["JFK"], numFlights: 1 },
{ _id: 3, airport: "PWM", connects: ["BOS", "LHR"], numFlights: 2 }
]
}
Scenario: Determine Air Travel Options
How many flights this
would take
ORD
JFK
BOS
PWM
LHR
ATL
Scenario: Determine Air Travel Options
{ "_id" : 0, "airport" : "JFK", "connects" : [
{ "to" : "BOS", "airlines" : [ "UA", "AA" ] },
{ "to" : "ORD", "airlines" : [ "UA", "AA" ] },
{ "to" : "ATL", "airlines" : [ "AA", "DL" ] }] }
{ "_id" : 1, "airport" : "BOS", "connects" : [
{ "to" : "JFK", "airlines" : [ "UA", "AA" ] },
{ "to" : "PWM", "airlines" : [ "AA" ] } ]] }
{ "_id" : 2, "airport" : "ORD", "connects" : [
{ "to" : "JFK", "airlines" : [ "UA”,"AA" ] }] }
{ "_id" : 3, "airport" : "PWM", "connects" : [
{ "to" : "BOS", "airlines" : [ "AA" ] }] }
Scenario: Determine Air Travel Options
[
{
"$match":{"name":"Lucy"}
},
{
"$graphLookup": {
from: "airports",
startWith: "$nearestAirport",
connectToField: "airport",
connectFromField: "connects.to”,
maxDepth: 2,
depthField: "numFlights”,
restrictSearchWithMatch: {"connects.airlines":"UA"},
as: ”UAdestinations"
}
}
]
Scenario: Determine Air Travel Options
We’ve added a filter
{
"name" : "Lucy",
"from" : "JFK",
"UAdestinations" : [
{ "_id" : 2, "airport" : "ORD", "numFlights" : NumberLong(1) },
{ "_id" : 1, "airport" : "BOS", "numFlights" : NumberLong(1) }
]
}
Scenario: Determine Air Travel Options
Scenario: Product Categories
Mugs
Kitchen &
Dining
Commuter &
Travel
Glassware &
Drinkware
Outdoor
Recreation
Camping
Mugs
Running
Thermos
Red Run
Thermos
White Run
Thermos
Blue Run
Thermos
Scenario: Product Categories
Get all children 2 levels deep – flat result
Scenario: Product Categories
Get all children 2 levels deep – nested result
Scenario: Article Recommendation
1
98
9
1
8
15
7
2
6
8
5
38
4
12
3
4
2
75
Depth 1
Depth 2
Depth 0
43
19
content id
conversion rate
recommendation
Scenario: Article Recommendation
1
98
9
1
8
15
7
2
6
8
5
38
4
12
3
4
2
75
Depth 1
Depth 2
Depth 0
43
19
content id
conversion rate
recommendation
Recommendations
for Target #1
Recommendation for
Targets #2 and #3
Target #1 (best)
Target #2
Target #3
Syntax
Syntax
Design & Performance
Considerations
The Tale of Two Biebers
VS
Follower Churn
• Everyone worries about scaling content
• But follow requests can be >> message send rates
• Twitter enforces per day follow limits
Edge Metadata
• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships
Options for Storing Graphs in MongoDB
Option One – Embedding Edges
Embedded Edge Arrays
• Storing connections with user (popular choice)
üMost compact form
üEfficient for reads
• However….
• User documents grow
• Upper limit on degree (document size)
• Difficult to annotate (and index) edge
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"followers" : [ "jsr", "ian"],
"following" : [ "jsr", "pete"]
}
Embedded Edge Arrays
• Creating Rich Graph Information
• Can become cumbersome
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [
{"uid" : "jsr", "grp" : "school"},
{"uid" : "ian", "grp" : "work"} ]
}
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [ "jsr", "ian"],
"group" : [ ”school", ”work"]
}
Option Two – Edge Collection
Edge Collections
• Document per edge
• Very flexible for adding edge data
> db.followers.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr"
}
> db.friends.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr",
"grp" : "work",
"ts" : Date("2013-07-10")
}
Edge Collection
Indexing Strategies
Finding Followers
Find followers in single edge collection :
> db.followers.find({from : "djw"}, {_id:0, to:1})
{
"to" : "jsr"
}
Using index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
Covered index when
searching on "from" for all
followers
Specify only if multiple
edges cannot exist
Finding Following
What about who a user is following?
Could use a reverse covered index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
Notice the flipped field
order here
Wait ! There may be an issue with the reverse index…..
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
If we shard this collection by "from",
looking up followers for a specific
user is "targeted" to a shard
To find who the user is following
however, it must scatter-gather the
query to all shards
SHARDING!
Finding Following
Dual Edge Collections
Dual Edge Collections
• When "following" queries are common
• Not always the case
• Consider overhead carefully
• Can use dual collections storing
• One for each direction
• Edges are duplicated reversed
• Can be sharded independently
Wrap-up
MongoDB $graphLookup
• Efficient, index-based recursive queries
• Familiar, MongoDB query language
• Use a single System Of Record
• Cater for all query types
• No added operational overhead
• No synchronization requirements
• Reduced technology surface area
Graph Operations
With MongoDB
Charles Sarrazin
Senior Consulting Engineer, MongoDB

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Storing tree structures with MongoDB
Storing tree structures with MongoDBStoring tree structures with MongoDB
Storing tree structures with MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective IndexingMongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
 
Redis data modeling examples
Redis data modeling examplesRedis data modeling examples
Redis data modeling examples
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB Fundamentals
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
 
Mongo DB 102
Mongo DB 102Mongo DB 102
Mongo DB 102
 
MongoDB
MongoDBMongoDB
MongoDB
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
JSON-LD and MongoDB
JSON-LD and MongoDBJSON-LD and MongoDB
JSON-LD and MongoDB
 
MongoDB 101
MongoDB 101MongoDB 101
MongoDB 101
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation Performance
 
MongoDB
MongoDBMongoDB
MongoDB
 
Graph databases
Graph databasesGraph databases
Graph databases
 
JSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebJSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social Web
 

Destaque

The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB
 

Destaque (10)

Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your Business
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica Sets
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapR
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
Back to Basics: My First MongoDB Application
Back to Basics: My First MongoDB ApplicationBack to Basics: My First MongoDB Application
Back to Basics: My First MongoDB Application
 

Semelhante a Webinar: Working with Graph Data in MongoDB

Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
MongoDB
 
CouchDB Open Source Bridge
CouchDB Open Source BridgeCouchDB Open Source Bridge
CouchDB Open Source Bridge
Chris Anderson
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN App
MongoDB
 

Semelhante a Webinar: Working with Graph Data in MongoDB (20)

MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 
MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
Mongo db 101 dc group
Mongo db 101 dc groupMongo db 101 dc group
Mongo db 101 dc group
 
MongoDB .local Houston 2019: Jumpstart: From SQL to NoSQL -- Changing Your Mi...
MongoDB .local Houston 2019: Jumpstart: From SQL to NoSQL -- Changing Your Mi...MongoDB .local Houston 2019: Jumpstart: From SQL to NoSQL -- Changing Your Mi...
MongoDB .local Houston 2019: Jumpstart: From SQL to NoSQL -- Changing Your Mi...
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...
 
CouchDB Open Source Bridge
CouchDB Open Source BridgeCouchDB Open Source Bridge
CouchDB Open Source Bridge
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN App
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
 
From SQL to NoSQL -- Changing Your Mindset
From SQL to NoSQL -- Changing Your MindsetFrom SQL to NoSQL -- Changing Your Mindset
From SQL to NoSQL -- Changing Your Mindset
 
MongoDB World 2019: From SQL to NoSQL -- Changing Your Mindset
MongoDB World 2019: From SQL to NoSQL -- Changing Your MindsetMongoDB World 2019: From SQL to NoSQL -- Changing Your Mindset
MongoDB World 2019: From SQL to NoSQL -- Changing Your Mindset
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 

Mais de MongoDB

Mais de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Webinar: Working with Graph Data in MongoDB

  • 1. Graph Operations With MongoDB Charles Sarrazin Senior Consulting Engineer, MongoDB
  • 2. Charles Sarrazin Senior Consulting Engineer, MongoDB Graph Operations With MongoDB
  • 3. Agenda MongoDB Introduction 01 New Lookup Operators 03Graph Use & Concepts 02 Example Scenarios 04 Wrap-up 06Design & Performance Considerations 05
  • 5. Documents { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Fields can contain an array of sub-documents Fields Typed field values Fields can contain arrays Number
  • 7. Query Language db.collection.aggregate ( [ {$match:{'profession':{'$in':['banking','trader']}}}, {$addFields:{'surnameLower':{$toLower:"$surname"},'prof':{$ifNull:["$prof","Unknown"]}}, {$group: { ... } }, {$sort: { ... } }, {$limit: { ... } }, {$match: { ... } }, ... ] ) Aggregation pipeline
  • 8. Schema Design { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Embed same document
  • 9. Schema Design { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Embed same document { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’] } cars: { owner_id: 146 model: ‘Bentley’, year: 1973, value: 100000, … }, { owner_id: 146 model: ‘Rolls Royce’, year: 1965, value: 330000, … } Separate Collection with reference
  • 10.
  • 11. Functionality Timeline 2.0 – 2.2 Geospatial Polygon support Aggregation Framework New 2dsphere index Aggregation Framework efficiency optimisations Full text search 2.4 – 2.6 3.0 – 3.2 Join functionality Increased geo accuracy New Aggregation operators Improved case insensitivity Recursive graph traversal Faceted search Multiple collations 3.4
  • 12. MongoDB 3.4 - Multi-Model Database Document Rich JSON Data Structures Flexible Schema Global Scale Relational Left-Outer Join Views Schema Validation Key/Value Horizontal Scale In-Memory Search Text Search Multiple Languages Faceted Search Binaries Files & Metadata Encrypted Graph Graph & Hierarchical Recursive Lookups GeoSpatial GeoJSON 2D & 2DSphere
  • 13. Graph Use & Concepts
  • 14. Common Use Cases • Networks • Social – circle of friends/colleagues • Computer network – physical/virtual/application layer • Mapping / Routes • Shortest route A to B • Cybersecurity & Fraud Detection • Real-time fraud/scam recognition • Personalisation/Recommendation Engine • Product, social, service, professional etc.
  • 15. Graph Key Concepts • Vertices (nodes) • Edges (relationships) • Nodes have properties • Relationships have name & direction
  • 16. Relational DBs Lack Relationships • “Relationships” are actually JOINs • Raw business or storage logic and constraints – not semantic • JOIN tables, sparse columns, null-checks • More JOINS = degraded performance and flexibility
  • 17. Relational DBs Lack Relationships • How expensive/complex is: – Find my friends? – Find friends of my friends? – Find mutual friends? – Find friends of my friends of my friends? – And so on…
  • 18. Native Graph Database Strengths • Relationships are first class citizens of the database • Index-free adjacency • Nodes “point” directly to other nodes • Efficient relationship traversal
  • 19. Native Graph Database Challenges • Complex query languages • Poorly optimized for non-traversal queries • Difficult to express • May be memory intensive • Less often used as System Of Record • Synchronisation with SOR required • Increased operational complexity • Consistency concerns
  • 20. NoSQL DBs Lack Relationships • “Flat” disconnected documents or key/value pairs • “Foreign keys” inferred at application layer • Data integrity/quality onus is on the application • Suggestions re difficulty of modeling ANY relationships efficiently with aggregate stores. • However…
  • 21. Friends Network – Document Style { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] }, { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] }, { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] }
  • 22. Schema Design – before $graphLookup • Options • Store an array of direct children in each node • Store parent in each node • Store parent and array of ancestors • Trade-offs • Simple queries… • …vs simple updates 5 13 14 16 176 3 15121094 2 7 8 11 1
  • 23. Why MongoDB For Graph?
  • 26. Syntax $lookup: { from: <target lookup collection>, localField: <field from the input document>, foreignField: <field from the target collection to connect to>, as: <field name for resulting array> }
  • 28. Syntax $graphLookup: { from: <target lookup collection>, startWith: <expression for value to start from>, connectToField: <field name in target collection to connect to>, connectFromField: <field name in target collection to connect from – recurse from here>, as: <field name for resulting array>, maxDepth: <max number of iterations to perform>, depthField: <field name for number of recursive iterations required to reach this node>, restrictSearchWithMatch: <match condition to apply to lookup> }
  • 29. Things To Note • startWith value is an expression • Referencing value of a field requires the ‘$’ prefix • Can do things like {$toLower: "$name" } • Handles array fields automatically • connectToField and connectFromField take field names • restrictSearchWithMatch takes a standard query expressions
  • 30. Things To Note • Cycles are automatically detected • Can be used with 3.4 views: • Define a view • Recurse across existing view (‘base’ or ‘from’) • Can be used multiple times per Aggregation pipeline
  • 31. Schema Design – before $graphLookup • Options • Store an array of direct children in each node • Store parent in each node • Store parent and array of ancestors • Trade-offs • Simple queries… • …vs simple updates 5 13 14 16 176 3 15121094 2 7 8 11 1
  • 32. • Options • Store immediate parent in each node • Store immediate children in each node • Traverse in multiple directions • Recurse in same collection • Join/recurse into another collection 5 13 14 16 176 3 15121094 2 7 8 11 1 Schema Design – with $graphLookup
  • 33. 75% of use cases* *based on beta test user feedback So just how suitable is MongoDB for the many varied graph use cases I have then?”
  • 35. Scenario: Calculate Friend Network { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] }, { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] }, { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] }
  • 36. Scenario: Calculate Friend Network [ { $match: { "name": "Bob Smith" } }, { $graphLookup: { from: "contacts", startWith: "$friends", connectToField: "name", connectFromField: "friends”, as: "socialNetwork" } }, { $project: { name: 1, friends:1, socialNetwork: "$socialNetwork.name"} } ] This field is an array No maxDepth set
  • 37. Scenario: Calculate Friend Network { "_id" : 0, "name" : "Bob Smith", "friends" : [ "Anna Jones", "Chris Green" ], "socialNetwork" : [ "Joe Lee", "Fred Brown", "Bob Smith", "Chris Green", "Anna Jones" ] } Array
  • 38. Friends Network - Social Bob Smith Chris Greenfriends Anna Jones Joe Lee Recommendation ?
  • 39. Friends Network - Social Bob Smith Chris Greenfriends Anna Jones Joe Lee Recommendation ? Acme Soda
  • 40. Scenario: Determine Air Travel Options ORD JFK BOS PWM LHR { "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] } { "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] } { "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] } { "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] } { "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
  • 41. Scenario: Determine Air Travel Options Meet Lucy { "_id" : 0, "name" : "Lucy", "nearestAirport" : "JFK" }
  • 42. [ { "$match": {"name":"Lucy"} }, { "$graphLookup": { from: "airports", startWith: "$nearestAirport", connectToField: "airport", connectFromField: "connects", maxDepth: 2, depthField: "numFlights", as: "destinations” } } ] Scenario: Determine Air Travel Options Record the number of recursions
  • 43. { name: "Lucy”, nearestAirport: "JFK", destinations: [ { _id: 0, airport: "JFK", connects: ["BOS", "ORD"], numFlights: 0 }, { _id: 1, airport: "BOS", connects: ["JFK", "PWM"], numFlights: 1 }, { _id: 2, airport: "ORD", connects: ["JFK"], numFlights: 1 }, { _id: 3, airport: "PWM", connects: ["BOS", "LHR"], numFlights: 2 } ] } Scenario: Determine Air Travel Options How many flights this would take
  • 45. { "_id" : 0, "airport" : "JFK", "connects" : [ { "to" : "BOS", "airlines" : [ "UA", "AA" ] }, { "to" : "ORD", "airlines" : [ "UA", "AA" ] }, { "to" : "ATL", "airlines" : [ "AA", "DL" ] }] } { "_id" : 1, "airport" : "BOS", "connects" : [ { "to" : "JFK", "airlines" : [ "UA", "AA" ] }, { "to" : "PWM", "airlines" : [ "AA" ] } ]] } { "_id" : 2, "airport" : "ORD", "connects" : [ { "to" : "JFK", "airlines" : [ "UA”,"AA" ] }] } { "_id" : 3, "airport" : "PWM", "connects" : [ { "to" : "BOS", "airlines" : [ "AA" ] }] } Scenario: Determine Air Travel Options
  • 46. [ { "$match":{"name":"Lucy"} }, { "$graphLookup": { from: "airports", startWith: "$nearestAirport", connectToField: "airport", connectFromField: "connects.to”, maxDepth: 2, depthField: "numFlights”, restrictSearchWithMatch: {"connects.airlines":"UA"}, as: ”UAdestinations" } } ] Scenario: Determine Air Travel Options We’ve added a filter
  • 47. { "name" : "Lucy", "from" : "JFK", "UAdestinations" : [ { "_id" : 2, "airport" : "ORD", "numFlights" : NumberLong(1) }, { "_id" : 1, "airport" : "BOS", "numFlights" : NumberLong(1) } ] } Scenario: Determine Air Travel Options
  • 48. Scenario: Product Categories Mugs Kitchen & Dining Commuter & Travel Glassware & Drinkware Outdoor Recreation Camping Mugs Running Thermos Red Run Thermos White Run Thermos Blue Run Thermos
  • 49. Scenario: Product Categories Get all children 2 levels deep – flat result
  • 50. Scenario: Product Categories Get all children 2 levels deep – nested result
  • 51. Scenario: Article Recommendation 1 98 9 1 8 15 7 2 6 8 5 38 4 12 3 4 2 75 Depth 1 Depth 2 Depth 0 43 19 content id conversion rate recommendation
  • 52. Scenario: Article Recommendation 1 98 9 1 8 15 7 2 6 8 5 38 4 12 3 4 2 75 Depth 1 Depth 2 Depth 0 43 19 content id conversion rate recommendation Recommendations for Target #1 Recommendation for Targets #2 and #3 Target #1 (best) Target #2 Target #3
  • 56. The Tale of Two Biebers VS
  • 57. Follower Churn • Everyone worries about scaling content • But follow requests can be >> message send rates • Twitter enforces per day follow limits
  • 58. Edge Metadata • Models – friends/followers • Requirements typically start simple • Add Groups, Favorites, Relationships
  • 59. Options for Storing Graphs in MongoDB
  • 60. Option One – Embedding Edges
  • 61. Embedded Edge Arrays • Storing connections with user (popular choice) üMost compact form üEfficient for reads • However…. • User documents grow • Upper limit on degree (document size) • Difficult to annotate (and index) edge { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "followers" : [ "jsr", "ian"], "following" : [ "jsr", "pete"] }
  • 62. Embedded Edge Arrays • Creating Rich Graph Information • Can become cumbersome { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "friends" : [ {"uid" : "jsr", "grp" : "school"}, {"uid" : "ian", "grp" : "work"} ] } { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "friends" : [ "jsr", "ian"], "group" : [ ”school", ”work"] }
  • 63. Option Two – Edge Collection
  • 64. Edge Collections • Document per edge • Very flexible for adding edge data > db.followers.findOne() { "_id" : ObjectId(…), "from" : "djw", "to" : "jsr" } > db.friends.findOne() { "_id" : ObjectId(…), "from" : "djw", "to" : "jsr", "grp" : "work", "ts" : Date("2013-07-10") }
  • 66. Finding Followers Find followers in single edge collection : > db.followers.find({from : "djw"}, {_id:0, to:1}) { "to" : "jsr" } Using index : { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } Covered index when searching on "from" for all followers Specify only if multiple edges cannot exist
  • 67. Finding Following What about who a user is following? Could use a reverse covered index : { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } { "v" : 1, "key" : { "to" : 1, "from" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "to_1_from_1" } Notice the flipped field order here Wait ! There may be an issue with the reverse index…..
  • 68. { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } { "v" : 1, "key" : { "to" : 1, "from" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "to_1_from_1" } If we shard this collection by "from", looking up followers for a specific user is "targeted" to a shard To find who the user is following however, it must scatter-gather the query to all shards SHARDING! Finding Following
  • 70. Dual Edge Collections • When "following" queries are common • Not always the case • Consider overhead carefully • Can use dual collections storing • One for each direction • Edges are duplicated reversed • Can be sharded independently
  • 72. MongoDB $graphLookup • Efficient, index-based recursive queries • Familiar, MongoDB query language • Use a single System Of Record • Cater for all query types • No added operational overhead • No synchronization requirements • Reduced technology surface area
  • 73. Graph Operations With MongoDB Charles Sarrazin Senior Consulting Engineer, MongoDB