The popularity of dedicated graph technologies has risen greatly in recent years, at least partly fuelled by the explosion in social media and similar systems, where a friend network or recommendation engine is often a critical component when delivering a successful application. MongoDB 3.4 introduces a new Aggregation Framework graph operator, $graphLookup, to enable some of these types of use cases to be built easily on top of MongoDB. We will see how semantic relationships can be modelled inside MongoDB today, how the new $graphLookup operator can help simplify this in 3.4, and how $graphLookup can be used to leverage these relationships and build a commercially focused news article recommendation system.
4. “I’ve got too many applications. And I’ve got too many different technologies and tools being used to build and support those applications. I want to significantly reduce my technology surface area.”
5. #MDBE16
Functionality Timeline
2.0 – 2.2
• Geospatial Polygon support
• Aggregation Framework
2.4 – 2.6
• New 2dsphere index
• Aggregation Framework efficiency optimisations
• Full text search
3.0 – 3.2
• Join functionality
• Increased geo accuracy
• New Aggregation operators
• Improved case insensitivity
3.4
• Recursive graph traversal
• Faceted search
• Multiple collations
8. #MDBE16
Common Use Cases
• Networks
• Social – circle of friends/colleagues
• Computer network – physical/virtual/application layer
• Mapping / Routes
• Shortest route A to B
• Cybersecurity & Fraud Detection
• Real-time fraud/scam recognition
• Personalisation/Recommendation Engine
• Product, social, service, professional etc.
9. #MDBE16
Graph Key Concepts
• Vertices (nodes)
• Edges (relationships)
• Nodes have properties
• Relationships have name & direction
11. #MDBE16
Native Graph Database Strengths
• Relationships are first class citizens of the
database
• Index-free adjacency
• Nodes “point” directly to other nodes
• Efficient relationship traversal
12. #MDBE16
Index Free Adjacency
• Each node maintains direct references to adjacent nodes
• Bi-directional joins “pre-computed” and stored as relationships
• Query times independent of overall graph size
• Query times proportional to graph search area
• BUT… efficiency relies on native graph data storage
• Fixed-sized records
• Pointer-like record IDs
• File-system and object cache
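As a rough illustration of index-free adjacency, the sketch below keeps direct object references between nodes, so a traversal only touches the nodes it actually visits. The helper names (makeNode, connect, withinHops) and the sample people are hypothetical and not taken from any graph database API.

```javascript
// Each node holds direct references to its neighbours (no index lookup needed).
function makeNode(name) {
  return { name, adjacent: [] };
}

// Store the relationship on both nodes so it can be followed in either direction.
function connect(a, b) {
  a.adjacent.push(b);
  b.adjacent.push(a);
}

// Breadth-first walk: the work done depends on the area searched,
// not on the total number of nodes in the graph.
function withinHops(start, hops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let i = 0; i < hops; i++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbour of node.adjacent) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen].map((n) => n.name);
}

// Hypothetical sample data.
const alice = makeNode("Alice");
const bob = makeNode("Bob");
const carol = makeNode("Carol");
connect(alice, bob);
connect(bob, carol);
```

Here withinHops(alice, 1) reaches only Bob, while withinHops(alice, 2) also reaches Carol.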
13. #MDBE16
Native Graph Database Challenges
• Complex query languages
• Lack of widely available skills
• Poorly optimised for non-traversal queries
• Difficult to express
• Inefficient, memory intensive
• Less often used as System Of Record
• Synchronisation with SOR required
• Increased operational complexity
• Consistency concerns
15. #MDBE16
Relational DBs Lack Relationships
• “Relationships” are actually JOINs
• Raw business or storage logic and constraints – not semantic
• JOIN tables, sparse columns, null-checks
• More JOINS = degraded performance and flexibility
16. #MDBE16
Relational DBs Lack Relationships
• How expensive/complex is:
• Find my friends?
• Find friends of my friends?
• Find mutual friends?
• Find friends of my friends of my friends?
• And so on…
17. #MDBE16
NoSQL DBs Lack Relationships
• “Flat” disconnected documents or key/value pairs
• “Foreign keys” inferred at application layer
• Data integrity/quality onus is on the application
• Some suggest it is difficult to model ANY relationships efficiently with aggregate stores
• However…
19. #MDBE16
Schema Design – before $graphLookup
• Options
• Store an array of direct children in each node
• Store parent in each node
• Store parent and array of ancestors
• Trade-offs
• Simple queries…
• …vs simple updates
[Diagram: example tree of nodes numbered 1 to 17]
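The trade-off between simple queries and simple updates can be sketched with two of the options. The tiny tree, the field names (parent, ancestors) and the helper functions below are assumptions made for illustration; the in-memory filters only stand in for the equivalent MongoDB queries.

```javascript
// Hypothetical tree: 1 is the root, 2 is a child of 1, 4 and 7 are children of 2.

// Option: store only the parent in each node (simple updates).
const byParent = [
  { _id: 1, parent: null },
  { _id: 2, parent: 1 },
  { _id: 4, parent: 2 },
  { _id: 7, parent: 2 },
];

// Option: store the parent and an array of ancestors (simple queries).
const byAncestors = [
  { _id: 1, parent: null, ancestors: [] },
  { _id: 2, parent: 1, ancestors: [1] },
  { _id: 4, parent: 2, ancestors: [1, 2] },
  { _id: 7, parent: 2, ancestors: [1, 2] },
];

// Parent only: finding all descendants needs one lookup per tree level.
function descendantsByParent(docs, rootId) {
  const found = [];
  let frontier = [rootId];
  while (frontier.length > 0) {
    const next = docs
      .filter((d) => frontier.includes(d.parent))
      .map((d) => d._id);
    found.push(...next);
    frontier = next;
  }
  return found;
}

// Ancestors array: a single filter, comparable to find({ ancestors: rootId }).
// The cost moves to updates: moving a subtree means rewriting every
// descendant's ancestors array.
function descendantsByAncestors(docs, rootId) {
  return docs.filter((d) => d.ancestors.includes(rootId)).map((d) => d._id);
}
```

Both helpers return nodes 2, 4 and 7 for root 1; the difference is how many round trips the query needs and how much data an update has to touch.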
20. #MDBE16
Schema Design – with $graphLookup
• Options
• Store immediate parent in each node
• Store immediate children in each node
• Traverse in multiple directions
• Recurse in same collection
• Join/recurse into another collection
• Aggregation Framework pipeline stage
• Can use multiple times per pipeline
[Diagram: example tree of nodes numbered 1 to 17]
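A minimal sketch of traversing in both directions when each node stores only its immediate parent. The collection name (nodes), the field name (parent) and the output aliases are assumptions; the pipeline is built as a plain JavaScript array that could be passed to db.nodes.aggregate() in the shell.

```javascript
// Walk upwards: start from this node's parent and keep following "parent".
const ancestorsStage = {
  $graphLookup: {
    from: "nodes",
    startWith: "$parent",
    connectFromField: "parent",
    connectToField: "_id",
    as: "ancestors",
  },
};

// Walk downwards: start from this node's _id and find nodes whose
// "parent" points at it, then repeat from each node found.
const descendantsStage = {
  $graphLookup: {
    from: "nodes",
    startWith: "$_id",
    connectFromField: "_id",
    connectToField: "parent",
    as: "descendants",
  },
};

// The stage can appear more than once in the same pipeline.
const bothDirections = [
  { $match: { _id: 4 } },
  ancestorsStage,
  descendantsStage,
];
```

Swapping connectFromField and connectToField is what reverses the direction of travel, so the stored schema does not need a children array to support downward queries.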
22. #MDBE16
Syntax
$graphLookup: {
from: <target collection to look up in>,
startWith: <expression for value to start from>,
connectToField: <field name to connect to>,
connectFromField: <field name to connect from – recurse from here>,
as: <field name alias for result array>,
maxDepth: <max number of iterations to perform>,
depthField: <field name alias for number of iterations required to reach this node>,
restrictSearchWithMatch: <match condition to apply to lookup>
}
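For illustration, here is the syntax filled in for a hypothetical employees collection in which each document has a name and a reportsTo field holding the manager's name. The collection, field names, sample name and depth limit are all assumptions.

```javascript
// One $graphLookup stage: follow reportsTo upwards through the same collection.
const reportingChainStage = {
  $graphLookup: {
    from: "employees",              // collection to search
    startWith: "$reportsTo",        // expression: value of this document's reportsTo field
    connectFromField: "reportsTo",  // field to recurse from in each matched document
    connectToField: "name",         // field matched against the current value(s)
    as: "reportingHierarchy",       // output array field
    maxDepth: 3,                    // optional: limit the number of recursion levels
    depthField: "level",            // optional: records the depth at which each match was found
  },
};

// The stage is used like any other aggregation stage.
const reportingPipeline = [
  { $match: { name: "Eliot" } },
  reportingChainStage,
];
// In the shell this would be run as: db.employees.aggregate(reportingPipeline)
```

Documents matched directly by startWith get a depthField value of 0, so maxDepth: 0 behaves like a single, non-recursive lookup.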
23. #MDBE16
Things To Note
• startWith value is an expression
• Referencing a field directly requires the ‘$’ prefix
• Can do things like { $toLower: “$name” }
• Handles array-based fields
• connectToField and connectFromField take field names only
• restrictSearchWithMatch requires standard query expressions
• Does not support aggregation framework operators/expressions
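The points above can be seen side by side in one stage. This is a sketch only: the people collection and the friends, name, age and active fields are hypothetical.

```javascript
const adultFriendsNetworkStage = {
  $graphLookup: {
    from: "people",
    // startWith is an expression, so a field reference needs the '$' prefix.
    // If friends is an array, each element is used as a starting value.
    startWith: "$friends",
    // connectFromField and connectToField are plain field names (no '$').
    connectFromField: "friends",
    connectToField: "name",
    as: "network",
    // restrictSearchWithMatch uses ordinary query syntax, as in find(),
    // not aggregation expressions.
    restrictSearchWithMatch: { age: { $gte: 18 }, active: true },
  },
};
```

The filter is applied to documents as they are looked up, so a person who fails it is excluded from the result and the traversal does not continue through them.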
24. #MDBE16
Things To Note
• Cycles are automatically detected
• Can be used with views:
• Define a view
• Recurse across existing view (‘base’ or ‘from’)
• Supports Collations
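To show why cycles do not cause endless recursion, the following in-memory sketch imitates the traversal with a set of already-visited documents. It is a simplified model written for this illustration, not MongoDB's implementation; the function name, its parameters and the sample data are assumptions.

```javascript
// Simplified model of a recursive lookup over an in-memory array of documents.
function graphLookupSketch(docs, startValues, connectFromField, connectToField, maxDepth = Infinity) {
  const visited = new Map(); // _id -> matched document with its depth
  let frontier = [].concat(startValues);
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of docs) {
      if (visited.has(doc._id)) continue; // already reached: this is what breaks cycles
      const toValues = [].concat(doc[connectToField] ?? []);
      if (!toValues.some((v) => frontier.includes(v))) continue;
      visited.set(doc._id, { ...doc, depth });
      next.push(...[].concat(doc[connectFromField] ?? []));
    }
    frontier = next;
  }
  return [...visited.values()];
}

// Hypothetical data containing a cycle: A -> B -> C -> A.
const people = [
  { _id: 1, name: "A", friends: ["B"] },
  { _id: 2, name: "B", friends: ["C"] },
  { _id: 3, name: "C", friends: ["A"] },
];

// Start from A's friends and follow "friends" to matching "name" values.
const network = graphLookupSketch(people, people[0].friends, "friends", "name");
```

The walk reaches B at depth 0, C at depth 1 and A at depth 2, then stops because every document has already been visited. The real stage does not guarantee any particular order in its output array.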
40. #MDBE16
Scenario: Product Categories
[Diagram: product category tree containing the following nodes]
• Mugs
• Kitchen & Dining
• Commuter & Travel
• Glassware & Drinkware
• Outdoor Recreation
• Camping Mugs
• Running Thermos
• Red Run Thermos
• White Run Thermos
• Blue Run Thermos
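One way this scenario might be queried is to start from a product and recurse into a separate categories collection to collect every category above it. The sketch assumes a products collection whose documents carry a category field, and a categories collection whose documents carry name and parent fields; none of these names, nor the parent links themselves, are given in the slides.

```javascript
// Join from products into categories, then keep climbing via "parent".
const categoryAncestorsPipeline = [
  { $match: { name: "Red Run Thermos" } },
  {
    $graphLookup: {
      from: "categories",          // a different collection from the one being aggregated
      startWith: "$category",      // the product's own category name
      connectFromField: "parent",  // climb from each category to its parent
      connectToField: "name",
      as: "categoryPath",
      depthField: "distance",      // 0 = the product's direct category
    },
  },
];
// In the shell this would be run as: db.products.aggregate(categoryAncestorsPipeline)
```

Sorting categoryPath by distance afterwards would give a breadcrumb from the nearest category up to the top of the tree.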
50. #MDBE16
MongoDB $graphLookup
• Efficient, index-based recursive queries
• Familiar, intuitive query language
• Use a single System Of Record
• Cater for all query types
• No added operational overhead
• No synchronisation requirements
• Reduced technology surface area