2. About Me!
Max De Marzi
Neo4j Field Engineer
github.com/maxdemarzi: about 200 public repositories
maxdemarzi.com: about 175 blog posts
@maxdemarzi
3. Agenda
• Relational Databases
• Graph Databases
• The most important slide about Neo4j you will ever see
• A few slides about Modeling
• The Graph Platform
• Neo4j Cloud (aka Aura)
• Talking to Neo4j
• Neo4j Use Cases
11. First we search the Index B-tree for the id to get the RowId.
Then we search the Table B-tree with that RowId to get to the data.
Inside each page, we do a binary search to decide which page to go to next.
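The two-step lookup can be sketched in Python. This is only an illustration: sorted lists stand in for B-tree pages, the names index_page, table_page, search_page and find_row are hypothetical, and a real B-tree repeats the binary search once per level of the tree.

```python
from bisect import bisect_left

# Hypothetical stand-ins for B-tree pages: sorted lists of (key, value) pairs.
index_page = [(101, 3), (205, 1), (310, 2)]        # id -> RowId
table_page = [(1, "Ann"), (2, "Dan"), (3, "Max")]  # RowId -> row data


def search_page(page, key):
    """Binary search one sorted page for key; return its value or None."""
    keys = [k for k, _ in page]
    pos = bisect_left(keys, key)
    if pos < len(page) and page[pos][0] == key:
        return page[pos][1]
    return None


def find_row(record_id):
    """Step 1: index lookup for the RowId. Step 2: table lookup for the data."""
    row_id = search_page(index_page, record_id)
    if row_id is None:
        return None
    return search_page(table_page, row_id)
```

Calling find_row(101) first resolves the id to RowId 3 in the index and then fetches the row stored under RowId 3.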
15. The Problem
• Joins are executed every time you query the relationship
• Executing a join means searching for a key
• B-Tree Index: O(log(n))
• As your data grows, your search time goes up
• More Data = More Searches
• Slower Performance
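To make the O(log(n)) point concrete, here is a small Python sketch that counts the comparisons one key search needs as the data grows. The function names are hypothetical, and a plain sorted list stands in for the index.

```python
def binary_search_steps(sorted_keys, key):
    """Return (found, number of comparisons) for a binary search."""
    lo, hi, steps = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == key:
            return True, steps
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps


def worst_case_steps(n):
    """Comparisons needed to look up a missing key larger than all n keys."""
    _, steps = binary_search_steps(list(range(n)), n)
    return steps
```

With 1,000 keys the search takes 10 comparisons; with 1,000,000 keys it takes 20. Every join pays that cost again for each key it looks up.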
16. Relational Databases can’t handle Relationships
• Degraded Performance: Speed plummets as data grows and as the number of joins grows
• Wrong Language: SQL was built with Set Theory in mind, not Graph Theory
• Not Flexible: New types of data and relationships require schema redesign
• Wrong Model: They cannot model or store relationships without complexity
18. NoSQL Databases can’t handle Relationships
• Degraded Performance: Speed plummets as you try to join data together in the application
• Wrong Languages: Lots of wacky “almost SQL” languages that are terrible at “joins”
• Not ACID: Eventually Consistent means Eventually Corrupt
• Wrong Model: They cannot model or store relationships without complexity
20. Property Graph Model Components
Nodes
• Can have Labels
• Can have Properties
Relationships
• Relate nodes by type and direction
• Can have Properties
[Diagram: two Person nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a Car node (brand: "Volvo", model: "V70"), connected by relationships such as LOVES and LIVES_WITH; one relationship carries the property since: Jan 10, 2011]
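A minimal Python sketch of the property graph model, using the example data from this slide. The class and function names are hypothetical. The direction of each relationship and the placement of the "since" property are assumptions, because the slide text does not say which relationship carries it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    # Assumption: "since" is attached to LIVES_WITH purely for illustration.
    Relationship(dan, "LIVES_WITH", ann, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Follow relationships of one type in their stored direction."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Each relationship holds direct references to its start and end nodes, so following one needs no key lookup.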
32. Real-Time Query Performance
[Chart: Response Time plotted against Connectedness and Size of Data Set, which ranges from "0 to 2 hops, 0 to 3 degrees, few connections" to "5+ hops, 3+ degrees, thousands of connections". Series: Relational and Other NoSQL Databases; Neo4j. Annotation: 1000x Advantage, “Minutes to milliseconds”]
33. I don’t know the average height of all Hollywood actors, but I do know the Six Degrees of Kevin Bacon
But not for every query
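The "degrees" question is a path query, which can be sketched in Python as a breadth-first search. The co-star data and the function name are invented for illustration. An average over every actor would instead have to scan every record, so it gains nothing from following relationships.

```python
from collections import deque

# Hypothetical toy data: actor -> set of co-stars.
costars = {
    "Kevin Bacon": {"Actor A"},
    "Actor A": {"Kevin Bacon", "Actor B"},
    "Actor B": {"Actor A", "Actor C"},
    "Actor C": {"Actor B"},
    "Actor D": set(),
}


def degrees_between(start, goal):
    """Breadth-first search; return the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, hops = queue.popleft()
        if actor == goal:
            return hops
        for other in costars.get(actor, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return None
```

The search only touches actors reachable from the starting point, so its cost depends on the neighborhood explored rather than on the total size of the data set.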
34. Reimagine your Data as a Graph
• Better Performance: Query relationships in real time
• Right Language: Cypher was purpose built for Graphs
• Flexible and Consistent: Evolve your schema seamlessly while keeping transactions
• Right Model: Graphs simplify how you think
Agile, High Performance and Scalable without Sacrifice
43. Graph Databases: Designed for Connected Data
• TRADITIONAL DATABASES: Store and retrieve data. Real-time storage & retrieval
• BIG DATA TECHNOLOGY: Aggregate and filter data. Long-running queries, aggregation & filtering
• GRAPH DATABASES: Connections in data. Real-Time Connected Insights
44. Visually Explore your Neo4j Graph with Bloom
Perspective, Search, Visualization, Exploration, Inspection, Editing
• Business view of the graph enables analysts to discover new insights
• Codeless “Search first” experience makes it easy for non-developers to pick up graphs
• Easy-to-use graph interactions to explore, inspect or edit connected data
• GPU accelerated high performance visualizations enable macro graph views
• Deploys easily with Neo4j Desktop or as a Neo4j Server plug-in component
• Quickly prototype projects and enable collaboration between developers and business users
45. Neo4j Bloom User Interface
• Prompted Search
• Property Browser & editor
• Category icons and color scheme
• Pan, Zoom & Select
46. Neo4j BI Connector
The most popular BI tools can now talk live to the world’s most popular graph db
• Best live, seamless integration of graph data with your favorite BI tools
• Familiar UI for end users
• No development effort for IT
• Democratizes access to Neo4j data
• Free to adopt by BI teams of Enterprise Edition customers
[Diagram: Business/Data Analyst, Investigator, Data Scientist → Tableau → SQL over JDBC → Neo4j BI Connector → Cypher → Neo4j]
48. Graph Algorithms & Functions in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Depth First Search
• Breadth First Search
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality & Approximate
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• Balanced Triad (identification)
• K-1 Coloring
• Modularity Optimization
Similarity
• Euclidean Distance
• Cosine Similarity
• Node Similarity (Jaccard)
• Overlap Similarity
• Pearson Similarity
• Approximate KNN
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
...and also Auxiliary Functions:
• Random graph generation
• Graph export
• One hot encoding
• Distributions & metrics
49. Neo4j Integrates with Common Architectures
• From Disparate Silos To Cross-Silo Connections
• From Tabular Data To Connected Data
• From Data Lake to Real-Time Operations
58. Neo4j Cloud offerings to suit every need
Database-as-a-service: Cloud-native service
• Zero administration
• Pay-as-you-go
• Self-service deployment
• Cloud-native stack
• No access to underlying infra and systems
Self-hosted: Self hosted and managed
• Any cloud (AWS, GCP, Azure)
• Bring-your-own-license
• Self-manage software, infra in own private cloud
• Own data, tenant, security
• >50% deploy this way
Cloud Managed Services (CMS): White-glove fully managed service by Neo4j experts
• Fully customizable deployment model and service levels
• Operate in own data centers or Virtual Private Cloud
59. Fully managed cloud-native Neo4j graph database service, for the cloud-first developer
• Fully automated with zero administration
• Faster innovation with the power of graphs
• Scalable on-demand dynamically
• Worry-free security and reliability
• Simple pay-as-you-go pricing
65. Not so Easy to Learn (by Java Devs)
• Start with the Simple Defaults: order, relationships, depth, uniqueness, etc.
• Custom Expanders: Where should I go next?
• Custom Evaluators: I’ve gone there… should I accept this path?
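The split between expanders and evaluators can be illustrated with a small Python sketch. This is not Neo4j's Java API: traverse, all_neighbors, ends_at_d and the toy graph are hypothetical names, and the defaults chosen here (breadth-first order, a depth limit, visiting each node once) are assumptions that only mirror the kinds of settings listed on the slide.

```python
from collections import deque


def traverse(graph, start, expander, evaluator, max_depth=3):
    """Breadth-first traversal with a pluggable expander and evaluator."""
    accepted = []
    visited = {start}  # uniqueness: visit each node at most once
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if evaluator(path):  # "I've gone there... should I accept this path?"
            accepted.append(path)
        if len(path) - 1 >= max_depth:
            continue
        for nxt in expander(graph, path):  # "Where should I go next?"
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return accepted


# Hypothetical toy graph: node -> list of neighbors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def all_neighbors(graph, path):
    """Default-style expander: follow every relationship from the last node."""
    return graph[path[-1]]


def ends_at_d(path):
    """Custom evaluator: accept only paths that end at node d."""
    return path[-1] == "d"
```

Swapping in a different expander changes where the traversal goes, while a different evaluator changes which paths are returned. The traversal loop itself stays the same.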
71. Combine any APIs
Cypher Stored Procedures
https://maxdemarzi.com/2017/01/26/writing-a-cypher-stored-procedure/
72. Boring Java Code for Non Java Devs
https://maxdemarzi.com/2019/01/28/neo4j-stored-procedures-for-devs-that-dont-know-java-yet/
It’s only 372 Slides.
74. Highly Valuable Connected Data Use Cases Drive Enterprise Adoption
• Network & IT Operations
• Fraud Detection
• Identity & Access Management
• Knowledge Graph
• Master Data Management
• Real-Time Recommendations
75. Handling Large Graph Work Loads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second
76. Improving Analytics, ML & AI for Enterprises
Caterpillar’s AI Supply Chain & Maintenance
• 27 Million warranty & service documents parsed for text to knowledge graph
• Graph is context for AI to learn “prime examples” and anticipate maintenance
• Improves satisfaction and equipment lifespan
German Center for Diabetes Research (DZD)
• Connecting 50 research databases, 100k’s of Excel workbooks, 30 bio-sample databases
• Bytes 4 Diabetes Award for use of a knowledge graph, graph analytics, and AI
• Customized views for flexible research angles
Financial Fraud Detection & Recovery (Top 10 Bank)
• Almost 70% of CC fraud was missed
• ~1B Nodes and Relationships to analyze
• Graph analytics with queries & algorithms help find $ millions of fraud in 1st year
81. Cypher Query: Movie Recommendation
MATCH (watched:Movie {title: "Toy Story"}) <-[r1:RATED]- (p2) -[r2:RATED]-> (unseen:Movie), (p:Person)
WHERE r1.rating > 7 AND r2.rating > 7 AND p2.gender = "female" AND p2.age < 35
AND watched.genres = unseen.genres
AND NOT( (p) -[:RATED|WATCHED]-> (unseen) )
AND p.username IN ["maxdemarzi", "janedoe", "jamesdean"]
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by women under 35 who liked Toy Story
83. Cypher Query: Ratings of Two Users
MATCH (p1:Person {name:'Michael Sherman'}) -[r1:RATED]-> (m:Movie),
(p2:Person {name:'Michael Hunger'}) -[r2:RATED]-> (m:Movie)
RETURN m.name AS Movie,
r1.rating AS `M. Sherman's Rating`,
r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated
85. Cypher Query: Cosine Similarity
MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
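For readers who want to check the arithmetic outside the database, here is a Python sketch of the same calculation for one pair of people. The function name and the dictionary layout (movie to rating) are assumptions. Like the query, it only considers movies that both people have rated.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both people rated; None if undefined."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    len_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))
    len_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if len_a == 0 or len_b == 0:
        return None  # avoid dividing by zero when all shared ratings are 0
    return dot / (len_a * len_b)
```

The dot product corresponds to xyDotProduct in the query, and the two square roots correspond to xLength and yLength.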
86. Available in the Graph Data Science Library
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Overlap Similarity
88. Cypher Query: k-NN Recommendation
MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
WHERE NOT( (p) -[:RATED|WATCHED]-> (m) )
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / SIZE(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by my top 3 neighbors
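The same k-NN idea, sketched in Python over plain dictionaries. The function name and the data layout are assumptions: ratings maps person to {movie: rating}, and similarity maps person to {neighbor: score}. For simplicity the sketch treats "seen" as "rated" and ignores a separate WATCHED relationship.

```python
def knn_recommendations(target, ratings, similarity, k=3, limit=25):
    """Score unseen movies by the mean rating of the k most similar raters."""
    seen = set(ratings.get(target, {}))
    candidates = {}  # movie -> list of (similarity to target, rating)
    for person, sim in similarity.get(target, {}).items():
        for movie, rating in ratings.get(person, {}).items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in candidates.items():
        # Keep the k most similar neighbors who rated this movie.
        top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
        scored.append((movie, sum(r for _, r in top) / len(top)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

As in the query, the top neighbors are chosen per movie, among the people who actually rated that movie.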
93. Just one tiny itsy bitsy problem:
Job Boards get paid by:
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your aggregated Data
94. Two Way Matches
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
Recommend Love
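A two-way match can be sketched in Python as follows. The field names (location, has, wants), the function name, and the scoring rule (count of matching traits in both directions) are assumptions made for illustration, and the compatibility filter from the slide is left out for brevity.

```python
def two_way_matches(me, people, limit=10):
    """Rank people in the same location by mutual trait overlap."""
    results = []
    for other in people:
        if other is me or other["location"] != me["location"]:
            continue
        they_have_i_want = len(me["wants"] & other["has"])
        i_have_they_want = len(other["wants"] & me["has"])
        # Both directions must match, otherwise it is only a one-way match.
        if they_have_i_want and i_have_they_want:
            results.append((other["name"], they_have_i_want + i_have_they_want))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:limit]
```

A candidate who has everything I want but wants nothing I have is dropped, which is what separates a two-way match from a one-way recommendation.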
97. Just one tiny itsy bitsy problem:
Dating Boards get paid by:
• Finding lots of “Possible Connections”
• Monthly Subscription Fees
• Keeping you single