This talk examines graph databases and Neo4j with a use-case-driven approach. First, we look at some property graph model examples taken from real-world datasets. Next, we discuss converting a relational model to a graph, using the canonical Northwind example. Finally, we dive into Fraud Detection and Personalized Recommendation examples, learning about Neo4j developer tooling as we explore these use cases.
7. Employees
Name    Country  Dept                University
John    UK       Prime Brokerage     Princeton
Mary    USA      Sales and Trade     Yale
Li      China    Investment Banking  Princeton
Kate    UK       Sales and Trade     Princeton
Michal  CA       Investment Banking  Brown
9. Countries
ID   Country  Leader
17   UK       Cameron
12   USA      Obama
19   China    Xi Jinping
17   UK       Cameron
112  CA       Trudeau
10. Employees
Name    Country  Dept                University
John    17       Prime Brokerage     Princeton
Mary    12       Sales and Trade     Yale
Li      19       Investment Banking  Princeton
Kate    17       Sales and Trade     Princeton
Michal  112      Investment Banking  Brown
11. University
ID  Name       President  State
92  Princeton  Eisgruber  NJ
34  Yale       Salovey    CT
1   Brown      Paxson     RI
12. Employees
Name    Country  Dept                University
John    17       Prime Brokerage     92
Mary    12       Sales and Trade     34
Li      19       Investment Banking  92
Kate    17       Sales and Trade     92
Michal  112      Investment Banking  1
13. Employees
Name    Country  Dept                University
John    17       Prime Brokerage     92
Mary    12       Sales and Trade     34
Li      19       Investment Banking  92
Kate    17       Sales and Trade     92
Michal  112      Investment Banking  1

Countries
ID   Country
17   UK
12   USA
19   China
17   UK
112  CA

University
ID  Name       President  State
92  Princeton  Eisgruber  NJ
34  Yale       Salovey    CT
1   Brown      Paxson     RI
15. Employees
Name    Country  Dept                University
John    17       Prime Brokerage     92
Mary    12       Sales and Trade     34
Li      19       Investment Banking  92
Kate    17       Sales and Trade     92
Michal  112      Investment Banking  1

Countries
ID   Country
17   UK
12   USA
19   China
17   UK
112  CA

University
ID  Name       President  State
92  Princeton  Eisgruber  NJ
34  Yale       Salovey    CT
1   Brown      Paxson     RI
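As a rough sketch of how the three tables above could look as a property graph, the Cypher below stores each row as a node and each foreign key as a relationship. The labels `Employee`, `Country`, and `University` and the relationship types `IN_COUNTRY` and `ATTENDED` are assumptions made for illustration; the slides do not name them.

```cypher
// Lookup rows become nodes; the numeric IDs are no longer needed as join keys.
CREATE (uk:Country {name: 'UK'}),
       (usa:Country {name: 'USA'}),
       (china:Country {name: 'China'}),
       (ca:Country {name: 'CA'}),
       (princeton:University {name: 'Princeton', state: 'NJ'}),
       (yale:University {name: 'Yale', state: 'CT'}),
       (brown:University {name: 'Brown', state: 'RI'})
// Each employee row becomes a node.
CREATE (john:Employee {name: 'John', dept: 'Prime Brokerage'}),
       (mary:Employee {name: 'Mary', dept: 'Sales and Trade'}),
       (li:Employee {name: 'Li', dept: 'Investment Banking'}),
       (kate:Employee {name: 'Kate', dept: 'Sales and Trade'}),
       (michal:Employee {name: 'Michal', dept: 'Investment Banking'})
// Each foreign key becomes a relationship (hypothetical relationship types).
CREATE (john)-[:IN_COUNTRY]->(uk),      (john)-[:ATTENDED]->(princeton),
       (mary)-[:IN_COUNTRY]->(usa),     (mary)-[:ATTENDED]->(yale),
       (li)-[:IN_COUNTRY]->(china),     (li)-[:ATTENDED]->(princeton),
       (kate)-[:IN_COUNTRY]->(uk),      (kate)-[:ATTENDED]->(princeton),
       (michal)-[:IN_COUNTRY]->(ca),    (michal)-[:ATTENDED]->(brown);

// Which employees attended Princeton? One relationship hop instead of a join.
MATCH (e:Employee)-[:ATTENDED]->(:University {name: 'Princeton'})
RETURN e.name AS employee;
```

With the sample rows above, the second statement should return John, Li, and Kate, in no guaranteed order.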
16. (SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.pid AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT manager.pid AS directReportees, count(manager.directly_manages)
AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS
count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages)
AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT reportee.pid AS directReportees, count(reportee.directly_manages)
AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT L1Reportees.pid AS directReportees,
count(L2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM(
SELECT reportee.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT L2Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS
count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
)
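For comparison, here is a minimal Cypher sketch of roughly the same question: for each person within three management levels of a given manager, how many people report to them, directly or indirectly. It assumes a hypothetical model of `(:Person {name})-[:MANAGES]->(:Person)`, which the slides do not spell out, and it is not a line-for-line translation of the SQL above (for example, the depth limit is applied per subordinate rather than from the top manager).

```cypher
// Assumed model: (:Person {name})-[:MANAGES]->(:Person); names are illustrative.
// *0..3 includes the manager themselves plus up to three levels below.
MATCH (boss:Person {name: 'fName lName'})-[:MANAGES*0..3]->(sub:Person)
// OPTIONAL MATCH keeps people who manage nobody, with a count of 0.
OPTIONAL MATCH (sub)-[:MANAGES*1..3]->(report:Person)
RETURN sub.name AS subordinate, count(DISTINCT report) AS total;
```

The variable-length pattern `[:MANAGES*0..3]` stands in for the repeated self-joins on `person_reportee`.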
23. SQL Trouble
• Complex to model and store relationships
• Performance degrades with increases in data
• Queries get long and complex
• Maintenance is painful
24. Graph Motivations
• Easy to model and store relationships
• Performance of relationship traversal remains constant with growth in data size
• Queries are shortened and more readable
• Adding additional properties and relationships can be done on the fly - no migrations
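To illustrate the last point, a new property and a new relationship type can be written directly, with no schema change beforehand. This is only a sketch: the `Employee` label, the `title` property, its value, and the `MENTORS` relationship type are all made up for the example.

```cypher
// Hypothetical names and values, for illustration only.
MATCH (kate:Employee {name: 'Kate'}), (john:Employee {name: 'John'})
// Add a property that no other Employee node has yet.
SET kate.title = 'Analyst'
// Add a relationship of a type that did not exist before; MERGE avoids duplicates on re-run.
MERGE (john)-[:MENTORS]->(kate);
```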
33. How Fast is Fast?
DATABASE  # OF PERSONS  QUERY TIME
RDBMS     1,000         2,000 ms
Neo4j     1,000         2 ms
Neo4j     10,000,000    2 ms
• Sample Social Graph with roughly 1,000 persons
• On average each person has 50 friends
• pathExists(a,b) limited to depth 4
• Caches warmed up to eliminate disk I/O
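The slides do not show how `pathExists(a,b)` was implemented. One way such a check might be written in Cypher is sketched below, assuming a hypothetical `(:Person {name})-[:FRIEND]-(:Person)` model and two query parameters, `$from` and `$to`, naming two different people.

```cypher
// Assumed model and parameter names; not taken from the benchmark itself.
MATCH (a:Person {name: $from}), (b:Person {name: $to})
// Look for any connection of at most 4 FRIEND hops, in either direction.
OPTIONAL MATCH p = shortestPath((a)-[:FRIEND*..4]-(b))
// p is null when no such path exists.
RETURN p IS NOT NULL AS pathExists;
```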
35. David Meza of NASA said: "Neo helped NASA save millions of
dollars and up to two years by locating existing research they
could use in his work on the Orion, the spacecraft
NASA hopes eventually will take humans to Mars."
37. “We needed to understand consumer behavior across
devices in order to capture a complete picture. Conceptually
we could have done this in a relational database, but the
multiple JOINS would have made it much too complicated.”
- Qualia CTO, Niels Meersschaert
38. We're smashing a billion
queries a day that'd be
impossible in relational…
39. "I found graph databases, which perform well with
queries on connected data. With more than 10 years of
experience of using relational database, I know that
complicated joins are the performance killer. But graph
databases kick ass of other databases."
- LinkedIn China Development Lead, Dong Bin
69. Who do people report to?
MATCH
(e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
e.employeeID AS managerID,
e.firstName AS managerName,
sub.employeeID AS employeeID,
sub.firstName AS employeeName;
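Building on the query above, a variable-length pattern can follow `REPORTS_TO` across several levels. The sketch below counts everyone who reports to each manager, directly or indirectly. It reuses the `Employee` label, the `REPORTS_TO` relationship, and the `employeeID` and `firstName` properties from the slide; the `totalReports` alias is introduced here.

```cypher
// [:REPORTS_TO*] follows one or more hops up the reporting chain.
MATCH (sub:Employee)-[:REPORTS_TO*]->(mgr:Employee)
RETURN
  mgr.employeeID AS managerID,
  mgr.firstName AS managerName,
  count(DISTINCT sub) AS totalReports
ORDER BY totalReports DESC;
```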