This is the presentation given by Michael Hunger and Peter Neubauer at the SF Data Mining group, see http://www.meetup.com/Data-Mining/events/80275492/
7. Um, Neo for what?
•Neo4j - the graph database
•Graph
•no: not for charts & diagrams, or vector artwork
•yes: for storing data that is connected
•remember linked lists, trees?
•graphs are the general-purpose data structure
16. Neo4j is a Graph Database
๏ A Graph Database:
• a Property Graph with Nodes, Relationships and Properties on both
• perfect for complex, highly connected data
๏ A Graph Database:
• reliable with real ACID Transactions
• scalable: 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties
• Server with REST API, or Embeddable on the JVM
• high-performance with High-Availability (read scaling)
38. We're talking about a Property Graph
[Figure: a graph of people (Emil, Johan, Allison, Tobias, Lars, Andreas, Andrés, Peter, Ian, Delia, Michael and others) connected by "knows" relationships]
• Nodes
• Relationships
• Properties (each a key+value)
• + Indexes (for easy look-ups)
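The four parts named on this slide (nodes, relationships, properties on both, and an index for look-ups) can be sketched in a few lines of Python. This is only a toy in-memory model to make the terms concrete, not how Neo4j stores data; the class name `PropertyGraph`, its methods, and the `since` property are invented for the illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, not Neo4j's storage model)."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.rels = []                  # (start id, type, end id, properties dict)
        self.index = defaultdict(set)   # (key, value) -> node ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        # Index every property so nodes can be found by key+value.
        for key, value in props.items():
            self.index[(key, value)].add(node_id)

    def add_rel(self, start, rel_type, end, **props):
        self.rels.append((start, rel_type, end, props))

    def lookup(self, key, value):
        """Index look-up: ids of nodes whose property `key` equals `value`."""
        return sorted(self.index[(key, value)])

    def neighbours(self, node_id, rel_type):
        """Ids reachable over one outgoing relationship of the given type."""
        return [end for start, t, end, _ in self.rels
                if start == node_id and t == rel_type]


g = PropertyGraph()
g.add_node(1, name="Emil")
g.add_node(2, name="Johan")
g.add_node(3, name="Peter")
g.add_rel(1, "knows", 2, since=2001)   # 'since' is an invented property
g.add_rel(2, "knows", 3)

start = g.lookup("name", "Emil")[0]    # the index gives the starting point
print([g.nodes[n]["name"] for n in g.neighbours(start, "knows")])
```

The index is used once to find where to begin; everything after that follows relationships directly.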
65. SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
66. START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
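Slides 65 and 66 ask the same question: which skills does user 1 have, and what is stored on the link? A rough way to compare the two shapes is to run the join in an in-memory SQLite database and then read the same answer from an adjacency structure. The table contents, the `level` column, and the `outgoing` dictionary below are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'Max'), (2, 'Anna');
    INSERT INTO skills VALUES (10, 'Java'), (11, 'Cypher');
    INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 5), (2, 10, 4);
""")

# Relational version: two joins through the link table.
rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph-shaped version: each link-table row becomes a relationship
# (with its own properties) kept next to its start node.
outgoing = {
    1: [({"level": 3}, "Java"), ({"level": 5}, "Cypher")],
    2: [({"level": 4}, "Java")],
}
pairs = sorted((skill, rel["level"]) for rel, skill in outgoing[1])

print(rows)
print(pairs)
```

Both prints show the same two skills; the difference is that the second form starts at the user and follows its relationships instead of joining whole tables.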
67. Indexes
Used as multiple starting points, not to speed up any traversals
START a = node:nodes_index(type='User')
MATCH a-[r:knows]-b
RETURN ID(a), ID(b), r.weight
68. Variable length Path Match
Some UGLY recursive self join on the groups table
START max=node:person(name="Max")
MATCH group <-[:BELONGS_TO*]- max
RETURN group
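What the `*` in the pattern asks for is every group reachable over one or more BELONGS_TO hops. A minimal Python sketch of that idea follows, using a breadth-first walk; the membership data and the function name `groups_of` are made up for the example. Note one simplification: the sketch reports each group once, whereas a path-matching query returns one result per matching path.

```python
from collections import deque

# Hypothetical membership data: member -> groups it BELONGS_TO.
BELONGS_TO = {
    "Max": ["Developers"],
    "Developers": ["Engineering"],
    "Engineering": ["Company"],
    "Sales": ["Company"],
}


def groups_of(start):
    """All groups reachable from `start` over one or more BELONGS_TO hops."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for parent in BELONGS_TO.get(current, []):
            if parent not in seen:      # guard against cycles and repeats
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


print(groups_of("Max"))
```

In SQL the same question needs a self join per level or a recursive query, which is the "UGLY" part the slide refers to.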
74. Cute meta + data
This Material is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/
76. Neo4j Co-Existence
• Node uuids as refs in external ElasticSearch, also in internal Lucene
• Custom search ranking for user history based on node relationship data
• MySQL for user data, Redis for metrics
78. [B] ACL from Hell
One of the top 10 telcos worldwide
79. Example Access Authorization
Access may be given directly or by inheritance
[Figure: a tree of Users (U), Customers (C), Accounts (A) and Subscriptions (S); the access links from users to customers are marked Inherit = true or Inherit = false]
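The rule on this slide, access given directly or inherited from something higher in the hierarchy, can be sketched as a walk up the tree. The hierarchy, the grants, and the function name `can_access` below are all hypothetical; the slide does not show the telco's actual model or queries.

```python
# Hypothetical hierarchy: child -> parent (customers, accounts, subscriptions).
PARENT = {
    "C2": "C1", "C3": "C1",
    "A1": "C2", "A2": "C3",
    "S1": "A1", "S2": "A2",
}

# Hypothetical grants: user -> {resource: inherit flag}.
GRANTS = {
    "U1": {"C1": True},
    "U2": {"C2": False, "A2": True},
}


def can_access(user, resource):
    """True if access is granted directly, or on an ancestor with inherit=True."""
    grants = GRANTS.get(user, {})
    if resource in grants:               # access given directly
        return True
    node = PARENT.get(resource)
    while node is not None:              # walk up towards the root
        if grants.get(node) is True:     # grant on an ancestor that inherits
            return True
        node = PARENT.get(node)
    return False


print(can_access("U1", "S2"))   # inherited from C1
print(can_access("U2", "S1"))   # C2 grant does not inherit
print(can_access("U2", "S2"))   # inherited from A2
```

In a graph database the same check is a path match from the user to the resource, which avoids one join per hierarchy level.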
82. [C] MDM within Cisco
master data management, sales compensation management, online customer support
Description
• Real-time conflict detection in sales compensation management.
• Business-critical “P1” system.
• Neo4j allows Cisco to model complex algorithms, while still maintaining high performance over a large dataset.
Background
• Neo4j replaces Oracle RAC, which was not performant enough for the use case.
Benefits
• Performance: “Minutes to Milliseconds”. Outperforms Oracle RAC, serving complex queries in real time
• Flexibility: Allows Cisco to model interconnected data and complex queries with ease
• Robustness: With 9+ years of production experience, Neo4j brings a solid product.
Architecture
• 3-node Enterprise cluster with mirrored disaster recovery cluster
• Dedicated hardware in own datacenter
• Embedded in custom webapp
Sizing
• 35 million nodes
• 50 million relationships
• 600 million properties
98. Cypher - read clauses
๏ Find a place to begin:
•START <lookup>
๏ Describe what to find:
•MATCH <paths>
๏ Filter the elements:
•WHERE <filters>
๏ RETURN <values>
101. Cypher: START + RETURN
๏ START <lookup> RETURN <expressions>
๏ START binds terms using simple look-up
•directly using known ids
•or based on indexed Property
๏ RETURN expressions specify result set
// lookup node id 1, return that node
start n=node(1) return n
// lookup all nodes
start n=node(*) return n
104. Cypher: MATCH
๏ START <lookup> MATCH <pattern> RETURN <expr>
๏ MATCH describes a pattern of nodes+relationships
•node terms in optional parentheses
•lines with arrows for relationships
// lookup 'n', traverse any relationship to some 'm'
start n=node(0) match (n)--(m) return n,m;
// any outgoing relationship from 'n' to 'm'
start n=node(1) match n-->m return n,m;
// only 'next' relationships from 'n' to 'm' up to 3 away
start n=node(1) match p=n-[:next*..3]->m return p;
// from 'n' to 'm' and capture the relationship as 'r'
start n=node(1) match n-[r]->m return n,r,m
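The bounded pattern `n-[:next*..3]->m` returns every path of one, two or three 'next' hops from the start node. A small Python sketch of that enumeration follows; the `NEXT` adjacency data and the function name `paths_up_to` are invented, and the sketch does not enforce the rule that a path may not reuse a relationship, so it only matches the query's behaviour on data without cycles.

```python
# Hypothetical 'next' relationships: node id -> ids it points to.
NEXT = {1: [2], 2: [3], 3: [4], 4: [5]}


def paths_up_to(start, max_hops):
    """Every path of 1..max_hops outgoing 'next' hops that begins at `start`."""
    results = []

    def walk(path):
        if len(path) - 1 == max_hops:    # hop limit reached
            return
        for nxt in NEXT.get(path[-1], []):
            new_path = path + [nxt]
            results.append(new_path)     # each prefix is a result of its own
            walk(new_path)

    walk([start])
    return results


for p in paths_up_to(1, 3):
    print(p)
```

Node 5 never appears because it is four hops from node 1, beyond the limit of 3.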
107. Cypher: WHERE
๏ START <lookup> [MATCH <pattern>]
WHERE <condition> RETURN <expr>
๏ WHERE filters nodes or relationships
•uses expressions to constrain elements
// lookup all nodes as 'n', constrained to name 'Andreas'
start n=node(*) where n.name='Andreas' return n
// filter nodes where age is less than 30
start n=node(*) where n.age<30 return n
// filter using a regular expression
start n=node(*) where n.name =~ 'Tob.*' return n
// filter for nodes where a property exists
start n=node(*) where has(n.name) return n
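The four filters above behave like predicates applied to every candidate node. A Python sketch over a list of dictionaries, with invented sample nodes, shows the idea. Two assumptions are built in: nodes that lack the property are skipped by the comparison filters (how the database itself treats a missing property depends on the version), and the regular expression is applied to the whole string.

```python
import re

# Invented sample nodes; properties are optional, as in a property graph.
nodes = [
    {"id": 0, "name": "Andreas", "age": 34},
    {"id": 1, "name": "Tobias", "age": 25},
    {"id": 2, "name": "Toby"},
    {"id": 3, "age": 29},
]

# where n.name = 'Andreas'
by_name = [n["id"] for n in nodes if n.get("name") == "Andreas"]

# where n.age < 30 (assumption: nodes without 'age' are skipped)
under_30 = [n["id"] for n in nodes if "age" in n and n["age"] < 30]

# where n.name =~ 'Tob.*' (assumption: the pattern must match the whole value)
by_regex = [n["id"] for n in nodes
            if "name" in n and re.fullmatch(r"Tob.*", n["name"])]

# where has(n.name)
has_name = [n["id"] for n in nodes if "name" in n]

print(by_name, under_30, by_regex, has_name)
```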
110. Cypher: CREATE
๏ CREATE <node>[,node or relationship] RETURN <expr>
•create nodes with optional properties
•create relationship (must have a type)
// create an anonymous node
create n
// create node with a property, returning it
create n={name:'Andreas'} return n
// lookup 2 nodes, then create a relationship and return it
start n=node(0),m=node(1) create n-[r:KNOWS]->m return r
// lookup nodes, then create a relationship with properties
start n=node(1),m=node(2) create n-[r:KNOWS {since:2008}]->m
113. Cypher: SET
๏ SET [<node property>] [<relationship property>]
•update a property on a node or relationship
•must follow a START
// update the name property
start n=node(0) set n.name='Peter'
// update many nodes, using a calculation
start n=node(*) set n.size=n.size+1
// match & capture a relationship, update a property
start n=node(1) match n-[r]-m set r.times=10
116. Cypher: DELETE
๏ DELETE [<node>|<relationship>|<property>]
•delete a node, relationship or property
•must follow a START
•to delete a node, all relationships must be deleted first
// delete a node
start n=node(5) delete n
// remove a node and all relationships
start n=node(3) match n-[r]-() delete n, r
// remove a property
start n=node(3) delete n.age
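The rule that a node's relationships must go before the node itself can be modelled as a guard on deletion. The `Graph` class and its method names below are hypothetical and only mimic the constraint; they are not Neo4j's API.

```python
class Graph:
    """Toy graph that refuses to delete a node that still has relationships."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.rels = {}    # relationship id -> (start id, type, end id)

    def attached(self, node_id):
        """Ids of relationships that start or end at the node."""
        return [r for r, (start, _, end) in self.rels.items()
                if node_id in (start, end)]

    def delete_node(self, node_id):
        rels = self.attached(node_id)
        if rels:
            raise ValueError(f"node {node_id} still has relationships {rels}")
        del self.nodes[node_id]

    def delete_node_and_rels(self, node_id):
        for rel_id in self.attached(node_id):
            del self.rels[rel_id]
        del self.nodes[node_id]


g = Graph()
g.nodes = {3: {"age": 40}, 4: {}, 5: {}}
g.rels = {0: (3, "KNOWS", 4)}

g.delete_node(5)                 # no relationships, so this succeeds
try:
    g.delete_node(3)             # still attached to node 4
except ValueError as err:
    print(err)
g.delete_node_and_rels(3)        # relationships first, then the node
g.nodes[4].pop("age", None)      # removing a property leaves the node in place
print(sorted(g.nodes), g.rels)
```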
Editor's Notes
There existed a number of different ways to query a graph database. This one aims to make querying easy, and to produce queries that are readable.
We looked at alternatives - SPARQL, SQL, Gremlin and other...
Visualization done with GraphViz
A user will have many such “stacks”
Search ranking weighs inbound and outbound node connections as part of search score calculation