5. Graph-Based Technologies in the Big Data/NoSQL Domain
Storage & Traversal/Query
• Neo4j
• TitanDB
• OrientDB
Processing/Computation Engines
• Apache Giraph
• GraphLab
• Apache Spark (GraphX)
6. Graph Databases
• A database that stores data as a graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (one hop to a neighbour) remains the same
• Indexes are used for lookups
• Optimized for traversing connected data
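The point about local steps can be illustrated with a minimal Python sketch. It assumes a toy in-memory adjacency list; the node names and the helper functions `neighbours` and `two_hops` are hypothetical and are not part of Neo4j.

```python
# Toy adjacency-list graph: each node stores direct references to its neighbours.
# Node names are hypothetical and only serve the illustration.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def neighbours(node):
    """One local step: return the nodes adjacent to `node`.

    The cost depends on how many neighbours `node` has,
    not on the total number of nodes stored in `graph`.
    """
    return graph[node]


def two_hops(node):
    """Nodes reachable in exactly two local steps from `node`."""
    return sorted({n2 for n1 in neighbours(node) for n2 in neighbours(n1)})


print(neighbours("A"))  # ['B', 'C']
print(two_hops("A"))    # ['D']
```

Adding more unrelated nodes to `graph` would not change the work done by `neighbours("A")`, which is the property the slide describes.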
7. Neo4j
• Graph database from Neo Technology
• A schema-free labeled Property Graph Database +
Lucene Index
• Perfect for complex, highly connected data
• Reliable with real ACID Transactions
• Scalable: Billions of Nodes and Relationships, Scale
out with highly available Neo4j Cluster
• Server with REST API or Embeddable
• Declarative Query Language (Cypher)
8. Neo4j: Strengths & Weaknesses
Strengths
• Powerful data model
• Whiteboard friendly
• Fast for connected data
• Easy to query
Weaknesses
• Sharding (a graph is hard to partition across machines)
• Requires a conceptual shift (graph-like thinking)
9. Four Building Blocks
• Nodes
• Relationships
• Properties
• Labels
(:USER {Name: 'Apple', Age: 25})-[:RELATIVE {Relation: 'Owner'}]->(:PET {Name: 'Mike', Animal: 'Dog'})
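As a rough illustration of the four building blocks, the example above can be modelled with plain Python data classes. The class names `Node` and `Relationship` and the USER-to-PET direction are assumptions made for this sketch; they are not Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"USER"}
    properties: dict = field(default_factory=dict)   # key/value pairs on the node


@dataclass
class Relationship:
    start: Node
    rel_type: str                                    # e.g. "RELATIVE"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


# The example from the slide: a user who owns a pet.
# The direction (user -> pet) is an assumption of this sketch.
user = Node({"USER"}, {"Name": "Apple", "Age": 25})
pet = Node({"PET"}, {"Name": "Mike", "Animal": "Dog"})
owns = Relationship(user, "RELATIVE", pet, {"Relation": "Owner"})

print(owns.start.properties["Name"], owns.rel_type, owns.end.properties["Name"])
```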
10. SQL to Graph DB: Data Model Transformation
SQL                    Graph DB
Table                  Type of node
Rows of table          Nodes
Columns of table       Node properties
Foreign keys, joins    Relationships
11. SQL to Graph DB: Data Model Transformation
Table: Actor
Name          Movie Language
Rajnikant     Tamil
Maheshbabu    Telugu
Vijay         Tamil
Prabhas       Telugu
Table: Movie
Name            Lead Actor
Bahubali        Prabhas
Puli            Vijay
Shrimanthudu    Maheshbabu
Robot           Rajnikant
Graph:
ACTOR (Name: Prabhas, Movie Language: Telugu) --LEAD_ACTOR-- MOVIE (Name: Bahubali)
ACTOR (Name: Rajnikant, Movie Language: Tamil) --LEAD_ACTOR-- MOVIE (Name: Robot)
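The table-to-graph mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are typed in by hand from the two tables, nodes are kept in a plain dictionary keyed by (label, name), and the MOVIE-to-ACTOR direction of LEAD_ACTOR is a choice made for the sketch.

```python
# Rows of the two SQL tables, entered by hand for this sketch.
actor_rows = [
    {"Name": "Rajnikant", "Movie Language": "Tamil"},
    {"Name": "Maheshbabu", "Movie Language": "Telugu"},
    {"Name": "Vijay", "Movie Language": "Tamil"},
    {"Name": "Prabhas", "Movie Language": "Telugu"},
]
movie_rows = [
    {"Name": "Bahubali", "Lead Actor": "Prabhas"},
    {"Name": "Puli", "Lead Actor": "Vijay"},
    {"Name": "Shrimanthudu", "Lead Actor": "Maheshbabu"},
    {"Name": "Robot", "Lead Actor": "Rajnikant"},
]

# Table -> type of node, row -> node, column -> node property.
nodes = {}
for row in actor_rows:
    nodes[("ACTOR", row["Name"])] = {"label": "ACTOR", "properties": dict(row)}
for row in movie_rows:
    # The foreign-key column is not kept as a property; it becomes a relationship.
    nodes[("MOVIE", row["Name"])] = {"label": "MOVIE", "properties": {"Name": row["Name"]}}

# Foreign key -> relationship (direction MOVIE -> ACTOR is an assumption).
relationships = [
    (("MOVIE", row["Name"]), "LEAD_ACTOR", ("ACTOR", row["Lead Actor"]))
    for row in movie_rows
]

for start, rel_type, end in relationships:
    print(start[1], f"-[{rel_type}]->", end[1])
```

Once the foreign key is a relationship, finding a movie's lead actor is a single hop instead of a join.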
12. Interact with Neo4j
• Web Interface
– http://IP:7474/browser/
– http://IP:7474/webadmin/
• Neo4j Console
• REST API
• Java Native Libraries
13. How to Query a Graph Database?
• Graph Query Language
– Cypher
– Gremlin
15. Cypher Query Language
• Declarative
• SQL-inspired
• Pattern based
(Apple) -[LIKES]-> (Orange)
(Apple:FRUIT)-[connect:LIKES]->(Orange:FRUIT)
16. Cypher: Getting Started
Structure:
• Similar to SQL
• Most common clauses:
– MATCH: the graph pattern to match
– WHERE: adds constraints or filters
– RETURN: what to return
17. Cypher: Frequently Used Queries
• Get the whole database:
MATCH (n) RETURN n
• Delete the whole database:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
18. CRUD Operations
Copy the code from the link and paste it into the Neo4j web browser
MATCH:
• MATCH (n) RETURN n
• MATCH (movie:Movie) RETURN movie
• MATCH (movie:Movie { title: 'Bahubali' }) RETURN movie
• MATCH (director { name:'Rajamouli' })--(movie) RETURN movie.title
• MATCH (raj:Person { name:'Rajamouli'})--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-->(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})<--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-[:DIRECTED]->(movie:Movie)
RETURN movie
21. CRUD Operations
CREATE:
Node:
• CREATE (n)
• CREATE (n),(m)
• CREATE (n:Person)
• CREATE (n:Person:Swedish)
• CREATE (n:Person { name : 'Andres', title : 'Developer' })
• CREATE (a:Person { name : 'Roman' }) RETURN a
22. CRUD Operations
CREATE:
Relationships:
• MATCH (a:Person),(b:Person)
WHERE a.name = 'Roman' AND b.name = 'Andres'
CREATE (a)-[r:RELTYPE]->(b)
RETURN r
• MATCH (a:Person),(b:Person)
WHERE a.name = 'Roman' AND b.name = 'Andres'
CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
RETURN r
24. CRUD Operations
UPDATE:
Labels and properties:
• MATCH (n:Person { name : 'Andres' }) SET n :Person:Coder
• MATCH (n:Person { name : 'Andres', title : 'Developer' }) SET n.title = 'Mang'
25. CRUD Operations
DELETE:
• MATCH (n:Person)
WHERE n.name = 'Andres'
DELETE n
• MATCH (n { name: 'Andres' })-[r]-()
DELETE n, r
• MATCH (n:Person)
DELETE n
• MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
26. Functions
Predicates:
• ALL(identifier in collection WHERE predicate)
• ANY(identifier in collection WHERE predicate)
• NONE(identifier in collection WHERE predicate)
• SINGLE(identifier in collection WHERE predicate)
• EXISTS( pattern-or-property )
Scalar functions:
• LENGTH( collection/pattern expression )
• TYPE( relationship )
• ID( property-container )
• COALESCE( expression [, expression]* )
• HEAD( expression )
• LAST( expression )
• TIMESTAMP()
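For readers more familiar with Python than Cypher, the list predicates and a few of the scalar functions behave roughly like the helpers below. These are hand-written analogues for illustration only; the function names are hypothetical, and Cypher's null handling is not reproduced.

```python
def all_(items, pred):
    """Like ALL(x IN items WHERE pred): every element satisfies pred."""
    return all(pred(x) for x in items)


def any_(items, pred):
    """Like ANY(...): at least one element satisfies pred."""
    return any(pred(x) for x in items)


def none_(items, pred):
    """Like NONE(...): no element satisfies pred."""
    return not any(pred(x) for x in items)


def single(items, pred):
    """Like SINGLE(...): exactly one element satisfies pred."""
    return sum(1 for x in items if pred(x)) == 1


def coalesce(*values):
    """Like COALESCE(...): the first value that is not None."""
    return next((v for v in values if v is not None), None)


def head(items):
    """Like HEAD(list): the first element, or None for an empty list."""
    return items[0] if items else None


def last(items):
    """Like LAST(list): the final element, or None for an empty list."""
    return items[-1] if items else None


ages = [25, 31, 47]
print(all_(ages, lambda a: a > 18))    # True
print(single(ages, lambda a: a > 40))  # True
print(coalesce(None, "fallback"))      # fallback
```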
29. Use Case 1: Mumbai Local Train*
Problem
• There are four main railway lines: Western, Central, Harbour and Trans Harbour.
• Each line serves various sections of the city.
• To travel across sections, one must change lines at various interchange stations.
• Find the shortest path from a source station to a destination station.
* https://gist.github.com/luanne/8159102
31. Use Case 1: Mumbai Local Train (contd.)
Solution:
• Create a graph of the railway network.
• Run a shortest-path algorithm between the source and destination stations.
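To make the idea concrete outside the database, here is a minimal breadth-first search sketch in Python. The network is a made-up toy with hypothetical station codes (two short lines meeting at an interchange "X"); it is not the real Mumbai network, and this is not a description of how Neo4j computes shortest paths internally.

```python
from collections import deque

# Hypothetical toy network: "W*" and "H*" stand for stations on two lines,
# "X" is an interchange. Each entry lists the directly connected stations.
network = {
    "W1": ["W2"],
    "W2": ["W1", "X"],
    "X": ["W2", "H1", "C1"],
    "H1": ["X", "H2"],
    "H2": ["H1"],
    "C1": ["X"],
}


def shortest_path(graph, source, target):
    """Return the path with the fewest hops from source to target, or None."""
    if source == target:
        return [source]
    previous = {source: None}          # station -> station we arrived from
    queue = deque([source])
    while queue:
        station = queue.popleft()
        for nxt in graph.get(station, []):
            if nxt in previous:
                continue               # already reached by an equally short or shorter route
            previous[nxt] = station
            if nxt == target:
                # Walk back through `previous` to rebuild the path.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None                        # target not reachable


print(shortest_path(network, "W1", "H2"))  # ['W1', 'W2', 'X', 'H1', 'H2']
```

Breadth-first search counts every hop equally, which matches a model where each pair of neighbouring stations is joined by one relationship.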
32. Use Case 1: Mumbai Local Train (contd.)
Graph Database Model:
(Station) -[NEXT]-> (Station)
33. Use Case 1: Mumbai Local Train (contd.)
Create Graph
• Open the file from the link below, copy its contents, paste them into Neo4j and run them.
34. Use Case 1: Mumbai Local Train (contd.)
• Query 1: The Graph
match (n) return n
• Query 2: Route from Churchgate to Vashi
match (s1 {name:"Churchgate"}),(s2 {name:"Vashi"}),
p=shortestPath((s1)-[:NEXT*]->(s2))
return p
• Query 3: Route from Santa Cruz to Dockyard Road
match (s1 {name:"Santa Cruz"}),(s2 {name:"Dockyard Road"}),
p=shortestPath((s1)-[:NEXT*]-(s2))
return p
35. Use Case 2: Movie Recommendation*
Problem:
• We run an IMDb-style website.
• We have a dataset of movie ratings given by users.
• The problem is to generate a list of recommended movies for each individual user.
* http://www.neo4j.org/graphgist?8173017
36. Use Case 2: Movie Recommendation (contd.)
Solution:
• Find pairs of users who have given similar ratings to the movies both of them have watched.
• Then recommend movies that one user has not seen and the other has rated highly.
• Use cosine similarity to measure how similar two users are.
• Use k-nearest neighbours to find the most similar users.
37. Use Case 2: Movie Recommendation (contd.)
• Cosine Similarity:
• K-NN:
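A minimal Python sketch of both ideas, assuming ratings are held in a nested dictionary. The user and movie identifiers are hypothetical, and the similarity is computed over co-rated movies only, which is one common choice rather than the only one.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 5, "m2": 3},
    "u3": {"m1": 1, "m2": 5, "m4": 4},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors,
    restricted to the movies both of them have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(user, k):
    """The k users most similar to `user`, most similar first."""
    scores = [
        (other, cosine_similarity(ratings[user], ratings[other]))
        for other in ratings
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


for name, score in nearest_neighbours("u1", k=2):
    print(name, round(score, 3))
```

Here "u2" rates the shared movies exactly like "u1", so it comes out as the nearest neighbour with a similarity of about 1.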
38. Use Case 2: Movie Recommendation (contd.)
• Let's create a real dataset together.
• Visit:
http://graphlab.byethost7.com/movie_recco/index.php
39. Use Case 2: Movie Recommendation (contd.)
Dataset:
• Nodes:
– movies.csv
– users.csv
• Edges:
– rating.csv
Extra files we will create:
• movies_header.csv
• users_header.csv
• rating_header.csv
40. Use Case 2: Movie Recommendation (contd.)
• Import to Neo4j
$ ./neo4j-import \
    --into /tmp/graph.db \
    --nodes:USER users_header.csv,users.csv \
    --nodes:MOVIES movies_header.csv,movies.csv \
    --relationships:RATING rating_header.csv,rating.csv
41. Use Case 2: Movie Recommendation (contd.)
• Query: Add cosine similarity
MATCH (p1:USER)-[x:RATING]->(m:MOVIES)<-[y:RATING]-(p2:USER)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
42. Use Case 2: Movie Recommendation (contd.)
• Query: See who your nearest neighbours are by similarity
MATCH (p1:USER {name:'Nishant'})-[s:SIMILARITY]-(p2:USER)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity
43. Use Case 2: Movie Recommendation (contd.)
• Query: Recommendation, finally :D
MATCH (b:USER)-[r:RATING]->(m:MOVIES), (b)-[s:SIMILARITY]-(a:USER {name:'Nishant'})
WHERE NOT((a)-[:RATING]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
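The logic of this query can be followed step by step in a small Python sketch. All names and numbers below are hypothetical; the function `recommend` is an illustration of the same idea (average the ratings of the three most similar users for each unseen movie), not a replacement for the Cypher query.

```python
# Hypothetical data: ratings per user, and each neighbour's similarity to the target.
ratings = {
    "target": {"m1": 5},
    "a": {"m1": 4, "m2": 5, "m3": 2},
    "b": {"m2": 4, "m3": 3},
    "c": {"m2": 3},
    "d": {"m2": 1, "m3": 4},
}
similarity = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.2}


def recommend(target, ratings, similarity, top_n=3):
    """Score each movie the target has not rated by averaging the ratings
    given by the top_n most similar users who did rate it."""
    seen = set(ratings[target])
    per_movie = {}                      # movie -> [(similarity, rating), ...]
    for other, sim in similarity.items():
        for movie, rating in ratings[other].items():
            if movie not in seen:
                per_movie.setdefault(movie, []).append((sim, rating))

    scores = {}
    for movie, pairs in per_movie.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)   # most similar first
        top = [rating for _, rating in pairs[:top_n]]
        scores[movie] = sum(top) / len(top)

    # Best recommendation first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(recommend("target", ratings, similarity))  # [('m2', 4.0), ('m3', 3.0)]
```

In this toy data the low rating from "d" for "m2" is ignored because three more similar users have already rated that movie.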
44. Use Case 3: Email Analytics*
Overview:
• Framework for analyzing large email datasets
• Capability of performing Sentiment Analysis and Topic
Extraction on email dataset
• Accessed through Command Line Interface
• Incubated at Serendio; now an open source project.
* https://github.com/serendio-labs/email-analytics
45. Use Case 3: Email Analytics (contd.)
System Architecture:
47. Use Case 3: Email Analytics (contd.)
Possible use cases:
• Keeping track of employee activity
• Fraud detection
• Data mining for business analytics
48. Use Case 3: Email Analytics (contd.)
• Come forward and contribute.
• The project needs attention in the following areas:
– Web UI
– REST API
– Unit tests
– Custom email format support
– Other features
52. Conclusion
Graph database technologies like Neo4j have a lot of potential
to solve many complex problems.
Neo4j is a mature technology that can be used in
designing solutions.