Since the NoSQL concept burst onto the market, graph databases have traditionally been designed to be used with Java or C. With some honorable exceptions, there isn't an easy way to manage graph databases from Python. In this talk, I will introduce you to some of the tools that you can use today to work with these challenging new databases from our favorite language, Python.
1. GRAPH DATABASES
IN PYTHON
Javier de la Rosa
@versae
The CulturePlex Lab
Western University, London, ON
PyCon Canada 2012
2. WHO I AM
●
Javier de la Rosa
●
versae
●
versae
●
Computer Scientist and
Humanist
●
CulturePlex Lab
●
CulturePlex
Graph Databases in Python, Javier de la Rosa, PyCon Canada, 2012 2
3. FIRST OF ALL
“You do not really understand something
unless you can explain it to your
grandmother”
– (Frequently attributed to) Richard Feynman
4. DATABASES (in the last 30 years)
●
Data in tables, rows and columns
●
Pretty basic mechanism to make connections:
– Primary keys, Foreign keys, and... that's all
●
Relational, ahem, really?
5. DATABASES (in the last 30 years)
●
Rigid data schemas
– Have you ever tried to do a schema migration?
●
Relational Algebra and SQL
– Terrible for highly interconnected data
– JOINs can take a lifetime to finish (a bit overdramatized)
6. NoSQL, Not Only SQL
●
Document
– MongoDB, CouchDB, etc.
●
Key-value stores
– Redis, Riak, Voldemort, Dynamo, etc.
●
Big Tables
– Cassandra, HBase, etc.
●
Analytic
– Hadoop
●
Graph
– Neo4j, OrientDB, HyperGraphDB, Titan, etc.
●
Other
– Objectivity/DB, ZODB, etc.
7. DATABASES LANDSCAPE
Source: 451Research, https://451research.com/report-long?icid=2289
8. WHO IS USING GRAPHS?
●
Mozilla with Pancake and Pacer
– https://wiki.mozilla.org/Pancake &
http://pangloss.github.com/pacer/
●
Twitter with FlockDB
– https://github.com/twitter/flockdb
●
Facebook with Open Graph
– https://developers.facebook.com/docs/opengraph/
●
Google with Knowledge Graph
– http://www.google.ca/insidesearch/.../knowledge.html
9. WHY GRAPHS?
●
Data is getting more and more connected
– From text documents, to wikis, to ontologies, to
folksonomies, etc
●
And more semi-structured
– Think about the decentralization of content generation
●
And more complex
– Social networks, semantic trending, etc
Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009
10. A FEW OF THE CURRENT USES
●
Social Networking and Recommendations
●
Network and Cloud Management
●
Master Data Management
●
Geospatial
●
Bioinformatics
●
Content Management and Security and Access
Control
Source: Mashable, http://mashable.com/2012/09/26/graph-databases/
11. AND WHY ELSE?
●
Because graphs are cool!
Leonhard Euler
12. WHAT IS A GRAPH?
●
G = (V, E)
Where
– G is a graph
– V is a set of vertices
– E is a set of edges
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
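To make the definition concrete, here is a minimal Python sketch of an undirected graph written literally as a pair of sets. The vertex numbers and the `neighbours` helper are made up for illustration; they do not come from any graph library.

```python
# G = (V, E): a set of vertices and a set of edges.
V = {1, 2, 3, 4}
# Each undirected edge is an unordered pair, so a frozenset models it well.
E = {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))}
G = (V, E)


def neighbours(graph, vertex):
    """Return the vertices that share an edge with `vertex` (hypothetical helper)."""
    _vertices, edges = graph
    return {other for edge in edges if vertex in edge
            for other in edge if other != vertex}


neighbours_of_2 = neighbours(G, 2)
```

For a directed graph, the same idea works with ordered tuples such as `(1, 2)` in place of the frozensets.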
13. WHAT IS A GRAPH?
●
G = (V, E)
– Graph, aka network, diagram, etc.
– Vertex, aka point, dot, node, element, etc.
– Edge, aka relationship, arc, line, link, etc.
●
Basically, “a graph states that something is related
to something else”
– Svetlana Sicular,
Research Director at Gartner
Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/
14. TYPES OF GRAPH
Undirected Digraph
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
15. TYPES OF GRAPH
Multigraph Hypergraph
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
16. SOME GRAPHS EVEN HAVE A NAME
●
Complete graphs
K3 K5 K8
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
17. SOME GRAPHS EVEN HAVE A NAME
●
Stars
The star graphs S3, S4, S5 and S6
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
18. SOME GRAPHS EVEN HAVE A NAME
●
Snarks
Blanuša (second), Szekeres, Double star
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
19. THINGS CAN GET COMPLICATED...
Local McLaughlin graph
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
20. WAIT A SEC,
21. DON'T WORRY
●
Just one more type: the Property Graph
[Figure: a property graph with four numbered nodes and four numbered edges]
22. THE PROPERTY GRAPH
●
Directed, attributed and multi-relational
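[Figure: a property graph with four numbered nodes and four numbered edges. Nodes: three people (Name: Javi, Name: David, Name: John) and a book (Title: The Art of Computer Programming, Price: $135). Edges: two "Knows" edges (Since: 2009 and Since: 1990) and two "Likes" edges]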
23. THE PROPERTY GRAPH
●
A set of nodes, and each node has:
– A unique identifier.
– A set of outgoing edges.
– A set of incoming edges.
– A collection of properties defined by a map from key to value.
●
A set of relationships, and each relationship has:
– A unique identifier.
– An outgoing tail vertex.
– An incoming head vertex.
– And a collection of properties defined by a map from key to value.
Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph
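The definition above maps almost line by line onto two small Python classes. This is only a sketch: the class names `Vertex` and `Edge`, the ids, and the choice of who knows whom are assumptions made for illustration, not the API of Blueprints or of any Python client.

```python
class Vertex(object):
    """A node: unique id, outgoing edges, incoming edges, key/value properties."""

    def __init__(self, id_, **properties):
        self.id = id_
        self.out_edges = []                  # edges that start at this vertex
        self.in_edges = []                   # edges that end at this vertex
        self.properties = dict(properties)   # map from key to value


class Edge(object):
    """A relationship: unique id, tail vertex, head vertex, label, properties."""

    def __init__(self, id_, tail, label, head, **properties):
        self.id = id_
        self.tail = tail                     # vertex the edge goes out from
        self.label = label
        self.head = head                     # vertex the edge comes into
        self.properties = dict(properties)
        # Keep both endpoints aware of the new edge.
        tail.out_edges.append(self)
        head.in_edges.append(self)


# Illustrative data only; the wiring between people is an assumption.
javi = Vertex(1, name="Javi")
david = Vertex(2, name="David")
book = Vertex(4, title="The Art of Computer Programming")
knows = Edge(1, javi, "knows", david, since=2009)
likes = Edge(2, javi, "likes", book)
```

Storing the edge lists on each vertex is what lets you walk from a node to its neighbours without scanning every relationship.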
24. IN SHORT
●
A Property Graph is composed of:
– A set of nodes
– A set of relationships
– Properties and IDs on both
●
Sometimes, nodes and relationships can be typed
– In Blueprints and Neo4j, a label denotes the type of
relationship between its two nodes.
25. GRAPH DATABASES
●
A graph database uses graph structures with nodes,
edges, and properties to represent and store data
– ...but there is not an easy way to visualize this
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
26. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")
>>> punch = arnold.punches(silvester)
>>> chuck = g.nodes.create(name="Chuck")
>>> chuck.dropkicks(silvester)
>>> chuck.dropkicks(arnold)
[Figure: three nodes (Name: Silvester, Name: Arnold, Name: Chuck), with a "punches" edge from Arnold to Silvester and "dropkicks" edges from Chuck to Silvester and to Arnold]
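The `g` in the snippets above is not tied to a particular library. If you want to try them without a database, the toy in-memory classes below are one way to make those exact calls run. Everything here (`Graph`, `NodeManager`, `Node`, `Relationship`) is hypothetical and written only for this illustration; a real client would send each call to a server instead.

```python
class Relationship(object):
    def __init__(self, start, label, end, **properties):
        self.start, self.label, self.end = start, label, end
        self.properties = properties


class Node(object):
    def __init__(self, graph, **properties):
        self._graph = graph
        self.properties = properties

    def __getattr__(self, label):
        # Only called for attributes that do not exist, so any unknown
        # public name (punches, dropkicks, ...) becomes a relationship type.
        if label.startswith("_"):
            raise AttributeError(label)

        def relate(other, **properties):
            rel = Relationship(self, label, other, **properties)
            self._graph.relationships.append(rel)
            return rel
        return relate


class NodeManager(object):
    def __init__(self, graph):
        self._graph = graph
        self._nodes = []

    def create(self, **properties):
        node = Node(self._graph, **properties)
        self._nodes.append(node)
        return node

    def __len__(self):
        return len(self._nodes)


class Graph(object):
    def __init__(self):
        self.nodes = NodeManager(self)
        self.relationships = []


g = Graph()
silvester = g.nodes.create(name="Silvester")
arnold = g.nodes.create(name="Arnold")
punch = arnold.punches(silvester)
chuck = g.nodes.create(name="Chuck")
chuck.dropkicks(silvester)
chuck.dropkicks(arnold)

triples = [(r.start.properties["name"], r.label, r.end.properties["name"])
           for r in g.relationships]
```

Turning unknown attribute names into relationship types keeps the calls readable, at the cost of silently accepting typos, so treat it as a demo trick rather than a design recommendation.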
39. GRAPH DATABASES LANDSCAPE
And more:
– AffinityDB
– YarcData uRiKA
– Apache Giraph
– Cassovary
– StigDB
– NuvolaBase
– Pegasus
– Microsoft Trinity
– Sherlock
– And so on
41. GREMLIN, BLUEPRINTS, WAT?
Let me introduce you to the TinkerPop stack
Source: TinkerPop, http://www.tinkerpop.com/
42. BLUEPRINTS AND REXSTER
●
Blueprints is a property graph model interface
●
Rexster is a server that exposes any Blueprints
graph through REST
Source: TinkerPop, http://www.tinkerpop.com/
43. AND WHAT ABOUT PYTHON?
●
Options to connect to a Blueprints Graph Database
[Figure: the backends OrientDB, Neo4j, DEX and Titan; the Blueprints API and Rexster (REST); and the Python clients bulbflow, python-blueprints and pyblueprints]
44. BULBFLOW
●
Create
>>> alice = g.vertices.create(name="Alice")
>>> bob = g.vertices.create(name="Bob")
>>> g.edges.create(alice, "knows", bob)
●
Get
>>> alice = g.vertices.get(1)
>>> bob = g.vertices.get(2)
●
Update
>>> alice.age = 21
>>> alice.save()
●
Delete
>>> alice.delete()
Source: Bulbflow, http://bulbflow.com/docs/
45. PYBLUEPRINTS
●
Create
>>> alice = g.addVertex()
>>> alice.setProperty("name", "Alice")
>>> bob = g.addVertex()
>>> bob.setProperty("name", "Bob")
>>> g.addEdge(alice, bob, "knows")
●
Get
>>> alice = g.getVertex(1)
>>> bob = g.getVertex(2)
●
Update
>>> alice.setProperty("age", 21)
●
Delete
>>> g.removeVertex(alice.getId())
Source: PyBlueprints, https://github.com/escalant3/pyblueprints
46. BUT NEO4J HAS ITS OWN CLIENTS!
●
REST Clients for Neo4j
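[Figure: the backends OrientDB, Neo4j, DEX and Titan; the Blueprints API and Rexster (REST); the Python clients bulbflow, python-blueprints and pyblueprints; plus neo4j-rest-client and py2neo as REST clients for Neo4j]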
47. HOW CAN I LOOK THINGS UP?
●
An index is a data structure that supports the fast
lookup of elements by some key/value pair
Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
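As a rough mental model, such an index is a mapping from a (key, value) pair to the set of elements stored under it. The `Index` class below is a hypothetical in-memory sketch of that idea; its `put` and `get` names only mimic the general shape of a graph index and are not the implementation of any database.

```python
from collections import defaultdict


class Index(object):
    """Hypothetical in-memory index: (key, value) -> set of elements."""

    def __init__(self):
        self._entries = defaultdict(set)

    def put(self, key, value, element):
        # Several elements may share the same key/value pair.
        self._entries[(key, value)].add(element)

    def get(self, key, value):
        # One dictionary lookup, regardless of how many elements are indexed.
        return set(self._entries.get((key, value), ()))


names = Index()
names.put("name", "Alice", 1)   # the integers stand in for node ids
names.put("name", "Bob", 2)
names.put("name", "Alice", 3)

alice_ids = names.get("name", "Alice")
```

The point of the structure is that a lookup never has to scan the whole graph, which is why indices are usually the entry point for a query.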
48. INDICES
●
In the Python bindings, indices work much like a dict
– bulbflow
# bulbflow creates automatic indices to make basic lookups easier
>>> nodes = g.vertices.index.lookup(name="Alice")
>>> for node in nodes:
...: print node
– PyBlueprints
>>> index = g.getIndex("names", "vertex")
>>> index.put("name", alice.getProperty("name"), alice)
>>> nodes = index.get("name", "Alice")
>>> for node in nodes:
...: print node
49. INDICES
●
Some Graph Databases provide full-text queries
– bulbflow
>>> nodes = g.vertices.index.query(name="ali*")
>>> for node in nodes:
...: print node
– PyBlueprints
>>> index = g.getIndex("names", "vertex")
>>> nodes = index.query("name", "ali*")
>>> for node in nodes:
...: print node
50. ...MORE COMPLEX SEARCHES?
“Without traversals [FlockDB] is only a persisted
graph. But not a graph database.”
– Alex Popescu
Source: myNoSQL, http://nosql.mypopescu.com/
51. LET'S TRAVERSE THE GRAPH!
●
“A graph traversal is the problem of visiting all the
nodes in a graph in a particular manner”
– A* search
– Alpha-beta pruning
– Breadth-First Search (BFS)
– Depth-First Search (DFS)
– Dijkstra's algorithm
– Floyd-Warshall algorithm
– Etc.
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
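Of the algorithms listed, Breadth-First Search is the easiest to show in a few lines. The sketch below runs on a plain adjacency dict rather than a database; the `knows` data and the `bfs` function name are made up for the example.

```python
from collections import deque


def bfs(adjacency, start):
    """Visit every node reachable from `start`, nearest nodes first."""
    visited = [start]          # visiting order, returned to the caller
    seen = {start}             # fast membership test
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical "knows" relationships, stored as node -> list of neighbours.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

order = bfs(knows, "alice")
```

Swapping `queue.popleft()` for `queue.pop()` turns the queue into a stack and gives a depth-first style of traversal instead.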
52. NEO4J TRAVERSAL API
●
Python-embedded (native Neo4j Python binding)
>>> traverser = gdb.traversal().relationships('knows').traverse(alice)
# The graph is traversed as you loop through the result
>>> for node in traverser.nodes:
...: print node
●
neo4j-rest-client
>>> traverser = alice.traverse(types=[client.All.knows])
# The graph is traversed as you loop through the result
>>> for node in traverser:
...: print node
53. BLUEPRINTS GREMLIN
●
Gremlin is a domain specific language for traversing
property graphs
– Defines how to do a query based on the graph structure
>>> gremlin = g.extensions.GremlinPlugin.execute_script
>>> params = {'alice_id': alice.id}
>>> script = "g.V(alice_id).out('knows')"
>>> node = gremlin(script=script, params=params)
>>> node == bob
Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
54. NEO4J CYPHER QUERY LANGUAGE
●
Declarative graph query language
– Expressive and efficient querying
– Focused on expressing what to retrieve from a graph
– Inspired by SQL
– Pattern matching expressions from SPARQL
[Figure: node 1 and node 2 joined by an edge named "label"]
(1) -[:label]- (2)
START n=(1), m=(2)
MATCH n-[r:label]-m
RETURN r
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
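To see what the query above is asking for, it can help to spell the same pattern out in plain Python over a list of (start, type, end) triples. This is only a sketch of the meaning of the pattern, under the assumption that an arrowless `n-[r:label]-m` matches in either direction; it says nothing about how Neo4j actually evaluates Cypher, and the `match_relationships` helper and sample data are invented for the example.

```python
def match_relationships(relationships, n, m, label):
    """Return the relationships of type `label` that join n and m, either way round."""
    return [
        rel for rel in relationships
        if rel[1] == label and {rel[0], rel[2]} == {n, m}
    ]


# Hypothetical data: (start node id, relationship type, end node id).
relationships = [
    (1, "label", 2),
    (2, "other", 1),
    (2, "label", 3),
    (2, "label", 1),
]

# Plays the role of: START n=(1), m=(2) MATCH n-[r:label]-m RETURN r
result = match_relationships(relationships, 1, 2, "label")
```

The declarative part is that the query only describes the shape to find; the loop and the filter are the database's problem, not yours.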
57. PY2NEO CYPHER HELPERS
●
Get or create elements
>>> g.get_or_create_relationships(
...: (bob, "WORKS WITH", carol, {"since": 2004}),
...: (alice, "DISLIKES!", carol, {"reason": "youth"}),
...: (bob, "WORKS WITH", dave, {"since": 2009}), )
●
Get counts
>>> nodes_count = g.get_node_count()
>>> rels_count = g.get_relationship_count()
●
Delete
>>> g.delete()
Source: py2neo, http://py2neo.org/
59. LET'S PLAY!
●
Deploy Neo4j in Heroku or Amazon
●
Use one of the available clients
60. NEO4J HEROKU ADD-ON
●
Create a Heroku app and add the Neo4j add-on
$ heroku apps:create pyconca
$ heroku addons:add neo4j --app pyconca
$ xdg-open `heroku config:get NEO4J_URL --app pyconca`
$ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
●
Create a virtualenv with neo4j-rest-client
$ mkvirtualenv --no-site-packages pyconca
$ workon pyconca
$ pip install ipython neo4jrestclient
$ ipython
61. NEO4J HEROKU ADD-ON
●
Run IPython and that's it!
>>> import os
>>> NEO4J_URL = os.environ["NEO4J_URL"]
>>> from neo4jrestclient import client
>>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
>>> gdb.url
63. THANKS!
Questions?
Javier de la Rosa
@versae
The CulturePlex Lab
Western University, London, ON
PyCon Canada 2012
64. APPENDIX: DATA MODELS
●
neo4django
– https://github.com/scholrly/neo4django
●
neomodel
– https://github.com/robinedwards/neomodel
●
bulbflow models
– http://bulbflow.com/quickstart/#models
65. APPENDIX: VISUALIZE YOUR GRAPH
●
Export somehow to .gexf for Gephi
– http://gephi.org/
●
Use D3.js
– http://d3js.org/
●
Use sigma.js
– http://sigmajs.org/
●
Take a look at Max De Marzi's work
– http://maxdemarzi.com/category/visualization/
●
Use Sylva (for newbies)
– http://www.sylvadb.com/
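For the first option in the list, the "somehow" can be as small as a standard-library script. The sketch below writes a minimal GEXF document, assuming the 1.2 draft namespace and a directed, static graph; the `to_gexf` function and the sample data are hypothetical, and a real export would pull nodes and relationships from your database client first.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges):
    """Build a minimal GEXF document.

    nodes: iterable of (id, label) pairs
    edges: iterable of (source id, target id, label) triples
    """
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph",
                          {"mode": "static", "defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for edge_id, (source, target, label) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(edge_id),
            "source": str(source),
            "target": str(target),
            "label": label,
        })
    # tostring returns bytes by default; decode to get text on Python 2 and 3.
    return ET.tostring(gexf).decode("ascii")


gexf_xml = to_gexf(
    nodes=[(1, "Alice"), (2, "Bob"), (3, "Carol")],
    edges=[(1, 2, "knows"), (2, 3, "knows")],
)
```

Saving `gexf_xml` to a file with a `.gexf` extension should give you something Gephi can open; node and edge properties would need extra attribute declarations that this sketch leaves out.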