This tutorial/lecture addresses various aspects of the graph traversal language Gremlin. In particular, the presentation focuses on Gremlin 0.7 and its application to graph analysis and manipulation.
IAC 2024 - IA Fast Track to Search Focused AI Solutions
The Gremlin in the Graph
1. The Gremlin in the Graph
Marko A. Rodriguez
Graph Systems Architect
http://markorodriguez.com
http://twitter.com/twarko
http://slideshare.com/slidarko
First University-Industry Meeting on Graph Databases (UIM-GDB)
Universitat Polit´cnica de Catalunya – February 8, 2011
e
February 6, 2011
2. Abstract
This tutorial/lecture addresses various aspects of the graph traversal
language Gremlin. In particular, the presentation focuses on Gremlin 0.7
and its application to graph analysis and processing.
Gremlin G = (V, E)
http://gremlin.tinkerpop.com
3. TinkerPop Productions
1
1
TinkerPop (http://tinkerpop.com), Marko A. Rodriguez (http://markorodriguez.com), Peter
Neubauer (http://www.linkedin.com/in/neubauer), Joshua Shinavier (http://fortytwo.net/), Pavel
Yaskevich (https://github.com/xedin), Darrick Wiebe (http://ofallpossibleworlds.wordpress.
com/), Stephen Mallette (http://stephen.genoprime.com/), Alex Averbuch (http://se.linkedin.
com/in/alexaverbuch)
4. Gremlin is a Domain Specific Language
• A domain specific language for graph analysis and manipulation.
• Gremlin 0.7+ uses Groovy has its host language.2
• Groovy is a superset of Java and as such, natively exposes the full JDK
to Gremlin.
2
Groovy available at http://groovy.codehaus.org/.
5. Gremlin is for Property Graphs
name = "lop"
lang = "java"
weight = 0.4 3
name = "marko"
age = 29 created
weight = 0.2
9
1 created
8 created
12
7 weight = 1.0
weight = 0.5
weight = 0.4 6
knows
knows 11 name = "peter"
age = 35
name = "josh"
4 age = 32
2
10
name = "vadas"
age = 27
weight = 1.0
created
5
name = "ripple"
lang = "java"
6. Gremlin is for Blueprints-enabled Graph Databases
• Blueprints can be seen as the JDBC for property graph databases.3
• Provides a collection of interfaces for graph database providers to implement.
• Provides tests to ensure the operational semantics of any implementation are correct.
• Provides support for GraphML and other “helper” utilities.
Blueprints
3
Blueprints is available at http://blueprints.tinkerpop.com.
7. A Blueprints Detour - Basic Example
Graph graph = new Neo4jGraph("/tmp/my_graph");
Vertex a = graph.addVertex(null);
Vertex b = graph.addVertex(null);
a.setProperty("name","marko");
b.setProperty("name","peter");
Edge e = graph.addEdge(null, a, b, "knows");
e.setProperty("since", 2007);
graph.shutdown();
0 knows 1
since=2007
name=marko name=peter
8. A Blueprints Detour - Sub-Interfaces
• TransactionalGraph extends Graph
For transaction based graph databases.4
• IndexableGraph extends Graph
For graph databases that support the indexing of properties.5
4
Graph Transactions https://github.com/tinkerpop/blueprints/wiki/Graph-Transactions.
5
Graph Indices https://github.com/tinkerpop/blueprints/wiki/Graph-Indices.
10. A Blueprints Detour - Implementations
• TinkerGraph implements IndexableGraph
• Neo4jGraph implements TransactionalGraph, IndexableGraph6
• OrientGraph implements TransactionalGraph, IndexableGraph7
• RexsterGraph implements IndexableGraph: Implementation of a
Rexster REST API.8
• SailGraph implements TransactionalGraph: Turns any Sail-based
RDF store into a Blueprints Graph.9
6
Neo4j is available at http://neo4j.org.
7
OrientDB is available at http://www.orientechnologies.com.
8
Rexster is available at http://rexster.tinkerpop.com.
9
Sail is available at http://openrdf.org.
12. A Blueprints Detour - Ouplementations
10
• GraphSail: Turns any IndexableGraph into an RDF store.
• GraphJung: Turns any Graph into a JUNG graph.11 12
10
GraphSail https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation.
11
JUNG is available at http://jung.sourceforge.net/.
12
GraphJung https://github.com/tinkerpop/blueprints/wiki/JUNG-Ouplementation.
13. Gremlin Compiles Down to Pipes
• Pipes is a data flow framework for evaluating lazy graph traversals.13
• A Pipe extends Iterator, Iterable and can be chained together
to create processing pipelines.
• A Pipe can be embedded into another Pipe to allow for nested
processing.
Pipes
13
Pipes is available at http://pipes.tinkerpop.com.
14. A Pipes Detour - Chained Iterators
• This Pipeline takes objects of type A and turns them into objects of
type D.
D
D
A
A A Pipe1 B Pipe2 C Pipe3 D D
A D
A
Pipeline
Pipe<A,D> pipeline =
new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)
15. A Pipes Detour - Simple Example
“What are the names of the people that marko knows?”
B name=peter
knows
A knows C name=pavel
name=marko
created
created
D name=gremlin
16. A Pipes Detour - Simple Example
Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
Pipe<Edge,Edge> pipe2= new LabelFilterPipe("knows",Filter.NOT_EQUAL);
Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");
Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A"));
EdgeVertexPipe(IN_VERTEX)
VertexEdgePipe(OUT_EDGES)
PropertyPipe("name")
B name=peter
knows
A knows C name=pavel
name=marko
created
created
D name=gremlin
LabelFilterPipe("knows")
HINT: The benefit of Gremlin is that this Java verbosity is reduced to g.v(‘A’).outE[[label:‘knows’]].inV.name.
18. A Pipes Detour - Creating Pipes
public class NumCharsPipe extends AbstractPipe<String,Integer> {
public Integer processNextStart() {
String word = this.starts.next();
return word.length();
}
}
When extending the base class AbstractPipe<S,E> all that is required is
an implementation of processNextStart().
21. The Many Ways of Using Gremlin
• Gremlin has a REPL to be run from the shell.14
• Gremlin can be natively integrated into any Groovy class.
• Gremlin can be interacted with indirectly through Java, via Groovy.
• Gremlin has a JSR 223 ScriptEngine as well.
14
All examples in this lecture/tutorial are via the command line REPL.
22. The Seamless Nature of Gremlin/Groovy/Java
Simply Gremlin.load() and add Gremlin to your Groovy class.
// a Groovy class
class GraphAlgorithms {
static {
Gremlin.load();
}
public static Map<Vertex, Integer> eigenvectorRank(Graph g) {
Map<Vertex,Integer> m = [:]; int c = 0;
g.V.outE.inV.groupCount(m).loop(3) {c++ < 1000} >> -1;
return m;
}
}
Writing software that mixes Groovy and Java is simple...dead simple.
// a Java class
public class GraphFramework {
public static void main(String[] args) {
System.out.println(GraphAlgorithms.eigenvectorRank(new Neo4jGraph("/tmp/graphdata"));
}
}
23. Property Graph Model and Gremlin
vertex 1 out edges vertex 3 in edges
edge 9 out vertex edge 9 label edge 9 in vertex
edge 9 id
1 9 created 3
8 11
knows created
4 vertex 4 id
vertex 4 properties
name = "josh"
age = 32
• outE: outgoing edges from a vertex.
• inE: incoming edge to a vertex.
• inV: incoming/head vertex of an edge.
• outV: outgoing/tail vertex of an edge.15
15
Documentation on atomic steps: https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps.
24. Loading a Toy Graph
name = "lop" marko$ ./gremlin.sh
lang = "java"
weight = 0.4 3
,,,/
name = "marko"
age = 29 created
(o o)
weight = 0.2
9 -----oOOo-(_)-oOOo-----
1 created gremlin> g = TinkerGraphFactory.createTinkerGraph()
8 created
12 ==>tinkergraph[vertices:6 edges:6]
7 weight = 1.0
weight = 0.5
weight = 0.4 6 gremlin>
knows
knows 11 name = "peter"
age = 35
4
name = "josh"
age = 32
16
2
10
name = "vadas"
age = 27
weight = 1.0
created
5
name = "ripple"
lang = "java"
16
Load up the toy 6 vertex/6 edge graph that comes hardcoded with Blueprints.
25. Basic Gremlin Traversals - Part 1
name = "lop" gremlin> g.v(1)
lang = "java" ==>v[1]
weight = 0.4
gremlin> g.v(1).map
name = "marko" 3
age = 29
==>name=marko
created
9
==>age=29
1 created gremlin> g.v(1).outE
8 created
7
12 ==>e[7][1-knows->2]
weight = 0.5 6 ==>e[9][1-created->3]
knows ==>e[8][1-knows->4]
knows 11
weight = 1.0 gremlin>
name = "josh"
4 age = 32
2
10 17
name = "vadas"
age = 27
created
5
17
Look at the neighborhood around marko.
26. Basic Gremlin Traversals - Part 2
name = "lop"
gremlin> g.v(1)
lang = "java" ==>v[1]
gremlin> g.v(1).outE[[label:‘created’]].inV
name = "marko" 3 ==>v[3]
age = 29 created
gremlin> g.v(1).outE[[label:‘created’]].inV.name
9
==>lop
created
1 gremlin>
8 created
12
7
6 18
knows
knows 11
4
2
10
created
5
18
What has marko created?
27. Basic Gremlin Traversals - Part 3
name = "lop"
gremlin> g.v(1)
lang = "java" ==>v[1]
gremlin> g.v(1).outE[[label:‘knows’]].inV
name = "marko" 3 ==>v[2]
age = 29 created ==>v[4]
9 gremlin> g.v(1).outE[[label:‘knows’]].inV.name
1 created
8 created ==>vadas
12
7 ==>josh
6
gremlin> g.v(1).outE[[label:‘knows’]].inV{it.age < 30}.name
knows
knows 11 ==>vadas
gremlin>
name = "josh"
4 age = 32
2
10 19
name = "vadas"
age = 27
created
5
19
Who does marko know? Who does marko know that is under 30 years of age?
28. Basic Gremlin Traversals - Part 4
codeveloper gremlin> g.v(1)
==>v[1]
gremlin> g.v(1).outE[[label:‘created’]].inV
lop codeveloper ==>v[3]
created gremlin> g.v(1).outE[[label:‘created’]].inV
codeveloper created .inE[[label:‘created’]].outV
marko
==>v[1]
peter ==>v[4]
created
==>v[6]
knows knows gremlin> g.v(1).outE[[label:‘created’]].inV
.inE[[label:‘created’]].outV.except([g.v(1)])
josh ==>v[4]
vadas
==>v[6]
gremlin> g.v(1).outE[[label:‘created’]].inV
created .inE[[label:‘created’]].outV.except([g.v(1)]).name
==>josh
ripple
==>peter
gremlin>
20
20
Who are marko’s codevelopers? That is, people who have created the same software as him, but are
not him (marko can’t be a codeveloper of himeself).
29. Basic Gremlin Traversals - Part 5
gremlin> Gremlin.addStep(‘codeveloper’)
==>null
lop gremlin> c = {_{x = it}.outE[[label:‘created’]].inV
codeveloper
.inE[[label:‘created’]].outV{!x.equals(it)}}
created
==>groovysh_evaluate$_run_closure1@69e94001
created
marko gremlin> [Vertex, Pipe].each{ it.metaClass.codeveloper =
created { Gremlin.compose(delegate, c())}}
codeveloper peter
==>interface com.tinkerpop.blueprints.pgm.Vertex
knows knows ==>interface com.tinkerpop.pipes.Pipe
gremlin> g.v(1).codeveloper
==>v[4]
josh
vadas ==>v[6]
gremlin> g.v(1).codeveloper.codeveloper
created ==>v[1]
==>v[6]
==>v[1]
ripple ==>v[4]
21
21
I don’t want to talk in terms of outE, inV, etc. Given my domain model, I want to talk in terms of
higher-order, abstract adjacencies — e.g. codeveloper.
31. Grateful Dead Concert Graph
The Grateful Dead were an American band that was born out of the San Francisco,
California psychedelic movement of the 1960s. The band played music together
from 1965 to 1995 and is well known for concert performances containing extended
improvisations and long and unique set lists. [http://arxiv.org/abs/0807.2466]
32. Grateful Dead Concert Graph Schema
• vertices
song vertices
∗ type: always song for song vertices.
∗ name: the name of the song.
∗ performances: the number of times the song was played in concert.
∗ song type: whether the song is a cover song or an original.
artist vertices
∗ type: always artist for artist vertices.
∗ name: the name of the artist.
• edges
followed by (song → song): if the tail song was followed by the head song in
concert.
∗ weight: the number of times these two songs were paired in concert.
sung by (song → artist): if the tail song was primarily sung by the head artist.
written by (song → artist): if the tail song was written by the head artist.
34. Centrality in a Property Graph
...
Garcia sung_by
sung_by
sung_by
sung_by
sung_by
sung_by
sung_by sung_by sung_by
sung_by sung_by sung_by
Dark Terrapin China Stella Peggy
Star Station Doll Blue O
...
followed_by followed_by
followed_by followed_by
followed_by followed_by
• What is the most central vertex in this graph? – Garcia.
• What is the most central song? What do you mean “central?”
• How do you calculate centrality when there are numerous ways in which vertices can
be related?
35. Loading the GraphML Representation
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> GraphMLReader.inputGraph(g,
new FileInputStream(‘data/graph-example-2.xml’))
==>null
gremlin> g.V.name
==>OTHER ONE JAM
==>Hunter
==>HIDEAWAY
==>A MIND TO GIVE UP LIVIN
==>TANGLED UP IN BLUE
...
22
22
Load the GraphML (http://graphml.graphdrawing.org/) representation of the Grateful Dead
graph, iterate through its vertices and get the name property of each vertex.
36. Using Graph Indices
gremlin> v = g.idx(T.v)[[name:‘DARK STAR’]] >> 1
==>v[89]
gremlin> v.outE[[label:‘sung_by’]].inV.name
==>Garcia
23
23
Use the default vertex index (T.v) and find all vertices index by the key/value pair name/DARK STAR.
Who sung Dark Star?
37. Emitting Collections
gremlin> v.outE[[label:‘followed_by’]].inV.name
==>EYES OF THE WORLD
==>TRUCKING
==>SING ME BACK HOME
==>MORNING DEW
...
gremlin> v.outE[[label:‘followed_by’]].emit{[it.inV.name >> 1, it.weight]}
==>[EYES OF THE WORLD, 9]
==>[TRUCKING, 1]
==>[SING ME BACK HOME, 1]
==>[MORNING DEW, 11]
...
24
24
What followed Dark Star in concert?
How many times did each song follow Dark Start in concert?
38. Eigenvector Centrality – Indexing Vertices
gremlin> m = [:]; c = 0
==>0
gremlin> g.V.outE[[label:‘followed_by’]].inV.groupCount(m)
.loop(4){c++ < 1000}.filter{false}
gremlin> m.sort{a,b -> b.value <=> a.value}
==>v[13]=762
==>v[21]=668
==>v[50]=661
==>v[153]=630
==>v[96]=609
==>v[26]=586
...
25
25
Emanate from each vertex the followed by path. Index each vertex by how many times its been
traversed over in the map m. Do this for 1000 times. The final filter{false} will ensure nothing is
outputted.
39. Eigenvector Centrality – Indexing Names
gremlin> m = [:]; c = 0
==>0
gremlin> g.V.outE[[label:‘followed_by’]].inV.name.groupCount(m)
.back(2).loop(4){c++ < 1000}.filter{false}
gremlin> m.sort{a,b -> b.value <=> a.value}
==>PLAYING IN THE BAND=762
==>TRUCKING=668
==>JACK STRAW=661
==>SUGAR MAGNOLIA=630
==>DRUMS=609
==>PROMISED LAND=586
...
26
26
Do the same, index the names of the songs in the map m. However, since you can get the outgoing
edges of a string, jump back 2 steps, then loop.
40. Paths in a Graph
gremlin> g.v(89).outE.inV.name
==>EYES OF THE WORLD
==>TRUCKING
==>SING ME BACK HOME
==>MORNING DEW
==>HES GONE
...
gremlin> g.v(89).outE.inV.name.paths
==>[v[89], e[7021][89-followed_by->83], v[83], EYES OF THE WORLD]
==>[v[89], e[7022][89-followed_by->21], v[21], TRUCKING]
==>[v[89], e[7023][89-followed_by->206], v[206], SING ME BACK HOME]
==>[v[89], e[7006][89-followed_by->127], v[127], MORNING DEW]
==>[v[89], e[7024][89-followed_by->49], v[49], HES GONE]
...
27
27
What are the paths that are out.inV.name emanating from Dark Star (v[89])?
41. Paths of Length < 4 Between Dark Star and Drums
gremlin> g.v(89).outE.inV.loop(2){it.loops < 4 & !(it.object.equals(g.v(96)))}[[id:‘96’]].paths
==>[v[89], e[7014][89-followed_by->96], v[96]]
==>[v[89], e[7021][89-followed_by->83], v[83], e[1418][83-followed_by->96], v[96]]
==>[v[89], e[7022][89-followed_by->21], v[21], e[6320][21-followed_by->96], v[96]]
==>[v[89], e[7006][89-followed_by->127], v[127], e[6599][127-followed_by->96], v[96]]
==>[v[89], e[7024][89-followed_by->49], v[49], e[6277][49-followed_by->96], v[96]]
==>[v[89], e[7025][89-followed_by->129], v[129], e[5751][129-followed_by->96], v[96]]
==>[v[89], e[7026][89-followed_by->149], v[149], e[2017][149-followed_by->96], v[96]]
==>[v[89], e[7027][89-followed_by->148], v[148], e[1937][148-followed_by->96], v[96]]
==>[v[89], e[7028][89-followed_by->130], v[130], e[1378][130-followed_by->96], v[96]]
==>[v[89], e[7019][89-followed_by->39], v[39], e[6804][39-followed_by->96], v[96]]
==>[v[89], e[7034][89-followed_by->91], v[91], e[925][91-followed_by->96], v[96]]
==>[v[89], e[7035][89-followed_by->70], v[70], e[181][70-followed_by->96], v[96]]
==>[v[89], e[7017][89-followed_by->57], v[57], e[2551][57-followed_by->96], v[96]]
...
28
28
Get the adjacent vertices to Dark Star (v[89]). Loop on outE.inV while the amount of loops is less
than 4 and the current path hasn’t reached Drums (v[96]). Return the paths traversed.
42. Shortest Path Between Dark Star and Drums
gremlin> [g.v(89).outE.inV
.loop(2){!(it.object.equals(g.v(96)))}[[id:‘96’]].paths >> 1]
==>[v[89], e[7014][89-followed_by->96], v[96]]
gremlin> (g.v(89).outE.inV.loop(2)
{!(it.object.equals(g.v(96)))}[[id:‘96’]].paths >> 1).size() / 3
==>1
29
29
Given the nature of the loop step, the paths are emitted in order of length. Thus, just “pop off” the
first path (>> 1) and that is the shortest path. You can then gets its length if you want to know the length
of the shortest path. Be sure to divide by 3 as the path has the start vertex, edge, and end vertex.
43. Integrating the JDK (Java API)
• Groovy is the host language for Gremlin. Thus, the JDK is natively
available in a path expression.
gremlin> v.outE[[label:‘followed_by’]].inV{it.name.matches(‘D.*’)}.name
==>DEAL
==>DRUMS
• Examples below are not with respect to the Grateful Dead schema
presented previously. They help to further articulate the point at hand.
gremlin> x.outE.inV.emit{JSONParser.get(it.uri).reviews}.mean()
...
gremlin> y.outE[[label:‘rated’]].filter{TextAnalysis.isElated(it.review)}.inV
...
44. The Concept of Abstract Adjacency – Part 1
• Loop on outE.inV.
g.v(89).outE.inV.loop(2){it.loops < 4}
• Loop on followed by.
g.v(89).outE[[label:‘followed_by’]].inV.loop(3){it.loops < 4}
• Loop on cofollowed by.
g.v(89).outE[[label:‘followed_by’]].inV
.inE[[label:‘followed_by’]].outV.loop(6){it.loops < 4}
45. The Concept of Abstract Adjacency – Part 2
outE
in outE
V
loop(2){..} back(3)
{i
t.
sa
la
} ry
}
friend_of_a_friend_who_earns_less_than_friend_at_work
• outE, inV, etc. is low-level graph speak (the domain is the graph).
• codeveloper is high-level domain speak (the domain is software development).30
30
In this way, Gremlin can be seen as a DSL (domain-specific language) for creating DSL’s for your
graph applications. Gremlin’s domain is “the graph.” Build languages for your domain on top of Gremlin
(e.g. “software development domain”).
46. Developer's FOAF Import
Graph
Friend-Of-A-Friend
Graph
Developer Imports Friends at Work
Graph Graph
You need not make
Software Imports Friendship derived graphs explicit.
Graph Graph You can, at runtime,
compute them.
Moreover, generate
them locally, not
Developer Created globally (e.g. ``Marko's
Employment
Graph friends from work
Graph
relations").
This concept is related
to automated reasoning
and whether reasoned
relations are inserted
into the explicit graph or
Explicit Graph computed at query time.
47. The Concept of Abstract Adjacency – Part 3
• Gremlin provides fine grained (Turing-complete) control over how your
traverser moves through a property graph.
• As such, there are as many shortest paths, eigenvectors/PageRanks,
clusters, cliques, etc. as there are “abstract adjacencies” in the
graph.31 32
This is the power of the property graph over standard unlabeled graphs.
There are many ways to slice and dice your data.
31
Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network
Analysis Algorithms, Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
32
Other terms for abstract adjacencies include domain-specific relations, inferred relations, implicit edges,
virtual edges, abstract paths, derived adjacencies, higher-order relations, etc.
48. In the End, Gremlin is Just Pipes
gremlin> (g.v(89).outE[[label:‘followed_by’]].inV.name).toString()
==>[VertexEdgePipe<OUT_EDGES>, LabelFilterPipe<NOT_EQUAL,followed_by>,
EdgeVertexPipe<IN_VERTEX>, PropertyPipe<name>]
33
33
Gremlin is a domain specific language for constructing lazy evaluation, data flow Pipes (http:
//pipes.tinkerpop.com). In the end, what is evaluated is a pipeline process in pure Java over a
Blueprints-enabled property graph (http://blueprints.tinkerpop.com).
49. Credits
• Marko A. Rodriguez: designed, developed, and documented Gremlin.
• Pavel Yaskevich: heavy development on earlier versions of Gremlin.
• Darrick Wiebe: development on Pipes, Blueprints, and inspired the move to using
Groovy as the host language of Gremlin.
• Peter Neubauer: promoter and evangelist of Gremlin.
• Ketrina Yim: designed the Gremlin logo.
Gremlin G = (V, E)
http://gremlin.tinkerpop.com
http://groups.google.com/group/gremlin-users
http://tinkerpop.com