Cloud east shutl_talk

How Neo4j helps Shutl
to deliver even faster...
Tuesday, 28 May 13

Volker Pacher
senior developer @shutl
@vpacher
http://github.com/vpacher

• SaaS platform
• we provide an API for carriers and merchants
• shutl.it C2C platform
• customers can choose between delivery either:
  within 90 minutes of purchase
  or in a 1 hour window of their choice (same day or any day)
• fastest delivery to date: 15:00 min
• SOA with services built using JRuby, Sinatra, MongoDB and Neo4j

http://xkcd.com/287/

problems with our previous attempt (v1):
• exponential growth of joins in MySQL with added features
• code base too complex and unmaintainable
• API response time growing too large the more data was added
• our fastest delivery was quicker than our slowest query!

The case for graph databases:
• relationships are explicitly stored (RDBMSs lack first-class relationships)
• domain modelling is simplified because adding new 'subgraphs' doesn't affect the existing structure and queries (additive model)
• whiteboard friendly
• schema-less
• db performance remains relatively constant because each query is localized to its own portion of the graph: roughly O(1) for the same query as the total graph grows
• traversals of relationships are easy and very fast

What is a graph anyway?
a collection of vertices (nodes)
connected by edges (relationships)
[diagram: Node 1, Node 2, Node 3 and Node 4 connected by edges]

a short history
Leonhard Euler
the seven bridges of Königsberg (1735)

directed graph
each relationship has a direction, i.e. one start node and one end node
[diagram: Node 1, Node 2, Node 3 and Node 4 connected by directed edges]

property graph
• nodes contain properties (key, value)
• relationships have a type and are always directed
• relationships can contain properties too
[diagram: nodes name: Volker, name: Sam, name: Megan and name: Paul, connected by :friends, :knows (one with since: 2005) and :works_for relationships]
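The property graph model above can be sketched in plain Ruby. This is only an illustration, not how Neo4j or the neo4j gem represent data; the Node and Relationship structs are hypothetical names, and which two people the :knows relationship with since: 2005 connects is an assumption, as the slide diagram does not survive in text form.

```ruby
# Minimal in-memory property graph: nodes and relationships both carry properties.
Node = Struct.new(:id, :properties)

# A relationship always has a type and a direction (start_node -> end_node).
Relationship = Struct.new(:type, :start_node, :end_node, :properties)

VOLKER = Node.new(1, { name: 'Volker' })
SAM    = Node.new(2, { name: 'Sam' })

# Assumption for illustration: the endpoints of this relationship are a guess.
KNOWS_REL = Relationship.new(:knows, VOLKER, SAM, { since: 2005 })

# Render a relationship in the arrow notation used on the slides.
def describe_relationship(rel)
  "#{rel.start_node.properties[:name]} -[:#{rel.type}]-> #{rel.end_node.properties[:name]}"
end

puts describe_relationship(KNOWS_REL)
```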

a graph is its own index (constant query performance)
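A rough way to picture this claim, as a plain Ruby sketch rather than a description of Neo4j's storage engine: with a join table every lookup goes through one global structure, while with adjacency each node already holds its own relationships. The table contents and method names are made up for illustration.

```ruby
# Relational style: every lookup filters (or index-searches) one global join table,
# so the work depends on how much data the table holds overall.
JOIN_TABLE = [[1, 2], [1, 3], [2, 4], [3, 5], [5, 4]].freeze

def friends_via_join(person_id)
  JOIN_TABLE.select { |from, _to| from == person_id }.map { |_from, to| to }
end

# Graph style: each node keeps direct references to its neighbours,
# so one hop only touches that node's own relationships.
ADJACENCY = { 1 => [2, 3], 2 => [4], 3 => [5], 4 => [], 5 => [4] }.freeze

def friends_via_adjacency(person_id)
  ADJACENCY.fetch(person_id)
end

puts friends_via_join(1).inspect
puts friends_via_adjacency(1).inspect
```

Both calls return the same neighbours; the difference is that the second never looks at relationships belonging to other nodes.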

the case for Neo4j
• we can run it embedded in the same JVM
• we can use JRuby as we know Ruby very well already
• lots of good Ruby libraries are available; we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)
• it speaks Cypher
• the guys from Neo Technology are awesome

some graph dbs available:
• Neo4j (JVM)
• FlockDB (JVM)
• DEX (C++)
• OrientDB (JVM)
• Sones GraphDB (C#)

embedded vs. standalone

embedded
pros:
• better performance
• transaction support
• neo4j gem is available
• we can use Cypher and traversal
cons:
• only the code running the db has access to the db

standalone
pros:
• access via REST API and Cypher
• language independent and code doesn't need to run on the JVM
cons:
• not as performant
• only works with Cypher
• transaction is on a per query basis
• need to write model wrappers for ourselves

gotchas and other stuff to consider:
• testing proved to be difficult and we had to write our own tools
• migrations of schemaless dbs are more difficult to stay on top of and require special solutions in the case of graph dbs
• seeding an embedded database is hard
• graph db partitioning is almost impossible and the whole graph needs to be in memory
• encoding dates and times that are stored in UTC and work across timezones is non-trivial
• nested data structures (hashes and arrays) can't be stored and need to be converted to JSON
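The last two gotchas can be sketched in plain Ruby. This is a minimal illustration under assumptions, not the approach used at Shutl: the encode_properties helper and the sample record are hypothetical, and storing times as epoch seconds is just one possible timezone-independent encoding.

```ruby
require 'json'

# Flatten a record so that every value is a primitive a node property can hold.
# Nested hashes and arrays become JSON strings; times become epoch seconds,
# which do not depend on any timezone.
def encode_properties(record)
  record.each_with_object({}) do |(key, value), props|
    props[key] = case value
                 when Hash, Array then JSON.generate(value)
                 when Time        then value.to_i
                 else value
                 end
  end
end

# Read a JSON-encoded property back into a nested Ruby structure.
def decode_json_property(props, key)
  JSON.parse(props[key])
end

# Hypothetical sample record, for illustration only.
SAMPLE_RECORD = {
  name: 'parcel',
  dimensions: { 'width' => 10, 'height' => 20 },
  collected_at: Time.utc(2013, 5, 28, 12, 0, 0)
}.freeze

ENCODED_RECORD = encode_properties(SAMPLE_RECORD)
puts ENCODED_RECORD.inspect
```

Reading the time back with Time.at(seconds).utc yields the original instant regardless of the server's local timezone.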

Querying the graph: Cypher
• declarative query language specific to Neo4j
• easy to learn and intuitive
• enables the user to specify patterns to query for (something that looks like 'this')
• inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
• focuses on what to query for and not how to query for it
• switch from a MySQL world is made easier by the use of Cypher instead of having to learn a traversal framework straight away

cypher clauses
• START: Starting points in the graph, obtained via index lookups or by element IDs.
• MATCH: The graph pattern to match, bound to the starting points in START.
• WHERE: Filtering criteria.
• RETURN: What to return.
• CREATE: Creates nodes and relationships.
• DELETE: Removes nodes, relationships and properties.
• SET: Sets values of properties.
• FOREACH: Performs updating actions once per element in a list.
• WITH: Divides a query into multiple, distinct parts.

an example graph
Node 1: me
Node 2: Steve
Node 3: Sam
Node 4: David
Node 5: Megan
me -[:knows]-> Steve -[:knows]-> David
me -[:knows]-> Sam -[:knows]-> Megan
Megan -[:knows]-> David

the query
START me=node(1)
MATCH me-[:knows]->()-[:knows]->fof
RETURN fof
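What this query asks for can be sketched in plain Ruby over the example graph. It only illustrates the friends-of-friends pattern and is not how Cypher executes it; the constant and method names are hypothetical.

```ruby
# The example graph as an adjacency list of outgoing :knows relationships.
KNOWS_GRAPH = {
  'me'    => %w[Steve Sam],
  'Steve' => %w[David],
  'Sam'   => %w[Megan],
  'Megan' => %w[David],
  'David' => []
}.freeze

# Rough equivalent of: MATCH me-[:knows]->()-[:knows]->fof
# Follow exactly two outgoing :knows hops from the start node.
def friends_of_friends(graph, start)
  graph.fetch(start).flat_map { |friend| graph.fetch(friend) }
end

puts friends_of_friends(KNOWS_GRAPH, 'me').inspect
# prints ["David", "Megan"]
```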

START me=node(1)
MATCH me-[:knows*2..]->fof
WHERE fof.name =~ 'Da.*'
RETURN fof
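The variable-length pattern can be pictured the same way. The sketch below is plain Ruby under assumptions: it walks every outgoing path of at least two hops and keeps end nodes whose name matches the whole regular expression, as Cypher's =~ does. It avoids revisiting nodes on a path, which is a simplification of Cypher's matching rules, and all names are hypothetical.

```ruby
# The example graph as an adjacency list of outgoing :knows relationships.
SOCIAL_GRAPH = {
  'me'    => %w[Steve Sam],
  'Steve' => %w[David],
  'Sam'   => %w[Megan],
  'Megan' => %w[David],
  'David' => []
}.freeze

# Collect the end node of every path from `start` that is at least
# `min_hops` long (like [:knows*2..]). One entry per path, so a node
# reachable along two different paths appears twice.
def path_ends(graph, start, min_hops, depth = 0, seen = [start])
  graph.fetch(start).flat_map do |neighbour|
    next [] if seen.include?(neighbour) # never revisit a node on the same path
    here = depth + 1 >= min_hops ? [neighbour] : []
    here + path_ends(graph, neighbour, min_hops, depth + 1, seen + [neighbour])
  end
end

# Rough equivalent of: WHERE fof.name =~ 'Da.*'
def distant_friends_matching(graph, start, pattern)
  path_ends(graph, start, 2).select { |name| name.match?(pattern) }
end

puts distant_friends_matching(SOCIAL_GRAPH, 'me', /\ADa.*\z/).uniq.inspect
# prints ["David"]
```

David is reachable along two paths here (via Steve, and via Sam and Megan), which is why the sketch calls uniq before printing.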

a good place to try it out:
http://console.neo4j.org/

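representing dates/times
[diagram: a date tree. root (0) links to year nodes (Year: 2013, Year: 2014), year nodes link to month nodes (Month: 01, 05, 06) and month nodes link to day nodes (Day: 24, 25, 26); each relationship is typed with the year, month or day value it leads to. Event 1, Event 2 and Event 3 are attached to day nodes by :happens relationships]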

find all events on a specific day
START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
RETURN event
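The lookup this query performs amounts to walking year, month and day keys from the root. A minimal plain Ruby sketch, assuming nested hashes stand in for the tree; which event falls on which day is an assumption, as the diagram does not survive in text form.

```ruby
# Date tree as nested hashes: root -> year -> month -> day -> events.
# Keys mirror the relationship types used in the query ('2013', '05', '24').
# Assumption for illustration: the placement of events on days is a guess.
DATE_TREE = {
  '2013' => {
    '05' => {
      '24' => ['Event 1'],
      '25' => ['Event 2'],
      '26' => ['Event 3']
    }
  }
}.freeze

# Walk one step per level; a missing year, month or day yields no events.
def events_on(tree, year, month, day)
  tree.dig(year, month, day) || []
end

puts events_on(DATE_TREE, '2013', '05', '24').inspect
# prints ["Event 1"]
```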

[diagram: the same date tree, with the day nodes 24, 25 and 26 additionally chained by :next relationships]

find all events for a given range
START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
start-[:next*0..]-middle-[:next*0..]-end,
middle-[:happens]-event
RETURN event
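The :next chain turns a range query into a walk along a linked list of days. A plain Ruby sketch of that idea follows; the Day struct, the helper names and the event placement are all hypothetical.

```ruby
# A day node with its events and a pointer to the following day (:next).
Day = Struct.new(:label, :events, :next_day)

# Build day nodes in calendar order and chain neighbours together.
def build_days(events_by_label)
  days = events_by_label.map { |label, events| Day.new(label, events, nil) }
  days.each_cons(2) { |earlier, later| earlier.next_day = later }
  days
end

# Collect events from first_day up to and including the day labelled
# last_label. If that label is never reached, the walk stops at the
# end of the chain.
def events_between(first_day, last_label)
  events = []
  day = first_day
  while day
    events.concat(day.events)
    break if day.label == last_label
    day = day.next_day
  end
  events
end

may_days = build_days('24' => ['Event 1'], '25' => ['Event 2'], '26' => ['Event 3'])
puts events_between(may_days.first, '26').inspect
# prints ["Event 1", "Event 2", "Event 3"]
```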

[diagram: the same date tree with :next relationships between the day nodes; Event 1 is node 20]

does an event happen on a certain date?
START event=node(20)
MATCH event-[:happens]-()-[:`24`]-()-[:`05`]-()-[:`2013`]-()
RETURN event

testing and importing:
• we are using RSpec for all tests on the API and practice TDD/BDD
• setting up 'scenarios' for an integration test was difficult and slow with existing tools
• we decided to build our own DSL based on the Geoff notation developed by Nigel Small to allow for the setting up of scenarios and for the import of data from MySQL

geoff:
developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)
allows modelling of graphs in a human readable form
(A) {"name": "Alice"}
(B) {"name": "Bob"}
(A)-[:KNOWS]->(B)
and provides a java interface to insert them into an existing graph
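To show how little there is to the notation, here is a toy reader for the three line shapes above, written in plain Ruby. It is not the Geoff library or its Java interface, it covers only what appears on this slide, and parse_geoff and its output format are hypothetical.

```ruby
require 'json'

# (A) {"name": "Alice"}   -> a node with an optional JSON property map
NODE_LINE = /\A\((\w+)\)\s*(\{.*\})?\z/
# (A)-[:KNOWS]->(B)       -> a directed, typed relationship
REL_LINE  = /\A\((\w+)\)-\[:(\w+)\]->\((\w+)\)\z/

# Turn Geoff-style text into a hash of nodes and a list of relationships.
def parse_geoff(text)
  graph = { nodes: {}, relationships: [] }
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if (match = REL_LINE.match(line))
      graph[:relationships] << { from: match[1], type: match[2], to: match[3] }
    elsif (match = NODE_LINE.match(line))
      graph[:nodes][match[1]] = match[2] ? JSON.parse(match[2]) : {}
    else
      raise ArgumentError, "unrecognised line: #{line}"
    end
  end
  graph
end

GEOFF_TEXT = <<~GEOFF
  (A) {"name": "Alice"}
  (B) {"name": "Bob"}
  (A)-[:KNOWS]->(B)
GEOFF

puts parse_geoff(GEOFF_TEXT).inspect
```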

geoff-importer gem (https://github.com/shutl/geoff-importer)
• imports any geoff file into a neo4j db
• it is open source

geoff gem (https://github.com/shutl/geoff)
• provides a DSL for creating a graph and inserting it into the db
• it is open source
• it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
• it supports only the graph structure of the neo4j gem at the moment
• we haven't solved all the issues with event listeners yet

geoff gem (https://github.com/shutl/geoff)
Geoff(Company, Person) do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end
  company 'Github' do
    outgoing :customers do
      person 'Tom'
      person 'Dick'
      person 'Harry'
    end
  end
  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end

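[diagram: the resulting graph. The root node links via :company and :person to one node per model class; those nodes link via :all to the companies (Acme with address "13 Something Road", GitHub, NeoTech) and to the people (Geoff, Nigel Small, Tom, Dick, Harry); companies link to people via :employees and :customers]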

QUESTIONS?
Volker Pacher
volker@shutl.com
www.shutl.com
