Relational DBMSs and document databases use the "JOIN" operation to connect records and documents. Is there a better way to connect things? This presentation illustrates how OrientDB manages relationships using the same technique as graph databases, for super-fast traversal.
1. Why Relationships are cool
but the "JOIN" sucks
Luca Garulli
Founder and CEO
@Orient Technologies Ltd
Author of OrientDB
www.twitter.com/lgarulli
(c) Luca Garulli
Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License
www.orientechnologies.com
2. 1979: First Relational DBMS available as a product
2009: NoSQL movement
3. 1979: First Relational DBMS available as a product
Hey, 30 years is a huge span in the IT field!
2009: NoSQL movement
4. Before 2009, development teams
always argued over which to choose:
Operating System
Programming Language
Middleware (App Servers)
What about the Database?
5. One of the main reasons RDBMS users
resist moving to a NoSQL product
is the complexity of the model:
"OK, NoSQL products are great for
Big Data and big scale,
but..."
6. ...what about the model?
7. What is the NoSQL answer
to managing complex domains?
Key-Value stores? Column-Based? Document databases?
No relationship support
Graph databases!
8. Why don't most NoSQL products
support relationships
between entities?
9. To understand why,
let's see how a
Relational DBMS
manages them
10. Domain: the super minimal "Selling App"
Diagram: Registry system (Customer, Address) and Order system (Order, Stock)
11. Domain: the super minimal "Selling App"
Diagram: same entities as the previous slide
How does a
Relational DBMS
manage relationships?
12. Relational World: 1-1 Relationships
Customer (primary key: Id; foreign key: Address)
Id | Name  | Address
10 | Luca  | 34
11 | Jill  | 44
34 | John  | 54
56 | Mark  | 66
88 | Steve | 68

Address (primary key: Id)
Id | Location
34 | Rome
44 | London
54 | Moscow
66 | New Mexico
68 | Palo Alto

JOIN Customer.Address -> Address.Id
13. Relational World: 1-N Relationships
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve

Address
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Moscow
66 | 56       | Cologne
68 | 88       | Palo Alto

Inverse JOIN Address.Customer -> Customer.Id
14. Relational World: N-M Relationships
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve

CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 44

Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto

Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
15. What's wrong with the
Relational Model?
16. The JOIN is evil!
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve

CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24

Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto

These are all JOINs executed
every time you traverse a
relationship!
17. A JOIN means searching for a key in
another table
The first rule to improve performance
is to index all the keys
An index speeds up searches, but slows down
inserts, updates and deletes
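A minimal Python sketch of a JOIN executed as index lookups, reusing the rows of the 1-1 example. Assumption: the index is a sorted list searched with `bisect`, standing in for a balanced tree, so each lookup costs O(log N) and each insert must keep the keys sorted. The `SortedIndex` class, the `join_customer_address` function and the extra `Udine` row are hypothetical, for illustration only.

```python
from bisect import bisect_left

# Rows from the 1-1 example: Customer(Id, Name, Address) and Address(Id, Location)
customers = [(10, "Luca", 34), (11, "Jill", 44), (34, "John", 54),
             (56, "Mark", 66), (88, "Steve", 68)]
addresses = [(34, "Rome"), (44, "London"), (54, "Moscow"),
             (66, "New Mexico"), (68, "Palo Alto")]


class SortedIndex:
    """Toy index on one column: sorted keys, binary-searched on lookup."""

    def __init__(self, rows, key_pos=0):
        self.key_pos = key_pos
        pairs = sorted((row[key_pos], row) for row in rows)
        self.keys = [key for key, _ in pairs]
        self.rows = [row for _, row in pairs]

    def insert(self, row):
        # Every insert must keep the keys sorted: the write cost an index adds
        i = bisect_left(self.keys, row[self.key_pos])
        self.keys.insert(i, row[self.key_pos])
        self.rows.insert(i, row)

    def lookup(self, key):
        # O(log N) binary search: the work a JOIN repeats for every row
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]
        return None


def join_customer_address(customers, address_index):
    """JOIN Customer.Address -> Address.Id, one index lookup per customer."""
    joined = []
    for _id, name, address_id in customers:
        address = address_index.lookup(address_id)
        if address is not None:
            joined.append((name, address[1]))
    return joined


address_index = SortedIndex(addresses)
print(join_customer_address(customers, address_index))
# [('Luca', 'Rome'), ('Jill', 'London'), ('John', 'Moscow'), ('Mark', 'New Mexico'), ('Steve', 'Palo Alto')]

address_index.insert((70, "Udine"))  # hypothetical new row
print(address_index.lookup(70))      # (70, 'Udine')
```

Note that the lookup cost is paid again on every query: nothing about the result is remembered between JOINs.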
18. So, in the best case, a JOIN is a lookup
into an index
This is done for every single join!
If you traverse hundreds of relationships,
you're executing hundreds of JOINs
19. Index Lookup:
is it really that fast?
20. Index Lookup: how does it work?
Diagram: an index tree whose root A-Z splits into A-L and M-Z
Think of an
address book
where we have to find
Luca's phone
number
21. Index Lookup: how does it work?
A-Z
  A-L
    A-D
    E-L
  M-Z
    M-R
    S-Z
Most index algorithms are
similar and based on
balanced trees
22. Index Lookup: how does it work?
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
      H-L
  M-Z
    M-R
    S-Z
23. Index Lookup: how does it work?
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L
  M-Z
    M-R
    S-Z
24. Index Lookup: how does it work?
A-Z -> A-L -> E-L -> H-L -> K-L -> Luca: Found!
This lookup took 5
steps, and the number of steps
grows with the index
size!
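The step count above can be reproduced with a small sketch. Assumption: a binary search over a sorted sequence stands in for a balanced binary tree (each comparison descends one level). Real B-tree indexes have a much larger fan-out and therefore fewer levels, but their cost still grows as O(log N). The function name `lookup_steps` is hypothetical.

```python
def lookup_steps(keys, target):
    """Binary search over sorted keys; returns (found, nodes visited)."""
    lo, hi = 0, len(keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1                # visiting one node of the implicit balanced tree
        mid = (lo + hi) // 2
        if keys[mid] == target:
            return True, steps
        if keys[mid] < target:
            lo = mid + 1          # descend into the right subtree
        else:
            hi = mid - 1          # descend into the left subtree
    return False, steps


# range() is indexable without materializing the keys, so even a billion works
for size in (1_000, 1_000_000, 1_000_000_000):
    found, steps = lookup_steps(range(size), size - 1)
    print(f"{size:>13,} keys -> {steps} steps")
# prints 10, 20 and 30 steps respectively
```

Every thousandfold growth of the index adds about ten more steps to each single lookup.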
25. Can you imagine
how many steps a
lookup takes in an
index with millions or billions
of records?
26. And this JOIN is executed
for each involved table,
multiplied by
the number of scanned records!
27. Querying multiple tables can easily
produce millions of JOINs/lookups!
Here's the rule: more entries
= more lookup steps = slower JOIN
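A back-of-the-envelope sketch of that rule, under loudly simplified assumptions: every scanned record triggers one lookup per joined table, and every lookup costs as many steps as the depth of a balanced binary tree over the index (floor(log2 N) + 1). The figures of 100,000 scanned records and 3 joined tables, and the function names, are hypothetical.

```python
def steps_per_lookup(index_size: int) -> int:
    # Depth of a balanced binary tree holding index_size keys: floor(log2 N) + 1
    return index_size.bit_length()


def estimated_join_steps(scanned_records: int, joined_tables: int, index_size: int):
    """Return (number of index lookups, total tree steps) for a multi-table JOIN."""
    lookups = scanned_records * joined_tables
    return lookups, lookups * steps_per_lookup(index_size)


for size in (1_000, 1_000_000, 1_000_000_000):
    lookups, steps = estimated_join_steps(100_000, 3, size)
    print(f"index of {size:,} keys: {lookups:,} lookups, ~{steps:,} steps")
# index of 1,000 keys: 300,000 lookups, ~3,000,000 steps
# index of 1,000,000 keys: 300,000 lookups, ~6,000,000 steps
# index of 1,000,000,000 keys: 300,000 lookups, ~9,000,000 steps
```

The number of lookups stays fixed, but the total work keeps rising as the indexed tables grow.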
28. Oh! This is why
the performance of my database
drops as
it becomes bigger,
and bigger,
and bigger!
29. What about
Document Databases
like MongoDB?
31. MongoDB uses the same approach:
it stores the _id of the connected
documents. At run time it looks up
the _id using an index.
32. Is there a better way to
manage relationships?
33. "A graph database is any
storage system
that provides
index-free adjacency"
- Marko Rodriguez
(author of TinkerPop Blueprints)
34. How does a GraphDB manage
index-free relationships?
35. Every developer knows
the Relational Model,
but who knows the
Graph one?
36. Back to school:
Graph Theory crash course
37. Basic Graph
Luca --Likes--> NoSQL Day
38. Property Graph Model*
Edges are directed
Luca (vertex): name: Luca, surname: Garulli, company: Orient Tech
Likes (edge): since: 2013
NoSQL Day (vertex): date: Nov 15, 2013
Vertices and Edges
can have properties
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
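A minimal Python sketch of the Property Graph Model described above, assuming plain dataclasses: vertices and edges both carry a property map, and each edge is directed from an `out_vertex` to an `in_vertex`. The class names and the vertex labels `Person` and `Event` are hypothetical; this is not the Blueprints API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # where the directed edge starts
    in_vertex: Vertex   # where the directed edge points
    properties: dict = field(default_factory=dict)


luca = Vertex("Person", {"name": "Luca", "surname": "Garulli", "company": "Orient Tech"})
nosql_day = Vertex("Event", {"name": "NoSQL Day", "date": "Nov 15, 2013"})
likes = Edge("Likes", luca, nosql_day, {"since": 2013})

print(f"{likes.out_vertex.properties['name']} --{likes.label}--> "
      f"{likes.in_vertex.properties['name']} (since {likes.properties['since']})")
# Luca --Likes--> NoSQL Day (since 2013)
```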
41. Congratulations, this is your diploma in
«Graph Theory»
42. Graph theory
is so simple,
yet so powerful
43. Let's go back
to the graph stuff:
How does OrientDB
manage relationships?
44. OrientDB: traverse a relationship
The Record ID (RID), written #cluster-id:position,
is the physical position of the record
Luca (vertex), RID = #13:35: label = "Customer", name = "Luca"
Rome (vertex), RID = #13:100: label = "Address", name = "Rome"
45. OrientDB: traverse a relationship
The Edge's RID is saved
inside both vertices, as
«out» and «in»
Luca (vertex), RID = #13:35: out = [#14:54], label = "Customer", name = "Luca"
Lives (edge), RID = #14:54: out = [#13:35], in = [#13:100], label = "Lives"
Rome (vertex), RID = #13:100: in = [#14:54], label = "Address", name = "Rome"
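To make index-free adjacency concrete, here is a toy record store in Python. Assumptions: each cluster is a plain list, and a RID is simply (cluster id, position in that list), so loading a record is direct addressing with no key search; OrientDB's real storage layer is far more elaborate. The `Storage` class and the `out_Lives`/`in_Lives` field names are illustrative only, and the positions differ from the slide's (#13:0 instead of #13:35).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RID:
    cluster: int
    position: int

    def __str__(self) -> str:
        return f"#{self.cluster}:{self.position}"


class Storage:
    """Toy store: clusters are lists, so a RID addresses a slot directly."""

    def __init__(self, n_clusters: int = 16):
        self.clusters = [[] for _ in range(n_clusters)]

    def save(self, cluster: int, record: dict) -> RID:
        self.clusters[cluster].append(record)
        return RID(cluster, len(self.clusters[cluster]) - 1)

    def load(self, rid: RID) -> dict:
        # No index, no search: jump straight to the slot named by the RID
        return self.clusters[rid.cluster][rid.position]


db = Storage()
luca_rid = db.save(13, {"@class": "Customer", "name": "Luca", "out_Lives": []})
rome_rid = db.save(13, {"@class": "Address", "name": "Rome", "in_Lives": []})
lives_rid = db.save(14, {"@class": "Lives", "out": luca_rid, "in": rome_rid})

# The edge's RID is written inside both vertices
db.load(luca_rid)["out_Lives"].append(lives_rid)
db.load(rome_rid)["in_Lives"].append(lives_rid)


def lives_in(db: Storage, customer_rid: RID) -> list:
    """Customer -> Lives edge -> Address: direct loads only, no JOIN."""
    customer = db.load(customer_rid)
    return [db.load(db.load(edge)["in"])["name"] for edge in customer["out_Lives"]]


print(lives_in(db, luca_rid))         # ['Rome']
print(luca_rid, rome_rid, lives_rid)  # #13:0 #13:1 #14:0
```

The traversal cost depends only on how many edges are followed, not on how many records the clusters hold.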
48. A GraphDB handles relationships as a
physical LINK to the record on the other side,
assigned when the edge is created
An RDBMS computes the
relationship every time you query the database
Isn't that crazy?!
49. This means jumping from an
O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected
by database size!
This is huge in the Big Data age
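A worked comparison under the simplifying assumption used throughout the talk: each JOIN is one lookup in a balanced tree with fan-out b over N keys, while following a stored link is one direct load of constant cost c. Crossing k relationships then costs roughly:

```latex
C_{\mathrm{JOIN}}(k, N) \approx k \cdot \left\lceil \log_b N \right\rceil
\qquad \text{vs.} \qquad
C_{\mathrm{link}}(k) \approx k \cdot c
```

For example, with k = 100 hops, fan-out b = 100 and N = 10^9 keys, ceil(log_100 10^9) = ceil(4.5) = 5, so the JOIN side visits about 500 tree nodes while the link side performs 100 direct loads; only the first figure grows as N grows.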
50. OrientDB: an Open Source (Apache licensed)
document-graph NoSQL DBMS
51. In the Blueprints micro-benchmark,
on common hardware, with a hot cache,
OrientDB traverses 29.6 million
records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home
with an RDBMS*!
*unless you live in Google's server farm
52. Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.6.1 (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs
orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs
54. Query the graph in SQL
orientdb> select in('Lives') from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out_Lives           |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).
Incoming vertices
55. More on query power
orientdb> select sum( out('Order').total ) from Customer
          where name = 'Luca'
orientdb> traverse both('Friend')
          from Customer while $depth <= 7
orientdb> select from (
            traverse both('Friend')
            from Customer while $depth <= 7
          ) where @class='Customer' and city.name = 'Udine'
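The `traverse both('Friend') from Customer while $depth <= 7` query walks Friend edges in both directions, stopping at depth 7. A rough Python analogue, assuming the graph fits in an in-memory adjacency map; the function names and the Friend edges are hypothetical, and this sketch visits breadth-first, so its visiting order may differ from OrientDB's TRAVERSE.

```python
from collections import deque


def both_directions(edges):
    """Adjacency map where every edge can be crossed either way, like both()."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def traverse(adjacency, root, max_depth):
    """Breadth-first walk from root, keeping vertices with depth <= max_depth."""
    depth_of = {root: 0}
    queue = deque([root])
    visited = []
    while queue:
        vertex = queue.popleft()
        depth = depth_of[vertex]
        visited.append((vertex, depth))
        if depth == max_depth:
            continue  # like `while $depth <= max_depth`: do not go deeper
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in depth_of:
                depth_of[neighbor] = depth + 1
                queue.append(neighbor)
    return visited


# Hypothetical Friend edges between the customers used earlier in the talk
friends = both_directions([("Luca", "Jill"), ("Jill", "John"),
                           ("John", "Mark"), ("Mark", "Steve")])
print(traverse(friends, "Luca", max_depth=2))
# [('Luca', 0), ('Jill', 1), ('John', 2)]
```

Each hop only reads the neighbor list already attached to the current vertex, which is what makes deep traversals cheap in a graph model.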
56. Query vs traversal
Once you have a well-connected database
in the form of a Super Graph, you can
cross records instead of querying them!
All you need is a few "Root Vertices"
from which to start traversing
58. Root Vertices can be enriched by
Meta Graphs
that decorate graphs with
additional information
and make retrieval
easier and faster
59. Time-based Meta Graph
Calendar
  Year: 2013
    Month: April 2013
      Day: 9/4/2013
        Hour: 9/4/2013 09:00
          Order 2332
        Hour: 9/4/2013 10:00
          Order 2333
          Order 2334
60. Location-based Meta Graph
Location
  Country: Italy
    Region: Lazio
      State: RM
        City: Fiumicino
        City: Rome
Orders 2332, 2333 and 2334 are each linked to the City where they were sold
61. Mix & Merge graphs
Diagram: the Location meta graph (Location -> Country Italy -> Region Lazio -> State RM -> City Rome, City Fiumicino) and the Calendar meta graph (Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 09:00, Hour 10:00) merged into one graph; Orders 2332, 2333 and 2334 are connected both to their City and to their Hour
62. Get all the orders
sold in "Fiumicino" city
on 9/4/2013 at 10:00
Diagram: the merged Location and Calendar graph
63. Start from Calendar, look for Hour 10:00
Diagram: the merged graph, walked from Calendar down to Hour 9/4/2013 10:00
64. Start from Calendar, look for Hour 10:00
Found 2 Orders; now filter them
by incoming edges from City
Diagram: the merged graph, with Orders 2333 and 2334 reached from Hour 10:00
65. Start from Calendar, look for Hour 10:00
Only "Order 2333" has
incoming connections
with "Fiumicino"
Diagram: the merged graph, with Order 2333 as the result
66. Or start from Location, look for Fiumicino
Diagram: the merged Location and Calendar graph
67. Start from Location, look for Fiumicino
Diagram: the merged graph, walked from Location down to City Fiumicino
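Both walkthroughs (from Calendar and from Location) can be sketched in Python with two adjacency maps, one for outgoing and one for incoming edges. Assumptions: vertices are identified by their display names, and the links of orders 2332 and 2334 to their cities are guesses (the slides only establish that order 2333 is the one sold in Fiumicino at 10:00). The helper names `build` and `walk` are hypothetical.

```python
# Directed edges of the merged Calendar + Location meta graph (vertex names as labels)
EDGES = [
    ("Calendar", "Year 2013"),
    ("Year 2013", "Month April 2013"),
    ("Month April 2013", "Day 9/4/2013"),
    ("Day 9/4/2013", "Hour 09:00"),
    ("Day 9/4/2013", "Hour 10:00"),
    ("Hour 09:00", "Order 2332"),
    ("Hour 10:00", "Order 2333"),
    ("Hour 10:00", "Order 2334"),
    ("Location", "Country Italy"),
    ("Country Italy", "Region Lazio"),
    ("Region Lazio", "State RM"),
    ("State RM", "City Fiumicino"),
    ("State RM", "City Rome"),
    ("City Fiumicino", "Order 2332"),  # assumed city for this order
    ("City Fiumicino", "Order 2333"),
    ("City Rome", "Order 2334"),       # assumed city for this order
]


def build(edges):
    """Return (outgoing, incoming) adjacency maps."""
    out_adj, in_adj = {}, {}
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)
    return out_adj, in_adj


def walk(out_adj, root, path):
    """Follow outgoing edges from a root vertex through the named vertices."""
    current = root
    for step in path:
        if step not in out_adj.get(current, []):
            raise KeyError(f"{current!r} has no edge to {step!r}")
        current = step
    return current


out_adj, in_adj = build(EDGES)

# Start from Calendar, look for Hour 10:00, keep orders linked from Fiumicino
hour = walk(out_adj, "Calendar", ["Year 2013", "Month April 2013", "Day 9/4/2013", "Hour 10:00"])
from_calendar = [o for o in out_adj[hour] if "City Fiumicino" in in_adj[o]]

# Or start from Location, look for Fiumicino, keep orders linked from Hour 10:00
city = walk(out_adj, "Location", ["Country Italy", "Region Lazio", "State RM", "City Fiumicino"])
from_location = [o for o in out_adj[city] if "Hour 10:00" in in_adj[o]]

print(out_adj[hour])   # ['Order 2333', 'Order 2334']
print(from_calendar)   # ['Order 2333']
print(from_location)   # ['Order 2333']
```

Either root gives the same answer; which one to start from is a choice of which side narrows the candidates fastest.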
68. This is your database
69. Get the last customer who bought "Barolo"
select last(out('Order').in('Customer')) from Stock
where name = 'Barolo'
#34:22
70. Get his country
select out('City') from #34:22
Udine, Italy
#55:12
71. Get orders from that country
select in('Customer') from #55:12
72. Let's move like a
spider
on the web