1. MAD · NOV 22-23 · 2019
In a hyperconnected world,
graph databases
are your secret weapon
Javier Ramirez
Technical Evangelist. Amazon Web Services
2. Six degrees of Bacon
SAGIndie from Hollywood, USA - Flickr
CC BY 2.0
3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
IF BY DEFAULT YOU THINK IN TABLES, YOU NEED
PROFESSIONAL HELP (OR SOME HOLIDAYS)
Purpose-built for a business process
Purpose-built to answer questions about
relationships
4. HIGHLY CONNECTED DATA
Retail Fraud Detection
Restaurant Recommendations
Social Networks
5. USE CASES FOR HIGHLY CONNECTED DATA
Social Networking
Recommendations
Knowledge Graphs
Fraud Detection
Life Sciences
Network & IT Operations
6. RECOMMENDATIONS BASED ON RELATIONSHIPS
7. KNOWLEDGE GRAPH APPLICATIONS
What museums should Alice visit while in Paris?
Who painted the Mona Lisa?
What artists have paintings in The Louvre?
8. NAVIGATE A WEB OF GLOBAL TAX POLICIES
“Our customers are increasingly required to navigate a complex web of global tax policies and
regulations. We need an approach to model the sophisticated corporate structures of our
largest clients and deliver an end-to-end tax solution. We use a microservices architecture
approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based
system to quickly create links within the data.”
said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting
10. Airlines use case
Legacy systems running on mainframes backed by relational databases with complex workflow
engines and state machines.
Everything could be modeled as entities and relationships: planes, parts, maintenance
locations, workstations able to perform the work, availability of parts at given workstations,
personnel, and personnel skill sets. Handling the impact of sudden logistic changes and
reassignments would be greatly simplified.
12. RELATIONAL DATABASE CHALLENGES BUILDING APPS WITH HIGHLY CONNECTED DATA
Unnatural for querying graph
Inefficient graph processing
Rigid schema inflexible for changing data
13. A GRAPH DATABASE IS OPTIMIZED FOR EFFICIENT STORAGE AND RETRIEVAL OF HIGHLY CONNECTED DATA
14. GRAPHS ARE INTUITIVE.
TRIADIC CLOSURE – CLOSING TRIANGLES
[Diagram: Terry, Bill and Sarah connected pairwise by FRIEND edges]
15. RECOMMENDING NEW CONNECTIONS
[Diagram: Terry]
16. IMMEDIATE FRIENDSHIPS
[Diagram: Terry and Bill joined by a FRIEND edge]
17. MEANS AND MOTIVE
[Diagram: Terry, Bill and Sarah with two FRIEND edges]
18. RECOMMENDATION
[Diagram: Terry, Bill and Sarah with two FRIEND edges]
19. LEADING GRAPH MODELS AND FRAMEWORKS.
2 MODELS, 2 QUERY LANGUAGES
PROPERTY GRAPH
• Open Source Apache TinkerPop
• Gremlin Traversal Language
RESOURCE DESCRIPTION FRAMEWORK (RDF)
• W3C Standard
• SPARQL Query Language
22. Property Graph versus RDF
Abstraction level
• Property Graph: Vertices and connecting edges
• RDF: Triples (or quads)
Data model
• Property Graph: Naturally supports edge properties, strongly typed literals
• RDF: Multiple graphs, custom datatypes (with loose typing constraints)
Data reuse and publishing
• Property Graph: Typically application-specific model, not primarily designed for data sharing
• RDF: Access to Linked Open Data useful in many domains, ease of data publishing and sharing by use of global URIs and shared vocabulary
23. Gremlin versus SPARQL
Standardisation
• Gremlin (Property Graph): Apache TinkerPop spec as de-facto standard; Neptune has some implementation differences
• SPARQL (RDF): Based on W3C standards, with different standards on top (LDP, R2RML, etc.)
Query language features
• Gremlin: Path extraction, iterative looping, coin flips; closer to an algorithmic approach
• SPARQL: Optional selection of patterns, variable selection, nested subqueries
Approach
• Gremlin: DSL-like graph traversals (low fanout queries), path extraction
• SPARQL: Clause-based pattern matching (high fanout queries), entity extraction
Learning curve
• Gremlin: Easy to get started with simple queries, steep learning curve for complex queries
• SPARQL: Initial learning curve steeper, but easier to generalize to complex queries
24. RDF: URIs as Globally Unique Identifiers
URIs to identify nodes and edge labels
<https://permid.org/1-4295902158>
=> identifies the company “Netflix Inc”
organization:isIncorporatedIn (1)
=> identifies the relationship “is incorporated in”
<http://sws.geonames.org/6252001/>
=> identifies country “USA”
(1) This is a shortcut for <http://permid.org/ontology/organization/isIncorporatedIn>.
RDF uses XML prefix notation, where the prefix organization is a shortcut
for <http://permid.org/ontology/organization/>.
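The prefix shortcut described on this slide can be sketched in a few lines of Python. This is only an illustration: the `PREFIXES` table and the `expand` helper are hypothetical names, and the only data used is the prefix and the three identifiers quoted on the slide.

```python
# Hypothetical prefix table; the single entry comes from the slide's footnote.
PREFIXES = {
    "organization": "http://permid.org/ontology/organization/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in %r" % prefixed_name)
    return prefixes[prefix] + local


# One RDF triple (subject, predicate, object) built from the slide's identifiers.
triple = (
    "https://permid.org/1-4295902158",        # the company "Netflix Inc"
    expand("organization:isIncorporatedIn"),  # the relationship "is incorporated in"
    "http://sws.geonames.org/6252001/",       # the country "USA"
)
```

Because every part of the triple is a global URI, another dataset can refer to the same company or country simply by reusing the same string.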
25. Querying RDF Using SPARQL (2)
[Diagram: a graph pattern of ?node variables connected by ?property edges]
26. The Power of URIs: Linked Data
Linking across datasets by referencing globally unique URIs
GeoNames
Wikidata
PermID
Example: PermID (re)uses <http://sws.geonames.org/6252001/>
as a global Identifier for the USA, which is an identifier rooted in GeoNames.
27. The Linked Open Data Cloud
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ (CC-BY-SA)
Example: SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms)
28. The Power of Linked Data
Data from Wikidata
Data from PermID
Data from GeoNames
29. PROPERTY GRAPH
A property graph is a set of vertices and edges with respective properties (i.e. key/value pairs)
• A vertex represents an entity/domain
• An edge represents a directional relationship between vertices
• Each edge has a label that denotes the type of relationship
• Each vertex and edge has a unique identifier
• Vertices and edges can have properties
• Properties express non-relational information about the vertices and edges
[Diagram: two User vertices (name: Bill; name: Sarah) joined by a FRIEND edge with the property Since 11/29/16]
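A minimal in-memory sketch of the property graph model described on this slide, in Python. The class and method names (`Vertex`, `Edge`, `PropertyGraph`, `out`) and the identifiers `u1`, `u2`, `e1` are hypothetical, and the direction of the FRIEND edge (Bill to Sarah) is an assumption; this is not how Neptune or TinkerPop store data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: str                      # unique identifier
    label: str                   # kind of entity, e.g. "User"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: str                      # unique identifier
    label: str                   # type of relationship, e.g. "FRIEND"
    out_v: str                   # id of the vertex the edge leaves
    in_v: str                    # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge: Edge) -> None:
        if edge.out_v not in self.vertices or edge.in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)

    def out(self, vertex_id: str, label: str) -> List[Vertex]:
        """Vertices reached by following outgoing edges with the given label."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vertex_id and e.label == label]


# The example from the slide; edge direction Bill -> Sarah is assumed.
graph = PropertyGraph()
graph.add_vertex(Vertex("u1", "User", {"name": "Bill"}))
graph.add_vertex(Vertex("u2", "User", {"name": "Sarah"}))
graph.add_edge(Edge("e1", "FRIEND", "u1", "u2", {"since": "11/29/16"}))
```

Calling `graph.out("u1", "FRIEND")` returns the Sarah vertex, while the same call on `u2` returns nothing because the edge is directional.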
30. Edges – the routes to success
Performance depends on how much of the graph a query
must “touch”
• Choose domain-meaningful edge labels
• Discover only what is absolutely necessary
• “Grey out” unnecessary portions of the graph
31. How fine-grained should my edge labels be?
Is it an open or extensible set of label values?
Do you need to query across values in the set? (e.g. all addresses)
36. Relational DBs should really be called “row databases”
Did you ever feel you were overcomplicating your DB schema by adding intermediate tables to model a complex relationship?
Did you ever use some obscure hack to query hierarchical or nested data?
Did you experience badly degraded performance when having to join many tables (or to self-join a table)?
37. GRAPH VS. RELATIONAL DATABASE MODELING.
Relational model: write a query to give me everything related to a customer.
• You will probably need a mega join or a mega union.
• What if we add two or three more tables in a couple of weeks? What happens to your code?
40. GRAPH VS. RELATIONAL DATABASE MODELING.
* Source : http://www.playnexacro.com/index.html#show:article
[Diagram: relational model next to a graph model subset]
Graph model subset:
• Customers (CompanyName: Acme, …) PURCHASED Order (OrderDate: 8/1/2018, …)
• Order HAS_DETAILS Order Details (UnitPrice: $179.99, …)
• Order Details HAS_PRODUCT Product (ProductName: “Echo”, …)
• Supplier (CompanyName: “Amazon”, …) SUPPLIES Product
41. SQL RELATIONAL DATABASE QUERY
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o              /* Join the customer from the order */
  ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od      /* Join the order details from the order */
  ON (o.OrderID = od.OrderID)
JOIN products AS p            /* Join the products from the order details */
  ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Echo'; /* Find the product named 'Echo' */
Find the name of companies that purchased the ‘Echo’.
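To see the three-way join at work, the query can be run against an in-memory SQLite database. This is a sketch under stated assumptions: the table and column names follow the slide's query, but the column types and all sample rows (including the customer named Globex and the product named Kindle) are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical minimal schema matching the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers     (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE orders        (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE products      (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER);

-- Sample rows are invented for this example.
INSERT INTO customers     VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders        VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO products      VALUES (100, 'Echo'), (101, 'Kindle');
INSERT INTO order_details VALUES (10, 100), (11, 100), (12, 101);
""")

QUERY = """
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o         ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od ON (o.OrderID = od.OrderID)
JOIN products AS p       ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Echo';
"""

# Acme ordered the Echo twice; DISTINCT collapses the two matches into one row.
companies = [row[0] for row in conn.execute(QUERY)]
```

Every hop in the relationship (customer to order, order to details, details to product) costs one more join, which is the point the slide is making.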
42. SPARQL DECLARATIVE GRAPH QUERY
PREFIX sales_db: <http://sales.widget.com/>
SELECT DISTINCT ?comp_name WHERE {
  ?customer sales_db:HAS_ORDER ?order ;        # customer's orders
            sales_db:CompanyName ?comp_name .  # customer's company name
  ?order sales_db:HAS_DETAILS ?order_d .       # order details graph pattern
  ?order_d sales_db:HAS_PRODUCT ?product .     # products graph pattern
  ?product sales_db:ProductName "Echo" .       # the product named "Echo"
}
* Source : http://www.playnexacro.com/index.html#show:article
Find the name of companies that purchased the ‘Echo’.
43. GREMLIN IMPERATIVE GRAPH TRAVERSAL
/* All products named "Echo" */
g.V().hasLabel('Product').has('ProductName', 'Echo')
  .in('HAS_PRODUCT')                 /* Traverse to order details */
  .in('HAS_DETAILS')                 /* Traverse to order */
  .in('HAS_ORDER')                   /* Traverse to Customer */
  .values('CompanyName').dedup()     /* Unique Company Name */
Find the name of companies that purchased the ‘Echo’.
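The traversal above can be mimicked over a tiny in-memory edge list in Python, which makes the "walk the incoming edges backwards" idea concrete. Everything here is an assumption made for illustration: the vertex ids, the sample data, the `ProductName` and `CompanyName` property keys, and the hypothetical `in_` helper, which only imitates the behaviour of Gremlin's `in()` step.

```python
# Hypothetical sample graph: vertex id -> properties.
vertices = {
    "c1": {"label": "Customer", "CompanyName": "Acme"},
    "c2": {"label": "Customer", "CompanyName": "Globex"},
    "o1": {"label": "Order"},
    "o2": {"label": "Order"},
    "d1": {"label": "OrderDetails"},
    "d2": {"label": "OrderDetails"},
    "p1": {"label": "Product", "ProductName": "Echo"},
    "p2": {"label": "Product", "ProductName": "Kindle"},
}

# Directed edges as (out vertex, label, in vertex).
edges = [
    ("c1", "HAS_ORDER", "o1"),
    ("c2", "HAS_ORDER", "o2"),
    ("o1", "HAS_DETAILS", "d1"),
    ("o2", "HAS_DETAILS", "d2"),
    ("d1", "HAS_PRODUCT", "p1"),
    ("d2", "HAS_PRODUCT", "p2"),
]


def in_(current, label):
    """Follow incoming edges with the given label back to their source vertices."""
    return [src for v in current
            for (src, lbl, dst) in edges
            if lbl == label and dst == v]


# Start at all products named "Echo", then walk back to the customers.
start = [vid for vid, props in vertices.items()
         if props["label"] == "Product" and props.get("ProductName") == "Echo"]
customers = in_(in_(in_(start, "HAS_PRODUCT"), "HAS_DETAILS"), "HAS_ORDER")

# Equivalent of values('CompanyName').dedup(), sorted for a stable result.
company_names = sorted({vertices[c]["CompanyName"] for c in customers})
```

Only the vertices and edges reachable from the starting product are visited; the Kindle order and its customer are never touched.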
44. TRIADIC CLOSURE – CLOSING TRIANGLES
[Diagram: Terry, Bill and Sarah connected pairwise by FRIEND edges]
45. Recommend New Connections
g = graph.traversal()
g.V().has('name','Terry').as('user').            // start at Terry
  both('FRIEND').aggregate('friends').           // collect Terry's friends
  both('FRIEND').                                // move on to friends of friends
  where(neq('user')).where(without('friends')).  // drop Terry and his existing friends
  groupCount().by('name').                       // count how often each candidate is reached
  order(local).by(values, desc)                  // most frequently reached first
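The same friend-of-friend recommendation can be sketched in plain Python to make each step explicit. The friendship data below is invented (only Terry, Bill and Sarah appear in the slides; Ann and Bob are hypothetical), and `recommend` is an illustrative helper, not part of any graph library.

```python
from collections import Counter, defaultdict

# Hypothetical undirected FRIEND relationships.
friendships = [
    ("Terry", "Bill"), ("Terry", "Ann"),
    ("Bill", "Sarah"), ("Bill", "Ann"),
    ("Ann", "Sarah"), ("Ann", "Bob"),
]

# Treat FRIEND as bidirectional, like both('FRIEND') in the traversal.
neighbours = defaultdict(set)
for a, b in friendships:
    neighbours[a].add(b)
    neighbours[b].add(a)


def recommend(user):
    """Rank friends of friends by how many of the user's friends lead to them."""
    friends = neighbours[user]
    counts = Counter(
        fof
        for friend in friends
        for fof in neighbours[friend]
        if fof != user and fof not in friends   # not the user, not already a friend
    )
    return counts.most_common()                  # highest count first


recommendations = recommend("Terry")
```

With this data Sarah is reached through both Bill and Ann, so she ranks above Bob, who is reached only through Ann.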
46. FIND TERRY
g.V().has('name','Terry').as('user')
47. FIND TERRY’S FRIENDS
both('FRIEND').aggregate('friends')
48. AND THE FRIENDS OF THOSE FRIENDS
both('FRIEND')
[Diagram: user -FRIEND- friend -FRIEND- fof]
49. ...WHO AREN’T TERRY AND AREN’T FRIENDS WITH TERRY
where(neq('user')).where(without('friends'))
[Diagram: user -FRIEND- friend -FRIEND- fof, with excluded vertices marked X]
50. CHALLENGES OF EXISTING GRAPH DATABASES
Difficult to maintain high availability
Difficult to scale
Limited support for open standards
Too expensive
51. AMAZON NEPTUNE
Fully managed graph database
FAST: Query billions of relationships with millisecond latency
RELIABLE: 6 replicas of your data across 3 AZs with full backup and restore
EASY: Build powerful queries easily with Gremlin and SPARQL
OPEN: Supports Apache TinkerPop & W3C RDF graph models
52. AMAZON NEPTUNE HIGH LEVEL ARCHITECTURE
[Diagram labels: Bulk load from Amazon S3; Database Mgmt.]
53. Fully Managed Service
BENEFITS
• Easily configurable via the console
• Multi-AZ high availability
• Support for up to 15 read replicas
• Supports encryption at rest
• Supports encryption in transit (TLS)
• Backup and restore, point-in-time recovery
54. AMAZON NEPTUNE: VPC DEPLOYMENT
• Secure deployment in a VPC
• Increased availability through deployment in two subnets in two different Availability Zones (AZs)
• Cluster volume always spans three AZs to provide durable storage
• See the Amazon Neptune Documentation for VPC setup details
55. BATTLE-TESTED CLOUD-NATIVE STORAGE ENGINE
OVERVIEW
Data is replicated 6 times across 3 Availability Zones
Continuous backup to Amazon S3 (built for 11 9s durability)
Continuous monitoring of nodes and disks for repair
10 GB segments as unit of repair or hotspot rebalance
Quorum system for read/write; latency tolerant
Quorum membership changes do not stall writes
Storage volume automatically grows up to 64 TB
[Diagram: Amazon Neptune storage nodes spread across AZ 1, AZ 2 and AZ 3, with storage monitoring and backup to Amazon S3]
56. AMAZON NEPTUNE HIGH AVAILABILITY AND FAULT TOLERANCE (CLOUD-NATIVE STORAGE)
What can fail?
Segment failures (disks)
Node failures (machines)
AZ failures (network or datacenter)
Optimizations
4 out of 6 write quorum
3 out of 6 read quorum
Peer-to-peer replication for repairs
[Diagram: two views of Amazon Neptune storage across AZ 1, AZ 2 and AZ 3, with caching]
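The quorum numbers on this slide can be checked with a little arithmetic. The sketch below assumes the 6 copies are spread evenly, 2 per Availability Zone, which the deck does not state explicitly; the function names are hypothetical.

```python
COPIES = 6            # copies of each piece of data (from the slide)
AZS = 3               # Availability Zones (from the slide)
WRITE_QUORUM = 4      # "4 out of 6 write quorum"
READ_QUORUM = 3       # "3 out of 6 read quorum"

# Assumption: copies are spread evenly, i.e. 2 per AZ.
COPIES_PER_AZ = COPIES // AZS


def can_write(lost_copies):
    """Writes succeed while at least WRITE_QUORUM copies are reachable."""
    return COPIES - lost_copies >= WRITE_QUORUM


def can_read(lost_copies):
    """Reads succeed while at least READ_QUORUM copies are reachable."""
    return COPIES - lost_copies >= READ_QUORUM


# Any read quorum overlaps any write quorum, so a read always sees the latest write.
reads_see_latest_write = READ_QUORUM + WRITE_QUORUM > COPIES

# Any two write quorums overlap, so two conflicting writes cannot both succeed.
writes_cannot_diverge = 2 * WRITE_QUORUM > COPIES

survives_az_loss_for_writes = can_write(COPIES_PER_AZ)          # lose one whole AZ
survives_az_plus_one_for_reads = can_read(COPIES_PER_AZ + 1)    # lose an AZ plus one more copy
```

Under the even-spread assumption, losing a whole AZ leaves 4 copies, which is still enough to write, and losing one further copy leaves 3, which is still enough to read.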
57. AMAZON NEPTUNE READ REPLICAS
Availability
• Failing database nodes are automatically detected and replaced
• Failing database processes are automatically detected and recycled
• Replicas are automatically promoted to primary if needed (failover)
• Customer specifiable fail-over order
Performance
• Customer applications can scale out read traffic across read replicas
• Read balancing across read replicas
[Diagram: a primary (master) node and read replicas spread across AZ 1, AZ 2 and AZ 3, with cluster and instance monitoring]
58. AMAZON NEPTUNE FAILOVER TIMES ARE TYPICALLY < 30 SECONDS
[Timeline: replica-aware app running, database failure, failure detection, DNS propagation, recovery, app running; durations shown: 15-20 sec and 3-10 sec]
59. AMAZON NEPTUNE CONTINUOUS BACKUP (CLOUD-NATIVE STORAGE)
• Take periodic snapshots of each segment in parallel; stream the logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously
[Diagram: timeline of segment snapshots and log records for Segments 1 to 3, with a recovery point]
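The restore idea on this slide (start from the right snapshot, then replay the log up to the recovery point) can be sketched for a single segment in Python. This is a toy model, not Neptune's implementation: the key/value state, the timestamps and the `restore` helper are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical snapshots of one segment: (time, state at that time), oldest first.
snapshots = [
    (0, {}),
    (10, {"a": 1, "b": 2}),
]

# Hypothetical log records: (time, key, new value), oldest first.
log = [
    (3, "a", 1),
    (7, "b", 2),
    (12, "a", 5),
    (15, "c", 9),
]


def restore(target_time):
    """Rebuild the segment state as of target_time."""
    times = [t for t, _ in snapshots]
    idx = bisect_right(times, target_time) - 1   # latest snapshot not after the target
    if idx < 0:
        raise ValueError("no snapshot at or before the requested time")
    snap_time, snap_state = snapshots[idx]
    state = dict(snap_state)                      # never mutate the snapshot itself
    for t, key, value in log:
        if snap_time < t <= target_time:          # replay only the records in between
            state[key] = value
    return state
```

For example, `restore(13)` starts from the snapshot taken at time 10 and replays only the record at time 12. With several segments, the same procedure could run for each segment independently, which is what makes the parallel restore described above possible.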
60. AMAZON NEPTUNE ONLINE POINT-IN-TIME RESTORE (CLOUD-NATIVE STORAGE)
Online point-in-time restore is a quick way to bring the database to a particular point in time without having to restore from backups
• Rewinding the database to quickly
• Rewind multiple times to determine the desired point-in-time in the database state
[Diagram: timeline t0 to t4; after rewinding to t1 and later to t3, the discarded ranges are marked Invisible]
63. Example scenario
Employment history application
• People, companies, roles
Use cases
1. Find the companies where X has worked, and their roles at those
companies
2. Find the people who have worked for a company at a specific
location during a particular time period
3. Find the people in more senior roles at the companies where X
worked
64. Identify entities, relationships, and attributes
Find the companies where X has worked, and their roles at those companies
Which companies has X worked for, and in what roles?
• Companies: Company (Entity)
• X: Person (Entity)
• Worked for: Worked for (Relationship)
• Roles: Role (Attribute?)
65. Identify candidate vertices, labels, and properties
• Company (Entity): Vertex, label Company
• Person (Entity): Vertex, label Person
• Worked for (Relationship): Edge, label WORKED_FOR
• Role (Attribute?): Property?, key role
[Diagram: Person and Company vertices; property: name]
66. Entity or attribute? Vertex or property?
Is role an entity or an attribute?
• Does it have identity (or is it a value type)? X
• Is it a complex type (with multiple fields)? X
• Are there any structural relations between values? ?
Keep it simple
• Prefer properties to vertices/edges until the need arises
68. What questions would we have to ask of our data?
Find the people in more senior roles at the companies where X worked
Who were in senior roles at the companies where X worked?
69. Entity or attribute? Vertex or property?
Is role an entity or an attribute?
• Does it have identity (or is it a value type)? X
• Is it a complex type (with multiple fields)? X
• Are there any structural relations between values? ✓
Model structural relations with edges
• Promote role to being a vertex
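Why a structural relation between roles pushes role from a property to a vertex can be shown with a small Python sketch. The role names are invented, and the parent relation borrows the PARENT_ROLE edge label that appears in the deck's Gremlin query; `more_senior_roles` is a hypothetical helper.

```python
# Hypothetical role hierarchy: each role points to the role directly above it,
# mirroring a PARENT_ROLE edge between role vertices.
PARENT_ROLE = {
    "Engineer": "Senior Engineer",
    "Senior Engineer": "Engineering Manager",
    "Engineering Manager": "CTO",
    "Analyst": "Senior Analyst",
}


def more_senior_roles(role):
    """Walk the PARENT_ROLE chain upwards and collect every more senior role."""
    result = []
    seen = {role}                      # guards against accidental cycles in the data
    current = PARENT_ROLE.get(role)
    while current is not None and current not in seen:
        result.append(current)
        seen.add(current)
        current = PARENT_ROLE.get(current)
    return result


senior_to_engineer = more_senior_roles("Engineer")
```

If role were only a string property on an edge, "more senior than" would have to be hard-coded in the application. Once roles are vertices linked to each other, the question becomes a traversal over those links.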
73. Gremlin query 3
g.V('p-3').
  out('JOB').as('j1').
  out('ROLE').
  repeat(out('PARENT_ROLE')).until(outE().count().is(0)).
  emit().in('ROLE').as('j2').
  or(
    (where('j1', between('j2', 'j2')).by('from').by('from').by('to')),
    (where('j1', between('j2', 'j2')).by('to').by('from').by('to')),
    (where('j1', lte('j2').and(gt('j2'))).by('from').by('from').by('to').by('from'))
  ).
  project('role', 'name').
    by(out('ROLE').values('name')).
    by(in('JOB').values('firstName', 'lastName').fold()).
  toList()
82. Wrapping up
Graphs can be applied to a huge number of use cases
A graph database fits more naturally and performs much faster than other databases when working with highly connected data
Scaling out a graph database is not easy. Amazon Neptune makes your life easier
85. Mapping relational to graph (RDF)
W3C Direct Mapping
• “Out-of-the-box” schema for mapping relational data to RDF
• https://www.w3.org/TR/rdb-direct-mapping/
R2RML
• Standard that allows you to specify mappings from relations to graph
• Rules defined over logical tables
• https://www.w3.org/TR/r2rml/
Tooling
• D2RQ – access relational database as virtual, read-only RDF graph
87. Foreign keys
id | … | f_name
12 | … | Alice
37 | … | Bob

id | p_id | type | addr_1
512 | 12 | home | High St
655 | 37 | work | Main St
700 | 12 | work | Any St
89. Join tables
id | … | f_name
12 | … | Alice
37 | … | Bob

id | … | name
512 | … | Any Co
655 | … | Example Co
700 | … | Example.com

p_id | c_id | from | to
12 | 512 | 2012 | 2015
37 | 512 | 2011 | 2016
37 | 655 | 2016 | 2017
12 | 700 | 2015 | 2017
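One way to read this slide is that each row of a join table becomes an edge, and its extra columns become edge properties. The Python sketch below applies that reading to the rows shown above. The vertex id scheme (`p-12`, `c-512`), the Person and Company labels and the `worked_for` helper are assumptions; WORKED_FOR is the edge label suggested earlier in the deck.

```python
# Rows copied from the three tables on the slide.
people = [
    {"id": 12, "f_name": "Alice"},
    {"id": 37, "f_name": "Bob"},
]
companies = [
    {"id": 512, "name": "Any Co"},
    {"id": 655, "name": "Example Co"},
    {"id": 700, "name": "Example.com"},
]
employment = [
    {"p_id": 12, "c_id": 512, "from": 2012, "to": 2015},
    {"p_id": 37, "c_id": 512, "from": 2011, "to": 2016},
    {"p_id": 37, "c_id": 655, "from": 2016, "to": 2017},
    {"p_id": 12, "c_id": 700, "from": 2015, "to": 2017},
]

# Entity rows become vertices (the id scheme is a made-up convention).
vertices = {}
for p in people:
    vertices["p-%d" % p["id"]] = {"label": "Person", "f_name": p["f_name"]}
for c in companies:
    vertices["c-%d" % c["id"]] = {"label": "Company", "name": c["name"]}

# Join-table rows become edges; the from/to columns become edge properties.
edges = [
    {
        "label": "WORKED_FOR",
        "out": "p-%d" % row["p_id"],
        "in": "c-%d" % row["c_id"],
        "from": row["from"],
        "to": row["to"],
    }
    for row in employment
]


def worked_for(person_vertex_id):
    """Companies a person worked for, with the period, found by following edges."""
    return [
        (vertices[e["in"]]["name"], e["from"], e["to"])
        for e in edges
        if e["label"] == "WORKED_FOR" and e["out"] == person_vertex_id
    ]


alice_history = worked_for("p-12")
```

The join table disappears as a separate structure: answering "where did Alice work, and when?" only needs the edges that leave Alice's vertex.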
93. Data load options
Bulk loader API
• Load data from S3 into Neptune
• Low overhead, optimized for large datasets
• Good for append-only loads
Online endpoints
• Gremlin or SPARQL
[Diagram labels: Bulk load from S3; Database Mgmt.]
94. Appendix II
Ignition One. Customer use case presented at AWS Atlanta Summit
https://www.slideshare.net/AmazonWebServices/using-amazon-neptune-to-power-identity-resolution-at-scale-adb303-atlanta-aws-summit