EY has a large and growing graph practice with over 200 consultants globally. They see widespread use of graph technologies across many sectors and have delivered graph solutions to help clients drive insight, efficiency, and value. The document discusses trends driving graph adoption, graph leaders in the market, and EY's point of view on building data fabrics and knowledge graphs to connect and mobilize enterprise data.
Predictions for the Future of Graph Database
1. Future of Graphs
Michael D. Moore, Ph.D.
Managing Director, Advanced Technology
michael.moore4@ey.com
2. EY & Graphs
12 February 2021
Page 2
Graph use cases:
• Plasma Donor 360
• Retail Customer 360
• Customer Identity
• Enterprise Org Design
• FinServ Know Your Customer
• Regulatory Reporting
• Data Lineage
• Anti Money Laundering GCN
• Cruiseline Activity NBA
• Batch Genealogy
• B2B Event NBA
• Capital Projects Cost Visibility
• COVID-19 Risk Tracking
• Fuels Trading Forecasting
• Global Compliance Monitoring
• Active Directory Access Controls
• Financial Ledger Transaction Lineage
Sectors: Financial Services, Sales & Marketing, Energy, EY Solutions, Life Sciences, Risk
EY has a large and growing graph practice, with over 200 consultants globally.
We see a wide range of graph use cases across all sectors, and have delivered a number of compelling graph solutions to help our clients drive greater insight, efficiency and value.
3. By the end of this decade, 50% of SQL workloads will move to graphs
4. Trends Driving Graph Adoption
• More data
• Less time
• Cheaper memory
• Cheaper compute
• n+1 Data Lakes (grrr!)
• Federated data
• Data as a Service
• DevOps
• Total Cost of Ownership
5. Graph Leaders
“Graph is the fastest way to connect data, especially when dealing with complex or large volumes of disparate data. Without graph, organizations have to rely on developers to write complex code that can take considerable time and effort. In some cases, it becomes impractical due to the complexity of data. Graph data platform is a new and emerging market that allows organizations to think differently and create new, intelligence-based business opportunities that would otherwise be difficult to develop and support.”
Forrester Wave™: Graph Data Platforms, Q4 2020
The 12 Providers That Matter Most And How They Stack Up
by Noel Yuhanna, November 16, 2020
6. Neo4j now in the Top 20 most popular database engines
7. EY POV on Data Fabric
Our point of view – Data fabric architecture
A modern pattern for a hybrid cloud ecosystem enabled by a Global Data Plane
(1) Infrastructure as code, integrated with privacy and cybersecurity
(2) Pattern-based ingestion; automatically manage schema drift
(3) Enable federated querying across cloud and on-prem platforms
(4) An integrated analytics workbench to drive AI at scale
(5) Orchestration of insights into operational systems through APIs
8. Data Unification Approaches Create Immediate Business Value
Query Federation
Components: Metadata Catalog, Query Engine, Query API
• Enables federated querying across cloud and on-prem platforms
• Maps metadata for core data elements
• SQL RDBMS data model
• Data kept in place
• OLAP use cases
• Simple queries, standard reporting
• Scales well to enterprise
Example Use Cases:
• Enterprise Reporting
• Regulatory Reporting
• Data Governance
• Data Lineage
Knowledge Graphs
Components: Graph Database, ELT Processes, Graph API
• Enables low-latency querying on data sourced across cloud and on-prem platforms
• Directly maps core data elements using relationships
• NoSQL graph data model
• Data in memory, query federation across graphs
• OLTP & OLAP use cases
• Complex queries, advanced analytics, AI inference
• Scales well to enterprise
Example Use Cases:
• 360° Views (Customer/Asset/Batch)
• End-to-End Processes / Data Lineage
• Supply Chain & Regulatory Dependency Networks
• Next Best Action / Recommendation Engines
9. Three reasons why graphs beat SQL:
• Great end user experience
• Better data context
• Less effort to develop
10. Graph databases are designed for creating, storing, and querying graphs
“We send email to people, so they will visit our website and buy our product”
MATCH (e:Email)-[:SENT_TO]->(p:Person {fullName: 'Steve Newman'})-[:VISITED]->(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p)
RETURN *
Semantic Representation
Graph Representation
Physical Representation
Diagram nodes: Email, Person, Website, Product
Diagram relationships: SENT, VISITED, SOLD ON, PURCHASED
• Graphs store relationships as persisted, precomputed connections, so traversals avoid query-time joins and are typically much faster than equivalent multi-join SQL
• Graphs are fast and easy to understand, develop and use
• Graphs integrate well with applications and data sources, great for real-time digital workloads
• Graphs surface, unify and mobilize data held in silos and data lakes
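As a rough illustration of what the Cypher pattern on this slide asks for, the sketch below stores the same relationships as plain Python tuples and walks them by hand. Every identifier other than Steve Newman (the email, website and product names) is hypothetical sample data, and this is not how Neo4j executes the query; it only shows the shape of the traversal.

```python
from collections import defaultdict

# Hypothetical sample data: (start node, relationship type, end node).
EDGES = [
    ("email:spring-promo", "SENT_TO", "person:Steve Newman"),
    ("person:Steve Newman", "VISITED", "website:shop"),
    ("product:widget", "SOLD_ON", "website:shop"),
    ("person:Steve Newman", "PURCHASED", "product:widget"),
    # Purchased, but not sold on the visited website, so it must not match.
    ("person:Steve Newman", "PURCHASED", "product:gadget"),
]

# Index outgoing and incoming edges once, so each hop is a dictionary lookup.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for start, rel, end in EDGES:
    out_edges[(start, rel)].append(end)
    in_edges[(end, rel)].append(start)


def match_campaign(person):
    """Return (email, person, website, product) paths matching the slide's pattern."""
    paths = []
    for email in in_edges[(person, "SENT_TO")]:
        for website in out_edges[(person, "VISITED")]:
            for product in in_edges[(website, "SOLD_ON")]:
                # The product must also have been purchased by the same person.
                if product in out_edges[(person, "PURCHASED")]:
                    paths.append((email, person, website, product))
    return paths


result = match_campaign("person:Steve Newman")
```

Each hop follows a stored relationship rather than matching keys between tables, which is the point the slide's bullets make.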
11. Graphs are Great for End Users
SQL
• Faster loads: no persistent relationships are created when data is stored in tables
• All end users pay the cost of joining data at query run time → slower reads, simple queries
Graph
• Slower loads: one-time cost to compute and store persistent data relationships
• No additional cost for joining data at query run time → faster reads, complex queries
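The load-time versus read-time trade-off on this slide can be sketched with a toy example. The customer and order data and all function names below are hypothetical, and the "table" read is a deliberately naive scan: real relational engines use indexes and query planners, so the gap in practice depends on the workload.

```python
# Hypothetical data: customers and their orders.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]


def orders_by_join(customer_id):
    """Table style: every read matches the foreign key again at query time."""
    return [o["order_id"] for o in orders if o["customer_id"] == customer_id]


def build_adjacency():
    """Graph style: pay once at load time to store each relationship explicitly."""
    adjacency = {c["id"]: [] for c in customers}
    for o in orders:
        adjacency[o["customer_id"]].append(o["order_id"])
    return adjacency


ADJACENCY = build_adjacency()  # one-time load cost


def orders_by_traversal(customer_id):
    """Each read is a direct lookup of relationships stored at load time."""
    return ADJACENCY[customer_id]
```

Both functions return the same orders; the difference is where the matching work happens.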
12. Graphs Create Context: Wide Data
• Wide data
• Complex data
• Deep data
• Legacy data
• Frozen data
• Hidden data
ONE-TO-MANY RELATIONSHIPS ACROSS MANY ENTITIES
13. Graphs Create Context: Complex Data
MANY-TO-MANY RELATIONSHIPS
14. Graphs Create Context: Deep Data
RECURSION (SELF-JOINS)
DEEP HIERARCHY
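A deep hierarchy of this kind can be walked with a simple breadth-first traversal, where a relational model would need one self-join per level or a recursive query. The sketch below is illustrative only; the reporting hierarchy and the function name are hypothetical.

```python
from collections import deque

# Hypothetical reporting hierarchy: manager -> direct reports.
REPORTS = {
    "ceo": ["vp_sales", "vp_eng"],
    "vp_sales": ["rep_a"],
    "vp_eng": ["lead_x"],
    "lead_x": ["dev_1", "dev_2"],
}


def descendants_with_depth(root):
    """Breadth-first walk returning every node below root with its depth."""
    found = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in REPORTS.get(node, []):
            if child not in found:  # skip nodes already reached (guards against cycles)
                found[child] = depth + 1
                queue.append((child, depth + 1))
    return found


tree = descendants_with_depth("ceo")
```

The traversal does not need to know the depth of the hierarchy in advance, which is what makes recursion awkward to express as a fixed number of self-joins.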
15. Graphs Create Context: Legacy Data
LEGACY A LEGACY B LEGACY C LEGACY D LEGACY E
SILOED LEGACY DATABASES
16. Graphs Create Context: Frozen Data
DATA LAKE
FACT A FACT B FACT C FACT D FACT E
VERY LARGE INGESTED DATA
17. Graphs Create Context: Hidden Data
IF A AND B ARE BOTH RELATED TO X,
WE CAN INFER A IS RELATED TO B
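The inference rule on this slide can be sketched in a few lines: group entities by the node they share, then emit a candidate link for each pair in a group. The sample relationships and the function name are hypothetical, and a real system would likely weigh such candidate links rather than accept them all.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observed relationships: (entity, shared node).
RELATED_TO = [
    ("A", "X"),
    ("B", "X"),
    ("C", "Y"),
    ("A", "Y"),
]


def infer_links(pairs):
    """Infer entity-to-entity links for entities that share a neighbour."""
    members = defaultdict(set)
    for entity, shared in pairs:
        members[shared].add(entity)

    inferred = defaultdict(set)  # (entity, entity) -> shared nodes kept as evidence
    for shared, entities in members.items():
        for left, right in combinations(sorted(entities), 2):
            inferred[(left, right)].add(shared)
    return dict(inferred)


links = infer_links(RELATED_TO)
```

Keeping the shared node alongside each inferred pair preserves the evidence for why the link was proposed.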
18. Graphs Require Less Effort to Develop
Neo4j is a full-featured graph platform:
• In-memory data fabric
• Fast, complex queries (Cypher)
• Clean, elegant semantics (Labeled Property Graph)
• Fidelity to business processes
• Multiple workloads and use cases (OLTP + OLAP)
• Enterprise DB features, security, scalability, containerization
• Rapid development, Languages, APIs (REST, GraphQL)
• Deploy on-prem, cloud infrastructure or as SaaS (Aura)
• Tooling, Visualizations, Data science
• Easy to adopt, large community
19. Graph Data Unification – Approaches to Building the Data Plane
Batched ELT of Structured Data
Query Federation to Structured Data
Query Federation to Semi-Structured Document Data
Batched Pointers to Unstructured Blob Data
Query Federation to Sharded Graph Data
Near Real Time Message Data
Real Time API Transactions
Batched ELT of RDF Ontologies
A Knowledge Graph is a data fabric composed of nodes and relationships that connect and mobilize data, using consistent semantics.
INGESTION FEDERATION
20. Getting Started With Graphs
Small Team:
• Graph Architect
• Data Engineer
• Full-stack Developer
• Data Scientist
• Report Developer
Stages:
• Graphy Problem: business need, data sources
• Localhost POC: data modeling, API, example queries
• Cloud Pilot: data snapshot, reference architecture, API suite
• Production Build: hardening, scheduled & stream ETL, live UX
Workstreams:
• Stakeholder Input
• Graph Design
• Data Work
• APIs / Data Services
• Integration / Refinement
• Scale / Harden / Run
Key Conversations:
• Problem / Scope: What will the graph solve?
• Validate: What questions can now be answered?
• Connect: Does the data support the graph model and semantics?
• Mobilize: What data does the new experience need?
• Use Cases: What is the feedback from the business on how well the graph solves the use case?
• Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
21. Enterprise Knowledge Graph
How it all fits together
Ontology &
Taxonomy
Data
Lineage
Data
Discovery
Business
Semantics
Data Sources / Repositories
Front-end Applications
Data
Unification
Graph
Analytics
22. Customer 360° Graph Schema
Schema entities: Account, Transactions, Segments, Product, Interactions
• Accurately captures full range of customer touchpoints across enterprise surface area
• Enables more insightful indirect spend analytics for products and services
• Reconciles product usage, marketing interactions and digital identity
• Integrates with execution layer for AI-driven UX
23. EY POV on Data Fabric
Master Data Management Graphs dynamically compute “golden records”
MDM Graph Schema
Core Data Elements: Product, Customer & Contact, Orders
• Accurately captures data lineage for core identity components
• Provides “Golden Record” from multi-source probabilistic authority scores
• Relates contacts, customers, orders and products without loss of fidelity
• Enables detailed whitespace analysis and next best sales action
• Integrates with data lake and CRM applications
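One way to picture a "golden record" built from multi-source authority scores is to pick, for each field, the value from the most authoritative source while keeping track of where it came from. The slide does not describe the actual scoring method, so the records, scores, field names and selection rule below are all assumptions made for illustration.

```python
# Hypothetical source records for one customer. Each value carries an
# authority score in [0, 1] that an upstream process is assumed to supply.
SOURCE_RECORDS = [
    {"source": "crm", "field": "email", "value": "s.newman@example.com", "score": 0.9},
    {"source": "billing", "field": "email", "value": "steve@example.com", "score": 0.6},
    {"source": "crm", "field": "phone", "value": "555-0100", "score": 0.4},
    {"source": "support", "field": "phone", "value": "555-0199", "score": 0.7},
]


def golden_record(records):
    """Pick, per field, the value with the highest authority score, keeping lineage."""
    best = {}
    for rec in records:
        current = best.get(rec["field"])
        if current is None or rec["score"] > current["score"]:
            best[rec["field"]] = rec
    return {
        field: {"value": rec["value"], "source": rec["source"]}
        for field, rec in best.items()
    }


golden = golden_record(SOURCE_RECORDS)
```

Because the record is computed from the source values on demand, a change in any source score changes the result without rewriting stored master data.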
24. Asset 360° Graph Schema Enables Data Discovery at Scale
• Searchable pointers to unstructured blobs
• Text & metrics from semi-structured data
• Structured data and derived entities
27. Neo4j Transactional Endpoint Using StreamSets ELT
1M records in 30 sec, 5 parallel threads
28. Neo4j Streams – Graph as a Real Time Event Consumer
Data stream pulled from Kafka into Graph in real-time
1M messages in 30 sec
29. Neo4j Streams: Graph as a Real Time Event Producer
Diagram labels: Neo4j Kafka Config; Kafka Neo4j CDC Topic
Any changes to Graph are pushed to Kafka in real-time
30. EY POV on Data Fabric
Semantic graphs enable data lineage, data quality and consistent taxonomy
Semantic Graph Schema
• Handles complex mappings
• Data recency and coverage
• Track source systems & entities for core data elements
• Track data requirements for downstream consumers
• Repository for business-friendly terms used in APIs (Canonical Message Model)
31. Ontology management in Neo4j
• Import/Export of RDF and RDF* in multiple formats (Turtle, N-Triples, JSON-LD, RDF/XML, TriG and N-Quads, Turtle*, TriG*)
• Model mapping on import/export
• Import and export of Ontologies/Taxonomies in different vocabularies (OWL, SKOS, RDFS)
• Graph validation based on SHACL constraints
• Basic inferencing
https://neo4j.com/labs/neosemantics/
32. Neo4j Graph Scaling
Last modified 2/12/21
Scale In: Multi-Database
• Multiple Graph DBs on same instance
• Security managed in system DB
• Operate independently
• Host small graphs (dev / departmental)
• Efficient use of server licensing
Scale Up: Causal Clustering
• Graph size up to largest VM (~24TB)
• Quorum write commits
• Read own writes using bookmarks
• Fast HA failover / new master election
• Async replication to read nodes
Scale Out: Fabric
• Virtual DB connects Graph shards
• Query federation across instances
• Scales beyond VM sizes
• Balances domain vs enterprise
• Supports HA across clusters
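The "quorum write commits" bullet for Causal Clustering can be illustrated with a small sketch, assuming a simple majority rule as used in Raft-style consensus. The function names are hypothetical and are not part of any Neo4j API.

```python
def quorum_size(core_members):
    """Smallest strict majority of a cluster with the given number of core members."""
    if core_members < 1:
        raise ValueError("cluster needs at least one core member")
    return core_members // 2 + 1


def is_committed(acks, core_members):
    """Assumed rule: a write is committed once a majority of core members acknowledge it."""
    return acks >= quorum_size(core_members)


def tolerated_failures(core_members):
    """How many core members can be lost while a majority remains reachable."""
    return core_members - quorum_size(core_members)
```

Under this rule a three-member core can lose one member and still commit writes, and a five-member core can lose two.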
33. Neo4j Fabric: No Upper Limit to Graph Size
A GRAPH SHARD OF MOVIES AND ACTORS
A GRAPH SHARD OF MOVIES AND NON-ACTORS
FEDERATED QUERY RESULT COMBINING BOTH SHARDS
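The idea of a federated query that combines results from two graph shards can be sketched as follows. The shard contents, person and movie names, and function names are hypothetical, and this does not use Neo4j Fabric itself; it only shows one query fanned out to each shard and the results merged.

```python
# Hypothetical shards: each holds (person, relationship, movie) rows.
SHARD_ACTORS = [
    ("Actor One", "ACTED_IN", "Movie A"),
    ("Actor Two", "ACTED_IN", "Movie A"),
    ("Actor One", "ACTED_IN", "Movie B"),
]
SHARD_NON_ACTORS = [
    ("Director One", "DIRECTED", "Movie A"),
    ("Producer One", "PRODUCED", "Movie B"),
]


def query_shard(shard, movie):
    """Run the same lookup against a single shard."""
    return [(person, rel) for person, rel, title in shard if title == movie]


def federated_query(shards, movie):
    """Send the query to every shard and merge the partial results."""
    combined = []
    for shard in shards:
        combined.extend(query_shard(shard, movie))
    return sorted(combined)


crew = federated_query([SHARD_ACTORS, SHARD_NON_ACTORS], "Movie A")
```

The caller sees one combined result and does not need to know which shard held which rows.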
34. Summary: Graph Usage will Continue to Rise across Enterprises
• Fast & Efficient: Graphs have logical relationships precomputed, ensuring significantly improved speed and efficiency for deep traversals across complex relationships; ideal for evolving and interrelated populations
• Intuitive: Schema-less for rapid, iterative development; inherent visualization capabilities allow for easy traversal and understanding
• Interoperable: Interfaces easily with traditional systems and can be slotted in to enhance already mature workflows and data environments
• Transformative: Provides extensible platform for actionable, end-to-end analytical applications including operational analytics
• Strategic: Surfaces, unifies, mobilizes disconnected information in data lakes allowing for advances in governance, traceability, and awareness of data across the environment
Graph can add value in any environment where:
• Data is interconnected and relationships matter
• Data needs to be read and queried with optimal performance
• Data is evolving and data model is not always fixed and pre-defined