1. Roadmap for
Enterprise Graph Strategy
Michael Moore, Ph.D.
Executive Director, Enterprise Knowledge Graphs + AI
EY Performance Improvement Advisory
michael.moore4@ey.com
July 18, 2019
2. The Database Landscape is Changing
Traditional Databases & Data Warehousing: SQL RDBMS
NoSQL Databases: Column, Document, Key Value, Graph
Data Services & Data Processing: Search, Serverless, Streams, In-Memory, Batch MR, Blockchain
3. Scale Out Scale Up
Continued increase in capacity and
dropping compute costs are challenging
scale-out commodity server assumptions,
particularly for database workloads
2018
4. Rankings Change in Popularity (db-engines.com)
*Proprietary method based on general interest, mentions, relevance in social networks, frequency of technical discussions etc.
Graph DBs
5. “We send email to people, so they will visit
our website and buy our product”
A Database specifically designed for creating, storing, and querying graphs
MATCH (e:Email)-[:SENT_TO]->
(p:Person {fullName: 'Steve Newman'})-[:VISITED]->
(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p)
RETURN *
Semantic Representation
Graph Representation
Physical Representation
► Graphs store relationships explicitly, so connected queries traverse them directly instead of computing joins, often much faster than SQL
► Graphs are fast and easy to understand, develop and use
► Graphs integrate well with applications and data sources, great for real-time digital workloads
► Graphs surface, unify and mobilize data held in silos and data lakes
What is a Graph Database?
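To make the step from sentence to graph pattern concrete, here is a minimal sketch in plain Python that stores the example as (start, relationship, end) triples and walks the same pattern as the Cypher query above. The node names (email-1, shop.example.com, widget, Ann Lee) are illustrative assumptions, and a real graph database would use its own storage and query planner rather than list scans.

```python
# Toy triple store: each fact is (start node, relationship type, end node).
# All node names below are made up for illustration.
TRIPLES = [
    ("email-1", "SENT_TO", "Steve Newman"),
    ("Steve Newman", "VISITED", "shop.example.com"),
    ("widget", "SOLD_ON", "shop.example.com"),
    ("Steve Newman", "PURCHASED", "widget"),
    ("email-2", "SENT_TO", "Ann Lee"),
    ("Ann Lee", "VISITED", "shop.example.com"),
]


def targets(start, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for s, r, end in TRIPLES if s == start and r == rel_type]


def match_email_to_purchase(person):
    """Return (email, person, website, product) paths matching the slide's pattern."""
    paths = []
    emails = [s for s, r, e in TRIPLES if r == "SENT_TO" and e == person]
    for email in emails:
        for website in targets(person, "VISITED"):
            for product in targets(person, "PURCHASED"):
                # The product must be sold on the website the person visited.
                if website in targets(product, "SOLD_ON"):
                    paths.append((email, person, website, product))
    return paths
```

Calling match_email_to_purchase("Steve Newman") returns the single complete path, while a person who received an email and visited the site but bought nothing yields no match.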
9. Graph Use Cases
► Customer 360°
► Recommendation
Engines
► Marketing Attribution
► Enterprise Search
► Fraud Detection
► Master Data
Management
► Supply Chain
► Geolocation & Routing
► Access & Asset Control
► Social Networks
► IT & Network
Management
10. Real-Time, Evolving Graph View Across the Business
Data Ingestion, Cleansing, Reduction & Pipelining
Mobile & Web Applications: access control, metadata, recos, monitoring
Real-time BI & Scorecards: KPIs, targets, reporting, drill down/across
Data Science: attribution, similarity, fraud, pathing, cliques
Marketing ROI &
Digital Experience (CMO)
Data Governance &
Data Quality (CDO)
Operations & Risk
Management (CFO)
Account Coverage &
Customer LTV (CRO)
Product Marketing &
Recommendations (CPO)
UNSTRUCTURED LEGACY SNAPSHOTS
CONFORMED &
CURATED
STREAMS
Graphs Accelerate Enterprise Data Mobilization
11. Roadmap for Enterprise Graph Strategy
Small Team:
• Graph Architect
• Data Engineer
• Full-stack Developer
• Data Scientist
• Report Developer
Problem / Scope: What will the graph solve?
Graphy Problem: Business need, data sources
Localhost POC: Data modeling, API, example queries
Cloud Pilot: Data snapshot, reference architecture, API suite
Production Build: Hardening, scheduled & stream ETL, live UX
Stakeholder Input
Graph Design
Data Work
APIs / Data Services
Integration / Refinement
Scale / Harden / Run
Key Conversations
Validate: What questions can now be answered?
Connect: Does the data support the graph model and semantics?
Mobilize: What data does the new experience need?
Use Cases: What is the feedback from the business on how well the graph solves the use case?
Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
12. Talk to the business, pick a graphy problem
What is a “Graphy” problem?
• Requires many entities (e.g. many SQL tables, 360° views)
• Involves recursion (e.g. SQL self-joins)
• Has complex, potentially colliding hierarchies (e.g. SQL one-to-many, many-to-many)
• Is based on the informatics of the relationships themselves (e.g. shared relationship
counts for collaborative filtering, shortest-path segment summations for wayfinding,
cost/time minimization for supply chain, money flows for finance)
• Requires mapping, direct or indirect, across data sources (e.g. data lake unification)
• Demands fast query results (e.g. digital applications, search)
• Most importantly, go talk to the business: what analytics would you like to have, or
what customer experiences would you like to light up, but can’t because of our current
data limitations?
• What’s the most critical data that you’d like to see connected?
• What would be an example demo that you’d find compelling (report/analysis/experience)?
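As one illustration of a problem "based on the informatics of the relationships themselves", the sketch below scores recommendations by shared relationship counts, the collaborative-filtering idea named in the list above. The customers, products, and the PURCHASED mapping are hypothetical; in a graph database the same counts would come from traversing relationships rather than from Python sets.

```python
from collections import Counter

# Hypothetical PURCHASED relationships: customer -> set of products.
PURCHASED = {
    "ann": {"boots", "socks"},
    "bob": {"boots", "socks", "laces"},
    "cat": {"boots", "hat"},
    "dan": {"scarf"},
}


def recommend(customer, purchases=PURCHASED):
    """Rank products the customer lacks by purchases shared with the peers who bought them."""
    own = purchases[customer]
    scores = Counter()
    for peer, items in purchases.items():
        if peer == customer:
            continue
        shared = len(own & items)  # shared relationship count with this peer
        if shared == 0:
            continue  # peers with nothing in common contribute no signal
        for product in items - own:
            scores[product] += shared
    return scores.most_common()
```

Here "ann" shares two purchases with "bob" and one with "cat", so bob's extra product outranks cat's; a customer with no overlapping purchases gets an empty list.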
13. Get comfortable with Neo4j – you don’t need to become an expert
• Get hands on – be fearless! Neo4j is the easiest graph database to
learn.
• Install Neo4j and the APOC procedures, then set the following in Manage/Settings:
#Apoc Plugin Configurations
apoc.import.file.enabled=true
apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=*.*
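• Go through the Cypher lessons, and learn the basics of graph modeling and how to
load CSV files
// Built-in loader
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
RETURN row LIMIT 5
// APOC alternative
CALL apoc.load.csv("file:///movies.csv", {}) YIELD map
RETURN map LIMIT 5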
• Any reasonably sized laptop should be able to handle a graph with
several million nodes and relationships. You will quickly see some of the
significant benefits of connected data.
• For extra credit, go to github/neo4j-examples and download
starter applications for your favorite languages.
14. Design and build your POC Graph
• Start small and simple: limit yourself to 3-4 data sources and shallow extracts.
Use SQL TOP queries to snapshot a pool of linked transactions
• Use common sense, business-friendly naming for your node labels and relationship
types. You’ll iterate this model using input from the business, and the model
should be clear and readable
• Don’t be afraid of recursion
(:Employee)-[:REPORTS_TO]->(:Employee)  Who is the boss?
• Don’t get too hung up on whether something should be a node label, property, or
relationship. Just keep in mind that node labels define set members, and that it’s
faster to search along relationships (traversal) than properties (full graph scan)
• You can use call db.schema() to see the graph schema, and we often use
http://apcjones.com/arrows/# to build illustrative schemas for conversations with
business stakeholders
• Test your graph design by writing some example queries together with your business
stakeholder
• Does this look right to you? Is this how you would whiteboard this process? Am I
missing any key entities or relationships?
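The recursive REPORTS_TO pattern on this slide can be sketched outside the database to show why recursion is nothing to fear: answering "who is the boss?" is just following one relationship type until it runs out. The employee names and the dictionary layout below are assumptions for illustration; in Cypher the same walk would typically be written as a variable-length path such as (e)-[:REPORTS_TO*]->(boss).

```python
# Hypothetical REPORTS_TO relationships: employee -> manager.
REPORTS_TO = {
    "ines": "omar",
    "omar": "priya",
    "luca": "priya",
}


def chain_of_command(employee, reports_to=REPORTS_TO):
    """Follow REPORTS_TO hops upward; the last name in the chain is the boss."""
    chain = [employee]
    seen = {employee}
    while chain[-1] in reports_to:
        manager = reports_to[chain[-1]]
        if manager in seen:  # guard against cyclic source data
            raise ValueError(f"cycle detected at {manager}")
        chain.append(manager)
        seen.add(manager)
    return chain
```

For example, chain_of_command("ines") walks two hops and ends at the person who reports to no one.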
15. Example Knowledge Graph Schema
for Spend and Supply Chain Analytics
Supplier 360°
Spend Graph
• Accurately captures the
sourcing complexity of products
and services
• Enables more insightful indirect
spend analytics for products
and services
• Reconciles line-item detail to
top parent company, across
intermediate entities
• Extensible for audit, fraud
detection, tracking &
traceability
• Integrates with data lake,
reporting platforms and
transactional applications
Product Supply Chain Service Providers
Procurement
Top
Parent
Line
Item
Detail
Tracking and Traceability
Invoicing
Data fabric composed of nodes and relationships that
connect and mobilize data, using consistent semantics
16. Example Customer 360° Graph Schema
Account
Transactions
Segments
Product
Interactions
Customer 360°
Graph
• Accurately captures full range
of customer touchpoints across
enterprise surface area
• Enables more insightful indirect
spend analytics for products
and services
• Reconciles product usage,
marketing interactions and
digital identity
• Integrates with execution layer
for AI driven UX
17. Example B2B MDM Graph Schema
Product
Core Data Elements
Customer
& Contact
Orders
Master Data
Management
Graph Schema
• Accurately captures data
lineage for core identity
components
• Provides “Golden Record” from
multi-source probabilistic
authority scores
• Relates contacts, customers,
orders and products without
loss of fidelity
• Enables detailed whitespace
analysis and next best sales
action
• Integrates with data lake and
CRM applications
18. Example Polyglot Discovery Graph Schema
Searchable Pointers to
Unstructured blobs
Text & Metrics from
Semi-Structured
data
Structured Data and Derived Entities
Data Discovery
Graph Schema
• Connects structured, semi-
structured and unstructured
data across polyglot storage
• Accurately handles complex
data and document hierarchies
• Enables full text search in graph
or in document store, directly
and via NLP
• Provides source document
access through blob URLs
• Integrates with data lake,
reporting platforms and
transactional applications
19. Design and build your POC Graph
• Breakthrough queries
• Graph algorithms
• Data unification & mobilization
• Use-case specific (Customer 360, Supply Chain, Fraud, Reco)
• Make a localhost graph->app stack so you understand how
parameterized Cypher & Bolt drivers work
• Use any of the neo4j-examples to jumpstart
• If you don’t want to spend time creating a REST API, check out
GraphQL and the GRAND stack
(https://github.com/grand-stack/grand-stack-starter)
• Focus on the business value of the new graph enabled analytics –
We can now know this to make better decisions
We can now do this for our customers
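For the localhost graph-to-app stack, a minimal sketch of what "parameterized Cypher over Bolt" can look like from Python follows. The labels and property name reuse the earlier email example; the URI, credentials, and function names are assumptions, and the driver call is only reached if you have a local Neo4j server and the neo4j-driver package installed.

```python
def person_purchases_query(full_name):
    """Build a parameterized Cypher query; the value travels separately from the query text."""
    query = (
        "MATCH (p:Person {fullName: $fullName})-[:PURCHASED]->(pr:Product) "
        "RETURN pr"
    )
    return query, {"fullName": full_name}


def run_on_localhost(full_name, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Send the query over Bolt (assumed local server and placeholder credentials)."""
    # Imported lazily so the query builder above works without the driver installed.
    from neo4j import GraphDatabase

    query, params = person_purchases_query(full_name)
    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            return [record["pr"] for record in session.run(query, params)]
    finally:
        driver.close()
```

Keeping the value in a parameter map rather than concatenating it into the query string lets the server reuse the query plan and avoids injection problems when the name comes from user input.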
20. Neo4j - Power BI Integration with GraphQL
Graph Database
Neo4j GraphQL API
1. Client issues GraphQL query
2. GraphQL API sends Cypher query to Neo4j
3. Response data sent to Client
4. Data updated in PBI report
GraphQL schema, registered in Neo4j
M query cURL wrapper
PBI report
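Step 1 of the flow above (the client issuing a GraphQL query) amounts to an HTTP POST with a JSON body. The sketch below builds such a request with the Python standard library; the endpoint URL and the query fields are hypothetical placeholders, since the deck does not show the actual schema.

```python
import json
import urllib.request


def build_graphql_request(endpoint, query, variables=None):
    """Build (but do not send) an HTTP POST carrying a GraphQL query as JSON."""
    payload = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and schema fields, for illustration only.
request = build_graphql_request(
    "http://localhost:4000/graphql",
    "{ accounts { name } }",
)
```

Passing the request to urllib.request.urlopen would send it to a running GraphQL API, which would then translate the query to Cypher as in step 2.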
21. Neo4j – React Integration with GraphQL (GRAND Stack)
22. Pick and build your demo application for your snapshot graph
• Pick a cloud or on-prem environment
• Use Marketplace images if possible
• Start with a single-instance VM for Neo4j (RAM ≈ 50% of the SQL data size)
• Attach external drives so you can scale the server
• Determine your stack architecture
• Understand your data processing requirements
• Install Python, which is very good for batch operations (pip install neo4j-driver)
• Leverage Neo4j’s high-speed loader
• Determine what cleansing needs to occur
• If you need help, reach out to an SI partner or Neo4j services
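For the Python batch operations mentioned above, a common pattern is to send rows to the graph in fixed-size batches rather than one statement per row. The helper below is a sketch of that idea; the batch size, the :Customer label, and the id/name properties in the Cypher string are illustrative assumptions.

```python
def batches(rows, size=1000):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


# Cypher intended to be run once per batch with parameters {"rows": batch}.
# The label and property names are assumptions for illustration.
LOAD_CUSTOMERS = (
    "UNWIND $rows AS row "
    "MERGE (c:Customer {id: row.id}) "
    "SET c.name = row.name"
)
```

Each batch would be passed as the rows parameter in one transaction, which keeps transaction size bounded while avoiding per-row round trips.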
23. Pick and build your demo application for your snapshot graph
• MVP data domains
• Graph database, app-informed
• Simplest data service
• MVP app experience
• Add new experiences, same data
• Add new data domains
Node.js, .NET, Python, React, Swift, Tableau, etc.
REST, Bolt
Michael’s I-Frame Model for Graph ROI
Accelerate Graph-driven User Experiences
24. Example Graph Architecture (Azure VPC)
(Architecture diagram)
Stages: Ingest Batch, Ingest Real-time, Store, Consolidate, Search, Connect & Unify, Mobilize, Execution
Layers: Semantic Layer, Analytics Layer
Sources: CRM, ERP, Syndicated Data and Analytics, Adobe Experience Cloud (Audience Manager, Campaign, Target, Experience Manager, Analytics, Marketo Engage)
Ingestion and processing: Scheduled ETL, Azure Data Factory, Stream ETL (Azure Event Hub), Data Reduction (Azure Spark)
Storage: Azure Cloud Storage (Blobs, Files, Queues, Tables), Cloud Data Lake, In-Memory Document Store, In-Memory Knowledge Graph
Analytics and delivery: Data Models (Azure Analysis Services), Data Catalog (Azure Data Catalog), AI Sandbox (Azure ML Studio), Reporting (Tableau, PBI), Data Services APIs (REST)
Callouts: Automated Reports and Dashboards; Consistent Metrics; Data Discovery; Retention Models; Deep Learning; In-Memory; Sessionization; Data Aggregation; Knowledge Graph; Customer/Contact 360° View; Marketing Attribution; Recommendations; Real-time; Document Search; Elastic SQL Repository for Curated & Conformed Data; Data Staging; Elastic Repository for Raw and Unstructured Data; Real Time Updates; Customer Events; Automated Data Loading; Triggered Marketing; Consistent Experience
25. Example Graph Architecture (AWS VPC)
(Architecture diagram)
Stages: Ingest Batch, Ingest Real-time, Store, Consolidate, Search, Connect & Unify, Mobilize, Execution
Layers: Semantic Layer, Analytics Layer
Sources: CRM, ERP, Syndicated Data and Analytics, Adobe Experience Cloud (Azure) (Audience Manager, Campaign, Target, Experience Manager, Analytics, Marketo Engage)
Ingestion and processing: Scheduled ETL (AWS Data Pipeline, PDI Kettle), Stream ETL (AWS Kinesis), Data Reduction (AWS EMR)
Storage: AWS Cloud Storage (S3, Blobs, Files, Queues, EBS, Tables), Cloud Data Lake, In-Memory Document Store, In-Memory Knowledge Graph
Analytics and delivery: Data Discovery (AWS Athena), Machine Learning (AWS SageMaker), Data Catalog (AWS Glue), Reporting (Tableau, QuickSight), Data Services APIs (REST)
Callouts: Automated Reports and Dashboards; Retention Models; Deep Learning; Data Discovery; Consistent Data Models; Sessionization; Data Aggregation; Knowledge Graph; Customer/Contact 360° View; Marketing Attribution; Recommendations; Real-time; Document Search; Elastic SQL Repository for Curated & Conformed Data; Data Staging; Elastic Repository for Raw and Unstructured Data; Real Time Updates; Customer Events; Automated Data Loading; Triggered Marketing; Consistent Experience
26. Enterprise Knowledge Graph Development with Neo4j
• Locate and validate data lake tables
• Design test graph schema
• Estimate graph size from nodes, relationships and properties
• Configure Neo4j server to minimize SSD disk contention
• Prepare Hive queries to generate graph-form tables (nodes, relationships)
• Validate key uniqueness, string handling, character types, relationship mappings
• Export graph-form tables to gzipped CSV files
• Iteratively test data loader scripts, file by file
• On successful completion of hydration, apply constraints and indexes, refactor as needed
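Two of the steps above, validating key uniqueness and exporting graph-form tables to gzipped CSV, can be sketched with the Python standard library. This is an illustration on in-memory rows, not the Hive-based process the slide describes; the function names and the accountId column used in the usage note are assumptions.

```python
import csv
import gzip
from collections import Counter


def duplicate_keys(rows, key):
    """Return key values that appear more than once in a graph-form table."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def write_csv_gz(path, rows, fieldnames):
    """Write rows (dicts) to a gzip-compressed CSV file with a header line."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV back into dicts, e.g. to spot-check an export."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A typical use would be to call duplicate_keys(node_rows, "accountId") before export and refuse to write the file if it returns anything, since duplicate node keys tend to surface later as loader errors or duplicate nodes.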
Data Lake Tables → (EXTRACT) → Graph-form Tables → (EXTRACT) → CSV.gz Files → Load Script (HIGH SPEED LOADER) → Data Store
IMPORT DONE in 1h 29m 16s 530ms.
Imported:
458356377 nodes
2176603843 relationships
9064981812 properties
Peak memory usage: 9.46 GB
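The "estimate graph size" step can be approximated from record counts like those in the import log above. The sketch below assumes fixed record sizes of 15 bytes per node, 34 bytes per relationship, and 41 bytes per property, which are the figures commonly cited for Neo4j 3.x store files; treat them as assumptions to check against the documentation for your version, and note that string and array values need additional space.

```python
# Assumed fixed record sizes in bytes for Neo4j 3.x store files.
# String and array property values are stored separately and are ignored here.
NODE_BYTES = 15
RELATIONSHIP_BYTES = 34
PROPERTY_BYTES = 41


def estimate_store_bytes(nodes, relationships, properties):
    """Rough store-size estimate from record counts."""
    return (
        nodes * NODE_BYTES
        + relationships * RELATIONSHIP_BYTES
        + properties * PROPERTY_BYTES
    )


def to_gib(num_bytes):
    """Convert bytes to GiB for readability."""
    return num_bytes / 2**30
```

Calling to_gib(estimate_store_bytes(458_356_377, 2_176_603_843, 9_064_981_812)) with the counts from the log gives a ballpark figure for disk planning before indexes and logs are added.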
27. Polyglot Graph Data Processing
Unstructured (37,157 blobs, 5.5 TB): Extract and Load Azure Blob URIs
Semi-Structured (20,573 JSONs, 5 GB): Extract XML, Convert to JSON, Load JSON with Azure Blob URI
Structured (215K nodes & relationships, 1.5 GB): Load CSV to Graph
Extract and Load
• Document Metadata
• Named Entities
• Map Relationships
• Text Summaries
Graph Analytics & Queries
Couchbase Full Text Search
Pointers to Azure Blob URIs
Leveraging fit-for-purpose storage:
Graph storage for unified many-to-many access to cross-domain data
Document storage for searchable access to semi-structured data
Blob storage repository for large, raw and unstructured data
Reports/Applications
Data Mobilization and Graph Unification – Full Lineage and Auditability
28. Go to Production
• Follow your IT best practices
• Security: assume you’ll be breached
• Deploy full environment set – Prod cluster, Stg cluster,
Test, Dev
• DevOps - leverage Jenkins, Ansible
• Wrap your solution in test automation
• Do load testing against your APIs to look for additional
optimization opportunities (Gatling)
• Monitor your logs (Splunk, Dynatrace)
• Monitor your common queries, refactor or reindex as
needed, optimize for speed
• Leverage the I-Frame Model to provide more value
29. Roadmap for Enterprise Graph Strategy
Small Team:
• Graph Architect
• Data Engineer
• Full-stack Developer
• Data Scientist
• Report Developer
Problem / Scope: What will the graph solve?
Graphy Problem: Business need, data sources
Localhost POC: Data modeling, API, example queries
Cloud Pilot: Data snapshot, reference architecture, API suite
Production Build: Hardening, scheduled & stream ETL, live UX
Stakeholder Input
Graph Design
Data Work
APIs / Data Services
Integration / Refinement
Scale / Harden / Run
Key Conversations
Validate: What questions can now be answered?
Connect: Does the data support the graph model and semantics?
Mobilize: What data does the new experience need?
Use Cases: What is the feedback from the business on how well the graph solves the use case?
Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
30. EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders
Fortune 100 Tech Company
Use Case:
Global B2B Account 360° view and
marketing attribution
Approach:
Neo4j graph with 500M nodes
and 2.2B relationships,
representing all known business
accounts, contacts and marketing
touches. Mastered data from
17 disparate transactional sources
in Azure Data Lake. Supported in-
graph analytics for marketing
attribution and next best action
recommendations across global
geographies
Duration:
16 weeks to working graph
Fortune 100 Footwear Company
Use Case:
Converged Brick & Mortar +
Online Shopper 360° View
Approach:
Neo4j graph with 2B nodes and
relationships, representing sales
transactions for 40M shoppers
across 275 physical stores and the
ecommerce platform. Algorithmic
extraction and profiling from raw
XML records in AWS Hadoop,
MDM record concordance and in-
graph analytics for product
associations, store analytics and
recommendation services.
Duration:
12 weeks to working graph,
ongoing project through 2018
Fortune 500 Cruise Line Company
Use Case:
Shipboard and Shoreside
Recommendation Engine
Approach:
Neo4j graph deployable to
shipboard VMware data centers,
with streaming updates from
large shoreside Neo4j graph
integrating data from Azure
Cerebro, Adobe Experience
Manager and legacy transactional
systems. In-graph
analytics, services API,
recommendation engine for next
best activity for passengers
surfaced via mobile app
Duration:
12 weeks to working graph,
ongoing project through 2018
Fortune 100 Investment Firm
Use Case:
Enhanced Anti-Money Laundering
and Fraud Detection using
Graph+AI
Approach:
Neo4j graph of account 360° view
representing activity of 2M
accounts over 4 years. MDM and
entity extraction for account and
party identity elements from
enterprise Oracle system.
Network clustering, feature
engineering and graph embedding
in TensorFlow deep learning
classifier for suspicious activity
patterns across accounts and
between parties.
Duration:
16 weeks to working graph
Fortune 100 Tech Company
Use Case:
B2B Local Marketing Events
Recommendation Engine
Approach:
Neo4j graph and personalized
next best event recommendation
engine for B2B field marketers.
Reconciles physical and digital
event attendees with corporate
account structures for 10K
accounts and 5M contacts.
Entities mastered from
transactional data in SQL Server
and Azure Data Lake.
Microservices APIs support data
syndication to martech
applications and Power BI
reporting.
Duration:
10 weeks to working graph
31. Better Questions
How can I get more business value and deeper
insights from the data I already have?
How can I get a better understanding of my customers to
create more relevant experiences?
How can I more effectively mobilize and
syndicate the data I’m ingesting?
What is the next best action I can take?
Thank You!
32. Michael Moore, Ph.D.
Executive Director
► Michael Moore is an Executive Director and Practice Lead for Graph + AI
in EY’s Tech Consulting Emerging Technology (ET) Group
► Joined EY in 2017, based in the Seattle, WA office
► Ph.D. University of California, Berkeley
► B.S. & B.A. University of California, Santa Cruz
► Society Consulting – Graph Architect
Schema, ETL & systems design for a high-performance Neo4j graph database encompassing the totality
of Microsoft’s B2B data on Azure VM. Graph database supports multi-touch marketing attribution
analytics and multi-dimensional event-based audience segmentation & recommendations for direct
marketing. Provided POC graph reporting and visualization interfaces. Neo4j Enterprise edition, Python,
Node.js, nGraph, JavaScript.
► Microsoft Corporation – General Manager
Management of core BI infrastructure and measurement capabilities supporting Microsoft's global
marketing budget cascade, campaign reporting, pipeline reporting, incentive reporting, ROMI reporting,
social and web analytics on Microsoft.com for the Global Marketing Operations team. Management of
complex projects across multiple subsidiaries, agencies and vendors. Strategic focus on foundational
database, digital and social marketing capabilities including: marketing ROI, customer & channel partner
engagement, marketing conversion, sales pipeline, dynamic personalization, data mining, predictive
modeling, behavioral segmentation, privacy governance, web enablement, tracking & measurement,
and internal & external data quality, and instrumentation process control.
► Grey San Francisco – VP Analytics
Responsible for ongoing campaign reporting, ROI analysis, creative and placement optimizations for
agency clients. Architected and deployed an enterprise OLAP reporting solution on Oracle RAC /
Microstrategy to improve quality and efficiency of analytics operations. Provided advanced analytical
services to clients in retail, tech, banking and automotive, including consulting, regression modeling and
data mining.
Profile Select professional experience
Skills and tool knowledge
► Michael Moore, Ph.D. is an Executive Director in the Advisory Services
practice of Ernst & Young LLP. He is the National practice lead for
Enterprise Knowledge Graphs + AI in EY’s Data and Analytics (DnA) Group.
► Michael has industry and solution experience in customer experience, customer
service, e-commerce, ad-serving, web and media analytics, consumer
loyalty and churn, marketing optimization, enterprise and partner pipeline,
and social media
► He specializes in graph database architecture, graph-based advanced
analytics, machine learning and recommender systems. Michael is a Neo4j Certified
Professional, and has active enterprise graph engagements in
financial services, tech, oil & gas, retail and hospitality sectors.