Roadmap for Enterprise Graph Strategy
Michael Moore, Ph.D.
Executive Director, Enterprise Knowledge Graphs + AI
EY Performance Improvement Advisory
michael.moore4@ey.com
July 18, 2019
The Database Landscape is Changing
Traditional Databases & Data Warehousing: SQL RDBMS
NoSQL Databases: Column, Document, Key Value, Graph
Data Services & Data Processing: Search, Serverless, Streams, In-Memory, Batch MR, Blockchain
Scale Out vs. Scale Up
Continued increase in capacity and
dropping compute costs are challenging
scale-out commodity server assumptions,
particularly for database workloads
2018
Rankings Change in Popularity (db-engines.com)
*Proprietary method based on general interest, mentions, relevance in social networks, frequency of technical discussions etc.
Graph DBs
“We send email to people, so they will visit
our website and buy our product”
A Database specifically designed for creating, storing, and querying graphs
MATCH (e:Email)-[:SENT_TO]->
(p:Person {fullName: 'Steve Newman'})-[:VISITED]->
(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p)
RETURN *
Semantic Representation
Graph Representation
Physical Representation
► Graphs store relationships as precomputed connections, so multi-hop queries are typically much faster than the equivalent SQL joins
► Graphs are fast and easy to understand, develop and use
► Graphs integrate well with applications and data sources, great for real-time digital workloads
► Graphs surface, unify and mobilize data held in silos and data lakes
What is a Graph Database?
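To make the "precomputed relationships" point concrete, here is a minimal sketch in plain Python, not Neo4j itself. It stores each relationship as a direct reference and then follows the same pattern as the Cypher query on this slide. All node names (the email, website and products) are invented sample data, and the two dictionaries are only an illustration of the idea, not how Neo4j lays out its store.

```python
from collections import defaultdict

# Toy in-memory property graph. Every node name below is invented sample data.
# Each relationship is stored twice (outgoing and incoming) so a traversal can
# follow it in either direction without scanning unrelated data.
out_edges = defaultdict(list)  # (start, type) -> [end, ...]
in_edges = defaultdict(list)   # (end, type) -> [start, ...]


def relate(start, rel_type, end):
    """Record a directed relationship start -[rel_type]-> end."""
    out_edges[(start, rel_type)].append(end)
    in_edges[(end, rel_type)].append(start)


relate("email:spring-offer", "SENT_TO", "person:Steve Newman")
relate("person:Steve Newman", "VISITED", "website:shop")
relate("product:sandals", "SOLD_ON", "website:shop")
relate("product:boots", "SOLD_ON", "website:shop")
relate("person:Steve Newman", "PURCHASED", "product:sandals")


def email_to_purchase_paths(person):
    """Mirror the slide's pattern: email -> person -> website <- product <- person."""
    paths = []
    for email in in_edges[(person, "SENT_TO")]:
        for website in out_edges[(person, "VISITED")]:
            for product in in_edges[(website, "SOLD_ON")]:
                if person in in_edges[(product, "PURCHASED")]:
                    paths.append((email, website, product))
    return paths


paths = email_to_purchase_paths("person:Steve Newman")
print(paths)
```

Each step of the traversal is a dictionary lookup on a node that is already in hand, which is the intuition behind following relationships instead of joining tables.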
This is a Graph.
Graph Use Cases
► Customer 360°
► Recommendation Engines
► Marketing Attribution
► Enterprise Search
► Fraud Detection
► Master Data Management
► Supply Chain
► Geolocation & Routing
► Access & Asset Control
► Social Networks
► IT & Network Management
Graphs Accelerate Enterprise Data Mobilization
Real-Time, Evolving Graph View Across the Business
Data Ingestion, Cleansing, Reduction & Pipelining
Data sources: unstructured, legacy, snapshots, conformed & curated, streams
► Mobile & Web Applications: access control, metadata, recos, monitoring
► Real-time BI & Scorecards: KPIs, targets, reporting, drill down/across
► Data Science: attribution, similarity, fraud, pathing, cliques
► Marketing ROI & Digital Experience (CMO)
► Data Governance & Data Quality (CDO)
► Operations & Risk Management (CFO)
► Account Coverage & Customer LTV (CRO)
► Product Marketing & Recommendations (CPO)
Roadmap for Enterprise Graph Strategy
Small Team:
• Graph Architect
• Data Engineer
• Full-stack Developer
• Data Scientist
• Report Developer
Stages:
• Graphy Problem: business need, data sources
• Localhost POC: data modeling, API, example queries
• Cloud Pilot: data snapshot, reference architecture, API suite
• Production Build: hardening, scheduled & stream ETL, live UX
Workstreams: Stakeholder Input, Graph Design, Data Work, APIs / Data Services, Integration / Refinement, Scale / Harden / Run
Key Conversations:
• Problem / Scope: What will the graph solve?
• Validate: What questions can now be answered?
• Connect: Does the data support the graph model and semantics?
• Mobilize: What data does the new experience need?
• Use Cases: What is the feedback from the business on how well the graph solves the use case?
• Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
Talk to the business, pick a graphy problem
What is a “Graphy” problem?
• Requires many entities (e.g. many SQL tables, 360° views)
• Involves recursion (e.g. SQL self joins)
• Has complex, potentially colliding hierarchies (e.g. SQL one-to-many, many-to-many)
• Is based on the informatics of the relationships themselves (e.g. shared relationship counts for collaborative filtering, shortest-path segment summations for wayfinding, cost/time minimization for supply chain, money flows for finance)
• Requires mapping, direct or indirect, across data sources (e.g. data lake unification)
• Demands fast query results (e.g. digital applications, search)
• Most importantly, go talk to the business: what analytics would you like to have, or what customer experiences would you like to light up, that you can’t today because of current data limitations?
• What’s the most critical data that you’d like to see connected?
• What would be an example demo that you’d find compelling (report/analysis/experience)?
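As an illustration of a problem "based on the informatics of the relationships themselves", the sketch below scores recommendations by shared relationship counts, the collaborative filtering case named above. It is plain Python over invented sample data. The customer names, products and the scoring rule (one point per shared purchase) are assumptions made for the example and do not come from the deck.

```python
from collections import Counter

# Invented sample data: customer -> set of purchased products.
purchases = {
    "alice": {"boots", "sandals", "socks"},
    "bob": {"boots", "sandals", "laces"},
    "carol": {"socks"},
}


def recommend(customer, purchases):
    """Score products the customer lacks by how many purchases they share
    with each other customer who bought that product."""
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == customer:
            continue
        shared = len(mine & theirs)  # shared relationship count
        if shared == 0:
            continue
        for product in theirs - mine:
            scores[product] += shared
    return scores.most_common()


print(recommend("alice", purchases))
```

In a graph database the same question is a two-hop traversal from the customer through shared products to other customers, which is why it counts as "graphy".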
Get comfortable with Neo4j – don’t need to become an expert
• Get hands on – be fearless! Neo4j is the easiest graph database to learn.
• Install Neo4j and the APOC procedures, then set the following in Manage/Settings:
#Apoc Plugin Configurations
apoc.import.file.enabled=true
apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=*.*
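• Go through the Cypher lessons, and learn the basics of graph modeling and how to load CSV files, either with LOAD CSV:
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
RETURN row LIMIT 5
or with APOC:
CALL apoc.load.csv("file:///movies.csv", {}) YIELD map
RETURN map LIMIT 5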
• Any reasonably sized laptop should be able to handle a graph with several million nodes and relationships. You will quickly see some of the significant benefits of connected data.
• For extra credit, go to github.com/neo4j-examples and download starter applications for your favorite languages.
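For a first localhost experiment, a small script can produce both the CSV file and the matching LOAD CSV statement. The sketch below is one way to do that. The file name movies.csv comes from the slide, but the columns (title, released), the Movie label and the helper names are assumptions for illustration. Labels and property keys cannot be passed as Cypher parameters, so they are formatted into the statement and should only come from trusted code.

```python
import csv
import os
import tempfile


def write_csv(path, rows, fieldnames):
    """Write dict rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_csv_statement(file_name, label, key):
    """Build a LOAD CSV statement that MERGEs one node per row on `key`
    and copies the remaining columns onto the node as string properties."""
    return (
        f'LOAD CSV WITH HEADERS FROM "file:///{file_name}" AS row\n'
        f"MERGE (n:{label} {{{key}: row.{key}}})\n"
        "SET n += row"
    )


# Hypothetical sample rows; in practice the file goes in Neo4j's import folder.
rows = [
    {"title": "The Matrix", "released": "1999"},
    {"title": "Cloud Atlas", "released": "2012"},
]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "movies.csv")
    write_csv(path, rows, ["title", "released"])
    with open(path, encoding="utf-8") as handle:
        header_line = handle.readline().strip()

statement = load_csv_statement("movies.csv", "Movie", "title")
print(header_line)
print(statement)
```

Paste the printed statement into the Neo4j Browser once the file is in the import folder.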
Design and build your POC Graph
• Start small and simple: limit yourself to 3-4 data sources and shallow extracts. Use SQL TOP queries to snapshot a pool of linked transactions
• Use common-sense, business-friendly naming for your node labels and relationship types. You’ll iterate this model using input from the business, so the model should be clear and readable
• Don’t be afraid of recursion
(Employee)-[:REPORTS_TO]->(Employee) who is the boss?
• Don’t get too hung up on whether something should be a node label, property, or relationship. Just keep in mind that node labels define set members, and that it’s faster to search along relationships (traversal) than on unindexed properties (full graph scan)
• You can use CALL db.schema() to see the graph schema (CALL db.schema.visualization() in Neo4j 4.0 and later), and we often use http://apcjones.com/arrows/# to build illustrative schemas for conversations with business stakeholders
• Test your graph design by writing some example queries together with your business stakeholder
• Does this look right to you? Is this how you would whiteboard this process? Am I missing any key entities or relationships?
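The recursion in (Employee)-[:REPORTS_TO]->(Employee) can be sketched in a few lines. The Python below walks an invented reporting chain in memory and also holds a Cypher statement that asks the same "who is the boss?" question with a variable-length path. The employee names and the name property are assumptions for the example.

```python
# Invented sample hierarchy: employee -> manager (None marks the top).
reports_to = {"ana": "ben", "ben": "cho", "dev": "cho", "cho": None}


def chain_of_command(employee, reports_to):
    """Follow REPORTS_TO links upward and return every manager in order."""
    chain = []
    seen = {employee}
    manager = reports_to.get(employee)
    while manager is not None:
        if manager in seen:
            raise ValueError(f"cycle in reporting line at {manager!r}")
        chain.append(manager)
        seen.add(manager)
        manager = reports_to.get(manager)
    return chain


# Equivalent question in Cypher, assuming Employee nodes carry a `name` property.
# The * makes the relationship variable-length, so no self joins are needed.
BOSS_QUERY = """
MATCH (e:Employee {name: $name})-[:REPORTS_TO*]->(boss:Employee)
WHERE NOT (boss)-[:REPORTS_TO]->()
RETURN boss.name AS boss
"""

print(chain_of_command("ana", reports_to))
```

The last entry of the returned chain is the top of the hierarchy, which is what the Cypher WHERE clause selects.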
Example Knowledge Graph Schema for Spend and Supply Chain Analytics
Supplier 360° Spend Graph
• Accurately captures the sourcing complexity of products and services
• Enables more insightful indirect spend analytics for products and services
• Reconciles line-item detail to top parent company, across intermediate entities
• Extensible for audit, fraud detection, tracking & traceability
• Integrates with data lake, reporting platforms and transactional applications
Schema areas: Product Supply Chain, Service Providers, Procurement, Top Parent, Line Item Detail, Tracking and Traceability, Invoicing
Data fabric composed of nodes and relationships that connect and mobilize data, using consistent semantics
Example Customer 360° Graph Schema
Customer 360° Graph
• Accurately captures full range of customer touchpoints across enterprise surface area
• Enables more insightful indirect spend analytics for products and services
• Reconciles product usage, marketing interactions and digital identity
• Integrates with execution layer for AI driven UX
Schema areas: Account, Transactions, Segments, Product, Interactions
Example B2B MDM Graph Schema
Master Data Management Graph Schema
• Accurately captures data lineage for core identity components
• Provides “Golden Record” from multi-source probabilistic authority scores
• Relates contacts, customers, orders and products without loss of fidelity
• Enables detailed whitespace analysis and next best sales action
• Integrates with data lake and CRM applications
Schema areas: Product, Core Data Elements, Customer & Contact, Orders
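The deck does not say how the authority scores are computed or combined, so the following is only a simplified sketch of the "Golden Record" idea: for each field, take the value from the highest-scoring source that has one, and keep the source name for lineage. The source names, scores, fields and the highest-score-wins rule are all assumptions for illustration.

```python
def golden_record(records, authority):
    """Pick, per field, the non-empty value from the most authoritative source.

    records: list of {"source": str, "fields": {field: value}}
    authority: {source: score}, higher means more trusted (hypothetical scores)
    """
    ranked = sorted(
        records,
        key=lambda record: authority.get(record["source"], 0.0),
        reverse=True,
    )
    golden = {}
    for record in ranked:
        for field, value in record["fields"].items():
            if value and field not in golden:
                # Keep the source alongside the value to preserve lineage.
                golden[field] = {"value": value, "source": record["source"]}
    return golden


# Invented sample data for one company seen in three systems.
authority = {"crm": 0.9, "erp": 0.7, "web_form": 0.4}
records = [
    {"source": "web_form", "fields": {"phone": "555-0199", "website": "acme.example"}},
    {"source": "erp", "fields": {"name": "ACME Corporation", "phone": "555-0100"}},
    {"source": "crm", "fields": {"name": "Acme Corp", "phone": ""}},
]

golden = golden_record(records, authority)
print(golden)
```

In a graph model, each chosen value would typically stay linked to its source record node, which is what keeps lineage queryable.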
Example Polyglot Discovery Graph Schema
Data Discovery Graph Schema
• Connects structured, semi-structured and unstructured data across polyglot storage
• Accurately handles complex data and document hierarchies
• Enables full text search in graph or in document store, directly and via NLP
• Provides source document access through blob URLs
• Integrates with data lake, reporting platforms and transactional applications
Schema areas: Searchable Pointers to Unstructured Blobs, Text & Metrics from Semi-Structured Data, Structured Data and Derived Entities
Design and build your POC Graph
• Breakthrough queries
• Graph algorithms
• Data unification & mobilization
• Use-case specific (Customer 360, Supply Chain, Fraud, Reco)
• Make a localhost graph->app stack so you understand how parameterized Cypher & Bolt drivers work
• Use any of the neo4j-examples to jumpstart
• If you don’t want to spend time creating a REST API, check out GraphQL and the GRAND stack (https://github.com/grand-stack/grand-stack-starter)
• Focus on the business value of the new graph-enabled analytics:
We can now know this to make better decisions
We can now do this for our customers
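A minimal sketch of the parameterized Cypher and Bolt driver pairing mentioned above, using the official Neo4j Python driver. The query text and its parameters travel separately, so user input never becomes part of the Cypher string. The Person and Product labels and the fullName property follow the earlier example slide. The pr.name property, the function names and the connection details are assumptions. The driver import sits inside run_query so the rest of the sketch runs without a database or the driver installed.

```python
def purchases_query(full_name):
    """Return a Cypher statement and its parameters as separate values."""
    query = (
        "MATCH (p:Person {fullName: $fullName})-[:PURCHASED]->(pr:Product) "
        "RETURN pr.name AS product"
    )
    return query, {"fullName": full_name}


def run_query(uri, user, password, query, params):
    """Run a parameterized query over Bolt and return rows as dicts.

    Needs a reachable Neo4j instance and the driver package
    (pip install neo4j; it was published as neo4j-driver in 2019).
    """
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]
    finally:
        driver.close()


query, params = purchases_query("Steve Newman")
print(query)
print(params)
# Example call against a local instance (placeholder credentials):
# rows = run_query("bolt://localhost:7687", "neo4j", "<password>", query, params)
```

Keeping the statement constant and varying only the parameters also lets the database reuse the cached query plan.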
Neo4j - Power BI Integration with GraphQL
Components: PBI report, M query cURL wrapper, Neo4j GraphQL API (GraphQL schema, registered in Neo4j), Graph Database
1. Client issues GraphQL query
2. GraphQL API sends Cypher query to Neo4j
3. Response data sent to Client
4. Data updated in PBI report
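Step 1 of the flow above, the client issuing a GraphQL query, amounts to an HTTP POST with a JSON body. The sketch below builds that request with the Python standard library. The endpoint URL and the products field in the query are hypothetical, since the deck does not show the registered schema. The fetch helper is defined but not called, because it needs a running GraphQL API.

```python
import json
import urllib.request


def build_graphql_request(endpoint, query, variables=None):
    """Build the HTTP POST that carries a GraphQL query as JSON."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request):
    """Send the request and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and schema field, for illustration only.
request = build_graphql_request(
    "http://localhost:4000/graphql",
    "{ products { name } }",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

A reporting tool such as Power BI would issue the same kind of request from its own query layer and load the JSON response into the report.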
Neo4j – React Integration with GraphQL (GRAND Stack)
Pick and build your demo application for your snapshot graph
• Pick a cloud or on-prem
• Use Marketplace images if possible
• Start with a single-instance VM for Neo4j (RAM of roughly 50% of the SQL data size)
• Attach external drives so you can scale the server
• Determine your stack architecture
• Understand your data processing requirements
• Install Python, which is very good for performing batch operations (pip install neo4j-driver)
• Leverage Neo4j’s high speed loader
• Determine what cleansing needs to occur
• If you need help, reach out to an SI partner or Neo4j services
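One common shape for the Python batch operations mentioned above is to send rows in fixed-size chunks and let a single UNWIND statement process each chunk. The sketch below shows that pattern. The Person label, the personId and fullName properties and the batch size are assumptions. DryRunSession is a stand-in defined here so the example runs without a database. With the real driver you would pass a session from driver.session() instead.

```python
def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# One statement handles a whole batch; $rows is a list of maps.
MERGE_PEOPLE = """
UNWIND $rows AS row
MERGE (p:Person {personId: row.personId})
SET p.fullName = row.fullName
"""


def load_in_batches(session, rows, size=10_000):
    """Run MERGE_PEOPLE once per batch and return the number of batches."""
    batches = 0
    for batch in chunked(rows, size):
        session.run(MERGE_PEOPLE, rows=batch)
        batches += 1
    return batches


class DryRunSession:
    """Stand-in for a driver session that only records batch sizes."""

    def __init__(self):
        self.batch_sizes = []

    def run(self, query, **params):
        self.batch_sizes.append(len(params["rows"]))


# Invented sample rows.
people = [{"personId": i, "fullName": f"Person {i}"} for i in range(25)]
session = DryRunSession()
batch_count = load_in_batches(session, people, size=10)
print(batch_count, session.batch_sizes)
```

Batching keeps each transaction small enough to commit quickly while avoiding one network round trip per row.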
Pick and build your demo application for your snapshot graph
Michael’s I-Frame Model for Graph ROI: Accelerate Graph-driven User Experiences
• MVP data domains
• Graph database, app-informed
• Simplest data service (REST, Bolt)
• MVP app experience (Nodejs, .Net, Python, React, Swift, Tableau, etc.)
• Add new experiences, same data
• Add new data domains
Example Graph Architecture (Azure VPC)
Stages: Ingest Batch, Ingest Real-time, Store, Search, Consolidate, Connect & Unify, Mobilize, Execution
Layers: Semantic Layer, Analytics Layer
External systems: CRM, ERP, Adobe Experience Cloud (Audience Manager, Campaign, Target, Experience Manager, Analytics), Marketo Engage, Syndicated Data and Analytics
Platform components: Scheduled ETL, Azure Data Factory, Stream ETL (Azure Event Hub), Azure Cloud Storage (Blobs, Files, Queues, Tables), Cloud Data Lake, Data Reduction (Azure Spark), In-Memory Document Store, In-Memory Knowledge Graph, Data Services APIs (REST), Data Models (Azure Analysis Services), Data Catalog (Azure Data Catalog), AI Sandbox (Azure ML Studio), Reporting (Tableau, PBI)
Capabilities: Automated Data Loading, Real Time Updates, Customer Events, Data Staging, Elastic Repository for Raw and Unstructured Data, Elastic SQL Repository for Curated & Conformed Data, In-Memory Sessionization, Data Aggregation, Real-time Document Search, Knowledge Graph (Customer/Contact 360° View, Marketing Attribution, Recommendations), Consistent Metrics, Data Discovery, Retention Models, Deep Learning, Automated Reports and Dashboards, Triggered Marketing, Consistent Experience
Example Graph Architecture (AWS VPC)
Stages: Ingest Batch, Ingest Real-time, Store, Search, Consolidate, Connect & Unify, Mobilize, Execution
Layers: Semantic Layer, Analytics Layer
External systems: CRM, ERP, Adobe Experience Cloud on Azure (Audience Manager, Campaign, Target, Experience Manager, Analytics), Marketo Engage, Syndicated Data and Analytics
Platform components: Scheduled ETL (AWS Data Pipeline, PDI Kettle), Stream ETL (AWS Kinesis), AWS Cloud Storage (S3, Blobs, Files, Queues, EBS, Tables), Cloud Data Lake, Data Reduction (AWS EMR), In-Memory Document Store, In-Memory Knowledge Graph, Data Services APIs (REST), Data Discovery (AWS Athena), Data Catalog (AWS Glue), Machine Learning (AWS SageMaker), Reporting (Tableau, QuickSight)
Capabilities: Automated Data Loading, Real Time Updates, Customer Events, Data Staging, Elastic Repository for Raw and Unstructured Data, Elastic SQL Repository for Curated & Conformed Data, Sessionization, Data Aggregation, Real-time Document Search, Knowledge Graph (Customer/Contact 360° View, Marketing Attribution, Recommendations), Consistent Data Models, Data Discovery, Retention Models, Deep Learning, Automated Reports and Dashboards, Triggered Marketing, Consistent Experience
Enterprise Knowledge Graph Development with Neo4j
• Locate and validate data lake tables
• Design test graph schema
• Estimate graph size from nodes, relationships and properties
• Configure Neo4j server to minimize SSD disk contention
• Prepare Hive queries to generate graph-form tables (nodes, relationships)
• Validate key uniqueness, string handling, character types, relationship mappings
• Export graph form tables to gzip csv files
• Iteratively test data loader scripts, file by file
• On successful completion of hydration, apply constraints and indexes, refactor as needed
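The key-uniqueness check and the gzip CSV export steps above can be sketched as follows. The header rows use the :ID, :LABEL, :START_ID, :END_ID and :TYPE column markers of Neo4j's bulk import tool, on the assumption that this is the "high speed loader" the deck refers to. The column names, sample rows and helper names are invented for the example.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter

# Header rows in the bulk importer's CSV format (column names are assumptions).
NODE_HEADER = ["personId:ID", "fullName", ":LABEL"]
REL_HEADER = [":START_ID", ":END_ID", ":TYPE"]


def duplicate_keys(rows, key_index):
    """Return the sorted key values that occur more than once."""
    counts = Counter(row[key_index] for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


def write_gzip_csv(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Invented sample graph-form table for nodes.
node_rows = [
    ["p1", "Ada Example", "Person"],
    ["p2", "Bo Example", "Person"],
]
duplicates = duplicate_keys(node_rows, 0)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "persons.csv.gz")
    write_gzip_csv(path, NODE_HEADER, node_rows)
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        round_trip = list(csv.reader(handle))

print(duplicates)
print(round_trip)
```

Checking for duplicate IDs before the load matters because the bulk importer expects node IDs to be unique within an ID space.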
Data Lake Tables → EXTRACT → Graph-form Tables → EXTRACT → CSV.gz Files → Load Script (HIGH SPEED LOADER) → Data Store
IMPORT DONE in 1h 29m 16s 530ms.
Imported:
458356377 nodes
2176603843 relationships
9064981812 properties
Peak memory usage: 9.46 GB
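For the "estimate graph size" step, a rough first pass multiplies the record counts by fixed per-record store sizes. The sketch below applies that arithmetic to the counts in the import log above. The byte sizes (15 per node, 34 per relationship, 41 per property record) are the figures Neo4j's 3.x capacity-planning guidance gave for its standard store format. Treat them as assumptions to verify for your version. Using one property record per property is a simplification that tends to overestimate, and indexes and long strings are not covered.

```python
# Assumed per-record sizes in bytes (Neo4j 3.x standard store format).
NODE_BYTES = 15
RELATIONSHIP_BYTES = 34
PROPERTY_BYTES = 41


def estimate_store_bytes(nodes, relationships, properties):
    """Rough store size: record counts times fixed record sizes."""
    return (
        nodes * NODE_BYTES
        + relationships * RELATIONSHIP_BYTES
        + properties * PROPERTY_BYTES
    )


def to_gigabytes(size_in_bytes):
    """Convert bytes to decimal gigabytes, rounded to one decimal place."""
    return round(size_in_bytes / 1_000_000_000, 1)


# Counts taken from the import log on this slide.
estimate = estimate_store_bytes(
    nodes=458_356_377,
    relationships=2_176_603_843,
    properties=9_064_981_812,
)
print(f"{to_gigabytes(estimate)} GB (rough upper-bound estimate)")
```

The result is a sizing aid for disk and page cache planning, not a prediction of the exact on-disk footprint.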
Polyglot Graph Data Processing
Data Mobilization and Graph Unification: Full Lineage and Auditability
Leveraging fit-for-purpose storage:
• Graph storage for unified many-to-many access to cross-domain data
• Document storage for searchable access to semi-structured data
• Blob storage repository for large, raw and unstructured data
Data tiers:
• Unstructured: 37,157 blobs, 5.5 TB. Extract and load Azure Blob URIs
• Semi-Structured: 20,573 JSONs, 5 GB. Extract XML, convert to JSON, load JSON with Azure Blob URI. Couchbase Full Text Search
• Structured: 215K nodes & relationships, 1.5 GB. Load CSV to graph: extract and load document metadata, named entities, map relationships, text summaries. Graph Analytics & Queries, Pointers to Azure Blob URIs
Consumers: Reports/Applications
Go to Production
• Follow your IT best practices
• Security: assume you’ll be breached
• Deploy the full environment set: Prod cluster, Stg cluster, Test, Dev
• DevOps - leverage Jenkins, Ansible
• Wrap your solution in test automation
• Do load testing against your APIs to look for additional
optimization opportunities (Gatling)
• Monitor your logs (Splunk, Dynatrace)
• Monitor your common queries, refactor or reindex as
needed, optimize for speed
• Leverage the I-Frame Model to provide more value
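Before reaching for a full load-testing tool such as Gatling, a quick way to see whether a common query needs refactoring or an index is to time repeated calls and look at latency percentiles. The sketch below does that for any callable. The nearest-rank percentile rule and the helper names are choices made for this example, and the sample timings are invented.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct from 0 to 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, repeats):
    """Call fn `repeats` times and return each duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Invented latencies in milliseconds, e.g. collected from an API endpoint.
sample_ms = list(range(1, 101))
p50 = percentile(sample_ms, 50)
p95 = percentile(sample_ms, 95)
print(p50, p95)

# Timing a stand-in workload; replace the lambda with a real query call.
timings = time_calls(lambda: sum(range(1000)), repeats=5)
print(len(timings))
```

Tracking the 95th percentile instead of the average makes slow outlier queries visible, which is usually where reindexing pays off.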
EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders
Fortune 100 Tech Company
Use Case: Global B2B Account 360° view and marketing attribution
Approach: Neo4j graph with 500M nodes and 2.2B relationships, representing all known business accounts, contacts and marketing touches. Mastered data from 17 disparate transactional sources in Azure Data Lake. Supported in-graph analytics for marketing attribution and next best action recommendations across global geographies.
Duration: 16 weeks to working graph
Fortune 100 Footwear Company
Use Case: Converged Brick & Mortar + Online Shopper 360° View
Approach: Neo4j graph with 2B nodes and relationships, representing sales transactions for 40M shoppers across 275 physical stores and the ecommerce platform. Algorithmic extraction and profiling from raw XML records in AWS Hadoop, MDM record concordance and in-graph analytics for product associations, store analytics and recommendation services.
Duration: 12 weeks to working graph, ongoing project through 2018
Fortune 500 Cruise Line Company
Use Case: Shipboard and Shoreside Recommendation Engine
Approach: Neo4j graph deployable to shipboard VMware data centers, with streaming updates from large shoreside Neo4j graph integrating data from Azure Cerebro, Adobe Experience Manager and legacy transactional systems. In-graph analytics, services API, recommendation engine for next best activity for passengers surfaced via mobile app.
Duration: 12 weeks to working graph, ongoing project through 2018
Fortune 100 Investment Firm
Use Case: Enhanced Anti-Money Laundering and Fraud Detection using Graph+AI
Approach: Neo4j graph of account 360° view representing activity of 2M accounts over 4 years. MDM and entity extraction for account and party identity elements from enterprise Oracle system. Network clustering, feature engineering and graph embedding in TensorFlow deep learning classifier for suspicious activity patterns across accounts and between parties.
Duration: 16 weeks to working graph
Fortune 100 Tech Company
Use Case: B2B Local Marketing Events Recommendation Engine
Approach: Neo4j graph and personalized next best event recommendation engine for B2B field marketers. Reconciles physical and digital event attendees with corporate account structures for 10K accounts and 5M contacts. Entities mastered from transactional data in SQL Server and Azure Data Lake. Microservices APIs support data syndication to martech applications and Power BI reporting.
Duration: 10 weeks to working graph
Better Questions
How can I get more business value and deeper
insights from the data I already have?
How can I get a better understanding of my customers to
create more relevant experiences?
How can I more effectively mobilize and
syndicate the data I’m ingesting?
What is the next best action I can take?
Thank You!
Michael Moore, Ph.D.
Executive Director
► Michael Moore is an Executive Director and Practice Lead for Graph + AI
in EY’s Tech Consulting Emerging Technology (ET) Group
► Joined EY in 2017, based in the Seattle, WA office
► Ph.D. University of California, Berkeley
► B.S. & B.A. University of California, Santa Cruz
Select professional experience
► Society Consulting – Graph Architect
Schema, ETL & systems design for a high-performance Neo4j graph database encompassing the totality of Microsoft’s B2B data on Azure VM. Graph database supports multi-touch marketing attribution analytics and multi-dimensional event-based audience segmentation & recommendations for direct marketing. Provided POC graph reporting and visualization interfaces. Neo4j Enterprise edition, Python, Node.js, nGraph, JavaScript.
► Microsoft Corporation – General Manager
Management of core BI infrastructure and measurement capabilities supporting Microsoft's global marketing budget cascade, campaign reporting, pipeline reporting, incentive reporting, ROMI reporting, social and web analytics on Microsoft.com for the Global Marketing Operations team. Management of complex projects across multiple subsidiaries, agencies and vendors. Strategic focus on foundational database, digital and social marketing capabilities including: marketing ROI, customer & channel partner engagement, marketing conversion, sales pipeline, dynamic personalization, data mining, predictive modeling, behavioral segmentation, privacy governance, web enablement, tracking & measurement, internal & external data quality, and instrumentation process control.
► Grey San Francisco – VP Analytics
Responsible for ongoing campaign reporting, ROI analysis, creative and placement optimizations for agency clients. Architected and deployed an enterprise OLAP reporting solution on Oracle RAC / MicroStrategy to improve quality and efficiency of analytics operations. Provided advanced analytical services to clients in retail, tech, banking and automotive, including consulting, regression modeling and data mining.
Profile
► Michael Moore, Ph.D. is an Executive Director in the Advisory Services practice of Ernst & Young LLP. He is the National practice lead for Enterprise Knowledge Graphs + AI in EY’s Data and Analytics (DnA) Group.
Skills and tool knowledge
► Michael has industry and solution experience in customer experience, customer service, e-commerce, ad-serving, web and media analytics, consumer loyalty and churn, marketing optimization, enterprise and partner pipeline, and social media.
► He specializes in graph database architecture, graph-based advanced analytics, machine learning and recommender systems. Michael is a Neo4j Certified Professional, and has active enterprise graph engagements in financial services, tech, oil & gas, retail and hospitality sectors.

Mais conteúdo relacionado

Mais procurados

Graph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4jGraph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4j
Neo4j
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
Willy Lulciuc
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
Neo4j
 

Mais procurados (20)

Graph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4jGraph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4j
 
The path to success with graph database and graph data science_ Neo4j GraphSu...
The path to success with graph database and graph data science_ Neo4j GraphSu...The path to success with graph database and graph data science_ Neo4j GraphSu...
The path to success with graph database and graph data science_ Neo4j GraphSu...
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Optimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j GraphOptimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j Graph
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
 
The Knowledge Graph Explosion
The Knowledge Graph ExplosionThe Knowledge Graph Explosion
The Knowledge Graph Explosion
 
Neo4j in Production: A look at Neo4j in the Real World
Neo4j in Production: A look at Neo4j in the Real WorldNeo4j in Production: A look at Neo4j in the Real World
Neo4j in Production: A look at Neo4j in the Real World
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
Full Stack Graph in the Cloud
Full Stack Graph in the CloudFull Stack Graph in the Cloud
Full Stack Graph in the Cloud
 
ntroducing to the Power of Graph Technology
ntroducing to the Power of Graph Technologyntroducing to the Power of Graph Technology
ntroducing to the Power of Graph Technology
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
The Data Platform for Today's Intelligent Applications.pdf
The Data Platform for Today's Intelligent Applications.pdfThe Data Platform for Today's Intelligent Applications.pdf
The Data Platform for Today's Intelligent Applications.pdf
 
Workshop Introduction to Neo4j
Workshop Introduction to Neo4jWorkshop Introduction to Neo4j
Workshop Introduction to Neo4j
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with Dremio
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 

Semelhante a Your Roadmap for An Enterprise Graph Strategy

Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
marksimpsongw
 

Semelhante a Your Roadmap for An Enterprise Graph Strategy (20)

Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Roadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyRoadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph Strategy
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Using ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime ValueUsing ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime Value
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDS
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Produktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4jProduktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4j
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
 
MuleSoft Meetup June London 2023.pptx.pdf
MuleSoft Meetup June London 2023.pptx.pdfMuleSoft Meetup June London 2023.pptx.pdf
MuleSoft Meetup June London 2023.pptx.pdf
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the Data
 
Integrating Advanced Analytics with Autodesk Solutions
Integrating Advanced Analytics with Autodesk SolutionsIntegrating Advanced Analytics with Autodesk Solutions
Integrating Advanced Analytics with Autodesk Solutions
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
 
3 Steps to Accelerate to Cloud
3 Steps to Accelerate to Cloud3 Steps to Accelerate to Cloud
3 Steps to Accelerate to Cloud
 
Data Discovery and BI - Is there Really a Difference?
Data Discovery and BI - Is there Really a Difference?Data Discovery and BI - Is there Really a Difference?
Data Discovery and BI - Is there Really a Difference?
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 

Your Roadmap for An Enterprise Graph Strategy

  • 1. Roadmap for Enterprise Graph Strategy Michael Moore, Ph.D. Executive Director, Enterprise Knowledge Graphs + AI EY Performance Improvement Advisory michael.moore4@ey.com July 18, 2019
  • 2. The Database Landscape is Changing SQL RDBMS Column Document Key Value Graph Search Serverless Streams In-Memory Traditional Databases & Data Warehousing NoSQL Databases Data Services & Data Processing Batch MR Blockchain
  • 3. Scale Out  Scale Up Continued increase in capacity and dropping compute costs are challenging scale-out commodity server assumptions, particularly for database workloads 2018
  • 4. Rankings Change in Popularity (db-engines.com) *Proprietary method based on general interest, mentions, relevance in social networks, frequency of technical discussions etc. Graph DBs
  • 5. What is a Graph Database? A database specifically designed for creating, storing, and querying graphs. “We send email to people, so they will visit our website and buy our product” MATCH (e:Email)-[:SENT_TO]->(p:Person {fullName: 'Steve Newman'})-[:VISITED]->(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p) RETURN * Semantic Representation, Graph Representation, Physical Representation ► Graphs store relationships explicitly instead of computing joins at query time, so multi-hop queries are often much faster than the equivalent SQL ► Graphs are fast and easy to understand, develop and use ► Graphs integrate well with applications and data sources, great for real-time digital workloads ► Graphs surface, unify and mobilize data held in silos and data lakes
  • 6. This is a Graph.
  • 7. This is a Graph.
  • 8. This is a Graph.
  • 9. Graph Use Cases ► Customer 360° ► Recommendation Engines ► Marketing Attribution ► Enterprise Search ► Fraud Detection ► Master Data Management ► Supply Chain ► Geolocation & Routing ► Access & Asset Control ► Social Networks ► IT & Network Management
  • 10. Graphs Accelerate Enterprise Data Mobilization Real-Time, Evolving Graph View Across the Business Data Ingestion, Cleansing, Reduction & Pipelining Real-time BI & Scorecards, Mobile & Web Applications, Data Science access control, metadata, recos, monitoring KPIs, targets, reporting, drill down/across attribution, similarity, fraud, pathing, cliques Marketing ROI & Digital Experience (CMO) Data Governance & Data Quality (CDO) Operations & Risk Management (CFO) Account Coverage & Customer LTV (CRO) Product Marketing & Recommendations (CPO) UNSTRUCTURED LEGACY SNAPSHOTS CONFORMED & CURATED STREAMS
  • 11. Roadmap for Enterprise Graph Strategy Small Team: • Graph Architect • Data Engineer • Full-stack Developer • Data Scientist • Report Developer Problem / Scope: What will the graph solve? Stages: Graphy Problem (business need, data sources) → Localhost POC (data modeling, API, example queries) → Cloud Pilot (data snapshot, reference architecture, API suite) → Production Build (hardening, scheduled & stream ETL, live UX) Activities: Stakeholder Input, Graph Design, Data Work, APIs / Data Services, Integration / Refinement, Scale / Harden / Run Key Conversations: Validate: What questions can now be answered? Connect: Does the data support the graph model and semantics? Mobilize: What data does the new experience need? Use Cases: What is the feedback from the business on how well the graph solves the use case? Deploy: What monitoring, testing and process need to be put in place to achieve a robust SLA?
  • 12. Talk to the business, pick a graphy problem What is a “Graphy” problem? • Requires many entities (e.g. many SQL tables, 360° views) • Involves recursion (e.g. SQL self joins) • Has complex, potentially colliding, hierarchies (e.g. SQL one-to-many, many-to-many) • Is based on the informatics of the relationships themselves (e.g. collaborative filtering from shared relationship counts, shortest-path segment summations for wayfinding, cost/time minimization for supply chain, money flows for finance) • Requires mapping, direct or indirect, across data sources (e.g. data lake unification) • Demands fast query results (e.g. digital applications, search) • Most importantly, go talk to the business: what analytics would you like to have, or what customer experiences would you like to light up, that you can’t because of current data limitations? • What’s the most critical data that you’d like to see connected? • What would be an example demo that you’d find compelling (report/analysis/experience)? Graphy Problem → Localhost POC → Cloud Pilot → Production Build
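To make "collaborative filtering from shared relationship counts" concrete, here is a minimal in-memory sketch in Python. The people, products and the `recommend` helper are hypothetical illustrations, not part of the deck; in a graph database the same idea would be expressed as a traversal over PURCHASED relationships.

```python
from collections import Counter

# Hypothetical PURCHASED relationships: person -> set of products.
purchased = {
    "ann": {"boots", "socks", "laces"},
    "bob": {"boots", "socks", "polish"},
    "cho": {"sandals"},
}


def recommend(person, purchased):
    """Rank products the person has not bought, scored by how many
    purchases the person shares with each other buyer of that product."""
    mine = purchased[person]
    scores = Counter()
    for other, theirs in purchased.items():
        if other == person:
            continue
        shared = len(mine & theirs)  # shared relationship count
        if shared == 0:
            continue  # no overlap, so this person carries no signal
        for product in theirs - mine:
            scores[product] += shared
    return scores.most_common()


if __name__ == "__main__":
    print(recommend("ann", purchased))
```

The score depends only on the relationships themselves, which is what makes the problem "graphy": no product or customer attributes are consulted.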
  • 13. Get comfortable with Neo4j – you don’t need to become an expert • Get hands on – be fearless! Neo4j is the easiest graph database to learn. • Install Neo4j and the APOC procedures, then set the following in Manage/Settings: #APOC Plugin Configurations apoc.import.file.enabled=true apoc.export.file.enabled=true dbms.security.procedures.unrestricted=*.* • Go through the Cypher lessons, and learn basic graph modeling and how to load CSV files, either with LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row or with CALL apoc.load.csv(url,{}) YIELD map • Any reasonably sized laptop should be able to handle a graph with several million nodes and relationships. You will quickly see some of the significant benefits of connected data. • For extra credit you can go to github/neo4j-examples and download starter applications for your favorite languages. Graphy Problem → Localhost POC → Cloud Pilot → Production Build
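As a complement to LOAD CSV, the sketch below shows one way a small CSV could be prepared for loading from Python. The file contents, column names and the `Movie` label are assumptions made for illustration; only the idea of loading `movies.csv` comes from the slide. The rows are batched so each batch can be bound to a single `$rows` parameter.

```python
import csv
import io

# Hypothetical contents of movies.csv; the column names are assumptions.
MOVIES_CSV = """movieId,title,released
1,The Matrix,1999
2,Cloud Atlas,2012
3,Speed Racer,2008
"""

# Parameterized Cypher: $rows is bound by the driver, never string-formatted.
MERGE_MOVIES = (
    "UNWIND $rows AS row "
    "MERGE (m:Movie {movieId: row.movieId}) "
    "SET m.title = row.title, m.released = row.released"
)


def read_rows(text):
    """Parse CSV text into typed dictionaries, one per movie."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "movieId": int(row["movieId"]),
            "title": row["title"],
            "released": int(row["released"]),
        })
    return rows


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    for batch in batches(read_rows(MOVIES_CSV), 2):
        print(len(batch), "rows ready for", MERGE_MOVIES[:20], "...")
```

With the official Python driver, each batch could then be sent with something like `session.run(MERGE_MOVIES, rows=batch)`; that call is not exercised here because it needs a running database.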
  • 14. Design and build your POC Graph • Start small and simple: limit yourself to 3-4 data sources and shallow extracts. Snapshot the top SQL queries for a pool of linked transactions • Use common-sense, business-friendly naming for your node labels and relationship types. You’ll iterate this model using input from the business, and the model should be clear and readable • Don’t be afraid of recursion: (Employee)-[:REPORTS_TO]->(Employee), who is the boss? • Don’t get too hung up on whether something should be a node label, property, or relationship. Just keep in mind that node labels define set members, and that it’s faster to search along relationships (traversal) than properties (full graph scan) • You can use CALL db.schema() to see the graph schema, and we often use http://apcjones.com/arrows/# to build illustrative schemas for conversations with business stakeholders • Test your graph design by writing some example queries, and do this with your business stakeholder • Does this look right to you? Is this how you would whiteboard this process? Am I missing any key entities or relationships? Graphy Problem → Localhost POC → Cloud Pilot → Production Build
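The recursive (Employee)-[:REPORTS_TO]->(Employee) pattern can be illustrated with a small Python sketch. The employee names and the `chain_of_command` helper are hypothetical; in Cypher the equivalent question would typically use a variable-length pattern such as `(e)-[:REPORTS_TO*]->(boss)`.

```python
# Hypothetical REPORTS_TO relationships: employee -> direct manager.
reports_to = {
    "dev": "lead",
    "lead": "director",
    "director": "ceo",
}


def chain_of_command(employee, reports_to):
    """Follow REPORTS_TO hops upward and return every manager in order.

    The last entry is the person with no manager, i.e. "the boss".
    """
    chain = []
    seen = {employee}
    current = employee
    while current in reports_to:
        current = reports_to[current]
        if current in seen:
            # Guard against bad data such as A reports to B reports to A.
            raise ValueError("cycle in REPORTS_TO at " + current)
        seen.add(current)
        chain.append(current)
    return chain


if __name__ == "__main__":
    chain = chain_of_command("dev", reports_to)
    print("chain:", chain)
    print("boss:", chain[-1] if chain else "dev")
```

In SQL the same walk needs a self join per level or a recursive common table expression, which is why recursion is listed as a sign of a graphy problem.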
  • 15. Example Knowledge Graph Schema for Spend and Supply Chain Analytics Supplier 360° Spend Graph • Accurately captures the sourcing complexity of products and services • Enables more insightful indirect spend analytics for products and services • Reconciles line-item detail to top parent company, across intermediate entities • Extensible for audit, fraud detection, tracking & traceability • Integrates with data lake, reporting platforms and transactional applications Schema areas: Product Supply Chain, Service Providers, Procurement, Top Parent, Line Item Detail, Tracking and Traceability, Invoicing Data fabric composed of nodes and relationships that connect and mobilize data, using consistent semantics
  • 16. Example Customer 360° Graph Schema Schema areas: Account, Transactions, Segments, Product, Interactions Customer 360° Graph • Accurately captures full range of customer touchpoints across enterprise surface area • Enables more insightful indirect spend analytics for products and services • Reconciles product usage, marketing interactions and digital identity • Integrates with execution layer for AI-driven UX
  • 17. Example B2B MDM Graph Schema Schema areas: Product, Core Data Elements, Customer & Contact, Orders Master Data Management Graph Schema • Accurately captures data lineage for core identity components • Provides a “Golden Record” from multi-source probabilistic authority scores • Relates contacts, customers, orders and products without loss of fidelity • Enables detailed whitespace analysis and next best sales action • Integrates with data lake and CRM applications
  • 18. Example Polyglot Discovery Graph Schema Searchable Pointers to Unstructured blobs Text & Metrics from Semi-Structured data Structured Data and Derived Entities Data Discovery Graph Schema • Connects structured, semi-structured and unstructured data across polyglot storage • Accurately handles complex data and document hierarchies • Enables full text search in graph or in document store, directly and via NLP • Provides source document access through blob URLs • Integrates with data lake, reporting platforms and transactional applications
  • 19. Design and build your POC Graph • Breakthrough queries • Graph algorithms • Data unification & mobilization • Use-case specific (Customer 360, Supply Chain, Fraud, Reco) • Make a localhost graph->app stack so you understand how parameterized Cypher & Bolt drivers work • Use any of the neo4j-examples to jumpstart • If you don’t want to spend time creating a REST API, check out GraphQL and the GRAND stack (https://github.com/grand-stack/grand-stack-starter) • Focus on the business value of the new graph-enabled analytics: “We can now know this to make better decisions”, “We can now do this for our customers” Graphy Problem → Localhost POC → Cloud Pilot → Production Build
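A minimal sketch of what "parameterized Cypher" means in application code follows. The query, the `w.url` property and the `FakeTx` stand-in are assumptions for illustration; `FakeTx` only mimics the `run(query, **parameters)` shape of a driver transaction so the example runs without a database.

```python
# The user-supplied value travels as the $fullName parameter; it is never
# concatenated into the query text. The w.url property is an assumption.
FIND_VISITS = (
    "MATCH (p:Person {fullName: $fullName})-[:VISITED]->(w:Website) "
    "RETURN w.url AS url"
)


def visited_sites(tx, full_name):
    """Transaction function: run the parameterized query, collect URLs."""
    result = tx.run(FIND_VISITS, fullName=full_name)
    return [record["url"] for record in result]


class FakeTx:
    """Hypothetical stand-in for a driver transaction, for local testing."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def run(self, query, **params):
        # Record what would be sent over Bolt: query text plus parameters.
        self.calls.append((query, params))
        return list(self.rows)


if __name__ == "__main__":
    tx = FakeTx([{"url": "https://example.com"}])
    print(visited_sites(tx, "Steve Newman"))
    print(tx.calls[0][1])
```

Against a real database the same function could be passed to the driver's managed-transaction helper (`session.read_transaction` in older driver releases, `session.execute_read` in newer ones). Keeping values in parameters lets the server reuse the query plan and avoids injection through user input.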
  • 20. Neo4j - Power BI Integration with GraphQL Components: Graph Database (Neo4j), GraphQL API, GraphQL schema registered in Neo4j, M query cURL wrapper, PBI report Flow: 1. Client issues GraphQL query 2. GraphQL API sends Cypher query to Neo4j 3. Response data sent to Client 4. Data updated in PBI report
  • 21. Neo4j – React Integration with GraphQL (GRAND Stack)
  • 22. Pick and build your demo application for your snapshot graph • Pick a cloud or on-prem • Use Marketplace images if possible • Start with a single-instance VM for Neo4j (RAM of roughly 50% of the SQL data size) • Attach external drives so you can scale the server • Determine your stack architecture • Understand your data processing requirements • Install Python, which is very good for performing batch operations (pip install neo4j-driver) • Leverage Neo4j’s high speed loader • Determine what cleansing needs to occur • If you need help, reach out to an SI partner or Neo4j services Graphy Problem → Localhost POC → Cloud Pilot → Production Build
  • 23. Pick and build your demo application for your snapshot graph Michael’s I-Frame model for Graph ROI: Accelerate Graph-driven User Experiences • MVP data domains • Graph database, app-informed • Simplest data service • MVP app experience • Add new experiences, same data • Add new data domains Nodejs, .Net, Python, React, Swift, Tableau, etc. REST, Bolt Graphy Problem → Localhost POC → Cloud Pilot → Production Build
  • 24. Example Graph Architecture CRM Reporting (Tableau, PBI) Blobs Files Queues Tables Azure Cloud Storage AI Sandbox (Azure ML Studio) Stream ETL (Azure Event Hub) Audience Manager Campaign Target Experience Manager Analytics Marketo Engage Adobe Experience Cloud Scheduled ETL Data Reduction (Azure Spark) Cloud Data Lake In-Memory Document Store Data Models (Azure Analysis Services) Data Catalog (Azure Data Catalog) ERP AZURE VPC In-Memory Knowledge Graph Data Services APIs REST Ingest Batch Store Ingest Real-time Search Consolidate Connect & Unify Mobilize Execution Semantic Layer Analytics Layer Azure Data Factory Automated Reports and Dashboards Consistent Metrics Data Discovery Retention Models Deep Learning In-Memory Sessionization Data Aggregation Syndicated Data and Analytics Knowledge Graph Customer/Contact 360° View Marketing Attribution Recommendations Real-time Document Search Elastic SQL Repository for Curated & Conformed Data Data Staging Elastic Repository for Raw and Unstructured Data Real Time Updates Customer Events Automated Data Loading Triggered Marketing Consistent Experience
  • 25. Example Graph Architecture Reporting (Tableau, QuickSight) S3 Blobs Files Queues EBS Tables AWS Cloud Storage Data Discovery (AWS Athena) Stream ETL (AWS Kinesis) Audience Manager Campaign Target Experience Manager Analytics Marketo Engage Adobe Experience Cloud (Azure) Scheduled ETL (AWS Data Pipeline, PDI Kettle) Data Reduction (AWS EMR) Cloud Data Lake In-Memory Document Store Machine Learning (AWS SageMaker) Data Catalog (AWS Glue) ERP AWS VPC In-Memory Knowledge Graph Data Services APIs REST Ingest Batch Store Ingest Real-time Search Consolidate Connect & Unify Mobilize Execution Semantic Layer Analytics Layer Automated Reports and Dashboards Retention Models Deep Learning Data Discovery Consistent Data Models Sessionization Data Aggregation Knowledge Graph Customer/Contact 360° View Marketing Attribution Recommendations Real-time Document Search Elastic SQL Repository for Curated & Conformed Data Data Staging Elastic Repository for Raw and Unstructured Data CRM Real Time Updates Customer Events Automated Data Loading Triggered Marketing Consistent Experience Syndicated Data and Analytics
  • 26. Enterprise Knowledge Graph Development with Neo4j • Locate and validate data lake tables • Design test graph schema • Estimate graph size from nodes, relationships and properties • Configure Neo4j server to minimize SSD disk contention • Prepare Hive queries to generate graph-form tables (nodes, relationships) • Validate key uniqueness, string handling, character types, relationship mappings • Export graph-form tables to gzip CSV files • Iteratively test data loader scripts, file by file • On successful completion of hydration, apply constraints and indexes, refactor as needed Pipeline: Data Lake Tables → EXTRACT → Graph-form Tables → EXTRACT → CSV.gz Files → Load Script → HIGH SPEED LOADER → Data Store Example loader output: IMPORT DONE in 1h 29m 16s 530ms. Imported: 458356377 nodes 2176603843 relationships 9064981812 properties Peak memory usage: 9.46 GB
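The validation and export steps above (key uniqueness, relationship mappings, gzip CSV files) can be sketched in Python as follows. The sample rows, column names (`id`, `start_id`, `end_id`) and helper functions are hypothetical; the deck itself generates graph-form tables with Hive queries, so this only illustrates the checks, not the author's pipeline.

```python
import csv
import gzip
import io
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that occur more than once in a node table."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def dangling_relationships(rels, node_ids, start="start_id", end="end_id"):
    """Return relationship rows whose endpoints are missing from the nodes."""
    return [r for r in rels if r[start] not in node_ids or r[end] not in node_ids]


def to_gzip_csv(rows, fieldnames):
    """Serialize rows as a gzip-compressed CSV and return the raw bytes."""
    buffer = io.BytesIO()
    with gzip.open(buffer, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical graph-form tables: one node table, one relationship table.
    accounts = [
        {"id": "a1", "name": "Acme"},
        {"id": "a2", "name": "Globex"},
        {"id": "a1", "name": "Acme (duplicate)"},
    ]
    owns = [
        {"start_id": "a1", "end_id": "a2"},
        {"start_id": "a2", "end_id": "a9"},  # a9 is not a known node
    ]
    print("duplicate keys:", duplicate_keys(accounts, "id"))
    print("dangling:", dangling_relationships(owns, {r["id"] for r in accounts}))
    print("bytes:", len(to_gzip_csv(accounts, ["id", "name"])))
```

Running checks like these before the bulk load is cheaper than discovering duplicate keys or missing endpoints after a multi-hour import.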
  • 27. Polyglot Graph Data Processing Extract XML, Convert to JSON, Load JSON with Azure Blob URI Extract and Load Azure Blob URIs Extract and Load • Document Metadata • Named Entities • Map Relationships • Text Summaries Graph Analytics & Queries Couchbase Full Text Search Pointers to Azure Blob URIs Leveraging fit-for-purpose storage: Graph storage for unified many-to-many access to cross-domain data Document storage for searchable access to semi-structured data Blob storage repository for large, raw and unstructured data 37,157 blobs 5.5 TB Unstructured: Semi-Structured: Load CSV to Graph Structured: 20,573 JSONs 5 GB Reports/Applications Data Mobilization and Graph Unification – Full Lineage and Auditability 215K nodes & relationships 1.5 GB
  • 28. Go to Production • Follow your IT best practices • Security: assume you’ll be breached • Deploy full environment set: Prod cluster, Stg cluster, Test, Dev • DevOps: leverage Jenkins, Ansible • Wrap your solution in test automation • Do load testing against your APIs to look for additional optimization opportunities (Gatling) • Monitor your logs (Splunk, Dynatrace) • Monitor your common queries, refactor or reindex as needed, optimize for speed • Leverage the I-Frame Model to provide more value Graphy Problem → Localhost POC → Cloud Pilot → Production Build
  • 29. Roadmap for Enterprise Graph Strategy Small Team: • Graph Architect • Data Engineer • Full-stack Developer • Data Scientist • Report Developer Problem / Scope: What will the graph solve? Stages: Graphy Problem (business need, data sources) → Localhost POC (data modeling, API, example queries) → Cloud Pilot (data snapshot, reference architecture, API suite) → Production Build (hardening, scheduled & stream ETL, live UX) Activities: Stakeholder Input, Graph Design, Data Work, APIs / Data Services, Integration / Refinement, Scale / Harden / Run Key Conversations: Validate: What questions can now be answered? Connect: Does the data support the graph model and semantics? Mobilize: What data does the new experience need? Use Cases: What is the feedback from the business on how well the graph solves the use case? Deploy: What monitoring, testing and process need to be put in place to achieve a robust SLA?
  • 30. EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders Fortune 100 Tech Company Use Case: Global B2B Account 360° view and marketing attribution Approach: Neo4j graph with 500M nodes and 2.2B relationships, representing all known business accounts, contacts and marketing touches. Mastered data from 17 disparate transactional sources in Azure Data Lake. Supported in-graph analytics for marketing attribution and next best action recommendations across global geographies. Duration: 16 weeks to working graph Fortune 100 Footwear Company Use Case: Converged Brick & Mortar + Online Shopper 360° View Approach: Neo4j graph with 2B nodes and relationships, representing sales transactions for 40M shoppers across 275 physical stores and the ecommerce platform. Algorithmic extraction and profiling from raw XML records in AWS Hadoop, MDM record concordance and in-graph analytics for product associations, store analytics and recommendation services. Duration: 12 weeks to working graph, ongoing project through 2018 Fortune 500 Cruise Line Company Use Case: Shipboard and Shoreside Recommendation Engine Approach: Neo4j graph deployable to shipboard VMware data centers, with streaming updates from a large shoreside Neo4j graph integrating data from Azure Cerebro, Adobe Experience Manager and legacy transactional systems. In-graph analytics, services API, and a recommendation engine for next best activity for passengers, surfaced via mobile app. Duration: 12 weeks to working graph, ongoing project through 2018 Fortune 100 Investment Firm Use Case: Enhanced Anti-Money Laundering and Fraud Detection using Graph+AI Approach: Neo4j graph of account 360° view representing activity of 2M accounts over 4 years. MDM and entity extraction for account and party identity elements from enterprise Oracle system. Network clustering, feature engineering and graph embedding in a TensorFlow deep learning classifier for suspicious activity patterns across accounts and between parties. Duration: 16 weeks to working graph Fortune 100 Tech Company Use Case: B2B Local Marketing Events Recommendation Engine Approach: Neo4j graph and personalized next best event recommendation engine for B2B field marketers. Reconciles physical and digital event attendees with corporate account structures for 10K accounts and 5M contacts. Entities mastered from transactional data in SQL Server and Azure Data Lake. Microservices APIs support data syndication to martech applications and Power BI reporting. Duration: 10 weeks to working graph
  • 31. Better Questions How can I get more business value and deeper insights from the data I already have? How can I get a better understanding of my customers to create more relevant experiences? How can I more effectively mobilize and syndicate the data I’m ingesting? What is the next best action I can take? Thank You!
  • 32. Michael Moore, Ph.D. Executive Director ► Michael Moore is an Executive Director and Practice Lead for Graph + AI in EY’s Tech Consulting Emerging Technology (ET) Group ► Joined EY in 2017, based in the Seattle, WA office ► Ph.D. University of California, Berkeley ► B.S. & B.A. University of California, Santa Cruz Select professional experience ► Society Consulting – Graph Architect Schema, ETL & systems design for a high-performance Neo4j graph database encompassing the totality of Microsoft’s B2B data on Azure VM. Graph database supports multi-touch marketing attribution analytics and multi-dimensional event-based audience segmentation & recommendations for direct marketing. Provided POC graph reporting and visualization interfaces. Neo4j Enterprise edition, Python, Node.js, nGraph, JavaScript. ► Microsoft Corporation – General Manager Management of core BI infrastructure and measurement capabilities supporting Microsoft's global marketing budget cascade, campaign reporting, pipeline reporting, incentive reporting, ROMI reporting, social and web analytics on Microsoft.com for the Global Marketing Operations team. Management of complex projects across multiple subsidiaries, agencies and vendors. Strategic focus on foundational database, digital and social marketing capabilities including: marketing ROI, customer & channel partner engagement, marketing conversion, sales pipeline, dynamic personalization, data mining, predictive modeling, behavioral segmentation, privacy governance, web enablement, tracking & measurement, internal & external data quality, and instrumentation process control. ► Grey San Francisco – VP Analytics Responsible for ongoing campaign reporting, ROI analysis, creative and placement optimizations for agency clients. Architected and deployed an enterprise OLAP reporting solution on Oracle RAC / MicroStrategy to improve quality and efficiency of analytics operations. Provided advanced analytical services to clients in retail, tech, banking and automotive, including consulting, regression modeling and data mining. Profile ► Michael Moore, Ph.D. is an Executive Director in the Advisory Services practice of Ernst & Young LLP. He is the National practice lead for Enterprise Knowledge Graphs + AI in EY’s Data and Analytics (DnA) Group. Skills and tool knowledge ► Michael has industry and solution experience in customer experience, customer service, e-commerce, ad-serving, web and media analytics, consumer loyalty and churn, marketing optimization, enterprise and partner pipeline, and social media ► He specializes in graph database architecture, graph-based advanced analytics, machine learning and recommender systems. Michael is a certified Neo4j Professional, and has active enterprise graph engagements in the financial services, tech, oil & gas, retail and hospitality sectors.