SlideShare uma empresa Scribd logo
1 de 27
Baixar para ler offline
Graph Databases
and Machine
Learning: Finding a
Happy Marriage
Victor Lee, November 12, 2018
© 2018 TigerGraph. All Rights Reserved
Know Your Speaker
● Mission: Unleash the power of interconnected
data for deeper insights and better outcomes
● Technology: Industry’s First and Only Native
Massively Parallel Processing(MPP) Graph Technology
● Product: The world’s fastest graph database used by
organizations including AliPay, Intuit, Uber, Visa, Zillow
2
Victor Lee, Director of Product Management
PhD in Graph Data Mining
MS Electrical Engineering, Stanford
BS EE/CS, UC Berkeley
© 2018 TigerGraph. All Rights Reserved
Two Hot Items: A Good Match?
New, exciting way to represent
information and to query it.
3
Graph Database Machine Learning
Established powerhouse for
predictions and "smart"
systems, but still tricky to use.
© 2018 TigerGraph. All Rights Reserved
Graph analysis is possibly the single most
effective competitive differentiator for
organizations pursuing data-driven operations
and decisions after the design of data capture.”
4
© 2018 TigerGraph. All Rights Reserved
What is Machine Learning?
● Branch of AI
● Making predictions, models, or
optimizations using incomplete
or inexact data
○ Model? Weather
○ Optimization?
Best driving route,
best investment portfolio
5
© 2018 TigerGraph. All Rights Reserved
Uses of Machine Learning
● Virtual Personal Assistants
○ Voice-to-text
○ Semantic analysis
○ Formulate a response
● Real-time route optimization
○ Waze, Google Maps
● Image Recognition
○ Facial recognition
○ X-ray / CT scan reading
● Effective spam filters
6
© 2018 TigerGraph. All Rights Reserved
Machine Learning Techniques
● Supervised Learning
Humans provide "training data" with known correct answers
○ Decision Trees
○ Nearest Neighbor
○ Hidden Markov Model,Naïve Bayesian
○ Linear Regression
○ Support Vector Machines (SVM)
○ Neural Networks
● Unsupervised Learning
No "correct" answer; detect patterns or summarize
○ Clustering, Partitioning, Outlier detection
○ Neural Networks
○ Dimensionality reduction
7
© 2018 TigerGraph. All Rights
Reserved
8
Graphs (Networks) are all around us
• Data Never Sleeps
• Internet of Things
• BIG Data
• Social Networks
Graph: collection of
nodes and links ("edges")
between nodes
Node
1
Node
2
Edge
© 2018 TigerGraph. All Rights
Reserved
9
Graph Database - The Natural Choice for
Interconnected Data
• Natural storage model
for interconnected data
• Natural for transactions:
transaction = edge
• Natural for computational
knowledge/inference/
learning –
"connect the dots"
User Product Tag
Order
Location
© 2018 TigerGraph. All Rights Reserved
Matchmaking: Consider Each Party
Graph Databases
● Stores data as nodes & edges.
● An edge (link) IS a data object.
● Capabilities of platforms and
query languages vary:
○ All are good a storing data
○ Wide range of analytical
abilities
10
Machine Learning (ML)
● Wants data as vector, array, or
tensor
● Computationally intensive; long
and time-consuming
● Requires good quality input
data:
Garbage in, garbage out
● Many methods to choose from
© 2018 TigerGraph. All Rights Reserved
ML Features and Modeling
● Most/All ML methods try to correlate features
(properties/attributes) with the target result.
● Quality of modeling depends on quality of features
○ Having the right features
○ Having the right distribution of values & results
● Don't know which features are needed!
11
Feat 1 Feat 2 Feat 3 Result
Item 1 X 0 0 X
Item 2 X X 0 0
Item 3 X 0 X ?
© 2018 TigerGraph. All Rights Reserved
ML Features and Modeling
● If you don't know the right features yet, then just try
to have LOTS of features.
● Big Data 3 V's:
○ Volume
○ Variety
○ Velocity
● Example:
○ Suppose family medical history is the best predictor of an
ailment.
○ If you don't know family medical history, you won't be able
to make good predictions
12
© 2018 TigerGraph. All Rights Reserved
3 Possible Arrangements for Graph DB + ML
1. Graph Data leaves home and moves to ML.
2. ML moves into the Graph Database.
3. A Graph Database and ML partnership.
13
© 2018 TigerGraph. All Rights Reserved
#1 - Graph Data leaves home and moves to ML
● Export relevant graph data
● Simple analogy to Relational Database:
○ Table(s) of node data
○ Table(s) of edge data
● Traditional approach
○ ML is fed data from DBs or just data files
● Business Value
○ Easy to deploy
○ Graph DB still valuable for non-ML queries and analytics
14
NodeID attr1 attr2 attr3
EID NID1 NID2 Eattr1 Eattr2 Eattr3
© 2018 TigerGraph. All Rights Reserved
Exporting Graph Data to ML: Concerns
● Is exported data really "flat" enough?
○ Edge records refers to Node records
○ ML algorithms usually want records to be independent
○ One answer is graph embedding
● Graph Embedding
"Mapping a graph's nodes and edges to points in space."
○ Example: drawing a graph on paper is mapping to 2D space.
○ General graphs can be drawn in 3D space → tensor
15
NodeID attr1 attr2 attr3
EID NID1 NID2 Eattr1 Eattr2 Eattr3
Graph Embedding
© 2018 TigerGraph. All Rights Reserved
#2 - ML moves into the Graph Database
● Implement ML algorithm as an advanced query
● Novel Approach:
in-graph analytics
● Business Value
○ Integration
○ Eliminate need for separate systems
16
© 2018 TigerGraph. All Rights Reserved
ML in Graph Database: Concerns
1. Computational power of Graph DB
2. Graph query language's algorithmic expressiveness
3. Expressing ML algorithms in Graph terms
17
Only 3rd Gen. Native Parallel Graphs satisfy 1 and 2.
© 2018 TigerGraph. All Rights
Reserved
18
Native Graph Storage
Parallel Loading
Parallel Multi-Hop
Analytics
Parallel Updates
(in real-time)
Scale up to Support
Query Volume
Privacy for Sensitive
Data
Graph 1.0
single server,
non-parallel
Graph 2.0
NoSQL base
for storage
Graph 3.0
Native, Parallel
Hours to load
terabytes
Sub-second
across 10+
hops
Mutable/
Transactional
2 billion+/day
in production
Runs out of
time/memory
after 2 hops
Scales for
simple queries
(1-2 hops)
Times out
after 2 hops
Days to load
terabytes
Evolution of Graph
Databases
Days to load
terabytes
MultiGraph
service
© 2018 TigerGraph. All Rights
Reserved
Graph Query Languages
● SparQL (RDF): Designed for semantic reasoning, but
not computationally intensive work
● Gremlin (Apache): Older scheme. Declarative. Very
awkward to write complex tasks.
● Cypher (Neo4j): Known by many. Okay for analytics
if you couple it with Java.
● GSQL (TigerGraph): Designed for big data analytics.
○ Built-in parallelism and accumulation.
○ Syntax is natural and familiar for algorithms.
19
© 2018 TigerGraph. All Rights Reserved
Example: Graph Algorithm Libraries
• Some Graph Databases provide libraries of standard
graph algorithms
• Measure/discover characteristics of a graph
• PageRank, Community Detection, Shortest Path, etc.
• GSQL is well-suited for algorithms
• Library is written in GSQL
• Users can see the code and modify as desired
• Cypher is less well-suited
• Library is pre-compiled function calls
• User can't see how the algorithms are written or modify them
20
© 2018 TigerGraph. All Rights
Reserved
Expressing ML Algorithms in Graph Terms
● Not widely known how to execute most ML
algorithms on a graph.
● Same problem as with exporting the Graph data:
○ ML is used to flat data
○ How to make edges a benefit instead of a hinderance?
● This marriage proposal needs further work.
21
© 2018 TigerGraph. All Rights
Reserved
#3 - A Graph Database and ML partnership
● Multi-Step process:
a. First, Graph DB runs queries to extract graph-based
features
b. Send graph features to ML system.
c. ML system, enriched by new graph features, learns a
predictive model.
d. Model can be applied back to graph for real-time
prediction.
● Seems more like a transactional partnership than a
marriage.
22
© 2018 TigerGraph. All Rights
Reserved
Example: China Mobile Fraud Detection
using Graph Features for ML
23
© 2018 TigerGraph. All Rights
Reserved
Generating New Training Data for ML to
Detect Phone-Based Scam
24
Phone 2 Features
Phone 1 Features
(1) High call back phone
(2) Stable group
(3) Long term phone
(4) Many in-group
connections
(5) 3-step friend relation
(1) Short term call
duration
(2) Empty stable group
(3) No call back phone
(4) Many rejected calls
(5) Avg. distance > 3
Training Data
Tens – Hundreds
of Billions of calls
118 per phone x 600
Million phones =
70 Billion new features
Machine
Learning
Solution
Graph with 600M phones and 15B call edges,1000s of calls/second.
Feed ML with new training data with 118 features per phone.
© 2018 TigerGraph. All Rights Reserved
Graph-enhanced workflow for
Anti-Money Laundering
25
© 2018 TigerGraph. All Rights Reserved
Summary
Graph Databases and Machine Learning
both represent powerful tools for getting
more value from data.
3 Proposals for Marrying Graph DBs + ML:
● Export Graph Data to ML system
○ Conventional, but not clear how to treat the data.
● Perform ML within Graph Database
○ Need fast, scalable analytical Graph DB. ML methods not
yet ported to Graph DB.
● Export Graph Features to ML system
○ Improves ML results; in practice now.
26
© 2018 TigerGraph. All Rights Reserved
Final Thoughts
● Room for diversity.
● Healthy marriages evolve over time.
If you have any questions please
contact info@tigergraph.com
27

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 CentralityUsing Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
 
Graph Gurus Episode 11: Accumulators for Complex Graph Analytics
Graph Gurus Episode 11: Accumulators for Complex Graph AnalyticsGraph Gurus Episode 11: Accumulators for Complex Graph Analytics
Graph Gurus Episode 11: Accumulators for Complex Graph Analytics
 
Neo4j y GenAI
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI
 
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data ScienceScaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptx
 
Intermediate Cypher.pdf
Intermediate Cypher.pdfIntermediate Cypher.pdf
Intermediate Cypher.pdf
 
Demystifying Graph Neural Networks
Demystifying Graph Neural NetworksDemystifying Graph Neural Networks
Demystifying Graph Neural Networks
 
Fraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityFraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business Authority
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Workshop - Build a Graph Solution
Workshop - Build a Graph SolutionWorkshop - Build a Graph Solution
Workshop - Build a Graph Solution
 
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting WorkflowsGSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
 
Graphs in Retail: Know Your Customers and Make Your Recommendations Engine Learn
Graphs in Retail: Know Your Customers and Make Your Recommendations Engine LearnGraphs in Retail: Know Your Customers and Make Your Recommendations Engine Learn
Graphs in Retail: Know Your Customers and Make Your Recommendations Engine Learn
 
Workshop Introduction to Neo4j
Workshop Introduction to Neo4jWorkshop Introduction to Neo4j
Workshop Introduction to Neo4j
 
Northern Gas Networks and CKDelta at Neo4j GraphSummit London 14Nov23.pptx
Northern Gas Networks and CKDelta at Neo4j GraphSummit London 14Nov23.pptxNorthern Gas Networks and CKDelta at Neo4j GraphSummit London 14Nov23.pptx
Northern Gas Networks and CKDelta at Neo4j GraphSummit London 14Nov23.pptx
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
EY: Why graph technology makes sense for fraud detection and customer 360 pro...
EY: Why graph technology makes sense for fraud detection and customer 360 pro...EY: Why graph technology makes sense for fraud detection and customer 360 pro...
EY: Why graph technology makes sense for fraud detection and customer 360 pro...
 
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
 

Semelhante a Graph Databases and Machine Learning | November 2018

Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
DataWorks Summit
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Databricks
 

Semelhante a Graph Databases and Machine Learning | November 2018 (20)

Scaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analyticsScaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analytics
 
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
 
Graph Data Science at Scale
Graph Data Science at ScaleGraph Data Science at Scale
Graph Data Science at Scale
 
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
 
Graph Gurus Episode 5: Webinar PageRank
Graph Gurus Episode 5: Webinar PageRankGraph Gurus Episode 5: Webinar PageRank
Graph Gurus Episode 5: Webinar PageRank
 
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AIGraph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Cheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial WorldCheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial World
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Experts
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
 

Mais de TigerGraph

Mais de TigerGraph (20)

MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATIONMAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
 
Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...
 
Building an accurate understanding of consumers based on real-world signals
Building an accurate understanding of consumers based on real-world signalsBuilding an accurate understanding of consumers based on real-world signals
Building an accurate understanding of consumers based on real-world signals
 
Care Intervention Assistant - Omaha Clinical Data Information System
Care Intervention Assistant - Omaha Clinical Data Information SystemCare Intervention Assistant - Omaha Clinical Data Information System
Care Intervention Assistant - Omaha Clinical Data Information System
 
Correspondent Banking Networks
Correspondent Banking NetworksCorrespondent Banking Networks
Correspondent Banking Networks
 
Delivering Large Scale Real-time Graph Analytics with Dell Infrastructure and...
Delivering Large Scale Real-time Graph Analytics with Dell Infrastructure and...Delivering Large Scale Real-time Graph Analytics with Dell Infrastructure and...
Delivering Large Scale Real-time Graph Analytics with Dell Infrastructure and...
 
Deploying an End-to-End TigerGraph Enterprise Architecture using Kafka, Maria...
Deploying an End-to-End TigerGraph Enterprise Architecture using Kafka, Maria...Deploying an End-to-End TigerGraph Enterprise Architecture using Kafka, Maria...
Deploying an End-to-End TigerGraph Enterprise Architecture using Kafka, Maria...
 
Fraud Detection and Compliance with Graph Learning
Fraud Detection and Compliance with Graph LearningFraud Detection and Compliance with Graph Learning
Fraud Detection and Compliance with Graph Learning
 
Fraudulent credit card cash-out detection On Graphs
Fraudulent credit card cash-out detection On GraphsFraudulent credit card cash-out detection On Graphs
Fraudulent credit card cash-out detection On Graphs
 
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraphFROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
 
Customer Experience Management
Customer Experience ManagementCustomer Experience Management
Customer Experience Management
 
Graph+AI for Fin. Services
Graph+AI for Fin. ServicesGraph+AI for Fin. Services
Graph+AI for Fin. Services
 
Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.
 
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis LibraryPlume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
 
TigerGraph.js
TigerGraph.jsTigerGraph.js
TigerGraph.js
 
GRAPHS FOR THE FUTURE ENERGY SYSTEMS
GRAPHS FOR THE FUTURE ENERGY SYSTEMSGRAPHS FOR THE FUTURE ENERGY SYSTEMS
GRAPHS FOR THE FUTURE ENERGY SYSTEMS
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
 
Machine Learning Feature Design with TigerGraph 3.0 No-Code GUI
Machine Learning Feature Design with TigerGraph 3.0 No-Code GUIMachine Learning Feature Design with TigerGraph 3.0 No-Code GUI
Machine Learning Feature Design with TigerGraph 3.0 No-Code GUI
 
Recommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine LearningRecommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine Learning
 

Último

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 

Graph Databases and Machine Learning | November 2018

  • 1. Graph Databases and Machine Learning: Finding a Happy Marriage Victor Lee, November 12, 2018
  • 2. © 2018 TigerGraph. All Rights Reserved Know Your Speaker ● Mission: Unleash the power of interconnected data for deeper insights and better outcomes ● Technology: Industry’s First and Only Native Massively Parallel Processing(MPP) Graph Technology ● Product: The world’s fastest graph database used by organizations including AliPay, Intuit, Uber, Visa, Zillow 2 Victor Lee, Director of Product Management PhD in Graph Data Mining MS Electrical Engineering, Stanford BS EE/CS, UC Berkeley
  • 3. © 2018 TigerGraph. All Rights Reserved Two Hot Items: A Good Match? New, exciting way to represent information and to query it. 3 Graph Database Machine Learning Established powerhouse for predictions and "smart" systems, but still tricky to use.
  • 4. © 2018 TigerGraph. All Rights Reserved Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.” 4
  • 5. © 2018 TigerGraph. All Rights Reserved What is Machine Learning? ● Branch of AI ● Making predictions, models, or optimizations using incomplete or inexact data ○ Model? Weather ○ Optimization? Best driving route, best investment portfolio 5
  • 6. © 2018 TigerGraph. All Rights Reserved Uses of Machine Learning ● Virtual Personal Assistants ○ Voice-to-text ○ Semantic analysis ○ Formulate a response ● Real-time route optimization ○ Waze, Google Maps ● Image Recognition ○ Facial recognition ○ X-ray / CT scan reading ● Effective spam filters 6
  • 7. © 2018 TigerGraph. All Rights Reserved Machine Learning Techniques ● Supervised Learning Humans provide "training data" with known correct answers ○ Decision Trees ○ Nearest Neighbor ○ Hidden Markov Model,Naïve Bayesian ○ Linear Regression ○ Support Vector Machines (SVM) ○ Neural Networks ● Unsupervised Learning No "correct" answer; detect patterns or summarize ○ Clustering, Partitioning, Outlier detection ○ Neural Networks ○ Dimensionality reduction 7
  • 8. © 2018 TigerGraph. All Rights Reserved 8 Graphs (Networks) are all around us • Data Never Sleeps • Internet of Things • BIG Data • Social Networks Graph: collection of nodes and links ("edges") between nodes Node 1 Node 2 Edge
  • 9. © 2018 TigerGraph. All Rights Reserved 9 Graph Database - The Natural Choice for Interconnected Data • Natural storage model for interconnected data • Natural for transactions: transaction = edge • Natural for computational knowledge/inference/ learning – "connect the dots" User Product Tag Order Location
  • 10. © 2018 TigerGraph. All Rights Reserved Matchmaking: Consider Each Party Graph Databases ● Stores data as nodes & edges. ● An edge (link) IS a data object. ● Capabilities of platforms and query languages vary: ○ All are good a storing data ○ Wide range of analytical abilities 10 Machine Learning (ML) ● Wants data as vector, array, or tensor ● Computationally intensive; long and time-consuming ● Requires good quality input data: Garbage in, garbage out ● Many methods to choose from
  • 11. © 2018 TigerGraph. All Rights Reserved ML Features and Modeling ● Most/All ML methods try to correlate features (properties/attributes) with the target result. ● Quality of modeling depends on quality of features ○ Having the right features ○ Having the right distribution of values & results ● Don't know which features are needed! 11 Feat 1 Feat 2 Feat 3 Result Item 1 X 0 0 X Item 2 X X 0 0 Item 3 X 0 X ?
  • 12. © 2018 TigerGraph. All Rights Reserved ML Features and Modeling ● If you don't know the right features yet, then just try to have LOTS of features. ● Big Data 3 V's: ○ Volume ○ Variety ○ Velocity ● Example: ○ Suppose family medical history is the best predictor of an ailment. ○ If you don't know family medical history, you won't be able to make good predictions 12
  • 13. © 2018 TigerGraph. All Rights Reserved 3 Possible Arrangements for Graph DB + ML 1. Graph Data leaves home and moves to ML. 2. ML moves into the Graph Database. 3. A Graph Database and ML partnership. 13
  • 14. © 2018 TigerGraph. All Rights Reserved #1 - Graph Data leaves home and moves to ML ● Export relevant graph data ● Simple analogy to Relational Database: ○ Table(s) of node data ○ Table(s) of edge data ● Traditional approach ○ ML is fed data from DBs or just data files ● Business Value ○ Easy to deploy ○ Graph DB still valuable for non-ML queries and analytics 14 NodeID attr1 attr2 attr3 EID NID1 NID2 Eattr1 Eattr2 Eattr3
  • 15. © 2018 TigerGraph. All Rights Reserved Exporting Graph Data to ML: Concerns ● Is exported data really "flat" enough? ○ Edge records refers to Node records ○ ML algorithms usually want records to be independent ○ One answer is graph embedding ● Graph Embedding "Mapping a graph's nodes and edges to points in space." ○ Example: drawing a graph on paper is mapping to 2D space. ○ General graphs can be drawn in 3D space → tensor 15 NodeID attr1 attr2 attr3 EID NID1 NID2 Eattr1 Eattr2 Eattr3 Graph Embedding
  • 16. © 2018 TigerGraph. All Rights Reserved #2 - ML moves into the Graph Database ● Implement ML algorithm as an advanced query ● Novel Approach: in-graph analytics ● Business Value ○ Integration ○ Eliminate need for separate systems 16
  • 17. © 2018 TigerGraph. All Rights Reserved ML in Graph Database: Concerns 1. Computational power of Graph DB 2. Graph query language's algorithmic expressiveness 3. Expressing ML algorithms in Graph terms 17 Only 3rd Gen. Native Parallel Graphs satisfy 1 and 2.
  • 18. © 2018 TigerGraph. All Rights Reserved 18 Native Graph Storage Parallel Loading Parallel Multi-Hop Analytics Parallel Updates (in real-time) Scale up to Support Query Volume Privacy for Sensitive Data Graph 1.0 single server, non-parallel Graph 2.0 NoSQL base for storage Graph 3.0 Native, Parallel Hours to load terabytes Sub-second across 10+ hops Mutable/ Transactional 2 billion+/day in production Runs out of time/memory after 2 hops Scales for simple queries (1-2 hops) Times out after 2 hops Days to load terabytes Evolution of Graph Databases Days to load terabytes MultiGraph service
  • 19. © 2018 TigerGraph. All Rights Reserved Graph Query Languages ● SparQL (RDF): Designed for semantic reasoning, but not computationally intensive work ● Gremlin (Apache): Older scheme. Declarative. Very awkward to write complex tasks. ● Cypher (Neo4j): Known by many. Okay for analytics if you couple it with Java. ● GSQL (TigerGraph): Designed for big data analytics. ○ Built-in parallelism and accumulation. ○ Syntax is natural and familiar for algorithms. 19
  • 20. © 2018 TigerGraph. All Rights Reserved Example: Graph Algorithm Libraries • Some Graph Databases provide libraries of standard graph algorithms • Measure/discover characteristics of a graph • PageRank, Community Detection, Shortest Path, etc. • GSQL is well-suited for algorithms • Library is written in GSQL • Users can see the code and modify as desired • Cypher is less well-suited • Library is pre-compiled function calls • User can't see how the algorithms are written or modify them 20
  • 21. © 2018 TigerGraph. All Rights Reserved Expressing ML Algorithms in Graph Terms ● Not widely known how to execute most ML algorithms on a graph. ● Same problem as with exporting the Graph data: ○ ML is used to flat data ○ How to make edges a benefit instead of a hinderance? ● This marriage proposal needs further work. 21
  • 22. © 2018 TigerGraph. All Rights Reserved #3 - A Graph Database and ML partnership ● Multi-Step process: a. First, Graph DB runs queries to extract graph-based features b. Send graph features to ML system. c. ML system, enriched by new graph features, learns a predictive model. d. Model can be applied back to graph for real-time prediction. ● Seems more like a transactional partnership than a marriage. 22
  • 23. © 2018 TigerGraph. All Rights Reserved Example: China Mobile Fraud Detection using Graph Features for ML 23
  • 24. © 2018 TigerGraph. All Rights Reserved Generating New Training Data for ML to Detect Phone-Based Scam 24 Phone 2 Features Phone 1 Features (1) High call back phone (2) Stable group (3) Long term phone (4) Many in-group connections (5) 3-step friend relation (1) Short term call duration (2) Empty stable group (3) No call back phone (4) Many rejected calls (5) Avg. distance > 3 Training Data Tens – Hundreds of Billions of calls 118 per phone x 600 Million phones = 70 Billion new features Machine Learning Solution Graph with 600M phones and 15B call edges,1000s of calls/second. Feed ML with new training data with 118 features per phone.
  • 25. © 2018 TigerGraph. All Rights Reserved Graph-enhanced workflow for Anti-Money Laundering 25
  • 26. © 2018 TigerGraph. All Rights Reserved Summary Graph Databases and Machine Learning both represent powerful tools for getting more value from data. 3 Proposals for Marrying Graph DBs + ML: ● Export Graph Data to ML system ○ Conventional, but not clear how to treat the data. ● Perform ML within Graph Database ○ Need fast, scalable analytical Graph DB. ML methods not yet ported to Graph DB. ● Export Graph Features to ML system ○ Improves ML results; in practice now. 26
  • 27. © 2018 TigerGraph. All Rights Reserved Final Thoughts ● Room for diversity. ● Healthy marriages evolve over time. If you have any questions please contact info@tigergraph.com 27