Amy E. Hodler
Graph Analytics and AI Program Manager, Neo4j
San Francisco, May 2019
Improve ML Predictions Using Graph Algorithms (Today!)
Amy.Hodler@neo4j.com
@amyhodler
neo4j.com/graph-algorithms-book
free download
free in lobby today
Chapter 8:
Graph + ML
Spark & Neo4j
Relationships: One of the Strongest Predictors of Behavior
James Fowler
…And Success!
David Burkus
James Fowler
Albert-László Barabási
• Current data science models ignore network structure
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
Graphs Increase the Predictive Power of AI
with the Data You Already Have
Machine Learning Pipeline
Steps Forward in Graph Data Science
• Graph Persistence
• Knowledge Graphs
• Connected Feature Engineering
• Graph Native Learning
Graph Feature Engineering
Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics.
Add More Descriptive Features:
- Influence
- Relationships
- Communities
Feature Extraction
Graph Machine Learning Workflow
• Extract Data & Store as Graph
  – Data aggregation
  – Create and store graphs
• Explore, Clean, Modify
  – Identify uninteresting features
  – Cleanse (outliers+)
  – Feature engineering/extraction
• Prepare for Machine Learning
  – Train / Test split
  – Resample for meaningful representation (proportional, etc.)
• Train Models
  – Model & variable selection
  – Hyperparameter tuning
  – Ensemble methods
• Evaluate Results
  – Precision, accuracy, recall (ROC curve & AUC)
  – SME Review
  – Cross-validation
• Productionize
Example Technologies
Example: Machine Learning for Link Prediction
Can we infer new interactions in the future?
What unobserved facts are we missing?
Methods for Link Prediction
• Algorithm Measures
  – Run targeted algorithms and score outcomes
  – Set a threshold value used to predict a link between nodes
• Machine Learning
  – Use the measures as features to train an ML model

1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0
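The two approaches above can be sketched in a few lines. This is a minimal illustration on a made-up toy graph, not the deck's Neo4j procedures: the edge list, the helper names and the threshold of 2 are all assumptions. It scores every unconnected pair with Common Neighbors and Preferential Attachment, then applies a simple threshold; the same rows could instead be fed to a classifier as features.

```python
from itertools import combinations

# Hypothetical toy graph: undirected edges between node ids.
EDGES = [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the two nodes' degrees."""
    return len(adj[u]) * len(adj[v])


def candidate_rows(adj):
    """One feature row per pair of nodes that is not yet linked."""
    rows = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            rows.append((u, v,
                         common_neighbors(adj, u, v),
                         preferential_attachment(adj, u, v)))
    return rows


adj = build_adjacency(EDGES)
rows = candidate_rows(adj)

# "Algorithm Measures" approach: predict a link when the score passes a threshold.
THRESHOLD = 2  # assumed value, chosen only for this toy graph
predicted = [(u, v) for u, v, cn, _ in rows if cn >= THRESHOLD]
print(rows)
print(predicted)
```

For the "Machine Learning" approach, each row would also carry a label (linked or not linked at a later time) and the score columns become model features.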
Predicting Collaboration with a Graph Enhanced ML Model
• Citation Network Dataset - Research Dataset
  – “ArnetMiner: Extraction and Mining of Academic Social Networks”, by J. Tang et al.
  – Used a subset with 52K papers, 80K authors, 140K author relationships and 29K citation relationships
• Neo4j
  – Create a co-authorship graph and connected feature engineering
• Spark and MLlib
  – Train and test our model using a random forest classifier
4 Models: Multiple Graph Features
• Common Authors Model
  – Graph Features: Common Authors
• “Graphy” Model
  – Graph Features: Preferential Attachment, Total Neighbors
• Triangles Model
  – Graph Features: Min & Max Triangles, Min & Max Clustering Coefficient
• Community Model
  – Graph Features: Label Propagation, Louvain Modularity
Trial, Trial and Error!
Test/Train Split

1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0
2         12        3                 3                        0
4         9         4                 8                        1
7         10        12                36                       1
8         11        2                 3                        0

(rows are divided between Train and Test)
OMG I’m Good!
Did you get really high accuracy on your first run without tuning?
Data Leakage!
Graph metric computation for the train set touches data from the test set.
Train and Test Graphs: Time Based Split

Train (< 2006)
1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0

Test (>= 2006)
1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
2         12        3                 3                        0
4         9         4                 8                        1
7         10        12                36                       1
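A time-based split can be sketched as follows. This is an illustrative example, not the deck's code: the collaboration records and helper names are invented, and only the 2006 cutoff comes from the slide. The point is that training features are computed on the early graph alone, so later relationships cannot leak into them.

```python
# Hypothetical records: (author_a, author_b, year of first collaboration).
COLLABORATIONS = [
    (1, 2, 2003), (2, 3, 2004), (1, 3, 2005),
    (3, 4, 2006), (4, 5, 2007), (1, 4, 2008),
]
SPLIT_YEAR = 2006  # cutoff shown on the slide


def split_by_year(records, split_year):
    """Edges before the cutoff go to the early graph, the rest to the late graph."""
    early = [(a, b) for a, b, year in records if year < split_year]
    late = [(a, b) for a, b, year in records if year >= split_year]
    return early, late


def build_adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Shared neighbors; nodes missing from the graph count as having none."""
    return len(adj.get(u, set()) & adj.get(v, set()))


early_edges, late_edges = split_by_year(COLLABORATIONS, SPLIT_YEAR)
early_graph = build_adjacency(early_edges)
full_graph = build_adjacency(early_edges + late_edges)

# Training feature computed only from pre-2006 relationships.
train_feature = common_neighbors(early_graph, 2, 4)
# The same feature on the full graph "sees the future" and is inflated.
leaky_feature = common_neighbors(full_graph, 2, 4)
print(train_feature, leaky_feature)
```

The gap between the two values is the leakage the previous slide warns about.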
Class Imbalance
Negative Examples vs. Positive Examples
A model could reach very high accuracy simply by predicting that every pair of nodes is not linked.
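One common remedy is to downsample the negative examples until the classes are balanced. The sketch below assumes a made-up dataset of 100 pairs with 10 positives; the function name and the fixed seed are illustrative choices, not taken from the deck.

```python
import random


def downsample_negatives(rows, seed=42):
    """Keep all positives and an equally sized random sample of negatives."""
    positives = [r for r in rows if r["label"] == 1]
    negatives = [r for r in rows if r["label"] == 0]
    rng = random.Random(seed)
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + sampled
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 10 linked pairs, 90 unlinked pairs.
rows = [{"pair": (i, i + 1), "label": 1 if i % 10 == 0 else 0} for i in range(100)]

# A model that always answers "not linked" already looks accurate here.
baseline_accuracy = sum(1 for r in rows if r["label"] == 0) / len(rows)

balanced = downsample_negatives(rows)
print(baseline_accuracy, len(balanced))
```

After balancing, the always-negative baseline drops to 50% accuracy, so accuracy becomes a more meaningful signal.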
Training Our Model
This is one decision tree in our Random Forest used as a binary classifier to learn how to classify a pair: predicting either linked or not linked.
Result: First Model ROC & AUC
[Figure: Common Authors (Model 1)]
Result: All Models
[Figure: Common Authors (Model 1), Community (Model 4)]
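For readers unfamiliar with AUC: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the labels and scores are made up and are not results from the deck.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for six candidate pairs.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why it is less sensitive to class imbalance than raw accuracy.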
Graph Feature Influence for Tuning
For feature importance, the Spark random forest averages the reduction in impurity across all trees in the forest.
Feature rankings are in comparison to the group of features evaluated.
Also try PageRank!
Try removing some features.
LabelPropagation
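The idea of impurity-based importance can be sketched as below. This is a simplified illustration under stated assumptions, not Spark's exact implementation: it uses Gini impurity for binary labels, and the per-tree importance values and feature names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def impurity_reduction(parent, left, right):
    """How much a split lowers impurity, weighting children by their size."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def average_importance(per_tree):
    """Average each feature's impurity reduction over trees, then normalize to sum to 1."""
    features = sorted({f for tree in per_tree for f in tree})
    mean = {f: sum(tree.get(f, 0.0) for tree in per_tree) / len(per_tree)
            for f in features}
    total = sum(mean.values())
    return {f: v / total for f, v in mean.items()} if total else mean


# A split that separates the classes perfectly removes all impurity.
perfect_split = impurity_reduction([1, 1, 0, 0], [1, 1], [0, 0])

# Hypothetical per-tree totals for two features.
per_tree = [
    {"commonAuthors": 0.5, "prefAttachment": 0.1},
    {"commonAuthors": 0.3, "prefAttachment": 0.1},
]
importance = average_importance(per_tree)
print(perfect_split, importance)
```

Because the scores are normalized over the features present, a ranking only has meaning relative to the other features evaluated in the same model.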
Graph Feature Selection
Feature Selection is how we reduce the number of features used in a model to a relevant subset. This can be done algorithmically or based on domain expertise, but the objective is to maximize the predictive power of your model while minimizing overfitting.
Graph Algorithms for Feature Engineering
Graph Feature Categories & Algorithms
• Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Community Detection: Detects group clustering or partition options
• Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
• Similarity: Evaluates how alike nodes are
• Embeddings: Learned representations of connectivity or topology
Triangles and Clustering Coefficient
• Basic network analysis, e.g. “small-world” structures, stability
• ML Features:
  • Tightness of groups
  • Probability of links
Clustering coefficient (CC): probability that neighbors are connected
[Figure: two example graphs for node u: Triangles = 2, CC = 0.33; Triangles = 2, CC = 0.2]
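The two figure values can be reproduced with the standard local clustering coefficient, 2 × triangles / (k × (k − 1)) for a node with k neighbors. The graphs below are assumptions built to match the figure's numbers (a node u with 4 and then 5 neighbors); the helper names are illustrative.

```python
from itertools import combinations


def build_adjacency(edge_list):
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangle_count(adj, node):
    """Number of connected pairs among the node's neighbors."""
    return sum(1 for a, b in combinations(sorted(adj[node]), 2) if b in adj[a])


def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs that are themselves connected."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    return 2 * triangle_count(adj, node) / (k * (k - 1))


# Hypothetical graph A: u has 4 neighbors and 2 triangles.
graph_a = build_adjacency([("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
                           ("a", "b"), ("c", "d")])
# Hypothetical graph B: same triangles, but u has a fifth neighbor.
graph_b = build_adjacency([("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
                           ("u", "e"), ("a", "b"), ("c", "d")])

print(triangle_count(graph_a, "u"), round(clustering_coefficient(graph_a, "u"), 2))
print(triangle_count(graph_b, "u"), round(clustering_coefficient(graph_b, "u"), 2))
```

Same triangle count, lower coefficient: the extra neighbor adds possible pairs without adding connections between them.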
Label Propagation
• Nodes adopt labels based on neighbors to infer clusters
• Great for proposing large-scale initial clusterings
• Graph ML Feature:
  • Group Membership (classification)
[Figure: node labels updating over iterations]
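A bare-bones version of the idea looks like this. It is a deterministic sketch, not the Neo4j implementation: nodes are visited in sorted order, each adopts the most frequent label among its neighbors, and ties are broken by taking the largest label. The tie-break rule and the toy graph are assumptions made so the run is reproducible.

```python
from collections import Counter


def label_propagation(adj, max_iterations=10):
    """Each node repeatedly adopts the most common label among its neighbors."""
    labels = {node: node for node in adj}  # start with a unique label per node
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue
            counts = Counter(labels[n] for n in adj[node])
            best = max(counts.values())
            # Assumed tie-break: take the largest of the most frequent labels.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical graph: two triangles joined by a single bridge edge (3, 4).
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
labels = label_propagation(adj)
print(labels)
```

The final label of each node can be used directly as a categorical group-membership feature.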
PageRank
• Broad influence based on transitive relationships and originating node’s influence
• “Golfing with the CEO”
• Graph ML Feature:
  • Score top influencer
  • Rank influence
  • Contextual ranking
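The "golfing with the CEO" effect can be shown with a small power-iteration sketch. The graph, node names, damping factor of 0.85 and iteration count are assumptions for illustration; production libraries add convergence checks and other refinements.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration; rank held by nodes without out-links is spread evenly."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three people point to the CEO, who points to one golfer.
# "peer" also has exactly one incoming link, but from a low-ranked node.
out_links = {
    "a": ["ceo"], "b": ["ceo"], "c": ["ceo"],
    "ceo": ["golfer"],
    "d": ["peer"],
    "golfer": [], "peer": [],
}
rank = pagerank(out_links)
print(sorted(rank, key=rank.get, reverse=True))
```

Both "golfer" and "peer" have a single incoming relationship, yet the golfer scores higher because the link comes from an influential node.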
Jaccard Similarity
• Measure of proportional similarities between nodes
• Recommendation of similar items
• Graph ML Feature:
  • Coefficient representing similarity of nodes
  • Often as part of link prediction
J(A, B) = |A ∩ B| / |A ∪ B|
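In code, the Jaccard coefficient is the size of the intersection divided by the size of the union. The neighbor sets below are made up; returning 0.0 for two empty sets is a convention chosen for this sketch.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


# Hypothetical neighbor sets of two nodes.
neighbors_a = {1, 2, 3}
neighbors_b = {2, 3, 4}
similarity = jaccard(neighbors_a, neighbors_b)
print(similarity)
```

Applied to the neighbor sets of two nodes, the coefficient serves as a similarity feature for link prediction.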
Preferential Attachment
• Measures closeness by multiplying the number of connections each of the two nodes has
• Rich get richer…
• Graph ML Feature:
  • Probability of relationships forming
Illustration: be.amazd.com/link-prediction/
Common Neighbors
• Based on number of potential triangles / closing triangles
• 2 strangers with a lot of friends in common…
• Graph ML Feature:
  • Probability of relationships forming
• Weight with Adamic Adar
Illustration: be.amazd.com/link-prediction/
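Adamic Adar weights each common neighbor by the inverse logarithm of its degree, so a shared friend with few connections counts for more than a shared hub. A minimal sketch on an invented graph follows; the node names are hypothetical.

```python
import math


def build_adjacency(edge_list):
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the neighbors shared by u and v."""
    score = 0.0
    for z in adj[u] & adj[v]:
        degree = len(adj[z])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score


# Hypothetical graph: ann and bob share a well-connected "hub" and a close "pal".
adj = build_adjacency([
    ("ann", "hub"), ("bob", "hub"), ("hub", "x1"), ("hub", "x2"),
    ("ann", "pal"), ("bob", "pal"),
])
plain_count = len(adj["ann"] & adj["bob"])
weighted_score = adamic_adar(adj, "ann", "bob")
print(plain_count, round(weighted_score, 3))
```

Plain Common Neighbors treats "hub" and "pal" equally; the weighted score gives "pal" (degree 2) twice the contribution of "hub" (degree 4).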
DEMO: Data
• Game of Thrones – Books
  • 800 Nodes
  • 2,900 Relationships (weighted by interactions)
• Game of Thrones – TV-Series
  • 400 Nodes
  • 565,800 Relationships
Andrew Beveridge’s script to graph
DEMO: Neo4j Desktop, Algorithms, Playground
neo4j.com/download/ install.graphapp.io
Lab Tools
Resources
• Algorithms Guide in Sandbox
• neo4j.com/sandbox
• Algorithms Playground (NEuler)
• neo4j.com/developer/
• Community for Q&A
• community.neo4j.com
• Code & Citation data from O’Reilly book
• bit.ly/2FPgGVV (ML Folder)
DEMO BACKUP