SlideShare uma empresa Scribd logo
1 de 27
Data Modeling for Graph Data
Science
2
From Previous Session:
Developing the initial graph data model
1. Define high-level domain requirements
2. Create sample data for modeling purposes
3. Define the questions for the domain
4. Identify entities
5. Identify connections between entities
6. Test the model against the questions
7. Test scalability
What could a generic
data model look like for
Game of Thrones?
3
4
Game Of Thrones Data Model
Battle
King
Person
Knight
House
● Based on Network of Thrones, A song of
Math and Westeros
● Information from 5 books. Represents
characters, houses they belong to,
interactions between them and battles
etc..
5
Let’s look at some queries..
Find the Houses attacking or defending in “Battle of the Blackwater Bay”
and return names of Kings and Knights in those houses
MATCH (k) -
[:ATTACKER_KING |:DEFENDER_KING |:ATTACKER_COMMANDER
|:DEFENDER_COMMANDER]
-> (b:Battle {name: 'Battle of the Blackwater'})
WHERE 'King' in labels(k) or 'Knight' in labels(k)
MATCH (h:House) - [:ATTACKER |:DEFENDER] -> (b:Battle
{name: 'Battle of the Blackwater'})
RETURN (k) -[:BELONGS_TO] - (h)
6
Find all people who belonged to “House Lannister” and died between 300
and 600 and whom they interacted with from “House Martell”
Let’s look at some queries..
MATCH (p1:Person) - [:BELONGS_TO] ->
(h:House {name: 'Lannister'})
WHERE 300 <=p1.death_year <=600
MATCH p = (p1) -[:INTERACTS] - (:Person) -
[:BELONGS_TO] - (:House {name : 'Martell'})
RETURN p
7
Cluster or group battles based on Kings and Knights participated in them?
Let’s look at some queries..
…. this isn’t easy to answer with Cypher, but we’ve got Graph
Algorithms (Community detection) for that!
Objective: Understand data modeling for graph data
science.
● Considerations prior to schema design
● Monopartite, bipartite & multipartite graphs
8
In This Session..
The fundamentals don’t change
The white board model is
still your starting point
9
Things that break Neo4j still
break Neo4j for data
science
- Putting tons of attributes on
nodes and relationships
- Making all of your attributes into
nodes
- Labels that aren’t sufficiently
specific
- Super densely connected nodes
• Define your questions and objectives up front
• Are you running algorithms to answer a specific question? or
• Are you generating features for an ML pipeline?
10
Considerations prior to designing schema
• Which algorithms do you want to run?
• Different algorithms require differently shaped graphs
• How often do you want to run your workload
• Is the graph for one offs and experimentation?
• Is this part of a pipeline?
• Do you need to do data refreshes? Incremental updates?
• Are there any time constraints?
11
What is your objective?
Workload
Type
Performance
Expectation
Data Data Model
Experimental
Mixed Variable, no strict
requirements
The more the
merrier
Domain relevant, easy to
refactor
Model building
Primarily
Algorithms
Variable, usually
<24 hours
Reproducible,
timestamps,
test/train splits
Follow Best practices,
Easy for running algos
and updating data
Production ML
Primarily
Algorithms
< a few hours Only the data
relevant to your
workload
Mono- or bipartite
graphs, seeding
Production
Analytics
Mixed < 24 hours Only the data
relevant to your
workload
Optimized for algorithms
(mono- or bipartite
graphs, seeding) + key
queries
OBJECTIVES
CONSIDERATIONS
12
What are Graph Algorithms?
• Graph Algorithms (#Algorithms) solve a wide variety of problems
(#Problems) and are applicable in many areas of science and technology
(#Applications)
• Algorithms require differently shaped graphs. For example, spanning trees,
shortest path and community detection algorithms work on homogeneous
networks represented by monopartite graphs
• Similarity algorithms can work on heterogeneous networks represented
by bipartite or multipartite graphs with some strong assumptions
Real world networks are heterogeneous!
13
Monopartite, bipartite & multipartite
graphs
Most networks contain data with
multiple node and relationship
types. (multipartite)
A bipartite graph contains
nodes that can be divided into
two sets, such that
relationships only exists
between sets but not within
each set.
Node Similarity relies on this
type of graph.
A monopartite graph
contains one node label and
relationship. Most algorithms
rely on this type of graph.
14
Derived relationships and k-partite graphs
• One class of algorithm --
community detection is made to
answer this question, but it is
intended to work on monopartite
graphs
• Can we use cypher queries to
construct a subgraph with only
battles in it? That has information
about Kings/Knights that
participated in the battles?
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
15
Derived relationships and k-partite graphs
• One class of algorithm --
community detection is made to
answer this question, but it is
intended to work on monopartite
graphs
• Can we use cypher queries to
construct a subgraph with only
battles in it? That has information
about Kings/Knights that
participated in the battles?
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
16
Derived relationships and k-partite graphs
Monopartite = only 1 node label
- Connect battles to each
other
- Based on how many Knights
and Kings fought in the same
battle
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
17
Derived relationships and k-partite graphs
MATCH
(b1:Battle)<-[]-(k)-[]->(b2:Battle)
WHERE k:King or k:Knight
RETURN b1.name AS source, b2.name
AS target, count(distinct k) AS
num_participants
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
This gives you:
- for every pair of battles,
- how many kings or knights
fought in both
18
Derived relationships and k-partite graphs
MATCH
(b1:Battle)<-[]-(k)-[]->(b2:Battle)
WHERE k:King or k:Knight
WITH b1, b2 count(distinct k) AS
num_participants
MERGE (b1)-[r:shared_fighters
{count:num_participants}]->(b2)
RETURN b1,r,b2
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
We can add the new derived relationship to the
graph and extract the relevant subgraph
19
Derived relationships and k-partite graphs
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
Now that we have the graph we can run
our algorithms on ….
we can use one of the Community
detection algorithms to cluster
battles.
Algorithms learn based on the structure of a graph, and make
assumptions about graph topology:
• Any variation in the distribution of relationships for nodes can
tell us some underlying property (how important is a node?
what community is it in?)
• It’s not possible to iterate heuristics over directed graphs
• There’s no external information relevant to the algorithm (like
node labels or relationship type)
20
Why can’t I run my algorithm on a
multipartite graph?
huh?
What if I try to an algorithm on this graph?
- How many relationships does each
person have?
- How many relationships does each book
have?
- What is the direction of the relationships
in this graph?
- Can I reach a person node from another
person node, following the directed
relationships?
21
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- How many relationships does each
person have? 1 or 2
- How many relationships does each book
have? 5 or 6
- What is the direction of the relationships
in this graph? Person-[:APPEARED_IN]->Book
- Can I reach a person node from another
person node, following the directed
relationships? No!
22
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- What would an algorithm that used the
number of edges each node has to
calculate centrality conclude?
- What would an algorithm that followed
directed relationships to find
communities conclude?
23
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- What would an algorithm that used the
number of edges each node has to
calculate centrality conclude?
Books are more important than people!
- What would an algorithm that followed
directed relationships to find
communities conclude?
There seven communities?
24
Why can’t I run my algorithm on a
multipartite graph?
If you want to find out:
- What person is the most important
- How many communities of people are
there, across all the books?
You need to reshape your graph!
25
Why can’t I run my algorithm on a
multipartite graph?
• Neo4j automates data
transformations
• Fast iterations & layering
• Production ready features,
parallelization & enterprise
support
Neo4j’s Graph Catalog solves this problem!   
We have procedures to let you reshape and subset your
transactional graph so you have the right data in the right shape.
Mutable In-Memory Workspace
Computational Graph
Native Graph Store
27
Next Session..
Deep dive into Graph Data Science Library
● Graph Catalog
● Graph Algorithms

Mais conteúdo relacionado

Mais de Neo4j

GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...Neo4j
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AINeo4j
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignNeo4j
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Neo4j
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Neo4j
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...Neo4j
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxNeo4j
 

Mais de Neo4j (20)

GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by Design
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Neo4j Graph Data Science Training - June 9 & 10 - Slides #4

  • 1. Data Modeling for Graph Data Science
  • 2. 2 From Previous Session: Developing the initial graph data model 1. Define high-level domain requirements 2. Create sample data for modeling purposes 3. Define the questions for the domain 4. Identify entities 5. Identify connections between entities 6. Test the model against the questions 7. Test scalability
  • 3. What could a generic data model look like for Game of Thrones? 3
  • 4. 4 Game Of Thrones Data Model Battle King Person Knight House ● Based on Network of Thrones, A song of Math and Westeros ● Information from 5 books. Represents characters, houses they belong to, interactions between them and battles etc..
  • 5. 5 Let’s look at some queries.. Find the Houses attacking or defending in “Battle of the Blackwater Bay” and return names of Kings and Knights in those houses MATCH (k) - [:ATTACKER_KING |:DEFENDER_KING |:ATTACKER_COMMANDER |:DEFENDER_COMMANDER] -> (b:Battle {name: 'Battle of the Blackwater'}) WHERE 'King' in labels(k) or 'Knight' in labels(k) MATCH (h:House) - [:ATTACKER |:DEFENDER] -> (b:Battle {name: 'Battle of the Blackwater'}) RETURN (k) -[:BELONGS_TO] - (h)
  • 6. 6 Find all people who belonged to “House Lannister” and died between 300 and 600 and whom they interacted with from “House Martell” Let’s look at some queries.. MATCH (p1:Person) - [:BELONGS_TO] -> (h:House {name: 'Lannister'}) WHERE 300 <=p1.death_year <=600 MATCH p = (p1) -[:INTERACTS] - (:Person) - [:BELONGS_TO] - (:House {name : 'Martell'}) RETURN p
  • 7. 7 Cluster or group battles based on Kings and Knights participated in them? Let’s look at some queries.. …. this isn’t easy to answer with Cypher, but we’ve got Graph Algorithms (Community detection) for that!
  • 8. Objective: Understand data modeling for graph data science. ● Considerations prior to schema design ● Monopartite, bipartite & multipartite graphs 8 In This Session..
  • 9. The fundamentals don’t change The white board model is still your starting point 9 Things that break Neo4j still break Neo4j for data science - Putting tons of attributes on nodes and relationships - Making all of your attributes into nodes - Labels that aren’t sufficiently specific - Super densely connected nodes
  • 10. • Define your questions and objectives up front • Are you running algorithms to answer a specific question? or • Are you generating features for an ML pipeline? 10 Considerations prior to designing schema • Which algorithms do you want to run? • Different algorithms require differently shaped graphs • How often do you want to run your workload • Is the graph for one offs and experimentation? • Is this part of a pipeline? • Do you need to do data refreshes? Incremental updates? • Are there any time constraints?
  • 11. 11 What is your objective? Workload Type Performance Expectation Data Data Model Experimental Mixed Variable, no strict requirements The more the merrier Domain relevant, easy to refactor Model building Primarily Algorithms Variable, usually <24 hours Reproducible, timestamps, test/train splits Follow Best practices, Easy for running algos and updating data Production ML Primarily Algorithms < a few hours Only the data relevant to your workload Mono- or bipartite graphs, seeding Production Analytics Mixed < 24 hours Only the data relevant to your workload Optimized for algorithms (mono- or bipartite graphs, seeding) + key queries OBJECTIVES CONSIDERATIONS
  • 12. 12 What are Graph Algorithms? • Graph Algorithms (#Algorithms) solve a wide variety of problems (#Problems) and are applicable in many areas of science and technology (#Applications) • Algorithms require differently shaped graphs. For example, spanning trees, shortest path and community detection algorithms work on homogeneous networks represented by monopartite graphs • Similarity algorithms can work on heterogeneous networks represented by bipartite or multipartite graphs with some strong assumptions Real world networks are heterogeneous!
  • 13. 13 Monopartite, bipartite & multipartite graphs Most networks contain data with multiple node and relationship types. (multipartite) A bipartite graph contains nodes that can be divided into two sets, such that relationships only exists between sets but not within each set. Node Similarity relies on this type of graph. A monopartite graph contains one node label and relationship. Most algorithms rely on this type of graph.
  • 14. 14 Derived relationships and k-partite graphs • One class of algorithm -- community detection is made to answer this question, but it is intended to work on monopartite graphs • Can we use cypher queries to construct a subgraph with only battles in it? That has information about Kings/Knights that participated in the battles? Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 15. 15 Derived relationships and k-partite graphs • One class of algorithm -- community detection is made to answer this question, but it is intended to work on monopartite graphs • Can we use cypher queries to construct a subgraph with only battles in it? That has information about Kings/Knights that participated in the battles? Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 16. 16 Derived relationships and k-partite graphs Monopartite = only 1 node label - Connect battles to each other - Based on how many Knights and Kings fought in the same battle Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 17. 17 Derived relationships and k-partite graphs MATCH (b1:Battle)<-[]-(k)-[]->(b2:Battle) WHERE k:King or k:Knight RETURN b1.name AS source, b2.name AS target, count(distinct k) AS num_participants Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” This gives you: - for every pair of battles, - how many kings or knights fought in both
  • 18. 18 Derived relationships and k-partite graphs MATCH (b1:Battle)<-[]-(k)-[]->(b2:Battle) WHERE k:King or k:Knight WITH b1, b2 count(distinct k) AS num_participants MERGE (b1)-[r:shared_fighters {count:num_participants}]->(b2) RETURN b1,r,b2 Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” We can add the new derived relationship to the graph and extract the relevant subgraph
  • 19. 19 Derived relationships and k-partite graphs Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” Now that we have the graph we can run our algorithms on …. we can use one of the Community detection algorithms to cluster battles.
  • 20. Algorithms learn based on the structure of a graph, and make assumptions about graph topology: • Any variation in the distribution of relationships for nodes can tell us some underlying property (how important is a node? what community is it in?) • It’s not possible to iterate heuristics over directed graphs • There’s no external information relevant to the algorithm (like node labels or relationship type) 20 Why can’t I run my algorithm on a multipartite graph? huh?
  • 21. What if I try to an algorithm on this graph? - How many relationships does each person have? - How many relationships does each book have? - What is the direction of the relationships in this graph? - Can I reach a person node from another person node, following the directed relationships? 21 Why can’t I run my algorithm on a multipartite graph?
  • 22. What if I try to an algorithm on this graph? - How many relationships does each person have? 1 or 2 - How many relationships does each book have? 5 or 6 - What is the direction of the relationships in this graph? Person-[:APPEARED_IN]->Book - Can I reach a person node from another person node, following the directed relationships? No! 22 Why can’t I run my algorithm on a multipartite graph?
  • 23. What if I try to an algorithm on this graph? - What would an algorithm that used the number of edges each node has to calculate centrality conclude? - What would an algorithm that followed directed relationships to find communities conclude? 23 Why can’t I run my algorithm on a multipartite graph?
  • 24. What if I try to an algorithm on this graph? - What would an algorithm that used the number of edges each node has to calculate centrality conclude? Books are more important than people! - What would an algorithm that followed directed relationships to find communities conclude? There seven communities? 24 Why can’t I run my algorithm on a multipartite graph?
  • 25. If you want to find out: - What person is the most important - How many communities of people are there, across all the books? You need to reshape your graph! 25 Why can’t I run my algorithm on a multipartite graph?
  • 26. • Neo4j automates data transformations • Fast iterations & layering • Production ready features, parallelization & enterprise support Neo4j’s Graph Catalog solves this problem!    We have procedures to let you reshape and subset your transactional graph so you have the right data in the right shape. Mutable In-Memory Workspace Computational Graph Native Graph Store
  • 27. 27 Next Session.. Deep dive into Graph Data Science Library ● Graph Catalog ● Graph Algorithms