SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
What makes graph queries difficult?
Gábor Szárnyas
szarnyas@mit.bme.hu
Budapest Neo4j Meetup – 2019/06/25
With contributions from Petra Várhegyi and Bálint Hegyi
The property graph data model
SIMPLE GRAPH
A B
D E
C
5 people
 Many of them know each other
This is a simple graph.
Algorithms:
 breadth-first search
 depth-first search
 PageRank
 connected components
ADD EDGE WEIGHTS
A B
D E
C
5 people
 Weight: communication cost
This is a weighted graph.
Algorithms:
 shortest path algorithms
 max-flow
10
8
4
2
2
9
1
6
ADD EDGE TYPES
A B
D E
C
5 people
ADD EDGE TYPES
A B
D E
C
5 people
 Business partners
 Friends
Multiple edge types
but only a single node type.
This is an edge-typed graph.
ADD NODE AND EDGE TYPES
c4
c2
c5
c3
c6
c1
A B
D E
C
5 people 
 Business partners
 Friends
6 comments 
 Replying to another comment
 Authored by a given person
This is a typed graph.
ADD PROPERTIES
c4
c2
c5
c3
c6
c1
A B
D E
C
5 people  – name, age
 Business partners
 Friends – since
6 comments  – content, date
 Replying to another comment
 Authored by a given person
This is a property graph.
Similar to object-oriented data.
name: “Alice”
age: 25
name: “Bob”
age: 26
since: 2014
name: “Erin”
age: 30
content: “I totally agree”
date: 2017-02-02
content: “Great”
date: 2017-02-03
name: “Dan”
age: 47
Graph processing: Queries and analytics
GRAPH QUERIES: LOCAL
c4
c2
c5
c3
c6
c1
A B
D E
C
Local graph query:
Return “Dan” and his comments.
Well-researched topic.
Typical execution times are low.
name: “Alice”
age: 25
name: “Bob”
age: 26
since: 2014
name: “Erin”
age: 30
content: “I totally agree”
date: 2017-02-02
content: “Great”
date: 2017-02-03
name: “Dan”
age: 47
GRAPH QUERIES: GLOBAL
c4
c2
c5
c3
c6
c1
A B
D E
C
Global graph query:
Find people who had no interaction
with “Cecil” through any comments,
neither replying nor receiving a reply.
The result is „Alice”.
Typical execution times are high.
name: “Alice”
age: 25
name: “Bob”
age: 26
since: 2014
name: “Erin”
age: 30
content: “I totally agree”
date: 2017-02-02
content: “Great”
date: 2017-02-03
name: “Dan”
age: 47
GRAPH ANALTYICS: NETWORK SCIENCE
 Studies the structure of graphs
 Pioneered by László Barabási-Albert et al.
 Degree distributions, clustering coefficient, etc.
LOCAL CLUSTERING COEFFICIENT
A B
D E
C LCC(𝑣)=
𝑣
𝑣2
3
2
3
2
3
2
3
2
3
0 0.66 10.33
0.0
0.5
1.0
LCC
The empirical cumulative distribution
function does not present much useful
information in this case.
TYPED CLUSTERING COEFFICIENT
𝑣
TCC(𝑣)=
𝑣
0 0.66 10.33
0.0
0.5
1.0
TCC
𝑣
𝑣
+
+
More information
High combinatorial complexity:
• 𝑡 types → 𝑡 × (𝑡 − 1) triangles
• 𝒪 𝑡2
steps
1
2
0 2
3
0 0
A B
D E
C
TYPED CLUSTERING COEFFICIENT
A B
D E
C
𝑣
TCC(𝑣)=
𝑣
+
𝑣
+
𝑣 𝑣
+
𝑣
++
𝑣 𝑣
+
𝑣
+
 Business partners
 Friends
 Family member
3 types → 6 triangles
Petra Várhegyi:
Multidimensional Graph Analytics,
Master’s thesis, 2018
F. Battiston et al.:
Structural measures for multiplex networks,
Physical Review E, 2014
level
of detail
estimated
evaluation
time
BFS
GRAPH PROCESSING TECHNIQUES AND LANGUAGES
PageRank
Dijkstra
structure +types +properties
Local
clustering
coeff.
+weights
Floyd
Ford-Fulkerson
Global queries
Local queries
Typed
clustering
coeff.
Neo4j Graph Algorithms library Neo4j Graph Database
level
of detail
estimated
evaluation
time
BFS
GRAPH PROCESSING TECHNIQUES AND LANGUAGES
PageRank
Dijkstra
structure +types +properties+weights
Floyd
Ford-Fulkerson
Global queries
Local queries
Neo4j Graph Algorithms library
Typed
clustering
coeff.
Neo4j Graph Database
Local
clustering
coeff.
Graph processing tools and challenges
GRAPH PROCESSING CHALLENGES / STRUCTURE
the “curse of connectedness”
data structures contemporary computer architectures are
good at processing are linear and simple hierarchical
structures, such as Lists, Stacks, or Trees
a massive amount of random data access is required […]
poor performance since the CPU cache is not in effect for
most of the time. […] parallelism is difficult to extract
because of the unstructured nature of graphs.
B. Shao, Y. Li, H. Wang, H. Xia (Microsoft Research):
Trinity Graph Engine and its Applications,
IEEE Data Engineering Bulleting 2017
connectedness
computer
architectures
caching and
parallelization
GRAPH PROCESSING CHALLENGES / PROPERTIES
existing graph query methods […] focus on the topological
structure of graphs and few have considered attributed graphs.
applications of large graph databases would involve querying the
graph data (attributes) in addition to the graph topology.
answering queries that involve predicates on the attributes of
the graphs in addition to the topological structure […] makes
evaluation and optimization more complex.
S. Sakr, S. Elnikety, Y. He (Microsoft Research):
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs,
CIKM 2012
topology
properties
complex
optimization
GRAPH PROCESSING TOOLS
graph
queries
graph
analytics
Currently, there is a strong distinction between graph query
and analytical tools – this might change in the future.
Gelly LynxKite
János Szendi-Varga (GraphAware):
Graph Technology Landscape 2019
Neo4j Graph Algorithms library
Benchmarks:
Defining a common understanding
TRANSACTION PROCESSING PERFORMANCE COUNCIL (1988-)
Many standard specifications
for benchmarking certain
aspects of relational DBs
LINKED DATA BENCHMARK COUNCIL (2012–)
LDBC is a non-profit organization dedicated to establishing benchmarks,
benchmark practices and benchmark results for graph data management
software.
LDBC’s Social Network Benchmark is an industrial and academic initiative,
formed by principal actors in the field of graph-like data management.
LDBC SOCIAL NETWORK BENCHMARK
Complex graph schema
14 node types, many edge types
Subgraphs
 Network of persons
 Arbitrary depth trees
o Comments
o TagClasses
 Fixed depth trees
o City < Country < Continent
LDBC INTERACTIVE Q3
Friends and friends of friends that have been to countries X and Y
LDBC INTERACTIVE Q14
Trusted connection paths
1 2 73 4 5 6
8 9 1410 11 12 13 1615
17 18 2319 20 21 22 2524
BI WORKLOAD
GraphBLAS:
A unified theory built on linear algebra
THE GRAPHBLAS APPROACH
BLAS GraphBLAS
HW architecture HW architecture
Numerical
applications
Graph analytical
applications
LAGraphLINPACK/LAPACK
S. McMillan: Research review @ CMU, 2015
Graph algorithms on future architectures
Separation of concernsSeparation of concerns
 GraphBLAS is an effort to define standard building blocks for graph algorithms
in the language of linear algebra
 1979: BLAS (Basic Linear Algebra Subprograms)
 2013: GraphBLAS
 Key idea: separation of concerns
Graph algorithm
implementers
Hardware vendors
HPC experts
Tim Mattson et al.: LAGraph,
GrAPL @ IPDPS 2019
PARALLELIZATION ON SKEWED DISTRIBUTIONS
Using multiple processing units require load balancing.
Very difficult to implement for real graphs.
This work is in progress and improvements are expected.
Gábor Szárnyas: Multiplex graph analytics
with GraphBLAS, FOSDEM 2019
Bálint Hegyi: Benchmarking scalable graph
query techniques, Master’s thesis, 2019
Summary
SUMMARY: CHALLENGES IN GRAPH PROCESSING
No consensus on a unifying theory:
 Relational algebra?
 Linear algebra?
Performance:
 Many random access operations
 Difficult to cache
 Difficult to parallelize
 Handling properties introduces even more complexity
Many open research and implementation challenges.
CONTRIBUTIONS IN MY PHD DISSERTATION
database
research
high-
performance
computing
network science
object-oriented
SW engineering
semantic web P1
P2
P3
Gábor Szárnyas:
Query, Analysis, and Benchmarking Techniques for Evolving Property Graphs of Software Systems,
PhD dissertation, 2019

Mais conteúdo relacionado

Mais procurados

GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
Paco Nathan
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
Doug Needham
 

Mais procurados (20)

GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
GraphX is the blue ocean for scala engineers @ Scala Matsuri 2014
GraphX is the blue ocean for scala engineers @ Scala Matsuri 2014GraphX is the blue ocean for scala engineers @ Scala Matsuri 2014
GraphX is the blue ocean for scala engineers @ Scala Matsuri 2014
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
A Join Operator for Property Graphs
A Join Operator for Property GraphsA Join Operator for Property Graphs
A Join Operator for Property Graphs
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big data
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_series
 
Improve ML Predictions using Connected Feature Extraction
Improve ML Predictions using Connected Feature ExtractionImprove ML Predictions using Connected Feature Extraction
Improve ML Predictions using Connected Feature Extraction
 
Trends In Graph Data Management And Mining
Trends In Graph Data Management And MiningTrends In Graph Data Management And Mining
Trends In Graph Data Management And Mining
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platforms
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Interpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsInterpreting Relational Schema to Graphs
Interpreting Relational Schema to Graphs
 
Graph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4jGraph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4j
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 

Semelhante a What Makes Graph Queries Difficult?

Semelhante a What Makes Graph Queries Difficult? (20)

Improve ML Predictions using Graph Analytics (today!)
Improve ML Predictions using Graph Analytics (today!)Improve ML Predictions using Graph Analytics (today!)
Improve ML Predictions using Graph Analytics (today!)
 
ArXiv Literature Exploration using Social Network Analysis
ArXiv Literature Exploration using Social Network AnalysisArXiv Literature Exploration using Social Network Analysis
ArXiv Literature Exploration using Social Network Analysis
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
 
ONLINE SUBGRAPH SKYLINE ANALYSIS OVER KNOWLEDGE GRAPHS
ONLINE SUBGRAPH SKYLINE ANALYSIS OVER KNOWLEDGE GRAPHSONLINE SUBGRAPH SKYLINE ANALYSIS OVER KNOWLEDGE GRAPHS
ONLINE SUBGRAPH SKYLINE ANALYSIS OVER KNOWLEDGE GRAPHS
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
論文紹介:Graph Pattern Entity Ranking Model for Knowledge Graph Completion
論文紹介:Graph Pattern Entity Ranking Model for Knowledge Graph Completion論文紹介:Graph Pattern Entity Ranking Model for Knowledge Graph Completion
論文紹介:Graph Pattern Entity Ranking Model for Knowledge Graph Completion
 
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
KagNet: Knowledge-Aware Graph Networks for Commonsense ReasoningKagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
 
Graph Analysis over Relational Database. Roberto Franchini - Arcade Analytics
Graph Analysis over Relational Database. Roberto Franchini - Arcade AnalyticsGraph Analysis over Relational Database. Roberto Franchini - Arcade Analytics
Graph Analysis over Relational Database. Roberto Franchini - Arcade Analytics
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
 
Graph databases and SQL Server 2017
Graph databases and SQL Server 2017Graph databases and SQL Server 2017
Graph databases and SQL Server 2017
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
GRAKN.AI - The Knowledge Graph
GRAKN.AI - The Knowledge GraphGRAKN.AI - The Knowledge Graph
GRAKN.AI - The Knowledge Graph
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
 
How Graphs are Changing AI
How Graphs are Changing AIHow Graphs are Changing AI
How Graphs are Changing AI
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
SCALABLE SEMI-SUPERVISED LEARNING BY EFFICIENT ANCHOR GRAPH REGULARIZATION
SCALABLE SEMI-SUPERVISED LEARNING BY EFFICIENT ANCHOR GRAPH REGULARIZATIONSCALABLE SEMI-SUPERVISED LEARNING BY EFFICIENT ANCHOR GRAPH REGULARIZATION
SCALABLE SEMI-SUPERVISED LEARNING BY EFFICIENT ANCHOR GRAPH REGULARIZATION
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 

Mais de Gábor Szárnyas

Mais de Gábor Szárnyas (11)

GraphBLAS: A linear algebraic approach for high-performance graph queries
GraphBLAS: A linear algebraic approach for high-performance graph queriesGraphBLAS: A linear algebraic approach for high-performance graph queries
GraphBLAS: A linear algebraic approach for high-performance graph queries
 
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
 
Learning Timed Automata with Cypher
Learning Timed Automata with CypherLearning Timed Automata with Cypher
Learning Timed Automata with Cypher
 
Időzített automatatanulás Cypherrel
Időzített automatatanulás CypherrelIdőzített automatatanulás Cypherrel
Időzített automatatanulás Cypherrel
 
Parsing process
Parsing processParsing process
Parsing process
 
Compiling openCypher graph queries with Spark Catalyst
Compiling openCypher graph queries with Spark CatalystCompiling openCypher graph queries with Spark Catalyst
Compiling openCypher graph queries with Spark Catalyst
 
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
 
Sharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph QueriesSharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph Queries
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
 
IncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph QueriesIncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph Queries
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the Cloud
 

Último

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
HenryBriggs2
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
Health
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Último (20)

A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptx
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

What Makes Graph Queries Difficult?

  • 1. What makes graph queries difficult? Gábor Szárnyas szarnyas@mit.bme.hu Budapest Neo4j Meetup – 2019/06/25 With contributions from Petra Várhegyi and Bálint Hegyi
  • 2. The property graph data model
  • 3. SIMPLE GRAPH A B D E C 5 people  Many of them know each other This is a simple graph. Algorithms:  breadth-first search  depth-first search  PageRank  connected components
  • 4. ADD EDGE WEIGHTS A B D E C 5 people  Weight: communication cost This is a weighted graph. Algorithms:  shortest path algorithms  max-flow 10 8 4 2 2 9 1 6
  • 5. ADD EDGE TYPES A B D E C 5 people
  • 6. ADD EDGE TYPES A B D E C 5 people  Business partners  Friends Multiple edge types but only a single node type. This is an edge-typed graph.
  • 7. ADD NODE AND EDGE TYPES c4 c2 c5 c3 c6 c1 A B D E C 5 people   Business partners  Friends 6 comments   Replying to another comment  Authored by a given person This is a typed graph.
  • 8. ADD PROPERTIES c4 c2 c5 c3 c6 c1 A B D E C 5 people  – name, age  Business partners  Friends – since 6 comments  – content, date  Replying to another comment  Authored by a given person This is a property graph. Similar to object-oriented data. name: “Alice” age: 25 name: “Bob” age: 26 since: 2014 name: “Erin” age: 30 content: “I totally agree” date: 2017-02-02 content: “Great” date: 2017-02-03 name: “Dan” age: 47
  • 10. GRAPH QUERIES: LOCAL c4 c2 c5 c3 c6 c1 A B D E C Local graph query: Return “Dan” and his comments. Well-researched topic. Typical execution times are low. name: “Alice” age: 25 name: “Bob” age: 26 since: 2014 name: “Erin” age: 30 content: “I totally agree” date: 2017-02-02 content: “Great” date: 2017-02-03 name: “Dan” age: 47
  • 11. GRAPH QUERIES: GLOBAL c4 c2 c5 c3 c6 c1 A B D E C Global graph query: Find people who had no interaction with “Cecil” through any comments, neither replying nor receiving a reply. The result is „Alice”. Typical execution times are high. name: “Alice” age: 25 name: “Bob” age: 26 since: 2014 name: “Erin” age: 30 content: “I totally agree” date: 2017-02-02 content: “Great” date: 2017-02-03 name: “Dan” age: 47
  • 12. GRAPH ANALTYICS: NETWORK SCIENCE  Studies the structure of graphs  Pioneered by László Barabási-Albert et al.  Degree distributions, clustering coefficient, etc.
  • 13. LOCAL CLUSTERING COEFFICIENT A B D E C LCC(𝑣)= 𝑣 𝑣2 3 2 3 2 3 2 3 2 3 0 0.66 10.33 0.0 0.5 1.0 LCC The empirical cumulative distribution function does not present much useful information in this case.
  • 14. TYPED CLUSTERING COEFFICIENT 𝑣 TCC(𝑣)= 𝑣 0 0.66 10.33 0.0 0.5 1.0 TCC 𝑣 𝑣 + + More information High combinatorial complexity: • 𝑡 types → 𝑡 × (𝑡 − 1) triangles • 𝒪 𝑡2 steps 1 2 0 2 3 0 0 A B D E C
  • 15. TYPED CLUSTERING COEFFICIENT A B D E C 𝑣 TCC(𝑣)= 𝑣 + 𝑣 + 𝑣 𝑣 + 𝑣 ++ 𝑣 𝑣 + 𝑣 +  Business partners  Friends  Family member 3 types → 6 triangles Petra Várhegyi: Multidimensional Graph Analytics, Master’s thesis, 2018 F. Battiston et al.: Structural measures for multiplex networks, Physical Review E, 2014
  • 16. level of detail estimated evaluation time BFS GRAPH PROCESSING TECHNIQUES AND LANGUAGES PageRank Dijkstra structure +types +properties Local clustering coeff. +weights Floyd Ford-Fulkerson Global queries Local queries Typed clustering coeff. Neo4j Graph Algorithms library Neo4j Graph Database
  • 17. level of detail estimated evaluation time BFS GRAPH PROCESSING TECHNIQUES AND LANGUAGES PageRank Dijkstra structure +types +properties+weights Floyd Ford-Fulkerson Global queries Local queries Neo4j Graph Algorithms library Typed clustering coeff. Neo4j Graph Database Local clustering coeff.
  • 18. Graph processing tools and challenges
  • 19. GRAPH PROCESSING CHALLENGES / STRUCTURE the “curse of connectedness” data structures contemporary computer architectures are good at processing are linear and simple hierarchical structures, such as Lists, Stacks, or Trees a massive amount of random data access is required […] poor performance since the CPU cache is not in effect for most of the time. […] parallelism is difficult to extract because of the unstructured nature of graphs. B. Shao, Y. Li, H. Wang, H. Xia (Microsoft Research): Trinity Graph Engine and its Applications, IEEE Data Engineering Bulleting 2017 connectedness computer architectures caching and parallelization
  • 20. GRAPH PROCESSING CHALLENGES / PROPERTIES existing graph query methods […] focus on the topological structure of graphs and few have considered attributed graphs. applications of large graph databases would involve querying the graph data (attributes) in addition to the graph topology. answering queries that involve predicates on the attributes of the graphs in addition to the topological structure […] makes evaluation and optimization more complex. S. Sakr, S. Elnikety, Y. He (Microsoft Research): G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs, CIKM 2012 topology properties complex optimization
  • 21. GRAPH PROCESSING TOOLS graph queries graph analytics Currently, there is a strong distinction between graph query and analytical tools – this might change in the future. Gelly LynxKite János Szendi-Varga (GraphAware): Graph Technology Landscape 2019 Neo4j Graph Algorithms library
  • 23. TRANSACTION PROCESSING PERFORMANCE COUNCIL (1988-) Many standard specifications for benchmarking certain aspects of relational DBs
  • 24. LINKED DATA BENCHMARK COUNCIL (2012–) LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. LDBC’s Social Network Benchmark is an industrial and academic initiative, formed by principal actors in the field of graph-like data management.
  • 25. LDBC SOCIAL NETWORK BENCHMARK Complex graph schema 14 node types, many edge types Subgraphs  Network of persons  Arbitrary depth trees o Comments o TagClasses  Fixed depth trees o City < Country < Continent
  • 26. LDBC INTERACTIVE Q3 Friends and friends of friends that have been to countries X and Y
  • 27. LDBC INTERACTIVE Q14 Trusted connection paths
  • 28. 1 2 73 4 5 6 8 9 1410 11 12 13 1615 17 18 2319 20 21 22 2524 BI WORKLOAD
  • 29. GraphBLAS: A unified theory built on linear algebra
  • 30. THE GRAPHBLAS APPROACH BLAS GraphBLAS HW architecture HW architecture Numerical applications Graph analytical applications LAGraphLINPACK/LAPACK S. McMillan: Research review @ CMU, 2015 Graph algorithms on future architectures Separation of concernsSeparation of concerns  GraphBLAS is an effort to define standard building blocks for graph algorithms in the language of linear algebra  1979: BLAS (Basic Linear Algebra Subprograms)  2013: GraphBLAS  Key idea: separation of concerns Graph algorithm implementers Hardware vendors HPC experts Tim Mattson et al.: LAGraph, GrAPL @ IPDPS 2019
  • 31. PARALLELIZATION ON SKEWED DISTRIBUTIONS Using multiple processing units require load balancing. Very difficult to implement for real graphs. This work is in progress and improvements are expected. Gábor Szárnyas: Multiplex graph analytics with GraphBLAS, FOSDEM 2019 Bálint Hegyi: Benchmarking scalable graph query techniques, Master’s thesis, 2019
  • 33. SUMMARY: CHALLENGES IN GRAPH PROCESSING No consensus on a unifying theory:  Relational algebra?  Linear algebra? Performance:  Many random access operations  Difficult to cache  Difficult to parallelize  Handling properties introduces even more complexity Many open research and implementation challenges.
  • 34. CONTRIBUTIONS IN MY PHD DISSERTATION database research high- performance computing network science object-oriented SW engineering semantic web P1 P2 P3 Gábor Szárnyas: Query, Analysis, and Benchmarking Techniques for Evolving Property Graphs of Software Systems, PhD dissertation, 2019