SlideShare a Scribd company logo
1 of 49
Download to read offline
gMark: Schema-driven data and workload
generation for graph databases
George Fletcher
TU Eindhoven
Joint work with
G. Bagan (Lyon), A. Bonifati (Lyon), R. Ciucanu (Oxford),
A. Lemay (Lille), and N. Advokaat (Eindhoven)
8th LDBC TUC Meeting
Redwood Shores, California
23 June 2016
Synthetic graph and workload generation with gMark
We present gMark, an open-source framework for generation of
synthetic graphs and workloads.
gMark has been designed to tailor diverse graph data management
scenarios, which are often driven by query workloads.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Synthetic graph and workload generation with gMark
We present gMark, an open-source framework for generation of
synthetic graphs and workloads.
gMark has been designed to tailor diverse graph data management
scenarios, which are often driven by query workloads.
For example
§ multi-query optimization,
§ mapping discovery and query rewriting in data integration
systems,
§ workload-driven graph database physical design,
and, in general, flexible specification and generation of diverse
workloads addressing particular experimental studies.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Synthetic graph and workload generation with gMark
Given a graph schema, gMark
§ generates synthetic instances of the schema (of desired size)
§ generates query workloads with targeted structure and runtime
behavior (which holds for all instances of the schema)
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Why gMark?
We adopt successful aspects of the state of the art
For example, like the Waterloo Diversity Benchmark, gMark is
schema-driven,
§ allowing finely tailored graph instances for specific application
domains; and,
§ allowing tightly controlled generation of query workloads.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Why gMark?
We adopt successful aspects of the state of the art
For example, like the Waterloo Diversity Benchmark, gMark is
schema-driven,
§ allowing finely tailored graph instances for specific application
domains; and,
§ allowing tightly controlled generation of query workloads.
and, like the LDBC SNB Interactive, gMark supports focused
stress-testing of query optimization choke-points, through fine
control of query parameters such as selectivities.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Why gMark?
New features of gMark include
§ support for flexible generation of query workloads including
recursive path queries, which are fundamental for graph
analytics;
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Why gMark?
New features of gMark include
§ support for flexible generation of query workloads including
recursive path queries, which are fundamental for graph
analytics; and,
§ query selectivity estimation solution, in a purely
instance-independent schema-driven fashion.
§ hence, more scalable, more predictable, and easier to
explain/understand.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Overview of the gMark workflow
Graph configuration
‚ Size
‚ Node types
‚ Edge predicates
‚ Schema constraints
‚ Degree distributions
Query workload configuration
‚ Size
‚ Selectivity
‚ Recursion
‚ Shape
‚ Arity
gMark
Graph&query generator
Graph instance file
(CSV)
Query workload file
(UCRPQs as XML)
gMark
Query translator
SPARQL
openCypher
PostgreSQL
Datalog
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Graph generation
gMark graph generation
Graph configuration
‚ Size
‚ Node types
‚ Edge predicates
‚ Schema constraints
‚ Degree distributions
Query workload configuration
‚ Size
‚ Selectivity
‚ Recursion
‚ Shape
‚ Arity
gMark
Graph&query generator
Graph instance file
(CSV)
Query workload file
(UCRPQs as XML)
gMark
Query translator
SPARQL
openCypher
PostgreSQL
Datalog
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Graph configurations
The user can specify in the graph configuration (i.e., graph
schema):
‚ Size: # of nodes
‚ Node types: finite set of node labels
e.g., author, citation, journal
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Graph configurations
The user can specify in the graph configuration (i.e., graph
schema):
‚ Size: # of nodes
‚ Node types: finite set of node labels
e.g., author, citation, journal
‚ Edge predicates: finite set of edge labels
e.g., authoredBy, referencedBy
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Graph configurations
The user can specify in the graph configuration (i.e., graph
schema):
‚ Size: # of nodes
‚ Node types: finite set of node labels
e.g., author, citation, journal
‚ Edge predicates: finite set of edge labels
e.g., authoredBy, referencedBy
‚ Schema constraints: proportion of nodes/edges of given type
e.g., 20% of all nodes are authors
‚ Degree distributions: on the in- and out-degree of edge
predicates (uniform, normal, zipfian)
e.g., the out-distribution of citation authoredBy
ÝÝÝÝÝÝÝÝÑ
author is Gaussian
with parameters µ “ 3, σ “ 1
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Graph configurations: Uniprot schema
Node type Constr.
gene 35%
protein 31%
author 20%
citation 10%
organism 1%
. . . . . .
Edge predicate Constr.
authoredBy 64%
encodedOn 6%
referencedBy 3%
occursIn 2%
. . . . . .
Node types Edge predicates
source type predicate
ÝÝÝÝÝÝÑ
target type In-distr. Out-distr.
protein encodedOnÝÝÝÝÝÝÝÑ gene Zipfian Gaussian
protein occursInÝÝÝÝÝÝÑ organism Zipfian Uniform
protein referencedBy
ÝÝÝÝÝÝÝÝÝÝÑ
citation Zipfian Gaussian
citation authoredBy
ÝÝÝÝÝÝÝÝÑ
author Zipfian Gaussian
. . . . . . . . .
In- and out-degree distributions
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven graph generation
We have established the intractability of the generation problem
Theorem
Given a graph configuration G, deciding whether or not there exists
a graph instance satisfying G is NP-complete.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven graph generation
We have established the intractability of the generation problem
Theorem
Given a graph configuration G, deciding whether or not there exists
a graph instance satisfying G is NP-complete.
Hence, gMark follows a ‘best-effort’ strategy in instance generation,
i.e., it attempts to achieve the exact values of the input parameters
and relaxes them whenever this is not possible.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven graph generation
We have adapted the scenarios of several popular use cases into
meaningful gMark configurations, while also adding new gMark
features:
§ Bib: our default bibliographical use-case
§ LSN: LDBC social network benchmark
§ WD: WatDiv e-commerce benchmark
§ SP: SP2Bench DBLP benchmark
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Scalability of gMark graph generation
100K 1M 10M 100M
Bib 0m0.057s 0m0.638s 0m8.344s 1m28.725s
LSN 0m0.225s 0m1.451s 0m23.018s 3m11.318s
WD 0m2.163s 0m25.032s 4m10.988s 113m31.078s
SP 0m0.638s 0m7.048s 1m28.831s 15m23.542s
Graph generation times, with varying graph sizes (# nodes)
Generation time depends heavily on density of instances (e.g., WD
has 100x number of edges than Bib)
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Query workload generation
gMark query generation
Graph configuration
‚ Size
‚ Node types
‚ Edge predicates
‚ Schema constraints
‚ Degree distributions
Query workload configuration
‚ Size
‚ Selectivity
‚ Recursion
‚ Shape
‚ Arity
gMark
Graph&query generator
Graph instance file
(CSV)
Query workload file
(UCRPQs as XML)
gMark
Query translator
SPARQL
openCypher
PostgreSQL
Datalog
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
UCRPQ: Unions of Conjunctions of Regular Path Queries
– Core constructs of the W3C’s SPARQL 1.1, Oracle’s PGQL, and
and Neo4j’s openCypher
– Well understood theoretical properties (e.g., polynomial data
complexity)
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
UCRPQ: Unions of Conjunctions of Regular Path Queries
– Core constructs of the W3C’s SPARQL 1.1, Oracle’s PGQL, and
and Neo4j’s openCypher
– Well understood theoretical properties (e.g., polynomial data
complexity)
UCRPQ includes recursive queries (via the Kleene star ˚), with
applications in social networks, bioinformatics, etc.
gMark generates UCRPQ Ñ the first schema-driven tool to support
recursive queries and their generation in concrete syntaxes.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
Example of UCRPQ
for each researcher, select all of the biological entities
(i.e., genes and organisms) relevant to proteins studied in
papers authored by people in the researcher’s
coauthorship network
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
Example of UCRPQ
for each researcher, select all of the biological entities
(i.e., genes and organisms) relevant to proteins studied in
papers authored by people in the researcher’s
coauthorship network
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
Example of UCRPQ
for each researcher, select all of the biological entities
(i.e., genes and organisms) relevant to proteins studied in
papers authored by people in the researcher’s
coauthorship network
p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq
(a=authoredBy, r=referencedBy, e=encodedOn, o=occursIn)
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
A query language for graphs
Example of UCRPQ
for each researcher, select all of the biological entities
(i.e., genes and organisms) relevant to proteins studied in
papers authored by people in the researcher’s
coauthorship network
p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq
(a=authoredBy, r=referencedBy, e=encodedOn, o=occursIn)
#rules 1
#conjuncts 2
#disjuncts 1, 2
path lengh 2, 3, 3
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven workload generation
The user can specify in the query workload configuration:
‚ Size: #queries, #conjuncts/#disjuncts/path length per query
‚ Selectivity: constant, linear, quadratic.
‚ Recursion: probability to generate Kleene star above a conjunct.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven workload generation
The user can specify in the query workload configuration:
‚ Size: #queries, #conjuncts/#disjuncts/path length per query
‚ Selectivity: constant, linear, quadratic.
‚ Recursion: probability to generate Kleene star above a conjunct.
‚ Shape: chain, star, cycle, star-chain.
‚ Arity: arbitrary (including 0 i.e., Boolean).
The graph configuration is also input to the query generator.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven workload generation
Approach
1. Prepare “schema” and “selectivity” graphs
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven workload generation
Approach
1. Prepare “schema” and “selectivity” graphs
2. For each query to be generated:
2.1 Generate query skeleton
2.2 Assign selectivity class to each predicate
2.3 Instantiate each predicate
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Schema-driven workload generation
Approach
1. Prepare “schema” and “selectivity” graphs
2. For each query to be generated:
2.1 Generate query skeleton
2.2 Assign selectivity class to each predicate
2.3 Instantiate each predicate
Assigning selectivities required us to develop a non-trivial
infrastructure for instance-independent reasoning over query
behavior, based on a Selectivity Algebra.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Selectivity estimation quality of gMark
‚ Given a binary query Q and a graph G, we assume that
|QpGq| “ Op|nodespGq|αq.
‚ α is the selectivity value (0–constant, 1–linear, 2–quadratic).
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Selectivity estimation quality of gMark
‚ Given a binary query Q and a graph G, we assume that
|QpGq| “ Op|nodespGq|αq.
‚ α is the selectivity value (0–constant, 1–linear, 2–quadratic).
‚ Experiments confirmed the assumption and the estimation quality.
Constant Linear Quadratic
LSN-Len 0.200˘0.417 1.189˘0.261 2.032˘0.059
LSN-Dis 0.182˘0.364 1.325˘0.318 2.046˘0.074
LSN-Con 0.190˘0.391 1.244˘0.326 2.017˘0.032
LSN-Rec 0.196˘0.409 1.090˘0.492 1.564˘0.889
Bib-Len 0.003˘0.010 0.921˘0.122 1.405˘0.337
Bib-Dis 0.000˘0.000 0.995˘0.012 1.607˘0.261
Bib-Con 0.023˘0.029 0.986˘0.112 1.409˘0.296
Bib-Rec 0.100˘0.316 0.982˘0.073 1.493˘0.335
WD-Len 0.016˘0.044 1.427˘0.392 2.004˘0.022
WD-Dis 0.009˘0.022 1.412˘0.380 1.999˘0.014
WD-Con -0.010˘0.026 1.540˘0.495 1.750˘0.708
WD-Rec 0.587˘0.830 - 1.976˘0.012
SP 0.074˘0.130 1.064˘0.034 2.034˘0.295
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
gMark query translator
Graph configuration
‚ Size
‚ Node types
‚ Edge predicates
‚ Schema constraints
‚ Degree distributions
Query workload configuration
‚ Size
‚ Selectivity
‚ Recursion
‚ Shape
‚ Arity
gMark
Graph&query generator
Graph instance file
(CSV)
Query workload file
(UCRPQs as XML)
gMark
Query translator
SPARQL
openCypher
PostgreSQL
Datalog
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Query translation
UCRPQ: p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Query translation
UCRPQ: p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq
SPARQL openCypher
PREFIX : <http://example.org/gmark/>
SELECT DISTINCT ?x ?z
WHERE { ?x (^:a/:a)* ?y .
?y ((^:a/^:r/:e)|(^:a/^:r/:o)) ?z .}
MATCH (x)<-[:a]-()-[:a]->(y),
(y)<-[:a]-()<-[:r]-()-[:e]->(z)
RETURN DISTINCT x, z
UNION
MATCH (x)<-[:a]-()-[:a]->(y),
(y)<-[:a]-()<-[:r]-()-[:o]->(z)
RETURN DISTINCT x, z;
Datalog SQL
g0(x,y)<- edge(x1,a,x0),edge(x1,a,x2),
x=x0,y=x2.
g0(x,y)<- g0(x,z),g0(z,y).
g1(x,y)<- edge(x1,a,x0),edge(x2,r,x1),
edge(x2,e,x3),x=x0,y=x3.
g1(x,y)<- edge(x1,a,x0),edge(x2,r,x1),
edge(x2,o,x3),x=x0,y=x3.
query(x,z)<- g0(x,y),g1(y,z).
WITH RECURSIVE c0(src, trg) AS (
SELECT edge.src, edge.src FROM edge
UNION
SELECT edge.trg, edge.trg FROM edge
UNION
SELECT s0.src, s0.trg
FROM (SELECT trg as src, src as trg,
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Scalability of gMark workload generation
On my laptop, gMark easily generates workloads of one thousand
queries for Bib in „ 0.3s; LSN and SP in „ 1.5s; and for the richer
WD scenario in „ 10s.
Query translation of the thousand queries into all four supported
syntaxes for each of the four scenarios required „ 0.1s.
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Scalability on recursive query workloads
Example Application. We performed an extensive performance
study of four state-of-the-art systems under the four use-case
schemas.
Our main finding was that performance on queries containing
recursive path navigation (i.e., RPQs) was typically impractical
§ indicates the need for further study of the engineering of this
basic class of graph queries
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Wrap Up
Recap
Novel contributions of gMark
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Recap
Novel contributions of gMark
§ schema-driven graph and query-workload generation, featuring
instance-independent selectivity estimation;
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Recap
Novel contributions of gMark
§ schema-driven graph and query-workload generation, featuring
instance-independent selectivity estimation;
§ finely controlled query workload-centered approach
§ versus query-centered approaches – nb. both are valid and
needed!
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Recap
Novel contributions of gMark
§ schema-driven graph and query-workload generation, featuring
instance-independent selectivity estimation;
§ finely controlled query workload-centered approach
§ versus query-centered approaches – nb. both are valid and
needed!
§ discovery of the performance difficulties of existing graph
DBMS’s on evaluating a basic class of graph queries
§ Regular Path Queries
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Recap
Novel contributions of gMark
§ schema-driven graph and query-workload generation, featuring
instance-independent selectivity estimation;
§ finely controlled query workload-centered approach
§ versus query-centered approaches – nb. both are valid and
needed!
§ discovery of the performance difficulties of existing graph
DBMS’s on evaluating a basic class of graph queries
§ Regular Path Queries
Come see us at our VLDB 2016 demo!
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Looking ahead to gMark v2.0
To-do/wishlist.
§ richer queries
§ support of constants in queries
§ additional query shapes
§ aggregation for BI workloads
§ extensions of selectivity estimation to higher arity queries,
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Looking ahead to gMark v2.0
To-do/wishlist.
§ richer queries
§ support of constants in queries
§ additional query shapes
§ aggregation for BI workloads
§ extensions of selectivity estimation to higher arity queries,
§ richer schemas
§ configuration parameter completion,
§ schema constructs for correlated structure,
§ class hierarchies
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
Looking ahead to gMark v2.0
To-do/wishlist.
§ richer queries
§ support of constants in queries
§ additional query shapes
§ aggregation for BI workloads
§ extensions of selectivity estimation to higher arity queries,
§ richer schemas
§ configuration parameter completion,
§ schema constructs for correlated structure,
§ class hierarchies
§ align our work with LDBC activites?
https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
gMark: Schema-driven data and workload
generation for graph databases
George Fletcher
TU Eindhoven, Netherlands
Joint work with
G. Bagan (Lyon), A. Bonifati (Lyon), R. Ciucanu (Oxford),
A. Lemay (Lille), and N. Advokaat (Eindhoven)
https://github.com/graphMark/gmark
Thank you!

More Related Content

Similar to 8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data and workload generation for graph databases

Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopExtremeEarth
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with GoJames Tan
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingSujit Pal
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series ForecastingBillTubbs
 
GraphTech Ecosystem - part 3: Graph Visualization
GraphTech Ecosystem - part 3: Graph VisualizationGraphTech Ecosystem - part 3: Graph Visualization
GraphTech Ecosystem - part 3: Graph VisualizationLinkurious
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023OpenACC
 
Generation of Random EMF Models for Benchmarks
Generation of Random EMF Models for BenchmarksGeneration of Random EMF Models for Benchmarks
Generation of Random EMF Models for BenchmarksMarkus Scheidgen
 
Andriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsAndriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsOWASP Kyiv
 
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)Tomáš Milata
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
From Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog VisualizationFrom Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog Visualizationgiurca
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learningMikhail Rozhkov
 
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data AnalyticsFugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data AnalyticsDatabricks
 

Similar to 8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data and workload generation for graph databases (20)

Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
 
Final Algos
Final AlgosFinal Algos
Final Algos
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 
GraphTech Ecosystem - part 3: Graph Visualization
GraphTech Ecosystem - part 3: Graph VisualizationGraphTech Ecosystem - part 3: Graph Visualization
GraphTech Ecosystem - part 3: Graph Visualization
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
 
Generation of Random EMF Models for Benchmarks
Generation of Random EMF Models for BenchmarksGeneration of Random EMF Models for Benchmarks
Generation of Random EMF Models for Benchmarks
 
Andriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsAndriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tips
 
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)
F3-DP-2015-Milata-Tomas-java-ee-batch-editor (1)
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
From Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog VisualizationFrom Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog Visualization
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learning
 
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data AnalyticsFugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
 

More from LDBC council

8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network BenchmarkLDBC council
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...LDBC council
 
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...LDBC council
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force statusLDBC council
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...LDBC council
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting - LDBC council
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...LDBC council
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...LDBC council
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...LDBC council
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC council
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC council
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXLDBC council
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionLDBC council
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...LDBC council
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC council
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadLDBC council
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesLDBC council
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsLDBC council
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphLDBC council
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesLDBC council
 

More from LDBC council (20)

8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
 
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting -
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status update
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusions
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data and workload generation for graph databases

  • 1. gMark: Schema-driven data and workload generation for graph databases George Fletcher TU Eindhoven Joint work with G. Bagan (Lyon), A. Bonifati (Lyon), R. Ciucanu (Oxford), A. Lemay (Lille), and N. Advokaat (Eindhoven) 8th LDBC TUC Meeting Redwood Shores, California 23 June 2016
  • 2. Synthetic graph and workload generation with gMark We present gMark, an open-source framework for generation of synthetic graphs and workloads. gMark has been designed to tailor diverse graph data management scenarios, which are often driven by query workloads. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 3. Synthetic graph and workload generation with gMark We present gMark, an open-source framework for generation of synthetic graphs and workloads. gMark has been designed to tailor diverse graph data management scenarios, which are often driven by query workloads. For example § multi-query optimization, § mapping discovery and query rewriting in data integration systems, § workload-driven graph database physical design, and, in general, flexible specification and generation of diverse workloads addressing particular experimental studies. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 4. Synthetic graph and workload generation with gMark Given a graph schema, gMark § generates synthetic instances of the schema (of desired size) § generates query workloads with targeted structure and runtime behavior (which holds for all instances of the schema) https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 5. Why gMark? We adopt successful aspects of the state of the art For example, like the Waterloo Diversity Benchmark, gMark is schema-driven, § allowing finely tailored graph instances for specific application domains; and, § allowing tightly controlled generation of query workloads. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 6. Why gMark? We adopt successful aspects of the state of the art For example, like the Waterloo Diversity Benchmark, gMark is schema-driven, § allowing finely tailored graph instances for specific application domains; and, § allowing tightly controlled generation of query workloads. and, like the LDBC SNB Interactive, gMark supports focused stress-testing of query optimization choke-points, through fine control of query parameters such as selectivities. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 7. Why gMark? New features of gMark include § support for flexible generation of query workloads including recursive path queries, which are fundamental for graph analytics; https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 8. Why gMark? New features of gMark include § support for flexible generation of query workloads including recursive path queries, which are fundamental for graph analytics; and, § query selectivity estimation solution, in a purely instance-independent schema-driven fashion. § hence, more scalable, more predictable, and easier to explain/understand. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 9. Overview of the gMark workflow Graph configuration ‚ Size ‚ Node types ‚ Edge predicates ‚ Schema constraints ‚ Degree distributions Query workload configuration ‚ Size ‚ Selectivity ‚ Recursion ‚ Shape ‚ Arity gMark Graph&query generator Graph instance file (CSV) Query workload file (UCRPQs as XML) gMark Query translator SPARQL openCypher PostgreSQL Datalog https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 11. gMark graph generation Graph configuration ‚ Size ‚ Node types ‚ Edge predicates ‚ Schema constraints ‚ Degree distributions Query workload configuration ‚ Size ‚ Selectivity ‚ Recursion ‚ Shape ‚ Arity gMark Graph&query generator Graph instance file (CSV) Query workload file (UCRPQs as XML) gMark Query translator SPARQL openCypher PostgreSQL Datalog https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 12. Graph configurations The user can specify in the graph configuration (i.e., graph schema): ‚ Size: # of nodes ‚ Node types: finite set of node labels e.g., author, citation, journal https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 13. Graph configurations The user can specify in the graph configuration (i.e., graph schema): ‚ Size: # of nodes ‚ Node types: finite set of node labels e.g., author, citation, journal ‚ Edge predicates: finite set of edge labels e.g., authoredBy, referencedBy https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 14. Graph configurations The user can specify in the graph configuration (i.e., graph schema): ‚ Size: # of nodes ‚ Node types: finite set of node labels e.g., author, citation, journal ‚ Edge predicates: finite set of edge labels e.g., authoredBy, referencedBy ‚ Schema constraints: proportion of nodes/edges of given type e.g., 20% of all nodes are authors ‚ Degree distributions: on the in- and out-degree of edge predicates (uniform, normal, zipfian) e.g., the out-distribution of citation authoredBy ÝÝÝÝÝÝÝÝÑ author is Gaussian with parameters µ “ 3, σ “ 1 https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 15. Graph configurations: Uniprot schema Node type Constr. gene 35% protein 31% author 20% citation 10% organism 1% . . . . . . Edge predicate Constr. authoredBy 64% encodedOn 6% referencedBy 3% occursIn 2% . . . . . . Node types Edge predicates source type predicate ÝÝÝÝÝÝÑ target type In-distr. Out-distr. protein encodedOnÝÝÝÝÝÝÝÑ gene Zipfian Gaussian protein occursInÝÝÝÝÝÝÑ organism Zipfian Uniform protein referencedBy ÝÝÝÝÝÝÝÝÝÝÑ citation Zipfian Gaussian citation authoredBy ÝÝÝÝÝÝÝÝÑ author Zipfian Gaussian . . . . . . . . . In- and out-degree distributions https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 16. Schema-driven graph generation We have established the intractability of the generation problem Theorem Given a graph configuration G, deciding whether or not there exists a graph instance satisfying G is NP-complete. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 17. Schema-driven graph generation We have established the intractability of the generation problem Theorem Given a graph configuration G, deciding whether or not there exists a graph instance satisfying G is NP-complete. Hence, gMark follows a ‘best-effort’ strategy in instance generation, i.e., it attempts to achieve the exact values of the input parameters and relaxes them whenever this is not possible. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 18. Schema-driven graph generation We have adapted the scenarios of several popular use cases into meaningful gMark configurations, while also adding new gMark features: § Bib: our default bibliographical use-case § LSN: LDBC social network benchmark § WD: WatDiv e-commerce benchmark § SP: SP2Bench DBLP benchmark https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 19. Scalability of gMark graph generation 100K 1M 10M 100M Bib 0m0.057s 0m0.638s 0m8.344s 1m28.725s LSN 0m0.225s 0m1.451s 0m23.018s 3m11.318s WD 0m2.163s 0m25.032s 4m10.988s 113m31.078s SP 0m0.638s 0m7.048s 1m28.831s 15m23.542s Graph generation times, with varying graph sizes (# nodes) Generation time depends heavily on density of instances (e.g., WD has 100x number of edges than Bib) https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 21. gMark query generation Graph configuration ‚ Size ‚ Node types ‚ Edge predicates ‚ Schema constraints ‚ Degree distributions Query workload configuration ‚ Size ‚ Selectivity ‚ Recursion ‚ Shape ‚ Arity gMark Graph&query generator Graph instance file (CSV) Query workload file (UCRPQs as XML) gMark Query translator SPARQL openCypher PostgreSQL Datalog https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 22. A query language for graphs UCRPQ: Unions of Conjunctions of Regular Path Queries – Core constructs of the W3C’s SPARQL 1.1, Oracle’s PGQL, and and Neo4j’s openCypher – Well understood theoretical properties (e.g., polynomial data complexity) https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 23. A query language for graphs UCRPQ: Unions of Conjunctions of Regular Path Queries – Core constructs of the W3C’s SPARQL 1.1, Oracle’s PGQL, and and Neo4j’s openCypher – Well understood theoretical properties (e.g., polynomial data complexity) UCRPQ includes recursive queries (via the Kleene star ˚), with applications in social networks, bioinformatics, etc. gMark generates UCRPQ Ñ the first schema-driven tool to support recursive queries and their generation in concrete syntaxes. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 24. A query language for graphs Example of UCRPQ for each researcher, select all of the biological entities (i.e., genes and organisms) relevant to proteins studied in papers authored by people in the researcher’s coauthorship network https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 25. A query language for graphs Example of UCRPQ for each researcher, select all of the biological entities (i.e., genes and organisms) relevant to proteins studied in papers authored by people in the researcher’s coauthorship network https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 26. A query language for graphs Example of UCRPQ for each researcher, select all of the biological entities (i.e., genes and organisms) relevant to proteins studied in papers authored by people in the researcher’s coauthorship network p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq (a=authoredBy, r=referencedBy, e=encodedOn, o=occursIn) https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 27. A query language for graphs Example of UCRPQ for each researcher, select all of the biological entities (i.e., genes and organisms) relevant to proteins studied in papers authored by people in the researcher’s coauthorship network p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq (a=authoredBy, r=referencedBy, e=encodedOn, o=occursIn) #rules 1 #conjuncts 2 #disjuncts 1, 2 path lengh 2, 3, 3 https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 28. Schema-driven workload generation The user can specify in the query workload configuration: ‚ Size: #queries, #conjuncts/#disjuncts/path length per query ‚ Selectivity: constant, linear, quadratic. ‚ Recursion: probability to generate Kleene star above a conjunct. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 29. Schema-driven workload generation The user can specify in the query workload configuration: ‚ Size: #queries, #conjuncts/#disjuncts/path length per query ‚ Selectivity: constant, linear, quadratic. ‚ Recursion: probability to generate Kleene star above a conjunct. ‚ Shape: chain, star, cycle, star-chain. ‚ Arity: arbitrary (including 0 i.e., Boolean). The graph configuration is also input to the query generator. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 30. Schema-driven workload generation Approach 1. Prepare “schema” and “selectivity” graphs https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 31. Schema-driven workload generation Approach 1. Prepare “schema” and “selectivity” graphs 2. For each query to be generated: 2.1 Generate query skeleton 2.2 Assign selectivity class to each predicate 2.3 Instantiate each predicate https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 32. Schema-driven workload generation Approach 1. Prepare “schema” and “selectivity” graphs 2. For each query to be generated: 2.1 Generate query skeleton 2.2 Assign selectivity class to each predicate 2.3 Instantiate each predicate Assigning selectivities required us to develop a non-trivial infrastructure for instance-independent reasoning over query behavior, based on a Selectivity Algebra. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 33. Selectivity estimation quality of gMark ‚ Given a binary query Q and a graph G, we assume that |QpGq| “ Op|nodespGq|αq. ‚ α is the selectivity value (0–constant, 1–linear, 2–quadratic). https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 34. Selectivity estimation quality of gMark ‚ Given a binary query Q and a graph G, we assume that |QpGq| “ Op|nodespGq|αq. ‚ α is the selectivity value (0–constant, 1–linear, 2–quadratic). ‚ Experiments confirmed the assumption and the estimation quality. Constant Linear Quadratic LSN-Len 0.200˘0.417 1.189˘0.261 2.032˘0.059 LSN-Dis 0.182˘0.364 1.325˘0.318 2.046˘0.074 LSN-Con 0.190˘0.391 1.244˘0.326 2.017˘0.032 LSN-Rec 0.196˘0.409 1.090˘0.492 1.564˘0.889 Bib-Len 0.003˘0.010 0.921˘0.122 1.405˘0.337 Bib-Dis 0.000˘0.000 0.995˘0.012 1.607˘0.261 Bib-Con 0.023˘0.029 0.986˘0.112 1.409˘0.296 Bib-Rec 0.100˘0.316 0.982˘0.073 1.493˘0.335 WD-Len 0.016˘0.044 1.427˘0.392 2.004˘0.022 WD-Dis 0.009˘0.022 1.412˘0.380 1.999˘0.014 WD-Con -0.010˘0.026 1.540˘0.495 1.750˘0.708 WD-Rec 0.587˘0.830 - 1.976˘0.012 SP 0.074˘0.130 1.064˘0.034 2.034˘0.295 https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 35. gMark query translator Graph configuration ‚ Size ‚ Node types ‚ Edge predicates ‚ Schema constraints ‚ Degree distributions Query workload configuration ‚ Size ‚ Selectivity ‚ Recursion ‚ Shape ‚ Arity gMark Graph&query generator Graph instance file (CSV) Query workload file (UCRPQs as XML) gMark Query translator SPARQL openCypher PostgreSQL Datalog https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 36. Query translation UCRPQ: p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 37. Query translation UCRPQ: p?x, ?zq Ð p?x, pa´¨aq˚, ?yq, p?y, pa´¨r´¨e ` a´¨r´ ¨oq, ?zq SPARQL openCypher PREFIX : <http://example.org/gmark/> SELECT DISTINCT ?x ?z WHERE { ?x (^:a/:a)* ?y . ?y ((^:a/^:r/:e)|(^:a/^:r/:o)) ?z .} MATCH (x)<-[:a]-()-[:a]->(y), (y)<-[:a]-()<-[:r]-()-[:e]->(z) RETURN DISTINCT x, z UNION MATCH (x)<-[:a]-()-[:a]->(y), (y)<-[:a]-()<-[:r]-()-[:o]->(z) RETURN DISTINCT x, z; Datalog SQL g0(x,y)<- edge(x1,a,x0),edge(x1,a,x2), x=x0,y=x2. g0(x,y)<- g0(x,z),g0(z,y). g1(x,y)<- edge(x1,a,x0),edge(x2,r,x1), edge(x2,e,x3),x=x0,y=x3. g1(x,y)<- edge(x1,a,x0),edge(x2,r,x1), edge(x2,o,x3),x=x0,y=x3. query(x,z)<- g0(x,y),g1(y,z). WITH RECURSIVE c0(src, trg) AS ( SELECT edge.src, edge.src FROM edge UNION SELECT edge.trg, edge.trg FROM edge UNION SELECT s0.src, s0.trg FROM (SELECT trg as src, src as trg, https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 38. Scalability of gMark workload generation On my laptop, gMark easily generates workloads of one thousand queries for Bib in „ 0.3s; LSN and SP in „ 1.5s; and for the richer WD scenario in „ 10s. Query translation of the thousand queries into all four supported syntaxes for each of the four scenarios required „ 0.1s. https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 39. Scalability on recursive query workloads Example Application. We performed an extensive performance study of four state-of-the-art systems under the four use-case schemas. Our main finding was that performance on queries containing recursive path navigation (i.e., RPQs) was typically impractical § indicates the need for further study of the engineering of this basic class of graph queries https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 41. Recap Novel contributions of gMark https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 42. Recap Novel contributions of gMark § schema-driven graph and query-workload generation, featuring instance-independent selectivity estimation; https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 43. Recap Novel contributions of gMark § schema-driven graph and query-workload generation, featuring instance-independent selectivity estimation; § finely controlled query workload-centered approach § versus query-centered approaches – nb. both are valid and needed! https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 44. Recap Novel contributions of gMark § schema-driven graph and query-workload generation, featuring instance-independent selectivity estimation; § finely controlled query workload-centered approach § versus query-centered approaches – nb. both are valid and needed! § discovery of the performance difficulties of existing graph DBMS’s on evaluating a basic class of graph queries § Regular Path Queries https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 45. Recap Novel contributions of gMark § schema-driven graph and query-workload generation, featuring instance-independent selectivity estimation; § finely controlled query workload-centered approach § versus query-centered approaches – nb. both are valid and needed! § discovery of the performance difficulties of existing graph DBMS’s on evaluating a basic class of graph queries § Regular Path Queries Come see us at our VLDB 2016 demo! https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 46. Looking ahead to gMark v2.0 To-do/wishlist. § richer queries § support of constants in queries § additional query shapes § aggregation for BI workloads § extensions of selectivity estimation to higher arity queries, https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 47. Looking ahead to gMark v2.0 To-do/wishlist. § richer queries § support of constants in queries § additional query shapes § aggregation for BI workloads § extensions of selectivity estimation to higher arity queries, § richer schemas § configuration parameter completion, § schema constructs for correlated structure, § class hierarchies https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 48. Looking ahead to gMark v2.0 To-do/wishlist. § richer queries § support of constants in queries § additional query shapes § aggregation for BI workloads § extensions of selectivity estimation to higher arity queries, § richer schemas § configuration parameter completion, § schema constructs for correlated structure, § class hierarchies § align our work with LDBC activites? https://github.com/graphMark/gmark George Fletcher, TU Eindhoven
  • 49. gMark: Schema-driven data and workload generation for graph databases George Fletcher TU Eindhoven, Netherlands Joint work with G. Bagan (Lyon), A. Bonifati (Lyon), R. Ciucanu (Oxford), A. Lemay (Lille), and N. Advokaat (Eindhoven) https://github.com/graphMark/gmark Thank you!