GRAPHALYTICS
A Big Data Benchmark for Graph-Processing Platforms
Graphalytics: Benchmarking Graph-Processing Platforms
LDBC TUC Meeting, UPC Barcelona, March 2016
Mihai Capotă, Arnau Prat, Peter Boncz, Hassan Chafi, Yong Guo, Ana Lucia Varbanescu, Tim Hegeman, Wing Lung Ngai, Alexandru Iosup, Stijn Heldens
https://github.com/tudelft-atlarge/graphalytics/
http://bl.ocks.org/mbostock/4062045
GRAPHALYTICS was made possible by a generous contribution from Oracle.
Graphs at the Core of Our Society: The LinkedIn Example
Data Deluge
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
[Figure labels: Apr 2014: 400; Nov 2015: 400]
LinkedIn Is Not Unique: Data Deluge
270M MAU, 200+ avg followers: >54B edges
1.2B MAU (0.8B DAU), 200+ avg followers: >240B edges
Per company per day: 100+ posts, 1,000+ comments
IBM: 280k employee-users, 2.6M followers
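The edge counts on this slide appear to be the product of monthly active users and average followers. That reading is an assumption, not something the slide states. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def estimated_edges(monthly_active_users: float, avg_followers: float) -> float:
    """Rough lower bound on edges: every active user contributes avg_followers edges.

    Assumption: this is how the ">54B" and ">240B" figures were derived.
    """
    return monthly_active_users * avg_followers


# Figures taken from the slide: 270M MAU and 1.2B MAU, each with 200+ followers.
smaller_network = estimated_edges(270e6, 200)
larger_network = estimated_edges(1.2e9, 200)

print(f"{smaller_network / 1e9:.0f}B edges")  # 54B edges
print(f"{larger_network / 1e9:.0f}B edges")   # 240B edges
```

Because the slide gives "200+" followers, both results are lower bounds, which matches the ">" signs on the slide.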
Graph Processing @large
A Graph Processing Platform
Streaming not considered in this presentation.
Interactive processing not considered in this presentation.
Algorithm
ETL
Active Storage (filtering, compression, replication, caching)
Distribution to processing platform
Ideally, N cores/disks → Nx faster
Compute-intensive workload: different/more complex analysis → ?x slower
Dataset-dependent workload: unfriendly graphs → ??x slower
Data-intensive workload: 10x graph size → 100x-1,000x slower
Graph-Processing Platforms
• Platform: the combined hardware, software, and programming system used to complete a graph-processing task
[Figure: logos of graph-processing platforms, including Trinity]
Which to choose?
What to tune?
Graphalytics, in a nutshell
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of community-driven and industrial systems
http://graphalytics.ewi.tudelft.nl
https://github.com/tudelft-atlarge/graphalytics/
Benchmarking Harness
Iosup et al. LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms (submitted).
Graphalytics = Representative Classes of Algorithms and Datasets
• 2-stage selection process of algorithms and datasets
Class                Examples                                   %
Graph Statistics     Diameter, Local Clust. Coeff., PageRank    20
Graph Traversal      BFS, SSSP, DFS                             50
Connected Comp.      Reachability, BiCC, Weakly CC              10
Community Detection  Clustering, Nearest Neighbor,              5
                     Community Detection w Label Propagation
Other                Sampling, Partitioning                     <15
Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS'14.
+ weighted graphs: Single-Source Shortest Paths (~35%)
Graphalytics = Distributed Graph Generation with DATAGEN
[Diagram stages: Person Generation, Edge Generation, Activity Generation, "Knows" graph serialization, Activity serialization, Graphalytics]
• Rich set of configurations
• More diverse degree distribution than Graph500
• Realistic clustering coefficient and assortativity
Level of Detail
Graphalytics = Portable Perf. Analysis with Granula
[Diagram components: Graph Processing System, Logging Patch, Performance Analyzer, Granula Performance Archive, Granula Performance Model, Granula Archiver. Diagram labels: Modeling, Archiving, Sharing, Monitoring, logs, rules]
Minimal code invasion + automated data collection at runtime + portable archive (+ web UI) → portable bottleneck analysis
Graphalytics = Diverse Set of Automated Experiments
Category     Experiment           Algo.    Data                Nodes/Threads  Metrics
Baseline     Dataset variety      BFS, PR  All                 1              Run, norm.
Baseline     Algorithm variety    All      R4(S), D300(L)      1              Runtime
Scalability  Vertical vs. horiz.  BFS, PR  D300(L), D1000(XL)  1-16/1-32      Runtime, S
Scalability  Weak vs. strong      BFS, PR  G22(S)-G26(XL)      1-16           Runtime, S
Robustness   Stress test          BFS      All                 1              SLA met
Robustness   Variability          BFS      D300(L), D1000(L)   1/16           CV
Self-Test    Time to run/part     --       Datagen             1-16           Runtime
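The Metrics column uses "S" and "CV" without defining them. The sketch below assumes they mean speedup and coefficient of variation, the usual readings in scalability and variability experiments. The function names, the use of the sample standard deviation, and the runtimes are illustrative assumptions, not taken from the benchmark.

```python
from statistics import mean, stdev
from typing import Sequence


def speedup(baseline_runtime_s: float, runtime_s: float) -> float:
    """Speedup S: how many times faster a run is than the baseline run."""
    return baseline_runtime_s / runtime_s


def efficiency(baseline_runtime_s: float, runtime_s: float, resources: int) -> float:
    """Fraction of the ideal N-times speedup achieved on `resources` nodes or threads."""
    return speedup(baseline_runtime_s, runtime_s) / resources


def coefficient_of_variation(runtimes_s: Sequence[float]) -> float:
    """CV: sample standard deviation of repeated runtimes divided by their mean."""
    return stdev(runtimes_s) / mean(runtimes_s)


# Hypothetical runtimes in seconds, for illustration only.
single_node_s = 120.0
sixteen_nodes_s = 15.0
print(speedup(single_node_s, sixteen_nodes_s))         # 8.0
print(efficiency(single_node_s, sixteen_nodes_s, 16))  # 0.5
print(coefficient_of_variation([10.0, 10.0, 10.0]))    # 0.0
```

An efficiency of 1.0 corresponds to the ideal case in which N nodes or threads give an N-times faster run.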
Implementation status
Platform     LCC  BFS  WCC  CDLP  PageRank  SSSP
MapReduce 2  G    G    G    G     --        --
Giraph       G    G    G    G     G         G
GraphX       G    G    G    G     G         G
PowerGraph   G    G    G    G     G         G
GraphLab     G    G    G    G     V         --
Neo4j        G    G    G    G     --        --
PGX.D        --   G    G    G     G         G
GraphMat     G    G    G    G     G         G
OpenG        G    G    G    G     G         G
TOTEM        --   V    V    --    V         --
MapGraph     --   V    V    --    V         --
Medusa       --   V    V    --    V         --
https://github.com/tudelft-atlarge/graphalytics/
G = validated, on GitHub
V = validation stage
Benchmarking and tuning performed by vendors
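The status table lists the benchmark's algorithms by abbreviation. As a minimal single-machine sketch of two of them, BFS and WCC, the code below uses a plain adjacency dictionary. It is not taken from the Graphalytics code base: the function names, the graph representation, and the output conventions (unreachable vertices are omitted, components are labelled by their first-seen vertex) are assumptions and may differ from the benchmark's own definitions.

```python
from collections import deque


def bfs_depths(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable vertex."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in depths:
                depths[neighbour] = depths[vertex] + 1
                queue.append(neighbour)
    return depths


def weakly_connected_components(adjacency):
    """Label each vertex with a component id, ignoring edge direction."""
    # Build an undirected view so that direction does not matter.
    undirected = {}
    for vertex, neighbours in adjacency.items():
        undirected.setdefault(vertex, set())
        for neighbour in neighbours:
            undirected[vertex].add(neighbour)
            undirected.setdefault(neighbour, set()).add(vertex)
    labels = {}
    for start in undirected:
        if start in labels:
            continue
        # Every vertex reachable from `start` belongs to the same component.
        for vertex in bfs_depths(undirected, start):
            labels[vertex] = start
    return labels


# Hypothetical directed graph with two components: 1 -> 2 -> 3 and 4 -> 5.
example_graph = {1: [2], 2: [3], 3: [], 4: [5], 5: []}
print(bfs_depths(example_graph, 1))                # {1: 0, 2: 1, 3: 2}
print(weakly_connected_components(example_graph))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Distributed platforms such as those in the table typically express the same computations as iterative vertex-centric programs rather than with a central queue.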
Graphalytics Capabilities: An Example
Graphalytics enables deep comparison of many systems
at once, through diverse experiments and metrics
Your system here!
Diverse algorithms, diverse datasets, diverse metrics
Processing time (s) + Edges[+Vertices]/s
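The line above names two kinds of metric: processing time, and a throughput in edges (optionally plus vertices) per second. A minimal sketch of how such throughputs could be derived from a measured processing time follows. The function names and the example graph sizes are hypothetical.

```python
def edges_per_second(num_edges: int, processing_time_s: float) -> float:
    """Throughput counted in edges only."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return num_edges / processing_time_s


def edges_and_vertices_per_second(
    num_edges: int, num_vertices: int, processing_time_s: float
) -> float:
    """Throughput counted in edges plus vertices, the "[+Vertices]" variant."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return (num_edges + num_vertices) / processing_time_s


# Hypothetical run: 1M edges and 100k vertices processed in 2 seconds.
print(edges_per_second(1_000_000, 2.0))                        # 500000.0
print(edges_and_vertices_per_second(1_000_000, 100_000, 2.0))  # 550000.0
```

Normalising by graph size makes runs on datasets of different scale comparable, which raw processing time alone does not.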
Which system is the best?
It depends…
Algorithm + Dataset + Metric
OK, but… why is this system better for this workload and this metric?
Granula Visualizer
Portable choke-point analysis for everyone!
Graphalytics = Modern Software
Engineering Process
• Graphalytics code reviews
• Internal release to LDBC partners (first, Feb 2015; last, Feb 2016)
• Public release, announced first through LDBC (Apr 2015)
• First full benchmark specification, LDBC criteria (Q1 2016)
• Jenkins continuous integration server
• SonarQube software quality analyzer
https://github.com/tudelft-atlarge/graphalytics/
Graphalytics, in the future
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of community-driven and industrial systems
github.com/tudelft-atlarge/graphalytics/
+ more data generation
+ deeper performance metrics
+ choke-point analysis
PELGA – Performance Engineering for
Large-scale Graph Analytics,
workshop with EuroPar 2016
Analysing the degree distribution of real graphs by means of several probabil...
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologies
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databases
 
Graph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document CollectionsGraph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document Collections
 
Use of graphs for political analysis
Use of graphs for political analysisUse of graphs for political analysis
Use of graphs for political analysis
 
Graphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph DatabaseGraphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph Database
 
Langford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphsLangford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphs
 

Recently uploaded

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxNadaHaitham1
 

Recently uploaded (20)

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

Graphalytics: A big data benchmark for graph-processing platforms

  • 1. GRAPHALYTICS: A Big Data Benchmark for Graph-Processing Platforms. Graphalytics: Benchmarking Graph-Processing Platforms, LDBC TUC Meeting, UPC Barcelona, March 2016. Mihai Capotã, Arnau Prat, Peter Boncz, Hassan Chafi, Yong Guo, Ana Lucia Varbanescu, Tim Hegeman, Wing Lung Ngai, Alexandru Iosup, Stijn Heldens. GRAPHALYTICS was made possible by a generous contribution from Oracle. https://github.com/tudelft-atlarge/graphalytics/ http://bl.ocks.org/mbostock/4062045
  • 2. Graphs at the Core of Our Society: The LinkedIn Example. Data Deluge. Apr 2014: 400. Nov 2015: 400. Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
  • 3. LinkedIn Is Not Unique: Data Deluge. 270M MAU, 200+ avg followers, >54B edges. 1.2B MAU, 0.8B DAU, 200+ avg followers, >240B edges. Company/day: 100+ posts, 1,000+ comments. IBM: 280k employee-users, 2.6M followers.
  • 4. Graph Processing @large. A Graph Processing Platform: Algorithm; ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform. Streaming not considered in this presentation. Interactive processing not considered in this presentation.
  • 5. Graph Processing @large. A Graph Processing Platform: Algorithm; ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform. Streaming not considered in this presentation. Interactive processing not considered in this presentation. Ideally, N cores/disks → Nx faster.
  • 6. Graph Processing @large. A Graph Processing Platform: Algorithm; ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform. Streaming not considered in this presentation. Interactive processing not considered in this presentation. Ideally, N cores/disks → Nx faster. Compute-intensive workload: different/more complex analysis → ?x slower. Dataset-dependent workload: unfriendly graphs → ??x slower. Data-intensive workload: 10x graph size → 100x to 1,000x slower.
  • 7. Graph-Processing Platforms. Platform: the combined hardware, software, and programming system that is being used to complete a graph processing task. Trinity 2. Which to choose? What to tune?
  • 8. Graphalytics, in a nutshell: An LDBC benchmark*; Advanced benchmarking harness; Diverse real and synthetic datasets; Many classes of algorithms; Granula for manual choke-point analysis; Modern software engineering practices; Supports many platforms; Enables comparison of community-driven and industrial systems. http://graphalytics.ewi.tudelft.nl https://github.com/tudelft-atlarge/graphalytics/
  • 9. Benchmarking Harness. Iosup et al. LDBC Graphalytics: A Benchmark for Large Scale Graph Analysis on Parallel and Distributed Platform (submitted).
  • 10. Graphalytics = Representative Classes of Algorithms and Datasets. 2-stage selection process of algorithms and datasets. Source: Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS'14. + weighted graphs: Single-Source Shortest Paths (~35%).
    Class | Examples | %
    Graph Statistics | Diameter, Local Clust. Coeff., PageRank | 20
    Graph Traversal | BFS, SSSP, DFS | 50
    Connected Comp. | Reachability, BiCC, Weakly CC | 10
    Community Detection | Clustering, Nearest Neighbor, Community Detection w Label Propagation | 5
    Other | Sampling, Partitioning | <15
  • 11. Graphalytics = Distributed Graph Generation w DATAGEN. Diagram: Person Generation, Edge Generation, Activity Generation, "Knows" graph serialization, Activity serialization; Graphalytics; Level of Detail. Rich set of configurations; More diverse degree distribution than Graph500; Realistic clustering coefficient and assortativity.
  • 12. Graphalytics = Portable Perf. Analysis w Granula. Diagram: Graph Processing System, Logging Patch, Performance Analyzer, Granula Performance Archive, Granula Performance Model, Granula Archiver; Modeling, Archiving, Sharing, Monitoring; logs, rules. Minimal code invasion + automated data collection at runtime + portable archive (+ web UI) → portable bottleneck analysis.
  • 13. Graphalytics = Diverse Set of Automated Experiments.
    Category | Experiment | Algo. | Data | Nodes/Threads | Metrics
    Baseline | Dataset variety | BFS, PR | All | 1 | Run, norm.
    Baseline | Algorithm variety | All | R4(S), D300(L) | 1 | Runtime
    Scalability | Vertical vs. horiz. | BFS, PR | D300(L), D1000(XL) | 1-16/1-32 | Runtime, S
    Scalability | Weak vs. strong | BFS, PR | G22(S)-G26(XL) | 1-16 | Runtime, S
    Robustness | Stress test | BFS | All | 1 | SLA met
    Robustness | Variability | BFS | D300(L), D1000(L) | 1/16 | CV
    Self-Test | Time to run/part | -- | Datagen | 1-16 | Runtime
  • 14. Implementation status. G = validated, on GitHub; V = validation stage. https://github.com/tudelft-atlarge/graphalytics/
    Platforms (column order): MapReduce2, Giraph, GraphX, PowerGraph, GraphLab, Neo4j, PGX.D, GraphMat, OpenG, TOTEM, MapGraph, Medusa
    LCC: G G G G G G -- G G -- -- --
    BFS: G G G G G G G G G V V V
    WCC: G G G G G G G G G V V V
    CDLP: G G G G G G G G G -- -- --
    P'Rank: -- G G G V -- G G G V V V
    SSSP: -- G G G -- -- G G G -- -- --
  • 15. Implementation status (status table identical to slide 14). Benchmarking and tuning performed by vendors. G = validated, on GitHub; V = validation stage.
  • 16. Graphalytics Capabilities: An Example. Graphalytics enables deep comparison of many systems at once, through diverse experiments and metrics. Your system here! Diverse algorithms. Diverse metrics. Diverse datasets.
  • 17. Processing time (s) + Edges[+Vertices]/s. Which system is the best? It depends… Algorithm + Dataset + Metric. OK, but … why is this system better for this workload for this metric?
  • 18. Granula Visualizer. Portable choke-point analysis for everyone!
  • 19. Graphalytics = Modern Software Engineering Process: Graphalytics code reviews; Internal release to LDBC partners (first, Feb 2015; last, Feb 2016); Public release, announced first through LDBC (Apr 2015); First full benchmark specification, LDBC criteria (Q1 2016); Jenkins continuous integration server; SonarQube software quality analyzer. https://github.com/tudelft-atlarge/graphalytics/
  • 20. Graphalytics, in the future: An LDBC benchmark*; Advanced benchmarking harness; Diverse real and synthetic datasets; Many classes of algorithms; Granula for manual choke-point analysis; Modern software engineering practices; Supports many platforms; Enables comparison of community-driven and industrial systems; + more data generation; + deeper performance metrics; + choke-point analysis. github.com/tudelft-atlarge/graphalytics/
  • 21. PELGA – Performance Engineering for Large-scale Graph Analytics, workshop with EuroPar 2016.
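
The metrics named on slides 13 and 17 (runtime, speedup S, CV, and Edges[+Vertices]/s) can be illustrated with a small Python sketch. The slides do not give formulas, so the definitions here are assumptions: speedup as baseline runtime divided by measured runtime, CV as population standard deviation divided by mean, and throughput as element count divided by processing time. All function names are hypothetical and are not part of the Graphalytics code base.

```python
from statistics import mean, pstdev


def speedup(baseline_runtime_s: float, runtime_s: float) -> float:
    """Assumed definition of S: baseline runtime over measured runtime."""
    return baseline_runtime_s / runtime_s


def coefficient_of_variation(runtimes_s: list[float]) -> float:
    """Assumed definition of CV: population std. deviation over mean of repeated runs."""
    return pstdev(runtimes_s) / mean(runtimes_s)


def edges_per_second(num_edges: int, processing_time_s: float) -> float:
    """Throughput counting edges only."""
    return num_edges / processing_time_s


def edges_and_vertices_per_second(
    num_vertices: int, num_edges: int, processing_time_s: float
) -> float:
    """Throughput counting edges plus vertices (the 'Edges[+Vertices]/s' variant)."""
    return (num_vertices + num_edges) / processing_time_s


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(speedup(100.0, 25.0))
    print(coefficient_of_variation([8.0, 12.0]))
    print(edges_and_vertices_per_second(1000, 9000, 2.0))
```

Normalizing by graph size, as the throughput functions do, makes runs on datasets of different scale comparable, which raw processing time alone does not.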