Benchmarking Tool for Graph Algorithms
IIIT-H Cloud Computing - Major Project
By:
Abhinaba Sarkar 201405616
Malavika Reddy 201201193
Yash Khandelwal 201302164
Nikita Kad 201330030
Description
● In computer science and mathematics, graphs are abstract data structures that model structural relationships among objects. They are now widely used for data modeling in application domains where identifying relationship patterns, rules, and anomalies is useful.
● These domains include the web graph, social networks, etc. The ever-increasing size of graph-structured data in these applications creates a critical need for scalable systems that can process large amounts of it efficiently.
● The project aims to build a benchmarking tool that measures the performance of graph algorithms such as BFS and PageRank on MapReduce, Giraph, GraphLab and Neo4j, and determines which approach works better on which kinds of graphs.
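As a rough illustration of what such a tool measures, here is a minimal timing harness. It is only a sketch: `benchmark` and `bfs_levels` are hypothetical names, the workload is a plain single-machine BFS, and a real run against Giraph, GraphLab or Hadoop would time an externally launched job rather than a Python function.

```python
import time
from collections import deque

def benchmark(run, *args, repeats=3):
    """Return (best wall-clock seconds, last result) over `repeats` calls of run(*args)."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

def bfs_levels(adjacency, source):
    """Toy workload (an assumption, not the project's code): single-machine BFS levels."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels

if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    seconds, levels = benchmark(bfs_levels, graph, 0)
    print(f"BFS took {seconds:.6f} s, levels = {levels}")
```

Taking the best of several runs is one common way to reduce noise; reporting the mean or median would be an equally reasonable choice.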
Motivation
● Analyze the runtime of different types of graph algorithms on different
types of distributed systems.
● Performing computation on a graph data structure requires processing at
each node.
● Each node contains node-specific data as well as links (edges) to other
nodes, so the computation must traverse the graph, which takes a long
time on large graphs.
Approach
The BFS/SSSP algorithm is broken into 2 tasks:
● Map Task: In each Map task, we discover all the neighbors of the nodes currently in the queue (nodes in the queue are marked with the color GRAY) and add them to our graph.
● Reduce Task: In each Reduce task, we set the correct level of each node and update the graph.
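The two tasks above can be sketched as a single-process simulation. This is a minimal sketch, not the project's Hadoop code: the WHITE/BLACK colors, the record layout `(neighbours, level, color)` and the function names are assumptions added for illustration; only the GRAY marking comes from the slides.

```python
from collections import defaultdict

WHITE, GRAY, BLACK = "WHITE", "GRAY", "BLACK"  # unvisited, in queue, finished
INF = float("inf")

def bfs_map(node_id, node):
    """Expand a GRAY node: emit each neighbour as GRAY at level + 1."""
    neighbours, level, color = node
    if color == GRAY:
        for n in neighbours:
            # The neighbour's adjacency list is unknown here; the reducer merges it back.
            yield n, ([], level + 1, GRAY)
        color = BLACK
    yield node_id, (neighbours, level, color)

def bfs_reduce(node_id, records):
    """Merge records for one node: keep its edges, smallest level, darkest color."""
    darkness = {WHITE: 0, GRAY: 1, BLACK: 2}
    neighbours, level, color = [], INF, WHITE
    for nbrs, lvl, col in records:
        if nbrs:
            neighbours = nbrs
        level = min(level, lvl)
        if darkness[col] > darkness[color]:
            color = col
    return neighbours, level, color

def bfs(adjacency, source):
    """Run map/reduce rounds until no GRAY node is left; return (levels, rounds)."""
    graph = {n: (list(nbrs), 0 if n == source else INF, GRAY if n == source else WHITE)
             for n, nbrs in adjacency.items()}
    rounds = 0
    while any(color == GRAY for _, _, color in graph.values()):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for node_id, node in graph.items():
            for key, value in bfs_map(node_id, node):
                shuffled[key].append(value)
        graph = {key: bfs_reduce(key, values) for key, values in shuffled.items()}
        rounds += 1
    return {n: level for n, (_, level, _) in graph.items()}, rounds

if __name__ == "__main__":
    print(bfs({1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}, source=1))
```

Each pass of the `while` loop corresponds to one MapReduce job, so the number of rounds grows with the depth of the graph.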
The PageRank algorithm is also broken into 2 steps:
● Map Task: Each page emits its neighbours and its current PageRank.
● Reduce Task: For each key (page), the new PageRank is calculated from the PageRank values emitted in the map task.
○ PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 ... Tn are the pages linking to A, C(P) is the cardinality (out-degree) of page P, and d is the damping (“random URL”) factor.
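The map and reduce steps for the formula above can be sketched as follows. This is a single-process illustration under stated assumptions: the function names, the tagged records `("links", ...)` and `("rank", ...)`, the initial rank of 1.0 and the default d = 0.85 are choices made here, not taken from the slides.

```python
from collections import defaultdict

def pagerank_map(page, rank, out_links):
    """Emit the page's link list, plus an equal share of its rank to each neighbour."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))  # PR(T) / C(T)

def pagerank_reduce(page, values, d=0.85):
    """Apply PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) to the received shares."""
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    return (1 - d) + d * total, out_links

def pagerank(adjacency, iterations=20, d=0.85):
    """Run a fixed number of map/reduce rounds; every page starts with rank 1.0."""
    state = {p: (1.0, list(links)) for p, links in adjacency.items()}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for page, (rank, links) in state.items():
            for key, value in pagerank_map(page, rank, links):
                shuffled[key].append(value)
        state = {p: pagerank_reduce(p, vals, d) for p, vals in shuffled.items()}
    return {p: rank for p, (rank, _) in state.items()}

if __name__ == "__main__":
    print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```

Pages without out-links pass no rank on in this sketch, so their rank leaks out of the total; a production implementation would need to decide how to handle such dangling pages.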
Dijkstra:
● Map task: In each map task, neighbors are discovered and put into
the queue, marked with the color GRAY.
● Reduce task: In each reduce task, we select the nodes according to
their shortest distances from the current node.
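One way to read the two tasks above is as rounds of parallel edge relaxation, which is closer in spirit to Bellman-Ford than to the priority-queue form of Dijkstra. The sketch below follows that reading; it is an assumption about the approach, not the project's code. The `active` flag plays the role of the GRAY color, all names are hypothetical, and edge weights are assumed to be non-negative.

```python
from collections import defaultdict

INF = float("inf")

def sssp_map(node_id, distance, edges, active):
    """Re-emit the node itself; an active node also offers distance + weight to neighbours."""
    yield node_id, ("node", distance, edges)
    if active:
        for neighbour, weight in edges:
            yield neighbour, ("candidate", distance + weight, None)

def sssp_reduce(node_id, values):
    """Keep the smallest distance seen; stay active only if the distance improved."""
    edges, old, best = [], INF, INF
    for kind, dist, payload in values:
        if kind == "node":
            old, edges = dist, payload
        best = min(best, dist)
    return best, edges, best < old

def sssp(weighted_adjacency, source):
    """Run map/reduce rounds until no node is active; return (distances, rounds)."""
    state = {n: (0 if n == source else INF, list(e), n == source)
             for n, e in weighted_adjacency.items()}
    rounds = 0
    while any(active for _, _, active in state.values()):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for n, (dist, edges, active) in state.items():
            for key, value in sssp_map(n, dist, edges, active):
                shuffled[key].append(value)
        state = {n: sssp_reduce(n, vals) for n, vals in shuffled.items()}
        rounds += 1
    return {n: dist for n, (dist, _, _) in state.items()}, rounds

if __name__ == "__main__":
    graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)], "d": []}
    print(sssp(graph, "a"))
```

A node can become active again when a shorter path reaches it later, which is why a weighted search may need more rounds than BFS on the same graph.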
Approach (contd.)
Giraph and Hadoop: all computations are run on a cluster of 2 nodes.
GraphLab: all computations are run on a single machine.
Applications
Dynamic social graphs (such as those of LinkedIn, Twitter and Facebook)
are too large to process feasibly on a single node. We therefore need to
benchmark the runtime of different graph algorithms on distributed
systems.
Example graph: LinkedIn’s social graph
Complexity
● BFS: The complexity of the standard BFS algorithm is O(V + E), but because of
the read/write overhead in distributed computing, the cost rises to
O(E * Depth).
● Dijkstra's algorithm behaves similarly, but its number of iterations is
higher than for BFS.
● PageRank: The complexity of PageRank in a distributed system is
(No. of Nodes + No. of Relations) * Iterations.
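To make these orders of growth concrete, the expressions above can be evaluated for a sample graph. This is a back-of-the-envelope sketch: the function names are hypothetical, the graph sizes are invented for illustration and are not measurements from the project, and constant factors are ignored.

```python
def bfs_single_machine_work(num_nodes, num_edges):
    """Standard BFS: O(V + E) steps."""
    return num_nodes + num_edges

def bfs_mapreduce_work(num_edges, depth):
    """Distributed BFS as stated above: O(E * Depth) steps."""
    return num_edges * depth

def pagerank_work(num_nodes, num_relations, iterations):
    """Distributed PageRank as stated above: (nodes + relations) * iterations."""
    return (num_nodes + num_relations) * iterations

if __name__ == "__main__":
    # Hypothetical graph: 1,000,000 nodes, 5,000,000 edges, BFS depth 10.
    nodes, edges, depth = 1_000_000, 5_000_000, 10
    print(bfs_single_machine_work(nodes, edges))       # 6000000
    print(bfs_mapreduce_work(edges, depth))            # 50000000
    print(pagerank_work(nodes, edges, iterations=20))  # 120000000
```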
Benchmarking - Giraph
Nodes       BFS                 Dijkstra           PageRank
1000        4 min 7.836 sec     3 min 5.655 sec    5 min 12.111 sec
1 million   10 min 11.443 sec   11 min 0.05 sec    16 min 8.652 sec
Benchmarking - GraphLab
Nodes       PageRank           Dijkstra
1000        6.029 sec          4.852 sec
10,000      20.154 sec         13.029 sec
1 million   1 min 11.124 sec   1 min 10.576 sec
Benchmarking - Hadoop
Nodes       BFS                 Dijkstra           PageRank
1000        4 min 7.836 sec     3 min 5.655 sec    5 min 12.111 sec
1 million   10 min 11.443 sec   11 min 0.05 sec    16 min 8.652 sec
BFS and Dijkstra’s runtime depend on the depth of the input graph.
Problems we faced
● Poor locality of memory access.
● Very little work per vertex.
● Changing degree of parallelism.
● Running over many machines makes these problems worse.
Conclusion and Future Work
● Although GraphLab is fast, it is constrained by memory: it needs enough memory to hold the edges, and their associated values, of any single vertex in the graph.
● The experimental results show that the time taken by the PageRank algorithm is directly proportional to the number of relations in the graph when the number of nodes and iterations are held constant. This explains the huge difference in time.
● The runtime of BFS is directly proportional to the depth of the graph: the greater the depth, the more iterations are needed and hence the more time it takes.
Future Work:
Reading the input graph from a file adds a large overhead, because the graph is read from and written to files in each iteration. If the graph and its properties were stored in a database instead, this read/write overhead would be removed and query time would be reduced. We therefore plan to add database support.

Mais conteúdo relacionado

Mais procurados

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISShaun Lewis
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Casesmathieuraj
 
Executing Joins Dynamically in DDBS Query Optimizer
Executing Joins Dynamically in DDBS Query OptimizerExecuting Joins Dynamically in DDBS Query Optimizer
Executing Joins Dynamically in DDBS Query OptimizerEr. Shiva K. Shrestha
 
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...IMPACT Centre of Competence
 
Graph of UK train stations
Graph of UK train stationsGraph of UK train stations
Graph of UK train stationsDaniyar Mukhanov
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementationtugrulh
 
Reactive Databases for Big Data applications
Reactive Databases for Big Data applicationsReactive Databases for Big Data applications
Reactive Databases for Big Data applicationsGraph-TA
 
5 Ways to Improve Your LiDAR Workflows
5 Ways to Improve Your LiDAR Workflows5 Ways to Improve Your LiDAR Workflows
5 Ways to Improve Your LiDAR WorkflowsSafe Software
 
Network analysis and Geocoding.
Network analysis and Geocoding.Network analysis and Geocoding.
Network analysis and Geocoding.Habiba28
 
Sparse inverse covariance estimation
Sparse inverse covariance estimationSparse inverse covariance estimation
Sparse inverse covariance estimationAyush Singh, MS
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - IntroductionJungwon Kim
 
How Powerful are Graph Networks?
How Powerful are Graph Networks?How Powerful are Graph Networks?
How Powerful are Graph Networks?IAMAl
 
Time travel and time series analysis with pandas + statsmodels
Time travel and time series analysis with pandas + statsmodelsTime travel and time series analysis with pandas + statsmodels
Time travel and time series analysis with pandas + statsmodelsAlexander Hendorf
 
Parallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and GraphParallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and GraphDr Shashikant Athawale
 
Par add shared ifc parameters
Par add shared ifc parametersPar add shared ifc parameters
Par add shared ifc parametersMenno Mekes
 

Mais procurados (20)

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
 
Executing Joins Dynamically in DDBS Query Optimizer
Executing Joins Dynamically in DDBS Query OptimizerExecuting Joins Dynamically in DDBS Query Optimizer
Executing Joins Dynamically in DDBS Query Optimizer
 
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
 
Graph of UK train stations
Graph of UK train stationsGraph of UK train stations
Graph of UK train stations
 
BarnieMAT
BarnieMATBarnieMAT
BarnieMAT
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
 
Reactive Databases for Big Data applications
Reactive Databases for Big Data applicationsReactive Databases for Big Data applications
Reactive Databases for Big Data applications
 
5 Ways to Improve Your LiDAR Workflows
5 Ways to Improve Your LiDAR Workflows5 Ways to Improve Your LiDAR Workflows
5 Ways to Improve Your LiDAR Workflows
 
Network analysis and Geocoding.
Network analysis and Geocoding.Network analysis and Geocoding.
Network analysis and Geocoding.
 
GIS fundamentals - raster
GIS fundamentals - rasterGIS fundamentals - raster
GIS fundamentals - raster
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Sparse inverse covariance estimation
Sparse inverse covariance estimationSparse inverse covariance estimation
Sparse inverse covariance estimation
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - Introduction
 
How Powerful are Graph Networks?
How Powerful are Graph Networks?How Powerful are Graph Networks?
How Powerful are Graph Networks?
 
Time travel and time series analysis with pandas + statsmodels
Time travel and time series analysis with pandas + statsmodelsTime travel and time series analysis with pandas + statsmodels
Time travel and time series analysis with pandas + statsmodels
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
Parallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and GraphParallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and Graph
 
Par add shared ifc parameters
Par add shared ifc parametersPar add shared ifc parameters
Par add shared ifc parameters
 

Destaque

Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsYash Khandelwal
 
Dynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDataWorks Summit
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Yuanyuan Tian
 
Selling Your House Spring-2015
Selling Your House Spring-2015Selling Your House Spring-2015
Selling Your House Spring-2015MICHAEL TESSARO
 
Give your body a nutritious diet
Give your body a nutritious dietGive your body a nutritious diet
Give your body a nutritious dietGM Diet Magic
 
Київська русь
Київська русьКиївська русь
Київська русьsvinchuk
 
El misterio del solitario
El misterio del solitarioEl misterio del solitario
El misterio del solitarioPamela Quirarte
 
Graphs, Edges & Nodes - Untangling the Social Web
Graphs, Edges & Nodes - Untangling the Social WebGraphs, Edges & Nodes - Untangling the Social Web
Graphs, Edges & Nodes - Untangling the Social WebJoël Perras
 
ملخص رسالة ماجستير أحمد المباريدي
ملخص رسالة ماجستير أحمد المباريديملخص رسالة ماجستير أحمد المباريدي
ملخص رسالة ماجستير أحمد المباريديAhmed EL-Mabaredy
 
Apple diseases by Nazia Manzar
Apple diseases by Nazia ManzarApple diseases by Nazia Manzar
Apple diseases by Nazia ManzarNazia Manzar
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Instagramrettino
InstagramrettinoInstagramrettino
Instagramrettinojoeyrettino
 

Destaque (17)

Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Dynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache Giraph
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Sparksee overview
Sparksee overviewSparksee overview
Sparksee overview
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
 
Selling Your House Spring-2015
Selling Your House Spring-2015Selling Your House Spring-2015
Selling Your House Spring-2015
 
Give your body a nutritious diet
Give your body a nutritious dietGive your body a nutritious diet
Give your body a nutritious diet
 
Київська русь
Київська русьКиївська русь
Київська русь
 
El misterio del solitario
El misterio del solitarioEl misterio del solitario
El misterio del solitario
 
1
11
1
 
Graphs, Edges & Nodes - Untangling the Social Web
Graphs, Edges & Nodes - Untangling the Social WebGraphs, Edges & Nodes - Untangling the Social Web
Graphs, Edges & Nodes - Untangling the Social Web
 
5. organ support techniques
5. organ support techniques5. organ support techniques
5. organ support techniques
 
ملخص رسالة ماجستير أحمد المباريدي
ملخص رسالة ماجستير أحمد المباريديملخص رسالة ماجستير أحمد المباريدي
ملخص رسالة ماجستير أحمد المباريدي
 
Apple diseases by Nazia Manzar
Apple diseases by Nazia ManzarApple diseases by Nazia Manzar
Apple diseases by Nazia Manzar
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Instagramrettino
InstagramrettinoInstagramrettino
Instagramrettino
 
2015 SW마에스트로 100+ 컨퍼런스_Hacking IoT
2015 SW마에스트로 100+ 컨퍼런스_Hacking IoT2015 SW마에스트로 100+ 컨퍼런스_Hacking IoT
2015 SW마에스트로 100+ 컨퍼런스_Hacking IoT
 

Semelhante a Benchmarking Tool for Graph Algorithms

Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on HadoopVivian S. Zhang
 
How to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationHow to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationSafe Software
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsChristopherWoodward16
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ArangoDB Database
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...Subhajit Sahu
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsNishant Gandhi
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache SparkLucian Neghina
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022ArangoDB Database
 
How to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR DataHow to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR DataSafe Software
 
Chapter 3 principles of parallel algorithm design
Chapter 3   principles of parallel algorithm designChapter 3   principles of parallel algorithm design
Chapter 3 principles of parallel algorithm designDenisAkbar1
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...dbpublications
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORSMULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORScscpconf
 
Multiple dag applications
Multiple dag applicationsMultiple dag applications
Multiple dag applicationscsandit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 

Semelhante a Benchmarking Tool for Graph Algorithms (20)

Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on Hadoop
 
How to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationHow to Automate CAD & GIS Integration
How to Automate CAD & GIS Integration
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better Recommendations
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
Druid
DruidDruid
Druid
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
How to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR DataHow to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR Data
 
Chapter 3 principles of parallel algorithm design
Chapter 3   principles of parallel algorithm designChapter 3   principles of parallel algorithm design
Chapter 3 principles of parallel algorithm design
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
Pregel
PregelPregel
Pregel
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORSMULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
 
Multiple dag applications
Multiple dag applicationsMultiple dag applications
Multiple dag applications
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
SparkNet presentation
SparkNet presentationSparkNet presentation
SparkNet presentation
 

Último

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Benchmarking Tool for Graph Algorithms

  • 1. Benchmarking Tool for Graph Algorithms
    IIIT-H Cloud Computing - Major Project
    By:
    Abhinaba Sarkar 201405616
    Malavika Reddy 201201193
    Yash Khandelwal 201302164
    Nikita Kad 201330030
  • 2. Description
    ● In computer science and mathematics, graphs are abstract data structures that model structural relationships among objects. They are now widely used for data modeling in application domains where identifying relationship patterns, rules, and anomalies is useful.
    ● These domains include the web graph, social networks, etc. The ever-increasing size of graph-structured data in these applications creates a critical need for scalable systems that can process large amounts of it efficiently.
    ● The project aims to build a benchmarking tool that tests the performance of graph algorithms such as BFS and PageRank on MapReduce, Giraph, GraphLab and Neo4j, and determines which approach works better on which kind of graph.
  • 3. Motivation
    ● Analyze the runtime of different types of graph algorithms on different types of distributed systems.
    ● Performing computation on a graph data structure requires processing at each node.
    ● Each node contains node-specific data as well as links (edges) to other nodes, so a computation must traverse the graph, which takes a huge amount of time on large graphs.
  • 4. Approach
    The BFS/SSSP algorithm is broken into 2 tasks:
    ● Map task: In each Map task, we discover all the neighbors of the nodes currently in the queue (we use the color encoding GRAY for nodes in the queue) and add them to our graph.
    ● Reduce task: In each Reduce task, we set the correct level of the nodes and update the graph.
    The PageRank algorithm is also broken into 2 steps:
    ● Map task: Each page emits its neighbours and its current PageRank.
    ● Reduce task: For each key (page), the new PageRank is calculated from the PageRank values emitted in the Map task.
      ○ PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 ... Tn are the pages that link to A, C(P) is the cardinality (out-degree) of page P, and d is the damping (“random URL”) factor.
    Dijkstra:
    ● Map task: In each of the Map tasks, neighbors are discovered and put into the queue with the color coding GRAY.
    ● Reduce task: In each of the Reduce tasks, we select the nodes according to the shortest distances from the current node.
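The PageRank Map and Reduce steps described above can be sketched in plain Python. This is only an illustrative, single-process sketch and not the project's Hadoop or Giraph code: the function names, the toy graph, and the damping value d = 0.85 are assumptions made for the example.

```python
from collections import defaultdict

def pagerank_map(page, rank, out_links):
    """Map: pass the link list along and emit PR(page)/C(page) to each neighbour."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))

def pagerank_reduce(values, d=0.85):
    """Reduce: PR(A) = (1 - d) + d * (sum of the shares emitted to A)."""
    incoming = sum(value for kind, value in values if kind == "rank")
    return (1 - d) + d * incoming

def pagerank_iteration(graph, ranks, d=0.85):
    """One Map + shuffle + Reduce round over the whole graph."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for page, out_links in graph.items():
        for key, value in pagerank_map(page, ranks[page], out_links):
            grouped[key].append(value)
    return {page: pagerank_reduce(values, d) for page, values in grouped.items()}

# Hypothetical toy graph: page -> list of pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {page: 1.0 for page in graph}
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
print(ranks)
```

Every round touches each node and each edge once, which is why the number of rounds multiplies the total cost.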
  • 5. Approach contd.
    ● Giraph and Hadoop: all the computations are done on a cluster of 2 nodes.
    ● GraphLab: all the computations are performed on a single machine.
  • 6. Applications
    Today's dynamic social graphs (such as those of LinkedIn, Twitter and Facebook) are not feasible to process on a single node. We therefore need to benchmark the runtime of different graph algorithms on distributed systems.
    Example graph: LinkedIn’s social graph
  • 7. Complexity
    ● BFS: The complexity of the standard BFS algorithm is O(V+E), but because of the read/write overhead of distributed computing, it grows to O(E*Depth).
    ● The same holds for Dijkstra’s algorithm, but its number of iterations is higher than for BFS.
    ● PageRank: The complexity of PageRank in a distributed system is (No. of Nodes + No. of Relations) * Iterations.
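To see where the Depth factor comes from, here is a minimal single-process sketch of the color-encoded iterative BFS. It is an assumption-laden illustration rather than the project's implementation: the names, the record layout (links, distance, color), and the toy graph are hypothetical. Each round processes the whole graph once, and the number of rounds grows with the depth reached from the source.

```python
from collections import defaultdict

WHITE, GRAY, BLACK = "WHITE", "GRAY", "BLACK"
DARKNESS = {WHITE: 0, GRAY: 1, BLACK: 2}
INF = float("inf")

def bfs_map(node, record):
    """Map: a GRAY node emits its neighbours as GRAY and turns BLACK."""
    links, dist, color = record
    if color == GRAY:
        for neighbour in links:
            yield neighbour, ([], dist + 1, GRAY)
        color = BLACK
    yield node, (links, dist, color)

def bfs_reduce(records):
    """Reduce: keep the link list, the smallest distance and the darkest color."""
    links, dist, color = [], INF, WHITE
    for rec_links, rec_dist, rec_color in records:
        if rec_links:
            links = rec_links
        dist = min(dist, rec_dist)
        if DARKNESS[rec_color] > DARKNESS[color]:
            color = rec_color
    return links, dist, color

def bfs(graph, source):
    """Run Map/Reduce rounds until no GRAY node is left; return levels and round count."""
    state = {n: (links, 0 if n == source else INF, GRAY if n == source else WHITE)
             for n, links in graph.items()}
    rounds = 0
    while any(color == GRAY for _, _, color in state.values()):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for node, record in state.items():
            for key, value in bfs_map(node, record):
                grouped[key].append(value)
        state = {node: bfs_reduce(records) for node, records in grouped.items()}
        rounds += 1
    return {n: dist for n, (_, dist, _) in state.items()}, rounds

# Hypothetical toy graph of depth 3 from node 1.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
distances, rounds = bfs(graph, 1)
print(distances, rounds)
```

In this sketch a graph of depth 3 needs 4 rounds (one extra round turns the last GRAY node BLACK), and in a real MapReduce job every one of those rounds rereads and rewrites the graph.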
  • 8. Benchmarking - Giraph
    BFS
      Nodes        Time
      1000         4 min 7.836 sec
      1 million    10 min 11.443 sec
    Dijkstra
      Nodes        Time
      1000         3 min 5.655 sec
      1 million    11 min 0.05 sec
    Pagerank
      Nodes        Time
      1000         5 min 12.111 sec
      1 million    16 min 8.652 sec
  • 9. Benchmarking - Graphlab
    Page-Rank
      Nodes        Time
      1000         6.029 sec
      10,000       20.154 sec
      1 million    1 min 11.124 sec
    Dijkstra
      Nodes        Time
      1000         4.852 sec
      10,000       13.029 sec
      1 million    1 min 10.576 sec
  • 10. Benchmarking - Hadoop
    BFS
      Nodes        Time
      1000         4 min 7.836 sec
      1 million    10 min 11.443 sec
    Dijkstra
      Nodes        Time
      1000         3 min 5.655 sec
      1 million    11 min 0.05 sec
    Pagerank
      Nodes        Time
      1000         5 min 12.111 sec
      1 million    16 min 8.652 sec
    BFS and Dijkstra’s runtime depend on the depth of the input graph.
  • 11. Problems we faced
    ● Poor locality of memory access.
    ● Very little work per vertex.
    ● Changing degree of parallelism.
    ● Running over many machines makes the problem worse.
  • 12. Conclusion and Future Work
    ● Although GraphLab is fast, it is constrained by memory: it needs enough memory to hold the edges, and their associated values, of any single vertex in the graph.
    ● The experimental results show that the time taken by the PageRank algorithm is directly proportional to the number of relations in the graph when the number of nodes and iterations is constant. This explains the huge difference in time.
    ● The runtime of BFS is directly proportional to the depth of the graph: the greater the depth, the more iterations are needed and hence the more time is taken.
    Future Work: Taking the input graph from a file adds a huge overhead of reading and writing files in each iteration. If the graph and its properties can be stored in a database, that read/write overhead is avoided and the query time is reduced, so we plan to add a database to the tool.