Indian Institute of Technology, Patna
Large Scale Graph Processing: Neo4j Vs Apache
Giraph Vs Hadoop-MapReduce
(Survey Report)
Nishant M Gandhi
M.Tech. CSE
IIT Patna
Contents
1. Introduction
2. Graph Processing Platforms
a. Hadoop-MapReduce
b. Giraph
c. Neo4j
3. Analysis of Platforms
a. Hadoop-MapReduce
b. Giraph
c. Neo4j
4. Conclusion
5. References
1. Introduction
Today we are living in the era of big data. Everything from social media to scientific
experiments, and from computers to mobile devices, generates huge amounts of data every
day. Storing and processing this data is a challenge in itself. Many real-life problems
can be solved using this data, and many of them can be mapped to graph problems.
Many solutions have been created to process large-scale data. One of the most popular
is [4] Hadoop with its [2] MapReduce programming platform. The lack of a programming
model dedicated to graphs was addressed by Google with [3] Pregel. Pregel uses the Bulk
Synchronous Parallel model for graph processing. An open-source implementation of Pregel
is [1] Giraph. Another platform is [6] Neo4j, which is a graph database platform.
In this document, we will try to understand these three platforms and their pros and cons.
2. Graph Processing Platforms
There are many platforms available for large-scale graph processing. However, we
consider only these three because they offer very different programming models for
processing graphs.
• [6] Neo4j: desktop platform, NoSQL, graph database, version
• [2] Hadoop-MapReduce: cluster platform, generic large-scale data processing
platform
• [1] Giraph: cluster platform, specialized large-scale graph processing platform
a. Hadoop-MapReduce
[2] Hadoop is an open-source platform for storing and processing huge amounts of data.
It has been widely used in data analytics applications. Hadoop's MapReduce programming
model is inspired by the map and reduce functions of functional programming. The model
processes input data as key/value pairs: a map function emits intermediate pairs, which
are grouped by key and passed to a reduce function.
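The key/value flow described above can be illustrated with a minimal sketch in plain Python. It is not Hadoop code: the function names (map_edge, reduce_degree, run_map_reduce) and the tiny edge list are assumptions made for illustration, and the in-memory grouping stands in for Hadoop's shuffle phase. The example computes the out-degree of each vertex from a list of edges.

```python
from collections import defaultdict

def map_edge(edge):
    """Map: emit one (key, value) pair per input record (here, per edge)."""
    source, _target = edge
    yield source, 1

def reduce_degree(vertex, counts):
    """Reduce: combine all values that share the same key."""
    return vertex, sum(counts)

def run_map_reduce(records, mapper, reducer):
    # Shuffle step (hypothetical, in-memory): group intermediate values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

# Hypothetical input: a small directed graph given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
out_degree = run_map_reduce(edges, map_edge, reduce_degree)
```

Vertex "c" has no outgoing edge, so no pair is emitted for it and it does not appear in the result. This shows how the model only sees independent records, not the graph as a whole.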
Data used by [4] Hadoop is stored in the Hadoop Distributed File System (HDFS). HDFS is
not part of the MapReduce engine itself, but the platform relies on it for storage.
Datasets stored in HDFS are divided into N blocks of similar size. Each of these blocks
is used as the input for a Mapper.
[8] Hadoop’s programming model has low performance and high resource consumption
for iterative graph algorithms, because such algorithms require multiple map-reduce
cycles. For example, for iterative graph traversal algorithms Hadoop often needs to
store and load the entire graph structure during each iteration, to transfer data
between the map and reduce processes through the disk-intensive HDFS, and to run
convergence checking as an additional job.
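To make the cost of iteration concrete, the following is a minimal sketch of breadth-first distance computation written as repeated map-reduce rounds. It is plain Python, not Hadoop code, and all names (bfs_map, bfs_reduce, bfs_round, bfs) and the sample graph are assumptions for illustration. It shows the two overheads named above: every round re-emits the whole adjacency structure, and convergence is detected by a separate comparison after each round.

```python
from collections import defaultdict

INF = float("inf")

def bfs_map(vertex, record):
    distance, neighbours = record
    # The adjacency list is re-emitted every round; otherwise the graph
    # structure would be lost between iterations.
    yield vertex, ("graph", neighbours)
    yield vertex, ("dist", distance)
    if distance != INF:
        for neighbour in neighbours:
            yield neighbour, ("dist", distance + 1)

def bfs_reduce(vertex, values):
    neighbours, best = [], INF
    for kind, payload in values:
        if kind == "graph":
            neighbours = payload
        else:
            best = min(best, payload)
    return vertex, (best, neighbours)

def bfs_round(state):
    # One full map-reduce cycle over the entire graph.
    groups = defaultdict(list)
    for vertex, record in state.items():
        for key, value in bfs_map(vertex, record):
            groups[key].append(value)
    return dict(bfs_reduce(v, vals) for v, vals in groups.items())

def bfs(adjacency, source):
    state = {v: (0 if v == source else INF, ns) for v, ns in adjacency.items()}
    rounds = 0
    while True:
        new_state = bfs_round(state)
        rounds += 1
        if new_state == state:  # convergence check, a separate step per round
            return {v: d for v, (d, _) in new_state.items()}, rounds
        state = new_state

# Hypothetical input graph as adjacency lists.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
distances, rounds = bfs(graph, "a")
```

On this four-vertex graph the distances settle after two rounds, and a third round is needed only to confirm that nothing changed.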
b. Giraph
[1] Giraph is an open-source, graph-specific distributed platform. Giraph uses the
Pregel programming model, a vertex-centric programming abstraction that adapts the
Bulk Synchronous Parallel (BSP) model. A BSP computation proceeds in a series of
global Supersteps. Within each Superstep, active vertices execute the same
user-defined compute function and create and deliver inter-vertex messages. Barriers
ensure synchronization between vertex computations. The computation terminates once
there are no messages to process and all vertices have voted to halt.
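The superstep cycle described above can be sketched in a few lines of plain Python. This is not the Giraph API: the names compute and run_bsp, the sample graph, and the choice of algorithm (propagating the maximum vertex value) are assumptions for illustration. Each vertex implicitly votes to halt after every Superstep and is reactivated only when a message arrives.

```python
def compute(value, messages, superstep):
    """User-defined vertex function: returns (new_value, value_to_send or None)."""
    new_value = max([value, *messages])
    if superstep == 0 or new_value != value:
        return new_value, new_value   # value changed (or first step): notify neighbours
    return new_value, None            # nothing new: stay silent and halt

def run_bsp(values, edges):
    values = dict(values)
    inbox = {v: [] for v in values}
    active = set(values)              # every vertex is active in Superstep 0
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in active:
            values[v], outgoing = compute(values[v], inbox[v], superstep)
            if outgoing is not None:
                for neighbour in edges[v]:
                    outbox[neighbour].append(outgoing)
        # Barrier: messages become visible only after all vertices have finished.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values, superstep

# Hypothetical input: a chain of four vertices with edges in both directions.
initial = {1: 3, 2: 6, 3: 2, 4: 1}
links = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
final_values, supersteps = run_bsp(initial, links)
```

Only vertices that received a message run in a given Superstep, so the amount of work shrinks as the values stabilise. The run ends when a Superstep produces no messages at all.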
[8] Giraph builds on Hadoop’s design, from which it uses only the Map phase. The
single biggest difference between Hadoop and Giraph is that Giraph keeps its
computation in memory, which speeds up job execution. For fault tolerance, Giraph
uses periodic checkpoints. To coordinate Superstep execution, it uses [5] ZooKeeper.
c. Neo4j
Neo4j is a popular open-source NoSQL graph database implemented in Java.
Neo4j stores data in graphs rather than in tables. Every graph stored in Neo4j consists
of vertices and relationships annotated with properties. Neo4j can execute
graph-processing algorithms efficiently on a single machine because of optimization
techniques that favor response time. [8] Neo4j uses a two-level, main-memory caching
mechanism to improve its performance. The file buffer caches the storage file data in
the same format as it is stored on the durable storage media. The object buffer caches
vertices and relationships in a format that is optimized for high traversal speeds and
transactional writes.
Neo4j processes graphs by traversing all vertices, using either the BFS or the DFS
traversal algorithm. To start a graph traversal, a program has to define a special
reference vertex. This vertex is not part of the original graph; it is an additional
artificial vertex that is added to the graph structure and acts as the starting point
of the traversal. All graph operations are performed as ACID transactions.
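The difference between the two traversal orders can be seen in a short sketch in plain Python. It does not use the Neo4j API and does not model properties, the reference vertex, or transactions; the function name traverse and the sample graph are assumptions for illustration only.

```python
from collections import deque

def traverse(adjacency, start, order="bfs"):
    """Visit every vertex reachable from start in BFS or DFS order."""
    frontier = deque([start])
    seen = set()
    visited = []
    while frontier:
        # BFS takes the oldest entry (queue), DFS the newest (stack).
        vertex = frontier.popleft() if order == "bfs" else frontier.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        visited.append(vertex)
        neighbours = adjacency.get(vertex, [])
        # For DFS, push in reverse so the first-listed neighbour is explored first.
        frontier.extend(neighbours if order == "bfs" else reversed(neighbours))
    return visited

# Hypothetical input graph as adjacency lists.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
bfs_order = traverse(graph, "a", "bfs")
dfs_order = traverse(graph, "a", "dfs")
```

Starting from "a", the BFS order visits both direct neighbours before "d", while the DFS order follows the path through "b" to "d" before returning to "c".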
3. Analysis of Platforms
The performance of these platforms has been analysed several times; this section draws
on two sources and their results. The first is the M.Sc. thesis of Marcin Biczak (Delft
University of Technology, 2013) [7]. The second is the report titled [8] “How well do
graph-processing platforms perform?” Some important findings from these two sources are
listed below.
[8] The performance of all platforms is stable, with the largest variance around 10%.
a. Hadoop-MapReduce
i. [7] Hadoop-MapReduce performs worse than the other platforms on any graph
algorithm.
ii. [7] Multi-iteration algorithms suffer from additional performance
penalties.
b. Giraph
i. [8] Giraph processes graphs in memory and implements a dynamic computation
mechanism by which only selected vertices are processed in each
iteration of an algorithm. This reduces computation time.
ii. [7] For large numbers of messages or big datasets, Giraph can
crash due to lack of memory.
c. Neo4j
i. [8] Limited by the resources of a single machine, the performance of Neo4j
becomes significantly worse when the graph exceeds the memory
capacity.
ii. [7] Neo4j was designed as a single-machine database. To scale across multiple
machines, users of Neo4j have to implement the communication between these
machines themselves, as well as manage partitioning, consistency, etc. This requires
a significant amount of additional work besides the application
implementation.
iii. [7] The two-level cache allows Neo4j to achieve excellent hot-cache execution
times, especially when the graph data accessed by the algorithm fits in the cache.
iv. [8] The data ingestion time of Neo4j closely matches the characteristics of
the graph. Overall, data ingestion takes much longer for Neo4j than for
HDFS.
4. Conclusion
Based on this survey, we can reach the following conclusions.
Modern computers can handle most smaller or sparser graph datasets. However,
once the dataset grows significantly or the graph is dense, the execution time
increases significantly. For this reason, single-machine graph processing platforms
cannot compete with distributed systems.
We have considered two graph-processing frameworks (Giraph, Neo4j) and a generic
data-processing platform (Hadoop). The platforms that focus on processing graph
datasets achieve significant performance advantages over generic platforms in most
cases. Hadoop does not maintain the relations between data and treats every vertex
as disjoint, whereas the other platforms have to maintain them and pay a performance
penalty for it. Thus, for certain datasets Hadoop can achieve better performance than
the graph-processing platforms.
There are two significant factors for large-scale graph-processing platforms: the
programming model and the platform design. Pregel performs much better than
MapReduce for iterative algorithms. Giraph is limited by its in-memory computation,
a limitation that Hadoop-MapReduce does not share. [7] Neo4j achieves
good performance for smaller or sparser datasets on a single system. It has very good
documentation and is hence easy to learn. [7] Hadoop-MapReduce is considered the slowest
of all evaluated platforms, but Neo4j’s performance for large or dense
datasets is lower than that of Hadoop-MapReduce. [7] Giraph, which
represents distributed large-scale graph processing platforms, was the fastest platform
in all the experiments made by Marcin Biczak.
5. References
1. Apache Software Foundation, “Giraph.” http://giraph.apache.org
2. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,”
Comm. ACM, vol. 51, no. 1, 2008, pp. 107–112.
3. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and
G. Czajkowski, “Pregel: a system for large-scale graph processing - "abstract",” in
Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC '09,
pages 6-6, New York, NY, USA, 2009. ACM.
4. Apache Software Foundation, “Hadoop.” Website, 2011. http://hadoop.apache.org
5. Apache Software Foundation, “ZooKeeper.” Website, 2010. http://zookeeper.apache.org
6. Neo Technology, http://www.neo4j.org
7. M. Biczak, “LudoGraph: a Sampling Capable Cloud-Based System for Large-Scale Graph
Processing Based on the Pregel programming model,” Master of Science Thesis, Delft
University of Technology, 2013.
8. Y. Guo, M. Biczak, A. Varbanescu, A. Iosup, C. Martella, and T. Willke, “How well do
graph-processing platforms perform? An empirical performance evaluation and analysis:
Extended report,” tech. rep., Delft University of Technology, 2013.

Mais conteúdo relacionado

Mais procurados

GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic ParrotsKonstantin Savenkov
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesDataStax
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Neo4j
 
ChatGPT Prompt Tips
ChatGPT Prompt TipsChatGPT Prompt Tips
ChatGPT Prompt TipsJohn Allan
 
GraphQL ♥︎ GraphDB
GraphQL ♥︎ GraphDBGraphQL ♥︎ GraphDB
GraphQL ♥︎ GraphDBGraphRM
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 
Neo4j: What's Under the Hood
Neo4j: What's Under the HoodNeo4j: What's Under the Hood
Neo4j: What's Under the HoodNeo4j
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableRebekah Rodriguez
 
Bigdata - Leandro Wanderley
Bigdata - Leandro WanderleyBigdata - Leandro Wanderley
Bigdata - Leandro WanderleyLeandro Couto
 
The Knowledge Graph Explosion
The Knowledge Graph ExplosionThe Knowledge Graph Explosion
The Knowledge Graph ExplosionNeo4j
 
Easily Identify Sources of Supply Chain Gridlock
Easily Identify Sources of Supply Chain GridlockEasily Identify Sources of Supply Chain Gridlock
Easily Identify Sources of Supply Chain GridlockNeo4j
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Graphes de connaissances avec Neo4j
Graphes de connaissances avec Neo4j Graphes de connaissances avec Neo4j
Graphes de connaissances avec Neo4j Neo4j
 
Neanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeo4j
 
AI Prompt Engineering 101.pdf
AI Prompt Engineering 101.pdfAI Prompt Engineering 101.pdf
AI Prompt Engineering 101.pdfAListDaily
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AIOzgurOscarOzkan
 
ELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it mattersELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it mattersMatillion
 
Graph Thinking: Why it Matters
Graph Thinking: Why it MattersGraph Thinking: Why it Matters
Graph Thinking: Why it MattersNeo4j
 
PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform Chris Travers
 

Mais procurados (20)

GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic Parrots
 
ChatGPT.pptx
ChatGPT.pptxChatGPT.pptx
ChatGPT.pptx
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
ChatGPT Prompt Tips
ChatGPT Prompt TipsChatGPT Prompt Tips
ChatGPT Prompt Tips
 
GraphQL ♥︎ GraphDB
GraphQL ♥︎ GraphDBGraphQL ♥︎ GraphDB
GraphQL ♥︎ GraphDB
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Neo4j: What's Under the Hood
Neo4j: What's Under the HoodNeo4j: What's Under the Hood
Neo4j: What's Under the Hood
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
 
Bigdata - Leandro Wanderley
Bigdata - Leandro WanderleyBigdata - Leandro Wanderley
Bigdata - Leandro Wanderley
 
The Knowledge Graph Explosion
The Knowledge Graph ExplosionThe Knowledge Graph Explosion
The Knowledge Graph Explosion
 
Easily Identify Sources of Supply Chain Gridlock
Easily Identify Sources of Supply Chain GridlockEasily Identify Sources of Supply Chain Gridlock
Easily Identify Sources of Supply Chain Gridlock
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Graphes de connaissances avec Neo4j
Graphes de connaissances avec Neo4j Graphes de connaissances avec Neo4j
Graphes de connaissances avec Neo4j
 
Neanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeanex - Semantic Construction with Graphs
Neanex - Semantic Construction with Graphs
 
AI Prompt Engineering 101.pdf
AI Prompt Engineering 101.pdfAI Prompt Engineering 101.pdf
AI Prompt Engineering 101.pdf
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AI
 
ELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it mattersELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it matters
 
Graph Thinking: Why it Matters
Graph Thinking: Why it MattersGraph Thinking: Why it Matters
Graph Thinking: Why it Matters
 
PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform PostgreSQL as a Big Data Platform
PostgreSQL as a Big Data Platform
 

Destaque

Neo4j Spatial - Backing a GIS with a true graph database
Neo4j Spatial - Backing a GIS with a true graph databaseNeo4j Spatial - Backing a GIS with a true graph database
Neo4j Spatial - Backing a GIS with a true graph databaseCraig Taverner
 
Computer services
Computer servicesComputer services
Computer servicesArz Sy
 
Sql injection exposed proof of concept
Sql injection exposed  proof of conceptSql injection exposed  proof of concept
Sql injection exposed proof of conceptlaila wulandari
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphDataWorks Summit
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched Networks
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched NetworksModul Free One Day Workshop Implementing Cisco IP Routing and Switched Networks
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched NetworksI Putu Hariyadi
 
Packet tracer (network simulation)
Packet tracer (network simulation)Packet tracer (network simulation)
Packet tracer (network simulation)Aldi Nor Fahrudin
 
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0I Putu Hariyadi
 
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...Using packet-tracer, capture and other Cisco ASA tools for network troublesho...
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...Cisco Russia
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016I Putu Hariyadi
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guideNishant Gandhi
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseChris Clarke
 
Tutorial cisco packet tracer lengkap
Tutorial cisco packet tracer lengkapTutorial cisco packet tracer lengkap
Tutorial cisco packet tracer lengkaplaila wulandari
 
Cisco Packet Tracer Overview
Cisco Packet Tracer OverviewCisco Packet Tracer Overview
Cisco Packet Tracer OverviewAli Usman
 

Destaque (20)

Neo4j Spatial - Backing a GIS with a true graph database
Neo4j Spatial - Backing a GIS with a true graph databaseNeo4j Spatial - Backing a GIS with a true graph database
Neo4j Spatial - Backing a GIS with a true graph database
 
Computer services
Computer servicesComputer services
Computer services
 
Tutorial debian
Tutorial debianTutorial debian
Tutorial debian
 
Sql injection exposed proof of concept
Sql injection exposed  proof of conceptSql injection exposed  proof of concept
Sql injection exposed proof of concept
 
Sekolah Impianku
Sekolah ImpiankuSekolah Impianku
Sekolah Impianku
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache Giraph
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Networking
NetworkingNetworking
Networking
 
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched Networks
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched NetworksModul Free One Day Workshop Implementing Cisco IP Routing and Switched Networks
Modul Free One Day Workshop Implementing Cisco IP Routing and Switched Networks
 
Packet tracer (network simulation)
Packet tracer (network simulation)Packet tracer (network simulation)
Packet tracer (network simulation)
 
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0
Modul Praktikum Sistem Keamanan Jaringan STMIK Bumigora Versi 1.0
 
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...Using packet-tracer, capture and other Cisco ASA tools for network troublesho...
Using packet-tracer, capture and other Cisco ASA tools for network troublesho...
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016
Pembahasan Cisco Packet Tracer Challenge LKS SMK Provinsi NTB 2016
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guide
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
Tutorial cisco packet tracer lengkap
Tutorial cisco packet tracer lengkapTutorial cisco packet tracer lengkap
Tutorial cisco packet tracer lengkap
 
Cisco Packet Tracer Overview
Cisco Packet Tracer OverviewCisco Packet Tracer Overview
Cisco Packet Tracer Overview
 

Semelhante a Neo4j vs giraph

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415SANTOSH WAYAL
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System cscpconf
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce cscpconf
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONijcsit
 
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...AM Publications
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combinersijcsit
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 

Semelhante a Neo4j vs giraph (20)

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
43_Sameer_Kumar_Das2
43_Sameer_Kumar_Das243_Sameer_Kumar_Das2
43_Sameer_Kumar_Das2
 
B017320612
B017320612B017320612
B017320612
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Neo4j vs giraph

  • 1. Indian Institute of Technology, Patna Large Scale Graph Processing: Neo4j Vs Apache Giraph Vs Hadoop-MapReduce (Survey Report) Nishant M Gandhi M.Tech. CSE IIT Patna
  • 2. Contents
    1. Introduction (page 3)
    2. Graph Processing Platforms (page 3)
       a. Hadoop-MapReduce (page 3)
       b. Giraph (page 4)
       c. Neo4j (page 4)
    3. Analysis of Platforms (page 5)
       a. Hadoop-MapReduce (page 5)
       b. Giraph (page 5)
       c. Neo4j (page 5)
    4. Conclusion (page 6)
    5. References (page 7)
  • 3. 1. Introduction
    Today we are living in the era of big data. Sources ranging from social media to scientific experiments, and from computers to mobile devices, generate huge amounts of data every day. Storing and processing this data is a challenge in itself. Many real-life problems can be solved using this data, and many of them can be mapped to graph problems. Many solutions have been created to process large-scale data. One of the most popular is [4] Hadoop with its [2] MapReduce programming platform. The lack of a programming model dedicated to graphs was addressed by Google with [3] Pregel, which uses the Bulk Synchronous Parallel model for graph processing. The open-source implementation of the Pregel model is [1] Giraph. Another platform is [6] Neo4j, a graph database processing platform. In this document, we try to understand these three platforms and their pros and cons.
    2. Graph Processing Platforms
    There are many platforms available for large-scale graph processing. However, we consider only these three because they use very different programming models to process graphs.
    - [6] Neo4J: desktop platform, NoSQL, graph database, version
    - [2] Hadoop-MapReduce: cluster platform, generic large-scale data processing platform
    - [1] Giraph: cluster platform, specialized large-scale graph processing platform
    a. Hadoop-MapReduce
    [2] Hadoop is an open-source platform for storing and processing huge amounts of data. It has been used widely in data analytics applications. It uses the MapReduce programming model, which is inspired by the Map and Reduce functions of functional programming. The MapReduce programming model processes input data as key/value pairs and groups it by key. Data used by [4] Hadoop is stored in the Hadoop Distributed File System (HDFS). HDFS is a separate component from the MapReduce engine, although a standard Hadoop deployment relies on it for storage.
    Datasets stored in HDFS are divided into N blocks of similar size. Each of these blocks is used as the input for a Mapper.
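    The map and reduce phases described above can be sketched in a few lines of plain Python. This is only an in-process illustration, not Hadoop code: the function names (split_into_blocks, mapper, shuffle, reducer, vertex_degrees) and the sample edge list are assumptions made for this example. The input is cut into blocks of similar size, each block is handed to a mapper that emits key/value pairs, and the pairs are grouped by key before being reduced.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(records, n_blocks):
    """Divide the input into about n_blocks chunks of similar size (stand-in for HDFS blocks)."""
    size = max(1, -(-len(records) // n_blocks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def mapper(block):
    """Emit a (vertex, 1) pair for each endpoint of every edge in one block."""
    for src, dst in block:
        yield src, 1
        yield dst, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts for one vertex to obtain its degree."""
    return key, sum(values)


def vertex_degrees(edges, n_blocks=2):
    """Run one map-shuffle-reduce cycle over an edge list."""
    blocks = split_into_blocks(edges, n_blocks)
    mapped = chain.from_iterable(mapper(block) for block in blocks)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = vertex_degrees(edges)
print(degrees)
```

    A single cycle like this is enough for a degree count. An iterative graph algorithm would have to repeat the whole cycle, re-reading the graph each time.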
  • 4. [8] Hadoop's programming model has low performance and high resource consumption for iterative graph algorithms, because it requires multiple map-reduce cycles. For example, for iterative graph traversal algorithms Hadoop would often need to store and load the entire graph structure during each iteration, to transfer data between the map and reduce processes through the disk-intensive HDFS, and to run a convergence check as an additional job in every iteration.
    b. Giraph
    [1] Giraph is an open-source, graph-specific distributed platform. Giraph uses the Pregel programming model, a vertex-centric programming abstraction that adapts the Bulk Synchronous Parallel (BSP) model. A BSP computation proceeds in a series of global supersteps. Within each superstep, active vertices execute the same user-defined compute function and create and deliver inter-vertex messages. Barriers ensure synchronization between vertex computations. The computation terminates once there are no messages to process and all vertices have voted to halt. [8] Giraph builds on the design of Hadoop, from which it uses only the Map phase. The single biggest difference between Hadoop and Giraph is that Giraph works in memory, which speeds up job execution. For fault tolerance, Giraph uses periodic checkpoints. To coordinate superstep execution, it uses [5] ZooKeeper.
    c. Neo4j
    Neo4j is a popular open-source NoSQL graph database implemented in Java. Neo4j stores data in graphs rather than in tables. Every graph stored in Neo4j consists of relationships and vertices annotated with properties. Neo4j can execute graph-processing algorithms efficiently on just one machine, because of optimization techniques that favor response time. [8] Neo4j uses a two-level, main-memory caching mechanism to improve its performance. The file buffer caches the storage file data in the same format as it is stored on the durable storage media.
    The object buffer caches vertices and relationships in a format that is optimized for high traversal speeds and transactional writes. Neo4j processes graphs by traversing all vertices, using either the BFS or the DFS traversal algorithm. To start a graph traversal, a program has to define a special reference vertex. This vertex is not part of the original graph; it is an additional artificial vertex that is added to the graph structure and acts as the starting point of the traversal. All graph operations are performed as ACID transactions.
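    The superstep model described for Giraph above can be sketched as a small single-process simulation. This is not the Giraph API: the function run_bsp_max, the sample graph and the choice of algorithm (propagating the maximum vertex value, a common textbook illustration of the Pregel model) are assumptions made for this example. Each active vertex runs the same compute step, messages are delivered only at the barrier between supersteps, and a vertex that learns nothing new sends nothing, which plays the role of voting to halt.

```python
def run_bsp_max(graph, values):
    """Propagate the maximum value through a graph using BSP-style supersteps.

    graph:  dict mapping each vertex to a list of neighbouring vertices
    values: dict mapping each vertex to its initial integer value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)  # every vertex is active in superstep 0
    superstep = 0

    while active:
        outbox = {v: [] for v in graph}
        for v in active:
            # Compute step: the same logic runs on every active vertex.
            new_value = max([values[v]] + inbox[v])
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                for neighbour in graph[v]:
                    outbox[neighbour].append(new_value)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.

        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        # A halted vertex is reactivated only if it received a message.
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1

    return values, superstep


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
initial = {"a": 3, "b": 6, "c": 2, "d": 1}
final_values, supersteps = run_bsp_max(graph, initial)
print(final_values, supersteps)
```

    The loop ends when no messages are in flight and no vertex is active. Only vertices that received messages are processed in later supersteps, so the amount of work shrinks as the computation converges.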
  • 5. 3. Analysis of Platforms
    The performance of these platforms has been analyzed several times. This section draws on two sources and their results. The first is the M.S. thesis of Marcin Biczak (2013) from Delft University of Technology [7]. The second is the report titled [8] "How well do graph-processing platforms perform?" The important findings from these two sources are listed below. [8] Overall, the performance of all platforms is stable, with the largest variance around 10%.
    a. Hadoop-MapReduce
       i. [7] Hadoop-MapReduce performs worst among the platforms on any graph algorithm.
       ii. [7] Multi-iteration algorithms suffer from additional performance penalties.
    b. Giraph
       i. [8] Giraph processes graphs in memory and uses a dynamic computation mechanism by which only selected vertices are processed in each iteration of an algorithm. This reduces computation time.
       ii. [7] With large numbers of messages or big datasets, Giraph can crash due to lack of memory.
    c. Neo4j
       i. [8] Limited by the resources of a single machine, the performance of Neo4j becomes significantly worse when the graph exceeds the memory capacity.
       ii. [7] Neo4j was designed as a single-machine database. To scale across multiple machines, users of Neo4j have to implement communication between the machines themselves and manage partitioning, consistency, etc. This requires a significant amount of additional work beyond implementing the application.
       iii. [7] The two-level cache allows Neo4j to achieve excellent hot-cache execution times, especially when the graph data accessed by the algorithm fits in the cache.
       iv. [8] The data ingestion time of Neo4j closely follows the characteristics of the graph. Overall, data ingestion takes much longer for Neo4j than for HDFS.
  • 6. 4. Conclusion
    Based on this survey, we can reach the following conclusions. Modern computers can handle most smaller or sparser graph databases. However, once the dataset size increases significantly, or if the graph is dense, the execution time increases significantly. For this reason, single-machine graph processing platforms cannot compete with distributed systems.
    We have considered two graph-processing frameworks (Giraph, Neo4j) and a generic data-processing platform (Hadoop). The platforms that focus on processing graph datasets achieve significant performance advantages over generic platforms in most cases. Hadoop does not maintain the relations between data and treats every vertex as disjoint, whereas the other platforms must maintain these relations and pay a performance penalty for doing so. Thus, for certain datasets Hadoop can achieve better performance than the graph-processing platforms.
    Two factors are significant for large-scale graph-processing platforms: the programming model and the platform design. Pregel performs much better than MapReduce for iterative algorithms. Giraph is limited by its in-memory computation, a limitation that Hadoop-MapReduce does not share. [7] Neo4j achieves good performance for smaller or sparser datasets on a single system. It has very good documentation and is hence easy to learn. [7] Hadoop-MapReduce is considered the slowest of the evaluated platforms, but Neo4j's performance on large or dense datasets is lower than that of Hadoop-MapReduce. [7] Giraph, which represents distributed large-scale graph processing platforms, was the fastest platform in all the experiments conducted by Marcin Biczak.
  • 7. 5. References
    1. Apache Software Foundation, "Giraph." http://giraph.apache.org
    2. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Comm. ACM, vol. 51, no. 1, 2008, pp. 107–112.
    3. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: a system for large-scale graph processing" (abstract), in Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC '09, pages 6-6, New York, NY, USA, 2009. ACM.
    4. Apache Software Foundation, "Hadoop." Website, 2011. http://hadoop.apache.org
    5. Apache Software Foundation, "Zookeeper." Website, 2010. http://zookeeper.apache.org
    6. Neo Technology, http://www.neo4j.org
    7. M. Biczak, "LudoGraph: a Sampling Capable Cloud-Based System for Large-Scale Graph Processing Based on the Pregel programming model," Master of Science thesis, Delft University of Technology, 2013.
    8. Y. Guo, M. Biczak, A. Varbanescu, A. Iosup, C. Martella, and T. Willke, "How well do graph-processing platforms perform? An empirical performance evaluation and analysis: Extended report," tech. rep., Delft University of Technology, 2013.