Apache Giraph: A Distributed Graph-Processing Library
Ahmet Emre Aladağ - AGMLab
26.08.2013
What is Giraph?
● Library for large-scale graph processing.
● Runs on Apache Hadoop as map-only jobs.
● Follows the Bulk Synchronous Parallel (BSP) model.
[Figure: a vertex computation with incoming and outgoing messages (example values 0.2, 0.53, 0.32, 0.16, 0.12, 0.34).]
Uses
● PageRank-variant iterative algorithms
● Graph clustering
○ Label propagation
○ Max Clique
○ Triangle Closure
○ Finding related people, groups, interests.
● Shortest-Path
○ Single source, s-t, all to all
● Finding Connected Components
Alternatives
● Map-Reduce jobs on Hadoop
○ Not a good fit for graph algorithms: overhead.
● Google Pregel
○ Requires its own infrastructure
○ Not available
○ Master is single point of failure.
● Message Passing Interface (MPI)
○ Not fault-tolerant
○ Too generic
How Giraph differs
● Runs on an existing Hadoop cluster; no special infrastructure is needed.
● Easy deployment with Amazon EMR
● Dynamic resource management
● Graph-oriented API
● Open source
● Fault tolerant: no single point of failure (SPOF) except the Hadoop NameNode and JobTracker
● Jython support
Layers
[Figure: Giraph layers diagram.]
Mechanism
Input → InputFormat/Reader → Computation → OutputFormat/Writer → Output
Input sources:
● Accumulo
● HBase
● HCatalog
● HDFS
● Hive
● Neo4j etc.
Output targets:
● Accumulo
● HBase
● HCatalog
● HDFS
● Hive
● Neo4j etc.
● GraphViz
Formats: adjacency matrix, id-value pairs, JSON
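InputFormat
● VertexInputFormat (one id;value pair per line)
1;3.4
2;6.1
3;2.7
● EdgeInputFormat (one source;target pair per line)
1;2
2;3
1;3
[Figure: the resulting graph, with vertices 1, 2, 3 holding the values 3.4, 6.1, 2.7.]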
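The two input formats above can be read with a few lines of code. The following is a minimal sketch in plain Python, not Giraph's Java InputFormat API; the function names `parse_vertices` and `parse_edges` are hypothetical, and treating each edge line as a directed source;target pair is an assumption.

```python
def parse_vertices(lines):
    """Parse 'id;value' lines into {vertex_id: value}."""
    vertices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vertex_id, value = line.split(";")
        vertices[int(vertex_id)] = float(value)
    return vertices


def parse_edges(lines):
    """Parse 'source;target' lines into {source: [targets]} (assumed directed)."""
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source, target = line.split(";")
        adjacency.setdefault(int(source), []).append(int(target))
    return adjacency


# The sample data shown on the slide.
vertices = parse_vertices(["1;3.4", "2;6.1", "3;2.7"])
edges = parse_edges(["1;2", "2;3", "1;3"])
```

Keeping vertex values and edges in separate inputs mirrors the split between VertexInputFormat and EdgeInputFormat: either one can come from a different source.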
Computation
● Supersteps are separated by barriers.
● Send messages to, and receive messages from, neighbors.
● Update the vertex value.
● Vote to halt; a halted vertex wakes up when it receives a message.
Single-Source Shortest Path Example
Shortest-Path Computation Code (old API)
[Code listing not preserved in this transcript.]
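As a stand-in for the idea, and not the slide's Giraph code, here is a minimal single-process sketch of vertex-centric single-source shortest paths in plain Python. The function name `sssp`, the graph layout, and the edge weights are all hypothetical. Each loop iteration plays the role of one superstep: a vertex that receives a shorter distance updates its value and messages its neighbors, otherwise it stays halted.

```python
import math


def sssp(edges, source):
    """Pregel-style shortest paths. edges: {vertex: [(neighbor, weight), ...]}."""
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(target for target, _ in targets)
    dist = {v: math.inf for v in vertices}

    # Superstep 0: only the source has a message (distance 0 to itself).
    inbox = {source: [0.0]}
    supersteps = 0
    while inbox:  # no messages in flight means every vertex has voted to halt
        outbox = {}
        for v, messages in inbox.items():  # a message wakes a halted vertex
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                for neighbor, weight in edges.get(v, []):
                    outbox.setdefault(neighbor, []).append(best + weight)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # barrier: messages are delivered in the next superstep
        supersteps += 1
    return dist, supersteps


# Hypothetical weighted graph: 1 -> 2 (1.0), 1 -> 3 (4.0), 2 -> 3 (2.0).
graph = {1: [(2, 1.0), (3, 4.0)], 2: [(3, 2.0)], 3: []}
distances, steps = sssp(graph, source=1)
```

In this example vertex 3 first learns the distance 4.0 directly from vertex 1 and then improves it to 3.0 one superstep later via vertex 2, which is why the computation needs several supersteps even on a three-vertex graph.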
Ex: Finding the maximum value
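The maximum-value example follows the same superstep pattern. Below is a minimal sketch in plain Python under assumed data; the vertex values, the chain-shaped graph, and the name `propagate_max` are hypothetical and not taken from the slide's figure. Every vertex first sends its value to its neighbors; afterwards a vertex only sends again when a received value is larger than its own.

```python
def propagate_max(values, neighbors):
    """Spread the largest vertex value through an undirected graph."""
    values = dict(values)

    # Superstep 0: every vertex is active and sends its value to its neighbors.
    inbox = {}
    for v, value in values.items():
        for n in neighbors[v]:
            inbox.setdefault(n, []).append(value)

    while inbox:  # stop once no messages are in flight
        outbox = {}
        for v, messages in inbox.items():
            largest = max(messages)
            if largest > values[v]:
                values[v] = largest  # learned a bigger value: update and forward
                for n in neighbors[v]:
                    outbox.setdefault(n, []).append(largest)
            # Otherwise the vertex votes to halt and sends nothing.
        inbox = outbox  # barrier between supersteps
    return values


# Hypothetical chain 1 - 2 - 3 - 4 with arbitrary starting values.
start_values = {1: 3.0, 2: 6.0, 3: 2.0, 4: 1.0}
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = propagate_max(start_values, chain)
```

The computation terminates on its own: once every vertex holds the maximum, no vertex sends a message and all of them remain halted.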
Aggregators
● Shared variables among the workers.
● Each vertex computation can contribute a value to an aggregator (for example, by adding or multiplying).
● Examples:
○ Holding the min/max value among all vertices.
○ Holding the sum of the vertex values.
○ Holding the average of the vertex values.
○ Holding the sum of mean squared errors and the standard deviation.
[Figure: computation at iteration k. Vertices 1, 2 and 3 hold the values 0.2, 0.6 and 0.45; the aggregated value is 1.25, their sum.]
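An aggregator can be thought of as a shared value plus a combining function. The sketch below is plain Python, not Giraph's aggregator API; the class name `Aggregator` and its `aggregate` method are hypothetical. It uses the three vertex values from the figure (0.2, 0.6, 0.45) to build a sum and a max aggregator.

```python
import math


class Aggregator:
    """A shared value that vertices fold their contributions into."""

    def __init__(self, initial, combine):
        self.value = initial
        self.combine = combine  # must be commutative and associative

    def aggregate(self, contribution):
        self.value = self.combine(self.value, contribution)


vertex_values = {1: 0.2, 2: 0.6, 3: 0.45}

total = Aggregator(0.0, lambda a, b: a + b)
largest = Aggregator(-math.inf, max)

# Each vertex computation contributes its own value during the superstep.
for value in vertex_values.values():
    total.aggregate(value)
    largest.aggregate(value)
```

Requiring a commutative and associative combining function is what lets contributions arrive from different workers in any order.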
MasterCompute Class
● The master's compute() always runs before the workers' vertex computation in each superstep (like a pre-superstep hook).
○ In the vertex compute(): aggregate the vertex values (sum of values).
○ In MasterCompute: average = sum / N.
● Aggregators are registered here.
● You can set values to aggregators.
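The division of labor described above (vertices sum, the master averages) can be sketched as a loop in which the master phase runs first in every superstep. This is a plain-Python illustration under assumptions, not Giraph's MasterCompute class: the function name `run_supersteps` is hypothetical, and halving each vertex value per superstep is an arbitrary update chosen only so the averages change.

```python
def run_supersteps(values, num_supersteps):
    """Return the averages the master computes at the start of each superstep."""
    values = dict(values)
    n = len(values)
    pending_sum = None  # aggregated by the vertices of the previous superstep
    averages = []

    for _ in range(num_supersteps):
        # Master phase: runs before any vertex computation of this superstep.
        if pending_sum is not None:
            averages.append(pending_sum / n)  # average = sum / N

        # Vertex phase: each vertex adds its value to the sum aggregator,
        # then applies an (arbitrary, illustrative) update to its value.
        pending_sum = 0.0
        for v in values:
            pending_sum += values[v]
            values[v] /= 2.0

    return averages


averages = run_supersteps({1: 2.0, 2: 4.0, 3: 6.0}, num_supersteps=3)
```

Note the one-superstep delay: the master sees in superstep k what the vertices aggregated in superstep k-1, so there is nothing to average in the very first superstep.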
Worker Context
● Allows user code to run on a per-worker basis.
● There is one WorkerContext per worker.
● Provides methods for pre/post-superstep and pre/post-application operations.
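The order in which these hooks fire can be illustrated with a small sketch. The class and method names below are hypothetical Python names chosen to mirror the pre/post superstep and pre/post application operations listed above; they are not Giraph's Java API.

```python
class RecordingWorkerContext:
    """Hypothetical per-worker context that records when each hook runs."""

    def __init__(self):
        self.events = []

    def pre_application(self):
        self.events.append("pre_application")

    def pre_superstep(self, superstep):
        self.events.append(f"pre_superstep {superstep}")

    def post_superstep(self, superstep):
        self.events.append(f"post_superstep {superstep}")

    def post_application(self):
        self.events.append("post_application")


def run_worker(context, num_supersteps):
    """Drive one worker: one context instance wraps all of its supersteps."""
    context.pre_application()
    for superstep in range(num_supersteps):
        context.pre_superstep(superstep)
        # The vertex computations of this worker's partitions would run here.
        context.post_superstep(superstep)
    context.post_application()


context = RecordingWorkerContext()
run_worker(context, num_supersteps=2)
```

Because there is one context per worker rather than per vertex, it is a natural place for state that is expensive to set up once and shared by all vertices on that worker.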
Flexible Edge/Vertex Input
● Read edges/vertices from different sources.
● Multiple input resources
Parallel Computing
● More map tasks (workers) means more parallelism.
● To mitigate the slowest-worker problem, multithreading is applied to input, computation, and output.
● Linear speedup in CPU-bound applications such as k-means clustering, due to multithreading.
● Take a set of entire machines and use multithreading to maximize resource utilization.
Memory Optimization
● Vertices and edges are stored as serialized byte arrays.
● Uses FastUtil-based collections of Java primitives.
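The idea behind storing edges as a serialized byte array, rather than as one object per edge, can be sketched with Python's `struct` module. The record layout below (a 64-bit target id followed by a 64-bit float weight) is an assumption made for illustration, not Giraph's actual serialization format; `pack_edges` and `iter_edges` are hypothetical names.

```python
import struct

# Assumed fixed-size edge record: int64 target id + float64 weight = 16 bytes.
EDGE = struct.Struct("<qd")


def pack_edges(edges):
    """Serialize [(target, weight), ...] into one contiguous bytes object."""
    buffer = bytearray()
    for target, weight in edges:
        buffer += EDGE.pack(target, weight)
    return bytes(buffer)


def iter_edges(buffer):
    """Decode edges lazily, one record at a time, without building objects upfront."""
    for target, weight in EDGE.iter_unpack(buffer):
        yield target, weight


out_edges = [(2, 1.0), (3, 4.0)]
packed = pack_edges(out_edges)
decoded = list(iter_edges(packed))
```

A single byte array per vertex avoids per-object overhead and pointer chasing; the trade-off is that edges must be decoded on each pass over them.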
Sharded Aggregators
● Each aggregator is randomly assigned to one of the workers.
● The assigned worker is in charge of gathering the values of its aggregators
from all workers, performing the aggregation, and distributing the final values
to other workers.
● Aggregation responsibilities are balanced across all workers rather than
bottlenecked by the master.
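The gather, aggregate, and distribute steps described above can be sketched in a single process. This is a plain-Python illustration, not Giraph code: the function names, the seeded random assignment, and the sample partial values are all hypothetical.

```python
import random
from functools import reduce


def assign_owners(aggregator_names, num_workers, seed=0):
    """Randomly pick one owning worker per aggregator (seeded for repeatability)."""
    rng = random.Random(seed)
    return {name: rng.randrange(num_workers) for name in aggregator_names}


def aggregate_sharded(partials, owners, combiners):
    """partials: {worker_id: {aggregator_name: partial_value}}."""
    owned = {}  # owner worker id -> {aggregator name: final value}
    for name, owner in owners.items():
        # The owner gathers this aggregator's partial value from every worker.
        gathered = [worker_values[name] for worker_values in partials.values()]
        owned.setdefault(owner, {})[name] = reduce(combiners[name], gathered)

    # Distribution step: every worker ends up with all final values.
    final = {}
    for values in owned.values():
        final.update(values)
    return {worker: dict(final) for worker in partials}


partials = {
    0: {"sum": 1.0, "max": 3.0},
    1: {"sum": 2.0, "max": 7.0},
    2: {"sum": 4.0, "max": 5.0},
}
combiners = {"sum": lambda a, b: a + b, "max": max}
owners = assign_owners(["sum", "max"], num_workers=3)
result = aggregate_sharded(partials, owners, combiners)
```

With many aggregators, random assignment spreads the gathering work roughly evenly, so no single node has to receive every partial value.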
Performance
● PageRank on 1 trillion edges with 200 commodity
machines: 4 minutes/iteration.
● K-means on 1 billion input vectors × 100 features into 10,000 centroids: 10 minutes.
● Linear Scalability
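To put the PageRank figure in perspective, the per-machine load and throughput can be derived from the numbers above. This is back-of-the-envelope arithmetic only; it assumes edges are spread evenly across machines and that every edge is processed once per iteration.

```python
edges = 10**12          # 1 trillion edges
machines = 200          # commodity machines
seconds = 4 * 60        # 4 minutes per iteration

# Edges each machine holds, assuming an even split.
edges_per_machine = edges // machines

# Edges processed per second by the whole cluster and by one machine.
cluster_rate = edges / seconds
machine_rate = cluster_rate / machines

print(f"{edges_per_machine:,} edges per machine")
print(f"{cluster_rate / 1e9:.2f} billion edges/s across the cluster")
print(f"{machine_rate / 1e6:.1f} million edges/s per machine")
```

Under these assumptions each machine holds about 5 billion edges and processes on the order of 20 million edges per second.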
Currently
● Version 1.0, on the way to 1.1
● Changing rapidly: backwards-incompatible
changes
● Documentation not mature yet.
● More algorithms to be contributed.
● More data sources to be ported.
● http://giraph.apache.org for more info
Questions?
