18-24. Options?
• Crafting a custom distributed infrastructure
• Relying on an existing distributed computing platform
• Using a single-computer graph algorithm library
• Using an existing parallel graph system
None of these meets all the requirements at once: efficient processing of large graphs, locality of memory access, a fault-tolerant platform, and a general-purpose system.
25-26. Pregel
• efficient processing of large graphs
• locality of memory access
• general-purpose system
• fault-tolerant platform
• based on the BSP model
27. The Bulk Synchronous Parallel (BSP) abstract computer
is a bridging model for designing parallel algorithms
- Source: Bulk synchronous parallel - Wikipedia, the free encyclopedia
28. A BSP computer consists of:
- processors connected by a communication network
- each processor has fast local memory
- processors may follow different threads of computation
- computation proceeds in a series of global supersteps
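The superstep structure above can be sketched in a few lines of Python. This is a single-process simulation for illustration only (the function and parameter names are mine, not from any BSP library): each "processor" computes on its local state and inbox, sends messages, and all of them synchronize at a barrier before the next superstep.

```python
# Minimal single-process sketch of the BSP model (illustrative names).
def bsp_run(states, step_fn, supersteps):
    """Run `supersteps` rounds. step_fn(pid, state, inbox) -> (state, outbox),
    where outbox maps destination pid -> list of messages."""
    n = len(states)
    inboxes = [[] for _ in range(n)]
    for _ in range(supersteps):
        next_inboxes = [[] for _ in range(n)]
        # Superstep: every processor computes with its local state and inbox.
        for pid in range(n):
            states[pid], outbox = step_fn(pid, states[pid], inboxes[pid])
            for dest, msgs in outbox.items():
                next_inboxes[dest].extend(msgs)
        # Barrier: messages become visible only in the next superstep.
        inboxes = next_inboxes
    return states

# Example: four processors in a ring; each keeps the maximum value it has
# seen and forwards it to its right neighbour every superstep.
def step(pid, value, inbox):
    value = max([value] + inbox)
    return value, {(pid + 1) % 4: [value]}

print(bsp_run([3, 9, 1, 5], step, supersteps=4))  # [9, 9, 9, 9]
```

After four supersteps the maximum has travelled around the ring, so every processor holds 9.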
38. Pregel Computation
• Input: directed graph
• Sequence of supersteps
• Output
[Figure: vertex record V followed by edge records E. Each vertex stores a Vertex ID and a Value; each outgoing edge stores a Value (weight) and a Target vertex ID.]
42. Pregel Computation
• Input: directed graph
• Sequence of supersteps
• Output
o the set of values explicitly output by the vertices
o aggregated statistics mined from the graph
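The computation model on this slide can be sketched as a small single-machine simulation. This is illustrative Python, not Google's implementation: each active vertex runs compute() once per superstep, may send messages along its out-edges, and votes to halt; an incoming message reactivates a halted vertex, and the run ends when all vertices are inactive with no messages in flight.

```python
# Minimal sketch of Pregel's vertex-centric model (illustrative names).
class MaxValueVertex:
    """Classic Pregel example: every vertex learns the global maximum."""
    def __init__(self, vid, value, out_edges):
        self.id, self.value, self.out_edges = vid, value, out_edges
        self.active = True
        self.outbox = []          # (target_id, message) pairs

    def compute(self, superstep, messages):
        new_value = max([self.value] + messages)
        if superstep == 0 or new_value > self.value:
            self.value = new_value
            for target in self.out_edges:
                self.outbox.append((target, self.value))
        else:
            self.active = False   # vote to halt

def pregel_run(vertices):
    inbox = {vid: [] for vid in vertices}
    superstep = 0
    while True:
        runnable = [v for v in vertices.values() if v.active or inbox[v.id]]
        if not runnable:          # everyone halted, no messages in flight
            return superstep
        for v in runnable:
            v.active = True       # messages reactivate halted vertices
            v.compute(superstep, inbox[v.id])
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():
            for target, msg in v.outbox:
                inbox[target].append(msg)
            v.outbox = []
        superstep += 1

# A ring of four vertices: all of them converge to the maximum value, 6.
graph = {vid: MaxValueVertex(vid, val, [(vid + 1) % 4])
         for vid, val in enumerate([3, 6, 2, 1])}
pregel_run(graph)
print([graph[vid].value for vid in range(4)])  # [6, 6, 6, 6]
```

The final vertex values are one example of "values explicitly output by the vertices"; counting supersteps or message volume would be an example of aggregated statistics.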
47. Pregel API
• Message Passing
• Combiners
• Aggregators
• Topology Mutations
• Input and Output
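Of the API features listed above, combiners are easy to illustrate: when a vertex only needs an aggregate of its incoming messages (here, the minimum), the framework can merge messages bound for the same vertex before delivery, reducing network traffic. This is a hedged sketch with made-up names, not Pregel's actual interface.

```python
# Sketch of a Pregel-style message combiner (illustrative names).
def combine_min(pending):
    """pending: list of (target_id, message) pairs.
    Returns at most one combined message per target vertex."""
    combined = {}
    for target, msg in pending:
        combined[target] = msg if target not in combined else min(combined[target], msg)
    return [(t, m) for t, m in combined.items()]

# Five outgoing messages shrink to two after combining.
outbox = [(2, 7), (2, 3), (2, 9), (5, 4), (5, 1)]
print(sorted(combine_min(outbox)))  # [(2, 3), (5, 1)]
```

Aggregators work in the same spirit but globally: every vertex contributes a value in a superstep, and the reduced result is visible to all vertices in the next one.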
48. Master/Worker model
- Source: http://de.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing
50. Fault Tolerance
• Checkpointing
o The master periodically instructs the workers to save the state of
their partitions to persistent storage
e.g., Vertex values, edge values, incoming messages
• Failure detection
o Using regular “ping” messages
• Recovery
o The master reassigns graph partitions to the currently available workers
o All workers reload their partition state from the most recent available checkpoint
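A checkpoint of the kind described above is just persisted partition state. The sketch below (illustrative names; real systems write to distributed storage such as HDFS, not a local file) saves each worker's partition together with the superstep number, so recovery can reload and resume from there.

```python
# Sketch of superstep checkpointing and recovery (illustrative names).
import os
import pickle
import tempfile

def save_checkpoint(path, superstep, partitions):
    """partitions: worker_id -> {vertex_id: (value, incoming_messages)}."""
    with open(path, "wb") as f:
        pickle.dump({"superstep": superstep, "partitions": partitions}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoint_superstep_10.bin")
save_checkpoint(path, 10, {0: {"v1": (3.5, [])}, 1: {"v2": (1.0, [7])}})

# Recovery: reload the partition state and resume from superstep 10.
state = load_checkpoint(path)
print(state["superstep"], state["partitions"][1]["v2"])  # 10 (1.0, [7])
```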
52. Giraph
• Open-source implementation of Pregel
• Runs on Hadoop infrastructure
o executes as a map-only Hadoop job
• Computation is executed in memory
• Uses Apache ZooKeeper for synchronization
o falls back to the Hadoop file system if ZooKeeper is not available
53. Giraph
• Choose your graph generic types
o Vertex ID (type I)
o Vertex value (type V)
o Edge value (type E)
o Message value (type M)
• Define how to load the graph into Giraph
o Vertex Input Format
• Define how to store the graph from Giraph
o Vertex Output Format
• Override the compute() method
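Giraph itself is written in Java, but the programming pattern on this slide can be mirrored in a short Python sketch: pick the four generic types (vertex ID I, vertex value V, edge value E, message M), define how to parse input into vertices, and override compute(). All class and function names below are illustrative, not Giraph's actual API.

```python
# Python sketch of the Giraph programming pattern (illustrative names).
from typing import Generic, Iterable, List, Tuple, TypeVar

I = TypeVar("I")  # vertex ID type
V = TypeVar("V")  # vertex value type
E = TypeVar("E")  # edge value type
M = TypeVar("M")  # message value type

class Vertex(Generic[I, V, E]):
    def __init__(self, vid: I, value: V, edges: List[Tuple[I, E]]):
        self.id, self.value, self.edges = vid, value, edges

class Computation(Generic[I, V, E, M]):
    def compute(self, vertex: Vertex, messages: Iterable[M]) -> None:
        raise NotImplementedError  # user code overrides this

# "Vertex input format": parse one line "id value target:weight ..." into a Vertex.
def parse_line(line: str) -> Vertex:
    vid, value, *edges = line.split()
    parsed = [(t, float(w)) for t, w in (e.split(":") for e in edges)]
    return Vertex(vid, float(value), parsed)

class SumMessages(Computation[str, float, float, float]):
    def compute(self, vertex, messages):
        vertex.value += sum(messages)  # add all incoming message values

v = parse_line("a 1.0 b:0.5 c:2.0")
SumMessages().compute(v, [2.0, 3.0])
print(v.id, v.value, v.edges)  # a 6.0 [('b', 0.5), ('c', 2.0)]
```

The split into input format, output format, and compute() mirrors the slide: parsing and storage are declared once, and the per-vertex logic lives entirely in the overridden method.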