18-24. Options?
• Crafting a custom distributed infrastructure
• Relying on an existing distributed computing platform
• Using a single-computer graph algorithm library
• Using an existing parallel graph system
None of these meets all the requirements at once: efficient processing of large graphs, locality of memory access, a fault-tolerant platform, and a general-purpose system.
25-26. Pregel
• efficient processing of large graphs
• locality of memory access
• general-purpose system
• fault-tolerant platform
• based on the BSP model
27. The Bulk Synchronous Parallel (BSP) abstract computer
is a bridging model for designing parallel algorithms
- Source: Bulk synchronous parallel - Wikipedia, the free encyclopedia
28. A BSP computer consists of:
- processors connected by a communication network
- each processor has fast local memory
- processors may follow different threads of computation
- computation proceeds in a series of global supersteps
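The superstep structure above can be sketched in a few lines of Python. This is a single-process simulation for illustration only (the function and parameter names are mine, not from any BSP library): each "processor" computes on its local state and inbox, sends messages, and all of them synchronize at a barrier before the next superstep.

```python
# Minimal single-process sketch of the BSP model (illustrative names).
def bsp_run(states, step_fn, supersteps):
    """Run `supersteps` rounds. step_fn(pid, state, inbox) -> (state, outbox),
    where outbox maps destination pid -> list of messages."""
    n = len(states)
    inboxes = [[] for _ in range(n)]
    for _ in range(supersteps):
        next_inboxes = [[] for _ in range(n)]
        # Superstep: every processor computes with its local state and inbox.
        for pid in range(n):
            states[pid], outbox = step_fn(pid, states[pid], inboxes[pid])
            for dest, msgs in outbox.items():
                next_inboxes[dest].extend(msgs)
        # Barrier: messages become visible only in the next superstep.
        inboxes = next_inboxes
    return states

# Example: four processors in a ring; each keeps the maximum value it has
# seen and forwards it to its right neighbour every superstep.
def step(pid, value, inbox):
    value = max([value] + inbox)
    return value, {(pid + 1) % 4: [value]}

print(bsp_run([3, 9, 1, 5], step, supersteps=4))  # [9, 9, 9, 9]
```

After four supersteps the maximum has travelled around the ring, so every processor holds 9.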
38. Pregel Computation
• Input: directed graph
• Sequence of supersteps
• Output
[Figure: vertex record V followed by edge records E. Each vertex stores a Vertex ID and a Value; each outgoing edge stores a Value (weight) and a Target vertex ID.]
42. Pregel Computation
• Input: directed graph
• Sequence of supersteps
• Output
o the set of values explicitly output by the vertices
o aggregated statistics mined from the graph
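The computation model on this slide can be sketched as a small single-machine simulation. This is illustrative Python, not Google's implementation: each active vertex runs compute() once per superstep, may send messages along its out-edges, and votes to halt; an incoming message reactivates a halted vertex, and the run ends when all vertices are inactive with no messages in flight.

```python
# Minimal sketch of Pregel's vertex-centric model (illustrative names).
class MaxValueVertex:
    """Classic Pregel example: every vertex learns the global maximum."""
    def __init__(self, vid, value, out_edges):
        self.id, self.value, self.out_edges = vid, value, out_edges
        self.active = True
        self.outbox = []          # (target_id, message) pairs

    def compute(self, superstep, messages):
        new_value = max([self.value] + messages)
        if superstep == 0 or new_value > self.value:
            self.value = new_value
            for target in self.out_edges:
                self.outbox.append((target, self.value))
        else:
            self.active = False   # vote to halt

def pregel_run(vertices):
    inbox = {vid: [] for vid in vertices}
    superstep = 0
    while True:
        runnable = [v for v in vertices.values() if v.active or inbox[v.id]]
        if not runnable:          # everyone halted, no messages in flight
            return superstep
        for v in runnable:
            v.active = True       # messages reactivate halted vertices
            v.compute(superstep, inbox[v.id])
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():
            for target, msg in v.outbox:
                inbox[target].append(msg)
            v.outbox = []
        superstep += 1

# A ring of four vertices: all of them converge to the maximum value, 6.
graph = {vid: MaxValueVertex(vid, val, [(vid + 1) % 4])
         for vid, val in enumerate([3, 6, 2, 1])}
pregel_run(graph)
print([graph[vid].value for vid in range(4)])  # [6, 6, 6, 6]
```

The final vertex values are one example of "values explicitly output by the vertices"; counting supersteps or message volume would be an example of aggregated statistics.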
47. Pregel API
• Message Passing
• Combiners
• Aggregators
• Topology Mutations
• Input and Output
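Of the API features listed above, combiners are easy to illustrate: when a vertex only needs an aggregate of its incoming messages (here, the minimum), the framework can merge messages bound for the same vertex before delivery, reducing network traffic. This is a hedged sketch with made-up names, not Pregel's actual interface.

```python
# Sketch of a Pregel-style message combiner (illustrative names).
def combine_min(pending):
    """pending: list of (target_id, message) pairs.
    Returns at most one combined message per target vertex."""
    combined = {}
    for target, msg in pending:
        combined[target] = msg if target not in combined else min(combined[target], msg)
    return [(t, m) for t, m in combined.items()]

# Five outgoing messages shrink to two after combining.
outbox = [(2, 7), (2, 3), (2, 9), (5, 4), (5, 1)]
print(sorted(combine_min(outbox)))  # [(2, 3), (5, 1)]
```

Aggregators work in the same spirit but globally: every vertex contributes a value in a superstep, and the reduced result is visible to all vertices in the next one.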
48. Master/Worker model
- Source: http://de.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing
50. Fault Tolerance
• Checkpointing
o The master periodically instructs the workers to save the state of
their partitions to persistent storage
e.g., Vertex values, edge values, incoming messages
• Failure detection
o Using regular “ping” messages
• Recovery
o The master reassigns graph partitions to the currently available workers
o All workers reload their partition state from the most recent available checkpoint
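A checkpoint of the kind described above is just persisted partition state. The sketch below (illustrative names; real systems write to distributed storage such as HDFS, not a local file) saves each worker's partition together with the superstep number, so recovery can reload and resume from there.

```python
# Sketch of superstep checkpointing and recovery (illustrative names).
import os
import pickle
import tempfile

def save_checkpoint(path, superstep, partitions):
    """partitions: worker_id -> {vertex_id: (value, incoming_messages)}."""
    with open(path, "wb") as f:
        pickle.dump({"superstep": superstep, "partitions": partitions}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoint_superstep_10.bin")
save_checkpoint(path, 10, {0: {"v1": (3.5, [])}, 1: {"v2": (1.0, [7])}})

# Recovery: reload the partition state and resume from superstep 10.
state = load_checkpoint(path)
print(state["superstep"], state["partitions"][1]["v2"])  # 10 (1.0, [7])
```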
52. Giraph
• Open-source implementation of Pregel
• Runs on Hadoop infrastructure
o executes as a map-only Hadoop job
• Computation is executed in memory
• Uses Apache ZooKeeper for synchronization
o falls back to the Hadoop file system if ZooKeeper is not available
53. Giraph
• Choose your graph generic types
o Vertex ID (type I)
o Vertex value (type V)
o Edge value (type E)
o Message value (type M)
• Define how to load the graph into Giraph
o Vertex Input Format
• Define how to store the graph from Giraph
o Vertex Output Format
• Override the compute() method
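Giraph itself is written in Java, but the programming pattern on this slide can be mirrored in a short Python sketch: pick the four generic types (vertex ID I, vertex value V, edge value E, message M), define how to parse input into vertices, and override compute(). All class and function names below are illustrative, not Giraph's actual API.

```python
# Python sketch of the Giraph programming pattern (illustrative names).
from typing import Generic, Iterable, List, Tuple, TypeVar

I = TypeVar("I")  # vertex ID type
V = TypeVar("V")  # vertex value type
E = TypeVar("E")  # edge value type
M = TypeVar("M")  # message value type

class Vertex(Generic[I, V, E]):
    def __init__(self, vid: I, value: V, edges: List[Tuple[I, E]]):
        self.id, self.value, self.edges = vid, value, edges

class Computation(Generic[I, V, E, M]):
    def compute(self, vertex: Vertex, messages: Iterable[M]) -> None:
        raise NotImplementedError  # user code overrides this

# "Vertex input format": parse one line "id value target:weight ..." into a Vertex.
def parse_line(line: str) -> Vertex:
    vid, value, *edges = line.split()
    parsed = [(t, float(w)) for t, w in (e.split(":") for e in edges)]
    return Vertex(vid, float(value), parsed)

class SumMessages(Computation[str, float, float, float]):
    def compute(self, vertex, messages):
        vertex.value += sum(messages)  # add all incoming message values

v = parse_line("a 1.0 b:0.5 c:2.0")
SumMessages().compute(v, [2.0, 3.0])
print(v.id, v.value, v.edges)  # a 6.0 [('b', 0.5), ('c', 2.0)]
```

The split into input format, output format, and compute() mirrors the slide: parsing and storage are declared once, and the per-vertex logic lives entirely in the overridden method.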