SlideShare uma empresa Scribd logo
1 de 88
Baixar para ler offline
PowerGraph and GraphX
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 1 / 61
Reminder
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 2 / 61
Data-Parallel vs. Graph-Parallel Computation
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 3 / 61
Pregel
Vertex-centric
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Runs in sequence of iterations (supersteps)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Runs in sequence of iterations (supersteps)
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel Limitations
Inefficient if different regions of the graph converge at different
speed.
Can suffer if one task is more expensive than the others.
Runtime of each phase is determined by the slowest machine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 5 / 61
GraphLab
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 6 / 61
GraphLab
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 7 / 61
GraphLab Limitations
Poor performance on Natural graphs.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 8 / 61
Natural Graphs
Graphs derived from natural phenomena.
Skewed Power-Law degree distribution.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 9 / 61
Natural Graphs Challenges
Traditional graph-partitioning algorithms (edge-cut algorithms) per-
form poorly on Power-Law Graphs.
Challenges of high-degree vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 10 / 61
Proposed Solution
Vertex-Cut Partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
Proposed Solution
Vertex-Cut Partitioning
Edge-cut Vertex-cut
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
Edge-cut vs. Vertex-cut Partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 12 / 61
Edge-cut vs. Vertex-cut Partitioning
Edge-cut Vertex-cut
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 13 / 61
PowerGraph
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 14 / 61
PowerGraph
Vertex-cut partitioning of graphs.
Factorizes the GraphLab update function into the Gather, Apply and
Scatter phases (GAS).
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 15 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Apply
• Apply the accumulated value to center vertex.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Apply
• Apply the accumulated value to center vertex.
Scatter
• Update adjacent edges and vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Data Model
A directed graph that stores the program state, called data graph.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 17 / 61
Execution Model (1/2)
Vertex-centric programming: implementing the GASVertexProgram
interface (vertex-program for short).
Similar to Comput in Pregel, and update function in GraphLab.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 18 / 61
Execution Model (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
Execution Model (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
PageRank - Pregel
Pregel_PageRank(i, messages):
// receive all the messages
total = 0
foreach(msg in messages):
total = total + msg
// update the rank of this vertex
R[i] = 0.15 + total
// send new messages to neighbors
foreach(j in out_neighbors[i]):
sendmsg(R[i] * wij) to vertex j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 20 / 61
PageRank - GraphLab
GraphLab_PageRank(i)
// compute sum over neighbors
total = 0
foreach(j in in_neighbors(i)):
total = total + R[j] * wji
// update the PageRank
R[i] = 0.15 + total
// trigger neighbors to run again
foreach(j in out_neighbors(i)):
signal vertex-program on j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 21 / 61
PageRank - PowerGraph
PowerGraph_PageRank(i):
Gather(j -> i):
return wji * R[j]
sum(a, b):
return a + b
// total: Gather and sum
Apply(i, total):
R[i] = 0.15 + total
Scatter(i -> j):
if R[i] changed then activate(j)
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 22 / 61
Scheduling (1/5)
PowerGraph inherits the dynamic scheduling of GraphLab.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 23 / 61
Scheduling (2/5)
Initially all vertices are active.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Once a vertex-program completes the scatter phase it becomes in-
active until it is reactivated.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Once a vertex-program completes the scatter phase it becomes in-
active until it is reactivated.
Vertices can activate themselves and neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (3/5)
PowerGraph can execute both synchronously and asynchronously.
• Bulk synchronous execution
• Asynchronous execution
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 25 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Super-step: a complete series of GAS minor-steps.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Super-step: a complete series of GAS minor-steps.
Changes made to the vertex/edge data are committed at the end of
each minor-step and are visible in the subsequent minor-steps.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
• GraphLab implements Dijkstras solution, where forks are acquired
sequentially according to a total ordering.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
• GraphLab implements Dijkstras solution, where forks are acquired
sequentially according to a total ordering.
• PowerGraph implements Chandy-Misra solution, which acquires all
forks simultaneously.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Delta Caching (1/2)
Changes in a few of its neighbors → triggering a vertex-program
The gather operation is invoked on all neighbors: wasting compu-
tation cycles
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
Delta Caching (1/2)
Changes in a few of its neighbors → triggering a vertex-program
The gather operation is invoked on all neighbors: wasting compu-
tation cycles
Maintaining a cache of the accumulator av from the previous gather
phase for each vertex.
The scatter can return an additional ∆a, which is added to the
cached accumulator av.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
Delta Caching (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
Delta Caching (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
Example: PageRank (Delta-Caching)
PowerGraph_PageRank(i):
Gather(j -> i):
return wji * R[j]
sum(a, b):
return a + b
// total: Gather and sum
Apply(i, total):
new = 0.15 + total
R[i].delta = new - R[i]
R[i] = new
Scatter(i -> j):
if R[i] changed then activate(j)
return wij * R[i].delta
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 30 / 61
Graph Partitioning
Vertex-cut partitioning.
Evenly assign edges to machines.
• Minimize machines spanned by each vertex.
Two proposed solutions:
• Random edge placement.
• Greedy edge placement.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 31 / 61
Random Vertex-Cuts
Randomly assign edges to machines.
Completely parallel and easy to distribute.
High replication factor.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 32 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Case 3: If only one of the two vertices has been assigned, then
choose a machine from the assigned vertex.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Case 3: If only one of the two vertices has been assigned, then
choose a machine from the assigned vertex.
Case 4: If neither vertex has been assigned, then assign the edge to
the least loaded machine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (2/2)
Coordinated edge placement:
• Requires coordination to place each edge
• Slower, but higher quality cuts
Oblivious edge placement:
• Approx. greedy objective without coordination
• Faster, but lower quality cuts
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 34 / 61
PowerGraph Summary
Gather-Apply-Scatter programming model
Synchronous and Asynchronous models
Vertex-cut graph partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 35 / 61
Any limitations?
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 36 / 61
Data-Parallel vs. Graph-Parallel Computation
Graph-parallel computation: restricting the types of computation to
achieve performance.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
Data-Parallel vs. Graph-Parallel Computation
Graph-parallel computation: restricting the types of computation to
achieve performance.
But, the same restrictions make it difficult and inefficient to express
many stages in a typical graph-analytics pipeline.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
Data-Parallel and Graph-Parallel Pipeline
Moving between table and graph views of the same physical data.
Inefficient: extensive data movement and duplication across the net-
work and file system.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 38 / 61
GraphX vs. Data-Parallel/Graph-Parallel Systems
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
GraphX vs. Data-Parallel/Graph-Parallel Systems
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 40 / 61
GraphX
New API that blurs the distinction between Tables and Graphs.
New system that unifies Data-Parallel and Graph-Parallel systems.
It is implemented on top of Spark.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 41 / 61
Unifying Data-Parallel and Graph-Parallel Analytics
Tables and Graphs are composable views of the same physical data.
Each view has its own operators that exploit the semantics of the
view to achieve efficient execution.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 42 / 61
Data Model
Property Graph: represented using two Spark RDDs:
• Edge collection: VertexRDD
• Vertex collection: EdgeRDD
// VD: the type of the vertex attribute
// ED: the type of the edge attribute
class Graph[VD, ED] {
val vertices: VertexRDD[VD]
val edges: EdgeRDD[ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 43 / 61
Primitive Data Types
// Vertex collection
class VertexRDD[VD] extends RDD[(VertexId, VD)]
// Edge collection
class EdgeRDD[ED] extends RDD[Edge[ED]]
case class Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0,
attr: ED = null.asInstanceOf[ED])
// Edge Triple
class EdgeTriplet[VD, ED] extends Edge[ED]
EdgeTriplet represents an edge along with the vertex attributes of
its neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 44 / 61
Example (1/3)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 45 / 61
Example (2/3)
val sc: SparkContext
// Create an RDD for the vertices
val users: VertexRDD[(String, String)] = sc.parallelize(
Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
(5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
// Create an RDD for edges
val relationships: EdgeRDD[String] = sc.parallelize(
Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
// Define a default user in case there are relationship with missing user
val defaultUser = ("John Doe", "Missing")
// Build the initial Graph
val userGraph: Graph[(String, String), String] =
Graph(users, relationships, defaultUser)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 46 / 61
Example (3/3)
// Constructed from above
val userGraph: Graph[(String, String), String]
// Count all users which are postdocs
userGraph.vertices.filter((id, (name, pos)) => pos == "postdoc").count
// Count all the edges where src > dst
userGraph.edges.filter(e => e.srcId > e.dstId).count
// Use the triplets view to create an RDD of facts
val facts: RDD[String] = graph.triplets.map(triplet =>
triplet.srcAttr._1 + " is the " +
triplet.attr + " of " + triplet.dstAttr._1)
// Remove missing vertices as well as the edges to connected to them
val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
facts.collect.foreach(println(_))
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 47 / 61
Property Operators (1/2)
class Graph[VD, ED] {
def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
}
They yield new graphs with the vertex or edge properties modified
by the map function.
The graph structure is unaffected.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 48 / 61
Property Operators (2/2)
val newGraph = graph.mapVertices((id, attr) => mapUdf(id, attr))
val newVertices = graph.vertices.map((id, attr) => (id, mapUdf(id, attr)))
val newGraph = Graph(newVertices, graph.edges)
Both are logically equivalent, but the second one does not preserve
the structural indices and would not benefit from the GraphX system
optimizations.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 49 / 61
Map Reduce Triplets
Map-Reduce for each vertex
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
Map Reduce Triplets
Map-Reduce for each vertex
// what is the age of the oldest follower for each user?
val oldestFollowerAge = graph.mapReduceTriplets(
e => (e.dstAttr, e.srcAttr), // Map
(a, b) => max(a, b) // Reduce
).vertices
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
Structural Operators
class Graph[VD, ED] {
// returns a new graph with all the edge directions reversed
def reverse: Graph[VD, ED]
// returns the graph containing only the vertices and edges that satisfy
// the vertex predicate
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
// a subgraph by returning a graph that contains the vertices and edges
// that are also found in the input graph
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 51 / 61
Structural Operators Example
// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)
// Run Connected Components
val ccGraph = graph.connectedComponents()
// Remove missing vertices as well as the edges to connected to them
val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
// Restrict the answer to the valid subgraph
val validCCGraph = ccGraph.mask(validGraph)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 52 / 61
Join Operators
To join data from external collections (RDDs) with graphs.
class Graph[VD, ED] {
// joins the vertices with the input RDD and returns a new graph
// by applying the map function to the result of the joined vertices
def joinVertices[U](table: RDD[(VertexId, U)])
(map: (VertexId, VD, U) => VD): Graph[VD, ED]
// similarly to joinVertices, but the map function is applied to
// all vertices and can change the vertex property type
def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)])
(map: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 53 / 61
Graph Builders
// load a graph from a list of edges on disk
object GraphLoader {
def edgeListFile(
sc: SparkContext,
path: String,
canonicalOrientation: Boolean = false,
minEdgePartitions: Int = 1)
: Graph[Int, Int]
}
// graph file
# This is a comment
2 1
4 1
1 2
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 54 / 61
GraphX and Spark
GraphX is implemented on top of Spark
In-memory caching
Lineage-based fault tolerance
Programmable partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 55 / 61
Distributed Graph Representation (1/2)
Representing graphs using two RDDs: edge-collection and vertex-
collection
Vertex-cut partitioning (like PowerGraph)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 56 / 61
Distributed Graph Representation (2/2)
Each vertex partition contains a bitmask and routing table.
Routing table: a logical map from a vertex id to the set of edge
partitions that contains adjacent edges.
Bitmask: enables the set intersection and filtering.
• Vertices bitmasks are updated after each operation (e.g., mapRe-
duceTriplets).
• Vertices hidden by the bitmask do not participate in the graph oper-
ations.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 57 / 61
Summary
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 58 / 61
Summary
PowerGraph
• GAS programming model
• Vertex-cut partitioning
GraphX
• Unifying data-parallel and graph-parallel analytics
• Vertex-cut partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 59 / 61
Questions?
Acknowledgements
Some pictures were derived from the Spark web site
(http://spark.apache.org/).
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 60 / 61
Questions?
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 61 / 61

Mais conteúdo relacionado

Mais procurados

How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseDataWorks Summit
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 
Rocks db state store in structured streaming
Rocks db state store in structured streamingRocks db state store in structured streaming
Rocks db state store in structured streamingBalaji Mohanam
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Graph processing - Pregel
Graph processing - PregelGraph processing - Pregel
Graph processing - PregelAmir Payberah
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Yoshiyasu SAEKI
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Databricks
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkVasia Kalavri
 

Mais procurados (20)

How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBase
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Rocks db state store in structured streaming
Rocks db state store in structured streamingRocks db state store in structured streaming
Rocks db state store in structured streaming
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Graph processing - Pregel
Graph processing - PregelGraph processing - Pregel
Graph processing - Pregel
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 

Destaque

Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - GraphlabAmir Payberah
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphXAndy Petrella
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slidesSean Mulvehill
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013Amazon Web Services
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXAndrea Iacono
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkFlink Forward
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEPAmir Payberah
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processingAmir Payberah
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module ProgrammingAmir Payberah
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3Amir Payberah
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1Amir Payberah
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and SpannerAmir Payberah
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Amir Payberah
 

Destaque (20)

Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slides
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
 
MapReduce
MapReduceMapReduce
MapReduce
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processing
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and Spanner
 
Threads
ThreadsThreads
Threads
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
 

Semelhante a Graph processing - Powergraph and GraphX

Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1Amir Payberah
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2Amir Payberah
 
Comparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOComparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOINFOGAIN PUBLICATION
 
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...iaemedu
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_tworamikaurraminder
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3Saad Farooq
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2Amir Payberah
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfssuserf86fba
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTSAlicia Edwards
 
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUAcceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUMahesh Khadatare
 
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...IJERA Editor
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMIJCSEA Journal
 
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...Abhishek Jain
 
PowerLyra@EuroSys2015
PowerLyra@EuroSys2015PowerLyra@EuroSys2015
PowerLyra@EuroSys2015realstolz
 
Analysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAnalysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAllison Thompson
 
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...Eswar Publications
 

Semelhante a Graph processing - Powergraph and GraphX (20)

Scala
ScalaScala
Scala
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
 
Spark
SparkSpark
Spark
 
Comparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOComparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSO
 
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_two
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdf
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
 
Algorithm manual
Algorithm manualAlgorithm manual
Algorithm manual
 
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUAcceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
 
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
 
Co&al lecture-07
Co&al lecture-07Co&al lecture-07
Co&al lecture-07
 
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
 
PowerLyra@EuroSys2015
PowerLyra@EuroSys2015PowerLyra@EuroSys2015
PowerLyra@EuroSys2015
 
Analysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAnalysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB Software
 
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
 

Mais de Amir Payberah

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution NetworkAmir Payberah
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing FrameworksAmir Payberah
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformAmir Payberah
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformAmir Payberah
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2Amir Payberah
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1Amir Payberah
 
File System Interface
File System InterfaceFile System Interface
File System InterfaceAmir Payberah
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2Amir Payberah
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1Amir Payberah
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2Amir Payberah
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1Amir Payberah
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Amir Payberah
 

Mais de Amir Payberah (20)

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
 
Security
SecuritySecurity
Security
 
Protection
ProtectionProtection
Protection
 
IO Systems
IO SystemsIO Systems
IO Systems
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
 
File System Interface
File System InterfaceFile System Interface
File System Interface
 
Storage
StorageStorage
Storage
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
 
Main Memory - Part2
Main Memory - Part2Main Memory - Part2
Main Memory - Part2
 
Main Memory - Part1
Main Memory - Part1Main Memory - Part1
Main Memory - Part1
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
 

Último

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Graph processing - Powergraph and GraphX

  • 1. PowerGraph and GraphX Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 1 / 61
  • 2. Reminder Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 2 / 61
  • 3. Data-Parallel vs. Graph-Parallel Computation Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 3 / 61
  • 4. Pregel Vertex-centric Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 5. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 6. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Runs in sequence of iterations (supersteps) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 7. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Runs in sequence of iterations (supersteps) A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 8. Pregel Limitations Inefficient if different regions of the graph converge at different speed. Can suffer if one task is more expensive than the others. Runtime of each phase is determined by the slowest machine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 5 / 61
  • 9. GraphLab Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 6 / 61
  • 10. GraphLab Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 7 / 61
  • 11. GraphLab Limitations Poor performance on Natural graphs. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 8 / 61
  • 12. Natural Graphs Graphs derived from natural phenomena. Skewed Power-Law degree distribution. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 9 / 61
  • 13. Natural Graphs Challenges Traditional graph-partitioning algorithms (edge-cut algorithms) per- form poorly on Power-Law Graphs. Challenges of high-degree vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 10 / 61
  • 14. Proposed Solution Vertex-Cut Partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
  • 15. Proposed Solution Vertex-Cut Partitioning Edge-cut Vertex-cut Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
  • 16. Edge-cut vs. Vertex-cut Partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 12 / 61
  • 17. Edge-cut vs. Vertex-cut Partitioning Edge-cut Vertex-cut Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 13 / 61
  • 18. PowerGraph Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 14 / 61
  • 19. PowerGraph Vertex-cut partitioning of graphs. Factorizes the GraphLab update function into the Gather, Apply and Scatter phases (GAS). Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 15 / 61
  • 20. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 21. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Apply • Apply the accumulated value to center vertex. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 22. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Apply • Apply the accumulated value to center vertex. Scatter • Update adjacent edges and vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 23. Data Model A directed graph that stores the program state, called data graph. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 17 / 61
  • 24. Execution Model (1/2) Vertex-centric programming: implementing the GASVertexProgram interface (vertex-program for short). Similar to Comput in Pregel, and update function in GraphLab. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 18 / 61
  • 25. Execution Model (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
  • 26. Execution Model (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
  • 27. PageRank - Pregel Pregel_PageRank(i, messages): // receive all the messages total = 0 foreach(msg in messages): total = total + msg // update the rank of this vertex R[i] = 0.15 + total // send new messages to neighbors foreach(j in out_neighbors[i]): sendmsg(R[i] * wij) to vertex j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 20 / 61
  • 28. PageRank - GraphLab GraphLab_PageRank(i) // compute sum over neighbors total = 0 foreach(j in in_neighbors(i)): total = total + R[j] * wji // update the PageRank R[i] = 0.15 + total // trigger neighbors to run again foreach(j in out_neighbors(i)): signal vertex-program on j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 21 / 61
  • 29. PageRank - PowerGraph PowerGraph_PageRank(i): Gather(j -> i): return wji * R[j] sum(a, b): return a + b // total: Gather and sum Apply(i, total): R[i] = 0.15 + total Scatter(i -> j): if R[i] changed then activate(j) R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 22 / 61
  • 30. Scheduling (1/5) PowerGraph inherits the dynamic scheduling of GraphLab. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 23 / 61
  • 31. Scheduling (2/5) Initially all vertices are active. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 32. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 33. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 34. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Once a vertex-program completes the scatter phase it becomes in- active until it is reactivated. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 35. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Once a vertex-program completes the scatter phase it becomes in- active until it is reactivated. Vertices can activate themselves and neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 36. Scheduling (3/5) PowerGraph can execute both synchronously and asynchronously. • Bulk synchronous execution • Asynchronous execution Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 25 / 61
  • 37. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 38. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 39. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Super-step: a complete series of GAS minor-steps. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 40. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Super-step: a complete series of GAS minor-steps. Changes made to the vertex/edge data are committed at the end of each minor-step and are visible in the subsequent minor-steps. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 41. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 42. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 43. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 44. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. • GraphLab implements Dijkstras solution, where forks are acquired sequentially according to a total ordering. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 45. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. • GraphLab implements Dijkstras solution, where forks are acquired sequentially according to a total ordering. • PowerGraph implements Chandy-Misra solution, which acquires all forks simultaneously. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 46. Delta Caching (1/2) Changes in a few of its neighbors → triggering a vertex-program The gather operation is invoked on all neighbors: wasting compu- tation cycles Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
  • 47. Delta Caching (1/2) Changes in a few of its neighbors → triggering a vertex-program The gather operation is invoked on all neighbors: wasting compu- tation cycles Maintaining a cache of the accumulator av from the previous gather phase for each vertex. The scatter can return an additional ∆a, which is added to the cached accumulator av. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
  • 48. Delta Caching (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
  • 49. Delta Caching (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
  • 50. Example: PageRank (Delta-Caching) PowerGraph_PageRank(i): Gather(j -> i): return wji * R[j] sum(a, b): return a + b // total: Gather and sum Apply(i, total): new = 0.15 + total R[i].delta = new - R[i] R[i] = new Scatter(i -> j): if R[i] changed then activate(j) return wij * R[i].delta R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 30 / 61
  • 51. Graph Partitioning Vertex-cut partitioning. Evenly assign edges to machines. • Minimize machines spanned by each vertex. Two proposed solutions: • Random edge placement. • Greedy edge placement. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 31 / 61
  • 52. Random Vertex-Cuts Randomly assign edges to machines. Completely parallel and easy to distribute. High replication factor. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 32 / 61
  • 53. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 54. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 55. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 56. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Case 3: If only one of the two vertices has been assigned, then choose a machine from the assigned vertex. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 57. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Case 3: If only one of the two vertices has been assigned, then choose a machine from the assigned vertex. Case 4: If neither vertex has been assigned, then assign the edge to the least loaded machine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 58. Greedy Vertex-Cuts (2/2) Coordinated edge placement: • Requires coordination to place each edge • Slower, but higher quality cuts Oblivious edge placement: • Approx. greedy objective without coordination • Faster, but lower quality cuts Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 34 / 61
  • 59. PowerGraph Summary Gather-Apply-Scatter programming model Synchronous and Asynchronous models Vertex-cut graph partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 35 / 61
  • 60. Any limitations? Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 36 / 61
  • 61. Data-Parallel vs. Graph-Parallel Computation Graph-parallel computation: restricting the types of computation to achieve performance. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
  • 62. Data-Parallel vs. Graph-Parallel Computation Graph-parallel computation: restricting the types of computation to achieve performance. But, the same restrictions make it difficult and inefficient to express many stages in a typical graph-analytics pipeline. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
  • 63. Data-Parallel and Graph-Parallel Pipeline Moving between table and graph views of the same physical data. Inefficient: extensive data movement and duplication across the net- work and file system. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 38 / 61
  • 64. GraphX vs. Data-Parallel/Graph-Parallel Systems Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
  • 65. GraphX vs. Data-Parallel/Graph-Parallel Systems Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
  • 66. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 40 / 61
  • 67. GraphX New API that blurs the distinction between Tables and Graphs. New system that unifies Data-Parallel and Graph-Parallel systems. It is implemented on top of Spark. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 41 / 61
  • 68. Unifying Data-Parallel and Graph-Parallel Analytics Tables and Graphs are composable views of the same physical data. Each view has its own operators that exploit the semantics of the view to achieve efficient execution. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 42 / 61
  • 69. Data Model Property Graph: represented using two Spark RDDs: • Edge collection: VertexRDD • Vertex collection: EdgeRDD // VD: the type of the vertex attribute // ED: the type of the edge attribute class Graph[VD, ED] { val vertices: VertexRDD[VD] val edges: EdgeRDD[ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 43 / 61
  • 70. Primitive Data Types // Vertex collection class VertexRDD[VD] extends RDD[(VertexId, VD)] // Edge collection class EdgeRDD[ED] extends RDD[Edge[ED]] case class Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0, attr: ED = null.asInstanceOf[ED]) // Edge Triple class EdgeTriplet[VD, ED] extends Edge[ED] EdgeTriplet represents an edge along with the vertex attributes of its neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 44 / 61
  • 71. Example (1/3) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 45 / 61
  • 72. Example (2/3) val sc: SparkContext // Create an RDD for the vertices val users: VertexRDD[(String, String)] = sc.parallelize( Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")), (5L, ("franklin", "prof")), (2L, ("istoica", "prof")))) // Create an RDD for edges val relationships: EdgeRDD[String] = sc.parallelize( Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"), Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"))) // Define a default user in case there are relationship with missing user val defaultUser = ("John Doe", "Missing") // Build the initial Graph val userGraph: Graph[(String, String), String] = Graph(users, relationships, defaultUser) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 46 / 61
  • 73. Example (3/3) // Constructed from above val userGraph: Graph[(String, String), String] // Count all users which are postdocs userGraph.vertices.filter((id, (name, pos)) => pos == "postdoc").count // Count all the edges where src > dst userGraph.edges.filter(e => e.srcId > e.dstId).count // Use the triplets view to create an RDD of facts val facts: RDD[String] = graph.triplets.map(triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1) // Remove missing vertices as well as the edges to connected to them val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing") facts.collect.foreach(println(_)) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 47 / 61
  • 74. Property Operators (1/2) class Graph[VD, ED] { def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2] } They yield new graphs with the vertex or edge properties modified by the map function. The graph structure is unaffected. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 48 / 61
  • 75. Property Operators (2/2) val newGraph = graph.mapVertices((id, attr) => mapUdf(id, attr)) val newVertices = graph.vertices.map((id, attr) => (id, mapUdf(id, attr))) val newGraph = Graph(newVertices, graph.edges) Both are logically equivalent, but the second one does not preserve the structural indices and would not benefit from the GraphX system optimizations. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 49 / 61
  • 76. Map Reduce Triplets Map-Reduce for each vertex Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
  • 77. Map Reduce Triplets Map-Reduce for each vertex // what is the age of the oldest follower for each user? val oldestFollowerAge = graph.mapReduceTriplets( e => (e.dstAttr, e.srcAttr), // Map (a, b) => max(a, b) // Reduce ).vertices Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
  • 78. Structural Operators class Graph[VD, ED] { // returns a new graph with all the edge directions reversed def reverse: Graph[VD, ED] // returns the graph containing only the vertices and edges that satisfy // the vertex predicate def subgraph(epred: EdgeTriplet[VD,ED] => Boolean, vpred: (VertexId, VD) => Boolean): Graph[VD, ED] // a subgraph by returning a graph that contains the vertices and edges // that are also found in the input graph def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 51 / 61
  • 79. Structural Operators Example // Build the initial Graph val graph = Graph(users, relationships, defaultUser) // Run Connected Components val ccGraph = graph.connectedComponents() // Remove missing vertices as well as the edges to connected to them val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing") // Restrict the answer to the valid subgraph val validCCGraph = ccGraph.mask(validGraph) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 52 / 61
  • 80. Join Operators To join data from external collections (RDDs) with graphs. class Graph[VD, ED] { // joins the vertices with the input RDD and returns a new graph // by applying the map function to the result of the joined vertices def joinVertices[U](table: RDD[(VertexId, U)]) (map: (VertexId, VD, U) => VD): Graph[VD, ED] // similarly to joinVertices, but the map function is applied to // all vertices and can change the vertex property type def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)]) (map: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 53 / 61
  • 81. Graph Builders // load a graph from a list of edges on disk object GraphLoader { def edgeListFile( sc: SparkContext, path: String, canonicalOrientation: Boolean = false, minEdgePartitions: Int = 1) : Graph[Int, Int] } // graph file # This is a comment 2 1 4 1 1 2 Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 54 / 61
  • 82. GraphX and Spark GraphX is implemented on top of Spark In-memory caching Lineage-based fault tolerance Programmable partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 55 / 61
  • 83. Distributed Graph Representation (1/2) Representing graphs using two RDDs: edge-collection and vertex- collection Vertex-cut partitioning (like PowerGraph) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 56 / 61
  • 84. Distributed Graph Representation (2/2) Each vertex partition contains a bitmask and routing table. Routing table: a logical map from a vertex id to the set of edge partitions that contains adjacent edges. Bitmask: enables the set intersection and filtering. • Vertices bitmasks are updated after each operation (e.g., mapRe- duceTriplets). • Vertices hidden by the bitmask do not participate in the graph oper- ations. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 57 / 61
  • 85. Summary Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 58 / 61
  • 86. Summary PowerGraph • GAS programming model • Vertex-cut partitioning GraphX • Unifying data-parallel and graph-parallel analytics • Vertex-cut partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 59 / 61
  • 87. Questions? Acknowledgements Some pictures were derived from the Spark web site (http://spark.apache.org/). Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 60 / 61
  • 88. Questions? Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 61 / 61