Minimizing cost in distributed multiquery processing applications
1. Minimizing Communication Cost in Distributed Multi-query Processing Jian Li, Amol Deshpande, Samir Khuller Department of Computer Science, University of Maryland Presented by: Luis Galárraga Saarland University July 7th, 2010
31. Each query comes with a plan in the form of a directed tree. [Figure: query plan tree. Labels: destination node; data sources involved (S_i, S_j); join S_i x S_j; w; data sizes z(S_i), z(S_j), z(S_i x S_j).]
32. More formally Given the topology graph G c and a set of trees representing the query plans, our goal is to find a data movement plan that minimizes the total communication cost incurred while executing the queries.
33. Problem formulation [Figure: topology G_c with nodes A to F, sources S_1 to S_6 and weighted edges, next to three query plans: (S_1 x S_2) x S_4 with destination C, S_2 x S_6 with destination D, and S_2 x S_5 with destination B. Numbers in parentheses are the data sizes and edge weights shown in the figure.]
71. Tree topology – Step 2 – Single query [Figure: the topology G_c (nodes A to F, sources S_1 to S_6) split into the parts G_c^u and G_c^v, shown with the plan of the query (S_1 x S_2) x S_4 with destination C.]
75. Tree topology – Step 2 – Multiple queries [Figure: the three query plans (S_1 x S_2) x S_4 with destination C, S_2 x S_6 with destination D, and S_2 x S_5 with destination B, annotated with data sizes.]
79. Then add the edges according to this rule (for every edge in G c ):
If more than one source resides at a node, we can create one node per relation and link those nodes with zero-weight edges. Replication becomes part of the query plan optimization: in that case the tree query plan given to the algorithm should be the minimum-weight tree, counting only the edge weights of the topology. The shortest paths between every pair of nodes can be precomputed (for joins of three relations we could exploit associativity). We then only need to pick the group of replicas whose mutual distance is smallest, and this choice is made only at the leaves of the query plan.
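The two ideas in this note can be sketched in a few lines of Python. This is only an illustration under assumptions: the three-node topology, its weights, and the helper names split_colocated and all_pairs_shortest_paths are hypothetical, and each relation is attached to its host node with a zero-weight edge (which makes co-located relations zero distance apart). Floyd-Warshall is used here as one standard way to precompute all-pairs shortest paths; the slides do not name a specific algorithm.

```python
from itertools import product


def split_colocated(nodes, edges, hosted):
    """Add one node per relation, tied to its host by a zero-weight edge.

    hosted maps a topology node to the relations stored on it
    (hypothetical input format, chosen for this sketch).
    """
    nodes = list(nodes)
    edges = dict(edges)
    for host, relations in hosted.items():
        for rel in relations:
            nodes.append(rel)
            edges[(host, rel)] = 0
    return nodes, edges


def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall on an undirected graph given as {(u, v): weight}."""
    inf = float("inf")
    dist = {(u, v): (0 if u == v else inf) for u, v in product(nodes, repeat=2)}
    for (u, v), w in edges.items():
        dist[u, v] = min(dist[u, v], w)
        dist[v, u] = min(dist[v, u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


# Hypothetical topology: A --10-- B --5-- C, with S1 and S2 both stored on A.
nodes, edges = split_colocated(
    ["A", "B", "C"],
    {("A", "B"): 10, ("B", "C"): 5},
    {"A": ["S1", "S2"], "C": ["S3"]},
)
dist = all_pairs_shortest_paths(nodes, edges)
```

With the distance table in hand, choosing among replicas of a leaf relation reduces to looking up the candidate with the smallest distance to the other relations it is joined with.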
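Min-cut can be solved in polynomial time, since by the max-flow min-cut theorem it reduces to max-flow. Max-flow running times: Ford-Fulkerson O(|E| * f), where f is the value of the maximum flow (integer capacities); Edmonds-Karp O(|V| * |E|^2); Dinitz's algorithm O(|V|^2 * |E|).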
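As a concrete reference for the note above, here is a minimal Edmonds-Karp sketch that returns both the max-flow value and the source side of a minimum cut. The function name min_cut, the input format, and the small example network are assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict, deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest (BFS) path.

    capacity maps a directed edge (u, v) to its capacity.
    Returns (cut value, set of nodes on the source side of the cut).
    """
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[u, v] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse edge, needed to undo flow

    flow = 0
    while True:
        # BFS in the residual graph, recording each node's predecessor.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[u, v] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: the nodes reached by the last BFS
            # form the source side of a minimum cut.
            return flow, set(parent)

        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v], v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u, v] -= bottleneck
            residual[v, u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical network with source "s" and sink "t".
value, source_side = min_cut(
    {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3},
    "s",
    "t",
)
```

In this example both edges leaving "s" end up saturated, so the cut separates "s" from the rest of the network.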