Massive graphs are ubiquitous in application domains such as social networks, road networks, communication networks, biological networks, and RDF graphs. Such graphs are large (for example, hundreds of millions of nodes and edges, or more) and carry rich information (for example, node/edge weights, labels, and textual content). An important class of problems on such graphs is processing queries about graph structure. Graph reachability, for example, asks whether one node can reach another in a graph. However, the scale of these graphs presents new challenges for efficient query processing.
In this talk, I will introduce two new yet important types of graph reachability queries: weight constraint reachability, which imposes an edge weight constraint on the answer path, and k-hop reachability, which imposes a length constraint on the answer path. With such realistic constraints, we can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many real-world problems, such as QoS routing and trip planning.
3. Graph Reachability
Query: can node u reach node v in a directed graph?
[Figure: example directed graph on nodes 1 to 15 with two example queries, $Q = (1, 11)$ and $Q = (3, 9)$; one is answered Yes and the other No.]
4. Graph Reachability
Has been studied extensively in the literature
A comprehensive survey by Yu and Cheng [1]
Main idea
Find all strongly connected components (SCCs) in
a graph G
Compress G into a DAG by replacing each SCC
with a node
Compute the edge transitive closure on the DAG
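The three steps above can be sketched as follows. This is a minimal illustration, not the method of any particular paper in the survey: it assumes nodes are numbered 0..n-1, uses Kosaraju's algorithm for the SCCs, and stores the closure as one Python set per SCC, which is only practical for small graphs. The function names and the example graph are hypothetical.

```python
from collections import defaultdict

def condense(n, edges):
    """Kosaraju's algorithm. Returns (comp, c, dag): comp[v] is the SCC id of
    node v, c is the number of SCCs, dag is the set of edges between SCCs."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    # Pass 1: iterative DFS on G, recording nodes in order of completion.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:  # all out-neighbors explored
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the reversed graph in decreasing completion order.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            node = stack.pop()
            for nxt in radj[node]:
                if comp[nxt] == -1:
                    comp[nxt] = c
                    stack.append(nxt)
        c += 1
    dag = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, c, dag

def transitive_closure(c, dag):
    """reach[s] is the set of SCC ids reachable from SCC s (including s)."""
    succ = defaultdict(set)
    for a, b in dag:
        succ[a].add(b)
    reach = []
    for s in range(c):
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach.append(seen)
    return reach

def reachable(u, v, comp, reach):
    return comp[v] in reach[comp[u]]

# Hypothetical example: a 3-cycle {0, 1, 2} followed by a chain 2 -> 3 -> 4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
comp, c, dag = condense(5, edges)
reach = transitive_closure(c, dag)
```

After the index is built, each query is a single set lookup on the SCC ids of u and v.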
5. Graph Reachability with Realistic Constraints
General reachability query is not expressive
enough, and the answers may not be
meaningful or practically feasible.
We, for the first time, study graph reachability
when realistic constraints are imposed.
Weight constraint [VLDBJ’13]
Distance constraint [PVLDB’12]
6. Weight Constraint Reachability (WCR)
Input: a weighted undirected graph $G = (V, E, w)$
Query: can node u reach node v with every edge weight on the path satisfying a constraint c?
Example: $Q = (a, g, \le 4)$
A: Yes!
7. Applications of WCR
QoS routing
Is there a route from one node to another in a
communication network, such that each link has a
bandwidth ≥ x ?
Trip planning
Is there a route from one city to another in a road network,
such that each segment has a speed limit within [50, 80]
miles/hour?
Distribution network
Is there a feasible delivery route between two locations,
such that each intermediate warehouse, storage point or
distribution center has a proper handling capacity ≥ x ?
8. A Straightforward Solution
Perform a BFS/DFS from node u, following only edges that satisfy the constraint, until node v is reached or no unvisited nodes are left.
Example: $Q = (a, g, \le 4)$
Cost: $O(m + n)$ time per query!
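A minimal sketch of this baseline, assuming an adjacency-list representation and a caller-supplied predicate on edge weights. The function name `wcr_bfs` and the small example graph are hypothetical and are not the graph drawn on the slide.

```python
from collections import deque

def wcr_bfs(adj, u, v, ok):
    """Return True if v is reachable from u using only edges whose weight
    satisfies the predicate ok. adj maps node -> list of (neighbor, weight)."""
    if u == v:
        return True
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, w in adj.get(x, []):
            if ok(w) and y not in seen:
                if y == v:
                    return True
                seen.add(y)
                queue.append(y)
    return False

# Hypothetical undirected example graph, stored with both edge directions.
adj = {}
for a, b, w in [("a", "b", 2), ("b", "g", 4), ("a", "g", 7)]:
    adj.setdefault(a, []).append((b, w))
    adj.setdefault(b, []).append((a, w))

answer_le_4 = wcr_bfs(adj, "a", "g", lambda w: w <= 4)  # path a-b-g qualifies
answer_le_3 = wcr_bfs(adj, "a", "g", lambda w: w <= 3)  # edge b-g is too heavy
```

Every query touches up to all nodes and edges, which is the cost the index structures in the following slides aim to avoid.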
9. A Nice Property based on MST
Theorem
Two vertices u and v are reachable w.r.t. the weight constraint $\le y$ in G $\Leftrightarrow$ vertices u and v are reachable w.r.t. the constraint $\le y$ in the MST of G.
Example: $Q = (a, g, \le 4)$; query time drops from $O(m + n)$ to $O(n)$.
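The property can be checked empirically on a small graph. The sketch below builds an MST with Kruskal's algorithm and compares the answers of a constrained search on the full graph with those on the MST, for every node pair and threshold. The graph, the function names, and the choice of Kruskal's algorithm are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def kruskal(nodes, edges):
    """Return the MST edges of a connected graph given as (u, v, w) triples."""
    parent = {x: x for x in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def reachable_le(edges, u, v, y):
    """BFS over the undirected edges whose weight is <= y."""
    adj = {}
    for a, b, w in edges:
        if w <= y:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for nb in adj.get(x, []):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return False

# Hypothetical example graph.
nodes = "abcde"
edges = [("a", "b", 1), ("b", "c", 5), ("a", "c", 3),
         ("c", "d", 4), ("d", "e", 2), ("b", "e", 6)]
mst = kruskal(nodes, edges)

# The graph and its MST should agree on every query (u, v, <= y).
agree = all(
    reachable_le(edges, u, v, y) == reachable_le(mst, u, v, y)
    for u, v in combinations(nodes, 2)
    for y in range(0, 8)
)
```

Because the MST has only n - 1 edges, the same search on the MST costs $O(n)$ instead of $O(m + n)$.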
With this property, can we further
reduce the query time and how?
10. Proof of Theorem
Given $G = (V, E, w)$ and its MST $T$, for any vertices $u, v \in V$, denote
$e_{\max} = \arg\max_{e \in P_T(u,v)} w(e)$
The removal of $e_{\max}$ creates two connected components $T_u$ and $T_v$.
Define an edge cut in $G$ as
$C_{uv} = \{ e(a, b) \in E \mid a \in T_u, b \in T_v \}$
Then $e_{\max} \in C_{uv}$ and $w(e_{\max}) = \min_{e \in C_{uv}} w(e)$
according to the cut property of minimum spanning tree.
11. Proof of Theorem
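For any path $P(u, v)$, we have $P(u, v) \cap C_{uv} \ne \emptyset$.
For any $e' \in P(u, v) \cap C_{uv}$, we have
$w(e_{\max}) \le w(e') \le P(u, v)$
(here $P(u, v)$ also denotes the maximum edge weight on the path).
Thus if $w(e_{\max}) > y$, we can conclude $u$ and $v$ are not reachable w.r.t. the constraint $\le y$.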
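12. The Maximum Edge Weight on MST
For any $u \in T_1, v \in T_2$,
$P_T(u, v) = \max_{e \in P_T(u,v)} w(e) = w(e_{\max}) = 4$
Given $Q = (u, v, \le y)$, if $P_T(u, v) = 4 \le y$, then yes!
[Figure: the MST split by its maximum-weight edge $e_{\max}$ (weight 4) into subtrees $T_1$ and $T_2$.]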
13. This Property can be Recursively Applied
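For any $u \in T_{11}, v \in T_{12}$,
$P_T(u, v) = \max_{e \in P_T(u,v)} w(e) = w(e_{\max}) = 3$
[Figure: subtree $T_1$ split by its maximum-weight edge $e_{\max}$ (weight 3) into subtrees $T_{11}$ and $T_{12}$.]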
15. Query on the Edge Index Tree
Given $Q = (u, v, \le y)$, we compute
$P_T(u, v) = w(LCA(u, v))$
where $LCA(u, v)$ is the lowest common ancestor of $u$ and $v$ in the edge index tree.
$LCA(u, v)$ can be computed in $O(1)$ time based on an $O(n)$ size index.
Then we only need to test whether $P_T(u, v) = w(LCA(u, v)) \le y$ or not to answer $Q = (u, v, \le y)$.
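The slide that defines the edge index tree is not part of this transcript, so the sketch below rests on an assumption drawn from the recursive splitting shown earlier: each internal node stands for an MST edge, the root is the maximum-weight edge, and the leaves are the graph vertices. It builds that tree bottom-up by merging components in increasing weight order, and it finds the LCA by walking parent pointers rather than with an $O(1)$ LCA index. All names and the example MST are hypothetical.

```python
def build_edge_index_tree(nodes, mst_edges):
    """Return (parent, weight). Leaves are graph vertices; each internal node
    ("edge", i) represents one MST edge and stores its weight."""
    parent, weight = {}, {}
    uf = {x: x for x in nodes}       # union-find over MST components
    top = {x: x for x in nodes}      # current tree root of each component
    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x
    for i, (u, v, w) in enumerate(sorted(mst_edges, key=lambda e: e[2])):
        ru, rv = find(u), find(v)
        node = ("edge", i)
        weight[node] = w
        parent[top[ru]] = node       # both merged subtrees hang under the edge
        parent[top[rv]] = node
        uf[ru] = rv
        top[rv] = node
    return parent, weight

def lca(parent, u, v):
    """Naive LCA by parent walking (stands in for a constant-time LCA index)."""
    ancestors = set()
    x = u
    while x is not None:
        ancestors.add(x)
        x = parent.get(x)
    x = v
    while x not in ancestors:
        x = parent[x]
    return x

def wcr_query(parent, weight, u, v, y):
    """Answer Q = (u, v, <= y) by testing w(LCA(u, v)) <= y."""
    if u == v:
        return True
    return weight[lca(parent, u, v)] <= y

# Hypothetical MST on vertices a..e.
mst = [("a", "b", 1), ("d", "e", 2), ("a", "c", 3), ("c", "d", 4)]
parent, weight = build_edge_index_tree("abcde", mst)
max_on_path_a_e = weight[lca(parent, "a", "e")]  # path a-c-d-e has weights 3, 4, 2
```

A query $Q = (u, v, \ge x)$ could be handled the same way on a tree built from a maximum spanning tree, with the comparison reversed.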
17. Complexity Analysis
Costs to process queries $Q = (u, v, \le y)$ or $Q = (u, v, \ge x)$:
Query Time   Index Size   Index Time
$O(1)$       $O(n)$       $O(n)$
It can be easily extended to process $Q = (u, v, [x, y])$.
18. Answering WCR with a Disk-Resident Index
What happens if the edge index tree is too large to
fit in memory?
Problem: each query costs a large constant number of random I/O accesses if we store the edge index tree on disk
Our solution: design a disk-resident index and an
I/O efficient algorithm to answer a WCR query.
19. A Vertex Coding Idea
We pick an “arbitrary” node of an MST as the root to
get a rooted MST.
$P_T(a, g) = \max\{ P_T(a, b), P_T(b, g) \} = 4$
$P_T(a, e) = \max\{ P_T(a, f), P_T(f, e) \} = 2$
$P_T(u, v) = \max\{ P_T(u, LCA(u, v)), P_T(v, LCA(u, v)) \}$
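The exact layout of a vertex code is not given in this transcript, so the sketch below assumes one plausible form: code(u) lists, for every ancestor of u in the rooted tree, the maximum edge weight on the path from u to that ancestor, ordered from the root down. The LCA is then the deepest ancestor shared by both codes, and the query applies the max formula above. The names `build_codes` and `path_max` and the example tree are hypothetical.

```python
NEG_INF = float("-inf")

def build_codes(root, children):
    """children maps node -> list of (child, edge weight).
    code[u] is a root-first list of (ancestor, max weight on path u..ancestor)."""
    code = {root: []}
    stack = [root]
    while stack:
        x = stack.pop()
        for c, w in children.get(x, []):
            # Extend every entry of the parent with the new edge, then add the parent.
            code[c] = [(a, max(m, w)) for a, m in code[x]] + [(x, w)]
            stack.append(c)
    return code

def path_max(code, u, v):
    """Maximum edge weight on the tree path between u and v."""
    # Append the node itself so that ancestor/descendant pairs are handled.
    cu = code[u] + [(u, NEG_INF)]
    cv = code[v] + [(v, NEG_INF)]
    i = 0  # index of the deepest common ancestor found so far (the root at first)
    while i + 1 < min(len(cu), len(cv)) and cu[i + 1][0] == cv[i + 1][0]:
        i += 1
    return max(cu[i][1], cv[i][1])

# Hypothetical rooted tree: r -(4)- a -(2)- c and r -(1)- b -(5)- d.
children = {"r": [("a", 4), ("b", 1)], "a": [("c", 2)], "b": [("d", 5)]}
code = build_codes("r", children)
```

With this layout a query needs only code(u) and code(v), which is what makes a two-read disk lookup possible once each code fits in one page.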
21. A Complexity Issue in Vertex Coding
We store the code for every vertex on the disk.
Given a query $Q = (u, v, \le y)$, $code(u)$ and $code(v)$ are read from the disk to compute $P_T(u, v)$.
Space complexity: $O(n \cdot depth) \subseteq O(n^2)$
Query I/O complexity: $O(\frac{depth}{B}) \subseteq O(\frac{n}{B})$, where B is the page size
22. Bound the Tree Depth by Balancing
We will balance the rooted MST.
Definition (Median Node)
Given an MST $T$, a node $v \in V(T)$ is a median node of $T$ if, for each neighbor $v'$ of $v$, the following holds:
$|V(T_{v'})| \le \frac{|V(T)|}{2}$
The median node always and uniquely exists in a tree.
We use the median node of an MST as its root. For each subtree underneath the root, we use the median node concept to balance the subtree recursively.
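A minimal sketch of the recursive balancing, under the assumption that the condition means: once v is removed, the piece of the tree on the side of each neighbor holds at most half of the nodes. The code picks such a node as the root, removes it, and recurses into each remaining piece; when more than one node qualifies it simply takes the first one found. Function names and the example path are hypothetical.

```python
import math

def find_median(adj, start, removed):
    """Return a median node of the component containing start,
    ignoring nodes that were already chosen as roots."""
    parent = {start: None}
    order = [start]
    for x in order:                      # BFS; the list grows while we iterate
        for y in adj[x]:
            if y not in removed and y not in parent:
                parent[y] = x
                order.append(y)
    size = {x: 1 for x in order}
    for x in reversed(order[1:]):        # children are processed before parents
        size[parent[x]] += size[x]
    total = len(order)
    for x in order:
        biggest = total - size[x]        # the piece on the parent's side
        for y in adj[x]:
            if y not in removed and parent.get(y) == x:
                biggest = max(biggest, size[y])
        if 2 * biggest <= total:
            return x

def balance(adj):
    """Return (tree_parent, depth) of the balanced tree built from medians."""
    removed, tree_parent, depth = set(), {}, {}
    stack = [(next(iter(adj)), None, 0)]
    while stack:
        start, par, d = stack.pop()
        m = find_median(adj, start, removed)
        tree_parent[m], depth[m] = par, d
        removed.add(m)
        for y in adj[m]:
            if y not in removed:
                stack.append((y, m, d + 1))
    return tree_parent, depth

# Hypothetical worst case for an unbalanced rooting: a path on 15 nodes.
n = 15
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
tree_parent, depth = balance(adj)
max_depth = max(depth.values())
```

Rooted at an endpoint, the path would have depth 14; after balancing, the depth stays within $\log_2 n$.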
23. Tree Balancing: Example
Theorem
The depth of the balanced tree is at most $\log_2 n$.
Corollary
code(u) for any node u contains at most $\log_2 n$ entries, thus can fit into one page (i.e., $\log_2 n \le B$, where B = 1024 or 4096 bytes).
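As a rough check of the corollary, assume (hypothetically, since the entry size is not stated here) that each entry of code(u) takes 8 bytes. For a graph with $n = 10^8$ nodes:

```latex
\[
\lceil \log_2 10^{8} \rceil = 27 \text{ entries}, \qquad
27 \times 8 = 216 \text{ bytes} \le B = 1024 \text{ bytes}.
\]
```

Under that assumption a code fits comfortably in the smaller page size, so reading code(u) and code(v) takes one page access each.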
24. Complexity Analysis
Costs to process queries $Q = (u, v, \le y)$ or $Q = (u, v, \ge x)$:
          Query Time   Index Size      Index Time
Memory    $O(1)$       $O(n)$          $O(n)$
Disk      2 I/Os       $O(n \log n)$   $O(n \log n)$
It can be easily extended to process $Q = (u, v, [x, y])$.
26. Experiment Settings
2.67 GHz CPU, 12 GB memory, 10,000 test queries
Memory-based methods
BFS/DFS on graph
MST-Index
Edge-Index
Disk-based methods
External BFS/DFS on graph
External MST
Balanced Tree Index
27. Memory-based Algorithms: Query Time
Query Time in Microseconds ($10^{-6}$ seconds)
Network    DFS      BFS      MST-Index   Edge-Index
Facebook   1,098    1,429    1           1
USARN      32,462   30,868   1,382       4
28. Memory-based Algorithms: Index Size
Index Size in GB
Network    DFS    BFS    MST-Index   Edge-Index
Facebook   0.01   0.01   0.0008      0.0025
USARN      0.89   0.89   0.28        0.95
29. Memory-based Algorithms: Index Time
Index Time in Seconds
Network    DFS    BFS    MST-Index   Edge-Index
Facebook   0.4    0.4    0.03        0.06
USARN      33.7   33.7   9.9         39.2
30. Disk-based Algorithms: Query Time
Query Time in Microseconds ($10^{-6}$ seconds)
Network    Ext-DFS   Ext-BFS   Ext-MST   Balanced-Index
Facebook   31,368    48,152    772       11
USARN      294,521   64,471    422,810   18
31. Disk-based Algorithms: Index Size
Index Size in GB
Network    Ext-DFS   Ext-BFS   Ext-MST   Balanced-Index
Facebook   0.01      0.01      0.0008    0.0035
USARN      0.89      0.89      0.28      0.52
32. Disk-based Algorithms: Index Time
Index Time in Seconds
Network    Ext-DFS   Ext-BFS   Ext-MST   Balanced-Index
Facebook   0.6       0.6       0.048     0.146
USARN      48.8      48.8      12.2      118.8
33. Summary and Contribution
The first study on WCR query
Computing Weight Constraint Reachability in Large Networks. The VLDB Journal, 22(3):275-294, 2013.
Design two novel and efficient solutions
Memory: edge index tree for O(1) query time
Disk: balanced tree + vertex coding for 2 I/O
query cost
34. K-Hop Reachability (K-Reach)
Input: an unweighted directed graph
Query: can node u reach node v via a path of length
no more than k?
$Q: a \xrightarrow{3} f$
A: Yes!
$Q: a \xrightarrow{3} g$
A: No!
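For reference, the query semantics can be expressed as a depth-limited BFS. This is only a baseline sketch, not the index-based method of the K-Reach paper; the function name and the chain-shaped example graph are hypothetical and do not reproduce the graph on the slide.

```python
from collections import deque

def k_hop_reachable(adj, u, v, k):
    """Return True if v can be reached from u by a directed path of at most
    k edges. adj maps node -> list of out-neighbors."""
    if u == v:
        return True
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:          # do not expand beyond k hops
            continue
        for y in adj.get(x, []):
            if y not in dist:
                if y == v:
                    return True
                dist[y] = dist[x] + 1
                queue.append(y)
    return False

# Hypothetical directed chain a -> b -> c -> f -> g.
adj = {"a": ["b"], "b": ["c"], "c": ["f"], "f": ["g"]}
a_to_f = k_hop_reachable(adj, "a", "f", 3)   # exactly 3 hops away
a_to_g = k_hop_reachable(adj, "a", "g", 3)   # 4 hops away
```

Because BFS visits nodes in order of hop distance, the first time v is seen is along a shortest path, so the hop limit is checked correctly.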
35. Applications of K-Reach
In a wireless or sensor network, where a broadcast message may get lost during any hop, the probability of reception degrades exponentially over multiple hops.
In social networks, the degree of acquaintance may
even decrease super-exponentially (i.e., two persons
may hardly know each other if they are just 3 hops
apart).
K-Reach is helpful since it can model the level and sphere of influence.
36. Vertex Cover
A set of vertices $S \subseteq V$ is a vertex cover of a graph $G = (V, E)$, if for every edge $(u, v) \in E$, we have $\{u, v\} \cap S \ne \emptyset$.
The problem of computing the minimum vertex cover is NP-hard.
But there is a polynomial time algorithm for computing a 2-approximate minimum vertex cover.
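One standard polynomial-time 2-approximation takes both endpoints of every edge in a greedily built maximal matching. The sketch below shows that technique; whether the K-Reach index uses exactly this variant is not stated in this transcript, and the function name and example edges are hypothetical.

```python
def vertex_cover_2approx(edges):
    """Greedy maximal matching: whenever an edge has neither endpoint in the
    cover yet, add both endpoints. The result is at most twice the optimum."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Hypothetical example: the path a - b - c - d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
cover = vertex_cover_2approx(edges)
is_cover = all(u in cover or v in cover for u, v in edges)
```

On this path the optimum cover is {b, c} with two vertices, and the greedy result has four, which meets the factor-2 bound exactly.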
52. Summary and Contribution
The first study on K-Reach query
K-Reach: Who is in Your Small World. Proceedings of
the VLDB Endowment, 5(11):1292-1303, 2012.
An efficient vertex cover-based index can
answer both classic reachability and k-hop
reachability queries
53. Conclusions
We study two graph reachability queries,
WCR and K-Reach, when realistic constraints
are imposed. This makes the answers to the
queries more meaningful and practically
useful in many applications.
We exploit the nice property for each query
type and design efficient indices for
processing these two types of queries.
54. Joint work with (in alphabetical order)
Lijun Chang
James Cheng
Miao Qiao
Lu Qin
Zechao Shang
Haixun Wang
Jeffrey Xu Yu
Philip S. Yu
55. References
[1] Jeffrey Xu Yu, Jiefeng Cheng: Graph Reachability
Queries: A Survey. Managing and Mining Graph
Data 2010: 181-215
[2] Miao Qiao, Hong Cheng, Lu Qin, Jeffrey Xu Yu,
Philip S. Yu, Lijun Chang: Computing weight
constraint reachability in large networks. VLDB J.
22(3): 275-294 (2013)
[3] James Cheng, Zechao Shang, Hong Cheng,
Haixun Wang, Jeffrey Xu Yu: K-Reach: Who is in
Your Small World. PVLDB 5(11): 1292-1303 (2012)