5. Graph centrality
This talk
Path summation: ∑_ℓ f(paths of length ℓ)
Local Katz score: ∑_ℓ α^ℓ · (number of paths of length ℓ between i and j)
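As a quick sanity check of the path-summation view, here is a minimal sketch (the 4-node graph is an illustrative assumption, not from the talk): since (A^ℓ)_{ij} counts paths of length ℓ between i and j, the truncated series ∑_ℓ α^ℓ A^ℓ should approach the resolvent (I − αA)^{-1}.

```python
import numpy as np

# Toy undirected graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

alpha = 0.2  # small enough that alpha * ||A||_2 < 1, so the series converges
series = np.zeros_like(A)
Al = np.eye(4)  # A^0 = I: one "path of length 0" from each node to itself
for l in range(60):
    series += (alpha ** l) * Al
    Al = Al @ A

closed_form = np.linalg.inv(np.eye(4) - alpha * A)
print(np.allclose(series, closed_form))  # truncated path sum matches the resolvent
```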
6. A – adjacency matrix
L – Laplacian matrix
P – random walk transition matrix
Katz score: K_{i,j} = [(I − α A^T)^{-1}]_{i,j}
Commute time: C_{i,j} = vol(G) (L^+_{i,i} + L^+_{j,j} − 2 L^+_{i,j})
PageRank: (I − α P^T) x = (1 − α) e/n, so X_{i,j} = (1 − α) [(I − α P^T)^{-1}]_{i,j}
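The three definitions above can be sketched directly (a hedged toy example; the 4-node graph and the damping value are illustrative assumptions):

```python
import numpy as np

# Same toy undirected graph: triangle 0-1-2 plus pendant vertex 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]
d = A.sum(axis=1)
L = np.diag(d) - A          # graph Laplacian
P = A / d[:, None]          # random-walk transition matrix (row-stochastic)

# PageRank from the linear system (I - alpha*P^T) x = (1 - alpha) e / n.
alpha = 0.85
x = np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * np.ones(n) / n)
print(x.sum())  # a PageRank vector sums to 1

# Commute time from the Laplacian pseudoinverse:
# C_{i,j} = vol(G) * (L+_{i,i} + L+_{j,j} - 2 L+_{i,j})
Lp = np.linalg.pinv(L)
vol = d.sum()
C01 = vol * (Lp[0, 0] + Lp[1, 1] - 2 * Lp[0, 1])
print(C01 > 0)
```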
7. USES FOR CENTRALITY
Ranking features for web search/classification
Najork, M. A.; Zaragoza, H. & Taylor, M. J. HITS on the web: How does it compare?
Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S. Link analysis for Web spam detection
Interesting nodes
GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank
8. USES FOR CENTRALITY
Ranking networks of comparisons
Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E. Sensitivity and Stability of Ranking Vectors
Clustering or community detection
Andersen, R.; Chung, F. & Lang, K. Local Graph Partitioning using PageRank Vectors
Link prediction
Savas et al. (more on this in about 90 minutes)
10. MATRICES, MOMENTS, QUADRATURE
Estimate a quadratic form: l ≤ x^T f(Z) x ≤ u
Commute time: (e_i − e_j)^T L^+ (e_i − e_j)
Katz: (1/4)(e_i + e_j)^T (I − α A^T)^{-1} (e_i + e_j) − (1/4)(e_i − e_j)^T (I − α A^T)^{-1} (e_i − e_j)
Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential
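The Katz line above rests on the polarization identity for a symmetric matrix M: e_i^T M e_j = (1/4)(e_i + e_j)^T M (e_i + e_j) − (1/4)(e_i − e_j)^T M (e_i − e_j), which turns one pairwise score into two quadratic forms that quadrature can bound. A small numerical check (the random symmetric test matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                          # any symmetric matrix works here
alpha = 0.1 / np.linalg.norm(A, 2)   # keep I - alpha*A well conditioned
M = np.linalg.inv(np.eye(5) - alpha * A)  # M = (I - alpha*A)^{-1}, symmetric

i, j = 1, 3
ei, ej = np.eye(5)[i], np.eye(5)[j]
pair = ei @ M @ ej                   # pairwise Katz-style score M_{i,j}
polar = 0.25 * ((ei + ej) @ M @ (ei + ej)) \
      - 0.25 * ((ei - ej) @ M @ (ei - ej))
print(np.isclose(pair, polar))       # the two expressions agree
```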
11. MMQ – THE BIG IDEA
Quadratic form
→ Weighted sum (think: A is s.p.d., use the EVD)
→ Stieltjes integral ("a tautology")
→ Quadrature approximation
→ Matrix equation (Lanczos)
David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 22 of 47
12. MMQ PROCEDURE
Goal: bound the quadratic form x^T f(A) x.
Given: l and u with l ≤ λ_min(A) and λ_max(A) ≤ u.
1. Run k steps of Lanczos on A, starting with x.
2. Compute the quadrature rule with an additional eigenvalue at u, and set one bound. This corresponds to a Gauss–Radau rule with u as a prescribed node.
3. Compute the quadrature rule with an additional eigenvalue at l, and set the other bound. This corresponds to a Gauss–Radau rule with l as a prescribed node.
4. Output the pair as lower and upper bounds on x^T f(A) x.
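The Lanczos core of step 1 can be sketched as follows. This is a hedged illustration: it uses the plain Gauss-rule estimate ‖x‖² e_1^T f(T_k) e_1 from the tridiagonal T_k, and omits the Gauss–Radau modification (the extra prescribed node at u or l) that turns the estimate into a bound. The test matrix and step count are assumptions.

```python
import numpy as np

def lanczos(A, x, k):
    """k steps of Lanczos on symmetric A starting from x; returns tridiagonal T_k."""
    n = len(x)
    Q = np.zeros((n, k + 1))
    a = np.zeros(k)   # diagonal of T_k
    b = np.zeros(k)   # off-diagonal of T_k
    Q[:, 0] = x / np.linalg.norm(x)
    for m in range(k):
        w = A @ Q[:, m]
        if m > 0:
            w -= b[m - 1] * Q[:, m - 1]
        a[m] = Q[:, m] @ w
        w -= a[m] * Q[:, m]
        b[m] = np.linalg.norm(w)
        if b[m] > 0:
            Q[:, m + 1] = w / b[m]
    return np.diag(a) + np.diag(b[:-1], 1) + np.diag(b[:-1], -1)

rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50 * np.eye(50)        # s.p.d. test matrix, modest condition number
x = rng.standard_normal(50)

f = lambda M: np.linalg.inv(M)       # f(z) = 1/z, as in the Katz/PageRank resolvents
exact = x @ f(A) @ x
T = lanczos(A, x, 10)
estimate = (x @ x) * f(T)[0, 0]      # Gauss-rule estimate of x^T f(A) x
print(abs(exact - estimate) / abs(exact) < 1e-5)
```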
13. How well does it work?
[Figure: lower/upper bounds (left) and error (right) versus the number of matrix-vector products (5–30), for Katz scores on the arxiv graph with a "hard" α = 1/(‖A‖₂ + 1); the error axis spans 10^0 down to 10^-5.]
15. KATZ SCORES ARE LOCALIZED
The solution vectors k of (I − α A^T) k = e_i are highly localized.
Up to 50 neighbors is 99.65% of the total mass.
18. TOP-K ALGORITHM FOR KATZ
Approximate the solution of (I − α A^T) x = e_i, where the right-hand side e_i is sparse.
Keep the iterate x sparse too.
Ideally, don't "touch" all of A.
This is possible for personalized PageRank!
19. Richardson for Ax = b
x^{(k+1)} = x^{(k)} + r^{(k)}
r^{(k+1)} = b − A x^{(k+1)}
For A = A^T, A ⪰ 0, this is equivalent to gradient descent on min_x x^T A x − 2 x^T b.
What about coordinate descent?
Gauss-Southwell for Ax = b:
x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j    (how to pick j?)
r^{(k+1)} = r^{(k)} − r_j^{(k)} A e_j
Frequently "rediscovered" for PageRank: McSherry (WWW 2005), Berkhin (JIM 2007), Andersen, Chung & Lang (FOCS 2006).
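The Gauss-Southwell iteration above can be sketched for personalized PageRank, i.e. for solving (I − α P^T) x = (1 − α) e_i. This is a hedged toy implementation: the dense 4-node graph, the tolerance, and the helper name `gauss_southwell_ppr` are illustrative assumptions; a real implementation would touch only the sparse column of P^T for the chosen j.

```python
import numpy as np

def gauss_southwell_ppr(A, i, alpha=0.85, tol=1e-10, max_steps=100000):
    """Relax the coordinate with the largest residual until the residual is tiny."""
    n = A.shape[0]
    d = A.sum(axis=1)
    PT = (A / d[:, None]).T          # P^T; its columns are sparse in practice
    x = np.zeros(n)
    r = np.zeros(n)
    r[i] = 1 - alpha                 # residual of b - (I - alpha*P^T) x at x = 0
    steps = 0
    while r.max() > tol and steps < max_steps:
        j = int(np.argmax(r))        # pick j as the largest residual (Luo-Tseng rule)
        rj = r[j]
        x[j] += rj                   # x update: x += r_j e_j
        r[j] = 0.0                   # residual update: r -= r_j (I - alpha*P^T) e_j
        r += alpha * rj * PT[:, j]   # ...which pushes alpha*r_j along j's out-edges
        steps += 1
    return x

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.85
x = gauss_southwell_ppr(A, 0, alpha)
exact = np.linalg.solve(np.eye(4) - alpha * (A / A.sum(1)[:, None]).T,
                        (1 - alpha) * np.eye(4)[0])
print(np.allclose(x, exact, atol=1e-8))
```

The residual stays entrywise nonnegative throughout, which is why picking the largest entry and stopping on `r.max()` is safe here.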
21. NEW CONVERGENCE THEORY
Katz and PageRank are equivalent if α < 1/‖A‖₁.
Gauss-Southwell converges when α < 1/‖A‖₂ (Luo and Tseng, 1992) if j is picked as the largest residual.
Read all about it:
Bonchi, Esfandiar, Gleich, Greif & Lakshmanan. Fast matrix computations for pair-wise and column-wise commute times and Katz scores. J. Internet Mathematics (to appear).
23. OPEN QUESTIONS
I can't find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts?
How can we show that the method converges for a non-symmetric matrix when (I − α P^T) is not diagonally dominant?
27. Overlapping Clusters
Use the redundancy to reduce communication when solving a PageRank problem.
Andersen, Gleich & Mirrokni. Overlapping clusters for distributed computation. WSDM 2012 (to appear).
29. KEY POINTS
Utilize personalized PageRank vectors to find the clusters with "good" conductance scores.
Define "core" vertices for each cluster. Find a good way to cover the graph with these clusters.
Use restricted additive Schwarz to solve (thanks Prof. Szyld and Prof. Frommer!).
30. All nodes solve locally using the coordinate descent method.
31. All nodes solve locally using the coordinate descent method. A core vertex for the gray cluster.
32. All nodes solve locally using the coordinate descent method. Red sends residuals to white; white sends residuals to red.
33. White then uses the coordinate descent method to adjust its solution. This will cause communication to red/blue.
34. It works!
[Figure: relative work (0–2) versus volume ratio (1–1.7; how much more of the graph we need to store), showing swapping probability and PageRank communication for usroads and web-Google, against a Metis partitioner baseline.]
35. PERSONALIZED PAGERANK CLUSTERS
Solve (I − α P^T) x = (1 − α) e_i to a large degree-weighted tolerance ε.
Sweep over the vertices in order of their degree-normalized rank, and find the best conductance set.
This gives a Cheeger-like inequality. (Not a heuristic.)
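The sweep described above can be sketched as follows (a hedged illustration: the two-triangle graph and the stand-in score vector `x` are assumptions, where a real run would use an approximate personalized PageRank vector):

```python
import numpy as np

def sweep_cut(A, x):
    """Sort vertices by degree-normalized score x_v/d_v, scan prefixes of that
    ordering, and return the prefix set with the smallest conductance
    cut(S) / min(vol(S), vol(V \\ S))."""
    d = A.sum(axis=1)
    order = np.argsort(-x / d)           # degree-normalized rank, descending
    vol_total = d.sum()
    in_S = np.zeros(len(x), dtype=bool)
    vol_S, cut = 0.0, 0.0
    best_phi, best_k = np.inf, 0
    for k, v in enumerate(order[:-1]):   # skip the full vertex set (empty cut)
        in_S[v] = True
        vol_S += d[v]
        to_S = A[v, in_S].sum()          # v's edges into S become internal...
        cut += d[v] - 2 * to_S           # ...and its other edges join the cut
        phi = cut / min(vol_S, vol_total - vol_S)
        if phi < best_phi:
            best_phi, best_k = phi, k + 1
    return set(order[:best_k].tolist()), best_phi

# Two triangles joined by a single edge; the sweep should isolate one triangle.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
x = np.array([0.3, 0.3, 0.25, 0.06, 0.05, 0.04])  # stand-in for a PPR vector from node 0
S, phi = sweep_cut(A, x)
print(sorted(S), phi)  # the first triangle, conductance 1/7
```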
36. CORE VERTICES
Compute the expected "leave time" for each vertex in a cluster.
Keep increasing the threshold for a "good" vertex until every vertex is core in some cluster.
Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until …