Iterative graph computations are a key component in many big data applications. In my work, I have developed new frameworks to support efficient implementation of iterative graph computations, new distributed systems for analyzing dynamic graphs, and new algorithms for fast approximate computation over graphs that depend on time or on some parameters. In this talk, I focus on one example: the algorithmic challenge of efficient edge-weight personalization for PageRank.
I will first introduce two different ways to personalize PageRank: node-weight personalization and edge-weight personalization. Node-weight personalization changes the teleport probabilities and edge-weight personalization changes the transition probabilities in a random-surfer model. While there exist many efficient methods for node-weight personalization, fast edge-weight personalization has been an open problem for over a decade.
I will then describe the first fast method for computing PageRank on general graphs when the edge weights are personalized. Based on model reduction, this method is nearly five orders of magnitude faster than the standard approach for an example learning-to-rank application. This speed improvement enables interactive computation of a class of ranking results that previously could only be computed offline.
3. Ubiquitous Graph Data
Social Networks, Web, Recommendation Systems, Computer Vision, Bioinformatics, Physical Simulations
New Challenges in the Big Data Era
4. My Work
• Fast Iterative Graph Computation with Block Updates
W. Xie, G. Wang, D. Bindel, A. Demers, J. Gehrke. PVLDB 6(14)
• Dynamic Interaction Graph with Probabilistic Edge Decay
W. Xie, Y. Tian, Y. Sismanis, A. Balmin, P. J. Haas. ICDE 2015
• Edge-Weighted Personalized PageRank:
Breaking A Decade-Old Performance Barrier
W. Xie, D. Bindel, A. Demers, J. Gehrke.
Accepted by KDD 2015
[Figure: vertex-oriented vs. block-oriented computation — to-be-updated, dependent, and unrelated vertices around a block boundary]
[Figure: dynamic interaction graph — Alice, Bob, and Carol connected by edges timestamped from 5 years ago to 1 month ago]
6. Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
9. PageRank
• PageRank model
– A random walker moves in the graph
– At each step
• Move to an adjacent node (with prob. α), or
• Teleport to a new node (with prob. 1 − α)
• PageRank vector: stationary vector for this process
x = α P x + (1 − α) v
(P: transition matrix, x: PageRank vector, v: teleport vector)
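To make the notation concrete, here is a minimal power-iteration sketch of this equation (an illustrative implementation, not the talk's code; a dense, column-stochastic P is assumed for clarity):

```python
# Minimal power-iteration sketch for x = alpha * P x + (1 - alpha) * v.
# Assumes P is a dense column-stochastic transition matrix (P[j, i] is the
# probability of moving from node i to node j) and v sums to 1.
import numpy as np

def pagerank(P, v, alpha=0.85, tol=1e-10, max_iter=1000):
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:   # L1 change below tolerance: converged
            return x_new
        x = x_new
    return x
```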
12. Personalized PageRank
• Node-weighted personalized PageRank
– Topic-Sensitive PageRank (TSPR) [Haveliwala02]
– Localized PageRank [Bahmani+10]
• Edge-weighted personalized PageRank
– ObjectRank [Balmin+05] / PopRank [Nie+05]
– TwitterRank [Weng+10]
– Learning to Rank [BackstromL11]
(Usually a small number of global parameters, e.g. 5–10)
13. ObjectRank on DBLP
[Figure: fragment of the DBLP entity graph — papers ("Index Selection for OLAP", "Data Cube: A Relational Aggregation Operator…", "Modeling Multidimensional Databases", "Range Queries in OLAP Data Cubes"), the conference ICDE 1997 with its forum ICDE, and the author Rakesh Agrawal, connected by typed edges: cites, contains, has instance, writes]
14. Personalized PageRank
• Node-weighted personalized PageRank
– Topic-Sensitive PageRank (TSPR)
– Localized PageRank
• Edge-weighted personalized PageRank
– ObjectRank / PopRank
– TwitterRank
– Learning to Rank
Question: Which way to personalize?
Answer: It largely depends on whether the metadata is associated with vertices or edges.
15. Personalized PageRank
• Node-weighted personalized PageRank
– Efficient algorithms exploiting the structure of v
• Linearity in the parameter w
• Sparsity
• Edge-weighted personalized PageRank
– No efficient algorithm for general graphs
• No linearity in w
16. Edge Personalization Computation
• Ad-hoc algorithms for special graphs / specific applications
– ObjectRank [Balmin+05] / ScaleRank [Hristidis+14]
– Only apply to limited classes of graphs
• Hybrid strategies that linearly combine pre-computed PageRank vectors
– TwitterRank [Weng+10]
• Computing the parameter vector offline
– Many learning-to-rank applications [Nie+05, BackstromL11]
17. Edge Personalization Computation
Can we efficiently compute edge-weighted personalized PageRank online?
18. Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
19. Model Reduction
• Used in physical simulations
• Key assumption: solutions live in a low-dimensional space
• Two ingredients
– Offline: Finding a basis for the space (POD/SVD)
– Online: Finding an approximation in that space
20. Model Reduction for PageRank
• Assumption: the PageRank vector x(w) lies close to a low-dimensional space
– Build a basis for a k-dimensional reduced space
• Pick an approximation in the reduced space
– Represented by its coordinates in the k-dimensional space
– Need k equations
• Reconstruct the PageRank vector
23. Reduced Space Construction
• Assumption: the PageRank vector x(w) lies close to a low-dimensional space
• Compute a sample set of PageRank vectors x(w₁), …, x(wₛ)
• Find a basis for a k-dimensional space based on the samples
– Data matrix X = [x(w₁), …, x(wₛ)]
– Compute the SVD X = U Σ Vᵀ
– The first k columns of U span the best k-dimensional space under the 2-norm
– Keep the most important directions
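A sketch of this offline construction, reusing the pagerank solver above; make_P is a hypothetical callback that builds P(w) for a given parameter vector w:

```python
# Offline basis construction (illustrative, dense for clarity).
import numpy as np

def build_basis(sample_ws, make_P, v, k, alpha=0.85):
    # Data matrix X = [x(w_1), ..., x(w_s)]: one sampled PageRank vector per column.
    X = np.column_stack([pagerank(make_P(w), v, alpha) for w in sample_ws])
    # Thin SVD: the first k left singular vectors span the best
    # k-dimensional approximating subspace in the 2-norm sense.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]
```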
24. Model Reduction
• Assumption: the PageRank vector x(w) lies close to a low-dimensional space
– Build a basis for a k-dimensional reduced space
• Pick an approximation in the reduced space
– Represented by its coordinates in the k-dimensional space
– Need k equations
• Reconstruct the PageRank vector
(Notation: the system matrix I − α P(w) is denoted by M(w); the right-hand side (1 − α) v is denoted by b)
26. Extracting Approximations
• Reduced space basis U, online query w
• We want an approximation x̂ = U y with M(w) x̂ ≈ b
– Usually k ≪ n
• The Petrov-Galerkin framework [Schilders08]
– Residual vector M(w) U y − b is made orthogonal to the test space W: Wᵀ(M(w) U y − b) = 0
27. The Petrov-Galerkin Framework
• Bubnov-Galerkin
– The test space is the same as the reduced space: W = U
– Solve (Uᵀ M(w) U) y = Uᵀ b
• Discrete Empirical Interpolation Method (DEIM)
– Satisfy a subset of equations
– Denote the index set for the equations as I (the "interpolation set")
– The test space is a diagonal 0/1 selection matrix Π (ones at the entries in I) when |I| = k
28. DEIM
• Satisfy a subset of the equations in the linear system
– Can choose more than k equations
– Over-determined linear system
• Least-squares solution
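A sketch of the resulting online step under these assumptions: idx is the chosen interpolation set (|idx| ≥ k) and M_rows holds the only rows of M(w) = I − αP(w) that need to be materialized:

```python
# Online DEIM step (illustrative): satisfy only the selected equations,
# in the least-squares sense when more than k equations are chosen.
import numpy as np

def deim_solve(U, idx, M_rows, b):
    A = M_rows @ U                                 # |idx|-by-k reduced system
    y, *_ = np.linalg.lstsq(A, b[idx], rcond=None) # least-squares coordinates
    return U @ y                                   # reconstructed x_hat = U y
```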
30. The Petrov-Galerkin Framework
What is the efficiency of these two choices of test space?
How to choose the equations used by DEIM?
31. Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
33. Transition Matrix
• How is P(w) determined by w?
– First form the weighted adjacency matrix A(w)
• E.g. each edge weight is an inner product of w with the edge's feature vector
– Normalize outgoing weights to be probabilities
[Figure: three-node example — raw edge weights 1, 2, 3 are normalized per node into outgoing transition probabilities such as 0.25/0.75 and 0.2/0.3]
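A sketch of this two-step construction (edge weights via an inner product with w, which is one common choice, as the later slides note; edge_features is a hypothetical map from (source, target) pairs to feature vectors, and n is the number of nodes):

```python
# Form P(w): weight each edge, then normalize outgoing weights per node.
import numpy as np

def transition_matrix(edge_features, n, w):
    A = np.zeros((n, n))
    for (u, t), f in edge_features.items():
        A[u, t] = max(float(f @ w), 0.0)           # clip to keep weights nonnegative
    out = A.sum(axis=1, keepdims=True)             # total outgoing weight per node
    A = np.divide(A, out, out=np.zeros_like(A), where=out > 0)
    return A.T                                     # column-stochastic: P[j, i] = Pr(i -> j)
```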
34. Transition Matrix
• How is P(w) determined by w? (as above: weight the edges, then normalize)
• Bubnov-Galerkin: too expensive — forming Uᵀ P(w) U touches every edge weight
• DEIM: NOT ENOUGH to just compute incoming edge weights
– Normalizing an incoming edge's probability needs all outgoing weights of its source node
36. Special Case: Linear Parameterization
• Linear Parameterization: P(w) = Σᵢ wᵢ Pᵢ
– Each edge has one of m different types
– A generalized random-walker model
• First decide the type of edge to follow (according to w)
• Then decide between edges of that type (according to the type-specific matrix Pᵢ)
• Bubnov-Galerkin: Uᵀ P(w) U = Σᵢ wᵢ (Uᵀ Pᵢ U)
– Each k×k piece Uᵀ Pᵢ U is precomputed offline and combined linearly online
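A sketch of the offline/online split that this linearity enables, assuming an orthonormal basis U (as produced by the SVD) and the type-specific matrices P_i:

```python
# Linear parameterization P(w) = sum_i w_i * P_i: project each P_i once
# offline, so every online query only touches k-by-k matrices.
import numpy as np

def precompute_reduced(U, P_list, v, alpha=0.85):
    H_list = [U.T @ (P_i @ U) for P_i in P_list]   # k-by-k pieces, computed once
    c = (1 - alpha) * (U.T @ v)                    # reduced right-hand side U^T b
    return H_list, c

def bubnov_galerkin_solve(U, H_list, c, w, alpha=0.85):
    k = U.shape[1]
    H = sum(w_i * H_i for w_i, H_i in zip(w, H_list))
    y = np.linalg.solve(np.eye(k) - alpha * H, c)  # (U^T M(w) U) y = U^T b, U orthonormal
    return U @ y
```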
37. Special Case: Scaled-Linear Parameterization
• Scaled-Linear Parameterization
– Choose each edge weight as a linear combination of edge features, so A(w) = Σᵢ wᵢ Aᵢ
• E.g. post similarities between users in Twitter
– P(w) is no longer linear in w, but the weighted adjacency matrix A(w) is
– DEIM: enough to compute incoming edge weights
39. Interpolation Set
• How should we choose the subset of equations?
– “Important” nodes according to PageRank
– Does not always work!
41. Interpolation Set
• We want the selected rows of M U to be maximally linearly independent
– Pivoted QR
• DEIM: materialize only the selected rows
– Performance is decided by the in-degrees of the selected nodes
– Skewed degree distribution in natural graphs
– A small set of nodes has large in-degrees
42. Utility vs. Cost
High-level idea of pivoted QR:
Repeat k times:
Select the next row with maximum utility
Adjust the utilities of the other rows
• Idea 1: Among low-cost nodes, select the one with maximum utility
– Cost-bounded pivot
• Idea 2: Among high-utility nodes, select the one with minimal cost
– Threshold pivot
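A greedy sketch of Idea 1 (an assumption-level illustration, not necessarily the paper's exact pivoting rule): pivoted-QR-style row selection by Gram-Schmidt deflation, restricted to rows whose cost stays under a cap:

```python
# Cost-bounded pivoted QR (illustrative). R holds the candidate rows
# (e.g. the rows of M U); cost[i] is the price of materializing row i,
# e.g. the in-degree of node i.
import numpy as np

def cost_bounded_pivots(R, cost, k, cost_cap):
    R = np.array(R, dtype=float)
    selected = []
    for _ in range(k):
        utility = np.linalg.norm(R, axis=1)   # residual norm = current utility
        utility[cost > cost_cap] = -np.inf    # Idea 1: rule out expensive rows
        if selected:
            utility[selected] = -np.inf       # never pick the same row twice
        p = int(np.argmax(utility))
        selected.append(p)
        q = R[p] / np.linalg.norm(R[p])       # deflate: remove the pivot direction
        R = R - np.outer(R @ q, q)            # adjust the utilities of other rows
    return selected
```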
43. Learning to Rank
• Goal: learn the best values of the parameters w
– Based on user feedback, historical activities, etc.
• Training Data
– Each pair (i, j): i should be ranked lower than j
– Objective function: a loss over pairs that are ranked out of order
– Usually minimized via gradient-based methods
(The gradient requires the derivative of the PageRank vector)
44. The PageRank Derivative
• Standard method
– Solves the same PageRank system with different right-hand sides
– With m parameters, solve m + 1 PageRank systems!
• Compute the derivatives in the reduced space
– Solves systems of dimension k instead of dimension n!
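For completeness, the standard calculation behind this slide, with M(w) = I − αP(w) and b fixed as before (the reduced form assumes the Bubnov-Galerkin system):

```latex
% Differentiating M(w) x(w) = b with respect to a parameter w_i gives
% (dM/dw_i) x + M (dx/dw_i) = 0, so each derivative solves the SAME
% n-by-n system with a new right-hand side:
\[
  M(w)\,\frac{\partial x}{\partial w_i} = -\,\frac{\partial M}{\partial w_i}\,x,
  \qquad i = 1,\dots,m .
\]
% In the reduced space, with \hat{x} = U y, the analogous systems are only k-by-k:
\[
  \bigl(U^\top M(w)\,U\bigr)\,\frac{\partial y}{\partial w_i}
  = -\,U^\top \frac{\partial M}{\partial w_i}\,U\,y .
\]
```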
45. Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
46. Experiments
• Datasets
– DBLP
• 3.5M vertices, 18.5M edges, 7 parameters
• ObjectRank
– Weibo graph
• 2M vertices, 50.6M edges
• A social-blogging site in China, released by KDD Cup 2012
• Metrics
– Normalized L1 error: ‖x − x̂‖₁ / ‖x‖₁
– Kendall's tau
• The percentage of pairs that are out of order
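A sketch of these two metrics as stated on the slide (illustrative; the Kendall conversion assumes no ties):

```python
# Evaluation metrics (illustrative implementations).
import numpy as np
from scipy.stats import kendalltau

def normalized_l1(x, x_hat):
    return np.abs(x - x_hat).sum() / np.abs(x).sum()

def kendall_distance(x, x_hat):
    # kendalltau returns a correlation in [-1, 1]; with no ties,
    # (1 - tau) / 2 is exactly the fraction of out-of-order pairs.
    tau, _ = kendalltau(x, x_hat)
    return (1.0 - tau) / 2.0
```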
51. Conclusion
• The first general, scalable method for edge-weighted personalized PageRank
– Based on model reduction
• Optimizations for common parameterizations
• Cost/accuracy tradeoffs on power-law graphs
• Nearly 5 orders of magnitude faster on a learning-to-rank application
55. References
• [Balmin+05] A. Balmin, et al. ObjectRank: Authority-Based Keyword Search in Databases. In VLDB, 2004.
• [Nie+05] Z. Nie, et al. Object-level ranking: bringing order to web objects. In WWW, 2005.
• [Haveliwala02] T. H. Haveliwala. Topic-sensitive PageRank. In WWW, 2002.
• [Bahmani+10] B. Bahmani, et al. Fast incremental and personalized PageRank. PVLDB, 4(3):173–184, 2010.
• [Weng+10] J. Weng, et al. TwitterRank: finding topic-sensitive influential twitterers. In WSDM, 2010.
• [BackstromL11] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011.
• [Hristidis+14] V. Hristidis, et al. Efficient ranking on entity graphs with personalized relationships. IEEE Trans. Knowl. Data Eng., 26(4):850–863, 2014.
• [Schilders08] W. Schilders. Model Order Reduction: Theory, Research Aspects and Applications. Volume 13 of Mathematics in Industry. Springer, Berlin, 2008.
Speaker Notes
Good morning everyone, welcome to my B exam.
Graphs are versatile tools for expressing complex data dependencies. Social networks and the web are two of the best-known examples. Beyond those, graphs have also been adopted in a variety of other domains, such as recommendation systems, bioinformatics, physical simulations, and computer vision.
Iterative graph computation is a classic problem, but new challenges arise in this big data era.
Here are the three projects I did during my graduate study to address the challenges of iterative graph computation in the big data era.
In today’s talk, I will discuss this recent work that enables interactive edge-weighted personalized PageRank.
Let’s first introduce and motivate the problem.
I believe most of you already know the PageRank algorithm, which was originally proposed by Google to rank web pages.
But this algorithm has been widely used for many graph-ranking applications.
Let me briefly recap the model and define some terminology. In PageRank, a random walker moves through the nodes in the graph. At each step, the walker either moves along an edge, or it says: no, I am bored, I want to restart at an arbitrary vertex according to some distribution.
Here P is the transition matrix, which basically defines how the random walker moves along the edges. It can move along edges uniformly or prefer some types of edges.
v is the so-called teleport vector, which defines a distribution. When the random walker is bored, it tells the walker which vertex to jump to. For example, it can jump to a small subset of vertices, or it can select a vertex uniformly from the whole graph.
x is the PageRank vector, which is what we are interested in.
The graph usually doesn't contain only topology but also other data associated with vertices and edges, such as user profiles and edge types.
The PageRank algorithm can exploit this rich metadata on the graph, as it allows personalized results based on user preference.
For example, on a microblogging website such as Twitter, a user would like to find authorities on different topics. In one query, he asks: who is the authority on food? In another query, he would like to identify the experts on music.
To personalize the PageRank result, here are the two ideas. Both of them parameterize part of the PageRank system.
The first way is node-weighted personalized PageRank: basically, we decide the teleport distribution based on the query parameter w. Intuitively, it marks some of the nodes as more related to the query.
The second way is edge-weighted personalized PageRank. The transition matrix P is changed according to the parameter w.
Intuitively, it says the random walker wants to go through some edges more often than others.
Both types of personalization have many applications. For NodePPR, one example is TSPR. It assumes different nodes belong to different topics, such as sports, entertainment, or music, and it changes the teleport distribution according to the user's query. Localized PageRank is another type of NodePPR, where the walker always jumps back to a single node; this is used in recommendation systems.
EdgePPR is suited for graphs with metadata on edges. For example, ObjectRank uses the edge types in the DBLP graph to personalize the transition matrix. TwitterRank exploits the topic similarities between friends.
Here is the example of ObjectRank on DBLP.
As we can see, the DBLP graph has different edge types between nodes; authors write papers, papers cite other papers.
ObjectRank allows changing the relative weights between different edge types. In one query, the user might think the citation edges are most important; in another query, the user might say the conference and author are strong indicators of a great paper, so let's give those orange and green edges higher weights.
We have introduced these two very different schemes for personalization. The question is…
The answer is: it depends, especially on whether the metadata is associated with vertices or edges.
For some graphs we know the topics of the vertices, which suits node personalization (e.g. TSPR). But some graphs have metadata associated with edges, such as edge types or feature vectors; those graphs are best served by edge personalization, for example ObjectRank.
Facebook has also reported that edge features such as communication frequency and visit time can significantly improve friend recommendation results via EdgePPR.
As we discussed, there are two types of personalized PageRank, each applying to a family of graphs. People are looking for efficient methods to compute them.
There are many efficient methods for NodePPR, usually based on the linearity in the parameter w.
There are no efficient algorithms for edge-weighted personalized PageRank on general graphs. The difficulty is that in edge-weighted personalized PageRank, the PageRank vector x does not depend on the parameter w linearly, unlike the case in node-weighted personalized PageRank.
-- hacking or first-order approx (TwitterRank) / on special graphs (ScaleRank) / Do it offline (All learning to rank)
Since Edge-PPR has many applications, people have tried hard to work around this.
Strictly speaking, it doesn't generate the "correct" result; it generates something between NodePPR and EdgePPR.
For most learning-to-rank applications, people just compute the parameter vector offline, because they cannot afford to do it online.
We didn’t invent it.
To quickly get solutions from parametric large-scale PDE systems.
To locate the coordinates, we need to determine k variables.
This is trivial by computing the linear combination of basis vectors according to the coordinates.
How do we construct this reduced space?
We construct the low-dimensional space in a data-driven way: first we need to compute a sample set of PageRank vectors.
We did nothing special here; basically, we just sample the parameters uniformly and compute these PageRank vectors.
There are many possible extensions. We might want to use more sophisticated sampling methods, such as adaptive sampling, but uniform sampling works well for the applications we tested.
With the sampled PageRank vectors, we can construct the reduced space via SVD.
More specifically, we will first assemble these vectors into a data matrix
After the SVD, the best k-dimensional space under the 2-norm can be constructed by using the first k singular vectors.
M(w) for the system matrix
b for the RHS
A linear combination of the basis vectors, and it's close to the accurate PageRank vector.
For the accurate PageRank vector, we have the residual vector M x – b = 0
Nevertheless, we can still enforce the residual vector to be orthogonal to some test space W.
Although we cannot make this residual vector zero, we can make its projection onto some test space zero.
We also call this index set the “interpolation set”
When the size of the interpolation set is k, it fits into the Petrov-Galerkin framework with this test space Pi. This matrix is all zeros except for a few entries on the diagonal.
Here is a little more detail about the DEIM method. As we said, it tries to satisfy a subset of equations in the huge linear system.
We can actually choose more than k equations for more accurate results; in that case, it's an over-determined linear system, and we solve the least-squares problem instead.
We have introduced two general choices of test space.
Is one test space always better than the other? Or is there some efficiency/accuracy tradeoff?
We said that in the DEIM method we want to choose a subset of equations; how should we choose it?
Those are questions specific to the PageRank applications, and the graphs we are using.
To answer this question, we need to investigate how the transition matrix is formed.
I didn't say anything about how P(w) is determined; we just assumed there is some black box…
It's now time to introduce it.
One popular choice is an inner product of w with the edge feature vector.
After we have the weight on each edge, we know some edges are important and some are not.
For this very general way of determining the transition matrix, Bubnov-Galerkin is not efficient.
Even the DEIM method is costly, since we need the transition probability for each incoming edge of the selected nodes.
To get the incoming transition probabilities for the light blue node, we need to compute the edge weights of both the purple edges and the orange edges.
There are some commonly used transition matrices where we can do better…
The transition matrix is determined via a linear combination of type-specific transition matrices.
Thanks to the linearity, the reduced system can now be expanded; each component can be precomputed and combined linearly later.
This combination is a linear addition of several small (e.g. 100-by-100) matrices.
The transition matrix is no longer a linear combination, but the weighted adjacency matrix A(w) is.
Move forward
From some error analysis, we found that the guideline for selecting the interpolation set is to make the selected rows of MU maximally linearly independent.
We tried two heuristics: one favors the running time and the other favors the accuracy.
Like a typical machine learning problem, it defines some loss function and tries to find the best parameters by solving the optimization problem.
We are testing on 1000 randomly selected nodes, so we plot the cumulative frequency of the Kendall distance.
We compare our methods with BCA, the state-of-the-art algorithm for localized PageRank.
I thank my advisor Johannes for all the helpful suggestions. You have not only taught me computer science, but also how to find good research problems and how to think as a researcher.
I thank David Bindel for all the discussions on the projects we collaborated on. Your suggestions from mathematics and scientific computing are invaluable.
I thank Al Demers for the advice from a systems perspective, which helped me understand experimental results.
I thank Bobby Kleinberg for his constructive suggestions and inspiring lectures on algorithm design and analysis.