This document summarizes a presentation about modeling PageRank as a dynamical system with time-dependent teleportation. It discusses representing PageRank as the solution to a differential equation to model how it changes over time as the teleportation distribution varies. This allows PageRank values to become time-series rather than static scores. The model is applied to predict Twitter retweets and analyze causality between Wikipedia pages.
1. A dynamical system
for PageRank with
time-dependent
teleportation
David F. Gleich
Computer Science
Purdue University
Paper http://arxiv.org/abs/1211.4266
Code https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
Ryan A. Rossi
Computer Science
Purdue University
David Gleich · Purdue
ANL Seminar
2. 1. Perspectives on PageRank
2. PageRank as a dynamical system and
time-dependent teleportation
3. Predicting using PageRank
4. Applications to the power-grid?
3. Given a graph, what are the
most important nodes?
4. The random surfer model
At a node …
1. follow edges with prob α
2. do something else with prob (1-α)
Google’s PageRank is one
possible answer
PageRank by Google
[Figure: example directed graph on six nodes, labeled 1 to 6]
The Model
1. follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely
The important pages are the places we are most likely to find the random surfer.
6. PageRank details
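[Figure: example directed graph on six nodes, labeled 1 to 6, mapped to the matrix P]
$$
P = \begin{bmatrix}
1/6 & 1/2 & 0 & 0 & 0 & 0 \\
1/6 & 0 & 0 & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\
1/6 & 0 & 1/2 & 0 & 0 & 0 \\
1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},
\qquad P_{ij} \ge 0, \quad e^T P = e^T
$$
"jump" → $v = [\tfrac{1}{n} \; \dots \; \tfrac{1}{n}]^T \ge 0$, $e^T v = 1$
Markov chain: $[\alpha P + (1 - \alpha) v e^T]\, x = x$, with unique $x \Rightarrow x_j \ge 0$, $e^T x = 1$
Linear system: $(I - \alpha P)\, x = (1 - \alpha) v$
Ignored: dangling nodes patched back to v; algorithms later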
PageRank via
v is the jump vector: $v_i \ge 0$, $e^T v = 1$
7. My definition of PageRank
A PageRank vector x is the solution of the linear system:
(I – αP) x = (1 –α) v
where P is a column stochastic matrix, 0 ≤ α< 1, and v is a
probability vector.
Just three ingredients!
P: $P_{ij} \ge 0$, $e^T P = e^T$
v: $v_i \ge 0$, $e^T v = 1$
α: usually 0.5 to 0.99
9. Richardson is a robust, simple
algorithm to compute PageRank
$(I - \alpha P)\, x = (1 - \alpha) v$
Richardson ⇒ $x^{(k+1)} = \alpha P x^{(k)} + (1 - \alpha) v$
error $= \| x^{(k)} - x \|_1 \le 2 \alpha^k$
Given α, P, v
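The Richardson update can be sketched in a few lines of plain Python. This is a minimal illustration on the six-node example matrix, not the authors' code: the helper names (`matvec`, `richardson_pagerank`), the value α = 0.85, the uniform v, and the stopping tolerance are all assumptions made here.

```python
# Column-stochastic matrix P of the six-node example, stored row by row.
# Each column sums to 1; column 1 is a dangling node patched to uniform.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def matvec(A, x):
    """Dense matrix-vector product A @ x for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson_pagerank(P, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate x <- alpha * P x + (1 - alpha) * v until the 1-norm change < tol."""
    x = list(v)  # start from the teleportation vector
    for k in range(max_iter):
        Px = matvec(P, x)
        x_new = [alpha * p + (1 - alpha) * vi for p, vi in zip(Px, v)]
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x, k + 1


n = len(P)
v = [1.0 / n] * n          # uniform teleportation (assumed for the example)
x, iters = richardson_pagerank(P, v)
print(iters, [round(xi, 4) for xi in x])
```

Because the change between successive iterates shrinks by a factor of at most α per step, the loop stops after a number of iterations that grows roughly like log(tol) / log(α).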
10. The teleportation distribution v
models where surfers “restart”
What if this changes with time?
11. First idea
Re-solve PageRank whenever v changes
+ PageRank is fast to solve!
+ Easy to understand
– Need another model to incorporate the past
– PageRank isn’t that fast to solve.
Is there anything better?
12. Let’s look at how PageRank
evolves with iterations
$$
\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = \alpha P x^{(k)} + (1 - \alpha) v - x^{(k)} = (1 - \alpha) v - (I - \alpha P)\, x^{(k)}
$$
$$
x'(t) = (1 - \alpha) v - (I - \alpha P)\, x(t)
$$
PageRank is the steady-state solution of the ODE
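One way to see the link between the iteration and the ODE is that a forward Euler step of length h = 1 on the ODE is exactly one Richardson step. The sketch below checks this numerically on the six-node example; the function names, α = 0.85, the uniform v, and the starting vector are assumptions chosen for illustration.

```python
# Column-stochastic matrix P of the six-node example, stored row by row.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def matvec(A, x):
    """Dense matrix-vector product A @ x for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def ode_rhs(x, v, alpha):
    """Right-hand side x'(t) = (1 - alpha) v - (I - alpha P) x."""
    Px = matvec(P, x)
    return [(1 - alpha) * vi - (xi - alpha * pxi)
            for vi, xi, pxi in zip(v, x, Px)]


def euler_step(x, v, alpha, h):
    """One forward Euler step of length h on the PageRank ODE."""
    return [xi + h * di for xi, di in zip(x, ode_rhs(x, v, alpha))]


def richardson_step(x, v, alpha):
    """One Richardson step x <- alpha P x + (1 - alpha) v."""
    return [alpha * pxi + (1 - alpha) * vi for pxi, vi in zip(matvec(P, x), v)]


alpha = 0.85
v = [1.0 / 6] * 6
x0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # arbitrary starting distribution

x_euler = euler_step(x0, v, alpha, h=1.0)
x_rich = richardson_step(x0, v, alpha)
gap = max(abs(a - b) for a, b in zip(x_euler, x_rich))
print(gap)  # agreement up to floating-point round-off
```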
13. A dynamical system for
time-dependent teleportation
+ Easy to integrate
+ Easy to understand
+ Possible to treat analytically!
– Need to “model time” (not dimensionless)
– Still useful to have a data assimilation model
$$
x'(t) = (1 - \alpha) v(t) - (I - \alpha P)\, x(t)
$$
14. Need a self-stabilized ODE
We use a standard RK integrator (ode45 in Matlab)
We used the formulation
$$
x'(t) = (1 - \alpha) v(t) - (\gamma I - \alpha P)\, x(t), \qquad \gamma = (1 - \alpha) e^T v(t) + \alpha e^T x(t)
$$
to maintain x(t) as a probability distribution
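A rough sketch of integrating the self-stabilized system follows. It swaps in a fixed-step classical RK4 scheme for the adaptive ode45 integrator named on the slide, so that it runs with the standard library alone. The teleportation schedule `teleport(t)`, the step size, and α = 0.85 are hypothetical choices for illustration.

```python
# Column-stochastic matrix P of the six-node example, stored row by row.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]
ALPHA = 0.85  # assumed value


def matvec(A, x):
    """Dense matrix-vector product A @ x for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def teleport(t):
    """Hypothetical v(t): uniform interest until t = 5, then all on node 1."""
    if t < 5.0:
        return [1.0 / 6] * 6
    return [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]


def rhs(t, x):
    """x' = (1 - a) v(t) - (gamma I - a P) x with gamma = (1 - a) e'v + a e'x."""
    v = teleport(t)
    gamma = (1 - ALPHA) * sum(v) + ALPHA * sum(x)
    Px = matvec(P, x)
    return [(1 - ALPHA) * vi - (gamma * xi - ALPHA * pxi)
            for vi, xi, pxi in zip(v, x, Px)]


def rk4_step(t, x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(t, x)
    k2 = rhs(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = rhs(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = rhs(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(x0, t_end, h=0.05):
    """Return the list of states at t = 0, h, 2h, ..., t_end."""
    steps = round(t_end / h)
    x, history = list(x0), [list(x0)]
    for i in range(steps):
        x = rk4_step(i * h, x, h)
        history.append(x)
    return history


history = integrate([1.0 / 6] * 6, t_end=10.0)
x_final = history[-1]
print(round(sum(x_final), 12), [round(xi, 4) for xi in x_final])
```

With e^T v(t) = 1 and e^T x(t) = 1 the factor γ equals 1, so the formulation coincides with the original ODE on the probability simplex and pulls the sum back toward 1 if round-off moves it away.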
15. Where is this model realistic?
On Wikipedia, we have
hourly visit data that provides
a coarse measure of outside
interest
16. Now PageRank values are
time-series, not static scores
[Figure: importance over time for Wikipedia pages, including "Main page" and "Earthquake"; the Earthquake series is annotated at the point where an Australian earthquake occurs.]
17. Some quick theory
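$$
x'(t) = (1 - \alpha) v(t) - (I - \alpha P)\, x(t)
$$
For general v(t):
$$
x(t) = \exp[-(I - \alpha P) t]\, x(0) + (1 - \alpha) \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau)\, d\tau
$$
For static v(t) = v:
$$
\int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v \, d\tau = (I - \alpha P)^{-1} v - \exp[-(I - \alpha P) t]\, (I - \alpha P)^{-1} v
$$
$$
x(t) = \exp[-(I - \alpha P) t]\, (x(0) - x) + x
$$
where x is the original PageRank vector.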
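As a quick check of the static case, the closed form can be differentiated directly. The shorthand A = I − αP is introduced here for brevity and is not on the slide; the last line uses only that P is column stochastic, so its 1-norm is 1.

```latex
\begin{aligned}
x(t) &= e^{-At}\,(x(0) - x) + x, \qquad A = I - \alpha P, \quad A x = (1 - \alpha) v \\
x'(t) &= -A\, e^{-At}\,(x(0) - x) = -A\,(x(t) - x) = (1 - \alpha) v - (I - \alpha P)\, x(t) \\
\| x(t) - x \|_1 &\le e^{-t}\, \| e^{\alpha P t} \|_1\, \| x(0) - x \|_1 \le e^{-(1 - \alpha) t}\, \| x(0) - x \|_1
\end{aligned}
```

So the transient decays at least as fast as e^{-(1-α)t} in the 1-norm, which is the continuous-time counterpart of the α^k error bound for the Richardson iteration.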
18. Thus we recover
the original PageRank vector
if interest stops changing.
21. Modeling cyclical behavior
Cyclically switch between teleportation vectors vj
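$$
v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left( t + \frac{(j - 1)\, 2\pi}{k} \right) + 1 \right)
$$
Then the eventual solution is
$$
x(t) = x + \mathrm{Re}\{ s \exp(\imath t) \}
$$
where x is the PageRank vector with average teleportation,
$$
(I - \alpha P)\, x = (1 - \alpha) \frac{1}{k} V e,
$$
and s is a PageRank with complex teleportation,
$$
\left( I - \frac{\alpha}{1 + \imath} P \right) s = (1 - \alpha) \frac{1}{k (1 + \imath)} V \exp(\imath f),
$$
with V = [v_1, ..., v_k] and phases f_j = (j − 1) 2π / k.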
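The two linear systems can be recovered by substituting the ansatz into the ODE and matching the constant and oscillating parts. The derivation below is a sketch under the assumption that f denotes the vector of phases f_j = (j − 1) 2π / k and that exp(ıf) is applied elementwise.

```latex
\begin{aligned}
v(t) &= \tfrac{1}{k} V e + \tfrac{1}{k}\, \mathrm{Re}\{ V \exp(\imath f)\, e^{\imath t} \},
  \qquad f_j = \tfrac{(j - 1)\, 2\pi}{k} \\
x(t) &= x + \mathrm{Re}\{ s\, e^{\imath t} \}
  \;\Rightarrow\; x'(t) = \mathrm{Re}\{ \imath s\, e^{\imath t} \} \\
\text{constant part:} \quad & 0 = (1 - \alpha) \tfrac{1}{k} V e - (I - \alpha P)\, x \\
\text{oscillating part:} \quad & \imath s = (1 - \alpha) \tfrac{1}{k} V \exp(\imath f) - (I - \alpha P)\, s \\
\Rightarrow \quad & \big( (1 + \imath) I - \alpha P \big)\, s = (1 - \alpha) \tfrac{1}{k} V \exp(\imath f) \\
\Rightarrow \quad & \Big( I - \tfrac{\alpha}{1 + \imath} P \Big)\, s = (1 - \alpha) \tfrac{1}{k (1 + \imath)} V \exp(\imath f)
\end{aligned}
```

The magnitude |s_i| then gives the amplitude of the oscillation of node i around its average-teleportation PageRank value.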
22. Thus we can determine
the size of the oscillation
for the case of cyclical
teleportation
23. Is it useful? Let’s try and
predict retweets on Twitter
We crawled Twitter and gathered
a graph of who follows whom and
how active each user is in a month
This yields a graph and 6 vectors v
Our goal is to predict how many tweets you'll
send next month based on the current month
24. First, how do we model time?
$v_1, \dots, v_k \to V = [\, v_1, \dots, v_k \,]$
$v(t) = V e(\lfloor t \rfloor + 1) = v_{\lfloor t \rfloor + 1}$ (t = 1 is one month)
Rescaling time
$v_s(t) = V e(\lfloor t/s \rfloor + 1) = v_{\lfloor t/s \rfloor + 1}$ (t = s is one month)
$x(sj),\ j = 0, 1, \dots$ These are the same time points
s = ∞ yields a recomputed PageRank at each step
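The piecewise-constant, time-scaled teleportation can be sketched as a small lookup function. The name `teleport_scaled`, the list-of-vectors layout for V, and the choice to hold the last vector once t runs past the data are assumptions made for this example.

```python
import math


def teleport_scaled(V, t, s=1.0):
    """Return v_s(t) = v_{floor(t/s)+1} from V = [v_1, ..., v_k].

    V is a list of k vectors; Python indexing is 0-based, so the
    1-based index floor(t/s) + 1 becomes floor(t/s) here.
    """
    j = math.floor(t / s)
    j = min(max(j, 0), len(V) - 1)  # assumption: clamp outside the data range
    return V[j]


# Three hypothetical monthly teleportation vectors on two nodes.
V = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]

first_month = teleport_scaled(V, 0.5)          # s = 1: month 1 covers [0, 1)
second_month = teleport_scaled(V, 1.0)         # month 2 starts at t = 1
stretched = teleport_scaled(V, 3.9, s=2.0)     # s = 2: month 2 covers [2, 4)
print(first_month, second_month, stretched)
```

With larger s each month occupies a longer stretch of model time, so the system has more time to relax toward the PageRank vector for that month's v before the next change.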
25. The effect of s on PageRank
of one node is considerable
[Figure (a) timescale s: PageRank x_1(t) of one node over time, with panels for s = 1, s = 2, and s = 6; the gray curve involves just recomputing PageRank at each change. Data from Wikipedia.]
26. Second, can we make it smooth?
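$v_1, \dots, v_k \to V = [\, v_1, \dots, v_k \,]$
$v(t) = V e(\lfloor t \rfloor + 1) = v_{\lfloor t \rfloor + 1}$ (t = 1 is one month)
Forward Euler interpretation:
$$
\bar v(t; \theta) = \underbrace{\gamma\, v(t)}_{\text{new data}} + \underbrace{(1 - \gamma)\, \bar v(t - h; \theta)}_{\text{old data}}
$$
Full ODE:
$$
\bar v'(t; \theta) = \theta\, v(t) - \theta\, \bar v(t; \theta)
$$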
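The connection between the smoothing ODE and an exponentially weighted average can be illustrated with a short sketch. It assumes that one forward Euler step of length h on the smoothing ODE is used, which puts weight θh on the new data and 1 − θh on the old average; that weight, the function names, and the sample vectors are assumptions for illustration, not values from the slides.

```python
def ema_step(vbar, v_new, theta, h=1.0):
    """One forward Euler step on vbar' = theta * (v - vbar).

    Equivalent to an exponential moving average with weight w = theta * h
    on the new data; 0 < w <= 1 keeps the result a convex combination.
    """
    w = theta * h
    return [w * vn + (1 - w) * vb for vn, vb in zip(v_new, vbar)]


def smooth_series(vectors, theta, h=1.0):
    """Smooth a sequence of teleportation vectors, starting from the first."""
    vbar = list(vectors[0])
    out = [vbar]
    for v_new in vectors[1:]:
        vbar = ema_step(vbar, v_new, theta, h)
        out.append(vbar)
    return out


# Hypothetical monthly vectors with one big jump in interest.
monthly = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
smoothed = smooth_series(monthly, theta=0.5)
print(smoothed)
```

Since each step is a convex combination of two probability vectors, the smoothed vectors remain probability distributions; smaller θ spreads a jump over more steps.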
27. The effect of theta on PageRank
of one node is moderate
[Figure (b) smoothing θ: PageRank x_1(t) of one node over time, with panels for θ = 0.1, θ = 1, and θ = 10. Data from Wikipedia.]
Only matters if there is a big jump
28. Parameters of the prediction
alpha – PageRank modeling parameter
s – time-scale
theta – smoothing
29. The prediction model
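Linear, one-step ahead prediction
$$
[\, \bar f(t - 1) \;\; \bar f(t - 2) \;\; \dots \;\; \bar f(t - w) \,]\; b \approx p(t)
$$
is evaluated using
$$
\mathrm{sMAPE} = \frac{1}{|T|} \sum_{t=1}^{|T|} \frac{|p_t - \hat p_t|}{(p_t + \hat p_t)/2}
$$
averaged over nodes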
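The sMAPE measure for a single node can be sketched as follows. The function name and the convention of counting a term as zero when both the actual and predicted values are zero are assumptions made here; the slides do not say how that case is handled.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error over the time points in T.

    Each term is |p_t - p_hat_t| / ((p_t + p_hat_t) / 2).
    Assumption: a term with p_t + p_hat_t == 0 contributes 0.
    """
    terms = []
    for p, p_hat in zip(actual, predicted):
        denom = (p + p_hat) / 2
        terms.append(0.0 if denom == 0 else abs(p - p_hat) / denom)
    return sum(terms) / len(terms)


perfect = smape([10, 20, 30], [10, 20, 30])   # no error at any time point
off_by_half = smape([1.0], [3.0])             # |1 - 3| / ((1 + 3) / 2)
worst_case = smape([100.0], [0.0])            # upper limit for nonnegative data
print(perfect, off_by_half, worst_case)
```

For nonnegative counts each term lies between 0 and 2, so the measure stays bounded even when a node has almost no activity in some month.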
30. The results
Error ratio by timescale s
Dataset   Type            θ      s = 1   s = 2   s = 6   s = ∞
TWITTER   stationary      0.01   0.635   0.929   0.913   0.996
                          0.50   0.636   0.735   0.854   0.939
                          1.00   0.522   0.562   0.710   0.963
          non-stationary  0.01   0.461   0.841   1.001   0.992
                          0.50   0.261   0.608   0.585   0.929
                          1.00   0.137   0.605   0.617   0.918
Error ratio = (sMAPE of tweets + time-dependent PR) / (sMAPE of tweets only)
If this ratio < 1, then using time-dependent PR helps
Stationary nodes are those with small maximum change in scores
Non-stationary nodes are those with large maximum change in scores
31. We tried the same experiment with Wikipedia,
but there was no meaningful change in the prediction error.
32. Using Granger Causality to study link
relationships on Wikipedia
[Figure: grid of time-series panels for Wikipedia pages, highlighting Earthquake and Richter Mag.]
Earthquake causes Richter Mag.?
Of course! We build this into the model.
33. But, the question is, which of
these are preserved after
incorporating the effects of
page view data?
34. Using Granger Causality to find the
important links on Wikipedia
Earthquake Granger causes   p-value
Seismic hazard              0.003535
Extensional tectonics       0.003033
Landslide dam               0.002406
Earthquake preparedness     0.001157
Richter magnitude scale     0.000584
Fault (geology)             0.000437
Aseismic creep              0.000419
Seismometer                 0.000284
Epicenter                   0.000020
Seismology                  0.000001
35. Thus, these links “fit” our
model, whereas the other links
on the page do not.
36. Application to the power grid
Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011;
Halappanavar et al., 2012) has found that graph properties have
important correlations with power-grid vulnerabilities and
contingency analysis.
37. Each edge has a power
flow that satisfies some
non-linear power flow
equation.
We use average daily
flows to study time-
dependent PageRank
on the line graph of the
underlying network.
Lines with high variance
may be problematic?
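As an illustration of the setup, the sketch below builds the line graph of a small network and turns per-line flows into a teleportation vector. It is not taken from the talk: the function names, the toy edge list, the flow values, and the choice to normalize absolute flows are all hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Line graph of an undirected network.

    Each original edge (a power line) becomes a node, identified by its
    index in `edges`; two lines are adjacent when they share an endpoint.
    """
    incident = {}
    for idx, (u, w) in enumerate(edges):
        incident.setdefault(u, []).append(idx)
        incident.setdefault(w, []).append(idx)
    adjacency = {idx: set() for idx in range(len(edges))}
    for lines in incident.values():
        for a, b in combinations(lines, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def flows_to_teleport(flows):
    """Normalize absolute line flows into a probability vector (assumed choice)."""
    total = sum(abs(f) for f in flows)
    return [abs(f) / total for f in flows]


# Toy network: a triangle of buses 1-2-3 with a spur from bus 3 to bus 4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
adjacency = line_graph(edges)
v_day = flows_to_teleport([4.0, -2.0, 1.0, 1.0])  # hypothetical daily flows
print(adjacency, v_day)
```

A sequence of such daily vectors could then play the role of v(t) in the dynamical system, with the line-graph adjacency defining P.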
38. My questions
Sample data to test this idea?
Too simplistic?
Time-dependent betweenness centrality
with cyclical teleportation?
Other power-grid problems where similar ideas
may be able to help?
39. A dynamical system
for PageRank with
time-dependent
teleportation
David F. Gleich
Computer Science
Purdue University
Paper http://arxiv.org/abs/1211.4266
Code https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
Ryan A. Rossi
Computer Science
Purdue University