A dynamical system for time-dependent PageRank

A dynamical system for PageRank with time-dependent teleportation
David F. Gleich
Computer Science
Purdue University
Paper http://arxiv.org/abs/1211.4266
Code https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
Ryan A. Rossi
Computer Science
Purdue University
David Gleich · Purdue
ANL Seminar

1.  Perspectives on PageRank
2.  PageRank as a dynamical system and time-dependent teleportation
3.  Predicting using PageRank
4.  Applications to the power grid?

Given a graph, what are the
most important nodes?

The random surfer model
At a node …
1.  follow edges with prob α
2.  do something else with prob (1-α)
Google’s PageRank is one possible answer
PageRank by Google (example graph with nodes 1 to 6)
The Model
1.  follow edges uniformly with probability α, and
2.  randomly jump with probability 1 − α; we’ll assume everywhere is equally likely
The important pages are the places we are most likely to find the random surfer.

The most important page on the web.

PageRank details
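Example graph with nodes 1 to 6 and its transition matrix:
$$
P = \begin{bmatrix}
1/6 & 1/2 & 0 & 0 & 0 & 0 \\
1/6 & 0 & 0 & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\
1/6 & 0 & 1/2 & 0 & 0 & 0 \\
1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},
\qquad P_{ij} \ge 0, \quad e^T P = e^T
$$
“jump” vector: $v = [\tfrac{1}{n}, \ldots, \tfrac{1}{n}]^T \ge 0$, $e^T v = 1$
Markov chain: $[\alpha P + (1-\alpha)\, v e^T]\, x = x$, with unique $x \Rightarrow x_j \ge 0$, $e^T x = 1$
Linear system: $(I - \alpha P)\, x = (1-\alpha)\, v$
Ignored: dangling nodes (patched back to $v$); algorithms later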
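The step from the Markov chain form to the linear system is short enough to write out. The following is a sketch of that derivation using only the symbols on this slide and the normalization $e^T x = 1$:

```latex
\begin{align*}
\bigl[\alpha P + (1-\alpha)\, v e^T\bigr]\, x &= x \\
\alpha P x + (1-\alpha)\, v\, (e^T x) &= x \\
\alpha P x + (1-\alpha)\, v &= x \qquad \text{since } e^T x = 1 \\
(I - \alpha P)\, x &= (1-\alpha)\, v
\end{align*}
```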
PageRank via
$v$ is the jump vector: $v_i \ge 0$, $e^T v = 1$

My definition of PageRank
A PageRank vector $x$ is the solution of the linear system
$$(I - \alpha P)\, x = (1 - \alpha)\, v$$
where $P$ is a column-stochastic matrix, $0 \le \alpha < 1$, and $v$ is a probability vector.
Just three ingredients:
1.  $P$ with $P_{ij} \ge 0$ and $e^T P = e^T$ (the 6-node example matrix from the details slide)
2.  $v$ with $v_i \ge 0$ and $e^T v = 1$
3.  $\alpha$, usually 0.5 to 0.99
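To make the definition concrete, here is a minimal sketch that solves $(I - \alpha P)x = (1-\alpha)v$ directly for the 6-node example matrix. The value `alpha = 0.85`, the uniform `v`, and the helper names `solve` and `pagerank_direct` are assumptions for illustration; they are not taken from the talk or its code.

```python
# Example 6-node column-stochastic matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def pagerank_direct(P, alpha, v):
    """Form I - alpha*P and (1 - alpha)*v, then solve the linear system."""
    n = len(v)
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(n)]
         for i in range(n)]
    b = [(1 - alpha) * vi for vi in v]
    return solve(A, b)


alpha = 0.85            # assumed value inside the usual 0.5 to 0.99 range
n = len(P)
v = [1.0 / n] * n       # uniform teleportation (assumption)
x = pagerank_direct(P, alpha, v)
print([round(xi, 4) for xi in x])
```

Because $e^T P = e^T$ and $e^T v = 1$, the solution sums to one without any explicit normalization.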

This definition applies to a
remarkable variety of problems
1.  GeneRank
2.  ProteinRank
3.  FoodRank
4.  SportsRank
5.  HostRank
6.  TrustRank
7.  BadRank
8.  IsoRank
9.  SimRank
10.  ObjectRank
11.  ItemRank
12.  ArticleRank
13.  BookRank
14.  FutureRank
15.  TimedPageRank
16.  SocialPageRank
17.  DiffusionRank
18.  ImpressionRank
19.  TweetRank
20.  TwitterRank
21.  ReversePageRank
22.  PageTrust
23.  PopRank
24.  CiteRank
25.  FactRank
26.  InvestorRank
27.  ImageRank
28.  VisualRank
29.  QueryRank
30.  BookmarkRank
31.  StoryRank
32.  PerturbationRank
33.  ChemicalRank
34.  RoadRank
35.  PaperRank
36.  Etc…

Richardson is a robust, simple
algorithm to compute PageRank
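Given $\alpha$, $P$, $v$, solve
$$(I - \alpha P)\, x = (1-\alpha)\, v$$
Richardson $\Rightarrow$
$$x^{(k+1)} = \alpha P x^{(k)} + (1-\alpha)\, v$$
$$\text{error} = \| x^{(k)} - x \|_1 \le 2\alpha^k$$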
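The Richardson update can be sketched in a few lines. This is an illustrative version on the 6-node example matrix, not the authors' code; `alpha = 0.85`, the uniform `v`, and the function names are assumptions. It also checks the $2\alpha^k$ bound numerically against a long-run reference iterate.

```python
# Example 6-node column-stochastic matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson(P, alpha, v, iters):
    """Return all iterates x(0), ..., x(iters) of x <- alpha*P*x + (1-alpha)*v."""
    x = v[:]                      # start from a probability vector
    history = [x]
    for _ in range(iters):
        Px = matvec(P, x)
        x = [alpha * p + (1 - alpha) * vi for p, vi in zip(Px, v)]
        history.append(x)
    return history


alpha = 0.85                      # assumed value
v = [1.0 / 6] * 6                 # uniform teleportation (assumption)
history = richardson(P, alpha, v, 200)
x_ref = history[-1]               # reference: converged to rounding level

# 1-norm error of the first 50 iterates, compared with the bound 2*alpha^k.
errors = [sum(abs(a - b) for a, b in zip(xk, x_ref)) for xk in history[:50]]
bound_ok = all(err <= 2 * alpha ** k + 1e-12 for k, err in enumerate(errors))
print(bound_ok)
```

The bound relies on starting from a probability vector, which is why the sketch initializes with `v`.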

The teleportation distribution v
models where surfers “restart”

What if this changes with time?

First idea
Re-solve PageRank whenever v changes

+ PageRank is fast to solve!
+ Easy to understand
– Need another model to incorporate the past
– PageRank isn’t that fast to solve.

Is there anything better?

Let’s look at how PageRank
evolves with iterations
$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = \alpha P x^{(k)} + (1-\alpha)\, v - x^{(k)} = (1-\alpha)\, v - (I - \alpha P)\, x^{(k)}$$
$$x'(t) = (1-\alpha)\, v - (I - \alpha P)\, x(t)$$
PageRank is the steady-state solution of the ODE
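One way to see the connection is that a forward Euler step of size 1 on this ODE is exactly one Richardson step, while smaller steps trace the continuous trajectory toward the same steady state. The sketch below checks both claims on the 6-node example; `alpha`, `v`, the starting point, and the step sizes are assumptions chosen for illustration.

```python
# Example 6-node column-stochastic matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]
alpha = 0.85                      # assumed value
v = [1.0 / 6] * 6                 # uniform teleportation (assumption)


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rhs(x):
    """Right-hand side x'(t) = (1 - alpha) v - (I - alpha P) x."""
    Px = matvec(P, x)
    return [(1 - alpha) * vi - (xi - alpha * pxi)
            for xi, pxi, vi in zip(x, Px, v)]


def euler_step(x, h):
    """One forward Euler step of size h."""
    return [xi + h * di for xi, di in zip(x, rhs(x))]


def richardson_step(x):
    Px = matvec(P, x)
    return [alpha * pxi + (1 - alpha) * vi for pxi, vi in zip(Px, v)]


x0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Forward Euler with h = 1 coincides with a Richardson step.
gap = max(abs(a - b)
          for a, b in zip(euler_step(x0, 1.0), richardson_step(x0)))

# Smaller steps approach the steady state, where the right-hand side vanishes.
x = x0
for _ in range(3000):
    x = euler_step(x, 0.1)
steady = max(abs(d) for d in rhs(x))
print(gap, steady)
```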

A dynamical system for time-dependent teleportation
$$x'(t) = (1-\alpha)\, v(t) - (I - \alpha P)\, x(t)$$
+ Easy to integrate
+ Easy to understand
+ Possible to treat analytically
– Need to “model time” (not dimensionless)
– Still useful to have a data assimilation model

Need a self-stabilized ODE
We use a standard RK integrator (ode45 in Matlab)
We used the formulation
$$x'(t) = (1-\alpha)\, v(t) - (\gamma I - \alpha P)\, x(t), \qquad \gamma = (1-\alpha)\, e^T v(t) + \alpha\, e^T x(t)$$
to maintain $x(t)$ as a probability distribution
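To illustrate why this formulation is self-stabilizing, the sketch below integrates it with a hand-written classical fourth-order Runge-Kutta step (a stand-in for ode45, which is adaptive) and starts from a vector whose entries sum to 1.2. Summing the ODE gives $\frac{d}{dt}(e^T x) = (1 - e^T x)\,\gamma$ when $e^T v = 1$, so the sum is pulled back to one. The teleportation schedule `v_of_t`, `alpha`, and the step size are hypothetical choices, not values from the talk.

```python
import math

# Example 6-node column-stochastic matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]
n = len(P)
alpha = 0.85                      # assumed value


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def v_of_t(t):
    """Hypothetical v(t): blends uniform teleportation with node 1."""
    c = 0.5 * (1 + math.cos(t))   # weight in [0, 1]
    return [(1 - c) / n + (c if i == 0 else 0.0) for i in range(n)]


def rhs(t, x):
    """x' = (1 - alpha) v(t) - (gamma I - alpha P) x with the slide's gamma."""
    vt = v_of_t(t)
    gamma = (1 - alpha) * sum(vt) + alpha * sum(x)
    Px = matvec(P, x)
    return [(1 - alpha) * vi - (gamma * xi - alpha * pxi)
            for xi, pxi, vi in zip(x, Px, vt)]


def rk4_step(f, t, x, h):
    """One classical Runge-Kutta (RK4) step."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k2)])
    k4 = f(t + h, [xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


x = [1.2 / n] * n                 # deliberately NOT a probability vector
t, h = 0.0, 0.05
for _ in range(400):              # integrate to t = 20
    x = rk4_step(rhs, t, x, h)
    t += h
total = sum(x)
print(total)
```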

Where is this model realistic?
On Wikipedia, we have
hourly visit data that provides
a coarse measure of outside
interest

Now PageRank values are
time-series, not static scores
[Figure: importance versus time for Wikipedia pages, one panel per page (for example MainPage and Earthquake). Annotations mark the Main page series and the point where the Australian earthquake occurs.]

Some quick theory
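$$x'(t) = (1-\alpha)\, v(t) - (I - \alpha P)\, x(t)$$
For general $v(t)$:
$$x(t) = \exp[-(I - \alpha P)t]\, x(0) + (1-\alpha) \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau)\, d\tau$$
For static $v(t) = v$:
$$\int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v\, d\tau = (I - \alpha P)^{-1} v - \exp[-(I - \alpha P)t]\, (I - \alpha P)^{-1} v$$
$$x(t) = \exp[-(I - \alpha P)t]\, (x(0) - x) + x$$
where $x$ is the original PageRank vector.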
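The static-case formula can be checked by differentiating it. In the sketch below, $A = I - \alpha P$ is shorthand introduced here, and $x$ is the PageRank vector, so $A x = (1-\alpha) v$:

```latex
\begin{align*}
x(t)  &= \exp[-At]\,(x(0) - x) + x \\
x'(t) &= -A \exp[-At]\,(x(0) - x) \\
      &= -A\,(x(t) - x) \\
      &= A x - A\, x(t) \\
      &= (1-\alpha)\, v - (I - \alpha P)\, x(t)
\end{align*}
```

Every eigenvalue of $A$ has real part at least $1 - \alpha > 0$, because the eigenvalues of a column-stochastic $P$ lie in the unit disk. Hence $\exp[-At] \to 0$ and $x(t) \to x$ as $t \to \infty$.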

Thus we recover
the original PageRank vector
if interest stops changing.

Cyclical behavior in the time-dependent PageRank scores
[Figure: a 4-node graph and two panels plotted against time for Pages 1 to 4: "Dynamic PageRank" and "Time-dependent teleportation".]

Modeling cyclical behavior
Cyclically switch between teleportation vectors vj
$$v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left(t + (j-1)\tfrac{2\pi}{k}\right) + 1 \right)$$
[Figure: time-dependent teleportation versus time for Pages 1 to 4, with the peaks alternating between $v_1$ and $v_2$.]

Modeling cyclical behavior
Cyclically switch between teleportation vectors vj
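$$v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left(t + (j-1)\tfrac{2\pi}{k}\right) + 1 \right)$$
Then the eventual solution is
$$x(t) = x + \operatorname{Re}\{ s \exp(\imath t) \}$$
where $x$ is the PageRank vector with average teleportation,
$$(I - \alpha P)\, x = (1-\alpha)\, \tfrac{1}{k}\, V e$$
and $s$ is a PageRank vector with complex teleportation,
$$\left(I - \tfrac{\alpha}{1+\imath} P\right) s = (1-\alpha)\, \tfrac{1}{k(1+\imath)}\, V \exp(\imath f)$$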
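Both systems can be solved with the same fixed-point iteration, since $|\alpha/(1+\imath)| = \alpha/\sqrt{2} < 1$. The sketch below does this for the 6-node example and then checks numerically that $x + \operatorname{Re}\{s \exp(\imath t)\}$ satisfies the ODE. The two teleportation vectors in `V`, the value of `alpha`, and the reading $f_j = (j-1)2\pi/k$ for the phase vector are my assumptions; the slide does not define $f$ explicitly.

```python
import cmath
import math

# Example 6-node column-stochastic matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]
n = len(P)
alpha = 0.85                                   # assumed value
# Hypothetical teleportation vectors v_1, v_2 (each sums to one).
V = [[1/3, 1/3, 1/3, 0, 0, 0],
     [0, 0, 0, 1/3, 1/3, 1/3]]
k = len(V)
f = [j * 2 * math.pi / k for j in range(k)]    # assumed phases (j-1)*2*pi/k


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def fixed_point(scale, b, iters=300):
    """Iterate y <- scale * P y + b; converges because |scale| < 1."""
    y = b[:]
    for _ in range(iters):
        Py = matvec(P, y)
        y = [scale * p + bi for p, bi in zip(Py, b)]
    return y


# PageRank with the average teleportation (1/k) V e.
v_avg = [sum(V[j][i] for j in range(k)) / k for i in range(n)]
x = fixed_point(alpha, [(1 - alpha) * vi for vi in v_avg])

# PageRank with complex teleportation.
z = 1 + 1j
b = [(1 - alpha) / (k * z)
     * sum(V[j][i] * cmath.exp(1j * f[j]) for j in range(k))
     for i in range(n)]
s = fixed_point(alpha / z, b)

amplitude = [abs(si) for si in s]              # oscillation size per node


def ode_defect(t):
    """Max violation of x' = (1-alpha) v(t) - (I - alpha P) x at time t."""
    vt = [sum(V[j][i] * (math.cos(t + f[j]) + 1) for j in range(k)) / k
          for i in range(n)]
    xt = [xi + (si * cmath.exp(1j * t)).real for xi, si in zip(x, s)]
    dx = [(1j * si * cmath.exp(1j * t)).real for si in s]
    Px = matvec(P, xt)
    return max(abs(d - ((1 - alpha) * vi - (xi - alpha * pxi)))
               for d, vi, xi, pxi in zip(dx, vt, xt, Px))


print([round(a, 4) for a in amplitude], ode_defect(0.7))
```

The magnitudes `abs(s[i])` give the size of the oscillation of each node's score around its average-teleportation PageRank value.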

Thus we can determine
the size of the oscillation
for the case of cyclical
teleportation

Is it useful? Let’s try and
predict retweets on Twitter
We crawled Twitter and gathered
a graph of who follows whom and
how active each user is in a month
This yields a graph and 6 vectors v
Our goal is to predict how many tweets you’ll
send next month based on the current month

First, how do we model time?
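$$v_1, \ldots, v_k \;\to\; V = [\, v_1, \ldots, v_k \,]$$
$$v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1} \qquad (t = 1 \text{ is one month})$$
Rescaling time:
$$v_s(t) = V e_{\lfloor t/s \rfloor + 1} = v_{\lfloor t/s \rfloor + 1} \qquad (t = s \text{ is one month})$$
$x(sj)$, $j = 0, 1, \ldots$: these are the same time points
$s = \infty$ yields a recomputed PageRank at each step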
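The piecewise-constant teleportation with a timescale is simple to express in code. The following sketch uses made-up monthly vectors and a hypothetical helper named `teleport`; holding the last vector once the data runs out is my assumption, since the slide does not say what happens past month $k$.

```python
import math


def teleport(V, t, s=1.0):
    """Return v_s(t), the column of V with 1-based index floor(t/s) + 1.

    V is a list of teleportation vectors [v_1, ..., v_k].
    """
    j = math.floor(t / s)               # zero-based column index
    j = max(0, min(j, len(V) - 1))      # assumption: hold the last vector
    return V[j]


# Made-up monthly teleportation vectors for a 3-node example.
V = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

print(teleport(V, 0.5))          # first month at timescale s = 1
print(teleport(V, 3.9, s=2.0))   # still the second month when s = 2
```

With a larger `s`, the ODE has more integration time between data changes, which is why $s = \infty$ corresponds to fully re-solving PageRank at each step.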

The effect of s on PageRank
of one node is considerable
[Figure: PageRank $x_1(t)$ of one node versus time for timescales s = 1, s = 2, and s = 6. The gray curve involves just recomputing PageRank at each change. Data from Wikipedia.]

Second, can we make it smooth?
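$$v_1, \ldots, v_k \;\to\; V = [\, v_1, \ldots, v_k \,]$$
$$v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1} \qquad (t = 1 \text{ is one month})$$
$$\bar v(t; \theta) = \underbrace{\lambda\, v(t)}_{\text{new data}} + \underbrace{(1 - \lambda)\, \bar v(t - h; \theta)}_{\text{old data}}$$
with smoothing weight $\lambda$. Interpreting this update as a forward Euler step gives the full ODE
$$\bar v'(t; \theta) = \theta\, v(t) - \theta\, \bar v(t; \theta)$$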
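A small sketch of the smoothing step may help. It applies the update with weight `theta * h`, which is my reading of the forward Euler interpretation of $\bar v' = \theta v - \theta \bar v$ and is not stated explicitly on the slide. The function name `smooth`, the 3-node vectors, and the step size are hypothetical.

```python
def smooth(samples, theta, h):
    """Exponentially smooth a sequence of teleportation vectors.

    samples[i] is v(i * h). Each step blends new data (weight theta * h)
    with the running average (weight 1 - theta * h).
    """
    w = theta * h
    if not 0.0 <= w <= 1.0:
        raise ValueError("theta * h must lie in [0, 1] for a convex blend")
    vbar = list(samples[0])
    out = []
    for v in samples:
        vbar = [w * a + (1 - w) * b for a, b in zip(v, vbar)]
        out.append(vbar)
    return out


# Made-up data: teleportation jumps from v1 to v2 at t = 1.
v1 = [0.7, 0.2, 0.1]
v2 = [0.1, 0.2, 0.7]
h = 0.1
samples = [v1] * 10 + [v2] * 90

fast = smooth(samples, theta=1.0, h=h)
slow = smooth(samples, theta=0.1, h=h)


def gap(vbar):
    """Largest remaining distance to the new teleportation vector."""
    return max(abs(a - b) for a, b in zip(vbar, v2))


print(gap(fast[-1]), gap(slow[-1]))
```

Because each update is a convex combination of probability vectors, the smoothed vector remains a probability vector; a larger `theta` tracks the jump faster.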

The effect of theta on PageRank
of one node is moderate
[Figure: PageRank $x_1(t)$ of one node versus time for smoothing θ = 0.1, θ = 1, and θ = 10. Only matters if there is a big jump. Data from Wikipedia.]

Parameters of the prediction
alpha – PageRank modeling parameter
s – time-scale
theta – smoothing

The prediction model
Linear, one-step ahead prediction
$$[\, \bar f(t-1) \;\; \bar f(t-2) \;\; \cdots \;\; \bar f(t-w) \,]\, b \approx p(t)$$
is evaluated using
$$\text{sMAPE} = \frac{1}{|T|} \sum_{t=1}^{|T|} \frac{|p_t - \hat p_t|}{(p_t + \hat p_t)/2}$$
averaged over nodes
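The sMAPE formula translates directly into code. This is a minimal sketch for a single node; the function name `smape`, the sample numbers, and the convention of scoring a 0/0 term as zero are assumptions for illustration.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error over one time series."""
    terms = []
    for p, p_hat in zip(actual, predicted):
        denom = (p + p_hat) / 2
        # Assumption: a term with p = p_hat = 0 counts as zero error.
        terms.append(0.0 if denom == 0 else abs(p - p_hat) / denom)
    return sum(terms) / len(terms)


# Made-up tweet counts and two hypothetical sets of predictions.
actual = [10, 20, 30]
baseline = [14, 15, 30]          # e.g. predictions from tweets only
with_pr = [12, 18, 30]           # e.g. tweets plus time-dependent PageRank

error_ratio = smape(actual, with_pr) / smape(actual, baseline)
print(smape(actual, with_pr), error_ratio)
```

A ratio below one means the second set of predictions has the smaller error.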

The results
Error ratio by timescale s
Dataset   Type             θ      s = 1   s = 2   s = 6   s = ∞
TWITTER   stationary       0.01   0.635   0.929   0.913   0.996
TWITTER   stationary       0.50   0.636   0.735   0.854   0.939
TWITTER   stationary       1.00   0.522   0.562   0.710   0.963
TWITTER   non-stationary   0.01   0.461   0.841   1.001   0.992
TWITTER   non-stationary   0.50   0.261   0.608   0.585   0.929
TWITTER   non-stationary   1.00   0.137   0.605   0.617   0.918
Err Ratio = (SMAPE using tweets + time-dependent PR) / (SMAPE using tweets only)
If this ratio < 1, then using Time-dependent PR helps
Stationary nodes are those with small maximum change in scores
Non-stationary nodes are those with large maximum change in scores

We tried the same experiment with Wikipedia,
but there was no meaningful change in the prediction error.

Using Granger Causality to study link
relationships on Wikipedia
[Figure: grid of time series for Wikipedia pages, one panel per page (for example Science, Gackt, Germany, Earthquake), with the Earthquake and Richter Mag. panels highlighted.]
Earthquake causes Richter Mag.?
Of course! We build this into the model.

But, the question is, which of
these are preserved after
incorporating the effects of
page view data?

Using Granger Causality to find the
important links on Wikipedia
Earthquake Granger causes   p-value
Seismic hazard              0.003535
Extensional tectonics       0.003033
Landslide dam               0.002406
Earthquake preparedness     0.001157
Richter magnitude scale     0.000584
Fault (geology)             0.000437
Aseismic creep              0.000419
Seismometer                 0.000284
Epicenter                   0.000020
Seismology                  0.000001

Thus, these links “fit” our
model, whereas the other links
on the page do not.

Application to the power grid
Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011; Halappanavar et al., 2012)
has found that graph properties have important
correlations with power-grid vulnerabilities and
contingency analysis

Each edge has a power
flow that satisfies some
non-linear power flow
equation.

We use average daily
flows to study time-
dependent PageRank
on the line graph of the
underlying network.

Lines with high variance
may be problematic?

My questions
Sample data to test this idea?
Too simplistic?

Time-dependent betweenness centrality

with cyclical teleportation?

Other power-grid problems where similar ideas
may be able to help?
