13-15. PageRank sucks!
(or NOT?)
The Hollywood graph we used contains 2,000,000 nodes. Most of them are completely unknown!
PageRank is not singling out the best actors...
...but still it is not pointing to random individuals, is it?
21-24. The glass is half full
Link Analysis is good, but you cannot expect any centrality index to work on any network.
PageRank is probably of little use on Hollywood, but maybe other measures would work like a charm.
Betweenness? Closeness? Katz? ...
A problem here: some indices are computationally unfeasible on large networks!
27-28. The grand plan
Centrality indices / link analysis have been with us for 60+ years.
IR, sociology, and psychology have all given their contributions...
...turning it into a jungle.
30-32. My point, today
By contract, I have to convince you that networks contain a great deal of information useful for IR.
But you have to use them in a proper way (i.e., compute suitable indices).
And more often than not this calls for new algorithms, because of their size (and sometimes density).
34-36. Centrality in social sciences
a historical account
First works by Bavelas at MIT (1948).
This sparked countless works (Bavelas 1951; Katz 1953; Shaw 1954; Beauchamp 1965; Mackenzie 1966; Burgess 1969; Anthonisse 1971; Czepiel 1974...) that Freeman (1979) tried to summarize, concluding that:
"several measures are often only vaguely related to the intuitive ideas they purport to index, and many are so complex that it is difficult or impossible to discover what, if anything, they are measuring"
38-43. The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong revival of centrality (called LAR in this context).
New scenarios:
• graphs are mainly directed
• they are huge
• new attention to efficiency
LAR is the ungrateful reincarnation of Centrality.
45-47. What a mess!
Only a few brave souls tried to make some order in this mess!
Noteworthy (in the IR context): Craswell, Upstill, Hawking (ADCS 2003); Najork, Zaragoza, Taylor (SIGIR 2007); Najork, Gollapudi, Panigrahy (WSDM 2009).
My guess is that the others were scared off by the computational burden of the classical measures.
49-50. Begin the begin
The only point on which everybody seems to agree: the center of a star is more important than the other nodes.
But what makes it more important?
59-62. A tale of three tribes
Indices based on degree;
Indices based on the number of paths or shortest paths (geodesics) passing through a vertex;
Indices based on distances from the vertex to the other vertices.
Let me call these last indices geometric.
64-65. Three dogs strive for a bone, and the fourth runs away with it...
The advent of Link Analysis pushed for a fourth (winning) tribe: spectral indices.
Some of the geometric indices also have an equivalent spectral definition.
67-68. 1) The degree tribe
(In-)Degree centrality: the number of incoming links:

c_{\mathrm{deg}}(x) = d^-(x)

Careful: when dealing with directed networks, some indices present two variants (e.g., in-degree vs. out-degree), the ones based on incoming paths being more interesting.
70-75. 2) The path tribe
Betweenness centrality (Anthonisse 1971):

c_{\mathrm{betw}}(x) = \sum_{y,z \neq x} \frac{\sigma_{yz}(x)}{\sigma_{yz}}

where \sigma_{yz}(x)/\sigma_{yz} is the fraction of shortest paths from y to z passing through x.

Katz centrality (Katz 1953):

c_{\mathrm{Katz}}(x) = \sum_{t=0}^{\infty} \alpha^t \, \pi_x(t)

where \pi_x(t) is the number of paths of length t ending in x.
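The truncated Katz series is easy to sketch. The following toy code (the graph, the node count `n`, and the value of α are hypothetical illustrations, not from the talk) pushes path counts along the arcs and accumulates the α-weighted sum:

```python
# Katz centrality as a truncated power series (toy sketch).
# counts[x] = number of paths of length t ending in x; the next vector is
# obtained by pushing every count along every arc y -> x.
def katz(succ, n, alpha, iters=50):
    counts = [1.0] * n          # exactly one path of length 0 ends in each node
    score = [1.0] * n           # the alpha^0 term of the series
    w = alpha
    for _ in range(iters):
        nxt = [0.0] * n
        for y in range(n):
            for x in succ[y]:
                nxt[x] += counts[y]
        counts = nxt
        for x in range(n):
            score[x] += w * counts[x]
        w *= alpha
    return score
```

On a star whose leaves all point to the center, the center picks up the extra α-weighted paths and outranks the leaves, as expected.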
77-83. 3) The distance tribe
Closeness centrality (Bavelas 1950):

c_{\mathrm{clos}}(x) = \frac{1}{\sum_y d(y, x)}

where d(y, x) is the distance from y to x.

Lin centrality (Lin 1976), completely neglected by the literature:

c_{\mathrm{Lin}}(x) = \frac{\mathrm{canReach}(x)^2}{\sum_y d(y, x)}

In both cases the summation is over all y such that d(y, x) < ∞.
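As a sketch of how these two indices handle unreachable nodes, here is a minimal BFS-based computation (the toy graph and the `pred` representation are my own illustration, not from the talk). Since d(y, x) measures distance *to* x, we search backwards from x along predecessor lists:

```python
# Closeness and Lin centrality on a small unweighted directed graph.
from collections import deque

def distances_to(x, pred):           # pred[x] = nodes with an arc into x
    dist = {x: 0}
    q = deque([x])
    while q:
        u = q.popleft()
        for v in pred[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist                      # maps y -> d(y, x), finite distances only

def closeness(x, pred):
    s = sum(distances_to(x, pred).values())
    return 1.0 / s if s > 0 else 0.0

def lin(x, pred):
    dist = distances_to(x, pred)
    s = sum(dist.values())
    # canReach(x) = nodes at finite distance to x (including x itself);
    # by convention a node nobody can reach gets Lin centrality 1.
    return len(dist) ** 2 / s if s > 0 else 1.0
```

Note how the two differ on a node with no incoming paths: closeness degenerates, while Lin's reachable-set correction gives a sensible baseline value.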
85-88. 3) The distance tribe
a new member
Give a warm welcome to Harmonic centrality:

c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y, x)}

The denormalized reciprocal of the harmonic mean of all distances (even ∞).
Inspired by the use of the harmonic mean in (Marchiori, Latora 2000).
It has probably already appeared somewhere (e.g., it is quoted for undirected graphs in Tore Opsahl's blog).
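A minimal sketch of harmonic centrality (same toy `pred` representation as above is assumed; it is not from the talk). The 1/∞ = 0 convention means unreachable nodes simply contribute nothing, so no special casing is needed:

```python
# Harmonic centrality: sum of reciprocal distances d(y, x), computed by a
# single BFS backwards from x; unreachable y are never visited (1/inf = 0).
from collections import deque

def harmonic(x, pred):               # pred[x] = nodes with an arc into x
    dist = {x: 0}
    q = deque([x])
    score = 0.0
    while q:
        u = q.popleft()
        for v in pred[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                score += 1.0 / dist[v]
                q.append(v)
    return score
```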
92-96. PageRank
(Page, Brin, Motwani, Winograd 1999)
The idea is to start from the basic equation...

c_{\mathrm{pr}}(x) = \sum_{y \to x} \frac{c_{\mathrm{pr}}(y)}{d^+(y)}

...with an adjustment to make it have a unique solution (and more)...

c_{\mathrm{pr}}(x) = \alpha \sum_{y \to x} \frac{c_{\mathrm{pr}}(y)}{d^+(y)} + \frac{1 - \alpha}{n}

It is the dominant eigenvector of \alpha G_r + (1 - \alpha)\,\mathbf{1}^T\mathbf{1}/n.
Katz is the dominant eigenvector of \alpha G + (1 - \alpha)\,e^T\mathbf{1}/n.
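The adjusted equation can be solved by power iteration. This is a hedged sketch (the uniform treatment of dangling nodes is one common choice, not necessarily the one the talk assumes; the graph is a toy):

```python
# PageRank by power iteration on the damped equation.
def pagerank(succ, n, alpha=0.85, iters=100):
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - alpha) / n] * n    # the (1 - alpha)/n teleport term
        for y in range(n):
            if succ[y]:
                share = alpha * rank[y] / len(succ[y])
                for x in succ[y]:
                    nxt[x] += share
            else:                        # dangling node: spread rank uniformly
                for x in range(n):
                    nxt[x] += alpha * rank[y] / n
        rank = nxt
    return rank
```

With this patch for dangling nodes the scores keep summing to 1 at every iteration.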
98-101. Seeley index
(Seeley 1949)
It is essentially like PageRank with no damping factor; equivalently, the stable state of the natural random walk on G:

c_{\mathrm{Seeley}}(x) = \left( \lim_{t \to \infty} \frac{1}{n} (G_r)^t \right)_x

It is obtained as the limit of PageRank when the damping goes to 1.
In general it is a dominant eigenvector of G_r.
104-105. HITS
(Kleinberg 1997)
The idea is to start from the system:

c_{\mathrm{Hauth}}(x) = \sum_{y \to x} c_{\mathrm{Hhub}}(y)
c_{\mathrm{Hhub}}(x) = \sum_{x \to y} c_{\mathrm{Hauth}}(y)

HITS centrality is defined to be the "authoritativeness" score.
It is a dominant eigenvector of G^T G.
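The mutual recursion above can be sketched by alternating the two sums, normalizing each round so the iteration converges to the dominant eigenvector (toy code; the arc list and iteration count are illustrative assumptions):

```python
# HITS by alternating updates: authorities from hubs, hubs from authorities.
def hits(arcs, n, iters=50):
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iters):
        auth = [0.0] * n
        for (y, x) in arcs:              # an arc y -> x boosts x's authority
            auth[x] += hub[y]
        hub = [0.0] * n
        for (x, y) in arcs:              # an arc x -> y boosts x's hubness
            hub[x] += auth[y]
        norm = sum(a * a for a in auth) ** 0.5 or 1.0
        auth = [a / norm for a in auth]
        norm = sum(h * h for h in hub) ** 0.5 or 1.0
        hub = [h / norm for h in hub]
    return auth, hub
```

On a star with all arcs into node 0, the center gets all the authority and the leaves become the hubs.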
107-110. HITS (, SALSA etc.)
WARNING: These measures were proposed exactly for ranking results in hyperlinked collections.
They should be applied not to the whole graph, but to a suitable subgraph derived from the query.
How the subgraph is derived is very relevant for effectiveness (Najork, Gollapudi, Panigrahy 2009).
Not really the central point here, though...
112-114. SALSA
(Lempel, Moran 2001)
The idea is to start from the system:

c_{\mathrm{Sauth}}(x) = \sum_{y \to x} \frac{c_{\mathrm{Shub}}(y)}{d^+(y)}
c_{\mathrm{Shub}}(x) = \sum_{x \to y} \frac{c_{\mathrm{Sauth}}(y)}{d^-(y)}

SALSA centrality is defined to be the "authoritativeness" score.
It is a dominant eigenvector of G_c^T G_r.
124. Axiomatic
Sensitivity to size
When k or p goes to ∞, the nodes of the corresponding subnetwork must become more important.
[Figure: two disjoint (or very far apart) components, of sizes k and p, within a single network]
133-136. An axiomatic slaughter

Index        Size     Density  Monot.
Degree       only k   yes      yes
Betweenness  only p   no       no
Katz         only k   yes      yes
Closeness    no       no       no
Lin          only k   no       no
Harmonic     yes      yes      yes
PageRank     no       yes      yes
Seeley       no       yes      no
HITS         only k   yes      no
SALSA        no       yes      no
138. Ground-truth lens
(mostly: anecdotal / comparative)
Check them against real (social?) networks on which you have some ground truth about importance/centrality/...
139-140. Hollywood: PageRank
Ron Jeremy, Adolf Hitler, Lloyd Kaufman, George W. Bush, Ronald Reagan, Bill Clinton, Martin Sheen, Debbie Rochon
145-146. Hollywood: Katz
William Shatner, Martin Sheen, George Clooney, Robin Williams, Tom Hanks, Ronald Reagan, Bruce Willis, Samuel Jackson
147-149. Hollywood: Closeness
Lina Tjeng, Ryan Villapoto, Anh Loan Nguyen Thi, Chad Reed, Bjorn van Wenum, J.P. Ramackers, Herbert Sydney, R.D. Nicholson
Isolated nodes have largest centrality!
150. Hollywood: Lin
George Clooney, Samuel Jackson, Martin Sheen, Dennis Hopper, Antonio Banderas, Madonna, Michael Douglas, Tom Cruise
151. Hollywood: HITS
William Shatner, Robin Williams, Bruce Willis, Tom Hanks, Michael Douglas, Cameron Diaz, Harrison Ford, Pierce Brosnan
184-187. TREC .gov2
150 queries (query title words, in AND; with stemming, no stopword elimination).
Generated the result graph using the method described by Najork et al. 2009 (a variant of Kleinberg's HITS graph, taking a in-links and b out-links).
Considered many combinations: here I present only the case a = b = 0 (i.e., the subgraph induced by the result set).
With or without intra-host links.
198-199. Intra-host links?
Keep them or throw them away?
Most indices get better if you throw them away...
...but harmonic is better (and best of all) with the whole thing!
Throwing such links away injects a lot of information, but apparently harmonic doesn't need it!
232-235. Computing harmonic
Let us take harmonic as an example. But how easy is it to compute?

c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y, x)}

...or...

c_{\mathrm{harm}}(x) = \sum_{t=1}^{\infty} \frac{|B_t(x)| - |B_{t-1}(x)|}{t}

where B_t(x) is the ball of radius t about x.
239. Computing by diffusion
Clearly B_0(x) = \{x\}. Moreover

B_{t+1}(x) = \{x\} \cup \bigcup_{x \to y} B_t(y)

So one needs just a single sequential scan of the graph to compute the balls at the next iteration.
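With exact sets (feasible only at toy scale) the recurrence yields harmonic centrality directly. A small sketch of the idea, using my own names; note that, as written, the recurrence grows balls along *outgoing* arcs, so feed it the transposed graph if you want distances to x:

```python
# Harmonic centrality by the diffusion recurrence, with exact sets:
#   B_0(x) = {x},  B_{t+1}(x) = {x} union of B_t(y) over arcs x -> y,
#   c_harm(x) = sum over t of (|B_t(x)| - |B_{t-1}(x)|) / t.
def harmonic_by_balls(succ, n):
    balls = [{x} for x in range(n)]      # B_0
    score = [0.0] * n
    t = 1
    changed = True
    while changed:                        # stop when no ball grows
        changed = False
        nxt = []
        for x in range(n):
            b = {x}
            for y in succ[x]:
                b |= balls[y]             # one sequential scan of the graph
            nxt.append(b)
            grown = len(b) - len(balls[x])
            if grown:
                score[x] += grown / t     # nodes at distance exactly t
                changed = True
        balls = nxt
        t += 1
    return score
```

Each ball is an explicit set here, which is exactly the quadratic-space problem the next slides address.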
268-270. Easy but expensive
Each set uses linear space; overall quadratic. Impossible!
But what if we use approximate sets?
Idea: use probabilistic counters, which represent sets but answer only "size?" questions.
Very small! With 40 bits you can count up to 4 billion with a standard deviation of 6%.
273-274. Use the Force
HyperBall (http://webgraph.dsi.unimi.it/) does the job.
It fully exploits multicore architectures and uses broadword microparallelization.
It works like a charm on networks with hundreds of millions of nodes.
277-279. What is HyperBall
It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + number of reachable nodes...
It employs Flajolet's HyperLogLog counters to store sets.
More on HyperBall in the next few slides...
282-285. Main trick
Choose an approximate set representation such that unions can be computed quickly.
ANF [Palmer et al., KDD '02] uses Flajolet-Martin (FM) counters (log n + c space); FM counters can be combined with an OR.
We use HyperLogLog counters [Flajolet et al., 2007] (log log n space), and broadword programming to combine them quickly!
288-291. HyperLogLog counters
Instead of actually counting, we observe a statistical feature of a set (think: stream) of elements.
The feature: the number of trailing zeroes of the value of a very good hash function.
We keep track of the maximum m (log log n bits!).
The number of distinct elements is ∝ 2^m.
Important: the counter of stream AB is simply the maximum of the counters of A and B!
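A toy single-register version of this idea (real HyperLogLog keeps many registers and applies bias-corrected averaging, so this sketch has huge variance; all names here are my own):

```python
# Toy probabilistic counter: hash each element, track the maximum number of
# trailing zeros m, estimate ~ 2^m, and merge two counters by taking the max.
import hashlib

def _hash(item):
    d = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(d[:8], "big")    # 64-bit hash value

def _trailing_zeros(h):
    return (h & -h).bit_length() - 1 if h else 64

class ToyCounter:
    def __init__(self):
        self.m = 0
    def add(self, item):
        self.m = max(self.m, _trailing_zeros(_hash(item)))
    def merge(self, other):
        # Counter of the union of two streams: just the max of the registers!
        self.m = max(self.m, other.m)
    def estimate(self):
        return 2 ** self.m
```

The merge-by-max property is what makes the diffusion step cheap: the union of two balls costs one register comparison instead of a set union.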
293-295. Other ideas
We keep track of modifications: we do not maximize with unmodified counters.
Systolic computation: each modified set signals back to its predecessors that something is going to happen (many fewer updates!).
Multicore exploitation by decomposition: a task updates just 1000 counters (almost linear scaling).
299-300. Footprint
Scalability: a minimum of 20 bytes per node; on a 2 TiB machine, 100 billion nodes.
The graph structure is accessed by memory-mapping in a compressed form (WebGraph).
Pointers into the graph are stored using succinct lists (Elias-Fano representation).
303-306. Performance
On a graph with 177K nodes:
Hadoop: 2875 s per iteration [Kang, Papadimitriou, Sun and Tong, 2011]
HyperBall on this laptop: 70 s per iteration
On a 32-core workstation: 23 s per iteration
On ClueWeb09 (4.8G nodes, 8G arcs), on a 40-core workstation: 141 min (avg. 40 s per iteration).
309-310. And when the Force is not enough?
Diffusion processes are easily distributable.
A Pregel-like implementation of HyperANF is underway (by Sebastian Schelter, TU Berlin).
312-315. Did you see the light?
No? Me neither... But we have a path.
Some things have been ruled out, at least.
Some others, that were neglected, have found their revenge.
Some new ones seem to be promising.