SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 1/25
Diversified Recommendation on Graphs:
Pitfalls, Measures, and Algorithms
Onur Küçüktunç1,2
Erik Saule1
Kamer Kaya1
Ümit V. Çatalyürek1,3
WWW 2013, May 13–17, 2013, Rio de Janeiro, Brazil.
1Dept. Biomedical Informatics
2Dept. of Computer Science and Engineering
3Dept. of Electrical and Computer Engineering
The Ohio State University
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 2/25
Outline
•  Problem definition
–  Motivation
–  Result diversification algorithms
•  How to measure diversity
–  Classical relevance and diversity measures
–  Bicriteria optimization?!
–  Combined measures
•  Best Coverage method
–  Complexity, submodularity
–  A greedy solution, relaxation
•  Experiments
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 3/25
Problem definition
G = (V, E)
Q ✓ V
Online shopping
product
co-purchasing
•  one product
•  previous purchases
•  page visit history
Academic
paper-to-paper
citations
•  paper/field of interest
•  set of references
•  researcher himself/herself
product recommendations
“you might also like…”R ⇢ V references for related work
new collaborators
collaboration
network
Social
friendship
network
•  user himself/herself
•  set of people
friend recommendations
“you might also know…”
Let G = (V, E) be an undirected graph.
Given a set of m seed nodes
Q = {q1, . . . , qm} s.t. Q ✓ V ,
and a parameter k, return top-k items
which are relevant to the ones in Q,
but diverse among themselves, covering
di↵erent aspects of the query.
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 4/25
Problem definition
Let G = (V, E) be an undirected graph.
Given a set of m seed nodes
Q = {q1, . . . , qm} s.t. Q ✓ V ,
and a parameter k, return top-k items
which are relevant to the ones in Q,
but diverse among themselves, covering
di↵erent aspects of the query.
•  We assume that the graph itself is the only information we have, and
no categories or intents are available
•  no comparisons to intent-aware algorithms [Agrawal09,Welch11,etc.]
•  but we will compare against intent-aware measures
•  Relevance scores are obtained with Personalized PageRank (PPR)
[Haveliwala02]
p⇤
(v) =
(
1/m, if v 2 Q
0, otherwise.
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 5/25
Result diversification algorithms
•  GrassHopper [Zhu07]
–  ranks the graph k times
•  turns the highest-ranked vertex into a sink node at each iteration
0 5 10
0
2
4
6
8
0 5 10
0
5
10
0
0.005
0.01
0.015
g1
0 5 10
0
5
10
0
2
4
6
g2
00
5
10
0
0.5
1
1.5
(a) (b) (c) (d)
Figure 1: (a) A toy data set. (b) The stationary distribution π reflects centrality. The item with
probability is selected as the first item g1. (c) The expected number of visits v to each node after g
an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2,
0 5 10
0
2
4
6
8
0 5 10
0
5
10
0
0.005
0.01
0.015
g1
0 5 10
0
5
10
0
2
4
6
g2
00
5
10
0
0.5
1
1.5
(a) (b) (c) (d
Figure 1: (a) A toy data set. (b) The stationary distribution π reflects centrality. The item with
probability is selected as the first item g1. (c) The expected number of visits v to each node after
an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2
come from different groups.
Items at group centers have higher probabilities, and 2.3 Ranking the Remaining Items
5 10 0 5 10
0
5
10
0
0.005
0.01
0.015
g1
0 5 10
0
5
10
0
2
4
6
g2
0 5 10
0
5
10
0
0.5
1
1.5
g
3
(a) (b) (c) (d)
highest-ranked
vertex
R = {g1}
R = {g1,g2}
R = {g1,g2,g3}
g1 turned into
a sink node
highest-ranked
in the next step
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 6/25
Result diversification algorithms
•  GrassHopper [Zhu07]
–  ranks the graph k times
•  turns the highest-ranked vertex into a sink node at each iteration
•  DivRank [Mei10]
–  based on vertex-reinforced random walks (VRRW)
•  adjusts the transition matrix based on the number of visits to the
vertices (rich-gets-richer mechanism)
(a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.(a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.(a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.sample graph weighting with PPR diverse weighting
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 7/25
Result diversification algorithms
•  GrassHopper [Zhu07]
–  ranks the graph k times
•  turns the highest-ranked vertex into a sink node at each iteration
•  DivRank [Mei10]
–  based on vertex-reinforced random walks (VRRW)
•  adjusts the transition matrix based on the number of visits to the
vertices (rich-gets-richer mechanism)
•  Dragon [Tong11]
–  based on optimizing the goodness measure
•  punishes the score when two neighbors are included in the
results
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 8/25
Measuring diversity
Relevance measures
•  Normalized relevance
•  Difference ratio
•  nDCG
Diversity measures
•  l-step graph density
•  l-expansion ratio
rel(S) =
P
v2S ⇡v
Pk
i=1 ˆ⇡i
di↵(S, ˆS) = 1
|S  ˆS|
|S|
nDCGk =
⇡s1 +
Pk
i=2
⇡si
log2 i
ˆ⇡1 +
Pk
i=2
ˆ⇡i
log2 i
dens`(S) =
P
u,v2S,u6=v d`(u, v)
|S| ⇥ (|S| 1)
`(S) =
|N`(S)|
n
N`(S) = S [ {v 2 (V S) : 9u 2 S, d(u, v)  `}
where
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 9/25
Bicriteria optimization measures
•  aggregate a relevance and a diversity measure
•  [Carbonell98]
•  [Li11]
•  [Vieira11]
•  max-sum diversification, max-min diversification,
k-similar diversification set, etc. [Gollapudi09]
fMMR(S) = (1 )
X
v2S
⇡v
X
u2S
max
v2S
u6=v
sim(u, v)
fL(S) =
X
v2S
⇡v +
|N(S)|
n
fMSD(S) = (k 1)(1 )
X
v2S
⇡v + 2
X
u2S
X
v2S
u6=v
div(u, v)
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 10/25
Bicriteria optimization is not the answer
•  Objective: diversify top-10 results
•  Two query-oblivious algorithms:
–  top-% + random
–  top-% + greedy-σ2
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 11/25
Bicriteria optimization is not the answer
•  normalized relevance and 2-step graph density
•  evaluating result diversification as a bicriteria optimization problem with
–  a relevance measure that ignores diversity, and
–  a diversity measure that ignores relevancy.
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
dens2
rel
10-RLM
BC1
BC1V
BC2
BC2V
BC1100
BC11000
BC150
BC2100
BC21000
BC250
CDivRank
DRAGON
GRASSHOPPER
GSPARSE
IL1
IL2
LM
PDivRank
PR
k-RLM
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All Random
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
dens2
rel
10-RLM
BC1
BC1V
BC2
BC2V
BC1100
BC11000
BC150
BC2100
BC21000
BC250
CDivRank
DRAGON
GRASSHOPPER
GSPARSE
IL1
IL2
LM
PDivRank
PR
k-RLM
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All Random
better
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
dens2
rel
10-RLM
BC1
BC1V
BC2
BC2V
BC1100
BC11000
BC150
BC2100
BC21000
BC250
CDivRank
DRAGON
GRASSHOPPER
GSPARSE
IL1
IL2
LM
PDivRank
PR
k-RLM
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All Random
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
dens2
rel
10-RLM
BC1
BC1V
BC2
BC2V
BC1100
BC11000
BC150
BC2100
BC21000
BC250
CDivRank
DRAGON
GRASSHOPPER
GSPARSE
IL1
IL2
LM
PDivRank
PR
k-RLM
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All Random
better
0
0.05
0.1
0.15
0.2
0.25
0.3
0 0.2 0.4 0.6 0.8 1
σ2
rel
0
0.05
0.1
0.15
0.2
0.25
0.3
0 0.2 0.4 0.6 0.8 1
σ2
rel
better
2 0.4 0.6 0.8 1
others
others
others
others
others
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All random
others
others
others
top-90%+greedy-σ2
top-75%+greedy-σ2
top-50%+greedy-σ2
top-25%+greedy-σ2
All greedy-σ2
others
others
others
other algorithms
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8
others
others
others
others
others
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All random
others
others
others
top-90%+greedy-σ2
top-75%+greedy-σ2
top-50%+greedy-σ2
top-25%+greedy-σ2
All greedy-σ2
others
others
others
other algorithms
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 12/25
A better measure? Combine both
•  We need a combined measure that tightly
integrates both relevance and diversity aspects
of the result set
•  goodness [Tong11]
–  downside: highly dominated by relevance
fG(S) = 2
X
i2S
⇡i d
X
i,j2S
A(j, i)⇡j
(1 d)
X
j2S
⇡j
X
i2S
p⇤
(i)max-sum relevance
penalize the score when two results share
an edge
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 13/25
Proposed measure: l-step expanded relevance
•  a combined measure of
–  l-step expansion ratio (σ2)
–  relevance scores (π)
•  quantifies: relevance of
the covered region
of the graph
•  do some sanity check
with this new measure
`-step expanded relevance:
exprel`(S) =
X
v2N`(S)
⇡v
where N`(S) is the `-step expansion
set of the result set S, and ⇡ is the
PPR scores of the items in the graph.
0
0.2
0.4
0.6
0.8
1
510 20 50 100
exprel2
k
0
0.2
0.4
0.6
0.8
1
510 20 50 100
exprel2
k
0.2
0.4
0.6
0.8
1
others
others
others
others
others
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All random
others
others
others
top-90%+greedy-σ2
top-75%+greedy-σ2
top-50%+greedy-σ2
top-25%+greedy-σ2
All greedy-σ2
others
others
others
other algorithms
0.2
0.4
0.6
0.8
1
others
others
others
others
others
top-90%+random
top-75%+random
top-50%+random
top-25%+random
All random
others
others
others
top-90%+greedy-σ2
top-75%+greedy-σ2
top-50%+greedy-σ2
top-25%+greedy-σ2
All greedy-σ2
others
others
others
other algorithms
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 14/25
Correlations of the measures
relevancediversity
goodness is dominated by
the relevancy measures
exprel has no high correlations with
other relevance or diversity measures
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 15/25
Proposed algorithm: Best Coverage
•  Can we use -step expanded
relevance as an objective function?
•  Define:
•  Complexity: generalization of weighted maximum coverage problem
–  NP-hard!
–  but exprell is a submodular function (Lemma 4.2)
–  a greedy solution (Algorithm 1) that selects the item
with the highest marginal utility
at each step is the best possible polynomial time
approximation (proof based on [Nemhauser78])
•  Relaxation: computes BestCoverage on
highest ranked vertices to improve runtime
exprel`-diversified top-k ranking (DTR`)
S = argmax
S0
✓V
|S0
|=k
exprel`(S0
)
g(v, S) =
P
v02N`({v}) N`(S) ⇡v0
ALGORITHM 1: BestCoverage
Input: k, G, ⇡, `
Output: a list of recommendations S
S = ;
while |S| < k do
v⇤
argmaxv g(v, S)
S S [ {v⇤
}
return S
ALGORITHM 1: BestCoverage (relaxed)
ALGORITHM 2: BestCoverage (relaxed)
Input: k, G, ⇡, `
Output: a list of recommendations S
S = ;
Sort(V ) w.r.t ⇡i non-increasing
S1 V [1..k0
], i.e., top-k0
vertices where k0
= k¯`
8v 2 S1, g(v) g(v, ;)
8v 2 S1, c(v) Uncovered
while |S| < k do
v⇤
argmaxv2S1 g(v)
S S [ {v⇤
}
S2 N`({v⇤
})
for each v0
2 S2 do
if c(v0
) = Uncovered then
S3 N`({v0
})
8u 2 S3, g(u) g(u) ⇡v0
c(v0
) Covered
return S
`
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 16/25
Experiments
•  5 target application areas, 5 graphs from SNAP
•  Queries generated based on 3 scenario types
–  one random vertex
–  random vertices from one area of interest
–  multiple vertices from multiple areas of interest
Dataset |V | |E| ¯ D D90% CC
amazon0601 403.3K 3.3M 16.8 21 7.6 0.42
ca-AstroPh 18.7K 396.1K 42.2 14 5.1 0.63
cit-Patents 3.7M 16.5M 8.7 22 9.4 0.09
soc-LiveJournal1 4.8M 68.9M 28.4 18 6.5 0.31
web-Google 875.7K 5.1M 11.6 22 8.1 0.60
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 17/25
Results – relevance
•  Methods should trade-off relevance for better diversity
•  Normalized relevance of top-k set is always 1
•  DRAGON always return results having 70% similar items
to top-k, with more than 80% rel score
amazon0601, combined
0
0.2
0.4
0.6
0.8
1
5 10 20 50 100
rel
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
0
0.2
0.4
0.6
0.8
1
5 10 20 50 100
rel
k
PPR (top-k
GrassHop
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relax
soc-LiveJournal1, combined
10
100
1000
10000
5 10 20 50 100
time(sec)
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
10
100
1000
10000
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
amazon0601, combined
0.6
0.8
1
diff
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
0.6
0.8
1
diff
PPR (top-
GrassHop
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relax
soc-LiveJournal1, combined
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
5 10 2
σ2
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
5 10 20 50 100
σ2
k
P
G
D
P
C
k
G
B
B
B
B
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
5 10 20 50
σ2
k
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
5 10 2
σ2
Figure
ing k.
highest
and k-R
0.8
top-k
DRAGON
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 18/25
Results – coverage
•  l-step expansion ratio (σ2) gives the graph coverage of
the result set: better coverage = better diversity
•  BestCoverage and DivRank variants, especially
BC2 and PDivRank, have the highest coverage
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
ned
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
ined
amazon0601, combined
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
5 10 20 50 100
σ2
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
5 10 20 50 100
σ2
k
PPR (top-k
GrassHop
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relax
BC2 (relax
ca-AstroPh, combined
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
σ2
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
5 10 20 50 100
σ2
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
Figure 3: Coverage ( 2) of the algorithms with vary-
ing k. BestCoverage and DivRank variants have the
highest coverage on the graphs while Dragon, GSparse,
and k-RLM have similar coverages to top-k results.
amazon0601, combined
0.8
PPR (top-k)
0.8
PPR (top-k)
soc-LiveJournal1, combined
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 19/25
Results – expanded relevance
•  combined measure for relevance and diversity
•  BestCoverage variants and GrassHopper perform better
•  Although PDivRank gives the highest coverage on
amazon graph, it fails to cover the relevant parts!
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
ent
rse
ele-
the
2.3,
ned
ores
0
0.1
0.2
0.3
0.4
5 10 20 50 100
σ
k
BC2 (relaxed)
0
0.1
0.2
5 10 20 50 100
k
ing k. BestCoverage and DivRank variants have the
highest coverage on the graphs while Dragon, GSparse,
and k-RLM have similar coverages to top-k results.
amazon0601, c o m b i n e d
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
5 10 20 50 100
exprel2
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
5 10 20 50 100
exprel2
k
PPR (top-k
GrassHopp
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxe
soc-LiveJournal1, c o m b i n e d
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
5 10 20 50 100
exprel2
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
Figure 4: Expanded relevance (exprel2 ) with vary-
ing k. BC1 and BC2 variants mostly score the best,
GrassHopper performs high in soc-LiveJournal1. Al-
though PDivRank gives the highest coverage on ama-
zon0601 (Fig. 3), it fails to cover the relevant parts.
BC2
BC1
PDivRank
GrassHopper
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 20/25
Results – efficiency
•  BC1 always performs
better, with a running
time less than, DivRank
and GrassHopper
•  BC1 (relaxed) offers
reasonable diversity,
with a very little
overhead on top of the
PPR computation
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
3
)
)
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
3
0.01
0.1
1
10
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
20 50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
ca-AstroPh, combined
10
100
1000
10000
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
0.1
1
10
100
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
10
100
1000
10000
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC1 (relaxed)
soc-LiveJournal1, combined
10
100
1000
10000
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
100
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
50 100
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
cit-Patents, scenario 1
1
10
100
1000
5 10 20 50 100
time(sec)
k
PPR (top-k)
GrassHopper
Dragon
PDivRank
CDivRank
k-RLM
GSparse
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
web-Google, scenario 1
Figure 7: Running times of the algorithms with
varying k. BC1 method always perform better with a
running time less than GrassHopper and DivRank vari-
ants, while the relaxed versions score similarly with
a slight overhead on top of the PPR computation.
BC1
BC1
BC1
BC1
DivRank
DivRank
DivRank DivRank
BC1 (relaxed)
BC1 (relaxed)
BC1 (relaxed)BC1 (relaxed)
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 21/25
Results – intent aware experiments
•  evaluation of intent-oblivious algorithms against
intent-aware measures
•  two measures
–  group coverage [Li11]
–  S-recall [Zhai03]
•  cit-Patent dataset has the categorical
information
–  426 class labels, belong to 36 subtopics
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 22/25
Results – intent aware experiments
•  group coverage [Li11]
–  How many different groups are covered by the results?
–  omits the actual intent of the query
•  top-k results are not diverse enough
•  AllRandom results cover the most number of groups
•  PDivRank and BC2 follows
0
10
20
30
40
50
60
70
5 10 20 50 100
Classcoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(a) Class coverage
0
5
10
15
20
25
30
5 10 20 50 100
Subtopiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(b) Subtopic coverage
0
1
2
3
4
5
6
5 10 20
Topiccoverage
(c) Topic
0
5
10
15
20
25
30
5 10 20 50
Subtopiccoverage
k
50 100
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
) Topic coverage
100
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
100
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
ge
0
5
10
15
20
25
30
5 10 20 50 100
Subtopiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(b) Subtopic coverage
0
1
2
3
4
5
6
5 10 20 50 100
Topiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(c) Topic coverage
0
5
10
15
20
25
30
5 10 20 50 100
Subtopiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
ses (e) S-recall on subtopics (f) S-recall on topics
Intent-aware results on cit-Patents dataset with scenario-3 queries.
top-k
top-k
random
BC2
BC2
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 23/25
0
10
20
30
40
50
60
5 10 20 50 100
Classcoverage
k
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(a) Class coverage
0
5
10
15
20
25
5 10 20 50 100
Subtopiccoverage
k
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(b) Subtopic coverage
0
1
2
3
4
5
5 10 20
Topiccoverage
(c) Topic
0
5
10
15
20
25
5 10 20 50
Subtopiccoverage
k
0
0.1
0.2
0.3
0.4
0.5
0.6
5 10 20
S-recall(class)
k
(d) S-recall on classes
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
5 10 20
S-recall(subtopic) k
(e) S-recall on subtopics (f) S-recal
Figure 8: Intent-aware results on cit-Patents dataset with sc
level topics7
. Here we present an evaluation of the intent-
oblivious algorithms against intent-aware measures. This
evaluation provides a validation of the diversification tech-
niques with an external measure such as group coverage [14]
and S-recall [23].
sults of all of the dive
those two extremes, w
and Dragon covers the
However, S-recall ind
was actually useful or
Results – intent aware experiments
•  S-recall [Zhai03], Intent-coverage [Zhu11]
–  percentage of relevant subtopics covered by the result set
–  the intent is given with the classes of the seed nodes
•  AllRandom brings irrelevant items from the search space
•  top-k results do not have the necessary diversity
•  BC2 variants and BC1 perform better than DivRank
•  BC1 (relaxed) and DivRank scores similar, but BC1r much faster
top-k
random
DivRank
BC
0
10
20
30
40
50
60
70
5 10 20 50 100
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(a) Class coverage
0
5
10
15
20
25
30
5 10 20 50 100
Subtopiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(b) Subtopic coverage
0
1
2
3
4
5
6
5 10 20 50 100
Topiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(c) Topic coverage
0
5
10
15
20
25
30
5 10 20 50 100
Subtopiccoverage
k
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
(d) S-recall on classes (e) S-recall on subtopics (f) S-recall on topics
PPR (top-k)
Dragon
PDivRank
CDivRank
k-RLM
BC1
BC2
BC1 (relaxed)
BC2 (relaxed)
AllRandom
Figure 8: Intent-aware results on cit-Patents dataset with scenario-3 queries.
pics7
. Here we present an evaluation of the intent-
s algorithms against intent-aware measures. This
on provides a validation of the diversification tech-
with an external measure such as group coverage [14]
ecall [23].
ts of a query set Q is extracted by collecting the
subtopics, and topics of each seed node. Since our
o evaluate the results based on the coverage of dif-
sults of all of the diversification algorithms reside between
those two extremes, where the PDivRank covers the most
and Dragon covers the least number of groups.
However, S-recall index measures whether a covered group
was actually useful or not. Obviously, AllRandom scores the
lowest as it dismisses the actual query (you may omit the S-
recall on topics since there are only 6 groups in this granular
ity level). Among the algorithms, BC2 variants and BC1 score
random
random
random
random
random
top-k
top-k
top-k
top-k
top-k
DivRank
DivRank
DivRank
DivRank
DR
BC
BC
BC
BC
BC
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 24/25
Conclusions
•  Result diversification should not be evaluated as a
bicriteria optimization problem with
–  a relevance measure that ignores diversity, and
–  a diversity measure that ignores relevancy
•  l-step expanded relevance is a simple measure that
combines both relevance and diversity
•  BestCoverage, a greedy solution that maximizes
exprell is a (1-1/e)-approximation of the optimal solution
•  BestCoverage variants perform better than others, its
relaxation is extremely efficient
•  goodness in DRAGON is dominated by relevancy
•  DivRank variants implicitly optimize expansion ratio
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 25/25
Thank you
•  For more information visit
•  http://bmi.osu.edu/hpc
•  Research at the HPC Lab is funded by

Mais conteúdo relacionado

Destaque

El sistema científico tecnológico de venezuela
El sistema científico tecnológico de venezuelaEl sistema científico tecnológico de venezuela
El sistema científico tecnológico de venezuelanahomyc
 
proceso unico de ejecucion de garantias
proceso unico de ejecucion de garantiasproceso unico de ejecucion de garantias
proceso unico de ejecucion de garantiasmaria_leo
 
Livret recettes bridelight
Livret recettes bridelightLivret recettes bridelight
Livret recettes bridelightsarah Benmerzouk
 
cas resolucion de conflictos
cas resolucion de conflictoscas resolucion de conflictos
cas resolucion de conflictosLaura Ortiz
 
Dasar kurikulum smkadg 2013
Dasar kurikulum smkadg 2013Dasar kurikulum smkadg 2013
Dasar kurikulum smkadg 2013Aliff Hafiz
 
Capacitación Docente
Capacitación DocenteCapacitación Docente
Capacitación DocenteGaby Banegas
 
Juzgados de paz letrado
Juzgados de paz letradoJuzgados de paz letrado
Juzgados de paz letradosageta12
 
Strategy During turbulent times
Strategy During  turbulent timesStrategy During  turbulent times
Strategy During turbulent timesKannan R
 
Raising Capital for an Event Business
Raising Capital for an Event BusinessRaising Capital for an Event Business
Raising Capital for an Event BusinessSteven GINTOWT
 
Electrólisis de una disolución acuosa de yoduro de potasio
Electrólisis de una disolución acuosa de yoduro de potasioElectrólisis de una disolución acuosa de yoduro de potasio
Electrólisis de una disolución acuosa de yoduro de potasioMarii Michaus
 

Destaque (17)

El sistema científico tecnológico de venezuela
El sistema científico tecnológico de venezuelaEl sistema científico tecnológico de venezuela
El sistema científico tecnológico de venezuela
 
Enfermería kathy
Enfermería kathyEnfermería kathy
Enfermería kathy
 
proceso unico de ejecucion de garantias
proceso unico de ejecucion de garantiasproceso unico de ejecucion de garantias
proceso unico de ejecucion de garantias
 
Livret recettes bridelight
Livret recettes bridelightLivret recettes bridelight
Livret recettes bridelight
 
cas resolucion de conflictos
cas resolucion de conflictoscas resolucion de conflictos
cas resolucion de conflictos
 
Mapa conceptual
Mapa conceptualMapa conceptual
Mapa conceptual
 
Dasar kurikulum smkadg 2013
Dasar kurikulum smkadg 2013Dasar kurikulum smkadg 2013
Dasar kurikulum smkadg 2013
 
Ma présentation osvaldo
Ma présentation osvaldoMa présentation osvaldo
Ma présentation osvaldo
 
Capacitación Docente
Capacitación DocenteCapacitación Docente
Capacitación Docente
 
CCNA Security
CCNA SecurityCCNA Security
CCNA Security
 
Juzgados de paz letrado
Juzgados de paz letradoJuzgados de paz letrado
Juzgados de paz letrado
 
Strategy During turbulent times
Strategy During  turbulent timesStrategy During  turbulent times
Strategy During turbulent times
 
Raising Capital for an Event Business
Raising Capital for an Event BusinessRaising Capital for an Event Business
Raising Capital for an Event Business
 
Cultura ciudad
Cultura ciudadCultura ciudad
Cultura ciudad
 
October 28 (EE)
October 28 (EE)October 28 (EE)
October 28 (EE)
 
SVC Logo
SVC LogoSVC Logo
SVC Logo
 
Electrólisis de una disolución acuosa de yoduro de potasio
Electrólisis de una disolución acuosa de yoduro de potasioElectrólisis de una disolución acuosa de yoduro de potasio
Electrólisis de una disolución acuosa de yoduro de potasio
 

Semelhante a Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Introduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesIntroduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesAdi Handarbeni
 
Social network analysis: Quadratic assignment procedure
Social network analysis: Quadratic assignment procedureSocial network analysis: Quadratic assignment procedure
Social network analysis: Quadratic assignment procedureMatthew Gwynfryn Thomas
 
QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28Aritra Sarkar
 
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...Understanding how is that adaptive cursor sharing (acs) produces multiple opt...
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...Enkitec
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsAleksandr Chuklin
 
Quantum Gaussian Processes - Gawel Kus
Quantum Gaussian Processes - Gawel KusQuantum Gaussian Processes - Gawel Kus
Quantum Gaussian Processes - Gawel KusAdvanced-Concepts-Team
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep LearningAnomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep LearningQuantUniversity
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...NAVER Engineering
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Aritra Sarkar
 
Anomaly detection Meetup Slides
Anomaly detection Meetup SlidesAnomaly detection Meetup Slides
Anomaly detection Meetup SlidesQuantUniversity
 
Scalable and Parallelizable Processing of Influence Maximization for Large-S...
Scalable and Parallelizable Processing of Influence Maximization  for Large-S...Scalable and Parallelizable Processing of Influence Maximization  for Large-S...
Scalable and Parallelizable Processing of Influence Maximization for Large-S...Jinha Kim
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Rakebul Hasan
 
Quantum Computing Fundamentals 2023-04-14
Quantum Computing Fundamentals 2023-04-14Quantum Computing Fundamentals 2023-04-14
Quantum Computing Fundamentals 2023-04-14Gerald Scharitzer
 
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...Julián Urbano
 

Semelhante a Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms (20)

Introduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesIntroduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resources
 
Social network analysis: Quadratic assignment procedure
Social network analysis: Quadratic assignment procedureSocial network analysis: Quadratic assignment procedure
Social network analysis: Quadratic assignment procedure
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28
 
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...Understanding how is that adaptive cursor sharing (acs) produces multiple opt...
Understanding how is that adaptive cursor sharing (acs) produces multiple opt...
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval Metrics
 
Quantum Gaussian Processes - Gawel Kus
Quantum Gaussian Processes - Gawel KusQuantum Gaussian Processes - Gawel Kus
Quantum Gaussian Processes - Gawel Kus
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep LearningAnomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
 
Anomaly detection Meetup Slides
Anomaly detection Meetup SlidesAnomaly detection Meetup Slides
Anomaly detection Meetup Slides
 
Scalable and Parallelizable Processing of Influence Maximization for Large-S...
Scalable and Parallelizable Processing of Influence Maximization  for Large-S...Scalable and Parallelizable Processing of Influence Maximization  for Large-S...
Scalable and Parallelizable Processing of Influence Maximization for Large-S...
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
Quantum Computing Fundamentals 2023-04-14
Quantum Computing Fundamentals 2023-04-14Quantum Computing Fundamentals 2023-04-14
Quantum Computing Fundamentals 2023-04-14
 
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
 
Real time traffic management - challenges and solutions
Real time traffic management - challenges and solutionsReal time traffic management - challenges and solutions
Real time traffic management - challenges and solutions
 

Último

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 

Último (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

  • 1. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 1/25 Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms Onur Küçüktunç1,2 Erik Saule1 Kamer Kaya1 Ümit V. Çatalyürek1,3 WWW 2013, May 13–17, 2013, Rio de Janeiro, Brazil. 1Dept. Biomedical Informatics 2Dept. of Computer Science and Engineering 3Dept. of Electrical and Computer Engineering The Ohio State University
  • 2. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 2/25 Outline •  Problem definition –  Motivation –  Result diversification algorithms •  How to measure diversity –  Classical relevance and diversity measures –  Bicriteria optimization?! –  Combined measures •  Best Coverage method –  Complexity, submodularity –  A greedy solution, relaxation •  Experiments
  • 3. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 3/25 Problem definition G = (V, E) Q ✓ V Online shopping product co-purchasing •  one product •  previous purchases •  page visit history Academic paper-to-paper citations •  paper/field of interest •  set of references •  researcher himself/herself product recommendations “you might also like…”R ⇢ V references for related work new collaborators collaboration network Social friendship network •  user himself/herself •  set of people friend recommendations “you might also know…” Let G = (V, E) be an undirected graph. Given a set of m seed nodes Q = {q1, . . . , qm} s.t. Q ✓ V , and a parameter k, return top-k items which are relevant to the ones in Q, but diverse among themselves, covering di↵erent aspects of the query.
  • 4. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 4/25 Problem definition Let G = (V, E) be an undirected graph. Given a set of m seed nodes Q = {q1, . . . , qm} s.t. Q ✓ V , and a parameter k, return top-k items which are relevant to the ones in Q, but diverse among themselves, covering di↵erent aspects of the query. •  We assume that the graph itself is the only information we have, and no categories or intents are available •  no comparisons to intent-aware algorithms [Agrawal09,Welch11,etc.] •  but we will compare against intent-aware measures •  Relevance scores are obtained with Personalized PageRank (PPR) [Haveliwala02] p⇤ (v) = ( 1/m, if v 2 Q 0, otherwise.
  • 5. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 5/25 Result diversification algorithms •  GrassHopper [Zhu07] –  ranks the graph k times •  turns the highest-ranked vertex into a sink node at each iteration 0 5 10 0 2 4 6 8 0 5 10 0 5 10 0 0.005 0.01 0.015 g1 0 5 10 0 5 10 0 2 4 6 g2 00 5 10 0 0.5 1 1.5 (a) (b) (c) (d) Figure 1: (a) A toy data set. (b) The stationary distribution π reflects centrality. The item with probability is selected as the first item g1. (c) The expected number of visits v to each node after g an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2, 0 5 10 0 2 4 6 8 0 5 10 0 5 10 0 0.005 0.01 0.015 g1 0 5 10 0 5 10 0 2 4 6 g2 00 5 10 0 0.5 1 1.5 (a) (b) (c) (d Figure 1: (a) A toy data set. (b) The stationary distribution π reflects centrality. The item with probability is selected as the first item g1. (c) The expected number of visits v to each node after an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2 come from different groups. Items at group centers have higher probabilities, and 2.3 Ranking the Remaining Items 5 10 0 5 10 0 5 10 0 0.005 0.01 0.015 g1 0 5 10 0 5 10 0 2 4 6 g2 0 5 10 0 5 10 0 0.5 1 1.5 g 3 (a) (b) (c) (d) highest-ranked vertex R = {g1} R = {g1,g2} R = {g1,g2,g3} g1 turned into a sink node highest-ranked in the next step
  • 6. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 6/25 Result diversification algorithms •  GrassHopper [Zhu07] –  ranks the graph k times •  turns the highest-ranked vertex into a sink node at each iteration •  DivRank [Mei10] –  based on vertex-reinforced random walks (VRRW) •  adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism) (a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.(a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.(a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.sample graph weighting with PPR diverse weighting
  • 7. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 7/25 Result diversification algorithms •  GrassHopper [Zhu07] –  ranks the graph k times •  turns the highest-ranked vertex into a sink node at each iteration •  DivRank [Mei10] –  based on vertex-reinforced random walks (VRRW) •  adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism) •  Dragon [Tong11] –  based on optimizing the goodness measure •  punishes the score when two neighbors are included in the results
  • 8. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 8/25 Measuring diversity Relevance measures •  Normalized relevance •  Difference ratio •  nDCG Diversity measures •  l-step graph density •  l-expansion ratio rel(S) = P v2S ⇡v Pk i=1 ˆ⇡i di↵(S, ˆS) = 1 |S ˆS| |S| nDCGk = ⇡s1 + Pk i=2 ⇡si log2 i ˆ⇡1 + Pk i=2 ˆ⇡i log2 i dens`(S) = P u,v2S,u6=v d`(u, v) |S| ⇥ (|S| 1) `(S) = |N`(S)| n N`(S) = S [ {v 2 (V S) : 9u 2 S, d(u, v)  `} where
  • 9. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 9/25 Bicriteria optimization measures •  aggregate a relevance and a diversity measure •  [Carbonell98] •  [Li11] •  [Vieira11] •  max-sum diversification, max-min diversification, k-similar diversification set, etc. [Gollapudi09] fMMR(S) = (1 ) X v2S ⇡v X u2S max v2S u6=v sim(u, v) fL(S) = X v2S ⇡v + |N(S)| n fMSD(S) = (k 1)(1 ) X v2S ⇡v + 2 X u2S X v2S u6=v div(u, v)
  • 10. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 10/25 Bicriteria optimization is not the answer •  Objective: diversify top-10 results •  Two query-oblivious algorithms: –  top-% + random –  top-% + greedy-σ2
  • 11. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 11/25 Bicriteria optimization is not the answer •  normalized relevance and 2-step graph density •  evaluating result diversification as a bicriteria optimization problem with –  a relevance measure that ignores diversity, and –  a diversity measure that ignores relevancy. 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 dens2 rel 10-RLM BC1 BC1V BC2 BC2V BC1100 BC11000 BC150 BC2100 BC21000 BC250 CDivRank DRAGON GRASSHOPPER GSPARSE IL1 IL2 LM PDivRank PR k-RLM top-90%+random top-75%+random top-50%+random top-25%+random All Random 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 dens2 rel 10-RLM BC1 BC1V BC2 BC2V BC1100 BC11000 BC150 BC2100 BC21000 BC250 CDivRank DRAGON GRASSHOPPER GSPARSE IL1 IL2 LM PDivRank PR k-RLM top-90%+random top-75%+random top-50%+random top-25%+random All Random better 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 dens2 rel 10-RLM BC1 BC1V BC2 BC2V BC1100 BC11000 BC150 BC2100 BC21000 BC250 CDivRank DRAGON GRASSHOPPER GSPARSE IL1 IL2 LM PDivRank PR k-RLM top-90%+random top-75%+random top-50%+random top-25%+random All Random 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 dens2 rel 10-RLM BC1 BC1V BC2 BC2V BC1100 BC11000 BC150 BC2100 BC21000 BC250 CDivRank DRAGON GRASSHOPPER GSPARSE IL1 IL2 LM PDivRank PR k-RLM top-90%+random top-75%+random top-50%+random top-25%+random All Random better 0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.2 0.4 0.6 0.8 1 σ2 rel 0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.2 0.4 0.6 0.8 1 σ2 rel better 2 0.4 0.6 0.8 1 others others others others others top-90%+random top-75%+random top-50%+random top-25%+random All random others others others top-90%+greedy-σ2 top-75%+greedy-σ2 top-50%+greedy-σ2 top-25%+greedy-σ2 All greedy-σ2 others others others other algorithms 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 others others others others others top-90%+random top-75%+random top-50%+random top-25%+random All random others others others top-90%+greedy-σ2 top-75%+greedy-σ2 top-50%+greedy-σ2 top-25%+greedy-σ2 All greedy-σ2 others others others other algorithms
  • 12. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 12/25 A better measure? Combine both •  We need a combined measure that tightly integrates both relevance and diversity aspects of the result set •  goodness [Tong11] –  downside: highly dominated by relevance fG(S) = 2 X i2S ⇡i d X i,j2S A(j, i)⇡j (1 d) X j2S ⇡j X i2S p⇤ (i)max-sum relevance penalize the score when two results share an edge
  • 13. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 13/25 Proposed measure: l-step expanded relevance •  a combined measure of –  l-step expansion ratio (σ2) –  relevance scores (π) •  quantifies: relevance of the covered region of the graph •  do some sanity check with this new measure `-step expanded relevance: exprel`(S) = X v2N`(S) ⇡v where N`(S) is the `-step expansion set of the result set S, and ⇡ is the PPR scores of the items in the graph. 0 0.2 0.4 0.6 0.8 1 510 20 50 100 exprel2 k 0 0.2 0.4 0.6 0.8 1 510 20 50 100 exprel2 k 0.2 0.4 0.6 0.8 1 others others others others others top-90%+random top-75%+random top-50%+random top-25%+random All random others others others top-90%+greedy-σ2 top-75%+greedy-σ2 top-50%+greedy-σ2 top-25%+greedy-σ2 All greedy-σ2 others others others other algorithms 0.2 0.4 0.6 0.8 1 others others others others others top-90%+random top-75%+random top-50%+random top-25%+random All random others others others top-90%+greedy-σ2 top-75%+greedy-σ2 top-50%+greedy-σ2 top-25%+greedy-σ2 All greedy-σ2 others others others other algorithms
  • 14. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 14/25 Correlations of the measures relevancediversity goodness is dominated by the relevancy measures exprel has no high correlations with other relevance or diversity measures
  • 15. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 15/25 Proposed algorithm: Best Coverage •  Can we use -step expanded relevance as an objective function? •  Define: •  Complexity: generalization of weighted maximum coverage problem –  NP-hard! –  but exprell is a submodular function (Lemma 4.2) –  a greedy solution (Algorithm 1) that selects the item with the highest marginal utility at each step is the best possible polynomial time approximation (proof based on [Nemhauser78]) •  Relaxation: computes BestCoverage on highest ranked vertices to improve runtime exprel`-diversified top-k ranking (DTR`) S = argmax S0 ✓V |S0 |=k exprel`(S0 ) g(v, S) = P v02N`({v}) N`(S) ⇡v0 ALGORITHM 1: BestCoverage Input: k, G, ⇡, ` Output: a list of recommendations S S = ; while |S| < k do v⇤ argmaxv g(v, S) S S [ {v⇤ } return S ALGORITHM 1: BestCoverage (relaxed) ALGORITHM 2: BestCoverage (relaxed) Input: k, G, ⇡, ` Output: a list of recommendations S S = ; Sort(V ) w.r.t ⇡i non-increasing S1 V [1..k0 ], i.e., top-k0 vertices where k0 = k¯` 8v 2 S1, g(v) g(v, ;) 8v 2 S1, c(v) Uncovered while |S| < k do v⇤ argmaxv2S1 g(v) S S [ {v⇤ } S2 N`({v⇤ }) for each v0 2 S2 do if c(v0 ) = Uncovered then S3 N`({v0 }) 8u 2 S3, g(u) g(u) ⇡v0 c(v0 ) Covered return S `
  • 16. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 16/25 Experiments •  5 target application areas, 5 graphs from SNAP •  Queries generated based on 3 scenario types –  one random vertex –  random vertices from one area of interest –  multiple vertices from multiple areas of interest Dataset |V | |E| ¯ D D90% CC amazon0601 403.3K 3.3M 16.8 21 7.6 0.42 ca-AstroPh 18.7K 396.1K 42.2 14 5.1 0.63 cit-Patents 3.7M 16.5M 8.7 22 9.4 0.09 soc-LiveJournal1 4.8M 68.9M 28.4 18 6.5 0.31 web-Google 875.7K 5.1M 11.6 22 8.1 0.60
  • 17. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 17/25 Results – relevance •  Methods should trade-off relevance for better diversity •  Normalized relevance of top-k set is always 1 •  DRAGON always return results having 70% similar items to top-k, with more than 80% rel score amazon0601, combined 0 0.2 0.4 0.6 0.8 1 5 10 20 50 100 rel k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 0 0.2 0.4 0.6 0.8 1 5 10 20 50 100 rel k PPR (top-k GrassHop Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relax soc-LiveJournal1, combined 10 100 1000 10000 5 10 20 50 100 time(sec) PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) 10 100 1000 10000 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) amazon0601, combined 0.6 0.8 1 diff PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 0.6 0.8 1 diff PPR (top- GrassHop Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relax soc-LiveJournal1, combined 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 5 10 2 σ2 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 5 10 20 50 100 σ2 k P G D P C k G B B B B 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 5 10 20 50 σ2 k 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 5 10 2 σ2 Figure ing k. highest and k-R 0.8 top-k DRAGON
  • 18. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 18/25 Results – coverage •  l-step expansion ratio (σ2) gives the graph coverage of the result set: better coverage = better diversity •  BestCoverage and DivRank variants, especially BC2 and PDivRank, have the highest coverage 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) ned PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) ined amazon0601, combined 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 5 10 20 50 100 σ2 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 5 10 20 50 100 σ2 k PPR (top-k GrassHop Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relax BC2 (relax ca-AstroPh, combined 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 σ2 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 5 10 20 50 100 σ2 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) Figure 3: Coverage ( 2) of the algorithms with vary- ing k. BestCoverage and DivRank variants have the highest coverage on the graphs while Dragon, GSparse, and k-RLM have similar coverages to top-k results. amazon0601, combined 0.8 PPR (top-k) 0.8 PPR (top-k) soc-LiveJournal1, combined
  • 19. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 19/25 Results – expanded relevance •  combined measure for relevance and diversity •  BestCoverage variants and GrassHopper perform better •  Although PDivRank gives the highest coverage on amazon graph, it fails to cover the relevant parts! 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) ent rse ele- the 2.3, ned ores 0 0.1 0.2 0.3 0.4 5 10 20 50 100 σ k BC2 (relaxed) 0 0.1 0.2 5 10 20 50 100 k ing k. BestCoverage and DivRank variants have the highest coverage on the graphs while Dragon, GSparse, and k-RLM have similar coverages to top-k results. amazon0601, c o m b i n e d 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 5 10 20 50 100 exprel2 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 5 10 20 50 100 exprel2 k PPR (top-k GrassHopp Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxe soc-LiveJournal1, c o m b i n e d 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 5 10 20 50 100 exprel2 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) Figure 4: Expanded relevance (exprel2 ) with vary- ing k. BC1 and BC2 variants mostly score the best, GrassHopper performs high in soc-LiveJournal1. Al- though PDivRank gives the highest coverage on ama- zon0601 (Fig. 3), it fails to cover the relevant parts. BC2 BC1 PDivRank GrassHopper
  • 20. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 20/25 Results – efficiency •  BC1 always performs better, with a running time less than, DivRank and GrassHopper •  BC1 (relaxed) offers reasonable diversity, with a very little overhead on top of the PPR computation 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 3 ) ) 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 3 0.01 0.1 1 10 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 20 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) ca-AstroPh, combined 10 100 1000 10000 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) 0.1 1 10 100 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 10 100 1000 10000 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC1 (relaxed) soc-LiveJournal1, combined 10 100 1000 10000 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 100 PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) 50 100 k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) cit-Patents, scenario 1 1 10 100 1000 5 10 20 50 100 time(sec) k PPR (top-k) GrassHopper Dragon PDivRank CDivRank k-RLM GSparse BC1 BC2 BC1 (relaxed) BC2 (relaxed) web-Google, scenario 1 Figure 7: Running times of the algorithms with varying k. BC1 method always perform better with a running time less than GrassHopper and DivRank vari- ants, while the relaxed versions score similarly with a slight overhead on top of the PPR computation. BC1 BC1 BC1 BC1 DivRank DivRank DivRank DivRank BC1 (relaxed) BC1 (relaxed) BC1 (relaxed)BC1 (relaxed)
  • 21. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 21/25 Results – intent aware experiments •  evaluation of intent-oblivious algorithms against intent-aware measures •  two measures –  group coverage [Li11] –  S-recall [Zhai03] •  cit-Patent dataset has the categorical information –  426 class labels, belong to 36 subtopics
  • 22. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 22/25 Results – intent aware experiments •  group coverage [Li11] –  How many different groups are covered by the results? –  omits the actual intent of the query •  top-k results are not diverse enough •  AllRandom results cover the most number of groups •  PDivRank and BC2 follows 0 10 20 30 40 50 60 70 5 10 20 50 100 Classcoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (a) Class coverage 0 5 10 15 20 25 30 5 10 20 50 100 Subtopiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (b) Subtopic coverage 0 1 2 3 4 5 6 5 10 20 Topiccoverage (c) Topic 0 5 10 15 20 25 30 5 10 20 50 Subtopiccoverage k 50 100 k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom ) Topic coverage 100 PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom 100 PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom ge 0 5 10 15 20 25 30 5 10 20 50 100 Subtopiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (b) Subtopic coverage 0 1 2 3 4 5 6 5 10 20 50 100 Topiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (c) Topic coverage 0 5 10 15 20 25 30 5 10 20 50 100 Subtopiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom ses (e) S-recall on subtopics (f) S-recall on topics Intent-aware results on cit-Patents dataset with scenario-3 queries. top-k top-k random BC2 BC2
  • 23. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 23/25 0 10 20 30 40 50 60 5 10 20 50 100 Classcoverage k PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (a) Class coverage 0 5 10 15 20 25 5 10 20 50 100 Subtopiccoverage k PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (b) Subtopic coverage 0 1 2 3 4 5 5 10 20 Topiccoverage (c) Topic 0 5 10 15 20 25 5 10 20 50 Subtopiccoverage k 0 0.1 0.2 0.3 0.4 0.5 0.6 5 10 20 S-recall(class) k (d) S-recall on classes 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 5 10 20 S-recall(subtopic) k (e) S-recall on subtopics (f) S-recal Figure 8: Intent-aware results on cit-Patents dataset with sc level topics7 . Here we present an evaluation of the intent- oblivious algorithms against intent-aware measures. This evaluation provides a validation of the diversification tech- niques with an external measure such as group coverage [14] and S-recall [23]. sults of all of the dive those two extremes, w and Dragon covers the However, S-recall ind was actually useful or Results – intent aware experiments •  S-recall [Zhai03], Intent-coverage [Zhu11] –  percentage of relevant subtopics covered by the result set –  the intent is given with the classes of the seed nodes •  AllRandom brings irrelevant items from the search space •  top-k results do not have the necessary diversity •  BC2 variants and BC1 perform better than DivRank •  BC1 (relaxed) and DivRank scores similar, but BC1r much faster top-k random DivRank BC 0 10 20 30 40 50 60 70 5 10 20 50 100 k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (a) Class coverage 0 5 10 15 20 25 30 5 10 20 50 100 Subtopiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (b) Subtopic coverage 0 1 2 3 4 5 6 5 10 20 50 100 Topiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (c) Topic coverage 0 5 10 15 20 25 30 5 10 20 50 100 Subtopiccoverage k PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom (d) S-recall on classes (e) S-recall on subtopics (f) S-recall on topics PPR (top-k) Dragon PDivRank CDivRank k-RLM BC1 BC2 BC1 (relaxed) BC2 (relaxed) AllRandom Figure 8: Intent-aware results on cit-Patents dataset with scenario-3 queries. pics7 . Here we present an evaluation of the intent- s algorithms against intent-aware measures. This on provides a validation of the diversification tech- with an external measure such as group coverage [14] ecall [23]. ts of a query set Q is extracted by collecting the subtopics, and topics of each seed node. Since our o evaluate the results based on the coverage of dif- sults of all of the diversification algorithms reside between those two extremes, where the PDivRank covers the most and Dragon covers the least number of groups. However, S-recall index measures whether a covered group was actually useful or not. Obviously, AllRandom scores the lowest as it dismisses the actual query (you may omit the S- recall on topics since there are only 6 groups in this granular ity level). Among the algorithms, BC2 variants and BC1 score random random random random random top-k top-k top-k top-k top-k DivRank DivRank DivRank DivRank DR BC BC BC BC BC
  • 24. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 24/25 Conclusions •  Result diversification should not be evaluated as a bicriteria optimization problem with –  a relevance measure that ignores diversity, and –  a diversity measure that ignores relevancy •  l-step expanded relevance is a simple measure that combines both relevance and diversity •  BestCoverage, a greedy solution that maximizes exprell is a (1-1/e)-approximation of the optimal solution •  BestCoverage variants perform better than others, its relaxation is extremely efficient •  goodness in DRAGON is dominated by relevancy •  DivRank variants implicitly optimize expansion ratio
  • 25. Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 25/25 Thank you •  For more information visit •  http://bmi.osu.edu/hpc •  Research at the HPC Lab is funded by