Intent-oriented search diversification methods developed so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the probability of observing documents rather than the probability that the documents are relevant. This has sometimes been described as a view in which the underlying task model considers the selection of a single document. In this paper we propose an alternative formulation of aspect-based diversification algorithms that explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation and test the resulting algorithm empirically. Experiments on search and recommendation tasks show performance competitive with or better than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state-of-the-art algorithms into a single version. It also opens possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task or the evaluation metrics, as we show empirically.
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification
1. 35th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR 2012)
Explicit Relevance Models
in Intent-Aware IR Diversification
Saúl Vargas, Pablo Castells and David Vallet
Universidad Autónoma de Madrid
http://ir.ii.uam.es
Portland, OR, 13 August 2012
IRG
Explicit Relevance Models in Intent-Aware IR Diversification
35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
IR Group @ UAM Portland, OR, 13 August 2012
2. Outline
Context: IR diversification formulation and algorithms
Proposed approach: relevance-based reformulation
of diversification algorithms
Experiments
Adjustable tolerance to redundancy
Conclusion
3. IR diversity – Brief recap
Example query intents (figure): Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy
4. IR diversity – Brief recap
Diversity as a means to address uncertainty in user queries
– The same query may have different intents or aspects in the information need underneath
Revision of document relevance independence
– Marginal utility of additional relevant documents decreases fast
Trade diminishing marginal utility for increased intent coverage
– Thus maximize the number of users who obtain at least some useful document
5. IR diversification – Problem statement
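Given a query $q$ on a collection, find a subset $S$ of the collection of given size maximizing (NP-hard):
$p(\text{some } d \in S \text{ relevant} \mid q)$
Greedy approximation (Agrawal 2009, Santos 2010, Chen 2006, …):
Baseline ranking $R$ by $p(d \mid q)$ → repeatedly move $\arg\max_{d \in R - S} \varphi(d, S \mid q)$ from $R - S$ to $S$ → diversified ranking $S$
$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$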
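The greedy approximation on this slide can be sketched in Python as below. It is a minimal illustration rather than the authors' implementation: `greedy_diversify`, its parameters, and the `objective` callback (standing in for φ(d, S | q)) are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence


def greedy_diversify(
    baseline: Sequence[str],
    objective: Callable[[str, List[str]], float],
    size: int,
) -> List[str]:
    """Build S by repeatedly taking the arg max of objective(d, S) over R - S."""
    remaining = list(baseline)   # R - S, kept in baseline order
    selected: List[str] = []     # S, the diversified ranking
    while remaining and len(selected) < size:
        # max() returns the first maximal element, so ties keep baseline order.
        best = max(remaining, key=lambda d: objective(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Any of the objective functions in the following slides can be plugged in as `objective` by fixing the query-dependent estimates, for instance with a lambda or `functools.partial`.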
6. IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
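$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: explicit query aspects $z$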
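As a hedged sketch of how the two schemes on this slide could be evaluated, the Python functions below compute each objective from dictionaries of probability estimates. The function names and the dictionary layout (`p_z_q[z]`, `p_d_q[d]`, `p_z_d[d][z]`, `p_d_qz[d][z]`) are assumptions made for illustration; they are not taken from the cited papers.

```python
from math import prod
from typing import Dict, List

AspectProb = Dict[str, float]            # aspect z -> probability
DocProb = Dict[str, float]               # document d -> probability
DocAspectProb = Dict[str, Dict[str, float]]  # document d -> aspect z -> probability


def ia_select(d: str, S: List[str], p_z_q: AspectProb,
              p_z_d: DocAspectProb, p_d_q: DocProb) -> float:
    """sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_z_q[z] * p_z_d[d][z] * p_d_q[d]
        * prod(1.0 - p_z_d[s][z] * p_d_q[s] for s in S)
        for z in p_z_q
    )


def xquad(d: str, S: List[str], lam: float, p_z_q: AspectProb,
          p_d_q: DocProb, p_d_qz: DocAspectProb) -> float:
    """(1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    diversity = sum(
        p_z_q[z] * p_d_qz[d][z] * prod(1.0 - p_d_qz[s][z] for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity
```

With an empty `S` the product over selected documents is 1, so the first pick depends only on relevance and aspect coverage.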
7. IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
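$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: query aspect coverage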
8. IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
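$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: document “relevance” for query aspect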
9. IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
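$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: redundancy penalization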
10. IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
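$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: mixture with baseline; $\lambda$: degree of diversification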
11. IR diversity – Instantiations of objective function
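$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
IA-Select scheme (Agrawal 2009)
$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Highlighted: probability to observe documents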
12. IR diversity – Relevance-based instantiation of objective function
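$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
Our proposal: probability of relevance
IA-Select scheme – relevance-based
$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
xQuAD scheme – relevance-based
$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$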
13. IR diversity – Relevance-based instantiation of objective function
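$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
IA-Select scheme – relevance-based
$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
xQuAD scheme – relevance-based
$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
Highlighted: more literal interpretation of initial problem statement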
14. IR diversity – Relevance-based instantiation of objective function
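$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
IA-Select scheme – relevance-based
$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
xQuAD scheme – relevance-based
$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
The two schemes are equivalent for $\lambda = 1$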
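The equivalence at λ = 1 can be checked numerically with a small Python sketch of the two relevance-based objectives. The function names and dictionary layout (`p_z_q[z]`, `p_r_dq[d]`, `p_r_dqz[d][z]`) are hypothetical, and the probability values in the usage lines are made-up toy numbers.

```python
from math import prod
from typing import Dict, List


def rel_ia_select(d: str, S: List[str], p_z_q: Dict[str, float],
                  p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_z_q[z] * p_r_dqz[d][z] * prod(1.0 - p_r_dqz[s][z] for s in S)
        for z in p_z_q
    )


def rel_xquad(d: str, S: List[str], lam: float, p_z_q: Dict[str, float],
              p_r_dq: Dict[str, float],
              p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """(1 - lam) p(r|d,q) + lam * (the relevance-based IA-Select sum)."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select(d, S, p_z_q, p_r_dqz)


# Toy estimates (assumed values, for illustration only).
p_z_q = {"z1": 0.5, "z2": 0.5}
p_r_dq = {"a": 0.7, "b": 0.6}
p_r_dqz = {"a": {"z1": 0.9, "z2": 0.1}, "b": {"z1": 0.8, "z2": 0.4}}

same = abs(
    rel_ia_select("b", ["a"], p_z_q, p_r_dqz)
    - rel_xquad("b", ["a"], 1.0, p_z_q, p_r_dq, p_r_dqz)
) < 1e-12
```

Writing `rel_xquad` in terms of `rel_ia_select` makes the unification explicit: with `lam = 1.0` the baseline term vanishes and only the shared sum remains.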
15. Relevance distribution vs. document distribution
$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$ – The difference does matter (in this context)
$\sum_d p(d \mid q, z) = 1$
$\sum_d p(r \mid d, q, z) = \mathrm{E}[\text{number of relevant docs}] \geq 1$
Different potential behavior
– E.g. stronger redundancy penalization
Potential rank equivalences do not apply here
$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
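A toy calculation may help show why the two distributions can behave differently in the redundancy product. The numbers below are assumed for illustration: a document distribution spread uniformly over 100 candidates, against relevance probabilities that are not constrained to sum to 1. `redundancy_factor` is a hypothetical helper name.

```python
from typing import Iterable


def redundancy_factor(selected_probs: Iterable[float]) -> float:
    """prod_{d' in S} (1 - p) for one aspect z, given one probability per selected d'."""
    factor = 1.0
    for p in selected_probs:
        factor *= 1.0 - p
    return factor


n_docs = 100
# p(d'|q,z): sums to 1 over all 100 candidates, so each value is small.
doc_dist = [1.0 / n_docs] * 3
# p(r|d',q,z): each value is a probability in its own right (assumed values).
rel_dist = [0.6, 0.5, 0.4]

weak = redundancy_factor(doc_dist)    # 0.99 ** 3, stays close to 1
strong = redundancy_factor(rel_dist)  # 0.4 * 0.5 * 0.6, drops quickly
```

In this toy setting three selected documents barely reduce the factor under the document distribution, while under the relevance distribution the same three documents cut it to about 0.12.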
16. Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
17. Aspect-based relevance model
Estimate $p(r \mid d, q, z)$
Cannot use odds, logs, constant removal… or any other rank-preserving step (we need the specific values)
Quantities in the estimate of $p(r \mid d, q, z)$ (diagram):
– $p(r \mid d, q)$: positional relevance $p(r \mid \mathrm{rank}(d, q))$
– $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations, then derive the other two parameters
  • $z$ as document classes (e.g. ODP)
  • $z$ as subqueries (e.g. reformulations)
– $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
18. Positional relevance distribution estimate
$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$
[Figure: $p(r \mid k)$ (log scale, 1E-05 to 1E+00) against rank $k$ (0 to 200). Curves: pLSA and Lemur (precision estimates), AOL (click log statistics)]
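One plausible way to obtain a positional estimate of this kind from relevance judgments is to count, for each rank k, the fraction of training queries whose document at that rank is judged relevant. The sketch below follows that reading; it is an assumption about the procedure, and `positional_relevance` and its arguments are hypothetical names.

```python
from typing import Dict, List, Set


def positional_relevance(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    depth: int,
) -> List[float]:
    """Estimate p(r|k) for k = 1..depth as the share of queries with a relevant doc at rank k."""
    if not rankings:
        raise ValueError("at least one training query is required")
    hits = [0] * depth
    for query, ranking in rankings.items():
        judged = relevant.get(query, set())
        for k, doc in enumerate(ranking[:depth]):
            if doc in judged:
                hits[k] += 1
    # Queries with fewer than `depth` results count as non-relevant at the missing ranks.
    return [h / len(rankings) for h in hits]
```

Because the diversification objective needs the actual probability values, a practical estimate would probably need some smoothing for ranks where few or no relevant documents are observed; that step is left out here.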
19. Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
20. Experiments
Search diversity
Collection: ClueWeb09 category B (50M documents)
Query/subtopic set: TREC 2009/10 diversity task (100 queries)
Baseline ranking: Lemur Indri search engine (Web service)
Diversified top n: 100
Query aspect space:
a) ODP categories level 4 (~7K categories)
b) TREC subtopics (oracle for reference)
Specific parameter estimates:
– $p(z \mid q)$: uniform
– $p(z \mid d)$ with ODP categories: semi-supervised text classification by Textwise
– $p(z \mid d)$ with TREC subtopics: Indri search system run on $z$ as if a query
– $p(r \mid k)$: i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation); ii. click statistics from AOL log (thus different IR system)
21. Experiments – Search diversity on TREC
[Figure: ERR-IA as a function of λ for the xQuAD scheme, in two panels (ODP categories, TREC subtopics). Curves: based on $p(r \mid d, q, z)$ with $p(r \mid k)$ from qrels, and based on $p(d \mid q, z)$]
22. Experiments – Search diversity on TREC
Aspect space        Diversifier                 λ    α-nDCG@20   ERR-IA@20   nDCG-IA@20  S-recall@20
-                   Lemur                       -    0.2587      0.1630      0.2396      0.4636
a) ODP categories   IA-Select                   -    0.2651      0.1681      0.2423      0.4483
a) ODP categories   xQuAD                       0.9  0.2675      0.1656      0.2451      0.4864
a) ODP categories   Rel-based xQuAD i. Qrels    0.1  0.2858△▲    0.1828△▲    0.2655△▲    0.4898▲△
a) ODP categories   Rel-based xQuAD ii. Clicks  0.4  0.2841▲△    0.1831△△    0.2605△▲    0.4830▲▽
b) TREC subtopics   IA-Select                   -    0.3541      0.2346      0.3213      0.5787
b) TREC subtopics   xQuAD                       1.0  0.3445      0.2241      0.3127      0.5704
b) TREC subtopics   Rel-based xQuAD i. Qrels    1.0  0.3543△△    0.2349△△    0.3192▽△    0.5782▽△
b) TREC subtopics   Rel-based xQuAD ii. Clicks  1.0  0.3512▽△    0.2320▽△    0.3166▽△    0.5748▽△
λ: “informally” maximizing ERR-IA by 0.1 steps for each diversifier
Best value in bold green
▲▼ $p < 0.05$
23. Experiments
Recommendation diversity
Dataset 1: MovieLens 1M
– Collection: 6K users, 4K movies, 1M ratings
– Subtopic set: 10 movie genres
Dataset 2: Last.fm crawl
– Collection: 1K users, 175K artists, 20M playcounts
– Subtopic set: 120K social tags on artists by Last.fm users
Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011)
– Queries → users
– Documents → items (movies, music artists)
– Subtopics → item features (genres, tags)
– Relevance judgments → test ratings from data split
Baseline rankings:
a) pLSA
b) Popularity-based recommendation
Diversified top n: 100
Specific parameter estimates:
– $p(z \mid q)$: uniform
– $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
– $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
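One possible reading of the uniform, binary-association estimate of p(z|d) is that each item spreads its probability mass evenly over the aspects (genres or tags) it is associated with. The sketch below encodes that reading; it is an assumption rather than the documented procedure, and `uniform_aspect_distribution` is a hypothetical name.

```python
from typing import Dict, Set


def uniform_aspect_distribution(
    item_aspects: Dict[str, Set[str]],
) -> Dict[str, Dict[str, float]]:
    """p(z|d) = 1 / |Z_d| for every aspect z associated with item d; absent aspects mean 0."""
    return {
        item: {aspect: 1.0 / len(aspects) for aspect in aspects}
        for item, aspects in item_aspects.items()
    }


# Toy item/genre associations (made-up values).
p_z_d = uniform_aspect_distribution(
    {"movie1": {"Comedy", "Drama"}, "movie2": {"Horror"}, "movie3": set()}
)
```

An item with no associated aspects gets an empty distribution, so it contributes nothing to the aspect-based part of the objective.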
24. Experiments – Recommendation diversity on MovieLens and Last.fm
[Figure: ERR-IA as a function of λ in a grid of panels. Rows: pLSA recommender, recommendation by item popularity. Columns: MovieLens 1M, Last.fm. Curves: based on $p(r \mid d, q, z)$ and based on $p(d \mid q, z)$]
25. Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
26. Adjustable tolerance to redundancy
Generalization of relevance-based diversification scheme
Formally support adjustable redundancy penalization
Approach: generalize relevance to browsing model
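$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathrm{stop}_S \mid q) = \cdots$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\mathrm{stop} \mid r)\bigr)$
Tolerance to redundancy: $p(\mathrm{stop} \mid r)$
Adjustable redundancy tolerance parameter $p(\mathrm{stop} \mid r) \in [0, 1]$
– High $p(\mathrm{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
– In this view, original formulations would implicitly assume $p(\mathrm{stop} \mid r) = 1$, i.e. a single relevant document is sought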
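The effect of the tolerance parameter can be illustrated with a short Python sketch of the generalized objective. `tolerant_xquad`, `p_stop`, and the dictionary layout (`p_z_q[z]`, `p_r_dq[d]`, `p_r_dqz[d][z]`) are hypothetical names chosen for this example.

```python
from math import prod
from typing import Dict, List


def tolerant_xquad(
    d: str,
    S: List[str],
    lam: float,
    p_stop: float,
    p_z_q: Dict[str, float],
    p_r_dq: Dict[str, float],
    p_r_dqz: Dict[str, Dict[str, float]],
) -> float:
    """(1 - lam) p(r|d,q) + lam sum_z p(z|q) p(r|d,z,q) prod_{d' in S} (1 - p(r|d',z,q) p(stop|r))."""
    if not 0.0 <= p_stop <= 1.0:
        raise ValueError("p_stop must lie in [0, 1]")
    diversity = sum(
        p_z_q[z] * p_r_dqz[d][z]
        * prod(1.0 - p_r_dqz[s][z] * p_stop for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity
```

Setting `p_stop = 1.0` gives back the relevance-based xQuAD objective, while `p_stop = 0.0` removes the redundancy penalty altogether, so intermediate values trade off between the two.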
27. Adjustable tolerance to redundancy
Empirical observation: $p(\mathrm{stop} \mid r)$ vs. $\lambda$ in α-nDCG
[Figure: two panels, Search task (Lemur on TREC / Subtopics) and Recommendation task (pLSA on MovieLens / Genres). Vertical axis: $p(\mathrm{stop} \mid r)$ from 0 to 1; horizontal axis from 0 to 1. For each column, the best and worst α-nDCG values are marked]
28. Conclusion
Alternative, relevance-based formulation of greedy aspect-based diversification
– Unifies two previous aspect-based algorithms
– More literal expression of formal problem statement (and metrics?)
$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
– Literal value estimates needed (rather than rank-equivalent approximations)
– Estimate based on positional relevance (relevance or click data needed)
Seems to perform well empirically
– Light requirements on relevance or click data for training positional relevance
– Improvement trend, but needs to be tested under further optimizations
Formal support for redundancy tolerance adjustment