2. Outline
What is Diversity?
Some Approaches for Diversification
MMR (SIGIR ’98)
Multi-Armed Bandits (ICML ’08)
Portfolio Theory (SIGIR ’09)
Modeling User Intent (WSDM ’09)
Query Reformulation (WWW ’10)
Evaluation Metrics and Datasets
Conclusions
2 2/23/2010
3. Motivation
Basic premise of all IR models that we have discussed
Is it always true?
Similar documents
Ambiguous Queries
Polysemy: JAVA
Insufficient information
Synonymy: Bombay or Mumbai
Key idea! Diversify search results
4. What is Diversity?
Extrinsic Diversity
Uncertainty about the information need given a query
JAVA, JAGUAR
Query = "page rank"; who is the audience?
A layman or an IR researcher?
Intrinsic Diversity
Avoiding redundancy in search results
Encouraging novelty
Examples – literature survey, opinion mining, product reviews
6. MMR (Carbonell et al., SIGIR ’98)
Combines query relevance and information novelty
Used in summarization and IR ranking
Instead of ranking by relevance, rank by “relevant novelty”
[Equation: MMR score, combining a Relevance term and a Novelty term]
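A minimal sketch of MMR re-ranking, assuming the standard form of the criterion: at each step, pick the unselected document that maximizes lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected). The cosine similarity, the sparse term-weight dictionaries, and the names `mmr_rank` and `lam` are illustrative assumptions, not taken from the slides.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_rank(query, docs, k, lam=0.5):
    """Greedy MMR selection of k documents.

    query: term-weight dict; docs: {doc_id: term-weight dict}.
    lam = 1.0 ranks by relevance only; smaller lam rewards novelty more.
    """
    selected = []
    remaining = list(docs)
    while remaining and len(selected) < k:
        def score(name):
            relevance = cosine(docs[name], query)
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(docs[name], docs[s]) for s in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): two near-duplicate pages and one on a different sense.
query = {"java": 1.0}
docs = {
    "lang1": {"java": 1.0, "code": 1.0},
    "lang2": {"java": 1.0, "code": 1.0},
    "island": {"java": 1.0, "travel": 1.0},
}
print(mmr_rank(query, docs, k=2, lam=0.5))
print(mmr_rank(query, docs, k=2, lam=1.0))
```

With lam = 0.5 the duplicate page is pushed below the page about the other sense; with lam = 1.0 the ranking falls back to pure relevance.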
7. Portfolio Theory (Wang and Zhu, SIGIR ’09)
The Probability Ranking Principle (PRP) optimizes only the mean of the relevance scores of the top-k search results
Ideas from portfolio theory in finance:
To maximize profit, include some "risky" stocks in your portfolio
To maximize expected relevance, include some "risky" (less relevant) documents
Maximize the mean of the relevance scores for a given level of risk (variance)
Risk taking encourages diversity
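The mean-variance trade-off can be sketched as a greedy selection. This is a simplified illustration, not the paper's exact objective: it assumes each document has an estimated mean relevance and a covariance with every other document, ignores rank-position weights, and uses a hypothetical risk parameter `b`. A candidate is penalized for its own variance and for its covariance with documents already chosen.

```python
def portfolio_rank(mean, cov, k, b=1.0):
    """Greedy mean-variance ranking (simplified sketch).

    mean: {doc_id: expected relevance}
    cov:  {doc_id: {doc_id: covariance of relevance scores}}
    b:    risk parameter (assumed name); b = 0 ranks by mean relevance only.
    """
    selected = []
    remaining = list(mean)
    while remaining and len(selected) < k:
        def score(d):
            # Own variance plus twice the covariance with the chosen documents.
            risk = cov[d][d] + 2 * sum(cov[d][s] for s in selected)
            return mean[d] - b * risk
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): "a" and "b" are strongly correlated, "c" is independent.
mean = {"a": 0.9, "b": 0.85, "c": 0.6}
cov = {
    "a": {"a": 0.1, "b": 0.09, "c": 0.0},
    "b": {"a": 0.09, "b": 0.1, "c": 0.0},
    "c": {"a": 0.0, "b": 0.0, "c": 0.1},
}
print(portfolio_rank(mean, cov, k=2, b=0.0))
print(portfolio_rank(mean, cov, k=2, b=2.0))
```

With b = 0 the two correlated documents are ranked first; with b = 2 the uncorrelated but less relevant document takes the second position.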
9. A Learning To Rank Approach
Learn a ranking function
Manually labeled training data
Online Learning using Usage Data
Optimize IR performance metrics
Deploy the function in a live search engine
Use clickthrough data to maximize the probability that a new user will find at least one relevant document high up in the ranking.
10. Problem Set Up
Goal: produce an optimally diverse ranking of documents for one fixed query.
Given a population of users
Each user u_i has an associated set of relevant documents A_i
Different users have different relevant sets, based on their interpretation of the query.
Each user is presented with an ordered list of documents
User u_t clicks on document i with probability p_ti
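The setup above can be simulated with one bandit per rank position, in the spirit of the ranked bandits idea from the ICML ’08 paper. The sketch below makes several assumptions that are not in the slides: it uses a simple epsilon-greedy bandit as a stand-in for whatever bandit algorithm is plugged in, it assumes a user clicks deterministically on the first relevant document shown (p_ti is 0 or 1), and all names (`ranked_bandits`, `EpsilonGreedyBandit`, `eps`) are hypothetical. It requires k to be at most the number of documents.

```python
import random

class EpsilonGreedyBandit:
    """Tiny epsilon-greedy bandit over a fixed set of arms (documents)."""

    def __init__(self, arms, eps, rng):
        self.arms = list(arms)
        self.eps = eps
        self.rng = rng
        self.pulls = {a: 0 for a in self.arms}
        self.reward = {a: 0.0 for a in self.arms}

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.arms)  # explore
        # Exploit: arm with the highest observed mean reward (0.0 if never pulled).
        return max(
            self.arms,
            key=lambda a: self.reward[a] / self.pulls[a] if self.pulls[a] else 0.0,
        )

    def update(self, arm, r):
        self.pulls[arm] += 1
        self.reward[arm] += r

def ranked_bandits(docs, users, k, rounds, eps=0.1, seed=0):
    """Learn a k-document ranking from simulated clicks.

    docs:  list of document ids
    users: list of relevant sets A_i, one per user type
    Returns the ranking shown in the final round.
    """
    rng = random.Random(seed)
    bandits = [EpsilonGreedyBandit(docs, eps, rng) for _ in range(k)]
    ranking = []
    for _ in range(rounds):
        proposed, ranking = [], []
        for bandit in bandits:
            choice = bandit.select()
            proposed.append(choice)
            if choice in ranking:
                # Duplicate: show an arbitrary document not yet in the list.
                choice = rng.choice([d for d in docs if d not in ranking])
            ranking.append(choice)
        relevant = rng.choice(users)  # draw a user u_t from the population
        clicked = next((i for i, d in enumerate(ranking) if d in relevant), None)
        for i, bandit in enumerate(bandits):
            # Reward only the bandit whose own proposal was shown and clicked.
            hit = i == clicked and proposed[i] == ranking[i]
            bandit.update(proposed[i], 1.0 if hit else 0.0)
    return ranking

docs = ["d1", "d2", "d3", "d4"]
users = [{"d1"}, {"d2"}, {"d3", "d4"}]  # hypothetical interpretations of the query
print(ranked_bandits(docs, users, k=2, rounds=500))
```

Because lower ranks are only rewarded when higher ranks fail to attract a click, each position tends to learn documents for users the positions above it did not satisfy.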
13. Modeling User Intent
(Agrawal et al., WSDM 2009)
Assume a taxonomy of information
User intents are modeled as topics
Documents and Queries may belong to >1 topic
Distribution of user intents over topics/categories is available
(search engine logs)
Objective function trades off relevance and diversity
Relevance – Standard IR methods
Diversity – categories in the taxonomy
14. Objective Function
C(d) = set of categories for document d
C(q) = set of categories for query q
Distribution P(c|q) is known
V(d|q,c) = Quality Value of document d for query q, when
the intended category is c
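Average satisfaction probability of a result set S:
P(S|q) = \sum_{c} P(c|q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d|q,c) \bigr) \right)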
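A small sketch of how the objective can be evaluated, assuming it is the probability, averaged over categories with weights P(c|q), that at least one document in S satisfies the user. The function name, the dictionary layout, and the toy values are illustrative assumptions.

```python
def satisfaction_probability(S, p_c, V):
    """Average probability that at least one document in S satisfies the user.

    S:   iterable of document ids
    p_c: {category: P(c|q)}
    V:   {(doc_id, category): V(d|q,c)}; missing pairs count as 0
    """
    total = 0.0
    for c, pc in p_c.items():
        fail = 1.0  # probability that every document in S fails for category c
        for d in S:
            fail *= 1.0 - V.get((d, c), 0.0)
        total += pc * (1.0 - fail)
    return total

# Toy data (hypothetical) for the query JAGUAR.
p_c = {"animal": 0.5, "car": 0.5}
V = {("d1", "animal"): 0.8, ("d2", "animal"): 0.8, ("d3", "car"): 0.6}
print(satisfaction_probability(["d1", "d2"], p_c, V))  # two documents, one category
print(satisfaction_probability(["d1", "d3"], p_c, V))  # one document per category
```

Covering both categories scores higher than showing two good documents from the same category, which is the behavior the objective is meant to reward.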
15. Observations
Diversify(k) is NP-Hard
P(S|q) is a sub-modular function
IDEA!
A Greedy Approximate Solution
U(c|q,S) = probability that q belongs to c, given that all documents in S fail to satisfy the user.
R(q) = set of original relevant documents
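A minimal sketch of the greedy idea, assuming the usual scheme: start with U(c|q,S) = P(c|q), repeatedly add the document from R(q) with the largest marginal gain, the sum over c of U(c|q,S) * V(d|q,c), and then discount each category by (1 - V(d|q,c)) for the chosen document. The function name `greedy_diversify` and the toy data are hypothetical.

```python
def greedy_diversify(R, p_c, V, k):
    """Greedily pick k documents from R to cover the query's categories.

    R:   list of candidate document ids, R(q)
    p_c: {category: P(c|q)}
    V:   {(doc_id, category): V(d|q,c)}; missing pairs count as 0
    """
    U = dict(p_c)  # U(c|q,S) for the empty set S
    S = []
    candidates = list(R)
    while candidates and len(S) < k:
        def gain(d):
            return sum(U[c] * V.get((d, c), 0.0) for c in U)
        best = max(candidates, key=gain)
        S.append(best)
        candidates.remove(best)
        for c in U:
            # Categories the chosen document already serves matter less now.
            U[c] *= 1.0 - V.get((best, c), 0.0)
    return S

# Toy data (hypothetical): the majority intent has two strong documents.
R = ["d1", "d2", "d3"]
p_c = {"animal": 0.6, "car": 0.4}
V = {("d1", "animal"): 0.8, ("d2", "animal"): 0.7, ("d3", "car"): 0.6}
print(greedy_diversify(R, p_c, V, k=2))
```

After d1 is chosen, the weight of the "animal" category drops from 0.6 to 0.12, so the "car" document d3 beats the second "animal" document d2.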
17. Query Reformulation (Santos et al., WWW 2010)
Probabilistic framework for diversification
Models the information need behind an ambiguous query (the user intent) as a set of sub-queries
Rank documents according to the following mixture model
[Equation: mixture model, combining a Relevance term and a Novelty term]
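A sketch of a mixture-model ranker of this kind, assuming the commonly cited form of the framework: score(d) = (1 - lam) * P(d|q) + lam * sum over sub-queries q_i of P(q_i|q) * P(d|q_i) * product over selected d_j of (1 - P(d_j|q_i)). The first term is relevance and the second is novelty. All names and toy probabilities below are illustrative assumptions.

```python
def mixture_rank(candidates, p_rel, sub_queries, p_sub, k, lam=0.5):
    """Greedy diversification with a relevance/novelty mixture.

    p_rel:       {doc_id: P(d|q)}
    sub_queries: {sub_query: P(q_i|q)}
    p_sub:       {(doc_id, sub_query): P(d|q_i)}; missing pairs count as 0
    lam:         mixing weight; lam = 0 ranks by relevance only
    """
    selected = []
    remaining = list(candidates)
    # For each sub-query, the probability that no selected document covers it yet.
    uncovered = {qi: 1.0 for qi in sub_queries}
    while remaining and len(selected) < k:
        def score(d):
            novelty = sum(
                w * p_sub.get((d, qi), 0.0) * uncovered[qi]
                for qi, w in sub_queries.items()
            )
            return (1 - lam) * p_rel[d] + lam * novelty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        for qi in uncovered:
            uncovered[qi] *= 1.0 - p_sub.get((best, qi), 0.0)
    return selected

# Toy data (hypothetical) for the ambiguous query JAVA.
candidates = ["d1", "d2", "d3"]
p_rel = {"d1": 0.9, "d2": 0.8, "d3": 0.6}
sub_queries = {"java language": 0.5, "java island": 0.5}
p_sub = {
    ("d1", "java language"): 0.9,
    ("d2", "java language"): 0.8,
    ("d3", "java island"): 0.7,
}
print(mixture_rank(candidates, p_rel, sub_queries, p_sub, k=2, lam=0.5))
print(mixture_rank(candidates, p_rel, sub_queries, p_sub, k=2, lam=0.0))
```

With lam = 0.5 the second slot goes to the document covering the unserved sub-query; with lam = 0 the ranking is by P(d|q) alone.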
18. Estimating Probabilities
Use standard probabilistic models, e.g., language models (LM)
Sub-queries: queries suggested by search engines
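As one example of a "standard probabilistic model", the sketch below scores a document by query likelihood under a unigram language model with linear (Jelinek-Mercer) smoothing. The choice of smoothing, the parameter `lam`, and the whitespace tokenization are assumptions made for illustration; the slides do not say which language model was used.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """log P(query | document) under a smoothed unigram language model.

    Each term probability mixes the document model with the collection model:
    p(t) = (1 - lam) * tf(t, doc) / |doc| + lam * tf(t, collection) / |collection|
    """
    doc_tf = Counter(doc_terms)
    col_tf = Counter(collection_terms)
    doc_len = len(doc_terms)
    col_len = len(collection_terms)
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf[t] / doc_len if doc_len else 0.0
        p_col = col_tf[t] / col_len if col_len else 0.0
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in both document and collection
        score += math.log(p)
    return score

# Toy collection (hypothetical).
doc1 = "java programming language tutorial".split()
doc2 = "java island travel guide".split()
collection = doc1 + doc2
sub_query = ["java", "island"]
print(query_log_likelihood(sub_query, doc1, collection))
print(query_log_likelihood(sub_query, doc2, collection))
```

The same scorer can be applied to the original query and to each suggested sub-query; turning the scores into normalized probabilities is left out of this sketch.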
21. Nugget based measures
Nugget – any binary property of a document, such as "a piece of information" or a topic/category
Model the user's information need and the document's information content as sets of nuggets
A document is relevant if it contains at least one nugget from the user's information need u
J(d,i) = 1 if d contains information nugget n_i; 0 otherwise
P(n_i ∈ d) = α if J(d,i) = 1; 0 otherwise
22. Nugget based measures
Gain at rank k
Discounted Cumulative Gain, DCG
Normalized Discounted Cumulative Gain, nDCG
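A sketch of a nugget-based nDCG in the style of the alpha-nDCG measure, under these assumptions: the gain of the document at rank k is the sum over its nuggets of (1 - alpha) raised to the number of earlier documents that already contained that nugget; DCG discounts the gain at rank j by log2(1 + j); and the ideal ranking used for normalization is approximated greedily. Function names and toy judgments are hypothetical.

```python
import math

def nugget_gain(doc, nuggets, seen, alpha):
    """Gain of one document given how often each nugget was already seen."""
    return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, ()))

def alpha_dcg(ranking, nuggets, k, alpha=0.5):
    """Discounted cumulative gain over the top k with a redundancy penalty."""
    seen = {}
    dcg = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        dcg += nugget_gain(doc, nuggets, seen, alpha) / math.log2(1 + rank)
        for n in nuggets.get(doc, ()):
            seen[n] = seen.get(n, 0) + 1
    return dcg

def greedy_ideal(nuggets, k, alpha=0.5):
    """Approximate the ideal ranking by always taking the highest-gain document."""
    seen = {}
    ideal = []
    remaining = list(nuggets)
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: nugget_gain(d, nuggets, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        for n in nuggets[best]:
            seen[n] = seen.get(n, 0) + 1
    return ideal

def alpha_ndcg(ranking, nuggets, k, alpha=0.5):
    """DCG of the ranking divided by the DCG of the (greedy) ideal ranking."""
    ideal_dcg = alpha_dcg(greedy_ideal(nuggets, k, alpha), nuggets, k, alpha)
    return alpha_dcg(ranking, nuggets, k, alpha) / ideal_dcg if ideal_dcg else 0.0

# Toy judgments (hypothetical): d1 and d2 carry the same nugget, d3 a different one.
nuggets = {"d1": {1}, "d2": {1}, "d3": {2}}
print(alpha_ndcg(["d1", "d3"], nuggets, k=2))  # diverse ranking
print(alpha_ndcg(["d1", "d2"], nuggets, k=2))  # redundant ranking
```

The redundant ranking scores lower because the second occurrence of nugget 1 earns only (1 - alpha) of the original gain.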
26. Concluding Remarks
Diversification has the potential to solve many IR problems
Query Ambiguity
Polysemy
Synonymy
Variations in user intents
Variations in user requirements
28. References
Jaime G. Carbonell and Jade Goldstein. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. SIGIR 1998: 335-336.
J. Wang and J. Zhu. Portfolio Theory of Information Retrieval. SIGIR 2009.
Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning Diverse Rankings with Multi-Armed Bandits. Proceedings of the International Conference on Machine Learning (ICML), 2008.
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying Search Results. WSDM 2009: 5-14.
Rodrygo Santos, Craig Macdonald, and Iadh Ounis. Exploiting Query Reformulations for Web Search Result Diversification. WWW 2010 (to appear).
C. Clarke, M. Kolla, G. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and Diversity in Information Retrieval Evaluation. Proceedings of the 31st ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 659-666, July 2008.
F. Radlinski, Paul N. Bennett, Ben Carterette, and Thorsten Joachims. Redundancy, Diversity and Interdependent Document Relevance. SIGIR Forum, Vol. 43, No. 2, pp. 46-52, December 2009.