Efficient Diversification of Web Search Results
G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri
ISTI - CNR, Pisa, Italy

Introduction: SE Results Diversification

• Query: “Vinci”, what’s the user’s intent?
   • Information on Leonardo da Vinci?
   • Information on Vinci the small village in Tuscany?
   • Information on Vinci the company?
   • Others?

           F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   2
Query Diversification as a Coverage Problem
• Hypothesis:
 • For each user query, we know the set of all possible intents.
 • For each document in the collection, we know all the user intents it represents.
    • Each intent of a document may be weighted by a value representing how much the document represents that intent (e.g., 1/2 of document D is related to the intent “digital photography techniques”).
• Goal:
 • Select the set of k documents in the collection that covers the maximum amount of intent weight, i.e., maximize the number of satisfied users.

State-of-the-Art Methods


• IASelect:
 • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), pages 5-14, New York, NY, USA, 2009. ACM.

• xQuAD:
 • Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. Exploiting query reformulations for Web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881-890, Raleigh, NC, USA, 2010. ACM.

Diversify (k)

[Formula slide: the objective function was shown as an image. Its callouts read as follows.]
• intents
• the weight
• the weight is the probability of being relevant to intent c
• d is not pertinent to c
• no doc is pertinent to c
• at least one doc is pertinent to c

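The callouts on this slide match the objective of IASelect as published by Agrawal et al. (WSDM '09), which is usually written as below. This is a reconstruction offered for orientation, not a copy of the slide: the symbol C(q) for the set of intents of q is an assumption, while P(c|q) and V(d|q,c) follow the notation used on the later slides.

```latex
P(S \mid q) \;=\; \sum_{c \in C(q)} P(c \mid q)
\Bigl( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr),
\qquad |S| = k
```

Read term by term: P(c|q) is the weight of intent c, 1 - V(d|q,c) is the probability that d is not pertinent to c, the product is the probability that no selected document is pertinent to c, and one minus the product is the probability that at least one is.
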
Known Results
• Diversify(k) is NP-hard:
 • Reduction from max-weight coverage.
• Diversify(k)’s objective function is submodular:
 • It admits a (1-1/e)-approximation algorithm.
 • The algorithm inserts one result at a time, each time choosing the result with the maximum marginal utility.
 • Quadratic complexity in the number of results to consider:
  • each iteration scans the complete list of not-yet-inserted results.
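
The greedy procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of any of the cited systems: it assumes the objective has the form "for each intent, its weight times the probability that at least one selected document is pertinent", as the callouts on the Diversify(k) slide suggest. The names `objective` and `greedy_diversify` and the toy data are hypothetical.

```python
from math import prod


def objective(selected, intent_prob, utility):
    """Sum over intents c of P(c|q) * P(at least one selected doc is pertinent to c)."""
    total = 0.0
    for c, p_c in intent_prob.items():
        # Probability that no selected document is pertinent to intent c.
        none_pertinent = prod(1.0 - utility[d].get(c, 0.0) for d in selected)
        total += p_c * (1.0 - none_pertinent)
    return total


def greedy_diversify(candidates, intent_prob, utility, k):
    """Insert one result at a time, always the one with the max marginal utility."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = objective(selected, intent_prob, utility)
        # Full scan of the not-yet-inserted results at every iteration.
        best = max(
            remaining,
            key=lambda d: objective(selected + [d], intent_prob, utility) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: two intents, three candidate documents.
intent_prob = {"art": 0.6, "town": 0.4}
utility = {
    "d1": {"art": 0.9},
    "d2": {"art": 0.8},
    "d3": {"town": 0.7},
}
picked = greedy_diversify(["d1", "d2", "d3"], intent_prob, utility, k=2)
```

On this toy data the second pick is d3 rather than d2, because d1 already covers most of the weight of the "art" intent. The full scan of `remaining` at every iteration is the source of the quadratic cost noted above.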
It looks reasonable, but...
•   ... we might not diversify at all!
•   Consider a query q returning a set Rd={a,b,c} of documents and two possible categories g and h.
•   The query pertains to each category with the same probability, i.e., P(g|q) = P(h|q) = 1/2.

                 x          V(x|q,g)          V(x|q,h)
                 a             1                 0
                 b             1                 0
                 c            1/2               1/2

•   The optimal selection is S={a,b}: replacing either a or b with c decreases the value of the objective function.


xQuAD_Diversify(k)

[Formula slide: the objective function was shown as an image and is not recoverable from this transcript.]

• Same problem as before: it may not diversify at all.

Our Proposal: MaxUtility

[Figure: the query "Vinci" branches into the specializations "Leonardo da Vinci", "Vinci Town", and "Vinci Group". The branches carry the probabilities 1/3, 5/12, and 1/4; which value belongs to which branch is not recoverable from this transcript. The figure also labels the sets Rq and S.]

MaxUtility_Diversify(k)

[Formula slide: the objective function was shown as an image. Its callouts read as follows.]
• Probability of query q’ being a specialization for query q
• Set of possible query specializations

Why Is It Efficient?

• By using a simple arithmetic argument we can show that:

  [Formula shown as an image on the slide; not recoverable from this transcript.]

• Therefore, we can find the optimal set S of diversified documents by using a sort-based approach.
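
To make "sort-based" concrete, here is a rough sketch of selecting k documents with top-k selections only, with no greedy re-evaluation of the already selected set. It is not the OptSelect algorithm from the talk, whose listing is not in this transcript. The quota rule (each specialization q' gets floor(k * P(q'|q)) slots), the overall score (probability-weighted sum of per-specialization utilities), the function name `sort_based_select`, and the data are all assumptions made for illustration.

```python
import heapq
from math import floor


def sort_based_select(candidates, spec_prob, utility, k):
    """Pick k documents using per-specialization quotas and top-k selection.

    candidates: list of document ids
    spec_prob:  {specialization: assumed P(q'|q)}, summing to at most 1
    utility:    {doc: {specialization: usefulness of doc for it}}
    """
    def overall(d):
        # Assumed overall score: probability-weighted sum of utilities.
        return sum(p * utility[d].get(s, 0.0) for s, p in spec_prob.items())

    selected = []
    chosen = set()
    # Quota step: most probable specializations first (ties keep dict order).
    for s, p in sorted(spec_prob.items(), key=lambda kv: -kv[1]):
        quota = floor(k * p)
        useful = [
            d for d in candidates
            if d not in chosen and utility[d].get(s, 0.0) > 0.0
        ]
        for d in heapq.nlargest(quota, useful, key=lambda d: utility[d][s]):
            selected.append(d)
            chosen.add(d)
    # Fill step: remaining slots go to the best documents by overall score.
    rest = [d for d in candidates if d not in chosen]
    selected.extend(heapq.nlargest(k - len(selected), rest, key=overall))
    return selected


# Hypothetical probabilities and utilities, not taken from the slides.
spec_prob = {"leonardo da vinci": 0.5, "vinci town": 0.25, "vinci group": 0.25}
utility = {
    "d1": {"leonardo da vinci": 0.9},
    "d2": {"leonardo da vinci": 0.8},
    "d3": {"leonardo da vinci": 0.7},
    "d4": {"vinci town": 0.6},
    "d5": {"vinci group": 0.5},
    "d6": {"vinci town": 0.2},
}
result = sort_based_select(list(utility), spec_prob, utility, k=4)
```

Each `heapq.nlargest` call costs O(n log k) for n candidates, so the whole selection is a handful of partial sorts rather than k full scans with objective re-evaluation.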


OptSelect

[Slide content was shown as an image and is not recoverable from this transcript.]

The Specialization Set Sq
• It is crucial for OptSelect to have the set of specializations available for each query.
• Our method is, thus, query-log-based.
 • We use a query recommender system to obtain a set of queries, from which Sq is built by including the most popular recommendations (i.e., those whose frequency in the query log is greater than f(q) / s).
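
The popularity filter can be sketched as below. It reads f(q) as the frequency of q in the query log and s as a tunable parameter, which is an interpretation of the slide rather than something it states. The recommender output is simply passed in as a list. The helper `relative_frequency` is one plausible way to turn log frequencies into probabilities and is not taken from the slides. All names and numbers are hypothetical.

```python
def build_specialization_set(query, recommendations, freq, s):
    """Keep the recommendations whose log frequency exceeds f(q) / s."""
    threshold = freq.get(query, 0) / s
    return [r for r in recommendations if freq.get(r, 0) > threshold]


def relative_frequency(specializations, freq):
    """Assumed estimate: frequency of each specialization over the total."""
    total = sum(freq[r] for r in specializations)
    if total == 0:
        return {}
    return {r: freq[r] / total for r in specializations}


# Hypothetical query-log frequencies.
freq = {
    "vinci": 1000,
    "leonardo da vinci": 600,
    "vinci town": 250,
    "vinci group": 150,
    "vinci wine": 20,
}
recommendations = ["leonardo da vinci", "vinci town", "vinci group", "vinci wine"]

# With s = 10 the threshold is 1000 / 10 = 100, so "vinci wine" is dropped.
sq = build_specialization_set("vinci", recommendations, freq, s=10)
probs = relative_frequency(sq, freq)
```

A larger s lowers the threshold and admits rarer specializations into Sq.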


Probability Estimation

[Slide content was shown as an image and is not recoverable from this transcript.]

Usefulness of a Result

[Slide content was shown as an image and is not recoverable from this transcript.]

Experiments: Settings

• TREC 2009 Web track's Diversity Task framework:
 • ClueWeb-B, a subset of the TREC ClueWeb09 dataset
 • The 50 topics (i.e., queries) provided by TREC
 • We evaluate α-NDCG and IA-P
• All tests were conducted on an Intel Core 2 Quad PC with 8 GB of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22).
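
For readers unfamiliar with the first metric, the sketch below computes α-DCG, the unnormalized quantity behind α-NDCG: a document's gain for an intent is discounted by (1 - α) for every earlier document that already covered that intent, and gains are discounted by log2(1 + rank). α-NDCG divides this by the α-DCG of an ideal ranking, which is omitted here. The function name, the judgment format, and the data are hypothetical, and this is not the official TREC evaluation tool.

```python
from math import log2


def alpha_dcg(ranking, judgments, alpha=0.5, depth=None):
    """alpha-DCG of a ranking.

    ranking:   list of document ids, best first
    judgments: {doc: set of intents the doc is judged relevant to}
    depth:     evaluate only the first `depth` results (None = all)
    """
    seen = {}  # intent -> number of earlier documents that covered it
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        gain = 0.0
        for intent in judgments.get(doc, ()):
            # Redundant coverage of an intent is worth less each time.
            gain += (1.0 - alpha) ** seen.get(intent, 0)
            seen[intent] = seen.get(intent, 0) + 1
        score += gain / log2(1 + rank)
    return score


# Hypothetical judgments: d1 and d2 cover the same intent, d3 a different one.
judgments = {"d1": {"art"}, "d2": {"art"}, "d3": {"town"}}
redundant = alpha_dcg(["d1", "d2", "d3"], judgments)
diverse = alpha_dcg(["d1", "d3", "d2"], judgments)
```

The ranking that places the document for the second intent earlier scores higher, which is the behavior a diversity metric is meant to reward.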


Experiments: Quality

[Results were shown as an image and are not recoverable from this transcript.]

Experiments: Efficiency

[Results were shown as an image and are not recoverable from this transcript.]

Conclusions and Future Work
• We studied the problem of search result diversification from an efficiency point of view.
• We derived a diversification method (OptSelect):
  •   the same (or better) quality as the state of the art
  •   up to 100 times faster
• Future work:
  •   exploiting users' search history to personalize result diversification
  •   using click-through data to improve our effectiveness results
  •   studying a search architecture that performs the diversification task in parallel with the document scoring phase (Done! See DDR2011 paper)


Question Time




                                     Fabrizio Silvestri
                                   ISTI-CNR, Pisa Italy
                          http://hpc.isti.cnr.it/~fabriziosilvestri
                                   f.silvestri@isti.cnr.it
F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   20

Mais conteúdo relacionado

Mais de yaevents

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...yaevents
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...yaevents
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигмаyaevents
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндексyaevents
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareyaevents
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...yaevents
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluationyaevents
 

Mais de yaevents (20)

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндекс
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндекс
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmann
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Google
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигма
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндекс
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-aware
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
"Efficient Diversification of Web Search Results"

  • 1. Efficient Diversification of Web Search Results G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri ISTI - CNR, Pisa, Italy
  • 2. Introduction: SE Results Diversification • Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  • 5. Query Diversification as a Coverage Problem • Hypothesis: • For each user query, we can tell the set of all possible intents • For each document in the collection, we can tell which user intents it represents • each intent of a document may be weighted by a value representing how strongly the document represents that intent (e.g., 1/2 of document D is related to the intent “digital photography techniques”) • Goal: • Select the set of k documents in the collection covering the maximum amount of intent weight, i.e., maximize the number of satisfied users.
  • 6. State-of-the-Art Methods • IASelect: • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza-Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14. • xQuAD: • Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010. Exploiting query reformulations for Web search result diversification. In Proceedings of the 19th International Conference on World Wide Web. ACM, Raleigh, NC, USA, 881-890.
  • 13. Diversify(k) • [Objective formula not recovered from the slide. Its annotations read: “intents”; “the weight”; “the weight is the probability of being relative to intent c”; “d is not pertinent to c”; “no doc is pertinent to c”; “at least one doc is pertinent to c”]
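The annotations on this slide are consistent with the coverage-style objective of IASelect (Agrawal et al., 2009, cited on slide 6). The following is a reconstruction under that assumption, not a copy of the slide; the symbol $R_q$ for the candidate result set and the use of $V(d \mid q, c)$ for the degree to which document $d$ is pertinent to intent $c$ are assumed notation.

```latex
\text{Diversify}(k):\quad
\max_{S \subseteq R_q,\ |S| = k} \; P(S \mid q)
  = \sum_{c} P(c \mid q)
    \Bigl( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr)
```

Read term by term: the sum ranges over the intents $c$; $P(c \mid q)$ is the weight of intent $c$; $1 - V(d \mid q, c)$ is the probability that $d$ is not pertinent to $c$; the product is the probability that no document in $S$ is pertinent to $c$; and one minus the product is the probability that at least one document is.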
  • 14. Known Results • Diversify(k) is NP-hard: • Reduction from max-weight coverage • Diversify(k)’s objective function is sub-modular: • It admits a (1-1/e)-approximation algorithm. • The algorithm inserts one result at a time, each time choosing the result with the maximum marginal utility. • Quadratic complexity in the number of results to consider: • each iteration scans the complete list of not-yet-inserted results.
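The greedy procedure described under Known Results can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a coverage-style objective, P(S|q) = Σ_c P(c|q) · (1 − Π_{d∈S} (1 − V(d|q,c))), in the style of IASelect; the names `objective`, `greedy_diversify`, `intent_prob`, and `relevance`, as well as the toy data, are hypothetical.

```python
from math import prod

def objective(selected, intent_prob, relevance):
    """Assumed objective: sum over intents c of P(c|q) times the
    probability that at least one selected document is pertinent to c."""
    return sum(
        p_c * (1.0 - prod(1.0 - relevance[d].get(c, 0.0) for d in selected))
        for c, p_c in intent_prob.items()
    )

def greedy_diversify(candidates, k, intent_prob, relevance):
    """Insert one result at a time, always the one with the maximum
    marginal utility. Each iteration rescans all not-yet-inserted results,
    which is the source of the quadratic behaviour noted on the slide."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        current = objective(selected, intent_prob, relevance)
        best = max(
            remaining,
            key=lambda d: objective(selected + [d], intent_prob, relevance) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: two intents and three candidate documents.
intent_prob = {"art": 0.6, "town": 0.4}
relevance = {
    "d1": {"art": 0.9},
    "d2": {"art": 0.8},
    "d3": {"town": 0.7},
}
result = greedy_diversify(["d1", "d2", "d3"], 2, intent_prob, relevance)
print(result)  # ['d1', 'd3']
```

In this toy run the second pick is `d3` rather than `d2`, because once `d1` covers the "art" intent, adding another "art" document yields little marginal utility.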
  • 16. It looks reasonable, but... • ... we might not diversify at all! • Consider a query returning a set Rd={a,b,c} of documents and two possible categories g, h. • The query pertains to each category with the same probability, i.e., P(g|q) = P(h|q) = 1/2. • Values V(x|q,g) and V(x|q,h) for each document x: a: 1, 0; b: 1, 0; c: 1/2, 1/2. • The optimal selection is S={a,b}; replacing either a or b with c makes the objective function decrease.
  • 20. xQuAD_Diversify(k) • Same problem as before: it may not diversify at all.
  • 26. Our Proposal: MaxUtility • [Figure: the query “Vinci” and its specializations “Leonardo da Vinci”, “Vinci Town”, and “Vinci Group”, labeled with the values 1/3, 5/12, and 1/4, shown over the result list Rq and the selected set S]
  • 29. MaxUtility_Diversify(k) • [Formula not recovered from the slide. Its annotations read: “Probability of query q’ being a specialization for query q”; “Set of possible query specializations”]
  • 30. Why Is It Efficient? • A simple arithmetic argument (the formula is not recovered from the slide) shows that the optimal set S of diversified documents can be found with a sort-based approach.
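As a hedged illustration of why a sort-based approach can be optimal, assume the objective is a plain sum of independent per-document utilities. Under that assumption, the best set of size k is simply the k documents with the highest utility, so no repeated rescanning of candidates is needed. The function names and utility values below are hypothetical, and the sketch leaves out any per-specialization constraints the actual OptSelect algorithm may impose.

```python
import heapq
from itertools import combinations

def select_top_k(utility, k):
    """Pick the k documents with the largest utility in O(n log k) time."""
    return heapq.nlargest(k, utility, key=utility.get)

def brute_force_best(utility, k):
    """Exhaustive search over all size-k subsets; only for checking the
    shortcut on small inputs."""
    return max(
        combinations(utility, k),
        key=lambda subset: sum(utility[d] for d in subset),
    )

# Hypothetical per-document utilities.
utility = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
chosen = select_top_k(utility, 2)
print(chosen)  # ['d2', 'd4']
```

On this toy input the heap-based selection and the exhaustive search agree, which is what an additive objective guarantees.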
  • 31. OptSelect
  • 33. The Specialization Set Sq • It is crucial for OptSelect to have the set of specializations available for each query. • Our method is, thus, query-log based. • We use a query recommender system to obtain a set of queries, from which Sq is built by including the most popular recommendations (i.e., those whose frequency in the query log exceeds f(q) / s).
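The popularity filter on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: `freq` stands for query-log frequencies f(·), `recommendations` for the output of some query recommender, and `s` for the threshold parameter from the slide. The function name and the numbers are hypothetical, and the recommender itself is not modeled.

```python
def specialization_set(query, recommendations, freq, s):
    """Keep the recommended queries whose query-log frequency exceeds
    f(query) / s, preserving the recommender's order."""
    threshold = freq[query] / s
    return [r for r in recommendations if freq.get(r, 0) > threshold]

# Hypothetical query-log frequencies.
freq = {
    "vinci": 1000,
    "leonardo da vinci": 400,
    "vinci town": 150,
    "vinci group": 90,
}
recs = ["leonardo da vinci", "vinci town", "vinci group"]
s_q = specialization_set("vinci", recs, freq, s=10)
print(s_q)  # ['leonardo da vinci', 'vinci town']
```

With s = 10 the threshold is 100, so "vinci group" (frequency 90) is dropped. A larger s lowers the threshold and admits more specializations.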
  • 34. Probability Estimation
  • 35. Usefulness of a Result
  • 37. Experiments: Settings • TREC 2009 Web track's Diversity Task framework: • ClueWeb-B, a subset of the TREC ClueWeb09 dataset • The 50 topics (i.e., queries) provided by TREC • We evaluate α-NDCG and IA-P • All tests were conducted on an Intel Core 2 Quad PC with 8 GB of RAM running Ubuntu Linux 9.10 (kernel 2.6.31-22).
  • 38. Experiments: Quality
  • 39. Experiments: Efficiency
  • 40. Conclusions and Future Work • We studied the problem of search results diversification from an efficiency point of view • We derived a diversification method (OptSelect): • same (or better) quality as the state of the art • up to 100 times faster • Future work: • exploiting users' search history to personalize result diversification • using click-through data to improve our effectiveness results • studying a search architecture that performs the diversification task in parallel with the document scoring phase (Done! See DDR2011 paper)
  • 41. Question Time • Fabrizio Silvestri, ISTI-CNR, Pisa, Italy • http://hpc.isti.cnr.it/~fabriziosilvestri • f.silvestri@isti.cnr.it