35th Annual International ACM SIGIR Conference on Research
                    and Development in Information Retrieval (SIGIR 2012)



                        Explicit Relevance Models
                   in Intent-Aware IR Diversification
                           Saúl Vargas, Pablo Castells and David Vallet
                               Universidad Autónoma de Madrid
                                        http://ir.ii.uam.es

                                         Portland, OR, 13 August 2012




IRG
                                           Explicit Relevance Models in Intent-Aware IR Diversification
                          35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
IR Group @ UAM                                             Portland, OR, 13 August 2012
Outline


            Context: IR diversification formulation and algorithms

            Proposed approach: relevance-based reformulation
                 of diversification algorithms

            Experiments

            Adjustable tolerance to redundancy

            Conclusion




IR diversity – Brief recap

[Figure: a single query with several possible intents: Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy]

                  Diversity as a means to address uncertainty in user queries
                    – The same query may have different intents or aspects in the underlying information need
                  Revision of the document relevance independence assumption
                    – The marginal utility of additional relevant documents decreases fast
                  Trade diminishing marginal utility for increased intent coverage
                    – Thus maximize the number of users who obtain at least some useful document



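IR diversification – Problem statement

   Given a query $q$ on a document collection,
   find a subset $S$ of the collection, of given size, maximizing (NP-hard):

$$p(\text{some } d \in S \text{ is relevant} \mid q)$$

   Agrawal 2009, Santos 2010, Chen 2006, …

   Greedy approximation: starting from the baseline ranking $R$, scored by $p(d \mid q)$, repeatedly move
   $\arg\max_{d \in R - S} \varphi(d, S \mid q)$ into the diversified ranking $S$, where

$$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \land \text{no } d' \in S \text{ is relevant} \mid q)$$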
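The greedy approximation can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `greedy_diversify`, the list/dict inputs and the toy objective `toy_phi` are illustrative assumptions; any instantiation of $\varphi(d, S \mid q)$ can be passed in as `phi`.

```python
from typing import Callable, List, Sequence

def greedy_diversify(
    baseline: Sequence[str],
    phi: Callable[[str, List[str]], float],
    n: int,
) -> List[str]:
    """Greedily build a diversified ranking S (size <= n) from the baseline ranking R.

    At each step the candidate d in R - S maximizing phi(d, S) is appended to S.
    max() returns the first maximal element, so ties go to the higher baseline rank.
    """
    remaining = list(baseline)  # R - S, kept in baseline order
    selected: List[str] = []    # S, the diversified ranking built so far
    while remaining and len(selected) < n:
        best = max(remaining, key=lambda d: phi(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage (hypothetical scores): "a1" and "a2" share aspect x, "b1" covers aspect y.
p_d_q = {"a1": 0.5, "a2": 0.4, "b1": 0.3}
aspect = {"a1": "x", "a2": "x", "b1": "y"}

def toy_phi(d: str, S: List[str]) -> float:
    # Crude redundancy penalty: discount d if its aspect is already covered by S.
    covered = {aspect[s] for s in S}
    return p_d_q[d] * (0.1 if aspect[d] in covered else 1.0)

print(greedy_diversify(["a1", "a2", "b1"], toy_phi, n=3))  # ['a1', 'b1', 'a2']
```

The loop evaluates $\varphi$ for every remaining candidate at each step, so the cost is quadratic in the number of re-ranked documents, which is why diversification is typically applied to a bounded top-$n$ of the baseline.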



IR diversity – Instantiations of objective function

$$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \land \text{no } d' \in S \text{ is relevant} \mid q)$$

   State of the art aspect-based approaches (explicit query aspects)
    IA-Select scheme (Agrawal 2009)

$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \left(1 - p(z \mid d')\, p(d' \mid q)\right)$$

    xQuAD scheme (Santos 2010)

$$\varphi(d, S \mid q) = (1-\lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q) = (1-\lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \left(1 - p(d' \mid q, z)\right)$$

   Components of both schemes:
     – $\sum_z p(z \mid q) \cdots$: query aspect coverage
     – $p(z \mid d)$ and $p(d \mid q, z)$: document "relevance" for a query aspect
     – $\prod_{d' \in S} (\cdots)$: redundancy penalization
     – $(1-\lambda)\, p(d \mid q)$: mixture with the baseline; $\lambda$: degree of diversification
     – $p(d \mid q)$ and $p(d \mid q, z)$ are probabilities to observe documents, not probabilities of relevance
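Both objectives translate directly into code. A minimal Python sketch, assuming the probabilities have already been estimated and are passed as plain nested dicts (`p_z_q[z]`, `p_d_q[d]`, `p_z_d[d][z]`, `p_d_qz[z][d]`); these names, and the convention that a missing entry means probability 0, are assumptions of the sketch rather than the original implementations.

```python
import math
from typing import Dict, List

def ia_select_phi(d: str, S: List[str], p_z_q: Dict[str, float],
                  p_z_d: Dict[str, Dict[str, float]],
                  p_d_q: Dict[str, float]) -> float:
    """IA-Select: sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    total = 0.0
    for z, pzq in p_z_q.items():
        gain = p_z_d[d].get(z, 0.0) * p_d_q[d]
        # math.prod of an empty iterable is 1, so an empty S applies no penalty.
        not_covered = math.prod(1.0 - p_z_d[s].get(z, 0.0) * p_d_q[s] for s in S)
        total += pzq * gain * not_covered
    return total


def xquad_phi(d: str, S: List[str], lam: float, p_z_q: Dict[str, float],
              p_d_q: Dict[str, float],
              p_d_qz: Dict[str, Dict[str, float]]) -> float:
    """xQuAD: (1-lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    div = 0.0
    for z, pzq in p_z_q.items():
        not_covered = math.prod(1.0 - p_d_qz[z].get(s, 0.0) for s in S)
        div += pzq * p_d_qz[z].get(d, 0.0) * not_covered
    return (1.0 - lam) * p_d_q[d] + lam * div


# Toy inputs (hypothetical): document "a" covers aspect x, document "b" covers aspect y.
p_z_q = {"x": 0.5, "y": 0.5}
p_d_q = {"a": 0.6, "b": 0.4}
p_z_d = {"a": {"x": 1.0}, "b": {"y": 1.0}}
p_d_qz = {"x": {"a": 1.0}, "y": {"b": 1.0}}

print(ia_select_phi("b", ["a"], p_z_q, p_z_d, p_d_q))   # about 0.2
print(xquad_phi("b", ["a"], 0.5, p_z_q, p_d_q, p_d_qz))  # about 0.45
```

Either function can be wrapped in a closure and passed as the `phi` argument of a greedy re-ranking loop.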




IR diversity – Relevance-based instantiation of objective function

$$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \land \text{no } d' \in S \text{ is relevant} \mid q)$$

    IA-Select scheme – relevance-based (our proposal)

$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \left(1 - p(r \mid d', q, z)\right)$$

    xQuAD scheme – relevance-based (our proposal)

$$\varphi(d, S \mid q) = (1-\lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q) = (1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \left(1 - p(r \mid d', q, z)\right)$$

     – Probabilities of relevance replace probabilities to observe documents
     – A more literal interpretation of the initial problem statement
     – The two schemes are equivalent for $\lambda = 1$
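The relevance-based variants differ from the originals only in which probabilities enter the formulas. A minimal Python sketch, assuming $p(r \mid d,q)$ and $p(r \mid d,q,z)$ are given as dicts (`p_r_dq[d]`, `p_r_dqz[z][d]`; hypothetical names, missing entries read as 0). Sharing one helper for the aspect term makes the $\lambda = 1$ equivalence explicit.

```python
import math
from typing import Dict, List

def _aspect_term(d: str, S: List[str], p_z_q: Dict[str, float],
                 p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        pzq * p_r_dqz[z].get(d, 0.0)
        * math.prod(1.0 - p_r_dqz[z].get(s, 0.0) for s in S)
        for z, pzq in p_z_q.items()
    )


def rel_ia_select_phi(d: str, S: List[str], p_z_q: Dict[str, float],
                      p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """Relevance-based IA-Select: the aspect term alone."""
    return _aspect_term(d, S, p_z_q, p_r_dqz)


def rel_xquad_phi(d: str, S: List[str], lam: float, p_z_q: Dict[str, float],
                  p_r_dq: Dict[str, float],
                  p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """Relevance-based xQuAD: (1-lam) p(r|d,q) + lam * aspect term."""
    return (1.0 - lam) * p_r_dq[d] + lam * _aspect_term(d, S, p_z_q, p_r_dqz)


# Toy inputs (hypothetical relevance probabilities, not estimates from the paper).
p_z_q = {"x": 0.5, "y": 0.5}
p_r_dq = {"a": 0.7, "b": 0.5, "c": 0.4}
p_r_dqz = {"x": {"a": 0.9, "b": 0.8}, "y": {"c": 0.6}}

for d in ("b", "c"):
    # With lam = 1 both schemes give the same value (about 0.04 for b, 0.3 for c).
    print(d, rel_ia_select_phi(d, ["a"], p_z_q, p_r_dqz),
          rel_xquad_phi(d, ["a"], 1.0, p_z_q, p_r_dq, p_r_dqz))
```

In the toy data, once "a" is selected, "c" (covering the still uncovered aspect y) outscores "b" even though "b" has the higher $p(r \mid d,q)$.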




Relevance distribution vs. document distribution

                 $p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context)

[Figure: values (between 0 and 1) over documents $d$ of $p(d \mid q, z)$, with $\sum_d p(d \mid q, z) = 1$, and of $p(r \mid d, q, z)$, with $\sum_d p(r \mid d, q, z) = \mathrm{E}[\text{number of relevant docs}] \ge 1$]

       – Different potential behavior, e.g. stronger redundancy penalization
       – Potential rank equivalences do not apply here:

$$(1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \left(1 - p(r \mid d', q, z)\right)$$
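A small numeric illustration of the "stronger redundancy penalization" point, using made-up values for a single aspect $z$ (not data from the paper): the factor that an already selected document imposes on later candidates is much smaller when it is built from a probability of relevance than when it is built from one share of a distribution over documents. The helper name `coverage_penalty` is hypothetical.

```python
import math
from typing import List, Sequence

def coverage_penalty(probs: Sequence[float], selected: List[int]) -> float:
    """prod_{d' in S} (1 - probs[d']): the factor multiplying later gains for this aspect."""
    return math.prod(1.0 - probs[i] for i in selected)

# Illustrative numbers for one aspect z over four documents, best first.
p_d_given_qz = [0.4, 0.3, 0.2, 0.1]   # a distribution over documents: sums to 1
p_r_given_dqz = [0.9, 0.8, 0.6, 0.3]  # relevance probabilities: no sum-to-1 constraint

print(sum(p_r_given_dqz))                    # about 2.6 expected relevant documents for z
print(coverage_penalty(p_d_given_qz, [0]))   # about 0.6: mild penalty after the top document
print(coverage_penalty(p_r_given_dqz, [0]))  # about 0.1: much stronger penalty
```

Multiplying every entry of `p_d_given_qz` by a constant leaves the ranking by $p(d \mid q, z)$ unchanged but changes `coverage_penalty`, which is one way to see why rank-preserving transformations cannot be applied to these estimates.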
Relevance-based greedy diversification



         Relevance-based reformulation of diversification algorithm

         1. Need to estimate $p(r \mid d, q, z)$

         2. Does it work? Test empirically

         3. Further development: parameterized tolerance to redundancy




Aspect-based relevance model

     Estimate $p(r \mid d, q, z)$

     Cannot use odds, logs, constant removal… or any other rank-preserving step
     (we need the specific values)

     Ingredients of the estimate:
       – $p(r \mid d, q)$: positional relevance $p(r \mid \text{rank}(d, q))$
       – $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on the available observations
             • $z$ as document classes (e.g. ODP)
             • $z$ as subqueries (e.g. reformulations)
         then derive the other two parameters
       – $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
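When $p(z \mid d)$ is the observed quantity (e.g. ODP classes), one way the other two parameters could be derived is by marginalizing over documents. The sketch below assumes $z$ is conditionally independent of $q$ given $d$, and a uniform document prior $p(d)$ over the documents supplied; both assumptions and the function name `derive_aspect_priors` are illustrative, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, Tuple

def derive_aspect_priors(
    p_z_d: Dict[str, Dict[str, float]], p_d_q: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (p(z|q), p(z)) from p(z|d) and normalized baseline scores p(d|q).

    Assumptions: p(z|q) = sum_d p(z|d) p(d|q)   (z independent of q given d)
                 p(z)   = sum_d p(z|d) p(d)     (p(d) uniform over the documents in p_z_d)
    """
    p_z_q: Dict[str, float] = defaultdict(float)
    p_z: Dict[str, float] = defaultdict(float)
    n_docs = len(p_z_d)
    for d, dist in p_z_d.items():
        for z, pzd in dist.items():
            p_z_q[z] += pzd * p_d_q.get(d, 0.0)
            p_z[z] += pzd / n_docs
    return dict(p_z_q), dict(p_z)


# Hypothetical inputs: two documents, two aspects.
p_z_q, p_z = derive_aspect_priors(
    {"a": {"x": 1.0}, "b": {"x": 0.5, "y": 0.5}}, {"a": 0.6, "b": 0.4}
)
print(p_z_q)  # approximately {'x': 0.8, 'y': 0.2}
print(p_z)    # {'x': 0.75, 'y': 0.25}
```

When $p(z \mid q)$ is observed instead (e.g. subqueries), the other parameters would have to be derived in the opposite direction, for instance through Bayes' rule; that case is not covered by this sketch.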

Positional relevance distribution estimate

$$p(r \mid d, q) \sim p(r \mid \text{rank}(d, q)) = p(r \mid k)$$

[Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank $k$ (0 to 200); curves for pLSA and Lemur (precision estimates) and for AOL (click log statistics)]
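A minimal sketch of how $p(r \mid k)$ could be estimated from relevance judgments: the fraction of training queries whose document at rank $k$ is judged relevant. Reading "P@k estimates" as this per-position precision is an assumption of the sketch, as is the function name; $p(r \mid d, q)$ is then read off at $d$'s baseline rank.

```python
from typing import Dict, List, Sequence, Set

def positional_relevance(rankings: Dict[str, Sequence[str]],
                         qrels: Dict[str, Set[str]], depth: int) -> List[float]:
    """Estimate p(r|k), k = 1..depth, as the fraction of queries whose k-th result is relevant.

    rankings: baseline ranking per training query; qrels: relevant documents per query.
    Positions reached by no query get 0.0.
    """
    hits = [0] * depth
    seen = [0] * depth
    for q, ranking in rankings.items():
        relevant = qrels.get(q, set())
        for k, d in enumerate(ranking[:depth]):
            seen[k] += 1
            hits[k] += d in relevant  # bool counts as 0 or 1
    return [h / n if n else 0.0 for h, n in zip(hits, seen)]


p_r_k = positional_relevance(
    {"q1": ["a", "b", "c"], "q2": ["d", "e"]},
    {"q1": {"a", "c"}, "q2": {"e"}},
    depth=3,
)
print(p_r_k)  # [0.5, 0.5, 1.0]
# p(r|d,q) is then approximated by p_r_k[rank(d, q) - 1] for a 1-based rank.
```

With realistic amounts of training data the raw per-position estimates are noisy, so in practice some smoothing or curve fitting over $k$ may be needed; the sketch leaves that out.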


Experiments
   Search diversity
          Collection: ClueWeb09 category B (50M documents)
          Query/subtopic set: TREC 2009/10 diversity task (100 queries)
          Baseline ranking: Lemur Indri search engine (Web service)
          Diversified top n: 100
          Query aspect space:
                 a) ODP categories level 4 (~7K categories)
                 b) TREC subtopics (oracle for reference)
          Specific parameter estimates:
                 $p(z \mid q)$: uniform
                 $p(z \mid d)$:
                           ODP categories: semi-supervised text classification by Textwise
                           TREC subtopics: Indri search system run on $z$ as if it were a query
                 $p(r \mid k)$:
                           i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross-validation)
                           ii. Click statistics from the AOL log (thus from a different IR system)




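Experiments – Search diversity on TREC

[Figure: xQuAD scheme with $p(r \mid k)$ from qrels; ERR-IA as a function of λ for the ODP categories and TREC subtopics aspect spaces, comparing the objective based on $p(r \mid d, q, z)$ with the one based on $p(d \mid q, z)$]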



Experiments – Search diversity on TREC

                                      λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
   Lemur (baseline)                   -     0.2587      0.1630      0.2396       0.4636

   a) ODP categories
   IA-Select                          -     0.2651      0.1681      0.2423       0.4483
   xQuAD                              0.9   0.2675      0.1656      0.2451       0.4864
   Rel-based xQuAD, i. Qrels          0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
   Rel-based xQuAD, ii. Clicks        0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽

   b) TREC subtopics
   IA-Select                          -     0.3541      0.2346      0.3213       0.5787
   xQuAD                              1.0   0.3445      0.2241      0.3127       0.5704
   Rel-based xQuAD, i. Qrels          1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
   Rel-based xQuAD, ii. Clicks        1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

   λ tuned "informally" by maximizing ERR-IA in 0.1 steps for each diversifier
   ▲▼  p < 0.05
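For reference, ERR-IA (the metric λ was tuned on) weights the Expected Reciprocal Rank of each subtopic by $p(z \mid q)$. A minimal sketch with graded judgments per subtopic, assuming the usual ERR relevance mapping $R = (2^g - 1)/2^{g_{\max}}$; the function names and input layout are illustrative, and official TREC evaluation tools may differ in details such as the choice of $g_{\max}$ or normalization.

```python
from typing import Dict, Sequence

def err(grades: Sequence[int], max_grade: int) -> float:
    """Expected Reciprocal Rank of a list of relevance grades (rank 1 first)."""
    score, p_continue = 0.0, 1.0
    for k, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability the user is satisfied at rank k
        score += p_continue * r / k
        p_continue *= 1.0 - r
    return score


def err_ia(ranking: Sequence[str], judgments: Dict[str, Dict[str, int]],
           p_z_q: Dict[str, float], cutoff: int = 20, max_grade: int = 1) -> float:
    """ERR-IA@cutoff = sum_z p(z|q) ERR_z, with ERR_z computed on subtopic z's judgments."""
    return sum(
        pzq * err([judgments.get(z, {}).get(d, 0) for d in ranking[:cutoff]], max_grade)
        for z, pzq in p_z_q.items()
    )


# Hypothetical binary judgments: "a" is relevant to subtopic x, "b" to subtopic y.
print(err_ia(["a", "b"], {"x": {"a": 1}, "y": {"b": 1}}, {"x": 0.5, "y": 0.5}))  # 0.375
```

The "informal" tuning in the table then amounts to evaluating each diversifier for λ in {0.0, 0.1, …, 1.0} and keeping the value with the highest mean ERR-IA over the training queries.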


Experiments
   Recommendation diversity
          Dataset 1: MovieLens 1M
                 Collection: 6K users, 4K movies, 1M ratings
                 Subtopic set: 10 movie genres
          Dataset 2: Last.fm crawl
                 Collection: 1K users, 175K artists, 20M playcounts
                 Subtopic set: 120K social tags on artists by Last.fm users
          Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011)
                 Queries → users
                 Documents → items (movies, music artists)
                 Subtopics → item features (genres, tags)
                 Relevance judgments → test ratings from data split
          Baseline rankings:
                 a) pLSA
                 b) Popularity-based recommendation
          Diversified top n: 100
          Specific parameter estimates:
                 $p(z \mid q)$: uniform
                 $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
                 $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
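Under this adaptation, the uniform $p(z \mid d)$ from a binary item/feature association spreads each item's probability mass evenly over its genres or tags. A minimal sketch; the function name and the set-based input layout are hypothetical.

```python
from typing import Dict, Set

def uniform_p_z_d(item_features: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """p(z|d) = 1/|F(d)| for every feature z in F(d), the features associated with item d.

    Items with no associated feature get an empty distribution.
    """
    p_z_d: Dict[str, Dict[str, float]] = {}
    for item, feats in item_features.items():
        p_z_d[item] = {z: 1.0 / len(feats) for z in feats} if feats else {}
    return p_z_d


# Hypothetical genre assignments for three movies.
print(uniform_p_z_d({"m1": {"Comedy", "Drama"}, "m2": {"Horror"}, "m3": set()}))
```

Items without any feature contribute nothing to the aspect term of the objective, so how such items are handled (dropped, or given a catch-all aspect) is a design choice the sketch leaves open.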



Experiments – Recommendation diversity on MovieLens and Last.fm

[Figure: ERR-IA as a function of λ on MovieLens 1M and Last.fm, for the pLSA recommender and for recommendation by item popularity, comparing the objective based on $p(r \mid d, q, z)$ with the one based on $p(d \mid q, z)$]

Adjustable tolerance to redundancy

      Generalization of the relevance-based diversification scheme
      Formally support adjustable redundancy penalization
      Approach: generalize relevance to a browsing model

$$\varphi(d, S \mid q) = (1-\lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathit{stop}_S \mid q) = \cdots = (1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \left(1 - p(r \mid d', z, q)\, p(\mathit{stop} \mid r)\right)$$

      Adjustable redundancy tolerance parameter $p(\mathit{stop} \mid r) \in [0, 1]$
                 – High $p(\mathit{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
                 – In this view, the original formulations would implicitly assume $p(\mathit{stop} \mid r) = 1$,
                    i.e. a single relevant document is sought
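The redundancy tolerance parameter only changes the penalty factor inside the product. A minimal Python sketch of the generalized objective, with hypothetical names and dict inputs as in the relevance-based scheme; with `p_stop = 1` it reduces to relevance-based xQuAD, and with `p_stop = 0` redundancy is not penalized at all.

```python
import math
from typing import Dict, List

def rel_xquad_stop_phi(d: str, S: List[str], lam: float, p_stop: float,
                       p_z_q: Dict[str, float], p_r_dq: Dict[str, float],
                       p_r_dqz: Dict[str, Dict[str, float]]) -> float:
    """(1-lam) p(r|d,q) + lam sum_z p(z|q) p(r|d,z,q) prod_{d' in S} (1 - p(r|d',z,q) p(stop|r))."""
    if not 0.0 <= p_stop <= 1.0:
        raise ValueError("p_stop must lie in [0, 1]")
    div = 0.0
    for z, pzq in p_z_q.items():
        # Chance that the user has not already stopped at some d' in S for aspect z.
        not_stopped = math.prod(1.0 - p_r_dqz[z].get(s, 0.0) * p_stop for s in S)
        div += pzq * p_r_dqz[z].get(d, 0.0) * not_stopped
    return (1.0 - lam) * p_r_dq[d] + lam * div


# Toy inputs (hypothetical): one aspect, "a" already selected, "b" is the candidate.
p_z_q = {"x": 1.0}
p_r_dq = {"a": 0.8, "b": 0.6}
p_r_dqz = {"x": {"a": 0.8, "b": 0.6}}

for p_stop in (0.0, 0.5, 1.0):
    phi = rel_xquad_stop_phi("b", ["a"], 1.0, p_stop, p_z_q, p_r_dq, p_r_dqz)
    print(p_stop, round(phi, 2))  # phi is 0.6, then 0.36, then 0.12
```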


Adjustable tolerance to redundancy

                   Empirical observation: $p(\mathit{stop} \mid r)$ vs. λ in α-nDCG

[Figure: two panels, Search task (Lemur on TREC / Subtopics) and Recommendation task (pLSA on MovieLens / Genres), with $p(\mathit{stop} \mid r)$ (0 to 1) plotted against λ (0 to 1); for each λ, the best and the worst α-nDCG value of the column are marked]

Conclusion

    Alternative, relevance-based formulation of greedy aspect-based diversification
                 – Unifies two previous aspect-based algorithms (IA-Select and xQuAD)

                 – More literal expression of the formal problem statement (and possibly of the metrics)

    $p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
                 – Literal value estimates needed (rather than rank-equivalent approximations)

                 – Estimate based on positional relevance (relevance or click data needed)

    Seems to perform well empirically

                 – Light requirements on relevance or click data for training positional relevance

                 – Improvement trend, but needs to be tested under further optimizations

    Formal support for redundancy tolerance adjustment
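The conclusion's point that literal probability values (rather than rank-equivalent scores) are needed, estimated through positional relevance, can be sketched as follows. This follows one plausible reading of "P@k estimates with relevance judgments": $p(r \mid k)$ is the fraction of training queries whose $k$-th baseline result is judged relevant, and $p(r \mid d, q)$ is then read off at the rank of $d$. The function and variable names are hypothetical; smoothing, cross-validation folds and the click-based variant are not shown.

```python
from typing import Dict, List, Sequence, Set


def positional_relevance(runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]],
                         depth: int) -> List[float]:
    """Estimate p(r | k), k = 1..depth, as the fraction of training queries
    whose document at rank k is judged relevant."""
    hits = [0] * depth
    counts = [0] * depth
    for query, ranking in runs.items():
        relevant = qrels.get(query, set())
        for i, doc in enumerate(ranking[:depth]):
            counts[i] += 1
            if doc in relevant:
                hits[i] += 1
    return [h / c if c else 0.0 for h, c in zip(hits, counts)]


def relevance_by_rank(ranking: Sequence[str], p_r_k: Sequence[float]) -> Dict[str, float]:
    """Map each document to p(r | d, q) ~ p(r | rank(d, q)); ranks beyond the estimate get 0."""
    return {doc: (p_r_k[i] if i < len(p_r_k) else 0.0) for i, doc in enumerate(ranking)}


# Made-up training data: baseline runs and relevance judgments for two queries.
runs = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
qrels = {"q1": {"a", "c"}, "q2": {"x"}}
p_r_k = positional_relevance(runs, qrels, depth=3)
print(p_r_k)  # [1.0, 0.0, 0.5]
print(relevance_by_rank(["m", "n", "o", "p"], p_r_k))
```

Because the output is a probability rather than a score, it can be plugged directly into the products of the objective; a rank-preserving rescaling of these values would change the $1 - p(r \mid d', q, z)$ factors and could therefore change the greedy choices.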


SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification
