SlideShare uma empresa Scribd logo
1 de 30
Baixar para ler offline
Efficient Query Suggestions in the Long Tail
                Joint work:
                R. Perego, F. Silvestri, H. Vahabi, R. Venturini, HPC Lab, Italy
                F. Bonchi, Yahoo! Research, Spain




Wednesday, August 24, 2011
Query suggestion practices

                   • Use of the Wisdom of the Crowd mined
                             from Query Logs to recommend related
                             queries that are likely to better specify the
                             information need of the user
                              • shorten length of user sessions
                              • enhance perceived QoE

Wednesday, August 24, 2011
Queries in the Head




Wednesday, August 24, 2011
Queries in the Head




Wednesday, August 24, 2011
Queries in the Head




Wednesday, August 24, 2011
Queries in the Long Tail




Wednesday, August 24, 2011
Queries in the Long Tail
                             ?




Wednesday, August 24, 2011
Queries in the Long Tail
                             ?




                       ?
Wednesday, August 24, 2011
Queries in the Long Tail
                                      ?




                             Rare and never-seen



                       ?
                             queries account for
                              more than 50% of
                                 the traffic!



Wednesday, August 24, 2011
Open issues
                                    • Sparsity of models:
                                      • query assistance services perform
                                          poorly or are not even triggered
                                          on long-tail queries
                                    •   Performance:
                       Popularity




                                        • on-line process going in parallel
                                          with query answering



                                               Queries ordered by popularity




Wednesday, August 24, 2011
SoA: Query Flow Graph

     •      Query-centric approach
     •      Suggest queries by
            computing Random Walks
            with Restarts (RWRs) on
            the query-flow graph
            (QFG) by starting from
            the current user query


P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S.Vigna: The query-flow graph: model and applications. CIKM 2008: 609-618
       P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S.Vigna: Query suggestions using query-flow graphs. WSCD, 2009
Wednesday, August 24, 2011
Query-centric suggestions
                        Computing RWRs on a huge graph, e.g., built
                        from a QL recording 580,797,850 queries
                        (from Y! us):
                             • |V|     28,763,637
                             • |E|      56,250,874




Wednesday, August 24, 2011
Query-centric suggestions
                        Computing RWRs on a huge graph, e.g., built
                        from a QL recording 580,797,850 queries
                        (from Y! us):
                             • |V|            28,763,637
                             • |E|            56,250,874

                             • |{q: f(q)=1}| 162,221,967 (28%)

Wednesday, August 24, 2011
Term-centric opportunities
                        But, in the same Y! QL:
                             • queries             580,797,850
                             • Term occurrences   1,343,988,549




Wednesday, August 24, 2011
Term-centric opportunities
                        But, in the same Y! QL:
                             • queries             580,797,850
                             • Term occurrences   1,343,988,549

                             • |{t: f(t)=1}|         5,099,145 (0.04%)




Wednesday, August 24, 2011
The TQGraph
                                                                 *#%+,)
                                                    "#%&'("')&                      /#)(
                                     %-!&.'"#

                             !"##


                                                                      "#%&'("')&
                                       !"##$                          /#)($*#%+,)
                                    "#%&'("')&
                                      *#%+,)$
                                     %-!&.'"#
                                                 !"#$%&'




                                                                                           Term nodes are added to the
                                                                                           QFG which have only outgoing
                                                                                           links pointing at the query
                                                                                           nodes corresponding to the
                                                                                           queries in which the terms
                                                                                           occur.

Wednesday, August 24, 2011
TQGraph model
                   Suggestions for an incoming query q composed of
                   terms {t1,...tm} ⊆ T are generated from G by
                   extracting the center-piece subgraph starting from
                   the nodes corresponding to terms t1,...,tm
                   •         perform m Random Walks with Restart from
                             each one of the m term nodes corresponding to
                             terms in q
                   •         multiply component-wise the resulting m
                             stationary distributions


Wednesday, August 24, 2011
fro
                        100 queries on Yahoo!   useful   somewhat   not useful   M
                        α = 0.9                  48%       11%        41%        re

                             TQG effectiveness
                        α = 0.5
                        α = 0.1
                                                 41%
                                                 37%
                                                           20%
                                                           20%
                                                                      39%
                                                                      43%

                  Table 2: Effectiveness of TQGraph-based recommen-
          •      User study results comparing TQG and QFGand query
                  dations on the two different set of queries effectiveness
                 for two different testbeds (Y! US and MSN QLs).
                  logs, by varying the restart parameter α.

                        TREC on MSN             useful   somewhat   not useful
                        TQGraph α = 0.9          57%       16%        27%
                        QFG                      50%        9%        42%        sid
                                                                                 is
                        100 queries on Yahoo!   useful   somewhat   not useful   sit
                        TQGraph α = 0.9          48%       11%        41%        qu
                        QFG                      23%       10%        67%        lem
                                                                                 in
                                                                                 m
                    Table 3: User study results comparing effectiveness           th
                    of our method with the baseline for the two different
                    testbeds.                                                    qu
Wednesday, August 24, 2011
queries on Yahoo! useful somewhat not useful                 MSN query
    ing to outperform the effectiveness of QFG. The fact that log. The query is “lower heart rate”. Below we
 0.9                          48%     11%       41%           report the top 5 recommendations.
 0.5
    we, instead, achieve such a remarkable result is, actually, an
                              41%     20%       39%
    additional benefit of TQGraph-based 43%      methods.

                  Effectiveness on rare queries
 0.1                          37%     20%                             Query: lower heart rate
    Anecdotal evidence. We next show a few examplesSuggested Query      of                                     Score
    query recommendations. We start from queries that have            things to lower heart rate             2.9 e−14
e 2: Effectiveness of TQGraph-based recommen-
                                                                      lower heart rate through exercise      2.6 e−14
 nsnever been observed in the query log, i.e., the most difficult
     on the two different set of queries and query
                                                                      accelerated heart rate and pregnant 2.9 e−15
 by varying the restart parameter α. is one among the eight
    cases. The first query that we show

          •
                                                                      web md                                 2.0 e−16
     on MSN    Anecdotal evidence
    from the TREC testbed that do not appear at all in the problems
ECMSN query log. useful query is “lower heart rate”. Below we
                              The somewhat not useful
                                                                      heart                                  8.0 e−17

Graph α = 0.9 top 5 57%
    report the                recommendations. 27%
                                      16%                        We can observe that all the top-5 suggestions can be con-
                              50%      9%       42%           sidered pertinent to the initial topic. Moreover, even if this
             Query: lower heart rate                          is not an objective in this paper, they present some diver-
 queries on Yahoo! useful somewhat not useful
             Suggested Query                              Score the first two are how-to queries, while the last three are
                                                              sity:
Graph α = things to lower heart rate
              0.9             48%     11%       41%           queries related to finding information w.r.t. possible prob-
                                                        2.9 e−14
                              23%     10%
             lower heart rate through exercise  67%
                                                        2.6 e−14            Query not occurring
                                                              lems (with one very specific for pregnant women). The most
                                                              interesting recommendation is probably “web md”, which
e 3: Userweb mdresults comparing effectiveness
               study
                                                             −15
                                                                             in the training log
             accelerated heart rate and pregnant 2.9 e makes perfect sense4 , and has a large edit distance from
                                                        2.0 e−16 original query.
                                                              the
r method with the baseline for the two different
             heart problems                             8.0 e−17The next query we present is a rare (i.e., rarely appearing
 eds.                                                         query): “dog heat”; which appears only twice in the MSN
                                                              query log.
eas, we pair TREC queries up with the model built on can be con-
       We can observe that all the top-5 suggestions
 SN query log. In fact,to the initial topic. Moreover, even if this
    sidered pertinent the period from which TREC                  Query: dog heat
  come, is an objectiveperiod in paper, MSN queries some diver-
    is not     close to the in this which they present
 ubmitted. first two are how-to queries, while the last three are  Suggested Query                                  Score
    sity: the                                                     heat cycle dog pads                            4.3 e−10
 generated the top-5 recommendations for each query
    queries related to finding information w.r.t. possible what happens when female dog is
                                                                  prob-
            Query occurring twice
 ng both the QFG and the TQGraph with different pa-
    lems (with one very specific for pregnant women). The most in heat & a male dog is around
 ers setting. Using a web interface each assessor was                                                            4.0 e−10
              in the training log
nted a random query followed by the list of all the“web md”, boxer dog in heat
    interesting recommendation is probably dif-
                            sense4 , Recommendations were
                                                                  which
    makes perfect produced.and has a large edit distancedog in heat symptoms
  recommendations                                                   from
                                                                                                               3.99 e−10
                                                                                                               3.98 e−10
    the original query.
nted shuffled, in order for the assessor to not be able to          behavior of a male dog
        which system produced them. We give (i.e., rarely appearing around a female dog in heat
guishThe next query we present is a rare assessors                                                             3.95 e−10
ossibility to “dog heat”; search engine results for the in the MSN
    query): observe the which appears only twice
 al query and the recommended query that was being
    query log.                                                   As in the previous example, the top-5 suggestions are
ated. The assessor was asked to rate a recommendation         qualitatively good and present some diversity. Also, the
 one of the following scores: useful, somewhat useful,
   Wednesday, August 24, 2011                                 TQGraph-based method returns long queries, thus likely to
TQG pros

                   • provide query suggestions of quality
                             comparable/better than QFG even for rare
                             and unique queries
                   • several possible optimizations for achieving


Wednesday, August 24, 2011
TQG pros

                   • provide query suggestions of quality
                             comparable/better than QFG even for rare
                             and unique queries
                   • several possible optimizations for achieving
                                   an efficient on-line query
                                   recommendation service

Wednesday, August 24, 2011
Indexing precomputed suggestions
                                                  !"#$%&'(




                                                                       012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7%
                                                                       ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"%
                                                                       @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F%
                                                                       4-73#-G@9:1%28D@"7C%




                                                             )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5(
                                                             !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;(


                   •         recommendations for an incoming query+ are computed -by
                               !"#$%&%     '&()*%        ')()+*%      ') (),*%       ') ().-/&**%
                             processing the posting lists associated with the terms in the query
                             90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5(
                             %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@(




Wednesday, August 24, 2011
Indexing precomputed suggestions
                                                  !"#$%&'(




                                                                       012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7%
                                                                       ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"%
                                                                       @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F%
                                                                       4-73#-G@9:1%28D@"7C%




                                                             )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5(
                                                             !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;(


                   •         recommendations for an incoming query+ are computed -by
                               !"#$%&%     '&()*%        ')()+*%      ') (),*%       ') ().-/&**%
                             processing the posting lists associated with the terms in the query
                             90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5(
                                 :)      O(|T|) posting lists
                             %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@(


                                 :(      O(|Q|) length of each posting list
Wednesday, August 24, 2011
Pruning posting lists
                    • sort postings by probability and prune them
                              at a reasonable threshold p, e.g. 20,000
me quality of those

Graph-based recom-
lity those produced
 ar, TQGraph-based
 very large fraction
 as we shall present
QGraph can be pre-
 d list”-based repre-
very fast generation


PH
 ved online, a query
ciently, possibly in
n we introduce some
eneration of recom-
r, we show that the
 Wednesday, August 24, 2011
Pruning posting lists
                    • sort postings by probability and prune them
                              at a reasonable threshold p, e.g. 20,000
me quality of those

Graph-based recom-
lity those produced
 ar, TQGraph-based
 very large fraction
 as we shall present
QGraph can be pre-
 d list”-based repre-
very fast generation


PH
 ved online, a query
          O(|T|) lists, each of size O(p) and no loss in quality!
ciently, possibly in
n we introduce some
eneration of recom-
r, we show that the
 Wednesday, August 24, 2011
!"#$%&'(




                   Bucketing probabilities                      012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7%


                 • Most space used for storing probabilities
                                                                ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"%
                                                                @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F%
                                                                4-73#-G@9:1%28D@"7C%


                 • Given ε < 1, we can arrange postings in
                         buckets implicitly coding the approximate
                         probabilities )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5(
                                       !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;(



                             !"#$%&%       '&()*%             ')()+*%            ')+(),*%            ')-().-/&**%

                       90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5(
                       %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@(




Wednesday, August 24, 2011
!"#$%&'(




                   Bucketing probabilities                      012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7%


                 • Most space used for storing probabilities
                                                                ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"%
                                                                @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F%
                                                                4-73#-G@9:1%28D@"7C%


                 • Given ε < 1, we can arrange postings in
                         buckets implicitly coding the approximate
                         probabilities )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5(
                                       !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;(



                             !"#$%&%       '&()*%             ')()+*%            ')+(),*%            ')-().-/&**%

                       90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5(
                       %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@(



                                • Each entry coded with a few bits, e.g., 11-19 bits
                                • ~5x reduction!
                                • no loss in quality!
Wednesday, August 24, 2011
Caching posting lists
ided
both
            • achieving in-memory query suggestion
 are



d us-
e lost
 ever,
sults.
qual-
en in
hat a
ed by
make
 iffer-
which                Figure 4: Miss ratio of our cache as a function of
t Wednesday, August 24, 2011
   our
Conclusions
                   •         TQG model to overcome limitations of current query
                             recommenders
                         •      based on a principled, term-centric approach supporting rare and
                                never-seen queries
                   •         deployment with a efficient inverted index resulting in effectiveness
                             comparable/better to SoA approaches
                   •         the pruning, bucketing, caching techniques proposed constitute a
                             independent contribution in the area of efficiency in large scale RWR
                             computations
                         •      reduction of about 80% in the space occupancy w.r.t.
                                uncompressed data structures
                         •      in-memory RWRs on huge graphs with 90+ % hit-ratio cache



Wednesday, August 24, 2011
Вопросы?
                       Questions?
Wednesday, August 24, 2011

Mais conteúdo relacionado

Semelhante a Efficient Query Suggestions in the Long Tail

Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingLionel Briand
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryIoannis Stavrakantonakis
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...Jeff Z. Pan
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsXiaoyu Wang
 
An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesDavid Yonge-Mallo
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitmVinu Ev
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.pptaashnareddy1
 
A Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextA Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextSaiful Khan
 

Semelhante a Efficient Query Suggestions in the Long Tail (11)

Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"
 
Discussants
DiscussantsDiscussants
Discussants
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms Discovery
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and Patterns
 
An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming Languages
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitm
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.ppt
 
sa.ppt
sa.pptsa.ppt
sa.ppt
 
A Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextA Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual Context
 

Mais de yaevents

Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...yaevents
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигмаyaevents
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндексyaevents
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareyaevents
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...yaevents
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluationyaevents
 
Ben Carterett — Advances in Information Retrieval Evaluation
Ben Carterett — Advances in Information Retrieval EvaluationBen Carterett — Advances in Information Retrieval Evaluation
Ben Carterett — Advances in Information Retrieval Evaluationyaevents
 

Mais de yaevents (20)

Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндекс
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндекс
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmann
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Google
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигма
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндекс
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-aware
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
 
Ben Carterett — Advances in Information Retrieval Evaluation
Ben Carterett — Advances in Information Retrieval EvaluationBen Carterett — Advances in Information Retrieval Evaluation
Ben Carterett — Advances in Information Retrieval Evaluation
 

Último

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Último (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Efficient Query Suggestions in the Long Tail

  • 1. Efficient Query Suggestions in the Long Tail Joint work: R. Perego, F. Silvestri, H. Vahabi, R. Venturini, HPC Lab, Italy F. Bonchi, Yahoo! Research, Spain Wednesday, August 24, 2011
  • 2. Query suggestion practices • Use of the Wisdom of the Crowd mined from Query Logs to recommend related queries that are likely to better specify the information need of the user • shorten length of user sessions • enhance perceived QoE Wednesday, August 24, 2011
  • 3. Queries in the Head Wednesday, August 24, 2011
  • 4. Queries in the Head Wednesday, August 24, 2011
  • 5. Queries in the Head Wednesday, August 24, 2011
  • 6. Queries in the Long Tail Wednesday, August 24, 2011
  • 7. Queries in the Long Tail ? Wednesday, August 24, 2011
  • 8. Queries in the Long Tail ? ? Wednesday, August 24, 2011
  • 9. Queries in the Long Tail ? Rare and never-seen ? queries account for more than 50% of the traffic! Wednesday, August 24, 2011
  • 10. Open issues • Sparsity of models: • query assistance services perform poorly or are not even triggered on long-tail queries • Performance: Popularity • on-line process going in parallel with query answering Queries ordered by popularity Wednesday, August 24, 2011
  • 11. SoA: Query Flow Graph • Query-centric approach • Suggest queries by computing Random Walks with Restarts (RWRs) on the query-flow graph (QFG) by starting from the current user query P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S.Vigna: The query-flow graph: model and applications. CIKM 2008: 609-618 P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S.Vigna: Query suggestions using query-flow graphs. WSCD, 2009 Wednesday, August 24, 2011
  • 12. Query-centric suggestions Computing RWRs on a huge graph, e.g., built from a QL recording 580,797,850 queries (from Y! us): • |V| 28,763,637 • |E| 56,250,874 Wednesday, August 24, 2011
  • 13. Query-centric suggestions Computing RWRs on a huge graph, e.g., built from a QL recording 580,797,850 queries (from Y! us): • |V| 28,763,637 • |E| 56,250,874 • |{q: f(q)=1}| 162,221,967 (28%) Wednesday, August 24, 2011
  • 14. Term-centric opportunities But, in the same Y! QL: • queries 580,797,850 • Term occurrences 1,343,988,549 Wednesday, August 24, 2011
  • 15. Term-centric opportunities But, in the same Y! QL: • queries 580,797,850 • Term occurrences 1,343,988,549 • |{t: f(t)=1}| 5,099,145 (0.04%) Wednesday, August 24, 2011
  • 16. The TQGraph *#%+,) "#%&'("')& /#)( %-!&.'"# !"## "#%&'("')& !"##$ /#)($*#%+,) "#%&'("')& *#%+,)$ %-!&.'"# !"#$%&' Term nodes are added to the QFG which have only outgoing links pointing at the query nodes corresponding to the queries in which the terms occur. Wednesday, August 24, 2011
  • 17. TQGraph model Suggestions for an incoming query q composed of terms {t1,...tm} ⊆ T are generated from G by extracting the center-piece subgraph starting from the nodes corresponding to terms t1,...,tm • perform m Random Walks with Restart from each one of the m term nodes corresponding to terms in q • multiply component-wise the resulting m stationary distributions Wednesday, August 24, 2011
  • 18. fro 100 queries on Yahoo! useful somewhat not useful M α = 0.9 48% 11% 41% re TQG effectiveness α = 0.5 α = 0.1 41% 37% 20% 20% 39% 43% Table 2: Effectiveness of TQGraph-based recommen- • User study results comparing TQG and QFGand query dations on the two different set of queries effectiveness for two different testbeds (Y! US and MSN QLs). logs, by varying the restart parameter α. TREC on MSN useful somewhat not useful TQGraph α = 0.9 57% 16% 27% QFG 50% 9% 42% sid is 100 queries on Yahoo! useful somewhat not useful sit TQGraph α = 0.9 48% 11% 41% qu QFG 23% 10% 67% lem in m Table 3: User study results comparing effectiveness th of our method with the baseline for the two different testbeds. qu Wednesday, August 24, 2011
  • 19. queries on Yahoo! useful somewhat not useful MSN query ing to outperform the effectiveness of QFG. The fact that log. The query is “lower heart rate”. Below we 0.9 48% 11% 41% report the top 5 recommendations. 0.5 we, instead, achieve such a remarkable result is, actually, an 41% 20% 39% additional benefit of TQGraph-based 43% methods. Effectiveness on rare queries 0.1 37% 20% Query: lower heart rate Anecdotal evidence. We next show a few examplesSuggested Query of Score query recommendations. We start from queries that have things to lower heart rate 2.9 e−14 e 2: Effectiveness of TQGraph-based recommen- lower heart rate through exercise 2.6 e−14 nsnever been observed in the query log, i.e., the most difficult on the two different set of queries and query accelerated heart rate and pregnant 2.9 e−15 by varying the restart parameter α. is one among the eight cases. The first query that we show • web md 2.0 e−16 on MSN Anecdotal evidence from the TREC testbed that do not appear at all in the problems ECMSN query log. useful query is “lower heart rate”. Below we The somewhat not useful heart 8.0 e−17 Graph α = 0.9 top 5 57% report the recommendations. 27% 16% We can observe that all the top-5 suggestions can be con- 50% 9% 42% sidered pertinent to the initial topic. Moreover, even if this Query: lower heart rate is not an objective in this paper, they present some diver- queries on Yahoo! useful somewhat not useful Suggested Query Score the first two are how-to queries, while the last three are sity: Graph α = things to lower heart rate 0.9 48% 11% 41% queries related to finding information w.r.t. possible prob- 2.9 e−14 23% 10% lower heart rate through exercise 67% 2.6 e−14 Query not occurring lems (with one very specific for pregnant women). The most interesting recommendation is probably “web md”, which e 3: Userweb mdresults comparing effectiveness study −15 in the training log accelerated heart rate and pregnant 2.9 e makes perfect sense4 , and has a large edit distance from 2.0 e−16 original query. the r method with the baseline for the two different heart problems 8.0 e−17The next query we present is a rare (i.e., rarely appearing eds. query): “dog heat”; which appears only twice in the MSN query log. eas, we pair TREC queries up with the model built on can be con- We can observe that all the top-5 suggestions SN query log. In fact,to the initial topic. Moreover, even if this sidered pertinent the period from which TREC Query: dog heat come, is an objectiveperiod in paper, MSN queries some diver- is not close to the in this which they present ubmitted. first two are how-to queries, while the last three are Suggested Query Score sity: the heat cycle dog pads 4.3 e−10 generated the top-5 recommendations for each query queries related to finding information w.r.t. possible what happens when female dog is prob- Query occurring twice ng both the QFG and the TQGraph with different pa- lems (with one very specific for pregnant women). The most in heat & a male dog is around ers setting. Using a web interface each assessor was 4.0 e−10 in the training log nted a random query followed by the list of all the“web md”, boxer dog in heat interesting recommendation is probably dif- sense4 , Recommendations were which makes perfect produced.and has a large edit distancedog in heat symptoms recommendations from 3.99 e−10 3.98 e−10 the original query. nted shuffled, in order for the assessor to not be able to behavior of a male dog which system produced them. We give (i.e., rarely appearing around a female dog in heat guishThe next query we present is a rare assessors 3.95 e−10 ossibility to “dog heat”; search engine results for the in the MSN query): observe the which appears only twice al query and the recommended query that was being query log. As in the previous example, the top-5 suggestions are ated. The assessor was asked to rate a recommendation qualitatively good and present some diversity. Also, the one of the following scores: useful, somewhat useful, Wednesday, August 24, 2011 TQGraph-based method returns long queries, thus likely to
  • 20. TQG pros • provide query suggestions of quality comparable/better than QFG even for rare and unique queries • several possible optimizations for achieving Wednesday, August 24, 2011
  • 21. TQG pros • provide query suggestions of quality comparable/better than QFG even for rare and unique queries • several possible optimizations for achieving an efficient on-line query recommendation service Wednesday, August 24, 2011
  • 22. Indexing precomputed suggestions !"#$%&'( 012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7% ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"% @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F% 4-73#-G@9:1%28D@"7C% )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5( !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;( • recommendations for an incoming query+ are computed -by !"#$%&% '&()*% ')()+*% ') (),*% ') ().-/&**% processing the posting lists associated with the terms in the query 90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5( %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@( Wednesday, August 24, 2011
  • 23. Indexing precomputed suggestions !"#$%&'( 012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7% ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"% @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F% 4-73#-G@9:1%28D@"7C% )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5( !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;( • recommendations for an incoming query+ are computed -by !"#$%&% '&()*% ')()+*% ') (),*% ') ().-/&**% processing the posting lists associated with the terms in the query 90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5( :) O(|T|) posting lists %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@( :( O(|Q|) length of each posting list Wednesday, August 24, 2011
  • 24. Pruning posting lists • sort postings by probability and prune them at a reasonable threshold p, e.g. 20,000 me quality of those Graph-based recom- lity those produced ar, TQGraph-based very large fraction as we shall present QGraph can be pre- d list”-based repre- very fast generation PH ved online, a query ciently, possibly in n we introduce some eneration of recom- r, we show that the Wednesday, August 24, 2011
  • 25. Pruning posting lists • sort postings by probability and prune them at a reasonable threshold p, e.g. 20,000 me quality of those Graph-based recom- lity those produced ar, TQGraph-based very large fraction as we shall present QGraph can be pre- d list”-based repre- very fast generation PH ved online, a query O(|T|) lists, each of size O(p) and no loss in quality! ciently, possibly in n we introduce some eneration of recom- r, we show that the Wednesday, August 24, 2011
  • 26. !"#$%&'( Bucketing probabilities 012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7% • Most space used for storing probabilities ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"% @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F% 4-73#-G@9:1%28D@"7C% • Given ε < 1, we can arrange postings in buckets implicitly coding the approximate probabilities )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5( !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;( !"#$%&% '&()*% ')()+*% ')+(),*% ')-().-/&**% 90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5( %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@( Wednesday, August 24, 2011
  • 27. !"#$%&'( Bucketing probabilities 012"#3"4%014"5%#"6#"7"1389:1%:;%3<"%=>=7% • Most space used for storing probabilities ?:$6@3"4%:1%3<"%!AB#86<C%!<"%D"5-?:1%-7%$84"% @6%:;%3"#$%1:4"7(%6:791E7%8#"%3<"%7389:18#F% 4-73#-G@9:1%28D@"7C% • Given ε < 1, we can arrange postings in buckets implicitly coding the approximate probabilities )*%+,-%$.(/01*$023+,-(,4("35$.(6,751(0-(*'5( !"#$%&'(%1(,2*%0-57(2.(%(898(4$,:(!5$:(;( !"#$%&% '&()*% ')()+*% ')+(),*% ')-().-/&**% 90*'0-(23<=5*1(>35$051(%$5(1,$*57(2.(*'50$(?/1@()<,$51(%$5( %&&$,A0:%*57(2.(*'5(B$5%*51*(2,3-7C(0@5@(D0(4,$(%EE(0(F(G@( • Each entry coded with a few bits, e.g., 11-19 bits • ~5x reduction! • no loss in quality! Wednesday, August 24, 2011
  • 28. Caching posting lists ided both • achieving in-memory query suggestion are d us- e lost ever, sults. qual- en in hat a ed by make iffer- which Figure 4: Miss ratio of our cache as a function of t Wednesday, August 24, 2011 our
  • 29. Conclusions • TQG model to overcome limitations of current query recommenders • based on a principled, term-centric approach supporting rare and never-seen queries • deployment with a efficient inverted index resulting in effectiveness comparable/better to SoA approaches • the pruning, bucketing, caching techniques proposed constitute a independent contribution in the area of efficiency in large scale RWR computations • reduction of about 80% in the space occupancy w.r.t. uncompressed data structures • in-memory RWRs on huge graphs with 90+ % hit-ratio cache Wednesday, August 24, 2011
  • 30. Вопросы? Questions? Wednesday, August 24, 2011