SlideShare uma empresa Scribd logo
1 de 2
Baixar para ler offline
Searching for Interestingness
                                    in Wikipedia and Yahoo! Answers

                         Yelena Mejova1 Ilaria Bordino2 Mounia Lalmas3 Aristides Gionis4
                                {1,2,3}                                                      4
                                          Yahoo! Research Barcelona, Spain                       Aalto University, Finland
                                   {1 ymejova, 2 bordino, 3 mounia}@yahoo-inc.com                 4
                                                                                                      aristides.gionis@aalto.fi



ABSTRACT                                                                            2. ENTITY NETWORKS
In many cases, when browsing the Web, users are searching                              We extract entity networks from (i) a dump of the En-
for specific information. Sometimes, though, users are also                          glish Wikipedia from December 2011 consisting of 3 795 865
looking for something interesting, surprising, or entertain-                        articles, and (ii) a sample of the English Yahoo! Answers
ing. Serendipitous search puts interestingness on par with                          dataset from 2010/2011, containing 67 336 144 questions and
relevance. We investigate how interesting are the results one                       261 770 047 answers. We use state-of-the-art methods [3, 5]
can obtain via serendipitous search, and what makes them                            to extract entities from the documents in each dataset.
so, by comparing entity networks extracted from two promi-                             Next we draw an arc between any two entities e1 and e2
nent social media sites, Wikipedia and Yahoo! Answers.                              that co-occur in one or more documents. We assign the arc
                                                                                    a weight w1 (e1 , e2 ) = DF(e1 , e2 ) equal to the number of such
Categories and Subject Descriptors                                                  documents (the document frequency (DF) of the entity pair).
H.4 [Information Systems Applications]: Miscellaneous                                  This weighting scheme tends to favor popular entities. To
                                                                                    mitigate this effect, we measure the rarity of any entity e
Keywords                                                                            in a dataset by computing its inverse document frequency
Serendipity, Exploratory search                                                     IDF(e) = log(N )−log(DF(e)), where N is the size of the col-
                                                                                    lection, and DF(e) is the document frequency of entity e. We
1. INTRODUCTION                                                                     set a threshold on IDF to drop the arcs that involve the most
   Serendipitous search occurs when a user with no a priori                         popular entities. We also rescale the arc weights according
or totally unrelated intentions interacts with a system and                         to the alternative scheme w2 (e1 → e2 ) = DF(e1 , e2 )·IDF(e2 ).
acquires useful information [4]. A system supporting such                              We use Personalized PageRank (PPR) [1] to extract the
exploratory capabilities must provide results that are rele-                        top n entities related to a query entity. We consider two
vant to the user’s current interest, and yet interesting, to                        scoring methods. When using the w2 weighting scheme, we
encourage the user to continue the exploration.                                     simply use the PPR scores (we dub this method IDF). When
   In this work, we describe an entity-driven exploratory and                       using the simpler scheme w1 , we normalize the PPR scores
serendipitous search system, based on enriched entity net-                          by the global PageRank scores (with no personalization) to
works that are explored through random-walk computations                            penalize popular entities. We dub this method PN.
to retrieve search results for a given query entity. We extract                        We enrich our entity networks with metadata regard-
entity networks from two datasets, Wikipedia, a curated,                            ing sentiment and quality of the documents. Using Sen-
collaborative online encyclopedia, and Yahoo! Answers, a                            tiStrength1 , we extract sentiment scores for each document.
more unconstrained question/answering forum, where the                              We calculate attitude and sentimentality metrics [2] to mea-
freedom of conversation may present advantages such as                              sure polarity and strength of the sentiment. Regarding qual-
opinions, rumors, and social interest and approval.                                 ity, for Yahoo! Answers documents we count the number of
   We compare the networks extracted from the two media                             points assigned by the system to the users, as indication of
by performing user studies in which we juxtapose interest-                          expertise and thus good quality. For Wikipedia, we count
ingness of the results retrieved for a query entity, with rel-                      the number of dispute messages inserted by editors to require
evance. We investigate whether interestingness depends on                           revisions, as indication of bad quality. We derive sentiment
(i) the curated/uncurated nature of the dataset, and/or on                          and quality scores for any entity by averaging over all the
(ii) additional characteristics of the results, such as senti-                      documents in which the entity appears. We use Wikimedia2
ment, content quality, and popularity.                                              statistics to estimate the popularity of entities.

                                                                                    3. EXPLORATORY SEARCH
                                                                                      We test our system using a set of 37 queries originat-
Permission to make digital or hard copies of all or part of this work for           ing from 2010 and 2011 Google Zeitgeist (www.google.com/
personal or classroom use is granted without fee provided that copies are           zeitgeist) and having sufficient coverage in both datasets.
not made or distributed for profit or commercial advantage and that copies           Using one of the two algorithms – PN or IDF – we retrieve
bear this notice and the full citation on the first page. To copy otherwise, to
                                                                                    the top five entities from each dataset – YA or WP – for each
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.                                                            1
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.                            sentistrength.wlv.ac.uk
                                                                                    2
Copyright 2013 ACM 978-1-4503-2038-2/13/05 ...$15.00.                                   dumps.wikimedia.org/other/pagecounts-raw
Figure 1: Performance: (a) and (b) scale range from 1 to 4, (c) correlation range from 0 to 1
      4.0




                                                                   4.0
                     int. to query   int. to user   relevant                 diverse      frustrating    interesting    learn new              int. to query      int. to user   relevant




                                                                                                                                    0.8
      3.0




                                                                   3.0




                                                                                                                                    0.4
      2.0




                                                                   2.0
      1.0




                                                                   1.0




                                                                                                                                    0.0
              ya r   ya pn       ya idf     wp r    wp pn wp idf          ya r    ya pn    ya idf       wp r     wp pn wp idf             ya pn          ya idf          wp pn      wp idf
      (a) Average query/result pair labels                               (b) Average query set labels                               (c) Corr. with learn smth new



query. For comparison, we consider setups consisting of 5                                        query (0.214), and quality of the result entity (0.201). These
random entities. Note that unlike for conventional retrieval,                                    features point to important aspects of a retrieval strategy
a random baseline is feasible for a browsing task.                                               which would lead to a successful serendipitous search.
   We recruit four editors to annotate the retrieved results,                                                          Table 1: Retrieval result examples
asking them to evaluate each result entity for relevance, in-                                        YA query: Kim Kardashian       Attitude   Sentiment.            Quality         Pageviews
terestingness to the query, and interestingness regardless of                                        Perry Williams                    0           0                   0                85
                                                                                                     Eva Longoria Parker             −0.602       2.018                6         1 450 814
the query, with responses falling on scale from 1 to 4 (Fig-
                                                                                                     WP query: H1N1 pandemic        Attitude   Sentiment.            Quality          Pageviews
ure 1(a)). Both of our retrieval methods outperform the                                               Phaungbyin                       2           2                   1                706
random baseline (at p < 0.01). The gain in interestingness                                           2009 US flu pandemic               1           1                   1             21 981
to the user despite the query suggests that randomly viewed
information is not intrinsically interesting to the user.                                        4. DISCUSSION & CONCLUSION
   Whereas performance improves from PN to IDF for YA,                                              Beyond the aggregate measures of the previous section,
the interestingness to the user is hurt significantly (at p <                                     the peculiarities of Yahoo! Answers and Wikipedia as so-
0.01) for WP (the other measures remain statistically the                                        cial media present unique advantages and challenges for
same). Note that PN uses the weighting scheme w1 , while                                         serendipitous search. For example, Table 1 shows poten-
IDF operates on the networks sparsified and weighted ac-                                          tial search YA results for an American socialite Kim Kar-
cording to function w2 . The frequency-based approach ap-                                        dashian: an actress Eva Longoria Parker (whose Wikipedia
plied by IDF mediates the mentioning of popular entities in                                      page has over a million visits in two years), and a footballer
a non-curated dataset like YA, but it fails to capture the im-                                   Perry Williams (who played his last game in 1993). Note
portance of entities in a domain with restricted authorship.                                     the difference in attitude and sentimentality. Yahoo! An-
   Next we ask the editors to look at the five results as a                                       swers provides a wider spread of emotion. This data may be
whole, measuring diversity, frustration, interestingness, and                                    of use when searching for potentially serendipitous entities.
the ability of the user to learn something new about the                                            Table 1 also shows potential WP results for the query
query. Figure 1(b) shows that the two random runs are                                            H1N1 Pandemic: a town in Burma called Phaungbyin, and
highly diverse but provoke the most frustration. The most                                        2009 flu pandemic in the United States. We may expect
diverse and the least frustrating result sets are provided by                                    pandemic to be associated with negative sentiment, but the
the YA IDF run. The WP PN run also shows high diversity,                                         documents in Wikipedia do not display it.
but it falls with the IDF constraint. The YA IDF run gives                                          It is our intuition that the two datasets provide a comple-
better diversity and interesting scores at p < 0.01 than the                                     mentary view of the entities and their relations, and that a
WP IDF run, while performing statistically the same.                                             hybrid system exploiting both resources would provide the
   To examine the relationship with the serendipity level of                                     best user experience. We leave this for future work.
the content, we compute correlation between the learn some-
thing new label (LSN) and the others. Figure 1(c) shows                                          5. ACKNOWLEDGEMENTS
the LSN label to be the least correlated with interests of the                                     This work was partially funded by the European Union
user in the WP IDF run, and the most for the YA IDF run.                                         Linguistically Motivated Semantic Aggregation Engines
Especially in the WP IDF run, the relevance is highly asso-                                      (LiMoSINe) project3 .
ciated with the LSN label. We are witnessing two different                                        References
searching experiences: in the YA IDF setup the results are
                                                                                                 [1] G. Jeh and J. Widom. Scaling personalized web search. In
diverse and popular, whereas in the WP IDF setup the re-                                             WWW ’03, pages 271–279. ACM, 2003.
sults are less diverse, and the user may be less interested in                                   [2] O. Kucuktunc, B. Cambazoglu, I. Weber, and H. Ferhatos-
the relevant content, but it will be just as educational.                                            manoglu. A large-scale sentiment analysis for yahoo! answers.
   Finally we analyze the metadata collected for the entities                                        In WSDM ’12, pages 633–642. ACM, 2012.
in any query-result pair: Attitude (A), Sentimentality (S),                                      [3] D. Paranjpe. Learning document aboutness from implicit user
                                                                                                     feedback and document structure. In CIKM, 2009.
Quality (Q), Popularity (V), and Context (T). For each pair,
                                                                                                 [4] E. G. Toms. Serendipitous information retrieval. In DELOS,
we calculate the difference between query and result in these                                         2000.
dimensions. For Context we compute the cosine similarity                                         [5] Y. Zhou, L. Nie, O. Rouhani-Kalleh, F. Vasile, and S. Gaffney.
between the TF/IDF vectors of the entities. In aggregate,                                            Resolving surface forms to Wikipedia topics. In COLING,
the best connections are between result popularity and rel-                                          pages 1335–1343, 2010.
evance (0.234), as well as interestingness of the result to the                                  3
                                                                                                     www.limosine-project.eu
user (0.227), followed by contextual similarity of result and

Mais conteúdo relacionado

Mais procurados

Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...IRJET Journal
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)es712
 
A Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social SignalsA Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social SignalsIsmail BADACHE
 
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...ijnlc
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...Content Savvy
 

Mais procurados (6)

Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)
 
Df32676679
Df32676679Df32676679
Df32676679
 
A Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social SignalsA Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social Signals
 
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...
AN AFFECTIVE AWARE PSEUDO ASSOCIATION METHOD TO CONNECT DISJOINT USERS ACROSS...
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
 

Destaque (18)

Articulo mary-la mejor escuela para padres-2011
Articulo  mary-la mejor escuela para padres-2011Articulo  mary-la mejor escuela para padres-2011
Articulo mary-la mejor escuela para padres-2011
 
Bitacora 4
Bitacora 4Bitacora 4
Bitacora 4
 
Bitacora 5
Bitacora 5Bitacora 5
Bitacora 5
 
Ct impuestos nacionales ok impuesto a las ventas
Ct impuestos nacionales ok impuesto a las ventasCt impuestos nacionales ok impuesto a las ventas
Ct impuestos nacionales ok impuesto a las ventas
 
Bitacora 6
Bitacora 6Bitacora 6
Bitacora 6
 
Bitacora 8
Bitacora 8Bitacora 8
Bitacora 8
 
Plan de desarrollo profecional
Plan de desarrollo profecionalPlan de desarrollo profecional
Plan de desarrollo profecional
 
Agritourism - introduction
Agritourism - introductionAgritourism - introduction
Agritourism - introduction
 
Distributivo docentes 2013
Distributivo docentes 2013Distributivo docentes 2013
Distributivo docentes 2013
 
Digital strategy for pure michigan
Digital strategy for pure michiganDigital strategy for pure michigan
Digital strategy for pure michigan
 
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
 
Algoritmos diagrama-de-flujo
Algoritmos diagrama-de-flujoAlgoritmos diagrama-de-flujo
Algoritmos diagrama-de-flujo
 
Tourism and travel agency management - cambridge
Tourism and travel agency management - cambridgeTourism and travel agency management - cambridge
Tourism and travel agency management - cambridge
 
Distribucion horario 2013
Distribucion horario 2013Distribucion horario 2013
Distribucion horario 2013
 
Proyecto abp
Proyecto abpProyecto abp
Proyecto abp
 
Business of Inbound tour operators at a glance
Business of Inbound tour operators at a glanceBusiness of Inbound tour operators at a glance
Business of Inbound tour operators at a glance
 
Guia 3e 2013
Guia 3e 2013Guia 3e 2013
Guia 3e 2013
 
CRM - Customer Relationship Management
CRM - Customer Relationship ManagementCRM - Customer Relationship Management
CRM - Customer Relationship Management
 

Semelhante a Searching for Interestingness in Wikipedia and Yahoo Answers

Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)arcomem
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory acijjournal
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advancedarcomem
 
Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1iotest
 
Using transfer learning for video popularity prediction
Using transfer learning for video popularity predictionUsing transfer learning for video popularity prediction
Using transfer learning for video popularity predictioneSAT Publishing House
 
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAFLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAIJwest
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Cuong Tran Van
 
Cluster Based Web Search Using Support Vector Machine
Cluster Based Web Search Using Support Vector MachineCluster Based Web Search Using Support Vector Machine
Cluster Based Web Search Using Support Vector MachineCSCJournals
 
Clustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative StudyClustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative Studyijcsit
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection ModelIRJET Journal
 
Entity Linking Combining Open Source Annotators
Entity Linking Combining Open Source AnnotatorsEntity Linking Combining Open Source Annotators
Entity Linking Combining Open Source Annotatorspruiz_
 
Information Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteInformation Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteDeep Kayal
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsIOSR Journals
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine LearningIRJET Journal
 
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...IRJET Journal
 
Sas web 2010 lora-aroyo
Sas web 2010 lora-aroyoSas web 2010 lora-aroyo
Sas web 2010 lora-aroyoLora Aroyo
 
AELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking ApproachAELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking ApproachBianca Pereira
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseMicah Altman
 

Semelhante a Searching for Interestingness in Wikipedia and Yahoo Answers (20)

Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advanced
 
Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1
 
Using transfer learning for video popularity prediction
Using transfer learning for video popularity predictionUsing transfer learning for video popularity prediction
Using transfer learning for video popularity prediction
 
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAFLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
Cluster Based Web Search Using Support Vector Machine
Cluster Based Web Search Using Support Vector MachineCluster Based Web Search Using Support Vector Machine
Cluster Based Web Search Using Support Vector Machine
 
Clustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative StudyClustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative Study
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection Model
 
Entity Linking Combining Open Source Annotators
Entity Linking Combining Open Source AnnotatorsEntity Linking Combining Open Source Annotators
Entity Linking Combining Open Source Annotators
 
Information Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteInformation Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ Deloitte
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine Learning
 
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
 
Sas web 2010 lora-aroyo
Sas web 2010 lora-aroyoSas web 2010 lora-aroyo
Sas web 2010 lora-aroyo
 
AELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking ApproachAELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking Approach
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
eventdemo2016
eventdemo2016eventdemo2016
eventdemo2016
 

Mais de Gabriela Agustini

Como a cultura maker vai mudar o modo de produção global
Como a cultura maker vai mudar o modo de produção globalComo a cultura maker vai mudar o modo de produção global
Como a cultura maker vai mudar o modo de produção globalGabriela Agustini
 
Cidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociaisCidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociaisGabriela Agustini
 
Movimento Maker e Educação
Movimento Maker e EducaçãoMovimento Maker e Educação
Movimento Maker e EducaçãoGabriela Agustini
 
Diversidade cultural gilberto gil
Diversidade cultural gilberto gilDiversidade cultural gilberto gil
Diversidade cultural gilberto gilGabriela Agustini
 
Social Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and TechnologySocial Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and TechnologyGabriela Agustini
 
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?Gabriela Agustini
 
Makersfor Global Good Report
Makersfor Global Good ReportMakersfor Global Good Report
Makersfor Global Good ReportGabriela Agustini
 
Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17Gabriela Agustini
 
Pretalab- apresentação institucional
Pretalab- apresentação institucionalPretalab- apresentação institucional
Pretalab- apresentação institucionalGabriela Agustini
 
Cultura e tecnologia - aula2
Cultura e tecnologia - aula2Cultura e tecnologia - aula2
Cultura e tecnologia - aula2Gabriela Agustini
 
Cultura e tecnologia - aula1
Cultura e tecnologia - aula1Cultura e tecnologia - aula1
Cultura e tecnologia - aula1Gabriela Agustini
 
Global Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine GermanyGlobal Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine GermanyGabriela Agustini
 
Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos Gabriela Agustini
 
Makerspaces e hubs de inovação
Makerspaces e hubs de inovaçãoMakerspaces e hubs de inovação
Makerspaces e hubs de inovaçãoGabriela Agustini
 

Mais de Gabriela Agustini (20)

Como a cultura maker vai mudar o modo de produção global
Como a cultura maker vai mudar o modo de produção globalComo a cultura maker vai mudar o modo de produção global
Como a cultura maker vai mudar o modo de produção global
 
Cidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociaisCidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociais
 
Inovação digital
Inovação digital Inovação digital
Inovação digital
 
Movimento Maker e Educação
Movimento Maker e EducaçãoMovimento Maker e Educação
Movimento Maker e Educação
 
Cultura digital - Aula 4
Cultura digital - Aula 4Cultura digital - Aula 4
Cultura digital - Aula 4
 
Cultura Digital- aula 3
Cultura Digital- aula 3Cultura Digital- aula 3
Cultura Digital- aula 3
 
Cultura Digital- aula 2
Cultura Digital- aula 2Cultura Digital- aula 2
Cultura Digital- aula 2
 
Diversidade cultural gilberto gil
Diversidade cultural gilberto gilDiversidade cultural gilberto gil
Diversidade cultural gilberto gil
 
Social Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and TechnologySocial Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and Technology
 
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
 
Makersfor Global Good Report
Makersfor Global Good ReportMakersfor Global Good Report
Makersfor Global Good Report
 
Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17
 
7 Forum Nacional de Museus
7 Forum Nacional de Museus7 Forum Nacional de Museus
7 Forum Nacional de Museus
 
Apresentacao metashop
Apresentacao metashopApresentacao metashop
Apresentacao metashop
 
Pretalab- apresentação institucional
Pretalab- apresentação institucionalPretalab- apresentação institucional
Pretalab- apresentação institucional
 
Cultura e tecnologia - aula2
Cultura e tecnologia - aula2Cultura e tecnologia - aula2
Cultura e tecnologia - aula2
 
Cultura e tecnologia - aula1
Cultura e tecnologia - aula1Cultura e tecnologia - aula1
Cultura e tecnologia - aula1
 
Global Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine GermanyGlobal Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine Germany
 
Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos
 
Makerspaces e hubs de inovação
Makerspaces e hubs de inovaçãoMakerspaces e hubs de inovação
Makerspaces e hubs de inovação
 

Último

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Último (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Searching for Interestingness in Wikipedia and Yahoo Answers

  • 1. Searching for Interestingness in Wikipedia and Yahoo! Answers Yelena Mejova1 Ilaria Bordino2 Mounia Lalmas3 Aristides Gionis4 {1,2,3} 4 Yahoo! Research Barcelona, Spain Aalto University, Finland {1 ymejova, 2 bordino, 3 mounia}@yahoo-inc.com 4 aristides.gionis@aalto.fi ABSTRACT 2. ENTITY NETWORKS In many cases, when browsing the Web, users are searching We extract entity networks from (i) a dump of the En- for specific information. Sometimes, though, users are also glish Wikipedia from December 2011 consisting of 3 795 865 looking for something interesting, surprising, or entertain- articles, and (ii) a sample of the English Yahoo! Answers ing. Serendipitous search puts interestingness on par with dataset from 2010/2011, containing 67 336 144 questions and relevance. We investigate how interesting are the results one 261 770 047 answers. We use state-of-the-art methods [3, 5] can obtain via serendipitous search, and what makes them to extract entities from the documents in each dataset. so, by comparing entity networks extracted from two promi- Next we draw an arc between any two entities e1 and e2 nent social media sites, Wikipedia and Yahoo! Answers. that co-occur in one or more documents. We assign the arc a weight w1 (e1 , e2 ) = DF(e1 , e2 ) equal to the number of such Categories and Subject Descriptors documents (the document frequency (DF) of the entity pair). H.4 [Information Systems Applications]: Miscellaneous This weighting scheme tends to favor popular entities. To mitigate this effect, we measure the rarity of any entity e Keywords in a dataset by computing its inverse document frequency Serendipity, Exploratory search IDF(e) = log(N )−log(DF(e)), where N is the size of the col- lection, and DF(e) is the document frequency of entity e. We 1. INTRODUCTION set a threshold on IDF to drop the arcs that involve the most Serendipitous search occurs when a user with no a priori popular entities. We also rescale the arc weights according or totally unrelated intentions interacts with a system and to the alternative scheme w2 (e1 → e2 ) = DF(e1 , e2 )·IDF(e2 ). acquires useful information [4]. A system supporting such We use Personalized PageRank (PPR) [1] to extract the exploratory capabilities must provide results that are rele- top n entities related to a query entity. We consider two vant to the user’s current interest, and yet interesting, to scoring methods. When using the w2 weighting scheme, we encourage the user to continue the exploration. simply use the PPR scores (we dub this method IDF). When In this work, we describe an entity-driven exploratory and using the simpler scheme w1 , we normalize the PPR scores serendipitous search system, based on enriched entity net- by the global PageRank scores (with no personalization) to works that are explored through random-walk computations penalize popular entities. We dub this method PN. to retrieve search results for a given query entity. We extract We enrich our entity networks with metadata regard- entity networks from two datasets, Wikipedia, a curated, ing sentiment and quality of the documents. Using Sen- collaborative online encyclopedia, and Yahoo! Answers, a tiStrength1 , we extract sentiment scores for each document. more unconstrained question/answering forum, where the We calculate attitude and sentimentality metrics [2] to mea- freedom of conversation may present advantages such as sure polarity and strength of the sentiment. Regarding qual- opinions, rumors, and social interest and approval. ity, for Yahoo! Answers documents we count the number of We compare the networks extracted from the two media points assigned by the system to the users, as indication of by performing user studies in which we juxtapose interest- expertise and thus good quality. For Wikipedia, we count ingness of the results retrieved for a query entity, with rel- the number of dispute messages inserted by editors to require evance. We investigate whether interestingness depends on revisions, as indication of bad quality. We derive sentiment (i) the curated/uncurated nature of the dataset, and/or on and quality scores for any entity by averaging over all the (ii) additional characteristics of the results, such as senti- documents in which the entity appears. We use Wikimedia2 ment, content quality, and popularity. statistics to estimate the popularity of entities. 3. EXPLORATORY SEARCH We test our system using a set of 37 queries originat- Permission to make digital or hard copies of all or part of this work for ing from 2010 and 2011 Google Zeitgeist (www.google.com/ personal or classroom use is granted without fee provided that copies are zeitgeist) and having sufficient coverage in both datasets. not made or distributed for profit or commercial advantage and that copies Using one of the two algorithms – PN or IDF – we retrieve bear this notice and the full citation on the first page. To copy otherwise, to the top five entities from each dataset – YA or WP – for each republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 1 WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil. sentistrength.wlv.ac.uk 2 Copyright 2013 ACM 978-1-4503-2038-2/13/05 ...$15.00. dumps.wikimedia.org/other/pagecounts-raw
  • 2. Figure 1: Performance: (a) and (b) scale range from 1 to 4, (c) correlation range from 0 to 1 4.0 4.0 int. to query int. to user relevant diverse frustrating interesting learn new int. to query int. to user relevant 0.8 3.0 3.0 0.4 2.0 2.0 1.0 1.0 0.0 ya r ya pn ya idf wp r wp pn wp idf ya r ya pn ya idf wp r wp pn wp idf ya pn ya idf wp pn wp idf (a) Average query/result pair labels (b) Average query set labels (c) Corr. with learn smth new query. For comparison, we consider setups consisting of 5 query (0.214), and quality of the result entity (0.201). These random entities. Note that unlike for conventional retrieval, features point to important aspects of a retrieval strategy a random baseline is feasible for a browsing task. which would lead to a successful serendipitous search. We recruit four editors to annotate the retrieved results, Table 1: Retrieval result examples asking them to evaluate each result entity for relevance, in- YA query: Kim Kardashian Attitude Sentiment. Quality Pageviews terestingness to the query, and interestingness regardless of Perry Williams 0 0 0 85 Eva Longoria Parker −0.602 2.018 6 1 450 814 the query, with responses falling on scale from 1 to 4 (Fig- WP query: H1N1 pandemic Attitude Sentiment. Quality Pageviews ure 1(a)). Both of our retrieval methods outperform the Phaungbyin 2 2 1 706 random baseline (at p < 0.01). The gain in interestingness 2009 US flu pandemic 1 1 1 21 981 to the user despite the query suggests that randomly viewed information is not intrinsically interesting to the user. 4. DISCUSSION & CONCLUSION Whereas performance improves from PN to IDF for YA, Beyond the aggregate measures of the previous section, the interestingness to the user is hurt significantly (at p < the peculiarities of Yahoo! Answers and Wikipedia as so- 0.01) for WP (the other measures remain statistically the cial media present unique advantages and challenges for same). Note that PN uses the weighting scheme w1 , while serendipitous search. For example, Table 1 shows poten- IDF operates on the networks sparsified and weighted ac- tial search YA results for an American socialite Kim Kar- cording to function w2 . The frequency-based approach ap- dashian: an actress Eva Longoria Parker (whose Wikipedia plied by IDF mediates the mentioning of popular entities in page has over a million visits in two years), and a footballer a non-curated dataset like YA, but it fails to capture the im- Perry Williams (who played his last game in 1993). Note portance of entities in a domain with restricted authorship. the difference in attitude and sentimentality. Yahoo! An- Next we ask the editors to look at the five results as a swers provides a wider spread of emotion. This data may be whole, measuring diversity, frustration, interestingness, and of use when searching for potentially serendipitous entities. the ability of the user to learn something new about the Table 1 also shows potential WP results for the query query. Figure 1(b) shows that the two random runs are H1N1 Pandemic: a town in Burma called Phaungbyin, and highly diverse but provoke the most frustration. The most 2009 flu pandemic in the United States. We may expect diverse and the least frustrating result sets are provided by pandemic to be associated with negative sentiment, but the the YA IDF run. The WP PN run also shows high diversity, documents in Wikipedia do not display it. but it falls with the IDF constraint. The YA IDF run gives It is our intuition that the two datasets provide a comple- better diversity and interesting scores at p < 0.01 than the mentary view of the entities and their relations, and that a WP IDF run, while performing statistically the same. hybrid system exploiting both resources would provide the To examine the relationship with the serendipity level of best user experience. We leave this for future work. the content, we compute correlation between the learn some- thing new label (LSN) and the others. Figure 1(c) shows 5. ACKNOWLEDGEMENTS the LSN label to be the least correlated with interests of the This work was partially funded by the European Union user in the WP IDF run, and the most for the YA IDF run. Linguistically Motivated Semantic Aggregation Engines Especially in the WP IDF run, the relevance is highly asso- (LiMoSINe) project3 . ciated with the LSN label. We are witnessing two different References searching experiences: in the YA IDF setup the results are [1] G. Jeh and J. Widom. Scaling personalized web search. In diverse and popular, whereas in the WP IDF setup the re- WWW ’03, pages 271–279. ACM, 2003. sults are less diverse, and the user may be less interested in [2] O. Kucuktunc, B. Cambazoglu, I. Weber, and H. Ferhatos- the relevant content, but it will be just as educational. manoglu. A large-scale sentiment analysis for yahoo! answers. Finally we analyze the metadata collected for the entities In WSDM ’12, pages 633–642. ACM, 2012. in any query-result pair: Attitude (A), Sentimentality (S), [3] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. In CIKM, 2009. Quality (Q), Popularity (V), and Context (T). For each pair, [4] E. G. Toms. Serendipitous information retrieval. In DELOS, we calculate the difference between query and result in these 2000. dimensions. For Context we compute the cosine similarity [5] Y. Zhou, L. Nie, O. Rouhani-Kalleh, F. Vasile, and S. Gaffney. between the TF/IDF vectors of the entities. In aggregate, Resolving surface forms to Wikipedia topics. In COLING, the best connections are between result popularity and rel- pages 1335–1343, 2010. evance (0.234), as well as interestingness of the result to the 3 www.limosine-project.eu user (0.227), followed by contextual similarity of result and