Ranking Related News Predictions
Nattiya Kanhabua (1), Roi Blanco (2) and Michael Matthews (2)
(1) Norwegian University of Science and Technology, Norway
(2) Yahoo! Research, Barcelona, Spain
SIGIR’2011, Beijing
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
Problem statement
People are naturally curious about the future.
◮ How long will a war in the Middle East last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in the next 5 years?
◮ What will be the potential effects of climate change?
Over 32% of 2.5M documents from Yahoo! News (July 2009 to
July 2010) contain at least one prediction.
A new task called ranking related news predictions.
◮ Retrieve predictions related to a news story in news archives.
◮ Rank them according to their relevance to the news story.
Related News Predictions
Query = <gas, emission, percent, european, global, climate>
Future-related Information Analyzing Tools
Recorded Future
Difference: a user must specify a query in advance using “predefined” entities.
Yahoo’s Time Explorer
Difference: No ranking or performance evaluation is done.
Previous Work on Future Retrieval
R. Baeza-Yates. Searching the future. SIGIR’2005 Workshop
on Mathematical/Formal Methods in IR.
◮ Extract temporal expressions from news articles.
◮ Retrieve future information using a probabilistic model, i.e.,
multiplying term similarity and a time confidence.
◮ Only a small data set and a year granularity are used.
A. Jatowt et al. Supporting analysis of future-related
information in news archives and the web. JCDL’2009.
◮ Extract future mentions from news snippets obtained from
search engines.
◮ Summarize and aggregate results using clustering methods.
◮ Does not focus on the relevance and ranking of future information.
Contributions
I. Formally define ranking related news predictions.
II. Four classes of features: term similarity, entity-based
similarity, topic similarity and temporal similarity.
III. Extensive evaluation using a dataset with over 6000
judgments from the NYT Annotated Corpus.
System Architecture
Step 1: Document annotation.
◮ Extract temporal expressions
using time and event recognition.
◮ Normalize them to dates so they
can be anchored on a timeline.
◮ Output: sentences annotated
with named entities and dates,
i.e., predictions.
Step 2: Retrieving predictions.
◮ Automatically generate a query
from a news article being read.
◮ Retrieve predictions that match
the query.
◮ Rank predictions by relevance. A
prediction is “relevant” if it is
about the topics of the article.
Annotated Document Model
Collection C = {d_1, ..., d_n}.
Document d = {{w_1, ..., w_n}, time(d)}.
◮ time(d) gives the publication date of d.
Annotated document d̂ is composed of:
◮ Named entities d̂_e = {e_1, ..., e_n}
◮ Temporal expressions d̂_t = {t_1, ..., t_m}
◮ Sentences d̂_s = {s_1, ..., s_z}
Prediction Model
Let d_p be the parent document of a prediction p.
p is a sentence containing field/value pairs:
Field         Value
ID            1136243_1
PARENT_ID     1136243
TITLE         Gore Pledges A Health Plan For Every Child
TEXT          Vice President Al Gore proposed today to guarantee access to affordable health insurance for all children by 2005, expanding on a program enacted two years ago that he conceded had had limited success so far.
CONTEXT       Mr. Gore acknowledged that the number of Americans without health coverage had increased steadily since he and President Clinton took office.
ENTITY        Al Gore
FUTURE_DATE   2005
PUB_DATE      1999/09/08
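The field/value layout above maps naturally onto a small record type. The following is a minimal sketch, assuming Python dataclasses; the class name `Prediction` and the choice of tuples and integer years are illustrative assumptions, not part of the original system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Prediction:
    """One prediction sentence with the fields listed on the slide."""
    id: str                         # ID: parent document id plus a sentence suffix
    parent_id: str                  # PARENT_ID: id of the parent document d_p
    title: str                      # TITLE of the parent document
    text: str                       # TEXT: the prediction sentence itself
    context: str                    # CONTEXT: surrounding sentence(s)
    entities: Tuple[str, ...]       # ENTITY annotations
    future_dates: Tuple[int, ...]   # FUTURE_DATE values, kept as years (assumption)
    pub_date: date                  # PUB_DATE of the parent document


example = Prediction(
    id="1136243_1",
    parent_id="1136243",
    title="Gore Pledges A Health Plan For Every Child",
    text=("Vice President Al Gore proposed today to guarantee access to "
          "affordable health insurance for all children by 2005, expanding "
          "on a program enacted two years ago that he conceded had had "
          "limited success so far."),
    context=("Mr. Gore acknowledged that the number of Americans without "
             "health coverage had increased steadily since he and President "
             "Clinton took office."),
    entities=("Al Gore",),
    future_dates=(2005,),
    pub_date=date(1999, 9, 8),
)
```

Storing entities and future dates as tuples allows a sentence to carry more than one of each, which the example slide does not show but the field names permit.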
Query Model
Query q is extracted from a news article being read, d_q.
1. Keywords q_text
2. Time constraints q_time
Query Keywords
[Figure: query keyword extraction from the news article being read, producing (1) an entity query QE, (2) a term query QT and (3) a combined query QC, shown next to the fields of a prediction: ID, PARENT_ID, TITLE, TEXT, ENTITY, CONTEXT, FUTURE_DATE, PUB_DATE.]
1. Entity query QE = {e_1, ..., e_m}
E.g., Barack Obama, Iraq, America
2. Term query QT = {w_1, ..., w_n}
E.g., troop, war, withdraw
3. Combined query QC = {e_1, ..., e_m} ∪ {w_1, ..., w_n}
E.g., Barack Obama, Iraq, America, troop, war, withdraw
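The three query types can be sketched in a few lines. This is a minimal sketch under stated assumptions: the slides do not say how the top-m entities and top-n terms are chosen, so ranking by raw frequency is assumed here, and `top_k` and `build_queries` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_k(items: Iterable[str], k: int) -> List[str]:
    """Return the k most frequent items (frequency ranking is an assumption)."""
    return [item for item, _ in Counter(items).most_common(k)]


def build_queries(entities: Iterable[str], terms: Iterable[str],
                  m: int, n: int) -> Tuple[List[str], List[str], List[str]]:
    """Build the entity query QE, the term query QT and the combined query QC."""
    qe = top_k(entities, m)
    qt = top_k(terms, n)
    # QC is the union of QE and QT; order is kept and duplicates are dropped.
    qc = qe + [w for w in qt if w not in qe]
    return qe, qt, qc


qe, qt, qc = build_queries(
    entities=["Barack Obama", "Iraq", "Barack Obama", "America"],
    terms=["troop", "war", "troop", "withdraw"],
    m=3, n=3,
)
```

With the toy input above, `qc` holds the three entities followed by the three terms, matching the combined example on the slide.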
Query Time
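Time constraints q_time
1. Only predictions whose future date lies after time(d_q), i.e., in (time(d_q), t_max]
2. Only predictions from articles published up to time(d_q), i.e., in [t_min, time(d_q)]
[Figure: timeline from past to future with the query at "now" and predictions P placed at different years (1999, 2002, 2006, 2016, 2018, 2033).]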
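The two time constraints can be read as a simple filter. The sketch below assumes that a prediction with several future dates passes if at least one of them falls in the allowed interval; that rule, and the names `DatedPrediction` and `satisfies_time_constraints`, are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass
class DatedPrediction:
    text: str
    pub_date: date                   # publication date of the parent document
    future_dates: Tuple[date, ...]   # dates the prediction refers to


def satisfies_time_constraints(p: DatedPrediction, query_time: date,
                               t_min: date, t_max: date) -> bool:
    """Check both constraints of q_time for one prediction."""
    # Constraint 2: parent article published in [t_min, time(d_q)].
    published_ok = t_min <= p.pub_date <= query_time
    # Constraint 1: a future date in (time(d_q), t_max] (any-date rule is assumed).
    future_ok = any(query_time < d <= t_max for d in p.future_dates)
    return published_ok and future_ok


query_time = date(2002, 1, 1)
t_min, t_max = date(1987, 1, 1), date(2033, 12, 31)
kept = DatedPrediction("health plan by 2005", date(1999, 9, 8), (date(2005, 1, 1),))
too_late = DatedPrediction("published after the query", date(2003, 5, 1), (date(2010, 1, 1),))
already_past = DatedPrediction("refers to 2001", date(1999, 9, 8), (date(2001, 1, 1),))
```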
Term Similarity
Capture the term-similarity between q and p.
1. retScore(q, p): Lucene’s TF-IDF scoring function
◮ Problem: it relies on keyword matching, and predictions are short texts.
◮ Predictions not containing query terms are not retrieved.
2. bm25f(q, p): field-aware ranking function
◮ Extend the prediction sentence with its surrounding sentences.
◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
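To make the field-aware idea concrete, here is a minimal sketch of a BM25F-style score over the TEXT, CONTEXT and TITLE fields. It assumes the common formulation in which boosted, length-normalised term frequencies are summed across fields before saturation; the smoothed IDF, the shared `b` for all fields and the function signature are assumptions, not the authors' implementation. The default boosts, `k1` and `b` follow the values given in the deck's parameter settings.

```python
import math
from typing import Dict, List

# Field boosts as listed in the parameter settings of the deck.
BOOSTS = {"TEXT": 5.0, "CONTEXT": 1.0, "TITLE": 2.0}


def bm25f(query: List[str], doc: Dict[str, List[str]],
          avg_len: Dict[str, float], df: Dict[str, int], n_docs: int,
          k1: float = 1.2, b: float = 0.75,
          boosts: Dict[str, float] = BOOSTS) -> float:
    """Score one prediction (a dict of tokenised fields) against a query."""
    score = 0.0
    for term in set(query):
        pseudo_tf = 0.0
        for field, boost in boosts.items():
            tokens = doc.get(field, [])
            if not tokens:
                continue
            # Length-normalise the term frequency within the field, then boost it.
            norm = 1.0 - b + b * len(tokens) / avg_len[field]
            pseudo_tf += boost * tokens.count(term) / norm
        if pseudo_tf == 0.0:
            continue
        # Smoothed IDF that stays positive (an assumed variant).
        d = df.get(term, 0)
        idf = math.log(1.0 + (n_docs - d + 0.5) / (d + 0.5))
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


avg_len = {"TEXT": 2.0, "CONTEXT": 1.0, "TITLE": 1.0}
df = {"health": 2}
in_text = {"TEXT": ["health", "plan"], "CONTEXT": ["coverage"], "TITLE": ["gore"]}
in_context = {"TEXT": ["coverage", "rise"], "CONTEXT": ["health"], "TITLE": ["gore"]}
score_text = bm25f(["health"], in_text, avg_len, df, n_docs=10)
score_context = bm25f(["health"], in_context, avg_len, df, n_docs=10)
```

In this toy case the prediction that matches only in CONTEXT still receives a non-zero score, which is the point of searching CONTEXT in addition to TEXT, while the TEXT match scores higher because of its larger boost.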
Entity-based Similarity
Measure the similarity between q and p by exploiting annotated entities in d_p, p and q.
◮ Only applicable for QE and QC.
◮ Features commonly employed in entity ranking tasks.
◮ Time distance captures the relationship of term and time.
ID   Feature
1    entitySim(q, p)
2    title(e, d_p)
3    titleSim(e, d_p)
4    senPos(e, d_p)
5    senLen(e, d_p)
6    cntSenSubj(e, d_p)
7    cntEvent(e, d_p)
8    cntFuture(e, d_p)
9    cntEventSubj(e, d_p)
10   cntFutureSubj(e, d_p)
11   timeDistEvent(e, d_p)
12   timeDistFuture(e, d_p)
13   tagSim(e, d_p)
14   isSubj(e, p)
15   timeDist(e, p)
Topic Similarity
Compute the similarity between q and p on a topic level.
◮ Latent Dirichlet allocation [Blei et al. 2003] for modeling topics.
1. Train a topic model
2. Infer topics
3. Compute topic similarity
Step 1: Learn a topic model.
◮ Partition D_N into sub-collections, called document snapshots D_{train,t_k}.
◮ For each D_{train,t_k}, randomly select documents for training a topic model.
◮ Output: topic models at different time snapshots, e.g., φ_{t_k} at t_k.
Step 2: Infer topics.
◮ Determine topics for q and p using
their contents, called topic inference.
◮ Both q and p are represented by a
probability distribution of topics.
◮ p_φ = (p(z_1), ..., p(z_n)), where p(z) is the probability of a topic z.
I. Which model snapshot should be used for inference?
Select a topic model φ_{t_k} for inference in 2 ways:
◮ t_k = time(d_q)
◮ t_k = time(d_p)
II. Which contents should be used for inference?
For a query q, the parent document d_q is used. For a prediction p, the contents can be:
◮ Only text p_txt
◮ Both text p_txt and context p_ctx
◮ Parent document d_p
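Step 3: Measuring topic similarity.
◮ q and p are represented by topic distributions.
◮ q_φ = (p(z_1), ..., p(z_n))
◮ p_φ = (p(z_1), ..., p(z_n))
◮ Compute the topic similarity using cosine similarity.
$$\mathrm{topicSim}(q, p) = \frac{q_\phi \cdot p_\phi}{\|q_\phi\| \cdot \|p_\phi\|} = \frac{\sum_{z \in Z} q_{\phi_z} \cdot p_{\phi_z}}{\sqrt{\sum_{z \in Z} q_{\phi_z}^2} \cdot \sqrt{\sum_{z \in Z} p_{\phi_z}^2}}$$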
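The cosine similarity between two topic distributions can be computed directly from its definition. This is a minimal sketch in plain Python; returning 0.0 for an all-zero vector is an assumption made here to avoid division by zero, and the function name simply mirrors the feature name on the slide.

```python
import math
from typing import Sequence


def topic_sim(q_phi: Sequence[float], p_phi: Sequence[float]) -> float:
    """Cosine similarity between the topic distributions of q and p."""
    if len(q_phi) != len(p_phi):
        raise ValueError("both distributions must cover the same topics")
    dot = sum(a * b for a, b in zip(q_phi, p_phi))
    norm_q = math.sqrt(sum(a * a for a in q_phi))
    norm_p = math.sqrt(sum(b * b for b in p_phi))
    if norm_q == 0.0 or norm_p == 0.0:
        return 0.0  # assumed convention for degenerate input
    return dot / (norm_q * norm_p)


same = topic_sim([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
disjoint = topic_sim([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give a similarity of 1 (up to floating-point rounding), and distributions with no shared topic mass give 0.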
Temporal Similarity
Hypothesis I. Predictions that are more recent to the query are more relevant.
[Figure: timeline from past to future with the query at "now" and predictions P at different years; the time distance between the query and a prediction is marked.]
Hypothesis II. Predictions extracted from more recent documents are more relevant.
[Figure: timeline from past to future with the query at "now" and predictions P at different years; the time distance between the query and a prediction is marked.]
◮ Timestamp-based Uncertainty (TSU) [Kanhabua and Nørvåg 2010]
◮ FuzzySet (FS) [Kalczynski and Chou 2005]
Ranking Method
Learning-to-rank: Given an unseen (q, p), p is ranked using a
model trained over a set of labeled query/prediction pairs.
$$\mathrm{score}(q, p) = \sum_{i=1}^{N} w_i \times f_i$$
◮ SVMMAP [Yue et al. 2007]
◮ RankSVM [Joachims 2002]
◮ SGD-SVM [Zhang 2004]
◮ PegasosSVM [Shalev-Shwartz et al. 2007]
◮ PA-Perceptron [Crammer et al. 2006]
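Once the weights have been learned, scoring and ranking reduce to a weighted sum per candidate. The sketch below assumes the weights are already available from one of the learners listed above; the feature values, the weights and the helper names are made up for illustration.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score: sum over i of w_i * f_i."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def rank(weights: Sequence[float],
         candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return prediction ids sorted by descending score."""
    return sorted(candidates,
                  key=lambda pid: score(weights, candidates[pid]),
                  reverse=True)


weights = [1.0, 0.5]          # hypothetical learned weights
candidates = {                # hypothetical feature vectors per prediction
    "p1": [0.2, 0.4],
    "p2": [0.6, 0.0],
    "p3": [0.1, 0.1],
}
ranking = rank(weights, candidates)
```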
Document collection
NYT Annotated Corpus: 1.8M articles from 1987 to 2007.
◮ More than 25% contain at least one prediction.
Annotation process uses several language processing tools.
◮ OpenNLP for tokenizing, sentence splitting, part-of-speech
tagging, shallow parsing
◮ SuperSense tagger for named entity recognition
◮ TARSQI for extracting temporal expressions
Apache Lucene for indexing and retrieving.
◮ 44,335,519 sentences and 548,491 predictions
◮ 939,455 future dates (on average 1.7 future dates per prediction)
Relevance judgments
42 future-related topics (examples by category):
POLITICS: president election, Iraq war
ENVIRONMENT: global warming, energy efficiency
SPACE: Mars, Moon
SCIENCE: earthquake, tsunami
PHYSICS: particle physics, Big Bang
HEALTH: bird flu, influenza
BUSINESS: subprime, financial crisis
SPORT: Olympics, World Cup
TECHNOLOGY: Internet, search engine
Human assessors gave a relevance score Grade(q, p, t).
◮ 4 (very relevant), 3 (relevant), 2 (related), 1 (non-relevant), and 0 (incorrectly tagged date)
◮ Relevant if Grade(q, p, t) ≥ 3; non-relevant if 1 ≤ Grade(q, p, t) ≤ 2
In total, assessors judged 52 queries.
◮ On average, 94 predictions were retrieved per query
◮ 4,888 query/prediction pairs (approximately 6,032 triples)
Available for download at:
www.idi.ntnu.no/~nattiya/data/sigir2011/futurepredictions.zip
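The grading scheme above can be collapsed into binary labels for evaluation. This is a minimal sketch; the slides do not say how grade 0 (incorrectly tagged date) is handled, so returning `None` and leaving such judgments out is an assumption, as is the function name.

```python
from typing import Optional


def binarize(grade: int) -> Optional[bool]:
    """Map a 0-4 relevance grade to a binary label."""
    if not 0 <= grade <= 4:
        raise ValueError("grade must be between 0 and 4")
    if grade >= 3:
        return True    # very relevant or relevant
    if grade >= 1:
        return False   # related or non-relevant
    return None        # grade 0: incorrectly tagged date, treated as unjudged (assumption)


labels = [binarize(g) for g in (4, 3, 2, 1, 0)]
```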
Parameter setting
BM25F: b = 0.75, k1 = 1.2 [Robertson et al. 1994]
◮ boost(TEXT) = 5.0
◮ boost(CONTEXT) = 1.0
◮ boost(TITLE) = 2.0
LDA: Stanford Topic Modeling Toolbox
◮ randomly select 4% of documents in each year for training
◮ filter out the 100 most common terms and terms occurring in fewer than 15 documents
◮ number of topics Nz is 500
◮ collapsed variational Bayes approximation algorithm
Temporal features:
◮ DecayRate = 0.5, λ = 0.5, µ = 2y
◮ n = 2, m = 2, smin = 4y, smax = 2y
◮ α_1 = time(d_q) − 4y, α_2 = time(d_q) + 2y
Methods for comparison
Baseline: QE , QT , QC
◮ Rank using Lucene’s default ranking function.
Our approach: Re-QE , Re-QT , Re-QC
◮ Re-rank the baseline results using learning-to-rank.
Metrics: P@1, P@3, MRR
◮ Typically, a user is interested in a few top predictions.
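The metrics named above have short standard definitions. The sketch below assumes each ranked list is given as a sequence of booleans (True for a relevant prediction) in rank order; the function names and the toy runs are illustrative.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(run) for run in runs) / len(runs)


runs = [
    [False, True, True],   # first relevant result at rank 2
    [True, False, False],  # first relevant result at rank 1
]
p1 = precision_at_k(runs[0], 1)
p3 = precision_at_k(runs[0], 3)
mrr = mean_reciprocal_rank(runs)
```

For the two toy runs, the first query has P@1 of 0 and P@3 of 2/3, and the MRR over both queries is 0.75.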
Selecting top-m entities and top-n terms
Select m and n with reasonable improvement in a hold-out set.
◮ Using QE to retrieve predictions, choose m = 11.
◮ Observing the performance of QC when m = 11, choose n = 10.
[Figure: two plots of P10 and MAP (y-axis, 0 to 0.6) against the number of top-m entities and the number of top-n terms (x-axis, 1 to 20).]
Compare all other methods against QE
[Figure: bar charts of P1, P3 and MRR (y-axis, 0 to 1) for QE, QT, QC, Re-QE, Re-QT and Re-QC.]
Results:
◮ QE performs worst among the baselines, while QC is superior to QT.
◮ Re-QC achieves the highest effectiveness, followed by Re-QT.
◮ The re-ranking approach improves on the baselines, except for Re-QE.
Analysis:
◮ QE did not retrieve any relevant results in the judged pool, which makes re-ranking difficult.
◮ Entity-based features perform well for some topics.
Feature analysis
Top-5 features with highest weights and lowest weights for each query type.
QE                      QT                        QC
Feature        Wi       Feature          Wi       Feature          Wi
tagSim         1.00     bm25f            1.00     LDA1,parent,k    1.00
FS1            0.97     retScore         0.60     retScore         0.99
TSU2           0.88     LDA1,parent,k    0.55     LDA1,parent,all  0.96
LDA1,txt,k     0.87     LDA2,parent,k    0.51     bm25f            0.93
LDA1,txt,all   0.82     LDA1,parent,all  0.49     isSubj           0.87
cntSenSubj     0.01     timeDistEvent   -0.03     cntEventSen     -0.02
cntEventSubj   0.01     timeDistFuture  -0.11     querySim        -0.05
isInTitle      0.00     cntEventSen     -0.12     cntFutureSen    -0.10
cntEventSen    0.00     cntFutureSen    -0.12     timeDistFuture  -0.14
querySim      -0.01     senLen          -0.16     senLen          -0.18
◮ Topic-based features play an important role in the re-ranking model.
◮ Although relying on terms, retScore and bm25f help to re-rank predictions.
◮ The five features with the lowest weights are from the entity-based class.
Conclusions and future work
◮ Define the task of ranking related future predictions.
◮ Employ learning-to-rank incorporating 4 feature classes.
◮ Conduct extensive experiments and create an evaluation
dataset with over 6000 relevance judgments.
◮ Future work:
◮ Combining multiple sources (Wikipedia, blogs, home
pages, etc.) of future-related information.
◮ Sentiment analysis for future-related information.
Acknowledgment: We thank Hugo Zaragoza for his help in the
early stages of this work.
Thank you!

Mais conteúdo relacionado

Destaque

Customer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyCustomer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyAnkita Dubey
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.customersforever
 
Data Mining Techniques for CRM
Data Mining Techniques for CRMData Mining Techniques for CRM
Data Mining Techniques for CRMShyaamini Balu
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaDiana Maynard
 
Using opinion mining techniques for early crisis detection
Using opinion mining techniques for early crisis detectionUsing opinion mining techniques for early crisis detection
Using opinion mining techniques for early crisis detectionFaculty of Computer Science
 
Text mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectText mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectJonathan Sedar
 
Text Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarText Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarPyData
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...shibbirtanvin
 
Recommender Systems and Active Learning
Recommender Systems and Active LearningRecommender Systems and Active Learning
Recommender Systems and Active LearningDain Kaplan
 
Online recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationOnline recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationMarcus Ljungblad
 
Data management services outsourcing – data mining, data entry and data proce...
Data management services outsourcing – data mining, data entry and data proce...Data management services outsourcing – data mining, data entry and data proce...
Data management services outsourcing – data mining, data entry and data proce...Sam Studio
 
Requirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsRequirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsStoitsis Giannis
 
Customer Relationship Management in Ireland Managing your Customers for Busin...
Customer Relationship Management in Ireland Managing your Customers for Busin...Customer Relationship Management in Ireland Managing your Customers for Busin...
Customer Relationship Management in Ireland Managing your Customers for Busin...Krishna De
 
Recommendation Engine Demystified
Recommendation Engine DemystifiedRecommendation Engine Demystified
Recommendation Engine DemystifiedDKALab
 
Recommendation techniques
Recommendation techniques Recommendation techniques
Recommendation techniques sun9413
 

Destaque (20)

Customer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyCustomer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubey
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
 
Data Mining Techniques for CRM
Data Mining Techniques for CRMData Mining Techniques for CRM
Data Mining Techniques for CRM
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
 
Using opinion mining techniques for early crisis detection
Using opinion mining techniques for early crisis detectionUsing opinion mining techniques for early crisis detection
Using opinion mining techniques for early crisis detection
 
Retrive&amp;rank
Retrive&amp;rankRetrive&amp;rank
Retrive&amp;rank
 
Text mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectText mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science project
 
Text Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarText Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan Sedar
 
Datamining for crm
Datamining for crmDatamining for crm
Datamining for crm
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
Preprocessing of Academic Data for Mining Association Rule, Presentation @WAD...
 
Recommender Systems and Active Learning
Recommender Systems and Active LearningRecommender Systems and Active Learning
Recommender Systems and Active Learning
 
Online recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationOnline recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisation
 
Data management services outsourcing – data mining, data entry and data proce...
Data management services outsourcing – data mining, data entry and data proce...Data management services outsourcing – data mining, data entry and data proce...
Data management services outsourcing – data mining, data entry and data proce...
 
Requirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsRequirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender Systems
 
Customer Relationship Management in Ireland Managing your Customers for Busin...
Customer Relationship Management in Ireland Managing your Customers for Busin...Customer Relationship Management in Ireland Managing your Customers for Busin...
Customer Relationship Management in Ireland Managing your Customers for Busin...
 
Recommendation Engine Demystified
Recommendation Engine DemystifiedRecommendation Engine Demystified
Recommendation Engine Demystified
 
Recommendation techniques
Recommendation techniques Recommendation techniques
Recommendation techniques
 
Chapter 11 crm
Chapter 11 crmChapter 11 crm
Chapter 11 crm
 
Data mining
Data miningData mining
Data mining
 

Semelhante a Ranking Related News Predictions

Machine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentMachine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentAnant Agarwal
 
2016 Data Science Salary Survey
2016 Data Science Salary Survey2016 Data Science Salary Survey
Ranking Related News Predictions

  • 1. Ranking Related News Predictions
    Nattiya Kanhabua (1), Roi Blanco (2) and Michael Matthews (2)
    (1) Norwegian University of Science and Technology, Norway
    (2) Yahoo! Research, Barcelona, Spain
    SIGIR’2011, Beijing
  • 2. Outline
    ◮ Introduction: Problem Statement, Related Work, Contributions
    ◮ Task Definition: System Architecture, Models
    ◮ Approach: Features, Ranking Method
    ◮ Evaluation: Experiment Setting, Experimental Results
  • 6. Outline (section: Introduction, Problem Statement)
  • 7. Problem statement
    People are naturally curious about the future.
    ◮ How long will a war in the Middle East last?
    ◮ What is the latest health care plan?
    ◮ What will happen to EU economies in the next 5 years?
    ◮ What will be the potential effects of climate change?
    Over 32% of 2.5M documents from Yahoo! News (July 2009 to July 2010) contain at least one prediction.
    A new task called ranking related news predictions:
    ◮ Retrieve predictions related to a news story in news archives.
    ◮ Rank them according to their relevance to the news story.
  • 10. Related News Predictions (figure slide; no text recovered)
  • 12. Related News Predictions
    Query = <gas, emission, percent, european, global, climate>
  • 13. Outline (section: Introduction, Related Work)
  • 14. Future-related Information Analyzing Tools: Recorded Future
    Difference: a user must specify a query in advance using “predefined” entities.
  • 15. Future-related Information Analyzing Tools: Yahoo’s Time Explorer
    Difference: no ranking or performance evaluation is done.
  • 16. Previous Work on Future Retrieval
    R. Baeza-Yates. Searching the future. SIGIR’2005 Workshop on Mathematical/Formal Methods in IR.
    ◮ Extracts temporal expressions from news articles.
    ◮ Retrieves future information using a probabilistic model, i.e., multiplying term similarity by a time confidence.
    ◮ Uses only a small data set and year granularity.
  • 17. Previous Work on Future Retrieval
    A. Jatowt et al. Supporting analysis of future-related information in news archives and the web. JCDL’2009.
    ◮ Extracts future mentions from news snippets obtained from search engines.
    ◮ Summarizes and aggregates results using clustering methods.
    ◮ Does not focus on relevance and ranking of future information.
  • 18. Outline (section: Introduction, Contributions)
  • 19. Contributions
    I. Formally define the task of ranking related news predictions.
    II. Four classes of features: term similarity, entity-based similarity, topic similarity and temporal similarity.
    III. Extensive evaluation using a dataset with over 6000 judgments from the NYT Annotated Corpus.
  • 20. Outline (section: Task Definition, System Architecture)
  • 21. System Architecture
    Step 1: Document annotation.
    ◮ Extract temporal expressions using time and event recognition.
    ◮ Normalize them to dates so they can be anchored on a timeline.
    ◮ Output: sentences annotated with named entities and dates, i.e., predictions.
  • 22. System Architecture
    Step 2: Retrieving predictions.
    ◮ Automatically generate a query from the news article being read.
    ◮ Retrieve predictions that match the query.
    ◮ Rank predictions by relevance. A prediction is “relevant” if it is about the topics of the article.
  • 23. Outline (section: Task Definition, Models)
  • 24. Annotated Document Model
    Collection $C = \{d_1, \ldots, d_n\}$.
    Document $d = \{\{w_1, \ldots, w_n\}, \mathrm{time}(d)\}$.
    ◮ $\mathrm{time}(d)$ gives the publication date of $d$.
    Annotated document $\hat{d}$ is composed of:
    ◮ Named entities $\hat{d}_e = \{e_1, \ldots, e_n\}$
    ◮ Temporal expressions $\hat{d}_t = \{t_1, \ldots, t_m\}$
    ◮ Sentences $\hat{d}_s = \{s_1, \ldots, s_z\}$
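The annotated document model above can be sketched as a small data structure. This is only an illustration under assumptions: the class name `AnnotatedDocument`, its field names, the year-level representation of temporal expressions and the sample values are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AnnotatedDocument:
    """Hypothetical container mirroring the annotated document model."""
    words: list[str]                      # {w_1, ..., w_n}
    pub_date: date                        # time(d), the publication date
    entities: list[str] = field(default_factory=list)              # named entities
    temporal_expressions: list[int] = field(default_factory=list)  # normalized to years (assumption)
    sentences: list[str] = field(default_factory=list)             # sentences

    def future_expressions(self) -> list[int]:
        """Temporal expressions that point past the publication year."""
        return [t for t in self.temporal_expressions if t > self.pub_date.year]


# Made-up sample document, for illustration only.
doc = AnnotatedDocument(
    words="gore proposed health insurance for all children by 2005".split(),
    pub_date=date(1999, 9, 8),
    entities=["Al Gore"],
    temporal_expressions=[1997, 2005],
    sentences=["Gore proposed health insurance for all children by 2005."],
)
future_years = doc.future_expressions()
```

Normalizing every temporal expression to a year is a simplification chosen for the sketch; the slides only say that expressions are normalized to dates.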
  • 26. Prediction Model
    Let $d_p$ be the parent document of a prediction $p$. $p$ is a sentence containing field/value pairs:
    Field        Value
    ID           1136243_1
    PARENT_ID    1136243
    TITLE        Gore Pledges A Health Plan For Every Child
    TEXT         Vice President Al Gore proposed today to guarantee access to affordable health insurance for all children by 2005, expanding on a program enacted two years ago that he conceded had had limited success so far.
    CONTEXT      Mr. Gore acknowledged that the number of Americans without health coverage had increased steadily since he and President Clinton took office.
    ENTITY       Al Gore
    FUTURE_DATE  2005
    PUB_DATE     1999/09/08
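As a rough sketch, the field/value pairs of a prediction could be held in a record like the one below, filled with the example from the slide. The class `Prediction`, the helper `make_prediction_id` and the rule that an ID is the parent ID plus a sentence index are assumptions made for illustration; the slide shows only the resulting values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one prediction (one sentence of a parent document)."""
    id: str
    parent_id: str
    title: str
    text: str
    context: str
    entities: tuple[str, ...]
    future_dates: tuple[int, ...]
    pub_date: date


def make_prediction_id(parent_id: str, sentence_index: int) -> str:
    # Assumption: the suffix is the sentence position within the parent document.
    return f"{parent_id}_{sentence_index}"


gore = Prediction(
    id=make_prediction_id("1136243", 1),
    parent_id="1136243",
    title="Gore Pledges A Health Plan For Every Child",
    text=("Vice President Al Gore proposed today to guarantee access to "
          "affordable health insurance for all children by 2005, expanding on "
          "a program enacted two years ago that he conceded had had limited "
          "success so far."),
    context=("Mr. Gore acknowledged that the number of Americans without "
             "health coverage had increased steadily since he and President "
             "Clinton took office."),
    entities=("Al Gore",),
    future_dates=(2005,),
    pub_date=date(1999, 9, 8),
)
```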
  • 27. Query Model
      A query q is extracted from the news article being read, dq. It has two parts:
      1. Keywords qtext
      2. Time constraints qtime
  • 28. Query Keywords
      [Figure: query keyword extraction from the news article being read yields an entity query QE, a term query QT, and a combined query QC, which are matched against the fields of a prediction (ID, PARENT_ID, TITLE, TEXT, ENTITY, CONTEXT, FUTURE_DATE, PUB_DATE).]
      Entity query: QE = {e1, ..., em}, e.g., Barack Obama, Iraq, America
  • 29. Query Keywords
      Term query: QT = {w1, ..., wn}, e.g., troop, war, withdraw
  • 30. Query Keywords
      Combined query: QC = {e1, ..., em} ∪ {w1, ..., wn}, e.g., Barack Obama, Iraq, America, troop, war, withdraw
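The three query types can be sketched as follows. This assumes the entities and terms of the article have already been extracted and ordered by importance, which the slides do not detail; the function name `build_queries` is hypothetical, and the defaults m = 11 and n = 10 are the cut-offs reported later in the evaluation.

```python
def build_queries(entities, terms, m=11, n=10):
    """Return (QE, QT, QC) from pre-ranked entity and term lists."""
    q_e = list(entities[:m])                       # entity query: top-m entities
    q_t = list(terms[:n])                          # term query: top-n terms
    q_c = q_e + [t for t in q_t if t not in q_e]   # combined query: union, order kept
    return q_e, q_t, q_c


q_e, q_t, q_c = build_queries(
    ["Barack Obama", "Iraq", "America"],
    ["troop", "war", "withdraw"],
)
```

With the example values from the slides, `q_c` contains the three entities followed by the three terms.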
  • 31. Query Time
      Time constraints qtime:
      1. Only predictions whose future dates lie after time(dq), i.e., in (time(dq), tmax].
      2. Only articles published before time(dq), i.e., in [tmin, time(dq)].
      [Figure: timeline from past to future with the query at "now" and predictions P; year labels 1999, 2002, 2006, 2016, 2018, 2033.]
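The two time constraints amount to two interval checks. The sketch below is one possible reading; the function name and the use of `datetime.date` values are assumptions, and the open bounds default to the widest representable range.

```python
from datetime import date


def satisfies_time_constraints(future_date, pub_date, query_time,
                               t_min=date.min, t_max=date.max):
    """Check both constraints of q_time for one prediction."""
    # Constraint 1: the predicted date lies in (time(dq), t_max].
    refers_to_future = query_time < future_date <= t_max
    # Constraint 2: the parent article was published in [t_min, time(dq)].
    published_in_time = t_min <= pub_date <= query_time
    return refers_to_future and published_in_time


query_time = date(2010, 7, 1)
ok = satisfies_time_constraints(date(2016, 1, 1), date(2006, 5, 3), query_time)
stale = satisfies_time_constraints(date(2006, 1, 1), date(1999, 9, 8), query_time)
```

Here `ok` is accepted, while `stale` is rejected because its predicted date already lies in the past relative to the query.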
  • 33. Term Similarity
      Capture the term similarity between q and p.
      1. retScore(q, p): Lucene's TF-IDF scoring function.
         ◮ Problem: keyword matching on short texts.
         ◮ Predictions that do not contain the query terms are not retrieved.
      2. bm25f(q, p): a field-aware ranking function.
         ◮ Extend the sentence representation with its surrounding sentences.
         ◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
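To illustrate why a field-aware function helps with short prediction sentences, here is a simplified BM25F-style scorer. It is a sketch under stated assumptions, not the implementation used in the experiments: the function signature, the Lucene-style smoothed idf, and the toy statistics are hypothetical, while the boosts and the values b = 0.75 and k1 = 1.2 are the ones listed in the parameter settings.

```python
import math

# Field boosts as reported in the parameter settings.
BOOSTS = {"TEXT": 5.0, "CONTEXT": 1.0, "TITLE": 2.0}


def bm25f(query_terms, fields, avg_len, doc_freq, num_docs, b=0.75, k1=1.2):
    """Simplified BM25F: combine boosted, length-normalized term
    frequencies across fields, then apply a single saturation step."""
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for name, tokens in fields.items():
            if not tokens:
                continue
            norm = 1.0 - b + b * len(tokens) / avg_len[name]
            pseudo_tf += BOOSTS.get(name, 1.0) * tokens.count(term) / norm
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))  # assumed idf variant
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


avg_len = {"TEXT": 2.0, "CONTEXT": 1.0}
stats = {"health": 2}
in_text = bm25f(["health"], {"TEXT": ["gore", "health"], "CONTEXT": ["coverage"]},
                avg_len, stats, num_docs=10)
in_context = bm25f(["health"], {"TEXT": ["gore", "plan"], "CONTEXT": ["health"]},
                   avg_len, stats, num_docs=10)
```

A prediction whose TEXT lacks the query term still receives a positive score through CONTEXT, but a lower one than a direct match in TEXT.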
  • 36. Entity-based Similarity
      Measure the similarity between q and p by exploiting annotated entities in dp, p, and q.
      ◮ Only applicable for QE and QC.
      ◮ Features commonly employed in entity ranking tasks.
      ◮ Time distance captures the relationship between term and time.
      ID  Feature
      1   entitySim(q, p)
      2   title(e, dp)
      3   titleSim(e, dp)
      4   senPos(e, dp)
      5   senLen(e, dp)
      6   cntSenSubj(e, dp)
      7   cntEvent(e, dp)
      8   cntFuture(e, dp)
      9   cntEventSubj(e, dp)
      10  cntFutureSubj(e, dp)
      11  timeDistEvent(e, dp)
      12  timeDistFuture(e, dp)
      13  tagSim(e, dp)
      14  isSubj(e, p)
      15  timeDist(e, p)
  • 37. Topic Similarity
      Compute the similarity between q and p at the topic level.
      ◮ Latent Dirichlet allocation [Blei et al. 2003] is used for modeling topics.
      1. Train a topic model
      2. Infer topics
      3. Compute topic similarity
  • 38. Topic Similarity. Step 1: Learn a topic model.
      ◮ Partition DN into sub-collections, called document snapshots Dtrain,tk.
      ◮ For each Dtrain,tk, randomly select documents for training a topic model.
      ◮ Output: topic models at different time snapshots, e.g., φtk at tk.
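The snapshot construction in Step 1 can be sketched as grouping documents by year and sampling from each group. The function name, the `(doc_id, year)` input format, and the fixed seed are assumptions; the 4% sampling rate per year is the one given in the parameter settings. Training the topic model itself is left out.

```python
import random
from collections import defaultdict


def sample_training_snapshots(docs, fraction=0.04, seed=0):
    """Group (doc_id, year) pairs by year and sample a fraction per year."""
    by_year = defaultdict(list)
    for doc_id, year in docs:
        by_year[year].append(doc_id)
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    snapshots = {}
    for year, ids in sorted(by_year.items()):
        k = max(1, round(fraction * len(ids)))
        snapshots[year] = rng.sample(ids, k)
    return snapshots


docs = [(f"{year}-{i}", year) for year in (1987, 1988) for i in range(100)]
snapshots = sample_training_snapshots(docs)
```

Each entry of `snapshots` would then be used to train one topic model φtk for that year.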
  • 39. Topic Similarity. Step 2: Infer topics.
      ◮ Determine the topics of q and p from their contents (topic inference).
      ◮ Both q and p are represented by a probability distribution over topics.
      ◮ pφ = ⟨p(z1), ..., p(zn)⟩, where p(z) is the probability of topic z.
  • 40. Topic Similarity
      I. Which model snapshot should be used for inference? Select a topic model φtk in one of two ways:
         ◮ tk = time(dq)
         ◮ tk = time(dp)
      II. Which contents should be used for inference? For a query q, the parent document dq is used. For a prediction p, the contents can be:
         ◮ Only the text ptxt
         ◮ Both the text ptxt and the context pctx
         ◮ The parent document dp
  • 44. Topic Similarity. Step 3: Measuring topic similarity.
      ◮ q and p are represented by topic distributions.
      ◮ qφ = ⟨p(z1), ..., p(zn)⟩
      ◮ pφ = ⟨p(z1), ..., p(zn)⟩
      ◮ Compute the topic similarity using cosine similarity:
      $$\mathrm{topicSim}(q, p) = \frac{q_\phi \cdot p_\phi}{\|q_\phi\| \cdot \|p_\phi\|} = \frac{\sum_{z \in Z} q_{\phi_z} \cdot p_{\phi_z}}{\sqrt{\sum_{z \in Z} q_{\phi_z}^2} \cdot \sqrt{\sum_{z \in Z} p_{\phi_z}^2}}$$
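The cosine similarity between two topic distributions can be computed directly from its definition. In this sketch the distributions are plain lists of equal length; returning 0.0 for an all-zero vector is an assumption made to avoid division by zero.

```python
import math


def topic_sim(q_phi, p_phi):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(q_phi, p_phi))
    norm = math.sqrt(sum(a * a for a in q_phi)) * math.sqrt(sum(b * b for b in p_phi))
    return dot / norm if norm else 0.0


same = topic_sim([0.5, 0.5], [0.5, 0.5])       # identical topic mixtures
disjoint = topic_sim([1.0, 0.0], [0.0, 1.0])   # no shared topic
```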
  • 45. Temporal Similarity
      Hypothesis I. Predictions that are closer in time to the query are more relevant.
      [Figure: timeline from past to future with the query at "now", predictions P, and the time distance between them; year labels 1999, 2002, 2006, 2016, 2018, 2033.]
  • 46. Temporal Similarity
      Hypothesis II. Predictions extracted from more recent documents are more relevant.
      [Figure: timeline from past to future with the query at "now", predictions P, and the time distance between them.]
      ◮ Timestamp-based Uncertainty (TSU) [Kanhabua and Nørvåg 2010]
      ◮ FuzzySet (FS) [Kalczynski and Chou 2005]
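As a rough illustration of a time-distance feature, the sketch below assumes that TSU takes the exponential-decay form DecayRate^(λ · |t1 − t2| / µ). The slides do not spell out the formula, so this form, the function name `tsu`, and the reading of "2y" as two years are all assumptions; the default values follow the parameter settings.

```python
def tsu(t1, t2, decay_rate=0.5, lam=0.5, mu=2.0):
    """Assumed TSU form: similarity decays exponentially with time distance.

    t1 and t2 are times in years; mu is the time unit in years (assumed).
    """
    return decay_rate ** (lam * abs(t1 - t2) / mu)


same_year = tsu(2010, 2010)     # no distance, maximal similarity
four_years = tsu(2010, 2014)    # similarity halves after four years
```

Under either hypothesis the same function applies; only the pair of timestamps being compared changes.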
  • 48. Ranking Method
      Learning-to-rank: given an unseen pair (q, p), p is ranked using a model trained over a set of labeled query/prediction pairs:
      $$\mathrm{score}(q, p) = \sum_{i=1}^{N} w_i \times f_i$$
      where f_i is the value of the i-th feature and w_i its learned weight. Learning algorithms:
      ◮ SVMMAP [Yue et al. 2007]
      ◮ RankSVM [Joachims 2002]
      ◮ SGD-SVM [Zhang 2004]
      ◮ PegasosSVM [Shalev-Shwartz et al. 2007]
      ◮ PA-Perceptron [Crammer et al. 2006]
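Once the weights are learned, ranking reduces to a weighted sum per candidate followed by a sort. The sketch below uses made-up weights and feature values and does not train anything; the function names and the `(id, features)` candidate format are assumptions.

```python
def score(weights, features):
    """Linear ranking score: sum of w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))


def rank(weights, candidates):
    """Sort (prediction_id, features) pairs by descending score."""
    return sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)


weights = [1.0, 0.5]                       # hypothetical learned weights
candidates = [("p1", [0.2, 0.1]), ("p2", [0.9, 0.0]), ("p3", [0.1, 0.9])]
ranking = [pid for pid, _ in rank(weights, candidates)]
```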
  • 50. Document collection
      New York Times Annotated Corpus: 1.8M articles from 1987 to 2007.
      ◮ More than 25% contain at least one prediction.
      The annotation process uses several language processing tools:
      ◮ OpenNLP for tokenizing, sentence splitting, part-of-speech tagging, and shallow parsing
      ◮ SuperSense tagger for named entity recognition
      ◮ TARSQI for extracting temporal expressions
      Apache Lucene is used for indexing and retrieval.
      ◮ 44,335,519 sentences and 548,491 predictions
      ◮ 939,455 future dates (on average 1.7 future dates per prediction)
  • 51. Relevance judgments
      42 future-related topics. Topics shown, by category:
      POLITICS     president election, Iraq war
      ENVIRONMENT  global warming, energy efficiency
      SPACE        Mars, Moon
      SCIENCE      earthquake, tsunami
      PHYSICS      particle physics, Big Bang
      HEALTH       bird flu, influenza
      BUSINESS     subprime, financial crisis
      SPORT        Olympics, World Cup
      TECHNOLOGY   Internet, search engine
  • 52. Relevance judgments
      Human assessors gave a relevance score Grade(q, p, t).
      ◮ 4 (very relevant), 3 (relevant), 2 (related), 1 (non-relevant), and 0 (incorrectly tagged date)
      ◮ A prediction is relevant if Grade(q, p, t) ≥ 3 and non-relevant if 1 ≤ Grade(q, p, t) ≤ 2.
      In total, assessors judged 52 queries.
      ◮ On average, 94 predictions were retrieved per query.
      ◮ 4,888 query/prediction pairs (approximately 6,032 triples)
      Available for download at: www.idi.ntnu.no/~nattiya/data/sigir2011/futurepredictions.zip
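The mapping from graded judgments to binary labels can be written as a small helper. The function name is hypothetical, and returning `None` for grade 0 (an incorrectly tagged date) is an assumption, since the slides define labels only for grades 1 to 4.

```python
def to_binary_label(grade):
    """Map Grade(q, p, t) to a binary relevance label."""
    if grade >= 3:
        return True       # very relevant or relevant
    if 1 <= grade <= 2:
        return False      # related or non-relevant
    return None           # grade 0: left unlabeled here (assumption)


labels = [to_binary_label(g) for g in (4, 3, 2, 1, 0)]
```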
  • 53. Parameter setting
      BM25F: b = 0.75, k1 = 1.2 [Robertson et al. 1994]
      ◮ boost(TEXT) = 5.0
      ◮ boost(CONTEXT) = 1.0
      ◮ boost(TITLE) = 2.0
      LDA: Stanford Topic Modeling Toolbox
      ◮ Randomly select 4% of the documents in each year for training.
      ◮ Filter out the 100 most common terms and terms occurring in fewer than 15 documents.
      ◮ The number of topics Nz is 500.
      ◮ Collapsed variational Bayes approximation algorithm.
      Temporal features:
      ◮ DecayRate = 0.5, λ = 0.5, µ = 2y
      ◮ n = 2, m = 2, smin = 4y, smax = 2y
      ◮ α1 = time(dq) − 4y, α2 = time(dq) + 2y
  • 55. Methods for comparison
      Baseline: QE, QT, QC
      ◮ Rank using Lucene's default ranking function.
      Our approach: Re-QE, Re-QT, Re-QC
      ◮ Re-rank the baseline results using learning-to-rank.
      Metrics: P@1, P@3, MRR
      ◮ Typically, a user is interested in only a few top predictions.
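The reported metrics can be computed from ranked lists of binary relevance labels. This is a generic sketch with made-up rankings; the function names and the `runs` data are illustrative and not taken from the evaluation.

```python
def precision_at_k(labels, k):
    """Fraction of relevant items among the top k of one ranked list."""
    return sum(labels[:k]) / k


def reciprocal_rank(labels):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    for position, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Two hypothetical queries, each with a ranked list of relevance labels.
runs = [[False, True, True], [True, False, False]]
p_at_1 = mean(precision_at_k(r, 1) for r in runs)
p_at_3 = mean(precision_at_k(r, 3) for r in runs)
mrr = mean(reciprocal_rank(r) for r in runs)
```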
  • 56. Selecting top-m entities and top-n terms
      Select the m and n that give a reasonable improvement on a hold-out set.
      ◮ Using QE to retrieve predictions, choose m = 11.
      ◮ Observing the performance of QC when m = 11, choose n = 10.
      [Figure: two plots of P10 and MAP, one against the number of top-m entities (1 to 20) and one against the number of top-n terms (1 to 20).]
  • 57. Compare all other methods against QE
      [Figure: three bar charts of P1, P3, and MRR (scale 0 to 1) for QE, QT, QC, Re-QE, Re-QT, and Re-QC.]
      Results:
      ◮ QE performs worst among the baselines, while QC is superior to QT.
      ◮ Re-QC achieves the highest effectiveness, followed by Re-QT.
      ◮ The re-ranking approach brings improvement, except for Re-QE.
  • 58. Compare all other methods against QE
      Analysis:
      ◮ QE did not retrieve any relevant result in the judged pool, which makes re-ranking difficult.
      ◮ Entity-based features perform well for some topics.
  • 59. Feature analysis
      Top-5 features with the highest weights and with the lowest weights for each query type.
      QE                     QT                       QC
      Feature        Wi      Feature          Wi      Feature          Wi
      tagSim         1.00    bm25f            1.00    LDA1,parent,k    1.00
      FS1            0.97    retScore         0.60    retScore         0.99
      TSU2           0.88    LDA1,parent,k    0.55    LDA1,parent,all  0.96
      LDA1,txt,k     0.87    LDA2,parent,k    0.51    bm25f            0.93
      LDA1,txt,all   0.82    LDA1,parent,all  0.49    isSubj           0.87
      cntSenSubj     0.01    timeDistEvent    -0.03   cntEventSen      -0.02
      cntEventSubj   0.01    timeDistFuture   -0.11   querySim         -0.05
      isInTitle      0.00    cntEventSen      -0.12   cntFutureSen     -0.10
      cntEventSen    0.00    cntFutureSen     -0.12   timeDistFuture   -0.14
      querySim       -0.01   senLen           -0.16   senLen           -0.18
      ◮ Topic-based features play an important role in the re-ranking model.
      ◮ Although they rely on terms, retScore and bm25f help to re-rank predictions.
      ◮ The lowest-weighted features come from the entity-based class.
  • 61. Conclusions and future work
      ◮ Define the task of ranking related future predictions.
      ◮ Employ learning-to-rank incorporating 4 feature classes.
      ◮ Conduct extensive experiments and create an evaluation dataset with over 6000 relevance judgments.
      ◮ Future work:
         ◮ Combining multiple sources of future-related information (Wikipedia, blogs, home pages, etc.).
         ◮ Sentiment analysis for future-related information.
  • 62. Acknowledgment: We thank Hugo Zaragoza for his help at the early stages of this work. Thank you!