6. König et al. and Sayyadi et al. have exploited the blogosphere for event detection. [Figure: number of blog posts per day, November 2008, annotated "Obama Victory".] Sources: M. Thelwall, WWW'06; König et al., SIGIR'09; Sayyadi et al., ICWSM'09.
8. Every day newspaper editors select articles for placement within their newspapers.
10. Rank articles by readership interest. [Figure: a newspaper editor placing articles on the front page, page 2, and so on.] We investigate how such a ranking can be approximated using evidence from the blogosphere.
19. Given a day of interest dQ, we wish to score each news article a by its predicted importance, score(a, dQ), using evidence from the blogosphere. [Figure: a news article ranker assigns importance scores to the articles for day dQ; example scores 29, 23, 14, 13, 4, 4.]
21. The Votes Approach: score by blog post volume. Two stages: (1) score each news article a for all days d, based on the volume of related blog posts for day d, where news articles are represented by their headlines; (2) given a query day dQ, rank the articles A by each article's score on day dQ, i.e. score(a, dQ). This is a voting process.
22. Votes Approach: Stage 1. Score days for each news story. For each news article a: 1) use its representation (headline) as a query; 2) select the top 1000 blog posts for a; 3) each post votes for a day; 4) rank days by votes received. Voting model: Count (Craig Macdonald, PhD thesis, 2009). [Figure: Terrier produces a blog post ranking, which is converted into a ranking of days with example vote counts per day.]
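The two stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: retrieval (done with Terrier in the slides) is left out, and the input is assumed to be, for each headline, the publication days of its retrieved blog posts in rank order. The function names `score_days` and `rank_articles` and the toy data are hypothetical.

```python
from collections import Counter
from datetime import date


def score_days(ranked_post_days, top_k=1000):
    """Stage 1: each of the top_k retrieved posts casts one vote for its day."""
    votes = Counter()
    for post_day in ranked_post_days[:top_k]:
        votes[post_day] += 1
    return votes


def rank_articles(votes_by_article, query_day):
    """Stage 2: rank articles by the votes they received on query_day."""
    scored = [(article, votes[query_day])
              for article, votes in votes_by_article.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy input: publication days of the posts retrieved per headline.
retrieved = {
    "headline A": [date(2008, 11, 4), date(2008, 11, 4),
                   date(2008, 11, 5), date(2008, 11, 3)],
    "headline B": [date(2008, 11, 4), date(2008, 11, 6)],
}

votes_by_article = {a: score_days(days) for a, days in retrieved.items()}
ranking = rank_articles(votes_by_article, date(2008, 11, 4))
print(ranking)
```

Because a `Counter` returns 0 for days with no votes, an article with no related posts on the query day simply falls to the bottom of the ranking.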
60. Votes Performance. Results: BM25 < DPH (DFR); Votes + extras. Better performance than the best TREC 2009 systems. Hyperlink evidence is of less value than textual evidence. [Figure: the Votes approach compared with the TREC 2009 best systems.]
83. GaussBoost weights the scores for each day downward, depending on w. Example for n = -2: ScoreGaussBoost(A,4) = (1*4)+(0.79*4)+(0.18*3) = 7.700; ScoreGaussBoost(B,4) = (1*4)+(0.79*1)+(0.18*1) = 4.970. [Figure: number of votes per day around dQ for the two articles; labelled scores 11 and 7.700, 6 and 4.970.]
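The arithmetic on this slide can be reproduced with a short sketch. It assumes the score is a weighted sum of vote counts, with weight 1 for dQ itself and the slide's weights 0.79 and 0.18 for the days one and two days away; how those weights are derived from w is not shown here, so they are entered as constants. The name `weighted_vote_score` and the distance-keyed dictionaries are illustrative choices, not part of the original.

```python
def weighted_vote_score(votes_by_distance, weight_by_distance):
    """Sum the votes per day, each multiplied by the weight for that day."""
    return sum(weight_by_distance[d] * v for d, v in votes_by_distance.items())


# Weights taken from the slide's example (assumed: key = distance in days from dQ).
WEIGHTS = {0: 1.0, 1: 0.79, 2: 0.18}
# Equal weights, i.e. no down-weighting of neighbouring days.
UNIFORM = {0: 1.0, 1: 1.0, 2: 1.0}

votes_a = {0: 4, 1: 4, 2: 3}  # article A: votes on dQ and the two nearby days
votes_b = {0: 4, 1: 1, 2: 1}  # article B

score_a = round(weighted_vote_score(votes_a, WEIGHTS), 3)
score_b = round(weighted_vote_score(votes_b, WEIGHTS), 3)
print(score_a, score_b)  # 7.7 4.97
```

With `UNIFORM` weights the same function gives 11 for A and 6 for B, the unweighted vote totals, which shows how the down-weighting narrows the gap between the two articles.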
85. Does the quality of evidence decrease as distance from dQ increases?
86. Research Questions: Is historical or future (before or after dQ) blog post evidence more useful?
88. Temporal Promotion. The parameter w determines the width of the Gaussian curve, and hence the weights ∆d for the days. With n = -2, w = 0.5: ScoreGaussBoost(A,4) = (1*4)+(0.38*4)+(0.01*3) = 5.550; ScoreGaussBoost(B,4) = (1*4)+(0.38*1)+(0.01*1) = 4.390. With n = -2, w = 1: ScoreGaussBoost(A,4) = (1*4)+(0.79*4)+(0.18*3) = 7.700; ScoreGaussBoost(B,4) = (1*4)+(0.79*1)+(0.18*1) = 4.970.
89. NDayBoost Performance. Future blog postings do provide useful evidence. Historical evidence is not useful for NDayBoost. [Figure: MAP against n value (days), compared with the DPH+Votes baseline.]
90. GaussBoost Performance. Future blog postings provide stronger evidence than historical postings. Historical blog postings are useful for days close to dQ. [Figure: MAP against w value (not days!), compared with the DPH+Votes baseline.]
92. Both historical and future evidence are useful for improving Votes ranking performance.
93. Can use this evidence to generate a better ranking for editors if the data is available
113. Improving the Article Representation: Prune headlines less likely to be news-worthy.
115. Add related terms (to counter sparsity). Approach: using DPH (DFR), retrieve the top 3 documents from Blogs08 (query expansion; K. L. Kwok and M. S. Chan, SIGIR 1998) or from Wikipedia (collection enrichment; F. Diaz and D. Metzler, SIGIR 2006). Expand the query with the top 10 terms identified from those documents using Bo1 (G. Amati, Thesis 2003). [Figure: article a is run through Terrier (DPH) against Blogs08 or Wikipedia, and Bo1 selects the top terms; labels: query expansion, external query expansion / collection enrichment.]
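The term-selection step can be illustrated with a small sketch. It assumes the feedback documents have already been retrieved and tokenised (the DPH retrieval step is not shown), and it uses the Bo1 weight in the form commonly given for Terrier, w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N, where tf_x is the term's frequency in the feedback documents, F its frequency in the collection and N the number of documents in the collection. The function names and all statistics below are hypothetical toy values.

```python
import math
from collections import Counter


def bo1_weights(feedback_docs, collection_term_freq, num_docs):
    """Weight each term in the feedback documents with the Bo1 formula."""
    tf_x = Counter(term for doc in feedback_docs for term in doc)
    weights = {}
    for term, tf in tf_x.items():
        p_n = collection_term_freq[term] / num_docs
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def expand_query(query_terms, feedback_docs, collection_term_freq,
                 num_docs, num_terms=10):
    """Append the num_terms highest-weighted terms not already in the query."""
    weights = bo1_weights(feedback_docs, collection_term_freq, num_docs)
    top = sorted(weights, key=lambda t: (-weights[t], t))[:num_terms]
    return list(query_terms) + [t for t in top if t not in query_terms]


# Hypothetical toy data: three tokenised feedback documents and collection stats.
feedback_docs = [
    ["obama", "election", "victory", "the"],
    ["obama", "wins", "election", "the"],
    ["obama", "speech", "the"],
]
collection_term_freq = {"obama": 50, "election": 200, "victory": 100,
                        "wins": 400, "speech": 300, "the": 100000}
num_docs = 10000

expanded = expand_query(["obama", "victory"], feedback_docs,
                        collection_term_freq, num_docs, num_terms=3)
print(expanded)
```

Terms that are frequent in the feedback documents but rare in the collection receive the highest weights, so a very common word such as "the" is pushed down even though it occurs in every feedback document.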
118. Article Improvement Performance. Collection enrichment helps find the blog posts that are related. Collection enrichment with Wikipedia significantly increases performance. [Figure: MAP comparison.]