Leveraging Learning To Rank in an Optimization Framework for Timeline Summarization

With the tremendous amount of news published on the Web every day, helping users explore news events on a given topic of interest is an acute problem. Timeline summaries have recently emerged as a simple and effective way for users to navigate through temporally related news events. In this paper, we propose an optimization framework and demonstrate the use of Learning To Rank (LTR) to automatically construct timeline summaries from Web news articles. Experimental evaluations show that our approach outperforms existing solutions in producing high-quality timeline summaries.


  1. Leveraging Learning To Rank in an Optimization Framework for Timeline Summarization
     Giang Binh Tran, Anh Tuan Tran, Nam Khanh Tran, Mohammad Alrifai, Nattiya Kanhabua
     L3S Research Center & University of Hannover, Germany
     SIGIR Workshop TAIA’13, Dublin, August 1, 2013
  2. Timeline Summarization
     News topic: Arab Spring. What happened, and how?
     A summary with temporal structure (a list of daily key events). Example:
     • 11 Feb 2011: Egyptian President Hosni Mubarak resigned
     • 15 Feb 2011: Protests broke out against Muammar Gaddafi’s regime
     • 03 Mar 2011: Egyptian Prime Minister Ahmed Shafik resigned
  3. Example (figure: a timeline annotated with "Important dates" and "Day summaries of key events")
  4. Related work
     • Timeline summarization:
       • Chieu et al. (SIGIR’04): burstiness + interest score (roughly the sum of TF×IDF similarities to neighboring sentences)
       • Yan et al. (SIGIR’11): topic relevance + coverage + coherence + diversity, based on word distributions
     • Both are unsupervised.
     • Our approach: learn from expert-created timeline summaries, and optimize for different criteria.
  5. Sentence Ranking Model (diagram labels: Manually created Timelines, Learning Algorithms, Sentence Ranking Model, Rs, Ranked Sentences, Optimization, TIMELINE)
     Example TIMELINE output (Date, Summary):
     2011-08-29  Eni CEO meets with members of the rebel government.
     2011-09-08  Gaddafi vows to fight on
     …….         ……
  6. Learning to Rank sentences
     • Assumption: day summaries are created from the input news articles (e.g. BBC timelines are created from BBC news articles)
     • Generate training data automatically: Relevance R(s) ~ Textual Similarity(s, DS). A sentence with higher similarity to the Day Summary (DS) is more likely to be selected as part of the summary.
     • Feature extraction:
       • Surface: length, stop/non-stop words, #pronouns, position
       • Coherence: #temporal/logical/causal signals
       • Topic: sum/avg TF-IDF, log-odds, cross entropy, semantic similarity to the document abstract
       • Temporal: popularity, has temporal expression
       • Event: probability of describing the main events, in terms of top word pairs
  7. Optimize Timeline Generation
     N-gram-based computation
     • Novelty: avoid duplication within a day summary when selecting a sentence s
     • Continuity: generate the timeline as a flow of information (connecting the dots between day summaries)
     Maximize the objective using dynamic programming (the formula is not preserved in this transcript).
  8. Evaluation
     Dataset: Timeline17 (www.l3s.de/~gtran/timeline)
     • 4650 articles collected from well-known news agencies (e.g., BBC, CNN)
     • 17 timelines from 9 topics: BP Oil Spill, Haiti Earthquake, H1N1, Financial Crisis, Libyan War, ...
     • Leave-one-out strategy
     • "In-house" experiment: a timeline generated from BBC news is compared against the BBC expert-generated timeline
  10. Metric
      ROUGE: n-gram-based measurement (overlapping n-grams between the generated day summary and the expert-created day summaries; precision, recall, F-measure)
      • ROUGE-1 uses unigrams
      • ROUGE-2 uses bigrams
      • ROUGE-S* uses skip-bigrams
      Baselines:
      • Chieu et al. (SIGIR 2004)
      • MEAD: traditional multi-document summarization system
      • ETS (Yan et al., SIGIR 2011)
  11. Michael Jackson death trial, example
      BBC Timeline (ground truth):
      2009-07-28  Dr Murray's home is also raided.
      2011-05-02  The trial is delayed again, as Dr Murray's lawyers ask for extra time to prepare for new prosecution witnesses.
      -----------------------
      2009-07-29  Court documents filed in Nevada show that Dr Murray is heavily in debt, owing more than $780,000 in judgements against him and his medical practice, outstanding mortgage payments on his house, child support and credit cards.
      Ours:
      2009-07-28  (Ok) Police raid Jackson doctor's home
      2011-05-02  In Los Angeles, lawyers for Dr Conrad Murray had asked for a delay to prepare for new prosecution witnesses.
      -----------------------
      2009-07-29  (Bad) Michael Flanagan of the DEA describes the operation Police have searched the Las Vegas home and offices of Michael Jackson's doctor as part of a manslaughter investigation into the singer's death.
  12. H1N1: Continuity vs. Non-Continuity
      Without continuity:
      2009-04-25  The World Health Organisation has warned countries to be on alert for any unusual flu outbreaks after a swine flu virus was implicated in possibly dozens of human deaths in Mexico.
      2009-04-26  The World Health Organisation said at least 81 people had died from severe pneumonia caused by the flu-like illness in Mexico.
      With continuity:
      2009-04-25  The World Health Organisation has warned countries to be on alert for any unusual flu outbreaks after a swine flu virus was implicated in possibly dozens of human deaths in Mexico.
      2009-04-26  The influenza strain that has struck Mexico and the United States involves, in many cases, a never-before-seen strain of the H1N1 virus ..
  13. Thank you very much!
  14. Backup: formulas (not preserved in this transcript)
      • Novelty computation (s: sentence, S: set of sentences)
      • Continuity computation (s: sentence, DS(d_{i-1}): the previous day summary)
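
Sketch for slide 6 (not from the slides): one way the automatic training labels could be produced, scoring each article sentence by its textual similarity to the expert day summary. The slides do not say which similarity measure is used; the cosine similarity over raw term counts, the crude tokenizer, and the names `tokenize`, `cosine_similarity` and `label_sentences` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def label_sentences(sentences, day_summary):
    """Return (sentence, relevance) pairs, most similar to the day summary first.

    Assumption: relevance R(s) is approximated by cosine similarity to DS.
    """
    scored = [(s, cosine_similarity(s, day_summary)) for s in sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    summary = "Egypt President Hosni Mubarak resigned"
    candidates = [
        "Oil prices rose sharply in Asian trading.",
        "Hosni Mubarak resigned as president of Egypt on Friday.",
    ]
    for sentence, relevance in label_sentences(candidates, summary):
        print(f"{relevance:.3f}  {sentence}")
```

The resulting scores would serve as graded relevance targets for whichever learning-to-rank algorithm is trained on the slide 6 features.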
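
Sketch for slides 7 and 14 (not from the slides): the novelty and continuity formulas and the optimization objective are not preserved in this transcript, so the following is only one plausible reading. It assumes both scores are Jaccard overlaps of word bigrams, and that the dynamic program picks a single sentence per day to maximize relevance plus weighted continuity with the previous day's pick. The functions `overlap`, `novelty`, `continuity` and `best_timeline`, and the `weight` parameter, are hypothetical.

```python
import re


def ngrams(text, n=2):
    """Set of word n-grams in a text (lowercased alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(a, b, n=2):
    """Jaccard overlap between the n-gram sets of two texts."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def novelty(s, selected, n=2):
    """High when s shares few n-grams with sentences already in the day summary."""
    return 1.0 - max((overlap(s, t, n) for t in selected), default=0.0)


def continuity(s, previous_day_summary, n=2):
    """High when s shares n-grams with the previous day's summary."""
    return overlap(s, previous_day_summary, n)


def best_timeline(days, relevance, weight=1.0):
    """Pick one sentence per day, maximizing relevance + weight * continuity.

    days: non-empty candidate-sentence lists, in chronological order.
    relevance: dict mapping each sentence to its ranking score.
    Viterbi-style DP: best[j] is the best total score of any timeline
    that ends with candidate j of the current day.
    """
    best = [relevance[s] for s in days[0]]
    back = []
    for prev, cur in zip(days, days[1:]):
        new_best, pointers = [], []
        for s in cur:
            scores = [best[j] + weight * continuity(s, p) for j, p in enumerate(prev)]
            j = max(range(len(prev)), key=scores.__getitem__)
            new_best.append(scores[j] + relevance[s])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Trace back from the best final candidate to recover the picks.
    j = max(range(len(best)), key=best.__getitem__)
    picks = [j]
    for pointers in reversed(back):
        j = pointers[j]
        picks.append(j)
    picks.reverse()
    return [day[j] for day, j in zip(days, picks)]
```

With one sentence per day, `novelty` is not needed inside `best_timeline`; it would apply when several sentences are selected for the same day, by penalizing a candidate that repeats sentences already chosen.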
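
Sketch for slide 10 (not from the slides): a minimal ROUGE-N computation, showing how the precision, recall and F-measure follow from clipped n-gram overlap between a generated day summary and an expert one. It is a simplified stand-in, not the official ROUGE toolkit: it assumes a crude regex tokenizer and a single reference, applies no stemming or stop-word handling, and does not cover ROUGE-S*. The names `ngram_counts` and `rouge_n` are hypothetical.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Multiset of word n-grams in a text (lowercased alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) based on clipped n-gram overlap."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    # Counter & Counter keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    ours = "Police raid Jackson doctor's home"
    expert = "Dr Murray's home is also raided."
    for n in (1, 2):
        p, r, f = rouge_n(ours, expert, n)
        print(f"ROUGE-{n}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

The example pair is the 2009-07-28 entry from slide 11. Without stemming, "raid" and "raided" do not match, so the score understates how close the two summaries are.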
