SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Text Summarization - Machine Learning
    TEXT SUMMARIZATION
1   Kareem El-Sayed Hashem
    Mohamed Mohsen Brary
TEXT SUMMARIZATION
   Goal: reducing a text with a computer program in
    order to create a summary that retains the most
    important points of the original text.




                                                           Text Summarization - Machine Learning
   Summarization Applications
     summaries of email threads
     action items from a meeting
     simplifying text by compressing sentences




                                                       2
WHAT TO SUMMARIZE?
SINGLE VS. MULTIPLE DOCUMENTS
   Single Document Summarization
       Given a single document produce




                                                               Text Summarization - Machine Learning
         Abstract
         Outline

         Headline




   Multiple Document Summarization
       Given a group of document produce a gist of the
        document
         A series of news stories of the same event
         A set of webpages about some topic or question


                                                           3
QUERY-FOCUSED SUMMARIZATION
& GENERIC SUMMARIZATION
   Generic Summarization
       Summarize the content of a document




                                                                       Text Summarization - Machine Learning
   Query-focused Summarization
     Summarize a document with respect to an
      information need expressed in a user query
     A kind of complex question answering
           Answer a question by summarizing a document that has
            the information to construct the answer




                                                                   4
SUMMARIZATION FOR QUESTION
ANSWERING:
   Snippets
       Create snippets summarizing a web page for a query




                                                                        Text Summarization - Machine Learning
   Multiple Documents
       Create answer to complex questions summarizing
        multiple documents.
         Instead of giving a snippet for each document
         Create a cohesive answer that combines information from

          each document




                                                                    5
EXTRACTIVE SUMMARIZATION
& ABSTRACTIVE SUMMARIZATION
   Extractive Summarization:
       Create the summary from phrases or sentences in the
        source document(s)




                                                                  Text Summarization - Machine Learning
   Abstractive Summarization
       Express the ideas in the source document using
        different words




                                                              6
SUMMARIZATION: THREE STAGES
 Content Selection: choose sentences to extract
  from the document




                                                       Text Summarization - Machine Learning
 Information Ordering: choose an order to place
  them in the summary
 Sentence Realization: clean up the sentence




                                                   7
UNSUPERVISED CONTENT SELECTION
   Intuition Dating Back to Luhn (1958):
       Choose sentences that have distinguished or
        informative words




                                                                   Text Summarization - Machine Learning
   Two Approaches to Define distinguished words
       tf-idf: weigh each word wi in document j by tf-idf



       Topic signature: choose smaller set of distinguished
        words
           Log-likelihood ratio (LLR)


                                                               8
TOPIC SIGNATURE-BASED CONTENT
SELECTION WITH QUERIES

   Choose words that are informative either
       By log-likelihood ratio (LLR)




                                                       Text Summarization - Machine Learning
       Or by appearing in the query




       Weigh a sentence by weight of its words:


                                                   9
SUPERVISED CONTENT SELECTION
   Given
       A labeled training set of good summaries for each
        document




                                                               Text Summarization - Machine Learning
   Align
       The sentences in the document with sentences in the
        summary
   Extract Features
     Position
     Length of sentence
     Word informativeness
     Cohesion
                                                              10
SUPERVISED CONTENT SELECTION
   Train
       A binary classifier (put sentence in summary? Yes or
        no)




                                                                Text Summarization - Machine Learning
   Problems
       Hard to get labeled training data
       Alignment is difficult
       Performance not better that unsupervised algorithm




                                                               11
EVALUATING SUMMARIES: ROUGE
   ROUGE “ Recall Oriented Understudy for
    Gisting Evaluation ”




                                                    Text Summarization - Machine Learning
   Internal metric for automatically evaluating
    summaries
     Based on BLEU (a metric used for machine
      translation)
     Not as good as human evaluation.
     But much more convenient

                                                   12
EVALUATING SUMMARIES: ROUGE
   Given a document D, and an automatic
    summary X:




                                                           Text Summarization - Machine Learning
     Have N humans produce a set of reference
      summaries of D
     Run System, giving automatic summary X
     What percentage of the bigrams from the reference
      summaries appear in X?




                                                          13
EXAMPLE
 Human 1: water spinach is a green leafy
  vegetable grown in the tropics.




                                                 Text Summarization - Machine Learning
 Human 2: water spinach is a semi-aquatic
  tropical plant grown as a vegetable.
 Human 3: water spinach is a commonly eaten
  leaf vegetable of Asia.

   System: water spinach is a leaf vegetable
    commonly eaten in tropical areas of Asia.

   ROUGE -2=                = 12/28 = 0.43     14
ANSWERING HARDER QUESTION:
QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION

   The (bottom-up) snippet method
       Find a set of relevant documents




                                                                 Text Summarization - Machine Learning
       Extract informative sentences form the documents
       Order and modify the sentences into an answer



   The(top-down) information extraction method
       Build specific answers for different questions types:
         Definition questions
         Biography questions

         Certain medical questions

                                                                15
QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION




                                Text Summarization - Machine Learning
                               16
MAXIMAL MARGINAL RELEVANCE (MMR)
 An iterative method for content selection from
  multiple documents




                                                          Text Summarization - Machine Learning
 Iteratively (greedily) choose the best sentence to
  insert in the summary/answer so far:
       Relevant: maximally relevant to the user query
           High cosine similarity to the query
       Novel: minimally redundant with the summary so
        far:
           Low cosine similarity to the summary




                                                         17
   Stop when desired length
LLR + MMR CHOOSING INFORMATIVE YET
NON-REDUNDANT SENTENCES

   One of many ways to combine the intuitions of
    LLR and MMR:




                                                           Text Summarization - Machine Learning
     Score each sentence based on LLR (including query
      words)
     Include the sentence with highest score in the
      summary
     Iteratively add into the summary high-scoring
      sentences that are not redundant with the summary
      so far.


                                                          18
INFORMATION ORDERING
   Chronological ordering:
       Order sentences by the date of the document “ for
        summarizing news”




                                                               Text Summarization - Machine Learning
   Coherence:
     Choose ordering that make neighboring sentences
      similar(by cosine similarity)
     Choose ordering in which neighboring sentences
      discuss the same entity


   Topical ordering
                                                              19
       Learn the ordering of topics in the source document
DOMAIN-SPECIFIC ANSWERING:
THE INFORMATION EXTRACTION METHOD

   A good biography of a person contains:
       A person’s birth/death, fame factor, education …etc




                                                               Text Summarization - Machine Learning
   A good definition contains
       Type or category “ The Hajj is a type of ritual ”
   A medical answer about a drug’s use contains:
     The problem : medical condition
     The intervention : drug or procedure
     The outcome : the result of the study




                                                              20
INFORMATION THAT SHOULD BE IN THE
ANSWER FOR 3 KINDS OF QUESTIONS




                                     Text Summarization - Machine Learning
                                    21
ARCHITECTURE FOR ANSWERING COMPLEX
QUESTIONS




                                      Text Summarization - Machine Learning
                                     22
Text Summarization - Machine Learning
                                                                             23
              NLP Stanford course.
REFERENCES:
                    
Text Summarization - Machine Learning
                        THANK YOU 
                                      24

Mais conteúdo relacionado

Mais procurados

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Yasir Khan
 

Mais procurados (20)

Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
NLP
NLPNLP
NLP
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
 
Hidden markov model ppt
Hidden markov model pptHidden markov model ppt
Hidden markov model ppt
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
TEXT SUMMARIZATION
TEXT SUMMARIZATIONTEXT SUMMARIZATION
TEXT SUMMARIZATION
 
Text Summarization Using AI Tools.pptx
Text Summarization Using AI Tools.pptxText Summarization Using AI Tools.pptx
Text Summarization Using AI Tools.pptx
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
Text Classification
Text ClassificationText Classification
Text Classification
 
Language models
Language modelsLanguage models
Language models
 
Image captioning
Image captioningImage captioning
Image captioning
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Text prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language ModelText prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language Model
 

Semelhante a Text summarization

Supporting program comprehension with source code summarization
Supporting program comprehension with source code summarizationSupporting program comprehension with source code summarization
Supporting program comprehension with source code summarization
Masud Rahman
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
Ashish Kulkarni
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
damom77
 

Semelhante a Text summarization (20)

Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguistics
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizer
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news article
 
NLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docxNLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docx
 
K0936266
K0936266K0936266
K0936266
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
Article Summarizer
Article SummarizerArticle Summarizer
Article Summarizer
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
 
Supporting program comprehension with source code summarization
Supporting program comprehension with source code summarizationSupporting program comprehension with source code summarization
Supporting program comprehension with source code summarization
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
 
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
Side final 2
Side final 2Side final 2
Side final 2
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Query Based Summarization
Query Based SummarizationQuery Based Summarization
Query Based Summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 

Último

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Último (20)

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

Text summarization

  • 1. Text Summarization - Machine Learning TEXT SUMMARIZATION 1 Kareem El-Sayed Hashem Mohamed Mohsen Brary
  • 2. TEXT SUMMARIZATION  Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text. Text Summarization - Machine Learning  Summarization Applications  summaries of email threads  action items from a meeting  simplifying text by compressing sentences 2
  • 3. WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS  Single Document Summarization  Given a single document produce Text Summarization - Machine Learning  Abstract  Outline  Headline  Multiple Document Summarization  Given a group of document produce a gist of the document  A series of news stories of the same event  A set of webpages about some topic or question 3
  • 4. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION  Generic Summarization  Summarize the content of a document Text Summarization - Machine Learning  Query-focused Summarization  Summarize a document with respect to an information need expressed in a user query  A kind of complex question answering  Answer a question by summarizing a document that has the information to construct the answer 4
  • 5. SUMMARIZATION FOR QUESTION ANSWERING:  Snippets  Create snippets summarizing a web page for a query Text Summarization - Machine Learning  Multiple Documents  Create answer to complex questions summarizing multiple documents.  Instead of giving a snippet for each document  Create a cohesive answer that combines information from each document 5
  • 6. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION  Extractive Summarization:  Create the summary from phrases or sentences in the source document(s) Text Summarization - Machine Learning  Abstractive Summarization  Express the ideas in the source document using different words 6
  • 7. SUMMARIZATION: THREE STAGES  Content Selection: choose sentences to extract from the document Text Summarization - Machine Learning  Information Ordering: choose an order to place them in the summary  Sentence Realization: clean up the sentence 7
  • 8. UNSUPERVISED CONTENT SELECTION  Intuition Dating Back to Luhn (1958):  Choose sentences that have distinguished or informative words Text Summarization - Machine Learning  Two Approaches to Define distinguished words  tf-idf: weigh each word wi in document j by tf-idf  Topic signature: choose smaller set of distinguished words  Log-likelihood ratio (LLR) 8
  • 9. TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES  Choose words that are informative either  By log-likelihood ratio (LLR) Text Summarization - Machine Learning  Or by appearing in the query  Weigh a sentence by weight of its words: 9
  • 10. SUPERVISED CONTENT SELECTION  Given  A labeled training set of good summaries for each document Text Summarization - Machine Learning  Align  The sentences in the document with sentences in the summary  Extract Features  Position  Length of sentence  Word informativeness  Cohesion 10
  • 11. SUPERVISED CONTENT SELECTION  Train  A binary classifier (put sentence in summary? Yes or no) Text Summarization - Machine Learning  Problems  Hard to get labeled training data  Alignment is difficult  Performance not better that unsupervised algorithm 11
  • 12. EVALUATING SUMMARIES: ROUGE  ROUGE “ Recall Oriented Understudy for Gisting Evaluation ” Text Summarization - Machine Learning  Internal metric for automatically evaluating summaries  Based on BLEU (a metric used for machine translation)  Not as good as human evaluation.  But much more convenient 12
  • 13. EVALUATING SUMMARIES: ROUGE  Given a document D, and an automatic summary X: Text Summarization - Machine Learning  Have N humans produce a set of reference summaries of D  Run System, giving automatic summary X  What percentage of the bigrams from the reference summaries appear in X? 13
  • 14. EXAMPLE  Human 1: water spinach is a green leafy vegetable grown in the tropics. Text Summarization - Machine Learning  Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable.  Human 3: water spinach is a commonly eaten leaf vegetable of Asia.  System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.  ROUGE -2= = 12/28 = 0.43 14
  • 15. ANSWERING HARDER QUESTION: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION  The (bottom-up) snippet method  Find a set of relevant documents Text Summarization - Machine Learning  Extract informative sentences form the documents  Order and modify the sentences into an answer  The(top-down) information extraction method  Build specific answers for different questions types:  Definition questions  Biography questions  Certain medical questions 15
  • 16. QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION Text Summarization - Machine Learning 16
  • 17. MAXIMAL MARGINAL RELEVANCE (MMR)  An iterative method for content selection from multiple documents Text Summarization - Machine Learning  Iteratively (greedily) choose the best sentence to insert in the summary/answer so far:  Relevant: maximally relevant to the user query  High cosine similarity to the query  Novel: minimally redundant with the summary so far:  Low cosine similarity to the summary 17  Stop when desired length
  • 18. LLR + MMR CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES  One of many ways to combine the intuitions of LLR and MMR: Text Summarization - Machine Learning  Score each sentence based on LLR (including query words)  Include the sentence with highest score in the summary  Iteratively add into the summary high-scoring sentences that are not redundant with the summary so far. 18
  • 19. INFORMATION ORDERING  Chronological ordering:  Order sentences by the date of the document “ for summarizing news” Text Summarization - Machine Learning  Coherence:  Choose ordering that make neighboring sentences similar(by cosine similarity)  Choose ordering in which neighboring sentences discuss the same entity  Topical ordering 19  Learn the ordering of topics in the source document
  • 20. DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD  A good biography of a person contains:  A person’s birth/death, fame factor, education …etc Text Summarization - Machine Learning  A good definition contains  Type or category “ The Hajj is a type of ritual ”  A medical answer about a drug’s use contains:  The problem : medical condition  The intervention : drug or procedure  The outcome : the result of the study 20
  • 21. INFORMATION THAT SHOULD BE IN THE ANSWER FOR 3 KINDS OF QUESTIONS Text Summarization - Machine Learning 21
  • 22. ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS Text Summarization - Machine Learning 22
  • 23. Text Summarization - Machine Learning 23 NLP Stanford course. REFERENCES: 
  • 24. Text Summarization - Machine Learning THANK YOU  24