Text Summarization - Machine Learning
TEXT SUMMARIZATION
Kareem El-Sayed Hashem
Mohamed Mohsen Brary

TEXT SUMMARIZATION
   Goal: use a computer program to reduce a text to a summary that retains the most important points of the original text.
   Summarization Applications
       Summaries of email threads
       Action items from a meeting
       Simplifying text by compressing sentences

WHAT TO SUMMARIZE?
SINGLE VS. MULTIPLE DOCUMENTS
   Single Document Summarization
       Given a single document, produce:
           Abstract
           Outline
           Headline
   Multiple Document Summarization
       Given a group of documents, produce a gist of their content
           A series of news stories about the same event
           A set of webpages about some topic or question

QUERY-FOCUSED SUMMARIZATION
& GENERIC SUMMARIZATION
   Generic Summarization
       Summarize the content of a document
   Query-focused Summarization
       Summarize a document with respect to an information need expressed in a user query
       A kind of complex question answering
           Answer a question by summarizing a document that has the information to construct the answer

SUMMARIZATION FOR QUESTION
ANSWERING
   Snippets
       Create snippets summarizing a web page for a query
   Multiple Documents
       Create answers to complex questions by summarizing multiple documents
           Instead of giving a snippet for each document
           Create a cohesive answer that combines information from each document

EXTRACTIVE SUMMARIZATION
& ABSTRACTIVE SUMMARIZATION
   Extractive Summarization
       Create the summary from phrases or sentences in the source document(s)
   Abstractive Summarization
       Express the ideas in the source document using different words

SUMMARIZATION: THREE STAGES
   Content Selection: choose sentences to extract from the document
   Information Ordering: choose an order in which to place them in the summary
   Sentence Realization: clean up the extracted sentences

UNSUPERVISED CONTENT SELECTION
   Intuition dating back to Luhn (1958):
       Choose sentences that have distinguished or informative words
   Two approaches to defining distinguished words
       tf-idf: weight each word w_i in document j by its tf-idf score
       Topic signature: choose a smaller set of distinguished words
           Log-likelihood ratio (LLR)

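The tf-idf approach above can be sketched in a few lines. This is a minimal illustration, assuming the common definition weight(w_i) = tf_ij × log(N / n_i), where tf_ij is the raw count of word i in document j, N is the number of documents, and n_i is the number of documents containing word i. The function name `tf_idf_weights` and the toy documents are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)  # raw count of each word in this document
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


# Toy corpus (illustrative only).
docs = [
    "the spinach grows in water".split(),
    "the plant grows in soil".split(),
    "the spinach spinach salad".split(),
]
weights = tf_idf_weights(docs)
```

A word that occurs in every document (here "the") gets weight 0, while a word concentrated in one document gets a high weight, which matches the intuition of picking out distinguished words.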
TOPIC SIGNATURE-BASED CONTENT
SELECTION WITH QUERIES
   Choose words that are informative either
       By log-likelihood ratio (LLR)
       Or by appearing in the query
   Weight a sentence by the weights of its words

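One way to make this scoring concrete is sketched below. It assumes the binomial log-likelihood ratio statistic (-2 log λ) comparing a word's rate in the input against a background corpus, a binary word weight (1 if the word passes an LLR cutoff or appears in the query, 0 otherwise), and a sentence weight equal to the average of its word weights. The cutoff of 10, the averaging, and all function names are assumptions for illustration; the slides do not specify them.

```python
import math


def _log_likelihood(p, k, n):
    """Binomial log-likelihood of k hits in n trials, treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1 - p)
    return total


def llr(k1, n1, k2, n2):
    """-2 log lambda for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the "no difference" hypothesis
    return 2 * (
        _log_likelihood(p1, k1, n1) + _log_likelihood(p2, k2, n2)
        - _log_likelihood(p, k1, n1) - _log_likelihood(p, k2, n2)
    )


def word_weight(word, llr_scores, query_words, threshold=10.0):
    """1.0 if the word is in the query or its LLR exceeds the (assumed) cutoff."""
    if word in query_words or llr_scores.get(word, 0.0) > threshold:
        return 1.0
    return 0.0


def sentence_weight(sentence, llr_scores, query_words, threshold=10.0):
    """Average word weight over a tokenized sentence (assumed aggregation)."""
    if not sentence:
        return 0.0
    total = sum(word_weight(w, llr_scores, query_words, threshold) for w in sentence)
    return total / len(sentence)
```

For example, `sentence_weight(["a", "b", "c", "d"], {"a": 20.0, "b": 3.0}, {"c"})` counts "a" (high LLR) and "c" (query word) as informative, so half of the sentence's words contribute.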
SUPERVISED CONTENT SELECTION
   Given
       A labeled training set of good summaries for each document
   Align
       The sentences in the document with sentences in the summary
   Extract Features
       Position
       Length of sentence
       Word informativeness
       Cohesion

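The feature-extraction step can be sketched as follows. This is only an illustration: it computes three of the listed features (position, length, word informativeness) for each sentence and leaves out cohesion, for which the slides give no definition. The function name `sentence_features`, the feature encodings, and the `informative_words` set are hypothetical choices.

```python
def sentence_features(sentences, informative_words):
    """Return one feature dict per tokenized sentence of a document."""
    features = []
    last_index = max(len(sentences) - 1, 1)  # avoids dividing by zero for one sentence
    for index, sentence in enumerate(sentences):
        hits = sum(1 for word in sentence if word in informative_words)
        features.append({
            "position": index / last_index,  # 0.0 = first sentence, 1.0 = last
            "length": len(sentence),         # number of tokens
            "informativeness": hits / len(sentence) if sentence else 0.0,
        })
    return features


# Toy document (illustrative only).
document = [
    ["water", "spinach", "is", "a", "vegetable"],
    ["it", "grows", "in", "water"],
    ["people", "eat", "it"],
]
feats = sentence_features(document, {"water", "spinach", "vegetable"})
```

Each feature dict, paired with a yes/no label obtained from the alignment step, would form one training example for a sentence classifier.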
SUPERVISED CONTENT SELECTION
   Train
       A binary classifier (put sentence in summary? yes or no)
   Problems
       Hard to get labeled training data
       Alignment is difficult
       Performance is not better than that of unsupervised algorithms

EVALUATING SUMMARIES: ROUGE
   ROUGE: “Recall-Oriented Understudy for Gisting Evaluation”
   Intrinsic metric for automatically evaluating summaries
       Based on BLEU (a metric used for machine translation)
       Not as good as human evaluation
       But much more convenient

EVALUATING SUMMARIES: ROUGE
   Given a document D and an automatic summary X:
       Have N humans produce a set of reference summaries of D
       Run the system, giving automatic summary X
       What percentage of the bigrams from the reference summaries appear in X?

EXAMPLE
   Human 1: Water spinach is a green leafy vegetable grown in the tropics.
   Human 2: Water spinach is a semi-aquatic tropical plant grown as a vegetable.
   Human 3: Water spinach is a commonly eaten leaf vegetable of Asia.
   System: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
   ROUGE-2 = 12/28 = 0.43

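The bigram-recall computation can be reproduced with a short sketch. It assumes a simple tokenizer (lowercasing, keeping hyphenated words such as "semi-aquatic" as one token) and sums matches and reference bigrams over all references; the function names are hypothetical. With this tokenization the three references contain 10, 10, and 9 bigrams, so the sketch reports 12/29 ≈ 0.41: the numerator agrees with the slide, while the denominator differs by one from the slide's 28.

```python
import re
from collections import Counter


def bigrams(text):
    """Lowercase, keep letters/digits/hyphens, and count adjacent token pairs."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(references, candidate):
    """Return (matched, total, recall) of reference bigrams found in the candidate."""
    candidate_bigrams = bigrams(candidate)
    matched = 0
    total = 0
    for reference in references:
        reference_bigrams = bigrams(reference)
        total += sum(reference_bigrams.values())
        # Clip each match at the number of times the candidate contains the bigram.
        matched += sum(
            min(count, candidate_bigrams[bigram])
            for bigram, count in reference_bigrams.items()
        )
    recall = matched / total if total else 0.0
    return matched, total, recall


references = [
    "Water spinach is a green leafy vegetable grown in the tropics.",
    "Water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "Water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = "Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."
matched, total, recall = rouge_2(references, system)
```

The per-reference matches are 3, 3, and 6: every reference shares "water spinach", "spinach is", and "is a" with the system summary, and the third also shares "commonly eaten", "leaf vegetable", and "of asia".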
ANSWERING HARDER QUESTIONS:
QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION
   The (bottom-up) snippet method
       Find a set of relevant documents
       Extract informative sentences from the documents
       Order and modify the sentences into an answer
   The (top-down) information extraction method
       Build specific answers for different question types:
           Definition questions
           Biography questions
           Certain medical questions

QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION
   [Figure not preserved]

MAXIMAL MARGINAL RELEVANCE (MMR)
   An iterative method for content selection from multiple documents
   Iteratively (greedily) choose the best sentence to insert in the summary/answer so far:
       Relevant: maximally relevant to the user query
           High cosine similarity to the query
       Novel: minimally redundant with the summary so far
           Low cosine similarity to the summary
   Stop when the summary reaches the desired length

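The greedy loop above can be sketched as follows. This is a minimal illustration, assuming bag-of-words cosine similarity and the usual linear trade-off score λ · sim(s, query) − (1 − λ) · max sim(s, selected). The parameter `lam`, its default of 0.5, measuring length in sentences, and the function names are assumptions, not values from the slides.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, max_sentences, lam=0.5):
    """Greedily pick sentence indices that are relevant to the query yet novel."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < max_sentences:
        def score(i):
            relevance = cosine(sentences[i], query)
            # Redundancy is the similarity to the closest already-selected sentence.
            redundancy = max(
                (cosine(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy input (illustrative only): sentence 1 nearly duplicates sentence 0.
sentences = [
    "water spinach is eaten in asia".split(),
    "water spinach is eaten in asia daily".split(),
    "water spinach grows in tropical water".split(),
]
query = "water spinach eaten asia".split()
picked = mmr_select(sentences, query, max_sentences=2)
```

With `lam=0.5` the near-duplicate sentence is passed over in favor of the less relevant but more novel one; setting `lam=1.0` ignores redundancy and ranks by relevance alone.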
LLR + MMR: CHOOSING INFORMATIVE YET
NON-REDUNDANT SENTENCES
   One of many ways to combine the intuitions of LLR and MMR:
       Score each sentence based on LLR (including query words)
       Include the sentence with the highest score in the summary
       Iteratively add to the summary high-scoring sentences that are not redundant with the summary so far

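A possible reading of this combination is sketched below. It takes precomputed sentence scores (for instance LLR-based sentence weights), walks the sentences from highest to lowest score, and skips any sentence whose cosine similarity to an already chosen sentence exceeds a cutoff. The cutoff `max_similarity`, its default value, and the function names are assumptions made for the sketch.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_non_redundant(sentences, scores, max_sentences, max_similarity=0.5):
    """Pick high-scoring sentence indices, skipping ones too similar to the summary."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    summary = []
    for i in ranked:
        if len(summary) == max_sentences:
            break
        if all(cosine(sentences[i], sentences[j]) <= max_similarity for j in summary):
            summary.append(i)
    return summary


# Toy input (illustrative only); the scores stand in for LLR-based weights.
sentences = [
    "water spinach is eaten in asia".split(),
    "water spinach is eaten in asia daily".split(),
    "the plant grows in tropical regions".split(),
]
scores = [0.9, 0.8, 0.5]
chosen = select_non_redundant(sentences, scores, max_sentences=2)
```

Here the second sentence scores well but is dropped as redundant, so the summary contains the first and third sentences.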
INFORMATION ORDERING
   Chronological ordering:
       Order sentences by the date of the document (for summarizing news)
   Coherence:
       Choose an ordering that makes neighboring sentences similar (by cosine similarity)
       Choose an ordering in which neighboring sentences discuss the same entity
   Topical ordering:
       Learn the ordering of topics in the source document

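The cosine-based coherence criterion can be illustrated with a brute-force sketch: try every ordering and keep the one with the highest total similarity between neighboring sentences. Exhaustive search is only practical for a handful of sentences and is an assumption made here for clarity; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two token lists using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def most_coherent_order(sentences):
    """Return the index order that maximizes similarity between neighbors."""
    def coherence(order):
        return sum(
            cosine(sentences[i], sentences[j]) for i, j in zip(order, order[1:])
        )

    return list(max(permutations(range(len(sentences))), key=coherence))


# Toy input (illustrative only): sentence 2 shares words with both others.
sentences = [
    "water spinach grows in water".split(),
    "asia eats this vegetable".split(),
    "water spinach is a vegetable eaten in asia".split(),
]
order = most_coherent_order(sentences)
```

The sentence that overlaps with both others ends up in the middle. An ordering and its reverse always score the same, so this criterion alone cannot decide which end comes first.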
DOMAIN-SPECIFIC ANSWERING:
THE INFORMATION EXTRACTION METHOD
   A good biography of a person contains:
       The person’s birth/death, fame factor, education, etc.
   A good definition contains:
       Type or category: “The Hajj is a type of ritual”
   A medical answer about a drug’s use contains:
       The problem: medical condition
       The intervention: drug or procedure
       The outcome: the result of the study

INFORMATION THAT SHOULD BE IN THE
ANSWER FOR 3 KINDS OF QUESTIONS
   [Figure not preserved]

ARCHITECTURE FOR ANSWERING COMPLEX
QUESTIONS
   [Figure not preserved]

REFERENCES:
   NLP Stanford course.

THANK YOU

Text summarization

  • 1. Text Summarization - Machine Learning TEXT SUMMARIZATION 1 Kareem El-Sayed Hashem Mohamed Mohsen Brary
  • 2. TEXT SUMMARIZATION  Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text. Text Summarization - Machine Learning  Summarization Applications  summaries of email threads  action items from a meeting  simplifying text by compressing sentences 2
  • 3. WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS
      • Single-document summarization: given a single document, produce an abstract, an outline, or a headline.
      • Multiple-document summarization: given a group of documents, produce a gist of their content, for example:
          • a series of news stories about the same event
          • a set of web pages about some topic or question
  • 4. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION
      • Generic summarization: summarize the content of a document.
      • Query-focused summarization: summarize a document with respect to an information need expressed in a user query.
          • A kind of complex question answering: answer a question by summarizing a document that has the information to construct the answer.
  • 5. SUMMARIZATION FOR QUESTION ANSWERING
      • Snippets: create snippets summarizing a web page for a query.
      • Multiple documents: create answers to complex questions by summarizing multiple documents.
          • Instead of giving a snippet for each document, create a cohesive answer that combines information from each document.
  • 6. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION
      • Extractive summarization: create the summary from phrases or sentences in the source document(s).
      • Abstractive summarization: express the ideas in the source document using different words.
  • 7. SUMMARIZATION: THREE STAGES
      • Content selection: choose sentences to extract from the document.
      • Information ordering: choose an order in which to place them in the summary.
      • Sentence realization: clean up the sentences.
  • 8. UNSUPERVISED CONTENT SELECTION
      • Intuition dating back to Luhn (1958): choose sentences that contain distinguished or informative words.
      • Two approaches to defining distinguished words:
          • tf-idf: weigh each word w_i in document j by its tf-idf score.
          • Topic signature: choose a smaller set of distinguished words using the log-likelihood ratio (LLR).
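The tf-idf approach on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular system: the tokenizer, the smoothed idf formula, and the choice to average word weights per sentence are all assumptions made for the example, as are the function names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_weights(doc_tokens, corpus_token_sets):
    """tf-idf weight of each word of one document against a background corpus."""
    n_docs = len(corpus_token_sets)
    weights = {}
    for word, tf in Counter(doc_tokens).items():
        df = sum(1 for doc in corpus_token_sets if word in doc)
        # Smoothed idf (an assumption): stays defined for words unseen in the corpus.
        idf = math.log((1 + n_docs) / (1 + df))
        weights[word] = tf * idf
    return weights


def score_sentences(sentences, corpus):
    """Score each sentence by the average tf-idf weight of its words."""
    doc_tokens = [t for s in sentences for t in tokenize(s)]
    weights = tfidf_weights(doc_tokens, [set(tokenize(d)) for d in corpus])
    scores = []
    for s in sentences:
        toks = tokenize(s)
        scores.append(sum(weights[t] for t in toks) / len(toks) if toks else 0.0)
    return scores
```

Sentences made of words that are frequent in the document but rare in the background corpus receive the highest scores; averaging keeps long sentences from winning on length alone.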
  • 9. TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES
      • Choose words that are informative either
          • by log-likelihood ratio (LLR), or
          • by appearing in the query.
      • Weigh a sentence by the weight of its words.
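One way to make the weighting on this slide concrete is sketched below. It assumes a binomial log-likelihood ratio test for each word, a cutoff of 10 on the statistic, a binary word weight (1 for signature or query words, 0 otherwise), and a sentence weight equal to the average word weight. The slide's own formula is not reproduced in this transcript, so every one of these choices, and all function names, should be read as assumptions.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _log_l(k, n, p):
    """Binomial log-likelihood of k occurrences in n tokens with probability p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp so log() stays defined
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1, p2, p = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
    return 2 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                - _log_l(k1, n1, p) - _log_l(k2, n2, p))


def topic_signature(fg_counts, bg_counts, threshold=10.0):
    """Words markedly more frequent in the input than in the background.
    The threshold of 10 is an assumed cutoff, not taken from the slides."""
    n1, n2 = sum(fg_counts.values()), sum(bg_counts.values())
    signature = set()
    for word, k1 in fg_counts.items():
        k2 = bg_counts.get(word, 0)
        if k1 / n1 > k2 / n2 and llr(k1, n1, k2, n2) > threshold:
            signature.add(word)
    return signature


def sentence_weight(sentence, signature, query_words):
    """Average word weight: 1 for signature or query words, 0 otherwise."""
    toks = tokenize(sentence)
    if not toks:
        return 0.0
    hits = sum(1 for t in toks if t in signature or t in query_words)
    return hits / len(toks)
```

The LLR statistic is two-sided, so the sketch also checks that the word is more frequent in the input than in the background before adding it to the signature.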
  • 10. SUPERVISED CONTENT SELECTION
      • Given: a labeled training set of good summaries for each document.
      • Align: the sentences in the document with sentences in the summary.
      • Extract features:
          • position
          • length of sentence
          • word informativeness
          • cohesion
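A feature extractor for three of the listed features might look like the sketch below. The feature definitions are assumptions chosen for illustration (relative position in the document, token count, and the fraction of tokens found in a supplied set of informative words); cohesion is left out because the slides do not say how it is measured.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentences, informative_words):
    """Return one feature dict per sentence (hypothetical feature definitions)."""
    n = len(sentences)
    rows = []
    for i, sentence in enumerate(sentences):
        toks = tokenize(sentence)
        informative = sum(1 for t in toks if t in informative_words)
        rows.append({
            # 0.0 for the first sentence, 1.0 for the last
            "position": i / (n - 1) if n > 1 else 0.0,
            # sentence length in tokens
            "length": len(toks),
            # share of tokens that are in the informative-word set
            "informativeness": informative / len(toks) if toks else 0.0,
        })
    return rows
```

Each row, paired with a yes/no label from the alignment step, would form one training example for a binary classifier.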
  • 11. SUPERVISED CONTENT SELECTION
      • Train: a binary classifier (put the sentence in the summary? yes or no).
      • Problems:
          • Labeled training data is hard to get.
          • Alignment is difficult.
          • Performance is not better than that of unsupervised algorithms.
  • 12. EVALUATING SUMMARIES: ROUGE
      • ROUGE: "Recall-Oriented Understudy for Gisting Evaluation"
      • Intrinsic metric for automatically evaluating summaries
      • Based on BLEU (a metric used for machine translation)
      • Not as good as human evaluation, but much more convenient
  • 13. EVALUATING SUMMARIES: ROUGE
      • Given a document D and an automatic summary X:
          • Have N humans produce a set of reference summaries of D.
          • Run the system, giving automatic summary X.
          • What percentage of the bigrams from the reference summaries appear in X?
  • 14. EXAMPLE
      • Human 1: water spinach is a green leafy vegetable grown in the tropics.
      • Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable.
      • Human 3: water spinach is a commonly eaten leaf vegetable of Asia.
      • System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
      • ROUGE-2 = 12/28 ≈ 0.43
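The bigram-recall computation from the two ROUGE slides can be sketched as follows. The tokenizer (lowercasing, keeping hyphenated words such as "semi-aquatic" as one token) and the clipped match count are assumptions of this sketch. Tokenization choices change the bigram counts, so the result need not match the slide's 12/28 exactly: with this tokenizer, 12 of 29 reference bigrams are matched.

```python
import re
from collections import Counter


def bigrams(text):
    """Multiset of adjacent word pairs; hyphenated words count as one token."""
    toks = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return Counter(zip(toks, toks[1:]))


def rouge_2(system, references):
    """Return (matched, total): reference bigrams found in the system summary,
    and the total number of reference bigrams."""
    sys_bi = bigrams(system)
    matched = total = 0
    for ref in references:
        ref_bi = bigrams(ref)
        # Clipped count: a bigram matches at most as often as the system has it.
        matched += sum(min(count, sys_bi[gram]) for gram, count in ref_bi.items())
        total += sum(ref_bi.values())
    return matched, total


references = [
    "water spinach is a green leafy vegetable grown in the tropics.",
    "water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = "water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."

matched, total = rouge_2(system, references)
print(f"ROUGE-2 = {matched}/{total} = {matched / total:.2f}")
```

The per-reference match counts are 3, 3, and 6: the first two references share only the opening "water spinach is a" with the system summary, while the third also shares "commonly eaten", "leaf vegetable", and "of Asia".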
  • 15. ANSWERING HARDER QUESTIONS: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION
      • The (bottom-up) snippet method:
          • Find a set of relevant documents.
          • Extract informative sentences from the documents.
          • Order and modify the sentences into an answer.
      • The (top-down) information extraction method:
          • Build specific answers for different question types:
              • definition questions
              • biography questions
              • certain medical questions
  • 16. QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION [figure]
  • 17. MAXIMAL MARGINAL RELEVANCE (MMR)
      • An iterative method for content selection from multiple documents.
      • Iteratively (greedily) choose the best sentence to insert into the summary/answer so far:
          • Relevant: maximally relevant to the user query (high cosine similarity to the query).
          • Novel: minimally redundant with the summary so far (low cosine similarity to the summary).
      • Stop when the summary reaches the desired length.
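The greedy loop described on this slide can be sketched as below. It is a minimal illustration under stated assumptions: sentences are compared as bag-of-words count vectors, redundancy is taken as the highest cosine similarity to any sentence already selected, and the trade-off parameter `lam` (here 0.5) is a hypothetical default rather than a value from the slides.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(query, sentences, max_sentences=2, lam=0.5):
    """Greedily pick sentences that are relevant to the query and novel
    with respect to the sentences already selected."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < max_sentences:
        def mmr_score(i):
            relevance = cosine(vecs[i], q)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

Raising `lam` favors relevance to the query; lowering it penalizes overlap with the summary more heavily. Here the stopping rule is a sentence count, which stands in for the desired summary length.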
  • 18. LLR + MMR: CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES
      • One of many ways to combine the intuitions of LLR and MMR:
          • Score each sentence based on LLR (including query words).
          • Include the sentence with the highest score in the summary.
          • Iteratively add to the summary high-scoring sentences that are not redundant with the summary so far.
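A sketch of this combination follows. It assumes the per-sentence scores have already been computed (for instance from LLR-based word weights) and treats a candidate as redundant when its cosine similarity to any chosen sentence reaches a threshold; the threshold value `max_sim` and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_non_redundant(sentences, scores, max_sentences=3, max_sim=0.5):
    """Walk sentences from highest to lowest score, keeping each one
    that is not too similar to a sentence already kept."""
    vecs = [bow(s) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == max_sentences:
            break
        if all(cosine(vecs[i], vecs[j]) < max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]
```

Unlike the MMR sketch, relevance and redundancy are not blended into one score here: the score decides the visiting order and the similarity threshold acts as a filter.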
  • 19. INFORMATION ORDERING
      • Chronological ordering:
          • Order sentences by the date of the document (for summarizing news).
      • Coherence:
          • Choose an ordering that makes neighboring sentences similar (by cosine similarity).
          • Choose an ordering in which neighboring sentences discuss the same entity.
      • Topical ordering:
          • Learn the ordering of topics in the source documents.
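Two of these orderings are easy to sketch. The chronological version assumes each extracted sentence carries its document date and its position in that document. The coherence version is a greedy approximation, an assumption of this sketch rather than the slides' method: it starts from the first sentence and repeatedly appends the remaining sentence most similar to the last one placed.

```python
import math
import re
from collections import Counter
from datetime import date


def bow(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def chronological_order(extracted):
    """extracted: list of (sentence, document_date, position_in_document)."""
    return [s for s, _, _ in sorted(extracted, key=lambda item: (item[1], item[2]))]


def coherence_order(sentences):
    """Greedy ordering that keeps neighboring sentences similar."""
    if not sentences:
        return []
    vecs = [bow(s) for s in sentences]
    order = [0]
    remaining = list(range(1, len(sentences)))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda i: cosine(vecs[last], vecs[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]


news = [
    ("Rescue teams arrived.", date(2020, 1, 2), 0),
    ("Damage was reported.", date(2020, 1, 1), 1),
    ("A storm hit the coast.", date(2020, 1, 1), 0),
]
print(chronological_order(news))
```

Finding the ordering that maximizes total neighbor similarity is a harder search problem; the greedy pass is only a cheap stand-in.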
  • 20. DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD
      • A good biography of a person contains:
          • the person's birth/death, fame factor, education, etc.
      • A good definition contains:
          • a type or category: "The Hajj is a type of ritual"
      • A medical answer about a drug's use contains:
          • the problem: the medical condition
          • the intervention: the drug or procedure
          • the outcome: the result of the study
  • 21. INFORMATION THAT SHOULD BE IN THE ANSWER FOR 3 KINDS OF QUESTIONS [figure]
  • 22. ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS [figure]
  • 23. REFERENCES: NLP Stanford course.
  • 24. THANK YOU