Tutorial on automatic summarization
1. Automatic summarisation in the Information Age
Constantin Orăsan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
http://www.summarizationonline.info
12th Sept 2009
5. Structure of the course
1 Introduction to automatic summarisation
What is a summary?
What is automatic summarisation
Context factors
Evaluation
General information about evaluation
Direct evaluation
Target-based evaluation
Task-based evaluation
Automatic evaluation
Evaluation conferences
2 Important methods in automatic summarisation
3 Automatic summarisation and the Internet
26. Summaries in everyday life
• Headlines: summaries of newspaper articles
• Table of contents: summary of a book, magazine
• Digest: summary of stories on the same topic
• Highlights: summary of an event (meeting, sport event, etc.)
• Abstract: summary of a scientific paper
• Bulletin: weather forecast, stock market, news
• Biography: resume, obituary
• Abridgment: of books
• Review: of books, music, plays
• Scale-downs: maps, thumbnails
• Trailer: from film, speech
27. Summaries in the context of this tutorial
• are produced from the text of one or several documents
• the summary is a text or a list of sentences
33. Definitions of summary
• “an abbreviated, accurate representation of the content of a
document preferably prepared by its author(s) for publication
with it. Such abstracts are also useful in access publications
and machine-readable databases” (American National
Standards Institute Inc., 1979)
• “an abstract summarises the essential contents of a particular
knowledge record, and it is a true surrogate of the document”
(Cleveland, 1983)
• “the primary function of abstracts is to indicate and predict
the structure and content of the text” (van Dijk, 1980)
36. Definitions of summary (II)
• “the abstract is a time saving device that can be used to find
a particular part of the article without reading it; [...] knowing
the structure in advance will help the reader to get into the
article; [...] as a summary of the article, it can serve as a
review, or as a clue to the content”. Also, an abstract gives
“an exact and concise knowledge of the total content of the
very much more lengthy original, a factual summary which is
both an elaboration of the title and a condensation of the
report [...] if comprehensive enough, it might replace reading
the article for some purposes” (Graetz, 1985).
• these definitions refer to human produced summaries
41. Definitions for automatic summaries
• these definitions are less ambitious
• “a concise representation of a document’s content to enable
the reader to determine its relevance to a specific
information” (Johnson, 1995)
• “a summary is a text produced from one or more texts, that
contains a significant portion of the information in the original
text(s), and is not longer than half of the original text(s)”.
(Hovy, 2003)
47. What is automatic (text) summarisation
• Text summarisation
• a reductive transformation of source text to summary text
through content reduction by selection and/or generalisation
on what is important in the source. (Sparck Jones, 1999)
• the process of distilling the most important information from a
source (or sources) to produce an abridged version for a
particular user (or users) and task (or tasks). (Mani and
Maybury, 1999)
• Automatic text summarisation = The process of producing
summaries automatically.
48. Related disciplines
There are many disciplines which are related to automatic
summarisation:
• automatic categorisation/classification
• term/keyword extraction
• information retrieval
• information extraction
• question answering
• text generation
• data/opinion mining
49. Automatic categorisation/classification
• Automatic text categorisation
• is the task of building software tools capable of classifying text
documents under predefined categories or subject codes
• each document can be in one or several categories
• examples of categories: Library of Congress subject headings
• Automatic text classification
• is usually considered broader than text categorisation
• includes text clustering and text categorisation
• it does not necessarily require the classes to be known in advance
• Examples: email/spam filtering, routing
50. Term/keyword extraction
• automatically identifies terms/keywords in texts
• a term is a word or group of words which are important in a
domain and represent a concept of the domain
• a keyword is an important word in a document, but it is not
necessarily a term
• terms and keywords are extracted using a mixture of
statistical and linguistic approaches
• automatic indexing identifies all the relevant occurrences of a
keyword in texts and produces indexes
51. Information retrieval (IR)
• Information retrieval attempts to find information relevant to
a user query and rank it according to its relevance
• the output is usually a list of documents, in some cases
together with relevant snippets from the documents
• Example: search engines
• needs to be able to deal with enormous quantities of
information and process information in any format (e.g. text,
image, video, etc.)
• is a field which has achieved a level of maturity and is used in
industry and business
• combines statistics, text analysis, link analysis and user
interfaces
54. Information extraction (IE)
• Information extraction is the automatic identification of
predefined types of entities, relations or events in free text
• quite often the best results are obtained by rule-based
approaches, but machine learning approaches are used more
and more
• can generate database records
• is domain dependent
• this field developed a lot as a result of the MUC conferences
• one of the tasks in the MUC conferences was to fill in
templates
• Example: Ford appointed Harriet Smith as president
• Person: Harriet Smith
• Job: president
• Company: Ford
55. Question answering (QA)
• Question answering aims at identifying the answer to a
question in a large collection of documents
• the information provided by QA is more focused than
information retrieval
• a QA system should be able to answer any question and
should not be restricted to a domain (like IE)
• the output can be the exact answer or a text snippet which
contains the answer
• the domain took off as a result of the introduction of the QA
track in TREC
• user-focused summarisation = open-domain question
answering
56. Text generation
• Text generation creates text from computer-internal
representations of information
• most generation systems rely on massive amounts of linguistic
knowledge and manually encoded rules for translating the
underlying representation into language
• text generation systems are very domain dependent
57. Data mining
• Data mining is the (semi)automatic discovery of trends,
patterns or unusual data across very large data sets, usually
for the purposes of decision making
• Text mining applies methods from data mining to textual
collections
• Processes really large amounts of data in order to find useful
information
• In many cases it is not known (clearly) what is sought
• Visualisation has a very important role in data mining
58. Opinion mining
• Opinion mining (OM) is a recent discipline at the crossroads
of information retrieval and computational linguistics which is
concerned not with the topic a document is about, but with
the opinion it expresses.
• Is usually applied to collections of documents (e.g. blogs) and
is seen as part of text/data mining
• Sentiment Analysis, Sentiment Classification, Opinion
Extraction are other names used in literature to identify this
discipline.
• Examples of OM problems:
• What is the general opinion on the proposed tax reform?
• How is popular opinion on the presidential candidates evolving?
• Which of our customers are unsatisfied? Why?
67. Context factors
• the context factors defined by Sparck Jones (1999; 2001)
represent a good way of characterising summaries
• they do not necessarily refer only to automatic summaries
• they do not even necessarily refer only to summaries
• there are three types of factors:
• input factors: characterise the input document(s)
• purpose factors: define the transformations necessary to obtain
the output
• output factors: characterise the produced summaries
68. Context factors
• Input factors: Form (structure, scale, medium, genre, language, format), Subject type, Unit
• Purpose factors: Situation, Use, Summary type, Coverage, Relation to source
• Output factors: Form (structure, scale, medium, language, format), Subject matter
71. Input factors - Form
• structure: explicit organisation of documents.
Can be problem - solution structure of scientific documents,
pyramidal structure of newspaper articles, presence of
embedded structure in text (e.g. rhetorical patterns)
• scale: the length of the documents
Different methods need to be used for a book and for a
newspaper article due to very different compression rates
• medium: natural language/sublanguage/specialised language
If the text is written in a sublanguage it is less ambiguous and
therefore it’s easier to process.
76. Input factors - Form
• language: monolingual/multilingual/cross-lingual
• Monolingual: the source and the output are in the same
language
• Multilingual: the input is in several languages and output in
one of these languages
• Cross-lingual: the language of the output is different from the
language of the source(s)
• formatting: whether the source is in any special formatting.
This is more a programming problem, but needs to be taken
into consideration if information is lost as a result of
conversion.
78. Input factors
• Subject type: intended readership
Indicates whether the source was written for the general
reader or for specific readers. It influences the amount of
background information present in the source.
• Unit: single/multiple sources (single vs. multi-document
summarisation)
mainly concerned with the amount of redundancy in the text
79. Why are input factors useful?
The input factors can be used to decide whether to summarise a text or not:
• Brandow, Mitze, and Rau (1995) use structure of the
document (presence of speech, tables, embedded lists, etc.)
to decide whether to summarise it or not.
• Louis and Nenkova (2009) train a system on DUC data to
determine whether the result is expected to be reliable or not.
87. Purpose factors
• Use: how the summary is used
• retrieving: the user uses the summary to decide whether to
read the whole document,
• substituting: use the summary instead of the full document,
• previewing: get the structure of the source, etc.
• Summary type: indicates what kind of summary is produced
• indicative summaries provide a brief description of the source
without going into details,
• informative summaries follow the main ideas and
structure of the source
• critical summaries give a description of the source and discuss
its contents (e.g. review articles can be considered critical
summaries)
93. Purpose factors
• Relation to source: whether the summary is an extract or
abstract
• extract: contains units directly extracted from the document
(i.e. paragraphs, sentences, clauses),
• abstract: includes units which are not present in the source
• Coverage: which type of information should be present in the
summary
• generic: the summary should cover all the important
information of the document,
• user-focused: the user indicates which should be the focus of
the summary
94. Output factors
• Scale (also referred to as compression rate): indicates the
length of the summary
• American National Standards Institute Inc. (1979)
recommends 250 words
• Borko and Bernier (1975) point out that imposing an arbitrary
limit on summaries is not good for their quality, but that a
length of around 10% is usually enough
• Hovy (2003) requires that the length of the summary is kept
to less than half of the source’s size
• Goldstein et al. (1999) point out that the summary length
seems to be independent of the length of the source
• the structure of the output can be influenced by the structure
of the input or by existing conventions
• the subject matter can be the same as the input, or can be
broader when background information is added
96. Why is evaluation necessary?
• Evaluation is very important because it allows us to assess the
results of a method or system
• Evaluation allows us to compare the results of different
methods or systems
• Some types of evaluation allow us to understand why a
method fails
• almost every field has its own specific evaluation methods
• there are several ways to perform evaluation
• How the system is considered
• How humans interact with the evaluation process
• What is measured
98. How the system is considered
• black-box evaluation:
• the system is considered opaque to the user
• the system is considered as a whole
• allows direct comparison between different systems
• does not explain the system’s performance
• glass-box evaluation:
• each of the system’s components is assessed in order to
understand how the final result is obtained
• is very time consuming and difficult
• relies on phenomena which are not fully understood (e.g. error
propagation)
100. How humans interact with the process
• off-line evaluation
• also called automatic evaluation because it does not require
human intervention
• usually involves the comparison between the system’s output
and a gold standard
• very often annotated corpora are used as gold standards
• are usually preferred because they are fast and not directly
influenced by human subjectivity
• can be repeated
• cannot be (easily) used in all the fields
• online evaluation
• requires humans to assess the output of the system according
to some guidelines
• is useful for those tasks where the output of the system cannot
be uniquely predicted (e.g. summarisation, text generation,
question answering, machine translation)
• are time consuming, expensive and cannot be easily repeated
102. What is measured
• intrinsic evaluation:
• evaluates the results of a system directly
• for example: quality, informativeness
• sometimes does not give a very accurate view of how useful
the output can be for another task
• extrinsic evaluation:
• evaluates the results of another system which uses the results
of the first
• examples: post-edit measures, relevance assessment, reading
comprehension
104. Evaluation used in automatic
summarisation
• evaluation is a very difficult task because there is no clear idea
what constitutes a good summary
• the number of perfectly acceptable summaries from a text is
not limited
• four types of evaluation methods
              Intrinsic                   Extrinsic
On-line       Direct evaluation           Task-based evaluation
Off-line      Target-based evaluation     Automatic evaluation
105. Direct evaluation
• intrinsic & online evaluation
• requires humans to read summaries and measure their quality
and informativeness according to some guidelines
• is one of the first evaluation methods used in automatic
summarisation
• to a certain extent it is quite straightforward, which makes it
appealing for small scale evaluation
• it is time consuming, subjective and in many cases cannot be
repeated by others
106. Direct evaluation: quality
• it tries to assess the quality of a summary independently from
the source
• can be a simple classification of sentences as acceptable or
unacceptable
• Minel, Nugier, and Piat (1997) proposed an evaluation
protocol which considers the coherence, cohesion and legibility
of summaries
• cohesion of a summary is measured in terms of dangling
anaphors
• the coherence in terms of discourse ruptures.
• the legibility is decided by jurors who are requested to classify
each summary as very bad, bad, mediocre, good or very good.
• it does not assess the contents of a summary so it could be
misleading
107. Direct evaluation: informativeness
• assesses how correctly the information in the source is
reflected in the summary
• the judges are required to read both the source and the
summary, which makes the process longer and more
expensive
• judges are generally required to:
• identify important ideas from the source which do not appear
in the summary
• identify ideas from the summary which are not important enough and
therefore should not be there
• identify the logical development of the ideas and see whether
they appear in the summary
• given that this is time consuming, automatic methods for
computing informativeness are preferred
108. Target-based evaluation
• it is the most used evaluation method
• compares the automatic summary with a gold standard
• they are appropriate for extractive summarisation methods
• it is intrinsic and off-line
• it does not require humans to be involved in the evaluation
• has the advantage of being fast, cheap and can be repeated
by other researchers
• the drawback is that it requires a gold standard which usually
is not easy to produce
109. Corpora as gold standards
• usually annotated corpora are used as gold standard
• usually the annotation is very simple: for each sentence it
indicates whether it is important enough to be included in the
summary or not
• such corpora are normally used to assess extracts
• can be produced manually and automatically
• these corpora normally represent one point of view
110. Manually produced corpora
• Require human judges to read each text from the corpus and
to identify the important units in each text according to
guidelines
• Kupiec, Pedersen, and Chen (1995) and Teufel and Moens
(1997) took advantage of the existence of human produced
abstracts and asked human annotators to align sentences from
the document with sentences from the abstracts.
• it is not necessary to use specialised tools to apply this
annotation, but in many cases they can help
111. Guidelines for manually annotated corpora
• Edmundson (1969) annotated a heterogeneous corpus
consisting of 200 documents in the fields of physics, life
science, information science and humanities. The important
sentences were considered to be those which indicated:
• what the subject area is,
• why the research is necessary,
• how the problem is solved,
• which are the findings of the research.
• Hasler, Orăsan, and Mitkov (2003) annotated a corpus of
newspaper articles and the important sentences were
considered those linked to the main topic of the text as indicated
in the title (see http://clg.wlv.ac.uk/projects/CAST/ for the
complete guidelines)
112. Problems with manually produced corpora
• given how subjective the identification of important sentences
is, the agreement between annotators is low
• the inter-annotator agreement is determined by the genre of
texts and the length of summaries
• Hasler, Orăsan, and Mitkov (2003) tried to measure the
agreement between three annotators and noticed a very low
value, but
• when the content is compared the agreement increases
113. Automatically produced corpora
• Relies on the fact that humans very often produce summaries
by copy-pasting from the source
• there are algorithms which identify sets of sentences from the
source which cover the information in the summary
• Marcu (1999) employed a greedy algorithm which eliminates
sentences from the whole document that do not reduce the
similarity between the summary and the remaining sentences.
• Jing and McKeown (1999) treat the human produced abstract
as a sequence of words which appears in the document, and
reformulate the problem of alignment as the problem of
finding the most likely position of the words from the abstract
in the full document using a Hidden Markov Model.
114. Evaluation measures used with annotated
corpora
• usually precision, recall and f-measure are used to calculate
the performance of a system
• the list of sentences extracted by the program is compared
with the list of sentences marked by humans
                         Extracted by program   Not extracted by program
Extracted by humans      True positives         False negatives
Not extracted by humans  False positives        True negatives

Precision = TruePositives / (TruePositives + FalsePositives)

Recall = TruePositives / (TruePositives + FalseNegatives)

F-score = ((β² + 1) * P * R) / (β² * P + R)
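A minimal sketch of how these measures can be computed when both the system output and the human annotation are represented as sets of sentence identifiers (the identifiers and the example values below are hypothetical):

def evaluate_extract(system_sentences, human_sentences, beta=1.0):
    """Compare two sets of sentence identifiers and return
    precision, recall and F-score (beta weights recall)."""
    system = set(system_sentences)
    human = set(human_sentences)

    true_positives = len(system & human)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(human) if human else 0.0

    if precision == 0.0 and recall == 0.0:
        f_score = 0.0
    else:
        f_score = ((beta ** 2 + 1) * precision * recall) / (beta ** 2 * precision + recall)
    return precision, recall, f_score

# Hypothetical example: sentences 1, 3, 5 extracted by the program,
# sentences 1, 2, 3 marked as important by the human annotator.
print(evaluate_extract({1, 3, 5}, {1, 2, 3}))  # (0.666..., 0.666..., 0.666...)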
115. Summary Evaluation Environment (SEE)
• the SEE environment was used in the DUC evaluations
• is a combination of direct and target-based evaluation
• it requires humans to assess whether each unit from the
automatic summary appears in the target summary
• it also offers the option to answer questions about the quality
of the summary (e.g. Does the summary build from sentence
to sentence to a coherent body of information about the
topic?)
117. Relative utility of sentences (Radev et al., 2000)
• Addresses the problem that humans often disagree when they
are asked to select the top n% sentences from a document
• Each sentence in the document receives a score from 1 to 10
depending on how “summary worthy” it is
• The score of an automatic summary is the normalised score of
the extracted sentences
• When several judges are available the score of a summary is
the average over all judges
• Can be used for any compression rate
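A small sketch of how such a score could be computed, assuming each judge’s scores are kept in a dictionary keyed by sentence identifier; normalising by the best score achievable with the same number of sentences is one common choice and is an assumption here, not a detail given on the slide:

def relative_utility(extracted, judge_scores):
    """judge_scores maps each sentence id to a 1-10 'summary worthiness'
    score given by one judge. The summary score is the utility of the
    extracted sentences divided by the best utility achievable with the
    same number of sentences."""
    achieved = sum(judge_scores[s] for s in extracted)
    best = sum(sorted(judge_scores.values(), reverse=True)[:len(extracted)])
    return achieved / best if best else 0.0

def average_relative_utility(extracted, all_judges):
    """Average the relative utility over several judges."""
    return sum(relative_utility(extracted, judge) for judge in all_judges) / len(all_judges)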
118. Target-based evaluation without
annotated corpora
• They require that the sources have a human provided
summary (but they do not need to be annotated)
• Donaway et al. (2000) propose to use cosine similarity
between an automatic summary and a human summary, but it
relies on word co-occurrences
• ROUGE uses the number of overlapping units (Lin, 2004)
• Nenkova and Passonneau (2004) proposed the pyramid
evaluation method which addresses the problem that different
people select different content when writing summaries
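A minimal sketch of the kind of content-based comparison proposed by Donaway et al., here as cosine similarity over raw word counts (whitespace tokenisation and lowercasing are simplifying assumptions):

from collections import Counter
import math

def cosine_similarity(summary, reference):
    """Cosine similarity between two texts represented as bags of words."""
    a = Counter(summary.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0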
119. ROUGE
• ROUGE = Recall-Oriented Understudy for Gisting Evaluation
(Lin, 2004)
• inspired by BLEU (Bilingual Evaluation Understudy) used in
machine translation (Papineni et al., 2002)
• Developed by Chin-Yew Lin and available at
http://berouge.com
• Compares quality of a summary by comparison with ideal
summaries
• Metrics count the number of overlapping units
• There are several versions depending on how the comparison
is made
120. ROUGE-N
A recall-oriented metric based on n-gram co-occurrence statistics
• S1: Police killed the gunman
• S2: Police kill the gunman
• S3: The gunman kill police
• S2=S3
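A rough sketch of ROUGE-N recall against a single reference (the real ROUGE toolkit supports multiple references, stemming and stopword removal, which are omitted here):

from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """ROUGE-N recall: overlapping n-grams divided by the number of
    n-grams in the reference summary (single-reference sketch)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / sum(ref.values()) if ref else 0.0

# Example from the slide (unigrams): both S2 and S3 share 3 of the
# 4 reference unigrams with S1, hence the same ROUGE-1 score.
print(rouge_n("Police kill the gunman", "Police killed the gunman", n=1))  # 0.75
print(rouge_n("The gunman kill police", "Police killed the gunman", n=1))  # 0.75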
121. ROUGE-L
Longest common subsequence
• S1: police killed the gunman
• S2: police kill the gunman
• S3: the gunman kill police
• S2 = 3/4 (police the gunman)
• S3 = 2/4 (the gunman)
• S2 > S3
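A corresponding sketch of the recall side of ROUGE-L, using a standard dynamic-programming longest common subsequence; the full metric also combines precision and recall into an F-score, which is left out here:

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l_recall(candidate, reference):
    """Recall side of ROUGE-L: LCS length divided by reference length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

print(rouge_l_recall("police kill the gunman", "police killed the gunman"))  # 3/4
print(rouge_l_recall("the gunman kill police", "police killed the gunman"))  # 2/4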
122. ROUGE-W
Weighted Longest Common Subsequence
• S1: [A B C D E F G]
• S2: [A B C D H I J]
• S3: [A H B J C I D]
• ROUGE-W favours consecutive matches
• S2 better than S3
123. ROUGE-S
ROUGE-S: Skip-bigram recall metric
• Arbitrary in-sequence bigrams are computed
• S1: police killed the gunman (“police killed”, “police the”,
“police gunman”, “killed the”, “killed gunman”, “the
gunman”)
• S2: police kill the gunman (“police the”, “police gunman”,
“the gunman”)
• S3: the gunman kill police (“the gunman”)
• S4: the gunman police killed (“police killed”, “the gunman”)
• S2 better than S4 better than S3
• ROUGE-SU adds unigrams to ROUGE-S
124. ROUGE
• Experiments on DUC 2000–2003 data show good correlation
with human judgement
• Using multiple references achieved better correlation with
human judgement than just using a single reference.
• Stemming and removing stopwords improved correlation with
human judgement
125. Task-based evaluation
• is an extrinsic and on-line evaluation
• instead of evaluating the summaries directly, humans are
asked to perform tasks using summaries and the accuracy of
these tasks is measured
• the assumption is that the accuracy does not decrease when
good summaries are used
• the time needed should decrease
• Example of tasks: classification of summaries according to
predefined classes (Saggion and Lapalme, 2000), determining
the relevance of a summary to a topic (Miike et al., 1994; Oka
and Ueda, 2000), and reading comprehension (Morris, Kasper,
and Adams, 1992; Orăsan, Pekar, and Hasler, 2004).
126. Task-based evaluation
• this evaluation can be very useful because it assesses a summary
in real situations
• it is time consuming and requires humans to be involved in
the evaluation process
• in order to obtain statistically significant results a large
number of judges have to be involved
• this evaluation method has been used in evaluation
conferences
127. Automatic evaluation
• extrinsic and off-line evaluation method
• tries to replace humans in task-based evaluations with
automatic methods which perform the same task and are
evaluated automatically
• Examples:
• text retrieval (Brandow, Mitze, and Rau, 1995): increase in
precision but drastic reduction of recall
• text categorisation (Kolcz, Prabakarmurthi, and Kalita, 2001):
the performance of categorisation increases
• has the advantage of being fast and cheap, but in many cases
the tasks which can benefit from summaries are as difficult to
evaluate as automatic summarisation (e.g. Kuo et al. (2002)
proposed to use QA)
132. From intrinsic to extrinsic (Sparck Jones, 2007)
• semi-purpose: inspection (e.g. for proper English)
• quasi-purpose: comparison with models (e.g. ngrams, nuggets)
• pseudo-purpose: simulation of task contexts (e.g. action scenarios)
• full-purpose: operation in task context (e.g. report writing)
133. Evaluation conferences
• evaluation conferences are conferences where all the
participants have to complete the same task on a common set
of data
• these conferences allow direct comparison between the
participants
• such conferences led to quick advances in their fields: MUC
(information extraction), TREC (Information retrieval &
question answering), CLEF (question answering for
non-English languages and cross-lingual QA)
134. SUMMAC
• the first evaluation conference organised in automatic
summarisation (in 1998)
• 6 participants in the dry-run and 16 in the formal evaluation
• mainly extrinsic evaluation:
• ad-hoc task: determine the relevance of the source document to
a query (topic)
• categorisation: assign to each document a category on the basis
of its summary
• question answering: answer questions using the summary
• a small acceptability test where direct evaluation was used
135. SUMMAC
• the TREC dataset was used
• for the ad-hoc evaluation 20 topics, each with 50 documents,
were selected
• the time for the ad-hoc task halves with a slight reduction in
the accuracy (which is not significant)
• for the categorisation task 10 topics each with 100 documents
(5 categories)
• there is no difference in the classification accuracy, and the
time is reduced only for 10% summaries
• more details can be found in (Mani et al., 1998)
136. Text Summarization Challenge
• is an evaluation conference organised in Japan whose main
goal is to evaluate Japanese summarisers
• it was organised using the SUMMAC model
• precision and recall were used to evaluate single document
summaries
• humans had to assess the relevance of summaries of texts
retrieved for specific queries to those queries
• it also included some readability measures (e.g. how many
deletions, insertions and replacements were necessary)
• more details can be found in (Fukusima and Okumura, 2001;
Okumura, Fukusima, and Nanba, 2003)
137. Document Understanding Conference
(DUC)
• it is an evaluation conference organised as part of a larger
program called TIDES (Translingual Information Detection,
Extraction and Summarisation)
• organised from 2000
• at the beginning it was not that different from SUMMAC, but
in time more difficult tasks were introduced:
• 2001: single and multi-document generic summaries with 50,
100, 200, 400 words
• 2002: single and multi-document generic abstracts with 50,
100, 200, 400 words, and multi-document extracts with 200
and 400 words
• 2003: abstracts of documents and document sets with 10 and
100 words, and focused multi-document summaries
138. Document Understanding Conference
• in 2004 participants were required to produce short (<665
bytes) and very short (<75 bytes) summaries of single
documents and document sets, short document profile,
headlines
• from 2004 ROUGE was used as the evaluation method
• in 2005: short multiple document summaries, user-oriented
questions
• in 2006: same as in 2005 but also used pyramid evaluation
• more information available at: http://duc.nist.gov/
• in 2007: 250-word summaries, a 100-word update task; pyramid
evaluation was used as a community effort
• in 2008 DUC became TAC (Text Analysis Conference)
139. Structure of the course
1 Introduction to automatic summarisation
2 Important methods in automatic summarisation
How humans produce summaries
Single-document summarisation methods
Surface-based summarisation methods
Machine learning methods
Methods which exploit the discourse structure
Knowledge-rich methods
Multi-document summarisation methods
3 Automatic summarisation and the Internet
140. Ideal summary processing model
Source text(s) → (Interpretation) → Source representation → (Transformation) → Summary representation → (Generation) → Summary text
142. How humans summarise documents
• Determining how humans summarise documents is a difficult
task because it requires interdisciplinary research
• Endres-Niggemeyer (1998) breaks the process into three stages:
document exploration, relevance assessment and summary
production
• these have been determined through interviews with
professional summarisers
• use a top-down approach
• the expert summarisers do not attempt to understand the
source in great detail, instead they are trained to identify
snippets which contain important information
• very few automatic summarisation methods use an approach
similar to humans
143. Document exploration
• it’s the first step
• the source’s title, outline, layout and table of contents are
examined
• the genre of the texts is investigated because very often each
genre dictates a certain structure
• For example expository texts are expected to have a
problem-solution structure
• the abstractor’s knowledge about the source is represented as
a schema.
• schema = an abstractor’s prior knowledge of document types
and their information structure
144. Relevance assessment
• at this stage summarisers identify the theme and the thematic
structure
• theme = a structured mental representation of what the
document is about
• this structure allows identification of relations between text
chunks
• it is used to identify important information and to delete irrelevant
and unnecessary information
• the schema is populated with elements from the thematic
structure, producing an extended structure of the theme
145. Summary production
• the summary is produced from the expanded structure of the
theme
• in order to avoid producing a distorted summary, summarisers
rely mainly on copy/paste operations
• the chunks which are copied are reorganised to fit the new
structure
• standard sentence patterns are also used
• summary production is a long process which requires several
iterations
• checklists can be used
151. Single document summarisation
• Produces summaries from a single document
• There are two main approaches:
• automatic text extraction → produces extracts also referred to
as extract and rearrange
• automatic text abstraction → produces abstracts also referred
to as understand and generate
• Automatic text extraction is the most used method to
produce summaries
152. Automatic text extraction
• Extracts important sentences from the text using different
methods and produces an extract by displaying the important
sentences (usually in order of appearance)
• A large proportion of the sentences used in human-produced
summaries are sentences which have been extracted directly from the
text or which contain only minor modifications
• Uses different statistical, surface-based and machine learning
techniques to determine which sentences are important
• First attempts made in the 50s
153. Automatic text extraction
• These methods are quite robust
• The main drawback of this method is that it overlooks the
way in which relationships between concepts in the text are
realised by the use of anaphoric links and other discourse
devices
• Extracting paragraphs can solve some of these problems
• Some methods involve excluding the unimportant sentences
instead of extracting the important sentences
155. Term-based summarisation
• It was the first method used to produce summaries (Luhn, 1958)
• Relies on the assumption that important sentences have a
large number of important words
• The importance of a word is calculated using statistical
measures
• Even though this method is very simple it is still used in
combination with other methods
• A demo summariser which relies on term frequency can be
found at:
http://clg.wlv.ac.uk/projects/CAST/demos.php
156. How to compute the importance of a word
• Different methods can be used:
• Term frequency: how frequent a word is in the document
• TF*IDF: relies on how frequent a word is in a document and in
how many documents from a collection the word appears
TF*IDF(w) = TF(w) * log(Number of documents / Number of documents with w)
• other statistical measures; for examples see (Orăsan, 2009)
• Issues:
• stoplists should be used
• what should be counted: words, lemmas, truncation, stems
• how to select the document collection
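A small sketch of the TF*IDF computation above, assuming the document and the reference collection are already tokenised (the stoplist handling is a simplification):

import math
from collections import Counter

def tf_idf_scores(document_tokens, collection, stoplist=frozenset()):
    """TF*IDF(w) = TF(w) * log(N / number of documents containing w),
    where N is the size of the reference collection and collection is
    a list of tokenised documents."""
    tf = Counter(w for w in document_tokens if w not in stoplist)
    n_docs = len(collection)
    scores = {}
    for word, freq in tf.items():
        docs_with_word = sum(1 for doc in collection if word in doc)
        if docs_with_word:
            scores[word] = freq * math.log(n_docs / docs_with_word)
    return scores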
160. Term-based summarisation: the algorithm
(and can be used for other types of summarisers)
1 Score all the words in the source according to the selected
measure
2 Score all the sentences in the text by adding the scores of the
words from these sentences
3 Extract the sentences with top N scores
4 Present the extracted sentences in the original order
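A minimal sketch of the four steps using raw term frequency as the word-scoring measure (sentence splitting and tokenisation are assumed to have been done already):

from collections import Counter

def term_based_summary(sentences, top_n=3, stoplist=frozenset()):
    """Minimal term-frequency summariser following the four steps:
    score words, score sentences, pick the top N, keep original order."""
    tokenised = [s.lower().split() for s in sentences]

    # 1. Score all the words in the source (here: raw term frequency).
    word_scores = Counter(w for sent in tokenised for w in sent if w not in stoplist)

    # 2. Score each sentence by adding the scores of its words.
    sent_scores = [sum(word_scores.get(w, 0) for w in sent) for sent in tokenised]

    # 3. Select the indices of the top N scoring sentences.
    top = sorted(range(len(sentences)), key=lambda i: sent_scores[i], reverse=True)[:top_n]

    # 4. Present the extracted sentences in their original order.
    return [sentences[i] for i in sorted(top)]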
161. Position method
• It was noticed that in some genres important sentences appear
in predefined positions
• First used by Edmundson (1969)
• Depends very much from one genre to another:
• newswire: lead summary (the first few sentences of the text)
• scientific papers: the first/last sentences in the paragraph are
relevant for the topic of the paragraph (Baxendale, 1958)
• scientific papers: important information occurs in specific
sections of the document (introduction/conclusion)
• Lin and Hovy (1997) use a corpus to determine where
these important sentences occur
162. Title method
• words in titles and headings are positively relevant to
summarisation
• Edmundson (1969) noticed that this can lead to an increase in
performance of up to 8% if the scores of sentences which
include such words are increased
163. Cue words/indicating phrases
• Makes use of words or phrases classified as ”positive” or
”negative” which may indicate the topicality and thus the
sentence value in an abstract
• positive: significant, purpose, in this paper, we show,
• negative: Figure 1, believe, hardly, impossible, pronouns
• Paice (1981) proposes indicating phrases which are basically
patterns (e.g. [In] this paper/report/article we/I show)
164. Methods inspired by IR (Salton et al., 1997)
• decomposes a document in a set of paragraphs
• computes the similarity between paragraphs, which represents
the strength of the link between two paragraphs
• similar paragraphs are considered those which have a similarity
above a threshold
• paragraphs can be extracted according to different strategies
(e.g. the number of links they have, select connected
paragraphs)
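A rough sketch of this strategy, linking paragraphs whose cosine similarity exceeds a threshold and extracting the ones with the most links; the threshold value and the bag-of-words representation are simplifying assumptions:

from collections import Counter
import math

def bag(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_connected_paragraphs(paragraphs, threshold=0.2, top_n=2):
    """Link paragraphs whose similarity exceeds the threshold and
    extract the ones with the largest number of links."""
    bags = [bag(p) for p in paragraphs]
    links = [sum(1 for j, other in enumerate(bags)
                 if i != j and cosine(b, other) > threshold)
             for i, b in enumerate(bags)]
    ranked = sorted(range(len(paragraphs)), key=lambda i: links[i], reverse=True)[:top_n]
    return [paragraphs[i] for i in sorted(ranked)]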
167. How to combine different methods
• Edmundson (1969) used a linear combination of features:
Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)
• the weights were adjusted manually
• the best system was cue + title + position
• it is better to use machine learning methods to combine the
results of different modules
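A small sketch of such a linear combination; the feature values and weights below are purely illustrative, not the weights Edmundson used:

def combined_score(sentence_features, weights):
    """Linear combination of per-sentence feature scores:
    Weight(S) = alpha*Title(S) + beta*Cue(S) + gamma*Keyword(S) + delta*Position(S)."""
    return sum(weights[name] * value for name, value in sentence_features.items())

# Hypothetical feature values for one sentence and manually tuned weights.
features = {"title": 1.0, "cue": 0.5, "keyword": 2.3, "position": 1.0}
weights = {"title": 1.0, "cue": 1.0, "keyword": 0.5, "position": 1.5}
print(combined_score(features, weights))  # 4.15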
169. What is machine learning (ML)?
Mitchell (1997):
• “machine learning is concerned with the question of how to
construct computer programs that automatically improve with
experience”
• “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E”
170. What is machine learning? (2)
• Reasoning is based on the similarity between new situations
and the ones present in the training corpus
• In some cases it is possible to understand what is learnt
(e.g. If-then rules)
• But in many cases the knowledge learnt by an algorithm
cannot be easily understood (instance-based learning, neural
networks)
171. ML for language processing
• Has been widely employed in a large number of NLP
applications which range from part-of-speech tagging and
syntactic parsing to word-sense disambiguation and
coreference resolution.
• In NLP both symbolic methods (e.g. decision trees,
instance-based classifiers) and numerically oriented statistical
and neural-network training approaches were used
172. ML as classification task
Very often an NLP problem can be seen as a classification problem
• POS: finding the appropriate class of a word
• Segmentation (e.g. noun phrase extraction): each word is
classified as the beginning, end or inside of the segment
• Anaphora/coreference resolution: classify candidates in
antecedent/non-antecedent
173. Summarisation as a classification task
• Each example (instance) in the set to be learnt can be
described by a set of features f1 , f2 , ...fn
• The task is to find a way to assign an instance to one of the
m disjoint classes c1 , c2 , ..., cm
• The automatic summarisation process is usually transformed
into a classification one
• The features are different properties of sentences (e.g.
position, keywords, etc.)
• Two classes: extract/do-not-extract
• It is not always classification: it is possible to use the score or
automatically learnt rules as well
174. Kupiec et al. (1995)
• used a Bayesian classifier to combine different features
• the features were:
• if the length of a sentence is above a threshold (true/false)
• contains cue words (true/false)
• position in the paragraph (initial/middle/final)
• contains keywords (true/false)
• contains capitalised words (true/false)
• the training and testing corpus consisted of 188 documents
with summaries
• humans identified sentences from the full text which are used
in the summary
• the best combination was position + cue + length
• Teufel and Moens (1997) used a similar method for sentence
extraction
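As an illustration of the general setup (not a reproduction of the original Bayesian classifier), a small sketch using scikit-learn's CategoricalNB over the features listed above, with toy training data:

from sklearn.naive_bayes import CategoricalNB

# Each sentence is described by five features encoded as small integers:
# length above threshold, contains cue words, position in paragraph
# (0 = initial, 1 = middle, 2 = final), contains keywords, contains
# capitalised words. Labels: 1 = used in the summary, 0 = not used.
X_train = [
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 2, 1, 1],
    [0, 1, 0, 0, 0],
]
y_train = [1, 0, 1, 0]

model = CategoricalNB()
model.fit(X_train, y_train)

# Probability that a new sentence belongs to the "extract" class; sentences
# can then be ranked by this probability and the top ones selected.
print(model.predict_proba([[1, 1, 0, 1, 1]])[0][1])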
175. Mani and Bloedorn (1998)
• learn rules about how to classify sentences
• features used:
• location features: location of sentence in paragraph, sentence
in special section, etc.
• thematic features: tf score, tf*idf score, number of section
heading words
• cohesion features: number of sentences with a synonym link to
sentence
• user-focused features: number of terms relevant to the topic
• Example of rule learnt: IF sentence in conclusion & tf*idf high
& compression = 20% THEN summary sentence
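Rules of this kind can be applied as simple if-then tests; the field names and thresholds below are made up to mirror the example rule:

```python
# Hand-written illustration of applying a learnt if-then rule; the attribute
# names, the tf*idf threshold and the compression value are assumptions.
def summary_sentence(sent: dict) -> bool:
    return (sent["section"] == "conclusion"
            and sent["tfidf"] > 0.7          # "tf*idf high"
            and sent["compression"] == 0.20)

print(summary_sentence({"section": "conclusion", "tfidf": 0.9, "compression": 0.20}))  # True
```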
176. Other ML methods
• Osborne (2002) used maximum entropy with features such as
word pairs, sentence length, sentence position, discourse
features (e.g., whether sentence follows the “Introduction”,
etc.)
• Knight and Marcu (2000) use a noisy-channel model for sentence
compression
• Conroy et al. (2001) use HMMs
• most current methods use machine learning in some form
178. Methods which exploit discourse cohesion
• summarisation methods which use discourse structure usually
produce better quality summaries because they consider the
relations between the extracted chunks
• they rely on global discourse structure
• they are more difficult to implement because the theories on
which they are based are often complex and not fully
understood
• there are methods which use text cohesion and text coherence
• very often it is difficult to control the length of summaries
produced in this way
179. Methods which exploit text cohesion
• text cohesion involves relations between words, word senses,
referring expressions which determine how tightly connected
the text is
• (S13) “All we want is justice in our own country,” aboriginal
activist Charles Perkins told Tuesday's rally. ... (S14) “We
don't want budget cuts - it's hard enough as it is,” said
Perkins
• there are methods which exploit lexical chains and
coreferential chains
181. Lexical chains for text summarisation
• Telepattan system: Benbrahim and Ahmad (1995)
• two sentences are linked if their words are related by repetition,
synonymy, class/superclass membership or paraphrase
• sentences which have a number of links above a threshold
form a bond
• on the basis of the bonds a sentence has to the previous and
following sentences, it can be classified as opening, continuing
or closing a topic
• sentences are extracted on the basis of these
open-continue-end topic labels
• Barzilay and Elhadad (1997) implemented a more refined
version of the algorithm which includes ambiguity resolution
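A minimal sketch of the linking-and-bonding idea, using only word repetition (synonymy and class/superclass links would need a resource such as WordNet); the link threshold is an arbitrary choice:

```python
def links(sent_a: str, sent_b: str) -> int:
    # Count shared words as a crude stand-in for repetition links.
    return len(set(sent_a.lower().split()) & set(sent_b.lower().split()))

def bonds(sentences, link_threshold=2):
    """Return, for each sentence, the number of bonds to preceding and following sentences."""
    n = len(sentences)
    back, forward = [0] * n, [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if links(sentences[i], sentences[j]) >= link_threshold:
                forward[i] += 1
                back[j] += 1
    # A sentence with many forward bonds and few backward ones opens a topic,
    # the reverse pattern closes one; sentences are then extracted accordingly.
    return back, forward
```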
182. Using coreferential chains for text
summarisation
• method presented in (Azzam, Humphreys, Gaizauskas, 1999)
• the underlying idea is that it is possible to capture the most
important topic of a document by using a principal
coreferential chain
• The LaSIE system, extended with a focus-based algorithm for
resolving pronominal anaphora, was used to produce the
coreferential chains
183. Coreference chain selection
The summarisation module implements several selection criteria:
• Length of chain: prefers the chain with the most entries,
i.e. the most frequently mentioned entity in the text
• Spread of the chain: the distance between the earliest and the
latest entry in each chain
• Start of chain: prefers the chain which starts in the title or in the
first paragraph of the text (this criterion can be very useful
for some genres such as newswire)
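A small sketch of choosing a principal chain with these criteria; the chain representation (a list of (sentence index, mention) pairs) is an assumption made for the example, not the LaSIE data structure:

```python
# Illustrative selection of a "principal" coreference chain.
def principal_chain(chains, early_sentences={0, 1}):
    def score(chain):
        positions = [pos for pos, _ in chain]
        length = len(chain)                                  # most mentioned entity
        spread = max(positions) - min(positions)             # earliest to latest entry
        early_start = 1 if min(positions) in early_sentences else 0   # starts in title/first paragraph
        return (early_start, length, spread)
    return max(chains, key=score)

chains = [
    [(0, "the Prime Minister"), (2, "she"), (5, "Ms Smith")],
    [(3, "the budget"), (4, "it")],
]
print(principal_chain(chains))
```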
184. Summarisation methods which use
rhetorical structure of texts
• it is based on the Rhetorical Structure Theory (RST) (Mann
and Thompson, 1988)
• according to this theory text is organised in non-overlapping
spans which are linked by rhetorical relations and can be
organised in a tree structure
• there are two types of spans: nuclei and satellites
• a nucleus can be understood without satellites, but not the
other way around
• satellites can be removed in order to obtain a summary
• the most difficult part is to build the rhetorical structure of a
text
• Ono, Sumita and Miike (1994), Marcu (1997) and
Corston-Oliver (1998) present summarisation methods which
use the rhetorical structure of the text
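A toy illustration of why satellites can be dropped: given a rhetorical tree, keeping only the nuclei yields a shorter text. The tree representation below is invented for the example:

```python
# Toy RST tree: a node is either a string (leaf span) or a tuple
# (relation, nucleus_subtree, satellite_subtree). This encoding is an
# assumption for illustration, not any system's actual representation.
def summarise(node, keep_satellites=False):
    if isinstance(node, str):
        return [node]
    relation, nucleus, satellite = node
    spans = summarise(nucleus, keep_satellites)
    if keep_satellites:
        spans += summarise(satellite, keep_satellites)
    return spans

tree = ("evidence",
        ("elaboration", "Summarisation research is active.", "Many methods exist."),
        "For example, several evaluation campaigns are run every year.")
print(" ".join(summarise(tree)))                       # nuclei only: the short summary
print(" ".join(summarise(tree, keep_satellites=True))) # full text
```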
186. Summarisation using argumentative
zoning
• Teufel and Moens (2002) exploit the structure of scientific
documents in order to produce summaries
• the summarisation process is split into two parts
1 identification of important sentences using an approach similar
to the one proposed by Kupiec, Pederson, and Chen (1995)
2 recognition of the rhetorical roles of the extracted sentences
• for rhetorical roles the following classes are used: Aim,
Textual, Own, Background, Contrast, Basis, Other
188. Knowledge-rich methods
• Produce abstracts
• Most of them try to “understand” (at least partially) a text
and to make inferences before generating the summary
• The systems do not really understand the contents of the
documents, but they use different techniques to extract
the meaning
• Since this process requires a huge amount of world knowledge,
such systems are restricted to a specific domain
189. Knowledge-rich methods
• The abstracts obtained in this way are better in terms of
cohesion and coherence
• The abstracts produced in this way tend to be more
informative
• This method is also known as the understand and generate
approach
• This method extracts the information from the text and holds
it in some intermediate form
• The representation is then used as the input for a natural
language generator to produce an abstract
190. FRUMP (deJong, 1982)
• uses sketchy scripts to understand a situation
• these scripts only keep the information relevant to the event
and discard the rest
• 50 scripts were manually created
• words from the source activate scripts, and heuristics are used
to decide which script applies when more than one is
activated
198. Example of script used by FRUMP
1 The demonstrators arrive at the demonstration location
2 The demonstrators march
3 The police arrive on the scene
4 The demonstrators communicate with the target of the
demonstration
5 The demonstrators attack the target of the demonstration
6 The demonstrators attack the police
7 The police attack the demonstrators
8 The police arrest the demonstrators
199. FRUMP
• the evaluation of the system revealed that it could not process
a large number of stories because it did not have the
appropriate scripts
• the system is very difficult to port to a different domain
• sometimes it misinterprets stories: Vatican City.
The death of the Pope shakes the world. He passed away →
Earthquake in the Vatican. One dead.
• the advantage of this method is that the output can be in any
language
200. Concept-based abstracting (Paice and
Jones, 1993)
• Also referred to as extract and generate
• Summaries in the field of agriculture
• Relies on predefined text patterns such as this paper studies
the effect of [AGENT] on the [HLP] of [SPECIES] → This
paper studies the effect of G. pallida on the yield of potato.
• The summarisation process involves instantiation of patterns
with concepts from the source
• Each pattern has a weight which is used to decide whether the
generated sentence is included in the output
• This method is well suited to producing informative summaries
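A minimal sketch of pattern instantiation in this extract-and-generate style; the pattern and slot names follow the example above, while the weight, threshold and filler values are invented:

```python
# One pattern with an assumed weight; in a real system the slots would be
# filled with concepts extracted from the source document.
PATTERN = ("this paper studies the effect of {AGENT} on the {HLP} of {SPECIES}", 0.9)

def instantiate(pattern, weight_threshold, slots):
    template, weight = pattern
    if weight < weight_threshold:
        return None                    # generated sentence not included in the output
    return template.format(**slots)

print(instantiate(PATTERN, 0.5, {"AGENT": "G. pallida", "HLP": "yield", "SPECIES": "potato"}))
```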
201. Other knowledge-rich methods
• Rumelhart (1975) developed a system to understand and
summarise simple stories, using a grammar which generated
semantic interpretations of the story on the basis of
hand-coded rules.
• Alterman (1986) used local understanding
• Fum, Guida, and Tasso (1985) try to replicate the human
summarisation process
• Rau, Jacobs, and Zernik (1989) integrate a bottom-up
linguistic analyser with a top-down conceptual interpretation
203. Multi-document summarisation
• multi-document summarisation is the extension of
single-document summarisation to collections of related
documents
• methods from single-document summarisation can rarely be
used directly
• it is not sufficient to produce single-document summaries for
every document in the collection and then concatenate
them
• normally they are user-focused summaries
204. Issues with multi-document summaries
• the collections to be summarised can vary a lot in size, so
different methods might need to be used
• a much higher compression rate is needed
• redundancy
• ordering of sentences (usually the date of publication is used)
• similarities and differences between different texts need to be
considered
• contradictory information
• fragmentary information
205. IR-inspired methods
• the method of Salton et al. (1997) can be adapted to multi-document
summarisation
• instead of using paragraphs from one document, paragraphs
from all the documents are used
• the extraction strategies are kept
206. Maximal Marginal Relevance
• proposed by Goldstein et al. (2000)
• addresses the redundancy among multiple documents
• allows a balance between the diversity of the information and
relevance to a user query
• MMR(Q, R, S) = argmax_{Di ∈ R\S} [ λ · Sim1(Di, Q) − (1 − λ) · max_{Dj ∈ S} Sim2(Di, Dj) ]
where Q is the query, R the set of candidate units and S the set of units already selected
• can also be used for single-document summarisation
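A direct implementation of this selection step; Sim1 and Sim2 are passed in as placeholder similarity functions (e.g. cosine similarity):

```python
# One MMR selection step: pick the candidate that best balances relevance
# to the query against redundancy with what has already been selected.
def mmr_select(candidates, selected, query, sim1, sim2, lam=0.7):
    def score(d):
        relevance = sim1(d, query)
        redundancy = max((sim2(d, s) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy
    remaining = [d for d in candidates if d not in selected]
    return max(remaining, key=score)

# Repeatedly calling mmr_select and appending the result to `selected`
# builds the summary one unit at a time.
```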
207. Cohesion text maps
• uses knowledge based on lexical cohesion (Mani and Bloedorn,
1999)
• good for comparing pairs of documents and telling what is common
and what is different
• builds a graph from the texts: the nodes of the graph are the
words of the text; arcs represent adjacency, grammatical,
coreference and lexical similarity-based relations
• sentences are scored using the tf*idf metric
• the user query is used to traverse the graph (spreading
activation is used)
• to minimise redundancy in extracts, extraction can be greedy
so as to cover as many different terms as possible
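A very rough sketch of such a graph and one step of spreading activation from the query terms; only adjacency arcs are built here, and the other relation types (grammatical, coreference, lexical similarity) are omitted:

```python
from collections import defaultdict

def build_graph(words):
    # Adjacency arcs between word nodes only (a simplification).
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        graph[a].add(b)
        graph[b].add(a)
    return graph

def activate(graph, query_terms, decay=0.5):
    # Seed the query terms, then propagate a decayed activation to neighbours.
    activation = {w: 1.0 for w in query_terms if w in graph}
    for w in list(activation):
        for neighbour in graph[w]:
            activation[neighbour] = max(activation.get(neighbour, 0.0), decay * activation[w])
    return activation

graph = build_graph("the rally demanded justice for aboriginal communities".split())
print(activate(graph, {"justice"}))
```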
209. Theme fusion (Barzilay et al., 1999)
• used to avoid redundancy in multi-document summaries
• Theme = collection of similar sentences drawn from one or
more related documents
• Computes theme intersection: phrases which are common to
all sentences in a theme
• paraphrasing rules are used (active vs. passive, different orders
of adjuncts, classifier vs. apposition, ignoring certain
premodifiers in NPs, synonymy)
• generation is used to put the theme intersection together
210. Centroid-based summarisation
• a centroid = a set of words that are statistically important to
a cluster of documents
• each document is represented as a weighted vector of TF*IDF
scores
• each sentence receives a score equal to the sum of the
individual centroid values of its words
• sentence salience: Boguraev and Kennedy (1999)
• centroid score: Radev, Jing, and Budzikowska (2000)
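A minimal sketch of centroid-based sentence scoring; computing IDF from the cluster itself and taking the top-k TF*IDF words as the centroid are simplifications made for the example:

```python
from collections import Counter
import math

def centroid_scores(documents, sentences, top_k=10):
    n = len(documents)
    doc_tfs = [Counter(d.lower().split()) for d in documents]
    df = Counter(t for tf in doc_tfs for t in tf)
    # Aggregate TF*IDF weights over the cluster.
    tfidf = Counter()
    for tf in doc_tfs:
        for t, c in tf.items():
            tfidf[t] += c * math.log(n / df[t])
    centroid = dict(tfidf.most_common(top_k))      # statistically important words
    # Each sentence scores the sum of the centroid values of its words.
    return [sum(centroid.get(t, 0.0) for t in s.lower().split()) for s in sentences]
```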
211. Cross-document Structure Theory
• Cross-document Structure Theory (CST) provides a theoretical model
for issues that arise when trying to summarise multiple texts (Radev,
Otterbacher, and Zhang, 2004)
• describes relationships between two or more sentences from
different source documents related to the same topic
• similar to RST, but at cross-document level
• 18 domain-independent relations such as identity, equivalence,
subsumption, contradiction, overlap, fulfilment and
elaboration between text spans
• can be used to extract sentences and avoid redundancy
213. • New research topics have emerged at the confluence of
summarisation with other disciplines (e.g. question answering
and opinion mining)
• Many of these fields appeared as a result of the expansion of
the Internet
• The Internet is probably the largest source of information, but
it is largely unstructured and heterogeneous
• Multi-document summarisation is more necessary than ever
• Web content mining = extraction of useful information from
the Web
214. Challenges posed by the Web
• Huge amount of information
• Wide and diverse
• Information of all types e.g. structured data, texts, videos, etc.
• Semi-structured
• Linked
• Redundant
• Noisy
215. Summarisation of news on the Web
• Newsblaster (McKeown et al., 2002) summarises news from
the Web (http://newsblaster.cs.columbia.edu/)
• it is mainly statistical, but with symbolic elements
• it crawls the Web to identify stories (e.g. filters out ads),
clusters them into specific topics and produces a
multi-document summary
• theme sentences are analysed and fused together to produce
the summary
• summaries also contain images, selected using high-precision rules
• similar services: NewsInEssence, Google News, News Explorer
• tracking and updating are important features of such systems
216. Email summarisation
• email summarisation is more difficult because emails have a
dialogue structure
• Muresan et al. (2001) use machine learning to learn rules for
salient NP extraction
• Nenkova and Bagga (2003) developed a set of rules to
extract important sentences
• Newman and Blitzer (2003) use clustering to group messages
together and then extract a summary from each cluster
• Rambow et al. (2004) automatically learn rules to extract
sentences from emails
• these methods do not use many email-specific features, but in
general the subject of the first email is used as a query
217. Blog summarisation
• Zhou et al. (2006) see a blog entry as a summary of a news
story with personal opinions added; they produce a
summary by deleting sentences not related to the story
• Hu et al. (2007) use a blog's comments to identify words that
can be used to extract sentences from the blog
• Conrad et al. (2009) developed query-based opinion
summarisation for legal blog entries based on their TAC 2008
system
218. Opinion mining and summarisation
• find what reviewers liked and disliked about a product
• usually there is a large number of reviews, so an opinion summary
should be produced
• visualisation of the result is important and it may not be text
• analogous to, but different from, multi-document summarisation
219. Producing the opinion summary
A three-stage process (a sketch follows the list):
1 Extract the object features that have been commented on in each
review
2 Classify each opinion (e.g. as positive or negative)
3 Group feature synonyms and produce the summary (pros vs.
cons, detailed review, graphical representation)
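A toy end-to-end sketch of these three stages; the feature lexicon, sentiment word lists and review text are all invented for illustration:

```python
# Hypothetical feature lexicon mapping surface phrases to canonical features,
# plus toy sentiment word lists; a real system would extract and learn these.
FEATURE_SYNONYMS = {"battery": "battery", "battery life": "battery",
                    "screen": "screen", "display": "screen"}
POSITIVE, NEGATIVE = {"great", "good", "excellent"}, {"poor", "bad", "terrible"}

def summarise_reviews(reviews):
    summary = {}                                    # feature -> [n_positive, n_negative]
    for review in reviews:
        for sentence in review.lower().split("."):
            words = sentence.split()
            # 1 + 3: extract commented features and group synonyms at the same time.
            features = {f for phrase, f in FEATURE_SYNONYMS.items() if phrase in sentence}
            # 2: classify the opinion expressed in the sentence.
            pros = sum(w in POSITIVE for w in words)
            cons = sum(w in NEGATIVE for w in words)
            for feature in features:
                counts = summary.setdefault(feature, [0, 0])
                counts[0] += pros
                counts[1] += cons
    return summary

print(summarise_reviews(["The screen is great. Battery life is poor."]))
```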
220. Opinion summaries
• Mao and Lebanon (2007) suggest producing summaries that
track the sentiment flow within a document, i.e. how
sentiment orientation changes from one sentence to the next
• Pang and Lee (2008) suggest creating “subjectivity extracts”
• sometimes graph-based output seems much more appropriate
or useful than text-based output
• in traditional summarisation redundant information is often
discarded; in opinion summarisation one wants to track and
report the degree of redundancy, since in the opinion-oriented
setting the user is typically interested in the (relative) number
of times a given sentiment is expressed in the corpus
• there is much more contradictory information
221. Opinion summarisation at TAC
• the Text Analysis Conference 2008 (TAC) included an
opinion summarisation task based on blogs
• http://www.nist.gov/tac/
• generate summaries of opinions about targets
• What features do people dislike about Vista?
• a question answering system is used to extract snippets that
are passed to the summariser
222. QA and Summarisation at INEX 2009
• the QA track at INEX 2009 requires participants to answer
factual and complex questions
• the complex questions require aggregating the answer
from several documents
• e.g. What are the main applications of Bayesian networks in the
field of bioinformatics?
• for complex questions, evaluators will mark syntactic
incoherence, unresolved anaphora, redundancy and failure to
answer the question
• Wikipedia will be used as the document collection
223. Conclusions
• research in automatic summarisation is still very active, but
in many cases it merges with other fields
• evaluation is still a problem in summarisation
• the current state-of-the-art is still sentence extraction
• more language understanding needs to be added to the
systems
224. Thank you!
More information and updates at:
http://www.summarizationonline.info
226. Alterman, Richard. 1986. Summarisation in small. In N. Sharkey, editor, Advances in
cognitive science. Chichester, England, Ellis Horwood.
American National Standards Institute Inc. 1979. American National Standard for
Writing Abstracts. Technical Report ANSI Z39.14 – 1979, American National
Standards Institute, New York.
Baxendale, Phyllis B. 1958. Man-made index for technical literature - an experiment.
I.B.M. Journal of Research and Development, 2(4):354 – 361.
Boguraev, Branimir and Christopher Kennedy. 1999. Salience-based content
characterisation of text documents. In Inderjeet Mani and Mark T. Maybury, editors,
Advances in Automated Text Summarization. The MIT Press, pages 99 – 110.
Borko, Harold and Charles L. Bernier. 1975. Abstracting concepts and methods.
Academic Press, London.
Brandow, Ronald, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of
electronic publications by sentence selection. Information Processing & Management,
31(5):675 – 685.
Cleveland, Donald B. 1983. Introduction to Indexing and Abstracting. Libraries
Unlimited, Inc.
Conroy, John M., Judith D. Schlesinger, Dianne P. O’Leary, and Mary E. Okurowski.
2001. Using HMM and logistic regression to generate extract summaries for DUC. In
Proceedings of the 1st Document Understanding Conference, New Orleans, Louisiana
USA, September 13-14.
DeJong, G. 1982. An overview of the FRUMP system. In W. G. Lehnert and M. H.
Ringle, editors, Strategies for natural language processing. Hillsdale, NJ: Lawrence
Erlbaum, pages 149 – 176.
Edmundson, H. P. 1969. New methods in automatic extracting. Journal of the
Association for Computing Machinery, 16(2):264 – 285, April.
227. Endres-Niggemeyer, Brigitte. 1998. Summarizing information. Springer.
Fukusima, Takahiro and Manabu Okumura. 2001. Text Summarization Challenge:
Text summarization evaluation in Japan (TSC). In Proceedings of Automatic
Summarization Workshop.
Fum, Danilo, Giovanni Guida, and Carlo Tasso. 1985. Evaluating importance: a step
towards text summarisation. In Proceedings of the 9th International Joint Conference
on Artificial Intelligence, pages 840 – 844, Los Altos CA, August.
Goldstein, Jade, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell. 1999.
Summarizing text documents: Sentence selection and evaluation metrics. In
Proceedings of the 22nd Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, pages 121 – 128, Berkeley, California,
August, 15 – 19.
Goldstein, Jade, Vibhu O. Mittal, Jamie Carbonell, and Mark Kantrowitz. 2000.
Multi-Document Summarization by Sentence Extraction. In Udo Hahn, Chin-Yew Lin,
Inderjeet Mani, and Dragomir R. Radev, editors, Proceedings of the Workshop on
Automatic Summarization at the 6th Applied Natural Language Processing
Conference and the 1st Conference of the North American Chapter of the Association
for Computational Linguistics, Seattle, WA, April.
Graetz, Naomi. 1985. Teaching EFL students to extract structural information from
abstracts. In J. M. Ulign and A. K. Pugh, editors, Reading for Professional Purposes:
Methods and Materials in Teaching Languages. Leuven: Acco, pages 123–135.
Hasler, Laura, Constantin Orăsan, and Ruslan Mitkov. 2003. Building better corpora
for summarisation. In Proceedings of Corpus Linguistics 2003, pages 309 – 319,
Lancaster, UK, March, 28 – 31.
Hovy, Eduard. 2003. Text summarisation. In Ruslan Mitkov, editor, The Oxford
Handbook of computational linguistics. Oxford University Press, pages 583 – 598.
228. Jing, Hongyan and Kathleen R. McKeown. 1999. The decomposition of
human-written summary sentences. In Proceedings of the 22nd International
Conference on Research and Development in Information Retrieval (SIGIR’99), pages
129 – 136, University of Berkeley, CA, August.
Johnson, Frances. 1995. Automatic abstracting research. Library review, 44(8):28 –
36.
Knight, Kevin and Daniel Marcu. 2000. Statistics-based summarization — step one:
Sentence compression. In Proceedings of the 17th National Conference on Artificial
Intelligence (AAAI), pages 703 – 710, Austin, Texas, USA, July 30 – August 3.
Kolcz, Aleksander, Vidya Prabakarmurthi, and Jugal Kalita. 2001. Summarization as
feature selection for text categorization. In Proceedings of the 10th International
Conference on Information and Knowledge Management, pages 365 – 370, Atlanta,
Georgia, US, October 05 - 10.
Kuo, June-Jei, Hung-Chia Wung, Chuan-Jie Lin, and Hsin-Hsi Chen. 2002.
Multi-document summarization using informative words and its evaluation with a QA
system. In Proceedings of the Third International Conference on Intelligent Text
Processing and Computational Linguistics (CICLing-2002), pages 391 – 401, Mexico
City, Mexico, February, 17 – 23.
Kupiec, Julian, Jan Pederson, and Francine Chen. 1995. A trainable document
summarizer. In Proceedings of the 18th ACM/SIGIR Annual Conference on Research
and Development in Information Retrieval, pages 68 – 73, Seattle, July 09 – 13.
Lin, Chin-Yew. 2004. Rouge: a package for automatic evaluation of summaries. In
Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004),
Barcelona, Spain, July 25 - 26.
Lin, Chin-Yew and Eduard Hovy. 1997. Identifying topic by position. In Proceedings
of the 5th Conference on Applied Natural Language Processing, pages 283 – 290,
Washington, DC, March 31 – April 3.
229. Louis, Annie and Ani Nenkova. 2009. Performance confidence estimation for
automatic summarization. In Proceedings of the 12th Conference of the European
Chapter of the ACL, pages 541 – 548, Athens, Greece, March 30 – April 3.
Luhn, H. P. 1958. The automatic creation of literature abstracts. IBM Journal of
research and development, 2(2):159 – 165.
Mani, Inderjeet and Eric Bloedorn. 1998. Machine learning of generic and
user-focused summarization. In Proceedings of the Fifteenth National Conference on
Artificial Intelligence, pages 821 – 826, Madison, Wisconsin. MIT Press.
Mani, Inderjeet and Eric Bloedorn. 1999. Summarizing similarities and differences
among related documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances
in automatic text summarization. The MIT Press, chapter 23, pages 357 – 379.
Mani, Inderjeet, Therese Firmin, David House, Michael Chrzanowski, Gary Klein,
Lynette Hirshman, Beth Sundheim, and Leo Obrst. 1998. The TIPSTER SUMMAC
text summarisation evaluation: Final report. Technical Report MTR 98W0000138,
The MITRE Corporation.
Mani, Inderjeet and Mark T. Maybury, editors. 1999. Advances in automatic text
summarisation. MIT Press.
Marcu, Daniel. 1999. The automatic construction of large-scale corpora for
summarization research. In The 22nd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR’99), pages 137–144,
Berkeley, CA, August 15 – 19.
Marcu, Daniel. 2000. The theory and practice of discourse parsing and summarisation.
The MIT Press.
Miike, Seiji, Etsuo Itoh, Kenji Ono, and Kazuo Sumita. 1994. A full-text retrieval
system with a dynamic abstract generation function. In Proceedings of the 17th ACM
SIGIR conference, pages 152 – 161, Dublin, Ireland, 3-6 July. ACM/Springer.