2. Abstract
This article proposes a novel framework for
representing and measuring local coherence.
Central to this approach is the entity-grid
representation of discourse, which captures patterns
of entity distribution in a text. The algorithm
introduced in the article automatically abstracts a text
into a set of entity transition sequences and records
distributional, syntactic, and referential information
about discourse entities. We re-conceptualize
coherence assessment as a learning task and show
that our entity-based representation is well-suited for
ranking-based generation and text classification
tasks. Using the proposed representation, we achieve
good performance on text ordering, summary
coherence evaluation, and readability assessment.
3. Introduction
A key requirement for any system that produces text
is the coherence of its output.
Coherence theories are used in text generation, especially to produce output indistinguishable from human writing.
Previous efforts have relied on handcrafted rules,
valid only for limited domains, with no guarantee of
scalability or portability (Reiter and Dale 2000).
Furthermore, coherence constraints are often
embedded in complex representations (e.g., Asher
and Lascarides 2003) which are hard to implement in
a robust application.
4. Introduction
Here, the focus is on local coherence (sentence to sentence), which is also necessary for global coherence.
The key premise of our work is that the
distribution of entities in locally coherent texts
exhibits certain regularities.
Covered before in Centering Theory (Grosz,
Joshi, and Weinstein 1995) and other entity-
based theories of discourse (e.g., Givon 1987;
Prince 1981).
5. Introduction
The proposed entity-based representation of
discourse allows us to learn the properties of
coherent texts from a corpus, without recourse to
manual annotation or a predefined knowledge
base.
Usefulness tests: text ordering, automatic
evaluation of summary coherence, and
readability assessment.
Lapata formulates text ordering and summary evaluation as ranking problems, with a learning model.
6. Introduction
Evaluation:
In the text-ordering task our algorithm has to select a
maximally coherent sentence order from a set of
candidate permutations.
In the summary evaluation task, we compare the
rankings produced by the model against human
coherence judgments elicited for automatically
generated summaries.
In both experiments, our method yields improvements
over state-of-the-art models.
7. Introduction
Evaluation:
By incorporating coherence features stemming
from the proposed entity-based representation,
we improve the performance of a state-of-the-art
readability assessment system
8. Outline
2. Related Work
3. The Coherence Model
4. Experiment 1: Sentence Ordering
5. Experiment 2: Summary Coherence Rating
6. Experiment 3: Readability Assessment
7. Discussion and Conclusions
9. Related Work
2. Related Work
1. Summary of entity-based theories of discourse, and an overview of previous attempts to translate their underlying principles into computational coherence models.
2. Description of ranking approaches to natural
language generation and focus on coherence
metrics used in current text planners.
10. Related Work
2.1 Entity-Based Approaches to Local
Coherence
Entity-based accounts of local coherence are common.
Unifying assumption: discourse coherence is
achieved in view of the way discourse entities
are introduced and discussed.
Commonly formalized by devising constraints on
the linguistic realization and distribution of
discourse entities in coherent texts.
11. Related Work
Centering theory: salience concerns how entities
are realized in an utterance
Else: salience is defined in terms of topicality
(Chafe 1976; Prince 1978), predictability (Kuno
1972; Halliday and Hasan 1976), and cognitive
accessibility (Gundel, Hedberg, and Zacharski
1993)
12. Related Work
Entity-based theories: capture coherence by
characterizing the distribution of entities across
discourse utterances, distinguishing between
salient entities and the rest.
The intuition here is that texts about the same
discourse entity are perceived to be more
coherent than texts fraught with abrupt switches
from one topic to the next.
13. Related Work
Coherence is hard to model computationally, often because the underlying theories are not fully fleshed out.
Algorithms often bootstrap from manual annotations.
14. Related Work
B & L:
Not based on any particular theory
Inference model combines relevant information
(not manual annotations)
Emphasizes automatic computation for both the
underlying discourse representation and the
inference procedure
Automatic, albeit noisy, feature extraction allows
performing a large scale evaluation of differently
instantiated coherence models across genres and
applications.
15. Related Work
2.2 Ranking Approaches
Produce a large set of candidate outputs, then rank them based on desired features using a ranking function. A two-stage generate-and-rank system minimizes complexity.
Regarding coherence: text planning is important for coherent output, and the same iterated ranking approach applies to text plans.
Feature selection and weighting were done manually, which is not sufficient:
"The problem is far too complex and our knowledge of the issues involved so meager that only a token gesture can be made at this point." (Mellish et al. 1998, p. 100)
16. Related Work
B&L:
Introduce an entity-based representation of
discourse that is automatically computed from raw
text;
The representation reveals entity transition
patterns characteristic of coherent texts.
This can be easily translated into a large feature
space which lends itself naturally to the effective
learning of a ranking function, without explicit
manual involvement.
17. The Coherence Model
3.1 Entity-Grid Discourse Representation
Each text is represented by an entity grid, a two-
dimensional array that captures the distribution of
discourse entities across text sentences.
The rows of the grid correspond to
sentences, and the columns correspond to
discourse entities. By discourse entity we mean
a class of coreferent noun phrases.
Each grid cell thus contains a symbol reflecting whether the entity in question occurs as a subject (S), an object (O), or in another role (X), or is absent (–).
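As an illustration, the grid can be built as a mapping from entities to per-sentence role columns. This is a minimal sketch with hand-annotated entities and roles; in the paper these come from a coreference resolver and a parser, and an entity occurring several times in a sentence keeps its highest-ranked role (S over O over X).

```python
# Minimal entity-grid sketch. Entities and their grammatical roles per
# sentence are hand-annotated here; the paper derives them automatically
# with a coreference resolver and a syntactic parser.
def build_grid(sentences):
    """sentences: one {entity: role} dict per sentence, role in 'SOX'.
    Returns {entity: column}, where a column lists the entity's role
    in each sentence, with '-' marking absence."""
    entities = sorted({e for sent in sentences for e in sent})
    return {e: [sent.get(e, "-") for sent in sentences] for e in entities}

sentences = [
    {"Microsoft": "S", "trial": "X"},    # sentence 1
    {"Microsoft": "O", "evidence": "S"}, # sentence 2
    {"Microsoft": "S"},                  # sentence 3
]
grid = build_grid(sentences)
print(grid["Microsoft"])  # ['S', 'O', 'S']
print(grid["trial"])      # ['X', '-', '-']
```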
19. The Coherence Model
3.2 Entity Grids as Feature Vectors
Assumption: the distribution of entities in coherent
texts exhibits certain regularities reflected in grid
topology.
One would further expect, for instance, that entities corresponding to dense columns appear more often as subjects or objects.
20. The Coherence Model
Analysis revolves around local entity transitions:
A sequence in {S, O, X, –}^n that represents an entity's occurrences and syntactic roles in n adjacent sentences, together with its probability in the text.
Each text is represented by a fixed set of transition sequences using standard feature vector notation, which can be used for:
Learning algorithms
Identifying information relevant to coherence assessment
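A sketch of the feature mapping under these assumptions: each text becomes a vector of relative frequencies of length-n transitions, computed over all grid columns (the entity names and role labels here are illustrative).

```python
from itertools import product

def transition_features(grid, n=2):
    """Relative frequency of each length-n transition over all columns
    of an entity grid ({entity: list of 'S'/'O'/'X'/'-' per sentence})."""
    counts = {t: 0 for t in product("SOX-", repeat=n)}
    total = 0
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
            total += 1
    return {t: c / total for t, c in counts.items()}

grid = {"Microsoft": ["S", "O", "S"], "trial": ["X", "-", "-"]}
feats = transition_features(grid)
print(feats[("S", "O")])  # 0.25: 1 of the 4 bigram transitions
```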
21. The Coherence Model
3.3 Grid Construction: Linguistic Dimensions
What linguistic information is relevant for coherence prediction?
How should it be represented?
What should the parameters of a good automatic, computational model be?
22. The Coherence Model
Parameters:
Exploration of the parameter space guided by:
Linguistic importance of parameter (linked to local
coherence)
Accuracy of automatic computation
(granularity, etc.)
Size of the resulting feature space (too large is undesirable)
23. The Coherence Model
Entity extraction:
Co-reference resolution system (Ng and Cardie 2002), using various lexical, grammatical, semantic, and positional features.
For different domains/languages: simply cluster nouns by string identity. Works consistently.
Grammatical function:
Collins’ 1997 Parser
24. The Coherence Model
Salience:
Evaluated using two models: one treating all entities uniformly, and one discriminating between transitions of salient entities and the rest, with salience determined by frequency counts.
Each salience group's transitions are computed separately, then combined into a single feature vector.
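One way to read the salience-sensitive variant, as a sketch: split the grid columns by a frequency threshold (a hypothetical k), count transitions per group, and concatenate the two count sets into one feature vector with disjoint keys.

```python
def transition_counts(columns, n=2):
    """Count length-n role transitions over the given grid columns."""
    counts = {}
    for col in columns:
        for i in range(len(col) - n + 1):
            t = tuple(col[i:i + n])
            counts[t] = counts.get(t, 0) + 1
    return counts

def salience_features(grid, k=2):
    """Compute transition counts separately for salient entities
    (mentioned at least k times; k is an illustrative threshold)
    and for the rest, then combine them into one feature dict."""
    salient = [c for c in grid.values() if sum(r != "-" for r in c) >= k]
    other   = [c for c in grid.values() if sum(r != "-" for r in c) < k]
    feats = {("sal",) + t: v for t, v in transition_counts(salient).items()}
    feats.update({("non",) + t: v for t, v in transition_counts(other).items()})
    return feats

grid = {"Microsoft": ["S", "O", "S"], "trial": ["X", "-", "-"]}
feats = salience_features(grid)
print(feats[("sal", "S", "O")])  # 1: one S->O transition among salient entities
```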
25. The Coherence Model
With feature vector representation, coherence
assessment becomes a machine learning
problem.
By encoding texts as entity transition sequences, the algorithm can learn a ranking function instead of having it manually specified.
The feature vector representation can also be
used for conventional classification tasks (apart
from information ordering and summary
coherence rating).
26. Sentence Ordering
A document is treated as a bag of sentences, and the algorithm's task is to find the maximally coherent order.
Again, the algorithm is used here to rank alternative sentence orderings, not to find the optimal one (local coherence alone is not enough for that).
27. Sentence Ordering
4.1 Modeling
Training set: ordered pairs of alternative renderings (xij, xik) of the same document di, with j > k (xij ranked higher than xik).
Goal: find a parameter vector w yielding a ranking score function that minimizes violations of pairwise rankings in the training set:
∀ (xij, xik) ∈ r* : w · Φ(xij) > w · Φ(xik)
where r* is the optimal ranking, and Φ(xij) and Φ(xik) map the renderings onto features representing their coherence properties.
28. Sentence Ordering
4.1 Modeling
An ideal ranking function, represented by the weight vector w, would satisfy the condition:
w · (Φ(xij) − Φ(xik)) > 0  ∀ i, j, k such that j > k
Total number of training and test instances in
corpora:
Earthquakes: Train 1,896, Test 2,056
Accidents: Train 2,095, Test 2,087
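The pairwise constraint above can be sketched with a simple perceptron-style update; the paper actually uses an SVM ranker, and the features and data here are toy assumptions.

```python
import numpy as np

def train_ranker(pairs, dim, epochs=50, lr=0.1):
    """pairs: (phi_better, phi_worse) feature-vector pairs.
    Learns w so that w . phi_better > w . phi_worse on the training
    pairs; a perceptron-style stand-in for the paper's SVM ranker."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            if w @ (better - worse) <= 0:   # pairwise constraint violated
                w += lr * (better - worse)  # nudge w toward satisfying it
    return w

# toy features, e.g. [freq of S->S transitions, freq of '- -' transitions]
pairs = [(np.array([0.4, 0.1]), np.array([0.1, 0.5]))]
w = train_ranker(pairs, dim=2)
# w now scores the more coherent rendering higher on this pair
```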
29. Sentence Ordering
4.2 Method
Data: To acquire a large collection for training and testing, B&L created synthetic data in which the candidate set consists of a source document and permutations of its sentences.
Sources: Associated Press articles on earthquakes and the National Transportation Safety Board's aviation accident database.
100 source articles per domain, with up to 20 randomly generated permutations each for training.
30. Sentence Ordering
Comparison with State-of-the-Art Methods
Compared against Foltz, Kintsch, and Landauer (1998) and Barzilay and Lee (2004).
Both rely mainly on lexical information, unlike here.
FKL98: LSA coherence measure for semantic
relatedness of adjacent sentences
BL04: HMM, where states are topics
Evaluation Metric
Accuracy = correct predictions / size of test set.
Random baseline = 50% (binary choice).
32. Sentence Ordering
Comparison with SotA Methods:
Outperforms LSA on both domains, thanks to coreference and grammatical role information, a more holistic representation (spanning more than two sentences), and exposure to domain-relevant texts.
The HMM is comparable on the Earthquakes corpus but not on Accidents; the two models may be complementary.
35. Summary Coherence Rating
Tested model-induced rankings against human
rankings.
If successful, holds implications for automatic
evaluation of machine-generated texts.
Better suited than BLEU or ROUGE, which were not designed to measure coherence.
36. Summary Coherence Rating
5.1 Modeling
Summary coherence rating is also a ranking learning task, modeled as before.
37. Summary Coherence Rating
5.2 Data
Evaluation based on the Document Understanding Conference (DUC) 2003 data, which includes rated summaries.
These ratings were not sufficient, so B&L randomly selected 16 input document clusters and five systems that produced summaries.
Ratings were collected from 177 unpaid online volunteers, then validated by leave-one-out resampling and discretized into two classes.
Training: 144 summaries. Test: 80 pairwise ratings. Dev: 6 documents.
38. Summary Coherence Rating
In Experiment 1, the co-reference resolution tool was applied to human-written texts; here it is applied to automatically generated summaries.
Compared against LSA, but not Barzilay and Lee (2004), which is domain-dependent.
Performed significantly better (p < .01).
41. Readability Assessment
Can entity grids be used for style classification?
Judged against Schwarm and Ostendorf's (2005) method for assessing readability (among others).
42. Readability Assessment
As in S&O (2005), readability assessment is a classification task.
The training sample consists of n documents:
(x⃗1, y1), . . . , (x⃗n, yn),  x⃗i ∈ R^N,  yi ∈ {−1, +1}
where x⃗i is the feature vector for the ith document in the training sample and yi its (positive or negative) class label.
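The formulation above, sketched with a simple perceptron standing in for the SVM classifiers used in this line of work; the features and labels are toy assumptions.

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Binary classifier for (x_i, y_i) with y_i in {-1, +1}; a simple
    stand-in for the SVM classifiers used in readability work."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified: update
                w += yi * xi
    return w

# toy coherence features per document, e.g. two transition frequencies
X = np.array([[0.5, 0.1], [0.4, 0.2], [0.1, 0.5], [0.2, 0.6]])
y = np.array([1, 1, -1, -1])  # +1 = adult-level, -1 = elementary (toy)
w = train_perceptron(X, y)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
# preds agrees with y on this linearly separable toy set
```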
43. Readability Assessment
6.2 Method
Data: 107 articles from Encyclopedia Britannica and Britannica Elementary (from Barzilay and Elhadad 2003)
45. Readability Assessment
Features: Two versions. One with S&O's features: syntactic, semantic, and their combination (e.g., the Flesch-Kincaid formula).
One with additional coherence-based features in the entity transition notation (compared against LSA).
48. Discussion and Conclusions
Presented: novel framework for representing and
measuring text coherence.
Central to this framework is the entity-grid
representation of discourse, which captures
important patterns of sentence transitions.
Coherence assessment re-conceptualized as a learning task.
Good performance on text ordering, summary
coherence evaluation, and readability
assessment.
49. Discussion and Conclusions
The entity grid is a flexible, yet computationally
tractable, representation.
Three important parameters for grid construction:
the computation of coreferring entity classes,
the inclusion of syntactic knowledge;
the influence of salience.
Empirically validated the importance of salience
and syntactic information for coherence-based
models.
50. Discussion and Conclusions
Full coreference resolution is not perfect (mismatches between training and testing conditions).
Instead of an automatic coreference resolution system, entity classes can be approximated simply by string matching.
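A sketch of that approximation: treat the head noun, here naively taken as the last word of the phrase (a simplifying assumption), as the entity class, so mentions sharing a head cluster together without a coreference system.

```python
def entity_class(noun_phrase):
    """Approximate an entity class by the head noun, naively taken
    as the last word of the phrase (no parser, no coreference)."""
    return noun_phrase.lower().split()[-1]

mentions = ["the trial", "a long trial", "Microsoft"]
classes = {m: entity_class(m) for m in mentions}
# 'the trial' and 'a long trial' land in the same class, 'trial';
# 'Microsoft' forms its own class.
```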
51. Discussion and Conclusions
This approach is not a direct implementation of any particular theory; it favors automatic computation and breadth of coverage instead.
Findings:
pronominalization is a good indicator of document
coherence.
coherent texts are characterized by transitions with
particular properties which do not hold for all
discourses.
These models are sensitive to the domain at hand and
the type of texts under consideration (human-authored
vs. machine generated texts).
52. Discussion and Conclusions
Future work:
Augmenting entity-based representation with fine-
grained lexico-semantic knowledge.
cluster entities based on their semantic relatedness,
thereby creating a grid representation over lexical
chains.
develop fully lexicalized models, akin to traditional
language models.
Expanding grammatical categories to modifiers and
adjuncts may provide additional information, in
particular when considering machine generated texts.
Investigating whether the proposed discourse
representation and modeling approaches generalize
across different languages
Improving prediction on both local and global levels,
with the ultimate goal of handling longer texts.