Modeling Local Coherence: An Entity-Based Approach
Regina Barzilay (MIT), Mirella Lapata (UoE)
Computational Linguistics 34(1), 2008
Abstract
This article proposes a novel framework for
representing and measuring local coherence.
Central to this approach is the entity-grid
representation of discourse, which captures patterns
of entity distribution in a text. The algorithm
introduced in the article automatically abstracts a text
into a set of entity transition sequences and records
distributional, syntactic, and referential information
about discourse entities. We re-conceptualize
coherence assessment as a learning task and show
that our entity-based representation is well-suited for
ranking-based generation and text classification
tasks. Using the proposed representation, we achieve
good performance on text ordering, summary
coherence evaluation, and readability assessment.
Introduction
A key requirement for any system that produces text
is the coherence of its output.
Coherence theories are used in text generation, especially
when the output should be indistinguishable from human writing.
Previous efforts have relied on handcrafted rules,
valid only for limited domains, with no guarantee of
scalability or portability (Reiter and Dale 2000).
Furthermore, coherence constraints are often
embedded in complex representations (e.g., Asher
and Lascarides 2003) which are hard to implement in
a robust application.
Introduction
Here, the focus is on local coherence (sentence
to sentence). Necessary for global coherence,
too.

The key premise of our work is that the
distribution of entities in locally coherent texts
exhibits certain regularities.

Covered before in Centering Theory (Grosz,
Joshi, and Weinstein 1995) and other entity-
based theories of discourse (e.g., Givon 1987;
Prince 1981).
Introduction
The proposed entity-based representation of
discourse allows us to learn the properties of
coherent texts from a corpus, without recourse to
manual annotation or a predefined knowledge
base.

Usefulness tests: text ordering, automatic
evaluation of summary coherence, and
readability assessment.

B&L formulate text ordering and summary
evaluation as ranking problems, with a learning
model.
Introduction
Evaluation:

   In the text-ordering task our algorithm has to select a
   maximally coherent sentence order from a set of
   candidate permutations.

   In the summary evaluation task, we compare the
   rankings produced by the model against human
   coherence judgments elicited for automatically
   generated summaries.

   In both experiments, our method yields improvements
   over state-of-the-art models.
Introduction
Evaluation:

   By incorporating coherence features stemming
   from the proposed entity-based representation,
   we improve the performance of a state-of-the-art
   readability assessment system
Outline
2. Related Work

3. The Coherence Model

4. Experiment 1: Sentence Ordering

5. Experiment 2: Summary Coherence Rating

6. Experiment 3: Readability Assessment

7. Discussion and Conclusions
Related Work
2. Related Work
  1. Summary of entity-based theories of
     discourse, and overview previous attempts for
     translating their underlying principles into
     computational coherence models.
  2. Description of ranking approaches to natural
     language generation and focus on coherence
     metrics used in current text planners.
Related Work
2.1 Entity-Based Approaches to Local
Coherence

Entity-based accounts of local coherence
common

Unifying assumption: discourse coherence is
achieved in view of the way discourse entities
are introduced and discussed.

Commonly formalized by devising constraints on
the linguistic realization and distribution of
discourse entities in coherent texts.
Related Work
Centering theory: salience concerns how entities
are realized in an utterance

Else: salience is defined in terms of topicality
(Chafe 1976; Prince 1978), predictability (Kuno
1972; Halliday and Hasan 1976), and cognitive
accessibility (Gundel, Hedberg, and Zacharski
1993)
Related Work
Entity-based theories: capture coherence by
characterizing the distribution of entities across
discourse utterances, distinguishing between
salient entities and the rest.

The intuition here is that texts about the same
discourse entity are perceived to be more
coherent than texts fraught with abrupt switches
from one topic to the next.
Related Work
Hard to model coherence computationally (often
because the underlying theories are not fully fleshed out)

Often use manual annotations as bootstrappers
for algorithms.
Related Work
B & L:
   Not based on any particular theory
   Inference model combines relevant information
   (not manual annotations)
   Emphasizes automatic computation for both the
   underlying discourse representation and the
   inference procedure
   Automatic, albeit noisy, feature extraction allows
   performing a large scale evaluation of differently
   instantiated coherence models across genres and
   applications.
Related Work
2.2 Ranking Approaches
  Produce a large set of candidate outputs, rank
  them based on desired features using a ranking
  function. Two-stage generate-and-rank system
  minimizes complexity.
  Regarding coherence: text planning is important for
  coherent output. The same iterated ranking approach
  has been applied to text plans.
  Feature selection & weighting done manually –
  not sufficient.
  "The problem is far too complex and our
  knowledge of the issues involved so meager that
  only a token gesture can be made at this point."
  (Mellish et al. 1998, p. 100)
Related Work
B&L:
  Introduce an entity-based representation of
  discourse that is automatically computed from raw
  text;
  The representation reveals entity transition
  patterns characteristic of coherent texts.
  This can be easily translated into a large feature
  space which lends itself naturally to the effective
  learning of a ranking function, without explicit
  manual involvement.
The Coherence Model
3.1 Entity-Grid Discourse Representation
   Each text is represented by an entity grid, a two-
   dimensional array that captures the distribution of
   discourse entities across text sentences.
   The rows of the grid correspond to
   sentences, and the columns correspond to
   discourse entities. By discourse entity we mean
   a class of coreferent noun phrases.
    Each grid cell thus contains a symbol from a
    set of categories reflecting whether the entity in
    question is a subject (S), object (O), neither (X),
    or absent from the sentence (–).
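
As an illustration only (not code from the paper), here is a minimal Python sketch of how such a grid might be assembled, assuming each sentence is already annotated as a list of (entity, grammatical role) pairs; the function name and the toy document are made up for the example:

# Grid symbols: S = subject, O = object, X = other role, '-' = entity absent.
ROLE_RANK = {'S': 3, 'O': 2, 'X': 1, '-': 0}

def build_entity_grid(sentences):
    # sentences: list of sentences, each a list of (entity, role) pairs.
    # Returns {entity: column}, one symbol per sentence; if an entity fills
    # several roles in one sentence, the highest-ranked role (S > O > X) wins.
    entities = sorted({ent for sent in sentences for ent, _ in sent})
    grid = {ent: ['-'] * len(sentences) for ent in entities}
    for i, sent in enumerate(sentences):
        for ent, role in sent:
            if ROLE_RANK[role] > ROLE_RANK[grid[ent][i]]:
                grid[ent][i] = role
    return grid

# Toy three-sentence "document" (annotations are illustrative, not parser output).
doc = [[('Microsoft', 'S'), ('market', 'X')],
       [('Microsoft', 'O'), ('earnings', 'S')],
       [('earnings', 'X')]]
for ent, column in build_entity_grid(doc).items():
    print(ent, ''.join(column))   # e.g. Microsoft SO-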
The Coherence Model
3.2 Entity Grids as Feature Vectors
   Assumption: the distribution of entities in coherent
   texts exhibits certain regularities reflected in grid
   topology.
   One would further expect that entities
   corresponding to dense columns are more often
   subjects or objects, for instance.
The Coherence Model
Analysis revolves around local entity transitions:
   A sequence in {S, O, X, –}^n that represents an
   entity's occurrences and syntactic roles in n adjacent
   sentences (its probability in the text serves as a
   feature).
   Each text is represented by a fixed set of
   transition sequences using a standard feature
   vector notation – which can be used for:
     Learning algorithms
     Identifying information relevant to coherence
     assessment
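
Continuing the sketch above, the grid can be mapped to a vector of transition probabilities roughly as follows; the n = 2 default and the helper name are illustrative assumptions:

from itertools import product

def transition_features(grid, n=2):
    # Feature vector: for every length-n transition over {S, O, X, -},
    # its count in the grid columns divided by the total number of
    # length-n transitions (0.0 everywhere for an empty grid).
    symbols = ['-', 'O', 'S', 'X']
    counts = {t: 0 for t in product(symbols, repeat=n)}
    total = 0
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in sorted(counts)]

With n = 2 this gives 16 features per text; longer transitions grow the feature space quickly, which is the size concern raised below.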
The Coherence Model
3.3 Grid Construction: Linguistic Dimensions
   What linguistic information is relevant for
   coherence prediction?
   How should we represent those?
   What should the parameters be for a good
   computational, automatic model?
The Coherence Model
Parameters:
  Exploration of the parameter space guided by:
     Linguistic importance of parameter (linked to local
     coherence)
     Accuracy of automatic computation
     (granularity, etc.)
     Size of the resulting feature space (too big is not
     good.)
The Coherence Model
Entity extraction:
   Co-reference resolution system (Ng & Cardie
   2002) (Various lexical, grammatical, semantic,
   positional features)
   For different domains/languages: simply cluster
   nouns based on string identity; this works consistently
   (a rough sketch follows this slide).

Grammatical function:
   Collins’ 1997 Parser
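
A rough sketch of that cheaper alternative: grouping noun phrases whose head nouns match, with the head approximated as the last token of the phrase (a simplifying assumption for illustration, not the paper's exact procedure):

def entity_classes_by_string_match(noun_phrases):
    # noun_phrases: list of NP strings from one document.
    # Groups NPs whose (lowercased) head noun matches exactly, a noisy
    # but consistent approximation of coreference classes.
    classes = {}
    for phrase in noun_phrases:
        head = phrase.split()[-1].lower()   # crude head-noun heuristic
        classes.setdefault(head, []).append(phrase)
    return classes

print(entity_classes_by_string_match(
    ['the quake', 'a strong quake', 'the government', 'many residents']))
# {'quake': ['the quake', 'a strong quake'], 'government': ['the government'],
#  'residents': ['many residents']}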
The Coherence Model
Salience:
     Evaluate by using two models: one with uniform
     treatment, one that discriminates between
     transitions of salient entities and the rest.
      Salience is determined by frequency counts.
      Compute each group's transitions (salient entities
      vs. the rest) separately, then combine them into a
      single feature vector.
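
A sketch of the salience-sensitive variant, reusing transition_features from the earlier sketch; the threshold of two mentions is an assumed illustrative value:

def salient_split_features(grid, n=2, threshold=2):
    # Entities mentioned at least `threshold` times count as salient; their
    # transitions and those of the remaining entities are featurized
    # separately and the two vectors are concatenated.
    salient = {e: col for e, col in grid.items()
               if sum(s != '-' for s in col) >= threshold}
    rest = {e: col for e, col in grid.items() if e not in salient}
    return transition_features(salient, n) + transition_features(rest, n)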
The Coherence Model
With feature vector representation, coherence
assessment becomes a machine learning
problem.

By encoding texts as entity transition sequences,
the algorithm can learn a ranking function
(instead of manually specifying it.)

The feature vector representation can also be
used for conventional classification tasks (apart
from information ordering and summary
coherence rating).
Sentence Ordering
A document is treated as a bag of sentences, and the
algorithm's task is to find the maximally coherent
order.

Again, the algorithm is used here to rank
alternative sentence orderings, but not to find the
optimal one. (Local coherence is not enough for
this.)
Sentence Ordering
4.1 Modeling
  Training set: ordered pairs of alternative renderings
  (xij, xik) of the same document di, with j > k
  (xij ranked as more coherent than xik)
  Goal: find a parameter vector w that yields a ranking
  score function minimizing violations of pairwise
  rankings in the training set:
      ∀ (xij, xik) ∈ r∗ : w · Φ(xij) > w · Φ(xik)
  r∗ = optimal ranking
  Φ(xij) and Φ(xik) map the renderings xij and xik onto
  features representing their coherence properties
Sentence Ordering
4.1 Modeling
  The ideal ranking function, represented by the weight
  vector w, would satisfy the condition:
     w · (Φ(xij) − Φ(xik)) > 0   ∀ i, j, k such that j > k
  Total number of training and test instances in
  corpora:
     Earthquakes: Train 1,896, Test 2,056
     Accidents: Train 2,095, Test 2,087
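
The paper optimizes these constraints with a ranking SVM (SVM-light); the sketch below shows the standard reduction of such pairwise constraints to binary classification over feature differences, using scikit-learn as a stand-in rather than the authors' actual setup:

import numpy as np
from sklearn.svm import LinearSVC

def train_pairwise_ranker(pairs):
    # pairs: iterable of (phi_better, phi_worse) feature vectors, i.e. the
    # ordered training pairs (x_ij, x_ik) with x_ij the more coherent rendering.
    X, y = [], []
    for better, worse in pairs:
        diff = np.asarray(better, dtype=float) - np.asarray(worse, dtype=float)
        X.append(diff);  y.append(1)
        X.append(-diff); y.append(-1)   # mirrored pair keeps the classes balanced
    clf = LinearSVC(C=1.0, fit_intercept=False)
    clf.fit(np.vstack(X), y)
    return clf.coef_.ravel()            # learned weight vector w

def coherence_score(w, phi):
    # Ranking score of a rendering: w . phi(x).
    return float(np.dot(w, phi))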
Sentence Ordering
4.2 Method
  Data: To acquire a large collection for training and
  testing, B&L created synthetic data, wherein the
  candidate set consists of a source document and
  permutations of its sentences.
  AP press articles on earthquakes, National
  Transportation Safety Board’s aviation accident
  database
    100 source articles, with up to 20 randomly
    generated permutations for training.
Sentence Ordering
Comparison with State-of-the-Art Methods
   Compared against Foltz, Kintsch, Landauer 1998
   Barzilay and Lee 2004
     Both rely mainly on lexical information, unlike here.
     FKL98: LSA coherence measure for semantic
     relatedness of adjacent sentences
     BL04: HMM, where states are topics

Evaluation Metric
   Accuracy = correct predictions / size of the test set.
   Random baseline = 50% (binary choice)
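
Continuing the earlier sketch, the accuracy metric just counts how often the more coherent member of a test pair receives the higher score:

def pairwise_accuracy(w, test_pairs):
    # test_pairs: (phi_better, phi_worse) pairs; chance level is 50%.
    correct = sum(coherence_score(w, better) > coherence_score(w, worse)
                  for better, worse in test_pairs)
    return correct / len(test_pairs)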
Sentence Ordering
Comparison with SotA Methods:
  Outperforms LSA on both domains, because of:
    coreference and grammatical role information, a more
    holistic representation (over more than two
    sentences), and exposure to domain-relevant texts.
  The HMM is comparable on the Earthquakes corpus, but
  not on Accidents
    the two models may be complementary
Summary Coherence Rating
Tested model-induced rankings against human
rankings.

If successful, holds implications for automatic
evaluation of machine-generated texts.

Better suited for this than BLEU or ROUGE, which
were not designed to measure coherence.
Summary Coherence Rating
5.1 Modeling
  Summary coherence rating is also a ranking learning
  task, using the same formulation as in Experiment 1.
Summary Coherence Rating
5.2 Data
   Evaluation based on Document Understanding
   Conference 2003, which has rated summaries.
   The DUC ratings were not suitable for coherence
   evaluation, so 16 input document clusters and five
   systems that produced summaries were randomly selected.
   Ratings were collected from 177 unpaid volunteers
   over the Internet, then checked by leave-one-out
   resampling and discretized into two classes.
   Training: 144 summaries. Test: 80 pairwise
   ratings. Dev: 6 documents.
Summary Coherence Rating
In Experiment 1, the co-reference resolution tool was
applied to human-written texts; here it is applied to
automatically generated summaries.

Compared against LSA, but not Barzilay and Lee (2004),
which is domain-dependent.
   The entity-grid model did significantly better (p < .01)
Readability Assessment
Can entity grids be used for style classification?

Judged against Schwarm and Ostendorf 2005’s
method for assessing readability (among others)
Readability Assessment
As in S&O05, readability assessment is a
classification task.

Training sample consisted of n documents such that
   (x⃗1, y1), ..., (x⃗n, yn),   x⃗i ∈ R^N,  yi ∈ {−1, +1}

where x⃗i is a feature vector for the ith document
in the training sample and yi its (positive or
negative) class label.
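
A hedged sketch of this setup, assuming the entity-transition features from the earlier sketches are concatenated onto a base (S&O-style) feature vector; the linear SVM and its parameters are illustrative choices, not the exact configuration used in the experiments:

import numpy as np
from sklearn.svm import LinearSVC

def train_readability_classifier(base_features, grid_features, labels):
    # base_features, grid_features: (n_docs x d1) and (n_docs x d2) arrays;
    # labels: n_docs values in {-1, +1} (e.g. adult vs. elementary Britannica).
    X = np.hstack([np.asarray(base_features, dtype=float),
                   np.asarray(grid_features, dtype=float)])
    clf = LinearSVC(C=1.0)
    clf.fit(X, np.asarray(labels))
    return clf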
Readability Assessment
6.2 Method
  Data: 107 articles from Encyclopedia Britannica
  and Britannica Elementary (from Barzilay &
  Elhadad 2003)
Readability Assessment
Features: Two versions, one with S&O features:
   Syntactic, semantic, combination (Flesch-Kincaid
   formula)

One with more features:
   Coherence-based, with the entity transition notation
   (compared against LSA)
Discussion and Conclusions
Presented: novel framework for representing and
measuring text coherence.
Central to this framework is the entity-grid
representation of discourse, which captures
important patterns of sentence transitions.
Coherence assessment is re-conceptualized as a
learning task.
Good performance on text ordering, summary
coherence evaluation, and readability
assessment.
Discussion and Conclusions
The entity grid is a flexible, yet computationally
tractable, representation.

Three important parameters for grid construction:
   the computation of coreferring entity classes,
   the inclusion of syntactic knowledge;
   the influence of salience.

Empirically validated the importance of salience
and syntactic information for coherence-based
models.
Discussion and Conclusions
Full coreference resolution is not perfect
(mismatches between training and testing
conditions).

Instead of an automatic coreference resolution
system, entity classes can be approximated
simply by string matching.
Discussion and Conclusions
This approach is not a direct implementation of any
particular theory; it favors automatic computation and
breadth of coverage.

Findings:
   pronominalization is a good indicator of document
   coherence.
   coherent texts are characterized by transitions with
   particular properties which do not hold for all
   discourses.
   These models are sensitive to the domain at hand and
   the type of texts under consideration (human-authored
   vs. machine generated texts).
Discussion and Conclusions
Future work:
   Augmenting entity-based representation with fine-
   grained lexico-semantic knowledge.
      cluster entities based on their semantic relatedness,
      thereby creating a grid representation over lexical
      chains.
      develop fully lexicalized models, akin to traditional
      language models.
   Expanding grammatical categories to modifiers and
   adjuncts may provide additional information, in
   particular when considering machine generated texts.
   Investigating whether the proposed discourse
   representation and modeling approaches generalize
   across different languages
   Improving prediction on both local and global levels,
   with the ultimate goal of handling longer texts.

Mais conteúdo relacionado

Mais procurados

An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval ModelsNisha Arankandath
 
Metrics for Evaluating Quality of Embeddings for Ontological Concepts
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Metrics for Evaluating Quality of Embeddings for Ontological Concepts
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Saeedeh Shekarpour
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextUniversity of Bari (Italy)
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003Ajay Ohri
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
 
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...dannyijwest
 
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...Carrie Wang
 
Topic models
Topic modelsTopic models
Topic modelsAjay Ohri
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingNimrita Koul
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
 
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...IOSR Journals
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
 
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Innovation Quotient Pvt Ltd
 

Mais procurados (19)

An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
Ijetcas14 624
Ijetcas14 624Ijetcas14 624
Ijetcas14 624
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
 
Metrics for Evaluating Quality of Embeddings for Ontological Concepts
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Metrics for Evaluating Quality of Embeddings for Ontological Concepts
Metrics for Evaluating Quality of Embeddings for Ontological Concepts
 
Distributional semantics
Distributional semanticsDistributional semantics
Distributional semantics
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...
 
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
 
Topic models
Topic modelsTopic models
Topic models
 
Canini09a
Canini09aCanini09a
Canini09a
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
 
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Docu...
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...
 
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
 

Destaque

Deep and surface_structures
Deep and surface_structuresDeep and surface_structures
Deep and surface_structuresAkzharka
 
Deep & surface learning
Deep & surface learningDeep & surface learning
Deep & surface learningRosa Martinez
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammarJack Feng
 
Transformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarTransformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarBayu Jaka Magistra
 
Summary - Transformational-Generative Theory
Summary - Transformational-Generative TheorySummary - Transformational-Generative Theory
Summary - Transformational-Generative TheoryMarielis VI
 
Transformational generative grammar
Transformational generative grammarTransformational generative grammar
Transformational generative grammarKat OngCan
 
Transformational-Generative Grammar
Transformational-Generative GrammarTransformational-Generative Grammar
Transformational-Generative GrammarRuth Ann Llego
 
Deep structure and surface structure
Deep structure and surface structureDeep structure and surface structure
Deep structure and surface structureAsif Ali Raza
 
grammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguitygrammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguityDedew Deviarini
 
Transformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyTransformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyShiela May Claro
 

Destaque (12)

Marcu 2000 presentation
Marcu 2000 presentationMarcu 2000 presentation
Marcu 2000 presentation
 
Deep and surface_structures
Deep and surface_structuresDeep and surface_structures
Deep and surface_structures
 
Deep & surface learning
Deep & surface learningDeep & surface learning
Deep & surface learning
 
MORPHOSYNTAX: GENERATIVE MORPHOLOGY
MORPHOSYNTAX: GENERATIVE MORPHOLOGYMORPHOSYNTAX: GENERATIVE MORPHOLOGY
MORPHOSYNTAX: GENERATIVE MORPHOLOGY
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammar
 
Transformations in Transformational Generative Grammar
Transformations in Transformational Generative GrammarTransformations in Transformational Generative Grammar
Transformations in Transformational Generative Grammar
 
Summary - Transformational-Generative Theory
Summary - Transformational-Generative TheorySummary - Transformational-Generative Theory
Summary - Transformational-Generative Theory
 
Transformational generative grammar
Transformational generative grammarTransformational generative grammar
Transformational generative grammar
 
Transformational-Generative Grammar
Transformational-Generative GrammarTransformational-Generative Grammar
Transformational-Generative Grammar
 
Deep structure and surface structure
Deep structure and surface structureDeep structure and surface structure
Deep structure and surface structure
 
grammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguitygrammaticality, deep & surface structure, and ambiguity
grammaticality, deep & surface structure, and ambiguity
 
Transformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam ChomskyTransformational Grammar by: Noam Chomsky
Transformational Grammar by: Noam Chomsky
 

Semelhante a Barzilay & Lapata 2008 presentation

Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Giorgio Orsi
 
Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...JPINFOTECH JAYAPRAKASH
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology MappingPradeep B Pillai
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Infrrd
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)IJERA Editor
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...Hiroshi Ono
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
 
Survey of Analogy Reasoning
Survey of Analogy ReasoningSurvey of Analogy Reasoning
Survey of Analogy ReasoningSang-Kyun Kim
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
Doc format.
Doc format.Doc format.
Doc format.butest
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEijnlc
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsShubhangi Tandon
 

Semelhante a Barzilay & Lapata 2008 presentation (20)

Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5
 
Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology Mapping
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
G04124041046
G04124041046G04124041046
G04124041046
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Cc35451454
Cc35451454Cc35451454
Cc35451454
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categor...
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
 
P13 corley
P13 corleyP13 corley
P13 corley
 
Survey of Analogy Reasoning
Survey of Analogy ReasoningSurvey of Analogy Reasoning
Survey of Analogy Reasoning
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
 
Doc format.
Doc format.Doc format.
Doc format.
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
L1803058388
L1803058388L1803058388
L1803058388
 

Mais de Richard Littauer

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Richard Littauer
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationRichard Littauer
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social MediaRichard Littauer
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsRichard Littauer
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossRichard Littauer
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological AgreementRichard Littauer
 
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Richard Littauer
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaRichard Littauer
 
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Richard Littauer
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationRichard Littauer
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsRichard Littauer
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageRichard Littauer
 

Mais de Richard Littauer (13)

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
 
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
Trends in Use of Scientific Workflows: Insights from a Public Repository and ...
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
 
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
 

Último

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Último (20)

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

Barzilay & Lapata 2008 presentation

  • 1. Modeling Local Coherence: An Entity- Based Approach Regina Barzilay (MIT), Mirella Lapata (UoE) ACL 2008
  • 2. Abstract This article proposes a novel framework for representing and measuring local coherence. Central to this approach is the entity-grid representation of discourse, which captures patterns of entity distribution in a text. The algorithm introduced in the article automatically abstracts a text into a set of entity transition sequences and records distributional, syntactic, and referential information about discourse entities. We re-conceptualize coherence assessment as a learning task and show that our entity-based representation is well-suited for ranking-based generation and text classification tasks. Using the proposed representation, we achieve good performance on text ordering, summary coherence evaluation, and readability assessment.
  • 3. Introduction A key requirement for any system that produces text is the coherence of its output. Use of coherence theories: text generation, especially indistinguishable from human writing Previous efforts have relied on handcrafted rules, valid only for limited domains, with no guarantee of scalability or portability (Reiter and Dale 2000). Furthermore, coherence constraints are often embedded in complex representations (e.g., Asher and Lascarides 2003) which are hard to implement in a robust application.
  • 4. Introduction Here, the focus is on local coherence (sentence to sentence). Necessary for global coherence, too. The key premise of our work is that the distribution of entities in locally coherent texts exhibits certain regularities. Covered before in Centering Theory (Grosz, Joshi, and Weinstein 1995) and other entity- based theories of discourse (e.g., Givon 1987; Prince 1981).
  • 5. Introduction The proposed entity-based representation of discourse allows us to learn the properties of coherent texts from a corpus, without recourse to manual annotation or a predefined knowledge base. Usefulness tests: text ordering, automatic evaluation of summary coherence, and readability assessment. Lapata formulates text ordering and summary evaluation—as ranking problems, wit a learning model.
  • 6. Introduction Evaluation: In the text-ordering task our algorithm has to select a maximally coherent sentence order from a set of candidate permutations. In the summary evaluation task, we compare the rankings produced by the model against human coherence judgments elicited for automatically generated summaries. In both experiments, our method yields improvements over state-of-the-art models.
  • 7. Introduction Evaluation: By incorporating coherence features stemming from the proposed entity-based representation, we improve the performance of a state-of-the-art readability assessment system
  • 8. Outline 2. Related Work 3. The Coherence model 4. Experiment 1: Sentence Ordering 5. Experiment 2: Summary Coherence Rating 6. Experiment 3: Readabiality Assessment 7. Discussion and Conclusions
  • 9. Related Work 2. Related Work 1. Summary of entity-based theories of discourse, and overview previous attempts for translating their underlying principles into computational coherence models. 2. Description of ranking approaches to natural language generation and focus on coherence metrics used in current text planners.
  • 10. Related Work 2.1 Entity-Based Approaches to Local Coherence Entity-based accounts of local coherence common Unifying assumption: discourse coherence is achieved in view of the way discourse entities are introduced and discussed. Commonly formalized by devising constraints on the linguistic realization and distribution of discourse entities in coherent texts.
  • 11. Related Work Centering theory: salience concerns how entities are realized in an utterance Else: salience is defined in terms of topicality (Chafe 1976; Prince 1978), predictability (Kuno 1972; Halliday and Hasan 1976), and cognitive accessibility (Gundel, Hedberg, and Zacharski 1993)
  • 12. Related Work Entity-based theories: capture coherence by characterizing the distribution of entities across discourse utterances, distinguishing between salient entities and the rest. The intuition here is that texts about the same discourse entity are perceived to be more coherent than texts fraught with abrupt switches from one topic to the next.
  • 13. Related Work Hard to model coherence computationally (often as the underlying theories are not fleshed out) Often use manual annotations as bootstrappers for algorithms.
  • 14. Related Work B & L: Not based on any particular theory Inference model combines relevant information (not manual annotations) Emphasizes automatic computation for both the underlying discourse representation and the inference procedure Automatic, albeit noisy, feature extraction allows performing a large scale evaluation of differently instantiated coherence models across genres and applications.
  • 15. Related Work 2.2 Ranking Approaches Produce a large set of candidate outputs, rank them based on desired features using a ranking function. Two-stage generate-and-rank system minimizes complexity. In re: coherence, text planning is important for coherent output. Same iterated ranking system for text plans. Feature selection & weighting done manually – not sufficient. ―The problem is far too complex and our knowledge of the issues involved so meager that only a token gesture can be made at this point.‖ (Mellish et al. 1998, p.100)
  • 16. Related Work B&L: Introduce an entity-based representation of discourse that is automatically computed from raw text; The representation reveals entity transition patterns characteristic of coherent texts. This can be easily translated into a large feature space which lends itself naturally to the effective learning of a ranking function, without explicit manual involvement.
• 17. The Coherence Model 3.1 Entity-Grid Discourse Representation Each text is represented by an entity grid, a two-dimensional array that captures the distribution of discourse entities across text sentences. The rows of the grid correspond to sentences, and the columns correspond to discourse entities. By discourse entity we mean a class of coreferent noun phrases. Each grid cell thus contains a symbol reflecting whether the entity in question is a subject (S), an object (O), neither (X), or absent from the sentence (–).
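A minimal sketch of how such a grid can be assembled, assuming each sentence has already been reduced to (entity, grammatical-role) pairs by a parser plus a coreference or string-matching step; the function and toy example below are illustrative, not the authors' code.

```python
def build_entity_grid(sentences):
    """sentences: list of lists of (entity, role) pairs, role in {'S', 'O', 'X'}.
    Returns a dict mapping each entity to its grid column over the sentences,
    with '-' marking absence."""
    entities = sorted({e for sent in sentences for e, _ in sent})
    grid = {e: ['-'] * len(sentences) for e in entities}
    # If an entity occurs more than once in a sentence, keep the highest role:
    # subject beats object beats other mention.
    rank = {'S': 2, 'O': 1, 'X': 0}
    for i, sent in enumerate(sentences):
        for entity, role in sent:
            cell = grid[entity][i]
            if cell == '-' or rank[role] > rank[cell]:
                grid[entity][i] = role
    return grid

# Toy example: three sentences mentioning two entities.
sents = [
    [('Microsoft', 'S'), ('market', 'X')],
    [('Microsoft', 'O')],
    [('market', 'S')],
]
print(build_entity_grid(sents))
# {'Microsoft': ['S', 'O', '-'], 'market': ['X', '-', 'S']}
```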
• 19. The Coherence Model 3.2 Entity Grids as Feature Vectors Assumption: the distribution of entities in coherent texts exhibits certain regularities reflected in grid topology. One would further expect that entities corresponding to dense columns are, for instance, more often realized as subjects or objects.
• 20. The Coherence Model Analysis revolves around local entity transitions: a sequence of length n over {S, O, X, –} that represents an entity's occurrences and syntactic roles in n adjacent sentences, together with its probability in the text (the frequency of the transition divided by the total number of transitions). Each text is represented by a fixed set of transition sequences using standard feature vector notation, which can be fed to learning algorithms and used to identify information relevant to coherence assessment.
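Under the same assumptions, a sketch of how the grid's columns can be turned into transition-probability features (here with transitions of length n = 2; the helper name is hypothetical):

```python
from itertools import product

ROLES = ['S', 'O', 'X', '-']

def transition_features(grid, n=2):
    """Probability of each length-n transition over all columns of the grid:
    the count of the transition divided by the total number of transitions."""
    counts = {t: 0 for t in product(ROLES, repeat=n)}
    total = 0
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
            total += 1
    return {t: c / total if total else 0.0 for t, c in counts.items()}

feats = transition_features({'Microsoft': ['S', 'O', '-'], 'market': ['X', '-', 'S']})
print(feats[('S', 'O')], feats[('X', '-')])  # 0.25 0.25 (2 columns x 2 bigrams each)
```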
• 21. The Coherence Model 3.3 Grid Construction: Linguistic Dimensions What linguistic information is relevant for coherence prediction? How should it be represented? What should the parameters of a good computational, automatic model be?
• 22. The Coherence Model Parameters: exploration of the parameter space is guided by the linguistic importance of a parameter (its link to local coherence), the accuracy of its automatic computation (granularity, etc.), and the size of the resulting feature space (which must not grow too large).
• 23. The Coherence Model Entity extraction: a coreference resolution system (Ng & Cardie 2002) using various lexical, grammatical, semantic, and positional features. For different domains/languages: simply cluster nouns based on string identity, which works consistently. Grammatical function: Collins' 1997 parser.
• 24. The Coherence Model Salience: evaluated using two models, one with uniform treatment of entities and one that discriminates between transitions of salient entities and the rest, with salience determined by frequency counts. The transitions of each salience group are computed separately and then combined into a single feature vector.
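Continuing the previous sketch, the salience distinction can be approximated by splitting columns on raw frequency (the threshold here is illustrative) and concatenating the two groups' transition features:

```python
def salience_split_features(grid, threshold=2, n=2):
    """Split grid columns into salient vs. non-salient entities by raw mention
    frequency, then compute transition probabilities for each group and
    concatenate them into one feature vector.
    Reuses transition_features from the sketch above."""
    def freq(column):
        return sum(1 for cell in column if cell != '-')

    salient = {e: col for e, col in grid.items() if freq(col) >= threshold}
    other = {e: col for e, col in grid.items() if freq(col) < threshold}
    return {**{('sal',) + t: p for t, p in transition_features(salient, n).items()},
            **{('non',) + t: p for t, p in transition_features(other, n).items()}}
```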
  • 25. The Coherence Model With feature vector representation, coherence assessment becomes a machine learning problem. By encoding texts as entity transition sequences, the algorithm can learn a ranking function (instead of manually specifying it.) The feature vector representation can also be used for conventional classification tasks (apart from information ordering and summary coherence rating).
• 26. Sentence Ordering A document is treated as a bag of sentences, and the task is to recover a maximally coherent order. Here, the algorithm is used to rank alternative sentence orderings rather than to search for the optimal one (local coherence alone is not enough for that).
• 27. Sentence Ordering 4.1 Modeling Training set: ordered pairs of alternative renderings (x_ij, x_ik) of the same document d_i, with j > k (i.e., x_ij is the more coherent rendering). The goal is to find a parameter vector w that yields a ranking score function minimizing violations of the pairwise rankings in the training set: ∀ (x_ij, x_ik) ∈ r∗ : w · Φ(x_ij) > w · Φ(x_ik), where r∗ is the optimal ranking and Φ(x_ij), Φ(x_ik) map the renderings onto feature vectors representing their coherence properties.
• 28. Sentence Ordering 4.1 Modeling The ideal ranking function, represented by the weight vector w, would satisfy the condition: w · (Φ(x_ij) − Φ(x_ik)) > 0 ∀ i, j, k such that j > k. Total number of training and test instances in the corpora: Earthquakes: 1,896 train, 2,056 test; Accidents: 2,095 train, 2,087 test.
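The pairwise constraint above can be learned from feature-difference vectors. B&L train an SVM ranker; the perceptron-style sketch below is only an illustrative stand-in with the same objective, and its function names are hypothetical:

```python
import numpy as np

def train_ranker(pairs, epochs=20, lr=0.1):
    """pairs: list of (phi_better, phi_worse) feature-vector pairs.
    Learns w so that w . phi_better > w . phi_worse, via perceptron-style
    updates on the difference vectors."""
    w = np.zeros(len(pairs[0][0]))
    for _ in range(epochs):
        for better, worse in pairs:
            diff = np.asarray(better) - np.asarray(worse)
            if w @ diff <= 0:          # pairwise constraint violated
                w += lr * diff         # move w toward satisfying it
    return w

def rank(w, candidates):
    """Return candidate feature vectors sorted by coherence score, best first."""
    return sorted(candidates, key=lambda phi: -(w @ np.asarray(phi)))
```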
• 29. Sentence Ordering 4.2 Method Data: to acquire a large collection for training and testing, B&L created synthetic data in which each candidate set consists of a source document and permutations of its sentences. Sources: Associated Press articles on earthquakes and documents from the National Transportation Safety Board's aviation accident database; 100 source articles, each with up to 20 randomly generated permutations, for training.
• 30. Sentence Ordering Comparison with state-of-the-art methods: compared against Foltz, Kintsch, and Landauer (1998) and Barzilay and Lee (2004), both of which rely mainly on lexical information, unlike the present model. FKL98: an LSA coherence measure based on the semantic relatedness of adjacent sentences. BL04: an HMM whose states correspond to topics. Evaluation metric: accuracy = correct pairwise predictions / size of the test set; a random baseline scores 50% on this binary decision.
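Continuing the ranking sketch above, this accuracy metric reduces to counting correctly ordered test pairs (a sketch, not the authors' evaluation code):

```python
import numpy as np

def pairwise_accuracy(w, test_pairs):
    """Fraction of (phi_better, phi_worse) test pairs ranked correctly;
    a random scorer gets about 0.5 on this binary decision."""
    correct = sum(1 for better, worse in test_pairs
                  if w @ (np.asarray(better) - np.asarray(worse)) > 0)
    return correct / len(test_pairs)
```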
• 32. Sentence Ordering Comparison with SotA methods: the model outperforms LSA on both domains, thanks to coreference and grammatical role information, a more holistic representation (spanning more than two sentences), and exposure to domain-relevant texts. The HMM is comparable on the Earthquakes corpus but not on Accidents; the two models may be complementary.
• 35. Summary Coherence Rating Tested model-induced rankings against human rankings. If successful, this holds implications for the automatic evaluation of machine-generated texts; the approach is better suited than BLEU or ROUGE, which were not designed to measure coherence.
  • 36. Summary Coherence Rating 5.1 Modeling Summary coherence is also a ranking learning task. Same as before.
• 37. Summary Coherence Rating 5.2 Data Evaluation is based on the Document Understanding Conference 2003, which provides rated summaries. These ratings are not suitable for coherence evaluation, so B&L randomly selected 16 input document clusters and five systems that produced summaries. Coherence ratings were collected from 177 unpaid volunteers over the Internet, then checked by leave-one-out resampling and discretized into two classes. Training: 144 summaries. Test: 80 pairwise ratings. Development: 6 documents.
• 38. Summary Coherence Rating Experiment 1 applied the coreference resolution tool to human-written texts; here it is applied to automatically generated summaries. Compared against LSA, but not against B&L 2004 (which is domain-dependent). The entity-grid model did much better (p < .01).
• 41. Readability Assessment Can entity grids be used for style classification? Judged against the method of Schwarm and Ostendorf (2005) for assessing readability, among others.
• 42. Readability Assessment As in S&O05, readability assessment is a classification task. The training sample consists of n documents (x⃗_1, y_1), ..., (x⃗_n, y_n), where x⃗_i ∈ R^N and y_i ∈ {−1, +1}; x⃗_i is the feature vector for the ith document in the training sample and y_i its (positive or negative) class label.
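A minimal sketch of this classification setup, with placeholder feature matrices and scikit-learn's LinearSVC standing in for the SVM classifier; shapes and values below are illustrative only:

```python
import numpy as np
from sklearn.svm import LinearSVC

# X: one row per document, combining readability features with the
# entity-transition probabilities; y: +1 for one class (e.g., adult text),
# -1 for the other. Both arrays are placeholders here.
X = np.random.rand(107, 32)            # illustrative shapes only
y = np.random.choice([-1, 1], size=107)

clf = LinearSVC()                      # linear SVM as an illustrative classifier
clf.fit(X, y)
print(clf.predict(X[:5]))
```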
• 43. Readability Assessment 6.2 Method Data: 107 articles from Encyclopedia Britannica and Britannica Elementary (from Barzilay & Elhadad 2003)
• 45. Readability Assessment Features: two versions were tested, one with the S&O features (syntactic, semantic, and combined, e.g., the Flesch-Kincaid formula) and one with additional coherence-based features using the entity transition notation (compared against LSA).
• 48. Discussion and Conclusions Presented: a novel framework for representing and measuring text coherence. Central to this framework is the entity-grid representation of discourse, which captures important patterns of sentence transitions. Coherence assessment is re-conceptualized as a learning task. Good performance on text ordering, summary coherence evaluation, and readability assessment.
• 49. Discussion and Conclusions The entity grid is a flexible, yet computationally tractable, representation. Three important parameters for grid construction: the computation of coreferring entity classes, the inclusion of syntactic knowledge, and the influence of salience. Empirically validated the importance of salience and syntactic information for coherence-based models.
• 50. Discussion and Conclusions Full coreference resolution is not perfect (it introduces mismatches between training and testing conditions). Instead of an automatic coreference resolution system, entity classes can be approximated simply by string matching.
• 51. Discussion and Conclusions This approach is not a direct implementation of any particular theory; theoretical fidelity is traded for automatic computation and breadth of coverage. Findings: pronominalization is a good indicator of document coherence; coherent texts are characterized by transitions with particular properties, which do not hold for all discourses; these models are sensitive to the domain at hand and to the type of texts under consideration (human-authored vs. machine-generated).
• 52. Discussion and Conclusions Future work: augmenting the entity-based representation with fine-grained lexico-semantic knowledge, e.g., clustering entities by semantic relatedness to create a grid representation over lexical chains, or developing fully lexicalized models akin to traditional language models. Expanding grammatical categories to modifiers and adjuncts may provide additional information, in particular for machine-generated texts. Investigating whether the proposed discourse representation and modeling approaches generalize across languages. Improving prediction at both the local and global levels, with the ultimate goal of handling longer texts.

Editor's Notes

1. Computed from a standard dependency parser
2.-20. Does worse on out-of-domain training.