Reference Scope Identification in Citing Sentences
1. Reference Scope Identification
in Citing Sentences
Authors:
Amjad Abu-Jbara, Dragomir Radev
(University of Michigan)
Conference:
NAACL 2012
Expositor:
Akihiro Kameda
(Aizawa Lab., The University of Tokyo)
2. Abstract
● Problem:
● Multiple citations in one sentence
● There are many POS taggers developed using
different techniques for many major languages such
as transformation-based error-driven learning (Brill,
1995), decision trees (Black et al., 1992), Markov
model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
● Approach: Preprocessing
and 2+1+2*3+1=10 methods
4. Reference Preprocessing
(tagging, grouping, non-syntactical element removal)
● These constraints can be lexicalized (REF.1; REF.2),
unlexicalized (REF.3; TREF.4) or automatically learned
(REF.5; REF.6).
● These constraints can be lexicalized (GREF.1), unlexicalized
(GTREF.2) or automatically learned (GREF.3).
● (GTREF.1) apply fuzzy techniques for integrating source
syntax into hierarchical phrase-based systems (REF.2).
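The tagging and grouping steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the citation regex and the REF/GREF numbering scheme are simplified assumptions.

```python
import re

# Hypothetical regex for author-year citations like "(Brill, 1995)" or
# "(Smith, 2005; Jones, 2006)"; real citation styles need a richer pattern.
CITATION = re.compile(r"\(([A-Z][\w.]*(?: et al\.)?, \d{4}(?:; [^)]+)?)\)")

def tag_and_group(sentence):
    """Replace each parenthesized citation with a REF.i placeholder;
    2+ citations sharing one pair of parentheses form a group (GREF)."""
    refs = []
    def repl(match):
        cites = [c.strip() for c in match.group(1).split(";")]
        refs.append(cites)
        tag = "GREF" if len(cites) > 1 else "REF"
        return f"({tag}.{len(refs)})"
    return CITATION.sub(repl, sentence), refs

tagged, refs = tag_and_group(
    "These constraints can be lexicalized (Smith, 2005; Jones, 2006), "
    "unlexicalized (Brown, 2004) or automatically learned (Lee, 2007)."
)
```

Running this yields the tagged form used in the examples above, with the grouped pair collapsed to a single GREF token.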
5. Approach 1 (SVM, LR)
● Word classification
● with SVM, a logistic regression classifier
● Features: Distance, Position (before/after the target
reference), In Segment (segments delimited by , . ; and the
conjunctions and, but, for, nor, or, so, yet), POS tag,
Dependency Distance, Dependency Relations, Common Ancestor
Node, Syntactic Distance
● Problem Example:
● There are many POS taggers developed using different
techniques for many major languages such as transformation-
based error-driven learning (Brill, 1995), decision trees (Black et
al., 1992), Markov model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
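The surface features from the list above can be sketched in pure Python. This is an illustrative toy, not the paper's exact feature definitions: the segment index is a stand-in for the In-Segment feature, and the delimiter check is deliberately naive (it would also split on the preposition "for").

```python
# Each word in the citing sentence gets features relative to the target
# reference (TREF); the classifier then labels words inside/outside scope.
DELIMS = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}

def surface_features(words, tref_index):
    feats = []
    segment = 0
    for i, w in enumerate(words):
        feats.append({
            "distance": abs(i - tref_index),       # words between w and TREF
            "position": "before" if i < tref_index else "after",
            "segment": segment,                    # which segment w falls in
        })
        if w.lower() in DELIMS:
            segment += 1
    return feats

words = "decision trees ( TREF ) , Markov model ( REF )".split()
feats = surface_features(words, words.index("TREF"))
```

These per-word feature vectors would then be fed to the SVM or logistic regression classifier.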
6. Approach 2 (CRF)
● Sequence labeling with CRF
● Features are the same as in Approach 1
7. Approach 3-S1-* (CRF/segment)
● segmentation (1)
● punctuation marks
● coordination conjunctions
– and, but, for, nor, or, so, yet
● a set of special expressions
– "for example", "for instance", "including", "includes",
"such as", "like", etc.
● [Rerankers have been successfully applied to numerous
NLP tasks such as] [parse selection (GTREF)], [parse
reranking (GREF)], [question-answering (REF)].
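The three segmentation rules above can be sketched with simple substitutions. This marker-based version is an assumption for illustration, not the authors' code; specials are handled first so that "for example" is not split at its "for".

```python
import re

# Approach 3-S1 segmentation: break at punctuation marks, at coordinating
# conjunctions, and after special listing expressions.
SPECIALS = r"for example|for instance|including|includes|such as|like"
CONJUNCTIONS = r"\b(?:and|but|for|nor|or|so|yet)\b"

def segment(sentence):
    s = re.sub(SPECIALS, lambda m: m.group(0) + "|", sentence)
    s = re.sub(r"[,;.:]", "|", s)
    s = re.sub(CONJUNCTIONS, "|", s)
    return [seg.strip() for seg in s.split("|") if seg.strip()]

segs = segment("Rerankers have been successfully applied to numerous NLP "
               "tasks such as parse selection (GTREF), parse reranking "
               "(GREF), question-answering (REF).")
```

On the example sentence this reproduces the four bracketed segments shown on the slide.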
8. Approach 3-S2-* (CRF/segment)
● segmentation (2)
● chunking tool
– noun groups
– verb groups
– preposition groups
– adjective groups
– adverb groups
– other parts form segment by themselves
● [To] [score] [the output] [of] [the coreference models], [we]
[employ] [the commonly-used MUC scoring program (REF)]
[and] [the recently-developed CEAF scoring program (TREF)].
9. Approach 3-*-R1,2,3
(CRF/segment)
● R1: majority label of the words it contains
● R2: inside if any word is inside
● R3: outside if any word is outside
● [I O O O O] [I I I] [O O]
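The three lifting rules can be written directly from their definitions; the tie-break toward O in R1 is my assumption, as the slide does not specify it.

```python
# Lift per-word labels (I = inside scope, O = outside) to one segment label.
def segment_label(word_labels, rule):
    if rule == "R1":  # majority label of the words the segment contains
        return "I" if word_labels.count("I") > len(word_labels) / 2 else "O"
    if rule == "R2":  # inside if any word is inside
        return "I" if "I" in word_labels else "O"
    if rule == "R3":  # outside if any word is outside
        return "O" if "O" in word_labels else "I"
    raise ValueError(rule)

# The slide's example: [I O O O O] [I I I] [O O]
segments = [["I", "O", "O", "O", "O"], ["I", "I", "I"], ["O", "O"]]
labels_r1 = [segment_label(s, "R1") for s in segments]  # ['O', 'I', 'O']
labels_r2 = [segment_label(s, "R2") for s in segments]  # ['I', 'I', 'O']
labels_r3 = [segment_label(s, "R3") for s in segments]  # ['O', 'I', 'O']
```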
12. Data
● ACL Anthology Network Corpus
● 3,300 sentences, each containing ≥ 2 citations
Annotation agreement
● 500 of the 3,300 sentences
● Preprocessing was perfect
● Kappa coefficient of scope agreement (with P(E) = 1/2):
K = (P(A) − P(E)) / (1 − P(E)) = 2P(A) − 1 = 0.61
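The kappa formula above is a one-liner; with two equiprobable labels, chance agreement P(E) is 0.5, so K = 2P(A) − 1. The observed agreement P(A) = 0.805 below is not stated on the slide but follows from K = 0.61 by that same formula.

```python
# Cohen's kappa: K = (P(A) - P(E)) / (1 - P(E)).
def kappa(p_agree, p_chance=0.5):
    return (p_agree - p_chance) / (1 - p_chance)

k = kappa(0.805)  # → 0.61, the slide's reported agreement
```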
13. Tools
● Edinburgh Language Technology Text
Tokenization Toolkit (LT-TTT)
● text tokenization, part-of-speech tagging, chunking,
and noun phrase head identification.
● Stanford parser
● syntactic and dependency parsing
● LibSVM with linear kernel
● Weka
● logistic regression classification
14. Tools
● Machine Learning for Language Toolkit
(MALLET)
● CRF
Validation
● 10-fold cross validation
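A minimal sketch of the 10-fold split, assuming the 3,300 annotated sentences are partitioned round-robin by index (the actual fold assignment in the paper is not specified):

```python
# Partition n_items indices into k folds; each fold serves once as test set.
def k_fold(n_items, k=10):
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = k_fold(3300)  # 3,300 sentences from the data slide
```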
15. Experiment (Preprocessing)
These constraints can be lexicalized (REF.1; REF.2), ll
r ec a
●
unlexicalized (REF.3; TREF.4) or and 93 .1%learned
(REF.5; REF.6). 3% preci
s ion automatically
ng: 9 8 .
Taggi
● These constraints can be lexicalized (GREF.1), unlexicalized
(GTREF.2) or Perfect!
automatically learned (GREF.3).
Grouping:
(GTREF.1) apply fuzzy techniques for integrating source
a l:
●
syntax into hierarchicalence
removsystems (REF.2).
Non-syn tactic refer phrase-based ecall
9 0. 1% r
cision and
9 0.08% pre
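The precision and recall percentages reported for preprocessing follow the standard definitions; a minimal sketch, with counts chosen only to illustrate the computation:

```python
# Precision and recall from true positives (tp), false positives (fp),
# and false negatives (fn).
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts, not the paper's actual confusion matrix.
p, r = precision_recall(tp=90, fp=10, fn=10)  # → (0.9, 0.9)
```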