Reference Scope Identification in Citing Sentences
1. Reference Scope Identification
in Citing Sentences
Authors:
Amjad Abu-Jbara, Dragomir Radev
(University of Michigan)
Conference:
NAACL 2012
Expositor:
Akihiro Kameda
(Aizawa Lab., The University of Tokyo)
2. Abstract
● Problem:
● Multiple citations in one sentence
● There are many POS taggers developed using
different techniques for many major languages such
as transformation-based error-driven learning (Brill,
1995), decision trees (Black et al., 1992), Markov
model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
● Approach: Preprocessing
and 2+1+2*3+1=10 methods
4. Reference Preprocessing
(tagging, grouping, non-syntactical element removal)
● These constraints can be lexicalized (REF.1; REF.2),
unlexicalized (REF.3; TREF.4) or automatically learned
(REF.5; REF.6).
● These constraints can be lexicalized (GREF.1), unlexicalized
(GTREF.2) or automatically learned (GREF.3).
● (GTREF.1) apply fuzzy techniques for integrating source
syntax into hierarchical phrase-based systems (REF.2).
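The tagging and grouping steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the citation regex and the REF/GREF numbering scheme are simplified assumptions.

```python
import re

# Hypothetical regex for author-year citations like "(Brill, 1995)" or
# "(Smith, 2005; Jones, 2006)"; real citation styles need a richer pattern.
CITATION = re.compile(r"\(([A-Z][\w.]*(?: et al\.)?, \d{4}(?:; [^)]+)?)\)")

def tag_and_group(sentence):
    """Replace each parenthesized citation with a REF.i placeholder;
    2+ citations sharing one pair of parentheses form a group (GREF)."""
    refs = []
    def repl(match):
        cites = [c.strip() for c in match.group(1).split(";")]
        refs.append(cites)
        tag = "GREF" if len(cites) > 1 else "REF"
        return f"({tag}.{len(refs)})"
    return CITATION.sub(repl, sentence), refs

tagged, refs = tag_and_group(
    "These constraints can be lexicalized (Smith, 2005; Jones, 2006), "
    "unlexicalized (Brown, 2004) or automatically learned (Lee, 2007)."
)
```

Running this yields the tagged form used in the examples above, with the grouped pair collapsed to a single GREF token.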
5. Approach 1 (SVM, LR)
● Word classification
● with SVM, a logistic regression classifier
● Features: Distance, Position (before/after the target
reference), In Segment (segments delimited by , . ; and the
conjunctions and, but, for, nor, or, so, yet), POS tag,
Dependency Distance, Dependency Relations, Common Ancestor
Node, Syntactic Distance
● Problem Example:
● There are many POS taggers developed using different
techniques for many major languages such as transformation-
based error-driven learning (Brill, 1995), decision trees (Black et
al., 1992), Markov model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
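The surface features from the list above can be sketched in pure Python. This is an illustrative toy, not the paper's exact feature definitions: the segment index is a stand-in for the In-Segment feature, and the delimiter check is deliberately naive (it would also split on the preposition "for").

```python
# Each word in the citing sentence gets features relative to the target
# reference (TREF); the classifier then labels words inside/outside scope.
DELIMS = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}

def surface_features(words, tref_index):
    feats = []
    segment = 0
    for i, w in enumerate(words):
        feats.append({
            "distance": abs(i - tref_index),       # words between w and TREF
            "position": "before" if i < tref_index else "after",
            "segment": segment,                    # which segment w falls in
        })
        if w.lower() in DELIMS:
            segment += 1
    return feats

words = "decision trees ( TREF ) , Markov model ( REF )".split()
feats = surface_features(words, words.index("TREF"))
```

These per-word feature vectors would then be fed to the SVM or logistic regression classifier.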
6. Approach 2 (CRF)
● Sequence labeling with CRF
● Features are the same as in Approach 1
7. Approach 3-S1-* (CRF/segment)
● segmentation (1)
● punctuation marks
● coordination conjunctions
– and, but, for, nor, or, so, yet
● a set of special expressions
– "for example", "for instance", "including", "includes",
"such as", "like", etc.
● [Rerankers have been successfully applied to numerous
NLP tasks such as] [parse selection (GTREF)], [parse
reranking (GREF)], [question-answering (REF)].
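The three segmentation rules above can be sketched with simple substitutions. This marker-based version is an assumption for illustration, not the authors' code; specials are handled first so that "for example" is not split at its "for".

```python
import re

# Approach 3-S1 segmentation: break at punctuation marks, at coordinating
# conjunctions, and after special listing expressions.
SPECIALS = r"for example|for instance|including|includes|such as|like"
CONJUNCTIONS = r"\b(?:and|but|for|nor|or|so|yet)\b"

def segment(sentence):
    s = re.sub(SPECIALS, lambda m: m.group(0) + "|", sentence)
    s = re.sub(r"[,;.:]", "|", s)
    s = re.sub(CONJUNCTIONS, "|", s)
    return [seg.strip() for seg in s.split("|") if seg.strip()]

segs = segment("Rerankers have been successfully applied to numerous NLP "
               "tasks such as parse selection (GTREF), parse reranking "
               "(GREF), question-answering (REF).")
```

On the example sentence this reproduces the four bracketed segments shown on the slide.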
8. Approach 3-S2-* (CRF/segment)
● segmentation (2)
● chunking tool
– noun groups
– verb groups
– preposition groups
– adjective groups
– adverb groups
– other parts form segment by themselves
● [To] [score] [the output] [of] [the coreference models], [we]
[employ] [the commonly-used MUC scoring program (REF)]
[and] [the recently-developed CEAF scoring program (TREF)].
9. Approach 3-*-R1,2,3
(CRF/segment)
● R1: majority label of the words it contains
● R2: inside if any word is inside
● R3: outside if any word is outside
● [I O O O O] [I I I] [O O]
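The three lifting rules can be written directly from their definitions; the tie-break toward O in R1 is my assumption, as the slide does not specify it.

```python
# Lift per-word labels (I = inside scope, O = outside) to one segment label.
def segment_label(word_labels, rule):
    if rule == "R1":  # majority label of the words the segment contains
        return "I" if word_labels.count("I") > len(word_labels) / 2 else "O"
    if rule == "R2":  # inside if any word is inside
        return "I" if "I" in word_labels else "O"
    if rule == "R3":  # outside if any word is outside
        return "O" if "O" in word_labels else "I"
    raise ValueError(rule)

# The slide's example: [I O O O O] [I I I] [O O]
segments = [["I", "O", "O", "O", "O"], ["I", "I", "I"], ["O", "O"]]
labels_r1 = [segment_label(s, "R1") for s in segments]  # ['O', 'I', 'O']
labels_r2 = [segment_label(s, "R2") for s in segments]  # ['I', 'I', 'O']
labels_r3 = [segment_label(s, "R3") for s in segments]  # ['O', 'I', 'O']
```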
12. Data
● ACL Anthology Network Corpus
● 3,300 sentences, each containing ≥ 2 citations
Annotation agreement
● 500 of the 3,300 sentences
● Preprocessing was perfect
● Kappa coefficient of scope agreement (with P(E) = 1/2):
K = (P(A) − P(E)) / (1 − P(E)) = 2P(A) − 1 = 0.61
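The kappa formula above is a one-liner; with two equiprobable labels, chance agreement P(E) is 0.5, so K = 2P(A) − 1. The observed agreement P(A) = 0.805 below is not stated on the slide but follows from K = 0.61 by that same formula.

```python
# Cohen's kappa: K = (P(A) - P(E)) / (1 - P(E)).
def kappa(p_agree, p_chance=0.5):
    return (p_agree - p_chance) / (1 - p_chance)

k = kappa(0.805)  # → 0.61, the slide's reported agreement
```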
13. Tools
● Edinburgh Language Technology Text
Tokenization Toolkit (LT-TTT)
● text tokenization, part-of-speech tagging, chunking,
and noun phrase head identification.
● Stanford parser
● syntactic and dependency parsing
● LibSVM with linear kernel
● Weka
● logistic regression classification
14. Tools
● Machine Learning for Language Toolkit
(MALLET)
● CRF
Validation
● 10-fold cross validation
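A minimal sketch of the 10-fold split, assuming the 3,300 annotated sentences are partitioned round-robin by index (the actual fold assignment in the paper is not specified):

```python
# Partition n_items indices into k folds; each fold serves once as test set.
def k_fold(n_items, k=10):
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = k_fold(3300)  # 3,300 sentences from the data slide
```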
15. Experiment (Preprocessing)
These constraints can be lexicalized (REF.1; REF.2), ll
r ec a
●
unlexicalized (REF.3; TREF.4) or and 93 .1%learned
(REF.5; REF.6). 3% preci
s ion automatically
ng: 9 8 .
Taggi
● These constraints can be lexicalized (GREF.1), unlexicalized
(GTREF.2) or Perfect!
automatically learned (GREF.3).
Grouping:
(GTREF.1) apply fuzzy techniques for integrating source
a l:
●
syntax into hierarchicalence
removsystems (REF.2).
Non-syn tactic refer phrase-based ecall
9 0. 1% r
cision and
9 0.08% pre
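The precision and recall percentages reported for preprocessing follow the standard definitions; a minimal sketch, with counts chosen only to illustrate the computation:

```python
# Precision and recall from true positives (tp), false positives (fp),
# and false negatives (fn).
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts, not the paper's actual confusion matrix.
p, r = precision_recall(tp=90, fp=10, fn=10)  # → (0.9, 0.9)
```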