An Alignment-based Pattern Representation Model for Information Extraction
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee
{megaup, stardust, gblee}@postech.ac.kr
Abstract - In this paper, we propose an alternative pattern representation model and an effective method of utilizing it. Whereas previous pattern representation models depend entirely on the result of dependency analysis, our approach is based on lexical alignment and treats the result of dependency analysis only as a meaningful feature of the alignment process. In this way, we can cope with the errors of incomplete dependency analysis. An evaluation on a scenario template task shows that our proposed model outperforms the previous syntax-dependent models.
Problem Definition

Information Extraction
- Extracting a defined number of relevant arguments from natural-language documents via an extraction pattern
  Ex) "About 50 peasants have been kidnapped by terrorists of the FMNL"
      -> incident type: kidnapping, perp_ind: terrorists, perp_org: FMNL, hum_tgt: peasants
- Subtasks
  # of arguments | subtask
  1              | named-entity recognition
  2              | binary relation extraction
  more than 2    | relation/event extraction

Approach
- Automatic pattern learning
  - Pattern representation model
  - Pattern learning algorithm

Related Works

Lexical Sequence Pattern Models
- A pattern is a set of lexical sequences
  Ex) <HUM_TGT> be kidnapped
      be kidnapped by <PERP_IND>

Syntax-dependent Pattern Models
- A pattern is a set of subtrees of the dependency tree (D-tree)
  Ex) kidnapped -nsubjpass-> peasants, kidnapped -agent-> terrorists, terrorists -prep_of-> FMNL
      (kidnapped ({HUM_TGT}-nsubjpass))
      (kidnapped ({PERP_IND}-agent))
      (kidnapped ({PERP_IND}-agent ({PERP_ORG}-prep_of)))
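To make the contrast between the two pattern families concrete, here is a toy illustration (not the paper's code): lexical sequence patterns as flat token lists, subtree patterns as nested `(head, [(relation, child), ...])` tuples, and a deliberately rigid matcher for the lexical case. The slot tokens and the matcher are our own simplifications.

```python
# Toy illustration of the two pattern families from the related work.
sentence = "about 50 peasants have been kidnapped by terrorists of the FMNL".split()

# Lexical sequence patterns: flat token sequences with slot placeholders.
lexical_patterns = [
    ["<HUM_TGT>", "be", "kidnapped"],            # "be" is the lemma of "been"
    ["be", "kidnapped", "by", "<PERP_IND>"],
]

# Syntax-dependent patterns: subtrees of the dependency tree, written here as
# (head, [(relation, child), ...]) nests.
subtree_patterns = [
    ("kidnapped", [("nsubjpass", "{HUM_TGT}")]),
    ("kidnapped", [("agent", "{PERP_IND}")]),
    ("kidnapped", [("agent", ("{PERP_IND}", [("prep_of", "{PERP_ORG}")]))]),
]

def matches(pattern, tokens):
    """Rigid lexical matching: the pattern must occur verbatim as a contiguous
    subsequence, with slot tokens (<...>) matching any single word."""
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(p.startswith("<") or p == t for p, t in zip(pattern, window)):
            return True
    return False
```

Note that `matches(lexical_patterns[1], sentence)` fails on the surface tokens ("been" is not "be"); this kind of rigidity is what the soft matching of the proposed model is meant to relax.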
Method

Our Approach: Pattern Model
- Lexical sequence pattern + term weights (from dependency analysis)
  Ex) <HUM_TGT>  of    [NP]  have  been  kidnapped  by   <PERP_IND>
      1          0.33  0.33  4     4     4          1.5  1.5

Pattern Sequences Extraction
1) Searching for the sentences containing all arguments of each tuple in the source documents
2) Segmenting out a subpart of the sentence based on clausal boundaries
3) Replacing the argument parts of the sub-sentence with argument labels

Computing Term Weights
- wi = (ri + c) / (di + c)
  wi: weight of the i-th term
  ri: number of relevant terms within the subtree with ti as root
  di: distance from the root node
  c:  smoothing constant (default: 1)
- Ex) dependency tree of the example pattern:
  kidnapped                      (3+1)/(0+1) = 4
    -nsubjpass-> <HUM_TGT>       (1+1)/(1+1) = 1
        -prep_of-> [NP]          (0+1)/(2+1) = 0.33
    -agent-> <PERP_IND>          (2+1)/(1+1) = 1.5
        -prep_of-> <PERP_ORG>    (1+1)/(2+1) = 0.67

Soft Pattern Matching
- Sequence alignment between an input sentence and a pattern
  Ex) about 50 peasants have been kidnapped by terrorists
      <HUM_TGT> of [NP] have been kidnapped by <PERP_IND>
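The term-weighting scheme wi = (ri + c) / (di + c) can be sketched directly over the example tree. The dictionary-based tree encoding below is our own assumption; the numbers reproduce the weights shown in the example.

```python
# Toy dependency tree for the example pattern (structure from the poster
# example); each node maps to its list of children.
tree = {
    "kidnapped": ["<HUM_TGT>", "<PERP_IND>"],
    "<HUM_TGT>": ["[NP]"],
    "<PERP_IND>": ["<PERP_ORG>"],
    "[NP]": [],
    "<PERP_ORG>": [],
}
SLOTS = {"<HUM_TGT>", "<PERP_IND>", "<PERP_ORG>"}  # the "relevant terms"

def relevant(node):
    """r_i: number of argument slots in the subtree rooted at node."""
    return (node in SLOTS) + sum(relevant(c) for c in tree[node])

def weight(node, depth, c=1):
    """w_i = (r_i + c) / (d_i + c), with d_i the distance from the root."""
    return (relevant(node) + c) / (depth + c)
```

For instance, `weight("kidnapped", 0)` gives 4 (three slots in its subtree, at the root), while the deep non-slot node `[NP]` is down-weighted to 0.33.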
Pattern Matching

Sequence Alignment
- Based on dynamic programming
- Alignment matrix (pattern terms x sentence words):

               peasants  have  been  kidnapped  by    terrorists
  <HUM_TGT>    1         0     0     0          0     0
  of           0         1     0     0          0     0
  [NP]         0         0.66  1     0          0     0
  have         0         4     3     2          1     0
  been         0         0     8     7          6     5
  kidnapped    0         0     4     12         11    10
  by           0         0     0     8          13.5  12.5
  <PERP_IND>   1.5       0.5   0     0          0     15

- Matrix computation (gp: gap penalty):
  Mi,j = max( Mi-1,j-1 + simi-1,j-1 * wi-1,
              Mi-1,j + gp * wi-1,
              Mi,j-1 + gp * wi,
              0 )

Experiment

Experimental Setup
- Data: MUC-3/4 data about terrorism events
  - Simpler template structure with 4 slots: perp_ind, perp_org, phys_tgt, hum_tgt
  - Dev-set for training, Test-set for evaluation
- Preprocessing: dependency parsing and NP-chunking with the Stanford Parser
- Extracting pattern candidates: all pattern candidates were selected for testing, without pattern filtering, in order to compare the representative performance of the pattern models rather than any pattern filtering method
- Pattern models compared:
  - SVO model (Yangarber '00)
  - Linked-chain model (Greenwood '06)
  - Subtree model (Sudo '03)
  - Our model (alignment)

Experimental Result
  Model             Precision  Recall  F-measure
  SVO               21.74      20.62   21.16
  Linked-Chain      20.04      26.55   22.84
  Subtree           23.34      32.73   27.25
  Alignment (ours)  23.35      45.62   30.89

- Our proposed model achieved much higher recall than the other models with similar precision.
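The soft-matching recurrence can be sketched as a weighted local alignment (Smith-Waterman style). The gap handling below is one reading of the formula, with gp = -1: skipping a pattern term costs the weight of the term above it, skipping a sentence word costs a unit penalty; the slot similarities are a toy lookup, not the paper's actual similarity function. Under these assumptions the sketch reproduces the best score of 15 from the example matrix.

```python
# A sketch of the soft pattern matching alignment (assumptions noted above).
pattern = ["<HUM_TGT>", "of", "[NP]", "have", "been", "kidnapped", "by", "<PERP_IND>"]
weights = [1.0, 0.33, 0.33, 4.0, 4.0, 4.0, 1.5, 1.5]
sentence = ["peasants", "have", "been", "kidnapped", "by", "terrorists"]

# Toy slot/word similarities; exact word matches also score 1.
SLOT_SIM = {("<HUM_TGT>", "peasants"),
            ("<PERP_IND>", "peasants"),
            ("<PERP_IND>", "terrorists")}

def sim(term, word):
    return 1.0 if term == word or (term, word) in SLOT_SIM else 0.0

def align(pattern, weights, sentence, gp=-1.0):
    """Weighted local alignment between a pattern and a sentence; returns
    the best score found anywhere in the alignment matrix."""
    P, S = len(pattern), len(sentence)
    M = [[0.0] * (S + 1) for _ in range(P + 1)]
    for i in range(1, P + 1):
        for j in range(1, S + 1):
            M[i][j] = max(
                # match/substitute, scaled by the pattern term's weight
                M[i - 1][j - 1] + sim(pattern[i - 1], sentence[j - 1]) * weights[i - 1],
                # skip a pattern term (penalty scaled by the previous term's weight)
                M[i - 1][j] + gp * (weights[i - 2] if i >= 2 else 0.0),
                # skip a sentence word (unit penalty)
                M[i][j - 1] + gp,
                0.0,
            )
    return max(max(row) for row in M)
```

The best-scoring path for the example aligns "have been kidnapped by <PERP_IND>" against "have been kidnapped by terrorists" (4 + 4 + 4 + 1.5 + 1.5 = 15), so partial matches survive even when the pattern's leading terms find no counterpart.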