An Alignment-based Pattern Representation Model for Information Extraction
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee
{megaup, stardust, gblee}@postech.ac.kr
Abstract - We propose an alternative pattern representation model and an effective method of using it. Previous pattern representation models depend entirely on the result of dependency analysis. Our approach is instead based on lexical alignment and treats the result of dependency analysis only as a feature of the alignment process, which lets it cope with errors from incomplete dependency analysis. An evaluation on a scenario template task shows that the proposed model outperforms previous syntax-dependent models.


Pattern Representation Model for Information Extraction

Information Extraction
  Extracting a predefined number of relevant arguments from natural language documents
  Subtasks
    # of arguments   subtask
    1                named-entity recognition
    2                binary relation extraction
    more than 2      relation/event extraction
  Approach
    Automatic Pattern Learning
      Pattern Representation Model
      Pattern Learning Algorithm

Pattern Representation Model for IE
  Problem Definition
    Ex) About 50 peasants have been kidnapped by terrorists of the FMNL
        Extraction Pattern: ?
        incident type   kidnapping
        perp_ind        terrorists
        perp_org        FMNL
        hum_tgt         peasants

Related Works
  Lexical Sequence Pattern Models
    A set of lexical sequences
      <hum_tgt> be kidnapped
      be kidnapped by <perp_ind>
  Syntax-dependent Pattern Models
    A set of subtrees (from D-tree)
      kidnapped
        nsubjpass -> peasants
        agent -> terrorists
          prep_of -> FMNL
      (kidnapped ({HUM_TGT}-nsubjpass))
      (kidnapped ({PERP_IND}-agent))
      (kidnapped ({PERP_IND}-agent ({PERP_ORG}-prep_of)))

Our Approach
  Pattern Model
    Lexical Sequence Pattern + Term Weight (from Dependency Analysis)
      term    <HUM_TGT>  of    [NP]  have  been  kidnapped  by   <PERP_IND>
      weight  1          0.33  0.33  4     4     4          1.5  1.5
  Soft Pattern Matching
    Sequence Alignment
      sentence  about 50 peasants have been kidnapped by terrorists
      pattern   <HUM_TGT> of [NP] have been kidnapped by <PERP_IND>

Method
  Pattern Sequences Extraction
    1) Search the source documents for sentences containing all arguments of each tuple
    2) Segment out the subpart of the sentence based on clausal boundaries
    3) Replace the argument spans in the sub-sentence with argument labels
  Computing Term Weights
    $w_i = (r_i + c) / (d_i + c)$
      $w_i$ : weight of the i-th term
      $r_i$ : number of relevant terms within the subtree rooted at $t_i$
      $d_i$ : distance from the root node
      $c$   : smoothing constant (default: 1)
    Ex)
      kidnapped (root)              (3+1)/(0+1) = 4
        nsubjpass -> <HUM_TGT>      (1+1)/(1+1) = 1
          prep_of -> [NP]           (0+1)/(2+1) = 0.33
        agent -> <PERP_IND>         (2+1)/(1+1) = 1.5
          prep_of -> <PERP_ORG>     (1+1)/(2+1) = 0.67
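The term-weight formula can be checked against the poster's example tree with a short sketch. The `Node` class and the `is_argument` helper are hypothetical names introduced here for illustration. The sketch assumes that a "relevant term" is an argument label and that a node counts itself, a reading that is consistent with the worked example. Slot labels use the `perp_*` spelling from the template slots listed under Experimental Setup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One term of a dependency tree (hypothetical helper type)."""
    term: str
    children: list = field(default_factory=list)


def is_argument(term: str) -> bool:
    # Assumption: argument labels are written as <LABEL>.
    return term.startswith("<") and term.endswith(">")


def term_weights(root: Node, c: float = 1.0) -> dict:
    """Compute w_i = (r_i + c) / (d_i + c) for every node under root."""
    weights = {}

    def visit(node: Node, depth: int) -> int:
        # r_i: argument labels in the subtree rooted at this node (itself included).
        relevant = int(is_argument(node.term))
        for child in node.children:
            relevant += visit(child, depth + 1)
        weights[node.term] = (relevant + c) / (depth + c)
        return relevant

    visit(root, 0)
    return weights


# The example tree from the poster.
tree = Node("kidnapped", [
    Node("<HUM_TGT>", [Node("[NP]")]),
    Node("<PERP_IND>", [Node("<PERP_ORG>")]),
])

for term, weight in term_weights(tree).items():
    print(f"{term:12s} {weight:.2f}")
```

With the default `c = 1`, this yields 4 for `kidnapped`, 1 for `<HUM_TGT>`, 1.5 for `<PERP_IND>`, 0.33 for `[NP]`, and 0.67 for `<PERP_ORG>`, matching the worked example.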
Pattern Matching
  Sequence Alignment
    Based on Dynamic Programming
  Alignment Matrix
                 peasants  have  been  kidnapped  by    terrorists
    <HUM_TGT>    1         0     0     0          0     0
    of           0         1     0     0          0     0
    [NP]         0         0.66  1     0          0     0
    have         0         4     3     2          1     0
    been         0         0     8     7          6     5
    kidnapped    0         0     4     12         11    10
    by           0         0     0     8          13.5  12.5
    <PERP_IND>   1.5       0.5   0     0          0     15
  Matrix Computation
    $$M_{i,j} = \max \begin{cases} M_{i-1,j-1} + sim_{i-1,j-1} \cdot w_{i-1} \\ M_{i-1,j} + gp \cdot w_{i-1} \\ M_{i,j-1} + gp \cdot w_{i} \\ 0 \end{cases}$$

Experiment
  Experimental Setup
    Data
      MUC-3/4 Data
        About the Terrorism Events
        Simpler template structure with 4 slots: perp_ind, perp_org, phys_tgt, hum_tgt
        Dev-set (training), Test-set (evaluation)
    Preprocessing
      Dependency Parsing and NP-chunking
        Stanford Parser
    Extracting Pattern Candidates
      Selecting all pattern candidates for test
      Without pattern filtering
      The aim is to compare the pattern models themselves, not pattern filtering methods
  Experimental Result
    Pattern Models
      SVO Model (Yangarber '00)
      Linked-Chain Model (Greenwood '06)
      Subtree Model (Sudo '03)
      Our Model
    Result
      Model         Precision  Recall  F-measure
      SVO           21.74      20.62   21.16
      Linked-Chain  20.04      26.55   22.84
      Subtree       23.34      32.73   27.25
      Alignment     23.35      45.62   30.89
    Our proposed model achieved much higher recall than the other models (45.62 vs. 32.73 for the Subtree model) at similar precision
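The Matrix Computation recurrence under Pattern Matching can be sketched as a weighted local alignment. This is an illustration under stated assumptions, not the authors' implementation. The poster does not define `sim` or `gp`: here `default_sim` is a hypothetical similarity (+1 for an exact match or a slot label, -1 otherwise) and `gp` is taken to be a gap penalty of -1. The sketch also uses the weight of the current pattern term for all three moves, which simplifies the poster's index convention, so it is not expected to reproduce the Alignment Matrix values.

```python
def default_sim(pattern_term: str, token: str) -> float:
    """Hypothetical similarity: slot labels match any token, words must match exactly."""
    if pattern_term.startswith("<") and pattern_term.endswith(">"):
        return 1.0
    return 1.0 if pattern_term == token else -1.0


def align(pattern, weights, tokens, sim=default_sim, gp=-1.0):
    """Fill the alignment matrix and return (best score, matrix).

    pattern : list of pattern terms
    weights : one term weight per pattern term
    tokens  : list of sentence tokens
    gp      : gap penalty (assumed negative; not specified on the poster)
    """
    n, m = len(pattern), len(tokens)
    # Row 0 and column 0 stay at zero, as in a local alignment.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        w = weights[i - 1]  # weight of the pattern term on this row
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sim(pattern[i - 1], tokens[j - 1]) * w,  # match or mismatch
                M[i - 1][j] + gp * w,  # skip a pattern term
                M[i][j - 1] + gp * w,  # skip a sentence token
                0.0,                   # restart the alignment
            )
            best = max(best, M[i][j])
    return best, M


score, matrix = align(
    ["<HUM_TGT>", "kidnapped"], [1.0, 4.0], ["peasants", "kidnapped"]
)
print(score)  # 5.0
```

Because every move is scaled by the term weight, a mismatch or gap on a heavily weighted term such as `kidnapped` costs more than one on a low-weight term such as `of`, which is what makes the matching soft.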

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

An Alignment-based Pattern Representation Model for Information Extraction

  • 1. An Alignment-based Pattern Representation Model for Information Extraction
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee
{megaup, stardust, gblee}@postech.ac.kr

Abstract - In this paper, we propose an alternative pattern representation model and an effective method for using it. Whereas previous pattern representation models depend entirely on the result of dependency analysis, our approach is based primarily on lexical alignment and treats the result of dependency analysis only as a feature of the alignment process. In this way, we can cope with errors from incomplete dependency analysis. An evaluation on a scenario template task shows that our proposed model outperforms the previous syntax-dependent models.

Pattern Representation Model for Information Extraction

Information Extraction
  - Extracting a defined number of relevant arguments from natural language documents
  - Subtasks
      # of arguments    Subtask
      1                 named-entity recognition
      2                 binary relation extraction
      more than 2       relation/event extraction
  - Approach: Automatic Pattern Learning
      - Pattern Representation Model
      - Pattern Learning Algorithm

Pattern Representation Model for IE
  - Problem Definition
      Example: About 50 peasants have been kidnapped by terrorists of the FMNL
      Sentence -> Extraction pattern (?) -> Template
      Template:
        incident type    kidnapping
        perp_ind         terrorists
        perp_org         FMNL
        hum_tgt          peasants

Related Works
  - Lexical Sequence Pattern Models
      - A set of lexical sequences
          <hum_tgt> be kidnapped
          be kidnapped by <perp_ind>
  - Syntax-dependent Pattern Models
      - A set of subtrees (from the dependency tree)
          kidnapped
            nsubjpass -> peasants (hum_tgt)
            agent -> terrorists (perp_ind)
              prep_of -> FMNL (perp_org)
      - Resulting patterns
          (kidnapped ({HUM_TGT}-nsubjpass))
          (kidnapped ({PERP_IND}-agent))
          (kidnapped ({PERP_IND}-agent ({PERP_ORG}-prep_of)))

Method

Pattern Sequences Extraction
  1) Search the source documents for sentences containing all arguments of each tuple
  2) Segment out the relevant part of each sentence based on clausal boundaries
  3) Replace the argument mentions in the sub-sentence with argument labels

Our Approach: Pattern Model
  - Lexical Sequence Pattern + Term Weight (from Dependency Analysis)
      Term:    <HUM_TGT>  of    [NP]  have  been  kidnapped  by   <PERP_IND>
      Weight:  1          0.33  0.33  4     4     4          1.5  1.5
  - Computing Term Weights
      w_i = (r_i + c) / (d_i + c)
        w_i : weight of the i-th term
        r_i : number of relevant terms within the subtree rooted at t_i
        d_i : distance from the root node
        c   : smoothing constant (default: 1)
  - Example
      kidnapped                    (3+1)/(0+1) = 4
        nsubjpass -> <HUM_TGT>     (1+1)/(1+1) = 1
          prep_of -> [NP]          (0+1)/(2+1) = 0.33
        agent -> <PERP_IND>        (2+1)/(1+1) = 1.5
          prep_of -> <PERP_ORG>    (1+1)/(2+1) = 0.67

Soft Pattern Matching: Sequence Alignment
  - The sentence is aligned against the pattern sequence
      Sentence:  about 50 peasants have been kidnapped by terrorists
      Pattern:   <HUM_TGT> of [NP] have been kidnapped by <PERP_IND>
  - Based on dynamic programming
  - Alignment Matrix (rows: pattern terms, columns: sentence tokens)
                   peasants  have  been  kidnapped  by    terrorists
      <HUM_TGT>    1         0     0     0          0     0
      of           0         1     0     0          0     0
      [NP]         0         0.66  1     0          0     0
      have         0         4     3     2          1     0
      been         0         0     8     7          6     5
      kidnapped    0         0     4     12         11    10
      by           0         0     0     8          13.5  12.5
      <PERP_IND>   1.5       0.5   0     0          0     15
  - Matrix Computation
      M_{i,j} = max( M_{i-1,j-1} + sim_{i-1,j-1} * w_{i-1},
                     M_{i-1,j} + gp * w_{i-1},
                     M_{i,j-1} + gp * w_i,
                     0 )

Experiment

Experimental Setup
  - Data
      - MUC-3/4 data on terrorism events
      - Simpler template structure with 4 slots: perp_ind, perp_org, phys_tgt, hum_tgt
      - Dev-set (training), Test-set (evaluation)
  - Preprocessing
      - Dependency parsing and NP-chunking
      - Stanford Parser
  - Extracting Pattern Candidates
      - All pattern candidates are selected for the test, without pattern filtering
      - The aim is to compare the pattern models themselves, not the pattern filtering methods

Experimental Result
  - Pattern Models
      - SVO Model (Yangarber '00)
      - Linked-Chain Model (Greenwood '06)
      - Subtree Model (Sudo '03)
      - Our Model
  - Result
      Model          Precision  Recall  F-measure
      SVO            21.74      20.62   21.16
      Linked-Chain   20.04      26.55   22.84
      Subtree        23.34      32.73   27.25
      Alignment      23.35      45.62   30.89
  - Our proposed model achieved much higher recall than the other models (45.62 versus 32.73 for the Subtree model, the strongest baseline) at similar precision (23.35 versus 23.34).
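
The term-weight formula and the alignment recurrence from the Method section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Node` type, the `sim` function (slot labels match any token, all other terms must match exactly), and the gap penalty `gp = -1` are assumptions made for illustration, since the poster does not define them. The sketch also applies the weight of the current pattern term to all three moves, which simplifies the indexing shown in the recurrence, so it is not expected to reproduce the poster's alignment matrix cell by cell.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One term of a pattern's dependency tree (hypothetical helper type)."""
    term: str
    relevant: bool = False  # True for argument labels such as <HUM_TGT>
    children: list = field(default_factory=list)


def term_weights(root, c=1.0):
    """Map each term to w = (r + c) / (d + c)."""
    weights = {}

    def visit(node, depth):
        # r: relevant terms in the subtree rooted at this node (itself included)
        r = int(node.relevant) + sum(visit(ch, depth + 1) for ch in node.children)
        weights[node.term] = (r + c) / (depth + c)
        return r

    visit(root, 0)
    return weights


def sim(pattern_term, token):
    """Assumed similarity: slot labels match any token, other terms must be equal."""
    if pattern_term.startswith("<") or pattern_term == token:
        return 1.0
    return 0.0


def align(pattern, weights, sentence, gp=-1.0):
    """Best local alignment score of a weighted pattern against a token list."""
    rows, cols = len(pattern), len(sentence)
    M = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        w = weights[i - 1]  # weight of the pattern term on row i
        for j in range(1, cols + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sim(pattern[i - 1], sentence[j - 1]) * w,
                M[i - 1][j] + gp * w,  # pattern term left unmatched
                M[i][j - 1] + gp * w,  # sentence token skipped
                0.0,                   # local alignment: never drop below zero
            )
            best = max(best, M[i][j])
    return best


# Dependency tree of the example pattern from the poster.
tree = Node("kidnapped", children=[
    Node("<HUM_TGT>", relevant=True, children=[Node("[NP]")]),
    Node("<PERP_IND>", relevant=True,
         children=[Node("<PERP_ORG>", relevant=True)]),
])
w = term_weights(tree)

pattern = ["<HUM_TGT>", "of", "[NP]", "have", "been", "kidnapped", "by", "<PERP_IND>"]
pattern_weights = [1, 0.33, 0.33, 4, 4, 4, 1.5, 1.5]  # weights listed in the poster
sentence = "peasants have been kidnapped by terrorists".split()
score = align(pattern, pattern_weights, sentence)
```

For the example tree, `term_weights` yields 4 for `kidnapped`, 1 for `<HUM_TGT>`, 1.5 for `<PERP_IND>`, 0.33 for `[NP]`, and 0.67 for `<PERP_ORG>`, matching the worked example. Because unmatched pattern terms such as `of` only incur a penalty scaled by their small weight, the heavily weighted terms around the predicate dominate the score, which is what makes the matching tolerant of lexical variation.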