Using Parallel Propbanks to
 Enhance Word-Alignments
The 3rd Linguistic Annotation Workshop at ACL ’09
                  August 7th, 2009


Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Nianwen Xue (Brandeis University)
Parallel Propbanks
•   Propbank
    -    Corpus annotated with verbal propositions and their
         arguments (semantic roles)
        [ Gansu Province] also actively [ explored ] [ high risk business]
           Arg0: explorer                            Arg1: things explored


•   Parallel Propbanks
    -    Propbanks annotated on a parallel corpus
        (parallel Chinese sentence with the same Arg0 and Arg1 annotation; characters lost in extraction)



                                    2
Word-Alignments
•   Given parallel sentences, discover the translation of each
    word
    (Chinese sentence; characters lost in extraction)


Construction is a principal economic activity in developing Pudong


•   GIZA++: a statistical machine translation toolkit
    -   It is hard to verify whether the alignments are correct.

    -   Low-frequency words may not get aligned.

    -   It does not account for semantics.



                                     3
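GIZA++ output amounts to a set of links between source and target word positions. A minimal sketch, assuming the links are stored as (source index, target index) pairs in the `i-j` text format (for example, the format Moses writes after symmetrizing GIZA++ output); the helper names and example links are hypothetical. It shows how the words that "may not get aligned" can be detected.

```python
def parse_alignment(line: str) -> set[tuple[int, int]]:
    """Parse an 'i-j' alignment line (e.g. '0-1 2-3') into (source, target) index pairs."""
    pairs: set[tuple[int, int]] = set()
    for token in line.split():
        src, tgt = token.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def unaligned_source_positions(num_source_tokens: int, pairs: set[tuple[int, int]]) -> list[int]:
    """Return the source positions that take part in no alignment link."""
    linked = {src for src, _ in pairs}
    return [i for i in range(num_source_tokens) if i not in linked]


if __name__ == "__main__":
    # Hypothetical 4-token source sentence; position 2 received no link.
    links = parse_alignment("0-0 1-2 3-4")
    print(unaligned_source_positions(4, links))  # [2]
```

Unaligned Chinese verbs found this way are the input to the bottom-up step described later in the deck.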
Predicate Matching (based on GIZA++)
•    English Chinese Parallel Treebank (ECTB)
    -     Xinhua: Chinese newswire + literal translation

    -     Sinorama: Chinese news magazine + non-literal translation

        Xinhua: 12,895                              Sinorama: 40,086


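    [Pie charts, Xinhua and Sinorama: share of Chinese verbs whose GIZA++ alignment is an English verb (En.verb), a form of "be" (En.be), another word (En.else), or nothing (En.none). Slice values: Xinhua 45%, 32%, 19%, 3%; Sinorama 56%, 22%, 19%, 3%. Which value belongs to which category is not recoverable.]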


                                    6
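The four categories in these charts can be computed by bucketing each Chinese verb according to the English words GIZA++ aligned it to. A minimal sketch, assuming Penn Treebank part-of-speech tags on the English side and a hypothetical precedence (a non-"be" verb beats "be", which beats any other word); the slide does not say how verbs with several links were counted, and the list of "be" forms is likewise an assumption.

```python
from collections import Counter

CATEGORIES = ("En.verb", "En.be", "En.else", "En.none")

# Assumed list of "be" forms; the slide does not define En.be precisely.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def categorize(aligned: list[tuple[str, str]]) -> str:
    """Bucket one Chinese verb by its aligned English (word, Penn Treebank tag) pairs."""
    if not aligned:
        return "En.none"
    if any(tag.startswith("VB") and word.lower() not in BE_FORMS for word, tag in aligned):
        return "En.verb"
    if any(word.lower() in BE_FORMS for word, _ in aligned):
        return "En.be"
    return "En.else"


def distribution(verbs: list[list[tuple[str, str]]]) -> dict[str, float]:
    """Share of Chinese verbs in each category (all zeros for an empty input)."""
    if not verbs:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(categorize(aligned) for aligned in verbs)
    return {cat: counts[cat] / len(verbs) for cat in CATEGORIES}


if __name__ == "__main__":
    # Four hypothetical Chinese verbs and the English words each was aligned to.
    sample = [
        [("explored", "VBD")],
        [("is", "VBZ")],
        [("construction", "NN")],
        [],
    ]
    print(distribution(sample))
    # {'En.verb': 0.25, 'En.be': 0.25, 'En.else': 0.25, 'En.none': 0.25}
```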
Top-down Argument Matching
•   Verify word-alignments
    -   For each Chinese verb vc aligned to some English verb ve

    -   Accept the alignment as correct if the arguments of
        vc and ve match

        [ Arg0 ] [ ArgM ] [ ArgM ] [ Rel ] [ Arg1 ]   (Chinese sentence; characters lost in extraction)

        [ Gansu Province ] [ also ] [ actively ] [ explored ] [ high risk business ]
               Arg0          ArgM       ArgM         Rel               Arg1

                                      Bingo!

                                  7
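The top-down check can be sketched as follows. The deck's own matching score did not survive extraction, so this sketch swaps in a simple hypothetical score: the fraction of Chinese arguments that have at least one word aligned into an English argument with the same label. Arguments are (label, set of word positions) pairs, links are (Chinese position, English position) pairs, and the default threshold of 0.5 only mirrors the Mac.th value reported for TDAM in the evaluation; all names are illustrative.

```python
Argument = tuple[str, set[int]]  # (label such as "Arg0", word positions)
Links = set[tuple[int, int]]     # (Chinese position, English position)


def argument_match_score(zh_args: list[Argument], en_args: list[Argument], links: Links) -> float:
    """Hypothetical score: share of Chinese arguments aligned into a same-label English argument."""
    if not zh_args:
        return 0.0
    matched = 0
    for zh_label, zh_span in zh_args:
        targets = {en for zh, en in links if zh in zh_span}
        if any(label == zh_label and bool(targets & en_span) for label, en_span in en_args):
            matched += 1
    return matched / len(zh_args)


def verify_alignment(zh_args: list[Argument], en_args: list[Argument], links: Links,
                     threshold: float = 0.5) -> bool:
    """Top-down step: keep a verb-to-verb link only if the arguments match well enough."""
    return argument_match_score(zh_args, en_args, links) >= threshold


if __name__ == "__main__":
    # Hypothetical word positions for the slide's example; the predicates are not listed.
    zh_args = [("Arg0", {0}), ("ArgM", {1}), ("ArgM", {2}), ("Arg1", {4})]
    # English: Gansu(0) Province(1) also(2) actively(3) explored(4) high(5) risk(6) business(7)
    en_args = [("Arg0", {0, 1}), ("ArgM", {2}), ("ArgM", {3}), ("Arg1", {5, 6, 7})]
    links = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (4, 7)}
    print(verify_alignment(zh_args, en_args, links))  # True (score 1.0)
```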
Bottom-up Argument Matching
      •   Expand word-alignments
          -    For each Chinese verb vc aligned to no English word

          -    Align vc to the English verb ve that maximizes
               the argument matching with vc



        [ Arg0 ] [ A.M ] [ A.M ] [ A.M ] [ Arg1 ] [ Rel ]   (Chinese sentence; characters lost in extraction)

        [ Foreign funded enterprises in Gansu Province ] [ no ] [ longer ] [ worry about ] [ investment risk ]
                              Arg0                        A.M      A.M           Rel              Arg1



                                             8
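For a Chinese verb that GIZA++ left unaligned, the bottom-up step can be sketched as an argmax over the English verbs of the sentence. A minimal sketch under stated assumptions: the matching score is a hypothetical stand-in (the share of Chinese arguments with at least one word aligned into an English argument of the same label), not the authors' formula; the deck thresholds two scores for this step, while this sketch thresholds one, with a 0.8 default that only mirrors the reported Mac.th; word positions and links in the example are invented.

```python
from typing import Optional

Argument = tuple[str, set[int]]  # (label, word positions)
Links = set[tuple[int, int]]     # (Chinese position, English position)


def argument_match_score(zh_args: list[Argument], en_args: list[Argument], links: Links) -> float:
    """Hypothetical score: share of Chinese arguments aligned into a same-label English argument."""
    if not zh_args:
        return 0.0
    matched = 0
    for zh_label, zh_span in zh_args:
        targets = {en for zh, en in links if zh in zh_span}
        if any(label == zh_label and bool(targets & en_span) for label, en_span in en_args):
            matched += 1
    return matched / len(zh_args)


def expand_alignment(zh_args: list[Argument], candidates: dict[int, list[Argument]],
                     links: Links, threshold: float = 0.8) -> Optional[int]:
    """Bottom-up step: return the English verb position whose arguments match best, or None."""
    best_verb: Optional[int] = None
    best_score = -1.0
    for verb_position, en_args in candidates.items():
        score = argument_match_score(zh_args, en_args, links)
        if score > best_score:
            best_verb, best_score = verb_position, score
    return best_verb if best_verb is not None and best_score >= threshold else None


if __name__ == "__main__":
    # English: Foreign(0) funded(1) enterprises(2) in(3) Gansu(4) Province(5)
    #          no(6) longer(7) worry(8) about(9) investment(10) risk(11)
    candidates = {
        1: [("ArgM", {0}), ("Arg1", {2})],                        # funded
        8: [("Arg0", {0, 1, 2, 3, 4, 5}), ("ArgM", {6}),
            ("ArgM", {7}), ("Arg1", {10, 11})],                   # worry
    }
    # Hypothetical Chinese argument positions; the unaligned Chinese verb is not listed.
    zh_args = [("Arg0", {0, 1, 2, 3}), ("ArgM", {4}), ("ArgM", {5}), ("ArgM", {6}), ("Arg1", {7, 8})]
    links = {(0, 4), (0, 5), (1, 0), (2, 1), (3, 2), (4, 6), (5, 7), (7, 10), (8, 11)}
    print(expand_alignment(zh_args, candidates, links))  # 8, i.e. "worry" (score 0.8)
```

Scoring every English verb, including embedded ones such as "funded", is what lets the argument structure pick "worry" over a closer but structurally mismatched candidate.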
Bottom-up Argument Matching
      •   Expand word-alignments
          -    For each Chinese verb vc aligned to no English word

          -    Align vc to the English verb ve that maximizes
               the argument matching with vc
           ArgM        Rel          Arg1
        [ Foreign ] [ funded ] [ enterprises ] in Gansu Province no longer worry about investment risk

        [ Arg0 ] [ A.M ] [ A.M ] [ A.M ] [ Arg1 ] [ Rel ]   (Chinese sentence; characters lost in extraction)

        [ Foreign funded enterprises in Gansu Province ] [ no ] [ longer ] [ worry about ] [ investment risk ]
                              Arg0                        A.M      A.M           Rel              Arg1



                                             8
Argument Matching Score
•   Macro argument matching score




•   Micro argument matching score




•   Thresholds
    -   Top-down: thresholds on macro score

    -   Bottom-up: thresholds on both macro and micro scores



                                9
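The score formulas on this slide were lost in extraction. Purely as an illustration of how a macro and a micro score can differ, the sketch below uses the generic averaging distinction: the macro score averages a per-argument ratio (each Chinese argument counts equally), while the micro score pools word counts over all arguments (longer arguments weigh more). These definitions are assumptions, not the authors' formulas; the threshold defaults only mirror the Mac.th and Mic.th values reported in the evaluation.

```python
Argument = tuple[str, set[int]]  # (label, word positions)
Links = set[tuple[int, int]]     # (Chinese position, English position)


def _hits(zh_label: str, zh_span: set[int], en_args: list[Argument], links: Links) -> int:
    """Count words of one Chinese argument linked into any same-label English argument."""
    same_label = set().union(*(span for label, span in en_args if label == zh_label))
    return sum(1 for z in zh_span if any(e in same_label for zz, e in links if zz == z))


def macro_score(zh_args: list[Argument], en_args: list[Argument], links: Links) -> float:
    """Assumed macro score: mean of per-argument hit ratios."""
    ratios = [_hits(label, span, en_args, links) / len(span) for label, span in zh_args if span]
    return sum(ratios) / len(ratios) if ratios else 0.0


def micro_score(zh_args: list[Argument], en_args: list[Argument], links: Links) -> float:
    """Assumed micro score: total hits divided by total argument words."""
    total = sum(len(span) for _, span in zh_args)
    hits = sum(_hits(label, span, en_args, links) for label, span in zh_args)
    return hits / total if total else 0.0


def passes_top_down(zh_args: list[Argument], en_args: list[Argument], links: Links,
                    mac_th: float = 0.5) -> bool:
    """Top-down: threshold on the macro score only."""
    return macro_score(zh_args, en_args, links) >= mac_th


def passes_bottom_up(zh_args: list[Argument], en_args: list[Argument], links: Links,
                     mac_th: float = 0.8, mic_th: float = 0.6) -> bool:
    """Bottom-up: thresholds on both the macro and the micro score."""
    return (macro_score(zh_args, en_args, links) >= mac_th
            and micro_score(zh_args, en_args, links) >= mic_th)


if __name__ == "__main__":
    zh_args = [("Arg0", {0, 1}), ("Arg1", {2, 3, 4, 5})]
    en_args = [("Arg0", {0}), ("Arg1", {1, 2})]
    links = {(0, 0), (1, 0), (2, 1)}
    print(macro_score(zh_args, en_args, links))  # 0.625 = (2/2 + 1/4) / 2
    print(micro_score(zh_args, en_args, links))  # 0.5 = 3 / 6
    print(passes_top_down(zh_args, en_args, links), passes_bottom_up(zh_args, en_args, links))
    # True False
```

Requiring both scores, as the slide describes for bottom-up matching, rejects a candidate whose arguments match on average but whose long arguments are mostly unlinked.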
System Overview
•   Source Language Corpus + Target Language Corpus → GIZA++ → Word Alignments
•   Verbs aligned to verbs → Top-down Matching (with Parallel Propbanks) → Verified Alignments
•   Verbs aligned to no word → Bottom-up Matching (with Parallel Propbanks) → Expanded Alignments
•   Verified Alignments + Expanded Alignments → Enhanced Alignments

                           10
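The routing in this overview can be sketched as a single pass over the Chinese verbs. A minimal sketch, assuming the top-down and bottom-up checks are supplied as hypothetical callbacks `verify(zh, en)` and `expand(zh)`, and assuming, as the diagram suggests, that only verified and expanded links make up the enhanced set; how the authors treat rejected links, or verbs aligned only to non-verbs, is not stated on the slide.

```python
from typing import Callable, Optional


def enhance_alignments(
    zh_verbs: list[int],
    links: set[tuple[int, int]],
    en_verbs: set[int],
    verify: Callable[[int, int], bool],
    expand: Callable[[int], Optional[int]],
) -> set[tuple[int, int]]:
    """Route each Chinese verb through top-down or bottom-up matching.

    Verb linked to an English verb -> keep the link only if verify() accepts it.
    Verb linked to no English word -> add the link proposed by expand(), if any.
    Verbs linked only to non-verbs produce nothing here (an assumption).
    """
    enhanced: set[tuple[int, int]] = set()
    for zh in zh_verbs:
        targets = {en for z, en in links if z == zh}
        if not targets:
            proposal = expand(zh)
            if proposal is not None:
                enhanced.add((zh, proposal))
        else:
            for en in targets & en_verbs:
                if verify(zh, en):
                    enhanced.add((zh, en))
    return enhanced


if __name__ == "__main__":
    # Hypothetical positions: Chinese verb 3 is linked to English verb 4,
    # Chinese verb 5 only to a noun (0), Chinese verb 9 to nothing.
    result = enhance_alignments(
        zh_verbs=[3, 5, 9],
        links={(3, 4), (5, 0)},
        en_verbs={4, 8},
        verify=lambda zh, en: True,  # stand-in for top-down matching
        expand=lambda zh: 8,         # stand-in for bottom-up matching
    )
    print(sorted(result))  # [(3, 4), (9, 8)]
```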
Evaluations
•   Test Corpus
    -   NIST-GALE Web Genre Test Data

    -   100 parallel sentences, 365 verb tokens, 273 verb types

•   Measurements
    -   Term Coverage: how many Chinese verb types are covered

    -   Term Expansion: how many English verb types are suggested

    -   Alignment Accuracy: what proportion of the suggested English
        verb types are correct



                                 11
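The three measurements can be computed from a table of suggested translations. A minimal sketch, assuming suggestions are kept as a mapping from each Chinese verb type to the set of English verb types proposed for it, that correctness judgments are given as a set of (Chinese, English) pairs, that Term Expansion counts suggested pairs, and that the average alignment accuracy is a per-Chinese-verb-type mean; the slide does not spell these details out, and the verb labels are placeholders.

```python
def evaluate(suggestions: dict[str, set[str]], correct: set[tuple[str, str]]) -> dict[str, float]:
    """Return term coverage, term expansion and average alignment accuracy."""
    covered = {zh: ens for zh, ens in suggestions.items() if ens}
    if not covered:
        return {"term_coverage": 0, "term_expansion": 0, "avg_accuracy": 0.0}
    # Accuracy of each covered Chinese verb type: correct suggestions / all suggestions.
    per_type = [sum((zh, en) in correct for en in ens) / len(ens) for zh, ens in covered.items()]
    return {
        "term_coverage": len(covered),                              # Chinese verb types covered
        "term_expansion": sum(len(ens) for ens in covered.values()),  # English verb types suggested
        "avg_accuracy": sum(per_type) / len(per_type),               # mean per-type accuracy
    }


if __name__ == "__main__":
    suggestions = {"zh_verb_1": {"explore", "develop"}, "zh_verb_2": {"worry"}, "zh_verb_3": set()}
    correct = {("zh_verb_1", "explore"), ("zh_verb_2", "worry")}
    print(evaluate(suggestions, correct))
    # {'term_coverage': 2, 'term_expansion': 3, 'avg_accuracy': 0.75}
```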
Evaluations: Top-down
                               Term Coverage  Average Alignment Accuracy
    Setting                 Xinhua  Sinorama    Xinhua    Sinorama
    Mac.th = 0.0 (GIZA++)       79       129    83.35%      57.76%
    Mac.th = 0.5 (TDAM)         76        62    83.71%      78.09%
                                  12
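The trade-off behind the top-down conclusion can be read off these numbers. A small sketch that recomputes it, assuming the left bar of each pair in the original chart is the GIZA++ baseline (Mac.th = 0.0) and the right bar is TDAM (Mac.th = 0.5), as the legend order suggests: on Sinorama, TDAM raises average alignment accuracy by about 20 points while keeping about 48% of the covered verb types, whereas on Xinhua both numbers barely move.

```python
# Values read from the top-down chart: (term coverage, average alignment accuracy in %).
RESULTS = {
    "Xinhua": {"GIZA++": (79, 83.35), "TDAM": (76, 83.71)},
    "Sinorama": {"GIZA++": (129, 57.76), "TDAM": (62, 78.09)},
}


def tradeoff(corpus: str) -> tuple[float, float]:
    """Return (accuracy gain in points, share of coverage retained) for TDAM vs. GIZA++."""
    base_cov, base_acc = RESULTS[corpus]["GIZA++"]
    tdam_cov, tdam_acc = RESULTS[corpus]["TDAM"]
    return round(tdam_acc - base_acc, 2), round(tdam_cov / base_cov, 3)


if __name__ == "__main__":
    for corpus in RESULTS:
        print(corpus, tradeoff(corpus))
    # Xinhua (0.36, 0.962)
    # Sinorama (20.33, 0.481)
```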
Evaluations: Bottom-up
    Mac.th = 0.8, Mic.th = 0.6

    Corpus      Term Coverage    Average Alignment Accuracy
    Xinhua                 18                        63.89%
    Sinorama               27                        14.46%

    5.5% error-reduction; 17% absolute improvement
                                   13
Conclusions & Future Work
•   Conclusions
    -   Top-down Argument Matching is most effective for verifying
        word-alignments based on non-literal translations that have
        proven difficult for GIZA++.

    -   Bottom-up Argument Matching shows promise for expanding
        the coverage of GIZA++ alignments based on literal
        translations.

•   We will try to enhance word-alignments by using
    -   Automatically labeled Propbanks

    -   Nombanks and named-entity tags

    -   Parallel Propbanks before running GIZA++


                                 14
Acknowledgements
•   We gratefully acknowledge the support of the National
    Science Foundation Grants IIS-0325646, Domain
    Independent Semantic Parsing, CISE-CRI-0551615,
    Towards a Comprehensive Linguistic Annotation, and a
    grant from the Defense Advanced Research Projects
    Agency (DARPA/IPTO) under the GALE program,
    DARPA/CMO Contract No. HR0011-06-C-0022,
    subcontract from BBN, Inc.
•   Special thanks to Daniel Gildea and Ding Liu (University of
    Rochester), who provided the word-alignments; Wei Wang
    (Information Sciences Institute at University of Southern
    California), who provided the test corpus; and Hua
    Zhong (University of Colorado at Boulder), who
    performed the evaluations.

                             15
