Exploring Higher Order Dependency Parsers

             Pranava Swaroop Madhyastha

   Supervised by: Prof. Michael Rosner & RNDr. Daniel Zeman


                   September 6, 2011
Introduction

     ◮   Dependency Grammar.
            ◮   Binary asymmetric relations between a head and a modifier -
                highly lexicalized relationships.
     ◮   A quick example:




     ◮   Projective Constraint
     ◮   Graph Based Dependency Parsing
            ◮   Arc-Factored Parsing (a small scoring sketch follows below)
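
     A minimal Python sketch (not part of the slides) of the arc-factored
     idea: the score of a candidate dependency tree decomposes into a sum of
     independent head-modifier arc scores. The score_arc function here is a
     hypothetical stand-in for the actual arc scorer.

         # Arc-factored scoring: tree score = sum of per-arc scores.
         def score_tree(arcs, score_arc):
             """arcs: list of (head, modifier) token-index pairs.
             score_arc: hypothetical function scoring one dependency arc."""
             return sum(score_arc(h, m) for h, m in arcs)

         # Toy example: "<root> John saw" with arcs root->saw and saw->John.
         example_arcs = [(0, 2), (2, 1)]
         print(score_tree(example_arcs, lambda h, m: 1.0))  # -> 2.0
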
Problem Description?



    ◮   Augmentation of Features
          ◮   Semantic features
          ◮   Morpho-syntactic features
    ◮   Higher order parsing
          ◮   Context availability - horizontal and vertical.

    ◮   Motivation
          ◮   Semi-supervised dependency parsing and improvements.
          ◮   Using well defined linguistic components.
What is Higher Order Dependency Parsing
    ◮   First-order models - decomposition of the tree into individual
        head-modifier dependencies (arcs).
    ◮   Second-order models - parts additionally include either a sibling of
        the modifier (head, modifier and adjacent sibling) or a child of the
        modifier (head, modifier and grandchild).
    ◮   Third-order models - the context is extended by one more level, e.g.
        grand-sibling parts that combine the grandparent, head, modifier and
        sibling.




    ◮   An illustration (a part-enumeration sketch follows below)
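
     As an informal illustration (not taken from the thesis), the sketch
     below enumerates first-order parts and simplified second-order sibling
     parts from a head array; third-order grand-sibling parts would in
     addition record the grandparent of the head.

         def first_order_parts(heads):
             # heads[i] = head of token i+1 (0 denotes the artificial root)
             return [(h, m) for m, h in enumerate(heads, start=1)]

         def sibling_parts(heads):
             # Simplified second-order sibling parts (head, sibling, modifier)
             # over adjacent modifiers of the same head; the head's side is
             # ignored here for brevity.
             parts = []
             for h in set(heads):
                 mods = [m for m, hh in enumerate(heads, start=1) if hh == h]
                 parts += [(h, s, m) for s, m in zip(mods, mods[1:])]
             return parts

         heads = [0, 1, 1]                 # token 1 -> root, tokens 2, 3 -> token 1
         print(first_order_parts(heads))   # [(0, 1), (1, 2), (1, 3)]
         print(sibling_parts(heads))       # [(1, 2, 3)]
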
Still Why?
Features


    ◮   For a given feature vector φ and parameter (weight) vector w, each
        part p of the input x is scored as

                              Part(x, p) = w · φ(x, p)                  (1)
    ◮   The feature vector of each part is built from individual feature
        templates such as the following (a small scoring sketch follows
        after this slide):
           ◮   dir.pos(h).pos(m)
           ◮   dir.form(h).pos(m)
           ◮   and so on ...
    ◮   The most basic feature patterns consider the surface form,
        part-of-speech, lemma and other morphosyntactic attributes
        of the head or the modifier of a dependency.
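
     A small sketch of how equation (1) can be evaluated with sparse binary
     features generated from templates such as dir.pos(h).pos(m). The
     feature strings and the weight dictionary below are illustrative
     assumptions, not the actual feature set or learned weights.

         def phi(sentence, head, mod):
             """Return the active feature strings for the arc head -> mod."""
             direction = "R" if mod > head else "L"
             pos, form = sentence["pos"], sentence["form"]
             return [
                 f"dir={direction}|pos(h)={pos[head]}|pos(m)={pos[mod]}",
                 f"dir={direction}|form(h)={form[head]}|pos(m)={pos[mod]}",
             ]

         def score_part(w, sentence, head, mod):
             # w . phi(x, p): sum the weights of the active (binary) features.
             return sum(w.get(f, 0.0) for f in phi(sentence, head, mod))

         sentence = {"form": ["<root>", "John", "saw", "Mary"],
                     "pos":  ["ROOT", "NNP", "VBD", "NNP"]}
         w = {"dir=L|pos(h)=VBD|pos(m)=NNP": 0.7}
         print(score_part(w, sentence, 2, 1))  # 0.7: John modifies saw on the left
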
Experimentation done with:

    ◮   English - Penn Treebank
          ◮   Sections 2 to 10 as the training set - a set of 15,000
              sentences.
          ◮   Random sets of sentences from Sections 15, 17, 19 and 25 of the
              Penn Treebank as development data - a set of 1,000 sentences.
          ◮   The test set was chosen from Sections 0, 1, 21 and 23 of the
              Penn Treebank - a set of 2,000 sentences.
    ◮   Czech - Prague Dependency Treebank
          ◮   The sentences were chosen from the pdt2-full-automorph dataset.
          ◮   The training set consisted of the train1 - train5 splits - a set
              of 15,000 sentences.
          ◮   The development set consisted of the train6 and train7 splits -
              a set of 1,000 sentences.
          ◮   The test set was made up of the dtest and etest parts - a set of
              2,000 sentences.
Experimentation

    ◮   Fine and Coarse Grained Wordsenses
    ◮   Approximation
    ◮   For English:
          ◮   Both Fine and Coarse Grained wordsense extraction make use of
              the WordNet::SenseRelate package (a rough approximation sketch
              follows after this slide).
          ◮   A Fine Grained wordsense restricts a word to one particular
              sense - e.g. the word as a noun in its first sense, as
              extracted from WordNet.
          ◮   A Coarse Grained wordsense is a more generic description - the
              WordNet semantic file to which the word belongs.
    ◮   For Czech:
          ◮   Only Fine Grained Wordsense extraction (approximately).
          ◮   Extracted using the sempos attribute, which is already
              annotated in the Prague Dependency Treebank.
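
     The English senses are extracted with the Perl WordNet::SenseRelate
     package; purely as an illustration, the sketch below approximates the
     two notions with NLTK's WordNet interface (first synset for the fine
     grained sense, the semantic/lexicographer file for the coarse grained
     one). The mapping is an assumption, not the actual extraction pipeline.

         # Requires the NLTK WordNet corpus to be installed.
         from nltk.corpus import wordnet as wn

         def fine_sense(word, pos=wn.NOUN):
             """Fine grained: restrict the word to its first WordNet sense."""
             synsets = wn.synsets(word, pos=pos)
             return synsets[0].name() if synsets else None   # e.g. 'bank.n.01'

         def coarse_sense(word, pos=wn.NOUN):
             """Coarse grained: the semantic (lexicographer) file of the
             first sense, e.g. 'noun.artifact'."""
             synsets = wn.synsets(word, pos=pos)
             return synsets[0].lexname() if synsets else None

         print(fine_sense("bank"), coarse_sense("bank"))
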
Results for the Wordsense augmentation experiment

    ◮   Sibling based parsers show a statistically significant
        improvement.
    ◮   For English with Fine Grained wordsense addition - Third
        order grand-sibling based parser gives an improvement of
        +0.81 percent (Unlabeled Accuracy Score). A closer
        statistical examination showed that sibling based interactions
        which are close to each other have better precision.
    ◮   For English with Coarse Grained wordsense addition - the
        second order sibling based parser gives an improvement of
        approximately +1.09 percent.
    ◮   For Czech with fine grained wordsense augmentation, the third order
        sibling based parser gives an improvement of approximately +1.20
        percent.
Results for Morphosyntactic augmentation experiment




    ◮   The morphosyntactic features were used directly, by extracting the
        tags from the corpus.
    ◮   For Czech, instead of the full 15-position tagset, we tried out a
        subset (Person, Number, POSSGender, Tense, Voice and Case); a small
        extraction sketch follows below.
    ◮   For English, we integrated the fine grained part-of-speech tags.
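
     A small sketch of the tag subsetting (an assumption about how it was
     done: the position indices below follow the standard 15-position PDT
     positional tag layout).

         # 1-based positions of the selected categories in the PDT tag.
         SUBSET = {"Person": 8, "Number": 4, "PossGender": 6,
                   "Tense": 9, "Voice": 12, "Case": 5}

         def tag_subset(positional_tag):
             """Map a 15-character PDT positional tag to the reduced set."""
             return {name: positional_tag[pos - 1] for name, pos in SUBSET.items()}

         # Example: a finite verb tag (3rd person singular, present, active).
         print(tag_subset("VB-S---3P-AA---"))
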
Results




     ◮    For both English and Czech, there is a significant improvement in
          parsing accuracy with the grandchild based algorithms.
     ◮    For Czech, the third order grand sibling based algorithm
          shows an improvement of +1.72 percent.
     ◮    For English, the third order grand sibling based algorithm
          shows an improvement of +1.21 percent.
Conclusion



    ◮   Semantic features work better with sibling based parsers
        (larger horizontal contexts).
    ◮   Morpho-syntactic features work better with grandchild based
        parsers (larger vertical contexts).
    ◮   These features can be instrumental in several related tasks,
        including accurate labeling of semantic roles.
    ◮   Linguistic information can be better handled by a higher order
        parsing algorithm.
Future Work




    ◮   Higher order parsers with labels (we have not yet tested
        labeled accuracy scores).
    ◮   Joint extraction of word-senses and semantic roles.
    ◮   Experimentation with lexical clusters.
    ◮   Thorough experimentation of several features.
    ◮   Maximum and Minimum order requirements.
Thanks
