Optimization of NLP Components
                           For Robustness and Scalability
                                  Dissertation Defense
                                    August 15, 2012


                                     Jinho D. Choi
                                 choijd@colorado.edu

                             Department of Computer Science
                              University of Colorado Boulder



Ever since I came, Boulder has been ...
              •       #1: Top 10 College Towns (Livability, 2012)

              •       #1: Top 10 Least Obese Metro Areas (Gallup Healthways, 2012)

              •       #1: Top 10 Happiest Cities (Gallup Healthways, 2012)

              •       #1: The 10 Most Educated U.S. Cities (US News, 2011)

              •       #1: America’s 15 Most Active Cities (Time - Healthland, 2011)

              •       #1: Best Quality of Life in America (Portfolio, 2011)

              •       #1: 20 Brainiest Cities in America (Daily Beast, 2010)

              •       #1: Western Cities Fare Best in Well-being (USA Today, 2010)

              •       #1: America's Foodiest Town (Bon Appétit, 2010)

              •       #1: The Best Cities to Raise an Outdoor Kid (Backpacker, 2009)

              •       #1: America's Top 25 Towns To Live Well (Forbes, 2009)

              •       #1: America's Smartest Cities (Forbes, 2008)

              •       #1: Top Heart Friendly Cities (American Heart Association, 2008)


Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Introduction
              •       The application of NLP has ...
                    -     Expanded to everyday computing.

                    -     Broadened to a general audience.

                    ‣     More attention is drawn to the practical aspects of NLP.


              •       NLP components should be tested for
                    -     Robustness in handling heterogeneous data.
                          •   Need to be evaluated on data from several different sources.

                    -     Scalability in handling a large amount of data.
                          •   Need to be evaluated for speed and complexity.


Introduction
              •       Research question
                    -     How to improve the robustness and scalability of standard
                          NLP components.


              •       Goals
                    -     To prepare gold-standard data from several different sources
                          for in-genre and out-of-genre experiments.

                    -     To develop a POS tagger, a dependency parser, and a semantic
                          role labeler showing robust results across this data.

                    -     To reduce average complexities of these components while
                          retaining good performance in accuracy.



Introduction
              •       Thesis statement
                    1. We improve the robustness of three NLP components:
                          •   POS   tagger: by building a generalized model.

                          •   Dependency parser: by bootstrapping parse information.

                          •   Semantic role labeler: by applying higher-order argument pruning.

                    2. We improve the scalability of these three components:
                          •   POS   tagger: by adapting dynamic model selection.

                          •   Dependency parser: by optimizing the engineering of transition-
                              based parsing algorithms.

                          •   Semantic role labeler: by applying conditional higher-order
                              argument pruning.


Introduction
                     [Pipeline diagram] Constituent Treebanks and PropBanks are
                     passed through the dependency conversion, producing training
                     and evaluation sets of dependency trees with semantic roles.
                     Each trainer (part-of-speech, dependency, semantic role)
                     builds its model from the training set; the corresponding
                     tagger, parser, and labeler then apply these models to the
                     evaluation set.
Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Dependency Conversion
              •       Motivation
                    -     Only a small number of manually annotated dependency
                          trees exist (Rambow et al., 2002; Čmejrek et al., 2004).

                    -     A large number of manually annotated constituent trees
                          exist (Marcus et al., 1993; Weischedel et al., 2011).

                    -     Converting constituent trees into dependency trees
                          → a large number of pseudo-annotated dependency trees.

              •       Previous approaches
                    -     Penn2Malt (stp.lingfil.uu.se/~nivre/research/Penn2Malt.html).

                    -     LTH converter (Johansson and Nugues, 2007).

                    -     Stanford converter (de Marneffe and Manning, 2008a).

Dependency Conversion
              •       Comparison
                    -     The Stanford and CLEAR conversions generate 3.62% and
                          0.23% unclassified dependencies, respectively.

                    -     Our conversion produces 3.69% non-projective trees.

                                          Penn2Malt      LTH       Stanford     CLEAR
                      Labels                Malt        CoNLL      Stanford   Stanford+
                      Long-distance DPs                   ✓                      ✓
                      Secondary DPs                       ✓           ✓          ✓
                      Function Tags                       ✓                      ✓
                      New TB Format          NO           NO          NO        YES
                      Maintenance            NO           NO         YES        YES

Dependency Conversion (1/6)
              1. Input a constituent tree.
                    • Penn, OntoNotes, CRAFT, MiPACQ, and SHARP Treebanks.

                     [Constituent tree for “Peace and joy that we want”: the NP
                      “Peace and joy” takes an SBAR whose WHNP-1 “that” is
                      co-indexed with the empty category *T*-1 in the object
                      position of “want”.]
Dependency Conversion (2/6)
              2. Reorder constituents related to empty categories.
                    • *T*: wh-movement and topicalization.
                    • *RNR*: right node raising.
                     • *ICH* and *PPA*: discontinuous constituents.
                     [Before/after trees: the empty category *T*-1 under the VP
                      is removed and the co-indexed WHNP-1 “that” is reordered
                      into its position.]
Dependency Conversion (3/6)
              3. Handle special cases.
                    • Apposition, coordination, and small clauses.
                     [Tree after reordering, with the coordination converted
                      first: cc and conj dependencies attach “and” and “joy”
                      to the first conjunct “Peace”.]

                                  The original word order is preserved
                                  in the converted dependency tree.
Dependency Conversion (4/6)
              4. Handle general cases.
                    • Head-finding rules and heuristics.
                     [Tree after head-finding: the full dependency tree over
                      “Peace and joy that we want”, with arcs root, cc, conj,
                      rcmod, nsubj, and dobj.]
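              To make these head-finding rules concrete, here is an illustrative
              sketch of head-rule application in the Magerman/Collins style; the
              rule entries and the find_head function are hypothetical examples,
              not the actual rules used by this conversion.

                  # Illustrative head-percolation rules:
                  # phrase label -> (search direction, preferred child labels in order).
                  HEAD_RULES = {
                      "NP":   ("right-to-left", ["NN", "NNS", "NNP", "NP"]),
                      "VP":   ("left-to-right", ["VB", "VBD", "VBZ", "VP"]),
                      "SBAR": ("left-to-right", ["WHNP", "IN", "S"]),
                  }

                  def find_head(label, children):
                      # children: (child_label, node) pairs in word order; assumed non-empty.
                      direction, preferred = HEAD_RULES.get(label, ("left-to-right", []))
                      search = children if direction == "left-to-right" else children[::-1]
                      for wanted in preferred:
                          for child_label, node in search:
                              if child_label == wanted:
                                  return node
                      return search[0][1]   # fallback heuristic: first child in search order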
Dependency Conversion (5/6)
              5. Add secondary dependencies.
                    • Gapping, referent, right node raising, open clausal subject.
                     [The same dependency tree with a secondary dependency added:
                      ref links the relativizer “that” back to its antecedent.]
Dependency Conversion (6/6)
              6. Add function tags.

                                               Syntactic roles
                           ADV    Adverbial                      PUT    Locative complement of put
                           CLF    It-cleft                       PRD    Non-VP predicate
                           CLR    Closely related constituent    RED*   Reduced auxiliary
                           DTV    Dative                         SBJ    Surface subject
                           LGS    Logical subject in passive     TPC    Topicalization
                           NOM    Nominalization
                                               Semantic roles
                           BNF    Benefactive                    MNR    Manner
                           DIR    Direction                      PRP    Purpose or reason
                           EXT    Extent                         TMP    Temporal
                           LOC    Locative                       VOC    Vocative
                                        Text and speech categories
                           ETC    Et cetera                      SEZ    Direct speech
                           FRM*   Formula                        TTL    Title
                           HLN    Headline                       UNF    Unfinished constituent
                           IMP    Imperative

                           Table A.1: A list of function tags for English. Tags followed by *
                           are not typical Penn Treebank tags but are used in some other
                           Treebanks (Nielsen et al., 2010; Weischedel et al., 2011;
                           Verspoor et al., 2012).
Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Experimental Setup
              •       The Wall Street Journal (WSJ) models
                    -     Train
                          •   The WSJ 2-21 in OntoNotes (Weischedel et al., 2011).

                          •   Total: 30,060 sentences, 731,677 tokens, 77,826 predicates.

                    -     In-genre evaluation (Avgi)
                          •   The WSJ 23 in OntoNotes.

                          •   Total: 1,640 sentences, 39,590 tokens, 4,138 predicates.

                    -     Out-of-genre evaluation (Avgo)
                          •   5 genres in OntoNotes, 2 genres in MiPACQ (Nielsen et al., 2010),
                              1 genre in SHARP.

                          •   Total: 19,368 sentences, 265,337 tokens, 32,142 predicates.

Experimental Setup
              •       The OntoNotes models
                    -     Train
                          •   6 genres in OntoNotes.

                          •   Total: 96,406 sentences, 1,983,012 tokens, 213,695 predicates.

                    -     In-genre evaluation (Avgi)
                          •   6 genres in OntoNotes.

                          •   Total: 13,337 sentences, 201,893 tokens, 25,498 predicates.

                    -     Out-of-genre evaluation (Avgo)
                          •   Same 2 genres in MiPACQ, same 1 genre in SHARP.

                          •   Total: 7,671 sentences, 103,034 tokens, 10,782 predicates.


Experimental Setup
              •       Accuracy
                    -     Part-of-speech tagging
                          •   Accuracy.

                    -     Dependency parsing
                          •   Labeled attachment score (LAS).

                          •   Unlabeled attachment score (UAS).

                    -     Semantic role labeling
                          •   F1-score of argument identification.

                          •   F1-score of both argument identification and classification.




Experimental Setup
              •       Speed
                    -     All experiments are run on an Intel Xeon 2.57GHz machine.

                    -     Each model is run 5 times; the average speed is measured
                          by taking the average of the middle 3 runs (see the
                          sketch below).
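              A minimal sketch of this timing protocol; the process callable is a
              hypothetical stand-in for running one component over the data.

                  import time

                  def average_speed(process, sentences, runs=5):
                      times = []
                      for _ in range(runs):
                          start = time.time()
                          for sentence in sentences:
                              process(sentence)
                          times.append(time.time() - start)
                      times.sort()
                      middle = times[1:-1]              # drop the fastest and slowest runs
                      return sum(middle) / len(middle)  # average of the middle 3 runs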


              •       Machine learning algorithm
                    -     LIBLINEAR’s L2-regularized, L1-loss SVM classification
                          (Hsieh et al., 2008).

                    -     Designed to handle large-scale, high-dimensional vectors.

                    -     Runs fast with accurate performance.

                    -     Our implementation of LIBLINEAR is publicly available.

Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Part-of-Speech Tagging
              •       Motivation
                    -     Supervised learning approaches do not perform well in
                          out-of-genre experiments.

                    -     Domain adaptation approaches require knowledge of
                          incoming data.

                    -     Complicated tagging or learning approaches often run slowly
                          during decoding.

              •       Dynamic model selection
                    -     Build two models, generalized and domain-specific, given one
                          set of training data.

                    -     Dynamically select one of the models during decoding.


Part-of-Speech Tagging
              •       Training
                    1. Group training data into documents (e.g., sections in WSJ).
                    2. Get the document frequency of each simplified word form.
                          • In simplified word forms, all numerical expressions with or w/o
                             special characters are converted to 0.

                    3. Build a domain-specific model using features extracted from
                       only tokens whose DF(SW) > 1.
                    4. Build a generalized model using features extracted from only
                       tokens whose DF(SW) > 2.
                    5. Find the cosine similarity threshold for dynamic model
                       selection.
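                     The sketch below illustrates steps 2-4 under simple assumptions
                     (pre-tokenized documents; an illustrative simplification regex
                     rather than the exact rule used in the thesis).

                         import re
                         from collections import defaultdict

                         def simplify(word):
                             # Collapse numerical expressions, with or without special
                             # characters (e.g., "1,000", "3.14", "10-20"), into "0".
                             return re.sub(r"\d[\d,.:\-/]*", "0", word)

                         def document_frequency(documents):
                             # documents: one list of tokens per document (e.g., a WSJ section).
                             df = defaultdict(int)
                             for doc in documents:
                                 for sw in {simplify(w) for w in doc}:
                                     df[sw] += 1
                             return df

                         def feature_tokens(documents, df, min_df):
                             # Features are extracted only from tokens whose simplified form
                             # appears in more than min_df documents:
                             #   domain-specific model: min_df = 1; generalized model: min_df = 2.
                             return [[w for w in doc if df[simplify(w)] > min_df]
                                     for doc in documents]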


Part-of-Speech Tagging
              •       Cosine similarity threshold
                    -     During cross-validation, collect the cosine similarities
                          between the simplified word forms used to build the
                          domain-specific model and each input sentence on which
                          the domain-specific model shows an advantage.

                    -     The cosine similarity delimiting the lowest 5% of these
                          occurrences becomes the threshold for dynamic model
                          selection.
                                 [Histogram: occurrence vs. cosine similarity
                                  (0 to 0.06); the threshold is drawn at the
                                  boundary of the lowest 5% of the distribution.]
Part-of-Speech Tagging
              •       Decoding
                    -     Measure the cosine similarity between simplified word forms
                          used for building the domain-specific model and each input
                          sentence.

                    -     If the similarity is greater than the threshold, use the domain-
                          specific model.

                    -     If the similarity is less than or equal to the threshold, use the
                          generalized model.



                                 Runs as fast as a single-model approach.
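              A sketch of this decision rule, reusing simplify() from the training
              sketch above; the vocabulary, threshold, and model objects are
              assumed inputs rather than the actual ClearNLP interfaces.

                  import math
                  from collections import Counter

                  def cosine(bag_a, bag_b):
                      a, b = Counter(bag_a), Counter(bag_b)
                      dot = sum(a[w] * b[w] for w in a)
                      norm = (math.sqrt(sum(v * v for v in a.values())) *
                              math.sqrt(sum(v * v for v in b.values())))
                      return dot / norm if norm else 0.0

                  def pick_model(sentence, domain_vocab, threshold, domain_model, general_model):
                      # Compare the input sentence with the simplified word forms
                      # used to build the domain-specific model.
                      sim = cosine([simplify(w) for w in sentence], domain_vocab)
                      return domain_model if sim > threshold else general_model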



Part-of-Speech Tagging
              •       Experiments
                    -     Baseline: using the original word forms.

                    -     Baseline+: using lowercase simplified word forms.

                    -     Domain: domain-specific model.

                    -     General: generalized model.

                    -     ClearNLP: dynamic model selection.

                    -     Stanford: Toutanova et al., 2003.

                    -     SVMTool: Giménez and Màrquez, 2004.




Part-of-Speech Tagging
              •       Accuracy - WSJ models (Avgi and Avgo)
                                     Baseline  Baseline+  Domain   General  ClearNLP  Stanford  SVMTool
                     In-genre          96.93     96.98     97.39    97.24     97.40     97.41    97.31
                     Out-of-genre      88.25     88.64     90.43    90.61     90.79     89.92    89.49
Part-of-Speech Tagging
              •       Accuracy - OntoNotes models (Avgi and Avgo)
                                     Baseline  Baseline+  Domain   General  ClearNLP  Stanford  SVMTool
                     In-genre          96.23     96.32     96.58    96.41     96.56     96.52    96.19
                     Out-of-genre      86.79     87.75     88.60    89.26     89.26     89.20    87.61
Part-of-Speech Tagging
              •       Speed comparison
                                     Model        Tokens per sec.   Millisecs. per sen.
                       WSJ           ClearNLP          32,654               0.44
                                     ClearNLP+         39,491               0.37
                                     Stanford             250              58.06
                                     SVMTool            1,058              13.71
                       OntoNotes     ClearNLP          32,206               0.45
                                     ClearNLP+         39,882               0.36
                                     Stanford             136             106.34
                                     SVMTool              924              15.71

                                    • ClearNLP : as reported in the thesis.
                                    • ClearNLP+: new improved results.
Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Dependency Parsing
              •       Goals
                    1. To reduce the average parsing complexity of non-
                       projective dependency parsing.
                    2. To reduce the discrepancy between dynamic features used
                       for training on gold trees and decoding automatic trees.
                    3. To ensure well-formed dependency graph properties.

              •       Approach
                    1. Combine transitions in both projective and non-projective
                       dependency parsing algorithms.
                    2. Bootstrap dynamic features during training.
                    3. Post-process.

                                    Dependency Parsing
              •       Transition decomposition
                    -     Decompose transitions in:
                          •   Nivre’s arc-eager algorithm (projective; Nivre, 2003),
                              with a worst-case parsing complexity of O(n).
                          •   Nivre’s list-based algorithm (non-projective; Nivre, 2008),
                              a transition-based formulation of Covington’s algorithm
                              (Covington, 2001), with a worst-case parsing complexity
                              of O(n²) without backtracking.

                Operation   Transition                            Description
                            Left-∗l      ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A ∪ {i ←l j} )
                   Arc      Right-∗l     ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A ∪ {i →l j} )
                            No-∗         ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A )
                            ∗-Shiftd|n   ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i|λ2|j], [ ], β, A )
                   List     ∗-Reduce     ( [λ1|i], λ2, [j|β], A ) ⇒ ( λ1, λ2, [j|β], A )
                            ∗-Pass       ( [λ1|i], λ2, [j|β], A ) ⇒ ( λ1, [i|λ2], [j|β], A )

                      Table 5.1: Decomposed transitions grouped into the Arc and List operations.
                                 This decomposition makes it easier to integrate
                                 transitions from different parsing algorithms.

                Operation   Transition   Precondition
                            Left-∗l      ¬[i = 0] ∧ ¬[∃k. (i ← k) ∈ A] ∧ ¬[(i →∗ j) ∈ A]
                   Arc      Right-∗l     ¬[∃k. (k → j) ∈ A] ∧ ¬[(i ∗← j) ∈ A]
                            No-∗         ¬[∃l. Left-∗l ∨ Right-∗l]
                                    Dependency Parsing
              •       Transition recomposition
                    -     Any combination of two decomposed transitions in Table 5.1,
                          one from each operation, can be recomposed into a new
                          transition; e.g., Left-∗l and ∗-Reduce recompose into
                          Left-Reducel, which performs Left-∗l and ∗-Reduce sequentially.
                    -     In each recomposed transition, the Arc operation is
                          performed first and the List operation is performed later
                          (see the sketch after this slide).

                                   Projective                 Non-projective
                Transition       Nivre’03   Covington’01   Nivre’08   CP’11   This work
                Left-Reducel        ✓                                   ✓         ✓
                Left-Passl                       ✓            ✓         ✓         ✓
                Right-Shiftnl       ✓                                             ✓
                Right-Passl                      ✓            ✓         ✓         ✓
                No-Shiftd                        ✓            ✓         ✓         ✓
                No-Shiftn           ✓                         ✓         ✓         ✓
                No-Reduce           ✓                                             ✓
                No-Pass                          ✓            ✓         ✓         ✓

      Table 5.3: Transitions in different dependency parsing algorithms. The last column shows transitions
      used in our parsing algorithm. The other columns show transitions used in Nivre (2003), Covington
      (2001), Nivre (2008), and Choi and Palmer (2011a), respectively.
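              To make the decomposition and recomposition concrete, the sketch below
              implements the transitions of Table 5.1 over a minimal (λ1, λ2, β, A)
              state; the tuple representation is an assumption, labels are carried
              through, and preconditions are omitted.

                  # State: (l1, l2, beta, A); A holds (head, label, dep) arcs.

                  def left_arc(state, label):         # Arc part of Left-*l: add i <-l j
                      l1, l2, beta, A = state
                      return (l1, l2, beta, A | {(beta[0], label, l1[-1])})

                  def right_arc(state, label):        # Arc part of Right-*l: add i ->l j
                      l1, l2, beta, A = state
                      return (l1, l2, beta, A | {(l1[-1], label, beta[0])})

                  def shift(state):                   # *-Shift: lambda1 := lambda1|i|lambda2|j
                      l1, l2, beta, A = state
                      return (l1 + l2 + [beta[0]], [], beta[1:], A)

                  def reduce_(state):                 # *-Reduce: pop i from lambda1
                      l1, l2, beta, A = state
                      return (l1[:-1], l2, beta, A)

                  def pass_(state):                   # *-Pass: move i from lambda1 to lambda2
                      l1, l2, beta, A = state
                      return (l1[:-1], [l1[-1]] + l2, beta, A)

                  # Recomposed transitions: the Arc operation applies first, then List.
                  def left_reduce(state, label):      # used by arc-eager parsing
                      return reduce_(left_arc(state, label))

                  def left_pass(state, label):        # used by list-based parsing
                      return pass_(left_arc(state, label))

                  def right_shift(state, label):      # used by arc-eager parsing
                      return shift(right_arc(state, label))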
Dependency Parsing
              •       Average parsing complexity
                    -     The number of transitions performed per sentence.

                     [Plot: number of transitions performed vs. sentence length.
                      Covington’01 and Nivre’08 grow fastest, CP’11 more slowly,
                      and this work remains close to linear.]
Dependency Parsing
              •       Bootstrapping
                    -     Transition-based dependency parsing can take advantage of
                          dynamic features (e.g., head, leftmost/rightmost dependent).
                                 [Diagram: for a token pair (wi, wj), dynamic
                                  features include the head of wi and the
                                  leftmost/rightmost dependents of wi and wj.]

                    -     Features extracted from gold-standard trees during training
                          can be different from features extracted from automatic
                          trees during decoding.

                    -     By bootstrapping these dynamic features, we can significantly
                          improve parsing accuracy.

Dependency Parsing
                     [Flow: training data → gold-standard features and labels →
                      machine learning algorithm → statistical model. The model
                      re-parses the training data to produce automatic features,
                      which replace the gold-standard features in the next round;
                      whether to stop is determined by cross-validation.]
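              A sketch of this bootstrapping loop; the train, parse, feature-extraction,
              and cross-validation helpers are passed in as placeholders for the actual
              components, and only the control flow mirrors the thesis.

                  def bootstrap(gold_trees, train, parse, extract_features,
                                extract_labels, cv_score, max_rounds=10):
                      # gold_trees: tree objects assumed to expose their tokens.
                      labels = extract_labels(gold_trees)
                      # Round 0: dynamic features come from the gold-standard trees.
                      model = train(extract_features(gold_trees, source=gold_trees), labels)
                      best, best_score = model, cv_score(model)

                      for _ in range(max_rounds):
                          # Parse the training data with the current model and extract
                          # dynamic features (heads, leftmost/rightmost dependents) from
                          # the automatic trees, so training matches decoding conditions.
                          auto = [parse(model, t.tokens) for t in gold_trees]
                          model = train(extract_features(gold_trees, source=auto), labels)
                          score = cv_score(model)
                          if score <= best_score:   # stop when cross-validation
                              break                 # accuracy no longer improves
                          best, best_score = model, score
                      return best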
Dependency Parsing
              •       Post-processing
                    -     Transition-based dependency parsing does not guarantee
                          parse output to be a tree.

                    -     After parsing, we find the head of each headless token by
                          comparing it to all other tokens using the same model.

                    -     A predicted head with the highest score that does not break
                          tree properties becomes the head of this token.

                    -     This post-processing technique significantly improves parsing
                          accuracy in out-of-genre experiments.
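              A sketch of this post-processing step, with a hypothetical score
              callable standing in for the same statistical model used during
              parsing.

                  def attach_headless(heads, score):
                      # heads[i] is the head index of token i (None if headless);
                      # index 0 is the artificial root and is never attached.
                      for i in range(1, len(heads)):
                          if heads[i] is not None:
                              continue
                          best, best_score = None, float("-inf")
                          for j in range(len(heads)):
                              # Skip self-attachment and heads that would break
                              # the tree properties by creating a cycle.
                              if i == j or creates_cycle(heads, head=j, dep=i):
                                  continue
                              s = score(head=j, dep=i)
                              if s > best_score:
                                  best, best_score = j, s
                          heads[i] = best
                      return heads

                  def creates_cycle(heads, head, dep):
                      # Attaching dep -> head creates a cycle iff dep is an
                      # ancestor of head.
                      k = head
                      while k is not None:
                          if k == dep:
                              return True
                          k = heads[k]
                      return False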




Dependency Parsing
              •       Experiments
                    -     Baseline: using all recomposed transitions.

                    -     Baseline+: Baseline with post-processing.

                    -     ClearNLP: Baseline+ with bootstrapping.

                    -     CN’09: Choi and Nicolov, 2009.

                    -     CP’11: Choi and Palmer, 2011a.

                    -     MaltParser: Nivre, 2009.

                    -     MSTParser: McDonald et al., 2005.
                          •   Use only 1st order features; with 2nd order features, accuracy is
                              expected to be higher and speed is expected to be slower.


Dependency Parsing
              •       Accuracy - WSJ models (Avgi and Avgo), LAS / UAS

                                     Baseline       Baseline+      ClearNLP       CN’09          CP’11          MaltParser     MSTParser
                     In-genre      86.94 / 88.57  87.18 / 88.81  88.10 / 89.68  87.79 / 89.50  88.03 / 89.74  86.49 / 88.23  86.03 / 88.36
                     Out-of-genre  74.18 / 78.04  74.68 / 78.60  75.50 / 79.36  75.23 / 79.08  75.34 / 79.18  74.10 / 78.29  74.46 / 79.26
Dependency Parsing
              •       Accuracy - OntoNotes models (Avgi and Avgo), LAS / UAS

                                     Baseline       Baseline+      ClearNLP       CN’09          CP’11          MaltParser     MSTParser
                     In-genre      84.51 / 86.54  84.76 / 86.83  85.68 / 87.75  85.41 / 87.48  85.49 / 87.57  84.05 / 86.40  83.66 / 86.70
                     Out-of-genre  72.37 / 76.26  72.73 / 76.65  74.18 / 78.05  73.83 / 77.43  73.86 / 77.40  73.47 / 77.54  73.30 / 77.94
Dependency Parsing
              •       Speed comparison - WSJ models (milliseconds per sentence)

                              ClearNLP   ClearNLP+   CN’09     CP’11    MaltParser
                               1.61 ms    1.16 ms    1.25 ms   1.08 ms   2.14 ms

                     [Plot: milliseconds vs. sentence length for each parser.]
Dependency Parsing
              •       Speed comparison - OntoNotes models (milliseconds per sentence)

                              ClearNLP   ClearNLP+   CN’09     CP’11    MaltParser
                               1.89 ms    1.28 ms    1.26 ms   1.12 ms   2.14 ms

                     [Plot: milliseconds vs. sentence length for each parser.]
Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Semantic Role Labeling
              •       Motivation
                    -     Not all tokens need to be visited for semantic role labeling.

                    -     A typical pruning algorithm does not work as well when
                          automatically generated trees are provided.

                    -     An enhanced pruning algorithm could improve argument
                          coverage while maintaining low average labeling complexity.

              •       Approach
                    -     Higher-order argument pruning.

                    -     Conditional higher-order argument pruning.

                    -     Positional feature separation.


Semantic Role Labeling
              •       Semantic roles in dependency trees
                          [Example dependency tree with the semantic roles ARG0,
                           ARG1, ARG2, and ARGM-TMP attached to a predicate.]
Semantic Role Labeling
              •       First-order argument pruning (1st)
                    -     Originally designed for constituent trees.
                          •   Considers only siblings of the predicate, the predicate’s
                              ancestors, and siblings of the predicate’s ancestors as
                              argument candidates (Xue and Palmer, 2004).

                    -     Redesigned for dependency trees.
                          •   Considers only dependents of the predicate, the predicate’s
                              ancestors, and dependents of the predicate’s ancestors as
                              argument candidates (Johansson and Nugues, 2008).

                    -     Covers over 99% of all arguments using gold-standard trees.

                    -     Covers only 93% of all arguments using automatic trees.



Semantic Role Labeling
              •                      Higher-order argument pruning (High)
                             -          Considers all descendants of the predicate, the
                                        predicate’s ancestors, and dependents of the
                                        predicate’s ancestors as argument candidates.

                            -          Significantly improves argument coverage when automatically
                                       generated trees are used.
                                           WSJ-1st   ON-1st   WSJ-High   ON-High   Gold-1st   Gold-High
                     Argument coverage      91.02     92.94     97.59      98.24     99.44      99.92
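              A sketch of higher-order candidate collection for one predicate,
              assuming token nodes with head and children links; the node
              representation is an assumption, not the ClearNLP data structure.

                  def higher_order_candidates(pred):
                      candidates, seen = [], {id(pred)}

                      def add(node):
                          if id(node) not in seen:
                              seen.add(id(node))
                              candidates.append(node)

                      stack = list(pred.children)
                      while stack:                  # all descendants of the predicate
                          node = stack.pop()
                          add(node)
                          stack.extend(node.children)

                      anc = pred.head
                      while anc is not None:        # the predicate's ancestors ...
                          add(anc)
                          for dep in anc.children:  # ... and their direct dependents
                              add(dep)
                          anc = anc.head
                      return candidates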
Semantic Role Labeling
              •       Conditional higher-order argument pruning (High+)
                    -     Reduces argument candidates using path-rules.

                    -     Before training,
                          •   Collect paths between predicates and their descendants whose
                              subtrees contain arguments of the predicates.

                          •   Collect paths between predicates and their ancestors whose
                              direct dependents or ancestors are arguments of the predicates.

                          •   Cut off paths whose counts are below thresholds.

                    -     During training and decoding, skip tokens and their subtrees
                          or ancestors whose paths to the predicates are not seen.
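                     A simplified sketch of path-rule collection, reducing the two
                     path types above to paths between predicates and argument heads;
                     the path representation, threshold, and node attributes are all
                     assumptions rather than the exact thesis formulation.

                         from collections import Counter

                         def path_up(node, top):
                             # Dependency labels from node up to top; None if top
                             # is not an ancestor of node.
                             labels, k = [], node
                             while k is not None and k is not top:
                                 labels.append(k.label)
                                 k = k.head
                             return tuple(labels) if k is top else None

                         def predicate_path(pred, node):
                             down = path_up(node, pred)   # node in the predicate's subtree
                             if down is not None:
                                 return ("down",) + down
                             up = path_up(pred, node)     # node is an ancestor of it
                             return ("up",) + up if up is not None else None

                         def collect_path_rules(trees, min_count=3):
                             counts = Counter()
                             for tree in trees:
                                 for pred in tree.predicates:
                                     for arg in pred.arguments:
                                         p = predicate_path(pred, arg)
                                         if p is not None:
                                             counts[p] += 1
                             # Cut off rare paths; tokens whose paths were never kept
                             # are skipped (with their subtrees or ancestors) during
                             # training and decoding.
                             return {p for p, c in counts.items() if c >= min_count}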




Semantic Role Labeling
              •       Average labeling complexity
                    -                The number of tokens visited per predicate.
                            [Plot: number of argument candidates visited per
                             predicate vs. sentence length, using the WSJ models
                             (the OntoNotes graph is similar); from most to fewest
                             candidates: All, High, High+, 1st.]
Semantic Role Labeling
              •       Positional feature separation
                    -     Group features by arguments’ positions with respect to their
                          predicates.

                    -     Two sets of features are extracted.
                          •   All features derived from arguments on the lefthand side of the
                              predicates are grouped in one set, SL.

                          •   All features derived from arguments on the righthand side of the
                              predicates are grouped in another set, SR.

                    -     During training, build two models, ML and MR, for SL and SR.

                    -     During decoding, use ML and MR for argument candidates on
                          the lefthand and righthand sides of the predicates.
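              A sketch of this separation; train stands in for the liblinear
              trainer, and instances carry hypothetical (argument, predicate,
              features, label) tuples with token ids reflecting word order.

                  def train_positional(instances, train):
                      left  = [(f, y) for arg, pred, f, y in instances if arg.id < pred.id]
                      right = [(f, y) for arg, pred, f, y in instances if arg.id >= pred.id]
                      ml = train([f for f, _ in left],  [y for _, y in left])    # model M_L
                      mr = train([f for f, _ in right], [y for _, y in right])   # model M_R
                      return ml, mr

                  def classify(arg, pred, features, ml, mr):
                      # M_L labels candidates on the lefthand side of the predicate,
                      # M_R those on the righthand side.
                      return (ml if arg.id < pred.id else mr).predict(features)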


Semantic Role Labeling
              •       Experiments
                    -     Baseline: 1st order argument pruning.

                    -     Baseline+: Baseline with positional feature separation.

                    -     High: higher-order argument pruning.

                    -     All: no argument pruning.

                    -     ClearNLP: conditional higher-order argument pruning.
                          •   Previously called High+.

                    -     ClearParser: Choi and Palmer, 2011b.




Semantic Role Labeling
              •       Accuracy - WSJ models (Avgi and Avgo)
                                    Baseline  Baseline+   High     All    ClearNLP  ClearParser
                     In-genre         81.88     82.28     82.52   82.48     82.42      82.26
                     Out-of-genre     71.07     71.64     71.90   71.95     71.85      71.52
Semantic Role Labeling
              •       Accuracy - OntoNotes models (Avgi and Avgo)
                                    Baseline  Baseline+   High     All    ClearNLP  ClearParser
                     In-genre         80.73     81.33     81.51   81.48     81.52      81.69
                     Out-of-genre     70.02     70.54     70.68   70.81     70.68      70.01
Semantic Role Labeling
                    •      Speed comparison - WSJ models
                          -     Milliseconds for finding all arguments of each predicate.
                           [Plot: milliseconds per predicate vs. sentence length
                            for ClearNLP, ClearNLP+, Baseline+, High, All, and
                            ClearParser.]
Semantic Role Labeling
               •            Speed comparison - OntoNotes models

                           [Plot: milliseconds per predicate vs. sentence length
                            for the same systems, using the OntoNotes models.]
Contents
              •       Introduction
              •       Dependency conversion
              •       Experimental setup
              •       Part-of-speech tagging
              •       Dependency parsing
              •       Semantic role labeling
              •       Conclusion




Conclusion
              •       Our dependency conversion gives rich dependency
                      representations and can be applied to most English Treebanks.

              •       The dynamic model selection runs fast and shows robust POS
                      tagging accuracy across different genres.

              •       Our parsing algorithm shows linear-time average parsing
                      complexity for generating both proj. and non-proj. trees.

              •       The bootstrapping technique gives significant improvement on
                      parsing accuracy.

              •       The higher-order argument pruning gives significant
                      improvement on argument coverage.

              •       The conditional higher-order argument pruning reduces average
                      labeling complexity without compromising the F1-score.


Conclusion
              •       Contributions
                    -     First time that these three components have been evaluated
                          together on such a wide variety of English data.

                     -     Maintained a high level of accuracy while improving the
                           efficiency, modularity, and portability of these components.

                     -     Dynamic model selection and bootstrapping are generally
                           applicable to tagging and parsing, respectively.

                     -     Processing all three components takes about 2.49–2.69 ms
                           (tagging: 0.36–0.37, parsing: 1.16–1.28, labeling: 0.97–1.04).

                     -     All components are publicly available as an open-source
                           project called ClearNLP (clearnlp.googlecode.com).



                                                      59
Friday, August 17, 2012
Conclusion
              •       Future work
                    -     Integrate the dynamic model selection approach with more
                          sophisticated tagging algorithms.

                    -     Evaluate our parsing approach on languages containing more
                          non-projective dependency trees.

                     -     Improve semantic role labeling where the quality of input
                           parse trees is poor (e.g., by using joint inference).




                                                   60
Friday, August 17, 2012
Acknowledgment
              •       We gratefully acknowledge the support of the following grants. Any opinions
                      expressed in this material are those of the authors and do not necessarily
                      reflect the views of the granting agencies.

                     -     The National Science Foundation Grants IIS-0325646, Domain Independent
                           Semantic Parsing; CISE-CRI-0551615, Towards a Comprehensive Linguistic
                           Annotation; CISE-CRI 0709167, Collaborative: A Multi-Representational and
                           Multi-Layered Treebank for Hindi/Urdu; and CISE-IIS-RI-0910992, Richer
                           Representations for Machine Translation.

                     -     A grant from the Defense Advanced Research Projects Agency (DARPA/IPTO)
                           under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022,
                           subcontract from BBN, Inc.

                    -     A subcontract from the Mayo Clinic and Harvard Children’s Hospital based
                          on a grant from the ONC, 90TR0002/01.

                    -     Strategic Health Advanced Research Project Area 4: Natural Language
                          Processing.



                                                        61
Friday, August 17, 2012
Acknowledgment
              •       Special thanks are due to
                    -     Martha Palmer for practically being my mom for 5 years.

                    -     James Martin for always encouraging me when I’m low.

                    -     Wayne Ward for wonderful smiles.

                    -     Bhuvana Narasimhan for bringing Hindi to my life.

                    -     Joakim Nivre for suffering under millions of my questions.

                    -     Nicolas Nicolov for making me feel normal when others call
                          me “workaholic”.

                    -     All CINC folks for letting me live (literally) at my cube.



                                                     62
Friday, August 17, 2012
References
              •       Jinho D. Choi and Nicolas Nicolov. K-best, Locally Pruned, Transition-based Dependency Parsing Using
                      Robust Risk Minimization. In Recent Advances in Natural Language Processing V, pages 205–216. John
                      Benjamins, 2009.

              •       Jinho D. Choi and Martha Palmer. Getting the Most out of Transition-based Dependency Parsing. In
                      Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human
                      Language Technologies, ACL:HLT’11, pages 687–692, 2011a.

              •       Jinho D. Choi and Martha Palmer. Transition-based Semantic Role Labeling Using Predicate Argument
                      Clustering. In Proceedings of the ACL workshop on Relational Models of Semantics, RELMS’11, pages
                      37–45, 2011b.

              •       M. Čmejrek, J. Cuřín, and J. Havelka. Prague Czech-English Dependency Treebank: Any Hopes for a
                      Common Annotation Scheme? In HLT-NAACL’04 workshop on Frontiers in Corpus Annotation, pages
                      47–54, 2004.

              •       Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector
                      Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation,
                      LREC’04, 2004.

              •       Richard Johansson and Pierre Nugues. Dependency-based Semantic Role Labeling of PropBank. In
                      Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
                      (EMNLP’08), pages 69–78, 2008.




                                                                  63
Friday, August 17, 2012
References
              •       Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A Dual Coordinate
                      Descent Method for Large-scale Linear SVM. In Proceedings of the 25th International Conference on
                      Machine Learning, ICML’08, pages 408–415, 2008.

              •       Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus
                      of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.

              •       Marie-Catherine de Marneffe and Christopher D. Manning. The Stanford typed dependencies
                      representation. In Proceedings of the COLING workshop on Cross-Framework and Cross-Domain
                      Parser Evaluation, 2008a.

              •       Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing
                      using Spanning Tree Algorithms. In Proceedings of the Conference on Human Language Technology and
                      Empirical Methods in Natural Language Processing (HLT-EMNLP’05), pages 523–530, 2005.

              •       Rodney D. Nielsen, James Masanz, Philip Ogren, Wayne Ward, James H. Martin, Guergana Savova, and
                      Martha Palmer. An architecture for complex clinical question answering. In Proceedings of the 1st ACM
                      International Health Informatics Symposium, IHI’10, pages 395–399, 2010.

              •       Joakim Nivre. An Efficient Algorithm for Projective Dependency Parsing. In Proceedings of the 8th
                      International Workshop on Parsing Technologies, IWPT’03, pages 149–160, 2003.

              •       Joakim Nivre. Algorithms for deterministic incremental dependency parsing. Computational
                      Linguistics, 34(4):513–553, 2008.




                                                                   64
Friday, August 17, 2012
References
              •       Joakim Nivre. Non-Projective Dependency Parsing in Expected Linear Time. In Proceedings of the Joint
                      Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on
                      Natural Language Processing of the AFNLP (ACL-IJCNLP’09), pages 351–359, 2009.

              •       Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A Dependency
                      Treebank for English. In Proceedings of the 3rd International Conference on Language Resources and
                      Evaluation (LREC’02), 2002.

              •       Ralph Weischedel, Eduard Hovy, Martha Palmer, Mitch Marcus, Robert Belvin, Sameer Pradhan, Lance
                      Ramshaw, and Nianwen Xue. OntoNotes: A Large Training Corpus for Enhanced Processing. In Joseph
                      Olive, Caitlin Christianson, and John McCary, editors, Handbook of Natural Language Processing and
                      Machine Translation. Springer, 2011.

              •       Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech
                      Tagging with a Cyclic Dependency Network. In Proceedings of the Annual Conference of the North
                      American Chapter of the Association for Computational Linguistics on Human Language Technology,
                      NAACL’03, pages 173–180, 2003.

              •       Nianwen Xue and Martha Palmer. Calibrating Features for Semantic Role Labeling. In Proceedings of
                      the Conference on Empirical Methods in Natural Language Processing, 2004.




                                                                  65
Friday, August 17, 2012

  • 58. Conclusion • Our dependency conversion gives rich dependency representations and can be applied to most English Treebanks. • The dynamic model selection runs fast and shows robust POS tagging accuracy across different genres. • Our parsing algorithm shows linear-time average parsing complexity for generating both proj. and non-proj. trees. • The bootstrapping technique gives significant improvement on parsing accuracy. • The higher-order argument pruning gives significant improvement on argument coverage. • The conditional higher-order argument pruning reduces average labeling complexity without compromising the F1-score. 58 Friday, August 17, 2012
Conclusion
• Contributions
  - The first time that these three components have been evaluated together on such a wide variety of English data.
  - Maintained high accuracy while improving the efficiency, modularity, and portability of these components.
  - Dynamic model selection and bootstrapping are generally applicable to tagging and parsing, respectively.
  - Processing with all three components takes about 2.49–2.69 ms (tagging: 0.36–0.37, parsing: 1.16–1.28, labeling: 0.97–1.04).
  - All components are publicly available as an open-source project called ClearNLP (clearnlp.googlecode.com).
59
Conclusion
• Future work
  - Integrate the dynamic model selection approach with more sophisticated tagging algorithms.
  - Evaluate our parsing approach on languages containing more non-projective dependency trees.
  - Improve semantic role labeling where the quality of the input parse trees is poor (using joint inference).
60
Acknowledgment
• We gratefully acknowledge the support of the following grants. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
  - The National Science Foundation grants IIS-0325646 (Domain Independent Semantic Parsing), CISE-CRI-0551615 (Towards a Comprehensive Linguistic Annotation), CISE-CRI-0709167 (Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu), and CISE-IIS-RI-0910992 (Richer Representations for Machine Translation).
  - A grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
  - A subcontract from the Mayo Clinic and Harvard Children's Hospital based on a grant from the ONC, 90TR0002/01.
  - Strategic Health Advanced Research Project Area 4: Natural Language Processing.
61
Acknowledgment
• Special thanks are due to
  - Martha Palmer, for practically being my mom for 5 years.
  - James Martin, for always encouraging me when I'm low.
  - Wayne Ward, for wonderful smiles.
  - Bhuvana Narasimhan, for bringing Hindi to my life.
  - Joakim Nivre, for suffering through millions of my questions.
  - Nicolas Nicolov, for making me feel normal when others call me a "workaholic".
  - All CINC folks, for letting me live (literally) at my cube.
62
References
• Jinho D. Choi and Nicolas Nicolov. K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization. In Recent Advances in Natural Language Processing V, pages 205–216. John Benjamins, 2009.
• Jinho D. Choi and Martha Palmer. Getting the Most out of Transition-based Dependency Parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL:HLT'11, pages 687–692, 2011a.
• Jinho D. Choi and Martha Palmer. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. In Proceedings of the ACL Workshop on Relational Models of Semantics, RELMS'11, pages 37–45, 2011b.
• M. Čmejrek, J. Cuřín, and J. Havelka. Prague Czech-English Dependency Treebank: Any Hopes for a Common Annotation Scheme? In HLT-NAACL'04 Workshop on Frontiers in Corpus Annotation, pages 47–54, 2004.
• Jesús Giménez and Lluís Màrquez. SVMTool: A General POS Tagger Generator Based on Support Vector Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC'04, 2004.
• Richard Johansson and Pierre Nugues. Dependency-based Semantic Role Labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP'08, pages 69–78, 2008.
63
References
• Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A Dual Coordinate Descent Method for Large-scale Linear SVM. In Proceedings of the 25th International Conference on Machine Learning, ICML'08, pages 408–415, 2008.
• Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
• Marie-Catherine de Marneffe and Christopher D. Manning. The Stanford Typed Dependencies Representation. In Proceedings of the COLING Workshop on Cross-Framework and Cross-Domain Parser Evaluation, 2008a.
• Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing Using Spanning Tree Algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT-EMNLP'05, pages 523–530, 2005.
• Rodney D. Nielsen, James Masanz, Philip Ogren, Wayne Ward, James H. Martin, Guergana Savova, and Martha Palmer. An Architecture for Complex Clinical Question Answering. In Proceedings of the 1st ACM International Health Informatics Symposium, IHI'10, pages 395–399, 2010.
• Joakim Nivre. An Efficient Algorithm for Projective Dependency Parsing. In Proceedings of the 8th International Workshop on Parsing Technologies, IWPT'03, pages 149–160, 2003.
• Joakim Nivre. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics, 34(4):513–553, 2008.
64
References
• Joakim Nivre. Non-Projective Dependency Parsing in Expected Linear Time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP'09, pages 351–359, 2009.
• Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A Dependency Treebank for English. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC'02, 2002.
• Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL'03, pages 173–180, 2003.
• Ralph Weischedel, Eduard Hovy, Martha Palmer, Mitch Marcus, Robert Belvin, Sameer Pradhan, Lance Ramshaw, and Nianwen Xue. OntoNotes: A Large Training Corpus for Enhanced Processing. In Joseph Olive, Caitlin Christianson, and John McCary, editors, Handbook of Natural Language Processing and Machine Translation. Springer, 2011.
• Nianwen Xue and Martha Palmer. Calibrating Features for Semantic Role Labeling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP'04, 2004.
65