Optimization of NLP Components for Robustness and Scalability

Optimization of NLP Components
For Robustness and Scalability
Dissertation Defense
August 15, 2012

Jinho D. Choi
choijd@colorado.edu

Department of Computer Science
University of Colorado Boulder

Friday, August 17, 2012

Ever since I came, Boulder has been ...
• #1: Top 10 College Towns (Livability, 2012)

• #1: Top 10 Least Obese Metro Areas (Gallup Healthways, 2012)

• #1: Top 10 Happiest Cities (Gallup Healthways, 2012)

• #1: The 10 Most Educated U.S. Cities (US News, 2011)

• #1: America’s 15 Most Active Cities (Time - Healthland, 2011)

• #1: Best Quality of Life in America (Porfolio, 2011)

• #1: 20 Brainiest Cities in America (Daily Beast, 2010)

• #1: Western Cities Fare Best in Well-being (USA Today, 2010)

• #1: America's Foodiest Town (Bon Appétit, 2010)

• #1: The Best Cities to Raise an Outdoor Kid (Backpacker, 2009)

• #1: America's Top 25 Towns To Live Well (Forbes, 2009)

• #1: America's Smartest Cities (Forbes, 2008)

• #1: Top Heart Friendly Cities (American Heart Association, 2008)

2

Contents
• Introduction
• Dependency conversion
• Experimental setup
• Part-of-speech tagging
• Dependency parsing
• Semantic role labeling
• Conclusion

3

Introduction
• The application of NLP has ...
- Expanded to everyday computing.

- Broadened to a general audience.

‣ More attention is drawn to the practical aspects of NLP.

• NLP components should be tested for
- Robustness in handling heterogeneous data.
• Need to be evaluated on data from several different sources.

- Scalability in handling a large amount of data.
• Need to be evaluated for speed and complexity.

4

Introduction
• Research question
- How to improve the robustness and scalability of standard
NLP components.

• Goals
- To prepare gold-standard data from several different sources
for in-genre and out-of-genre experiments.

- To develop a POS tagger, a dependency parser, and a semantic
role labeler showing robust results across this data.

- To reduce average complexities of these components while
retaining good performance in accuracy.

5

Introduction
• Thesis statement
1. We improve the robustness of three NLP components:
• POS tagger: by building a generalized model.

• Dependency parser: by bootstrapping parse information.

• Semantic role labeler: by applying higher-order argument pruning.

2. We improve the scalability of these three components:
• POS tagger: by adapting dynamic model selection.

• Dependency parser: by optimizing the engineering of transition-
based parsing algorithms.

• Semantic role labeler: by applying conditional higher-order
argument pruning.

6

Introduction
Start

Constituent
Treebanks
PropBanks

Dependency
Conversion

Training Set: Evaluation Set:
Dependency Trees Dependency Trees
+ Semantic Roles + Semantic Roles

Part-of-speech Part-of-speech Part-of-speech
Trainer Tagging Model Tagger

Dependency Dependency Dependency
Trainer Parsing Model Parser

Semantic Role Semantic Role Semantic Role
Trainer Labeling Model Labeler

Stop

7

Contents
• Introduction
• Conclusion

8

Dependency Conversion
• Motivation
- A small amount of manually annotated dependency trees
(Rambow et al., 2002; Čmejrek et al., 2004).

- A large amount of manually annotated constituent trees
(Marcus et al., 1993; Weischedel et al., 2011).

- Converting constituent trees into dependency trees
→ A large amount of pseudo annotated dependency trees.

• Previous approaches
- Penn2Malt (stp.lingﬁl.uu.se/~nivre/research/Penn2Malt.html).

- LTH converter (Johansson and Nugues, 2007).

- Stanford converter (de Marneffe and Manning, 2008a).

9

Dependency Conversion
• Comparison
- The Stanford and CLEAR dependency approaches generate
3.62% and 0.23% of unclassiﬁed dependencies, respectively.

- Our conversion produces 3.69% of non-projective trees.

Penn2Malt LTH Stanford CLEAR
Labels Malt CoNLL Stanford Stanford+
Long-distance DPs ✓✓✓ ✓✓✓
Secondary DPs ✓ ✓✓ ✓✓✓✓
Function Tags ✓✓ ✓✓
New TB Format NO NO NO YES
Maintenance NO NO YES YES

10

Dependency Conversion (1/6)
1. Input a constituent tree.
• Penn, OntoNotes, CRAFT, MiPACQ, and SHARP Treebanks.

NP
NP SBAR
WHNP-1 S
NP VP
NP
NN CC NN WDT PRP VB -NONE-
Peace and joy that we want *T*-1

11

2. Reorder constituents related to empty categories.
• *T*: wh-movement and topicalization.
• *RNR*: right node raising.
• *ICH* and *PPA*: discontinuous constituent.
NP NP
NP SBAR NP SBAR
WHNP-1 S S
NP VP NP VP
NP WHNP-1
NN CC NN WDT PRP VB -NONE- NN CC NN PRP VB WDT
Peace and joy that we want *T*-1 Peace and joy we want that

12

3. Handle special cases.
• Apposition, coordination, and small clauses.
NP
NP SBAR
S
NP VP
WHNP-1
NN CC NN PRP VB WDT
Peace and joy we want that

The original word order is preserved
conj
cc in the converted dependency tree.
root Peace and joy that we want

13

4. Handle general cases.
• Head-ﬁnding rules and heuristics.
NP
NP SBAR
S
NP VP
WHNP-1
NN CC NN PRP VB WDT
rcmod
conj dobj
root cc nsubj


14

5. Add secondary dependencies.
• Gapping, referent, right node raising, open clausal subject.
NP
NP SBAR
S
NP VP
WHNP-1
NN CC NN PRP VB WDT
rcmod
conj dobj
root cc nsubj

ref

15

Nielsen et al., 2010; Weischedel et al., 2011; Verspoor et al., 2012). Tags followed by ∗ are not the

typical Penn Treebank tags but used in some other Treebanks.

A.1 Function tags
6. Add function tags.
Syntactic roles
ADV Adverbial PUT Locative complement of put
CLF It-cleft PRD Non-VP predicate
CLR Closely related constituent RED∗ Reduced auxiliary
DTV Dative SBJ Surface subject
LGS Logical subject in passive TPC Topicalization
NOM Nominalization
Semantic roles
BNF Benefactive MNR Manner
DIR Direction PRP Purpose or reason
EXT Extent TMP Temporal
LOC Locative VOC Vocative
Text and speech categories
ETC Et cetera SEZ Direct speech
FRM∗ Formula TTL Title
HLN Headline UNF Unﬁnished constituent
IMP Imperative

Table A.1: A list of function tags for English.

16

Contents
• Introduction
• Conclusion

17

Experimental Setup
• The Wall Street Journal (WSJ) models
- Train
• The WSJ 2-21 in OntoNotes (Weischedel et al., 2011).

• Total: 30,060 sentences, 731,677 tokens, 77,826 predicates.

- In-genre evaluation (Avgi)
• The WSJ 23 in OntoNotes.


- Out-of-genre evaluation (Avgo)
• 5 genres in OntoNotes, 2 genres in MiPACQ (Nielsen et al., 2010),
1 genre in SHARP.


18

Experimental Setup
• The OntoNotes models
- Train
• 6 genres in OntoNotes.

• Total: 96,406 sentences, 1,983,012 tokens, 213,695 predicates.

- In-genre evaluation (Avgi)
• 6 genres in OntoNotes.


- Out-of-genre evaluation (Avgo)
• Same 2 genres in MiPACQ, same 1 genre in SHARP.


19

Experimental Setup
• Accuracy
- Part-of-speech tagging
• Accuracy.

- Dependency parsing
• Labeled attachment score (LAS).

• Unlabeled attachment score (UAS).

- Semantic role labeling
• F1-score of argument identification.

• F1-score of both argument identification and classification.

20

Experimental Setup
• Speed
- All experiments are run on an Intel Xeon 2.57GHz machine.

- Each model is run 5 times, and an average speed is measured
by taking the average of middle 3 speeds.

• Machine learning algorithm
- Liblinear L2-regularization, L1-loss SVM classiﬁcation
(Hsieh et al., 2008).

- Designed to handle large scale, high dimensional vectors.

- Runs fast with accurate performance.

- Our implementation of LibLinear is publicly available.

21

Contents
• Introduction
• Conclusion

22

Part-of-Speech Tagging
• Motivation
- Supervised learning approaches do not perform well in
out-of-genre experiments.

- Domain adaptation approaches require knowledge of
incoming data.

- Complicated tagging or learning approaches often run slowly
during decoding.

• Dynamic model selection
- Build two models, generalized and domain-speciﬁc, given one
set of training data.

- Dynamically select one of the models during decoding.

23

• Training
1. Group training data into documents (e.g., sections in WSJ).
2. Get the document frequency of each simplified word form.
• In simplified word forms, all numerical expressions with or w/o
special characters are converted to 0.

3. Build a domain-specific model using features extracted from
only tokens whose DF(SW) > 1.
4. Build a generalized model using features extracted from only
tokens whose DF(SW) > 2.
5. Find the cosine similarity threshold for dynamic model
selection.

24

• Cosine similarity threshold
- During cross-validation, collect cosine-similarities between
simplified word forms used for building the domain-specific
model and input sentences that the domain-specific model
shows advantage.

- The cosine similarity in the first 5% area becomes the
threshold for dynamic model selection.
190
160
Occurrence

120

80

40
5%
0
0 0.02 0.04 0.06

Cosine Similarity
25

• Decoding
- Measure the cosine similarity between simplified word forms
used for building the domain-specific model and each input
sentence.

- If the similarity is greater than the threshold, use the domain-
specific model.

- If the similarity is less than or equal to the threshold, use the
generalized model.

Runs as fast as a single model approach.

26

• Experiments
- Baseline: using the original word forms.

- Baseline+: using lowercase simpliﬁed word forms.

- Domain: domain-speciﬁc model.

- General: generalized model.

- ClearNLP: dynamic model selection.

- Stanford: Toutanova et al., 2003.

- SVMTool: Giménez and Màrquez, 2004.

27

• Accuracy - WSJ models (Avgi and Avgo)
In-domain experiments
97.5
97.39 97.40 97.41
97.31
97.24
97.0
96.93 96.98

96.5
Baseline Baseline+ Domain General ClearNLP Stanford SVMTool

Out-of-domain experiments
90.5 90.61 90.79
90.43
89.5 89.92
89.49
88.5 88.64
88.25
87.5

28

• Accuracy - OntoNotes models (Avgi and Avgo)
96.6
96.58 96.56 96.52
96.4 96.41
96.32
96.2 96.23 96.19
96

90
89 89.26 89.26 89.20
88 88.60
87.75 87.61
87
86.79
86

29

• Speed comparison
Model Tokens per sec. Millisecs. per sen.
ClearNLP 32,654 0.44
ClearNLP+ 39,491 0.37
WSJ
Stanford 250 58.06
SVMTool 1,058 13.71
ClearNLP 32,206 0.45
ClearNLP+ 39,882 0.36
OntoNotes
Stanford 136 106.34
SVMTool 924 15.71
• ClearNLP : as reported in the thesis.
• ClearNLP+: new improved results.
30

Contents
• Introduction
• Conclusion

31

Dependency Parsing
• Goals
1. To improve the average parsing complexity for non-
projective dependency parsing.
2. To reduce the discrepancy between dynamic features used
for training on gold trees and decoding automatic trees.
3. To ensure well-formed dependency graph properties.

• Approach
1. Combine transitions in both projective and non-projective
dependency parsing algorithms.
2. Bootstrap dynamic features during training.
3. Post-process.

32

Table 5.1 shows functional decomposition of transitions used in Nivre’s arc-eager and Covington’s

Dependency Parsing
algorithms. Nivre’s arc-eager algorithm is a projective parsing algorithm that shows a worst-case

parsing complexity of O(n) (Nivre, 2003). Covington’s algorithm is a non-projective parsing al-

• Transition decomposition
gorithm that shows a worst-case parsing complexity of O(n2 ) without backtracking (Covington,

-
Decompose transitions in:
2001). Covington’s algorithm was later formulated as a transition-based parsing algorithm by Nivre

•
(2008), called Nivre’s list-based algorithm. Table(projective; Nivre, 2003).
Nivre’s arc-eager algorithm 5.3 shows the relation between the decomposed

• Nivre’s list-based algorithm (non-projective; Nivre, 2008).
transitions in Table 5.1 and the transitions from the original algorithms.

Operation Transition Description
l
Left-∗l ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( [λ1 |i], λ2 , [j|β], A ∪ {i ← j} )
Arc l
Right-∗l ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( [λ1 |i], λ2 , [j|β], A ∪ {i → j} )
No-∗ ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( [λ1 |i], λ2 , [j|β], A )
∗-Shiftd|n ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( [λ1 |i|λ2 |j], [ ], β, A )
List ∗-Reduce ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( λ1 , λ2 , [j|β], A )
∗-Pass ( [λ1 |i], λ2 , [j|β], A ) ⇒ ( λ1 , [i|λ2 ], [j|β], A )

Table 5.1: Decomposed transitions grouped into the Arc and List operations.
This decomposition makes it easier to integrate
transitions from different parsing algorithms.
Operation Transition Precondition
Left-∗l [i = 0] ∧ ¬[∃k. (i ← k) ∈ A] ∧ ¬[(i →∗ j) ∈ A]
Arc Right-∗l 33
¬[∃k. (k → j) ∈ A] ∧ ¬[(i ∗← j) ∈ A]
Friday, August 17, 2012 No-∗ ¬[∃l. Left-∗l ∨ Right-∗l ]

be recomposed into transitions used in several different dependency parsing algorithms.

5.2.2
Dependency Parsing
Transition recomposition

• Transition recomposition
Any combination of two decomposed transitions in Table 5.1, one from each operation, can be

-
recomposed into a new transition. Forof two decomposedof Left-∗l and ∗-Reduce makes a
Any combination instance, the combination transitions, one from
each operation,performs Left-∗ and ∗-Reduce sequentially; the Arc operation
can be recomposed.
transition, Left-Reduce , which l l

-
is always performed before the List operation. Table 5.3 an ARC operation is
For each recomposed transition, shows how these decomposed transitions
performed first and a LIST operation is performed later.
are recomposed into transitions used in different dependency parsing algorithms.
Projective Non-projective
Transition Nivre’03 Covington’01 Nivre’08 CP’11 This work
Left-Reducel
Left-Passl
Right-Shiftnl
Right-Passl
No-Shiftd
No-Shiftn
No-Reduce
No-Pass

Table 5.3: Transitions in different dependency parsing algorithms. The last column shows transitions
used in our parsing algorithm. The other columns show transitions used in Nivre (2003), Covington
34
(2001), Nivre (2008), and Choi and Palmer (2011a), respectively.

Dependency Parsing
• Average parsing complexity
- The number of transitions performed per sentence.

2850
330 Nivre'08
Covington'01

250
# of transitions
# of transitions

2000
200 CP'11
1500
150 Nivre'08
1000 CP'11
100 This work
500 This work
50
0
10 20 30 40
40 50
50 60
60 70
70 80
80

Sentence length
Sentence length

35

Dependency Parsing
• Bootstrapping
- Transition-based dependency parsing can take advantage of
dynamic features (e.g., head, leftmost/rightmost dependent).
w0 ! h j wi p j

wi wj

w1 wj-1 wi+1 wj-1

- Features extracted from gold-standard trees during training
can be different from features extracted from automatic
trees during decoding.

- By bootstrapping these dynamic features, we can signiﬁcantly
improve parsing accuracy.

36

Dependency Parsing
Begin

Training
Data

Gold-standard Gold-standard
Features Labels

Machine Learning
Algorithm

Statistical Automatic
Model Features

Determined by Stop?
NO Dependency
cross-validation. Parser
YES

End

37

Dependency Parsing
• Post-processing
- Transition-based dependency parsing does not guarantee
parse output to be a tree.

- After parsing, we ﬁnd the head of each headless token by
comparing it to all other tokens using the same model.

- A predicted head with the highest score that does not break
tree properties becomes the head of this token.

- This post-processing technique signiﬁcantly improves parsing
accuracy in out-of-genre experiments.

38

Dependency Parsing
• Experiments
- Baseline: using all recomposed transitions.

- Baseline+: Baseline with post-processing.

- ClearNLP: Baseline+ with bootstrapping.

- CN’09: Choi and Nicolov, 2009.

- CP’11: Choi and Palmer, 2011a.

- MaltParser: Nivre, 2009.

- MSTParser: McDonald et al., 2005.
• Use only 1st order features; with 2nd order features, accuracy is
expected to be higher and speed is expected to be slower.

39

Dependency Parsing
• Accuracy - WSJ models (Avgi and Avgo) LAS
UAS
In-genre experiments
90
89.68 89.5 89.74
88.75
88.57 88.81
88.23 88.36
87.5 88.10 87.79 88.03
86.94 87.18
86.25 86.49
86.03
85
Baseline Baseline+ ClearNLP CN’09 CP’11 MaltParser MSTParser
Out-of-genre experiments
80
78.25 79.36 79.08 79.18 79.26
78.60 78.29
78.04
76.5
74.75 75.50 75.23 75.34
74.68 74.46
74.18 74.10
73

40

Dependency Parsing
• Accuracy - OntoNotes models (Avgi and Avgo) LAS
UAS
In-genre experiments
88
87 87.75 87.48 87.57
86 86.54 86.83 86.70
86.40
85 85.68 85.41 85.49
84 84.51 84.76
84.05
83 83.66
Out-of-genre experiments
78.5
78.05 77.94
76.75 77.43 77.40 77.54
76.26 76.65
75
73.25 74.18 73.83 73.86 73.47 73.30
72.37 72.73
71.5

41

Dependency Parsing
• Speed comparison - WSJ models
ClearNLP ClearNLP+ CN’09 CP’11 MaltParser
1.61 ms 1.16 ms 1.25 ms 1.08 ms 2.14 ms
20

15
Milliseconds

10

5

0
10 20 30 40 50 60 70 80
Sentence Length

42

Dependency Parsing
• Speed comparison - OntoNotes models
ClearNLP ClearNLP+ CN’09 CP’11 MaltParser
1.89 ms 1.28 ms 1.26 ms 1.12 ms 2.14 ms
20

15
Milliseconds

10

5

0
10 20 30 40 50 60 70 80
Sentence Length

43

Contents
• Introduction
• Conclusion

44

Semantic Role Labeling
• Motivation
- Not all tokens need to be visited for semantic role labeling.

- A typical pruning algorithm does not work as well when
automatically generated trees are provided.

- An enhanced pruning algorithm could improve argument
coverage while maintaining low average labeling complexity.

• Approach
- Higher-order argument pruning.

- Conditional higher-order argument pruning.

- Positional feature separation.

45

• Semantic roles in dependency trees
ARG0 ARG1 ARG2 ARGM-TMP

46

• First-order argument pruning (1st)
- Originally designed for constituent trees.
• Considers only siblings of the predicate, predicate’s ancestors, and
siblings of predicate’s ancestors argument candidates (Xue and
Palmer, 2004).

- Redesigned for dependency trees.
• Considers only dependents of the predicate, predicate’s ancestors,
and dependents of predicate’s ancestors argument candidates
(Johansson and Nugues, 2008).

- Covers over 99% of all arguments using gold-standard trees.

- Covers only 93% of all arguments using automatic trees.

47

• Higher-order argument pruning (High)
- Considers all descendants of the predicate, predicate’s
ancestors, and dependents of predicate’s ancestors
argument candidates.

- Signiﬁcantly improves argument coverage when automatically
generated trees are used.
100
99.92
99.44
Argument Coverage

98 98.24
97.59
96

94

92 92.94

91.02
90
WSJ-1st ON-1st WSJ-High ON-High Gold-1st Gold-High

48

• Conditional higher-order argument pruning (High+)
- Reduces argument candidates using path-rules.

- Before training,
• Collect paths between predicates and their descendants whose
subtrees contain arguments of the predicates.

• Collect paths between predicates and their ancestors whose
direct dependents or ancestors are arguments of the predicates.

• Cut off paths whose counts are below thresholds.

- During training and decoding, skip tokens and their subtrees
or ancestors whose paths to the predicates are not seen.

49

• Average labeling complexity
- The number of tokens visited per predicate.
Using the WSJ models (OntoNotes graph is similar)
75 All
60
# of candidates

50
40 High
30
High+
20 1st
10
0
10 20 30 40 50 60 70 80

Sentence length
50

• Positional feature separation
- Group features by arguments’ positions with respect to their
predicates.

- Two sets of features are extracted.
• All features derived from arguments on the lefthand side of the
predicates are grouped in one set, SL.

• All features derived from arguments on the righthand side of the
predicates are grouped in another set, SR.

- During training, build two models, ML and MR, for SL and SR.

- During decoding, use ML and MR for argument candidates on
the lefthand and righthand sides of the predicates.

51

• Experiments
- Baseline: 1st order argument pruning.

- Baseline+: Baseline with positional feature separation.

- High: higher-order argument pruning.

- All: no argument pruning.

- ClearNLP: conditional higher-order argument pruning.
• Previously called High+.

- ClearParser: Choi and Palmer, 2011b.

52

• Accuracy - WSJ models (Avgi and Avgo)
82.6
82.52 82.48
82.3 82.42
82.28 82.26
82.0
81.88
81.7
Baseline Baseline+ High All ClearNLP ClearParser

72
71.90 71.95
71.7 71.85
71.64
71.4 71.52
71.1
71.07
70.8

53

• Accuracy - OntoNotes models (Avgi and Avgo)
81.7
81.69
81.51 81.48 81.52
81.3 81.33
80.9
80.73
80.5

70.9
70.81
70.5 70.68 70.68
70.54
70.1
70.02 70.01
69.7

54

• Speed comparison - WSJ models
- Milliseconds for ﬁnding all arguments of each predicate.
3 ClearNLP
ClearNLP+
Baseline+
2.25 High
All
Milliseconds

ClearParser
1.5

0.75

0
10 20 30 40 50 60 70 80
Sentence Length

55

• Speed comparison - OntoNotes models

3 ClearNLP
ClearNLP+
Baseline+
2.25 High
All
Milliseconds

ClearParser
1.5

0.75

0
10 20 30 40 50 60 70 80
Sentence Length

56

Contents
• Introduction
• Conclusion

57

Conclusion
• Our dependency conversion gives rich dependency
representations and can be applied to most English Treebanks.

• The dynamic model selection runs fast and shows robust POS
tagging accuracy across different genres.

• Our parsing algorithm shows linear-time average parsing
complexity for generating both proj. and non-proj. trees.

• The bootstrapping technique gives signiﬁcant improvement on
parsing accuracy.

• The higher-order argument pruning gives signiﬁcant
improvement on argument coverage.

• The conditional higher-order argument pruning reduces average
labeling complexity without compromising the F1-score.

58

Conclusion
• Contributions
- First time that these three components have been evaluated
together on such a wide variety of English data.

- Maintained high level accuracy while improving efﬁciency,
modularity, and portability of these components.

- Dynamic model selection and bootstrapping can be generally
applicable for tagging and parsing, respectively.

- Processing all three components take about 2.49 - 2.69 ms
(tagging: 0.36 - 0.37, parsing: 1.16 - 1.28, labeling: 0.97 - 1.04).

- All components are publicly available as an open source
project, called ClearNLP (clearnlp.googlecode.com).

59

Conclusion
• Future work
- Integrate the dynamic model selection approach with more
sophisticated tagging algorithms.

- Evaluate our parsing approach on languages containing more
non-projective dependency trees.

- Improve semantic role labeling where the quality of input
parse trees is poor (using joint-inference).

60

Acknowledgment
• We gratefully acknowledge the support of the following grants. Any contents
expressed in this material are those of the authors and do not necessarily
reﬂect the views of any grant.

- The National Science Foundation Grants IIS-0325646, Domain Independent
Semantic Parsing, CISE-CRI-0551615, Towards a Comprehensive Linguistic
Annotation, CISE-CRI 0709167, Collaborative: A Multi-Representational and
Multi-Layered Treebank for Hindi/Urdu, CISE- IIS-RI-0910992, Richer
Representations for Machine Translation.

- A grant from the Defense Advanced Research Projects Agency (DARPA/
IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-
C-0022, subcontract from BBN, Inc.

- A subcontract from the Mayo Clinic and Harvard Children’s Hospital based
on a grant from the ONC, 90TR0002/01.

- Strategic Health Advanced Research Project Area 4: Natural Language
Processing.

61

Acknowledgment
• Special thanks are due to
- Martha Palmer for practically being my mom for 5 years.

- James Martin for always encouraging me when I’m low.

- Wayne Ward for wonderful smiles.

- Bhuvana Narasimhan for bringing Hindi to my life.

- Joakim Nivre for suffering under millions of my questions.

- Nicolas Nicolov for making me feel normal when others call
me “workaholic”.

- All CINC folks for letting me live (literally) at my cube.

62

References
• Jinho D. Choi and Nicolas Nicolov. K-best, Locally Pruned, Transition-based Dependency Parsing Using
Robust Risk Minimization. In Recent Advances in Natural Language Processing V, pages 205–216. John
Benjamins, 2009.

• Jinho D. Choi and Martha Palmer. Getting the Most out of Transition-based Dependency Parsing. In
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:Human
Language Technologies, ACL:HLT’11, pages 687–692, 2011a.

• Jinho D. Choi and Martha Palmer. Transition-based Semantic Role Labeling Using Predicate Argument
Clustering. In Proceedings of ACL workshop on Relational Models of Semantics, RELMS’11, pages 37–
45, 2011b.

• M. Čmejrek, J. Cuřín, and J. Havelka. Prague Czech-English Dependency Treebank: Any Hopes for a
Common Annotation Scheme? In HLT-NAACL’04 workshop on Frontiers in CorpusAnnotation, pages
47–54, 2004.

• Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector
Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation,
LREC’04, 2004.

• Richard Johansson and Pierre Nugues. Dependency-based Semantic Role Labeling of PropBank. In
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
(EMNLP’08), pages 69–78, 2008.

63

References
• Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A Dual Coordinate
Descent Method for Large-scale Linear SVM. In Proceedings of the 25th international conference on
Machine learning, ICML’08, pages 408–415, 2008.

• Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus
of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.

• Marie-Catherine de Marneffe and Christopher D. Manning. The Stanford typed dependencies
representation. In Proceedings of the COLING workshop on Cross-Framework and Cross-
DomainParser Evaluation, 2008a.

• Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing
using Spanning Tree Algorithms. In Proceedings of the Conference on Human LanguageTechnology and
Empirical Methods in Natural Language Processing (HLT-EMNLP’05), pages523–530, 2005.

• Rodney D. Nielsen, James Masanz, Philip Ogren, Wayne Ward, James H. Martin, Guergana Savova, and
Martha Palmer. An architecture for complex clinical question answering. In Proceedings of the 1st ACM
International Health Informatics Symposium, IHI’10, pages 395–399, 2010.

• Joakim Nivre. An Efﬁcient Algorithm for Projective Dependency Parsing. In Proceedings of the 8th
International Workshop on Parsing Technologies, IWPT’03, pages 149–160, 2003.

• Joakim Nivre. Algorithms for deterministic incremental dependency parsing. Computational
Linguistics, 34(4):513–553, 2008.

64

References
• Joakim Nivre. Non-Projective Dependency Parsing in Expected Linear Time. In Proceedings of the Joint
Conference of the 47th Annual Meeting of the ACL and the 4th International JointConference on
Natural Language Processing of the AFNLP (ACL-IJCNLP’09), pages 351–359,2009.

• Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A Dependency
Treebank for English. In Proceedings of the 3rd International Conference on LanguageResources and
Evaluation (LREC’02), 2002.

• Ralph Weischedel, Eduard Hovy, Martha Palmer, Mitch Marcus, Robert Belvin, Sameer Pradhan, Lance
Ramshaw, and Nianwen Xue. OntoNotes: A Large Training Corpus for Enhanced Processing. In Joseph
Olive, Caitlin Christianson, and John McCary, editors, Handbook of NaturalLanguage Processing and
Machine Translation. Springer, 2011.

• Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech
Tagging with a Cyclic Dependency Network. In Proceedings of the Annual Conference of the North
American Chapter of the Association for Computational Linguistics on HumanLanguage Technology,
NAACL’03, pages 173–180, 2003.

• Nianwen Xue and Martha Palmer. Calibrating Features for Semantic Role Labeling. In Proceedings of
the Conference on Empirical Methods in Natural Language Processing, 2004.

65

Optimization of NLP Components for Robustness and Scalability

Recomendados

Recomendados

Mais conteúdo relacionado

Semelhante a Optimization of NLP Components for Robustness and Scalability

Semelhante a Optimization of NLP Components for Robustness and Scalability (20)

Mais de Jinho Choi

Mais de Jinho Choi (20)

Último

Último (20)

Optimization of NLP Components for Robustness and Scalability