Optimization of NLP Components for Robustness and Scalability
1. Optimization of NLP Components for Robustness and Scalability
Dissertation Defense
August 15, 2012
Jinho D. Choi
choijd@colorado.edu
Department of Computer Science
University of Colorado Boulder
2. Ever since I came, Boulder has been ...
• #1: Top 10 College Towns (Livability, 2012)
• #1: Top 10 Least Obese Metro Areas (Gallup Healthways, 2012)
• #1: Top 10 Happiest Cities (Gallup Healthways, 2012)
• #1: The 10 Most Educated U.S. Cities (US News, 2011)
• #1: America’s 15 Most Active Cities (Time - Healthland, 2011)
• #1: Best Quality of Life in America (Portfolio, 2011)
• #1: 20 Brainiest Cities in America (Daily Beast, 2010)
• #1: Western Cities Fare Best in Well-being (USA Today, 2010)
• #1: America's Foodiest Town (Bon Appétit, 2010)
• #1: The Best Cities to Raise an Outdoor Kid (Backpacker, 2009)
• #1: America's Top 25 Towns To Live Well (Forbes, 2009)
• #1: America's Smartest Cities (Forbes, 2008)
• #1: Top Heart Friendly Cities (American Heart Association, 2008)
4. Introduction
• The application of NLP has ...
- Expanded to everyday computing.
- Broadened to a general audience.
‣ More attention is drawn to the practical aspects of NLP.
• NLP components should be tested for
- Robustness in handling heterogeneous data.
• Need to be evaluated on data from several different sources.
- Scalability in handling a large amount of data.
• Need to be evaluated for speed and complexity.
5. Introduction
• Research question
- How can we improve the robustness and scalability of standard
NLP components?
• Goals
- To prepare gold-standard data from several different sources
for in-genre and out-of-genre experiments.
- To develop a POS tagger, a dependency parser, and a semantic
role labeler showing robust results across this data.
- To reduce average complexities of these components while
retaining good performance in accuracy.
6. Introduction
• Thesis statement
1. We improve the robustness of three NLP components:
• POS tagger: by building a generalized model.
• Dependency parser: by bootstrapping parse information.
• Semantic role labeler: by applying higher-order argument pruning.
2. We improve the scalability of these three components:
• POS tagger: by adapting dynamic model selection.
• Dependency parser: by optimizing the engineering of transition-
based parsing algorithms.
• Semantic role labeler: by applying conditional higher-order
argument pruning.
7. Introduction
[Flowchart: constituent Treebanks and PropBanks pass through dependency conversion, producing training and evaluation sets of dependency trees with semantic roles; part-of-speech, dependency, and semantic-role trainers build the tagging, parsing, and labeling models used by the corresponding tagger, parser, and labeler.]
9. Dependency Conversion
• Motivation
- A small amount of manually annotated dependency trees
(Rambow et al., 2002; Čmejrek et al., 2004).
- A large amount of manually annotated constituent trees
(Marcus et al., 1993; Weischedel et al., 2011).
- Converting constituent trees into dependency trees
→ A large amount of pseudo-annotated dependency trees.
• Previous approaches
- Penn2Malt (stp.lingfil.uu.se/~nivre/research/Penn2Malt.html).
- LTH converter (Johansson and Nugues, 2007).
- Stanford converter (de Marneffe and Manning, 2008a).
10. Dependency Conversion
• Comparison
- The Stanford and CLEAR dependency approaches generate
3.62% and 0.23% of unclassified dependencies, respectively.
- Our conversion produces 3.69% of non-projective trees.
                    Penn2Malt   LTH     Stanford   CLEAR
Labels              Malt        CoNLL   Stanford   Stanford+
Long-distance DPs               ✓       ✓          ✓
Secondary DPs                   ✓       ✓          ✓
Function tags                   ✓                  ✓
New TB format       NO          NO      NO         YES
Maintenance         NO          NO      YES        YES
11. Dependency Conversion (1/6)
1. Input a constituent tree.
• Penn, OntoNotes, CRAFT, MiPACQ, and SHARP Treebanks.
[Constituent tree for "Peace and joy that we want *T*-1": an NP with an SBAR relative clause whose WHNP-1 "that" is co-indexed with the trace *T*-1, the object of "want".]
12. Dependency Conversion (2/6)
2. Reorder constituents related to empty categories.
• *T*: wh-movement and topicalization.
• *RNR*: right node raising.
• *ICH* and *PPA*: discontinuous constituent.
[Before and after reordering: the empty category *T*-1 is removed and WHNP-1 "that" is moved into its position, so the tree's token order becomes "Peace and joy we want that".]
13. Dependency Conversion (3/6)
3. Handle special cases.
• Apposition, coordination, and small clauses.
[Reordered constituent tree and the resulting coordination dependencies: cc "and" and conj "joy" attach to "Peace".]
The original word order is preserved in the converted dependency tree: "Peace and joy that we want".
14. Dependency Conversion (4/6)
4. Handle general cases.
• Head-finding rules and heuristics.
[Resulting dependency tree for "Peace and joy that we want": root → "Peace"; "Peace" governs "and" (cc), "joy" (conj), and "want" (rcmod); "want" governs "we" (nsubj) and "that" (dobj).]
15. Dependency Conversion (5/6)
5. Add secondary dependencies.
• Gapping, referent, right node raising, open clausal subject.
[Same dependency tree with an added secondary dependency: ref links the relativizer "that" back to its referent "Peace".]
16. Dependency Conversion (6/6)
6. Add function tags.
Table A.1: Function tags for English. Tags followed by * are not typical Penn Treebank tags but are used in some other Treebanks (Nielsen et al., 2010; Weischedel et al., 2011; Verspoor et al., 2012).
Syntactic roles
  ADV  Adverbial                    PUT  Locative complement of "put"
  CLF  It-cleft                     PRD  Non-VP predicate
  CLR  Closely related constituent  RED* Reduced auxiliary
  DTV  Dative                       SBJ  Surface subject
  LGS  Logical subject in passive   TPC  Topicalization
  NOM  Nominalization
Semantic roles
  BNF  Benefactive                  MNR  Manner
  DIR  Direction                    PRP  Purpose or reason
  EXT  Extent                       TMP  Temporal
  LOC  Locative                     VOC  Vocative
Text and speech categories
  ETC  Et cetera                    SEZ  Direct speech
  FRM* Formula                      TTL  Title
  HLN  Headline                     UNF  Unfinished constituent
  IMP  Imperative
18. Experimental Setup
• The Wall Street Journal (WSJ) models
- Train
• The WSJ 2-21 in OntoNotes (Weischedel et al., 2011).
• Total: 30,060 sentences, 731,677 tokens, 77,826 predicates.
- In-genre evaluation (Avgi)
• The WSJ 23 in OntoNotes.
• Total: 1,640 sentences, 39,590 tokens, 4,138 predicates.
- Out-of-genre evaluation (Avgo)
• 5 genres in OntoNotes, 2 genres in MiPACQ (Nielsen et al., 2010),
1 genre in SHARP.
• Total: 19,368 sentences, 265,337 tokens, 32,142 predicates.
19. Experimental Setup
• The OntoNotes models
- Train
• 6 genres in OntoNotes.
• Total: 96,406 sentences, 1,983,012 tokens, 213,695 predicates.
- In-genre evaluation (Avgi)
• 6 genres in OntoNotes.
• Total: 13,337 sentences, 201,893 tokens, 25,498 predicates.
- Out-of-genre evaluation (Avgo)
• Same 2 genres in MiPACQ, same 1 genre in SHARP.
• Total: 7,671 sentences, 103,034 tokens, 10,782 predicates.
20. Experimental Setup
• Accuracy
- Part-of-speech tagging
• Accuracy.
- Dependency parsing
• Labeled attachment score (LAS).
• Unlabeled attachment score (UAS); both scores are sketched after this list.
- Semantic role labeling
• F1-score of argument identification.
• F1-score of both argument identification and classification.
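A minimal sketch of the two attachment scores, assuming each token is represented by its (head, label) pair; this is illustrative, not the evaluation code used in the thesis:

    def attachment_scores(gold, pred):
        # gold, pred: one (head, label) pair per token, same length.
        assert len(gold) == len(pred)
        n = float(len(gold))
        uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # heads only
        las = sum(g == p for g, p in zip(gold, pred)) / n        # head + label
        return las, uas

    # Example: the parser mislabels one of three arcs.
    gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
    las, uas = attachment_scores(gold, pred)  # las = 0.67, uas = 1.0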
21. Experimental Setup
• Speed
- All experiments are run on an Intel Xeon 2.57GHz machine.
- Each model is run 5 times; the reported speed is the average of
the middle 3 runs (a trimmed mean, sketched below).
• Machine learning algorithm
- LibLinear L2-regularized, L1-loss SVM classification
(Hsieh et al., 2008).
- Designed to handle large scale, high dimensional vectors.
- Runs fast with accurate performance.
- Our implementation of LibLinear is publicly available.
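A sketch of the timing protocol above: a trimmed mean over five runs that drops the fastest and slowest. run_model is a hypothetical stand-in for running any of the three components:

    import time

    def average_speed(run_model, repeats=5, trim=1):
        times = []
        for _ in range(repeats):
            start = time.time()
            run_model()
            times.append(time.time() - start)
        middle = sorted(times)[trim:repeats - trim]  # middle 3 of 5 runs
        return sum(middle) / len(middle)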
23. Part-of-Speech Tagging
• Motivation
- Supervised learning approaches do not perform well in
out-of-genre experiments.
- Domain adaptation approaches require knowledge of
incoming data.
- Complicated tagging or learning approaches often run slowly
during decoding.
• Dynamic model selection
- Build two models, generalized and domain-specific, given one
set of training data.
- Dynamically select one of the models during decoding.
24. Part-of-Speech Tagging
• Training
1. Group training data into documents (e.g., sections in WSJ).
2. Get the document frequency of each simplified word form.
• In simplified word forms, all numerical expressions, with or
without special characters, are converted to 0.
3. Build a domain-specific model using features extracted from
only tokens whose DF(SW) > 1.
4. Build a generalized model using features extracted from only
tokens whose DF(SW) > 2.
5. Find the cosine-similarity threshold for dynamic model
selection (steps 2-4 are sketched below).
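A sketch of steps 2-4 above; the exact regular expressions used for simplified word forms in ClearNLP may differ from the assumption here, and wsj_sections is a hypothetical corpus grouped into documents:

    import re
    from collections import defaultdict

    def simplify(word):
        # Convert numerical expressions, with or without special
        # characters, to "0" (e.g., "1.5" and "2,400" both become "0").
        w = re.sub(r"\d+", "0", word)
        return re.sub(r"0([.,/:-]0)+", "0", w)

    def document_frequency(documents):
        # documents: list of documents, each a list of token lists.
        df = defaultdict(int)
        for doc in documents:
            for sw in {simplify(w) for sent in doc for w in sent}:
                df[sw] += 1
        return df

    df = document_frequency(wsj_sections)  # hypothetical: WSJ sections as documents
    domain_vocab = {sw for sw, c in df.items() if c > 1}   # domain-specific model
    general_vocab = {sw for sw, c in df.items() if c > 2}  # generalized model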
25. Part-of-Speech Tagging
• Cosine similarity threshold
- During cross-validation, collect the cosine similarities between
the simplified word forms used to build the domain-specific
model and the input sentences on which the domain-specific
model shows an advantage.
- The cosine similarity bounding the lowest 5% of this distribution
becomes the threshold for dynamic model selection.
[Histogram: occurrences of such sentences by cosine similarity (0-0.06); the threshold is placed at the boundary of the first 5% of the area.]
26. Part-of-Speech Tagging
• Decoding
- Measure the cosine similarity between simplified word forms
used for building the domain-specific model and each input
sentence.
- If the similarity is greater than the threshold, use the domain-
specific model.
- If the similarity is less than or equal to the threshold, use the
generalized model.
→ Runs as fast as a single-model approach (selection is sketched below).
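A sketch of the selection rule, with cosine similarity computed over bags of simplified word forms; the count-based weighting here is an illustrative assumption:

    import math

    def cosine(a, b):
        # a, b: dicts mapping simplified word forms to counts.
        dot = sum(c * b.get(sw, 0) for sw, c in a.items())
        na = math.sqrt(sum(c * c for c in a.values()))
        nb = math.sqrt(sum(c * c for c in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def select_model(sentence_vec, domain_vec, threshold,
                     domain_model, general_model):
        # Above the threshold: domain-specific; otherwise: generalized.
        if cosine(sentence_vec, domain_vec) > threshold:
            return domain_model
        return general_model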
27. Part-of-Speech Tagging
• Experiments
- Baseline: using the original word forms.
- Baseline+: using lowercase simplified word forms.
- Domain: domain-specific model.
- General: generalized model.
- ClearNLP: dynamic model selection.
- Stanford: Toutanova et al., 2003.
- SVMTool: Giménez and Màrquez, 2004.
32. Dependency Parsing
• Goals
1. To improve the average parsing complexity of non-projective
dependency parsing.
2. To reduce the discrepancy between dynamic features extracted
during training on gold-standard trees and those available
when decoding automatic trees.
3. To ensure well-formed dependency graph properties.
• Approach
1. Combine transitions in both projective and non-projective
dependency parsing algorithms.
2. Bootstrap dynamic features during training.
3. Post-process.
33. Dependency Parsing
• Transition decomposition
- Decompose transitions in:
• Nivre's arc-eager algorithm (projective; Nivre, 2003): worst-case
parsing complexity O(n).
• Nivre's list-based algorithm (non-projective; Nivre, 2008): a
transition-based formulation of Covington's algorithm
(Covington, 2001), with worst-case complexity O(n²) without
backtracking.
- This decomposition makes it easier to integrate transitions
from different parsing algorithms.

Operation  Transition   Description
Arc        Left-∗l      ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A ∪ {i ←l j} )
           Right-∗l     ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A ∪ {i →l j} )
           No-∗         ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i], λ2, [j|β], A )
List       ∗-Shift_d|n  ( [λ1|i], λ2, [j|β], A ) ⇒ ( [λ1|i|λ2|j], [ ], β, A )
           ∗-Reduce     ( [λ1|i], λ2, [j|β], A ) ⇒ ( λ1, λ2, [j|β], A )
           ∗-Pass       ( [λ1|i], λ2, [j|β], A ) ⇒ ( λ1, [i|λ2], [j|β], A )
Table 5.1: Decomposed transitions grouped into the Arc and List operations.

Operation  Transition  Precondition
Arc        Left-∗l     [i ≠ 0] ∧ ¬[∃k. (i ← k) ∈ A] ∧ ¬[(i →∗ j) ∈ A]
           Right-∗l    ¬[∃k. (k → j) ∈ A] ∧ ¬[(i ∗← j) ∈ A]
           No-∗        ¬[∃l. Left-∗l ∨ Right-∗l]
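A sketch of the decomposed transitions in Table 5.1 as pure functions on the parser state (λ1, λ2, β, A); token ids are integers and arcs are (head, label, dependent) triples. This follows the table, not ClearNLP's actual implementation:

    def left_arc(state, label):      # Left-*: add i <-(l)- j
        l1, l2, beta, arcs = state
        return l1, l2, beta, arcs | {(beta[0], label, l1[-1])}

    def right_arc(state, label):     # Right-*: add i -(l)-> j
        l1, l2, beta, arcs = state
        return l1, l2, beta, arcs | {(l1[-1], label, beta[0])}

    def shift(state):                # *-Shift: lambda1 absorbs lambda2 and j
        l1, l2, beta, arcs = state
        return l1 + l2 + [beta[0]], [], beta[1:], arcs

    def reduce_(state):              # *-Reduce: pop i from lambda1
        l1, l2, beta, arcs = state
        return l1[:-1], l2, beta, arcs

    def pass_(state):                # *-Pass: move i from lambda1 to lambda2
        l1, l2, beta, arcs = state
        return l1[:-1], [l1[-1]] + l2, beta, arcs

    # A recomposed transition (next slide): Arc first, then List.
    def left_reduce(state, label):
        return reduce_(left_arc(state, label))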
34. Dependency Parsing
• Transition recomposition
- Any combination of two decomposed transitions in Table 5.1,
one from each operation, can be recomposed into a new transition.
• For instance, combining Left-∗l and ∗-Reduce makes a new
transition, Left-Reduce_l, which performs Left-∗l and ∗-Reduce
sequentially.
- For each recomposed transition, the Arc operation is always
performed before the List operation.
- The decomposed transitions can also be recomposed into the
transitions used in several different dependency parsing algorithms:
[Table 5.3: which of the recomposed transitions (Left-Reduce_l, Left-Pass_l, Right-Shift_n^l, Right-Pass_l, No-Shift_d, No-Shift_n, No-Reduce, No-Pass) are used by the projective algorithm of Nivre (2003), the non-projective algorithms of Covington (2001), Nivre (2008), and Choi and Palmer (2011a), and this work (last column).]
35. Dependency Parsing
• Average parsing complexity
- The number of transitions performed per sentence.
[Two plots of the number of transitions performed vs. sentence length (10-80): Covington'01 grows fastest (up to ~2,850 transitions per sentence), followed by Nivre'08 and CP'11; this work performs the fewest transitions, growing roughly linearly.]
36. Dependency Parsing
• Bootstrapping
- Transition-based dependency parsing can take advantage of
dynamic features (e.g., head, leftmost/rightmost dependent).
[Diagram: examples of dynamic features around tokens wi and wj, such as the head of wj and the leftmost/rightmost dependents of wi and wj.]
- Features extracted from gold-standard trees during training
can differ from features extracted from automatic trees
during decoding.
- Bootstrapping these dynamic features (sketched below) significantly
improves parsing accuracy.
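A sketch of the bootstrapping procedure shown in the flowchart on the next slide; extract_features, train, and parse are hypothetical stand-ins for the real components:

    def bootstrap(sentences, gold_trees, rounds):
        # Round 0: dynamic features are extracted from gold-standard trees.
        feats = [extract_features(s, t) for s, t in zip(sentences, gold_trees)]
        model = train(feats, gold_trees)
        # Later rounds: dynamic features come from the parser's own output,
        # while the training labels remain gold-standard.
        for _ in range(rounds):  # number of rounds set by cross-validation
            auto = [parse(model, s) for s in sentences]
            feats = [extract_features(s, t) for s, t in zip(sentences, auto)]
            model = train(feats, gold_trees)
        return model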
37. Dependency Parsing
[Flowchart: gold-standard features and labels from the training data feed the machine learning algorithm, producing a statistical model; the dependency parser then generates automatic features that are fed back for retraining; the number of iterations is determined by cross-validation.]
38. Dependency Parsing
• Post-processing
- Transition-based dependency parsing does not guarantee
parse output to be a tree.
- After parsing, we find the head of each headless token by
comparing it to all other tokens using the same model.
- A predicted head with the highest score that does not break
tree properties becomes the head of this token.
- This post-processing technique (sketched below) significantly
improves parsing accuracy in out-of-genre experiments.
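A sketch of the post-processing step, assuming score(model, head, dep) returns the model's attachment score and creates_cycle checks whether the attachment would violate tree properties (both hypothetical helpers):

    def attach_headless(tokens, heads, model):
        # tokens: token ids; heads: dict mapping each token to its head
        # (None if still headless after parsing).
        for dep in tokens:
            if heads.get(dep) is not None:
                continue
            candidates = [h for h in tokens
                          if h != dep and not creates_cycle(heads, h, dep)]
            if candidates:
                # Highest-scoring head that preserves tree properties.
                heads[dep] = max(candidates, key=lambda h: score(model, h, dep))
        return heads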
39. Dependency Parsing
• Experiments
- Baseline: using all recomposed transitions.
- Baseline+: Baseline with post-processing.
- ClearNLP: Baseline+ with bootstrapping.
- CN’09: Choi and Nicolov, 2009.
- CP’11: Choi and Palmer, 2011a.
- MaltParser: Nivre, 2009.
- MSTParser: McDonald et al., 2005.
• All models use only 1st-order features; with 2nd-order features,
accuracy would likely be higher and speed slower.
45. Semantic Role Labeling
• Motivation
- Not all tokens need to be visited for semantic role labeling.
- A typical pruning algorithm does not work as well when
automatically generated trees are provided.
- An enhanced pruning algorithm could improve argument
coverage while maintaining low average labeling complexity.
• Approach
- Higher-order argument pruning.
- Conditional higher-order argument pruning.
- Positional feature separation.
46. Semantic Role Labeling
• Semantic roles in dependency trees
[Example dependency tree annotated with the semantic roles ARG0, ARG1, ARG2, and ARGM-TMP.]
47. Semantic Role Labeling
• First-order argument pruning (1st)
- Originally designed for constituent trees.
• Considers only siblings of the predicate, the predicate's ancestors,
and siblings of the predicate's ancestors as argument candidates
(Xue and Palmer, 2004).
- Redesigned for dependency trees.
• Considers only dependents of the predicate, the predicate's
ancestors, and dependents of the predicate's ancestors as argument
candidates (Johansson and Nugues, 2008; sketched below).
- Covers over 99% of all arguments using gold-standard trees.
- Covers only 93% of all arguments using automatic trees.
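A sketch of the first-order candidate set for dependency trees, following the description above; head maps a token to its head and deps maps a token to its dependents:

    def first_order_candidates(pred, head, deps):
        cands = set(deps.get(pred, ()))       # dependents of the predicate
        node = head.get(pred)
        while node is not None:               # walk up to the root
            cands.add(node)                   # the ancestor itself
            cands.update(deps.get(node, ()))  # and its dependents
            node = head.get(node)
        cands.discard(pred)
        return cands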
48. Semantic Role Labeling
• Higher-order argument pruning (High)
- Considers all descendants of the predicate, the predicate's
ancestors, and dependents of the predicate's ancestors as
argument candidates.
- Significantly improves argument coverage when automatically
generated trees are used.
[Bar chart, argument coverage (%): WSJ-1st 91.02, ON-1st 92.94, WSJ-High 97.59, ON-High 98.24, Gold-1st 99.44, Gold-High 99.92.]
49. Semantic Role Labeling
• Conditional higher-order argument pruning (High+)
- Reduces argument candidates using path-rules.
- Before training,
• Collect paths between predicates and their descendants whose
subtrees contain arguments of the predicates.
• Collect paths between predicates and their ancestors whose
direct dependents or ancestors are arguments of the predicates.
• Cut off paths whose counts are below thresholds.
- During training and decoding, skip tokens, together with their
subtrees or ancestors, whose paths to the predicates are unseen
(sketched below).
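A sketch of the path-rule collection, assuming path(tree, frm, to) yields the sequence of dependency labels between two tokens and contains_argument tests whether a token's subtree holds an argument (both hypothetical helpers); the descendant and ancestor cases are merged here for brevity:

    from collections import Counter

    def collect_path_rules(instances, threshold):
        # instances: (tree, predicate, argument_set) triples; tree is an
        # iterable of token ids.
        counts = Counter()
        for tree, pred, args in instances:
            for tok in tree:
                if contains_argument(tree, tok, args):
                    counts[path(tree, pred, tok)] += 1
        # Cut off paths whose counts are below the threshold.
        return {p for p, c in counts.items() if c >= threshold}

    def keep_candidate(tree, pred, tok, rules):
        # Skip tok (and its subtree or ancestors) if its path is unseen.
        return path(tree, pred, tok) in rules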
50. Semantic Role Labeling
• Average labeling complexity
- The number of tokens visited per predicate.
[Plot, WSJ models (the OntoNotes graph is similar): number of candidates visited per predicate vs. sentence length (10-80); All visits the most tokens, followed by High, then High+, then 1st.]
51. Semantic Role Labeling
• Positional feature separation
- Group features by arguments’ positions with respect to their
predicates.
- Two sets of features are extracted.
• All features derived from arguments on the lefthand side of the
predicates are grouped in one set, SL.
• All features derived from arguments on the righthand side of the
predicates are grouped in another set, SR.
- During training, build two models, ML and MR, from SL and SR.
- During decoding, use ML and MR for argument candidates on the
lefthand and righthand sides of the predicates, respectively
(sketched below).
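A sketch of positional feature separation; train and classify are hypothetical stand-ins for the LibLinear wrappers used in ClearNLP:

    def train_positional(instances):
        # instances: (features, label, candidate_id, predicate_id) tuples.
        left = [(f, y) for f, y, cand, pred in instances if cand < pred]
        right = [(f, y) for f, y, cand, pred in instances if cand > pred]
        return train(left), train(right)  # ML, MR

    def label_argument(ml, mr, features, cand, pred):
        # Candidates left of the predicate use ML; right of it, MR.
        model = ml if cand < pred else mr
        return classify(model, features)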
52. Semantic Role Labeling
• Experiments
- Baseline: 1st order argument pruning.
- Baseline+: Baseline with positional feature separation.
- High: higher-order argument pruning.
- All: no argument pruning.
- ClearNLP: conditional higher-order argument pruning.
• Previously called High+.
- ClearParser: Choi and Palmer, 2011b.
53. Semantic Role Labeling
• Accuracy - WSJ models (Avgi and Avgo)
[Bar chart, in-domain F1 (Avgi) for Baseline, Baseline+, High, All, ClearNLP, and ClearParser; scores range from 81.88 to 82.52.]
[Bar chart, out-of-domain F1 (Avgo) for the same systems; scores range from 71.07 to 71.95.]
54. Semantic Role Labeling
• Accuracy - OntoNotes models (Avgi and Avgo)
[Bar chart, in-domain F1 (Avgi) for Baseline, Baseline+, High, All, ClearNLP, and ClearParser; scores range from 80.73 to 81.69.]
[Bar chart, out-of-domain F1 (Avgo) for the same systems; scores range from 70.01 to 70.81.]
55. Semantic Role Labeling
• Speed comparison - WSJ models
- Milliseconds for finding all arguments of each predicate.
[Plot, WSJ models: milliseconds per predicate vs. sentence length (10-80) for ClearNLP, ClearNLP+, Baseline+, High, All, and ClearParser.]
56. Semantic Role Labeling
• Speed comparison - OntoNotes models
[Plot, OntoNotes models: milliseconds per predicate vs. sentence length (10-80) for the same systems.]
58. Conclusion
• Our dependency conversion gives rich dependency
representations and can be applied to most English Treebanks.
• The dynamic model selection runs fast and shows robust POS
tagging accuracy across different genres.
• Our parsing algorithm shows linear-time average parsing
complexity when generating both projective and non-projective trees.
• The bootstrapping technique gives significant improvement on
parsing accuracy.
• The higher-order argument pruning gives significant
improvement on argument coverage.
• The conditional higher-order argument pruning reduces average
labeling complexity without compromising the F1-score.
59. Conclusion
• Contributions
- This is the first time these three components have been evaluated
together on such a wide variety of English data.
- High accuracy is maintained while improving the efficiency,
modularity, and portability of these components.
- Dynamic model selection and bootstrapping are generally
applicable to tagging and parsing, respectively.
- Processing all three components takes about 2.49 - 2.69 ms
(tagging: 0.36 - 0.37, parsing: 1.16 - 1.28, labeling: 0.97 - 1.04).
- All components are publicly available as an open source
project, called ClearNLP (clearnlp.googlecode.com).
60. Conclusion
• Future work
- Integrate the dynamic model selection approach with more
sophisticated tagging algorithms.
- Evaluate our parsing approach on languages containing more
non-projective dependency trees.
- Improve semantic role labeling when the quality of input
parse trees is poor (e.g., using joint inference).
61. Acknowledgment
• We gratefully acknowledge the support of the following grants. Any
opinions expressed in this material are those of the authors and do
not necessarily reflect the views of the granting agencies.
- The National Science Foundation grants IIS-0325646 (Domain Independent
Semantic Parsing), CISE-CRI-0551615 (Towards a Comprehensive Linguistic
Annotation), CISE-CRI-0709167 (Collaborative: A Multi-Representational and
Multi-Layered Treebank for Hindi/Urdu), and CISE-IIS-RI-0910992 (Richer
Representations for Machine Translation).
- A grant from the Defense Advanced Research Projects Agency
(DARPA/IPTO) under the GALE program, DARPA/CMO Contract No.
HR0011-06-C-0022, subcontracted from BBN, Inc.
- A subcontract from the Mayo Clinic and Harvard Children’s Hospital based
on a grant from the ONC, 90TR0002/01.
- Strategic Health Advanced Research Project Area 4: Natural Language
Processing.
62. Acknowledgment
• Special thanks are due to
- Martha Palmer for practically being my mom for 5 years.
- James Martin for always encouraging me when I’m low.
- Wayne Ward for wonderful smiles.
- Bhuvana Narasimhan for bringing Hindi to my life.
- Joakim Nivre for suffering under millions of my questions.
- Nicolas Nicolov for making me feel normal when others call
me “workaholic”.
- All CINC folks for letting me live (literally) at my cube.
63. References
• Jinho D. Choi and Nicolas Nicolov. K-best, Locally Pruned, Transition-based Dependency Parsing Using
Robust Risk Minimization. In Recent Advances in Natural Language Processing V, pages 205–216. John
Benjamins, 2009.
• Jinho D. Choi and Martha Palmer. Getting the Most out of Transition-based Dependency Parsing. In
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies, ACL:HLT'11, pages 687–692, 2011a.
• Jinho D. Choi and Martha Palmer. Transition-based Semantic Role Labeling Using Predicate Argument
Clustering. In Proceedings of ACL workshop on Relational Models of Semantics, RELMS’11, pages 37–
45, 2011b.
• M. Čmejrek, J. Cuřín, and J. Havelka. Prague Czech-English Dependency Treebank: Any Hopes for a
Common Annotation Scheme? In HLT-NAACL'04 Workshop on Frontiers in Corpus Annotation, pages
47–54, 2004.
• Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector
Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation,
LREC’04, 2004.
• Richard Johansson and Pierre Nugues. Dependency-based Semantic Role Labeling of PropBank. In
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
(EMNLP’08), pages 69–78, 2008.
64. References
• Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan. A Dual Coordinate
Descent Method for Large-scale Linear SVM. In Proceedings of the 25th international conference on
Machine learning, ICML’08, pages 408–415, 2008.
• Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus
of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
• Marie-Catherine de Marneffe and Christopher D. Manning. The Stanford typed dependencies
representation. In Proceedings of the COLING Workshop on Cross-Framework and Cross-Domain
Parser Evaluation, 2008a.
• Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing
using Spanning Tree Algorithms. In Proceedings of the Conference on Human Language Technology and
Empirical Methods in Natural Language Processing (HLT-EMNLP'05), pages 523–530, 2005.
• Rodney D. Nielsen, James Masanz, Philip Ogren, Wayne Ward, James H. Martin, Guergana Savova, and
Martha Palmer. An architecture for complex clinical question answering. In Proceedings of the 1st ACM
International Health Informatics Symposium, IHI’10, pages 395–399, 2010.
• Joakim Nivre. An Efficient Algorithm for Projective Dependency Parsing. In Proceedings of the 8th
International Workshop on Parsing Technologies, IWPT’03, pages 149–160, 2003.
• Joakim Nivre. Algorithms for deterministic incremental dependency parsing. Computational
Linguistics, 34(4):513–553, 2008.
65. References
• Joakim Nivre. Non-Projective Dependency Parsing in Expected Linear Time. In Proceedings of the Joint
Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on
Natural Language Processing of the AFNLP (ACL-IJCNLP'09), pages 351–359, 2009.
• Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A Dependency
Treebank for English. In Proceedings of the 3rd International Conference on Language Resources and
Evaluation (LREC'02), 2002.
• Ralph Weischedel, Eduard Hovy, Martha Palmer, Mitch Marcus, Robert Belvin, Sameer Pradhan, Lance
Ramshaw, and Nianwen Xue. OntoNotes: A Large Training Corpus for Enhanced Processing. In Joseph
Olive, Caitlin Christianson, and John McCary, editors, Handbook of Natural Language Processing and
Machine Translation. Springer, 2011.
• Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech
Tagging with a Cyclic Dependency Network. In Proceedings of the Annual Conference of the North
American Chapter of the Association for Computational Linguistics on Human Language Technology,
NAACL'03, pages 173–180, 2003.
• Nianwen Xue and Martha Palmer. Calibrating Features for Semantic Role Labeling. In Proceedings of
the Conference on Empirical Methods in Natural Language Processing, 2004.