6. QUOTES
• Probably the last PBMT paper ever
• People working on digital humanities don't really know what digital humanities are…
• Kids learn language having heard a very small amount — to further advance AI we need to focus on low-resourced conditions instead of big data
• Home Made Restaurant Warmly
• to make by hand taste
9. KEYNOTE: MARK SELIGMAN, SPOKEN
TRANSLATION, INC.
PERCEPTUALLY GROUNDED DEEP SEMANTICS
IN FUTURE HYBRID MACHINE TRANSLATION
Nine Issues in Speech Translation
– Discourse
– Speech acts
– Topic tracking
– Domain
– Prosody
– Pauses
– Pitch, stress
– Translation mismatches
– System architecture, data structures
Improve Statistical MT
• User feedback + machine learning
• More, better data
• Parsing > hybrid MT
10. KEYNOTE: MARK SELIGMAN, SPOKEN
TRANSLATION, INC.
PERCEPTUALLY GROUNDED DEEP SEMANTICS
IN FUTURE HYBRID MACHINE TRANSLATION
[Figure: syntactic vs. semantic structure of the Japanese phrase 車を運転する人 ("a person who drives a car"): the syntactic tree (NP, VP, PP, N, V nodes) maps onto a semantic structure linking drive, person, and car via agt, obj, and mod relations.]
The Return of Semantics:
Interlingua/Ontologies
Grounded Semantics
12. JOAKIM NIVRE
UPPSALA UNIVERSITY, SWEDEN
Universal Dependencies - Dubious Linguistics and Crappy Parsing?
• Maximize parallelism – but don't overdo it
• Don't annotate the same thing in different ways
• Don't make different things look the same
• Don't annotate things that are not there
• Universal taxonomy with language-specific elaboration
• Languages select from a universal pool of categories
• Allow language-specific extensions
13. JOAKIM NIVRE
UPPSALA UNIVERSITY, SWEDEN
Manning's law
1. UD needs to be satisfactory on linguistic analysis grounds for individual languages.
2. UD needs to be good for linguistic typology, i.e., providing a suitable basis for bringing
out cross-linguistic parallelism across languages and language families.
3. UD must be suitable for rapid, consistent annotation by a human annotator.
4. UD must be suitable for computer parsing with high accuracy.
5. UD must be easily comprehended and used by a non-linguist, whether a language
learner or an engineer with prosaic needs for language processing.
6. UD must support downstream language understanding tasks well (relation extraction,
reading comprehension, machine translation, …).
14. JOAKIM NIVRE
UPPSALA UNIVERSITY, SWEDEN
Dubious linguistics?
• Lexical dependencies and functional relations encoded in a single tree
• Grounded in linguistic typology and dependency grammar traditions
Crappy parsing?
• Not so bad with existing parsers, especially for cross-lingual parsing
• Learn richer parsing models grounded in linguistic typology
15. REIKO MAZUKA
RIKEN BRAIN SCIENCE INSTITUTE, JAPAN
• 12-month-old babies are called 'old babies'
• Medical stuff has lots of data, lots of problems
• … let alone …
DINA DEMNER-FUSHMAN
U.S. NATIONAL LIBRARY OF MEDICINE, U.S.A.
SIMONE TEUFEL
UNIVERSITY OF CAMBRIDGE, U.K.
17. CHARNER: CHARACTER-LEVEL
NAMED ENTITY RECOGNITION
Onur Kuru, Ozan Arkan Can, Deniz Yuret
• Stacked bidirectional LSTMs
• Inputs: characters
• Outputs: tag probabilities for each character
• Probabilities are then converted to word-level named entity tags using a Viterbi decoder
• Close to state-of-the-art NER performance in seven languages with the same basic model, using only labeled NER data and no hand-engineered features or external resources such as syntactic taggers or gazetteers
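The character-to-word conversion step can be sketched as follows: run Viterbi over per-character tag log-probabilities, then collapse each word's character tags to a single word tag. This is a minimal illustration, assuming a generic transition matrix and a majority-vote collapse rule; the paper's exact transition constraints and collapse rule may differ.

```python
import numpy as np

def viterbi(log_probs, log_trans):
    """Most likely tag sequence given per-character emission
    log-probabilities (T x K) and a K x K transition matrix."""
    T, K = log_probs.shape
    score = log_probs[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans     # prev tag x next tag
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_probs[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def char_tags_to_word_tags(char_tags, word_lengths):
    """Collapse a character-level tag path to one tag per word by
    majority vote over each word's characters (a simplification)."""
    word_tags, i = [], 0
    for n in word_lengths:
        span = char_tags[i:i + n]
        word_tags.append(max(set(span), key=span.count))
        i += n
    return word_tags
```

With uniform transitions the decoder reduces to a per-character argmax; informative transitions let it repair inconsistent character tags inside a word.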
18. WHAT TOPIC DO YOU WANT TO HEAR ABOUT?
A BILINGUAL TALKING ROBOT
USING ENGLISH AND JAPANESE WIKIPEDIAS
Graham Wilcock, Kristiina Jokinen
21. INTERACTIVE ATTENTION
FOR NEURAL MACHINE TRANSLATION
Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu
• Models the interaction between the decoder and the representation of the source sentence during translation via both read and write operations
• Can keep track of the interaction history and thereby improve translation performance
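The read/write idea can be sketched as one decoding step over a source memory: an attentive read, followed by a gated write that updates the attended cells so later steps see the interaction history. The write parameterisation below (`w_write` and the tanh candidate) is an illustrative assumption, not the paper's exact model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interactive_attention_step(memory, query, w_write):
    """One decoding step over a source memory (T x d):
    attentive READ, then a gated WRITE weighted by the same
    attention, so heavily-read cells are updated most."""
    scores = memory @ query                    # T attention scores
    alpha = softmax(scores)                    # attention weights
    read = alpha @ memory                      # d-dim attentive read
    update = np.tanh(memory @ w_write)         # T x d write candidate
    memory = memory + alpha[:, None] * (update - memory)
    return read, memory, alpha
```

Because the memory is rewritten at every step, the source representation the decoder reads at step t reflects what was attended at steps 1..t-1.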
22. SUB-WORD SIMILARITY BASED SEARCH FOR
EMBEDDINGS: INDUCING RARE-WORD
EMBEDDINGS FOR WORD SIMILARITY TASKS AND
LANGUAGE MODELLING
Mittul Singh, Clayton Greenberg, Youssef Oualil, Dietrich Klakow
• Training good word embeddings requires large amounts of data, yet out-of-vocabulary words will still be encountered
• Existing methods use computationally intensive morphological analysis to generate embeddings
• The proposed system applies a computationally simpler sub-word search over words that already have embeddings
• Up to 50% reduction in rare-word perplexity compared to other, more complex language models
23. MULTI-ENGINE AND MULTI-ALIGNMENT BASED
AUTOMATIC POST-EDITING
AND ITS IMPACT ON TRANSLATION
PRODUCTIVITY
Santanu Pal, Sudip Kumar Naskar, Josef van Genabith
• Parallel system combination in the APE stage of a sequential MT–APE combination
• Substantial translation improvements
• automatic evaluation (+5.9%)
• post-editing productivity (+21.76%)
• System combination at the level of APE alignments yields further improvements
33. PREDICTING HUMAN SIMILARITY JUDGMENTS
WITH DISTRIBUTIONAL MODELS:
THE VALUE OF WORD ASSOCIATIONS
Simon De Deyne, Amy Perfors, Daniel J Navarro
• Internal language models that are more closely aligned with the mental representations of words
• Count-based model for text corpora
• Predicting structure from text corpora using word embeddings
• Count-based model for word associations
• A spreading-activation approach to semantic structure
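The spreading-activation idea can be sketched over a small word-association graph: activation starts at a seed word and propagates along weighted association edges, decaying with each hop. The decay constant and accumulation rule here are illustrative assumptions, not the paper's exact model.

```python
def spreading_activation(graph, seed, steps=2, decay=0.5):
    """Spread activation from a seed word over a weighted
    word-association graph (dict: word -> {neighbour: weight}).
    Activation accumulates over steps, so directly associated
    words end up more active than two-hop neighbours."""
    activation = {seed: 1.0}
    for _ in range(steps):
        new = dict(activation)
        for word, act in activation.items():
            for nb, w in graph.get(word, {}).items():
                new[nb] = new.get(nb, 0.0) + decay * act * w
        activation = new
    return activation
```

Similarity between two words can then be read off as the activation one receives when the other is the seed, which is how association norms can substitute for corpus co-occurrence.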
34. EXTENDING THE USE OF ADAPTOR GRAMMARS
FOR UNSUPERVISED MORPHOLOGICAL
SEGMENTATION OF UNSEEN LANGUAGES
Ramy Eskander, Owen Rambow, Tianchun Yang
• Segmentation of words in a language into a sequence of morphs
• Without rewriting or normalizing morphs
• Without identifying the stem
• Without identifying morphological features