Processing of multi-word expressions (MWEs) is a known problem for any natural language processing task, and even neural machine translation (NMT) struggles to overcome it. This paper presents results of experiments investigating NMT attention allocation to MWEs and improving automated translation of sentences that contain MWEs in English→Latvian and English→Czech NMT systems. Two improvement strategies were explored: (1) bilingual pairs of automatically extracted MWE candidates were added to the parallel corpus used to train the NMT system, and (2) full sentences containing the automatically extracted MWE candidates were added to the parallel corpus. Both approaches increased automated evaluation results. The best result, an increase of 0.99 BLEU points, was reached with the first approach, while the second approach achieved only minimal improvements. We also provide open-source software and tools used for MWE extraction and alignment inspection.
1. Paying Attention to Multi-Word Expressions
in Neural Machine Translation
Matīss Rikters1 and Ondřej Bojar2
1University of Latvia, Faculty of Computing
2Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
The 16th Machine Translation Summit
Nagoya, Japan
September 20, 2017
2. Contents
• Introduction
• Related Work
• Workflow
• Data
• NMT Systems
• Experiments
• Results
• Manual Inspection
• Attention Inspection
• Conclusions
3. Introduction
• Raining cats and dogs En → Lv
• Lietu kaķi un suņi
• Suņu un kaķu
• Raining kaķi un suņi
7. Related Work
• Extracting MWE candidates and integrating them in SMT (Skadiņa, 2016)
• Tagging candidate phrases in the source sentence and forcing the decoder to generate multiple words at once for the target phrase (Tang et al., 2016)
• Inclusion of structural biases from word-based alignment models, such as positional bias, Markov conditioning, fertility and agreement over translation directions, in attentional NMT (Cohn et al., 2016)
• Automatically extracting smaller parts of training segment pairs and adding them to NMT training data (Chen et al., 2016)
• No related work specifically targeting MWEs in NMT
8. More Related Work
• Translating Phrases in Neural Machine Translation (Wang et al., 2017)
• Results of the WMT17 Neural MT Training Task (Bojar et al., 2017)
10. Data
• WMT17 News Translation Task
• Training
• En → Lv
• 4.5M parallel sentences for the baseline
• En → Cs
• 49M parallel sentences for the baseline
• Evaluation
• En → Lv
• 2003 sentences in total
• En → Cs
• 6000 sentences in total
12. Data
[Figure: training data configurations. En → Lv: 1M, 1×MWE 1M, 2×MWE 2M, 2×MWE 0.5M; En → Cs: 2.5M, 1×MWE 2.5M, 2×MWE 5M, 2×MWE 5M]
13. Data
• WMT17 News Translation Task
• Training
• En → Lv
• 4.5M parallel sentences for the baseline
• 4.8M after adding MWEs/MWE sentences
• En → Cs
• 49M parallel sentences for the baseline
• 17M after adding MWEs/MWE sentences
• Evaluation
• En → Lv
• 2003 sentences in total
• 611 sentences with at least one MWE
• En → Cs
• 6000 sentences in total
• 112 sentences with at least one MWE
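A quick calculation from the counts above shows how differently MWE-rich the two test sets are:

```python
# Share of evaluation sentences containing at least one MWE,
# computed from the counts on the Data slide.
def mwe_share(with_mwe: int, total: int) -> float:
    """Return the percentage of sentences with at least one MWE."""
    return 100.0 * with_mwe / total

en_lv = mwe_share(611, 2003)   # En -> Lv test set
en_cs = mwe_share(112, 6000)   # En -> Cs test set
print(f"En->Lv: {en_lv:.1f}%  En->Cs: {en_cs:.1f}%")
```

About 30.5% of the En → Lv test sentences contain an MWE, versus under 2% for En → Cs, which helps explain why MWE-targeted data has more room to influence the Latvian results.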
14. NMT Systems
• Neural Monkey
• Embedding size 350
• Encoder state size 350
• Decoder state size 350
• Max sentence length 50
• BPE merges 30000
https://github.com/ufal/neuralmonkey
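The "BPE merges 30000" setting refers to byte-pair encoding subword segmentation (Sennrich et al., 2016). A minimal, illustrative sketch of the merge-learning loop (not Neural Monkey's actual implementation):

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs over the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

def learn_bpe(word_counts, num_merges):
    """Learn `num_merges` BPE merge operations from a word-frequency dict."""
    # Represent each word as space-separated characters plus an end marker.
    vocab = {' '.join(list(w)) + ' </w>': c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges
```

In practice, 30000 merges are learned on the training corpus and then applied to segment both sides of the parallel data before NMT training.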
15. Experiments
Two forms of presenting the MWEs to the NMT system:
• Adding only the parallel MWEs themselves (MWE phrases), each pair forming a new "sentence pair" in the parallel corpus
• Adding full sentences that contain the identified MWEs (MWE sentences)
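The two strategies can be sketched in a few lines (the function names and the list-of-tuples corpus representation are illustrative, not the authors' actual tooling):

```python
def augment_with_mwe_phrases(corpus, mwe_pairs):
    """Strategy 1: each extracted (source, target) MWE pair becomes
    a new 'sentence pair' appended to the parallel corpus."""
    return corpus + list(mwe_pairs)

def augment_with_mwe_sentences(corpus, mwe_pairs):
    """Strategy 2: additionally append the full sentence pairs whose
    source side contains an extracted MWE candidate."""
    sources = {src for src, _ in mwe_pairs}
    extra = [(s, t) for s, t in corpus
             if any(m in s for m in sources)]
    return corpus + extra
```

Either way, the augmented corpus is then used to train the NMT system exactly as the baseline corpus would be.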
16. Results
Languages       En → Cs         En → Lv
Dataset         Dev     MWE     Dev     MWE
Baseline        13.71   10.25   11.29   9.32
+MWE phrases    -       -       11.94   10.31
+MWE sentences  13.99   10.44   -       -
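The abstract's headline 0.99 BLEU gain corresponds to the En → Lv MWE test set in this table; a quick check of the deltas:

```python
# BLEU deltas over the baseline, taken from the results table above.
baseline = {"cs_dev": 13.71, "cs_mwe": 10.25, "lv_dev": 11.29, "lv_mwe": 9.32}
mwe_phrases_lv = {"lv_dev": 11.94, "lv_mwe": 10.31}
mwe_sentences_cs = {"cs_dev": 13.99, "cs_mwe": 10.44}

delta_lv_mwe = round(mwe_phrases_lv["lv_mwe"] - baseline["lv_mwe"], 2)
delta_cs_mwe = round(mwe_sentences_cs["cs_mwe"] - baseline["cs_mwe"], 2)
print(delta_lv_mwe, delta_cs_mwe)  # prints 0.99 0.19
```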
20. Conclusions
• First experiments with handling multi-word expressions in neural machine translation: two methods for MWE integration in NMT training data
• Open-source scripts for a complete workflow of identifying, extracting and integrating MWEs into the NMT training and translation workflow
• Started work on an open-source tool for visualizing NMT attention alignments (Rikters et al., 2017)
22. References
• Chen, W., Matusov, E., Khadivi, S., and Peter, J.-T. (2016). Guided alignment training for topic-aware neural machine translation. AMTA 2016, page 121.
• Cohn, T., Hoang, C. D. V., Vymolova, E., Yao, K., Dyer, C., and Haffari, G. (2016). Incorporating structural alignment biases into an attentional neural translation model. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 876–885, San Diego, California. Association for Computational Linguistics.
• Majchráková, D., Dušek, O., Hajič, J., Karčová, A., and Garabík, R. (2012). Semi-automatic detection of multiword expressions in the Slovak dependency treebank.
• Pecina, P. (2008). Reference data for Czech collocation extraction. In Proc. of the LREC Workshop Towards a Shared Task for MWEs (MWE 2008), pages 11–14.
• Rikters, M., Fishel, M., and Bojar, O. (2017). Visualizing neural machine translation attention and confidence. The Prague Bulletin of Mathematical Linguistics, volume 109.
• Skadiņa, I. (2016). Multi-word expressions in English-Latvian. In Human Language Technologies – The Baltic Perspective: Proceedings of the Seventh International Conference Baltic HLT 2016, volume 289, page 97. IOS Press.
• Tang, Y., Meng, F., Lu, Z., Li, H., and Yu, P. L. H. (2016). Neural machine translation with external phrase memory. CoRR, abs/1606.01792.
Wang et al. propose a method to translate phrases in NMT by integrating a phrase memory storing target phrases from a phrase-based statistical machine translation (SMT) system into the encoder-decoder architecture of NMT.
Curriculum learning: training first on short target (Czech) sentences only and gradually adding longer sentences to the batches as the training progresses.
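That curriculum idea can be sketched as length-based staged batching (a hypothetical helper, not the actual training-task code):

```python
def curriculum_batches(pairs, length_schedule, batch_size):
    """Yield batches under a length-based curriculum: at stage i, only
    pairs whose target side has at most length_schedule[i] tokens are
    eligible, so short sentences are seen first and longer ones are
    gradually added as training progresses."""
    for max_len in length_schedule:
        eligible = [p for p in pairs if len(p[1].split()) <= max_len]
        for i in range(0, len(eligible), batch_size):
            yield eligible[i:i + batch_size]
```

A real schedule would be tied to training steps or epochs rather than a fixed list, but the filtering logic is the same.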