
Paying attention to MWEs in NMT

Matīss
Postdoctoral Researcher at The University of Tokyo
September 14, 2017

  1. Paying Attention to Multi-Word Expressions in Neural Machine Translation Matīss Rikters1 and Ondřej Bojar2 1University of Latvia, Faculty of Computing 2Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics The 16th Machine Translation Summit Nagoya, Japan September 20, 2017
  2. Contents • Introduction • Related Work • Workflow • Data • NMT Systems • Experiments • Results • Manual Inspection • Attention Inspection • Conclusions
  3. Introduction • Raining cats and dogs En → Lv • Lietu kaķi un suņi • Suņu un kaķu • Raining kaķi un suņi
  4. Introduction
  5. Introduction • Raining cats and dogs En → Lv • Lietu kaķi un suņi • Suņu un kaķu • Raining kaķi un suņi • Līst kā pa Jāņiem
  6. Introduction
  7. Related Work • Extracting MWE candidates and integrating them in SMT (Skadiņa, 2016) • Tagging candidate phrases in source sentence and forcing the decoder to generate multiple words at once for the target phrase (Tang et al., 2016) • Inclusion of structural biases from word-based alignment models, such as positional bias, Markov conditioning, fertility and agreement over translation directions, in attentional NMT (Cohn et al., 2016) • Automatically extracting smaller parts of training segment pairs and adding them to NMT training data (Chen et al., 2016) • No related work specifically targeting MWEs in NMT
  8. More Related Work • Translating Phrases in Neural Machine Translation (Wang et al., 2017) • Results of the WMT17 Neural MT Training Task (Bojar et al., 2017)
  9. Workflow: Tag corpora with morphological taggers (UDPipe, LV Tagger) → Identify MWE candidates (MWE Toolkit) → Align identified MWE candidates (MPAligner) → Shuffle MWEs into the training corpora and train NMT systems (Neural Monkey) → Identify changes
  10. Data • WMT17 News Translation Task • Training • En → Lv • 4.5M parallel sentences for the baseline • En → Cs • 49M parallel sentences for the baseline • Evaluation • En → Lv • 2003 sentences in total • En → Cs • 6000 sentences in total
  11. Identifying Multi-word Expressions En → Lv • 210 patterns (Skadiņa, 2016) • 60 000 multi-word expressions En → Cs • 23 patterns (Majchrakova et al., 2012; Pecina 2008) • 400 000 multi-word expressions
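The pattern-based identification above can be sketched as POS-sequence matching over tagged tokens. This is a minimal sketch: the patterns and tagset below are illustrative stand-ins, not the actual 210 En→Lv or 23 En→Cs patterns used in the paper.

```python
# Sketch of pattern-based MWE candidate identification over a POS-tagged
# sentence. Patterns and tags are illustrative, not the paper's inventory.

# Each token is a (word, POS) pair; a pattern is a sequence of POS tags.
PATTERNS = [
    ("ADJ", "NOUN"),          # e.g. "public transport"
    ("NOUN", "NOUN"),         # e.g. "climate change"
    ("VERB", "ADP", "NOUN"),  # e.g. "take into account"
]

def find_mwe_candidates(tagged_sentence):
    """Return all token spans whose POS sequence matches a pattern."""
    candidates = []
    tags = [pos for _, pos in tagged_sentence]
    for pattern in PATTERNS:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                words = [w for w, _ in tagged_sentence[i:i + n]]
                candidates.append(" ".join(words))
    return candidates

sentence = [("climate", "NOUN"), ("change", "NOUN"),
            ("takes", "VERB"), ("into", "ADP"), ("account", "NOUN")]
print(find_mwe_candidates(sentence))  # ['climate change', 'takes into account']
```

In the actual workflow this step is delegated to the MWE Toolkit, run over the output of the morphological taggers.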
  12. Data En → Lv En → Cs 1M 1xMWE 1M 2xMWE 2M 2xMWE 0.5M 2.5M 1xMWE 2.5M 2xMWE 5M 2xMWE 5M
  13. Data • WMT17 News Translation Task • Training • En → Lv • 4.5M parallel sentences for the baseline • 4.8M after adding MWE phrases/MWE sentences • En → Cs • 49M parallel sentences for the baseline • 17M after adding MWE phrases/MWE sentences • Evaluation • En → Lv • 2003 sentences in total • 611 sentences with at least one MWE • En → Cs • 6000 sentences in total • 112 sentences with at least one MWE
  14. NMT Systems • Neural Monkey • Embedding size 350 • Encoder state size 350 • Decoder state size 350 • Max sentence length 50 • BPE merges 30000 https://github.com/ufal/neuralmonkey
  15. Experiments Two forms of presenting the MWEs to the NMT system: • Adding only the parallel MWEs themselves (MWE phrases), each pair forming a new “sentence pair” in the parallel corpus • Adding full sentences that contain the identified MWEs (MWE sentences)
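The two setups above amount to simple corpus operations. A minimal sketch, with hypothetical function and variable names, that also includes the shuffling step from the workflow slide:

```python
import random

def augment_corpus(corpus, mwe_pairs, mwe_sentence_pairs, mode, seed=1):
    """Append extra MWE material to a parallel corpus and shuffle.

    corpus, mwe_pairs, mwe_sentence_pairs: lists of (source, target) pairs.
    mode: "phrases"  -> add aligned MWE pairs as new "sentence pairs";
          "sentences" -> add full sentence pairs containing an identified MWE.
    """
    extra = mwe_pairs if mode == "phrases" else mwe_sentence_pairs
    augmented = corpus + extra
    # Shuffle so the added MWE material is mixed into the training data.
    random.Random(seed).shuffle(augmented)
    return augmented

corpus = [("a cat", "kaķis"), ("it rains", "līst")]
mwe_pairs = [("raining cats and dogs", "līst kā pa Jāņiem")]
out = augment_corpus(corpus, mwe_pairs, [], mode="phrases")
print(len(out))  # 3 pairs after augmentation
```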
  16. Results (BLEU)
                      En → Cs         En → Lv
                      Dev    MWE      Dev    MWE
      Baseline        13.71  10.25    11.29   9.32
      +MWE phrases      -      -      11.94  10.31
      +MWE sentences  13.99  10.44      -      -
  17. Manual Inspection
  18. Alignment Inspection
  19. Alignment Inspection
  20. Conclusions • First experiments with handling multi-word expressions in neural machine translation – two methods for MWE integration in NMT training data • Open-source scripts for a complete workflow of identifying, extracting and integrating MWEs into the NMT training and translation workflow • Started work on an open-source tool for visualizing NMT attention alignments (Rikters et al., 2017)
  21. Advertisements
  22. References • Chen, W., Matusov, E., Khadivi, S., and Peter, J.-T. (2016). Guided alignment training for topic-aware neural machine translation. AMTA 2016, page 121. • Cohn, T., Hoang, C. D. V., Vymolova, E., Yao, K., Dyer, C., and Haffari, G. (2016). Incorporating structural alignment biases into an attentional neural translation model. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 876–885, San Diego, California. Association for Computational Linguistics. • Majchrakova, D., Dusek, O., Hajic, J., Karcova, A., and Garabik, R. (2012). Semi-automatic detection of multiword expressions in the Slovak dependency treebank. • Pecina, P. (2008). Reference data for Czech collocation extraction. In Proc. of the LREC Workshop Towards a Shared Task for MWEs (MWE 2008), pages 11–14. • Rikters, M., Fishel, M., and Bojar, O. (2017). Visualizing Neural Machine Translation Attention and Confidence. The Prague Bulletin of Mathematical Linguistics, volume 109. • Skadiņa, I. (2016). Multi-word expressions in English - Latvian. In Human Language Technologies – The Baltic Perspective: Proceedings of the Seventh International Conference Baltic HLT 2016, volume 289, page 97. IOS Press. • Tang, Y., Meng, F., Lu, Z., Li, H., and Yu, P. L. H. (2016). Neural machine translation with external phrase memory. CoRR, abs/1606.01792.
  23. Code & Presentation

Editor's Notes

  1. Wang et al. propose a method to translate phrases in NMT by integrating a phrase memory, which stores target phrases from a phrase-based statistical machine translation (SMT) system, into the encoder-decoder architecture of NMT. Curriculum learning means training first on short target (Czech) sentences only and gradually adding longer sentences to the batches as training progresses.
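The curriculum idea mentioned in the note can be sketched as length-staged batching. A minimal sketch with illustrative names and sizes, not the actual WMT17 training-task setup:

```python
def curriculum_batches(pairs, length_caps, batch_size=2):
    """Yield (cap, batch) pairs under a growing target-length cap.

    Length-based curriculum learning: at each stage, only pairs whose
    target side is at most `cap` tokens long are batched, so training
    sees short target sentences first and longer ones later.
    """
    for cap in length_caps:
        eligible = [p for p in pairs if len(p[1].split()) <= cap]
        for i in range(0, len(eligible), batch_size):
            yield cap, eligible[i:i + batch_size]

pairs = [("hi", "ahoj"),
         ("good day", "dobrý den"),
         ("it is raining cats and dogs", "líje jako z konve")]
stages = list(curriculum_batches(pairs, length_caps=[2, 5]))
# Stage with cap=2 contains only the two short-target pairs;
# the cap=5 stages add the longer sentence as well.
```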