Neural Network Language Models for Candidate Scoring in Multi-System Machine Translation

Postdoctoral Researcher at The University of Tokyo
Dec 10, 2016


  1. Neural Network Language Models for Candidate Scoring in Multi-System Machine Translation. Matīss Rikters, University of Latvia. COLING 2016, 6th Workshop on Hybrid Approaches to Translation. Osaka, Japan, December 11, 2016
  2. Contents 1. Introduction 2. Baseline System 3. Example Sentence 4. Neural Network Language Models 5. Results 6. Related publications 7. Future plans
  3. Baseline System
     Chunking
     – Parse sentences with Berkeley Parser (Petrov et al., 2006)
     – Traverse the syntax tree bottom up, from right to left
     – Add a word to the current chunk if
       • The current chunk is not too long (sentence word count / 4)
       • The word is non-alphabetic or only one symbol long
       • The word begins with a genitive phrase («of »)
     – Otherwise, initialize a new chunk with the word
     – If chunking results in too many chunks, repeat the process, allowing more than (sentence word count / 4) words in a chunk
     Translation with online MT systems
     – Google Translate; Bing Translator; Yandex.Translate
     12-gram language model
     – DGT-Translation Memory corpus (Steinberger, 2011)
     – 3.1 million Latvian legal domain sentences
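The chunk-growing rule on this slide can be sketched in a few lines. This is a simplified illustration, not the author's implementation: it walks a flat token list from right to left instead of a Berkeley Parser syntax tree, it treats the three listed conditions as alternatives, it approximates the genitive test by matching the token "of", and the `max_chunks` threshold for "too many chunks" is a hypothetical parameter that the slide does not specify.

```python
def _chunk_once(tokens, max_len):
    """One right-to-left pass that groups tokens into chunks."""
    chunks, current = [], []
    for word in reversed(tokens):
        short_or_symbol = (not word.isalpha()) or len(word) == 1
        genitive = word.lower() == "of"  # assumption: stands in for the genitive phrase test
        if not current or len(current) < max_len or short_or_symbol or genitive:
            current.insert(0, word)      # grow the current chunk to the left
        else:
            chunks.insert(0, current)    # close the chunk and start a new one
            current = [word]
    if current:
        chunks.insert(0, current)
    return chunks


def chunk_tokens(tokens, divisor=4, max_chunks=None):
    """Chunk a sentence; relax the length limit while there are too many chunks."""
    max_len = max(1, len(tokens) // divisor)
    while True:
        chunks = _chunk_once(tokens, max_len)
        done = max_chunks is None or len(chunks) <= max_chunks
        if done or max_len >= len(tokens):
            return chunks
        max_len += 1  # allow more words per chunk and repeat


sentence = ("Recently there has been an increased interest in the automated "
            "discovery of equivalent expressions in different languages .")
tokens = sentence.split()
chunks = chunk_tokens(tokens)
```

Passing `max_chunks=4` for the same sentence triggers one repetition with a longer chunk limit.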
  4. Baseline System (workflow diagram): Sentence tokenization → Syntactic analysis → Sentence chunking → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best chunks → Sentence recomposition → Output
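The selection and recomposition steps of this workflow could look roughly like the sketch below. It assumes each chunk already has one translation per MT system and that some language model exposes a perplexity function; the name `select_and_recompose`, the dictionary layout, and the placeholder strings are illustrative assumptions, not part of the presented system.

```python
from typing import Callable, Dict, List


def select_and_recompose(
    chunk_candidates: List[Dict[str, str]],
    perplexity: Callable[[str], float],
) -> str:
    """Pick the lowest-perplexity translation of every chunk and join them."""
    best = []
    for candidates in chunk_candidates:
        # candidates maps an MT system name to its translation of this chunk
        system = min(candidates, key=lambda name: perplexity(candidates[name]))
        best.append(candidates[system])
    return " ".join(best)


# Toy usage: string length stands in for a real language model score.
toy_candidates = [
    {"google": "x x x", "bing": "x"},
    {"google": "y", "bing": "y y"},
]
result = select_and_recompose(toy_candidates, perplexity=len)
```

Keeping the scorer as a parameter means the same selection loop works with KenLM or with any of the neural language models compared later.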
  5. Sentence Chunking
  6. Choose the best candidate
     KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:
     $$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
     where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model.
     Perplexity is then calculated using this probability:
     $$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
     where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
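A minimal sketch of the backoff lookup and the perplexity computation, assuming a toy model stored in two dictionaries of log10 values (the layout used by ARPA files). The function names and the toy numbers are illustrative assumptions; KenLM itself uses far more efficient data structures.

```python
import math


def backoff_logprob(word, history, logprob, backoff):
    """log10 p(word | history) using the longest matching history."""
    hist = tuple(history)
    penalty = 0.0
    while True:
        ngram = hist + (word,)
        if ngram in logprob:
            return logprob[ngram] + penalty
        if not hist:
            return float("-inf")  # toy model has no unknown-word entry
        penalty += backoff.get(hist, 0.0)  # log10 b(history) for the failed history
        hist = hist[1:]                    # drop the oldest word and retry


def perplexity(logprobs):
    """Perplexity from a list of per-word log10 probabilities."""
    return 10 ** (-sum(logprobs) / len(logprobs))


# Toy bigram model (all values are log10 probabilities).
toy_logprob = {
    ("a",): math.log10(0.5),
    ("b",): math.log10(0.25),
    ("a", "b"): math.log10(0.5),
}
toy_backoff = {("a",): math.log10(0.5), ("b",): 0.0}

seen = backoff_logprob("b", ["a"], toy_logprob, toy_backoff)     # bigram found
backed = backoff_logprob("a", ["a"], toy_logprob, toy_backoff)   # backs off to unigram
```

In the second call the bigram "a a" is missing, so the unigram probability of "a" is multiplied by the backoff penalty of the history "a", which matches the product in the formula.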
  7-22. Example sentence: Recently there has been an increased interest in the automated discovery of equivalent expressions in different languages.
  23. Neural Language Models
      • RWTHLM
        • CPU only
        • Feed-forward, recurrent (RNN) and long short-term memory (LSTM) NNs
      • MemN2N
        • CPU or GPU
        • End-to-end memory network (RNN with attention)
      • Char-RNN
        • CPU or GPU
        • RNNs, LSTMs and gated recurrent units (GRU)
        • Character level
  24. Best Models
      • RWTHLM
        • one feed-forward input layer with a 3-word history, followed by one linear layer of 200 neurons with sigmoid activation function
      • MemN2N
        • internal state dimension of 150, linear part of the state 75, number of hops set to six
      • Char-RNN
        • 2 LSTM layers with 1,024 neurons each, dropout set to 0.5
  25. Char-RNN
      • A character level model works better for highly inflected languages with less data
      • Requires Torch scientific computing framework + additional packages
      • Can run on CPU, NVIDIA GPU or AMD GPU
      • Intended for generating new text, modified to score new text
      More in Andrej Karpathy’s blog
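Turning a generative character-level model into a scorer amounts to summing the log probabilities it assigns to each next character of a given text. The sketch below illustrates that idea only: a smoothed character bigram model stands in for the LSTM, and the class name, training string, and test strings are illustrative assumptions rather than anything from Char-RNN.

```python
import math
from collections import Counter


class CharBigramLM:
    """Add-one smoothed character bigram model (stand-in for a char-level RNN)."""

    def __init__(self, text):
        self.vocab = set(text)
        self.pairs = Counter(zip(text, text[1:]))  # counts of (previous, next)
        self.firsts = Counter(text[:-1])           # counts of previous characters

    def logprob(self, prev, ch):
        size = len(self.vocab) + 1  # one extra class for unseen characters
        return math.log((self.pairs[(prev, ch)] + 1) / (self.firsts[prev] + size))

    def perplexity(self, text):
        """Score existing text instead of sampling new text."""
        if len(text) < 2:
            return float("inf")
        logprobs = [self.logprob(a, b) for a, b in zip(text, text[1:])]
        return math.exp(-sum(logprobs) / len(logprobs))


lm = CharBigramLM("the cat sat on the mat. the cat sat on the hat. ")
fluent = lm.perplexity("the cat sat")
scrambled = lm.perplexity("tca eht tsa")
```

Text that resembles the training data gets a lower perplexity than the scrambled string, which is the property candidate selection relies on.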
  26. Experiment Environment
      Training
      • Baseline KenLM and RWTHLM models
        • 8-core CPU with 16GB of RAM
      • MemN2N
        • GeForce Titan X (12GB, 3,072 CUDA cores), 12-core CPU and 64GB RAM
      • Char-RNN
        • Radeon HD 7950 (3GB, 1,792 cores), 8-core CPU and 16GB RAM
      Translation
      • All models
        • 4-core CPU with 16GB of RAM
  27. Results
      System     Perplexity   Training Corpus Size   Trained On   Training Time   BLEU
      KenLM      34.67        3.1M                   CPU          1 hour          19.23
      RWTHLM     136.47       3.1M                   CPU          7 days          18.78
      MemN2N     25.77        3.1M                   GPU          4 days          18.81
      Char-RNN   24.46        1.5M                   GPU          2 days          19.53
  28. General domain [Figure: perplexity and BLEU (BLEU-HY and BLEU-BG, each with a linear trend line) plotted against training epoch, from 0.11 to 1.77]
  29. Legal domain [Figure: perplexity and BLEU (BLEU-BG and BLEU-HY, each with a linear trend line) plotted against training epoch, from 0.11 to 1.77]
  30. Related publications
      • Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian." ACL-IJCNLP 2015, 4th HyTra Workshop
      • Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016
      • Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016
      • Matīss Rikters. "K-translate – interactive multi-system machine translation." Baltic DB&IS 2016
      • Matīss Rikters. "Searching for the Best Translation Combination Across All Possible Variants." Baltic HLT 2016
  31. Code on GitHub
      • Baseline system
      • Only the chunker + visualizer
      • Interactive browser version
      • With integrated usage of NN LMs
  32. Future work
      • More enhancements for the chunking step
        – Try dependency parsing instead of constituency
      • Choose the best translation candidate with MT quality estimation
        – QuEst++ (Specia et al., 2015)
        – SHEF-NN (Shah et al., 2015)
      • Add special processing of multi-word expressions (MWEs)
      • Handle MWEs in neural machine translation systems
  33. References
      • Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
      • Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
      • Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
      • Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
      • Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
      • Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
      • Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
      • Raivis Skadiņš, Kārlis Goba, Valters Šics. 2010. Improving SMT for Baltic Languages with Factored Models. Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132.
      • Rikters, M., Skadiņa, I.: Syntax-based multi-system machine translation. LREC 2016. (2016)
      • Rikters, M., Skadiņa, I.: Combining machine translated sentence chunks from multiple MT systems. CICLing 2016. (2016)
      • Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
      • Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
      • Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
      • Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
      • Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
      • Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
  34. Thank you!