Neural Network Language Models
for Candidate Scoring
in Multi-System Machine Translation
Matīss Rikters
University of Latvia
COLING 2016 6th Workshop on
Hybrid Approaches to Translation
Osaka, Japan
December 11, 2016
Contents
1. Introduction
2. Baseline System
3. Example Sentence
4. Neural Network Language Models
5. Results
6. Related publications
7. Future plans
Chunking
– Parse sentences with Berkeley Parser (Petrov et al., 2006)
– Traverse the syntax tree bottom up, from right to left
– Add a word to the current chunk if
• The current chunk is not too long (at most sentence word count / 4 words), or
• The word is non-alphabetic or only one symbol long, or
• The word begins a genitive phrase («of …»)
– Otherwise, initialize a new chunk with the word
– If chunking results in too many chunks, repeat the process, allowing more
words per chunk (more than sentence word count / 4); the rules are sketched below
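A minimal Python sketch of these rules (hypothetical helper names, not the author's implementation; it assumes the tokens arrive as a flat list in traversal order, whereas the real system walks the Berkeley Parser tree bottom-up and right-to-left, and it reads the three conditions as alternatives):

```python
def chunk_sentence(words, limit):
    """Greedy rendering of the chunking rules above."""
    chunks, current = [], []
    for word in words:
        attach = (
            len(current) < limit        # current chunk is not too long
            or not word.isalpha()       # non-alphabetic tokens attach anyway
            or len(word) == 1           # single-symbol tokens attach anyway
            or word.lower() == "of"     # keep the genitive "of" with its phrase
        )
        if attach and current:
            current.append(word)
        else:
            if current:
                chunks.append(current)  # close the previous chunk
            current = [word]            # initialize a new chunk
    if current:
        chunks.append(current)
    return chunks

def chunk_with_retry(words, max_chunks=4):
    """Re-chunk with a looser limit if the first pass yields too many chunks.

    `max_chunks` is a hypothetical threshold; the slide only says
    "too many chunks" without fixing a number.
    """
    limit = max(1, len(words) // 4)     # sentence word count / 4
    chunks = chunk_sentence(words, limit)
    while len(chunks) > max_chunks:
        limit += 1                      # allow more words per chunk
        chunks = chunk_sentence(words, limit)
    return chunks
```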
Translation with online MT systems
– Google Translate; Bing Translator; Yandex.Translate; Hugo.lv
12-gram language model
– DGT-Translation Memory corpus (Steinberger et al., 2013) – 3.1 million
Latvian legal-domain sentences
Baseline System
[Figure: Baseline system workflow: sentence tokenization → syntactic analysis → sentence chunking → translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → selection of the best chunks → sentence recomposition → translation output]
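The workflow can be summarized in code; a schematic sketch with stub MT systems and a stub scorer (all names hypothetical; the real system calls the online MT APIs, chunks via the parse tree, and scores candidates with the language model):

```python
def hybrid_translate(sentence, mt_systems, score):
    """Translate each chunk with every system and keep the best-scoring one."""
    words = sentence.split()                            # sentence tokenization
    # Fixed-size stand-in for the syntax-based chunker described above.
    chunks = [words[i:i + 4] for i in range(0, len(words), 4)]
    output = []
    for chunk in chunks:
        source = " ".join(chunk)
        candidates = [mt(source) for mt in mt_systems]  # translate each chunk
        output.append(max(candidates, key=score))       # higher score = better
    return " ".join(output)                             # sentence recomposition

# Toy usage: identity "systems" and a trivial scorer, for illustration only.
systems = [str.lower, str.upper]
print(hybrid_translate("Recently there has been an increased interest",
                       systems, len))
```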
Baseline System
Choose the best candidate
KenLM (Heafield, 2011) calculates probabilities based on the
observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties
$b(w_i^{n-1})$ are given by an already-estimated language model.
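To make the formula concrete, here is a toy log-space rendering of the backoff lookup (the dictionaries are hypothetical stand-ins for KenLM's trie, not its real data structures):

```python
def backoff_logprob(ngram, logprob, backoff):
    """Score ngram = (w_1, ..., w_n) with the backoff formula above."""
    n = len(ngram)
    for f in range(n):                   # longest matching history first
        entry = ngram[f:]
        if entry in logprob:
            # multiply in b(w_i^{n-1}) for every history level we skipped
            penalty = sum(backoff.get(ngram[i:n - 1], 0.0) for i in range(f))
            return logprob[entry] + penalty
    return float("-inf")                 # even the unigram is unseen

# Toy model: bigram "the cat" is stored, trigram "saw the cat" is not.
lp = {("the", "cat"): -0.5, ("cat",): -2.0}
bo = {("saw", "the"): -0.3}
print(backoff_logprob(("saw", "the", "cat"), lp, bo))  # -0.5 + b(saw the) = -0.8
```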
Perplexity is then calculated using this probability: given an unknown
probability distribution $p$ and a proposed probability model $q$, the
model is evaluated by how well it predicts a separate test sample
$x_1, x_2, \ldots, x_N$ drawn from $p$:

$$2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 q(x_i)}$$
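In the system itself this scoring is done with KenLM; a minimal sketch using the kenlm Python bindings (the model path is hypothetical):

```python
import kenlm  # Python bindings for KenLM (https://github.com/kpu/kenlm)

# Hypothetical path to the 12-gram model trained on the DGT-TM corpus.
model = kenlm.Model("dgt_lv.arpa")

def best_candidate(candidates):
    """Pick the translation hypothesis the LM finds most fluent."""
    return min(candidates, key=model.perplexity)  # lower perplexity is better
```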
Example sentence
Recently there has been an increased interest
in the automated discovery
of equivalent expressions in different languages.
Neural Language Models
• RWTHLM
• CPU only
• Feed-forward, recurrent (RNN) and long short-term
memory (LSTM) NNs
• MemN2N
• CPU or GPU
• End-to-end memory network (RNN with attention)
• Char-RNN
• CPU or GPU
• RNNs, LSTMs and gated recurrent units (GRU)
• Character level
Best Models
• RWTHLM
• one feed-forward input layer with a 3-word
history, followed by one linear layer of 200
neurons with a sigmoid activation function
• MemN2N
• internal state dimension of 150, linear part of
the state 75, number of hops set to six
• Char-RNN
• 2 LSTM layers with 1,024 neurons each,
dropout set to 0.5
Char-RNN
• A character-level model works better
for highly inflected languages when
less data is available
• Requires Torch scientific computing
framework + additional packages
• Can run on CPU, NVIDIA GPU or
AMD GPU
• Intended for generating new text;
modified here to score text instead (see the sketch below)
More in Andrej Karpathy’s blog
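How a generator becomes a scorer; a minimal sketch of the idea (a character bigram model with add-one smoothing stands in for the 2-layer LSTM; all names are hypothetical):

```python
import math
from collections import Counter, defaultdict

def train_char_bigram(lines):
    """Count character bigrams; a toy stand-in for training the LSTM."""
    counts = defaultdict(Counter)
    for line in lines:
        for prev, cur in zip(" " + line, line):  # (previous char, current char)
            counts[prev][cur] += 1
    return counts

def avg_logprob(text, counts, alphabet=256):
    """Average per-character log-probability, with add-one smoothing."""
    total = 0.0
    for prev, cur in zip(" " + text, text):
        c = counts[prev]
        total += math.log((c[cur] + 1) / (sum(c.values()) + alphabet))
    return total / max(1, len(text))

counts = train_char_bigram(["nesen ir pieaugusi interese",
                            "par automātisku atklāšanu"])
# Higher (less negative) average log-probability = more fluent candidate.
print(avg_logprob("nesen ir interese", counts))
```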
Experiment Environment
Training
• Baseline KenLM and RWTHLM models
• 8-core CPU with 16GB of RAM
• MemN2N
• GeForce Titan X (12GB, 3,072 CUDA cores); 12-core CPU and 64GB RAM
• Char-RNN
• Radeon HD 7950 (3GB, 1,792 cores); 8-core CPU and 16GB RAM
Translation
• All models
• 4-core CPU with 16GB of RAM
Related publications
• Matīss Rikters
"Multi-system machine translation using online APIs for English-Latvian"
ACL-IJCNLP 2015 4th HyTra Workshop
• Matīss Rikters and Inguna Skadiņa
"Syntax-based multi-system machine translation"
LREC 2016
• Matīss Rikters and Inguna Skadiņa
"Combining machine translated sentence chunks from multiple MT systems"
CICLing 2016
• Matīss Rikters
"K-translate – interactive multi-system machine translation"
Baltic DB&IS 2016
• Matīss Rikters
"Searching for the Best Translation Combination Across All Possible Variants"
Baltic HLT 2016
Code on GitHub
https://github.com/M4t1ss
• Baseline system: http://ej.uz/ChunkMT
• Only the chunker + visualizer: http://ej.uz/chunker
• Interactive browser version: http://ej.uz/KTranslate
• With integrated usage of NN LMs: http://ej.uz/NNLMs
Future work
More enhancements for the chunking step
– Try dependency parsing instead of constituency parsing
Choose the best translation candidate with MT quality estimation
– QuEst++ (Specia et al., 2015)
– SHEF-NN (Shah et al., 2015)
Add special processing of multi-word expressions (MWEs)
Handle MWEs in neural machine translation systems
References
• Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado, 2010.
• Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
• Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
• Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
• Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
• Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
• Pal, Santanu, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing. 2014.
• Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
• Rikters, Matīss, and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016. 2016.
• Rikters, Matīss, and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016. 2016.
• Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 2006.
• Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
• Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 219: 125-132. 2010.
• Specia, Lucia, Gustavo Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
• Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
• Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).