Hybrid machine translation by combining multiple machine translation systems
September 27, 2017
Hybrid Machine Translation
by Combining Multiple
Machine Translation Systems
Matīss Rikters
Supervisor: Dr. sc. comp., prof. Inguna Skadiņa
Contents
• Introduction
• Aim and objectives
• Background and related work
• Combining statistical machine translations
• Combining neural machine translations
• Practical implementations
• Conclusions
3. Introduction
• Machine translation (MT) is a sub-field of natural language
processing that investigates the use of computers to
translate text from one language to another
• Rule-based MT (RBMT) is based on linguistic information
covering the main semantic, morphological, and syntactic
regularities of source and target languages
• Statistical MT (SMT) consists of subcomponents that are
separately engineered to learn how to translate from vast
amounts of translated text
• Neural MT (NMT) consists of a large neural network in which
weights are trained jointly to maximize the translation
performance
4. Introduction
Automatic Evaluation of MT
• BLEU - one of the first metrics to report high
correlation with human judgments
• One of the most popular in the field
• The closer MT is to a professional human
translation, the better it is
• Scores a translation on a scale of 0 to 100
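The idea behind BLEU can be illustrated with a small sketch: modified n-gram precision combined with a brevity penalty, scaled to 0-100. This is a simplified single-sentence, single-reference version written for illustration only (no smoothing); real evaluations use the full corpus-level metric and standard tooling.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # no smoothing in this sketch
        precisions.append(overlap / total)
    # brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 100.0
```

A perfect match scores 100, and any hypothesis sharing no unigrams with the reference scores 0, which matches the "closer to a professional human translation is better" intuition above.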
5. Aim and objectives
The aim is to research and develop methods
and tools that allow combining output from
multiple different machine translation systems
to produce one superior final translation.
The primary focus is on MT issues related to
Latvian, but the reviewed and introduced
methods are generally applicable to other
languages as well.
6. Aim and objectives
Objectives:
• Analyze RBMT, SMT and NMT methods, as well as
existing hybrid MT (HMT) and multi-system MT
(MSMT) methods
• Experiment with different methods of combining
translations
• Evaluate the quality of the resulting translations
• Investigate the applicability of the methods for
Latvian and other morphologically rich,
less-resourced languages
• Provide practical applications of MT combining
10. Corpus-based MT
English | Latvian
The cat sat on the mat | Kaķis sēdēja uz paklāja
The rat sat on the mat | Žurka sēdēja uz paklāja
11. Hybrid MT
Statistical rule generation
– Rules for RBMT systems are generated from training corpora
Multi-pass
– Process data through RBMT first, and then through SMT
Multi-System hybrid MT
– Multiple MT systems run in parallel
14. Combining statistical
machine translation output
• Full sentence translations
• Simple sentence fragments
• Advanced sentence fragments
• Exhaustive search
• Neural network language models
15. Full sentence translations
• Sentence tokenization
• Translation with APIs (Google Translate, Bing Translator, LetsMT)
• Selection of the best translation
• Output
16. Full sentence translations
Probabilities are calculated based on the observed entry with the
longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1}),$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties
$b(w_i^{n-1})$ are given by an already-estimated language model.

Perplexity is then calculated using this probability:

$$b^{-\frac{1}{N} \sum_{i=1}^{N} \log_b q(x_i)},$$

where, given an unknown probability distribution $p$ and a proposed
probability model $q$, the model is evaluated by determining how well
it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
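The selection step can be sketched as follows: each candidate translation is scored with the language model, and the hypothesis with the lowest perplexity wins. The toy per-token log-probabilities below stand in for the real 5-gram KenLM model (with the kenlm Python module one would call `model.perplexity(sentence)` instead); the sentences and scores are illustrative only.

```python
def perplexity(log10_probs):
    """Perplexity of a token sequence from per-token log10 probabilities:
    10 ** (-(1/N) * sum(log10 p)), matching the formula above with b = 10."""
    n = len(log10_probs)
    return 10 ** (-sum(log10_probs) / n)

def select_best(candidates, score_fn):
    """Pick the candidate translation with the lowest score (perplexity)."""
    return min(candidates, key=score_fn)

# Toy stand-in for an n-gram LM: the fluent sentence gets higher
# per-token probabilities than the disfluent word order.
fake_lm = {
    "kaķis sēdēja uz paklāja": [-1.2, -0.8, -0.9, -1.0],
    "kaķis sēdēja paklājs uz": [-1.2, -0.8, -2.9, -3.1],
}
best = select_best(fake_lm, lambda s: perplexity(fake_lm[s]))
print(best)  # the more fluent candidate wins
```

Lower perplexity means the language model finds the hypothesis more probable, which is why it serves as the selection criterion in the experiments that follow.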
17. Experiments
• En→Lv
• Language model for the target language (Lv)
– JRC Acquis corpus version 3.0 (1.4M sentences)
– 5-gram LM trained with KenLM
• Parallel test sets
– 1581 random sentences from the JRC Acquis 3.0
– ACCURAT balanced test corpus for under-resourced
languages (512 sentences)
18. Experiments
ACCURAT balanced test corpus
System BLEU
Google Translate 24.73
Bing Translator 22.07
LetsMT! 32.01
Hybrid Google + Bing 23.75
Hybrid Google + LetsMT! 28.94
Hybrid LetsMT! + Bing 27.44
Hybrid Google + Bing + LetsMT! 26.74
19. Experiments
JRC Acquis test corpus

System | BLEU | TER | WER | Google | Bing | LetsMT | Equal
Google Translate | 16.92 | 47.68 | 58.55 | 100% | - | - | -
Bing Translator | 17.16 | 49.66 | 58.40 | - | 100% | - | -
LetsMT | 28.27 | 36.19 | 42.89 | - | - | 100% | -
Hybrid Google + Bing | 17.28 | 48.30 | 58.15 | 50.09% | 45.03% | - | 4.88%
Hybrid Google + LetsMT | 22.89 | 41.38 | 50.31 | 46.17% | - | 48.39% | 5.44%
Hybrid LetsMT + Bing | 22.83 | 42.92 | 50.62 | - | 45.35% | 49.84% | 4.81%
Hybrid Google + Bing + LetsMT | 21.08 | 44.12 | 52.99 | 28.93% | 34.31% | 33.98% | 2.78%

(The last four columns show the share of translations selected from each system.)
20. Human evaluation
• 5 native Latvian speakers were each given a random 2% sample
(32 sentences)
• They were asked to mark which of the three MT outputs was the
best, the worst, and OK
• Multiple answers could be selected for best, worst, or OK
21. Human evaluation
System | User 1 | User 2 | User 3 | User 4 | User 5 | AVG user | Hybrid | BLEU
Bing | 21.88% | 53.13% | 28.13% | 25.00% | 31.25% | 31.88% | 28.93% | 16.92
Google | 28.13% | 25.00% | 25.00% | 28.13% | 46.88% | 30.63% | 34.31% | 17.16
LetsMT! | 50.00% | 21.88% | 46.88% | 46.88% | 21.88% | 37.50% | 33.98% | 28.27
22. Simple sentence fragments
• Sentence tokenization
• Syntactic parsing
• Sentence chunking (decomposition)
• Translation with APIs (Google Translate, Bing Translator, LetsMT)
• Selection of the best translated chunk
• Sentence recomposition
• Output
23. Simple sentence fragments
Input: 3. the list referred to in paragraph 1 and all amendments
thereto shall be published in the official journal of the european
communities .

Parse: ( (S (NP (NP (CD 3.)) (SBAR (S (NP (DT the) (NN list)) (VP (VBD referred) (PP
(TO to)) (PP (IN in) (NP (NP (NN paragraph) (CD 1)) (CC and) (NP (DT all)
(NNS amendments) (NN thereto)))))))) (VP (MD shall) (VP (VB be) (VP (VBN
published) (PP (IN in) (NP (NP (DT the) (JJ official) (NN journal)) (PP (IN of)
(NP (DT the) (JJ european) (NNS communities)))))))) (. .)) )

Chunk 1 candidates (from Google Translate, Bing Translator, LetsMT):
– 3. sarakstu, kas minēts 1. punktā, un visus tā grozījumus
– 3. punktā minēto sarakstu un visus grozījumus 1
– 3. sarakstu, kas minētas punktā 1 un visi grozījumi tajos

Chunk 2 candidates:
– ir publicēti Eiropas kopienu oficiālajā žurnālā.
– publicē oficiālajā vēstnesī Eiropas Kopienu
– publicē Eiropas Kopienu Oficiālajā Vēstnesī

Recomposed output: 3. sarakstu, kas minēts 1. punktā, un visus tā
grozījumus ir publicēti Eiropas kopienu oficiālajā žurnālā.
24. Experiments
Syntactic analysis
– Berkeley Parser
– Sentences are split into chunks from the top-level subtrees of the
syntax tree
Selection of the best chunk
– The same as in the previous experiment
(5-gram LM with KenLM using JRC-Acquis)
Test data
– The same as in the previous experiment
(1581 random sentences from JRC-Acquis)
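Chunking from top-level subtrees can be sketched as below: given a bracketed constituency parse, split out the children of the root with a simple bracket-depth counter. This is a simplified stand-in for the Berkeley Parser pipeline; the helper names and the example parse are illustrative.

```python
def top_level_chunks(parse):
    """Split a bracketed constituency parse into its top-level subtrees
    (the children of the root node), tracking bracket depth."""
    inner = parse.strip()
    # drop the outer "(LABEL ... )" wrapper: remove "(LABEL" and the final ")"
    inner = inner[inner.index(" ") + 1 : inner.rindex(")")]
    chunks, depth, start = [], 0, None
    for i, ch in enumerate(inner):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                chunks.append(inner[start : i + 1])
    return chunks

def chunk_words(subtree):
    """Extract the leaf words of a subtree: tokens not starting with '('."""
    return " ".join(t.rstrip(")") for t in subtree.split() if not t.startswith("("))

parse = "(S (NP (DT the) (NN list)) (VP (MD shall) (VP (VB be) (VBN published))) (. .))"
print([chunk_words(c) for c in top_level_chunks(parse)])
```

Each top-level subtree (NP, VP, punctuation) becomes one chunk to be translated separately and recomposed afterwards.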
25. Experiments
System | BLEU (MHyT / SyMHyT) | NIST (MHyT / SyMHyT)
Google Translate | 18.09 | 8.37
Bing Translator | 18.87 | 8.09
LetsMT! | 30.28 | 9.45
Google + Bing | 18.73 / 21.27 | 7.76 / 8.30
Google + LetsMT | 24.50 / 26.24 | 9.60 / 9.09
LetsMT! + Bing | 24.66 / 26.63 | 9.47 / 8.97
Google + Bing + LetsMT! | 22.69 / 24.72 | 8.57 / 8.24
27. Additional Experiments
Experiments with different language models:

Language Model | Size (sentences) | BLEU
5-gram JRC | 1.4 million | 24.72
12-gram JRC | 1.4 million | 24.70
12-gram DGT-TM | 3.1 million | 24.04

Experiments with random chunks:

Chunks | BLEU
SyMHyT chunks | 24.72
5-grams | 11.85
Random 1-4 grams | 7.33
Random 1-6 grams | 10.25
Random 6-max grams | 20.94
28. Human evaluation
System | Fluency AVG | Accuracy AVG | SyMHyT selection | BLEU
Google | 35.29% | 34.93% | 16.83% | 18.09
Bing | 23.53% | 23.97% | 17.94% | 18.87
LetsMT | 20.00% | 21.92% | 65.23% | 30.28
SyMHyT | 21.18% | 19.18% | - | 24.72
29. Advanced sentence
fragments
An advanced approach to chunking
– Traverse the syntax tree bottom-up, from right to left
– Add a word to the current chunk if
• the current chunk is not too long (at most sentence word count / 4 words),
• the word is non-alphabetic or only one symbol long, or
• the word begins a genitive phrase («of »)
– Otherwise, start a new chunk with the word
– If chunking results in too many chunks, repeat the
process, allowing more words per chunk (more than
sentence word count / 4)
Changes in the MT API systems
– LetsMT! API temporarily replaced with Hugo.lv API
– Added Yandex API
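The chunking rules above can be sketched as a greedy pass: a word joins the current chunk if the chunk is still short enough (sentence length / 4), if the word is non-alphabetic or a single character, or if it opens a genitive phrase ("of"); otherwise it starts a new chunk. The thesis traverses the syntax tree bottom-up, right to left; this linear pass over the token list is a simplification for illustration.

```python
def advanced_chunks(words, limit_factor=4):
    """Greedy sketch of the advanced chunking rules (linear simplification
    of the tree traversal described in the thesis)."""
    max_len = max(len(words) // limit_factor, 1)
    chunks = [[]]
    for word in words:
        keep = (
            len(chunks[-1]) < max_len      # chunk not too long yet
            or not word.isalpha()          # non-alphabetic token
            or len(word) == 1              # single-symbol token
            or word.lower() == "of"        # genitive phrase opener
        )
        if keep or not chunks[-1]:
            chunks[-1].append(word)
        else:
            chunks.append([word])          # start a new chunk
    return [" ".join(c) for c in chunks]

sent = "the list referred to in paragraph 1 shall be published".split()
print(advanced_chunks(sent))
```

Note how the digit "1" attaches to the preceding chunk even though that chunk is already at the length limit, exactly the kind of exception the rules are designed for.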
31. Experiments
Selection of the best translation:
6-gram and 12-gram LMs trained with KenLM on
– the JRC-Acquis corpus v. 3.0
– the DGT-Translation Memory corpus (3.1 million sentences)
Sentences scored with the query program from KenLM
Test corpora
– 1581 random sentences from JRC-Acquis
– ACCURAT balanced evaluation corpus
32. Experiments
Sentence chunks with SyMHyT:
• Recently
• there
• has been an increased interest in the automated discovery of
equivalent expressions in different languages .

Sentence chunks with ChunkMT:
• Recently there has been an increased interest
• in the automated discovery of equivalent expressions
• in different languages .
34. Experiments
System | BLEU | Equal | Bing | Google | Hugo | Yandex
Single systems (BLEU) | - | - | 17.43 | 17.73 | 17.14 | 16.04
MSMT Google + Bing | 17.70 | 7.25% | 43.85% | 48.90% | - | -
MSMT Google + Bing + LetsMT | 17.63 | 3.55% | 33.71% | 30.76% | 31.98% | -
SyMHyT Google + Bing | 17.95 | 4.11% | 19.46% | 76.43% | - | -
SyMHyT Google + Bing + LetsMT | 17.30 | 3.88% | 15.23% | 19.48% | 61.41% | -
ChunkMT Google + Bing | 18.29 | 22.75% | 39.10% | 38.15% | - | -
ChunkMT all four | 19.21 | 7.36% | 30.01% | 19.47% | 32.25% | 10.91%
35. Exhaustive search
The main differences:
• the manner of scoring chunks with the LM
and selecting the best translation
• utilisation of multi-threaded computing that
allows running the process on all available
CPU cores in parallel
• very slow
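The exhaustive variant can be sketched as follows: score every combination of per-chunk candidate translations with the LM and keep the best-scoring full sentence. The cost grows as the product of the candidate counts per chunk, which is why it is very slow and why the thesis spreads the scoring over all CPU cores (e.g. with a worker pool); the toy scoring function below stands in for LM perplexity.

```python
from itertools import product

def exhaustive_best(chunk_candidates, score):
    """Score every combination of per-chunk candidate translations and
    return the lowest-scoring (e.g. lowest-perplexity) full sentence."""
    best_sentence, best_score = None, float("inf")
    for combo in product(*chunk_candidates):
        sentence = " ".join(combo)
        s = score(sentence)
        if s < best_score:
            best_sentence, best_score = sentence, s
    return best_sentence, best_score

# Toy score: prefer shorter output (a stand-in for LM perplexity).
candidates = [["šis lēmums", "šo lēmumu"], ["stājas spēkā", "ir spēkā"]]
best, _ = exhaustive_best(candidates, lambda s: len(s))
print(best)
```

With k candidates for each of n chunks the loop evaluates k^n sentences, so parallelising the scoring across cores is what keeps the approach feasible at all.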
38. Experiments
System | BLEU (Legal) | BLEU (General)
Full-search | 23.61 | 14.40
Linguistic chunks | 20.00 | 17.27
Bing | 16.99 | 17.43
Google | 16.19 | 17.72
Hugo | 20.27 | 17.13
Yandex | 19.75 | 16.03
39. Experiment Results
System | Sentence | Perplexity
Full-search | Šis lēmums stājas spēkā tā publicēšanas dienā oficiālajā vēstnesī . | 16.57
ChunkMT | šo lēmumu . stājas spēkā tās publicēšanas dienā , oficiālajā vēstnesī . | 132.14
Other possible variants | šo lēmumu lēmums stājas spēkā trešajā dienā pēc tās publicēšanas valsts oficiālajā vēstnesī. | 54.31
Other possible variants | Šis lēmums lēmums stājas spēkā trešajā dienā pēc tās publicēšanas valsts oficiālajā vēstnesī . | 68.82
Other possible variants | Šis lēmums stājas spēkā tās publicēšanas dienā Savienības Oficiālajā Vēstnesī . | 21.79
40. Experiment Results
System | Chunk translations | Per-chunk perplexities
Bing | Šis lēmums lēmums stājas spēkā trešajā dienā pēc tās publicēšanas Savienības Oficiālajā Vēstnesī. | 70.73, 33.21, 678.29
Google | šis lēmums stājas spēkā tā publicēšanas dienā oficiālajā vēstnesī. | 568.43, 64.58, 6858.23
Hugo | šo lēmumu . stājas spēkā tās publicēšanas dienā , valsts oficiālajā vēstnesī. | 48.04, 23.91, 951.49
Yandex | šo lēmumu stājas spēkā tās publicēšanas dienā oficiālajā vēstnesī . | 760.09, 61.66, 164.97
41. Neural network
language models
• RWTHLM
• CPU only
• Feed-forward, recurrent (RNN) and long short-term
memory (LSTM) NNs
• MemN2N
• CPU or GPU
• End-to-end memory network (RNN with attention)
• Char-RNN
• CPU or GPU
• RNNs, LSTMs and gated recurrent units
(GRU)
• Character level
42. Best models
• RWTHLM
• one feed-forward input layer with a 3-word
history, followed by one linear layer of 200
neurons with sigmoid activation function
• MemN2N
• internal state dimension of 150, linear part of
the state 75 and number of hops set to six
• Char-RNN
• 2 LSTM layers with 1024 neurons each and
the dropout set to 0.5
43. Experiment
Environment
Training
• Baseline KenLM and RWTHLM models
• 8-core CPU with 16GB of RAM
• MemN2N
• GeForce Titan X (12GB, 3,072 CUDA cores)
12-core CPU and 64GB RAM
• Char-RNN
• Radeon HD 7950 (3GB, 1,792 cores)
8-core CPU and 16GB RAM
Translation
• All models
• 4-core CPU with 16GB of RAM
44. Experiment Results
System | Perplexity | Training Corpus Size | Trained On | Training Time | BLEU
KenLM | 34.67 | 3.1M | CPU | 1 hour | 19.23
RWTHLM | 136.47 | 3.1M | CPU | 7 days | 18.78
MemN2N | 25.77 | 3.1M | GPU | 4 days | 18.81
Char-RNN | 24.46 | 1.5M | GPU | 2 days | 19.53
48. Combining neural machine
translation output
• Experimenting with NMT attention alignments
• Simple system combination using neural
network attention
• System combination by estimating confidence
from neural network attention
49. Experimenting with NMT
attention alignments
Goals
• Improve translation of multiword expressions (MWEs)
• Keep track of changes in attention alignments
50. Workflow
• Tag corpora with morphological taggers (UDPipe, LV Tagger)
• Identify MWE candidates (MWE Toolkit)
• Align identified MWE candidates (MPAligner)
• Shuffle MWEs into training corpora; train NMT systems (Neural Monkey)
• Identify changes
51. Data
Training
– En → Lv
• 4.5M parallel sentences
for the baseline
• 4.8M after adding
MWE sentences
– En → Cs
• 49M parallel sentences
for the baseline
• 17M after adding
MWE sentences
Evaluation
– En → Lv
• 2003 sentences in total
• 611 sentences with at
least one MWE
– En → Cs
• 6000 sentences in total
• 112 sentences with at
least one MWE
WMT17 News Translation Task
52. Data
En → Lv
En → Cs
[Charts: sizes of the baseline and MWE-augmented training datasets for each language pair]
53. NMT Systems
Neural Monkey
– Embedding size 350
– Encoder state size 350
– Decoder state size 350
– Max sentence length 50
– BPE merges 30000
54. Experiments
Two ways of presenting MWEs to the NMT system
– Adding only the parallel MWEs themselves
(MWE phrases)
each pair forming a new “sentence pair” in the parallel corpus
– Adding full sentences that contain the identified MWEs
(MWE sentences)
System | En→Cs Dev | En→Cs MWE | En→Lv Dev | En→Lv MWE
Baseline | 13.71 | 10.25 | 11.29 | 9.32
+MWE phrases | - | - | 11.94 | 10.31
+MWE sentences | 13.99 | 10.44 | - | -
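The two augmentation schemes can be sketched as follows: either append each aligned MWE pair as its own pseudo sentence pair, or append (duplicate) the full sentence pairs that contain an identified MWE. The function names and the tiny corpus are illustrative, not from the thesis.

```python
def augment_with_mwe_phrases(corpus, mwe_pairs):
    """MWE phrases: each aligned MWE pair becomes a new 'sentence pair'."""
    return corpus + list(mwe_pairs)

def augment_with_mwe_sentences(corpus, contains_mwe):
    """MWE sentences: re-add the full sentence pairs containing an MWE."""
    return corpus + [pair for pair in corpus if contains_mwe(pair)]

corpus = [("he kicked the bucket", "viņš nomira"),
          ("the cat sat", "kaķis sēdēja")]
mwes = [("kicked the bucket", "nomira")]
print(len(augment_with_mwe_phrases(corpus, mwes)))                                     # 3
print(len(augment_with_mwe_sentences(corpus, lambda p: "kicked the bucket" in p[0])))  # 3
```

Either way, the NMT system sees the MWE more often during training, nudging it toward the idiomatic rather than the literal translation.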
59. Simple system combination using
neural network attention
Workflow
• Translate the same sentence with two different NMT
systems and one SMT system; save attention
alignment data from the NMT systems
• Choose output from the system that does not
– align most of its attention to a single token
– have only very strong one-to-one alignments
• Otherwise, back off to the output of the SMT
system
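The two rejection heuristics can be sketched as below. An NMT hypothesis is rejected if one source token soaks up most of the attention mass, or if nearly every target token aligns almost one-to-one with a single source token; rejected hypotheses fall back to the SMT output. The thresholds and matrix layout (`attention[t][s]` = weight of source token s for target token t) are illustrative assumptions, not the thesis values.

```python
def acceptable(attention, column_cap=0.4, one_to_one_cap=0.8):
    """Accept an NMT hypothesis unless (a) a single source token receives
    most of the total attention, or (b) nearly every target token puts
    almost all of its attention on one source token."""
    total = sum(sum(row) for row in attention)
    n_src = len(attention[0])
    # (a) attention mass concentrated on one source token
    for s in range(n_src):
        if sum(row[s] for row in attention) / total > column_cap:
            return False
    # (b) degenerate one-to-one alignments
    peaked = sum(1 for row in attention if max(row) / sum(row) > 0.9)
    return peaked / len(attention) <= one_to_one_cap

def combine(nmt_outputs, smt_output):
    """Pick the first NMT output whose attention passes the checks,
    otherwise back off to the SMT output."""
    for text, attn in nmt_outputs:
        if acceptable(attn):
            return text
    return smt_output

good = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
bad = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
print(combine([("nmt-1", bad), ("nmt-2", good)], "smt"))  # nmt-2
```

The `bad` matrix fails check (a) because nearly all attention mass lands on the first source token, a typical symptom of a degenerate NMT output.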
60. Experiments
System | En→Lv Dev | En→Lv Test | Lv→En Dev | Lv→En Test
LetsMT! | 19.8 | 12.9 | 24.3 | 13.4
Neural Monkey | 16.7 | 13.5 | 15.7 | 14.3
Nematus | 16.9 | 13.6 | 15.0 | 13.8
NM+NT+LMT | - | 13.6 | - | 14.3

Data: WMT17 News Translation Task
62. System combination by estimating
confidence from neural network attention
Source Viņš bija labs cilvēks ar plašu sirdi.
Reference He was a kind spirit with a big heart.
Hypothesis He was a good man with a wide heart.
CDP -0.099
APout -1.077
APin -0.847
Confidence -2.024
63. System combination by estimating
confidence from neural network attention
Source
Aizvadītajā diennaktī Latvijā reģistrēts 71 ceļu satiksmes negadījumos, kuros cietuši
16 cilvēki.
Reference
71 traffic accidents in which 16 persons were injured have happened in Latvia
during the last 24 hours.
Hypothesis
The first day of the EU’European Parliament is the first of the three
years of the European Union .
CDP -0.900
APout -2.809
APin -2.137
Confidence -5.846
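The confidence score in the examples above combines a coverage deviation penalty (CDP) with two absentmindedness penalties (APout, APin). The sketch below follows that general shape: CDP penalizes source tokens whose total received attention deviates from 1, and the AP terms penalize dispersed (high-entropy) attention rows and columns; all terms are at most 0, and values nearer 0 mean higher confidence. The exact normalization here is an assumption for illustration; the thesis numbers come from the full published formulation.

```python
import math

def confidence(attention):
    """Attention-based confidence sketch: CDP + APout + APin.
    attention[t][s] = weight of source token s for target token t."""
    n_tgt, n_src = len(attention), len(attention[0])
    cols = [[row[s] for row in attention] for s in range(n_src)]
    # coverage deviation: each source token should receive ~1.0 attention
    cdp = -sum(math.log(1 + (1 - sum(col)) ** 2) for col in cols) / n_src
    # absentmindedness: dispersed attention -> large negative entropy terms
    ap_out = sum(a * math.log(a) for row in attention for a in row if a > 0) / n_tgt
    ap_in = sum(a * math.log(a) for col in cols for a in col if a > 0) / n_src
    return cdp + ap_out + ap_in

focused = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
scattered = [[1 / 3] * 3 for _ in range(3)]
print(confidence(focused) > confidence(scattered))  # True
```

A crisp, diagonal-like attention matrix scores closer to 0 than a uniformly scattered one, mirroring how the good translation above gets -2.024 while the degenerate one gets -5.846.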
64. Experiments
BLEU

System | En→De | De→En | En→Lv | Lv→En
Neural Monkey | 18.89 | 26.07 | 13.74 | 11.09
Nematus | 22.35 | 30.53 | 13.80 | 12.64
Hybrid | 20.19 | 27.06 | 14.79 | 12.65
Human | 23.86 | 34.26 | 15.12 | 13.24

Data: WMT17 News Translation Task
65. Human Evaluation
Overlap | En→Lv | Lv→En
LM-based overlap with human | 58% | 56%
Attention-based overlap with human | 52% | 60%
LM-based overlap with attention-based | 34% | 22%

Language pair | CDP | APin | APout | Overall
En→Lv | 0.099 | 0.074 | 0.123 | 0.086
Lv→En | -0.012 | -0.153 | -0.2 | -0.153
68. Interactive multi-system
machine translation
• Start page
• Input source sentence
• Translate with online systems, input translations to combine,
or input translated chunks
• Settings
• Translation results
69. Interactive multi-system
machine translation
• Adding a user-friendly interface to ChunkMT
– Draws a syntax tree with chunks highlighted
– Indicates which chunks were chosen from which system
– Provides a confidence score for the choices
• Allows using online APIs or user provided translations
• Comes with resources for translating between
English, French, German and Latvian
• Can be used in a web browser
72. Visualizing NMT
attention and confidence
Works with attention alignment data from
• Nematus
• Neural Monkey
• AmuNMT
• OpenNMT
• Sockeye
Visualise translations in
• Linux Terminal or Windows PowerShell
• Web browser
• Line form or matrix form
• Save as PNG
• Sort and navigate dataset by confidence scores
76. Conclusions
• Exploration of a variety of methods for
combining multiple MT systems
• Mostly focused on translating from and to
Latvian, but also on other morphologically
complex languages like Czech and German
• All results evaluated using automatic
metrics; most of them also using manual
human evaluation
77. Conclusions
• Hybrid MT combination via chunking outperformed
individual systems in translating long sentences
• Hybrid combination for NMT via attention
alignments fits the emerging neural network
technology and can distinguish low-quality
translations from high-quality ones
• The graphical tools allow performing translations
while inspecting which parts of the output come
from which individual system, and reviewing
already generated translations to quickly locate
better or worse results
78. Conclusions
Since the author observed improvements in both
automatic and human evaluation in most
experiments with the chunking method as well as
the attention-based method, the proposed
hypothesis can be considered proven: by
combining output from multiple different MT
systems, it is possible to achieve higher-quality
MT than each component system produces
individually.
79. Main results
• Methods for hybrid machine translation
combination via chunking;
• Methods for hybrid neural machine
translation combination via attention
alignments;
• Graphical tools for overviewing the
processes.
80. Publications
• 11 publications
• 3 indexed in Web of Science
• 2 indexed in Scopus
• 10 peer reviewed
• Presented in
• 8 conferences
• 2 workshops
81. Publications
• Rikters, M., Fishel, M., (2017, September). Confidence
Through Attention. In the proceedings of The 16th Machine
Translation Summit.
• Rikters, M., Bojar, O. (2017, September). Paying Attention
to Multi-word Expressions in Neural Machine Translation. In
the proceedings of The 16th Machine Translation Summit.
• Rikters, M., Amrhein, C., Del, M., Fishel, M. (2017b,
September). C-3MA: Tartu-Riga-Zurich Translation Systems
for WMT17. In the proceedings of The 2nd Conference on
Machine Translation.
82. Publications
• Rikters, M., Fishel, M., Bojar, O. (2017a, August).
Visualizing Neural Machine Translation Attention and
Confidence. In The Prague Bulletin of Mathematical
Linguistics, issue 109.
• Rikters, M. (2016d, December). Neural Network Language
Models for Candidate Scoring in Hybrid Multi-System
Machine Translation. In CoLing 2016, 6th Workshop on
Hybrid Approaches to Translation (HyTra 6).
• Rikters, M. (2016c, October). Searching for the Best
Translation Combination Across All Possible Variants. In
The 7th Conference on Human Language Technologies -
the Baltic Perspective (Baltic HLT 2016) (pp. 92-96).
83. Publications
• Rikters, M. (2016b, September). Interactive multi-system
machine translation with neural language models. In
Frontiers in Artificial Intelligence and Applications.
• Rikters, M. (2016a, July). K-Translate: Interactive Multi-
System Machine Translation. In The 12th International Baltic
Conference on Databases and Information Systems (pp.
304-318). Springer International Publishing.
• Rikters, M., Skadiņa, I. (2016b, May). Syntax-based multi-
system machine translation. In Proceedings of The 10th
International Conference on Language Resources and
Evaluation (LREC 2016). Paris, France: European Language
Resources Association (ELRA).
84. Publications
• Rikters, M., Skadiņa, I. (2016a, April) Combining machine
translated sentence chunks from multiple MT systems. In
The 17th International Conference on Intelligent Text
Processing and Computational Linguistics (CICLing 2016).
• Rikters, M. (2015, July). Multi-system machine translation
using online APIs for English-Latvian. In ACL-IJCNLP 2015,
4th Workshop on Hybrid Approaches to Translation (HyTra
4).