Matīss Rikters
Searching for the Best Machine Translation Combination
Tartu, Estonia
22.03.2017
Machine Translation
Hybrid Machine Translation
Methods I used
• A count-based language model for candidate selection from whole translations
• Combining translations of sentence chunks
• Combining translations of linguistically motivated chunks
• A character-level neural language model for candidate selection
A graphical implementation of the methods
Translation of multiword expressions
Other academic activities
Future plans
Contents
• Machine translation (MT) is a sub-field of natural language processing that
investigates the use of computers to translate text from one language to another
• Statistical MT (SMT) consists of subcomponents that are separately engineered
to learn how to translate from vast amounts of translated text
• Rule-based MT (RBMT) is based on linguistic information covering the main
semantic, morphological, and syntactic regularities of source and target languages
• Neural MT (NMT) consists of a large neural network in which weights are trained
jointly to maximize the translation performance
Machine Translation
• One of the first metrics to report high correlation with human judgments
• One of the most popular in the field
• The closer a machine translation is to a professional human translation, the better it is
• Scores a translation on a scale of 0 to 100
Automatic Evaluation of MT: BLEU
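The sketch below shows how BLEU produces such a score. It computes an unsmoothed single-sentence BLEU: clipped n-gram precisions for n = 1..4 are combined by a geometric mean and multiplied by a brevity penalty. The sketch assumes whitespace tokenization and does not replace standard evaluation tools. BLEU is normally computed over a whole test corpus rather than per sentence.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, references: List[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU of one hypothesis against one or more references, on a 0-100 scale."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each hypothesis n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0:
            return 0.0  # without smoothing, one empty n-gram order gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the hypothesis length.
    c = len(hyp)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100 * bp * math.exp(sum(log_precisions) / max_n)
```

A hypothesis that matches a reference exactly scores 100. A correct but truncated hypothesis still loses points through the brevity penalty.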
Statistical rule generation
• Rules for RBMT systems are generated from training corpora
Multi-pass
• Process data through RBMT first, and then through SMT
Multi-System hybrid MT
• Multiple MT systems run in parallel
• SMT + RBMT (Ahsan and Kolachina, 2010)
• Confusion Networks (Barrault, 2010)
+ Neural Network Model (Freitag et al., 2015)
• SMT + EBMT + TM + NE (Santanu et al., 2014)
• Recursive sentence decomposition (Mellebeek et al., 2006)
Literature Review: Hybrid Machine Translation
Combining whole translations
• Translate the full input sentence with multiple MT systems
• Choose the best translation as the output
Combining translations of sentence chunks
• Split the sentence into smaller chunks
• The chunks are the top-level subtrees of the syntax tree of the sentence
• Translate each chunk with multiple MT systems
• Choose the best translated chunks and combine them
Combining Translations
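The sketch below implements the chunking step: one chunk per top-level subtree. It assumes the parser emits a Penn Treebank-style bracketed tree. The helper names (`parse_tree`, `leaves`, `top_level_chunks`) are hypothetical. The code unwraps a ROOT (or unlabeled) wrapper node and returns the words under each child of the sentence node as one chunk.

```python
from typing import List, Tuple, Union

# A parse tree is either a leaf word (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


def parse_tree(bracketed: str) -> Tree:
    """Parse a Penn Treebank-style bracketed string into nested (label, children) tuples."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]  # e.g. S, NP, VBZ; stays empty for an unlabeled root
            pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def leaves(tree: Tree) -> List[str]:
    """All words under a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def top_level_chunks(bracketed: str) -> List[str]:
    """One chunk per top-level subtree of the sentence node."""
    tree = parse_tree(bracketed)
    # Unwrap ROOT/TOP/unlabeled wrapper nodes that hold a single subtree.
    while (not isinstance(tree, str) and tree[0] in ("", "ROOT", "TOP")
           and len(tree[1]) == 1 and not isinstance(tree[1][0], str)):
        tree = tree[1][0]
    return [" ".join(leaves(child)) for child in tree[1]]
```

Consider a shortened form of the example sentence used later in the slides, "(ROOT (S (ADVP (RB Recently)) (NP (EX there)) (VP ...) (. .)))". For it, the function returns the chunks "Recently", "there", the verb phrase, and ".".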
KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated from these probabilities:

$$PP(q) = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 q(x_i)}$$

Here p is an unknown probability distribution and q is a proposed probability model. The model q is evaluated by how well it predicts a separate test sample $x_1, x_2, \dots, x_N$ drawn from p; lower perplexity means a better prediction.
Candidate Selection
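The following toy bigram model makes the backoff rule and the perplexity computation concrete. Its log10 probabilities and backoff weights are invented for illustration and are stored the way an ARPA file stores them. The lookup follows the rule described above: use the longest observed context, and add the log10 backoff penalty of every longer context that had to be skipped. This is a sketch, not KenLM's implementation. The handling of the sentence boundaries `<s>` and `</s>` is an assumption modeled on common n-gram toolkit behaviour.

```python
import math
from typing import Dict, Sequence, Tuple

# Toy ARPA-style model with invented numbers: log10 probabilities and log10 backoff
# weights, keyed by n-gram tuples (history words first, predicted word last).
LOGPROB: Dict[Tuple[str, ...], float] = {
    ("<unk>",): -3.0,
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.0,
    ("</s>",): -1.5,
    ("<s>", "the"): -0.3,
    ("the", "cat"): -0.5,
    ("cat", "sat"): -0.4,
    ("sat", "</s>"): -0.2,
}
BACKOFF: Dict[Tuple[str, ...], float] = {
    ("<s>",): -0.5, ("the",): -0.4, ("cat",): -0.3, ("sat",): -0.2,
}


def log10_prob(word: str, history: Sequence[str]) -> float:
    """log10 p(word | history): take the longest context seen with `word`,
    plus the log10 backoff penalty b(.) of every longer context that was skipped."""
    hist = tuple(history)
    penalty = 0.0
    # hist[f:] runs from the full history (f = 0) down to the empty context.
    for f in range(len(hist) + 1):
        context = hist[f:]
        if context + (word,) in LOGPROB:
            return LOGPROB[context + (word,)] + penalty
        penalty += BACKOFF.get(context, 0.0)  # log10 b(context); 0.0 if not listed
    return LOGPROB[("<unk>",)] + penalty


def perplexity(sentence: str, order: int = 2) -> float:
    """10 ** (-(1/N) * sum of log10 probabilities), counting </s> as one of the N tokens."""
    words = sentence.split() + ["</s>"]
    history = ["<s>"]
    total = 0.0
    for word in words:
        context = history[max(0, len(history) - order + 1):]
        total += log10_prob(word, context)
        history.append(word)
    return 10 ** (-total / len(words))
```

With these numbers, `perplexity("the cat sat")` is about 2.24 and the scrambled `"the sat cat"` is about 47. A selector that prefers lower perplexity would therefore choose the first.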
• Sentence tokenization
• Translation with online MT APIs (Google Translate, Bing Translator, LetsMT)
• Selection of the best translation
• Output
Whole Translations
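The selection step can be sketched as follows: each system's full translation is scored by a language model, and the one with the lowest perplexity becomes the output. The add-one smoothed bigram model here is only a toy stand-in for the 12-gram KenLM model described in these slides. The system names are just labels, and English strings stand in for the Latvian output. `ToyBigramLM` and `select_best` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class ToyBigramLM:
    """Add-one smoothed bigram LM: a toy stand-in for a real n-gram model such as KenLM."""

    def __init__(self, corpus: List[str]) -> None:
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            vocab.update(tokens[1:])
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def perplexity(self, sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        log_sum = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            p = (self.bigram_counts[(prev, word)] + 1) / (self.history_counts[prev] + self.vocab_size)
            log_sum += math.log(p)
        return math.exp(-log_sum / (len(tokens) - 1))


def select_best(candidates: Dict[str, str], lm: ToyBigramLM) -> Tuple[str, str]:
    """Return the (system, translation) pair with the lowest perplexity."""
    return min(candidates.items(), key=lambda item: lm.perplexity(item[1]))


if __name__ == "__main__":
    lm = ToyBigramLM(["the committee adopted the decision",
                      "the council adopted the regulation",
                      "the decision enters into force"])
    print(select_best({"Google": "the committee adopted the decision",
                       "Bing": "committee the decision adopted the",
                       "LetsMT": "the committee decision the adopted"}, lm))
```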
• Sentence tokenization
• Syntactic analysis
• Sentence chunking
• Translation with online MT APIs (Google Translate, Bing Translator, LetsMT)
• Selection of the best chunks
• Sentence recomposition
• Output
Chunks
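The chunk-level variant differs only in where the selection happens. Each chunk's candidate translations are scored separately, the best one is kept, and the winners are joined back into a sentence. In this sketch, `score` is any function returning a perplexity-like value, where lower is better. The perplexities in the usage example are invented. The rule in the hypothetical `recompose` helper, which attaches punctuation to the preceding chunk, is a simplifying assumption.

```python
from typing import Callable, Dict, List, Tuple

# Tokens attached to the preceding chunk without a space (a simplifying assumption).
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def recompose(chunks: List[str]) -> str:
    """Join chunk translations into one sentence."""
    sentence = ""
    for chunk in chunks:
        if not sentence or chunk in PUNCTUATION:
            sentence += chunk
        else:
            sentence += " " + chunk
    return sentence


def combine_chunks(chunk_candidates: List[Dict[str, str]],
                   score: Callable[[str], float]) -> Tuple[str, List[str]]:
    """Pick the lowest-scoring candidate for every chunk, then recompose the sentence.

    Returns the combined sentence and the system chosen for each chunk."""
    chosen_systems, chosen_texts = [], []
    for candidates in chunk_candidates:
        system, text = min(candidates.items(), key=lambda item: score(item[1]))
        chosen_systems.append(system)
        chosen_texts.append(text)
    return recompose(chosen_texts), chosen_systems


if __name__ == "__main__":
    # Invented perplexities, for illustration only.
    toy_ppl = {"Pēdējā laikā": 14.0, "Nesen": 10.0, "ir pieaugusi interese": 20.0,
               "ir bijusi palielināta interese": 35.0, ".": 1.0}
    chunks = [{"Google": "Pēdējā laikā", "Bing": "Nesen"},
              {"Google": "ir pieaugusi interese", "Bing": "ir bijusi palielināta interese"},
              {"Google": ".", "Bing": "."}]
    print(combine_chunks(chunks, toy_ppl.__getitem__))
```

When two candidates tie, as with "." here, `min` keeps the first system listed.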
An advanced approach to chunking
• Traverse the syntax tree bottom up, from right to left
• Add a word to the current chunk if
• The current chunk is not too long (sentence word count / 4)
• The word is non-alphabetic or only one symbol long
• The word begins with a genitive phrase («of »)
• Otherwise, initialize a new chunk with the word
• When chunking results in too many chunks, repeat the process,
allowing more (than sentence word count / 4) words in a chunk
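The rules above leave some details open, so the following is one possible reading rather than the published implementation. It makes these assumptions:
• The input is a list of phrase units already read off the syntax tree, not the tree itself.
• The three conditions are alternatives, so any one of them suffices.
• "Not too long" means the chunk stays within the word limit after the unit is added.
• The genitive rule keeps an "of ..." phrase attached to the unit on its left.
• The limit grows by one word per retry.
• `max_chunks` is a hypothetical threshold for "too many chunks".

```python
from typing import List


def _word_count(units: List[str]) -> int:
    return sum(len(u.split()) for u in units)


def _attach(unit: str, current: List[str], limit: int) -> bool:
    """Decide whether `unit` joins the current chunk; the conditions are OR-ed (assumption)."""
    if not current:
        return False
    not_too_long = _word_count(current) + len(unit.split()) <= limit
    non_alpha_or_short = not any(ch.isalpha() for ch in unit) or len(unit) == 1
    # Assumed reading of the genitive rule: keep "of ..." with the unit to its left.
    genitive = " ".join(current).startswith("of ")
    return not_too_long or non_alpha_or_short or genitive


def chunk_pass(units: List[str], limit: int) -> List[str]:
    """One right-to-left pass over the units; returns chunks in left-to-right order."""
    chunks: List[str] = []
    current: List[str] = []
    for unit in reversed(units):
        if _attach(unit, current, limit):
            current.insert(0, unit)
        else:
            if current:
                chunks.insert(0, " ".join(current))
            current = [unit]  # initialize a new chunk with this unit
    if current:
        chunks.insert(0, " ".join(current))
    return chunks


def chunk_sentence(units: List[str], max_chunks: int) -> List[str]:
    """Start with limit = word count / 4 and relax it while there are too many chunks."""
    n_words = _word_count(units)
    limit = max(1, n_words // 4)
    while True:
        chunks = chunk_pass(units, limit)
        if len(chunks) <= max_chunks or limit >= n_words:
            return chunks
        limit += 1
```

Take the 18-word example sentence split into seven phrase units. With `max_chunks=4`, the first pass (limit 4) already gives four chunks. With `max_chunks=2`, the limit is raised until the genitive and length rules merge the sentence into two chunks.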
Candidate Selection:
12-gram LM trained with
• KenLM
• DGT-Translation Memory corpus (Steinberger et al., 2013)
3.1 million legal domain sentences
• Sentences scored with the query program from KenLM
Test data
• 1581 random sentences from the JRC-Acquis corpus
• ACCURAT balanced evaluation corpus
Linguistically Motivated Chunks
CICLing 2016
Linguistically Motivated Chunks
Simple chunks:
• Recently
• there
• has been an increased interest in the automated discovery of equivalent expressions in different languages
• .
Linguistically motivated chunks:
• Recently there has been an increased interest
• in the automated discovery of equivalent expressions
• in different languages
• .
[Figure: perplexity and BLEU-HY (with linear trend) by training epoch]
Neural Language Models
[Figure: perplexity and BLEU (with linear trend) by training epoch]
System BLEU
(G = Google, B = Bing, H = Hugo.lv, Y = Yandex)
Whole translations, G+B (Rikters 2015) 17.70
Simple chunks, G+B (Rikters and Skadiņa 2016a) 17.95
Linguistic chunks, G+B (Rikters and Skadiņa 2016b) 18.29
Linguistic chunks, G+B+H+Y (Rikters and Skadiņa 2016b) 19.21
+ Char-RNN neural language model (Rikters 2016d) 19.51
Some Results
Baselines BLEU
Bing 17.43
Google 17.63
Hugo.lv 17.14
Yandex 16.04
• Start page
• Translate with online systems
• Input translations to combine
• Input translated chunks
• Settings
• Translation results
• Input source sentence
Interactive MS MT
(Rikters 2016a)
Translation of Multi-Word Expressions (MWEs)
Find & Mark MWE candidates in corpora
• Pre-process monolingual texts with TreeTagger
• Extract MWE candidate lists from corpora
• Mark MWE candidates in text
• Find translation equivalents for monolingual MWE candidates with MPAligner
Monolingual MWE extraction and annotation
MWE alignment
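The monolingual extraction and marking steps could look roughly like the sketch below. TreeTagger writes one token per line, with the token, tag, and lemma separated by tabs. The two-word tag patterns in `PATTERNS`, the use of lemma pairs as candidate keys, and the underscore marking are hypothetical choices for illustration. They are not necessarily what the experiments used. Alignment with MPAligner is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical two-word tag patterns (TreeTagger English tags) for MWE candidates.
PATTERNS = {("JJ", "NN"), ("JJ", "NNS"), ("NN", "NN"), ("NN", "NNS")}


def read_treetagger(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse 'token<TAB>tag<TAB>lemma' lines into (token, tag, lemma) triples."""
    triples = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            triples.append((fields[0], fields[1], fields[2]))
    return triples


def extract_candidates(tagged: List[Tuple[str, str, str]]) -> Counter:
    """Count lemma pairs whose tag pair matches one of PATTERNS."""
    counts: Counter = Counter()
    for (_, tag1, lemma1), (_, tag2, lemma2) in zip(tagged, tagged[1:]):
        if (tag1, tag2) in PATTERNS:
            counts[(lemma1, lemma2)] += 1
    return counts


def mark_candidates(tagged: List[Tuple[str, str, str]], candidates: Counter) -> str:
    """Rebuild the text, gluing each candidate occurrence into one token with '_'."""
    out, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and (tagged[i][2], tagged[i + 1][2]) in candidates:
            out.append(tagged[i][0] + "_" + tagged[i + 1][0])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)
```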
SMT Experiments
• Adding data to the parallel corpora
• Adding a second translation table
• Adding a sixth feature to the translation table
• Using the Jaccard index for translation probabilities
• Using a Levenshtein distance-based similarity metric for translation probabilities
Method BLEU
Baseline 62.23
Baseline + MWE training data 62.10
Baseline + 2nd translation table 62.04
Baseline + 6th feature 62.37
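The slides do not say how the Jaccard index or the Levenshtein distance were turned into translation probabilities. They also do not say whether the comparison is over characters, words, or lemmas. The sketch below covers only the two metrics, assuming a character-level comparison of a source MWE and a candidate translation. Both hypothetical helpers return similarities in [0, 1]; how those scores map to probabilities is left open.

```python
def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard index of the character n-gram sets of two strings."""
    def grams(s: str) -> set:
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    ga, gb = grams(a), grams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```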
MWEs in Neural Machine Translation
English-Latvian English-Czech
Training
Validation
2.5M 1xMWE 2.5M 2xMWE 5M 2xMWE 5M
1M 1xMWE 1M 2xMWE 2M 2xMWE 0.5M
• Matīss Rikters
"Multi-system machine translation using online APIs for English-Latvian"
The Fourth Workshop on Hybrid Approaches to Translation (2015)
• Matīss Rikters and Inguna Skadiņa
"Syntax-based multi-system machine translation"
The 10th edition of the Language Resources and Evaluation Conference (2016a)
• Matīss Rikters and Inguna Skadiņa
"Combining machine translated sentence chunks from multiple MT systems"
The 17th International Conference on Computational Linguistics and Intelligent Text Processing (2016b)
• Matīss Rikters
"K-translate – interactive multi-system machine translation"
12th International Baltic Conference on Databases and Information Systems (2016a)
• Matīss Rikters
"Searching for the Best Translation Combination Across All Possible Variants"
The 7th Conference on Human Language Technologies - the Baltic Perspective (2016b)
• Matīss Rikters
"Interactive Multi-System Machine Translation with Neural Language Models"
IOS Press Ebook (2016c)
• Matīss Rikters
"Neural Network Language Models for Candidate Scoring in Hybrid Multi-System Machine Translation"
The Sixth Workshop on Hybrid Approaches to Translation (2016d)
Publications
CICLing 2016
• Matīss Rikters and Ondřej Bojar
"Handling Multi-Word Expressions in Neural Machine Translation"
Publications in Progress
http://ej.uz/ChunkMT
http://ej.uz/SyMHyT
http://ej.uz/MSMT
http://ej.uz/chunker
http://ej.uz/NeuralLM
Code on GitHub
Teaching
• Supervised multiple course papers, qualification theses and bachelor theses
• Average grade 8.67
• Student curator
Attended Summer / Winter Schools
• Machine Translation Marathon 2015
• Deep Learning For Machine Translation 2015
• ParseME 2nd Training School
• Neural Machine Translation Marathon 2016
Other Academic Activities
Future Work
• Complete experiments and inspect results for English-Estonian
• Win the WMT17 news translation task
• At least for English-Latvian
• At least beat Tilde
• Perform chunking on the target side
• Get chunks from dependency parses
• Complete PhD thesis draft
• Pass final exams
• Experiment with other types of LMs for candidate selection
• Factored Language Models (POS tag + lemma)
• Convolutional Neural Network Language Models
• Perform candidate selection using MT quality estimation
• QuEst++ (Specia et al., 2015)
• SHEF-NN (Shah et al., 2015)
Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado, 2010.
Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192 (2010): 125-132.
Rikters, Matīss, and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016 (2016a).
Rikters, Matīss, and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016 (2016b).
Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 2006.
Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
References
Thank you!

Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Searching for the Best MT Combination

  • 1. Matīss Rikters. Searching for the Best Machine Translation Combination. Tartu, Estonia, 22.03.2017
  • 2. Contents: Machine Translation • Hybrid Machine Translation • Methods I used: • a count-based language model for candidate selection from whole translations • combining translations of sentence chunks • combining translations of linguistically motivated chunks • a character-level neural language model for candidate selection • A graphical implementation of the methods • Translation of multiword expressions • Other academic activities • Future plans
  • 3. Machine Translation • Machine translation (MT) is a sub-field of natural language processing that investigates the use of computers to translate text from one language to another • Statistical MT (SMT) consists of subcomponents that are separately engineered to learn how to translate from vast amounts of translated text • Rule-based MT (RBMT) is based on linguistic information covering the main semantic, morphological, and syntactic regularities of the source and target languages • Neural MT (NMT) consists of a large neural network whose weights are trained jointly to maximize translation performance
  • 4. Automatic Evaluation of MT: BLEU • One of the first metrics to report high correlation with human judgments • One of the most popular metrics in the field • The closer MT output is to a professional human translation, the better it is • Scores a translation on a scale of 0 to 100
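To make the BLEU idea on slide 4 concrete, here is a minimal sketch of sentence-level BLEU. It computes clipped ("modified") n-gram precisions for n = 1 to 4, takes their geometric mean, applies a brevity penalty and scales the result to 0 to 100. It assumes whitespace tokenization, a single reference and no smoothing. The function names are illustrative only; this is not the implementation behind the scores reported in these slides.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU on a 0-100 scale (whitespace tokens)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Modified precision: clip each n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0:  # no overlap at this order (or hypothesis too short)
            return 0.0
        log_precisions += math.log(matches / sum(hyp_counts.values()))
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(log_precisions / max_n)


print(sentence_bleu("the cat is on the mat", "the cat is on the red mat"))
```

In practice BLEU is usually computed at the corpus level. Clipped counts are summed over all sentences before the geometric mean is taken, and tokenization is standardized (for example with sacreBLEU). The unsmoothed sentence-level version above drops to 0 whenever any n-gram order has no match.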
  • 5. Literature Review: Hybrid Machine Translation. Statistical rule generation • Rules for RBMT systems are generated from training corpora. Multi-pass • Process data through RBMT first, and then through SMT. Multi-system hybrid MT • Multiple MT systems run in parallel • SMT + RBMT (Ahsan and Kolachina, 2010) • Confusion networks (Barrault, 2010) + a neural network model (Freitag et al., 2015) • SMT + EBMT + TM + NE (Pal et al., 2014) • Recursive sentence decomposition (Mellebeek et al., 2006)
  • 6. Combining Translations. Combining whole translations • Translate the full input sentence with multiple MT systems • Choose the best translation as the output. Combining translations of sentence chunks • Split the sentence into smaller chunks; the chunks are the top-level subtrees of the sentence's syntax tree • Translate each chunk with multiple MT systems • Choose the best translated chunks and combine them
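Slide 6 defines chunks as the top-level subtrees of the sentence's syntax tree. The sketch below shows what that means on a Penn-style bracketed parse, the kind of output a constituency parser produces (the reference list cites Petrov et al., 2006). It assumes the parse is already available as a string and that single-child wrappers such as ROOT should be skipped. The parser and helper names are hypothetical, written only for illustration.

```python
def parse_tree(text):
    """Parse a Penn-style bracketed tree into (label, children) tuples.

    Leaves are plain word strings. An unlabeled outer bracket, as in
    "( (S ...) )", gets the empty label "".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":          # a leaf word
            pos += 1
            return tokens[pos - 1]
        pos += 1                        # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                        # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """All words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def top_level_chunks(tree):
    """Skip single-child wrappers such as ROOT, then return one chunk per child."""
    while (not isinstance(tree, str) and len(tree[1]) == 1
           and not isinstance(tree[1][0], str)):
        tree = tree[1][0]
    if isinstance(tree, str):
        return [tree]
    return [" ".join(leaves(child)) for child in tree[1]]


parse = ("(ROOT (S (ADVP (RB Recently)) (NP (EX there)) (VP (VBZ has) "
         "(VP (VBN been) (NP (DT an) (JJ increased) (NN interest)))) (. .)))")
print(top_level_chunks(parse_tree(parse)))
# ['Recently', 'there', 'has been an increased interest', '.']
```

The whole verb phrase ends up as one chunk, while the adverb, the subject and the final period each become their own. This is the unevenness that the refined chunking on slide 10 addresses.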
  • 7. Candidate Selection. KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$: $p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$, where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated from this probability: given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$, as $PP(q) = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 q(x_i)}$.
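The backoff formula and the perplexity definition on slide 7 can be sketched in a few lines. This is a minimal illustration and not KenLM itself. The model is assumed to be a pair of hypothetical dictionaries holding log10 probabilities and log10 backoff penalties (the quantities an ARPA file stores). The toy numbers and function names are invented. Working in base 10 gives the same perplexity as the base-2 definition, because both equal the inverse geometric mean of the token probabilities.

```python
def log10_prob(word, history, log_prob, log_backoff):
    """log10 p(word | history), backing off to the longest observed history.

    log_prob maps n-gram tuples to log10 probabilities; log_backoff maps
    context tuples to log10 backoff penalties b(.), as in an ARPA file.
    """
    history = tuple(history)
    penalty = 0.0
    for start in range(len(history) + 1):
        context = history[start:]          # w_{start+1} .. w_{n-1}
        if context + (word,) in log_prob:
            return penalty + log_prob[context + (word,)]
        # Not observed with this context: pay b(context) and shorten it.
        penalty += log_backoff.get(context, 0.0)
    raise KeyError(f"word not in vocabulary: {word!r}")


def perplexity(words, log_prob, log_backoff, order=3):
    """Sentence perplexity 10 ** (-(1/N) * sum of log10 q), counting </s> in N."""
    tokens = list(words) + ["</s>"]
    history = ["<s>"]
    total = 0.0
    for word in tokens:
        context = history[max(0, len(history) - (order - 1)):]
        total += log10_prob(word, context, log_prob, log_backoff)
        history.append(word)
    return 10 ** (-total / len(tokens))


# Hypothetical toy trigram model (numbers are made up for illustration).
LOG_PROB = {
    ("<s>", "the"): -0.2,
    ("the",): -0.5,
    ("agreement",): -1.0,
    ("the", "agreement"): -0.3,
    ("</s>",): -0.7,
}
LOG_BACKOFF = {("<s>",): -0.1, ("the",): -0.2, ("agreement",): -0.4}

print(perplexity(["the", "agreement"], LOG_PROB, LOG_BACKOFF))
```

Here "</s>" is never seen after "agreement", so the model pays the backoff penalty b("agreement") and falls back to the unigram probability of "</s>". This is the product term in the formula above.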
  • 8. Whole Translations: Sentence tokenization → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best translation → Output
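The selection step in the slide 8 pipeline amounts to keeping the system output with the lowest language-model perplexity. The sketch below assumes the translations have already been collected in a dictionary keyed by system name. As a stand-in for a real LM it uses a hypothetical add-one smoothed unigram scorer built from a one-line toy corpus. With KenLM's Python bindings, the `perplexity` method of a loaded `kenlm.Model` could presumably be passed instead; that was not tried here.

```python
import math
from collections import Counter


def unigram_perplexity(corpus):
    """Hypothetical toy scorer: add-one smoothed unigram perplexity."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    vocab = len(counts) + 1            # one extra slot for unseen words

    def score(sentence):
        words = sentence.split()
        if not words:
            return math.inf
        log2_sum = sum(math.log2((counts[w] + 1) / (total + vocab)) for w in words)
        return 2 ** (-log2_sum / len(words))

    return score


def select_best_translation(candidates, perplexity):
    """Return (system, translation) with the lowest perplexity."""
    if not candidates:
        raise ValueError("no candidate translations")
    system = min(candidates, key=lambda name: perplexity(candidates[name]))
    return system, candidates[system]


score = unigram_perplexity("the agreement enters into force on the day of its signature")
outputs = {
    "google": "the agreement enters into force",
    "bing": "agreement comes in power",
    "letsmt": "the treaty enters in force",
}
print(select_best_translation(outputs, score))
# ('google', 'the agreement enters into force')
```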
  • 9. Chunks: Sentence tokenization → Syntactic analysis → Sentence chunking → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best chunks → Sentence recomposition → Output
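One way to sketch the "selection of the best chunks" and "sentence recomposition" steps of slide 9 is an exhaustive search. It enumerates every way of picking one system's translation per chunk, recomposes each candidate sentence and keeps the one the scorer likes best. This follows the spirit of the 2016b publication title ("Searching for the Best Translation Combination Across All Possible Variants"), but the exact procedure there may differ. The scorer below is a hypothetical stand-in for an LM that counts word pairs never seen in a tiny corpus. The number of combinations grows as (number of systems)^(number of chunks), so exhaustive search is only practical for a handful of chunks.

```python
from itertools import product


def best_combination(chunk_candidates, score):
    """Try every way of picking one system per chunk; keep the lowest score.

    chunk_candidates: list with one {system: chunk_translation} dict per chunk.
    score: callable on the recomposed sentence, lower is better.
    """
    best = None
    for choice in product(*(list(c.items()) for c in chunk_candidates)):
        sentence = " ".join(text for _, text in choice)
        value = score(sentence)
        if best is None or value < best[0]:
            best = (value, [system for system, _ in choice], sentence)
    return best


# Hypothetical stand-in for an LM: count word pairs never seen in a tiny corpus.
corpus = "the agreement enters into force on the day of its signature".split()
seen = set(zip(corpus, corpus[1:]))


def unseen_bigrams(sentence):
    words = sentence.split()
    return sum((a, b) not in seen for a, b in zip(words, words[1:]))


chunks = [
    {"google": "the agreement", "bing": "the treaty"},
    {"google": "comes into force", "bing": "enters into force"},
    {"google": "on the day of its signature", "bing": "at the day of signing"},
]
print(best_combination(chunks, unseen_bigrams))
# (0, ['google', 'bing', 'google'], 'the agreement enters into force on the day of its signature')
```

Scoring the recomposed sentence, rather than each chunk in isolation, also lets the scorer judge the word pairs that span chunk boundaries (here "agreement enters" and "force on").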
  • 10. Linguistically Motivated Chunks (CICLing 2016). An advanced approach to chunking: • Traverse the syntax tree bottom up, from right to left • Add a word to the current chunk if: • the current chunk is not too long (sentence word count / 4) • the word is non-alphabetic or only one symbol long • the word begins a genitive phrase («of») • Otherwise, start a new chunk with the word • If chunking produces too many chunks, repeat the process, allowing more than (sentence word count / 4) words per chunk. Candidate selection: • a 12-gram LM trained with KenLM on the DGT-Translation Memory corpus (Steinberger, 2011), 3.1 million legal-domain sentences • sentences scored with the query program from KenLM. Test data: • 1581 random sentences from the JRC-Acquis corpus • the ACCURAT balanced evaluation corpus
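The chunking rules on slide 10 can be approximated on a flat token list. This is a deliberate simplification: the slide traverses the syntax tree, while the sketch below only walks the words from right to left. It also assumes that the listed conditions are alternatives (any one of them lets a word join the current chunk). The slide does not say how many chunks count as "too many", so `max_chunks` is a hypothetical parameter. The function name is illustrative only.

```python
def chunk_words(words, max_chunks=4):
    """Flat-token approximation of the right-to-left chunking heuristic.

    Conditions are treated as alternatives (an assumption); max_chunks is a
    hypothetical threshold for "too many chunks".
    """
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    limit = max(1, len(words) // 4)            # sentence word count / 4
    while True:
        chunks, current = [], []
        for word in reversed(words):           # right to left
            attach = (len(current) < limit     # chunk not too long yet
                      or not word.isalpha()    # punctuation, numbers, ...
                      or len(word) == 1        # one-symbol words
                      or word.lower() == "of") # start of a genitive phrase
            if attach:
                current.insert(0, word)
            else:
                chunks.append(current)
                current = [word]
        if current:
            chunks.append(current)
        chunks.reverse()
        if len(chunks) <= max_chunks or limit >= len(words):
            return [" ".join(chunk) for chunk in chunks]
        limit += 1                             # allow longer chunks and retry


sentence = ("Recently there has been an increased interest in the automated "
            "discovery of equivalent expressions in different languages .")
print(chunk_words(sentence.split()))
```

On this sentence the flat version needs a second pass (the first produces five chunks) and then cuts "equivalent" off from "expressions". Walking the syntax tree, as the slide describes, is meant to avoid this kind of split.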
  • 11. Linguistically Motivated Chunks. Simple chunks: • Recently • there • has been an increased interest in the automated discovery of equivalent expressions in different languages • . Linguistically motivated chunks: • Recently there has been an increased interest
  • 12. Neural Language Models [Figure: two charts of language-model perplexity and BLEU against training epoch; the first shows BLEU-HY with a linear trend line, the second BLEU with a linear trend line]
  • 13. Some Results (G = Google, B = Bing, H = Hugo.lv, Y = Yandex)
      System                                                     BLEU
      Whole translations – G+B (Rikters 2015)                    17.70
      Simple chunks – G+B (Rikters and Skadiņa 2016a)            17.95
      Linguistic chunks – G+B (Rikters and Skadiņa 2016b)        18.29
      Linguistic chunks – G+B+H+Y (Rikters and Skadiņa 2016b)    19.21
      + Char-RNN neural language model (Rikters 2016d)           19.51

      Baseline                                                   BLEU
      Bing                                                       17.43
      Google                                                     17.63
      Hugo.lv                                                    17.14
      Yandex                                                     16.04
  • 14. Interactive MS MT (Rikters 2016a): Start page • Translate with online systems • Input translations to combine • Input translated chunks • Settings • Translation results • Input source sentence
  • 15. Translation of Multi-Word Expressions (MWEs): find and mark MWE candidates in corpora. Monolingual MWE extraction and annotation: • Pre-process monolingual texts with TreeTagger • Extract MWE candidate lists from corpora • Mark MWE candidates in text. MWE alignment: • Find translation equivalents for monolingual MWE candidates with MPAligner. SMT experiments: • Adding data to the parallel corpora • Adding a second translation table • Adding a sixth feature to the translation table • Using the Jaccard index for translation probabilities • Using a Levenshtein distance-based similarity metric for translation probabilities
      Method                              BLEU
      Baseline                            62.23
      Baseline + MWE training data        62.10
      Baseline + 2nd translation table    62.04
      Baseline + 6th feature              62.37
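Slide 15 names two string-similarity measures, the Jaccard index and a Levenshtein distance-based similarity, as alternatives for translation probabilities. The slide does not say what units they compare or how scores become probabilities. The sketch below therefore assumes character bigrams for Jaccard and edit distance normalized by the longer string's length. The MWE pair in the usage line is only an illustrative input, and all function names are hypothetical.

```python
def char_bigrams(text):
    """Set of lowercase character bigrams in a string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| over character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions (a -> b)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """1 - distance / length of the longer string, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


pair = ("information system", "informācijas sistēma")
print(jaccard(*pair), levenshtein_similarity(*pair))
```

Both measures reward cognates and shared spelling, so they are most informative for language pairs or terminology where source and target MWEs look alike. Turning them into translation-table probabilities would need a further normalization step that the slide leaves open.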
  • 16. MWEs in Neural Machine Translation [Figure: training and validation data for English-Latvian and English-Czech; data set labels: 2.5M 1xMWE, 2.5M 2xMWE, 5M 2xMWE, 5M, 1M 1xMWE, 1M 2xMWE, 2M 2xMWE, 0.5M]
  • 17. Publications • Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian." The Fourth Workshop on Hybrid Approaches to Translation (2015) • Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation." The 10th edition of the Language Resources and Evaluation Conference (2016a) • Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." The 17th International Conference on Computational Linguistics and Intelligent Text Processing (2016b) • Matīss Rikters. "K-translate – interactive multi-system machine translation." 12th International Baltic Conference on Databases and Information Systems (2016a) • Matīss Rikters. "Searching for the Best Translation Combination Across All Possible Variants." The 7th Conference on Human Language Technologies - the Baltic Perspective (2016b) • Matīss Rikters. "Interactive Multi-System Machine Translation with Neural Language Models." IOS Press Ebook (2016c) • Matīss Rikters. "Neural Network Language Models for Candidate Scoring in Hybrid Multi-System Machine Translation." The Sixth Workshop on Hybrid Approaches to Translation (2016d)
  • 18. Publications in Progress • Matīss Rikters and Ondřej Bojar. "Handling Multi-Word Expressions in Neural Machine Translation."
  • 20. Other Academic Activities. Teaching: • Supervised multiple course, qualification, and bachelor theses • Average grade 8.67 • Student curator. Attended summer / winter schools: • Machine Translation Marathon 2015 • Deep Learning for Machine Translation 2015 • ParseME 2nd Training School • Neural Machine Translation Marathon 2016
  • 21. Future Work • Complete experiments and inspect results for English – Estonian • Win WMT17 news translation task • At least for English-Latvian • At least beat Tilde • Perform chunking on the target side • Get chunks from dependency parses • Complete PhD thesis draft • Pass final exams • Experiment with other types of LMs for candidate selection • Factored Language Models (POS tag + lemma) • Convolutional Neural Network Language Models • Perform candidate selection using MT quality estimation • QuEst++ (Specia et al., 2015) • SHEF-NN (Shah et al., 2015)
  • 22. References
    Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA-The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
    Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
    Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
    Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
    Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
    Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
    Pal, Santanu, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
    Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
    Rikters, Matīss, and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016 (2016a).
    Rikters, Matīss, and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016 (2016b).
    Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 2006.
    Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
    Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 219 (2010): 125-132.
    Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
    Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
    Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).