2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
1. Your Trained Moses SMT System doesn't work. What can you do?
Diego Bartolome, CEO tauyou <language technology>
diego.bartolome@tauyou.com
@diegobartolome
18. Decision 8: Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
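To make the WER / edit distance item concrete, here is a minimal sketch that compares raw MT output against its post-edited version at the word level. The function name and the whitespace tokenization are assumptions for illustration, not part of the original slides; a production setup would normally tokenize properly first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    post_edited = "the cat sat on the mat"
    mt_output = "the cat sat on mat"
    print(round(word_error_rate(post_edited, mt_output), 3))
```

Using the post-edited segment as the reference gives a rough proxy for post-editing effort, which can then be set against the measured post-editing time.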
23. Let's play with Moses
Best resource to start
www.statmt.org/moses
TAUS tutorial
www.translationautomation.com
tauyou slides
www.speakerdeck.com/tauyoucom
24. Everything is clear!
Gather TMs and other linguistic assets
Select domains
Train systems
BLEU score is great
… but …
Translation quality is awful
25. Why?
Not enough data
Too much data
Unclean TMs
Misalignments
Difficult language pairs
Selection of wrong parameters
Suboptimal techniques
27. Some steps
Maximum exploitation of existing assets
Source content optimization
Data selection and cleaning
Improvement of the models
Linguistic processing
Continuous improvement
28. Linguistic assets
Translation memory sharing
Clients, Partners, EU, UN, TAUS
Relevant on-line data retrieval
Advanced TM techniques
Sub-segment matching
Part-of-speech replacement
31. Data selection + cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, …
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data
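The cleaning checks listed above (length, punctuation, repetitions) can be sketched as a simple filter over segment pairs. This is only an illustration: the function name, the thresholds, and the end-of-sentence punctuation heuristic are assumptions, and the right values depend on the language pair. Moses also ships a `clean-corpus-n.perl` script for length-based filtering.

```python
def clean_segment_pairs(pairs, max_len=80, max_ratio=3.0):
    """Yield (source, target) pairs that pass basic sanity checks."""
    seen = set()
    for src, tgt in pairs:
        # Normalize whitespace before any comparison.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())

        if n_src == 0 or n_tgt == 0:
            continue  # empty side
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # length mismatch, a common sign of misalignment
        if (src[-1] in ".?!") != (tgt[-1] in ".?!"):
            continue  # one side ends a sentence, the other does not
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        yield src, tgt


if __name__ == "__main__":
    tm = [
        ("Hello world.", "Hola mundo."),
        ("Hello world.", "Hola mundo."),
        ("Yes", "Sí , claro que sí por supuesto"),
        ("Open the file", "Abra el archivo."),
    ]
    for pair in clean_segment_pairs(tm):
        print(pair)
```

The punctuation check is deliberately crude and will reject some valid pairs; it is meant as a starting point to be tuned against a manually reviewed sample.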
32. Model optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve recasing
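As an illustration of "filter the translation tables", the sketch below prunes a plain-text phrase table by probability and by number of candidates per source phrase. It assumes the usual Moses text layout, with fields separated by ` ||| ` and the third score being the direct phrase translation probability; verify this against your own table before relying on it. The function name, `min_prob`, and `top_n` are hypothetical choices, not Moses options.

```python
from collections import defaultdict


def prune_phrase_table(lines, min_prob=0.01, top_n=20, prob_index=2):
    """Keep the top_n most probable targets per source phrase."""
    by_source = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(" ||| ")
        if len(fields) < 3:
            continue  # malformed entry
        scores = [float(s) for s in fields[2].split()]
        if len(scores) <= prob_index:
            continue  # fewer scores than expected
        prob = scores[prob_index]
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line))

    kept = []
    for entries in by_source.values():
        # Highest probability first; the sort is stable for ties.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        kept.extend(line for _, line in entries[:top_n])
    return kept


if __name__ == "__main__":
    table = [
        "house ||| casa ||| 0.8 0.7 0.9 0.6 ||| 0-0 ||| 10 9 8",
        "house ||| hogar ||| 0.2 0.1 0.05 0.1 ||| 0-0 ||| 10 2 1",
        "house ||| , ||| 0.001 0.001 0.0005 0.001 ||| 0-0 ||| 10 900 1",
    ]
    for entry in prune_phrase_table(table, top_n=2):
        print(entry)
```

This loads the whole table into memory, which is fine for a quick experiment but not for large tables; after any pruning, the weights should be re-tuned on the tune set.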
33. Linguistic processing
In the source and/or target language
Grammar checking
Entity detection
Proper nouns, alphanumeric words, …
Compound word splitting
Sentence reordering
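One way to handle the alphanumeric words mentioned above is to mask them with placeholders before translation and restore them afterwards, so the engine never tries to translate a part number. This is a minimal sketch under assumptions: the placeholder format `__ENTn__`, the function names, and the "contains both a letter and a digit" rule are all hypothetical, and the engine must be set up to pass the placeholders through unchanged.

```python
def mask_alphanumerics(sentence):
    """Replace tokens mixing letters and digits with numbered placeholders."""
    masked, originals = [], []
    for token in sentence.split():
        core = token.strip(".,;:!?()")  # ignore surrounding punctuation
        has_digit = any(c.isdigit() for c in core)
        has_alpha = any(c.isalpha() for c in core)
        if has_digit and has_alpha:
            placeholder = f"__ENT{len(originals)}__"
            originals.append(core)
            masked.append(token.replace(core, placeholder))
        else:
            masked.append(token)
    return " ".join(masked), originals


def unmask(translation, originals):
    """Put the original tokens back in place of the placeholders."""
    for i, original in enumerate(originals):
        translation = translation.replace(f"__ENT{i}__", original)
    return translation


if __name__ == "__main__":
    source = "Replace filter XJ-200 (model A4) every 6 months."
    masked, entities = mask_alphanumerics(source)
    print(masked)
    # Stand-in for the MT step: a hand-written translation of the masked text.
    translated = "Sustituya el filtro __ENT0__ (modelo __ENT1__) cada 6 meses."
    print(unmask(translated, entities))
```

The same pattern extends to proper nouns if a named-entity list or tagger is available for the source language.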
34. Life is about the people you meet and the things you create with them.
So go out and start creating
Part of the Holstee Manifesto