In this presentation by tauyou and Prompsit, we explain the basics of Neural Machine Translation and then apply it to two use cases: one for a generic engine and another for a domain-specific engine. Results show that, although Neural Machine Translation is promising, there is still quite a lot of work to do to make it a viable alternative in real use cases.
(LocWorld Dublin, June 2016)
1. Beyond the Hype
of Neural Machine
Translation
Tauyou & Prompsit
(Diego) dbc@tauyou.com | (Gema) gramirez@prompsit.com
2. Why neural nets?
"artificial neural networks [...] are able to be trained from examples without the need for a thorough understanding of the task in hand, and able to show surprising generalization performance and predicting power"
Mikel L. Forcada (Neural Networks: Automata and Formal Models of Computation)
3. Why neural nets in MT now?
MT maturity
● MT is widely used (with plans to use it everywhere)
● MT for some languages is still not good enough (though it is for others)
● RBMT, SMT and hybrid MT approaches widely exploited
Resources availability
● Computational power available and cheap (GPUs)
● Deep learning algorithms and frameworks available
● Data to learn from also available (corpora)
4. So, why not?
Promising results from the WMT16 competition: all the best systems are NMT ones.

           SMT          NMT
         BLEU  TER    BLEU  TER
en-fi*   14.8  0.76   17.8  0.72
en-ro    27.4  0.61   28.7  0.60
en-ru    24.0  0.68   26.0  0.65
en-de    31.4  0.58   34.8  0.54
en-cz    24.1  0.67   26.3  0.63

* en-fi are Prompsit's + DCU systems
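BLEU, the metric reported in the table above, is a clipped n-gram precision score with a brevity penalty. The following is a simplified sketch of the idea (unigrams and bigrams only), not a replacement for standard tools such as sacreBLEU; the function name and example sentences are made up for illustration.

```python
# Simplified BLEU: clipped n-gram precision up to bigrams, plus
# a brevity penalty for hypotheses shorter than the reference.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(hyp, ref, max_n=2):
    hyp, ref = hyp.split(), ref.split()
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())          # clipped n-gram matches
        total = max(sum(h.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # avoid log(0)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(simple_bleu("the cat sat on the mat",
                        "the cat sat on the mat"), 2))  # 1.0
```

A perfect match scores 1.0; real systems, as in the table, land well below that. TER, the other reported metric, instead counts edit operations, so lower is better.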
5. Neural nets are...
● ...computational models inspired by biology
● ...playing increasingly key roles in graphics and pattern recognition
● ...experiencing a resurgence thanks to hardware and deep learning
● ...made of encoding/decoding "neurons"
● ...applied to translation (= neural MT = NMT):
  ○ encode source-language (SL) words as vectors that represent the relevant information
  ○ decode vectors into words, preserving syntactic and semantic information in the target language (TL)
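The encode/decode idea above can be illustrated with a deliberately tiny toy: source words become vectors, and target words are recovered by nearest-neighbour lookup in a target "embedding" table. All vectors, vocabularies and the word "värld" alignment below are made up for the example; a real NMT encoder-decoder learns these representations with recurrent networks.

```python
# Toy sketch of encode (words -> vectors) and decode (vectors -> words).
# Not a real NMT system: embeddings are hand-picked, not learned.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical source (English) word vectors.
SRC_EMB = {
    "hello": (1.0, 0.0),
    "world": (0.0, 1.0),
}

# Hypothetical target (Swedish) word vectors, roughly aligned with the
# source space so that nearest-neighbour decoding recovers translations.
TGT_EMB = {
    "hej":   (0.9, 0.1),
    "värld": (0.1, 0.9),
}

def encode(words):
    """Encode each source word as its vector."""
    return [SRC_EMB[w] for w in words]

def decode(vectors):
    """Decode each vector to the most similar target word."""
    return [max(TGT_EMB, key=lambda w: dot(TGT_EMB[w], v)) for v in vectors]

print(decode(encode(["hello", "world"])))  # ['hej', 'värld']
```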
6. NMT requires...
● Hardware: raw CPUs (10x) or a GPU (times get shorter with GPUs)
● Software: a deep learning framework (Theano, Torch, etc.) + NMT libraries
● Data: bilingual corpora (monolingual corpora for the LM only)
● Learning & (early) stopping: translation models are created iteratively
● Picking a model: evaluation and selection of the best model(s)
● Translating: the model(s) are used to translate
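The last three steps above (iterative learning, early stopping, and model selection) can be sketched as a single loop. The functions `train_one_epoch` and `validation_bleu` below are hypothetical stand-ins for the real NMT toolkit calls; only the control flow is the point.

```python
# Minimal sketch of the train / early-stop / select-best-model loop.
# train_one_epoch and validation_bleu are toy stand-ins, not real APIs.

def train_with_early_stopping(max_epochs=20, patience=3):
    best_score, best_model, stale = float("-inf"), None, 0
    for epoch in range(max_epochs):
        model = train_one_epoch(epoch)      # iteratively create models
        score = validation_bleu(model)      # evaluate each checkpoint
        if score > best_score:              # keep the best model so far
            best_score, best_model, stale = score, model, 0
        else:
            stale += 1
            if stale >= patience:           # early stopping
                break
    return best_model, best_score

# Toy stand-ins: validation score improves until epoch 5, then plateaus.
def train_one_epoch(epoch):
    return f"model-{epoch}"

def validation_bleu(model):
    epoch = int(model.split("-")[1])
    return min(epoch, 5) * 0.1

model, score = train_with_early_stopping()
print(model, round(score, 1))  # model-5 0.5
```

With `patience=3`, training stops three epochs after the score stops improving, and the checkpoint with the best validation score is the one kept for translation.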
8. Applying NMT to generic and in-domain use cases
Generic English -- Swedish SMT vs. NMT
● Same generic corpus (8M segments), same training and test sets
● SMT: Moses-based with no tuning, on CPU
● NMT: Theano-based Groundhog NMT toolkit, on GPU
Domain-specific English -- Norwegian SMT vs. NMT
● Same in-domain corpus (800K segments), same training and test sets
● SMT: Moses-based + tuning, on CPU
● NMT: Theano-based Groundhog NMT toolkit, on GPU
9. Comparison for generic English - Swedish
                          SMT                       NMT
Training time             48 hours (CPU)            2 weeks (GPU)
Translation time          00:12:35 (866 segments)   01:38:47 (866 segments)
CPU usage in translation  56% (CPU)                 100% (CPU)
Disk space                37.7 GB                   9.1 GB
BLEU score                0.440                     0.404
Identical matches         19.33% (161/866)          12% (104/866)
Edit distance similarity  0.78                      0.746
10. Comparison for in-domain English - Norwegian
                          SMT                         NMT
Training time             1.8 hours (3 CPUs)          7 days (1 GPU)
Translation time          00:01:22 (1,000 segments)   02:08:00 (1,000 segments)
CPU usage in translation  56% (CPU)                   100% (CPU)
Disk space                2.3 GB                      6.5 GB
BLEU score                0.53                        0.62
Identical matches         27.76% (276/1000)           30% (300/1000)
Edit distance similarity  0.77                        0.83
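The "edit distance similarity" row in both tables is assumed here to be a normalised Levenshtein similarity, i.e. 1 - distance / max_length, so identical output scores 1.0. The sketch below shows that metric; it is an illustration, not the exact evaluation script behind the tables.

```python
# Normalised Levenshtein similarity between a hypothesis and a reference.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(hyp, ref):
    """1.0 for identical strings, 0.0 for completely different ones."""
    if not hyp and not ref:
        return 1.0
    return 1 - levenshtein(hyp, ref) / max(len(hyp), len(ref))

print(round(similarity("kitten", "sitting"), 2))  # 0.57
```

On this scale, the in-domain NMT system's 0.83 means its output needs noticeably fewer post-editing operations per segment than the SMT system's 0.77.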
11. Conclusions SMT vs. NMT: technical insight
                          SMT       NMT
Disk space                Larger    Smaller
CPU during translation    Lower     Higher
RAM during translation    Higher    Lower
Training speed            Faster    Slower (can be optimized by hardware)
Translation speed         Faster    Slower (can be optimized by hardware)
13. Final conclusions
● NMT is a new big player in MT:
  ○ Research is now focusing heavily on NMT: it already outperforms SMT in many cases
  ○ Use case results: with little effort, it is on par with SMT
● Hardware requirements are more demanding for NMT: a higher budget is needed
● Translators' feedback: SMT is still better
14. Final conclusions
● SMT, and other approaches, are still more robust and alive
  ○ Better quality and consistency in MT output
  ○ Better ROI, especially for real-time translation applications where speed is critical
● Deep learning for other NLP applications?
  ○ Of course! Very much alive in quality estimation, terminology, sentiment analysis, etc.
15. Thanks!
Go raibh maith agaibh! (Irish: "Thank you!")
Tauyou & Prompsit
(Diego) dbc@tauyou.com | (Gema) gramirez@prompsit.com