Fantastic MT Engines and Where to Find Them
- 1. Intento
BEST vs. FIT TO PURPOSE
© Intento, Inc. / November 2019
linguistically best for my language pair
vs.
MT with a proper data protection and retention policy, proper level of custom terminology support, which trains on my linguistic assets with good ROI per my business goals, and works well with my source data quality, format and content type according to my subject matter experts
- 2. Intento
BEST vs. FIT TO PURPOSE
on-the-fly MT routing based on the historical data
automated procurement and vendor management
- 4. Intento
1. SELECT CANDIDATE ENGINES
GENERIC STOCK MODELS
Alibaba Amazon Baidu DeepL eBay Google GTCom IBM Kakao Microsoft Mirai ModernMT Niutrans Naver Omniscien PROMT Rozetta SAP SDL Sogou Systran Tencent Tilde Yandex Youdao
VERTICAL STOCK MODELS
Alibaba Baidu Cloud Translate Iconic Microsoft Omniscien PROMT SAP Systran
CUSTOM TERMINOLOGY SUPPORT
Amazon Baidu Google IBM Iconic Microsoft Rozetta SDL Systran
AUTO DOMAIN ADAPTATION
Globalese Google IBM Kantan Microsoft ModernMT Omniscien SDL Systran
MANUAL DOMAIN ADAPTATION
Alibaba Baidu Cloud Translate Iconic Omniscien PangeaMT Prompsit PROMT SDL Systran Tilde Yandex
Yandex
Standalone commercial MT products with an API. All product names, trademarks and registered trademarks are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.
- 6. Intento
2. IMPROVING
ENGINES
data cleaning
TM training
glossaries
sentence scores
40-60% of "live" TM is not suitable for MT
linguistic glossaries need to be "compiled" for MT
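The slide does not say which checks separate usable TM segments from unusable ones. As a rough illustration only, the sketch below applies a few common sanity filters (empty sides, untranslated segments, extreme length ratios, exact duplicates). The function name `clean_tm` and the `max_len_ratio` threshold are hypothetical and not taken from the deck.

```python
def clean_tm(pairs, max_len_ratio=3.0):
    """Keep only (source, target) pairs that pass simple sanity checks.

    The checks and the threshold are illustrative assumptions,
    not the criteria used in the presentation.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # segment was probably left untranslated
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_len_ratio:
            continue  # lengths differ too much, likely misaligned
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    tm = [
        ("Hello", "Bonjour"),
        ("Hello", "Bonjour"),
        ("", "x"),
        ("OK", "OK"),
        ("Hi", "Une phrase beaucoup trop longue pour cette source"),
    ]
    print(clean_tm(tm))
```

Real TM cleaning usually goes further (language identification, tag and placeholder checks, alignment scoring); the point here is only that filtering happens before training.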
- 7. Intento
2. IMPROVING
ENGINES
data cleaning
TM training
glossaries
sentence scores
Different data volume and data quality requirements
Different performance of baseline models
July 2018 January 2019
- 10. Intento
CORPUS SCORES TO FIND TOP-RUNNERS
lack of correlation indicates certain types of errors
statistically significant rapid drop-off identifies top-runners
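One simple way to read "statistically significant drop-off" is to keep every engine whose corpus score cannot be distinguished from the best one. The sketch below assumes each engine comes with a mean corpus score and a confidence-interval half-width (for example from bootstrap resampling), and keeps engines whose interval reaches the lower bound of the leader. The function name, data layout and numbers are hypothetical; the deck does not describe the actual test used.

```python
def top_runners(scores):
    """Return engines statistically indistinguishable from the best one.

    scores maps engine name -> (mean_score, ci_half_width).
    An engine is kept if the upper end of its interval reaches the
    lower end of the leader's interval (an illustrative criterion).
    """
    best_mean, best_ci = max(scores.values(), key=lambda s: s[0])
    threshold = best_mean - best_ci
    kept = [name for name, (mean, ci) in scores.items() if mean + ci >= threshold]
    # Best engine first
    return sorted(kept, key=lambda name: -scores[name][0])


if __name__ == "__main__":
    example = {
        "engine_a": (0.80, 0.01),
        "engine_b": (0.79, 0.01),
        "engine_c": (0.70, 0.01),
    }
    print(top_runners(example))
```

Only the engines that survive this cut would then go to the more expensive human review step.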
- 11. Intento
SENTENCE SCORES TO HELP REVIEWERS
hard: show NMT training flaws
controversial: expose NMT quirks
easy: to check how high scores are correlated with quality
typical: to measure PE effort
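The slide names four groups of segments but not how they are computed. The sketch below shows one possible reading, assuming each segment has one score per engine in the range 0 to 1: large disagreement between engines marks a segment as controversial, uniformly low scores as hard, uniformly high scores as easy, and everything else as typical. The function name and all thresholds are hypothetical.

```python
def bucket_segment(scores, low=0.3, high=0.8, spread=0.4):
    """Assign a segment to a review bucket from its per-engine scores.

    scores: one sentence-level score per engine, each in [0, 1].
    The thresholds are illustrative assumptions, not values from the deck.
    """
    if max(scores) - min(scores) >= spread:
        return "controversial"  # engines disagree strongly
    if max(scores) <= low:
        return "hard"  # every engine scores low
    if min(scores) >= high:
        return "easy"  # every engine scores high
    return "typical"


if __name__ == "__main__":
    for seg_scores in ([0.1, 0.2, 0.25], [0.2, 0.9, 0.5],
                       [0.9, 0.85, 0.95], [0.5, 0.6, 0.55]):
        print(seg_scores, bucket_segment(seg_scores))
```

Sampling a handful of segments from each bucket gives reviewers a small, targeted set instead of a random one.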
- 13. Intento
DIFFERENT SCENARIOS - DIFFERENT CHOICES
(even for the same language pair!)
PEMT / LSP
PEMT / Individual
Cross-Language Analysis and Retrieval (think eDiscovery)
Large-Scale Raw MT (think eCommerce)
Customer Support (think Global B2C)
Gisting and Inbound Content (think translation portals)
Large Enterprise
Government and Regulated Industries