STATE OF THE
MACHINE TRANSLATION
by Intento

July 2018
July 2018© Intento, Inc.
About
At Intento, we make Cloud Cognitive AI easy to discover,
access, and evaluate for a specific use.
—
Evaluation is a pain for everyone: to compare different services,
you have to sign a lot of contracts and integrate many APIs.
—
As we show in this report, the Machine Translation landscape is
complex, with a 4x difference in quality and a 195x difference in
price across pre-built models available from different vendors.
—
We deliver this overview report for FREE. To evaluate on your
own dataset, reach us at hello@inten.to
2
Intento MT Gateway - that’s how we run such evaluations
- Vendor-agnostic API
- Sync and async modes
- CLI tools and SDKs
- Works with files of any size
- Much faster due to hyper-threading
Get your API key at inten.to
3
Important highlights
Amazon and SAP went from preview to production
—
Amazon, Baidu, IBM, Microsoft, and PROMT increased language coverage
—
For 7 language pairs, the available MT quality has risen by more than 5% since Mar
2018: en-ko (▲25%), en-nl (▲11%), nl-en (▲14%), ru-de (▲8%), ja-fr
(▲10%), en-cs (▲5%), en-tr (▲7%) (see slide 15)
—
For 13 language pairs, the best MT provider has changed since Mar 2018:
en-zh, de-ru, ru-de, en-tr, en-pt, nl-en, en-nl, ja-en, zh-it, cs-en, en-cs,
en-it, ru-en
—
To get the best quality across 48 language pairs, one needs 9 engines (see
slide 18)
4
Overview
1 TRANSLATION QUALITY
2 PRICING
3 LANGUAGE COVERAGE
4 HISTORICAL PROGRESS
5 CONCLUSIONS
48 Language Pairs
19 Machine Translation Engines
5
Benchmark changes
since March 2018
Added 3 engines: ModernMT*, Alibaba**, Youdao**
—
Updated to new versions: IBM (v3/NMT), Microsoft (v3/NMT)
—
Updated SAP*** and Amazon from preview to public
—
Added detailed best and optimal engines chart (slides
18-19)
—
Added Pricing section (slide 21)
* evaluated on one language pair (cost-prohibitive)
** not yet available outside of China
*** not evaluated (cost-prohibitive and unstable)
6
Machine Translation Engines Evaluated*
* We have evaluated general-purpose Cloud Machine Translation services with prebuilt translation models, provided via API. Some vendors also provide
web-based, on-premise or custom MT engines, which may differ in all aspects from what we’ve evaluated.
- Alibaba Cloud Machine Translation
- Amazon Translate
- Baidu Translate API
- DeepL API
- Google Cloud Translation API
- GTCom YeeCloud MT
- IBM Watson NMT Language Translator
- IBM Watson SMT Language Translator
- Microsoft NMT Translator Text API
- Microsoft SMT Translator Text API
- ModernMT API
- PROMT Cloud API
- SAP Translation Hub
- SDL Language Cloud Translation Toolkit
- Systran PNMT Enterprise Server
- Systran REST Translation API
- Tencent Cloud TMT API (preview)
- Yandex Translate API
- Youdao Cloud Translation API
7
1 Translation Quality
1.1 Evaluation Methodology
1.2 Available MT Quality
1.3 Top-Performing Engines
1.4 Best General-Purpose Engines
1.5 Optimal General-Purpose Engines
1.6 Price vs. Performance
8
Evaluation methodology (I)
Translation quality is evaluated by computing LEPOR score
between reference translations and the MT output (Slide 11).
—
Currently, our goal is to evaluate the performance of translation
between the most popular languages (Slide 12).
—
We use public datasets from StatMT/WMT, CASMACAT News
Commentary and Tatoeba (Slide 13).
—
We have performed LEPOR metric convergence analysis to
identify the minimal viable number of segments in the dataset.
See Slide 14 for some details.
9
Evaluation methodology (II)
We judge that the MT quality of service A is better than that of
service B for the language pair C if:
- the mean LEPOR score of A is greater than the mean LEPOR score of B
for the pair C, and
- the lower bound of the LEPOR 95% confidence interval of A is
greater than the upper bound of the LEPOR 95% confidence
interval of B for the pair C. See Slide 14 for an example.
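The two-part rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the report: the slides do not say how the confidence interval is computed, so the normal approximation (mean ± 1.96 standard errors) is an assumption, and the function names and sample scores are hypothetical.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    mean = statistics.fmean(scores)
    # Standard error of the mean from the sample standard deviation.
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


def is_better(scores_a, scores_b):
    """True if A beats B on the mean and their 95% intervals do not overlap."""
    mean_a, low_a, _ = mean_ci95(scores_a)
    mean_b, _, high_b = mean_ci95(scores_b)
    return mean_a > mean_b and low_a > high_b


# Hypothetical per-segment scores for three services on one language pair.
service_a = [0.71, 0.69, 0.70, 0.72, 0.68] * 40
service_b = [0.61, 0.59, 0.60, 0.62, 0.58] * 40
service_c = [0.70, 0.69, 0.71, 0.72, 0.67] * 40

print(is_better(service_a, service_b))  # clear gap between the intervals
print(is_better(service_a, service_c))  # higher mean, but the intervals overlap
```

Under this rule a slightly higher mean is not enough: service A is only judged better than service C once their intervals separate, which is why several engines can share the top position for a pair.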
—
Different language pairs (and different datasets) impose different
translation complexity. To compare overall MT performance of
different services, we regularize LEPOR scores across all
language pairs (See Appendix A for more details).
10
LEPOR score
LEPOR: automatic machine translation evaluation metric
considering the enhanced Length Penalty, n-gram Position
difference Penalty and Recall
—
In our evaluation, we used hLEPOR_A v.3.1
(best metric from the ACL-WMT 2013 contest):
https://www.slideshare.net/AaronHanLiFeng/lepor-an-augmented-machine-translation-evaluation-metric-thesis-ppt
https://github.com/aaronlifenghan/aaron-project-lepor
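For reference, the general form of the metric can be written as follows. This is a sketch that assumes the definitions given in the LEPOR publications linked above; the weights and any v.3.1 specifics are not stated in this report. LP is the enhanced length penalty, NPosPenal the n-gram position difference penalty, HPR the harmonic mean of precision and recall, and N the number of segments in the test set.

```latex
\mathrm{hLEPOR} =
  \frac{w_{LP} + w_{NPosPenal} + w_{HPR}}
       {\dfrac{w_{LP}}{LP} + \dfrac{w_{NPosPenal}}{NPosPenal} + \dfrac{w_{HPR}}{HPR}},
\qquad
\mathrm{hLEPOR}_A = \frac{1}{N} \sum_{i=1}^{N} \mathrm{hLEPOR}_i
```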
LIKE BLEU,
BUT BETTER
11
48 Language Pairs
Language groups by web popularity*:
P1 - ≥ 2.0% websites
P2 - 0.5%-2% websites
P3 - 0.1-0.3% websites
P4 - <0.1% websites
* https://w3techs.com/technologies/overview/content_language/all
We focus on the en-P1, P1-en and (partially) P1-P1 pairs.
[Matrix of evaluated pairs; rows: source language, columns: target language, both in the order en ru ja de es fr pt it zh cs tr fi ro ko ar nl]
Number of evaluated target languages per source language:
en 15, ru 5, ja 3, de 5, es 2, fr 4, pt 1, it 3, zh 3, cs 1, tr 1, fi 1, ro 1, ko 1, ar 1, nl 1
12
Datasets
WMT-2013 (translation task, news domain)
en-es, es-en
WMT-2015 (translation task, news domain)
fr-en, en-fr
WMT-2016 (translation task, news domain)
cs-en, en-cs, de-en, en-de, ro-en, en-ro, fi-en, en-fi, ru-en, en-ru, tr-en, en-tr
WMT-2017 (translation task, news domain)
zh-en, en-zh
NewsCommentary-2011
en-ja, ja-en, en-pt, pt-en, en-it, it-en, ru-de, ru-es, ru-fr, ru-pt, ja-fr, de-ja, es-zh,
fr-ru, fr-es, it-pt, zh-it, en-ar, ar-en, en-nl, nl-en, fr-de, de-fr, de-it, ja-zh, zh-ja
Tatoeba
en-ko, ko-en
13
LEPOR Convergence
We used 900-3000 sentences per language pair. The metric stabilizes, and adding
more sentences from the same domain won’t change the outcome.
[Charts: regularised hLEPOR scores (aggregated mean with confidence interval) against the number of sentences, aggregated across all language pairs, plus examples for individual language pairs]
14
Available MT Quality
[Heatmap over the 48 language pairs; rows: source language, columns: target language (en ru ja de es fr pt it zh cs tr fi ro ko ar nl). Each cell shows:
- the maximal available hLEPOR score, as a colour band: >80 %, 70 %, 60 %, 50 %, 40 %, <40 %
- the minimal price for this quality, per 1M characters*: $$$ ≥$20, $$ $10-15, $ <$10
- the number of top-performing MT providers**]
Number of top-performing MT providers per source language, for its evaluated targets in column order:
en: 2 6 3 6 4 5 5 4 2 3 1 2 1 2 1
ru: 2 3 3 3 2
ja: 4 2 4
de: 5 3 3 4 4
es: 5 3
fr: 6 3 5 8
pt: 5
it: 8 2 5
zh: 4 4 4
cs: 4
tr: 4
fi: 2
ro: 3
ko: 1
ar: 5
nl: 1
* base pricing tier
** up to 5% worse than the leader, SMT and NMT counted separately
15
Sample pair analysis: English-Chinese (based on WMT-17 dataset)
LEPOR score | Providers | Price range (per 1M characters)
71 % | Tencent (preview) | -
70 % | Google, GTCom | $10-20
68 % | Baidu | $7
66.5 % | Systran PNMT, Amazon | $15-?
65 % | Microsoft, IBM NMT | $10-21.4
BEST QUALITY: Tencent (preview)
TOP 5%: Tencent, Google, GTCom, Baidu
BEST PRICE IN TOP 5%: Baidu
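The selection logic behind this slide (best engine, engines within 5% of the best, and the cheapest among those) can be sketched as follows. The function name is hypothetical, the 5% margin is taken relative to the best score (which reproduces the top-5% list on this slide), and where the slide gives only a price range for two engines the split between them is an assumption made for illustration.

```python
def rank_engines(engines, tolerance=0.05):
    """engines maps name -> (score in %, price in USD per 1M chars or None)."""
    best = max(engines, key=lambda name: engines[name][0])
    threshold = engines[best][0] * (1 - tolerance)
    top = [name for name, (score, _) in engines.items() if score >= threshold]
    # Engines without a public price cannot be picked as the optimal choice.
    priced = [name for name in top if engines[name][1] is not None]
    optimal = min(priced, key=lambda name: engines[name][1]) if priced else None
    return best, top, optimal


EN_ZH = {
    "Tencent (preview)": (71.0, None),  # no price listed on the slide
    "Google": (70.0, 20.0),             # assumed split of the $10-20 range
    "GTCom": (70.0, 10.0),              # assumed split of the $10-20 range
    "Baidu": (68.0, 7.0),
    "Systran PNMT": (66.5, None),       # price not given ("$15-?")
    "Amazon": (66.5, 15.0),             # assumed from "$15-?"
}

best, top, optimal = rank_engines(EN_ZH)
print(best, top, optimal)
```

With these inputs the sketch returns Tencent as the best engine, the same four engines as the slide's top 5%, and Baidu as the optimal choice.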
16
TOP Performing MT Providers across 48 language pairs*
[Bar chart: number of language pairs (axis 0-50) per engine in three categories. Engines in chart order: google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, tencent, ibm-smt, baidu, systran-pnmt, gtcom, msft-smt, sdl-smt, modernmt]
best - provides the best MT quality for a language pair
top 5% - within 5% of the best available MT quality for a language pair
optimal - provides the lowest price among the top 5% MT engines for a language pair
17
Best general-purpose MT engines
[Matrix: best MT engine for each language pair; rows and columns: en ru ja de es fr pt it zh cs tr fi ro ko ar nl]
MT engines in the legend: google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, ibm-smt, tencent
18
Optimal* general-purpose MT engines
[Matrix: optimal MT engine for each language pair; rows and columns: en ru ja de es fr pt it zh cs tr fi ro ko ar nl]
MT engines in the legend: msft-nmt, yandex, msft-smt, baidu, google, amazon, ibm-nmt, promt, ibm-smt
* Cheapest with a performance within 5% of the best available for this language pair
19
Price vs. Performance*
As of March 2018
[Scatter chart: PERFORMANCE against AFFORDABILITY, with regions labelled ACCURATE, COST-EFFECTIVE and NOT PUBLIC]
Performance: regularized hLEPOR score aggregated across all language pairs in the dataset
Affordability = 1/price, using public volume-based pricing tiers
Legend:
• performance range: regularized average, max across all pairs, min across all pairs
• price range
* only production-ready engines shown
20
2 Public pricing
[Chart: public prices in USD per 1M symbols, by engine]
* +20% for some language pairs
** estimation based on 4.79 symbols per word
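For vendors that price per word, the second footnote implies a simple conversion to the per-symbol scale used on this slide. A minimal sketch, assuming the conversion is a plain division by the quoted average word length; the function name and the sample per-word price are hypothetical.

```python
SYMBOLS_PER_WORD = 4.79  # estimate quoted in the footnote above


def usd_per_million_symbols(usd_per_word):
    """Convert a per-word price to USD per 1M symbols."""
    return usd_per_word * 1_000_000 / SYMBOLS_PER_WORD


# Hypothetical vendor charging $0.0001 per word.
print(round(usd_per_million_symbols(0.0001), 2))
```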
21
3 Language Coverage
3.1 Supported and Unique per Provider
3.2 Coverage by Language Popularity
22
Supported and Unique Language Pairs
[Bar chart, logarithmic axis (1 to 10000): total and unique language pairs per provider. Providers in chart order: Google, Yandex, Microsoft NMT, Microsoft SMT, Baidu, Tencent, Systran, Systran PNMT, PROMT, SDL Language Cloud, Youdao, SAP, ModernMT, DeepL, IBM NMT, Amazon, IBM SMT, Alibaba, GTCom]
Total language pairs, as labelled on the chart: 6, 8, 20, 24, 34, 42, 44, 47, 72, 104, 106, 110, 110, 210, 812, 3 782, 3 660, 8 556, 10 712
Unique language pairs, as labelled on the chart: 2, 11, 2, 56, 138, 119, 1 074, 3 022
Unique language pairs - supported exclusively by one provider
23
Language popularity
Language groups by
web popularity*:
P1 - ≥ 2.0% websites
P2 - 0.5%-2% websites
P3 - 0.1-0.3% websites
P4 - <0.1% websites
* https://w3techs.com/technologies/overview/content_language/all
A total of 29070 pairs are possible; 13098 of them are supported
across all providers.
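The totals above can be checked with a little arithmetic. The sketch below assumes that "possible pairs" means ordered (source, target) pairs among all languages offered by at least one provider; the language count it derives is an inference, not a number stated on the slide.

```python
def ordered_pairs(n_languages):
    """Number of directed (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


POSSIBLE_PAIRS = 29070
SUPPORTED_PAIRS = 13098

# Find the language count whose ordered pairs match the slide's total.
n_languages = next(n for n in range(2, 1000) if ordered_pairs(n) == POSSIBLE_PAIRS)
coverage = SUPPORTED_PAIRS / POSSIBLE_PAIRS

print(n_languages, f"{coverage:.0%}")
```

Under that assumption the total corresponds to 171 languages, and the supported share rounds to the 45% reported on the coverage slide.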
P1
en, ru, ja, de, es, fr,
pt, it, zh
P2
pl, fa, tr, nl, ko, cs, ar,
vi, el, sv, in, ro, hu
P3
da, sk, fi, th, bg, he, lt, uk, hr,
no, nb, sr, ca, sl, lv, et
P4
hi, az, bs, ms, is, mk, bn, eu, ka, sq, gl,
mn, kk, hy, se, uz, kr, ur, ta, nn, af, be,
si, my, br, ne, sw, km, fil, ml, pa, …
24
Language coverage by popularity
[4×4 matrix: share of supported language pairs between the popularity groups P1, P2, P3 and P4; cell values range from 31% to 100%]
Overall, 45% of possible language pairs are supported.
25
Language coverage by service provider
- Google Cloud Translation API
- Yandex Translate API
- Microsoft Translator Text API (SMT)
- Microsoft Translator Text API (NMT)
- Baidu Translate API
- Tencent Cloud TMT API (preview)
- Systran REST Translation API
- Systran PNMT Enterprise Server
- PROMT Cloud API
- SDL Language Cloud Translation Toolkit
- Youdao Cloud Translation API
- SAP Translation Hub
- ModernMT API
- DeepL API
- IBM Watson Language Translator (NMT)
- Amazon Translate
- IBM Watson Language Translator (SMT)
- Alibaba Translate
- GTCom YeeCloud MT
26
4 Historical Progress
4.1 Number of Cloud MT Vendors
4.2 MT Quality
4.3 Performance/Price Efficiency
27
Independent Cloud MT Vendors with pre-built models
Commercial: Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM, Microsoft, ModernMT, PROMT, SAP, SDL, Systran, Yandex, Youdao
Preview: Tencent
[Chart: number of commercial and preview vendors (axis 0-16) for Jul 17, Nov 17, Mar 18 and Jul 18]
28
Best available MT Quality
Number of language pairs available at this level of LEPOR quality, out of the 14 pairs we have evaluated since July 2017 (ru, de, cs, tr, fi, ro, zh to en and back).
[Chart: LEPOR quality levels (30 % to 80 %) for Jul 17, Nov 17, Mar 18 and Jul 18, with the best and worst pair marked and the number of pairs at each level]
29
Best available
Performance/Price Efficiency
Efficiency =
(hLEPOR in %)² /
(USD per 1M
symbols)
—
Number of
language pairs
available at this level
of efficiency out of
14 pairs we
evaluated since July
2017 (ru, de, cs, tr,
fi, ro, zh to en and
back)
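As a worked example of the efficiency formula on this slide, the sketch below plugs in the English-Chinese figures for Baidu from slide 16 (68 % hLEPOR at $7 per 1M characters). It assumes that "symbols" and "characters" denote the same unit; the function name is hypothetical.

```python
def efficiency(hlepor_percent, usd_per_million_symbols):
    """Performance/price efficiency: (hLEPOR in %) squared over the price."""
    return hlepor_percent ** 2 / usd_per_million_symbols


# Baidu on en-zh, using the score and price from the sample pair analysis.
print(round(efficiency(68, 7), 1))
```

Squaring the score makes quality weigh more than price: halving the price doubles the efficiency, while doubling the score quadruples it.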
[Chart: best available performance/price efficiency (axis 100 to 900) for Jul 17, Nov 17, Mar 18 and Jul 18, with the best and worst pair marked and the number of pairs at each efficiency level]
30
5 Conclusions
Machine Translation quality and efficiency improve
monthly but are still far from ideal, hence a careful choice of
MT engine is a must.
—
At the same time, the MT landscape is becoming more
fragmented as the focus shifts from having the best
algorithms to having the best data.
—
Even for the general domain, having the best quality
across 48 language pairs requires 9 engines used
simultaneously.
31
Custom version
of this report
You may run the evaluation for your project using
our vendor-agnostic API and command-line
tools.
—
We can also help with translating your corpus
via multiple vendors or handle the whole
evaluation for your project.
—
Reach us at hello@inten.to
32
Intento Single API routes requests to the best models

Evaluate vendors on your own data with no effort
- up to +230% quality and -87% price by choosing the right vendor
- save 12 weeks of engineering and data science effort

Manage and optimise your vendor portfolio with our smart routing AI
- use the best vendor for each language pair and domain with no hassle

Single integration and contract for multiple vendors and models
- save 5-7 weeks upfront per vendor API
- save 1 day per month per vendor API

Reach us for pricing and contract
33
STATE OF THE
MACHINE TRANSLATION
by Intento (https://inten.to)

July 2018
Konstantin Savenkov
ks@inten.to
(415) 429-0021
2150 Shattuck Ave
Berkeley CA 94705
34
Appendix A
Overall performance of the MT services across many language
pairs is computed in the following way:
1. [Standardisation] We compute mean language-standardised
LEPOR score (or z-score) for each provider.
2. [Scale adjustment] We restore the original scale by multiplying
z-score for each MT provider by the global LEPOR standard
deviation and adding the global mean LEPOR score.
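The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the report's implementation: it uses the population standard deviation, computes the global mean and deviation over all provider/pair scores, and treats a pair with zero variance as contributing a z-score of 0. The function name and sample scores are hypothetical.

```python
import statistics
from collections import defaultdict


def regularized_scores(scores):
    """scores maps (provider, pair) -> LEPOR; returns provider -> regularized LEPOR."""
    by_pair = defaultdict(list)
    for (_, pair), value in scores.items():
        by_pair[pair].append(value)
    pair_stats = {
        pair: (statistics.fmean(values), statistics.pstdev(values))
        for pair, values in by_pair.items()
    }

    # Step 1: standardise each score within its language pair (z-score).
    z_by_provider = defaultdict(list)
    for (provider, pair), value in scores.items():
        mean, std = pair_stats[pair]
        z_by_provider[provider].append((value - mean) / std if std > 0 else 0.0)

    # Step 2: restore the original scale with the global mean and deviation.
    all_values = list(scores.values())
    global_mean = statistics.fmean(all_values)
    global_std = statistics.pstdev(all_values)
    return {
        provider: statistics.fmean(z) * global_std + global_mean
        for provider, z in z_by_provider.items()
    }


# Hypothetical scores: en-fi is harder than en-de for both providers.
sample = {
    ("A", "en-de"): 0.70, ("B", "en-de"): 0.60,
    ("A", "en-fi"): 0.50, ("B", "en-fi"): 0.40,
}
result = regularized_scores(sample)
print(result)
```

Because every score is standardised within its own language pair first, a provider is not penalised for covering harder pairs: in the sample, provider A comes out ahead by the same margin on both pairs.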
35

Mais conteúdo relacionado

Mais procurados

Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysis
Bob Prieto
 
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
rahulmonikasharma
 
Open AI Chat GPT-4-3.pptx
Open AI Chat GPT-4-3.pptxOpen AI Chat GPT-4-3.pptx
Open AI Chat GPT-4-3.pptx
JKHomer
 

Mais procurados (20)

What is chat gpt
What is chat gptWhat is chat gpt
What is chat gpt
 
7 reasons you should be excited about Tata Nexon EV
7 reasons you should be excited about Tata Nexon EV7 reasons you should be excited about Tata Nexon EV
7 reasons you should be excited about Tata Nexon EV
 
Лайфхаки Confluence для разработки требований
Лайфхаки Confluence для разработки требованийЛайфхаки Confluence для разработки требований
Лайфхаки Confluence для разработки требований
 
chatbots presentation .pptx
chatbots presentation .pptxchatbots presentation .pptx
chatbots presentation .pptx
 
Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysis
 
RTO Management System
RTO Management SystemRTO Management System
RTO Management System
 
India Electric Vehicle Market 2018 2025: Report Sample
India Electric Vehicle Market 2018 2025: Report SampleIndia Electric Vehicle Market 2018 2025: Report Sample
India Electric Vehicle Market 2018 2025: Report Sample
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx
 
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
 
Toyota Prius 4 Presentation
Toyota Prius 4 PresentationToyota Prius 4 Presentation
Toyota Prius 4 Presentation
 
Green Veh Strategy
Green Veh StrategyGreen Veh Strategy
Green Veh Strategy
 
Approaches to Sentiment Analysis
Approaches to Sentiment AnalysisApproaches to Sentiment Analysis
Approaches to Sentiment Analysis
 
Online Food Ordering Website project
Online Food Ordering Website projectOnline Food Ordering Website project
Online Food Ordering Website project
 
Sample Blue Ocean Strategy for Drug Delivery
Sample Blue Ocean Strategy for Drug DeliverySample Blue Ocean Strategy for Drug Delivery
Sample Blue Ocean Strategy for Drug Delivery
 
Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)
 
Gender voice recognition.pptx
Gender voice recognition.pptxGender voice recognition.pptx
Gender voice recognition.pptx
 
marketing presentation + electric car
marketing presentation + electric carmarketing presentation + electric car
marketing presentation + electric car
 
Explore the Impact of AI on E-Commerce
Explore the Impact of AI on E-CommerceExplore the Impact of AI on E-Commerce
Explore the Impact of AI on E-Commerce
 
Open AI Chat GPT-4-3.pptx
Open AI Chat GPT-4-3.pptxOpen AI Chat GPT-4-3.pptx
Open AI Chat GPT-4-3.pptx
 

Semelhante a State of the Machine Translation by Intento (July 2018)

Semelhante a State of the Machine Translation by Intento (July 2018) (20)

State of the Machine Translation by Intento (stock engines, Jan 2019)
State of the Machine Translation by Intento (stock engines, Jan 2019)State of the Machine Translation by Intento (stock engines, Jan 2019)
State of the Machine Translation by Intento (stock engines, Jan 2019)
 
State of the Machine Translation by Intento (March 2018)
State of the Machine Translation by Intento (March 2018)State of the Machine Translation by Intento (March 2018)
State of the Machine Translation by Intento (March 2018)
 
EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA...
EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA...EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA...
EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA...
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
Intento Machine Translation Benchmark, July 2017
Intento Machine Translation Benchmark, July 2017Intento Machine Translation Benchmark, July 2017
Intento Machine Translation Benchmark, July 2017
 
State of the Domain-Adaptive Machine Translation by Intento (November 2018)
State of the Domain-Adaptive Machine Translation by Intento (November 2018)State of the Domain-Adaptive Machine Translation by Intento (November 2018)
State of the Domain-Adaptive Machine Translation by Intento (November 2018)
 
Cloud Sentiment Analysis - Vendor Overview (April 2018)
Cloud Sentiment Analysis - Vendor Overview (April 2018)Cloud Sentiment Analysis - Vendor Overview (April 2018)
Cloud Sentiment Analysis - Vendor Overview (April 2018)
 
MuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPTMuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPT
 
MuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPTMuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPT
 
ONEs Q1 2019 - Human MT Evaluation At Scale
ONEs Q1 2019 - Human MT Evaluation At ScaleONEs Q1 2019 - Human MT Evaluation At Scale
ONEs Q1 2019 - Human MT Evaluation At Scale
 
R Vs Python – The most trending debate of aspiring Data Scientists
R Vs Python – The most trending debate of aspiring Data ScientistsR Vs Python – The most trending debate of aspiring Data Scientists
R Vs Python – The most trending debate of aspiring Data Scientists
 
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
 
Machine learning for predictive maintenance external
Machine learning for predictive maintenance   externalMachine learning for predictive maintenance   external
Machine learning for predictive maintenance external
 
Machine Translation Insights
Machine Translation InsightsMachine Translation Insights
Machine Translation Insights
 
Sviluppare un backend serverless in real time attraverso GraphQL
Sviluppare un backend serverless in real time attraverso GraphQLSviluppare un backend serverless in real time attraverso GraphQL
Sviluppare un backend serverless in real time attraverso GraphQL
 
OPENMP ANALYSIS IN VTUNE AMPLIFIER XE
OPENMP ANALYSIS IN VTUNE AMPLIFIER XEOPENMP ANALYSIS IN VTUNE AMPLIFIER XE
OPENMP ANALYSIS IN VTUNE AMPLIFIER XE
 
MLops workshop AWS
MLops workshop AWSMLops workshop AWS
MLops workshop AWS
 
Cloud Artificial Intelligence Landscape
Cloud Artificial Intelligence LandscapeCloud Artificial Intelligence Landscape
Cloud Artificial Intelligence Landscape
 

Mais de Konstantin Savenkov

Mais de Konstantin Savenkov (18)

GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic Parrots
 
Dodging AI biases in future-proof Machine Translation solutions
Dodging AI biases in future-proof Machine Translation solutionsDodging AI biases in future-proof Machine Translation solutions
Dodging AI biases in future-proof Machine Translation solutions
 
Building Multi-Purpose MT Portfolio
Building Multi-Purpose MT PortfolioBuilding Multi-Purpose MT Portfolio
Building Multi-Purpose MT Portfolio
 
Как выбрать и приручить машинный перевод / How to choose and tame the Machine...
Как выбрать и приручить машинный перевод / How to choose and tame the Machine...Как выбрать и приручить машинный перевод / How to choose and tame the Machine...
Как выбрать и приручить машинный перевод / How to choose and tame the Machine...
 
Progress in Commercial Machine Translation Systems
Progress in Commercial Machine Translation SystemsProgress in Commercial Machine Translation Systems
Progress in Commercial Machine Translation Systems
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
Improving the Demand Side of the AI Economy (API World 2018)
Improving the Demand Side of the AI Economy (API World 2018)Improving the Demand Side of the AI Economy (API World 2018)
Improving the Demand Side of the AI Economy (API World 2018)
 
Сравнительный анализ систем машинного перевода
Сравнительный анализ систем машинного переводаСравнительный анализ систем машинного перевода
Сравнительный анализ систем машинного перевода
 
NLU / Intent Detection Benchmark by Intento, August 2017
NLU / Intent Detection Benchmark by Intento, August 2017NLU / Intent Detection Benchmark by Intento, August 2017
NLU / Intent Detection Benchmark by Intento, August 2017
 
Building a Data Driven Business
Building a Data Driven BusinessBuilding a Data Driven Business
Building a Data Driven Business
 
Управление бизнесом на основе данных
Управление бизнесом на основе данныхУправление бизнесом на основе данных
Управление бизнесом на основе данных
 
Messengers, Bots and Personal Assistants
Messengers, Bots and Personal AssistantsMessengers, Bots and Personal Assistants
Messengers, Bots and Personal Assistants
 
Рекомендательные системы: роль и оценка эффективности
Рекомендательные системы: роль и оценка эффективностиРекомендательные системы: роль и оценка эффективности
Рекомендательные системы: роль и оценка эффективности
 
Measuring the agile process improvement
Measuring the agile process improvementMeasuring the agile process improvement
Measuring the agile process improvement
 
Lean production для SAAS
Lean production для SAASLean production для SAAS
Lean production для SAAS
 
Driving Business Goals with Recommender Systems @ YAC/m 2015
Driving Business Goals with Recommender Systems @ YAC/m 2015Driving Business Goals with Recommender Systems @ YAC/m 2015
Driving Business Goals with Recommender Systems @ YAC/m 2015
 
The Economics of Recommender Systems
The Economics of Recommender SystemsThe Economics of Recommender Systems
The Economics of Recommender Systems
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshell
 

Último

Último (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

State of the Machine Translation by Intento (July 2018)

  • 1. STATE OF THE MACHINE TRANSLATION by Intento July 2018
  • 2. July 2018© Intento, Inc. About At Intento, we make Cloud Cognitive AI easy to discover, access, and evaluate for a specific use. — Evaluation is a pain for everyone: to compare different services, you have to sign a lot of contracts and integrate many APIs. — As we show in this report, the Machine Translation landscape is complex, with 4x difference in quality and 195x difference in price across pre-build models available from different vendors. — We deliver this overview report for FREE. To evaluate on your own dataset, reach us at hello@inten.to 2
  • 3. July 2018© Intento, Inc. Intento MT Gateway - that’s how we run such evaluations Vendor-agnostic API Sync and async modes CLI tools and SDKs Works with files of any size Much faster due to hyper- threading Get your API key at inten.to 3
  • 4. July 2018© Intento, Inc. Important highlights Amazon and SAP went from preview to production — Amazon, Baidu, IBM, Microsoft, and PROMT increased language coverage — For 7 language pairs, available MT quality raised more than 5% since Mar 2018: en-ko (▲25%), en-nl (▲11%), nl-en (▲14%), ru-de (▲8%), ja-fr (▲10%), en-cs (▲5%), en-tr (▲7%) (see slide 15) — For 13 language pairs, the best MT provider has changed since Mar 2018: en-zh, de-ru, ru-de, en-tr, en-pt, nl-en, en-nl, ja-en, zh-it, cs-en, en-cs, en- it, ru-en — To get the best quality across 48 language pairs, one needs 9 engines (see slide 18) 4
  • 5. July 2018© Intento, Inc. Overview 1 TRANSLATION QUALITY 2 PRICING 3 LANGUAGE COVERAGE 4 HISTORICAL PROGRESS 5 CONCLUSIONS 48 Language Pairs 19 Machine Translation Engines 5
  • 6. July 2018© Intento, Inc. Benchmark changes since March 2018 Added 3 engines: ModernMT*, Alibaba**, Youdao** — Updated to new versions: IBM (v3/NMT), Microsoft (v3/ NMT) — Updated SAP*** and Amazon from preview to public — Added detailed best and optimal engines chart (slides 18-19) — Added Pricing section (slide 21) * evaluated on one language pair (cost prohibitive) ** unavailable outside of China yet *** not evaluated (cost prohibitive & unstable) 6
  • 7. Machine Translation Engines* Evaluated: Alibaba Cloud Machine Translation; Amazon Translate; Baidu Translate API; DeepL API; Google Cloud Translation API; GTCom YeeCloud MT; IBM Watson NMT Language Translator; IBM Watson SMT Language Translator; Microsoft NMT Translator Text API; Microsoft SMT Translator Text API; ModernMT API; PROMT Cloud API; SAP Translation Hub; SDL Language Cloud Translation Toolkit; Systran PNMT Enterprise Server; Systran REST Translation API; Tencent Cloud TMT API (preview); Yandex Translate API; Youdao Cloud Translation API. * We have evaluated general-purpose Cloud Machine Translation services with prebuilt translation models, provided via API. Some vendors also provide web-based, on-premise or custom MT engines, which may differ in all aspects from what we've evaluated.
  • 8. 1 Translation Quality: 1.1 Evaluation Methodology; 1.2 Available MT Quality; 1.3 Top-Performing Engines; 1.4 Best General-Purpose Engines; 1.5 Optimal General-Purpose Engines; 1.6 Price vs. Performance
  • 9. Evaluation methodology (I): Translation quality is evaluated by computing the LEPOR score between reference translations and the MT output (Slide 11). Currently, our goal is to evaluate the performance of translation between the most popular languages (Slide 12). We use public datasets from StatMT/WMT, CASMACAT News Commentary and Tatoeba (Slide 13). We have performed a LEPOR metric convergence analysis to identify the minimal viable number of segments in the dataset. See Slide 14 for some details.
  • 10. Evaluation methodology (II): We judge that the MT quality of service A is better than that of B for the language pair C if: (1) the mean LEPOR score of A is greater than that of B for the pair C, and (2) the lower bound of the 95% LEPOR confidence interval of A is greater than the upper bound of the LEPOR confidence interval of B for the pair C. See Slide 14 for an example. Different language pairs (and different datasets) differ in translation complexity. To compare the overall MT performance of different services, we regularize LEPOR scores across all language pairs (see Appendix A for more details).
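The two-part rule on slide 10 can be sketched in a few lines of Python. This is an illustration only: the report does not say how the confidence intervals were computed, so the normal-approximation interval on per-segment scores below is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def confidence_interval(scores, z=1.96):
    """Return (lower, mean, upper) for the mean of per-segment scores.

    Assumption: a normal-approximation 95% interval (z = 1.96); the
    report does not state which interval estimator was used.
    """
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m - half_width, m, m + half_width


def is_better(scores_a, scores_b):
    """Slide-10 rule: A beats B only if A's mean is higher AND
    A's lower bound is above B's upper bound (intervals do not overlap)."""
    lower_a, mean_a, _ = confidence_interval(scores_a)
    _, mean_b, upper_b = confidence_interval(scores_b)
    return mean_a > mean_b and lower_a > upper_b
```

Under this rule two services whose intervals overlap are treated as indistinguishable for that pair, even if one has the higher mean.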
  • 11. LEPOR score (like BLEU, but better): LEPOR is an automatic machine translation evaluation metric that considers the enhanced Length Penalty, n-gram Position difference Penalty, and Recall. In our evaluation, we used hLEPORA v.3.1 (best metric from the ACL-WMT 2013 contest). https://www.slideshare.net/AaronHanLiFeng/lepor-an-augmented-machine-translation-evaluation-metric-thesis-ppt https://github.com/aaronlifenghan/aaron-project-lepor
  • 12. 48 Language Pairs. Language groups by web popularity*: P1 - ≥ 2.0% websites; P2 - 0.5%-2% websites; P3 - 0.1-0.3% websites; P4 - <0.1% websites. We focus on en-P1, P1-en and (partially) P1-P1. [Table: source-by-target matrix over en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl, with a ✓ for each evaluated pair.] * https://w3techs.com/technologies/overview/content_language/all
  • 13. Datasets: WMT-2013 (translation task, news domain): en-es, es-en. WMT-2015 (translation task, news domain): fr-en, en-fr. WMT-2016 (translation task, news domain): cs-en, en-cs, de-en, en-de, ro-en, en-ro, fi-en, en-fi, ru-en, en-ru, tr-en, en-tr. WMT-2017 (translation task, news domain): zh-en, en-zh. NewsCommentary-2011: en-ja, ja-en, en-pt, pt-en, en-it, it-en, ru-de, ru-es, ru-fr, ru-pt, ja-fr, de-ja, es-zh, fr-ru, fr-es, it-pt, zh-it, en-ar, ar-en, en-nl, nl-en, fr-de, de-fr, de-it, ja-zh, zh-ja. Tatoeba: en-ko, ko-en.
  • 14. LEPOR Convergence: We used 900-3000 sentences per language pair. The metric stabilizes, and adding more sentences from the same domain won't change the outcome. [Figure: regularised hLEPOR scores vs. number of sentences, aggregated across all language pairs (aggregated mean with confidence interval), plus examples for individual language pairs.]
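A convergence check of the kind slide 14 describes might look like the sketch below. It is not the report's procedure, which is not spelled out: it assumes per-sentence scores are available in the order they were sampled, and it uses a normal-approximation interval. The function name and the `step` parameter are illustrative.

```python
import math
from statistics import mean, stdev


def convergence_curve(scores, step=100, z=1.96):
    """Return [(n, running_mean, ci_half_width), ...] for growing prefixes.

    The metric can be considered stable once the half-width stops
    shrinking by a meaningful amount as n grows.
    """
    curve = []
    for n in range(step, len(scores) + 1, step):
        sample = scores[:n]
        half_width = z * stdev(sample) / math.sqrt(n)
        curve.append((n, mean(sample), half_width))
    return curve
```

Plotting the running mean with its half-width against `n` gives a curve of the same kind as the one on the slide.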
  • 15. Available MT Quality. [Heat map: source-by-target matrix over en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl. Each cell shows the maximal available hLEPOR score (colour bands: >80%, 70%, 60%, 50%, 40%, <40%), the minimal price for this quality per 1M characters* ($$$ ≥ $20; $$ $10-15; $ < $10), and the number of top-performing MT providers**.] * base pricing tier. ** up to 5% worse than the leader; SMT and NMT counted separately.
  • 16. Sample pair analysis: English-Chinese (based on the WMT-17 dataset). LEPOR score, providers, and price range per 1M characters: 71%: Tencent (preview); 70%: Google, GTCom ($10-20); 68%: Baidu ($7); 66.5%: Systran PNMT, Amazon ($15-?); 65%: Microsoft, IBM NMT ($10-21.4). BEST QUALITY: Tencent (preview). TOP 5%: Tencent, Google, GTCom, Baidu. BEST PRICE IN TOP 5%: Baidu.
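The selection on slide 16 can be reproduced with a short sketch. It assumes "within 5%" is relative to the best score (at least 95% of it), which is consistent with the slide: 0.95 × 71 = 67.45, so Baidu at 68% qualifies and the 66.5% engines do not. The slide gives a shared $10-20 range for Google and GTCom, so the split between them below is an assumption, as is the function name.

```python
def analyse_pair(engines, tolerance=0.05):
    """engines: {name: (lepor_percent, usd_per_1m_chars or None)}.

    Returns (best, top, optimal): the leader(s), every engine within
    `tolerance` (relative) of the leader, and the cheapest priced engine
    among those.
    """
    best_score = max(score for score, _ in engines.values())
    threshold = best_score * (1 - tolerance)
    best = [name for name, (score, _) in engines.items() if score == best_score]
    top = [name for name, (score, _) in engines.items() if score >= threshold]
    priced = [name for name in top if engines[name][1] is not None]
    optimal = min(priced, key=lambda name: engines[name][1]) if priced else None
    return best, top, optimal


# Scores from slide 16 (en-zh). Prices: Baidu is from the slide; the
# Google/GTCom split of the "$10-20" range is an ASSUMPTION for illustration.
EN_ZH = {
    "Tencent (preview)": (71.0, None),  # no price listed on the slide
    "Google": (70.0, 20.0),             # assumed end of the $10-20 range
    "GTCom": (70.0, 10.0),              # assumed end of the $10-20 range
    "Baidu": (68.0, 7.0),
    "Amazon": (66.5, None),             # price not needed: outside the top 5%
    "Microsoft": (65.0, None),          # price not needed: outside the top 5%
}

best, top, optimal = analyse_pair(EN_ZH)
```

With these inputs the result matches the slide: Tencent is best, the top 5% is Tencent, Google, GTCom and Baidu, and Baidu is the cheapest among them whichever way the $10-20 range is split.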
  • 17. Top-performing MT providers across 48 language pairs. [Chart: number of language pairs (0-50) per engine, in three series. best: provides the best MT quality for a language pair. top 5%: within 5% of the best available MT quality for a language pair. optimal: provides the lowest price among the top 5% MT engines for a language pair. Engines shown: google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, tencent, ibm-smt, baidu, systran-pnmt, gtcom, msft-smt, sdl-smt, modernmt.]
  • 18. Best general-purpose MT engines. [Chart: source-by-target matrix over en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl, with each evaluated pair marked by its best engine. MT engines in the legend: google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, ibm-smt, tencent.]
  • 19. Optimal* general-purpose MT engines. [Chart: source-by-target matrix over en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl, with each evaluated pair marked by its optimal engine. MT engines in the legend: msft-nmt, yandex, msft-smt, baidu, google, amazon, ibm-nmt, promt, ibm-smt.] * Cheapest with a performance within 5% of the best available for this language pair.
  • 20. Price vs. Performance* (as of March 2018). [Chart: engines plotted on AFFORDABILITY and PERFORMANCE axes, with regions labelled ACCURATE, COST-EFFECTIVE and NOT PUBLIC.] Performance: regularized hLEPOR score aggregated across all language pairs in the dataset. Affordability = 1/price, using public volume-based pricing tiers. Legend: performance range (regularized average, max across all pairs, min across all pairs) and price range. * only production-ready engines shown.
  • 21. 2 Public pricing, in USD per 1M symbols. [Table: public prices per engine.] * +20% for some language pairs. ** estimation based on 4.79 symbols per word.
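The per-word estimate in the second footnote amounts to a unit conversion. The sketch below shows the arithmetic only; the function name and the example per-word price are hypothetical, and the report does not say which engines are priced per word.

```python
SYMBOLS_PER_WORD = 4.79  # average used in the slide's footnote


def usd_per_1m_symbols(usd_per_word, symbols_per_word=SYMBOLS_PER_WORD):
    """Convert a per-word price into USD per 1M symbols.

    1M symbols correspond to 1_000_000 / symbols_per_word words.
    """
    return usd_per_word * 1_000_000 / symbols_per_word


# Hypothetical example: a vendor charging $0.0000479 per word.
example = usd_per_1m_symbols(0.0000479)
```

The example price works out to roughly $10 per 1M symbols.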
  • 22. 3 Language Coverage: 3.1 Supported and Unique per Provider; 3.2 Coverage by Language Popularity
  • 23. Supported and Unique Language Pairs. Unique language pairs are those supported exclusively by one provider. [Chart: total and unique language pairs per provider on a logarithmic scale (1 to 10000). Providers shown: Google, Yandex, Microsoft NMT, Microsoft SMT, Baidu, Tencent, Systran, Systran PNMT, PROMT, SDL Language Cloud, Youdao, SAP, ModernMT, DeepL, IBM NMT, Amazon, IBM SMT, Alibaba, GTCom. Totals range from 6 to 10,712 pairs per provider.]
  • 24. Language popularity. Language groups by web popularity*: P1 - ≥ 2.0% websites; P2 - 0.5%-2% websites; P3 - 0.1-0.3% websites; P4 - <0.1% websites. Of 29070 possible pairs in total, 13098 are supported across all providers. P1: en, ru, ja, de, es, fr, pt, it, zh. P2: pl, fa, tr, nl, ko, cs, ar, vi, el, sv, in, ro, hu. P3: da, sk, fi, th, bg, he, lt, uk, hr, no, nb, sr, ca, sl, lv, et. P4: hi, az, bs, ms, is, mk, bn, eu, ka, sq, gl, mn, kk, hy, se, uz, kr, ur, ta, nn, af, be, si, my, br, ne, sw, km, fil, ml, pa, … * https://w3techs.com/technologies/overview/content_language/all
  • 25. Language coverage by popularity: 45% of possible language pairs are supported. [Chart: 4x4 matrix of coverage between the popularity groups P1, P2, P3 and P4, with cell values ranging from 31% to 100%.]
  • 26. Language coverage by service provider. [Chart: one coverage panel per provider: Google Cloud Translation API; Yandex Translate API; Microsoft Translator Text API (SMT); Microsoft Translator Text API (NMT); Baidu Translate API; Tencent Cloud TMT API (preview); Systran REST Translation API; Systran PNMT Enterprise Server; PROMT Cloud API; SDL Language Cloud Translation Toolkit; Youdao Cloud Translation API; SAP Translation Hub; ModernMT API; DeepL API; IBM Watson Language Translator (NMT); Amazon Translate; IBM Watson Language Translator (SMT); Alibaba Translate; GTCom YeeCloud MT.]
  • 27. 4 Historical Progress: 4.1 Number of Cloud MT Vendors; 4.2 MT Quality; 4.3 Performance/Price Efficiency
  • 28. Independent Cloud MT vendors with pre-built models. Commercial: Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM, Microsoft, ModernMT, PROMT, SAP, SDL, Systran, Yandex, Youdao. Preview: Tencent. [Chart: number of commercial and preview vendors (scale 0-16) for Jul 17, Nov 17, Mar 18 and Jul 18.]
  • 29. Best available MT Quality. [Chart: best available LEPOR quality (scale 30%-80%), from worst pair to best pair, for Jul 17, Nov 17, Mar 18 and Jul 18. Each mark shows the number of language pairs available at that level of LEPOR quality.] Covers the 14 pairs we have evaluated since July 2017 (ru, de, cs, tr, fi, ro, zh to en and back).
  • 30. Best available Performance/Price Efficiency. Efficiency = (hLEPOR in %)² / (USD per 1M symbols). [Chart: best available efficiency (scale 100-900), from worst pair to best pair, for Jul 17, Nov 17, Mar 18 and Jul 18. Each mark shows the number of language pairs available at that level of efficiency.] Covers the 14 pairs we have evaluated since July 2017 (ru, de, cs, tr, fi, ro, zh to en and back).
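As a worked instance of the efficiency formula on slide 30, the sketch below applies it to two English-Chinese figures. The Baidu score and price come from slide 16; the Google price is the upper end of the slide's shared $10-20 range and is used here as an assumption. The function name is illustrative.

```python
def efficiency(hlepor_percent, usd_per_1m_symbols):
    """Efficiency = (hLEPOR in %)^2 / (USD per 1M symbols), as on slide 30."""
    return hlepor_percent ** 2 / usd_per_1m_symbols


baidu_en_zh = efficiency(68, 7)    # 4624 / 7, about 660.6
google_en_zh = efficiency(70, 20)  # 4900 / 20 = 245.0 (price is an assumption)
```

Because the score is squared, a small quality gain moves the figure more than a comparable price cut, but a large price gap can still outweigh it, as it does in this example.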
  • 31. 5 Conclusions: Machine Translation quality and efficiency improve monthly but are still far from ideal, so a careful choice of MT engine is a must. At the same time, the MT landscape is getting more fragmented as the focus shifts from having the best algorithms to having the best data. Even for the general domain, getting the best quality across 48 language pairs requires using 9 engines simultaneously.
  • 32. Custom version of this report: You may run the evaluation for your project using our vendor-agnostic API and command-line tools. We may also help with translating your corpus via multiple vendors or handle the whole evaluation for your project. Reach us at hello@inten.to
  • 33. Intento Single API routes requests to the best models. Evaluate vendors on your own data with no effort: up to +230% quality and -87% price by choosing the right vendor; save 12wk of engineering and data science efforts. Manage and optimise vendor portfolio with our smart routing AI: use the best vendor for each language pair and domain with no hassle. Single integration and contract to multiple vendors and models: save upfront 5-7wk per each vendor API; save 1d per month per each vendor API. Reach us for pricing and contract.
  • 34. STATE OF THE MACHINE TRANSLATION by Intento (https://inten.to), July 2018. Konstantin Savenkov, ks@inten.to, (415) 429-0021, 2150 Shattuck Ave, Berkeley CA 94705
  • 35. Appendix A: Overall performance of the MT services across many language pairs is computed in the following way: 1. [Standardisation] We compute the mean language-standardised LEPOR score (or z-score) for each provider. 2. [Scale adjustment] We restore the original scale by multiplying the z-score for each MT provider by the global LEPOR standard deviation and adding the global mean LEPOR score.
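The two steps of Appendix A can be sketched as follows. This is one reading of the appendix, not the report's code: it assumes "language-standardised" means a z-score computed within each language pair across providers, it uses the population standard deviation throughout, and it takes "global" to mean all scores pooled. The function name and input layout are hypothetical.

```python
from statistics import mean, pstdev


def regularised_scores(scores):
    """scores: {language_pair: {provider: lepor_score}}.

    Step 1: z-score each provider within each language pair, then average
            the z-scores per provider.
    Step 2: map back to the original scale with the global std and mean.
    """
    all_values = [s for by_provider in scores.values() for s in by_provider.values()]
    global_mean = mean(all_values)
    global_std = pstdev(all_values)

    z_by_provider = {}
    for by_provider in scores.values():
        values = list(by_provider.values())
        pair_mean = mean(values)
        pair_std = pstdev(values)
        for provider, score in by_provider.items():
            # Guard: if every provider scores the same on a pair, z is 0.
            z = (score - pair_mean) / pair_std if pair_std > 0 else 0.0
            z_by_provider.setdefault(provider, []).append(z)

    return {
        provider: mean(zs) * global_std + global_mean
        for provider, zs in z_by_provider.items()
    }
```

Standardising within each pair removes the effect of some pairs simply being harder, so a provider evaluated mostly on hard pairs is not penalised for it.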