The many uses of MT. Paula Shannon, Vice President, Lionbridge
Lionbridge is the largest translation services provider in the world. Paula Shannon will present the company profile with its wide range of services and technologies. In particular she will present how Lionbridge uses Machine Translation in four different ways:
Increase the amount of content clients can afford to translate
Reduce overall translation costs
Help clients reach global customers faster
Offer different levels of automation for different quality needs
The presentation will cover examples of each method in use.
2.
Copyright 2015. Confidential – Distribution prohibited without permission
Machine Translation is Our Industry’s Single Biggest Innovation
But most discussions focus on the tactics of machine translation, not the strategy
• Debate on the best MT engine
• Discourse on RBMT vs. SMT
• Disagreement on quality frameworks
• Evaluation against a human standard
• Post-editing output
• Cost and price per word
3.
The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive
• Sustaining innovation focuses on incremental improvements to existing processes or products for existing customers. It eventually creates offerings that are too complex and too costly to compete
• A disruptive innovation helps create a new market and value network, displacing an earlier technology. Initially, disruptive innovation often begins with lower quality and seems to “get it all wrong”
“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”
“Can we embed Automatic Translation in Skype to connect people around the world?”
4.
Lionbridge’s Approach to Machine Translation
Leading both Sustaining and Disruptive Innovation
We focus on MT solutions and services.
We are MT engine ‘agnostic’ (independent).
We use whatever engine best suits the need:
o Microsoft Translator Hub (with and without GeoFluent)
o Systran Enterprise engine (hybrid engine)
o MSR-MT (SMT)
o Barcelona (our own in-house developed RBMT engine)
o Moses (SMT)
o Apertium (RBMT)
After analyzing each customer’s needs, we propose (internally or externally) the right MT solution (e.g., Microsoft, Renault, eBay, Amazon, Alibaba, Becton Dickinson).
5.
Lionbridge Machine Translation Data
More than 1 billion words pushed through MTM (Machine Translation + Translation Memory)
100+ million words in 2014 alone*
Words Post-Edited Per Year
Year    Words per year
2009    18,854,754
2010    16,445,110
2011    65,426,924
2012    77,847,141
2013    83,224,523
MT research and development started in 1998 (NetX), with RBMT deployment in 2002, statistical MT in 2003, and hybrid MT in 2009
30+ language pairs in our MT solution are used daily. New record last week: 45 languages for a program
1,500+ post-editors registered in our vendor database
In-house MT customizers for 10+ languages
* 3 million words cited by other leading MLV
6.
MT is Used in Most of Top 50 Accounts
• 60+ Million Words Post-edited
• 101+ Million Words Machine Translated
• 76% of Projects Use Connected, Automated Workflow (TMS)
• 93% of Projects Rely on Automated Production Workflows (Internal)
2014 MT Volume by Output
• 29% of volume is “Premium” output, which means professional post-editors perform a full review and edit of the machine translated segments to deliver top human translation quality.
• 31% of volume is “Basic” output, which includes a light revision of the MT output following agreed guidelines. The focus is on delivering output that should be well understood in the target language.
• 42% of volume is “Raw” output, which means that we have worked to customize the engine and processes but have not post-edited the output.
7.
Growing Breadth of Languages
From 5 Language Pairs in 2002 to More than 58 Today
• 30 to 45 language pairs on average per week
• 44 language-pair customizations produced, 2009-2014
English to ‘X’ – Language Family (ordered on the slide from Better to Weaker)
English to Spanish – Romance
English to French – Romance
English to Portuguese – Romance
English to Italian – Romance
English to Swedish – Scandinavian
English to Norwegian – Scandinavian
English to Danish – Scandinavian
English to Dutch – West Germanic
English to German – West Germanic
English to Czech – Slavic
English to Polish – Slavic
English to Russian – Slavic
English to Chinese – Asian
English to Japanese – Asian
English to Korean – Asian
‘X’ to English
- Normally better quality than in the opposite direction
- Very good when ‘X’ is Romance or Scandinavian
8.
How are Engines Customized?
RBMT and Statistical MT share many best-practice steps
Customizing Statistical and Hybrid Machine Translation Engines
Phases: Preparation, Technical Analysis and Setup, Linguistic Customization, Training, Publishing, Production
Process lanes: Input Analysis, Setup, Feed MT Engine, Training Process, Production Servers
Inputs: source files, samples, TMs and glossaries
• Clean up unwanted data (noise)
• Identify and extract entities, tags and other important source elements
• Create filters for training and translation
• Classify the elements based on their function in the translation
• Upload filters to servers and test them
• Extract terminology
• Create customized dictionaries
• Create customized rules
Feed MT engine: training corpus, dictionaries, rules
Training process: create baseline, create custom profile, run training
Quality is OK? If NO, iterate; if YES, publish the translation model
Production servers: upload translation model, upload profiles, upload custom filters
Production: source files, access to MT engine, translated files
Legend: Common Tasks; Hybrid-specific Tasks
9.
Lionbridge MTM process workflow
One approach to deploying MT as part of the regular translation process
Diagram elements:
• Source Files Handoff
• TM Analyzer: >75% match is leveraged from the previous TM; <75% match goes to the enhancement steps
• Enhancement steps: Entity Extractor, Terminology Extractor, Segment Analyzer (Long/Freq., Short/Infreq.)
• Entity Dictionary and Project Dictionary
• Machine Translation: Unknown Segments in, Translated Segments out
• Translated Segments, with a -15% penalty
• QUICK Term & Punctuation
• Background TM (MT-translated TM)
• Foreground TM (project TM)
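The routing rule in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 75% threshold and the 15% penalty come from the slide, but the function names, the use of difflib as a stand-in for a real TM fuzzy-match score, and the reading that the penalty is applied to matches coming from the MT-translated background TM are all assumptions, not Lionbridge's implementation.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.75  # from the slide: >75% match is leveraged from the TM
MT_PENALTY = 0.15       # from the slide: -15% penalty (assumed to apply to MT-origin matches)


def best_tm_match(segment, tm):
    """Return (score, target) for the closest TM source, or (0.0, None) if nothing matches.

    difflib's ratio is only a stand-in for a real fuzzy-match algorithm.
    """
    best = (0.0, None)
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best[0]:
            best = (score, target)
    return best


def route_segment(segment, previous_tm, background_tm):
    """Decide how a source segment is handled (hypothetical routing labels)."""
    score, target = best_tm_match(segment, previous_tm)
    if score > MATCH_THRESHOLD:
        # Good human-translated match: reuse it.
        return ("tm_leverage", score, target)
    mt_score, mt_target = best_tm_match(segment, background_tm)
    if mt_target is not None:
        # MT-origin match: penalized so it is never treated as a perfect match.
        return ("mt_post_edit", max(mt_score - MT_PENALTY, 0.0), mt_target)
    return ("unknown", 0.0, None)
```

With a penalty like this, even an exact hit in the MT-translated TM surfaces as an 85% match, so it is still reviewed by a post-editor instead of being accepted untouched.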
10.
Edit Distance: A Way to Assess Post-Edit Effort Level Needed
Perfect translations
No changes are required to obtain a "human quality" translation for these segments.
Good-quality sentences
Few changes are required to achieve "human quality". The effort necessary to post-edit the sentence is small.
Compensating sentences
Approximately half of the sentence needs to be modified to achieve "human quality".
Mistranslated sentences
Most of the sentence has been wrongly translated. In many cases it is faster to translate from scratch.
The Edit Distance Ratio shows the percentage of changes (insertions, deletions and substitutions of words) needed to achieve the full human quality standard, as represented by the existing translations in the reference TM.
ED is an easy-to-read metric: ED = 0, zero changes; ED = 1, one word changed.
The goal of an ED analysis is to measure the MT quality improvement and to estimate the post-editor's effort.
It is very applicable to the Language Services Provider, who must estimate level of effort, resourcing, and cost accurately.
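A word-level edit distance of the kind described here can be computed with the standard Levenshtein dynamic program. The sketch below is a minimal illustration, not the tool used in the presentation: the function names are hypothetical, tokenization is plain whitespace splitting, and normalizing by the longer of the two sentences is an assumption, since the slide does not say how the ratio is normalized.

```python
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn hyp (MT output) into ref (reference translation)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from hyp
                            curr[j - 1] + 1,      # insert a word from ref
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def edit_distance_ratio(mt_output: str, reference: str) -> float:
    """Share of words that must change (assumed normalization: longer sentence)."""
    hyp, ref = mt_output.split(), reference.split()
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 0.0
    return word_edit_distance(hyp, ref) / longest
```

For example, comparing "the dog sat" with the reference "the cat sat" gives a distance of one word and a ratio of one third, which would fall toward the "good-quality" end of the scale above.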
11.
Case Studies of MT as Sustaining Innovation
Large Scale Challenges, Complex Processes, Transparent Solutions
eBay European ecommerce Program - Unlocking Global Listings
• Moses and Systran, with a high degree of customization on entity mining and terminology
• Unique challenges, as product listings and titles are non-grammatical strings
Microsoft Visual Studio 2005/2008/2010/2012
• 15 million words per language
• Highly technical content, complex format and tagging
• UA and UI translated simultaneously on a tight schedule (throughput required: 2 million words per month)
Becton Dickinson internal ERP deployment
• No legacy material and poor source quality
• Unmarked, referenced UI strings. Translation had to preserve English
• Solved by developing pattern-based rules to detect probable UI strings based on surrounding words
12.
The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive
• Sustaining innovation is typically driven by the need to provide existing customers with incremental improvements and efficiencies over time
• Disruptive innovation is driven by new market entrants who introduce products to the under-served portion of a market, often with lower quality, and who seem to “get it all wrong”
“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”
“Can we embed Automatic Translation in Skype to connect people around the world?”
13.
Machine Translation Unlocking Social Media Monitoring
A process to listen, classify, report, and deliver for action
• Crawl the web in multiple languages/countries using localized keywords to identify ‘conversations’ related to customer products or interests in different social media
• Get results from the crawler, filter and clean them, and, when necessary, fine-tune the crawling rules to obtain more relevant and cleaner user comments and results
• Using a sentiment analysis tool, text analytics experts and the crowd perform sentiment classification
• Results are classified as positive, negative or neutral, and quantified, taking into account product categories and features
• Machine translation, with special customization, of sentiments so that all the comments are available in English
• Final cumulative human and machine analysis of all the feedback collected and classified from all the languages
Amplify the customer’s global presence and responsiveness
14.
Solving Social Media Challenges with Machine Translation
Combination of Large Scale Translation and Business Process Crowdsourcing Technology
Process steps:
• Automated Entity Identification
• Multilingual Crowd Validation & Extraction
• Machine Translation
• Multilingual Crowd Post-Edit & Audit
Lionbridge components:
• Lionbridge Linguistic Toolbox
• Lionbridge Smart Crowd Data Extraction
• Lionbridge Hybrid Machine Translation
• Lionbridge Smart Crowd Post-Edit
15.
Listening on the Topic of… Machine Translation
Analysis, Sentiment, Classification, Reporting
The spike of posts around MT was because of a CNN (generic) program on MT
[Charts: Traffic, Sentiment, Trending]
16.
The Global Customer Support Challenge
Communicating with global customers who expect answers in real time
67% prefer online answers (45% will abandon purchase if hard)*
*Forrester Research, Inc., Navigate the Future of Customer Service
Channels: Chat, Email, Guided Self-help, Online Communities, IM, Social Networks, Knowledge Bases, Video
• Business on the Internet
- Pervasive real-time connectivity
- Smartphone Internet traffic exceeding desktop
- Real-time expectations
- Instant gratification syndrome
• Customers want convenience
- Social networks & search engines are primary gateways
- Preference for online answers
• Language exacerbates the problem
*Source: Common Sense Advisory Report “Automated Translation Technology”
17.
But Raw Online MT is not Appropriate for Business?
Real customer scenario in Home Improvement Retail Store support forum
• Regional or industry-specific vernacular
• Proper names
• Slang
• Branded terms
• Typos / misspellings / contractions
GeoFluent Output
Greetings greetings my handyman people, have heard the new regional director of Lowes, Generoso Caminante, (he who calls the shots!) is considering the consolidation of your DIY line within the framework of “You and Lowes” as brand and website. This means that the products most common DIY, from carpet, Tapcon screws to drywall Sheetrock will be available in a common portal. I’m really excited because it means that I will be able to select and purchase my materials in one place. Now I’ll have more time to do my chores, LOL! That is, it is phenomenal! What is your opinion of these events?
Generic Online MT
Salu Salu my little hand people, I’ve heard that the new regional manager of Lowes, generous Walker, (the K short COD!) This is considering the consolidation of our DIY line within the framework of “Tu and Lowes” such as brand and web site. This means that the products most common DIY, from folder, to handsome screws, to the plasterboard cheet rock will be available in a common portal. ‘toy really excited because signifika that I’ll be able to select and purchase my materials in one place. Now I’ll have more time to do my chapusas, JA! Or, is the pump! Q say about these events?
18.
GeoFluent: Real-time Automated Translation
Language Processing Engine, before translation:
• Fix slang, shortcuts, misspellings, etc.
• Identify branding and terminology
• Sequester sensitive data
Microsoft® Translator
Language Processing Engine, after translation:
• Output correction
• Preserve branding and terminology
• Restore sensitive data
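The sequester-and-restore idea around the MT engine can be illustrated with a small sketch. Everything here is an assumption made for illustration: the regular expression, the placeholder format and the function names are hypothetical, and the translation step is passed in as a plain callable, not a call to GeoFluent or Microsoft Translator.

```python
import re
from typing import Callable, Dict, Tuple

# Illustrative patterns only: e-mail addresses and runs of six or more digits.
SENSITIVE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+|\b\d{6,}\b")


def sequester(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace sensitive spans with placeholders; return masked text and a vault."""
    vault: Dict[str, str] = {}

    def mask(match):
        token = f"__SEQ{len(vault)}__"
        vault[token] = match.group(0)
        return token

    return SENSITIVE.sub(mask, text), vault


def restore(text: str, vault: Dict[str, str]) -> str:
    """Put the original sensitive spans back in place of their placeholders."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


def translate_protected(text: str, translate: Callable[[str], str]) -> str:
    """Mask sensitive data, translate the masked text, then restore the data."""
    masked, vault = sequester(text)
    return restore(translate(masked), vault)
```

The sensitive values never reach the translation callable, which only sees placeholders. A real engine may alter or drop placeholders, so a production pipeline would also need to verify that every token survives translation before restoring.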
20.
Disrupting Customer Support
Not about translation quality, it’s about call deflection, support costs, and customer satisfaction
For Pre-Sales Assistance:
• 11% increase in online conversions*
• 16% productivity increase for call center agents
* Where multilingual chat was previously unavailable
For Customer Support:
• 15% increase in call deflection
• 21% increase in CSAT among non-English speakers
Blended support cost: $150
Cost of a self-served translated page view: $0.15
Deflection rate: 0.5%
Number of translated page views needed for one deflection: 200
Total cost to get one deflection (200 x $0.15): $30.00
Savings per deflection ($150 - $30): $120.00
Net value per translated page view ($120/200): $0.60
Breakeven for medium customer (translated page views): 33,000
Breakeven: 1-2 months
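The deflection arithmetic on this slide can be reproduced with a short calculation. The sketch below uses the slide's three inputs ($150 blended support cost, $0.15 per translated page view, 0.5% deflection rate); the function and key names are hypothetical, and exact fractions are used to avoid floating-point rounding. The 33,000-view breakeven depends on a fixed cost the slide does not state, so it is not reproduced.

```python
from fractions import Fraction


def deflection_economics(support_cost, page_view_cost, deflection_rate):
    """Derive the per-deflection and per-view figures from the three inputs."""
    views_per_deflection = 1 / deflection_rate
    cost_per_deflection = views_per_deflection * page_view_cost
    savings_per_deflection = support_cost - cost_per_deflection
    net_value_per_view = savings_per_deflection / views_per_deflection
    return {
        "views_per_deflection": views_per_deflection,
        "cost_per_deflection": cost_per_deflection,
        "savings_per_deflection": savings_per_deflection,
        "net_value_per_view": net_value_per_view,
    }


# Inputs taken from the slide, expressed as exact fractions.
slide = deflection_economics(
    support_cost=Fraction(150),
    page_view_cost=Fraction(15, 100),
    deflection_rate=Fraction(5, 1000),
)

for name, value in slide.items():
    print(f"{name}: {float(value):.2f}")
```

Running it yields 200 views per deflection, $30 cost per deflection, $120 savings per deflection and $0.60 net value per translated page view, matching the figures on the slide.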
21.
Machine Translation as TRUE Disruptive Innovation
Microsoft Machine Translation Knowledge + Skype
It’s not about how close it is to human quality - it’s about the quality of the humans being close
SMT considerations: 300k translated words is considered the minimum volume to train a statistical MT engine, but this is only a simplifying guideline: more complex (morphologically rich) languages need more, while simpler languages and content may be OK with less
Hybrid engines with a good RBMT customization (dictionaries and rules) will need less
Type of content and quality of source are also key factors
Customization is not a binary value (yes or no, a couple of hours or several days). It is a matter of degree: the key is identifying the right amount and the right type of customization needed
Customization can be time-consuming in complex programs and requires expertise. In other cases, customization time is negligible in comparison with the time it takes to sell and implement the solution.
Microsoft Translator
The best-in-class Statistical Machine Translation engine
48 supported languages
Big data: Created by machine learning on billions of words of translated and monolingual material
Leverages Microsoft’s grammatical and syntactic parsers
Very good base-line output, particularly appropriate for unpredictable user-generated content
Microsoft Translator Hub allows training of MT systems with narrower content
GeoFluent by Lionbridge
Improves Microsoft Translator output through:
• Flexible customization to incorporate glossaries, technical terms, and brand names
• Pre-processing of slang, text shortcuts, and grammatical problems often seen in UGC
Seamless integration in third-party communication platforms
Created by the world’s largest translation company:
• Deep expertise in MT and other language technologies
• Broad linguistic expertise