WeMT Tools and Processes, a presentation by Olga Beregovaya at Localization World 2013 in Silicon Valley. Presented during TAUS Showcase. Discussion of automation and machine translation programs. Welocalize is the leader in localization and translation solutions.
3. Current MT Programs
Dell – 27 languages
Autodesk – 11 languages
PayPal – 8 languages
Cisco – 17 languages across 3 tiers
Intuit – 20+ languages
Microsoft (pre-project support)
McAfee (pilot)
… many more in pilot stage
4. MT Program: Path-to-Success
Components
A set of MT engines – “mix and match”
TMT Selection Mechanisms
Post-editing Environment
Processes and metrics
Data gathering and reporting tool – what, how much, how fast and at what effort
EDUCATION EDUCATION EDUCATION
CHANGE
The recipe for success
5. Process and Workflow
All aspects of the localization ecosystem are taken into consideration
Selecting the right MT engine
By using our MT engine selection Scorecard we make sure all important KPIs are taken into consideration at selection time
Empowerment through education
Internal, through customized toolkits; external, through specialized training
The feedback loop
Constructive communication from post-editor to MT provider
MT KPIs:
Productivity: Throughputs
Productivity: Delta
Quality: LQA
Quality: Automatic Scores
Cost
GlobalSight: Connectivity
GlobalSight: Tagging
Human Evaluation
Customization: Internal/External
Customization: Time
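The slides do not say how the KPIs are combined into a single selection decision. One plausible reading is a weighted average of normalized KPI scores, sketched below. The function names, the [0, 1] normalization and any weights are assumptions made for illustration, not the actual Welocalize scorecard.

```python
from typing import Dict, List, Optional, Tuple


def scorecard_total(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of KPI scores, each already normalized to [0, 1].

    Hypothetical helper: the weighting scheme is an assumption.
    If no weights are given, every KPI counts equally.
    """
    if not scores:
        raise ValueError("at least one KPI score is required")
    if weights is None:
        weights = {kpi: 1.0 for kpi in scores}
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing KPI scores: {sorted(missing)}")
    for kpi in weights:
        if not 0.0 <= scores[kpi] <= 1.0:
            raise ValueError(f"{kpi} must be normalized to [0, 1]")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[kpi] * scores[kpi] for kpi in weights) / total_weight


def rank_engines(candidates: Dict[str, Dict[str, float]],
                 weights: Optional[Dict[str, float]] = None
                 ) -> List[Tuple[str, float]]:
    """Return (engine name, total score) pairs, best engine first."""
    totals = [(name, scorecard_total(kpis, weights))
              for name, kpis in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_engines` with one dictionary of KPI scores per candidate engine gives an ordering that changes with the weights, which makes the trade-off between, say, cost and LQA results explicit at selection time.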
7. MT Engine Selection Scorecard
Productivity - Throughputs
Number of post-edited words per hour
Productivity - Delta
Percentage difference between translation and postediting time
Cost
Extrapolation, cost per word
CMS - Connectivity
Is there a connector in place?
Quality/Nature of source
Quality (Final) - LQA
Internal quality verification
Quality (MT) - Automatic Scores
A set of automatic scoring systems is used
“We have tested and used different engines so we’ve seen the good, the bad and the ugly; now we can better appreciate what we have”
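The productivity and cost rows of the scorecard reduce to simple arithmetic. The sketch below shows one way to compute them. Two points are assumptions rather than something the slides state: the delta is taken relative to translation time, and the cost per word is extrapolated from an hourly rate and the measured throughput. All function names are hypothetical.

```python
def throughput_words_per_hour(post_edited_words: int, hours: float) -> float:
    """Productivity - Throughputs: post-edited words per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return post_edited_words / hours


def productivity_delta(translation_hours: float,
                       postediting_hours: float) -> float:
    """Productivity - Delta: percentage difference between translation
    time and post-editing time for the same content.

    Assumption: the difference is expressed relative to translation time,
    so a positive value means post-editing was faster.
    """
    if translation_hours <= 0:
        raise ValueError("translation_hours must be positive")
    return (translation_hours - postediting_hours) / translation_hours * 100.0


def extrapolated_cost_per_word(hourly_rate: float,
                               words_per_hour: float) -> float:
    """Cost: cost per word extrapolated from an hourly rate and throughput.

    Assumption: cost scales linearly with post-editing time.
    """
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    return hourly_rate / words_per_hour
```

For example, 3,000 words post-edited in 4 hours is a throughput of 750 words per hour; if translating the same text from scratch would take 10 hours and post-editing takes 6, the delta under this definition is 40 percent.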
10. Transparency and Ownership
Theory – knowledge foundations
Practice – customized PE sessions for different client accounts
Transparency – process, engine selection/customization, evaluations
Responsibility – valid evaluations, constructive feedback, quality ownership
“Training helps a lot. After I was told some of the background information and tips and tricks for certain engines/outputs, I was much more relaxed and happy to give MT a go.”
11. Legacy data – best prediction tool
> Statistics from legacy knowledge base
12. The feedback loop
“For me the biggest advantage would be the possibility to implement a client terminology list [in SMT]”
“I wish we could easily fix the corpus for outdated terminology and characters”
“Teach the engine to properly cope with sentences containing more than one verb and/or verbs in progressive form”
“Engine retraining significantly improved the handling of tags and spaces around tags; this is a productive achievement, as it saves us a lot of manual corrections.”
14. “Beyond the Engine” Tools
• Teaminology – crowdsourcing platform for centralized term governance; simultaneous concordance search of TMs and term bases => clean training data
• Dispatcher – a global community content translation application that connects user-generated content (UGC), including live chats, social media, forums, comments and knowledge bases, to customized machine translation (MT) engines for real-time translation
• Source Candidate Scorer – scoring of candidate sentences against historically good and bad sentences based on POS and perplexity
• Corpus Preparation Toolkit – set of applications to maximize data preparation for MT engine training
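The perplexity half of the Source Candidate Scorer idea can be illustrated with a toy language model. The sketch below trains two add-one-smoothed bigram models, one on historically good source sentences and one on historically bad ones, and compares their perplexities for a candidate. It is an assumption-laden illustration and not the actual tool: the class and function names are hypothetical, the tokenizer is a plain whitespace split, and the POS-based part of the scoring is left out because it would need a tagger.

```python
import math
from collections import Counter
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = set()
        for sentence in sentences:
            tokens = tokenize(sentence)
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        # One extra slot stands in for any unseen word.
        self.vocab_size = len(vocab) + 1

    def perplexity(self, sentence: str) -> float:
        """Per-bigram perplexity of the sentence under this model."""
        tokens = tokenize(sentence)
        log_prob = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / (len(tokens) - 1))


def candidate_score(sentence: str,
                    good_model: BigramModel,
                    bad_model: BigramModel) -> float:
    """Positive when the sentence looks more like the good corpus
    (lower perplexity under the good model than under the bad one)."""
    return bad_model.perplexity(sentence) - good_model.perplexity(sentence)
```

A candidate sentence with a positive score resembles the historically good source material more than the bad; in practice the two corpora would be far larger and the threshold would be tuned on held-out data.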