This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore
3. Why MT?
The purpose
The Crude
§ Extent of localization
§ Data Mining & Business Intelligence
§ Globalized NLP
§ Triage for human translation
Research
§ Machine Learning
§ Statistical Linguistics
§ Same-language translation
The Good
§ Breaking down language barriers
§ Text, Speech, Images & Video
§ Language Preservation
NOT:
§ Spend less money
§ Take the job of human translators
§ Perform miracles
4. Microsoft Translator – Quick Facts
§ Linguistically informed statistical MT system
§ 41 languages – from any language to any other language
§ Runs in Microsoft Datacenter
§ Simple web service API: SOAP, REST, AJAX, OData, web site widget
§ 2 million characters/month free
§ Available in the Enterprise Agreement, as a monthly subscription
§ Available on-premise for situations requiring extreme confidentiality
§ Highly customizable:
– Collaborative Translations – Involve community, coworkers and customers
– Hub: Custom engine training via an easy-to-use UI
§ Web Scale
– Powers translations in Bing, Microsoft Office, Microsoft SharePoint, Internet Explorer, Yammer
– Powers translations in Facebook, Twitter, eBay, and many other government and enterprise sites
5. Microsoft Translator at a Glance
World-class Statistical Machine Translation
Built on over a decade of work at Microsoft Research
Big Data Powered
Trained with billions of “parallel” sentences (Bing index & licensed)
General Purpose System
Powers Bing Translator, supports 40+ languages, any-to-any
Unprecedented Customization Capability
Hub: train before translation; CTF: edit after translation
Powerful Cloud API
Rich, secure API enabling integrations, 99.9% availability
6. Enabling Translation in Many Products
Fully integrated across the stack, Translator extends the value of the Microsoft platform and of the solutions built on it, both for our customers and in consumer-facing applications such as Bing Translator, Bing Toolbar, Bing Dictionary, and the Windows Phone app.
A few of our customers and partners… and 80,000+ more.
7. Powerful Tools and Customization
Our machine learning and big-data-based translation technology brings instant translation to users, developers, webmasters, translators and businesses, breaking down language barriers. Robust, industry-leading tools such as the Hub and CTF allow unprecedented customization of the translation experience.
Powerful API
§ Instant translation and language services in web, desktop and mobile applications
§ Highly scalable, robust, cloud-based machine translation service from Microsoft
§ Supports SOAP, REST, AJAX, OData, and the Translator web site translation widget
§ Extensibility for development on SharePoint, Office, Windows Phone, and more
Widget
§ Instant translations of web pages without the need to write any code
§ Use the AJAX API to roll your own widget
§ Use the integrated "Collaborative Translations" (CTF) functionality to tap into your community
Hub
§ Custom translation portal to build, train, and deploy customized automatic language translation systems
§ Combine your data with Bing big data to tune the translation output to best fit your content
§ Free with any level of Translator subscription (including the free tier)
CTF
§ Override, modify or vote for the translated output to best fit the content
§ Provide the end user with alternative translations
§ Import the edits back into Hub for further training
8. Integrates with your TM tool
Top translation tools support Microsoft Translator
9. Give these a try! (Demo)
Bing Translator
Lync Conversation Translator
Translator Widget for Webpages
Word Web App
Contextual Thesaurus
10. Price
Competitively priced
§ Monthly subscription
§ Free for up to 2 million characters per month
§ Base price: $10 per million characters
§ Discounted for higher volumes
§ Paid by credit card or via Microsoft Enterprise agreement
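To make the pricing bullets concrete, here is a rough monthly cost estimator. It is a sketch under simplifying assumptions that are not on the slide: the 2 million free characters are deducted from monthly usage and everything above them is billed linearly at the $10 base price, with volume discounts and the actual subscription tiers ignored. The function name `estimate_monthly_cost` is hypothetical.

```python
# Figures from the slide; the billing model below is a simplifying assumption.
FREE_CHARS_PER_MONTH = 2_000_000
BASE_PRICE_PER_MILLION_USD = 10.0


def estimate_monthly_cost(chars: int) -> float:
    """Rough monthly cost in USD.

    Hypothetical billing model: the free allowance is deducted and the rest
    is billed linearly at the base price. Volume discounts and the real
    subscription tiers are ignored.
    """
    if chars < 0:
        raise ValueError("character count must be non-negative")
    billable = max(0, chars - FREE_CHARS_PER_MONTH)
    return billable / 1_000_000 * BASE_PRICE_PER_MILLION_USD


if __name__ == "__main__":
    for volume in (1_500_000, 2_000_000, 12_000_000):
        print(f"{volume:>12,} characters -> ${estimate_monthly_cost(volume):,.2f}")
```

Under these assumptions, 12 million characters in a month would come to $100.00 before any volume discount.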
11. Extent of localization
Methods of applying MT
Post-Editing
§ Goal: Human translation quality
§ Increase human translator's productivity
§ In practice: 0% to 25% productivity increase
– Varies by content, style and language
Raw publishing
§ Goals:
– Good enough for the purpose
– Speed
– Cost
§ Publish the output of the MT system directly to end user
§ Best with bilingual UI
§ Good results with technical audiences
12. Extent of localization
Methods of applying MT
Post-Editing
§ Goal: Human translation quality
§ Increase human translator's productivity
§ In practice: 0% to 25% productivity increase
– Varies by content, style and language
Post-Publish Post-Editing ("P3")
§ Know what you are translating, and why
§ Make use of community experts
– Domain enthusiasts
– Employees
– Professional translators
§ Best of both worlds
– Fast
– Better than raw
– Always current
Raw publishing
§ Goals:
– Good enough for the purpose
– Speed
– Cost
§ Publish the output of the MT system directly to end user
§ Best with bilingual UI
§ Good results with technical audiences
13. The Triangle
You can have only two. Not anymore!
[Diagram: a triangle with Price, Quality and Speed at its corners, and P3 placed within it]
P3: Post-Publish Post-Editing
14. The cost/quality curve
Optimize for the knee
[Figure: user satisfaction plotted against cost ($), annotated "Good enough for the intended purpose", with content labels "Highly visible marketing content" and "Low pageview supporting content"]
§ No cost: no translation
§ Low cost: MT + TM + community
§ High cost: fully qualified HT
§ Very high cost: expert-reviewed translation/transcreation
18. Collaboration: MT + Your community
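[Diagram: your web site or your app sends a translation request to the Microsoft Translator API; the API looks for a match in the collaborative TM first and translates with Microsoft Translator only if there is no match, then returns the response. Your community contributes to the collaborative TM.]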
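The match-first flow can be sketched in a few lines of Python. This is an illustration, not the Microsoft Translator API: the TM is a plain dictionary keyed by exact source text (mirroring the 100% matching TM mentioned on this slide), and `translate_with_tm`, `machine_translate` and `fake_mt` are hypothetical names, the last two standing in for a call to the MT service.

```python
from typing import Callable, Dict


def translate_with_tm(
    text: str,
    tm: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> str:
    """Return the TM translation on an exact (100%) match, else fall back to MT.

    `tm` maps source sentences to translations for one language pair;
    `machine_translate` is a hypothetical stand-in for the MT service call.
    """
    match = tm.get(text)  # exact lookup only: no fuzzy matching
    if match is not None:
        return match
    return machine_translate(text)


def fake_mt(text: str) -> str:
    """Placeholder MT engine used only for this demo."""
    return f"<MT:{text}>"


if __name__ == "__main__":
    tm = {"Hello, world!": "Hallo, Welt!"}
    print(translate_with_tm("Hello, world!", tm, fake_mt))  # Hallo, Welt!
    print(translate_with_tm("Good night", tm, fake_mt))     # <MT:Good night>
```

Checking the TM before calling MT means community-approved wording always wins over machine output for sentences it already covers.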
Collaborative TM entries:
§ Rating 1 to 4: unapproved
§ Rating 5 to 10: Approved
§ Rating -10 to -1: Rejected
1 to many is possible
What makes this possible – fully integrated 100% matching TM
[Diagram label: Enormous language knowledge]
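The rating bands above, combined with the 1-to-many case, could be modeled as follows. The selection rule (return the highest-rated approved entry and ignore unapproved and rejected ones) is an assumption made for illustration; the slide does not say how competing entries are chosen. `TmEntry`, `rating_status` and `best_approved` are hypothetical names, and a rating of 0, which falls outside the listed bands, is treated as an error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TmEntry:
    source: str
    translation: str
    rating: int  # -10..10 per the slide; 0 is not covered by any band


def rating_status(rating: int) -> str:
    """Map a rating to the bands listed on the slide."""
    if 5 <= rating <= 10:
        return "approved"
    if 1 <= rating <= 4:
        return "unapproved"
    if -10 <= rating <= -1:
        return "rejected"
    raise ValueError(f"rating {rating} is outside the documented bands")


def best_approved(entries: List[TmEntry], source: str) -> Optional[str]:
    """Pick the highest-rated approved translation for `source`, if any.

    A source may have several entries (1-to-many); choosing the top-rated
    approved one is an assumption for illustration.
    """
    approved = [
        e for e in entries
        if e.source == source and rating_status(e.rating) == "approved"
    ]
    if not approved:
        return None
    return max(approved, key=lambda e: e.rating).translation


if __name__ == "__main__":
    entries = [
        TmEntry("Save", "Speichern", 7),
        TmEntry("Save", "Sichern", 9),
        TmEntry("Save", "Retten", -3),
        TmEntry("Open", "Öffnen", 2),
    ]
    print(best_approved(entries, "Save"))  # Sichern
    print(best_approved(entries, "Open"))  # None (only an unapproved entry)
```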
23. Measuring Quality: Human Evaluations
Knowledge powered by people
§ Absolute
§ 3 to 5 independent human evaluators are asked to rate translation quality for 250 sentences on a scale of 1 to 4
– Compared against a human-translated reference sentence
– No source-language knowledge required
§ Scale:
– 4 (Ideal): Grammatically correct, all information included
– 3 (Acceptable): Not perfect, but definitely comprehensible, and with accurate transfer of all important information
– 2 (Possibly Acceptable): May be interpretable given context/time; some information transferred accurately
– 1 (Unacceptable): Absolutely not comprehensible and/or little or no information transferred accurately
Also: relative evaluations, against a competitor or against a previous version of our own system
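One simple way to summarize such an absolute evaluation is sketched below: average each sentence's ratings across evaluators, then report the overall mean and the share of sentences rated Acceptable or better. Counting a sentence as acceptable when its mean rating is 3 or more is an assumption for illustration, not a rule from the slide, and `summarize_absolute_eval` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, List


def summarize_absolute_eval(scores: Dict[str, List[int]]) -> Dict[str, float]:
    """Summarize 1-4 absolute ratings.

    `scores` maps a sentence id to the ratings given by 3 to 5 evaluators.
    Counting a sentence as acceptable when its mean rating is >= 3 is an
    illustrative assumption.
    """
    if not scores:
        raise ValueError("no sentences to summarize")
    for sentence_id, ratings in scores.items():
        if not 3 <= len(ratings) <= 5:
            raise ValueError(f"{sentence_id}: expected 3 to 5 evaluators")
        if any(r not in (1, 2, 3, 4) for r in ratings):
            raise ValueError(f"{sentence_id}: ratings must be 1, 2, 3 or 4")
    per_sentence = [mean(ratings) for ratings in scores.values()]
    return {
        "mean_score": mean(per_sentence),
        "acceptable_share": sum(m >= 3 for m in per_sentence) / len(per_sentence),
    }


if __name__ == "__main__":
    # Toy data for 3 sentences; the evaluation described above uses 250.
    demo = {"s1": [4, 4, 3], "s2": [2, 3, 2], "s3": [3, 3, 4, 4]}
    print(summarize_absolute_eval(demo))
```

Averaging per sentence first keeps sentences with five evaluators from outweighing those with three.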
24. Measuring Quality: BLEU*
Cheap and effective – but be aware of the limits
§ A fully automated MT evaluation metric
– Modified N-gram precision, comparing a test
sentence to reference sentences
§ Standard in the MT community
– Immediate, simple to administer
– Correlates with human judgments
§ Automatic and cheap: runs daily and for
every change
§ Not suitable for cross-engine or cross-language evaluations
* BLEU: BiLingual Evaluation Understudy
Results are always relative to the test set.
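A minimal sentence-level BLEU in Python shows what "modified n-gram precision" means: candidate n-gram counts are clipped to the maximum count seen in any single reference, the clipped precisions for n = 1..4 are combined by a geometric mean, and a brevity penalty punishes output shorter than the reference. This is an unsmoothed sketch; BLEU as normally reported is computed at corpus level over the whole test set, so an established implementation (for example sacreBLEU) is the safer choice for real evaluations. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: Sequence[str], references: List[Sequence[str]], n: int) -> float:
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in any single reference."""
    cand = ngram_counts(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / total


def sentence_bleu(candidate: Sequence[str], references: List[Sequence[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Effective reference length: the reference closest in length (ties -> shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    print(sentence_bleu(ref, [ref]))                 # 1.0
    degenerate = "the the the the the the the".split()
    print(modified_precision(degenerate, [ref], 1))  # 2/7, not 7/7
    print(sentence_bleu(degenerate, [ref]))          # 0.0
```

The degenerate example is why clipping matters: plain unigram precision would give it 7/7 even though it is useless as a translation.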
25. Measuring Quality In Context
Real-world data
§ Instrumentation to observe users' behavior
§ A/B testing
§ Polling
In-Context gives you the most useful results
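For A/B testing, a common way to check whether a difference in a rate (for example, the knowledge base resolve rate discussed in this deck) is more than noise is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_z_test` and the counts in the usage example are hypothetical, not data from the deck.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all outcomes identical; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: articles resolved out of articles viewed, per variant.
    z, p = two_proportion_z_test(success_a=120, n_a=1000, success_b=150, n_b=1000)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The z-test relies on a normal approximation, so it is only reasonable when each variant has a fair number of both successes and failures.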
30. Knowledge Base Resolve Rate
[Chart: knowledge base resolve rate, human translation vs. machine translation. Source: Martine Smets, Microsoft Customer Support]
Microsoft is using a customized version of Microsoft Translator.
31. Statistical MT - The Simple View
[Diagram: data sources (government data, Microsoft manuals, dictionaries, phrasebooks, publisher data, web-mined data) → collect and store parallel and target-language data → train statistical models on a high-performance computing cluster → translation engines in a distributed runtime → translation APIs and UX, which take user input (text, web pages, chat, etc.) and return translated output]
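The "train statistical models" step is usually written, in textbook phrase-based SMT (the approach behind Moses), as a log-linear model. This is the generic formulation, offered as background; the slide does not specify which models or features Microsoft Translator's linguistically informed system uses.

```latex
\begin{aligned}
P(e \mid f) &= \frac{\exp\!\left(\sum_{i=1}^{M} \lambda_i\, h_i(e, f)\right)}
                    {\sum_{e'} \exp\!\left(\sum_{i=1}^{M} \lambda_i\, h_i(e', f)\right)} \\[4pt]
\hat{e} &= \arg\max_{e}\; P(e \mid f) = \arg\max_{e} \sum_{i=1}^{M} \lambda_i\, h_i(e, f)
\end{aligned}
```

Here $f$ is the source sentence, $e$ a candidate translation, $h_i$ are feature functions (typically translation-model scores learned from the parallel data and a language-model score learned from the target-language data), and $\lambda_i$ are weights tuned on held-out data. The normalizer does not depend on $e$, so it drops out of the argmax. The classic noisy-channel model is the special case $h_1 = \log P(f \mid e)$, $h_2 = \log P(e)$, $\lambda_1 = \lambda_2 = 1$.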
32. Collaboration: MT + Your community
[Diagram: the collaboration flow from slide 18, with your web site, your app, your community, the collaborative TM, and the Microsoft Translator API, backed by enormous language knowledge]
Remember the collaborative TM? There is more.
33. Collaboration: You, your community, and Microsoft
You, your community and Microsoft
working together to create the optimal MT
system for your terminology and style
[Diagram: your web site and your app use the Microsoft Translator API together with the collaborative TM (fed by your community) and your custom MT system, which is built in the Microsoft Translator Hub from your TMs and your previously translated documents with your collaborators, on top of enormous language knowledge]
40. Office 2013 Beta
Send-a-smile program
§ 107 languages
§ 234M words translated
§ $22B revenue, > 60% outside U.S.
§ > 100,000 Send-a-smiles received
§ > 500 bugs fixed
Example of business intelligence use