This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore
TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & Deployment of Moses, Tony O’Dowd, KantanMT, 10 October 2013
1. TAUS MACHINE TRANSLATION SHOWCASE
Creating Competitive Advantage with Rapid Customization & Deployment of Moses
10:20 – 10:30, Thursday, 10 October 2013
Tony O’Dowd, KantanMT
2. No Hardware. No Software. No Hassle MT.
Tony O’Dowd, Founder & Chief Architect
Localization World 2013
3. What we aim to cover today?
- User Scenario #1
  - Building Production MT Systems
    - Structured Approach
    - Build – Measure – Learn Process
- User Scenario #2
  - Retraining with Post-Edits
    - RoundTable Inc. – their story
- User Scenario #3
  - Selecting the best engine for the job
    - Milengo – their approach
    - Getting the Translator involved
- Q&A
20 Minutes
TAUS – MT Showcase
4. What is KantanMT.com?
- Statistical MT System
- Cloud-based
  - Highly scalable
  - Inexpensive to operate
  - Quick to deploy
- Our Vision
  - To put Machine Translation Customization, Improvement and Deployment into your hands
Fully Operational: 7 months
Active KantanMT Engines: 6,632
Training Words Uploaded: 23,653,605,925
Member Words Translated: 362,291,925
5. Measure – KantanMT engine calibration
- Track using KantanWatch™
- Compare engines quickly
- Monitor production data
- Use your own test/tune data sets
7. Learn – KantanMT Experimentation
- What to look out for?
  - BLEU: 24%
  - F-Measure: 50%
  - TER: 66%
  - Wordcount: 172K
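The slide names the scores without defining them. As a rough illustration only, the sketch below computes a unigram F-measure and a simplified word-level edit rate for one hypothesis against one reference. It is not KantanMT's implementation: the function names are hypothetical, whitespace tokenisation is an assumption, and true TER also permits block shifts, which this simplification omits.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F1 between a hypothesis and a reference translation."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped overlap: each reference word can be matched at most once.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplification (assumption): real TER also counts block shifts
    as single edits; this sketch only uses insert/delete/substitute.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

Higher is better for the F-measure, while lower is better for the edit rate, since it approximates the editing effort needed to reach the reference.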
8. Learn – KantanMT Experimentation
- Learn from examining the output
  - Catalogue Errors
    - Untranslated text
    - Incorrect numeric formatting
    - Invalid characters
    - High level of post-editing required
  - Conclusions
    - Engine coverage is bad due to low wordcount
    - Post-Editing is high due to low engine coverage
    - Training data doesn’t contain correct numeric formatting
    - Bad formatting in training data
Ratings shown: Low, OK, High, Low
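The first three error types catalogued above can be flagged automatically before a human looks at the output. The sketch below is one possible heuristic, not a KantanMT feature: the function name, the error labels and the four-letter threshold are all assumptions, and the untranslated-text check will also fire on legitimate cognates and product names.

```python
import re
import unicodedata

# A number with optional thousands/decimal separators, e.g. 1,000 or 3.14
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# Runs of four or more letters (any script), used as a cheap word filter
WORD = re.compile(r"[^\W\d_]{4,}")


def catalogue_errors(source: str, target: str) -> list[str]:
    """Return heuristic error labels for one source/MT-output pair."""
    errors = []

    # Untranslated text: a longer source word appears unchanged in the target.
    src_words = {w.lower() for w in WORD.findall(source)}
    tgt_words = {w.lower() for w in WORD.findall(target)}
    if src_words & tgt_words:
        errors.append("untranslated_text")

    # Numeric formatting: the number of numeric tokens differs.
    if len(NUMBER.findall(source)) != len(NUMBER.findall(target)):
        errors.append("numeric_formatting")

    # Invalid characters: replacement character or stray control codes.
    if any(
        ch == "\ufffd"
        or (unicodedata.category(ch) == "Cc" and ch not in "\t\n\r")
        for ch in target
    ):
        errors.append("invalid_characters")

    return errors
```

Counting the labels over a test set gives a simple error catalogue to compare between engine builds.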
9. Learn – KantanMT Experimentation
- Learn from examining the output
  - Action Plan
    - Coverage – More training data required, relevant and of high quality. Also use a Glossary File to improve terminology consistency and accuracy.
    - Numeric Formatting – Use PEX rule to post-edit translation and fix numeric formats
    - Invalid Character – Use PEX rule to fix this invalid character issue
    - Post-Editing – By increasing the quantity of training data the KantanMT engine will perform better overall
Ratings shown: Low, OK, High, Low
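The slides do not show what a PEX rule looks like, so the following is not PEX syntax. It is a plain Python sketch of the same idea: ordered search-and-replace rules applied to MT output. The rule list, the helper names and the English-to-German number convention are assumptions chosen for illustration.

```python
import re

# Swap "," and "." so that 1,234.56 becomes 1.234,56 (assumed target locale).
SWAP = str.maketrans(",.", ".,")


def swap_separators(match: re.Match) -> str:
    return match.group(0).translate(SWAP)


# Hypothetical rules, applied in order: (compiled pattern, replacement).
RULES = [
    # Numbers with at least one thousands group and an optional decimal part.
    (re.compile(r"(?<!\d)\d{1,3}(?:,\d{3})+(?:\.\d+)?(?!\d)"), swap_separators),
    # Unicode replacement character and stray control characters.
    (re.compile(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f]"), ""),
]


def post_edit(text: str) -> str:
    """Apply every rule to the MT output and return the corrected text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Rules of this kind only patch the output. The action plan pairs them with more and cleaner training data, which addresses the cause.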
10. Action Plan – focus on improving measurements
11. Build Measure Learn: The Results
- Analyse output
  - Untranslated text
  - Numeric Formatting
  - Invalid Character
12. User Scenario #2
- Long history of MT usage
  - In-house expertise
  - Large customer demand
- Using MT since 2005
- Now manage their own in-house system on KantanMT.com
- Goal
  - Faster project turnaround times
  - More service offerings to client base
  - More production capacity
  - Cost efficiencies
About RoundTable Studio
RoundTable Studio is a leading provider of translation and localization services for the Spanish and Brazilian Portuguese language markets.
Early Adopter
13. User Scenario #2
- Business Scenario
  - Continuous translation quality improvement
  - Reduced post-editing/turn-around times
14. User Scenario #2
- Results
  - Greater production capacity
  - Improvement in quality
  - Faster project turn-around times
“Since signing up with KantanMT, we have been able to take on more work and increase our capacity levels”
Laura Grossi – MT Specialist, RoundTable Studio
15. User Scenario #3
- Long history of MT usage
  - In-house expertise
  - Large customer demand
- Originally outsourced MT
  - 3rd party consultancy company
- Vendor Agnostic
  - Microsoft Translator Hub
  - KantanMT.com
  - All systems are cloud based
- Like hands-on approach to managing their own MT engines
About Milengo
Milengo provides translation, localization and related language services specializing in software, website and documentation localization.
16. User Scenario #3
- Business Scenario
  - Select best engine for language combination
  - Client requests a job that involves an MT component
- Finding Training Data
  - Data is aggregated from the client’s previous translations
- Building Engines
  - Same training data is provided to each engine
  - Same language combinations
  - Iterative process until satisfied with system performance (internal process)
17. User Scenario #3
- Translation Quality Analysis
  - Sample of 1,000 segments selected
  - Tabulated & anonymised
  - Dispatched to Senior Translators
Categories shown on the slide: Source; MT Target; Tech; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (Score 1-5); Adequacy (Score 1-5)
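The sampling and anonymising step can be pictured as follows. This is a sketch under stated assumptions rather than Milengo's actual procedure: the slide does not say what is anonymised, so the code assumes it is the engine identity, and the function name, row layout and "System A/B" labels are hypothetical.

```python
import random


def build_blind_sample(outputs, sample_size=1000, seed=0):
    """Sample the same segments from every engine and hide engine names.

    outputs: dict mapping engine name -> list of MT outputs, aligned by
             segment index (assumption: all engines translated the same set).
    Returns (rows, key) where rows go to the reviewers and key stays private.
    """
    rng = random.Random(seed)
    lengths = {len(segments) for segments in outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all engines must cover the same segments")
    n_segments = lengths.pop()

    # Same segment indices for every engine, so scores are comparable.
    indices = sorted(rng.sample(range(n_segments), min(sample_size, n_segments)))

    # Shuffle engines before labelling so "System A" reveals nothing.
    engines = sorted(outputs)
    rng.shuffle(engines)
    key = {f"System {chr(ord('A') + i)}": name for i, name in enumerate(engines)}

    rows = [
        {"segment": i, "system": label, "mt_target": outputs[name][i]}
        for i in indices
        for label, name in key.items()
    ]
    return rows, key
```

The rows can be tabulated with the scoring categories as extra columns, while the key is only used once the reviews come back.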
18. User Scenario #3
- Feedback collated from Senior Translators
  - Match best engine for language quality
  - Very unique – pseudo-crowd sourcing of most appropriate engine
  - Match engine to best language support
  - Translators always involved in engine selection process
- Feedback to client
  - Match requirements and quality expectations
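Collating the translators' feedback into an engine choice per language combination might look like the sketch below. The function name and input layout are hypothetical, and weighting fluency and adequacy equally is an assumption; the slides do not say how the scores are combined.

```python
from collections import defaultdict
from statistics import mean


def best_engine_per_language(ratings):
    """Pick the highest-scoring engine for each language pair.

    ratings: iterable of (language_pair, engine, fluency, adequacy),
             with fluency and adequacy on the 1-5 scale.
    Returns {language_pair: (engine, average_score)}.
    """
    scores = defaultdict(list)
    for language_pair, engine, fluency, adequacy in ratings:
        # Assumption: both dimensions count equally.
        scores[(language_pair, engine)].append((fluency + adequacy) / 2)

    best = {}
    for (language_pair, engine), values in scores.items():
        average = mean(values)
        if language_pair not in best or average > best[language_pair][1]:
            best[language_pair] = (engine, average)
    return best
```

For example, with placeholder engine names:

```python
ratings = [
    ("en-de", "A", 4, 4),
    ("en-de", "A", 5, 5),
    ("en-de", "B", 3, 5),
    ("en-fr", "A", 2, 2),
    ("en-fr", "B", 2, 3),
]
print(best_engine_per_language(ratings))
```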
19. User Scenario #3
- Levels of post-editing services
  - Adequacy Review
    - All meaning expressed in the source segment appears in the translated segment
    - Structural integrity – tags, placeholders
    - Fit-for-purpose quality
  - Fluency Review
    - No grammar errors, excellent word selection and good syntax
    - Publishable quality
  - Client picks review
    - To fit budget, time-frame, audience, channel etc.