This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, follow us on Twitter - #MosesCore
3. Agenda
• What is Statistical Machine Translation?
• What is Moses?
– Common misconceptions
• Coming up
• What can we do for you?
Moses by Hieu Hoang, University of Edinburgh
5. What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver, 1949
6. What is Statistical Machine Translation?
• NLP application
– search engines, text mining, etc.
• Big data
– bi-text from the Internet
• e.g. multilingual websites, documents
– large monolingual data
• Learn to translate
– from previous translations
– models of language
7. What is Statistical Machine Translation?
[Diagram, two panels]
• Training
– Training data (bi-text, monolingual data, dictionary) and linguistic tools go in
– An SMT system comes out: translation model, language model, lots of numbers…
• Using
– Source text is passed through the SMT system (translation model, language model, lots of numbers…)
8. What is a model?
• Translation Model
• Language Model
– (of the target language)
thanks to Precision Translation Tools
9. What is a model?
• Translation model
– source → translation
– probability

source          target          probability
den Vorschlag   the proposal    0.6227
                's proposal     0.1068
                a proposal      0.0341
                the idea        0.0250
                this proposal   0.0227
                proposal        0.0205
….              ….              ….
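The translation-model entries above can be read as a lookup from a source phrase to scored candidate translations. The following is a minimal sketch, assuming a plain in-memory dictionary; the names `PHRASE_TABLE` and `best_translations` are hypothetical, and this is not the Moses phrase-table format or API.

```python
# Toy phrase table built from the entries on the slide.
# Hypothetical structure for illustration; not how Moses stores its tables.
PHRASE_TABLE = {
    "den Vorschlag": [
        ("the proposal", 0.6227),
        ("'s proposal", 0.1068),
        ("a proposal", 0.0341),
        ("the idea", 0.0250),
        ("this proposal", 0.0227),
        ("proposal", 0.0205),
    ],
}


def best_translations(source_phrase, table=PHRASE_TABLE, n=3):
    """Return the n most probable (target, probability) pairs for a source phrase."""
    candidates = table.get(source_phrase, [])
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


for target, prob in best_translations("den Vorschlag"):
    print(f"{target}\t{prob:.4f}")
```

A decoder does not simply pick the top entry: it weighs these scores against the language model, which is why lower-ranked candidates are kept in the table.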
10. What is a model?
• Language model
– Likelihood of sentence
– in target language

text                     probability
I would like             0.489
would like to            0.905
like to commend          0.002
to commend the           0.472
commend the rapporteur   0.147
….                       ….
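The language-model entries above can be combined into a score for a whole sentence. The sketch below assumes each entry is a trigram probability, that is, the probability of the third word given the first two; the slide only labels the column "probability", so that reading is an assumption. The function name `sentence_log_prob` and the `unseen` floor are hypothetical, and sentence-start and sentence-end terms are ignored.

```python
import math

# Trigram probabilities from the slide, read here as
# p(third word | first two words). That reading is an assumption.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def sentence_log_prob(words, probs=TRIGRAM_PROB, unseen=1e-7):
    """Sum the log-probabilities of every trigram in the word sequence.

    `unseen` is a hypothetical floor for trigrams missing from the table;
    real language models use smoothing and back-off instead.
    """
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        total += math.log(probs.get(trigram, unseen))
    return total


fluent = "I would like to commend the rapporteur".split()
scrambled = "like I would to commend the rapporteur".split()
print(sentence_log_prob(fluent))
print(sentence_log_prob(scrambled))
```

Working in log space avoids numerical underflow on long sentences. The scrambled word order scores far lower because most of its trigrams are missing from the table, which is how a language model steers output toward fluent text.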
12. What is Moses?
• Replacement for Pharaoh
– Academic software
– Closed-source
• Open source
• Re-written, clean code
– More features
• Large developer community
– Initiated by Hieu Hoang
– Developed at NLP Workshop
13. Agenda
• What is Statistical Machine Translation?
• What is Moses?
– Timeline
– Common misconceptions
• Coming up
• What can we do for you?
14. What is Moses?
Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow
15. Only works on Linux
• Tested on
– Windows 7 (32-bit) with Cygwin 6.1
– Mac OSX 10.7 with MacPorts
– Ubuntu 12.10, 32 and 64-bit
– Debian 6.0, 32 and 64-bit
– Fedora 17, 32 and 64-bit
– openSUSE 12.2, 32 and 64-bit
• Project files for
– Visual Studio
– Eclipse on Linux and Mac OSX
16. Difficult to use
• Easier compile and install
– Boost bjam
– No installation required
• Binaries available for
– Linux
– Mac
– Windows/Cygwin
– Moses + Friends
• IRSTLM
• GIZA++ and MGIZA
• Ready-made models trained on Europarl
17. Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
– Run end-to-end training
– http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
– Phrase-based, hierarchical, factored
– 8 language-pairs
– http://www.statmt.org/moses/RELEASE-1.0/models/
18. Only phrase-based model
– replacement for Pharaoh
– extension of Pharaoh
• From the beginning
– Factored models
– Lattice and confusion network input
– Multiple LMs, multiple phrase-tables
• Since 2009
– Hierarchical model
– Syntactic models
19. Developed by one person
• ANYONE can contribute
– 50 contributors
[Bar chart: ‘git blame’ of Moses repository; axis from 0% to 40%]
20. Slow
Decoding
thanks to Ken!!
[Chart: model score (y-axis, -101.7 to -101.4) against CPU seconds per sentence, excluding loading (x-axis, 1 to 5), for Moses, cdec and Joshua]
21. Slow
Training
• Multithreaded
• Reduced disk IO
– compress intermediate files
• Reduce disk space requirement

               Time (mins)                                    Size (MB)
               1-core   2-cores     4-cores     8-cores
Phrase-based   60       47 (79%)    37 (63%)    33 (56%)     893
Hierarchical   1030     677 (65%)   473 (45%)   375 (36%)    8300
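The training times above can also be expressed as speedup factors relative to a single core. This is a small sketch using only the numbers from the table; the names `TRAINING_MINUTES` and `speedups` are hypothetical.

```python
# Training times in minutes, copied from the table on the slide.
TRAINING_MINUTES = {
    "phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}


def speedups(times_by_cores):
    """Return {cores: single-core time divided by the time at that core count}."""
    baseline = times_by_cores[1]
    return {cores: baseline / minutes for cores, minutes in times_by_cores.items()}


for model, times in TRAINING_MINUTES.items():
    for cores, factor in speedups(times).items():
        print(f"{model}: {cores} cores -> {factor:.2f}x")
```

On these figures, hierarchical training benefits more from extra cores (roughly 2.7x on 8 cores) than phrase-based training (roughly 1.8x).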
23. What is Moses?
Common Misconceptions
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO
25. Coming up…
• Code cleanup
• Incremental Training
• Better translation
– smaller model
– bigger data
– faster training and decoding
• Applications
– CAT tools
– Speech translation
26. Applications
Computer-Aided Translation
• EU Project
– CASMACAT
– MATECAT
28. What can we do for you?
– simpler Moses
– graphical interface
– Windows compatibility
– terminology and glossary
– incremental training
• What can you do for us?
– code
– data
– funding