TAUS MACHINE TRANSLATION SHOWCASE
Moses Past, Present and Future
09:20 – 09:40
Wednesday, 12 June 2013
Hieu Hoang
University of Edinburgh

Statistical Machine Translation with Moses
Hieu Hoang
Localization World 2013

Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

Moses by Hieu Hoang, University of Edinburgh

What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver, 1949

What is Statistical Machine Translation?
• NLP application
  – search engines, text mining, etc.
• Big data
  – bi-text from the Internet
    • e.g. multilingual websites, documents
  – large monolingual data
• Learn to translate
  – from previous translations
  – models of language

What is Statistical Machine Translation?
Training: training data (bi-text, monolingual data) and linguistic tools (dictionary) → SMT system (translation model, language model: lots of numbers…)
Using: source text → SMT system (translation model, language model: lots of numbers…)
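At translation time the two models are combined. SMT decoders, Moses among them, score candidates with a weighted log-linear combination of features such as these. The sketch below ranks candidate translations of "den Vorschlag" using the translation-model probabilities from the phrase-table example in this talk. The language-model probabilities and the weights are made up for illustration, and a real decoder also handles segmentation, reordering and many more features.

```python
import math

# Translation-model probabilities p(target | source) for "den Vorschlag",
# taken from the phrase-table example in this talk.
TM = {
    "the proposal": 0.6227,
    "'s proposal": 0.1068,
    "a proposal": 0.0341,
    "the idea": 0.0250,
}

# HYPOTHETICAL language-model probabilities for each candidate in some fixed
# context; a real LM would score the whole target sentence.
LM = {
    "the proposal": 0.020,
    "'s proposal": 0.0005,
    "a proposal": 0.010,
    "the idea": 0.015,
}

# HYPOTHETICAL feature weights; in practice these are tuned on held-out data.
WEIGHTS = {"tm": 1.0, "lm": 1.0}


def loglinear_score(candidate: str) -> float:
    """Weighted sum of log-probabilities (higher is better)."""
    return (WEIGHTS["tm"] * math.log(TM[candidate])
            + WEIGHTS["lm"] * math.log(LM[candidate]))


def best_translation() -> str:
    """Candidate with the highest combined score."""
    return max(TM, key=loglinear_score)


if __name__ == "__main__":
    for cand in sorted(TM, key=loglinear_score, reverse=True):
        print(f"{cand:15s} {loglinear_score(cand):8.3f}")
    print("best:", best_translation())
```

With these assumed numbers the language model moves "the idea" ahead of "a proposal" and pushes "'s proposal" to the bottom, even though "'s proposal" has the second-highest translation-model probability.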
  

What is a model?
(thanks to Precision Translation Tools)
• Translation model
• Language model
  – of the target language

What is a model?
• Translation model
  – source → translation
  – probability

source           target          probability
den Vorschlag    the proposal    0.6227
                 's proposal     0.1068
                 a proposal      0.0341
                 the idea        0.0250
                 this proposal   0.0227
                 proposal        0.0205
…                …               …
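A phrase table like the one above is typically estimated by relative frequency: p(target | source) = count(source, target) / count(source), counted over phrase pairs extracted from word-aligned bi-text. The sketch below assumes the pairs are already extracted and uses made-up counts, so its probabilities do not match the slide's; real training pipelines add further scores and work at far larger scale. The function names are hypothetical.

```python
from collections import Counter

# HYPOTHETICAL extracted phrase pairs (source, target); in a real pipeline
# these come from word-aligned bi-text.
extracted = (
    [("den Vorschlag", "the proposal")] * 6
    + [("den Vorschlag", "'s proposal")] * 2
    + [("den Vorschlag", "a proposal")] * 1
    + [("den Vorschlag", "the idea")] * 1
)


def estimate_phrase_table(pairs):
    """Relative-frequency estimate of p(target | source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return {
        (src, tgt): n / source_counts[src]
        for (src, tgt), n in pair_counts.items()
    }


def translation_options(table, source):
    """Candidate translations of `source`, most probable first."""
    options = [(tgt, p) for (src, tgt), p in table.items() if src == source]
    return sorted(options, key=lambda option: option[1], reverse=True)


if __name__ == "__main__":
    table = estimate_phrase_table(extracted)
    for tgt, p in translation_options(table, "den Vorschlag"):
        print(f"{tgt:15s} {p:.4f}")
```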
  

What is a model?
• Language model
  – likelihood of a sentence
  – in the target language

text                     probability
I would like             0.489
would like to            0.905
like to commend          0.002
to commend the           0.472
commend the rapporteur   0.147
…                        …
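Reading each entry above as a trigram probability, e.g. 0.905 as p(to | would like) (our interpretation, since the slide lists only text and a number), the probability of a sentence is the product of its trigram probabilities. The sketch scores "I would like to commend the rapporteur" with the slide's five numbers, summing logs rather than multiplying. It skips the two sentence-initial words (no start-of-sentence context) and has no smoothing or back-off; a real language model would handle both.

```python
import math

# Trigram probabilities p(w3 | w1 w2) read off the language-model example.
# Assumed reading: "I would like 0.489" means p(like | I would) = 0.489.
TRIGRAMS = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def sentence_logprob(words):
    """Sum of log p(w_i | w_{i-2} w_{i-1}) for every i >= 2.

    The first two words are not scored, and an unseen trigram raises
    KeyError because this sketch has no smoothing or back-off.
    """
    total = 0.0
    for i in range(2, len(words)):
        total += math.log(TRIGRAMS[tuple(words[i - 2:i + 1])])
    return total


if __name__ == "__main__":
    sentence = "I would like to commend the rapporteur".split()
    logp = sentence_logprob(sentence)
    print(f"log p = {logp:.3f}, p = {math.exp(logp):.3e}")
```

The product is about 6.1 × 10⁻⁵, and the rare trigram "like to commend" (0.002) contributes by far the smallest factor. Summing logs keeps such products from underflowing on long sentences.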
  

What is Moses?
• Replacement for Pharaoh
  – Academic software
  – Closed-source
• Open source
• Rewritten, clean code
  – More features
• Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop

Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Timeline
  – Common misconceptions
• Coming up
• What can we do for you?

What is Moses?
Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow

Only works on Linux
• Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OS X 10.7 with MacPorts
  – Ubuntu 12.10, 32- and 64-bit
  – Debian 6.0, 32- and 64-bit
  – Fedora 17, 32- and 64-bit
  – openSUSE 12.2, 32- and 64-bit
• Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OS X

Difficult to use
• Easier to compile and install
  – Boost bjam
  – No installation required
• Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends
    • IRSTLM
    • GIZA++ and MGIZA
• Ready-made models trained on Europarl

Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/

Only phrase-based model
  – replacement for Pharaoh
  – extension of Pharaoh
• From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
• Since 2009
  – Hierarchical model
  – Syntactic models

Developed by one person
• ANYONE can contribute
  – 50 contributors
[Chart: ‘git blame’ of Moses repository; y-axis 0% to 40%]
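The chart above is labelled ‘git blame’ of the Moses repository. One way such a chart could be produced (an assumption; the slide does not say how it was made) is to blame every tracked file and tally the lines attributed to each author. The sketch relies on git's `--line-porcelain` output, in which every blamed line carries its own `author <name>` header and the source text itself starts with a TAB; the function names are hypothetical. Blame credits only lines that survive in the current tree, so it measures current authorship rather than total contribution, and it does not merge one person's different name spellings.

```python
import subprocess
from collections import Counter


def author_line_counts(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output.

    Every blamed line has a header block containing "author <name>";
    source lines start with a TAB, so they never match the prefix.
    """
    counts = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def author_shares(counts: Counter) -> dict:
    """Convert per-author line counts into fractions of all lines."""
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()} if total else {}


def blame_repository(repo_dir: str) -> dict:
    """Blame every tracked file in `repo_dir` and return per-author shares."""
    files = subprocess.run(
        ["git", "ls-files"], cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    counts = Counter()
    for path in files:
        result = subprocess.run(
            ["git", "blame", "--line-porcelain", "--", path], cwd=repo_dir,
            capture_output=True, text=True, errors="replace",
        )
        if result.returncode == 0:  # skip paths git cannot blame
            counts += author_line_counts(result.stdout)
    return author_shares(counts)
```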
  

Slow
(thanks to Ken!!)
[Chart: Decoding. Model score vs. CPU seconds/sentence excluding loading, for Moses, cdec and Joshua]

Slow
• Multithreaded
• Reduced disk IO
  – compress intermediate files
• Reduce disk space requirement

Training: time (mins)
                1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based    60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical    1030     677 (65%)   473 (45%)   375 (36%)   8300
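The percentages in the training table appear to give each multi-core time as a fraction of the 1-core time (our reading; the slide does not say). The sketch recomputes them, plus the speed-up over one core, from the minutes shown. Recomputed from these rounded minutes, several values differ by a point or so from the printed ones (e.g. 37/60 ≈ 62%, where the slide shows 63%), so the slide's figures were presumably derived from unrounded timings; that is also an assumption.

```python
# Training times in minutes from the table, for 1, 2, 4 and 8 cores.
TIMES = {
    "Phrase-based": (60, 47, 37, 33),
    "Hierarchical": (1030, 677, 473, 375),
}
CORES = (1, 2, 4, 8)


def relative_times(times):
    """Each time as a whole-number percentage of the 1-core time."""
    base = times[0]
    return [round(100 * t / base) for t in times]


def speedups(times):
    """Speed-up over 1 core, rounded to two decimals."""
    base = times[0]
    return [round(base / t, 2) for t in times]


if __name__ == "__main__":
    for model, times in TIMES.items():
        print(model)
        for cores, pct, s in zip(CORES, relative_times(times), speedups(times)):
            print(f"  {cores} core(s): {pct:3d}% of 1-core time, {s:.2f}x")
```

On these numbers the hierarchical model gains more from extra cores than the phrase-based one: about 2.75× versus 1.82× at 8 cores.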
  

What is Moses?
Common Misconceptions
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier to compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical and syntax models
• Developed by one person → Everyone
• Slow → Fastest decoder, multithreaded training, less IO

Coming up…
• Code cleanup
• Incremental training
• Better translation
  – smaller model
  – bigger data
  – faster training and decoding
• Applications
  – CAT tools
  – Speech translation

Applications
Computer-Aided Translation
• EU projects
  – CASMACAT
  – MATECAT

What can we do for you?
  – simpler Moses
  – graphical interface
  – Windows compatibility
  – terminology and glossary
  – incremental training
• What can you do for us?
  – code
  – data
  – funding