Introduction
                                Methodology
                                  Discussion




Integrating Machine Translation with Translation
         Memory: A Practical Approach

            Panagiotis Kanavos and Dimitrios Kartsaklis


                                 November 4, 2010




  Panagiotis Kanavos and Dimitrios Kartsaklis   Integrating MT with TM: A Practical Approach   1/ 18


Introduction


      Despite ongoing research and progress in the field,
      Machine Translation has not been widely accepted by the
      professional translation industry
      Common criticisms:
              MT is only suitable for draft translations of e-mails and web
              pages
              MT is not efficient for morphologically rich languages
              MT is useful only to large companies owning a wealth of
              resources
      In a nutshell: MT is something for researchers to play around
      with





A Case Study


      How MT can be incorporated into professional translation
      workflows, with limited resources, in ways that significantly
      increase productivity.
      We combine both statistical and rule-based MT systems with
      Translation Memory software using two approaches:
             The on-demand, sentence-by-sentence application of MT
             The one-time application of MT to the whole translation
             project
      The case study is conducted under production conditions, with
      final deliverables that require the highest translation quality.



Introduction    Configuration
                                     Methodology     Segment-by-segment workflows
                                       Discussion    One-time MT application workflow


Our setting



      Language pair: English to Greek
      Text to be translated: Two Informatics books: one
      technical guide and one academic textbook.
      TM size: 140,000 TUs coming from in-domain texts
      Terminology DB size: 30,000 entries
      Fuzzy threshold: 70%






Software programs and combinations


      MT systems:
             Statistical: Moses
             Rule-based: Systran
      CAT programs:
             Swordfish II (Java application) over Linux
             Déjà Vu X over MS Windows
             Wordfast, an MS Word macro template
      Three combinations, based on practical factors:
             Sentence-by-sentence workflow with Swordfish/Moses
             Sentence-by-sentence workflow with Wordfast/Systran
             One-time MT application workflow with Déjà Vu X/Moses





Swordfish/Moses combination
      Swordfish: Allows connection to external programs or scripts
      Connection with Moses achieved with a custom Python script
      Basic workflow:
        if TM match > 80% then
           accept fuzzy match for post-edit
        else if 70% < TM match <= 80% then
           evaluate the fuzzy match
           if quality not acceptable then
              apply MT
           end if
        else
           apply MT
           if quality not acceptable then
              type the translation from scratch
           end if
        end if
        post-edit
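
The workflow above can be sketched in Python, the language the slides name for the Swordfish-to-Moses bridge. This is an illustration under stated assumptions, not the authors' script: `moses_translate`, `choose_draft`, the `acceptable` callback (standing in for the translator's judgement) and the `moses -f moses.ini` command line are placeholder names chosen for the example.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple


def moses_translate(segment: str,
                    command: Sequence[str] = ("moses", "-f", "moses.ini")) -> str:
    """Send one segment to a decoder process and return its output.

    Assumption: the command reads one sentence on stdin and writes the
    translation to stdout; the binary name and ini path are placeholders.
    """
    result = subprocess.run(list(command), input=segment + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def choose_draft(tm_score: Optional[float], tm_target: Optional[str], source: str,
                 mt: Callable[[str], str],
                 acceptable: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Return (origin, draft) for one segment; every draft is post-edited afterwards.

    tm_score is the best TM match in percent, or None when there is no match.
    """
    if tm_score is not None and tm_score > 80:
        return ("tm", tm_target)              # accept fuzzy match for post-edit
    if tm_score is not None and 70 < tm_score <= 80:
        if acceptable(tm_target):             # translator evaluates the fuzzy match
            return ("tm", tm_target)
        return ("mt", mt(source))             # fuzzy match rejected: apply MT
    draft = mt(source)                        # no usable TM match: apply MT
    if acceptable(draft):
        return ("mt", draft)
    return ("scratch", None)                  # translator types from scratch
```

In a real setup the decoder would probably be kept running rather than started once per segment, since loading the models takes time; that detail is left out of the sketch.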


Swordfish/Moses combination: Results




                       Book 1: Instructive guide, Book 2: Textbook



Wordfast/Systran combination

      Wordfast: A macro template working on top of MS Word
      Great deal of customization through MS Word macros
      Rule-based version of Systran, supporting user dictionaries
      Basic workflow:
        if TM match < 70% then
           apply pre-editing macros
           send segment to MT engine
           apply post-editing macros
           while MT result not good do
              amend Systran user dictionary and re-send segment to MT
           end while
        else
           accept the translation for post-edit
        end if
        post-edit
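
A rough Python rendering of this loop follows. The MT engine, the dictionary update and the macros are passed in as plain callables, because the slides do not show how Wordfast talks to Systran; all function and parameter names are hypothetical. The `max_rounds` limit is an addition for the sketch, so that the loop always ends, and re-applying the post-editing macros after each re-send is likewise an assumption.

```python
from typing import Callable, Optional, Tuple


def wordfast_systran_draft(tm_score: Optional[float], tm_target: Optional[str],
                           source: str,
                           mt: Callable[[str], str],
                           amend_dictionary: Callable[[str, str], None],
                           acceptable: Callable[[str], bool],
                           pre_edit: Callable[[str], str] = lambda s: s,
                           post_edit_macros: Callable[[str], str] = lambda s: s,
                           max_rounds: int = 5) -> Tuple[str, Optional[str]]:
    """Return (origin, draft) for one segment; the draft is then post-edited by hand."""
    if tm_score is not None and tm_score >= 70:
        return ("tm", tm_target)                  # accept the TM translation
    prepared = pre_edit(source)                   # pre-editing macros
    draft = post_edit_macros(mt(prepared))        # MT, then post-editing macros
    rounds = 0
    # max_rounds is an added safeguard so the loop always terminates.
    while not acceptable(draft) and rounds < max_rounds:
        amend_dictionary(source, draft)           # fix the user dictionary ...
        draft = post_edit_macros(mt(prepared))    # ... and re-send the segment
        rounds += 1
    return ("mt", draft)
```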



Wordfast/Systran combination: Results




                       Book 1: Instructive guide, Book 2: Textbook



Déjà Vu X/Moses combination
      Déjà Vu X: similar concept to Swordfish
      However: it offers no way to integrate an MT system, so the
      only option is to pre-translate the whole project with Moses
      Only segments with no TM match, or with a TM match below
      80%, are sent to MT
      Pre-translation stage:
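
The selection rule for this stage (send to MT only what the TM cannot cover at 80% or better) might look like the sketch below. The `(source, tm_score)` pair layout and the function name are assumptions made for the example; how segments are actually exported from Déjà Vu X is not described in the slides.

```python
from typing import Iterable, List, Optional, Tuple


def select_for_mt(segments: Iterable[Tuple[str, Optional[float]]],
                  threshold: float = 80.0) -> List[str]:
    """Keep the source segments that have no TM match or a match below threshold."""
    return [source for source, tm_score in segments
            if tm_score is None or tm_score < threshold]


# Hypothetical project: (source segment, best TM match in percent or None).
project = [("Open the file.", 100.0), ("Click Save.", 76.0), ("Restart the server.", None)]
to_translate = select_for_mt(project)
```

Here `to_translate` holds the second and third segments; the first is left to the TM.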






Déjà Vu X/Moses combination
      Basic workflow:
        if TM match > 80% then
           accept the translation for post-edit
        else
           evaluate MT translation
           if quality not acceptable then
              if any TM match exists (between 70% and 80%) then
                 accept the translation for post-edit
              else
                 apply “auto-assemble” feature
                 if quality not acceptable then
                     type the translation from scratch
                 end if
              end if
           end if
        end if
        post-edit
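
As a sketch only, the same decision tree in Python. The MT draft is taken as already present from the pre-translation stage, and the "auto-assemble" feature is modelled as an ordinary callable; the function name, parameters and the `acceptable` callback (the translator's judgement) are hypothetical.

```python
from typing import Callable, Optional, Tuple


def dejavu_moses_draft(tm_score: Optional[float], tm_target: Optional[str],
                       mt_draft: Optional[str], source: str,
                       auto_assemble: Callable[[str], str],
                       acceptable: Callable[[Optional[str]], bool]
                       ) -> Tuple[str, Optional[str]]:
    """Return (origin, draft) for one segment; every draft is post-edited afterwards."""
    if tm_score is not None and tm_score > 80:
        return ("tm", tm_target)          # accept the TM translation
    if acceptable(mt_draft):              # evaluate the pre-translated MT output
        return ("mt", mt_draft)
    if tm_score is not None:              # a 70-80% fuzzy match exists
        return ("tm", tm_target)
    assembled = auto_assemble(source)     # last automatic option
    if acceptable(assembled):
        return ("assemble", assembled)
    return ("scratch", None)              # translator types from scratch
```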


Déjà Vu X/Moses combination: Results




                       Book 1: Instructive guide, Book 2: Textbook



Productivity increase
       MT & TM combination: Productivity increased to a level not
       achievable with either technology in isolation:






Important factors

      Quantity and quality of TM entries
      The domain of the translation material used to train the
      statistical MT system
              These two factors impose serious limitations on those who
              work with small texts in many different domains. Rule-based
              systems are more suitable in such cases
      Language pair: Coding efficient user dictionaries for
      morphologically rich languages is difficult and requires some
      trial and error. Phrase-based systems like Moses perform
      better
      Style of text: Productivity is higher with repetitive text and
      step-by-step instructions
      User expertise with all technologies involved



A proposal for a unified application

       For general acceptance by the professional translation
       community, MT should be integrated with TM into an
       intuitive unified system
       Basically a TM environment, with the MT engine as an extra
       component working on top of it
       MT suggestions should be presented in a controlled and
       selective way
       Basic components:
              A 2-column translation grid for source and target segments
              Terminology management
              MT engine
              Alignment tool
              Quality assurance control



Advanced issues


      Automation of the training process with TM databases
      Statistical systems require considerable computing resources.
      A solution: MT as Software as a Service (SaaS)
      Terminology databases can be used for more than reference
      purposes
             Additional entry fields for coding MT dictionary entries
             (Systran)
             Linguistic information can be used for creating factored models
             (Moses)
      Automatic suggestions-as-you-type (TransType, Caitra)
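
To illustrate the factored-model point: Moses factored corpora write each token as several factors joined by a vertical bar, for example surface form, lemma and part of speech. A minimal sketch of producing such a line from terminology-style entries follows; the lexicon layout, the `UNK` filler for words missing from it, and the function name are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def to_factored(tokens: Iterable[str],
                lexicon: Dict[str, Tuple[str, str]],
                unknown: Tuple[str, str] = ("UNK", "UNK")) -> str:
    """Render tokens as 'surface|lemma|pos', one factor set per token.

    Tokens are assumed not to contain the '|' character themselves.
    """
    out = []
    for token in tokens:
        lemma, pos = lexicon.get(token.lower(), unknown)
        out.append("|".join((token, lemma, pos)))
    return " ".join(out)


# Hypothetical entries: lower-cased surface form -> (lemma, part of speech).
lexicon = {"files": ("file", "NOUN"), "opens": ("open", "VERB")}
line = to_factored(["Opens", "files"], lexicon)
```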





Summary



     The combination of MT with TM results in a significant
     productivity increase that is not feasible in a TM-only environment
     Currently there is no straightforward way to do that
     Work is in progress by the authors towards this purpose, in
     the form of a Software Specification document that will
     describe the design and the components of such a system in
     every detail








                            Thank you!

                        Any questions?




Panagiotis Kanavos and Dimitrios Kartsaklis   Integrating MT with TM: A Practical Approach   18/ 18

Mais conteúdo relacionado

Semelhante a Integrating Machine Translation with Translation Memory: A Practical Approach

Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones RIILP
 
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinarkantanmt
 
Application of computer aided translation technology in translation teaching
Application of computer aided translation technology in translation teachingApplication of computer aided translation technology in translation teaching
Application of computer aided translation technology in translation teachingHoangtrungchinh Ttnct
 
New Breakthroughs in Machine Transation Technology
New Breakthroughs in Machine Transation TechnologyNew Breakthroughs in Machine Transation Technology
New Breakthroughs in Machine Transation Technologykantanmt
 
Gestión proyectos traducción - Universitat Autònoma de Barcelona
Gestión proyectos traducción - Universitat Autònoma de BarcelonaGestión proyectos traducción - Universitat Autònoma de Barcelona
Gestión proyectos traducción - Universitat Autònoma de BarcelonaManuel Herranz
 
Gestión proyectos traducción en la Universitat Autònoma de Barcelona
Gestión proyectos traducción en la Universitat Autònoma de BarcelonaGestión proyectos traducción en la Universitat Autònoma de Barcelona
Gestión proyectos traducción en la Universitat Autònoma de BarcelonaManuel Herranz
 
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?Thomas Hjelde Thoresen
 
Amta 2012-federico (1)
Amta 2012-federico (1)Amta 2012-federico (1)
Amta 2012-federico (1)FabiolaPanetti
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHIRJET Journal
 
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego BartolomeMachine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego Bartolometauyou
 
Lexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLoriThicke
 
Webinar automotive and engineering content 16.06.16
Webinar   automotive and engineering content 16.06.16Webinar   automotive and engineering content 16.06.16
Webinar automotive and engineering content 16.06.16kantanmt
 
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...Manuel Herranz
 
Pangeanic presentation at Elia Together Athens - Manuel Herranz
Pangeanic presentation at Elia Together Athens - Manuel HerranzPangeanic presentation at Elia Together Athens - Manuel Herranz
Pangeanic presentation at Elia Together Athens - Manuel HerranzManuel Herranz
 
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...TAUS - The Language Data Network
 
Collaborative Construction of Telecommunications Services
Collaborative Construction of Telecommunications ServicesCollaborative Construction of Telecommunications Services
Collaborative Construction of Telecommunications ServicesVanea Chiprianov
 
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize
 

Semelhante a Integrating Machine Translation with Translation Memory: A Practical Approach (20)

Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones
 
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
 
Application of computer aided translation technology in translation teaching
Application of computer aided translation technology in translation teachingApplication of computer aided translation technology in translation teaching
Application of computer aided translation technology in translation teaching
 
New Breakthroughs in Machine Transation Technology
New Breakthroughs in Machine Transation TechnologyNew Breakthroughs in Machine Transation Technology
New Breakthroughs in Machine Transation Technology
 
Gestión proyectos traducción - Universitat Autònoma de Barcelona
Gestión proyectos traducción - Universitat Autònoma de BarcelonaGestión proyectos traducción - Universitat Autònoma de Barcelona
Gestión proyectos traducción - Universitat Autònoma de Barcelona
 
Gestión proyectos traducción en la Universitat Autònoma de Barcelona
Gestión proyectos traducción en la Universitat Autònoma de BarcelonaGestión proyectos traducción en la Universitat Autònoma de Barcelona
Gestión proyectos traducción en la Universitat Autònoma de Barcelona
 
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
 
CAT TOOLS.ppt
CAT TOOLS.pptCAT TOOLS.ppt
CAT TOOLS.ppt
 
Amta 2012-federico (1)
Amta 2012-federico (1)Amta 2012-federico (1)
Amta 2012-federico (1)
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
 
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego BartolomeMachine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
 
Lexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLexcelera MT Breaking Compromises
Lexcelera MT Breaking Compromises
 
Webinar automotive and engineering content 16.06.16
Webinar   automotive and engineering content 16.06.16Webinar   automotive and engineering content 16.06.16
Webinar automotive and engineering content 16.06.16
 
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy...
 
Pangeanic presentation at Elia Together Athens - Manuel Herranz
Pangeanic presentation at Elia Together Athens - Manuel HerranzPangeanic presentation at Elia Together Athens - Manuel Herranz
Pangeanic presentation at Elia Together Athens - Manuel Herranz
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...
No more SMT black boxes with MTradumàtica: a step-by-step web-based SMT appli...
 
Collaborative Construction of Telecommunications Services
Collaborative Construction of Telecommunications ServicesCollaborative Construction of Telecommunications Services
Collaborative Construction of Telecommunications Services
 
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Integrating Machine Translation with Translation Memory: A Practical Approach

  • 1. Introduction Methodology Discussion Integrating Machine Translation with Translation Memory: A Practical Approach Panagiotis Kanavos and Dimitrios Kartsaklis November 4, 2010 Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 1/ 18
  • 2. Introduction Methodology Discussion Introduction Despite the ongoing research and the progress on the field, Machine Translation has not been widely accepted by the professional translation industry Common criticisms: MT is only suitable for draft translations of e-mails and web pages MT is not efficient for morphologically rich languages MT is useful only to large companies owning a wealth of resources In a nutshell: MT is something for researchers to play around with Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 2/ 18
  • 3. Introduction Methodology Discussion A Case Study How MT can be incorporated into professional translation workflows, with limited resources, in ways that significantly increase productivity. We combine both statistical and rule-based MT systems with Translation Memory software using two approaches: The on demand, sentence-by-sentence application of MT The one-time application of MT into the whole translation project The case study is conducted in production conditions, with final deliverables that require the highest translation quality. Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 3/ 18
  • 4. Our setting
      - Language pair: English to Greek
      - Text to be translated: two Informatics books, one technical guide and one academic textbook
      - TM size: 140,000 translation units (TUs) from in-domain texts
      - Terminology DB size: 30,000 entries
      - Fuzzy threshold: 70%
  • 5. Software programs and combinations
      MT systems:
        - Statistical: Moses
        - Rule-based: Systran
      CAT programs:
        - Swordfish II (Java application) on Linux
        - Déjà Vu X on MS Windows
        - Wordfast, an MS Word macro template
      Three combinations, based on practical factors:
        - Sentence-by-sentence workflow with Swordfish/Moses
        - Sentence-by-sentence workflow with Wordfast/Systran
        - One-time MT application workflow with Déjà Vu X/Moses
  • 6. Swordfish/Moses combination
      Swordfish:
        - Allows connection to external programs or scripts
        - Connection with Moses achieved with a custom Python script
      Basic workflow:
        if TM match > 80% then
            accept fuzzy match for post-edit
        else if 70% < TM match <= 80% then
            evaluate the fuzzy match
            if quality not acceptable then
                apply MT
            end if
        else
            apply MT
            if quality not acceptable then
                type the translation from scratch
            end if
        end if
        post-edit
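
The Swordfish/Moses decision logic on slide 6 can be sketched in Python as follows. This is only an illustration, not the authors' custom script: the function name, the `(score, text)` shape of the TM match, and the `apply_mt` and `acceptable` callbacks are all hypothetical, and the translator's quality judgment is modeled as a plain callback.

```python
from typing import Callable, Optional, Tuple


def swordfish_moses_draft(
    source: str,
    tm_match: Optional[Tuple[float, str]],  # (score in percent, TM target) or None
    apply_mt: Callable[[str], str],         # hypothetical hook to the MT engine
    acceptable: Callable[[str], bool],      # stands in for the translator's judgment
) -> Tuple[str, Optional[str]]:
    """Return (origin, draft); the draft is then post-edited by the translator."""
    score, tm_text = tm_match if tm_match else (0.0, None)

    if score > 80:
        # High fuzzy match: accept it for post-editing.
        return "tm", tm_text

    if 70 < score <= 80:
        # Medium fuzzy match: keep it only if the translator finds it usable.
        if acceptable(tm_text):
            return "tm", tm_text
        return "mt", apply_mt(source)

    # No usable TM match: try MT, and fall back to typing from scratch.
    mt_text = apply_mt(source)
    if acceptable(mt_text):
        return "mt", mt_text
    return "scratch", None
```

A draft of `None` with origin `"scratch"` means the translator types the segment manually; every other result goes to post-editing, as in the last step of the workflow.
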
  • 7. Swordfish/Moses combination: Results
      Book 1: instructive guide; Book 2: textbook [results figure not reproduced]
  • 8. Wordfast/Systran combination
      - Wordfast: a macro template working on top of MS Word
      - Great deal of customization through MS Word macros
      - Rule-based version of Systran, supporting user dictionaries
      Basic workflow:
        if TM match < 70% then
            apply pre-editing macros
            send segment to MT engine
            apply post-editing macros
            while MT result not good do
                amend Systran user dictionary and re-send segment to MT
            end while
        else
            accept the translation for post-edit
        end if
        post-edit
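
The Wordfast/Systran loop on slide 8 can be sketched in the same way. Everything named here is hypothetical (the callbacks stand in for the MS Word macros, the Systran call and the manual dictionary edits), and two details are assumptions that the slide does not state: the post-editing macros are re-applied after each re-send, and a `max_rounds` cap prevents the loop from running forever.

```python
from typing import Callable, Optional


def wordfast_systran_draft(
    source: str,
    tm_score: float,                    # best TM match in percent, 0 if none
    tm_text: Optional[str],
    pre_edit: Callable[[str], str],     # stands in for the pre-editing macros
    send_to_mt: Callable[[str], str],   # stands in for the Systran call
    post_edit: Callable[[str], str],    # stands in for the post-editing macros
    is_good: Callable[[str], bool],     # translator's judgment of the MT result
    amend_dictionary: Callable[[str, str], None],  # user-dictionary update
    max_rounds: int = 3,                # assumption: cap on dictionary retries
) -> Optional[str]:
    """Return the draft that goes on to manual post-editing."""
    if tm_score >= 70:
        # A TM match at or above the fuzzy threshold is accepted as the draft.
        return tm_text

    prepared = pre_edit(source)
    draft = post_edit(send_to_mt(prepared))

    rounds = 0
    while not is_good(draft) and rounds < max_rounds:
        # Amend the user dictionary, then re-send the same segment to MT.
        amend_dictionary(source, draft)
        draft = post_edit(send_to_mt(prepared))
        rounds += 1
    return draft
```

The dictionary amendments persist across segments, so each retry is an investment that should improve later MT output as well as the current segment.
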
  • 9. Wordfast/Systran combination: Results
      Book 1: instructive guide; Book 2: textbook [results figure not reproduced]
  • 10. Déjà Vu X/Moses combination
      - Déjà Vu X: similar concept to Swordfish
      - However: it offers no way of integrating with an MT system, so the only option is pre-translation of the whole project with Moses
      - Only segments with no TM match, or with TM matches below 80%, are sent for MT
      Pre-translation stage: [figure not reproduced]
  • 11. Déjà Vu X/Moses combination
      Basic workflow:
        if TM match > 80% then
            accept the translation for post-edit
        else
            evaluate MT translation
            if quality not acceptable then
                if any TM match exists (between 70-80%) then
                    accept the translation for post-edit
                else
                    apply “auto-assemble” feature
                    if quality not acceptable then
                        type the translation from scratch
                    end if
                end if
            end if
        end if
        post-edit
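
The one-time workflow of slides 10 and 11 has two stages, batch pre-translation followed by a per-segment decision, which can be sketched as below. The `Segment` class, both function names and the callbacks are hypothetical. Segments at exactly 80% are sent to MT here, so that the selection agrees with the `> 80%` test in the workflow; the slides leave that boundary case open.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Segment:
    source: str
    tm_score: float = 0.0           # best TM match in percent, 0 if none
    tm_text: Optional[str] = None
    mt_text: Optional[str] = None   # filled in by the pre-translation stage


def pretranslate(segments: List[Segment], apply_mt: Callable[[str], str]) -> None:
    """Stage 1: machine-translate every segment without a TM match above 80%."""
    for seg in segments:
        if seg.tm_score <= 80:
            seg.mt_text = apply_mt(seg.source)


def dejavu_moses_draft(
    seg: Segment,
    acceptable: Callable[[str], bool],     # translator's quality judgment
    auto_assemble: Callable[[str], str],   # stands in for the CAT tool feature
) -> Tuple[str, Optional[str]]:
    """Stage 2: return (origin, draft) for one segment; the draft is post-edited."""
    if seg.tm_score > 80:
        return "tm", seg.tm_text
    if seg.mt_text is not None and acceptable(seg.mt_text):
        return "mt", seg.mt_text
    if seg.tm_score >= 70 and seg.tm_text is not None:
        # MT rejected, but a fuzzy match above the 70% threshold exists.
        return "tm", seg.tm_text
    assembled = auto_assemble(seg.source)
    if acceptable(assembled):
        return "auto-assemble", assembled
    return "scratch", None
```

Unlike the sentence-by-sentence workflows, MT runs once for the whole project here, so the translator only evaluates MT output that already exists instead of requesting it per segment.
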
  • 12. Déjà Vu X/Moses combination: Results
      Book 1: instructive guide; Book 2: textbook [results figure not reproduced]
  • 13. Productivity increase
      MT & TM combination: productivity increased to a level not possible by applying either technology in isolation. [figure not reproduced]
  • 14. Important factors
      - Quantity and quality of TM entries
      - The domain of the translation material used to train the statistical MT system
      - The two factors above impose serious limitations on those who work with small texts in many different domains; rule-based systems are more suitable in such cases
      - Language pair: coding efficient user dictionaries for morphologically rich languages is difficult and requires some trial and error; phrase-based systems like Moses perform better
      - Style of text: productivity is higher with repetitive text and step-by-step instructions
      - User expertise with all technologies involved
  • 15. A proposal for a unified application
      - For general acceptance by the professional translation community, MT should be integrated with TM into an intuitive, unified system
      - Basically a TM environment, with the MT engine as an extra component working on top of it
      - MT suggestions should be presented in a controlled and selective way
      Basic components:
        - A 2-column translation grid for source and target segments
        - Terminology management
        - MT engine
        - Alignment tool
        - Quality assurance control
  • 16. Advanced issues
      - Automation of the training process with TM databases
      - Statistical systems require considerable computing resources. A solution: MT as Software as a Service (SaaS)
      - Terminology databases can be used for more than reference purposes:
          - Additional entry fields for coding MT dictionary entries (Systran)
          - Linguistic information can be used for creating factored models (Moses)
      - Automatic suggestions-as-you-type (TransType, Caitra)
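
One small step towards automating training from TM databases (slide 16) is turning a TM export into sentence pairs for a statistical system. The sketch below assumes the TM has been exported as TMX, which the slides do not specify, and uses only Python's standard `xml.etree.ElementTree`. The function name and the default language codes are hypothetical, and a real export may need further handling (inline tags, deduplication, tokenization).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(
    tmx_text: str, src_lang: str = "en", tgt_lang: str = "el"
) -> List[Tuple[str, str]]:
    """Extract (source, target) sentence pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older exports use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Drop inline markup and collapse whitespace to one line per segment.
                segs[lang[:2]] = " ".join("".join(seg.itertext()).split())
        # Keep only translation units that have both sides of the language pair.
        if segs.get(src_lang) and segs.get(tgt_lang):
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

Writing the first elements to one file and the second elements to another, one sentence per line, yields the line-aligned parallel corpus layout that statistical MT training pipelines generally expect.
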
  • 17. Summary
      - The combination of MT with TM results in a significant productivity increase that is not feasible in a TM-only environment
      - Currently there is no straightforward way of doing that
      - The authors are working towards this purpose, in the form of a Software Specification document that will describe the design and the components of such a system in detail
  • 18. Thank you! Any questions?