DeepMiner - Advanced Leveraging: Integrating Translation Memories and Machine Translation

DeepMiner
Integrating Translation Memories and
Machine Translation

TEKOM
October 25th, 2012

Presenter: Daniel Benito

Introduction

• History
• Limitations of Translation Memory
• Beyond Segment-Level Reuse
– Machine Translation
– Fuzzy Match Repair
– Advanced Leveraging
– Combining TM and MT
• Current Limitations
• Perspectives
• Conclusion

History

• Past:
– 1950s – Early Machine Translation (MT) experiments
– 1960s – General awareness that MT was not going to
replace human translators
– 1970s – First proposals for Translator Workstations
– 1990s – Translation Memory (TM) became viable
• Present:
– TM technology has barely advanced in the last ten years
– MT has advanced to the point where its applications in the
translation industry are incontrovertible

Limitations of Translation Memory

• Segment-level translation reuse is only useful in
limited cases
• Even in highly repetitive texts, most of the
repetitions happen at the sub-segment level:
– Terms and phrases
– Sentence structure
• Most Translation Memory systems are limited to
providing fuzzy matches but are unable to exploit
sub-segment repetition

Beyond Segment-level Reuse

• We need to translate:
EN: The black cat usually sleeps in the hallway.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• What can we do to reduce the time spent editing
fuzzy matches?
– Ignore the fuzzy matches and use MT
– Automatically repair the fuzzy matches

Machine Translation

• Results returned by various MT systems:
DE: Die schwarze Katze in der Regel schläft im Flur.
DE: Die schwarze Katze schläft normalerweise im Flur.
• Achieving consistency and using specific terminology
(e.g. Gang instead of Flur) will require some degree
of training or post-editing

Machine Translation

• General-purpose MT engines such as Google
Translate or Microsoft Translator usually require
extensive post-editing, but can be used for
inspiration
• Rule-based and statistical MT engines customized for
specific domains offer much higher quality but
require expensive tuning or retraining
• It is usually more expensive to use MT than to
manually edit a fuzzy match

Fuzzy Match Repair

• Inspired by the translation by analogy concept from
Example-Based Machine Translation (EBMT)
• Attempts to maintain the quality and consistency of
existing translations in the TM while increasing
productivity

Fuzzy Match Repair

• We can replace graue with schwarze and
Wohnzimmer with Gang to produce an exact match.

Fuzzy Match Repair

• Requires knowing the following translations:
grey → graue
black → schwarze
living room → Wohnzimmer
hallway → Gang
• What do we do if those translations are not explicitly
in our TMs or termbases?
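
The repair step on the previous slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMiner's actual implementation: the function name repair_fuzzy_match and the translations dictionary are hypothetical, the word-level diff from the standard difflib module is our own choice, and only one-for-one phrase replacements are handled.

```python
import difflib


def repair_fuzzy_match(new_src, tm_src, tm_tgt, translations):
    """Adapt tm_tgt (the translation of tm_src) so that it translates new_src.

    translations is a hypothetical dict mapping source phrases to target
    phrases. Returns the repaired target, or None when a needed translation
    is missing or cannot be located in the TM target.
    """
    new_tokens = new_src.rstrip(".").split()
    tm_tokens = tm_src.rstrip(".").split()
    matcher = difflib.SequenceMatcher(a=tm_tokens, b=new_tokens, autojunk=False)
    repaired = tm_tgt
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag != "replace":
            return None  # pure insertions/deletions are out of scope here
        old_phrase = " ".join(tm_tokens[i1:i2])
        new_phrase = " ".join(new_tokens[j1:j2])
        old_tgt = translations.get(old_phrase)
        new_tgt = translations.get(new_phrase)
        if old_tgt is None or new_tgt is None or old_tgt not in repaired:
            return None  # we do not know what to replace, or with what
        repaired = repaired.replace(old_tgt, new_tgt, 1)
    return repaired


translations = {
    "grey": "graue",
    "black": "schwarze",
    "living room": "Wohnzimmer",
    "hallway": "Gang",
}
print(repair_fuzzy_match(
    "The black cat usually sleeps in the hallway.",
    "The grey cat usually sleeps in the living room.",
    "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
    translations,
))
```

With the four translations listed above, the sketch turns the TM target into the repaired sentence; if any of them is missing it returns None, which is the situation the question on this slide raises.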

Advanced Leveraging

• Bilingual concordance search:
EN: Mary has bought a new pair of grey running shoes.
DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
EN: This article is also available in grey.
DE: Dieser Artikel ist auch in grau erhältlich.

Advanced Leveraging

• Statistically infer translations from the TM
• Compare all of the German translations and suggest
one or more probable translations (e.g. graue, grau)
• Requires:
– Large TMs with many examples
– Consistent translations in the TM
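
One simple way to illustrate statistical inference from a TM is to score each target word by how strongly it co-occurs with the source term. The sketch below uses a Dice coefficient, which is our own stand-in and not necessarily the statistic DeepMiner uses; the names tokenize and infer_translations are hypothetical, and only single-word terms are handled.

```python
from collections import Counter

PUNCT = ".,;:!?"


def tokenize(text):
    """Lower-case words with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def infer_translations(term, tm, top_n=3):
    """Rank target words by Dice co-occurrence with a single-word source term.

    tm is a list of (source, target) segment pairs.
    """
    term = term.lower()
    tgt_total = Counter()  # segments in which each target word appears
    cooc = Counter()       # segments where the word and the term co-occur
    term_count = 0
    for src, tgt in tm:
        tgt_words = set(tokenize(tgt))
        tgt_total.update(tgt_words)
        if term in tokenize(src):
            term_count += 1
            cooc.update(tgt_words)
    scores = {w: 2 * c / (term_count + tgt_total[w]) for w, c in cooc.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


tm = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
]
print(infer_translations("grey", tm, top_n=1))
```

With only these three segments, graue is the one candidate that stands out; grau scores no higher than unrelated words that also appear once. That is the reason for the requirement above: the statistics only become reliable with large, consistent TMs.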

Combining TM and MT

• We can use MT as an additional resource for finding
the translations needed to repair fuzzy matches
• MT systems often give better results for terms and
short phrases than for long sentences
• We approach this combination based on the
following premises:
– A client’s own data is considered to be of higher quality
and will always have priority over the Machine Translation
results
– A fuzzy match repaired with Machine Translation will
usually be better than a normal fuzzy match, and better
than an MT result for an entire segment

Combining TM and MT

• Our termbase contains:
EN: grey
DE: graue
EN: black
DE: schwarze
EN: hallway
DE: Gang

Combining TM and MT

• We do not have the translation for living room in our
TM or our termbase, so we can request it from the
MT system:
EN: living room
DE: Wohnzimmer
• The combination of material in our TM, termbase
and MT system allows us to perform the appropriate
replacements and obtain:
DE: Die schwarze Katze schläft gewöhnlich im Gang.
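
The priority rule from the premises (the client's own data first, MT only as a fallback) can be sketched as a simple lookup chain. Everything here is an assumption made for illustration: the function lookup, the dictionary-based termbase and TM-inferred resources, and the mt_translate callable, which stands in for whatever MT engine is connected and does not reflect any real engine's API.

```python
def lookup(phrase, termbase, tm_inferred, mt_translate):
    """Return (translation, source_name) for phrase, preferring client data.

    termbase and tm_inferred are dicts of phrase -> translation;
    mt_translate is a hypothetical callable returning a translation or None.
    """
    for name, resource in (("termbase", termbase), ("tm", tm_inferred)):
        if phrase in resource:
            return resource[phrase], name
    result = mt_translate(phrase)
    if result:
        return result, "mt"
    return None, None


termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}
tm_inferred = {}  # nothing inferred from the TM in this example

# Stand-in for an MT engine; note that it would translate hallway as Flur.
fake_mt = {"living room": "Wohnzimmer", "hallway": "Flur"}.get

for phrase in ("grey", "black", "hallway", "living room"):
    print(phrase, lookup(phrase, termbase, tm_inferred, fake_mt))
```

Because the termbase is consulted first, hallway resolves to the client's preferred Gang even though the stand-in MT engine would have offered Flur; only living room falls through to MT.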

Current Limitations

• We need to translate:
EN: The white dog usually sleeps in the living room.
• Our termbase contains:
EN: grey cat
DE: graue Katze

Current Limitations

• Asking the MT system for the missing translation, we
get:
EN: white dog
DE: weißer Hund
• The result of fixing the fuzzy match is:
EN: The white dog usually sleeps in the living room.
DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
• Some post-editing is still required (the correct form
is Der weiße Hund)

Current Limitations

• We need to translate:
EN: The grey cat often sleeps in the living room.
• The translations we get from the MT system are:
EN: usually
DE: normalerweise
EN: often
DE: oft
• We cannot repair the fuzzy match because we do not
know how usually was translated in the TM segment: the
target contains gewöhnlich, not normalerweise

Future Developments

• Greater integration with the MT engines
– Access to internal translation candidates:
• EN: usually
• DE: normalerweise, gewöhnlich, sonst, ...
– Access to internal language models:
• DE: Die weißer Hund – never
• DE: Der weiße Hund – often
– Automatic upload of new TM material to the MT engine so
it can be used for retraining in the future
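
To show why access to internal translation candidates would help, the sketch below picks whichever candidate actually occurs in the TM target, so the repair knows which word to replace. It is a hypothetical illustration: locate_in_target is our own name, the candidate list is the one on this slide, and no existing MT engine API is implied. Only single-word candidates are handled.

```python
def locate_in_target(candidates, tm_target):
    """Return the first candidate that appears as a word in tm_target."""
    tokens = [t.strip(".,;:!?") for t in tm_target.split()]
    for candidate in candidates:
        if candidate in tokens:
            return candidate
    return None


tm_target = "Die graue Katze schläft gewöhnlich im Wohnzimmer."
candidates_for_usually = ["normalerweise", "gewöhnlich", "sonst"]

found = locate_in_target(candidates_for_usually, tm_target)
if found is not None:
    # Replace the located word with the MT translation of "often".
    print(tm_target.replace(found, "oft", 1))
```

With a single MT result (normalerweise) nothing is found and the repair fails; with the candidate list, gewöhnlich is located and can be replaced by oft.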

Conclusion

• Traditional segment-level translation reuse has
reached its full potential
• ATRIL’s Déjà Vu X2 already includes DeepMiner
technology that improves productivity by cleverly
combining all the approaches we described:
– (Statistical) Machine Translation
– Example-Based Machine Translation
– Advanced Leveraging (sub-segment matching)

Predictive Typing

• Find all sub-segment matches and offer them to the
translator as he or she types
• Suggestions are context-sensitive, so there are never
too many results to choose from
• Translations are constructed piece by piece from
previous texts, guided by the translator
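
A minimal sketch of the suggestion step, assuming the sub-segment matches for the current source segment have already been collected into a list of target phrases. The function suggest and its ranking (shortest phrase first) are our own assumptions, not a description of how Déjà Vu X2 ranks its suggestions.

```python
def suggest(typed_prefix, candidates, limit=5):
    """Return candidates that start with what the translator has typed."""
    prefix = typed_prefix.casefold()
    if not prefix:
        return []
    matches = [c for c in candidates if c.casefold().startswith(prefix)]
    # Shortest first, then alphabetical, so the list stays predictable.
    return sorted(matches, key=lambda c: (len(c), c))[:limit]


# Target phrases matched for the segment currently being translated.
candidates = ["schwarze", "schläft", "Gang", "gewöhnlich", "Katze"]
print(suggest("sch", candidates))
print(suggest("g", candidates))
```

Restricting the candidates to matches for the current segment is what keeps the list short: the translator only sees phrases that are plausible in this context.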

Advanced Predictive Typing

• Advanced Leveraging techniques for statistically
inferring sub-segment translations from the TM can
be adapted to provide additional predictive typing
suggestions
• Translations from MT can be added to the predictive
typing mechanism, to offer additional suggestions for
translations of terms and phrases

MT integrations in Déjà Vu X2

• Systran Enterprise Server
• Google Translate
• Microsoft Translator
• PROMT Translation Server
• itranslate4eu
