SlideShare uma empresa Scribd logo
1 de 89
Baixar para ler offline
1/89
Peter Sandrini
University of Innsbruck, Austria
Control over Digital Technology
Free and Open Source CAT Tools
Timisoara
March 26, 2015
Workshop
2/89
Abstract
In a digitalized and globalized world, translation technology is becoming an
inevitable part of translation. It not only concerns translators but also users of
translation, trainers of translators, and localizers. Translation Technology can
boost the efficiency and consistency of translation, but inconsiderate use of
software and services may also cause translators losing control over the
translation process and translation data.
The workshop outlines the concept of translation technology as well as free
and open source software and presents two compilations of available free
translation technology tools developed at the University of Innsbruck:
USBTrans – a collection of preinstalled packages on a USB stick, and tuxtrans –
a tailor-made Linux distribution for translators.
3/89
Contents
1) Control over digital technology
Free Software
Free translation technology packages:
USBTrans and tuxtrans
2) Typical work tasks:
✔translate a website
✔create a TM on the basis of existing translations
✔manage terminology and dictionaries
✔extract terminology from texts
✔use machine translation
✔convert file formats
✔manage bilingual files
✔manage pdf files
✔quality assurance
✔use text corpora
4/89
Dominant Technology?
• „it is hard to think of a business process that is not
wholly, or partly, dependent on technology“
• what about language services and translation?
„technological turn in translation“ (Cronin 2010)
• costs and risks of technology
• translation technology ≠ ≥
5/89
Translation Technology
• any kind of digital Information and communication
technology (ICT)
• which supports or performs the translation process
• with the aim of meeting adequate efficiency and
quality requirements
6/89
Control
• choice vs (being) use(d) (independence)
• confidentiality
• integrity of program code
• integrity of data
• availability of data
• overall management of the IT
• configuration, installation of software
7/89
Control: examples
Choose your software independently
do not let translation agencies dictate your choice
Choose your software independently
do not let translation agencies dictate your choice
Do not let financial factors limit your choiceDo not let financial factors limit your choice
Do not let licenses limit your freedom
I want to install my software on a desktop and
on a notebook computer as well as on my network
I want hassle-free updates
Do not let licenses limit your freedom
I want to install my software on a desktop and
on a notebook computer as well as on my network
I want hassle-free updates
Be social and share your software with friends
the easiest way to guarantee cooperation
and data exchange with colleagues
Be social and share your software with friends
the easiest way to guarantee cooperation
and data exchange with colleagues
Be confident about your data
I have a confidentiality agreement with my clients
my translation data are economic assets
Be confident about your data
I have a confidentiality agreement with my clients
my translation data are economic assets
8/89
Win back and maintain control
9/89
Free and Open-Source
Licenses:
• GNU GPL
• Apache License 2.0
• BSD 2/3
• (L)PGL
• MIT license
• Mozilla Public License 2.0
• Eclipse Public License
• Creative Commons
Freedom to
– run the program as you wish, for any
purpose (freedom 0).
– study how the program works, and
change it so it does your computing as
you wish (freedom 1). Access to the
source code is a precondition for this.
– redistribute copies so you can help
your neighbor (freedom 2).
– distribute copies of your modified
versions to others (freedom 3). By
doing this you can give the whole
community a chance to benefit from
your changes. Access to the source
code is a precondition for this.
10/89
11/89
• allow a cost-saving start of your career
• facilitate cooperation with colleagues
• avoid copyright infringements
• full control over your own PC
• ease of use without a license or activation code
• participation in developing applications
through online communities
• changes (dependent) consumers into (autonomous) agents
Why use free software?
12/89
Translation Environment
Tools (TenT)
all-in-one application for
translators
proprietary software
individual projects with
specific functionality
✔ translation memory
✔ analysis and statistics
✔ code protection
free software
✔ terminology management
✔ project management
✔ code page conversion
✔ format conversion
✔ Spell checking
✔ ...
Structural differences
✔ Translation-Memory
✔ Terminology-Management
✔ Alignment
✔ Search for collocations
✔ Analysis and statistics
✔ Project management
✔ Code-Protection
✔ Batch-scripts
✔ Spellchecking
✔ Code page conversion
✔ Format conversion
✔ ...
13/89
TEnTs
• Translation Environment Tools (TenT)
all-encompassing application for
translators
• Generic term for „Translation-
Memory-System“ or „Computer aided
translation CAT-System“ ✔ Translation-Memory
✔ Terminology-Management
✔ Alignment
✔ Search for collocations
✔ Analysis and statistics
✔ Project management
✔ Code-Protection
✔ Batch-scripts
✔ Spellchecking
✔ Code page conversion
✔ Format conversion
✔ ...
14/89
market reality
Okapi
JublerAnaphraseus
TinyTm
open
BiText2TMX
Sun OLT Virtaal
OmegaT
OpenTMS
FOLTHeartsome Gaupol
Lokalize
proprietary
SDL/Trados
Across
Star Transit
MemoQ
DéjàVuRainbow
Wordfast
Transolution
ForeignDesk
Pootle
KBabel
PO-Edit
Translate Toolkit
Catalyst
Apertium
15/89
FOSTT compiled
1. USBTrans – for Windows
http://homepage.uibk.ac.at/~c61302/fsftrans.html
2. tuxtrans – Open Translation Desktop System
http://www.tuxtrans.org
16/89
USBTrans
• a compilation of translation related free and
open source software which can be started
from an USB stick without installation
(Portable Apps)
• download from
http://homepage.uibk.ac.at/~c61302/fsftrans.htmlsome
samples on your USB
• unpack the zip archive on your local pc and you
are ready to go; you may also copy the
unpacked files onto an USB stick (4GB) and
start the programs from there
• just plug in the USB stick and start the
USBTrans menu
17/89
tuxtrans
• complete desktop system for translators
based on Linux as a free operating system
and many specific application for translators
• multilingual
Italian, English, Spanish, German by default
with many more languages available online
• all open source or free software
• website: http://www.tuxtrans.org
Twitter: https://twitter.com/tuxtrans
• live-system, it can be started
from your USB stick without
installation
18/89
Requirements
• operating system
• standard applications
• specific applications } = free
open-source
19/89
Requirements
• operating system
• standard applications
• specific applications } = free
open-source
why?
●
to be able to perform all tasks related to translation
●
to be able to use and install it ad libitum
●
to be able to distribute it to translators and students
20/89
tuxtrans may be used as
1) Live-DVD
(without installation, slow performance)
2) Live-USB
(without installation, rather slow performance)
3) Virtual machine (VirtualBox VMWare)
4) second OS
(with installation, fast performance, easy to customize
and adapt)
5) main operating system
21/89
prepare to migrate
• use cross-platform-applications
• and standard formats
odt, pdf, tmx, tbx, xliff ...
22/89
standard apps
➢
word processors
➢
spreadsheets
➢
presentations
➢
databases
➢
DTP
➢
image manipulation
➢
office suite
➢
Browser
➢
E-Mail
MS-Windows
LO-Writer
LO-Calc
LO-Impress
Access
Xpress
Gimp
Libre Office
Firefox
Thunderbird
Linux
LO-Writer
LO-Calc
LO-Impress
MYSQL
Scribus
Gimp
Libre Office
Firefox
Thunderbird
23/89
translate with FOSS
• common tasks of a free-lance translator
• with free and open source translation technology
• on a free digital infrastructure
• with tuxtrans the Linux Desktop for translators
24/89
▶ translate a Word document
with TM support
what you need:
– a word processor
– a TM system
– translation memories
– terminology lists
25/89
sample text
26/89
OmegaT: supported formats
text formats
•Plain text (any encoding supported by Java),
including Unicode
•StarOffice, OpenOffice.org, LibreOffice and
OpenDocument
•Open XML (Microsoft 2007/2010/2013)
•(X)HTML (including complete website tree
structure)
•Help & Manual
•HTML Help Compiler
•LaTeX
•DokuWiki
•CopyFlow Gold for QuarkXPress
•DocBook
•Typo3 LocManager
•Iceni Infix (PDF)
•XLIFF source = target
•TXML Wordfast source = target
localization formats
•Android resources
•Java .properties
•Key-value files
•Mozilla DTD
•Windows resources (RC)
•WiX localisation
•ResX
•Flash XML export
•Camtasia for Windows
•Magento CE localisation
•PO (Portable Object File) (reading
existing translations)
•SubRip subtitles (SRT)
•SVG images
27/89
start OmegaT
28/89
OmegaT: organisation
terminology/glossaries
conference.txt, congress.tbx ...
source text
9th PCTS Conference - Reg. Form EN-1.docx
project specific
configuration files
filters.conf, segmentation.conf
translation memories
regform.tmx ...
inputinput
29/89
OmegaT: statistics
30/89
OmegaT: editor
31/89
OmegaT: organisation
new terminology
glossary.txt
target text
9th PCTS Conference - Reg. Form EN-1.docx
project specific
configuration files
filters.conf, segmentation.conf
new translation memories
test01.tmx ...
outputoutput
spell checking files
learned_words.txt, ignored_words.txt
32/89
Part 1
33/89
Overview: part II
typical tasks of a translator
✔ translate a websitetranslate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assessment
✔ use text corpora
34/89
▶ translate websites
what you need:
• website with a few HTML documents
• local copy of website
• translation memory system
• existing translation memories
• existing terminology lists
• HTML editor
35/89
save website locally
Httrack website copier:
36/89
OmegaT: tuxtrans.org
37/89
OmegaT: tuxtrans.org
● source text = complete website
directory structure with files
● target text = identical website
directory structure with files
38/89
Edit HTML: Geany
39/89
Edit HTML: Bluefish
40/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translationscreate a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assessment
✔ use text corpora
41/89
▶ Align source and target text
How can I create a translation memory
from an existing translation?
What you need:
• source text and target text
• an alignment tool
• time and patience
42/89
Alignment: BiText2TMX
● no development
● only text format
● TM as TMX
43/89
Alignment: LF-Aligner
• Formats:
txt (UTF-8!), rtf, doc,
docx, odt, pdf, html
<tu creationdate="20140916T141042Z"
creationid="LF Aligner 3.11"><prop
44/89
▶ TMX-Management
Heartsome TMX-Editor
• Edit, filter
• convert from TMX to ..., to TMX from ...
• QA
• ...
45/89
TMX-Management
Virtaal
• edit
• TMX QA
46/89
TMX-Management
Okapi Olifant
• edit
• QA
• alpha
47/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionariesmanage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assessment
✔ use text corpora
48/89
▶ manage your terminology
ForeignDesk Termbase
● concept oriented
● import/export
html, csv, Multiterm
● OmegaT integration with
csv export +
edit
49/89
▶ search for terminology
GoldenDict
● several dictionary formats
● same formats
as dictionaries
in OmegaT
50/89
integrate dictionaries in OmegaT
dictionaries
dictionaries vs glossaries:
● no interaction possible
just reading of entries
no editing nor adding
● all dictionaries in stardict format
● available on the web
51/89
dictionaries in OmegaT
monolingual StarDict dictionary
integrated in OmegaT:
„Oxford Advanced Learner dictionary“
52/89
dictionaries in OmegaT
bilingual StarDict dictionaries
in OmegaT: „en-de“
53/89
glossaries in OmegaT
create,edit and use glossary lists in OmegaT
• simple 3-column format
source term (Tab) target term (Tab) note
• always UTF-8 coded
• file names: *.txt, *.tab, *.utf8, *.tbx
• terminology recognition and auto-
completer in OmegaT:
https://www.youtube.com/watch?v=LszI
aP22QhQ
glossaries
54/89
glossaries in OmegaT
create, edit and use glossary lists in OmegaT
● add terminology:
[CTRL Shift G] or
Edit / Create glossay entry
● saved in
Project / Properties:
55/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from textsextract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assessment
✔ use text corpora
56/89
▶ term extraction
• simple monolingual term extraction with Rainbow
57/89
▶ term extraction
• simple monolingual term extraction
with Phrase extractor
58/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translationuse machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assurance
✔ use text corpora
59/89
▶ Machine Translation
what you need:
• an Open Source MT system
• a free on-line MT system
• an interface to your TM system
60/89
Apertium: OS MT system
• Apertium is a rule based Open Source MT system
• mostly regional language pairs
from Spain
with Englisch
61/89
Apertium: OS MT system
• Apertium offline can be integrated into OmegaT
62/89
on-line MT systems
• you need to register to use
online Mt within OmegaT:
– Mymemory (only E-Mail)
– Microsoft Translator
API ID + Secret key
– Google Translate
API key
OmegaT.I4J.ini
63/89
• available for free on the web
– Microsoft Translator
– Google Translate
• within OmegaT
– Apertium online + offline free
– Mymemory free
– Microsoft Translator free, but
you need to register
– Google Translate
registration + costs
on-line MT systems
64/89
on-line MT systems
• interface in OmegaT
65/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formatsconvert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assurance
✔ use text corpora
66/89
convert file formats
• text formats from doc to docx, odt …
with LibreOffice
67/89
convert file formats
• CSV, TAB, TXT glossary lists to TBX
with Heartsome Translation Studio
• CSV, TXT parallel texts
to TMX Tms with Heartsome
• Okapi Rainbow
to PO, TMX, CSV
• SDL/Trados
TM and TB
to TMX with
TradosStudio
resources
converter
68/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual filesmanage bilingual files
✔ manage pdf files
✔ quality assurance
✔ use text corpora
69/89
▶ bilingual file types
text files which include source text segments aligned to target
text segments, i.e. files which contain translations for all
segments or just for a part of segments
XLIFF, sdlxliff, ttx
can not be translated with OmegaT directly, or rather only
when source text segment = target text segment
existing translations can thus not be leveraged
partly translated
file as shown
in Virtaal
70/89
▶ bilingual file types
Existing translations can be re-used in OmegaT by using
Okapi Rainbow and by creating ready-to-go OmegaT
translation projects
}
● extract source text
● create TM from existing translations
● translate project in OmegaT
● post process translation in Rainbow
● get translated bilingual files
● XLIFF
● sdlxliff
● ttx
71/89
Okapi Rainbow
translation kit creation
72/89
Okapi Rainbow
translation kit post processing
73/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf filesmanage pdf files
✔ quality assurance
✔ use text corpora
74/89
▶ PDF files
• translate PDF files with OmegaT
only text-based PDF and target text will be txt file
• extract text from PDF files
with gPDFText eBook-Editor (Linux)
• split and merge PDF files
with PDF-Sam
• annotate PDF files
with Xournal (Linux)
• compare PDF files with DiffPDF
75/89
PDF and OmegaT
• translate text-based PDF directly in OmegaT
76/89
extract text from PDF
• gPDFText eBook-Editor extracts Text
from text-based PDF
without line breaks
77/89
split and merge PDFs
• PDF-Sam
78/89
PDF-Dateien annotieren
• comment, underline,
highlight, etc. with
Xournal
79/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assurancequality assurance
✔ use text corpora
80/89
▶ Quality assurance
1) OmegaT QA scripts
2) Okapi Checkmate
3) TMX-Validator
4) XLIFF Checker
81/89
QA with OmegaT
• OmegaT QA scripts
82/89
QA with Checkmate
83/89
formal validation
TMX-Validator
XLIFF-Checker
84/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assurance
✔ use text corporause text corpora
85/89
▶ create a reference corpus
Why?
search for terms, collocations, idioms ...
what you need:
• a representative quantity of LSP texts in the
subject domain you want to translate
• a free concordance tool
86/89
TextSTAT
• file formats:
doc, odt, html, txt
• concordance search
• frequency lists
87/89
AntConc
• AntConc
less file formats
more linguistic
analysis
88/89
Overview: part II
typical tasks of a translator
✔ translate a website
✔ create a TM on the basis of existing translations
✔ manage terminology and dictionaries
✔ extract terminology from texts
✔ use machine translation
✔ convert file formats
✔ manage bilingual files
✔ manage pdf files
✔ quality assurance
✔ use text corpora
translation of subtitles
editing and translating PO files
creating mindmaps for terminology
creating a web corpus
localizing software
evaluating MT output ...
… + ...
89/89
http://www.petersandrini.net
http://uibk.academia.edu/PeterSandrini
Simply, try it out!
Thank youThank you
for your attention!for your attention!
What next?

Mais conteúdo relacionado

Semelhante a Control over Digital Technology: Free and Open Source CAT Tools

A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...
A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...
A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...PK Mishra
 
2010 tool forum ata handout
2010 tool forum ata handout2010 tool forum ata handout
2010 tool forum ata handoutascetlan
 
Computer assisted tools at a glance
Computer assisted tools at a glanceComputer assisted tools at a glance
Computer assisted tools at a glanceTradas
 
LTR: Open Source Public Workstations
LTR: Open Source Public WorkstationsLTR: Open Source Public Workstations
LTR: Open Source Public Workstationskoegeljm
 
Ltr Open Source Public Workstations Presentat
Ltr Open Source Public Workstations PresentatLtr Open Source Public Workstations Presentat
Ltr Open Source Public Workstations Presentatburmaball
 
LTR: Open Source Public Workstations
LTR: Open Source Public WorkstationsLTR: Open Source Public Workstations
LTR: Open Source Public Workstationskoegeljm
 
LTR: Open Source Public Workstations
LTR: Open Source Public Workstations LTR: Open Source Public Workstations
LTR: Open Source Public Workstations koegeljm
 
LTR: Open Source Public Workstations
LTR: Open Source Public Workstations LTR: Open Source Public Workstations
LTR: Open Source Public Workstations koegeljm
 
Module 3 - Software Classification.pptx
Module 3 - Software Classification.pptxModule 3 - Software Classification.pptx
Module 3 - Software Classification.pptxpercivalfernandez2
 
A presentation on system software
A presentation on system software A presentation on system software
A presentation on system software Ankit Sangwan
 
New Standards for Enterprise Thermal Printer Management
New Standards for Enterprise Thermal Printer ManagementNew Standards for Enterprise Thermal Printer Management
New Standards for Enterprise Thermal Printer ManagementDawn Kehr
 
system software and application software
system software and application softwaresystem software and application software
system software and application softwareTallat Satti
 
DOC-20230427-WA0012..pptx
DOC-20230427-WA0012..pptxDOC-20230427-WA0012..pptx
DOC-20230427-WA0012..pptxkumarkaushal17
 
fdocuments.in_unit-2-foc.ppt
fdocuments.in_unit-2-foc.pptfdocuments.in_unit-2-foc.ppt
fdocuments.in_unit-2-foc.pptKrishanPalSingh39
 
Introduction to Python Programming
Introduction to Python ProgrammingIntroduction to Python Programming
Introduction to Python ProgrammingAkhil Kaushik
 

Semelhante a Control over Digital Technology: Free and Open Source CAT Tools (20)

A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...
A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...
A Roadmap for Students Using FOSS (Free and Open Source Software) and Reachin...
 
2010 tool forum ata handout
2010 tool forum ata handout2010 tool forum ata handout
2010 tool forum ata handout
 
Computer assisted tools at a glance
Computer assisted tools at a glanceComputer assisted tools at a glance
Computer assisted tools at a glance
 
LTR: Open Source Public Workstations
LTR: Open Source Public WorkstationsLTR: Open Source Public Workstations
LTR: Open Source Public Workstations
 
Ltr Open Source Public Workstations Presentat
Ltr Open Source Public Workstations PresentatLtr Open Source Public Workstations Presentat
Ltr Open Source Public Workstations Presentat
 
LTR: Open Source Public Workstations
LTR: Open Source Public WorkstationsLTR: Open Source Public Workstations
LTR: Open Source Public Workstations
 
LTR: Open Source Public Workstations
LTR: Open Source Public Workstations LTR: Open Source Public Workstations
LTR: Open Source Public Workstations
 
LTR: Open Source Public Workstations
LTR: Open Source Public Workstations LTR: Open Source Public Workstations
LTR: Open Source Public Workstations
 
2 software
2 software2 software
2 software
 
Lecture 10
Lecture 10Lecture 10
Lecture 10
 
Module 3 - Software Classification.pptx
Module 3 - Software Classification.pptxModule 3 - Software Classification.pptx
Module 3 - Software Classification.pptx
 
A presentation on system software
A presentation on system software A presentation on system software
A presentation on system software
 
New Standards for Enterprise Thermal Printer Management
New Standards for Enterprise Thermal Printer ManagementNew Standards for Enterprise Thermal Printer Management
New Standards for Enterprise Thermal Printer Management
 
Atrc opensource in_universities_presentation_7_june_2012-1
Atrc opensource in_universities_presentation_7_june_2012-1Atrc opensource in_universities_presentation_7_june_2012-1
Atrc opensource in_universities_presentation_7_june_2012-1
 
system software and application software
system software and application softwaresystem software and application software
system software and application software
 
DOC-20230427-WA0012..pptx
DOC-20230427-WA0012..pptxDOC-20230427-WA0012..pptx
DOC-20230427-WA0012..pptx
 
fdocuments.in_unit-2-foc.ppt
fdocuments.in_unit-2-foc.pptfdocuments.in_unit-2-foc.ppt
fdocuments.in_unit-2-foc.ppt
 
Operating system
Operating systemOperating system
Operating system
 
Introduction to Python Programming
Introduction to Python ProgrammingIntroduction to Python Programming
Introduction to Python Programming
 
Internship msc cs
Internship msc csInternship msc cs
Internship msc cs
 

Mais de Universität Innsbruck

Mais de Universität Innsbruck (6)

Ergebnis einer Umfrage zur Studienwahl
Ergebnis einer Umfrage zur StudienwahlErgebnis einer Umfrage zur Studienwahl
Ergebnis einer Umfrage zur Studienwahl
 
tuxtrans - workshop interattivo
tuxtrans - workshop interattivotuxtrans - workshop interattivo
tuxtrans - workshop interattivo
 
Introduction to OmegaT
Introduction to OmegaTIntroduction to OmegaT
Introduction to OmegaT
 
Wstuxtrans lingue
Wstuxtrans lingueWstuxtrans lingue
Wstuxtrans lingue
 
Wstuxtrans installa
Wstuxtrans installaWstuxtrans installa
Wstuxtrans installa
 
Wstuxtrans live
Wstuxtrans liveWstuxtrans live
Wstuxtrans live
 

Último

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 

Último (20)

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 

Control over Digital Technology: Free and Open Source CAT Tools

  • 1. 1/89 Peter Sandrini University of Innsbruck, Austria Control over Digital Technology Free and Open Source CAT Tools Timisoara March 26, 2015 Workshop
  • 2. 2/89 Abstract In a digitalized and globalized world, translation technology is becoming an inevitable part of translation. It not only concerns translators but also users of translation, trainers of translators, and localizers. Translation Technology can boost the efficiency and consistency of translation, but inconsiderate use of software and services may also cause translators losing control over the translation process and translation data. The workshop outlines the concept of translation technology as well as free and open source software and presents two compilations of available free translation technology tools developed at the University of Innsbruck: USBTrans – a collection of preinstalled packages on a USB stick, and tuxtrans – a tailor-made Linux distribution for translators.
  • 3. 3/89 Contents 1) Control over digital technology Free Software Free translation technology packages: USBTrans and tuxtrans 2) Typical work tasks: ✔translate a website ✔create a TM on the basis of existing translations ✔manage terminology and dictionaries ✔extract terminology from texts ✔use machine translation ✔convert file formats ✔manage bilingual files ✔manage pdf files ✔quality assurance ✔use text corpora
  • 4. 4/89 Dominant Technology? • „it is hard to think of a business process that is not wholly, or partly, dependent on technology“ • what about language services and translation? „technological turn in translation“ (Cronin 2010) • costs and risks of technology • translation technology ≠ ≥
  • 5. 5/89 Translation Technology • any kind of digital Information and communication technology (ICT) • which supports or performs the translation process • with the aim of meeting adequate efficiency and quality requirements
  • 6. 6/89 Control • choice vs (being) use(d) (independence) • confidentiality • integrity of program code • integrity of data • availability of data • overall management of the IT • configuration, installation of software
  • 7. 7/89 Control: examples Choose your software independently do not let translation agencies dictate your choice Choose your software independently do not let translation agencies dictate your choice Do not let financial factors limit your choiceDo not let financial factors limit your choice Do not let licenses limit your freedom I want to install my software on a desktop and on a notebook computer as well as on my network I want hassle-free updates Do not let licenses limit your freedom I want to install my software on a desktop and on a notebook computer as well as on my network I want hassle-free updates Be social and share your software with friends the easiest way to guarantee cooperation and data exchange with colleagues Be social and share your software with friends the easiest way to guarantee cooperation and data exchange with colleagues Be confident about your data I have a confidentiality agreement with my clients my translation data are economic assets Be confident about your data I have a confidentiality agreement with my clients my translation data are economic assets
  • 8. 8/89 Win back and maintain control
  • 9. 9/89 Free and Open-Source Licenses: • GNU GPL • Apache License 2.0 • BSD 2/3 • (L)PGL • MIT license • Mozilla Public License 2.0 • Eclipse Public License • Creative Commons Freedom to – run the program as you wish, for any purpose (freedom 0). – study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this. – redistribute copies so you can help your neighbor (freedom 2). – distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
  • 10. 10/89
  • 11. 11/89 • allow a cost-saving start of your career • facilitate cooperation with colleagues • avoid copyright infringements • full control over your own PC • ease of use without a license or activation code • participation in developing applications through online communities • changes (dependent) consumers into (autonomous) agents Why use free software?
  • 12. 12/89 Translation Environment Tools (TenT) all-in-one application for translators proprietary software individual projects with specific functionality ✔ translation memory ✔ analysis and statistics ✔ code protection free software ✔ terminology management ✔ project management ✔ code page conversion ✔ format conversion ✔ Spell checking ✔ ... Structural differences ✔ Translation-Memory ✔ Terminology-Management ✔ Alignment ✔ Search for collocations ✔ Analysis and statistics ✔ Project management ✔ Code-Protection ✔ Batch-scripts ✔ Spellchecking ✔ Code page conversion ✔ Format conversion ✔ ...
  • 13. 13/89 TEnTs • Translation Environment Tools (TenT) all-encompassing application for translators • Generic term for „Translation- Memory-System“ or „Computer aided translation CAT-System“ ✔ Translation-Memory ✔ Terminology-Management ✔ Alignment ✔ Search for collocations ✔ Analysis and statistics ✔ Project management ✔ Code-Protection ✔ Batch-scripts ✔ Spellchecking ✔ Code page conversion ✔ Format conversion ✔ ...
  • 14. 14/89 market reality Okapi JublerAnaphraseus TinyTm open BiText2TMX Sun OLT Virtaal OmegaT OpenTMS FOLTHeartsome Gaupol Lokalize proprietary SDL/Trados Across Star Transit MemoQ DéjàVuRainbow Wordfast Transolution ForeignDesk Pootle KBabel PO-Edit Translate Toolkit Catalyst Apertium
  • 15. 15/89 FOSTT compiled 1. USBTrans – for Windows http://homepage.uibk.ac.at/~c61302/fsftrans.html 2. tuxtrans – Open Translation Desktop System http://www.tuxtrans.org
  • 16. 16/89 USBTrans • a compilation of translation related free and open source software which can be started from an USB stick without installation (Portable Apps) • download from http://homepage.uibk.ac.at/~c61302/fsftrans.htmlsome samples on your USB • unpack the zip archive on your local pc and you are ready to go; you may also copy the unpacked files onto an USB stick (4GB) and start the programs from there • just plug in the USB stick and start the USBTrans menu
  • 17. 17/89 tuxtrans • complete desktop system for translators based on Linux as a free operating system and many specific application for translators • multilingual Italian, English, Spanish, German by default with many more languages available online • all open source or free software • website: http://www.tuxtrans.org Twitter: https://twitter.com/tuxtrans • live-system, it can be started from your USB stick without installation
  • 18. 18/89 Requirements • operating system • standard applications • specific applications } = free open-source
  • 19. 19/89 Requirements • operating system • standard applications • specific applications } = free open-source why? ● to be able to perform all tasks related to translation ● to be able to use and install it ad libitum ● to be able to distribute it to translators and students
  • 20. 20/89 tuxtrans may be used as 1) Live-DVD (without installation, slow performance) 2) Live-USB (without installation, rather slow performance) 3) Virtual machine (VirtualBox VMWare) 4) second OS (with installation, fast performance, easy to customize and adapt) 5) main operating system
  • 21. 21/89 prepare to migrate • use cross-platform-applications • and standard formats odt, pdf, tmx, tbx, xliff ...
  • 22. 22/89 standard apps ➢ word processors ➢ spreadsheets ➢ presentations ➢ databases ➢ DTP ➢ image manipulation ➢ office suite ➢ Browser ➢ E-Mail MS-Windows LO-Writer LO-Calc LO-Impress Access Xpress Gimp Libre Office Firefox Thunderbird Linux LO-Writer LO-Calc LO-Impress MYSQL Scribus Gimp Libre Office Firefox Thunderbird
  • 23. 23/89 translate with FOSS • common tasks of a free-lance translator • with free and open source translation technology • on a free digital infrastructure • with tuxtrans the Linux Desktop for translators
  • 24. 24/89 ▶ translate a Word document with TM support what you need: – a word processor – a TM system – translation memories – terminology lists
  • 26. 26/89 OmegaT: supported formats text formats •Plain text (any encoding supported by Java), including Unicode •StarOffice, OpenOffice.org, LibreOffice and OpenDocument •Open XML (Microsoft 2007/2010/2013) •(X)HTML (including complete website tree structure) •Help & Manual •HTML Help Compiler •LaTeX •DokuWiki •CopyFlow Gold for QuarkXPress •DocBook •Typo3 LocManager •Iceni Infix (PDF) •XLIFF source = target •TXML Wordfast source = target localization formats •Android resources •Java .properties •Key-value files •Mozilla DTD •Windows resources (RC) •WiX localisation •ResX •Flash XML export •Camtasia for Windows •Magento CE localisation •PO (Portable Object File) (reading existing translations) •SubRip subtitles (SRT) •SVG images
  • 28. 28/89 OmegaT: organisation terminology/glossaries conference.txt, congress.tbx ... source text 9th PCTS Conference - Reg. Form EN-1.docx project specific configuration files filters.conf, segmentation.conf translation memories regform.tmx ... inputinput
  • 31. 31/89 OmegaT: organisation new terminology glossary.txt target text 9th PCTS Conference - Reg. Form EN-1.docx project specific configuration files filters.conf, segmentation.conf new translation memories test01.tmx ... outputoutput spell checking files learned_words.txt, ignored_words.txt
  • 33. 33/89 Overview: part II typical tasks of a translator ✔ translate a websitetranslate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assessment ✔ use text corpora
  • 34. 34/89 ▶ translate websites what you need: • website with a few HTML documents • local copy of website • translation memory system • existing translation memories • existing terminology lists • HTML editor
  • 37. 37/89 OmegaT: tuxtrans.org ● source text = complete website directory structure with files ● target text = identical website directory structure with files
  • 40. 40/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translationscreate a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assessment ✔ use text corpora
  • 41. 41/89 ▶ Align source and target text How can I create a translation memory from an existing translation? What you need: • source text and target text • an alignment tool • time and patience
  • 42. 42/89 Alignment: BiText2TMX ● no development ● only text format ● TM as TMX
  • 43. 43/89 Alignment: LF-Aligner • Formats: txt (UTF-8!), rtf, doc, docx, odt, pdf, html <tu creationdate="20140916T141042Z" creationid="LF Aligner 3.11"><prop
  • 44. 44/89 ▶ TMX-Management Heartsome TMX-Editor • Edit, filter • convert from TMX to ..., to TMX from ... • QA • ...
  • 47. 47/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionariesmanage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assessment ✔ use text corpora
  • 48. 48/89 ▶ manage your terminology ForeignDesk Termbase ● concept oriented ● import/export html, csv, Multiterm ● OmegaT integration with csv export + edit
  • 49. 49/89 ▶ search for terminology GoldenDict ● several dictionary formats ● same formats as dictionaries in OmegaT
  • 50. 50/89 integrate dictionaries in OmegaT dictionaries dictionaries vs glossaries: ● no interaction possible just reading of entries no editing nor adding ● all dictionaries in stardict format ● available on the web
  • 51. 51/89 dictionaries in OmegaT monolingual StarDict dictionary integrated in OmegaT: „Oxford Advanced Learner dictionary“
  • 52. 52/89 dictionaries in OmegaT bilingual StarDict dictionaries in OmegaT: „en-de“
  • 53. 53/89 glossaries in OmegaT create,edit and use glossary lists in OmegaT • simple 3-column format source term (Tab) target term (Tab) note • always UTF-8 coded • file names: *.txt, *.tab, *.utf8, *.tbx • terminology recognition and auto- completer in OmegaT: https://www.youtube.com/watch?v=LszI aP22QhQ glossaries
  • 54. 54/89 glossaries in OmegaT create, edit and use glossary lists in OmegaT ● add terminology: [CTRL Shift G] or Edit / Create glossay entry ● saved in Project / Properties:
  • 55. 55/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from textsextract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assessment ✔ use text corpora
  • 56. 56/89 ▶ term extraction • simple monolingual term extraction with Rainbow
  • 57. 57/89 ▶ term extraction • simple monolingual term extraction with Phrase extractor
  • 58. 58/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translationuse machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assurance ✔ use text corpora
  • 59. 59/89 ▶ Machine Translation what you need: • an Open Source MT system • a free on-line MT system • an interface to your TM system
  • 60. 60/89 Apertium: OS MT system • Apertium is a rule based Open Source MT system • mostly regional language pairs from Spain with Englisch
  • 61. 61/89 Apertium: OS MT system • Apertium offline can be integrated into OmegaT
  • 62. 62/89 on-line MT systems • you need to register to use online Mt within OmegaT: – Mymemory (only E-Mail) – Microsoft Translator API ID + Secret key – Google Translate API key OmegaT.I4J.ini
  • 63. 63/89 • available for free on the web – Microsoft Translator – Google Translate • within OmegaT – Apertium online + offline free – Mymemory free – Microsoft Translator free, but you need to register – Google Translate registration + costs on-line MT systems
  • 64. 64/89 on-line MT systems • interface in OmegaT
  • 65. 65/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formatsconvert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assurance ✔ use text corpora
  • 66. 66/89 convert file formats • text formats from doc to docx, odt … with LibreOffice
  • 67. 67/89 convert file formats • CSV, TAB, TXT glossary lists to TBX with Heartsome Translation Studio • CSV, TXT parallel texts to TMX Tms with Heartsome • Okapi Rainbow to PO, TMX, CSV • SDL/Trados TM and TB to TMX with TradosStudio resources converter
  • 68. 68/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual filesmanage bilingual files ✔ manage pdf files ✔ quality assurance ✔ use text corpora
  • 69. 69/89 ▶ bilingual file types text files which include source text segments aligned to target text segments, i.e. files which contain translations for all segments or just for a part of segments XLIFF, sdlxliff, ttx can not be translated with OmegaT directly, or rather only when source text segment = target text segment existing translations can thus not be leveraged partly translated file as shown in Virtaal
  • 70. 70/89 ▶ bilingual file types Existing translations can be re-used in OmegaT by using Okapi Rainbow and by creating ready-to-go OmegaT translation projects } ● extract source text ● create TM from existing translations ● translate project in OmegaT ● post process translation in Rainbow ● get translated bilingual files ● XLIFF ● sdlxliff ● ttx
  • 73. 73/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf filesmanage pdf files ✔ quality assurance ✔ use text corpora
  • 74. 74/89 ▶ PDF files • translate PDF files with OmegaT only text-based PDF and target text will be txt file • extract text from PDF files with gPDFText eBook-Editor (Linux) • split and merge PDF files with PDF-Sam • annotate PDF files with Xournal (Linux) • compare PDF files with DiffPDF
  • 75. 75/89 PDF and OmegaT • translate text-based PDF directly in OmegaT
  • 76. 76/89 extract text from PDF • gPDFText eBook-Editor extracts Text from text-based PDF without line breaks
  • 77. 77/89 split and merge PDFs • PDF-Sam
  • 78. 78/89 PDF-Dateien annotieren • comment, underline, highlight, etc. with Xournal
  • 79. 79/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assurancequality assurance ✔ use text corpora
  • 80. 80/89 ▶ Quality assurance 1) OmegaT QA scripts 2) Okapi Checkmate 3) TMX-Validator 4) XLIFF Checker
  • 81. 81/89 QA with OmegaT • OmegaT QA scripts
  • 84. 84/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assurance ✔ use text corporause text corpora
  • 85. 85/89 ▶ create a reference corpus Why? search for terms, collocations, idioms ... what you need: • a representative quantity of LSP texts in the subject domain you want to translate • a free concordance tool
  • 86. 86/89 TextSTAT • file formats: doc, odt, html, txt • concordance search • frequency lists
  • 87. 87/89 AntConc • AntConc less file formats more linguistic analysis
  • 88. 88/89 Overview: part II typical tasks of a translator ✔ translate a website ✔ create a TM on the basis of existing translations ✔ manage terminology and dictionaries ✔ extract terminology from texts ✔ use machine translation ✔ convert file formats ✔ manage bilingual files ✔ manage pdf files ✔ quality assurance ✔ use text corpora translation of subtitles editing and translating PO files creating mindmaps for terminology creating a web corpus localizing software evaluating MT output ... … + ...
  • 89. 89/89 http://www.petersandrini.net http://uibk.academia.edu/PeterSandrini Simply, try it out! Thank youThank you for your attention!for your attention! What next?