This document introduces Terminology as a Service (TaaS), a cloud-based platform for acquiring, cleaning, sharing, and reusing multilingual terminology data. It discusses the complexity of terminology work and the need for TaaS. A survey found that over 80% of users see terminology work as important or very important. The document outlines TaaS partners and services, which include automatic term extraction and translation, integration with terminology databases and tools, and APIs for other applications. TaaS aims to simplify terminology processes for language workers and provide terminology to translation tools and machine translation systems.
Welcome to the Cloud! Terminology as a Service, CHAT2013
1. Welcome to the Cloud!
Terminology as a Service
Andrejs Vasiļjevs
Tilde
tekom 2013 / Wiesbaden / 07.11.2013.
2. Complexity of terminology work
Term identification in the source text
Consulting online databases and local files for translation equivalents
Creating and maintaining terminology glossaries
Sharing term glossaries and involving others in polishing them
Structuring data in industry-standard formats
Integrating term glossaries into CAT and other productivity tools
Keeping terminology up to date
etc.
3. Terminology as a Service
A cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data
4. TaaS User Needs Survey Results: Importance of terminology work
[Pie chart. Categories: Very important, Quite important, Less important, Not important. Values shown: 43.5%, 39.9%, 14.8%, 1.8%.]
5. TaaS User Needs Survey: willingness to share
[Chart. Overall split between "Yes, provided that…" and "No, because…": 60.5% and 39.5%.]
Yes, provided that…: Joint contribution to the DB; Access control; Legal aspects; External quality control; Little effort; Anonymity; Other
No, because…: Legal restrictions; Poor quality/Lack of time; Own asset; Risk of misunderstanding
Values shown for the individual reasons: 16.7%, 8.3%, 24.9%, 6.0%, 4.6%, 16.5%, 48.6%, 7.6%, 19.2%, 11.4%, 14.2%, 22.0%
6. TaaS Partners
Tilde, Latvia (Coordinator)
TAUS, Netherlands
Kilgray, Hungary
Cologne University of Applied Sciences, Germany
University of Sheffield, UK
7. TaaS Mission
Simplify the process by which language workers prepare, store, and share task-specific multilingual term glossaries
Provide professional translators with instant access to term translation equivalents and translation candidates through CAT tools
Support domain adaptation of statistical machine translation systems through dynamic integration with TaaS-provided terminology data
8. Key services of TaaS
Automatic extraction of monolingual term candidates from user-uploaded documents
Automatic retrieval of translation equivalents from different public and industry terminology databases
Translation candidate acquisition from multilingual web data
Facilities for users to clean up automatically acquired terminological data
Data sharing and integration facilities through APIs and export tools
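To make the first service more concrete, the following is a minimal sketch of monolingual term candidate extraction. It is not the TaaS extraction method: it simply counts word n-grams and keeps the frequent ones that do not start or end with a stop word. The function name `extract_term_candidates`, the stop-word list, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a production extractor would rely on
# linguistic filters (part-of-speech patterns, lemmatisation) instead.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "on", "with"}


def extract_term_candidates(text, max_len=3, min_freq=2):
    """Return n-grams (1..max_len words) that occur at least min_freq times
    and neither start nor end with a stop word."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


# Illustrative sample text, not taken from the presentation.
sample = (
    "The brake disc is mounted on the wheel hub. "
    "Replace the brake disc if the brake disc is worn."
)
candidates = extract_term_candidates(sample)
```

With this sample, "brake disc" passes the frequency threshold while "wheel hub", which occurs once, does not. Candidates produced this way would still need the user clean-up step listed above.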
11. Target Repositories
TAUS Data: repository of multilingual translation memories
EuroTermBank: databank of federated multilingual terminology
IATE: inter-institutional termbank of the European Union
META-SHARE: distributed pan-European repository of language resources
12. Integration
Support for industry-standard formats
Integration into CAT and productivity tools
API to integrate TaaS services into various software applications
14. HTML Term Annotation
Term entries for terms identified in EuroTermBank are stored in TBX format
in a <script> element that is placed in the HTML5 document.
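The annotation approach described above can be sketched roughly as follows. This is an illustrative reconstruction, not the markup TaaS actually emits: the `its-term` and `its-term-info-ref` attributes are ITS 2.0 HTML attributes, while the `<script>` type, the element `id` values, the simplified TBX fragment, and the sample term pair are assumptions.

```python
import html
import xml.etree.ElementTree as ET


def build_tbx(entries):
    """Build a simplified TBX-like fragment: one termEntry per concept,
    one langSet per language (header and metadata omitted)."""
    body = ET.Element("body")
    for entry_id, terms in entries.items():
        term_entry = ET.SubElement(body, "termEntry", {"id": entry_id})
        for lang, term in terms.items():
            lang_set = ET.SubElement(term_entry, "langSet", {"xml:lang": lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(body, encoding="unicode")


def annotate_html(text, term, entry_id, tbx_xml):
    """Wrap the first occurrence of term in an ITS 2.0 annotated span and
    place the TBX data in a <script> element of the HTML5 document."""
    escaped_term = html.escape(term)
    span = (
        f'<span its-term="yes" its-term-info-ref="#{entry_id}">'
        f"{escaped_term}</span>"
    )
    body = html.escape(text).replace(escaped_term, span, 1)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Annotated</title></head>\n"
        f"<body><p>{body}</p>\n"
        # The script type below is an assumption; it marks a data block.
        f'<script type="application/xml" id="tbx-data">{tbx_xml}</script>\n'
        "</body></html>"
    )


# Illustrative sample entry, not taken from EuroTermBank.
tbx = build_tbx({"etb-1": {"en": "brake disc", "lv": "bremžu disks"}})
page = annotate_html("Replace the brake disc if worn.", "brake disc", "etb-1", tbx)
```

Keeping the term entries in a data block separates the terminology payload from the visible text, so a consumer can resolve each annotated span to its entry through the reference attribute.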
16. Identifying and marking terms
New W3C standard: Internationalization Tag Set (ITS) 2.0
[Diagram. Components: Showcase Web Page; Terminology Annotation Web Service API; TaaS Terminology Services. Inputs: plaintext and ITS 2.0 enriched content. Outputs: ITS 2.0 term-annotated content for human users (e.g., translators, terminologists) via export / visualisation, and for machine users (CAT tools, MT systems).]
18. TaaS Architecture
[Architecture diagram; components as labelled]
Clients: CAT tools and MT (https, REST); external TDBs (https, REST); web browsers (http/https, html)
Presentation Layer: Public API; Web Page UI
Application Logic Layer: terminology collection management; user management; terminology collection search; terminology collection creation
Data Storage Layer (Shared Term Repository)
Term extraction workflows: full collection creation workflow; monolingual collection creation; translation candidate extraction; text tagging with terms
High-performance Computing (HPC) Cluster: File Store; HPC frontend; SGE; CPU nodes
Modules: term extraction (TXT extractor, TWSC, Kilgray Term Extractor); term normalizer; collection creator; Statistical DB acquisition; parameter retriever; Bilingual Term Extraction System; Statistical DB feeding; translation lookup (ETB & STR, IATE, TAUS API, Statistical DB); collection merger; result processing; Collection Importer; marked text enrichment
Databases: Statistical DB; Shared Term Repository DB
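The translation lookup and collection merger modules named in the architecture can be pictured with a small sketch. The slides do not describe how these modules work internally, so everything here is an assumption: the in-memory dictionaries stand in for remote lookup sources, the names `lookup_translations` and `merge_collection` are hypothetical, and ranking candidates by the number of supporting sources is just one plausible merging rule.

```python
from collections import defaultdict

# Stand-ins for lookup sources. Real sources would be remote services;
# these dictionaries hold hypothetical sample data only.
SOURCES = {
    "source_a": {"brake disc": ["bremžu disks"]},
    "source_b": {
        "brake disc": ["bremžu disks", "bremzes disks"],
        "wheel hub": ["riteņa rumba"],
    },
}


def lookup_translations(term, sources):
    """Collect translation candidates for one term, remembering which
    sources proposed each candidate."""
    found = defaultdict(list)
    for source_name, entries in sources.items():
        for candidate in entries.get(term, []):
            found[candidate].append(source_name)
    return dict(found)


def merge_collection(terms, sources):
    """Build a collection mapping each term to its candidates, ranked by
    the number of supporting sources (ties broken alphabetically)."""
    collection = {}
    for term in terms:
        found = lookup_translations(term, sources)
        ranked = sorted(found.items(), key=lambda item: (-len(item[1]), item[0]))
        collection[term] = [
            {"target": target, "sources": names} for target, names in ranked
        ]
    return collection


collection = merge_collection(["brake disc", "wheel hub", "unknown term"], SOURCES)
```

Recording the supporting sources alongside each candidate keeps provenance visible, which matters when users later clean up automatically acquired data.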
23. Boost in the quality of machine translation
Narrow Domain Automotive MT, English – Latvian
DATA
2 M unique parallel sentences
1.9 M monolingual sentences
0.2 M in-domain monolingual sentences
QUALITY
16% improvement from terminology integration
25. Thank you!
andrejs@tilde.com
The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement No. 296312.