Terminology as a Service – a model for collaborative terminology management
Klaus-Dirk Schmitz - Cologne University of Applied Sciences
Tatiana Gornostay - Tilde, Riga
VII EAFT Terminology Summit, Barcelona, 27–28 November 2014
Klaus-Dirk Schmitz
Cologne University of Applied Sciences
klaus.schmitz@fh-koeln.de
Tatiana Gornostay
Tilde, Riga
tatiana.gornostay@tilde.lv
3. K.-D. Schmitz, IIM, FH Köln
Collaborative terminology management
Collaborative: several individuals are involved in creating terminological entries
Different terminological competences require well-elaborated user profiles with specific rights and views (read/write, only certain languages or data categories (datCats), …)
Well-defined workflow and quality assurance procedures are needed (supported, e.g., by QuickTerm)
Metadata (datCats) are needed for normative and workflow status (preferred/admitted/deprecated, draft/under discussion/final, …)
Cloud-based terminology management
Since terminology work is “expensive”, why not involve the Crowd in creating and validating terminology?
You need a tool for managing terminology in the Cloud!
Examples:
Wikipedia (www.wikipedia.org)
TermWiki (www.termwiki.com)
A different approach from:
web interfaces for terminology management systems (TMS) (e.g. MultiTerm-Web)
web-based TMS (e.g. TermWeb)
Cloud-/Crowd-based terminology work
The main questions:
How can you motivate the Crowd?
Hidden business model? Free services? Sharing data?
Do you want to have your data in the Cloud?
Can you apply established terminological principles (meta model, datCats, concept orientation)?
How can you ensure correctness?
How can you ensure completeness?
How can you ensure consistency?
How can you ensure reliability?
The TaaS Project
A new approach as an example:
TaaS – Terminology as a Service: a cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data
The project has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.
The TaaS Project
Partners:
Tilde (Latvia), coordinator
TAUS (the Netherlands)
Kilgray (Hungary)
Fachhochschule Köln (Germany)
University of Sheffield (UK)
Duration: 1 June 2012 – 31 May 2014
Languages: all European languages + Russian
www.taas-project.eu
Basic Services of TaaS
Automatic extraction of monolingual term candidates from user-uploaded documents, using state-of-the-art terminology extraction techniques
Automatic retrieval of translation equivalents for the extracted terms, in user-defined target language(s), from different public and industry terminology databases
Translation candidate acquisition for terms not found in term banks, from parallel web data, using state-of-the-art terminology extraction and bilingual terminology alignment methods
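To make the first service more concrete, here is a deliberately simple Python sketch of statistical monolingual term candidate extraction. It counts word n-grams that neither begin nor end with a stopword. This is not the TaaS algorithm, which the slide does not describe. The stopword list and the `max_len` and `min_freq` parameters are illustrative assumptions. Real extractors also add linguistic filters, such as part-of-speech patterns.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use full lists)
STOPWORDS = {"a", "an", "the", "of", "and", "or", "for", "in", "on", "to",
             "is", "are", "with", "by", "from", "as", "that", "this", "it"}

# Runs of letters (any script), optionally joined by hyphens
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def term_candidates(text, max_len=3, min_freq=2):
    """Rank word n-grams (1..max_len words) by frequency.

    N-grams that start or end with a stopword are skipped, and the text is
    split at punctuation first so that candidates do not cross clause boundaries.
    """
    counts = Counter()
    for chunk in re.split(r"[.!?;:,()]", text):
        tokens = WORD.findall(chunk.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(cand, freq) for cand, freq in counts.most_common() if freq >= min_freq]


if __name__ == "__main__":
    sample = ("Wind turbines convert wind energy. A wind turbine farm produces "
              "wind energy for the grid. Offshore wind energy is growing.")
    for cand, freq in term_candidates(sample):
        print(freq, cand)
```

Raw frequency is a weak termhood measure on its own. Candidates such as "energy" and "wind energy" rank together here, and that kind of noise is what the user clean-up step is meant to handle.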
Basic Services of TaaS
Facilities for users to clean up automatically acquired terminological data
Data sharing and integration facilities: APIs and export tools for sharing the resulting terminological data with major term banks and for use in different applications
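Export tools that share data with term banks typically use an exchange format such as TBX (TermBase eXchange). The Python sketch below uses the standard library's `xml.etree.ElementTree` to write concept-oriented entries into a minimal TBX-style document. Both the choice of TBX and the input structure are assumptions, and the output has not been validated against a TBX schema.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the reserved xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tbx(concepts, source_note="Exported term collection"):
    """Serialize concept-oriented entries as a minimal TBX-style XML string.

    concepts: list of dicts, one per concept, mapping a language code to the
    list of terms (synonyms) for that concept in that language.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    header = ET.SubElement(martif, "martifHeader")
    source = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source, "p").text = source_note
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for number, concept in enumerate(concepts, start=1):
        # One termEntry per concept (concept orientation)
        entry = ET.SubElement(body, "termEntry", {"id": f"c{number}"})
        for lang, terms in concept.items():
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            for term in terms:
                tig = ET.SubElement(lang_set, "tig")
                ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


if __name__ == "__main__":
    print(to_tbx([{"en": ["wind turbine"], "de": ["Windkraftanlage", "Windturbine"]}]))
```

The sketch keeps one entry per concept rather than one per term. This preserves the concept orientation mentioned earlier, so synonyms in one language stay grouped under the same entry.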
Example: Term extraction via TaaS
Go to https://term.tilde.com
Direct search for terms and equivalents
Or log in / sign up for further services
Example: Term extraction with TaaS
Go to https://term.tilde.com
Either search directly
Or log in / sign up for further services
Create a term extraction project
Upload the text(s) for extraction
Define the extraction settings
Start the extraction
Review and complete the extraction results
Visualization
Some evaluation results
Evaluation in April (and June) 2014
4 test documents
Type: online article, white paper, dissertation
Domain: energy, economics, IT, astronomy
Languages: DE-EN, DE-FR, EN-FR
Gold Standard:
human term extraction, 7-10 candidates / document
problem: subjectivity
Gold Standard
Example (astronomy): 36×1 + 26×2 + 63×3 = 277
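Written out, the astronomy total is simple arithmetic. The slide does not explain what the multipliers 1, 2 and 3 stand for, so here they are treated only as weights applied to the three counts:

```latex
36 \times 1 + 26 \times 2 + 63 \times 3 = 36 + 52 + 189 = 277
```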
Calculation of recall and precision (TC = term candidates)
Recall = found relevant TC / all relevant TC
(Were all relevant TC found?)
Precision = found relevant TC / all found TC
(Are all found TC relevant?)
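The two formulas map directly onto set operations. This minimal Python sketch treats term candidates as plain strings. It ignores the weighting of the gold standard, because the slide does not say how the weights entered the evaluation. The example data are invented for illustration.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for a set of extracted term candidates (TC).

    found:    TC returned by an extraction tool
    relevant: TC in the gold standard
    """
    found, relevant = set(found), set(relevant)
    found_relevant = found & relevant  # all found relevant TC
    recall = len(found_relevant) / len(relevant) if relevant else 0.0
    precision = len(found_relevant) / len(found) if found else 0.0
    return recall, precision


if __name__ == "__main__":
    gold = {"wind energy", "wind turbine", "offshore wind farm", "power grid"}
    extracted = {"wind energy", "wind turbine", "energy"}
    r, p = recall_precision(extracted, gold)
    print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.50 precision=0.67
```

In the term-extraction literature, the complements 1 − recall and 1 − precision are often called silence and noise.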
Results of the TaaS evaluation
Test with Kilgray (statistical)
Test with TWSC and Term Normalizer (linguistic)
Improvement of TaaS
Second (short) evaluation in June 2014, after the end of the project
Comparison TaaS – human – MT-Extract
T1: Terminologist with the best Recall and Precision values
T4: Terminologist with the worst Recall values
Ü1: Translator with the worst Precision values
MT: MultiTerm Extract (statistical) with different Silence/Noise values
TaaS: (statistical) Machine Translation
Diagram labels: data acquisition from SMT systems; export of multilingual terminology for reuse in MT systems; Online Terminology Services; translation; training; SMT system training and adaptation; Online Translation Service; input text for translation; parallel corpus; monolingual corpus; bilingual term collections; monolingual term extraction; trained SMT model; bilingual term extraction; translated text
Conclusion
TaaS offers free-of-charge services for terminology extraction, retrieval, management, and sharing
The term extraction results are excellent if linguistic algorithms are available for the language in question
Companies are reacting very cautiously to TaaS
But the free services offered by TaaS may attract language workers to use TaaS for terminology management, to share (validated) terminology, and to collaborate with others.
Thank you for your attention
Prof. Dr. Klaus-Dirk Schmitz
Cologne University of Applied Sciences
Faculty 03 - ITMK/IIM
Ubierring 48
D-50678 Köln
Germany
klaus.schmitz@fh-koeln.de