TERMINOLOGY
EXTRACTION
TOOLS FOR
INTERPRETERS
JOSH GOLDSMITH
2ND COLOGNE CONFERENCE ON
TRANSLATION, INTERPRETING,
AND TECHNICAL
DOCUMENTATION
NOVEMBER 30, 2018
JG@JOSHGOLDSMITH.COM
@GOLDSMITH_JOSH
http://xl8.link/TerminologyExtractionSlides
1.
STATE OF THE
ART
2
DEFINITIONS
TERM
“lexical items belonging to
specialized areas of
usage”
Sager (1990: 2)
TERMINOLOGY
EXTRACTION
“Automatically isolating
terminology from texts”
Cabré, Estopà & Vivaldi
(2001:53)
3
WHY TERMINOLOGY MATTERS
FOR INTERPRETERS
To be accepted as “insiders” and perceived as “competent,”
interpreters must:
● Have sufficient specialized knowledge of the domain
● Know and use domain-specific terminology
● Master phraseology of specialized language
Fantinuoli (2012:41)
4
WHY EXTRACT
TERMINOLOGY?
● Limited preparation time: materials made available last
minute (Pignataro 2012)
● Preparation is time-intensive; generally entails collecting
parallel texts and extracting relevant terminology
(Fantinuoli 2017)
● Collecting terminology is a regular part of preparing for an
assignment (Bilgen 2009:91)
● Preparation front-loads cognitively challenging tasks and
can decrease cognitive load while interpreting (Stoll 2009)
● Terminological preparation may improve performance and
processing, leading to target language renditions featuring
more specialized terminology (Diaz Galaz, 2015)
5
TERMINOLOGY
MANAGEMENT SYSTEMS
● Early studies survey professionals about terminology-
related needs and practices to develop terminology
management tools for interpreters
Rütten (2003), Bilgen (2009)
● Researchers analyze tools to see whether they meet interpreters’ needs
Costa, Corpas Pastor & Durán
Muñoz (2014, 2017); Will (2015)
● These studies tend to be based on researchers’ subjective
assessments of interpreters’ needs rather than on
objective criteria
Goldsmith (2017)
6
COULD TERMINOLOGY EXTRACTION
STREAMLINE PREPARATION?
● Tools could decrease preparation time and allow
interpreters to focus on the most relevant terms during
preparation
Rütten (2003)
● Corpus-based preparation gave rise to better terminology-
related performance in simultaneous interpretation
Xu (2015)
7
LACK OF
TECHNOLOGY AND RESEARCH
● “No tool has been specifically developed to satisfy the
needs of interpreters during the preparatory phase”
Fantinuoli (2017:24)
● Research has considered key features of terminology
extraction tools for translators, but not interpreters
Costa, Zaretskaya, Corpas Pastor
& Seghiri (2016)
8
TYPES OF AUTOMATIC
TERMINOLOGY EXTRACTION
SYSTEMS
9
LINGUISTIC
Use linguistic knowledge (morphology, etc.) to detect lexical units
▸ Noise tends to be high
STATISTICAL
Use relative frequencies to identify high-frequency lexical units
▸ Hard to find low-frequency terms
HYBRID
Combine statistical and linguistic measures
Cabré, Estopà & Vivaldi (2001:53); Fantinuoli (2012)
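The statistical approach above can be illustrated with a minimal sketch. It assumes a simple scoring rule that is not taken from the slides: each word's relative frequency in a domain text is divided by its relative frequency in a general reference text. The function names and the `smoothing` constant are hypothetical, and real tools are likely to use more refined measures.

```python
from collections import Counter
import re


def relative_frequencies(text):
    """Map each lowercase word in `text` to its share of all tokens."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def rank_candidates(domain_text, reference_text, top_n=5, smoothing=0.01):
    """Rank words that are relatively more frequent in the domain text.

    `smoothing` (an assumed constant) avoids division by zero for words
    that never occur in the reference text.
    """
    domain = relative_frequencies(domain_text)
    reference = relative_frequencies(reference_text)
    scores = {
        word: freq / (reference.get(word, 0.0) + smoothing)
        for word, freq in domain.items()
    }
    return sorted(scores.items(), key=lambda item: -item[1])[:top_n]


domain = "the turbine blade and the turbine rotor drive the generator"
reference = "the cat and the dog saw the bird and the fish"
for word, score in rank_candidates(domain, reference):
    print(word, round(score, 2))
```

In this toy input, every word that occurs only once in the domain text and never in the reference text receives the same score, which hints at why purely statistical systems struggle with low-frequency terms.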
ASSESSING TERMINOLOGY
EXTRACTION SYSTEMS
RECALL
“Capacity of the detection
system to extract all terms
from a document”
SILENCE
“Terms contained in an
analysed text that are not
detected by the system”
PRECISION
“Capacity to discriminate
between those units
detected by the system
which are terms and those
which are not”
NOISE
“The rate between discarded
candidates and accepted
ones”
Cabré, Estopà & Vivaldi
(2001:53-56)
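The four measures above can be computed from two sets: the candidates a system proposes and a manually validated list of terms. The sketch below is one possible reading of the quoted definitions, not the authors' own formalization. In particular, it treats noise as the ratio of discarded to accepted candidates, and the function name and sample terms are hypothetical.

```python
def evaluate_extraction(candidates, validated_terms):
    """Compare extracted candidates against a manually validated term list."""
    candidates = set(candidates)
    validated = set(validated_terms)

    accepted = candidates & validated   # candidates that are real terms
    discarded = candidates - validated  # candidates a reviewer would reject
    silence = validated - candidates    # real terms the system missed

    precision = len(accepted) / len(candidates) if candidates else 0.0
    recall = len(accepted) / len(validated) if validated else 0.0
    # Assumed reading of "rate between discarded candidates and accepted ones"
    noise = len(discarded) / len(accepted) if accepted else float("inf")

    return {
        "precision": precision,
        "recall": recall,
        "silence": silence,
        "noise": noise,
    }


result = evaluate_extraction(
    candidates={"wind turbine", "rotor blade", "meeting", "next slide"},
    validated_terms={"wind turbine", "rotor blade", "nacelle"},
)
print(result)
```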
10
AIMS OF AUTOMATIC
TERMINOLOGY EXTRACTION
● Reduce noise (be accurate)
● Reduce silence (be complete)
● Allow for manual selection of terms and validation of
candidate terms
Heid (2001)
● “As usability is regarded as being fundamental for the
acceptability of an interpreter-oriented tool, a terminology
extraction system for interpreters must give priority to
precision over recall.”
Fantinuoli (2012: 49)
11
2.
STUDY DESIGN
AND
PARTICIPANTS
12
RESEARCH QUESTIONS
1. What tools are interpreters using for terminology extraction?
2. What are the strengths and weaknesses of these tools?
3. In which settings are terminology extraction tools useful? In which settings should they be avoided?
4. What does the terminology extraction process look like?
5. How does terminology extraction compare to other types of preparation?
6. In addition to the term itself, what should these tools extract?
7. What features would an ideal terminology extraction tool offer?
RESEARCH DESIGN
EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO
▸ Map the field of terminology extraction tools for interpreters
▸ Develop an instrument to assess tools (Creswell & Clark 2006)
SEMI-STRUCTURED IN-DEPTH INTERVIEWS
▸ Develop detailed descriptions, present multiple perspectives, describe process, understand a situation from the inside (Weiss, 1994).
▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756).
▸ Zoom™, Speechmatics™
▸ Informed consent
▸ Anonymous
INDUCTIVE THEMATIC ANALYSIS
▸ Transcribe interviews and inductively derive categories (Kvale, 1996)
▸ Coded with NVivo™ (CAQDAS program)
PARTICIPANTS
▸ 10 respondents, all professional interpreters (2 women)
▸ Age 29 – 57 (μ = 42.2)
▸ Domiciled in Europe and North America
▸ 6 members of professional associations (60%)
▸ 2 staff interpreters (20%)
▸ Conference (100%), Media (10%), Court (10%) and Community (10%) interpreting
▸ Experience: 3 – 30 years (μ = 17.7)
▸ Experience using terminology extraction tools: 1 - 17 years (μ = 8.9)
▸ Translation, training, research, administration, voiceovers
PARTICIPANTS’ EXPERIENCE
PERCENTAGE OF ASSIGNMENTS USED
▸ Manual: 0 - 100% (μ = 48.0%)
▸ Semi-automatic: 0 - 100% (μ = 18.9%)
▸ Automatic: 0 - 100% (μ = 40.0%)
NUMBER OF ASSIGNMENTS USED
▸ Manual: 0 - 840 (μ = 123.8)
▸ Semi-automatic: 0 - 150 (μ = 17.2)
▸ Automatic: 0 - 600 (μ = 135.6)
THIS IS A PILOT STUDY
RESULTS CANNOT BE
GENERALIZED, BUT DO AIM
TO GIVE A GENERAL
OVERVIEW OF TOOLS,
EXPERIENCES AND
EXPECTATIONS.
PERCENTAGES ARE NOT
STATISTICALLY SIGNIFICANT
OR GENERALIZABLE.
3.
TOOLS USED
18
19
HARDWARE USED
Desktop (50%)
Laptop (75%)
Tablet (20%)
Windows operating system (80%)
MacOS (40%)
iOS (20%)
▸ Some users utilize multiple devices
20
TERMINOLOGY EXTRACTION SOFTWARE USED
InterpretBank (60%)
Interpreters’ Help (40%)
SketchEngine (20%; 30% used or tested)
Intragloss (10%; 40% used or tested)
Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T, dtSearch, Thermostat, as well as an in-house tool at an international organization (10% each)
▸ Users worked with or had tested multiple types of terminology extraction software
21
OTHER SOFTWARE USED
Terminology management tools (InterpretBank, Interpreters’ Help, Interplex, MS Access): 100%
Annotation tools (Readdle Documents, GoodReader, PDF Exchange Editor, Skim): 50%
Terminology database (e.g. IATE): 50%
Wikipedia: 40%
Linguee: 40%
Search Engines: 30%
4.
THE
EXTRACTION
PROCESS
DIFFERENT APPROACHES TO MANUAL, SEMI-AUTOMATIC AND AUTOMATIC TERMINOLOGY EXTRACTION
22
TYPES OF
TECHNOLOGY-ASSISTED
TERMINOLOGY EXTRACTION
23
MANUAL
User selects terms
manually.
Tool provides
support, e.g., to:
▸ add terms to
glossary
▸ look up
translation
▸ help manage
terms
SEMI-AUTOMATIC
User provides
document(s).
Tool suggests terms.
User reviews and
accepts them.
AUTOMATIC
User provides
document(s).
Tool suggests term
candidates.
Goldsmith (2018)
MONOLINGUAL MANUAL TERMINOLOGY EXTRACTION WITH ANNOTATION
24
BILINGUAL MANUAL TERMINOLOGY EXTRACTION WITH PARALLEL DOCUMENTS
25
MONOLINGUAL/BILINGUAL MANUAL TERMINOLOGY EXTRACTION WITH PARALLEL DOCUMENTS
26
MULTILINGUAL MANUAL TERMINOLOGY EXTRACTION
27
MONOLINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
28
MONOLINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
29
BILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
30
BILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
31
MULTILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION WITH ANNOTATION
32
MONOLINGUAL/MULTILINGUAL AUTOMATIC TERMINOLOGY EXTRACTION
33
BILINGUAL AUTOMATIC TERMINOLOGY EXTRACTION WITH ANNOTATION
34
5.
OTHER
PREPARATION
STRATEGIES
35
36
OTHER
PREPARATION STRATEGIES
Read documents (90%)
Background reading (50%)
Web research (50%)
Memorize/drill terms (50%)
Manual annotation (40%)
Wikipedia (40%)
Terminological research (30%)
Gisting/text summarization (20%)
Automatic translation; Concordancer; Build glossaries
collaboratively; Read news; Read technical documents; Practice
interpreting on similar topics (10%)
6.
PROS, CONS
AND
EFFECTIVENESS
37
38
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (1)
FACILITATES PREPARATION
Saves time (100%)
Provides terminology despite time pressure (90%)
Quick extraction from lengthy documents (60%)
Less hassle / menial copying and pasting (30%)
Automatic annotation of term (and translation) (20%)
Better preparation (10%)
CONSISTENCY/RELIABILITY
Accurate/reliable results from automatic extraction (50%)
Consistent preparation (20%)
39
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (2)
TERMINOLOGICAL PRECISION
Automatically extract most important / “right” terms (50%)
Automatically look up translations on other sites (40%)
Automatically extract named entities (10%)
Add stop words (10%)
Search function (10%)
ERGONOMICS
Lightweight, portable, small footprint (30%)
40
STRENGTHS OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY/INTERFACE
Parallel scrolling (50%)
Easy comparison of bilingual/multilingual texts (30%)
Manual highlighting/annotation/bookmarking (20%)
Easy to use; easy input of terms; visually appealing; filter/edit
results (10% each)
EXPORT/STORAGE
Export candidates to database (40%)
Back up/digitize glossaries (30%)
Export in shareable format (20%)
Reuse for later assignments (10%)
41
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (1)
PREPARATION
Incomplete preparation if relying only on term extraction (40%)
Time-intensive (manual, copy/paste) (20%)
Slow with large glossaries (10%)
IMPORT/EXPORT/STORAGE
Poor export/formatting of exported text (20%)
Tool doesn’t recognize format (e.g. line breaks, images) (20%)
Compatibility (Mac/PC, etc.) (10%)
Poor import of documents/glossaries (10%)
Export not provided (10%)
42
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (2)
EXTRACTION
Multilingual extraction not supported (50%)
Too many terms extracted (50%)
Results need cleaning up (30%)
Too few/many words in term (20%)
Noise (20%)
Too few terms extracted (20%)
Incomplete extraction (e.g. context missing) (10%)
Tool reorders words (10%)
43
WEAKNESSES OF TERMINOLOGY
EXTRACTION TOOLS (3)
DISPLAY
Poor/incomplete presentation of results (30%)
Terminology entry lacks relevant fields (10%)
Small screen size (tablet) (10%)
CUSTOMIZATION
Tools not designed for interpreters (10%)
Software doesn’t know user’s individual needs (10%)
COST
Cost/subscription (20%)
44
SETTINGS WHERE
EXTRACTION TOOLS PREFERRED
80% used extraction when documents available
MANUAL
Parallel texts (40%)
New topic (30%)
Few documents (30%)
Time permitting (30%)
Focus on collocations (10%)
Only monolingual
documents available (10%)
AUTOMATIC
Numerous/long documents (40%)
For institutions (40%)
Time pressure (40%)
For hearings (20%)
For automatic annotation when
glossaries available (20%)
Familiar subject matter (10%)
All assignments (10%)
When onsite (10%)
45
SETTINGS WHERE
EXTRACTION TOOLS AVOIDED
Limited / no materials available (50%)
Documents not available in digital format (30%)
Need to understand content (30%)
Text too general (20%)
PowerPoint (20%)
Faster to read than extract (20%)
Recurring meeting/familiar with terminology (10%)
Confidentiality (10%)
Multilingual documents not available (10%)
Vague subject matter (10%)
Very large / small glossary available (10%)
70% of respondents felt terminology extraction was more effective
than other types of preparation
46
62.5% of respondents preferred terminology extraction over other types of preparation
BUT ONLY 40% of respondents felt terminology extraction tools meet their needs
90% of respondents felt clients were not aware they used
terminology extraction tools. Those who were aware reacted
positively (20%) and found it professional (10%)
47
80% of respondents felt colleagues were curious about
terminology extraction tools, although some mentioned
uninterested colleagues (40%) who were averse to new
approaches (20%) or unwilling to change their habits (20%)
7.
THE IDEAL
TOOL
48
49
THE IDEAL TOOL
SHOULD EXTRACT
Term (100%)
Single and multi-word terms (100%)
Context/examples (90%)
Equivalents in other languages (70%)
Source / source document (50%)
Definition (40%)
Frequencies (40%)
Subject matter overview (40%)
Collocations / phraseology (30%)
Named entities; figures; domain; link to source (20%)
Graphical information; images; hyponyms; semantic
groupings (10%)
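Since 90% of respondents wanted context or examples alongside each term, a keyword-in-context lookup is one way such a feature might work. The sketch below is only an illustration under that assumption. The function name and the `width` parameter are hypothetical and do not describe any of the tools named in this study.

```python
import re


def keyword_in_context(text, term, width=30):
    """Return each occurrence of `term` with up to `width` characters
    of surrounding text on either side (case-insensitive, whole words)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end].strip())
    return snippets


sample = (
    "The rotor blade is inspected yearly. "
    "Each rotor blade weighs several tons."
)
for snippet in keyword_in_context(sample, "rotor blade", width=15):
    print(snippet)
```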
50
THE IDEAL TOOL
ANNOTATION
Allow manual annotation (70%)
Highlight terms (60%)
Highlight phraseology (60%)
Print translations above extracted term (40%)
Automatically annotate term occurrences from glossary (30%)
Manually add sticky notes (30%)
Highlight relevant content (20%)
Annotations overview pane (20%)
Bookmarks; Highlight phraseology; Highlight named entities (10%
each)
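Two of the annotation wishes above, automatically annotating term occurrences from a glossary and printing translations next to the term, can be sketched in a few lines. This is an assumed approach, not a description of an existing tool. The glossary entries and the bracket notation are illustrative only.

```python
import re


def annotate(text, glossary):
    """Append the glossary translation in brackets after each term found."""
    if not glossary:
        return text
    # Try longer terms first so "rotor blade" wins over "rotor"
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): translation for term, translation in glossary.items()}

    def add_translation(match):
        found = match.group(0)
        return f"{found} [{lookup[found.lower()]}]"

    return pattern.sub(add_translation, text)


# Illustrative English-German glossary
glossary = {
    "wind turbine": "Windkraftanlage",
    "rotor blade": "Rotorblatt",
    "rotor": "Rotor",
}
print(annotate("The wind turbine has a damaged rotor blade.", glossary))
```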
51
THE IDEAL TOOL
EXTRACTION/TRANSLATION
Extract unknown terms (80%)
Multilingual extraction available (80%)
Statistical extraction/show frequencies (70%)
Filter results (manually, chronologically, thematically, by frequency, by
agenda item, etc.) (60%)
Extract from multiple files (60%)
Access external resources from within program (60%)
Ignore stop words / decrease noise (60%)
View parallel texts & manually extract equivalents (50%)
Automatically rank most relevant terms (40%)
No clean up necessary; access multiple termbases/dictionaries; search
glossaries for extracted terms; tablet and/or stylus interface (30%) ...
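Three of the most requested features above (extracting unknown terms, extracting from multiple files, and ignoring stop words) can be combined in a small sketch. It assumes the documents have already been read into strings and that "unknown" simply means "not yet in the interpreter's glossary". Both are simplifications, and all names are hypothetical.

```python
from collections import Counter
import re


def unknown_terms(documents, known_terms, stop_words):
    """Count words across all documents, skipping stop words and
    words already present in the interpreter's glossary."""
    known = {term.lower() for term in known_terms}
    ignored = {word.lower() for word in stop_words}
    counts = Counter()
    for text in documents:
        for token in re.findall(r"\w+", text.lower()):
            if token not in ignored and token not in known:
                counts[token] += 1
    # Most frequent candidates first
    return counts.most_common()


documents = [
    "The nacelle houses the gearbox.",
    "The gearbox drives the generator.",
]
print(unknown_terms(documents, known_terms={"generator"}, stop_words={"the"}))
```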
52
THE IDEAL TOOL
IMPORT
Limited preprocessing / automatic conversion regardless of
source file format (40%)
Batch upload (30%)
Import from parallel resources / in multiple languages (20%)
Built-in webcrawler (10%)
Import from your institutional calendar (10%)
Flawless import (no errors with line breaks, etc.) (10%)
Imports pre-existing glossaries (10%)
53
THE IDEAL TOOL
EXPORT
Multilingual export (60%)
One-click import into database (50%)
Export into widely used/compatible formats (30%)
Export annotated text (20%)
Print from tool (10%)
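Multilingual export into a widely used format could, for example, mean writing one CSV column per working language. The sketch below assumes that layout. The function name, the language codes, and the sample entries are hypothetical.

```python
import csv
import io


def export_glossary(entries, languages):
    """Serialize glossary entries to CSV text with one column per language.

    Missing translations are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=languages, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow({lang: entry.get(lang, "") for lang in languages})
    return buffer.getvalue()


entries = [
    {"en": "rotor blade", "de": "Rotorblatt"},
    {"en": "nacelle"},
]
print(export_glossary(entries, ["en", "de", "fr"]))
```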
54
THE IDEAL TOOL
FORMAT AND STORAGE
FORMAT
Cross-platform (50%)
Software suite / integration with terminology management tool
(50%)
Compatible with mobile devices (30%)
“Available on my operating system” (30%)
Compatible with translation tools/databases (20%)
Checks pre-existing glossaries to avoid duplicates (20%)
STORAGE
Local storage (40%)
Offline to maintain confidentiality (30%)
Cloud storage (30%)
55
THE IDEAL TOOL
INTERFACE
Link term to context (90%)
View parallel texts side by side with synchronous scrolling (70%)
Bilingual/multilingual term list (50%)
Reliability marker/index (50%)
Simple, uncluttered display (40%)
Search within source documents (30%)
Customize display (30%)
Speech recognition interface (20%)
Can manually annotate with stylus (20%)
Clear color code (20%)/color code for fuzzy matches (20%)
Search within/filter exported terms (20%)
Extensive information available (20%) ...
56
THE IDEAL TOOL
CUSTOMIZATION
Configure number of terms extracted (50%)
Configure working languages (40%)
Customize external resources (40%)
Custom results based on audience/domain/client (40%)
Configure term length (n-gram) (30%)
Customize display/user interface (30%)
Knows interpreter’s preferences (20%); Designed for interpreters (20%)
Tool knows interpreter’s background and adjusts accordingly (20%)
Configure frequency threshold (20%)
Learns from human postprocessing; preconfigure database / fields;
configure domain; tool knows where to find information in document (10%)
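Several of the customization wishes above map naturally onto configuration parameters: number of terms extracted, term length (n-gram), and frequency threshold. The sketch below shows one way these settings might interact. The class, the field names, the default values, and the stop word list are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass
class ExtractionConfig:
    max_terms: int = 10     # number of terms extracted
    max_ngram: int = 2      # longest term length, in words
    min_frequency: int = 2  # frequency threshold
    stop_words: frozenset = frozenset({"the", "of", "and", "a", "is", "in", "to"})


def extract_terms(text, config):
    """Return (term, count) pairs that satisfy the configured limits."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, config.max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that begin or end with a stop word
            if gram[0] in config.stop_words or gram[-1] in config.stop_words:
                continue
            counts[" ".join(gram)] += 1
    frequent = [(term, c) for term, c in counts.items() if c >= config.min_frequency]
    # Highest count first, then alphabetical for a stable order
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return frequent[:config.max_terms]


text = (
    "The rotor blade is worn. The rotor blade of the turbine is new. "
    "The turbine is old."
)
print(extract_terms(text, ExtractionConfig()))
print(extract_terms(text, ExtractionConfig(max_ngram=1, max_terms=2)))
```

Lowering `max_ngram` to 1 drops multi-word terms entirely, which shows why respondents wanted this setting under their own control.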
8.
CONCLUSIONS
AND FUTURE
RESEARCH
57
58
CONCLUSIONS (1)
Interpreters regularly use manual, semi-automatic and
automatic terminology extraction tools.
The terminology extraction process differs for every
interpreter, although it tends to include document
collection, extraction, glossary building, and possible
annotation.
Interpreters prefer different approaches (manual vs.
[semi-]automatic) in different settings, and avoid
terminology extraction when documents are not available or
digitized or when they need an in-depth understanding of
content and have time to read the entire text.
59
CONCLUSIONS (2)
Terminology extraction saves time and can lead to
reliable results and terminological precision.
Terminology extraction alone may be insufficient.
Most respondents felt terminology extraction was more
effective than other types of preparation.
Most respondents felt that terminology extraction tools did
not meet their needs.
60
CONCLUSIONS (3)
Interpreters use a wide variety of terminology extraction
software, but few terminology extraction tools are
designed for interpreters, and the perfect tool
doesn’t exist yet.
Minimally, the ideal tool should extract unknown terms,
context, and translations and offer multilingual
extraction, filtering of results, access to
terminological resources, multilingual export,
manual annotation, parallel scrolling,
bilingual/multilingual term lists and significant
customization.
61
FUTURE WORK
Phase 2: Survey to rank the features of ideal
tools and make recommendations to
designers
Phase 3: Use weighted rankings to assess
existing tools and make recommendations to
practitioners.
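The weighted assessment planned for Phase 3 is not specified in these slides. As a rough sketch of what a weighted ranking could look like, the example below scores a tool by the share of total feature weight it covers. The weights reuse percentages from the interview results purely as placeholders for the Phase 2 survey rankings, and the tools and their feature sets are invented for illustration.

```python
def weighted_score(tool_features, weights):
    """Share of the total feature weight that a tool covers (0.0 to 1.0)."""
    total = sum(weights.values())
    covered = sum(w for feature, w in weights.items() if feature in tool_features)
    return covered / total


# Placeholder weights: percentage of respondents requesting each feature
weights = {
    "link term to context": 90,
    "multilingual extraction": 80,
    "manual annotation": 70,
    "multilingual export": 60,
}

# Hypothetical tools, not real products
tools = {
    "Tool A": {"link term to context", "manual annotation"},
    "Tool B": set(weights),
}

for name, features in tools.items():
    print(name, round(weighted_score(features, weights), 2))
```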
THANK YOU!
jg@joshgoldsmith.com
@Goldsmith_Josh
http://xl8.link/TerminologyExtractionSlides
62

Terminology Extraction Tools for Interpreters

  • 1. TERMINOLOGY EXTRACTION TOOLS FOR INTERPRETERS JOSH GOLDSMITH 2ND COLOGNE CONFERENCE ON TRANSLATION, INTERPRETING, AND TECHNICAL DOCUMENTATION NOVEMBER 30, 2018 JG@JOSHGOLDSMITH.COM @GOLDSMITH_JOSH http://xl8.link/ TerminologyExtractionSlides
  • 3. DEFINITIONS TERM “lexical items belonging to specialized areas of usage” Sager (1990: 2) TERMINOLOGY EXTRACTION “Automatically isolating terminology from texts” Cabré, Estopà & Vivaldi (2001:53) 3
  • 4. WHY TERMINOLOGY MATTERS FOR INTERPRETERS To be accepted as “insiders” and perceived as “competent,” interpreters must: ● Have sufficient specialized knowledge of the domain ● Know and use domain-specific terminology ● Master phraseology of specialized language Fantinuoli (2012:41) 4
  • 5. WHY EXTRACT TERMINOLOGY? ● Limited preparation time: materials made available last minute (Pignataro 2012) ● Preparation is time-intensive; generally entails collecting parallel texts and extracting relevant terminology (Fantinuoli 2017) ● Collecting terminology is a regular part of preparing for an assignment (Bilgen 2009:91) ● Preparation front-loads cognitively challenging tasks and can decrease cognitive load while interpreting (Stoll 2009) ● Terminological preparation may improve performance and processing, leading to target language renditions featuring more specialized terminology (Diaz Galaz, 2015) 5
  • 6. TERMINOLOGY MANAGEMENT SYSTEMS ● Early studies survey professionals about terminology- related needs and practices to develop terminology management tools for interpreters Rütten (2003), Bilgen (2009) ● Researchers analyze tools to see if meet needs Costa, Corpas Pastor & Durán Muñoz (2014, 2017); Will (2015) ● These studies tend to be based on researchers’ subjective assessments of interpreters’ needs rather than on objective criteria Goldsmith (2017) 6
  • 7. COULD TERMINOLOGY EXTRACTION STREAMLINE PREPARATION? ● Tools could decrease preparation time and allow interpreters to focus on the most relevant terms during preparation Rütten (2003) ● Corpus-based preparation gave rise to better terminology- related performance in simultaneous interpretation Xu (2015) 7
  • 8. LACK OF TECHNOLOGY AND RESEARCH ● “No tool has been specifically developed to satisfy the needs of interpreters during the preparatory phase” Fantinuoli (2017:24) ● Research has considered key features of terminology extraction tools for translators, but not interpreters Costa, Zaretskaya, Corpas Pastor & Seghiri (2016) 8
  • 9. TYPES OF AUTOMATIC TERMINOLOGY EXTRACTION SYSTEMS 9 LINGUISTIC Use linguistic knowledge (morphology, etc.) to detect lexical units ▸ Noise tends to be high STATISTIC Use relative frequencies to identify high- frequency lexical units ▸ Hard to find low-frequency terms HYBRID Combine statistical and linguistic measures Cabré, Estopà & Vivaldi (2001:53); Fantinuoli (2012)
  • 10. ASSESSING TERMINOLOGY EXTRACTION SYSTEMS RECALL “Capacity of the detection system to extract all terms from a document” SILENCE “Terms contained in an analysed text that are not detected by the system” PRECISION “Capacity to discriminate between those units detected by the system which are terms and those which are not” NOISE “The rate between discarded candidates and accepted ones” Cabré, Estopà & Vivaldi (2001:53-56) 10
  • 11. AIMS OF AUTOMATIC TERMINOLOGY EXTRACTION ● Reduce noise (be accurate) ● Reduce silence (be complete) ● Allow for manual selection of terms and validation of candidate terms Heid (2001) ● “As usability is regarded as being fundamental for the acceptability of an interpreter-oriented tool, a terminology extraction system for interpreters must give priority to precision over recall.” Fantinuoli (2012: 49) 11
  • 13. 1. What tools are interpreters using for terminology extraction? 2. What are the strengths and weaknesses of these tools? 3. In which settings are terminology extraction tools useful? In which settings should they be avoided? 4. What does the terminology extraction process look like? 5. How does terminology extraction compare to other types of preparation? 6. In addition to the term itself, what should these tools extract? 7. What features would an ideal terminology extraction tool offer? RESEARCH QUESTIONS
  • 14. RESEARCH DESIGN ● EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO ▸ Map the field of terminology extraction tools for interpreters ▸ Develop an instrument to assess tools (Creswell & Clark 2006) ● SEMI-STRUCTURED IN-DEPTH INTERVIEWS ▸ Develop detailed descriptions, present multiple perspectives, describe process, understand a situation from the inside (Weiss, 1994) ▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756) ▸ Zoom™, Speechmatics™ ▸ Informed consent ▸ Anonymous ● INDUCTIVE THEMATIC ANALYSIS ▸ Transcribe interviews and inductively derive categories (Kvale, 1996) ▸ Coded with NVivo™ (CAQDAS program)
  • 15. PARTICIPANTS ▸ 10 respondents, all professional interpreters (2 women) ▸ Age 29 – 57 (μ = 42.2) ▸ Domiciled in Europe and North America ▸ 6 members of professional associations (60%) ▸ 2 staff interpreters (20%) ▸ Conference (100%), Media (10%), Court (10%) and Community (10%) interpreting ▸ Experience: 3 – 30 years (μ = 17.7) ▸ Experience using terminology extraction tools: 1 – 17 years (μ = 8.9) ▸ Translation, training, research, administration, voiceovers
  • 16. PARTICIPANTS’ EXPERIENCE ● PERCENTAGE OF ASSIGNMENTS USED: Manual 0 - 100% (μ = 48.0%); Semi-automatic 0 - 100% (μ = 18.9%); Automatic 0 - 100% (μ = 40.0%) ● NUMBER OF ASSIGNMENTS USED: Manual 0 - 840 (μ = 123.8); Semi-automatic 0 - 150 (μ = 17.2); Automatic 0 - 600 (μ = 135.6)
  • 17. THIS IS A PILOT STUDY. RESULTS CANNOT BE GENERALIZED, BUT DO AIM TO GIVE A GENERAL OVERVIEW OF TOOLS, EXPERIENCES AND EXPECTATIONS. PERCENTAGES ARE NOT STATISTICALLY SIGNIFICANT OR GENERALIZABLE.
  • 19. HARDWARE USED: Desktop (50%); Laptop (75%); Tablet (20%); Windows operating system (80%); MacOS (40%); iOS (20%) ▸ Some users utilize multiple devices
  • 20. TERMINOLOGY EXTRACTION SOFTWARE USED: InterpretBank (60%); Interpreters’ Help (40%); SketchEngine (20%; 30% used or tested); Intragloss (10%; 40% used or tested); Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T, dtSearch, Thermostat, as well as an in-house tool at an international organization (10% each) ▸ Users work with or have tested multiple types of terminology extraction software
  • 21. OTHER SOFTWARE USED: Terminology management tools (InterpretBank, Interpreters’ Help, Interplex, MS Access): 100%; Annotation tools (Readdle Documents, GoodReader, PDF Exchange Editor, Skim): 50%; Terminology database (e.g. IATE): 50%; Wikipedia: 40%; Linguee: 40%; Search Engines: 30%
  • 23. TYPES OF TECHNOLOGY-ASSISTED TERMINOLOGY EXTRACTION ● MANUAL: User selects terms manually. Tool provides support, e.g., to: ▸ add terms to glossary ▸ look up translation ▸ help manage terms ● SEMI-AUTOMATIC: User provides document(s). Tool suggests terms. User reviews and accepts them. ● AUTOMATIC: User provides document(s). Tool suggests term candidates. (Goldsmith 2018)
  • 36. OTHER PREPARATION STRATEGIES: Read documents (90%); Background reading (50%); Web research (50%); Memorize/drill terms (50%); Manual annotation (40%); Wikipedia (40%); Terminological research (30%); Gisting/text summarization (20%); Automatic translation; Concordancer; Build glossaries collaboratively; Read news; Read technical documents; Practice interpreting on similar topics (10%)
  • 38. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (1) ● FACILITATES PREPARATION: Saves time (100%); Provides terminology despite time pressure (90%); Quick extraction from lengthy documents (60%); Less hassle / menial copying and pasting (30%); Automatic annotation of term (and translation) (20%); Better preparation (10%) ● CONSISTENCY/RELIABILITY: Accurate/reliable results from automatic extraction (50%); Consistent preparation (20%)
  • 39. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (2) ● TERMINOLOGICAL PRECISION: Automatically extract most important / “right” terms (50%); Automatically look up translations on other sites (40%); Automatically extract named entities (10%); Add stop words (10%); Search function (10%) ● ERGONOMICS: Lightweight, portable, small footprint (30%)
  • 40. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (3) ● DISPLAY/INTERFACE: Parallel scrolling (50%); Easy comparison of bilingual/multilingual texts (30%); Manual highlighting/annotation/bookmarking (20%); Easy to use; easy input of terms; visually appealing; filter/edit results (10% each) ● EXPORT/STORAGE: Export candidates to database (40%); Back up/digitize glossaries (30%); Export in shareable format (20%); Reuse for later assignments (10%)
  • 41. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (1) ● PREPARATION: Incomplete preparation if only term extraction is used (40%); Time-intensive (manual, copy/paste) (20%); Slow with large glossaries (10%) ● IMPORT/EXPORT/STORAGE: Poor export/formatting of exported text (20%); Tool doesn’t recognize format (e.g. line breaks, images) (20%); Compatibility (Mac/PC, etc.) (10%); Poor import of documents/glossaries (10%); Export not provided (10%)
  • 42. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (2) ● EXTRACTION: Multilingual extraction not supported (50%); Too many terms extracted (50%); Results need cleaning up (30%); Too few/many words in term (20%); Noise (20%); Too few terms extracted (20%); Incomplete extraction (e.g. context missing) (10%); Tool reorders words (10%)
  • 43. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (3) ● DISPLAY: Poor/incomplete presentation of results (30%); Terminology entry lacks relevant fields (10%); Small screen size (tablet) (10%) ● CUSTOMIZATION: Tools not designed for interpreters (10%); Software doesn’t know user’s individual needs (10%) ● COST: Cost/subscription (20%)
  • 44. SETTINGS WHERE EXTRACTION TOOLS PREFERRED ▸ 80% used extraction when documents available ● MANUAL: Parallel texts (40%); New topic (30%); Few documents (30%); Time permitting (30%); Focus on collocations (10%); Only monolingual documents available (10%) ● AUTOMATIC: Numerous/long documents (40%); For institutions (40%); Time pressure (40%); For hearings (20%); For automatic annotation when glossaries available (20%); Familiar subject matter (10%); All assignments (10%); When onsite (10%)
  • 45. SETTINGS WHERE EXTRACTION TOOLS AVOIDED: Limited / no materials available (50%); Documents not available in digital format (30%); Need to understand content (30%); Text too general (20%); PowerPoint (20%); Faster to read than extract (20%); Recurring meeting/familiar with terminology (10%); Confidentiality (10%); Multilingual documents not available (10%); Vague subject matter (10%); Very large / small glossary available (10%)
  • 46. 70% of respondents felt terminology extraction was more effective than other types of preparation ● 62.5% of respondents preferred terminology extraction over other types of preparation ● BUT ONLY 40% of respondents felt terminology extraction tools meet their needs
  • 47. 90% of respondents felt clients were not aware they used terminology extraction tools. Those who were aware reacted positively (20%) and found it professional (10%) ● 80% of respondents felt colleagues were curious about terminology extraction tools, although some mentioned uninterested colleagues (40%) who were averse to new approaches (20%) or unwilling to change their habits (20%)
  • 49. THE IDEAL TOOL SHOULD EXTRACT: Term (100%); Single and multi-word terms (100%); Context/examples (90%); Equivalents in other languages (70%); Source / source document (50%); Definition (40%); Frequencies (40%); Subject matter overview (40%); Collocations / phraseology (30%); Named entities; figures; domain; link to source (20%); Graphical information; images; hyponyms; semantic groupings (10%)
  • 50. THE IDEAL TOOL ● ANNOTATION: Allow manual annotation (70%); Highlight terms (60%); Highlight phraseology (60%); Print translations above extracted term (40%); Automatically annotate term occurrences from glossary (30%); Manually add sticky notes (30%); Highlight relevant content (20%); Annotations overview pane (20%); Bookmarks; Highlight phraseology; Highlight named entities (10% each)
  • 51. THE IDEAL TOOL ● EXTRACTION/TRANSLATION: Extract unknown terms (80%); Multilingual extraction available (80%); Statistical extraction/show frequencies (70%); Filter results (manually, chronologically, thematically, by frequency, by agenda item, etc.) (60%); Extract from multiple files (60%); Access external resources from within program (60%); Ignore stop words / decrease noise (60%); View parallel texts & manually extract equivalents (50%); Automatically rank most relevant terms (40%); No clean up necessary; access multiple termbases/dictionaries; search glossaries for extracted terms; tablet and/or stylus interface (30%) ...
  • 52. THE IDEAL TOOL ● IMPORT: Limited preprocessing / automatic conversion regardless of source file format (40%); Batch upload (30%); Import from parallel resources / in multiple languages (20%); Built-in webcrawler (10%); Import from your institutional calendar (10%); Flawless import (no errors with line breaks, etc.) (10%); Imports pre-existing glossaries (10%)
  • 53. THE IDEAL TOOL ● EXPORT: Multilingual export (60%); One-click import into database (50%); Export into widely used/compatible formats (30%); Export annotated text (20%); Print from tool (10%)
  • 54. THE IDEAL TOOL: FORMAT AND STORAGE ● FORMAT: Cross-platform (50%); Software suite / integration with terminology management tool (50%); Compatible with mobile devices (30%); “Available on my operating system” (30%); Compatible with translation tools/databases (20%); Checks pre-existing glossaries to avoid duplicates (20%) ● STORAGE: Local storage (40%); Offline to maintain confidentiality (30%); Cloud storage (30%)
  • 55. THE IDEAL TOOL ● INTERFACE: Link term to context (90%); View parallel texts side by side with synchronous scrolling (70%); Bilingual/multilingual term list (50%); Reliability marker/index (50%); Simple, uncluttered display (40%); Search within source documents (30%); Customize display (30%); Speech recognition interface (20%); Can manually annotate with stylus (20%); Clear color code (20%)/color code for fuzzy matches (20%); Search within/filter exported terms (20%); Extensive information available (20%) ...
  • 56. THE IDEAL TOOL ● CUSTOMIZATION: Configure number of terms extracted (50%); Configure working languages (40%); Customize external resources (40%); Custom results based on audience/domain/client (40%); Configure term length (n-gram) (30%); Customize display/user interface (30%); Knows interpreter’s preferences (20%); Designed for interpreters (20%); Tool knows interpreter’s background and adjusts accordingly (20%); Configure frequency threshold (20%); Learns from human postprocessing; preconfigure database / fields; configure domain; tool knows where to find information in document (10%)
  • 58. CONCLUSIONS (1) ● Interpreters regularly use manual, semi-automatic and automatic terminology extraction tools. ● The terminology extraction process differs for every interpreter, although it tends to include document collection, extraction, glossary building, and possible annotation. ● Interpreters prefer different approaches (manual vs. [semi-]automatic) in different settings, and avoid terminology extraction when documents are not available or digitized or when they need an in-depth understanding of content and have time to read the entire text.
  • 59. CONCLUSIONS (2) ● Terminology extraction saves time and can lead to reliable results and terminological precision. ● Terminology extraction alone may be insufficient. ● Most respondents felt terminology extraction was more effective than other types of preparation. ● Most respondents felt that terminology extraction tools did not meet their needs.
  • 60. CONCLUSIONS (3) ● Interpreters use a wide variety of terminology extraction software, but few terminology extraction tools are designed for interpreters, and the perfect tool doesn’t exist yet. ● Minimally, the ideal tool should extract unknown terms, context, and translations and offer multilingual extraction, filtering of results, access to terminological resources, multilingual export, manual annotation, parallel scrolling, bilingual/multilingual term lists and significant customization.
  • 61. FUTURE WORK ● Phase 2: Survey to rank the features of ideal tools and make recommendations to designers ● Phase 3: Use weighted rankings to assess existing tools and make recommendations to practitioners
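One way the weighted assessment planned for Phase 3 might work is sketched below. Everything in it is an assumption made for illustration: the weights simply reuse four pilot-study percentages as placeholders (the actual weights are meant to come from the Phase 2 survey), and "Tool A" and "Tool B" are hypothetical tools with invented feature sets.

```python
# Placeholder weights borrowed from the pilot percentages (assumption only).
FEATURE_WEIGHTS = {
    "link term to context": 0.9,
    "multilingual extraction": 0.8,
    "manual annotation": 0.7,
    "multilingual export": 0.6,
}

# Hypothetical tools and the features each one supports.
TOOLS = {
    "Tool A": {"link term to context", "manual annotation", "multilingual export"},
    "Tool B": {"multilingual extraction", "multilingual export"},
}


def weighted_score(supported, weights):
    """Share of the total feature weight that a tool covers (0.0 to 1.0)."""
    total = sum(weights.values())
    covered = sum(w for feature, w in weights.items() if feature in supported)
    return covered / total if total else 0.0


scores = {name: weighted_score(features, FEATURE_WEIGHTS) for name, features in TOOLS.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Normalizing by the total weight keeps scores comparable if features are added to or removed from the instrument.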