General Guidelines for
Thesaurus
Construction
Where To Start
Before You Begin
Be Sure to Know
How will we use the thesaurus?
What is the size and the scope of the project?
Who will be viewing and accessing the vocabulary?
When will we update the thesaurus?
Approaches For Taxonomy
Design
Top-down
• Identifying top categories first or utilizing a pre-established category list
for your top areas.
• Organizing each top domain with the most relevant and broad coverage of
documents
• Dividing each category with subcategories to narrow granular topical
areas
• Establishing attributes and additional subcategories to each thesaurus
node created
Approaches For Taxonomy
Design
Bottom-up
• Beginning from an unsorted list of vocabulary terms and concepts
compiled from multiple resources
• Moving terms in the list to classify their Broader / Narrower relationships
• Declaring top domains after exploring the number of topics covered by a
single term
• Guaranteeing every term to be evaluated at least once
• Sorting extremely large unsorted data sets efficiently
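The bottom-up steps above can be sketched in a few lines of Python. This is a minimal illustration only: the sample terms, the `broader` mapping, and the helper names `narrower_index` and `top_terms` are hypothetical and not taken from any particular thesaurus tool. Each term records its Broader Term, and the terms left without one become candidate top domains.

```python
from collections import defaultdict

# Hypothetical sample data: term -> Broader Term (None = nothing broader yet).
broader = {
    "Optics": None,
    "Lasers": "Optics",
    "Fiber lasers": "Lasers",
    "Acoustics": None,
    "Ultrasonics": "Acoustics",
}

def narrower_index(broader):
    """Invert the Broader Term links into a Narrower Term lookup."""
    narrower = defaultdict(list)
    for term, parent in broader.items():
        if parent is not None:
            narrower[parent].append(term)
    return narrower

def top_terms(broader):
    """Terms with no Broader Term are candidate top domains."""
    return sorted(term for term, parent in broader.items() if parent is None)

narrower = narrower_index(broader)
print(top_terms(broader))   # candidate top domains
print(narrower["Optics"])   # narrower terms sorted under "Optics"
```

Because every term appears once as a key, walking the mapping also guarantees each term is looked at at least once.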
Top-down
• Easier for smaller vocabulary sets
• Quick method of identifying key top areas
• Designed for a navigational mindset
Bottom-up
• More accurate representation of the content
• Ideal for larger scale thesauri
• Content drives the entire structure of the thesaurus
We recommend a mix of both, but every vocabulary demands
different courses of action
Resource Gathering
Resources for Designing a
Thesaurus
Existing Controlled Vocabularies
• Additional taxonomies
• Classification schemes
• Topics and headings
• Sitemaps
• Glossaries and Definitions
Listing of Keywords
• Entered by an author or indexer
• May range in size from 100 to 100,000 terms
Resources for Designing a
Thesaurus
• Search Logs
• An unruly mess of words
• What to look out for…
• Which topics are more frequently searched for by users?
• Has common terminology for concepts and technologies changed within the past x
years?
• Trim search logs to the most frequent and concise topics
• Data Mining
• N-gram tests
• “Content-aware” vocabulary
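As a rough sketch of what an N-gram test over a search log might look like, the Python below counts word bigrams across queries. The sample queries and the function names `ngrams` and `ngram_counts` are assumptions made for illustration. Tokenizing on whitespace is a simplification; a real log would need more cleanup first.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(queries, n=2):
    """Count n-grams across all queries, ignoring case."""
    counts = Counter()
    for query in queries:
        counts.update(ngrams(query.lower().split(), n))
    return counts

# Hypothetical search log entries.
search_log = [
    "machine learning models",
    "Machine learning for chemistry",
    "quantum computing",
    "machine learning",
]

counts = ngram_counts(search_log, n=2)
print(counts.most_common(3))
```

Phrases that recur across many queries are candidates for thesaurus terms; one-off phrases can usually be trimmed.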
Defining the Thesaurus Specialization
What Goes into the Thesaurus?
Selecting Thesaurus Terms
• Looking for descriptors, terms in the thesaurus which must
adequately reflect the content
• Terms which describe fields of study, technology, applications,
devices, research, and other content
• Thesaurus terms must be concise, must express a single concept,
and must be free of ambiguity.
• Concepts such as General and Applications will not describe what is
written within a single document.
Literary warrant
• Justification for the representation of a concept in an indexing language or
for the selection of a preferred term because of its frequent occurrence in
the literature
Organizational warrant
• Justification for the representation of a concept in an indexing language or
for the selection of a preferred term due to characteristics and context of
the organization
User Warrant
• Justification for the representation of a concept in an indexing language or
for the selection of a preferred term because of frequent requests for
information on the concept or free-text searches on the term by users of
an information storage and retrieval system.
Creating the Initial
Build
Compiling the Terms
Existing vocabularies
• Be aware of overlap and multiple terminologies
• Standardize the terms (plural, hyphenation, etc.)
• Break up pre-coordination if it exists
Whether to include the vocabulary's current hierarchy (if it
contains one) is purely the decision of the thesaurus developer
• Will save time and effort to retain existing hierarchy while providing an
early look at the structure of the vocabulary
• However, conflicting and overlapping terms may cause problems when
reviewing the initial build
Filtering the Unsorted Lists
Standardize the “Word Salad”
• Combining singular and plural forms of terms
• Combining hyphenated terms
• Removing named entities
• Identifying and/or removing acronyms
Add only the most frequently searched terms and added keywords
• Can limit to the top 50 or 100 most frequent
• Too many results can litter a vocabulary with rubbish terms
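A minimal Python sketch of this filtering step follows, assuming the unsorted list is a plain list of strings. The helpers `normalize`, `naive_plural_key`, and `top_candidates` are hypothetical names, and the plural handling is deliberately crude (it strips a single trailing "s"), so its output should be reviewed by hand rather than trusted.

```python
from collections import Counter

def normalize(term):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())

def naive_plural_key(term):
    """Rough singular/plural merge: drop one trailing 's' (but not 'ss').
    Illustration only; real vocabularies need a stemmer or manual review."""
    if term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term

def top_candidates(raw_terms, limit=50):
    """Return the most frequent standardized terms, capped at `limit`."""
    counts = Counter(naive_plural_key(normalize(t)) for t in raw_terms)
    return counts.most_common(limit)

# Hypothetical "word salad" from keywords and search logs.
sample = ["Solar cells", "solar-cell", "Solar Cell", "glass", "X-rays", "x ray"]
print(top_candidates(sample, limit=2))
```

The `limit` argument corresponds to the "top 50 or 100" cutoff above; named entities and acronyms would still need a separate pass.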
Next Step – Import!
Creation of the Initial Build
• Establish primary categories for the thesaurus
• Sort uncontrolled terms into appropriate categories
• Most time-consuming process
• Content will be re-evaluated, so don’t stress too much about getting it right
the first time
• Create synonyms and related terms as you sort each term
• Double-check for conceptual duplicates within the project
• Ensure standardized spelling (American vs. British English)
• Check for typos
• Review Literary, Organizational, and User Warrant for each term
• Delete terms with little to no indexing value
Initial Build - Equivalence and
Associations
Six-Second Rule
• As a rule-of-thumb, give yourself six seconds to brainstorm multiple ways
to express a single concept.
Creating synonyms not only allows for a stronger thesaurus, but
will potentially identify duplicate concepts within the early
vocabulary.
Adding and searching for related terms will identify other subject
areas included in the unsorted taxonomy
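One way to see how synonyms expose duplicate concepts is to check whether any label is claimed by more than one preferred term. The sketch below assumes a simple mapping of preferred term to synonyms; the sample entries and the function name `possible_duplicates` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical early build: preferred term -> synonyms.
thesaurus = {
    "Microelectromechanical systems": ["MEMS", "Micro electro mechanical systems"],
    "Quantum bits": ["Qubits"],
    "Qubits": ["Quantum bit"],  # conceptual duplicate of "Quantum bits"
}

def possible_duplicates(thesaurus):
    """Return pairs of preferred terms that share a label (case-insensitive)."""
    owners = defaultdict(set)
    for preferred, synonyms in thesaurus.items():
        for label in [preferred, *synonyms]:
            owners[label.lower()].add(preferred)
    shared = {tuple(sorted(terms)) for terms in owners.values() if len(terms) > 1}
    return sorted(shared)

print(possible_duplicates(thesaurus))
```

Each reported pair is only a candidate; an editor still decides which expression to prefer and which to demote to a synonym.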
Evaluation of the
Thesaurus Build
Evaluation
• Review Literary, Organizational, and User Warrant
• Division of top terms
• Assign team members top levels to review
• Fill in gaps in the classification
• Ensure no flat list of topics (more than 15 terms in a category) exists within a
single section
• Merge conceptual duplications within the content
• Preferring one expression over the others
• Delete terms with little to zero indexing value
• Add synonyms not listed for each term
• Add related terms which do not appear
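The flat-list check in the evaluation list lends itself to a quick automated pass. The Python below is a sketch under the assumption that the hierarchy is available as a mapping from each term to its direct narrower terms. The threshold of 15 comes from the guideline above; the data and the name `flat_categories` are hypothetical.

```python
def flat_categories(narrower, max_children=15):
    """Return {term: child count} for terms with more direct children than allowed."""
    return {
        term: len(children)
        for term, children in narrower.items()
        if len(children) > max_children
    }

# Hypothetical hierarchy: term -> direct narrower terms.
narrower = {
    "Materials": [f"Material {i}" for i in range(20)],
    "Optics": ["Lasers", "Lenses"],
}

print(flat_categories(narrower))
```

Flagged categories are candidates for an extra layer of subcategories during review.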
Evaluation - Term style and Form
Must represent a single train of thought
• Removes ambiguity and uncertainty of concepts
• Pre-coordination of terms should be avoided (“Acoustics in music”,
“Cancer and metastasis”)
Reduce slang and jargon for preferred terms unless no other word
describes the concept or if the older terminology is infrequently used
• (Microelectromechanical Systems and MEMS)
• (Quantum bits and Qubits)
Evaluation - Term style and Form
Use nouns, or noun phrases / Avoid action verbs for concepts
• Catalysis rather than catalyze
• Distillation rather than distill
• Reading rather than read
Adjectives and Adverbs
• May be used to differentiate concepts
• Should not be used as individual terms
Evaluation
Proper nouns (including names, places, etc.) should have proper
capitalization
• Arabian Peninsula
• Milky Way Galaxy
• Louvre
• Albert Einstein
Compound terms
• Used for disambiguation and for specificity
• Granular descriptors
• “Lead coating on copper pipes”
Evaluation - Term style and Form
Loanwords are fine if they are covered well within the content
(habeas corpus)
Abbreviations and acronyms should be spelled out, unless the
proper name is rarely used (DNA)
Do not include parentheses unless disambiguating the term
• Mercury (element) = Okay
• Computed tomography (CT) = Frowned upon
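The parenthesis rule can be approximated with a small heuristic check. In the Python sketch below, a trailing parenthetical in all capitals is treated as an acronym (frowned upon) and anything else as a disambiguating qualifier. This is an assumption made for illustration: the pattern `PAREN` and the function `check_parenthetical` are hypothetical, and the heuristic will misjudge some terms.

```python
import re

# Matches a label followed by one trailing parenthetical, e.g. "Mercury (element)".
PAREN = re.compile(r"^(?P<label>.+?)\s*\((?P<inner>[^()]+)\)$")

def check_parenthetical(term):
    """Classify a term as 'none', 'qualifier', or 'acronym' (heuristic)."""
    match = PAREN.match(term)
    if match is None:
        return "none"
    # Heuristic assumption: an all-caps parenthetical is an acronym.
    return "acronym" if match.group("inner").isupper() else "qualifier"

for term in ["Mercury (element)", "Computed tomography (CT)", "DNA"]:
    print(term, "->", check_parenthetical(term))
```

Terms flagged as "acronym" are better handled by moving the acronym into a synonym field.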
Indexing
Post-coordination
• Two or more thesaurus terms are applied to an article to represent a
concept.
• Used at the time of search and retrieval
• Example: Liver AND Anatomy
• Example: New York AND Subway
Pre-coordination
• Terms are combined before indexing
• Uses one node to describe content
• Example: Furniture-California-San Francisco-History-20th Century
• Example: Liver-Blood Vessels-Diseases-Congresses
Post-coordinated terms work more effectively for MAIstro
(Thesaurus Master and M.A.I.)
• Allows M.A.I. to easily identify subject terms within a range of documents
without elaborate rules
• Easier to maintain simpler vocabulary terms
Pre-coordination allows an unlimited number of terms to be added
to the Thesaurus
• Expressing multiple concepts within a single thesaurus term will set a
precedent for enabling all terms in this manner
• If you have the term Computers in chemistry, what will stop you from
creating Computers in biology, Computers in dentistry, Computers in
echolocation, etc.
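To illustrate why simple terms combined at search time cover the same ground, the Python below sketches a post-coordinated lookup as a set intersection. The document IDs, the `index` mapping, and the `search_all` helper are hypothetical and are not how MAIstro or M.A.I. work internally.

```python
# Hypothetical index: document -> set of simple thesaurus terms applied to it.
index = {
    "doc1": {"Liver", "Anatomy"},
    "doc2": {"Liver", "Diseases"},
    "doc3": {"New York", "Subway"},
}

def search_all(index, *terms):
    """Return documents tagged with every requested term (a Boolean AND)."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)

print(search_all(index, "Liver", "Anatomy"))  # Liver AND Anatomy
print(search_all(index, "Liver"))
```

With this approach, "Computers" and "Chemistry" stay as two maintainable terms, and the combination is expressed only when someone searches for it.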
Evaluation - Term style and form
Keep terms plural unless
• changing the term to a plural form alters the meaning of the term (e.g.
Technology; Technologies)
• If this is the case, disambiguate the concepts with parenthetical qualifiers
Technology (applied sciences) and Technologies (devices)
• Literary warrant or User warrant dictates the term to be singular
Control the vocabulary through use of synonyms
• Terms must represent unique concepts
Keep a single train of thought
Revision and Reiteration
Thesaurus development is highly cyclical
• For multiple personnel, reviewing alternate sections and others' work is
highly recommended
• Alternating a pair of eyes will catch plenty of errors and inconsistencies
within the thesaurus terms
Subject Matter Expert feedback is always recommended
• Must be clear what SMEs are reviewing and why they are reviewing it
• Many experts are highly opinionated and unaware of the
scope/implementation of the project
• Feedback must be re-evaluated (sometimes taken with a grain of salt)
Standards and Compliance
• American National Standards Institute / National Information
Standards Organization
• ANSI/NISO Z39.19
• British Standards Institution
• BS 8723 parts 1-4
• International Organization for Standardization
• ISO 25964
Continue on with the Live Demo

DHUG 2017 - Thesaurus Construction Training

  • 3. Before You Begin Be Sure to Know How will we use the thesaurus? What is the size and the scope of the project? Who will be viewing and accessing the vocabulary? When will we update the thesaurus?
  • 4. Approaches For Taxonomy Design Top-down • Identifying top categories first or utilizing a pre-established category list for your top areas. • Organizing each top domain with the most relevant and broad coverage of documents • Dividing each category with subcategories to narrow granular topical areas • Establishing attributes and additional subcategories to each thesaurus node created
  • 5. Approaches For Taxonomy Design Bottom-up • Beginning from an unsorted list of vocabulary terms and concepts compiled from multiple resources • Moving terms in the list to classify their Broader / Narrower relationships • Declaring top domains after exploring the amount of topics covered in a single term • Guaranteeing every term to be evaluated at least once • Sorting extremely large unsorted data sets efficiently
  • 6. Top-down • Easier for smaller vocabulary sets • Quick method of identifying key top areas • Designed for a navigational mindset Bottom-up • More accurate representation of the content • Ideal for larger scale thesauri • Content drives the entire structure of the thesaurus We recommend a mix of both, but every vocabulary demands different courses of action
  • 8. Resources for Designing a Thesaurus Existing Controlled Vocabularies • Additional taxonomies • Classification schemes • Topics and headings • Sitemaps • Glossaries and Definitions Listing of Keywords • Entered by an author or indexer • May range in size from 100 to 100,000 terms
  • 9. Resources for Designing a Thesaurus • Search Logs • An unruly mess of words • What to look out for… • Which topics are more frequently searched for by users? • Has common terminology for concepts and technologies changed within the past x years? • Trim search logs to the most frequent and concise topics • Data Mining • N-gram tests • “Content-aware” vocabulary
  • 10. Defining the Thesaurus Specialization What Goes into the Thesaurus?
  • 11. Selecting Thesaurus Terms • Looking for descriptors, terms in the thesaurus which must adequately reflect the content • Terms which describe fields of study, technology, applications, devices, research, and other content • Thesaurus terms must be concise, must express a single concept, and must be free of ambiguity. • Concepts such as General and Applications will not describe what is written within a single document.
  • 12. Literary warrant • Justification for the representation of a concept in an indexing language or for the selection of a preferred term because of its frequent occurrence in the literature Organizational warrant • Justification for the representation of a concept in an indexing language or for the selection of a preferred term due to characteristics and context of the organization User Warrant • Justification for the representation of a concept in an indexing language or for the selection of a preferred term because of frequent requests for information on the concept or free-text searches on the term by users of an information storage and retrieval system.
  • 14. Compiling the Terms Existing vocabularies • Be aware of overlap and multiple terminologies • Standardize the terms (plural, hyphenation, etc.) • Break up pre-coordination if it exists Whether to include the vocabulary's current hierarchy (if it contains one) is purely the decision of the thesaurus developer • Retaining the existing hierarchy will save time and effort while providing an early look at the structure of the vocabulary • However, conflicting and overlapping terms may cause problems when reviewing the initial build
  • 15. Filtering the Unsorted Lists Standardize the “Word Salad” • Combining singular and plural forms of terms • Combining hyphenated terms • Removing named entities • Identifying and/or removing acronyms Add only the most frequently searched terms and added keywords • Can limit to the top 50 or 100 most frequent • Too many results can litter a vocabulary with rubbish terms
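The filtering steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the function names `normalize` and `top_terms` and the sample `search_log` are hypothetical, and the plural handling is a deliberately naive assumption (it only strips a trailing "s") that a real project would replace with a proper stemmer or a manual review.

```python
from collections import Counter


def normalize(term: str) -> str:
    """Return a matching key: lowercase, hyphens as spaces, naive singular."""
    words = term.lower().replace("-", " ").split()
    if words:
        last = words[-1]
        # Naive assumption: a trailing "s" (but not "ss") marks a plural.
        if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
            words[-1] = last[:-1]
    return " ".join(words)


def top_terms(raw_terms, limit=50):
    """Count normalized terms and keep only the `limit` most frequent."""
    counts = Counter(normalize(t) for t in raw_terms if t.strip())
    return counts.most_common(limit)


# Hypothetical search-log entries, for illustration only.
search_log = [
    "Neural networks", "neural network", "Neural-networks",
    "qubits", "Qubit", "glass",
]
print(top_terms(search_log, limit=2))
```

The normalized key is used only for counting and merging variants; the preferred display form of each term is still chosen by the thesaurus developer.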
  • 16. Next Step – Import!
  • 17. Creation of the Initial Build • Establish primary categories for the thesaurus • Sort uncontrolled terms into appropriate categories • Most time-consuming process • Content will be re-evaluated, don’t stress too much on getting it right the first time • Create synonyms and related terms as you sort each term • Double-check for conceptual duplicates within the project • Ensure standardized spelling (American vs. British English) • Check for typos • Review Literary, Organizational, and User Warrant for each term • Delete terms with little to no indexing value
  • 18. Initial Build - Equivalence and Associations Six-Second Rule • As a rule-of-thumb, give yourself six seconds to brainstorm multiple ways to express a single concept. Creating synonyms not only allows for a stronger thesaurus, but will potentially identify duplicate concepts within the early vocabulary. Adding and searching for related terms will identify other subject areas included in the unsorted taxonomy
  • 20. Evaluation • Review Literary, Organizational, and User Warrant • Division of top terms • Assign team members top levels to review • Fill in gaps in classification • Ensure no flat list of topics (more than 15 terms in a category) exists within a single section • Merge conceptual duplications within the content • Preferring one expression over the others • Delete terms with little to zero indexing value • Add synonyms not listed for each term • Add related terms which do not appear
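The flat-list check above is easy to automate. The sketch below assumes the hierarchy is available as a plain mapping from each broader term to its list of narrower terms; that data shape, the function name `flat_categories`, and the sample terms are assumptions for illustration, not an export format of any particular thesaurus tool.

```python
def flat_categories(narrower, max_children=15):
    """Return {broader term: child count} for categories over the limit."""
    return {
        term: len(children)
        for term, children in narrower.items()
        if len(children) > max_children
    }


# Hypothetical hierarchy: one overly flat category and one acceptable one.
hierarchy = {
    "Materials": [f"Material {i}" for i in range(1, 21)],  # 20 narrower terms
    "Optics": ["Lasers", "Lenses", "Optical fibers"],
}
print(flat_categories(hierarchy))
```

Categories reported by the check are candidates for an intermediate level of subcategories.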
  • 21. Evaluation - Term style and Form Must represent a single train of thought • Removes ambiguity and uncertainty of concepts • Pre-coordination of terms should be disregarded (“Acoustics in music”, “Cancer and metastasis”) Reduce slang and jargon for preferred terms unless no other word describes the concept or if the older terminology is infrequently used • (Microelectromechanical Systems and MEMS) • (Quantum bits and Qubits)
  • 22. Evaluation - Term style and Form Use nouns, or noun phrases / Avoid action verbs for concepts • Catalysis rather than catalyze • Distillation rather than distill • Reading rather than read Adjectives and Adverbs • May be used to differentiate different concepts • Should not be used as individual terms
  • 23. Evaluation Proper nouns (including names, places, etc.) should have proper capitalization • Arabian Peninsula • Milky Way Galaxy • Louvre • Albert Einstein Compound terms • Used for disambiguation and for specificity • Granular descriptors, e.g. “Lead coating on copper pipes”
  • 24. Evaluation - Term style and Form Loanwords are fine if they are covered well within the content (habeas corpus) Abbreviations and acronyms should be spelled out, unless the proper name is rarely used (DNA) Do not include parentheses unless disambiguating the term • Mercury (element) = Okay • Computed tomography (CT) = Frowned upon
  • 25. Indexing Post-coordination • Two or more thesaurus terms are applied to an article to represent a concept. • Used at the time of search and retrieval • Examples: Liver AND Anatomy; New York AND Subway Pre-coordination • Terms are combined before indexing • Uses one node to describe content • Examples: Furniture-California-San Francisco-History-20th Century; Liver-Blood Vessels-Diseases-Congresses
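Post-coordination can be illustrated with a small inverted index: each article is indexed with simple, single-concept terms, and the combination happens only at search time. The sketch below is a generic illustration under assumed data (the `index` contents, document IDs, and the function name `search_all` are hypothetical); it does not describe how any specific indexing product works.

```python
def search_all(index, *terms):
    """Return the IDs of documents indexed with every one of the given terms."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical index: thesaurus term -> IDs of articles tagged with it.
index = {
    "Liver": {1, 2, 5},
    "Anatomy": {2, 3, 5},
    "New York": {4, 6},
    "Subway": {6, 7},
}

print(search_all(index, "Liver", "Anatomy"))    # articles about liver anatomy
print(search_all(index, "New York", "Subway"))  # articles about the New York subway
```

A pre-coordinated vocabulary would instead need one dedicated term per combination, such as "Liver anatomy", which is the growth problem the following slide describes.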
  • 26. Post-coordinated terms work more effectively for MAIstro (Thesaurus Master and M.A.I.) • Allows M.A.I. to easily identify subject terms within a range of documents without elaborate rules • Easier to maintain simpler vocabulary terms Pre-coordination allows an unlimited number of terms to be added to the Thesaurus • Expressing multiple concepts within a single thesaurus term will set a precedent for enabling all terms in this manner • If you have the term Computers in chemistry, what will stop you from creating Computers in biology, Computers in dentistry, Computers in echolocation, etc.?
  • 27. Evaluation - Term style and form Keep terms plural unless • changing the term to a plural form alters the meaning of the term (e.g. Technology; Technologies) • If this is the case, disambiguate the concepts with parenthetical qualifiers Technology (applied sciences) and Technologies (devices) • Literary warrant or User warrant dictates the term to be singular Control the vocabulary through use of synonyms • Terms must represent unique concepts Keep single Train-of-Thought
  • 28. Revision and Reiteration Thesaurus development is highly cyclical • For multiple personnel, reviewing alternate sections and others' work is highly recommended • A fresh pair of eyes will catch plenty of errors and inconsistencies within the thesaurus terms Subject Matter Expert feedback is always recommended • Must be clear what SMEs are reviewing and why they are reviewing it • Many experts are highly opinionated and unaware of the scope/implementation of the project • Feedback must be re-evaluated (sometimes taken with a grain of salt)
  • 29. Standards and Compliance • American National Standards Institute / National Information Standards Organization • ANSI/NISO Z39.19 • British Standards Institution • BS 8723 parts 1-4 • International Organization for Standardization • ISO 25964
  • 30. Continue on with the Live Demo