Usage and impact of controlled vocabularies in a
subject repository for indexing and retrieval

Dr. Timo Borst

LIBER 2011
Barcelona
29 June - 2 July 2011




ZBW is a member of the Leibniz Association
Overview

1. Terminology webservices as a means for supporting retrieval in the
   realm of library applications

2. Logfile analysis as an approach for analysing users' search
   behaviour

3. Results

4. Conclusions and suggestions for improving search interfaces




Terminology webservices
General idea: “Provide a framework for integrating authority data, which
is both normative and flexible enough to tolerate local idiosyncrasies on
a string level.”
Approach: Concept modelling based on Semantic Web /
SKOS standards (for concepts, persons, institutions,…)
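
The idea of a normative concept that still tolerates string-level variation can be sketched roughly as follows. This is a minimal illustration in plain Python, not the ZBW web service: the `Concept` and `Vocabulary` classes, the `normalize` helper, the example URI and the labels are all hypothetical. It only mirrors the SKOS pattern of one preferred label plus several alternative labels per concept.

```python
from dataclasses import dataclass, field


def normalize(label):
    """Lower-case and collapse whitespace so string-level variants compare equal."""
    return " ".join(label.lower().split())


@dataclass
class Concept:
    """SKOS-style concept: one URI, one preferred label, many alternative labels."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)


class Vocabulary:
    """Maps every normalized label (preferred or alternative) to its concept."""

    def __init__(self, concepts):
        self._by_label = {}
        for concept in concepts:
            for label in [concept.pref_label, *concept.alt_labels]:
                self._by_label[normalize(label)] = concept

    def lookup(self, text):
        """Return the concept for a label variant, or None if nothing matches."""
        return self._by_label.get(normalize(text))


# Hypothetical example data, not taken from the STW.
vocab = Vocabulary([
    Concept("http://example.org/concept/1", "Monetary policy",
            ["Central bank policy", "Money policy"]),
])

hit = vocab.lookup("  central   BANK policy ")
print(hit.pref_label if hit else None)
```

The local variant resolves to the single normative concept, while an unknown string simply yields no match.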




Terminology webservices
Architecture




Terminology webservices
Terminology?

• "STW Thesaurus for Economics",
  http://zbw.eu/stw/versions/latest/about.en.html
• More than 6,000 standardized subject headings and 18,000 entry
  terms
• Contains concepts from economics and business research, but
  also from law, sociology and politics
• Part of the Semantic Web and the LOD cloud
• Integrated into our own retrieval applications; downloaded by
  many institutions


Terminology webservices
How does it work?




Logfile analysis
• Many approaches to the analysis of user behaviour (logfile analysis,
  real-time tracking, usability studies, questionnaires, …)
• To us, logfiles serve as a basis for analysing string patterns in
  queries, hence search behaviour on a linguistic level
• Basic idea: each user request is logged in a standardized way, e.g.
  by a web server
• Query strings are automatically processed and analysed, e.g. through
  scripts (Perl), regular expressions or Unix shell commands (grep,
  sed, awk, …)
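
As a rough illustration of this kind of processing, the sketch below extracts search strings from web server log lines and counts them. It is written in Python rather than Perl or shell, and it rests on assumptions not stated in the slides: the log lines follow the common Apache-style format, and the search string travels in a URL parameter named `query` (a hypothetical name). The sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Request part of a Common/Combined Log Format line: "GET /path?args HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<url>\S+) HTTP/[\d.]+"')


def extract_queries(lines, param="query"):
    """Yield the lower-cased search string of each line carrying the parameter."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # parse_qs decodes both '+' and '%20' into spaces.
        values = parse_qs(urlsplit(match.group("url")).query).get(param)
        if values:
            yield values[0].strip().lower()


# Invented sample lines; the parameter name "query" is an assumption.
sample_log = [
    '192.0.2.1 - - [01/Jun/2011:10:00:00 +0200] "GET /search?query=monetary+policy HTTP/1.1" 200 512',
    '192.0.2.2 - - [01/Jun/2011:10:00:05 +0200] "GET /search?query=Monetary%20Policy HTTP/1.1" 200 498',
    '192.0.2.3 - - [01/Jun/2011:10:00:09 +0200] "GET /about.html HTTP/1.1" 200 1024',
]

counts = Counter(extract_queries(sample_log))
print(counts.most_common())
```

Both encodings of the same query collapse into one entry, and the request without a search parameter is ignored.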

Logfile analysis


Pros
+ Automatic generation and persistence of logfiles
+ Can be processed at any time by different tools
+ Filtering of robots, crawlers etc. possible

Cons
- Access through proxies and browser caches -> no user
  identification and counting possible
- Sometimes restricted by data privacy rules (e.g., no IP
  tracking allowed)
- No real-time processing

Logfile analysis
To be investigated:

1. What is the current rate of search queries with controlled
   vocabulary?

2. What is the potential mapping of uncontrolled search terms to
   controlled vocabulary?

3. How does the use of controlled vocabulary affect document views?




Results
What is the current rate of search queries with controlled vocabulary
(JEL, STW terms by autosuggest and search term expansion
with/without scrolling)?

[Pie chart "rate of controlled queries". Legend: STW terms, JEL terms,
STW expansion terms, STW expansion terms (scrolled), non-controlled.
Slice values: 12%, 14%, 6%, 1%, 67%.]




Results
What is the potential mapping of uncontrolled search terms to
controlled vocabulary (internal search)?*

[Pie chart "potential for controlled queries / internal search".
Legend: matching terms, non-matching terms, other.
Slice values: 18%, 15%, 67%.]

*Approach: running search terms against a Lucene/Solr index of STW
terms with stemming.
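
The matching approach named in the footnote can be sketched in a few lines. The version below is only a stand-in written in plain Python: it does not use Lucene or Solr, the `naive_stem` function is a deliberately crude, hypothetical suffix stripper rather than a real stemmer, and the vocabulary labels and queries are invented.

```python
def naive_stem(word):
    """Very crude plural stripping; a stand-in for a real stemmer."""
    for suffix in ("ies", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def stem_phrase(text):
    """Stem every word of a lower-cased phrase."""
    return " ".join(naive_stem(word) for word in text.lower().split())


def build_index(labels):
    """Map the stemmed form of each vocabulary label to the label itself."""
    return {stem_phrase(label): label for label in labels}


def match_queries(queries, index):
    """Split queries into those that hit a vocabulary label and those that do not."""
    matched, unmatched = {}, []
    for query in queries:
        label = index.get(stem_phrase(query))
        if label is not None:
            matched[query] = label
        else:
            unmatched.append(query)
    return matched, unmatched


# Invented labels and queries, for illustration only.
index = build_index(["Monetary policy", "Tax", "Labour market"])
queries = ["monetary policies", "taxes", "john smith"]
matched, unmatched = match_queries(queries, index)
print(matched)
print(unmatched)
print(len(matched) / len(queries))
```

The share of matched queries printed at the end corresponds to the kind of figure reported as "matching terms" in the chart; a personal name falls through as non-matching.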




Results
How does the use of controlled vocabulary affect document views
(Google search)?*

[Pie chart "potential for controlled queries / Google search".
Legend: matching terms, non-matching terms, other.
Slice values: 18%, 13%, 69%.]

*Approach: running search terms against a Lucene/Solr index of STW
terms with stemming.




Conclusions and suggestions for improving
search interfaces (I)
• Significant use of and potential for controlled vocabulary – if the
  vocabulary is big enough and constantly maintained
• Significant rate of uncontrolled terms belonging to other categories
  like "names" and "document titles" – how to support this better?
   - Different searches for names according to different roles (e.g.
     search for (co-)authors, in citations, author information etc.)
   - Suggesting names by authority files
• Result sets resulting from search term expansion are scrolled quite
  often – how to avoid this?
   - Adding filters
   - Sorting by column
   - Cascading search
Conclusions and suggestions for improving
search interfaces (II)
• Mapping of uncontrolled terms to vocabulary may still be further
  improved by linguistic techniques – main goal: convergence
  between the "information system's language" and user language
• Uncontrolled internal search (in our repository) and Google search
  formally do not differ much – what does that mean?
   - Statement: Adaptation to Google text-based search is not
     appropriate for domain-specific scientific search. Instead, we
     need
      - suggestion services based on authority data for terms, names,
        institutions etc. to better anticipate domain-specific queries
      - visible real-time information about other users' search
        behaviour (community building)
      - visualization and navigation of domain-specific topics
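
A suggestion service over authority data could, in its simplest form, be a prefix lookup on a sorted list of labels. The sketch below is one possible shape under that assumption; the `Suggester` class and its sample labels are hypothetical and do not describe the ZBW autosuggest service.

```python
from bisect import bisect_left


class Suggester:
    """Prefix suggestions over a fixed list of authority labels."""

    def __init__(self, labels):
        # Sort by a case-folded key so entries sharing a prefix are adjacent.
        self._entries = sorted((label.casefold(), label) for label in labels)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` labels that start with the prefix, ignoring case."""
        key = prefix.casefold()
        start = bisect_left(self._keys, key)
        results = []
        for folded, label in self._entries[start:]:
            if not folded.startswith(key) or len(results) == limit:
                break
            results.append(label)
        return results


# Invented labels, for illustration only.
suggester = Suggester(
    ["Monetary policy", "Monetary union", "Money supply", "Labour market"]
)
print(suggester.suggest("mone"))
```

The binary search keeps each lookup cheap even for a vocabulary of several thousand labels; the same structure would work for names or institutions taken from an authority file.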
Thank you!


Questions?

 Dr. Timo Borst
t.borst@zbw.eu




