Project Gutenberg as an
Information Retrieval System
Kai Li
IST616 Final Assignment
2012.11
Introduction to Project Gutenberg
• The first digital library project in the
world, initiated by the late Michael Hart in
1971.
• Project Gutenberg currently offers more than
41,000 public domain eBooks (in more than
50 languages) as well as other resources (such as
scientific data).
• Website: http://www.gutenberg.org/
Intended Audience and Functionalities
• Intended audience: eBook readers and general
users.
• Functionalities: portal of the project, eBook
repository and discovery system.
Mobile Site
• The website serves one of two
interfaces, depending on the
device used. Only the
traditional non-mobile
interface is examined in
this presentation, given the
limited scope of the
assignment.
Indexing System
Issues of Indexing/Tag System
• There is a search box as well as a tag called
“Search Catalog”;
– The search box is too small to be noticed;
– The tag “Search Catalog” actually leads users to a
page that has no search box,
only some browsing options;

• A number of tags are duplicated on the
left-hand bar and at the top of the page;
– For example, the tag “Book Categories”.
Means To Find a Book
• Searching
• Browsing
– By categories
Searching
Issues of Searching
• The display differs from most of the
interfaces one sees on the Internet, which
may cause some difficulty for new users;
• Because there is no navigation mechanism and
no way to refine results by facet, it is
extremely inconvenient to locate a resource
when the result set is large.
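To make the missing facet refinement concrete, here is a minimal sketch of what such a function might look like. The records, field names, and helper names are all hypothetical and are not taken from the Project Gutenberg site; the sketch only illustrates counting facet values for a result set and narrowing it by more than one facet at once.

```python
from collections import Counter

# Hypothetical search results; titles and field values are placeholders.
results = [
    {"title": "Book A", "language": "en", "subject": "Whaling"},
    {"title": "Book B", "language": "en", "subject": "Travel"},
    {"title": "Book C", "language": "de", "subject": "Whaling"},
    {"title": "Book D", "language": "en", "subject": "Travel"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(record[field] for record in records)


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


language_facet = facet_counts(results, "language")
narrowed = refine(results, language="en", subject="Travel")
print(language_facet)
print([record["title"] for record in narrowed])
```

Showing the counts next to each facet value is what lets a user judge, before clicking, how far a choice will shrink a large result set.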
Precision and Recall
• The retrieval method used by this website is
string matching: the string entered by the user
is matched against the full text of all
the resources.
– Multiple words are combined with an “OR” relationship.

• Because the index covers the full text,
recall is higher than in traditional library catalogs;
however, since the method is still string
matching, precision is not very good.
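The trade-off described above can be illustrated with a small sketch. It assumes a toy corpus and a hand-picked set of relevant documents, both hypothetical, and it implements plain substring matching with an "OR" relationship between query words; it is not the website's actual search code.

```python
def or_match(query, documents):
    """Return ids of documents whose full text contains ANY query word."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(word in text.lower() for word in words)
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical full texts, invented for illustration only.
documents = {
    1: "A treatise on the whale and the whale fishery",
    2: "The captain sailed; a whale was sighted off the coast",
    3: "Notes on fishery law and coastal trade",
    4: "A history of the printing press",
}
relevant = {1}  # assumption: the user wants only the whaling treatise

retrieved = or_match("whale fishery", documents)
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

In this toy case every relevant document is retrieved, but two of the three hits match only one query word in passing, which is the pattern of high recall and weak precision that the slide describes.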
Browsing
Issues of Browsing
• Three search tools are offered on this
page; they belong on the
search page rather than here.
• Only one criterion can be used to limit the
resources at a time, and once a criterion is
chosen there is no way to narrow the
results further.
Categories/Classification
• There are two tiers of the “classification” on
this website:
– Subcategories: 23
• These subcategories are also called “bookshelves”, which
is confusing.

– Bookshelves: 133
• These can be seen as a level below the subcategories.
However, not every bookshelf is linked to a
subcategory.
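The two-tier structure and its gap can be sketched as a simple mapping. The subcategory and bookshelf names below are hypothetical placeholders, not the site's real lists; the sketch only shows how bookshelves that no subcategory links to could be detected.

```python
# Hypothetical two-tier classification: subcategory -> linked bookshelves.
subcategories = {
    "Literature": ["Poetry", "Drama"],
    "Science": ["Astronomy"],
}

# Hypothetical full list of bookshelves on the site.
bookshelves = ["Poetry", "Drama", "Astronomy", "Esperanto"]


def unlinked_bookshelves(subcategories, bookshelves):
    """Return the bookshelves that no subcategory links to."""
    linked = {shelf for shelves in subcategories.values() for shelf in shelves}
    return [shelf for shelf in bookshelves if shelf not in linked]


orphans = unlinked_bookshelves(subcategories, bookshelves)
print(orphans)
```

A check like this makes the coverage gap measurable: any bookshelf it returns can only be reached by paths other than the subcategory tier.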
Overall Evaluation
• Advantages:
– Mobile functionalities:
• Mobile site
• QR codes

• Disadvantages:
– Poorly organized and
designed;
– Fails to display the full
richness of the metadata
on the website:
• LoC classification and
subject headings

– The interface communicates
poorly with
the users.
Thanks!


Editor's Notes

  1. The project accepts member-uploaded eBooks that are not protected by US copyright law.
  2. Because this website is also the main page of the whole project, the audience includes not only people who want to get the eBooks but also people who are interested in the project itself.
  3. The indexing system is actually very confusing. This slide lists some of the problems.
  4. The search results page: related bookshelves and subjects are displayed before the books; books are ranked by popularity (number of downloads), but one can also sort alphabetically or by release date.
  5. The interface was very unintuitive for me when I first used it. If a book does not rank high alphabetically, by popularity, or by release date, and the result set is large, it is almost impossible to find that specific book. Like traditional library catalogs, this interface does not support finding an unknown book very well.
  6. A string-matching method cannot handle one word with multiple meanings or different words with the same meaning.
  7. Methods: by author; by title; by language; by recently added; by popularity. One can also browse the website by LC classification (as well as LCSH). However, these are not listed on this page; LC classification can be found only from the book pages.
  8. Not all bookshelves can be linked to a subcategory. Moreover, some bookshelves contain materials in other languages that fall outside the above system, which indicates that the English classification scheme may not cover all the resources on the website.
  9. Many libraries and other parties have imported the metadata of Gutenberg eBooks into their local systems, which makes the issues of this website less important. But this is still a problem!