Open Archives Initiatives for Metadata Harvesting
   A Framework for Building Open Digital Libraries




                    Term Paper-1
                       Submitted by


                     NIKESH.N




 International School of Information Management

           University of Mysore
                         2010

1.0       Introduction

A digital library may be defined as a system that supports the collection, organization, storage,
retrieval and dissemination of digital documents. It may be viewed as the intersection of library
science, computer science and networked information systems. Open movements are gaining acceptance
in the scholarly information arena, and many universities and research centers have started
to provide public access to their repositories. With the growing number of digital repositories
on the Web, it has become difficult for users to visit each one individually in search of
information, and many organizational repositories have not been indexed by search engines. A
mechanism is therefore required by which repositories can share resources and work in
coordination, giving users a broader purview. The ability of information systems to work in
coordination in this way is termed interoperability. The Open Archives Initiative is one of the
landmark efforts to make the metadata of digital resources held in many repositories available
at the users' end.
    The essence of the open archives approach is to enable access to Web-accessible material through
interoperable repositories for metadata sharing, publishing and archiving.
Such interoperability requirements necessitated the development of standards such as the Dublin
Core Metadata Element Set and the Open Archives Initiative's Protocol for Metadata Harvesting
(OAI-PMH). These standards have achieved a degree of success in the DL community largely
because of their generality and simplicity.


2.0 Need for a Harvester protocol

There is a growing need to make resources, not only descriptive metadata, harvestable in an
interoperable manner. There are two major use cases that motivate this need:

•     Preservation: The need to periodically transfer digital content from a data repository to one or
      more trusted digital repositories charged with storing and preserving safety copies of the
      content. The trusted digital repositories need a mechanism to automatically synchronize with
      the originating data repository.
•     Discovery: The need to use content itself in the creation of services. Examples include search
      engines that make full text from multiple data repositories searchable, and citation indexing
      systems that extract references from full-text content. Another scenario is the provision of
      thumbnail versions of high-quality images from cultural heritage collections to external
      services that build browsing interfaces including those thumbnails.


3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)
In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address
interoperability issues among the many existing and independent DLs. The focus was on high-level
communication among systems and simplicity of protocols. The OAI has since received
much media attention in the DL community and, primarily because of the simplicity of its
standards, has attracted many early adopters. Its protocol, OAI-PMH, defines a mechanism for
harvesting records containing metadata from repositories.

3.1 Definitions of Key terms

•   Open Archives Initiative (OAI)

OAI is an initiative to develop and promote interoperability standards that aim to facilitate the
efficient dissemination of content.

•   Archive
    The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in
    the e-prints community where the term archive is generally accepted as a synonym for
    repository of scholarly papers. Members of the archiving profession have justifiably noted
    the strict definition of an "archive" within their domain; with connotations of preservation of
    long-term value, statutory authorization and institutional policy. The OAI uses the term
    "archive" in a broader sense: as a repository for stored information. Language and terms are
    never unambiguous and uncontroversial and the OAI respectfully requests the indulgence of
    the professional archiving community with this broader use of "archive".
(OAI definition quoted from FAQ on OAI Web site)




•   OAI Protocol for Metadata Harvesting (OAI-PMH)

    OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.




•   Protocol

    A protocol is a set of rules defining communication between systems. FTP (File Transfer
    Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for
    communication between systems across the Internet.




•   Harvesting

    In the OAI context, harvesting refers specifically to the gathering together of metadata from a
    number of distributed repositories into a combined data store.




3.2 Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on:
    o Transport protocol - HTTP or FTP or other such protocol
    o Metadata format - Dublin Core or MARC or other such format
    o Metadata Quality Assurance - mandatory element set, naming and subject conventions, etc.
    o Intellectual Property and Usage Rights - who can do what with what?


3.3 OAI: Key players

There are two groups of 'participants': Data Providers and Service Providers.
Data Providers (open archives, repositories) provide free access to metadata, and may, but do
not necessarily, offer free access to full texts or other resources. OAI-PMH provides an
easy-to-implement, low-barrier solution for Data Providers.

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata.
Note that this means there are no live search requests to the Data Providers; rather, services
are based on the data harvested via OAI-PMH. Service Providers may select certain subsets from
Data Providers (e.g., by set hierarchy or datestamp). Service Providers offer value-added services
on the basis of the harvested metadata, and they may enrich that metadata in order to do so.



3.4 How it works




The OAI-PMH gives data providers a simple technical option for making their metadata
available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and
XML (Extensible Markup Language). The harvested metadata may be in any format that
is agreed by a community (or by any discrete set of data and service providers), although
unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata
from many sources can be gathered together in one database, and services can be provided based
on this centrally harvested or "aggregated" data. The link between this metadata and the related
content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not
provide search across this data; it simply makes it possible to bring the data together in one
place. In order to provide services, the harvesting approach must be combined with other
mechanisms.
3.5 Protocol details

Records

A record is the metadata of a resource in a specific format. A record has three parts: a header and
metadata, both of which are mandatory, and an optional about statement. Each of these is made
up of various components as set out below.

header (mandatory)
   o identifier (mandatory: exactly one)
   o datestamp (mandatory: exactly one)
   o setSpec elements (optional: zero, one or more)
   o status attribute (for deleted items)

metadata (mandatory)
   o XML-encoded metadata with a root tag and namespace
   o repositories must support Dublin Core and may support other formats

about (optional)
   o rights statements
   o provenance statements
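The three-part record structure can be modelled as a small data structure. The Python sketch below is illustrative only: the class and field names (`Header`, `Record`, `set_specs`, `is_valid`) are our own and not part of OAI-PMH or any library, and the rule that a deleted record carries a header but no metadata part comes from the OAI-PMH 2.0 specification rather than from this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Header:
    identifier: str                 # mandatory: exactly one
    datestamp: str                  # mandatory: exactly one
    set_specs: List[str] = field(default_factory=list)  # optional: zero or more setSpec values
    deleted: bool = False           # models the status="deleted" attribute


@dataclass
class Record:
    header: Header                  # mandatory part
    metadata: Optional[str] = None  # XML-encoded metadata, e.g. an oai_dc fragment
    about: List[str] = field(default_factory=list)  # optional rights/provenance statements

    def is_valid(self) -> bool:
        """Check the mandatory parts of a record."""
        if not (self.header.identifier and self.header.datestamp):
            return False
        if self.header.deleted:
            # Per OAI-PMH 2.0, a deleted record has a header but no metadata.
            return self.metadata is None
        return self.metadata is not None


# Usage: a live record and a deleted one (hypothetical identifiers)
live = Record(Header("oai:example.org:1", "2010-01-15", ["theses"]), metadata="<oai_dc:dc/>")
gone = Record(Header("oai:example.org:2", "2010-01-16", deleted=True))
print(live.is_valid(), gone.is_valid())  # True True
```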

Datestamps

A datestamp is the date of creation, last modification, or deletion of a metadata record. It is
mandatory for every record and has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.

The datestamp enables selective harvesting using the from and until arguments, which is the basis
of incremental update mechanisms. Support for deleted records has three levels: no, persistent,
and transient.
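To make the two granularities and the from/until filter concrete, here is a minimal Python sketch of how a harvester or repository might compare datestamps. The function names are hypothetical; the bounds are treated as inclusive, and reading a day-granularity until value as covering the whole day is an assumption made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional

DAY_FORMAT = "%Y-%m-%d"               # YYYY-MM-DD
SECONDS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDThh:mm:ssZ (UTC)


def parse_datestamp(value: str) -> datetime:
    """Parse a datestamp of either granularity into a timezone-aware UTC datetime."""
    for fmt in (SECONDS_FORMAT, DAY_FORMAT):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")


def within(datestamp: str, from_: Optional[str] = None, until: Optional[str] = None) -> bool:
    """Inclusive from/until test used for selective (incremental) harvesting."""
    stamp = parse_datestamp(datestamp)
    if from_ is not None and stamp < parse_datestamp(from_):
        return False
    if until is not None:
        upper = parse_datestamp(until)
        if len(until) == len("YYYY-MM-DD"):
            # Assumption: a day-granularity bound covers the whole day.
            upper = upper.replace(hour=23, minute=59, second=59)
        if stamp > upper:
            return False
    return True


# Usage: keep only records changed during January 2010
stamps = ["2009-12-31T23:59:59Z", "2010-01-31T10:00:00Z", "2010-02-01"]
print([s for s in stamps if within(s, from_="2010-01-01", until="2010-01-31")])
# ['2010-01-31T10:00:00Z']
```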



Metadata schema

OAI-PMH supports the dissemination of multiple metadata formats from a repository. Each
metadata format is described by three properties:
 –   id string to specify the format (metadataPrefix)
 –   metadata schema URL (XML schema to test validity)
 –   XML namespace URI (global identifier for metadata format)

Repositories must be able to disseminate unqualified Dublin Core. Further metadata formats
can be defined and transported via OAI-PMH, but any returned metadata must comply with an
XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements.
All elements are optional, and all elements may be repeated.




3.6 The Dublin Core Metadata Element Set:

Title                        Contributor                   Source
Creator                      Date                          Language
Subject                      Type                          Relation
Description                  Format                        Coverage
Publisher                    Identifier                    Rights
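Because every Dublin Core element is optional and repeatable, a harvester should treat each field as a possibly empty list of values. The sketch below uses Python's standard `xml.etree.ElementTree` to collect the 15 elements from an unqualified Dublin Core (`oai_dc`) fragment. The namespace URIs are those published for `oai_dc` and Dublin Core 1.1; the sample record and the helper name `dc_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core 1.1 element namespace

# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_fields(xml_text: str) -> Dict[str, List[str]]:
    """Collect every DC element in an oai_dc fragment; absent elements are simply missing."""
    root = ET.fromstring(xml_text)
    fields: Dict[str, List[str]] = {}
    prefix = "{" + DC_NS + "}"
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            if name in DC_ELEMENTS and child.text:
                fields.setdefault(name, []).append(child.text.strip())  # repeatable
    return fields


# Hypothetical sample record
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:subject>Metadata harvesting</dc:subject>
  <dc:subject>Digital libraries</dc:subject>
  <dc:date>2010</dc:date>
</oai_dc:dc>"""

print(dc_fields(SAMPLE)["subject"])  # ['Metadata harvesting', 'Digital libraries']
```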



Sets

Sets enable a logical partitioning of repositories. They are optional: archives do not have to
define sets, and there are no recommendations for how sets should be implemented. Sets are not
necessarily exhaustive of the content of a repository, nor are they necessarily strictly
hierarchical. Communities therefore need negotiated agreements defining which sets are useful
to them.

    •     function: selective harvesting (set parameter)
    •     applications: subject gateways, dissertation search engines, and others
    •     examples
              o   publication types (thesis, article, …)
              o   document types (text, audio, image, …)
              o   content sets, according to DNB (medicine, biology, …)

3.7 Request format

Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. Every request must contain at least one key=value pair, verb=RequestType,
where RequestType is the type of request (such as ListRecords). Additional key=value pairs
depend on the request type.

Example of a GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

Special characters in argument values must be encoded; for example, ":" (the host/port
separator) becomes "%3A".
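In practice, the simplest way to get this encoding right is to let a URL library build the query string. Below is a minimal Python sketch using the standard library's `urllib.parse.urlencode`; the base URL is the hypothetical one from the GET example, and `build_request` is an illustrative helper, not part of any OAI toolkit. Because `from` is a reserved word in Python, the sketch accepts that argument as `from_`.

```python
from urllib.parse import urlencode


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET URL; urlencode percent-encodes ':' and '/' in values."""
    params = {"verb": verb}
    for key, value in args.items():
        # 'from' is a Python keyword, so callers pass it as from_
        params["from" if key == "from_" else key] = value
    return base_url + "?" + urlencode(params)


print(build_request("http://archive.org/oai", "ListRecords", metadataPrefix="oai_dc"))
# http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(build_request("http://archive.org/oai", "GetRecord",
                    identifier="oai:archive.org:123", metadataPrefix="oai_dc"))
# http://archive.org/oai?verb=GetRecord&identifier=oai%3Aarchive.org%3A123&metadataPrefix=oai_dc
```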



3.8 Response

Responses are formatted as HTTP responses with content type text/xml. HTTP status codes such as
302 (redirect) and 503 (service unavailable), which are distinct from OAI-PMH errors, may be
returned. Compression is optional in OAI-PMH; only the identity encoding is mandatory. The
response must be well-formed XML with the following markup:

   1. XML declaration
       (<?xml version="1.0" encoding="UTF-8" ?>)
   2. root element named OAI-PMH with three attributes
       (xmlns, xmlns:xsi, xsi:schemaLocation)
   3. three child elements
           1. responseDate (UTC datetime)
           2. request (the request that generated this response)
           3. a) error (in case of an error or exception condition)
               b) element with the name of the OAI-PMH request
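A harvester can rely on this fixed layout when processing responses. The following Python sketch (standard library only) checks the root element, reads responseDate and request, and returns either the error codes or the element named after the verb. The namespace URI is the published OAI-PMH 2.0 one; the helper `parse_response` and both sample responses are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace in ElementTree form


def parse_response(xml_text: str) -> dict:
    """Split an OAI-PMH response into responseDate, request, errors and the verb element."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # bytes, so the XML declaration is honoured
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    request = root.find(OAI + "request")
    result = {
        "responseDate": root.findtext(OAI + "responseDate"),
        "request": request.text if request is not None else None,
        "errors": [e.get("code") for e in root.findall(OAI + "error")],
        "payload": None,
    }
    if not result["errors"] and request is not None and request.get("verb"):
        # The third child is named after the verb, e.g. <ListRecords> or <Identify>
        result["payload"] = root.find(OAI + request.get("verb"))
    return result


# Hypothetical sample responses sharing the same declaration, root and responseDate
HEAD = ('<?xml version="1.0" encoding="UTF-8"?>\n'
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ '
        'http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">'
        '<responseDate>2010-01-15T10:00:00Z</responseDate>')
IDENTIFY = HEAD + ('<request verb="Identify">http://archive.org/oai</request>'
                   '<Identify><repositoryName>Example Repository</repositoryName></Identify>'
                   '</OAI-PMH>')
BAD_VERB = HEAD + ('<request>http://archive.org/oai</request>'
                   '<error code="badVerb">Illegal OAI verb</error></OAI-PMH>')

print(parse_response(BAD_VERB)["errors"])  # ['badVerb']
print(parse_response(IDENTIFY)["payload"].findtext(OAI + "repositoryName"))  # Example Repository
```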
3.9 OAI-PMH Verbs
Here 'verb' means the request type that the service provider (harvester) sends to obtain responses
from data providers. There is a standard set of six verbs:
      o Identify
      o ListMetadataFormats
      o ListSets
      o GetRecord
      o ListIdentifiers
      o ListRecords



   Verb                   Function
   Identify               Description of the repository
   ListMetadataFormats    Metadata formats supported by the repository
   ListSets               Sets defined by the repository
   ListIdentifiers        Retrieves the unique identifiers of items
   ListRecords            Harvests records from the repository
   GetRecord              Retrieves an individual metadata record from the repository
A harvester is not required to use all six verbs; however, a repository must implement all of
them. Each verb has its own required and optional arguments.
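As a sketch of what required and optional arguments mean in practice, the Python table below records the argument rules for each verb as given in the OAI-PMH 2.0 specification. It includes the flow-control argument resumptionToken, which this paper does not otherwise discuss and which must appear alone when used. The helper `check_arguments` is hypothetical; it returns the protocol's `badVerb` or `badArgument` error codes for malformed requests.

```python
from typing import Dict, List, Set, Tuple

# verb -> (required, optional, exclusive) argument names, per OAI-PMH 2.0
ARGUMENTS: Dict[str, Tuple[Set[str], Set[str], Set[str]]] = {
    "Identify":            (set(), set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}, set()),
    "ListSets":            (set(), set(), {"resumptionToken"}),
    "GetRecord":           ({"identifier", "metadataPrefix"}, set(), set()),
    "ListIdentifiers":     ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
    "ListRecords":         ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
}


def check_arguments(params: Dict[str, str]) -> List[str]:
    """Return OAI-PMH error codes for a request; an empty list means it is well formed."""
    verb = params.get("verb")
    if verb not in ARGUMENTS:
        return ["badVerb"]
    required, optional, exclusive = ARGUMENTS[verb]
    given = set(params) - {"verb"}
    if given & exclusive:
        # An exclusive argument (resumptionToken) must be the only one besides verb.
        return [] if len(given) == 1 else ["badArgument"]
    if not required <= given or not given <= required | optional:
        return ["badArgument"]
    return []


print(check_arguments({"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": "theses"}))  # []
print(check_arguments({"verb": "GetRecords", "identifier": "oai:archive.org:123"}))  # ['badVerb']
```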




4.0 DSpace: OAI-Compatible Digital Library Software
DSpace is open source software for building and managing digital repositories. Developed jointly by
MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open
source system that can be customized and extended. DSpace is a digital institutional repository that
captures, stores, indexes, preserves, and redistributes content in digital formats. An institutional
repository is a set of services that a research institution, organization or university offers to the
members of its community for the management and dissemination of digital
materials created by the institution and its community members. Typically, DSpace has been
deployed for institutional repositories of publications, theses and dissertations. Several groups
are working on extending its capabilities, such as implementing ontologies in the search interface
and the submission module, customizing it for the management of electronic theses and
dissertations, and localizing and internationalizing the package for the world's languages.
DSpace is compliant with OAI-PMH version 2.0, so metadata in DSpace digital libraries can be
harvested.
4.1 DSpace Search System
The end user can browse, search and access the collections using the hierarchies and also the
alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is
part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)…
provide ontologies that enable context-based querying. This works like a subject-based directory
structure.
The Lucene search engine has very powerful search features that encompass many of the end user's
search approaches. It provides the basic 'exact term' or keyword search. In addition, it allows
fielded search akin to the field-level search of library databases; in DSpace, Dublin Core elements
are used as the field names. Lucene also supports Boolean search, range searches, term boosting
and proximity searches. A particularly interesting facility is fuzzy search, based on the
Levenshtein (edit distance) algorithm (5), which matches terms by similarity. This feature is
especially useful when users have only heard a term and must guess its spelling, and more so in
the case of personal names.
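The edit distance behind fuzzy search can be computed with the classic dynamic-programming algorithm. The Python sketch below is a textbook version for illustration only, not Lucene's actual implementation (which is optimized quite differently); `fuzzy_match` and its `max_edits` threshold are hypothetical, and the name list is made-up sample data.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute (or keep)
        previous = current
    return previous[-1]


def fuzzy_match(term: str, vocabulary: List[str], max_edits: int = 2) -> List[str]:
    """Hypothetical helper: vocabulary entries within max_edits of term (case-insensitive)."""
    return [word for word in vocabulary
            if levenshtein(term.lower(), word.lower()) <= max_edits]


names = ["Shepherd", "Shephard", "Sheppard", "Breeding"]
print(levenshtein("kitten", "sitting"))            # 3
print(fuzzy_match("Shepard", names, max_edits=1))  # ['Shephard', 'Sheppard']
```

A user who only heard the name "Shepard" would find both common spellings with a single edit allowed; raising `max_edits` to 2 would also admit "Shepherd".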
4.2 Metadata in DSpace
DSpace users deal with metadata in the following modules:
   o Administration modules: Dublin Core registry, administrative metadata (default values), mail
     alert to subscribers
   o Submission modules: descriptive metadata
   o Harvesting: OAI-PMH using the DC elements (unqualified)
   o Search result display: brief and full metadata
4.3 Metadata harvesting in DSpace
DSpace is compliant with OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets:
each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets
to discover the sets. At present, only the 15 basic Dublin Core elements are exposed.


5.0 OAI Harvester Software
       o Arc (http://arc.cs.odu.edu/)
       o Citebase (http://citebase.eprints.org/cgi-bin/search)
       o CYCLADES (http://www.ercim.org/cyclades/)
       o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
       o MeIND (http://www.meind.de/)
       o METALIS (http://metalis.cilea.it/)
       o my.OAI (http://www.myoai.com)
       o NCSTRL (http://www.ncstrl.org/)
       o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
       o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester/)
       o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
       o OAI Repository Explorer (http://re.cs.uct.ac.za/)
       o OAIster (http://oaister.umdl.umich.edu/o/oaister/)
       o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr/)
       o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
       o DLESE OAI Software (http://dlese.org/oai/index.jsp)


6.0 Future Prospects
Some more work has to be done in order to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
   o Tools and software have to be developed by which non-OAI-PMH-compliant repositories
       can be made compliant, so that they can act as data providers.
   o Higher versions of the protocol should be made backward compatible with lower ones.


At the metadata creation level some standardization is required, as a particular resource is
often described inconsistently in different repositories. Vocabulary control measures are also
needed. Further improvements to the OAI-PMH protocol are still awaited; only then can we ensure
a comprehensive view of the resources available on a particular subject for our end users.




7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern for scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing 'hidden resources' and low cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH protocol could become part of the infrastructure
of the Web, as taken-for-granted as the HTTP protocol now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers and archives.



REFERENCES

   1. http://www.openarchives.org/
   2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol
      could become a key part of the digital library infrastructure. Information Today. Retrieved
      from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
   3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open
      Archives Initiative. Computers in Libraries, 22(8).
   4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for
      Metadata Harvesting. Retrieved from http://www.openarchives.org/OAI/openarchivesprotocol.htm
   5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL
      Bimonthly Report 217. Retrieved from http://www.arl.org/newsltr/217/mhp.html
   6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability
      Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. Retrieved from
      http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
   7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital
      Libraries. D-Lib Magazine, 7(12). Retrieved from
      http://www.dlib.org/dlib/december01/suleman/12suleman.html
   8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open
      Archives Initiative. D-Lib Magazine, 6(2). Retrieved from
      http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
   9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata
      Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4. Retrieved from
      http://library.cern.ch/HEPLW/4/papers/3/
  11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
  12. Shepherd, M. (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic
      Web, 8-10 December 2003, DRTC, Bangalore.
  13. http://www.openarchives.org/Register/BrowseSites
  14. http://www.openarchives.org/service/listproviders.html

Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsNikesh Narayanan
 

Mais de Nikesh Narayanan (20)

Federated to library discovery platfoms
Federated to library discovery platfomsFederated to library discovery platfoms
Federated to library discovery platfoms
 
Role of libraries in research and scholarly communication
Role of libraries in research and scholarly communicationRole of libraries in research and scholarly communication
Role of libraries in research and scholarly communication
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policies
 
Role of libraries in accelerating research
Role of libraries in accelerating researchRole of libraries in accelerating research
Role of libraries in accelerating research
 
Researh data management
Researh data managementResearh data management
Researh data management
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library services
 
Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...
 
Web scale discovery vs google scholar
Web scale discovery vs google scholarWeb scale discovery vs google scholar
Web scale discovery vs google scholar
 
Web Scale Discovery Services: Google like search experience
Web Scale Discovery Services: Google like search experienceWeb Scale Discovery Services: Google like search experience
Web Scale Discovery Services: Google like search experience
 
Evaluation of web scale discovery services
Evaluation of web scale discovery servicesEvaluation of web scale discovery services
Evaluation of web scale discovery services
 
Evaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery ServicesEvaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery Services
 
Web Scale Discovery Vs Federated Search
Web Scale Discovery Vs Federated SearchWeb Scale Discovery Vs Federated Search
Web Scale Discovery Vs Federated Search
 
Cloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overviewCloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overview
 
Tag based Information Retrieval using foksonomy
Tag based Information Retrieval using foksonomyTag based Information Retrieval using foksonomy
Tag based Information Retrieval using foksonomy
 
Emerging Trends in Knowledge Management
Emerging Trends in Knowledge ManagementEmerging Trends in Knowledge Management
Emerging Trends in Knowledge Management
 
Knowledge Management at Infosys and Unisys : A Comparison
Knowledge Management at Infosys and Unisys : A ComparisonKnowledge Management at Infosys and Unisys : A Comparison
Knowledge Management at Infosys and Unisys : A Comparison
 
Semantic Web Technologies For Digital Libraries
Semantic Web Technologies For Digital LibrariesSemantic Web Technologies For Digital Libraries
Semantic Web Technologies For Digital Libraries
 
Knowledge Management at Infosys and Unisys A comparison
Knowledge Management at Infosys and UnisysA comparisonKnowledge Management at Infosys and UnisysA comparison
Knowledge Management at Infosys and Unisys A comparison
 
Personal Knowledge Management
Personal Knowledge ManagementPersonal Knowledge Management
Personal Knowledge Management
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Open Archives Initiatives for Metadata Harvesting
A Framework for Building Open Digital Libraries

Term Paper-1
Submitted by NIKESH.N

International School of Information Management
University of Mysore
2010
1.0 Introduction

A digital library may be defined as a system that supports the collection, organization, storage, retrieval and dissemination of digital documents. It may be viewed as the intersection of library science, computer science and networked information systems. Open movements are gaining acceptance in the scholarly information arena, and many universities and research centers have started to provide public access to their repositories. With the growing number of digital repositories on the Web, it has become difficult for users to visit each one individually in search of information, and many institutional repositories have not been indexed by search engines. A mechanism is therefore required by which repositories can share their resources and work in coordination, giving users a broader purview. The ability of information systems to work in coordination is termed interoperability. The Open Archives Initiative is one of the landmark efforts to make the metadata of digital resources held in many repositories available at the user's end.

The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving. These interoperability requirements led to the development of standards such as the Dublin Core Metadata Element Set and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). These standards have achieved a degree of success in the digital library (DL) community, largely because of their generality and simplicity.

2.0 Need for a Harvester protocol

There is a growing need to make resources, not only descriptive metadata, harvestable in an interoperable manner.
There are two major use cases that motivate this need:

• Preservation: the need to periodically transfer digital content from a data repository to one or more trusted digital repositories charged with storing and preserving safety copies of the content. The trusted digital repositories need a mechanism to synchronize automatically with the originating data repository.
• Discovery: the need to use the content itself in the creation of services. Examples include search engines that make the full text of multiple data repositories searchable, and citation indexing systems that extract references from full-text content. Another scenario is supplying thumbnail versions of high-quality images from cultural heritage collections to external services that build browsing interfaces around those thumbnails.

3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)

The Open Archives Initiative (OAI) was launched in October 1999 to address interoperability issues among the many existing, independent DLs. The focus was on high-level communication among systems and on simple protocols. The OAI has since received much media attention in the DL community and, primarily because of the simplicity of its standards, has attracted many early adopters. The protocol defines a mechanism for harvesting records containing metadata from repositories.

3.1 Definitions of Key terms

• Open Archives Initiative (OAI)
OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.
• Archive
The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in the e-prints community where the term archive is generally accepted as a synonym for repository of scholarly papers. Members of the archiving profession have justifiably noted the strict definition of an "archive" within their domain; with connotations of preservation of long-term value, statutory authorization and institutional policy. The OAI uses the term "archive" in a broader sense: as a repository for stored information.
Language and terms are never unambiguous and uncontroversial and the OAI respectfully requests the indulgence of the professional archiving community with this broader use of "archive".
(OAI definition quoted from the FAQ on the OAI Web site)

• OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.
• Protocol
A protocol is a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for communication between systems across the Internet.
• Harvesting
In the OAI context, harvesting refers specifically to gathering metadata from a number of distributed repositories into a combined data store.

3.2 Prerequisites to develop metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on:
o Transport protocol - HTTP, FTP or another such protocol
o Metadata format - Dublin Core, MARC or another such format
o Metadata quality assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual property and usage rights - who can do what with what?

3.3 OAI: Key players

There are two groups of participants: Data Providers and Service Providers.
Data Providers (open archives, repositories) provide free access to metadata, and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. This means that no live search requests are sent to the Data Providers; instead, services are based on the data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g., by set hierarchy or datestamp). Service Providers offer (value-added) services on the basis of the harvested metadata, and they may enrich that metadata in order to do so.

3.4 How it works
The OAI-PMH gives data providers a simple technical option for making their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The harvested metadata may be in any format agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or "aggregated", data. The link between this metadata and the related content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not provide search across this data; it simply makes it possible to bring the data together in one place. To provide services, the harvesting approach must be combined with other mechanisms.

3.5 Protocol details

Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of the components set out below.
• header (mandatory)
  o identifier (mandatory: exactly one)
  o datestamp (mandatory: exactly one)
  o setSpec elements (optional: zero, one or more)
  o status attribute, for deleted items
• metadata (mandatory)
  o XML-encoded metadata with a root tag and namespace
  o repositories must support Dublin Core and may support other formats
• about (optional)
  o rights statements
  o provenance statements

Datestamps
A datestamp is the date of last modification of a metadata record. A datestamp is mandatory for every item and has two possible levels of granularity: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ. The datestamp enables selective harvesting using the from and until arguments, and its main application is in incremental update mechanisms. It gives the date of creation, last modification or deletion. Support for deletion comes at three levels: no, persistent and transient.

Metadata schema
OAI-PMH supports the dissemination of multiple metadata formats from a repository. Each metadata format is described by:
  o an id string that specifies the format (metadataPrefix)
  o a metadata schema URL (the XML schema used to test validity)
  o an XML namespace URI (a global identifier for the metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further, arbitrary metadata formats can be defined and transported via OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements; all elements are optional, and all elements may be repeated.

3.6 The Dublin Core Metadata Element Set

Title         Contributor   Source
Creator       Date          Language
Subject       Type          Relation
Description   Format        Coverage
Publisher     Identifier    Rights

Sets
Sets enable a logical partitioning of repositories. They are optional: archives do not have to define sets, and there are no recommendations for how sets should be implemented. Sets are not necessarily exhaustive of the content of a repository, nor are they necessarily strictly hierarchical. It is therefore important for communities to negotiate agreements that define sets useful to them.
• function: selective harvesting (set parameter)
• applications: subject gateways, dissertation search engines, and others
• examples
  o publication types (thesis, article, …)
  o document types (text, audio, image, …)
  o content sets, according to DNB (medicine, biology, …)

3.7 Request format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods. At least one key=value pair, verb=RequestType (where RequestType is a type of request such as ListRecords), must be provided. Additional key=value pairs depend on the request type. Example of a GET request:

http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The encoding of special characters must be supported; for example, ":" (the host-port separator) becomes "%3A".

3.8 Response
Responses are formatted as HTTP responses, and the content type must be text/xml. HTTP status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not available), may be returned. Compression is optional in OAI-PMH; only identity encoding is mandatory. The response must be well-formed XML with the following markup:
1. an XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)
2. a root element named OAI-PMH with three attributes (xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements:
   1. responseDate (a UTC datetime)
   2. request (the request that generated this response)
   3. either
      a) error (in case of an error or exception condition), or
      b) an element with the name of the OAI-PMH request
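The request and response rules above can be illustrated with a minimal Python sketch that uses only the standard library. It is not taken from any OAI toolkit: build_request_url and read_envelope are hypothetical helper names, and http://archive.org/oai is simply the example base URL used above. The sketch builds a GET request with percent-encoded arguments, rejects from/until values that match neither datestamp granularity from section 3.5, and checks that a response carries the responseDate and request elements followed by either error elements or an element named after the verb. No network request is made.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by the OAI-PMH 2.0 response envelope (ElementTree tag prefix).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The two datestamp granularities: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.
DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?")


def build_request_url(base_url, verb, args=None):
    """Return a GET URL such as <base_url>?verb=ListRecords&metadataPrefix=oai_dc.

    urlencode percent-encodes special characters, so ':' becomes '%3A'.
    Optional 'from' and 'until' values are checked against the two
    datestamp granularities before being sent (selective harvesting).
    """
    params = {"verb": verb}
    for key, value in (args or {}).items():
        if key in ("from", "until") and not DATESTAMP.fullmatch(value):
            raise ValueError(f"invalid datestamp for {key!r}: {value!r}")
        params[key] = value
    return f"{base_url}?{urlencode(params)}"


def read_envelope(xml_text):
    """Check the response envelope and return its parts as a dict."""
    root = ET.fromstring(xml_text)
    if root.tag != OAI_NS + "OAI-PMH":
        raise ValueError("root element must be OAI-PMH in the OAI 2.0 namespace")
    date_el = root.find(OAI_NS + "responseDate")
    request_el = root.find(OAI_NS + "request")
    if date_el is None or request_el is None:
        raise ValueError("responseDate and request elements are required")
    errors = root.findall(OAI_NS + "error")
    if errors:
        # Error or exception condition: report the error codes, no payload.
        return {"responseDate": date_el.text,
                "errors": [e.get("code") for e in errors],
                "payload": None}
    # Otherwise the payload element is named after the verb in the request.
    verb = request_el.get("verb")
    payload = root.find(OAI_NS + verb) if verb else None
    if payload is None:
        raise ValueError("missing element named after the request verb")
    return {"responseDate": date_el.text, "errors": [], "payload": payload}
```

In practice a harvester would fetch the URL (for example with urllib.request.urlopen) and pass the response body to read_envelope. Checking the envelope before reading the payload keeps OAI-PMH errors such as badVerb separate from HTTP status codes such as 503.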
3.9 OAI-PMH Verbs

Here 'verb' means the request type that the service provider (harvester) sends to get responses from data providers. There is a standard set of six verbs:

Verb                  Function
Identify              Description of the repository
ListMetadataFormats   Metadata formats supported by the repository
ListSets              Sets defined by the repository
ListIdentifiers       Retrieves the unique identifiers of items
ListRecords           Used to harvest records from the repository
GetRecord             Retrieves an individual metadata record from the repository
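As a sketch of what a harvester does with the ListRecords verb, the Python function below reads the record structure described in sections 3.5 and 3.6 from a ListRecords response: the header identifier, datestamp, setSpec values and deleted status, plus the unqualified Dublin Core elements. The function name parse_list_records and the dict layout are assumptions made for illustration; the two namespace URIs are the standard ones for the OAI-PMH 2.0 envelope and the Dublin Core elements. It assumes a single, complete response has already been fetched.

```python
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH envelope and of the Dublin Core elements used by oai_dc.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Turn a ListRecords response into a list of plain dicts (hypothetical helper).

    Each dict holds the mandatory header parts (identifier, datestamp),
    any setSpec values, a 'deleted' flag taken from the header's status
    attribute, and the Dublin Core elements found in the metadata part.
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        item = {
            "identifier": header.findtext(OAI + "identifier"),
            "datestamp": header.findtext(OAI + "datestamp"),
            "sets": [s.text for s in header.findall(OAI + "setSpec")],
            "deleted": header.get("status") == "deleted",
            "dc": {},
        }
        metadata = record.find(OAI + "metadata")
        if metadata is not None:
            # Dublin Core elements are optional and repeatable, so keep lists.
            for el in metadata.iter():
                if el.tag.startswith(DC):
                    name = el.tag[len(DC):]
                    item["dc"].setdefault(name, []).append(el.text)
        records.append(item)
    return records
```

Because Dublin Core elements are optional and repeatable, each element name maps to a list; a deleted record carries no metadata part, so its dc dict stays empty.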
A harvester is not required to use all request types; however, a repository must implement all of them. Some arguments are required and others optional, depending on the request type.

4.0 DSpace: OAI-compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, for example implementing ontologies in the search interface and the submission module, customizing it for the management of electronic theses and dissertations, and localizing and internationalizing the package for the world's languages. DSpace is compliant with OAI-PMH version 2.0, so metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)… provide ontologies that enable context-based querying, which works like a subject-based directory structure. The Lucene search engine has powerful search features that cover many of the end user's search approaches. It provides basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases; in DSpace, Dublin Core elements are used as the field names. Lucene also supports Boolean search, range searches, term boosting and proximity searches. A particularly interesting facility is Lucene's fuzzy search, which is based on Levenshtein's algorithm (5) and matches terms by similarity (edit distance) rather than exactly. This feature is especially useful when we have only heard a term and must guess its spelling, and even more so in the case of personal names.
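The idea behind matching terms by edit distance can be shown with a short Python sketch. The levenshtein function below is the textbook dynamic-programming Levenshtein distance, in which each insertion, deletion or substitution costs 1; it is not Lucene's implementation, which works differently internally. fuzzy_matches and its max_edits threshold of 2 are hypothetical, chosen only to illustrate matching a misspelled personal name against index terms.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # At the start of row i, previous[j] is the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def fuzzy_matches(query, terms, max_edits=2):
    """Return the terms within max_edits of the query, closest first
    (case-insensitive; hypothetical helper for illustration)."""
    scored = [(levenshtein(query.lower(), t.lower()), t) for t in terms]
    return [t for distance, t in sorted(scored) if distance <= max_edits]
```

For example, fuzzy_matches("Ranganatan", ["Ranganathan", "Raganathan", "Rangan", "Smith"]) returns ["Ranganathan", "Raganathan"]: the misspelled query is one edit from the first name and two from the second, while "Rangan" is four edits away.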
4.2 Metadata in DSpace

DSpace users come across metadata in the following modules:
• Administration modules: Dublin Core registry, administrative metadata (default values), mail alerts to subscribers
• Submission modules: descriptive metadata
• Harvesting: OAI-PMH using the unqualified DC elements
• Search result display: brief and full metadata

4.3 Metadata harvesting in DSpace

DSpace is compliant with OAI-PMH for exposing metadata. OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets: each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets to discover the sets. At present, only the 15 basic Dublin Core elements are exposed.

5.0 OAI Harvester Software

o Arc (http://arc.cs.odu.edu/)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades/)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de/)
o METALIS (http://metalis.cilea.it/)
o my.OAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org/)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester/)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za/)
o OAIster (http://oaister.umdl.umich.edu/o/oaister/)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr/)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)

6.0 Future Prospects
Some more work has to be done before OAI-PMH can become a complete, globally accepted metadata harvesting protocol:
o Tools and software have to be developed to make non-compliant repositories OAI-PMH compliant, so that they can act as data providers.
o Higher versions of the protocol should remain backward compatible with earlier ones.

At the metadata creation level some standardization is required, since the same resource is often described inconsistently in different repositories. Vocabulary control measures should also be adopted. Further improvements to OAI-PMH are still awaited; only then can we ensure that our end-users get a comprehensive view of the resources available on a particular subject.

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern of scholarly communication is the most publicized potential benefit. Perhaps the most readily achievable goals are surfacing 'hidden resources' and low-cost interoperability. Although OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP is now, if its relative simplicity and its proven success among early implementers in a service context lead to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org/
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today. Retrieved from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Van de Sompel, H. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting. Retrieved from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report 217. Retrieved from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. Retrieved from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12). Retrieved from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Van de Sompel, H., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2). Retrieved from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4. Retrieved from http://library.cern.ch/HEPLW/4/papers/3/
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Shepherd, M. (2003, December). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8-10 December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html