Augmenting interoperability across scholarly repositories
1. Augmenting Interoperability across Scholarly Repositories
[Figure labels: Harvest, Obtain, Put]
Herbert Van de Sompel
Research Library
Los Alamos National Laboratory, USA
This work was supported by NSF award number IIS-0430906 (Pathways)
RESEARCH
Augmenting Interoperability across Scholarly Repositories LIBRARY
JISC CNI Conference, York, UK, July 6th 2006
Herbert Van de Sompel
2. Pathways Project
• NSF grant number IIS-0430906
• http://www.infosci.cornell.edu/pathways/
• PIs: Carl Lagoze, Sandy Payette, Herbert Van de Sompel, Simeon
Warner
• Research Participants: Lyudmila Balakireva, Jeroen Bekaert,
Xiaoming Liu, Chris Wilper, Zhiwu Xie
3. Meeting in NYC, April 20-21 2006
• Supported by Microsoft, Mellon Foundation, Coalition for
Networked Information, Digital Library Federation, JISC
• Representatives from institutional Repository projects, scholarly
content Repositories, Registry projects, various projects that touch
on interoperability
• See http://msc.mellon.org/Meetings/Interop/ for Agenda,
Participants, Topics & Goals, Terminology, Presentations, Prototype
demonstration.
• Report available July 2006
4. And more discussions with the community
• Panel at JCDL 2006, Chapel Hill, NC
• IATUL 2006, Porto, Portugal
• ElPub 2006, Bansko, Bulgaria
• Meeting at the University of Southampton, UK
5. Context: the Repository model
An environment consisting of Digital Object Repositories with a Long Life Expectation:
o Scholarly repositories
- Institutional repositories
- Discipline-oriented repositories
- Publisher’s repositories
- Dataset repositories
- …
o Cultural heritage repositories
o Preservation archives
o Educational repositories
[Figure: Repository]
6. Context: compound digital objects
Objects of the scholarly communication system are increasingly compound in nature, simultaneously consisting of:
• Multiple media types
• Multiple content types
o Papers
o Datasets
o Simulations
o Software
o Dynamic knowledge representations
o Machine-readable chemical structures
[Figure: a Digital Object with its identifier (id)]
7. Context: the Repository model
• We must leverage the value of the materials that become
available in those distributed Repositories.
• Think about these Repositories as active nodes in a global
environment, not as passive local nodes
o These Repositories are about facilitating the use and re-use of materials in many contexts
o These Repositories are the starting point of value chains
• In order to enable value chains, we need to augment
interoperability across repositories
8. Motivation 1: Richer cross-Repository services
Distributed Repositories provide source materials for cross-Repository overlay services such as discovery services.
[Figure: a selective collecting service]
Need: digital object representation, harvesting interface, datastream semantics
9. Motivation 2: Scholarly communication workflow
Distributed Repositories at the basis of a digital scholarly communication system.
Scholarly communication as a global workflow across those Repositories.
[Figure: Digital Objects (id) from several Repositories are recombined, with added value, into a new Digital Object (id)]
Need: digital object representation, obtain interface, put interface
10. Augmenting interoperability across Repositories
[Figure: Shared Data Model and Services, shown above the Individual Data Models and Services of DSpace, Nature, ePrints, Fedora, aDORe and arXiv]
11. Considerations regarding an interoperable framework
• Scholarly communication is a long-term endeavor:
o Need abstract definitions of Repository interfaces that can be instantiated on the basis of various technologies as time goes by
o Repository interfaces need to work with any type of identifier (current and future) because Repositories will use whichever type of identifier they choose
• Value chains do not require transfer of all digital object content:
o The content that needs to be transferred depends on the nature of the value chain
• Recording a chain of evidence of a value chain requires fine granularity of identification:
o Not only the identifier of the digital object but also that of the repository
12. Augmenting interoperability across Repositories
[Figure: Obtain, Harvest and Put interfaces, shown above the Individual Data Models and Services of DSpace, Nature, ePrints, Fedora, aDORe and arXiv]
13. Augmenting interoperability across Repositories
Pathways Core Data Model for Cross-Repository services
Bekaert, Jeroen, Xiaoming Liu, Herbert Van de Sompel, Sandy Payette, Carl Lagoze, and
Simeon Warner. Pathways Core: A Data Model for Cross-Repository Services. 2006.
Poster for JCDL 2006. http://public.lanl.gov/herbertv/papers/pathways_core_poster_submit.pdf
14. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• A Surrogate is available for every Digital Object
• A Surrogate is a representation of the Digital Object according to the Pathways Core data model
• The representation is uniform across repositories; it is not tied to identifier type, content type, or application domain
• The Surrogate is what is used in the value chains; the Surrogate is used at the Obtain, Harvest and Put interfaces
o Expresses properties and access points for the Digital Object (see later)
o The Surrogate for a specific Digital Object can change over time
15. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• The Surrogates provide By-Reference access to constituent datastreams of Digital Objects
o Full asset transfer is only required for certain applications
o Static asset transfer may be undesirable for dynamic objects => Live references
• Avoid IP issues at the level of the interoperability framework
o The idea is that the Surrogate itself is not encumbered by IP issues; attach, by definition, a liberal Creative Commons license to Surrogates
o Allow Surrogates to flow freely, independent of the business models of the underlying content
16. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• A Surrogate expresses access points and properties of a Digital Object, e.g.:
o Location of content streams
o providerInfo: the keys necessary to Obtain a fresh Surrogate at some later point in time:
- (Repository identifier, preferredIdentifier, versionKey)
o Lineage: a Surrogate expresses its predecessor(s)
- Equal to the providerInfo of the predecessor (its providerInfo "in a previous life")
o semantic: a Surrogate expresses the type of content
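The properties listed above can be sketched as a small data structure. This is only an illustration under stated assumptions: the slides say Surrogates are currently serialized as XML/RDF, whereas the class and field names below (ProviderInfo, Datastream, Surrogate, location) are hypothetical Python stand-ins, and the example identifiers and URL are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProviderInfo:
    """Keys needed to Obtain a fresh Surrogate at a later point in time."""
    repository_id: str         # identifier of the Repository
    preferred_identifier: str  # identifier of the Digital Object, any scheme
    version_key: str           # distinguishes versions of the Surrogate


@dataclass(frozen=True)
class Datastream:
    """By-reference pointer to one content stream (hypothetical shape)."""
    location: str  # where the content stream can be retrieved
    semantic: str  # type of content, e.g. "paper" or "dataset"


@dataclass(frozen=True)
class Surrogate:
    provider_info: ProviderInfo
    datastreams: Tuple[Datastream, ...] = ()
    # Lineage: the providerInfo of each predecessor Surrogate.
    lineage: Tuple[ProviderInfo, ...] = ()


# Example: a Surrogate for a paper with one datastream and no predecessors.
paper = Surrogate(
    provider_info=ProviderInfo("repo:example-1", "id:1234", "v1"),
    datastreams=(
        Datastream("http://repo1.example.org/objects/1234/paper.pdf", "paper"),
    ),
)
```

Note that the Surrogate holds only locations, not the content itself, which matches the By-Reference approach described on the previous slide.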
17. Augmenting interoperability across Repositories
Obtain interface: a Repository interface that supports the request of services pertaining to individual Digital Objects (including their component Datastreams). The core service is the request of a Surrogate for a Digital Object.
Harvest interface: a Repository interface that exposes Surrogates for incremental collecting/harvesting.
Put interface: a Repository interface that supports submission of one or more Surrogates into the Repository, thereby facilitating the addition of Digital Objects to the collection of the Repository.
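As a rough illustration of the three interface definitions, the following toy in-memory Repository exposes obtain, harvest and put methods. Everything here is an assumption made for the sketch: the slides define the interfaces abstractly and do not prescribe method names, a logical timestamp, or a payload format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Surrogate:
    identifier: str  # identifier of the Digital Object, any scheme
    payload: str     # stand-in for the XML/RDF serialization
    stamp: int       # logical timestamp used for incremental harvesting


class InMemoryRepository:
    """Toy Repository with the three interfaces; not an actual Pathways API."""

    def __init__(self) -> None:
        self._surrogates: Dict[str, Surrogate] = {}
        self._clock = 0

    def put(self, identifier: str, payload: str) -> Surrogate:
        """Put: accept a submission, adding a Digital Object to the collection."""
        self._clock += 1
        surrogate = Surrogate(identifier, payload, self._clock)
        self._surrogates[identifier] = surrogate
        return surrogate

    def obtain(self, identifier: str) -> Surrogate:
        """Obtain: return the current Surrogate for one Digital Object."""
        return self._surrogates[identifier]

    def harvest(self, since: int = 0) -> List[Surrogate]:
        """Harvest: expose Surrogates stamped after `since`, oldest first."""
        changed = [s for s in self._surrogates.values() if s.stamp > since]
        return sorted(changed, key=lambda s: s.stamp)


repo = InMemoryRepository()
first = repo.put("id:a", "<surrogate-a/>")
repo.put("id:b", "<surrogate-b/>")
# An incremental harvest after the first submission only returns the second one.
newer = repo.harvest(since=first.stamp)
```

The `since` argument is what makes the harvest incremental: a collecting service remembers the last stamp it saw and asks only for later Surrogates.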
18. Surrogate is at the core of the value chain
[Figure: a value chain in which Digital Objects (id) are Obtained, recombined with added value, and Put into a Repository; the Surrogates are labeled with providerInfo and Lineage]
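One possible reading of this value chain, sketched in code: Surrogates are Obtained from a source Repository, recombined with added material, and Put into a target Repository, where the new Surrogate records the providerInfo of its sources as Lineage. The class names, the recombine_and_put helper, the fixed "v1" version key and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# (Repository identifier, preferredIdentifier, versionKey)
ProviderInfo = Tuple[str, str, str]


@dataclass(frozen=True)
class Surrogate:
    provider_info: ProviderInfo
    datastreams: Tuple[str, ...]            # by-reference locations
    lineage: Tuple[ProviderInfo, ...] = ()  # providerInfo of predecessors


class Repo:
    def __init__(self, repo_id: str) -> None:
        self.repo_id = repo_id
        self._store: Dict[str, Surrogate] = {}

    def obtain(self, identifier: str) -> Surrogate:
        return self._store[identifier]

    def put(self, identifier: str, datastreams: Iterable[str],
            lineage: Iterable[ProviderInfo] = ()) -> Surrogate:
        # Simplification: every stored Surrogate gets version key "v1".
        surrogate = Surrogate((self.repo_id, identifier, "v1"),
                              tuple(datastreams), tuple(lineage))
        self._store[identifier] = surrogate
        return surrogate


def recombine_and_put(target: Repo, new_id: str,
                      sources: Iterable[Surrogate],
                      added: Iterable[str]) -> Surrogate:
    """Combine source datastreams with added ones and record the lineage."""
    sources = list(sources)
    streams = [loc for s in sources for loc in s.datastreams] + list(added)
    lineage = [s.provider_info for s in sources]
    return target.put(new_id, streams, lineage)


repo1, repo2 = Repo("repo1"), Repo("repo2")
repo1.put("id:paper", ["http://repo1.example.org/paper.pdf"])
repo1.put("id:data", ["http://repo1.example.org/data.csv"])
overlay = recombine_and_put(
    repo2, "id:overlay",
    [repo1.obtain("id:paper"), repo1.obtain("id:data")],
    ["http://repo2.example.org/review.html"],
)
```

Because the Lineage entries are themselves providerInfo triples, each one can later be used to Obtain a fresh Surrogate of the corresponding source.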
19. Basis for a Network of Linked Digital Objects
20. [Figure: Repo1 and Repo2 each expose Obtain, Harvest and Put interfaces (Obtain1, Harvest1, Put1 and Obtain2, Harvest2, Put2), which are used by a service]
21. [Figure: Repo1 and Repo2 with their Obtain, Harvest and Put interfaces, a Service, and a Registry; providerInfo is the triple (provider, preferredIdentifier, versionKey)]
Registry:
provider | Obtain  | Harvest  | Put
Repo1    | Obtain1 | Harvest1 | Put1
Repo2    | Obtain2 | Harvest2 | Put2
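The registry table suggests a simple lookup: given the provider part of a providerInfo triple, a Service finds the interface locations of the corresponding Repository. The sketch below assumes that reading; the Interfaces type, the REGISTRY mapping, the obtain_endpoint helper and the example.org URLs are placeholders, not part of the original slides.

```python
from typing import Dict, NamedTuple, Tuple

# (provider, preferredIdentifier, versionKey)
ProviderInfo = Tuple[str, str, str]


class Interfaces(NamedTuple):
    obtain: str
    harvest: str
    put: str


# Mirrors the registry table on the slide; the URLs are placeholders.
REGISTRY: Dict[str, Interfaces] = {
    "Repo1": Interfaces(
        obtain="http://repo1.example.org/obtain",
        harvest="http://repo1.example.org/harvest",
        put="http://repo1.example.org/put",
    ),
    "Repo2": Interfaces(
        obtain="http://repo2.example.org/obtain",
        harvest="http://repo2.example.org/harvest",
        put="http://repo2.example.org/put",
    ),
}


def obtain_endpoint(provider_info: ProviderInfo) -> str:
    """Return the Obtain interface location for the provider in providerInfo."""
    provider, _preferred_identifier, _version_key = provider_info
    return REGISTRY[provider].obtain


endpoint = obtain_endpoint(("Repo2", "id:42", "v3"))
```

Keeping interface locations in a registry means a Surrogate only needs to carry the provider identifier, so a Repository can change where its interfaces live without invalidating Surrogates already in circulation.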