1. Robin Rice
EDINA and Data Library, University of Edinburgh
Repository Fringe 2013: 1-2 August, Edinburgh
2.
3. The data repository and University RDM policy
“9. Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.”
4.
The RDM Steering Group sees Edinburgh DataShare as one of the key RDM services offered by Information Services, and has therefore challenged us to meet the requirements of a number of pilot submissions from a range of research communities with particular types of data.
5.
*Single item deposit, the dataset behind an article.
*Desire to get students to deposit their data from theses as the norm; need an unambiguous deposit workflow.
*Fieldwork in the NHS means much data is ‘sensitive’. Permanent embargoes?
Dr. Nuno Feirrera, Teaching Fellow
6.
Dr. Bert Remijsen, Chancellor’s Fellow
Village of Fafanlap, Indonesia, on Bert’s home page
*Dinka Songs of South Sudan collection, 62 items.
*Used DSpace collection template for metadata; files uploaded by assisted deposit.
*Also deposited in Max Planck specialist language repository.
*Annotations in specialist format, requiring software from Max Planck to read.
*User happy with download statistics, referring colleagues.
7.
*The Listening Talker collection identified for deposit, ongoing.
*Very large video files plus software as VM image. Tar gzipped files containing millions of files. Several GB in size.
*Desires user registration, non-standard licences and checksums with downloads.
Prof. Simon King
8.
*Lots of 'omics data: not as many subject repositories to hold these as thought; storage cost concerns.
*Interested in push-pull of metadata to websites, from CRIS
*Spearheaded by Data Manager
Dolly the Sheep
9.
*Fish4Knowledge EU-funded research project
*Long-term sustainability issues for observational data
*Search engine maintained on their website, using METS feed to locate items
*Testing SWORD implementation, 5% sample >10K files, video + SQL rows (3 TB)
*Efficiency & performance
Prof. Bob Fisher
10.
*New member of University
*Digital asset management needs
*Nature of research data in the arts
*Streaming & display requirements (high quality desired)
11.
*Usability & user education
*Encouraging user to document and future-proof
*Relationship of IRs and subject repositories
*Closed collections, length of embargoes, user registration in an open access service
*Enhancing repository functionality while developing new systems (storage, data asset registry)
*Repository as golden copy/format
*Preservation procedures and SIPs, AIPs, and DIPs
Editor's Notes
Edinburgh DataShare is a free-at-point-of-use data repository service which allows University researchers to upload, share, and license their data resources for online discovery and re-use by others. It was set up in 2008 as an exemplar of institutional data repositories, using DSpace, with partners at Oxford and Southampton working with Fedora and EPrints.
The University RDM Policy has implications for the provision of the data repository service.
Edinburgh DataShare is a key part of the Data Stewardship component of the University RDM Roadmap. Legacy datasets can pose challenges for deposit and are not considered important for policy purposes.
This pilot has challenged us on a number of usability issues for deposit: easing the burden of making decisions, making our instructions and hints as clear as possible, and making it easy to skip fields that are not relevant. We provided a user guide with screenshots and a checklist for deposit.
This pilot user had an audio archive that was well-curated and ready to be made open. The collection already had a ‘home’ in a trusted disciplinary repository, though ours was made public first. The user was happy to give the collection greater visibility, as long as he didn’t have to upload files one by one.
The user is already delivering files to specialist peers via a website. Legacy datasets have existing licences embedded in headers, customised by University lawyers ten years ago. We are grappling with the desire for user registration in an open repository.
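The same pilot user asked for checksums to accompany downloads. As a minimal sketch only, assuming SHA-256 is an acceptable algorithm (the slides do not name one), a digest for a multi-gigabyte file can be computed in fixed-size chunks so the file never has to fit in memory. The function name `file_checksum` and the chunk size are illustrative assumptions, not part of DataShare or DSpace.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A downloader would run the same function on the retrieved file and compare the result with the digest published alongside it.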
Some research is considered ‘sensitive’ because of the use of animals: the researchers do not want to attract unwanted attention. Many large datasets are saved in various places without archiving. Can, or should, the repository serve as their storage solution for large and exponentially growing datasets, so long as they make the data open, or should some appraisal step be introduced? The institute is wondering if they should be serving their own data for a price.
The research centre in Taiwan which serves the data during the life of the project may not feel obliged to make the data available long-term. The PI has offered to deposit a 5% sample of the data only. Could this be a good example of an external website, maintained by others, providing the search mechanism to retrieve objects within the repository? Do we need to alter some aspects of repository behaviour to accommodate this collection, such as the balance of search results across the repository, pagination of item listings, etc.?
This community is considering Edinburgh DataShare as one of several options for solving a range of problems to do with its research data.