The document discusses the British Library's role in managing and providing access to research data. It notes that the Library archives over 300TB of digital data, including datasets, and is working to improve discovery and citation of datasets. This includes testing a dataset discovery service, establishing selection criteria for datasets, and implementing DOIs for datasets in partnership with DataCite to help researchers find and cite data more easily. The goal is for datasets to be treated similarly to research articles and better integrated into the scholarly record.
3. And we are embracing digital
National library of the UK
Here for everyone who wants to do
research
Archiving since 1662
Legal deposit incl. non-print
publications (from April 2013)
Print occupies > 600km shelving
300TB of data in the Digital Library
Provide access to 45k eJournals &
newspapers, eBooks, datasets &
800 bibliographic databases
2M sound recordings, 4M maps, 5M
reports, theses, conference papers,
the world’s largest patents collection
(c.50M) & 8M stamps
www.bl.uk
4. Catering for contemporary science
Managing collections
Delivering new content
Developing services
Research
Engaging and inspiring
Science team
Collaborations &
Partnerships
6. The value of research data
• Data are a vital part of the scientific record
• But what is/should be/will be the role of
libraries in this changing landscape?
• Data as a format is very different
from traditional library content, so are
libraries equipped with the
knowledge, technology and capacity
to deal with it?
• How should libraries prepare for this?
We examined the landscape of data and assessed the services that
the British Library might offer
7. Testing dataset discovery
A service involving a ‘new’ material type
raised questions about:
• Users
• Selection
• Metadata
• Operational sustainability
Image: SDASM Archives. Public Domain via Flickr
Preliminary work:
• Studies conducted on our behalf
• Literature review of user behaviour
• Internal scoping to define suitable
processes and systems
Led to a pilot service, using existing
systems
9. Datasets discovery in Explore the British Library
>500 research datasets
Environmental Science
Tropical & Rare Diseases
10. Results
[Charts: "Metadata for SEARCH"; "% conversion from dataset view to click through"]
• A wide variety of approaches were used to search
• Usage statistics suggest the service was used to find research data
11. The benefits of citing data
• Checking facts
• Obtaining easier access to data
• Enabling re-use of data
• Providing acknowledgement to a
wider group – the data
centre, curators etc.
• Supporting openness and
transparency
Reich NG, Perl TM, Cummings DAT, Lessler J (2011) Visualizing Clinical Evidence:
Citation Networks for the Incubation Periods of Respiratory Viral Infections. PLoS ONE
6(4): e19496. doi:10.1371/journal.pone.0019496
12. Why finding and citing data is not easy
• No widely used method to identify datasets
• No widely used method to cite datasets
• No effective way to link between articles and datasets
• How can we solve these challenges?
13. Why DOIs?
The Digital Object Identifier is a persistent identifier that directs
users to an online object, even if it changes location.
Why DOIs?
• Most widely used identifier for research articles
• Researchers and publishers already know how to use them
• Puts datasets on the same playing field as articles
• The DOI system offers an easy way to connect the article
with the underlying data
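As a minimal sketch of what "directs users to an online object" means in practice: a DOI name has a prefix (starting with "10.") and a suffix separated by a slash, and appending the name to the public resolver address https://doi.org/ gives a link that follows the object if it moves. The function names below are illustrative assumptions, not part of any DataCite or British Library tooling; the example DOI is the PLoS ONE article cited on slide 11.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"  # public DOI resolver


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the "doi:" label often used in citations
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a resolver link; unusual suffix characters are percent-encoded."""
    prefix, suffix = split_doi(doi)
    return RESOLVER + prefix + "/" + quote(suffix, safe="/:;()")


print(split_doi("doi:10.1371/journal.pone.0019496"))
print(resolver_url("doi:10.1371/journal.pone.0019496"))
```

The same link form works whether the DOI identifies an article or a dataset, which is the "same playing field" point above.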
14. DataCite
• Established in 2009 as a not-for-profit
organisation
• A member of the International DOI Foundation
• A Registration Agency for DOI names
• 18 full members from Europe, North
America, Asia and Australia (2m DOIs)
• Members work with data centres in their own
countries
• Provide a shared infrastructure for minting
DOIs
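Once a dataset has a DOI, it can be cited much like an article. The sketch below assumes the simple citation pattern "Creator (PublicationYear): Title. Publisher. Identifier", which DataCite has recommended in its metadata documentation; check the current guidance before relying on it. The function name and every value in the example record are hypothetical.

```python
def cite_dataset(creators, year, title, publisher, doi):
    """Format a dataset citation as Creator (Year): Title. Publisher. Identifier."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical record, for illustration only (not a real dataset or DOI).
print(
    cite_dataset(
        creators=["Example, A.", "Sample, B."],
        year=2012,
        title="Example river monitoring data",
        publisher="Example Data Centre",
        doi="10.1234/example-dataset",
    )
)
```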
15. British Library's role in DataCite
• The British Library is one of 18
international members of DataCite
• We are an allocating agent
• We provide the DataCite
infrastructure, enabling UK Data
Centres to ‘mint’ DOIs for data
• While the aim is to support
researchers, we do not work with
individuals: they must deposit to a
data centre/institution
[Diagram labels: International DOI Foundation; Member: DataCite; Member Institution; Data Centre, Data Centre, Data Client]
The blood letting zodiac man is taken from one of the Library’s 15th century Harley Manuscripts. They were donated to the nation in 1753 and form one of the foundation collections of the BL. Not going to spend 20 minutes giving you a discourse on the representation of medicine in the medieval period, although it is interesting to reflect on what some of this tells us about information: when you are complaining about writing up your work, at least you don’t have to write it by hand and draw the pictures. Will it still be around in 700 years? Will people laugh at it?
PhD focus groups: People Science & Policy. We built evidence based on our own user research, research from the literature, and internal consultation. But theoretical evidence that discovery was the route for the Library to take needed to be backed up with something more concrete: a pilot. A pilot would allow us to test the proposition with users and get concrete evidence for the Library’s work in this area, and would also give us something to show colleagues within the Library.
Some of this information was relevant to metadata, hence the need to have something in place to start selection properly. We couldn’t go out and select every STM dataset at once, so for the pilot we chose a specific subject area: Living with Environmental Change, that is, data from monitoring or modeling the environment. This has now also expanded to Biodiversity (for the International Year of Biodiversity in 2010), and soon there will also be records available on Neglected Tropical Diseases. In the next year we will be expanding to Food Security, spanning environmental and bioscience topics from crop genetics and animal breeding to soil quality and pollination. Guidelines were otherwise very much based on existing STM selection criteria.
And this is what it looks like! ‘Research datasets’ material type, accessible via the ‘I want this’ tab, direct link.
Darker blues are very useful or just useful. Lighter blues are not useful. The survey confirmed that our approach was suitable, but was the service actually being used? Initial effort in promoting the service to get feedback was high, but towards the end of the first year, when little or no time was put into promotion, the stats showed that usage remained stable, even given the ‘pilot’ status of the Search Our Catalogue interface itself. These data exclude staff IP ranges, so they cover reading room and external visitors only. And it wasn't just curiosity: this graph shows that people were clicking through to the website containing the data.
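The conversion figure described in these notes (dataset views that lead to a click-through, with staff IP ranges excluded) could be computed along the following lines. This is only a sketch under stated assumptions: the staff network range, the event format and the sample events are all hypothetical, and the talk does not describe how the Library's own statistics were produced.

```python
import ipaddress

# Hypothetical staff network range; the real ranges are not given in the talk.
STAFF_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_staff(ip: str) -> bool:
    """Return True if the address falls inside any staff network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in STAFF_NETWORKS)


def click_through_rate(events) -> float:
    """Percentage of non-staff dataset views followed by a click-through.

    events: iterable of (ip, action) pairs, action being "view" or "click".
    """
    views = clicks = 0
    for ip, action in events:
        if is_staff(ip):
            continue  # exclude staff traffic, as in the notes above
        if action == "view":
            views += 1
        elif action == "click":
            clicks += 1
    return 100.0 * clicks / views if views else 0.0


# Hypothetical sample events (documentation-only IP addresses).
events = [
    ("10.1.2.3", "view"),      # staff, ignored
    ("192.0.2.10", "view"),
    ("192.0.2.10", "click"),
    ("198.51.100.7", "view"),
]
print(click_through_rate(events))
```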
And just referring to other published articles where readers can’t check the facts for themselves can be problematic. A recent study (shown on the slide) demonstrated that ‘conventional wisdom’ is often not based on experimental data. The study looked at reported incubation times for various viral infections and found that half of the studies did not even provide a source for their estimate. Mapping the citation networks enabled the authors to show that the information about incubation times was often based on a small fraction of the data or on no empirical evidence. But there would be benefits if researchers could actually cite the dataset itself. This would enable people to check the facts for themselves and to obtain easier access to the data (theoretically researchers should share any data that underpins a published article, but in reality they often don’t). Funding organisations could show better value for money if the data generated could be re-used. Many people and data centres who actually manage data do not receive credit for doing so, and citation could offer a form of acknowledgement. In addition, the process of science is aided by openness and transparency.
- This is in the same way that researchers don’t directly get DOIs for their papers: they must go via a publisher to get a DOI for them. When we say ‘data centre’ this is for brevity; we include trusted digital/institutional repositories in this! We work at an organisational level.
You will see that this DOI appears quite long – data centres are free to determine the format for the DOI suffix (see slide #24).