Presentation given at the European Research Council workshop on research data management and sharing in Brussels on 18th-19th September 2014. The presentation covers the benefits and drivers for RDM, points to relevant tools and resources and closes with some open questions for discussion.
Managing and sharing data
1. Managing and sharing data
Sarah Jones
DCC, University of Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
ERC Workshop on Research Data Management and Sharing
18-19 September 2014, Brussels
2. European Research Council policy
Commitment to open science from the start:
"it is the firm intention of the ERC Scientific Council to issue
specific guidelines for the mandatory deposit in open access
repositories of research results – that is, publications, data
and primary materials – obtained thanks to ERC grants, as
soon as pertinent repositories become operational."
Statement on Open Access, December 2006
Image CC BY-SA 3.0 by Greg Emmerich
www.flickr.com/photos/gemmerich/6365692655
4. Sharing leads to breakthroughs
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It's not science
the way most of us have practiced in
our careers. But we all realised that
we would never get biomarkers
unless all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
... increases the speed of discovery
5. Returns for institutions
“If an institution spent A$10 million on data,
what would be the return? The answer is: more
publications; an increased citation count; more
grants; greater profile; and more collaboration.”
Dr Ross Wilkinson, ANDS
www.ariadne.ac.uk/issue72/oar-2013-rpt
6. Researchers get a citation boost
“Publicly available data was significantly
(p = 0.006) associated with a 69% increase in
citations, independently of journal impact
factor, date of publication, and author
country of origin using linear regression.”
Piwowar, H., Day, R. and Fridsma, D. (2007) Sharing detailed research data
is associated with increased citation rate. DOI: 10.1371/journal.pone.0000308
7. But, there are also barriers...
Who owns the data?
• Researchers?
• University?
• Commercial partners?
• Funders?
• …
People are often misinformed about
who owns the data. It is particularly
hard to determine in international
projects or ones with industry.
Restrictions on sharing
• Patentable data
• Commercial sensitivities
• Personal, identifiable data
• Lack of consent
• …
There are legitimate reasons to agree
embargo periods, impose conditions,
or to share only some of the data.
However, these are often given as
reasons not to share data at all.
www.dcc.ac.uk/sites/default/files/documents/events/workshops/IHW-2013/UKDA-barriers-to-data-sharing.pdf
8. And opportunity costs
By Emilio Bruna
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690
For his most recent paper:
1. Double checking the main dataset and reformatting to submit to Dryad: 5 hours
2. Creating complementary file and preparing metadata: 3 hours
3. Submission of these two files and the metadata to Dryad: 45 minutes
4. Preparing a map of the locations: 1 hour
5. Submission of map to Figshare: 15 minutes
6. Cleaning up and documenting the code, uploading it to GitHub: 25 hours
7. Cost of archiving in Dryad: US$90
8. Page charges: $600
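The items above add up to the "35 hours, 690" quoted in the blog post's URL. A minimal sketch of that tally follows; the dictionary keys are shortened labels invented here, minutes are converted to hours, and the page charges are assumed to be in US dollars like the Dryad fee.

```python
# Time items from the slide, in hours (45 min = 0.75 h, 15 min = 0.25 h).
hours = {
    "check and reformat main dataset": 5.0,
    "complementary file and metadata": 3.0,
    "submit files and metadata to Dryad": 0.75,
    "prepare map of locations": 1.0,
    "submit map to Figshare": 0.25,
    "clean, document and upload code": 25.0,
}

# Money items from the slide; page charges assumed to be US dollars.
costs_usd = {"Dryad archiving": 90, "page charges": 600}


def totals(hours_by_task, costs_by_item):
    """Return (total hours, total cost) for the listed items."""
    return sum(hours_by_task.values()), sum(costs_by_item.values())


total_hours, total_cost = totals(hours, costs_usd)
print(f"{total_hours:g} hours, US${total_cost}")  # 35 hours, US$690
```

Most of the time (25 of the 35 hours) went on cleaning up and documenting code, which is the cost that good habits formed early would reduce most.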
9. What needs to change?
Conclusions from Emilio Bruna:
• Develop a better system of incentives from the
community for archiving data and code
• Teach our students how to do this NOW - it’s much easier
if you develop good habits early
• Minimise the actual and opportunity costs
We need to stop telling people “You should” and get
better at telling people “Here’s how”
10. What is involved in data curation
• Data Management Planning
• Data creation
• Annotating / documenting data
• Analysis, use, versioning
• Storage and backup
• Publishing papers and data
• Preparing for deposit
• Archiving and sharing
• Licensing
• Citing…
Plan
Create
Document
Use
Share
Publish
11. Data Management Plans
Brief plans to determine how data will be created, managed and
shared. DMPs usually cover:
1. Description of data to be collected / created
2. Standards and methodologies for data collection & management
3. Any issues or restrictions due to ethics and Intellectual Property
4. Plans for data sharing and access
5. Strategy for long-term preservation
DMPs are often submitted as part of grant applications, but are
useful whenever you’re creating data.
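As an illustration only, the five elements listed above could be held in a small structure that flags which sections of a draft plan are still empty. This is a sketch, not how DMPonline or any funder template works; the class name, field names and example text are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container for the five common DMP elements."""
    data_description: str = ""
    standards_and_methods: str = ""
    ethics_and_ip: str = ""
    sharing_and_access: str = ""
    preservation_strategy: str = ""


def missing_sections(plan):
    """Return the names of sections that are empty or whitespace only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical draft with only two sections filled in.
draft = DataManagementPlan(
    data_description="Survey responses collected as CSV files",
    sharing_and_access="Deposit in a repository after publication",
)
print(missing_sections(draft))
# ['standards_and_methods', 'ethics_and_ip', 'preservation_strategy']
```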
12. Help with DMPs
A web-based tool to help researchers
write data management plans
https://dmponline.dcc.ac.uk
Framework for creating a DMP
A list of common elements explaining why they
are important and giving example answers
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf
Example plans
www.dcc.ac.uk/resources/data-management-plans/guidance-examples
13. Managing and sharing data: a best practice guide
http://data-archive.ac.uk/media/2894/managingsharing.pdf
14. Training materials
FOSTER project
• Open science training
• Courses across EU
• Portal to OA materials
• Guidance on Horizon 2020
www.fosteropenscience.eu
MANTRA
• Free online training course
• Aimed at PhD students
• Case studies, quizzes etc
• Data handling tutorials
– R
– SPSS
– ArcGIS
– NVivo
http://datalib.edina.ac.uk/mantra
15. DCC tools catalogue
A catalogue of RDM tools for different audiences.
Tools for researchers focus on data handling, managing
workflows, citation and impact.
www.dcc.ac.uk/resources/external/tools-services
16. Tools to help with RDM activities
Citation & impact
impactstory.org
www.datacite.org
Storage & collaboration
owncloud.org
thedata.org
Workflow management
www.taverna.org.uk
www.myexperiment.org
Documentation & metadata
www.labtrove.org
dataup.cdlib.org
17. Metadata standards catalogue
Use standards wherever possible for interoperability
www.dcc.ac.uk/resources/metadata-standards
19. Question 1: How do you foster open science?
• Make it feasible to comply
– provide tools and infrastructure
• Train people early in their careers
• Incentivise openness
• Listen to researchers and learn from their
experience about what doesn’t work
• Follow up on any demands made in policies
20. Question 2: Who is responsible for providing infrastructure and support?
Discipline
Funders
Institution
Third-party services
National provider
Data centres e.g. via NERC
Institutional support for discipline-specific tools e.g. Monash MeRC partnership on tools like OMERO
National brokerage of deals with third-party providers e.g. Jisc Janet deals with Arkivum
And what about co-ordination?
21. Question 3: Who should pay?
Funding Research Data Management
“A conversation with the funders”
The DCC held a special event on this topic in the UK, but there’s still a long way to go
www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf-special-event-funding-research-data-management
22. Thanks – any questions?
DCC guidance, tools and case studies:
www.dcc.ac.uk/resources
Follow us on twitter:
@digitalcuration and #ukdcc
Editor's Notes
Quite forward-thinking for such an early OA policy to be framed in terms of data and primary materials too, not just publications.
He was making a comparison with the Hubble telescope, on which A$1.5 billion is spent each year. The cost of the Hubble archive (A$1 million per annum) is just a fraction of this but, given the OA mandate, they’ve seen the research publications produced by Hubble discoveries double.
There have been lots of studies in this area since then that show a demonstrable citation boost, though not as high as 69%. This figure was for microarray data from cancer trials, and it seems that the early datasets had a particularly strong impact and came from authors who were well-cited. A more realistic figure across the board is probably a 10-30% increase.