The state of global research data initiatives: observations from a life on the road
1. Data Management:
all you need to know
Sarah Jones
Digital Curation Centre
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
2. What is Research Data Management?
Image CC-BY-SA by Janneke Staaks www.flickr.com/photos/jannekestaaks/14411397343
3. What is Research Data Management?
Create
Document
Use
Store
Share
Preserve
“the active management and
appraisal of data over the
lifecycle of scholarly and
scientific interest”
Data management is part of
good research practice
4. What is involved in RDM?
• Data Management Planning
• Data creation
• Annotating / documenting data
• Analysis, use, versioning
• Storage and backup
• Publishing papers and data
• Preparing for deposit
• Archiving and sharing
• Licensing
• Citing…
5. What is a data management plan?
A brief plan written at the start of a project to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications,
but are useful whenever researchers are creating data.
6. Typical coverage of a DMP
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
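As a rough illustration, the five coverage areas above can be treated as a checklist for a draft plan. The sketch below is hypothetical: the section keys and the `missing_sections` helper are invented for this example and do not correspond to any funder template or to DMPonline.

```python
# Hypothetical checklist for the five typical DMP sections listed above.
# The keys are illustrative names, not a funder or DMPonline schema.
DMP_SECTIONS = [
    "data_description",
    "standards_and_methodologies",
    "ethics_and_ip",
    "sharing_and_access",
    "long_term_preservation",
]


def missing_sections(plan):
    """Return the sections that are absent or left empty in a draft plan."""
    return [name for name in DMP_SECTIONS if not plan.get(name, "").strip()]


# A made-up draft with two sections filled in and one left blank.
draft = {
    "data_description": "Interview transcripts (plain text), about 2 GB",
    "ethics_and_ip": "Consent forms allow sharing of anonymised data",
    "sharing_and_access": "",
}

print(missing_sections(draft))
```

Running a check like this early in a project is one way to spot which parts of the plan still need a decision.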
8. Sharing data increases citations!
There are benefits for you. Want evidence?
• Piwowar, Vision – 9% (microarray data)
• Drachen, Dorch, et al. – 25-40% (astronomy)
• Gleditsch, et al. – doubling to trebling (international relations)
Open Data Citation Advantage
http://sparceurope.org/open-data-citation-advantage
9. Why manage research data?
Image Azgan Mjeshtri https://unsplash.com/photos/KgxawsqiAJs
10. To avoid problems
• Data duplication
• Data loss and security breaches
• Versioning issues
• Inability to reuse data
Save time and effort to make your life easier!
11. To keep your options open
Decisions you make early on will affect what you can do later:
• Choice of file formats
• Consent forms
• Licence and consortium agreements
Avoid having to renegotiate consent or being prevented from
reusing data by keeping options open
13. DMPs can be helpful
“It helped us reflect on potential issues and
decide how to address these as a project.”
“I find it very useful since, although I have an idea
of what data I will collect in my project, this makes
me reflect on the best format to present them,
where to make them available, etc.”
OpenAIRE & FAIR data Expert Group DMP survey. Report, dataset & infographic at:
https://doi.org/10.5281/zenodo.1120245
15. How to manage research data?
Image Guille Alvarez https://unsplash.com/photos/P11Z-nILhCs
16. Follow RDM basics
• Use common data formats
• Use metadata standards and controlled vocabularies
• Document your processes
• Version your data – and code
• Store securely
• Back-up automatically
• Deposit in repositories
• Get a Persistent Identifier
• License your data
17. Choose where to store and back up
• Your own device (laptop, flash drive, server etc.)
– And if you lose it? Or it breaks?
• Departmental drives or university servers with
automatic backup
• “Cloud” storage
– Do they care as much about your data as you do?
The decision will be based on how sensitive your data are,
how robust you need the storage to be, and
who needs access to the data and when
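Whichever storage option is chosen, it helps to confirm that a backup copy is intact. One common technique, offered here as a minimal sketch rather than anything the slides prescribe, is to compare checksums of the original and the copy. The function names and the demo file names below are assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True if the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)


# Demo with throwaway files; real paths would point at your data and its backup.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "survey.csv"
    original.write_text("id,score\n1,42\n")
    backup = Path(tmp) / "survey_backup.csv"
    shutil.copy(original, backup)
    print(backup_matches(original, backup))
```

A check like this only shows that two copies agree; it does not replace automatic, regularly tested backups.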
20. Make data understandable
Metadata
• Standardised
• Structured
• Machine and human readable
Metadata helps to cite & disambiguate data
Documentation aids reuse
21. Metadata standards
These can be general – such as Dublin Core
Or discipline specific
– Data Documentation Initiative (DDI) – social science
– Ecological Metadata Language (EML) - ecology
– Flexible Image Transport System (FITS) – astronomy
Search for standards in catalogues like:
http://rd-alliance.github.io/metadata-directory
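To make the idea of standardised, structured, machine-readable metadata concrete, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The wrapper element `metadata`, the `dublin_core_record` helper, and all field values (including the placeholder identifier) are assumptions for illustration; a real repository will define its own record format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialise a dict of Dublin Core element names and values to XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; none of these describe a real dataset.
record = dublin_core_record({
    "title": "Example survey dataset",
    "creator": "Doe, Jane",
    "date": "2018",
    "identifier": "doi:10.1234/example",
    "rights": "CC-BY 4.0",
})
print(record)
```

The same dictionary could be mapped to a discipline-specific standard instead; the point is that named, structured fields can be read by both people and software.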
22. Documentation
Think about what is needed in order to evaluate,
understand, and reuse the data.
• Why was the data created?
• Have you documented what you did and how?
• Did you develop code to run analyses? If so, this
should be kept and shared too.
• Important to provide wider context for trust
23. ReadMe files
We recommend that a ReadMe be a plain text file containing the following:
• for each filename, a short description of what data it includes,
optionally describing the relationship to the tables, figures, or sections
within the accompanying publication
• for tabular data: definitions of column headings and row labels; data
codes (including missing data); and measurement units
• any data processing steps, especially if not described in the publication,
that may affect interpretation of results
• a description of what associated datasets are stored elsewhere, if
applicable
• whom to contact with questions
http://datadryad.org/pages/readme
Example template: https://www.lib.umn.edu/datamanagement/metadata
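The recommended ReadMe contents above can be sketched as a small generator that assembles a plain text file from structured inputs. This is an illustrative assumption, not the Dryad or University of Minnesota template: the section headings, the `build_readme` function, and the sample values are all made up.

```python
def build_readme(files, columns, processing_steps, related_datasets, contact):
    """Assemble a plain-text ReadMe covering the recommended items."""
    lines = ["FILES"]
    for filename, description in files.items():
        lines.append(f"- {filename}: {description}")

    lines += ["", "COLUMN DEFINITIONS"]
    for name, info in columns.items():
        lines.append(f"- {name}: {info['definition']} (unit: {info['unit']})")

    lines += ["", "PROCESSING STEPS"]
    lines += [f"{i}. {step}" for i, step in enumerate(processing_steps, start=1)]

    lines += ["", "RELATED DATASETS"]
    lines += [f"- {item}" for item in related_datasets] or ["- none"]

    lines += ["", "CONTACT", contact]
    return "\n".join(lines) + "\n"


# Sample values are placeholders, including the contact address.
readme = build_readme(
    files={"survey.csv": "Raw survey responses underlying Table 2"},
    columns={
        "score": {"definition": "Wellbeing score; -99 = missing", "unit": "points"}
    },
    processing_steps=["Removed duplicate responses"],
    related_datasets=[],
    contact="data-owner@example.org",
)
print(readme)
```

Writing the result to a file named ReadMe.txt alongside the data keeps the description and the dataset together when they are deposited.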
26. Use available DMP tools
DMPonline offers:
• Example plans
• Tailored guidance
• Plan sharing & visibility controls
• Institutional feedback and DMP review
• Export to multiple formats
• Online helpdesk
27. How does DMPonline work?
DMPonline pulls together requirements and guidance, tailored to your context:
• Requirements from funders, institutions and others
• Guidance and examples from funders, universities, research disciplines and others
These feed into the DMP, which you can then create, share, review, export, update…
29. How to share your data?
Image CC-BY-NC-ND by talkingplant www.flickr.com/photos/talkingplant/2256485110
30. Steps to make data open?
1. Choose your dataset(s)
– What can you make open? You may need to revisit this step if you
encounter problems later.
2. Apply an open licence
– Determine what IP exists. Apply a suitable licence e.g. CC-BY
3. Make the data available
– Provide the data in a suitable format. Use repositories.
4. Make it discoverable
– Post on the web, register in catalogues, ensure you cite…
https://okfn.org
31. License research data openly
DCC how-to guide: www.dcc.ac.uk/resources/how-guides/license-research-data
32. Deposit in a data repository
http://databib.org
www.re3data.org
The Re3data catalogue can be searched to find a home for data
www.fosteropenscience.eu/content/re3data-demo
33. National / domain repositories
FAIRsharing portal of
databases in life sciences
and earth sciences
www.re3data.org
https://fairsharing.org
34. Zenodo is a multi-disciplinary repository that can be
used for the long-tail of research data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software
• Assigns a Digital Object Identifier (DOI)
• Links funding, publications, data & software
www.zenodo.org
35. Archiving code in Zenodo
Get a DOI for each release
https://guides.github.com/activities/citable-code
37. How to cite data
www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
Key citation elements
• Author
• Publication date
• Title
• Location (= identifier)
• Funder (if applicable)
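The key citation elements above can be combined into a single citation string. The sketch below shows one possible ordering and punctuation; it is an assumption for illustration and not the format recommended by the DCC or by any particular style guide. The `format_citation` helper and the sample values are hypothetical.

```python
def format_citation(author, year, title, identifier, funder=None):
    """Join the key citation elements into one line (illustrative style only)."""
    citation = f"{author} ({year}). {title}. {identifier}"
    if funder:
        citation += f". Funded by {funder}"
    return citation + "."


# Placeholder values; the identifier is not a real DOI.
print(format_citation(
    author="Doe, J.",
    year=2018,
    title="Example survey dataset",
    identifier="https://doi.org/10.1234/example",
))
```

Repositories that assign a persistent identifier usually display a ready-made citation, so a helper like this is mainly useful for checking that none of the elements has been left out.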
38. How do you share data effectively?
• Use appropriate repositories; this catalogue is a good place to start:
http://www.re3data.org
• Document and describe it enough for others to understand, use and cite:
http://www.dcc.ac.uk/resources/how-guides/cite-datasets
• License it so others can reuse:
www.dcc.ac.uk/resources/how-guides/license-research-data