| www.eudat.eu | 2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016| www.eudat.eu |
1. How to write a
Data Management Plan
Sarah Jones (DCC)
Marjan Grootveld (DANS)
both involved in EUDAT and OpenAIRE
This work is licensed under the Creative
Commons CC-BY 4.0 licence
2. Open Access Infrastructure
for Research in Europe
www.openaire.eu
Who we are
Research Data Services, Expertise &
Technology https://www.eudat.eu
3. Joint webinar held on 26 May 2016 covering:
• Reasons to manage data
• Horizon 2020 Open Research Data Pilot
• How to manage and share data
• EUDAT & OpenAIRE services
Slides, webinar recording and Q&A document online
www.openaire.eu/research-data-management-an-introductory-webinar-from-openaire-and-eudat
Introduction to RDM
4. • What is a DMP and why write one?
• Requirements under Horizon 2020
• Example plans
• Lessons and guidance
Overview
5. WHAT IS A DMP & WHY WRITE ONE?
Image CC-BY-NC-SA by Leo Reynolds www.flickr.com/photos/lwr/13442910354
6. A DMP is a brief plan to define:
• how the data will be created
• how it will be documented
• who will be able to access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but
are useful whenever researchers are creating data.
Data Management Plans
7. Why manage data?
NON PECUNIAE INVESTIGATIONIS CURATORE
SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS
(Not for the research funder, but for life we make data management plans)
• Make your research easier
• Stop yourself drowning in irrelevant stuff
• Save data for later
• Avoid accusations of fraud or bad science
• Write a data paper
• Share your data for re-use
• Get credit for it
8. Research data lifecycle
CREATING DATA → PROCESSING DATA → ANALYSING DATA → PRESERVING DATA → GIVING ACCESS TO DATA → RE-USING DATA
• CREATING DATA: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata
• PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data
• ANALYSING DATA: interpreting & deriving data, producing outputs, authoring publications, preparing for sharing
• PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation
• ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
• RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching & learning
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
9. What data organisation would a re-user like?
Planning trick 1: think backwards
[Figure: research data lifecycle: creating data, processing data, preserving data, giving access to data, re-using data]
10. Data organisation exercises
Design a data organisation for
the project (folder structure,
file naming convention, …)
Research Data Netherlands data support training:
http://datasupport.researchdata.nl/en/start-de-cursus/iii-onderzoeksfase/organising-data/
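A data organisation agreed in such an exercise can be written down in a form that every partner can run. The sketch below is only an illustration: the folder names, the naming pattern and the helper names `create_structure` and `build_filename` are assumptions made for this example, not something prescribed by the training material or by Horizon 2020.

```python
from datetime import date
from pathlib import Path
import re
import tempfile

# Hypothetical top-level folders; replace with what the project agrees on.
FOLDERS = ["data/raw", "data/processed", "documentation", "metadata", "software"]


def create_structure(root: Path) -> list[Path]:
    """Create the agreed folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name of the form project_description_YYYYMMDD_vNN.ext."""
    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{day:%Y%m%d}_v{version:02d}.{ext.lstrip('.')}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_structure(Path(tmp)):
            print(folder.relative_to(tmp))
    print(build_filename("MyProject", "Interview notes", date(2016, 7, 14), 2, ".csv"))
```

Putting the date and a zero-padded version number in the name keeps files sortable; whether that suits a given project is exactly the kind of point to discuss with colleagues.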
14. A DMP is about ‘keeping’ data
• Storing data ≠ archiving data
• Archived data ≠ findable data
• Findable ≠ accessible
• Accessible ≠ understandable
• Understandable ≠ usable
• A USB stick is not safe
• A persistent ID is essential but no guarantee for usability
• Data in a proprietary format is not sustainable
15. • Findable
– Assign persistent IDs, provide rich metadata, register in a
searchable resource,...
• Accessible
– Retrievable by their ID using a standard protocol, metadata remain
accessible even if data aren’t...
• Interoperable
– Use formal, broadly applicable languages, use standard
vocabularies, qualified references...
• Reusable
– Rich, accurate metadata, clear licences, provenance, use of
community standards...
www.force11.org/group/fairgroup/fairprinciples
Making data FAIR
16. How to deal with data and context?
• Versioning, back-up, storage and archiving
– During the project and in the long term
• Ethics, consent forms, legal access
• Security and technical access
• Usage licences
17. What should be preserved and shared?
• The data needed to validate results in scientific publications (minimally!).
• The associated metadata: the dataset’s creator, title, year of publication,
repository, identifier etc.
– Follow a metadata standard in your line of work, or a generic standard, e.g.
Dublin Core or DataCite, and be FAIR.
– The repository will assign a persistent ID to the dataset: important for
discovering and citing the data.
• Documentation: code books, lab journals, informed consent forms – domain-
dependent, and important for understanding the data and combining them with
other data sources.
• Software, hardware, tools, syntax queries, machine configurations – domain-
dependent, and important for using the data. (Alternative: information about the
software etc.)
Basically, everything that is needed to replicate a study should be available. Plus
everything that is potentially useful for others.
Research Data Alliance (RDA) http://rd-alliance.github.io/metadata-directory/standards/
FAIR Guiding Principles for scientific data management & stewardship http://www.nature.com/articles/sdata201618
How to select and appraise research data: www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
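The associated metadata listed on this slide (creator, title, year of publication, repository, identifier) can be checked for completeness before a deposit. The sketch below is a minimal illustration only: the field names, the sample values and the helper `missing_fields` are assumptions for this example and do not reproduce the Dublin Core or DataCite schemas, which define their own property names.

```python
# Fields taken from the slide; the key names themselves are hypothetical.
REQUIRED_FIELDS = ("creator", "title", "publication_year", "repository", "identifier")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Invented sample values; the identifier stays empty until the
    # repository has assigned a persistent ID.
    record = {
        "creator": "Example, Researcher",
        "title": "Interview transcripts, example project",
        "publication_year": 2016,
        "repository": "Example repository",
        "identifier": "",
    }
    print("Still missing:", missing_fields(record))
```

A check like this only tells you that a field is filled in, not that the content is rich or accurate enough to make the data FAIR.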
18. DMPS IN HORIZON 2020
Image “Open Data” CC BY 2.0 by http://www.descrier.co.uk
20. Common themes in DMPs
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property
(highlight restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
Start planning and communicating early
21. Horizon 2020: Open Research Data Pilot
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Open access to research data refers to the right to
access and re-use digital research data. Openly
accessible research data can typically be accessed,
mined, exploited, reproduced and disseminated free
of charge for the user.
• The use of a Data Management Plan (DMP) is
required for projects participating in the Open
Research Data Pilot, detailing what data the project
will generate, whether and how they will be exploited
or made accessible for verification and re-use, and
how they will be curated and preserved.
22. Who’s involved in this pilot?
Current situation:
• Researchers funded by Horizon 2020 within 9 specified call
areas - https://www.openaire.eu/opendatapilot
• Opt out and opt in are possible.
• A DMP per dataset
As of 2017:
• European Cloud Initiative to give Europe a global lead in
the data-driven economy.
• For new projects open data will become the default option.
The pilot will be extended to cover all call areas. Opting out
remains possible.
• http://europa.eu/rapid/press-release_IP-16-1408_en.htm
23. Open, unless…
• The EC’s goal is Open Access to research data: as
open as possible, as closed as necessary.
• Grant Agreement, Art. 29.3, Open Access to research
data:
• When applicable: explain in the DMP why you need to
(partially) opt out.
24. Timing the DMP
• Note that the Commission does NOT require applicants
to submit a DMP at the proposal stage (see next slide).
• A DMP is therefore NOT part of the evaluation.
• DMPs are a deliverable for those in the pilot (due by
month 6).
• Note that the Commission requires updates. A DMP is a
living or “active” document.
25. Proposal phase
Where relevant*, H2020 proposals can include a section on data management which is
evaluated under the criterion ‘Impact’.
• What types of data will the project generate/collect?
• What standards will be applied?
• How will this data be exploited &/or shared/made accessible for verification and
reuse?
• If data cannot be made available, why not?
• How will this data be curated and preserved?
Your data management policy should reflect the current state of consortium agreements
on RDM.
* For “Research and Innovation actions” and “Innovation Actions”
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
26. Initial DMP (at 6 months)
The DMP should address the points below on a dataset by dataset
basis:
• Dataset reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation (including storage and backup)
See Annex 1 at:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
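Because the initial DMP is organised dataset by dataset, it can help to keep one small record per dataset and see at a glance which points are still open. The sketch below assumes nothing beyond the five headings on this slide; the class `DatasetEntry`, the helpers `unanswered` and `render`, and the sample answers are hypothetical and are not part of the EC template or of DMPonline.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetEntry:
    """One dataset in the initial DMP, one attribute per heading on the slide."""
    reference_and_name: str
    description: str
    standards_and_metadata: str
    data_sharing: str
    archiving_and_preservation: str


# Attribute name -> heading as worded on the slide.
HEADINGS = {
    "reference_and_name": "Dataset reference and name",
    "description": "Data set description",
    "standards_and_metadata": "Standards and metadata",
    "data_sharing": "Data sharing",
    "archiving_and_preservation": "Archiving and preservation (including storage and backup)",
}


def unanswered(entry: DatasetEntry) -> list[str]:
    """Return the headings that have no answer yet."""
    return [HEADINGS[f.name] for f in fields(entry) if not getattr(entry, f.name).strip()]


def render(entry: DatasetEntry) -> str:
    """Render the entry as plain text, flagging open points."""
    lines = []
    for f in fields(entry):
        answer = getattr(entry, f.name).strip() or "TO BE COMPLETED"
        lines.append(f"{HEADINGS[f.name]}: {answer}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample answers for illustration only.
    entry = DatasetEntry(
        reference_and_name="DS01 Interview transcripts",
        description="Anonymised transcripts of project interviews",
        standards_and_metadata="Generic metadata standard, to be confirmed with the repository",
        data_sharing="",
        archiving_and_preservation="Deposit in a research data repository",
    )
    print(render(entry))
    print("Open points:", unanswered(entry))
```

Since the DMP is a living document, the same records can be updated and re-rendered for later versions of the deliverable.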
27. More elaborate DMP
Scientific research data should be easily:
1. Discoverable
Are the data discoverable and identifiable by a standard mechanism e.g. DOIs?
2. Accessible
Are the data accessible and under what conditions e.g. licenses, embargoes?
3. Assessable and intelligible
Are the data and software assessable and intelligible to third parties for peer-review? E.g. can
judgements be made about their reliability and the competence of those who created them?
4. Useable beyond the original purpose for which it was collected
Are the data properly curated and stored together with the minimum software and documentation to
be useful by third parties in the long-term?
5. Interoperable to specific quality standards
Are the data and software interoperable, allowing data exchange? E.g. were common formats and
standards for metadata used?
See Annex 2 at:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
28. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
Guidance from EUDAT and OpenAIRE being added
https://dmponline.dcc.ac.uk
29. How the tool works
Click to write a
generic DMP
Or choose your
funder to get their
specific template
Pick your uni to
add local
guidance and to
get their template
if no funder
applies
Choose any
additional
optional
guidance
31. OpenAIRE support
• Summary on the Open Research Data pilot
https://www.openaire.eu/opendatapilot
• Brief guide on developing a DMP
https://www.openaire.eu/opendatapilot-dmp
• Selecting a data repository
https://www.openaire.eu/opendatapilot-repository
• Developing guidance to add to DMPonline
• Will be adding an ‘export to Zenodo’ feature in early 2017 to
allow DMPs to be published and assigned a DOI
32. Deliver the DMP and keep it up to date
• EC: “Since DMPs are expected to mature during the
project, more developed versions of the plan can be
included as additional deliverables at later stages. (…)
New versions of the DMP should be created whenever
important changes to the project occur due to inclusion
of new data sets, changes in consortium policies or
external factors.”
Focus on how you
will ensure your data
are “FAIR”
33. Active DMPs
• Interested in ways to support this active quality, where
“active” is understood as “able to evolve and be
monitored”?
• Join the RDA’s Active Data Management Plans interest
group https://rd-alliance.org/groups/active-data-management-plans.html
• And see recordings, slides and notes of the international
and interdisciplinary ADMP Workshop 28-30 June 2016
https://indico.cern.ch/event/520120
34. Option: add SSI template for
software projects
Two templates available for Software Management Plans in
DMPonline courtesy of SSI
www.software.ac.uk/resources/guides/software-management-plans
36. Example plans
• 108 DMPs from the National Endowment for the Humanities
www.neh.gov/divisions/odh/grant-news/data-management-plans-successful-grant-applications-2011-2014-now-available
• 20+ scientific DMPs submitted to the NSF (USA) provided by UCSD
– http://libraries.ucsd.edu/services/data-curation/data-management/dmp-samples.html
• Example DMP collection from Leeds University
• https://library.leeds.ac.uk/research-data-tools
• Further examples:
• www.dcc.ac.uk/resources/data-management-plans/guidance-examples
37. Example: OpenMinTed
OpenMinTed aims to
create an infrastructure for
Text and Data Mining
(TDM) of scientific and
scholarly content
Have adopted their own
structure to create a ‘Data
and Software
Management Plan’
http://openminted.eu
38. Example: OpenMinTed –
Data chapter
Six high-level datasets identified:
1. Scholarly publications
2. Language and knowledge resources
3. Services and workflows
4. Automatically and manually generated annotations
5. Consortium publications
6. Metadata
Described in a table
per dataset
(see illustration)
40. Example: CAPSELLA
CAPSELLA aims to develop ICT
solutions for farmers and other
actors engaged in agrobiodiversity
Devised a questionnaire to collate
dataset information from project
partners
Identified 13 datasets, 6 of which
are imported as is, 3 aggregated, 3
transformed and 1 generated
www.capsella.eu
42. Data description examples
The final dataset will include self-reported demographic and
behavioural data from interviews with the subjects and laboratory data
from urine specimens provided.
From NIH data sharing statements
Every two days, we will subsample E. affinis populations growing under our
treatment conditions. We will use a microscope to identify the life stage
and sex of the subsampled individuals. We will document the information
first in a laboratory notebook and then copy the data into an Excel
spreadsheet. The Excel spreadsheet will be saved as a comma separated
value (.csv) file.
From DataOne – E. affinis DMP example
43. Metadata examples
Metadata will be tagged in XML using the Data Documentation Initiative (DDI)
format. The codebook will contain information on study design, sampling
methodology, fieldwork, variable-level detail, and all information necessary for
a secondary analyst to use the data accurately and effectively.
From ICPSR Framework for Creating a DMP
We will first document our metadata by taking careful notes in the laboratory notebook
that refer to specific data files and describe all columns, units, abbreviations, and
missing value identifiers. These notes will be transcribed into a .txt document that will
be stored with the data file. After all of the data are collected, we will then use EML
(Ecological Metadata Language) to digitize our metadata. EML is one of the accepted
formats used in ecology, and works well for the types of data we will be producing. We
will create these metadata using Morpho software, available through KNB. The
metadata will fully describe the data files and the context of the measurements.
From DataOne – E. affinis DMP example
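The second example describes a plain-text file, stored next to the data file, that documents all columns, units and missing value identifiers. A minimal sketch of producing such a file from a CSV header follows; the column names, the notes and the helper `describe_columns` are invented for illustration and are not taken from the DataONE plan.

```python
import csv
import io


def describe_columns(csv_text: str, notes: dict, missing_value: str = "NA") -> str:
    """Build a plain-text description with one line per column in the CSV header.

    notes maps a column name to a (unit, meaning) pair; columns without
    notes are flagged so that gaps in the documentation stay visible.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    lines = [f"Missing value identifier: {missing_value}", ""]
    for column in header:
        unit, meaning = notes.get(column, ("UNDOCUMENTED", "UNDOCUMENTED"))
        lines.append(f"{column}: {meaning} (unit: {unit})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data file content and notes.
    csv_text = "sample_date,life_stage,sex\n2016-07-14,adult,F\n"
    notes = {
        "sample_date": ("ISO 8601 date", "day the subsample was taken"),
        "life_stage": ("category", "life stage identified under the microscope"),
    }
    print(describe_columns(csv_text, notes))
```

The output can be saved as a .txt file alongside the data and later transcribed into a formal metadata format.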
44. Data sharing examples
We will make the data and associated documentation available to users under a data-
sharing agreement that provides for: (1) a commitment to using the data only for research
purposes and not to identify any individual participant; (2) a commitment to securing the data
using appropriate computer technology; and (3) a commitment to destroying or returning the
data after analyses are completed.
From NIH data sharing statements
The videos will be made available via the bristol.ac.uk website (both as streaming media
and downloads) HD and SD versions will be provided to accommodate those with lower
bandwidth. Videos will also be made available via Vimeo, a platform that is already well
used by research students at Bristol. Appropriate metadata will also be provided to the
existing Vimeo standard.
All video will also be available for download and re-editing by third parties. To facilitate this
Creative Commons licenses will be assigned to each item. In order to ensure this usage is
possible, the required permissions will be gathered from participants (using a suitable
release form) before recording commences.
From University of Bristol Kitchen Cosmology DMP
45. Examples of restrictions
Because the STDs being studied are reportable diseases, we will be collecting
identifying information. Even though the final dataset will be stripped of identifiers
prior to release for sharing, we believe that there remains the possibility of
deductive disclosure of subjects with unusual characteristics. Thus, we will make
the data and associated documentation available to users only under a data-
sharing agreement.
From NIH data sharing statements
1. Share data privately within 1 year.
Data will be held in Private Repository, but metadata will be public
2. Release data to public within 2 years.
Encouraged after one year to release data for public access.
3. Request, in writing, data privacy up to 4 years.
Extensions beyond 3 years will only be granted for compelling cases.
4. Consult with creators of private CZO datasets prior to use.
PIs are required to seek consent before using private data they can access
From Boulder Creek Critical Zone Observatory DMP
46. Archiving examples
The investigators will work with staff at the UKDA to determine what to
archive and how long the deposited data should be retained. Future long-
term use of the data will be ensured by placing a copy of the data into the
repository.
From ICPSR Framework for Creating a DMP
Data will be provided in file formats considered appropriate for long-term
access, as recommended by the UK Data Service. For example, SPSS Portable
format and tab-delimited text for qualitative tabular data and RTF and
PDF/A for interview transcripts. Appropriate documentation necessary to
understand the data will also be provided. Anonymised data will be held
for a minimum of 10 years following project completion, in compliance
with LSHTM’s Records Retention and Disposal Schedule. Biological samples
(output 3) will be deposited with the UK BioBank for future use.
From Writing a Wellcome Trust Data Management and Sharing Plan
47. Share your example DMPs!
Send us links to your
DMPs
We will add them to
the DCC list
Aim to cover wide
range of disciplines
and funders
www.dcc.ac.uk/share-DMPs
48. LESSONS AND RESOURCES
Image ‘Energy Resources | Energie Quelle’ CC-BY-NC by K. H. Reichert www.flickr.com/photos/reupa/19502634575
49. Tips for writing DMPs
• Seek advice - consult and collaborate
• Consider good practice for your field
• Base plans on available skills & support
• Make sure implementation is feasible
• Think about things early…
50. Plan to share data from the outset
• Negotiation on licenses and consent agreement may
preclude later sharing if not careful
• Costings can’t be included retrospectively
• Useful to consider data issues at the consortium
negotiation stage to make sure potential issues are
identified and sorted asap
Decisions made early on affect what you can do later
51. Sharing data: what is meant?
With collaborators while
research is active
Data are mutable
(Open) data sharing
Data are stable,
searchable, citable, clearly
licensed
52. Storing data: what is meant?
Storing and backing up files
while research is active
Likely to be on a networked
filestore or hard drive
Easy to change or delete
Archiving or preserving
data in the long-term
Likely to be deposited in a
digital repository
Safeguarded and preserved
53. Archiving, repositories, ehm?
• Horizon 2020 ORD pilot participants are asked to “deposit
your data in a research data repository”: a digital archive
collecting and displaying datasets and their metadata.
• Select a data repository that will preserve your data,
metadata and possibly tools in the long term.
• It is advisable to contact the repository of your choice
when writing the first version of your DMP.
• Repositories may offer guidelines for sustainable data
formats and metadata standards, as well as support for
dealing with sensitive data and licensing.
54. Where to find a repository?
• More information: https://www.openaire.eu/opendatapilot-repository
• Zenodo: http://www.zenodo.org
• Re3data.org: http://www.re3data.org
56. How to select a repository?
• Certification as a ‘Trustworthy Digital Repository’ with an explicit
ambition to keep the data available in the long term.
• Matches your particular data needs: e.g. formats accepted; mixture of
Open and Restricted Access.
• Provides guidance on how to cite the deposited data.
• Gives your submitted dataset a persistent and globally unique identifier
for sustainable citations and to link back to particular researchers and
grants.
www.openaire.eu/opendatapilot-repository
Data Seal of Approval
nestor seal
ISO 16363
57. Keep everything? For always?
• When regenerating data would be cheaper than archiving, don’t
archive. Select what data you’ll need and want to retain.
• 10 years is often stated in data policies and academic codes, but data
can be valuable for ages, in climatology, sociology, health sciences,
astronomy, linguistics, … Look beyond minimal retention periods
where relevant.
• Explain your selection criteria in the DMP.
DCC How-to guide: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
RDNL Selection criteria: http://www.researchdata.nl/en/services/data-management/selecting-research-data/
58. Licensing research data
• Horizon 2020 guidelines point to CC-BY or CC-0
• EUDAT licensing wizard helps you pick a licence for data & software
http://ufal.github.io/public-license-selector
• DCC How-to guide helps you to license data
www.dcc.ac.uk/resources/how-guides/license-research-data
59. Metadata standards
Metadata Standards Directory
• Broad, disciplinary listing of
standards and tools
• Maintained by RDA group
http://rd-alliance.github.io/metadata-directory
Biosharing
• A portal of data standards,
databases, and policies
• Focused on life, environmental
and biomedical sciences
https://biosharing.org
60. • How to develop a DMP
www.dcc.ac.uk/resources/how-guides/develop-data-plan
• RDM brochure and template
https://dans.knaw.nl/en/about/organisation-and-policy/information-material?set_language=en
• OpenAIRE guidelines
www.openaire.eu/opendatapilot-dmp
• ICPSR framework for a DMP
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
Guidelines on DMPs
61. • Guidelines on Data Management
in Horizon 2020
• Provides summary of
requirements
• Includes templates for DMPs
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
EC guidance
63. Key messages
• Data management is part of good research practice whether you plan
to make the data open or not – it benefits you!
• The processes of planning and reflecting are most important. Think
about the desired end result and plan for this.
• Approach the DMP in whatever way best fits your project
– adopt a different template to suit
– add sections / elements e.g. ethics, software
– decide whether to describe each dataset in detail
– focus effort on datasets you’ll create rather than reuse…
64. www.eudat.eu www.openaire.eu
Thanks – any questions?
Contact us:
Marjan Grootveld: marjan.grootveld@dans.knaw.nl
Sarah Jones: sarah.jones@glasgow.ac.uk
Acknowledgements:
Thanks to DANS and DCC for reuse of slides, and to the OpenMinTeD and
CAPSELLA projects for sharing their Data Management Plans
65. www.eudat.eu www.openaire.eu
Please let us know what you
thought of the webinar
https://eudat.eu/evaluation-form-for-the-webinar-how-to-write-a-data-management-plan
Editor's notes
OpenAIRE: H2020 project in our third project phase. Started with the aim of supporting & monitoring the EC’s OA mandate, enabling OA publications, but as the OA movement has evolved, we are now an infrastructure for Open Science generally: Publications, Data and Processes within science.
We run a network of national OA helpdesks and align OA policies across Europe.
We also have technologies to capture and interlink the research outputs of Europe.
EUDAT: also H2020, also in a follow-up phase.
The goal is to enable researchers and practitioners from any discipline to find, access, process, share and preserve data in the trustworthy environment of the Collaborative Data Infrastructure. There is a whole suite of data services for different stages of the research life cycle. Like OpenAIRE it is a pan-European network, with cooperating compute and data centers.
There is an overlap and synergy in RDM between both projects, also for Sarah and me personally as we both work in both projects, and this is why we’ve come together again today.
When you are interested in our projects’ services, check out the websites and the previous webinar, which was presented by Sarah and Tony Ross-Hellauer from OpenAIRE.
1. I will start by talking about the Why and What of DMPs, and also about the Who.
2. Then I’ll zoom in on Horizon 2020, the requirements that the EC imposes and the DMP templates that are available.
Now we all know that there are huge differences between disciplines, e.g. in the typical duration of projects, team size, the sensitivity of the data that you collect, the cultures of collaboration and data sharing, the use of software, tools and other machinery. A webinar like this is not the ideal place to really go into those differences.
3. But when Sarah takes over, with the example plans, she will give extracts from DMPs from different fields, which will make it more concrete.
4. She will also present some lessons learned in the past and point to useful resources for Planning your RDM.
This should leave us with about 15 minutes for questions and discussion. Don’t hesitate to put your questions into the chatbox.
So let’s begin by looking at the changing data landscape.
A Data Management Plan is often written early on in the research process to determine what data will be created and how it will be managed. Sometimes you are asked for a DMP as part of a grant application, but DMPs are useful to write regardless, as they help to develop consistent procedures from the outset.
You may know the old saying “We do not learn for school, but for life”. For planning and carrying out data management we’d like to encourage a similar attitude in researchers and other stakeholders.
There are lots of reasons to manage research data. You may be required to explain how you will manage your data by your funder or university. Ultimately though, it’s to make your research easier. If data are properly documented and organised, you can stop yourself drowning in irrelevant stuff and find the data when you need it – for example to validate findings. By managing your data you can also more easily share it with others to get more credit and impact.
Well-managed data opens up opportunities for re-use, integration and new science. And RDM is just part of a researcher’s life…
This research data lifecycle is taken from the UK Data Archive. It shows you the different processes and activities you’ll go through. As I’m sure you all know, data has a life beyond the project end.
Depending on your line of work, you may enter the cycle at ‘half past ten’, by re-using existing data, or at 12 o’clock:
Creating data: This is when you’ll design the research, write Data Management Plans, negotiate consent agreements, find any existing data you want to reuse, collect/capture your data and create any associated metadata
Processing data: When processing your data, you’ll be entering, transcribing, checking, validating and cleaning it, you may also need to anonymise your data, you should describe it and make sure it’s properly managed and stored.
Analysing data: when you analyse your data you’ll be interpreting it and creating derived data and outputs, you’ll probably also author publications and prepare the data for deposit and sharing.
Preserving data: data repositories play a key role in preserving data: they will make sure it’s properly stored and archived, they will migrate the formats and storage medium and create associated metadata and documentation to explain any changes made
Access to data: it may be that you share your data via a repository or handle access requests yourself. Either way, you need to establish copyright, decide who can have access and promote the data.
Re-using data: data can be re-used in follow-up studies, new research, research reviews, to evidence findings or for teaching and learning. Try to keep an open mind about the different ways in which your data could be re-used and make it as open as possible.
Let’s adopt the perspective of a future data user – maybe yourself: what should your data organisation – folders with data, metadata and documentation – look like at the moment that you start sharing - outside your team - and archiving?
When you are part of a large project which has been going on for some years already, this may be obvious, but for many researchers it isn’t clear from the start.
To answer that broad question, you want to come up, at an early stage, with answers regarding:
Types and formats of data;
New and/or existing;
Expected size;
Metadata;
Documentation;
Software.
It can be a very useful exercise to sit together with colleagues and discuss for 15 minutes which data organisation would be good during the project and also for handing over data to an archive later on. I’ve been part of such “thinking aloud” exercises and they were a great success: file formats, access rights, versioning, sensitivity… So we strongly recommend that you start by making a plausible overview of the expected project output.
Note that “output” is not “outcome”: for organising the data in their context and answering the first questions in the DMP the intellectual results of the project are irrelevant.
And you may find the following reference helpful …
It’s no fun to do the exercise by yourself, so use this as a communication opportunity.
With so many parties who have a stake in RDM, it’s clear that a DMP is an instrument for communication.
AND: for those of you who are not researchers: make sure that you get involved during, or even better, before the writing phase.
DMPs are about ‘keeping’ data. There are some misconceptions about “keeping” data; that’s why this slide looks so gloomy and heavy.
Working in a FAIR way can help you to deal with the first part of the previous slide.
It’s becoming an international ambition to make data FAIR. We’ve put suggestions back to the EC and they are reworking the guideline, and FAIR concepts will play a role.
As always, namedropping is easy, so you do have to think at an early stage about what complying with the FAIR principles means in your situation.
There are some pointers here to what it means for data to be FAIR.
With whom: immediate colleagues, researchers within your organisation, all researchers, the public at large?
When: now, or after an embargo period? For publications H2020 allows 6 (STEM: science, technical, engineering and medical) to 12 months (SSH). For data no-one talks about embargoes, but it is an option that long-term (LT) repositories may offer.
Under what terms: OA should be the default, but even then you are wise to make use of a license (and check this with your LT repository).
Re Software etc: you might also think of virtual machines with the corresponding setup information.
In many cases copyright will prevent the archiving of software and tools. The alternative is a sensible description of configuration settings etc.
Let’s move on to what H2020 requires from DMPs
Much of what was in the earlier slides relates to any funder or research organisation that requires a DMP, and…
…and there are clear common themes in the templates and checklists.
But let’s now focus on H2020…
As you will know, the EC runs a pilot study with Open Research Data and in the pilot the EC requires that data will be preserved for later use; a DMP should describe the What and How.
Starting next year, this will hold for all project call areas. The requirements will apply from when the work programmes start, so this will vary throughout the course of the year. It's not from 1st January or retrospective.
As far as we know, opting out and partially opting out will remain possible as long as it is justified.
The EC’s goal is Open Access to research data. Participating in the Open Research Data Pilot does not necessarily mean opening up all research data. Rather, the focus of the Pilot is on encouraging good data management as an essential element of research best practice: awareness raising.
Unlike some other funders, the EC does not require a DMP at the proposal stage, but there is an optional section under “Impact”.
Although DMPs are a project deliverable and not required at the application stage, proposals can include a section on data management if desired. The info suggested here is similar to the preliminary DMP, so essentially gets that started.
For the DMP you can use a Word document in your project layout, but you can also use the template within DMPonline.
Here is where you can log in.
From the start, the DCC has offered guidance, independent of funder or discipline. EUDAT and OpenAIRE and others are developing extra guidance as well.
For instance, should you want to archive your dataset later on in EUDAT’s B2SHARE facility, you see here that this will assign a persistent identifier to the dataset.
[final bullet] Acting on requests from the community, DMPonline will add an ‘export to Zenodo’ feature alongside the other export options. You might want to use this to increase your project’s transparency, share good practices, or maybe because you write your DMP as a (kind of) data paper, which is interesting in its own right. At the moment there are a few H2020 DMPs in Zenodo and figshare.
Make sure that you know what will be asked of you for the mid-term and the final review: the focus here is on enabling reuse of your data – by your future self and others.
In subsequent reviews (or any time they feel like it) the PO and reviewers may check to see if the DMP is followed (e.g., data files deposited, access status, metadata format, ...).
As an aside…
When SW in your project is not only a tool for capturing or analysing data, but a planned project deliverable, you might also consider planning your SW management.
Sample questions from the Minimal plan:
1. What will your software do?
2. Will your software have a name? Do you have one in mind? Is this name unique and meaningful and not in violation of any existing trademarks?
What, if any, software installation and configuration skills, knowledge and expertise will your users need? Will they need to be familiar with building and installing software via the command-line? Will they need to develop their own code to be able to use your software?
3. There are many ways in which you can release your software, e.g. as a binary executable that can be run directly, as a .zip archive, or as Python or R packages.
4. Will it help to produce results more rapidly? Will it help to produce results to a higher degree of accuracy or a finer level of detail? Will it help to conduct analyses that cannot be conducted at present?
5. Asking users to cite your software, directly or via a related paper, and providing a recommended citation, means you can search for these citations. Consider adding a citation requirement to your software’s licence, so it becomes a condition of its use.
In addition, the full plan asks, for instance, about how you make good SW (including tests and adhering to disability accessibility guidelines), dependencies like third-party tools, engaging with the users, etc.
Example: the OpenMinTeD project combines software with the H2020 DMP issues.
OpenMinTeD is an EINFRA project, which means that it is building an e-Infrastructure and data is passing through it.
Basically, the project partners have selected from the long list of the SSI template what is relevant for them.
Capsella is an ICT project (RIA).
These are not the same thing! When the EC asks about your approach to sharing data they’re interested in the latter.
When data are stored on ‘active data storage’ they’re subject to change. Anyone with permission could edit or delete files. They may still be there in 10 years’ time, but this is not guaranteed.
An archive is different, as the data and associated metadata are packaged up together and protected.
Backup is not the same as preservation. If you want your data to be accessible in the future, you should deposit in a trustworthy digital repository which commits to preserving it.
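As a small illustration of why active storage alone gives no guarantee: the sketch below records a checksum for every file in a data folder, so that later edits or deletions can at least be detected. This is only a minimal sketch and not something from the slides; the function names and the idea of a "manifest" dictionary are our own assumptions, and a checksum list does not replace depositing in a trustworthy repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file (path relative to data_dir) to its checksum.

    'Manifest' is a hypothetical name used for this illustration only.
    """
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """List files that were added, removed or altered since the manifest."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

You would build the manifest once, keep it somewhere outside the data folder, and call `changed_files` later on. An empty list means that nothing was edited, added or deleted in the meantime.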
Look early for a research data repository for sharing and preserving the data long term.
Remember to also give your open data and software a proper licence.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options.
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Let’s move on to the considerations to make when managing and sharing data