Practical Research Data Management: tools and approaches, pre- and post-award
1. Practical Research Data Management:
tools and approaches, pre- and post-award
Martin Donnelly
Digital Curation Centre
University of Edinburgh
Research Data Management: a good practice exchange
London, 10 February 2016
2. The Digital Curation Centre (DCC)
• The UK’s national centre of expertise in digital
preservation and data management, established 2004
• Provide guidance, training, tools and other services on
all aspects of research data management
• Organise national and international events and webinars
(International Digital Curation Conference, Research
Data Management Forum)
• Principal audience is the UK higher education sector, but
we increasingly work further afield (Europe, North
America, South Africa…)
• Now offering tailored consultancy/training services
3. Overview
1. What is Research Data Management, and why?
2. Who’s involved and what does it mean for them?
3. Data management planning: a shared activity
(including short interactive exercise)
4. Case study: The Horizon 2020 data pilot
a. Pre-project
b. In-project
c. Post-project
5. Links and resources
5. The old way of doing research
1. Researcher collects data (information)
2. Researcher interprets/synthesises data
3. Researcher writes paper based on data
4. Paper is published (and preserved)
5. The data is left to benign neglect,
and eventually ceases to be
accessible
6. Without intervention, data + time = no data
Vines et al. “examined the availability of data from 516 studies between 2 and 22
years old”
- The odds of a data set being reported as extant fell by 17% per year
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing
- Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost
over time, unavailable for validation of the original results or to use for entirely
new purposes” according to Timothy Vines, one of the researchers. This underscores
the need for intentional management of data from all disciplines and opened our
conversation on potential roles for librarians in this arena. (“80 Percent of
Scientific Data Gone in 20 Years”, HNGN, Dec. 20, 2013,
http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age,
Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
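The 17% annual fall in odds compounds over time. The sketch below is a rough illustration only: it assumes the decline applies uniformly every year, and the baseline odds are a hypothetical value chosen for the example, not a figure from the paper.

```python
def odds_multiplier(years: int, annual_decline: float = 0.17) -> float:
    """Factor by which the odds of a data set being extant shrink after `years`."""
    return (1.0 - annual_decline) ** years


def probability_from_odds(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical baseline (an assumption, not from Vines et al.):
# odds of 9 to 1 that the data set is still available at publication.
baseline_odds = 9.0

for years in (2, 10, 20):
    multiplier = odds_multiplier(years)
    odds = baseline_odds * multiplier
    print(f"{years:>2} years: odds x{multiplier:.3f}, "
          f"probability ~{probability_from_odds(odds):.2f}")
```

Note that the 17% applies to odds rather than to probability, which is why the conversion step is needed before quoting a percentage of data sets lost.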
7. The new way of doing research
The DataONE lifecycle model:
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
DEPOSIT … and RE-USE
8. What is RDM?
“the active management
and appraisal of data over
the lifecycle of scholarly
and scientific interest”
What sorts of activities?
- Planning and describing data-
related work before it takes
place
- Documenting your data so that
others can find and understand it
- Storing it safely during the
project
- Depositing it in a trusted archive
at the end of the project
- Linking publications to the
datasets that underpin them
9. (Aside: from data to research objects?)
• ‘Research object’ is a term that is gaining in popularity,
not least in the humanities where the relevance of the
term ‘data’ is not always recognised…
• Research objects can comprise any supporting material
which underpins or otherwise enriches the (written)
outputs of research
• Data (numeric, written, audiovisual….)
• Software code and algorithms
• Workflows and methodologies
• Slides, logs, lab books, sketchbooks, notebooks, etc
• See http://www.researchobject.org/ for more info
10. Helicopter view: benefits of RDM
• SPEED: The research process becomes faster
• EFFICIENCY: Data collection can be funded once, and
used many times for a variety of purposes
• ACCESSIBILITY: Interested third parties can (where
appropriate) access and build upon publicly-funded
research resources with minimal barriers to access
• IMPACT and LONGEVITY: Publications with open data
receive more citations, over longer periods
• TRANSPARENCY and QUALITY: The evidence that
underpins research can be made open for anyone to
scrutinise, and attempt to replicate findings. This leads
to a more robust scholarly record
11. Growing momentum and ubiquity…
Data management
is a part of good
research practice.
- RCUK Policy and Code of
Conduct on the
Governance of Good
Research Conduct
12. (Aside: To share or not to share?)
• Data management and data sharing are not the same
thing!
• Sensitive data, whether commercially or ethically
sensitive, must be protected in line with the laws of the
land, research funder policies, and institutional ethical
approval processes
• Some sensitive data can be shared after certain kinds of
processing, e.g. anonymisation, pseudonymisation,
aggregation, etc.
• Other datasets may be subject to rigorous access /
clearance controls, or embargoes
• The real experts in sensitive data are the UK Data
Archive, based at the University of Essex
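As a minimal sketch of two of the processing steps named above, the following shows keyed pseudonymisation and simple aggregation. The function names, the 16-character pseudonym length and the 10-year age bands are assumptions made for illustration; whether processing of this kind is sufficient for a given dataset is a matter for ethical and legal review, not something the code can decide.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed, repeatable pseudonym.

    The same identifier and key always give the same pseudonym, so records
    can still be linked. The key must be stored separately from the data.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_ages(ages, band_width=10):
    """Count participants per age band instead of releasing exact ages."""
    counts = {}
    for age in ages:
        lower = (age // band_width) * band_width
        label = f"{lower}-{lower + band_width - 1}"
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    key = b"hypothetical-project-key"  # illustrative only; never hard-code real keys
    print(pseudonymise("participant-001", key))
    print(aggregate_ages([23, 27, 31]))
```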
14. Who and how?
• RDM is a hybrid activity, involving multiple
stakeholder groups…
• The researchers themselves
• Research support personnel
• Partners based in other institutions, commercial partners,
etc
• Other stakeholders in the modern research process
include governments, public services, and the
general public (who fund lots of research via their
taxes)
15. What does it mean in practice? (i)
• For research institutions, there are
three principal areas of focus…
1. Developing and integrating technical
infrastructure (repositories/ CRIS
systems, storage space, data catalogues
and registries, etc)
2. Developing human infrastructure
(creating policies, assessing current data
management capabilities, identifying
areas of good practice, DMP templates,
tailored training and guidance materials…)
3. Developing business plans for sustainable
service
• Many have formed cross-function
(hybrid) working groups, advisory
groups, task forces, etc
http://blog.soton.ac.uk/keepit/2010/01/28/aida-and-institutional-wobbliness/
16. What does it mean in practice? (ii)
• For researchers it is…
• A disruption to previous working processes
• Additional expectations / requirements from
the funders (and sometimes home institutions)
• But! It provides opportunities for new types of
investigation
• And leads to a more robust scholarly record
17. What does it mean in practice? (iii)
• Research administrators and other
support professionals:
• Need to understand the key elements in the
process, as well as roles and responsibilities
• Should understand the key points of the
funders’ requirements
• Should expect questions from researchers…
and perhaps some resistance!
19. Data management planning (DMP)
• Data management planning is the process of planning, describing and
communicating activities carried out during the research lifecycle in
order to…
• Keep sensitive data safe
• Maximise data’s reuse potential
• Support longer-term preservation
• Data management planning underpins and pulls together different
strands of data management activities, often across multiple project
partners
• A data management plan (DMP) is usually a short document detailing
specifics of the data that will be created during a research project,
together with information on how it can be accessed and utilised
• Research funders often ask for DMPs to be submitted alongside grant
applications and/or developed over the course of the research project.
(HEIs are increasingly asking their researchers to do this too…)
20. Why plan?
• It is intuitive that planned activities stand a better chance of
meeting their goals than unplanned ones. The process of
planning is also a process of communication, increasingly
important in interdisciplinary/multi-partner research.
Collaboration will be more harmonious if project partners (in
industry, other universities, other countries…) are in accord
• In terms of data security, if there are good reasons not to
publish/share data, in whole or in part, you will be on more solid
ground with funders if you flag these up early in the process
• DMP also provides an ideal opportunity to engender good
practice with regard to (e.g.) file formats, metadata standards,
storage and risk management practices, leading to greater
longevity of data, and improved quality standards…
21. (Aside: limits of data management planning)
What can a plan not do? It can’t do the work
for you.
The map is not the territory (Korzybski)
or
Chalk’s no shears (Scottish saying)
It is important to remember that the human
challenges in data management are often
more difficult to meet than the
technological ones.
Communication is vital!
22. What does a data management plan look like?
It is usually a couple of pages outlining:
how data will be captured/created
how it will be documented
who will be able to access it
where it will be stored
how it will be backed up, and
whether (and how) it will be shared and preserved long-term
etc
DMPs are often submitted as part of funding applications – and
requirements vary from funder to funder – but they are useful whenever
researchers are creating (or reusing) data, especially where the research
involves multiple partners, countries, etc…
23. Roles and responsibilities
Like RDM in general, data management planning is a hybrid
activity, involving multiple stakeholder groups…
• The principal investigator (usually ultimately responsible for data)
• Research assistants (may be more involved in day-to-day data
management)
• The institution’s funding office (may have a compliance role)
• Library/IT/Legal (The library may issue PIDs, or liaise with an
external service who do this, e.g. DataCite.)
• Partners based in other institutions
• Commercial partners
• etc
24. Interactive exercise: data management planning
• Select one of the DMP
Checklist headings (left), and
brainstorm all the internal and
external stakeholders you
think might be involved (and
how/why) – be as specific as
you like
• Remember to consider the
different stages of research:
pre-award, in-project, post-
project
• We’ll have a short
reporting/discussion session
at the end
• http://www.dcc.ac.uk/resources/data-management-plans/checklist
§1. Administrative Data [basic details about the project]
§2. Data Collection
What data will you collect or create?
How will the data be collected or created?
§3. Documentation and Metadata
What documentation and metadata will accompany the data?
§4. Ethics and Legal Compliance
How will you manage any ethical issues?
How will you manage copyright and Intellectual Property
Rights (IPR) issues?
§5. Storage and Backup
How will the data be stored and backed up during the
research?
How will you manage access and security?
§6. Selection and Preservation
Which data should be retained, shared, and/or preserved?
What is the long-term preservation plan for the dataset?
§7. Data Sharing
How will you share the data?
Are any restrictions on data sharing required?
§8. Responsibilities and Resources
Who will be responsible for data management?
What resources will you require to deliver your plan?
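One way to use the checklist in the exercise is to treat it as a simple structure and track which questions still lack an answer. The sketch below assumes that representation; the dictionary and the `unanswered` helper are illustrative and are not part of any DCC tool. Section 1 (Administrative Data) is left out because the checklist above lists no questions under it.

```python
# Checklist headings and questions as listed above (sections 2 to 8).
DMP_CHECKLIST = {
    "Data Collection": [
        "What data will you collect or create?",
        "How will the data be collected or created?",
    ],
    "Documentation and Metadata": [
        "What documentation and metadata will accompany the data?",
    ],
    "Ethics and Legal Compliance": [
        "How will you manage any ethical issues?",
        "How will you manage copyright and Intellectual Property Rights (IPR) issues?",
    ],
    "Storage and Backup": [
        "How will the data be stored and backed up during the research?",
        "How will you manage access and security?",
    ],
    "Selection and Preservation": [
        "Which data should be retained, shared, and/or preserved?",
        "What is the long-term preservation plan for the dataset?",
    ],
    "Data Sharing": [
        "How will you share the data?",
        "Are any restrictions on data sharing required?",
    ],
    "Responsibilities and Resources": [
        "Who will be responsible for data management?",
        "What resources will you require to deliver your plan?",
    ],
}


def unanswered(answers):
    """Return (section, question) pairs with no non-empty answer yet."""
    missing = []
    for section, questions in DMP_CHECKLIST.items():
        for question in questions:
            if not answers.get(question, "").strip():
                missing.append((section, question))
    return missing


if __name__ == "__main__":
    draft = {"How will you share the data?": "Via an institutional repository."}
    for section, question in unanswered(draft):
        print(f"[{section}] {question}")
```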
25. Data management planning exercise: outcomes
• It’s not necessary – or even desirable – for every
researcher (or research administrator, or librarian, or IT
person…) to become an expert in every aspect of data
management
• Universities have an increasing obligation to provide
infrastructure and support
• Specific expertise may be available from the research
office, library, IT, departmental support staff, legal
services etc, as well as academic colleagues with
particular areas of expertise
• The trick is to make this appear seamless:
communication and coordination are ever more important
27. Case study: The Horizon 2020 data pilot
• Horizon 2020 includes a data management (planning)
pilot…
• http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Proposals covered
• “Innovation actions” and “Research and innovation actions”
• DMP contents
• Data types; Standards used; Sharing/making available;
Curation and preservation
• Multi-phase approach
• Initial DMP due within first 6 months
• Mid-term DMP
• Final review stage DMP
• There are opt-out conditions. A detailed description
and scope of the Open Research Data Pilot
requirements is provided on the Participants’ Portal
28. The Horizon 2020 DMP requirements (i)
v1: Within Six Months
For each data set specify the
following:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation
(including storage and backup)
.docx output from DMPonline
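The five per-dataset items above can be pictured as one record per data set. The following is a sketch under that assumption: the class and field names are invented for illustration and do not reflect the DMPonline template or any official Horizon 2020 schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetEntry:
    """One data set in an initial (v1) DMP; field names are hypothetical."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""  # including storage and backup


def incomplete_fields(entry):
    """List the names of fields that are still empty or whitespace-only."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


if __name__ == "__main__":
    entry = DatasetEntry(
        reference_and_name="Hypothetical survey data set",
        description="Responses collected during the project.",
    )
    print(incomplete_fields(entry))
```

A check like this only confirms that something has been written under each heading; it says nothing about whether the answer is adequate.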
29. The Horizon 2020 DMP requirements (i)
Tools/resources
• DOI or other identifier
• Example plans
• RDA | Metadata Directory
• UKDA resources
• Data archive, e.g. Zenodo or a
funding council mandated archive
30. The Horizon 2020 DMP requirements (ii)
v2 and v3: Mid-Term and Final Reviews
Scientific research data should be easily:
1. Discoverable
• Are the data and associated software produced and/or
used in the project discoverable (and readily located),
identifiable by means of a standard identification
mechanism (e.g. Digital Object Identifier)?
2. Accessible
• Are the data and associated software produced and/or
used in the project accessible and in what modalities,
scope, licenses?
3. Assessable and intelligible
• Are the data and associated software produced and/or
used in the project assessable for and intelligible to
third parties in contexts such as scientific scrutiny and
peer review? Continues…
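For the "discoverable" criterion, a DOI recorded in a plan is easier to use if it is stored as a resolvable link. The sketch below normalises common ways of writing a DOI into a `https://doi.org/` URL. The regular expression is a loose heuristic assumed for this example, not an official definition of DOI syntax, and the function does not check that the DOI is actually registered.

```python
import re

# Heuristic shape check (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common prefixes people put in front of a bare DOI.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(value: str) -> str:
    """Return a https://doi.org/ link for a DOI written in any common form."""
    doi = value.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {value!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # DOI of the Vines et al. paper cited earlier in the talk.
    print(doi_to_url("http://dx.doi.org/10.1016/j.cub.2013.11.014"))
```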
31. The Horizon 2020 DMP requirements (ii)
Tools/resources
1. DataCite / Zenodo
2. DCC guidance on
licensing data
3. Software
Sustainability
Institute
32. The Horizon 2020 DMP requirements (iii)
v2 and v3: Mid-Term and Final Reviews
Scientific research data should be easily:
4. Usable beyond the original purpose for
which it was collected
• Are the data and associated software
produced and/or used in the project useable
by third parties even long time after the
collection of the data?
5. Interoperable to specific quality
standards
• Are the data and associated software
produced and/or used in the project
interoperable allowing data exchange between
researchers, institutions, organisations,
countries, etc?
33. The Horizon 2020 DMP requirements (iii)
Tools/resources
4. Guidance on data
description, e.g.
LSHTM
5. Using open formats,
e.g. Five Stars of
Open Data
35. Resources mentioned in the talk
• DataCite: https://www.datacite.org/
• Zenodo: https://zenodo.org/
• DCC guidance on licensing data:
http://www.dcc.ac.uk/resources/how-guides/license-research-data
• Software Sustainability Institute:
http://www.software.ac.uk/
• LSHTM guidance on describing data:
http://www.lshtm.ac.uk/research/researchdataman/describe/describe_data.html
• Five Star Open Data: http://5stardata.info/en/#costs-benefits
36. DCC resources on data management planning
• Guidance, e.g. “How-To Develop a Data
Management and Sharing Plan”
• DCC Checklist for a Data Management
Plan:
http://www.dcc.ac.uk/resources/data-management-plans/checklist
• DMPonline tool:
https://dmponline.dcc.ac.uk/
• Links to all DCC DMP resources via
http://www.dcc.ac.uk/resources/data-management-plans
37. DMPonline: overview
• Helps researchers write DMPs
• Provides funder questions and guidance
• Includes a template DMP for Horizon 2020
• Provides help from universities
• Examples and suggested answers
• Free to use
• Mature (v1 launched April 2010)
• Code is Open Source (on GitHub)
https://dmponline.dcc.ac.uk
38. Registration
Sign up with your
email address,
organisation and
password
Select ‘other
organisation’ if
yours is not listed
39. Creating a plan
Select funder (if any)
Select organisation for
additional questions
and guidance
Select other sources
of guidance
45. Institutions can customise the tool by…
• Adding templates
• Adding custom guidance
• Providing example or suggested answers
• Monitoring usage within their organisation
• Offering non-English language versions
www.dcc.ac.uk/news/customising-dmponline-admin-interface-launches
47. Sample plans, and last words of advice
• There are lots of data management plans available on the
Web. The DCC provides links to a number of sample DMPs
via http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
• The US National Endowment for the Humanities (NEH) recently
released over 100 of its DMPs. These are available via:
http://www.neh.gov/divisions/odh/grant-news/data-management-plans-successful-grant-applications-2011-2014-now-available
• Remember that there is no magic bullet, and no one-size-
fits-all solution! Much of the benefit of data management
planning lies in the process of planning, above and beyond
the plans produced at the end of the process
• DMP is above all a communication activity, between the
data collectors and their contemporaries (project partners
and funders) and with future data re-users…
48. Thank you – any questions?
• For more information about the DCC:
• Website: www.dcc.ac.uk
• Director: Kevin Ashley
(kevin.ashley@ed.ac.uk)
• General enquiries: Lorna Brown
(lorna.brown@ed.ac.uk)
• Twitter: @digitalcuration
• My contact details:
• Email: martin.donnelly@ed.ac.uk
• Twitter: @mkdDCC
• Slideshare:
http://www.slideshare.net/martindonnelly
This work is licensed
under the Creative
Commons Attribution 2.5
UK: Scotland License.
Editor's Notes
Will talk about active management now, and appraisal a little later…
Note that even if data is not suitable for sharing/publication, it still needs active management!
Focus on data management – what is it, what activities are involved (and how do these affect different roles, e.g. researchers, PhD Students, librarians, data managers, research administrators, publishers, policy makers, funders, project managers.) – cite Vines et al.
Research itself is increasingly international, interdisciplinary and collaborative, crossing both public and private spheres. Communication has never been more important.
Not proposing to run this exercise, but it’s included here as an example of the kind of introductory level work we can do
These guidelines echo the G8 Science Ministers’ statement (2013), which offered similar good practice principles: https://www.gov.uk/government/news/g8-science-ministers-statement
Horizon 2020 open data pilot follows on from G8 Open Data Charter (June 2013)
We’ve created a DMPonline template for Horizon 2020 based on the annexes in the Guidelines for Data Management.