3. Adapted original source: The University of California, Santa Cruz, Data Management LibGuide, Research Data Management Lifecycle, diagram, viewed 5th May 2018 http://guides.library.ucsc.edu/datamanagement
Challenge:
From linear process to research data lifecycle!
4. OpenAIRE webinar: Open Research Data in H2020 – 09/05/2018
What is data management?
EXPLAIN IT
STORE IT SAFELY
OPEN IT
• CONTEXTUALIZE YOUR MATERIAL
• DESCRIBE YOUR RESEARCH PROCESS
• PROVIDE INFORMATION ABOUT DATASETS
• MAKE COPIES
• CONTROL ACCESS TO FILES
• DECIDE WHAT DATA TO KEEP AND WHAT TO DELETE
• GAIN MORE IMPACT
• USE DATA REPOSITORIES
• INCREASE TRANSPARENCY
5. What is open data?
Open data: free to access, reuse, repurpose, and redistribute
Data sharing: restricted access for a limited number of people under certain conditions
6. 1. e.g. Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Prevents data loss
Maximize usefulness
Write a data paper
Credit & longer shelf life 1
Increases transparency
Promote integrity
Citizen science
DATA MANAGEMENT AND OPEN DATA
8. Which areas are participating?
• DMP/Dataset
Costs eligible (Article 6.2.D.3 of the Model Grant Agreement)
Projects started in 2014-2016: Limited ORD Pilot
From 2017: Extended ORD Pilot
• Limited ORD pilot: some areas (check Article 29.3)
• Participating is the default option for all projects
• Possibility to opt out
• Possibility to opt in or opt out
• 1 DMP/Project
9. (PARTIALLY) OPTING-OUT
Reasons e.g.
• Exploitation of results
• Confidentiality
• Protection of personal data
• Would jeopardize the main aim of the action
• No data generated
• Any other legitimate reason
As open as possible, as closed as necessary
Projects can opt out at any stage:
• Complete opt-out via project amendment
• Complete or partial opt-out: describe issues in project DMP
10. FAIR Data Management guidelines
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Notes the extension of the pilot
• Clarifies concept of FAIR data
• Explains what a DMP is and when it should be updated
• Notes what happens at proposal, submission and evaluation stage
• Explains costs are eligible
• Provides a DMP template
12. PROPOSAL STAGE
Applications should address:
1. Standards, e.g. file formats, metadata schemas, licenses…
2. How to make data available. If not, why?
3. How will data be curated and preserved?
4. Current state of agreements on data management
5. Be consistent with IPR requirements
Plan your budget!
≠ DMP
Note: a DMP is not required at the proposal stage. The DMP is a deliverable, and NOT part of the evaluation.
13. Timeline
PROPOSAL: basic idea ≠ DMP
6 MONTHS: 1st version DMP
PERIODIC EVALUATION: update DMP (changes in data, policy, consortium)
FINAL REVIEW: final version
14. What is a DMP?
Living document: update
Reflects on curation, preservation, sustainability and security
What parts will be open and how?
Handling of data during and after project
16. Findable, Accessible, Interoperable, Reusable
• How to discover your data?
• How to understand your data?
• Where to find your data?
• Can people access your data?
• Metadata
• Persistent identifier
• Naming convention
• Keywords
• Versioning
• Software, documentation
• Data repository
• Standards
• Vocabulary
• Methodologies
• Licensing
17. How to write a DMP
Online tool: dmponline.dcc.ac.uk
24. Which data does the ORD pilot apply to?
Does not apply to ALL data
Researchers can define what is appropriate
Don’t have to share data if inappropriate – exemptions apply
What datasets to mention?
• Data and metadata needed to validate the results presented in scientific publications.
• Other (as specified in DMP: raw/curated data)
25. Data summary: data collection
• Types, e.g. digital/non-digital data, qualitative/quantitative, audio files, surveys, databases, field notes…
• Origin: generated, collected, reused
• Re-use? Provide the source and check IPR
• Does similar data exist? What about reintegration or reuse?
26. Data file formats
Data summary
Use data formats that are:
• Not proprietary; open standard possible?
• In an easily re-usable format
• Commonly used by the research community
Examples of preferred format choices:
Text: .odt, .txt, .xml, .html, .rtf
Tabular data: .csv (comma-separated values), .xml, .rdf, .SPSS portable
Images: .tif, .jpeg2000, .png, .svg
Structured data: .xml, .rdf
Any standard used in your field
Use a consistent naming convention
Organise files in a structured way
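A consistent naming convention is easiest to keep when file names are generated rather than typed by hand. The sketch below is one possible convention, not a prescribed one: the pattern project_description_date_version.extension and the helper names slugify and build_filename are assumptions made for illustration.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, created: date,
                   version: int, ext: str) -> str:
    """Build a name such as project_description_YYYYMMDD_vNN.ext.

    The pattern is a hypothetical convention; adapt the parts and their
    order to what your project agrees on in the DMP.
    """
    parts = [
        slugify(project),
        slugify(description),
        created.strftime("%Y%m%d"),   # sortable date, no separators
        f"v{version:02d}",            # zero-padded version number
    ]
    return "_".join(parts) + "." + ext.lstrip(".").lower()


if __name__ == "__main__":
    print(build_filename("ProjectX", "Survey Results", date(2018, 5, 9), 2, "csv"))
```

Putting the date in YYYYMMDD form and zero-padding the version keeps files in chronological order when sorted by name.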
27. Ethical aspects
For personal data of EU citizens (compliance by May 25, 2018)
• EU General Data Protection Regulation (GDPR):
Broad interpretation of personal data: e.g. name, IP address, cookies
ONLY collect what is absolutely necessary and only use it for the purpose intended.
Consent
Right to be forgotten
Right to data correction
Notification if data is at risk
Privacy by default: any divergence from the original agreement requires an additional agreement
28. Ethical aspects
Consider ethical issues early:
• Consent form: sharing, preservation and re-use
• Anonymization
• Privacy
Include in Ethics deliverables/Ethics chapter of the Description of the Action
29. File sharing and storage
• Back-up procedures
• Befriend your ICT department
Data security
• Strategy for file sharing: servers, synchronization, secure storage, encryption, …
- What is needed?
- Who can access?
Especially important if you deal with:
• Large data files: storage capacity
or
• Personal data: protection (passwords, encryption, …)
30. Documenting data
• Make your data understandable: project level (context) and data level (e.g. codebooks, protocol)
• E.g. lab notebook, end-to-end code/scripts for statistics
• Software can help: R, MATLAB, Python…
• Be clear about what methods you use
Accessible
• Decide on a clear and consistent naming system
• Who has access and editing privileges?
• Version tracking software/file sharing services can help
• Version control
31. Create searchable data: METADATA
Findable
CC-BY ERESEARCHSA www.ersa.edu.au/wp-content/uploads/2014/06/understanding-metadata.jpg
32. Create searchable data: METADATA
• Data about data
• Machine readable, making data findable
Using metadata
• Consists of a set of attributes
• Helps prevent inappropriate use
• Use metadata standards of your domain
34. METADATA STANDARDS
Use standards of your domain
Digital Curation Centre
General
• Dublin Core (DC)
• DataCite metadata schema
• Metadata Object Description Schema (MODS)
Humanities
• Text Encoding Initiative (TEI)
• Visual Resources Association Core (VRA)
Archives/Repositories
• D-Space metadata
• DataStaR minimum Metadata
Social Science
• Data Documentation Initiative (DDI)
Life Sciences
• Darwin Core
• Integrated Taxonomic Information System (ITIS)
Earth Science
• Directory Interchange Format (DIF)
• Standard for the Exchange of Earthquake Data (SEED)
Ecology
• Ecological Metadata Language (EML)
Geographic/Geospatial
• Federal Geographic Data Committee (FGDC)
• ISO 19115
• Geospatial Interoperability Framework (GIF)
36. What about our project page?
Sustainable?
Services?
Legal aspects?
Technical standards?
Metadata standards?
Findable?
37. Where to deposit data?
Accessible
Research data repository
• Disciplinary/institutional data repository
• Zenodo: cost-free data repository
• Matches data needs
• Directory of data repositories: www.Re3data.org
39. Re3data.org
Trustworthy digital repository
• Persistent identifier
• Licenses
• Access
41. Open data
Reusable
• License for widest reuse possible
• Apply an open license, e.g. Creative Commons
• Keep it simple: “as open as possible, as closed as necessary”
Recommended:
• Data repositories can provide licenses
• Re3data.org
42. Example
Understandable for humans
Machine readable metadata
Tools
Open Data
Open license
45. Open Research Data Pilot
STEP 1: WRITE A DMP
• dmponline.dcc.ac.uk
• Update at: 6 months, periodic evaluation, final review
STEP 2: FIND REPOSITORY
Data repositories
• discipline/institutional
• www.re3data.org
• Zenodo
Matches data needs
STEP 3: DEPOSIT DATA
• (Open) Data
• Metadata
• Other tools
• Standard file formats
• Standard metadata schema
• (Open) licences
SUPPORT: supporting infrastructure and information
• EC guidelines
• OpenAIRE.eu
• dcc.ac.uk
Designed by Freepik
46. Support and information?
OpenAIRE: An Open Knowledge & Research Information Infrastructure
• www.OpenAIRE.eu offers infrastructure, tools, information and a helpdesk system on Open Science.
OPENAIRE FOSTERS THE SOCIAL AND TECHNICAL LINKS THAT ENABLE OPEN SCIENCE IN EUROPE AND BEYOND.
47. OpenAIRE
Training and support material
Information on:
• Open research data pilot
• Creating a data management plan
• Selecting a data repository
• Dealing with personal data
Support material:
Briefing papers, factsheets, webinars, workshops, FAQs, helpdesk
www.openaire.eu/opendatapilot
48. OpenAIRE
www.openaire.eu/search
Link your data to publications or project
49. How can we help?
• Guides
• Factsheets
• Workshops
• Webinars
• Helpdesk
• FAQ
50. Regional experts
Ask a question
• The National Open Access Desks (NOADs):
• Support on a national level
• Country pages with local information on Open Science
OpenAIRE webinar: OA to publications in H2020 – 08/05/2018
Idea – experiment – data analysis and writing the paper – finally time for some pizza while the paper gets reviewed – paper: yay, and all your hard work disappears
FROM DATA IN A SCIENTIFIC PIPELINE TO RESEARCH DATA LIFECYCLE
Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results.
Research Data Management is part of the research process, and aims to make the research process as efficient as possible.
Open data is data that is free to access, reuse, repurpose, and redistribute. The Open Research Data Pilot aims to make the research data generated by selected Horizon 2020 projects accessible with as few restrictions as possible, while at the same time protecting sensitive data from inappropriate access.
Data sharing: restricted data is made available to restricted organisations or individuals. Access to this data is usually restricted because it is sensitive in some way, either because it is personal or because its general release might cause security problems.
1. Prevents data loss: 80% of data is lost after 10 years. Data is fragile, and reproducibility is very difficult without data.
2. Maximize usefulness and build much more efficiently on previous work: organize, make understandable and reusable, and avoid duplication. Preserves data for further research. Stop drowning in irrelevant stuff. Reproducibility crisis.
3. Fosters creativity, interdisciplinary use of data and meta-analysis.
4. Public participation in scientific research.
5. Promotes integrity and increases transparency: managing data is part of good research; avoid accusations of sloppy science.
6. Data tend to have a (much!) longer shelf life than interpretation.
After accounting for other factors affecting citation rate, we find a robust citation benefit from open data.1
Horizon 2020 includes a flexible pilot action on open access to research data, with opt-outs and safeguards. There are two main pillars: participating projects must develop a Data Management Plan (DMP), specifying which data will be openly accessible.
If your project stems from one of these Horizon 2020 areas, you are automatically part of the pilot.
Costs related to data management in Horizon 2020 are eligible for reimbursement for the duration of the project (see Article 6.2.D.3 of the Annotated Model Grant Agreement).
IPR protection, no data generated and privacy are the most cited reasons.
3. Provide information via the chosen repository about tools and instruments necessary for validating the results.
4. Take measures to enable third parties to access, mine, exploit, reproduce and disseminate this research data, free of charge for any user.
A data management plan or DMP is a formal document that outlines how you will handle your data both during your research, and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this ensures that data are well-managed in the present, and prepared for preservation in the future.
The DMP needs to be updated over the course of the project whenever significant changes arise, such as (but not limited to):
- new data
- changes in consortium policies (e.g. new innovation potential, decision to file for a patent)
- changes in consortium composition and external factors (e.g. new consortium members joining or old members leaving)
The DMP is a deliverable, and NOT part of the evaluation.
5 main topics
Costs eligible and responsibilities
Interoperability: how can my data be combined with other datasets and used in other fields?
Licensing: who can access my data and for what purpose can it be used?
- No reimbursement: Articles published before the initiative was launched are not eligible and don’t pay for it yourselves
IPR: Identify source and ownership of third-party data. Establish conditions of use, copyright constraints and redistribution rights.
Statement on intellectual property rights of generated data
Try to make the barriers to view your data as low as possible. Use open file formats.
Avoid Word, PDF and Excel files. You can use PDF/A for archiving or if the layout matters.
Outline and justify your file format choices
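To help outline and justify format choices, a project could screen its files against the preferred formats listed on the file formats slide. The following is a minimal sketch under stated assumptions: the function name classify_format and the three labels are hypothetical, .jp2 and .por are used as the usual extensions for JPEG 2000 and SPSS portable files, and the discouraged set simply encodes "avoid Word, PDF and Excel" (PDF/A for archiving is not distinguished here).

```python
from pathlib import Path

# Preferred extensions taken from the "preferred format choices" slide.
# .jp2 (JPEG 2000) and .por (SPSS portable) are the usual extensions for
# the formats the slide names; .tiff is added as a common variant of .tif.
PREFERRED = {
    ".odt", ".txt", ".xml", ".html", ".rtf",   # text
    ".csv", ".rdf", ".por",                    # tabular / structured data
    ".tif", ".tiff", ".jp2", ".png", ".svg",   # images
}

# Word, PDF and Excel files, which the notes advise avoiding.
DISCOURAGED = {".doc", ".docx", ".pdf", ".xls", ".xlsx"}


def classify_format(filename: str) -> str:
    """Return 'preferred', 'discouraged', or 'check field standard'."""
    ext = Path(filename).suffix.lower()
    if ext in PREFERRED:
        return "preferred"
    if ext in DISCOURAGED:
        return "discouraged"
    # Anything else may still be fine if it is a standard in your field.
    return "check field standard"


if __name__ == "__main__":
    for name in ["results.csv", "results.XLSX", "density_map.mrc"]:
        print(name, "->", classify_format(name))
```

Files in the third group are not necessarily a problem: the slide explicitly allows any standard used in your field, so the label only flags that the choice should be justified in the DMP.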
- Consent not prohibiting data sharing: gain consent for sharing, preservation and re-use of research data
- Intellectual property rights: identify source and ownership of third-party data
- Plan and anonymise data early in research
- Create an anonymisation log of edits, replacements, removals or aggregations made
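An anonymisation log can be kept by the same script that performs the change. The sketch below shows one way this might look; the function name anonymise, the pseudonym pattern P001, P002, … and the log fields are assumptions for illustration, and real projects will need stronger techniques than simple replacement for many kinds of personal data.

```python
from datetime import date


def anonymise(records, field, log, when):
    """Replace each distinct value of `field` with a pseudonym and log the action.

    `records` is a list of dicts; the input list is left unchanged.
    The log entry deliberately does not contain any original values.
    """
    pseudonyms = {}
    cleaned = []
    for row in records:
        original = row[field]
        if original not in pseudonyms:
            # Hypothetical pseudonym pattern: P001, P002, ...
            pseudonyms[original] = f"P{len(pseudonyms) + 1:03d}"
        cleaned.append({**row, field: pseudonyms[original]})
    log.append({
        "date": when.isoformat(),
        "field": field,
        "action": "replacement",
        "detail": f"{len(pseudonyms)} distinct values replaced by pseudonyms",
    })
    return cleaned


if __name__ == "__main__":
    log = []
    data = [{"name": "Ann", "age": "34"}, {"name": "Bob", "age": "41"}]
    print(anonymise(data, "name", log, date(2018, 5, 9)))
    print(log)
```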
EU General Data Protection Regulation
Will the data be stored and backed up appropriately during the research project? For example, on managed university filestores rather than external hard drives.
Know your institutional IT security arrangements and capacity for data storage
Know your institutional procedure and regularity of data back-up, especially for remote and cross-institutional working
Arrange backup and storage procedures which are most suited to the partners and nature of your project
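Whatever backup procedure is chosen, it helps to verify that a copy is identical to the original. A common way to do this is to compare checksums; the sketch below uses SHA-256 from Python's standard library. The function names sha256_of and backup_is_intact are hypothetical, and this only checks file content, not whether the backup schedule itself is adequate.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original, backup):
    """True when the backup has exactly the same content as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Storing the digest next to the deposited file also lets later users check that their download is complete.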
Collecting: how will data be collected? What will you do with the data? E.g. survey: will it include a disclaimer about what will happen with the data?
Provide links to the data sets you used or, if licenses and copyright allow, upload the original data set.
Provide end-to-end code/scripts for the generation of figures and statistics.
Keep in mind: will someone who is not familiar with the data or the research setup understand what the data is about?
Metadata assures accessibility of the data. Metadata is the backbone of digital curation.
Data about data, to discover and disclose data: resource descriptions.
A metadata record consists of a set of attributes or elements necessary to describe the data in question.
Structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information source.
Basically, data is read by humans; metadata is read by computers.
Helps prevent inappropriate use due to misunderstanding of research purpose or parameters.
Metadata standards often start as schemas developed by a particular user community to enable the best possible description of a resource type for their needs. Standards-based metadata is generally preferable, but where no appropriate standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
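As an illustration of a metadata record as a set of elements, the sketch below uses the fifteen element names of the Dublin Core Metadata Element Set and stores the record as machine-readable JSON. Everything else is an assumption: the helper names, the choice of which elements a project treats as required, and all values in the example record, which are placeholders rather than a real dataset.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical project policy: Dublin Core itself makes every element optional.
REQUIRED = ("title", "creator", "date", "identifier", "rights")


def missing_elements(record, required=REQUIRED):
    """Return the required elements that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


def unknown_elements(record):
    """Return keys that are not Dublin Core element names."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = {
        "title": "Example survey dataset",
        "creator": "Example Researcher",
        "date": "2018-05-09",
        "format": "text/csv",
        "identifier": "https://example.org/dataset/1",
        "rights": "CC-BY 4.0",
    }
    print(json.dumps(record, indent=2))
    print("missing:", missing_elements(record))
```

Where a domain standard exists (see the metadata standards slide), its element names would replace the generic ones used here.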
Trustworthy Digital repository: either supports a repository standard or is certified
Metadata
Other data, including associated metadata, as specified and within the deadlines laid down in the Data Management Plan, that is, according to the individual judgement of each project: For instance curated data not directly attributable to a publication, or raw data.
Documentation: codebooks, lab journals, informed consent forms… required to enable reuse of the data.
Will people not involved in the project understand what the data is about and how it has been processed?
Readme file: in a plain text format, about your data:
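A plain-text readme can be generated from a few agreed headings so that every dataset in the project is described the same way. This is only a sketch: the function name render_readme and the section headings in the example are hypothetical, and the content a readme needs depends on the data and the discipline.

```python
def render_readme(title, sections):
    """Render a plain-text readme from a title and a dict of heading -> text."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections.items():
        # One upper-case heading per section, followed by its text.
        lines += [heading.upper(), body.strip(), ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    text = render_readme(
        "Survey data",
        {
            "Contents": "One CSV file.",
            "Methods": "Online questionnaire.",
            "Licence": "CC-BY",
        },
    )
    print(text)
```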
Keep it simple: there is no requirement that every dataset must be made open right now. Keep it feasible: privacy, valorization, ownership. Starting out by opening up just one dataset, or even one part of a large dataset, is fine – of course, the more datasets you can open up the better.
Open licenses: legally sound licensing
CC0: public domain, waive copyright
CC-BY: Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. No additional restrictions
NUP84 proteins.
MRC is a standard file format for electron density
The txt explains the parameters used
Catch-all repository for all kinds of research output: data publications, presentations, software…
Free of charge, with a limit of 50 GB per dataset.
Equipped with a helpdesk system. Specific questions: by subject or by your function.
Questions are answered by a team of specialists, not only on a technical level: people from all countries are involved and able to answer country-specific questions. The NOADs (National Open Access Desks) are representatives of open access for their country and can help you with any country-specific questions.