2. Blackboard
UT website, employees page
ORG-AA-BA-RESDATAMAN: Course Research Data Management
Course material: presentations, links to information, DMP template, datasets
After the course day: contact for support and feedback
3. Why research data management?
• Importance of quality, reliability, replicability and verification of scientific research
• Better and more efficient access to research data
• Requirements of research funders with regard to data management
• Data management will become an issue in research assessments
4. Benefits of research data management
• Improved research quality
• Improved efficiency
• Protection from data-related risks
• Enhanced reputation and prestige
5. Research Data Management: importance (1/2)
Scientific integrity (1), funder requirements (2) and developments in science (3)
(1) Fabrication, Falsification and Plagiarism (FFP) > RDM?
Neglect of basic preservation of data
Neglect of data management
No proper mechanism for quality control: without data or instruments for easy data reproduction, no check is possible
See also:
https://www.utwente.nl/en/organization/structure/management/good-management/
Netherlands Code of Conduct for Academic Practice: Verification section
6. Research Data Management: importance (2/2)
(2) NWO and EU Horizon 2020 data management pilots
Focus on open data and reuse
Data Management Plan
Data archived in data repository
NWO: http://www.nwo.nl/en/policies/open+science/data+management
EU H2020:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pi
(3) Developments in science
Data-intensive science (4th paradigm)
Data collections are future assets of research groups
7. What you will learn today
Data management planning: how to make a DMP, which issues to address and how to describe them (interactive)
Awareness of the importance of managing data after the research: data citation and publication (persistent identifiers) and proper data archiving
Knowledge about legal issues in data management
8. Programme
9:30 Introduction to Research Data Management (Dr. ir. Maarten van Bentum, data librarian, UT Library & Archive)
9:45 Data Management Planning (Dr. ir. Maarten van Bentum, data librarian, UT Library & Archive)
10:00 Small group assignment: writing a DMP section based on one of the research cases in the group (Dr. ir. Maarten van Bentum, data librarian, UT Library & Archive)
10:45 Break
11:00 Plenary presentations: each group presents the section it has prepared, and the rest of the teams act as the EU review committee (Dr. ir. Maarten van Bentum, data librarian, UT Library & Archive)
12:30 Lunch
13:30 Data citation: claiming data with DOIs, incl. small assignments (Ellen Verbakel, data librarian, TU Delft / 3TU.Datacentrum)
14:00 Hands-on: Data CV, ORCID (participants individually) (Ellen Verbakel, data librarian, TU Delft / 3TU.Datacentrum)
14:45 Data publications (Ellen Verbakel, data librarian, TU Delft / 3TU.Datacentrum)
15:00 Break
15:15 Data archive, Data Seal, DIY/DIT (Ellen Verbakel, data librarian, TU Delft / 3TU.Datacentrum)
15:30 Legal issues: data retention, data protection, privacy, ownership (Drs. Heiko Tjalsma, legal advisor, DANS)
16:30 Evaluation form: tell us what you think about this course
16:45 Closure
9. Data Management Plan – a definition
Formal research project document describing what data will be collected and how, how they will be stored, described and archived, and how access, reuse and linking to publications will be realised.
10. Data Management Plan - topics
Responsibility
Description of data
Methodology data collection
Documentation: metadata (standards)
Quality assurance
Storage and backup
Policies for access and sharing and provisions for appropriate protection/privacy
Policies and provisions for reuse, redistribution
Plans for archiving and preservation of access
From: National Science Foundation and University of California
11. Data Management Plan - templates
Information, templates and checklists
UT template: website RDM on Library & Archive
3TU.Datacentrum: template
DANS checklist
NWO form
12. Writing a DMP
6 small groups (data collection; data storage and backup; data documentation; data access; data sharing and reuse; data preservation and archiving)
Use UT template
Work with a research case or dataset from one of the group members
Plenary presentations and discussion (15 min each)
13. DMP - Data collection (1/1)
Type of data > what else should be considered an object of management: software, models, scripts, instruments, questionnaires, informed consent, etc.
Legal and contractual regulations: personal data? >
Dutch Personal Data Protection Act, http://www.utwente.nl/az/gegevensbescherming/ (in Dutch)
UT classification guideline for information and information systems (in Dutch)
Who collects the data: a third party? > contract about rights and licenses; example: bankruptcy of the research agency (see later: data access)
14. DMP - Data storage and backup (1/4)
Criteria
Sustainability/reliability: backup frequency (offline / off-site?)
Dataset type: raw dataset, versions during processing and analysis, final datasets
Size dataset: capacity, costs, data transfer
Legal or contractual regulations
Access: individual, community, open
15. DMP - Data storage and backup (2/4)
Storage options
1. UT central storage
p- or m-disk (ICTS): http://www.utwente.nl/icts/diensten/catalogus/dataopslag_mw/storage/
2. Project, community or research institute storage
IGS Datalab: https://www.utwente.nl/igs/datalab/
3. Individual data storage (computer, DVD/CD, external hard disk, …)
4. Non-commercial cloud storage
Surfdrive: https://www.surfdrive.nl/en
DataverseNL: https://dataverse.nl/dvn/
5. Commercial cloud storage: Dropbox, OneDrive, …
16. DMP - Data storage and backup (3/4)
University of Twente (ICTS) central storage (M: and P:)
  Advantages: full service; reliable, durable, secure; high-speed data transfer
  Disadvantages: no sharing outside UT
  Suitable for: saving large data files; master copy of data; use encryption for sensitive and critical data; use SURFfilesender for encrypted data transfer
PC or laptop
  Advantages: always available; portable; low cost; high-speed data transfer
  Disadvantages: sensitive to damage and loss (no automatic backup); no sharing
  Suitable for: saving large data files; temporary storage; use encryption for sensitive and critical data
Personal storage devices (USB flash drive, external hard drive, DVD/CD)
  Advantages: portable; low cost
  Disadvantages: easily damaged or lost (no automatic backup); not for sensitive or critical data; difficult sharing
  Suitable for: saving large data files; temporary storage of standard data
Non-commercial cloud services (for example, DataverseNL, SURFdrive)
  Advantages: automatic synchronization on several devices; easy access; external sharing
  Disadvantages: medium-speed data transfer; not for sensitive or critical data (SURFdrive: when encrypted)
  Suitable for: sharing standard data with external parties
Commercial cloud services (for example, Dropbox, Google Drive, OneDrive)
  Advantages: automatic synchronization on several devices; easy access; external sharing
  Disadvantages: medium-speed data transfer; not for sensitive or critical data; unclear access to data; unclear privacy regulations
  Suitable for: sharing standard data with external parties
17. DMP - Data storage and backup (4/4)
UT data policy
During the research the research data will be saved in a central repository which is available to at least the members of the research group/institute and which is managed by this research group/institute.
Storage and access should be managed in accordance with legal regulations, any third-party contractual requirements, etc.
Backup
3 copies (original, external/local, external/remote)
The choice between local and remote backup depends on the recovery time needed
Data transfer
https://www.utwente.nl/icts/en/diensten/catalogus/filesender/
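The three-copy backup rule above can be sketched in a few lines of Python. This is a minimal illustration, not a UT or ICTS tool: it assumes the local and remote backup locations are simply directories reachable from your machine (for example an external disk and a mounted network share; all paths here are hypothetical), and it checks every copy against a SHA-256 checksum of the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backups(original: Path, local_dir: Path, remote_dir: Path) -> list[Path]:
    """Copy `original` to a local and a remote backup directory (hypothetical
    locations) and check that each copy matches the original's checksum."""
    expected = sha256_of(original)
    copies = []
    for target_dir in (local_dir, remote_dir):
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also tries to preserve metadata such as the modification time
        copy = Path(shutil.copy2(original, target_dir / original.name))
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


if __name__ == "__main__":
    # Demonstration in a throw-away directory; real backup locations differ.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "data" / "survey_raw.csv"
        original.parent.mkdir()
        original.write_text("id,answer\n1,yes\n")
        for copy in make_backups(original, root / "local_backup", root / "remote_backup"):
            print(copy.relative_to(root))
```

Together with the original this gives the three copies; whether the "remote" directory is truly off-site depends entirely on where it is mounted.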
18. DMP - Data documentation (1/4)
Documentation of dynamic datasets during the research (for yourself and fellow researchers in the project and/or group)
Documentation of static datasets after the research (for discovery, verification, replication and reuse)
Documentation: standard metadata schemes enhanced with the specific descriptive elements necessary for verification, replication and reuse
See list: http://www.dcc.ac.uk/resources/metadata-standards/list
See also 3TU.Datacentrum Data description and formats
19. DMP - Data documentation (2/4)
Title: name of the dataset or research project that produced it
Creator: names and addresses of the organization or people who created the data, including all significant contributors
Identifier: the identification number used to identify the data, even if it's just an internal project reference number
Subject: keywords or phrases describing the subject or content of the data
Dates: key dates associated with the data, including: project start and end date; release date; other dates associated with the data lifespan, e.g. maintenance cycle, update schedule
Funders: organizations or agencies who funded the research
Language: language(s) of the intellectual content of the resource, when relevant
Location: where the data relate to a physical location, record information about their spatial coverage
Rights: description of any known intellectual property rights held for the data
List of file names and relationships: list of all digital files in the archive, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
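The descriptive elements above map naturally onto a small record that can be written out as a plain-text readme. The Python sketch below is one possible encoding; the class name, field names and the "Element: value" output layout are illustrative assumptions, not a metadata standard such as those on the DCC list, and all example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical record holding the descriptive elements listed above."""
    title: str
    creators: list[str]
    identifier: str  # may be just an internal project reference number
    subjects: list[str] = field(default_factory=list)
    dates: dict[str, str] = field(default_factory=dict)  # label -> date
    funders: list[str] = field(default_factory=list)
    language: str = ""
    location: str = ""
    rights: str = ""
    files: list[str] = field(default_factory=list)

    def to_readme(self) -> str:
        """Render the record as 'Element: value' lines, skipping empty elements."""
        lines = [
            f"Title: {self.title}",
            "Creator: " + "; ".join(self.creators),
            f"Identifier: {self.identifier}",
        ]
        if self.subjects:
            lines.append("Subject: " + ", ".join(self.subjects))
        for label, date in self.dates.items():
            lines.append(f"Date ({label}): {date}")
        if self.funders:
            lines.append("Funders: " + "; ".join(self.funders))
        for label, value in (("Language", self.language),
                             ("Location", self.location),
                             ("Rights", self.rights)):
            if value:
                lines.append(f"{label}: {value}")
        if self.files:
            lines.append("Files:")
            lines.extend(f"  {name}" for name in self.files)
        return "\n".join(lines) + "\n"


record = DatasetMetadata(
    title="Example project",  # made-up values for illustration
    creators=["A. Researcher", "B. Colleague"],
    identifier="PRJ-001",
    dates={"project start": "2015-01-01", "release": "2016-01-01"},
    files=["NWPalaceTR.WRL", "stone.mov"],
)
print(record.to_readme())
```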
20. DMP - Data documentation (3/4)
Formats: format(s) of the data, e.g. FITS, SPSS, HTML, JPEG
Methodology: how the data were generated, including equipment or software used, experimental protocol and other things you would include in your lab notebook; can reference a published article, if it covers everything
Workflows or analyses: to be able to reproduce your work
Sources: references to source material for data derived from other sources, including details of where the source data are held and how they are identified and accessed
Versions: date/time stamp each version and give it a separate ID (e.g. version number)
Checksums: to test whether your files have changed over time
Explanation of codes used in file names: brief explanation of any naming conventions or abbreviations used to label the files
List of codes used in files: list of any special values used in the data (e.g. codes for categorical survey responses, '999 indicates a "dummy" value in the data', etc.)
Store metadata in a text file (such as a readme file or codebook) in the same directory as the data
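Checksums and the text file stored next to the data can be combined in a checksum manifest. A minimal Python sketch, assuming SHA-256 as the algorithm and a hypothetical manifest name `checksums.sha256`; each line is written as `<digest>  <relative path>`, a layout that GNU `sha256sum -c` can also check when run from inside the data directory.

```python
import hashlib
from pathlib import Path

MANIFEST = "checksums.sha256"  # hypothetical file name


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path) -> Path:
    """Record a checksum for every file under data_dir (except the manifest)."""
    manifest = data_dir / MANIFEST
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path != manifest:
            rel = path.relative_to(data_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}\n")
    manifest.write_text("".join(lines))
    return manifest


def changed_files(data_dir: Path) -> list[str]:
    """Return listed files that are missing or whose checksum has changed."""
    changed = []
    for line in (data_dir / MANIFEST).read_text().splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)
        path = data_dir / rel
        if not path.is_file() or sha256_of(path) != digest:
            changed.append(rel)
    return changed
```

Run `write_manifest` when a dataset version is frozen and `changed_files` later; an empty list means no listed file has changed. Files added after the manifest was written are not reported by this sketch.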
21. DMP - Data documentation (4/4)
File naming conventions: http://guides.lib.purdue.edu/content.php?pid=440001&sid=4901667
Good directory structure:
The top-level directory name should include:
Project title
Unique identifier
Date (e.g. year)
The substructure should follow a clear, documented naming convention, e.g. for each run of an experiment, each version of a dataset, each person in the group.
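A naming convention is easiest to keep when it is generated rather than typed by hand. The Python sketch below builds a top-level directory name from the year, unique identifier and project title, plus one subdirectory per experiment run. The `<year>_<identifier>_<title>` order, the lower-case "slug" rules, the `run_NNN` pattern and the example identifier `UT-0042` are illustrative assumptions, not a UT or Purdue convention.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lower-case the text and replace every run of characters other than
    a-z and 0-9 with a single hyphen (accented letters are replaced too)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def project_dir(root: Path, title: str, identifier: str, year: int) -> Path:
    """Top-level directory: year, unique identifier and project title."""
    return root / f"{year}_{slugify(identifier)}_{slugify(title)}"


def run_dir(project: Path, run_number: int) -> Path:
    """Zero-padded run numbers keep directories in order when sorted by name."""
    return project / f"run_{run_number:03d}"


project = project_dir(Path("data"), "Soil Moisture Survey", "UT-0042", 2015)
print(run_dir(project, 7))  # data/2015_ut-0042_soil-moisture-survey/run_007 on POSIX
```

Whatever convention you choose, document it in the readme so that others can interpret the directory names.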
22. DMP - Data access (1/3)
- UT data policy?
- Funder requirements?
- Requirements other parties? Contracts?
- Open Access required? Possible? Dutch Personal Data Protection Act (UT Data Protection Officer)
23. DMP - Data access (2/3)
Data access                       M:    P:    DataverseNL  SURFdrive  Commercial cloud
internal group/organization       no    yes   yes          yes        yes
external group/organization       no    no    yes          yes        yes
on request                        no    no    yes          no         no
view/download rights management   no    yes   yes          yes        yes
edit rights management            no    yes   yes          yes        yes
collaborating on data             no    no    yes          yes        yes
M: = M: drive (home directory); P: = P: drive (group permissions); commercial cloud = Dropbox, etc.
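When comparing options, the table above can be treated as data. The Python sketch below transcribes it (the variable and function names are hypothetical) and lists the storage solutions marked "yes" for every requirement passed in. It encodes only the access features in the table; it does not capture the sensitivity restrictions noted elsewhere, such as DataverseNL and commercial cloud services not being suitable for privacy-sensitive data.

```python
SOLUTIONS = ("M: drive", "P: drive", "DataverseNL", "SURFdrive", "Commercial cloud")

# One row per access feature; values follow the column order of SOLUTIONS.
TABLE = {
    "internal group/organization":     (False, True,  True, True,  True),
    "external group/organization":     (False, False, True, True,  True),
    "on request":                      (False, False, True, False, False),
    "view/download rights management": (False, True,  True, True,  True),
    "edit rights management":          (False, True,  True, True,  True),
    "collaborating on data":           (False, False, True, True,  True),
}


def solutions_supporting(*requirements: str) -> list[str]:
    """Return the storage solutions marked 'yes' for every given requirement."""
    return [
        name
        for column, name in enumerate(SOLUTIONS)
        if all(TABLE[req][column] for req in requirements)
    ]


print(solutions_supporting("external group/organization", "collaborating on data"))
# ['DataverseNL', 'SURFdrive', 'Commercial cloud']
```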
24. DMP - Data access (3/3)
DataverseNL
dynamic data sets (file version control)
static data sets (release with persistent id)
access rights management
not for privacy sensitive data!
25. DMP - Data sharing and reuse (1/1)
Why share your data?
Replication / verification
Promote your research
Enable new discoveries (reuse)
"Open where possible, protected where needed"
See NWO policy http://www.nwo.nl/en/policies/open+science
After research: public, linked to publication(s) > DataverseNL, data centres
26. DMP - Data preservation and archiving (1/2)
UT data policy
Preferably during the research, but no later than 1 month after finishing the research, the research data are archived in a trusted repository (e.g. DANS or 3TU.Datacentrum). Taking legal regulations and any third-party contractual conditions into account, the research data are preferably made publicly available. This covers at least the research data that form the basis of publications about the research, but can also comprise the full set of raw and/or edited research data.
After the research, all durably stored research data and the publications based on those data are linked. This is at least the case for PhD dissertations.
27. DMP – Data preservation and archiving (2/2)
Data centres:
3TU.Datacentrum
DANS
List of data repositories: Databib or Data repositories
Speaker notes
General reasons for more attention to RDM
Specific benefits of good RDM
Costs time at the beginning, but saves time at the end and overall
Data loss, data corruption, unauthorized access (confidential data, privacy, …)
It is good to be able to show that your research is based on proper data creation and handling and that, partly because of that, it can be replicated. Some remarks: data as reference material.
Although this is still underestimated, linking data to a publication raises the value of that publication (more journals require data with the publication). Data in themselves can be seen as research output: data journals.
Data management is needed for these reasons (integrity), but also for other (scientific) users, funder obligations (OA) and other reasons.
To avoid any doubts about scientific integrity: practice is generally good, but there are some bad practices.
Criteria: Fabrication, Falsification and Plagiarism (FFP)
Fabrication of data (Stapel, Schön)
Untraceable data (Poldermans)
Neglect of basic preservation of data
Neglect of data management
No proper mechanism for quality control: no data or instruments for easy data reproduction means no possible check
NWO pilot
from 1-1-2015
7 rounds of funding
Data management section (based on 4 questions), followed by a DMP once funding has been awarded
EU H2020
from 1-1-2014
7 research areas
Data Management Plan required within six months after project grant
Deposit in a research data repository
Opting out of the pilot is possible, with justification
DMP regarded as living document
Data-intensive research: a new type of research (research without any lab or field time, and with more data than we can analyze)
Learn from the data management practices of researchers in other scientific fields.
Learn about different solutions, in many cases non-standard ones.
The term data curation is also often used. It is broader than data management: it also covers the technical side of data handling, both during and after the research, for instance how data centres handle data during preservation. It is therefore less suitable for describing how researchers handle their own data.
Good management starts with a data management plan.
person responsible for data management within your research project
description of the data and the methods used to collect or create the data
how data will be documented throughout the research project
how data quality will be assured
backup procedures
how data will be made available for public use and potential secondary uses
preservation plans
any exceptional arrangements that might be needed to protect participant confidentiality or intellectual property
UT data classification guideline (only in Dutch: informatiebeveiliging, classificatierichtlijn informatie en informatiesystemen)
See also access
(see also guidance DMP)
In general you need 3 copies: original, external/local, external/remote
Dataverse: we come back to this under data access
SURFfilesender: encrypted transfer
There are subject-based metadata schemes, but even these may be too generic for your data.
- Who decides? Can IP on data be claimed? Does any party claim IP?
UT data policy: no statement about ownership
What if the data collection is done by another, specific organisation (bankrupt? > curator?)
This afternoon more in presentation on legal issues
Hosted by DANS, data archive for social sciences and humanities
Store, describe data sets and give selected access
Keep all versions?
Just final version?
First and last?
DANS and 3TU.Datacentrum: Data Seal of Approval
Estimating costs: there is no tariff structure yet; 4.5 euro/GB. The invoice goes to the university; how this will be passed on to research projects is not yet clear.
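With the provisional rate in the note above, a first cost estimate is simple arithmetic. A minimal Python sketch, assuming the 4.5 euro/GB figure is a one-off charge proportional to the archived volume (the note does not say whether it is one-off or recurring, nor how GB is defined):

```python
from decimal import Decimal, ROUND_HALF_UP

# Provisional rate quoted in the notes; no tariff structure existed yet.
RATE_EUR_PER_GB = Decimal("4.5")


def archiving_cost(size_gb) -> Decimal:
    """Estimated cost in euro for size_gb gigabytes, rounded to whole cents.
    Assumes a one-off charge proportional to the archived volume."""
    cost = Decimal(str(size_gb)) * RATE_EUR_PER_GB
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(archiving_cost(120))  # 540.00
```

`Decimal` avoids binary floating-point surprises in money amounts; passing sizes as strings (e.g. "0.5") keeps them exact.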
More about data archiving, data citation, etc. in afternoon session