Slides of a lecture on research data management (RDM), given for 3rd year students (Eindhoven University of Technology, major Psychology & Technology), as part of the course 0HV90 Quantitative Research. At the end of the slides a handy summary 'Research data management basics in a nutshell' is added.
1. Information Expertise Center / Library
Leon Osinski, IEC
Course for 0HV90 OGO Quantitative research, 20-11-2019
Good (enough) research data management practices
Available under CC BY-SA 4.0 license, which permits copying and redistributing the
material in any medium or format for non-commercial purposes & adapting the material
for any purpose, provided the original author and source are credited & you distribute the
adapted material under the same license as the original
2. Recommendations*
1. Save your raw data
2. Backup your data
3. Organize your data
4. Make your data human friendly
5. Make your data machine friendly
6. Archive and preserve your data
*Based upon Eugene Barsky (2017), Good enough research data management: a very brief guide.
https://researchdata.library.ubc.ca/files/files/2017/07/GoodEnoughResearchDataManagement_V1.1_20170705.pdf
Good (enough) research data management practices: course for 0HV90, 20-11-2019
3. Keep your raw or original data raw
1. Save your raw data read-only in its original format in a separate folder
2. Make a working copy of your raw data (input data, used for processing or
analysis)
+ This version can be identical to the original version. In some cases it will be a
modified version. For example, modifications required to allow your software
to read the file or removing explanatory notes from a table
+ The original and working copy of a data file should be given different names
+ The changes you make to your original data files should be described in a
Readme file
3. Keep the metadata of the original data files (if obtained from others) in a
separate folder
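As an illustration of steps 1 and 2, here is a minimal Python sketch. The function name `protect_and_copy` and the `_working` suffix are assumptions made for this example and are not part of the lecture.

```python
import shutil
import stat
from pathlib import Path


def protect_and_copy(raw_file: Path, working_dir: Path) -> Path:
    """Make raw_file read-only and return a differently named working copy."""
    # Remove all write permissions so the raw file cannot be changed by accident.
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    working_dir.mkdir(parents=True, exist_ok=True)
    # The working copy gets a different name than the original (hypothetical suffix).
    working_copy = working_dir / f"{raw_file.stem}_working{raw_file.suffix}"
    # copyfile copies the content only, so the working copy stays writable.
    shutil.copyfile(raw_file, working_copy)
    return working_copy
```

Any change made to the working copy afterwards would still have to be described by hand in the Readme file.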
4. Backup your data
During research:
General backup rule: save 3 copies of your data, on 2 different devices, and 1 copy
off site
By using the local ICT infrastructure (university network servers (home drive) and
OneDrive (Office365)) this general rule is met. Moreover, your data is also stored
securely (with controlled access)
For sharing data with (project)partners use SURFdrive (on request for students)
For sending files use SURFfilesender
Do not store or share sensitive data on a commercial cloud (Dropbox, Google
Drive, WeTransfer)
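If you keep copies yourself, it can help to check that they are still identical to the original. The sketch below compares SHA-256 checksums; the function names are hypothetical and the check is an addition to the lecture, not a TU/e procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: list[Path]) -> bool:
    """True if every copy exists and has the same checksum as the original."""
    reference = sha256_of(original)
    return all(c.is_file() and sha256_of(c) == reference for c in copies)
```

A call such as `copies_match(original, [copy_on_second_device, off_site_copy])` only confirms that the files are identical; it says nothing about where they are stored.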
5. Organize your data
TIER documentation protocol: guiding principles
1. keep your raw or original data read-only in its original format in a separate folder
2. keep the working copy of your raw data (used for processing and analysis) in a
separate folder
3. keep the command files (files containing code written in the syntax of the
(statistical) software you use for the study) apart from the data
4. keep the analysis files (the fully cleaned and processed data files that you use to
generate the results reported in your paper) in a separate folder
5. store the metadata (codebook, description of variables, etc.) in a separate folder,
apart from the data itself
6. Organize your data
1. Main project folder (name of your research project/working title of your thesis or paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata (if the original data has been obtained from others)
1.2. Processing and analysis files
1.2.1. Importable data files (a working copy of the original data)
1.2.2. Command files (files containing the code)
1.2.3. Analysis files (the fully cleaned and processed data files)
1.3. Documents (codebook, readme file, final paper)
1.4. Literature
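The folder structure above could be created in one go with a short script. This is a sketch only: the folder names in `SUBFOLDERS` are illustrative spellings of the headings above, not names prescribed by the TIER protocol.

```python
from pathlib import Path

# Illustrative folder names derived from the outline above; adapt as needed.
SUBFOLDERS = [
    "original_data_and_metadata/original_data",
    "original_data_and_metadata/metadata",
    "processing_and_analysis/importable_data",
    "processing_and_analysis/command_files",
    "processing_and_analysis/analysis_files",
    "documents",
    "literature",
]


def create_project_folders(project_root: Path) -> list[Path]:
    """Create the project folder tree and return the created folders."""
    created = []
    for sub in SUBFOLDERS:
        folder = project_root / sub
        # exist_ok=True makes the script safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```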
7. Organize your data
File naming
Good file names are human readable:
meaningful (use descriptive names that contain info on content)
consistent (use file-naming conventions)
unique (distinguishes a file from files with similar subjects as well as different versions of
the file)
and machine readable/searchable
avoid using special characters in file names
use an underscore “_” to delimit units in names
use a hyphen “-” to delimit words within a unit for readability
include dates (format YYYYMMDD) and a version number in file names
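One possible way to apply these rules consistently is to generate and check file names with a small script. The pattern `YYYYMMDD_unit_..._vNN.ext` used below is an assumed convention for this example; the lecture does not prescribe a specific pattern.

```python
import re
from datetime import date

# Assumed convention: YYYYMMDD, one or more units, a two-digit version, an extension.
NAME_PATTERN = re.compile(r"^\d{8}(_[a-z0-9]+(-[a-z0-9]+)*)+_v\d{2}\.[a-z0-9]+$")


def build_file_name(day: date, units: list[str], version: int, extension: str) -> str:
    """Build a file name: hyphens join words in a unit, underscores join units."""
    # Keep only letters and digits, which drops spaces and special characters.
    cleaned = ["-".join(re.findall(r"[a-z0-9]+", unit.lower())) for unit in units]
    cleaned = [c for c in cleaned if c]
    parts = [day.strftime("%Y%m%d"), *cleaned, f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


def is_valid_name(name: str) -> bool:
    """Check a file name against the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `build_file_name(date(2019, 11, 20), ["Reaction time", "raw"], 1, "csv")` gives `20191120_reaction-time_raw_v01.csv`, while a name such as `final version (2).xlsx` is rejected by `is_valid_name`.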
8. Make your data human friendly
Describe your data in such a way that your data is understandable for humans
Describe your variables and give them clear descriptive names
Use standard names for nominal/categorical data within cells
Document your data by adding a readme file that at least mentions
+ the size of the dataset (number of variables and observations)
+ information about the variables and their measurement units (codebook)
+ what’s included and excluded in the dataset (why data are missing)
+ a description of each step of how the data is collected (study design) and processed
(provenance)
+ an explanation of the structure and naming of the files when your data consists of
multiple files organized in a folder
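Part of such a readme file can be drafted automatically. The sketch below assumes a UTF-8 CSV file with a header row; the function name `readme_skeleton` and the layout of the text are hypothetical, and the TODO items still have to be filled in by hand.

```python
import csv
from pathlib import Path


def readme_skeleton(csv_file: Path) -> str:
    """Draft a readme text with dataset size and a list of variables."""
    with csv_file.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumes the first row holds the variable names
        n_observations = sum(1 for row in reader if row)  # skip empty rows

    lines = [
        f"Dataset: {csv_file.name}",
        f"Number of variables: {len(header)}",
        f"Number of observations: {n_observations}",
        "",
        "Variables (name, description, unit):",
    ]
    lines += [f"- {name}: TODO description, TODO unit" for name in header]
    lines += [
        "",
        "Missing data: TODO what is excluded and why",
        "Collection and processing steps: TODO",
        "File structure and naming: TODO",
    ]
    return "\n".join(lines) + "\n"
```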
9. Make your data machine friendly
Ensure that your data can be easily processed, found and reused by computers
Tidy your (tabular) data
Convert your data to open, non-proprietary formats
Describe your dataset with a metadata standard for discovery (title, creators,
description, etc. of the dataset)
Add a user license to your data.
10. Make your data machine friendly
Tidy your (tabular) data
Each variable you measure is in one column
Column headers are variable names
Store units as metadata in their own column
Each observation is in a different row
Each cell contains only one piece of information
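To make these rules concrete, the sketch below reshapes a small "wide" table, in which the column headers are conditions rather than variable names, into a tidy one. The function name, the column names and the numbers are made up for illustration only.

```python
def wide_to_long(rows, id_column, value_columns, variable_name, value_name, unit):
    """Turn one row per subject into one row per observation."""
    tidy = []
    for row in rows:
        for column in value_columns:
            tidy.append({
                id_column: row[id_column],
                variable_name: column,     # the old column header becomes a value
                value_name: row[column],   # one measurement per cell
                "unit": unit,              # the unit is stored in its own column
            })
    return tidy


# Made-up example: reaction times measured in two conditions.
messy = [
    {"participant": "p01", "easy": 412, "hard": 655},
    {"participant": "p02", "easy": 398, "hard": 702},
]
tidy = wide_to_long(messy, "participant", ["easy", "hard"],
                    "condition", "reaction_time", "ms")
```

After the call, `tidy` holds four rows with the columns `participant`, `condition`, `reaction_time` and `unit`.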
11. Make your data machine friendly
[Figure: example of tidy data next to an example of messy data]
12. Make your data machine friendly
Use open non-proprietary formats
With open non-proprietary data formats it is best ensured that the data will
remain usable and ‘legible’ for computers in the future
Open formats are easy to use in a variety of software, e.g. .csv for tabular data
Check the data formats that are supported by a data archive like
4TU.ResearchData
13. Make your data machine friendly
Add metadata for discovery
Creator ; title ; short description + key words ; date(s) of data collection ;
publication year ; related publications ; DOI ; etc.
When uploading your data in a data archive like 4TU.ResearchData, you will be
asked to enter these metadata.
A DOI - a persistent identifier that uniquely identifies your dataset regardless of where it is
on the internet (URL) - is assigned by the data archive
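Before uploading, it can be useful to collect these metadata in a small machine-readable file. The sketch below is an assumption-laden example: the field names are illustrative, they do not follow the official DataCite schema or the 4TU.ResearchData upload form, and all values are placeholders.

```python
import json

# Illustrative field names; not an official metadata schema.
REQUIRED_FIELDS = ["creator", "title", "description", "keywords",
                   "collection_dates", "publication_year"]


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values only; replace them with the details of your own dataset.
example_record = {
    "creator": "A. Student",
    "title": "Example dataset",
    "description": "Short description of the data",
    "keywords": ["keyword one", "keyword two"],
    "collection_dates": "YYYY-MM-DD/YYYY-MM-DD",
    "publication_year": "YYYY",
    "related_publications": [],
    "doi": None,  # assigned by the data archive after upload
}

metadata_json = json.dumps(example_record, indent=2)
```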
14. Make your data machine friendly
Add a user license
With a user license you make clear in advance what other people are allowed to do
with your data, and under what conditions
Creative Commons license for data sets
GNU General Public License (GPL) for software
License selector ; Choose an open source license
15. Archive and preserve your data
Submit your final datasets to a data repository where your data are archived and preserved
for the long term and where your data will be published and made available to others
Choose a repository where other researchers of your discipline are sharing their
data
If none is available, use a general repository that at least assigns a DOI, requires
you to provide adequate metadata, and lets you select a user license
+ 4TU.Centre for Research Data, DANS, Zenodo
+ ARCHIE (TU/e, HTI)
16. FAIR research data management
If you follow these good data management practices, your data will be FAIR, that is:
Findable, Accessible, Interoperable, Reusable.
17. Furthermore, these good data management practices
reduce the risk of data loss
improve your research workflow
can help you get recognized for your work
can lead to novel insights because data can easily be combined
promote scientific integrity and quality of data (combat scientific fraud)
reduce the need for duplication of research and data collection
put publicly funded research results in the public sphere
promote collaboration as your data is findable and usable for other researchers
allow businesses and other organisations to profit from research (data) as well
contribute to better and more efficient science overall as your research results are more
accessible and usable
18. Support
General: rdmsupport@tue.nl
4TU.Centre for Research Data: researchdata@4tu.nl ; l.osinski@tue.nl ;
m.j.h.ollers@tue.nl
Data Coach (website): https://www.tue.nl/datacoach
Working with data (website): https://intranet.tue.nl/en/university/digital-university/data-stewardship/working-with-data/
Open Science Community Eindhoven: https://osceindhoven.github.io/
Online course Research data management basics (online training)
PROOF course Open Science (with a part on research data management) (training)
19. Recommended reading on ‘good data practices’
1. Eugene Barsky (2017), Good enough research data management: a very brief guide.
https://researchdata.library.ubc.ca/files/files/2017/07/GoodEnoughResearchDataManagement_V1.1_20170705.pdf
2. Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK (2017) Good enough practices in scientific
computing. PLOS Computational Biology, 13(6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510
3. Broman KW, Woo KH (2018) Data organization in spreadsheets. The American Statistician, 72(1): 2-10.
https://doi.org/10.1080/00031305.2017.1375989
4. Ellis SE, Leek JT (2017) How to share data for collaboration. PeerJ Preprints: e3139v1.
https://doi.org/10.7287/peerj.preprints.3139v1
20. Useful references
Save your raw data and backup your data
1. Storage, back up of data: https://www.ukdataservice.ac.uk/manage-data/store
2. University network servers: https://intranet.tue.nl/en/university/services/ict-services/ict-service-catalog/management-services/data-management-storage/ (TU/e intranet)
3. OneDrive: https://intranet.tue.nl/en/university/services/01-01-1970-information-management-services/help-and-support/manuals/user-support-systems/office-365/manual-transition-to-office-365/migration-to-office-365/step-4-put-onedrive-into-operation/faqs-onedrive/how-do-i-use-onedrive/
4. SURFdrive: https://www.surfdrive.nl/
5. SURFdrive (at TU/e): https://intranet.tue.nl/en/university/services/ict-services/ict-service-catalog/management-services/data-management-surfdrive (TU/e intranet)
21. Useful references
Organize your data
6. Organization of your data in folders according to the TIER documentation protocol: https://www.projecttier.org/tier-protocol/specifications/#overview-of-the-documentation
7. File naming: Best practices for file naming (Stanford University Libraries) and
http://www2.stat.duke.edu/~rcs46/lectures_2015/01-markdown-git/slides/naming-slides/naming-slides.pdf
Make your data human friendly
8. Readme file: https://researchdata.4tu.nl/fileadmin/user_upload/Documenten/Guidelines_for_creating_a_README_file.pdf
based upon Cornell University’s Guide to writing ‘readme’ style metadata:
https://data.research.cornell.edu/content/readme
22. Useful references
Make your data machine friendly
9. Tidy data: https://www.jstatsoft.org/article/view/v059i10
10. OpenRefine (tool for tidying data): http://openrefine.org
11. tidyr (R package for tidying data): http://tidyr.tidyverse.org/
12. PROOF course Practical data analysis using R for researchers: https://intranet.tue.nl/en/university/services/service-for-personnel-and-organization/human-resource-management/professional-development/proof-training-program/research-skills/practical-data-analysis-using-r-for-researchers/
13. Licensing your data with 4TU.ResearchData: https://researchdata.4tu.nl/en/use-4turesearchdata/archive-research-data/upload-your-data-in-our-data-archive/licencing/
14. Creative Commons licenses: https://creativecommons.org/
15. GNU General Public License: https://www.gnu.org/licenses/gpl-3.0.en.html
16. License selector: https://ufal.github.io/public-license-selector/
17. Choose an open source license: https://choosealicense.com/
18. Preferred data formats of 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/
19. DataCite metadata schema for discovery: https://schema.datacite.org/
23. Useful references
Archive and preserve your data
20. Research data catalogue Re3data.org: https://www.re3data.org/
21. Publishing data: 4TU.Centre for Research Data: https://researchdata.4tu.nl/en/
22. Self upload 4TU.ResearchData: https://data.4tu.nl/account/login/?next=/upload/
23. Publishing data: Zenodo: http://www.zenodo.org/
24. Publishing data: DANS: http://www.dans.knaw.nl/en
24. Research data management basics in a nutshell [1]
by Leon Osinski, Eindhoven University of Technology & 4TU.Centre for Research Data, November 2019
6 recommendations
1. Save your raw data
2. Backup your data
3. Organize your data
4. Make your data human friendly
5. Make your data machine friendly
6. Archive and preserve your data
Save your raw data
1. Keep your raw data raw by saving your raw data read-only in its original format in a separate folder
2. Make a working copy of your raw data (input data, used for processing or analysis)
+ This version can be identical to the original version. In some cases, it will be a modified version.
For example, modifications required to allow your software to read the file or removing
explanatory notes from a table
+ The original and working copy of a data file should be given different names
+ The changes you make to your original data files should be described in a Readme file
3. If the original data file has been obtained from others, keep the metadata of it in a separate folder.
Backup your data
During research:
1. Save 3 copies of your data, on 2 different devices, and 1 copy off site
2. By using the local ICT infrastructure (university network servers (home drives) and OneDrive
(Office365)) this general rule is met. Moreover, your data is also stored securely (controlled access)
3. For sharing data with (project)partners use SURFdrive (on request for students)
4. For sending files use SURFfilesender
5. Do not store or share sensitive data on a commercial cloud (Dropbox, Google Drive, WeTransfer)
Organize your data
1. Main project folder (name of your research project/working title of your thesis or paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata (if the original data has been obtained from others)
1.2. Processing and analysis files
1.2.1. Importable data files (a working copy of the original data)
1.2.2. Command files (files containing code written in the syntax of the statistical software)
1.2.3. Analysis files (the fully cleaned and processed data files used to generate the results)
1.3. Documents (codebook, readme file, final paper)
1.4. Literature
[1] Based upon Eugene Barsky (2017), Good enough research data management: a very brief guide.
https://researchdata.library.ubc.ca/files/files/2017/07/GoodEnoughResearchDataManagement_V1.1_20170705.pdf
25. Make your data human friendly
Describe your data in such a way that your data is understandable for humans
1. Describe your variables and give them clear descriptive names
2. Use standard names for nominal/categorical data within cells
3. Document your data by adding a readme file that at least mentions
+ the size of the dataset (number of variables and observations)
+ information about the variables and their measurement units (codebook)
+ what’s included and excluded in the dataset (why data are missing)
+ a description of each step of how the data is collected (study design) and processed
(provenance)
+ an explanation of the structure and naming of the files when your data consists of multiple files
organized in a folder
Make your data machine friendly
Ensure that your data can be easily processed, found and reused by computers
1. Tidy your (tabular) data
+ Each variable you measure is in one column
+ Column headers are variable names
+ Store units as metadata in their own column
+ Each observation is in a different row
+ Each cell contains only one piece of information
2. Convert your data to open, non-proprietary formats
3. Describe your dataset with a metadata standard for discovery (title, creators, description, etc. of the
dataset)
4. Add a user license to your data.
Archive and preserve your data
Submit your final datasets to a data repository where your data are archived and preserved for the long
term and your data will be published and made available to others
1. Choose a repository where other researchers of your discipline are sharing their data
2. If none is available, use a general repository that at least assigns a persistent identifier to your data (a
DOI), requires you to provide adequate metadata, and lets you select a user license of
your choice
+ 4TU.Centre for Research Data, DANS, Zenodo
Support
1. General RDM support: rdmsupport@tue.nl
2. 4TU.Centre for Research Data: researchdata@4tu.nl ; l.osinski@tue.nl ; m.j.h.ollers@tue.nl
3. Website Data Coach: https://www.tue.nl/datacoach
4. Website Working with data: https://intranet.tue.nl/en/university/digital-university/data-stewardship/working-with-data/
5. Training Online course Research data management basics
6. Training PROOF course Open Science (with a part on research data management)
7. Training Electronic lab notebook (ELN)
Available under CC BY license, which permits copying and redistributing the material in any medium or format &
adapting the material for any purpose, provided the original author and source are credited