This presentation describes best practices for how to write a data management plan for your research data. Additionally, it provides information about finding funder requirements, metadata standards, and repositories.
Data and Donuts: How to write a data management plan
1. How to write a data management plan
C. Tobin Magle, PhD
Jan 24, 2017
10:00-11:30 a.m.
Morgan Library Computer Classroom 173
*inspired by content from CU Boulder research computing
3. What is research data?
• “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
- White House Office of Management and Budget
• Reality: anything that is a (digital) product of your research
4. What is a data management plan?
A description of how you plan to describe, preserve, and share your research data.
Often required by funding agencies
5. Successful DMPs include
• A data inventory, including type(s) and size
• A strategy for describing the data
• A plan for preserving the data long term
• A method for access to the data
Always make sure to follow funder requirements
6. Data inventory
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
7. Data inventory
miRNA sequences
FASTQ files
1 GB per file x 64 strains x 3 replicates = ~200 GB
R scripts for analysis and visualization
Data use tutorials
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
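The size estimate in the example inventory can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming one FASTQ file per strain per replicate (the slide does not state the file count explicitly); the variable names are illustrative.

```python
# Rough storage estimate for the sequencing example.
# Assumption: one FASTQ file per strain per replicate.
GB_PER_FILE = 1      # approximate size of one FASTQ file
N_STRAINS = 64
N_REPLICATES = 3

n_files = N_STRAINS * N_REPLICATES
total_gb = n_files * GB_PER_FILE

print(f"{n_files} files, about {total_gb} GB in total")
```

The result, 192 GB, is what the slide rounds up to ~200 GB. Rounding up is a sensible habit in a DMP, since scripts and documentation add to the total.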
8. Data formats
• Avoid proprietary formats
• Know what software can read your data
Proprietary format                  Alternative format
Excel (.xls, .xlsx)                 Comma-separated values (.csv)
Word (.doc, .docx)                  Plain text (.txt)
PowerPoint (.ppt, .pptx)            PDF/A (.pdf)
Photoshop (.psd)                    TIFF (.tif, .tiff)
QuickTime (.mov)                    MPEG-4 (.mp4)
MPEG-4 protected audio (.m4p)       MP3 (.mp3)
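As an illustration of the CSV alternative, the sketch below writes a small table as comma-separated text using only Python's standard library. The sample sheet (column names and file names) is hypothetical and not part of the original project.

```python
import csv
import io

# Hypothetical sample sheet; column names and values are illustrative only.
header = ["strain", "replicate", "fastq_file"]
rows = [
    ["strain_01", 1, "strain_01_rep1.fastq"],
    ["strain_01", 2, "strain_01_rep2.fastq"],
]

def to_csv_text(header, rows):
    """Return the table as CSV text that any spreadsheet or script can read."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv_text(header, rows)
print(csv_text)
```

To save to disk instead, the same writer can be pointed at a file opened with `open(path, "w", newline="")`.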
9. Exercise: Data Inventory
What kind of data are you going to collect?
What file type will be produced?
What size will these files be? How many files?
What other research outputs will be produced?
10. A strategy for describing the data
• Metadata: Relevant information for re-creation and re-use
• Contact info
• How data was collected
• Details about collection
• Date, location of collection
• Units
• Can be as simple as a text file
11. Genomics example (README)
This project contains next-generation miRNA sequencing data from 64 mouse strains.
Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was
extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit.
The libraries were run on an Illumina MiSeq sequencer. The FASTQ files produced were
analyzed in R using Bioconductor.
The data and descriptive metadata will be made available on NCBI in the BioProject
(PRJXXXX). The scripts used to analyze the data are available on GitHub (URL). Tutorials
for data use will be made available in the Digital Collections of Colorado (handle).
Contact Tobin Magle (tobin.magle@colostate.edu) for more information.
http://orcid.org/0000-0003-3185-7034
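A README like the one above can also be generated from a short list of fields, which helps keep descriptions consistent across datasets. This is a minimal sketch: the field names and the `build_readme` helper are hypothetical, and the "TODO" entries mark details the example README does not give.

```python
# Metadata fields for a plain-text README.
# Field names are hypothetical; values come from the example README,
# and "TODO" marks details that the example does not state.
metadata = {
    "Title": "miRNA sequencing data from 64 mouse strains",
    "Contact": "Tobin Magle (tobin.magle@colostate.edu)",
    "Data collection": "Illumina sequencing of miRNA libraries from mouse brain tissue",
    "File format": "FASTQ",
    "Date of collection": "TODO",
    "Location of collection": "TODO",
}

def build_readme(fields):
    """Return README text with one 'Field: value' line per metadata field."""
    return "\n".join(f"{name}: {value}" for name, value in fields.items()) + "\n"

readme_text = build_readme(metadata)
print(readme_text)
```

Writing `readme_text` to a file named README.txt next to the data keeps the description and the data together.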
12. Metadata standards
• Dublin Core: http://dublincore.org/documents/dcmi-terms/
• Can be applied to anything
• Many discipline specific metadata standards
• EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html
• MIAME: http://fged.org/projects/miame/
• Search for other standards:
• http://www.dcc.ac.uk/resources/metadata-standards
• https://biosharing.org/standards/
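For a sense of what a Dublin Core description looks like in XML, here is a minimal sketch using Python's standard library. The wrapper element name `metadata` and the `dublin_core_record` helper are assumptions made for illustration. The element names (title, creator, format, identifier) are from the Dublin Core element set, and the values are taken from the genomics example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a small XML record with one dc: element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "miRNA sequencing data from 64 mouse strains",
    "creator": "Tobin Magle",
    "format": "FASTQ",
    "identifier": "PRJXXXX",  # placeholder, as in the example README
})
print(record)
```

A repository will usually supply its own form or template for these fields, so a hand-built record like this is mainly useful for understanding the structure.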
14. Exercise: Describe your data
What do people need to know to reuse your data?
Are there any discipline-specific metadata standards?
What format will you describe your data in (text, XML, tabular)?
What fields will you include (author, date, format, identifier)?
15. A plan for preserving the data long term
• What will you do to ensure data are properly stored and preserved?
• Include metadata and other products needed for reuse
• Might change over course of the project
16. Preservation questions
• What will you store?
• Who will be in charge?
• How long will you store it?
• Where will you store it?
• Multiple copies
17. Recommendations for backing up data
• Store in geographically distinct locations
• Automation: Will you remember to do it manually?
• Security: Are you working with PHI (protected health information)?
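One way to automate a backup step is a small script that copies a file and confirms the copy is intact. The sketch below is an illustration only, not a full backup system: the function names are hypothetical, and the SHA-256 check is an added safeguard rather than something prescribed on the slide.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_file(source, backup_dir):
    """Copy source into backup_dir, verify the copy, and return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copies contents and timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Checksum mismatch for {target}")
    return target
```

Running a script like this on a schedule (for example with cron or Task Scheduler) removes the need to remember manual copies. The backup directory should be on a different device or at a different site from the original.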
18. Exercise: Preservation plan
What will you store?
Who will be responsible for the data (person or position)?
How long will you store it?
Where will you store it?
How will you back it up?
19. A method to access the data
• Important to funding agencies
• Reproduce existing research
• Promote further research
• Must be easily available:
• No “by request only”
• Embargoes are “ok”
• Data security: consider privacy and IP issues before sharing
20. Data access and sharing best practices
• Non-proprietary formats
• Include metadata
• Proper storage
• Stable identifier
• Licensing: conditions for reuse
21. Trusted Repositories: store and share
• Discipline specific repositories
• Search: http://service.re3data.org/browse/by-subject/
• Generic:
• Figshare - https://figshare.com/
• Dryad - http://datadryad.org/
• CSU Digital Repository:
• http://lib.colostate.edu/digital-collections/
22. Data archiving service
• Finished products for sharing
• CSU Digital Repository
• Over 100 datasets
• Satisfy requirements for manuscripts and grants
• No cost for less than 1 TB
• $150/TB for 5 years
• $300/TB for >5 years
23. Stable identifiers
• URLs break
• Stable identifiers are permanent in a database
• Some provide linking capabilities
• DOI: https://doi.org/10.1109/5.771073
• Handle: http://hdl.handle.net/10217/177356
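Both example links above follow the same pattern: a resolver address followed by the identifier itself. A minimal sketch, with hypothetical helper names, shows how a stored identifier turns into a link:

```python
# Resolver prefixes, as they appear in the two example links.
DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "http://hdl.handle.net/"

def doi_url(doi):
    """Return the resolver link for a DOI such as '10.1109/5.771073'."""
    return DOI_RESOLVER + doi

def handle_url(handle):
    """Return the resolver link for a Handle such as '10217/177356'."""
    return HANDLE_RESOLVER + handle

print(doi_url("10.1109/5.771073"))
print(handle_url("10217/177356"))
```

Because the resolver redirects to wherever the item currently lives, citing the identifier rather than the landing-page URL keeps the link working if the repository reorganizes its site.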
24. Licensing
• State your conditions for reuse
• Paper citation?
• Disclaimers
• Must justify limitations, describe how you’ll advertise them
• Creative Commons licenses are a good starting point
25. Exercise: Access methods
Where will people be able to access the data?
Does your discipline have a repository?
What kind of stable identifier will it have?
What are the conditions for reuse?
Are there any limitations to use of these data? Why?
26. DMPTool
• Review requirements from different agencies
• https://dmptool.org/guidance
• Create new DMPs based on funding agency templates
• Search public DMPs
27. Need help?
• Email: tobin.magle@colostate.edu
• DMPTool: http://dmptool.org/
• Data Management Services website: http://lib.colostate.edu/services/data-management