This document provides guidance on creating a data management plan (DMP). It explains that DMPs are required by many funders to help researchers better organize, document, and preserve their data. The key parts of a DMP include describing the data, metadata standards, data security, archiving and preservation, and access. The presenter provides tips for addressing each part, such as using open formats and partnering with repositories. Resources for creating a DMP at the University of Wisconsin-Milwaukee are also listed.
Creating a Data Management Plan
1. Creating a Data Management Plan
Kristin Briney, PhD
Data Services Librarian
2. This Session Will Answer
• Why am I being asked to create a DMP?
• What are the key parts of a DMP?
• How do I translate my research to each of
these parts?
3. You Will Leave With
• An understanding of the main parts of a data
management plan
• Knowledge of where to find resources and
assistance
• A rough outline of your data management plan
4. WHY AM I BEING ASKED TO CREATE
A DATA MANAGEMENT PLAN?
5. Why Data? Why Now?
• Data are DIGITAL
– Easy to copy and share
– Difficult to preserve
• Data are COMPUTABLE
– New avenues of research like data mining
• Data represent a FINANCIAL INVESTMENT
– Poor research funding climate
– Can no longer ignore data as a scholarly product
6. Many Funders Require DMPs
• NSF
• NEH
• NIH
• NOAA
• NASA
• …even more funders will require DMPs soon!
– White House OSTP Public Access memo
7. The Funder Perspective
• Data are a scholarly resource
– Data sharing akin to scholarly publishing
• Barriers to sharing are
– Organization
– Documentation
– Long-term management and preservation
Hence data management plans
8. DMPs Help You Too!
• Don’t lose data
• Find data more easily
• Easier to analyze organized, documented data
• Avoid accusations of fraud & misconduct
• Get credit for your data
• Don’t drown in irrelevant data!
9. For each minute of planning at
beginning of a project, you will save
10 minutes of headache later
10. DMPs Help You Too!
A data management plan will make conducting
research easier for you…
…So if you are required to create a DMP, why
not use it to improve your practices?
11. WHAT ARE THE KEY PARTS OF A
DATA MANAGEMENT PLAN?
12. Actual NSF DMP Requirements
• The types of data, samples, physical
collections, software, curriculum materials, and other
materials to be produced in the course of the project
• The standards to be used for data and metadata format and
content
• Policies for access and sharing including provisions for
appropriate protection of
privacy, confidentiality, security, intellectual property, or
other rights or requirements
• Policies and provisions for re-use, re-distribution, and the
production of derivatives
• Plans for archiving data, samples, and other research
products, and for preservation of access to them
http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp#dmp
13. Key Questions
1. What data will I create?
2. What standards will I use to document the
data?
3. How will I protect private/secure/confidential
data?
4. How will I archive and preserve the data?
5. How will I provide access to and allow reuse
of the data?
14. Be Aware
• Actual requirements vary by funder and
division
• Look up your requirements before you write
your DMP
15. HOW DO I TRANSLATE MY RESEARCH
TO EACH OF THESE PARTS?
17. What Are Data?
• “Research data is defined as the recorded
factual material commonly accepted in the
scientific community as necessary to validate
research findings”
– OMB Circular A-110
http://www.whitehouse.gov/omb/circulars_a110
18. What Are Data?
• Observational
– Sensor data, telemetry, survey data, sample
data, images
• Experimental
– Gene sequences, chromatograms, toroid magnetic
field data
• Simulation
– Climate models, economic models
• Derived or compiled
– Text and data mining, compiled database, 3D
models, data gathered from public documents
19. What Not To Share
• Laboratory notebooks
• Preliminary analyses
• Drafts of scientific papers
• Plans for future research
• Peer reviews or communications with
colleagues
• Physical Samples
20. No Data?
• Still need a data management plan
• Plans with no data and no sharing will likely be
examined more closely
– Carefully explain situation if you are in this
position
21. Exercise
• Conduct a quick inventory of the data you will
acquire
– What data will you collect?
– Is your data unique?
– How big will the data be?
– How fast will the data grow?
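The inventory questions above can be partly answered by script if some data already exist. The following is a minimal sketch, assuming your files live under a single folder; the function name `inventory` is hypothetical and not part of the original presentation.

```python
import os
from collections import defaultdict

def inventory(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other non-regular entries
            # Group by lowercase extension; files without one go under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += os.path.getsize(path)
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Running `inventory("path/to/your/data")` at regular intervals and comparing the totals gives a rough estimate of how fast the data grow.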
23. What would someone unfamiliar
with your data need in order to find,
evaluate, understand, and reuse
them?
24. Documentation
• Consider the difference in documenting for
– someone inside your lab
– someone outside your lab but in your field
– someone outside your field
• Audience matters!
25. Documentation
Methods
• How the data were
gathered
• How the data should be
interpreted
• What you did
– Limitations on what you did
• …build trust in your data
Metadata
• What you’re looking at
• Who made it and when
• How it got there
• What it means
• What you can do with it
• …before you even look at
the file
26. Methods
• Examples of methods to document
– Code
– Survey
– Codebook
– Data dictionary
– Anything that lets someone reproduce your results
• Don’t forget the units!
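As an illustration of a data dictionary that records units, here is a minimal sketch. The variable names and descriptions are hypothetical examples, and the three-column layout is an assumption rather than a required standard.

```python
import csv
import io

# Hypothetical variables for illustration; replace with your own columns.
DATA_DICTIONARY = [
    {"variable": "sample_id", "description": "Unique sample label", "units": "none"},
    {"variable": "temperature", "description": "Bath temperature at measurement", "units": "degrees Celsius"},
    {"variable": "elapsed", "description": "Time since start of run", "units": "seconds"},
]

def missing_units(entries):
    """Return the names of variables whose 'units' field is empty or absent."""
    return [e["variable"] for e in entries if not (e.get("units") or "").strip()]

def to_csv(entries):
    """Serialize the data dictionary as CSV text so it can be stored with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["variable", "description", "units"])
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()
```

Checking `missing_units(DATA_DICTIONARY)` before depositing data is one simple way to catch a forgotten unit.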
27. Metadata
• Look for a metadata scheme before you collect
the data!
– Lots of metadata schemas available
– Easier to record metadata when collecting data than
to convert later
• Consult
– Disciplinary repository
• Repositories usually have required metadata schemas
– Your peers
– Subject librarian
28. Metadata Example: Dublin Core
• Contributor
– Jane Collaborator
• Creator
– Kristin Briney
• Date
– 2013 Apr 15
• Description
– A microscopy image of
cancerous breast tissues
under 20x zoom. This image is
my control, so it has only the
standard staining described on
2013 Feb 2 in my notebook.
• Format
– JPEG
• Identifier
– IMG00057.jpg
• Relation
– Same sample as images
IMG00056.jpg and
IMG00055.jpg
• Subject
– Breast cancer
• Title
– Cancerous breast tissue
control
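The Dublin Core record above could be written out as XML so it travels with the image. This is a minimal sketch using Python's standard library; the `metadata` wrapper element and the function name are assumptions, and a repository may require a different wrapper or schema. The date is written in ISO 8601 form, which is a choice made here and not taken from the slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a wrapper element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element name
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

record = dublin_core_record([
    ("contributor", "Jane Collaborator"),
    ("creator", "Kristin Briney"),
    ("date", "2013-04-15"),  # ISO 8601 form of the slide's "2013 Apr 15"
    ("format", "JPEG"),
    ("identifier", "IMG00057.jpg"),
    ("relation", "Same sample as images IMG00056.jpg and IMG00055.jpg"),
    ("subject", "Breast cancer"),
    ("title", "Cancerous breast tissue control"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

The description field is omitted only for brevity; it can be added as another `("description", "...")` pair.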
29. Exercise
• What methods information do you need to
preserve?
• What metadata standard will you use for your
data? -OR- Who will you contact to find a
relevant standard?
30. 3. HOW WILL I PROTECT
PRIVATE/SECURE/CONFIDENTIAL DATA?
31. Security Issues
• Does your data fall under the following?
– HIPAA
• Health information
– FERPA
• Student information
– FISMA
• Government subcontractor
– Human subject research, etc.
Ask for help!
32. Security Issues
• Secure storage
• Controlled access
• De-identification of personal information
• Security training
33. Security Questions
• Access permissions
– Who is allowed to access the data?
• Sharing
– Am I required to share? Can I actually share?
– Despite requirements, some data can’t be shared
• Responsibility
– Who will make sure the data stays secure?
34. UWM Security Resources
• UWM Information Security Office
– Visit: https://www4.uwm.edu/itsecurity/
– Email: infosec@uwm.edu
• Certificate in Information Security
• HIPAA
– https://www4.uwm.edu/legal/hipaa/index.cfm
• FERPA
– http://www4.uwm.edu/academics/ferpa.cfm
35. Exercise
• Do any regulations apply to your data?
• If so, who is allowed to access your secure
data? Who will be responsible for data
security?
37. Archiving Is Not Storage
• Storage is keeping files to access
• Archiving is about preservation
– Data should be readable and usable
– Data should be uncorrupted
• We can’t read some digital files from 10 years
ago
– This is what good digital preservation solves
38. Side Note
• If federally funded, you are required to retain
your data “for a period of three years from the
date of submission of the final expenditure
report.” AT LEAST.
• Better to keep on hand for at least 6 years
– Recent retraction of a 6-year-old paper for failure to
provide original data
• Preservation not an abstract issue
http://www.whitehouse.gov/omb/circulars_a110#53
http://retractionwatch.wordpress.com/2013/07/19/jci-paper-retracted-for-duplicated-panels-after-authors-cant-provide-original-data/
39. File Formats
• Easy way to ensure long-term usability
• Use open file formats
– Open and standardized
– Well documented
– In wide use
– Examples: .txt, .tiff, .csv, .dbf
• Transform your data now, not later
– Keep both file types
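The advice to transform data now and keep both file types might look like the sketch below. It assumes the values have already been read out of the original file (how to do that depends on your instrument or software) and only shows writing a `.csv` copy beside the original. The function name `export_open_copy` is hypothetical.

```python
import csv
from pathlib import Path

def export_open_copy(original_path, header, rows):
    """Write rows to a .csv next to original_path; the original is left untouched."""
    original_path = Path(original_path)
    csv_path = original_path.with_suffix(".csv")
    if csv_path == original_path:
        # Refuse to overwrite the source when it is already a .csv file.
        raise ValueError("original file already has a .csv extension")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # put units in the column names, e.g. time_s
        writer.writerows(rows)
    return csv_path
```

For example, `export_open_copy("run1.dat", ["time_s", "signal_mV"], rows)` would create `run1.csv` in the same folder and leave `run1.dat` in place.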
40. Other Preservation Concerns
• Obsolescence
– Preserve software along with data
• Deterioration
– Keep more than 1 copy to avoid corruption
• Media
– e.g., can you still read a floppy disk?
– Periodically move data off outdated media
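One common way to detect the deterioration mentioned above is to record a checksum for each file and recompute it later, for instance after moving data to new media. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical, and this is an illustration rather than a procedure prescribed by the presentation.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_files(root, manifest):
    """List files from an earlier manifest that are now missing or different."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

Saving the manifest alongside each copy lets you check whether a copy is still intact before relying on it.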
41. Find a Trustworthy Partner
• Find outside help
– Servers come and go, so do labs
• Off campus
– Disciplinary data repository
– Journal that accepts data
• Let someone else worry about this
42. Exercise
• What open file formats will you use to help
preserve your data?
• If there isn’t an adequate open format, what
software and hardware will you preserve?
43. 5. HOW WILL I PROVIDE ACCESS TO
AND ALLOW REUSE OF THE DATA?
44. Why Share?
• Get more credit for your work
– In “studies that created gene expression
microarray data, we found that studies that made
data available in a public repository received 9% …
more citations than similar studies for which the
data was not made available”
– “The citation boost varied with date of dataset
deposition: a citation boost was most clear for
papers published in 2004 and 2005, at about 30%”
• Get credit for unpublishable results
https://peerj.com/preprints/1/ (2013 study)
45. Why Share?
• Make your funder happy
• Helps you find and use your data later
• Disprove misconduct or fraud accusations
• Stimulate new research
46. Audience
• Who is the audience for this data?
– Coworkers?
– Disciplinary/institutional colleague?
– Researchers in allied fields?
– Anyone?
• Audience will determine how to share the
data
47. Ways To Provide Access
• Hands-off options preferable
– Journal
– Disciplinary repository
• Embargoes may be possible here
– UWM Digital Commons
• Small, discrete datasets
• Other options
– By request
– On your lab website
48. Exercise
• Who is the audience for your data?
• Which way will you provide access?
50. Resources
• Data Services Librarian
– briney@uwm.edu
• Data management information
– dataplan.uwm.edu
• UWM Information Security Office
– infosec@uwm.edu
51. Thank You
• This presentation is available on Slideshare
– http://www.slideshare.net/kbriney
• The content of this presentation is licensed under a Creative
Commons Attribution 3.0 Unported License (CC BY)
• Some content used with permission from Brad Houston and
Dorothea Salo