Data Management for Undergraduate Researchers

Office of Undergraduate Research Seminar and Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
September 21, 2015

• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions

What is data management?
The process of controlling the
information (read: data) generated
during a research project.
https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html

What are data?
“The recorded factual material
commonly accepted in the research
community as necessary to validate
research findings.”
- U.S. OMB Circular A-110

Why manage data?
• Save time and increase efficiency
• Meet grant requirements
• Promote reproducible research
• Enable new discoveries from your data
• Make the results of publicly funded research
publicly available

We are trying to avoid
this scenario…

Two bears: data
management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data was in a format that could only be read by
outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee

Scenario
You develop a research project during your
undergraduate experience. You write up the
results, which are accepted by a reputable
journal. People start citing your work! Three
years later someone accuses you of falsifying
your work.
Scenario adapted from MANTRA training
module

• Would you be able to prove you did the
work as you described in the article?
• What would you need to prove you hadn’t
falsified the data?
• What should you have done throughout
your research study to be able to prove
you did the work as described?

Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?

Research Data Lifecycle
Courtesy of the UK Data Archive
http://www.data-archive.ac.uk/create-manage/life-cycle
• Types of data
• Data description
• Data storage
• Data sharing
• Data archiving and
responsibility
• Data management costs

MyData.xls
MeetingNotes.doc
Presentation.ppt
Assignment1.pdf

File naming best practices
1. Be descriptive
2. Don’t be generic
3. Appropriate length
4. Be consistent
5. Think critically about
your file names

File naming best practices
• Files should include only letters,
numbers, and underscores/dashes.
• No special characters
• No spaces; use dashes, underscores, or
camel case (like-this or likeThis)
• Not all systems are case sensitive.
Assume this, THIS, and tHiS are the
same.
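The naming rules above can be checked automatically. The following is a minimal sketch, not a standard tool: the function names `is_safe_name` and `case_collisions` are hypothetical, and the pattern assumes a single file extension made of letters and digits.

```python
import re

# Assumption: a "safe" name is letters, digits, underscores, and dashes,
# optionally followed by one dot-separated extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_name(filename: str) -> bool:
    """Return True if the name avoids spaces and special characters."""
    return SAFE_NAME.fullmatch(filename) is not None


def case_collisions(filenames: list[str]) -> list[list[str]]:
    """Group names that differ only by case (this, THIS, tHiS)."""
    groups: dict[str, list[str]] = {}
    for name in filenames:
        groups.setdefault(name.lower(), []).append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Running `is_safe_name` over a folder listing before sharing it is one way to catch spaces and special characters early; `case_collisions` flags names that a case-insensitive system would treat as the same file.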

Version Control - Numbering
Use leading zeros for scalability
With leading zeros, files sort in order:
001
002
003
009
010
099
Without leading zeros, files sort out of order:
1
10
2
3
9
99
Bonus Tip: Use whole numbers (v1, v2, v3) for major version
changes and decimals for minor changes (v1.1, v2.6)
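The effect of leading zeros on sort order can be reproduced in a few lines. This is an illustrative sketch; the stem `sample` and the helper `numbered_name` are made up for the example.

```python
def numbered_name(stem: str, number: int, width: int = 3) -> str:
    """Build a name such as sample_009, padding the number with leading zeros."""
    return f"{stem}_{number:0{width}d}"


numbers = [1, 2, 3, 9, 10, 99]

# File browsers usually sort names as text, which is what sorted() does here.
padded = sorted(numbered_name("sample", n) for n in numbers)
unpadded = sorted(f"sample_{n}" for n in numbers)

print(padded)    # ['sample_001', 'sample_002', 'sample_003', 'sample_009', 'sample_010', 'sample_099']
print(unpadded)  # ['sample_1', 'sample_10', 'sample_2', 'sample_3', 'sample_9', 'sample_99']
```

A width of three allows up to 999 files; choose a wider field if the series could grow past that.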

Version Control - Dates
If using dates use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too 
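A short sketch of why year-first dates work well in file names: they sort chronologically even when treated as plain text. The helper `date_stamp` is a hypothetical name; the formatting relies on Python's standard `strftime` codes.

```python
from datetime import date


def date_stamp(d: date, dashes: bool = False) -> str:
    """Format a date as YYYYMMDD, or YYYY-MM-DD when dashes=True."""
    return d.strftime("%Y-%m-%d" if dashes else "%Y%m%d")


earlier = date(2014, 12, 1)
later = date(2015, 6, 18)

# Year-first stamps sort in true chronological order.
print(sorted([date_stamp(later), date_stamp(earlier)]))
# ['20141201', '20150618']

# Month-first stamps do not: June 2015 lands before December 2014.
print(sorted([later.strftime("%m-%d-%Y"), earlier.strftime("%m-%d-%Y")]))
# ['06-18-2015', '12-01-2014']
```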

From a DMP…
“Each file name, for all types of data, will
contain the project acronym PUCCUK; a
reference to the file content (survey,
interview, media) and the date of an event
(such as the date of an interview).”
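A convention like the one quoted above is easier to follow when a small function builds the names. The sketch below is an assumption-laden illustration: the DMP excerpt does not specify the order of the parts, the separator, or the date format, so the underscore layout and YYYYMMDD stamp here are choices made for the example, and `dmp_file_name` is a hypothetical helper.

```python
from datetime import date

# Content types named in the DMP excerpt.
CONTENT_TYPES = {"survey", "interview", "media"}


def dmp_file_name(content: str, event_date: date, extension: str,
                  acronym: str = "PUCCUK") -> str:
    """Combine project acronym, content type, and event date into one name."""
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content!r}")
    # Assumed layout: ACRONYM_content_YYYYMMDD.ext
    return f"{acronym}_{content}_{event_date:%Y%m%d}.{extension}"


print(dmp_file_name("interview", date(2015, 6, 18), "docx"))
# PUCCUK_interview_20150618.docx
```

Rejecting unknown content types keeps the convention consistent across everyone on the project.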

Who filed better?
• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL

File organization best
practices
• Top level folder should include project title
and date.
• Sub-structure should have a clear and
consistent naming convention.
• Document your structure in a README
text file.
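One way to apply these practices is to create the folder structure and its README in a single step, so the documentation never lags behind the layout. This is a minimal sketch; `create_project`, the subfolder names, and the README wording are all illustrative assumptions.

```python
from pathlib import Path


def create_project(root: Path, title: str, year: int,
                   subfolders: list[str]) -> Path:
    """Create a top-level folder named title_year, its subfolders, and a README."""
    project = root / f"{title}_{year}"
    project.mkdir(parents=True, exist_ok=True)

    lines = [f"Project: {title} ({year})", "", "Folder structure:"]
    for name in subfolders:
        (project / name).mkdir(exist_ok=True)
        lines.append(f"- {name}/")

    # The README records the structure so a newcomer can navigate it.
    (project / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project
```

For example, `create_project(Path.home(), "SoilSamples", 2014, ["raw_data", "analysis", "documentation"])` would create `SoilSamples_2014` with three subfolders and a `README.txt` listing them.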

Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how
insignificant or buggy)

Three levels of
documentation
• Project level – what the study set out to do, research
questions, methods, sampling frames, instruments,
protocols, members of the research team
• File or database level – How all the files relate to
one another. A README file is a classic way of capturing
this information.
• Variable or item level – Full label explaining the
meaning of each variable.
http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/
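Variable-level documentation is often kept as a codebook. The sketch below shows one possible shape for it, a small table with a full label for each variable, plus a check for columns that have no entry. The variable names, labels, and helper functions are hypothetical examples, not taken from a real study.

```python
import csv
import io

# Hypothetical codebook entries: one row per variable in the data file.
CODEBOOK = [
    {"variable": "ph", "label": "Soil pH measured in the field", "units": "pH units"},
    {"variable": "depth_cm", "label": "Sampling depth below the surface", "units": "centimeters"},
]


def codebook_csv(entries: list[dict[str, str]]) -> str:
    """Render the codebook as CSV text that can be saved next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["variable", "label", "units"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns: list[str], entries: list[dict[str, str]]) -> list[str]:
    """Return data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in columns if column not in documented]
```

Calling `undocumented(["ph", "depth_cm", "x1"], CODEBOOK)` would flag `x1`, a non-descriptive name with no explanation.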

Metadata
Unstructured Data
There was a study put out by Dr. Gary
Bradshaw from the University of
Nebraska Medical Center in 1982
called “Growth of Rodent Kidney
Cells in Serum Media and the Effect of
Viral Transformation On Growth”. It
concerns the cytology of kidney cells.
Structured Data
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
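The structured version of this record is what makes it searchable by machine. As a rough sketch, the same fields can be held as key-value pairs and written out as JSON; the lowercase key names and the `find_by_field` helper are assumptions for illustration, not a formal metadata schema.

```python
import json

# The structured record from the slide, stored as key-value pairs.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect "
             "of viral transformations on growth",
    "author": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}


def find_by_field(records: list[dict[str, str]], field: str,
                  value: str) -> list[dict[str, str]]:
    """Return every record whose field exactly matches the value."""
    return [r for r in records if r.get(field) == value]


# JSON is one common plain-text format for saving records like this.
print(json.dumps(record, indent=2))
```

With the sentence form, finding every study from 1982 means reading prose; with the structured form, `find_by_field(records, "date", "1982")` is enough.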

Disciplinary Metadata
Digital Curation Centre’s list of subject-specific metadata
schemas -
http://www.dcc.ac.uk/resources/metadata-standards

LOCKSS (Lots of
Copies Keep
Stuff Safe)

Options for data
storage
• Personal computers or laptops
• Networked drives
• External storage devices

Language from a DMP
“All data files will be stored on the University server that is backed
up nightly. The University's computing network is protected from
viruses by a firewall and anti-virus software. Digital recordings will
be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the
office. Interview recordings and transcripts, which may contain
personal information, will be password protected at file-level and
stored on the server.
Original versions of the files will always be kept on the server. If
copies of files are held on a laptop and edits made, their file names
will be changed.”

Archiving options
• Domain-specific repository
• General Purpose Data Repository
• Institutional repository

Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project

Questions?
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!
