This document discusses Dryad's process of developing a formal preservation policy. It provides context on Dryad as a digital repository and the benefits of data preservation. It then outlines Dryad's needs for a policy, the development process, key elements of the final policy, lessons learned in creating the policy, and open questions. The policy development involved input from a working group and staff over 18 months to balance ideals with practical realities.
Developing Preservation Policy for Dryad Digital Repository
1. It’s a Real World: Developing Preservation Policy for Dryad
Ayoung Yoon (Dryad preservation working group; Doctoral Candidate at UNC-CH)
Sara Mannheimer (Former Dryad curator; Data management librarian at Montana State University)
Elena Feinstein, Jane Greenberg, Ryan Scherle, Dryad Digital Repository
March 26, 2014
Research Data Access & Preservation Summit (RDAP) 2014
2. Outline
• Introduction
• What is Dryad Digital Repository?
• Preservation policy development process
• Dryad preservation policy
• Lessons learned and open questions
• Conclusion
• Acknowledgement
3. Introduction
• “Data deluge”
• Journal and funding agency mandates
• Benefits of archiving and preserving research data:
– Facilitates:
• verification of research
• accessibility and discoverability
• opportunities for data reuse
• increased citations
• research visibility
– Prevents:
• redundant data collection
• inefficient legacy data curation
• burden of sharing-on-request
• Challenges of data archiving:
– Wider variety of file formats than most digital archival materials
– New versions as data sets are added to and updated
– Security considerations
– Large amounts of data
Benefits adapted from Beagrie N, Lavoie BF, Woollard M (2010) Keeping research data safe 2. HEFCE
4. Why preservation policy?
• A preservation policy supports strategic planning for implementation
• Communicates to stakeholders
– trustworthiness and commitment to preservation
• Few data preservation policies exist. Some examples:
– CERN: CMS data
– Archaeology Data Service
– NSIDC Data Management Policies
– Odum Institute Preservation Policy
– ICPSR
– DataONE
5. Dryad Digital Repository
• A curated, general-purpose repository that makes the data underlying scientific and medical publications discoverable, freely reusable, and citable (http://datadryad.org/).
• Facilitates data availability, data sharing, and scholarly communication.
• Originally partnered with leading journals and scientific societies in evolutionary biology and ecology.
• Broad collecting policy: almost any data is accepted, as long as it is associated with a publication.
6. Common filetypes in Dryad
[Bar chart; axis scale 0 to 1000. File types shown: WAV, HTML, Phylip, R script, JPEG Image, Newick tree file, RTF, XML, GZip archive, MS Word OpenXML, MS Word 97-2007, Nexus, PDF, FASTA, MS Excel OpenXML, Zip archive, CSV, MS Excel 97-2007]
7. Dryad and Preservation Needs
• Preservation is a major part of Dryad’s mission.
• Current preservation actions:
– MD5 checksums
– provenance metadata
– informal encouragement of preferred formats
• Developing and implementing a formal preservation policy will:
– guide current and future preservation practice
– facilitate the long-term preservation of the repository’s digital assets
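The MD5 checksums listed among the current preservation actions above support fixity checking: a digest is recorded when a file is deposited and recomputed later to detect silent corruption. The following is a minimal sketch of that idea using Python's standard library. The function names `md5_checksum` and `verify_fixity` are hypothetical and chosen for illustration; the slides do not describe how Dryad actually computes or stores its checksums.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Compare a file's current MD5 digest with the value recorded at deposit.

    Hypothetical helper: where the recorded value is stored is an assumption
    left to the caller.
    """
    return md5_checksum(path) == recorded_checksum.lower()
```

A periodic audit could call `verify_fixity` for each stored file and flag any file for which it returns `False`. MD5 is adequate for detecting accidental corruption, but it is not collision-resistant, so it should not be relied on against deliberate tampering.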
8. Policy Development Process
• 2012: An initial preservation plan (version 1.0)
• Feb 2013: Preservation Working Group formed
• May 2013: Version 2.0 presented to the Dryad Board of Directors
• July 2013: Version 2.0 revised in cooperation with Dryad staff
• Nov 2013:
– Version 2.4 approved by Dryad Board of Directors
– Preservation Working Group dissolved; Preservation Task Force formed
9. Preservation Policy
• Purpose
• Scope and content coverage
• Overview of preservation strategies
• Format support and levels of preservation
– e.g. preferred formats and format support levels
• Implementing the strategy
– e.g. integration of OAIS functional activities, pre-ingest & ingest, archival storage, authenticity and integrity, security, versioning, and withdrawal of collections
• Sustainability plans
– e.g. technical sustainability, institutional and financial sustainability
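The idea of format support levels mentioned above can be illustrated with a small lookup that maps a file's extension to a support tier. This is only a sketch: the tier names (`preferred`, `accepted`, `unknown`) and the extension assignments are hypothetical, and the slides do not state which formats Dryad places at which level.

```python
from pathlib import Path

# Hypothetical tiers for illustration only; not Dryad's actual format policy.
SUPPORT_LEVELS = {
    ".csv": "preferred",
    ".txt": "preferred",
    ".xlsx": "accepted",
    ".xls": "accepted",
}


def support_level(filename: str) -> str:
    """Return the assumed support tier for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    return SUPPORT_LEVELS.get(suffix, "unknown")
```

A lookup like this could back the "informal encouragement of preferred formats" at deposit time, for example by suggesting a preferred-tier alternative when a depositor uploads a file in a lower tier.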
10. Lessons Learned and Open Questions
• A negotiation between what is ideal and what is realistic
– Adopting the international standards, models, and best practices that exist for long-term preservation
• Open Archival Information System (OAIS) reference model (ISO 14721:2003)
• PREMIS (PREservation Metadata: Implementation Strategies)
– Other standards and guidelines on audit and certification for building a trusted digital repository
• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) and Data Seal of Approval (DSA)
11. Lessons Learned and Open Questions
• Aligning with other internal and institutional policies
– Following Dryad’s internal policies: we looked primarily to Dryad’s Terms of Service document (https://datadryad.org/pages/policies), which includes policies on submission, content, payment, usage, and privacy
– Complying with Dryad’s unofficial policies, which have yet to be finalized
• A policy-in-progress: Dryad’s policy on versioning
– Complying with policies from partner institutions
• Dryad functions as a partnership between the University of North Carolina at Chapel Hill (UNC), Duke University (Duke), and North Carolina State University (NC State)
12. Lessons Learned and Open Questions
• Structuring the policy according to Dryad’s specific needs
– Meeting specific organizational needs is fundamentally important and should be the first consideration in all work, as each organization has different goals, priorities, and capabilities.
– Data depositors’ requirements: minimum requirements
• balance “minimum effort” against having “enough” representation information
• compensated by other factors
13. Conclusion
• Policy creation and planning are just first steps; implementation will require further consideration
• Future plans
– Potential for implementing TRAC / DSA in the future
– Divide policy and implementation into separate documents
– New Task Force
14. Acknowledgement
• This work was supported in part by the National Science Foundation (NSF), award number 1147166, “ABI Development: Dryad: scalable and sustainable infrastructure for the publication of data.”
15. Thank you!
Ayoung Yoon
Doctoral candidate, University of North Carolina at Chapel Hill
ayyoon@email.unc.edu
Sara Mannheimer
Data management librarian, Montana State University
sara.mannheimer@montana.edu