6. Curation goals
• Implement services and processes to gather,
organize and preserve information about the
circumstances of creation and interpretation
• Facilitate re-use
– Of underlying objects?
– Of interpretations, analyses, etc?
8. Data
DCC: “Data, any information in binary digital
form, is at the centre of the Curation Lifecycle.”
OMB: “Research data means the recorded
factual material commonly accepted in the
scientific community as necessary to validate
research findings”
11. Questions to ask and answer
• Do we care what this object is?
• Do we care where this object came from?
• Do we care how it was rendered into this form?
• Do we care what interventions have taken place?
• Do we care who performed those interventions?
• How do we identify and evaluate the original
scholarly contributions?
12. Salo’s (via SPOT) threat model / Threats to print
• Homelessness
• Water
• Flora and fauna
• Physical damage
• Loss or destruction
Brittle books
13. Salo’s threat model / Threats to digital
• Physical media failure
• Bitrot
• File format obsolescence
• Forgetting what you have
• Forgetting what the stuff you have means
• Rights and DRM (digital rights management)
• Lack of organizational commitment
• Ignorance (assumptions)
• Apathy
Apollo data tape
17. Repository as a service
• Description and characterization - descriptive,
provenance and technical metadata
• Selection, conversion, digitization
• Deposit and versioning
• Interoperability, APIs for ingest, discovery
• Access control, copyright support and other
legal/regulatory compliance
• Persistence
– Stable, permanent links (URLs, DOIs, etc.)
– Health of digital objects
– Replication and dark archiving
– Migration or emulation, virtualization
18. Northwestern Books
• A very library-centered project
• We keep: all versions of the digitized pages;
checksums; object relationship information;
extracted information about the process;
inferred information about who, where, when
• No easy export, integration with analytical
tools, etc.
• No information about use, integrations,
annotations
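As an illustration of how stored checksums support later audits of the digitized pages, here is a minimal sketch of a fixity check. It is not the Northwestern Books implementation: the choice of SHA-256, the manifest layout, and the function names (sha256_of, build_manifest, audit) are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk against a previously stored manifest.

    Hypothetical report format: files that disappeared, files whose
    bytes changed (possible bitrot or an unrecorded intervention), and
    files that were never recorded in the manifest.
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Run on a schedule, a check like this addresses two of the digital threats listed earlier: bitrot (the "changed" list) and forgetting what you have (the "missing" and "unexpected" lists). It says nothing about what the files mean, which still depends on the descriptive and provenance metadata.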
19. Who are the players?
“Understanding the relationship between critical
or interpretive activities which are also
curatorial, and more traditional curatorial
activities, which bear more relation to tasks
traditionally carried out by libraries and
archives, will be important in the context of the
humanities.” (Flanders and Muñoz 2011)
20. Practical challenges
• Data often inextricable from apparatus
(software for preservation, querying, etc.)
making selective curation/preservation
difficult
• Data versioning – e.g. Google Books OCR
• Links between data (in a particular version?)
and other research outputs – the monograph,
the journal article, etc.
25. Text markup and analysis
Folger Shakespeare Library TEI-encoded texts
• Full text, encoded down to
the word
• Responsibility statement
indicates encoder
• No facsimile texts
• Not attached to analytical
apparatus
• Creative Commons
noncommercial license
(more shortly)
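To make "encoded down to the word" and "responsibility statement indicates encoder" concrete, the sketch below reads a small TEI fragment and pulls out the responsibility statement and the word-level tokens. The SAMPLE fragment is invented for illustration and is neither a complete valid TEI document nor actual Folger markup; it assumes the standard TEI namespace, a respStmt containing resp and name, and w elements for words. The summarize function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented fragment for illustration only (not Folger data).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample play</title>
        <respStmt>
          <resp>Encoded by</resp>
          <name>A. N. Encoder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <l><w>To</w> <w>be</w> <w>or</w> <w>not</w></l>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml: str) -> dict:
    """Extract responsibility statements and word tokens from TEI."""
    root = ET.fromstring(tei_xml)
    statements = []
    for stmt in root.iterfind(".//tei:respStmt", TEI_NS):
        resp = stmt.findtext("tei:resp", default="", namespaces=TEI_NS).strip()
        name = stmt.findtext("tei:name", default="", namespaces=TEI_NS).strip()
        statements.append((resp, name))
    # Each <w> element wraps a single word in this sample.
    words = [w.text or "" for w in root.iterfind(".//tei:w", TEI_NS)]
    return {
        "responsibility": statements,
        "word_count": len(words),
        "words": words,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because the words and the encoder credit are both explicit in the markup, a text encoded this way can be pulled into other analytical tools even though it ships without an analytical apparatus of its own.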
27. Systems and approaches
• A: The data and the environment are one
• B: A, but some elements also available for
extraction and re-use elsewhere
• C: All the elements extractable and available
for re-use in other environments, settings
29. Legal & policy issues
• Unhelpful application of copyright is common
• Embargoes and first publication concerns
• License and attribution ‘stacking’ problems
with digital data
• Expressive v. non-expressive debates and
litigation
• Creative Commons, carving out a scholarly
space (what does ‘non-commercial’ mean,
anyway?)
30. Copyright basics
• Only original expressions are eligible
• Copyright is limited in duration (it expires)
• Copyright only applies to certain activities:
making copies, distributing copies, etc.
• There are broad exceptions (fair use, library
reproduction, etc.)
35. Final thoughts
• Not everything can be made explicit
• Not everything should be retained
• We cannot afford to do everything
• Re-use is elusive
• We don’t know how to address all of the legal
issues, but trying something is a good start
36. Additional image credits
• What are data? Work found at https://www.flickr.com/photos/rh2ox/9990024683
(https://creativecommons.org/licenses/by-sa/2.0/)
• Why? What are the risks? Work found at
https://www.flickr.com/photos/swanksalot/2704017177/
(https://creativecommons.org/licenses/by-sa/2.0/)
• Sciences/Humanities
T/Q/U Maps. 2014. http://bicepkeck.org/B2_2014_i_figs/tqu_maps.pdf.
Fig. 8, Piccini, Angela. 2009. “Locating Grid Technologies: Performativity, Place,
Space: Challenging the Institutionalized Spaces of E-Science.” DHQ: Digital
Humanities Quarterly 3 (4).
• Salo’s threat model/threats to print, Work found at Cornell University Library
Department of Preservation & Collection Maintenance. “Brittle Books.” Accessed
April 16, 2014.
http://wwwdev.library.cornell.edu/preservation/operations/brittlebooks.html.
• Salo’s threat model/threats to digital, NASA Goddard Space Flight Center. Apollo
Data Tape, July 10, 2009. http://www.flickr.com/photos/gsfc/3720663276/.
37. Bibliography
• BICEP2 Collaboration, P. A. R. Ade, R. W. Aikin, D. Barkats, S. J. Benton, C. A. Bischoff, J. J. Bock, et al.
2014. “BICEP2 I: Detection Of B-Mode Polarization at Degree Angular Scales.” arXiv:1403.3985
[astro-ph, gr-qc, hep-ph, hep-th], March. http://arxiv.org/abs/1403.3985.
• Burgess, Helen J., and Jeanne Hamming. 2011. “New Media in the Academy: Labor and the
Production of Knowledge in Scholarly Multimedia.” DHQ: Digital Humanities Quarterly 5 (3).
http://www.digitalhumanities.org/dhq/vol/5/3/000102/000102.html.
• Digital Curation Centre. 2014. “DCC Curation Lifecycle Model.” Accessed April 16.
http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf.
• Flanders, Julia, and Trevor Muñoz. 2011. “An Introduction to Humanities Data Curation.” DH
Curation Guide, September.
• Piccini, Angela. 2009. “Locating Grid Technologies: Performativity, Place, Space: Challenging the
Institutionalized Spaces of E-Science.” DHQ: Digital Humanities Quarterly 3 (4).
http://www.digitalhumanities.org/dhq/vol/3/4/000076/000076.html.
• Salo, Dorothea. 2013. “Risk Management and Auditing.” Presented at the DH Curation Institute,
October 16, College Park, MD. http://files.dsalo.info/S13RiskMgmtAuditing.pdf.
• United States Office of Management and Budget. 2013. Uniform Administrative Requirements, Cost
Principles, and Audit Requirements for Federal Awards. https://federalregister.gov/a/2013-30465.