The document discusses digital reformatting and digital preservation/curation/stewardship. It notes that digital reformatting, such as digitization, is good for access but not preservation alone. It then discusses best practices for digital capture and preservation, including using standards like METS, PREMIS, and FADGI guidelines. The document defines digital preservation as the managed activities needed to ensure continued access to digital materials for as long as necessary.
1. Digital Reformatting and Digital Preservation/Curation/Stewardship
Keri Thompson :: Web Services Department :: Smithsonian Institution Libraries
www.sil.si.edu :: thompsonk@si.edu :: @DigiKeri_SIL
March 22, 2011
2. aka Digitization
Works great for
access
3. Imaging basics
Resolution ppi/dpi
Bit depth
Color vs. grayscale vs. b&w
File formats & compression
5. Digital Capture Best Practices
METS, PREMIS, embedded
metadata
FADGI
http://www.digitizationguidelines.gov/
7. Be involved
in the
process!
8. So, you’ve scanned a
book…what’s next?
9. “The series of managed
activities
necessary to ensure
continued access
to digital materials for as long
as necessary.”
10. Object
Definition
Authenticity
Access
Process
Planning
Management
OAIS model
(ISO 14721:2003)
11. Issues are as much organizational as technological.
12. Technical
Authenticity and checksums
Systems and media
Access methods
13. Digital Imaging Primer from Cornell
http://www.library.cornell.edu/preservation/tutorial/contents.html
FADGI Federal Agencies Digitization Guidelines Initiative
www.digitizationguidelines.gov/
NDIIPP National Digital Information Infrastructure & Preservation Program (LC)
http://www.digitalpreservation.gov/
Digital Preservation Primer by Michael Day, UKOLN via JISC (UK)
http://www.slideshare.net/michaelday/digital-preservation-an-introduction
APA Alliance for Permanent Access (EU)
http://www.alliancepermanentaccess.org/
Digital Preservation Coalition (UK) (includes Digital Preservation Handbook)
http://www.dpconline.org/
DMP Tool (for grant proposals, minimal but a place to start)
https://dmp.cdlib.org/
Digital reformatting: sure. Digital preservation: OK, this is a little tricky. Digital reformatting for preservation … is it digital reformatting and preservation?
Reformatting reduces demand for, and wear on, physical books, but it is generally done for access rather than for preservation. If you are reformatting for preservation, the issues are similar, but you will want to capture at the highest quality possible to retain the maximum information, and to follow digital preservation practices for the resulting digital object. (For some disciplines, such as history of the book, a digital object won’t be a satisfactory replacement for the physical volume.)
To create a good surrogate, understand how the digital item will be used. Text mining? Verifying citations? Reading for fun? Scientific or artistic analysis of plates? The answer helps determine which aspects you concentrate on when digitizing and which standards you follow (resolution, color or bitonal, etc.).
Boutique vs. mass digitization: economies of scale. If money is not a problem, you can follow all the best practices and the highest standards. If it is, pick and choose (cheap, fast, good).
Scanning is just the first step, and arguably the easiest and best understood. Maintaining the resulting digital object takes work and resources: not just digital preservation/curation, but also enhancement, error correction, and general management.
What are we doing when we create a digital image? Explain: divide the picture area into a grid and store color information for each grid cell (pixel).
Resolution = number of pixels used to represent each inch of the original (ppi). Calculate true ppi against the book size or character size: a 10 x 5 inch book should be 3000 x 1500 px if scanned at 300 ppi.
Bit depth = number of bits used to represent the color of each pixel (8 bits = 1 byte). More bits enable the capture of more gray shades or color tones. 8 bit is basic; 48 or 64 bit = more info (= bigger files).
Color: few are still doing bitonal (EXCEPT GOOG); most are doing grayscale or color. Choose based on need. Bitonal is good for text on new documents. Grayscale is good for text documents with damage or spots. Color is good for color. Plus, you can convert color documents to b&w or gray if you want. When doing color, it is important to choose a color space (?) and calibrate regularly with ‘Golden Thread’ or similar calibration tools. Many still shoot each page with a color reference card.
File formats: still mostly TIFF. Some save RAW (proprietary to each camera manufacturer = decoding issues later?). JPEG 2000 is an ISO standard, still controversial, but has many advantages.
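The resolution and bit depth arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the sizes are for raw, uncompressed pixel data with no file headers or embedded metadata.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixels needed to capture a page of the given size (inches) at a true ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Size of the raw pixel data in bytes, before compression or headers."""
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # The 10 x 5 inch book from the notes, scanned at 300 ppi.
    w, h = pixel_dimensions(10, 5, 300)
    print(w, h)  # 3000 1500

    # Same page at different bit depths: more bits per pixel = bigger files.
    for label, bits in [("bitonal", 1), ("8-bit gray", 8),
                        ("24-bit color", 24), ("48-bit color", 48)]:
        megabytes = uncompressed_bytes(w, h, bits) / 1_000_000
        print(f"{label}: {megabytes} MB")
```

Doubling the ppi quadruples the pixel count, which is why the jump from 300 to 600 ppi has such a large effect on storage.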
Imaging for books = not just images! Metadata, metadata, metadata:
Regular library stuff, like MARC, DC, or MODS
Structural metadata
Rights metadata
OCR: one of life’s little frustrations, the heartbreak of the long ‘s’. Plan for manual OCR correction while we wait for folks to develop better engines.
Additional file formats: you have your images. Now make PDFs and EPUBs. Making real EPUBs takes more work than just making a document readable on an ebook device.
Digital Capture Best Practices
Resolution: true ppi, 400 or 600 ppi! 300 ppi minimum.
Color fidelity (color space, bit depth)
Camera, monitor, and target calibration
FADGI http://www.digitizationguidelines.gov/
File formats: RAW, TIFF, JPEG 2000
File naming conventions: keep related files together in a worst-case scenario (the filename is part of IPTC/XMP)
Embedded metadata (IPTC or XMP): technical metadata is automatic, with space for admin/rights and basic descriptive metadata
METS and PREMIS create “self-describing” information packets that include the image files
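One way to picture a file naming convention that keeps related files together is shown below. It is a sketch under assumptions, not a recommended scheme: the identifier "sil-000123", the four-digit zero-padded sequence, and the function name are all invented for illustration.

```python
def page_filename(item_id, sequence, extension="tif"):
    """Build a page-level filename: item identifier + zero-padded page sequence.

    The pattern is hypothetical. The point is that every file carries the
    identifier of the book it belongs to, and the padding makes a plain
    alphabetical sort match page order.
    """
    return f"{item_id}_{sequence:04d}.{extension}"


if __name__ == "__main__":
    names = [page_filename("sil-000123", n) for n in (1, 2, 10, 100)]
    print(names)
    # Without padding, "10" would sort before "2"; with it, order is kept.
    print(sorted(names) == names)
```

If the files are ever separated from their database or folder structure, the name alone still says which item a page belongs to and where it falls in the sequence.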
Decisions: quality vs. speed is less important than the type of material being scanned (sheet feeders, maps) and the condition of the items being scanned.
Equipment choices: camera on a stand (scan back, hi-res, $$) || flatbed scanners || overhead scanners || dual-camera models || robots vs. humans!
Lighting: flash vs. continuous, color temperature
Page curvature and depth-of-field issues
Outsourcing ??????
Megapixels = how many pixels are available on the sensor surface (camera back, scanner) for recording information from the original. 8 MP cameras will usually do fine (300 ppi) for quarto-sized books. For folios you need a bigger camera.
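The link between megapixels, page size, and true ppi can be worked out directly. The sketch below assumes the page fills the frame exactly, which real setups rarely achieve, and the page and sensor dimensions are example values chosen for the arithmetic, not measurements of any particular book or camera.

```python
def required_megapixels(page_w_in, page_h_in, ppi):
    """Megapixels needed to capture a page (inches) at the given true ppi."""
    return (page_w_in * ppi) * (page_h_in * ppi) / 1_000_000


def max_true_ppi(sensor_w_px, sensor_h_px, page_w_in, page_h_in):
    """Best true ppi a sensor can deliver for a page that fills the frame.

    The tighter of the two dimensions sets the limit.
    """
    return min(sensor_w_px / page_w_in, sensor_h_px / page_h_in)


if __name__ == "__main__":
    # An assumed 8 x 10 inch page at 300 ppi.
    print(required_megapixels(8, 10, 300))       # 7.2
    # The same page at 400 ppi needs noticeably more sensor.
    print(required_megapixels(8, 10, 400))       # 12.8
    # An assumed 2400 x 3000 px sensor on that page.
    print(max_true_ppi(2400, 3000, 8, 10))       # 300.0
```

In practice some of the frame goes to margins, the gutter, and a reference target, so the usable ppi is lower than this upper bound.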
Be involved in the process!
Equipment evaluation: lighting (strobes vs. hot lights), things with platens, flatbeds, etc.
Training scanning staff in book handling and reviewing condition
Making sure conservation/repair/rehousing is part of the workflow (at a minimum, indicate which items need treatment)
Store it somewhere. Make it findable and usable. Master file, derivatives, other derivative files: based on use cases and user needs. This may vary with material type (rare books, manuscripts) or by discipline (literature & humanities vs. social sciences vs. sciences). Manage the lifecycle of your new digital object and ensure you can continue to make it available. Hey, that’s…
Use of “preservation” is a little misleading to those in library-land:
not about conservation or restoration
not about backup procedures or the media on which data is stored
no concept of “keeping it for 500 years” or any fixed period of permanence
You will often hear people use Digital Curation interchangeably; Digital Stewardship is also gaining traction.
A continuous, evolving process, not a one-time action. More like housework than building a house. No fixed time.
Definition: Digital objects can be very complex (a website with media) or simple but dependent on hardware/software (a WordStar document). Even simple digitized books need all pages in order and descriptive metadata included. How the object is defined must include context so that it can be correctly interpreted in the future.
Authenticity: Media and systems on which digital objects are stored have uncertain lifespans. Need to plan for migration and assure that the entire object is migrated safely and is the same after migration, to ensure authenticity. Digital objects are easy to change, either maliciously or accidentally: they are made of 1s and 0s! Bit rot! Oopsies.
Access: If they can’t find it, what’s the point? Metadata, systems. Need robust metadata to describe the object so it can be found AND used AND understood. Continuing access for as long as necessary: migration (see above).
Planning: includes creating and maintaining preservation plans, DAMPs.
Management: putting organizational structure in place to maintain and manage systems and objects.
OAIS model: a high-level conceptual framework that guides an organization’s implementation of digital preservation practices. It also covers the technical issues, but in a broad way. Preservation planning and institutional commitment.
OAIS (an ISO standard, but with no way of ‘certifying’ whether organizations or systems are implementing it, or implementing it well): "an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community." Where "The information being maintained is deemed to need Long Term Preservation, even if the OAIS itself is not permanent. Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community". http://www.paradigm.ac.uk/workbook/introduction/oais.html
Planning and documenting, including documenting methodology (why we are doing x this way)
Organizational permanence (will your department be around in 20 years, even if your institution will be…)
Responsibilities:
Creation & administration of preservation plans
Creation & management of DAMPs
Choosing standards
Data management
Periodic reevaluation of the methodologies and standards being applied is important to make sure they are working and still relevant. Plans are living documents; build in review and revision.
Preservation planning concerns preserving the accessibility and readability of data. Its functions address technical issues such as recommending file format standards, monitoring changes in technology, and evaluating the content of a digital archive. The preservation plan should reflect which current strategies would best preserve access to content. Selecting a suitable solution, by using different tools, makes it possible to implement a specific method.
Handled by systems, good metadata, and carrying out a well-developed plan.
Systems use checksums for data authenticity, plus other security pieces to ensure no tampering. Need context to know what your ‘authentic’ object is, so metadata is part of authenticity. Systems need built-in periodic authenticity checking.
Hardware/software is available that supports the OAIS model, including DSpace, Fedora, LOCKSS.
Make sure files are findable and accessible by having good metadata (again!). This includes standards like PREMIS, but also standard practices like assigning DOIs or URIs, having clear file naming conventions, and careful migration from system to system.
Access may include joining cooperatives/services: it’s too much for one little org to go it alone! Dataverse, MetaArchive, LOCKSS, Internet Archive, DuraSpace/DuraCloud
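To make the checksum idea concrete, here is a minimal sketch of a fixity check using Python's standard hashlib module. It is not how any of the systems named above implement it; the choice of SHA-256, the in-memory manifest, and the function names are assumptions for illustration. The same comparison works for periodic checks and for confirming that files are unchanged after a migration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large master images don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file under folder, keyed by relative path."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def verify_manifest(folder, manifest):
    """Return (relative path, problem) pairs for missing or changed files."""
    folder = Path(folder)
    problems = []
    for relative, expected in manifest.items():
        path = folder / relative
        if not path.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(path) != expected:
            problems.append((relative, "changed"))
    return problems
```

Typical use: call build_manifest once at ingest and store the result alongside the object's metadata, then run verify_manifest on a schedule and after every migration. An empty list means every recorded file is present and bit-for-bit identical.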