Digital Library Forum presentation on Cabrinety software preservation grant. Combined presentation from National Institute of Standards and Technology and Stanford University.
Preserving Software at Scale: The Stephen Cabrinety Collection
1. Preserving Software at Scale: The Stephen Cabrinety Collection
Michael Olson, Stanford University Libraries
Douglas White, National Institute of Standards and Technology
2. Disclaimer
Trade names and company products are
mentioned in the text or identified. In no case
does such identification imply recommendation
or endorsement by the National Institute of
Standards and Technology, nor does it imply that
the products are necessarily the best available
for the purpose.
3. The Collection and NIST Grant
Collection consists of ~15,000 software titles from 1975–1995
Grant (Sept. 2013 – Aug. 2014) funded by National Institute
of Standards and Technology
Contains all media types from this period
Disk images to be added to National Software Reference
Library (NSRL) Reference Data Set
Disk images and photographs will be ingested into the
Stanford Digital Repository
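Adding disk images to a hash-based reference set comes down to recording digests for every file. The sketch below shows one way to compute SHA-1, MD5, and CRC32 values for a single file in one pass. The function name `file_digests` and the returned field names are illustrative assumptions, not the NSRL's actual tooling or schema.

```python
import hashlib
import zlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return SHA-1, MD5, CRC32 (uppercase hex), size, and name for one file.

    Hypothetical helper: the field names are for illustration only.
    """
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large disk images do not need to fit in memory.
        while chunk := handle.read(chunk_size):
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return {
        "sha1": sha1.hexdigest().upper(),
        "md5": md5.hexdigest().upper(),
        "crc32": f"{crc & 0xFFFFFFFF:08X}",
        "size": size,
        "name": Path(path).name,
    }
```

Calling `file_digests("disk001.img")` (a hypothetical file name) returns a dictionary that could be written out as one row of a hash list.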
4. Initial Stanford Tasks
Page software to campus
Register software titles in Digital Object Registry (DRUID,
Title, Source ID)
Enter descriptive metadata in NSRL database
Print tracking sheet
Ship to NIST
7. NIST NSRL Collection
Contains 14,500 pieces of computer software.
Focuses on Windows, Mac, Linux operating systems and
popular applications.
Modern formats: DVD and CD-ROMs, 5¼ in. and 3½ in. disks.
Efforts 2005 to date:
19,500 media images
395 media errors (2%)
3,500 photograph sets
25,200 photos
9. SUL Cabrinety Collection
Focuses on games for Atari, Commodore, Amiga, Sega,
Nintendo, and Apple systems.
27 different operating systems represented.
Several formats: 8 in., 5¼ in., and 3½ in. computer disks, cassettes, cartridges, CD-ROMs.
NIST Efforts to date:
900 media images
158 media errors (17%)
1,100 photograph sets
61,100 photos
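The figures on the NSRL and Cabrinety slides can be compared directly. The short sketch below recomputes the media error rate and the average number of photos per photograph set for each collection, using only the counts quoted on the slides. The dictionary layout and the helper name `percent` are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts copied from the two "efforts to date" slides.
collections = {
    "NSRL": {"images": 19_500, "errors": 395, "photo_sets": 3_500, "photos": 25_200},
    "Cabrinety": {"images": 900, "errors": 158, "photo_sets": 1_100, "photos": 61_100},
}

for name, counts in collections.items():
    error_rate = percent(counts["errors"], counts["images"])
    photos_per_set = counts["photos"] / counts["photo_sets"]
    print(f"{name}: {error_rate:.1f}% media errors, "
          f"{photos_per_set:.1f} photos per set")
```

The older Cabrinety media fail roughly eight to nine times as often as the modern NSRL media, and each Cabrinety title needs many more photographs.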
11. Workstation Equipment
Apple Mac mini, running Ubuntu 12.04 LTS
5000K lighting station
Canon T3i, tethered
Golden Thread Object Level Target
USB 3.5-inch floppy drive
Device Side Data FC5025 USB 5.25-inch floppy controller
ATA 5.25-inch floppy drive
USB barcode scanner
Firefox browser
Java photo organizer (custom, wraps gphoto2 etc.)
Perl media imager (custom, wraps dcfldd etc.)
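The custom imager itself is not shown in the slides. As a rough illustration of what wrapping dcfldd can look like, the sketch below builds a dcfldd command line that images a block device, keeps going past read errors, and logs MD5 and SHA-1 hashes. It is written in Python rather than Perl, and the function names, device path, and file names are all hypothetical. The FC5025 controller uses its own software and is not covered here.

```python
import subprocess


def build_dcfldd_command(device, image_path, hash_log, block_size=512):
    """Build (but do not run) a dcfldd command for imaging one medium.

    Assumption: this option set is illustrative, not the NSRL's actual one.
    """
    return [
        "dcfldd",
        f"if={device}",          # source device, e.g. a USB floppy drive
        f"of={image_path}",      # destination image file
        f"bs={block_size}",      # read one sector at a time
        "conv=noerror,sync",     # continue past bad sectors, pad with zeros
        "hash=md5,sha1",         # hash the data while imaging
        f"hashlog={hash_log}",   # write the hashes to a log file
    ]


def image_media(device, image_path, hash_log):
    """Run dcfldd and return its exit code (requires dcfldd to be installed)."""
    command = build_dcfldd_command(device, image_path, hash_log)
    return subprocess.run(command, check=False).returncode
```

For example, `image_media("/dev/sdb", "title0001.img", "title0001.hashes")` would image a hypothetical device `/dev/sdb`. Keeping command construction separate from execution makes the option set easy to review and log.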
13. Cartridge Media
Using Retrode adapter for SEGA Genesis and Super Nintendo
(SNES) games, plus plug-ins for Game Boy, Atari, Nintendo 64.
Could not generate a complete, consistent media image.
Every cartridge has metadata in a ROM “header” area; many
include a checksum, for anti-piracy use.
NSRL can calculate the SNES and SEGA Genesis checksums.
Game Boy and Nintendo are works in progress.
Detailed blog article recently published on Stanford website.
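To illustrate how a header checksum can confirm a cartridge dump, here is a minimal sketch of the SEGA Genesis check as it is commonly documented: a 16-bit big-endian value stored at offset 0x18E should equal the sum of all 16-bit big-endian words from offset 0x200 to the end of the ROM, truncated to 16 bits. This assumes a raw, non-interleaved dump and is not necessarily how the NSRL implements it. The function names are illustrative.

```python
import struct

HEADER_CHECKSUM_OFFSET = 0x18E  # where the stored checksum lives
DATA_START = 0x200              # first byte covered by the checksum


def genesis_header_checksum(rom):
    """Read the 16-bit big-endian checksum stored in the ROM header."""
    return struct.unpack_from(">H", rom, HEADER_CHECKSUM_OFFSET)[0]


def genesis_computed_checksum(rom):
    """Sum 16-bit big-endian words from DATA_START to the end, modulo 2**16."""
    body = bytes(rom[DATA_START:])
    if len(body) % 2:
        body += b"\x00"  # pad an odd-length dump so it splits into words
    total = 0
    for (word,) in struct.iter_unpack(">H", body):
        total = (total + word) & 0xFFFF
    return total


def genesis_checksum_ok(rom):
    """Return True when the stored and computed checksums agree."""
    return genesis_header_checksum(rom) == genesis_computed_checksum(rom)
```

A mismatch does not always mean a bad read, since some published cartridges shipped with an incorrect header checksum, so the result is best treated as one signal among several.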
14. Results to date
Just received first batch of data from NIST
– 360 GB = 870 software titles, 116,000 unique files
Capture success rate:
– 83% with no modification or intervention
– Can increase by 5% with human intervention during imaging
– Can increase by 4% with intervention during image mount
– 8% of media have many (> 10%) sector read errors
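The capture percentages above can be read as a running total. The sketch below adds the two intervention gains to the unattended rate, under the assumption that the four figures are separate, non-overlapping shares of the media. The function name `cumulative_success` is illustrative.

```python
from itertools import accumulate


def cumulative_success(base, gains):
    """Return the success rate after each successive intervention step."""
    return list(accumulate([base, *gains]))


# Percentages from the slide: unattended, then imaging and mount interventions.
steps = cumulative_success(83, [5, 4])
remaining = 100 - steps[-1]

print("Success after each step (%):", steps)
print("Share still failing (%):", remaining)
```

On this reading, intervention lifts the success rate from 83% to 92%, and the remaining 8% matches the share of media with many sector read errors.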
15. Lessons and Improvements
Automation; less human interaction
Photography; use RAW and convert
Hardware for legacy media:
Apple physical formats
Large format floppy disks (8”)
Cassettes
Cartridge batteries
16. Lessons and Improvements
Data modeling beginning this month for repository
Copyright letter created to send to rights holders
Create persistent URL citation page (PURL) for software
Integration into SearchWorks, the Stanford catalog, when rights allow
17.
Just received first batch of data from NIST
360 GB = 870 software titles, 116,000 unique files
Copyright permissions letter created