
Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me...

My time at Archives New Zealand has been my first truly hands-on experience with born-digital collections. Material transferred in 2008, containing files created over an entire decade, has been the focus of my first born-digital ingests with the organisation. The work in the Systems Standards and Strategies (SSS) team at Archives New Zealand has been split into two initial sets of ingests, one set of two followed by another; the idea was to create processes and develop them incrementally. My surprise, after the first two ingests in late November and December 2014, is that five months into the next two we're still finding challenges - daily! With only the slightest nod to digital preservation and my title as digital preservation analyst, this paper discusses the difficulties of wrestling core information received from agencies, organisational issues, and the tools available to us in this agency.

Organisations and records managers have an opportunity to make recommendations to their users that can minimise issues when we place records into long-term preservation, and over the next few years we'll collect plenty of evidence to see the number of surprises reduced. It is this author's assertion, however, that despite best efforts we're always going to receive badly behaved digital material for reasons not always foreseen, and that, despite concerted efforts at control, any agency receiving born-digital material must be prepared to understand it and to manage it through different mitigation strategies, depending on appetite. This paper introduces the challenges faced while processing the organisation's first born-digital material, looking at where the issues arose and why, before concluding that we must learn by doing, and that collecting evidence and understanding 'real world' scenarios is our best opportunity to reduce surprises, even if we can't reduce them to zero.


  1. Department of Internal Affairs Time Traveling Analyst: The Things Only a Time Machine Can Tell Me… Ross Spencer - @beet_keeper Archives New Zealand #ARANZ2015 Tuesday September 7 2015
  2. Sun image, R24685027, E4, Archway, Archives New Zealand. http://www.archway.archives.govt.nz/ViewFullItem.do?code=24685027&digital=yes
  3. Background: Two sets of born-digital ingest, Minister's Papers, 'code-named' E1 and E4, E2 and E3. First sets selected for simplicity. Second sets followed numerical sequence and were used as a learning exercise. Complexity grew. First sets enabled creation of CSV ingest mechanism, configuration of Rosetta, creation of process. Second sets enabled the proof of that method.
  4. Approximate collection breakdowns at the beginning of the process… ● E1~ ● 175 Files ● 10 Directories ● 0 Unidentified Objects ● 0 Unidentified Extensions ● 7 Known Formats ● E4~ ● 1295 Files ● 6 Directories ● 2 Unidentified Objects ● 1 Unidentified Extension ● 12 Known Formats N.B. E4 also contained two identification false positives.
  5. Approximate collection breakdowns at the beginning of the process… • E2~ • 2519 Files • 177 Directories • 5 Unidentified Objects • 4 Unidentified Extensions • 22 Known Formats • 25 Extension Mismatches • E3~ • 1748 Files • 144 Directories • 8 Unidentified Objects • 5 Unidentified Extensions • 12 Known Formats • 37 Extension Mismatches N.B. Both collections contained empty folders, empty files, and multiple-id formats.
  6. Let's begin with a story... E1, the simplest... Enabled us to develop an ingest mechanism for heterogeneous collections – and it worked! E4, not that different, slightly larger, about as 'known', but! An unexpected exception discovered in the relationship between the preservation system and some of the filenames in the collection...
  7. Where do astronauts go for a beer?
  8. The...
  9. We had filenames with multiple spaces in them... E.g. 'A [space] [space] Filename.docx' An innocuous enough looking problem... Our digital preservation system couldn't handle them... Investigate the system... ... Confirm it's the system... … Ask vendor to fix the problem... … No fix forthcoming for next release...
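
A check for this filename pattern can be run before ingest. The following is a minimal sketch, not the tool used at Archives New Zealand; the function names are hypothetical and it assumes the collection sits on a local file system.

```python
import re
from pathlib import Path

# Two or more consecutive spaces anywhere in a file or folder name.
MULTI_SPACE = re.compile(r" {2,}")


def has_multiple_spaces(name: str) -> bool:
    """Return True if the name contains a run of two or more spaces."""
    return MULTI_SPACE.search(name) is not None


def find_multi_space_names(root: str) -> list[str]:
    """List every path under root whose final component has a run of spaces."""
    return sorted(
        str(path) for path in Path(root).rglob("*") if has_multiple_spaces(path.name)
    )
```

Running `find_multi_space_names` over a transfer directory gives a list to review before the files reach the preservation system.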
  10. What now...? Change filenames? ... Serious change, this is how we received them! … Record provenance... … Mechanisms in METS metadata schema [EVENT] … How to implement?
  11. We continue... Configure CSV to handle EVENT fields... ... Modify CSV generation tool to output blank EVENT fields... … Test ingest in system until configuration is perfected … Mechanism works so pre-condition filenames... ... Record R-Numbers* and design provenance note controlled list... … Add data to CSV … DONE!!!! *Dependency on listing being fixed in Archway
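
The steps on this slide can be sketched as a small script that pre-conditions a filename and records the change as a provenance note in the ingest CSV. The column names, the wording of the note, and the R-Number in the usage line are all hypothetical; the real CSV layout and controlled list are not given in the slides.

```python
import csv
import io
import re

# Hypothetical column names; the actual ingest CSV layout is not shown in the slides.
FIELDS = ["r_number", "original_filename", "ingest_filename", "event_type", "event_detail"]

# Hypothetical entry from a provenance note controlled list.
NOTE_MULTI_SPACE = "Filename changed: consecutive spaces collapsed to a single space"


def make_row(r_number: str, filename: str) -> dict:
    """Collapse runs of spaces and record the change; EVENT fields stay blank otherwise."""
    cleaned = re.sub(r" {2,}", " ", filename)
    changed = cleaned != filename
    return {
        "r_number": r_number,
        "original_filename": filename,
        "ingest_filename": cleaned,
        "event_type": "filename change" if changed else "",
        "event_detail": NOTE_MULTI_SPACE if changed else "",
    }


def write_csv(rows: list[dict]) -> str:
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `make_row("R00000000", "A  Filename.docx")` keeps the name as received alongside the name used for ingest, so the change is documented rather than silent.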
  12. Department of Internal Affairs
  13. Test in digital preservation system fails... ... UTF-8 character encoding... … How to preserve in Excel? … … Import using special ribbon in Excel... … Add notes to sheet... … DONE?! … Not even now... >.< Nope...
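
One alternative to a manual import, and not the approach described on the slide, is to write the CSV with a UTF-8 byte order mark, which Excel generally uses as a cue to read the file as UTF-8. The sketch below assumes Python's `utf-8-sig` codec; the function names are hypothetical.

```python
import csv


def write_excel_friendly_csv(path: str, rows: list[list[str]]) -> None:
    """Write rows as UTF-8 with a leading byte order mark (the utf-8-sig codec)."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_csv_utf8(path: str) -> list[list[str]]:
    """Read a UTF-8 CSV, dropping the byte order mark if one is present."""
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return list(csv.reader(handle))
```

Reading with `utf-8-sig` also works for files saved without the mark, so the same reader can handle both variants.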
  14. It can become exhausting... As a speaker! And for the audience!!! ^_^; ...Time and date based data becomes a problem... ...Asking non-expert users to do the same... ...Even power tools like Open Office suffer issues... ...E4 went in after solving the UTF-8 issues... ...E2 and E3 suffered from issues with time/date information on top
  15. But we learn and move onwards and upwards...
  16. The work isn't straightforward ● It pushes out time-frames... ● And the problems we're solving aren't what we expected... ● We need to develop with the problem...
  17. But we have new tools... Tools to create provenance information in CSV for ingest into the digital preservation system. Tools to identify files with this issue up front. The digital preservation system is fixed, so this specific use-case for us is unlikely to occur again. We have gained new experience. For E2 and E3, we created mechanisms of creating an ingest 'mash-up' using a separate provenance spreadsheet. For our next ingest we have a macro to automate an Excel import!!!!! ← IN MICROSOFT?!!!!
  18. We have what seems like an inexhaustible list... ● [Tools] Ability to handle multi-byte character encodings. Māori macrons, ‘Ā’, in DROID, digital preservation system, spreadsheets, etc. ● [Tools] Unidentified files and false positives - contribute to ● [Tools] Zero-byte files, empty folders ● [Tools] System files ● [Tools] Digital preservation system’s capabilities; dates, delivery, metadata extraction, etc. ● [Files] Invalid objects ● [Files] Templates, objects with auto-fields
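
Some items on this list, such as zero-byte files and empty folders, can be reported up front with a short script. This is a minimal sketch under the assumption that the collection is available on a local file system; `find_empty` is a hypothetical name.

```python
import os


def find_empty(root: str) -> tuple[list[str], list[str]]:
    """Return (zero-byte files, empty folders) found under root.

    The root folder itself is reported if it contains nothing at all.
    """
    empty_files: list[str] = []
    empty_dirs: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A folder with no sub-folders and no files is empty.
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.getsize(full_path) == 0:
                empty_files.append(full_path)
    return sorted(empty_files), sorted(empty_dirs)
```

The two lists can then feed a decision about whether such objects are ingested, documented, or excluded, depending on policy.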
  19. And we'd never have guessed these up front... ● What are the next challenges? ● We'd be too conservative, or too O.T.T... ● WE NEED A TIME MACHINE!!!
  20. Questions?
  21. We don't need a time machine at all... ● We need evidence! ● We need to practice! ● We need to do! ● Time-frames will be pushed out ● In a world that loves strategy, it's terribly detail focused. ● Can someone figure it out first? ● Definition of Leadership! ● But you will almost certainly find new exceptions... as will we.
  22. Ground process and policy in the real world… ● We can reduce surprises... ● But we can't reduce them to zero... ● Find the exceptions, create rules, and encode them in those policies... ● Move one step at a time, with modest increments. ● Flexible endpoints / reasonable / multiple goals... ● Q. HOW DID WE GET THESE FILES?? ● A. It doesn't matter, we have to deal with them...
  23. Evidence will… ● Inform policy ● Inform Procedures ➔ Tools ➔ Skills ➔ Appetite ➔ Strategy
  24. Writing these documents becomes a much more advanced thought experiment with a greater number of inputs from a greater number of people, and experiences...
  25. Robustness Principle... (Postel's Law) e.g. checksums “Be conservative in what you do; be liberal in what you accept from others.” Follow standards... mechanisms should accept non-conforming input as long as the meaning is clear... Be prepared to understand material, be prepared to manage it. A way of doing things... not the only way... WRITE OTHER SOLUTIONS! RE-WRITE YOUR SOLUTIONS!
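
Applied to checksums, the principle might look like the sketch below: accept a supplied value in upper or lower case and with stray whitespace, but always emit lowercase hexadecimal. The choice of MD5 and the function names are assumptions made for illustration; the slide does not say which algorithm or tool was used.

```python
import hashlib
import re

# An MD5 digest written as 32 lowercase hexadecimal characters.
MD5_HEX = re.compile(r"^[0-9a-f]{32}$")


def normalise_md5(supplied: str) -> str:
    """Be liberal in what we accept: tolerate case and surrounding whitespace."""
    candidate = supplied.strip().lower()
    if not MD5_HEX.match(candidate):
        # The meaning is not clear, so refuse rather than guess.
        raise ValueError(f"not an MD5 checksum: {supplied!r}")
    return candidate


def md5_of(data: bytes) -> str:
    """Be conservative in what we produce: always lowercase hexadecimal."""
    return hashlib.md5(data).hexdigest()


def matches(data: bytes, supplied: str) -> bool:
    """Compare computed and supplied checksums after normalisation."""
    return md5_of(data) == normalise_md5(supplied)
```

Input that cannot be read as a checksum at all is still rejected, which keeps the liberal side of the rule bounded by "as long as the meaning is clear".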
  26. Other tools for you... DROID (National Archives UK): http://www.nationalarchives.gov.uk/information-management/manage-information/policy-proce Or Siegfried (State Records NSW): https://github.com/richardlehane/siegfried DROID Analysis Tool: https://github.com/exponential-decay/droid-sqlite-analysis Other presentations: http://www.slideshare.net/RossSpencer/presentations Blogs (Open Preservation Foundation): http://openpreservation.org/knowledge/blogs/ Record Keeping Toolkit (Archives New Zealand): http://www.records.archives.govt.nz/
  27. Share yours too!
  28. Who do digital preservation analysts want to drink a beer with?
  29. Commander Hadfield! https://twitter.com/cmdr_hadfield TED: What I learned from going blind in space? Star Talk: http://www.startalkradio.net/show/social-media-i
  30. “It’s almost comical that astronauts are stereotyped as daredevils and cowboys. As a rule, we’re highly methodical and detail-oriented. Our passion isn’t for thrills but for the grindstone, and pressing our noses to it. We have to: we’re responsible for equipment that has cost taxpayers many millions of dollars, and the best insurance policy we have on our lives is our own dedication to training. Studying, simulating, practicing until responses become automatic—astronauts don’t do all this only to fulfill NASA’s requirements. Training is something we do to reduce the odds that we’ll die.” ― Chris Hadfield, An Astronaut's Guide to Life on Earth The Right Stuff
  31. What next..?
  32. Questions! Thank you!
  33. Department of Internal Affairs