SlideShare a Scribd company logo
1 of 27
Download to read offline
Preserve or Preserve Not, There is No
    Try: some dilemmas relating to
       Personal Digital Archiving
Luke: All right, I'll give it a try.
Yoda: No. Try not. Do... or do not. There is no try.
Star Wars: Episode V - The Empire Strikes Back (1980)




David Pearson
Manager –
Digital Preservation Section
National Library of Australia
Outline
What I am going to talk to you about today is some
of my experience in dealing with, and the
consequences of not looking after digital materials.

      Part 1: The current environment, some observations

      Part 2: Monster, what monster?

      Part 3: Some personal experiences 2002-2009

      Part 4: What can ‘I’ do about it?

      Part 5: Conclusion
Part 1: The Current
          Environment
So essentially we know that:

   – There is a lot of ‘digital stuff’ in various personal and
     institution archives and it is growing;

   – loss of access to it (or to much of it) will occur unless
     action taken;

   – The process of keeping it accessible and usable needs
     to be managed; and

   – Any ‘solutions’ have to be scalable, reliable and
     automated and in the long term sustainable by a
     community.
Not managing ‘it’ is not an option. Collectively and or
individually we, need to:

   – Understand our ‘digital stuff’ & associated risks;

   – Provide safe storage, ensure immutability and
     authenticity;

   – Ensure access over time as technology changes;

   – Develop & implement preservation workflows, skills,
     standards, & strategies for ongoing access; and

   – Enable content to be shared and used in different ways
     in the future.
There are a number of problems which we face during the life-
cycle of digital material:

    – There appears to be a massive difference in the way that
      people create and use data, and they way in which collecting
      bodies would like to receive data in order to preserve it.

The challenge is that either:

    – We need to be able to handle any sorts of data that comes to
      us (impossible!); or

    – Find a way that the data that we receive works for us, without
      affecting how the user creates and use their data. The data
      needs to be capable of being identified, preserved and
      accessed.
Part 2: Monster, what
           monster?
The challenging part is
keeping these rich
information resources
available.

One of our greatest
problems is knowing that
we have a problem or will
have a problem.

We can ignore the monster
at our peril, but sooner or
later it will bite ‘someone’
on the …!
Yet, the game keeps changing!

The digital environment is a dynamic entity. This means that in relation
to digital materials, some problems relating to collecting archives are:

    – They are constantly being created;

    – People are submitting more and institutions are actively looking for
      content;

    – The ways in which it can be submitted are changing;

    – The way in which the content is presented and the files types keep
      changing;

    – The accompanying metadata that comes to us is generally not
      consistent, if there at all (standard change); and

    – In many cases we only know what was sent to use when we open it up!
Act or risk losing it
‘Digital stuff’ is dependent on technology at all stages:

    – Creation/capture
    – Storage
    – Access


Because Technology changes - sometimes rapidly, sometimes
more slowly, but eventually over time, we will lose access. Thus
software, hardware, media, file formats, operating systems will
become obsolete.


Unless managed deterioration can occur rapidly (e.g. data can
be corrupted or lost in storage or transfer process).
Different flavours of
              obsolescence




Differential preservation of technology which is driven by:
    – The availability of the technology
       • Vendor designed and ultimately consumer driven
         obsolescence;
       • Disruptive and or destructive technologies (destroys
         stability and content); and
       • Increase in the capabilities of technologies.
It’s ‘only MOSTLY dead’
                                  Miracle Max: Whoo-hoo-hoo, look who
                                  knows so much. It just so happens that
                                  your friend here is only MOSTLY dead.
                                  There's a big difference between mostly
                                  dead and all dead. Mostly dead is slightly
                                  alive. With all dead, well, with all dead
                                  there's usually only one thing you can
                                  do.
                                  Inigo Montoya: What's that?
                                  Miracle Max: Go through his clothes and
                                  look for loose change.
                                  The Princess Bride (1987)




Entropy:
  – Loss of availability or degradation of the
    technology; and
  – Loss of skills and understanding of the
    technology.
An example of MOSTLY
           dead
An example: If you did not know already, you may have forgotten that to
access a 5 ¼ inch Floppy Disk (in one particular configuration) you might
need the following:
A working 5 ¼ inch Floppy Disk


A working 5 ¼ inch Drive


A working 5 ¼ inch floppy cable


A mother board that can connect
to a 5 ¼ inch floppy cable and has
a compatible BIOS for this drive
type


A working version of a compatible
OS
Part 3: Some Personal
   Experiences 2002-2009
The National Archives of
Australia.

Dealing with the Digital
Preservation of Government
Manuscript materials.

For example, data recovery of
300 data carriers containing
Royal Commissions records
from 1970-1995. Data
Recovery from:
    –   9 Track ½ Magnetic Tape
    –   8 inch Floppy Disks
    –   5 ¼ inch Floppy Disks
    –   3 ½ inch Floppy Disks
Some Experiences 2007-
         present
National Library of Australia.

Dealing with Digital Preservation of published (serial, monograph), unpublished
personal manuscripts, large scale digitisation of newspapers and web sites
(around 350TB of data).

For example, The PANDORA Web Archive:
NAA Part 1
I have learnt the some of the following during my 8 years in Digital
Preservation.

NAA (data recovery from carriers):
    – Problematic with sensitive or security classified data;

    – External Data Recovery Companies don’t like recording the types or
      descriptive data which we like to record. Is the detailed metadata
      recorded about series, media and data objects necessary for future
      audit and authentication? What will a researcher require in the future
      when they are looking at some derivative data object?

    – Very expensive. In this case costing about $277 per carrier (300
      carriers) for about 8Gb of data (about $35 per Mb);

    – Surprisingly, recovered data as follows:
         •   245 (81.9%) carriers with 100% data recovery
         •   20 (6.7%) system or known duplicate data
         •   14 (4.7%) with partial recovery
         •   14 (4.7%) found to be blank
         •   6 (2%) failed the process completely.
NAA Part 2
– Although the Data recovery process was first concerned with
  the media carriers, there are in fact two distinct problems:
    • accessing the physical media (hardware/software issue)
    • accessing the file system by rendering/performing/converting the
      file (software problem).

– Because of potential media degradation, caution or
  conservative action should be taken when accessing legacy
  media;

– The media is only the record carrier (such as the box in the
  analogue world);

– Manifest of the content is essential;

– There is a vast variety of media types and file formats used
  over the last 40 years which exacerbates the legacy problem;
NAA Part 3
Arrangement and Description (A&D):

– Is continued A&D required at the conclusion of each step of
  the data recovery process?

– As the custody model at the time of initial digital transfers of
  these records was distributed, it is assumed that the
  controlling agency also might have provided with paper
  copies.

– Ideally when conducting A&D on these series, cross-
  referencing all digital files with paper records of the same
  series. Looking for a direct correlation between both.

– Which has primacy, duplicates of digital or paper records?
  Are the paper printouts of digital records copies? Does digital
  make more sense from a storage perspective?
NLA Part 1
In relation to Web Archives, the following are problematic:

    – Knowing what you have;

    – Measuring numbers of files;

    – Dealing with imbedded digital objects;

    – Processing and identifying ‘unusual’ file formats;

    – Maintaining links between objects;

    – Dilemmas about migration & emulation of at ‘risk files’; and

    – Browsers are so tolerant of poor conforming code that
      migration results can be variable.
Part 4: Solutions
No one has solved this yet. We all have small parts of the
puzzle. As a community we need to concentrate on services,
tools application that work together. We need to:

   – Avoid duplication of effort;

   – Build tools that work not only for your own business but could
     work for others;

   – If a product is good, concentrate effort rather than start yet
     another project;

   – Use standardized APIs;

   – Make vendors partly responsible for their own file formats and
     hardware; and

   – Don’t lock ourselves into a situation where the is no way out
     ‘except pain’.
Ok, I’ll do it
This is easier said that done!

As we all have:

     – Different businesses (to varying degrees. Although we all
       want to preserve data).

     – Different internal environments (are usually dependant on
       applications and services which are customised to our
       business)

     – Different common understanding of the problem and what is
       required to fix the problem; and

     – Different levels of resources and sustainability models.
Part 4: What can ‘I’ do
                   about it?
I have been working on a number of projects to assist in the acquisition,
ingesting, and ensuring access to digital content (these are):

Mediapedia (soon to be a community web-base service for carrier identification
and knowledge) aims to:

     –   be a curated repository for business knowledge and ‘trusted’ sources;

     –   enable the identification with a low level of audience knowledge;

     –   cater for both general and specialists audiences;

     –   contain additional information about dependencies (such as hardware,
         connectors, interfaces, software, etc.) and access paths for usage which are not
         currently represented in other sources such as Wikipedia; and

     –   be a web based service which can be integrated internally or externally to other
         services.
Mediapedia
Prometheus
The NLA built a application called Prometheus to transfer
digital content off common media carriers into managed
storage system in a systematic manner.

The NLA had to designed a system and process which was:
 –     semi-automatic;
 –     modular;
 –     scalable (able to deal with multiple carrier types); and
 –     capable of automatically harvesting and generating
       appropriate metadata for future access.

The acquisition of digital materials on carriers is a constantly
growing problem. Therefore, the NLA had to address:

 –     the backlog
 –     adding to the backlog
Prometheus
Prometheus
Conclusion
In conclusion:

   – We are all responsible for a lot of “digital stuff”;

   – If we simply collect and store it, it will become unusable
     in a relatively short time as technologies change;

   – Maintaining the ability to access it requires a lot of good
     management, planning, & dedicated resources; and

   – We have to find and use solutions that can be applied
     automatically and reliably to billions of digital files.
References
                           Websites

                           NLA Digital Preservation Section
                                http://www.nla.gov.au/preserve/digipres/index.html

                           Mediapedia Information
                                http://www.nla.gov.au/mediapedia/

                           Prometheus Sourceforge Website:
                                http://prometheus-digi.sourceforge.net/


                           Some Recent Articles:

                           Elford, D., del Pozo, N., Mihajlovic, S., Pearson, D., Clifton, G. and Webb, C.
                                 2008. Media Matters: developing processes for preserving digital objects
                                 on physical carriers at the National Library of Australia. In World Library
                                 and Information Congress: 74th IFLA General Conference and Council 10-
                                 14 August 2008, Québec, Canada
                                  www.ifla.org/IV/ifla74/papers/084-Webb-en.pdf.

Everything, for Everyone   Pearce, J., Pearson, D., Williams, M. and Yeadon, S. 2008. ‘Australian METS
                                Profile: A Journey about Metadata’ D-Lib Magazine March/April 2008,
        Forever                 Vol. 14 No.3/4, http://www.dlib.org/dlib/march08/pearce/03pearce.html

                           Pearson, D. 2008. Titans in the Library: Prometheus Unbinds At-risk Data. In
                                Gateways Dec 2008.
                                http://www.nla.gov.au/pub/gateways/issues/96/story02.htm.

                           Pearson, D. and Webb, C. 2008. ‘Defining File Format Obsolescence: A Risky
                                Journey’, The International Journal of Digital Curation (IJDC), Issue 1,
                                Volume 3 (July 2008), pp.89-106. http://www.ijdc.net/ijdc/article/view/76/78

More Related Content

Similar to Preserve or preserve not

Digital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxDigital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxcuddietheresa
 
File Formats for Preservation
File Formats for PreservationFile Formats for Preservation
File Formats for PreservationStephen Gray
 
Personal Digital Archiving Initiatives at the Library of Congress
Personal Digital Archiving Initiatives at the Library of CongressPersonal Digital Archiving Initiatives at the Library of Congress
Personal Digital Archiving Initiatives at the Library of Congresslljohnston
 
Object Based Storage
Object Based StorageObject Based Storage
Object Based StorageEMC
 
Spectra Logic BlackPearl Developer Summit 2015
Spectra Logic BlackPearl Developer Summit 2015Spectra Logic BlackPearl Developer Summit 2015
Spectra Logic BlackPearl Developer Summit 2015spectralogic
 
Palmetto Technology Hub - Skill Sharing
Palmetto Technology Hub - Skill Sharing Palmetto Technology Hub - Skill Sharing
Palmetto Technology Hub - Skill Sharing Tina Arnoldi, MA, LPC
 
Databases & Challenges of a Digital Age
Databases & Challenges of a Digital AgeDatabases & Challenges of a Digital Age
Databases & Challenges of a Digital AgeWendy Lile
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersJez Cope
 
Perspectives on digitization of music
Perspectives on digitization of musicPerspectives on digitization of music
Perspectives on digitization of musicOle Bisbjerg
 
2010 AIRI Petabyte Challenge - View From The Trenches
2010 AIRI Petabyte Challenge - View From The Trenches2010 AIRI Petabyte Challenge - View From The Trenches
2010 AIRI Petabyte Challenge - View From The TrenchesGeorge Ang
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projectsac2182
 
Exercises portfolio-Digital Curation Tools (IS40620)
Exercises portfolio-Digital Curation Tools (IS40620)Exercises portfolio-Digital Curation Tools (IS40620)
Exercises portfolio-Digital Curation Tools (IS40620)softwaresatish
 
GWAVACon - Files Matters (English)
GWAVACon - Files Matters (English)GWAVACon - Files Matters (English)
GWAVACon - Files Matters (English)GWAVA
 

Similar to Preserve or preserve not (20)

Digital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxDigital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docx
 
Andrew Waugh presentation
Andrew Waugh   presentationAndrew Waugh   presentation
Andrew Waugh presentation
 
File Formats for Preservation
File Formats for PreservationFile Formats for Preservation
File Formats for Preservation
 
Andrew waugh
Andrew waughAndrew waugh
Andrew waugh
 
Personal Digital Archiving Initiatives at the Library of Congress
Personal Digital Archiving Initiatives at the Library of CongressPersonal Digital Archiving Initiatives at the Library of Congress
Personal Digital Archiving Initiatives at the Library of Congress
 
Object Based Storage
Object Based StorageObject Based Storage
Object Based Storage
 
Spectra Logic BlackPearl Developer Summit 2015
Spectra Logic BlackPearl Developer Summit 2015Spectra Logic BlackPearl Developer Summit 2015
Spectra Logic BlackPearl Developer Summit 2015
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Palmetto Technology Hub - Skill Sharing
Palmetto Technology Hub - Skill Sharing Palmetto Technology Hub - Skill Sharing
Palmetto Technology Hub - Skill Sharing
 
Quality of information
Quality of informationQuality of information
Quality of information
 
Databases & Challenges of a Digital Age
Databases & Challenges of a Digital AgeDatabases & Challenges of a Digital Age
Databases & Challenges of a Digital Age
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
 
Perspectives on digitization of music
Perspectives on digitization of musicPerspectives on digitization of music
Perspectives on digitization of music
 
2010 AIRI Petabyte Challenge - View From The Trenches
2010 AIRI Petabyte Challenge - View From The Trenches2010 AIRI Petabyte Challenge - View From The Trenches
2010 AIRI Petabyte Challenge - View From The Trenches
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projects
 
Dig c curr
Dig c currDig c curr
Dig c curr
 
Exercises portfolio-Digital Curation Tools (IS40620)
Exercises portfolio-Digital Curation Tools (IS40620)Exercises portfolio-Digital Curation Tools (IS40620)
Exercises portfolio-Digital Curation Tools (IS40620)
 
Proact story on Archiving
Proact story on ArchivingProact story on Archiving
Proact story on Archiving
 
GWAVACon - Files Matters (English)
GWAVACon - Files Matters (English)GWAVACon - Files Matters (English)
GWAVACon - Files Matters (English)
 
Carl idigpres
Carl idigpresCarl idigpres
Carl idigpres
 

More from National Library of Australia

Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...National Library of Australia
 
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtCHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtNational Library of Australia
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLATrove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLANational Library of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...National Library of Australia
 
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 Assessing Significance and Significance 2.0: an introduction - Margaret Birt... Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...National Library of Australia
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyNational Library of Australia
 
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroPublicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroNational Library of Australia
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLATROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLANational Library of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...National Library of Australia
 
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstCHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstNational Library of Australia
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyNational Library of Australia
 
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...National Library of Australia
 
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaJust digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 

More from National Library of Australia (20)

Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
 
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtCHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
 
Completing your CHG project - Fran D'Castro
Completing your CHG project - Fran D'CastroCompleting your CHG project - Fran D'Castro
Completing your CHG project - Fran D'Castro
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
 
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLATrove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
 
National Archives of Australia
National Archives of AustraliaNational Archives of Australia
National Archives of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
 
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 Assessing Significance and Significance 2.0: an introduction - Margaret Birt... Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 
Preservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment  - Tamara LavrencicPreservation Needs Assessment  - Tamara Lavrencic
Preservation Needs Assessment - Tamara Lavrencic
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania Cleary
 
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroPublicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
 
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLATROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
 
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstCHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
 
Preservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment - Tamara LavrencicPreservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment - Tamara Lavrencic
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania Cleary
 
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
 
Preservation assessment - Tamara Lavrencic
Preservation assessment - Tamara LavrencicPreservation assessment - Tamara Lavrencic
Preservation assessment - Tamara Lavrencic
 
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaJust digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Preserve or preserve not

  • 1. Preserve or Preserve Not, There is No Try: some dilemmas relating to Personal Digital Archiving Luke: All right, I'll give it a try. Yoda: No. Try not. Do... or do not. There is no try. Star Wars: Episode V - The Empire Strikes Back (1980) David Pearson Manager – Digital Preservation Section National Library of Australia
  • 2. Outline What I am going to talk to you about today is some of my experience in dealing with, and the consequences of not looking after digital materials. Part 1: The current environment, some observations Part 2: Monster, what monster? Part 3: Some personal experiences 2002-2009 Part 4: What can ‘I’ do about it? Part 5: Conclusion
  • 3. Part 1: The Current Environment So essentially we know that: – There is a lot of ‘digital stuff’ in various personal and institution archives and it is growing; – loss of access to it (or to much of it) will occur unless action taken; – The process of keeping it accessible and usable needs to be managed; and – Any ‘solutions’ have to be scalable, reliable and automated and in the long term sustainable by a community.
  • 4. Not managing ‘it’ is not an option. Collectively and or individually we, need to: – Understand our ‘digital stuff’ & associated risks; – Provide safe storage, ensure immutability and authenticity; – Ensure access over time as technology changes; – Develop & implement preservation workflows, skills, standards, & strategies for ongoing access; and – Enable content to be shared and used in different ways in the future.
  • 5. There are a number of problems which we face during the life- cycle of digital material: – There appears to be a massive difference in the way that people create and use data, and they way in which collecting bodies would like to receive data in order to preserve it. The challenge is that either: – We need to be able to handle any sorts of data that comes to us (impossible!); or – Find a way that the data that we receive works for us, without affecting how the user creates and use their data. The data needs to be capable of being identified, preserved and accessed.
  • 6. Part 2: Monster, what monster? The challenging part is keeping these rich information resources available. One of our greatest problems is knowing that we have a problem or will have a problem. We can ignore the monster at our peril, but sooner or later it will bite ‘someone’ on the …!
  • 7. Yet, the game keeps changing! The digital environment is a dynamic entity. This means that in relation to digital materials, some problems relating to collecting archives are: – They are constantly being created; – People are submitting more and institutions are actively looking for content; – The ways in which it can be submitted are changing; – The way in which the content is presented and the files types keep changing; – The accompanying metadata that comes to us is generally not consistent, if there at all (standard change); and – In many cases we only know what was sent to use when we open it up!
  • 8. Act or risk losing it ‘Digital stuff’ is dependent on technology at all stages: – Creation/capture – Storage – Access Because Technology changes - sometimes rapidly, sometimes more slowly, but eventually over time, we will lose access. Thus software, hardware, media, file formats, operating systems will become obsolete. Unless managed deterioration can occur rapidly (e.g. data can be corrupted or lost in storage or transfer process).
  • 9. Different flavours of obsolescence Differential preservation of technology which is driven by: – The availability of the technology • Vendor designed and ultimately consumer driven obsolescence; • Disruptive and or destructive technologies (destroys stability and content); and • Increase in the capabilities of technologies.
  • 10. It’s ‘only MOSTLY dead’ Miracle Max: Whoo-hoo-hoo, look who knows so much. It just so happens that your friend here is only MOSTLY dead. There's a big difference between mostly dead and all dead. Mostly dead is slightly alive. With all dead, well, with all dead there's usually only one thing you can do. Inigo Montoya: What's that? Miracle Max: Go through his clothes and look for loose change. The Princess Bride (1987) Entropy: – Loss of availability or degradation of the technology; and – Loss of skills and understanding of the technology.
  • 11. An example of MOSTLY dead An example: If you did not know already, you may have forgotten that to access a 5 ¼ inch Floppy Disk (in one particular configuration) you might need the following: A working 5 ¼ inch Floppy Disk A working 5 ¼ inch Drive A working 5 ¼ inch floppy cable A mother board that can connect to a 5 ¼ inch floppy cable and has a compatible BIOS for this drive type A working version of a compatible OS
  • 12.
  • 13. Part 3: Some Personal Experiences 2002-2009 The National Archives of Australia. Dealing with the Digital Preservation of Government Manuscript materials. For example, data recovery of 300 data carriers containing Royal Commissions records from 1970-1995. Data Recovery from: – 9 Track ½ Magnetic Tape – 8 inch Floppy Disks – 5 ¼ inch Floppy Disks – 3 ½ inch Floppy Disks
  • 14. Some Experiences 2007- present National Library of Australia. Dealing with Digital Preservation of published (serial, monograph), unpublished personal manuscripts, large scale digitisation of newspapers and web sites (around 350TB of data). For example, The PANDORA Web Archive:
  • 15. NAA Part 1 I have learnt the some of the following during my 8 years in Digital Preservation. NAA (data recovery from carriers): – Problematic with sensitive or security classified data; – External Data Recovery Companies don’t like recording the types or descriptive data which we like to record. Is the detailed metadata recorded about series, media and data objects necessary for future audit and authentication? What will a researcher require in the future when they are looking at some derivative data object? – Very expensive. In this case costing about $277 per carrier (300 carriers) for about 8Gb of data (about $35 per Mb); – Surprisingly, recovered data as follows: • 245 (81.9%) carriers with 100% data recovery • 20 (6.7%) system or known duplicate data • 14 (4.7%) with partial recovery • 14 (4.7%) found to be blank • 6 (2%) failed the process completely.
  • 16. NAA Part 2 – Although the Data recovery process was first concerned with the media carriers, there are in fact two distinct problems: • accessing the physical media (hardware/software issue) • accessing the file system by rendering/performing/converting the file (software problem). – Because of potential media degradation, caution or conservative action should be taken when accessing legacy media; – The media is only the record carrier (such as the box in the analogue world); – Manifest of the content is essential; – There is a vast variety of media types and file formats used over the last 40 years which exacerbates the legacy problem;
  • 17. NAA Part 3 Arrangement and Description (A&D): – Is continued A&D required at the conclusion of each step of the data recovery process? – As the custody model at the time of initial digital transfers of these records was distributed, it is assumed that the controlling agency also might have provided with paper copies. – Ideally when conducting A&D on these series, cross- referencing all digital files with paper records of the same series. Looking for a direct correlation between both. – Which has primacy, duplicates of digital or paper records? Are the paper printouts of digital records copies? Does digital make more sense from a storage perspective?
  • 18. NLA Part 1 In relation to Web Archives, the following are problematic: – Knowing what you have; – Measuring numbers of files; – Dealing with imbedded digital objects; – Processing and identifying ‘unusual’ file formats; – Maintaining links between objects; – Dilemmas about migration & emulation of at ‘risk files’; and – Browsers are so tolerant of poor conforming code that migration results can be variable.
  • 19. Part 4: Solutions No one has solved this yet. We all have small parts of the puzzle. As a community we need to concentrate on services, tools application that work together. We need to: – Avoid duplication of effort; – Build tools that work not only for your own business but could work for others; – If a product is good, concentrate effort rather than start yet another project; – Use standardized APIs; – Make vendors partly responsible for their own file formats and hardware; and – Don’t lock ourselves into a situation where the is no way out ‘except pain’.
  • 20. Ok, I’ll do it This is easier said that done! As we all have: – Different businesses (to varying degrees. Although we all want to preserve data). – Different internal environments (are usually dependant on applications and services which are customised to our business) – Different common understanding of the problem and what is required to fix the problem; and – Different levels of resources and sustainability models.
  • 21. Part 4: What can ‘I’ do about it? I have been working on a number of projects to assist in the acquisition, ingesting, and ensuring access to digital content (these are): Mediapedia (soon to be a community web-base service for carrier identification and knowledge) aims to: – be a curated repository for business knowledge and ‘trusted’ sources; – enable the identification with a low level of audience knowledge; – cater for both general and specialists audiences; – contain additional information about dependencies (such as hardware, connectors, interfaces, software, etc.) and access paths for usage which are not currently represented in other sources such as Wikipedia; and – be a web based service which can be integrated internally or externally to other services.
  • 23. Prometheus The NLA built a application called Prometheus to transfer digital content off common media carriers into managed storage system in a systematic manner. The NLA had to designed a system and process which was: – semi-automatic; – modular; – scalable (able to deal with multiple carrier types); and – capable of automatically harvesting and generating appropriate metadata for future access. The acquisition of digital materials on carriers is a constantly growing problem. Therefore, the NLA had to address: – the backlog – adding to the backlog
  • 26. Conclusion In conclusion: – We are all responsible for a lot of “digital stuff”; – If we simply collect and store it, it will become unusable in a relatively short time as technologies change; – Maintaining the ability to access it requires a lot of good management, planning, & dedicated resources; and – We have to find and use solutions that can be applied automatically and reliably to billions of digital files.
  • 27. References Websites NLA Digital Preservation Section http://www.nla.gov.au/preserve/digipres/index.html Mediapedia Information http://www.nla.gov.au/mediapedia/ Prometheus Sourceforge Website: http://prometheus-digi.sourceforge.net/ Some Recent Articles: Elford, D., del Pozo, N., Mihajlovic, S., Pearson, D., Clifton, G. and Webb, C. 2008. Media Matters: developing processes for preserving digital objects on physical carriers at the National Library of Australia. In World Library and Information Congress: 74th IFLA General Conference and Council 10- 14 August 2008, Québec, Canada www.ifla.org/IV/ifla74/papers/084-Webb-en.pdf. Everything, for Everyone Pearce, J., Pearson, D., Williams, M. and Yeadon, S. 2008. ‘Australian METS Profile: A Journey about Metadata’ D-Lib Magazine March/April 2008, Forever Vol. 14 No.3/4, http://www.dlib.org/dlib/march08/pearce/03pearce.html Pearson, D. 2008. Titans in the Library: Prometheus Unbinds At-risk Data. In Gateways Dec 2008. http://www.nla.gov.au/pub/gateways/issues/96/story02.htm. Pearson, D. and Webb, C. 2008. ‘Defining File Format Obsolescence: A Risky Journey’, The International Journal of Digital Curation (IJDC), Issue 1, Volume 3 (July 2008), pp.89-106. http://www.ijdc.net/ijdc/article/view/76/78