In Case of Failure
           ELAG 2011 Prague
      Patrick Hochstenbach * Ghent University
       Email: Patrick.Hochstenbach@UGent.be
               Twitter: @hochstenbach
 https://github.com/phochste/ELAG-2001-Bootcamp
BOM-VL/Archipel
http://www.slideshare.net/hochstenbach/20081007-workshop-bomvl-wp3
Life expectancies of media
[Figure: required retention period versus storage life (1, 2, 5, 10, 15, 20, 30, 50 years) per medium. Magnetic tape: ID-1, 3480, 3490/3490e, DLT, Data D-2, Data D-3, Data 8mm / Data VHS, DDS / 4mm, QIC / QIC-wide. Optical disk: CD-ROM, WORM, CD-R, M-O. Paper: newspaper (high lignin), high quality (low lignin), "permanent" (buffered). Microfilm: medium-term film, archival quality (silver).]
“Storage Media Life Expectancies” - Van Bogart, 1998
Growth of digital data
                                           Capacity of desktop computers




http://commons.wikimedia.org/wiki/File:Hard_drive_capacity_over_time.png HanKwang (2008)
Growth in formats

Formats of formats
MIME type image/tiff:
•  TIFF (all versions)
•  TIFF/IT
•  TIFF G4/LZW/UNC
•  Digital Negative Format (DNG)
•  GeoTIFF
•  Pyramid TIFF
•  ...

Source: PRONOM Technical Registry [http://www.nationalarchives.gov.uk/pronom/]
Short & long term risks

Best practices
1. Create a preservation plan
2. Backup and replicate your data
3. Store preservation metadata
4. Store technical metadata
5. Store representation metadata
6. Don’t trust software
7. Store descriptive metadata
Preservation Plan
• Preservation policies (what to preserve)
• Legal obligations
• Organizational & Technical constraints
• User requirements
• Context
• http://plato.ifs.tuwien.ac.at:8080/plato
Risk Analysis
Random error
Systematic error
MTBF
MTBF = Mean Time Between Failures
[Figure: timeline with intervals labelled 3, 2, 10 and 5]
MTBF = Total time / Number of failures = 40 hours / 3 failures = 13.3 hrs
MTTF
MTTF = Mean Time To Failure
[Figure: timeline with intervals labelled 3, 2, 10 and 5]
MTTF = Total time / Number of units = 20 hours / 4 units = 5 hrs
MTTF = 2 M hours = 228 years!
AFR = 1/MTTF = 1/228 = 0.004 = 0.4 %
R(t) = exp(-t/ϴ), with ϴ = MTTF
R(5) = exp(-5/228) = 0.98 = 98%

50 disks = 0.98^50 = 0.36 = 36%
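
The arithmetic on this slide can be reproduced with a short Python sketch. It assumes independent disks with exponentially distributed lifetimes, which is what R(t) = exp(-t/ϴ) implies; the function names are illustrative and not taken from the bootcamp material.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, leap years ignored


def mttf_in_years(mttf_hours):
    """Convert a vendor MTTF in hours to years."""
    return mttf_hours / HOURS_PER_YEAR


def annual_failure_rate(mttf_years):
    """AFR approximated as 1/MTTF, reasonable when MTTF is much longer than a year."""
    return 1.0 / mttf_years


def reliability(t_years, mttf_years, n_units=1):
    """Probability that all n_units survive t_years (independent, exponential lifetimes)."""
    return math.exp(-n_units * t_years / mttf_years)


theta = mttf_in_years(2_000_000)
print(f"MTTF = {theta:.0f} years")                      # MTTF = 228 years
print(f"AFR  = {annual_failure_rate(theta):.4f}")       # AFR  = 0.0044
print(f"R(5) = {reliability(5, theta):.3f}")            # R(5) = 0.978
print(f"R(5), 50 disks = {reliability(5, theta, 50):.3f}")
```

Without the intermediate rounding to 0.98, the 50-disk figure comes out near 0.33 rather than 0.36.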
Experiments

• Simulate 100 disks with an MTTF of 200 years
  using Processing. What happens if the AFR is not
  0.4% but 4% (hint: what is the MTTF in that
  case)?
• Given an MTTF of 200 years and 50 disks,
  what is the reliability after 1, 2 and 5 years?
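
One possible starting point for these exercises, sketched in Python rather than Processing. It assumes independent disks with exponential lifetimes; the 5-year horizon, the seed and all function names are choices made for illustration only. An AFR of 4% corresponds to an MTTF of about 25 years.

```python
import math
import random


def count_failures(n_disks, mttf_years, horizon_years, rng):
    """Draw one exponential lifetime per disk and count failures within the horizon."""
    lifetimes = (rng.expovariate(1.0 / mttf_years) for _ in range(n_disks))
    return sum(1 for t in lifetimes if t <= horizon_years)


def mean_failures(n_disks, mttf_years, horizon_years, rng, runs=1000):
    """Average number of failed disks over many simulated runs."""
    total = sum(count_failures(n_disks, mttf_years, horizon_years, rng)
                for _ in range(runs))
    return total / runs


def all_survive(n_disks, mttf_years, t_years):
    """Probability that none of the disks fails within t_years."""
    return math.exp(-n_disks * t_years / mttf_years)


rng = random.Random(2011)  # fixed seed so repeated runs give the same numbers
for mttf in (200, 25):     # roughly AFR 0.5% and AFR 4%
    avg = mean_failures(100, mttf, 5, rng)
    print(f"MTTF {mttf:>3} years: {avg:.1f} of 100 disks fail within 5 years on average")

for years in (1, 2, 5):
    print(f"50 disks, MTTF 200 years, {years} year(s): R = {all_survive(50, 200, years):.3f}")
```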
Experiments
• Amazon S3 claims an AFR per object of
  0.000000001% [1]. What is the MTTF?

• There are 100 billion objects in S3. Given an
  estimated average size of 1 MB, how big is S3?

• What is the chance (reliability) that none of these 100
  billion objects is lost in 1 year?



[1] http://aws.amazon.com/s3/faqs/#How_reliable_is_Amazon_S3#How_durable_is_Amazon_S3
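
A rough way to work through the S3 questions in Python, assuming independent objects, AFR = 1/MTTF, and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). The variable names are illustrative.

```python
import math

afr_per_object = 0.000000001 / 100   # the claimed percentage as a fraction (1e-11)
mttf_years = 1.0 / afr_per_object    # AFR = 1/MTTF, so MTTF = 1/AFR

n_objects = 100e9                    # 100 billion objects
total_bytes = n_objects * 1e6        # estimated 1 MB per object

# Chance that not a single object is lost in one year
p_no_loss = math.exp(-n_objects * afr_per_object * 1)

print(f"MTTF per object: {mttf_years:.1e} years")
print(f"Total size: {total_bytes / 1e15:.0f} PB")
print(f"P(no object lost in 1 year): {p_no_loss:.2f}")
```

Under these assumptions, even eleven nines of per-object durability leave only about a one-in-three chance that every object survives the year.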
http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html
Schroeder & Gibson
Experiments
• Take the lifetime of the universe (13
  billion years) as the lifetime of one storage
  byte. What is the probability that one terabyte
  (1 trillion bytes) will survive 100 years?
• Discuss
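
As input for the discussion, a small Python sketch of this calculation. It assumes each byte fails independently with an exponential lifetime whose mean is 13 billion years, and evaluates the result for 10^9 and 10^12 bytes; the function name is hypothetical.

```python
import math


def survival_probability(n_bytes, years, byte_lifetime_years=13e9):
    """Probability that every one of n_bytes survives the given number of years."""
    return math.exp(-n_bytes * years / byte_lifetime_years)


for n_bytes in (1e9, 1e12):
    p = survival_probability(n_bytes, 100)
    print(f"{n_bytes:.0e} bytes over 100 years: {p:.3g}")
```

For 10^9 bytes the probability is already below 0.1%; for 10^12 bytes it is too small to represent as a floating-point number.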
http://www.hpl.hp.com/techreports/tandem/TR-85.7.html
Serial Failures
    87 years

    75 years

    50 years


     31 years
Serial Failures
A, B, C, D, ... in series = SYSTEM

1/SYSTEM = 1/A + 1/B + 1/C + 1/D + ...

E.g. components: 1, 100, 1000, 10000 years
System: 0.989 years
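
The serial rule can be checked with a few lines of Python. The function name is illustrative. Feeding it the four figures from the preceding slide (87, 75, 50 and 31 years) assumes they are four components in series, which the slide text does not state explicitly.

```python
def series_mttf(component_mttfs):
    """MTTF of a chain that fails as soon as any one component fails."""
    return 1.0 / sum(1.0 / m for m in component_mttfs)


print(round(series_mttf([1, 100, 1000, 10000]), 3))  # 0.989

# Assumption: the 87/75/50/31-year figures are serial components.
print(round(series_mttf([87, 75, 50, 31]), 1))
```

The weakest component dominates: the chain is always less reliable than its worst link.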
Parallel Failures
        = 200 years




        = ?? years
Parallel Failures

SYSTEM = A, B, C, D in parallel

SYSTEM = A * B * C * D

E.g. components: 200, 200
System: 40000 years
Composite Failures


                = ?? years
Composite Failures
                = 40,000 years

                = SYSTEM

1/SYSTEM = 1/40,000 + 1/40,000

SYSTEM = 20,000 years
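
The composite example combines both rules, and can be sketched in Python as below. The parallel rule is taken exactly as the slides state it (product of the component values); treating the composite system as two 200/200 parallel pairs in series is an assumption read off the numbers, and the function names are illustrative.

```python
import math


def parallel_mttf(component_mttfs):
    """Parallel rule as given on the slides: multiply the component values."""
    return math.prod(component_mttfs)


def series_mttf(component_mttfs):
    """Serial rule: reciprocals add up."""
    return 1.0 / sum(1.0 / m for m in component_mttfs)


pair = parallel_mttf([200, 200])      # one mirrored pair
system = series_mttf([pair, pair])    # two such pairs in series

print(pair)           # 40000
print(round(system))  # 20000
```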
Experiments
• Calculate the composite failure of the
  Tandem example (administration, software,
  hardware, environment)
• How would you make this setup more
  reliable? Calculate the effect
• What is the MTTF of a 5-way mirror of
  7K3000 disks?
http://old.hki.uni-koeln.de/people/herrmann/forschung/heydegger_archiving2008_40.pdf
Bit Errors

                      0110001011




                      0010101001

BER = Bit Error Rate = 3/10 = 0.3 = 30 %
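
Counting the differing positions can be automated. A minimal Python sketch, with an illustrative function name:

```python
def bit_error_rate(sent, received):
    """Fraction of bit positions that differ between two equal-length bit strings."""
    if len(sent) != len(received):
        raise ValueError("bit strings must have the same length")
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)


print(bit_error_rate("0110001011", "0010101001"))  # 0.3
```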
Bit Errors

• Soft error - repeat the operation
• Hard error - after some repeats data is lost
• Typical disk BER = 10^-5 to 10^-6 (every 10 KB
  to 100 KB read)
Bit Errors
Drive Type         Hard Error (BER*)    Data read per error
Consumer SATA      10^-14               10^14 bits =~ 10 TB
Enterprise SATA    10^-15               10^15 bits =~ 100 TB
Enterprise SAS     10^-16               10^16 bits =~ 1 PB

*) BERs are in bits (1 bit = 1/8 byte)

1 sector error for every 10 TB -> 1 PB read
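
The conversion from a hard error rate to an amount of data read can be sketched in Python as follows. It assumes decimal units (1 TB = 10^12 bytes) and one expected error per 1/BER bits; the names are illustrative.

```python
def bytes_read_per_error(ber):
    """Expected number of bytes read per hard error, for a BER given per bit."""
    bits_per_error = 1.0 / ber
    return bits_per_error / 8  # 8 bits per byte


drives = [("Consumer SATA", 1e-14), ("Enterprise SATA", 1e-15), ("Enterprise SAS", 1e-16)]
for name, ber in drives:
    terabytes = bytes_read_per_error(ber) / 1e12
    print(f"{name}: about {terabytes:.1f} TB read per hard error")
```

The exact figures are 12.5 TB, 125 TB and 1250 TB, which the slide rounds to orders of magnitude.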
Experiments
•   Collect a few sample documents from the web
    (images, documents, executables, etc.); flip one or
    more random bits; explain the resulting effect

•   Use the visual defects experiment to measure the
    effect of flipping bits on image files with various
    compressions

•   Open and save an image file. Measure the visual
    effects.

•   Calculate the checksum of the files and repeat the
    experiments. Check results.
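
A possible helper for the bit-flipping and checksum experiments, sketched in Python with the standard library only. The sample bytes stand in for a real file, and SHA-256 is one choice of checksum among several; the function names are illustrative.

```python
import hashlib
import random


def flip_random_bits(data, n_bits, rng):
    """Return a copy of data with n_bits distinct, randomly chosen bits inverted."""
    buf = bytearray(data)
    for pos in rng.sample(range(len(buf) * 8), n_bits):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def sha256_hex(data):
    """Hex digest used as the file checksum."""
    return hashlib.sha256(data).hexdigest()


original = b"In Case of Failure " * 10   # stand-in for the bytes of a sample file
damaged = flip_random_bits(original, 1, random.Random(42))

print(len(original) == len(damaged))                 # True
print(sha256_hex(original) == sha256_hex(damaged))   # False
```

For real files, read the bytes with `open(path, "rb").read()` and write the damaged copy to a new file before opening it in a viewer.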
File Formats

• The goal of digital preservation is not
  preserving the bits and bytes but the means
  to access and use the information
  represented by them.
File Formats




Bits -> Software + Environment -> Information
File Formats
hypothetical 3-bit format


   110110010111010


                            Width = bit [1 .. 3]
                            Height = bit [4 .. 6]
                            Data = bit [7 .. 15]
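
Reading this hypothetical format takes only a few lines of Python, which illustrates how much of the "information" lives in the software rather than in the bits. The sketch assumes the fields are unsigned integers with the most significant bit first, something the slide does not specify.

```python
def parse_three_bit_format(bits):
    """Split a 15-bit string into width, height and data fields (bit positions are 1-based on the slide)."""
    if len(bits) != 15 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 15 bits")
    width = int(bits[0:3], 2)    # bits 1..3
    height = int(bits[3:6], 2)   # bits 4..6
    data = bits[6:15]            # bits 7..15
    return width, height, data


print(parse_three_bit_format("110110010111010"))  # (6, 6, '010111010')
```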
File Formats
With software you have only two options:


1. The software works and is maintained
2. The software doesn’t work and is not
   maintained
File Formats
  1. The software works and is maintained

• Your designated community has the
  software tools
• Your archive has the software tools
• In both cases you need to document
  which software is needed and
  the steps required to get access to the data
File Formats
2. The software doesn’t work and is not maintained


  • Archive the source code of the original
     software
  • Emulate the original software
Experiments
• Experiment with the different text-encoding
  demo files to discover the bit content of
  these files.
• Use DROID and JHOVE to characterize and
  validate the demo files.
• Invalidate the files using truncation and bit
  errors. Check the results.
• Use migration and emulation to get access
  to the demo.wp file.
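
To inspect the bit content of differently encoded text without the demo files, a Python sketch along these lines may help. The sample character and the three encodings are chosen for illustration.

```python
def bit_content(text, encoding):
    """Encode text and show each resulting byte as eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


for encoding in ("latin-1", "utf-8", "utf-16-le"):
    print(f"{encoding:>9}: {bit_content('é', encoding)}")
#   latin-1: 11101001
#     utf-8: 11000011 10101001
# utf-16-le: 11101001 00000000
```

The same character yields three different bit patterns, so the bits alone do not say which text they represent.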
Metadata

• Descriptive Metadata
• Administrative Metadata
• Structural Metadata
• Rights Metadata
• Representation Metadata
Packaging

• Digital objects are composite structures
• Need to be described, validated and
  accessed as a whole
• Complex Objects
Package Formats

• METS
• MPEG-21/DIDL
• LOM/IMS
• BagIt
• TIPR RXP
BagIt

• Library of Congress & California Digital
  Library
• NDIIPP
• Generic Format
BagIt
Experiments

• Create a bag using the Bagger toolkit. Add
  Dublin Core descriptive metadata.
• Save the bag as a ZIP file and deposit it in
  the demo archive.
• As archivist, access the deposit and validate
  its contents.
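
To see what Bagger produces and what validation checks, the core of a bag can be imitated with the Python standard library. This is a simplified sketch, not the Bagger toolkit or a full BagIt implementation: it writes only bagit.txt, an MD5 payload manifest and the data/ directory, and the version string 0.97, the file names and the function names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_hex(path):
    """MD5 checksum of a file, as used in manifest-md5.txt."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def make_bag(bag_dir, payload):
    """Write payload files under data/ plus the bag declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.write_bytes(content)
        lines.append(f"{md5_hex(target)}  data/{name}\n")
    (bag_dir / "manifest-md5.txt").write_text("".join(lines), encoding="utf-8")


def validate_bag(bag_dir):
    """Return True if every file listed in the manifest exists and matches its checksum."""
    manifest = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or md5_hex(target) != checksum:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "demo_bag"
    make_bag(bag, {"report.txt": b"ELAG 2011 bootcamp"})
    ok_before = validate_bag(bag)
    # Simulate damage after deposit by changing one character of the payload
    (bag / "data" / "report.txt").write_bytes(b"ELAG 2011 bootcamq")
    ok_after = validate_bag(bag)

print(ok_before, ok_after)  # True False
```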
Conclusions

Mais conteúdo relacionado

Destaque

@Agawish creating a stunning ui with oracle adf faces, using sass
@Agawish   creating a stunning ui with oracle adf faces, using sass@Agawish   creating a stunning ui with oracle adf faces, using sass
@Agawish creating a stunning ui with oracle adf faces, using sassAmr Gawish
 
Chicago Chemists
Chicago ChemistsChicago Chemists
Chicago Chemistshostage
 
Searchthewebtutorial2014
Searchthewebtutorial2014Searchthewebtutorial2014
Searchthewebtutorial2014Joyce Miller
 
Business Consulting
Business ConsultingBusiness Consulting
Business ConsultingChris Walker
 
Culture Change 2 days seminar
Culture Change 2 days seminarCulture Change 2 days seminar
Culture Change 2 days seminarChris Walker
 
Optimized Internet Marketing
Optimized Internet MarketingOptimized Internet Marketing
Optimized Internet MarketingHans Riemer
 
Abraham Upfront Frontality In The Dura Europos Narratives
Abraham Upfront  Frontality In The Dura Europos NarrativesAbraham Upfront  Frontality In The Dura Europos Narratives
Abraham Upfront Frontality In The Dura Europos NarrativesPaige Dansinger
 
Tutorial dynamics of a rigid body (part i)
Tutorial dynamics of a rigid body (part i)Tutorial dynamics of a rigid body (part i)
Tutorial dynamics of a rigid body (part i)Kumutha Danasakaran
 

Destaque (19)

Are You Rewarding Loyal Members? ASAE 2013 Annual Meeting
Are You Rewarding Loyal Members? ASAE 2013 Annual MeetingAre You Rewarding Loyal Members? ASAE 2013 Annual Meeting
Are You Rewarding Loyal Members? ASAE 2013 Annual Meeting
 
The Ying & Yang of Creative Management
The Ying & Yang of Creative ManagementThe Ying & Yang of Creative Management
The Ying & Yang of Creative Management
 
Incentive Cards Explained - Incentive Mag Dec 1995
Incentive Cards Explained - Incentive Mag Dec 1995Incentive Cards Explained - Incentive Mag Dec 1995
Incentive Cards Explained - Incentive Mag Dec 1995
 
@Agawish creating a stunning ui with oracle adf faces, using sass
@Agawish   creating a stunning ui with oracle adf faces, using sass@Agawish   creating a stunning ui with oracle adf faces, using sass
@Agawish creating a stunning ui with oracle adf faces, using sass
 
Chicago Chemists
Chicago ChemistsChicago Chemists
Chicago Chemists
 
Searchthewebtutorial2014
Searchthewebtutorial2014Searchthewebtutorial2014
Searchthewebtutorial2014
 
Business Consulting
Business ConsultingBusiness Consulting
Business Consulting
 
Culture Change 2 days seminar
Culture Change 2 days seminarCulture Change 2 days seminar
Culture Change 2 days seminar
 
Biradsfa qs
Biradsfa qsBiradsfa qs
Biradsfa qs
 
Style Plus Presentation (1)
Style Plus Presentation (1)Style Plus Presentation (1)
Style Plus Presentation (1)
 
Understanding Member Engagement
Understanding Member EngagementUnderstanding Member Engagement
Understanding Member Engagement
 
A Review of Incentive Reward Cards 1997 - White Paper
A Review of Incentive Reward Cards 1997 - White PaperA Review of Incentive Reward Cards 1997 - White Paper
A Review of Incentive Reward Cards 1997 - White Paper
 
Purchasing Cooperatives and Job Order Contracting Make Sense CJE news 2006 ...
Purchasing Cooperatives and Job Order Contracting Make Sense   CJE news 2006 ...Purchasing Cooperatives and Job Order Contracting Make Sense   CJE news 2006 ...
Purchasing Cooperatives and Job Order Contracting Make Sense CJE news 2006 ...
 
Delivering Consistent National Brand Service At Multiple Locations - ICSA Pr...
Delivering Consistent National Brand Service At Multiple Locations - ICSA  Pr...Delivering Consistent National Brand Service At Multiple Locations - ICSA  Pr...
Delivering Consistent National Brand Service At Multiple Locations - ICSA Pr...
 
Gent_M 2011-04-26
Gent_M 2011-04-26Gent_M 2011-04-26
Gent_M 2011-04-26
 
Open | Linked | Open Linked data
Open | Linked | Open Linked dataOpen | Linked | Open Linked data
Open | Linked | Open Linked data
 
Optimized Internet Marketing
Optimized Internet MarketingOptimized Internet Marketing
Optimized Internet Marketing
 
Abraham Upfront Frontality In The Dura Europos Narratives
Abraham Upfront  Frontality In The Dura Europos NarrativesAbraham Upfront  Frontality In The Dura Europos Narratives
Abraham Upfront Frontality In The Dura Europos Narratives
 
Tutorial dynamics of a rigid body (part i)
Tutorial dynamics of a rigid body (part i)Tutorial dynamics of a rigid body (part i)
Tutorial dynamics of a rigid body (part i)
 

Mais de Patrick Hochstenbach

Mais de Patrick Hochstenbach (20)

Elag2015
Elag2015Elag2015
Elag2015
 
Processing Linked Data with Catmandu
Processing Linked Data with CatmanduProcessing Linked Data with Catmandu
Processing Linked Data with Catmandu
 
The Library in 2050
The Library in 2050The Library in 2050
The Library in 2050
 
20130308 webstrategie
20130308 webstrategie20130308 webstrategie
20130308 webstrategie
 
MARC Died
MARC DiedMARC Died
MARC Died
 
LibreCat::Catmandu
LibreCat::CatmanduLibreCat::Catmandu
LibreCat::Catmandu
 
Catmandu Librecat
Catmandu LibrecatCatmandu Librecat
Catmandu Librecat
 
Catmandu / LibreCat Project
Catmandu / LibreCat ProjectCatmandu / LibreCat Project
Catmandu / LibreCat Project
 
UGent Datacenter of waarom we 140TB kopen
UGent Datacenter of waarom we 140TB kopenUGent Datacenter of waarom we 140TB kopen
UGent Datacenter of waarom we 140TB kopen
 
देवनागरी Devanāgarī
 देवनागरी Devanāgarī  देवनागरी Devanāgarī
देवनागरी Devanāgarī
 
Informatie Aan Zee - TTT E-Research
Informatie Aan Zee - TTT E-ResearchInformatie Aan Zee - TTT E-Research
Informatie Aan Zee - TTT E-Research
 
Informatie Aan Zee - TTT Digital Architecture
Informatie Aan Zee - TTT Digital ArchitectureInformatie Aan Zee - TTT Digital Architecture
Informatie Aan Zee - TTT Digital Architecture
 
Biblio
BiblioBiblio
Biblio
 
GREP - Ghent University Repository
GREP - Ghent University RepositoryGREP - Ghent University Repository
GREP - Ghent University Repository
 
20100831 igelu mobilise_ugent
20100831 igelu mobilise_ugent20100831 igelu mobilise_ugent
20100831 igelu mobilise_ugent
 
20100618 Datasalon5 Vooruit Gent
20100618 Datasalon5 Vooruit Gent20100618 Datasalon5 Vooruit Gent
20100618 Datasalon5 Vooruit Gent
 
20100306 Datasalon 4 : code4lib
20100306 Datasalon 4 : code4lib20100306 Datasalon 4 : code4lib
20100306 Datasalon 4 : code4lib
 
20091120 Vlengel Maastricht
20091120 Vlengel Maastricht20091120 Vlengel Maastricht
20091120 Vlengel Maastricht
 
Data Salon 3 - Ghent
Data Salon 3 - GhentData Salon 3 - Ghent
Data Salon 3 - Ghent
 
20081007 Workshop BOM-VL WP3
20081007  Workshop BOM-VL WP320081007  Workshop BOM-VL WP3
20081007 Workshop BOM-VL WP3
 

Último

Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 

Último (20)

Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Activity 01 - Artificial Culture (1).pdf

ELAG2011 Bootcamp

  • 1. In Case of Failure ELAG 2011 Prague Patrick Hochstenbach * Ghent University Email: Patrick.Hochstenbach@UGent.be Twitter: @hochstenbach https://github.com/phochste/ELAG-2001-Bootcamp
  • 3. Life expectancies of media [figure: required retention period versus storage life, on a scale of 1, 2, 5, 10, 15, 20, 30 and 50 years, for magnetic tape (Data 8mm / Data VHS, QIC / QIC-wide, DDS / 4mm, 3480, 3490/3490e, DLT, Data D-2, Data D-3, I-D1), optical disk (CD-ROM, CD-R, WORM, M-O), paper (newspaper with high lignin, high quality with low lignin, "permanent" buffered) and microfilm (medium-term film, archival quality silver)] Source: "Storage Media Life Expectancies", Van Bogart, 1998
  • 4. Growth of digital data Capacity of desktop computers http://commons.wikimedia.org/wiki/File:Hard_drive_capacity_over_time.png HanKwang (2008)
  • 5. Growth in formats [figure: chart labels not recoverable from the transcript]
  • 6. Formats of formats MIME type image/tiff: • TIFF (all versions) • TIFF/IT • TIFF G4/LZW/UNC • Digital Negative Format (DNG) • GeoTIFF • Pyramid TIFF • … Source: PRONOM Technical Registry [http://www.nationalarchives.gov.uk/pronom/]
  • 7. Short & long term risks [figure: chart labels not recoverable from the transcript]
  • 15. Best practices 1. Create a preservation plan 2. Backup and replicate your data 3. Store preservation metadata 4. Store technical metadata 5. Store representation metadata 6. Don’t trust software 7. Store descriptive metadata
  • 16. Preservation Plan • Preservation policies (what to preserve) • Legal obligations • Organizational & Technical constraints • User requirements • Context • http://plato.ifs.tuwien.ac.at:8080/plato
  • 19-20. Random error [figure: numeric labels only; slide 20 adds the value 1.9]
  • 22-24. Systematic error [figure: numeric labels only; slide 23 adds the value 1.9 and slide 24 the value 75]
  • 25.
  • 26.
  • 27. MTBF = Mean Time Between Failure [figure: timeline with intervals labelled 3, 2, 10 and 5] MTBF = Total time / Number of failures = 40 hours / 3 failures = 13.3 hrs
  • 28. MTTF = Mean Time To Failure [figure: timeline with intervals labelled 3, 2, 10 and 5] MTTF = Total time / Number of units = 20 hours / 4 units = 5 hrs
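The two definitions can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and reading the figure labels 3, 2, 10 and 5 as the lifetimes (in hours) of four units is an assumption that happens to match the 20-hour total on the MTTF slide.

```python
from typing import Sequence


def mtbf(total_time_hours: float, failures: int) -> float:
    """Mean Time Between Failure: total observed time divided by failure count."""
    return total_time_hours / failures


def mttf(unit_lifetimes_hours: Sequence[float]) -> float:
    """Mean Time To Failure: total lifetime of all units divided by unit count."""
    return sum(unit_lifetimes_hours) / len(unit_lifetimes_hours)


# Numbers from the slides: 40 hours with 3 failures, and four units
# whose lifetimes (assumed here) are 3, 2, 10 and 5 hours.
mtbf_hours = mtbf(40, 3)
mttf_hours = mttf([3, 2, 10, 5])
print(f"MTBF = {mtbf_hours:.1f} hrs, MTTF = {mttf_hours:.1f} hrs")
```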
  • 29.
  • 34. MTTF = 2 M hours = 228 years! AFR = 1/MTTF = 0.004 = 0.4 %. R(t) = exp(-t/ϴ), with ϴ the MTTF and t in the same unit. R(5) = exp(-5/228) = 0.98 = 98 %. For 50 disks: 0.98^50 = 0.36 = 36 %
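The arithmetic on this slide can be reproduced as follows. This is a sketch assuming a 365-day year and the exponential reliability model R(t) = exp(-t/ϴ) from the slide; the variable names are illustrative. Without rounding R(5) to 0.98 first, the 50-disk figure comes out nearer 33 % than 36 %.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year


def reliability(t_years: float, mttf_years: float) -> float:
    """Probability that one unit survives t_years, R(t) = exp(-t / MTTF)."""
    return math.exp(-t_years / mttf_years)


mttf_years = 2_000_000 / HOURS_PER_YEAR  # about 228 years
afr = 1 / mttf_years                     # about 0.004, i.e. 0.4 %
r5 = reliability(5, mttf_years)          # about 0.98
r5_50_disks = r5 ** 50                   # all 50 independent disks survive 5 years

print(f"MTTF = {mttf_years:.0f} years, AFR = {afr:.2%}")
print(f"R(5) = {r5:.3f}, 50 disks = {r5_50_disks:.2f}")
```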
  • 35. Experiments • Simulate 100 disks with an MTTF of 200 years using Processing. What happens if the AFR is not 0.4% but 4% (hint: what is the MTTF in that case)? • Given an MTTF of 200 years and 50 disks, what is the reliability after 1, 2 and 5 years?
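One possible starting point for the simulation, written in Python rather than Processing. It is a sketch under stated assumptions: disk lifetimes are drawn from an exponential distribution, an AFR of 4 % is taken to mean an MTTF of 1/0.04 = 25 years, and the function name and the 5-year window are illustrative choices.

```python
import random


def simulate_failures(n_disks: int, mttf_years: float, years: float, seed: int = 42) -> int:
    """Count how many of n_disks fail within `years`, given exponential lifetimes."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), so the mean lifetime is mttf_years.
    lifetimes = [rng.expovariate(1 / mttf_years) for _ in range(n_disks)]
    return sum(1 for lifetime in lifetimes if lifetime <= years)


failed_mttf_200 = simulate_failures(100, 200, 5)
failed_mttf_25 = simulate_failures(100, 25, 5)  # AFR of 4 % -> MTTF of 25 years
print(f"MTTF 200 years: {failed_mttf_200} of 100 disks failed within 5 years")
print(f"MTTF  25 years: {failed_mttf_25} of 100 disks failed within 5 years")
```

Changing the seed or the number of disks gives a feel for how much the failure count varies between runs.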
  • 36. Experiments • Amazon S3 claims an AFR per object of 0.000000001% [1]. What is the MTTF? • There are 100 billion objects in S3. Given an estimated average size of 1 MB how big is S3? • What is the chance (reliability) none of these 100 billion objects are lost in 1 year? [1] http://aws.amazon.com/s3/faqs/#How_reliable_is_Amazon_S3#How_durable_is_Amazon_S3
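A sketch of the S3 calculation, reusing AFR = 1/MTTF and R(t) = exp(-t/MTTF) from the earlier slides. It assumes 1 MB = 10^6 bytes and independent object losses; the variable names are illustrative.

```python
import math

afr = 0.000000001 / 100   # claimed annual loss rate per object, as a fraction (1e-11)
mttf_years = 1 / afr      # about 1e11 years per object

n_objects = 100e9                      # 100 billion objects
total_bytes = n_objects * 1e6          # assumption: 1 MB = 10^6 bytes
total_petabytes = total_bytes / 1e15   # about 100 PB

# Chance that none of the objects is lost in one year:
# R(1)^N = exp(-N * 1 / MTTF) for N independent objects.
p_no_loss = math.exp(-n_objects * 1 / mttf_years)

print(f"MTTF = {mttf_years:.1e} years, size = {total_petabytes:.0f} PB")
print(f"P(no object lost in 1 year) = {p_no_loss:.2f}")
```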
  • 37.
  • 40. 1 yr 3-5yr
  • 41.
  • 42. Experiments • Take the lifetime of the universe (13 billion years) as the lifetime of one storage byte. What is the probability that one terabyte (10^12 bytes) will survive 100 years? • Discuss
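A rough way to explore this question, assuming a terabyte of 10^12 bytes, independent byte failures, and the exponential model R(t) = exp(-t/MTTF) used on the earlier slides. The variable names are illustrative.

```python
import math

byte_mttf_years = 13e9   # lifetime of the universe as the MTTF of one byte
n_bytes = 1e12           # assumption: 1 terabyte = 10^12 bytes
t_years = 100

# Probability that one byte is lost within t_years: 1 - exp(-t / MTTF).
p_byte_lost = -math.expm1(-t_years / byte_mttf_years)
expected_lost_bytes = n_bytes * p_byte_lost

# Probability that every byte survives: exp(-N * t / MTTF).
# The exponent is so negative that the result underflows to 0.0.
p_all_survive = math.exp(-n_bytes * t_years / byte_mttf_years)

print(f"Expected lost bytes: {expected_lost_bytes:.0f}")
print(f"P(all bytes survive 100 years) = {p_all_survive}")
```

Even with a per-byte lifetime of 13 billion years, several thousand bytes are expected to be lost, so the chance that the whole terabyte survives intact is effectively zero.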
  • 43.
  • 49. Serial Failures 87 years 75 years 50 years 31 years
  • 50. Serial Failures • Components A, B, C, D, ... in series form the SYSTEM: 1/SYSTEM = 1/A + 1/B + 1/C + 1/D + ... E.g. components: 1, 100, 1000, 10000; System: 0.989 years
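The serial rule can be checked against the slide's example. This is a minimal sketch; the function name is illustrative.

```python
from typing import Sequence


def serial_mttf(component_mttfs: Sequence[float]) -> float:
    """MTTF of components in series: 1/SYSTEM = 1/A + 1/B + 1/C + ..."""
    return 1 / sum(1 / mttf for mttf in component_mttfs)


system_years = serial_mttf([1, 100, 1000, 10000])
print(f"System MTTF = {system_years:.3f} years")
```

The weakest component dominates: the system cannot last longer than its 1-year part.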
  • 51. Parallel Failures = 200 years = ?? years
  • 52. Parallel Failures • Components A, B, C, D in parallel form the SYSTEM: SYSTEM = A * B * C * D. E.g. components: 200, 200; System: 40000 years
  • 53. Composite Failures = ?? years
  • 54. Composite Failures [figure] Each parallel group = 40,000 years. 1/SYSTEM = 1/40,000 + 1/40,000, so SYSTEM = 20,000 years
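The composite example combines both rules. The sketch below takes the product rule for parallel components exactly as the slides state it (a simplification, not a general reliability formula) and assumes the composite system is two mirrored pairs of 200-year components in series. Function names are illustrative.

```python
import math
from typing import Sequence


def serial_mttf(component_mttfs: Sequence[float]) -> float:
    """Series rule from the slides: 1/SYSTEM = 1/A + 1/B + ..."""
    return 1 / sum(1 / mttf for mttf in component_mttfs)


def parallel_mttf(component_mttfs: Sequence[float]) -> float:
    """Parallel rule as given on the slides: SYSTEM = A * B * ..."""
    return math.prod(component_mttfs)


mirror_years = parallel_mttf([200, 200])                   # 40000 years
composite_years = serial_mttf([mirror_years, mirror_years])  # 20000 years
print(f"Mirror = {mirror_years} years, composite = {composite_years:.0f} years")
```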
  • 55. Experiments • Calculate the composite failure of the Tandem example (administration, software, hardware, environment) • How would you make this setup more reliable? Calculate the effect • What is the MTTF of a 5-way mirror of 7K3000 disks?
  • 56.
  • 58. Bit Errors 0110001011 0010101001 BER = Bit Error Rate = 3/10 = 0.3 = 30 %
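The bit error rate on this slide is the fraction of positions at which the two bit strings differ. A minimal sketch, with an illustrative function name:

```python
def bit_error_rate(sent: str, received: str) -> float:
    """Fraction of bit positions that differ between two equal-length bit strings."""
    if len(sent) != len(received):
        raise ValueError("bit strings must have the same length")
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)


ber = bit_error_rate("0110001011", "0010101001")
print(f"BER = {ber:.0%}")
```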
  • 59. Bit Errors • Soft error - repeat the operation • Hard error - after some repeats data is lost • Typical disk BER = 10^-5 to 10^-6 (one error for every 10 KB to 100 KB read)
  • 60. Bit Errors
      Drive Type        Hard Error   Data read per error
      Consumer SATA     10^-14       10^14 =~ 10 TB
      Enterprise SATA   10^-15       10^15 =~ 100 TB
      Enterprise SAS    10^-16       10^16 =~ 1 PB
      *) BERs are per bit (1 bit = 1/8 byte). 1 sector error for every 10 TB to 1 PB read
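The conversion from a per-bit error rate to the amount of data read per hard error is a division by 8. A short sketch, assuming decimal units (1 TB = 10^12 bytes) and illustrative names; the exact values are slightly above the rounded figures on the slide.

```python
def bytes_read_per_error(ber_per_bit: float) -> float:
    """Average number of bytes read per hard error for a given per-bit error rate."""
    bits_per_error = 1 / ber_per_bit
    return bits_per_error / 8


drives = {"Consumer SATA": 1e-14, "Enterprise SATA": 1e-15, "Enterprise SAS": 1e-16}
terabytes = {name: bytes_read_per_error(ber) / 1e12 for name, ber in drives.items()}

for name, tb in terabytes.items():
    print(f"{name}: about {tb:.1f} TB read per hard error")
```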
  • 61. Experiments • Collect a few sample documents from the web (images, documents, executables, etc.); flip one or more random bits; explain the resulting effect • Use the visual defects experiment to measure the effect of flipping bits on image files with various compressions • Open and save an image file. Measure the visual effects. • Calculate the checksum of the files and repeat the experiments. Check results.
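The bit-flipping and checksum experiments can be started from a sketch like the one below. It works on an in-memory byte string rather than a downloaded file, uses SHA-256 as the checksum (the slides do not name an algorithm), and the helper name is illustrative.

```python
import hashlib
import random


def flip_random_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with exactly one randomly chosen bit inverted."""
    position = rng.randrange(len(data) * 8)
    corrupted = bytearray(data)
    corrupted[position // 8] ^= 1 << (position % 8)
    return bytes(corrupted)


original = b"In Case of Failure"
damaged = flip_random_bit(original, random.Random(1))

checksum_before = hashlib.sha256(original).hexdigest()
checksum_after = hashlib.sha256(damaged).hexdigest()
print("checksums match:", checksum_before == checksum_after)
```

To run the experiment on real files, read them with `open(path, "rb").read()` and write the damaged bytes to a new file before opening it in a viewer.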
  • 62.
  • 63. File Formats • The goal of digital preservation is not preserving the bits and bytes but the means to access and use the information represented by them.
  • 64. File Formats Software Bits Information + Environment
  • 65. File Formats hypothetical 3-bit format 110110010111010 Width = bit [1 .. 3] Height = bit [4 .. 6] Data = bit [7 .. 15]
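Reading the hypothetical format takes nothing more than slicing the bit string at the documented offsets. The sketch below assumes the width and height fields are unsigned binary integers with the most significant bit first, which the slide does not specify; the function name is illustrative.

```python
from typing import Tuple


def parse_record(bits: str) -> Tuple[int, int, str]:
    """Split a 15-bit record into width (bits 1-3), height (bits 4-6) and data (bits 7-15)."""
    if len(bits) != 15 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 15 binary digits")
    width = int(bits[0:3], 2)    # assumption: unsigned, most significant bit first
    height = int(bits[3:6], 2)   # same assumption
    data = bits[6:15]
    return width, height, data


width, height, data = parse_record("110110010111010")
print(width, height, data)
```

Without the three-line format description the same 15 bits are uninterpretable, which is the point the slide makes about bits versus information.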
  • 66. File Formats With software you have only two options: 1. The software works and is maintained 2. The software doesn’t work and is not maintained
  • 67. File Formats 1. The software works and is maintained • Your designated community has the software tools • Your archive has the software tools • In both cases you need to provide information which software you need and the steps required to get access to the data
  • 68. File Formats 2. The software doesn’t work and is not maintained • Archive the source code of the original software • Emulate the original software
  • 69. Experiments • Experiment with the different text-encoding demo files to discover the bit content of these files. • Use DROID and JHOVE to characterize and validate the demo files. • Invalidate the files using truncation and bit errors. Check the results. • Use migration and emulation to get access to the demo.wp file.
  • 70.
  • 71. Metadata • Descriptive Metadata • Administrative Metadata • Structural Metadata • Rights Metadata • Representation Metadata
  • 72. Packaging • Digital objects are composite structures • Need to be described, validated and accessed as a whole • Complex Objects
  • 73. Package Formats • METS • MPEG-21/DIDL • LOM/IMS • BagIt • TIPR RXP
  • 74. BagIt • Library of Congress & California Digital Library • NDIIPP • Generic Format
  • 75. BagIt
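A bag is a directory with a `bagit.txt` declaration, a `data/` payload directory and a checksum manifest. The sketch below builds and checks a minimal bag with the Python standard library only. It is not the Bagger toolkit: the helper names, the `demo-bag` directory, the SHA-256 manifest and the version string are illustrative choices, and the check covers payload fixity only, not full BagIt validation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, payload: Dict[str, bytes]) -> None:
    """Write a minimal bag: bagit.txt, data/ payload files and manifest-sha256.txt."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{name}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def payload_is_intact(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest() != checksum:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "demo-bag"
    make_bag(bag, {"hello.txt": b"hello"})
    ok_before = payload_is_intact(bag)
    (bag / "data" / "hello.txt").write_bytes(b"hellp")  # simulate corruption
    ok_after = payload_is_intact(bag)

print("intact after creation:", ok_before)
print("intact after corruption:", ok_after)
```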
  • 76. Experiments • Create a bag using the Bagger toolkit. Add Dublin Core descriptive metadata. • Save the bag as a ZIP file and deposit it to the demo archive. • As archivist, access the deposit and validate its contents.