The Koninklijke Bibliotheek (KB) digitizes the national collection
of the Netherlands. Digitization leads to multiple versions of a
publication: a digital access file, a digital master file, back-ups of
the digital versions and the physical original publication. This in
turn quickly increases the need for storage capacity and raises
questions such as: Should all versions be stored? Which versions
need to be preserved in order to ensure permanent access, and
how? Based on the collection care plan and the content strategy,
a differentiated storage policy is set up in order to establish a
relation between the physical object and its digital
counterpart(s). This method assigns value to different collection
lots and is used to find out how to apply collection care in an
efficient way.
1. Finding the Balance
An attempt at modeling differentiated storage for digitized collections: finding
the balance between storage, costs and preservation of digitized publications.
Trudie Stoutjesdijk, September 5th 2013
2. How to find the balance….
Digitization
•Multiple versions of a publication
•Which versions should be stored?
•What representation is the object of preservation?
•How can we reduce the need for storage?
3. Agenda
• Who we are
• What we have
• Finding the balance
4. Who we are
• National Library
• Strategic Plan 2010-2013
• We offer everyone access to
everything published in and about the
Netherlands
• We improve the national information
infrastructure
• We guarantee long-term storage of
digital information
• We maintain, present and strengthen
our collection
5. What we have
1. Collection Development Programme
2. Collection Care Plan
3. Storage Management
4. Digital Preservation System
6. What we have
1. Collection development programme
(2010-2013)
• Collect and preserve everything published in and about
the Netherlands
• Transition from printed to digital format is a key priority.
• Collect 50% of all Dutch born-digital publications
• Harvest 10.000 websites
• Digitization of all the books, periodicals and
newspapers since 1470 (60 M pages before 2014)
7. What we have
1. Collection development programme: Digitization
10% of all books, periodicals and newspapers (since 1470) digitized before 2014.
Output:
•Digital objects in JPEG2000.
•Different versions of an object: master, access, back-up, physical publication.
Rapid increase in the number of items and in the total cost of storage.
8. What we have
2. Collection Care Plan
Integrated, efficient and effective collection care for both
physical and digital collections, based on the following principles:
•Integrated collection care for digital files and physical
objects
•Value assessment of collections
•Risk identification
•Differentiated levels of collection care
•Care redirected from the most valuable collections,
to those where the biggest loss of value is expected.
9. What we have
2. Collection Care
Differentiated collection care based on a
rational selection tool: value assessment
•Divide the collections into different collection lots or
categories
•Describe collection units
•Establish the definition of every criterion
•Rate every collection unit
•Calculate the average value
Result: The level and duration of collection care
Primary criteria: Informational value, Aesthetic value, Historical value, Social value
Secondary criteria: Use, Completeness, Condition, Provenance
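The rating and averaging steps above could be sketched as follows. This is only an illustration: the slides do not state the rating scale or whether criteria are weighted, so the 1-5 scale, the unweighted mean, the function name `average_value` and the sample `unit` are all assumptions.

```python
# Criteria named on the slide. The 1-5 rating scale is an assumption.
PRIMARY = ("informational", "aesthetic", "historical", "social")
SECONDARY = ("use", "completeness", "condition", "provenance")
ALL_CRITERIA = PRIMARY + SECONDARY


def average_value(ratings):
    """Return the unweighted mean rating of one collection unit.

    Assumes every criterion is rated and all criteria count equally;
    the slides do not say whether the KB weights them.
    """
    missing = set(ALL_CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings[name] for name in ALL_CRITERIA) / len(ALL_CRITERIA)


# Hypothetical collection unit, rated on the assumed 1-5 scale.
unit = {
    "informational": 5, "aesthetic": 2, "historical": 4, "social": 3,
    "use": 4, "completeness": 3, "condition": 2, "provenance": 1,
}
print(average_value(unit))
```

The resulting average would then be mapped to a level and duration of collection care; that mapping is not given in the slides and is left out here.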
10. What we have
3. Storage strategy
Hierarchical storage management (HSM)
• Using several tiers defining different levels of storage quality.
• Based on different needs.
• Using more than one type of media (HDD, magnetic tape).
11. What we have
4. Digital Preservation System
•e-Depot system (DIAS)
at the end of its natural life:
•New Digital Preservation System (DPS)
•2012 migration from DIAS to new DPS
•2013 new ingest workflows for
born digital publications.
•Next step: new ingest workflows
for all the digitized collections.
12. How to find the balance….
It is impossible to preserve all the versions at the
highest preservation level.
The value assessment provides insight into:
- The level and duration of collection care
- The relation between physical object and
digital counterparts.
- The relation between the state of the physical
object and the necessity of preservation
imaging and sustainable storage.
13. Finding the balance: Differentiated storage model for digitized collections
A differentiated storage policy has been applied to the digitized collections, based on the following secondary values:
• Use
- The availability of digital content for the customer
• Condition
- The vulnerability of the physical resources
- Sustainability of digital storage
In anticipation of the results of the value assessment we tried to identify classification levels.
14. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Preservation level            1.    2.                               3.                  4.                    5.
Representation available?
- Digital master              No    No                               Master light        Preservation master   Preservation master
- Access file                 No    Yes                              Yes                 Yes                   Yes
- Physical original           No    Yes                              Yes                 Yes                   Yes
Preservation copy available?  No    No                               Physical original   Preservation master   Physical original; preservation master
Effort of conservation / preservation care
- Active                      none  none                             Physical original   Preservation master   Physical original and digital master
- Passive                     none  Physical original; access file   Master light        Physical original     none
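The classification table can also be read as a small lookup structure. The sketch below encodes the five levels exactly as listed on the slide; the class `PreservationLevel`, the field names and the helper `stored_digital_files` are hypothetical names chosen for illustration and are not part of the KB's systems.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationLevel:
    level: int
    digital_master: str | None  # None, "master light" or "preservation master"
    access_file: bool
    physical_original: bool
    preservation_copies: tuple[str, ...]


# Values transcribed from the classification table on the slide.
LEVELS = {
    1: PreservationLevel(1, None, False, False, ()),
    2: PreservationLevel(2, None, True, True, ()),
    3: PreservationLevel(3, "master light", True, True, ("physical original",)),
    4: PreservationLevel(4, "preservation master", True, True,
                         ("preservation master",)),
    5: PreservationLevel(5, "preservation master", True, True,
                         ("physical original", "preservation master")),
}


def stored_digital_files(level: int) -> list[str]:
    """List the digital files that exist for a preservation level."""
    entry = LEVELS[level]
    files = []
    if entry.digital_master is not None:
        files.append(entry.digital_master)
    if entry.access_file:
        files.append("access file")
    return files


for number in sorted(LEVELS):
    print(number, stored_digital_files(number))
```

Keeping the table as data makes it easy to see which levels drive storage demand: only levels 3 to 5 hold a digital master next to the access file.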
15. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Preservation level 1.
Representation available?
- Digital Master: No
- Access file: No
- Physical original: No
Preservation copy available? No
Effort of conservation / preservation care
- Active: none
- Passive: none
Level 1:
-Lowest imaginable level.
-For use only.
-Contains no representations and there’s nothing to preserve.
-Example: the reference collection which is being transformed from physical to digital.
16. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Level 2:
-Digitized for use.
-Contains publications that can be digitized more than once.
-Condition is good and will remain so under the current circumstances.
-No need for a digital master unless decay sets in.
-Example: all foreign titles of the Google project
Preservation level 2.
Representation available?
- Digital Master: No
- Access file: Yes
- Physical original: Yes
Preservation copy available? No
Effort of conservation / preservation care
- Active: none
- Passive: physical original; access file
17. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Level 3:
-Digitization for use
-Contains objects that represent multiple values
-Physical object is in fairly good condition and can be digitized repeatedly
-No need for preservation image
-Active preservation: physical original.
-Example: large parts of the special collection (18th century)
Preservation level 3.
Representation available?
- Digital Master: Master light
- Access file: Yes
- Physical original: Yes
Preservation copy available? Physical original
Effort of conservation / preservation care
- Active: Physical original
- Passive: Master light
18. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Level 4:
-For use and preservation
-Objects with high information value, but hardly any value as an object.
-The material can be fragile; digitization can sometimes be done only once
-Maintenance of the physical object may not be possible in the future
-Create high quality preservation master
-Example: Metamorfoze, National Program for the Preservation of Paper Heritage
Preservation level 4.
Representation available?
- Digital Master: Preservation master
- Access file: Yes
- Physical original: Yes
Preservation copy available? Preservation master
Effort of conservation / preservation care
- Active: preservation master
- Passive: physical original
19. Finding the balance: Differentiated storage model for digitized collections
Collection Care: Classification levels
Level 5:
-For use and preservation
-Contains fragile, precious objects
-Physical object represents primary values that might not be reflected in the digital master
-Can only be digitized once
-High quality digital master
-Example: Bookbinding of William the Silent
Preservation level 5.
Representation available?
- Digital Master: Preservation master
- Access file: Yes
- Physical original: Yes
Preservation copy available? Physical original; preservation master
Effort of conservation / preservation care
- Active: Physical original and digital master
- Passive: none
20. Finding the balance: Differentiated storage model for digitized collections
Digitized collections and storage costs.
Currently the output of the digitization process is a digital master and a digital access file.
21. Finding the balance: Differentiated storage model for digitized collections
Total costs per type of publication
Type of publication   Storage / year   Digitization / page
Books
  Master              € 0,01           € 0,72
  Access file         € 0,008          € 0,56
  Master & Access     € 0,02           € 1,28
Newspapers
  Master              € 0,02           € 1,08
  Access file         € 0,01           € 0,93
  Master & Access     € 0,05           € 2,01
Journals
  Master              € 0,01           € 0,77
  Access file         € 0,009          € 0,61
  Master & Access     € 0,01           € 1,38
Costs based on TCO storage & digitization
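As a rough worked example of how the digitization column could be used, the sketch below multiplies the per-page figures by a page count. It assumes the master and access figures may simply be added, which the "Master & Access" rows suggest for digitization. The function name `digitization_cost` and the 300-page book are hypothetical. The slide's decimal commas are written as decimal points in the code.

```python
from __future__ import annotations

from decimal import Decimal

# Digitization cost per page in euros, copied from the table above.
DIGITIZATION_PER_PAGE = {
    "books": {"master": Decimal("0.72"), "access": Decimal("0.56")},
    "newspapers": {"master": Decimal("1.08"), "access": Decimal("0.93")},
    "journals": {"master": Decimal("0.77"), "access": Decimal("0.61")},
}


def digitization_cost(pub_type: str, pages: int, files: tuple[str, ...]) -> Decimal:
    """Estimate the digitization cost for the chosen files of one publication.

    Assumption: per-file costs are additive, as in the table's combined rows.
    """
    per_page = sum(
        (DIGITIZATION_PER_PAGE[pub_type][name] for name in files),
        Decimal("0"),
    )
    return per_page * pages


# Hypothetical 300-page book, with and without a digital master.
print(digitization_cost("books", 300, ("master", "access")))
print(digitization_cost("books", 300, ("access",)))
```

`Decimal` is used instead of floats so that the euro amounts stay exact. The storage column is left out because the slide does not state its unit beyond "per year".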
22. Finding the balance: Differentiated storage model for digitized collections
Classification levels & Cost savings
Applying the five-level classification model reduces the storage costs of digitized publications at two levels:
•Level 2 will not contain digital master files. This could reduce the costs by 30 – 40%.
•Level 3: a digital master light will be created. A master light could require lower image quality than a preservation master, which could reduce the size of a digitized publication and thus the storage costs.
23. Finding the balance: Differentiated storage model for digitized collections
Alternatives for cost saving
New digital master or digital access files are needed when:
•the access file no longer meets the requirements of the user,
•technology offers new opportunities, possibly better and smaller digital masters,
•the decay of the physical original appears to be stronger than expected...
Rescanning and/or conversion?
24. Finding the balance: Differentiated storage model for digitized collections
Rescanning: i.e. re-digitization of (parts of) the collection.
•Level 1 has no objects.
•Level 2: when decay increases.
•Level 3 has 2 digital copies; decay / obsolescence.
•Level 4 & 5: rescanning is undesirable / impossible.
Conversion: generate a digital access file from the digital master.
•Can offer a solution for level 4 and 5 (vulnerable physical collections).
Conversion on the fly: generate a digital access file on demand.
•Suitable for level 3, 4 and 5; access files don’t need to be stored.
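The options per level listed above could be captured in a simple lookup, sketched below. Treating rescanning as an option for level 3 is an interpretation of the slide, based on the level 3 description that such objects can be digitized repeatedly. The names `OPTIONS` and `options_for` are hypothetical.

```python
from __future__ import annotations

# Ways to obtain a new access file per preservation level, as read from the slide.
OPTIONS = {
    1: set(),                                    # level 1 has no objects
    2: {"rescanning"},                           # when decay increases
    3: {"rescanning", "conversion on the fly"},  # interpretation, see lead-in
    4: {"conversion", "conversion on the fly"},  # rescanning undesirable / impossible
    5: {"conversion", "conversion on the fly"},  # rescanning undesirable / impossible
}


def options_for(level: int) -> list[str]:
    """Return the available methods for a level in alphabetical order."""
    return sorted(OPTIONS[level])


for number in sorted(OPTIONS):
    print(number, options_for(number))
```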
25. Finding the balance: Differentiated storage model for digitized collections
Conversion and/or on the fly conversion
Pros
•Appropriate and efficient method for permanent storage and access of the collections.
•Good solution for the collections at level 4 & 5
•Probably cost saving on production
•Cost saving on storage
Cons
•A system-intensive activity that could create a bottleneck in the delivery to the end user
•Insufficient knowledge about the technique
•No insight into the costs
The Research Department has started research on conversion.
26. Finding the balance: Differentiated storage model for digitized collections
Wrap up: Tried to realise a model
Lessons learned:
•Value assessment helps to gain insight into the value of collections
•USE and CONDITION of collections help to find a balance between permanent access and costs
•Transparency of the costs.
• Rescanning is not feasible for publications that are in a vulnerable state.
• Conversion might be preferable to rescanning.
• Investigation of the conversion / on-the-fly conversion technique is necessary to gain insight into the benefits of this method, in particular with respect to applicability, performance and efficiency.