1. Preservation metadata
Andrew Waugh
Senior Manager, Standards and Policy
Public Record Office Victoria
2. Structure of the talk
• What is preservation metadata?
• Recordkeeping metadata in theory
• NAA/ANZ recordkeeping metadata standard
• PREMIS - standard for preservation metadata
• Practical reading and implementation tips
• Conclusions
3. What is preservation?
• The ability to access content for as long as it is required
• Access means
  - Being able to find the content
  - Being able to extract information from the content
  - Understanding the context of the content
  - Being confident of the history of the content
4. Preservation metadata
• Preservation metadata is the information necessary to maintain access to content
• The difference between short and long term access is one of degree of metadata, not kind
• As preservation professionals, we are rarely interested in the content itself, just in managing it. Preservation metadata is the basic information that we use to do our job
5. Examples of preservation metadata
• Identifier
• Creation date
• Title
• History information
• Relationship between objects
• Data formats
6. Recordkeeping Metadata
• The archival profession has been developing recordkeeping (i.e. preservation) metadata for around a decade
• This work provides a useful framework for thinking about preservation and metadata
7. RK Metadata Standards
• ISO 23081 Information and documentation - Records management processes - Metadata for records
  - Part 1: Principles
  - Part 2: Conceptual and implementation issues
• National Archives of Australia (and Archives New Zealand) - Recordkeeping Metadata Standard Version 2.0
  - http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
• Forthcoming Australian/New Zealand Standard
8. Metadata from a records view
• Records are content, context, and structure
• Records management metadata is data describing the context, content, and structure of records and their management through time (ISO 15489-1:2001, 3.12)
• Recordkeeping metadata is the key to providing access (and hence preservation)
• In practice, metadata is everything except the actual content of the record
9. Purpose of recordkeeping metadata
• The purpose of recordkeeping metadata includes
  - Protecting records as evidence
  - Ensuring their accessibility and usability through time
  - Facilitating the ability to understand records
  - Helping ensure the authenticity, reliability and integrity of records
  - Supporting and managing access, privacy, and rights
  - Supporting the migration of records from one (preservation) system to another
10. Metadata at record capture
• Records are captured into a system, and metadata is created/captured with them
• This metadata documents
  - Environment in which records were created
  - Purpose or business activity being undertaken
  - Relationship with other records or aggregations
  - Physical or technical structure of the record
  - Logical structure of the record
11. Metadata after record capture
• Metadata captured after record creation documents what happened to a record over time
  - Demonstrates authenticity, reliability, usability, and integrity
• Answers the basic questions of who, what, when, where, why
12. Metadata after disposal
• Metadata is a record itself, and some parts may need to be kept after the record has been disposed of, to account for the record's existence, management, and disposition
13. Four entity model
• Modern Australian recordkeeping metadata models are normally expressed in terms of entities
  - Records (the objects to be preserved: record, file, series...)
  - Agents (people who create and use the records)
  - The business transacted
  - Mandates (the rules governing the business)
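The four entities above can be sketched as plain data classes. This is only an illustrative Python sketch: the class fields, identifiers, and sample values are assumptions made for the example and are not taken from any of the standards discussed in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Mandate:
    identifier: str
    name: str


@dataclass
class Business:
    identifier: str
    name: str
    # Identifiers of the mandates (rules) governing this business
    mandate_ids: list = field(default_factory=list)


@dataclass
class Record:
    identifier: str
    title: str
    aggregation: str  # e.g. "record", "file", "series"
    agent_ids: list = field(default_factory=list)
    business_ids: list = field(default_factory=list)


# Made-up sample values, for illustration only
act = Mandate("m1", "Example Act")
officer = Agent("a1", "Example Officer")
licensing = Business("b1", "Licensing", mandate_ids=[act.identifier])
application = Record(
    "r1", "Licence application", "record",
    agent_ids=[officer.identifier],
    business_ids=[licensing.identifier],
)
```

Each entity keeps only the identifiers of the entities it relates to, so an agent or mandate can change without every record being rewritten.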
15. One, two, three, four entity models
• The four entity model can be flattened to facilitate implementation
  - A system could store only one entity (record), which contains metadata for agents, business, and mandates
  - Practical because most metadata is captured at creation; subsequent changes in relationships or information are less relevant
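Flattening can be pictured as copying the related agent, business, and mandate details into the record at capture time. The following Python sketch assumes a hypothetical dictionary layout (`agent_ids`, `business_ids`, `mandate_ids`) and a hypothetical `flatten` function; neither comes from a standard.

```python
def flatten(record, agents, businesses, mandates):
    """Return one dict holding the record plus copies of its related entities."""
    business = [businesses[b] for b in record["business_ids"]]
    mandate_ids = [m for b in business for m in b["mandate_ids"]]
    return {
        "identifier": record["identifier"],
        "title": record["title"],
        "agents": [agents[a]["name"] for a in record["agent_ids"]],
        "business": [b["name"] for b in business],
        "mandates": [mandates[m]["name"] for m in mandate_ids],
    }


# Made-up sample values, for illustration only
agents = {"a1": {"name": "Example Officer"}}
mandates = {"m1": {"name": "Example Act"}}
businesses = {"b1": {"name": "Licensing", "mandate_ids": ["m1"]}}
record = {
    "identifier": "r1",
    "title": "Licence application",
    "agent_ids": ["a1"],
    "business_ids": ["b1"],
}

flat = flatten(record, agents, businesses, mandates)
```

The flattened record is a snapshot: later changes to an agent or mandate are not reflected in it, which is the trade-off the slide describes.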
17. Identity metadata
• Distinguishes entity from all other entities in the domain
  - Entity type (e.g. record, agent)
  - Aggregation (e.g. file, record)
  - Registration Identifier (the actual identifier)
18. Description metadata
• Describes the entity so that a user can determine whether this is the entity sought
  - Title
  - Classification
  - Abstract
  - Place
  - External Identifiers
• WARNING - description elements are normally business specific
19. Use metadata
• Assists long-term access to the entity
  - Technical environment
  - Rights (who may legally use it & under what conditions)
  - Access (access control)
  - Language
  - Integrity
  - Documentary form
20. Event plan
• Allows the entity to be managed
• Consists of management actions that are planned to occur in the future
  - Appraisal (To keep or not)
  - Disposal (Implementation of appraisal decision)
  - Preservation
  - Access Control (Changes to)
  - Rights (Changes to)
21. Event history
• Documents the trail of past events
• Who, what, when, why
  - Event identifier
  - Event date/time
  - Event type
  - Event description
  - Event relation (mandate, agent)
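An event history can be kept as an append-only list of event entries carrying the fields listed above. This Python sketch is one possible shape only; the `Event` class, the `record_event` helper, the `evt-N` identifier scheme, and the sample values are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: an event cannot be altered once recorded
class Event:
    identifier: str
    when: datetime
    event_type: str
    description: str
    agent_id: str                      # who
    mandate_id: Optional[str] = None   # why (the rule that authorised it)


def record_event(history, event_type, description, agent_id, mandate_id=None):
    """Append a new event to the history and return it."""
    event = Event(
        identifier=f"evt-{len(history) + 1}",
        when=datetime.now(timezone.utc),
        event_type=event_type,
        description=description,
        agent_id=agent_id,
        mandate_id=mandate_id,
    )
    history.append(event)
    return event


# Made-up sample values, for illustration only
history = []
record_event(history, "capture", "Record captured into system", "a1")
record_event(history, "migration", "Converted to a new format", "a2", "m1")
```

Events are only ever added, never edited, so the list reads as a trail of who did what, when, and under which mandate.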
22. Relation
• Links two (or more) entities
• Implicitly bi-directional, but need not be implemented this way
• Relationships often have a time span
  - Entity Identifiers (from, to)
  - Relationship type
  - Relationship description
  - Relationship date range
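A relation with a date range, stored once but queried in either direction, might look like the following Python sketch. The `Relation` class, the `related_to` helper, and the sample identifiers are hypothetical and chosen only to illustrate the fields above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    from_id: str
    to_id: str
    relation_type: str
    description: str = ""
    start: Optional[date] = None  # None means no known start
    end: Optional[date] = None    # None means still current

    def active_on(self, day):
        """True if the relationship held on the given day."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def related_to(entity_id, relations, day):
    """Find entities linked to entity_id on a given day, in either direction."""
    found = []
    for rel in relations:
        if not rel.active_on(day):
            continue
        if rel.from_id == entity_id:
            found.append(rel.to_id)
        elif rel.to_id == entity_id:
            found.append(rel.from_id)
    return found


# Made-up sample values, for illustration only
relations = [
    Relation("rec-1", "agent-1", "created by"),
    Relation("rec-1", "agent-2", "controlled by",
             start=date(2005, 1, 1), end=date(2007, 12, 31)),
]
```

Here each link is stored in one direction only, and the lookup function supplies the reverse direction, which is one way of treating relations as implicitly bi-directional.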
23. NAA/ANZ metadata standard
• Same content, two standards
• NAA version
  - Recordkeeping Metadata Standard Version 2.0
  - http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  - Based on five entities (Record, Agent, Business, Mandate, Relationship)
  - Defines 26 elements with 44 sub-elements
  - Includes extensive element schemes
24. NAA/ANZ Elements
Legend: Mandatory Element, Conditional Element, Optional Element

All Entities: Entity Type, Category, Identifier*, Name*, Date Range, Description

Record            Agent          Business         Mandate           Relationship
Jurisdiction*     Jurisdiction*  Jurisdiction*    Jurisdiction*     Related Entity*
Security Class*   Permissions*   Security Class*  Security Class*   Change History*
Security Caveat*  Contact*       Permissions*     Security Caveat*
Rights*           Position*                       Coverage*
Language*         Language*
Coverage*
Keyword*
Disposal*
Format
Extent*
Medium
Integrity Check
Location*
Document Form
Precedence
25. Future Australian Standard
• Work is in progress on an Australian Standard for recordkeeping metadata
• Based on the NAA/ANZ metadata standard
• Focus on relationships
26. PREMIS
• Preservation metadata is the information a repository uses to support the digital preservation process
• Supports the viability, renderability, understandability, authenticity, and identity of digital objects
• Built on the OAIS reference model
• Data dictionary & supporting materials
  - http://www.loc.gov/standards/premis/
27. PREMIS scope
• Not intended to define all preservation elements, only those that most repositories are likely to need to know in order to support digital preservation
• Excludes
  - Format specific metadata (even for a class of format)
  - Repository specific metadata and business rules
  - Descriptive metadata
  - Detailed information about media or hardware
  - Information about agents, apart from the minimum required for identification
  - Information about rights and permissions, except those that directly affect preservation functions
28. PREMIS Data Model
• From Understanding PREMIS: http://www.loc.gov/standards/premis/understanding-premis.pdf
29. PREMIS Entities
• Intellectual Entity - set of content that is a single intellectual unit - has no metadata in PREMIS
• Object Entity - things actually stored in a repository
  - Representation Object - collection of all file objects necessary to represent an intellectual entity
  - File Object - discrete object on a computer file system
  - Bitstream Object - portion of a file
• Event Entity - contains the history of an Object
• Rights Entity - rights and permissions about object
• Agent Entity - actors involved in events or rights
30. Elements for Object Entities
• Unique Identifier
• Fixity information
• Size
• Format
• Original Name
• Creators
• Inhibitors (things designed to prevent use)
• Significant properties (aspects that must be preserved)
• Environment (infrastructure required to use)
• Storage media
• Digital signatures
• Relationship with other entities
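Fixity information and size are the elements most easily computed by a repository. As a rough Python sketch, fixity can be recorded as a message digest plus the algorithm that produced it and later recomputed to check that the object is unchanged. The function names `fixity` and `unchanged` and the dictionary keys are assumptions for this example, not PREMIS element names.

```python
import hashlib
import io


def fixity(stream, algorithm="sha256", chunk_size=65536):
    """Read a binary stream and return its digest, algorithm, and size in bytes."""
    digest = hashlib.new(algorithm)
    size = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        size += len(chunk)
    return {"algorithm": algorithm, "digest": digest.hexdigest(), "size": size}


def unchanged(stream, stored):
    """Recompute fixity with the stored algorithm and compare with the stored values."""
    return fixity(stream, stored["algorithm"]) == stored


# Made-up content, for illustration only
stored = fixity(io.BytesIO(b"hello"))
still_ok = unchanged(io.BytesIO(b"hello"), stored)
corrupted = not unchanged(io.BytesIO(b"hellp"), stored)
```

For a real file, the same functions could be called with a file opened in binary mode, for example `open(path, "rb")`.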
31. NAA/ANZ vs PREMIS
• NAA/ANZ
  - Recordkeeping is about relationships, so includes the context of objects, which is often necessary to understand the object
  - Documents the management plan for the object
• PREMIS
  - Deliberately focuses on preserving the files that form a digital object; context is important, but not documented
  - Documents critical information necessary to use objects
33. General observations
• Most metadata schemes are lengthy, but contain relatively little information
• If you understand the typical structure, it is easy to quickly pick out the information you need
• Metadata schemes tend to be aspirational - what the drafters thought you should do, often beyond what you can do or have to do
34. Metadata schemes
• Typical metadata schemes contain
  - Entities (i.e. objects modelled)
    • Definition
    • Lists valid elements
  - Elements (i.e. specific pieces of information)
    • Definition
    • Mandatory, optional, conditional flag
    • Repeatable or not
    • Structure (child elements)
  - Element schemas (i.e. controls over the values that can be used)
    • Lists of valid values (e.g. States)
    • Format controls (e.g. dates)
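The typical structure above (elements with an obligation flag, a repeatability flag, and an optional list of valid values) can be written down as data and checked mechanically. The Python sketch below uses a made-up four-element scheme and a hypothetical `validate` function; it handles only mandatory and optional elements, since conditional elements need scheme-specific rules.

```python
# A made-up scheme, for illustration only
SCHEME = {
    "identifier": {"obligation": "mandatory", "repeatable": False},
    "title": {"obligation": "mandatory", "repeatable": False},
    "keyword": {"obligation": "optional", "repeatable": True},
    "state": {
        "obligation": "optional",
        "repeatable": False,
        "values": {"VIC", "NSW", "QLD", "SA", "WA", "TAS", "NT", "ACT"},
    },
}


def validate(metadata, scheme=SCHEME):
    """Return a list of problems; an empty list means the metadata conforms.

    metadata maps each element name to a list of its values.
    """
    problems = []
    for name, rule in scheme.items():
        values = metadata.get(name, [])
        if rule["obligation"] == "mandatory" and not values:
            problems.append(f"missing mandatory element: {name}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"element not repeatable: {name}")
        allowed = rule.get("values")
        if allowed is not None:
            problems.extend(
                f"invalid value for {name}: {v}" for v in values if v not in allowed
            )
    for name in metadata:
        if name not in scheme:
            problems.append(f"unknown element: {name}")
    return problems


good = validate({"identifier": ["r1"], "title": ["Licence application"]})
bad = validate({"title": ["a", "b"], "state": ["XX"]})
```

Reading a real scheme with this structure in mind usually comes down to finding the same three things for each element: its obligation, whether it repeats, and what values it allows.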
35. Implementation
• Metadata schemes are information models, not implementation instructions
• Adopting a scheme means that your implementation has
  - the mandatory elements
  - the conditional elements (if relevant)
  - (perhaps) some of the optional elements
  - the correct element structure
• Metadata schemes are often associated with a representation standard (e.g. in XML)
  - Still not an implementation; often just for exchange
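The separation between internal implementation and exchange representation can be shown with a small Python sketch: metadata is held internally as a dictionary and converted to and from XML only at the boundary. The element names and the `to_xml` / `from_xml` helpers are hypothetical and do not follow any published schema.

```python
import xml.etree.ElementTree as ET


def to_xml(metadata):
    """Serialise an internal dict (element name -> list of values) for exchange."""
    root = ET.Element("record")
    for name, values in metadata.items():
        for value in values:
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the exchange form back into the internal dict."""
    metadata = {}
    for child in ET.fromstring(text):
        metadata.setdefault(child.tag, []).append(child.text or "")
    return metadata


# Made-up sample values, for illustration only
internal = {"identifier": ["r1"], "keyword": ["licence", "application"]}
exchanged = to_xml(internal)
restored = from_xml(exchanged)
```

The internal form could just as well be database rows or objects; only the exchanged document needs to match the representation standard.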
36. Conclusions
• Preservation metadata is simply the information that preservation professionals use to ensure continued access to objects
• What is viewed as essential depends on your discipline (what features is it necessary to preserve?)
  - E.g. archivists are concerned about context, librarians less so
37. Conclusions (2)
• Typical preservation metadata
  - Identity information
  - Technical details and organisation of the objects to be preserved
  - Rights and access
  - History of object
• Other common metadata
  - Description
  - Management Plans
  - Relationships between objects
38. Conclusions (3)
• You only have to implement the logical model and the mandatory elements
• Standards are usually aspirational - they include metadata that is nice to have, but not essential
• Specific representations (e.g. XML) are for data exchange; they do not dictate how you must implement the metadata internally