The document discusses implementing metadata standards for a digital audiovisual preservation repository. It describes the goals of creating a prototype repository at New York University (NYU) to aggregate content from partner stations and populate records with existing metadata. It outlines the metadata models used, including METS, PBCore and PREMIS, and how they can be combined and embedded within submission information packages (SIPs) and archival information packages (AIPs) to standardize the representation and preservation of content and associated metadata for long-term preservation.
Implementing Metadata Standards for a Digital Audiovisual Preservation Repository
1. Implementing metadata standards for a digital audiovisual preservation repository
Kara van Malssen
AudioVisual Preservation Solutions
2011-03-24
2. Case Study: NDIIPP Preserving Digital Public Television Project
[Diagram: project partners WNET, WGBH, NYU, PBS, and the Library of Congress; legend: SIP site, Repository]
3. Submission Workflow
[Diagram: producing stations (Station A, Station B, Station C, WNET, WGBH), PBS satellite distribution to transmitting stations (Station A through Station J), and submission from WNET, WGBH, and PBS to the NYU PDPTV Prototype Repository]
4. NYU Goals:
• Create a prototype repository for long-term retention
• Aggregate content from partner stations + PBS for sample programs
• Populate records with metadata that already exists (in station databases, files, scheduling systems, etc.)
• Transform data and package content, while preserving relationships between items
5. Important Vocabulary (OAIS Terms!)
• The Repository: NYU prototype preservation repository
• OAIS: Open Archival Information System
• SIP: Submission Information Package
• AIP: Archival Information Package
6. Challenge of managing diverse SIPs
• Essence file types: SD Broadcast Master (mov/aiff/m2v), HD Broadcast Master (mov/data), SD Production Master (mov), Production Master (mxf), Broadcast Master (mpeg)
• Database exports: PODS, TEAMS, PROTRACK, INMAGIC
• Additional items: scripts, etc.
• SIP Class 1: WNET National Broadcast (Nature)
• SIP Class 2: WGBH National Broadcasts
• SIP Class 3: WNET Local Broadcast (New York Voices)
• SIP Class 4: Religion and Ethics
[Diagram: each SIP class combines a different subset of the essence file types, database exports, and additional items listed above]
7. PDPTV metadata model
• METS (Metadata Encoding and Transmission Standard): structural and administrative
• PBCore (Public Broadcasting Metadata Dictionary): descriptive and technical
• PREMIS (Preservation Metadata: Implementation Strategies): technical preservation metadata
8. METS : Metadata Encoding and Transmission Standard
• Provides a structure to bundle all content (essence + metadata) in one AIP
• Identifies types of metadata, but not the terms to define them (with a few exceptions)
[Diagram: METS sections: dmdSec, amdSec (techMD, rightsMD, sourceMD, digiprovMD), fileSec, structMap, behaviorSec]
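The METS sections named above can be sketched as a skeleton document. This is a minimal illustration, not the project's actual AIP: all IDs, labels, and the file path are hypothetical, the placement of PBCore under techMD and of the database export under sourceMD is an assumption, and the `METSRIGHTS` value for `MDTYPE` depends on the METS schema version in use. The optional behaviorSec is omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only; IDs, labels, and paths are hypothetical. -->
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- dmdSec: descriptive metadata -->
  <dmdSec ID="dmd-001">
    <mdWrap MDTYPE="MODS">
      <xmlData><!-- MODS record would go here --></xmlData>
    </mdWrap>
  </dmdSec>
  <!-- amdSec: administrative metadata in four typed subsections -->
  <amdSec ID="amd-001">
    <techMD ID="tech-001">
      <!-- PBCore is not a built-in MDTYPE, so it is declared as OTHER -->
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="PBCore">
        <xmlData><!-- PBCore technical metadata (assumed placement) --></xmlData>
      </mdWrap>
    </techMD>
    <rightsMD ID="rights-001">
      <mdWrap MDTYPE="METSRIGHTS">
        <xmlData><!-- METSRights declaration --></xmlData>
      </mdWrap>
    </rightsMD>
    <sourceMD ID="source-001">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="station database export">
        <xmlData><!-- original export (assumed placement) --></xmlData>
      </mdWrap>
    </sourceMD>
    <digiprovMD ID="digiprov-001">
      <mdWrap MDTYPE="PREMIS">
        <xmlData><!-- PREMIS preservation metadata --></xmlData>
      </mdWrap>
    </digiprovMD>
  </amdSec>
  <!-- fileSec: inventory of essence files; ADMID points back to the amdSec entries -->
  <fileSec>
    <fileGrp USE="broadcast master">
      <file ID="file-001" ADMID="tech-001 digiprov-001">
        <FLocat LOCTYPE="URL" xlink:href="file:///example/bmaster.mov"/>
      </file>
    </fileGrp>
  </fileSec>
  <!-- structMap: the only mandatory section; ties files to descriptive metadata -->
  <structMap TYPE="logical">
    <div LABEL="Example program" DMDID="dmd-001">
      <fptr FILEID="file-001"/>
    </div>
  </structMap>
</mets>
```

The ID/IDREF attributes (`ADMID`, `DMDID`, `FILEID`) are what let one METS file bundle essence and metadata while keeping the relationships between them explicit.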
9. PBCore : What is it good for?
• Descriptive metadata elements that are specific to public broadcasting & AV
• Controlled vocabularies with broadcast terms
• Easy to map to from legacy station databases
• Granular technical metadata (PBCore 1.2+)
➡ Accurately represents the file-specific metadata
➡ Can be auto-populated using technical metadata extraction tools & stylesheets
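The mapping from a legacy station database to PBCore might look something like the XSLT 1.0 sketch below. The input format is entirely hypothetical (a single `<record>` root with `ProgramID`, `ProgramTitle`, and `Synopsis` children); real exports from PODS, TEAMS, PROTRACK, or INMAGIC will differ. The output uses PBCore 1.x element names and emits descriptive elements only, so it is not a schema-complete PBCore record (no instantiation is produced), and the `titleType` and `descriptionType` values should be checked against the PBCore controlled vocabularies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the input element names below are hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Assumes the export contains exactly one <record> as its root element -->
  <xsl:template match="/record">
    <PBCoreDescriptionDocument>
      <pbcoreIdentifier>
        <identifier><xsl:value-of select="ProgramID"/></identifier>
        <identifierSource>station database (hypothetical)</identifierSource>
      </pbcoreIdentifier>
      <pbcoreTitle>
        <title><xsl:value-of select="ProgramTitle"/></title>
        <titleType>Program</titleType>
      </pbcoreTitle>
      <pbcoreDescription>
        <description><xsl:value-of select="Synopsis"/></description>
        <descriptionType>Abstract</descriptionType>
      </pbcoreDescription>
    </PBCoreDescriptionDocument>
  </xsl:template>

</xsl:stylesheet>
```

The same pattern (one stylesheet per source format) could be applied to the XML output of a technical metadata extraction tool to fill in the instantiation-level elements.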
10. PREMIS : Preservation Metadata Implementation Strategies
[Diagram: PREMIS data model entities: Intellectual Entity, Object, Rights, Agents, Events]
Object Entity:
• Creating application info
• Playback environment (hardware and software)
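To show how the entities in the data model relate, here is a minimal PREMIS 2.0 sketch in which an Event links an Object to an Agent. It is illustrative only: the identifiers, the date, the event outcome, and the agent name are all hypothetical placeholders, not values from the PDPTV repository.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only; every identifier and value here is a placeholder. -->
<premis xmlns="info:lc/xmlns/premis-v2" version="2.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- Object: the thing being preserved -->
  <object xsi:type="representation">
    <objectIdentifier>
      <objectIdentifierType>local</objectIdentifierType>
      <objectIdentifierValue>example-object-001</objectIdentifierValue>
    </objectIdentifier>
  </object>

  <!-- Event: something that happened to the object -->
  <event>
    <eventIdentifier>
      <eventIdentifierType>local</eventIdentifierType>
      <eventIdentifierValue>example-event-001</eventIdentifierValue>
    </eventIdentifier>
    <eventType>fixity check</eventType>
    <eventDateTime>2010-01-01T00:00:00</eventDateTime>
    <eventOutcomeInformation>
      <eventOutcome>pass</eventOutcome>
    </eventOutcomeInformation>
    <!-- Links back to the agent and object declared in this document -->
    <linkingAgentIdentifier>
      <linkingAgentIdentifierType>local</linkingAgentIdentifierType>
      <linkingAgentIdentifierValue>example-agent-001</linkingAgentIdentifierValue>
    </linkingAgentIdentifier>
    <linkingObjectIdentifier>
      <linkingObjectIdentifierType>local</linkingObjectIdentifierType>
      <linkingObjectIdentifierValue>example-object-001</linkingObjectIdentifierValue>
    </linkingObjectIdentifier>
  </event>

  <!-- Agent: the person, organization, or software that carried out the event -->
  <agent>
    <agentIdentifier>
      <agentIdentifierType>local</agentIdentifierType>
      <agentIdentifierValue>example-agent-001</agentIdentifierValue>
    </agentIdentifier>
    <agentName>Example checksum tool</agentName>
    <agentType>software</agentType>
  </agent>
</premis>
```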
11. “Given the wide range of institutional contexts, PREMIS cannot be an out-of-the box solution. Users have to decide how to model their specific application, which semantic units need to be captured to support them, and how to implement them.”
- ISQ Special Issue: Digital Preservation, Spring 2010, p.9
12. <?xml version="1.0" encoding="UTF-8"?>
<premis xmlns="info:lc/xmlns/premis-v2" version="2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/premis.xsd">
<!-- ================================================================== -->
<object xsi:type="representation" xmlID="bmaster-sd-001">
<objectIdentifier>
<objectIdentifierType>NDIIPP:PDPTV repository naming scheme</objectIdentifierType>
<objectIdentifierValue><!-- SD_BMASTER --></objectIdentifierValue>
</objectIdentifier>
<environment>
<environmentPurpose>create</environmentPurpose>
<environmentNote>The OMNEON server generated three files for the SD broadcast master:
one QuickTime movie file (.mov), one video track (.m2v), and one audio track
(.aiff). The .mov file contains fully-qualified pathname references to the .m2v and
.aiff tracks that were only valid in the OMNEON server environment.</environmentNote>
<environmentExtension>
<creatingApplication>
<creatingApplicationName>Avid Unity Workgroup</creatingApplicationName>
<creatingApplicationVersion>4</creatingApplicationVersion>
</creatingApplication>
<creatingApplication>
<creatingApplicationName>Avid Media Composer REG_SZ</creatingApplicationName>
<creatingApplicationVersion>3.0.5</creatingApplicationVersion>
</creatingApplication>
<creatingApplication>
<creatingApplicationName>Omneon</creatingApplicationName>
<creatingApplicationVersion>4.3 sr2</creatingApplicationVersion>
</creatingApplication>
</environmentExtension>
</environment>
<environment>
<environmentCharacteristic>known to work</environmentCharacteristic>
<environmentPurpose>render</environmentPurpose>
<environmentNote>To render the content in this environment the video track (.m2v) and
audio track (.aiff) must be muxed using QTCoffee. The QuickTime movie file cannot be
used to render the content because the .mov file refers to the .m2v and .aiff tracks
by fully-qualified file names that were only valid in the creating environment.</environmentNote>
<software>
<swName>Apple Macintosh OS X version 10.5.5</swName>
<swType>operating system</swType>
</software>
<software>
<swName>Apple QuickTime Player version 7.5.5</swName>
<swType>renderer</swType>
</software>
<software>
<swName>QTCoffee 1.2.5</swName>
<swType>muxer</swType>
</software>
<hardware>
<hwName>Intel Core 2 Duo</hwName>
<hwType>processor</hwType>
<hwOtherInformation>2 GB RAM</hwOtherInformation>
</hardware>
</environment>
</object>
</premis>
15. “When combining different metadata specifications or when embedding extension metadata, we often find that data models are mismatched or that semantic units overlap. In these cases, it is necessary to decide how to overcome the conflicts.”
- ISQ Special Issue: Digital Preservation, Spring 2010, p.7
16. [Venn diagram: overlapping coverage of METS, PBCore, and PREMIS. Terms shown: Structure, Relationships, Title, Creator, Description, Agents, Rights, Checksums, File Format, File Size, Hardware, Software]
17. [Venn diagram: coverage of METS, PBCore, and PREMIS, with MODS and METSRights added. Terms shown: Structure, Relationships, Title, Creator, Description, Agents, Rights, Checksums, File Format, File Size, Hardware, Software. Callout: Descriptive elements only map to MODS]
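As a rough illustration of descriptive elements (title, creator, description) mapped to MODS and wrapped in a METS dmdSec, a fragment might look like this. The ID and all values are hypothetical placeholders, and the choice of MODS elements is one plausible mapping rather than the project's documented crosswalk.

```xml
<!-- Fragment of a METS document; ID and values are hypothetical. -->
<dmdSec xmlns="http://www.loc.gov/METS/" ID="dmd-mods-001">
  <mdWrap MDTYPE="MODS">
    <xmlData>
      <mods xmlns="http://www.loc.gov/mods/v3">
        <!-- Title -->
        <titleInfo>
          <title>Example Program Title</title>
        </titleInfo>
        <!-- Creator -->
        <name>
          <namePart>Example Producing Station</namePart>
          <role>
            <roleTerm type="text">creator</roleTerm>
          </role>
        </name>
        <!-- Description -->
        <abstract>Example program description.</abstract>
      </mods>
    </xmlData>
  </mdWrap>
</dmdSec>
```

Keeping the descriptive subset in MODS means tools that already understand METS plus MODS can read the AIP's description without knowing PBCore.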
20. AIP creation simplified
1. Content submitted, verified
2. METS automatically generated (checksums into METS attributes)
3. Source database exports automatically converted to PBCore
4. Technical metadata extracted from files using MediaInfo, converted to PBCore
5. MODS created from completed PBCore
6. Rights metadata (METSRights), preservation metadata (PREMIS) created
7. AIP complete
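Step 2 above ("checksums into METS attributes") could be sketched as the following fileSec fragment, which records a checksum on each of the three files that make up an SD broadcast master (.mov, .m2v, .aiff). The IDs, sizes, paths, and checksum values are placeholders, and MD5 is an assumed algorithm; the slides do not say which checksum type the repository used.

```xml
<!-- Fragment of a METS document. All IDs, sizes, paths, and checksum
     values are placeholders; MD5 is an assumed algorithm. -->
<fileSec xmlns="http://www.loc.gov/METS/"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileGrp USE="SD broadcast master">
    <!-- QuickTime reference movie -->
    <file ID="bmaster-sd-mov" SIZE="1024"
          CHECKSUMTYPE="MD5" CHECKSUM="00000000000000000000000000000000">
      <FLocat LOCTYPE="URL" xlink:href="file:///example/bmaster.mov"/>
    </file>
    <!-- Video track -->
    <file ID="bmaster-sd-m2v" SIZE="1024"
          CHECKSUMTYPE="MD5" CHECKSUM="00000000000000000000000000000000">
      <FLocat LOCTYPE="URL" xlink:href="file:///example/bmaster.m2v"/>
    </file>
    <!-- Audio track -->
    <file ID="bmaster-sd-aiff" SIZE="1024"
          CHECKSUMTYPE="MD5" CHECKSUM="00000000000000000000000000000000">
      <FLocat LOCTYPE="URL" xlink:href="file:///example/bmaster.aiff"/>
    </file>
  </fileGrp>
</fileSec>
```

Because the checksum travels with the file entry, a later fixity check only needs the METS document and the files themselves.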
21. AIPs
• Essence file types: SD Broadcast Master (mov/aiff/m2v), HD Broadcast Master (mov/data), SD Production Master (mov), Production Master (mxf), Broadcast Master (mpeg)
• Metadata: METS, METSRights, PBCore, PREMIS, MODS, original database exports
• Additional items: scripts, etc.
• AIP Class 1: Nationally distributed content (Nature): SD Broadcast Master (mov/aiff/m2v), HD Broadcast Master (mov/data), Production Master (mxf); METS, METSRights, PBCore, PREMIS, MODS, original database exports
• AIP Class 4: Religion and Ethics: SD Broadcast Master (mov/aiff/m2v), Production Master (mov); METS, METSRights, PBCore, PREMIS, MODS, original database exports, scripts, etc.
22. “Some file formats enable the capture of technical, and other, metadata within their files, which has the advantage of keeping the files self-descriptive. However, by extracting and storing metadata explicitly we may also benefit.”
- ISQ Special Issue: Digital Preservation, Spring 2010, p.11