METS-Bagger Tool
Normalizing existing digitized content into standardized
packages for robust long-term management.

Marcus Emmanuel Barnes
#c4lbc
2013-11-28
Background
● SFU Library holds about 15 TB of content
○ The Library has created high-quality master versions
of content it has digitized using ‘preservation-friendly’ formats.
○ descriptive metadata exists for almost all of it.

However, this content has not previously been
managed according to generally accepted digital
preservation practices.
Solution
● SFU Library Digitized Content Packaging
Specification
● METS-Bagger tool for normalizing existing
digitized content based on this specification
for robust long-term management.
METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on collection
manifest
Collection Normalization
● Processes existing collections of files into a format
compliant with the SFU Library Digitized Content
Packaging Specification
● Packaging Formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
How Collection Normalization Works
1. Configuration file for settings
2. Script walks the directory tree of a collection, compiles
list of files to be preserved
3. Files are collated into items (e.g., newspaper issue),
METS file is generated
4. Each item's files and its associated METS file are bagged
(and serialized)
5. Future: A collection manifest is created for the collection
for integrity checking (automatic or manual).
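Steps 2 to 4 can be sketched in Python roughly as follows. This is a minimal illustration, not the METS-Bagger code itself: the function names, the rule of one item per parent directory, and the choice of MD5 for the bag manifest are all assumptions, and METS generation and bag serialization are left out. The bag layout (bagit.txt, a data/ payload directory, and manifest-md5.txt) follows the BagIt draft.

```python
import hashlib
import os
import shutil
import uuid


def collect_files(collection_dir, extensions):
    """Walk the collection tree and return files whose extension is of interest."""
    found = []
    for root, _dirs, files in os.walk(collection_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() in extensions:
                found.append(os.path.join(root, name))
    return sorted(found)


def collate_items(paths):
    """Group files into items. Assumption: one item per parent directory."""
    items = {}
    for path in paths:
        items.setdefault(os.path.dirname(path), []).append(path)
    return items


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(item_files, output_dir):
    """Copy an item's files into a new minimal bag named by a random UUID."""
    bag_dir = os.path.join(output_dir, str(uuid.uuid4()))
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir)
    manifest_lines = []
    for src in item_files:
        name = os.path.basename(src)
        dest = os.path.join(data_dir, name)
        shutil.copy2(src, dest)
        manifest_lines.append("%s  data/%s" % (md5_of(dest), name))
    # Bag declaration required by the BagIt draft.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as handle:
        handle.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    # Payload manifest: one "checksum  path" line per payload file.
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8") as handle:
        handle.write("\n".join(manifest_lines) + "\n")
    return bag_dir
```

A caller would run collect_files over the collection, pass the result to collate_items, and call make_bag once per item. Copying rather than moving the files keeps the original collection untouched until the new packages have been checked.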
Before and After Processing
Design Principles
● a minimalist implementation - uses as few METS and
BagIt options as possible.
● incorporates three widely implemented and understood
standards: METS, BagIt and UUID (Universally Unique
Identifiers)
● Technical metadata included in METS should include at
a minimum bit-level checksums, file type identification,
creating application, and where possible format validity
● Whenever possible, include descriptive metadata for the
item in the METS file.
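As an illustration of the minimal technical metadata named above, the following Python sketch builds a bare METS document with a UUID identifier, one file entry per file (checksum, size, and a guessed MIME type), and a flat structMap. It is an assumption-laden sketch, not the tool's actual output: build_mets, the "master" file group, and the FILE_0001 style IDs are hypothetical, and extension-based guessing via the standard mimetypes module merely stands in for real file type identification. Creating application and format validity are not covered here.

```python
import hashlib
import mimetypes
import os
import uuid
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_mets(file_paths):
    """Build a minimal METS tree: fileSec with checksums plus a flat structMap."""
    mets = ET.Element("{%s}mets" % METS_NS, {"OBJID": str(uuid.uuid4())})
    file_sec = ET.SubElement(mets, "{%s}fileSec" % METS_NS)
    # "master" is a hypothetical USE label for the preservation copies.
    file_grp = ET.SubElement(file_sec, "{%s}fileGrp" % METS_NS, {"USE": "master"})
    struct_map = ET.SubElement(mets, "{%s}structMap" % METS_NS)
    div = ET.SubElement(struct_map, "{%s}div" % METS_NS, {"TYPE": "item"})
    for index, path in enumerate(file_paths, start=1):
        file_id = "FILE_%04d" % index
        attrs = {
            "ID": file_id,
            "CHECKSUM": sha256_of(path),
            "CHECKSUMTYPE": "SHA-256",
            "SIZE": str(os.path.getsize(path)),
        }
        # Extension-based guess only; a real identification tool would go here.
        mime, _encoding = mimetypes.guess_type(path)
        if mime:
            attrs["MIMETYPE"] = mime
        file_el = ET.SubElement(file_grp, "{%s}file" % METS_NS, attrs)
        ET.SubElement(
            file_el,
            "{%s}FLocat" % METS_NS,
            {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: os.path.basename(path)},
        )
        ET.SubElement(div, "{%s}fptr" % METS_NS, {"FILEID": file_id})
    return ET.ElementTree(mets)
```

Calling build_mets(paths).write("mets.xml", encoding="utf-8", xml_declaration=True) would save the result next to the item's files before bagging.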
Script Details
● Configuration file, main script, log file, processed
collection output directory
● Written in Python so the tool can run on multiple platforms
● Plugins for technical metadata (FITS) and descriptive
metadata.
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
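A configuration file covering the options listed above might look like the following, read here with Python's standard configparser module. The section names, option names, and paths are hypothetical; the slides do not show the tool's real configuration format.

```python
import configparser

# Hypothetical configuration; every name and path below is illustrative only.
EXAMPLE_CONFIG = """
[collection]
source_dir = /path/to/collection
output_dir = /path/to/processed
file_types = .tif, .jp2, .pdf

[run]
test_run = true
test_run_limit = 10
skip_technical_metadata = false
"""


def load_settings(text):
    """Parse configuration text into a plain dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    file_types = [
        ext.strip().lower()
        for ext in parser.get("collection", "file_types").split(",")
        if ext.strip()
    ]
    test_run = parser.getboolean("run", "test_run")
    return {
        "source_dir": parser.get("collection", "source_dir"),
        "output_dir": parser.get("collection", "output_dir"),
        "file_types": file_types,
        # None means "process everything"; a number caps a test run.
        "limit": parser.getint("run", "test_run_limit") if test_run else None,
        "skip_technical_metadata": parser.getboolean("run", "skip_technical_metadata"),
    }
```

Keeping the run limit and the metadata switch in the configuration file lets a short trial be run on a large collection before committing to a full pass.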
Future
● Addition of manifest and integrity checking
tools that check a collection against its
manifest
● Additional plugins
● Sharing code on GitHub
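Since the manifest and integrity tools are planned rather than built, the following is only one possible shape for them: a Python sketch that records a SHA-256 checksum for every file in a collection and later reports files that are missing, unexpected, or changed. The function names and the in-memory manifest (a dictionary of relative path to checksum) are assumptions.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(collection_dir):
    """Map each file path (relative to the collection root) to its checksum."""
    manifest = {}
    for root, _dirs, files in os.walk(collection_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, collection_dir).replace(os.sep, "/")
            manifest[rel] = sha256_of(full)
    return manifest


def check_collection(collection_dir, manifest):
    """Compare the collection on disk with a previously stored manifest."""
    current = build_manifest(collection_dir)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            path for path in manifest
            if path in current and current[path] != manifest[path]
        ),
    }
```

An empty report in all three categories means the collection still matches its manifest. In practice the manifest would be written to a file stored apart from the collection so that it cannot be damaged along with it.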
Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan
