SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
The BagIt file package format
                               John Kunze and Stephen Abrams, California Digital Library (CDL)

                                                                 1. start: put your files…           2. in a directory, data          Summary
  Replicas for a rainy day                                                                                                           BagIt is a file hierarchy, suitable for disk- or network-
  • Need a data package format that can carry any data                .pdf                .mpg                  data/                  based transfer, and maybe bag return, with just
  • Suitable for disk-based and network-based transfer                          .xls                                          .mpg     enough structure to safely enclose a manifest,
  • Main purpose is to move data from one digital library                                                                              checksums, tag info, and an arbitrary payload.
    or archive to another for safe-keeping                       4. put tag next to data
                                                                                                            .pdf              .xls
  • Not important whether the receiver provide access to
    or even understand meaning of the sender’s content
                                                                               manifest-md5.txt
                                                                                  bagit.txt                                          Bags in action
                            Fig. 1. Sleeping better at                data
                            night is more likely for the
                                                                             .mpg                         3. create a tag
                            archivist who has replicas
                            safely tucked away at other               .pdf                                    manifest-md5.txt       Library of Congress (LC) grantees send bags
                            memory organizations.                              .xls                              bagit.txt           • For example, bagged web crawls funded by LC are
     e³°°°                                                                                 Bag&Tag                  ...                 sent by CDL to LC, with replica sent to San Diego



  A BagIt bag directory listing                                   Holey bags! for efficiency                                                                                                 clarrisa~




  • A BagIt bag is a special directory (or Windows folder)        •   A common optional tag file lists “holes” to be filled
  • The directory name is your choice, but inside the top         •   Payload is incomplete until these files are fetched                                                               LaurelJukebox

    level BagIt reserves names for required files:                •   File fetch.txt lists URLs for receiver to grab
     • data : a subdirectory where you can put anything           •   Benefits include (network transfers only):
     • bagit.txt : a 2-line file declaring this is a bag               • No need to stage extra data copy at either end
                                                                                                                                     BagIt credits and details
     • manifest-md5.txt : a list of files present                      • Cheap parallelism from fetching URLs in batches             Authors from LC and CDL: Andy Boyko, John Kunze,
                                                                                                                                       Justin Littman, Liz Madden, Brian Vargas
             MyFirstBag/                                                        manifest-md5.txt
                                                                                   bagit.txt               manifest-md5.txt          Many thanks to: Stephen Abrams, Mike Ashenfelder, Scott
                     manifest-md5.txt                                                                                                  Fisher, Erik Hetzner, Keith Johnson, David Loy, Tracy
                 |   manifest-md5.txt                                              fetch.txt                  bagit.txt
                                                                                                              fetch.txt                Seneca, Mark Phillips, Adam Turoff, Jim Tuttle
                 |   bagit.txt                                          data                                                         BagIt specification:
                 |   bag-info.txt
                     bag-info.txt                                                                   data                                 http://www.ietf.org/internet-drafts/draft-kunze-bagit-03.txt
                                                                                                                                         http://www.cdlib.org/inside/diglib/bagit/bagitspec.html
                 --- data/
                      data/                                                                                 .mpg
                                                                                                                                     BagIt was informed by
                       |
                       |    my.pdf
                            my.pdf                                                                 .pdf                              • Enclose-and-Deposit method, Tabata & Sugimoto,
                       |
                       |    my.xsl
                            my.xsl                                                                           .xls                    • LC’s eDeposit Pilot and NDIIPP AIHT
                       |    my.mpg                                                      (http)                                       • ARC/WARC aggregate file format
                       |    my.mpg
Fig. 2. Payload directory, data,holds anything you want.         Fig. 3. On the left is a bag received with missing files, or         For further information
“Tag” files (purple) describe the bag itself, including list of   “holes”. The “fetch.txt” file lists URLs and corresponding           Please contact jak@ucop.edu or stephen.abrams@ucop.edu
files and checksums in the manifest (algorithms other than        payload filenames that the receiver must fetch before                For information on CDL’s Preservation Program, see
md5 possible) and an optional “bag-info.txt” metadata file.       declaring the bag on the right complete.                              http://www.cdlib.org/programs/digital_preservation.html

Mais conteúdo relacionado

Semelhante a The BagIt file package format

Pairtrees for object storage
Pairtrees for object storagePairtrees for object storage
Pairtrees for object storageJohn Kunze
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-researchsaintdevil163
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMSVrushaliSolanke
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdfFraolUmeta
 
Communications
CommunicationsCommunications
Communicationssimosk
 
Systems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offSystems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offWellcome Library
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
Communications systems
Communications systemsCommunications systems
Communications systemsMohd Arif
 
PERICLES Information Packaging Techniques
PERICLES  Information Packaging TechniquesPERICLES  Information Packaging Techniques
PERICLES Information Packaging TechniquesPERICLES_FP7
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structureZhichao Liang
 
Plank
PlankPlank
PlankFNian
 
WoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataWoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataUniversity of Chicago
 
Communications
CommunicationsCommunications
CommunicationsDev Sharma
 
Top 10 database optimization tips
Top 10 database optimization tipsTop 10 database optimization tips
Top 10 database optimization tipsraviwriter
 

Semelhante a The BagIt file package format (20)

Pairtrees for object storage
Pairtrees for object storagePairtrees for object storage
Pairtrees for object storage
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-research
 
Personal Digital Archiving 2015 - NYU - Workshop
Personal Digital Archiving 2015 - NYU - WorkshopPersonal Digital Archiving 2015 - NYU - Workshop
Personal Digital Archiving 2015 - NYU - Workshop
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMS
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
 
Communications
CommunicationsCommunications
Communications
 
SFU Library's METS-Bagger Tool
SFU Library's METS-Bagger ToolSFU Library's METS-Bagger Tool
SFU Library's METS-Bagger Tool
 
Systems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offSystems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling off
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Communications systems
Communications systemsCommunications systems
Communications systems
 
PERICLES Information Packaging Techniques
PERICLES  Information Packaging TechniquesPERICLES  Information Packaging Techniques
PERICLES Information Packaging Techniques
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Plank
PlankPlank
Plank
 
XDC demo: CTA
XDC demo: CTAXDC demo: CTA
XDC demo: CTA
 
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
 
WoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataWoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific Data
 
Communications
CommunicationsCommunications
Communications
 
Top 10 database optimization tips
Top 10 database optimization tipsTop 10 database optimization tips
Top 10 database optimization tips
 
Digital library
Digital libraryDigital library
Digital library
 
Digital Preservation with Archivematica: An Introduction
Digital Preservation with Archivematica: An IntroductionDigital Preservation with Archivematica: An Introduction
Digital Preservation with Archivematica: An Introduction
 

Mais de John Kunze

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ MetadictionaryJohn Kunze
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderJohn Kunze
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...John Kunze
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDLJohn Kunze
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy buildingJohn Kunze
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for PersistenceJohn Kunze
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesJohn Kunze
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsJohn Kunze
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardJohn Kunze
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyJohn Kunze
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014John Kunze
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupJohn Kunze
 
Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research DatasetsJohn Kunze
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchJohn Kunze
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long TailJohn Kunze
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsJohn Kunze
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayJohn Kunze
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsJohn Kunze
 

Mais de John Kunze (20)

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ Metadictionary
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary Builder
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDL
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy building
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for Persistence
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not Schemes
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forward
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabulary
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout group
 
Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research Datasets
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich Research
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long Tail
 
Pamwg 2012ahm
Pamwg 2012ahmPamwg 2012ahm
Pamwg 2012ahm
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History Collections
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do Today
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many Fronts
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

The BagIt file package format

  • 1. The BagIt file package format John Kunze and Stephen Abrams, California Digital Library (CDL) 1. start: put your files… 2. in a directory, data Summary Replicas for a rainy day BagIt is a file hierarchy, suitable for disk- or network- • Need a data package format that can carry any data .pdf .mpg data/ based transfer, and maybe bag return, with just • Suitable for disk-based and network-based transfer .xls .mpg enough structure to safely enclose a manifest, • Main purpose is to move data from one digital library checksums, tag info, and an arbitrary payload. or archive to another for safe-keeping 4. put tag next to data .pdf .xls • Not important whether the receiver provide access to or even understand meaning of the sender’s content manifest-md5.txt bagit.txt Bags in action Fig. 1. Sleeping better at data night is more likely for the .mpg 3. create a tag archivist who has replicas safely tucked away at other .pdf manifest-md5.txt Library of Congress (LC) grantees send bags memory organizations. .xls bagit.txt • For example, bagged web crawls funded by LC are e³°°° Bag&Tag ... sent by CDL to LC, with replica sent to San Diego A BagIt bag directory listing Holey bags! for efficiency clarrisa~ • A BagIt bag is a special directory (or Windows folder) • A common optional tag file lists “holes” to be filled • The directory name is your choice, but inside the top • Payload is incomplete until these files are fetched LaurelJukebox level BagIt reserves names for required files: • File fetch.txt lists URLs for receiver to grab • data : a subdirectory where you can put anything • Benefits include (network transfers only): • bagit.txt : a 2-line file declaring this is a bag • No need to stage extra data copy at either end BagIt credits and details • manifest-md5.txt : a list of files present • Cheap parallelism from fetching URLs in batches Authors from LC and CDL: Andy Boyko, John Kunze, Justin Littman, Liz Madden, Brian Vargas MyFirstBag/ manifest-md5.txt bagit.txt manifest-md5.txt Many thanks to: Stephen Abrams, Mike Ashenfelder, Scott manifest-md5.txt Fisher, Erik Hetzner, Keith Johnson, David Loy, Tracy | manifest-md5.txt fetch.txt bagit.txt fetch.txt Seneca, Mark Phillips, Adam Turoff, Jim Tuttle | bagit.txt data BagIt specification: | bag-info.txt bag-info.txt data http://www.ietf.org/internet-drafts/draft-kunze-bagit-03.txt http://www.cdlib.org/inside/diglib/bagit/bagitspec.html --- data/ data/ .mpg BagIt was informed by | | my.pdf my.pdf .pdf • Enclose-and-Deposit method, Tabata & Sugimoto, | | my.xsl my.xsl .xls • LC’s eDeposit Pilot and NDIIPP AIHT | my.mpg (http) • ARC/WARC aggregate file format | my.mpg Fig. 2. Payload directory, data,holds anything you want. Fig. 3. On the left is a bag received with missing files, or For further information “Tag” files (purple) describe the bag itself, including list of “holes”. The “fetch.txt” file lists URLs and corresponding Please contact jak@ucop.edu or stephen.abrams@ucop.edu files and checksums in the manifest (algorithms other than payload filenames that the receiver must fetch before For information on CDL’s Preservation Program, see md5 possible) and an optional “bag-info.txt” metadata file. declaring the bag on the right complete. http://www.cdlib.org/programs/digital_preservation.html