Origin: It started with a simple need
As the Library of Congress began to deal with increasing amounts of digital content,
they faced some issues:
     • How do they know what files they have and who they belong to?
     • How do they get files from where they are to where they need to be?

The Library of Congress Repository Development Center began working on a solution--
tools for transfer activities including:
     • Adding digital content to the collections (whether internal or external data)
     • Moving digital content between storage systems
     • Review of digital files for fixity, quality and/or authoritativeness
     • Inventorying and recording transfer life cycle events for digital files
Origin: It evolved naturally from that need
Here is what Leslie Johnston (Library of Congress contributor) and John Kunze (California Digital Library
co-creator) shared about the project’s origin:
Origin: But what is it exactly?
•   The name comes from the concept of “bag it and tag it”. BagIt allows for the transfer of digital
    files by packaging them into a digital “bag” that is accessible for the library to download.
•   A bag is like a folder or directory on a computer; it can hold documents, photos, movies, music,
    or even other folders.
•   Bags are composed of three main elements:
       1. A bag declaration text file (like a seal of authenticity)
       2. A text-file manifest (tag) listing the files in the collection
       3. A subdirectory filled with the digital content
•   A bag can also contain an optional text file with a small amount of administrative metadata (e.g.
    contact info for the collection owner and a description of the collection)
•   Once a bag is sent, the receiving computer can analyze the manifest and run checksums on the
    contents; if the checksums match (i.e. the files are unchanged), the transfer is successful.
•   It’s that simple!
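The three elements and the checksum comparison described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Library of Congress tooling: the helper names `make_bag` and `verify_bag` are hypothetical, and the choice of SHA-256 (with a manifest named `manifest-sha256.txt`) is an assumption made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_bag(bag_dir: Path, files: dict) -> None:
    """Create a bag from a {filename: bytes} mapping (hypothetical helper)."""
    # 3. The subdirectory holding the digital content.
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(files.items()):
        target = data_dir / name
        target.write_bytes(content)
        manifest_lines.append(f"{sha256_of(target)}  data/{name}")
    # 1. The bag declaration text file.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # 2. The manifest: one "checksum  path" line per content file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every checksum listed in the manifest and compare."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag_dir / rel_path) != expected:
            return False
    return True
```

Calling `make_bag(Path("mybag"), {"a.txt": b"hello"})` and then `verify_bag(Path("mybag"))` would be expected to return `True` until any file under `data/` is changed.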
Evolution: Community involvement
•   Working with John Kunze of the California Digital Library, Andy Boyko, Justin Littman, Liz Madden,
    and Brian Vargas of the Library of Congress produced a draft version of BagIt (initially referred to as the “LC
    Package Specification”) in December 2008.
•   This was posted on the LOC and California Digital Library sites and as an internet “Request for
    Comment” (RFC).
•   It was also promoted on blogs, in conference presentations, articles, etc. NDIIPP strongly
    encouraged partners to “bag” their content for transfer.
•   Through the process, project managers began learning what was still missing and where the
    specification needed clarification.
•   The team then launched a Digital Curation Google group to support the activities of this
    participatory community and encourage open, public discussion.
•   BagIt is now on version 0.97, having undergone several iterative revisions (6 drafts to date).
Evolution: Tools
•   BagIt was intended to be simple enough for users to work with directly. However, the community increasingly
    began to request tools to help with the use of BagIt, as well as the source code so that they could develop
    their own further tools.
•   The LOC developed three initial scripts, key utilities for the movement and validation of bagged content, and
    released them through SourceForge on December 18, 2008 under a BSD license (a permissive open source license).
    These tools have been popular, with 4,617 downloads to date (31 this week).
       • The Parallel Retriever: automates the retrieval of remote resources such as web pages, files on an FTP
           server, or files on a network drive, and then wraps them into a package that meets the BagIt
           specification.
       • The Bag Validator Script: checks that a bag meets the standards of the specification (i.e. all files listed in
           the manifest are in the data directory, there are no files in the directory not in the manifest, and there
           are no duplicate entries in the manifest)
       • VerifyIt Script: verifies the checksums of files in a bag against the manifest each time the files are
           moved or copied.
•   They later released the BagIt Library (BIL) – a Java library to support key functionality such as creating,
    manipulating, validating, and verifying Bags, and reading from and writing to a number of formats.
•   A client-side Bagger application was also underway in 2009. Bagger is intended to provide a graphical desktop
    for the Bagging of content, and ideally will require no client-side IT support or infrastructure.
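The three checks attributed to the Bag Validator Script above (every listed file exists, no unlisted files, no duplicate entries) can be sketched as follows. This is an illustrative standard-library version, not the LOC script itself; the function name `check_manifest` and the default manifest name `manifest-sha256.txt` are assumptions.

```python
from pathlib import Path


def check_manifest(bag_dir: Path, manifest_name: str = "manifest-sha256.txt") -> list:
    """Return a list of problems; an empty list means all three checks passed."""
    problems = []

    # Collect the payload paths listed in the manifest ("checksum  path" lines).
    listed = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if line.strip():
            _checksum, rel_path = line.split(maxsplit=1)
            listed.append(rel_path)

    # Check: no duplicate entries in the manifest.
    seen = set()
    for rel_path in listed:
        if rel_path in seen:
            problems.append(f"duplicate manifest entry: {rel_path}")
        seen.add(rel_path)

    # Collect the files actually present under data/, as bag-relative paths.
    data_dir = bag_dir / "data"
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in data_dir.rglob("*")
        if p.is_file()
    }

    # Check: every file listed in the manifest is in the data directory.
    for rel_path in sorted(seen - on_disk):
        problems.append(f"listed but missing: {rel_path}")

    # Check: no file in the data directory is absent from the manifest.
    for rel_path in sorted(on_disk - seen):
        problems.append(f"present but not listed: {rel_path}")

    return problems
```

Returning a list of problems rather than a single boolean is a design choice for the sketch: it lets the caller report every discrepancy in one pass.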
Evolution: Adaptations
The BagIt tool set became the LOC’s first open source software release. Since then, several BagIt-specific
tools have been created to simplify the process in a range of programming environments (it was originally
designed for use with Unix utilities):

      •   Python BagIt Library– at least two recent versions exist, one by Andrew
          Hankinson (https://github.com/ahankinson/bagit) and the other by Ed Summers
          (https://github.com/edsu/bagit). These libraries can be used to create BagIt-style packages
          programmatically in Python or from the command line.
      •   Drupal– Mark Jordan developed a Drupal module for BagIt (http://drupal.org/project/bagit).
      •   Ruby– Francesco Lazzarino at the Florida Center for Library Automation developed a Ruby
          adaptation for BagIt (https://github.com/tipr/bagit).
      •   PHP– A PHP implementation of BagIt was created by Wayne Graham and Mark Jordan
          (https://github.com/scholarslab/BagItPHP).
      •   RESTful Bag Storage Proposal- Chris Adams developed this draft protocol for serving BagIt
          repositories RESTfully (https://github.com/acdha/restful-bag-server).
Practicalities: Where does BagIt fit?
“Why are such transfer tools and processes so important? Transfer processes are not surprisingly
linked with preservation, as the tasks performed during the transfer of files must follow a
documented workflow and be recorded in order to mitigate preservation risks... While initial
interest in this problem space came from the need to better manage transfers from external
partners to the Library, the transfer and transport of files within the organization for the purpose
of archiving, transformation, and delivery is an increasingly large part of daily operations. The
digitization of an item can create one or hundreds of files, each of which might have many
derivative versions, and which might reside in multiple locations simultaneously to serve different
purposes. Developing tools to manage such transfer tasks reduce the number of tasks performed
and tracked by humans, and automatically provides for the validation and verification of files with
each transfer event.”

-- from “Releasing Open Source at the Library of Congress” by Leslie Johnston
Practicalities: What’s so special about BagIt?
•   Bags are uncomplicated, and are therefore able to transcend differences in institutional
    data, data architecture, formats and practices.
•   Bags have built-in inventory checking (validation) to help ensure that the content is
    transferred unchanged and fully intact.
•   Unlike archive formats such as zip or tar, BagIt does not require special software to extract
    the files.
•   Zip and tar also condense all the included files into a single archive file, whereas BagIt
    creates a logical package in which files keep their individuality and are simply stored in an
    ordinary folder or directory.
•   There is no limit to the number / type of files that can be transferred through the use of BagIt.
•   Bags are flexible and can work in many different settings– including situations when the
    content is located in many different places.
•   A bag’s metadata is machine readable, meaning that data can be ingested automatically.
•   Bags can be used over computer networks or through the use of portable storage devices.
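To illustrate what “machine readable” means for a bag’s metadata, here is a small sketch that parses the optional metadata text file into a dictionary. It assumes the file uses “Label: value” lines with indented continuation lines; the function name `parse_bag_info` and the sample values are hypothetical.

```python
def parse_bag_info(text: str) -> dict:
    """Parse 'Label: value' lines; indented lines continue the previous value.

    Simplification: if a label repeats, the last value wins.
    """
    info = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # A line starting with whitespace continues the previous value.
        if line[0] in " \t" and last_label is not None:
            info[last_label] += " " + line.strip()
            continue
        # Split on the first colon only, so values may contain colons.
        label, _, value = line.partition(":")
        last_label = label.strip()
        info[last_label] = value.strip()
    return info


sample = (
    "Source-Organization: Example Library\n"
    "Contact-Email: curator@example.org\n"
    "External-Description: A small collection of\n"
    "  scanned maps.\n"
)
metadata = parse_bag_info(sample)
```

An ingest script could then read fields such as `metadata["Contact-Email"]` directly instead of relying on a person to open the file.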
Practicalities: Who Is Using BagIt?
•   As of 2009, a significant percentage of the 130 NDIIPP partners were already utilizing the BagIt
    specification in their preservation transfers to the Library.

•   A few of the organizations that are using BagIt include:
       The University of Virginia Libraries
       The Stanford Digital Repository
       Archivematica
       Ghent University Library
       The Dryad Data Repository
       The University of North Texas
       Central Connecticut State University
       Towards Interoperable Preservation Repositories (including the Florida Center for Library
          Automation, Cornell University, and New York University)
Practicalities: BagIt Usage Highlights
• The Stanford Digital Repository: Having had success using BagIt to move geospatial data from the National Geospatial Digital
  Archive project from Stanford to the Library of Congress, they settled on BagIt as the primary transfer format for content being
  deposited into their repository (ingest stage of OAIS) (http://www.dlib.org/dlib/september10/cramer/09cramer.html).

• Ghent University Library: They currently use BagIt as an archival format for their digital collections. They also use it as an
  interchange format for the addition of new external collections (e.g. Google Books) to the local repositories.
  http://www.slideshare.net/hochstenbach/grep-ghent-university-repository

• The Dryad Data Repository: This repository of data underlying scientific publications is using the BagIt specification to share
  data and related metadata with TreeBASE, a repository of phylogenetic information.
  http://wiki.datadryad.org/BagIt_Handshaking

• Towards Interoperable Preservation Repositories (TIPR): A partnership between the Florida Center for Library Automation,
  Cornell University, and New York University to develop, test and promote a standard interchange format for exchanging
  information packages among OAIS-based repositories. The proposed format uses the BagIt specification to exchange
  package bundles via HTTP. (http://wiki.fcla.edu:8000/TIPR); (https://github.com/tipr/bagit/)
The Process: Tutorials
•   The North Carolina State Archives has provided a set of 10 thorough tutorials to explain the
    BagIt process. The first video includes a summary of the steps involved; the second set
    explains the installation process; and the third details creation and verification step-by-step:
    http://www.youtube.com/playlist?list=PL1763D432BE25663D&feature=plcp

•   The NDIIPP-funded GeoMAPP project has published a BagIt User Guide that can be found at:
    http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf

•   The Library of Congress NDIIPP Partner Tools and Services Inventory page includes a brief
    description of BagIt, a PDF of the latest version of the BagIt specification, links to some of the
    BagIt tools, and a brief video demonstrating the BagIt process:
    http://www.digitalpreservation.gov/partners/resources/tools/index.html#b
Four Steps to use BagIt
  The process is as simple as 1, 2, 3, 4…




  1. Prepare Files for Transfer
  2. Create & Verify Bag
  3. Copy & Verify Bag
  4. Extract Files for Use
Image courtesy of the GeoMapp.net BagIt Guide
http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf
Prepare files for transfer

• A bag must have three things– a bag declaration, a list of the content files
  (manifest), and the content itself
• Validate content and metadata
• Perform virus check (suggested)
Create and verify the bag


•   Attach portable drive to computer (or use shared drive)
•   Create a new folder to serve as the holding place for your bag
•   Use the “BagIt” command to create the bag on this drive
•   Verify the bag by using the “verifyvalid” command
Copy and Verify the bag


• Copy the bag to a staging area
• Validate the received bag
• Run virus check software on the bag
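The copy-then-validate step can be sketched in Python as below. This is an assumption-laden illustration rather than the BagIt command-line tools: the helper names `manifest_mismatches` and `transfer_bag` are hypothetical, the manifest is assumed to be `manifest-sha256.txt`, and the virus check is left out.

```python
import hashlib
import shutil
from pathlib import Path


def manifest_mismatches(bag_dir: Path, manifest_name: str = "manifest-sha256.txt") -> list:
    """Return listed paths that are missing or whose checksum has changed."""
    bad = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            bad.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            bad.append(rel_path)
    return bad


def transfer_bag(source: Path, staging: Path) -> Path:
    """Copy the whole bag into the staging area, then validate the copy."""
    destination = staging / source.name
    shutil.copytree(source, destination)
    bad = manifest_mismatches(destination)
    if bad:
        raise ValueError(f"received bag failed verification: {bad}")
    return destination
```

Validating the copy rather than the source is the point of this step: it confirms that the files arrived unchanged in the staging area.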
Extract files for use



• Unpack the bag
• Your files are now ready for use!
Challenges: Limiting Usage Factors
•   Lack of information: The LOC website contains little information beyond its brief
    three-minute video and short printed description, and it is hard to find much more
    from outside online sources. Further example implementations would help readers
    understand how BagIt can be used and what its advantages are over other formats
    such as zip files.

•   Learning curve: Most of the documentation is written in complicated language that
    the average person would not find easy to understand. BagIt doesn’t currently have an
    easy-to-use GUI to make the process simple for non-technical users. Bagger
    may help with this, but there is little information available about the Bagger
    interface.
    And that concludes our tour
    of BagIt.
    Any questions?
Additional Sources
"BagIt File Packaging Format." IETF Documents. Internet Engineering Task Force, 15 Apr 2011. Web. 1 Apr 2012.
              <http://tools.ietf.org/html/draft-kunze-bagit-06>.

BagIt: Transferring Content for Digital Preservation. 2009. video. The Library of Congress, Washington, DC.
              Web. 1 Apr 2012. <http://www.digitalpreservation.gov/multimedia/videos/bagit0609.html>.

Johnston, Leslie. "Releasing Open Source at the Library of Congress." OCLC Systems & Services: International Digital
             Library Perspectives. 26.2 (2010): 94-102.

Johnston, Leslie, and John Kunze. "BagIt funding and versions." 29 Mar 2012. N.p., Online Posting to Digital Curation
             Google Group. Web. 1 Apr. 2012.
             <http://groups.google.com/group/digital-curation/browse_thread/thread/ace8eafae819762b?pli=1>.

Lavoie, Brian. "The Open Archival Information System Reference Model: Introductory Guide." Technology Watch
              Report. 04-01 (2004).

Lazorchak, Butch. "From There to Here, from Here to There, Digital Content is Everywhere!." The Signal: Digital
             Preservation. The Library of Congress, 3 Jan 2012. Web. 1 Apr 2012.
             <http://blogs.loc.gov/digitalpreservation/2012/01/from-there-to-here-from-here-to-there-digital-content-is-everywhere/>.

Willett, Perry. "BagIt File Packaging Format." California Digital Library, 10 Feb 2012. Web. 1 Apr 2012.
               <https://wiki.ucop.edu/display/Curation/BagIt>.

BatIg

  • 1.
  • 2. Origin: It started with a simple need As the Library of Congress began to deal with increasing amounts of digital content, they faced some issues: • How do they know what files they have and who they belong to? • How do they get files from where they are to where they need to be? The Library of Congress Repository Development Center began working on a solution-- tools for transfer activities including: • Adding digital content to the collections (whether internal or external data) • Moving digital content between storage systems • Review of digital files for fixity, quality and/or authoritativeness • Inventorying and recording transfer life cycle events for digital files
  • 3. Origin: It evolved naturally from that need Here is what Leslie Johnson (Library of Congress contributor) and John Kunze (California Digital Library co-creator) shared about the project’s origin:
  • 4. Origin: But what is it exactly? • The name comes from the concept of "bag it and tag it”. BagIt allows for the transfer of digital files by packaging them into a digital “bag” that is accessible for the library to download. • A bag is like a folder or directory on a computer; it can hold documents, photos, movies, music, or even other folders. • Bags are comprised of three main elements: 1. A bag declaration text file (like a seal of authenticity) 2. A text-file manifest (tag) listing the files in the collection 3. A subdirectory filled with the digital content • A bag can also contain an optional text file with a small amount of administrative metadata (e.g. contact info for the collection owner and a description of the collection) • Once a bag is sent, the receiving computer can analyze the manifest and run checksums on the contents; if the checksums match (i.e. the files are unchanged), the transfer is successful. • It’s that simple!
  • 5. Evolution: Community involvement • Working with John Kunze of the California Digital Library, Andy Boyko, Justin Littman, Liz Madden, and Brian Vargas of the Library produced draft version of BagIt (initially referred to as the “LC Package Specification”) in December 2008. • This was posted on the LOC and California Digital Library sites and as an internet “Request for Comment” (RFC). • It was also promoted on blogs, in conference presentations, articles, etc. NDIPP strongly encouraged partners to “bag” their content for transfer. • Through the process, project managers began learning what was still missing and where the specification needed clarification. • The team then launched a Digital Curation Google group to support the activities of this participatory community and encourage open, public discussion. • BagIt is now on version 0.97, having undergone several iterative revisions (6 drafts to date).
  • 6. Evolution: Tools
      • BagIt was intended to be simple enough for users to work with directly. However, the community increasingly requested tools to help with the use of BagIt, as well as the source code so that they could develop further tools of their own.
      • The LOC developed three initial scripts, key utilities for the movement and validation of bagged content, and released them through SourceForge on December 18, 2008 under a BSD license (an open-source license). These tools have been rather popular, with 4,617 downloads to date (31 this week):
          • The Parallel Retriever: automates the retrieval of remote resources such as web pages, files on an FTP server, or files on a network drive, and then wraps them into a package that meets the BagIt specification.
          • The Bag Validator Script: checks that a bag meets the standards of the specification (i.e. all files listed in the manifest are in the data directory, there are no files in the directory that are not in the manifest, and there are no duplicate entries in the manifest).
          • VerifyIt Script: verifies the checksums of files in a bag against the manifest each time the files are moved or copied.
      • They later released the BagIt Library (BIL), a Java library that supports key functionality such as creating, manipulating, validating, and verifying bags, and reading from and writing to a number of formats.
      • A client-side Bagger application was also underway in 2009. Bagger is intended to provide a graphical desktop application for bagging content, and ideally will require no client-side IT support or infrastructure.
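The three checks attributed to the Bag Validator Script (listed files present, no unlisted files, no duplicate entries) can be approximated as follows. This is a rough standard-library sketch and not the LOC script itself; the function name `check_completeness`, the report keys, and the deliberately broken sample bag are all hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def check_completeness(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> dict[str, list[str]]:
    """Compare manifest entries with the files actually present under data/."""
    listed = []
    for line in (bag_dir / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            # Each entry is "<checksum> <path>"; split once on whitespace.
            _checksum, path = line.split(None, 1)
            listed.append(path.strip())

    present = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    counts = Counter(listed)
    return {
        "missing_from_data": sorted(set(listed) - present),
        "not_in_manifest": sorted(present - set(listed)),
        "duplicate_entries": sorted(path for path, n in counts.items() if n > 1),
    }


# Build a deliberately broken bag so that all three checks report something.
bag = Path(tempfile.mkdtemp())
(bag / "data").mkdir()
(bag / "data" / "a.txt").write_text("a", encoding="utf-8")
(bag / "data" / "stray.txt").write_text("s", encoding="utf-8")
placeholder = "0" * 32  # checksum values are not examined by this check
(bag / "manifest-md5.txt").write_text(
    f"{placeholder}  data/a.txt\n"
    f"{placeholder}  data/a.txt\n"
    f"{placeholder}  data/b.txt\n",
    encoding="utf-8",
)
report = check_completeness(bag)
print(report)
```

Note that this only compares file lists; checking that the contents are unchanged is the separate checksum step that the slide attributes to the VerifyIt script.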
  • 7. Evolution: Adaptations
      The BagIt tool set became the LOC’s first open source software release. Since then, several BagIt-specific tools have been created to simplify the process in several programming environments (it was originally designed for use with Unix utilities):
      • Python BagIt Library: at least two recent versions exist, one by Andrew Hankinson (https://github.com/ahankinson/bagit) and the other by Ed Summers (https://github.com/edsu/bagit). These libraries can be used to create BagIt-style packages programmatically in Python or from the command line.
      • Drupal: Mark Jordan developed a Drupal module for BagIt (http://drupal.org/project/bagit).
      • Ruby: Francesco Lazzarino at the Florida Center for Library Automation developed a Ruby adaptation for BagIt (https://github.com/tipr/bagit).
      • PHP: A PHP implementation of BagIt was created by Wayne Graham and Mark Jordan (https://github.com/scholarslab/BagItPHP).
      • RESTful Bag Storage Proposal: Chris Adams developed this draft protocol for serving BagIt repositories RESTfully (https://github.com/acdha/restful-bag-server).
  • 8. Practicalities: Where does BagIt fit?
      “Why are such transfer tools and processes so important? Transfer processes are not surprisingly linked with preservation, as the tasks performed during the transfer of files must follow a documented workflow and be recorded in order to mitigate preservation risks... While initial interest in this problem space came from the need to better manage transfers from external partners to the Library, the transfer and transport of files within the organization for the purpose of archiving, transformation, and delivery is an increasingly large part of daily operations. The digitization of an item can create one or hundreds of files, each of which might have many derivative versions, and which might reside in multiple locations simultaneously to serve different purposes. Developing tools to manage such transfer tasks reduce the number of tasks performed and tracked by humans, and automatically provides for the validation and verification of files with each transfer event.”
      -- from “Releasing Open Source at the Library of Congress” by Leslie Johnston
  • 9. Practicalities: What’s so special about BagIt?
      • Bags are uncomplicated, and are therefore able to transcend differences in institutional data, data architecture, formats and practices.
      • Bags have built-in inventory checking (validation) to help ensure that the content is transferred unchanged and fully intact.
      • Unlike packaging tools such as zip or tar, BagIt does not require special software to extract the files.
      • In zip and tar, all included files are condensed into a single archive file. BagIt instead creates a logical package in which files remain individual and are simply stored in an ordinary folder or directory.
      • There is no limit to the number or type of files that can be transferred through the use of BagIt.
      • Bags are flexible and can work in many different settings, including situations where the content is located in many different places.
      • A bag’s metadata is machine readable, meaning that data can be ingested automatically.
      • Bags can be used over computer networks or through the use of portable storage devices.
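To illustrate what “machine readable” can mean in practice, here is a small sketch that reads the optional metadata file. It assumes the `bag-info.txt` layout described in the BagIt specification (one `Label: value` pair per line, with indented continuation lines); the parser name `parse_bag_info` and the sample values are hypothetical, and a production parser would need to handle more edge cases.

```python
def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Collect 'Label: value' pairs; a label may legitimately repeat."""
    fields: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Indented line: continuation of the previous value.
            fields[last_label][-1] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        last_label = label.strip()
        fields.setdefault(last_label, []).append(value.strip())
    return fields


# Hypothetical metadata for illustration only.
sample = (
    "Source-Organization: Example University Library\n"
    "Contact-Name: A. Archivist\n"
    "External-Description: Scanned maps from the\n"
    "  county survey collection\n"
)
info = parse_bag_info(sample)
print(info["Contact-Name"][0])
```

Because the file is plain text with a predictable shape, a receiving repository could read the owner and description during ingest without any human retyping them.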
  • 10. Practicalities: Who Is Using BagIt?
      • As of 2009, a significant percentage of the 130 NDIIPP partners were already utilizing the BagIt specification in their preservation transfers to the Library.
      • A few of the organizations that are using BagIt include:
          • The University of Virginia Libraries
          • The Stanford Digital Repository
          • Archivematica
          • Ghent University Library
          • The Dryad Data Repository
          • The University of North Texas
          • Central Connecticut State University
          • Towards Interoperable Preservation Repositories (including the Florida Center for Library Automation, Cornell University, and New York University)
  • 11. Practicalities: BagIt Usage Highlights
      • The Stanford Digital Repository: Having had success using BagIt to move geospatial data from the National Geospatial Digital Archive project from Stanford to the Library of Congress, they settled on BagIt as the primary transfer format for content being deposited into their repository (the ingest stage of OAIS) (http://www.dlib.org/dlib/september10/cramer/09cramer.html).
      • Ghent University Library: They currently use BagIt as an archival format for their digital collections. They also use it as an interchange format for adding new external collections (e.g. Google Books) to the local repositories (http://www.slideshare.net/hochstenbach/grep-ghent-university-repository).
      • The Dryad Data Repository: A repository of data underlying scientific publications, Dryad is using the BagIt specification to share data and related metadata with TreeBASE, a repository of phylogenetic information (http://wiki.datadryad.org/BagIt_Handshaking).
      • Towards Interoperable Preservation Repositories (TIPR): A partnership between the Florida Center for Library Automation, Cornell University, and New York University to develop, test and promote a standard interchange format for exchanging information packages among OAIS-based repositories. The proposed format uses the BagIt specification to exchange package bundles via HTTP (http://wiki.fcla.edu:8000/TIPR); (https://github.com/tipr/bagit/).
  • 12. The Process: Tutorials
      • The North Carolina State Archives has provided a set of 10 thorough tutorials to explain the BagIt process. The first video includes a summary of the steps involved; the second set explains the installation process; and the third details creation and verification step-by-step: http://www.youtube.com/playlist?list=PL1763D432BE25663D&feature=plcp
      • The NDIIPP-funded GeoMAPP project has published a BagIt User Guide that can be found at: http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf
      • The Library of Congress NDIIPP Partner Tools and Services Inventory page includes a brief description of BagIt, a PDF of the latest version of the BagIt specification, links to some of the BagIt tools, and a brief video demonstrating the BagIt process: http://www.digitalpreservation.gov/partners/resources/tools/index.html#b
  • 13. Four Steps to use BagIt
      The process is as simple as 1, 2, 3, 4…
          1. Prepare Files for Transfer
          2. Create & Verify Bag
          3. Copy & Verify Bag
          4. Extract Files for Use
  • 14. Image courtesy of the GeoMapp.net BagIt Guide http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf
  • 15. Prepare files for transfer
      • A bag must have three things: a bag declaration, a list of the content files (manifest), and the content itself
      • Validate content and metadata
      • Perform virus check (suggested)
  • 16. Create and verify the bag
      • Attach portable drive to computer (or use shared drive)
      • Create a new folder to serve as the holding place for your bag
      • Use the “BagIt” command to create the bag on this drive
      • Verify the bag by using the “verifyvalid” command
  • 17. Copy and Verify the bag
      • Copy the bag to a staging area
      • Validate the received bag
      • Run virus check software on the bag
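The copy-then-verify step can be sketched with the Python standard library as below. This is an illustration of the idea (recompute each checksum after the copy and compare it with the manifest), not the actual behaviour of the “verifyvalid” command; the helper names `file_md5` and `failed_fixity`, the temporary paths, and the simulated corruption are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in chunks so large payloads do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(bag_dir: Path) -> list[str]:
    """Return payload paths whose current checksum differs from the manifest."""
    failures = []
    manifest = bag_dir / "manifest-md5.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        target = bag_dir / rel_path.strip()
        if not target.is_file() or file_md5(target) != expected.lower():
            failures.append(rel_path.strip())
    return failures


# Source bag with one payload file and a matching manifest.
source = Path(tempfile.mkdtemp()) / "bag"
(source / "data").mkdir(parents=True)
(source / "data" / "map.txt").write_bytes(b"geospatial payload")
(source / "manifest-md5.txt").write_text(
    f"{file_md5(source / 'data' / 'map.txt')}  data/map.txt\n", encoding="utf-8"
)

# Copy the bag to a staging area, then verify the received copy.
staged = Path(tempfile.mkdtemp()) / "bag"
shutil.copytree(source, staged)
clean_result = failed_fixity(staged)

# Simulate damage in transit and verify again.
(staged / "data" / "map.txt").write_bytes(b"geospatial payl0ad")
corrupt_result = failed_fixity(staged)
print(clean_result, corrupt_result)
```

An empty list means every listed file arrived unchanged; any path in the list is a file that should be copied again before the bag is unpacked.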
  • 18. Extract files for use
      • Unpack the bag
      • Your files are now ready for use!
  • 19. Challenges: Limiting Usage Factors
      • Lack of information: The LOC website contains little information beyond its brief 3-minute video and short printed description, and it is hard to find much more from outside online sources. Further example implementations would help show how BagIt can be used and what its advantages are over other formats such as zip files.
      • Learning curve: Most of the documentation is written in complicated language that the average person would not find easy to understand. BagIt doesn’t currently have an easy-to-use GUI to make the process simple for non-technical users. Bagger may help with this, but there is little information available about the Bagger interface.
  • 20. And that concludes our tour of BagIt… Any questions?
  • 21. Additional Sources
      • "BagIt File Packaging Format." IETF Documents. Internet Engineering Task Force, 15 Apr 2011. Web. 1 Apr 2012. <http://tools.ietf.org/html/draft-kunze-bagit-06>.
      • BagIt: Transferring Content for Digital Preservation. 2009. Video. The Library of Congress, Washington, DC. Web. 1 Apr 2012. <http://www.digitalpreservation.gov/multimedia/videos/bagit0609.html>.
      • Johnston, Leslie. "Releasing Open Source at the Library of Congress." OCLC Systems & Services: International Digital Library Perspectives. 26.2 (2010): 94-102.
      • Johnston, Leslie, and John Kunze. "BagIt funding and versions." 29 Mar 2012. N.p., Online Posting to Digital Curation Google Group. Web. 1 Apr 2012. <http://groups.google.com/group/digital-curation/browse_thread/thread/ace8eafae819762b?pli=1>.
      • Lavoie, Brian. "The Open Archival Information System Reference Model: Introductory Guide." Technology Watch Report. 04-01 (2004).
      • Lazorchak, Butch. "From There to Here, from Here to There, Digital Content is Everywhere!" The Signal: Digital Preservation. The Library of Congress, 3 Jan 2012. Web. 1 Apr 2012. <http://blogs.loc.gov/digitalpreservation/2012/01/from-there-to-here-from-here-to-there-digital-content-is-everywhere/>.
      • Willett, Perry. "BagIt File Packaging Format." California Digital Library, 10 Feb 2012. Web. 1 Apr 2012. <https://wiki.ucop.edu/display/Curation/BagIt>.

Editor's Notes

  1. Ingest: in practice, BagIt might be used to send packets of information to a digital preservation repository (as part of an AIP packet)