Transfers go in, not SIPs
•   Verify transfer compliance
•   Rename with transfer UUID
•   Assign file UUIDs to objects
•   Assign checksums and file sizes to objects
•   Verify metadata directory checksums
•   Generate METS.xml document
•   Extract packages
•   Scan for viruses
•   Sanitize object's file and directory names
•   Sanitize transfer name
•   Characterize and extract metadata
•   Create SIP(s)
Transfer directory structure (0.8)
   Transfer_Name
          /logs (only for logs created by Archivematica)
          /metadata
                   checksum.md5 (must have this name)
                   /submissionDocumentation (optional)
                            info_about_digitization.xls
          /objects
                   example.MKV
                   transfer_name.csv
                   /access (optional)
                            example.MP4 (must have same name)
                            transfer_name.csv
         processingMCP.xml
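
The layout above can be checked before a transfer is launched. The following is a minimal sketch, not part of Archivematica: the function name check_transfer_layout is hypothetical, and it assumes that every directory not marked "optional" on the slide is expected and that "same name" for access files means the same base name without the extension.

```python
from pathlib import Path


def check_transfer_layout(transfer_dir):
    """Return a list of problems found in a transfer directory (empty if none)."""
    root = Path(transfer_dir)
    problems = []

    # Assumption: every directory not marked "optional" on the slide is expected.
    for name in ("logs", "metadata", "objects"):
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")

    # The slide says the checksum file must be named exactly checksum.md5.
    metadata = root / "metadata"
    if metadata.is_dir():
        for path in sorted(metadata.glob("*.md5")):
            if path.name != "checksum.md5":
                problems.append(f"checksum file must be named checksum.md5: {path.name}")

    # Each access copy needs a master in objects/ with the same base name.
    # The csv that sits alongside the files is skipped.
    objects = root / "objects"
    access = objects / "access"
    if access.is_dir():
        masters = {p.stem for p in objects.iterdir() if p.is_file()}
        for path in sorted(access.iterdir()):
            if path.is_file() and path.suffix.lower() != ".csv" and path.stem not in masters:
                problems.append(f"access file has no matching master: {path.name}")

    return problems
```

An empty list means none of these checks failed; it does not guarantee that the transfer will pass Archivematica's own compliance step.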
Digitization metadata: excerpt
Ingest Planning: organization

• Decide how to organize transfer contents:
  logical or arbitrary groupings?
• Don’t make AIPs too big!
• Organize files into single transfers – we
  did this on a network drive
Ingest Planning: organization




• TIFFs: arbitrary blocks of ~1,000 for migration; prob. logical groups
• Audio & video: fewer files, fewer errors, easier to organize by fonds or series
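
Splitting a large set of TIFFs into arbitrary blocks can be sketched as below. This is only an illustration: the function name split_into_blocks is hypothetical, the block size of 1,000 comes from the slide, and sorting the names first is an assumption made so that the blocks are reproducible.

```python
def split_into_blocks(filenames, block_size=1000):
    """Split filenames into consecutive blocks of at most block_size each."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    ordered = sorted(filenames)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
```

Each block would then become the contents of one transfer's objects directory.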
Ingest Planning: Quality
• Need time for quality control of digital
  objects and metadata before ingest
• Set aside problem or “do later” files, such
  as those requiring rescanning
Ingest Planning: Quality
• Check metadata completeness &
  accuracy, image quality: no garbage in!
• Started with export from current db:
  current workflow is to only ingest items
  already described to item level
• One item #, one file (and no more)
• Filenames must match the item # and the
  ingested csv exactly: spaces, capitalization,
  etc., including extensions
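
The filename check in the last bullet can be automated. The sketch below is an assumption-laden illustration: the function name find_filename_mismatches and the csv column name "filename" are hypothetical, and the csv is assumed to be UTF-8 with a header row. The comparison is deliberately exact, so differences in spaces, capitalization or extension are reported.

```python
import csv
from pathlib import Path


def find_filename_mismatches(objects_dir, csv_path, column="filename"):
    """Compare filenames on disk with those listed in the csv, exactly as written.

    Returns (listed_but_missing, present_but_unlisted) as two sorted lists.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        listed = {row[column] for row in csv.DictReader(handle)}

    # The csv itself may sit in the objects directory, so it is not counted.
    csv_name = Path(csv_path).name
    on_disk = {
        p.name for p in Path(objects_dir).iterdir()
        if p.is_file() and p.name != csv_name
    }
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

Two empty lists mean every listed name has a file and every file is listed.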
Ingest Planning: Quality

• Using custom MS Access form, volunteers
  inspected master & derivative images next
  to their descriptions to ensure correct
  image
• Image sizes double-checked in case any
  were too small
Ingest Planning: TIFFs
• Separate files not to be
  ingested and those to be
  ingested as sub-items
• Not for ingest: not owned
  by us, should never have
  been digitized; or need
  descriptions
• Sub-items: multi-page,
  need a procedure first
Assemble transfer objects: no access files
Assemble transfer objects: have access files
Tracking transfers
Configure Archivematica

• Workflow(s): configure for each transfer unless
  using default
• AIP compression: will be the same for all
  transfers processing at the same time
• Normalization choices
Workflow: processingMCP.xml
• Overrides default processing xml files
  used to process born-digital materials
• Can customize to make the processing
  faster
• Must always have exactly this name, even
  if contents vary
Workflow: processingMCP.xml
          Location

 Transfer_Name
          /logs
          /metadata
          /objects
      processingMCP.xml
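
Because the file must always be called processingMCP.xml and sit at the top level of the transfer, one way to manage several workflow variants is to keep them under descriptive names elsewhere and copy the chosen one into place. A minimal sketch, assuming that approach; the function name install_processing_config and any source file names are hypothetical.

```python
import shutil
from pathlib import Path

REQUIRED_NAME = "processingMCP.xml"


def install_processing_config(source_xml, transfer_dir):
    """Copy a saved workflow configuration into a transfer under the required name."""
    source = Path(source_xml)
    target_dir = Path(transfer_dir)
    if not source.is_file():
        raise FileNotFoundError(source)
    if not target_dir.is_dir():
        raise NotADirectoryError(target_dir)
    # The file goes at the top level of the transfer, next to logs/, metadata/ and objects/.
    target = target_dir / REQUIRED_NAME
    shutil.copyfile(source, target)
    return target
```

The contents of the copied file are not inspected here; only the name and location are enforced.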
Workflow: processingMCP.xml
          Example
Workflow: processingMCP.xml
 Example. No normalization
Workflow: processingMCP.xml
Example. Access normalization
Setting AIP compression
Normalization:
Preservation planning tab
Normalization: TIF
convert "%fileFullName%" -sampling-factor 4:4:4 -quality 60 "%outputDirectory%%prefix%%fileName%%postfix%.jpg"
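
To see what such a command looks like once the %name% placeholders are filled in, the substitution can be imitated in a few lines. This is an illustrative sketch only, not Archivematica's own code: the function name fill_placeholders and the example paths are hypothetical, and the output path is assumed to be wrapped in double quotes like the input path.

```python
import re
import shlex

TEMPLATE = (
    'convert "%fileFullName%" -sampling-factor 4:4:4 -quality 60 '
    '"%outputDirectory%%prefix%%fileName%%postfix%.jpg"'
)


def fill_placeholders(template, values):
    """Replace each %name% placeholder with values[name] in a single pass.

    Raises KeyError if the template uses a placeholder missing from values.
    """
    return re.sub(r"%(\w+)%", lambda match: values[match.group(1)], template)


# Hypothetical values for one TIFF; prefix and postfix are left empty here.
command = fill_placeholders(TEMPLATE, {
    "fileFullName": "/data/in/item 0001.tif",
    "outputDirectory": "/data/out/",
    "prefix": "",
    "fileName": "item 0001",
    "postfix": "",
})

# shlex.split shows the arguments a shell would pass to ImageMagick.
argv = shlex.split(command)
```

The quotes around both paths are what keep a filename containing spaces together as a single argument.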
Load Transfers
Structured directory on server
Staging area for transfers
Launching transfers
Monitoring using htop
Transfer failed: X marks it
Failed microservice: what went wrong?
Failed microservice message
Reporting: microservices
Reporting: Compress AIP




           Greenwich Mean Time
Upload DIP directory
• As they are created, DIPs appear here
Upload DIP directory
• DIP objects in “objects” folder
Upload DIP directory
• Inside objects folder, each DIP object with
  new UUID
Completed AIPs
Inside the AIP directory
Open Terminal to checksum AIP bag
Checksum AIP bag
Copy to external drive and then to AIPstore
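
With this much copying, comparing checksums before and after each copy is a cheap safeguard. The sketch below assumes the AIP is a single file (for example a compressed package) and uses SHA-256; both are assumptions, and the function names file_digest and copy_is_intact are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so large AIPs fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original, copy, algorithm="sha256"):
    """True if the original and the copy have the same digest."""
    return file_digest(original, algorithm) == file_digest(copy, algorithm)
```

The same check can be repeated after each hop: once on the external drive and again in the AIPstore.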
Exception: service copies
Exceptions: service directory
  Transfer_Name
           /logs
           /metadata
           /objects
                 /access
                 /service
       processingMCP.xml
Bare-metal Workflow: lots of copying!
[Diagram: Network storage (Digitized QA), External drive, External drive, Network storage (AIPstore)]
VM Workflow: less manual copying
[Diagram: Network storage (Digitized QA), automated, External drive, Network storage (AIPstore)]

Using Archivematica 0.8 for Digitized Content