With Expedited, Standard, and Bulk retrievals, you can leverage Amazon Glacier’s extremely low-cost storage service to support the full spectrum of archive use cases. These range from deep archives that are never retrieved to active workloads with minute-level access, such as media broadcasting, to petabyte-scale content distribution or big data analytics use cases. This session will dive deep into the recently launched retrieval features, review Amazon Glacier’s current feature set, and share use cases from customers leveraging Glacier’s latest features.
Learning Objectives:
• Dive deep on Amazon Glacier and the new retrieval features
• Learn about the benefits of Amazon Glacier and the new retrieval features
• Learn about the different use cases
• Learn how to get started using Amazon Glacier
2. Media Content Distribution – Sony DADC
Storing 20 PB and 1M+ hours of motion picture and television content, growing 1 PB per quarter
Single copy on Glacier
Over $10MM in savings
Replaced legacy tape solution
Higher performance, higher durability, lower cost
3. Patient data – Philips Healthcare
HealthSuite digital platform powered by AWS
15 PB of patient data
Archives patient records and medical images produced across over 1,500 hospitals
Securely stored for decades (lifetime of patients)
Uses HIPAA-eligible AWS services
4. Batches and Streams
Direct Connect
Snowball, Snowball Edge, Snowmobile
3rd Party Connectors
Transfer Acceleration
Storage Gateway
Kinesis Firehose
File: Amazon EFS
Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
Object: Amazon S3, Amazon Glacier
5. Data Storage Demand
Media assets, 4k, 8k
Healthcare/life sciences
Financial services
Regulated industries
Oil and gas/geospatial
Digital preservation
Long-term backups
Logs
Solution Requirements:
Secure and durable
Scalable
Cost-effective
Flexible data access
Compliant
6. Amazon Glacier
Flexible Data Access: three retrieval options from minutes to hours
Durable: 11 9s of durability (5 orders of magnitude better than 2 copies on tape)
Management Features: Vault Lock, Retrieval Policies, CloudTrail
Cost-Effective: starting at $0.004 per GB per month
Secure: all data encrypted at rest
Scalable: from gigabytes to exabytes
7. Amazon Glacier
Metered usage: pay as you go
No capital investment
No commitment
No risky capacity planning
Avoid risks of physical media handling
Control your geographic locality for performance and compliance
8. Key Terms and Concepts
Vaults – container for archives, up to 1,000 vaults per account
Archives – basic unit, write-once, 40 TB max, unlimited archives
Inventory – cold index of archives refreshed every 24 hours
1. Access – three ways to access Amazon Glacier
2. Uploads – multipart, lifecycle, cost optimizations, AWS Snowball
3. Data management – Vault Lock, tagging, audit logs
4. Retrievals – retrieval policies, range retrievals, new retrieval features
9. Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways
FastGlacier
10. Uploading data: Internet or sneaker-net
AWS Direct Connect: dedicated bandwidth between your site and AWS
Internet: transfer data in a secure SSL tunnel over the public Internet
Snowball, Snowball Edge, Snowmobile: physical transfer of media into and out of AWS
11. Uploading data: archive descriptions
Use the archive description field for metadata
If the local index is corrupted or destroyed, use the archive description to reconstruct critical mappings
For example, create an index entry, then add its primary key to the archive description on upload
Local Index Entry
Primary key: 12345
Description: 2014Audit
Dept: FinanceDept
ArchiveID: 9FG23…..
…..
UploadArchive(data, ArchiveDescription="12345,2014Audit,FinanceDept") -> Archive ID = 9FG23…..
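The mapping between a local index entry and an archive description can be sketched in Python. This is only an illustration: the helper names, the comma-separated layout, and the placeholder archive ID are assumptions, and the actual upload call is not shown.

```python
import csv
import io


def build_description(primary_key, description, dept):
    """Pack index fields into one comma-separated archive description."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow([primary_key, description, dept])
    return buf.getvalue()


def rebuild_index_entry(archive_id, archive_description):
    """Recover a local index entry from an archive ID and its description."""
    primary_key, description, dept = next(csv.reader([archive_description]))
    return {
        "primary_key": primary_key,
        "description": description,
        "dept": dept,
        "archive_id": archive_id,
    }


# Hypothetical values: the archive ID is a placeholder, not a real Glacier ID.
desc = build_description("12345", "2014Audit", "FinanceDept")
entry = rebuild_index_entry("example-archive-id", desc)
```

Using the csv module rather than a plain split keeps the round trip safe if a field ever contains a comma.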
12. Uploading data: optimizing costs
Every archive has 32 KB of associated overhead, and some operations are charged per request
For a 3.2 MB archive, overhead is about 1% of cost
For a 1 KB archive, about 97% of cost goes to overhead
Solution is aggregation: aim for archives of at least several MB
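The two percentages follow from dividing the 32 KB overhead by the total stored size. A minimal sketch of that arithmetic, covering storage overhead only (per-request charges are ignored) and assuming 1 MB = 1,000 KB:

```python
OVERHEAD_KB = 32  # per-archive overhead stated on the slide


def overhead_fraction(archive_kb):
    """Share of billed storage that is overhead for one archive."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


small = overhead_fraction(1)      # 1 KB archive: 32 / 33
large = overhead_fraction(3200)   # 3.2 MB archive: 32 / 3232

print(f"1 KB archive:   {small:.1%} overhead")
print(f"3.2 MB archive: {large:.1%} overhead")
```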
14. Best practices: multipart uploads
Improve throughput, reliability, and get idempotency
1. InitiateMultipartUpload(partSize) → uploadId
2. UploadPart(uploadId, data)
3. CompleteMultipartUpload(uploadId) → archiveId
Archive
Parallel Uploads
Parts
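Two pieces of client-side bookkeeping sit behind the three calls above: splitting the archive into byte ranges, and computing the SHA-256 tree hash that Glacier uses as a checksum. The sketch below shows both with hypothetical helper names; it makes no API calls, and the part size is assumed to be a valid one (a power-of-two multiple of 1 MiB).

```python
import hashlib

MIB = 1024 * 1024


def part_ranges(total_size, part_size):
    """Yield (start, end) inclusive byte offsets for each part."""
    for start in range(0, total_size, part_size):
        yield start, min(start + part_size, total_size) - 1


def tree_hash(data):
    """SHA-256 tree hash over 1 MiB chunks, returned as a hex string."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])  # an odd hash is carried up unchanged
        level = paired
    return level[0].hex()


data = bytes(3 * MIB)  # 3 MiB of zeros as stand-in archive content
ranges = list(part_ranges(len(data), 2 * MIB))
checksum = tree_hash(data)
```

Because each part has its own range, parts can be uploaded in parallel and retried individually.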
15. Amazon Glacier: Amazon S3 lifecycle policies
Seamlessly move data from Amazon S3 to Amazon Glacier
Automated lifecycle rules
Transition based on object age
16. Amazon Glacier: Amazon S3 lifecycle policies
Object-level tagging for S3
objects
Apply lifecycle rules based on
object tags
Example: transition objects to Amazon Glacier when they are 1 year old and have the object tags ‘Project=Delta’ and ‘Data type=HPI’.
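The tag-based example could be expressed as an S3 lifecycle rule roughly as follows. This is a sketch: the rule ID and helper name are hypothetical, "1 year" is taken as 365 days, and the dictionary shape follows the S3 lifecycle configuration format as I understand it, so check it against the current S3 documentation before use.

```python
def glacier_transition_rule(rule_id, days, tags):
    """Build one lifecycle rule that moves tagged objects to Glacier.

    Assumes two or more tags, which is why the filter uses "And".
    """
    tag_list = [{"Key": key, "Value": value} for key, value in tags.items()]
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"And": {"Tags": tag_list}},
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


rule = glacier_transition_rule(
    "archive-delta-hpi",  # hypothetical rule name
    365,
    {"Project": "Delta", "Data type": "HPI"},
)
lifecycle_configuration = {"Rules": [rule]}
```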
18. Management features: AWS CloudTrail
Enable AWS CloudTrail
in console
Control plane events:
vault activities
Data plane events:
archive activities
19. Management features: vault access policies
Manage access to a vault in a single location – single AWS Identity
and Access Management (IAM) policy
Grant/revoke access to internal business units/teams
“Marketing_Vault” has an access policy that is distinct from
“DevOps_Vault”
Easily manage cross-account access for your business partner
Simply add a section for your business partner in the same policy
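A vault access policy with one statement for an internal team and a second for a business partner might look like the sketch below. The account IDs, role name, region, and chosen actions are all placeholders for illustration.

```python
import json

# Hypothetical vault ARN; account ID and region are placeholders.
VAULT_ARN = "arn:aws:glacier:us-west-2:111122223333:vaults/Marketing_Vault"


def allow_statement(sid, principal_arns, actions):
    """Build one Allow statement scoped to the vault."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arns},
        "Action": actions,
        "Resource": [VAULT_ARN],
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        allow_statement(
            "marketing-team",
            ["arn:aws:iam::111122223333:role/MarketingArchivists"],
            ["glacier:UploadArchive", "glacier:InitiateJob", "glacier:GetJobOutput"],
        ),
        # The partner's section lives in the same policy document.
        allow_statement(
            "partner-read-only",
            ["arn:aws:iam::444455556666:root"],
            ["glacier:InitiateJob", "glacier:GetJobOutput"],
        ),
    ],
}
policy_json = json.dumps(policy, indent=2)
```

Revoking the partner's access then means deleting one statement rather than editing several user-level policies.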
20. Management features: Vault Lock
Non-overwrite, non-erasable records
Time-based retention with “ArchiveAgeInDays” control
Policy lockdown (strong governance)
Legal hold with vault-level tags
Configure optional designated third-party access and grant
temporary access
21. Vault Lock: two-step locking
InitiateVaultLock
Puts a retention policy into effect for testing (in-progress state)
Returns a unique lock ID (expires after 24 hours)
AbortVaultLock
Deletes an in-progress policy
Ability to modify a policy before locking it down
CompleteVaultLock
Locks down the vault with the appropriate lock ID
A Vault Lock policy cannot be aborted once locked
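The two-step flow can be sketched as a small function that takes any client object exposing the three operations. The method and parameter names follow boto3's Glacier client as I understand it; treat them as assumptions to verify. The stub client exists only so the sketch runs without AWS access.

```python
def lock_vault(client, vault_name, policy_json, approve):
    """Run the two-step Vault Lock flow; return True if the vault was locked."""
    response = client.initiate_vault_lock(
        accountId="-", vaultName=vault_name, policy={"Policy": policy_json}
    )
    lock_id = response["lockId"]  # valid for 24 hours per the slide
    if approve(vault_name):  # e.g. the result of testing the in-progress policy
        client.complete_vault_lock(
            accountId="-", vaultName=vault_name, lockId=lock_id
        )
        return True
    client.abort_vault_lock(accountId="-", vaultName=vault_name)
    return False


class RecordingClient:
    """Stand-in client that records which operations were called."""

    def __init__(self):
        self.calls = []

    def initiate_vault_lock(self, **kwargs):
        self.calls.append("initiate")
        return {"lockId": "example-lock-id"}

    def complete_vault_lock(self, **kwargs):
        self.calls.append("complete")

    def abort_vault_lock(self, **kwargs):
        self.calls.append("abort")


stub = RecordingClient()
locked = lock_vault(stub, "ExampleVault", "{}", approve=lambda name: False)
```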
22. Set up a legal hold tag
Configure a vault-level tag “LegalHold”
Set initial value to “False”
Add compliance control for legal hold in a vault lock policy
Deny delete archive operation
From anybody (root, administrators, users, business partners)
When LegalHold tag = “True”
Place or lift legal hold by updating the tag value
Legal hold with vault-level tags
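A Vault Lock policy combining time-based retention with the legal hold control might be built as below. This is a sketch under assumptions: the vault ARN and statement IDs are placeholders, and the condition keys `glacier:ArchiveAgeInDays` and `glacier:ResourceTag/LegalHold` should be checked against the Glacier documentation.

```python
import json


def vault_lock_policy(vault_arn, retention_days):
    """Deny archive deletion before retention expires or during a legal hold."""

    def deny_delete(sid, condition):
        return {
            "Sid": sid,
            "Principal": "*",  # applies to everybody, including root
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": condition,
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            deny_delete(
                "deny-delete-before-retention",
                {"NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}},
            ),
            deny_delete(
                "deny-delete-under-legal-hold",
                {"StringEquals": {"glacier:ResourceTag/LegalHold": "True"}},
            ),
        ],
    }


# Hypothetical vault ARN for a 1-year retention vault.
policy = vault_lock_policy(
    "arn:aws:glacier:us-west-2:111122223333:vaults/OneYearVault", 365
)
policy_json = json.dumps(policy, indent=2)
```

Because `StringEquals` is case-sensitive, the tag value must be exactly "True" for the hold to apply.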
24. Map one vault to a single retention range
Group regulatory data by retention: 1-year vault, 6-year vault, etc.
Create a new vault and lock it before storing production data
Enforce the full ArchiveAgeInDays on all new archives
Leave no “gap” on existing archives
Thoroughly test a vault lock policy before locking it down (Abort/Initiate)
Implement only the most restrictive controls with Vault Lock
Leave the flexible controls to vault access policy
Vault Lock best practices
25. Amazon Glacier received a third-party assessment from
Cohasset Associates on how Amazon Glacier with Vault Lock
can be used to meet the requirements of SEC 17a-4(f) and
CFTC 1.31(b)-(c)
Third-party assessment
26. Data retrievals: basic concepts
1 Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID
2 Retrieval processing (minutes or hours depending on retrieval option)
3 Job completion notification
4 Download output
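The four steps can be sketched as one function over a client object. The method names and parameters follow boto3's Glacier client as I understand it and should be verified; the sketch polls for completion as a simplification, whereas the slide describes a completion notification. The stub client and the placeholder IDs are hypothetical.

```python
import io
import time


def retrieve_archive(client, vault_name, archive_id, tier="Standard", poll_seconds=60):
    """Initiate a retrieval job, wait for it to complete, and return the bytes."""
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier},
    )
    job_id = job["jobId"]
    # Polling stands in for the completion notification in step 3.
    while not client.describe_job(
        accountId="-", vaultName=vault_name, jobId=job_id
    )["Completed"]:
        time.sleep(poll_seconds)
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()


class StubClient:
    """Stand-in client whose job completes after a fixed number of polls."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    def initiate_job(self, **kwargs):
        return {"jobId": "example-job-id"}

    def describe_job(self, **kwargs):
        self.remaining -= 1
        return {"Completed": self.remaining <= 0}

    def get_job_output(self, **kwargs):
        return {"body": io.BytesIO(b"archive bytes")}


data = retrieve_archive(
    StubClient(2), "Films", "example-archive-id", tier="Expedited", poll_seconds=0
)
```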
29. Data retrievals: data retrieval policies
Provides transparency and cost control for data retrievals
Governs all retrieval activities for an account in a region
Synchronously accepts or rejects each retrieval request
Accounts for inflight retrieval operations
30. Data retrievals: expedited and bulk retrievals
Retrieval option | Expedited | Standard | Bulk
Data Access Time | 1 - 5 minutes | 3 - 5 hours | 5 - 12 hours
Data Retrievals | $0.03 per GB | $0.01 per GB | $0.0025 per GB
Retrieval Requests | $0.01 per request | $0.05 per 1,000 requests | $0.025 per 1,000 requests
Expedited: designed for occasional urgent access to a small number of archives
Standard: low-cost option for retrieving data in just a few hours
Bulk: lowest cost option optimized for large retrievals, up to petabytes of data in
12 hours
Three flexible and powerful retrieval options to access any of your Amazon
Glacier data
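To compare the three options for a given workload, the per-GB and per-request prices listed above can be combined in a small calculation. The workload of 1,000 GB across 1,000 requests is a made-up example, and the prices are the ones quoted in this deck, which may since have changed.

```python
from decimal import Decimal

# (price per GB, price per request) as quoted in the deck
TIERS = {
    "Expedited": (Decimal("0.03"), Decimal("0.01")),
    "Standard": (Decimal("0.01"), Decimal("0.05") / 1000),
    "Bulk": (Decimal("0.0025"), Decimal("0.025") / 1000),
}


def retrieval_cost(tier, gigabytes, requests):
    """Total retrieval cost in USD for one tier."""
    per_gb, per_request = TIERS[tier]
    return per_gb * gigabytes + per_request * requests


# Hypothetical workload: 1,000 GB retrieved through 1,000 requests.
costs = {tier: retrieval_cost(tier, 1000, 1000) for tier in TIERS}
for tier, cost in costs.items():
    print(f"{tier}: ${cost:.3f}")
```

Decimal is used instead of float so the small per-request prices add up exactly.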
31. Data retrievals: expedited retrievals
Expedited: two types of requests
On-demand: like EC2 On-Demand Instances, available the vast majority of the time
Provisioned requests: guaranteed capacity
Provisioned capacity
Guarantees expedited retrieval capacity is available when
needed
Ensures at least 3 expedited retrievals every 5 minutes and provides up to 150 MB/s of retrieval throughput
$100 per month per unit
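A rough sizing estimate follows from the per-unit figures above. The sketch assumes capacity scales linearly with the number of units, which the slide does not state, and since 150 MB/s is an upper bound per unit the result is best read as a minimum.

```python
import math

UNIT_MBPS = 150              # up to 150 MB/s per unit
UNIT_REQUESTS_PER_5_MIN = 3  # at least 3 expedited retrievals per 5 minutes
UNIT_PRICE_USD = 100         # per unit per month


def units_needed(target_mbps, requests_per_5_min):
    """Minimum units covering both the throughput and the request-rate target."""
    by_throughput = math.ceil(target_mbps / UNIT_MBPS)
    by_requests = math.ceil(requests_per_5_min / UNIT_REQUESTS_PER_5_MIN)
    return max(1, by_throughput, by_requests)


# Hypothetical target: 400 MB/s and 5 urgent retrievals every 5 minutes.
units = units_needed(400, 5)
monthly_cost = units * UNIT_PRICE_USD
```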