Most organizations have data that they need to retain but that is accessed infrequently, if ever. When this data must be accessible at a moment’s notice, it’s hard to save money by moving it to archival storage, because access times on these platforms are slower. Now, customers are using Amazon S3 & Glacier for “Active Archiving” to reduce storage costs while maintaining the flexibility of instant access. In this tech talk, we’ll show you how to implement Active Archiving with AWS Object Storage services, and we’ll provide some real-world examples of how AWS customers are saving money with these capabilities today.
Learning Outcomes:
• Define Active Archiving, and understand how it is different from traditional cold archiving
• Review the cost modeling tools available to determine if Active Archiving is a good fit for your organization
• Learn about best practices for using AWS Object Storage features & functionality to enable Active Archiving
1. Active Archiving with Amazon S3
…and Tiering to Glacier
Marc Trimuschat
AWS Storage Services
2. Data has gravity
…easier to move processing to the data
4k/8k
Genomics
Seismic
Financial
Logs
IoT
3. Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose
AWS Storage Platform and Solutions: The AWS Storage Portfolio
Object
• Amazon S3
• Amazon Glacier
Block
• Amazon EBS (persistent)
• Amazon EC2 Instance Store (ephemeral)
File
• Amazon EFS
4. Audio Archives – SoundCloud
• World’s leading social sound platform
• Audio files transcoded and stored in multiple formats
• Stores PBs of data
• Transcoded files served from Amazon S3
• Originals moved to Amazon Glacier for long-term retention
5. Satellite Image Archive
• DigitalGlobe captures satellite imagery of the Earth
• 100PB image library = 6 billion square kilometers
• 1PB of new imagery every year
• Images to be archived and retained for decades
6. Patient Data – Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses HIPAA-eligible AWS services covered by the BAA
7. Archive:
Data retained for the long term, for compliance or potential future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
9. Choice of storage classes
• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data
10. Data Lifecycle Management
• Transition Standard to Standard-IA
• Transition Standard-IA to Amazon Glacier
• Expiration lifecycle policy
• Versioning support
• Prefix support
Figure: data access frequency over time (T to T + 365 days)
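The transitions on this slide map onto an S3 lifecycle configuration. Below is a minimal sketch in Python, assuming the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, the bucket name, and all day thresholds are illustrative assumptions, not values from the talk.

```python
import json


def build_lifecycle_rule(prefix, ia_after_days=30, glacier_after_days=90,
                         expire_after_days=365):
    """Build one rule: Standard -> Standard-IA -> Glacier -> expiration.

    All thresholds are illustrative assumptions. S3 requires objects to be
    at least 30 days old before they can transition to STANDARD_IA.
    """
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions need at least 30 days")
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiration")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},          # prefix support
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
        # Versioning support: also archive versions that are no longer current.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/")]}
print(json.dumps(lifecycle_configuration, indent=2))

# With AWS credentials configured, the configuration could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_configuration)
```

Keeping the rule builder separate from the API call makes the thresholds easy to review before anything is applied to a bucket.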
11. Amazon S3: What’s New
• Cross-Region Replication
• Lifecycle Policy
• Data Classification & Management
• Event Notifications
• CloudWatch Metrics
• S3 Inventory
• Audit with CloudTrail Data Events
• Storage Analytics
Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier
12. Data-driven storage management for S3
• Analyze storage usage to transition the right data to the right storage class
• Understand how storage usage changes as your S3 objects get older
• Discover how much of your storage is retrieved over time
13. Manage your data
Data Classification and Management
Manage data based on what it is, as opposed to where it’s located
• Easy data management
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
Classification, Lifecycle Policy, Access Control
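Tag-based classification can be sketched as two small pieces: a tag set attached to each object, and a lifecycle rule that filters on a tag instead of a key prefix. The example below assumes the dictionary shapes used by boto3's `put_object_tagging` and `put_bucket_lifecycle_configuration`. The tag keys, tag values, and the 30-day threshold are hypothetical.

```python
def tag_set(**tags):
    """Shape key-value pairs the way S3 object tagging expects them."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in sorted(tags.items())]}


def rule_for_tag(key, value, glacier_after_days):
    """Lifecycle rule that archives every object carrying the given tag,
    regardless of where the object sits in the key namespace."""
    return {
        "ID": f"archive-{key}-{value}",
        "Filter": {"Tag": {"Key": key, "Value": value}},
        "Status": "Enabled",
        "Transitions": [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical classification scheme, for illustration only.
tagging = tag_set(classification="raw-footage", project="demo")
rule = rule_for_tag("classification", "raw-footage", glacier_after_days=30)
print(tagging)
print(rule)

# With AWS credentials configured, the tags could be attached with:
#   import boto3
#   boto3.client("s3").put_object_tagging(
#       Bucket="example-bucket", Key="clips/a.mov", Tagging=tagging)
```

The rule is written once per data type. Any object tagged later with the same key-value pair is picked up without changing the policy.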
14. Amazon Glacier
• Extremely low-cost archive storage service, starting at $0.004/GB/mo
• 3 retrieval options: Expedited (1-5 min), Standard (3-5 hrs), Bulk (5-12 hrs)
• 99.999999999% durability (5-6 orders of magnitude higher than 2 copies of tape)
• All data is encrypted at rest
• Features: compliance, data management, cost management, audit logging
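One way to use the three retrieval options is to pick the slowest tier that still meets a deadline, since slower tiers cost less. The sketch below uses the worst-case times from this slide and builds the job parameters for a Glacier archive retrieval. The Glacier API names the slowest tier `Bulk`. The helper names and the example archive ID are hypothetical.

```python
# (tier name, worst-case hours from the slide), slowest and cheapest first.
RETRIEVAL_TIERS = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def pick_tier(deadline_hours):
    """Return the slowest tier whose worst-case time fits the deadline."""
    for name, worst_case_hours in RETRIEVAL_TIERS:
        if worst_case_hours <= deadline_hours:
            return name
    raise ValueError("no retrieval tier can meet this deadline")


def archive_retrieval_job(archive_id, deadline_hours):
    """Job parameters in the shape used by Glacier's InitiateJob call."""
    return {
        "Type": "archive-retrieval",
        "ArchiveId": archive_id,
        "Tier": pick_tier(deadline_hours),
    }


# "example-archive-id" is a placeholder, not a real archive ID.
print(archive_retrieval_job("example-archive-id", deadline_hours=6))

# With AWS credentials configured, the job could be started with:
#   import boto3
#   boto3.client("glacier").initiate_job(
#       accountId="-", vaultName="Films",
#       jobParameters=archive_retrieval_job("example-archive-id", 6))
```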
15. Glacier: Key Concepts
• Vaults – Container for archives, up to 1,000 vaults per account
• Archives – basic unit, write-once, 40TB max, unlimited archives
• Inventory – Cold index of archives refreshed every 24 hours
• Access – Three ways to access Glacier
• Uploads – Multi-part, lifecycle, cost optimizations, Snowball
• Data management – Vault Lock, tagging, audit logs
• Retrievals – Retrieval policies, range retrievals, new feature announcement
17. Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years
18. How can AWS help with your archival?
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
25. Accessing Glacier
1. S3 lifecycle integration
2. Direct Glacier API/SDK
3. Third-party tools and gateways (for example, FastGlacier)
26. Use Glacier via S3 Lifecycle
• S3 Standard: active data, synchronous access, $0.023/GB/mo.
• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.
• Amazon Glacier: archive data, async access, $0.004/GB/mo.
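The per-GB prices on this slide ($0.023 for S3 Standard, $0.0125 for S3 - Infrequent Access, $0.004 for Amazon Glacier) allow a rough storage-only comparison. The sketch below is an estimate under stated assumptions: a hypothetical 100 TB data set, 1 TB counted as 1,000 GB, an illustrative 10/20/70 split across the classes, and no request, retrieval, or transfer charges.

```python
# Prices in USD per GB per month, as quoted on the slide.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
}


def monthly_storage_cost(gb_by_class):
    """Storage-only monthly cost; ignores request and retrieval fees."""
    return sum(gb * PRICE_PER_GB_MONTH[cls] for cls, gb in gb_by_class.items())


total_gb = 100 * 1000  # hypothetical 100 TB, assuming 1 TB = 1,000 GB

all_standard = monthly_storage_cost({"STANDARD": total_gb})
tiered = monthly_storage_cost({
    "STANDARD": 0.10 * total_gb,      # assumed share of active data
    "STANDARD_IA": 0.20 * total_gb,   # assumed share of infrequent data
    "GLACIER": 0.70 * total_gb,       # assumed share of archive data
})
savings = 1 - tiered / all_standard

print(f"All in Standard: ${all_standard:,.2f}/month")
print(f"Tiered:          ${tiered:,.2f}/month")
print(f"Savings:         {savings:.0%}")
```

Changing the assumed split shows how sensitive the result is to the share of data that is truly cold.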
27. Data lifecycle management
• Transition Standard to Standard-IA
• Transition Standard-IA to Amazon Glacier
• Transition based on object tags
• Expiration and versioning
Figure: data access frequency over time (T to T + 365 days)
29. Glacier Direct Upload – The Basics
1. Create vault
2. Configure access policies
ArchiveApp user policy
Effect: Allow
Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films
Action: glacier:UploadArchive
3. Upload archives
UploadArchive(data) -> Archive ID
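The policy and upload steps on this slide can be sketched in Python. The policy builder produces an IAM policy document that allows only `glacier:UploadArchive` on one vault. The upload helper assumes a client with the same `upload_archive` call as boto3's Glacier client. The region, account ID, and the `FakeGlacierClient` stand-in are hypothetical and exist only so the sketch runs without AWS credentials.

```python
import json


def upload_policy(region, account_id, vault_name):
    """IAM policy document allowing uploads to a single vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "glacier:UploadArchive",
            "Resource": (
                f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
            ),
        }],
    }


def upload_archive(glacier_client, vault_name, data, description=""):
    """Upload one archive and return its archive ID.

    The archive ID is the only handle for later retrieval, so the caller
    should persist it (for example, in a database).
    """
    response = glacier_client.upload_archive(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    return response["archiveId"]


class FakeGlacierClient:
    """Hypothetical stand-in for boto3.client("glacier"), for offline runs."""

    def upload_archive(self, accountId, vaultName, archiveDescription, body):
        return {"archiveId": f"fake-{vaultName}-{len(body)}"}


# Placeholder region and account ID, for illustration only.
print(json.dumps(upload_policy("us-west-2", "123456789012", "Films"), indent=2))
print(upload_archive(FakeGlacierClient(), "Films", b"example bytes"))
```

Passing the client in as an argument keeps the helper testable. In real use, `boto3.client("glacier")` would take the place of the stand-in.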
30. Uploading Data: Inter- or Sneaker-net
• AWS Direct Connect: dedicated bandwidth between your site and AWS
• Internet: transfer data in a secure SSL tunnel over the public Internet
• AWS Import/Export Snowball: physical transfer of media into and out of AWS
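Choosing between the network and physical transfer is largely arithmetic: how long would the data take to move over the available link? The sketch below estimates that time. The 80% link utilization and the 7-day turnaround assumed for a shipped device are illustrative assumptions, not AWS figures.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(size_tb, link_gbps, utilization=0.8):
    """Days needed to move size_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    `utilization` is the assumed usable fraction of the link.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / SECONDS_PER_DAY


def prefer_physical_transfer(size_tb, link_gbps, device_turnaround_days=7.0):
    """True when shipping a device is estimated to beat the network.

    device_turnaround_days is a hypothetical end-to-end figure covering
    shipping, loading, and import.
    """
    return network_transfer_days(size_tb, link_gbps) > device_turnaround_days


for size_tb, link_gbps in [(10, 1), (100, 1), (100, 10)]:
    days = network_transfer_days(size_tb, link_gbps)
    choice = "physical" if prefer_physical_transfer(size_tb, link_gbps) else "network"
    print(f"{size_tb} TB over {link_gbps} Gbps: {days:.1f} days -> {choice}")
```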
31. AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2 m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper and optical networking
• Ruggedized and rack-mountable
• Launched at re:Invent 2016
32. Use cases: AWS Import/Export Snowball
• Cloud Migration
• Disaster Recovery
• Data Center Decommission
• Content Distribution
34. Storage Gateway Enables Hybrid Storage Solutions
Use standard storage protocols to access AWS storage services
• Customer premises: application servers, backup servers, enterprise storage
• Gateway types and protocols: File (NFS), Volume (iSCSI), Tape (VTL)
• Connectivity: Internet, Direct Connect, Amazon VPC
• AWS storage: Amazon S3, Amazon EBS snapshots, Amazon Glacier
• Integrated services: AWS IAM, AWS KMS, AWS CloudTrail, Amazon CloudWatch
35. Which option should I choose?
• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use 3rd party tools or AWS Storage Gateway to minimize coding
37. Media Archive and Metadata (cloud transition)
Corporate data center: Onsite Archive, On-Premise Tape, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
Offsite: Offsite Tape Archive
38. Media Archive (transition to the cloud)
Corporate data center: Onsite Archive, On-Premise Tape, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
Offsite: Offsite Tape Archive
Connectivity: AWS Direct Connect
AWS Region: Amazon Glacier, Cloud DAM (Syncing Metadata from on-prem)
39. Media Archive (transition to the cloud)
Corporate data center: Onsite Archive, On-Premise Tape, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
Offsite: Offsite Tape Archive
Connectivity: AWS Direct Connect
AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (Syncing Metadata from on-prem), Cloud Based Processing Tasks
40. Media Archive (transition to the cloud)
Corporate data center: Onsite Archive, Onsite Cache, On-Premise Tape, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
Offsite: Offsite Tape Archive
Connectivity: AWS Direct Connect
AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (Syncing Metadata from on-prem), Cloud Based Processing Tasks
41. Media Solution: Sony DADC
On-demand cloud-based media supply chain and delivery solution
Problem Statement:
• Challenged by on-prem legacy infrastructure
• Provide a performant, secure, economical media distribution solution
• Decrease time to market for their customers’ finished content
Use of AWS:
• EC2 for content processing, and SWF, SQS, and SNS for media workflow automation
• S3 for storage, Glacier for content archive
• CloudFront for OTT
Business Benefits:
• Workflow pipelines can be run in a highly parallelized fashion through AWS elastic scalability
• Significantly shorten content delivery SLA with a new AWS-enabled target of 1 hr
• Fully migrating away from on-prem infrastructure
42. Video archives
• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20PBs of media assets, 1MM+ hours of high-res content
• Assets to be archived and retained for decades
44. “If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”
@SonyDADCNMS
45. Sony Migration
The Challenge
• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
• Store 20 petabytes of motion picture and television content
• Equating to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model & infrastructure
• Investing in innovation vs. hardware
50. Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
51. Glacier Vault Lock
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure designated third-party access and grant temporary access
Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
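Time-based retention with the `ArchiveAgeInDays` control can be sketched as a Vault Lock policy that denies archive deletion until the archive reaches a given age. The builder below follows the general shape of a Glacier vault lock policy. The region, account ID, vault name, and the 365-day retention period are hypothetical placeholders.

```python
import json


def vault_lock_policy(region, account_id, vault_name, retention_days):
    """Deny glacier:DeleteArchive for archives younger than retention_days."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention-ends",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": (
                f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
            ),
            "Condition": {
                "NumericLessThan": {
                    "glacier:ArchiveAgeInDays": str(retention_days),
                },
            },
        }],
    }


# Placeholder values, for illustration only.
policy = vault_lock_policy("us-west-2", "123456789012", "Films", 365)
policy_json = json.dumps(policy)
print(json.dumps(policy, indent=2))

# Two-step locking with boto3, assuming AWS credentials are configured:
#   import boto3
#   glacier = boto3.client("glacier")
#   lock = glacier.initiate_vault_lock(          # step 1: policy is in progress
#       accountId="-", vaultName="Films", policy={"Policy": policy_json})
#   ...test that the policy behaves as intended...
#   glacier.complete_vault_lock(                 # step 2: policy becomes immutable
#       accountId="-", vaultName="Films", lockId=lock["lockId"])
```

The first step leaves room to test the policy and abort. After the second step the policy can no longer be changed, so the retention period deserves careful review beforehand.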
52. Proofpoint
• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
53. Proofpoint SocialPatrol
• Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
• Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS
54. Proofpoint SocialPatrol Archive with Glacier
• SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock
Diagram: PFPT in AWS, showing social sources, a policy engine, MySQL/C*/Solr, and Amazon Glacier & Vault Lock