This session reviews the latest developments in AWS services and features for ingesting and storing digital media content. The presentation examines storage strategies for dealing with increasingly large media files and rising content resolution, as well as the different pricing and feature options for block, file, and object storage on AWS, including an overview of the recently announced Snowball, a high-volume data transfer appliance, and Amazon S3 Infrequent Access. The key issues addressed include cost considerations for storing digital content of different quality, aging strategies, and ingestion options for large storage volumes.
5. AWS Storage options for digital media
File: Amazon EFS
Block: Amazon EBS, Amazon EC2 instance storage
Object: Amazon S3, Amazon Glacier
6. A Concept - the Content Lake
Inspired by the data lake (a term coined by James Dixon in 2010)
A single store of all the digital content that you create and
acquire in any form or factor
• Don’t assume any resolutions/formats (now or in the future)
• It is up to the consumer (the application consuming the content) to use the
appropriate infrastructure for processing
7. Amazon S3 : the Content Lake
• Durable, cost-effective and fast
• Highly scalable front-end
– Multi-part uploads (parallel writes)
– Range-gets (parallel reads)
• No need for capacity planning or
provisioning
• Use Amazon S3 with on-premises
storage in a hybrid model
• Secure
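The parallel reads and writes mentioned above both rest on splitting an object into byte ranges. The following is a minimal sketch of that split, not an S3 client: the function name `byte_ranges` and the part size are illustrative assumptions, and the strings it returns are standard HTTP `Range` header values that a worker pool could attach to separate GET requests.

```python
def byte_ranges(object_size, part_size):
    """Split an object of object_size bytes into HTTP Range header values.

    Each returned string covers one part; the last part may be shorter.
    """
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size > 0")
    ranges = []
    start = 0
    while start < object_size:
        # Range headers are inclusive on both ends.
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Hypothetical 10-byte object read in 4-byte parts.
    print(byte_ranges(10, 4))
```

The same arithmetic gives the part boundaries for a multi-part upload, where each part is written independently and the object is assembled once all parts have arrived.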
9. Hydrating the Content Lake
Direct Connect (N x 1G | 10G)
Amazon S3 (multi-part upload)
Massively scalable front-end
10. Introducing AWS Import/Export Snowball
Scale and Speed
• Up to 50TB Capacity per device
• 10Gbps and 1Gbps connectivity
• Parallel data transfer enables PBs transferred in a week
Secure
• Tamper-resistant enclosure
• 256-bit encryption with KMS
• Secure data erasure
Simple
• Manage entire process through AWS Console
• Lightweight data transfer client
• Notifications
11. What is Snowball? Petabyte scale data transport
• E-ink shipping label
• Ruggedized case (“8.5G impact”)
• All data encrypted end-to-end
• 50 TB
• 10G network
• Rain & dust resistant
• Tamper-resistant case & electronics
12. Can I drop it?
• No (please don’t)
• Snowball is its own box
• Has had many drop tests already
• Can handle 8.5G impacts
• Designed for shipping
14. What does it cost?
• $200 / job plus shipping
• Includes 10 days to fill the device at your site
• $15/day after the tenth day on site
• Standard Amazon S3 charges apply
• $0.03/GB to transfer data out
• $0.00/GB to transfer data in
15. How fast is that truck full of drives?
• Less than 1 day to transfer 250TB via 5x10G connections with 5
Snowballs, less than 1 week including shipping
• Number of days to transfer 250TB via the Internet at typical
utilizations, by Internet connection speed:
Utilization   1Gbps   500Mbps   300Mbps   150Mbps
25%           95      190       316       632
50%           47      95        158       316
75%           32      63        105       211
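The day counts in the table follow from dividing the data volume by the effective link rate. The sketch below is one way to reproduce them; the binary unit convention (1 TB = 2^40 bytes, 1 Gbps = 2^30 bits per second) is an assumption inferred from the figures, not something the slide states, and `transfer_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb, link_gbps, utilization):
    """Days needed to move size_tb over a link at a given utilization.

    Assumption: binary units (TB = 2**40 bytes, Gbps = 2**30 bit/s),
    chosen because they match the rounded figures in the table.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and 0 < utilization <= 1")
    total_bits = size_tb * 2**40 * 8
    bits_per_second = link_gbps * 2**30 * utilization
    return total_bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for utilization in (0.25, 0.50, 0.75):
        row = [round(transfer_days(250, gbps, utilization))
               for gbps in (1.0, 0.5, 0.3, 0.15)]
        print(f"{utilization:.0%}", row)
```

Under the same assumption, 250TB spread over five fully used 10G links works out to roughly half a day, which is consistent with the "less than 1 day" claim for five Snowballs.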
16. What does it cost?
Example 1:
• 250TB loaded on to 5 Snowballs
• 8 days at your site
• 5 * $200 = $1,000 plus shipping
Example 2:
• 30TB exported on to 1 Snowball
• 8 days at your site
• $200 + 30TB * $0.03/GB = $1,121.60 plus shipping
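The two examples can be checked with a small calculator built from the price list on the earlier cost slide ($200 per job, 10 days included, $15 per extra day, $0.03/GB out, free in). This is a sketch under stated assumptions: `snowball_job_cost` is a hypothetical helper, 1 TB is taken as 1,024 GB because that matches the $1,121.60 figure, the extra-day fee is assumed to apply per device, and shipping and standard S3 charges are left out.

```python
JOB_FEE = 200          # dollars per Snowball job
INCLUDED_DAYS = 10     # days on site covered by the job fee
EXTRA_DAY_FEE = 15     # dollars per day after the tenth day
EXPORT_PER_GB = 0.03   # dollars per GB transferred out; transfer in is free
GB_PER_TB = 1024       # assumption: matches the slide's export arithmetic


def snowball_job_cost(num_devices, days_on_site, export_tb=0.0):
    """Estimated cost in dollars, excluding shipping and S3 charges."""
    extra_days = max(0, days_on_site - INCLUDED_DAYS)
    device_fees = num_devices * (JOB_FEE + extra_days * EXTRA_DAY_FEE)
    export_fee = export_tb * GB_PER_TB * EXPORT_PER_GB
    return device_fees + export_fee


if __name__ == "__main__":
    print(snowball_job_cost(5, 8))                 # Example 1: import on 5 devices
    print(snowball_job_cost(1, 8, export_tb=30))   # Example 2: 30TB export
```

Keeping the export volume as a separate argument makes the asymmetry visible: importing costs only the job fee, while exporting adds a per-GB charge that quickly dominates.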
17. Edge Locations
Availability Zone
Region
Dallas (2)
St.Louis
Miami
Jacksonville
Los Angeles (2)
Seattle
Ashburn (3)
Newark
New York (3)
Dublin
London (2)
Amsterdam (2)
Stockholm
Frankfurt (2)
Paris (2)
Singapore (2)
Hong Kong (2)
Tokyo (2)
Sao Paulo
South Bend
San Jose
Palo Alto
Hayward
Osaka
Milan
Sydney
Madrid
Seoul
Mumbai
Chennai
Regional Lakes …
18. S3 cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions
Source (Virginia) to Destination (Oregon)
• Only replicates new PUTs. Once
replication is configured, all new uploads
into a source bucket will be
replicated
• Entire bucket or prefix based
• 1:1 replication between any 2
regions
Use cases
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to remote customers/partners
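As an illustration of the prefix-based option, the sketch below builds a replication configuration in the dictionary shape that boto3's `put_bucket_replication` accepts. Nothing is sent to AWS; the role ARN, bucket name, prefix, and rule ID are hypothetical placeholders, and the helper `replication_config` is not part of any library.

```python
def replication_config(role_arn, dest_bucket, prefix=""):
    """Build a one-rule replication configuration (illustrative only).

    An empty prefix replicates the entire bucket; a non-empty prefix
    limits replication to keys that start with it.
    """
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "content-lake-replication",  # hypothetical rule name
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    config = replication_config(
        "arn:aws:iam::123456789012:role/example-replication-role",  # placeholder
        "example-content-lake-oregon",                              # placeholder
        prefix="masters/",
    )
    print(config)
    # With credentials in place, this could be applied roughly as:
    # boto3.client("s3").put_bucket_replication(
    #     Bucket="example-content-lake-virginia",
    #     ReplicationConfiguration=config)
```

Versioning has to be enabled on both the source and the destination bucket before S3 will accept a replication configuration.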
19. Consuming the Content Lake
Amazon S3 (range-gets)
Massively scalable S3 front-end
Direct Connect (N x 1G | 10G)
On-prem apps
Massively scalable compute on AWS Cloud: EBS, instance store
20. Object life cycle from hot to cold
Data tiering using Life Cycle Policies
S3 Standard
• Primary data
• 11 9’s of durability
• 2.75c – 3c per GB/month, $338 - $369 per TB/year
S3 – Infrequent Access
• Active archives
• Mezzanine files
• 11 9’s of durability
• 1.25c per GB/month, $154 per TB/year
• 1c per GB for retrievals
Glacier
• Deep/offline archives
• WORM-compliant data
• 11 9’s of durability
• 0.7c per GB/month, $86 per TB/year
Actual customer quote: “$0.0125?! OMG I will take all your storage!!!”
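A life cycle policy for the hot-to-cold path above can be sketched as a rule with two transitions, in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The 30-day and 90-day thresholds, the prefix, and the helper name `tiering_rule` are illustrative assumptions, not values from the presentation, and nothing here calls AWS.

```python
def tiering_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Build one lifecycle rule: Standard -> Infrequent Access -> Glacier.

    The day thresholds are illustrative defaults; tune them to how
    quickly content in this prefix goes cold.
    """
    # S3 does not transition objects to STANDARD_IA before 30 days.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must exceed ia_after_days")
    return {
        "ID": f"tier-{prefix or 'all'}",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


if __name__ == "__main__":
    lifecycle = {"Rules": [tiering_rule("mezzanine/")]}
    print(lifecycle)
```

One rule per prefix keeps the aging strategy explicit: mezzanine files might move to Infrequent Access quickly, while originals go straight to a longer hold before Glacier.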
21. S3 capacity pricing: pay only for what you use!
• 1 PB raw storage
• 800 TB usable storage
• 600 TB allocated storage
• 400 TB application data
AWS Cloud Storage
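To make the pay-for-what-you-use point concrete, the sketch below converts the per-GB monthly prices quoted on the life cycle slide into per-TB yearly figures and prices the 400 TB of application data. The helper names are hypothetical, and 1 TB = 1,024 GB is an assumption that happens to reproduce the slide's rounded $338, $369, $154, and $86 figures.

```python
GB_PER_TB = 1024  # assumption: matches the rounded per-TB figures quoted


def cost_per_tb_year(cents_per_gb_month):
    """Dollars per TB per year for a given price in cents per GB-month."""
    return cents_per_gb_month * GB_PER_TB * 12 / 100


def monthly_cost(stored_tb, cents_per_gb_month):
    """Dollars per month to keep stored_tb at the given price."""
    return stored_tb * GB_PER_TB * cents_per_gb_month / 100


if __name__ == "__main__":
    for name, cents in (("S3 Standard", 3.0), ("S3 IA", 1.25), ("Glacier", 0.7)):
        print(name, round(cost_per_tb_year(cents)), "dollars per TB/year")
    # Billing follows the 400 TB actually stored, not the 1 PB of raw
    # capacity that would have to be provisioned to hold it.
    print(monthly_cost(400, 3.0), "dollars per month for 400 TB in S3 Standard")
```

Read this way, only 40% of the 1 PB raw capacity in the slide's stack ends up holding application data, which is the gap the capacity-pricing argument is pointing at.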
22. Securing your data on S3
• AWS alignment with the latest MPAA cloud
based application guidelines for content security
– August 2015
• VPC private endpoint for Amazon S3 – enables
a true private workflow capability
• Encryption & key management capabilities
• Amazon Glacier Vault for high-value
media/originals
23. S3 versioning
Preserve, retrieve, and restore every version
of every object stored in your bucket
• S3 automatically adds new versions and preserves deleted objects with delete markers
• Easily control the number of versions kept by using lifecycle expiration policies
• Easy to turn on in the AWS Management Console
Example with versioning enabled: PUT Key = photo.gif
• Key = photo.gif, ID = 121212
• Key = photo.gif, ID = 111111
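The behavior described on this slide (every PUT adds a version, a delete adds a marker rather than removing data) can be modeled in a few lines. This is a toy in-memory model for illustration only, not the S3 API: the class `VersionedBucket`, its methods, and the sequential version IDs are all invented for the sketch.

```python
import itertools

_DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy model of a bucket with versioning enabled."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, body)
        self._ids = itertools.count(1)   # real S3 version IDs are opaque strings

    def put(self, key, body):
        """Store a new version and return its version ID."""
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """Add a delete marker; earlier versions stay retrievable."""
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, _DELETE_MARKER))
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is _DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not _DELETE_MARKER:
                return body
        raise KeyError((key, version_id))


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("photo.gif", b"original")
    bucket.put("photo.gif", b"retouched")
    bucket.delete("photo.gif")
    print(bucket.get("photo.gif", first))  # older version is still there
```

The model also shows why lifecycle expiration matters: without it, the version list for a frequently overwritten key only grows.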
24. Amazon S3 event notifications
Delivers notifications to Amazon SNS, Amazon SQS, or AWS
Lambda when events occur in Amazon S3
S3 events: notifications to an SNS topic, SQS queue, or Lambda function
• Support for notification when objects are created via Put, Post, Copy, or Multipart Upload.
• Support for notification when objects are deleted, as well as filtering on prefixes and suffixes for all types of notifications.
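On the receiving side, a Lambda function gets the notification as a JSON document with one entry per event under `Records`. The sketch below pulls out the bucket and key for newly created objects and applies a suffix check in code; the `.mxf` suffix, the bucket and key in the sample event, and the handler's return value are illustrative assumptions, and the sample event is trimmed to the fields the code reads.

```python
from urllib.parse import unquote_plus

WATCHED_SUFFIX = ".mxf"  # hypothetical: only react to this file type


def handler(event, context=None):
    """Return (bucket, key) pairs for created objects with the watched suffix."""
    matches = []
    for record in event.get("Records", []):
        # Creation events are named like "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(WATCHED_SUFFIX):
            matches.append((bucket, key))
    return matches


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:CompleteMultipartUpload",
                "s3": {
                    "bucket": {"name": "example-content-lake"},
                    "object": {"key": "ingest/reel+01.mxf"},
                },
            }
        ]
    }
    print(handler(sample_event))
```

In practice the prefix and suffix filters can be set on the notification configuration itself, so the function is only invoked for matching keys; the in-code check is kept here to show what the filter does.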
25. Reference Architecture – Content Processing
Pipeline (Using Lambda)
• Ingest via the S3 multi-part API
• S3 as backend storage for content files, accessible to other processing tasks
• S3 notification: trigger a Lambda function to start a transcoding job (Amazon Elastic Transcoder)
• S3 notification: Lambda function to generate a signed URL to share the file
• Update CMS or metadata
26. Elastic File System - Rendering in the Cloud
• Designed to support petabyte scale
file systems
• Throughput scales linearly with
storage
• Same latency spec across each AZ
• Thousands of concurrent NFS
connections
• Works great for large I/O sizes
• Pay for only what you use not what
you provision
• Managed with multi-copy durability
27. Media Workloads (redefined)
Diagram labels:
• Content access/transfer: Direct Connect
• On-prem apps; partner/affiliate/service provider; user delivery/consumption; VFX/production
• Amazon S3
• Process: Amazon EBS/EFS/EC2 instance store
• Disposable infrastructure, auto-scaling, workload specific
• Archive: Amazon Glacier (life cycle policies)
28. Q&A
Learn more at: http://aws.amazon.com/s3/
http://aws.amazon.com/glacier/
gfarber@amazon.com
29. How is my data transported securely?
• Strong chain of custody
• Tamper-resistant case
• Tamper-resistant electronics
(TPM)
• Each Snowball is erased
according to NIST 800-88 media
sanitization guidelines between
every job
30. How fast is that truck full of drives?
• Less than 1 day to transfer 50TB via a 10G
connection with Snowball, less than 1 week
including shipping
• Number of days to transfer 50TB via the Internet at
typical utilizations, by Internet connection speed:
Utilization   1Gbps   500Mbps   300Mbps   150Mbps
25%           19      38        63        126
50%           9       19        32        63
75%           6       13        21        42
31. What does it cost?
• Example 1:
• 40TB loaded on to 1 Snowball
• 2 days at your site
• $200 plus shipping
• Example 2:
• 30TB loaded on to 1 Snowball
• 12 days at your site
• $200 + 2*$15/day = $230 plus shipping
32. Media Storage Services
Amazon EBS
• Block storage for use with Amazon EC2
• Up to 16TB/volume
• Up to 20K IOPS
• SSD backed
• Encryption
Amazon EFS
• Shared file storage for use with Amazon EC2
• Massively scalable
• Storage scales up & down
• Scalable performance
Amazon S3
• Massively scalable storage & front-end
• 11 9’s of durability
• Internet scale storage via API
Amazon Glacier
• $0.01/GB/month
• 11 9’s of durability
• Multiple copies across different DCs
• Storage for archiving and backup