This is a presentation that I gave at the AWS Meetup in Ann Arbor, Michigan back in January. It recounts some experiences that I had while working on a project with RightBrain Networks that involved moving millions of small files around between S3, Glacier and an NFS NAS volume. A good time was had by all.
2. Who the @#%^ is Dave Thompson?
• DevOps/SRE/Systems guy from MI by way of San Francisco
• Current Employer: MuleSoft Inc
• Past Employers: Netflix, Domino’s Pizza, U of M
• Also contributing to the madness at RBN
3. … and what is he talking about?
• Today, we’ll talk about a case study using Glacier with S3, and the various surprises I encountered along the way.
10. Enter RBN!
The proposal: migrate the data from S3 to a cloud storage solution (Zadara), and archive the files to Glacier.
11. Everything Goes According to Plan (Again)!
• Files are copied to Zadara share
• S3 lifecycle configured to archive objects to Glacier
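A lifecycle rule of roughly this shape is what triggers the automatic S3-to-Glacier archival. This is a minimal sketch: the rule ID, empty prefix, and 30-day window are my assumptions, not the project’s actual settings, and the boto3 call is shown commented out so the snippet stands alone.

```python
# Hypothetical lifecycle rule: transition every object in the bucket
# to the GLACIER storage class once it is 30 days old. The rule name,
# prefix, and 30-day threshold are illustrative assumptions.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-to-glacier",
            "Filter": {"Prefix": ""},  # empty prefix = whole bucket
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With boto3 installed and credentials configured, the rule would be
# applied with something like:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )
```

Note that the rule applies per object: millions of small files mean millions of individual Glacier archives, which is exactly what bites later.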
12. Except… the Zadara share becomes corrupted after the data is migrated.
13. Amazon Glacier: a Primer
• Glacier is an archival solution provided by AWS.
• It’s closely integrated with S3.
• Use cases for Glacier and S3 are different, though…
14. S3 vs Glacier
• Unlike an S3 GET, a Glacier RETRIEVAL takes ~4 hours
• UPLOAD and RETRIEVAL API requests are 10x more expensive on Glacier than comparable S3 requests
• Bandwidth charges for RETRIEVAL requests apply, even inside us-east-1
15. S3 vs Glacier (cont.)
• This means that Glacier is optimized for compressed archives (i.e. tarball data)
• S3 is about equally well suited to small or large files
• Automatically archiving S3 objects to Glacier can thus lead to great sadness.
18. The New Plan
• Restore files from Glacier back to S3
• Migrate data from S3 to Zadara share
• Archive files back to Glacier in tar.gz chunks
• Create a DynamoDB index mapping each file name to its Glacier archive for future restores
20. Task 0: Calculating Cost
• Glacier pricing model is… interesting
• Costs are fixed per UPLOAD and RETRIEVAL request
• Cost for bandwidth is based on the peak outbound bandwidth consumed in a monthly billing period
• Monthly bandwidth equal to 5% of your total Glacier usage is permitted free of charge
21. The Equation (Oh, boy. Okay, let’s do this.)
• Let X equal the number of RETRIEVE API calls made.
• Let Y equal the amount to restore in GB.
• Let Z equal the total amount of data archived in GB.
• Let T equal the time to restore the data in hours.
• Then the cost can be expressed as:
(0.05 * (X / 1000)) + (((Y / T) - (Z * 0.05 / 30)) * 0.01 * 720)
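The same equation as a small Python helper. The `max(..., 0)` clamp is my addition (a restore rate under the free allowance should cost $0 in bandwidth, not a negative amount); everything else follows the slide’s formula term by term.

```python
def glacier_restore_cost(x, y, z, t):
    """Estimate Glacier restore cost in USD (old pricing model).

    x: number of RETRIEVE API calls made
    y: amount restored, in GB
    z: total data archived in Glacier, in GB
    t: time taken to restore, in hours
    """
    # $0.05 per 1,000 retrieval requests
    request_cost = 0.05 * (x / 1000.0)
    # Peak retrieval rate (GB/hr) minus the free allowance, billed at
    # $0.01 per GB-hour across a 720-hour month. Clamped at zero:
    # staying under the free allowance can't earn a credit.
    billable_rate = max((y / t) - (z * 0.05 / 30), 0)
    bandwidth_cost = billable_rate * 0.01 * 720
    return request_cost + bandwidth_cost
```

For example, restoring 3,000 GB out of a 3,000 GB vault in 120 hours with 100,000 retrieval calls comes out to $5 in requests plus $144 in bandwidth; stretching the same restore over more hours drops the peak rate and, with it, the bill.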
22. Task 1: Restore from Glacier
• Two m2.large instances running a Python daemon
• Multiple iterations, from single-threaded to multithreaded to multiprocessing with threading
After iterating several times to get the speed we needed, I started the process for the ‘last time’ on a Sunday evening.
ETA: ~5 days
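In outline, the restore daemon fanned many retrieval requests out across workers. This is a heavily simplified sketch, not the project’s actual code: the bucket name is hypothetical and the AWS call is stubbed out (shown as a comment) so the snippet runs anywhere. Retrieval requests are I/O-bound, which is why a thread pool helps; the talk’s final version layered threads inside multiple processes as well.

```python
from multiprocessing.pool import ThreadPool

def restore_key(key):
    # In the real daemon this issued a restore request via boto,
    # along the lines of:
    # s3.restore_object(Bucket="example-bucket", Key=key,
    #                   RestoreRequest={"Days": 7})
    # Stubbed here: just return the key so progress can be tracked.
    return key

def restore_all(keys, workers=32):
    # Keep many slow, I/O-bound retrieval requests in flight at once.
    with ThreadPool(workers) as pool:
        return pool.map(restore_key, keys)
```

The catch, as the next slides show, is that how fast you *can* issue requests and how fast you *should* are different questions.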
25. Task 1: Restore from Glacier (cont.)
The Glacier team was not amused.
26. Task 1: Restore from Glacier (cont.)
The restore continued at the ‘suggested’ rate and completed successfully a couple of weeks later.
Task 1 complete!
27. Task 2: Migrate and Archive Data
Now we just needed to migrate the data from S3 to Zadara (again), create tarballs of the files, archive them to Glacier, and create a DynamoDB index so individual files could be looked up later.
Easy!
28. Task 2: Migrate and Archive Data (cont.)
Back to iPython and Boto. Recent experience with Python threading and multiprocessing proved helpful.
30. Great Success!
And the whole thing only took about 10x as long as the client initially estimated!
31. Lessons Learned
• Glacier is optimized for large, compressed files and lower request rates.
• Be very careful about the S3 -> Glacier lifecycle option.
• If you DoS an Amazon service, you get special attention!