As more startups use Amazon Web Services, the following scenario becomes increasingly frequent: the startup is acquired but is required by the parent company to move away from AWS and into the parent's own data centers. Given the all-encompassing nature of AWS, this is not a trivial task and requires careful planning at both the application and systems level. In this presentation, I recount my experiences at Delve, a video publishing SaaS platform, with our post-acquisition migration to Limelight Networks, a global CDN, during a period of tremendous growth in traffic. In particular, I share some of the tips and techniques we employed during this process to reduce AWS dependence and evolve to a hybrid private/AWS global architecture that allowed us to compete effectively with other digital video leaders.
3. Introduction
• Without Amazon we wouldn’t be where we are today
• Audience for this talk:
– Advanced AWS users
• Too much of a good thing
• Have to stop using AWS
– Beginners
• Design system to avoid pitfalls
4. Agenda
• Why Reduce AWS Dependence?
• Case Study: Delve, now Limelight Video Platform
– Who Are We?
– Our Experiences
• Pre Migration Status
• Challenges
• Current Setup
• Lessons Learned: Tips/Techniques For Reducing AWS Dependencies & Costs
5. Why Reduce AWS Dependence?
• Outages
– Not limited to a single service
6. Why Reduce AWS Dependence?
• Service deprecation
– SimpleDB
• Shared public cloud
– Multi-tenancy issues
• Business Reasons:
– “frenemy” i.e. you compete with Amazon in something
– single vendor lock-in
• Reduces leverage
7. Why Reduce AWS Dependence?
$$$
• Scenario #1:
– Startup acquisition
– Required to migrate
• Scenario #2:
– Grow too big for your own good
– Economical to run your own hardware
8. Case Study - Limelight Video Platform (LVP)
• Many world-class customers – NFL, Sony, QVC, Pokemon, MBC, Hearst, Prudential, Alloy Media, etc.
• Global footprint – 100+ countries, 5000+ websites
• Based in Seattle with employees in SF, NYC, LON, LAX
• Founded in 2006 as Pluggd
• Pivoted in 2008 as Delve Networks
– Online Video Platform (OVP)
– Competes with Ooyala, Brightcove, Kaltura
• Acquired by Limelight Networks in August 2010
– Limelight is a global content delivery network
11. Case Study – LVP AWS Usage History
• Delve Networks:
– Founded by ex-Amazon folks
– Started moving to AWS in Summer 2008
• Used Scalr for cloud management
• At peak:
– Several hundred EC2 instances
– ELB, S3, SimpleDB, EMR, CloudFront, CloudWatch, EBS, SQS
• Acquired by Limelight Networks in August 2010
– Migration work started in late Fall 2010
15. Current Status
• Hybrid model
– Limelight
• 4 data centers
– ~400
– 50+ services/handlers
• Other infrastructure
– Hadoop cluster
– Databases
– CDN services
– AWS services
• Burst into EC2
• S3, DynamoDB, SimpleDB, SQS, Elastic Map Reduce
– Work continuing on reducing dependence on these
16. Tips/Techniques for Reducing AWS Dependency and Costs
• Machine Placement
• Caching
• Parallelization
• Open Source + Alternative Services
• Cross service redundancy
• Miscellaneous tips
17. Tip: Machine Placement
• Our strategy: use EC2 as little as possible for steady state
• Where to put non-EC2 machines?
– Still need access to other AWS services
• Weight of data
– Find data centers as close as possible to target AWS center (N Virginia)
• Proximity is important
– S3 files visible from one data center may not be immediately visible from another
– One data center isn’t enough:
• Service, geo redundancy
18. Tip: Machine Placement
• Limelight POPs:
– Direct connections to access networks
– Global fiber-optic interconnect
– But:
• POP capacity
• placement within POP
• shipping ..
19. Machine Placement - PHX
• Started off in PHX
• Close to Limelight HQ
• S3 download tests conducted every hour over a week
• Early 2011
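A minimal sketch of how hourly download tests like these could be timed and compared across candidate sites. The `fetch` callable and the site labels are placeholders for illustration; nothing here reproduces the measurements from the talk.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, int]  # (elapsed seconds, bytes received)


def timed_download(fetch: Callable[[], bytes]) -> Sample:
    """Run one download via the supplied callable and time it."""
    start = time.perf_counter()
    payload = fetch()
    return time.perf_counter() - start, len(payload)


def median_throughput(samples: List[Sample]) -> float:
    """Median throughput in bytes per second across all samples."""
    return statistics.median(size / seconds for seconds, size in samples if seconds > 0)


def rank_sites(results: Dict[str, List[Sample]]) -> List[str]:
    """Order site labels from fastest to slowest median throughput."""
    return sorted(results, key=lambda site: median_throughput(results[site]), reverse=True)
```

Using the median rather than the mean keeps a single slow hour from dominating a week of samples.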
20. Machine Placement – SoftLayer/Houston
• From SoftLayer in Houston
• Has peering arrangement with Amazon
23. Machine Placement - IAD
• From IAD
– Best non EC2 performance
– One external hop away
• But even within IAD:
– Machine NIC
– Switch/Router setup
• Peering helps
24. Caching
• Tip: cache access to AWS services
– Save on RTT
– Better redundancy, fault tolerance
– AWS bandwidth costs
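One way to picture this tip is a read-through cache with a time-to-live, placed in front of whatever function calls the AWS service. This is a sketch under assumptions: the `loader` callable and the class name are hypothetical, and it is not the memcached setup used by LVP.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads locally; call the remote loader only on a miss or expiry."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # hypothetical function that calls the AWS service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no round trip, no AWS bandwidth
        try:
            value = self._loader(key)
        except Exception:
            if entry is not None:
                return entry[1]  # remote call failed: serve the stale copy
            raise
        self._entries[key] = (now, value)
        return value
```

Serving a stale entry when the loader fails is what gives the fault-tolerance benefit; whether stale data is acceptable depends on the application.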
25. Caching: LVP Analytics Reporting
• Need to quickly fetch, assemble analytics reports
• SimpleDB: charged by usage
[Diagram: S3, SimpleDB, and DynamoDB on the AWS side; memcached clusters and the Reporting service on the LLNW side]
26. Caching: Transcoding
• Video processors (transcoders, thumbnail processors …) require access to original video
• Bandwidth out of AWS - $$
[Diagram: video processing handlers in IAD fetching from S3 in AWS Virginia]
27. Caching: Transcoding
• Use Limelight Proxy Caching
[Diagram: video processing handlers in IAD, LLProxy, and S3 in AWS Virginia]
28. Caching: Transcoding
• Additional benefits
[Diagram: video processing handlers with LLProxy in IAD and in another POP, and S3 in AWS Virginia]
29. Parallelization
• AWS services are set up to be highly distributed
• Construct application/systems to parallelize requests:
– Useful for applications/systems located outside AWS
– Pipelining to get around large RTTs to AWS
• Example:
– Our transcoding
– Our real time analytics processing
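A small sketch of the idea: when each request to AWS pays a large round trip, issuing many requests at once lets the round trips overlap. The `fetch_one` callable stands in for any per-key request and is an assumption, not code from the LVP systems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def fetch_all(keys: Sequence[K], fetch_one: Callable[[K], V], max_workers: int = 16) -> List[V]:
    """Issue one request per key concurrently; results come back in key order."""
    if not keys:
        return []
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every call up front, so round trips overlap instead of adding up.
        return list(pool.map(fetch_one, keys))
```

Threads suit this case because the work is I/O-bound: each worker spends most of its time waiting on the network.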
30. Parallelization – RT Processing
• Hadoop process in IAD
[Diagram: S3 “fast” logs, Hadoop process, Job Controller, metadata lookup (SimpleDB), reports (SimpleDB)]
31. Parallelization – RT Processing
• Move to LL Hadoop cluster in PHX
• Further away from AWS but ….
[Diagram: S3 “fast” logs, Hadoop nodes (h), Job Controller, metadata lookup (SimpleDB), reports (SimpleDB)]
32. Parallelization/Caching – RT Processing
• Introduce caching into the mix
[Diagram: S3 “fast” logs, Hadoop nodes (h), Job Controller, cache, SimpleDB, reports (SimpleDB)]
33. Open Source + Alternative Services
• Moving out of AWS means you have to find alternatives
• Sometimes involves multiple building blocks
• Alternatives to
– SimpleDB
• MongoDB instances
– CloudWatch
• Cloudkick
• Zabbix
– S3
• GlusterFS, Limelight Cloud Storage
– ELB
– Public cloud
36. Private Cloud Alternative
• At AWS:
– Used Scalr for cloud management
– Amazon constantly improving own tools
• At Limelight:
– Original vision:
• Use something like Eucalyptus/OpenStack
• Seamless amalgam of public-private cloud using Scalr
– Rude reality:
• Learning curve
• Price, maintenance
• Didn’t know internal Limelight processes, network topology
• Business reality: start migration ASAP
37. Private Cloud Alternative
• Opscode’s Chef
– Infrastructure as code
– Infrastructure as a service
• Hosted version of Chef
• We use Chef for:
– Node management
– Service deployment
• Limelight
• Starting to use in EC2 as well
38. Private Cloud Alternative
• Our infrastructure management model:
– Recipes:
• Tomcat service, apache service, java, memcached setup
– Roles:
• Use recipes to construct a service
– Environment:
• Base, dev, staging, production
– Node:
• Environment + roles
• Difficulties:
– Rolling deployments
– Repurposing nodes without virtualization
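The recipe/role/environment/node model above can be pictured with a few plain data classes. This is an illustration in Python, not Chef's own Ruby DSL or API; the class and method names are made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Role:
    name: str
    recipes: List[str]  # recipes combined to construct one service


@dataclass
class Node:
    hostname: str
    environment: str  # e.g. base, dev, staging, production
    roles: List[Role] = field(default_factory=list)

    def run_list(self) -> List[str]:
        """Recipes for this node in role order, each listed once."""
        seen: Dict[str, None] = {}
        for role in self.roles:
            for recipe in role.recipes:
                seen.setdefault(recipe, None)
        return list(seen)
```

Under this model, repurposing a node means changing its list of roles, which is where the lack of virtualization makes things harder: the old role's software is still on the machine.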
39. Cross Service Redundancy
• Backup data
• Example: we keep copies in S3 of reports stored in SimpleDB, DynamoDB
– Alternative source if SimpleDB, DynamoDB goes down
– Also:
• Easy to copy reports to other alternatives
• Don’t have to incur additional AWS costs pulling entire corpus out of dbs
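A sketch of the read side of this pattern: try the primary store first, then fall back to the copy kept elsewhere. Each source is any callable that returns the report bytes or `None`; the function name and the way sources are passed in are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence


def read_with_fallback(key: str, sources: Sequence[Callable[[str], Optional[bytes]]]) -> bytes:
    """Try each source in order and return the first report found."""
    last_error: Optional[Exception] = None
    for source in sources:
        try:
            value = source(key)
        except Exception as exc:  # this source is down or timing out
            last_error = exc
            continue
        if value is not None:
            return value
    raise LookupError(f"report {key!r} unavailable from all sources") from last_error
```

The same ordered list of sources can later be reordered or extended when a report store is replaced by an alternative.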
40. Other Miscellaneous Tips
• S3:
– Compress files!
• Save storage costs
• Less time to transfer over networks
• Elastic Map Reduce:
– Multitenancy issues affect performance
• Time of day
• instance type
– Non cluster compute instances
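For the "Compress files!" tip, a minimal sketch using gzip from the Python standard library. The function names are placeholders, and the upload and download steps themselves are left out.

```python
import gzip


def compress_for_upload(data: bytes, level: int = 6) -> bytes:
    """Gzip a payload before it is stored or sent over the network."""
    return gzip.compress(data, compresslevel=level)


def decompress_after_download(blob: bytes) -> bytes:
    """Reverse compress_for_upload."""
    return gzip.decompress(blob)
```

Text-like files such as logs and reports tend to compress well; already-compressed media such as encoded video gains little.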
41. Other Miscellaneous Tips
• DynamoDB:
– A big component of the DynamoDB bill is provisioned read/write throughput
• Limits on how often provisioning can be changed
• Can be reduced only once a day
– Toggle speeds if uploads can be batched
• Raise write throughput prior to uploading the bulk of our data for the day, then reduce
[Chart: DynamoDB write speed vs. time during a day, with markers for “Start most of the day’s uploads” and “Complete most of the day’s uploads”]
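The toggling idea can be sketched as a one-day plan with a single increase before the batch and a single decrease after it, which stays within a one-decrease-per-day limit. The hours and unit counts in the usage note are hypothetical, no pricing is assumed, and the call that actually changes the table's provisioning is not shown.

```python
from typing import List, Tuple

Change = Tuple[int, int]  # (hour of day, provisioned write units from that hour on)


def plan_write_capacity(upload_start_hour: int, upload_end_hour: int,
                        batch_units: int, baseline_units: int) -> List[Change]:
    """One increase before the batch window and one decrease after it."""
    if not 0 <= upload_start_hour < upload_end_hour <= 23:
        raise ValueError("upload window must fall within a single day")
    return [(upload_start_hour, batch_units), (upload_end_hour, baseline_units)]


def provisioned_unit_hours(changes: List[Change], starting_units: int) -> int:
    """Total provisioned unit-hours over 24 hours, given the level at midnight."""
    total, level, previous_hour = 0, starting_units, 0
    for hour, units in sorted(changes):
        total += level * (hour - previous_hour)
        level, previous_hour = units, hour
    return total + level * (24 - previous_hour)
```

For example, with a hypothetical baseline of 100 units and 1000 units during a 02:00 to 06:00 upload window, the plan provisions 6000 unit-hours for the day, against 24000 if the table stayed at 1000 units all day.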