Effective EC2
1. Effective EC2
PyCon Italia Qu4ttro - May 7/8/9 2010
Valentino Volonghi - valentino@adroll.com
Wednesday, May 19, 2010
2. Buzz
Why I’ll never own another server...
Joe Stump - Digg Lead Architect, CTO SimpleGeo
http://stu.mp/2010/04/why-ill-never-own-another-server.html
Cloud computing economies of scale
James Hamilton - VP & Distinguished Engineer, Amazon Web Services
http://live.visitmix.com/MIX10/Sessions/EX01
Moving to the cloud
Reddit Team
http://blog.reddit.com/2009/11/moving-to-cloud.html
Effective EC2 - Valentino Volonghi
5. Google AppEngine
• Transparently scalable
• No maintenance headaches
• Software must be written ad hoc
• Little control over infrastructure
6. Rackspace Cloud
• CloudServers/CloudFiles/CloudSites
• Some nice additional features
• Fewer datacenters and can’t pick them
• 12/11/07, 09/07/08, 03/11/09
8. Amazon AWS
• 4 independent regions - 10 data centers
• 3rd party vendor support
• Well integrated services accessible via API
• “Poor” single-node performance
• Flexible upper limits
9. Amazon AWS
Amazon Web Services can be as big
as our retail business
Andy Jassy - Amazon Senior VP Cloud Computing Business
Revenue: $650,000,000
Market share: 77%
Source: www.businessweek.com/technology/content/apr2010/tc20100428_085106.htm
10. AdRoll
• Scaling High ROI Display Advertising
• 60M paid campaign impressions
• 200M advertiser pixel impressions
• 99% of requests below 3ms think time
• Realtime Dynamic Ads
11. AdRoll 2008
• ServePath Housing
• NaviSite CDN with mod_python
• Self-Hosted DNS
• MySQL DB
• NFS
12. AdRoll 2009
• ServePath Housing
• Amazon EC2 auto-managed AdServers
• Amazon S3/CloudFront CDN
• Dynect DNS Global Server Load Balancing
• MySQL DB
• NFS
17. Migration
Phase 1 - May 2009
18. New Ad Servers
• Very quick and faultless boot procedure
• Easy software upgrade
• Low latency network
• Real time monitoring
19. New Ad Servers
• Custom bundled AMI stored in S3
• Python deploys the AdServer on boot
• Hidden real load test
• Boto monitoring
20. Boto monitoring
# `ec2` and `utils` appear to be the project's own helper modules; they and the
# get_* helpers called below are not shown on the slide.
def regional_max_allocation(start_adserver=ec2.Image.start_adserver):
    zones = [zone.name for zone in ec2.bconn.get_all_zones()]
    (availables_with_ip, availables_without_ip,
     software_failures, startings) = get_instances_by_status()

    def setup_candidate(failure=None, free_ip=None):
        # 1. Prefer a healthy spare instance that has no elastic IP yet.
        if availables_without_ip:
            new_instance = get_instance_in_unused_zone(availables_with_ip,
                                                       availables_without_ip,
                                                       lenient=True)
            if failure is not None:
                # Take the elastic IP away from the failed instance.
                free_ip = failure.disassociate_elastic()
            availables_without_ip.remove(new_instance)
            availables_with_ip.add(new_instance)
            new_instance.associate_elastic(free_ip)
            return
        # 2. Otherwise count on an instance that is already booting.
        if startings:
            startings.pop()
            return
        # 3. Otherwise start a new ad server in an unused zone.
        start_adserver("adserver-" + utils.uuid(),
                       zone=get_unused_zone(availables_with_ip,
                                            zones))

    # Replace every instance whose software has failed.
    for software_failure in software_failures:
        setup_candidate(failure=software_failure)

    # Put every unassigned elastic IP back to work.
    free_ips = ec2.get_all_free_elastic_ip(default=[])
    for free_ip in free_ips:
        setup_candidate(free_ip=free_ip)
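The zone-picking helpers called by the monitoring code are not shown on the slide. The following is a minimal sketch of one plausible reading, assuming "unused zone" means "the availability zone with the fewest serving instances". The `Instance` tuple and `pick_least_used_zone` are hypothetical names introduced for illustration, not AdRoll's code.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for an EC2 instance record; only the zone matters here.
Instance = namedtuple("Instance", ["instance_id", "zone"])


def pick_least_used_zone(serving_instances, zones):
    """Return the zone that currently hosts the fewest serving instances.

    Ties go to the zone listed first in `zones`.
    """
    if not zones:
        raise ValueError("at least one availability zone is required")
    per_zone = Counter(instance.zone for instance in serving_instances)
    return min(zones, key=lambda zone: per_zone[zone])


serving = [
    Instance("i-1", "us-east-1a"),
    Instance("i-2", "us-east-1a"),
    Instance("i-3", "us-east-1b"),
]
next_zone = pick_least_used_zone(
    serving, ["us-east-1a", "us-east-1b", "us-east-1c"])
print(next_zone)
```

Spreading new ad servers across zones this way keeps a single zone outage from taking down every serving instance at once.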
21. Migration
Phase 2 - March 2010
22. Preparation
• SQLAlchemy is awesome
• PostgreSQL 5-10x faster than MySQL
• 1 EBS is slower than 8 in RAID!! :)
• m1.small instances are useless
23. Automated Deploy
• Stock Ubuntu AMI
• Setup scripts on S3
• Fabric
➡ 5 minutes to boot a new instance
➡ No maintenance overhead
26. Generic Instance
• m1.large - 7.5 GB RAM - 2 VCPU
• 1 EBS with 500GB space
• Store frequently changed data
• Exact copies of each other
27. Web Instance
• c1.medium - 1.7 GB RAM - 2 VCPU
• No EBS
• Amazon ELB frontend
• Easily replaceable
• Logs aggregated separately in real time
29. runurl
by Eric Hammond. Download and run any script from
any URL.
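To illustrate the idea (this is not the runurl tool itself), here is a minimal Python sketch of "download a script from a URL and run it". The function name `run_url` is hypothetical, and the sketch does no verification of what it downloads, so it should only be pointed at URLs you control.

```python
import os
import stat
import subprocess
import tempfile
import urllib.request


def run_url(url, *args):
    """Download the script at `url`, run it with `args`, return its exit code."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    fd, path = tempfile.mkstemp(prefix="run_url_")
    try:
        # Close the file before executing it, otherwise the kernel may
        # refuse to run a file that is still open for writing.
        with os.fdopen(fd, "wb") as script:
            script.write(body)
        os.chmod(path, stat.S_IRWXU)  # owner may read, write, and execute
        return subprocess.run([path, *args]).returncode
    finally:
        os.remove(path)
```

A boot script could call something like this with the address of a setup script kept in S3, which fits the "stock AMI plus setup scripts on S3" approach from the Automated Deploy slide.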
30. ec2-consistent-snapshot
by Eric Hammond. Takes atomic snapshots of EBS volumes
that use an XFS filesystem.
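The reason XFS matters is that it can be frozen: writes are blocked and flushed while the snapshot is started, then resumed. A minimal sketch of that sequence follows, assuming the standard `xfs_freeze -f` / `xfs_freeze -u` commands. It is not the tool's actual implementation, and `snapshot_frozen` and its parameters are hypothetical names.

```python
import subprocess


def snapshot_frozen(mount_point, take_snapshot, run=subprocess.check_call):
    """Freeze an XFS filesystem, call take_snapshot(), then always unfreeze.

    `take_snapshot` is any callable that starts the EBS snapshot;
    `run` executes a command list and exists so the sketch can be dry-run.
    """
    run(["xfs_freeze", "-f", mount_point])      # block and flush writes
    try:
        return take_snapshot()
    finally:
        run(["xfs_freeze", "-u", mount_point])  # resume writes even on error


# Dry run: print the commands instead of executing them.
snapshot_frozen(
    "/vol",
    take_snapshot=lambda: print("snapshot started"),
    run=lambda cmd: print("would run:", " ".join(cmd)),
)
```

The `finally` clause is the important part of the design: a filesystem left frozen after a failed API call would hang every process writing to it.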
31. Eric Hammond
If you want to know more about Amazon AWS,
he’s the one you want to talk to.
32. Use Public addresses
Public addresses are automatically resolved to internal
addresses when possible.
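One way to check this behaviour from an instance is to resolve another instance's public DNS name and see whether the answer is a private address. The helper below is a small sketch; `resolves_to_private_ip` is a hypothetical name, and the demo uses a stub resolver with a made-up hostname so it runs without network access.

```python
import ipaddress
import socket


def resolves_to_private_ip(hostname, resolve=socket.gethostbyname):
    """Return (ip, is_private) for the address `hostname` resolves to."""
    ip = resolve(hostname)
    return ip, ipaddress.ip_address(ip).is_private


# Stub resolver standing in for a lookup made from inside EC2.
print(resolves_to_private_ip("web1.example.com",
                             resolve=lambda name: "10.0.0.5"))
```

On a real instance, call it with the default resolver and the other instance's public DNS name.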
33. Use ELB and CloudWatch
Very useful for latency and load monitoring… It also
makes scaling the web frontend extremely easy!
34. EBS RAID
EBS is cheap. Instead of one big EBS volume, create one big
soft-RAID/LVM volume from many smaller ones; it’s faster
and safer.
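As a sketch of the soft-RAID route, the helper below builds an `mdadm --create` command line for a set of attached EBS devices. The slide does not say which RAID level was used, so the level is a required argument; the device names and the level in the example are assumptions for illustration only.

```python
def mdadm_create_command(array_device, member_devices, level):
    """Build the mdadm command that assembles `member_devices` into one array."""
    members = list(member_devices)
    if len(members) < 2:
        raise ValueError("a RAID array needs at least two member devices")
    return [
        "mdadm", "--create", array_device,
        "--level=%d" % level,
        "--raid-devices=%d" % len(members),
    ] + members


# Hypothetical example: eight EBS volumes attached as /dev/sdf ... /dev/sdm.
devices = ["/dev/sd%s" % letter for letter in "fghijklm"]
command = mdadm_create_command("/dev/md0", devices, level=10)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.check_call` as root once the volumes are attached. Building it separately makes it easy to review before anything touches the disks.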
35. SMTPAuth / SendGrid
Mail sending from Amazon EC2 was/is crippled. Use a
third party service to improve deliverability.
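A minimal sketch of sending through an authenticated third-party relay with the Python standard library. The host, port, and credentials are placeholders to be taken from the provider's documentation, and the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg, host, port, username, password):
    """Send `msg` through an authenticated SMTP relay over STARTTLS."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)


message = build_message("ads@example.com", "ops@example.com",
                        "Daily report", "All ad servers healthy.")
# send_via_relay(message, "smtp.example.com", 587, "user", "secret")
```

The send call is commented out because it needs a real relay account.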
36. Use /etc/hosts
Setting up a DNS server is a lot of work at the beginning. Use
SSH key names to set up a custom /etc/hosts file on each
instance.
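37. Use /etc/hosts
import shutil
import socket

# `conn` (a boto EC2 connection, judging by the calls) and KEYS (the set of
# expected SSH key names) are defined elsewhere.
new_lines = ["127.0.0.1 localhost"]
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        if instance.state == "running":
            KEYS.discard(instance.key_name)
            # From inside EC2 the public DNS name resolves to the internal IP.
            new_lines.append(
                "%s %s.internal %s" % (
                    socket.gethostbyname(instance.public_dns_name),
                    instance.key_name,
                    instance.key_name)
            )

# Keys with no running instance point at localhost so the names still resolve.
for missing_key in KEYS:
    new_lines.append(
        "127.0.0.1 %s.internal %s" % (missing_key, missing_key)
    )

# Write the new file next to the old one, then move it into place.
with open('/etc/hosts.new', 'w') as f:
    f.write("\n".join(new_lines))
    f.write("\n")
shutil.move('/etc/hosts.new', '/etc/hosts')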
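The script writes to /etc/hosts.new and then moves the file into place, so readers never see a half-written hosts file. A generic sketch of that write-then-rename pattern follows. `write_atomically` is a hypothetical helper, and the demo writes to a temporary directory, not /etc.

```python
import os
import tempfile


def write_atomically(path, lines, mode=0o644):
    """Write `lines` to a temp file beside `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(lines) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the data is on disk first
        os.chmod(tmp_path, mode)     # mkstemp creates the file as 0600
        os.replace(tmp_path, path)   # atomic within a single filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "hosts")
write_atomically(demo_path, ["127.0.0.1 localhost",
                             "10.0.0.5 web1.internal web1"])
```

Creating the temporary file in the same directory as the target keeps both on one filesystem, which is what makes the rename atomic.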