9. Edge Locations
Global AWS Infrastructure
Dallas (2)
St. Louis
Miami
Jacksonville
Los Angeles (2)
Palo Alto
Seattle
Ashburn (2)
Newark
New York (3)
Dublin
London (2)
Amsterdam (2)
Stockholm
Frankfurt (2)
Paris (2)
Singapore (2)
Hong Kong (2)
Tokyo (2)
Sao Paulo
South Bend
San Jose
Osaka
Milan
Sydney
Hayward
Madrid
15. The Amazon Web Services universe
Layers: Management + Interface, Cross Service Features, Platform Building Blocks, Infrastructure Building Blocks
Interfaces: Command Line, Web Console, APIs, SDK
Services shown: RDS, CloudSearch, SES, CloudFront, SQS, EMR, DynamoDB, Elastic Beanstalk, Simple Workflow, CloudFormation, CloudWatch, IAM, VPC, EC2, EBS, S3
(some services are omitted here)
16. Agenda
• House Keeping & Setup
• Introduction to Amazon
Web Services
• Building a Web Property
on AWS
– Storage: S3, EBS
– Compute
– Content Delivery
– Relational Database
– DynamoDB
• Scalability and Availability
– Snapshots
– Load Balancing
– Auto Scaling
– Security
• Log Processing Scenario
– Logging to AWS
– Elastic MapReduce
17. Labs
During this workshop, we will build from
scratch a highly available, redundant,
scalable web property on AWS.
19. Compute
Our Building Blocks
Amazon Elastic Compute Cloud (EC2)
Icons: Amazon EC2, Instance, Instances, AMI, DB on Instance, Instance with CloudWatch, Elastic IP
20. Content Delivery & Database
Our Building Blocks
Amazon Database Services (RDS/DDB)
Icons: Amazon RDS, MySQL DB Instance, DynamoDB
Amazon CloudFront
Icons: Amazon CloudFront, Download Distribution, Edge Location, Streaming Distribution
22. Amazon S3
Simple Storage Service
• Object-based storage (no filesystem)
• Easily store/retrieve data
• Durability of 99.999999999% (standard storage) or 99.99% (Reduced Redundancy Storage)
• Integrated with other AWS services
• Scalable
• Redundancy is managed transparently
• File (Object): up to 5 TB each
• HTTP, HTTPS, BitTorrent protocols
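
To make the two durability figures concrete, here is a minimal arithmetic sketch. It assumes durability can be read as an independent, per-object annual survival probability, which is a simplification; the function name and the object count are illustrative only.

```python
def expected_annual_loss(num_objects: int, durability: float) -> float:
    """Expected number of objects lost per year, assuming each object is
    lost independently with probability (1 - durability)."""
    return num_objects * (1.0 - durability)


# 10,000 stored objects at each of the two durability levels on the slide.
eleven_nines = expected_annual_loss(10_000, 0.99999999999)
four_nines = expected_annual_loss(10_000, 0.9999)

print(f"99.999999999%: {eleven_nines:.2e} objects lost per year on average")
print(f"99.99%:        {four_nines:.2e} objects lost per year on average")
```

Under this assumption the higher durability level corresponds to losing, on average, one of the 10,000 objects roughly every ten million years, while the lower level corresponds to about one object per year.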
23. Amazon S3 Redundancy
[Diagram: Your Data, stored as several copies within any Amazon S3 Region]
Data is replicated multiple times
In case of failure, data is replicated again, transparently
27. Regions, Availability Zones,
Edge Locations
Dallas
St. Louis
Miami
Jacksonville
Los Angeles (2)
Palo Alto
Seattle
Ashburn
Newark
New York (2)
Dublin
London
Amsterdam
Stockholm
Frankfurt
Paris
Singapore
Hong Kong
Tokyo
Sao Paulo
South Bend
San Jose
Sydney
38 Edge Locations in total (as of Dec 2012)
28. Let’s simplify a bit:
we consider only a few of them
Stockholm
Hong Kong
Sao Paulo
San Jose
Your web servers
in Singapore
29. Content Delivery Network: How it works
Edge locations: Stockholm, Hong Kong, Sao Paulo, San Jose
Your web servers in Singapore
Dynamic pages (PHP, Java): served from the web servers
Static content or streaming: served with CloudFront
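
The split between dynamic pages and static content can be sketched as a small URL helper. This is only an illustration: both host names are hypothetical placeholders, and the extension list is an assumption about what a site would treat as static.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical host names, used only for illustration.
ORIGIN_HOST = "www.example.com"           # your web servers (dynamic pages)
CDN_HOST = "d1234example.cloudfront.net"  # a CloudFront distribution

# Assumed set of extensions that count as static content.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".mp4"}


def host_for(path: str) -> str:
    """Pick the host for a request path: CDN for static files, origin otherwise."""
    extension = posixpath.splitext(urlparse(path).path)[1].lower()
    return CDN_HOST if extension in STATIC_EXTENSIONS else ORIGIN_HOST


def url_for(path: str) -> str:
    """Build the full URL to embed in a page."""
    return f"http://{host_for(path)}{path}"


print(url_for("/index.php"))
print(url_for("/images/logo.png?v=2"))
```

Pages generated by PHP or Java keep pointing at the web servers, while images, scripts and video are fetched through the nearest edge location.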
30. Amazon CloudFront Use cases
• Accelerated web content delivery
• Off-load traffic from web servers
• Big spikes in traffic
• Event streaming
• Marketing campaigns
34. EC2 Use cases
• Media
• Hosting
• High Performance Computing
• Dev & Test
• Internal Applications
• Gaming
• ... Everything that needs computing!
35. Lab Exercise
• Create a new Security Group
• Launch an Amazon EC2 instance (Linux)
• Log in with SSH as ec2-user@
• Install a web server
• Create a simple web page
• Test it on a browser: it works!
• Create and attach an Elastic IP
• Create an AMI from an EC2 Instance
37. EBS
Elastic Block Store
• Block-level storage for use with EC2
• Volume: 1 GB to 1 TB
• Raw unformatted block device
• Local to an Availability Zone
• Redundant
• Persistent
• Point-in-time snapshots to Amazon S3
• Integration with CloudWatch
38. RDS
Relational Database Service
• Relational Database “as a Service”
• Simple to deploy
• Managed by the AWS team
• MySQL, SQL Server or Oracle (as of 2012)
• Scalable
• Optional: automatic Standby Replica
• Optional: multiple Read-Only copies
• Easy DB snapshots and automated backup
39. Demo RDS
• Create a DB Instance on RDS (MySQL)
• Enable Multi-AZ Deployment
• Enable one Read Replica
• Optional: connect to the DB Instance
40. Agenda
• House Keeping & Setup
• Introduction to Amazon
Web Services
• Lab: Building a Web
Property on AWS
– Storage: S3, EBS
– Compute
– Content Delivery
– Relational Database
– DynamoDB
• Lab: Scalability and
Availability
– Snapshots
– Load Balancing
– Auto Scaling
– Security
• Log Processing Scenario
– Logging to AWS
– Elastic MapReduce
42. ELB
Elastic Load Balancer
• Automatically distribute incoming traffic to
multiple Amazon EC2 instances (in the same
Region).
• Automatic Health check
• IPv6 support
• Can be integrated with AutoScaling
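
The idea of distributing traffic only to instances that pass a health check can be sketched with a toy model. This is not ELB's actual routing algorithm; the class, its methods and the instance IDs are all hypothetical, and plain round-robin is assumed for simplicity.

```python
class ToyLoadBalancer:
    """Round-robin over instances that passed the last health check.
    A toy model only; it does not reproduce the real service's behaviour."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)  # assume all healthy at start
        self._next = 0

    def health_check(self, is_healthy):
        """Re-evaluate every instance with a caller-supplied probe function."""
        self.healthy = {i for i in self.instances if is_healthy(i)}

    def route(self):
        """Return the next healthy instance, skipping unhealthy ones."""
        if not self.healthy:
            raise RuntimeError("no healthy instances")
        while True:
            candidate = self.instances[self._next % len(self.instances)]
            self._next += 1
            if candidate in self.healthy:
                return candidate


lb = ToyLoadBalancer(["i-a", "i-b", "i-c"])   # hypothetical instance IDs
first_round = [lb.route() for _ in range(4)]
lb.health_check(lambda instance: instance != "i-b")  # pretend i-b failed
second_round = [lb.route() for _ in range(2)]
print(first_round, second_round)
```

Once an instance fails its check it simply stops receiving requests; combined with Auto Scaling, a replacement could then be added to the pool.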
44. Snapshots & AMIs
• Copies of EBS Volumes
• Essential to Reusability
• Copy between Regions
• Durability in S3
45. Lab Exercise
• Duplicate your entire architecture by
making an AMI
• Increase your availability by spreading
your application across availability
zones
• Bring up an ELB in front of your website
• Optional – Create a CNAME to the ELB
50. AutoScaling
• Auto Up and Auto Down
• Runs on CloudWatch metrics
• Notifications via SNS
• Spot or On-demand
• No additional Fees
51. AutoScaling
• Launch config: AMI to be used
• Autoscaling group: where/how to
launch
• Autoscaling policy: what should AS do
• Autoscaling trigger: what will activate
AS
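
How the four pieces fit together can be sketched as plain data structures plus one evaluation function. This is a simplified model under stated assumptions, not the Auto Scaling API: every class, field and value below is hypothetical, and the metrics are passed in as a dictionary instead of being read from CloudWatch.

```python
from dataclasses import dataclass


@dataclass
class LaunchConfig:          # AMI to be used
    ami_id: str
    instance_type: str


@dataclass
class ScalingGroup:          # where/how to launch
    launch_config: LaunchConfig
    availability_zones: list
    min_size: int
    max_size: int
    desired: int


@dataclass
class Policy:                # what should AS do
    adjustment: int          # +N or -N instances


@dataclass
class Trigger:               # what will activate AS
    metric: str
    threshold: float
    above: bool              # fire above (True) or below (False) the threshold
    policy: Policy


def apply_triggers(group, triggers, metrics):
    """Apply every fired trigger's policy, clamped to the group's size limits."""
    for trigger in triggers:
        value = metrics[trigger.metric]
        fired = value > trigger.threshold if trigger.above else value < trigger.threshold
        if fired:
            wanted = group.desired + trigger.policy.adjustment
            group.desired = max(group.min_size, min(group.max_size, wanted))
    return group.desired


group = ScalingGroup(LaunchConfig("ami-12345678", "m1.small"),
                     ["zone-a", "zone-b"], min_size=2, max_size=4, desired=2)
triggers = [Trigger("CPUUtilization", 70.0, True, Policy(+1)),
            Trigger("CPUUtilization", 30.0, False, Policy(-1))]
print(apply_triggers(group, triggers, {"CPUUtilization": 85.0}))
```

Keeping the policy separate from the trigger mirrors the list above: the trigger decides when to act, the policy decides what to do, and the group bounds the result.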
53. Security
• Security Groups
• Granular tiered secure architecture
• Roles for services
• Best Practices - Bastions
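
A tiered layout with a bastion can be sketched as a rule table and a small check. The group names, ports and rules are hypothetical examples chosen for illustration; the one property relied on is that a security group rule can name another security group as its source.

```python
# Hypothetical tiers and ports, for illustration only.
# Each rule is (protocol, port, source), where source is either the
# open CIDR "0.0.0.0/0" or the name of another security group.
RULES = {
    "bastion": [("tcp", 22, "0.0.0.0/0")],  # in practice, a narrower admin range
    "web":     [("tcp", 80, "0.0.0.0/0"), ("tcp", 22, "bastion")],
    "app":     [("tcp", 8080, "web"), ("tcp", 22, "bastion")],
    "db":      [("tcp", 3306, "app")],
}


def allowed(source, dest, port, protocol="tcp"):
    """True if traffic from `source` (a group name or "internet") may reach `dest`."""
    for rule_protocol, rule_port, rule_source in RULES.get(dest, []):
        if rule_protocol != protocol or rule_port != port:
            continue
        if rule_source == "0.0.0.0/0" or rule_source == source:
            return True
    return False


print(allowed("internet", "web", 80))   # public web traffic
print(allowed("internet", "db", 3306))  # direct database access
print(allowed("bastion", "app", 22))    # SSH via the bastion
```

Each tier accepts traffic only from the tier directly in front of it, and SSH reaches internal instances only through the bastion.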
55. Before we get started on the controls…
• AWS Reports, Certifications & Accreditations
• SOC 1, Type 2 report
• SOC 2 report
• ISO 27001
• PCI DSS Level 1 service provider
• FISMA Moderate
• MPAA
• Look at http://aws.amazon.com/security
59. Agenda
• House Keeping & Setup
• Introduction to Amazon
Web Services
• Lab: Building a Web
Property on AWS
– Storage: S3, EBS
– Compute
– Content Delivery
– Relational Database
– DynamoDB
• Lab: Scalability and
Availability
– Snapshots
– Load Balancing
– Auto Scaling
– Security
• Log Processing Scenario
– Logging to AWS
– Elastic MapReduce
60. Getting your Data into S3
S3 Console Upload
FTP
S3 API
AWS Import / Export
Direct Connect
Tsunami UDP
Storage Gateway
3rd Party Commercial
Applications
CloudFront
Flume
AWS Data Pipeline
61. S3 and Big Data
• Why S3?
• Hadoop Overview
• Hadoop on the Cloud
• Hadoop File System
63. Introducing Apache Hadoop
• Apache Hadoop
• Software for distributed data analysis
• Map/Reduce framework
• Focus on data
64. • But
• Complex
• Hard to set up
• Cap-ex intensive
• Difficult to manage
65. What is Amazon Elastic MapReduce (EMR)?
EMR is Hadoop in the Cloud
Hadoop is an open-source framework for processing huge amounts of data in parallel on a cluster of machines
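
The Map/Reduce model that Hadoop implements can be sketched in a single process. This is a minimal illustration for the log processing scenario, not Hadoop or EMR code: the function names are hypothetical and a simplified "METHOD PATH STATUS" log line format is assumed.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (path, 1) for one access-log line.
    Assumes a simplified 'METHOD PATH STATUS' format; other lines are skipped."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[1], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


log_lines = [
    "GET /index.html 200",
    "GET /about.html 200",
    "GET /index.html 404",
    "malformed line",
]
hits = run_job(log_lines)
print(hits)
```

On a real cluster the map and reduce functions run on many nodes in parallel over data split into chunks; the shape of the computation stays the same.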
66. How does it work?
[Diagram: S3, EMR, EMR Cluster]
• Put the data into S3
• Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.
• Launch the cluster using the EMR console, CLI, SDK, or APIs
• Get the output from S3
• You can also store everything in HDFS
67. DynamoDB and Big Data
• What is Dynamo?
• Dynamo in Big Data – Volume & Velocity
69. Watch out for unexpected Costs
When the Technical Workshop comes to an end, to
avoid unwanted costs:
• Delete your S3 objects
• Destroy your CloudFront distributions
• Stop or Shut Down your EC2 and RDS instances
Customers are responsible for the resources they are
using. AWS declines any responsibility if a
customer forgets to shut down resources.