AWS Webinar 201: Designing scalable, available & resilient cloud applications

AWS 201
Designing Scalable, Available & Resilient Cloud Applications

Markku Lepistö - Technology Evangelist
@markkulepisto

Housekeeping
•  Presentation ~45mins
•  Q&A using the questions panel during the
presentation
•  Reminder – Fill in the survey!

AWS Global Presence

10 Regions
26 Availability Zones
52 Edge Locations

SCALABLE, AVAILABLE, RESILIENT CLOUD APPLICATIONS

What your users want…
•  Fast, performant experience
•  Always on, accessible anywhere
•  Personalized and rich application
•  Lots of new features all of the time

Powerful cloud applications

Building powerful cloud applications
Rule 1: Service all requests
Rule 2: Service requests as fast as possible
Rule 3: Handle requests at any scale
Rule 4: Simplify architecture with services
Rule 5: Automate operational management
Rule 6: Design for failure

Rule 1: Service all requests
a)  Make sure requests get to your ‘front door’

[Diagram: Request → DNS → Application → Data]

[Diagram: Request → DNS → Application → Data]
Clients can’t resolve you? …then this is irrelevant

DNS: “100% Available” SLA

Rule 1: Service all requests

Route53

Feature: Details
•  Global: Supported from AWS global edge locations for fast and reliable domain name resolution
•  Scalable: Automatically scales based upon query volumes
•  Latency based routing: Supports resolution of endpoints based upon latency, enabling multi-region application delivery
•  Integrated: Integrates with other AWS services allowing Route 53 to front load balancers, S3 and EC2
•  Secure: Integrates with IAM giving fine grained control over DNS record access

http://aws.amazon.com/route53/sla

Rule 1: Service all requests
a)  Make sure requests get to your ‘front door’
b)  Make sure you open the door when they arrive

[Diagram: Request → Route53 → Region → Elastic Load Balancer → four Availability Zones]

Elastic load balancing
•  Multi-availability zone
•  Multi-region

Rule 1: Service all requests
a)  Make sure requests get to your ‘front door’
b)  Make sure you open the door when they arrive
c)  Have the data to form a response

[Diagram: Request → Route53 → Region → Elastic Load Balancer → four Availability Zones]

Multi-AZ RDS
•  Master/Slave: Synchronous, Intra-region
•  Read Replicas: Asynchronous, Cross-region

Rule 2: Service requests as fast as possible
a)  Choose the fastest route

[Diagram: Request → Route53; Region A at 16ms, Region B at 92ms; Route53 returns the Region A DNS entry]
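
The routing decision in the diagram above can be sketched in a few lines. This is only an illustration of "answer with the lowest-latency region", not how Route 53 is implemented; the function name and the latency table are assumptions made for the example, and the figures are the 16ms and 92ms from the slide.

```python
from typing import Dict


def pick_lowest_latency_region(latencies_ms: Dict[str, float]) -> str:
    """Return the name of the region with the smallest measured latency."""
    if not latencies_ms:
        raise ValueError("at least one region is required")
    # min() iterates over the region names and compares their latency values.
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical measurements, using the figures shown on the slide.
measured = {"Region A": 16.0, "Region B": 92.0}
chosen_region = pick_lowest_latency_region(measured)
print(chosen_region)
```

In a real deployment the measurement and the answer both come from the DNS service; the application only publishes one record set per region.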

Rule 2: Service requests as fast as possible
a)  Choose the fastest route
b)  Offload your application servers

CloudFront
World-wide content distribution network
Easily distribute content to end users with low latency, high data transfer speeds, and no commitments.

[Diagram: edge locations in Singapore, Tokyo and Sydney]
1.  Single CNAME: www.mysite.com
2.  Served from EC2: *.php
3.  Served from S3: /images/*

Without CloudFront
EC2 webservers/app servers loaded by user requests

With CloudFront
Load of user requests pushed into CloudFront, EC2 cluster can scale down

[Diagram: Offload → Scale Down]

[Chart: Response Time and Server Load for three cases: No CDN, CDN for Static Content, CDN for Static & Dynamic Content; labels: Offload, Scale Down]

Rule 2: Service requests as fast as possible
a)  Choose the fastest route
b)  Offload your application servers
c)  Cache it if you can

ElastiCache
•  Memcached and Redis compatible caching layer
•  Serve frequently requested & slow changing data from scalable cache clusters
•  Reduce load on database and other servers
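
A minimal sketch of the "cache it if you can" idea, using the common cache-aside pattern. The `TTLCache` class is an in-process stand-in written for this example; it is not the ElastiCache, Memcached or Redis API, and `slow_database_lookup` is a hypothetical loader.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """In-process stand-in for a cache cluster (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, or call loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the backing store is not touched
        value = loader(key)  # miss: fall through to the slow store
        self._items[key] = (now + self._ttl, value)
        return value


db_calls: List[str] = []


def slow_database_lookup(key: str) -> str:
    """Pretend database query; records each call so the offload is visible."""
    db_calls.append(key)
    return f"profile-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user42", slow_database_lookup)
second = cache.get_or_load("user42", slow_database_lookup)
```

The second read is served from the cache, so the database sees one query instead of two. The TTL bounds how stale "slow changing" data can get.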

Rule 2: Service requests as fast as possible
a)  Choose the fastest route
b)  Offload your application servers
c)  Cache it if you can
d)  Single digit latencies where it matters

[Chart: Database Query Performance vs. Scale. Desired: consistency, predictability. Actual: degraded performance with scale.]

Management problems
•  Data sharding
•  Data caching
•  Provisioning
•  Cluster management
•  Fault management

[Chart: Database Query Performance vs. Scale, comparing Relational Database Query Performance with DynamoDB Query Performance]

DynamoDB
•  Low latency
•  Large scale
•  Zero admin
•  Predictable performance

Average single-digit milliseconds server side latencies
Runs on solid state drives, and is built to maintain consistent, fast latencies at any scale

Rule 3: Handle requests at any scale
a)  Scale up

Scale up with Elastic Compute Cloud (EC2)
•  Vertical Scaling
•  From $0.013/hr
•  Basic unit of compute capacity
•  Several families of instance types available, from micro to compute, storage, memory and GPU optimized

Measure instance resource utilization under load & select optimal instance size per application tier / service

Rule 3: Handle requests at any scale
a)  Scale up
b)  Scale out

Auto-scaling
Automatic re-sizing of compute clusters based upon demand

Trigger auto-scaling policy:
    as-create-auto-scaling-group MyGroup \
        --launch-configuration MyConfig \
        --availability-zones ap-southeast-1a \
        --min-size 4 \
        --max-size 200

Rule 3: Handle requests at any scale
a)  Scale up
b)  Scale out

Manually
Send an API call or use CLI to launch/terminate instances; only need to specify capacity change (+/-)
Preemptive manual scaling of capacity, e.g. before a marketing event add 10 more instances

By Schedule
Scale up/down based on date and time
Regular scaling up and down of instances, e.g. scale from 0 to 2 to process SQS messages every night or double capacity on a Friday night

By Policy
Scale in response to changing conditions, based on user configured real-time monitoring and alerts
Dynamic scale based upon custom metrics, e.g. SQS queue depth, Average CPU load, ELB latency

Auto-Rebalance
Instances are automatically launched/terminated to ensure the application is balanced across multiple AZs
Maintain capacity across availability zones, e.g. Instance availability maintained in event of AZ becoming unavailable
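
The "By Policy" mode can be sketched as a pure function from a metric to a desired group size. This is an illustrative model only, not the Auto Scaling API: the CPU thresholds and the step size are assumptions, while the bounds of 4 and 200 are the min and max sizes used in the CLI example in this deck.

```python
def desired_capacity(current: int,
                     avg_cpu_percent: float,
                     min_size: int = 4,
                     max_size: int = 200,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0,
                     step: int = 2) -> int:
    """Return the new instance count for a simple threshold policy."""
    if avg_cpu_percent > scale_out_above:
        target = current + step   # under pressure: add capacity
    elif avg_cpu_percent < scale_in_below:
        target = current - step   # mostly idle: remove capacity
    else:
        target = current          # inside the comfort band: no change
    # Never leave the configured bounds of the group.
    return max(min_size, min(max_size, target))


print(desired_capacity(current=4, avg_cpu_percent=85.0))
```

Keeping the decision separate from the API call makes the policy easy to unit test before it is wired to real alarms.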

Rule 3: Handle requests at any scale
a)  Scale up
b)  Scale out
c)  Dial it up

Elastic Block Store
•  Provisioned IOPS up to 4000 per volume, up to 48 000 per instance
•  Predictable performance for demanding workloads such as databases

DynamoDB
•  Provisioned read/write performance per table
•  Predictable high performance scaled via console, API or Dynamic DynamoDB, at http://dynamic-dynamodb.readthedocs.org

Rule 4: Simplify architecture with services

On-Premise Infrastructure
•  30%: Your Business
•  70%: Managing All of the “Undifferentiated Heavy Lifting”

AWS Cloud-Based Infrastructure
•  70%: More Time to Focus on Your Business
•  30%: Configuring Your Cloud Assets

Rule 4: Simplify architecture with services

Enterprise Applications: Virtual Desktops, Collaboration and Sharing
Platform Services
•  Databases: Caching, Relational, No SQL
•  Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
•  App Services: Queuing, Orchestration, App Streaming, Transcoding, Email, Search
•  Deployment & Management: Containers, Dev/ops Tools, Resource Templates, Usage Tracking, Monitoring and Logs
•  Mobile Services: Identity, Sync, Mobile Analytics, Notifications
Foundation Services: Compute (VMs, Auto-scaling and Load Balancing), Storage (Object, Block and Archive), Security & Access Control, Networking
Infrastructure: Regions, Availability Zones, CDN and Points of Presence

Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN

Rule 5: Automate operational management
a)  Everything is programmable

Access everything via CLI, API or Console
Achieve the highest levels of automation sophistication with ease

Rule 5: Automate operational management
a)  Everything is programmable
b)  Think disposable, one click deployments

•  AWS OpsWorks: DevOps framework for application lifecycle management and automation
•  AWS CloudFormation: Templates to deploy & update infrastructure as code
•  AWS Elastic Beanstalk: Automated resource management, web apps made easy
•  DIY / On Demand: DIY, on demand resources: EC2, S3, custom AMIs, etc.

[Diagram axis: Control vs. Convenience]

Rule 1: Service all web requests
Rule 2: Service requests as fast as possible
Rule 3: Handle requests at any scale
Rule 4: Simplify architecture with services
Rule 5: Automate operational management
Rule 6: Design for failure

Rule 5: Automate operational management
a)  Everything is programmable
b)  Think disposable, one click deployments
c)  Design for failure, implement self healing

Bootstrapping: Customize instance startup
Get instances to ask ‘who am I?’ question on startup and be configured dynamically upon being answered

Auto-scaling: Maintain capacity of instances
Using a minimum pool size will maintain capacity in the event of instance failures

Cloud Watch: Know what’s going on, take automated actions
Use CloudWatch standard and custom metrics to create alarms. Respond with automated administration actions
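
One way to picture the ‘who am I?’ bootstrapping step: the instance reads its own role and looks up what to install. The sketch below is an assumption-laden illustration; the `role` tag, the role names and the setup steps are all hypothetical, and no real metadata or tagging API is called.

```python
from typing import Dict, List

# Hypothetical mapping from an instance role to its startup steps.
ROLE_SETUP: Dict[str, List[str]] = {
    "web": ["install web server", "deploy site"],
    "worker": ["install queue consumer", "start consumer"],
}


def bootstrap_steps(instance_tags: Dict[str, str]) -> List[str]:
    """Answer 'who am I?' from the tags and return the steps to run."""
    role = instance_tags.get("role")
    if role not in ROLE_SETUP:
        # Fail loudly rather than start a half-configured instance.
        raise ValueError(f"unknown or missing role: {role!r}")
    return list(ROLE_SETUP[role])  # copy so callers cannot mutate the table


print(bootstrap_steps({"role": "web"}))
```

Because the image itself stays generic, the same machine image can serve every tier and be thrown away and replaced at will.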

YOUR GOAL
Applications should continue to function even if
the underlying HW or SW unit fails or is removed
or replaced

Avoid single points of failure.
Assume everything fails, and design
backwards.

AWS BUILDING BLOCKS

Inherently Fault-Tolerant Services
•  Amazon S3
•  Amazon DynamoDB
•  Amazon CloudFront
•  Amazon SWF
•  Amazon SQS
•  Amazon SNS
•  Amazon SES
•  Amazon Route53
•  Elastic Load Balancing
•  AWS IAM
•  AWS Elastic Beanstalk
•  Amazon ElastiCache
•  Amazon EMR
•  Amazon CloudSearch
•  Amazon Redshift
•  etc

Fault-Tolerant with the right architecture
•  Amazon EC2
•  Amazon EBS
•  Amazon RDS
•  Amazon VPC

BUILD LOOSELY
COUPLED SYSTEMS
The looser they are coupled,
the bigger they scale

Create independent Components
Design everything as a Black Box
Think in terms of (Micro) Services

Services are Black Boxes Exposed via APIs
•  My Cool Feature: Iterate, even rewrite internal implementation
•  API: API is stable, with few changes, potentially versioning

Loose Coupling Enables Scale-out and Resiliency
Use Message Queues
Simple Queue
Service (SQS)
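
A small sketch of how a queue decouples a producer from a worker. Python's standard `queue.Queue` stands in for a hosted queue here; it is not the SQS API, and `run_pipeline` and the upper-casing "work" are made up for the example.

```python
import queue
import threading
from typing import List


def run_pipeline(orders: List[str]) -> List[str]:
    """Push orders through a queue to one worker and return its results."""
    q: "queue.Queue" = queue.Queue()
    processed: List[str] = []

    def worker() -> None:
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work
                q.task_done()
                break
            processed.append(item.upper())  # stand-in for real processing
            q.task_done()

    t = threading.Thread(target=worker)
    t.start()
    for order in orders:
        q.put(order)   # the producer never calls the worker directly
    q.put(None)
    q.join()           # wait until every queued item has been handled
    t.join()
    return processed


print(run_pipeline(["order-1", "order-2"]))
```

The producer only knows the queue, so workers can be added, removed or replaced without touching it.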

Use Idempotent Interfaces
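
A minimal sketch of an idempotent interface, assuming the caller supplies a unique request id with each operation. `PaymentService`, `charge` and the receipt format are hypothetical names for illustration only.

```python
from typing import Dict


class PaymentService:
    """Toy service whose charge() is safe to retry with the same request id."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.charges_made = 0

    def charge(self, request_id: str, amount_cents: int) -> str:
        # A repeated request id returns the stored result instead of
        # performing the side effect a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.charges_made += 1
        receipt = f"receipt-{self.charges_made}-{amount_cents}"
        self._results[request_id] = receipt
        return receipt


service = PaymentService()
first_receipt = service.charge("req-123", 500)
retry_receipt = service.charge("req-123", 500)  # e.g. a retry after a timeout
```

With this property a client, queue consumer or load balancer can retry freely after a timeout without double-charging.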

Use Circuit Breakers
Temporarily bypass unresponsive service. Switch to degraded mode transactions
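
A compact sketch of the circuit breaker pattern described above. It is one simple variant (consecutive-failure count plus a cool-down), not a specific library's implementation; the class name, parameters and defaults are assumptions for this example.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Bypass a failing dependency for a cool-down period."""

    def __init__(self, max_failures: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], Any], fallback: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()  # open: skip the unresponsive service
            # Cool-down elapsed: allow one trial call (half-open).
            self._opened_at = None
            self._failures = self._max_failures - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()  # trip the breaker
            return fallback()  # degraded-mode answer
        self._failures = 0  # a success closes the breaker again
        return result
```

The fallback is where "degraded mode" lives: a cached value, a default, or a queued write instead of a live call.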

Auto Scale, Load Balance, Monitor, HA Assure
Each Service Separately

Statelessness Enables Scale-out
Separate State and Data from Compute Instances
Load Balanced, Auto Scaling pool of EC2 Workers
Scalable Services for State and Data: ElastiCache, DynamoDB, S3

TEST IT
Verify your design by generating
failure modes

Rule 5: Automate operational management
a)  Everything is programmable
b)  Think disposable, one click deployments
c)  Design for failure, implement self healing

GAME DAY!
•  Chaos Monkey: Introduce failures
•  Latency Monkey: Slow down dependent service responses
•  Conformity Monkey: Detect system entropy & drift
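
A toy "game day" in code: kill a random worker, then check that a minimum pool size heals the gap. This is only a sketch of the idea of introducing failures; it is not the Chaos Monkey tool, and `WorkerPool`, `heal` and `chaos_round` are hypothetical names.

```python
import random
from typing import Set


class WorkerPool:
    """Pool that tops itself back up to a minimum size."""

    def __init__(self, min_size: int) -> None:
        self.min_size = min_size
        self._next_id = 0
        self.workers: Set[str] = set()
        self.heal()

    def heal(self) -> None:
        """Launch replacements until the minimum pool size is restored."""
        while len(self.workers) < self.min_size:
            self.workers.add(f"worker-{self._next_id}")
            self._next_id += 1

    def terminate(self, worker_id: str) -> None:
        self.workers.discard(worker_id)


def chaos_round(pool: WorkerPool, rng: random.Random) -> str:
    """Terminate one randomly chosen worker and return its id."""
    victim = rng.choice(sorted(pool.workers))  # sorted for reproducibility
    pool.terminate(victim)
    return victim


pool = WorkerPool(min_size=4)
victim = chaos_round(pool, random.Random(0))
pool.heal()
print(len(pool.workers))
```

Running rounds like this regularly, in a controlled window, verifies that the self-healing design works before a real failure tests it.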

What your users want… With AWS
•  Fast, performant experience: Elastic utility capacity ✔
•  Always on, accessible anywhere: Highly available global coverage ✔
•  Lots of new features all of the time: Agility & automated operations ✔
•  Personalized and rich application: Cost effective storage, big data & analytics ✔

aws.amazon.com
get started with the free tier

Thank you

Markku Lepistö - Technology Evangelist
@markkulepisto

Your feedback is important
Let’s have a Poll!
Let us know what you want to see next

Please complete the Survey!
•  What’s good, what’s not
•  What you want to see at these events
•  What you want AWS to deliver for you
