The AWS Well-Architected Framework helps customers understand best practices for security, reliability, performance efficiency, cost optimization, and operational excellence when building systems on AWS. This approach helps customers make informed decisions and weigh the pros and cons of application design patterns for the cloud. In this session, you'll learn how to use the Well-Architected Framework to apply AWS guidelines and best practices to your architecture on AWS.
5. Why would I want to apply the AWS Well-Architected Framework?
Build and deploy faster
Lower or mitigate risks
Make informed decisions
Learn AWS best practices
8. General Design Principles
Stop guessing your capacity needs
Test systems at production scale
Automate to make architectural experimentation easier
Allow for evolutionary architectures
Build data-driven architectures
Improve through game days
10. AWS Reference Serverless Micro Service Architectures
aws.amazon.com/serverless/
AWS Serverless Multi-Tier Architectures: Using Amazon API Gateway and AWS Lambda (November 2015)
11. Shared Responsibility between AWS and our customers
Customers are responsible for their security IN the Cloud:
Customer content
Platform, Applications, Identity & Access Management
Operating System, Network & Firewall Configuration
Client-side Data Encryption
Server-side Data Encryption
Network Traffic Protection
AWS is responsible for the security OF the Cloud:
AWS Foundation Services: Compute, Storage, Database, Networking
AWS Global Infrastructure: Regions, Availability Zones, Edge Locations
15. Specialized Reviews by Architecture Type
• Web Application Hosting
• Content Streaming and Media Serving
• COTS Enterprise Workloads (e.g. SAP, Microsoft, Oracle)
• Fault Tolerance and High Availability
• Large Scale Processing and Huge Data Sets
• Ad Serving
• Serverless
• Gaming
17. Design Principles for Security
Apply security at all layers
Enable traceability
Implement a principle of least privilege
Focus on securing your system
Automate security best practices
21. Upload: FTP
• Work through the questions
• Use the questions as a prompt
• CURRENT STATE – what is being done now?
• TARGET STATE – what do you think they should be doing?
• Not an absolute right or wrong – use case specific
• It’s a guide
How to Document Your System
22. Key Services for Security
Key service: AWS IAM
Areas and Key Services
Identity and Access Management: AWS IAM, MFA Token, AWS Organizations
Detective Controls: AWS CloudTrail, AWS Config, Amazon CloudWatch
Infrastructure Protection: Amazon VPC
Data Protection: Elastic Load Balancing, Amazon EBS, Amazon S3, Amazon RDS, AWS Key Management Service
Incident Response: AWS IAM, AWS CloudFormation
23. AWS Organizations
Policy-based management for multiple AWS accounts.
Control AWS service use across accounts
Automate AWS account creation
Consolidate billing
24. AWS Identity & Access Management
IAM Users, IAM Groups, IAM Roles, IAM Policies
• Granular access control for least privilege
• Manage hierarchies of AWS accounts with AWS Organizations
• Federate with your existing directory services
• Role-based access and segregation of duties
• Achieve just-in-time access using automation
• Create rich mobile applications without giving end users long-term access keys
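The least-privilege bullet above can be made concrete with a small policy document. This is a minimal sketch rather than a recommended production policy: the helper name `read_only_bucket_policy` and the bucket name `example-reports-bucket` are hypothetical, while the `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys follow the standard IAM JSON policy layout.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the two actions needed to list and read objects.
                "Action": ["s3:ListBucket", "s3:GetObject"],
                # ListBucket applies to the bucket, GetObject to its objects.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a hypothetical bucket name.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

Scoping both the actions and the resources keeps the grant narrow; anything not listed stays denied by default.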
25. AWS CloudTrail
You are making API calls...
API executed
AWS CloudTrail is continuously recording API calls…
And delivering log files to you
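Once the log files are delivered, they can be processed like any other JSON. The sketch below assumes a CloudTrail log file that has already been downloaded and decompressed, with a top-level `Records` array whose entries carry `eventSource` and `eventName`. The sample records and the function name `count_api_calls` are made up for illustration.

```python
import json
from collections import Counter

# Hypothetical sample, shaped like a decompressed CloudTrail log file.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventTime": "2017-01-01T12:00:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "RunInstances", "awsRegion": "us-east-1"},
        {"eventTime": "2017-01-01T12:05:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "RunInstances", "awsRegion": "us-east-1"},
        {"eventTime": "2017-01-01T12:10:00Z", "eventSource": "iam.amazonaws.com",
         "eventName": "CreateUser", "awsRegion": "us-east-1"},
    ]
})


def count_api_calls(log_text: str) -> Counter:
    """Count recorded API calls per (service, action) pair."""
    records = json.loads(log_text).get("Records", [])
    return Counter((r["eventSource"], r["eventName"]) for r in records)


if __name__ == "__main__":
    for (source, name), total in count_api_calls(SAMPLE_LOG).most_common():
        print(source, name, total)
```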
28. VPC Defense in Depth
[Diagram: a VPC with a Public Subnet, a Private Subnet (Web Tier), and a Private Subnet (App Tier); subnet CIDR blocks 10.0.1.0/24, 10.0.2.0/24, and 10.0.3.0/24]
SG-ALB: Allow CloudFront IP ranges only
SG-Web: Allow SG-ALB only
SG-App: Allow SG-Web only
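The security group chaining on this slide can be modelled as data to check which traffic paths are permitted. This is an illustrative sketch only: it assumes each group accepts exactly one source, as the slide's three "allow ... only" rules suggest, and the function names are hypothetical. It does not call any AWS API.

```python
from typing import Sequence

# Each security group and the single source it accepts traffic from.
ALLOWED_SOURCE = {
    "SG-ALB": "CloudFront IP ranges",
    "SG-Web": "SG-ALB",
    "SG-App": "SG-Web",
}


def is_allowed(source: str, target_group: str) -> bool:
    """True when target_group's rule names source as its only allowed origin."""
    return ALLOWED_SOURCE.get(target_group) == source


def path_allowed(path: Sequence[str]) -> bool:
    """True when every hop along the path is permitted by the next group's rule."""
    return all(is_allowed(src, dst) for src, dst in zip(path, path[1:]))


if __name__ == "__main__":
    print(path_allowed(["CloudFront IP ranges", "SG-ALB", "SG-Web", "SG-App"]))
    # Skipping the web tier is rejected because SG-App only accepts SG-Web.
    print(path_allowed(["SG-ALB", "SG-App"]))
```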
29. VPC Flow Logs
• Agentless
• Enable per ENI, per subnet, or per VPC
• Logged to Amazon CloudWatch Logs
• Create CloudWatch metrics from log data
• Alarm on those metrics
Flow log record fields: AWS account, interface, source IP, destination IP, source port, destination port, protocol, packets, bytes, start/end time, accept or reject
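To show how these fields can be turned into metrics, here is a minimal parsing sketch. It assumes the default (version 2) space-separated flow log record format. The sample line is illustrative, and the helper names are hypothetical.

```python
from typing import Dict

# Field order of the default (version 2) flow log record.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log_record(line: str) -> Dict[str, str]:
    """Split one flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_rejected(record: Dict[str, str]) -> bool:
    """True when the traffic was rejected by a security group or network ACL."""
    return record["action"] == "REJECT"


if __name__ == "__main__":
    sample = ("2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 "
              "49761 3389 6 20 4249 1418530010 1418530070 REJECT OK")
    record = parse_flow_log_record(sample)
    print(record["srcaddr"], record["dstport"], is_rejected(record))
```

Counting rejected records per destination port is one simple way to derive a metric worth alarming on.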
32. Mitigate OWASP Application Threats
[Diagram: good users and bad guys send requests toward a web server and database; an AWS WAF filtering rule blocks exploit code such as SQL injection and XSS]
33. SSL/TLS
Deep integration with AWS Services
Automated Certificate Renewal
CloudTrail
No extra cost
… or you can always use your own
AWS Certificate Manager
34. Cryptographic Services
AWS KMS
Deep integration with AWS Services
CloudTrail
AWS SDK for application encryption
AWS CloudHSM
Hardware Security Module
Integrate with on-premises HSMs
Hybrid Architectures
… or you can always use your own
35. AWS CloudFormation – Infrastructure as Code
AWS CloudFormation
Orchestrate changes across AWS Services
Use as foundation to Service Catalog products
Use with source code repositories to manage infrastructure changes
Template: JSON or YAML text file describing infrastructure
Stack: Resources created from a template. Can be updated. Updates can be restricted.
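As an illustration of a template being "a text file describing infrastructure", the sketch below generates a very small JSON template. The logical ID `ExampleBucket`, the description text, and the helper name are hypothetical. The template declares a single versioned S3 bucket and is not deployed by this code.

```python
import json


def bucket_template(logical_id: str = "ExampleBucket") -> str:
    """Return a minimal CloudFormation template (JSON) declaring one S3 bucket."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single versioned bucket",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Keep previous object versions so changes can be rolled back.
                    "VersioningConfiguration": {"Status": "Enabled"}
                },
            }
        },
    }
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(bucket_template())
```

Because the output is plain text, it can be committed to a source code repository and reviewed like any other change.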
37. Design Principles for Reliability
Test recovery procedures
Automatically recover from failure
Scale horizontally to increase aggregate system availability
Stop guessing capacity
Manage change in automation
38. Key Services for Reliability
Key service: Amazon CloudWatch
Areas and Key Services
Foundations: AWS IAM, Amazon VPC
Change management: AWS CloudTrail, AWS Config
Failure management: AWS CloudFormation
39. Foundations | Limit Management
How do you manage AWS service limits for your accounts?
41. Foundations | Limit Management
Easy wins:
Default service limits
AWS Trusted Advisor limit checks.
Increasing soft limits if needed.
Things to consider:
Limit monitoring (possible automation)
The difference between hard and soft limits
Plan for more than you need.
Consider your limits across accounts.
Fixed Limit - 125 peering connections per VPC
Fixed Limit - 100 routes across Direct Connect
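Limit monitoring lends itself to simple automation. The sketch below flags any limit whose usage has crossed a chosen fraction of its ceiling. The two limits come from this slide, while the usage figures, the 80% threshold, and the function name are hypothetical. In practice the usage numbers would come from your own inventory or from Trusted Advisor checks.

```python
from typing import Dict, List, Tuple


def limits_at_risk(
    usage: Dict[str, int], limits: Dict[str, int], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (limit name, usage ratio) pairs at or above the threshold, highest first."""
    at_risk = []
    for name, limit in limits.items():
        ratio = usage.get(name, 0) / limit
        if ratio >= threshold:
            at_risk.append((name, ratio))
    return sorted(at_risk, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Limits taken from the slide; usage values are invented for the example.
    limits = {"vpc_peering_connections_per_vpc": 125, "direct_connect_routes": 100}
    usage = {"vpc_peering_connections_per_vpc": 110, "direct_connect_routes": 40}
    for name, ratio in limits_at_risk(usage, limits):
        print(f"{name}: {ratio:.0%} of limit used")
```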
49. Foundations | Network Topology
Easy wins:
Redundant networking built into AWS regions.
Highly available load balancing, DNS.
Choose correct CIDR masks.
Things to consider:
Default VPC quick and resilient, but plan your own.
Redundant connectivity to office/datacentre?
VPN or Direct Connect?
Avoid overlapping IP address ranges if you plan to use VPC peering.
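Overlapping address ranges can be caught before any VPC is created. The following sketch uses Python's standard `ipaddress` module; the CIDR blocks in the example and the function name are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Iterable, List, Tuple


def overlapping_cidrs(cidrs: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every pair of CIDR blocks that share at least one address."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    # Hypothetical VPC ranges; the first two cannot be peered with each other.
    planned = ["10.0.0.0/16", "10.0.1.0/24", "10.1.0.0/16"]
    print(overlapping_cidrs(planned))
```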
53. Change Management | Monitoring
Easy wins:
Amazon CloudWatch deep integration with AWS services.
Built-in CloudWatch metrics.
Highly durable CloudWatch logs.
Things to consider:
Integrate existing log solutions like Graylog or Splunk.
Automate responses to alerts.
Use Amazon EMR to gain insights.
Long term event trigger refinement.
56. Change Management | Change Execution
Easy wins:
Infrastructure as code for simple services.
Version control infrastructure for change and rollback.
Environments kept consistent.
Things to consider:
CI/CD pipeline is a long term strategy.
Continuous Delivery is different from Continuous Deployment.
Identify automation candidates.
Shift approvals to the left.
59. Failure Management | Data Durability
Easy wins:
S3 designed for 99.999999999% durability.
Frequent snapshots of EBS volumes.
RDS takes regular incremental snapshots.
Things to consider:
Durability requirements, ease of snapshots, speed, cost.
Encryption of your data and management of keys.
Periodic recovery testing to meet RPO and RTO.
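One small check that supports periodic recovery testing is whether the most recent snapshot is still within the recovery point objective. This is a sketch under stated assumptions: the timestamps and the four-hour RPO are invented, and `meets_rpo` is a hypothetical helper, not an AWS API.

```python
from datetime import datetime, timedelta


def meets_rpo(last_snapshot: datetime, now: datetime, rpo: timedelta) -> bool:
    """A failure at `now` loses the data written since `last_snapshot`.

    The RPO is met when that window is no longer than the objective.
    """
    return now - last_snapshot <= rpo


if __name__ == "__main__":
    now = datetime(2017, 6, 1, 12, 0)
    rpo = timedelta(hours=4)  # hypothetical objective
    print(meets_rpo(datetime(2017, 6, 1, 9, 0), now, rpo))   # snapshot 3 hours old
    print(meets_rpo(datetime(2017, 6, 1, 6, 0), now, rpo))   # snapshot 6 hours old
```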
62. Failure Management | Recovery Planning
Easy wins:
Automated infrastructure for flexible testing.
Chaos Monkey and the Simian Army for failure injection.
Scheduling game days to break your system.
Things to consider:
Make sure your build servers are reliable as well.
Do your playbooks sufficiently cover recovery pathways?
Learn from your failures with Root Cause Analysis.
63. Failure Management | Recovery Planning
How are you planning for disaster recovery?
65. Failure Management | Recovery Planning
Easy wins:
Automated system recovery using infrastructure as code.
Versioning in S3 with object lifecycle policies easy to turn on.
Use another region or account to test failover.
Knowledge base for capturing incident responses.
Things to consider:
RPOs and RTOs need to be defined first.
Manage data access policies with IAM.
Be aware of configuration drift.
Consider continuous availability.
66. Three Key Takeaways
1. Don’t forget the foundations.
2. Continually monitor your environment for events and analysis.
3. Automate, test and iterate.
68. Design Principles for Performance Efficiency
Democratize advanced technologies
Go global in minutes
Use serverless architectures
Experiment more often
Mechanical sympathy
74. Selection | Database
Amazon DynamoDB: Fully Managed NoSQL
- Fast and Predictable
- Seamless Scalability
- Secondary Indexing
- Managed Table Partitioning
Amazon ElastiCache: In-Memory Cache
- Memcached/Redis
- High Performance
- Supports Sharding, Clustering, Read Replicas
Amazon RDS: Managed Relational DB
- Industry standard relational databases
- Options for Read Replicas, Provisioned IOPS, Indexes
Amazon Redshift: Data Warehouse
- Fully Managed
- Petabyte-scale
- Columnar Storage
- Specify sort keys, distribution keys, column encoding
75. Selection | Network
Location (Regions and Availability Zones)
- Where your users are located
- Where your data is located
- Other constraints (e.g. Security, compliance)
Considerations:
- Placement Groups
- Edge Locations
- DNS - Route 53 edge locations
77. Design Principles for Cost Optimization
Adopt a consumption model
Benefit from economies of scale
Stop spending money on data center operations
Analyze and attribute expenditure
Use managed services to reduce cost of ownership
78. Key Services for Cost Optimization
Key service: Cost Allocation Tags
Areas and Key Services
Cost-effective resources: Reserved Instances, AWS Trusted Advisor
Matched supply and demand: Auto Scaling
Expenditure awareness: Amazon CloudWatch, Amazon SNS
Optimizing over time: AWS Blog & What’s New
79. How do you visualize and allocate costs for chargeback?
Cost Explorer in the Billing and Cost Management console
80. Tagging resources – add your own metadata
(Almost) everything in AWS can be tagged
Each tag is a key and an optional value
Up to 50 tags per resource
EC2 instance name: i-4a1c2f5d
Project = natasha
Stack = Development
DevTribe = Tribe3
ticket = 78912
RDS instance name: d-6x3r2f7h
Owner = DBAdmin
Stack = Production
Department = Accounts
CostCenter = 8899
Project = BAU
S3 bucket name: s378236
Project = natasha
Owner = DBAdmin
Department = Accounts
Stack = Production
ticket = 78912
CostCenter = 8899
81. Tagging resources – Now you have metadata you can pivot
E.g. accurately measure:
What resources (name) did project = natasha use?
E.g. chargeback:
How much (monthly $) did department = accounts spend?
What proportion (monthly $) of ticket = 78921 should be charged to stack = production?
Name | Project | Stack | Devtribe | Ticket | Owner | Department | Cost center | Monthly $
EC2 | Natasha | Development | Tribe3 | 78921 | | | | $680
S3 | Natasha | Production | | 78912 | DBAdmin | Accounts | 8899 | $700
RDS | BAU | Production | | | DBAdmin | Accounts | 8899 | $45
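The pivot questions on this slide reduce to grouping resources by tag. The sketch below uses the tag values and monthly figures from the slide, assuming the figures line up as EC2 = $680, S3 = $700, and RDS = $45. The function names are hypothetical, and the data is hard-coded rather than read from a billing report.

```python
from collections import defaultdict
from typing import Dict, List

# Tag data as shown on the slide; the cost-to-resource mapping is an assumption.
RESOURCES = [
    {"name": "EC2", "monthly_usd": 680,
     "tags": {"Project": "Natasha", "Stack": "Development",
              "Devtribe": "Tribe3", "Ticket": "78921"}},
    {"name": "S3", "monthly_usd": 700,
     "tags": {"Project": "Natasha", "Stack": "Production", "Ticket": "78912",
              "Owner": "DBAdmin", "Department": "Accounts", "CostCenter": "8899"}},
    {"name": "RDS", "monthly_usd": 45,
     "tags": {"Project": "BAU", "Stack": "Production",
              "Owner": "DBAdmin", "Department": "Accounts", "CostCenter": "8899"}},
]


def cost_by_tag(resources: List[dict], key: str) -> Dict[str, int]:
    """Sum monthly cost per value of one tag key, skipping untagged resources."""
    totals: Dict[str, int] = defaultdict(int)
    for resource in resources:
        value = resource["tags"].get(key)
        if value is not None:
            totals[value] += resource["monthly_usd"]
    return dict(totals)


def resources_with_tag(resources: List[dict], key: str, value: str) -> List[str]:
    """Names of the resources carrying the given tag key and value."""
    return [r["name"] for r in resources if r["tags"].get(key) == value]


if __name__ == "__main__":
    print(resources_with_tag(RESOURCES, "Project", "Natasha"))
    print(cost_by_tag(RESOURCES, "Department"))
```

Resources missing a tag simply drop out of that pivot, which is why consistent tagging matters for chargeback.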
82. Auto scaling: variable workloads
CloudWatch for usage
start more instances when usage is high
stop instances when usage is low
Time Based : For development and scheduled workloads
720 hours in a month
160 business hours in a month
Nearly 80% saving if you switch them off outside business hours
Strategies to make sure your capacity matches, but does not substantially exceed, what you need
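The saving quoted for time-based scheduling follows directly from the two hour counts on the slide. The short calculation below assumes cost is proportional to the hours an instance runs, as with hourly on-demand pricing; the function name is hypothetical.

```python
HOURS_PER_MONTH = 720          # 30 days x 24 hours
BUSINESS_HOURS_PER_MONTH = 160  # roughly 4 weeks x 40 hours


def off_hours_saving(total_hours: int, running_hours: int) -> float:
    """Fraction of cost avoided by running only `running_hours` of `total_hours`."""
    return 1 - running_hours / total_hours


if __name__ == "__main__":
    saving = off_hours_saving(HOURS_PER_MONTH, BUSINESS_HOURS_PER_MONTH)
    print(f"{saving:.1%}")  # prints 77.8%
```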
83. Example – using CloudWatch metrics to control Auto-Scaling
[Three charts of utilization over time]
Single large instance = wasted capacity
Auto Scaling with CloudWatch = less wasted capacity
Auto Scaling with CloudWatch and appropriate instance size = cost optimized
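The idea behind metric-driven scaling can be sketched as a simple threshold rule. This is not how Auto Scaling is configured in practice, where policies are defined in the service itself. It is a hypothetical illustration, with made-up thresholds and capacity bounds, of how a utilization metric translates into a capacity decision.

```python
def desired_capacity(
    current: int,
    cpu_percent: float,
    high: float = 70.0,
    low: float = 30.0,
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Add one instance above `high`, remove one below `low`, within the bounds."""
    if cpu_percent > high:
        target = current + 1
    elif cpu_percent < low:
        target = current - 1
    else:
        target = current
    # Never scale outside the configured minimum and maximum.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    for cpu in (85.0, 50.0, 10.0):
        print(cpu, desired_capacity(current=2, cpu_percent=cpu))
```

Smaller instances make each step finer, which is why pairing scaling with an appropriate instance size wastes less capacity.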
84. EC2 instance types – consider RAM usage
Monitor RAM with a CloudWatch custom metric
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts.html
EC2 c3.8xlarge: 32 x vCPU, 60GB RAM
EC2 r3.8xlarge: 32 x vCPU, 244GB RAM
50% saving
Storage Types – choose the right storage class for your workload
[Diagram: storage options placed between Greatest Performance and Greatest Savings: EBS Provisioned IOPS, EBS General Purpose, EBS Magnetic, S3 Standard, S3 Reduced Redundancy, Glacier]
Selecting appropriate EC2 instance types and storage types helps meet cost targets
85. Cost optimizing EC2 instances – same technology – optimized commercials
EC2 “On Demand” – scale up and down for dynamic workloads
EC2 “Reserved instances” – reduce costs for steady state workloads
EC2 “Spot instances” – lowest possible price for time-insensitive workloads
The technology is the same, but you can pick a commercial model that meets your business need
Serverless compute – event-based computing model with a step change in price
Or managed services with consumption-based pricing models
87. Design Principles for Operational Excellence
Perform Operations with Code
Align Operations Processes to Business Objectives
Make Regular, Small, Incremental Changes
Test for Responses to Unexpected Events
Learn from Operational Events and Failures
Keep Operations Procedures Current
88. Topics explored in Operational Excellence Pillar
• What best practices for cloud operations are you using?
• How are you doing configuration management for your workload?
• How are you evolving your workload while minimizing the impact of change?
• How do you monitor your workload to ensure it is operating as expected?
• How do you respond to unplanned operational events?
• How is escalation managed when responding to unplanned operational events?
93. Use CloudWatch Events and Lambda
https://aws.amazon.com/blogs/security/how-to-detect-and-automatically-remediate-unintended-permissions-in-amazon-s3-object-acls-with-cloudwatch-events/
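The remediation pattern behind this slide can be sketched as a small handler. This is not the code from the linked post. It assumes the triggering event carries the bucket and object key under `detail.requestParameters` (`bucketName` and `key`), and that the client passed in behaves like `boto3.client("s3")`, whose `put_object_acl` call accepts `Bucket`, `Key`, and `ACL`. The `RecordingClient` stub is hypothetical and exists only so the sketch runs without AWS access.

```python
from typing import Any, Dict, List


def remediate_object_acl(event: Dict[str, Any], s3_client: Any) -> Dict[str, str]:
    """Reset the ACL of the object named in the event back to private."""
    # Assumed event shape: CloudTrail-style detail for an S3 object-level call.
    params = event["detail"]["requestParameters"]
    bucket, key = params["bucketName"], params["key"]
    s3_client.put_object_acl(Bucket=bucket, Key=key, ACL="private")
    return {"bucket": bucket, "key": key, "acl": "private"}


class RecordingClient:
    """Stand-in for an S3 client that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, str]] = []

    def put_object_acl(self, **kwargs: str) -> None:
        self.calls.append(kwargs)


if __name__ == "__main__":
    # Hypothetical event with made-up bucket and key names.
    sample_event = {"detail": {"requestParameters": {
        "bucketName": "example-bucket", "key": "reports/q1.csv"}}}
    client = RecordingClient()
    print(remediate_object_acl(sample_event, client))
    print(client.calls)
```

A real handler would likely also check the object's current ACL first, so that its own corrective call does not trigger another round of remediation.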
94. Benefits of Well-Architected
Think Cloud-Natively
Consistent Approach to Reviewing Architecture
Understand Potential Impact
Visibility of Risks
95. Preparing for a Well-Architected Review
• Complete the Online Training
• Perform Customer Self Assessment
• Evaluate Automated Assessment Tools
• Certified APN Partner Led Assessment
• AWS Account Team Engagement & Review
• Work with AWS SA on any Remediation Plans