2. What is the AWS Well-Architected Framework?
WHITEPAPER
The AWS Well-Architected Framework is a whitepaper published by Amazon
Web Services.
It is written by a team of AWS Solutions Architects and aims to share best
practices and core strategies for architecting in the cloud.
The whitepaper is intended for technical staff at all levels, including:
▪ CTOs
▪ Architects
▪ Developers
▪ Operations Team Members
The paper outlines 5 pillars, which form the foundation of a well-architected
workload.
It also discusses general design principles to facilitate good design in the cloud.
https://d0.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
4. General Design Principles
Stop guessing your capacity needs
With the cloud, there is no need to have
resources sitting idle or to suffer
downtime because capacity has been
exceeded. Scale up or down as needed.
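A minimal sketch of what "scale up or down as needed" can look like as a calculation. It is a simplified proportional rule for illustration only, not the algorithm any AWS service uses; the function name, the utilisation figures and the instance limits are all assumptions.

```python
import math


def desired_capacity(current_instances: int,
                     observed_utilisation: float,
                     target_utilisation: float,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Return how many instances keep utilisation near the target.

    Hypothetical helper: scales the fleet in proportion to how far the
    observed utilisation is from the target, then clamps to the limits.
    """
    if target_utilisation <= 0:
        raise ValueError("target_utilisation must be positive")
    raw = math.ceil(current_instances * observed_utilisation / target_utilisation)
    return max(min_instances, min(max_instances, raw))


# Busy period: 4 instances at 90% utilisation, target 60% -> scale out.
print(desired_capacity(4, 90, 60))  # 6
# Quiet period: 4 instances at 15% utilisation, target 60% -> scale in.
print(desired_capacity(4, 15, 60))  # 1
```

The point is that capacity follows measured demand in both directions, instead of being fixed up front from a guess.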
Test systems at production scale
In the cloud, you can create a
production-scale test environment on demand
and simply shut it down as soon as you are
finished.
Automate to make architectural
experimentation easier
Automation allows you to create and
replicate your systems at low cost and
avoid the expense of manual effort.
5. General Design Principles cont.
Allow for evolutionary architectures
Rather than making static, one-time architecture
choices, businesses can take advantage of
innovations and change their architecture over time.
e.g. new instance classes, Lambda vs EC2
Drive architectures using data
In the cloud, you can collect data on how
your architectural choices affect the
behaviour of your workload. This lets
you make fact-based decisions on how
to improve your workload.
e.g. MySQL RDS vs Aurora
Improve through game days
Test how your architecture and
processes perform by regularly
scheduling game days to simulate
production events.
e.g. Black Friday Deals
6. Operational Excellence
Design Principles
▪ Perform operations with code
▪ Annotate documentation
▪ Make frequent, small, reversible
changes
▪ Anticipate failure
- Test for responses to
unexpected events
- Simian Army (Chaos
Monkey, Chaos Snail) used
by Netflix
▪ Learn from operational events and
failures
▪ Refine operations procedures
frequently
Questions
▪ How are you evolving your
workload while minimizing the
impact of change?
▪ How do you monitor your
workload to ensure it is operating
as expected?
▪ How do you respond to unplanned
operational events?
▪ How is escalation managed when
responding to unplanned
operational events?
7. Security
Design Principles
▪ Implement a strong identity
foundation
▪ Enable traceability
▪ Apply security at all layers
▪ Automate security best practices
▪ Protect data in transit and at rest
▪ Prepare for security events
Questions
▪ How are you protecting access to and
use of the AWS root account
credentials?
▪ How are you enforcing network and
host level boundary protection?
▪ How are you encrypting and
protecting your data at rest?
▪ How are you encrypting and
protecting your data in transit?
▪ How are you managing keys and
credentials?
▪ How are you capturing and analyzing
logs?
▪ This is a sample of 6 questions; the full
12 are in the whitepaper
8. Reliability
Design Principles
▪ Test recovery procedures
▪ Automatically recover from failure
▪ Scale horizontally to increase
aggregate system availability
▪ Stop guessing capacity
▪ Manage change in automation
Questions
▪ How does your system adapt to
changes in demand?
▪ How are you monitoring AWS
resources?
▪ How are you executing change?
▪ How are you backing up your
data?
▪ How does your system withstand
component failures?
▪ How are you testing resiliency?
▪ How are you planning for disaster
recovery?
9. Performance Efficiency
Design Principles
▪ Democratize advanced
technologies
▪ Go global in minutes
▪ Use serverless architectures
▪ Experiment more often
▪ Mechanical sympathy
Questions
▪ How do you select the best
performing architecture?
▪ How do you select your compute
solution?
▪ How do you select your storage
solution?
▪ How do you select your database
solution?
▪ How do you configure your
networking solution?
▪ How do you ensure that you continue
to have the most appropriate resource
type as new resource types and
features are introduced?
10. Cost Optimisation
Design Principles
▪ Adopt a consumption model
▪ Measure overall efficiency
▪ Stop spending money on data
centre operations
▪ Analyze and attribute expenditure
▪ Use managed services to reduce
the cost of ownership
Questions
▪ Are you considering cost when
you select AWS services for your
solution?
▪ Have you sized your resources to
meet your cost targets?
▪ Have you selected the appropriate
pricing model to meet cost
targets?
▪ How do you make sure your
capacity matches but does not
exceed what you need?
▪ How are you monitoring usage
and spending?
▪ Do you decommission resources
that you no longer need or stop
resources that are temporarily not
needed?
Operational Excellence: Run and monitor systems to deliver business value, and continually improve supporting processes and procedures.
Security: Protect information, systems and assets while delivering business value through risk assessments and mitigation strategies.
Reliability: The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand and mitigate disruptions such as misconfigurations or transient network issues.
Performance Efficiency: Use resources efficiently to meet system requirements and to maintain efficiency as demand changes and technologies evolve.
Cost Optimization: The ability to avoid or eliminate unneeded cost or suboptimal resources.
PROTECTION
- Multiple layers of defense are advisable in any environment.
- Boundary protection: VPC security groups, NACLs
- Monitoring points of ingress/egress
- Comprehensive logging
- Monitoring
- Alerting
KEYS
- Rotation
- Securely stored
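A rough sketch of how a rotation rule for keys could be checked in code. The 90-day limit, the function name and the dates are assumptions chosen for illustration; neither the slides nor the whitepaper summary here specify a rotation period.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy for this example only: rotate keys older than 90 days.
MAX_KEY_AGE = timedelta(days=90)


def needs_rotation(created_at: datetime,
                   now: datetime,
                   max_age: timedelta = MAX_KEY_AGE) -> bool:
    """Return True when a key has reached or passed the allowed age."""
    return now - created_at >= max_age


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_key = datetime(2024, 1, 1, tzinfo=timezone.utc)   # 152 days old
new_key = datetime(2024, 5, 15, tzinfo=timezone.utc)  # 17 days old

print(needs_rotation(old_key, now))  # True
print(needs_rotation(new_key, now))  # False
```

A check like this would normally run on a schedule and raise an alert, which ties key rotation back to the monitoring and alerting points above.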
Democratize advanced technologies
Amazon’s way of saying: use managed services where possible, especially where the technology is difficult or complicated, e.g. media transcoding or NoSQL databases.
Mechanical Sympathy
- Understanding the hardware makes you a better developer. Consider data access patterns when selecting database or storage approaches. Consider the instance type too: memory-optimised vs compute-optimised.
How do you ensure that you continue to have the most appropriate resource type as new resource types and features are introduced?
- In other words, how do you ensure that the correct choice you made stays correct as new products and instance classes are brought to market?
Adopt a consumption model
Pay only for what you need. Stop services when not in use. Running resources only for the 40 hours of a developer’s work week, rather than all 168 hours, cuts costs by roughly 75%.
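The saving from running 40 hours instead of 168 can be checked with a quick calculation. This sketch assumes cost is billed linearly per running hour and ignores charges that continue while a resource is stopped (storage, for example); the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_saving_fraction(hours_used: float,
                           hours_available: float = HOURS_PER_WEEK) -> float:
    """Fraction of the always-on cost avoided by running only hours_used."""
    if not 0 <= hours_used <= hours_available:
        raise ValueError("hours_used must be between 0 and hours_available")
    return 1 - hours_used / hours_available


# A developer environment used for a 40-hour work week.
print(f"{weekly_saving_fraction(40):.1%}")  # 76.2%
```

Under these assumptions the saving comes out slightly above the rounded 75% figure.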
Have you sized your resources to meet your cost targets?
e.g. a small instance that takes 23 hours to run an operation could actually cost more than a large instance that runs the same code in under 1 hour
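A worked version of the small-vs-large comparison. The hourly prices below are invented for illustration and are not real AWS prices; the 23-hour and 1-hour run times come from the note above, with 1 hour taken as the upper bound for the large instance.

```python
def job_cost(hourly_rate: float, hours: float) -> float:
    """Total cost of one job, assuming simple per-hour pricing."""
    return hourly_rate * hours


# Hypothetical prices: the large instance costs 16x more per hour.
small = job_cost(hourly_rate=0.10, hours=23)
large = job_cost(hourly_rate=1.60, hours=1)

print(f"small: ${small:.2f}")  # small: $2.30
print(f"large: ${large:.2f}")  # large: $1.60
print("large is cheaper" if large < small else "small is cheaper")
```

The larger instance wins whenever its price multiple (16x here) is smaller than its speed-up (23x here), so the cost per job matters more than the cost per hour.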
Pricing Model
Spot / On-Demand / Reserved