2. Versions
● Architecting for the Cloud: AWS Best Practices - (Feb 2016)
○ Written in a more formal, official tone
● Architecting for the Cloud: AWS Best Practices - (Jan 2011)
○ Written in a more approachable, human tone
7. ● Scalability
● Disposable Resources Instead of Fixed Servers
● Automation
● Loose Coupling
● Services, Not Servers
● Databases
● Removing Single Points of Failure
● Optimize for Cost
● Caching
● Security
Design Principles
8. ● Scaling Vertically
○ Scale-up, scale-down
○ c4.large → c4.xlarge → c4.2xlarge
○ CPU, Memory, IO
● Scaling Horizontally → Scale-out, Scale-in
○ Stateless Applications
○ Stateless Components
○ Stateful Components
○ Distributed Processing
● Elasticity is one of the fundamental properties of the cloud.
○ EC2, ELB, ECS, EBS, EIP, ENI
Scalability
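The scale-out and scale-in decision above can be sketched in a few lines. This is a minimal illustration, not an AWS API: the function names, the per-instance capacity, and the min/max bounds are all assumptions made up for the example.

```python
import math


def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=10):
    """Return how many identical instances the load needs (hypothetical helper).

    current_load and capacity_per_instance share one unit, e.g. requests/second.
    """
    needed = math.ceil(current_load / capacity_per_instance)
    # Clamp to the allowed fleet size, as an auto scaling group would.
    return max(min_instances, min(max_instances, needed))


def scaling_action(running, desired):
    """Name the horizontal scaling step that moves 'running' toward 'desired'."""
    if desired > running:
        return "scale-out"
    if desired < running:
        return "scale-in"
    return "steady"


if __name__ == "__main__":
    running = 2
    for load in (150, 250, 950, 40):
        target = desired_instances(load, capacity_per_instance=100)
        print(load, target, scaling_action(running, target))
        running = target
```

Horizontal scaling changes the number of instances, while vertical scaling would instead change the instance type (for example c4.large to c4.xlarge) and keep the count fixed.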
9. ● Push mode: distribute a workload through a load balancing solution
○ ELB routes incoming application requests across multiple EC2 instances
● Pull mode: asynchronous, event-driven workloads do not require a load balancing solution.
○ tasks that need to be performed or data that needs to be processed can be stored as
messages in a queue using Amazon Simple Queue Service (Amazon SQS) or in a streaming
data solution like Amazon Kinesis.
Stateless Applications
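The difference between push and pull mode can be sketched with the standard library alone. This is a single-process simulation under stated assumptions: `push_dispatch` and `pull_dispatch` are hypothetical names, round-robin stands in for whatever routing a real load balancer uses, and `queue.Queue` stands in for a managed queue such as Amazon SQS.

```python
import itertools
import queue


def push_dispatch(requests, instances):
    """Push mode: a balancer chooses the target instance for every request."""
    assignment = {name: [] for name in instances}
    targets = itertools.cycle(instances)  # simple round-robin routing
    for request in requests:
        assignment[next(targets)].append(request)
    return assignment


def pull_dispatch(messages, workers):
    """Pull mode: work waits in a queue and each worker fetches the next message."""
    pending = queue.Queue()
    for message in messages:
        pending.put(message)
    processed = {name: [] for name in workers}
    while not pending.empty():
        for name in workers:
            try:
                # The worker asks for work; nothing routes work to it.
                processed[name].append(pending.get_nowait())
            except queue.Empty:
                break
    return processed


if __name__ == "__main__":
    print(push_dispatch(["r1", "r2", "r3"], ["ec2-a", "ec2-b"]))
    print(pull_dispatch(["m1", "m2", "m3"], ["worker-1", "worker-2"]))
```

In both modes any instance can take any item, which only works because the application is stateless. In a real pull-mode deployment workers consume at their own pace, so a slow worker simply takes fewer messages.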
25. OS Patch
Instance Size
Service Capacity
Cost and Budget
OS Version
User Permission
OS Utilization
OS Performance Optimization
Package for Services
Config Management
Hardware Failure
Operation Tasks for Servers
28. ● Introducing Redundancy
○ standby or active mode
○ when a resource fails, functionality is recovered on a second resource using a process called
failover
● Detect Failure
○ Route53 Health Check
○ EC2 auto recovery
○ Auto Scaling
Removing Single Points of Failure (SPOF)
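Standby redundancy plus failure detection can be sketched as a small in-memory model. The `Resource` and `FailoverPair` classes are hypothetical names for illustration, and the `healthy` flag stands in for a real probe such as a Route53 health check; no AWS API is called here.

```python
class Resource:
    """One redundant resource, e.g. a server (hypothetical model)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverPair:
    """An active resource with a standby that takes over on failure."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def health_check(self):
        """Detect failure; promote the standby if the active is unhealthy."""
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
            return True  # a failover happened
        return False

    def handle(self, request):
        self.health_check()
        return self.active.handle(request)


if __name__ == "__main__":
    pair = FailoverPair(Resource("primary"), Resource("secondary"))
    print(pair.handle("r1"))
    pair.active.healthy = False  # simulate a failure of the active resource
    print(pair.handle("r2"))
```

If both resources are unhealthy the request still fails, which is why detection is usually paired with automatic replacement (EC2 auto recovery, Auto Scaling) rather than with a single standby alone.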
29. Removing Single Points of Failure (SPOF)
● Durable Data Storage
○ Maintain a variety of data
○ Sync replication → RAID1, RAID5, GFS (GlusterFS)
○ Durability: No replacement for backups
○ DR: RPO, RTO
● Automated Multi-Data Center Resilience
○ Multi-AZ, VPC AZs
○ ELB AZ, DynamoDB, RDS
○ Region Levels
● Fault Isolation and Traditional Horizontal Scaling
○ Shuffle Sharding
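Shuffle sharding can be sketched as follows: instead of splitting workers into fixed groups, each customer gets its own pseudo-random combination of workers, so a failure that takes out one customer's workers rarely takes out every worker of another customer. The function names, the SHA-256 seeding, and the shard size of 2 are assumptions for this example, not something the slides specify.

```python
import hashlib
import random


def shuffle_shard(customer_id, workers, shard_size=2):
    """Pick a deterministic pseudo-random subset of workers for one customer."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Seed from the customer id so the same customer always gets the same shard.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(workers, shard_size))


def unaffected_customers(shards, failed_workers):
    """Return customers that still have at least one healthy worker."""
    failed = set(failed_workers)
    return sorted(c for c, shard in shards.items() if not set(shard) <= failed)


if __name__ == "__main__":
    workers = [f"w{i}" for i in range(8)]
    shards = {c: shuffle_shard(c, workers) for c in ("alice", "bob", "carol")}
    print(shards)
    # Suppose a bad request from alice knocks out both of her workers.
    print(unaffected_customers(shards, shards["alice"]))
```

With 8 workers and a shard size of 2 there are 28 possible shards, so two customers fully overlap only when they happen to draw the same pair. With traditional sharding into 4 fixed groups of 2, every customer in the affected group would be down.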
30. Reference
1. Site Reliability Engineering
a. Chapter 22 - Addressing Cascading Failures
b. Chapter 23 - Managing Critical State: Distributed Consensus for Reliability
2. 高品質微服務 (High-Quality Microservices)
a. Chapter 5 - Fault Tolerance and Disaster Prevention
3. AWS Whitepapers
a. AWS Well-Architected Framework - Reliability Pillar (December 2017)
b. Building Fault-Tolerant Applications on AWS (October 2011)
40. Reference
● Architecting for the Cloud (Feb, 2016) (PDF)
● Architecting on The Cloud (slideshare)
● Building Microservices
● Clean Architecture
● Site Reliability Engineering