AWS 201: Breakout Track, Singapore. “Design for Failure”: HA and DR Best Practices. Harish Ganesan, Co-founder & CTO, 8KMiles. www.twitter.com/harish11g http://www.linkedin.com/in/harishganesan
Agenda • Explain HA architecture with a real customer case • Understand how to architect a web app in AWS with – High Availability – DR – Scalability • Why AWS?
About the Customer • Online ecommerce company • NASDAQ listed • Application consumed by online users, mobile and web services
Requirements • High Availability on all tiers with no SPOF • Auto-scalable and elastic infrastructure • Ability to serve millions of requests per day • Serve peak HTTP traffic of 8000+ reqs/sec • Serve peak HTTPS traffic of 2500+ reqs/sec • 65% of the business is done during the holiday season, so downtime is not affordable • Ease of monitoring, backup and deployment • Optimal DR setup (cost vs RTO/RPO)
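A rough sizing sketch for the peak figures above. The per-instance throughput (`per_instance_rps`) and the `headroom` fraction are hypothetical values chosen for illustration; they are not measurements from this project.

```python
import math

PEAK_HTTP_RPS = 8000    # from the requirements slide
PEAK_HTTPS_RPS = 2500   # from the requirements slide


def instances_needed(peak_rps: float, per_instance_rps: float,
                     headroom: float = 0.25) -> int:
    """Instances required to serve peak_rps while keeping a fraction
    `headroom` of each instance's capacity in reserve."""
    if per_instance_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid sizing parameters")
    usable_rps = per_instance_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


total_peak_rps = PEAK_HTTP_RPS + PEAK_HTTPS_RPS

# Assumption: one web/app instance sustains 500 requests/sec.
print(instances_needed(total_peak_rps, per_instance_rps=500))
```

The same function can be rerun with a measured per-instance figure once load tests are available.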
Technology and Tiers • Multi-tiered Linux, Apache, Java web site on AWS • Database tier using MySQL • Cache tier • Integration tier with queues and background programs • HTTP and HTTPS protocols
What did 8KMiles do? • Consulting: Architected the entire website infrastructure on AWS • Implementation: – Configured the infrastructure on AWS – Developed custom DevOps scripts on AWS • Support: Supported the site during Thanksgiving and the holiday season • Cloud Development Partner: – Currently reengineering the customer app to leverage more AWS services
A simple LAMJ Architecture [Diagram: Web/App/Cache Server, Integration Services and MySQL DB inside AWS Security Groups in US-EAST-1a, with CloudWatch. Single points of failure at multiple tiers. Not a highly available architecture.]
How do we avoid SPOFs and build a robust architecture?
Step 1: Distribute the Application to Multiple Tiers • 1. Separate out the individual tiers into separate EC2 instances [Diagram: US-EAST-1a, AWS Security Groups containing Web/App Server, Integration Service tier and MySQL DB; CloudWatch]
Step 2: Add Multiple Servers in Each Layer • 1. Add multiple EC2 instances in every tier [Diagram: US-EAST-1a, AWS Security Groups containing Web/App Server, Integration Service tier and MySQL DB; CloudWatch]
Why AWS ELB? • AWS ELB provides a load balancing service that can front thousands of EC2 servers • AWS ELB automatically scales its own load balancing servers up and down in the backend • In theory, there is no fixed upper limit on the response rate of AWS ELB • It can handle 20000+ concurrent requests easily (RightScale benchmark) • AWS ELB works seamlessly with AWS Auto Scaling
Why AWS ELB? • AWS ELB is well integrated with other AWS services • No maintenance • Pay as you go
Load Balancing Layer • 1. Simple round-robin algorithm • 2. Health checks, SSL termination • 3. ELB is a highly available service with no SPOF [Diagram: Online / Web / Mobile clients, AWS Elastic Load Balancer, Web/App Server and MySQL DB inside AWS Security Groups in US-EAST-1a]
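The round-robin and health-check behaviour named on this slide can be sketched as follows. This is a minimal illustration of the idea, not ELB's actual implementation; the class name and backend names are hypothetical.

```python
class RoundRobinBalancer:
    """Cycle through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # position of the next backend to try

    def mark(self, backend, is_healthy):
        """Record the result of a health check for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend in round-robin order."""
        n = len(self.backends)
        for _ in range(n):
            backend = self.backends[self._next % n]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [lb.pick() for _ in range(3)]
lb.mark("web-2", False)  # a failed health check takes web-2 out of rotation
after_failure = [lb.pick() for _ in range(3)]
```

Requests keep flowing to the remaining servers while the failed one is out of rotation, which is why the load balancer itself must not be a single point of failure.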
High Availability @ Web/App Tier • 1. Add AWS Auto Scaling to the Web/App tier • 2. Tie AWS Auto Scaling to AWS ELB • 3. Deploy the app using Puppet [Diagram: AWS Elastic Load Balancer, Web/App Server with Auto Scaling, S3, Puppet, Integration Service Tier and MySQL DB inside AWS Security Groups in US-EAST-1a]
Designing HA @ Web/App Tier • AWS Auto Scaling detects and replaces unhealthy EC2 instances • AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running • In the event of a failure, new instances are launched automatically within 30-120 seconds • ELB traffic is routed seamlessly to the Auto Scaled EC2 instances
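The "keep a minimum number of healthy instances running" behaviour can be pictured as a reconcile loop. The sketch below only models the idea in memory; the `ScalingGroup` and `Instance` types and the instance id format are hypothetical and do not call any AWS API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class ScalingGroup:
    min_size: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), init=False, repr=False)

    def reconcile(self) -> int:
        """Drop unhealthy instances, then launch replacements until
        min_size is reached. Returns the number of instances launched."""
        self.instances = [i for i in self.instances if i.healthy]
        launched = 0
        while len(self.instances) < self.min_size:
            self.instances.append(Instance(f"i-{next(self._ids):04d}"))
            launched += 1
        return launched


group = ScalingGroup(min_size=3)
group.reconcile()                   # initial launch up to min_size
group.instances[0].healthy = False  # simulate a failed instance
replaced = group.reconcile()        # the failed instance is replaced
```

In the real service this loop runs continuously, and newly launched instances are registered with the load balancer before they receive traffic.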
Designing HA @ Web/App Tier • Deploy the application and patches in the Auto Scaling environment using Puppet / S3 scripts • Choose the right EC2 instance type – Large (less CPU intensive, heap 5.5 GB RAM) – High CPU Extra Large (more CPU intensive, heap 5.5 GB RAM, concurrent GC) • Points to remember – Do not store session state in the memory of the web/app server – Rotate the log files and move them to S3 periodically – Move uploaded data files and images to S3 or GlusterFS
What happens when the US-EAST-1a AZ fails? Solution: Leverage the AWS Multi-AZ architecture
Multi-AZ Architecture • 1. Infrastructure is spread across multiple AZs of AWS inside a Region • 2. AWS Elastic Load Balancer directs requests to EC2 instances across multiple AZs • 3. Amazon Auto Scaling automatically launches new EC2 instances across multiple AZs • 4. No code changes are required to leverage Multi-AZ [Diagram: HTTP/S requests hit the AWS Elastic Load Balancer from the browser or mobile devices; Web/App EC2 instances with Auto Scaling in AZ US-EAST-1a and AZ US-EAST-1b inside AWS Security Groups]
High Availability @ Web/App/DEX Layer • AZs are connected by a low-latency network • AZs are insulated from failures in other Availability Zones • AWS Auto Scaling can manage EC2 instances across AZs • AWS ELB can direct load to EC2 instances across AZs • AWS CloudWatch can monitor EC2 instance availability across AZs
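To see what an AZ failure costs in capacity, the sketch below spreads a desired instance count evenly across AZs and computes what survives the loss of one AZ. The even split is a simplifying assumption for illustration, and the function names are hypothetical.

```python
def spread_across_azs(desired: int, azs: list[str]) -> dict[str, int]:
    """Split `desired` instances as evenly as possible across the AZs."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(desired, len(azs))
    # The first `extra` AZs receive one additional instance each.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


def capacity_after_az_loss(placement: dict[str, int], failed_az: str) -> int:
    """Instances still running when `failed_az` becomes unavailable."""
    return sum(n for az, n in placement.items() if az != failed_az)


placement = spread_across_azs(5, ["us-east-1a", "us-east-1b"])
surviving = capacity_after_az_loss(placement, "us-east-1a")
```

With two AZs, losing one removes roughly half the fleet until replacements launch in the surviving AZ, so the minimum group size should account for that gap.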
Database Tier • Options – MySQL Master-Slave replication – MySQL NDB Cluster – RDS MySQL Master-Standby – RDS MySQL Master-Standby + Read Replicas
High Availability @ DB Layer • 1. Read Replicas are launched in multiple AZs for HA • 2. The RDS Standby is launched in a different AZ from the RDS Master for HA • 3. Web/App hosted on Amazon EC2 transacts with the RDS Master and reads from the Read Replicas [Diagram: AWS Elastic Load Balancer; Web/App EC2 instances with Auto Scaling in US-EAST-1a and US-EAST-1b inside AWS Security Groups; S3; RDS Master, RDS Standby and two Read Replicas; CloudWatch]
High Availability @ DB Layer • RDS Master and RDS Standby are in multiple AZs for HA • Read Replicas are in multiple AZs for HA • Offers no SPOF at the AZ level • Read Replicas can be launched or terminated without affecting RDS Master availability • In the event of an RDS Master failure, the RDS Standby is promoted automatically • Promotion takes under 180 seconds and requires no changes in the application
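Why promotion needs no application change can be shown with a small model: the application connects to one endpoint name, and only the AZ behind it changes on failover. This is a simplified sketch, assuming the old master simply becomes the standby; the class and the endpoint name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultiAZDatabase:
    endpoint: str     # name the application connects to (hypothetical)
    primary_az: str   # AZ currently serving reads and writes
    standby_az: str   # AZ holding the standby copy

    def fail_over(self) -> None:
        """Promote the standby. The endpoint name is left untouched,
        so application configuration does not change."""
        self.primary_az, self.standby_az = self.standby_az, self.primary_az


db = MultiAZDatabase("shop-db.example.internal", "us-east-1a", "us-east-1b")
endpoint_before = db.endpoint
db.fail_over()
```

After the call, `db.primary_az` is `us-east-1b` while `db.endpoint` is unchanged.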
High Availability @ DB Layer • DB snapshots and MySQL dumps are available • Automatic full backups at configured maintenance windows • Point-in-time recovery up to the last minute • Recovery might require configuration changes in the app layer
High Availability @ DB Layer • Points to remember – RDS supports only the MySQL InnoDB engine – Give more memory to the RDS Master • Use Extra Large or High Memory instance types – Keep your Read Replicas and RDS Master the same size – Multiple Read Replicas can be load balanced using HAProxy
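The read/write split described for this tier (writes to the master, reads spread over the replicas) can be sketched at the application level as below. Detecting reads by a `SELECT` prefix is a deliberate simplification, and the router class and endpoint names are hypothetical; the deck itself suggests HAProxy for balancing the replicas.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the master and rotate reads across replicas."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
targets = [
    router.endpoint_for("SELECT * FROM orders"),
    router.endpoint_for("select id FROM users"),
    router.endpoint_for("UPDATE orders SET status = 'paid'"),
    router.endpoint_for("SELECT 1"),
]
```

Because replication to the replicas is asynchronous, reads that must see a just-committed write should still be sent to the master.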
Use AWS Building Blocks • AWS building blocks are built with – Inherent fault tolerance – HA and scalability • The following building blocks were used – S3, CloudFront, Route 53, CloudWatch, SNS, SQS, SES, ELB, EIP, EBS
Application Architecture in AWS [Diagram components: Browser / Web Services / Mobile; Route 53; AWS CloudFront CDN; Elastic Load Balancer; Amazon EC2 servers with Auto Scaling in AZ US-EAST-1a and AZ US-EAST-1b inside AWS Security Groups; ElastiCache; DB Master, DB Standby, Read Slave 1, Read Slave 2; S3; SQS; Puppet; AWS Simple Email Service; AWS Simple Notification Service (alerts); CloudWatch]
How is it used in the project? • ELB – Load balancing • Route 53 – DNS mappings, round-robin (RR) algorithm • CloudFront – Assets, HTML, CSS, JS, images • S3 – Logs, snapshots, images • CloudWatch – Monitors CPU, ELB, RDS and custom metrics • SNS – System alerts • SES – Emails (password, activation, app alerts) • EBS – EBS-backed AMI for the Web/App tier • EIP – Elastic IP for the Puppet server
What happens if an entire AWS region is affected? Solution: Design HA/DR across Regions
High Availability across AWS Regions • The main web site is hosted in the AWS Singapore region • The DR web site is hosted in AWS Tokyo
DR / HA Options in AWS (recovery time vs. relative cost) • Cold DR: > few hours, $ • Warm DR: > 1-2 hours, $$ • Hot DR: in minutes, $$$ • Hot Active: no downtime, $$$$
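Choosing among these options is a cost versus RTO trade-off: pick the cheapest tier whose recovery time still meets the business requirement. In the sketch below the RTO values in minutes are assumed readings of "few hours", "1-2 hours" and "minutes", and the cost ranks simply count the dollar signs; neither is a figure from the project.

```python
# (name, assumed typical RTO in minutes, relative cost rank)
DR_OPTIONS = [
    ("Cold DR", 240, 1),
    ("Warm DR", 120, 2),
    ("Hot DR", 10, 3),
    ("Hot Active", 0, 4),
]


def cheapest_option(max_rto_minutes: float) -> str:
    """Return the lowest-cost DR option whose assumed RTO fits the target."""
    if max_rto_minutes < 0:
        raise ValueError("RTO target cannot be negative")
    candidates = [o for o in DR_OPTIONS if o[1] <= max_rto_minutes]
    return min(candidates, key=lambda o: o[2])[0]


choice_for_half_hour = cheapest_option(30)
```

A real decision would also weigh RPO, which depends on how data is replicated in each tier.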
Cold DR [Diagram: Amazon Route 53 in front of an Active site and a Passive site (AWS Singapore, AWS Tokyo). Each site has an ELB, Web/App EC2 instances and a Database Layer (Master, Standby); Puppet. DB snapshots / dumps are synced every X hours.]
Cold DR • When the primary is down, the entire secondary site is manually activated • RTO: > few hours to get the secondary site up and running • RPO: data loss is acceptable • CloudFormation templates can be configured on the primary and secondary sites • AMIs, app and DB data are synced periodically
Cold DR • EIP problem: Elastic IPs are tied to a region, which affects integration services (FTP, Web Services) • Cost effective • Most common
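With periodic snapshot sync, the data at risk depends on the sync interval and on how long a snapshot takes to reach the secondary region. The sketch below works through that arithmetic; the interval and copy duration used in the example are hypothetical, since the deck leaves the interval as "X hours".

```python
import math


def worst_case_rpo_hours(interval_h: float, copy_h: float) -> float:
    """Worst case: the primary fails just before the next snapshot
    finishes copying, so the newest usable copy is interval + copy old."""
    return interval_h + copy_h


def data_loss_hours(failure_time_h: float, interval_h: float,
                    copy_h: float) -> float:
    """Hours of data lost if the primary fails at failure_time_h.
    Snapshots are taken at 0, interval, 2*interval, ... and each becomes
    usable in the secondary region copy_h hours after it was taken."""
    k = math.floor((failure_time_h - copy_h) / interval_h)
    if k < 0:
        return failure_time_h  # no snapshot has arrived yet
    return failure_time_h - k * interval_h


# Hypothetical schedule: snapshot every 6 hours, 1 hour to copy.
loss_right_after_copy = data_loss_hours(7, 6, 1)
loss_just_before_copy = data_loss_hours(6.5, 6, 1)
```

Shortening the interval or the copy time lowers the RPO, at the price of more transfer and storage.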
Warm DR [Diagram: Amazon Route 53 in front of an Active site and a Passive site (AWS Singapore, AWS Tokyo). Each site has an ELB, Web/App EC2 instances, a Database Layer (Master, Standby) and Puppet. Databases are replicated asynchronously between AWS regions.]
Warm DR • When the primary is down, the secondary site is manually activated • RTO: > 1 hour to get the secondary site up and running • RPO: minimal data loss is acceptable • CloudFormation templates can be configured on the primary and secondary sites • DB data is replicated asynchronously • Only the DB and Puppet servers are ready and running
Warm DR • AMIs, application patches and deployments are managed through Puppet • EIP problem – Integration services (FTP, Web Services) • Costlier than Cold DR • Recommended in many use cases
Hot DR [Diagram: Amazon Route 53 in front of an Active site and a Passive site (AWS Singapore, AWS Tokyo). Each site has an ELB, Web/App EC2 instances, a Database Layer (Master, Standby) and Puppet. Databases are replicated asynchronously between AWS regions.]
Hot DR • When the primary is down, the secondary site is activated • RTO: a few minutes to get the secondary site up and running • RPO: very minimal data loss is acceptable • CloudFormation templates can be configured on the primary and secondary sites • All tiers are ready and running in the secondary site but are not serving live transactions
Hot DR • DB data is replicated asynchronously • AMIs, application patches and deployments are managed through Puppet • EIP problem – Integration services (FTP, Web Services) • Costlier than Warm DR • Rarely used
Hot Active [Diagram: Directional DNS / traffic through Amazon Route 53 to two Active sites, AWS Singapore and AWS Tokyo. Each site has an ELB, Web/App EC2 instances, a Database Layer (Master, Standby) and Puppet. Databases are replicated asynchronously in both directions between AWS regions.]
Hot Active-Active • Both the primary and secondary sites are active • RTO: a few seconds to direct traffic from the primary to the secondary site • RPO: negligible data loss • A managed DNS server provides automatic failover at the DNS level in case of an outage at the primary website location • Transparent switch between the websites hosted in AWS Singapore and AWS Tokyo within 30-60 seconds during an outage
Hot Active-Active • Automatic traffic diversion to the nearest site location • Managed/Directional DNS servers are a globally distributed and highly available service • Persistent data is replicated asynchronously (2-way) • AMIs, application patches and deployments are managed through distributed Puppet • EIP problem – Integration services (FTP, Web Services) • Use case specific
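The directional DNS behaviour (send each client to the nearest site, and fail over when a site is down) can be sketched as a selection over healthy sites by latency. This is a toy model of the decision only, not how any particular managed DNS service is implemented; the latency numbers are hypothetical.

```python
def resolve(latencies_ms: dict[str, float], healthy: set[str]) -> str:
    """Return the healthy site with the lowest latency for this client."""
    candidates = {site: ms for site, ms in latencies_ms.items()
                  if site in healthy}
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies seen by one client.
client_latencies = {"singapore": 20.0, "tokyo": 70.0}

normal = resolve(client_latencies, healthy={"singapore", "tokyo"})
during_outage = resolve(client_latencies, healthy={"tokyo"})
```

In practice the switch is bounded by DNS TTLs and health-check intervals, which is where a 30-60 second failover window comes from.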
Hot Active-Active • The website deployed in both regions can scale and shrink according to load • Cost effective for large server farm deployments • Low latency achieved through traffic direction • No customers are lost because of load or availability problems. Ops are happy!
Hot Active-Active • Technically complex and intricate setup • Costlier to build and operate (sophistication comes at a cost) • No unified infrastructure management currently exists for this architecture – Example: Directional DNS console – AWS console – Puppet console
Summary • How to architect HA on AWS for the LAMJ website case • AWS building blocks for HA and fault tolerance • How to achieve High Availability across AWS Availability Zones (AZs) • How to achieve High Availability across AWS regions
Need help architecting High Availability solutions on AWS?
Leave it to the experts, we will handle this • Cloud Architecture Consulting • Cloud Application Development • Cloud Migration & Implementation • Cloud Adoption Strategy “Let's get the job done”
Q&A “All you need is an idea and the cloud will execute it for you.” (Structure 2010 event) - Dr Werner Vogels, CTO of Amazon, on 8KMiles. Contact: cloud@8KMiles.com harish@8KMiles.com www.twitter.com/harish11g http://www.linkedin.com/in/harishganesan