Traditional disaster recovery (DR) has had a spotty record for enterprises. This session compares conventional approaches to DR with approaches that use the AWS cloud, and describes the four ascending levels of AWS DR options along with the benefits and tradeoffs among them. The session then discusses backup and restore architectures, including partner products and solutions that assist in backup, recovery, DR, and continuity of operations (COOP).
2. Session agenda
Context: on-premises Disaster Recovery (DR) using AWS
Why AWS for recovery of on-premises IT infrastructure
The ascending levels of DR
DR/Continuity scenarios
Demo
Q&A
3. Terminology
Business Continuity
Business Continuity ensures that an organization's critical business functions continue to operate or recover quickly despite serious incidents.
Disaster Recovery
Disaster Recovery (DR) enables the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.
Recovery Time Objective (RTO)
RTO is a targeted duration in which a business process must be restored after a disaster or disruption.
Recovery Point Objective (RPO)
RPO is the maximum targeted period in which data might be lost from an IT service due to a major incident.
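To make the RPO definition concrete: with periodic backups, everything written after the last completed backup is lost, so the data loss window for an incident is the time since that backup, and in the worst case it approaches the backup interval. The sketch below assumes a hypothetical nightly schedule; the function names are illustrative, not part of any AWS API.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, incident_time):
    """Elapsed time between the last completed backup and the incident.

    Everything written after the most recent backup at or before the
    incident is lost, so this is the effective data loss for that incident.
    """
    completed = [t for t in backup_times if t <= incident_time]
    if not completed:
        raise ValueError("no backup completed before the incident")
    return incident_time - max(completed)


def meets_rpo(backup_times, incident_time, rpo):
    """True if the data loss window for this incident fits within the RPO."""
    return data_loss_window(backup_times, incident_time) <= rpo


# Hypothetical schedule: one nightly backup completing at 01:00.
backups = [datetime(2024, 1, day, 1, 0) for day in range(1, 8)]
incident = datetime(2024, 1, 7, 23, 0)

print(data_loss_window(backups, incident))                # 22:00:00
print(meets_rpo(backups, incident, timedelta(hours=24)))  # True
print(meets_rpo(backups, incident, timedelta(hours=4)))   # False
```

A nightly schedule can meet a 24-hour RPO but not a 4-hour one; tightening RPO means backing up or replicating more often, which is what the higher DR levels do.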
6. History of DR
Traditional DR has posed many challenges for enterprises:
Building and maintaining regional data centers
Failed DR tests
Not meeting RPO & RTO
High technical debt
7. AWS compared to traditional disaster recovery
Conventional
High cost to build disaster recovery sites or data centers (CAPEX)
High cost of storage, backup, archival and retrieval tools, and processes (OPEX)
Difficult planning, procurement and deployment
Challenging to verify DR plans
Single level of DR across the organization
AWS
Low upfront investment (CAPEX)
On-demand costs (OPEX)
Consistent experience across AWS environments
Recovery automation
Separate levels of DR per application or business unit
8. DR topology map
Figure: DR topology map. Your data centers (DNS, load balancers, web/app servers, AD/authentication, database servers) alongside their AWS counterparts (Route 53, ELB/appliance, EC2/Auto Scaling, AD failover nodes, DB failover nodes) spread across Availability Zones, with multi-region disaster recovery.
9. Ascending levels of DR options
Backup & Restore: backup of on-premises data to AWS to use in a DR event. RPO 24 hours, RTO 24 hours, cost $.
Pilot Light: replicate data and minimal running services into AWS, ready to take over and flare up. RPO 12 hours, RTO 4 hours, cost $$.
Warm Standby: replicate data and services into AWS, ready to take over. RPO 1-4 hours, RTO 15 min, cost $$$.
Multi-Site: replicated and load-balanced environments that are both actively taking production traffic. RPO <15 min, RTO 0-5 min, cost $$$$.
Moving up the levels, business continuity begins, culminating in uninterrupted business continuity with Multi-Site.
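One way to use this ladder is to pick the cheapest level whose RPO and RTO both satisfy a workload's requirements. The sketch below encodes the slide's figures, assuming the first value listed for each level is the RPO and the second is the RTO, and representing ranges such as "1-4 hours" or "<15 min" by their upper bound to stay conservative. `DRLevel` and `cheapest_level` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DRLevel:
    name: str
    rpo: timedelta  # worst-case RPO (upper bound of the slide's range)
    rto: timedelta  # worst-case RTO (upper bound of the slide's range)
    cost_rank: int  # 1 = $ ... 4 = $$$$


LEVELS = [
    DRLevel("Backup & Restore", timedelta(hours=24), timedelta(hours=24), 1),
    DRLevel("Pilot Light", timedelta(hours=12), timedelta(hours=4), 2),
    DRLevel("Warm Standby", timedelta(hours=4), timedelta(minutes=15), 3),
    DRLevel("Multi-Site", timedelta(minutes=15), timedelta(minutes=5), 4),
]


def cheapest_level(required_rpo, required_rto, levels=LEVELS):
    """Lowest-cost level meeting both objectives, or None if none does."""
    candidates = [
        lvl for lvl in levels
        if lvl.rpo <= required_rpo and lvl.rto <= required_rto
    ]
    return min(candidates, key=lambda lvl: lvl.cost_rank, default=None)


print(cheapest_level(timedelta(hours=24), timedelta(hours=24)).name)  # Backup & Restore
print(cheapest_level(timedelta(hours=4), timedelta(hours=1)).name)    # Warm Standby
print(cheapest_level(timedelta(minutes=5), timedelta(minutes=5)))     # None
```

Because each level is chosen per workload, this mirrors the point that AWS allows separate levels of DR per application or business unit rather than a single level for the whole organization.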
10. AWS services added through the levels of DR (Backup & Restore, Pilot Light, Warm Standby, Multi-Site)
Storage: S3, Storage Gateway, Glacier, EBS volumes
Networking: Route 53, Direct Connect, VPN, multiple Direct Connect locations, VPC
Compute: EC2, Auto Scaling, ELB
Deployment/Management: CloudFormation, IAM
11. Backup and restore architecture
Figure: Backup and restore architecture. Internet traffic for www.example.com reaches the on-premises active production environment in the corporate data center (web servers, app servers, DB server, backup system, iSCSI Storage Gateway, 1 TB data volume). A VPN connection links it to an S3 bucket and a Glacier archive in the AWS region for AWS DR failover.
~$200/month in US-EAST, plus VPN
S3 (1 TB): $31/month
Glacier (2 TB): $22/month
Storage Gateway: $125/month
12. Backup and restore details
Suitable for:
• Solutions that can sustain higher technical debt
• Workloads of lower business criticality
• Low-cost DR option
Leverage existing investments in
• De-duplication
• Compression
• WAN Acceleration
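De-duplication and compression matter here because they shrink what crosses the VPN each backup cycle. The sketch below shows the general idea with fixed-size chunking: only chunks whose SHA-256 digest has not been seen before are compressed and "uploaded". It is an illustration of the technique, not how any particular product or Storage Gateway works; `store` is a plain dict standing in for the backup target, and the chunk size is an arbitrary assumption.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often vary chunk boundaries


def dedup_and_compress(data, store, chunk_size=CHUNK_SIZE):
    """Back up `data` into `store`, sending only chunks not already stored.

    Returns (recipe, bytes_sent): the recipe is the ordered list of chunk
    digests needed to rebuild the data; bytes_sent counts only the
    compressed size of chunks that were new.
    """
    recipe, bytes_sent = [], 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            compressed = zlib.compress(chunk)
            store[digest] = compressed
            bytes_sent += len(compressed)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original bytes from a recipe."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


store = {}
day1 = bytes(i % 251 for i in range(16384))  # 16 KiB of sample data
recipe1, sent1 = dedup_and_compress(day1, store)

day2 = day1 + b"new records"  # next day: same data plus a small append
recipe2, sent2 = dedup_and_compress(day2, store)

assert restore(recipe2, store) == day2
print(sent1, sent2)  # the second run sends only the compressed new tail chunk
```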
13. Pilot light architecture
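Figure: Pilot light architecture. Route 53 resolves www.example.com to the on-premises active production environment in the corporate data center (1 TB data volume). Data is replicated over Direct Connect to a DB server with a 1 TB data volume in the AWS region, which also holds web servers and app servers ready to become AWS active production.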
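Route 53 is what shifts www.example.com from the corporate data center to AWS. A common pattern is a failover record pair: a PRIMARY record tied to a health check on the on-premises endpoint and a SECONDARY record for AWS. The sketch below builds such a `ChangeBatch` in the shape accepted by the Route 53 `ChangeResourceRecordSets` API (boto3's `change_resource_record_sets`). The IP addresses, set identifiers, and health check ID are hypothetical placeholders; in practice the AWS side is often an alias record pointing at an ELB instead of a fixed IP.

```python
def failover_change_batch(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with a PRIMARY/SECONDARY failover pair.

    The primary (on-premises) record is associated with a health check; when
    the check fails, Route 53 answers with the secondary (AWS) record.
    A short TTL lets resolvers pick up the switch quickly.
    """
    def record(set_id, role, ip, extra):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "DR failover records for " + name,
        "Changes": [
            record("on-premises", "PRIMARY", primary_ip,
                   {"HealthCheckId": health_check_id}),
            record("aws-dr", "SECONDARY", secondary_ip, {}),
        ],
    }


def apply_failover_records(route53, hosted_zone_id, change_batch):
    """Submit the batch; `route53` is expected to be boto3.client("route53")."""
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# Hypothetical values: documentation IP ranges and a placeholder health check ID.
batch = failover_change_batch(
    "www.example.com", "203.0.113.10", "198.51.100.20", "example-health-check-id"
)
```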
14. Pilot light architecture
Figure: Pilot light architecture with costs. Route 53 resolves www.example.com to the on-premises active production environment in the corporate data center (web servers, app servers, DB server, 1 TB data volume). Data is replicated over Direct Connect to the AWS region, which holds ELB-fronted web servers and app servers and a DB server with a 1 TB data volume.
$309/month in US-EAST, plus Direct Connect
EBS (GP2): $100/month
EC2 (m4.xlarge): $205/month
EC2 (t2.medium): $0/month
ELB (100 GB data): $0/month
EC2 (t2.small): $0/month
ELB (100 GB data): $0/month
Route 53 (1M queries): $4/month
CloudFormation
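The $0 web and app tier line items suggest those servers are not running until a DR event. Assuming they exist as stopped EC2 instances (an alternative, suggested by CloudFormation on the slide, is to create them from a template at failover time), "flaring up" the pilot light can be as simple as starting them and waiting until they run. The sketch uses the boto3 EC2 client calls `start_instances` and the `instance_running` waiter; the client is passed in so the function can be exercised without AWS credentials, and `flare_up` is a hypothetical name.

```python
def flare_up(ec2, instance_ids, wait=True):
    """Start the stopped web and app tier instances of a pilot light site.

    `ec2` is expected to be boto3.client("ec2"). Returns the instance IDs
    that EC2 reports as starting.
    """
    ids = list(instance_ids)
    if not ids:
        return []
    response = ec2.start_instances(InstanceIds=ids)
    if wait:
        # Block until EC2 reports every instance in the "running" state.
        ec2.get_waiter("instance_running").wait(InstanceIds=ids)
    return [item["InstanceId"] for item in response["StartingInstances"]]


# Usage (hypothetical instance IDs):
#   import boto3
#   flare_up(boto3.client("ec2"), ["i-0123456789abcdef0", "i-0fedcba9876543210"])
```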
15. Pilot light details
Considerations
Suitable for:
• Solutions that need lower RTO & RPO
• Workloads of higher business criticality
• Mid-range cost DR option
3rd Party & Marketplace
• CloudEndure
• Racemi
• Zerto
• Others
16. Warm standby architecture
Figure: Warm standby architecture. Route 53 resolves www.example.com to the on-premises active production environment in the corporate data center (web servers, app servers, DB server, 1 TB data volume). Data is replicated over Direct Connect to the AWS region, which runs ELB-fronted web servers and app servers and a DB server with a 1 TB data volume.
$410/month in US-EAST, plus Direct Connect
EBS (GP2): $100/month
EC2 (m3.xlarge): $205/month
EC2 (t2.medium): $41/month
ELB (100 GB data): $19/month
EC2 (t2.small): $22/month
ELB (100 GB data): $19/month
Route 53 (1M queries): $4/month
CloudFormation
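In warm standby the AWS tiers are already running at reduced size, so recovery means growing them to production capacity. If a tier is managed by an Auto Scaling group, that is a single `update_auto_scaling_group` call on the boto3 Auto Scaling client. The sketch raises `MinSize` along with `DesiredCapacity` so scale-in policies cannot shrink the group mid-event. The group name and capacities are hypothetical; the client is injected so the logic can be tested offline.

```python
def scale_to_production(autoscaling, group_name, production_capacity, max_size=None):
    """Grow a warm standby Auto Scaling group to full production capacity.

    `autoscaling` is expected to be boto3.client("autoscaling"). MinSize is
    raised together with DesiredCapacity so the group is not scaled back in
    while it carries production traffic.
    """
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    max_size = production_capacity if max_size is None else max_size
    if max_size < production_capacity:
        raise ValueError("max_size cannot be below production_capacity")

    settings = {
        "MinSize": production_capacity,
        "MaxSize": max_size,
        "DesiredCapacity": production_capacity,
    }
    autoscaling.update_auto_scaling_group(AutoScalingGroupName=group_name, **settings)
    return settings


# Usage (hypothetical group name):
#   import boto3
#   scale_to_production(boto3.client("autoscaling"), "warm-standby-web", 6, max_size=12)
```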
17. Multi-site architecture
Figure: Multi-site architecture. Route 53 resolves www.example.com across both sites: the on-premises active production environment in the corporate data center (web servers, app servers, DB server, 1 TB data volume) and AWS active production in the AWS region (ELB, web servers, app servers, DB server, 1 TB data volume), linked by Direct Connect with data replication.
$473/month in US-EAST, plus Direct Connect
EBS (GP2): $100/month
EC2 (m3.xlarge): $205/month
EC2 (t2.medium): $82/month
ELB (100 GB data): $19/month
EC2 (t2.small): $44/month
ELB (100 GB data): $19/month
Route 53 (1M queries): $4/month
CloudFormation
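The monthly figures on the four architecture slides can be checked by summing their line items. The sketch below transcribes those items as listed (Direct Connect and VPN charges are excluded, as they are on the slides) and prints each total with the increase over the previous level.

```python
# Monthly line items (USD) transcribed from the architecture slides.
# Direct Connect and VPN charges are not included, matching the slides.
LINE_ITEMS = {
    "Backup & Restore": [
        ("S3 (1 TB)", 31), ("Glacier (2 TB)", 22), ("Storage Gateway", 125),
    ],
    "Pilot Light": [
        ("EBS (GP2)", 100), ("EC2 (m4.xlarge)", 205), ("EC2 (t2.medium)", 0),
        ("ELB (100 GB data)", 0), ("EC2 (t2.small)", 0), ("ELB (100 GB data)", 0),
        ("Route 53 (1M queries)", 4),
    ],
    "Warm Standby": [
        ("EBS (GP2)", 100), ("EC2 (m3.xlarge)", 205), ("EC2 (t2.medium)", 41),
        ("ELB (100 GB data)", 19), ("EC2 (t2.small)", 22), ("ELB (100 GB data)", 19),
        ("Route 53 (1M queries)", 4),
    ],
    "Multi-Site": [
        ("EBS (GP2)", 100), ("EC2 (m3.xlarge)", 205), ("EC2 (t2.medium)", 82),
        ("ELB (100 GB data)", 19), ("EC2 (t2.small)", 44), ("ELB (100 GB data)", 19),
        ("Route 53 (1M queries)", 4),
    ],
}


def monthly_totals(line_items):
    """Sum the cost of every line item for each DR level."""
    return {level: sum(cost for _, cost in items) for level, items in line_items.items()}


totals = monthly_totals(LINE_ITEMS)
previous = None
for level, total in totals.items():
    step = "" if previous is None else f" (+${total - previous} over the previous level)"
    print(f"{level}: ${total}/month{step}")
    previous = total
```

This prints $178, $309 (+$131), $410 (+$101), and $473 (+$63). The last three match the slide headlines exactly; the backup and restore items add up to $178, which the slide rounds to ~$200.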
18. Warm standby and multi-site details
Considerations
Suitable for:
• Solutions that require RTO & RPO in minutes
• Core business-critical functions
• Higher cost DR option
Partners
• Partner ecosystem
19. Lessons Learned
3rd Party solutions
Partner engagement
Opportunity to automate technical debt
Customer experiences
Speaker notes
Briefly introduce some of the things we will do
Grab attention with a $1000 giveaway
Describe how it will work with a partner engagement
We are not discussing BC; however, we will discuss Disaster Recovery, which is part of BC
BC is the business functions recovery model
DR covers the technology and infrastructure systems
There will be more questions as we get into the panel discussion during the Q&A
Tell the story of a friend, now the CEO of a mid-sized enterprise, who lost his entire office, data center, and building in a fire; after telling the story, relate it to RPO and RTO.
It was a small company with only a few employees at the time; it is now thousands strong.
There are many options and variations for setting up disaster recovery. Your business requirements, such as RPO and RTO, drive a lot of this.
Most of the DR scenarios depend on these two key metrics.
If it harms critical business processes, it may be a disaster
Time-based definition – how long can the business stand the pain?
Think about the probability of occurrence
Fire, flood, hurricane, tornado, earthquake, volcanoes
Plane crashes, vandalism, terrorism, riots, sabotage, loss of personnel, etc.
Anything that diminishes or destroys normal data processing capabilities
User error / corruption / hacking attack
User-initiated threat
High Availability in the context of corrupt data
Systems corruption (the systems stop functioning)
Discuss my past experience designing and building out traditional DR data centers, and the complexity that came from those scenarios.
Lead into the next slide that shows the advantages of AWS
Very manual process.
Challenges with high technical debt and runbooks for executing a DR
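Part of the technical debt in traditional DR is the manual runbook. Encoding the runbook as ordered, checkable steps is a first move toward the recovery automation AWS enables. The sketch below is a minimal, hypothetical runner: each step is a name plus a callable that returns True on success, and execution stops at the first failure so an operator knows exactly where recovery halted. The step names and placeholder lambdas are illustrative; real steps would call scripts or AWS APIs.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Execute DR runbook steps in order, stopping at the first failure.

    Returns (completed_step_names, failed_step_name); the second value is
    None when every step succeeded.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical steps with placeholder checks standing in for real actions.
steps = [
    ("verify replicated data is current", lambda: True),
    ("start web and app tiers", lambda: True),
    ("switch DNS to the DR site", lambda: False),
    ("notify stakeholders", lambda: True),
]
done, failed = run_runbook(steps)
print(done)    # ['verify replicated data is current', 'start web and app tiers']
print(failed)  # switch DNS to the DR site
```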
Conventional vs AWS
High upfront capex
Multi-Region vs. Multi Data Center messaging / Geographic separation
Mention Data Guard being used with RDS
Compare AZs to DR data centers
Discuss the application to be used throughout all the scenarios
Open-source software is used for all layers
Qualify up front: simple, stateless application
Back up and restore to on-premises or another location.
Options include the AWS Storage Gateway and solutions from partners available from the AWS Marketplace
Same application,
Database replication is the key difference.
Pilot Light architecture
Note the addition of Direct Connect
Costs for Direct Connect are not included
Consider augmenting this with existing technologies
Let's look at the warm standby scenario
The term warm standby is used to describe a DR scenario in which a scaled-down version of a fully functional environment is always up and running.
A warm standby solution extends the pilot light approach and decreases the overall recovery time
Extremely low RTO/RPO
Automation becomes a critical element as you ascend to this level of Disaster Recovery
Multi-Site
Running both sites at once
Database replication going both directions
Costs: Remember that these figures don't include Direct Connect costs.
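With both sites taking production traffic, Route 53 weighted routing is one way to split requests between them: each record gets a weight from 0 to 255, and a record receives traffic in proportion to its weight divided by the sum of weights in the group (if every weight is 0, Route 53 routes to all records with equal probability). The sketch below builds weighted records in the `ResourceRecordSet` shape used by `change_resource_record_sets` and computes the expected traffic shares. The site names and IP addresses are hypothetical.

```python
def weighted_record(name, set_id, ip, weight, ttl=60):
    """One UPSERT change for a Route 53 weighted A record."""
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 record weights must be between 0 and 255")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        },
    }


def traffic_shares(weights):
    """Expected fraction of DNS answers per site for the given weights."""
    if not weights:
        raise ValueError("at least one weighted record is required")
    total = sum(weights.values())
    if total == 0:
        # All-zero weights: Route 53 routes to every record equally.
        return {site: 1 / len(weights) for site in weights}
    return {site: w / total for site, w in weights.items()}


# Hypothetical multi-site setup: split traffic evenly between the two sites.
changes = [
    weighted_record("www.example.com", "on-premises", "203.0.113.10", 50),
    weighted_record("www.example.com", "aws", "198.51.100.20", 50),
]
print(traffic_shares({"on-premises": 50, "aws": 50}))  # {'on-premises': 0.5, 'aws': 0.5}
print(traffic_shares({"on-premises": 0, "aws": 100}))  # {'on-premises': 0.0, 'aws': 1.0}
```

Setting one site's weight to 0 drains it while the other site keeps serving, which is how a multi-site design can take either location out of rotation without an outage.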
Warm standby and multi-site are great options for low RTOs/RPOs
Third-party solutions – do some in-depth investigation of third-party vendors; they aren’t all a good fit for every use case
Technical debt – the costs don’t represent the level of technical debt
Look to automation as an opportunity to remove technical debt.
Customers are using DR as their entry point to AWS
We have many partners in this space that are ready to assist you with these challenges.