In this session, we focus on designing for high availability, with evaluation criteria for using services and features such as Amazon Route 53, Elastic Load Balancing, Auto Scaling, route tables, network interfaces, device clustering, and the Transit VPC architecture. We also explore how to create highly available networking between regions as well as to on-premises environments.
3. What if it’s not that simple?
AWS provides many services to improve availability
4. Common non-webby applications
• Business applications
• Legacy applications
• Requirements for third-party services
• Security
• Networking
• Load Balancing
• Storage
• Must use IP addresses
6. DNS and Auto-Scaling Design
• Old DNS problems are still DNS problems
  • Caching, TTL, client support
• ELB pointers
  • IP addresses change
  • Performs Source NAT
  • Session Stickiness for HTTP/S
  • Supports TCP
  • Minimum failover time is 7-12 seconds
• Route 53 pointers
  • Multi-Region
  • Separate from Auto-Scaling
  • Better for UDP, non-NAT, and simpler workloads
  • Minimum failover time is 10-20 seconds
• Auto-Scaling
  • Publish and use custom metrics when appropriate
  • Lifecycle hooks can assist with instance provisioning
7. Lambda is glue
• Lambda helps fill gaps
• Handles Availability Zone degradation gracefully
• Event driven or scheduled
  • 1 minute minimum frequency
• 1M requests and 400,000 GB-seconds per month in the free tier
• Use Cases
  • Adding interfaces in Auto Scaling groups
  • Adding and removing IP addresses in Route 53
  • Automated failure detection and remediation
  • Detecting new Elastic Load Balancing IP addresses
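As an illustration of the "adding and removing IP addresses in Route 53" use case, here is a minimal Python sketch. It assumes a boto3 Route 53 client is passed in (for example `boto3.client("route53")`); the hosted zone ID, record name, and default TTL are hypothetical values, and how the list of healthy IPs is obtained is left to your own monitoring.

```python
def build_a_record_change(name, ip_addresses, ttl=10):
    """Build a Route 53 change batch that UPSERTs an A record set."""
    if not ip_addresses:
        raise ValueError("an A record needs at least one IP address")
    return {
        "Comment": "updated by failover function",
        "Changes": [{
            "Action": "UPSERT",  # create the record or overwrite it in place
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,
                # De-duplicate and sort so repeated runs produce the same batch.
                "ResourceRecords": [
                    {"Value": ip} for ip in sorted(set(ip_addresses))
                ],
            },
        }],
    }


def sync_record(route53, hosted_zone_id, name, healthy_ips, ttl=10):
    """Point `name` at exactly the healthy IPs (route53 is a boto3 client)."""
    batch = build_a_record_change(name, healthy_ips, ttl)
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch
    )
```

A scheduled or event-driven Lambda handler could call `sync_record` whenever its health checks change the set of healthy addresses.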
8. Networking Tips
• Internet gateways are highly available and don’t have bandwidth limits
• There is one virtual private gateway per VPC, which supports many Direct Connect virtual interfaces and VPN connections
• For Direct Connect, availability and bandwidth are dependent on the port speeds and BGP routing policy
• For VPN, availability is automatically managed with 2 connections which are multi-gigabit in throughput
• Subnets, IP addresses, Elastic Network Interfaces, and NAT Gateways are local to one Availability Zone
10. High Availability Methods
• Agent-based solutions
• DNS
  • Route 53
  • Elastic Load Balancing Sandwich
  • Auto Scaling Group Size 1
• Networking
  • Floating Elastic Network Interface
  • Floating Elastic IP address
  • Route shifting
11. Agent-based solutions
[Diagram: two hosts with Host-based Security; Central Monitoring and Control]
Use Cases
• Highly elastic applications
• DevOps + DevSecOps
• Host IDS / IPS
Design Notes
• Can inspect encrypted data
• Scales with application
• Requires trust in user or application space
• Requires application compatibility
• Increases host overhead
Failover Time
• Variable
13. Route 53 or DNS
Use Cases
• Multi-region applications
• Stateless web front ends
• Applications utilizing UDP
Design Notes
• Client must support DNS
• Application must tolerate DNS caching
• Inbound only
• Multiple routing policies are available
• Outbound return may be asymmetric
Failover Time
• 20+ seconds
[Diagram: Internet, example.com, Route 53, AZ 1 and AZ 2]
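To make the failover figure concrete, here is a rough back-of-the-envelope model in Python. It is an assumption for illustration, not an AWS formula: detection time is taken as the health check interval times the failure threshold, and clients are assumed to keep using a cached answer for up to one full TTL afterwards. Route 53 health checks can run at 10- or 30-second request intervals.

```python
def estimate_dns_failover_seconds(check_interval, failure_threshold, record_ttl):
    """Rough worst case: time to detect the failure plus time for caches to expire."""
    if check_interval <= 0 or failure_threshold <= 0 or record_ttl < 0:
        raise ValueError("interval and threshold must be positive, TTL non-negative")
    detection = check_interval * failure_threshold
    return detection + record_ttl


# Fast (10-second) health checks, a failure threshold of 1, and a 10-second TTL:
print(estimate_dns_failover_seconds(10, 1, 10))  # 20
```

Clients or resolvers that ignore the TTL can stretch this well beyond the estimate, which is why the application needs to tolerate DNS caching.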
14. Elastic Load Balancing Sandwich
Use Cases
• Web Proxies, WAF
• Inbound web security
Design Notes
• Stickiness is available for HTTP/S
• Use X-Forwarded-For for source visibility
• Set a low TTL for faster failover
• Health check the device instead of using a pass-through health check
• May require a worker node to prepare instances for auto-scaling
Failover Time
• 8+ second failover
[Diagram: Internet → example.com → Elastic Load Balancing (ELB) → Auto Scaling group of Proxy, WAF, or Firewall instances → inside.example.com → Elastic Load Balancing (ELB) → Auto Scaling group of Web Servers]
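Because each hop in the sandwich performs source NAT, the web servers only see the client address through X-Forwarded-For. The Python sketch below shows one way to read it, assuming every trusted hop (for example the external ELB, the proxy, and the internal ELB) appends the address of the peer it received the request from. The function name and the `trusted_hops` parameter are hypothetical.

```python
import ipaddress


def client_ip_from_xff(header_value, trusted_hops):
    """Return the client address recorded by the outermost trusted proxy.

    Each proxy appends the address of its peer, so with `trusted_hops`
    trusted proxies in the path, the entry that many places from the right
    is the first one that an outside client could not have forged.
    """
    if trusted_hops < 1:
        raise ValueError("trusted_hops must be at least 1")
    entries = [part.strip() for part in header_value.split(",") if part.strip()]
    if len(entries) < trusted_hops:
        return None  # header is shorter than the known proxy chain
    candidate = entries[-trusted_hops]
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None  # not a valid IPv4 or IPv6 address


# External ELB, proxy, and internal ELB each appended one entry:
print(client_ip_from_xff("203.0.113.7, 10.0.0.12, 10.0.1.30", 3))  # 203.0.113.7
```

Counting from the right rather than taking the leftmost entry means a client-supplied X-Forwarded-For value is ignored.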
15. Auto Scaling Group Size of 1
Use Cases
• Simple HA
• Workloads tolerant of minutes of interruption
• Management consoles
Design Notes
• Effective cost reduction
• Aware of EC2 failures
• Optional addition of ELB health checks
Failover Time
• Minutes, dependent on instance boot time and ELB monitoring
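A group of size 1 is simply an Auto Scaling group whose minimum, maximum, and desired capacity are all 1, spread over subnets in more than one Availability Zone so a replacement can launch elsewhere. The Python sketch below builds the request parameters. The helper name, the group and launch configuration names, and the 300-second grace period are assumptions, and the result is meant to be passed to boto3's `create_auto_scaling_group`.

```python
def single_instance_group_params(name, launch_configuration, subnet_ids,
                                 use_elb_health_check=False, grace_seconds=300):
    """Parameters for an Auto Scaling group that always keeps one instance."""
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Comma-separated subnet IDs; use subnets in different Availability Zones.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # "EC2" reacts to instance failures; "ELB" also honors ELB health checks.
        "HealthCheckType": "ELB" if use_elb_health_check else "EC2",
        "HealthCheckGracePeriod": grace_seconds,
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   params = single_instance_group_params(
#       "mgmt-console", "mgmt-console-lc", ["subnet-aaa", "subnet-bbb"])
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
```

With the ELB health check type, the group would also need the load balancer attached, which this sketch leaves out.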
17. Networking Options
Design considerations:
• VPC API calls are eventually consistent
  • Test it!
• Relies upon user or partner built monitoring
  • Can happen ‘on box’ or ‘off box’
  • Who’s monitoring the monitor?
• Routes, interfaces, and EIPs point to one instance
  • Does this meet your scaling requirement?
18. Floating Elastic Network Interface
Use Cases
• Stateful Applications
• Clustering
• Virtual IP emulation
Design Notes
• Inbound and Outbound Traffic
• Attach EIPs to the border instances for inbound traffic
• Monitoring between instances is required
• Single Availability Zone only
Failover Time
• Timing is subject to the attach-network-interface API request
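The failover step for a floating ENI can be sketched in Python as follows. This assumes a boto3 EC2 client is passed in (for example `boto3.client("ec2")`), that the standby instance is in the same Availability Zone as the interface, and that your own monitoring decides when to call it. The function name, device index, and timeout values are hypothetical.

```python
import time


def move_network_interface(ec2, eni_id, standby_instance_id,
                           device_index=1, timeout=30.0, poll=1.0):
    """Detach `eni_id` from its current instance and attach it to the standby."""
    def describe():
        response = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
        return response["NetworkInterfaces"][0]

    attachment = describe().get("Attachment")
    if attachment:
        if attachment.get("InstanceId") == standby_instance_id:
            return attachment["AttachmentId"]  # already on the standby
        ec2.detach_network_interface(
            AttachmentId=attachment["AttachmentId"], Force=True)

    # VPC API calls are eventually consistent, so poll until the ENI is free.
    deadline = time.monotonic() + timeout
    while describe()["Status"] != "available":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{eni_id} did not detach within {timeout}s")
        time.sleep(poll)

    response = ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=standby_instance_id,
        DeviceIndex=device_index,
    )
    return response["AttachmentId"]
```

The polling loop is where most of the failover time is spent, which matches the note that timing depends on the attach request.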
19. Floating Elastic IP Address
Use Cases
• Similar to Floating ENI
• EIPs are more granular to move
Design Notes
• Monitoring between instances is required
• Costs begin after remapping EIPs over 100 times per month
• EIPs can move between Availability Zones, but will change private addresses
Failover Time
• Timing is subject to the associate-address API request
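Moving a floating Elastic IP address comes down to a single call. The Python sketch below assumes a boto3 EC2 client is passed in and a VPC Elastic IP identified by its allocation ID; the function name is hypothetical and failure detection is left to your monitoring.

```python
def move_elastic_ip(ec2, allocation_id, standby_instance_id):
    """Re-point an Elastic IP at the standby instance and return the association ID."""
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        # Allow the address to be taken from the instance that currently holds it.
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

Since remapping more than 100 times per month incurs charges, a monitor that flaps between instances is worth guarding against.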
20. Route Shifting
Use Cases
• Active-passive solutions in different Availability Zones
• Inline security services
Design Notes
• Outbound traffic
• No clustering or synchronization
• Monitoring between instances is required
• Multiple Availability Zones
Failover Time
• Timing is subject to the replace-route API request
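Route shifting can be sketched in Python as replacing the outbound route in each affected route table so that it targets the standby device's network interface. This assumes a boto3 EC2 client is passed in; the function name, the route table IDs, and the default destination of 0.0.0.0/0 are assumptions for illustration.

```python
def shift_route(ec2, route_table_ids, standby_eni_id, destination="0.0.0.0/0"):
    """Point `destination` at the standby ENI in every listed route table."""
    shifted = []
    for route_table_id in route_table_ids:
        ec2.replace_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=destination,
            NetworkInterfaceId=standby_eni_id,
        )
        shifted.append(route_table_id)
    return shifted
```

Because the standby can sit in a different Availability Zone, the same call works across zones, but no session state moves with the route.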
21. Transit VPC
Use Cases
• Connecting VPCs within a region and across accounts
• Centralize resources back on-premises
Design Notes
• Utilizes Cisco CSR
• Uses tags to automate VPC connectivity with Lambda
• Bandwidth bottleneck at approximately 1.5-2 Gbps
Failover Time
• BGP and DPD timers are 30 seconds
23. VM-Series Next-Generation Firewall on AWS
• Protect your AWS deployment from advanced cyberattacks
• Enforce policy consistency with centralized management
• Automate deployment and policy updates so security keeps pace with the business
[Diagram: AZ1b containing Web1, DB1, Subnet1, and Subnet2]
24. Native AWS and PAN-OS/VM-Series Services Used
AWS Services
• CloudFormation Template: Automates full use case deployments
• S3: AWS service where bootstrapping files are stored
• CloudWatch: Consumes metrics and makes intelligent scale in/out decisions
• Lambda: Code as a service pushes custom metrics to CloudWatch via XML API
• Auto Scaling Groups (ASG): The firewalls are members of a group that scales in/out based on custom metrics
PAN-OS/VM-Series Services
• PAN-OS Bootstrapping: Automates creation of fully configured firewall
• PAN-OS API: Enables delivery of custom metric to CloudWatch
• Panorama: Optional but highly recommended to simplify VM-Series management
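The "push custom metrics to CloudWatch" step might look like the Python sketch below. It assumes a boto3 CloudWatch client is passed in and that the metric value has already been read from the firewall. The namespace, metric name, and dimension choice are hypothetical, and the PAN-OS API call itself is not shown.

```python
def publish_firewall_metric(cloudwatch, group_name, metric_name, value,
                            namespace="Custom/Firewall", unit="Count"):
    """Publish one custom data point that a scaling alarm can watch."""
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{
            "MetricName": metric_name,
            # Tie the data point to the firewall Auto Scaling group.
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
            "Value": float(value),
            "Unit": unit,
        }],
    )
```

A CloudWatch alarm on this metric can then trigger the scaling policy of the firewall group.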
25. [Diagram: Region 1 with AZ1 and AZ2; External ELB, ASG1 and ASG2, Internal ELB, Web ASG]
• Step 1: CFT deploys base topology
• Step 2: Initial firewalls are bootstrapped from S3
• Bootstrapping adds FWs to Panorama
27. [Diagram: Region 1 with AZ1 and AZ2; External ELB, ASG1 and ASG2, Internal ELB, Web ASG]
• Step 5: Lambda function collects PAN-OS metrics via API
• Step 6: Custom metrics sent to CloudWatch
• Step 7: Alarm triggers FW ASG scale events
• Bootstrapping continues to add FWs to Panorama
• Lambda function removes FWs from Panorama
28. [Diagram: Region 1 with AZ1 and AZ2; External ELB, Internal ELB, Web ASG; ASG1 to ASG4 and IELB VIP 1 to IELB VIP 4]
• Step 8: Lambda function monitors for ELB VIP changes
• Step 9: Lambda function deploys new ASG with NAT rule for new VIP
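Monitoring for ELB VIP changes can be approximated by resolving the load balancer's DNS name on a schedule and comparing the result with the addresses seen last time. In the Python sketch below, the function names are hypothetical and where the previous set is stored between runs is left open.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the name currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def diff_addresses(known, current):
    """Return (added, removed) address lists between two observations."""
    known, current = set(known), set(current)
    return sorted(current - known), sorted(known - current)


added, removed = diff_addresses({"10.0.0.4", "10.0.1.4"}, {"10.0.1.4", "10.0.2.9"})
print(added, removed)  # ['10.0.2.9'] ['10.0.0.4']
```

Each address in `added` would be a candidate for a new NAT rule, and each one in `removed` a candidate for cleanup.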
31. Overlays
Use Cases
• When simpler topologies don’t meet requirements
• Multicast
• Abstraction frameworks
Design Notes
• Limitless responsibility
• Security redesign
• Visibility and complexity problems
• Outbound only unless extended outside of the VPC
Failover Time
• Variable
32. Services VPC
Use Cases
• Centralized firewalls, IDS/IPS
• WAN, Security or Shared Services
• Multiple VPCs
Design Notes
• Device must support VPN + NAT
• VPN overhead on devices
• VGW outbound is active/passive and has multi-gigabit bandwidth
• Supports multiple Availability Zones
• Scales to 8-10 VPCs due to VGW IP address overlap without VRFs
• Requires BGP policy design for symmetric routing
Failover Time
• BGP and DPD timers are 30 seconds
[Diagram: Internet, FW in AZ1 and FW in AZ2, VPN to the VGW of each Application VPC]
• Incoming could be EIP, DNS, or Route 53
• Advertising a 0.0.0.0/0 down, VPC advertising CIDR up
• AZ1 routes have shorter path than AZ2
33. Availability Zone Overlay Mesh
Use Cases
• Encryption in transit
• Applications manage their own high availability
Design Notes
• Single or multiple devices per Availability Zone
• A device failure is equivalent to an Availability Zone failure
• Centralized management is recommended
• Cost of devices may be high
Failover Time
• Variable, depending on routing protocol and tunnel
[Diagram: Full Mesh VPN between FWs in AZ1 and AZ2 of the Production, Staging, Development, and DMZ VPCs; Internet, WAN, and On Premises]
35. Customer #1
Scale
• Using VPCs to segment production and development and different organizations – 8 VPCs total
Application mix
• Traffic will be a mix of TCP, UDP, and HTTP
Security
• Firewalls are required between VPCs and to the Internet
• Need centralized control
• Require 1 Gbps of private connectivity to on-premises
36. Customer #1 – Services VPC with Direct Connect
[Diagram: Internet, FW in AZ1 and FW in AZ2, VPN to the VGW of each Application VPC, Direct Connect Private Virtual Interface to the WAN and Datacenter]
• Security Groups within the VPC
• Spoke VPC routes point towards the VPN
• On-premises (RFC 1918) routes point towards Direct Connect
• Traffic to the Internet or other applications goes through the firewall
37. Customer #2
Scale
• 2 Gbps to a single VPC
• Requires high availability and backup for failure
Application
• Mix of lift and shift applications and web applications
Security
• AWS is an ‘untrusted datacenter’; IPS to and from on-premises
• Use AWS Internet, but only for patches and AWS API calls
38. Customer #2 – Encrypted Direct Connect and Outbound Proxy
• Instances have proxies set for outbound HTTP traffic
• Routes to on-premises split between firewalls with VPN connections
• Multiple firewalls and routes to handle load
• Firewalls handle approximately 1.5 Gbps
• Use ENI shifting for additional outbound high availability
[Diagram: AZ1 and AZ2 with Application Subnets, Outbound Proxy Subnets, and four FWs; VPN over Direct Connect to the WAN and Datacenter; Backup VPN; Internet]
39. Closing Thoughts
• Pick the right design for your use case
• Think about inbound vs. outbound, scale, and auto-scaling
• Mix and match designs to meet requirements
• May require segmenting your applications
• Start simple, grow as you need
• Migrate from one design pattern to another