(Presented by cloudpack)
cloudpack is a Premier Consulting Partner of AWS in Japan, and since 2010 has been helping customers architect their workloads for scalability, availability and disaster recovery. In this session, cloudpack explains how it solves customer pain points with AWS architecture best practices. Specifically, the speakers discuss a multi-region disaster recovery system designed for Toyota and a highly available and scalable second screen system for Nippon Television (JoinTV).
How to Host and Manage Enterprise Customers on AWS (ARC213) | AWS re:Invent 2013
1. How to host and manage enterprise customers on AWS:
TOYOTA, Nippon Television, UNIQLO use cases
Kazutaka Goto - Evangelist, cloudpack
Ken Tamagawa - Sr. Manager, Solutions Architecture, Amazon Web Services
November 15, 2013
2. Japan Market
• Tokyo region opened in March 2011
• Tokyo region was the fastest-growing AWS region in its first year
• There are more than 20,000 AWS accounts in Japan
3. cloudpack
• Launched in 2010 as a managed hosting service provider
• Systems integrator for enterprise companies
• Premier Consulting Partner for 2013 & 2014
5. Focus of Today’s Session:
Real world use cases in Japan
Second Screen
Disaster Recovery
6. What is Second Screen?
Second Screen
First Screen
Shazam ran its “2012 Super Bowl advertising platform” on AWS, handling 500,000 requests per second
7. JoinTV
• Interactive communication service by Nippon Television Network, one of Japan’s largest broadcasting companies
• Combines TV and Internet into a unified experience
10. Requirements from JoinTV
• Handle sudden access surges
• Share across social networks with little delay
• Build a scalable system that meets the requirements above
11. Our Approach
• Determine the minimum resources needed to handle traffic surges, using the on-demand load testing service “Neustar”
– Configure load testing settings
– Set up the console and run load tests repeatedly
– Optimize the application and identify the minimum resources
• Minimize delay by making feed requests to social networks asynchronous
15. Architecture for Minimizing Delay
• SQS and a batch server cluster with Auto Scaling share messages to social networks
Diagram labels: Batch, Batch, Auto Scaling
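The queue-plus-batch-workers idea on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not cloudpack's implementation: Python's in-process `queue.Queue` stands in for SQS, threads stand in for the batch servers, and `post_to_social_network` and the scaling rule in `desired_workers` are hypothetical.

```python
import queue
import threading
import time

share_queue = queue.Queue()   # stand-in for an SQS queue (assumption)
posted = []                   # records which jobs the workers processed
posted_lock = threading.Lock()


def handle_viewer_action(user_id, message):
    """Web tier: enqueue the share request and return immediately."""
    share_queue.put({"user_id": user_id, "message": message})
    return "accepted"


def post_to_social_network(job):
    """Hypothetical slow call to a social network API."""
    time.sleep(0.01)
    with posted_lock:
        posted.append(job["user_id"])


def batch_worker():
    """Batch tier: process jobs until a None sentinel arrives."""
    while True:
        job = share_queue.get()
        if job is None:
            break
        post_to_social_network(job)


def desired_workers(queue_depth, jobs_per_worker=100, min_workers=1, max_workers=20):
    """Toy scaling rule: one worker per `jobs_per_worker` queued jobs, clamped."""
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


def run_demo(num_requests=50, num_workers=4):
    """Enqueue requests, let the workers drain them, return how many were posted."""
    before = len(posted)
    workers = [threading.Thread(target=batch_worker) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for i in range(num_requests):
        handle_viewer_action(i, "Watching now!")
    for _ in workers:
        share_queue.put(None)  # one sentinel per worker, queued after all jobs
    for worker in workers:
        worker.join()
    return len(posted) - before
```

The web tier never waits on the social network call, so a surge only lengthens the queue; the worker count can then follow the queue depth.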
24. CDP Template
• Problem
– Wanted to swap servers with a shorter amount of downtime
• Solution
– Use an EIP to swap an existing EC2 instance for a newly launched one
• Implementation
– Use the IP address to swap an existing server for a newly launched server
• Pros
– Can swap servers regardless of DNS TTL
• Notes
• Others
Diagram labels: Route 53, Elastic IP, EC2, EC2
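The Elastic IP swap described in this template could look roughly like the sketch below. It assumes the present-day boto3 EC2 client interface (`associate_address` with `AllowReassociation=True`), which the slides do not mention. `RecordingEc2Client` is a hypothetical stand-in so the example runs without AWS access, and the IDs are placeholders.

```python
class RecordingEc2Client:
    """Stand-in for a boto3 EC2 client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def associate_address(self, **kwargs):
        self.calls.append(kwargs)
        return {"AssociationId": "eipassoc-example"}


def swap_elastic_ip(ec2_client, allocation_id, new_instance_id):
    """Point an existing Elastic IP at a newly launched instance.

    AllowReassociation lets the address move even though it is still
    attached to the old instance, so clients keep using the same IP.
    """
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        AllowReassociation=True,
    )
    return response["AssociationId"]


# Dry run with the stand-in client; with real AWS access one would pass
# boto3.client("ec2") instead (assumption, not shown in the slides).
client = RecordingEc2Client()
association_id = swap_elastic_ip(client, "eipalloc-example", "i-newserver")
```

Because the public IP address does not change, no DNS record is touched, which is why the swap does not depend on DNS TTL.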
25. CDP Categories
• Basic: Snapshot, Stamp, Scale Up, On-demand Disk
• Static Contents: Web Storage, Direct Hosting, Private Distribution, Cache Distribution, Rename Distribution
• Availability: Multi-Server, Multi-Datacenter, Floating IP, Deep Health Check
• Scaling: Scale Out, Clone Server, NFS Sharding, NFS Replica, State Sharing, URL Rewriting, Rewrite Proxy, Cache Proxy, Scheduled Scale Out
• Data Uploading: Write Proxy, Storage Index, Direct Object Upload
• Relational Database: DB Replication, Read Replica, In-memory DB Cache, Sharding Write
• Batch Processing: Queuing Chain, Priority Queue, Job Observer, Scheduled Auto Scaling
• Maintenance: Bootstrap, Cloud DI, Stack Deployment, Server Swapping, Monitoring Integration, Web Storage Archive
• Networking: On-demand NAT, Backnet, Functional Firewall, Operational Firewall, Multi Load Balancer, WAF Proxy, CloudHub
29. Requirements from TOYOTA
• Large-scale migration to AWS of 20 websites that experience traffic surges
– These sites need to interact with the on-premises system
• A system for prompt multi-region disaster recovery
30. Our Approach
• Build an environment able to handle traffic larger than requested by TOYOTA (100-300 million PVs/month)
• Design a disaster recovery process that can be carried out across regions in a short time
– Identify the order of processes
– Prepare OS images (AMIs)
– Write configuration for the AWS CloudFormation template
– Import the latest data from their data center
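To put the 100-300 million PVs/month figure on this slide in perspective, the arithmetic below converts it to page views per second. The 30-day month and the peak factor of 10 are illustrative assumptions, not numbers from the talk.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds, assuming a 30-day month


def average_pv_per_second(pv_per_month):
    """Average page views per second if traffic were perfectly even."""
    return pv_per_month / SECONDS_PER_MONTH


def peak_pv_per_second(pv_per_month, peak_factor):
    """Peak rate, where peak_factor is an assumed peak-to-average ratio."""
    return average_pv_per_second(pv_per_month) * peak_factor


low = average_pv_per_second(100_000_000)
high = average_pv_per_second(300_000_000)
print(round(low, 1), round(high, 1))  # 38.6 115.7

# With an assumed peak factor of 10 (hypothetical):
print(round(peak_pv_per_second(300_000_000, 10), 1))  # 1157.4
```

The average looks modest, which is why surge behavior rather than monthly volume tends to drive the sizing.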
31. Overview of System Architecture
• Redundant and scalable
• Make an additional backup on-premises
• Disaster recovery in the Singapore region
Diagram labels: daily backup, On-premises
32. Architecture for Toyota.jp
• Each website has 1 load balancer and at least 3 EC2 instances
– for redundancy
Diagram labels: Toyota.jp, Lexus.jp, and Toyotaglobal.com, each with two Availability Zones; daily backup; On-premises
33. Management Servers
• Monitoring, CMS, gateway...
• Server-to-server connection to the on-premises system
– via VPN
– with an internal load balancer for redundancy
Diagram labels: daily backup, On-premises
34. Backup strategy
• Make an additional backup on-premises
– the original backup is also kept in AWS
Diagram labels: daily backup, On-premises
35. AWS CloudFormation for Disaster Recovery
• Rebuild the same system in the Singapore region with AWS CloudFormation
• One-click deployment with a template
Data recovery
On-premises
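A template of the kind this slide refers to might be generated as in the sketch below. It is a deliberately tiny illustration, not the template used for Toyota: the parameter and resource names (`WebAmiId`, `InstanceType`, `WebServer`) and the default instance type are assumptions.

```python
import json


def build_dr_template():
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal disaster recovery web server (illustrative only)",
        "Parameters": {
            # AMI IDs are region-specific, so the ID is passed in at deploy time.
            "WebAmiId": {
                "Type": "String",
                "Description": "AMI prepared in the recovery region",
            },
            "InstanceType": {"Type": "String", "Default": "m1.small"},
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "WebAmiId"},
                    "InstanceType": {"Ref": "InstanceType"},
                },
            }
        },
        "Outputs": {
            "WebServerId": {"Value": {"Ref": "WebServer"}},
        },
    }


# Serialize to the JSON text that CloudFormation accepts as a template body.
template_body = json.dumps(build_dr_template(), indent=2)
```

With boto3, the resulting `template_body` could be passed to `create_stack` on a CloudFormation client created for `ap-southeast-1` (Singapore), along with the AMI ID as a stack parameter. Since AMIs do not cross regions automatically, the images would have to be copied or rebuilt there first, which matches the "Prepare OS images (AMIs)" step earlier in the deck.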
36. Infrastructure Design and Development
NAT is running on a single instance, which could fail!
daily backup
On-premises
38. Design Pattern for Architecting
High Availability NAT
Bootstrap
Multi-Datacenter
DB Replication
Stack Deployment
39. In Real World Cases...
• For situations that traditional Auto Scaling can’t handle, optimize provisioning with load testing that resembles the real access environment
– You can use scheduled Auto Scaling too if necessary
• Multi-region disaster recovery is now possible when needed and can be done quickly
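Scheduled Auto Scaling, mentioned above, amounts to raising capacity shortly before a known event and lowering it afterward. The sketch below only plans the two actions. It assumes the keyword arguments of boto3's `put_scheduled_update_group_action`, and the group name, capacities, and 30-minute lead time are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def plan_scheduled_actions(group_name, event_start, event_end,
                           peak_capacity, base_capacity, max_capacity,
                           lead_time=timedelta(minutes=30)):
    """Build keyword arguments for a scale-out and a scale-in action."""
    scale_out = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "scale-out-before-event",
        "StartTime": event_start - lead_time,  # warm up before the surge
        "MinSize": peak_capacity,
        "MaxSize": max_capacity,
        "DesiredCapacity": peak_capacity,
    }
    scale_in = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "scale-in-after-event",
        "StartTime": event_end,
        "MinSize": base_capacity,
        "MaxSize": max_capacity,
        "DesiredCapacity": base_capacity,
    }
    return [scale_out, scale_in]


def apply_scheduled_actions(autoscaling_client, actions):
    """Send each planned action through the supplied Auto Scaling client."""
    for action in actions:
        autoscaling_client.put_scheduled_update_group_action(**action)


# Hypothetical broadcast from 20:00 to 21:00 UTC on the day of the talk.
start = datetime(2013, 11, 15, 20, 0, tzinfo=timezone.utc)
end = datetime(2013, 11, 15, 21, 0, tzinfo=timezone.utc)
actions = plan_scheduled_actions("web-asg-example", start, end,
                                 peak_capacity=12, base_capacity=3, max_capacity=20)
```

The peak capacity would come from load testing results, since reactive scaling may not add instances fast enough for a surge that starts at a known minute.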