2. Agenda
• Background
• Why we use AWS Cloud
• What you should know before getting started
• Case sharing
• Summary & next steps
3. Background
• Using AWS Cloud since 2010.
• Focus on managing multiple cloud platforms alongside the
existing on-premises environment.
• Support for cloud management, software architecture & cost
optimization
• Development of high-performance, highly available & scalable applications
9. Reason - Save money
• Avoid resource overprovisioning
• Project Risk Management
• Technology refresh / price cuts
• Pay by usage
• Computing / Network / Storage
• Software / Service
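To make the overprovisioning and pay-by-usage points concrete, here is a minimal sketch that compares a fleet sized for peak and left running all month with one billed only for the VM-hours actually used. The hourly rate and the demand profile are made-up assumptions for illustration, not AWS prices.

```python
from typing import Sequence

# Hypothetical numbers for illustration only; not real AWS prices.
HOURLY_RATE_USD = 0.10  # assumed price of one VM-hour


def provisioned_cost(peak_vms: int, hours: int, rate: float = HOURLY_RATE_USD) -> float:
    """Cost when capacity is sized for the peak and kept running all the time."""
    return peak_vms * hours * rate


def usage_cost(vms_per_hour: Sequence[int], rate: float = HOURLY_RATE_USD) -> float:
    """Cost when each hour is billed for the VMs actually running in it."""
    return sum(vms_per_hour) * rate


# Assumed daily profile: 8 busy hours need 10 VMs, the other 16 hours need 2.
daily_profile = [10] * 8 + [2] * 16
monthly_profile = daily_profile * 30  # 30-day month

fixed = provisioned_cost(peak_vms=10, hours=len(monthly_profile))
elastic = usage_cost(monthly_profile)
print(f"sized for peak: {fixed:.2f} USD, pay by usage: {elastic:.2f} USD")
```

With this assumed profile the elastic fleet costs less than half of the peak-sized one; the real ratio depends entirely on how spiky the workload is.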
10. Reason - Innovation
• Failure is the mother of innovation
• Experiment often
• Fail at low cost
• No need to reinvent the wheel
• Synergy from application & infrastructure
• API-based collaboration with Infrastructure
• Change design thinking
• Elastic Nature / Focus on business
12. Financial Part
Cost visibility up!
• CAPEX to OPEX
• Payment Process
• Communicate with Cost Center
• No Initial Investment
• Clear usage report
• Clear expectation (SLA)
• How we can help with cost optimization
13. Management Part
• Central point of support
• Volume discount
• Central AWS account creation
• Management Strategy
• Account management (IAM)
• Network & IP management
• Resource Naming & Tagging
• Foundation tool set
• Avoid reinventing everywhere
Payer account
15. Cost Saving Part – Use the cloud the right way
• Understand the pricing model & limitations of AWS services.
• Internal traffic across AZ, VPC, Region
• Hybrid cloud is also a good choice
• disaster recovery
• development & testing
• scalability for unpredictable spike
• Software architecture is the key.
• Monolith to microservices / modularization (dynamic load)
• Reduce unnecessary usage (IO / Storage / Traffic)
16. Technical Part – Computing
• Right Size (Vertical Scaling, Right instance type)
• Right Number (Horizontal Scaling, Auto scaling)
• Right Time
• Seasonal (E-commerce promotion)
• Monthly (Financial report)
• Daily (Working days)
• Hourly (Working hours)
• Per request (Serverless)
• Right Density (Docker, AWS ECS, Free)
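The "Right Time" idea can be sketched as a small function that maps a point in time to a desired instance count. Every number and rule below (baseline, working hours, month-end window, promotion month) is an illustrative assumption, not a value from this deck; in practice the result would feed a scheduled scaling action.

```python
from datetime import datetime

# All capacity numbers and time windows are illustrative assumptions.
BASELINE = 2             # instances kept running at all times
WORKING_HOURS = 6        # weekdays, 09:00-18:00
MONTH_END = 8            # last days of the month (financial reports)
PROMOTION = 20           # e-commerce promotion season
PROMOTION_MONTHS = {11}  # assumed promotion season: November


def desired_capacity(now: datetime) -> int:
    """Return the instance count for a moment in time; the largest matching rule wins."""
    capacity = BASELINE
    if now.weekday() < 5 and 9 <= now.hour < 18:  # Monday-Friday, working hours
        capacity = max(capacity, WORKING_HOURS)
    if now.day >= 28:                             # month-end reporting window
        capacity = max(capacity, MONTH_END)
    if now.month in PROMOTION_MONTHS:             # seasonal promotion
        capacity = max(capacity, PROMOTION)
    return capacity
```

Taking the maximum of all matching rules keeps the schedule safe when windows overlap, for example a month-end day that falls inside a promotion.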
18. System Briefing
• Summary (50M+ Devices)
• Device Software update
• Device Configuration update
• Device management
• Response time < 2 seconds
• System Capacity
• 45 VMs (without DR)
• > 150,000,000 requests / day
Corporate data center + CDN
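As a rough sanity check on the capacity figures above, the daily request count can be converted to an average request rate. This is back-of-the-envelope arithmetic only: it uses the lower bound of 150,000,000 requests per day and assumes, purely for illustration, that load is spread evenly over the day and over all 45 VMs. Real traffic has peaks well above the average.

```python
REQUESTS_PER_DAY = 150_000_000  # lower bound from the slide
VM_COUNT = 45                   # from the slide, without DR
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate, assuming traffic is spread evenly over the whole day.
avg_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY
# Per-VM share, assuming every VM serves an equal part of the traffic.
avg_rps_per_vm = avg_rps / VM_COUNT

print(f"average load: {avg_rps:.0f} requests/s")
print(f"per VM: {avg_rps_per_vm:.1f} requests/s")
```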
19. What are the challenges?
• Disaster Recovery Requirement
• Large initial procurement (from 45 VMs to 90 VMs)
• Heavy implementation cost
• Heavy Operation Cost & Risk
• Many manual routine jobs
• DR drills cost a lot
• Resource overprovisioning
• Reservation for surge
• Firewalls of responsibility
20. How we solve them
• Disaster Recovery Requirement
• Large initial procurement
• Heavy implementation cost
• Heavy Operation Cost & Risk
• Many manual routine jobs
• DR drills cost a lot
• Resource overprovisioning
• Reservation for surge
• Firewalls of responsibility
Pay by usage
No reinventing the wheel
Auto-Scaling
Docker to isolate env.
Automation
Automation
21. This is why we use the cloud
and want to use it well
23. How we do it
• Application & VM monitoring
• Logstash / Elasticsearch / Kibana
• AWS Cost Tools
• CloudWatch / Cost Alert
• Billing Console / Cost explorer
• Advanced Cost Dashboard
• TIBCO Spotfire
Visibility
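A cost alert boils down to comparing spend against a budget. The sketch below shows one simple way to do that: project the month-end bill linearly from month-to-date spend and flag it when the projection exceeds the budget. The function names and the linear projection are assumptions for illustration; they are not how CloudWatch billing alarms or Cost Explorer work internally.

```python
import calendar
from datetime import date


def projected_month_end_cost(month_to_date_usd: float, today: date) -> float:
    """Linear projection: assume the rest of the month costs the same per day as so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date_usd / today.day * days_in_month


def should_alert(month_to_date_usd: float, today: date, budget_usd: float) -> bool:
    """True when the projected month-end cost exceeds the budget."""
    return projected_month_end_cost(month_to_date_usd, today) > budget_usd


# Example with made-up figures: 600 USD spent by 10 April against a 1,500 USD budget.
if should_alert(600.0, date(2024, 4, 10), budget_usd=1500.0):
    print("projected spend exceeds budget")
```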
24. How we do it
• AWS CLI / Docker / ECS (Application)
• CloudFormation template (Environment)
• AWS Lambda (Event Trigger)
Control
Automation tools
SOP & Policy
• VM Scheduler
• VM & Storage Backup
• Central application log store
• General Tag policy
• General Name policy
• General Log policy
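Tag and name policies are easiest to enforce when they can be checked automatically. The following sketch validates a resource name and its tags against a policy. The deck does not state the actual rules, so the required tag keys, the allowed environments and the naming pattern are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical policy: the deck names a tag and name policy but not their rules.
REQUIRED_TAGS = {"Project", "Owner", "CostCenter", "Environment"}
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}
# Assumed naming scheme: <project>-<environment>-<role>-<two-digit index>
NAME_PATTERN = re.compile(r"^[a-z0-9]+-(dev|test|prod)-[a-z0-9]+-\d{2}$")


def policy_violations(name: str, tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable policy violations; empty means compliant."""
    problems = []
    for key in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"missing tag: {key}")
    env = tags.get("Environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment: {env}")
    if not NAME_PATTERN.match(name):
        problems.append(f"name does not match pattern: {name}")
    return problems
```

A check like this could run on a schedule or on resource-creation events, so that untagged resources are caught before they show up as unattributed cost.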
25. How we do it
• Auto-scaling
• Design for failure
• Recover by reboot (Self-Healing)
• Microservices
• Cost optimization
Cloud-Native Design
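"Recover by reboot" can be sketched as a monitor that counts consecutive failed health checks and reboots the service once a threshold is reached. The class name, the threshold of three failures, and the injected `check` and `reboot` callables are assumptions for illustration; on AWS the same role would typically be played by health checks attached to an auto-scaling group.

```python
from typing import Callable


class SelfHealingMonitor:
    """Reboots a service after a run of consecutive failed health checks."""

    def __init__(self, check: Callable[[], bool], reboot: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.check = check                # returns True when the service is healthy
        self.reboot = reboot              # restarts the service or replaces the VM
        self.max_failures = max_failures  # assumed threshold, tune per service
        self.failures = 0

    def tick(self) -> bool:
        """Run one health check; return True if a reboot was triggered."""
        if self.check():
            self.failures = 0  # any success resets the failure streak
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.reboot()
            self.failures = 0
            return True
        return False
```

Requiring several consecutive failures avoids rebooting on a single transient error. It also only works if the service is designed for failure, that is, it holds no state that a reboot would lose.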
26. New Architecture on AWS Cloud
• Active-active mode
• No single point of failure
• Auto-scaling support
• RTO < 5 minutes
• RPO < 5 minutes
Active Active
27. When a disaster happens at the main site (1)
• Route 53 redirects traffic to
the DR site.
• CloudWatch notifies the NOC
via SNS.
• The NOC decides whether to
trigger the DR process.
Inactive Active Active
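The three steps above can be modelled in a few lines: traffic goes to every healthy site, a failed main site produces a notification, and the DR process starts only after human approval. This is a simplified model with hypothetical function names, not Route 53, CloudWatch or SNS API calls.

```python
from typing import Callable, Dict, List


def sites_receiving_traffic(main_healthy: bool, dr_healthy: bool) -> List[str]:
    """Route traffic to every site that currently passes its health check."""
    sites = []
    if main_healthy:
        sites.append("main")
    if dr_healthy:
        sites.append("dr")
    return sites


def on_health_check(
    main_healthy: bool,
    dr_healthy: bool,
    notify_noc: Callable[[str], None],
    noc_approves: Callable[[], bool],
) -> Dict[str, object]:
    """Apply the three steps: reroute traffic, notify the NOC, let the NOC decide."""
    traffic = sites_receiving_traffic(main_healthy, dr_healthy)
    if not main_healthy:
        notify_noc("main site failed its health check")
    # The DR process is never started automatically; a human has to approve it.
    start_dr = (not main_healthy) and dr_healthy and noc_approves()
    return {"traffic": traffic, "start_dr_process": start_dr}
```

Keeping the rerouting automatic and the DR promotion manual means users are shifted away from the failed site immediately, while the more disruptive step still gets a human decision.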
28. When a disaster happens at the main site (2)
The DR site is promoted
to main site (< 15 minutes)
Automation
29. When a disaster happens at the main site (3)
• A new DR site can be
created within 3 hours
• No impact on service
• Active-active mode
works again.
Active Active
30. Result
• TCO of 45 VMs: 1,500 USD/month (with DR)
• Disaster recovery implementation & drills are easier
• Lower operation cost & service risk
• The team grows & has fun