This document provides operational checklists for using AWS, including a basic operations checklist and an enterprise operations checklist. The basic checklist covers security, EC2 usage, HA/backup, and application deployment testing. The enterprise checklist covers billing, security, asset management, HA/resilience, DR/backup, monitoring, configuration management, and release deployment. It also provides an example of how Monetate, a SAAS marketing company, uses the AWS cloud for operations.
3. Basic Operations Checklist
Purpose
• Prior to initial deployment
• Assess an application’s use of specific services
• Avoid common first-time implementation mistakes
6. Basic Operations Checklist
Basic Security Questions
• IAM Users
• Instance Security
• Nested Security Groups
• Sharing AMIs
Operational use of Amazon EC2
• EBS-backed or Instance Store-backed Instance
• Separate OS & Data Volumes
• Dynamic Addressing
7. Basic Operations Checklist (cont…)
HA, Backup and Recovery
• EC2 Instance
• EC2 Snapshots
Mapping Custom Names to AWS
• Route 53
11. Customer Example
Tom Janofsky
• VP Engineering at Monetate
Monetate
• SaaS provider of marketing agility tools: testing, targeting and merchandising
• 20% of comScore Black Friday transactions passed through Monetate’s platform
• Deployed on AWS for 4 years
12. Billing & Account Mgmt @ Monetate
Simple Setup
• 1 AWS account for dev, test, accept, 1 account for production
Billing/Charge Back
• Spent much time modeling AWS costs and built a model driven by a single factor (API calls) that is simple to explain and an accurate proxy for actual AWS costs
• No direct billing for AWS usage
Cost Optimization
• Reserved instances for constant load
• Blend of On-Demand and Spot Instances with EMR to reduce costs for intensive data processing
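The single-factor charge-back idea above can be illustrated with a minimal sketch. The slides do not describe the form of Monetate's model, so the proportional split below is an assumption, and the function name `allocate_by_api_calls`, the client names, and the dollar figures are all hypothetical.

```python
from typing import Dict


def allocate_by_api_calls(total_cost: float, api_calls: Dict[str, int]) -> Dict[str, float]:
    """Split total_cost across consumers in proportion to their API call counts.

    Assumption: cost scales linearly with API calls. The source only says
    that API calls were an accurate proxy for actual AWS costs.
    """
    total_calls = sum(api_calls.values())
    if total_calls <= 0:
        raise ValueError("need at least one API call to allocate costs")
    return {name: total_cost * calls / total_calls for name, calls in api_calls.items()}


if __name__ == "__main__":
    # Hypothetical monthly bill and per-client API call counts.
    shares = allocate_by_api_calls(
        10_000.0, {"client_a": 600, "client_b": 300, "client_c": 100}
    )
    for name, cost in shares.items():
        print(f"{name}: ${cost:,.2f}")
```

A single driver keeps the model easy to explain; the trade-off is that it ignores workloads whose cost does not track API volume.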
13. Security & Access Mgmt @ Monetate
Access Control
• Console access via IAM credentials
• AWS REST API via secret keys
• Network access via SSH public key authentication
• Application access over HTTPS, role-based access control
• Automated tools for granting and revoking privileges and rolling keys
• No PCI or PII data
14. Application HA/Resilience @ Monetate
Deployed in 4 availability zones across 2 regions (east and west)
Routing and failover with DNS-based global traffic management
Each zone has a consistent configuration
Custom load balancing with HAProxy
EIP for public-facing proxies, with automated takeover for failed proxies
All DBs on EBS volumes, snapshotted
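The automated Elastic IP takeover mentioned above might look roughly like the following sketch. It is not Monetate's tooling: the helper names, instance IDs, and allocation ID are hypothetical, and the `ec2` argument is assumed to be an object exposing `associate_address` with the same keyword arguments as the boto3 EC2 client. A small recording stub stands in for that client so the example runs without AWS access.

```python
from typing import Dict, List, Optional


def pick_standby(health: Dict[str, bool], failed_id: str) -> Optional[str]:
    """Return the first healthy proxy (by sorted ID) other than the failed one."""
    for instance_id in sorted(health):
        if instance_id != failed_id and health[instance_id]:
            return instance_id
    return None


def take_over(ec2, allocation_id: str, failed_id: str, health: Dict[str, bool]) -> Optional[str]:
    """Move the Elastic IP to a healthy standby proxy, if one exists."""
    standby = pick_standby(health, failed_id)
    if standby is None:
        return None  # nothing healthy to fail over to
    # AllowReassociation lets the address move even though it is still
    # associated with the failed instance.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby,
        AllowReassociation=True,
    )
    return standby


class RecordingClient:
    """Stand-in for an EC2 client; records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def associate_address(self, **kwargs) -> None:
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    health = {"i-proxy-a": False, "i-proxy-b": True}  # hypothetical health checks
    print(take_over(client, "eipalloc-example", "i-proxy-a", health))
    print(client.calls)
```

In a real setup the health map would come from the monitoring system, and the stub would be replaced by an actual EC2 client.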
15. Monitoring & Incident Mgmt @ Monetate
24x7 internal and external monitoring
CloudWatch metrics
Application and OS level monitoring and alerting
3rd party notification and escalation tool
16. Config/Deployment Mgmt @ Monetate
Configuration Management
• Consistent AMI across deployment
• Automated configuration
• Automated patch management
Deployment Management
• Updates are applied only to new instances, which are added to the cluster; rollback is to the existing instances
• No downtime for deployment
Testing
• 5x like-like production testing
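The deployment approach described above (updates go only to new instances, and rollback returns to the existing ones) can be sketched as a small in-memory model. This is an illustration under assumptions, not Monetate's deployment system: the `Cluster` class, the instance names, and the health-check callback are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    active: List[str]  # instances currently receiving traffic
    previous: List[str] = field(default_factory=list)  # kept untouched for rollback


def deploy(cluster: Cluster, new_instances: List[str], is_healthy: Callable[[str], bool]) -> bool:
    """Switch traffic to new instances only if every one passes its health check."""
    if not new_instances or not all(is_healthy(i) for i in new_instances):
        return False  # cluster is left exactly as it was
    cluster.previous = cluster.active
    cluster.active = list(new_instances)
    return True


def rollback(cluster: Cluster) -> None:
    """Return traffic to the instances that were active before the last deploy."""
    if not cluster.previous:
        raise RuntimeError("no previous instances to roll back to")
    cluster.active, cluster.previous = cluster.previous, []


if __name__ == "__main__":
    cluster = Cluster(active=["old-1", "old-2"])
    deploy(cluster, ["new-1", "new-2"], is_healthy=lambda _: True)
    print(cluster.active)
    rollback(cluster)
    print(cluster.active)
```

Because the old instances are never modified, rollback is a traffic switch rather than a re-install, which is what makes a no-downtime deployment plausible.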
AWS provides
• Flexible cloud platform
• Different options
Customers appreciate this, but are also asking for
• Operational best practices
• Ways to apply consistency
• Ideally in checklist form
Creating checklists
Wide range of customers
• Startups (Open Amplify social media)
• Large enterprises like Shell or NASA JPL interacting with rovers on Mars from AWS
Wide range of needs
• Just getting started, maybe first POC
• Running mission-critical applications
• Complex deployments
• Building sophisticated cloud management strategies
We realized that a single checklist would not meet this diverse range of needs, so we created two operational checklists.
For customers just dipping their toe in the cloud
• Prior to initial deployment
• Assess app’s use of specific services
• Avoid common first-time implementation mistakes
Covers things like making sure your application is leveraging:
• Basic security
• HA/DR
• Application testing and deployment best practices
Designed to:
• Identify key concepts
• Develop a holistic cloud strategy
• Sophisticated cloud migrations or deployments
Strategically approach:
• Billing
• Security
• HA & DR
and manage changes to their applications and infrastructure
Agenda
• Summarize the Basic Checklist by grouping the checklist questions into related topics
• Provide a quick overview to familiarize you with the breadth and scope of the Enterprise Operations Checklist
• Turn the presentation over to Tom, who will provide some specific examples of the best practices they are using in relation to several of the Enterprise Operations Checklist categories
Quick note: The information that we will discuss today is available on the AWS website under both the whitepaper and architecture centers. You can see the URL to the AWS whitepapers page, where the Operational Checklists for AWS white paper is available.
We take the security of our customers extremely seriously and therefore added several basic security questions to help guide our customers to leverage security best practices such as
• Using Identity & Access Management to provide individual access credentials to AWS APIs instead of shared credentials
• Applying security best practices to your EC2 instance operating system: OS user account access credentials; patching, updating, and hardening
• Implementing secure Security Group rules
• Thinking through the security implications of sharing Amazon Machine Images
The Use of Amazon EC2 checklist items cover basic operational best practices for Amazon’s Elastic Compute Cloud service.
AWS provides 2 different classes of EC2 instances based on where the operating system is stored: EBS-backed and instance store-backed.
And while we are talking about storage, it’s a best practice in any environment to separate your OS and application data volumes for data-intensive applications like database servers.
Additionally, in order to provide a flexible and dynamic environment for our customers, EC2 provides dynamic IP addresses that can take some getting used to at first. Options for handling this:
• Elastic IPs
• Load balancers
• Dynamic DNS
• Manage your own static IP assignments in your own Virtual Private Cloud
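As one illustration of what checking for secure Security Group rules could involve, the sketch below scans rule data for administrative ports that are open to the whole internet. The input is assumed to follow the shape of the `SecurityGroups` list returned by the EC2 DescribeSecurityGroups API (`IpPermissions`, `IpRanges`, `CidrIp`); the choice of ports 22 and 3389 as "sensitive" and the sample group IDs are assumptions made for this example, not part of the checklist.

```python
from typing import List, Tuple

SENSITIVE_PORTS = {22, 3389}  # assumption: SSH and RDP are the ports worth flagging


def find_open_rules(groups: List[dict]) -> List[Tuple[str, int]]:
    """Return (group_id, port) pairs where a sensitive port is open to 0.0.0.0/0."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            ranges = perm.get("IpRanges", [])
            if not any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
                continue  # rule is not open to the whole internet
            if perm.get("IpProtocol") == "-1":
                # "-1" means all protocols and all ports.
                ports = set(SENSITIVE_PORTS)
            else:
                low, high = perm.get("FromPort"), perm.get("ToPort")
                if low is None or high is None:
                    continue
                ports = {p for p in SENSITIVE_PORTS if low <= p <= high}
            for port in sorted(ports):
                findings.append((group["GroupId"], port))
    return findings


if __name__ == "__main__":
    sample = [  # hypothetical data in the DescribeSecurityGroups shape
        {"GroupId": "sg-web", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-admin", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    ]
    print(find_open_rules(sample))
```

A check like this only covers IPv4 CIDR ranges; IPv6 ranges and rules that reference other security groups would need separate handling.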
Another set of checklist items covers high availability, backup and recovery best practices
• Regularly back up EC2 instances (e.g. snapshots)
• Fully test your recovery plans
• Deploy critical application components across multiple AZs
• Understand how fail-over will occur across AZs
Another checklist item addresses best practices for mapping customer domain names to AWS ELBs, CloudFront, or S3 buckets.
• DNS “CNAME” records
• Route 53 “Alias” records for ELB
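To make the Route 53 “Alias” option concrete, here is a minimal sketch that builds the change batch for an alias record pointing at a load balancer. One reason alias records matter is that DNS does not allow a CNAME at the zone apex (for example `example.com` itself), while an alias A record can sit there. The dictionary follows the structure of the Route 53 ChangeResourceRecordSets request; the function name, domain, ELB DNS name, and hosted zone ID below are placeholders, not real resources.

```python
import json


def alias_change_batch(record_name: str, elb_dns_name: str, elb_hosted_zone_id: str) -> dict:
    """Build a change batch that upserts an alias A record targeting an ELB."""
    return {
        "Comment": "Point a custom name at a load balancer via an alias record",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # Hosted zone ID of the load balancer, not of your own zone.
                        "HostedZoneId": elb_hosted_zone_id,
                        "DNSName": elb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = alias_change_batch(
        "example.com.",                                # placeholder zone apex
        "my-elb-example.us-east-1.elb.amazonaws.com.",  # placeholder ELB name
        "ZEXAMPLEELBZONE",                             # placeholder zone ID
    )
    print(json.dumps(batch, indent=2))
```

With boto3, a dictionary of this shape would typically be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the ID of your own hosted zone.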
AWS provides tremendous flexibility
• Test in parallel
• Low-cost, only paying for what you use
Like-like performance testing
• Identical production environment
• Hour or two
• Bang away at it
• Return the capacity with no upfront costs or ongoing commitments
It’s quick, easy, powerful, and inexpensive. Please take advantage of this to deploy better tested, more solid applications.
Summarize Basic Checklist
• Intended for new customers or assessing a specific application prior to deployment
Enterprise Operations Checklist
• Identifies some key high-level concepts
• Sophisticated, multi-application cloud deployments
High-level categories
• AWS account management, billing & charge back, and cost optimization
• OS, application, transport and data-at-rest layers
• Tagging, metadata, integration with existing asset management systems
• HA & DR pointers and guidance
• Monitoring & Incident Mgmt: CloudWatch, SNS, EC2 instance health APIs
The last 2 sections deal with various options for managing change and application deployments, at which point I would like to transition over to Tom from Monetate to talk about some of the things they are doing in these, as well as in some of the other checklist categories.
Thank you for joining us. Hopefully these checklists will help you more consistently implement operational best practices in the AWS cloud. Thank you.