10 Tips for Your Journey to the Public Cloud
Suchi Upadhyayula, Director of Product Development, Intuit
Sean McCluskey, Director of Quality and Operations, Intuit
May 28, 2015
Quick Facts About Mint
Millions of Active Users
> 50TB of Financial Data
> 400 Servers (in 10 Pods, > 90 MySQL Shards)
1.5k req/sec, 80k concurrent connections, 120k concurrent sessions
Mint is on …
Tablets: iPad, Android, Surface
Smart Phones: iPhone, Android, Win 8
Web
Desktops: Mac, Win 8
10 Tips from Our Journey
1. Load Balancing
• Security policy prohibits terminating SSL on the ELB
– The ELB therefore acts as a dumb pass-through
• Routing logic for the bulkhead pattern (Pods) was too complex for the ELBs available at the time
• Developed a proxy layer to:
– Terminate SSL
– Implement routing logic
– Record access audit logs
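The Pod routing that the proxy layer performs can be sketched as follows. This is a minimal illustration, not Mint's proxy: the pod names, the `pod_for_user` and `route` helpers, and the hash-based assignment are all hypothetical, and SSL is assumed to have been terminated before `route` is called.

```python
import hashlib
import logging

# Hypothetical pod names; each pod is an isolated stack (app servers + MySQL shards).
PODS = ["pod-01", "pod-02", "pod-03"]

audit_log = logging.getLogger("proxy.audit")


def pod_for_user(user_id: str, pods: list[str] = PODS) -> str:
    """Map a user to one fixed pod so a failing pod only affects its own users (bulkhead)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return pods[int.from_bytes(digest[:8], "big") % len(pods)]


def route(user_id: str, path: str, client_ip: str) -> str:
    """Choose the upstream pod for an already-decrypted request and write an audit record."""
    pod = pod_for_user(user_id)
    audit_log.info("user=%s ip=%s path=%s pod=%s", user_id, client_ip, path, pod)
    return pod
```

Hashing modulo the pod count remaps users whenever a pod is added; a persisted user-to-pod lookup table avoids that, at the cost of one extra lookup per request.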
2. Securing Sensitive Customer Data
• Multi-layer encryption, integrated with AWS Key Management Service (KMS), with periodic key rotation:
– Application-level encryption of sensitive data
– Encryption in flight
– File-level encryption at rest
• Reviewed fields to identify sensitive data requiring application-level encryption
– Dropped clear-text columns before the data was ready to ship
• >50TB of data encrypted
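Application-level encryption with periodic key rotation can be sketched with the `cryptography` package's `Fernet` and `MultiFernet` classes. This is a stand-in under stated assumptions: the slide does not say which cipher or library Mint used, the field names are hypothetical, and in AWS the data keys would come from KMS (for example through its `GenerateDataKey` operation) rather than from `Fernet.generate_key()`. `MultiFernet` encrypts with its first key, decrypts with any of its keys, and `rotate` re-encrypts a token under the first key.

```python
from cryptography.fernet import Fernet, MultiFernet

# Hypothetical fields flagged as sensitive during the field review.
SENSITIVE_FIELDS = {"account_number", "routing_number"}


def encrypt_row(row: dict, keyring: MultiFernet) -> dict:
    """Return a copy of the row with sensitive fields encrypted and clear-text columns dropped."""
    out = {}
    for name, value in row.items():
        if name in SENSITIVE_FIELDS:
            # Only the ciphertext column is kept; the clear-text column is never copied.
            out[name + "_enc"] = keyring.encrypt(str(value).encode("utf-8"))
        else:
            out[name] = value
    return out


def rotate_row(row: dict, keyring: MultiFernet) -> dict:
    """Re-encrypt every ciphertext column under the keyring's primary (first) key."""
    return {k: keyring.rotate(v) if k.endswith("_enc") else v for k, v in row.items()}
```

During a rotation the keyring lists the new key first and keeps the old key only until every stored token has been rotated.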
3. Establishing a Framework for Low Latency
• Prepare for the latency impact of encryption
– Mint planned for a 30% degradation
• Continuously measure TP50, TP90, and TP99 (50th, 90th, and 99th percentile latency) for critical features
– Weekly review of TPs to drive latency reductions
– Constant tuning of code and of the single-page architecture
– Able to maintain the TP50 and TP90 SLAs
• Create a culture of continuous focus on TPs to drive improvements
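TP50, TP90, and TP99 are latency percentiles. A minimal sketch of computing them with the nearest-rank method, and of checking a measurement against a planned degradation budget (30%, as Mint planned for encryption), might look like this; the function names are hypothetical, and a production system would usually compute percentiles in its metrics pipeline.

```python
import math


def tp(latencies_ms: list[float], percentile: float) -> float:
    """Nearest-rank percentile: the smallest sample at or above `percentile`% of all samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def tp_report(latencies_ms: list[float]) -> dict[str, float]:
    """TP50/TP90/TP99 for one critical feature, e.g. for the weekly review."""
    return {name: tp(latencies_ms, p) for name, p in (("TP50", 50), ("TP90", 90), ("TP99", 99))}


def within_budget(current_ms: float, baseline_ms: float, allowed_degradation: float = 0.30) -> bool:
    """True if the current TP stays within the planned degradation over the pre-migration baseline."""
    return current_ms <= baseline_ms * (1 + allowed_degradation)
```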
4. Infrastructure as Code
• An infrastructure configuration change caused a release to fail to deploy and forced a rollback
• What we learned:
– In AWS, operations spends a lot of time writing code: CloudFormation templates, deployment automation, monitors
– Development rigor was new to the operations team
– Operations needed to adopt development practices: design, code reviews, testing, validation, and formal release processes for infrastructure
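Treating templates as code means testing them before they are released. As an illustration (hypothetical helpers, not an AWS tool), this check parses a JSON CloudFormation template and reports `Ref` targets that are not declared as parameters or resources, plus resources that lack a `Type`. It deliberately ignores other intrinsic functions such as `Fn::GetAtt`; dedicated linters such as cfn-lint cover far more rules.

```python
import json


def undefined_refs(template_json: str) -> set[str]:
    """Ref targets that are neither Parameters, Resources, nor AWS:: pseudo parameters."""
    template = json.loads(template_json)
    defined = set(template.get("Parameters", {})) | set(template.get("Resources", {}))
    found: set[str] = set()

    def walk(node) -> None:
        # Recursively collect every {"Ref": "<name>"} anywhere in the template.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "Ref" and isinstance(value, str):
                    found.add(value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(template)
    return {ref for ref in found if ref not in defined and not ref.startswith("AWS::")}


def missing_types(template_json: str) -> set[str]:
    """Resources declared without the mandatory Type key."""
    resources = json.loads(template_json).get("Resources", {})
    return {name for name, body in resources.items() if "Type" not in body}
```

Run as part of the infrastructure pipeline, a non-empty result blocks the release the same way a failing unit test would block an application release.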
5. Migrating Large Volumes of Data
• Copying >50TB (and growing) of secure data “over the wire” was not feasible
• Plan for data transport to AWS:
– Physically ship encrypted drives to AWS under secure handling; shipping the backup copy to AWS and uploading it took 3 days
– Run catch-up replication
– Time the final drive shipment so that replication can catch up on changes made during the shipment window and keep pace with data growth before the production cutover
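The timing constraint on the final shipment is simple arithmetic: data written while the drives are in transit becomes a backlog, and replication drains it only at the rate by which it exceeds data growth. A sketch with hypothetical function names and rates (only the 3-day window comes from the slide):

```python
def shipment_backlog_tb(ship_days: float, growth_tb_per_day: float) -> float:
    """Data written on-premises while the drives are shipped and uploaded."""
    return ship_days * growth_tb_per_day


def catch_up_days(backlog_tb: float, growth_tb_per_day: float, replication_tb_per_day: float) -> float:
    """Days for replication to drain the backlog while new data keeps arriving."""
    net_tb_per_day = replication_tb_per_day - growth_tb_per_day
    if net_tb_per_day <= 0:
        raise ValueError("replication cannot keep up with data growth")
    return backlog_tb / net_tb_per_day
```

With an assumed 0.1 TB/day of growth and 0.5 TB/day of replication, the 0.3 TB backlog from a 3-day window drains in about 0.75 days. If replication cannot outpace growth, no shipment timing will work.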
6. High Availability and Disaster Recovery
• Recovery Time Objective (RTO): the maximum acceptable time to restore a service to operation
• Recovery Point Objective (RPO): the maximum acceptable amount of data loss, usually expressed as a window of time
• Solve for availability first with Multi-AZ deployments
• Then determine acceptable RTO/RPO and solve for regional failures
– Balance lower RTO/RPO against increased cost and complexity
– Recognize that the technology you use to handle regional failures adds complexity that could itself cause outages
[Diagram: Region US-EAST and Region US-WEST, each containing three Availability Zones]
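An RTO/RPO target can be checked against a candidate cross-region design with a worst-case calculation. The model below assumes periodic recovery points copied to the other region (for example, snapshot copies); the class, its parameter names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """Hypothetical cross-region recovery plan; all values in minutes."""
    snapshot_interval_min: float  # how often a recovery point is taken
    copy_lag_min: float           # time for a recovery point to land in the other region
    restore_min: float            # time to bring the service up from that point

    def worst_case_data_loss_min(self) -> float:
        # A region failure just before snapshot N+1 reaches the other region
        # loses up to one full interval plus the copy lag.
        return self.snapshot_interval_min + self.copy_lag_min

    def meets(self, rto_min: float, rpo_min: float) -> bool:
        """True if both the restore time and the worst-case data loss fit the objectives."""
        return self.restore_min <= rto_min and self.worst_case_data_loss_min() <= rpo_min
```

For instance, hourly snapshots with a 15-minute copy lag give a worst case of 75 minutes of lost data, which fails a 60-minute RPO no matter how fast the restore is. Tightening that usually means continuous replication, which is exactly the added cost and complexity the slide warns about.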
7. Monitoring and Diagnostics
• Decouple from IP addresses
– Instances, ELBs, and their IP addresses are dynamic
– The number of instances changes constantly
– When an instance has issues, it can be “blown away”
• Build resilient, self-healing infrastructure
– Build monitoring to complement it
– If you alert on failure, have the courtesy to alert on healing
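Alerting on both failure and healing falls out naturally from tracking state per logical service instead of per IP address. A minimal sketch with hypothetical service names; how health samples are gathered (for example, from instances discovered by tag or Auto Scaling group membership) is left out.

```python
from typing import Optional


class HealthAlerter:
    """Tracks health per logical service (e.g. 'pod-03/web'), never per IP address."""

    def __init__(self) -> None:
        self._failing: set[str] = set()

    def observe(self, service: str, healthy: bool) -> Optional[str]:
        """Return an alert message when the service changes state, otherwise None."""
        if not healthy and service not in self._failing:
            self._failing.add(service)
            return f"FAILING: {service}"
        if healthy and service in self._failing:
            self._failing.discard(service)
            return f"RECOVERED: {service}"  # the courtesy alert on healing
        return None  # no state change, so no repeated alerts
```

Because the key is the service role, an instance that is blown away and replaced with a new IP simply shows up as the same service recovering.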
8. End-to-End Testing
• In addition to validating the full functionality of the production environment, you also need to validate:
– Build, configuration, deployment, and validation infrastructure
– Logging, monitoring, and the other systems that ensure the environment is healthy
– Access controls and security
– Auto Scaling
• Run continuous synthetic tests in the production environment
– They provide an end-to-end check that the customer experience doesn't degrade
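A continuous synthetic test replays a scripted customer journey against production and flags failed or slow steps. The sketch below is hypothetical: `run_synthetic` is an illustrative name, each step is a callable that in practice would make real requests (log in, load the overview, and so on), and the latency threshold is an assumed parameter.

```python
import time
from typing import Callable


def run_synthetic(steps: list[tuple[str, Callable[[], bool]]], slow_ms: float) -> list[str]:
    """Run a scripted customer journey in order and return the problems found."""
    problems = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            ok = step()
        except Exception as exc:  # a crash counts as a failed step
            problems.append(f"{name}: error {exc!r}")
            break  # later steps depend on this one succeeding
        elapsed_ms = (time.perf_counter() - start) * 1000
        if not ok:
            problems.append(f"{name}: failed")
            break
        if elapsed_ms > slow_ms:
            problems.append(f"{name}: slow ({elapsed_ms:.0f} ms)")
    return problems
```

Scheduled every few minutes, a non-empty result would feed the same alerting path as infrastructure monitors, so a degraded customer experience is caught even when every individual component reports healthy.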
9. Managing Costs
• Compute: reserved vs. on-demand
– If compute is “on” for more than 9 hours per day, reserved instances save money
– Use on-demand for seasonal workloads and rare peaks
– Reaper scripts shut down unused instances
• Snapshots drove significant cost savings
• Storage is cheap
– Optimizing it is a lot of work for a small return
• IOPS are not
– Optimizing IOPS per shard saved a lot of money
Savings Distribution (share of total savings):
– Snapshots: 42.17%
– Compute: 34.19%
– IOPS: 17.09%
– Storage: 3.42%
– Other: 3.13%
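The 9-hour rule of thumb follows from comparing daily costs: a reserved instance is paid for all 24 hours, while on-demand is paid only for the hours used. The break-even point is therefore 24 times the ratio of the reserved effective hourly rate to the on-demand rate, so a reserved rate at 37.5% of on-demand gives exactly 9 hours. The function names and prices here are hypothetical; actual discounts vary by instance type, term, and payment option.

```python
def break_even_hours_per_day(reserved_effective_hourly: float, on_demand_hourly: float) -> float:
    """Daily usage above which a reserved instance is cheaper than on-demand.

    reserved_effective_hourly includes any upfront fee amortized over the term,
    and is paid for all 24 hours whether the instance runs or not.
    """
    return 24 * reserved_effective_hourly / on_demand_hourly


def cheaper_option(hours_per_day: float, reserved_effective_hourly: float, on_demand_hourly: float) -> str:
    """Pick the cheaper pricing model for a workload running `hours_per_day`."""
    threshold = break_even_hours_per_day(reserved_effective_hourly, on_demand_hourly)
    return "reserved" if hours_per_day > threshold else "on-demand"
```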
10. Release Operations
• Each layer is deployed independently, with infrastructure separate from applications:
– DB schema
– AMI
– Infrastructure as code
– Application
• Support rollbacks for everything (blue-green)
– We can always go back to N-1, ALWAYS!
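The N-1 guarantee of blue-green releases can be modeled with two environments, where the idle one always holds the previous release. A minimal sketch with a hypothetical `BlueGreen` class and release names; for the DB schema layer this also assumes migrations stay backward compatible with release N-1, since otherwise switching back is not safe.

```python
class BlueGreen:
    """Two environments: traffic points at one, the idle one keeps release N-1."""

    def __init__(self, initial_release: str) -> None:
        self.envs: dict[str, str | None] = {"blue": initial_release, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, release: str) -> None:
        """Install the new release on the idle side, then switch traffic to it."""
        self.envs[self.idle] = release
        self.live = self.idle  # the old live side becomes idle, still running N-1

    def rollback(self) -> str:
        """Switch traffic back to the idle side and return the release now live."""
        previous = self.envs[self.idle]
        if previous is None:
            raise RuntimeError("no previous release to roll back to")
        self.live = self.idle
        return previous
```

The same pattern applies per layer (AMI, infrastructure templates, application), which is why deploying the layers independently keeps each rollback small.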
Summary
1. Load balancing: Evaluate whether ELB is sufficient and plan ahead
2. Security: Multi-layer encryption, AWS Key Management Service (KMS)
3. Low latency: Measure and improve TP50, TP90, TP99
4. Infrastructure as code: Design, review, and test templates
5. Migrating large volumes of data: Encrypted drives
6. HA/DR: Multi-AZ, multi-region
7. Monitoring and diagnostics: Decouple from IP addresses
8. End-to-end testing: Don’t forget to test auto-scaling
9. Managing costs: Compute is more expensive than storage
10. Release operations: Rollback-ready, blue-green
Thank You

Speaker Notes

  1. Mint runs on many different devices and platforms.