Limiting Damage During Chaos Experiments
Nils Meder | Computer Scientist @ Adobe
© 2016 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
When running chaos experiments, how can we avoid damage, and how can we limit experiments to certain parts of the infrastructure or application?

Published in: Engineering

  1. Limiting Damage During Chaos Experiments
     Nils Meder | Computer Scientist @ Adobe
  2. Agenda
     • Doing Chaos In Your Production System
     • Building A Context Around Your Experiment
     • Protect Your Infrastructure
     • Example: Kill Random Instances
     • Protect Your Application
     • Resilience Patterns
     • Wrap-Up & Discussion
  3. Doing Chaos In Your Production System
     • Testing in Production is The Ultimate Goal
     • But It is Not The First Step
     • There are Always Differences Between Staging and Production
       • Scale, Networking, Datasets, …
     • Start In The Staging Environment
     • Make Sure An Experiment Doesn’t Bring Down The Whole Service
     • “Know Your Enemy” - Have A Clear View of Your Environment
     • Iterate Over Your Experiments
     • Be Brave - Having Just Some Basic Tests Running in Production is Better Than None
  4. Building A Context Around Your Experiments
     • Chaos Testing is Not Just Pulling The Plug
     • Focus On Business Critical Scenarios/Components First
     • Have A Clear Goal, e.g. What Happens When The Network Fails?
     • Focus - Run One Experiment At a Time
     • Monitor Your Experiments
     • Define Fallbacks And Defaults
  5. Protect Your Infrastructure
     • Target Infrastructure Components
     • Think About Recovery
     • Take Snapshots
     • Limit The Damage To Single Instances
     • Limit The Damage To Groups of Instances
       • Of The Same Kind
       • Within The Same Workflow
     • Limit Percentage Of Impact
     • Limit What Chaos Tests Are Allowed To Do
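The limits on this slide (one group of instances of the same kind, a capped percentage of impact) could be sketched roughly as follows. This is only an illustration under assumptions: the `Instance` type, the `kind` labels and the `pick_targets` helper are hypothetical names that do not appear in the talk.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    kind: str  # hypothetical label, e.g. "web" or "worker"


def pick_targets(instances, kind, max_percent, rng=None):
    """Pick random victims of one kind, capped at max_percent of that group."""
    rng = rng or random.Random()
    # Restrict the experiment to instances of the same kind.
    group = [i for i in instances if i.kind == kind]
    # Round down so the cap is never exceeded, and always leave one survivor.
    by_percent = math.floor(len(group) * max_percent / 100)
    limit = min(by_percent, max(len(group) - 1, 0))
    return rng.sample(group, limit)
```

Rounding down and keeping at least one survivor are design choices of this sketch, not rules from the slides; a stricter policy could also refuse to run when the group is already below a minimum size.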
  6. Example: Kill Random Instances
     • Terminate Random EC2 Instances
     • Focus:
       • What Happens If A Number Of My Servers Die?
       • Does Autoscaling Work?
       • Is The Web API Still Serving Requests?
     • The Test is Only Allowed To Terminate Instances
     • Simulate The Experiment Beforehand
     • Take An Environment Snapshot
     • Run The Test
     [Diagram: Chaos Test, Client, App1, App2, App3, Appx]
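The workflow on this slide (only termination is allowed, simulate first, take a snapshot, then run) might look something like the sketch below. It deliberately avoids any real cloud SDK: `FakeCloud` and `ChaosTest` are hypothetical in-memory stand-ins invented for illustration, and in a real setup the "only allowed to terminate" rule would more likely be enforced by the cloud provider's permission system than in application code.

```python
import random


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API; not a real SDK."""

    def __init__(self, instance_ids):
        self.running = set(instance_ids)

    def snapshot(self):
        # Record which instances exist before the experiment starts.
        return frozenset(self.running)

    def terminate(self, instance_id):
        self.running.discard(instance_id)


class ChaosTest:
    ALLOWED_ACTIONS = {"terminate"}  # the test may do nothing else

    def __init__(self, cloud, rng=None):
        self.cloud = cloud
        self.rng = rng or random.Random()
        self.log = []  # every attempted action, simulated or real

    def _do(self, action, instance_id, dry_run):
        if action not in self.ALLOWED_ACTIONS:
            raise PermissionError(f"action {action!r} is not allowed")
        self.log.append((action, instance_id, dry_run))
        if not dry_run:
            getattr(self.cloud, action)(instance_id)

    def kill_random(self, count, dry_run=True):
        """Terminate `count` random instances; only log them if dry_run."""
        victims = self.rng.sample(sorted(self.cloud.running), count)
        for instance_id in victims:
            self._do("terminate", instance_id, dry_run)
        return victims
```

A run would first call `kill_random(2, dry_run=True)` to see what would be hit, take `cloud.snapshot()`, and only then repeat the call with `dry_run=False`.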
  7. Protect Your Application
     • Plan For Chaos in Your Application
     • Fail Fast, But Keep The Streams Flowing
     • Build Your Application Isolated
     • Apply Loose Coupling
     • Introduce Latency Control
     • Real-Time Data and Diagnostics
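One possible reading of "Fail Fast" and "Introduce Latency Control" is to bound how long a caller waits for a dependency and to return a default when the bound is exceeded. The sketch below uses Python's standard `concurrent.futures` module; the `call_with_timeout` helper and the pool size are assumptions made for illustration, not something the slides prescribe.

```python
import concurrent.futures

# Shared worker pool; the size of 4 is an arbitrary choice for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, default):
    """Return func() if it finishes within timeout_s seconds, else default."""
    future = _POOL.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only prevents a start if the call is still queued;
        # a call that is already running continues in the background.
        future.cancel()
        return default
```

Note that this bounds the caller's waiting time, not the work itself: a slow call still occupies a worker thread until it returns, which is one reason timeouts are usually combined with the other patterns in this deck.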
  8. Resilience Patterns
     • Bulkheads
     • Building Failure Units
     • Protect App Against Cross-Failures
     • Event-Driven & Stateless
     • Embrace Loose Coupling
     • Circuit Breaker
     • Timeouts
     • Fallbacks
     • Healthchecks
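To make the "Circuit Breaker" and "Fallbacks" bullets more concrete, here is a minimal sketch of the general pattern. It is not taken from the talk or from any particular library; the class name, thresholds and injectable `clock` parameter are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds the circuit stays open
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()       # open: fail fast, skip the real call
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0               # a success closes the circuit again
        return result
```

While the circuit is open, callers get the fallback immediately instead of piling up on a failing dependency; after `reset_after` seconds a single trial call decides whether the circuit closes again or re-opens.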
  9. Resilience Patterns
     • “Release It!” - Michael Nygard
     • More On Resilience Patterns, Anti-Patterns and Case Studies
     • ISBN-13: 978-0978739218
  10. Wrap-Up & Discussion
      • Expect The Unexpected
      • Failures Are The Normal Case & Not Predictable
      • Do Not Try To Avoid Failures. Embrace Them.
      • Chaos Engineering Helps To Discover Weak Points
      • Apply Resilience Patterns
  11. References
      • Resilience Patterns: http://de.slideshare.net/ufried/patterns-of-resilience
      • Bulkheads: http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html
      • Making APIs More Resilient: http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
      • “Release It!” - Michael Nygard
