3. “EVERYTHING FAILS ALL THE TIME”
-WERNER VOGELS, CTO, AMAZON WEB SERVICES
HTTP://THENEXTWEB.COM/2008/04/04/WERNER-VOGELS-EVERYTHING-FAILS-ALL-THE-TIME/
BRUCE M. WONG | @BRUCE_M_WONG
4. THE ORIGINAL CHAOS MONKEY
CREATED BY NETFLIX CLOUD ARCHITECT GREG ORZELL (@CHAOSSIMIA), 2010
HTTPS://WWW.LINKEDIN.COM/IN/GORZELL
5. A STATE OF XEN
AWS EC2 REBOOT, 2014
17. SLOW IS HARD
START SLOW
•ACCOUNT LEVEL
•+10MS BEFORE +100MS
•+1% ERRORS BEFORE +80% ERRORS
DIAL IT UP
•A -> D NOT * -> D
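The START SLOW / DIAL IT UP guidance can be sketched as a fault-injection rule with a narrow scope. The rule applies to one caller -> dependency edge (A -> D, never * -> D) and one account, and it is then ramped through progressively harsher steps. This is a minimal Python sketch under stated assumptions, not the tooling from the talk. `FaultRule`, `ramp`, `decide`, the account ID, and the step values are hypothetical illustrations.

```python
import random
from dataclasses import dataclass, replace
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class FaultRule:
    """Hypothetical rule for one caller -> dependency edge (A -> D, never * -> D)."""
    caller: str
    dependency: str
    account_id: str          # blast radius limited to a single account
    added_latency_ms: int = 0
    error_rate: float = 0.0  # fraction of matching calls to fail, 0.0 to 1.0

    def __post_init__(self):
        # Refuse wildcard scopes so an experiment can never hit every caller.
        if "*" in (self.caller, self.dependency):
            raise ValueError("wildcard scopes are not allowed; name the exact edge")
        if not 0.0 <= self.error_rate <= 1.0:
            raise ValueError("error_rate must be between 0 and 1")


def ramp(base: FaultRule, steps: List[Tuple[int, float]]) -> Iterator[FaultRule]:
    """Yield progressively harsher copies of `base`: start slow, then dial it up."""
    prev = (base.added_latency_ms, base.error_rate)
    for latency_ms, error_rate in steps:
        # Each step may only increase latency and error rate, never decrease them.
        if latency_ms < prev[0] or error_rate < prev[1]:
            raise ValueError("each step must be at least as aggressive as the last")
        prev = (latency_ms, error_rate)
        yield replace(base, added_latency_ms=latency_ms, error_rate=error_rate)


def decide(rule: FaultRule, caller: str, dependency: str, account_id: str,
           rng: random.Random) -> Optional[Tuple[int, bool]]:
    """Return (extra_latency_ms, should_fail) for an in-scope call, else None."""
    if (caller, dependency, account_id) != (rule.caller, rule.dependency, rule.account_id):
        return None  # out of scope: this call is left untouched
    return rule.added_latency_ms, rng.random() < rule.error_rate


if __name__ == "__main__":
    base = FaultRule(caller="A", dependency="D", account_id="test-account")
    # Hypothetical schedule: +10ms before +100ms, +1% errors before +80% errors.
    schedule = [(10, 0.01), (100, 0.01), (100, 0.10), (100, 0.80)]
    for step in ramp(base, schedule):
        print(step.added_latency_ms, step.error_rate)
```

Rejecting wildcards in the constructor keeps the "A -> D, not * -> D" rule from depending on operator discipline alone. In a real system each step would presumably run only after the previous one has been observed to be safe.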
18. LESSON #2: FIXING ONE FAILURE MODE EXPOSES NEW ONES
19. WHAT'S SO SPECIAL ABOUT CHAOS?
CHAOS IS A CHOICE
20. WHAT'S SO SPECIAL ABOUT CHAOS?
OUTAGES VS CHAOS
21.
OUTAGES VS CHAOS
                      OUTAGES          CHAOS
Control               Uncontrolled     Controlled
Timing                Unpredictable    Scheduled
Time to detect        Minutes          0
Time to resolve       ????             Seconds*
Root cause analysis   ????             Intentional
24. LESSON #3: THE CULTURAL ASPECTS OF CHAOS ARE HARD
25.
MOST ENTERPRISES HIRE PEOPLE TO FIX THINGS. NETFLIX HIRES PEOPLE TO BREAK THINGS….
…WE SHOULD EMBRACE NETFLIX'S CULTURE OF "CHAOS ENGINEERING" THROUGHOUT ORGANIZATIONS OF ALL SHAPES AND SIZES.
27. SEEK PROGRESS OVER PERFECTION
TWILIO LEADERSHIP PRINCIPLE
28. GAME DAYS - BENEFITS
•Training new engineers
•Discovering instrumentation gaps
•Preparing for new product launches
•Practicing incident management
29. GAME DAYS - THE SETUP
•Two “on-call” teams
•Separate rooms, separate Slack channels
•Master of Disaster
•Incident Commander
30. LEVERAGE EXISTING TESTBOTS
•Functionally test fallback code
•Early warning!
•Existing integrations with telemetry, PagerDuty, Slack
•Incorporate into the canary process
FUTURE
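"Functionally test fallback code" can be illustrated with a testbot that forces a dependency to fail and checks that the fallback path actually fires. This is a minimal, self-contained Python sketch under stated assumptions. `get_recommendations`, `DependencyUnavailable`, `failing_fetch`, and `testbot_check_fallback` are hypothetical names, and the static fallback items are made up for illustration.

```python
from typing import Callable, Dict, List


class DependencyUnavailable(Exception):
    """Raised by a (hypothetical) dependency client when a call fails."""


def get_recommendations(user_id: str, fetch: Callable[[str], List[str]]) -> Dict[str, object]:
    """Hypothetical service code: personalized results, with a static fallback."""
    try:
        return {"items": fetch(user_id), "fallback": False}
    except DependencyUnavailable:
        # Fallback path: degrade gracefully instead of failing the whole request.
        return {"items": ["popular-1", "popular-2"], "fallback": True}


def failing_fetch(user_id: str) -> List[str]:
    """Injected fault: simulate the dependency being down."""
    raise DependencyUnavailable(f"injected failure for {user_id}")


def testbot_check_fallback() -> List[str]:
    """Hypothetical testbot: force the dependency to fail, verify the fallback works.

    Returns a list of problems; an empty list means the fallback behaved.
    """
    problems = []
    # Control case: with a healthy dependency the fallback must NOT fire.
    healthy = get_recommendations("u1", lambda uid: ["personal-1"])
    if healthy["fallback"]:
        problems.append("fallback fired even though the dependency was healthy")
    # Experiment: with the dependency failing, the fallback must fire and return data.
    degraded = get_recommendations("u1", failing_fetch)
    if not degraded["fallback"]:
        problems.append("fallback did not fire when the dependency failed")
    if not degraded["items"]:
        problems.append("fallback returned no items")
    return problems


if __name__ == "__main__":
    issues = testbot_check_fallback()
    # A real testbot would report through existing alerting/chat integrations;
    # printing is only a stand-in for that here.
    print("fallback OK" if not issues else "; ".join(issues))
```

Running the healthy case as a control matters. A fallback that fires all the time would otherwise pass the degraded check and hide a broken primary path.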
31. RECAP
Lesson #1: Trust your resilience
Lesson #2: Fixing one failure mode exposes new ones
Lesson #3: The cultural aspects of Chaos are HARD
Get started today!
Game Days are your friend: do them early and often
Testbots + focus on developer productivity
32. WHEN YOU WISH UPON A BLUE MOON