This presentation is about embracing failure in front-end development. It covers measuring errors and performance metrics over time, using tools like phantomas to monitor changes, configuring alerts for when issues occur, and injecting "chaos" to test resilience. It closes by recommending that teams get experience with failure before 7pm on a Friday.
2. 7/25/14
What this talk covers
NOT COVERED: MY RECIPE FOR TEXAS-STYLE BEEF CHILI. FIND ME AFTER TO TALK ABOUT IT.
The inevitability that JavaScript apps will break.
Borrowing good ideas about failure from operations teams.
A bit about the theory of complex systems failure.
Open-source tools and services that help make apps more resilient.
Why talking about failure in the front-end is important.
DR. COOK IS MY HERO
RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
“Complex systems are intrinsically hazardous systems.”
SOME THEORY, PART 1
So you want to use a 3rd party service…
SERIOUSLY, PAUL IRISH APPEARS IN ALL MY TALKS.
THERE ARE LOTS: HTTPS://PLUS.GOOGLE.COM/+PAULIRISH/POSTS/12BVL5EXFJN
NS_TOO_MUCH_NOISE. NOT REALLY SURE WHY I REDACTED THE URLS.
FURTHER READING: HTTP://BLOG.MELDIUM.COM/HOME/2013/9/30/SO-YOURE-THINKING-OF-TRACKING-YOUR-JS-ERRORS
Example window.onerror output
DOES THIS SOUND LIKE COMMON SENSE YET?
"Change introduces new forms of failure."
RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
SOME THEORY, PART 2
Monitor change with phantomas
CREEPY PICTURE, NO? I BET HE WRITES ERLANG. I ALSO DON'T KNOW HOW TO SAY PHANTOMAS.
HTTPS://GITHUB.COM/MACBRE/PHANTOMAS
JEAN MARAIS AS FANTÔMAS IN THE 1964 FILM.
Phantomas is “PhantomJS-based web performance metrics collector and monitoring tool”.
phantomas --cookie '_session=<redacted>' \
  --reporter=statsd \
  --statsd-host 127.0.0.1 \
  --statsd-prefix stg \
  --runs 5 \
  http://staging-web.com
How to get super-detailed site metrics…
if you’re lazy and cheap.
5 HABITS OF HIGHLY LAZY FRONT-END PERFORMANCE ENGINEERS
Cloud server/your laptop with phantomas installed
Cron job that runs phantomas with statsd output
DataDog Lite Account + Install DataDog Agent on Server
Configure Alerting (I recommend PagerDuty)
Get woken up at 3am
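The cron step in the list above could be driven by a small Node script. What follows is a minimal sketch and not something shown in the talk: `buildPhantomasArgs` and `runPhantomas` are hypothetical helper names, the only flags used are the ones from the phantomas command on the earlier slide, and it assumes the `phantomas` binary is installed and on the PATH.

```javascript
// Sketch only: wraps the phantomas command from the earlier slide so a cron job can call it.
const { spawn } = require("child_process");

// Build the argument list using only the flags shown in the talk.
function buildPhantomasArgs({ url, cookie, statsdHost, statsdPrefix, runs }) {
  const args = [];
  if (cookie) args.push("--cookie", cookie); // session cookie for pages behind a login
  args.push("--reporter=statsd");            // send metrics to statsd instead of stdout
  args.push("--statsd-host", statsdHost);
  args.push("--statsd-prefix", statsdPrefix);
  args.push("--runs", String(runs));
  args.push(url);
  return args;
}

// Run phantomas and resolve with its exit code.
// Assumption: `phantomas` is installed and reachable on the PATH.
function runPhantomas(options) {
  return new Promise((resolve, reject) => {
    const child = spawn("phantomas", buildPhantomasArgs(options), { stdio: "inherit" });
    child.on("error", reject);
    child.on("close", resolve);
  });
}

// Values from the slide; the cookie value is a placeholder.
const stagingOptions = {
  url: "http://staging-web.com",
  cookie: "_session=<redacted>",
  statsdHost: "127.0.0.1",
  statsdPrefix: "stg",
  runs: 5,
};

// To actually run it (for example from cron):
// runPhantomas(stagingOptions).then((code) => process.exit(code));
```

A crontab line such as `*/15 * * * * node /path/to/run-phantomas.js` would then run the check every 15 minutes. The path and the schedule are placeholders.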
Make the metrics understandable and actionable
THIS LOOKS IMPRESSIVE WHILE YOU READ HACKER NEWS ON YOUR OTHER MONITOR
TESTING DASHBOARD FOR STAGING ENVIRONMENT IN DATADOG.
EVEN FANCIER: INTEGRATE IT INTO YOUR WEB APP: HTTPS://GITHUB.COM/BLOG/1252-HOW-WE-KEEP-GITHUB-FAST
Get alerted as things happen
YOU'LL BE ANGRY AT ME WHEN THIS WAKES YOU UP AT 3AM
CREATING A NEW METRIC ALERT IN DATADOG
Choose a phantomas metric
Define conditions
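To make "define conditions" concrete, here is a minimal sketch of the kind of threshold rule a metric alert evaluates. It is an illustration only and not Datadog's API: `shouldAlert`, its options, and the sample load times are all hypothetical.

```javascript
// Sketch only: a threshold condition of the kind a metric alert evaluates.
// Returns true when the average of the most recent `windowSize` samples exceeds `threshold`.
function shouldAlert(samples, { threshold, windowSize }) {
  if (samples.length < windowSize) return false; // not enough data yet
  const recent = samples.slice(-windowSize);
  const average = recent.reduce((sum, value) => sum + value, 0) / recent.length;
  return average > threshold;
}

// Hypothetical page load times in milliseconds, oldest first.
const loadTimesMs = [900, 950, 2100, 2300, 2500];

console.log(shouldAlert(loadTimesMs, { threshold: 2000, windowSize: 3 })); // true
console.log(shouldAlert(loadTimesMs, { threshold: 2000, windowSize: 5 })); // false
```

Averaging over a window instead of alerting on a single sample is one way to avoid being paged for a one-off slow run.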
SAY THIS THE NEXT TIME YOU BLOW SOMETHING UP.
“Failure free operations require experience with failure.”
RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
See also: https://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/
SOME THEORY, PART 3
Inject chaos into your front-end
ORIGINAL GRAPHIC SLIGHTLY REDACTED
HTTPS://GITHUB.COM/TRAVIS-HILTERBRAND/CHAOS-MONKEY-BROWSER
HTTPS://GITHUB.COM/MIKL/NODE-CHAOS-MONKEYWARE
EMBRACING FAILURE ON THE FRONT-END
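var props = {
  probability: 0.5,          // interfere with roughly half of the matching requests
  allowedMethods: ['GET'],   // only GET requests are affected
  mischiefTypes: [
    ChaosMonkey.MischiefTypes.delay,   // slow the response down
    ChaosMonkey.MischiefTypes.http403  // fail with a 403 response
  ]
};
ChaosMonkey(props);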
CONFIGURING CHAOS-MONKEY-BROWSER (*JQUERY REQUIRED)
With a 50% probability, this configuration will cause jQuery Ajax GET requests to slowly fail with a 403 response.
Prepares for:
CDN Failure
API Failure
Connection Failure
Bad SSL certificates
And more!
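The idea behind this kind of failure injection can be sketched without jQuery. The example below is an illustration under stated assumptions and is not the chaos-monkey-browser API: `withChaos`, its options, and `fakeRequest` are hypothetical names. It wraps a request function so that a configurable share of GET calls resolve slowly with a 403.

```javascript
// Sketch only: probabilistic failure injection around a request function.
// `withChaos` and its options are hypothetical and are not the chaos-monkey-browser API.
function withChaos(request, { probability, allowedMethods, delayMs, status, random = Math.random }) {
  return function chaoticRequest(method, url) {
    const targeted = allowedMethods.includes(method) && random() < probability;
    if (!targeted) return request(method, url); // pass through untouched
    // Slow failure: wait, then resolve with an error status instead of calling the server.
    return new Promise((resolve) => {
      setTimeout(() => resolve({ status, url, injected: true }), delayMs);
    });
  };
}

// Stand-in for a real HTTP call, so the sketch runs anywhere.
const fakeRequest = (method, url) => Promise.resolve({ status: 200, url, injected: false });

const chaotic = withChaos(fakeRequest, {
  probability: 0.5,
  allowedMethods: ["GET"],
  delayMs: 50,
  status: 403,
});

chaotic("GET", "/api/items").then((response) => {
  // Application code should cope with either outcome.
  console.log(response.status === 200 ? "ok" : "degraded: " + response.status);
});
```

Passing `random` as an option keeps the wrapper deterministic in tests, while production use falls back to `Math.random`.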
Other possible strategies
HOW TO ANNOY PEOPLE DURING CODE REVIEW
1. DISABLE OR SLOW DOWN THE NETWORK CONNECTION (IN CHROME CANARY DEVTOOLS)
2. WHAT HAPPENS WHEN YOU DISABLE JS? (USING A PLUGIN IS RECOMMENDED)
AMAZON.COM ISN’T HAPPY WITHOUT JAVASCRIPT
Lessons learned in failure
SERIOUSLY, REMEMBER ONE OF THESE THINGS
Measure errors and key performance metrics over time
Bad performance = failure
Annoy yourself to fix the broken things with alerting
Find remediation steps to make sure it doesn’t happen again
Get experience with failure before 7pm on a Friday
JavaScript is twice as popular as failure. That’s a good thing. Most of the time the things we write do actually work as expected.
Dr. Cook is an internationally recognized expert on medical accidents, complex system failures, and human performance at the sharp end of these systems. He spoke at Velocity in 2012 and has a cult following in the Operations community because of his paper “How Complex Systems Fail.”
Lesson here: It’s a good thing that nobody gets physically hurt when our JavaScript code blows up. Theoretically.
window.onerror is typically one of the first places we go to get insight into how JavaScript in the browser is blowing up. Unfortunately, the data we receive from it is limited. This seems to be improving in newer browser versions.
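As a rough illustration of the hook described above, here is a minimal sketch of an onerror-based reporter. `installErrorReporter` and the `report` callback are hypothetical names, and a plain object stands in for `window` so the example also runs outside a browser. The handler arguments follow the standard `window.onerror` signature (message, source, line, column, error object). The column and error object are the pieces older browsers leave out, which is why the sketch treats them as optional.

```javascript
// Sketch only: capture uncaught errors through an onerror-style hook and hand them to a reporter.
// `installErrorReporter` and `report` are hypothetical names.
function installErrorReporter(target, report) {
  const previous = target.onerror; // keep any handler that was already installed
  target.onerror = function (message, source, lineno, colno, error) {
    report({
      message: String(message),
      source: source || null,
      line: lineno || null,
      column: colno || null,                            // not supplied by older browsers
      stack: error && error.stack ? error.stack : null, // not supplied by older browsers
    });
    // Defer to the earlier handler if there was one; returning false keeps the default console logging.
    return typeof previous === "function" ? previous.apply(this, arguments) : false;
  };
}

// In a browser: installErrorReporter(window, sendToServer);
// Here a plain object stands in for `window` so the sketch runs outside a browser.
const fakeWindow = {};
const seen = [];
installErrorReporter(fakeWindow, (entry) => seen.push(entry));

// Errors from cross-origin scripts typically arrive with this little detail.
fakeWindow.onerror("Script error.", "", 0);
console.log(seen[0].message); // "Script error."
```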
There are many, many JavaScript error tracking services out there. All hook into the window.onerror handler and aggregate exceptions with pretty graphs. Not surprisingly, Paul Irish has the best list available.