The Unrealized Role of Monitoring & Alerting w/ Jason Hand

The Unrealized Role of:
Monitoring & Alerting
@jasonhand | VictorOps | #AllDayDevOps

JASON
HAND
DevOps Evangelist
VictorOps

2015
MONITORING
SURVEY

WHY ARE YOU COLLECTING THIS DATA?
NOTE: You may choose more than one
▸ Performance analysis and trending
▸ Fault and Anomaly detection
▸ Capacity Planning
▸ A/B Testing

THE RESULTS
NOTE: Respondents may have chosen more than one
▸ Performance analysis and trending - 63%
▸ Fault and Anomaly detection - 53%
▸ Capacity Planning - 45%
▸ A/B Testing - 11%

Tyranny of the
S.L.A. (Service Level Agreement)

HIGH
AVAILABILITY
Prediction & Prevention

THAT'S IMPORTANT
... BUT ...

BUSINESS
OBJECTIVES?

HAPPY CAMPER

CUSTOMERS
want more than just
99.999% UPTIME
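
To put the "five nines" figure in perspective, the sketch below converts an uptime percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; they are not from the talk.

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime allowed per year for a given uptime percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

At 99.999%, the budget works out to a little over five minutes per year, which leaves almost no room for the change that innovation requires.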

WHERE'S THE
INNOVATION?

HOW IMPORTANT
IS
Learning & Innovation?

The result of underutilizing monitoring & alerting
is that the IT department and the organization have
no chance to...
LEARN,
IMPROVE, OR
INNOVATE.

CONTINUALLY UNDERSTANDING & RESPONDING
TO THE FEEDBACK
from
monitoring, logging, & alerting
allows you to use information about events in the past to drive future
actions.

It's not just about
PREDICTION
& PREVENTION

RESPOND &
REPAIR
...QUICKLY

NOPE

MTTR
Rather Than
MTBF
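
One way to see why repair time deserves as much attention as time between failures is the commonly used steady-state relation availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the incident timestamps are hypothetical, and the helper names are assumptions rather than anything shown in the talk.

```python
from datetime import datetime, timedelta

# Hypothetical incidents as (detected, resolved) pairs.
incidents = [
    (datetime(2016, 1, 4, 9, 0), datetime(2016, 1, 4, 9, 30)),
    (datetime(2016, 2, 10, 14, 0), datetime(2016, 2, 10, 14, 45)),
    (datetime(2016, 3, 22, 23, 15), datetime(2016, 3, 22, 23, 30)),
]


def mean_time_to_repair(incidents):
    """Average time between detection and resolution."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def availability(mtbf: timedelta, mttr: timedelta) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


mttr = mean_time_to_repair(incidents)
mtbf = timedelta(days=30)  # assumed value for illustration

print("MTTR:", mttr)
print("Baseline availability:", availability(mtbf, mttr))
print("Doubling MTBF:", availability(mtbf * 2, mttr))
print("Halving MTTR:", availability(mtbf, mttr / 2))
```

Under this relation, halving MTTR yields the same availability as doubling MTBF, so faster response and repair is an equally strong lever.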

FAILURE IS
INEVITABLE

US·ER
/ˈYOOZƏR/
DISTRIBUTED FAULT INJECTION TEST SUITE FOR
PRODUCTION.
credit: Leon Fayer (@papa_fire)

SUCCESS
is a result of
FAILURE

UNDERSTAND
LEARN
INNOVATE

RE·SIL·IENT
/RƏˈZILYƏNT/
The ability to resist, absorb, recover from or successfully adapt to
adversity or a change in conditions

CHANGE
can cause failure
but innovation requires
CHANGE

CONFLICT

CHANGE
REQUIRED

Without deviation from the norm,
progress is not possible
— Frank Zappa

What Did You
LEARN
From the Recovery Efforts?
(including monitoring & alerting)

POSTMORTEMS / LEARNING REVIEWS:
Stories of:
WHAT TOOK PLACE
leading up to & during
the disruption & recovery efforts

WHO WAS
INVOLVED?

WHAT DID THEY
SEE?

WHAT WAS
SAID?

WHAT
ACTIONS
WERE TAKEN?
jhand.co/chatopsbook

HOW DO
events & actions
CORRELATE
OVER TIME?
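
A learning review often starts by lining up alerts, chat messages, and operator actions on a single timeline. The following is a minimal sketch of that idea, assuming each source can be exported as (timestamp, source, description) records; all of the data and names below are hypothetical.

```python
from datetime import datetime

# Hypothetical records: (timestamp, source, description).
alerts = [
    (datetime(2016, 11, 15, 2, 1), "alert", "API latency above threshold"),
    (datetime(2016, 11, 15, 2, 20), "alert", "API latency recovered"),
]
chat = [
    (datetime(2016, 11, 15, 2, 4), "chat", "On-call: looking at the API dashboards"),
]
actions = [
    (datetime(2016, 11, 15, 2, 12), "action", "Rolled back latest deploy"),
]


def build_timeline(*sources):
    """Merge events from every source into one list ordered by timestamp."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event[0])


timeline = build_timeline(alerts, chat, actions)
for timestamp, source, description in timeline:
    print(f"{timestamp:%H:%M}  {source:<6}  {description}")
```

Reading the merged list top to bottom shows who saw what, what was said, and which action came before the recovery.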

5 Whys

WHAT IS THE "cause"
OF THE PROBLEM?
Root Cause is ...

OUR...
obsession with
"Root Cause"

ASKING "WHY"
.. leads to ..
BLAME

BLAMING
LEADS TO..
operators hiding relevant & important
information

We must
BELIEVE
that our operators are doing their best given the
constraints of the "system"

"We are here to"
LEARN
From Failure
(and success)

RATHER THAN ..

AVOID
FAILURE

WHAT'S THE
STORY?

INNOVATE
Learning from both success & failure
to develop & implement
small incremental improvements
is critical.

MONITORING &
ALERTING
Helps us understand the story in greater detail

LEARNING
ORGANIZATION

Learning does NOT come from
READING
&
LISTENING

Learning comes from
DOING

Real Learning comes from:
OBSERVING
ORIENTING
DECIDING
ACTING
John Boyd's OODA Loop

Example:
LEARNING TO PLAY THE
DOBRO GUITAR

LEARNING

WHY?
Go from knowing...
to understanding...
to learning
NOTE:
(Requires making mistakes)

We will trade some uptime in exchange for innovation
-Dave Hahn (Netflix)
DevOpsDays Boise 2016

SHIFT OUR GAZE
from:
MAINTAINING
& PROTECTING

to:
LEARNING
Which leads to...
IMPROVING
& INNOVATING

WE INCREASE VALUE OF:
- Monitoring & Alerting
- IT teams
- Products & Services
- Organization

HYPOTHESIZE
EXPLORE
STRETCH
EXPERIMENT
FAIL
LEARN
Try Again

LEARNING & INNOVATING
leads to uncovering new ways of
BUILDING, DEPLOYING, AND MAINTAINING
SOFTWARE & INFRASTRUCTURE
Which leads to...

RESILIENT
SYSTEMS

The
By-product
of a highly
RESILIENT
system is ...

HIGHLY
AVAILABLE
SYSTEM

THE UNREALIZED
ROLE OF:
Monitoring
& Alerting is ....

LEARNING
&
INNOVATION

THANK
YOU
Be Victorious!

Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/

References:
Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg

Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/

scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
change: http://i.imgur.com/EQyC6N3.gif
Hard drive: https://i.imgur.com/pWsKSEf.gif
Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif
