Monitoring 101
THE BASICS
http://l42.org/JQE
Theo Schlossnagle CEO @Circonus
Twitter: @postwait
Agenda
Navigation Skills
Tenets
Service Level Objectives
Overview
History
“Monitoring is the action of observing
and checking static and dynamic
properties of a system.”
- HEINRICH HARTMANN (http://l42.org/GwE)
Your System
Is Larger Than Your “Systems”
Technology
Engineering
Operations
HR
Finance
Sales
Marketing
Evolution of Web Monitoring
1995:
Synthetic web page
loads every 15
minutes
2000:
Watching every web
request to understand
“real users”
2015:
Deep analysis on every
transaction to ensure no user left
behind
Evolution of Database Monitoring
1995:
Synthetic queries to
test performance
2005:
Watching every query
request to understand
“real performance”
2015:
Deep analysis of every transaction
to understand overall system
behavior
Evolution of Systems Monitoring
1995:
Looked at avwait
lat (disks)
2015:
Track the latency of every
disk I/O operation
performed
Monitoring Is Sophisticated
Increased
Telemetry
Volume
Advances in Time Series
Databases to store
trillions of samples in a
billion streams.
Advances in Stream
Analytics to handle
velocity at scale for real-
time analysis and
alerting.
More Valuable
Operational
Questions
Data Science is the
future.
Increased volume
mandates computer
assistance where “ops
dashboards” once
worked.
More sophisticated
modeling: stats,
machine learning, AI,
etc.
Increased
Organizational
Velocity
Systems are decoupled,
distributed and
changing faster.
Understanding overall
systems behavior is like
looking at sand dunes.
Service Level Objectives
SLOs ARE WHAT DRIVE SREs
SLO: usually based on percentiles
• E.g. 95th percentile less than 10ms
• “Simply”: 95% of all samples should be 10ms or less; the remaining 5% can be arbitrarily bad
• Not “simple”
• Calculated over what period of time (or worse, number of samples)?
• Why 95% and not 99% or 99.9% or 99.34860943%?
• Why 10ms?
• The tragedy of the not-a-histogram histogram:
• There are no right answers, and rarely good ones.
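The percentile check in the bullets above can be sketched as follows. This is a minimal illustration, assuming the nearest-rank definition of a percentile (one of several in common use) and made-up sample data; the names `percentile` and `meets_slo` are hypothetical, not from the talk.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(samples, q=95, threshold_ms=10.0):
    """True when the q-th percentile latency is at or below the threshold."""
    return percentile(samples, q) <= threshold_ms


# Hypothetical data: 95 fast requests and 5 very slow ones.
latencies_ms = [2.0] * 95 + [500.0] * 5

p95 = percentile(latencies_ms, 95)            # 2.0
passes_p95 = meets_slo(latencies_ms, q=95)    # True, despite the 500ms tail
passes_p99 = meets_slo(latencies_ms, q=99)    # False
```

The same data passes at 95% and fails at 99%, which is why the choice of percentile and threshold needs a justification.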
[Figure: Median Latency, over 5m stepping window]
[Figure: Summary Histogram, 30 days and 36mm samples]
[Figure: Time-series Histogram, 30 days and 36mm samples (three views)]
[Figure: Average Latency, over 5m stepping window]
[Figure: Summary Histogram, 2 days and 1.6mm samples]
[Figure: Latency, over 5m stepping window (two views)]
[Figure: p(95) Latency, over 5m stepping window]
[Figure: p⁻¹(10ms) Latency, over 5m stepping window]
[Figure: p⁻¹(50ms) Latency, over 5m stepping window]
Time Matters
The time quantum you use to assess
is your minimum window of failure.
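A small sketch of why the assessment window matters, using made-up numbers. It assumes an SLO of "95% of samples at or below 10ms" and evaluates the same hour of data once as a single window and once as twelve 5-minute windows; all names and values are hypothetical.

```python
def fraction_within(samples, threshold_ms):
    """Fraction of samples at or below the threshold."""
    return sum(1 for s in samples if s <= threshold_ms) / len(samples)


def slo_met(samples, threshold_ms=10.0, target=0.95):
    return fraction_within(samples, threshold_ms) >= target


# Hypothetical hour: eleven healthy 5-minute windows and one degraded window.
healthy = [3.0] * 100
degraded = [3.0] * 60 + [200.0] * 40
five_minute_windows = [healthy] * 11 + [degraded]
one_hour = [s for window in five_minute_windows for s in window]

# Assessed hourly, 1160 of 1200 samples are fast enough, so the SLO holds.
hourly_ok = slo_met(one_hour)

# Assessed every 5 minutes, the degraded window is visible as a failure.
failed_windows = [i for i, w in enumerate(five_minute_windows) if not slo_met(w)]
```

With the hourly quantum the 5-minute degradation goes unreported; the smallest failure that can be seen is one full window.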
Uncertainty Matters
You will certainly want to revise your goals,
likely along every parameter (percentile, threshold, and time window).
Histograms Matter
You cannot manage percentile-based SLOs at scale without
histograms.
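One way to see why histograms are needed at scale: per-host percentiles cannot be combined, but per-host histograms can be merged and a percentile read off the merged result. The sketch below assumes fixed bucket upper bounds (`BOUNDS_MS`) and two made-up hosts; it is not the histogram format of any particular product.

```python
import math
from bisect import bisect_left
from collections import Counter

# Hypothetical bucket upper bounds in milliseconds.
BOUNDS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]


def to_histogram(samples):
    """Count samples per bucket, keyed by the bucket's upper bound."""
    hist = Counter()
    for s in samples:
        idx = bisect_left(BOUNDS_MS, s)
        upper = BOUNDS_MS[idx] if idx < len(BOUNDS_MS) else float("inf")
        hist[upper] += 1
    return hist


def merge(*hists):
    """Histograms with identical buckets merge by adding counts."""
    total = Counter()
    for h in hists:
        total.update(h)
    return total


def percentile_upper_bound(hist, q):
    """Upper bound of the bucket holding the q-th percentile (nearest rank)."""
    n = sum(hist.values())
    rank = max(1, math.ceil(q * n / 100))
    seen = 0
    for upper in sorted(hist):
        seen += hist[upper]
        if seen >= rank:
            return upper
    raise ValueError("empty histogram")


host_a = to_histogram([3.0] * 1000)   # busy and fast
host_b = to_histogram([150.0] * 40)   # quiet and slow

p95_a = percentile_upper_bound(host_a, 95)                      # 5
p95_b = percentile_upper_bound(host_b, 95)                      # 200
averaged = (p95_a + p95_b) / 2                                  # 102.5, not a percentile of anything
p95_merged = percentile_upper_bound(merge(host_a, host_b), 95)  # 5
```

Averaging the two p95 values gives 102.5ms, while the p95 of the combined traffic falls in the 5ms bucket.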
Do not measure rates.
You can derive the rate of change over time at query time.
#1
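A sketch of the "do not measure rates" tenet: store the raw, monotonically increasing count and compute the rate for whatever interval a query asks about. The sample data and the function name `rate_per_second` are hypothetical, and the sketch assumes the counter never resets.

```python
def rate_per_second(samples, start, end):
    """Average rate of change between the first and last stored sample
    inside [start, end]. samples is a time-ordered list of
    (timestamp_seconds, counter_value) pairs."""
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # not enough data to derive a rate
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical cumulative request counter, sampled once a minute.
requests_total = [(0, 0), (60, 600), (120, 1800), (180, 1860)]

first_minute = rate_per_second(requests_total, 0, 60)      # 10.0
second_minute = rate_per_second(requests_total, 60, 120)   # 20.0
whole_period = rate_per_second(requests_total, 0, 180)
```

Because the stored value is the count, the same series answers questions at any resolution; a pre-computed per-minute rate could not be re-derived for a different interval without loss.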
Monitor outside the tech stack.
Your tech stack would not exist without happy customers and a sales pipeline.
Monitor that which is important to the health of your organization.
#2
Do not silo data.
The behavior of the parts must be put in context.
Correlating disparate systems and even business outcomes is critical.
#3
Value observation of real work
over the measurement of synthesized work.
#4
Synthesize work to ensure function
for business critical, low-volume events.
#5
Percentiles are not histograms.
For robust SLO management
you need to store histograms for post-processing.
#6
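A sketch of the post-processing a stored histogram allows and a stored percentile does not: asking, after the fact, what fraction of requests met any candidate threshold (an inverse percentile). The bucket counts are made up, and the result is exact only when the threshold coincides with a bucket boundary.

```python
# Hypothetical stored histogram: bucket upper bound (ms) -> sample count.
stored = {1: 120, 2: 300, 5: 400, 10: 130, 20: 30, 50: 15, 100: 5}


def fraction_at_or_below(hist, threshold_ms):
    """Inverse percentile: share of samples in buckets whose upper
    bound is at or below the threshold."""
    total = sum(hist.values())
    within = sum(count for upper, count in hist.items() if upper <= threshold_ms)
    return within / total


# The SLO threshold can be revisited later without re-collecting data.
at_5ms = fraction_at_or_below(stored, 5)     # 0.82
at_10ms = fraction_at_or_below(stored, 10)   # 0.95
at_50ms = fraction_at_or_below(stored, 50)   # 0.995
```

Had only "p95 = 10ms" been recorded, the questions about 5ms and 50ms could not be answered from history.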
History is critical;
not weeks or months, but years of detailed history.
Capacity planning, retrospectives, comparative analysis,
and modelling all rely on accurate, high-fidelity history.
#7
Alerts require documentation.
No ruleset should trigger an alert without:
human-readable explanation
business impact description
remediation procedure
escalation documentation
#8
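The four documentation requirements above could be enforced mechanically before a rule is allowed to page anyone. This is one possible sketch; the `AlertRule` type, its field names, and the example rule are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AlertRule:
    name: str
    explanation: str       # human-readable explanation
    business_impact: str   # business impact description
    remediation: str       # remediation procedure
    escalation: str        # escalation documentation


def missing_documentation(rule):
    """Names of the fields that are empty or whitespace-only."""
    return [f.name for f in fields(rule) if not getattr(rule, f.name).strip()]


# Hypothetical rule that lacks a remediation procedure.
rule = AlertRule(
    name="checkout-latency",
    explanation="p95 checkout latency above 10ms for 5 minutes",
    business_impact="Customers may abandon purchases",
    remediation="",
    escalation="Page the payments on-call, then the engineering manager",
)

gaps = missing_documentation(rule)  # ["remediation"]
can_alert = not gaps                # False
```

A check like this can run when rules are reviewed or deployed, so an undocumented rule never reaches production.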
Be outside the blast radius.
The purpose of monitoring is to detect changes in behavior and assist in
answering operational questions.
#9
Something is better than nothing.
Don’t let perfect be the enemy of good.
You have to start somewhere.
#10
Thank You!
http://l42.org/JQE
