SlideShare uma empresa Scribd logo
1 de 43
Baixar para ler offline
A tour through mathematical methods on systems telemetry 
Math in Big Systems If it was a 
simple math problem, 
we’d have 
solved all this by now.
The many faces of 
Theo Schlossnagle @postwait 
CEO Circonus
Choosing an approach is premature… 
Problems 
How do we determine the type of signal? 
How do we manage off-line modeling? 
How do we manage online fault detection? 
How do we reconcile so users don’t hate us? 
How we we solve this in a big-data context?
Picking 
an Approach 
Statistical? 
Machine learning? 
Supervised? 
Ad-hoc? 
ontological? (why it is what it is)
tl;dr 
Apply PhDs Apply PhDs 
Rinse 
Wash 
Repeat
Garbage in, category out. 
Classification Understanding a signal 
We found to be quite ad-hoc 
At least the feature extraction
A year of service… I should be able to learn something. 
API requests/second 1 year
A year of service… I should be able to learn something. 
API requests 1 year
A year of service… I should be able to learn something. 
API requests 1 year 
Δv 
Δt 
, ! Δv ≥ 0
Some data goes both ways… 
Complicating Things 
Imagine disk space used… 
it makes sense as a gauge (how full) 
it makes sense as rate (fill rate)
Error + error + guessing = success 
How we categorize 
Human identify a variety of categories. 
Devise a set of ad-hoc features. 
Bayesian model of features to categories. 
Human tests. 
https://www.flickr.com/photos/chrisyarzab/5827332576
Many signals have significant noise around their averages 
Signal Noise A single “obviously wrong” 
measurement… 
is often a reasonable outlier.
A year of service… I should be able to learn something. 
API requests/second 1 year
At a resolution where we witness: “uh oh” 
API requests/second 4 weeks
But, are there two? three? 
API requests/second 4 weeks 
Is that super interesting?
Bring the noise! 
API requests/second 2 days
Think about what this means… statistically 
API requests/second 1 year 
envelope of ±1 std dev
Lies, damned lies, and statistics 
Simple Truths 
Statistics are only really useful with 
p-values are low. 
p ≤ 0.01 : very strong presumption against null hyp. 
0.01 < p ≤ 0.05 : strong presumption against null hyp. 
0.05 < p ≤ 0.1 : low presumption against null hyp. 
p > 0.1 : no presumption against the null hyp. 
from xkcd #882 by Randall Munroe
What does a p-value have to do with applying stats? 
The p-value problem It turns out a lot of measurement data 
(passive) is very infrequent. 
60% of the time… 
it works every time.
Our low frequencies lead us to 
questions of doubt… 
Given a certain statistical model: 
How many few points need to be seen before we 
are sufficiently confident that it does not fit the 
model (presumption against the null hypothesis)? 
With few, we simply have outliers or insignificant 
aberrations. 
http://www.flickr.com/photos/rooreynolds/
Solving the Frequency Problem 
More data, more often… 
(obviously) 
1. sample faster 
(faster from the source) 
2. analyze wider 
(more sources) 
OR
https://www.flickr.com/photos/whosdadog/3652670385 
Increasing frequency is the only option at times. 
Signals of Importance Without large-scale systems 
We must increase frequency
What do mean that my mean is mean? 
Why can’t math be nice to people? 
https://www.flickr.com/photos/imagesbywestfall/3606314694 
Most algorithms require measuring residuals from a mean 
Mean means Calculating means is “easy” 
There are some pitfalls
Newer data should influence our model. 
Signals change 
The model needs to adapt. 
Exponentially decaying averages are quite common 
in online control systems and used as a basis for 
creating control charts. 
Sliding windows are a bit more expensive.
EWM vs. SWM 
❖ EWM : ST 
❖ S1 = V1 
❖ ST = 훼VT + (1-훼)ST-1 
❖ Low computation overhead 
❖ Low memory usage 
❖ Hard to repeat offline 
❖ SMW tracking the last N values : ST 
❖ ST = ST-1 - VT-N/n + VT/n 
❖ Low computational overhead 
❖ High memory usage 
❖ Easy to repeat offline
Repeatable outcomes are needed 
In our system… 
We need our online algorithms to match our offline 
algorithms. 
This is because human beings get pissed off when 
they can’t repeat outcomes that woke them up in 
the middle of the night. 
EWM: not repeatable 
SWM: expensive in online application
Repeatable, low-cost sliding windows 
Our solution: 
lurching windows 
fixed rolling windows 
of 
fixed windows 
SWM + EWM
Putting it all together… How to test if we don’t match our model?
Hypothesis Testing
The CUSUM Method
Applying CUSUM 
API requests/second 4 weeks 
CUSUM Control Chart
Can we do better? 
Investigations 
The CUSUM method has some issues. It’s challenging 
when signals are noise or of variable rate. 
We’re looking into the Tukey test: 
• compares all possible pairs of means 
• test is conservative in light of uneven sample sizes 
https://www.flickr.com/photos/st3f4n/4272645780
All that work and you tell me *now* 
that I don’t have a normal distribution? 
Most statistical methods assume a normal distribution. 
Your telemetry data has 
never been evenly distributed 
Statistics suck. 
https://www.flickr.com/photos/imagesbywestfall/3606314694 
All this “deviation from the mean” 
assumes some symmetric distribution.
Think about what this means… statistically 
Rather obviously 
not a normal distribution 
The average minus the standard deviation 
is less than zero on a measurement that can 
only be ≥ 0.
With all the data, we have a complete picture of population distribution. 
What if we had all the data? 
… full distribution 
Now we can do statistics 
that isn’t hand-wavy.
High volume data requires a different strategy 
What happens when 
we get what we asked for? 
10,000 measurements per second? more? 
on each stream… 
with millions of streams.
Let’s understand the scope of the problem. 
First some realities 
This is 10 billion to 1 trillion measurements per second. 
At least a million independent models. 
We need to cheat. 
https://www.flickr.com/photos/thost/319978448
https://www.flickr.com/photos/meddygarnet/3085238543 
When we have too much, simplify… 
Information compression We need to look to transform the data. 
Add error in the value space. 
Add error in the time space.
Summarization & Extraction 
❖ Take our high-velocity stream 
❖ Summarize as a histogram over 1 minute (error) 
❖ Extract useful less-dimensional characteristics 
❖ Apply CUSUM and Tukey tests on characteristics
Lorem Ipsum Dolor 
Modes & moments. Strong indicators of 
shifts in workload 
http://www.brendangregg.com/FrequencyTrails/modes.html
Lorem Ipsum Dolor 
Quantiles… Useful if you understand the problem 
domain and the expected distribution. 
https://metamarkets.com/2013/histograms/
Lorem Ipsum Dolor 
Inverse Quantiles… Q: “What quantile is 5ms of latency?” 
Useful if you understand the problem 
domain and the expected distribution.
Thank ou! and thank you to the real brains behind Circonus’ analysis systems: Heinrich and George.

Mais conteúdo relacionado

Mais procurados

Influx/Days 2017 San Francisco | Baron Schwartz
Influx/Days 2017 San Francisco | Baron SchwartzInflux/Days 2017 San Francisco | Baron Schwartz
Influx/Days 2017 San Francisco | Baron SchwartzInfluxData
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
With Cloud Computing, Who Needs Performance Testing?
With Cloud Computing, Who Needs Performance Testing?With Cloud Computing, Who Needs Performance Testing?
With Cloud Computing, Who Needs Performance Testing?TEST Huddle
 
Fallacy of Fast
Fallacy of FastFallacy of Fast
Fallacy of FastFastly
 
It Probably Works
It Probably WorksIt Probably Works
It Probably WorksFastly
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 
EuroSTAR 2013 Albert Witteveen Final
EuroSTAR 2013 Albert Witteveen FinalEuroSTAR 2013 Albert Witteveen Final
EuroSTAR 2013 Albert Witteveen FinalAlbert Witteveen
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Brian Brazil
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringCS, NcState
 
How Machines Help Humans Root Case Issues @ Netflix
How Machines Help Humans Root Case Issues @ NetflixHow Machines Help Humans Root Case Issues @ Netflix
How Machines Help Humans Root Case Issues @ NetflixC4Media
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guidematthewbrahms
 
Your Data Scientist Hates You
Your Data Scientist Hates YouYour Data Scientist Hates You
Your Data Scientist Hates YouBradford Stephens
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Bradford Stephens
 
Queues: An Invisible Money Drain
Queues: An Invisible Money DrainQueues: An Invisible Money Drain
Queues: An Invisible Money DrainPhil Sarin
 

Mais procurados (14)

Influx/Days 2017 San Francisco | Baron Schwartz
Influx/Days 2017 San Francisco | Baron SchwartzInflux/Days 2017 San Francisco | Baron Schwartz
Influx/Days 2017 San Francisco | Baron Schwartz
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
With Cloud Computing, Who Needs Performance Testing?
With Cloud Computing, Who Needs Performance Testing?With Cloud Computing, Who Needs Performance Testing?
With Cloud Computing, Who Needs Performance Testing?
 
Fallacy of Fast
Fallacy of FastFallacy of Fast
Fallacy of Fast
 
It Probably Works
It Probably WorksIt Probably Works
It Probably Works
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
EuroSTAR 2013 Albert Witteveen Final
EuroSTAR 2013 Albert Witteveen FinalEuroSTAR 2013 Albert Witteveen Final
EuroSTAR 2013 Albert Witteveen Final
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
How Machines Help Humans Root Case Issues @ Netflix
How Machines Help Humans Root Case Issues @ NetflixHow Machines Help Humans Root Case Issues @ Netflix
How Machines Help Humans Root Case Issues @ Netflix
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guide
 
Your Data Scientist Hates You
Your Data Scientist Hates YouYour Data Scientist Hates You
Your Data Scientist Hates You
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
 
Queues: An Invisible Money Drain
Queues: An Invisible Money DrainQueues: An Invisible Money Drain
Queues: An Invisible Money Drain
 

Destaque (12)

Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Atldevops
AtldevopsAtldevops
Atldevops
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
What's in a number?
What's in a number?What's in a number?
What's in a number?
 
Omnios and unix
Omnios and unixOmnios and unix
Omnios and unix
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
 
Project reality
Project realityProject reality
Project reality
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 

Semelhante a The math behind big systems analysis.

[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into valueNAVER D2
 
Data Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponArun Kejariwal
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New BossAndreas Dewes
 
Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)Charity Majors
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Semi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text DataSemi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text DataTech Triveni
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
 
The limits of unit testing by Craig Stuntz
The limits of unit testing by Craig StuntzThe limits of unit testing by Craig Stuntz
The limits of unit testing by Craig StuntzQA or the Highway
 
The Limits of Unit Testing by Craig Stuntz
The Limits of Unit Testing by Craig StuntzThe Limits of Unit Testing by Craig Stuntz
The Limits of Unit Testing by Craig StuntzQA or the Highway
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Less is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OLess is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OMichael Roytman
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?BalaBit
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer universityLászló Kovács
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningGovind Mudumbai
 
Strategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroStrategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroRobert Munro
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014gdusbabek
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 

Semelhante a The math behind big systems analysis. (20)

[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into value
 
Data Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action Upon
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Semi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text DataSemi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text Data
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
 
The limits of unit testing by Craig Stuntz
The limits of unit testing by Craig StuntzThe limits of unit testing by Craig Stuntz
The limits of unit testing by Craig Stuntz
 
The Limits of Unit Testing by Craig Stuntz
The Limits of Unit Testing by Craig StuntzThe Limits of Unit Testing by Craig Stuntz
The Limits of Unit Testing by Craig Stuntz
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Less is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OLess is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/O
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
Ml masterclass
Ml masterclassMl masterclass
Ml masterclass
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Strategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroStrategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert Munro
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 

Mais de Theo Schlossnagle

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to ComplexityTheo Schlossnagle
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwareTheo Schlossnagle
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or NotTheo Schlossnagle
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service designTheo Schlossnagle
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoringTheo Schlossnagle
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachTheo Schlossnagle
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everythingTheo Schlossnagle
 

Mais de Theo Schlossnagle (12)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approach
 
Webops dashboards
Webops dashboardsWebops dashboards
Webops dashboards
 
Web Operations Career
Web Operations CareerWeb Operations Career
Web Operations Career
 
Http front-ends
Http front-endsHttp front-ends
Http front-ends
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
 

Último

IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 

Último (17)

IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 

The math behind big systems analysis.

  • 1. A tour through mathematical methods on systems telemetry Math in Big Systems If it was a simple math problem, we’d have solved all this by now.
  • 2. The many faces of Theo Schlossnagle @postwait CEO Circonus
  • 3. Choosing an approach is premature… Problems How do we determine the type of signal? How do we manage off-line modeling? How do we manage online fault detection? How do we reconcile so users don’t hate us? How we we solve this in a big-data context?
  • 4. Picking an Approach Statistical? Machine learning? Supervised? Ad-hoc? ontological? (why it is what it is)
  • 5. tl;dr Apply PhDs Apply PhDs Rinse Wash Repeat
  • 6. Garbage in, category out. Classification Understanding a signal We found to be quite ad-hoc At least the feature extraction
  • 7. A year of service… I should be able to learn something. API requests/second 1 year
  • 8. A year of service… I should be able to learn something. API requests 1 year
  • 9. A year of service… I should be able to learn something. API requests 1 year Δv Δt , ! Δv ≥ 0
  • 10. Some data goes both ways… Complicating Things Imagine disk space used… it makes sense as a gauge (how full) it makes sense as rate (fill rate)
  • 11. Error + error + guessing = success How we categorize Human identify a variety of categories. Devise a set of ad-hoc features. Bayesian model of features to categories. Human tests. https://www.flickr.com/photos/chrisyarzab/5827332576
  • 12. Many signals have significant noise around their averages Signal Noise A single “obviously wrong” measurement… is often a reasonable outlier.
  • 13. A year of service… I should be able to learn something. API requests/second 1 year
  • 14. At a resolution where we witness: “uh oh” API requests/second 4 weeks
  • 15. But, are there two? three? API requests/second 4 weeks Is that super interesting?
  • 16. Bring the noise! API requests/second 2 days
  • 17. Think about what this means… statistically API requests/second 1 year envelope of ±1 std dev
  • 18. Lies, damned lies, and statistics Simple Truths Statistics are only really useful with p-values are low. p ≤ 0.01 : very strong presumption against null hyp. 0.01 < p ≤ 0.05 : strong presumption against null hyp. 0.05 < p ≤ 0.1 : low presumption against null hyp. p > 0.1 : no presumption against the null hyp. from xkcd #882 by Randall Munroe
  • 19. What does a p-value have to do with applying stats? The p-value problem It turns out a lot of measurement data (passive) is very infrequent. 60% of the time… it works every time.
  • 20. Our low frequencies lead us to questions of doubt… Given a certain statistical model: How many few points need to be seen before we are sufficiently confident that it does not fit the model (presumption against the null hypothesis)? With few, we simply have outliers or insignificant aberrations. http://www.flickr.com/photos/rooreynolds/
  • 21. Solving the Frequency Problem More data, more often… (obviously) 1. sample faster (faster from the source) 2. analyze wider (more sources) OR
  • 22. https://www.flickr.com/photos/whosdadog/3652670385 Increasing frequency is the only option at times. Signals of Importance Without large-scale systems We must increase frequency
  • 23. What do mean that my mean is mean? Why can’t math be nice to people? https://www.flickr.com/photos/imagesbywestfall/3606314694 Most algorithms require measuring residuals from a mean Mean means Calculating means is “easy” There are some pitfalls
  • 24. Newer data should influence our model. Signals change The model needs to adapt. Exponentially decaying averages are quite common in online control systems and used as a basis for creating control charts. Sliding windows are a bit more expensive.
  • 25. EWM vs. SWM ❖ EWM : ST ❖ S1 = V1 ❖ ST = 훼VT + (1-훼)ST-1 ❖ Low computation overhead ❖ Low memory usage ❖ Hard to repeat offline ❖ SMW tracking the last N values : ST ❖ ST = ST-1 - VT-N/n + VT/n ❖ Low computational overhead ❖ High memory usage ❖ Easy to repeat offline
  • 26. Repeatable outcomes are needed In our system… We need our online algorithms to match our offline algorithms. This is because human beings get pissed off when they can’t repeat outcomes that woke them up in the middle of the night. EWM: not repeatable SWM: expensive in online application
  • 27. Repeatable, low-cost sliding windows Our solution: lurching windows fixed rolling windows of fixed windows SWM + EWM
  • 28. Putting it all together… How to test if we don’t match our model?
  • 31. Applying CUSUM API requests/second 4 weeks CUSUM Control Chart
  • 32. Can we do better? Investigations The CUSUM method has some issues. It’s challenging when signals are noise or of variable rate. We’re looking into the Tukey test: • compares all possible pairs of means • test is conservative in light of uneven sample sizes https://www.flickr.com/photos/st3f4n/4272645780
  • 33. All that work and you tell me *now* that I don’t have a normal distribution? Most statistical methods assume a normal distribution. Your telemetry data has never been evenly distributed Statistics suck. https://www.flickr.com/photos/imagesbywestfall/3606314694 All this “deviation from the mean” assumes some symmetric distribution.
  • 34. Think about what this means… statistically Rather obviously not a normal distribution The average minus the standard deviation is less than zero on a measurement that can only be ≥ 0.
  • 35. With all the data, we have a complete picture of population distribution. What if we had all the data? … full distribution Now we can do statistics that isn’t hand-wavy.
  • 36. High volume data requires a different strategy What happens when we get what we asked for? 10,000 measurements per second? more? on each stream… with millions of streams.
  • 37. Let’s understand the scope of the problem. First some realities This is 10 billion to 1 trillion measurements per second. At least a million independent models. We need to cheat. https://www.flickr.com/photos/thost/319978448
  • 38. https://www.flickr.com/photos/meddygarnet/3085238543 When we have too much, simplify… Information compression We need to look to transform the data. Add error in the value space. Add error in the time space.
  • 39. Summarization & Extraction ❖ Take our high-velocity stream ❖ Summarize as a histogram over 1 minute (error) ❖ Extract useful less-dimensional characteristics ❖ Apply CUSUM and Tukey tests on characteristics
  • 40. Lorem Ipsum Dolor Modes & moments. Strong indicators of shifts in workload http://www.brendangregg.com/FrequencyTrails/modes.html
  • 41. Lorem Ipsum Dolor Quantiles… Useful if you understand the problem domain and the expected distribution. https://metamarkets.com/2013/histograms/
  • 42. Lorem Ipsum Dolor Inverse Quantiles… Q: “What quantile is 5ms of latency?” Useful if you understand the problem domain and the expected distribution.
  • 43. Thank ou! and thank you to the real brains behind Circonus’ analysis systems: Heinrich and George.