SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
Sometimes (most times) down and out is better than slow.
Adaptive Availability for
Quality of Service
A new world order
Slow ≅ Byzantine
In most modern systems, users perceive:



“slow is the new down.”
In most distributed systems:



“slow is indistinguishable from byzantine operations.”
We had to be “very sure” in the
Days of Failover
Primary : Replica system usually have non-zero
operational costs in performance failover.
• dataloss (in asynchronous systems)
• operational downtime
• operational rebuild time (reversing the flows)
For well-designed, available systems,
Constraints Have Changed
Deciding to fail a node is no longer a
“last resort” decision.
What do I mean by
well-designed?
The failure of a node does not cause
• service interruption
• significant performance regressions
The recovery of a node does not cause
• unnecessary work (only minimal replay)
• significant performance regressions
A brief tangent on an
Anecdotal Design Active feedback on replay performance
Snowth design
❖ Need: zero-downtime
❖ Know: Agreement is hard.
❖ Know: Consensus is expensive.
❖ CAP theorem tradeoffs suck.
❖ CRDT (Commutative
Replicated Data Type)
n1 n2 n3 n4 n5 n6
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1 n6-2
n6-3
n6-4
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1
n6-2
n6-3
n6-4
o1
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1
n6-2
n6-3
n6-4
o1
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1 n6-2
n6-3
n6-4
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1
n6-2
n6-3
n6-4
Availability

Zone 1
Availability

Zone 2
o1
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1
n6-2
n6-3
n6-4
Availability

Zone 1
Availability

Zone 2
Availability

Zone 1
Availability

Zone 2
o1
n1-1
n1-2
n1-3
n1-4
n2-1
n2-2
n2-3
n2-4
n3-1
n3-2
n3-3
n3-4
n4-1
n4-2
n4-3
n4-4
n5-1
n5-2
n5-3
n5-4
n6-1
n6-2
n6-3
n6-4
A look at adaptive algorithms in
Replication
How do you choose the right unit of work for tasks?
What does it sound like when a system
Backfires
Batch it faster than single ops
• less latency impact
• less transactional overhead
What with QoS enforcement & circuit breakers?
Flogging
TCP (and everything else) can teach us something.
This provides us
Opportunities
What if we had relative homogeny of

systems and workloads?
Some problems get easier
Simplified Outlier Detection
If there is an implicit assumption that
machines behave similarly,



then it becomes much easier to determine
when they fail to do so.
New things become possible
Predicting Future Conditions
With higher volume data,



statistical models offer higher confidence.
We have better tools now that high-volume data isn’t intimidating:
Better insight
That hairline contains >9MM samples.
Histogram shown.
4 modes… WTF?
It takes good understanding of statistics to ask the right questions.
Misleading yourself
This is a q(0.99) — 99th percentile.
It obviously goes off the rails around 1am.
No.
It takes good understanding of statistics to ask the right questions.
Measuring what matters
Instead of measuring

“how slow transaction are”



we measure

“how many transactions are too slow”
Condition
We have a new tool in the tool chest:
Intentionally Failing Nodes When nodes are cattle, not pets…
Expect more from you systems.
Thank You
You can observe better,
know more,
don’t settle.

Mais conteúdo relacionado

Destaque

OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
Theo Schlossnagle
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
Theo Schlossnagle
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
Theo Schlossnagle
 
Introduction to the Improvement Kata
Introduction to the Improvement KataIntroduction to the Improvement Kata
Introduction to the Improvement Kata
Mike Rother
 
David Anderson Kanban At Q Con
David Anderson Kanban At Q ConDavid Anderson Kanban At Q Con
David Anderson Kanban At Q Con
deimos
 
devops - what's missing? what's next?
devops - what's missing? what's next?devops - what's missing? what's next?
devops - what's missing? what's next?
Andrew Shafer
 

Destaque (20)

What's in a number?
What's in a number?What's in a number?
What's in a number?
 
Atldevops
AtldevopsAtldevops
Atldevops
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
Omnios and unix
Omnios and unixOmnios and unix
Omnios and unix
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
 
Project reality
Project realityProject reality
Project reality
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
Simple math for anomaly detection toufic boubez - metafor software - monito...
Simple math for anomaly detection   toufic boubez - metafor software - monito...Simple math for anomaly detection   toufic boubez - metafor software - monito...
Simple math for anomaly detection toufic boubez - metafor software - monito...
 
Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Introduction to the Improvement Kata
Introduction to the Improvement KataIntroduction to the Improvement Kata
Introduction to the Improvement Kata
 
DevOps Kaizen: Practical Steps to Start & Sustain a Transformation
DevOps Kaizen: Practical Steps to Start & Sustain a TransformationDevOps Kaizen: Practical Steps to Start & Sustain a Transformation
DevOps Kaizen: Practical Steps to Start & Sustain a Transformation
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Support and Initiate a DevOps Transformation
Support and Initiate a DevOps TransformationSupport and Initiate a DevOps Transformation
Support and Initiate a DevOps Transformation
 
David Anderson Kanban At Q Con
David Anderson Kanban At Q ConDavid Anderson Kanban At Q Con
David Anderson Kanban At Q Con
 
devops - what's missing? what's next?
devops - what's missing? what's next?devops - what's missing? what's next?
devops - what's missing? what's next?
 

Semelhante a Adaptive availability

Resource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time SystemsResource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time Systems
jeronimored
 
Slides Cost Based Performance Modelling
Slides Cost Based Performance ModellingSlides Cost Based Performance Modelling
Slides Cost Based Performance Modelling
Eugene Margulis
 

Semelhante a Adaptive availability (20)

Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
Using Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformUsing Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS Platform
 
Kanban - A Crash Course
Kanban - A Crash CourseKanban - A Crash Course
Kanban - A Crash Course
 
Observability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyObservability - the good, the bad, and the ugly
Observability - the good, the bad, and the ugly
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
 
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
 
Non-Functional Requirements
Non-Functional RequirementsNon-Functional Requirements
Non-Functional Requirements
 
Observability – the good, the bad, and the ugly
Observability – the good, the bad, and the uglyObservability – the good, the bad, and the ugly
Observability – the good, the bad, and the ugly
 
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
Understanding Microservice Performance
Understanding Microservice PerformanceUnderstanding Microservice Performance
Understanding Microservice Performance
 
Patterns of enterprise application architecture
Patterns of enterprise application architecturePatterns of enterprise application architecture
Patterns of enterprise application architecture
 
Resource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time SystemsResource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time Systems
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
Introduction
IntroductionIntroduction
Introduction
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software Performance
 
2008-10-09 - Bits and Chips Conference - Embedded Systemen Architecture patterns
2008-10-09 - Bits and Chips Conference - Embedded Systemen Architecture patterns2008-10-09 - Bits and Chips Conference - Embedded Systemen Architecture patterns
2008-10-09 - Bits and Chips Conference - Embedded Systemen Architecture patterns
 
Slides Cost Based Performance Modelling
Slides Cost Based Performance ModellingSlides Cost Based Performance Modelling
Slides Cost Based Performance Modelling
 

Mais de Theo Schlossnagle

Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
Theo Schlossnagle
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
Theo Schlossnagle
 

Mais de Theo Schlossnagle (13)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approach
 
Webops dashboards
Webops dashboardsWebops dashboards
Webops dashboards
 
Web Operations Career
Web Operations CareerWeb Operations Career
Web Operations Career
 
Http front-ends
Http front-endsHttp front-ends
Http front-ends
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 

Adaptive availability