PETER ALVARO
Orchestrated Chaos
With a prelude of vignettes and an appendix of fairy tales
Mythology
About me
Platitudes
“Managing complexity”
Easy: removing complexity
Much harder: moving complexity around
Nontrivial systems problems always require tradeoffs
Productivity / Convenience
Purity / Correctness
Vignettes
Vignette 1: teaching myself Docker
Vignette 2: a DBA tale
Vignette 3: selling lovely languages
Vignette 4: Microservices
The UNIX philosophy:
Do one thing and do it well.
> man ls
Ease of release wins
The profound solipsism of the microservice
Every microservice is a piece of the continent
What could possibly go wrong?
Consider a computation involving 100 services
Search space: 2^100 executions
“Depth” of bugs
Single faults: search space of 100 executions
Combinations of 4 faults: search space of 3M executions
Combinations of 7 faults: search space of 16B executions
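The figures on the “depth of bugs” slides follow from counting fault combinations. A minimal sketch, assuming each of the 100 services is a single fault point and a bug of depth k needs exactly k of them to fail together (the helper name `fault_combinations` is ours, not from the talk):

```python
from math import comb

SERVICES = 100  # from the slide: a computation involving 100 services

def fault_combinations(depth, services=SERVICES):
    """Number of ways to pick `depth` distinct services to fail together."""
    return comb(services, depth)

# Every subset of services is a possible fault scenario.
total_search_space = 2 ** SERVICES

for depth in (1, 4, 7):
    print(f"depth {depth}: {fault_combinations(depth):,} executions")
print(f"all subsets: {total_search_space:.3e} executions")
```

Depth 4 comes to 3,921,225 and depth 7 to 16,007,560,800, which the slides round to 3M and 16B.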
Reflections
1. Managing complexity can be a zero-sum game
2. Productivity trumps purity
3. Chaos results… and gives rise to a new order
Opportunity
What the hell is going on? (Observability)
Call graph tracing (e.g. Zipkin)
What could possibly go wrong? (Fault injection)
A fault injection framework (e.g. FIT)
Random search
A fault injection framework (e.g. FIT)
Call graph tracing (e.g. Zipkin)
Search space: 2^100 executions
Engineer-guided search
A fault injection framework (e.g. FIT)
Call graph tracing (e.g. Zipkin)
Search space: ???
…?
A cunning malevolent sentience?
A fault injection framework (e.g. FIT)
Call graph tracing (e.g. Zipkin)
Lineage-driven Fault Injection
A fault injection framework (e.g. FIT)
LDFI
Call graph tracing (e.g. Zipkin)
Fault-tolerance “is just” redundancy
But how do we know redundancy when we see it?
Hard question: “Could a bad thing ever happen?”
Easier: “Exactly why did a good thing happen?”
“What could have gone wrong?”
Lineage-driven fault injection
Why did a good thing happen?
Consider its lineage.
What could have gone wrong?
Faults are cuts in the lineage graph.
Is there a cut that breaks all supports?
[Diagram: lineage graph with nodes “The write is stable”, “Stored on RepA”, “Stored on RepB”, Bcast1, Bcast2, and Client]
What would have to go wrong?
(RepA OR Bcast1)
AND (RepA OR Bcast2)
AND (RepB OR Bcast2)
AND (RepB OR Bcast1)
[Diagram: lineage graph with nodes “The write is stable”, “Stored on RepA”, “Stored on RepB”, Bcast1, Bcast2, and Client]
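The formula built up on these slides can be checked mechanically. A minimal sketch, assuming each clause is represented as a set of fault names and that a fault set “breaks all supports” when it intersects every clause. The names `CLAUSES`, `breaks_all_supports` and `minimal_cuts` are illustrative, and the brute-force enumeration stands in for whatever solver a real implementation would use:

```python
from itertools import combinations

# One clause per support: to break a support, at least one member must fail.
CLAUSES = [
    {"RepA", "Bcast1"},
    {"RepA", "Bcast2"},
    {"RepB", "Bcast2"},
    {"RepB", "Bcast1"},
]

def breaks_all_supports(faults, clauses=CLAUSES):
    """True if the fault set satisfies the CNF, i.e. hits every clause."""
    return all(clause & faults for clause in clauses)

def minimal_cuts(clauses=CLAUSES):
    """Enumerate fault sets smallest-first, keeping only inclusion-minimal ones."""
    variables = sorted(set().union(*clauses))
    cuts = []
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            candidate = set(combo)
            if any(cut <= candidate for cut in cuts):
                continue  # a smaller cut already covers this candidate
            if breaks_all_supports(candidate, clauses):
                cuts.append(candidate)
    return cuts

print(minimal_cuts())
```

Under this encoding there are two minimal cuts: losing both broadcasts, or losing both replicas.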
Lineage-driven fault injection
[Diagram: lineage graph with nodes “The write is stable”, “Stored on RepA”, “Stored on RepB”, Bcast1, Bcast2, and Client]
Hypothesis: {Bcast1, Bcast2}
Lineage-driven fault injection
[Diagram: lineage graph with nodes “The write is stable”, “Stored on RepA”, “Stored on RepB”, Bcast1, Bcast2, Bcast3, and Client]
(RepA OR Bcast1)
AND (RepA OR Bcast2)
AND (RepB OR Bcast2)
AND (RepB OR Bcast1)
AND (RepA OR Bcast3)
AND (RepB OR Bcast3)
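The effect of the extra broadcast on the earlier hypothesis can be checked directly. A small sketch, assuming the same set-per-clause encoding of the formula; the function name `satisfies` and the variable names are hypothetical:

```python
def satisfies(faults, clauses):
    """True if the fault set intersects every clause of the CNF."""
    return all(faults & clause for clause in clauses)

original = [
    {"RepA", "Bcast1"}, {"RepA", "Bcast2"},
    {"RepB", "Bcast2"}, {"RepB", "Bcast1"},
]
# Clauses learned from the run in which Bcast3 appeared.
extended = original + [{"RepA", "Bcast3"}, {"RepB", "Bcast3"}]

hypothesis = {"Bcast1", "Bcast2"}
print(satisfies(hypothesis, original))               # True
print(satisfies(hypothesis, extended))               # False
print(satisfies(hypothesis | {"Bcast3"}, extended))  # True
```

The old hypothesis no longer satisfies the formula, so it is ruled out without further experiments. Any remaining candidate has to account for Bcast3 as well.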
Search space reduction
Each experiment finds a bug, OR reduces the search space
Lineage-driven Fault Injection
Recipe:
1. Start with a successful outcome. Work backwards.
2. Ask why it happened: Lineage
3. Convert lineage to a boolean formula and solve
4. Lather, rinse, repeat
[Diagram: cycle through 1. Success, 2. Lineage, 3. CNF, and Fail, with steps labeled Why?, Encode, and Solve; 4. REPEAT]
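The recipe can be sketched as a loop. This is a toy under loud assumptions: `run_experiment` is an invented stand-in for a real system plus fault injector and tracer (two replicas, two broadcasts, and an assumed retry `Bcast3` when a broadcast is lost), and solving is done by enumeration rather than a real solver. None of these names come from FIT, Zipkin, or the papers.

```python
from itertools import combinations

REPLICAS = frozenset({"RepA", "RepB"})

def run_experiment(faults):
    """Hypothetical system under test.

    Returns the supports (replica, broadcast pairs) observed in the run;
    an empty set means the good outcome did not happen.
    """
    attempted = {"Bcast1", "Bcast2"}
    if attempted & faults:
        attempted.add("Bcast3")  # assumed retry when a broadcast is lost
    delivered = attempted - faults
    live = REPLICAS - faults
    return {frozenset({rep, msg}) for rep in live for msg in delivered}

def minimal_hitting_sets(supports):
    """Steps 2-3: treat each support as a clause and solve by enumeration."""
    universe = sorted(set().union(*supports))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            if any(prev <= candidate for prev in found):
                continue  # not minimal: a smaller solution is contained in it
            if all(candidate & s for s in supports):
                found.append(candidate)
    return found

def ldfi(max_rounds=10):
    supports = run_experiment(frozenset())  # step 1: a successful run
    tried = []
    for _ in range(max_rounds):
        untried = [h for h in minimal_hitting_sets(supports) if h not in tried]
        if not untried:
            return None, tried  # no hypothesis left to test
        hypothesis = untried[0]
        tried.append(hypothesis)
        observed = run_experiment(hypothesis)  # inject the faults and replay
        if not observed:
            return hypothesis, tried  # the good outcome was lost
        supports |= observed  # step 4: new lineage, repeat
    return None, tried

failure, history = ldfi()
print("fault set that defeats the outcome:", sorted(failure))
print("hypotheses tried:", [sorted(h) for h in history])
```

In this toy the first hypothesis, {Bcast1, Bcast2}, is survived thanks to the retry. The second, {RepA, RepB}, defeats the outcome. Whether that counts as a bug depends on which faults the system claims to tolerate.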
Minimal requirements
1. Fault injection infrastructure
2. Mechanism for collecting lineage
3. Ability to replay interactions
Lineage
Request Tracing
Request Tracing
Alternate Execution
Redundancy through History
Case study: “Netflix AppBoot”
Services: ~100
Search space (executions): 2^100 (about 1,000,000,000,000,000,000,000,000,000,000)
Experiments performed: 200
Critical bugs found: 11
Fairy tale
Growing Research
Don’t:
“Throw it over the wall”
Do:
Deep embeddings
Trading shoes
Growing Research
Work with us
Search prioritization
Input generation
Richer lineage collection
Search prioritization: minimal hitting sets
(A ∨ B ∨ C) ∧ (C ∨ D ∨ E ∨ F) ∧ (D ∨ E ∨ F ∨ G) ∧ (H ∨ I)
(A, B, C), (C, D, E, F), (D, E, F, G), (H, I)
⇩
e.g. (C, E, H) ✔
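The example on this slide can be reproduced by enumeration. A minimal sketch, assuming each clause is treated as a set and that a hitting set is minimal when no element can be dropped from it; the function names are ours, and brute force is only reasonable at this toy size:

```python
from itertools import combinations

# The four clauses from the slide, one set of letters per clause.
SUPPORTS = [set("ABC"), set("CDEF"), set("DEFG"), set("HI")]

def is_hitting_set(candidate, supports):
    """True if the candidate shares at least one element with every support."""
    return all(candidate & s for s in supports)

def is_minimal(candidate, supports):
    """A hitting set from which no single element can be dropped."""
    return is_hitting_set(candidate, supports) and not any(
        is_hitting_set(candidate - {x}, supports) for x in candidate
    )

def minimal_hitting_sets(supports):
    """All minimal hitting sets, smallest first."""
    universe = sorted(set().union(*supports))
    return [
        set(combo)
        for size in range(1, len(universe) + 1)
        for combo in combinations(universe, size)
        if is_minimal(set(combo), supports)
    ]

candidates = minimal_hitting_sets(SUPPORTS)
print(len(candidates), "minimal hitting sets, smallest first")
print({"C", "E", "H"} in candidates)
```

(C, E, H) from the slide is among the results. Ordering candidates smallest-first is one plausible prioritization, since smaller fault combinations need fewer simultaneous failures.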
Measuring fault tolerance by counting alternatives
Most likely combination of faults
Input generation
Using lightweight modeling to understand Chord
Pamela Zave
The importance of being inputs
Richer lineage collection
Where we are
A fault injection framework (e.g. FIT)
Call graph tracing (e.g. Zipkin)
Where we’re headed
A fault injection framework (e.g. FIT)
Lineage-driven fault injection
Call graph tracing (e.g. Zipkin)
Thanks to our hosts, benefactors and collaborators!
References
● ‘Automating Failure Testing at Internet Scale’ [ACM SoCC’16]
https://people.ucsc.edu/~palvaro/fit-ldfi.pdf
● ‘Lineage Driven Fault Injection’ [ACM SIGMOD’15]
http://people.ucsc.edu/~palvaro/molly.pdf
● Netflix Tech Blog on ‘Automated Failure Testing’
http://techblog.netflix.com/2016/01/automated-failure-testing.html
FOLD
The profound solipsism of the microservice
UGLY
GOOD RAW
True Silicon Valley Stories
1. Crazy legwork
2. The “what the hell does our site do” project
3. Offsite => online
Replay
Bins and Balls
[Diagram: Request; Class 1, Class 2, Class 3, [...], Class n; r’ and r; Class n]
Predicting Request Graphs
Some function f: Requests → Classes
f(Request) = Class n
The profound solipsism of the microservice

Orchestrated Chaos: Applying Failure Testing Research at Scale.