SlideShare uma empresa Scribd logo
1 de 115
Baixar para ler offline
Without Resilience
Nothing Else Matters
Jonas Bonér
CTO Lightbend
@jboner
This Is Fault Tolerance
“But it ain’t how hard you’re hit;
it’s about how hard you can get
hit, and keep moving forward.
How much you can take, and
keep moving forward. That’s
how winning is done.”
- Rocky Balboa
Resilience
Is Beyond
Fault
Tolerance
Resilience
“The ability of a substance or
object to spring back into shape.
The capacity to recover quickly
from difficulties.”
-Merriam Webster
Antifragility
“Antifragility is beyond resilience and
robustness. The resilient resists shock and
stays the same; the antifragile gets better.”
- Nassem Nicholas Taleb
Antifragile: Things That Gain from Disorder - Nassim Nicholas Taleb
“We can model and understand in isolation. 

But, when released into competitive nominally
regulated societies, their connections proliferate, 

their interactions and interdependencies multiply, 

their complexities mushroom. 

And we are caught short.”
- Sidney Dekker
Drift into Failure - Sidney Dekker
Software Systems Today Are
Incredibly Complex
Netflix Twitter
We need to study
Resilience in
Complex
Systems
Complicated System
Complex System
Complicated ≠Complex
“Complex systems run in degraded mode.”
“Complex systems run as broken systems.”
- richard Cook
How Complex Systems Fail - Richard Cook
“Counterintuitive. That’s [Jay] Forrester’s
word to describe complex systems.
Leverage points are not intuitive. Or if they
are, we intuitively use them backward,
systematically worsening whatever
problems we are trying to solve.”
- Donella Meadows
Leverage Points: Places to Intervene in a System - Donella Meadows
“Humans should not be
involved in setting timeouts.”
“Human involvement in
complex systems is the biggest
source of trouble.”
- Ben Christensen, Netflix
Humans Generally
Make Things Worse
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Operating Point
FAILURE
Accident
Boundary
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Management Pressure
Towards Economic Efficiency
Gradient Towards
Least Effort
Counter Gradient
For More Resilience
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Accident
Boundary
Marginal
Boundary
?
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
Embrace
Failure
Resilience
is by
Design
Photo courtesy of FEMA/Joselyne Augustino
“Autonomy makes information local,
leading to greater certainty and stability.”
- Mark Burgess
In Search of Certainty - Mark Burgess
Promise Theory
Think in
Promises
Not
Commands
Promise Theory
Promises converge towards
A definite outcome from
unpredictable beginnings
improved Stability
Commands diverge into
unpredictable outcomes from
definite beginnings
decreased Stability
Resilience in
Biological
Systems
MeerkatsPuppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
“In three words, in the animal kingdom,
simplicity leads to complexity 

which leads to resilience.”
- Nicolas Perony
Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
Resilience in
Social
Systems
Dealing in Security
Understanding vital services, and how they keep you safe
1INDIVIDUAL (you)
6ways to die
3sets of essential services
7layers of PROTECTION
Dealing in Security - Mike Bennet, Vinay Gupta
What we can learn from
Resilience in
Biological and
Social Systems
1. Feature Diversity and redundancy
2. Inter-Connected network structure
3. Wide distribution across all scales
4. Capacity to self-adapt & self-organize
Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros
Applying resilience thinking: Seven principles for building resilience in social-ecological systems - Reinette Biggs et. al.
Resilience in
Computer
Systems
We Need To
Manage
Failure
Not Try To Avoid It
Let It
Crash
Crash
Only
Software
Crash-Only Software - George Candea, Armando Fox
Stop = Crash Safely
Start = Recover Fast
Recursive Restartability
Turning the Crash-Only Sledgehammer into a Scalpel
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Synchronous dispatch Thread boundary
?
Utterly broken
“Accidents come from relationships 
not broken parts.”
- Sidney dekker
Drift into Failure - Sidney Dekker
Requirements for a
Sane Failure Model
1. Contained—Avoid cascading failures
2. Reified—as messages
3. Signalled—Asynchronously
4. Observed—by 1-N
5. Managed—Outside failed Context
Failures need to be
Bulkhead
Pattern
Enter Supervision
Out of the Tar Pit - Ben Moseley , Peter Marks
• Input Data
• Derived Data
Critical
We need a way out of the
State Tar Pit
Essential
State
Out of the Tar Pit - Ben Moseley , Peter Marks
Essential
Logic
Accidental
State and
Control
We need a way out of the
State Tar Pit
The
Vending
Machine
Pattern
Think Vending Machine
Coffee
Machine
Programmer
Inserts coins
Gets coffee
Add more coins
Think Vending Machine
Programmer
Service
Guy
Inserts coins
Gets coffee
Out of
coffee beans
failure
Adds
more
beans
Out of coffee beans error
WRONG
Coffee
Machine
Think Vending Machine
ServiceClient
Supervisor
Request
Response
Validation Error
Application
Failure
Manages
Failure
Error
Kernel
Pattern
Onion-layered state & Failure management
Making reliable distributed systems in the presence of software errors - Joe Armstrong
On Erlang, State and Crashes - Jesper Louis Andersen
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Supervision
Demo
Time
Let’s model a resilient vending machine, in Akka
Demo Runner
object VendingMachineDemo extends App {



val system = ActorSystem("vendingMachineDemo")

val coffeeMachine = system.actorOf(Props[CoffeeMachineManager], "coffeeMachineManager")

val customer = Inbox.create(system) // emulates the customer

… // test runs


system.shutdown()

}
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Test Happy Path
// Insert 2 coins and get an Espresso

customer.send(coffeeMachine, Coins(2))

customer.send(coffeeMachine, Selection(Espresso))

val Beverage(coffee1) = customer.receive(5.seconds)

println(s"Got myself an $coffee1")

assert(coffee1 == Espresso)
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Test User Error
customer.send(coffeeMachine, Coins(1))

customer.send(coffeeMachine, Selection(Latte))

val NotEnoughCoinsError(message) = customer.receive(5.seconds)

println(s"Got myself a validation error: $message")

assert(message == "Please insert [1] coins")
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Test System Failure
// Insert 1 coin (had 1 before) and try to get my Latte

// Machine should:

// 1. Fail

// 2. Restart

// 3. Resubmit my order

// 4. Give me my coffee

customer.send(coffeeMachine, Coins(1))

customer.send(coffeeMachine, TriggerOutOfCoffeeBeansFailure)

customer.send(coffeeMachine, Selection(Latte))

val Beverage(coffee2) = customer.receive(5.seconds)

println(s"Got myself a $coffee2")

assert(coffee2 == Latte)

https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Protocol
// Coffee types

trait CoffeeType

case object BlackCoffee extends CoffeeType

case object Latte extends CoffeeType

case object Espresso extends CoffeeType



// Commands

case class Coins(number: Int)

case class Selection(coffee: CoffeeType)

case object TriggerOutOfCoffeeBeansFailure



// Events

case class CoinsReceived(number: Int)



// Replies

case class Beverage(coffee: CoffeeType)



// Errors

case class NotEnoughCoinsError(message: String)



// Failures

case class OutOfCoffeeBeansFailure(customer: ActorRef,

pendingOrder: Selection,

nrOfInsertedCoins: Int) extends Exception
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
CoffeeMachine
class CoffeeMachine extends Actor {

val price = 2

var nrOfInsertedCoins = 0

var outOfCoffeeBeans = false

var totalNrOfCoins = 0



def receive = { … }



override def postRestart(failure: Throwable): Unit = { … }
}
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
CoffeeMachine
def receive = {

case Coins(nr) =>

nrOfInsertedCoins += nr

totalNrOfCoins += nr

println(s"Inserted [$nr] coins")

println(s"Total number of coins in machine is [$totalNrOfCoins]")



case selection @ Selection(coffeeType) =>

if (nrOfInsertedCoins < price)

sender.tell(NotEnoughCoinsError(
s”Insert [${price - nrOfInsertedCoins}] coins"), self)

else {

if (outOfCoffeeBeans)

throw new OutOfCoffeeBeansFailure(sender, selection, nrOfInsertedCoins)

println(s"Brewing your $coffeeType")

sender.tell(Beverage(coffeeType), self)

nrOfInsertedCoins = 0

}



case TriggerOutOfCoffeeBeansFailure =>

outOfCoffeeBeans = true

}
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
CoffeeMachine
override def postRestart(failure: Throwable): Unit = {

println(s"Restarting coffee machine...")

failure match {

case OutOfCoffeeBeansFailure(customer, pendingOrder, coins) =>

nrOfInsertedCoins = coins

outOfCoffeeBeans = false

println(s"Resubmitting pending order $pendingOrder")

context.self.tell(pendingOrder, customer)

}

}
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
Supervisor
class CoffeeMachineManager extends Actor {

override val supervisorStrategy =

OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {

case e: OutOfCoffeeBeansFailure =>

println(s"ServiceGuy notified: $e")

Restart

case _: Exception =>

Escalate

}



// to simplify things he is only managing 1 single machine

val machine = context.actorOf(
Props[CoffeeMachine], name = "coffeeMachine")



def receive = {

case request => machine.forward(request)

}

}
https://gist.github.com/jboner/d24c0eb91417a5ec10a6
So.........
Sorry...but Not really.
Are We Done?
We can not
keep putting
all eggs in the
same basket
We need to
Maintain Diversity
and Redundancy
The Network
is Reliable
NOT
Really
Here, We are living in the
Looming
Shadow of
Impossibility
Theorems
CAP: Consistency is impossible
FLP: Consensus is impossible
Towards Resilient Distributed Systems
Isolation
• Autonomous Microservices
• Resilient Protocols
• Virtualization
Data Resilience
• Eventual & Causal Consistency
• Event Logging
• Flow Control / Feedback Control
Self-healing
• Decentralized Architectures
• Gossip Protocols
• Failure Detection
Embrace the Network
•Asynchronicity
•Location Transparency
Microservices
1. Autonomy
2. Isolation
3. Mobility
4. Single Responsibility
5. Exclusive StatE
An autonomous Service
can only promise
its own behavior
Apply Promise Theory
We need to decompose the system using
Consistency Boundaries
Inside Data
Our current present—state
Outside Data
Blast from the past—facts
Between Services
Hope for the future—commands
Data on the inside vs Data on the outside - Pat Helland
WITHIN the Consistency Boundary
we can have STRONG CONSISTENCY
BETWEEN
Consistency
Boundaries
it is a
ZOO
We need Systems that are Decoupled in
Time and Space
Embrace the Network
•Go Asynchronous
•Make distribution first class
•Learn from the mistakes of RPC, EJB & CORBA
•Leverage Location Transparency
•Actor Model does it right
Location Transparency
One communication abstraction
across all dimensions of scale
Core Socket CPU
Container Server Rack
Data Center GLobal
Resilient Protocols
are tolerant to
•Message loss
•Message reordering
•Message duplication
Embrace ACID 2.0
• Associative
• Commutative
• Idempotent
• Distributed
Depend on
• Asynchronous Communication
• Eventual Consistency
“To make a system of interconnected components
crash-only, it must be designed so that components
can tolerate the crashes and temporary unavailability
of their peers. This means we require: [1] strong
modularity with relatively impermeable component
boundaries, [2] timeout-based communication and
lease-based resource allocation, and [3] self-
describing requests that carry a time-to-live and
information on whether they are idempotent.”
- George Candea, Armando Fox
Crash-Only Software - George Candea, Armando Fox
"Software components should be designed such
that they can deny service for any request or call.
Then, if an underlying component can say No,
apps must be designed to take No for an answer
and decide how to proceed: give up, wait and
retry, reduce fidelity, etc.”
- George Candea, Armando Fox
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
Services need to learn to accept
NO for an answer
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Member
Node
Decentralized
Epidemic Gossip Protocols
Gossip Of membership, Data & Meta DataFailure detection heartbeat
STRONG
Consistency
Is the wrong default
“Two-phase commit is the
anti-availability protocol.”
- Pat Helland
Standing on Distributed Shoulders of Giants - Pat Helland
Eventual
Consistency
We have to rely on
But relax, it’s how the world works
Transactions
But I really need
“In general, application developers
simply do not implement large
scalable applications assuming
distributed transactions.”
- Pat Helland
Life Beyond Distributed Transactions - Pat Helland
Guess.
Apologize.
Compensate.
Use a protocol of
“The truth is the log. The database is a
cache of a subset of the log.”
- Pat Helland
Immutability Changes Everything - Pat Helland
CRUD is DEAD
Event Logging
• Work with Facts—immutable values
• Event Sourcing
• DB of Facts—Keep all history
• Just replay on failure
• Free Auditing, Debugging, Replication
• Single Writer PRinciple
• Avoids OO-Relational impedence mismatch
• CQRS—Separate the Read & Write Model
Let’s model a resilient & Event Logged vending machine, in Akka
Demo
Time
Event Logged CoffeeMachine
// Events

case class CoinsReceived(number: Int)

class CoffeeMachine extends PersistentActor {

val price = 2

var nrOfInsertedCoins = 0

var outOfCoffeeBeans = false

var totalNrOfCoins = 0



override def persistenceId = "CoffeeMachine"



override def receiveCommand: Receive = {

case Coins(nr) =>

nrOfInsertedCoins += nr

println(s"Inserted [$nr] coins")

persist(CoinsReceived(nr)) { evt =>

totalNrOfCoins += nr

println(s"Total number of coins in machine is [$totalNrOfCoins]")

}
…
}
override def receiveRecover: Receive = {

case CoinsReceived(coins) =>

totalNrOfCoins += coins

println(s"Total number of coins in machine is [$totalNrOfCoins]")

}

}
https://gist.github.com/jboner/1db37eeee3ed3c9422e4
“An escalator can never break: it can only
become stairs. You should never see an
Escalator Temporarily Out Of Order sign, just
Escalator Temporarily Stairs. Sorry for the
convenience.”
- Mitch Hedberg
Graceful
Degradation
Circuit Breaker
Little’s Law
L = λW
Queue Length = Arrival Rate * Response Time
W = L/λ
Response Time = Queue Length / Arrival Rate
W: Response Time
L: Queue Length
Flow Control
Always Apply BackPressure
Feedback
Control
“Continuously compare the actual
output to its desired reference value;
then apply a change to the system
inputs that counteracts any deviation of
the actual output from the reference.”
- Philipp K. Janert
Feedback Control for Computer Systems - Philipp K. Janet
The Feedback Principle
Feedback Control
Influencing a
Complex System
Places to Intervene
in a Complex System
1. The constants, parameters or numbers
2. The sizes of buffers relative to their flows
3. The structure of material stocks and flows
4. The lengths of delays, relative to the rate of system change
5. The strength of negative feedback loops
6. The gain around driving positive feedback loops
7. The structure of information flows
8. The rules of the system
9. The power to add, change, evolve, or self-organize structure
10. The goals of the system
11. The mindset or paradigm out of which the system arises
12. The power to transcend paradigms
Leverage Points: Places to Intervene in a System - Donella Meadows:
Triple Loop Learning
Loop 1: Follow the rules
Loop 2: Change the rules
Loop 3: Learn how to learn
Triple Loop Learning - Chris Argyris
Testing
What can we learn from Arnold?
Blow things up
Shoot
Your App
Down
Pull the Plug
…and see what happens
Executive
Summary
“Complex systems run as broken systems.”
- richard Cook
How Complex Systems Fail - Richard Cook
Resilience
is by
Design
Photo courtesy of FEMA/Joselyne Augustino
Without Resilience
Nothing Else Matters
References
Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY

How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf

Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/
Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/

Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60

Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/
nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory

How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm

Towards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-Architectures-1-Biology-Lessons/

Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf

What is resilience? An introduction to social-ecological research - http://www.stockholmresilience.org/download/18.10119fc11455d3c557d6d21/1398172490555/
SU_SRC_whatisresilience_sidaApril2014.pdf
Applying resilience thinking: Seven principles for building resilience in social-ecological systems - http://www.stockholmresilience.org/download/
18.10119fc11455d3c557d6928/1398150799790/SRC+Applying+Resilience+final.pdf

Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf

Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf

Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928

Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html

Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf

On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html

Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html

Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it

Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692

The Network in Reliable - http://queue.acm.org/detail.cfm?id=2655736

Data on the Outside vs Data on the Inside - https://msdn.microsoft.com/en-us/library/ms954587.aspx

Life Beyond Distributed Transactions - http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf

Immutability Changes Everything - http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

Standing on Distributed Shoulders of Giants - https://queue.acm.org/detail.cfm?id=2953944

Thinking in Promises - http://shop.oreilly.com/product/0636920036289.do

In Search Of Certainty - http://shop.oreilly.com/product/0636920038542.do

Reactive Microservices Architecture - http://www.oreilly.com/programming/free/reactive-microservices-architecture-orm.csp
Reactive Streams - http://reactive-streams.org

Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/d24c0eb91417a5ec10a6

Persistent Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/1db37eeee3ed3c9422e4
Thank
You
Without Resilience
Nothing Else Matters
Jonas Bonér
CTO Lightbend
@jboner

Mais conteúdo relacionado

Semelhante a Without Resilience, Nothing Else Matters

Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...
Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...
Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...Legacy Typesafe (now Lightbend)
 
Software Availability by Resiliency
Software Availability by ResiliencySoftware Availability by Resiliency
Software Availability by ResiliencyReza Samei
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesJonathan Creasy
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyWilliam Yang
 
Defence Data Science - BlueHat Seattle, 2019
Defence Data Science - BlueHat Seattle, 2019Defence Data Science - BlueHat Seattle, 2019
Defence Data Science - BlueHat Seattle, 2019Jon Hawes
 
Green Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard
 
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...Sociotechnical Roundtable
 
AllDayDevOps 2020 Aaron Rinehart Security Differently
AllDayDevOps 2020 Aaron Rinehart Security DifferentlyAllDayDevOps 2020 Aaron Rinehart Security Differently
AllDayDevOps 2020 Aaron Rinehart Security DifferentlyAaron Rinehart
 
Reactive applications and Akka intro used in the Madrid Scala Meetup
Reactive applications and Akka intro used in the Madrid Scala MeetupReactive applications and Akka intro used in the Madrid Scala Meetup
Reactive applications and Akka intro used in the Madrid Scala MeetupMiguel Pastor
 
2016-10-21 CAS Annual Meeting Slides
2016-10-21 CAS Annual Meeting Slides2016-10-21 CAS Annual Meeting Slides
2016-10-21 CAS Annual Meeting SlidesMichael Solomon
 
Resilience Engineering & Human Error... in IT
Resilience Engineering & Human Error... in ITResilience Engineering & Human Error... in IT
Resilience Engineering & Human Error... in ITJoão Miranda
 
Security Differently - DevSecOps Days Austin 2019
Security Differently - DevSecOps Days Austin 2019Security Differently - DevSecOps Days Austin 2019
Security Differently - DevSecOps Days Austin 2019Aaron Rinehart
 
Performance Testing in Production - Leveraging the Universal Scalability Law
Performance Testing in Production - Leveraging the Universal Scalability LawPerformance Testing in Production - Leveraging the Universal Scalability Law
Performance Testing in Production - Leveraging the Universal Scalability LawKevin Brockhoff
 
How to be Wrong (or How to be Successful at Being Wrong)
How to be Wrong (or How to be Successful at Being Wrong)How to be Wrong (or How to be Successful at Being Wrong)
How to be Wrong (or How to be Successful at Being Wrong)Russell Miles
 
Society for Risk Analysis On Transnational Risk & Terrorism
Society for Risk Analysis  On Transnational Risk & TerrorismSociety for Risk Analysis  On Transnational Risk & Terrorism
Society for Risk Analysis On Transnational Risk & TerrorismJohn Marke
 
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...juliekannai
 
Week 5An Introduction to Systems AnalysisComplex Systems.docx
Week 5An Introduction to Systems AnalysisComplex Systems.docxWeek 5An Introduction to Systems AnalysisComplex Systems.docx
Week 5An Introduction to Systems AnalysisComplex Systems.docxmelbruce90096
 
Using Hackers’ Own Methods and Tools to Defeat Persistent Adversaries
Using Hackers’ Own Methods and Tools to Defeat Persistent AdversariesUsing Hackers’ Own Methods and Tools to Defeat Persistent Adversaries
Using Hackers’ Own Methods and Tools to Defeat Persistent AdversariesEC-Council
 
Resiliency in Systems Engineering
Resiliency in Systems EngineeringResiliency in Systems Engineering
Resiliency in Systems EngineeringCaltech
 

Semelhante a Without Resilience, Nothing Else Matters (20)

Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...
Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...
Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...
 
Software Availability by Resiliency
Software Availability by ResiliencySoftware Availability by Resiliency
Software Availability by Resiliency
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeries
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A Study
 
Defence Data Science - BlueHat Seattle, 2019
Defence Data Science - BlueHat Seattle, 2019Defence Data Science - BlueHat Seattle, 2019
Defence Data Science - BlueHat Seattle, 2019
 
Green Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos Engineering
 
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...
Complex Systems Failure: Designing Reliability Teams - 2012 STS Roundtable Pr...
 
AllDayDevOps 2020 Aaron Rinehart Security Differently
AllDayDevOps 2020 Aaron Rinehart Security DifferentlyAllDayDevOps 2020 Aaron Rinehart Security Differently
AllDayDevOps 2020 Aaron Rinehart Security Differently
 
Reactive applications and Akka intro used in the Madrid Scala Meetup
Reactive applications and Akka intro used in the Madrid Scala MeetupReactive applications and Akka intro used in the Madrid Scala Meetup
Reactive applications and Akka intro used in the Madrid Scala Meetup
 
2016-10-21 CAS Annual Meeting Slides
2016-10-21 CAS Annual Meeting Slides2016-10-21 CAS Annual Meeting Slides
2016-10-21 CAS Annual Meeting Slides
 
Chaos engineering
Chaos engineering Chaos engineering
Chaos engineering
 
Resilience Engineering & Human Error... in IT
Resilience Engineering & Human Error... in ITResilience Engineering & Human Error... in IT
Resilience Engineering & Human Error... in IT
 
Security Differently - DevSecOps Days Austin 2019
Security Differently - DevSecOps Days Austin 2019Security Differently - DevSecOps Days Austin 2019
Security Differently - DevSecOps Days Austin 2019
 
Performance Testing in Production - Leveraging the Universal Scalability Law
Performance Testing in Production - Leveraging the Universal Scalability LawPerformance Testing in Production - Leveraging the Universal Scalability Law
Performance Testing in Production - Leveraging the Universal Scalability Law
 
How to be Wrong (or How to be Successful at Being Wrong)
How to be Wrong (or How to be Successful at Being Wrong)How to be Wrong (or How to be Successful at Being Wrong)
How to be Wrong (or How to be Successful at Being Wrong)
 
Society for Risk Analysis On Transnational Risk & Terrorism
Society for Risk Analysis  On Transnational Risk & TerrorismSociety for Risk Analysis  On Transnational Risk & Terrorism
Society for Risk Analysis On Transnational Risk & Terrorism
 
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...
Preparing for a Black Swan: Planning and Programming for Risk Mitigation in E...
 
Week 5An Introduction to Systems AnalysisComplex Systems.docx
Week 5An Introduction to Systems AnalysisComplex Systems.docxWeek 5An Introduction to Systems AnalysisComplex Systems.docx
Week 5An Introduction to Systems AnalysisComplex Systems.docx
 
Using Hackers’ Own Methods and Tools to Defeat Persistent Adversaries
Using Hackers’ Own Methods and Tools to Defeat Persistent AdversariesUsing Hackers’ Own Methods and Tools to Defeat Persistent Adversaries
Using Hackers’ Own Methods and Tools to Defeat Persistent Adversaries
 
Resiliency in Systems Engineering
Resiliency in Systems EngineeringResiliency in Systems Engineering
Resiliency in Systems Engineering
 

Mais de Jonas Bonér

We are drowning in complexity—can we do better?
We are drowning in complexity—can we do better?We are drowning in complexity—can we do better?
We are drowning in complexity—can we do better?Jonas Bonér
 
Kalix: Tackling the The Cloud to Edge Continuum
Kalix: Tackling the The Cloud to Edge ContinuumKalix: Tackling the The Cloud to Edge Continuum
Kalix: Tackling the The Cloud to Edge ContinuumJonas Bonér
 
The Reactive Principles: Design Principles For Cloud Native Applications
The Reactive Principles: Design Principles For Cloud Native ApplicationsThe Reactive Principles: Design Principles For Cloud Native Applications
The Reactive Principles: Design Principles For Cloud Native ApplicationsJonas Bonér
 
Cloudstate—Towards Stateful Serverless
Cloudstate—Towards Stateful ServerlessCloudstate—Towards Stateful Serverless
Cloudstate—Towards Stateful ServerlessJonas Bonér
 
How Events Are Reshaping Modern Systems
How Events Are Reshaping Modern SystemsHow Events Are Reshaping Modern Systems
How Events Are Reshaping Modern SystemsJonas Bonér
 
Reactive Microsystems: The Evolution of Microservices at Scale
Reactive Microsystems: The Evolution of Microservices at ScaleReactive Microsystems: The Evolution of Microservices at Scale
Reactive Microsystems: The Evolution of Microservices at ScaleJonas Bonér
 
From Microliths To Microsystems
From Microliths To MicrosystemsFrom Microliths To Microsystems
From Microliths To MicrosystemsJonas Bonér
 
Life Beyond the Illusion of Present
Life Beyond the Illusion of PresentLife Beyond the Illusion of Present
Life Beyond the Illusion of PresentJonas Bonér
 
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsGo Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsJonas Bonér
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing DemandJonas Bonér
 
Building Reactive Systems with Akka (in Java 8 or Scala)
Building Reactive Systems with Akka (in Java 8 or Scala)Building Reactive Systems with Akka (in Java 8 or Scala)
Building Reactive Systems with Akka (in Java 8 or Scala)Jonas Bonér
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsJonas Bonér
 
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons LearnedBuilding Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons LearnedJonas Bonér
 
Event Driven-Architecture from a Scalability perspective
Event Driven-Architecture from a Scalability perspectiveEvent Driven-Architecture from a Scalability perspective
Event Driven-Architecture from a Scalability perspectiveJonas Bonér
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...Jonas Bonér
 
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMState: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMJonas Bonér
 
Pragmatic Real-World Scala (short version)
Pragmatic Real-World Scala (short version)Pragmatic Real-World Scala (short version)
Pragmatic Real-World Scala (short version)Jonas Bonér
 

Mais de Jonas Bonér (19)

We are drowning in complexity—can we do better?
We are drowning in complexity—can we do better?We are drowning in complexity—can we do better?
We are drowning in complexity—can we do better?
 
Kalix: Tackling the The Cloud to Edge Continuum
Kalix: Tackling the The Cloud to Edge ContinuumKalix: Tackling the The Cloud to Edge Continuum
Kalix: Tackling the The Cloud to Edge Continuum
 
The Reactive Principles: Design Principles For Cloud Native Applications
The Reactive Principles: Design Principles For Cloud Native ApplicationsThe Reactive Principles: Design Principles For Cloud Native Applications
The Reactive Principles: Design Principles For Cloud Native Applications
 
Cloudstate—Towards Stateful Serverless
Cloudstate—Towards Stateful ServerlessCloudstate—Towards Stateful Serverless
Cloudstate—Towards Stateful Serverless
 
How Events Are Reshaping Modern Systems
How Events Are Reshaping Modern SystemsHow Events Are Reshaping Modern Systems
How Events Are Reshaping Modern Systems
 
Reactive Microsystems: The Evolution of Microservices at Scale
Reactive Microsystems: The Evolution of Microservices at ScaleReactive Microsystems: The Evolution of Microservices at Scale
Reactive Microsystems: The Evolution of Microservices at Scale
 
From Microliths To Microsystems
From Microliths To MicrosystemsFrom Microliths To Microsystems
From Microliths To Microsystems
 
Life Beyond the Illusion of Present
Life Beyond the Illusion of PresentLife Beyond the Illusion of Present
Life Beyond the Illusion of Present
 
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsGo Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing Demand
 
Building Reactive Systems with Akka (in Java 8 or Scala)
Building Reactive Systems with Akka (in Java 8 or Scala)Building Reactive Systems with Akka (in Java 8 or Scala)
Building Reactive Systems with Akka (in Java 8 or Scala)
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
Introducing Akka
Introducing AkkaIntroducing Akka
Introducing Akka
 
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons LearnedBuilding Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
 
Event Driven-Architecture from a Scalability perspective
Event Driven-Architecture from a Scalability perspectiveEvent Driven-Architecture from a Scalability perspective
Event Driven-Architecture from a Scalability perspective
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
 
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMState: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
 
Pragmatic Real-World Scala (short version)
Pragmatic Real-World Scala (short version)Pragmatic Real-World Scala (short version)
Pragmatic Real-World Scala (short version)
 

Último

Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Amil baba
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxLina Kadam
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackinghadarpinhas1
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfalokitpathak01
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfModule-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfManish Kumar
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 

Último (20)

Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptx
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and tracking
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdf
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfModule-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
ASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductosASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductos
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 

Without Resilience, Nothing Else Matters

  • 1. Without Resilience Nothing Else Matters Jonas Bonér CTO Lightbend @jboner
  • 2.
  • 3. This Is Fault Tolerance “But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s how winning is done.” - Rocky Balboa
  • 5. Resilience “The ability of a substance or object to spring back into shape. The capacity to recover quickly from difficulties.” -Merriam Webster
  • 6. Antifragility “Antifragility is beyond resilience and robustness. The resilient resists shock and stays the same; the antifragile gets better.” - Nassem Nicholas Taleb Antifragile: Things That Gain from Disorder - Nassim Nicholas Taleb
  • 7. “We can model and understand in isolation. 
 But, when released into competitive nominally regulated societies, their connections proliferate, 
 their interactions and interdependencies multiply, 
 their complexities mushroom. 
 And we are caught short.” - Sidney Dekker Drift into Failure - Sidney Dekker
  • 8. Software Systems Today Are Incredibly Complex Netflix Twitter
  • 9. We need to study Resilience in Complex Systems
  • 13. “Complex systems run in degraded mode.” “Complex systems run as broken systems.” - richard Cook How Complex Systems Fail - Richard Cook
  • 14. “Counterintuitive. That’s [Jay] Forrester’s word to describe complex systems. Leverage points are not intuitive. Or if they are, we intuitively use them backward, systematically worsening whatever problems we are trying to solve.” - Donella Meadows Leverage Points: Places to Intervene in a System - Donella Meadows
  • 15. “Humans should not be involved in setting timeouts.” “Human involvement in complex systems is the biggest source of trouble.” - Ben Christensen, Netflix
  • 17. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Operating Point FAILURE Accident Boundary Operating at the Edge of Failure
  • 18. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Management Pressure Towards Economic Efficiency Gradient Towards Least Effort Counter Gradient For More Resilience ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 19. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Error Margin Marginal Boundary Operating at the Edge of Failure
  • 20. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Accident Boundary Marginal Boundary ? Operating at the Edge of Failure
  • 21. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure Accident Boundary Marginal Boundary
  • 23. Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino
  • 24. “Autonomy makes information local, leading to greater certainty and stability.” - Mark Burgess In Search of Certainty - Mark Burgess
  • 26. Promise Theory Promises converge towards A definite outcome from unpredictable beginnings improved Stability Commands diverge into unpredictable outcomes from definite beginnings decreased Stability
  • 28. MeerkatsPuppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
  • 29. “In three words, in the animal kingdom, simplicity leads to complexity 
 which leads to resilience.” - Nicolas Perony Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
  • 31. Dealing in Security Understanding vital services, and how they keep you safe 1INDIVIDUAL (you) 6ways to die 3sets of essential services 7layers of PROTECTION Dealing in Security - Mike Bennet, Vinay Gupta
  • 32. What we can learn from Resilience in Biological and Social Systems 1. Feature Diversity and redundancy 2. Inter-Connected network structure 3. Wide distribution across all scales 4. Capacity to self-adapt & self-organize Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros Applying resilience thinking: Seven principles for building resilience in social-ecological systems - Reinette Biggs et. al.
  • 34.
  • 35. We Need To Manage Failure Not Try To Avoid It
  • 37. Crash Only Software Crash-Only Software - George Candea, Armando Fox Stop = Crash Safely Start = Recover Fast
  • 38. Recursive Restartability Turning the Crash-Only Sledgehammer into a Scalpel Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
  • 39. Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary ? Utterly broken
  • 41. Requirements for a Sane Failure Model 1. Contained—Avoid cascading failures 2. Reified—as messages 3. Signalled—Asynchronously 4. Observed—by 1-N 5. Managed—Outside failed Context Failures need to be
  • 44. Out of the Tar Pit - Ben Moseley , Peter Marks • Input Data • Derived Data Critical We need a way out of the State Tar Pit
  • 45. Essential State Out of the Tar Pit - Ben Moseley , Peter Marks Essential Logic Accidental State and Control We need a way out of the State Tar Pit
  • 48. Think Vending Machine Programmer Service Guy Inserts coins Gets coffee Out of coffee beans failure Adds more beans Out of coffee beans error WRONG Coffee Machine
  • 50. Error Kernel Pattern Onion-layered state & Failure management Making reliable distributed systems in the presence of software errors - Joe Armstrong On Erlang, State and Crashes - Jesper Louis Andersen
  • 51. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary Supervision
  • 52. Demo Time Let’s model a resilient vending machine, in Akka
  • 53. Demo Runner object VendingMachineDemo extends App {
 
 val system = ActorSystem("vendingMachineDemo")
 val coffeeMachine = system.actorOf(Props[CoffeeMachineManager], "coffeeMachineManager")
 val customer = Inbox.create(system) // emulates the customer
 … // test runs 
 system.shutdown()
 } https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 54. Test Happy Path // Insert 2 coins and get an Espresso
 customer.send(coffeeMachine, Coins(2))
 customer.send(coffeeMachine, Selection(Espresso))
 val Beverage(coffee1) = customer.receive(5.seconds)
 println(s"Got myself an $coffee1")
 assert(coffee1 == Espresso) https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 55. Test User Error customer.send(coffeeMachine, Coins(1))
 customer.send(coffeeMachine, Selection(Latte))
 val NotEnoughCoinsError(message) = customer.receive(5.seconds)
 println(s"Got myself a validation error: $message")
 assert(message == "Please insert [1] coins") https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 56. Test System Failure // Insert 1 coin (had 1 before) and try to get my Latte
 // Machine should:
 // 1. Fail
 // 2. Restart
 // 3. Resubmit my order
 // 4. Give me my coffee
 customer.send(coffeeMachine, Coins(1))
 customer.send(coffeeMachine, TriggerOutOfCoffeeBeansFailure)
 customer.send(coffeeMachine, Selection(Latte))
 val Beverage(coffee2) = customer.receive(5.seconds)
 println(s"Got myself a $coffee2")
 assert(coffee2 == Latte)
 https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 57. Protocol // Coffee types
 trait CoffeeType
 case object BlackCoffee extends CoffeeType
 case object Latte extends CoffeeType
 case object Espresso extends CoffeeType
 
 // Commands
 case class Coins(number: Int)
 case class Selection(coffee: CoffeeType)
 case object TriggerOutOfCoffeeBeansFailure
 
 // Events
 case class CoinsReceived(number: Int)
 
 // Replies
 case class Beverage(coffee: CoffeeType)
 
 // Errors
 case class NotEnoughCoinsError(message: String)
 
 // Failures
 case class OutOfCoffeeBeansFailure(customer: ActorRef,
 pendingOrder: Selection,
 nrOfInsertedCoins: Int) extends Exception https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 58. CoffeeMachine class CoffeeMachine extends Actor {
 val price = 2
 var nrOfInsertedCoins = 0
 var outOfCoffeeBeans = false
 var totalNrOfCoins = 0
 
 def receive = { … }
 
 override def postRestart(failure: Throwable): Unit = { … } } https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 59. CoffeeMachine def receive = {
 case Coins(nr) =>
 nrOfInsertedCoins += nr
 totalNrOfCoins += nr
 println(s"Inserted [$nr] coins")
 println(s"Total number of coins in machine is [$totalNrOfCoins]")
 
 case selection @ Selection(coffeeType) =>
 if (nrOfInsertedCoins < price)
 sender.tell(NotEnoughCoinsError( s”Insert [${price - nrOfInsertedCoins}] coins"), self)
 else {
 if (outOfCoffeeBeans)
 throw new OutOfCoffeeBeansFailure(sender, selection, nrOfInsertedCoins)
 println(s"Brewing your $coffeeType")
 sender.tell(Beverage(coffeeType), self)
 nrOfInsertedCoins = 0
 }
 
 case TriggerOutOfCoffeeBeansFailure =>
 outOfCoffeeBeans = true
 } https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 60. CoffeeMachine override def postRestart(failure: Throwable): Unit = {
 println(s"Restarting coffee machine...")
 failure match {
 case OutOfCoffeeBeansFailure(customer, pendingOrder, coins) =>
 nrOfInsertedCoins = coins
 outOfCoffeeBeans = false
 println(s"Resubmitting pending order $pendingOrder")
 context.self.tell(pendingOrder, customer)
 }
 } https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 61. Supervisor class CoffeeMachineManager extends Actor {
 override val supervisorStrategy =
 OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
 case e: OutOfCoffeeBeansFailure =>
 println(s"ServiceGuy notified: $e")
 Restart
 case _: Exception =>
 Escalate
 }
 
 // to simplify things he is only managing 1 single machine
 val machine = context.actorOf( Props[CoffeeMachine], name = "coffeeMachine")
 
 def receive = {
 case request => machine.forward(request)
 }
 } https://gist.github.com/jboner/d24c0eb91417a5ec10a6
  • 63. We can not keep putting all eggs in the same basket
  • 64. We need to Maintain Diversity and Redundancy
  • 66. Here, We are living in the Looming Shadow of Impossibility Theorems CAP: Consistency is impossible FLP: Consensus is impossible
  • 67. Towards Resilient Distributed Systems Isolation • Autonomous Microservices • Resilient Protocols • Virtualization Data Resilience • Eventual & Causal Consistency • Event Logging • Flow Control / Feedback Control Self-healing • Decentralized Architectures • Gossip Protocols • Failure Detection Embrace the Network •Asynchronicity •Location Transparency
  • 68. Microservices 1. Autonomy 2. Isolation 3. Mobility 4. Single Responsibility 5. Exclusive StatE
  • 69. An autonomous Service can only promise its own behavior Apply Promise Theory
  • 70. We need to decompose the system using Consistency Boundaries
  • 71. Inside Data Our current present—state Outside Data Blast from the past—facts Between Services Hope for the future—commands Data on the inside vs Data on the outside - Pat Helland
  • 72. WITHIN the Consistency Boundary we can have STRONG CONSISTENCY
  • 74. We need Systems that are Decoupled in Time and Space
  • 75. Embrace the Network •Go Asynchronous •Make distribution first class •Learn from the mistakes of RPC, EJB & CORBA •Leverage Location Transparency •Actor Model does it right
  • 76. Location Transparency One communication abstraction across all dimensions of scale Core Socket CPU Container Server Rack Data Center GLobal
  • 77. Resilient Protocols are tolerant to •Message loss •Message reordering •Message duplication Embrace ACID 2.0 • Associative • Commutative • Idempotent • Distributed Depend on • Asynchronous Communication • Eventual Consistency
  • 78. “To make a system of interconnected components crash-only, it must be designed so that components can tolerate the crashes and temporary unavailability of their peers. This means we require: [1] strong modularity with relatively impermeable component boundaries, [2] timeout-based communication and lease-based resource allocation, and [3] self- describing requests that carry a time-to-live and information on whether they are idempotent.” - George Candea, Armando Fox Crash-Only Software - George Candea, Armando Fox
  • 79. "Software components should be designed such that they can deny service for any request or call. Then, if an underlying component can say No, apps must be designed to take No for an answer and decide how to proceed: give up, wait and retry, reduce fidelity, etc.” - George Candea, Armando Fox Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
  • 80. Services need to learn to accept NO for an answer
  • 83. “Two-phase commit is the anti-availability protocol.” - Pat Helland Standing on Distributed Shoulders of Giants - Pat Helland
  • 84. Eventual Consistency We have to rely on But relax, it’s how the world works
  • 86. “In general, application developers simply do not implement large scalable applications assuming distributed transactions.” - Pat Helland Life Beyond Distributed Transactions - Pat Helland
  • 88. “The truth is the log. The database is a cache of a subset of the log.” - Pat Helland Immutability Changes Everything - Pat Helland
  • 90. Event Logging • Work with Facts—immutable values • Event Sourcing • DB of Facts—Keep all history • Just replay on failure • Free Auditing, Debugging, Replication • Single Writer PRinciple • Avoids OO-Relational impedence mismatch • CQRS—Separate the Read & Write Model
  • 91. Let’s model a resilient & Event Logged vending machine, in Akka Demo Time
  • 92. Event Logged CoffeeMachine // Events
 case class CoinsReceived(number: Int)
 class CoffeeMachine extends PersistentActor {
 val price = 2
 var nrOfInsertedCoins = 0
 var outOfCoffeeBeans = false
 var totalNrOfCoins = 0
 
 override def persistenceId = "CoffeeMachine"
 
 override def receiveCommand: Receive = {
 case Coins(nr) =>
 nrOfInsertedCoins += nr
 println(s"Inserted [$nr] coins")
 persist(CoinsReceived(nr)) { evt =>
 totalNrOfCoins += nr
 println(s"Total number of coins in machine is [$totalNrOfCoins]")
 } … } override def receiveRecover: Receive = {
 case CoinsReceived(coins) =>
 totalNrOfCoins += coins
 println(s"Total number of coins in machine is [$totalNrOfCoins]")
 }
 } https://gist.github.com/jboner/1db37eeee3ed3c9422e4
  • 93. “An escalator can never break: it can only become stairs. You should never see an Escalator Temporarily Out Of Order sign, just Escalator Temporarily Stairs. Sorry for the convenience.” - Mitch Hedberg
  • 96. Little’s Law L = λW Queue Length = Arrival Rate * Response Time W = L/λ Response Time = Queue Length / Arrival Rate W: Response Time L: Queue Length
  • 99. “Continuously compare the actual output to its desired reference value; then apply a change to the system inputs that counteracts any deviation of the actual output from the reference.” - Philipp K. Janert Feedback Control for Computer Systems - Philipp K. Janet The Feedback Principle
  • 102. Places to Intervene in a Complex System 1. The constants, parameters or numbers 2. The sizes of buffers relative to their flows 3. The structure of material stocks and flows 4. The lengths of delays, relative to the rate of system change 5. The strength of negative feedback loops 6. The gain around driving positive feedback loops 7. The structure of information flows 8. The rules of the system 9. The power to add, change, evolve, or self-organize structure 10. The goals of the system 11. The mindset or paradigm out of which the system arises 12. The power to transcend paradigms Leverage Points: Places to Intervene in a System - Donella Meadows:
  • 103. Triple Loop Learning Loop 1: Follow the rules Loop 2: Change the rules Loop 3: Learn how to learn Triple Loop Learning - Chris Argyris
  • 105. What can we learn from Arnold? Blow things up
  • 107. Pull the Plug …and see what happens
  • 108.
  • 110. “Complex systems run as broken systems.” - richard Cook How Complex Systems Fail - Richard Cook
  • 111. Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino
  • 113. References Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/ Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/ Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60 Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/ nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm Towards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-Architectures-1-Biology-Lessons/ Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf What is resilience? An introduction to social-ecological research - http://www.stockholmresilience.org/download/18.10119fc11455d3c557d6d21/1398172490555/ SU_SRC_whatisresilience_sidaApril2014.pdf Applying resilience thinking: Seven principles for building resilience in social-ecological systems - http://www.stockholmresilience.org/download/ 18.10119fc11455d3c557d6928/1398150799790/SRC+Applying+Resilience+final.pdf Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928 Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692 The Network in Reliable - http://queue.acm.org/detail.cfm?id=2655736 Data on the Outside vs Data on the Inside - https://msdn.microsoft.com/en-us/library/ms954587.aspx Life Beyond Distributed Transactions - http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf Immutability Changes Everything - http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf Standing on Distributed Shoulders of Giants - https://queue.acm.org/detail.cfm?id=2953944 Thinking in Promises - http://shop.oreilly.com/product/0636920036289.do In Search Of Certainty - http://shop.oreilly.com/product/0636920038542.do Reactive Microservices Architecture - http://www.oreilly.com/programming/free/reactive-microservices-architecture-orm.csp Reactive Streams - http://reactive-streams.org Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/d24c0eb91417a5ec10a6 Persistent Vending Machine Akka Supervision Demo - https://gist.github.com/jboner/1db37eeee3ed3c9422e4
  • 115. Without Resilience Nothing Else Matters Jonas Bonér CTO Lightbend @jboner