Resisting to the Shocks
Resilience Patterns in an unstable world!
STEFANO FAGO (Extended Version from Meetup
Crafted Software: 7th Edition, 11th
October 2018)
Resilience?
The concept of Resilience has multiple
definitions; the definition we will use is:
… The Capacity to Recover Quickly from
Difficulties; Toughness. ...
What is a Resilient System?
<< ...it is a system that on the outside seems complex but is characterized
by a simpler modular structure made up of components that, when
necessary, can detach and reconfigure themselves: this prevents the
problems of one part from cascading onto the others... >>
[A. Zolli - http://resiliencethebook.com/]
A Resilient System is characterized by:
– dynamicity
– modularity
– diversity
– decoupling
– integrated shock absorbers
Why have a Resilient System?
● ...because having a 24/7, 99.99999% available system... is Cool!?!
● ...because I'm ... an Incredible Software Engineer!?!
● ...because I don't want my Business to lose money!
<< ...Many systems are built to pass QA testing rather than to survive
the world after launch... >>
[Michael Nygard - https://pragprog.com/book/mnee2/release-it-second-edition]
Fallacies Of Distributed Computing
● The network is reliable
● Latency is zero
● Bandwidth is infinite
● The network is secure
● Topology doesn't change
● There is one administrator
● Transport cost is zero
● The network is homogeneous
[https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing]
[https://www.rgoarchitects.com/Files/fallacies.pdf]
Murphy's Laws for Resilience
● If there is anything that can break in the system, it will break!
● If there is something that can break the System, there is at least one
Customer who will find it!
● Under Pressure... things get worse!
● Size matters but ... you'll be wrong anyway!
<< ...the three most frequent types of failures we observed were due to: 1)
Inbound request pattern changes, including overload and bad actors 2)
Resource exhaustion such as CPU, memory, io_loop, or networking resources
3) Dependency failures, including infrastructure, data store, and
downstream services … >> [UBER Engineering]
Fragility? Some Causes...
● Usage of proprietary protocols and software
● Deployment of proprietary systems to a large number of computers
that cannot be properly assessed in terms of security vulnerabilities
or other potential misuses
● Single points of failure
● Inter-dependence of services
● Systems that can easily be influenced by pressure groups
● Weak architecture
● Missing fallback-scenarios, graceful degradation
https://devopsagenda.techtarget.com/opinion/Why-software-resilience-should-be-the-real-goal-of-DevOps
Resiliency isn't Reliability...
● Reliability: The target at which software designers have always aimed:
perfect operation all the time. Reliability is the planned outcome.
● Resiliency: The ability of an app to recover from certain types of failure
and yet remain functional from the customer perspective. Resilience is
how you achieve the outcome.
https://cabforward.com/the-difference-between-reliable-and-resilient-software/
Resilience in Distributed Systems:
What does it imply?
● 100% Trap: not IF it will break but ... WHEN it will break!
<< ...the normal state of operation is partial failure... >> [Adrian Hornsby]
● It is not a perfect feature!
<< ...it is impossible for a system to have all three properties of consistency,
resilience and partition-tolerance... >> [Architectural Design for
Resilience - Dong Liu, Ralph Deters, and W.J. Zhang (2010)]
● It implies complexity, it does not reduce it!
● It requires studying, measuring and understanding the business
objectives!
Resilience in Distributed Systems:
Base Elements
● Isolation: break the system down into parts, give the parts autonomy, avoid
the propagation of failures
● Low Coupling: complementary to Isolation, it contributes to the non-propagation
of failures; the components are ignorant of each other
● Communication Methods: they condition how to model the domain and the recovery
mechanisms, and they can be heterogeneous (Sync, Async, Location Transparency,
Message Passing, Streaming, ...)
● Mitigate Failures: anticipate unavoidable failures and adopt both system and
application recovery mechanisms
Resilience in Distributed Systems:
Isolation is important
...using an intuitive point of view...
FAILURE & CHANGE [Mark Hibberd - https://www.youtube.com/watch?v=_VftQXWDkfk]
Patterns of Resilience (by Uwe Friedrichsen)
Patterns of Resilience: Bulkhead
Isolate! Don't Propagate!
● Redundancy of Systems and Resources: where possible, multiply a critical
resource to be readily replaceable
● Categorized Resource Allocation: Classify Resources and break them down into
measurable and manipulable reference pools
WARNING: Redundancy and Pools may vary over time and some of them are
affected by more than one factor
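A minimal sketch of the Categorized Resource Allocation idea, assuming each dependency gets its own small pool of concurrent slots. The names `Bulkhead` and `BulkheadFull` are hypothetical, and rejecting immediately when the pool is full is one possible policy, not the only one.

```python
import threading


class BulkheadFull(Exception):
    """Hypothetical error raised when a pool has no free slot."""


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared capacity."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once instead of piling up behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(self.name)
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

One `Bulkhead` per downstream service keeps a saturated pool from affecting calls that go through the other pools.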
Patterns of Resilience: Queueing
Take Your Time!
● Deferrable Work : postpone a non-urgent activity
● Bounded Queue/ Load-Levelling Queue: load-absorbers for request or
traffic spikes
● BackPressure/Throttling: queue overload management policies to avoid
indefinite growth
WARNING: Asynchrony makes coordination complex, and the approach must be
refined using measurements taken from reality
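A minimal sketch of a Bounded Queue used as a load absorber, assuming the overload policy is to refuse new work once the queue is full. The helper names `submit` and `drain` are hypothetical; the queue itself is the standard library `queue.Queue`.

```python
import queue


def submit(work_queue, job):
    """Try to enqueue a job; return False (shed the load) when the queue is full."""
    try:
        work_queue.put_nowait(job)
        return True
    except queue.Full:
        return False


def drain(work_queue, handler):
    """Process everything currently queued; return the number of jobs handled."""
    handled = 0
    while True:
        try:
            job = work_queue.get_nowait()
        except queue.Empty:
            return handled
        handler(job)
        handled += 1
```

The `maxsize` passed to `queue.Queue` is the bound: a producer that gets `False` back is being told to slow down or retry later, which is a simple form of backpressure.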
Patterns of Resilience: Timeout
Stop Waiting: Fail Fast & Don't Propagate!
● Make the duration of an activity predictable
● Set Timing Goals, measure, refine according to reality
WARNING: The goals may be specific to one resource and not apply to the others;
you also have to decide how to handle timeout errors
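A minimal sketch of bounding the duration of a call, assuming the work can run in a thread pool. The name `call_with_timeout` and the pool size are hypothetical. Note that the caller stops waiting, but a task that is already running is not interrupted.

```python
import concurrent.futures
import time

# Shared worker pool; the size is an illustrative assumption.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) and stop waiting after timeout_s seconds."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only prevents a task that has not started; a running one keeps going.
        future.cancel()
        raise TimeoutError(f"no answer within {timeout_s}s") from None
```

For example, `call_with_timeout(time.sleep, 0.05, 0.3)` raises `TimeoutError` after roughly 50 ms, and the caller can then decide between a retry and a fallback.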
Patterns of Resilience: Retry
If you fail once, try again!
Some failures are temporary or recoverable...
...Trying again requires choosing the number of attempts and a growing
delay between the retries (backoff)
https://aws.amazon.com/it/blogs/architecture/exponential-backoff-and-jitter/
WARNING: Assumes that the idempotence of the activities involved has been analysed
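A minimal sketch of retry with exponential backoff and full jitter, in the spirit of the AWS article linked above. `TransientError` and the default values are hypothetical; `sleep` and `rng` are injectable only so the behaviour can be checked without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(fn, attempts=4, base_delay=0.1, max_delay=2.0,
          sleep=time.sleep, rng=random.random):
    """Call fn(); on TransientError wait a jittered, growing delay and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: let the caller see the failure
            # Full jitter: random wait in [0, min(max_delay, base_delay * 2**attempt)).
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

Only `TransientError` is retried; any other exception propagates immediately, which keeps non-idempotent or permanent failures out of the retry loop.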
Patterns of Resilience: Fallback/Fail Silent
Don't Fail... Degrade gracefully!
Do not fail destructively: respond with an approximation or an alternative
action
● Default Value/Derived Value
● Alternative Actions/Invocations
● Caching
WARNING: The related business conditions must be incorporated!...
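A minimal sketch combining the three options above, assuming a plain dictionary as the cache. The function name `with_fallback` and the returned source labels are hypothetical.

```python
def with_fallback(primary, cache, key, default):
    """Return (value, source): live if primary works, else cached, else default."""
    try:
        value = primary(key)
    except Exception:
        # Degrade: prefer the last known good value, then a default.
        if key in cache:
            return cache[key], "cached"
        return default, "default"
    cache[key] = value  # remember the last good answer for future failures
    return value, "live"
```

Returning the source alongside the value lets the caller apply the business conditions, for instance refusing to act on a cached price.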
Patterns of Resilience: Limiter
No Stress, Know Your Limit!
● Rate-Limiter
● Concurrency-Limiter
● Adaptive Resource Sizing
● BackPressure/Throttling
WARNING: These policies should not replace the effort to understand resource
sizing, to use appropriate algorithms and to refine them against real data for the
different use cases.
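A minimal sketch of a Rate-Limiter, assuming the token bucket algorithm (one common choice among several). The class name `TokenBucket` is hypothetical, and the clock is injectable only to make the behaviour easy to check.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_s` requests per second."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request for which `allow()` returns `False` can be rejected, delayed or queued, depending on the use case.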
Patterns of Resilience: Circuit Breaker
Don't do it if it hurts!
Interrupt a pathological situation with controlled and immediate failure. The
failure state is revoked according to indicators or time conditions.
WARNING: Defining the parameters that trigger the failure state and the recovery
can be difficult, and the consequences on the critical path of execution of the
services need to be studied.
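A minimal sketch of a Circuit Breaker, assuming a consecutive-failure threshold to open it and a fixed time window before a trial call is allowed. The names `CircuitBreaker` and `CircuitOpen` and the default parameters are hypothetical.

```python
import time


class CircuitOpen(Exception):
    """Hypothetical error raised while the breaker is failing fast."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpen("failing fast")
            # Window elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

This sketch is not thread-safe; a production breaker would also need locking and usually a failure-rate window rather than a simple counter.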
Patterns of Resilience: Decoupling By Events
Describe in terms of the things that happen (Event), not the things that
do the work (Command)
Isolate/Decouple components, Model with Domains, accept failures with
notifications that allow the recovery of the components / sub-systems
● Event-Sourcing / CQRS / Message-Passing
● SAGA (alternative to 2PC)
WARNING: Asynchronous Activities and Domain Modeling make the system safer
but more complex. Queues and listener networks may be abused. There is a
trade-off between Transactionality and Compensating Activities.
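A minimal sketch of the compensation idea behind SAGA, assuming every step is a pair of callables (action, compensation) executed in process. The function name `run_saga` is hypothetical; a real saga would typically be driven by events or messages.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was compensated.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo the already completed steps in reverse order.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

Unlike 2PC, nothing is locked while the steps run: consistency is restored afterwards by the compensating actions, which is the trade-off named in the warning above.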
Patterns of Resilience: Chaos Engineering
<< ...Chaos Engineering is the discipline of experimenting on a distributed
system in order to build confidence in the system’s capability to withstand
turbulent conditions in production... >>
https://www.oreilly.com/ideas/chaos-engineering
● Implementing Testing in Production, with realistic data and volumes!
● Having the infrastructure for continuous experiments of ... Chaos!
● Learn from every failure / Always invent new failures!
WARNING: Complex Startup, specific Skills, get products and <<...don't use the
term Chaos Engineering, use Continuous limited scope disaster recovery
instead. You might actually get a budget that way...>> [Russ Miles]
From Resilient to (auto)Recoverable
Target for Architectural Maturity [Bilgin Ibryam]
From Resilient to (auto)Recoverable
At first sight you'll think of adopting these patterns only as an application
solution but...
… it is in this context that DevOps practices and tools become an integral part of
a broader vision
– containers and container orchestration
– artifacts life cycle
– distribution policies for certificates, configurations and artifacts
– monitoring & metrics
WARNING: adopting DevOps implies complexity, skills, organization and <<
...application safety and correctness, in a distributed system is still the
responsibility, of the application... >> [Christian Posta]
From Resilient to (auto)Recoverable
In order to be suitable for automation (in cloud native) environments a service
must be:
– Idempotent for restarts (a service can be killed and started multiple times).
– Idempotent for scaling up/down (a service can be autoscaled to multiple
instances).
– Idempotent service producer (other services may retry calls).
– Idempotent service consumer (the service or the mesh can retry outgoing
calls).
If your service always behaves the same way when the above actions are
performed one or multiple times, then the platform will be able to recover your
services from failures without human intervention.
[https://www.infoq.com/articles/microservices-post-kubernetes - Bilgin
Ibryam]
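A minimal sketch of an idempotent service producer, assuming callers attach a unique request id to each call and that processed ids are remembered. The class `PaymentService` and the in-memory dictionary are hypothetical; a real service would keep this record in durable storage so that it also survives restarts.

```python
class PaymentService:
    """Deposits are deduplicated by request id, so a retried call has no extra effect."""

    def __init__(self):
        self.balance = 0
        self._processed = {}  # request_id -> result returned the first time

    def deposit(self, request_id, amount):
        if request_id in self._processed:
            # Retry of a call that already happened: replay the stored result.
            return self._processed[request_id]
        self.balance += amount
        self._processed[request_id] = self.balance
        return self.balance
```

With this in place, a caller or a service mesh can retry `deposit` safely after a timeout without knowing whether the first attempt was applied.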
Remember that ...
● Distributed systems are different because they fail often / Extract services
● Writing robust distributed systems costs more than writing robust single-
machine systems. / Robust, open source distributed systems are much less
common than robust, single-machine systems
● If you can fit your problem in memory, it’s probably trivial / “It’s slow” is the
hardest problem you’ll ever debug
● Implement backpressure throughout your system /Find ways to be partially
available
● Metrics are the only way to get your job done : Use percentiles, not averages
● Learn to estimate your capacity / Exploit data-locality / Writing cached data
back to persistent storage is bad
● Feature flags are how infrastructure is rolled out / Use the CAP theorem to
critique systems
https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
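A small illustration of "Use percentiles, not averages", assuming made-up latency samples and the nearest-rank definition of a percentile. The function name `percentile` and the numbers are hypothetical.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up data: 95 requests take 10 ms, 5 requests take 2000 ms.
latencies_ms = [10] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)  # 109.5
p50_ms = percentile(latencies_ms, 50)    # 10
p99_ms = percentile(latencies_ms, 99)    # 2000
```

The mean describes no real request here: most users see 10 ms and a few see 2000 ms, which only the percentiles reveal.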
Resilience & Performance Anti-Patterns
Are you in doubt? Does the system get complicated? Maybe it is useful to
compare the design of the system, services or resilience patterns used, with the
following performance anti-patterns!
● N+1 Calls
● N+1 Query
● Payload flood
● Granularity
● Tight-Coupling
● Inefficient Service Flow
● Dependencies
Reality will change again but ...
...do not waste money! Be Resilient and
Recoverable!
Thank You All!!!

Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Resisting to The Shocks

  • 1. Resisting to the Shocks Resilience Patterns in an unstable world! STEFANO FAGO (Extended Version from Meetup Crafted Software: 7th Edition 11th October 2018)
  • 2. Resilience? The concept of Resilience has multiple definitions; the definition we will use is: … The Capacity to Recover Quickly from Difficulties; Toughness. ...
  • 3. What is a Resilient System? << ...it is a system that on the outside seems complex but is characterized by a simpler modular structure made up of components that, when necessary, can detach and reconfigure themselves: this prevents the problems of one part from cascading onto the others... >> [A. Zolli - http://resiliencethebook.com/] A Resilient System is characterized by: – dynamicity – modularity – diversity – decoupling – integrated shock absorbers
  • 4. Why have a Resilient System? ● ...because having a 24/7, 99.99999% system... is Cool!?! ● ...because I'm ... an Incredible Software Engineer!?! ● ...because I don't want my Business to lose money! << ...Many systems are built to pass QA testing rather than to survive the world after launch... >> [Michael Nygard - https://pragprog.com/book/mnee2/release-it-second-edition]
  • 5. Fallacies Of Distributed Computing ● The network is reliable ● Latency is zero ● Bandwidth is infinite ● The network is secure ● Topology doesn't change ● There is one administrator ● Transport cost is zero ● The network is homogeneous [https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing] [https://www.rgoarchitects.com/Files/fallacies.pdf]
  • 6. The Murphy's Laws for the Resilience ● If there is anything that can break in the system, it will break! ● If there is something that can break the System, there is at least one Customer who will find it! ● Under Pressure... things get worse! ● The size matters but ... You'll be wrong anyway! << ...the three most frequent types of failures we observed were due to: 1) Inbound request pattern changes, including overload and bad actors 2) Resource exhaustion such as CPU, memory, io_loop, or networking resources 3) Dependency failures, including infrastructure, data store, and downstream services … >> [UBER Engineering]
  • 7. Fragility? Some Causes... ● Usage of proprietary protocols and software ● Deployment of proprietary systems to a large number of computers that cannot be properly assessed in terms of security vulnerabilities or other potential misuses ● Single points of failure ● Inter-dependence of services ● Systems that can easily be influenced by pressure groups ● Weak architecture ● Missing fallback scenarios and graceful degradation https://devopsagenda.techtarget.com/opinion/Why-software-resilience-should-be-the-real-goal-of-DevOps
  • 8. Resiliency isn't Reliability... ● Reliability: The target at which software designers have always aimed: perfect operation all the time. Reliability is the planned outcome. ● Resiliency: The ability of an app to recover from certain types of failure and yet remain functional from the customer perspective. Resilience is how you achieve the outcome. https://cabforward.com/the-difference-between-reliable-and-resilient-software/
  • 9. Resilience in Distributed System : What does it imply? ● 100% Trap: not IF it will break but ... WHEN it will break! << ...the normal state of operation is partial failure... >> [Adrian Hornsby] ● It is not a perfect feature! << ...it is impossible for a system to have all three properties of consistency, resilience and partition-tolerance... >> [Architectural Design for Resilience - Dong Liu, Ralph Deters, and W.J. Zhang (2010)] ● It implies complexity, it does not reduce it! ● It requires studying, measuring and understanding the business objectives!
  • 10. Resilience in Distributed System : Base Elements ● Isolation: Break down into parts, autonomy of the parts, avoid the propagation of failures ● Low Coupling: Complementary to Isolation, contributes to the non-propagation of failures, the Components are ignorant of the others ● Communication Methods: It conditions how to model the domain and the recovery mechanisms, it can be heterogeneous (Sync, Async, Location Transparency, Message Passing, Streaming, ...) ● Mitigate Failures: Anticipate unavoidable failures and adopt both system and application recovery mechanisms
  • 11. Resilience in Distributed System : Isolation is important ...using an intuitive point of view... FAILURE & CHANGE [Mark Hibberd - https://www.youtube.com/watch?v=_VftQXWDkfk]
  • 12. Patterns of Resilience (by Uwe Friedrichsen)
  • 13.
  • 14. Patterns of Resilience: Bulkhead Isolate! Don't Propagate! ● Redundancy of Systems and Resources: where possible, multiply a critical resource to be readily replaceable ● Categorized Resource Allocation: Classify Resources and break them down into measurable and manipulable reference pools Warning: Redundancy and Pools may vary over time and some of them are affected by more than one factor
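The "Categorized Resource Allocation" idea in the Bulkhead slide can be sketched as one bounded pool of slots per resource category. This is a minimal illustration, not taken from the slides: the `Bulkhead` class, its parameters, and the reject-when-full policy are all assumptions.

```python
import threading


class Bulkhead:
    """Caps concurrent calls for one resource category (hypothetical helper)."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Reject immediately instead of waiting, so a saturated pool
        # cannot tie up callers that belong to other pools.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            # Always give the slot back, even when fn raises.
            self._slots.release()
```

Giving each downstream dependency its own `Bulkhead` instance means that exhausting one pool leaves the others untouched, which is the "Don't Propagate" part of the slide.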
  • 15. Patterns of Resilience: Queueing Take Your Time! ● Deferrable Work : postpone a non-urgent activity ● Bounded Queue/ Load-Levelling Queue: load-absorbers for request or traffic spikes ● BackPressure/Throttling: queue overload management policies to avoid indefinite growth WARNING: Asynchrony makes coordination complex, and the approach must be refined using measurements taken from reality
  • 16. Patterns of Resilience: Timeout Stop Waiting: Fail Fast & Don't Propagate! ● Make the duration of an activity predictable ● Set Timing Goals, measure, refine according to reality WARNING: The goals may be specific to a resource and do not impact the others; how to handle timeout errors?
  • 17. Patterns of Resilience: Retry If you fail once, try again! Some failures are temporary or recoverable... ...Trying again requires: a maximum number of attempts and an increasing delay between the retries (backoff) https://aws.amazon.com/it/blogs/architecture/exponential-backoff-and-jitter/ WARNING: Assumes an idempotence analysis of the activities involved
  • 18. Patterns of Resilience: Fallback/Fail Silent Don't Fail... Degrade gracefully! Do not fail with destructive actions but with approximation or alternative actions ● Default Value/Derived Value ● Alternative Actions/Invocations ● Caching WARNING: The related business conditions must be taken into account!...
  • 19. Patterns of Resilience: Limiter No Stress, Know Your Limit! ● Rate-Limiter ● Concurrency-Limiter ● Adaptive Resource Sizing ● BackPressure/Throttling WARNING: These policies should not replace the effort to understand Resource-Sizing, to use appropriate algorithms, and to refine them against real data for the different use-cases.
  • 20. Patterns of Resilience: Circuit Breaker Don't do it if it hurts! Interrupt a pathological situation with controlled and immediate failure. The state of failure is revoked according to indices or time conditions. WARNING: Defining the parameters that trigger the failure state and the recovery can be a difficult task, and the consequences on the critical path of execution of the services must be studied.
  • 21. Patterns of Resilience: Decoupling By Events Describe in terms of the things that happen (Event), not the things that do the work (Command) Isolate/Decouple components, Model with Domains, accept failures with notifications allowing the recovery of the components / sub-systems ● Event-Sourcing / CQRS / Message-Passing ● SAGA ( alternative to 2PC) WARNING: Asynchronous Activities and Domain Modeling make the system safer but more complex. Queues and listener networks may be abused. Tradeoff between Transactionality and Compensating Activities.
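A bounded, load-levelling queue with a simple backpressure signal might look like the following sketch. The helper names `submit` and `drain` are hypothetical; only the standard-library `queue.Queue` with a `maxsize` is assumed.

```python
import queue


def submit(work_queue: queue.Queue, job) -> bool:
    """Try to enqueue a job; return False when the queue is full."""
    try:
        work_queue.put_nowait(job)
        return True
    except queue.Full:
        # Backpressure signal: the caller can retry later, slow down, or drop.
        return False


def drain(work_queue: queue.Queue) -> list:
    """Consume every job currently queued, in arrival order."""
    done = []
    while True:
        try:
            done.append(work_queue.get_nowait())
        except queue.Empty:
            return done
```

With `queue.Queue(maxsize=2)`, a burst of four submissions is accepted for the first two jobs and refused for the rest, so the queue absorbs the spike without growing indefinitely.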
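One way to make the duration of a call predictable is to run it in a worker pool and stop waiting after a deadline. The sketch below assumes a shared thread pool and a hypothetical helper `call_with_timeout`; note that the caller stops waiting, but a worker that has already started keeps running until it finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary assumption for the sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s: float, *args):
    """Run fn(*args) and give up waiting after timeout_s seconds."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the task has not started running yet.
        future.cancel()
        raise TimeoutError(f"no result within {timeout_s}s")
```

How the resulting `TimeoutError` is handled (retry, fallback, or propagate) is exactly the open question raised in the slide's warning.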
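A retry loop with capped exponential backoff and random jitter, in the spirit of the AWS article linked in the slide, could be sketched as follows. The function name, default values, and the choice of which exceptions count as retryable are assumptions, and the wrapped operation is presumed idempotent.

```python
import random
import time


def retry(fn, attempts: int = 4, base_s: float = 0.1, cap_s: float = 2.0,
          retry_on: tuple = (ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn until it succeeds or the attempts are used up."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # "Full jitter": a random delay up to the capped exponential bound,
            # so that many clients do not retry in lockstep.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

The `sleep` parameter is injectable only so that the delays can be observed or skipped in tests.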
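The "Caching" and "Default Value" options from the Fallback slide can be combined in a few lines. This is only a sketch: `with_fallback`, the plain `dict` cache, and the catch-all exception handling are assumptions, and whether a stale or default value is acceptable is a business decision.

```python
def with_fallback(primary, cache: dict, key, default):
    """Return primary(key); on failure, degrade to a cached or default value."""
    try:
        value = primary(key)
        cache[key] = value  # remember the last good answer
        return value
    except Exception:
        # Degrade: stale cached value first, static default otherwise.
        return cache.get(key, default)
```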
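A Rate-Limiter is often built as a token bucket; the slides do not prescribe an algorithm, so the choice here is an assumption, as are the class name and parameters. The sketch is single-threaded (no locking) and takes an injectable clock to keep it testable.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The rate and capacity are exactly the kind of values that, as the slide warns, should come from measured resource sizing rather than guesses.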
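The Circuit Breaker behaviour described above (fail immediately while open, revoke the failure state after a time condition) can be sketched with a consecutive-failure counter and a reset timer. The threshold, the reset window, and the names `CircuitBreaker` and `CircuitOpenError` are assumptions; the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open: failing fast")
            # Reset window elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if (self._opened_at is not None
                    or self._failures >= self.failure_threshold):
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```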
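The trade-off between transactionality and compensating activities mentioned for SAGA can be illustrated with a tiny in-process orchestrator. Real sagas span services and messages; this sketch, with its hypothetical `run_saga` helper, only shows the ordering rule: when a step fails, the compensations of the already completed steps run in reverse order.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order, then report the original failure.
            for undo in reversed(completed):
                undo()
            raise
        completed.append(compensate)
```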
  • 22. Patterns of Resilience: Chaos Engineering << ...Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production... >> https://www.oreilly.com/ideas/chaos-engineering ● Implementing Testing in Production, with realistic data and volumes! ● Having the infrastructure for continuous experiments of ... Chaos! ● Learn from every failure / Always invent new failures! WARNING: Complex Startup, specific Skills, get products and <<...don't use the term Chaos Engineering, use Continuous limited scope disaster recovery instead. You might actually get a budget that way...>> [Russ Miles]
  • 23. From Resilient to (auto)Recoverable Target for Architectural Maturity [Bilgin Ibryam]
  • 24. From Resilient to (auto)Recoverable At first sight you'll think of adopting these patterns only as an application solution but... … it is in this context that DevOps practices and tools become an integral part of a broader vision – containers and container orchestration – artifacts life cycle – distribution policies for certificates, configurations and artifacts – monitoring & metrics WARNING: adopting DevOps implies complexity, skills, organization and << ...application safety and correctness, in a distributed system is still the responsibility, of the application... >> [Christian Posta]
  • 25. From Resilient to (auto)Recoverable In order to be suitable for automation (in cloud native) environments a service must be: – Idempotent for restarts (a service can be killed and started multiple times). – Idempotent for scaling up/down (a service can be autoscaled to multiple instances). – Idempotent service producer (other services may retry calls). – Idempotent service consumer (the service or the mesh can retry outgoing calls). If your service always behaves the same way when the above actions are performed one or multiple times, then the platform will be able to recover your services from failures without human intervention. [https://www.infoq.com/articles/microservices-post-kubernetes - Bilgin Ibryam]
  • 26. Remember that ... ● Distributed systems are different because they fail often / Extract services ● Writing robust distributed systems costs more than writing robust single-machine systems. / Robust, open source distributed systems are much less common than robust, single-machine systems ● If you can fit your problem in memory, it’s probably trivial / “It’s slow” is the hardest problem you’ll ever debug ● Implement backpressure throughout your system / Find ways to be partially available ● Metrics are the only way to get your job done : Use percentiles, not averages ● Learn to estimate your capacity / Exploit data-locality / Writing cached data back to persistent storage is bad ● Feature flags are how infrastructure is rolled out / Use the CAP theorem to critique systems https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
  • 27. Resilience & Performance Anti-Patterns Are you in doubt? Does the system get complicated? Maybe it is useful to compare the design of the system, services or resilience patterns used, with the following performance anti-patterns! ● N+1 Calls ● N+1 Query ● Payload flood ● Granularity ● Tight-Coupling ● Inefficient Service Flow ● Dependencies
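For the "idempotent service producer" case, where other services may retry calls, a common approach is to deduplicate on a message or request identifier. The sketch below is an assumption-laden illustration: `IdempotentConsumer` is a hypothetical name, and the in-memory dictionary stands in for what would need to be durable storage in order to survive restarts.

```python
class IdempotentConsumer:
    """Runs the handler at most once per message id and replays the stored result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # message id -> result; in-memory only (assumption)

    def handle(self, message_id: str, payload):
        if message_id in self._results:
            # Duplicate delivery or retry: return the earlier result, no side effects.
            return self._results[message_id]
        result = self._handler(payload)
        self._results[message_id] = result
        return result
```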
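The advice "Use percentiles, not averages" is easy to see with a small latency sample in which a few slow requests dominate the tail. The `percentile` helper below uses the nearest-rank definition, which is one of several conventions and is chosen here as an assumption; the sample values are made up.

```python
import math


def percentile(samples, p: float):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# 98 fast requests and 2 very slow ones, in milliseconds (illustrative values).
latencies_ms = [10] * 98 + [1000, 2000]
mean_ms = sum(latencies_ms) / len(latencies_ms)
```

For this sample the mean falls between the typical latency and the slow tail and describes neither, while the 50th and 99th percentiles report the two separately.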
  • 28. Reality will change again but ... ...do not waste money! Be Resilient and Recoverable! Thank You All!!!