FULLSTACK TECH RADAR DAY
CHAOS is a Ladder
Haggai Philip Zagury (hagzag) | DevOps Group
& Tech Lead @ Tikal Knowledge
Haggai Philip Zagury
DevOps Group & Tech Lead -> 10+ years @ Tikal
My open thinking and open techniques ideology is driven by Open Source technologies and the
collaborative manner defining my M.O.
My solution-driven approach is strongly based on hands-on work and a deep understanding of operating
systems, application stacks and software languages, networking, cloud in general and, today, more
and more cloud native solutions.
@hagzag
What is Chaos Engineering ?
The philosophy behind Chaos Engineering
http://bit.ly/2VQGCup
Chaos means many different
things to different people…
In 1 Sentence
‣ Chaos Engineering is the discipline of
experimenting on a distributed system in
order to build confidence in the system’s
capability to withstand turbulent
conditions in production.
Building Trust
Building Resilient Trust in systems is hard !
(Diagram: Backend, DevOps, Frontend & Mobile)
Building confidence in computer systems is hard !
● Systems fail (Some “Design to Fail”)
● “Best Effort” Infra
● *aaS
● Cloud
● Cloud native
● Hybrid Cloud
● …
Experiment in Production !
In addition to “Traditional Testing”
● Chaos Engineering goes beyond
traditional (failure) testing in that it's not
only about verifying assumptions. It also
helps us explore the many unpredictable
things that could happen and discover
new properties of our inherently chaotic
systems.
Hypothesis-Driven Experiments
● Hypothesis - Define your steady state
● Experiment by challenging it
● Analyse your findings - spread the word
● Action items should be noted
● Perhaps run another round with
other limits / variables
● Immunize your system (eventually)
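The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy "system" is a replica counter, and the names run_experiment, ExperimentResult, kill_one_replica and restore are hypothetical, not taken from any tool mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentResult:
    steady_before: bool
    steady_after: bool
    action_items: List[str] = field(default_factory=list)


def run_experiment(
    steady_state: Callable[[], bool],
    inject_failure: Callable[[], None],
    recover: Callable[[], None],
) -> ExperimentResult:
    """Check the hypothesis, challenge it, check it again, and always clean up."""
    if not steady_state():
        # Do not experiment on a system that is already unhealthy.
        return ExperimentResult(False, False, ["fix steady state before experimenting"])
    try:
        inject_failure()
        steady_after = steady_state()
    finally:
        recover()
    items = [] if steady_after else ["hypothesis broken: investigate and rerun"]
    return ExperimentResult(True, steady_after, items)


# Toy system (assumption): "steady" means at least 2 replicas are up.
replicas = {"up": 3}


def steady_state() -> bool:
    return replicas["up"] >= 2


def kill_one_replica() -> None:
    replicas["up"] -= 1


def restore() -> None:
    replicas["up"] = 3


result = run_experiment(steady_state, kill_one_replica, restore)
```

Running another round with other limits or variables then amounts to calling run_experiment again with a different inject_failure function and noting the action items it returns.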
Chaos engineering is:
● Like injecting a vaccine to immunize yourself.
● Increasing system resilience by discovering vulnerabilities
● Identifying failure before it becomes an outage
● Better defining your steady state (iteratively) and constantly challenging it.
Chaos engineering isn’t:
● Breaking down production on purpose.
● A (new) blame mechanism
● Surprising partial outages.
● Taking down the whole system at the same time.
Chaos Engineering Origins?
How did we get here ?
Timeline (1998 to 2017)
Years shown: 1998, 2010, 2011, 2014, 2015, 2016, 2017
Milestones shown:
● How Complex Systems Fail (Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New …)
● 25 years Resilience partitionist
● DevOps
● FaaS
● Unleash the Army
● Chaos Engineer Role Announced
● gremlin.com - Failure as a service
● Building trust in Chaos Engineering
Resilience Engineering
A system is resilient if it can adjust its functioning prior to, during, or following
events (changes, disturbances, and opportunities), and thereby sustain
required operations under both expected and …
http://erikhollnagel.com/ideas/resilience-engineering.html
Where we meet Chaos
How did we get here ?
Where we meet Chaos
Chaos
starts here
In 1 Sentence
‣ Chaos Engineering is the discipline of experimenting on a
distributed system in order to build confidence in the
system’s capability to withstand turbulent
conditions in production.
‣ Preparing for the unknown …
Building Trust
Turbulent condition - failing node in a cluster
(Diagram: replicas of services “a” and “b” spread across the nodes of a cluster labeled “default”)
● 2 services in a 3 node cluster
● What’s my application going to suffer from ?
● Is this OK ?
● Back to Normal
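The question "what is my application going to suffer from?" can be explored on paper before touching a real cluster. A minimal sketch, assuming a made-up placement of services a and b on three nodes (the exact placement in the slide diagram is not recoverable); node names and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical placement: node name -> service replicas running on it.
cluster: Dict[str, List[str]] = {
    "node-1": ["a", "b"],
    "node-2": ["b", "a"],
    "node-3": ["a"],
}


def surviving_replicas(placement: Dict[str, List[str]], failed_node: str) -> Counter:
    """Count the replicas of each service left after one node fails."""
    counts: Counter = Counter()
    for node, services in placement.items():
        if node != failed_node:
            counts.update(services)
    return counts


def services_at_risk(
    placement: Dict[str, List[str]], failed_node: str, min_replicas: int = 1
) -> List[str]:
    """List the services that drop below min_replicas when failed_node is lost."""
    all_services = {s for services in placement.values() for s in services}
    left = surviving_replicas(placement, failed_node)
    return sorted(s for s in all_services if left[s] < min_replicas)


# Losing node-1 leaves 2 replicas of "a" and 1 of "b".
after_failure = surviving_replicas(cluster, "node-1")
at_risk = services_at_risk(cluster, "node-1", min_replicas=2)
```

Whether one remaining replica of b "is OK" depends on the steady state you defined; the min_replicas threshold is where that decision is encoded in this sketch.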
Turbulents
How to practice Chaos Engineering ?
Prerequisites + Tools of Chaos Engineering
Practice & Collaborate
● You should have:
● GameDays
● ChaosDays
● Controlled & scheduled drills / experiments
It’s slowly becoming a culture
https://github.com/dastergon/awesome-chaos-engineering
Automation is key !
Monitoring (ROI)
Observability
DevOps
Not just graphs and logs (that too)
● RCAs - record them and make sure they can be found again !
● Document, Document, Document - there are great resources on how to do that.
● We don’t Chaos everything …
● Only what makes sense / repeats
● Game / Chaos Days -> keep the experiment definitions for each GameDay / ChaosDay
SLA … is innovation driven - how fast can you move without failing ?
https://cloudplatformonline.com/rs/248-TPC-286/images/DORA-State%20of%20DevOps.pdf
Experiment !
Application
Caching
Database
Hardware
Network
What layer ? - All !
The ultimate chaos “Butterfly Effect” / “Domino Effect”
● How will my application do
● without cache ?
● without a certain api available ?
● with n sessions
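The "without cache" question is one of the easier ones to rehearse in code. The sketch below is an assumption-laden toy, not any real cache client: ToyCache, CacheUnavailable, fetch and load_from_source are hypothetical names, and the "outage" is just a flag. It shows the property an experiment would verify: when the cache is gone, requests degrade to the slower source instead of failing.

```python
from typing import Callable, Dict, List, Optional


class CacheUnavailable(Exception):
    """Raised by the toy cache while the experiment has it switched off."""


class ToyCache:
    def __init__(self) -> None:
        self.available = True
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise CacheUnavailable(key)
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self._data[key] = value


def fetch(key: str, cache: ToyCache, load: Callable[[str], str]) -> str:
    """Read through the cache, falling back to the source if the cache is down."""
    try:
        cached = cache.get(key)
    except CacheUnavailable:
        return load(key)  # degraded mode: slower, but still answering
    if cached is not None:
        return cached
    value = load(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # the cache died between get and set; the value is still valid
    return value


source_calls: List[str] = []


def load_from_source(key: str) -> str:
    source_calls.append(key)
    return f"value-for-{key}"


cache = ToyCache()
fetch("user-1", cache, load_from_source)  # cache miss: one source call
fetch("user-1", cache, load_from_source)  # cache hit: no source call
cache.available = False                   # the experiment: take the cache away
degraded = fetch("user-1", cache, load_from_source)
```

The number of entries in source_calls is the cost of the outage; with n sessions hitting the source at once, that number is what tells you whether the layer below can absorb the domino effect.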
Applying Chaos Engineering practices
Monitor: Log | Measure
Experiment: Break Things & Auto Recover
Immune: Full Cycle - Chaos
Application
Caching
Database
Hardware
Network
Security
Where is Chaos going ?
"the discipline of experimenting on
a distributed system in order to
build confidence in the system's
capability to withstand turbulent
conditions in production."
Toolz
Failure as a service
Game-day resources
https://www.gremlin.com/community/tutorials/planning-your-own-chaos-day/
Planning your GameDay ?
Feel free to contact me directly, we’d be happy to help -> hagzag@tikalk.com
Hypothesis - steady state
{
  "name": "all-our-microservices-should-be-healthy",
  "type": "probe",
  "tolerance": true,
  "provider": {
    "type": "python",
    "module": "chaosk8s.probes",
    "func": "microservice_available_and_healthy",
    "arguments": {
      "name": "myapp",
      "ns": "default"
    }
  }
}
Experiment - Terminate a pod !
● What to do
● When to do it
{
  "type": "action",
  "name": "terminate-db-pod",
  "provider": {
    "type": "python",
    "module": "chaosk8s.pod.actions",
    "func": "terminate_pods",
    "arguments": {
      "label_selector": "app=my-app",
      "name_pattern": "my-app-[0-9]$",
      "rand": true,
      "ns": "default"
    }
  },
  "pauses": {
    "after": 5
  }
}
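The probe and the action on these slides are fragments; something has to wrap them into one experiment definition. A minimal sketch, assuming the fragments target the Chaos Toolkit Kubernetes extension (suggested by the chaosk8s module names) and that tool's experiment layout with version, title, description, steady-state-hypothesis, method and rollbacks keys. The title, description, and the helper name build_experiment are placeholders, not taken from the slides.

```python
import json
from typing import Any, Dict

# Steady-state probe, copied from the "Hypothesis - steady state" slide.
# Assumption: the tolerance is expressed as a boolean.
steady_state_probe: Dict[str, Any] = {
    "name": "all-our-microservices-should-be-healthy",
    "type": "probe",
    "tolerance": True,
    "provider": {
        "type": "python",
        "module": "chaosk8s.probes",
        "func": "microservice_available_and_healthy",
        "arguments": {"name": "myapp", "ns": "default"},
    },
}

# Action, copied from the "Terminate a pod" slide.
terminate_action: Dict[str, Any] = {
    "type": "action",
    "name": "terminate-db-pod",
    "provider": {
        "type": "python",
        "module": "chaosk8s.pod.actions",
        "func": "terminate_pods",
        "arguments": {
            "label_selector": "app=my-app",
            "name_pattern": "my-app-[0-9]$",
            "rand": True,
            "ns": "default",
        },
    },
    "pauses": {"after": 5},
}


def build_experiment(probe: Dict[str, Any], action: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one probe and one action into a single experiment document."""
    return {
        "version": "1.0.0",
        "title": "Losing one pod should not break the service",  # placeholder
        "description": "Terminate a random pod and re-check the steady state.",
        "steady-state-hypothesis": {
            "title": "Services are available and healthy",  # placeholder
            "probes": [probe],
        },
        "method": [action],
        "rollbacks": [],
    }


experiment = build_experiment(steady_state_probe, terminate_action)
experiment_json = json.dumps(experiment, indent=2)
```

Assuming the Chaos Toolkit CLI is installed, saving experiment_json to a file such as experiment.json and running `chaos run experiment.json` would check the hypothesis, run the method, and check the hypothesis again.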
If you're just peeking / evaluating
Chaoskube
● chaoskube is a “chaos-monkey lite”: it takes down pods on a schedule to test your
resilience (with some tweaks available via configuration)
● use --dry-run
https://github.com/linki/chaoskube
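To make the dry-run idea concrete, here is a toy model of "pick a random pod, and only delete it when not in dry-run mode". It is not chaoskube's implementation and does not talk to Kubernetes; the function name, the pod names and the in-memory list are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def terminate_random_pod(
    pods: List[str], rng: random.Random, dry_run: bool = True
) -> Optional[str]:
    """Pick one victim at random; remove it only when dry_run is False."""
    if not pods:
        return None
    victim = rng.choice(pods)
    if not dry_run:
        pods.remove(victim)
    return victim


pods = ["my-app-0", "my-app-1", "my-app-2"]  # hypothetical pod names
rng = random.Random(42)  # seeded so a GameDay run can be replayed

would_kill = terminate_random_pod(pods, rng, dry_run=True)
pods_after_dry_run = list(pods)

killed = terminate_random_pod(pods, rng, dry_run=False)
```

Defaulting dry_run to True mirrors the advice on the slide: look at what would be terminated first, and switch to real terminations only once the victims are the ones you expect.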
kube-bench
Find vulnerabilities, configuration flags, define your own policies.
kube-hunter (Security)
1. Remote scanning: to specify remote machines for hunting, select option 1 or use the --remote option.
Example: ./kube-hunter.py --remote some.node.com
2. Internal scanning: to specify internal scanning, use the --internal option (this will scan all of the machine's network interfaces).
Example: ./kube-hunter.py --internal
3. Network scanning: to specify a specific CIDR to scan, use the --cidr option.
Example: ./kube-hunter.py --cidr 192.168.0.0/24
Many many more ….
● Stay tuned for more stuff about Chaos Engineering
● https://www.tikalk.com/community
Thank you for joining us
Haggai Philip Zagury
DevOps Group & Tech Lead @ Tikal

Chaos is a ladder !

  • 1. FULLSTACK TECH RADAR DAY CHAOS is a Ladder Haggai Philip Zagury (hagzag) | DevOps Group & Tech Lead @ Tikal Knowledge
  • 2. FULLSTACK TECH RADAR DAY Haggai Philip Zagury DevOps Group & Tech Lead -> 10+ years @ Tikal My open thinking and open techniques ideology is driven by Open Source technologies and the collaborative manner defining my M.O. My solution driven approach is strongly based on hands-on and deep understanding of Operating Systems, Applications stacks and Software languages, Networking, Cloud in general and today more an more Cloud Native solutions. @hagzag
  • 3. FULLSTACK TECH RADAR DAY What is Chaos Engineering ? The philosophy behind Chaos Engineering
  • 4. FULLSTACK TECH RADAR DAY http://bit.ly/2VQGCup Chaos means many different things to different people…
  • 5. FULLSTACK TECH RADAR DAY In 1 Sentence ‣ Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Building Trust
  • 6. FULLSTACK TECH RADAR DAY Building Resilient Trust in systems is hard ! Backend DevOps Frontend & Mobile }
  • 10.
  • 12. FULLSTACK TECH RADAR DAY Building confidence in computer systems is hard ! ● Systems fail (Some “Design to Fail”) ● “Best Effort” Infra ● *aaS ● Cloud ● Cloud native ● Hybrid Cloud ● …
  • 13. FULLSTACK TECH RADAR DAY Experiment in Pr duction !
  • 14. FULLSTACK TECH RADAR DAY Additional to “Traditional Testing” ● Chaos Engineering goes beyond traditional (failure) testing in that it's not only about verifying assumptions. It also helps us explore the many unpredictable things that could happen and discover new properties of our inherently chaotic systems.
  • 15. FULLSTACK TECH RADAR DAY Hypothesis-Driven Experiments ● Hypothesis Define your steady state
  • 16. FULLSTACK TECH RADAR DAY Hypothesis-Driven Experiments ● Hypothesis Define your steady state ● Experiment by challenging it
  • 17. FULLSTACK TECH RADAR DAY Hypothesis-Driven Experiments ● Hypothesis Define your steady state ● Experiment by challenging it ● Analyse your findings - spread the word
  • 18. FULLSTACK TECH RADAR DAY Hypothesis-Driven Experiments ● Hypothesis - Define your steady state ● Experiment by challenging it ● Analyse your findings - spread the word ● Action items should be noted ● Perhaps run another round with other limits / variables ● Immune your system (eventually) Immune
  • 19. FULLSTACK TECH RADAR DAY Chaos engineering is: ● Like injecting a Vaccine to immune yourself. ● Increase system resilience - by discovering vulnerabilities ● Identify failure before it becomes an outage ● Better define your steady state (iterative) and constantly challenge it.
  • 20. FULLSTACK TECH RADAR DAY Chaos engineering isn’t: ● Breaking down production on purpose. ● A (new) blame mechanism ● Surprising partial outages. ● Taking down all the system at the same time.
  • 21. FULLSTACK TECH RADAR DAY Chaos Engineering Origins? How did we get here ?
  • 22. FULLSTACK TECH RADAR DAY DevOps 2010
  • 23. FULLSTACK TECH RADAR DAY DevOps 2010
  • 24. FULLSTACK TECH RADAR DAY DevOps 2010 2011 FaaS
  • 25. FULLSTACK TECH RADAR DAY DevOps 2010 20111998 How Complex Systems Fail (Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New 25 years Resilience partitionist
  • 26. FULLSTACK TECH RADAR DAY DevOps 2010 20111998 How Complex Systems Fail (Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New 25 years Resilience partitionist http://erikhollnagel.com/ideas/resilience-engineering.html A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and Resilience Engineering
  • 27. FULLSTACK TECH RADAR DAY Unleash the Army DevOps 2010 2011 2014 Chaos Engineer Role Announced
  • 28. FULLSTACK TECH RADAR DAY DevOps 2010 2011 2014 Chaos Engineer Role Announced gremlin.com Failure as a service Unleash the Army 2015
  • 29. FULLSTACK TECH RADAR DAY DevOps 2010 2011 2014 Chaos Engineer Role Announced gremlin.com Failure as a service 2017 Unleash the Army 2015 A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and Resilience Engineering
  • 30. FULLSTACK TECH RADAR DAY DevOps 2010 2014 2011 http://erikhollnagel.com/ideas/resilience-engineering.html 2015 A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and Resilience Engineering 2017 2016 Building trust in Chaos Engineering 1998 Chaos Engineer Role Announced
  • 31. FULLSTACK TECH RADAR DAY Where we meet Chaos How did we get here ?
  • 32. FULLSTACK TECH RADAR DAY Where we meet Chaos Chaos starts here
  • 33. FULLSTACK TECH RADAR DAY In 1 Sentence ‣ Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. ‣ Preparing for the unknown … Building Trust
  • 34. FULLSTACK TECH RADAR DAY Turbulent condition - failing node in a cluster default a b b aa a ● 2 services in a 3 node cluster
  • 35. FULLSTACK TECH RADAR DAY Turbulent conditions default a b b aa a ● What’s my application going to suffer from ?
  • 36. FULLSTACK TECH RADAR DAY Turbulent conditions default a b b aa a ● 2 services in a 3 node cluster ● What’s my application going to suffer from ? ● Is this OK ?
  • 37. FULLSTACK TECH RADAR DAY Turbulent conditions default a b b aa a ● Back to Normal
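The failing-node scenario on the slides above can be reasoned about before any real experiment runs. The following is a minimal sketch only: the node names and the replica placement are hypothetical (the slide diagram does not survive in this transcript), and the functions simply count which replicas of services "a" and "b" are left when one node disappears.

```python
from collections import Counter

# Hypothetical placement (assumption, not taken from the slides):
# node name -> replicas of services "a" and "b" scheduled on that node.
PLACEMENT = {
    "node-1": ["a", "b"],
    "node-2": ["b", "a"],
    "node-3": ["a", "a"],
}


def surviving_replicas(placement, failed_node):
    """Count the replicas per service that remain after one node fails."""
    remaining = Counter()
    for node, replicas in placement.items():
        if node != failed_node:
            remaining.update(replicas)
    return remaining


def services_lost(placement, failed_node):
    """Return the services left with zero replicas after the failure."""
    all_services = {svc for replicas in placement.values() for svc in replicas}
    remaining = surviving_replicas(placement, failed_node)
    return sorted(svc for svc in all_services if remaining[svc] == 0)


if __name__ == "__main__":
    for node in PLACEMENT:
        print(node, dict(surviving_replicas(PLACEMENT, node)),
              services_lost(PLACEMENT, node))
```

If `services_lost` returns an empty list for every node, the answer to "Is this OK ?" depends only on whether the remaining replicas can carry the load, which is exactly what a real experiment would then measure.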
  • 38. FULLSTACK TECH RADAR DAY Turbulents
  • 39. FULLSTACK TECH RADAR DAY How to practice Chaos Engineering ? Prerequisites + Tools of Chaos Engineering
  • 40. FULLSTACK TECH RADAR DAY Practice ● You should have: ● GameDays ● ChaosDays ● Controlled & scheduled drills / experiments
  • 41. FULLSTACK TECH RADAR DAY Practice & Collaborate ● You should have: ● GameDays ● ChaosDays ● Controlled & scheduled drills / experiments
  • 42. FULLSTACK TECH RADAR DAY It’s slowly becoming a culture https://github.com/dastergon/awesome-chaos-engineering
  • 43. FULLSTACK TECH RADAR DAY Automation is key !
  • 44. FULLSTACK TECH RADAR DAY Monitoring (ROI) Observability DevOps
  • 45. FULLSTACK TECH RADAR DAY Not just graphs and logs (that too) ● RCAs - record them and make sure they can be found ! ● Document, Document, Document - there are great resources on how to do that. ● We don’t Chaos everything … ● Only what makes sense / repeats ● Game / Chaos Days -> keep experiment definitions for GameDay / ChaosDay to define
  • 46. FULLSTACK TECH RADAR DAY SLA … is innovation driven - how fast did you do without failing ? https://cloudplatformonline.com/rs/248-TPC-286/images/DORA-State%20of%20DevOps.pdf
  • 47. FULLSTACK TECH RADAR DAY SLA … is innovation driven - how fast did you do without failing ? https://cloudplatformonline.com/rs/248-TPC-286/images/DORA-State%20of%20DevOps.pdf
  • 48. FULLSTACK TECH RADAR DAY Experiment !
  • 49. FULLSTACK TECH RADAR DAY Application Caching Database Hardware Network What layer ? - All !
  • 50. FULLSTACK TECH RADAR DAY The ultimate chaos “Butterfly Effect” / “Domino Effect” ● How will my application do ● without cache ? ● without a certain API available ? ● with n sessions
  • 51. FULLSTACK TECH RADAR DAY The ultimate chaos “Butterfly Effect” / “Domino Effect” ● How will my application do ● without cache ? ● without a certain API available ? ● with n sessions
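The "without cache ?" question can be rehearsed in miniature before touching a real system. The sketch below is an illustration under assumptions, not the speaker's code: `FakeCache`, `read_through`, and `run_experiment` are hypothetical names, the cache is an in-memory stand-in, and the "fault" is just a flag that makes every cache call raise `ConnectionError`.

```python
class FakeCache:
    """In-memory stand-in for a cache that can be switched off (hypothetical)."""

    def __init__(self):
        self.available = True
        self.data = {}

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def read_through(key, cache, database):
    """Read via the cache, degrading to the database when the cache is down."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        # Cache outage: serve straight from the database instead of failing.
        return database[key]
    value = database[key]
    try:
        cache.set(key, value)
    except ConnectionError:
        pass  # A failed cache write must not fail the read.
    return value


def run_experiment():
    """Check the steady state, inject the fault, then check recovery."""
    database = {"user:1": "alice"}
    cache = FakeCache()
    steady = read_through("user:1", cache, database) == "alice"
    cache.available = False  # inject the fault: the cache disappears
    during = read_through("user:1", cache, database) == "alice"
    cache.available = True   # roll back the fault
    after = read_through("user:1", cache, database) == "alice"
    return {"steady_state": steady, "during_fault": during, "after_recovery": after}


if __name__ == "__main__":
    print(run_experiment())
```

The same three-step shape (verify steady state, inject, verify again) applies to the other questions on the slide, such as removing an API dependency or raising the number of sessions.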
  • 52. FULLSTACK TECH RADAR DAY Applying Chaos Engineering practices Log | Measure
 Monitor Break Things & Auto Recover
 Experiment Full Cycle - Chaos
 Immune Application Caching Database Hardware Network Security
  • 53. FULLSTACK TECH RADAR DAY Where is Chaos going ? "the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production."
  • 54. FULLSTACK TECH RADAR DAY Toolz
  • 55. FULLSTACK TECH RADAR DAY Failure as a service
  • 56. FULLSTACK TECH RADAR DAY Game-day resources https://www.gremlin.com/community/tutorials/planning-your-own-chaos-day/ Planning your GameDay ? Feel free to contact me directly, we’d be happy to help -> hagzag@tikalk.com
  • 57. FULLSTACK TECH RADAR DAY Hypothesis - steady state { "name": "all-our-microservices-should-be-healthy", "type": "probe", "tolerance": true, "provider": { "type": "python", "module": "chaosk8s.probes", "func": "microservice_available_and_healthy", "arguments": { "name": "myapp", "ns": "default" } } }
  • 58. FULLSTACK TECH RADAR DAY Experiment Terminate a pod ! ● What to do ● When to do it { "type": "action", "name": "terminate-db-pod", "provider": { "type": "python", "module": "chaosk8s.pod.actions", "func": "terminate_pods", "arguments": { "label_selector": "app=my-app", "name_pattern": "my-app-[0-9]$", "rand": true, "ns": "default" } }, "pauses": { "after": 5 } }
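The probe and the action from the two slides above are fragments; a Chaos Toolkit experiment file wraps them in a steady-state hypothesis and a method. The sketch below shows one way to assemble such a file. The probe and action values are copied from the slides, with the probe tolerance written as a boolean; the function name `build_experiment`, the titles, the description, and the output file name are assumptions made for illustration, and the top-level keys reflect the Chaos Toolkit experiment format as commonly documented, so validate the result with `chaos validate` before relying on it.

```python
import json

# Probe from the "Hypothesis - steady state" slide.
STEADY_STATE_PROBE = {
    "name": "all-our-microservices-should-be-healthy",
    "type": "probe",
    "tolerance": True,
    "provider": {
        "type": "python",
        "module": "chaosk8s.probes",
        "func": "microservice_available_and_healthy",
        "arguments": {"name": "myapp", "ns": "default"},
    },
}

# Action from the "Experiment: Terminate a pod" slide.
TERMINATE_POD_ACTION = {
    "type": "action",
    "name": "terminate-db-pod",
    "provider": {
        "type": "python",
        "module": "chaosk8s.pod.actions",
        "func": "terminate_pods",
        "arguments": {
            "label_selector": "app=my-app",
            "name_pattern": "my-app-[0-9]$",
            "rand": True,
            "ns": "default",
        },
    },
    "pauses": {"after": 5},
}


def build_experiment(title, description, probe, action):
    """Wrap one probe and one action into an experiment document (sketch)."""
    return {
        "version": "1.0.0",
        "title": title,
        "description": description,
        "steady-state-hypothesis": {
            "title": "Services are available and healthy",  # illustrative title
            "probes": [probe],
        },
        "method": [action],
        "rollbacks": [],
    }


if __name__ == "__main__":
    experiment = build_experiment(
        "Terminating a pod should not break the service",  # illustrative title
        "Kill one matching pod and re-check the steady state.",
        STEADY_STATE_PROBE,
        TERMINATE_POD_ACTION,
    )
    # Hypothetical file name; run it afterwards with: chaos run experiment.json
    with open("experiment.json", "w") as handle:
        json.dump(experiment, handle, indent=2)
```

Keeping the probe and the action as separate objects makes it easy to reuse the same steady-state check across several GameDay experiments.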
  • 59. FULLSTACK TECH RADAR DAY If you're just peeking / evaluating
  • 60. FULLSTACK TECH RADAR DAY Chaoskube ● chaoskube is a “chaos-monkey lite”: it takes down pods on a schedule to test your resilience (with some tweaks available via configuration) ● use --dry-run https://github.com/linki/chaoskube
  • 61. FULLSTACK TECH RADAR DAY kube-bench Find vulnerabilities, configuration flags, define your own policies.
  • 62. FULLSTACK TECH RADAR DAY kube-hunter (Security) 1. Remote scanning To specify remote machines for hunting, select option 1 or use the --remote option. Example: ./kube-hunter.py --remote some.node.com
 2. Internal scanning To specify internal scanning, you can use the --internal option (this will scan all of the machine's network interfaces). Example: ./kube-hunter.py --internal
 3. Network scanning To specify a specific CIDR to scan, use the --cidr option. Example: ./kube-hunter.py --cidr 192.168.0.0/24

  • 63. FULLSTACK TECH RADAR DAY Many many more …. ● Stay tuned for more stuff about Chaos Engineering ● https://www.tikalk.com/community
  • 64. Thank you for joining us Haggai Philip Zagury DevOps Group & Tech Lead @ Tikal