Jorge Salamero Sanz <jsalamero@serverdensity.com>
Atmosphere Conference Krakow May 2016
HumanOps - the impact of health on operations
Jorge Salamero
@bencerillo
@serverdensity
blog.serverdensity.com
www.CloudStatusApp.com
● Infrastructure automation
● Configuration automation
● Continuous testing
● Continuous deployment / delivery
● Monitoring
● Logs, error handling
● Feedback
● Human Ops
● Humans are part of any system
● Initial design, ongoing improvements
● Maintenance
● Upgrades
● Issues, Incident response
● System issues = error rates + SLA + ...
● Human issues = alerts out of hours + interruptions + ...
● System issues = Human issues
● Downtime = loss of users, reputation, revenue
● Downtime caused by unreliable systems
● Unhealthy teams reduce reliability
● Unhealthy teams = loss of users, reputation, revenue
● Slip
● Lapse
● Mistake
● Violation
● (Always, again, again)
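Treating human issues like system issues starts with counting them. The following is a minimal sketch of one such measure, the share of alerts that fire out of hours. It assumes a 09:00-18:00, Monday-to-Friday working day and a plain list of alert timestamps. Both assumptions are illustrative and do not come from the talk.

```python
from datetime import datetime
from typing import Dict, List

# Assumption: working hours are 09:00-18:00, Monday to Friday, in local time.
WORK_START_HOUR = 9
WORK_END_HOUR = 18


def is_out_of_hours(ts: datetime) -> bool:
    """Return True if an alert fired at a weekend or outside working hours."""
    if ts.weekday() >= 5:  # weekday(): 5 = Saturday, 6 = Sunday
        return True
    return not (WORK_START_HOUR <= ts.hour < WORK_END_HOUR)


def human_issue_summary(alert_times: List[datetime]) -> Dict[str, float]:
    """Count alerts and how many of them interrupted people out of hours."""
    total = len(alert_times)
    out_of_hours = sum(1 for ts in alert_times if is_out_of_hours(ts))
    share = out_of_hours / total if total else 0.0
    return {"total": total, "out_of_hours": out_of_hours, "out_of_hours_share": share}


alerts = [
    datetime(2016, 5, 9, 14, 30),  # Monday afternoon
    datetime(2016, 5, 10, 3, 15),  # Tuesday, 03:15
    datetime(2016, 5, 14, 11, 0),  # Saturday morning
]
print(human_issue_summary(alerts))
```

In this example, two of the three alerts count as out of hours. Tracked over time next to error rates and SLA figures, a number like this makes the human cost of alerting visible.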
What can we do?
● Prepare and practice
● Respond
● Postmortem
Real example
(small war story, won’t be long)
● Power failure to half of our servers
● Automated failover unavailable
(known failure condition)
● Manual DNS switch required
● Expected impact: 20 min
● Actual impact: 43min
Lessons learned?
● Unfamiliarity with the process
● Pressure of time sensitive event
(panic effect)
● Escalation introduces delays
Handling the Human factor
● First responder, acknowledge alert
● Load incident response checklist
● Log into #ops-war-room in Slack
● Log incident into JIRA
● Begin investigation
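The first-response steps above can be tracked as a simple checklist that records when each step was ticked off. This is a sketch under assumptions: the step wording comes from the slide, but the `ChecklistRun` structure is hypothetical. The checklist deliberately does not enforce order, because the person responding stays in charge.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Step wording taken from the slide; the tracking structure is an illustrative assumption.
FIRST_RESPONSE_STEPS = [
    "First responder, acknowledge alert",
    "Load incident response checklist",
    "Log into #ops-war-room in Slack",
    "Log incident into JIRA",
    "Begin investigation",
]


@dataclass
class ChecklistRun:
    steps: List[str]
    # step -> seconds elapsed since the run started when it was ticked off
    completed: Dict[str, float] = field(default_factory=dict)
    started_at: float = field(default_factory=time.monotonic)

    def complete(self, step: str) -> None:
        """Tick off a step; order is not enforced, the responder stays in charge."""
        if step not in self.steps:
            raise ValueError(f"unknown step: {step!r}")
        self.completed[step] = time.monotonic() - self.started_at

    def next_step(self) -> Optional[str]:
        """Return the first step not yet done, or None when the list is finished."""
        for step in self.steps:
            if step not in self.completed:
                return step
        return None


run = ChecklistRun(FIRST_RESPONSE_STEPS)
run.complete("First responder, acknowledge alert")
print(run.next_step())
```

The recorded offsets can be reviewed later in the postmortem alongside the incident timeline.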
1. Extended use of checklists
2. Don't follow them blindly; apply knowledge and experience
3. Independent system
4. Searchable
5. List of known issues and documented workarounds/fixes
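One way to make a list of known issues searchable is a keyword search over each entry. The sketch below is a minimal example under assumptions. The `KnownIssue` record and `search` function are hypothetical. The first entry paraphrases the war story from this talk, and the second is invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KnownIssue:
    title: str
    symptoms: str
    workaround: str


# The first entry paraphrases the war story in this talk; the second is hypothetical.
KNOWN_ISSUES = [
    KnownIssue(
        title="Automated failover unavailable",
        symptoms="power failure to half of the servers; failover does not trigger",
        workaround="Switch DNS manually to the healthy servers",
    ),
    KnownIssue(
        title="Log volume full (hypothetical)",
        symptoms="disk usage alert on log partition; application writes fail",
        workaround="Rotate and compress old logs, then restart the log shipper",
    ),
]


def search(issues: List[KnownIssue], query: str) -> List[KnownIssue]:
    """Case-insensitive search: every word in the query must appear somewhere in the entry."""
    terms = query.lower().split()

    def matches(issue: KnownIssue) -> bool:
        text = f"{issue.title} {issue.symptoms} {issue.workaround}".lower()
        return all(term in text for term in terms)

    return [issue for issue in issues if matches(issue)]


for hit in search(KNOWN_ISSUES, "failover dns"):
    print(f"{hit.title}: {hit.workaround}")
```

Substring matching keeps the sketch small. A real runbook system would more likely use the full-text search of whatever wiki or tool hosts it, ideally one that stays reachable when production is down.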
● The “limits of human memory and attention”
○ Complexity
○ Stress and fatigue
○ Ego
● Pilots, doctors and divers all use checklists, e.g. the divers' pre-dive buddy check mnemonic:
Bruce Willis Ruins All Films
(BCD, weights, releases, air, final)
● Realistic replica environment or a mock command line
● Record actions and timing
● Multiple failures
● Unexpected results
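A mock command line that records each action and its timing could look like the sketch below. It is a minimal example under assumptions. `DrillRecorder`, the canned `dig` handler and the address `203.0.113.10`, which is a documentation-range IP, are all hypothetical. The clock is injectable so that drills can be replayed and tested deterministically.

```python
import shlex
import time
from typing import Callable, Dict, List, Tuple

# A handler receives the command's arguments and returns canned output.
Handler = Callable[[List[str]], str]


class DrillRecorder:
    """Mock command line that logs each command, its output and its time offset."""

    def __init__(self, handlers: Dict[str, Handler], clock: Callable[[], float] = time.monotonic):
        self.handlers = handlers
        self.clock = clock
        self.start = clock()
        self.log: List[Tuple[float, str, str]] = []  # (seconds since start, command, output)

    def run(self, line: str) -> str:
        argv = shlex.split(line)
        if not argv:
            return ""
        handler = self.handlers.get(argv[0])
        output = handler(argv[1:]) if handler else f"{argv[0]}: command not found"
        self.log.append((self.clock() - self.start, line, output))
        return output


# Hypothetical scenario: the trainee checks which address a DNS name resolves to.
recorder = DrillRecorder({"dig": lambda args: "203.0.113.10"})
recorder.run("dig +short app.example.com")
recorder.run("reboot")
for offset, command, output in recorder.log:
    print(f"{offset:7.2f}s  {command!r} -> {output}")
```

Comparing the recorded offsets across repeated drills gives a rough indication of whether time to resolution is improving. Unknown commands are logged too, which can show where people reach for tools the scenario did not anticipate.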
Results
● Team and individual test of response
● Run real commands
● Training the people
● Training the procedures
● Training the tools
● Increase confidence
● Reduce panic
● Better coordination
● Trust relationships
● Improves time to resolution
● Review
● Suggestions for improvements
● Do it again
● Scenario evolves
● People forget
● On-call rotation design
● Alert prioritization
● Notification optimization
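On-call rotation design, alert prioritization and notification optimization can be combined into a simple routing policy. The sketch below assumes the following: a weekly round-robin rotation, three priority levels, working hours of 09:00-18:00 Monday to Friday, and the channel names `page`, `chat`, `queue_for_morning` and `log_only`. All of these are hypothetical choices for illustration, not the talk's policy.

```python
from datetime import date, datetime
from enum import Enum
from typing import List


class Priority(Enum):
    CRITICAL = 1  # customer-facing outage: page immediately, day or night
    WARNING = 2   # degradation: notify, but do not wake anyone up
    INFO = 3      # for dashboards and the next review only


def on_call_engineer(rotation: List[str], day: date, start: date) -> str:
    """Weekly round-robin: each engineer covers seven consecutive days from `start`."""
    weeks = (day - start).days // 7
    return rotation[weeks % len(rotation)]


def route_alert(priority: Priority, fired_at: datetime) -> str:
    """Decide how to notify; only critical alerts interrupt people out of hours."""
    working_hours = fired_at.weekday() < 5 and 9 <= fired_at.hour < 18
    if priority is Priority.CRITICAL:
        return "page"
    if priority is Priority.WARNING:
        return "chat" if working_hours else "queue_for_morning"
    return "log_only"


rotation = ["alice", "bob", "carol"]  # hypothetical team
print(on_call_engineer(rotation, date(2016, 5, 9), start=date(2016, 5, 2)))
print(route_alert(Priority.WARNING, datetime(2016, 5, 10, 3, 0)))
```

The design choice here is that only critical alerts may interrupt sleep. Everything else waits for working hours, which directly reduces the "alerts out of hours" component of human issues.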
Human Ops
1. Humans are part of the system
2. Humans impact systems
3. Humans impact business
4. Human issues count as system issues
meetup.com/humanops-london/
humanops.com
serverdensity.com/conferences
DEVOPSDAYS
Atmosphere 2016 - Jorge Salamero Sanz - HumanOps, the impact of human health of operations