
Code Yellow: Helping operations top-heavy teams the smart way


We will look at Code Yellow, the term we use for the process of "righting the ship," and discuss how to identify teams that are struggling. Through a look at three separate experiences, we will examine some of the root causes, the steps that were taken, and how the engineering organization as a whole supports the process.


  1. Helping operations top-heavy teams the smart way • Jeff Weiner, Chief Executive Officer • Michael Kehoe, Staff Site Reliability Engineer • Todd Palino, Sr Staff Site Reliability Engineer
  2. This Is The Only Slide You May Need a Picture Of • slideshare.net/ToddPalino • slideshare.net/MichaelKehoe3
  3. $ WHOAMI: Michael Kehoe • Staff Site Reliability Engineer @ LinkedIn • Production-SRE Team • Former Network Engineer at the University of Queensland
  4. $ WHOAMI: Todd Palino • Senior Staff SRE @ LinkedIn • Capacity Engineering Team • Co-Author of Kafka: The Definitive Guide • Late of VeriSign Infrastructure Engineering
  5. Code Yellow: When Operations Isn’t Perfect • https://devops.com/code-yellow-when-operations-isnt-perfect/
  6. This talk is not • How to quickly erase all your technical debt • How to change your engineering culture
  7. This talk is • How to identify team anti-patterns • How to work through high toil • How to create sustainable workloads
  8. Today’s agenda • 1 Background • 2 Scenario 1: Traffic-SRE • 3 Scenario 2: Kafka-SRE • 4 Building A Formula For Success • 5 Key Learnings • 6 Q&A
  9. Background
  10. Personal Experience in the past two years: Assistance Rendered • Traffic-SRE: Technical Debt/Resource Allocation • Voyager-SRE: Technical Debt • Capacity War-room • Espresso-SRE: Reliability • Kafka-SRE: Capacity and Alert Fatigue
  11. Scenario 1: Traffic-SRE
  12. Problem Statement (Traffic-SRE): Technical Debt • Written documentation needed improvement • Deployment infrastructure needed investment • Alert Fatigue
  13. Problem Statement: Resource Allocations • Backlog of work for clients • Staff shortage
  14. Scenario 2: Kafka
  15. Problem Statement: Capacity Planning • Multi-tenant Infrastructure • No resource controls • Unclear resource ownership • Ad-hoc capacity planning • Sudden 100% increase in traffic
  16. Problem Statement: Alert Fatigue • Multiple applications overutilized • No time for proactive work • Most alerts non-actionable
  17. Building a formula for success
  18. Code Yellow
  19. Building a formula for success • Problem Statement: Define the areas that need attacking • Communication & Partnerships: Communicate expectations with clients & partners • Exit Criteria: Define success criteria • Resource Acquisition: Get the help that you require • Planning: Plan for short-term & long-term
  20. Building a formula for success: Problem Statement (Define the areas that need attacking) • Admit there is a problem • Measure the problem • Understand the problem • Determine underlying causes that need to be fixed
  21. Building a formula for success: Exit Criteria (Define success criteria) • Define concrete goals • Define concrete success criteria • Measure via an operational metric • Measure via a project being completed • Define timelines for completion
  22. Building a formula for success: Resource Acquisition (Get the help you require) • Ask other teams for help • Get dedicated engineers/project managers/other roles as required • Set exit-date for resources
  23. Building a formula for success: Planning (Plan for the short-term & long-term) • Plan out short-term work • Plan out longer-term projects • Do they need to be rescheduled? • Prioritize work that will reduce toil & burnout (Automation + Measurement)
  24. Building a formula for success: Communication & Partnerships (Communicate expectations with clients & partners) • Communicate problem statement & exit criteria • Send regular progress updates • Ensure that stakeholders understand delays & expected outcomes
  25. Key Learnings
  26. Key Learnings • Measure: Measure toil/overhead • Prioritize: Prioritize efforts to remove overhead/toil • Communicate: Communicate with partners & teams
  27. Q&A
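
The slides recommend measuring the problem and defining exit criteria through an operational metric with a timeline. As a rough illustration only (not the speakers' tooling), the sketch below measures alert fatigue as the share of non-actionable alerts and checks it against an exit criterion. The `Alert` and `ExitCriterion` types, the `actionable` flag, the 25% target, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Alert:
    name: str
    actionable: bool  # hypothetical flag: did on-call actually need to act?


@dataclass
class ExitCriterion:
    description: str
    target: float   # the metric must be at or below this value
    deadline: date  # timeline for completion

    def is_met(self, value: float) -> bool:
        return value <= self.target


def non_actionable_ratio(alerts: list[Alert]) -> float:
    """Fraction of alerts that required no action (0.0 if there were none)."""
    if not alerts:
        return 0.0
    noisy = sum(1 for a in alerts if not a.actionable)
    return noisy / len(alerts)


# Made-up sample data for illustration.
alerts = [
    Alert("disk-full", True),
    Alert("cpu-spike", False),
    Alert("lag-high", False),
    Alert("broker-down", True),
]
criterion = ExitCriterion(
    "Non-actionable alerts are at most 25% of all alerts",
    target=0.25,
    deadline=date(2030, 1, 1),
)
ratio = non_actionable_ratio(alerts)
print(f"{ratio:.0%} non-actionable; exit criterion met: {criterion.is_met(ratio)}")
```

Tracking a single number like this per week would give a team something concrete to include in the regular progress updates the slides call for.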
