SlideShare uma empresa Scribd logo
1 de 54
Baixar para ler offline
Production Engineering (PE)
There Is No Spoon
Ran Leibman
Production Engineer
Agenda
1. How Production Engineering was formed in Facebook
2. What we do in Onavo
3. How we moved to the PE model in Onavo
4. Q & A
Facebook - Pre PE Days
SRO - Site Reliability Operations
1. Keep the site up 24/7
2. Follow the sun
3. Capacity plans
Why SRO was not enough ?
What are the alternatives ?
NOC
The Production Engineering Model
1. PEs are embedded within the software engineering teams
2. Taking part in meetings
3. Involved in roadmap plans
4. Reviewing diffs
5. Oncall - Software & Production Engineers
Onavo - Adopting The PE Model
» Protect user traffic using IPsec
» Protect against malicious sites
» Compress user traffic
» Control data leakage
Save, Measure & Protect your mobile data
a bit of context
Onavo
1. Founded at 2010
2. Classic Startup Dev & Ops teams
1. Dev - writes code
2. Ops - keeps the infra up & running
3. Acquired by Facebook at 2013
Making The Change - Step By Step
Step 1 - Go Sit Close/Next With The Developers
Step 2 - Get The Colleagues Onboard
Step 3 - Get Your Tooling Ready
you don’t want that “Confused Travolta” moment …
Have Good (short) Documentation
» Document your alerts
» Links to dashboards
» Links to third party software docs
» Runbooks - how to debug in prod
» log files, how to restart the service, getting stack traces & metrics …
» Links to config management
Dev Friendly Systems
avoid the graph porn …
Simple And Indicative Dashboards
1. Match the product KPIs
2. Strong signal
3. Intuitive titles
4. Easy to spot anomalies
5. Easy to find correlations
Step 4 - Review Your Alerts
rm -rf /all/false/alarms*
Refactor Your Alerts as Needed
» The first challenge is to make sure alerts are handled
» To make it possible every alert should be
» Indicate a real problem
» Clear to understand - Informative
» Impactful
» Actionable
Step 5 - Train The Team - Get Them Ready
learning is easy - remembering is hard
Train The Team
» Wiki / Doc based
» makes it easier to remember
» Hands-on Hands-on Hands-on
» Pre create task pool (even if low impact)
» Give oncall use cases & examples
» Reusable
Step 6 - Oncall + Hand Holding
make yourself available and adjust as you go
Shared Oncall
» Short oncall cycles, 1-2 days
» Increase the period each cycle
» Oncall Summaries
» Do oncall as well - set an example
» Preemptively check status with

the current oncall
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 5 - Train The Team
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 6 - Oncall + Hand Holding
√ Step 5 - Train The Team
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
Questions?
Ran Leibman
Production Engineer
There is No Spoon - Ran Leibman, Facebook - DevOpsDays Tel Aviv 2016

Mais conteúdo relacionado

Destaque

Production testing through monitoring
Production testing through monitoringProduction testing through monitoring
Production testing through monitoringLeon Fayer
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Adrian Cockcroft
 
Scaling Pinterest's Monitoring
Scaling Pinterest's MonitoringScaling Pinterest's Monitoring
Scaling Pinterest's MonitoringBrian Overstreet
 
Bruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of AppsBruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of Appsbrucelawson
 
Performance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-MechanizePerformance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-Mechanizecoreygoldberg
 
Continuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeContinuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeNicole Forsgren
 

Destaque (8)

Production testing through monitoring
Production testing through monitoringProduction testing through monitoring
Production testing through monitoring
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016
 
Scaling Pinterest's Monitoring
Scaling Pinterest's MonitoringScaling Pinterest's Monitoring
Scaling Pinterest's Monitoring
 
Bruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of AppsBruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of Apps
 
How to Speak "Manager"
How to Speak "Manager"How to Speak "Manager"
How to Speak "Manager"
 
Performance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-MechanizePerformance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-Mechanize
 
Continuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeContinuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps Awesome
 

Mais de DevOpsDays Tel Aviv

YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...DevOpsDays Tel Aviv
 
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoGRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoDevOpsDays Tel Aviv
 
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...DevOpsDays Tel Aviv
 
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...DevOpsDays Tel Aviv
 
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogPRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogDevOpsDays Tel Aviv
 
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...DevOpsDays Tel Aviv
 
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGGDevOpsDays Tel Aviv
 
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...DevOpsDays Tel Aviv
 
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityTHE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityDevOpsDays Tel Aviv
 
THE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELTHE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELDevOpsDays Tel Aviv
 
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackCONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackDevOpsDays Tel Aviv
 
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapSOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapDevOpsDays Tel Aviv
 
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...DevOpsDays Tel Aviv
 
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHHOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHDevOpsDays Tel Aviv
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBDevOpsDays Tel Aviv
 
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaFLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaDevOpsDays Tel Aviv
 
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITYDevOpsDays Tel Aviv
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioDevOpsDays Tel Aviv
 
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryDevOpsDays Tel Aviv
 
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDevOpsDays Tel Aviv
 

Mais de DevOpsDays Tel Aviv (20)

YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
 
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoGRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
 
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
 
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
 
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogPRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
 
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
 
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
 
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
 
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityTHE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
 
THE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELTHE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABEL
 
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackCONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
 
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapSOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
 
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
 
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHHOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
 
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaFLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
 
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
 
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
 
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
 

Último

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

There is No Spoon - Ran Leibman, Facebook - DevOpsDays Tel Aviv 2016

  • 1. Production Engineering (PE) There Is No Spoon Ran Leibman Production Engineer
  • 2. Agenda 1. How Production Engineering was formed in Facebook 2. What we do in Onavo 3. How we moved to the PE model in Onavo 4. Q & A
  • 3. Facebook - Pre PE Days
  • 4. SRO - Site Reliability Operations 1. Keep the site up 24/7 2. Follow the sun 3. Capacity plans
  • 5. Why SRO was not enough ?
  • 6.
  • 7.
  • 8.
  • 9. What are the alternatives ?
  • 10. NOC
  • 11.
  • 12. The Production Engineering Model 1. PEs are embedded within the software engineering teams 2. Taking part in meetings 3. Involved in roadmap plans 4. Reviewing diffs 5. Oncall - Software & Production Engineers
  • 13. Onavo - Adopting The PE Model
  • 14. » Protect user traffic using IPsec » Protect against malicious sites » Compress user traffic » Control data leakage Save, Measure & Protect your mobile data
  • 15. a bit of context Onavo 1. Founded at 2010 2. Classic Startup Dev & Ops teams 1. Dev - writes code 2. Ops - keeps the infra up & running 3. Acquired by Facebook at 2013
  • 16.
  • 17. Making The Change - Step By Step
  • 18. Step 1 - Go Sit Close/Next With The Developers
  • 19. Step 2 - Get The Colleagues Onboard
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Step 3 - Get Your Tooling Ready
  • 31.
  • 32. you don’t want that “Confused Travolta” moment … Have Good (short) Documentation » Document your alerts » Links to dashboards » Links to third party software docs » Runbooks - how to debug in prod » log files, how to restart the service, getting stack traces & metrics … » Links to config management
  • 34. avoid the graph porn … Simple And Indicative Dashboards 1. Match the product KPIs 2. Strong signal 3. Intuitive titles 4. Easy to spot anomalies 5. Easy to find correlations
  • 35. Step 4 - Review Your Alerts
  • 36. rm -rf /all/false/alarms* Refactor Your Alerts as Needed » The first challenge is to make sure alerts are handled » To make it possible every alert should be » Indicate a real problem » Clear to understand - Informative » Impactful » Actionable
  • 37. Step 5 - Train The Team - Get Them Ready
  • 38. learning is easy - remembering is hard Train The Team » Wiki / Doc based » makes it easier to remember » Hands-on Hands-on Hands-on » Pre create task pool (even if low impact) » Give oncall use cases & examples » Reusable
  • 39. Step 6 - Oncall + Hand Holding
  • 40. make yourself available and adjust as you go Shared Oncall » Short oncall cycles, 1-2 days » Increase the period each cycle » Oncall Summaries » Do oncall as well - set an example » Preemptively check status with
 the current oncall
  • 41. √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 42. √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 43. √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 44. √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 45. √ Step 5 - Train The Team √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 46. √ Step 6 - Oncall + Hand Holding √ Step 5 - Train The Team √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.