Learning and adopting new technologies in your infrastructure can have a major impact, but sometimes we miss the big picture and forget to involve our good friends - the software engineers. Come hear how you can solve this problem, like we did at Facebook: how production engineers and software engineers work together, from roadmap planning all the way to a shared oncall.
12. The Production Engineering Model
1. PEs are embedded within the software engineering teams
2. Take part in meetings
3. Are involved in roadmap plans
4. Review diffs
5. Oncall - Software & Production Engineers
14. » Protect user traffic using IPsec
» Protect against malicious sites
» Compress user traffic
» Control data leakage
Save, Measure & Protect your mobile data
15. a bit of context
Onavo
1. Founded in 2010
2. Classic Startup Dev & Ops teams
1. Dev - writes code
2. Ops - keeps the infra up & running
3. Acquired by Facebook in 2013
32. you don’t want that “Confused Travolta” moment …
Have Good (short) Documentation
» Document your alerts
» Links to dashboards
» Links to third party software docs
» Runbooks - how to debug in prod
» log files, how to restart the service, getting stack traces & metrics …
» Links to config management
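One way to keep alert documentation short but complete is to store the links next to the alert definition and check them automatically. The following is a minimal sketch, not the tooling used in the talk: the `AlertDoc` type, its field names, the choice of required fields, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertDoc:
    """Documentation attached to one alert (all field names are hypothetical)."""
    name: str
    description: str = ""        # what the alert means
    dashboard_url: str = ""      # link to the relevant dashboard
    runbook_url: str = ""        # how to debug in prod
    config_url: str = ""         # link to config management
    third_party_docs: List[str] = field(default_factory=list)


# Assumption: these three fields are treated as mandatory for every alert.
REQUIRED_FIELDS = ("description", "dashboard_url", "runbook_url")


def missing_docs(alert: AlertDoc) -> List[str]:
    """Return the required documentation fields that are still empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(alert, name).strip()]


if __name__ == "__main__":
    alert = AlertDoc(
        name="proxy_error_rate_high",
        description="Share of failed proxy requests is above the threshold",
        dashboard_url="https://dashboards.example.com/proxy",  # placeholder URL
    )
    print(missing_docs(alert))
```

A check like this could run in code review, so an alert without a runbook link never reaches the oncall.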
34. avoid the graph porn …
Simple And Indicative Dashboards
1. Match the product KPIs
2. Strong signal
3. Intuitive titles
4. Easy to spot anomalies
5. Easy to find correlations
36. rm -rf /all/false/alarms*
Refactor Your Alerts as Needed
» The first challenge is to make sure alerts are handled
» To make that possible, every alert should
» Indicate a real problem
» Be clear to understand - informative
» Be impactful
» Be actionable
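To decide which alerts to refactor first, it can help to measure how often each alert actually led to an action. The sketch below assumes a hypothetical firing history of `(alert_name, was_actionable)` pairs, for example taken from oncall summaries; the function names and the 0.5 threshold are illustrative assumptions, not something from the talk.

```python
from collections import defaultdict


def actionable_ratio(firings):
    """Map each alert name to the fraction of its firings that led to a real action.

    `firings` is an iterable of (alert_name, was_actionable) pairs.
    """
    totals = defaultdict(int)
    actionable = defaultdict(int)
    for name, was_actionable in firings:
        totals[name] += 1
        if was_actionable:
            actionable[name] += 1
    return {name: actionable[name] / totals[name] for name in totals}


def refactor_candidates(firings, min_ratio=0.5):
    """Return (alert_name, ratio) pairs below `min_ratio`, noisiest first."""
    ratios = actionable_ratio(firings)
    noisy = [(name, ratio) for name, ratio in ratios.items() if ratio < min_ratio]
    return sorted(noisy, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical history: alert names and outcomes are made up.
    history = [
        ("disk_full", True), ("disk_full", True),
        ("cpu_spike", False), ("cpu_spike", False), ("cpu_spike", True),
        ("flaky_ping", False),
    ]
    for name, ratio in refactor_candidates(history):
        print(f"{name}: {ratio:.0%} of firings were actionable")
```

Alerts at the top of this list are the false alarms to rewrite, retune, or delete.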
38. learning is easy - remembering is hard
Train The Team
» Wiki / Doc based
» makes it easier to remember
» Hands-on Hands-on Hands-on
» Pre-create a task pool (even if low impact)
» Give oncall use cases & examples
» Reusable
40. make yourself available and adjust as you go
Shared Oncall
» Short oncall cycles, 1-2 days
» Increase the period each cycle
» Oncall Summaries
» Do oncall as well - set an example
» Preemptively check status with the current oncall
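The "start short, then increase the period each cycle" idea can be sketched as a small schedule generator. This is an illustration under stated assumptions: the function name, the one-day growth step, the seven-day cap, and the engineer names are hypothetical and not taken from the talk.

```python
from datetime import date, timedelta


def ramp_up_schedule(engineers, start, rounds,
                     first_shift_days=1, growth_days=1, max_shift_days=7):
    """Build a list of (engineer, shift_start, shift_end) tuples.

    Every engineer takes one shift per round. The shift length starts at
    `first_shift_days` and grows by `growth_days` each round, capped at
    `max_shift_days`. `shift_end` is exclusive.
    """
    shifts = []
    current = start
    for round_index in range(rounds):
        length = min(first_shift_days + round_index * growth_days, max_shift_days)
        for engineer in engineers:
            end = current + timedelta(days=length)
            shifts.append((engineer, current, end))
            current = end
    return shifts


if __name__ == "__main__":
    # Hypothetical mixed rotation of a production engineer and a software engineer.
    for engineer, begin, end in ramp_up_schedule(
            ["pe_alice", "swe_bob"], date(2024, 1, 1), rounds=3):
        print(f"{engineer}: {begin} -> {end} ({(end - begin).days} days)")
```

Including production engineers in the same rotation matches the "do oncall as well" point: everyone follows the same ramp.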
46. The Steps
√ Step 1 - Go Sit Next To The Developers
√ Step 2 - Get The Colleagues Onboard
√ Step 3 - Get Your Tooling Ready
√ Step 4 - Review Your Alerts
√ Step 5 - Train The Team
√ Step 6 - Oncall + Hand Holding