2. 2
• January 2000: Graduated, worked as Software Engineer
for 11 years, across 3 companies
• January 2011: Switched to being a manager
• January 2012: Started a new team from scratch in EC2,
over the next 4.5 years, grew team from 1 to 52
• November 2016: Left EC2, burned out. Joined Two
Sigma. Developed this material.
• March 2019: Joined Datadog, VP of Metrics and Alerts
Who am I?
3. 3
● Seven areas of management: People, Product, Execution, Partners,
Operations, Engineering and the Company
● Getting in front of all seven so that there are no negative surprises is impossible
● These negative surprises are “misses”. They happen.
● Growing as a manager is owning misses, and thinking broadly about
mechanisms to avoid or mitigate them earlier
● All while delegating more responsibility to your team in a mechanistic
manner, to help them through the recursive process
Summary
4. 4
What is a miss?
A miss is anytime the organization or anyone in it
is negatively impacted as a result of your team’s
action or inaction
5. 5
What are the 7 Areas of Management?
The seven areas that need your time and focus:
Engineering: How are things being built?
People: Are people happy and growing in what is being built?
Execution: How are things getting built?
Product: Are customers satisfied by what is being built?
Operations: Is the built thing going to keep running?
Partners: Do all my partners understand and agree with all the above?
Company: Does the company align with all these answers?
7. 7
● Is your team following industry best practices on:
○ Code Quality?
○ Unit and integration testing?
○ Specification and design?
○ Getting consensus on specification and design?
○ The amount of tech debt being accumulated or paid down?
● If not, how are you spending time and focus to change the path?
Engineering: Broad Strategic Questions
8. 8
● January 2012:
○ Start new team (EC2 Nitro)
○ 7-person team reporting to me. Lead reported to my manager.
○ Lead refused to unit test his code; thought it was a bad practice
● November 2013:
○ Release V1 on time
○ Over 50,000 lines of C
○ Lead engineer wrote 80% - 40,000 lines
○ With no unit tests, and good (but shallow) integration tests
Engineering: Anatomy of a Miss
9. 9
● March 2014:
○ Lead engineer quits to found a startup
○ All my focus was on shipping V2
○ I give the lead’s old code to strong junior engineer
● July 2014:
○ Two-character “/8” bug costs two development months to resolve
Engineering: Anatomy of a Miss
10. 10
I didn’t take the time to ask:
Now that the lead is leaving, how do I accommodate the tech
debt we have accumulated?
Engineering: My Miss
11. 11
The only things you control are your time and your
focus.
You need to always ask yourself: are you using them in
the best possible way right now?
Engineering: Broad Lesson
13. 13
● Do your people have purpose: do they understand where their work fits
in the company mission?
● Do you have the right people to achieve your part of that mission?
● Do you understand and accommodate what motivates them?
● Do you understand and accommodate their growth?
● Have you created an environment of safety where they can be
honest, have their own misses, and grow?
People: Broad Strategic Questions
14. 14
● March 2014: Two months after lead engineer left:
○ My manager also left
○ I took over existing team, who owned software with no clear future
○ I met with the team’s manager every week; he thought the team was happy
○ I focussed on executing V2 with my original team
● August 2014: Finally had skip 1:1s with the team I took over
○ And realized half were on the verge of quitting
○ They saw no future for their team, and so none for themselves
People: Anatomy of a Miss
15. 15
The team’s manager missed by being too tactical in his 1:1s
But:
○ I missed in not asking deeper questions in my 1:1s
○ I also missed in not having skip 1:1s sooner
○ We both missed in putting too much time and focus
on Execution rather than People
People: My Misses
16. 16
When your head is down you are missing what’s up
i.e., When you focus on only one area you miss
information on the other six
People: Broad Lessons
18. 18
● What deadlines does your team have?
○ How real are they?
○ Are you on top of executing to hit them in the face of all risks?
○ What buffer or options to shuffle priorities do you have?
○ Do your partners, management and customers understand all this?
● Do you know all your external dependencies?
○ Are they on track?
○ Do they believe your deadlines for them are real?
Execution: Broad Strategic Questions
19. 19
● November 2013: V1 miraculously shipped on time
● November 2014: deadline for V2
● August 2014, year-to-date recap:
○ Manager and lead engineer gone
○ Managing additional team, which I had to convince had a future
○ V1 in production with bug that cost dev-months
Execution: Anatomy of a Miss
20. 20
● 1st November 2014: (2 weeks out from release date)
○ Reset for December 15th
● 1st December 2014: (2 weeks out from new release date)
○ Reset for January 9th
● 9th January 2015: Launched V2
○ With nasty data corruption bug
○ Discovered quickly, but two months to fully mitigate
Execution: Anatomy of a Miss
21. 21
When the first slip happened, I did not seriously
re-evaluate the ship date and risks
The result was a 6-month death march
Execution: My Miss
22. 22
Because they are actionable information, misses are
opportunities
To take advantage of the opportunity you need to own
the miss, and reset your strategy in light of it
Execution: Broad Lesson
24. 24
● Why does your team exist - what is your vision?
○ “A collection of somewhat related systems” is not a vision
○ Does your team own the right systems to execute that vision?
● What is your strategy to deliver your vision?
○ i.e., which systems are you investing in and why?
● What is your execution plan (i.e. roadmap) for that strategy?
● Do all these people agree with the above:
○ Customers, Partners, Team members, Your Management?
○ Why are you sure?
Product: Broad Strategic Questions
25. 25
● January 2015: Technology established, new 2015 initiatives come in
needing major work from my team:
○ Hypervisor and bare metal functionality (e.g., c5 nitro)
○ Network load balancing (e.g., ALB)
○ Multiple types of storage (e.g., EFS)
○ Low latency NICs (e.g., ENA/EFA)
● All on top of my organization’s (VPC) full roadmap
● I met with each team 1:1 to agree on a compromise of partial commitments
● But each new team set goals assuming 100% commitment
● Causing a lot of political infighting, costing me a lot of time and focus
Product: Anatomy of a Miss
26. 26
I didn’t proactively own the roadmap narrative for my
team
That led partner teams to make mistakes in the timelines of
their strategies
Product: My Miss
27. 27
A miss is anytime the organization or anyone in it
is negatively impacted as a result of your team’s
action or inaction
● i.e., It includes misses of communication
● It includes when the other party should have communicated with you
● It includes when you did communicate, but not in a way that got the
other party to commit to being accountable
Product: Broadening What I Think of as a Miss
28. 28
You need to own the narrative
● You need to have a strategy
● You need to communicate that strategy
● You need to be seen to deliver on your strategies
Product: Broad Lesson
30. 30
● Am I having sideways 1:1s often enough with all managers whose
teams are impacted by, or impact my team?
● Do I understand what their goals and challenges are?
● Am I always pushing back on behalf of my team’s happiness and goals,
and never considering the other team’s happiness and goals?
● Are my team doing the same, without me knowing?
Partners: Broad Strategic Questions
31. 31
● December:
○ Take over software engineering team
■ Owns their own networking switches
■ Different vendor to the rest of the network
○ My team strongly pushed that Networking team should take them
○ Switches had run quietly for 2 years and were due to be retired in 14 months
○ Networking team refused to take ownership
● 11 months later:
○ Switches started having mass operational issues
○ I had to ask the Networking team for help, and they did
○ Their engineer engaged for more than a month
Partners: Anatomy of a Mitigated Miss
32. 32
Partners: How I avoided a bigger miss
● Throughout the year, had 1:1s with Networking management
● Also gave three months of my developer time to work on a project that
fit my developer’s interest and their need
33. 33
Sometimes the only way to avoid a miss is a partner
making a sacrifice. It’s better when this happens because they
want to help you, rather than because you had to escalate
That comes from building relationships and
understanding ahead of time
Partners: Broad Lesson
35. 35
● Are my team on top of:
○ Monitoring Production?
○ Capacity Planning?
○ Change Management?
○ Operational Customer Communication?
● Why am I sure?
● What mechanisms do I need to stay sure?
Operations: Broad Strategic Questions
36. 36
Operations: Anatomy of a Small Miss
● Beginning of Year:
○ Datacenter team moves to a quarterly ordering model for servers
○ This means ordering a server can take 5 months
○ I communicated this to my managers in a staff meeting
● August:
○ Two of my managers say they need capacity in 3 months
37. 37
Operations: Avoiding a Bigger Miss
False Miss: I did not communicate in a way that drove
ongoing focus.
Real Miss: I had no mechanism to ensure managers were
staying on top of this continuous need
38. 38
Leveraging time and focus is building mechanisms
around delegation
Operations: Broad Lesson
39. 39
What is a Mechanism?
● 4 basic things:
○ Identification of stakeholders affected
○ A goal for which success/failure can be judged on an ongoing basis
○ An owner
○ A periodic or edge-triggered check-in mechanism
communicated to all stakeholders
● Example: TPM owns a project with a deadline, defines
milestones they own, and hosts a status update meeting for all
stakeholders after each
● Example: Eng Manager owns an Availability SLA for their
service, and each month owns reporting misses to stakeholders
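The four elements above can be written down as a small record. The following is a minimal sketch, not a tool from the talk: the class name `Mechanism`, its field names, and the 30-day interval and dates in the usage note are all hypothetical, chosen only to illustrate the periodic case (an edge-triggered check-in would replace the interval with an event).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Mechanism:
    """One delegation mechanism, modeled on the four elements listed above.

    All names here are illustrative assumptions, not part of the talk.
    """
    stakeholders: List[str]   # everyone affected by the outcome
    goal: str                 # something that can be judged met/missed at each check-in
    owner: str                # the single accountable person
    check_in_every_days: int  # periodic trigger (hypothetical field)
    last_check_in: date       # when stakeholders last heard from the owner

    def next_check_in(self) -> date:
        """Date by which the owner should next report to stakeholders."""
        return self.last_check_in + timedelta(days=self.check_in_every_days)

    def is_overdue(self, today: date) -> bool:
        """True if the scheduled check-in date has already passed."""
        return today > self.next_check_in()


# Hypothetical instance based on the Availability SLA example.
sla = Mechanism(
    stakeholders=["customers", "partner teams", "management"],
    goal="Service meets its availability SLA",
    owner="Eng Manager",
    check_in_every_days=30,
    last_check_in=date(2019, 1, 1),
)
```

With this shape, `sla.is_overdue(today)` gives the delegating manager a cheap way to notice that a check-in has lapsed without re-owning the work itself.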
41. 41
● Is there something about the way the organization does people,
product, process, partners, engineering, operations that is not actually
right for the company?
● Is this going to be a major problem?
● Then what do I need to influence the company to change?
Company: Broad Strategic Questions
42. 42
● AWS was losing Systems Engineers to competitors because of
compensation
● A peer of mine decided to take ownership
● After taking it up with HR, he learns that Amazon lumps engineers doing
automation at massive scale with operational engineers
● Peer works with HR to create new job family; works with his leadership
to get it through CEO approval
● Amazon creates new job family (Systems Development Engineer) with
compensation that aligns with competitors
Company: Peer Addresses a Miss
44. 44
● Seven areas of management: People, Product, Execution, Partners,
Operations, Engineering and the Company
● Getting in front of all seven so that there are no negative surprises is impossible
● These negative surprises are “misses”. They happen.
● Growing as a manager is owning misses, and thinking broadly about
mechanisms to avoid or mitigate them earlier
● All while delegating more responsibility to your team in a mechanistic
manner, to help them through the recursive process
Summary