Fifteen years ago, we'd barely started to use S3, and ten years ago DevOps was the new thing. Today, we can add a new tool, technology, or trick every week, and more and more work is shifted into the application developer's workflow. If security, resiliency, and incident response become part of product teams, where will we be ten years from now, and what should we do today to get ready?
7. iPads were new
Google+ just launched
Qwikster
Nokia
Angular was new
No NPM
AWS Console was Manageable
SysAdmin vs. DevOps was a thing
9. 1. Business requires (fast) change
2. Change causes outages
3. Lowering the risk of change through tools and culture
John Allspaw (Flickr/Yahoo!) and Paul Hammond (Flickr)
"10+ Deploys Per Day: Dev and Ops Cooperation at Flickr"
Dev and Ops
11. Everyone has a computer in their pocket
Metaverse, web3, crypto, NFTs
Jamstack Apps
1.3 Million Packages in NPM
150 services in AWS Console
Deploying 26,280x more often
17. “Their evidence refutes the
bimodal IT notion that you
have to choose between
speed and stability—instead,
speed depends on stability,
so good IT practices give you
both.”
28. Pager Goes Off
“In the future systems will be much smarter about escalating to the best possible people considering a bunch of factors like time zone, area of expertise, and recency of contact with the system being reported on (the last N committers to a project, or the last N to update some config)”
- Paul Nakata
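A minimal sketch of what that kind of escalation could look like, scoring each person on the three factors the quote names. Everything here is hypothetical: the `Engineer` type, the 09:00-18:00 working window, and the equal weights are assumptions for illustration, not how any existing paging tool works.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    utc_offset_hours: int   # time zone, as a whole-hour offset from UTC
    expertise: set[str]     # areas this person knows well


def score(engineer: Engineer, alert_area: str,
          recent_committers: list[str], utc_hour: int) -> float:
    """Higher is better. Each factor contributes at most 1.0 (assumed weights)."""
    total = 0.0
    # Time zone: prefer people who are inside an assumed 09:00-18:00 local window.
    local_hour = (utc_hour + engineer.utc_offset_hours) % 24
    if 9 <= local_hour < 18:
        total += 1.0
    # Area of expertise.
    if alert_area in engineer.expertise:
        total += 1.0
    # Recency of contact: recent_committers is ordered most recent first.
    if engineer.name in recent_committers:
        position = recent_committers.index(engineer.name)
        total += 1.0 - position / len(recent_committers)
    return total


def pick_responder(engineers: list[Engineer], alert_area: str,
                   recent_committers: list[str], utc_hour: int) -> Engineer:
    """Return the engineer with the highest score for this alert."""
    return max(engineers,
               key=lambda e: score(e, alert_area, recent_committers, utc_hour))


team = [
    Engineer("ana", 0, {"db"}),
    Engineer("bo", 9, {"payments"}),
    Engineer("cy", -8, {"payments"}),
]
responder = pick_responder(team, "payments", ["cy", "bo"], utc_hour=2)
print(responder.name)
```

In this example "cy" committed most recently, but it is outside the assumed working window for them, so the page goes to "bo" instead.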
36. 1. Take into account learning style
2. Not too hard
3. Not too easy
4. Progressive disclosure
5. Docs & error messages to enable
users to solve their own issues
6. Customization is expert mode
51. Today’s Big Four Metrics (Accelerate)
Speed:
1. Deployment Frequency (the frequency at which new releases go to production)
2. Lead Time For Changes (the time until a commit goes to production)
Risk:
1. Change Failure Rate (the percentage of deployments to production that lead to errors)
2. Mean Time to Restore (the time it takes to resolve a service impairment in
production)
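As a rough illustration, the four metrics above can be computed from a log of deployments and incidents. This is a sketch under stated assumptions: the `Deployment` and `Incident` record shapes and the `four_metrics` helper are hypothetical, and both lists are assumed to be non-empty.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # whether it led to an error in production


@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime


def four_metrics(deployments: list[Deployment], incidents: list[Incident],
                 window_days: int) -> dict:
    """Compute the four metrics over a reporting window (inputs must be non-empty)."""
    lead_seconds = mean((d.deployed_at - d.committed_at).total_seconds()
                        for d in deployments)
    restore_seconds = mean((i.restored_at - i.started_at).total_seconds()
                           for i in incidents)
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "lead_time_for_changes": timedelta(seconds=lead_seconds),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
        "mean_time_to_restore": timedelta(seconds=restore_seconds),
    }


t0 = datetime(2024, 1, 1)
deploys = [
    Deployment(t0, t0 + timedelta(hours=1), failed=False),
    Deployment(t0, t0 + timedelta(hours=1), failed=False),
    Deployment(t0, t0 + timedelta(hours=2), failed=True),
    Deployment(t0, t0 + timedelta(hours=4), failed=False),
]
incidents = [
    Incident(t0, t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=90)),
]
metrics = four_metrics(deploys, incidents, window_days=2)
print(metrics)
```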
59. If we get:
- Full organizational buy-in for the importance of ops
- Tools designed for flow and developer experience
- Real time feedback when something goes wrong
- Incidents managed as well as deploys
61. Smoother cross team collaboration and more infra resources
Work is more fun
Speed goes towards ∞
Risk goes towards 0
Self-provisioning runtimes
Everyone writes software
We’re going to need a bigger room for this conference
Ops (like the system part)
Developers (like the code part)
Started a company
Programming language, batteries included
Still exists
For every tool I mention in here, I’ll link you at the end
Boldstart.vc
Some companies in the room
Share lessons from lots of people!
Anchors us squarely in the middle of this journey
Remind you or tell you because I know lots of us in the room may not have been around for this
Web Accessibility Companions
Office for Mobile
Self service APIs
Automation
Cloud provisioning
Matters more than ever as we have more businesses that rely heavily on software
Who does it matter to?
Internal customer, not a business/product center
Engineers are looking; HashiCorp or Snyk examples - DevOps 2.0
Developers will physically take over in our inner loop and is being distributed
Everything is the product - Beth Long
Doing things that matter to the business
26280x better again would be 7.3 deploys/second
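The arithmetic behind that number, as a quick check. The baseline is an assumption: 26,280 happens to be the number of hours in three years, so the sketch reads the first jump as going from one deploy every three years to one deploy per hour.

```python
# Assumed reading: the first 26,280x jump is "once every three years" -> "once an hour".
hours_in_three_years = 3 * 365 * 24
speedup = hours_in_three_years

# Applying the same factor again, starting from one deploy per hour.
deploys_per_hour = 1 * speedup
deploys_per_second = deploys_per_hour / 3600

print(speedup)                       # 26280
print(round(deploys_per_second, 1))  # 7.3
```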
Balsa did a survey recently saying basically this
Productivity with faster, better, more secure operations
Implicit benefit to productivity
Builds on observability - builds on monitoring/observability before it happens
ServiceNow and Lightstep
Replay
Snaplet
Dark, Lambdragon, Natto
Kids - seem familiar, Logo Turtle, Mindstorms, etc.
MIT Mathematician - learning from feedback - Norbert Wiener
https://en.wikipedia.org/wiki/Norbert_Wiener
Fullstory + Chrome DevTools
Individuals write code faster - debug faster, etc.
Individuals check their work in atomic steps - Zuplo example with people finding other routes
Deployments babysat, now happens automatically
Manual process vs. automated process
Document
https://newsletter.pragmaticengineer.com/p/incident-review-best-practices
I don’t know about you, but I haven’t seen many places with automatic resolutions
If it has a manual playbook someone could read, that could definitely be automated
Personal Ownership
Certain types of problems always go to some group of people; right now most systems rely on an explicit definition of who is on call, but in the future systems will be much smarter about escalating to the best possible people considering a bunch of factors like time zone, area of expertise, and recency of contact with the system being reported on (the last N committers to a project, or the last N to update some config)
Automated recovery policies
Playbooks
Monitoring: knowns -- automated -- PagerDuty, Runbook
Observability: unknowns -- get the right person who knows the most
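One way to picture the knowns/unknowns split: alerts with a known playbook run it automatically, and everything else goes to a person. This is a sketch only; the alert shape, the `RUNBOOKS` table, and the `page` callback are hypothetical and do not correspond to any particular product's API.

```python
from typing import Callable


def restart_service(alert: dict) -> str:
    # Stand-in for a real remediation step taken from a written playbook.
    return f"restarted {alert['service']}"


def clear_tmp(alert: dict) -> str:
    return f"cleared temp files on {alert['service']}"


# Known failure modes mapped to their automated playbooks (assumed examples).
RUNBOOKS: dict[str, Callable[[dict], str]] = {
    "process_down": restart_service,
    "disk_full": clear_tmp,
}


def handle_alert(alert: dict, page: Callable[[dict], str]) -> tuple[str, str]:
    """Run the playbook for a known alert kind; otherwise page a human."""
    runbook = RUNBOOKS.get(alert["kind"])
    if runbook is not None:
        return ("automated", runbook(alert))
    return ("paged", page(alert))


def page_human(alert: dict) -> str:
    return f"paged on-call for {alert['service']}"


known = handle_alert({"kind": "process_down", "service": "api"}, page_human)
unknown = handle_alert({"kind": "latency_spike", "service": "api"}, page_human)
print(known)
print(unknown)
```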
After a pager needs to go, pages will change
Doesn’t have to be in Slack necessarily, but can be - Allma, Blameless
People automatically get directed to the right place
Pulls in related information
Easy “status” section
FireHydrant, Blameless
Less bad when something goes wrong
Continue to learn for next time to be able to go faster
Framework for what is good
We know tools work better when people adopt them; what makes them adopt?
Solve a real problem, have a good experience
This is for those of you who are building tools for devs or choosing them
Right information at the right time, not a good DevEx
muh·hay·lee
Chik·sent·mee·hai·ee
Not too hard - what people have, be it chat or OpenAPI
Feels like github
Works on top of the API you have
Has an OSS version and a hosted version
If you’ve gotten here, you’re winning!
Goes to infinity - once every three years to once a minute we’ve already improved this by 12,000,000x
Goes to 0
Goes to 0
Doesn’t matter if the previous has gone to 0
Spoiler alert: MBA people don’t care about your normal metrics
Time to ticket resolution (lead time for changes)
Ratio of successful to poor experiences (change failure rate)
Hiring Velocity
Dev NPS
Dev Testing