9. Just kidding, there are wrong answers
• Why obsessing over testing is an anti-pattern
• How observability (o11y) can empower your
organization
• Observability Case Study: Single Music
14. Studies have analyzed the effects of TDD
http://softwareprocess.es/pubs/borle2017EMSE-TDD.pdf
http://www.sserg.org/publications/uploads/04b700e040d0cac8681ba3d039be87a56020dd41.pdf
15. • You’re abstracting /
coupling / adding more
logic / code
• It slows down velocity
• It reduces productivity
Unit testing is like putting your code in a tar pit
16. • If the temperature reaches critical, we should
insert these rods into the Nuclear Reactor
• If the sensors on this aircraft don’t match, our
software shouldn’t crash the plane
• Open Source libraries which other
organizations rely upon
What should we be testing?
Critical Systems Code / Pathways
17. • Is the user able to log in / log out?
• Can the user heart their friends' avocado toast?
• Fence-Post errors, CRUD
actions, etc.
What shouldn’t we be testing?
Instead, use observability to ensure functionality
18. • How often you break production and how long
it takes you to fix it (MTTR)
• The responsiveness of your system and its
endpoints
• How long it takes to put a change request into
production
Test coverage is a vanity metric
Instead consider tracking these metrics:
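As a rough illustration of the first and third metrics, here is a minimal sketch that averages time-to-recovery and change lead time from timestamp pairs. The function names and the sample data are hypothetical, not taken from the talk; real numbers would come from your incident tracker and deploy pipeline.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(intervals):
    """Average length of (start, end) datetime pairs; zero if there are none."""
    if not intervals:
        return timedelta(0)
    return timedelta(
        seconds=mean((end - start).total_seconds() for start, end in intervals)
    )


def mean_time_to_recovery(incidents):
    """incidents: (detected_at, resolved_at) pairs for production breakages."""
    return mean_duration(incidents)


def mean_lead_time(changes):
    """changes: (requested_at, deployed_at) pairs for change requests."""
    return mean_duration(changes)


# Hypothetical sample data, for illustration only.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 2, 15, 30)),
]
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 0)),
    (datetime(2024, 5, 2, 9, 0), datetime(2024, 5, 2, 13, 0)),
]

print("MTTR:", mean_time_to_recovery(incidents))
print("Lead time:", mean_lead_time(changes))
```

How often production breaks is simply the count of incidents over a period, so it needs no helper of its own.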
20. “Everybody has a testing environment.
Some people are lucky enough
to have a totally separate environment to
run production in.” – ???
When production is REALLY broken
How many of you wait for integration tests to
pass before pushing to production?
22. There is no such thing as a bug-free
system, choose your adventure:
• Your users see the bugs and you
already know about them
• You wait until they tell you about the
bugs (on Twitter)
You’re already testing in production
(Whether you like it or not)
23. • Slow Rollouts / Deployments
• Observe performance / error rates on
a small number of deployments and
increase over time (5% -> 10% -> 25%
-> 50% -> 100%)
Utilize Canary Deployments
They will enable you to effectively “test” in prod
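The staged rollout above can be sketched as a loop that widens traffic only while the observed error rate stays under a threshold. This is a simplified sketch: `run_canary`, `observe_error_rate`, and the 1% threshold are assumptions for illustration, and a real rollout would query a metrics backend and drive a deployment tool instead of calling a local function.

```python
STAGES = [5, 10, 25, 50, 100]  # percent of deployments, as on the slide


def run_canary(stages, observe_error_rate, max_error_rate=0.01):
    """Step through the stages; roll back to 0% if any stage looks unhealthy.

    observe_error_rate(percent) is a hypothetical callback returning the
    error rate (0.0 to 1.0) seen while `percent` of deployments run the
    new version. Returns (final_percent, history).
    """
    history = []
    for percent in stages:
        rate = observe_error_rate(percent)
        history.append((percent, rate))
        if rate > max_error_rate:
            return 0, history  # unhealthy: roll back everything
    return stages[-1], history


# A release that stays healthy at every stage reaches 100%.
healthy_final, healthy_history = run_canary(STAGES, lambda percent: 0.001)

# A release that degrades once it reaches 25% is rolled back at that stage.
bad_final, bad_history = run_canary(
    STAGES, lambda percent: 0.05 if percent >= 25 else 0.001
)

print(healthy_final, bad_final, bad_history)
```

The point of the small early stages is that a bad release is caught while only a fraction of users can see it.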
24. Behaviors & I/O
• Number of retries / back-offs
• Request Parameters / Query Statements / Response
• Falling back to a default
• Top-Level Exceptions
How do we measure the internals
of an application service?
We must ask questions and emit signals from
within our applications – control theory
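As one way to emit the signals listed on this slide, the sketch below wraps a call with retries and back-off and prints a structured event for each attempt, for the fallback to a default, and for the exception type. The helper names (`emit`, `call_with_retries`) and the event names are hypothetical; in practice the events would go to a logging or tracing library instead of stdout.

```python
import json
import time


def emit(event, **fields):
    """Print one structured event as a JSON line and return it."""
    record = {"event": event, **fields}
    print(json.dumps(record, sort_keys=True))
    return record


def call_with_retries(operation, default, max_attempts=3, base_delay=0.0,
                      sleep=time.sleep):
    """Run operation(); retry with exponential back-off, then fall back.

    Returns (result, events) so callers can inspect what was emitted.
    """
    events = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            events.append(emit("call.succeeded", attempt=attempt,
                               retries=attempt - 1))
            return result, events
        except Exception as exc:  # top-level exception: record its type
            backoff = base_delay * 2 ** (attempt - 1)
            events.append(emit("call.failed", attempt=attempt,
                               error=type(exc).__name__,
                               backoff_seconds=backoff))
            if attempt < max_attempts:
                sleep(backoff)
    events.append(emit("call.fallback", attempts=max_attempts,
                       default=repr(default)))
    return default, events


# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream too slow")
    return "ok"


result, events = call_with_retries(flaky, default="cached")
```

Each emitted line answers a question a unit test cannot: how often the retry path and the fallback path actually run in production.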
25. “A system is observable if the behavior of the
entire system can be determined by only looking
at its inputs and outputs.”
Lesson: control theory is a well-documented
approach which we can learn from rather than
trying to reinvent it
What is Observability?
Kálmán, 1961 paper
on the general theory of control systems
26. • Not just tooling
• Similar to how DevOps is a
mindset
• No longer treating services
like Schrödinger's cat
• Rich context around events
Observability
What does that word mean?
• Monitoring
• Instrumentation
• Structured Logging (tracing)
• Alerting
• Dashboards
27. • Also known as Distributed Structured Logging
• Much Larger Payloads
• Rich Context (Parameters, Query Strings, Response
Codes, etc)
https://w3c.github.io/trace-context
Distributed Tracing
It’s not as complicated as you think
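To show how little is involved in propagating a trace, here is a minimal sketch of the `traceparent` header from the W3C Trace Context document linked above: a version, a 16-byte trace id, an 8-byte parent span id, and a flags byte, all hex-encoded and joined with dashes. The function names are hypothetical, and the sketch only understands version `00`; a tracing library would normally do this for you.

```python
import re
import secrets

# version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace with random ids."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is not valid."""
    match = TRACEPARENT.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_traceparent(header):
    """Header for an outgoing call: same trace id, fresh span id."""
    parent = parse_traceparent(header)
    if parent is None:
        return new_traceparent()  # no usable context: start a new trace
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(outgoing)
```

Because every service forwards the same trace id, the rich context each one logs can later be stitched into a single request timeline.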
28. Let’s take a few minutes to see
some of the problems we’ve solved
with Distributed Tracing at a
company I helped build called Single
Music
How does distributed tracing give
me more observability?
29. • Operated by 3 engineers (1 FE/1 BE/1 SRE)
• Over 20k transactions / hour, 20+ integrations,
50k LOC, with less than 15% test coverage
• Launched in 2018 with 15 microservices on
Docker Swarm – has since expanded to over 28
microservices with zero additional engineering
personnel
32. • We enable powerful insights into our
production applications
• Dependency mapping becomes trivial
• SREs and Engineers can track golden signals
for EVERY operational perspective on their
apps
When we begin emitting signals
from every transaction
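A minimal sketch of why this works: once every transaction emits a span with a service name, a parent, a duration, and an error flag, both the dependency map and per-service signals fall out of simple aggregation. The span fields and service names below are hypothetical and are not Single Music's actual data. Of the four golden signals, only latency, traffic, and errors are derived here; saturation needs resource metrics that spans do not carry.

```python
from collections import defaultdict


def dependency_edges(spans):
    """Count caller -> callee service edges from parent/child span links."""
    by_id = {span["span_id"]: span for span in spans}
    edges = defaultdict(int)
    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return dict(edges)


def golden_signals(spans):
    """Per-service traffic, error rate, and mean latency."""
    stats = defaultdict(lambda: {"traffic": 0, "errors": 0, "total_ms": 0.0})
    for span in spans:
        entry = stats[span["service"]]
        entry["traffic"] += 1
        entry["errors"] += 1 if span["error"] else 0
        entry["total_ms"] += span["duration_ms"]
    return {
        service: {
            "traffic": entry["traffic"],
            "error_rate": entry["errors"] / entry["traffic"],
            "mean_latency_ms": entry["total_ms"] / entry["traffic"],
        }
        for service, entry in stats.items()
    }


# Hypothetical spans from two requests.
spans = [
    {"span_id": "a", "parent_id": None, "service": "web",
     "duration_ms": 120.0, "error": False},
    {"span_id": "b", "parent_id": "a", "service": "orders",
     "duration_ms": 80.0, "error": False},
    {"span_id": "c", "parent_id": "b", "service": "payments",
     "duration_ms": 50.0, "error": True},
    {"span_id": "d", "parent_id": None, "service": "web",
     "duration_ms": 40.0, "error": False},
]

edges = dependency_edges(spans)
signals = golden_signals(spans)
print(edges)
print(signals)
```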
39. Don’t like the dashboards your
vendor provides? Then you should be
able to use the leading open-source
solution for building your own.
You should be able to build your
own dashboards
42. Ed Keyes
Site Reliability Engineer – Google // 2008
“Sufficiently advanced
monitoring is indistinguishable
from testing …”
44. “I think we’ll stick with the old way
of doing things, we need more
test coverage.”
45. “I think our organization could
really benefit from more
observability”
46. Observability Workshop w/ Jaeger and Prometheus
Located in Workshop Room 2018
Today, 2-4pm
Tomorrow, 11:30-1:30pm
Want to learn more?
Instana Booth #S23
47. Rate & Share
Rate this session in the DockerCon App
Follow me @notsureifkevin
Win this droid! You can find the link on
my most recent Tweet!