This document discusses enabling collaborative monitoring across teams through observability. It recommends standardizing metrics, logs, and traces and linking them together. This provides relevant information to different teams in an integrated way through everyone's tools. A case study demonstrates how a chatbot can serve as a generic interface for accessing various monitoring tools simply and with the appropriate level of detail. Lessons learned emphasize that tool configuration and conventions are important for collaborative observability to work, and that providing context like trace IDs eases communication across roles.
Observability für alle (Observability for Everyone)
1. Observability für alle
Cloud Native Night
October 25, 2018
Florian Lautenschlager
florian.lautenschlager@qaware.de
@flolaut
Josef Fuchshuber
josef.fuchshuber@qaware.de
@fuchshuber
4. Our team is just as vital and heterogeneous as our software.
Platform Developer
App Developer
Skill Developer
Client Developer
Tester
Ops
Help Desk
Product Management
Data Scientist
UX Designer
6. What is the hardest step in the DevOps process?
DEV OPS
7. Much better: The 6 Cs of the DevOps Cycle.
Source: https://dzone.com/articles/6-cs-of-devops-adoption
8. Observability in the wild!
A case study… and how we found
collaborative monitoring.
9. Monitoring Toolchain: Simply Cloud Native.
Metrics Events Traces
Java (Spring Boot) or Python
on
Azure / Kubernetes / Openshift / Docker
10. Monitoring
Technical and Functional
Kubernetes
Generic monitoring that
does not need knowledge
about the application.
Monitoring that does
need knowledge about
the application.
Health of platform and application Telemetry data
Infrastructure-Monitoring
Application-Monitoring
11. Monitoring
Technical and Functional
Questions:
Services are up and running
Services can accept traffic
Sources:
Kubestate-Exporter
Prometheus-Node-Exporter
JMX, top, iostat etc.
Questions:
Use-Cases runtimes
Service level agreements
Sources:
Specific instrumentation
(around use cases, etc.)
Health of platform and application Telemetry data
Kubernetes
Infrastructure-Monitoring
Application-Monitoring
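The "specific instrumentation (around use cases)" on the application side can be illustrated with a minimal sketch. It uses only the Python standard library instead of a real metrics client, and every name in it (`use_case`, `RUNTIMES_MS`, `sla_violations`, `weather_forecast`) is hypothetical and not taken from the talk.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: use-case name -> list of runtimes in ms.
# A real setup would export these values to a metrics backend instead.
RUNTIMES_MS = defaultdict(list)


def use_case(name):
    """Decorator that records the runtime of one functional use case."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the runtime even if the use case raised an error.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                RUNTIMES_MS[name].append(elapsed_ms)
        return wrapper
    return decorator


def sla_violations(name, limit_ms):
    """Count recorded runs of a use case that exceeded the given limit."""
    return sum(1 for ms in RUNTIMES_MS[name] if ms > limit_ms)


@use_case("weather_forecast")
def weather_forecast(city):
    # Placeholder business logic for illustration only.
    return f"Forecast for {city}: sunny"
```

Unlike the generic infrastructure side, this kind of measurement needs knowledge about the application, because only the application knows where a use case starts and ends.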
14. Code-Slide: Standardize tracing logs and tags.
Span logs: We model database calls as well
as other expensive calls as logs using a
template to reduce the size of traces:
db:<Repo>.<Call> took: xx ms.
call:<Class>.<Method> took: xx ms.
Span tags: Used to model values that are
valid for a span. We use a template to
standardize tags.
span.tag. (to mark our tags)
Environment (staging, integration, etc.)
db (to mark spans with db calls.)
param.<name>=value (call parameters)
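The templates above could be enforced with small helper functions, so that every service emits identical span logs and tags. This is a sketch only: the function names are hypothetical, and the way the `span.tag.` prefix is combined with the tag names is an assumption, since the slide only lists the pieces.

```python
import re

# Prefix the slide uses to mark the team's own tags.
TAG_PREFIX = "span.tag."

# Matches both log templates: db:<Repo>.<Call> and call:<Class>.<Method>.
SPAN_LOG_PATTERN = re.compile(r"^(db|call):(\w+)\.(\w+) took: (\d+) ms\.$")


def db_log(repo, call, millis):
    """Render a database call as: db:<Repo>.<Call> took: xx ms."""
    return f"db:{repo}.{call} took: {millis} ms."


def call_log(cls, method, millis):
    """Render an expensive call as: call:<Class>.<Method> took: xx ms."""
    return f"call:{cls}.{method} took: {millis} ms."


def span_tags(environment, has_db_calls, params):
    """Build the standardized tag set for one span (key layout is assumed)."""
    tags = {f"{TAG_PREFIX}environment": environment}
    if has_db_calls:
        tags[f"{TAG_PREFIX}db"] = "true"
    for name, value in params.items():
        tags[f"{TAG_PREFIX}param.{name}"] = str(value)
    return tags


def parse_span_log(line):
    """Parse a templated span log back into fields, or return None."""
    match = SPAN_LOG_PATTERN.match(line)
    if match is None:
        return None
    kind, owner, operation, millis = match.groups()
    return {"kind": kind, "owner": owner,
            "operation": operation, "millis": int(millis)}
```

Because the log lines follow a fixed template, tools further down the chain can parse them again without guessing, which is what makes the convention worth the effort.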
17. End-to-end tests are also integrated in our observability stack.
See the logs
18. We provide these tools and techniques to every
developer, but…
Best SmartSpeaker in the World. Best Software-System in the World. Best Developers in the World.
Blah blah Weather blah
(voice)
Don’t understand “Blah blah Weather blah”
… in case of an error, the experts of the best software
system in the world were often asked: What is the
problem?
19. I know. Most of you do this already. But what about ...
Collaborative Monitoring!?!?
20. An example is the best explanation.
and a chatbot…
and a monitoring toolchain…
Once there was a little tiny application…
29. Three steps to enable collaborative monitoring.
Standardize
metrics, logs
and traces
Link and
combine them
as far as
possible
Integrate them
into everyone's
tools
Start Here
Correlate Events and Trace by Context
Metrics with Events and Traces by Time
Structured Logging + Context, Metric names, etc.
Tools your team
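The first two steps (structured logging with context, then correlating events and traces by that context) can be sketched with Python's standard `logging` module. The field names, the logger name `skill.weather`, and the example trace IDs are assumptions for illustration and not taken from the talk.

```python
import contextvars
import io
import json
import logging

# Holds the trace ID of the request currently being processed.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceContextFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log event (structured logging)."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


def build_logger(stream):
    """Create a logger that writes structured events to the given stream."""
    logger = logging.getLogger("skill.weather")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def events_for_trace(lines, trace_id):
    """Correlate by context: select the log events that belong to one trace."""
    return [e for e in map(json.loads, lines) if e["trace_id"] == trace_id]


# Usage: two requests with different trace IDs write to the same log.
stream = io.StringIO()
log = build_logger(stream)
current_trace_id.set("abc123")
log.info("db:ForecastRepo.load took: 12 ms.")
current_trace_id.set("def456")
log.info("call:WeatherService.forecast took: 87 ms.")
matches = events_for_trace(stream.getvalue().splitlines(), "abc123")
```

Once every event carries the same context field, linking logs to a trace is a simple filter, which is the precondition for integrating the data into other tools.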
31. Did we create an uncontrollable observability monster?
32. There’s No
Such Thing as
a Free Lunch
• The more complex a
microservice architecture is,
the more sophisticated the
observability solution must be.
• For Collaborative Observability
there is no out of the box
solution.
33. Collaborative Monitoring by everyone.
Ease of use.
Simple general interface to access various monitoring tools.
Integrated into everyone's daily tools (ChatBots, E-Mail, etc.)
Support all kinds of teams: Operations / Dev-Ops / Developers / QA-Team / My mum =)
Allow everyone to get superman insights.
Decrease Mean Time To Recovery (MTTR) with a fast analysis
Integrates different kinds of monitoring data (traces, metrics and logs) of different monitoring layers.
The right information. Provide relevant information for different teams, e.g. runtimes for performance engineers.
Level of Detail: Abstract (use case level) for management vs. details (database calls) for developers
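The "level of detail" idea can be sketched as a single rendering function that a chatbot might call. The data model, the function name `render_answer`, and the trace UI address are hypothetical; the talk does not show how its chatbot builds answers.

```python
from dataclasses import dataclass, field

# Hypothetical base URL of a trace UI; not taken from the talk.
TRACE_UI = "https://tracing.example.invalid/traces/"


@dataclass
class Span:
    name: str
    duration_ms: int
    logs: list = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    use_case: str
    duration_ms: int  # runtime of the whole use case (root span)
    spans: list = field(default_factory=list)


def render_answer(trace, detailed):
    """Render one trace as a chat message at the requested level of detail."""
    # Abstract view: use case, total runtime, and the trace ID as context.
    lines = [f"{trace.use_case}: {trace.duration_ms} ms (trace {trace.trace_id})"]
    if detailed:
        # Detailed view: every span plus its templated logs (e.g. db calls).
        for span in trace.spans:
            lines.append(f"  {span.name}: {span.duration_ms} ms")
            lines.extend(f"    {entry}" for entry in span.logs)
    # Both views end with a link, so anyone can drill down when needed.
    lines.append(TRACE_UI + trace.trace_id)
    return "\n".join(lines)


example = Trace(
    trace_id="abc123",
    use_case="weather_forecast",
    duration_ms=140,
    spans=[Span("ForecastService", 120, ["db:ForecastRepo.load took: 12 ms."])],
)
```

With this shape, `render_answer(example, detailed=False)` would serve the use-case level view, while `detailed=True` adds the spans and database calls a developer needs.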
35. Lessons Learned
Tool stack is awesome: Prometheus, Sleuth / Zipkin, and logging (Fluentd, Elastic) are stable and
well documented.
Maximum flexibility compared to commercial products.
But: Effort for concepts, implementation and quality checks. Conventions and rulesets are important!
Mindset: We found that we had to convince people first. But we have seen a high level of acceptance.
Example: The chatbot with trace links is the standard tool for discussing possible bugs between all project roles.
Development and system understanding: No need for “cloudy” conversations. Just provide the context,
e.g. a trace id.
Example: Issues typically contain the context (trace id) that points the developer to the logs and the trace.
Mark customer and automatic test traffic for better dashboards and analytics.
Observability tool stack is a first-class citizen: You do not make friends when it's down.
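Marking customer and automatic test traffic, as recommended in the lessons above, could look like the following sketch. The header name `X-Traffic-Origin` and the tag key `span.tag.traffic` are hypothetical; the talk does not say how the marking is implemented.

```python
from collections import Counter

# Hypothetical request header set by the end-to-end test clients.
TRAFFIC_HEADER = "X-Traffic-Origin"
# Hypothetical tag key, following the span.tag. prefix convention.
TRAFFIC_TAG = "span.tag.traffic"

KNOWN_ORIGINS = {"customer", "test"}


def traffic_origin(headers):
    """Classify a request; unmarked requests count as customer traffic.

    `headers` is a plain dict here, so the lookup is case-sensitive.
    """
    value = headers.get(TRAFFIC_HEADER, "customer").strip().lower()
    return value if value in KNOWN_ORIGINS else "unknown"


def tag_request(headers):
    """Return the tag to attach to the span of this request."""
    return {TRAFFIC_TAG: traffic_origin(headers)}


def count_by_origin(requests):
    """Aggregate requests per origin, e.g. as input for a dashboard."""
    return Counter(traffic_origin(headers) for headers in requests)
```

Dashboards and analytics can then filter on the tag, so that automated test runs do not distort customer-facing numbers.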