Serverless introduces a number of challenges to existing observability tools, so we need to adapt our practices to fit this new paradigm. In this talk we will discuss how to build observability into a serverless application. We will see how to implement log aggregation, distributed tracing and correlation IDs through both synchronous and asynchronous events.
6. Abraham Wald
Wald noted that the study only
considered the aircraft that had survived
their missions—the bombers that had
been shot down were not present for the
damage assessment.
The holes in the returning aircraft, then,
represented areas where a bomber could
take damage and still return home safely.
9. survivor bias in monitoring
We only focus on failure modes that we were able to identify
through investigation and postmortem in the past.
The bullet holes that shot us down and that we couldn’t identify stay
invisible, and will continue to shoot us down.
13. In control theory, observability is a measure of how well
internal states of a system can be inferred from
knowledge of its external outputs.
https://en.wikipedia.org/wiki/Observability
24. “These are the four pillars of the Observability Engineering
team’s charter:
• Monitoring
• Alerting/Visualization
• Distributed systems tracing infrastructure
• Log aggregation/analytics”
- Observability Engineering at Twitter, http://bit.ly/2DnjyuW
34. About me
▪ Principal Engineer at DAZN
▪ AWS Serverless Hero
▪ Author of Production-Ready Serverless* by Manning
▪ Blogger**
▪ Speaker
* https://bit.ly/production-ready-serverless
** https://theburningmonk.com
46. [Diagram: multiple user requests, each served by a handler]
critical paths:
minimise user-facing latency
47. [Diagram: multiple user requests, each served by a handler, with StatsD and rsyslog alongside the handlers]
critical paths:
minimise user-facing latency
background processing:
batched, asynchronous, low overhead
48. [Diagram: multiple user requests, each served by a handler, with StatsD and rsyslog alongside the handlers]
critical paths:
minimise user-facing latency
background processing:
batched, asynchronous, low overhead
NO background processing
except what platform provides
63. new challenges
• high chance of data loss (if batching)
• nowhere to install agents/daemons
• no background processing
• higher concurrency to telemetry system
73. new challenges
• asynchronous invocations
• nowhere to install agents/daemons
• no background processing
• higher concurrency to telemetry system
• high chance of data loss (if batching)
149. Those extra 10-20ms for sending custom metrics
compound when you have microservices and multiple
APIs are called within a single user event.
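To make the compounding concrete, here is a rough back-of-the-envelope sketch. The 10-20ms overhead comes from the slide; the number of calls (5) is a hypothetical chosen for illustration, and the sketch assumes the calls happen one after another and that each one sends its metrics synchronously.

```javascript
// Total latency added to one user event when every call in the chain
// pays a fixed overhead for sending custom metrics synchronously.
function addedLatencyMs(sequentialCalls, overheadPerCallMs) {
  return sequentialCalls * overheadPerCallMs;
}

// Hypothetical: one user event fans through 5 sequential API calls.
const sequentialCalls = 5;
const bestCaseMs = addedLatencyMs(sequentialCalls, 10);
const worstCaseMs = addedLatencyMs(sequentialCalls, 20);

console.log(`added latency: ${bestCaseMs}-${worstCaseMs}ms`);
```

Under these assumptions the overhead for a single user event is several times the per-call figure.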
150. Amazon found every 100ms of latency cost them 1% in sales.
http://bit.ly/2EXPfbA
151. console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
The first two lines are logs; the last two are metrics.
Metric format: MONITORING|timestamp|metric value|metric type|metric name
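The metric-as-log-line format above can be sketched as a pair of helpers: one that writes a metric in the MONITORING|timestamp|value|type|name layout, and one that tells metric lines apart from ordinary log lines. The function names `formatMetric` and `parseMetric` are hypothetical, and the assumption that some separate log processor later parses these lines is mine, not something the slide spells out.

```javascript
// Build a metric log line: MONITORING|timestamp|value|type|name
function formatMetric(name, type, value, timestampSeconds) {
  return ["MONITORING", timestampSeconds, value, type, name].join("|");
}

// Parse a log line. Returns the metric fields, or null when the line
// is an ordinary log message rather than a metric.
function parseMetric(line) {
  const parts = line.split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") {
    return null;
  }
  const [, timestamp, value, type, name] = parts;
  return { timestamp: Number(timestamp), value: Number(value), type, name };
}

// Writing to stdout is all the function does on the critical path.
const line = formatMetric("user-api-latency", "latency", 27.4, 1489795335);
console.log(line);

const parsed = parseMetric(line);
const notAMetric = parseMetric("fetching user info from user-api");
```

The design point is that the handler only pays for a `console.log` call, while the work of shipping the metric to a telemetry system happens elsewhere.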
165. don’t span over async
invocations
good for identifying dependencies of a function,
but not good enough for tracing the entire call
chain as user request/data flows through the
system via async event sources.
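One way to follow a request across async event sources is to carry a correlation ID inside the event payload itself. The following is a minimal sketch under stated assumptions: the key name `x-correlation-id`, the message envelope shape, and the helper names are all hypothetical, and no real queue or AWS SDK call is involved (the "message" is just a JSON string handed from one function to the next).

```javascript
const { randomUUID } = require("crypto");

// Hypothetical key used to carry the ID in payloads and log lines.
const CORRELATION_KEY = "x-correlation-id";

// Reuse the incoming correlation ID if present, otherwise start a new one.
function getCorrelationId(incoming) {
  return (incoming && incoming[CORRELATION_KEY]) || randomUUID();
}

// Producer side: wrap the payload so the ID travels with the event.
function wrapMessage(body, correlationId) {
  return JSON.stringify({ [CORRELATION_KEY]: correlationId, body });
}

// Consumer side: recover the ID and include it in every log line.
function handleMessage(rawMessage, log = console.log) {
  const message = JSON.parse(rawMessage);
  const correlationId = getCorrelationId(message);
  log(JSON.stringify({ [CORRELATION_KEY]: correlationId, message: "processing event" }));
  return correlationId;
}

// The first function has no incoming ID, so it generates one.
const originalId = getCorrelationId({});
const rawMessage = wrapMessage({ task: "example" }, originalId);
// The second function, invoked asynchronously, sees the same ID.
const seenId = handleMessage(rawMessage);
```

Because both functions log the same ID, their log lines can be joined after aggregation even though no trace spans the async hop.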