Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubernetes
1. Adopting Open Telemetry as Distributed Tracer on Your Microservices at Kubernetes
Open Infrastructure & Cloud Native Day Indonesia 2020
07 November 2020
Tonny Adhi Sabastian
tonny.sabastian@go-jek.com
2. About Me
● Former System Administrator for kambing.ui.ac.id - one of the national F/OSS repositories, now defunct (2009 - 2018)
● Former Lead Engineer for Univ. Indonesia Data Center (2012 - 2018)
● Former (Part Time) Lecturer at Faculty of Computer Science , Univ. Indonesia (2014 - 2018)
○ Teaching Subject : Distributed System & System Programming
○ Research Subject : IoT, Deep Packet Inspection, Aeronautical Telecommunication Network, Linux System Performance
● Co-founded two startups during 2012 - 2018, both of which failed (Rumio - AR Platform & Peentar - IoT Platform)
● Currently : Senior DevOps Engineer at MAPAN (GOJEK Group) - https://mapan.id (2018 - Now)
○ We’re recruiting: please visit https://jobs.lever.co/gojek and look for jobs with the MAPAN suffix
○ Send your CV to recruitment@ruma.co.id
● Contact :
○ tonny.sabastian@go-jek.com
○ tonny@segmentationfault.xyz
○ https://segmentationfault.xyz
3. Agenda
1. Understanding Performance Tracing
2. Getting to Know Open Telemetry
3. Deploying Go Microservices with Open Telemetry
4. Demo & (Q&A)
5. Why We Need Tracing ? (0)
Your (contemporary) system is distributed and
runs its work concurrently across many services
Images Source : “Mastering Distributed Tracing - Yuri Shkuro”
6. Why We Need Tracing ? (1)
Observability : The ability of a human, as system creator and maintainer, to ask (impactful) questions
and get (impactful) answers about the state of their system
● Which services did a request go through?
● What did every microservice do when processing the request?
● If the request was slow, where were the bottlenecks?
● If the request failed, where did the error happen?
● How different was the execution of the request from the normal behavior of the system?
○ Were the differences structural, that is, some new services were called, or vice versa, some
usual services were not called?
● What was the critical path of the request?
● What was the user experience when the request did not get through?
7. What is a Tracing Activity ? (0)
In a nutshell :
● Tracing infrastructure attaches contextual metadata to each request and ensures that
metadata is passed around during the request execution, even when one component
communicates with another over a network.
● At various trace points in the code, the instrumentation records events annotated
with relevant information, such as the URL of an HTTP request or an SQL statement of
a database query.
● Recorded events are tagged with the contextual metadata and explicit causality
references to prior events.
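The three points above can be sketched in a few lines of Go. This is a toy illustration using only the standard library, not the Open Telemetry API: the `Recorder`, `Inject`, `Extract`, and `runDemo` names and the `x-trace-id` / `x-parent-id` header names are all hypothetical. It shows metadata attached to a request, carried across a simulated network hop, and events recorded with a causality reference to the prior event.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"sync"
)

// traceMeta is the contextual metadata attached to a request (hypothetical type).
type traceMeta struct {
	TraceID string // shared by every event of one request
	LastID  int    // ID of the most recent event in this causal chain
}

type ctxKey struct{}

// Event is one recorded trace point.
type Event struct {
	TraceID  string
	ID       int
	ParentID int // causality reference; 0 means there is no prior event
	Name     string
	Attrs    map[string]string
}

// Recorder keeps events in memory; a real tracer would send them to a backend.
type Recorder struct {
	mu     sync.Mutex
	nextID int
	Events []Event
}

// StartRequest attaches fresh metadata to the context of a new request.
func StartRequest(ctx context.Context, traceID string) context.Context {
	return context.WithValue(ctx, ctxKey{}, traceMeta{TraceID: traceID})
}

// Record stores an event tagged with the metadata found in ctx and returns
// a derived context whose metadata points at the new event.
func (r *Recorder) Record(ctx context.Context, name string, attrs map[string]string) context.Context {
	meta, _ := ctx.Value(ctxKey{}).(traceMeta)
	r.mu.Lock()
	r.nextID++
	id := r.nextID
	r.Events = append(r.Events, Event{
		TraceID: meta.TraceID, ID: id, ParentID: meta.LastID, Name: name, Attrs: attrs,
	})
	r.mu.Unlock()
	meta.LastID = id
	return context.WithValue(ctx, ctxKey{}, meta)
}

// Inject copies the metadata into outgoing headers (header names are made up).
func Inject(ctx context.Context, headers map[string]string) {
	meta, ok := ctx.Value(ctxKey{}).(traceMeta)
	if !ok {
		return
	}
	headers["x-trace-id"] = meta.TraceID
	headers["x-parent-id"] = strconv.Itoa(meta.LastID)
}

// Extract rebuilds the metadata on the receiving side of the network hop.
func Extract(headers map[string]string) context.Context {
	parent, _ := strconv.Atoi(headers["x-parent-id"])
	meta := traceMeta{TraceID: headers["x-trace-id"], LastID: parent}
	return context.WithValue(context.Background(), ctxKey{}, meta)
}

// runDemo simulates service A calling service B over a network.
func runDemo(rec *Recorder) {
	// Service A: a request arrives and an event is recorded with its URL.
	ctx := StartRequest(context.Background(), "trace-1")
	ctx = rec.Record(ctx, "http.request", map[string]string{"url": "/checkout"})
	headers := map[string]string{}
	Inject(ctx, headers)

	// Service B: metadata is restored from the headers before recording.
	ctxB := Extract(headers)
	rec.Record(ctxB, "db.query", map[string]string{"sql": "SELECT 1"})
}

func main() {
	rec := &Recorder{}
	runDemo(rec)
	for _, e := range rec.Events {
		fmt.Printf("trace=%s id=%d parent=%d name=%s attrs=%v\n",
			e.TraceID, e.ID, e.ParentID, e.Name, e.Attrs)
	}
}
```

Both events carry the same trace ID, and the second event names the first as its parent even though it was recorded on the other side of the simulated network hop. Real tracers usually rely on a standardized header format rather than ad hoc names like the ones used here.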
10. Getting to Know Open Telemetry
(Let's Trace Our Code)
11. Distributed Tracing Libraries
● Open Census ( opencensus.io )
○ Started as Google's Census library for collecting metrics and traces
○ Provides a set of APIs and SDKs for Distributed Tracing
○ Supported backends include Azure Monitor, Datadog, Instana, Jaeger, New Relic, SignalFX,
Google Cloud Monitoring + Trace, and Zipkin.
● Open Tracing ( opentracing.io )
○ Focuses on tracing capabilities only, without metrics
● Open Telemetry ( opentelemetry.io )
○ Combines OpenCensus and OpenTracing to provide a full set of telemetry exporters,
collectors, and analyzers covering metrics, traces, and logging (upcoming)
○ Provides an agent collector (planned) that serves as an endpoint for logs, metrics, and analyzers.
○ Supports exporters for various trace capture platforms such as Jaeger, Zipkin, Datadog,
LightStep, NewRelic, etc.
○ Latest specification version : v0.6.0
13. Open Telemetry Architecture (2)
● Open Telemetry API is used to instrument our code; code authors use it to write
instrumentation directly into their services or libraries.
● Open Telemetry SDK is an implementation of the Open Telemetry API. The SDK implements Tracer,
Metrics, and Trace Context.
● Open Telemetry Collector (agent) collects Metrics and Traces and sends them to the backend of
our choice
● A simpler setup can use only an Open Telemetry Exporter, but each exporter is
specific to a certain backend ( ex : Jaeger Exporter , Prometheus Exporter, Datadog Exporter, NewRelic
Exporter)
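The separation between API, SDK, and Exporter can be sketched as three small layers in Go. This is only an illustration of the layering under simplified assumptions, not the real Open Telemetry Go packages: every type and function name below (`Tracer`, `Span`, `Exporter`, `NewTracer`, `memoryExporter`, `handleOrder`, and so on) is hypothetical.

```go
package main

import "fmt"

// --- "API" layer: the only thing instrumented code depends on. ---

// Span represents one unit of work.
type Span interface {
	SetAttribute(key, value string)
	End()
}

// Tracer creates spans.
type Tracer interface {
	Start(name string) Span
}

// --- "Exporter" layer: one implementation per backend. ---

// SpanData is what the SDK hands over when a span ends.
type SpanData struct {
	Name  string
	Attrs map[string]string
}

// Exporter ships finished spans to one specific backend.
type Exporter interface {
	Export(s SpanData) error
}

// --- "SDK" layer: implements the API and feeds the exporter. ---

type sdkTracer struct{ exp Exporter }

// NewTracer wires an SDK tracer to the chosen exporter.
func NewTracer(exp Exporter) Tracer { return &sdkTracer{exp: exp} }

func (t *sdkTracer) Start(name string) Span {
	return &sdkSpan{data: SpanData{Name: name, Attrs: map[string]string{}}, exp: t.exp}
}

type sdkSpan struct {
	data SpanData
	exp  Exporter
}

func (s *sdkSpan) SetAttribute(key, value string) { s.data.Attrs[key] = value }

// End hands the finished span to the exporter; errors are ignored in this sketch.
func (s *sdkSpan) End() { _ = s.exp.Export(s.data) }

// stdoutExporter stands in for a backend-specific exporter.
type stdoutExporter struct{}

func (stdoutExporter) Export(s SpanData) error {
	fmt.Printf("span %q attrs=%v\n", s.Name, s.Attrs)
	return nil
}

// memoryExporter stands in for a second backend; it just keeps spans in a slice.
type memoryExporter struct{ Spans []SpanData }

func (m *memoryExporter) Export(s SpanData) error {
	m.Spans = append(m.Spans, s)
	return nil
}

// handleOrder is instrumented application code: it knows the Tracer
// interface but nothing about the SDK or the backend behind it.
func handleOrder(tr Tracer, orderID string) {
	span := tr.Start("handleOrder")
	defer span.End()
	span.SetAttribute("order.id", orderID)
}

func main() {
	// Swapping the backend only changes the wiring, not handleOrder.
	handleOrder(NewTracer(stdoutExporter{}), "42")

	mem := &memoryExporter{}
	handleOrder(NewTracer(mem), "43")
	fmt.Println("spans captured in memory:", len(mem.Spans))
}
```

The instrumented function stays the same when the exporter changes, which is the point of keeping the API separate from the SDK. A Collector would sit behind a single exporter and forward the data to the backend of our choice, so the service itself would not need a backend-specific exporter.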