Distributed Tracing with Erlang/Elixir

What Distributed Tracing (DT) is and why it may be useful for you.
The design of DT, and how OpenTracing and OpenCensus work for Elixir/Erlang projects (libraries, problems, my experience).

Published in: Software

  1. 1. May 2019 Distributed Tracing with Erlang/Elixir projects Ivan Glushkov 
 @gliush
  2. 2. About myself ❖ Postmates, Infra Team ❖ MZ, Infra Team ❖ Echo, Backend Team ❖ MCST, Compiler Project ❖ DevZen podcast, Co-founder
  3. 3. Content ❖ Why Distributed Tracing (DT) is needed ❖ Ideal Design of the DT ❖ OpenTracing + Erlang/Elixir ❖ OpenCensus + Erlang/Elixir
  4. 4. Problem
  5. 5. Problem
  6. 6. Problem
  7. 7. Problem - debug? - introspect? - profile?
  8. 8. Design DT
  9. 9. Design DT: Use Cases
  10. 10. Design DT: Use Cases ❖ Log one request through all the services
  11. 11. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time)
  12. 12. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time) ❖ Build Dependency Graph
  13. 13. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time) ❖ Build Dependency Graph ❖ Analytics (“Dapper” paper)
  14. 14. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time) ❖ Build Dependency Graph ❖ Analytics (“Dapper” paper) ❖ Tags, Logs, Artifacts for each operation
  15. 15. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time) ❖ Build Dependency Graph ❖ Analytics (“Dapper” paper) ❖ Tags, Logs, Artifacts for each operation ❖ Lines of Business analytics
  16. 16. Design DT: Use Cases ❖ Log one request through all the services ❖ Gather all operations information (result, time) ❖ Build Dependency Graph ❖ Analytics (“Dapper” paper) ❖ Tags, Logs, Artifacts for each operation ❖ Lines of Business analytics ❖ QoS, Traffic Control
  17. 17. Design DT: Use Cases
  18. 18. Design DT: Use Cases
  19. 19. Design DT: Idea ❖ User Request ID -> to pass to every subsystem:
  20. 20. Design DT: Idea ❖ User Request ID -> to pass to every subsystem: ❖ HTTP: headers
  21. 21. Design DT: Idea ❖ User Request ID -> to pass to every subsystem: ❖ HTTP: headers ❖ gRPC: additional field / auto wrapping
  22. 22. Design DT: Idea ❖ User Request ID -> to pass to every subsystem: ❖ HTTP: headers ❖ gRPC: additional field / auto wrapping ❖ Event Bus: additional field / auto wrapping
  23. 23. Design DT: Idea ❖ User Request ID -> to pass to every subsystem: ❖ HTTP: headers ❖ gRPC: additional field / auto wrapping ❖ Event Bus: additional field / auto wrapping ❖ Subsystem to have sub-request ID
  24. 24. Design DT: Idea ❖ User Request ID -> to pass to every subsystem: ❖ HTTP: headers ❖ gRPC: additional field / auto wrapping ❖ Event Bus: additional field / auto wrapping ❖ Subsystem to have sub-request ID ❖ Relation to the previous subsystem (parent/child, sequence, …)
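The idea on the slides above (one request ID for the whole user request, a sub-request ID per subsystem, and a link to the previous subsystem) can be sketched roughly as follows. This is a language-neutral illustration in Python, not code from the talk; the `TraceContext` class and the header names `X-Request-Id` and `X-Parent-Id` are hypothetical choices made for this example.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TraceContext:
    request_id: str                   # shared by every subsystem handling the user request
    span_id: str                      # sub-request ID of the current subsystem
    parent_id: Optional[str] = None   # sub-request ID of the calling subsystem, if any

    @staticmethod
    def new_root() -> "TraceContext":
        """Start a new trace at the edge of the system."""
        return TraceContext(secrets.token_hex(16), secrets.token_hex(8))

    def to_headers(self) -> Dict[str, str]:
        """Headers to attach to an outgoing HTTP call (hypothetical names)."""
        return {"X-Request-Id": self.request_id, "X-Parent-Id": self.span_id}

    @staticmethod
    def from_headers(headers: Dict[str, str]) -> "TraceContext":
        """Context for the receiving subsystem: same request ID, fresh
        sub-request ID, parent set to the caller. Starts a new trace when
        the headers are missing."""
        if "X-Request-Id" not in headers or "X-Parent-Id" not in headers:
            return TraceContext.new_root()
        return TraceContext(
            request_id=headers["X-Request-Id"],
            span_id=secrets.token_hex(8),
            parent_id=headers["X-Parent-Id"],
        )
```

The same two fields could travel as an additional gRPC metadata field or an extra attribute on an event-bus message; only the carrier changes.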
  25. 25. Design DT: Idea ❖ Sampling: ❖ pre/intra/post ❖ random/rate limited/by flag ❖ Storage for DT
  26. 26. Design DT: Architecture ❖ Lib - Storage? ❖ Lib - Collector - Storage? ❖ Agent - Storage? ❖ Agent - Collector - Storage? ❖ Synchronous?
  27. 27. Design DT: Architecture ❖ Lib - Storage? ❖ Lib - Collector - Storage? ❖ Agent - Storage? ❖ Agent - Collector - Storage? ❖ Synchronous? https://github.com/EchoTeam/gtl
  28. 28. Design DT: Architecture (diagram labels: ServiceA, ServiceB, ServiceB, Collector, Storage; ReqID1, ReqID2, ReqID3)
  29. 29. Design DT: Problems ❖ Too many traces -> OOM or 100% CPU ❖ Too few traces -> problems are missed ❖ Deciding “on the fly” is difficult
  30. 30. OpenTracing ❖ Cloud Native Computing Foundation (cncf.io) incubating project ❖ Uber, Apple, Pinterest, Couchbase ❖ API specification, libraries
  31. 31. OpenTracing: Concepts ❖ Trace ❖ Span: name, start time, end time ❖ Span: kv tags, kv logs, baggage items ❖ SpanContext ❖ Scopes + Threading + ActiveSpan ❖ Tracers: API + ready solutions ❖ Carriers: API to inject/extract SpanContext
  32. 32. OpenTracing: Flow 1. get SpanContext or start Trace => span.start(SpanContext) 2. span.store(tags/metrics/logs/baggage) 3. 4. span.finish()
  33. 33. OpenTracing: Flow 1. get SpanContext or start Trace => span.start(SpanContext) 2. span.store(tags/metrics/logs/baggage) 3. run another function with SpanContext 4. span.finish()
  34. 34. OpenTracing: Flow 1. get SpanContext or start Trace => span.start(SpanContext) 2. span.store(tags/metrics/logs/baggage) 3. send async message with SpanContext 4. span.finish()
  35. 35. OpenTracing: Flow 1. get SpanContext or start Trace => span.start(SpanContext) 2. span.store(tags/metrics/logs/baggage) 3. HTTP request with SpanContext in headers 4. span.finish()
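The four steps of the flow can be sketched in a few lines. This is a minimal illustration in Python and does not use the OpenTracing API: the `Span` class, the `FINISHED` list (a stand-in for a reporter or collector), and the function names are all hypothetical. A context manager is used so that step 4 always runs, even when the traced code raises.

```python
import time
from typing import Any, Dict, List, Optional

FINISHED: List["Span"] = []  # stand-in for a reporter/collector


class Span:
    def __init__(self, name: str, parent: Optional["Span"] = None):
        self.name = name
        self.parent_name = parent.name if parent else None
        self.tags: Dict[str, Any] = {}
        self.start = time.monotonic()
        self.duration: Optional[float] = None

    def __enter__(self) -> "Span":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Step 4: finish the span, whether or not the body raised.
        if exc_type is not None:
            self.tags["error"] = True
        self.duration = time.monotonic() - self.start
        FINISHED.append(self)
        return False  # do not swallow the exception


def lookup_user(parent: Span) -> str:
    # Step 3: another function receives the context and opens a child span.
    with Span("lookup_user", parent) as span:
        span.tags["db"] = "users"
        return "alice"


def handle_request() -> str:
    with Span("handle_request") as span:   # step 1: start a trace
        span.tags["http.method"] = "GET"   # step 2: store tags
        return lookup_user(span)           # step 3: pass the context on
```

For an asynchronous message or an HTTP request, step 3 would serialize the context into the message or the headers instead of passing an object.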
  36. 36. OpenTracing: Sampling ❖ Sampling ratio ❖ Sampling priority (by tag, flag, …)
  37. 37. OpenTracing: Tracers ❖ CNCF Jaeger (Uber) ❖ LightStep - SaaS solution ❖ Apache SkyWalking ❖ Datadog ❖ Wavefront
  38. 38. OpenTracing: problems ❖ No strict agreement about how to pass the SpanContext ❖ No good libraries for all the languages
  39. 39. OpenTracing: OTTER (Erlang) ❖ Last Update: Apr 2018
  40. 40. OpenTracing: OTTER (Erlang) ❖ Span - record, could be stored: ❖ Process Dict ❖ Multiname Process Dict ❖ Separate Process
 SpanPid = otter_span_id_api:start("my request"),
 …
 otter_span_id_api:tag(SpanPid, "result", "ok"),
 …
 otter_span_id_api:finish(SpanPid),
  41. 41. OpenTracing: OTTER (Erlang): Filters
 [{
   [ %% Condition
     {greater, otter_span_duration, 5000000},
     {value, otter_span_name, "radius request"}
   ],
   [ %% Action
     {snapshot_count, [long_radius_request], []},
     send_to_zipkin
   ]
 }]
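The filter on this slide pairs a list of conditions with a list of actions. A rough sketch of how such condition/action rules could be evaluated is shown below, in Python. It is not OTTER's implementation: it assumes that all conditions of a rule must hold for its actions to fire, it reduces actions to plain names, and the `matching_actions` function and the span dictionary layout are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Condition = Tuple[str, str, Any]          # (operator, span field, expected value)
Rule = Tuple[List[Condition], List[str]]  # (conditions, action names)

# Only the two operators that appear on the slide are modelled here.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "greater": lambda actual, expected: actual > expected,
    "value": lambda actual, expected: actual == expected,
}


def matching_actions(span: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose conditions all hold for the span."""
    actions: List[str] = []
    for conditions, rule_actions in rules:
        if all(
            field in span and OPERATORS[op](span[field], expected)
            for op, field, expected in conditions
        ):
            actions.extend(rule_actions)
    return actions


# The rule from the slide, with the duration threshold copied as-is
# (its time unit is an assumption left to the reader).
RULES: List[Rule] = [
    (
        [("greater", "duration", 5_000_000), ("value", "name", "radius request")],
        ["snapshot_count", "send_to_zipkin"],
    ),
]
```

With these rules, only a span named "radius request" whose duration exceeds the threshold is counted and forwarded; everything else is dropped before it reaches the collector.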
  42. 42. OpenTracing: OTTER (Erlang): Inject/Extract ❖ Implement Inject/Extract yourself ❖ Repeat the semantics for every language
  43. 43. OpenTracing: OTTER (Erlang): Summary ❖ Need to write A LOT of code ❖ Flexible configuration ❖ No default agreements
  44. 44. OpenTracing: Ex_Ray (Elixir) ❖ Last update: Oct 2017 ❖ Store spans in ETS ❖ Magic with Elixir Macros
  45. 45. OpenTracing: Ex_Ray (Elixir)
 defmodule Nested do
   use ExRay, pre: :before_fun, post: :after_fun
   …
   @trace kind: :critical
   def fred(a, b), do: blee(a, b)
   …
   defp before_fun(ctx) do
     Span.open(ctx.target, @req_id)
     |> :otter.tag(:kind, ctx.meta[:kind])
     |> :otter.log(">>> #{ctx.target} with #{ctx.args |> inspect}")
   end
 end
  46. 46. OpenTracing: Ex_Ray (Elixir): Summary ❖ Less code needed ❖ Low quality code ❖ Memory leaks ❖ Exceptions are not re-raised in wrappers ❖ No default agreements
  47. 47. OpenCensus ❖ Started in Google ❖ Large community (Microsoft, Datadog, Prometheus, …) ❖ Automatic Context Propagation ❖ Reference implementation of the official W3C HTTP tracing header
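The W3C HTTP tracing header mentioned above is `traceparent`. Its version-00 value has four dash-separated lowercase hex fields: version, a 32-character trace ID, a 16-character parent ID, and 2-character flags whose lowest bit marks the trace as sampled. The Python sketch below parses and formats that value; it handles only version 00, and the `ParentContext` type and function names are hypothetical, not taken from any OpenCensus library.

```python
import re
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class ParentContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[ParentContext]:
    """Parse a version-00 traceparent value; return None if it is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return ParentContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: ParentContext) -> str:
    """Build the header value to send to the next service."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"
```

A receiver that gets `None` back would typically start a new trace rather than reject the request.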
  48. 48. OpenCensus: Concepts ❖ Trace, Span - similar to OpenTracing ❖ Link between spans: child/parent/unknown ❖ Sampling: Always/Never/Probabilistic (1 in 10000)/RateLimiting (10 per sec) ❖ Automatic Context Propagation ❖ Stats/Metrics
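The four sampling strategies listed above can be sketched as follows. This is an illustration in Python and not the OpenCensus implementation. It makes two assumptions: the probabilistic sampler decides from the low 64 bits of the trace ID, so that every service reaches the same decision for the same trace, and the rate limiter is a simple token bucket. All names are hypothetical.

```python
import time
from typing import Callable


def always(trace_id: int) -> bool:
    return True


def never(trace_id: int) -> bool:
    return False


def probabilistic(rate: float) -> Callable[[int], bool]:
    """Sample roughly `rate` of all traces, e.g. 0.0001 for 1 in 10000."""
    threshold = int(rate * (1 << 64))

    def sampler(trace_id: int) -> bool:
        # Compare the low 64 bits of the trace ID against the threshold.
        return (trace_id & ((1 << 64) - 1)) < threshold

    return sampler


class RateLimiting:
    """Token bucket that refills at `per_second` tokens per second."""

    def __init__(self, per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.per_second = per_second
        self.clock = clock
        self.tokens = per_second  # start with a full bucket
        self.last = clock()

    def __call__(self, trace_id: int) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, capped at the bucket size.
        self.tokens = min(self.per_second, self.tokens + elapsed * self.per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The `clock` parameter exists only to make the limiter easy to test with a fake time source.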
  49. 49. OpenCensus: Concepts ❖ OpenCensus Service: Agent + Collector
  50. 50. OpenCensus: Concepts ❖ Agent
  51. 51. OpenCensus: Concepts ❖ Collector
  52. 52. OpenCensus Erlang ❖ Public GitHub repo for all Elixir/Erlang libs ❖ Libs for web-servers (Elli, Cowboy, Phoenix, …) ❖ Integrate with minimum effort
  53. 53. OpenCensus Erlang ❖ ETS table for Span data + GC for abandoned Spans ❖ Track SpanContext: process dict / variable ❖ Parse transform or manual context tracking ❖ Logger can receive SpanContext ❖ Metrics
  54. 54. OpenCensus Erlang Process Dictionary Example
 ocp:with_child_span(<<"span1">>),
 ocp:with_child_span(<<"span2">>, #{}, fun() … end)
  55. 55. OpenCensus Erlang Manual Context Handling
 handler(Ctx, NextHandler) ->
   SpanCtx = oc_trace:with_child_span(Ctx, <<"span-name">>),
   try
     oc_trace:put_attribute(<<"key">>, <<"value">>, SpanCtx),
     {Code, Message} = NextHandler(SpanCtx),
     oc_trace:set_status(Code, Message, SpanCtx)
   after
     oc_trace:finish_span(SpanCtx)
   end.
  56. 56. OpenCensus Erlang: Elli ❖ Gathers metrics (requests processed, bytes sent, latency, …) ❖ Gets parent SpanContext, creates new child Span ❖ Integration (rebar.config change):
 [{callback, elli_middleware},
  {callback_args, [{mods, [{oc_elli_middleware, []}]}]}]
  57. 57. OpenCensus Elixir ❖ Uses opencensus-erlang (e.g. prepare headers with SpanContext) ❖ Implements a macro:
 with_child_span "span1" do … end
  58. 58. OpenCensus Elixir: Phoenix ❖ Uses “Phoenix Instrumenter” ❖ Creates Span for any Controller or View ❖ Integration (config.exs):
 instrumenters: [OpencensusPhoenix.Instrumenter]
  59. 59. OpenCensus Elixir: Plug ❖ Integrates into any pipeline with “Plug” ❖ Gets parent Span from headers ❖ Creates child Span with new attributes (call function to get them) ❖ Integration:
 defmodule MyApp.TracePlug do
   # some custom configuration
 end

 plug MyApp.TracePlug
  60. 60. OpenCensus BEAM: Summary ❖ A lot of libraries ready to be used ❖ Seamless integration with other languages ❖ You need to understand the concept
  61. 61. Jaeger
  62. 62. Summary ❖ A lot of advantages: Introspection, Analytics, LoB, QoS ❖ Think about sending metrics with OpenCensus ❖ Easy to integrate even with Erlang/Elixir
  63. 63. Breaking News ❖ Update: May 21st ❖ OpenTracing + OpenCensus => OpenTelemetry ❖ Backward compatibility for both projects ❖ Nov 2019: read-only mode for OpenTracing, OpenCensus
  64. 64. Questions ❖ @gliush ❖ Ivan Glushkov ❖ http://devzen.ru
