Apricot2017 Request tracing in distributed environment

2017 February 07
Hieu LE (hieulq@vn.fujitsu.com)
Fujitsu Vietnam Limited
PODC (Platform Offshore Development Center)
Vietnam OpenStack Community - VFOSSA
Logging/Request Tracing in Distributed Environments
Copyright 2017 Fujitsu Vietnam Limited

/me
2 APRICOT 2017
Hieu LE
Vietnam Official OpenStack Community Organizer
VFOSSA Executive Member
OpenStack Project leader @ Fujitsu
OpenStack ATC/AUC
Email: hieulq@vn.fujitsu.com

Outline
1. Intro
2. Current logging solutions
   – Pros
   – Cons
3. Tracing requirements
4. Request tracing
   – Demo with OpenStack

Intro
• Distributed Environment:
  – Cloud Computing – Fog Computing.
  – IoT environment.
  – Micro-services architecture.

IoT – Fog – Cloud
[Figure: IoT – Fog – Cloud. Cloud side: multiple clouds with (virtual) storage, services/servers, virtual compute resources, and virtual network. Platforms: O2M2, Thingworx, DeviceHive, other platforms. Annotations: routing; optimizing paths; data pre-processing.]

• What if something happens in our system?
• How can we resolve problems as quickly as possible?

Current Logging Solutions (1)
• ELK, Graylog:
  – Collect logs from systems and appliances.
  – Index and filter logs → root cause analysis (RCA).
  – Multiple alert/notification mechanisms.
  – Visualization based on users' needs.
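As a toy illustration of "index and filter → RCA", the Python sketch below builds two in-memory indexes (by log level and by request ID) over log lines and, for every ERROR, pulls back all records of the same request. The log format, the `req-N` IDs, and the sample lines are made up for illustration; ELK and Graylog do this at scale with their own storage and search back ends, not with code like this.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log format: "<time> <LEVEL> <service> [<request-id>] <message>"
LINE_RE = re.compile(r"^(\S+) (\w+) (\S+) \[(\S+)\] (.*)$")


def build_index(lines):
    """Parse log lines and index the records by level and by request ID."""
    by_level = defaultdict(list)
    by_request = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed format
        ts, level, service, req_id, msg = m.groups()
        record = {"ts": ts, "level": level, "service": service,
                  "req_id": req_id, "msg": msg}
        by_level[level].append(record)
        by_request[req_id].append(record)
    return by_level, by_request


def error_context(lines):
    """For each request that logged an ERROR, return all of its records (RCA input)."""
    by_level, by_request = build_index(lines)
    failing = sorted({r["req_id"] for r in by_level["ERROR"]})
    return {req: by_request[req] for req in failing}


# Made-up sample logs from several services.
logs = [
    "10:00:01 INFO nova-api [req-1] POST /servers",
    "10:00:02 INFO nova-api [req-2] POST /servers",
    "10:00:03 INFO nova-scheduler [req-1] host selected",
    "10:00:04 ERROR nova-compute [req-1] port binding failed",
]
for req, records in error_context(logs).items():
    print(req, [r["service"] for r in records])
```

Filtering on ERROR alone would show only the `nova-compute` line; joining on the request ID recovers the whole path of the failing request.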

• Pros:
  – Quickly troubleshoot problems of systems/appliances.
  – Reduce the cost of log storage under PCI DSS or HIPAA requirements.
• Cons:
  – Depend mostly on system/appliance logs.
  – Require more effort for sizing, deploying, maintaining, and operating these logging solutions.
  – Eat up resources (mostly storage) → may not be suitable for small sensors.

• Example 01:
  – A single request to launch one VM in an OpenStack cloud can go through at least 4 micro-services.
  – INFO-level logs sometimes contain misleading or insufficient information for troubleshooting.
  – Turning on the DEBUG log level:
    • produces too much information and eats up storage;
    • makes the overhead threshold hard to control.
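The Python sketch below illustrates both points of Example 01 with the standard `logging` module. A request ID (the hypothetical `req-42`, in the spirit of the request IDs OpenStack services write into their logs) is attached to every record so the hops of one request can be correlated, and switching the level from INFO to DEBUG doubles the output even in this toy. The four service names and the one-DEBUG-line-per-hop ratio are assumptions; real ratios depend on the services.

```python
import contextvars
import io
import logging

# Current request ID; "-" when no request is being handled.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record so the formatter can print it."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def make_logger(stream, level):
    logger = logging.getLogger("launch-vm-demo")
    logger.handlers.clear()      # make repeated calls idempotent
    logger.propagate = False     # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s [%(request_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def launch_vm(logger):
    # Hypothetical hops of one "launch VM" request.
    for service in ("api", "scheduler", "compute", "network"):
        logger.info("%s handled the request", service)
        logger.debug("%s internal state dump", service)


def run(level):
    stream = io.StringIO()
    logger = make_logger(stream, level)
    token = request_id.set("req-42")
    try:
        launch_vm(logger)
    finally:
        request_id.reset(token)
    return stream.getvalue().splitlines()


print(len(run(logging.INFO)), "lines at INFO")    # 4 lines at INFO
print(len(run(logging.DEBUG)), "lines at DEBUG")  # 8 lines at DEBUG
```

The level is global to the logger, so DEBUG is all-or-nothing per service; that is what makes the overhead hard to bound.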

• Example 02:
  – ELK/Graylog requires some tweaking and effort for visualization, collection, profiling, and RCA in a distributed environment.
  – Consider the following queries in environments with more than 10 services:
    • “Find me the root cause of all failed requests that process business X.”
    • “Find me requests where the user was logged in, the request took more than two seconds, and a DB transaction was held open for more than 500 ms.”
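With workflow-centric data (one trace per request, made of timed spans), the second query becomes a plain filter instead of a cross-service log join. The Python sketch assumes a hypothetical span model: the first span is the root request span, a `user.logged_in` tag marks authenticated requests, and DB transactions are spans named `db.transaction`. None of these names come from a particular tracer.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


@dataclass
class Trace:
    trace_id: str
    spans: list  # spans[0] is assumed to be the root (whole-request) span


def slow_logged_in_requests(traces, request_ms=2000, tx_ms=500):
    """Logged-in requests slower than request_ms that held a DB transaction > tx_ms."""
    hits = []
    for trace in traces:
        root = trace.spans[0]
        logged_in = root.tags.get("user.logged_in", False)
        long_tx = any(s.name == "db.transaction" and s.duration_ms > tx_ms
                      for s in trace.spans)
        if logged_in and root.duration_ms > request_ms and long_tx:
            hits.append(trace.trace_id)
    return hits


# Made-up traces.
traces = [
    Trace("t1", [Span("GET /orders", 2500, {"user.logged_in": True}),
                 Span("db.transaction", 700)]),
    Trace("t2", [Span("GET /orders", 2500, {"user.logged_in": True}),
                 Span("db.transaction", 300)]),
    Trace("t3", [Span("GET /orders", 3000), Span("db.transaction", 900)]),
]
print(slow_logged_in_requests(traces))  # ['t1']
```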

Tracing Requirements
Operator Needs
• Address the Data Explosion: logs, metrics, events, active/passive checks, …
• End-to-End Debugging: understand what the real issue is and what is affected when errors occur.
• Visibility: deliver centralized intelligence for cloud operations at scale.
• Resource Utilization: understand resource availability and utilization.
Solution Requirements
• Able to collect, store, and access all types of data in one place.
• Highly performant and scalable platform.
• Flexible processing pipeline that can support multiple use cases: diagnostics, root cause analysis, SLA calculations, utilization reporting, …
• Extensible platform that supports new types of data and processing.

Tracing Requirements
• Users need a centralized solution that provides enough information on both the machine-centric view (monitoring) and the workflow-centric view (tracing).
  – Provide a general picture of every workflow: the communication steps and the request/response time of each step, for performance review.
  – Show monitoring metrics of the hardware/services involved in each step at the time of investigation.
  – Provide a general-purpose RCA method for quick troubleshooting.
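The "general picture" of a workflow is often drawn as a waterfall: one bar per step, positioned by start time and sized by duration. The Python sketch below renders such a view as text from (step, start, end) tuples; the step names and millisecond timings are made up, and a real tracing UI would read them from collected trace data.

```python
def waterfall(steps, width=40):
    """Render (name, start_ms, end_ms) steps as text bars on a shared time axis."""
    t0 = min(start for _, start, _ in steps)
    t1 = max(end for _, _, end in steps)
    total = max(1, t1 - t0)  # avoid division by zero for an empty time span
    rows = []
    for name, start, end in steps:
        offset = (start - t0) * width // total           # bar position
        length = max(1, (end - start) * width // total)  # bar length, at least 1 char
        rows.append(f"{name:<10}{' ' * offset}{'#' * length} {end - start} ms")
    return rows


# Made-up timings of one "launch VM" request, relative to its start.
steps = [
    ("api", 0, 400),
    ("keystone", 20, 60),
    ("scheduler", 80, 180),
    ("compute", 200, 390),
]
print("\n".join(waterfall(steps)))
```

Integer arithmetic keeps the bar positions exact; the longest child bar (`compute` here) is the first place to look during a performance review.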

Workflow-Centric Solutions: A Quick Survey
Many solutions aim at workflow-centric tracing. They fall into 3 categories [1]:
1. Explicit metadata propagation: inject tracing metadata into the existing system (Zipkin, Kieker, X-Trace, Tracelytics, Cloudera HTrace, ExplorViz, OpenTracing - CNCF).
2. Schema-based: rely on the system's event semantics and use temporal schemas of custom log messages for tracing (Magpie).
3. Black-box tracing: rely on log analysis to infer relationships among events (FChain, NetMedic).
[1] HANSEL: Diagnosing Faults in OpenStack, IBM Research.

Workflow-Centric Solutions (1)
• Traditional workflow
  [Figure: a request (Req) passing through Service A, Service B, Service C, and Service D.]

• Explicit metadata propagation
  – Inject metadata into each request/response and send it to a tracing mechanism (Zipkin, Dapper, etc.).
  [Figure: request (Req) flow through the services, with a central Tracing Mechanism.]
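A minimal Python sketch of explicit metadata propagation, modeled on Zipkin's B3 header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`): the caller allocates the child span ID and sends it with the trace ID, and the callee records its span under that ID. The in-memory `reported` list is a hypothetical stand-in for sending spans to the tracing mechanism; real tracers also record timing data for each span.

```python
import secrets


def new_id(nbytes=8):
    """Random lower-hex ID: 8 bytes -> 16 hex chars (span), 16 bytes -> 32 (trace)."""
    return secrets.token_hex(nbytes)


def outgoing_headers(trace_id, current_span_id, sampled=True):
    """Caller side: allocate the callee's span ID and put the trace context on the wire."""
    headers = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": new_id(),
        "X-B3-Sampled": "1" if sampled else "0",
    }
    if current_span_id is not None:
        headers["X-B3-ParentSpanId"] = current_span_id
    return headers


def handle(service, headers, reported):
    """Callee side: join the incoming trace, or start a new one if no metadata arrived."""
    trace_id = headers.get("X-B3-TraceId") or new_id(16)
    span_id = headers.get("X-B3-SpanId") or new_id()
    parent_id = headers.get("X-B3-ParentSpanId")
    reported.append({"service": service, "trace": trace_id,
                     "span": span_id, "parent": parent_id})
    return trace_id, span_id


# Request enters at Service A without metadata, then flows A -> B -> C -> D.
reported = []
trace_id, span_id = handle("Service A", {}, reported)
for service in ("Service B", "Service C", "Service D"):
    trace_id, span_id = handle(service, outgoing_headers(trace_id, span_id), reported)

for r in reported:
    print(r["service"], "parent:", r["parent"], "span:", r["span"])
```

All four spans share one trace ID and each parent points at the previous hop, which is exactly what lets the tracing mechanism rebuild the workflow; the price is the code change in every service.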

• Explicit metadata propagation
  – Pros:
    • Gives enough detail for tracing problems.
    • Highly scalable.
  – Cons:
    • Requires modifying the code base to inject metadata into the header of each request and response.
    • Increases network packet size (possibly only slightly: around 500 bytes for Zipkin).

• Schema-based: relies on the semantics of events generated by the system (including the OS, services, and applications), then joins all related event schemas for the final inference.
[Figure: a request (Req) triggers events (Authenticate ×3, Get Image, Create port, IP and attach, DB read/write) observed by an Event Listener.]
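The joining step can be sketched as follows. This is a heavily simplified Python illustration, not Magpie's actual algorithm: a hypothetical schema declares which attributes (`conn_id`, `token`, `port_id`) link events of the same request, events sharing any such value are merged with union-find, and each group is ordered by timestamp. A group is only final once all events have arrived, which is why results are delayed until collection completes.

```python
from collections import defaultdict

# Hypothetical schema: attributes whose shared values link events of the same request.
JOIN_ATTRS = {"conn_id", "token", "port_id"}


def group_requests(events):
    """Join events that share a join-attribute value, then order each group by time."""
    parent = list(range(len(events)))  # union-find forest over event indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (attribute, value) -> index of the first event carrying it
    for i, event in enumerate(events):
        for attr in sorted(JOIN_ATTRS.intersection(event)):
            key = (attr, event[attr])
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # same request: merge groups
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i, event in enumerate(events):
        groups[find(i)].append(event)
    # "Temporal" part of the schema, simplified to ordering by timestamp.
    return [sorted(group, key=lambda e: e["t"]) for group in groups.values()]


# Made-up events from two interleaved "launch VM" requests.
events = [
    {"t": 1, "name": "Authenticate", "conn_id": "c1", "token": "tk1"},
    {"t": 2, "name": "Authenticate", "conn_id": "c2", "token": "tk2"},
    {"t": 3, "name": "Get Image", "token": "tk1"},
    {"t": 4, "name": "Create port, IP and attach", "token": "tk1", "port_id": "p9"},
    {"t": 5, "name": "DB write", "port_id": "p9"},
    {"t": 6, "name": "Get Image", "token": "tk2"},
]
for group in group_requests(events):
    print([e["name"] for e in group])
```

The "DB write" event carries no token, yet it lands in the first request through the shared `port_id`; the quality of the result depends entirely on how well the schema captures such links.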

• Schema-based
  – Pros:
    • Requires fewer modifications to the code base.
  – Cons:
    • Low scalability (the result is delayed until all events are collected).
    • Less detail than explicit metadata propagation (success depends on the event semantics, the event list, and the way schemas are joined → a warehouse of event semantics must be built).

• Black-box tracing: collect the logs of all services, then analyze them to infer the root cause of the problem.
[Figure: logs from each service and the DB flow to a central Log Collector and Analyzer.]
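A deliberately naive Python sketch of black-box inference, not the method of FChain, NetMedic, or any other tool: with no request metadata, it can only correlate by time, ranking services by their first ERROR inside a window before a user-visible failure. The log tuples and the 60-second window are made up.

```python
from collections import Counter


def rank_suspects(logs, failure_time, window=60):
    """Naive black-box heuristic: among services that logged ERRORs in the
    `window` seconds before a failure, rank the earliest first (ties: more errors first)."""
    first_error = {}
    error_count = Counter()
    for ts, service, level in logs:
        if level == "ERROR" and failure_time - window <= ts <= failure_time:
            error_count[service] += 1
            first_error[service] = min(ts, first_error.get(service, ts))
    return sorted(first_error, key=lambda s: (first_error[s], -error_count[s]))


# Made-up (timestamp_s, service, level) records; the user-visible failure happens at t=150.
logs = [
    (20, "keystone", "ERROR"),       # too old: outside the window
    (100, "glance", "INFO"),
    (130, "neutron", "ERROR"),
    (135, "nova-compute", "ERROR"),
    (140, "nova-api", "ERROR"),
    (145, "nova-compute", "ERROR"),
]
print(rank_suspects(logs, failure_time=150))
# ['neutron', 'nova-compute', 'nova-api']
```

Time correlation is not causation: the earliest error may be an unrelated symptom, and widening the window changes the answer, which is one reason such approaches have a high error rate.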

• Black-box tracing:
  – Pros:
    • No modification to the code base.
  – Cons:
    • High error rate (most are probabilistic data-mining approaches).

Example (1)
Magpie: Schema-based

Example (2)
Zipkin: Explicit metadata propagation

Demo with OpenStack
OSProfiler: a small library for explicit metadata propagation
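OSProfiler propagates trace context between OpenStack services (a base ID for the whole trace and a parent ID for the current point) and only accepts it when it is signed with a configured HMAC key. The stdlib-only Python sketch below imitates those two ideas: HMAC-signed trace info and nested trace points that share a base ID. It is a hypothetical toy, not OSProfiler's code or wire format; the real library exposes its own API (for example a `Trace` context manager and a `trace` decorator in `osprofiler.profiler`), so check its documentation for exact usage.

```python
import contextlib
import hashlib
import hmac
import json
import uuid

HMAC_KEY = b"SECRET_KEY"  # hypothetical key shared by client and services


def sign(info):
    """Serialize trace info and return (payload, signature)."""
    payload = json.dumps(info, sort_keys=True)
    signature = hmac.new(HMAC_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return payload, signature


def verify(payload, signature):
    """Return the trace info if the signature matches, else None (do not trace)."""
    expected = hmac.new(HMAC_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return json.loads(payload) if hmac.compare_digest(expected, signature) else None


class ToyTracer:
    """Records nested start/stop points that all share one base ID."""

    def __init__(self, base_id, parent_id):
        self.base_id = base_id
        self.stack = [parent_id]
        self.points = []  # (kind, name, parent_id, point_id)

    @contextlib.contextmanager
    def trace(self, name):
        point_id = uuid.uuid4().hex
        self.points.append(("start", name, self.stack[-1], point_id))
        self.stack.append(point_id)
        try:
            yield point_id
        finally:
            self.stack.pop()
            self.points.append(("stop", name, self.stack[-1], point_id))


# Client: attach signed trace info to the request.
payload, signature = sign({"base_id": uuid.uuid4().hex, "parent_id": None})

# Service: trace only if the signature checks out.
info = verify(payload, signature)
if info is not None:
    tracer = ToyTracer(info["base_id"], info["parent_id"])
    with tracer.trace("wsgi"):
        with tracer.trace("db"):
            pass  # the real work would happen here
    for kind, name, parent, point in tracer.points:
        print(kind, name)
```

Signing keeps tracing opt-in: an unsigned or tampered request is simply not traced, so the overhead is paid only for requests an operator chose to profile.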

Q & A
THANK YOU!