2017 February 07
Hieu LE (hieulq@vn.fujitsu.com)
Fujitsu Vietnam Limited
PODC (Platform Offshore Development Center)
Vietnam OpenStack Community - VFOSSA
Logging/Request Tracing in Distributed Environment
Copyright 2017 Fujitsu Vietnam Limited
/me
2 APRICOT 2017
Hieu LE
Vietnam Official OpenStack Community Organizer
VFOSSA Executive Member
OpenStack Project leader @ Fujitsu
OpenStack ATC/AUC
Email: hieulq@vn.fujitsu.com
Outline
1. Intro
2. Current logging solution
   • Pros
   • Cons
3. Tracing requirements
4. Request tracing
   • Demo with OpenStack
Intro
• Distributed environments:
  – Cloud computing and fog computing.
  – IoT environments.
  – Microservice architectures.
IoT – Fog – Cloud
[Figure: IoT, fog, and cloud architecture. Labels: (Virtual) Storage Services/Servers; Virtual Compute Resources; Virtual Network; O2M2, Thingworx, DeviceHive, and other platforms; multiple clouds; routing, optimizing paths, data pre-processing.]
• What if something goes wrong in our system?
• How can we resolve the problem as quickly as possible?
Current Logging solution (1)
• ELK, Graylog:
  – Collect logs from systems and appliances.
  – Index and filter logs to support root cause analysis (RCA).
  – Multiple alerting/notification mechanisms.
  – Visualization based on the user's needs.
Current Logging solution (2)
• Pros:
  – Quickly troubleshoot problems in systems and appliances.
  – Reduce the cost of storing logs under PCI DSS or HIPAA requirements.
• Cons:
  – Depends mostly on system/appliance logs.
  – Requires more effort to size, deploy, maintain, and operate the logging solution.
  – Eats up resources (mostly storage), so it may not be suitable for small sensors.
Current Logging solution (3)
• Example 01:
  – A single request to launch one VM in an OpenStack cloud can pass through at least four microservices.
  – INFO-level logs sometimes contain misleading or insufficient information for troubleshooting.
  – Turning on DEBUG-level logging produces too much information, eats up storage, and makes the overhead hard to keep under a threshold.
Current Logging solution (4)
• Example 02:
  – ELK/Graylog requires tuning and effort for visualization, collection, profiling, and RCA in a distributed environment.
  – Consider the following queries in environments with more than 10 services:
    “Find me the root cause of all failed requests that process business X.”
    “Find me requests where the user was logged in, the request took more than two seconds, and a DB transaction was held open for more than 500 ms.”
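Once each request carries its own trace record, the second query above reduces to a filter over those records. The following is a minimal sketch under stated assumptions: the `Span` and `Trace` layouts, the field names, and the sample values are hypothetical and do not come from ELK, Graylog, or any tracing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One timed operation inside a request (hypothetical layout)."""
    service: str
    operation: str
    duration_ms: float


@dataclass
class Trace:
    """All spans recorded for a single request, keyed by a request ID."""
    request_id: str
    user_logged_in: bool
    total_ms: float
    spans: List[Span] = field(default_factory=list)


def slow_requests_with_long_db_tx(traces, min_total_ms=2000.0, min_db_ms=500.0):
    """Return IDs of logged-in requests slower than min_total_ms that held
    a DB transaction open for longer than min_db_ms."""
    hits = []
    for t in traces:
        if not t.user_logged_in or t.total_ms <= min_total_ms:
            continue
        if any(s.operation == "db_transaction" and s.duration_ms > min_db_ms
               for s in t.spans):
            hits.append(t.request_id)
    return hits


# Illustrative sample data only.
traces = [
    Trace("req-1", True, 2500.0, [Span("nova-api", "db_transaction", 650.0)]),
    Trace("req-2", True, 2500.0, [Span("nova-api", "db_transaction", 120.0)]),
    Trace("req-3", False, 3000.0, [Span("glance", "db_transaction", 900.0)]),
    Trace("req-4", True, 800.0, [Span("neutron", "db_transaction", 700.0)]),
]
print(slow_requests_with_long_db_tx(traces))
```

Answering the same question from plain log lines would first require reassembling the per-request record from the logs of every service involved, which is the effort the slide refers to.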
Tracing Requirements
Operator needs:
• Address the data explosion: logs, metrics, events, active/passive checks, …
• End-to-end debugging: understand what the real issue is and what is affected when errors occur.
• Visibility: deliver centralized intelligence for cloud operations at scale.
• Resource utilization: understand resource availability and utilization.
Solution requirements:
• Able to collect, store, and access all types of data in one place.
• Highly performant and scalable platform.
• Flexible processing pipeline that can support multiple use cases: diagnostics, root cause analysis, SLA calculations, utilization reporting, …
• Extensible platform that can be extended to support new types of data and processing.
Tracing Requirements
• Users need a centralized solution that provides enough information from both the machine-centric (monitoring) and the workflow-centric (tracing) views.
  – Provide a general picture of every workflow: the communication steps and the request/response time of each step, for performance review.
  – Show monitoring metrics of the hardware/services involved in each step at the time of investigation.
  – Provide a general-purpose RCA method for quick troubleshooting.
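The "general picture" of a workflow can be thought of as a per-step timing report. The sketch below assumes spans are available as dictionaries with hypothetical `service`, `start_ms`, and `end_ms` keys; the timings are invented for illustration.

```python
def step_report(spans):
    """Return (step, service, duration_ms, share_of_total) rows in call order."""
    ordered = sorted(spans, key=lambda s: s["start_ms"])
    total = sum(s["end_ms"] - s["start_ms"] for s in ordered)
    rows = []
    for step, s in enumerate(ordered, start=1):
        duration = s["end_ms"] - s["start_ms"]
        # Guard against an empty or zero-length workflow.
        share = round(duration / total, 2) if total else 0.0
        rows.append((step, s["service"], duration, share))
    return rows


# Illustrative values only; not measurements from a real deployment.
spans = [
    {"service": "nova-api", "start_ms": 40, "end_ms": 100},
    {"service": "keystone", "start_ms": 0, "end_ms": 40},
    {"service": "glance", "start_ms": 100, "end_ms": 200},
]
for row in step_report(spans):
    print(row)
```

A report like this shows at a glance which step dominates the request time, which is where the monitoring metrics for that step become relevant.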
Workflow Centric solution quick survey
Many solutions aim at workflow-centric tracing. They fall into three categories [1]:
1. Explicit metadata propagation: inject tracing metadata into the existing system (Zipkin, Kieker, X-Trace, Tracelytics, Cloudera HTrace, ExplorViz, OpenTracing - CNCF).
2. Schema-based: rely on the event semantics of the system and use a temporal schema of custom log messages for tracing (Magpie).
3. Black-box tracing: rely on log analysis to infer relationships among events (FChain, NetMedic).
[1] HANSEL: Diagnosing Faults in OpenStack – IBM Research
Workflow centric solutions (1)
[Figure: traditional workflow; a request (Req) flows across Service A, Service B, Service C, and Service D.]
Workflow centric solutions (2)
• Explicit metadata propagation
  – Inject metadata into each request/response and send it to a tracing mechanism (Zipkin, Dapper, ...).
[Figure: explicit metadata tracing workflow; a request (Req) flows across Service A, Service B, Service C, and Service D, with metadata sent to the tracing mechanism.]
Workflow centric solutions (3)
• Explicit metadata propagation
  – Pros:
    • Gives enough detail to trace problems.
    • Highly scalable.
  – Cons:
    • Requires modifying the code base to inject metadata into the header of each request and response.
    • Increases network packet size (slightly; around 500 bytes in the case of Zipkin).
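The propagation idea can be sketched in a few lines: each service reads the tracing metadata from the incoming headers, records a span, and injects updated metadata into the outgoing request. This is an illustrative sketch only; the header names, the `traced_call` helper, and the in-memory `COLLECTOR` are hypothetical and are not the API of Zipkin, Dapper, or OSProfiler.

```python
import time
import uuid

COLLECTOR = []  # stands in for the tracing mechanism (hypothetical)

TRACE_HEADER = "X-Trace-Id"          # hypothetical header names
PARENT_HEADER = "X-Parent-Span-Id"


def traced_call(service, headers, downstream=None):
    """Handle a request in `service`, record a span, and forward the
    tracing metadata to the next service in the chain."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    parent_id = headers.get(PARENT_HEADER)
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()

    if downstream:
        next_service, rest = downstream[0], downstream[1:]
        # Inject metadata into the outgoing request headers.
        out_headers = dict(headers)
        out_headers[TRACE_HEADER] = trace_id
        out_headers[PARENT_HEADER] = span_id
        traced_call(next_service, out_headers, rest)

    # Report the finished span to the collector.
    COLLECTOR.append({
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": parent_id,
        "service": service,
        "duration_ms": (time.perf_counter() - start) * 1000.0,
    })
    return trace_id


trace_id = traced_call("A", {}, ["B", "C", "D"])
for span in COLLECTOR:
    print(span["service"], span["parent_id"], "->", span["span_id"])
```

Every span shares one trace ID and points to its caller's span ID, so the collector can rebuild the call tree without guessing. The cost is visible too: every service has to run this code, and every request carries the extra headers.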
Workflow centric solutions (4)
• Schema-based: based on the semantics of events generated by the system (including the OS, services, and applications); all related event schemas are then joined for the final inference.
[Figure: Service A, Service B, Service C, and Service D handle a request (Req) with the steps Authenticate (three times), Get Image, Create port, IP and attach, and Read/Write DB; an Event Listener observes the events.]
Workflow centric solutions (5)
• Schema-based
  – Pros:
    • Requires fewer modifications to the code base.
  – Cons:
    • Low scalability: the result is delayed until all events are collected.
    • Less detail than explicit metadata propagation: the event semantics, the event list, and the way schemas are joined determine the success of this approach, so a warehouse of event semantics must be built.
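To make the schema-join idea concrete, the sketch below groups events into requests by following attribute values that the schema declares as join keys. It is a simplified illustration, not Magpie's algorithm; the `SCHEMA` table, the event types, and the attribute names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical event schema: for each event type, the attributes whose
# values tie the event to other events of the same request.
SCHEMA = {
    "api_request":  ["token"],
    "authenticate": ["token"],
    "get_image":    ["token", "image_id"],
    "create_port":  ["token", "port_id"],
    "db_write":     ["port_id"],
}


def join_events(events):
    """Group events into requests by following shared join-attribute values."""
    parent = list(range(len(events)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (attribute, value) -> index of the first event carrying it
    for i, ev in enumerate(events):
        for attr in SCHEMA.get(ev["type"], []):
            value = ev.get(attr)
            if value is None:
                continue
            key = (attr, value)
            if key in seen:
                parent[find(i)] = find(seen[key])
            else:
                seen[key] = i

    groups = defaultdict(list)
    for i, ev in enumerate(events):
        groups[find(i)].append(ev)
    # Order each request's events by timestamp.
    return [sorted(g, key=lambda e: e["ts"]) for g in groups.values()]


# Illustrative events only.
events = [
    {"type": "api_request", "ts": 1, "token": "t1"},
    {"type": "authenticate", "ts": 2, "token": "t1"},
    {"type": "api_request", "ts": 3, "token": "t2"},
    {"type": "create_port", "ts": 4, "token": "t1", "port_id": "p9"},
    {"type": "db_write", "ts": 5, "port_id": "p9"},
]
for request in join_events(events):
    print([e["type"] for e in request])
```

The `db_write` event carries no token and is attached to the first request only through `port_id`, which shows why the quality of the schema decides the quality of the result, and why the join cannot finish until all events have arrived.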
Workflow centric solutions (6)
• Black-box tracing: collect the logs of all services, then analyze them to infer the root cause of a problem.
[Figure: Service A, Service B, Service C, Service D, and the DB each send logs to a log collector and analyzer.]
Workflow centric solutions (7)
• Black-box tracing
  – Pros:
    • No modification to the code base.
  – Cons:
    • High error rate, since most approaches rely on probabilistic data mining.
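A toy version of black-box inference shows where the error rate comes from. The sketch below guesses service dependencies purely from how often one service logs shortly after another. It is a deliberately naive heuristic written for illustration, not the method used by FChain or NetMedic; `infer_edges`, `window`, and `min_support` are hypothetical names.

```python
from collections import Counter


def infer_edges(log_lines, window=0.5, min_support=2):
    """Guess service-to-service dependencies from timestamps alone.

    log_lines: list of (timestamp_seconds, service) tuples.
    An edge X -> Y is reported when Y logs within `window` seconds after X
    at least `min_support` times. This is a heuristic and can be wrong.
    """
    lines = sorted(log_lines)
    counts = Counter()
    for i, (t_i, svc_i) in enumerate(lines):
        for t_j, svc_j in lines[i + 1:]:
            if t_j - t_i > window:
                break  # later lines are even further away in time
            if svc_j != svc_i:
                counts[(svc_i, svc_j)] += 1
    return {edge for edge, n in counts.items() if n >= min_support}


# Illustrative log timestamps only.
logs = [
    (0.00, "A"), (0.10, "B"),
    (5.00, "A"), (5.12, "B"),
    (9.00, "C"), (9.05, "A"),   # coincidence: seen only once
]
print(infer_edges(logs))
```

With `min_support=1` the coincidental C -> A pair would be reported as a dependency, and under concurrent load unrelated requests interleave in exactly this way, so the inferred relationships are only probable, never certain.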
Example (1)
Magpie: schema-based
Example (2)
Zipkin: explicit metadata propagation
Demo with OpenStack
OSProfiler: a small library for explicit metadata propagation
Q & A
THANK YOU!
