KEVIN CRAWLEY
Instana + Single Music
Observability Workshop
w/ Jaeger and Prometheus
https://bit.ly/ot-ee-workshop
A system is observable if the behavior of the
entire system can be determined by only looking
at its inputs and outputs.
Lesson: control theory is a well-documented
discipline that we can learn from rather than
trying to reinvent it
What is Observability?
Kalman, 1961 paper
On the general theory of control systems
“Observability aims to provide highly granular insights
into the behavior of production systems along with
rich context, perfect for debugging and performance
analysis purposes.” – Cindy Sridharan @copyconstruct
What is the goal of observability?
How many of you are running
staging environments?
Why does my organization need
Observability?
Now, how many of you
actually trust your staging
environments?
Why does my organization need
Observability?
This is your staging environment
… and this is your prod environment
• Gain a basic understanding of Distributed Tracing and
“How it works”
• Implement Metrics and Tracing in a small microservice
app using FOSS tools
• Understand how metrics and distributed tracing can help
your organization manage complexity
• Understand the limitations of FOSS and the challenges
ahead
What are the goals of this workshop?
• Workshop (1-1.5 hours)
• How Does Distributed Tracing Work
• Challenges with FOSS monitoring
• Advanced Use Cases w/ Distributed Tracing
(Single Music)
• Q&A
Agenda
Lab 01 - Setting up Kubernetes in Docker Enterprise Edition Lab
Lab 02 - Setting up GitLab and our Microservice Application
Repository Kubernetes Integration
Lab 03 - Deploying our Microservice Application and Adding
Observability
Lab 04 - Monitoring Application Metrics with Grafana / Prometheus
Lab 05 - Observing with Jaeger and Breaking Things
Lab 06 - Advanced Analytics and Use Cases with Automated
Distributed Tracing
Workshop Overview
How does distributed tracing work?
At runtime, custom headers / metadata are injected into
each request. They carry identifiers that enable trace
backends to correlate spans between requests
• X-B3-TraceId: 128 or 64 lower-hex encoded bits
• X-B3-SpanId: 64 lower-hex encoded bits
• X-B3-ParentSpanId: 64 lower-hex encoded bits
• X-B3-Sampled: “1” (sampled) or “0” (not sampled)
• X-B3-Flags: “1” enables debug
It’s literally just headers / metadata
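The header list above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the workshop labs: the function name `new_b3_headers` and its parameters are made up for this example, and it only covers the first (root) span of a trace.

```python
import secrets

def new_b3_headers(sampled=True, wide_trace_id=True):
    """Build B3 headers for the root span of a new trace (hypothetical helper)."""
    # 16 random bytes -> 32 lower-hex chars (128 bits); 8 bytes -> 16 chars (64 bits).
    trace_id = secrets.token_hex(16 if wide_trace_id else 8)
    # Span ids are always 64 bits, i.e. 16 lower-hex characters.
    span_id = secrets.token_hex(8)
    # A root span has no parent, so X-B3-ParentSpanId is left out entirely.
    return {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",
    }

headers = new_b3_headers()
print(headers)
```

The resulting dictionary can be passed as extra headers to whatever HTTP client the service already uses.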
HTTP Request Example
service-a requests:
GET service-b:8080/api/groceries
X-B3-TraceId: af38bc9
X-B3-SpanId: b9ca
X-B3-ParentSpanId: nil
service-b receives:
GET service-b:8080/api/groceries
X-B3-TraceId: af38bc9
X-B3-SpanId: b9ca
X-B3-ParentSpanId: nil
service-b requests:
GET service-c:8080/api/products
X-B3-TraceId: af38bc9
X-B3-SpanId: a3bc
X-B3-ParentSpanId: b9ca
service-c receives:
GET service-c:8080/api/products
X-B3-TraceId: af38bc9
X-B3-SpanId: a3bc
X-B3-ParentSpanId: b9ca
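The hand-off in this example, where service-b turns the headers it received into the headers it sends to service-c, might look roughly like the sketch below. The helper name `child_b3_headers` is an assumption for illustration; the shortened ids are the ones from the slide, whereas real ids are 16 or 32 hex characters.

```python
import secrets

def child_b3_headers(incoming):
    """Derive headers for an outbound call made while handling `incoming`."""
    outgoing = {
        # The trace id never changes within one trace.
        "X-B3-TraceId": incoming["X-B3-TraceId"],
        # The span we are currently in becomes the parent of the new span.
        "X-B3-ParentSpanId": incoming["X-B3-SpanId"],
        # The outbound call gets a fresh 64-bit span id.
        "X-B3-SpanId": secrets.token_hex(8),
    }
    # Pass the sampling decision along unchanged if the caller made one.
    if "X-B3-Sampled" in incoming:
        outgoing["X-B3-Sampled"] = incoming["X-B3-Sampled"]
    return outgoing

received_by_b = {"X-B3-TraceId": "af38bc9", "X-B3-SpanId": "b9ca"}
sent_to_c = child_b3_headers(received_by_b)
print(sent_to_c)
```

Because every hop repeats the same three steps, a trace backend can later rebuild the call tree from the parent ids alone.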
Gantt Chart
GET /api/groceries 800ms
GET /api/groceries 550ms
GET /api/products 400ms
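One thing a backend can derive from these three bars is how much time each span spent on its own work rather than waiting on a downstream call. The sketch below assumes the bars nest in the order shown (each span is the parent of the next) and that child calls do not overlap; the span ids `s1` to `s3` are invented for this example.

```python
from collections import defaultdict

# Durations come from the chart; ids and parent links are assumptions.
spans = [
    {"id": "s1", "parent": None, "name": "GET /api/groceries", "ms": 800},
    {"id": "s2", "parent": "s1", "name": "GET /api/groceries", "ms": 550},
    {"id": "s3", "parent": "s2", "name": "GET /api/products", "ms": 400},
]

def self_times(spans):
    """Return span id -> time not accounted for by direct children."""
    child_ms = defaultdict(int)
    for span in spans:
        if span["parent"] is not None:
            child_ms[span["parent"]] += span["ms"]
    return {span["id"]: span["ms"] - child_ms[span["id"]] for span in spans}

result = self_times(spans)
for span in spans:
    print(f'{span["name"]}: total {span["ms"]}ms, self {result[span["id"]]}ms')
```

Under these assumptions the outer span spends 250ms outside the call it wraps, the middle span 150ms, and the innermost span all 400ms.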
• Correlation is nearly impossible across
multiple vendors / solutions (Logging, Metrics,
Traces)
• Large scale applications require equally large
scale monitoring (cpu/mem, i/o, distributed
systems, clustered storage, sharded TSDB)
Challenges of FOSS monitoring
Is anything ever truly free?
• Distributed tracing exposes a lot of data which
goes unanalyzed by FOSS tools
• The same holds true for Metrics and Logging
• … and Alerting
Actually, can I just show you what is possible?
Current solutions only collect / display
There is no analysis of the data
How Distributed Tracing and
Log Insights empower Single
Music
Advanced Use Cases
• Operated by 3 engineers (1 FE/1 BE/1 SRE)
• Over 20k transactions / hour, 20+ integrations,
100k LOC, with less than 15% test coverage
• Launched in 2018 with 15 microservices on
Docker Swarm – has since expanded to over 28
microservices with zero additional engineering
personnel
Visualizing Large and Complex
Environments
What happens if we aggregate
timing, error rate, and # of reqs
What can analyzing
Distributed Traces
tell us?
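Aggregating timing, error rate, and request count per endpoint can be sketched as below. The call records are made-up sample data, not figures from Single Music, and `aggregate` is a hypothetical helper rather than a feature of any particular tracing tool.

```python
from collections import defaultdict
from statistics import mean

# (endpoint, duration in ms, is_error) -- invented sample data.
calls = [
    ("GET /api/groceries", 800, False),
    ("GET /api/groceries", 600, False),
    ("GET /api/products", 400, False),
    ("GET /api/products", 200, True),
]

def aggregate(calls):
    """Group calls by endpoint and summarize count, mean latency, error rate."""
    grouped = defaultdict(list)
    for endpoint, ms, is_error in calls:
        grouped[endpoint].append((ms, is_error))
    summary = {}
    for endpoint, rows in grouped.items():
        summary[endpoint] = {
            "requests": len(rows),
            "mean_ms": mean(ms for ms, _ in rows),
            "error_rate": sum(1 for _, err in rows if err) / len(rows),
        }
    return summary

summary = aggregate(calls)
for endpoint, stats in summary.items():
    print(endpoint, stats)
```

With real traces the same grouping would typically be done per service and per time window, so that a change in any of the three numbers stands out.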
Database Optimizations, Caching,
and Concurrency
What problems have
Distributed Tracing
helped solve?
Slow Death
of a Service
• DBO (Hibernate Query) causing O(n log n)
rise in latency and processing time
• Application Dashboard indicated an issue with
overall latency increasing
• A fix was deployed and improvement was observed
immediately
Rise in Latency + Processing Time
Issue
Resolved
• We implemented Redis for caching, and
processing time went down
• However, we didn’t account for token policies
changing and they suddenly began to expire
after 30 seconds
• Alerting on error rates for this endpoint
made us aware of the issue
Caching solved one problem
… but caused another
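One plausible reading of this incident is that cached tokens outlived their new 30-second validity. A common guard is to derive the cache TTL from the token lifetime instead of hard-coding it. The sketch below uses a tiny in-memory class as a stand-in for Redis; `TTLCache`, `cache_token`, the key name, and the 5-second safety margin are all assumptions for illustration, not Single Music's implementation.

```python
class TTLCache:
    """In-memory stand-in for a cache with per-key expiry (not Redis itself)."""

    def __init__(self, clock):
        self._clock = clock  # injectable so expiry can be simulated
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

def cache_token(cache, key, token, token_lifetime_seconds, safety_margin=5):
    """Cache a token for slightly less time than it stays valid."""
    ttl = max(token_lifetime_seconds - safety_margin, 0)
    if ttl > 0:
        cache.set(key, token, ttl)

now = [0]  # simulated clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache_token(cache, "partner-api", "tok-123", token_lifetime_seconds=30)

now[0] = 20
fresh = cache.get("partner-api")   # still inside the 25-second TTL
now[0] = 26
stale = cache.get("partner-api")   # TTL elapsed, caller must fetch a new token
print(fresh, stale)
```

Tying the TTL to the lifetime reported by the token issuer means a policy change shortens the cache window automatically instead of surfacing as errors.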
Context is critical when doing
Contributing Factor Analysis
Metrics are not
standalone, they
have relationships
Logs can benefit from analytics too!
Let’s not forget about
Logging
We utilize a mix of Instana, Logz.io
and Grafana to manage our systems
Custom Dashboards
deliver peace of mind
• Using FOSS monitoring is a great way to both
learn and demonstrate the value of
observability to your peers
• Understand the limitations of FOSS and be
prepared to invest in either 3rd party tooling or
managing your own monitoring infrastructure
Focus on what matters to your business
… at Single Music we focus on delivering music
Schedule a meeting with me!
Want to learn more?
Come visit our booth:
Instana Booth #S23
Rate & Share
Rate this session in the
DockerCon App
Follow me @notsureifkevin
and share #DockerCon

What Are The Drone Anti-jamming Systems Technology?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

DockerCon SF 2019 - Observability Workshop

  • 1. KEVIN CRAWLEY Instana + Single Music Observability Workshop
  • 2. w/ Jaeger and Prometheus Observability Workshop https://bit.ly/ot-ee-workshop
  • 3. A system is observable if the behavior of the entire system can be determined by only looking at its inputs and outputs. Lesson: control theory is a well-documented approach which people can learn from vs trying to reinvent What is Observability? Kalman, 1961 paper On the general theory of control systems
  • 4. "Observability aims to provide highly granular insights into the behavior of production systems along with rich context, perfect for debugging and performance analysis purposes." – Cindy Sridharan @copyconstruct What is the goal of observability?
  • 5. How many of you are running staging environments? Why does my organization need Observability?
  • 6. Now, how many of you actually trust your staging environments? Why does my organization need Observability?
  • 7. This is your staging environment
  • 8. … and this is your prod environment
  • 9.
  • 10. • Gain a basic understanding of Distributed Tracing and “How it works” • Implement Metrics and Tracing in a small microservice app using FOSS tools • Understand how metrics and distributed tracing can help your organization manage complexity • Understand the limitations of FOSS and the challenges ahead What are the goals of this workshop?
  • 11. • Workshop (1-1.5 hours) • How Does Distributed Tracing Work • Challenges with FOSS monitoring • Advanced Use Cases w/ Distributed Tracing (Single Music) • Q&A Agenda
  • 12. Lab 01 - Setting up Kubernetes in Docker Enterprise Edition Lab Lab 02 - Setting up Gitlab and our Microservice Application Repository Kubernetes Integration Lab 03 - Deploying our Microservice Application and Adding Observability Lab 04 - Monitoring Application Metrics with Grafana / Prometheus Lab 05 - Observing with Jaeger and Breaking Things Lab 06 - Advanced Analytics and Use Cases with Automated Distributed Tracing Workshop Overview
  • 13. w/ Jaeger and Prometheus Observability Workshop https://bit.ly/ot-ee-workshop
  • 14. How does distributed tracing work?
  • 15. At runtime, custom headers / metadata are injected into each request; these include identifiers that enable trace backends to correlate spans between requests • X-B3-TraceId: 128 or 64 lower-hex encoded bits • X-B3-SpanId: 64 lower-hex encoded bits • X-B3-ParentSpanId: 64 lower-hex encoded bits • X-B3-Sampled: Bool • X-B3-Flags: “1” includes DEBUG It’s literally just headers / metadata
  • 16. HTTP Request Example service-a requests: GET service-b:8080/api/groceries X-B3-TraceId: af38bc9 X-B3-SpanId: b9ca X-B3-ParentSpanId: nil service-b receives: GET service-b:8080/api/groceries X-B3-TraceId: af38bc9 X-B3-SpanId: b9ca X-B3-ParentSpanId: nil service-b requests: GET service-c:8080/api/products X-B3-TraceId: af38bc9 X-B3-SpanId: a3bc X-B3-ParentSpanId: b9ca service-c receives: GET service-c:8080/api/products X-B3-TraceId: af38bc9 X-B3-SpanId: a3bc X-B3-ParentSpanId: b9ca
  • 17. Gantt Chart GET /api/groceries 800ms GET /api/groceries 550ms GET /api/products 400ms
  • 18. • Correlation is nearly impossible across multiple vendors / solutions (Logging, Metrics, Traces) • Large scale applications require equally large scale monitoring (cpu/mem, i/o, distributed systems, clustered storage, sharded TSDB) Challenges of FOSS monitoring Is anything ever truly free?
  • 19. • Distributed tracing exposes a lot of data which goes unanalyzed by FOSS tools • The same holds true for Metrics and Logging • … and Alerting Actually, can I just show you what is possible? Current solutions only collect / display There is no analysis of the data
  • 20. How Distributed Tracing and Log Insights empower Single Music Advanced Use Cases
  • 21. • Operated by 3 engineers (1 FE/1 BE/1 SRE) • Over 20k transactions / hour, 20+ integrations, 100k LOC, with less than 15% test coverage • Launched in 2018 with 15 microservices on Docker Swarm – has since expanded to over 28 microservices with zero additional engineering personnel
  • 22. Visualizing Large and Complex Environments
  • 23.
  • 24.
  • 25. What happens if we aggregate timing, error rate, and # of reqs What can analyzing Distributed Traces tell us?
  • 26.
  • 27.
  • 28.
  • 29. Database Optimizations, Caching, and Concurrency What problems have Distributed Tracing helped solve?
  • 30. Slow Death of a Service
  • 31. • DBO (Hibernate Query) causing O(n log n) rise in latency and processing time • Application Dashboard indicated an issue with overall latency increasing • Fix deployed and improvement was observed immediately Rise in Latency + Processing Time
  • 33.
  • 34. • We implemented Redis for caching, and processing time went down • However, we didn’t account for token policies changing and they suddenly began to expire after 30 seconds • Alerting around error rates for this endpoint raised our awareness around this issue Caching Solved one problem … but caused another
  • 35.
  • 36.
  • 37.
  • 38. Context is critical when doing Contributing Factor Analysis Metrics are not standalone, they have relationships
  • 39.
  • 40.
  • 41. Logs can benefit from analytics too! Let’s not forget about Logging
  • 42.
  • 43.
  • 44.
  • 45. We utilize a mix of Instana, Logz.io and Grafana to manage our systems Custom Dashboards deliver peace of mind
  • 46.
  • 47.
  • 48. • Using FOSS monitoring is a great way to both learn and demonstrate the value of observability to your peers • Understand the limitations of FOSS and be prepared to invest in either 3rd party tooling or managing your own monitoring infrastructure Focus on what matters to your business … at Single Music we focus on delivering music
  • 49. Schedule a meeting with me! Want to learn more? Come visit our booth @ Instana Booth #S23
  • 50. Rate & Share Rate this session in the DockerCon App Follow me @notsureifkevin and share #DockerCon
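
The header propagation described on slides 15 and 16 can be sketched in a few lines of Python. This is a minimal illustration rather than the workshop's own instrumentation: the helper names `new_id`, `start_trace`, and `child_headers` are hypothetical, and a real service would normally rely on a tracing library (for example a Jaeger or Zipkin client) instead of building the headers by hand. The sketch only assumes the B3 header names shown on the slides.

```python
import secrets
from typing import Dict


def new_id(bits: int = 64) -> str:
    """Return a random lower-hex identifier of the given bit length."""
    return secrets.token_hex(bits // 8)


def start_trace() -> Dict[str, str]:
    """Headers for the first request in a trace (hypothetical helper).

    The root span has no parent, so X-B3-ParentSpanId is simply omitted.
    """
    return {
        "X-B3-TraceId": new_id(64),
        "X-B3-SpanId": new_id(64),
        "X-B3-Sampled": "1",
    }


def child_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Headers a service attaches to a downstream call (hypothetical helper).

    The trace id is copied unchanged, a fresh span id is generated, and the
    incoming span id becomes the parent span id.
    """
    outgoing = {
        "X-B3-TraceId": incoming["X-B3-TraceId"],
        "X-B3-SpanId": new_id(64),
        "X-B3-ParentSpanId": incoming["X-B3-SpanId"],
    }
    # Propagate the sampling decision so every hop makes the same choice.
    if "X-B3-Sampled" in incoming:
        outgoing["X-B3-Sampled"] = incoming["X-B3-Sampled"]
    return outgoing


# service-a calls service-b, which in turn calls service-c.
a_to_b = start_trace()
b_to_c = child_headers(a_to_b)
```

Because every hop shares the same `X-B3-TraceId` and each child records its parent's span id, a trace backend can rebuild the call tree that slide 17 draws as a Gantt chart.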