SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Monitoring with Prometheus
at Scale
Adam Hamsik
adam.hamsik@lablabs.io
Labyrinth Labs
Rock-solid infrastructure and DevOps
● Building rock-solid and secure foundations for all your digital operations. Our
mission is to let you focus on your business without ever needing to worry
about technical issues again.
● Making you ready for growing traffic, safe against new security vulnerabilities
and data-loss.
2
TL;DR
● We will start with common monitoring issues and problems.
● Deploying Prometheus is easy and running a single instance can be sufficient
for most deployments.
● We will have a quick look at AlertManager
● We will talk about scalability limits of prometheus instance, when and how to
use sharding.
● What is Trickster and why you should use it too
● How Thanos/Cortex can help you when all hope is lost.
3
Common Monitoring Problems
● Monitoring tools are limited both technically and conceptually
● Most of existing tools don’t really scale with current infrastructure needs.
● Limited visibility
○ Generally we want to monitor and gather as much information as we can.
○ Even if we don’t need it right away usually it will be useful in a future(I promise)
● No common application monitoring interface. There are different
protocols/standards
○ Openmetrics
○ SNMP
4
Common Monitoring Problems
5
Prometheus Monitoring System
The Prometheus monitoring system and time series database is CNCF graduated
project.
● Originally developed by exGooglers for SoundCloud as their internal monitoring
system
● Inspired by Google’s Borgmon monitoring system
● Open Source under the Apache License
● Written as monolithic application in Go
6
Prometheus Server Overview
● Multi-dimensional data model with time series data identified by metric
name and key/value(labels) pairs
● PromQL, a flexible query language to leverage this dimensionality
● No reliance on distributed storage; single server nodes are autonomous
● Targets are discovered via service discovery or static configuration
● Pushing time series is supported via an intermediary gateway
● Monitor Services not Machines/Servers
7
Prometheus Architecture
81. https://www.prometheus.io/assets/architecture.png
Company Prometheus
Usage
● We deployed first prometheus servers
● Add some services
● Setup trickster as a Grafana Cache
● Add more services/servers
● Continuous adding of CPU/Memory to Prometheus instance
● Setup simple federation/sharding if single instance is too big
● Use Thanos
9
First Prometheus Deployment
● Deploying your first Prometheus server is very easy. Fetch prometheus
binary + config.
● There is a no concept of a Prometheus Cluster
● Generally Prometheus can scale very well with CPU/Memory
○ Providing more cpu/memory allows prometheus to monitor more
metrics
○ It’s hard to run large pod in a kubernetes cluster if it’s as big as a
worker node.
● If job is too big for a single server you can use federation/sharding
(remote reads) for simple scaling
10
Trickster Setup
● Loading complicated/big dashboard on Grafana can overload your
prometheus server
○ Use trickster to cache PromQL results for future reuse
○ Queries on metrics with high cardinality can use a lot of memory on
you prometheus instance[1].
○ Use limits to make sure user will not overload your server
query.max-concurrency/query.max-samples
● Delta Proxy caching - inspects the time range of a client query to
determine what data points are already cached
111. https://www.robustperception.io/limiting-promql-resource-usage
Trickster Setup
121. https://github.com/tricksterproxy/trickster/blob/main/docs/images/partial-cache-hit.png
Trickster Setup
131. https://secure.meetupstatic.com/photos/event/5/7/9/e/600_469882430.jpeg
Metrics Cardinality
● Prometheus performance almost always comes to one thing metrics
cardinality.
● Cardinality describes how many unique values of some metric you have
○ container_tasks_state metric will have a unique (pod/container) pair for each running
container in your cluster
○ custom_api_http_request will have a unique metric for each combination of
url/http_method/env. (/api/v2/users, get, dev; /api/v2/users, post, prod...)
141. https://www.robustperception.io/cardinality-is-key
Bad Metrics Cardinality
151. https://www.robustperception.io/cardinality-is-key
● See example below where we throw away bad fluentd metrics and dropped number of
scrapped metrics by ½
● If you are using fluentd look for fluentd_tail_file_inode, fluentd_tail_file_position
○ In our use case we saw cardinality 1220 from 2 metrics above per node !
Thanos/Cortex as ultimate solution
● If you have multiple kubernetes clusters, datacenters with millions of
metrics and adding more CPU/memory to prometheus is not an option.
○ Consider adding Thanos/Cortex to your infrastructure
● Thanos querier Prometheus Server HA, can load metrics from multiple
prometheus servers and make sure it will present full data to user.
○ Implements Prometheus 1.1 HTTP api.
● Thanos compactor can downsample, change retention or resolution of
your metrics.
● Thanos store is a component which can save your metrics in a AWS S3
compatible object store.
16
Thanos architecture
17
Thanos SideCar
18
● It implements Thanos’ Store API on top of Prometheus’ remote-read API. This allows
Queriers to treat Prometheus servers as yet another source of time series data without
directly talking to its APIs.
● Optionally, the sidecar uploads TSDB blocks to an object storage bucket as Prometheus
produces them every 2 hours. This allows Prometheus servers to be run with relatively
low retention while their historic data is made durable and queryable via object storage.
● Optionally Thanos sidecar is able to watch Prometheus rules and configuration,
decompress and substitute environment variables if needed and ping Prometheus to
reload them.
Thanos Query
19
● The PromQL query is posted to the Querier
● It interprets the query and goes to a pre-filter
● The query fans out its request for stores, prometheuses or other queries on the basis of labels and
time-range requirements
● The Query only sends and receives StoreAPI messages
● After it has collected all the responses, it merges and deduplicates them (if enabled)
● It then sends back the series for the user
1. https://banzaicloud.com/img/blog/multi-cluster-monitoring/life_of_a_query.png
Questions ?
20
Thank You.
We are hiring, remote working DevOps/Kubernetes
engineers.
adam.hamsik@lablabs.io
www.lablabs.io
21

Mais conteúdo relacionado

Mais procurados

AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWSAWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWSsmalltown
 
AWS Lambda and serverless Java | DevNation Live
AWS Lambda and serverless Java | DevNation LiveAWS Lambda and serverless Java | DevNation Live
AWS Lambda and serverless Java | DevNation LiveRed Hat Developers
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
CDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCCDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCsmalltown
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorLili Cosic
 
Secrets management vault cncf meetup
Secrets management vault cncf meetupSecrets management vault cncf meetup
Secrets management vault cncf meetupJuraj Hantak
 
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeaveworks
 
Monitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusMonitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusWeaveworks
 
Kubernetes Meetup: CNI, Flex Volume, and Scheduler
Kubernetes Meetup: CNI, Flex Volume, and SchedulerKubernetes Meetup: CNI, Flex Volume, and Scheduler
Kubernetes Meetup: CNI, Flex Volume, and SchedulerKatie Crimi
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8sVMware Tanzu
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusRed Hat Developers
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...ConSol Consulting & Solutions Software GmbH
 
Developing a user-friendly OpenResty application
Developing a user-friendly OpenResty applicationDeveloping a user-friendly OpenResty application
Developing a user-friendly OpenResty applicationThibault Charbonnier
 
Persist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemPersist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemLibbySchulze
 
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas Hoppe
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas HoppeOSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas Hoppe
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas HoppeNETWAYS
 
Luigi Hostplumber intro slide.pptx (1).pdf
Luigi Hostplumber intro slide.pptx (1).pdfLuigi Hostplumber intro slide.pptx (1).pdf
Luigi Hostplumber intro slide.pptx (1).pdfLibbySchulze
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Waysmalltown
 

Mais procurados (20)

AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWSAWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
 
AWS Lambda and serverless Java | DevNation Live
AWS Lambda and serverless Java | DevNation LiveAWS Lambda and serverless Java | DevNation Live
AWS Lambda and serverless Java | DevNation Live
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Crafting Kubernetes Operators
Crafting Kubernetes OperatorsCrafting Kubernetes Operators
Crafting Kubernetes Operators
 
Shaker
ShakerShaker
Shaker
 
CDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCCDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaC
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operator
 
Secrets management vault cncf meetup
Secrets management vault cncf meetupSecrets management vault cncf meetup
Secrets management vault cncf meetup
 
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
 
Monitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusMonitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with Prometheus
 
Crunchy containers
Crunchy containersCrunchy containers
Crunchy containers
 
Kubernetes Meetup: CNI, Flex Volume, and Scheduler
Kubernetes Meetup: CNI, Flex Volume, and SchedulerKubernetes Meetup: CNI, Flex Volume, and Scheduler
Kubernetes Meetup: CNI, Flex Volume, and Scheduler
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
 
Developing a user-friendly OpenResty application
Developing a user-friendly OpenResty applicationDeveloping a user-friendly OpenResty application
Developing a user-friendly OpenResty application
 
Persist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemPersist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystem
 
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas Hoppe
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas HoppeOSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas Hoppe
OSDC 2018 | Self Hosted bare Metal Kubernetes for SMEs by Thomas Hoppe
 
Luigi Hostplumber intro slide.pptx (1).pdf
Luigi Hostplumber intro slide.pptx (1).pdfLuigi Hostplumber intro slide.pptx (1).pdf
Luigi Hostplumber intro slide.pptx (1).pdf
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
 

Semelhante a Monitoring with prometheus at scale

Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Brian Brazil
 
Kubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by ExampleKubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by ExampleThomas Riley
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source Nitesh Jadhav
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Thomas Riley
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Brian Brazil
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET Journal
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosThomas Riley
 
DevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3Alluxio, Inc.
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusBol.com Techlab
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusBol.com Techlab
 

Semelhante a Monitoring with prometheus at scale (20)

Prometheus
PrometheusPrometheus
Prometheus
 
Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)
 
Kubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by ExampleKubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by Example
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with Thanos
 
DevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheus
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to Prometheus
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to Prometheus
 
Prometheus monitoring
Prometheus monitoringPrometheus monitoring
Prometheus monitoring
 

Mais de Juraj Hantak

Kubernetes day 2_jozef_halgas_pf
Kubernetes day 2_jozef_halgas_pfKubernetes day 2_jozef_halgas_pf
Kubernetes day 2_jozef_halgas_pfJuraj Hantak
 
Kubernetes day 2 @ zse energia
Kubernetes day 2 @ zse energiaKubernetes day 2 @ zse energia
Kubernetes day 2 @ zse energiaJuraj Hantak
 
Dev ops culture_final
Dev ops culture_finalDev ops culture_final
Dev ops culture_finalJuraj Hantak
 
Integracia security do ci cd pipelines
Integracia security do ci cd pipelinesIntegracia security do ci cd pipelines
Integracia security do ci cd pipelinesJuraj Hantak
 
Introductiontohelmcharts2021
Introductiontohelmcharts2021Introductiontohelmcharts2021
Introductiontohelmcharts2021Juraj Hantak
 
19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetesJuraj Hantak
 
16. Cncf meetup-docker
16. Cncf meetup-docker16. Cncf meetup-docker
16. Cncf meetup-dockerJuraj Hantak
 
16. meetup sietovy model v kubernetes
16. meetup sietovy model v kubernetes16. meetup sietovy model v kubernetes
16. meetup sietovy model v kubernetesJuraj Hantak
 
Terraform a gitlab ci
Terraform a gitlab ciTerraform a gitlab ci
Terraform a gitlab ciJuraj Hantak
 
Nginx app protect-for-meetup-v1.0-202006_lk
Nginx app protect-for-meetup-v1.0-202006_lkNginx app protect-for-meetup-v1.0-202006_lk
Nginx app protect-for-meetup-v1.0-202006_lkJuraj Hantak
 
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfskJuraj Hantak
 
10.cncfsk en-story
10.cncfsk en-story10.cncfsk en-story
10.cncfsk en-storyJuraj Hantak
 
Ingress controller present, past and future
Ingress controller present, past and futureIngress controller present, past and future
Ingress controller present, past and futureJuraj Hantak
 
Cncf meetup-service-mesh-sk
Cncf meetup-service-mesh-skCncf meetup-service-mesh-sk
Cncf meetup-service-mesh-skJuraj Hantak
 

Mais de Juraj Hantak (20)

Kubernetes day 2_jozef_halgas_pf
Kubernetes day 2_jozef_halgas_pfKubernetes day 2_jozef_halgas_pf
Kubernetes day 2_jozef_halgas_pf
 
Kubernetes day 2 @ zse energia
Kubernetes day 2 @ zse energiaKubernetes day 2 @ zse energia
Kubernetes day 2 @ zse energia
 
Dev ops culture_final
Dev ops culture_finalDev ops culture_final
Dev ops culture_final
 
Promise of DevOps
Promise of DevOpsPromise of DevOps
Promise of DevOps
 
23 meetup rancher
23 meetup rancher23 meetup rancher
23 meetup rancher
 
Integracia security do ci cd pipelines
Integracia security do ci cd pipelinesIntegracia security do ci cd pipelines
Integracia security do ci cd pipelines
 
CNCF opa
CNCF opaCNCF opa
CNCF opa
 
Introductiontohelmcharts2021
Introductiontohelmcharts2021Introductiontohelmcharts2021
Introductiontohelmcharts2021
 
19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes
 
16. Cncf meetup-docker
16. Cncf meetup-docker16. Cncf meetup-docker
16. Cncf meetup-docker
 
16. meetup sietovy model v kubernetes
16. meetup sietovy model v kubernetes16. meetup sietovy model v kubernetes
16. meetup sietovy model v kubernetes
 
16.meetup uvod
16.meetup uvod16.meetup uvod
16.meetup uvod
 
14. meetup
14. meetup14. meetup
14. meetup
 
Terraform a gitlab ci
Terraform a gitlab ciTerraform a gitlab ci
Terraform a gitlab ci
 
Grafana 7.0
Grafana 7.0Grafana 7.0
Grafana 7.0
 
Nginx app protect-for-meetup-v1.0-202006_lk
Nginx app protect-for-meetup-v1.0-202006_lkNginx app protect-for-meetup-v1.0-202006_lk
Nginx app protect-for-meetup-v1.0-202006_lk
 
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk
10. th cncf meetup - Routing microservice-architectures-with-traefik-cncfsk
 
10.cncfsk en-story
10.cncfsk en-story10.cncfsk en-story
10.cncfsk en-story
 
Ingress controller present, past and future
Ingress controller present, past and futureIngress controller present, past and future
Ingress controller present, past and future
 
Cncf meetup-service-mesh-sk
Cncf meetup-service-mesh-skCncf meetup-service-mesh-sk
Cncf meetup-service-mesh-sk
 

Último

Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime BalliaBallia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Balliameghakumariji156
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsMonica Sydney
 
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsMira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsPriya Reddy
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoilmeghakumariji156
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样ayvbos
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasDigicorns Technologies
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理F
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdfMatthew Sinclair
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 

Último (20)

Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime BalliaBallia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsMira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
 

Monitoring with prometheus at scale

  • 1. Monitoring with Prometheus at Scale Adam Hamsik adam.hamsik@lablabs.io
  • 2. Labyrinth Labs Rock-solid infrastructure and DevOps ● Building rock-solid and secure foundations for all your digital operations. Our mission is to let you focus on your business without ever needing to worry about technical issues again. ● Making you ready for growing traffic, safe against new security vulnerabilities and data-loss. 2
  • 3. TL;DR ● We will start with common monitoring issues and problems. ● Deploying Prometheus is easy and running a single instance can be sufficient for most deployments. ● We will have a quick look at AlertManager ● We will talk about scalability limits of prometheus instance, when and how to use sharding. ● What is Trickster and why you should use it too ● How Thanos/Cortex can help you when all hope is lost. 3
  • 4. Common Monitoring Problems ● Monitoring tools are limited both technically and conceptually ● Most of existing tools don’t really scale with current infrastructure needs. ● Limited visibility ○ Generally we want to monitor and gather as much information as we can. ○ Even if we don’t need it right away usually it will be useful in a future(I promise) ● No common application monitoring interface. There are different protocols/standards ○ Openmetrics ○ SNMP 4
  • 6. Prometheus Monitoring System The Prometheus monitoring system and time series database is CNCF graduated project. ● Originally developed by exGooglers for SoundCloud as their internal monitoring system ● Inspired by Google’s Borgmon monitoring system ● Open Source under the Apache License ● Written as monolithic application in Go 6
  • 7. Prometheus Server Overview ● Multi-dimensional data model with time series data identified by metric name and key/value(labels) pairs ● PromQL, a flexible query language to leverage this dimensionality ● No reliance on distributed storage; single server nodes are autonomous ● Targets are discovered via service discovery or static configuration ● Pushing time series is supported via an intermediary gateway ● Monitor Services not Machines/Servers 7
  • 9. Company Prometheus Usage ● We deployed first prometheus servers ● Add some services ● Setup trickster as a Grafana Cache ● Add more services/servers ● Continuous adding of CPU/Memory to Prometheus instance ● Setup simple federation/sharding if single instance is too big ● Use Thanos 9
  • 10. First Prometheus Deployment ● Deploying your first Prometheus server is very easy. Fetch prometheus binary + config. ● There is a no concept of a Prometheus Cluster ● Generally Prometheus can scale very well with CPU/Memory ○ Providing more cpu/memory allows prometheus to monitor more metrics ○ It’s hard to run large pod in a kubernetes cluster if it’s as big as a worker node. ● If job is too big for a single server you can use federation/sharding (remote reads) for simple scaling 10
  • 11. Trickster Setup ● Loading complicated/big dashboard on Grafana can overload your prometheus server ○ Use trickster to cache PromQL results for future reuse ○ Queries on metrics with high cardinality can use a lot of memory on you prometheus instance[1]. ○ Use limits to make sure user will not overload your server query.max-concurrency/query.max-samples ● Delta Proxy caching - inspects the time range of a client query to determine what data points are already cached 111. https://www.robustperception.io/limiting-promql-resource-usage
  • 14. Metrics Cardinality ● Prometheus performance almost always comes to one thing metrics cardinality. ● Cardinality describes how many unique values of some metric you have ○ container_tasks_state metric will have a unique (pod/container) pair for each running container in your cluster ○ custom_api_http_request will have a unique metric for each combination of url/http_method/env. (/api/v2/users, get, dev; /api/v2/users, post, prod...) 141. https://www.robustperception.io/cardinality-is-key
  • 15. Bad Metrics Cardinality 151. https://www.robustperception.io/cardinality-is-key ● See example below where we throw away bad fluentd metrics and dropped number of scrapped metrics by ½ ● If you are using fluentd look for fluentd_tail_file_inode, fluentd_tail_file_position ○ In our use case we saw cardinality 1220 from 2 metrics above per node !
  • 16. Thanos/Cortex as ultimate solution ● If you have multiple kubernetes clusters, datacenters with millions of metrics and adding more CPU/memory to prometheus is not an option. ○ Consider adding Thanos/Cortex to your infrastructure ● Thanos querier Prometheus Server HA, can load metrics from multiple prometheus servers and make sure it will present full data to user. ○ Implements Prometheus 1.1 HTTP api. ● Thanos compactor can downsample, change retention or resolution of your metrics. ● Thanos store is a component which can save your metrics in a AWS S3 compatible object store. 16
  • 18. Thanos SideCar 18 ● It implements Thanos’ Store API on top of Prometheus’ remote-read API. This allows Queriers to treat Prometheus servers as yet another source of time series data without directly talking to its APIs. ● Optionally, the sidecar uploads TSDB blocks to an object storage bucket as Prometheus produces them every 2 hours. This allows Prometheus servers to be run with relatively low retention while their historic data is made durable and queryable via object storage. ● Optionally Thanos sidecar is able to watch Prometheus rules and configuration, decompress and substitute environment variables if needed and ping Prometheus to reload them.
  • 19. Thanos Query 19 ● The PromQL query is posted to the Querier ● It interprets the query and goes to a pre-filter ● The query fans out its request for stores, prometheuses or other queries on the basis of labels and time-range requirements ● The Query only sends and receives StoreAPI messages ● After it has collected all the responses, it merges and deduplicates them (if enabled) ● It then sends back the series for the user 1. https://banzaicloud.com/img/blog/multi-cluster-monitoring/life_of_a_query.png
  • 21. Thank You. We are hiring, remote working DevOps/Kubernetes engineers. adam.hamsik@lablabs.io www.lablabs.io 21