3. Agenda
● What is a Cloud Native Application?
● Cloud Native Application Challenges
● The 5 Pillars of Monitoring
● An Introduction to Prometheus
● What FreshTracks Provides
7. Cloud Native Challenges
● Containers are ephemeral
  ○ Scheduled on any node in the cluster
  ○ Move frequently on restarts and deployments
● Kubernetes needs to be monitored
● Kubernetes brings additional complexities
  ○ Resource quotas
  ○ Pod and cluster scaling
● These challenge traditional monitoring tools
11. Prometheus
● Started in 2012 at SoundCloud by ex-Google engineers
● Open-sourced in 2015
● Patterned after "Borgmon", Google's container monitoring system
● Second project accepted into the CNCF, after Kubernetes
● Adoption is surging along with Kubernetes
● 63% of teams using Kubernetes use Prometheus
12. Prometheus Major Features
● Label/value-based time series data model
● "Pull-based" metrics collection
● Service discovery mechanism
● Simple metrics format with a rich set of "exporters"
● Extremely high-performance TSDB
● Expressive query language: PromQL
● Alertmanager
● Easily installable with Helm
● Single, statically linked binary
● Open-source Grafana used for visualization
13. Time Series Data Model
<identifier> → [(t0, v0), (t1, v1), (t2, v2), …]
The identifier is a collection of label/value pairs
Timestamps stored as int64: milliseconds since the epoch
Values stored as float64
Efficient storage on disk: 1.3 bytes/sample
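A minimal in-memory sketch of this data model, assuming a plain Python dict as the "TSDB": the identifier is the frozen set of label/value pairs (with the metric name under Prometheus' internal `__name__` label), and each sample is an (int milliseconds, float) pair. The class name `MiniTSDB` is hypothetical; real Prometheus storage compresses samples on disk and looks nothing like this.

```python
from typing import Dict, FrozenSet, List, Tuple

# Series identifier: the set of label/value pairs (metric name under "__name__").
SeriesId = FrozenSet[Tuple[str, str]]
# Sample: (timestamp as int milliseconds since the epoch, value as float).
Sample = Tuple[int, float]


class MiniTSDB:
    """Hypothetical toy store mapping identifier -> ordered list of samples."""

    def __init__(self) -> None:
        self.series: Dict[SeriesId, List[Sample]] = {}

    def append(self, labels: Dict[str, str], ts_ms: int, value: float) -> None:
        # frozenset makes label order irrelevant to the identity of a series.
        key: SeriesId = frozenset(labels.items())
        self.series.setdefault(key, []).append((int(ts_ms), float(value)))

    def samples(self, labels: Dict[str, str]) -> List[Sample]:
        return self.series.get(frozenset(labels.items()), [])


db = MiniTSDB()
lbls = {"__name__": "http_requests_total", "job": "apache", "path": "/home"}
db.append(lbls, 1_500_000_000_000, 10)
db.append(lbls, 1_500_000_015_000, 14)
print(db.samples(lbls))  # [(1500000000000, 10.0), (1500000015000, 14.0)]
```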
14. Label/Value Based Data Model
● Graphite/StatsD
  ○ apache.192-168-5-1.home.200.http_requests_total
  ○ apache.192-168-5-1.home.500.http_requests_total
  ○ apache.192-168-5-1.about.200.http_requests_total
● Prometheus
  ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
  ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
  ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
● Selecting series
  ○ Graphite: *.*.home.200.http_requests_total
  ○ Prometheus: http_requests_total{status="200", path="/home"}
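To make the contrast concrete, here is a simplified sketch of both selection styles. The Graphite matcher assumes `*` matches exactly one dotted component, so the pattern must have one wildcard per position; the Prometheus side only implements equality matchers (real PromQL also has `!=`, `=~`, `!~`). Both function names are hypothetical.

```python
from typing import Dict, List


def graphite_match(pattern: str, path: str) -> bool:
    """Position-based match: '*' stands for exactly one dotted component."""
    p_parts, s_parts = pattern.split("."), path.split(".")
    if len(p_parts) != len(s_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(p_parts, s_parts))


def label_match(matchers: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Name-based match: every listed label must be present with that value."""
    return all(labels.get(k) == v for k, v in matchers.items())


graphite_paths: List[str] = [
    "apache.192-168-5-1.home.200.http_requests_total",
    "apache.192-168-5-1.home.500.http_requests_total",
    "apache.192-168-5-1.about.200.http_requests_total",
]
base = {"__name__": "http_requests_total", "job": "apache", "instance": "192.168.5.1"}
prom_series: List[Dict[str, str]] = [
    {**base, "path": "/home", "status": "200"},
    {**base, "path": "/home", "status": "500"},
    {**base, "path": "/about", "status": "200"},
]

g_hits = [p for p in graphite_paths if graphite_match("*.*.home.200.http_requests_total", p)]
p_hits = [s for s in prom_series
          if label_match({"__name__": "http_requests_total", "status": "200", "path": "/home"}, s)]
print(g_hits)
print(p_hits)
```

The label matcher never mentions `job` or `instance`, so adding another label to the series would not break it, whereas adding a component to a Graphite path shifts every position the glob depends on.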
15. Client Data Model
● Counter
  ○ Only goes up, or is reset to 0 (e.g. on process restart)
● Gauge
  ○ Tracks a value that can go up or down, e.g. temperature
● Histogram and Summary
  ○ Track distributions of observations; used for percentiles
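A rough sketch of the three client-side metric types, assuming plain Python classes. These class names are hypothetical and are not the `prometheus_client` API. The histogram keeps cumulative buckets keyed by upper bound (Prometheus calls the bound `le`), ending in a +Inf bucket; a Summary, which computes quantiles in the client, is not shown.

```python
import math
from typing import List


class Counter:
    """Can only increase; a process restart resets it to 0."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


class Gauge:
    """Current value that can go up or down, e.g. a temperature."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations into cumulative buckets, plus a running sum."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds) + [math.inf]  # final bucket is +Inf
        self.counts = [0] * len(self.bounds)
        self.sum = 0.0

    def observe(self, v: float) -> None:
        # Every bucket whose upper bound is >= v counts this observation,
        # which is what makes the buckets cumulative.
        for i, b in enumerate(self.bounds):
            if v <= b:
                self.counts[i] += 1
        self.sum += v


h = Histogram([0.1, 0.5, 1.0])
for latency_s in [0.05, 0.3, 0.7, 2.0]:
    h.observe(latency_s)
print(h.counts)  # [1, 2, 3, 4]
```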
16. Prometheus Service Discovery and Target Scrape
[Diagram: Prometheus uses the K8s API server for service discovery, then scrapes targets on each node (Kubelet/cAdvisor, node_exporter, app containers) as well as kube_state_metrics and other exporters, storing the samples in its TSDB.]
17. Prometheus Exposition Format and Exporters
● The Prometheus exposition format: text over HTTP; simple and human-readable
  ○ Supported by Sysdig and the TICK stack's collector (Telegraf)
  ○ Efforts under way to make it a standard
● Close to 100 exporters for various technologies
  ○ The jmx_exporter can cover any Java/JMX application
  ○ https://prometheus.io/docs/instrumenting/exporters/
Official exporters:
● node_exporter
● jmx_exporter
● snmp_exporter
● haproxy_exporter
● cloudwatch_exporter
● collectd_exporter
● mysqld_exporter
● memcached_exporter
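Because the exposition format is just text over HTTP, an exporter's core job can be sketched in a few lines. This hypothetical `render_metric` emits the `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` line per series, escaping backslash, double quote and newline in label values; it ignores timestamps, HELP-text escaping and the HTTP server itself.

```python
from typing import Dict, List, Tuple


def escape_label_value(v: str) -> str:
    # Backslash, double quote and newline must be escaped inside label values.
    return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, mtype: str,
                  samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family in the text exposition format (simplified)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric("http_requests_total", "Total HTTP requests.", "counter",
                    [({"path": "/home", "status": "200"}, 1027)]), end="")
# # HELP http_requests_total Total HTTP requests.
# # TYPE http_requests_total counter
# http_requests_total{path="/home",status="200"} 1027
```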
18. Querying Series with PromQL
● PromQL is a functional query language, nothing like SQL
PromQL:
  rate(http_requests_total[5m])
Rough SQL analogue:
  SELECT job, instance, path, status, rate(value, 5m)
  FROM http_requests_total;
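What `rate(http_requests_total[5m])` computes per series can be approximated as the counter's per-second increase across the window, treating any drop in value as a counter reset. This is a simplified sketch (the name `simple_rate` is hypothetical): Prometheus' real `rate()` also extrapolates to the window edges, so its numbers will differ somewhat.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, counter value)


def simple_rate(samples: List[Sample], window_ms: int, now_ms: int) -> Optional[float]:
    """Approximate per-second increase of a counter over the last window_ms.

    Simplified: no extrapolation to the window boundaries.
    """
    pts = [(t, v) for t, v in samples if now_ms - window_ms <= t <= now_ms]
    if len(pts) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(pts, pts[1:]):
        # A drop means the counter was reset (e.g. a restart) and restarted from 0.
        increase += cur - prev if cur >= prev else cur
    elapsed_s = (pts[-1][0] - pts[0][0]) / 1000.0
    return increase / elapsed_s


t0 = 1_500_000_000_000
samples = [(t0, 0.0), (t0 + 15_000, 30.0), (t0 + 30_000, 60.0),
           (t0 + 45_000, 10.0), (t0 + 60_000, 40.0)]  # reset before the 4th sample
print(round(simple_rate(samples, window_ms=5 * 60_000, now_ms=t0 + 60_000), 3))  # 1.667
```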
19. Querying Series with PromQL
Calculate the ratio of failed requests (status 500) to all requests, per path:
  sum(rate(http_requests_total{status="500"}[5m])) by (path) /
  sum(rate(http_requests_total[5m])) by (path)
Result:
  {path="/home"} 0.014
  {path="/about"} 0.027
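The ratio query can be sketched as two `sum by (path)` aggregations followed by a division matched on the `path` label. The inputs below are made-up per-series rates, not data from the slide, and the function names are hypothetical. Note that a path with no 500s produces no result, because the left-hand side has no series for it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each entry: (series labels, per-second rate already computed by rate(...[5m])).
Rates = List[Tuple[Dict[str, str], float]]


def sum_by(rates: Rates, label: str) -> Dict[str, float]:
    """Like PromQL `sum(...) by (label)`: add up the rates per label value."""
    out: Dict[str, float] = defaultdict(float)
    for lbls, r in rates:
        out[lbls[label]] += r
    return dict(out)


def error_ratio_by_path(rates: Rates) -> Dict[str, float]:
    errors = sum_by([(lbls, r) for lbls, r in rates if lbls.get("status") == "500"], "path")
    totals = sum_by(rates, "path")
    # As with PromQL "/", only paths present on both sides yield a result;
    # this sketch also skips zero totals rather than returning NaN.
    return {p: e / totals[p] for p, e in errors.items() if totals.get(p)}


rates: Rates = [  # hypothetical input rates
    ({"path": "/home", "status": "200"}, 9.0),
    ({"path": "/home", "status": "500"}, 1.0),
    ({"path": "/about", "status": "200"}, 4.0),
]
print(error_ratio_by_path(rates))  # {'/home': 0.1}
```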
24. @bob_cotton
Kubernetes Labels
● Kubernetes gives us labels on everything
● Our scrape targets live in the context of those K8s labels
  ○ This context comes from service discovery
● We want to enrich the scraped metrics' labels with K8s labels
  ○ This is why we need relabeling rules in Prometheus
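One common relabeling pattern copies the pod labels discovered through Kubernetes service discovery (exposed as `__meta_kubernetes_pod_label_<name>`) onto the target. The sketch below imitates the spirit of Prometheus' `labelmap` action in Python; it is an approximation, not Prometheus' implementation. Python's `\1` stands in for Prometheus' `$1` replacement syntax, and real Prometheus also derives `instance` from `__address__` by default, which is omitted here. The function name and sample data are hypothetical.

```python
import re
from typing import Dict


def labelmap(discovered: Dict[str, str],
             regex: str = r"__meta_kubernetes_pod_label_(.+)",
             replacement: str = r"\1") -> Dict[str, str]:
    """Copy labels whose *names* fully match `regex` to the name given by `replacement`."""
    out = dict(discovered)
    pattern = re.compile(regex)
    for name, value in discovered.items():
        m = pattern.fullmatch(name)  # anchored, as Prometheus relabel regexes are
        if m:
            out[m.expand(replacement)] = value
    # Labels starting with "__" are internal and do not survive relabeling.
    return {k: v for k, v in out.items() if not k.startswith("__")}


target = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_namespace": "shop",
    "__meta_kubernetes_pod_label_app": "checkout",
    "__meta_kubernetes_pod_label_tier": "backend",
    "job": "kubernetes-pods",
}
print(labelmap(target))  # {'job': 'kubernetes-pods', 'app': 'checkout', 'tier': 'backend'}
```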
26. Recording Rules - Derivative Series
● New series can be generated by querying existing series and storing the results
path:request_failures_per_requests:ratio_rate5m =
  sum(rate(http_requests_total{status="500"}[5m])) by (path) /
  sum(rate(http_requests_total[5m])) by (path)
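Conceptually, a recording rule is an expression evaluated on a fixed interval whose results are written back as a new series under the recorded name (conventionally `level:metric:operations`). A toy sketch follows: `fake_error_ratio` is a hypothetical stand-in that returns fixed numbers in place of the PromQL expression, and the 15 s interval is an assumption.

```python
from typing import Callable, Dict, List, Tuple

RULE_NAME = "path:request_failures_per_requests:ratio_rate5m"

# Toy store keyed by (metric name, path label) -> [(timestamp_ms, value), ...]
Store = Dict[Tuple[str, str], List[Tuple[int, float]]]


def evaluate_rule(store: Store, name: str,
                  query: Callable[[int], Dict[str, float]], ts_ms: int) -> None:
    """Evaluate `query` at ts_ms and append each result to the series `name`."""
    for path, value in query(ts_ms).items():
        store.setdefault((name, path), []).append((ts_ms, value))


def fake_error_ratio(ts_ms: int) -> Dict[str, float]:
    # Hypothetical stand-in for the PromQL ratio expression (fixed numbers).
    return {"/home": 0.02, "/about": 0.01}


store: Store = {}
t0 = 1_500_000_000_000
for i in range(3):  # assumed 15 s evaluation interval
    evaluate_rule(store, RULE_NAME, fake_error_ratio, t0 + i * 15_000)
print(store[(RULE_NAME, "/home")])
# [(1500000000000, 0.02), (1500000015000, 0.02), (1500000030000, 0.02)]
```

Dashboards and alerts can then read the cheap precomputed series instead of re-running the full expression on every refresh.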
35. Filling the Gaps
● A small Kubernetes cluster generates > 500K unique samples
● Which metrics are important?
● Checking the performance of any one container is easy
  ○ But how is the whole microservice behaving? The node? The cluster?
● Prometheus has no built-in anomaly detection
● Dashboard creation is tedious, even if you know what to watch
● How is my service behaving in the context of the cluster?
● How do node, container, and application metrics correlate with each other?