Time series denver an introduction to prometheus

An Introduction to Prometheus
Time Series Denver - May 30, 2018

Introduction
● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation
○ “Simplifying Kubernetes Visibility”
● bob@freshtracks.io
● @bob_cotton
● Father, Fly Fisher & Avid Homebrewer

Agenda
● What is a Cloud Native Application?
● Cloud Native Application Challenges
● The 5 Pillars of Monitoring
● An Introduction to Prometheus
● What FreshTracks Provides

What is a Cloud Native Application?

Cloud Native Application
● Follows 12 Factor Application Practices
● Packaged into containers
● Follows a micro-service architecture
● Managed by a container orchestration system
○ Kubernetes, Docker Swarm, Mesos
● Usually deployed on dynamic infrastructure
○ VMware
○ Cloud providers
● Application lifecycle allows for
○ Auto-provisioning
○ Auto-scaling
○ Auto-redundancy

Cloud Native Application Challenges

Cloud Native Challenges
● Containers are ephemeral
○ Scheduled on any node in the cluster
○ Move frequently on restarts and deployments
● Kubernetes needs to be monitored
● Kubernetes brings additional complexities
○ Resource Quotas
○ Pod and Cluster Scaling
● All of this challenges traditional monitoring tools

The 5 Pillars of Monitoring
● Metrics and Alerting
● Log Analytics
● Distributed Tracing
● Application Performance Monitoring
● Real User Monitoring

Prometheus
● Started in 2012 at SoundCloud by ex-Google engineers
○ Open-sourced in 2015
● Patterned after Borgmon, Google's internal monitoring system for Borg
● Second project accepted into the CNCF after Kubernetes
● Adoption is surging alongside Kubernetes
○ 63% of teams using Kubernetes use Prometheus

Prometheus Major Features
● Label/value based time series data model
● “Pull based” metrics collection
● Service discovery mechanism
● Simple metrics format with a rich set of “exporters”
● Extremely high-performance TSDB
● Extensive query language - PromQL
● Alert Manager
● Easy to install: a single, statically linked binary, also installable via Helm
● Visualization is typically done with open-source Grafana

Time Series Data Model
<identifier> → [(t0, v0), (t1, v1), (t2, v2) …]
Identifier is a collection of label/value pairs
Time stored as int64 - Millis since the epoch
Values stored as float64
Efficient storage on disk -- 1.3 bytes/sample
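A minimal in-memory sketch of this data model, assuming a dict keyed by the full label set. The names `TinyTSDB` and `series_id` are hypothetical; Python ints and floats stand in for int64 and float64, and the real Prometheus TSDB stores compressed chunks on disk rather than Python lists. Prometheus does keep the metric name as the reserved `__name__` label, which the sketch mirrors.

```python
from dataclasses import dataclass, field


def series_id(name: str, **labels: str) -> frozenset:
    """Hashable identifier: the metric name (as __name__) plus all label pairs."""
    return frozenset({("__name__", name), *labels.items()})


@dataclass
class TinyTSDB:
    """Toy store mapping identifier -> [(t_ms, value), ...] (hypothetical)."""
    series: dict = field(default_factory=dict)

    def append(self, sid: frozenset, t_ms: int, value: float) -> None:
        # Timestamps: integer milliseconds since the epoch; values: floats.
        self.series.setdefault(sid, []).append((int(t_ms), float(value)))


db = TinyTSDB()
sid = series_id("http_requests_total", job="apache", path="/home", status="200")
db.append(sid, 1527638400000, 5439)
```

Because the identifier is a set, label order does not matter: the same labels in any order name the same series.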

Label/Value Based Data Model
● Graphite/StatsD
○ apache.192-168-5-1.home.200.http_requests_total
○ apache.192-168-5-1.home.500.http_requests_total
○ apache.192-168-5-1.about.200.http_requests_total
● Prometheus
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
● Selecting series
○ Graphite: *.*.home.200.http_requests_total
○ Prometheus: http_requests_total{status="200", path="/home"}
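A rough sketch of how an equality-only selector such as `http_requests_total{status="200", path="/home"}` picks series: every named label must match and all other labels are ignored, so no positional wildcards are needed. The `matches` helper is hypothetical, and PromQL's `!=`, `=~` and `!~` matchers are omitted.

```python
def matches(series_labels: dict, name: str, **selector: str) -> bool:
    """True if the series has this metric name and every selector label matches."""
    if series_labels.get("__name__") != name:
        return False
    return all(series_labels.get(k) == v for k, v in selector.items())


base = {"__name__": "http_requests_total", "job": "apache", "instance": "192.168.5.1"}
series = [
    {**base, "path": "/home", "status": "200"},
    {**base, "path": "/home", "status": "500"},
    {**base, "path": "/about", "status": "200"},
]

# Equivalent of http_requests_total{status="200", path="/home"}
selected = [s for s in series if matches(s, "http_requests_total", status="200", path="/home")]
```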

Client Data Model
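● Counter
○ Only goes up, or is reset to 0 (e.g. when the process restarts)
● Gauge
○ Tracks a value that can go up and down, e.g. temperature
● Histogram and Summary
○ Track the distribution of observations (e.g. request durations); used for percentiles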
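A toy sketch of these semantics, assuming hypothetical `Counter`, `Gauge` and `Histogram` classes (not the API of any official client library). It shows why a counter never decreases and how a histogram counts observations into cumulative `le` ("less than or equal") buckets plus a running sum and count. A summary, by contrast, computes quantiles in the client; that is omitted here.

```python
import bisect


class Counter:
    """Only goes up; it returns to 0 only when the process restarts."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """A value that can go up and down, e.g. a temperature."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations per bucket; the last bucket is +Inf."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # First bucket whose upper bound is >= value (le semantics).
        self.bucket_counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """(le, count) pairs where each bucket includes all smaller buckets."""
        out, running = [], 0
        for le, n in zip(self.bounds, self.bucket_counts):
            running += n
            out.append((le, running))
        return out


h = Histogram([0.1, 0.5, 1.0])
for duration in (0.05, 0.3, 0.3, 2.0):
    h.observe(duration)
```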

Prometheus Service Discovery and Target Scrape
[Diagram: Prometheus uses the K8s API Server for service discovery, then scrapes each node's Kubelet (cAdvisor), node_exporter, app containers and other exporters, plus kube_state_metrics, and stores the results in its TSDB.]

Prometheus Exposition Format and Exporters
● The Prometheus exposition format: text over HTTP, simple and human-readable
● Supported by Sysdig and by Telegraf, the TICK stack's collector
○ Efforts are under way to make it a standard
● Close to 100 exporters for various technologies
● The jmx_exporter can cover any Java/JMX application
● https://prometheus.io/docs/instrumenting/exporters/
Official Exporters:
● node_exporter
● jmx_exporter
● snmp_exporter
● haproxy_exporter
● cloudwatch_exporter
● collectd_exporter
● mysqld_exporter
● memcached_exporter

Querying Series with PromQL
● PromQL is a functional query language; it is nothing like SQL
PromQL:
rate(http_requests_total[5m])
Rough pseudo-SQL equivalent:
SELECT job, instance, path, status, rate(value, 5m)
FROM http_requests_total;
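`rate()` computes a per-second rate separately for each series. The following is a simplified sketch for a single series, assuming samples are `(timestamp_ms, value)` pairs. The `simple_rate` name and the sample data are hypothetical. Unlike PromQL's `rate()`, it does not extrapolate to the edges of the window, so its results can differ slightly from Prometheus. It does treat any decrease as a counter reset, as counters require.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, counter value)


def simple_rate(samples: List[Sample], now_ms: int, window_ms: int) -> Optional[float]:
    """Per-second increase of one counter series over the window ending at now_ms."""
    points = [(t, v) for t, v in samples if now_ms - window_ms < t <= now_ms]
    if len(points) < 2:
        return None  # not enough samples to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        # A decrease means the counter was reset to 0, so `cur` is all new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed_s = (points[-1][0] - points[0][0]) / 1000.0
    return increase / elapsed_s


# One series scraped every 15s (hypothetical data).
samples = [(0, 0.0), (15_000, 30.0), (30_000, 60.0), (45_000, 90.0), (60_000, 120.0)]
r = simple_rate(samples, now_ms=60_000, window_ms=5 * 60_000)
```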

Querying Series with PromQL
Calculate the ratio of failed requests (status 500) to all requests, per path:
sum(rate(http_requests_total{status="500"}[5m])) by (path) /
sum(rate(http_requests_total[5m])) by (path)
Result:
{path="/home"} 0.014
{path="/about"} 0.027
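A sketch of what the two aggregations and the division do, using hypothetical per-series rates chosen to reproduce the 0.014 and 0.027 results. `sum_by` is a hypothetical helper. The division matches left- and right-hand results that have the same remaining labels (here just `path`). A path with no status="500" series would have no left-hand value and therefore no result, rather than 0.

```python
from collections import defaultdict


def sum_by(series, label):
    """Like sum(...) by (label): add up values that share the label's value."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical per-series 5m request rates (requests/second).
rates = [
    ({"path": "/home", "status": "200"}, 9.86),
    ({"path": "/home", "status": "500"}, 0.14),
    ({"path": "/about", "status": "200"}, 9.73),
    ({"path": "/about", "status": "500"}, 0.27),
]

failures = sum_by([(l, v) for l, v in rates if l["status"] == "500"], "path")
totals = sum_by(rates, "path")
# One-to-one matching on identical label sets: only paths present on both sides.
ratio = {path: failures[path] / totals[path] for path in failures if path in totals}
```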

Labels, Relabeling, and Recording Rules, Oh My...

Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
○ This comes from service discovery
● We want to enrich the labels on scraped metrics with K8s labels
● This is why we need relabel rules in Prometheus

[Diagram: Prometheus discovers scrape targets through the K8s API Server (service discovery), rewrites each target's labels with <relabel_config>, scrapes the target, passes the scraped samples through <metric_relabel_config>, and stores them in its TSDB.]
Target labels from service discovery:
{__address__ 300.196.17.41}
{__meta_kubernetes_namespace default}
{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}
{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}
{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}
{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}
{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}
{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}
{__meta_kubernetes_pod_container_name prometheus-configmap-reload}
{__meta_kubernetes_pod_host_ip 172.20.42.119}
{__meta_kubernetes_pod_ip 100.96.17.41}
{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}
{__meta_kubernetes_pod_label_pod_template_hash 1636686694}
{__meta_kubernetes_pod_label_run data-sidecar}
{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}
{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}
{__meta_kubernetes_pod_ready false}
{__metrics_path__ /metrics}
{__scheme__ http}
{job ftio-data-sidecar-calc}
Target labels after <relabel_config>:
{__address__ 300.196.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}
Sample as exposed by the scrape target:
http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
Sample as stored, after the target labels are attached (with the default honor_labels: false, the conflicting scraped instance label is renamed exported_instance):
http_requests_total{region="us-east",
az="us-east-1",
instance_type="m2.xlarge",
exported_instance="i-3582k8",
hostname="host1",
instance="300.196.17.41:8077",
job="ftio-data-sidecar-calc",
kubernetes_namespace="default",
container_name="prometheus-configmap-reload"} = 5439
<metric_relabel_config>
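The target-relabeling step can be sketched as a sequence of `replace` actions. This sketch assumes only the `replace` action, with its `source_labels`, `separator` (default `;`), fully anchored `regex`, `target_label`, and `$1`-style `replacement`. The function names are hypothetical. The address regex is a pattern commonly used to append a `prometheus.io/port` annotation to `__address__`. Afterwards, `instance` defaults to `__address__`, and labels starting with `__` are dropped.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Sketch of one relabel_config step with action: replace."""
    # Join the source label values; missing labels count as "".
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabel regexes are anchored at both ends
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style $1 references into Python's \1 syntax.
    template = re.sub(r"\$(\d+)", r"\\\1", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


def finalize_target(labels):
    """Default instance to __address__, then drop labels starting with "__"."""
    result = dict(labels)
    result.setdefault("instance", result.get("__address__", ""))
    return {k: v for k, v in result.items() if not k.startswith("__")}


target = {
    "__address__": "300.196.17.41",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "__scheme__": "http",
    "__metrics_path__": "/metrics",
    "job": "ftio-data-sidecar-calc",
}
target = relabel_replace(
    target,
    ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
    r"([^:]+)(?::\d+)?;(\d+)", "__address__", "$1:$2",
)
target = relabel_replace(target, ["__meta_kubernetes_namespace"], "(.*)", "kubernetes_namespace")
target = relabel_replace(target, ["__meta_kubernetes_pod_container_name"], "(.*)", "container_name")
final_labels = finalize_target(target)
```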

Recording Rules - Derived Series
● A recording rule evaluates a query at regular intervals and stores the result as a new series
path:request_failures_per_requests:ratio_rate5m =
sum(rate(http_requests_total{status="500"}[5m])) by (path) /
sum(rate(http_requests_total[5m])) by (path)
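A sketch of what evaluating a recording rule means: at each evaluation time, run the query and append each result as a sample of a new series named after the rule. The rule keeps the result's labels (here `path`). `evaluate_recording_rule` and `fake_query` are hypothetical, and `fake_query` stands in for real PromQL evaluation. The rule name follows the `level:metric:operations` naming convention recommended in the Prometheus docs.

```python
def evaluate_recording_rule(record, query, eval_times_ms, store):
    """Store query(t) results as samples of new series named `record`."""
    for t_ms in eval_times_ms:
        for labels, value in query(t_ms).items():
            series_id = frozenset({("__name__", record), *labels})
            store.setdefault(series_id, []).append((t_ms, value))
    return store


def fake_query(t_ms):
    # Hypothetical stand-in for evaluating the PromQL expression at t_ms.
    return {
        frozenset({("path", "/home")}): 0.014,
        frozenset({("path", "/about")}): 0.027,
    }


store = evaluate_recording_rule(
    "path:request_failures_per_requests:ratio_rate5m", fake_query, [0, 60_000], {}
)
```

Dashboards and alerts can then read the precomputed series cheaply instead of re-running the aggregation each time.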

High Availability
[Diagram: two redundant Prometheus servers running side by side]

Federation
[Diagram: a hierarchy of Prometheus servers; each higher-level server federates a subset of metrics from the servers below it]
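In federation, a higher-level server pulls that subset from a lower-level server's `/federate` endpoint. The subset is chosen with one or more `match[]` series selectors, typically configured as a scrape job whose `metrics_path` is `/federate`. This sketch only builds such a URL. The host name and the `federate_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL that returns only series matching `selectors`."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# "prometheus-dc1:9090" is a hypothetical lower-level server.
url = federate_url(
    "http://prometheus-dc1:9090",
    ['{job="apache"}', "path:request_failures_per_requests:ratio_rate5m"],
)
```

Federating precomputed recording-rule series, rather than raw series, keeps the higher-level server's load small.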

Long Term Storage and External Integrations
Prometheus can send samples to external systems with remote_write and query them back with remote_read. Integrations include:
● AppOptics: write
● Chronix: write
● Cortex: read and write
● CrateDB: read and write
● Elasticsearch: write
● Gnocchi: write
● Graphite: write
● InfluxDB: read and write
● OpenTSDB: write
● PostgreSQL/TimescaleDB: read and write
● SignalFx: write

Alert Definition
alert: <alert name>
expr: <expression>
[ for: <duration> ]
[ labels: <label set> ]
[ annotations: <label set> ]
Example:
alert: IngesterCrowding
expr: count by(ft_cluster, node) (cortex_ingester_ingested_samples_total) > 1
for: 30m
labels:
  severity: critical
annotations:
  description: https://github.com/Fresh-Tracks/gke-configs/blob/master/docs/alerts.md#ingestercrowding
  summary: Node {{ $labels.node }} is hosting {{ $value }} ingester pods
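The `for` duration (30m in the IngesterCrowding example) means the expression must stay true across consecutive evaluations for that long before the alert fires. Until then, the alert is "pending". This is a simplified sketch of the inactive → pending → firing progression for a single alert. The `alert_states` helper and the 10-minute evaluation spacing are hypothetical.

```python
def alert_states(evaluations, for_ms):
    """Map (t_ms, condition_true) evaluations to inactive/pending/firing."""
    states = []
    active_since = None
    for t_ms, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the clock
            states.append((t_ms, "inactive"))
            continue
        if active_since is None:
            active_since = t_ms
        state = "firing" if t_ms - active_since >= for_ms else "pending"
        states.append((t_ms, state))
    return states


ten_min = 10 * 60_000
evals = [(0, True), (ten_min, True), (2 * ten_min, True), (3 * ten_min, True), (4 * ten_min, False)]
result = alert_states(evals, for_ms=30 * 60_000)
```

A single false evaluation sends the alert back to inactive, which keeps short spikes from paging anyone.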

Alert Manager
● Deduplication
● Grouping
● Routing
● Suppression
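A rough sketch of the first two features. Identical alerts, for example the same alert sent by two redundant Prometheus servers, collapse into one. The remaining alerts are grouped by a chosen set of labels so that each group produces one notification. `dedupe_and_group` is hypothetical; the real Alert Manager also handles timing, routing trees, silences and inhibition.

```python
from collections import defaultdict


def dedupe_and_group(alerts, group_by):
    """Drop alerts with identical label sets, then group by the given labels."""
    unique = {frozenset(alert.items()) for alert in alerts}
    groups = defaultdict(list)
    for labels in unique:
        alert = dict(labels)
        key = tuple(sorted((name, alert.get(name, "")) for name in group_by))
        groups[key].append(alert)
    return dict(groups)


crowding_n1 = {"alertname": "IngesterCrowding", "ft_cluster": "c1", "node": "n1"}
crowding_n2 = {"alertname": "IngesterCrowding", "ft_cluster": "c1", "node": "n2"}
# crowding_n1 arrives twice, e.g. from two redundant Prometheus servers.
groups = dedupe_and_group([crowding_n1, crowding_n1, crowding_n2], ["alertname", "ft_cluster"])
```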

Alert Manager
[Diagram: Prometheus servers send alerts to a pair of Alert Manager instances, which route notifications to PagerDuty, VictorOps, and Slack]

FreshTracks.io
Simplifying Kubernetes Visibility

Filling the Gaps
● A small Kubernetes cluster generates more than 500K unique samples
○ Which metrics are important?
● Monitoring the performance of any one container is easy
○ But how is the whole microservice behaving? The node? The cluster?
● Prometheus has no anomaly detection
● Dashboard creation is tedious, even if you know what to watch
● How is my service behaving in the context of the cluster?
○ How do node/container/application metrics correlate to each other?

Kubernetes Hierarchy Visibility
Namespace → Workload → Pod → Container
(A workload can be a Deployment, ReplicaSet, StatefulSet, DaemonSet, or similar.)
