Monitoring on Kubernetes
Engineer at AI
Kubernetes at Arvind Internet
● Our Infra is deployed on AWS
● Kubernetes minions are running on m4.xlarge instances
● Kubernetes version 1.7.5 in QA/Prod, 1.8.3 in Pre-Prod
● QA/Dev, Pre-Prod & Production running on Kubernetes
● Total Pods ⇒ More than 350 (QA/Dev, Prod)
● Total services ⇒ More than 200 (QA/Dev, Prod)
● Running Mongo, MySQL, Redis, Hazelcast in Kubernetes in QA/Dev
What is Kubernetes?
Kubernetes is an open-source container orchestration engine and an
abstraction layer for managing full-stack operations of hosts and containers.
It handles deployment, scaling, load balancing, and rolling updates of
containerized applications across multiple hosts within a cluster. Kubernetes
makes sure that your applications stay in the desired state.
Master: The machine that controls Kubernetes nodes. This is where all task assignments
Node: These machines perform the requested, assigned tasks. The Kubernetes master
Deployments: Provides declarative updates for
Pod: A group of one or more containers deployed to a single node. All containers in a pod
share an IP address, IPC, hostname, and other resources. Pods abstract network and
storage away from the underlying container. This lets you move containers around the
cluster more easily.
Service: Decouples work definitions from the pods. Kubernetes service
proxies automatically route service requests to the right pod, no matter where
it moves in the cluster or even if it has been replaced.
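The routing described above rests on label selectors: a Service names a set of labels, and any pod carrying those labels becomes a backend. A minimal Python sketch of that matching follows. The pod names, labels, and IPs are hypothetical, and the code illustrates the idea only, not how the Kubernetes proxy is implemented.

```python
from typing import Dict, List


def select_pods(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the IPs of pods whose labels contain every key/value in the selector."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Hypothetical pods; names, labels, and IPs are made up for illustration.
pods = [
    {"name": "cart-1", "ip": "10.0.1.4", "labels": {"app": "cart", "env": "qa"}},
    {"name": "cart-2", "ip": "10.0.2.9", "labels": {"app": "cart", "env": "qa"}},
    {"name": "search-1", "ip": "10.0.1.7", "labels": {"app": "search", "env": "qa"}},
]

# A Service selecting app=cart would route to both cart pods.
backends = select_pods({"app": "cart"}, pods)
print(backends)
```

If a cart pod is replaced, the new pod carries the same labels, so it is picked up by the same selector without any change to the Service.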
ConfigMaps: ConfigMaps allow you to decouple configuration artifacts from
image content to keep containerized applications portable.
Secrets: Secrets are intended to hold sensitive information, such as passwords,
OAuth tokens, and SSH keys. Putting this information in a Secret is safer and
more flexible than putting it verbatim in a pod definition or in a Docker image.
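As a rough illustration of the difference between the two objects, the sketch below builds a ConfigMap and a Secret manifest as Python dictionaries. The object names, keys, and values are hypothetical. One detail worth knowing: values under a Secret's `data` field are base64-encoded, which is an encoding and not encryption.

```python
import base64
import json


def config_map(name, data):
    """Build a ConfigMap manifest; values are stored as plain strings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(data),
    }


def secret(name, data):
    """Build an Opaque Secret manifest; values under `data` are base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in data.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical names and values, for illustration only.
cm = config_map("app-config", {"LOG_LEVEL": "info"})
sec = secret("app-secret", {"DB_PASSWORD": "changeme"})

print(json.dumps(cm, indent=2))
print(json.dumps(sec, indent=2))
```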
Monitoring at AI (earlier)
1. Multiple monitoring systems
2. Difficulty in troubleshooting
3. Additional infrastructure cost to support three monitoring systems
4. Graphite doesn’t provide pod-level application metrics
5. Infra team needs to understand both Sensu and Prometheus alerting
6. Application metrics are single-dimensional, e.g. (a.b.c.d.99)
7. Grafana alerting for application metrics
● Prometheus was developed at SoundCloud by ex-Googlers
● Prometheus is a close cousin of Kubernetes
● A multi-dimensional data model with time series data identified by metric
name and key/value pairs
● Alerting and graphing are unified, using the same language.
● Time series collection happens via a pull model over HTTP
● Targets are discovered via service discovery or static configuration
● Multiple exporters are available to expose AWS EC2, Kafka, Mongo, Cassandra,
RMQ, and Redis metrics
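To make the multi-dimensional data model concrete, the sketch below renders one sample in the Prometheus text exposition format, where a series is a metric name plus key/value labels. Compare this with a single dotted path such as a.b.c.d.99, where every dimension is baked into the name. The metric name, label names, and value are hypothetical.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_sample(name, labels, value):
    """Render one sample as a line of the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Hypothetical metric: each label is a separate dimension that can be
# filtered or aggregated on at query time.
line = render_sample(
    "request_latency_seconds",
    {"app": "checkout", "pod": "checkout-7d9f", "quantile": "0.99"},
    0.231,
)
print(line)
```

Prometheus pulls lines like this over HTTP from each target, conventionally from a /metrics path, rather than having the application push them.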
Node exporter: a Prometheus exporter for hardware and OS metrics exposed by
*NIX kernels, written in Go with pluggable metric collectors.
● CPU (system, user, nice, iowait, steal, idle, irq, softirq, guest)
● Memory (Apps, Buffers, Cached, Free, Slab, SwapCached, PageTables, VmallocUser, Swap, Committed, Mapped,
● Disk Space Used in percent
● Disk Utilization per Device
● Disk IOs per Device (read, write)
● Disk Throughput per Device (read, write)
● Context Switches
● Network Traffic (In, Out)
● Netstat (Established)
● UDP stats (InDatagrams, InErrors, OutDatagrams, NoPorts)
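Most of the metrics listed above are exposed as cumulative counters, so a figure such as CPU utilization comes from the difference between two scrapes. The sketch below shows that arithmetic with made-up numbers. In practice this is usually expressed as a rate query in Prometheus rather than in application code, and the exact metric name depends on the node exporter version.

```python
def cpu_utilization(prev, curr):
    """Fraction of CPU time spent in non-idle modes between two scrapes.

    `prev` and `curr` map a CPU mode to cumulative seconds, as a counter
    would report them at two points in time.
    """
    deltas = {mode: curr[mode] - prev.get(mode, 0.0) for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no time elapsed, or a counter reset
    return 1.0 - deltas.get("idle", 0.0) / total


# Hypothetical counter values from two consecutive scrapes.
prev = {"user": 100.0, "system": 40.0, "iowait": 5.0, "idle": 855.0}
curr = {"user": 106.0, "system": 42.0, "iowait": 6.0, "idle": 861.0}

utilization = cpu_utilization(prev, curr)
print(f"CPU utilization: {utilization:.0%}")
```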
AWS EC2 config
__meta_ec2_availability_zone   Availability zone
__meta_ec2_instance_id         Instance ID
__meta_ec2_instance_state      Instance state
__meta_ec2_instance_type       Instance type
__meta_ec2_private_ip          Private IP
__meta_ec2_public_dns_name     Public DNS name
__meta_ec2_public_ip           Public IP
__meta_ec2_tag_<tagkey>        Value of the instance tag <tagkey>
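These discovery labels are only available while a target is being relabeled: labels that start with a double underscore are dropped afterwards, so any value that should appear on the stored series has to be copied to a permanent label first. The sketch below is a simplified simulation of that step, not Prometheus's relabeling engine. The target label names and the instance values are hypothetical.

```python
def relabel(discovered, mapping):
    """Copy selected discovery labels to permanent names, then drop `__` labels."""
    labels = dict(discovered)
    for source, target in mapping.items():
        if source in labels:
            labels[target] = labels[source]
    # Labels starting with a double underscore do not survive relabeling.
    return {key: val for key, val in labels.items() if not key.startswith("__")}


# Hypothetical labels for one discovered EC2 instance.
discovered = {
    "__address__": "10.0.3.12:9100",
    "__meta_ec2_availability_zone": "ap-south-1a",
    "__meta_ec2_instance_id": "i-0123456789abcdef0",
    "__meta_ec2_instance_type": "m4.xlarge",
    "__meta_ec2_tag_Name": "kube-minion-1",
}

# Hypothetical choice of which meta labels to keep and what to call them.
mapping = {
    "__meta_ec2_availability_zone": "az",
    "__meta_ec2_instance_type": "instance_type",
    "__meta_ec2_tag_Name": "name",
}

target_labels = relabel(discovered, mapping)
print(target_labels)
```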
Approach #1 - Prometheus on EC2
#1. Getting EC2 server metrics is easy and straightforward: Prometheus
provides EC2 service discovery.
#2. Getting Kubernetes and application metrics is much more complex: it takes
300+ lines of configuration to support Kubernetes metrics alone.
What is the Prometheus Operator?
The Prometheus Operator creates, configures, and manages Prometheus
monitoring instances. It automatically generates monitoring target
configurations based on familiar Kubernetes label queries.
ServiceMonitor Custom Resource Definition (CRD)
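A ServiceMonitor describes which Services to scrape by label, and the operator turns it into scrape configuration. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The monitor name, labels, port name, and interval are hypothetical, and the apiVersion has changed across operator releases, so treat the field layout as an assumption to check against the version in use.

```python
import json


def service_monitor(name, match_labels, port, interval="30s"):
    """Build a ServiceMonitor manifest that selects Services by label."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",  # assumption: varies by operator release
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Services carrying all of these labels are selected.
            "selector": {"matchLabels": dict(match_labels)},
            # `port` refers to a named port on the selected Service.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


# Hypothetical monitor for Services labelled app=cart with a port named "metrics".
manifest = service_monitor("cart-monitor", {"app": "cart"}, "metrics")
print(json.dumps(manifest, indent=2))
```

The appeal of this design is that adding a new service to monitoring means adding a label, not editing a long Prometheus configuration file by hand.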