Autoscaling of workloads in the Kubernetes environment. A slide deck about Pod and Node autoscaling and the machinery that makes it happen, along with a few recommendations for implementing Pod and Node autoscaling.
2. Hrishikesh Deodhar
Director of Engineering
InfraCloud Technologies
www.infracloud.io
https://www.linkedin.com/in/hrishikesh-deodhar
https://twitter.com/Hrishi_kesh_
Containers | DevOps | Cloud | Kubernetes
3. Auto of Scale
● Autoscaling: Why & what?
● What to scale in Kubernetes
○ Pods
○ Nodes
● Pod scaling
○ Metrics Server
○ HPA controller
○ Monitoring pipeline
● Node Scaling
○ Kubernetes Autoscaler
○ Escalator
4. As explained to a 5-year-old
Image Source: http://blog.infracloud.io/kubernetes-autoscaling-explained/
[Diagram: the current app's capacity with a single instance, user requests hitting your current VM/Pod/Container, and a spare instance so that load can be shared]
5. Autoscaling: what?
● Horizontally scale
○ Applications to meet user demand
○ Nodes to meet infrastructure demand (of applications)
6. Autoscaling: why?
● Match: current capacity == actual usage
● Use the elasticity of the cloud effectively
● Optimize cost
8. Pod Autoscaling (What just happened earlier...)
[Diagram: cAdvisor in each Kubelet feeds resource metrics from Pods to the Metrics Server / metrics aggregator; Prometheus with the Prometheus Adapter supplies custom metrics; the Horizontal Pod Autoscaler controller consumes both and adjusts the ReplicaSet through the Deployment controller]
10. Node Autoscaling
● Kubernetes Autoscaler
(https://github.com/kubernetes/autoscaler)
● Escalator (https://github.com/atlassian/escalator)
We will talk about the Kubernetes Autoscaler.
11. Kubernetes Autoscaler: Basics
● A controller inside Kubernetes Cluster
● Increases Cluster Size when:
○ Pods are in Pending state due to insufficient resources
● Decreases Cluster Size when:
○ Cluster resource consumption is low for a sufficient duration
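A minimal sketch of how these thresholds typically appear in the Cluster Autoscaler's own Deployment; the cloud provider, node-group name, image tag, and values are assumptions for illustration, not a recommended configuration:

  # Hypothetical excerpt of a cluster-autoscaler Deployment spec
  containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.3.3      # version is an assumption
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws                           # assumption: AWS Auto Scaling groups
    - --nodes=1:10:my-node-group                     # min:max:node-group-name (hypothetical)
    - --scale-down-utilization-threshold=0.5         # node becomes a scale-down candidate below 50% utilization
    - --scale-down-unneeded-time=10m                 # ...and must stay unneeded this long before removal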
12. Kubernetes Autoscaler: Safety first!
Won't remove a node if:
● The PodDisruptionBudget is restrictive
● Pods cannot be moved because of affinity / node selector rules
● kube-system pods are running on it
● Pods with local storage are running on it
● Pods carry the annotation:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
13. Clusters can be complicated business!
[Diagram: node pools spread across Availability Zones 1-3: Node Pool 1 (CPU intensive), Node Pool 2 (memory intensive), Node Pool 3 (GPU - ML/DL)]
● Scaling happens at the node pool level
● Can be done across AZs
● "Expanders" can be used for different strategies
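A sketch of how node-pool-level scaling and expanders surface as Cluster Autoscaler flags; the pool names, size ranges, and expander choice are illustrative assumptions:

  # Hypothetical cluster-autoscaler args for several node pools spread across AZs
  - --nodes=1:20:cpu-pool                  # Node Pool 1 (CPU intensive)
  - --nodes=1:10:mem-pool                  # Node Pool 2 (memory intensive)
  - --nodes=0:4:gpu-pool                   # Node Pool 3 (GPU - ML/DL)
  - --expander=least-waste                 # strategy for choosing which pool to grow (others: random, most-pods, price)
  - --balance-similar-node-groups=true     # keep similar pools balanced across AZs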
14. HPA & Autoscaler: Marriage made in heaven
Pods & Nodes scale down
After the load goes down, pods are evicted.
This leads to under utilization of nodes and
node is evicted
HPA Scales Pods
HPA scales the pods based on HPA
definitions. More pods are are scheduled
and some of them go in pending state
Autoscaler Kicks in
Cluster autoscaler adds more nodes based on pending pods and pods start running
HPA and cluster
scaling working
together
16. Scaling speed
The delay between two consecutive up/down scale operations in HPA is configured at the cluster level (the current upscale delay is 3m). Tune it based on how fast you need to scale.
Similar controls exist for the node autoscaler.
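As a sketch, these cluster-level knobs live on the kube-controller-manager (for the HPA) and on the cluster autoscaler; the values shown are defaults or examples, not recommendations:

  # kube-controller-manager flags (HPA side)
  --horizontal-pod-autoscaler-sync-period=30s     # how often the HPA control loop evaluates metrics
  --horizontal-pod-autoscaler-upscale-delay=3m    # minimum gap between two consecutive scale-ups
  --horizontal-pod-autoscaler-downscale-delay=5m  # minimum gap between two consecutive scale-downs

  # cluster-autoscaler flags (node side)
  --scale-down-delay-after-add=10m                # hold off node scale-down right after a node is added
  --scale-down-unneeded-time=10m                  # how long a node must be unneeded before it is removed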
17. Scaling metric
Every workload is different: some should be scaled on CPU, some on the number of messages in a queue, and some on a metric outside of the application. Choose your "scaling metric" carefully!
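For example, a hedged sketch of an HPA that scales on queue depth instead of CPU; the deployment name and the metric name are hypothetical and depend on what your monitoring pipeline exposes:

  apiVersion: autoscaling/v2beta1
  kind: HorizontalPodAutoscaler
  metadata:
    name: queue-worker
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: queue-worker                 # hypothetical consumer deployment
    minReplicas: 2
    maxReplicas: 20
    metrics:
    - type: Pods
      pods:
        metricName: queue_messages_ready # hypothetical custom metric exposed via the metrics adapter
        targetAverageValue: 100          # aim for roughly 100 messages per pod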
18. Monitoring Adapters
If you are using a commercial monitoring tool, you will have to route its metrics into the metrics APIs that the HPA reads. You can also design this so that scaling does not depend on an outage of a SaaS monitoring tool!
Also check that you have an adapter from the monitoring tool to the metrics API!
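As one concrete (but assumed) example with the Prometheus Adapter, rules in its configuration translate Prometheus series into the custom metrics API that the HPA queries; a minimal rule could look like this:

  # Hypothetical prometheus-adapter rule exposing an HTTP request rate as a per-pod custom metric
  rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'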
19. State & Scaling
Statefulset scaling is very different - you need to provision
volume etc. and depending on underlying datastore, might
need initial data bootstrapping etc
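A minimal sketch of why StatefulSet scaling differs: each replica gets its own PersistentVolumeClaim from the volumeClaimTemplates, so scaling up also provisions storage (the names, image, and sizes below are assumptions):

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: my-datastore                   # hypothetical stateful workload
  spec:
    serviceName: my-datastore
    replicas: 3                          # changing this adds/keeps one PVC per replica
    selector:
      matchLabels:
        app: my-datastore
    template:
      metadata:
        labels:
          app: my-datastore
      spec:
        containers:
        - name: db
          image: my-datastore:1.0        # placeholder image
          volumeMounts:
          - name: data
            mountPath: /var/lib/data
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi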
20. Further reading
● Metrics Server (https://github.com/kubernetes-incubator/metrics-server)
● https://github.com/infracloudio/kubernetes-autoscaling
● https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
● https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
Autoscaling in Kubernetes has 2 dimensions:
Pod autoscaling (or workload autoscaling), and
Cluster autoscaling: autoscaling your cluster hosts to provide more compute.
Why do you need autoscaling?
Well, why not? When you set up your Kubernetes cluster to run your containerized workloads, the setup typically takes into consideration the load expected on the cluster, the number of users who will use the application, the kind of application you serve, and several other factors.
Horizontal Pod Autoscaling:
It is a Kubernetes resource and is implemented as a controller. The control loop interval can be set with the --horizontal-pod-autoscaler-sync-period flag.
It dynamically controls the number of pod replicas based on a defined objective, such as CPU usage being X%.
The HorizontalPodAutoscaler normally fetches metrics from a series of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io).
There are 2 kinds of metrics that the HPA can act upon:
Per-pod resource metrics: these are fetched from the resource metrics API for each pod; metrics such as CPU and memory are targeted in this case.
Per-pod custom metrics: these are reported by your monitoring pipeline (in this case Prometheus); they vary from case to case, since the metric to scale on differs between implementations.
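A minimal sketch of the per-pod resource-metric case described above; the target Deployment name and the 50% CPU objective are illustrative:

  apiVersion: autoscaling/v2beta1
  kind: HorizontalPodAutoscaler
  metadata:
    name: my-app
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app                       # hypothetical deployment to scale
    minReplicas: 2
    maxReplicas: 10
    metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50     # keep average CPU at ~50% of the pods' requests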
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when:
there are pods that failed to run in the cluster due to insufficient resources.
some nodes in the cluster are so underutilized, for an extended period of time, that they can be deleted and their pods will be easily placed on some other, existing nodes.