SlideShare uma empresa Scribd logo
1 de 67
Baixar para ler offline
Optimizing Kubernetes
Resource Requests/Limits
for Cost-Efficiency and Latency
Henning Jacobs (Zalando SE)
@try_except_
4
ZALANDO AT A GLANCE
~ 4.5billion EUR
revenue 2017
> 200
million
visits
per
month
> 15.000
employees in
Europe
> 70%
of visits via
mobile devices
> 24
million
active customers
> 300.000
product choices
~ 2.000
brands
17
countries
5
SCALE
100Clusters
373Accounts
6
KUBERNETES: IT'S ALL ABOUT RESOURCES
Node Node
Pods demand capacity
Nodes offer capacity
Scheduler
7
COMPUTE RESOURCE TYPES
● CPU
● Memory
● Local ephemeral storage (1.12+)
● Extended Resources
○ GPU
○ TPU?
Node
8
KUBERNETES RESOURCES
CPU
○ Base: 1 AWS vCPU (or GCP Core or ..)
○ Example: 100m (0.1 vCPU, "100 Millicores")
Memory
○ Base: 1 Byte
○ Example: 500Mi (500 MiB memory)
9
REQUESTS / LIMITS
Requests
○ Affect Scheduling Decision
○ Priority (CPU, OOM adjust)
Limits
○ Limit maximum container usage
resources:
requests:
cpu: 100m
memory: 300Mi
limits:
cpu: 1
memory: 300Mi
10
CPU
Memory
REQUESTS: POD SCHEDULING
CPU
Memory
CPU
Memory
Node 1
Node 2
Pod 1
Requests
11
POD SCHEDULING
CPU
Memory
CPU
Memory
Node 1
Node 2
Pod 2
12
POD SCHEDULING
CPU
Memory
CPU
Memory
Node 1
Node 2
Pod 3
13
POD SCHEDULING
CPU
Memory
CPU
Memory
Node 1
Node 2
Pod 4
14
POD SCHEDULING: TRY TO FIT
CPU
Memory
CPU
Memory
Node 1
Node 2
15
POD SCHEDULING: NO CAPACITY
CPU
Memory
CPU
Memory
Node 1
Node 2
Pod 4
"PENDING"
16
REQUESTS: CPU SHARES
docker run --requests=cpu=10m/5m ..sha512()..
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod5d5..0d/cpu.shares
10 // relative share of CPU time
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod6e0..0d/cpu.shares
5 // relative share of CPU time
cat /sys/fs/cgroup/cpuacct/kubepods/burstable/pod5d5..0d/cpuacct.usage
/sys/fs/cgroup/cpuacct/kubepods/burstable/pod6e0..0d/cpuacct.usage
13432815283 // total CPU time in nanoseconds
7528759332 // total CPU time in nanoseconds
17
LIMITS: COMPRESSIBLE RESOURCES
Can be taken away quickly,
"only" cause slowness
CPU Throttling
200m CPU limit ⇒
container can use
0.2s of CPU time per second
18
CPU THROTTLING
docker run --cpus CPUS -it python
python -m timeit -s 'import hashlib' -n 10000 -v
'hashlib.sha512().update(b"foo")'
CPUS=1.0 3.8 - 4ms
CPUS=0.5 3.8 - 52ms
CPUS=0.2 6.8 - 88ms
CPUS=0.1 5.7 - 190ms
19
LIMITS: NON-COMPRESSIBLE RESOURCES
Hold state,
are slower to take away.
Killing (OOMKill)
20
MEMORY LIMITS: OUT OF MEMORY
kubectl get pod
NAME READY STATUS RESTARTS AGE
kube-ops-view-7bc-tcwkt 0/1 CrashLoopBackOff 3 2m
kubectl describe pod kube-ops-view-7bc-tcwkt
...
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
21
QOS
Guaranteed: all containers have limits == requests
Burstable: some containers have limits > requests
BestEffort: no requests/limits set
kubectl describe pod …
…
Limits:
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
QoS Class: Burstable
22
OVERCOMMIT
Limits > Requests ⇒ Burstable QoS ⇒ Overcommit
Overcommit for CPU: fine, running into completely fair scheduling
Overcommit for memory: fine, as long as demand < node capacity
https://code.fb.com/production-engineering/oomd/
Might run into unpredictable OOM situations
when demand reaches node's memory
capacity (Kernel OOM Killer)
23
LIMITS: CGROUPS
docker run --cpus 1 -m 200m --rm -it busybox
cat /sys/fs/cgroup/cpu/docker/8ab25..1c/cpu.{shares,cfs_*}
1024 // cpu.shares (default value)
100000 // cpu.cfs_period_us (100ms period length)
100000 // cpu.cfs_quota_us (total CPU time in µs consumable per period)
cat /sys/fs/cgroup/memory/docker/8ab25..1c/memory.limit_in_bytes
209715200
24
LIMITS: PROBLEMS
1. CPU CFS Quota: Latency
2. Memory: accounting, OOM behavior
25
PROBLEMS: LATENCY
https://github.com/zalando-incubator/kubernetes-on-aws/pull/923
26
PROBLEMS: HARDCODED PERIOD
27
PROBLEMS: HARDCODED PERIOD
https://github.com/kubernetes/kubernetes/issues/51135
28
WHAT'S A GOOD VALUE FOR CFS PERIOD?
Latencyp95→
29
NOW IN KUBERNETES 1.12
https://github.com/kubernetes/kubernetes/pull/63437
30
OVERLY AGGRESSIVE CFS
31
OVERLY AGGRESSIVE CFS: EXPERIMENT #1
CPU Period: 100ms
CPU Quota: None
Burn 5ms and sleep 100ms
⇒ Quota disabled
⇒ No Throttling expected!
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
32
EXPERIMENT #1: NO QUOTA, NO THROTTLING
2018/11/03 13:04:02 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 6ms
2018/11/03 13:04:03 [1] burn took 5ms, real time so far: 510ms, cpu time so far: 11ms
2018/11/03 13:04:03 [2] burn took 5ms, real time so far: 1015ms, cpu time so far: 17ms
2018/11/03 13:04:04 [3] burn took 5ms, real time so far: 1520ms, cpu time so far: 23ms
2018/11/03 13:04:04 [4] burn took 5ms, real time so far: 2025ms, cpu time so far: 29ms
2018/11/03 13:04:05 [5] burn took 5ms, real time so far: 2530ms, cpu time so far: 35ms
2018/11/03 13:04:05 [6] burn took 5ms, real time so far: 3036ms, cpu time so far: 40ms
2018/11/03 13:04:06 [7] burn took 5ms, real time so far: 3541ms, cpu time so far: 46ms
2018/11/03 13:04:06 [8] burn took 5ms, real time so far: 4046ms, cpu time so far: 52ms
2018/11/03 13:04:07 [9] burn took 5ms, real time so far: 4551ms, cpu time so far: 58ms
33
OVERLY AGGRESSIVE CFS: EXPERIMENT #2
CPU Period: 100ms
CPU Quota: 20ms
Burn 5ms and sleep 500ms
⇒ No 100ms intervals where possibly 20ms is burned
⇒ No Throttling expected!
34
EXPERIMENT #2: OVERLY AGGRESSIVE CFS
2018/11/03 13:05:05 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 5ms
2018/11/03 13:05:06 [1] burn took 99ms, real time so far: 690ms, cpu time so far: 9ms
2018/11/03 13:05:06 [2] burn took 99ms, real time so far: 1290ms, cpu time so far: 14ms
2018/11/03 13:05:07 [3] burn took 99ms, real time so far: 1890ms, cpu time so far: 18ms
2018/11/03 13:05:07 [4] burn took 5ms, real time so far: 2395ms, cpu time so far: 24ms
2018/11/03 13:05:08 [5] burn took 94ms, real time so far: 2990ms, cpu time so far: 27ms
2018/11/03 13:05:09 [6] burn took 99ms, real time so far: 3590ms, cpu time so far: 32ms
2018/11/03 13:05:09 [7] burn took 5ms, real time so far: 4095ms, cpu time so far: 37ms
2018/11/03 13:05:10 [8] burn took 5ms, real time so far: 4600ms, cpu time so far: 43ms
2018/11/03 13:05:10 [9] burn took 5ms, real time so far: 5105ms, cpu time so far: 49ms
35
OVERLY AGGRESSIVE CFS: EXPERIMENT #3
CPU Period: 10ms
CPU Quota: 2ms
Burn 5ms and sleep 100ms
⇒ Same 20% CPU (200m) limit, but smaller period
⇒ Throttling expected!
36
SMALLER CPU PERIOD ⇒ BETTER LATENCY
2018/11/03 16:31:07 [0] burn took 18ms, real time so far: 18ms, cpu time so far: 6ms
2018/11/03 16:31:07 [1] burn took 9ms, real time so far: 128ms, cpu time so far: 8ms
2018/11/03 16:31:07 [2] burn took 9ms, real time so far: 238ms, cpu time so far: 13ms
2018/11/03 16:31:07 [3] burn took 5ms, real time so far: 343ms, cpu time so far: 18ms
2018/11/03 16:31:07 [4] burn took 30ms, real time so far: 488ms, cpu time so far: 24ms
2018/11/03 16:31:07 [5] burn took 19ms, real time so far: 608ms, cpu time so far: 29ms
2018/11/03 16:31:07 [6] burn took 9ms, real time so far: 718ms, cpu time so far: 34ms
2018/11/03 16:31:08 [7] burn took 5ms, real time so far: 824ms, cpu time so far: 40ms
2018/11/03 16:31:08 [8] burn took 5ms, real time so far: 943ms, cpu time so far: 45ms
2018/11/03 16:31:08 [9] burn took 9ms, real time so far: 1068ms, cpu time so far: 48ms
37
LIMITS: VISIBILITY
docker run --cpus 1 -m 200m --rm -it busybox top
Mem: 7369128K used, 726072K free, 128164K shrd, 303924K buff, 1208132K cached
CPU0: 14.8% usr 8.4% sys 0.2% nic 67.6% idle 8.2% io 0.0% irq 0.6% sirq
CPU1: 8.8% usr 10.3% sys 0.0% nic 75.9% idle 4.4% io 0.0% irq 0.4% sirq
CPU2: 7.3% usr 8.7% sys 0.0% nic 63.2% idle 20.1% io 0.0% irq 0.6% sirq
CPU3: 9.3% usr 9.9% sys 0.0% nic 65.7% idle 14.5% io 0.0% irq 0.4% sirq
38
• Container-aware memory configuration
• JVM MaxHeap
• Container-aware processor configuration
• Thread pools
• GOMAXPROCS
• node.js cluster module
LIMITS: VISIBILITY
39
ZALANDO: DEVELOPERS USING KUBERNETES
40
KUBERNETES RESOURCES
41
ZALANDO: DECISION
• Forbid Memory Overcommit
• Implement mutating admission webhook
• Set requests = limits
• Disable CPU CFS Quota in all clusters
42
ZALANDO: DISABLING CPU LIMITS
43
INGRESS LATENCY IMPROVEMENT
Example: Getting started
with Zalenium & UI Tests
Example: Step by step guide to the first UI test with Zalenium running in the
Continuous Delivery Platform. I was always afraid of UI tests because it looked too
difficult to get started, Zalenium solved this problem for me.
45
CLUSTER AUTOSCALER
Simulates the Kubernetes scheduler internally to find out..
• ..if any of the pods wouldn’t fit on existing nodes
⇒ upscale is needed
• ..if it’s possible to fit some of the pods on existing nodes
⇒ downscale is needed
⇒ Cluster size is determined by resource requests (+ constraints)
github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
46
ALLOCATABLE
Reserve resources for
system components,
Kubelet, and container runtime:
--system-reserved=
cpu=100m,memory=164Mi
--kube-reserved=
cpu=100m,memory=282Mi
47
CPU/memory requests "block" resources on nodes.
Difference between actual usage and requests → Slack
SLACK
CPU
Memory
Node
"Slack"
48
STRANDED RESOURCES
Stranded
CPU
Memory
CPU
Memory
Node 1
Node 2
Some available capacity
can become unusable /
stranded.
⇒ Reschedule, bin packing
49
HOW TO FIND GOOD VALUES FOR REQUESTS?
resources:
limits:
memory: "2Gi"
requests:
cpu: "1.5"
memory: "2Gi"
50
TWO TOOLS
● Kubernetes Resource Report
○ Dashboard showing Kubernetes Resource Slack
● Kubernetes Application Dashboard
○ Grafana Dashboard showing historical CPU and Memory
usage for Kubernetes Applications
51
KUBERNETES RESOURCE REPORT
github.com/hjacobs/kube-resource-report
52
KUBERNETES RESOURCE REPORT
"Slack"
53
KUBERNETES APPLICATION DASHBOARD
54
VERTICAL POD AUTOSCALER (VPA)
"Some 2/3 of the Borg users use autopilot."
- Tim Hockin
VPA: Set resource requests
automatically based on usage.
55
VPA FOR PROMETHEUS
apiVersion: poc.autoscaling.k8s.io/v1alpha1
kind: VerticalPodAutoscaler
metadata:
name: prometheus-vpa
namespace: kube-system
spec:
selector:
matchLabels:
application: prometheus
updatePolicy:
updateMode: Auto
CPU/memory
56
AUTOSCALING BUFFER
• Cluster Autoscaler only triggers on Pending Pods
• Node provisioning is slow
⇒ Reserve extra capacity via low priority Pods
"Autoscaling Buffer Pods"
57
AUTOSCALING BUFFER
kubectl describe pod autoscaling-buffer-eu-central-1a-78-rzjq5 -n kube-system
...
Namespace: kube-system
Priority: -1000000
PriorityClassName: autoscaling-buffer
Containers:
pause:
Image: teapot/pause-amd64:3.1
Requests:
cpu: 1600m
memory: 6871947673
Evict if higher
priority (default)
Pod needs
capacity
58
HORIZONTAL POD AUTOSCALER
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: myapp
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 3
maxReplicas: 5
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 100
target: ~100% of
CPU requests
...
59
HORIZONTAL POD AUTOSCALING (CUSTOM METRICS)
Queue Length
Prometheus Query
Ingress Req/s
ZMON Check
github.com/zalando-incubator/kube-metrics-adapter
60
DOWNSCALING DURING OFF-HOURS
github.com/hjacobs/kube-downscaler
Weekend
61
DOWNSCALING DURING OFF-HOURS
DEFAULT_UPTIME="Mon-Fri 07:30-20:30 CET"
annotations:
downscaler/exclude: "true"
github.com/hjacobs/kube-downscaler
62
WHAT WORKED FOR US
● Disable CPU CFS Quota in all clusters
● Admission Controller:
○ Prevent memory overcommit
○ Set default requests
● Kubernetes Resource Report
● Downscaling during off-hours
63
WRAP UP
● Understanding Requests/Limits
● Deciding on quota period (or disable), overcommit
● Autoscaling (cluster-autoscaler, HPA, VPA, downscaler)
● Buffer capacity
● Optimizing slack
64
OPEN SOURCE
Kubernetes on AWS
github.com/zalando-incubator/kubernetes-on-aws
AWS ALB Ingress controller
github.com/zalando-incubator/kube-ingress-aws-controller
Skipper HTTP Router & Ingress controller
github.com/zalando/skipper
External DNS
github.com/kubernetes-incubator/external-dns
Postgres Operator
github.com/zalando-incubator/postgres-operator
Kubernetes Resource Report
github.com/hjacobs/kube-resource-report
Kubernetes Downscaler
github.com/hjacobs/kube-downscaler
https://github.com/hjacobs/kube-ops-view
66
OTHER TALKS
• Everything You Ever Wanted to Know About Resource Scheduling
• Inside Kubernetes Resource Management (QoS) - KubeCon 2018
• Setting Resource Requests and Limits in Kubernetes (Best Practices)
QUESTIONS?
HENNING JACOBS
HEAD OF
DEVELOPER PRODUCTIVITY
henning@zalando.de
@try_except_
Illustrations by @01k

Mais conteúdo relacionado

Mais procurados

쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)충섭 김
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요Jo Hoon
 
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin	Kata Container - The Security of VM and The Speed of Container | Yuntong Jin
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin Vietnam Open Infrastructure User Group
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at ScaleFabian Reinartz
 
[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on KubernetesBruno Borges
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon Web Services
 
Secrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesSecrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesBruno Borges
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법Open Source Consulting
 
Rancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep DiveRancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep DiveLINE Corporation
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetescraigbox
 
Automation with ansible
Automation with ansibleAutomation with ansible
Automation with ansibleKhizer Naeem
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...HostedbyConfluent
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
Kubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewKubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewBob Killen
 

Mais procurados (20)

쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
 
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin	Kata Container - The Security of VM and The Speed of Container | Yuntong Jin
Kata Container - The Security of VM and The Speed of Container | Yuntong Jin
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Kubernetes PPT.pptx
Kubernetes PPT.pptxKubernetes PPT.pptx
Kubernetes PPT.pptx
 
DevOps with Kubernetes
DevOps with KubernetesDevOps with Kubernetes
DevOps with Kubernetes
 
[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
 
Secrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesSecrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on Kubernetes
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
 
Rancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep DiveRancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep Dive
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetes
 
Automation with ansible
Automation with ansibleAutomation with ansible
Automation with ansible
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Kubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewKubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive Overview
 

Semelhante a Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency - Highload++

Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyOptimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyHenning Jacobs
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
 
Mastering java in containers - MadridJUG
Mastering java in containers - MadridJUGMastering java in containers - MadridJUG
Mastering java in containers - MadridJUGJorge Morales
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextLucidworks
 
JITServerTalk.pdf
JITServerTalk.pdfJITServerTalk.pdf
JITServerTalk.pdfRichHagarty
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntuSim Janghoon
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfRichHagarty
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephRongze Zhu
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
 
KOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S MonitoringKOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S Monitoringissac lim
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Container Performance Analysis Brendan Gregg, Netflix
Container Performance Analysis Brendan Gregg, NetflixContainer Performance Analysis Brendan Gregg, Netflix
Container Performance Analysis Brendan Gregg, NetflixDocker, Inc.
 
JVM Garbage Collection Tuning
JVM Garbage Collection TuningJVM Garbage Collection Tuning
JVM Garbage Collection Tuningihji
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
Devoxx France 2018 : Mes Applications en Production sur Kubernetes
Devoxx France 2018 : Mes Applications en Production sur KubernetesDevoxx France 2018 : Mes Applications en Production sur Kubernetes
Devoxx France 2018 : Mes Applications en Production sur KubernetesMichaël Morello
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 

Semelhante a Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency - Highload++ (20)

Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyOptimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Mastering java in containers - MadridJUG
Mastering java in containers - MadridJUGMastering java in containers - MadridJUG
Mastering java in containers - MadridJUG
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
 
JITServerTalk.pdf
JITServerTalk.pdfJITServerTalk.pdf
JITServerTalk.pdf
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
 
MesosCon 2018
MesosCon 2018MesosCon 2018
MesosCon 2018
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdf
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
KOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S MonitoringKOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S Monitoring
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Container Performance Analysis Brendan Gregg, Netflix
Container Performance Analysis Brendan Gregg, NetflixContainer Performance Analysis Brendan Gregg, Netflix
Container Performance Analysis Brendan Gregg, Netflix
 
JVM Garbage Collection Tuning
JVM Garbage Collection TuningJVM Garbage Collection Tuning
JVM Garbage Collection Tuning
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
PROJECT GREEN
PROJECT GREENPROJECT GREEN
PROJECT GREEN
 
Devoxx France 2018 : Mes Applications en Production sur Kubernetes
Devoxx France 2018 : Mes Applications en Production sur KubernetesDevoxx France 2018 : Mes Applications en Production sur Kubernetes
Devoxx France 2018 : Mes Applications en Production sur Kubernetes
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 

Mais de Henning Jacobs

How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHenning Jacobs
 
Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019Henning Jacobs
 
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO BerlinWhy I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO BerlinHenning Jacobs
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...Henning Jacobs
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Henning Jacobs
 
Kubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native PragueKubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native PragueHenning Jacobs
 
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...Henning Jacobs
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Henning Jacobs
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Henning Jacobs
 
Kubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaKubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaHenning Jacobs
 
Developer Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DXDeveloper Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DXHenning Jacobs
 
Let's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupLet's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupHenning Jacobs
 
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019Henning Jacobs
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Henning Jacobs
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...Henning Jacobs
 
API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018Henning Jacobs
 
Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...Henning Jacobs
 
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRWKubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRWHenning Jacobs
 
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...Henning Jacobs
 
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes MeetupFrom AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes MeetupHenning Jacobs
 

Mais de Henning Jacobs (20)

How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
 
Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019Open Source at Zalando - OSB Open Source Day 2019
Open Source at Zalando - OSB Open Source Day 2019
 
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO BerlinWhy I love Kubernetes Failure Stories and you should too - GOTO Berlin
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
 
Kubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native PragueKubernetes + Python = ❤ - Cloud Native Prague
Kubernetes + Python = ❤ - Cloud Native Prague
 
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
 
Kubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaKubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe Barcelona
 
Developer Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DXDeveloper Experience at Zalando - CNCF End User SIG-DX
Developer Experience at Zalando - CNCF End User SIG-DX
 
Let's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupLet's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg Meetup
 
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...
 
API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018API First with Connexion - PyConWeb 2018
API First with Connexion - PyConWeb 2018
 
Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...Developer Journey at Zalando - Idea to Production with Containers in the Clou...
Developer Journey at Zalando - Idea to Production with Containers in the Clou...
 
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRWKubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
Kubernetes on AWS at Zalando: Failures & Learnings - DevOps NRW
 
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...
 
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes MeetupFrom AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup
 

Último

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency - Highload++

  • 1. Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency Henning Jacobs (Zalando SE) @try_except_
  • 2.
  • 3.
  • 4. 4 ZALANDO AT A GLANCE ~ 4.5billion EUR revenue 2017 > 200 million visits per month > 15.000 employees in Europe > 70% of visits via mobile devices > 24 million active customers > 300.000 product choices ~ 2.000 brands 17 countries
  • 6. 6 KUBERNETES: IT'S ALL ABOUT RESOURCES Node Node Pods demand capacity Nodes offer capacity Scheduler
  • 7. 7 COMPUTE RESOURCE TYPES ● CPU ● Memory ● Local ephemeral storage (1.12+) ● Extended Resources ○ GPU ○ TPU? Node
  • 8. 8 KUBERNETES RESOURCES CPU ○ Base: 1 AWS vCPU (or GCP Core or ..) ○ Example: 100m (0.1 vCPU, "100 Millicores") Memory ○ Base: 1 Byte ○ Example: 500Mi (500 MiB memory)
  • 9. 9 REQUESTS / LIMITS Requests ○ Affect Scheduling Decision ○ Priority (CPU, OOM adjust) Limits ○ Limit maximum container usage resources: requests: cpu: 100m memory: 300Mi limits: cpu: 1 memory: 300Mi
  • 14. 14 POD SCHEDULING: TRY TO FIT CPU Memory CPU Memory Node 1 Node 2
  • 15. 15 POD SCHEDULING: NO CAPACITY CPU Memory CPU Memory Node 1 Node 2 Pod 4 "PENDING"
  • 16. 16 REQUESTS: CPU SHARES docker run --requests=cpu=10m/5m ..sha512().. cat /sys/fs/cgroup/cpu/kubepods/burstable/pod5d5..0d/cpu.shares 10 // relative share of CPU time cat /sys/fs/cgroup/cpu/kubepods/burstable/pod6e0..0d/cpu.shares 5 // relative share of CPU time cat /sys/fs/cgroup/cpuacct/kubepods/burstable/pod5d5..0d/cpuacct.usage /sys/fs/cgroup/cpuacct/kubepods/burstable/pod6e0..0d/cpuacct.usage 13432815283 // total CPU time in nanoseconds 7528759332 // total CPU time in nanoseconds
  • 17. 17 LIMITS: COMPRESSIBLE RESOURCES Can be taken away quickly, "only" cause slowness CPU Throttling 200m CPU limit ⇒ container can use 0.2s of CPU time per second
  • 18. 18 CPU THROTTLING docker run --cpus CPUS -it python python -m timeit -s 'import hashlib' -n 10000 -v 'hashlib.sha512().update(b"foo")' CPUS=1.0 3.8 - 4ms CPUS=0.5 3.8 - 52ms CPUS=0.2 6.8 - 88ms CPUS=0.1 5.7 - 190ms
  • 19. 19 LIMITS: NON-COMPRESSIBLE RESOURCES Hold state, are slower to take away. Killing (OOMKill)
  • 20. 20 MEMORY LIMITS: OUT OF MEMORY kubectl get pod NAME READY STATUS RESTARTS AGE kube-ops-view-7bc-tcwkt 0/1 CrashLoopBackOff 3 2m kubectl describe pod kube-ops-view-7bc-tcwkt ... Last State: Terminated Reason: OOMKilled Exit Code: 137
  • 21. 21 QOS Guaranteed: all containers have limits == requests Burstable: some containers have limits > requests BestEffort: no requests/limits set kubectl describe pod … … Limits: memory: 100Mi Requests: cpu: 100m memory: 100Mi QoS Class: Burstable
  • 22. 22 OVERCOMMIT Limits > Requests ⇒ Burstable QoS ⇒ Overcommit Overcommit for CPU: fine, running into completely fair scheduling Overcommit for memory: fine, as long as demand < node capacity https://code.fb.com/production-engineering/oomd/ Might run into unpredictable OOM situations when demand reaches node's memory capacity (Kernel OOM Killer)
  • 23. 23 LIMITS: CGROUPS docker run --cpus 1 -m 200m --rm -it busybox cat /sys/fs/cgroup/cpu/docker/8ab25..1c/cpu.{shares,cfs_*} 1024 // cpu.shares (default value) 100000 // cpu.cfs_period_us (100ms period length) 100000 // cpu.cfs_quota_us (total CPU time in µs consumable per period) cat /sys/fs/cgroup/memory/docker/8ab25..1c/memory.limit_in_bytes 209715200
  • 24. 24 LIMITS: PROBLEMS 1. CPU CFS Quota: Latency 2. Memory: accounting, OOM behavior
  • 28. 28 WHAT'S A GOOD VALUE FOR CFS PERIOD? Latencyp95→
  • 29. 29 NOW IN KUBERNETES 1.12 https://github.com/kubernetes/kubernetes/pull/63437
  • 31. 31 OVERLY AGGRESSIVE CFS: EXPERIMENT #1 CPU Period: 100ms CPU Quota: None Burn 5ms and sleep 100ms ⇒ Quota disabled ⇒ No Throttling expected! https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
  • 32. 32 EXPERIMENT #1: NO QUOTA, NO THROTTLING 2018/11/03 13:04:02 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 6ms 2018/11/03 13:04:03 [1] burn took 5ms, real time so far: 510ms, cpu time so far: 11ms 2018/11/03 13:04:03 [2] burn took 5ms, real time so far: 1015ms, cpu time so far: 17ms 2018/11/03 13:04:04 [3] burn took 5ms, real time so far: 1520ms, cpu time so far: 23ms 2018/11/03 13:04:04 [4] burn took 5ms, real time so far: 2025ms, cpu time so far: 29ms 2018/11/03 13:04:05 [5] burn took 5ms, real time so far: 2530ms, cpu time so far: 35ms 2018/11/03 13:04:05 [6] burn took 5ms, real time so far: 3036ms, cpu time so far: 40ms 2018/11/03 13:04:06 [7] burn took 5ms, real time so far: 3541ms, cpu time so far: 46ms 2018/11/03 13:04:06 [8] burn took 5ms, real time so far: 4046ms, cpu time so far: 52ms 2018/11/03 13:04:07 [9] burn took 5ms, real time so far: 4551ms, cpu time so far: 58ms
  • 33. 33 OVERLY AGGRESSIVE CFS: EXPERIMENT #2 CPU Period: 100ms CPU Quota: 20ms Burn 5ms and sleep 500ms ⇒ No 100ms intervals where possibly 20ms is burned ⇒ No Throttling expected!
  • 34. 34 EXPERIMENT #2: OVERLY AGGRESSIVE CFS 2018/11/03 13:05:05 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 5ms 2018/11/03 13:05:06 [1] burn took 99ms, real time so far: 690ms, cpu time so far: 9ms 2018/11/03 13:05:06 [2] burn took 99ms, real time so far: 1290ms, cpu time so far: 14ms 2018/11/03 13:05:07 [3] burn took 99ms, real time so far: 1890ms, cpu time so far: 18ms 2018/11/03 13:05:07 [4] burn took 5ms, real time so far: 2395ms, cpu time so far: 24ms 2018/11/03 13:05:08 [5] burn took 94ms, real time so far: 2990ms, cpu time so far: 27ms 2018/11/03 13:05:09 [6] burn took 99ms, real time so far: 3590ms, cpu time so far: 32ms 2018/11/03 13:05:09 [7] burn took 5ms, real time so far: 4095ms, cpu time so far: 37ms 2018/11/03 13:05:10 [8] burn took 5ms, real time so far: 4600ms, cpu time so far: 43ms 2018/11/03 13:05:10 [9] burn took 5ms, real time so far: 5105ms, cpu time so far: 49ms
  • 35. 35 OVERLY AGGRESSIVE CFS: EXPERIMENT #3 CPU Period: 10ms CPU Quota: 2ms Burn 5ms and sleep 100ms ⇒ Same 20% CPU (200m) limit, but smaller period ⇒ Throttling expected!
  • 36. 36 SMALLER CPU PERIOD ⇒ BETTER LATENCY 2018/11/03 16:31:07 [0] burn took 18ms, real time so far: 18ms, cpu time so far: 6ms 2018/11/03 16:31:07 [1] burn took 9ms, real time so far: 128ms, cpu time so far: 8ms 2018/11/03 16:31:07 [2] burn took 9ms, real time so far: 238ms, cpu time so far: 13ms 2018/11/03 16:31:07 [3] burn took 5ms, real time so far: 343ms, cpu time so far: 18ms 2018/11/03 16:31:07 [4] burn took 30ms, real time so far: 488ms, cpu time so far: 24ms 2018/11/03 16:31:07 [5] burn took 19ms, real time so far: 608ms, cpu time so far: 29ms 2018/11/03 16:31:07 [6] burn took 9ms, real time so far: 718ms, cpu time so far: 34ms 2018/11/03 16:31:08 [7] burn took 5ms, real time so far: 824ms, cpu time so far: 40ms 2018/11/03 16:31:08 [8] burn took 5ms, real time so far: 943ms, cpu time so far: 45ms 2018/11/03 16:31:08 [9] burn took 9ms, real time so far: 1068ms, cpu time so far: 48ms
  • 37. 37 LIMITS: VISIBILITY docker run --cpus 1 -m 200m --rm -it busybox top Mem: 7369128K used, 726072K free, 128164K shrd, 303924K buff, 1208132K cached CPU0: 14.8% usr 8.4% sys 0.2% nic 67.6% idle 8.2% io 0.0% irq 0.6% sirq CPU1: 8.8% usr 10.3% sys 0.0% nic 75.9% idle 4.4% io 0.0% irq 0.4% sirq CPU2: 7.3% usr 8.7% sys 0.0% nic 63.2% idle 20.1% io 0.0% irq 0.6% sirq CPU3: 9.3% usr 9.9% sys 0.0% nic 65.7% idle 14.5% io 0.0% irq 0.4% sirq
  • 38. 38 • Container-aware memory configuration • JVM MaxHeap • Container-aware processor configuration • Thread pools • GOMAXPROCS • node.js cluster module LIMITS: VISIBILITY
  • 41. 41 ZALANDO: DECISION • Forbid Memory Overcommit • Implement mutating admission webhook • Set requests = limits • Disable CPU CFS Quota in all clusters
  • 44. Example: Getting started with Zalenium & UI Tests Example: Step by step guide to the first UI test with Zalenium running in the Continuous Delivery Platform. I was always afraid of UI tests because it looked too difficult to get started, Zalenium solved this problem for me.
  • 45. 45 CLUSTER AUTOSCALER Simulates the Kubernetes scheduler internally to find out.. • ..if any of the pods wouldn’t fit on existing nodes ⇒ upscale is needed • ..if it’s possible to fit some of the pods on existing nodes ⇒ downscale is needed ⇒ Cluster size is determined by resource requests (+ constraints) github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
  • 46. 46 ALLOCATABLE Reserve resources for system components, Kubelet, and container runtime: --system-reserved= cpu=100m,memory=164Mi --kube-reserved= cpu=100m,memory=282Mi
  • 47. 47 CPU/memory requests "block" resources on nodes. Difference between actual usage and requests → Slack SLACK CPU Memory Node "Slack"
  • 48. 48 STRANDED RESOURCES Stranded CPU Memory CPU Memory Node 1 Node 2 Some available capacity can become unusable / stranded. ⇒ Reschedule, bin packing
  • 49. 49 HOW TO FIND GOOD VALUES FOR REQUESTS? resources: limits: memory: "2Gi" requests: cpu: "1.5" memory: "2Gi"
  • 50. 50 TWO TOOLS ● Kubernetes Resource Report ○ Dashboard showing Kubernetes Resource Slack ● Kubernetes Application Dashboard ○ Grafana Dashboard showing historical CPU and Memory usage for Kubernetes Applications
  • 54. 54 VERTICAL POD AUTOSCALER (VPA) "Some 2/3 of the Borg users use autopilot." - Tim Hockin VPA: Set resource requests automatically based on usage.
  • 55. 55 VPA FOR PROMETHEUS apiVersion: poc.autoscaling.k8s.io/v1alpha1 kind: VerticalPodAutoscaler metadata: name: prometheus-vpa namespace: kube-system spec: selector: matchLabels: application: prometheus updatePolicy: updateMode: Auto CPU/memory
  • 56. 56 AUTOSCALING BUFFER • Cluster Autoscaler only triggers on Pending Pods • Node provisioning is slow ⇒ Reserve extra capacity via low priority Pods "Autoscaling Buffer Pods"
  • 57. 57 AUTOSCALING BUFFER kubectl describe pod autoscaling-buffer-eu-central-1a-78-rzjq5 -n kube-system ... Namespace: kube-system Priority: -1000000 PriorityClassName: autoscaling-buffer Containers: pause: Image: teapot/pause-amd64:3.1 Requests: cpu: 1600m memory: 6871947673 Evict if higher priority (default) Pod needs capacity
  • 58. 58 HORIZONTAL POD AUTOSCALER apiVersion: autoscaling/v2beta1 kind: HorizontalPodAutoscaler metadata: name: myapp spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: myapp minReplicas: 3 maxReplicas: 5 metrics: - type: Resource resource: name: cpu targetAverageUtilization: 100 target: ~100% of CPU requests ...
  • 59. 59 HORIZONTAL POD AUTOSCALING (CUSTOM METRICS) Queue Length Prometheus Query Ingress Req/s ZMON Check github.com/zalando-incubator/kube-metrics-adapter
  • 61. 61 DOWNSCALING DURING OFF-HOURS DEFAULT_UPTIME="Mon-Fri 07:30-20:30 CET" annotations: downscaler/exclude: "true" github.com/hjacobs/kube-downscaler
  • 62. 62 WHAT WORKED FOR US ● Disable CPU CFS Quota in all clusters ● Admission Controller: ○ Prevent memory overcommit ○ Set default requests ● Kubernetes Resource Report ● Downscaling during off-hours
  • 63. 63 WRAP UP ● Understanding Requests/Limits ● Deciding on quota period (or disable), overcommit ● Autoscaling (cluster-autoscaler, HPA, VPA, downscaler) ● Buffer capacity ● Optimizing slack
  • 64. 64 OPEN SOURCE Kubernetes on AWS github.com/zalando-incubator/kubernetes-on-aws AWS ALB Ingress controller github.com/zalando-incubator/kube-ingress-aws-controller Skipper HTTP Router & Ingress controller github.com/zalando/skipper External DNS github.com/kubernetes-incubator/external-dns Postgres Operator github.com/zalando-incubator/postgres-operator Kubernetes Resource Report github.com/hjacobs/kube-resource-report Kubernetes Downscaler github.com/hjacobs/kube-downscaler
  • 66. 66 OTHER TALKS • Everything You Ever Wanted to Know About Resource Scheduling • Inside Kubernetes Resource Management (QoS) - KubeCon 2018 • Setting Resource Requests and Limits in Kubernetes (Best Practices)
  • 67. QUESTIONS? HENNING JACOBS HEAD OF DEVELOPER PRODUCTIVITY henning@zalando.de @try_except_ Illustrations by @01k