GDG Cloud Southlake #20: Stefano Doni: Kubernetes performance tuning dilemma: How to solve it with AI

  1. Kubernetes performance tuning dilemma: How to solve it with AI. Stefano Doni, CTO
  2. Agenda: (1) The problem; (2) Tuning challenges for modern K8s apps; (3) AI-powered optimization; (4) Demo
  3. Who am I
     ● Obsessed with performance optimization
     ● 18+ years of capacity & performance work
     ● CMG speaker since 2014, best paper on Java performance & efficiency in 2015
     ● Co-founder and CTO @ Akamas, the software platform for autonomous optimization, powered by AI
  4. Kubernetes has become the operating system of the cloud: 96% of organizations are either using or evaluating Kubernetes (Cloud Native Computing Foundation, Annual Survey 2021)
  5. The dark side of Kubernetes: cost efficiency, app reliability and app performance (Kubernetes FinOps Report, June 2021). Kubernetes failure stories: k8s.af. Videos: youtu.be/watch?v=4CT0cI62YHk, youtu.be/QXApVwRBeys
  6. New challenges for cloud-native apps: 100s-1000s of microservices, 10s-100s of inter-dependent configurations
     Kubernetes resource management: ● Container resource requests & limits ● Number of replicas ● Horizontal auto-scaling settings
     Application runtime resource management: ● Memory sizing ● Garbage collection ● Compiler & thread settings
  7. Why is K8s so hard? K8s resource management
  8. Resource requests drive K8s cluster costs
     ● Requests are resources the container is guaranteed to get
     ● Cluster capacity is based on pod resource requests - there is no overcommitment!
     ● Resource requests != resource utilization: a cluster can be full even if utilization is 10%
     Example: a node with 4 CPUs and 8 GB of memory hosts Pod A and Pod B; Pod A requests 2 cores and 2 GB of memory via its pod manifest:

     apiVersion: v1
     kind: Pod
     metadata:
       name: pod-a          # "Pod A" on the slide; valid pod names are lowercase
     spec:
       containers:
       - name: app
         image: nginx:1.1
         resources:
           requests:
             memory: "2Gi"
             cpu: "2"
  9. Resource limits may strongly impact application performance and stability
     ● A container can consume more resources than it has requested
     ● Resource limits specify the maximum resources a container can use (e.g. CPU = 2)
     ● When a container hits its resource limits, bad things can happen: on hitting the CPU limit, K8s throttles the container's CPU and application performance slows down; on hitting the memory limit, K8s kills the container, causing application stability issues (see the manifest sketch below)
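A minimal sketch of a container spec that sets both requests and limits (the pod name and the limit values are illustrative assumptions, not from the deck):

     apiVersion: v1
     kind: Pod
     metadata:
       name: app-with-limits        # hypothetical name
     spec:
       containers:
       - name: app
         image: nginx:1.1
         resources:
           requests:                # guaranteed; this is what sizes the cluster
             memory: "2Gi"
             cpu: "2"
           limits:                  # hard ceilings: exceeding memory gets the
             memory: "3Gi"          # container OOM-killed, exceeding CPU gets
             cpu: "2"               # it throttled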
  10. CPU throttling impacts cost & performance in surprising ways. An SRE sees significant CPU throttling, and a performance impact, with CPU utilization below 40% of the limit: "Why do I have CPU throttling if I'm using less than 40% of my CPU limit? Must be a K8s issue..." It isn't: per the Kubernetes docs, "The container's CPU use is being throttled, because the container is attempting to use more CPU resources than its limit" (https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource). The catch is that CPU limits are enforced per CFS scheduling period (100 ms by default), so a bursty container can exhaust its quota early in a period and get throttled even though its average utilization stays well below the limit.
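One way to see throttling directly is to read the container's CFS statistics (a sketch; replace <pod> with your pod name, and note the path depends on the node's cgroup version):

     $ kubectl exec <pod> -- cat /sys/fs/cgroup/cpu.stat
     # cgroup v2 path; on cgroup v1 use /sys/fs/cgroup/cpu/cpu.stat
     # nr_throttled / nr_periods = fraction of CFS periods in which
     # the container hit its quota and was throttled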
  11. Fact #4: setting resource requests and limits is required to ensure Kubernetes stability. "While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow" (Google, Kubernetes best practices: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits)
  12. Why is K8s so hard? Application runtime resource management
  13. App runtimes are highly configurable engines. "Because Java is so often deployed on servers, this kind of performance tuning is an essential activity for many organizations. The JVM is highly configurable, with literally hundreds of command-line options and switches. These switches provide performance engineers a gold mine of possibilities to explore in the pursuit of the optimal configuration for a given workload on a given platform."

     $ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal -version
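To get a feel for the scale, you can count the flags the command above prints, one per line (a quick sketch; the exact count varies by JVM version and platform):

     $ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal -version | wc -l
     # typically several hundred lines, one tunable flag per line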
  14. Why is heap size tuning important? The JVM uses all of the available memory. [Chart: JVM max heap vs JVM heap used vs app response time; reducing the max heap from 2 GiB to 1.2 GiB cuts memory used by 40%]
     Key takeaways:
     ● The JVM tends to use all of the memory it has been configured with
     ● Sizing based on K8s container memory usage is going to miss a lot of savings
     ● Experiment with the JVM max heap size to see how much you can save - while monitoring app performance!
  15. How does the JVM set the max heap size in K8s? JVM container-aware ergonomics: the max heap size is set by default to 25% of the container memory limit.

     $ docker run --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
        size_t MaxHeapSize = 268435456 {product} {ergonomic}

     You can tune the 25% via the -XX:MaxRAMPercentage parameter:

     $ docker run --memory 1G eclipse-temurin:11-alpine java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
        size_t MaxHeapSize = 536870912 {product} {ergonomic}

     Alternatively, you can always set a fixed max heap size with the -Xmx parameter:

     $ docker run --memory 1G eclipse-temurin:11-alpine java -Xmx1024M -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
        size_t MaxHeapSize = 1073741824 {product} {command line}
  16. JVM ergonomics in K8s are tricky (source: Microsoft)
     Key takeaways:
     ● JVM ergonomics do a lot of magic stuff, but they are tricky to understand and may do the wrong thing!
     ● The MaxRAMPercentage default is very conservative: increase it, but watch out for out-of-memory kills by K8s
     ● Do not trust JVM ergonomics: it's best to explicitly set JVM flags to avoid surprises
  17. OOM kills your app reliability - but better heap sizing can fix them. Context: Java microservices getting restarted due to out-of-memory kills by K8s. The SRE: "My containers keep getting OOM killed... Is this a memory leak or a misconfiguration? Let's increase the memory limit just in case..." What is actually happening: container memory used hits the container memory limit, triggering the K8s out-of-memory killer, with a direct availability impact.
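A quick way to check whether a restart really was an OOM kill (a sketch; replace <pod> with your pod name): the container's last termination state records the reason, and OOM-killed containers exit with code 137:

     $ kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
     # prints OOMKilled if the kernel OOM killer terminated the container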
  18. App runtime memory management. JVM memory is more than the heap: the K8s container memory limit must cover the JVM max heap plus off-heap space (threads, classes, compiler, garbage collector).
     Key takeaways:
     ● Max heap size is the main memory tuning parameter (e.g. JVM -Xmx or -XX:MaxRAMPercentage)
     ● Off-heap cannot be sized via configuration options - memory usage depends on your application (200 MB up to 1 GB is common for the JVM)
     ● You need to monitor your app in production and take both spaces into account when sizing memory to achieve cost-efficient and reliable microservices (see the manifest sketch below)
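A sketch of a container spec fragment that applies both takeaways, leaving off-heap headroom between the max heap and the container limit (the image name and values are illustrative assumptions, not from the deck):

     containers:
     - name: app
       image: my-java-service:1.0       # hypothetical image
       env:
       - name: JAVA_TOOL_OPTIONS        # read by the JVM at startup
         value: "-Xmx1g"                # explicit max heap, no ergonomics
       resources:
         requests:
           memory: "1536Mi"             # 1 GiB heap + ~512 MiB off-heap headroom
         limits:
           memory: "1536Mi"             # limit = request gives predictable behavior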
  19. GC tuning can lead to big cost benefits. [Chart: switching from G1 GC (-XX:+UseG1GC) to Parallel GC (-XX:+UseParallelGC) cuts CPU used from 1500 millicores to 600 millicores (-60%), with app response time tracked alongside]
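Trying the same switch is a one-flag change; a sketch that mirrors the deck's earlier docker examples and verifies which collector is active:

     $ docker run eclipse-temurin:11-alpine java -XX:+UseParallelGC -XX:+PrintFlagsFinal -version 2>&1 | grep -w UseParallelGC
     # expect: bool UseParallelGC = true ... {command line}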
  20. JVM default ergonomics in K8s: garbage collector. [Chart: GC chosen by number of CPUs (1-8) and memory (MB); below 2 CPUs or roughly 1792 MB of memory the JVM silently picks Serial GC, otherwise G1 GC]
     Key takeaways:
     ● Default GC selection is based on hard-coded thresholds defined decades ago
     ● You may end up paying the cost of a suboptimal GC, and you may not even know it!
     ● Other good collectors like Parallel GC are not considered
     ● Do not trust JVM ergonomics - always set your JVM options!
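You can check which collector ergonomics picked for a small container (a sketch; with a single CPU the selection is expected to fall back to Serial GC):

     $ docker run --cpus 1 --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal -version 2>&1 | grep -E 'Use(Serial|Parallel|G1)GC'
     # expect UseSerialGC = true when CPUs < 2 or memory < ~1792 MB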
  21. Golang: CPU reduction with GOGC tuning. [Chart: tuning GOGC cuts CPU used from 400 millicores to 180 millicores (-55%)] Node.js has a lot of tuning flags as well (flaviocopes.com/node-runtime-v8-options)
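GOGC sets the Go collector's heap-growth target as a percentage of the live heap (default 100); raising it trades memory for less GC CPU. A sketch of trying a value through the environment (the image name is hypothetical):

     $ docker run -e GOGC=200 my-go-service:1.0
     # GOGC=200 lets the heap grow to ~3x the live set between collections,
     # roughly halving GC frequency at the cost of extra memory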
  22. How to solve this problem? Performance engineering to the rescue!
  23. The industry-standard performance tuning process (analyze system performance, identify tuning parameters, change one parameter, test the system with the new config, repeat) is manual, slow and error-prone, requires deep skills, doesn't scale, and is not continuous. Optimizing cloud-native applications requires a better approach!
  24. Enter AI-driven optimization
  25. Autonomous optimization: key capabilities
  26. Autonomous optimization process
  27. The Akamas platform: optimization studies and live optimizations
  28. Demo: reducing the cost of a Kubernetes microservice while preserving app performance & reliability
  29. Key takeaways
     1. K8s enables unprecedented scalability & efficiency, but it's not automatic
     2. Tuning is your responsibility - if you don't tune, you don't save!
     3. The biggest cost & reliability wins lie in the K8s workload and app runtime layers - don't rely on ergonomics!
     4. AI-powered optimization enables you to automate tuning and achieve savings at scale
  30. Q&A
  31. Contacts: info@akamas.io (email), @AkamasLabs (Twitter), @akamaslabs (LinkedIn)
     Italy HQ: Via Schiaffino 11, Milan, 20158, +39-02-4951-7001
     USA East: 211 Congress Street, Boston, MA 02110, +1-617-936-0212
     USA West: 12130 Millennium Drive, Los Angeles, CA 90094, +1-323-524-0524
     Singapore: 5 Temasek Blvd, Singapore 038985