Kubernetes has become the de facto standard for running cloud-native applications, and many users also turn to it to run stateful applications such as Apache Kafka. You can use different tools to deploy Kafka on Kubernetes: write your own YAML files, use Helm Charts, or go for one of the available operators. But all of these have one thing in common: you still need very good knowledge of Kubernetes to make sure your Kafka cluster works properly in all situations. This talk will cover different Kubernetes features such as resources, affinity, tolerations, pod disruption budgets, topology spread constraints, and more. It will explain why they are important for Apache Kafka and how to use them. If you are interested in running Kafka on Kubernetes and do not know all of these, this is the talk for you.
2. About me
" Principal Software Engineer @ Red Hat
" Maintainer of Strimzi project (https://strimzi.io)
" Apache Kafka contributor
@scholzj
https://github.com/scholzj
https://www.linkedin.com/in/scholzj/
Everything you ever needed to know about Kafka on Kubernetes
3. Kafka on Kubernetes
" Many different ways to run Kafka on Kubernetes
○ Bunch of YAML files
○ Helm Charts
○ Operators
" You should still understand how Kubernetes works
5. Resources
" Configure which resources available to Pods
○ CPU, Memory, Hugepages
" Requests and Limits
○ Requests are guaranteed
○ Limits can be available if enough resources are available
○ When only Limit is configured, it is used automatically as Request
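As a sketch, requests and limits for a broker container might look like this (the values are illustrative, not sizing recommendations):

```yaml
# Fragment of a Pod spec (illustrative values)
containers:
  - name: kafka
    resources:
      requests:
        cpu: "2"        # guaranteed; used by the scheduler for placement
        memory: 8Gi
      limits:
        cpu: "4"        # usage above the request may be throttled
        memory: 8Gi     # exceeding this gets the container OOM-killed
```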
6. Resources
" CPU
○ Pods are not killed for exceeding CPU usage
" Memory
○ Pods are killed when the exceed the memory limit
○ Pods might be killed when they exceed the memory request and node runs OoM
7. Kafka and Memory
" New JVMs can correctly detect available memory to the container
○ It will auto-configure to use the memory limit and not request
○ Be careful, because the limit might not be really available
" Disk page-cache is counted into the memory request / limit
8. Key takeaways
" Always configure container resources
○ More stable and predictable performance, Better scheduling results
" Configure Java memory
○ Control how much memory should be used by JVM and how much by disk cache
○ Configure Java to use only the requested memory
10. Affinity
" Defines relationships between different resources
○ Between different Pods
○ Between Pods and Nodes
" Affinity versus Anti-affinity
" Required versus Preferred
11. Node affinity
" Defines on which worker nodes will your broker pods be scheduled
" Uses node labels to express where the pods should be placed
○ Built-in labels or custom labels
○ Labels might describe node features (node type, network performance, ...)
○ But also the cluster topology including zones / racks in which the node is running
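A minimal node-affinity sketch, using the built-in instance-type label (the instance type value is an assumption):

```yaml
# Illustrative: require brokers to land on a specific instance type
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - m5.2xlarge
```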
14. Pod (anti-)affinity
" Defines which pods should or should not be co-located in the same topology
" Configurable topology to which it applies
○ Worker node
○ Availability zone
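A sketch of anti-affinity at the worker-node level, assuming the broker pods carry an `app: kafka` label:

```yaml
# Illustrative: prefer not to co-locate two brokers on the same node
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: kafka               # assumed broker pod label
          topologyKey: kubernetes.io/hostname
```

Using `requiredDuringSchedulingIgnoredDuringExecution` instead makes the rule hard: a broker stays unschedulable rather than sharing a node.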
19. Topology Spread
" Affinity supports only preferred or required scheduling
○ No guarantees when you have more pods than topologies
○ Problem when spreading pods across racks / availability zones
" Topology Spread Constraints come to the rescue
○ Define how are pods spread across topology
21. Topology Spread
" Maximal skew
○ Defines how unevenly the pods can be spread
" Configures the behaviour when maximal skew is unsatisfiable
○ Do not schedule the pod versus schedule it anyway
" Label selector to define which pod should be included in the topology spread
23. Stability
" Assignment of a pod to a worker node is by default not permanent
○ After the pods are deleted, they might be scheduled to different nodes or zones
○ Use tools such as Cruise Control to regularly check that your topic replicas are
distributed across the racks / zones
24. Storage
" Storage might have its own scheduling limitations
○ AWS EBS volumes are bound to single availability zone
○ Affects how Pods can be scheduled
" Pro-tip: Use allowedTopologies field in Storage Class to schedule volumes
25. Dedicated nodes
" Worker nodes dedicated only for Kafka
○ Will still run Kubernetes components, log / metrics collectors etc.
○ Less competing for resources with other applications
○ Better isolation and more predictable performance
27. Dedicated nodes
" Taint the nodes to prevent other apps to be scheduled there
" In your Kafka Pods
○ Configure tolerations to allow them on the tainted nodes
○ Configure node affinity to make sure they are not scheduled on any other nodes
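A sketch of both halves, assuming a `dedicated=kafka` taint and a matching custom node label:

```yaml
# First taint (and label) the dedicated nodes, e.g.:
#   kubectl taint nodes <node-name> dedicated=kafka:NoSchedule
#   kubectl label nodes <node-name> dedicated=kafka
# Then, in the Kafka Pod spec:
tolerations:
  - key: dedicated
    operator: Equal
    value: kafka
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: dedicated       # assumed custom node label
              operator: In
              values:
                - kafka
```

The toleration alone only *allows* the pods onto the tainted nodes; the node affinity is what *keeps* them off all other nodes.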
32. Key takeaways
" Schedule brokers to the right type of nodes
" Avoid sharing nodes with other I/O intensive workloads or other Kafka brokers
" Spread brokers equally over all zones and use Kafka rack-awareness
" Check distributions of topic replicas over racks regularly
" Consider using dedicated nodes for big clusters
34. Disruptions
" Can impact any environment => Kubernetes is not an exception
" Involuntary disruptions
○ Hardware failures, Network issues, Kernel panics
" Voluntary disruptions
○ Node draining (node repair, upgrades or scaling), bin-packing
36. Disruptions
" PodDisruptionBudgets define how much disruption can your cluster handle
○ Limits maximal number of unavailable pods / Minimum number of available pods
○ Defined as absolute number / percentage
○ Selector selects pods to which the budget applies
" Any voluntary disruptions should check PDBs before disrupting your cluster
39. Key takeaways
" Configure Pod Disruption Budgets
" Set max-unavailability to 1 to minimize the disruptions
" Set max-unavailability to 0 to avoid voluntary disruptions
○ Pods will need to be restarted manually when needed
41. Others
" Local Persistent Volumes
" Pod Priority and Preemption
" Scheduling Framework
" Horizontal Pod Autoscaler