This presentation is to help you understand https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/ without having to read all the concepts in a number of Kubernetes documents.
2. Why we use Kubernetes?
Container-based virtualization + Container orchestration
Satisfying common needs in production
co-locating helper processes
mounting storage systems
distributing secrets
application health checking
replicating application instances
horizontal auto-scaling
naming and discovery
load balancing
rolling updates
resource monitoring
log access and ingestion
...
from a web page from the official site : https://kubernetes.io/docs/whatisk8s/
3. Pod – the basic unit of Kubernetes
• Components
• a group of containers
• docker, rkt (pronounced “rock-it”) from CoreOS, etc
• a group of shared storage called volumes
• ephemeral volume
• persistent volume
• host local directories
• nfs
• iscsi
• flocker
• Google Compute Engine (GCE) Persistent Disk
• Amazon Web Services (AWS) Elastic Block Store (EBS)
• Purpose
• model an application-specific logical host/VM
• Characteristics
• containers in a pod share IP addresses/ports
• containers in a pod can communicate via IPC
Pod
Container
(port : 1234)
Volume
(ephemeral)
Container
(port : 3456)
Container
(port : 5678)
Volume
(persistent)
Containers claim their volumes
ipc
Address : 10.244.1.10localhost:3456
4. Few things to consider when running Zookeeper with Kubernetes
• How to launch Zookeeper servers using a pod?
• How to give IDs to pods?
• What is the domain name of each pod?
• How to make sure a certain # of pods running during maintenance?
Pod
Zookeeper server (leader)
- myid : 1
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Zookeeper server
- myid : 2
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Zookeeper server
- myid : 3
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Kafka server
- broker.id : 1
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Kafka server
- broker.id : 2
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Kafka server
- broker.id : 3
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Zookeeper
servers
(zk)
Kafka
servers
(kk)
Pod Pod
Pod Pod Pod
zk-1 zk-2 zk-3
kk-1 kk-1 kk-1
a majority quorum must be present
5. StatefulSet – a way of launching ordered replicas of a container
zk-0
Containers
Volumes
zk-1
Containers
Volumes
zk-2
Containers
Volumes
The StatefulSet creates 3 pods with ordinals suffixed to pod names,
and guarantees the followings:
pod-0
Containers
Volumes
pod-1
Containers
Volumes
pod-2
Containers
Volumes
pods are created sequentially
pod-0
Containers
Volumes
pod-1
Containers
Volumes
pod-2
Containers
Volumes
pods are deleted in reverse order
pod-0
Containers
Volumes
pod-1
Containers
Volumes
pod-2
Containers
Volumes pod-3
Containers
Volumes
Before a scaling op is applied
all its predecessors must be running
pod-0
Containers
Volumes
pod-1
Containers
Volumes
pod-2
Containers
Volumes
Before a pod is terminated,
all of its successors are shutdown
Each pod is created and scheduled
using this template
Each pod lays its claim to storage
using this template
Create 3 replicas of servers
using the following templates
6. Service (10.111.67.108)
Service – to represent a group of pods with a cluster IP
server-0
Containers
Volumes
server-1
Containers
Volumes
server-2
Containers
Volumes
Q) How to achieve the followings?
• Users must be unaware of the replicas
• Traffic is distributed over the replicas
server-0
Containers
Volumes
server-1
Containers
Volumes
server-2
Containers
Volumes
Let’s say that we have 3 replicas of a pod for load balancing
A) Define a service with a cluster IP.
Then Kubernetes does round-robin forwarding
7. Headless service – service without a common IP
• Zookeeper clients (e.g. Kafka) need to specify the address of each Zookeeper server
• Kubernetes depends on its DNS service for headless services
• Each pod is assigned a domain name from Kubernetes
• Each pod is directly accessed with its domain name (not through a cluster IP)
• Fully Qualified Domain Name (FQDN) format
• $pod.$service.$namespace.svc.cluster.local
Pod
Zookeeper server
- myid : 1
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Zookeeper server
- myid : 2
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Zookeeper server
- myid : 3
- server.1
- zk-1:2888:3888
- server.2
- zk-2:2888:3888
- server.3
- zk-3:2888:3888
Kafka server
- broker.id : 1
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Kafka server
- broker.id : 2
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Kafka server
- broker.id : 3
- zookeeper.connect
- zk-1.zk:2181
- zk-2.zk:2181
- zk-3.zk:2181
Zookeeper
servers
(zk)
Kafka
servers
(kk)
Pod Pod
Pod Pod Pod
zk-1 zk-2 zk-3
kk-1 kk-1 kk-1
8. Namespace in Kubernetes
zk-0
Containers
Volumes
zk-1
Containers
Volumes
zk-2
Containers
Volumes
Three pods are defined within zk-headless service,
and they are given DNS entries of the following format:
pod.service.namespace.svc.cluster.local
zk-headless service
zk-1:2181 (within service)
zk-1.zk-headless:2181 (within same namespace)
default namespace
kafka service
kk-0
Containers
Volumes
kk-1
Containers
Volumes
kk-2
Containers
Volumes
kk-3
Containers
Volumes
zk-1.zk-headless.default.svc.cluster.local:2181 (from other namespace)
alien namespace
The default namespace is used
as there’s no namespace declaration
9. Pod anti-affinity
This pod should not run in X in which one or more pods that satisfy Y are
running.
- X belongs to topology domain
- node (topologyKey:kubernetes.io/hostname in this example)
- rack
- cloud provider zone
- cloud provider region
- Y is a label selector
- it selects all pods belonging to a service named zk-headless
⇓ debugging hook (a pod pauses until it is set to true)
kube-scheduler is about to schedule pod2 labeled app=zk-headless,
but wants to avoid node3 because there’s pod1 labeled app=zk-headless.
Kubernetes provides pod anti-affinity for this case.
node1 node2 node3
pod1
Containers
Volumes
pod2
Containers
Volumes
app=
zk-headless
kube-
scheduler
app=
zk-headless
10. Files in the container image
• Dockerfile
1. Download the latest Zookeeper tarball
2. Extract and place the content under /opt/zookeeper
3. ln -s /opt/zookeeper/* /usr/bin
• zkGenConfig.sh
1. create zoo.cfg
2. configure log-related properties
3. create data directories
4. set myid extracted from domain name
• ex) zk-0.zk-headless.default.svc.cluster.local 0+1 = 1
• zkOk.sh
• check readiness and liveness of a pod
⇓ it’s from Zookeeper
11. Environmental variables for container processes in a pod
env defines environmental variables
to be used in container processes.
Two ways to assign values
1. value = constant val
2. valueFrom = val from ConfigMap
12. Readiness & liveness check for containers
Kubernetes provides a means of checking
readiness & liveness
13. Kubernetes
How to guarantee a certain # of running pods during maintenance
• Users can define PodDisruptionBudget with minAvailable
• At least two pods from zk must be available at any time
• Below is an example illustrating PodDisruptionBudget
• together with StatefulSet and PodAntiAffinity
node1
zk-0
Containers
Volumes
node2
zk-2
Containers
Volumes
node3
zk-3
Containers
Volumes
Drain node1
Operation is permitted
because allowed-disruptions=1
Kubernetes
Drain node2
3 replicas have to be running
due to StatefulSet,
so try scheduling zk-0
on other nodes!
Oops!
cannot schedule zk-0
on node2 and node3
due to PodAntiAffinity!
Operation not permitted
because allowed-disruptions=0
(Note that minAvailable=2)
Please wait until
node1 is up and zk-0 is rescheduled!
node1
zk-0
Containers
Volumes
node2
zk-2
Containers
Volumes
node3
zk-3
Containers
Volumes
14. Scaling issue with Zookeeper
• Dynamically changing the membership of a replicated distributed system, while
preserving data consistency and system availability, is challenging
• from “Dynamic Reconfiguration of Primary/Backup Clusters” in USENIX ATC 2012
• Prior to Zookeeper 3.5.0 (We use 3.4.9 which is the latest stable version at this point)
• Configuration parameters are loaded during boot
• Configuration parameters are immutable at runtime
• Operators have to carefully restart all daemons
• Starting with Zookeeper 3.5.0,
• Full support for automated configuration changes
• without service interruption while preserving data consistency
• Set of zookeeper servers, roles of servers, all ports, and even quorum systems
* https://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html
15. Scaling up/down a StatefulSet
StatefulSet itself has means to scaling up/down
• kubectl scale statefulset $statefulSetInstanceName --replicas=5
• kubectl patch statefulset $statefulSetInstanceName -p '{"spec":{"replicas":3}}’
16. Topics not covered here
• Detailed architecture of Kubernetes
• https://github.com/kubernetes/community/blob/master/contributors/design-
proposals/architecture.md
• ReplicaSet and Deployment (other than StatefulSet)
• https://kubernetes.io/docs/user-guide/replicasets/
• https://kubernetes.io/docs/user-guide/deployments/
• Persistent Volume and Persistent Volume Claim
• https://kubernetes.io/docs/user-guide/volumes/
• Kubernetes network (Proxy, DNS, etc)
• https://kubernetes.io/docs/admin/networking/
• https://kubernetes.io/docs/admin/dns/