Kubernetes
• Kubernetes (K8S) provides a logical abstraction that lets you treat your
data center as a single machine.
• It handles deploying, provisioning and self-healing of container
groups (aka pods) across your cluster.
Main Concepts
• Pods – the basic building block in K8S: a logical group of
containers.
• Controllers – responsible for driving the actual state toward the
desired state.
• Services – a stable abstraction over a set of pods.
Pods
• A pod is a group of co-located containers.
• They run on the same node and share some Linux namespaces
(e.g., network and IPC), so they can reach each other via localhost.
• A pod's life-cycle consists of the following phases:
• Pending.
• Running.
• Succeeded.
• Failed.
• Unknown.
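To illustrate the shared network namespace, here is a minimal sketch of a two-container pod. The pod name, container names and image tags are hypothetical; the sidecar simply polls nginx over localhost, which works only because both containers share the pod's network namespace.

```yaml
# Hypothetical pod: all names and image tags are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: shared-net-demo
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
  - name: probe-sidecar
    image: busybox:1.36
    # Same network namespace as "web", so nginx answers on localhost:80.
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null && echo ok; sleep 10; done"]
```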
Copyright 2017 Trainologic LTD
Pod Spec
• An example of a spec:
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: web
spec:
  containers:
  - name: front-end
    image: nginx
    ports:
    - containerPort: 80
Specs
• Specs are saved in persistent storage (etcd).
• Controllers are responsible for moving the cluster into the desired
state.
Probes
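• You can define diagnostic probes for a pod's containers.
• Liveness probe – determines whether the container is alive. If it
fails, the kubelet kills the container, which is then subject to the RestartPolicy.
• Readiness probe – determines whether the container is ready
to serve requests. Its initial state is failed, so the pod receives no Service
traffic until the probe succeeds.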
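A minimal sketch of both probe types on the nginx container from the earlier pod spec. The pod name, paths and timing values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-probes          # hypothetical name
spec:
  containers:
  - name: front-end
    image: nginx:1.25
    ports:
    - containerPort: 80
    # Liveness: after 3 consecutive failures the kubelet kills the container;
    # the pod's restartPolicy then decides whether it is restarted.
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3
    # Readiness: until this succeeds, the pod is left out of the endpoints
    # of any Service that selects it.
    readinessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5
```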
RestartPolicy
• You can define a RestartPolicy for every pod.
• It can be set to:
• Always – the default value.
• OnFailure – only if the container exited with a failure (non-zero exit code).
• Never.
• Restarts use an exponential back-off delay (10s, 20s, 40s, …) capped at 5
minutes.
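A minimal sketch of setting the policy; note that restartPolicy is a pod-level field that applies to all of the pod's containers. The pod name and command are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-shot-task         # hypothetical name
spec:
  # Restart containers only when they exit with a non-zero code.
  restartPolicy: OnFailure
  containers:
  - name: task
    image: busybox:1.36
    # Always exits with status 1, so the kubelet keeps restarting it
    # with an increasing back-off delay.
    command: ["sh", "-c", "echo working; exit 1"]
```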
ImagePullPolicy
• When a container is started in a pod, this property controls
whether to check a remote registry for a newer image.
• Possible values: Always, IfNotPresent and Never.
• The default is “IfNotPresent” (except for the “:latest” tag, or when no tag
is given, where the default is “Always”).
• It is strongly advised that you don’t use “latest”.
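A sketch of a container section (a fragment of a pod spec) that follows this advice: the image is pinned to an explicit tag and the pull policy is spelled out. The tag shown is only an example.

```yaml
spec:
  containers:
  - name: front-end
    # A pinned tag instead of ":latest" keeps rollouts reproducible.
    image: nginx:1.25.3
    # Explicit here; IfNotPresent is also the default for a non-latest tag.
    imagePullPolicy: IfNotPresent
```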
Controllers
• The most common one is the ReplicaSet.
• Intended for your immortal (long-running) containers.
• Let’s see a spec example…
Spec Example
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    tier: frontend
spec:
  # number of pod replicas to keep running (defaults to 1 if omitted);
  # modify it according to your case
  replicas: 6
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
Controllers
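• For “it should someday die” containers (run-to-completion tasks), you have
the Job controller.
• Kubernetes also provides:
• DaemonSet – runs a copy of a pod on every (or every selected) node.
• StatefulSet – for pods that need stable identities and storage.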
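A minimal Job sketch for a run-to-completion task, modeled on the classic "compute pi" example. The name, image tag and retry limit are illustrative.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  # Retry failed pods at most 4 times before marking the Job as failed.
  backoffLimit: 4
  template:
    spec:
      # A Job's pods must use OnFailure or Never; Always is not allowed.
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0
        # Prints pi to 2000 digits and exits, which completes the Job.
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```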
Deployments
• Deployments let you manage pods and ReplicaSets.
• In a declarative way!
• They let you view the desired state of your ReplicaSets and pods.
• The name of the ReplicaSet is
deploymentName-podTemplateHash.
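A minimal Deployment sketch (names and image tag are hypothetical). Applying it makes the Deployment create a ReplicaSet named nginx-&lt;pod-template-hash&gt;; changing the template later, for example the image tag, triggers a rollout to a new ReplicaSet.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  # Must match the labels of the pod template below.
  selector:
    matchLabels:
      app: web
  # The pod template; its hash becomes part of the ReplicaSet's name.
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: front-end
        image: nginx:1.25
        ports:
        - containerPort: 80
```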
Example
• Let’s consider deploying a new version.
• How many ReplicaSets do we need?
• How about the scaling?
Deployments
• By default (apps/v1), a deployment makes sure that at most 25% of the
desired pods are unavailable during an update (maxUnavailable).
• It also ensures (by default) that at most 25% more pods than the desired
amount are created (maxSurge).
• The older extensions/v1beta1 API defaulted both values to 1.
• When you update a deployment, a new ReplicaSet is created
to scale up the new pods, while the old ReplicaSet (whose pods also match
the deployment's selector) is scaled down.
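A sketch of a Deployment spec fragment that sets the rolling-update limits explicitly instead of relying on defaults. The numbers are illustrative.

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # At most 1 of the 4 desired pods may be unavailable during the update.
      maxUnavailable: 1
      # At most 1 extra pod (5 in total) may exist during the update.
      maxSurge: 1
```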
Rollback
• A rollback affects only the pod template.
• You can list the deployment’s revisions with:
kubectl rollout history deployment/NAME
• Add --revision=N to see a revision’s details.
• Rollback is done with:
kubectl rollout undo deployment/NAME --to-revision=N
Pause & Resume
• You can pause a deployment with:
kubectl rollout pause deployment/NAME
• Resume it with:
kubectl rollout resume deployment/NAME
• You can use it for canary deployments.
Services
• A Service provides an abstraction over a logical set of pods.
• Somewhat analogous to a micro-service.
• Usually targets the pods chosen by a label selector.
• Maps its port to the pods’ targetPort, which may even be a string
(the name of a port defined in the pod spec).
• Allows for great flexibility.
• Supports both UDP and TCP.
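A minimal sketch of a Service using a named targetPort, together with a matching pod. All names are hypothetical; because the Service refers to the port by name, each pod could expose "http" on a different container port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  # Traffic is sent to pods carrying this label.
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80            # port on the Service's cluster IP
    targetPort: http    # resolved per pod to the containerPort named "http"
---
apiVersion: v1
kind: Pod
metadata:
  name: web-1
  labels:
    app: web
spec:
  containers:
  - name: front-end
    image: nginx:1.25
    ports:
    - name: http
      containerPort: 80
```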
Virtual IPs
• In the “iptables” proxy mode, if the selected pod doesn’t respond, the
connection fails; the client is not automatically retried against another
pod (unlike in “userspace” mode).
• It therefore relies on well-defined readiness probes.
• If you want, you can specify a clusterIP address for your service
(it must be inside your service-cluster-ip-range).
DNS
• A cluster add-on that creates DNS records for each service.
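• Doesn’t suffer from the ordering problem of the service environment
variables (a service must exist before a pod starts for its variables to be
injected into that pod).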
• You can define headless services (specify ‘None’ in clusterIP)
and you’ll get only DNS (discovery) support (without proxy and
load-balancing).
• Headless services are currently a requirement for StatefulSet.
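A minimal headless Service sketch (names and port are hypothetical). Assuming the default "cluster.local" cluster domain, db.&lt;namespace&gt;.svc.cluster.local resolves directly to the IPs of the ready pods; when a StatefulSet names this service in its serviceName, each pod also gets its own record, &lt;pod-name&gt;.db.&lt;namespace&gt;.svc.cluster.local.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  # Headless: no virtual IP, no kube-proxy load-balancing, DNS records only.
  clusterIP: None
  selector:
    app: db
  ports:
  - port: 5432
```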
Service Types
• The following ServiceType values are supported:
• ClusterIP – cluster-only internal IP (the default).
• ExternalName – maps to an external host (via a DNS CNAME record) without proxying.
• NodePort – in addition to internal cluster IP, expose the service on each
node in the cluster (same port for all nodes).
• LoadBalancer – in addition to NodePort, asks the cloud provider for a
load-balancing service.
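A minimal NodePort sketch (names are hypothetical). The explicit nodePort must fall inside the cluster's node-port range (30000-32767 by default); if it is omitted, one is allocated automatically.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80          # cluster-internal port (the ClusterIP still exists)
    targetPort: 80    # container port on the selected pods
    nodePort: 30080   # opened on every node, same value on all nodes
```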
Components Overview
• etcd – The cluster’s data backend store. A highly reliable, distributed
key-value store. Written in Go and uses the Raft protocol for
consensus.
• kubelet – The node agent. Runs the pods on its node and monitors their
health.
• kube-apiserver – The main API for K8s. Responsible for
validation and configuration of the cluster state.
• kube-controller-manager – Runs the controllers, which are
responsible for moving the cluster state toward the desired state.
Components Overview
• kube-proxy – Runs on every node. Manages basic load-balancing
and TCP/UDP forwarding.
• kube-scheduler – Assigns pods to nodes according to their resource
requirements and constraints (workload distribution).
Kubelet
• Can actually be used without other K8s components.
• Creates the containers according to the pod specs.
• Can watch a directory for manifests (static pods).
• Can also receive requests through an internal HTTP server.
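A sketch of a static pod manifest. The kubelet watches the directory given by its --pod-manifest-path flag (or staticPodPath in its config file); /etc/kubernetes/manifests below is an assumption (it is what kubeadm uses). The kubelet runs the pod without any API server; if one exists, a read-only mirror pod shows up there.

```yaml
# Saved as, e.g., /etc/kubernetes/manifests/static-web.yaml
# (the directory is whatever the kubelet is configured to watch).
apiVersion: v1
kind: Pod
metadata:
  name: static-web            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
```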
apiserver
• A simple REST server.
• Performs validation.
• Persists the changes to K8s objects in etcd.
kube-proxy
• Performs load-balancing on the node.
• Provides a virtual IP to which clients can send requests
transparently.
• Responsible for updating the iptables for the Services.
• Service names are resolved through the cluster DNS.
Volumes
• A volume’s lifetime is tied to its Pod.
• Not to its container/s (it survives container restarts).
• In the pod spec you declare the volumes, and for each container which
volumes to mount and where.
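A minimal sketch with one pod-level emptyDir volume mounted by two containers at different paths. The names, paths and commands are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume
spec:
  # Declared once at pod level; lives as long as the pod does.
  volumes:
  - name: scratch
    emptyDir: {}
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /data/log.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "touch /input/log.txt; tail -f /input/log.txt"]
    # Same volume, mounted at a different path in this container.
    volumeMounts:
    - name: scratch
      mountPath: /input
```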
Persistent Volumes
• Persistent Volumes (PV) are not tied to the lifecycle of a pod.
• They are cluster resources (like nodes).
• Persistent volumes are consumed through persistent
volume claims (PVC).
• A PV abstracts the underlying specifics of the storage.
• The user needs to deal only with PVCs.
• A PVC is bound one-to-one to a PV.
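A minimal PersistentVolume sketch. The name, size, class and path are hypothetical, and hostPath is suitable only for single-node experiments; real clusters use network or cloud storage.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  # What happens to the volume once its claim is deleted.
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /mnt/data
```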
PVC
• Before a pod can use a PV, the user must create a PVC.
• A namespaced resource in K8S.
• A control loop in the control plane then binds the PVC to a matching PV.
• The pod can now use the claim as a volume.
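A sketch of the claim-then-use flow, assuming a PV with storageClassName "manual" and at least 1Gi of ReadWriteOnce capacity already exists (hypothetical). Note that the pod refers only to the claim, never to the PV.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
  namespace: default
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: db
  namespace: default          # must be the claim's namespace
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
  containers:
  - name: db
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
```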
Lifecycle
• When the user deletes a PVC, the PV is treated according to
its reclaim policy:
• Retain – reclamation is manual.
• Delete – the volume is deleted.
• Recycle – a basic scrub, ‘rm -rf …’ (deprecated).
ConfigMap
• Lets you decouple a container from where its configuration
comes from.
• First you need to define a configMap:
kubectl create configmap NAME SOURCE
• The source is either a file or a literal.
ConfigMap
• You can specify literal values with --from-literal=key=value.
• You can specify a file (or a directory) with --from-file=path.
• When using the file version, the key defaults to the file name
and the value is the file’s contents.
• You can inspect the configMap with kubectl get configmap NAME -o yaml.
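As a sketch, a command such as `kubectl create configmap app-config --from-literal=LOG_LEVEL=debug --from-file=app.properties` would produce roughly the following object. The ConfigMap name and the contents of app.properties are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # From --from-literal=LOG_LEVEL=debug
  LOG_LEVEL: debug
  # From --from-file=app.properties: key is the file name, value is its contents
  app.properties: |
    greeting=hello
    timeout=30
```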
Using ConfigMap
• You can use a configMap value as the value of a container’s env variable.
• You can also expose all values of a configMap as env variables of a container.
• You can also use configMap values in a container’s command (through env
variables).
• You can also use a volume of type configMap: each key becomes a file
whose contents are the value.
• Note that when a configMap is updated, files mounted from it are eventually
updated in running pods; env variables are not (they are read only when the
container starts).
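A sketch showing the four consumption styles in one pod, assuming a ConfigMap named app-config with a LOG_LEVEL key exists (hypothetical). K8s expands $(APP_LOG_LEVEL) in the command before the shell runs.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    # An env variable taken from the ConfigMap, used in the command.
    command: ["sh", "-c", "echo level=$(APP_LOG_LEVEL); ls /etc/config; sleep 3600"]
    env:
    # A single key as an env variable.
    - name: APP_LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    # Every key becomes an env variable; keys that are not valid variable
    # names (e.g. containing dots) may be skipped.
    envFrom:
    - configMapRef:
        name: app-config
    volumeMounts:
    - name: config
      mountPath: /etc/config
  volumes:
  # One file per key under /etc/config, refreshed when the ConfigMap changes.
  - name: config
    configMap:
      name: app-config
```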
Secrets
• Secrets hold sensitive data such as keys, passwords and tokens.
• Just like with configMaps, you can create secrets based on
files.
• Like with configMaps, you can use secrets as environment
variables and mount them as files.
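A minimal sketch of a Secret consumed both as an env variable and as mounted files. The names and values are hypothetical. stringData accepts plain text; when read back, the values appear base64-encoded under data, which is an encoding, not encryption.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  username: admin
  password: s3cr3t
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "cat /etc/creds/username; sleep 3600"]
    env:
    # One key as an env variable.
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
    # All keys as files under /etc/creds.
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: db-credentials
```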
Zones
• Kubernetes supports multiple availability zones.
• However, only in a single region.
• A cluster can’t span across regions.
• Kubernetes automatically attaches zone labels to nodes and PVs.
• Note that PVs can’t be attached in a different zone from the one
they were created in.
• K8S takes care of that (pods using a PV are scheduled into the PV’s zone).
• You need to specify MULTIZONE=true when starting the cluster.
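A sketch of pinning a pod to one zone using the zone label K8s attaches to nodes. Current clusters use topology.kubernetes.io/zone; older ones used failure-domain.beta.kubernetes.io/zone. The zone value is a hypothetical example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zonal-pod
spec:
  # Only nodes carrying this zone label are eligible.
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  containers:
  - name: web
    image: nginx:1.25
```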
Ingress
• Services, by default, are accessible only from inside the cluster.
• Also, they work at the TCP/UDP level.
• Ingress is a set of rules directing incoming traffic to service
endpoints.
• It works at the HTTP level.
Ingress Types
• You can map HTTP URLs to services (fanout).
• You can also include the “Host” header in your rules (for virtual
hosts).
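A sketch combining both rule types: a virtual host plus a URL fanout to two services. Host and service names are hypothetical, an ingress controller must be running in the cluster for any of this to take effect, and the schema is networking.k8s.io/v1 (older clusters used extensions/v1beta1 with serviceName/servicePort).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  # Virtual host: only requests whose Host header is shop.example.com match.
  - host: shop.example.com
    http:
      paths:
      # Fanout: URL prefixes routed to different services.
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```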
Helm
• Helm streamlines packaging, installing and managing K8S applications.
• Packages in Helm are called charts.
• You can use available charts for popular software.
• You can use charts to template your K8S specifications.
• Helm (version 2) is composed of a server part (Tiller) that runs inside the
K8S cluster and a client (helm). Helm 3 removed Tiller.
Installation
• Download and install the helm client.
• Invoke “helm init” to install the tiller in your K8S cluster.
• Execute “helm repo update” to fetch the latest chart
versions.
• You can check the repositories with “helm repo list”.
• Install a chart with “helm install repo/chart”.
• You can see the installed releases with “helm ls”.
Charts
• Charts have a very specific directory structure.
• The name of the root directory is the basic name of the chart
(without the version part).
• In the root directory there must be a Chart.yaml, the base
descriptor of the chart.
• Helm also looks for the “templates” and “charts” sub-folders.
Chart.yaml
apiVersion: The chart API version, always "v1" (required)
name: The name of the chart (required)
version: A SemVer 2 version (required)
kubeVersion: A SemVer range of compatible Kubernetes versions (optional)
description: A single-sentence description of this project (optional)
keywords:
  - A list of keywords about this project (optional)
home: The URL of this project's home page (optional)
sources:
  - A list of URLs to source code for this project (optional)
maintainers: # (optional)
  - name: The maintainer's name (required for each maintainer)
    email: The maintainer's email (optional for each maintainer)
    url: A URL for the maintainer (optional for each maintainer)
engine: gotpl # The name of the template engine (optional, defaults to gotpl)
icon: A URL to an SVG or PNG image to be used as an icon (optional).
appVersion: The version of the app that this contains (optional).
deprecated: Whether this chart is deprecated (optional, boolean)
tillerVersion: The version of Tiller that this chart requires.
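A filled-in Chart.yaml sketch for a hypothetical chart; every value below is illustrative. The root directory of this chart would be named mywebapp.

```yaml
apiVersion: v1                  # chart API version used by Helm 2
name: mywebapp
version: 0.1.0                  # the chart's own SemVer version
appVersion: "1.25"              # version of the packaged application
description: A hypothetical chart deploying an nginx-based web front end
keywords:
  - web
  - nginx
home: https://example.com/mywebapp
maintainers:
  - name: Jane Doe
    email: jane@example.com
```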
Dependencies
• A chart can have a requirements.yaml file (in its root directory)
specifying charts that the current one depends on.
• Execute “helm dependency update” to download the
dependencies’ archives into the “charts” directory.
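A requirements.yaml sketch for Helm 2 (in Helm 3 the same list moved into Chart.yaml). The dependency name, version range and repository URL are hypothetical.

```yaml
# requirements.yaml in the chart's root directory
dependencies:
  - name: mysql
    # SemVer range of acceptable versions of the dependency chart.
    version: "~1.6.0"
    # Repository that "helm dependency update" downloads the archive from.
    repository: https://charts.example.com/
```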
Templates
• Chart templates are written in Go template language.
• They are stored under the ”templates” directory.
• Every file in this directory is passed through the template engine
at rendering time.
• You can specify values for the templates with:
• A default values file: values.yaml at the root directory.
• A YAML file with values passed to “helm install” (-f/--values).
• Take a look at what is created with “helm create”.
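A sketch of a values file and a template that consumes it; the file names follow the conventions above, while the keys and values are hypothetical. A file passed with -f overrides the defaults in values.yaml.

```yaml
# values.yaml (defaults)
replicaCount: 2
image:
  repository: nginx
  tag: "1.25"
---
# templates/deployment.yaml (Go template, rendered with the values above)
apiVersion: apps/v1
kind: Deployment
metadata:
  # .Release.Name is the release name chosen at install time.
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
      - name: web
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```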
Extensibility
• Kubernetes is highly extensible. There are many extension
points that you can use, depending on the use-case.
• Let’s start with terminology.
Extension Patterns
• If your extension is a client of K8s, then your extension is called
a controller.
• When K8S is the client, we have two flavors:
• Remote service accepting a network request: Webhook Backend.
• A binary executed by K8S: Binary Plugin.
api-server
• At the heart of K8S sits the api-server.
• Provides REST endpoints for the cluster-state.
• All communications between K8S components go through the
api-server.
• Can be extended, of course…
api-server flow
• An incoming request goes through 3 stages:
• Authentication.
• Authorization.
• Admission Control.
Authentication
• At the authentication phase, we differentiate between 2 types of
users:
• Normal user accounts.
• Service accounts.
• K8s doesn’t manage normal user accounts and doesn’t have
an object representation for them.
• It does so for service accounts.
Authentication
• Each request is associated with either:
• A normal user.
• A service account.
• Or it is treated as an anonymous request.
• Built-in authentication can be either:
• HTTP basic authentication.
• Client certificate (supports user groups as of version > 1.4).
• Bearer-token.
• Authentication Proxy.
Webhook
• You can configure a Webhook to handle bearer-tokens.
• The configuration file is passed through the
--authentication-token-webhook-config-file flag.
• The webhook will receive a POST request with a TokenReview object holding
the token, and should fill in its status field with the authentication result.
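A sketch of the exchange, shown as YAML for readability (on the wire it is JSON). The token and user details are hypothetical; older clusters use authentication.k8s.io/v1beta1 instead of v1.

```yaml
# Request body sent by the api-server to the webhook.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "abc123-hypothetical-token"
---
# Response body returned by the webhook.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
status:
  authenticated: true
  user:
    username: jane@example.com
    uid: "42"
    groups:
    - developers
```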
Authentication Proxy
• If there is an authentication proxy in your organization, you can
configure K8s to trust the specific headers set by the
authentication proxy.
• For example:
--requestheader-username-headers=X-Original-User
--requestheader-group-headers=X-Original-Group
--requestheader-extra-headers-prefix=X-Original-Attribute-
Authorization
• There are several authorization modules that are shipped with
K8s.
• Configured through the --authorization-mode flag.
• When multiple modules are specified, they are invoked
serially.
• As soon as a module approves or denies the request, no further module
is consulted.
• If no module has an opinion, the request is rejected.
Request Attributes
• A request can either be a resource-API request, or a non-
resource request.
• For non-resource requests, authorization concerns the HTTP
verb and Request-path fields.
• For resource requests, authorization concerns the API, API
request verb, resource, namespace and API group fields.
• Common fields to both are: user, group and extra
(authentication provided attributes).
ABAC
• Stands for: Attribute-Based Access Control.
• The policy file should be specified with the flag:
--authorization-policy-file.
• The policy file holds one JSON object per line.
• Changing the policy file requires a restart of the api-server.
• Note that an unspecified property defaults to its zero value.
• Let’s see a policy example…
RBAC
• Stands for: Role-Based Access Control.
• When enabled, defines 4 object types: Role, ClusterRole,
RoleBinding and ClusterRoleBinding.
• Users can interact with these types just like any other K8s types
(e.g., pods).
• Let’s review them…
Roles & ClusterRoles
• Role objects specify permissions to a single namespace.
• ClusterRole objects specify permissions cluster-wide (across all
namespaces).
• Example:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: app1
  name: default-deployment
rules:
- apiGroups: ["apps"] # "" indicates the core API group
  resources: ["deployments"]
  verbs: ["get", "watch", "list", "create"]
RoleBinding
• Binds a role to user/s.
• As before, RoleBinding refers to a single namespace, whereas
ClusterRoleBinding refers to all namespaces.
• Think: what does it mean to bind a ClusterRole using a
RoleBinding?
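A sketch of a RoleBinding granting the default-deployment Role from the example above to a user and a service account. The subject names are hypothetical; roleRef cannot be changed after the binding is created.

```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: deployers
  # The granted permissions apply only inside this namespace.
  namespace: app1
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: ci-bot
  namespace: app1
roleRef:
  kind: Role
  name: default-deployment
  apiGroup: rbac.authorization.k8s.io
```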
Subjects
• Note that subjects in a Binding can be one of: users, groups or
service accounts.
• Groups are provided by the authentication methods.
• For both users and groups, the “system:” prefix is reserved for K8s
system use and should not be used otherwise.
Authorization – Webhook
• As mentioned before, a webhook is a REST extension to which
K8s sends requests.
• In this case, authorization requests.
• Adding a webhook requires:
• A configuration file for the webhook.
• A service responding to SubjectAccessReview POST requests.
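A sketch of the exchange, shown as YAML for readability (on the wire it is JSON). The webhook is enabled by adding Webhook to --authorization-mode and pointing --authorization-webhook-config-file at its configuration; the user and resource values below are hypothetical.

```yaml
# Request body sent by the api-server (a resource request).
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: jane
  groups:
  - developers
  resourceAttributes:
    namespace: app1
    verb: create
    group: apps
    resource: deployments
---
# Response body returned by the webhook.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
status:
  allowed: true
```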
Admission Control
• Admission control components execute after the request has
been authenticated and authorized.
• They are built-in components that are compiled into the api-server.
• They are specified as a flag to the api-server (--admission-control).
• Order is important.
Admission Control
• Each admission-control component can operate in either (or
both) of two phases: mutation and validation.
• Mutating components can modify the object the request
operates on.
• For example: AlwaysPullImages (very useful in multi-tenancy
scenarios).
Extension Points
• MutatingAdmissionWebhook – runs in the mutation phase.
Invokes webhooks defined in MutatingWebhookConfiguration objects.
Matching webhooks are called serially.
• ValidatingAdmissionWebhook – runs in the validation phase.
Can’t mutate the object. Matching webhooks are called in
parallel. Configuration is based on
ValidatingWebhookConfiguration objects.
• ImagePolicyWebhook – Allows for reviewing container images
that are requested to be used.
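A sketch of a ValidatingWebhookConfiguration (admissionregistration.k8s.io/v1) sending pod creations and updates to an in-cluster service. The webhook name, service, namespace and path are hypothetical, and the caBundle value is a placeholder.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy
webhooks:
- name: pod-policy.example.com     # must be a fully qualified name
  # Which requests are sent to this webhook.
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  # Where the webhook listens.
  clientConfig:
    service:
      namespace: policy
      name: pod-policy-webhook
      path: /validate
    caBundle: "<base64-encoded CA certificate>"
  admissionReviewVersions: ["v1"]
  sideEffects: None
  # Reject matching requests if the webhook cannot be reached.
  failurePolicy: Fail
```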
ImagePolicyWebhook
• Requires a configuration file specified with the flag:
--admission-control-config-file.
• JSON or YAML format. Example:
imagePolicy:
  kubeConfigFile: /Users/shimi/k8s/reviewer.yml
  # time in s to cache approval
  allowTTL: 50
  # time in s to cache denial
  denyTTL: 50
  # time in ms to wait between retries
  retryBackoff: 500
  # determines behavior if the webhook backend fails
  defaultAllow: false
ImagePolicyWebhook
• The kubeConfigFile referenced by the admission-control configuration holds
the webhook configuration (in kubeconfig format):
clusters:
- name: image-review-server
  cluster:
    server: https://host1:9090/reviewer
# users refers to the API server's webhook configuration.
users:
- name: kube-apiserver
  user:
    token: blue-token
current-context: webhook
contexts:
- context:
    cluster: image-review-server
    user: kube-apiserver
  name: webhook
ImageReview
• Your webhook will receive a POST request with an
ImageReview JSON document.
• It must fill in the status field with an allowed subfield set to either
true or false.
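A sketch of the exchange, shown as YAML for readability (on the wire it is JSON). The image names, namespace and reason are hypothetical.

```yaml
# Request body sent by the api-server.
apiVersion: imagepolicy.k8s.io/v1alpha1
kind: ImageReview
spec:
  containers:
  - image: registry.example.com/team/app:1.4.2
  namespace: app1
---
# Response body returned by the webhook: deny, with a reason.
apiVersion: imagepolicy.k8s.io/v1alpha1
kind: ImageReview
status:
  allowed: false
  reason: images must come from the approved registry
```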
Objects & Resources
• Objects are persisted entities in K8S.
• Most objects are composed of a spec and a status.
• A spec is a “record of intent” (e.g., a pod spec).
• A resource is a K8S endpoint that represents a collection of
objects.
Custom Resources
• Custom resources allow you to incorporate third-party
resources into K8S management.
• The easiest (albeit least flexible) way to define a
custom resource is a CustomResourceDefinition (CRD).
• Let’s see an example…
CRD
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: databases.trainologic.com
spec:
  group: trainologic.com
  version: v1
  # can be either Namespaced or Cluster
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: DataBase
    shortNames:
    - db
    - dbs
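The apiextensions.k8s.io/v1beta1 API was removed in Kubernetes 1.22. A sketch of the same definition in apiextensions.k8s.io/v1, which replaces the single version field with a versions list and requires a structural schema per version. The engine and sizeGi fields are hypothetical.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.trainologic.com
spec:
  group: trainologic.com
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: DataBase
    shortNames:
    - db
    - dbs
  versions:
  - name: v1
    served: true      # exposed through the REST API
    storage: true     # exactly one version is persisted in etcd
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              sizeGi:
                type: integer
```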
CRD
• Now, kubectl “understands” databases.
• E.g.: kubectl get dbs
• And we can create resources of this custom type.
• However, this only allows us to manage simple CRUD
operations.
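A sketch of an object of the custom type; the name and spec fields are hypothetical. After `kubectl apply -f`, it shows up in `kubectl get dbs`, but as noted above nothing acts on it until some controller watches DataBase objects.

```yaml
apiVersion: trainologic.com/v1
kind: DataBase
metadata:
  name: orders-db
spec:
  engine: postgres
  sizeGi: 10
```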