Cloud Native Night November 2018, Munich: Talk by Dirk Marwinski (SAP).
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: There are many open source tools which help in creating and updating single Kubernetes clusters. Corporations usually require many clusters; depending on their size, they may need hundreds or even thousands. However, the more clusters you need, the harder it becomes to operate, monitor, manage, and keep all of them alive and up-to-date.
That is exactly what the open source project “Gardener” focuses on. It is not just another provisioning tool; rather, it is designed to manage Kubernetes clusters as a service. It provides Kubernetes-conformant clusters on various cloud providers and the ability to maintain hundreds or thousands of them at scale. At SAP, we face this heterogeneous multi-cloud & on-premise challenge not only in our own platform; we also encounter the same demand at all our larger and smaller customers implementing Kubernetes & Cloud Native.
Inspired by the possibilities of Kubernetes and the ability to self-host, the foundation of Gardener is Kubernetes itself. While self-hosting, i.e. running Kubernetes components inside Kubernetes, is a popular topic in the community, we apply a special pattern catering to the needs of operating a huge number of clusters with minimal total cost of ownership.
In this session Dirk will provide a comprehensive overview of Gardener and the underlying concepts, and talk about interesting implementation details. In addition there will be a hands-on session where attendees will be given free access to a Gardener instance and the opportunity to dynamically create Kubernetes clusters and test them.
2. From Containers to Kubernetes
[Diagram: containers vs. VMs – containers share the host OS via a container runtime, while each VM brings its own OS]
Benefits
• Isolation
• Immutable infrastructure
• Portability
• Faster deployments
• Versioning
• Ease of sharing
Challenges
• Networking
• Deployments
• Service discovery
• Auto-scaling
• Persisting data
• Logging, monitoring
• Access control
Kubernetes (container scheduler, e.g. for Docker containers)
Orchestration of a cluster of containers across multiple hosts
• Automatic placement, networking, deployments, scaling, roll-out/-back, A/B testing
Workload portability
• Abstract from cloud provider specifics
• Multiple container runtimes
Declarative – not procedural
• Declare target state, reconcile to desired state
• Self-healing
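The declarative principle above ("declare target state, reconcile to desired state") can be sketched in a few lines. This is an illustrative toy loop, not actual Kubernetes controller code: compare the declared state with what is observed and act only on the difference.

```python
# Minimal sketch of declarative reconciliation (illustrative, not real
# controller code): diff desired vs. observed state and emit actions.
def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions needed to move `observed` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name, spec in observed.items():
        if name not in desired:
            actions.append(("delete", name, spec))
    return actions

desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
observed = {"web": {"replicas": 2}, "old-job": {"replicas": 1}}
print(reconcile(desired, observed))
```

Running this loop repeatedly against the live state is also what gives the system its self-healing property: any drift shows up as a new diff and is corrected.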
3. What does Kubernetes not cover?
• Install and manage many clusters
• Homogeneously across Multi-Cloud
• Public Cloud Providers
• Private Cloud
• Zero Ops
• Minimal TCO
• Manage Nodes
• Manage Control Planes
• Day 2 Operations
Gardener
4. WHAT do we want to achieve with the Gardener?
Provide and establish a solution for Kubernetes Clusters as a Service
• Central provisioning
• Engage with the Open Source community, foster adoption, become a CNCF project
• Large-scale organisations need hundreds or thousands of clusters
5. WHAT do we want to achieve with the Gardener?
Securely and homogeneously on hyper-scale providers and for the private cloud
• Full control of Kubernetes, homogeneous across all installations
• AWS, Azure, GCP, Alibaba and others
• Private DCs for data privacy: OpenStack, and eventually bare metal
• Secure-by-default infrastructure and clusters
6. WHAT do we want to achieve with the Gardener?
with minimal TCO and full Day-2 operations support
• Full automation, backup & recovery, high resilience and robustness, self-healing, auto-scaling, …
• Rollout of bug fixes, security patches, updates of Kubernetes, OS, and infrastructure, certificate management, …
7. Gardener Mission
Provide and establish a solution for Kubernetes Clusters as a Service,
securely and homogeneously on hyper-scale providers and for the private cloud,
with minimal TCO and full Day-2 operations support.
8. Primary Gardener Architecture Principle
Following the definition of Kubernetes…
“Kubernetes is a system for automating deployment, scaling, and management of containerized software.”
…we do the following:
• We use Kubernetes to deploy, host and operate Kubernetes
• Control planes are “seeded” into already existing clusters
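To make the seeding pattern concrete: a shoot's control-plane components run as ordinary Kubernetes workloads inside a namespace of the seed cluster. The sketch below is hypothetical (namespace naming, image, and flags are illustrative, not the actual manifests the Gardener generates):

```yaml
# Hypothetical sketch: a shoot API server deployed as a plain Kubernetes
# Deployment inside a seed-cluster namespace (names/flags illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-apiserver
  namespace: shoot--garden-project--my-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-apiserver
  template:
    metadata:
      labels:
        app: kube-apiserver
    spec:
      containers:
      - name: kube-apiserver
        image: k8s.gcr.io/kube-apiserver:v1.11.2
        command:
        - kube-apiserver
        - --etcd-servers=https://etcd-main:2379
```

Because the control plane is just another workload, the seed cluster's own scheduling, self-healing, and scaling apply to it for free.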
9. Common Kubernetes Cluster Setup
Master
Master
Master
Worker/
Minion
Worker/
Minion
Worker/
Minion
Worker/
Minion
HA
Master
Master
Master
Worker/
Minion
Worker/
Minion
Worker/
Minion
HA
Master
Master
Master
Worker/
Minion
Worker/
Minion
HA
Master
Master
Master
Worker/
Minion
HA
Master
Master
Master
Worker/
Minion
Worker/
Minion
Worker/
Minion
Worker/
Minion
HA
Worker/
Minion
Worker/
Minion
Master
Worker/
Minion
The host the control plane,
often in HA and on separated hardware
(usually underutilized or, worse, overutilized)
green machines
The host the actual workload and
are managed by Kubernetes (usually pretty well utilized)
blue machines
Worker/
Minion
Master
Worker/
Minion
Worker/
Minion
10. Gardener Kubernetes Cluster Setup
[Diagram: the Gardener Cluster manages Seed Clusters; zooming into a Seed Cluster (HA masters plus workers) reveals multiple Shoot Cluster control planes – an API Server, etcd, Scheduler, and Controller Manager per shoot – while the Shoot Clusters themselves consist only of workers]
Gardener Machine Controller Manager:
• Machine provisioning
• Self-healing
• Auto-update
• Auto-scaling
11. Primary Gardener Design Principle
Do not reinvent the wheel and…
“Let Kubernetes drive the design of the Gardener.”
12. Garden Cluster, Seed Cluster, and Shoot Cluster
[Diagram: the Garden Cluster runs the Gardener API Server, Gardener Controller Manager, and Gardener Dashboard behind its API and ingress load balancers; administrators and end-users reach it via kubectl, gardenctl, the Gardener Dashboard, or a cloud cockpit UI. Each Seed Cluster hosts the Shoot Cluster Control Planes – API Server, etcd main (with backup) and etcd events, Scheduler, Controller Manager, Machine Controller, Addon Manager, Terraformer job, and VPN – plus monitoring and logging, storing [K8s] DS, RS, SS, J, … and [CRD] MachineDeployment resources. The Shoot Cluster itself contains only worker nodes running kubelet + container runtime, Calico, kube-proxy, kube-dns, optional add-ons, and the actual workload, connected to its control plane in the Seed Cluster through load balancers and a VPN on the IaaS layer.]
New Shoot Clusters can be created via the Gardener dashboard or by uploading a new Shoot resource to the Garden Cluster. The Gardener picks it up and starts a Terraform job to create the necessary IaaS components. Then it deploys the Shoot Cluster control plane into the Seed Cluster and the required add-ons into the Shoot Cluster. Update and delete operations are handled fully automatically by the Gardener as well.
13. Following the Design Principle Gardener uses…
Kubernetes as deployment underlay, built from K8s building blocks:
• Workload: Pods, ReplicaSets, Deployments, StatefulSets, Jobs
• Storage & configuration: PVs, PVCs, ConfigMaps, Secrets
• API Server extension: CRDs, RBAC, admission control
• Controllers & reconciliation, drivers, load balancers
Additional tooling:
• Cluster Autoscaler
• Calico & network policies
• Helm & Add-On Manager
• Prometheus, EFK stack
• Cert Broker, Cert Manager
14. Where are all These Clusters Coming From?
• Garden clusters are set up with Kubify (a Gardener family project based on HashiCorp Terraform); this is about to be replaced with the Gardener Ring (more on the next slide)
• Seed clusters used to be set up with Kubify, but since early 2018 they have been created as shoot clusters themselves, fully automated by the Gardener
• Shoot clusters have been created by the Gardener since the beginning
15. Gardener Ring – Let Gardener manage itself
That's where it will all start…
[Diagram: a Bootstrap Cluster (Kubify, Minikube, …) hosts the initial Gardener Control Plane. It creates Garden Clusters A, B, and C, each running its own Gardener Control Plane and hosting the Garden Cluster control planes of the others. Seeds: Bootstrap Cluster, Garden Cluster A, Garden Cluster B, Garden Cluster C.]
16. Lingua Franca – Gardener Cluster Resource

apiVersion: garden.sapcloud.io/v1
kind: Shoot
metadata:
  name: my-cluster
  namespace: garden-project
spec:
  dns:
    provider: aws-route53
    domain: cluster.ondemand.com
  cloud:
    aws:
      networks:
        vpc:
          cidr: 10.250.0.0/16
      workers:
      - name: cpu-worker
        machineType: m4.xlarge
        autoScalerMin: 5
        autoScalerMax: 20
  kubernetes:
    version: 1.11.2
    kubeAPIServer:
      featureGates: ...
      runtimeConfig: ...
      admissionPlugins: ...
    kubeControllerManager:
      featureGates: ...
    kubeScheduler:
      featureGates: ...
    kubelet:
      featureGates: ...
  maintenance:
    timeWindow:
      begin: 220000+0000
      end: 230000+0000
    autoUpdate:
      kubernetesVersion: true
status:
  ...

• Avoid vendor lock-in (lingua franca)
• Native Kubernetes resource
• Define your infrastructure needs
• Specify (multiple) worker pools
• Gardener- or self-managed DNS
• Tweak the Kubernetes control plane
• Set the Kubernetes version
• Define when and what to update
• Gardener-reported status
19. Gardener Community Installer
Setting up a Gardener landscape is not trivial,
so we have a community installer:
https://github.com/gardener/landscape-setup
• Many shortcuts to make it simple (Gardener and Seed in a single cluster)
• Do not use productively!
• You can use it as a starter for a productive setup
• Different cluster and different cloud provider accounts recommended
20. Kubernetes Machine Controller Manager (MCM)
• The Problem
• Provisioning and de-provisioning of nodes is out of scope for standard Kubernetes right now
• Gardener was using Terraform scripts for provisioning, and this was proving unmanageable
• No mechanism existed to smoothly scale clusters or upgrade cluster nodes across all providers
• The Solution
• The Machine Controller Manager (MCM) provides a Kubernetes-native, declarative way to describe the relevant aspects of the nodes required in a Kubernetes cluster
• It supports different cloud providers by way of modular plugins
• It enables easy scaling of the cluster and upgrades of cluster nodes
21. MCM Model
The model for Kubernetes workloads (Deployment, ReplicaSet, Pod) works great, so why not use it for machines?
• Pod ↔ Machine
• ReplicaSet ↔ MachineSet
• Deployment ↔ MachineDeployment
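A MachineDeployment then reads much like a regular Deployment, only describing VMs instead of pods. The fragment below is a sketch following the MCM examples; check the project documentation for the exact schema of your MCM version (the API group, class kind, and names here are assumptions):

```yaml
# Sketch of a MachineDeployment, mirroring a Deployment for machines
# (field names follow the MCM examples; verify against your MCM version).
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: cpu-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      name: cpu-worker
  template:
    metadata:
      labels:
        name: cpu-worker
    spec:
      class:
        kind: AWSMachineClass   # cloud-provider plugin class
        name: cpu-worker-class
```

Scaling the worker pool or rolling out a new machine image becomes an edit to this one resource, just as with a Deployment.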
23. Working of MCM
[Diagram: kubectl submits a MachineDeployment (class: V1, replicas: 3) to the Kubernetes API Server, backed by etcd. Within the Machine Controller Manager, the MachineDeployment controller creates a MachineSet (replicas: 3), the MachineSet controller creates three Machine objects, and the Machine controller calls the Cloud Provider API – using a MachineClass + Secret (V1) – to create three VMs. The Kubernetes Controller Manager registers the corresponding Node objects.]
Node objects help in monitoring the machine status (health).
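The MachineSet step of this controller chain can be sketched as one reconciliation function (illustrative Python, not the actual MCM code): compare the desired replica count with the existing Machine objects and create or delete the difference.

```python
# Illustrative sketch of a MachineSet-style reconciliation step
# (not the real MCM implementation).
import itertools

_name_counter = itertools.count(1)  # stand-in for generated machine names

def reconcile_machine_set(machine_set: dict, machines: list) -> list:
    """Create or delete machine names so len(machines) == spec.replicas."""
    desired = machine_set["replicas"]
    machines = list(machines)
    while len(machines) < desired:                 # scale up: create Machines
        machines.append(f"{machine_set['name']}-{next(_name_counter)}")
    while len(machines) > desired:                 # scale down: delete Machines
        machines.pop()                             # real MCM picks a victim
    return machines

ms = {"name": "cpu-worker", "replicas": 3}
print(reconcile_machine_set(ms, ["cpu-worker-a"]))
```

In the real MCM each created Machine then drives a cloud-provider API call, and the resulting VM is matched to a Node object for health monitoring.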
24. Autoscaling
[Diagram: the same MCM setup, now with a forked Cluster Autoscaler watching the cluster. Three Machines back Nodes 1–3.]
Now assume that all node resources are nearly consumed and a new pod (image: nginx) is created. The pod remains unschedulable, so the Cluster Autoscaler raises the MachineDeployment (class: V1) from 3 to 4 replicas. MCM creates a new Machine, Node 4 joins the cluster, and the pending pod is scheduled onto it.
26. Clusters Shall be Secure By Default
• Non-expert Kubernetes users can easily make mistakes that leave their clusters vulnerable – in many different ways.
• We have already seen some of this, and we see a lot of potential for more.
• The goal is to offer users the option to order their clusters with a safety net in order to avoid common misconfigurations and mistakes.
• Offer a restricted setup, but allow users to actively deactivate individual restrictions.
• This protects scenarios where you hand out a kubeconfig file or happen to run unprivileged code in the cluster.
27. Features
• Limit the default service account and its privileges to access the cluster API server, or disable it completely with automountServiceAccountToken
• When overlooked, this provides untrusted code with cluster-admin privileges
• Deny access to the metadata service via network policies
• Many hazards due to potentially sensitive information provided by the metadata service
• In general, deny access via network policies to any network ranges nobody should access
• You might have to protect your company's internal network or shield tenants from each other
• Offer clusters with “allow-privileged” set to false
• PSPs preventing privileged pods, rejecting root users via MustRunAs, denying hostPath, hostNetwork, or hostPID pods
• With some of these it is quite easy for an attacker to take over cluster nodes
• Deny access to certain privileged infrastructure pods (Calico); see above
• ImagePolicyWebhook admission controller to restrict where images may be pulled from
• See “Tainted, crypto-mining containers pulled from Docker Hub” on what happens even on Docker Hub
• …
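The metadata-service lockdown mentioned above can be expressed as a standard NetworkPolicy. This is a sketch; the namespace, the assumption that pods should otherwise have unrestricted egress, and the AWS-style metadata IP (169.254.169.254) are illustrative and depend on your provider and CNI:

```yaml
# Sketch: block egress to the cloud metadata endpoint for all pods in a
# namespace, while leaving other egress open (details are illustrative).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-service
  namespace: default
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```

Note that enforcing this requires a CNI plugin that implements NetworkPolicy, such as Calico, which Gardener deploys into shoot clusters.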
28. Gardener Community Installer
• Setting up a Gardener landscape is not trivial, so we have provided a community installer:
https://github.com/gardener/landscape-setup
• This is a setup with many shortcuts to make it as simple as possible (Gardener and seed in one single cluster).
• Do not use productively! You can however use it as a starter for a productive setup.
• Different clusters and different cloud provider accounts recommended.
29. Gardener Blog, CNCF Presentation → Hacker News, Reddit, Kubernetes Podcast
Gardener is Open Source
Long-Term Goal
Become CNCF Project
31. Thank You!
GitHub: https://github.com/gardener
Landing Page: https://gardener.cloud (Preview: https://gardener.github.io/website)
Wiki: https://github.com/gardener/documentation/wiki
Mailing List: https://groups.google.com/forum/?fromgroups#!forum/gardener
Set up your own Gardener: https://github.com/gardener/landscape-setup-template
Community Installer: https://github.com/gardener/landscape-setup
Kubernetes Slack Channel: https://kubernetes.slack.com/messages/gardener
(most of the communication happens here)