Kubernetes - Beyond a Black Box
A humble peek into one level below your running application in production
Copyright © 2017 Hao Zhang. All rights reserved.
About The Author
• Hao (Harry) Zhang
• Currently: Software Engineer, LinkedIn, Team Helix @ Data Infrastructure
• Previously: Member of Technical Staff, Applatix; worked with Kubernetes, Docker, and AWS
• Previous blog series: "Making Kubernetes Production Ready"
  • Part 1, Part 2, Part 3
  • Or just Google "Kubernetes production", "Kubernetes in production", or similar
• Connect with me on LinkedIn
Motivation
• It's one of the hottest cluster-management projects on GitHub
• It has THE largest developer community
• State-of-the-art architecture, designed mainly by people from Google who have been working on container orchestration for 10 years
• A collaboration of hundreds of engineers from tens of companies
Introduction
This set of slides is a humble dig into one level below your running application in production, revealing how different components of Kubernetes work together to orchestrate containers and present your applications to the rest of the world.
The slides contain 80+ external links to Kubernetes documentation, blog posts, GitHub issues, discussions, design proposals, pull requests, papers, and source code files I went through while working with Kubernetes - which I think are valuable for understanding how Kubernetes works, its design philosophies, and why these designs came into place.
Outline - Part I
• A high-level idea of what Kubernetes is
• Components, functionalities, and design choices
  • API Server
  • Controller Manager
  • Scheduler
  • Kubelet
  • Kube-proxy
• Putting it together: what happens when you do `kubectl create`
Outline - Part II
• An analysis of the controlling framework design
  • Micro-service Choreography
  • Generalized Workload and Centralized Controller
• Interfaces for production environments
• High-level workload abstractions
• Strengths and limitations
• Conclusion
Part I
Overview
A high-level idea of what Kubernetes is
"Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications."
– Kubernetes Official Website
Manage Apps, Not Machines
Manages Your Apps
• App Image Management
  • Where to Run
  • When to Get, when to Run, and when to Discard
• App Image Usage
  • Image Service: image lifecycle management
  • Runtime: run image as application
Manage Apps, Not Machines
Manages Things Around Your Apps
• High-level workload abstractions
  • Pod, Job, Deployment, StatefulSet, etc.
  • "If container resembles atom, Kubernetes provides molecules" - Tim Hockin
• Storage
  • Data volumes
• Network
  • IP, port mapping, DNS, …
• Monitoring, scaling, communicating, …
Kubernetes Control Plane
[Diagram: kube-apiserver (backed by etcd3 on a persistent data volume), kube-controller-manager, and kube-scheduler form the control plane; every node runs a kubelet and kube-proxy pair]
A Simplified Typical Web Service
[Diagram: a browser client hits a load balancer in front of web servers backed by MemCache; behind them sit backend sync read services, backend sync write services, and backend async write services (fed by a task queue and a worker pool), an object store (BLOB), and a SQL/NoSQL database on data volumes]
Web Service On Kube (AWS)
[Diagram: the same service deployed on Kubernetes in AWS. A VPC (with routing table and gateway) contains two subnets, each in an Auto Scaling Group behind an Elastic Load Balancer. One subnet runs the master EC2 instances (etcd, apiserver, scheduler, controller-manager, plus kubelet and Docker, on EBS volumes); the other runs worker EC2 instances (kubelet, kube-proxy, Docker, EBS) hosting the web servers, sync/async read and write services, task queues, worker pools, MemCache, and databases. S3 holds external blobs, Kubernetes binaries, and the Docker registry.]
Note: This chart is a rough illustration of how different production components might sit in the cloud. It does not reflect any strict production configurations such as high availability, fault tolerance, machine load balancing, etc.
Kubernetes Architecture
Components, Functionalities, and Design Choices
Kubernetes REST API
• Every piece of information in Kubernetes is an API object (a.k.a. cluster metadata)
  • Node, Pod, Job, Deployment, Endpoint, Service, Event, PersistentVolume, ResourceQuota, Secrets, ConfigMap, ……
• Schema and data model are explicitly defined and enforced
• Every API object has a UID and a version number
• Micro-services can perform CRUD on API objects through REST (a client sketch follows this list)
• Desired state in the cluster is achieved by the collaboration of separate autonomous entities reacting to changes of one or more API object(s) they are interested in
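As a concrete illustration of "CRUD through REST" - a minimal sketch of mine, assuming the 1.7/1.8-era k8s.io/client-go library and a reachable cluster (the kubeconfig path is a placeholder):

package main

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a REST client config from a kubeconfig file
    // (in-cluster config would work the same way).
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }
    // A LIST is just a GET on /apis/batch/v1/namespaces/default/jobs;
    // every returned object carries its UID and resource version.
    jobs, err := clientset.BatchV1().Jobs("default").List(metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, job := range jobs.Items {
        fmt.Printf("%s/%s uid=%s version=%s\n", job.Namespace, job.Name, job.UID, job.ResourceVersion)
    }
}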
Kubernetes REST API
• The API Space
  • Different API Groups
    • Usually a group represents a set of features, i.e. batch, apps, rbac, etc.
    • Each group has its own version control

/apis/batch/v1/namespaces/default/jobs/myjob
  • /apis - root
  • batch - API group name
  • v1 - version
  • namespaces/default - namespaced object, namespace name
  • jobs - API resource kind
  • myjob - API object instance name

(You can GET this exact path by hand; see the proxy example below.)
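To poke at the path by hand - a usage sketch, not from the original slides - proxy the API server locally and curl the URL:

$ kubectl proxy --port=8001 &
$ curl http://localhost:8001/apis/batch/v1/namespaces/default/jobs/myjob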
More Readings:
✤ Extending APIs design spec: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/extending-api.md
✤ Custom Resource documentation: https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/
✤ Extending core API with aggregation layer: https://kubernetes.io/docs/concepts/api-extension/apiserver-aggregation/
✤ Kubernetes API Spec: https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.7/api/openapi-spec/swagger.json
✤ Kubernetes API Server Deep Dives:
  ✤ https://blog.openshift.com/kubernetes-deep-dive-api-server-part-1/
  ✤ https://blog.openshift.com/kubernetes-deep-dive-api-server-part-2/
Kubernetes REST API
• The API Object Definition
  • 5 major fields: apiVersion, kind, metadata, spec and status
    • A few exceptions such as v1Event, v1Binding, etc.
  • Metadata includes name, namespace, labels, annotations, version, etc.
  • Spec describes desired state, while status describes current state

$ kubectl get jobs myjob --namespace default --output yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
  namespace: default
spec:
  (describes the desired "job" object, i.e. Pod template, how many runs, etc.)
status:
  ("job" object's current status, i.e. ran # times already, etc.)
Kubernetes Control Plane
[Diagram: the master node runs kube-apiserver, kube-controller-manager, and kube-scheduler, with etcd3 on a persistent data volume (usually SSD); each minion node runs a kubelet and kube-proxy. The controller-manager, scheduler, kubelet, and kube-proxy all talk to the API server over REST/protobuf; the API server talks to etcd3 over RPC/protobuf]
Kube-apiserver & Etcd
[Diagram: the control-plane layout from before, highlighting kube-apiserver and etcd3 on the master node]
Kube-apiserver & Etcd

Etcd as metadata store:
• Watchable distributed K-V store (see the etcdctl probe below)
• Transaction and locking support
• Scalable and low memory consumption
• RBAC support
• Uses Raft for leader election
• R/W on all nodes of the Etcd cluster
• …

General functionalities:
• Authentication
• Admission control (RBAC)
• Rate limiting
• Connection pooling
• Feature grouping
• Pluggable storage backend
• Pluggable custom API definition
• Client tracing and API call stats

For API objects:
• CRUD operations
• Defaulting
• Validation
• Versioning
• Caching

For containers:
• Application proxying
• Stream container logs
• Execute command in container
• Metrics

WRITE API call process pipeline:
1. Route through mux to the routine
2. Version conversions
3. Validation
4. Encode
5. Storage backend communication

READ API calls are processed roughly in the opposite order of WRITE API calls.

[Diagram: kube-apiserver serves REST APIs over HTTP/HTTPS and talks to etcd3 via gRPC + protobuf; the two usually co-locate on the same compute node, with SSD storage]
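The "watchable" part can be observed first-hand on an etcd3 member - a hypothetical probe of mine, not from the slides; /registry is Kubernetes' default key prefix, and note that values are stored as binary protobuf with the etcd3 backend:

$ ETCDCTL_API=3 etcdctl watch --prefix /registry/jobs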
Kube-apiserver & Etcd
• API server caching
  • Decoding objects from Etcd is a huge burden
    • Without caching, 50% of CPU is spent on decoding objects read from etcd, 70% of which is on converting K/V (as of Etcd2)
    • Also adds pressure on Golang's GC (lots of unused K/V)
  • In-memory cache for READs (Watch, Get and List)
  • Decode only when etcd has a more up-to-date version of the object
Kube-apiserver & Etcd
Citation and More Readings:
✤ Etcd! Etcd3!
  ✤ Discussion about pluggable storage backend (ZK, Consul, Redis, etc.): https://github.com/kubernetes/kubernetes/issues/1957
  ✤ More on "Why etcd?" (could be biased): https://github.com/coreos/etcd/blob/master/Documentation/learning/why.md
  ✤ Upgrade to etcd3: https://github.com/kubernetes/kubernetes/issues/22448
  ✤ Etcd3 features: https://coreos.com/blog/etcd3-a-new-etcd.html
✤ Kubernetes & Etcd
  ✤ Kubernetes' Etcd usage: http://cloud-mechanic.blogspot.com/2014/09/kubernetes-under-hood-etcd.html
  ✤ Raft - the consensus protocol behind etcd: https://speakerdeck.com/benbjohnson/raft-the-understandable-distributed-consensus-protocol/
  ✤ Protocol Buffers: https://developers.google.com/protocol-buffers/
  ✤ Setting up an HA Kubernetes cluster: https://kubernetes.io/docs/admin/high-availability/#clustering-etcd
✤ Kubernetes API
  ✤ Kubernetes API conventions: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md
✤ API server optimizations and improvements
  ✤ Discussion about cache: https://github.com/kubernetes/kubernetes/issues/6678, Watch Cache Design Proposal
  ✤ Discussion about optimizing etcd connections: https://github.com/kubernetes/kubernetes/issues/7451
  ✤ Optimizing GET, LIST, and WATCH: https://github.com/kubernetes/kubernetes/issues/4817, https://github.com/kubernetes/kubernetes/issues/6564
  ✤ Serving LIST from memory: https://github.com/kubernetes/kubernetes/issues/15945
  ✤ Optimizing getting many pods: https://github.com/kubernetes/kubernetes/issues/6514
Kube-apiserver & Etcd
• Kubernetes' performance SLOs
  • API responsiveness: 99% of API calls return in < 1s
  • Pod startup time: 99% of Pods and their containers start in < 5s (with pre-pulled images)
• Scale: 5,000 nodes, 150,000 Pods, ~300,000 containers (1~1.5M API objects)
• Resources (single-master setup for 1,000 nodes as of v1.2): 32 cores, 120G memory
Citation and More Readings:
✤ Kubernetes 1.2 scalability: http://blog.kubernetes.io/2016/03/1000-nodes-and-beyond-updates-to-Kubernetes-performance-and-scalability-in-12.html
✤ Kubernetes 1.6 scalability: http://blog.kubernetes.io/2017/03/scalability-updates-in-kubernetes-1.6.html
Kube-controller-manager
[Diagram: the control-plane layout from before, highlighting kube-controller-manager on the master node]
Kube-controller-manager
• Framework: Reflector, Store, SharedInformer, and Controller (a wiring sketch in client-go follows this slide)
  • One of the optimizations in 1.6 that made it scale from 2k to 5k nodes
• Reflector: ListAndWatch on a particular resource
  • Covers all instances of a resource
  • List upon startup and on broken connections
    • List: get current states
    • Watch: get updates
  • Pushes ListAndWatch results into the store
• Store: always up-to-date
  • In-memory read-only cache
  • Shared among informers
  • Optimizes GET and LIST
• SharedInformer: cache update broadcast
  • Events: OnAdd, OnUpdate, OnDelete
  • Store is at least as fresh as the notification
  • A controller registers event handlers with the shared informer of the resource it is interested in
  • A single SharedInformer has a pool of event handlers registered by different controllers
• Task queue
  • Blocking, MT-safe FIFO queue
  • Additional features such as delayed scheduling, failure counts, etc.
• Controller: worker pool of a particular resource
  • Reconciliation controlling loops
  • Action is computed based on DesiredState and CurrentState
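A minimal sketch of mine wiring these pieces together, assuming the 1.7/1.8-era k8s.io/client-go informer API (an illustration, not an excerpt of kube-controller-manager):

import (
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/util/workqueue"
)

func runPodController(clientset *kubernetes.Clientset, stopCh chan struct{}) {
    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

    // The factory hands out SharedInformers: one reflector + store per
    // resource type, shared by every controller in the process.
    factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
    podInformer := factory.Core().V1().Pods().Informer()

    // Register this controller's event handlers; other controllers may
    // register their own handlers on the very same informer.
    enqueue := func(obj interface{}) {
        if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
            queue.Add(key) // hand the key to the worker pool
        }
    }
    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    enqueue,
        UpdateFunc: func(oldObj, newObj interface{}) { enqueue(newObj) },
        DeleteFunc: enqueue,
    })

    factory.Start(stopCh)                                 // reflector begins ListAndWatch
    cache.WaitForCacheSync(stopCh, podInformer.HasSynced) // store now "at least as fresh"
    // Workers would now pop keys off the queue and run the reconciliation
    // loop shown on the next slide.
}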
Kube-controller-manager
• Reconciliation controlling loop: things will finally go right

func (c *Controller) Run(poolSize int, stopCh chan struct{}) {
    // start worker pool
    for i := 0; i < poolSize; i++ {
        // runWorker will loop until "something bad" happens. wait.Until will
        // then rekick the worker after one second
        go wait.Until(c.runWorker, 1*time.Second, stopCh)
    }
    // wait until we're told to stop
    <-stopCh
}

func (c *Controller) runWorker() {
    // hot loop until we're told to stop
    for c.processNextWorkItem() {
    }
}

// processNextWorkItem returns false only when the queue is shutting down.
func (c *Controller) processNextWorkItem() bool {
    obj, shutdown := c.queue.Get()
    if shutdown {
        return false
    }
    defer c.queue.Done(obj)
    // Reconciliation between current and desired state
    err := c.makeChange(obj.CurrentState, obj.DesiredState)
    if err == nil {
        return true
    }
    // Cool down and retry using delayed scheduling upon error
    c.queue.AddRateLimited(obj)
    return true
}
Kube-controller-manager
[Diagram: the shared resource informer factory - per-resource reflectors (Pod, Job, Deployment, Service, Node, DaemonSet, ReplicaSet, ……) fill a shared cache through a shared Kubernetes client; per-resource informers dispatch events through a handler event dispatching mesh to the event handlers each controller registered; every controller (Deployment, Job, Service, ReplicaSet, DaemonSet, Node, XYZ, ……) owns its own queue and worker pool]
Kube-controller-manager
• 28 controllers as of 1.8, usually deployed as a Pod
  • All autonomous micro-services: easy to turn on/off
  • HA via leader election
• Shared Kubernetes client
  • Limits QPS; manages sessions/HTTP connections, auth, and retries with jittered backoff (a QPS snippet follows this slide)
• InformerFactory (SharedInformer)
  • Reduces API calls, ensures cache consistency, lighter GET/LIST
• Reflector / Store / Informer / Controller framework
  • Easy, separated development of different controllers as micro-services
• Worker pool
  • Individually configurable agility and resource consumption
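The shared client's rate limit, for instance, is just a knob on client-go's rest.Config - a small sketch reusing the config object from the earlier LIST example (the values here are hypothetical):

// QPS/Burst throttle every call made through clients built from this config.
config.QPS = 20   // sustained requests per second to the API server
config.Burst = 30 // short-term burst allowance
clientset, err := kubernetes.NewForConfig(config)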
Kube-controller-manager
More Readings:
✤ Controller Dev Guide: https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md
✤ Controller Manager Main: https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/controllermanager.go
✤ Shared Informer: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/shared_informer.go
✤ Controller Config: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/controller.go
✤ Reflector: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go
✤ List and Watch: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/listwatch.go
✤ All Controller Logics: https://github.com/kubernetes/kubernetes/tree/master/pkg/controller
Kube-scheduler
[Diagram: the control-plane layout from before, highlighting kube-scheduler on the master node]
Kube-scheduler
• Scheduling algorithm (a toy sketch of the two phases follows this slide)
• Phase 1 - filter out impossible nodes:
  • NodeIsReady: node should be in Ready state
  • CheckNodeMemoryPressure: nodes can be over-provisioned; filter out nodes with memory pressure
  • CheckNodeDiskPressure: nodes reporting host disk pressure should not be considered
  • MaxNodeEBSVolumeCount: if a Pod needs a volume, filter out nodes that have reached the max volume count
  • MatchNodeSelector: if a Pod has a nodeSelector, only consider nodes with that label
  • PodFitsResources: node should have enough free resources for the Pod
  • PodFitsHostPort: the Pod's HostPort should be available
  • NoDiskConflict: the volumes this Pod wants must not conflict with volumes already on the node
  • NoVolumeZoneConflict: filter out nodes with zone conflicts
  • ……
• Phase 2 - give every node a score:
  • LeastRequestedPriority: "more idle" nodes are preferred
  • BalancedResourceAllocation: balance node utilization
  • SelectorSpreadPriority: replicas should be spread out
  • Affinity/AntiAffinityPriority: "I prefer to stay away from / co-locate with some other categories of Pods"
  • ImageLocalityPriority: a node is preferred if it already has the image
  • ……
[Diagram: the scheduler's Kubernetes client watches unscheduled Pods; node candidates pass through the node filter, the node ranker sorts them by score, and the decision - the node with the highest ranking - is written back via POST /v1/bindings]
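A toy sketch - entirely my own, not kube-scheduler code - of the filter-then-rank shape of the algorithm, with one made-up predicate and one made-up priority:

package main

import (
    "fmt"
    "sort"
)

// Hypothetical, simplified types for illustration only.
type Node struct {
    Name    string
    Ready   bool
    FreeCPU int // millicores
    FreeMem int // MiB
}

type Pod struct {
    Name string
    CPU  int
    Mem  int
}

// Phase 1: predicates filter out impossible nodes.
func feasible(n Node, p Pod) bool {
    return n.Ready && n.FreeCPU >= p.CPU && n.FreeMem >= p.Mem
}

// Phase 2: priorities score the survivors (here: "least requested" only).
func score(n Node, p Pod) int {
    return (n.FreeCPU - p.CPU) + (n.FreeMem - p.Mem)
}

func schedule(nodes []Node, p Pod) (string, bool) {
    var candidates []Node
    for _, n := range nodes {
        if feasible(n, p) {
            candidates = append(candidates, n)
        }
    }
    if len(candidates) == 0 {
        return "", false // unschedulable; the Pod stays Pending
    }
    sort.Slice(candidates, func(i, j int) bool {
        return score(candidates[i], p) > score(candidates[j], p)
    })
    // The winner would be written back as a v1Binding (POST /v1/bindings).
    return candidates[0].Name, true
}

func main() {
    nodes := []Node{
        {Name: "node1", Ready: true, FreeCPU: 500, FreeMem: 1024},
        {Name: "node2", Ready: true, FreeCPU: 4000, FreeMem: 8192},
        {Name: "node3", Ready: false, FreeCPU: 8000, FreeMem: 16384},
    }
    name, ok := schedule(nodes, Pod{Name: "web", CPU: 250, Mem: 512})
    fmt.Println(name, ok) // -> node2 true
}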
Kube-scheduler
• Default scheduler
  • Sequential scheduler
  • Some application / topology awareness
    • Users can define affinity, anti-affinity, node selectors, etc.
  • Scheduled 30,000 Pods onto 1,000 Nodes within 587 seconds (in v1.2)
  • HA via leader election
• Supports multiple schedulers
  • A Pod can choose the scheduler that schedules it (see the example below)
  • Schedulers share the same pool of nodes, so race conditions are possible
    • Kubelet can reject Pods in case of conflict and trigger a reschedule
  • Easy to write your own scheduler, or just extend the algorithm provider of the default scheduler
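For example - assuming a 1.6+ cluster and a custom scheduler deployed under the hypothetical name my-scheduler - a Pod opts in via spec.schedulerName:

apiVersion: v1
kind: Pod
metadata:
  name: picky-pod
spec:
  schedulerName: my-scheduler   # the default is "default-scheduler"
  containers:
  - name: app
    image: nginx:1.7.9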
Kube-scheduler
More Readings:
✤ Documentation:
  ✤ Scheduler overview: https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler.md
  ✤ Guaranteed scheduling through re-scheduling: https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
  ✤ Default scheduling algorithm: https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler_algorithm.md
✤ Blogs:
  ✤ Scheduler optimization by CoreOS: https://coreos.com/blog/improving-kubernetes-scheduler-performance.html
  ✤ Advanced scheduling and customized schedulers after 1.6: http://blog.kubernetes.io/2017/03/advanced-scheduling-in-kubernetes.html
✤ Critical code snippets:
  ✤ Scheduling predicates: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/predicates/predicates.go
  ✤ Scheduling defaults: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go
✤ Designs, specs, discussions:
  ✤ Scheduling feature proposals: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/scheduling
Kubelet
[Diagram: the control-plane layout from before, highlighting the kubelet on the minion node]
Kubelet
• Kubernetes' node worker
  • Ensures Pods run as described by their specs
  • Reports readiness and liveness
• Node-level admission control
  • Reserve resources, limit the number of Pods, etc.
• Reports node status to the API server (v1Node)
  • Machine info, resource usage, health, images
  • Host stats (Cgroup metrics via cAdvisor)
• Communicates with containers through the runtime interface
  • Stream logs, execute commands, port-forward, etc.
• Usually not run as a container, but as a service under Systemd
Kubelet
[Diagram: the kubelet's main sync loop listens on several channels - the Pod config channel (Pod specs from watched sources, fed by the Kubernetes client and iNotify), the Pod re-sync and housekeeping channels (clock ticks; housekeeping cleans up unwanted Pods, pod workers, releases volumes, etc.), and the PLEG channel. The Pod Lifecycle Event Generator (PLEG) turns runtime status (Pending, Running, Succeeded, Failed, Unknown) and external Pod event sources (container runtime event stream, cAdvisor/Cgroup events) into Pod events and maintains a Pod cache. A pool of per-UID Pod workers issues runtime ops over RPC to the container runtime (i.e. Docker). Volume management, the container prober, and Pod status reporting hang off the same loop]
Kubelet
• Main controller loop
  • Listens for events from channels and reacts
• PLEG (Pod Lifecycle Event Generator)
  • Listens and Lists on external sources, i.e. container runtime, cAdvisor
  • Translates container status changes into Pod events
• Pod worker
  • Interacts with the container runtime
  • Moves a Pod from its current state (v1Pod.status) to the desired state (v1Pod.spec)
Kubelet
• Container prober
  • Probes container liveness and healthiness using methods defined by the spec (a probe example follows this list)
  • Publishes an event (v1Event) and puts the container in CrashLoopBackoff upon probing failures
• Volume manager
  • Local directory management, device mount/unmount, device initialization, etc.
• cAdvisor
  • Container metrics collection and reporting; events about kernel Cgroups
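A liveness probe, for instance, is declared on the container spec; this snippet (my example, using the standard Kubernetes probe API) makes the prober hit an HTTP endpoint every 10 seconds:

apiVersion: v1
kind: Pod
metadata:
  name: probed-pod
spec:
  containers:
  - name: app
    image: nginx:1.7.9
    livenessProbe:
      httpGet:            # could also be exec or tcpSocket
        path: /healthz
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10   # repeated failures -> restart and CrashLoopBackoff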
Kubelet
• How a Pod is set up
  • Sandbox: an isolated environment with resource constraints
    • Implementation depends on the runtime; could be a VM, Linux Cgroups, etc.
    • For the Docker implementation, kubelet sets up:
      • Cgroup, port mappings, Linux security context
      • The resolv.conf shared by all containers in the Pod
    • dockerService.makeSandboxDockerConfig()
  • App containers
    • Use the sandbox as Cgroup parent
    • Share the same "localhost" (see the two-container example below)
    • dockerService.CreateContainer()
    • kubeGenericRuntimeManager.startContainer()
[Diagram: two app containers inside a Pod sandbox (infra container) reach each other via curl localhost:80; the sandbox holds PodIP 192.168.12.3 with PortMapping 80::33580, so the outside world reaches the Pod via curl 192.168.12.3:33580]
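A two-container Pod (my example) showing the shared network namespace: the sidecar reaches the nginx container on localhost because both containers join the sandbox's namespaces:

apiVersion: v1
kind: Pod
metadata:
  name: shared-localhost
spec:
  containers:
  - name: web
    image: nginx:1.7.9
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox
    # wget hits the nginx container - same Pod, same "localhost"
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 5; done"]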
Kubelet
More Readings:
✤ Kubelet Main: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go
  ✤ Kubelet main function: Kubelet.Run()
  ✤ Kubelet sync loop channel event processing: Kubelet.syncLoopIteration()
  ✤ Kubelet sync Pod: Kubelet.syncPod()
✤ Pod Worker: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_workers.go
✤ Container Runtime Sync Pod: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go
  ✤ Sync Pod using CRI: kubeGenericRuntimeManager.SyncPod()
✤ Container Runtime Start Container: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container.go
  ✤ How kubelet sets up a container: kubeGenericRuntimeManager.startContainer()
✤ Implementation of PodSandbox using Docker: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go
  ✤ Create Pod sandbox: dockerService.RunPodSandbox()
✤ PLEG Design Proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-lifecycle-event-generator.md
✤ Pod Cgroup Hierarchy: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-resource-management.md#cgroup-hierachy
✤ All Kubelet related design docs: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/node
Kube-proxy
[Diagram: the control-plane layout from before, highlighting kube-proxy on the minion node]
Kube-proxy
• A daemon on every node
• Watches for changes of Service and Endpoint objects
• Manipulates iptables rules on the host to achieve service load balancing (one of several proxy implementations; a peek at the rules follows the readings below)
[Diagram: kube-proxy's Kubernetes client WATCHes /api/v1/service and /api/v1/endpoint and manages iptables rules in the kernel]
More Readings:
✤ v1Service object and different types of proxy implementations: https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
✤ iptables proxy implementation: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go
  ✤ Proxier.syncProxyRules()
✤ Other built-in proxy implementations: https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy
✤ Some early discussions about the Service object and implementations: https://github.com/kubernetes/kubernetes/issues/1107
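On a node where the iptables proxier is in use, you can peek at the rules kube-proxy maintains (KUBE-SERVICES is the chain the iptables proxier installs; output varies by cluster, so none is shown here):

$ sudo iptables -t nat -nL KUBE-SERVICES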
Put It Together
What will happen after `kubectl create`
Put it together
• What will happen when you run the following? (You can watch the chain unfold with the `kubectl get` command after this slide.)

$ cat myapp.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 9376

$ kubectl create -f myapp.yaml
deployment "nginx-deployment" created
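Watching the reaction chain below as it unfolds is as simple as polling the objects each step creates (standard kubectl; output omitted):

$ kubectl get deployments,replicasets,pods -l app=nginx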
Kubernetes Reaction Chain

Actors (and what each watches):
• Kubectl
• Kubernetes API Server & Etcd
• Deployment Controller - WATCH v1Deployment, v1ReplicaSet
• ReplicaSet Controller - WATCH v1ReplicaSet, v1Pod
• Scheduler - WATCH v1Pod, v1Node
• Kubelet-node1 - WATCH v1Pod on node1; drives containers via CRI and networking via CNI

1. Kubectl POSTs a v1Deployment
2. API server persists the v1Deployment API object (desiredReplica: 3, availableReplica: 0)
3. Reflector updates the Deployment in the Deployment Controller's cache
4. Deployment reconciliation loop - action: create ReplicaSet
5. POST v1ReplicaSet
6. Persist the v1ReplicaSet API object (desiredReplica: 3, availableReplica: 0)
7. Reflector updates the RS in the ReplicaSet Controller's cache
8. ReplicaSet reconciliation loop - action: create Pods
9. POST v1Pod * 3
10. Persist 3 v1Pod API objects (status: Pending, nodeName: none)
11. Reflector updates the Pods in the Scheduler's cache
12. Scheduler assigns a node to each Pod
13. POST v1Binding * 3
14. Persist 3 v1Binding API objects (name: PodName, targetNodeName: node1)
15. Pod specs pushed to kubelet
16. Kubelet pulls images and runs containers through CRI (PLEG, etc.), creates v1Events, and updates Pod status
17. PATCH v1Pod * 3 - update 3 v1Pod API objects (status: PodInitializing -> Running, nodeName: node1)
17-1. PUT v1Event * N - several v1Event objects are created, e.g. ImagePulled, CreatingContainer, StartedContainer, etc.; controllers and the scheduler can also create events throughout the entire reaction chain
18. Reflector updates the Pods in the ReplicaSet Controller's cache
19. ReplicaSet reconciliation loop - update the RS whenever a Pod enters the "Running" state
20. PATCH v1ReplicaSet
21. Update the v1ReplicaSet API object (desiredReplica: 3, availableReplica: 0 -> 3)
22. Reflector updates the RS in the Deployment Controller's cache
23. Deployment reconciliation loop - update the Deployment status with the RS status changes
24. PATCH v1Deployment
25. Update the v1Deployment API object (desiredReplica: 3, availableReplica: 0 -> 3)

Note: this is a rather high-level idea of the Kubernetes reaction chain. It hides some details for illustration purposes and differs from the actual behavior.
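The events from step 17-1 can be tailed live while the Deployment comes up - a usage note of mine, not part of the original diagram:

$ kubectl get events --watch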
To be continued in Part II

More Related Content

What's hot

What's hot (20)

ģæ ė²„ė„¤ķ‹°ģŠ¤ ( Kubernetes ) ģ†Œź°œ ģžė£Œ
ģæ ė²„ė„¤ķ‹°ģŠ¤ ( Kubernetes ) ģ†Œź°œ ģžė£Œģæ ė²„ė„¤ķ‹°ģŠ¤ ( Kubernetes ) ģ†Œź°œ ģžė£Œ
ģæ ė²„ė„¤ķ‹°ģŠ¤ ( Kubernetes ) ģ†Œź°œ ģžė£Œ
Ā 
Kubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewKubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive Overview
Ā 
Kubernetes internals (Kubernetes ķ•“ė¶€ķ•˜źø°)
Kubernetes internals (Kubernetes ķ•“ė¶€ķ•˜źø°)Kubernetes internals (Kubernetes ķ•“ė¶€ķ•˜źø°)
Kubernetes internals (Kubernetes ķ•“ė¶€ķ•˜źø°)
Ā 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Ā 
Kubernetes
KubernetesKubernetes
Kubernetes
Ā 
An Introduction to Kubernetes
An Introduction to KubernetesAn Introduction to Kubernetes
An Introduction to Kubernetes
Ā 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Ā 
Kubernetes 101 for Beginners
Kubernetes 101 for BeginnersKubernetes 101 for Beginners
Kubernetes 101 for Beginners
Ā 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
Ā 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Ā 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
Ā 
Kubernetes
KubernetesKubernetes
Kubernetes
Ā 
DevOps with Kubernetes
DevOps with KubernetesDevOps with Kubernetes
DevOps with Kubernetes
Ā 
01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx
Ā 
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Ā 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
Ā 
Kubernetes - introduction
Kubernetes - introductionKubernetes - introduction
Kubernetes - introduction
Ā 
Intro to kubernetes
Intro to kubernetesIntro to kubernetes
Intro to kubernetes
Ā 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introduction
Ā 
Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets
Ā 

Viewers also liked

Viewers also liked (12)

Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - A...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - A...Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - A...
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - A...
Ā 
From dev to prod: Kubernetes on AWS (short ver.)
From dev to prod: Kubernetes on AWS (short ver.)From dev to prod: Kubernetes on AWS (short ver.)
From dev to prod: Kubernetes on AWS (short ver.)
Ā 
Kubernetes on AWS
Kubernetes on AWSKubernetes on AWS
Kubernetes on AWS
Ā 
KELK Stack on AWS
KELK Stack on AWSKELK Stack on AWS
KELK Stack on AWS
Ā 
Container Days Boston - Kubernetes in production
Container Days Boston - Kubernetes in productionContainer Days Boston - Kubernetes in production
Container Days Boston - Kubernetes in production
Ā 
Webcast - Making kubernetes production ready
Webcast - Making kubernetes production readyWebcast - Making kubernetes production ready
Webcast - Making kubernetes production ready
Ā 
Running Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSRunning Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWS
Ā 
Cloud Solution Day 2016: Service Mesh for Kubernetes
Cloud Solution Day 2016: Service Mesh for KubernetesCloud Solution Day 2016: Service Mesh for Kubernetes
Cloud Solution Day 2016: Service Mesh for Kubernetes
Ā 
Kubernetes on AWS at Europe's Leading Online Fashion Platform
Kubernetes on AWS at Europe's Leading Online Fashion PlatformKubernetes on AWS at Europe's Leading Online Fashion Platform
Kubernetes on AWS at Europe's Leading Online Fashion Platform
Ā 
Kubernetes networking in AWS
Kubernetes networking in AWSKubernetes networking in AWS
Kubernetes networking in AWS
Ā 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
Ā 
Beyond Ingresses - Better Traffic Management in Kubernetes
Beyond Ingresses - Better Traffic Management in KubernetesBeyond Ingresses - Better Traffic Management in Kubernetes
Beyond Ingresses - Better Traffic Management in Kubernetes
Ā 

Similar to Kubernetes Architecture - beyond a black box - Part 1

Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CDKubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Stfalcon Meetups
Ā 
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdfImplementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
ssuserf4844f
Ā 

Similar to Kubernetes Architecture - beyond a black box - Part 1 (20)

[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
Ā 
[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes
Ā 
Kube journey 2017-04-19
Kube journey   2017-04-19Kube journey   2017-04-19
Kube journey 2017-04-19
Ā 
OpenShift Enterprise 3.1 vs kubernetes
OpenShift Enterprise 3.1 vs kubernetesOpenShift Enterprise 3.1 vs kubernetes
OpenShift Enterprise 3.1 vs kubernetes
Ā 
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Ā 
Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CDKubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Kubernetes: Š²Ń–Š“ Š·Š½Š°Š¹Š¾Š¼ŃŃ‚Š²Š° Š“Š¾ Š²ŠøŠŗŠ¾Ń€ŠøстŠ°Š½Š½Ń у CI/CD
Ā 
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and KubernetesMongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
Ā 
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Ā 
Fission Introduction
Fission IntroductionFission Introduction
Fission Introduction
Ā 
AWS April Webinar Series - Getting Started with Amazon EC2 Container Service
AWS April Webinar Series - Getting Started with Amazon EC2 Container ServiceAWS April Webinar Series - Getting Started with Amazon EC2 Container Service
AWS April Webinar Series - Getting Started with Amazon EC2 Container Service
Ā 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Ā 
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
Ā 
Containers, microservices and serverless for realists
Containers, microservices and serverless for realistsContainers, microservices and serverless for realists
Containers, microservices and serverless for realists
Ā 
GPSTEC304_Shipping With PorpoiseA K8s Story
GPSTEC304_Shipping With PorpoiseA K8s StoryGPSTEC304_Shipping With PorpoiseA K8s Story
GPSTEC304_Shipping With PorpoiseA K8s Story
Ā 
DevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm WebinarDevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm Webinar
Ā 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Ā 
Tech connect aws
Tech connect  awsTech connect  aws
Tech connect aws
Ā 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
Ā 
Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307
Ā 
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdfImplementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
Implementing-SaaS-on-Kubernetes-Michael-Knapp-Andrew-Gao-Capital-One.pdf
Ā 

Recently uploaded

Recently uploaded (20)

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Ā 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Ā 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
Ā 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
Ā 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Ā 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Ā 

Kubernetes Architecture - beyond a black box - Part 1

  • 1. Kubernetes - Beyond a Black Box A humble peek into one level below your running application in production
  • 2. Copyright Ā© 2017 Hao Zhang All rights reserved. About The Author ā€¢ Hao (Harry) Zhang ā€¢ Currently: Software Engineer, LinkedIn, Team Helix @ Data Infrastructure ā€¢ Previously: Member of Technical Staff, Applatix, Worked with Kubernetes, Docker and AWS ā€¢ Previous blog series: ā€œMaking Kubernetes Production Readyā€ ā€¢ Part 1, Part 2, Part 3 ā€¢ Or just Google ā€œKubernetes productionā€, ā€œKubernetes in productionā€, or similar ā€¢ Connect with me on LinkedIn
  • 3. Copyright Ā© 2017 Hao Zhang All rights reserved. Motivation ā€¢ Itā€™s one of the hottest cluster management project on Github ā€¢ It has THE largest developer community ā€¢ State-of-art architecture designed mainly by people from Google who have been working on container orchestration for 10 years ā€¢ Collaboration of hundreds of engineers from tens of companies
  • 4. Copyright Ā© 2017 Hao Zhang All rights reserved. Introduction This set of slides is a humble dig into one level below your running application in production, revealing how different components of Kubernetes work together to orchestrate containers and present your applications to the rest of the world. The slides contains 80+ external links to Kubernetes documentations, blog posts, Github issues, discussions, design proposals, pull requests, papers, source code ļ¬les I went through when I was working with Kubernetes - which I think are valuable for people to understand how Kubernetes works, Kubernetes design philosophies and why these design came into places.
  • 5. Copyright Ā© 2017 Hao Zhang All rights reserved. Outline - Part I ā€¢ A high level idea about what is Kubernetes ā€¢ Components, functionalities, and design choice ā€¢ API Server ā€¢ Controller Manager ā€¢ Scheduler ā€¢ Kubelet ā€¢ Kube-proxy ā€¢ Put it together, what happens when you do `kubectl create`
  • 6. Copyright Ā© 2017 Hao Zhang All rights reserved. Outline - Part II ā€¢ An analysis of controlling framework design ā€¢ Micro-service Choreography ā€¢ Generalized Workload and Centralized Controller ā€¢ Interfaces for production environments ā€¢ High level workload abstractions ā€¢ Strength and limitations ā€¢ Conclusion
  • 8. Overview A high level idea about what is Kubernetes
  • 9. ā€“ Kubernetes Ofļ¬cial Website ā€œKubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.ā€
  • 10. Copyright Ā© 2017 Hao Zhang All rights reserved. Manage Apps, Not Machines Manages Your Apps ā€¢ App Image Management ā€¢ Where to Run ā€¢ When to Get, when to Run, and when to Discard ā€¢ App Image Usage ā€¢ Image Service: Image lifecycle management ā€¢ Runtime: Run image as application
  • 11. Copyright Ā© 2017 Hao Zhang All rights reserved. Manage Apps, Not Machines Manages Things Around Your Apps ā€¢ High level workload abstractions ā€¢ Pod, Job, Deployment, StatefulSet, etc. ā€¢ ā€œIf container resembles atom, Kubernetes provides moleculesā€ - Tim Hockin ā€¢ Storages ā€¢ Data volumes ā€¢ Network ā€¢ IP, Port mapping, DNS, ā€¦ ā€¢ Monitoring, scaling, communicating, ā€¦
  • 12. Copyright Ā© 2017 Hao Zhang All rights reserved. etcd3 Kube-apiserver Kube-controller- manager Kube-scheduler Persistent Data Volume KubeletKube-proxy Kubernetes Control Plane KubeletKube-proxy KubeletKube-proxy
  • 13. Copyright © 2017 Hao Zhang All rights reserved. A Simplified Typical Web Service: [diagram] a browser client reaches a load balancer in front of web servers; behind them sit memcache, an object store (BLOB), data volumes, a database (SQL / NoSQL), backend sync-read services, backend sync-write services, backend async-write services, a task queue, and a worker pool.
  • 14. Copyright © 2017 Hao Zhang All rights reserved. Web Service On Kube (AWS): [diagram] the same web service deployed on Kubernetes inside a VPC (with routing table and VPC gateway). One subnet holds an auto-scaling group of master EC2 instances (each with EBS volumes) running etcd, apiserver, scheduler, controller-manager, kubelet, and Docker, fronted by an Elastic Load Balancer. Another subnet holds an auto-scaling group of worker EC2 instances (kubelet, kube-proxy, Docker, EBS) hosting the service components - web servers, sync-read / sync-write / async-write services, task queues, worker pools, memcache, and databases - behind a second Elastic Load Balancer. S3 serves external blobs, Kubernetes binaries, and the Docker registry. Note: this chart is a rough illustration of how different production components might sit in the cloud; it does not reflect strict production configurations such as high availability, fault tolerance, or machine load balancing.
  • 16. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes REST API • Every piece of information in Kubernetes is an API object (a.k.a. cluster metadata) • Node, Pod, Job, Deployment, Endpoint, Service, Event, PersistentVolume, ResourceQuota, Secret, ConfigMap, … • Schema and data model are explicitly defined and enforced • Every API object has a UID and a version number • Micro-services can perform CRUD on API objects through REST • The desired cluster state is achieved by the collaboration of separate autonomous entities reacting to changes of the API object(s) they are interested in
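To make the CRUD-through-REST point concrete, here is a minimal sketch using client-go (assuming a recent client-go with context-aware signatures; the kubeconfig path and the ConfigMap names are placeholders, not anything from the deck):

    package main

    import (
        "context"
        "fmt"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Build a client from a kubeconfig (path is an assumption).
        config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            panic(err)
        }
        ctx := context.Background()
        cms := client.CoreV1().ConfigMaps("default")

        // CREATE -> POST /api/v1/namespaces/default/configmaps
        cm, err := cms.Create(ctx, &v1.ConfigMap{
            ObjectMeta: metav1.ObjectMeta{Name: "demo"},
            Data:       map[string]string{"key": "v1"},
        }, metav1.CreateOptions{})
        if err != nil {
            panic(err)
        }
        fmt.Println("created, resourceVersion:", cm.ResourceVersion)

        // READ -> GET /api/v1/namespaces/default/configmaps/demo
        cm, _ = cms.Get(ctx, "demo", metav1.GetOptions{})

        // UPDATE -> PUT; optimistic concurrency via metadata.resourceVersion
        cm.Data["key"] = "v2"
        cm, _ = cms.Update(ctx, cm, metav1.UpdateOptions{})

        // DELETE -> DELETE /api/v1/namespaces/default/configmaps/demo
        _ = cms.Delete(ctx, "demo", metav1.DeleteOptions{})
    }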
  • 17. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes REST API • The API Space • Different API groups • Usually a group represents a set of features, i.e. batch, apps, rbac, etc. • Each group has its own version control • Example of a namespaced object's path: /apis/batch/v1/namespaces/default/jobs/myjob (Root / Group Name / Version / Namespace Name / API Resource Kind / API Object Instance Name). More Readings: ✤ Extending APIs design spec: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/extending-api.md ✤ Custom Resource documentation: https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/ ✤ Extending the core API with the aggregation layer: https://kubernetes.io/docs/concepts/api-extension/apiserver-aggregation/ ✤ Kubernetes API spec: https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.7/api/openapi-spec/swagger.json ✤ Kubernetes API server deep dives: https://blog.openshift.com/kubernetes-deep-dive-api-server-part-1/ and https://blog.openshift.com/kubernetes-deep-dive-api-server-part-2/
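As a quick illustration of that path layout, a toy helper (not a real client-go function) that assembles the URL:

    package main

    import "fmt"

    // buildPath maps a namespaced API object onto the API space described above.
    // The legacy "core" group (group == "") lives under /api instead of /apis.
    func buildPath(group, version, namespace, resource, name string) string {
        if group == "" {
            return fmt.Sprintf("/api/%s/namespaces/%s/%s/%s", version, namespace, resource, name)
        }
        return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s", group, version, namespace, resource, name)
    }

    func main() {
        // Prints: /apis/batch/v1/namespaces/default/jobs/myjob
        fmt.Println(buildPath("batch", "v1", "default", "jobs", "myjob"))
        // Prints: /api/v1/namespaces/default/pods/mypod
        fmt.Println(buildPath("", "v1", "default", "pods", "mypod"))
    }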
  • 18. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes REST API • The API Object Definition • 5 major fields: apiVersion, kind, metadata, spec and status • A few exceptions such as v1Event, v1Binding, etc. • Metadata includes name, namespace, labels, annotations, version, etc. • Spec describes the desired state, while status describes the current state

    $ kubectl get jobs myjob --namespace default --output yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: myjob
      namespace: default
    spec:
      (Describes the desired "job" object, i.e. Pod template, how many runs, etc.)
    status:
      (The "job" object's current status, i.e. has already run N times, etc.)
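In Go, that definition is a struct embedding TypeMeta and ObjectMeta plus Spec and Status; a trimmed sketch (the real types live in k8s.io/api/batch/v1, with far more fields than shown here):

    package sketch

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // JobSpec captures the desired state; JobStatus the observed state.
    type JobSpec struct {
        Completions *int32 `json:"completions,omitempty"` // trimmed: the real spec has many more fields
    }

    type JobStatus struct {
        Succeeded int32 `json:"succeeded,omitempty"` // how many runs finished successfully
    }

    type Job struct {
        metav1.TypeMeta   `json:",inline"`  // apiVersion, kind
        metav1.ObjectMeta `json:"metadata"` // name, namespace, labels, annotations, uid, resourceVersion, ...
        Spec              JobSpec           `json:"spec,omitempty"`   // what you want
        Status            JobStatus         `json:"status,omitempty"` // what the cluster observed
    }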
  • 19. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes Control Plane: [diagram] a master node runs etcd3 (backed by a persistent data volume, usually SSD), kube-apiserver, kube-controller-manager, and kube-scheduler; minion nodes run kubelet and kube-proxy. Controller-manager, scheduler, kubelet, and kube-proxy all talk to the API server over REST/protobuf; the API server talks to etcd3 over RPC/protobuf.
  • 20. Copyright © 2017 Hao Zhang All rights reserved. Kube-apiserver & Etcd: [the control-plane diagram from slide 19, highlighting kube-apiserver and etcd3]
  • 21. Copyright © 2017 Hao Zhang All rights reserved. Kube-apiserver & Etcd. Etcd as metadata store: • Watchable distributed K-V store • Transaction and locking support • Scalable and low memory consumption • RBAC support • Uses Raft for leader election • R/W on all nodes of an etcd cluster • … General functionalities: • Authentication • Admission control (RBAC) • Rate limiting • Connection pooling • Feature grouping • Pluggable storage backend • Pluggable custom API definitions • Client tracing and API call stats. For API objects: • CRUD operations • Defaulting • Validation • Versioning • Caching. For containers: • Application proxying • Stream container logs • Execute commands in containers • Metrics. WRITE API call processing pipeline: 1. Route through the mux to the handler routine 2. Version conversion 3. Validation 4. Encode 5. Storage backend communication. READ API calls are processed roughly in the reverse order of WRITE calls. The API server talks to etcd3 over gRPC + protobuf and exposes REST APIs over HTTP/HTTPS; the API server and etcd usually co-locate on the same host, with etcd's data on SSD.
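That WRITE pipeline can be sketched as a linear function; every stage below is a simplified stand-in for illustration, not actual apiserver code:

    package sketch

    // create walks a decoded request through the stages listed above.
    func create(reqBody []byte) error {
        obj, err := decodeAndConvert(reqBody) // steps 1-2: the mux routed us here; convert to the internal version
        if err != nil {
            return err
        }
        applyDefaults(obj)                    // defaulting happens before validation
        if err := validate(obj); err != nil { // step 3: reject malformed objects early
            return err
        }
        data, err := encode(obj) // step 4: internal version -> storage-version bytes
        if err != nil {
            return err
        }
        return storagePut(keyFor(obj), data) // step 5: write to etcd via the storage backend
    }

    // Stand-in declarations so the sketch is self-contained.
    type object struct{ name string }

    func decodeAndConvert(b []byte) (*object, error) { return &object{name: "myjob"}, nil }
    func applyDefaults(o *object)                    {}
    func validate(o *object) error                   { return nil }
    func encode(o *object) ([]byte, error)           { return []byte{}, nil }
    func keyFor(o *object) string                    { return "/registry/jobs/default/" + o.name }
    func storagePut(key string, data []byte) error   { return nil }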
  • 22. Copyright © 2017 Hao Zhang All rights reserved. Kube-apiserver & Etcd • API server caching • Decoding objects read from etcd is a huge burden • Without caching, 50% of CPU is spent decoding objects read from etcd, 70% of which goes to converting K/V pairs (as of etcd2) • Decoding also adds pressure on Golang's GC (lots of short-lived, unused K/V garbage) • In-memory cache for READs (Watch, Get and List) • Decode only when etcd has a more up-to-date version of the object
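The gist of that last bullet, as a toy sketch: keep decoded objects in memory tagged with the resourceVersion they reflect, and re-decode only when etcd reports something newer. This is not the real watch cache (which is Reflector-fed and also serves List and Watch), just the caching idea:

    package sketch

    import "sync"

    type cached struct {
        resourceVersion uint64      // the etcd revision this decoded copy reflects
        obj             interface{} // already-decoded API object
    }

    type readCache struct {
        mu    sync.RWMutex
        store map[string]cached
    }

    // Get serves from memory when the cached copy is at least as fresh as what
    // etcd reports; it runs decode (the expensive step) only on a miss or stale hit.
    func (c *readCache) Get(key string, latestRV uint64, decode func() interface{}) interface{} {
        c.mu.RLock()
        e, ok := c.store[key]
        c.mu.RUnlock()
        if ok && e.resourceVersion >= latestRV {
            return e.obj // cache hit: no decoding, no GC garbage
        }
        obj := decode()
        c.mu.Lock()
        c.store[key] = cached{latestRV, obj}
        c.mu.Unlock()
        return obj
    }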
  • 23. Copyright © 2017 Hao Zhang All rights reserved. Kube-apiserver & Etcd. Citation and More Readings: ✤ Etcd! Etcd3! ✤ Discussion about pluggable storage backends (ZK, Consul, Redis, etc.): https://github.com/kubernetes/kubernetes/issues/1957 ✤ More on "Why etcd?" (could be biased): https://github.com/coreos/etcd/blob/master/Documentation/learning/why.md ✤ Upgrade to etcd3: https://github.com/kubernetes/kubernetes/issues/22448 ✤ Etcd3 features: https://coreos.com/blog/etcd3-a-new-etcd.html ✤ Kubernetes & Etcd ✤ Kubernetes' etcd usage: http://cloud-mechanic.blogspot.com/2014/09/kubernetes-under-hood-etcd.html ✤ Raft, the consensus protocol behind etcd: https://speakerdeck.com/benbjohnson/raft-the-understandable-distributed-consensus-protocol/ ✤ Protocol Buffers: https://developers.google.com/protocol-buffers/ ✤ Setting up an HA Kubernetes cluster: https://kubernetes.io/docs/admin/high-availability/#clustering-etcd ✤ Kubernetes API ✤ Kubernetes API conventions: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md ✤ API server optimizations and improvements ✤ Discussion about the cache: https://github.com/kubernetes/kubernetes/issues/6678, Watch Cache Design Proposal ✤ Discussion about optimizing etcd connections: https://github.com/kubernetes/kubernetes/issues/7451 ✤ Optimizing GET, LIST, and WATCH: https://github.com/kubernetes/kubernetes/issues/4817, https://github.com/kubernetes/kubernetes/issues/6564 ✤ Serve LIST from memory: https://github.com/kubernetes/kubernetes/issues/15945 ✤ Optimizing getting many pods: https://github.com/kubernetes/kubernetes/issues/6514
  • 24. Copyright © 2017 Hao Zhang All rights reserved. Kube-apiserver & Etcd • Kubernetes' performance SLOs • API responsiveness: 99% of API calls return in < 1s • Pod startup time: 99% of Pods and their containers start in < 5s (with pre-pulled images) • Scale: 5,000 nodes, 150,000 Pods, ~300,000 containers (1-1.5M API objects) • Resources (single-master setup for 1,000 nodes as of v1.2): 32 cores, 120G memory. Citation and More Readings: ✤ Kubernetes 1.2 scalability: http://blog.kubernetes.io/2016/03/1000-nodes-and-beyond-updates-to-Kubernetes-performance-and-scalability-in-12.html ✤ Kubernetes 1.6 scalability: http://blog.kubernetes.io/2017/03/scalability-updates-in-kubernetes-1.6.html
  • 25. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager: [the control-plane diagram from slide 19, highlighting kube-controller-manager]
  • 26. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager • Framework: Reflector, Store, SharedInformer, and Controller (one of the optimizations in 1.6 that helped scale from 2k to 5k nodes) • Reflector: ListAndWatch on a particular resource, covering all instances of that resource; List upon startup and broken connections (List: get current state; Watch: get updates); pushes ListAndWatch results into the Store • Store: an always up-to-date, in-memory, read-only cache shared among informers; optimizes GET and LIST • SharedInformer: broadcasts cache updates as events (OnAdd, OnUpdate, OnDelete) to registered handlers; the Store is at least as fresh as the notification • Task Queue: a blocking, MT-safe FIFO queue with additional features such as delayed scheduling and failure counts • Controller: registers event handlers on the shared informer of the resource it is interested in (a single SharedInformer holds a pool of event handlers registered by different controllers) and runs a worker pool of reconciliation control loops, where the action is computed from DesiredState and CurrentState; a minimal consumer-side example follows below
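The minimal example promised above, using client-go's SharedInformerFactory (the kubeconfig path is a placeholder and the handler bodies are stubs; this is a sketch of the framework's consumer side, not controller-manager code):

    package main

    import (
        "fmt"
        "time"

        v1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config") // path is an assumption
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            panic(err)
        }

        // One factory -> one shared Reflector/Store per resource type.
        factory := informers.NewSharedInformerFactory(client, 30*time.Second)
        podInformer := factory.Core().V1().Pods().Informer()

        // A controller registers its event handlers on the shared informer.
        podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                pod := obj.(*v1.Pod)
                fmt.Println("OnAdd:", pod.Namespace+"/"+pod.Name)
            },
            UpdateFunc: func(oldObj, newObj interface{}) { /* enqueue key for reconciliation */ },
            DeleteFunc: func(obj interface{}) { /* enqueue key for reconciliation */ },
        })

        stop := make(chan struct{})
        factory.Start(stop)                                 // starts the ListAndWatch loops
        cache.WaitForCacheSync(stop, podInformer.HasSynced) // Store is now up-to-date

        time.Sleep(time.Minute) // keep watching for a while (demo only)
        close(stop)
    }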
  • 27. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager • Reconciliation Controlling Loop: things will finally go right

    func (c *Controller) Run(poolSize int, stopCh chan struct{}) {
        // Start the worker pool. Each worker loops until "something bad" happens;
        // wait.Until then re-kicks the worker after one second.
        for i := 0; i < poolSize; i++ {
            go wait.Until(c.runWorker, 1*time.Second, stopCh)
        }
        // Wait until we're told to stop.
        <-stopCh
    }

    func (c *Controller) runWorker() {
        // Hot loop until the queue shuts down.
        for c.processNextWorkItem() {
        }
    }

    func (c *Controller) processNextWorkItem() bool {
        obj, shutdown := c.queue.Get()
        if shutdown {
            return false
        }
        defer c.queue.Done(obj)
        // Reconciliation between current and desired state.
        if err := c.makeChange(obj.CurrentState, obj.DesiredState); err != nil {
            // Cool down and retry via delayed scheduling upon error.
            c.queue.AddRateLimited(obj)
        }
        return true
    }
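In the real controllers, the task queue in the loop above is client-go's rate-limiting workqueue; a minimal sketch of its lifecycle (reconcile is a placeholder):

    package main

    import "k8s.io/client-go/util/workqueue"

    func main() {
        // Deduplicating FIFO with per-item rate limiting (exponential backoff).
        queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

        // Event handlers enqueue "namespace/name" keys, not whole objects.
        queue.Add("default/myjob")

        key, shutdown := queue.Get() // blocks until an item is available or Shutdown()
        if shutdown {
            return
        }
        defer queue.Done(key) // mark processing of this item finished

        if err := reconcile(key.(string)); err != nil {
            queue.AddRateLimited(key) // retry later, with backoff
            return
        }
        queue.Forget(key) // success: reset the item's failure count
    }

    func reconcile(key string) error { return nil } // placeholder reconciliation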
  • 28. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager: [diagram] a shared resource informer factory holds one reflector and one informer per resource (Pod, Job, Deployment, Service, Node, DaemonSet, ReplicaSet, …) on top of a shared cache and a shared Kubernetes client; a handler event-dispatching mesh routes events to each controller's queue (Deployment, Job, Service, ReplicaSet, DaemonSet, Node, …), each drained by its own worker pool.
  • 29. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager • 28 controllers as of 1.8, usually deployed as a Pod • All autonomous micro-services: easy to turn on/off • HA via leader election • Shared Kubernetes client • Limits QPS; manages sessions/HTTP connections, auth, and retries with jittered backoff • InformerFactory (SharedInformer) • Reduces API calls, ensures cache consistency, lighter GET/LIST • Reflector / Store / Informer / Controller framework • Easy, separated development of different controllers as micro-services • Worker pool • Individually configurable agility and resource consumption
  • 30. Copyright © 2017 Hao Zhang All rights reserved. Kube-controller-manager. More Readings: ✤ Controller dev guide: https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md ✤ Controller manager main: https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/controllermanager.go ✤ Shared informer: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/shared_informer.go ✤ Controller config: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/controller.go ✤ Reflector: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go ✤ List and Watch: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/listwatch.go ✤ All controller logic: https://github.com/kubernetes/kubernetes/tree/master/pkg/controller
  • 31. Copyright © 2017 Hao Zhang All rights reserved. Kube-scheduler: [the control-plane diagram from slide 19, highlighting kube-scheduler]
  • 32. Copyright © 2017 Hao Zhang All rights reserved. Kube-scheduler • Scheduling algorithm (sketched in code below) • Phase 1, filter out impossible nodes: • NodeIsReady: the node should be in Ready state • CheckNodeMemoryPressure: nodes can be over-provisioned, so filter out nodes with memory pressure • CheckNodeDiskPressure: nodes reporting host disk pressure should not be considered • MaxNodeEBSVolumeCount: if a Pod needs a volume, filter out nodes that have reached their max volume count • MatchNodeSelector: if the Pod has a nodeSelector, only consider nodes with that label • PodFitsResources: the node should have resource quota • PodFitsHostPort: the Pod's HostPort should be available • NoDiskConflict: the node should have the volume this Pod wants • NoVolumeZoneConflict: filter out nodes with zone conflicts • … • Phase 2, give every remaining node a score: • LeastRequestedPriority: "more idle" nodes are preferred • BalancedResourceAllocation: balance node utilization • SelectorSpreadPriority: replicas should be spread out • Affinity/AntiAffinityPriority: "I prefer to stay away from / co-locate with some other categories of Pods" • ImageLocalityPriority: a node is preferred if it already has the image • … • The scheduler watches unscheduled Pods, filters the node candidates, ranks them, and POSTs /v1/bindings for the node with the highest ranking
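The two-phase sketch promised above, with toy types and predicates standing in for the real kube-scheduler ones (a filter-then-score illustration, not the actual algorithm provider):

    package main

    import "fmt"

    type Node struct {
        Name         string
        Ready        bool
        FreeMilliCPU int64
    }

    type Pod struct {
        Name            string
        RequestMilliCPU int64
    }

    // feasible is a stand-in for predicates like NodeIsReady + PodFitsResources.
    func feasible(p Pod, n Node) bool {
        return n.Ready && n.FreeMilliCPU >= p.RequestMilliCPU
    }

    // score is a LeastRequestedPriority flavor: the more idle CPU remains, the better.
    func score(p Pod, n Node) int64 {
        return n.FreeMilliCPU - p.RequestMilliCPU
    }

    func schedule(p Pod, nodes []Node) (string, error) {
        best, bestScore := "", int64(-1)
        for _, n := range nodes {
            if !feasible(p, n) {
                continue // phase 1: filtered out
            }
            if s := score(p, n); s > bestScore { // phase 2: keep the top-ranked node
                best, bestScore = n.Name, s
            }
        }
        if best == "" {
            return "", fmt.Errorf("no feasible node for pod %s", p.Name)
        }
        return best, nil // the real scheduler would now POST a v1Binding
    }

    func main() {
        nodes := []Node{{"node1", true, 500}, {"node2", true, 2000}, {"node3", false, 4000}}
        node, _ := schedule(Pod{"myapp", 250}, nodes)
        fmt.Println("scheduled onto:", node) // node2
    }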
  • 33. Copyright © 2017 Hao Zhang All rights reserved. Kube-scheduler • Default scheduler • Sequential scheduler • Some application/topology awareness • Users can define affinity, anti-affinity, node selectors, etc. • Schedules 30,000 Pods onto 1,000 nodes within 587 seconds (in v1.2) • HA via leader election • Supports multiple schedulers • A Pod can choose the scheduler that schedules it • Schedulers share the same pool of nodes, so race conditions are possible • Kubelet can reject Pods in case of conflict, which triggers a reschedule • Easy to write your own scheduler, or just extend the algorithm provider of the default scheduler
  • 34. Copyright © 2017 Hao Zhang All rights reserved. Kube-scheduler. More Readings: ✤ Documentation: ✤ Scheduler overview: https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler.md ✤ Guaranteed scheduling through rescheduling: https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/ ✤ Default scheduling algorithm: https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler_algorithm.md ✤ Blogs: ✤ Scheduler optimization by CoreOS: https://coreos.com/blog/improving-kubernetes-scheduler-performance.html ✤ Advanced scheduling and customized schedulers after 1.6: http://blog.kubernetes.io/2017/03/advanced-scheduling-in-kubernetes.html ✤ Critical code snippets: ✤ Scheduling predicates: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/predicates/predicates.go ✤ Scheduling defaults: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go ✤ Designs, specs, discussions: ✤ Scheduling feature proposals: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/scheduling
  • 35. Copyright © 2017 Hao Zhang All rights reserved. Kubelet: [the control-plane diagram from slide 19, highlighting kubelet]
  • 36. Copyright © 2017 Hao Zhang All rights reserved. Kubelet • Kubernetes' node worker • Ensures Pods run as described by their specs • Reports readiness and liveness • Node-level admission control • Reserves resources, limits # of Pods, etc. • Reports node status to the API server (v1Node) • Machine info, resource usage, health, images • Host stats (Cgroup metrics via cAdvisor) • Communicates with containers through the runtime interface • Stream logs, execute commands, port-forward, etc. • Usually not run as a container, but as a service under systemd
  • 37. Copyright © 2017 Hao Zhang All rights reserved. Kubelet: [diagram] the main sync loop consumes a Pod config channel (Pod specs from watched sources, e.g. the API server or iNotify on file sources), a PLEG channel (the Pod Lifecycle Event Generator, fed by external event sources such as the container runtime event stream and cAdvisor/Cgroup events, tracking Pods as Pending, Running, Succeeded, Failed, or Unknown against a Pod cache and runtime status RPCs), a Pod re-sync channel (clock tick), and a housekeeping channel (clock tick: clean up unwanted Pods and pod workers, release volumes, etc.). A pool of per-Pod workers (keyed by Pod UID) performs runtime operations over RPC against the container runtime (i.e. Docker), alongside volume management, the container prober, and Pod status reporting.
  • 38. Copyright © 2017 Hao Zhang All rights reserved. Kubelet • Main controller loop • Listens for events on channels and reacts (see the sketch below) • PLEG (Pod Lifecycle Event Generator) • Listens and Lists on external sources, i.e. the container runtime and cAdvisor • Translates container status changes into Pod events • Pod worker • Interacts with the container runtime • Moves a Pod from its current state (v1Pod.status) to the desired state (v1Pod.spec)
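The sketch of that channel-driven loop (channel names follow the diagram on slide 37; the types and handlers are stand-ins, not kubelet's actual ones, though the select-based shape mirrors Kubelet.syncLoopIteration):

    package sketch

    import "time"

    type podUpdate struct{}         // stand-in for a Pod spec change from a config source
    type podLifecycleEvent struct{} // stand-in for a PLEG event

    // syncLoopIteration handles whichever channel fires first, then returns so
    // the caller can loop again.
    func syncLoopIteration(
        configCh <-chan podUpdate,
        plegCh <-chan podLifecycleEvent,
        syncCh <-chan time.Time,
        housekeepingCh <-chan time.Time,
    ) bool {
        select {
        case u := <-configCh:
            dispatchToPodWorker(u) // new/updated Pod spec: hand off to the per-Pod worker
        case e := <-plegCh:
            syncPodForEvent(e) // container state changed: re-sync the affected Pod
        case <-syncCh:
            resyncAllPods() // periodic clock tick: re-sync everything
        case <-housekeepingCh:
            cleanup() // clean up unwanted Pods, workers, volumes
        }
        return true
    }

    func dispatchToPodWorker(podUpdate)     {}
    func syncPodForEvent(podLifecycleEvent) {}
    func resyncAllPods()                    {}
    func cleanup()                          {}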
  • 39. Copyright © 2017 Hao Zhang All rights reserved. Kubelet • Container prober • Probes container liveness and healthiness using the methods defined by the spec • Publishes events (v1Event) and puts containers into CrashLoopBackoff upon probing failures • Volume manager • Local directory management, device mount/unmount, device initialization, etc. • cAdvisor • Container metrics collection and reporting, events about kernel Cgroups
  • 40. Copyright © 2017 Hao Zhang All rights reserved. Kubelet • How a Pod is set up • Sandbox: an isolated environment with resource constraints • Implementation depends on the runtime: could be a VM, Linux Cgroups, etc. • For the Docker implementation, kubelet sets up Cgroups, port mappings, the Linux security context, and the resolv.conf shared by all containers in the Pod (dockerService.makeSandboxDockerConfig()) • App containers • Use the sandbox as their Cgroup parent • Share the same "localhost": one container can curl localhost:80 to reach a sibling listening on port 80 • dockerService.CreateContainer(), kubeGenericRuntimeManager.startContainer() • Externally, the Pod is reached through the sandbox (infra container), e.g. PodIP 192.168.12.3 with port mapping 80::33580, so curl 192.168.12.3:33580
  • 41. Copyright © 2017 Hao Zhang All rights reserved. Kubelet. More Readings: ✤ Kubelet main: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go ✤ Kubelet main function: Kubelet.Run() ✤ Kubelet sync loop channel event processing: Kubelet.syncLoopIteration() ✤ Kubelet sync Pod: Kubelet.syncPod() ✤ Pod worker: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_workers.go ✤ Container runtime sync Pod: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go ✤ Sync Pod using CRI: kubeGenericRuntimeManager.SyncPod() ✤ Container runtime start container: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container.go ✤ How kubelet sets up a container: kubeGenericRuntimeManager.startContainer() ✤ Implementation of PodSandbox using Docker: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go ✤ Create Pod sandbox: dockerService.RunPodSandbox() ✤ PLEG design proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-lifecycle-event-generator.md ✤ Pod Cgroup hierarchy: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-resource-management.md#cgroup-hierachy ✤ All kubelet-related design docs: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/node
  • 42. Copyright © 2017 Hao Zhang All rights reserved. Kube-proxy: [the control-plane diagram from slide 19, highlighting kube-proxy]
  • 43. Copyright © 2017 Hao Zhang All rights reserved. Kube-proxy • A daemon on every node • Watches for changes of Service (WATCH /api/v1/services) and Endpoint (WATCH /api/v1/endpoints) objects • Manipulates iptables rules on the host to achieve service load balancing (one of several proxy implementations; sketched below). More Readings: ✤ The v1Service object and the different proxy implementations: https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies ✤ iptables proxy implementation: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go (Proxier.syncProxyRules()) ✤ Other built-in proxy implementations: https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy ✤ Some early discussions about the Service object and implementations: https://github.com/kubernetes/kubernetes/issues/1107
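The sketch promised above: a toy renderer for a service's DNAT rules, using the same statistic/random load-balancing trick the iptables proxier relies on. The real Proxier.syncProxyRules organizes rules into per-service KUBE-SVC and per-endpoint KUBE-SEP chains; this flattened single-chain form is illustrative only:

    package main

    import "fmt"

    type Endpoint struct {
        IP   string
        Port int
    }

    // renderRules emits one DNAT rule per backend. Rule i matches with
    // probability 1/(n-i), so each backend is picked uniformly; the last rule
    // is unconditional and catches whatever falls through.
    func renderRules(clusterIP string, port int, backends []Endpoint) []string {
        n := len(backends)
        rules := make([]string, 0, n)
        for i, ep := range backends {
            match := ""
            if i < n-1 {
                match = fmt.Sprintf(" -m statistic --mode random --probability %.5f", 1.0/float64(n-i))
            }
            rules = append(rules, fmt.Sprintf(
                "-A KUBE-SVC-DEMO -d %s/32 -p tcp --dport %d%s -j DNAT --to-destination %s:%d",
                clusterIP, port, match, ep.IP, ep.Port))
        }
        return rules
    }

    func main() {
        backends := []Endpoint{{"192.168.1.4", 9376}, {"192.168.2.7", 9376}}
        for _, r := range renderRules("10.0.0.10", 80, backends) {
            fmt.Println(r) // in kube-proxy these would be fed to iptables-restore
        }
    }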
  • 44. Put It Together What will happen after `kubectl create`
  • 45. Copyright © 2017 Hao Zhang All rights reserved. Put it together • What will happen when you do the following?

    $ cat myapp.yaml
    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 9376

    $ kubectl create -f myapp.yaml
    Deployment nginx-deployment created
  • 46. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes Reaction Chain: [diagram] the actors are kubectl, the Kubernetes API server & etcd, the Deployment controller (WATCH v1Deployment, v1ReplicaSet), the ReplicaSet controller (WATCH v1ReplicaSet, v1Pod), the scheduler (WATCH v1Pod, v1Node), and kubelet on node1 (WATCH v1Pod on node1), backed by CRI and CNI. Note: this is a rather high-level view of the reaction chain; it hides some details for illustration purposes and differs from the actual behavior.
  • 47-55. Copyright © 2017 Hao Zhang All rights reserved. Kubernetes Reaction Chain: [these slides build up the same diagram one step at a time; the full sequence is:]
    1. kubectl POSTs a v1Deployment.
    2. The API server persists the v1Deployment API object (desiredReplica: 3, availableReplica: 0).
    3. A Reflector pushes the new Deployment to the Deployment controller.
    4. The Deployment reconciliation loop runs; action: create a ReplicaSet.
    5. The Deployment controller POSTs a v1ReplicaSet.
    6. The API server persists the v1ReplicaSet API object (desiredReplica: 3, availableReplica: 0).
    7. A Reflector pushes the new ReplicaSet to the ReplicaSet controller.
    8. The ReplicaSet reconciliation loop runs; action: create Pods.
    9. The ReplicaSet controller POSTs v1Pod x 3.
    10. The API server persists 3 v1Pod API objects (status: Pending, nodeName: none).
    11. A Reflector pushes the Pods to the scheduler.
    12. The scheduler assigns a node to each Pod.
    13. The scheduler POSTs v1Binding x 3.
    14. The API server persists 3 v1Binding API objects (name: PodName, targetNodeName: node1).
    15. The Pod specs are pushed to kubelet on node1.
    16. Kubelet, through CRI (and CNI for networking), pulls images, runs containers, generates PLEG events, etc., creating v1Events and updating Pod status.
    17. Kubelet PATCHes the 3 v1Pod API objects (status: PodInitializing -> Running, nodeName: node1).
    17-1. Several v1Event objects are PUT along the way, e.g. ImagePulled, CreatingContainer, StartedContainer, etc.; controllers and the scheduler can also create events throughout the entire reaction chain.
    18. A Reflector pushes the Pod updates to the ReplicaSet controller.
    19. The ReplicaSet reconciliation loop updates the ReplicaSet whenever a Pod enters the "Running" state.
    20. The ReplicaSet controller PATCHes the v1ReplicaSet.
    21. The API server updates the v1ReplicaSet API object (desiredReplica: 3, availableReplica: 0 -> 3).
    22. A Reflector pushes the ReplicaSet update to the Deployment controller.
    23. The Deployment reconciliation loop updates the Deployment status with the ReplicaSet status changes.
    24. The Deployment controller PATCHes the v1Deployment.
    25. The API server updates the v1Deployment API object (desiredReplica: 3, availableReplica: 0 -> 3).
  • 56. To be continued in Part II