93. The Challenge
[Diagram: a matrix of application stacks against deployment environments]
Multiplicity of stacks:
• Static website: nginx 1.5 + modsecurity + openssl + bootstrap 2
• Web frontend: Ruby + Rails + sass + Unicorn
• User DB: postgresql + pgv8 + v8
• Queue: Redis + redis-sentinel
• Analytics DB: hadoop + hive + thrift + OpenJDK
• Background workers: Python 3.0 + celery + pyredis + libcurl + ffmpeg + libopencv + nodejs + phantomjs
• API endpoint: Python 2.7 + Flask + pyredis + celery + psycopg + postgresql-client
Multiplicity of hardware environments:
• Development VM, QA server, Public Cloud, Disaster recovery, Contributor’s laptop, Production Servers, Production Cluster, Customer Data Center
Do services and apps interact appropriately? Can I migrate smoothly and quickly?
94. Docker is a shipping container system for code
[Diagram: the same stacks-by-environments matrix as the previous slide (static website, web frontend, user DB, queue, analytics DB across the development VM, QA server, public cloud, contributor’s laptop, production cluster, and customer data center), now with a container layer in between]
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container… that can be manipulated using standard operations and run consistently on virtually any hardware platform.
Multiplicity of stacks, multiplicity of hardware environments: do services and apps interact appropriately? Can I migrate smoothly and quickly?
98. Why Developers Care
• Build once…(finally) run anywhere*
• A clean, safe, hygienic and portable runtime environment for your app.
• No worries about missing dependencies, packages and other pain points during
subsequent deployments.
• Run each app in its own isolated container, so you can run various versions of
libraries and other dependencies for each app without worrying about conflicts
• Automate testing, integration, packaging…anything you can script
• Reduce/eliminate concerns about compatibility on different platforms, either
your own or your customers’.
• Cheap, zero-penalty containers to deploy services? A VM without the overhead
of a VM? Instant replay and reset of image snapshots? That’s the power of
Docker
* Almost ;-)
99. Why DevOps Cares
• Configure once…run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Increase the quality of code produced by developers.
• Eliminate inconsistencies between development, test, production, and
customer environments
• Support segregation of duties
• Significantly improve the speed and reliability of continuous
deployment and continuous integration systems
• Because containers are so lightweight, they address significant
performance, cost, deployment, and portability issues normally
associated with VMs
100. Why it works—separation of concerns
• Dan the Developer
  • Worries about what’s “inside” the container
    • His code
    • His libraries
    • His package manager
    • His apps
    • His data
  • All Linux servers look the same
• Oren the Ops Guy
  • Worries about what’s “outside” the container
    • Logging
    • Remote access
    • Monitoring
    • Network configuration
  • All containers start, stop, copy, attach, migrate, etc. the same way
104. Docker Containers as a Service Platform
Build: Docker Toolbox
Ship: Docker Trusted Registry
Run: Docker Universal Control Plane
• “docker push” with image signing
• Search/browse repos
• Team-based RBAC
• View signed images
• Deleting tags
• Authentication
• Deploy and scale out apps
• Monitor stats
• Secrets management
105. Containers vs. VMs
[Diagram, VM side: Server → Host OS → Hypervisor (Type 2) → one Guest OS plus Bins/Libs per VM, running App A, App A’, and App B]
[Diagram, container side: Server → Host OS → Docker → shared Bins/Libs, running App A, App A’, App B, and multiple copies of App B’]
Containers are isolated, but share the OS and, where appropriate, bins/libraries.
…the result is significantly faster deployment, much less overhead, easier migration, faster restart.
106. Why are Docker containers lightweight?
[Diagram comparing VMs and containers:]
• VMs: every app, every copy of an app, and every slight modification of the app requires a new virtual server (App + Bins/Libs + Guest OS each time).
• Containers, original app: no OS to take up space or resources, or to require restart.
• Containers, copy of app: no OS; can share bins/libs.
• Containers, modified app: copy-on-write capabilities allow us to save only the diffs between container A and container A’.
109. Isolation using Linux kernel features
• namespaces: pid, mnt, net, uts, ipc, user
• cgroups: memory, cpu, blkio, devices
110. Linux cgroups
• Kernel Feature
• Groups of processes
• Control resource allocations
– CPU
– Memory
– Disk
– I/O
• May be nested
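Docker surfaces these cgroup controls as flags on docker run. A sketch (the image and container name are illustrative, and this assumes a running Docker daemon):

```shell
# memory cgroup: cap the container at 512 MB of RAM
# cpu cgroup: give it half the default CPU share weight (1024)
docker run -d --name capped -m 512m --cpu-shares 512 nginx
```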
111. Linux Kernel Namespaces
• Kernel feature
• Restrict your view of the system
  – Mounts (CLONE_NEWNS)
  – UTS (CLONE_NEWUTS)
    • uname() output
  – IPC (CLONE_NEWIPC)
  – PID (CLONE_NEWPID)
  – Networks (CLONE_NEWNET)
  – User (CLONE_NEWUSER)
    • Not supported in Docker yet
    • Has privileged/unprivileged modes today
• May be nested
113. What are the basics of the Docker system?
[Diagram: a source code repository plus a Dockerfile for A feed a Build on the Docker Engine running on Host 1 OS (Linux); the resulting image is Pushed to a Docker container image registry; another Docker Engine on Host 2 OS (Linux) can Search and Pull the image, then Run it as Container A alongside Containers B and C.]
114. Changes and Updates
[Diagram: a Docker Engine pushes an Update to the registry as a base container image plus an App Δ layer (the bins/libs diffs). A host running App A wants to upgrade to A’’: it requests the update and gets only the diffs, then runs container Mod A’’. The host is now running A’’.]
116. Dockerfile
• Like a Makefile (shell script with keywords)
• Extends from a base image
• Results in a new Docker image
• Imperative, not declarative
• A Dockerfile lists the steps needed to build an image
• docker build is used to run a Dockerfile
• Can define the default command for docker run, ports to expose, etc.
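As a sketch of the bullets above, a minimal Dockerfile for the deck’s Python 2.7 + Flask API-endpoint stack might look like this (the base image tag, file names, and port are illustrative assumptions, not taken from the deck):

```dockerfile
# Extends from a base image
FROM python:2.7
# Steps needed to build the image (imperative, like a Makefile)
COPY requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt
COPY app.py /code/app.py
# Port to expose
EXPOSE 5000
# Default command for docker run
CMD ["python", "/code/app.py"]
```

Running docker build -t api . in the directory containing this file produces a new Docker image.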
117. docker-compose: running multiple containers
Describe your stack with one file: docker-compose.yml
Run your stack with one command: docker-compose up

web:
  build: .
  command: python app.py
  ports:
    - "5000:5000"
  volumes:
    - .:/code
  links:
    - redis:redis
redis:
  image: redis
131. Docker Container Lifecycle
• The life of a container
  – Conception
    • BUILD an image from a Dockerfile
  – Birth
    • RUN (create + start) a container
  – Reproduction
    • COMMIT (persist) a container to a new image
    • RUN a new container from an image
  – Sleep
    • KILL/stop a running container
  – Wake
    • START a stopped container
  – Death
    • RM (delete) a stopped container
  – Extinction
    • RMI a container image (delete the image)
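The lifecycle above maps onto the CLI roughly as follows (the image and container names are made up for illustration, and this assumes a running Docker daemon):

```shell
docker build -t myapp:1.0 .          # Conception: build an image from a Dockerfile
docker run -d --name web myapp:1.0   # Birth: create + start a container
docker commit web myapp:1.1          # Reproduction: persist the container to a new image
docker stop web                      # Sleep: stop the running container
docker start web                     # Wake: restart the stopped container
docker stop web && docker rm web     # Death: delete a stopped container
docker rmi myapp:1.1                 # Extinction: delete the image
```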
138. Concept -> Service Dependency Graph
Your App/Service
Service X
Service Y
Service Z
Service L
Service M
139. Why?
• Faster and simpler deployments and rollbacks
• Independent speed of delivery (by different teams)
• Right framework/tool/language for each domain
  • e.g., a recommendation component in Python, a catalog service in Java
• Greater resiliency
• Fault isolation
• Better availability
  • If architected right
152. Ecosystem Support
• Operating systems
• Virtually any distribution with a 2.6.32+ kernel
• Red Hat/Docker collaboration to make Docker work across RHEL 6.4+, Fedora, and other members of
the family (2.6.32+)
• CoreOS—Small core OS purpose built with Docker
• OpenStack
• Docker integration into NOVA (& compatibility with Glance, Horizon, etc.) accepted for Havana
release
• Private PaaS
• OpenShift
• Solum (Rackspace, OpenStack)
• Other TBA
• Public PaaS
• Deis, Voxoz, Cocaine (Yandex), Baidu PaaS
• Public IaaS
• Native support in Rackspace, Digital Ocean,+++
• AMI (or equivalent) available for AWS & other
• DevOps Tools
• Integrations with Chef, Puppet, Jenkins, Travis, Salt, Ansible +++
• Orchestration tools
• Mesos, Heat, ++
• Shipyard & others purpose built for Docker
• Applications
• 1000’s of Dockerized applications available at index.docker.io
153. Advanced topics
• Data
• Today: externally mounted volumes
• Share volumes between containers
• Share a volume between a container and the underlying host
  • e.g., a high-performance storage backend for your production database
  • e.g., making live development changes available to a container
• Optional: specify memory limit for containers, CPU priority
• Device mapper/ LVM snapshots in 0.7
• Futures:
• I/O limits
• Container resource monitoring (CPU & memory usage)
• Orchestration (linking & synchronization between containers)
• Cluster orchestration (multi-host environment)
• Networking
• Supported today:
• UDP/TCP port allocation to containers
• specify which public port to redirect to; if you don’t specify a public port, Docker allocates a random
public port
• Docker uses IPtables/netfilter
• IP allocation to containers
• Docker uses virtual interfaces and a network bridge
• Futures:
• See Pipework (Upstream) : Software-Defined Networking for Linux Containers
(https://github.com/jpetazzo/pipework)
• Certain pipework concepts will move from upstream to part of core Docker
• Additional capabilities come with libvirt support in 0.8-0.9 timeframe
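The port-allocation bullets above correspond to docker run’s -p/-P flags. A sketch (the image is illustrative, and this assumes a running Docker daemon):

```shell
docker run -d -p 8080:80 nginx   # redirect host (public) port 8080 to container port 80
docker run -d -P nginx           # no public port specified: Docker picks random public ports
```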
196. Windows Server Node in Kubernetes
[Diagram: the Kubernetes master components (unchanged) and kubectl (unchanged) manage a Windows Server 2016 node running kubelet, kube-proxy, and docker; each pod consists of an infra container plus pod containers.]
Without VMs, a single OS owns all hardware resources; with VMs, multiple OSes share the hardware resources through a VMM.
A virtual machine is implemented by adding software to an execution platform to give it the appearance of a different platform, or for that matter, to give the appearance of multiple platforms.
Wikipedia’s definition: virtualization refers to the abstraction of computer resources, or the act of creating a virtual (rather than actual) version of something in computing, including virtual computer hardware platforms, operating systems, storage devices, and computer network resources.
Virtualization has a history of 50 years. It first appeared in IBM mainframe operating systems in the 1960s, used as a method of logically dividing the system resources of mainframe computers between different applications. Since then, the meaning of the term has broadened.
For about 30 years before 1999, interest in virtualization declined and its development was almost at a standstill, until in 1999 VMware Inc. presented the VMware Virtual Platform for the x86-32 architecture, the common architecture used in most computers.
In 2003, the University of Cambridge Computer Laboratory designed Xen, an open-source virtualization system, triggering a peak of academic research on virtualization.
Afterwards, more virtualization products were developed; for example, Linux gained built-in virtualization support with the Kernel-based Virtual Machine (KVM) in 2007. At the same time, Intel and AMD provided hardware support for virtualization: Intel VT (VT-x, VT-d, VT-i) and AMD-V (codename Pacifica). Nowadays there are roughly 50 commercial virtualization tools, and virtualization has become a crucial technology in cloud computing.
We introduce four types of virtualization according to virtualization technology types.
Hardware virtualization or platform virtualization is the most fundamental virtualization. It refers to the creation of a virtual machine (VM) that acts like a real computer with an operating system (OS). Software executed on these VMs is separated from the underlying hardware resources.
For example, a computer that is running Microsoft Windows may host a virtual machine that looks like a computer with the Ubuntu Linux operating system; Ubuntu-based software can be run on the virtual machine.
In hardware virtualization, the host machine is the actual machine on which the virtualization takes place, and the guest machine is the virtual machine.
The words host and guest are used to distinguish the software that runs on the physical machine from the software that runs on the virtual machine.
The software that creates or manages a virtual machine on the host hardware is called a hypervisor or Virtual Machine Manager or Monitor.
In virtualization, a virtual machine monitor (VMM), also called a hypervisor, is added to the traditional computer architecture. The VMM transforms the single machine interface into the illusion of many. Each of these interfaces (virtual machines) is an efficient replica of the original computer system, complete with all of the processor instructions.
With a Type I hypervisor, the VMM is installed directly on the hardware, without a host OS (e.g., VMware ESX Server).
With a Type II hypervisor, the VMM is installed on top of a host OS (e.g., VMware Workstation).
From a design perspective, three essential characteristics of a VMM must be taken into consideration: equivalence, isolation, and efficiency.
1) Equivalence: the virtual platform must be equivalent to the physical platform except …
2) Isolation (resource control): the VMM must be in complete control of system resources; a VM cannot access any resource not explicitly allocated to it, and the VMM may regain control of resources already allocated.
3) Efficiency: at worst only minor decreases in speed, which rules out traditional emulators and complete software interpreters (simulators).
So the questions are: how do we simulate CPU instructions, and how do we schedule the physical CPU (pCPU) to execute the vCPU’s instructions?
Let’s start with the CPU instruction set, because all hardware resource management is implemented through instructions.
The operating system controls and manages resources by executing instructions to complete the application program’s tasks. An application program is a set of instructions, and the guest OS runs an application by having the CPU execute those instructions. Some instructions can only be executed by a privileged user; these are privileged instructions. Some privileged instructions are further classified as sensitive instructions, and ISA virtualization must control the sensitive instructions.
Non-privileged instructions can be executed directly in the guest OS, while sensitive instructions cannot: the guest OS hands the processing of sensitive instructions to the VMM, and the VMM sends them to the pCPU to run.
The key to implementing CPU virtualization is instruction virtualization.
There are three methods:
1) Binary translation.
2) Paravirtualization: modifying the guest OS source to work cooperatively with the hypervisor, for performance, simplicity, etc. The implementation approaches include hypercalls, event channels, shared memory, and virtio.
3) Hardware-assisted virtualization: Intel VT for Intel 64 (VT-x) and AMD-V technology, using VMX transitions. Some sensitive instructions can be captured and executed directly by the hardware extensions. This requires improved hardware support, so that the hardware can directly handle certain guest OS instructions.
Now, let’s see how to realize memory virtualization.
A guest OS makes certain assumptions: in general, Linux OSes expect to see physical memory starting from address 0, and the BIOS is designed to boot from the low 1 MB of addresses; the OS also expects to see contiguous memory in its address space.
So the VMM has to remap guest physical addresses to host physical addresses.
A translation lookaside buffer (TLB) is a cache that memory-management hardware uses to improve virtual-address translation speed. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and a TLB is nearly always present in any hardware that utilizes paged or segmented virtual memory.
For example, consider four VMs running on one PM. Each VM’s guest OS sees its own contiguous range of memory page addresses.
Each page with a logical address in a VM corresponds to a PM page address.
After the redirection, the VM guest memory pages are remapped to real PM memory with noncontiguous addresses.
The VMM implements page frame allocation. The primary solutions are as follows.
First, we can partition a specified amount of memory pages for each VM; this requires spending more money on memory.
Second, content-based sharing: pages with the same contents, such as the zero page or guest code pages, can share one physical frame.
Third, ballooning, which is used in many VMMs: a VMM-aware balloon driver running in the guest OS dynamically allocates memory from the OS and releases it to the VMM, and vice versa.
The last solution is host swapping: the physical frames backing guest pages may be swapped out.
Another concept is the shadow page table: a memory-address mapping table from guest page addresses to physical page addresses.
When a guest OS needs to access a particular page, it starts from a page address as seen by the guest OS. That is a virtual address, not a physical memory address. The access first consults the shadow page table to determine the real physical address; if the mapping is not in the shadow page table, a page-miss interrupt is triggered to look up the real address in physical memory.
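As a toy sketch (not any real VMM’s data structure), the lookup-then-fill behavior described above can be modeled like this; the frame allocator here is a made-up stand-in for the VMM’s real page-frame allocation:

```python
class ShadowPageTable:
    """Toy model of a shadow page table: guest page number -> host physical frame.

    translate() is the fast path; a miss models the page-miss interrupt,
    where the VMM finds the real mapping and caches it in the shadow table.
    """
    def __init__(self):
        self.shadow = {}           # guest page -> host frame (the shadow table)
        self.next_free_frame = 0   # stand-in for the VMM's frame allocator
        self.faults = 0            # page-miss interrupts taken

    def translate(self, guest_page):
        if guest_page in self.shadow:      # hit: mapping already cached
            return self.shadow[guest_page]
        self.faults += 1                   # miss: "page-miss interrupt"
        frame = self.next_free_frame       # VMM finds/allocates the real frame
        self.next_free_frame += 1
        self.shadow[guest_page] = frame    # cache the mapping for next time
        return frame

spt = ShadowPageTable()
a = spt.translate(7)   # first access: miss, mapping installed
b = spt.translate(7)   # second access: hit, same frame, no new fault
```

The second access to the same guest page returns the same frame without another fault, which is exactly the caching benefit the shadow page table provides.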
The ballooning method gives the VMM a dynamic memory-allocation mechanism through a balloon driver program.
The allocation operation works like a balloon that can inflate or deflate.
All VMs are initially allocated a specified amount of memory. When a VM’s running applications need more memory, the VM balloons in to obtain idle memory pages from adjacent VMs, and its memory grows. Conversely, when the other VMs need more memory pages, the VM can balloon out to release some memory pages for them. All memory allocation operations are completed dynamically.
Each VM is first allocated memory of a specified size. As applications run, memory demand grows; once the locally allocated memory is used up, the balloon inflates, obtaining usable memory from other VMs.
When that memory is no longer needed, ballooning out releases it and the balloon shrinks again.
We focus on the following five virtualization tools.
raw format (KVM/Xen):
(linux default) the raw format is a plain binary image of the disc image, and is very portable.
On file systems that support sparse files, images in this format only use the space actually used by the data recorded in them.
Other formats:
• vmdk: VMware file format
• vdi: VirtualBox file format
Image file format conversion: a commercial VMM can convert its own format to the standard raw format.
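One widely used tool for such conversions is qemu-img (my example; the tool is not named in the deck, and the file names are illustrative):

```shell
qemu-img convert -f vmdk -O raw disk.vmdk disk.img      # VMware vmdk -> raw
qemu-img convert -f vdi -O qcow2 disk.vdi disk.qcow2    # VirtualBox vdi -> qcow2
```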
The network communication between VM1 and VM2 placed on distinct PMs passes through the PMs’ network interface cards.
The network communication between VM1 and VM2 co-located on the same PM does not pass through the PMs’ network interface cards, but through vNICs.
To connect to a network, a computer must be network-capable: it must have a working network interface controller (NIC), also known as a network card or network adapter. The NIC enables the computer to interact with a network.
Interconnecting multiple computers requires a network device called a switch. In most business environments, computers are connected to a switch, which creates a local area network (LAN). Switches are responsible for intelligently routing network traffic to the appropriate destination.
Network communication also requires protocols such as TCP/IP.
1) As in a physical network, many VMs can be connected together by virtual network devices, creating a virtual network.
2) Each VM connects to the network through a virtual network card: the hypervisor can create one or more vNICs for each VM, and each vNIC is functionally identical to a physical NIC.
3) The switch can also be virtualized as a virtual switch (vSwitch). Each vNIC connects to a vSwitch port, and the vSwitch reaches the external physical network through the physical NIC of the physical server.
To connect VMs and PMs, virtual network devices are required. These devices are implemented entirely in software and provide the operating system with the same functions as network hardware.
To create and deploy VMs on physical servers (which have fixed capacity), there are two steps:
1) Estimating VM size
  • Forecast future use
  • Find a capacity size that can cover the workload
2) VM placement
  • A bin-packing problem
Traditionally, VM sizing is done on a VM-by-VM basis: each VM gets an estimated size based on its workload pattern. This leads to a low consolidation ratio and low resource utilization.
Joint-VM provisioning: statistically, the peaks and valleys of different VMs do not necessarily coincide with each other, so VM multiplexing can save capacity.
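Since the slide frames placement as a bin-packing problem, here is a small sketch of the classic first-fit-decreasing heuristic for it (the VM sizes and PM capacity are made-up numbers, e.g. GB of RAM; real placement would also consider CPU, network, and multiplexing):

```python
def first_fit_decreasing(vm_sizes, pm_capacity):
    """Place VMs (by estimated size) onto as few PMs as possible.

    First-fit decreasing: sort VMs largest-first, put each into the
    first PM with enough remaining capacity, opening a new PM if none fits.
    Returns (number of PMs used, list of (vm_size, pm_index) placements).
    """
    free = []        # remaining capacity of each opened PM
    placement = []
    for size in sorted(vm_sizes, reverse=True):
        for i, cap in enumerate(free):
            if size <= cap:          # fits in an existing PM
                free[i] -= size
                placement.append((size, i))
                break
        else:                        # no PM fits: open a new one
            free.append(pm_capacity - size)
            placement.append((size, len(free) - 1))
    return len(free), placement

# Seven VMs (sizes in GB) packed onto 10 GB PMs
n_pms, placement = first_fit_decreasing([5, 7, 5, 2, 4, 2, 5], 10)
```

First-fit decreasing is a simple, fast heuristic; exact bin packing is NP-hard, which is why consolidation systems rely on heuristics like this.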
Role-based access control (RBAC)
To recap, you saw an end-to-end demo of the Docker CaaS platform:
Devs using Docker Toolbox and Docker Trusted Registry
Ops using DTR and DUCP
Working together to move quickly, yet with control
Who wants to try these out?
Docker Hub is Docker’s cloud service for …
Publishing and discovering container images through the public registry
Team collaboration and automation of application workflows
DAY 1 REFERENCE: “…as we saw yesterday, Docker Content Trust…
…uses the trust service on Docker Hub (built on Notary)
…but now you can stand up a trust service on-premises alongside DTR
…and use DTR to store, view, and distribute signed images.”
DTR is the only registry on the market with this capability.
It’s important to note that the containers run on VMs in (most) public clouds, which is why node auto-scaling is important.
I would expect Azure to eventually provide node auto-scaling.
Examples of Kubernetes as a service: StackPointCloud (which is what I tried) and the new KUBE2GO (“Run Kubernetes Anywhere. Instantly. Free.”).