This presentation takes a deep look at containerization: what it is, how it differs from virtual machines (VMs), how it is achieved using Linux Containers (LXC), control groups (cgroups) and copy-on-write filesystems, and current trends in containerization and Docker.
11. Containers vs VMs - Virtualization
● Containers virtualize at the operating
system level.
○ Containers run under the Docker daemon
● Effectively virtualize the operating system.
● Make protected portions of the operating
system available.
○ Two containers running on the same
operating system don't know that they are
sharing resources because each has its own
abstracted networking layer, processes and
so on.
● VMs use a layer on top of the hardware
(the hypervisor) to make pieces of hardware
available to virtual machines, on which a guest
OS is then installed.
● Hypervisor-based solutions virtualize at the
hardware level.
○ “Type 1” (ex: Xen, VMware ESX) runs directly
on bare-metal hardware
○ “Type 2” (ex: VMware Workstation, VirtualBox)
runs on top of a host OS
12. Containers vs VMs - OS’s and Resources
● Containers run on an already running
operating system as the host environment.
○ They execute in spaces that are isolated from
each other and from certain parts of the
host OS.
● Much more efficient resource utilization
○ If a container is not executing anything, it
uses almost no resources.
○ Containers can call upon their host OS to
satisfy some or all of their dependencies.
● Containers are cheap and therefore fast to
create and destroy.
○ Just the cost of creating/stopping processes
that run in the isolated space.
○ Similar to starting/stopping a program on
our computer.
● Hypervisors only provide access to
hardware. We need to install the guest OS
ourselves.
● When one OS per VM is running on the
same server, the guest OSes eat up server
resources (CPU, RAM and bandwidth).
○ Inefficient resource utilization because
multiple guest OSes consume resources
(CPU time, etc.) unnecessarily.
● Creating or destroying a VM means
booting up or shutting down an entire OS.
15. Why Docker?
● Docker tries to solve the
problem of “dependency hell”
● Imagine being able to package
an application along with all of
its dependencies easily and then
run it smoothly in disparate
development, test and
production environments
(Figure: Dependency Hell)
17. Under the hood
● Processes executing in a Docker container are isolated from processes running
on the host OS or in other Docker containers.
○ Nevertheless, all processes are executing in the same kernel
○ Containers sandbox processes from each other
● Docker uses 3 concepts to achieve this OS-level virtualization.
○ LXC (Linux Containers), which builds on two kernel features:
■ Namespaces - To give each container its own isolated view of the system
■ cgroups (Control Groups) - For resource accounting and limiting
○ Copy-on-write filesystem - AuFS (Advanced Multi-Layered Unification Filesystem)
20. LXC Namespaces
● LXC is a user-space control package for Linux Containers.
○ Namespaces limit what a process can see (and therefore use).
● LXC uses namespaces for isolation at different levels.
○ Kernel-level namespaces isolate the container from the host.
○ The user namespace separates the container's and the host's user databases, ensuring that the
container's root user does not have root privileges on the host.
○ The PID (process) namespace ensures that only processes running in the container are displayed
and managed there, not those of the host.
○ The network namespace provides the container with its own network device and virtual IP
address.
21. LXC Namespaces contd ...
● Provide processes with their own view of the system
● Multiple namespaces:
○ pid
○ net
○ mnt
○ uts
○ ipc
○ user
● Each process is in one namespace of each type
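The namespace membership described above can be inspected from user space. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name namespaces_of is made up for this illustration.

```python
import os

def namespaces_of(pid="self"):
    """Return {namespace_type: identifier} for a process, read from /proc.

    Each entry in /proc/<pid>/ns is a symlink such as 'pid:[4026531836]'.
    Two processes share a namespace when these identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):  # not Linux, /proc not mounted, or no such PID
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Reading another process's namespaces may need extra privileges.
            result[name] = None
    return result

if __name__ == "__main__":
    for ns_type, ident in namespaces_of().items():
        print(f"{ns_type:20} {ident}")
```

Running it on the host and again inside a container should show different identifiers for the namespace types the container does not share with the host.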
22. PID Namespaces
● Processes within a PID namespace only see processes in the same PID
namespace.
● Each PID namespace has its own numbering.
○ Starting at 1
○ When PID 1 goes away, the whole namespace is killed.
● Those namespaces can be nested.
● A process ends up having multiple PIDs
○ One per namespace in which it is nested
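On reasonably recent kernels (Linux 4.1 and later) the multiple PIDs of a process are visible in the NSpid field of /proc/<pid>/status. The sketch below parses that field; the function name and the sample text are hypothetical and only illustrate the format.

```python
def parse_nspid(status_text):
    """Extract the NSpid field from the text of /proc/<pid>/status.

    The field lists the process's PID in each nested PID namespace,
    outermost first. Returns [] if the field is absent (older kernels).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(tok) for tok in line.split()[1:]]
    return []

# Hypothetical excerpt: a process seen as PID 4512 on the host
# and as PID 1 inside its container.
sample = "Name:\tnginx\nPid:\t4512\nNSpid:\t4512\t1\n"
print(parse_nspid(sample))  # [4512, 1]
```

To try it on a live system, pass in the contents of /proc/self/status instead of the sample string.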
23. Net Namespaces
● Processes within a given network namespace get their own private network
stack, including:
○ network interfaces (including lo)
○ routing tables
○ iptables rules
○ sockets (ss, netstat)
● You can move a network interface from one netns to another
○ ip link set dev eth0 netns PID (PID: a process in the target namespace)
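One quick way to observe the private network stack is to list the interfaces a process can see. This is a small sketch using the standard library's socket.if_nameindex(); the helper name visible_interfaces is an assumption for this example.

```python
import socket

def visible_interfaces():
    """Return the names of the network interfaces visible to this process.

    The kernel only reports interfaces that belong to the caller's
    network namespace, so the result differs between host and container.
    """
    try:
        return [name for _index, name in socket.if_nameindex()]
    except (OSError, AttributeError):
        return []  # platform without if_nameindex support

if __name__ == "__main__":
    print(visible_interfaces())
```

Inside a fresh network namespace the list typically contains little more than lo until an interface is moved or created there.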
24. Mnt Namespaces
● Processes can have their own root fs (conceptually similar to chroot)
● Processes can also have "private" mounts
○ /tmp (scoped per user, per service...)
○ Masking of /proc, /sys
○ NFS automounts
● Mounts can be totally private, or shared
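Whether a mount is private or shared is recorded in /proc/<pid>/mountinfo. The sketch below classifies a single line of that file, assuming the documented field layout; the function name and the sample lines are illustrative only.

```python
def mount_propagation(mountinfo_line):
    """Return (mount_point, propagation) for one /proc/<pid>/mountinfo line.

    Optional fields sit between the mount options (6th field) and the
    '-' separator: 'shared:N' marks a shared mount, 'master:N' a slave
    mount, 'unbindable' an unbindable one. No tag means private.
    A mount tagged both shared and master is reported as 'shared' here.
    """
    fields = mountinfo_line.split()
    sep = fields.index("-")
    mount_point = fields[4]
    tags = {tag.split(":")[0] for tag in fields[6:sep]}
    if "shared" in tags:
        return mount_point, "shared"
    if "master" in tags:
        return mount_point, "slave"
    if "unbindable" in tags:
        return mount_point, "unbindable"
    return mount_point, "private"

# Hypothetical mountinfo lines
print(mount_propagation("36 35 98:0 / /mnt rw,noatime shared:1 - ext3 /dev/root rw"))
print(mount_propagation("40 36 0:30 / /tmp rw - tmpfs tmpfs rw"))
```

Comparing /proc/self/mountinfo on the host and inside a container shows both the different mount tables and the propagation type of each mount.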
25. IPC Namespaces
● Allow a process (or group of processes) to have its own:
○ IPC semaphores
○ IPC message queues
○ IPC shared memory
● without risk of conflict with other instances
26. User Namespaces
● Allow UIDs/GIDs to be mapped; e.g.:
○ UID 0→1999 in container C1 is mapped to UID 10000→11999 on host
○ UID 0→1999 in container C2 is mapped to UID 12000→13999 on host
○ etc.
● Avoids extra configuration in containers
● UID 0 (root) can be squashed to a non-privileged user
● Security improvement
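The mapping in the example above is plain range arithmetic. The sketch below reproduces it with the three-column layout used by /proc/<pid>/uid_map (start inside, start outside, length); to_host_uid and the variable names are invented for illustration.

```python
def to_host_uid(container_uid, uid_map):
    """Translate a container UID to a host UID.

    uid_map is a list of (inside_start, outside_start, length) tuples,
    the same three columns found in /proc/<pid>/uid_map.
    Returns None when the UID is not mapped.
    """
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None

# The two containers from the slide: 2000 UIDs each.
c1_map = [(0, 10000, 2000)]
c2_map = [(0, 12000, 2000)]

print(to_host_uid(0, c1_map))     # 10000: root in C1 is an unprivileged host user
print(to_host_uid(1999, c2_map))  # 13999
print(to_host_uid(2000, c1_map))  # None: outside the mapped range
```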
28. LXC cgroups
● In the mainline kernel since Linux 2.6.24 (2008), the same release that introduced PID namespaces.
● Resource metering and limiting
○ Memory
○ CPU
○ block I/O
○ network
● Device node (/dev/*) access control
● Besides letting Docker limit the resources consumed by a container, cgroups
also output lots of metrics about these resources.
○ These metrics allow Docker to monitor the resource consumption of the various processes within the
containers and make sure that each gets only its fair share of the available resources.
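A process's cgroup membership is listed in /proc/<pid>/cgroup, one line per hierarchy. The following sketch parses that format; the function name and the sample paths (including the container ID abc123) are hypothetical.

```python
def parse_cgroup_file(text):
    """Parse the text of /proc/<pid>/cgroup into {controllers: path}.

    Each line is 'hierarchy-ID:controller-list:cgroup-path'. On cgroup v2
    the controller list is empty, so the key is the empty string.
    """
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hier_id, controllers, path = line.split(":", 2)
        membership[controllers] = path
    return membership

# Hypothetical contents for a process inside a container
sample = "4:memory:/docker/abc123\n3:cpu,cpuacct:/docker/abc123\n0::/\n"
for controllers, path in parse_cgroup_file(sample).items():
    print(repr(controllers), path)
```

The limits and metrics themselves live in files under the cgroup filesystem at the reported paths; their exact names differ between cgroup v1 and v2, so they are left out of this sketch.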
30. Copy-on-write filesystem
● Create a new container instantly
○ Instead of copying its whole filesystem
○ Allows Docker to use certain images as the basis for containers
● Storage keeps track of what has changed
● Many options available
○ AuFS (Advanced Multi-Layered Unification Filesystem), overlay (file level)
○ BTRFS, VFS
○ Device-Mapper
● Considerably reduces footprint and "boot" times
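The idea behind layered, copy-on-write storage can be modelled in a few lines. This is a toy sketch, not how AuFS or any Docker storage driver is implemented; the class LayeredFS and the file names are made up to illustrate shared read-only layers plus one writable layer per container.

```python
class LayeredFS:
    """Toy model of a union filesystem: read-only image layers plus one
    writable container layer."""

    def __init__(self, image_layers):
        self.image_layers = image_layers  # list of dicts, lowest layer first
        self.upper = {}                   # per-container writable layer

    def read(self, path):
        if path in self.upper:            # a changed file shadows the image
            return self.upper[path]
        for layer in reversed(self.image_layers):  # topmost image layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # The change lands only in the writable layer; image layers
        # stay untouched and can be shared by other containers.
        self.upper[path] = data

base = {"/etc/hostname": "image", "/bin/app": "v1"}
c1 = LayeredFS([base])   # "creating a container" copies nothing
c2 = LayeredFS([base])
c1.write("/etc/hostname", "c1")

print(c1.read("/etc/hostname"))  # c1
print(c2.read("/etc/hostname"))  # image
print(base["/etc/hostname"])     # image
```

Because a new container only needs an empty writable layer, creation is nearly instant and the storage cost is proportional to what the container changes.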
Editor's Notes
Full/Native - The virtual machine simulates enough hardware to allow an unmodified "guest" OS (one designed for the same CPU) to be run in isolation.
Hardware Assisted - The virtual machine has its own hardware and allows a guest OS to be run in isolation.
Paravirtualization - The virtual machine does not necessarily simulate hardware, but instead (or in addition) offers a special API that can only be used by modifying the "guest" OS.
A technology that has been present in Linux kernels for 5+ years and is considered fairly mature.
AuFS is a layered file system that can transparently overlay one or more existing filesystems, merging multiple layers into a single representation of a filesystem. When a process needs to modify a file, AuFS creates a copy of that file in the writable layer and applies the change there. This process is called copy-on-write.