5. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
6. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
15. Many targets
● your local development environment
● your coworkers' developement environment
● your Q&A team's test environment
● some random demo/test server
● the staging server(s)
● the production server(s)
● bare metal
● virtual machines
● shared hosting
+ your dog's Raspberry Pi
23. Solution to the transport problem:
the intermodal shipping container
24. Solution to the transport problem:
the intermodal shipping container
● 90% of all cargo now shipped in a standard container
● faster and cheaper to load and unload on ships
(by an order of magnitude)
● less theft, less damage
● freight cost used to be >25% of final goods cost, now <3%
● 5000 ships deliver 200M containers per year
26. Linux containers...
● run everywhere
– regardless of kernel version
– regardless of host distro
– (but container and host architecture must match*)
● run anything
– if it can run on the host, it can run in the container
– i.e., if it can run on a Linux kernel, it can run
*Unless you emulate CPU with qemu and binfmt
27. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
29. High level approach:
it's a lightweight VM
● own process space
● own network interface
● can run stuff as root
● can have its own /sbin/init
(different from the host)
« Machine Container »
30. Low level approach:
it's chroot on steroids
● can also not have its own /sbin/init
● container = isolated process(es)
● share kernel with host
● no device emulation (neither HVM nor PV)
« Application Container »
31. Separation of concerns:
Dmitry the Developer
● inside my container:
– my code
– my libraries
– my package manager
– my app
– my data
32. Separation of concerns:
Oleg the Ops guy
● outside the container:
– logging
– remote access
– network configuration
– monitoring
33. How does it work?
Isolation with namespaces
● pid
● mnt
● net
● uts
● ipc
● user
34. pid namespace
jpetazzo@tarrasque:~$ ps aux | wc -l
212
jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash
root@ea319b8ac416:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18044 1956 ? S 02:54 0:00 bash
root 16 0.0 0.0 15276 1136 ? R+ 02:55 0:00 ps aux
(That's 2 processes)
39. user namespace
● no « demo » for this one... Yet!
● UID 0→1999 in container C1 is mapped to
UID 10000→11999 in host;
UID 0→1999 in container C2 is mapped to
UID 12000→13999 in host; etc.
● required lots of VFS and FS patches (esp. XFS)
● what will happen with copy-on-write?
– double translation at VFS?
– single root UID on read-only FS?
40. How does it work?
Isolation with cgroups
● memory
● cpu
● blkio
● devices
41. memory cgroup
● keeps track pages used by each group:
– file (read/write/mmap from block devices; swap)
– anonymous (stack, heap, anonymous mmap)
– active (recently accessed)
– inactive (candidate for eviction)
● each page is « charged » to a group
● pages can be shared (e.g. if you use any COW FS)
● Individual (per-cgroup) limits and out-of-memory killer
42. cpu and cpuset cgroups
● keep track of user/system CPU time
● set relative weight per group
● pin groups to specific CPU(s)
– Can be used to « reserve » CPUs for some apps
– This is also relevant for big NUMA systems
43. blkio cgroups
● keep track IOs for each block device
– read vs write; sync vs async
● set relative weights
● set throttle (limits) for each block device
– read vs write; bytes/sec vs operations/sec
Note: earlier versions (pre-3.8) didn't account async correctly.
3.8 is better, but use 3.10 for best results.
47. Efficiency: almost no overhead
● processes are isolated, but run straight on the host
● CPU performance
= native performance
● memory performance
= a few % shaved off for (optional) accounting
● network performance
= small overhead; can be optimized to zero overhead
48. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
49. Efficiency: storage-friendly
● unioning filesystems
(AUFS, overlayfs)
● snapshotting filesystems
(BTRFS, ZFS)
● copy-on-write
(thin snapshots with LVM or device-mapper)
This is now being integrated with low-level LXC tools as well!
50. Efficiency: storage-friendly
● provisioning now takes a few milliseconds
● … and a few kilobytes
● creating a new base image (from a running container) takes a
few seconds (or even less)
52. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
53. What can Docker do?
● Open Source engine to commoditize LXC
● using copy-on-write for quick provisioning
● allowing to create and share images
● standard format for containers
(stack of layers; 1 layer = tarball+metadata)
● standard, reproducible way to easily build trusted images
(Dockerfile)
54. Docker: authoring images
● you can author « images »
– either with « run+commit » cycles, taking snapshots
– or with a Dockerfile (=source code for a container)
– both ways, it's ridiculously easy
● you can run them
– anywhere
– multiple times
55. Dockerfile example
FROM ubuntu
RUN apt-get -y update
RUN apt-get install -y g++
RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
RUN apt-get install -y make wget
RUN wget http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf-
RUN cd /tmp/apache-couchdb-* && ./configure && make install
RUN printf "[httpd]nport = 8101nbind_address = 0.0.0.0" >
/usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]
56. Yes, but...
● « I don't need Docker;
I can do all that stuff with LXC tools, rsync, some scripts! »
● correct on all accounts;
but it's also true for apt, dpkg, rpm, yum, etc.
● the whole point is to commoditize,
i.e. make it ridiculously easy to use
59. What this really means…
● instead of writing « very small shell scripts » to
manage containers, write them to do the rest:
– continuous deployment/integration/testing
– orchestration
● = use Docker as a building block
● re-use other people images (yay ecosystem!)
60. Docker: sharing images
● you can push/pull images to/from a registry
(public or private)
● you can search images through a public index
● dotCloud maintains a collection of base images
(Ubuntu, Fedora...)
● satisfaction guaranteed or your money back
61. Docker: not sharing images
● private registry
– for proprietary code
– or security credentials
– or fast local access
● the private registry is available
as an image on the public registry
(yes, that makes sense)
62. Typical workflow
● code in local environment
(« dockerized » or not)
● each push to the git repo triggers a hook
● the hook tells a build server to clone the code and run « docker build »
(using the Dockerfile)
● the containers are tested (nosetests, Jenkins...),
and if the tests pass, pushed to the registry
● production servers pull the containers and run them
● for network services, load balancers are updated
63. Hybrid clouds
● Docker is part of OpenStack « Havana »,
as a Nova driver + Glance translator
● typical workflow:
– code on local environment
– push container to Glance-backed registry
– run and manage containers using OpenStack APIs
64. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
65. What's Docker exactly?
● rewrite of dotCloud internal container engine
– original version: Python, tied to dotCloud's internal stuff
– released version: Go, legacy-free
● the Docker daemon runs in the background
– manages containers, images, and builds
– HTTP API (over UNIX or TCP socket)
– embedded CLI talking to the API
● Open Source (GitHub public repository + issue tracking)
● user and dev mailing lists
67. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
69. Docker: the ecosystem
● Cocaine (PAAS; has Docker plugin)
● CoreOS (full distro based on Docker)
● Deis (PAAS; available)
● Dokku (mini-Heroku in 100 lines of bash)
● Flynn (PAAS; in development)
● Maestro (orchestration from a simple YAML file)
● OpenStack integration (in Havana, Nova has a Docker driver)
● Shipper (fabric-like orchestration)
And many more
70. Cocaine integration
● what's Cocaine?
– Open Source PaaS from Yandex
– modular: can switch logging, storage, etc. without changing apps
– infrastructure abstraction layer + service discovery
– monitoring: metrics collection; load balancing
● why Docker?
– Cocaine initially used cgroups
– wanted to add LXC for better isolation and resource control
– heard about Docker at the right time
– uses custom distributed storage instead of Docker registry
71. device-mapper thin snapshots
(aka « thinp »)
● start with a 10 GB empty ext4 filesystem
– snapshot: that's the root of everything
● base image:
– clone the original snapshot
– untar image on the clone
– re-snapshot; that's your image
● create container from image:
– clone the image snapshot
– run; repeat cycle as many times as needed
72. AUFS vs THINP
AUFS
● easy to see changes
● small change =
copy whole file
● ~42 layers
● patched kernel
(Debian, Ubuntu OK)
● efficient caching
● no quotas
THINP
● must diff manually
● small change =
copy 1 block (100k-1M)
● unlimited layers
● stock kernel (>3.2)
(RHEL 2.6.32 OK)
● duplicated pages
● FS size acts as quota
73. Misconceptions about THINP
● « performance degradation »
no; that was with « old » LVM snapshots
● « can't handle 1000s of volumes »
that's LVM; Docker uses devmapper directly
● « if snapshot volume is out of space,
it breaks and you lose everything »
that's « old » LVM snapshots; thinp halts I/O
● « if still use disk space after 'rm -rf' »
no, thanks to 'discard passdown'
74. Other features in 0.7
● links
– linked containers can discover each other
– environment variable injection
– allows to expose remote services thru containers
(implements the ambassador pattern)
– side-effect: container naming
● host integration
– we ♥ systemd
75. 0.8 and beyond
● beam
– introspection API
– based on Redis protocol
(i.e. all Redis clients work)
– works well for synchronous req/rep and streams
– reimplementation of Redis core in Go
– think of it as « live environment variables »,
that you can watch/subscribe to
● and much more