What's Docker, why does it matter, how does it use Linux Containers, why should you use it, and how? You'll find answers to those questions (and a bit more) in this presentation, given February 20th 2014 at the Large Scale Production Engineering Meet-Up at Yahoo, in Sunnyvale.
2. Who?
● Jérôme Petazzoni (@jpetazzo)
  – Not Maxime Petazzoni (author of Maestro NG)
  – Not Thomas Petazzoni (embedded systems & kernel guru)
● Wrote dotCloud PAAS deployment tools
  – EC2, LXC, Puppet, Python, Shell, ØMQ...
● Docker contributor (CONTAINERIZE ALL THE THINGS!)
  – Docker-in-Docker, VPN-in-Docker, router-in-Docker...
● Runs Docker in production
  – Psssst... You shouldn't do it, but here's how anyway!
#lspe
3. Outline
● LXC and the container metaphor
● Docker basics
● Docker images
● Docker deployment
6. High level approach: it's a lightweight VM
● own process space
● own network interface
● can run stuff as root
● can have its own /sbin/init (different from the host)
« Machine Container »
7. Low level approach: it's chroot on steroids
● can also not have its own /sbin/init
● container = isolated process(es)
● share kernel with host
● no device emulation (neither HVM nor PV)
« Application Container »
8. How does it work?
Isolation with namespaces
● pid
● mnt
● net
● uts
● ipc
● user
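As a rough illustration of the list above: on a reasonably recent Linux kernel, each namespace of a process appears as a symlink under /proc/<pid>/ns, and two processes share a namespace when the link targets match. The sketch below is not from the talk. The function names and the PROC_ROOT override are assumptions, introduced only so the snippet can be tried against a fake directory tree.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override for experimenting; it normally stays /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Print "<namespace> <identifier>" for each namespace listed on the slide.
list_namespaces() {
    pid="$1"
    for ns in pid mnt net uts ipc user; do
        link="$PROC_ROOT/$pid/ns/$ns"
        if [ -L "$link" ]; then
            printf '%s %s\n' "$ns" "$(readlink "$link")"
        else
            printf '%s (not available)\n' "$ns"
        fi
    done
}

# Succeed when PIDs $1 and $2 are in the same namespace of type $3.
same_namespace() {
    a="$(readlink "$PROC_ROOT/$1/ns/$3")" || return 2
    b="$(readlink "$PROC_ROOT/$2/ns/$3")" || return 2
    [ "$a" = "$b" ]
}
```

Running `list_namespaces $$` on the host and again inside a container should show different identifiers for the namespaces the container does not share with the host.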
9. pid namespace
jpetazzo@tarrasque:~$ ps aux | wc -l
212
jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash
root@ea319b8ac416:/# ps aux
USER  PID %CPU %MEM   VSZ  RSS TTY STAT START TIME COMMAND
root    1  0.0  0.0 18044 1956 ?   S    02:54 0:00 bash
root   16  0.0  0.0 15276 1136 ?   R+   02:55 0:00 ps aux
(That's 2 processes)
14. user namespace
● no « demo » for this one... Yet!
● UID 0→1999 in container C1 is mapped to UID 10000→11999 in host;
  UID 0→1999 in container C2 is mapped to UID 12000→13999 in host; etc.
● required lots of VFS and FS patches (esp. XFS)
● what will happen with copy-on-write?
  – double translation at VFS?
  – single root UID on read-only FS?
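The UID ranges on this slide amount to a simple offset: host UID = base + container UID, valid only inside the mapped range. Below is a minimal sketch of that arithmetic. The function name `map_uid` is made up for illustration, and the three arguments mirror the three fields of a /proc/<pid>/uid_map line (first UID inside, first UID outside, length).

```shell
#!/bin/sh
# map_uid BASE COUNT CONTAINER_UID
# Prints the host UID for a container UID, or fails if the UID is unmapped.
# (Hypothetical helper; it only does the arithmetic, it configures nothing.)
map_uid() {
    base="$1"
    count="$2"
    uid="$3"
    if [ "$uid" -lt 0 ] || [ "$uid" -ge "$count" ]; then
        echo "uid $uid is outside the mapped range" >&2
        return 1
    fi
    echo $((base + uid))
}

# Container C1 from the slide: 2000 UIDs starting at host UID 10000.
map_uid 10000 2000 0
# Container C2 from the slide: 2000 UIDs starting at host UID 12000.
map_uid 12000 2000 1999
```

Root (UID 0) in C1 and root in C2 therefore end up as two different, unprivileged UIDs on the host.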
15. How does it work?
Isolation with cgroups
● memory
● cpu
● blkio
● devices
16. memory cgroup
● keeps track of pages used by each group:
  – file (read/write/mmap from block devices; swap)
  – anonymous (stack, heap, anonymous mmap)
  – active (recently accessed)
  – inactive (candidate for eviction)
● each page is « charged » to a group
● pages can be shared (e.g. if you use any COW FS)
● individual (per-cgroup) limits and out-of-memory killer
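The file/anonymous and active/inactive categories above correspond to counters in a cgroup's memory.stat file. As a hedged sketch, the helper below (a made-up name, `mem_summary`) adds the active and inactive counters for each kind. It assumes the cgroup v1 field names `active_file`, `inactive_file`, `active_anon` and `inactive_anon`, and the path in the comment is only an example, since the mount point varies between systems.

```shell
#!/bin/sh
# mem_summary FILE
# Sum file-backed and anonymous bytes from a memory.stat style file.
# Example (path is an assumption, adjust to your cgroup mount point):
#   mem_summary /sys/fs/cgroup/memory/<group>/memory.stat
mem_summary() {
    awk '
        $1 == "active_file" || $1 == "inactive_file" { file += $2 }
        $1 == "active_anon" || $1 == "inactive_anon" { anon += $2 }
        END { printf "file=%.0f anon=%.0f\n", file, anon }
    ' "$1"
}
```

Because shared pages are charged to a single group, these per-group numbers do not necessarily add up to what each container "really" uses.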
17. cpu and cpuset cgroups
● keep track of user/system CPU time
● set relative weight per group
● pin groups to specific CPU(s)
  – Can be used to « reserve » CPUs for some apps
  – This is also relevant for big NUMA systems
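"Relative weight" means a group's CPU share only matters under contention, and it is proportional to its weight divided by the sum of all competing weights. A small sketch of that ratio follows. The function name `cpu_percent` is hypothetical and the weights are example values (1024 is a commonly used default).

```shell
#!/bin/sh
# cpu_percent MINE [OTHER...]
# Print the integer percentage of CPU time the first weight would get
# when every listed group is busy at the same time.
cpu_percent() {
    mine="$1"
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    [ "$total" -gt 0 ] || return 1
    echo $((mine * 100 / total))
}

# Three busy groups with weights 1024, 1024 and 512:
cpu_percent 1024 1024 512
cpu_percent 512 1024 1024
```

When the other groups are idle, a group can still use all the available CPU; the weights are not hard caps.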
18. blkio cgroups
● keep track of I/Os for each block device
  – read vs write; sync vs async
● set relative weights
● set throttle (limits) for each block device
  – read vs write; bytes/sec vs operations/sec
Note: earlier kernel versions (<3.8) didn't account async I/O correctly. 3.8 is better, but use 3.10 for best results.
19. devices cgroups
● controls read/write/mknod permissions
● typically:
  – allow: /dev/{tty,zero,random,null}...
  – deny: everything else
  – maybe: /dev/net/tun, /dev/fuse, /dev/kvm, /dev/dri...
● fine-grained control for GPU, virtualization, etc.
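With the cgroup v1 devices controller, the "deny everything, then allow a few" policy above is expressed as entries of the form `type major:minor access`, where access combines r (read), w (write) and m (mknod). The sketch below only prints the entries one might write to a group's devices.deny and devices.allow files; it changes nothing on the system. The function name is made up, and the device numbers are the standard Linux assignments for the four devices named on the slide.

```shell
#!/bin/sh
# device_rules
# Dry run: print "<control file>: <entry>" lines for a minimal whitelist.
device_rules() {
    # "a" stands for all devices: start by denying everything.
    echo "devices.deny: a"
    while read -r _name type major minor; do
        printf 'devices.allow: %s %s:%s rwm\n' "$type" "$major" "$minor"
    done <<'EOF'
/dev/null c 1 3
/dev/zero c 1 5
/dev/random c 1 8
/dev/tty c 5 0
EOF
}

device_rules
```

The "maybe" devices from the slide would be extra lines in the same table, added only for containers that actually need them.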
20. How does it work?
Copy-on-write storage
● Create a new machine instantly (instead of copying its whole filesystem)
● Storage keeps track of what has changed
● Since 0.7, Docker has a storage plugin system
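To make "storage keeps track of what has changed" concrete, here is a toy model of copy-on-write using plain directories: reads fall through a stack of layers from top to bottom, and writes only ever touch the top layer. This is a simplified sketch with made-up function names, not how Docker's storage backends work internally (they operate in the kernel, at file or block level), and it ignores deletions entirely.

```shell
#!/bin/sh
# cow_read FILE LAYER...
# Print FILE from the first (topmost) layer that contains it.
cow_read() {
    file="$1"
    shift
    for layer in "$@"; do
        if [ -f "$layer/$file" ]; then
            cat "$layer/$file"
            return 0
        fi
    done
    return 1
}

# cow_write FILE CONTENT TOP_LAYER
# Write into the top layer only; lower layers stay untouched.
cow_write() {
    mkdir -p "$(dirname "$3/$1")"
    printf '%s\n' "$2" > "$3/$1"
}
```

Creating a new "machine" in this model is just creating an empty top directory, which is why provisioning is near-instant compared to copying the whole filesystem.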
22. Compute efficiency: almost no overhead
● processes are isolated, but run straight on the host
● CPU performance = native performance
● memory performance = a few % shaved off for (optional) accounting
● network performance = small overhead; can be reduced to zero
38. Yes, but...
● « I don't need Docker; I can do all that stuff with LXC tools, rsync, and some scripts! »
● correct on all counts; but it's also true for apt, dpkg, rpm, yum, etc.
● the whole point is to commoditize, i.e. make it ridiculously easy to use
39. What this really means…
● instead of writing « very small shell scripts » to manage containers, write them to do the rest:
  – continuous deployment/integration/testing
  – orchestration
● = use Docker as a building block
● re-use other people's images (yay ecosystem!)
40. Docker-what?
The big picture
● Open Source engine to run containers
● using copy-on-write for quick provisioning
● lets you create and share images
● standard format for containers (stack of layers; 1 layer = tarball+metadata)
● standard, reproducible way to easily build trusted images (Dockerfile, Stackbrew...)
41. Docker-what?
Under the hood
● rewrite of dotCloud internal container engine
  – original version: Python, tied to dotCloud's internal stuff
  – released version: Go, legacy-free
● the Docker daemon runs in the background
  – manages containers, images, and builds
  – HTTP API (over UNIX or TCP socket)
  – embedded CLI talking to the API
● Open Source (GitHub public repository + issue tracking)
42. Docker-what?
The ecosystem
● Docker Inc. (formerly dotCloud Inc.)
  – ~30 employees, VC-backed
  – SAAS and support offering around Docker
● Docker, the community
  – more than 300 contributors, 1500 forks on GitHub
  – dozens of projects around/on top of Docker
  – x100k trained developers
43. Outline
● LXC and the container metaphor
● Docker basics
● Docker images
● Docker deployment
48. Dockerfile
FROM ubuntu
RUN apt-get -y update
RUN apt-get install -y g++
RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
RUN apt-get install -y make wget
RUN wget -O- http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf-
RUN cd /tmp/apache-couchdb-* && ./configure && make install
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]

docker build -t jpetazzo/couchdb .
49. Authoring images with a Dockerfile
● Minimal learning curve
● Rebuilds are easy
● Caching system makes rebuilds faster
● Single file to define the whole environment!
51. Authoring Images with Chef/Puppet/Ansible/Salt/...
Plan A: « my other VM is a container »
● write a Dockerfile to install $YOUR_CM
● start tons of containers
● run $YOUR_CM in them
Good if you want a mix of containers/VM/metal
But slower to deploy, and uses more resources
52. Authoring Images with Chef/Puppet/Ansible/Salt/...
Plan B: « the revolution will be containerized »
● write a Dockerfile to install $YOUR_CM
● … and run $YOUR_CM as part of build process
● deploy fully baked images
Faster to deploy
Easier to rollback
53. Outline
● LXC and the container metaphor
● Docker basics
● Docker images
● Docker deployment
54. Install Docker
● On your servers (Linux)
  – Packages (Ubuntu, Debian, Fedora, Gentoo, Arch...)
  – Single binary install (Golang FTW!)
  – Easy provisioning on Rackspace, Digital Ocean, EC2, GCE...
● On your dev env (Linux, OS X, Windows)
  – Vagrantfile
  – boot2docker (25 MB VM image)
  – Natively (if you run Linux)
56. Service discovery
● you can name your containers
● you can link your containers
docker run -d --name frontdb mysqlimage
docker run -d --link frontdb:sql nginximage
→ environment vars are injected into the web container
→ twelve-factors FTW!
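Inside the linked container, the injected variables are named after the link alias in upper case, so the `sql` alias above yields names such as `SQL_PORT_3306_TCP_ADDR` and `SQL_PORT_3306_TCP_PORT`. The sketch below shows how an application start-up script might read them. The helper name `link_endpoint` is made up, and port 3306 is an assumption (the usual MySQL port), since the slide does not say which port the image exposes.

```shell
#!/bin/sh
# link_endpoint ALIAS PORT
# Print "address:port" for a linked container, based on the environment
# variables injected by the link. Hypothetical helper; ALIAS must be a
# plain identifier because it is expanded through eval.
link_endpoint() {
    alias_upper="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
    prefix="${alias_upper}_PORT_${2}_TCP"
    eval "addr=\${${prefix}_ADDR:-}"
    eval "port=\${${prefix}_PORT:-}"
    if [ -z "$addr" ] || [ -z "$port" ]; then
        echo "no link variables found for $1:$2" >&2
        return 1
    fi
    printf '%s:%s\n' "$addr" "$port"
}
```

A web container could then call `link_endpoint sql 3306` at start-up and pass the result to its configuration, keeping the database address out of the image itself.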
58. Allocating CPU and RAM
● docker run -c $CPU_SHARES -m $RAM_MB
Docker API will soon expose:
● total CPU/RAM
● allocated CPU/RAM
● used CPU/RAM
WARNING: memory metrics are tricky!
65. Networking
● sometimes, you need to expose a range of ports
● or completely arbitrary ports
● or non-IP protocols
● or you have special needs:
  – more than 1 Gb/s in containers
  – more than 1,000 connections/s
  – more than 100,000 concurrent connections
66. Get rid of the overhead
● use openvswitch
● bridge a container directly with a NIC (take iptables out of the equation)
● move a (macvlan) NIC to a container (a bit of overhead; multi-tenant)
● move a (physical) NIC to a container (zero overhead; single-tenant)
● move a (virtual function) NIC to a container (if your hardware supports it)
68. Pipework
● sidekick script for Docker
● I replaced the plumber with a very small shell script
pipework br1 $APACHE 192.168.1.1/24
pipework br1 $MYSQL 192.168.1.2/24
pipework br1 $CONTAINERID 192.168.4.25/20@192.168.4.1
pipework eth2 $(docker run -d hipache) 50.19.169.157
pipework eth3 $(docker run -d hipache) 107.22.140.5
pipework br0 $(docker run -d zmqworker) dhcp fa:de:b0:99:52:1c