4. 4
What is a container?
Container == Docker?
There are other OSS implementations!
LXC
runc
OpenVZ
etc…
So what is a container, then?
5. 5
What is Docker?
Docker provides many container-related features:
Containerization
Packaging software
Managing layers and their catalog
REST API
etc…
How does it all work?
6. 6
Docker is great, but...
It seems a bit... too BIG
All the features are hidden in one binary
It is hard to see how it works
Remember the Unix philosophy
Keep It Simple, Stupid
We can do it with existing tools
7. 7
Let's mimic it!
Let's try to make a minimal container
How to use the namespaces
How to bind-mount the devices
How to change the rootfs with chroot/pivot_root
How to use capabilities, CPUSET, etc.
Let's try to overlay the layers
Now we have overlayfs!
How to manage layers
8. 8
MINCS
Minimum Container Shell-scripts
https://github.com/mhiramat/mincs
Basic functions
Use PID/Net/UTS/Mount namespaces
Layering with overlayfs
Capabilities, CPUSET and more
POSIX shell script (not bash script)
Works with the BusyBox shell and dash
10. 10
Frontend Scripts
Frontends of MINCS
minc: run a command in a container
marten: manage layered container images
polecat: make a self-executable containerized command
Frontend == parsing options
Store options in environment variables and call the backend scripts
The marten/minc-farm pair is an exception
11. 11
minc
The main tool of MINCS
Run a command in a container
Works like chroot
(Or docker run? :)
Sets up namespaces, and a workspace with overlayfs
Does not need a container image like Docker does
Does not need a rootfs directory like chroot does (it can reuse the current rootfs)
Netns is not enabled by default
[mhiramat@localhost mincs]$ sudo ./minc ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:58 ?        00:00:00 ps -ef
12. 12
minc: Usage
Usage: minc [options] [command]
Options:
-r/--root ROOTDIR   Specify a directory as the rootfs. If omitted, use "/".
-t/--temp TEMPDIR   Specify a working directory. If omitted, create a temporary directory.
-k                  Do not remove the working directory
--name UTSNAME      Specify the host name in the container
--debug             Show the debug log
If the command is omitted, run $SHELL.
13. 13
Dive into the shell script
Let's look into the minc command
Phase 1: Parse the command line and set up environment variables
Phase 2: Invoke minc-exec
Set up netns and cpumask (if needed)
Move to the new namespaces
Get the correct PID and set up the UTS name
Set up the rootfs for the container
Bind device files
Unmount the original mounts
Chroot into the new rootfs and set up capabilities
14. 14
Minc: command line parsing
Case and while loop
getopts is not used (not flexible enough; e.g. POSIX getopts has no long options)
While { case & shift } loop
Mainly sets environment variables
After the loop
Call minc-farm to get the image by its UUID
Post-processing via the trap command
Call minc-exec
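The parsing loop above might look roughly like this minimal sketch. It handles the options from the usage slide; the MINC_* variable names and the MINC_SHIFT convention are assumptions for illustration, not necessarily what minc itself uses.

```shell
#!/bin/sh
# Sketch of a MINCS-style frontend parser: a while/case/shift loop
# (no getopts) that exports options as MINC_* environment variables.
# The MINC_* names here are illustrative, not necessarily MINCS's own.

minc_parse_args() {
  MINC_SHIFT=0                      # number of words the caller should shift
  while [ $# -ne 0 ]; do
    case "$1" in
    -r|--root|-t|--temp|--name)     # options that take a value
      if [ $# -lt 2 ]; then
        echo "minc: $1 requires an argument" >&2
        return 1
      fi
      case "$1" in
      -r|--root) export MINC_BASEDIR="$2" ;;
      -t|--temp) export MINC_TMPDIR="$2" ;;
      --name)    export MINC_UTSNAME="$2" ;;
      esac
      shift 2
      MINC_SHIFT=$((MINC_SHIFT + 2)) ;;
    -k)
      export MINC_KEEPDIR=1
      shift
      MINC_SHIFT=$((MINC_SHIFT + 1)) ;;
    --debug)
      export MINC_DEBUG=1
      shift
      MINC_SHIFT=$((MINC_SHIFT + 1)) ;;
    -*)
      echo "minc: unknown option: $1" >&2
      return 1 ;;
    *)
      break ;;                      # first non-option word starts the command
    esac
  done
  export MINC_BASEDIR="${MINC_BASEDIR:-/}"   # rootfs defaults to "/"
}
```

A caller would run `minc_parse_args "$@" || exit 1`, then `shift "$MINC_SHIFT"`, enable `set -x` when MINC_DEBUG is 1, and fall back to `$SHELL` if no command words remain. Returning a shift count keeps the quoting of the remaining command intact, which a POSIX function cannot otherwise hand back to its caller.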
15. 15
Minc-exec(1): Overview
Self-executing shell script
unshare needs a command to execute, so the script calls itself
There is also a historical reason: minc-exec used to be a single script, chns
The first run is outside the container
Set up netns and cpuset
Call unshare to create the container (namespaces)
The second run is inside the container
The script tells the two runs apart by checking PID == 1
Hide things from the program running in the container
Device files / unused mount points
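A minimal sketch of the self-execution idea, assuming util-linux unshare and root privileges. The function names are hypothetical; a real script would end with `minc_exec_main "$@"` so that `$0` is the script's own path.

```shell
#!/bin/sh
# Sketch of the self-executing pattern: the same script runs twice.
# Run 1 (outside) calls unshare on itself; run 2 is PID 1 of the new
# PID namespace. Function names are hypothetical, not MINCS's own.

# Report which run we are in, given a PID (normally "$$").
minc_phase() {
  if [ "$1" -eq 1 ]; then
    echo inside
  else
    echo outside
  fi
}

minc_exec_main() {
  case "$(minc_phase "$$")" in
  outside)
    # netns/cpuset setup would go here. -f makes unshare fork, so the
    # re-executed script becomes PID 1 in the new PID namespace.
    unshare -iumpf "$0" "$@" ;;
  inside)
    # Device files and unused mounts would be hidden here (needs root).
    exec "$@" ;;
  esac
}
```

The outside run does not `exec` unshare, so the frontend's traps can still run their post-processing when the container exits.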
16. 16
Minc-exec(2): netns/cpuset
netns
Use “ip netns” to create a new network namespace if needed
Use the trap command to remove it when the shell exits
Just create a veth pair between the namespaces
No IP addresses are assigned
Use “pipework” for more networking options
CPUSET
Just set a CPU affinity bitmask with taskset
cgroups are not used yet
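A sketch of both setups, assuming root, iproute2 and util-linux taskset. The function names and the interface-name arguments are hypothetical; `cpulist_to_mask` only illustrates how a CPU list maps to the bitmask (taskset's `-c` option can also take the list directly).

```shell
#!/bin/sh
# Sketch of optional netns and CPU mask setup (needs root, iproute2 and
# util-linux). Function and interface names are hypothetical.

# Create netns $1 and a veth pair; $2 stays on the host, $3 moves into
# the netns. Interface names must be at most 15 characters. No IP
# addresses are assigned here.
minc_setup_netns() {
  ip netns add "$1" || return 1
  ip link add "$2" type veth peer name "$3" || return 1
  ip link set "$3" netns "$1"
}

# Intended to run from a trap when the shell exits.
minc_cleanup_netns() {
  ip link delete "$2" 2>/dev/null   # deleting one end removes the pair
  ip netns delete "$1" 2>/dev/null
}

# Convert a CPU list such as "0,2-3" into a hex affinity mask ("d").
cpulist_to_mask() {
  mask=0
  for item in $(echo "$1" | tr ',' ' '); do
    case "$item" in
    *-*) lo=${item%-*}; hi=${item#*-} ;;
    *)   lo=$item; hi=$item ;;
    esac
    while [ "$lo" -le "$hi" ]; do
      mask=$((mask | (1 << lo)))
      lo=$((lo + 1))
    done
  done
  printf '%x\n' "$mask"
}

# Pin process $2 to the CPUs in list $1 (taskset -p MASK PID).
minc_set_cpumask() {
  taskset -p "$(cpulist_to_mask "$1")" "$2"
}
```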
17. 17
Intermission: The trap command
trap is great :)
It can handle signals and the shell's exit
It can call shell functions
minc uses trap to...
Remove temporary files/PID file
Show informational messages on exit
Ignore ^C
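A small sketch of those trap uses, with a hypothetical PID file and message. It uses `trap : INT` (catch and do nothing) rather than `trap '' INT`: an ignored signal would be inherited by the container's processes, while a caught one is reset to the default in child processes.

```shell
#!/bin/sh
# Sketch of trap usage in a MINCS-like frontend: remove a temporary PID
# file and print a message on exit, and keep ^C from killing this shell.
# File names and messages are hypothetical.

minc_cleanup() {
  rm -f "$MINC_PIDFILE"
  echo "minc: container exited" >&2
}

minc_install_traps() {
  MINC_PIDFILE=$(mktemp) || return 1
  echo "$$" > "$MINC_PIDFILE"
  trap minc_cleanup EXIT   # runs when this shell exits
  trap : INT               # catch ^C here; children keep default SIGINT
}
```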
18. 18
Minc-exec(3): Change namespaces
Use unshare to change namespaces
Run unshare with $0 (the script itself) as the command
PID, mount, IPC and UTS namespaces are unshared
unshare -iumpf "$0" "$@"
For the netns, we use ip netns exec
ip netns exec "$MINC_NETNS" unshare -iumpf "$0" "$@"
19. 19
Minc-exec(4): Set up PID and utsname
Get the original PID (the PID in the parent namespace)
The PID outside the container is what we need to send it signals
Since unshare forks (-f), we already know the PID inside the container: it is 1
Even after the mount namespace is separated, /proc stays the same until it is remounted
So /proc/self still shows our PID as seen from the parent namespace
Set up utsname
Use the hostname command to set up utsname
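A sketch of both steps, assuming it runs inside the new PID namespace before /proc is remounted; the function names are hypothetical. `read` is a shell builtin, so the shell itself opens /proc/self/stat and the first field is its own PID in the namespace that /proc belongs to.

```shell
#!/bin/sh
# Sketch: read our PID as seen from the parent namespace. Inside a fresh
# PID namespace, /proc still belongs to the parent until it is remounted,
# so /proc/self/stat reports the outer PID. Names are hypothetical.

minc_outer_pid() {
  # "read" is a builtin, so this shell opens /proc/self/stat itself and
  # the first field is this shell's PID in /proc's PID namespace.
  read -r MINC_OUTER_PID _ < /proc/self/stat
  export MINC_OUTER_PID
}

# Set the host name; only the new UTS namespace (unshare -u) sees it.
minc_set_utsname() {
  if [ -n "$1" ]; then
    hostname "$1"
  fi
}
```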
20. 20
Minc-exec(5): Mount namespace
Set up the mount namespace
In some environments (e.g. with systemd, which mounts / as shared), mount events propagate to other namespaces
mount --make-rprivate /
Stops propagation of all mount operations
Overlay the workspace via minc-coat
The minc-coat backend overlays a workspace on the rootfs image
The rootfs itself is not changed afterwards
If changing the rootfs directly is acceptable, use the --direct option
21. 21
Minc-coat: Implementing overlays
Make root/, storage/ and work/ under the tempdir
root/: the mount point for overlayfs → $RD
storage/: the overlayfs upper directory → $UD
work/: the workdir for overlayfs → $WD
Build a new rootfs with overlayfs
Storage is isolated by layering, not only by mount namespaces
The syntax differs depending on the version
overlayfs in the upstream kernel
mount -t overlay -o upperdir=$UD,lowerdir=$BASEDIR,workdir=$WD overlayfs $RD
overlayfs on Ubuntu 14.10 (out-of-tree)
mount -t overlayfs -o upperdir=$UD,lowerdir=$BASEDIR overlayfs $RD
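A sketch of what minc-coat does, assuming root, a fresh mount namespace and the upstream overlayfs syntax. The function names are hypothetical, and paths must not contain commas, since those would break the mount option string.

```shell
#!/bin/sh
# Sketch of a minc-coat-like overlay, assuming the upstream overlayfs
# syntax. Function names are hypothetical; paths must not contain commas.

# Create root/ (mount point), storage/ (upperdir) and work/ (workdir).
minc_coat_prepare() {
  mkdir -p "$1/root" "$1/storage" "$1/work"
}

# Print mount options for base rootfs $1 and temp dir $2.
minc_coat_opts() {
  echo "upperdir=$2/storage,lowerdir=$1,workdir=$2/work"
}

# Needs root inside a new mount namespace.
minc_coat_mount() {
  minc_coat_prepare "$2" || return 1
  mount --make-rprivate / || return 1   # keep mounts out of other namespaces
  mount -t overlay -o "$(minc_coat_opts "$1" "$2")" overlayfs "$2/root"
}
```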
22. 22
Minc-exec(6): Special Files
Special files and directories
Create /etc, /dev, /sys and /proc in the new rootfs
Bind-mount device files under /dev
Touch dummy files and bind-mount the real ones onto them (like symlinks)
/dev/console, /dev/null, /dev/zero, /dev/random, /dev/urandom, /dev/mqueue
(and others, if needed)
/dev/pts is mounted with the newinstance option
Mount /proc for the new PID namespace
Remount the old /proc read-only
Files that should be read-only (/proc/sys etc.) are bind-mounted from that read-only /proc
Bind-mount /sys
This could be skipped, or made read-only
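A sketch of the special-file setup, assuming root inside the new mount and PID namespaces; the function names are hypothetical, and /dev/mqueue and other nodes are left out for brevity. A tmpfs goes onto /dev first and the placeholder files are created inside it, which matches the tmpfs-on-/dev plus devtmpfs bind mounts in the df output on the following slides.

```shell
#!/bin/sh
# Sketch of the special-file setup on new rootfs $1 (needs root inside
# the new mount and PID namespaces). Function names are hypothetical.

MINC_DEVNODES="console null zero random urandom"

minc_prepare_dirs() {
  mkdir -p "$1/etc" "$1/dev" "$1/sys" "$1/proc"
}

# Empty placeholder files: a bind-mount target must already exist.
minc_make_devfiles() {
  mkdir -p "$1/dev/pts" || return 1
  for n in $MINC_DEVNODES; do
    : > "$1/dev/$n" || return 1
  done
}

minc_mount_special() {
  minc_prepare_dirs "$1" || return 1
  mount -t tmpfs tmpfs "$1/dev" || return 1   # private, empty /dev
  minc_make_devfiles "$1" || return 1
  for n in $MINC_DEVNODES; do
    mount --bind "/dev/$n" "$1/dev/$n"
  done
  # A new devpts instance, so the host's ptys are not visible.
  mount -t devpts -o newinstance,ptmxmode=0666 devpts "$1/dev/pts"
  ln -sf pts/ptmx "$1/dev/ptmx"
  mount -t proc proc "$1/proc"                # procfs of the new PID ns
  mount --bind /sys "$1/sys"                  # could be skipped or ro
}
```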
23. 23
Intermission: Debug
How to debug it?
To check which commands are run, use --debug
This option enables “set -x”
To break into it, insert a “bash” line (or any shell you like) into the script
You can do anything :)
Or insert the command you want to run
MINCS is just a set of shell scripts
You can change them as you like.
24. 24
Minc-exec(7): Post-process Mountpoint
Remove old mount points
If we keep them, they are still visible after chroot
Use pivot_root so they can be unmounted
Let's monitor it with “df -h”
Finally, call minc-leash to chroot.
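A simplified sketch of that step, assuming root in a private mount namespace with the overlay root mounted at `$1`. Unlike the two pivot_root calls shown on the following slides, it uses a single pivot_root and then unmounts everything under /.orig, deepest first; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the pivot_root step (needs root in a private mount namespace;
# $1 must be a mount point such as the overlay root). Names are hypothetical.

# Print mount points at or below $1 from mounts file $2, deepest first,
# so that child mounts are unmounted before their parents.
minc_old_mounts() {
  awk -v p="$1" '$2 == p || index($2, p "/") == 1 { print length($2), $2 }' \
    "${2:-/proc/self/mounts}" | sort -rn | cut -d' ' -f2-
}

minc_pivot() {
  mkdir -p "$1/.orig" || return 1
  cd "$1" || return 1
  pivot_root . .orig || return 1      # the old root is now at /.orig
  cd / || return 1
  for m in $(minc_old_mounts /.orig); do
    umount "$m" 2>/dev/null || umount -l "$m"
  done
}
```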
25. 25
Minc-exec(7): Post-process Mountpoint
Filesystem   Size  Used Avail Use% Mounted on
devtmpfs     740M     0  740M   0% /dev
tmpfs        748M     0  748M   0% /dev/shm
tmpfs        748M  8.5M  740M   2% /run
tmpfs        748M     0  748M   0% /sys/fs/cgroup
/dev/sda2     15G  8.6G  6.5G  58% /
(Before minc)
26. 26
Minc-exec(7): Post-process Mountpoint
Filesystem   Size  Used Avail Use% Mounted on
/dev/sda2     15G  8.6G  6.5G  58% /
devtmpfs     740M     0  740M   0% /dev
tmpfs        748M     0  748M   0% /dev/shm
tmpfs        748M     0  748M   0% /sys/fs/cgroup
tmpfs        748M  8.5M  740M   2% /run
overlayfs     15G  8.6G  6.5G  58% /tmp/minc1012-NpuyIA/root
tmpfs        748M     0  748M   0% /tmp/minc1012-NpuyIA/root/dev
devtmpfs     740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/console
devtmpfs     740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/null
devtmpfs     740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/zero
devtmpfs     740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/random
devtmpfs     740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/urandom
(Special files)
27. 27
Minc-exec(7): Post-process Mountpoint
Filesystem   Size  Used Avail Use% Mounted on
/dev/sda2     15G  8.6G  6.5G  58% /.orig
devtmpfs     740M     0  740M   0% /.orig/dev
tmpfs        748M     0  748M   0% /.orig/dev/shm
tmpfs        748M     0  748M   0% /.orig/sys/fs/cgroup
tmpfs        748M  8.5M  740M   2% /.orig/run
overlayfs     15G  8.6G  6.5G  58% /
tmpfs        748M     0  748M   0% /dev
devtmpfs     740M     0  740M   0% /dev/console
devtmpfs     740M     0  740M   0% /dev/null
devtmpfs     740M     0  740M   0% /dev/zero
devtmpfs     740M     0  740M   0% /dev/random
devtmpfs     740M     0  740M   0% /dev/urandom
(After the first pivot_root)
28. 28
Minc-exec(7): Post-process Mountpoint
Filesystem   Size  Used Avail Use% Mounted on
/dev/sda2     15G  8.6G  6.5G  58% /.orig
overlayfs     15G  8.6G  6.5G  58% /
tmpfs        748M     0  748M   0% /dev
devtmpfs     740M     0  740M   0% /dev/console
devtmpfs     740M     0  740M   0% /dev/null
devtmpfs     740M     0  740M   0% /dev/zero
devtmpfs     740M     0  740M   0% /dev/random
devtmpfs     740M     0  740M   0% /dev/urandom
(After removing the old procfs, etc.)
29. 29
Minc-exec(7): Post-process Mountpoint
Filesystem   Size  Used Avail Use% Mounted on
overlayfs     15G  8.6G  6.5G  58% /
tmpfs        748M     0  748M   0% /dev
devtmpfs     740M     0  740M   0% /dev/console
devtmpfs     740M     0  740M   0% /dev/null
devtmpfs     740M     0  740M   0% /dev/zero
devtmpfs     740M     0  740M   0% /dev/random
devtmpfs     740M     0  740M   0% /dev/urandom
(After the 2nd pivot_root and chroot into the new rootfs)
30. 30
Minc-leash: capabilities and chroot
Leash() = “Least capabilities shell”
Limits capabilities and chroots, using capsh (libcap)
Changes UID/GID too
If capability settings are skipped, it just does chroot
Wash() = “Wash out the environment variables”
MINCS uses environment variables internally, so clean them up
Unset all variables starting with MINC_
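A sketch of the two helpers, assuming capsh from libcap. The dropped capability list and the argument order of `leash` are assumptions. capsh's `--` runs /bin/bash with the remaining arguments, which is why the command has to be passed as a single `-c` string (and why bash must exist in the new rootfs).

```shell
#!/bin/sh
# Sketch of minc-leash's two helpers. wash() unsets every exported MINC_*
# variable; leash() chroots with reduced capabilities via capsh (libcap).
# The dropped capability list and argument names are assumptions.

wash() {
  for v in $(env | sed -n 's/^\(MINC_[A-Za-z0-9_]*\)=.*/\1/p'); do
    unset "$v"
  done
}

# Usage: leash ROOTDIR UID GID COMMAND-STRING
leash() {
  if command -v capsh >/dev/null 2>&1; then
    # capsh runs "/bin/bash -c COMMAND" after dropping caps, chrooting and
    # changing GID then UID, so the command must be one "sh -c" string.
    capsh --drop=cap_sys_admin,cap_sys_module,cap_sys_boot \
      --chroot="$1" --gid="$3" --uid="$2" -- -c "$4"
  else
    chroot "$1" /bin/sh -c "$4"   # no capsh: plain chroot
  fi
}
```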
31. 31
Use cases of MINCS
Good learning material for containers
If you hit limitations in Docker, you can try MINCS to understand why
Prototyping new features
Containers for embedded devices
Is it wrong to want to run applications in containers on embedded devices? :)
Docker (>14MB, Docker only) vs MINCS+BusyBox (<4MB, including a shell and tools)
→ Boot2MINC
32. 32
Boot2minc
Minimal ISO image + MINCS
https://github.com/mhiramat/boot2minc
Forked from Minimal Linux Live (https://github.com/ivandavidov/minimal)
Includes
Linux kernel
BusyBox (+ unshare patch)
MINCS
8MB image including the kernel (runs on QEMU/KVM)
The size can be reduced further by optimizing the configuration
33. 33
Marten: Manage container images
minc only provides the container feature
Should we prepare a rootfs via debootstrap?
How to get the rootfs of Fedora/CentOS etc.?
We want to reuse the result of a previous container easily
Overlayfs-based container image manager
Identify container images by Docker-like UUID
Track the dependency between images
Import images exported/saved by Docker
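A sketch of how layered images could be stacked at mount time. The store layout (a root/ directory and a parent file per UUID) is a hypothetical one chosen for illustration, not necessarily Marten's; the sketch relies on upstream overlayfs accepting several lower directories separated by colons, leftmost on top.

```shell
#!/bin/sh
# Sketch: stack an image and its ancestors as overlayfs lower layers.
# Hypothetical store layout (not necessarily Marten's real one):
#   $1/<uuid>/root    files of the layer
#   $1/<uuid>/parent  UUID of the parent layer (absent for a base image)

# Print a lowerdir value for image $2, topmost layer first.
marten_lowerdirs() {
  store=$1 id=$2 dirs= depth=0
  while [ -n "$id" ]; do
    depth=$((depth + 1))
    if [ "$depth" -gt 128 ] || [ ! -d "$store/$id/root" ]; then
      echo "marten: bad layer chain at '$id'" >&2
      return 1
    fi
    dirs="${dirs:+$dirs:}$store/$id/root"   # append below existing layers
    if [ -f "$store/$id/parent" ]; then
      id=$(cat "$store/$id/parent")
    else
      id=
    fi
  done
  echo "$dirs"
}
```

The result would be passed as `lowerdir=` in the same overlay mount command as on the minc-coat slide, together with upperdir=$UD and workdir=$WD.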
35. 35
TODO
minc
Work with pipework
Correct TTY support via tmux/screen
Use cgroups to limit cpu/memory/io usage (minc-cage?)
Plugin support for btrfs and dm-thin
Marten
Container execution command (like docker run)
Support OCI compatible container export/import and signing
36. 36
Known Issues
Test cases
Well, we can write them as shell scripts too :)
capsh
capsh only accepts “sh -c”-style commands
It doesn't accept escape characters…
37. 37
Conclusion
What I'd like to say is
“We can run a container by combining commands”
Docker etc. are not special; we already have the fundamental tools.
And
“Shell script is great!”