2. “Brief History of Containers”
[Timeline, 2001 to 2005; years inferred from the slide build order]
2001: First implementation of containers based on syscall interposition (Columbia)
2002: First research paper on Linux Containers (OSDI ’02)
2004: First container-based distributed checkpointing (HP Labs)
2005: Enterprise Linux Container solution (Meiosys)
2005: IBM acquires Meiosys; focus shifted to AIX
Most core kernel changes finally made it into the Linux mainline
12. Why not Virtual Machines?
Application / hardware misalignment
[Diagram: an application atop a hypervisor vs. atop a container host; on the hypervisor, an unwelcome Guest OS sits between the application and the hardware]
Applications have round edges: the system call interface
Hypervisors expose square holes: the hardware interface
Containers give a lightweight abstraction without IO overhead or startup latency
14. Why not Virtual Machines?
Layers of Intermediate Software
[Diagram: VM IO path: Application, Guest File System, Guest Driver, VM Exit (context switch), Virtual Device, Image Format Interpreter, iSCSI/NFS, Host. Container IO path: Application, Host.]
High IO overhead due to the many intermediate layers
15. Why not Virtual Machines?
The Unwelcome Guest OS
Slow startup time
Guest OS licensing and maintenance burden
Poor scalability
High resource consumption due to duplication
Obfuscated network / storage / compute topologies
Application semantic information is lost
18. Containers on YARN
Node Manager spawns tasks as containers
[Diagram: a Node Manager hosting container-virtualized tasks: Customer A (Task 1, Task 2), Customer B (Task 1), Customer C (Task 1)]
Tasks belonging to the same job share the same container
19. Containers on YARN
Advantages:
Secure multitenancy
Performance isolation
Better utilization via co-scheduling of IO-bound and CPU-bound tasks
Consistent cluster environment
Isolation of software dependencies and configuration
Reproducible way to define the app environment (see the configuration sketch below)
Rapid provisioning
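The deck itself shows no configuration. As a rough sketch only: Docker support in YARN (tracked in YARN-1964, referenced later) ended up being enabled through the LinuxContainerExecutor's Docker runtime, with yarn-site.xml properties along these lines; the property names follow the upstream Hadoop documentation, not anything in this talk.

<!-- yarn-site.xml (sketch): enable the Docker runtime under the
     LinuxContainerExecutor; property names per upstream Hadoop docs -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>

Individual jobs then opt in per container, e.g. by setting YARN_CONTAINER_RUNTIME_TYPE=docker and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE in the container's environment.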
20. Privilege Isolation through UID Namespaces
❏ Recent addition to the kernel
❏ Superuser in the container maps to a regular user on the host
❏ Docker support for UID virtualization
[Diagram: UID virtualization: container root (UID 0) maps to a regular user (UID 100) on the host; host root keeps UID 0]
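As a minimal sketch of the kernel mechanism (not Docker's implementation): a process can enter a fresh user namespace and install a mapping so that UID 0 inside corresponds to an ordinary UID outside, mirroring the slide's diagram.

/* uidmap_demo.c: minimal sketch of UID virtualization with a Linux
 * user namespace (not Docker's implementation). Requires a kernel
 * with user namespaces enabled; compile with: cc uidmap_demo.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    uid_t host_uid = getuid();
    printf("on host:      uid=%d\n", (int)host_uid);

    /* Enter a fresh user namespace. */
    if (unshare(CLONE_NEWUSER) != 0) { perror("unshare"); return 1; }

    /* Map UID 0 inside the namespace to our original host UID.
     * Line format: <inside-uid> <outside-uid> <count> */
    char map[64];
    snprintf(map, sizeof(map), "0 %d 1\n", (int)host_uid);
    int fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd < 0) { perror("open uid_map"); return 1; }
    write(fd, map, strlen(map));
    close(fd);

    /* This process is now "container root": UID 0 here, but just a
     * regular user from the host's point of view. */
    printf("in namespace: uid=%d\n", (int)getuid());
    return 0;
}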
21. References
❏ Blog post describing UID virtualization support in Docker
❏ https://www.altiscale.com/making-docker-work-yarn/
❏ Apache wiki page tracking work status across Docker and YARN projects
❏ https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers
❏ JIRA tracking Docker integration into YARN
❏ https://issues.apache.org/jira/browse/YARN-1964
❏ Related Docker tickets
❏ Several tickets linked from: https://github.com/dotcloud/docker/pull/4572
dineshs@altiscale.com
Questions?
24. Hadoop on Separate Physical Clusters
[Diagram: three dedicated clusters. Customer 1: utilization 6, spare 0, unused 3. Customer 2: utilization 1, spare 6, unused 2. Customer 3: utilization 4, spare 3, unused 2.]
Cannot scale the business this way!
Poor utilization
Host platform is a huge maintenance burden:
❖ Customer 1 needs R
❖ Customer 2 needs Matlab
❖ Customer 3 needs ß∂ø…
25. Container Clusters to Decouple Host from Customer
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
[Diagram: the same three clusters, now container-based. Customer 1: utilization 6, spare 0, unused 3. Customer 2: utilization 1, spare 6, unused 2. Customer 3: utilization 4, spare 3, unused 2.]
Still poor utilization
26. Container Clusters to Drive Utilization
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
Densely pack containers together
[Diagram: one global pool of resources. Global utilization: 11, spare: 16, unused: 0.]
27. Containers with Fine-grain Resources
[Diagram: global pool of resources]
❖ Container resource levels adjusted dynamically per customer
➢ As dictated by business policy
❖ Fractional resource allocation (see the cgroup sketch below)
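The deck doesn't name the mechanism behind fractional, dynamically adjustable limits, but this is commonly implemented with Linux cgroups. A rough sketch (hypothetical cgroup path, cgroup v1 CPU controller): a controller process can resize a running container's CPU share on the fly.

/* cgroup_resize.c: sketch of dynamically adjusting a container's CPU
 * quota via the cgroup v1 CPU controller. The group path is
 * hypothetical; real systems (Docker, YARN's LinuxContainerExecutor)
 * manage their own hierarchies. Quota/period gives fractional CPUs:
 * 50000/100000 = half a core. */
#include <stdio.h>

static int write_value(const char *path, long value) {
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%ld\n", value);
    return fclose(f);
}

int main(void) {
    const char *base = "/sys/fs/cgroup/cpu/customer1";  /* hypothetical */
    char path[256];

    snprintf(path, sizeof(path), "%s/cpu.cfs_period_us", base);
    write_value(path, 100000);           /* 100 ms scheduling period */

    snprintf(path, sizeof(path), "%s/cpu.cfs_quota_us", base);
    write_value(path, 50000);            /* 0.5 CPU for this group */
    return 0;
}

New values take effect immediately, which is what lets per-customer levels track business policy without restarting tasks.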
28. Disaggregated Compute and Storage
[Diagram: global pool of resources; DataNode (DN) and NodeManager (NM) roles placed independently on each node]
❖ Add more storage to the Customer 1 cluster from a storage-rich node
➢ While a compute-intensive job from Customer 2 utilizes the available compute capacity on the same node
Independently scale compute and storage
Editor's Notes
Loss of locality etc. doesn’t make a material difference
Suboptimal scheduling
No sharing (IA use case: universities sharing data over a common HDFS)