5. DSG and Eurecom
Engineering research center
• Academic research in telecommunication, multimedia, networks and security
• Close ties with local and international companies
Distributed Systems Group
• Focusing on data-intensive applications (so-called “big data”) at all levels
• Performance impact of virtualization, storage and network technologies (that’s me!)
• Data processing frameworks (Hadoop, Spark)
• Machine learning algorithms
6. Docker at the Distributed Systems Group
Started investigating Docker in 2012
• Virtualization platform for Big Data research
Summer 2015
• Built Swarm cluster
• Planning to shift from VMs to containers for most use cases
Bigfoot project
7. Use cases
Internally at Eurecom:
• Laboratory sessions for Data Science course
• ~100 students, fixed configuration, throw-away environments
• Academic research
• Very dynamic loads, all kinds of software combinations, higher priorities near deadlines
Companies have similar use cases
• Production jobs
• Fixed configuration, periodic executions
• Research teams
Examples: smart airports, power load forecasting, customer location forecasting
8. The last 3 years: OpenStack + Sahara
Public/private cloud with VM-based virtualization
We contributed Spark support to Sahara
Users can create clusters on-demand
Assumes infinite resources
Slow
• Create an HDFS+Spark cluster: 5 to 10 minutes
• Swarm takes a few seconds for the same task
Supporting new services/versions requires code changes
Users make static allocations
9. Why build on top of Docker and Swarm?
Swarm has a simple, documented API
Start solving our problem immediately
Packaging software is very easy
Freedom to experiment
Fast deployments
No static allocation, automatic resizing
Swarm does only one thing and does it well
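The standalone Swarm of this period answers the same Remote API as a single Docker daemon, so existing Docker clients work against it unchanged. As a rough illustration, the sketch below builds (without sending) a `POST /containers/create` request using only Python's standard library. The manager address `swarm-manager:2375`, the container name, the image `example/spark-worker` and the start script are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical address of a standalone Swarm manager; it answers the same
# Remote API endpoints as a single Docker daemon.
SWARM_URL = "http://swarm-manager:2375"


def create_container_request(name, image, cmd, memory_bytes, env=None):
    """Build, but do not send, a POST /containers/create request."""
    body = {
        "Image": image,
        "Cmd": cmd,
        "Env": [f"{key}={value}" for key, value in (env or {}).items()],
        # Memory limit in bytes; Swarm can also use it when choosing a host.
        "HostConfig": {"Memory": memory_bytes},
    }
    return urllib.request.Request(
        url=f"{SWARM_URL}/containers/create?name={name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_container_request(
    name="spark-worker-1",             # hypothetical container name
    image="example/spark-worker",      # hypothetical image
    cmd=["start-worker.sh"],           # hypothetical start script
    memory_bytes=4 * 1024**3,          # 4 GiB
    env={"SPARK_MASTER": "spark://spark-master:7077"},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually submit it to the manager.
```

Any Remote API client library could be pointed at the same address; raw HTTP is used here only to keep the sketch dependency-free.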
10. Zoe
Application scheduler on top of Swarm
Queues requests when resources are scarce
Users can submit their own applications
And create their own container images!
Dynamically resizes active applications
Frees unused resources to speed up other apps
Can coexist with other Swarm users
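Queueing requests when resources are scarce can be pictured as admission control over a shared pool. This is a minimal sketch, not Zoe's code: it assumes memory is the only resource, admits each application as a whole, and ignores resizing. `App` and `FifoAdmission` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class App:
    name: str
    memory_gb: int  # total memory requested by all of the app's containers


class FifoAdmission:
    """Start apps while memory is available; queue the rest in arrival order."""

    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.running = {}
        self.queue = deque()

    def submit(self, app):
        self.queue.append(app)
        self._dispatch()

    def finish(self, name):
        # Release the finished app's memory, then try to start queued apps.
        app = self.running.pop(name)
        self.free_gb += app.memory_gb
        self._dispatch()

    def _dispatch(self):
        # Strict FIFO: if the head of the queue does not fit, later apps wait
        # too (no backfilling), so a large app is never overtaken by small ones.
        while self.queue and self.queue[0].memory_gb <= self.free_gb:
            app = self.queue.popleft()
            self.free_gb -= app.memory_gb
            self.running[app.name] = app


sched = FifoAdmission(capacity_gb=64)
sched.submit(App("lab-session", 32))
sched.submit(App("research-job", 48))  # does not fit yet: queued
sched.submit(App("notebook", 8))       # waits behind research-job
sched.finish("lab-session")            # frees 32 GB: both queued apps start
print(sorted(sched.running), sched.free_gb)
```

Strict FIFO is the simplest fair choice; letting small apps jump ahead (backfilling) would use resources better at the risk of delaying large ones.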
MSC Zoe
Launch: August 2015
Tonnage: 197,362 t
Capacity: 19,224 TEU
Length: 395.4 m
Engine: 83,800 HP
Crew: 22
13. Automatic resizing of running applications
[Diagram: Volumes, Data layer, Applications]
Example: a data layer is not needed if there are no users
Data is kept in volumes
The data layer can be restarted when needed
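The data-layer example amounts to a reconcile step driven by the number of active users. In this sketch the `DataLayer` class, the volume name `hdfs-data` and the user counts are hypothetical, and actions are recorded as strings instead of being sent to Docker.

```python
from dataclasses import dataclass, field


@dataclass
class DataLayer:
    """Hypothetical data-layer service whose state lives in a named volume."""
    volume: str
    running: bool = False
    events: list = field(default_factory=list)

    def reconcile(self, active_users):
        if active_users == 0 and self.running:
            # No users: stop the containers; the volume (and the data) stays.
            self.running = False
            self.events.append(f"stop data layer (keep volume {self.volume})")
        elif active_users > 0 and not self.running:
            # Users are back: restart the containers on the same volume.
            self.running = True
            self.events.append(f"start data layer (mount volume {self.volume})")


hdfs = DataLayer(volume="hdfs-data")  # hypothetical volume name
for users in [2, 0, 0, 1]:
    hdfs.reconcile(users)
print(hdfs.events)
```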
14. Examples of scheduling policies
FIFO – First In First Out
Priority based
Researchers near deadlines have more priority
Fits the Swarm priority model nicely
Deadline
Finish this work by 3 p.m.
Streaming analysis latency must be less than 200ms
Size-based
Run the smallest applications first
Requires knowing the runtime in advance
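One way to express these policies is as sort keys over the request queue, where the request with the smallest key runs next. The sketch assumes each request carries an arrival time, a priority, an optional absolute deadline and an optional runtime estimate (all hypothetical fields). The deadline policy shown is earliest-deadline-first and the size-based policy is shortest-job-first.

```python
from dataclasses import dataclass
from typing import Optional

INF = float("inf")


@dataclass
class Request:
    name: str
    arrival: float                       # submission time (s)
    priority: int = 0                    # higher means more urgent
    deadline: Optional[float] = None     # absolute completion time (s), if any
    est_runtime: Optional[float] = None  # size-based needs an estimate


# Each policy is a sort key: the request with the smallest key runs first.
# Ties fall back to arrival order.
POLICIES = {
    "fifo": lambda r: r.arrival,
    "priority": lambda r: (-r.priority, r.arrival),
    "deadline": lambda r: (INF if r.deadline is None else r.deadline, r.arrival),
    "size": lambda r: (INF if r.est_runtime is None else r.est_runtime, r.arrival),
}


def next_request(queue, policy):
    return min(queue, key=POLICIES[policy])


queue = [
    Request("student-lab", arrival=0, est_runtime=600),
    Request("paper-deadline", arrival=5, priority=10, deadline=3600, est_runtime=7200),
    Request("quick-test", arrival=10, est_runtime=60),
]
for policy in POLICIES:
    print(policy, next_request(queue, policy).name)
```

A latency target such as the 200 ms streaming bound constrains every result rather than a single completion time, so it does not fit this sort-key model.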
15. Zoe implementation
Two client implementations
Web interface
Command line for scripting
Simple FIFO scheduler
Docker images for Spark, HDFS, IPython and Spark notebooks
Open source on GitHub, images available on the Docker Hub
16. Zoe - future
Target date: version 1.0 in March 2016
Big plans for Zoe
One full-time programmer
All the companies we spoke to are very interested
Features for 1.0 and after:
Create Zoe applications with more and more services
Automatic resizing of applications
Use the new Docker volume management
Monitoring
Advanced scheduling
17. Using Docker Swarm for data-intensive apps
L2 networking for Docker containers
Service discovery via DNS
[Diagram: containers c1, c2, c3, c4 on two hosts, each host with a Docker bridge and interfaces eth0, eth1]
What about Swarm 1.0 multi-host networking?
• We need hostnames to be visible from outside
• We will measure overlay network performance
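With DNS-based service discovery, a container reaches another service by hostname instead of a hard-coded IP address. Below is a minimal client-side sketch using Python's standard `socket.getaddrinfo`; how container names get registered in DNS is not shown, and the example resolves `localhost` so that it runs anywhere.

```python
import socket


def resolve_service(hostname, port):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# In the cluster, hostname would be a container name such as a hypothetical
# "spark-master"; 7077 is the Spark standalone master's default port.
print(resolve_service("localhost", 7077))
```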
18. Key takeaways
1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds
2. It is very easy to build cloud applications on top of Swarm
3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm
4. Networking between Docker containers on different hosts can be made transparent