This document discusses using Docker and Ferry to share and deploy big data applications. It summarizes:
1) Packaging a simple Python/Bokeh application using Docker to make it easy to install and run.
2) Using Ferry to orchestrate the application across multiple containers (a web frontend and a Cassandra database) and to specify the overall application configuration.
3) How Ferry makes it easy to share and deploy the application across different environments such as local machines, cloud instances, and container orchestration platforms.
Share & Deploy Big Data Apps with Docker & Ferry
1. Ferry - Share & Deploy Big Data Applications with Docker
James Horey
2. • Writing a simple application with Bokeh
• Packaging our application with Docker
• Orchestrating our application with Ferry
Technical material can be found at:
https://github.com/jhorey/pydata
8. Let’s share
#!/bin/bash

# Make sure we have 'pip' installed
apt-get install python-pip

# Install packages in right order
apt-get --yes install g++ python-dev
pip install bokeh

# Now download the data
python geography.py data/
python population economic Kentucky data/

# Start the web server
python webserver data/
• Your script didn’t work
• Oh, I was supposed to run this as sudo?
• Ok, it still didn’t work
• I get this funny error
• Oh yeah, I’m running Redhat
• Ok I’m at my desk, just use my computer
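Two of the failures in this exchange (no root privileges, a non-Debian distribution) could be detected before the install script runs. The following is only a minimal sketch of such a check; the `preflight` helper and its two arguments are hypothetical and do not come from the original material.

```python
import os
import shutil


def preflight(euid=None, has_apt=None):
    """Return a list of reasons the install script would likely fail.

    Both arguments are hypothetical hooks for testing; when left as None
    the values are read from the current machine.
    """
    if euid is None:
        euid = os.geteuid()  # POSIX only; 0 means root
    if has_apt is None:
        has_apt = shutil.which("apt-get") is not None

    problems = []
    if euid != 0:
        problems.append("not running as root: re-run with sudo")
    if not has_apt:
        problems.append("apt-get not found: not a Debian/Ubuntu system")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

A check like this only reports the problem; it does not make the script portable, which is the gap Docker is meant to close.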
9. Docker
• Encapsulates applications in isolated containers
• Makes it easy and safe to distribute applications
• Easy to get started
10. Our Dockerfile
• Start from a clean Precise image
• Install stuff
• Add our files
• Run this when starting
$ docker build -t ferry/pydata .
$ docker push ferry/pydata
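The Dockerfile itself did not survive in this transcript, only its four annotations. As a rough illustration of how those steps map onto Dockerfile instructions, the sketch below assembles Dockerfile text in Python. Every concrete value is an assumption: `ubuntu:12.04` stands in for "Precise", the packages are taken from the install script shown earlier, and the `/app` path and the `render_dockerfile` helper are made up for illustration. It is not the author's Dockerfile.

```python
import json


def render_dockerfile(base, packages, pip_packages, files, command):
    """Assemble Dockerfile text from the four annotated steps."""
    # Start from a clean base image
    lines = ["FROM " + base]
    # Install stuff: system packages first, then Python packages
    lines.append(
        "RUN apt-get update && apt-get --yes install " + " ".join(packages)
    )
    lines.append("RUN pip install " + " ".join(pip_packages))
    # Add our files
    for src, dest in files:
        lines.append("ADD {} {}".format(src, dest))
    # Run this when starting (exec form is a JSON array of arguments)
    lines.append("CMD " + json.dumps(command))
    return "\n".join(lines) + "\n"


# All values below are assumptions chosen for illustration.
dockerfile = render_dockerfile(
    base="ubuntu:12.04",
    packages=["g++", "python-dev", "python-pip"],
    pip_packages=["bokeh"],
    files=[(".", "/app")],
    command=["python", "/app/webserver", "/app/data/"],
)
print(dockerfile)
```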
11. Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
[Diagram: container p1 running on top of the shared kernel and hardware]
12. Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
$ docker run -p 8001:8000 --name p2 -d ferry/pydata
$ docker run -p 8002:8000 --name p3 -d ferry/pydata
[Diagram: containers p1, p2, and p3 running side by side on the shared kernel and hardware]
• Containers share basic kernel and hardware capabilities
• No virtualization
• Containers are isolated
• Access via port forwarding
You can run these commands now!
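Each container listens on port 8000 internally, so running several copies only requires a different host port and name per container. The sketch below generates the argument lists for any number of instances; the `docker_run_commands` helper is hypothetical and only builds the commands, it does not execute them.

```python
def docker_run_commands(image, count, container_port=8000, first_host_port=8000):
    """Build one `docker run` argument list per container instance.

    Host ports are assigned sequentially (8000, 8001, ...) while the
    port inside each container stays the same.
    """
    commands = []
    for i in range(count):
        host_port = first_host_port + i
        commands.append([
            "docker", "run",
            "-p", "{}:{}".format(host_port, container_port),
            "--name", "p{}".format(i + 1),
            "-d", image,
        ])
    return commands


for cmd in docker_run_commands("ferry/pydata", 3):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` on a machine where Docker is installed.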
13. Cassandra
• Highly scalable and fault-tolerant
• Great for storing streaming data (sensors, messages)
CREATE KEYSPACE census WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 1 };

USE census;

CREATE TABLE acs_economic_data (
    state_cd TEXT,
    state_name TEXT,
    county_cd TEXT,
    county_name TEXT,
    median INT,
    mean INT,
    capita INT,
    PRIMARY KEY(county_cd, state_cd)
);
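The primary key combines the county code with the state code, which matters because county codes repeat from state to state. In Cassandra, writing a row whose primary key already exists overwrites the earlier row instead of raising an error. The sketch below models that behavior with a plain Python dict; it is an in-memory illustration only, not the Cassandra driver, and the codes and values are made up.

```python
# In-memory stand-in for the acs_economic_data table, keyed the same
# way as the table's primary key: (county_cd, state_cd).
table = {}


def insert(row):
    """Store a row; an existing key is silently overwritten (an upsert)."""
    key = (row["county_cd"], row["state_cd"])
    table[key] = row


# Made-up rows: the same county code appears in two different states.
insert({"state_cd": "21", "county_cd": "001", "median": 1})
insert({"state_cd": "01", "county_cd": "001", "median": 2})
# Same key as the first row, so it replaces that row.
insert({"state_cd": "21", "county_cd": "001", "median": 3})

print(len(table))
print(table[("001", "21")]["median"])
```

With a key of the county code alone, the first two rows would have collided.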
14. Orchestration
Web + DB (one container)
• Simple
• Full control
• More work for you
Web, DB (separate containers)
• Simpler Dockerfile
• More extensible
• How to orchestrate?
15. Ferry
• Specify the containers that constitute your application in YAML
• Support for Hadoop, Cassandra, GlusterFS, and OpenMPI
• It’s a little bit like pip for your Docker-based runtime environment
http://ferry.opencore.io
16. Our Application
backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "ferry/pydata-cassandra"
     ports: ["8000:8000"]
# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra

# Place the start scripts in the events directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh
18. What’s it doing?
$ ferry start cassandra.yml
[Diagram: a Web container alongside two Cassandra (C*) containers]
root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12
Generate config
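The environment variables shown above are how code inside the connector can find the database. A minimal sketch of reading them in the web application follows; the `storage_backend` helper is hypothetical, and only the two variable names come from the output above.

```python
import os


def storage_backend(env=None):
    """Return (type, ip) of the storage backend from the environment.

    `env` defaults to the real process environment; passing a dict
    makes the function easy to try outside a Ferry connector.
    """
    env = os.environ if env is None else env
    backend_type = env.get("BACKEND_STORAGE_TYPE")
    backend_ip = env.get("BACKEND_STORAGE_IP")
    if not backend_type or not backend_ip:
        raise RuntimeError("no storage backend found in the environment")
    return backend_type, backend_ip


# Values copied from the slide's `env | grep BACK` output.
example_env = {
    "BACKEND_STORAGE_TYPE": "cassandra",
    "BACKEND_STORAGE_IP": "10.1.0.12",
}
print(storage_backend(example_env))
```

The returned address is what the application would hand to its Cassandra client as a contact point.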
19. What’s it doing?
$ ferry start yarn
root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
[Diagram: a Client container alongside two YARN (Y) containers and two GlusterFS (G) containers]
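With more than one backend, the variables appear to follow a `BACKEND_<ROLE>_<FIELD>` pattern. That pattern is inferred only from the four names in the output above and is not confirmed by Ferry's documentation here. Under that assumption, a client could collect every backend in one pass, as in this sketch with a hypothetical `backends` helper:

```python
def backends(env):
    """Group BACKEND_<ROLE>_<FIELD> variables into {role: {field: value}}.

    The naming pattern is an assumption inferred from the slide output.
    """
    grouped = {}
    for name, value in env.items():
        parts = name.split("_")
        if len(parts) == 3 and parts[0] == "BACKEND":
            role, field = parts[1].lower(), parts[2].lower()
            grouped.setdefault(role, {})[field] = value
    return grouped


# Values copied from the slide's `env | grep BACK` output.
example_env = {
    "BACKEND_STORAGE_TYPE": "gluster",
    "BACKEND_STORAGE_IP": "10.1.0.18",
    "BACKEND_COMPUTE_TYPE": "yarn",
    "BACKEND_COMPUTE_IP": "10.1.0.15",
    "HOME": "/root",  # unrelated variables are ignored
}
print(backends(example_env))
```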
21. Next steps
$ ferry share sa-df8d0aa6
[Diagram: the same stack (one web container, two Cassandra containers) reproduced on three separate machines]
22. Next steps
$ ferry deploy sa-df8d0aa6
[Diagram: the stack (one web container, two Cassandra containers) running on a single machine, then deployed across multiple machines; labels: EC2, VPC, S3]
23. • Even simple applications can be complicated to install and run
• Docker helps quite a bit with this
• Ferry helps build out big data applications