This presentation briefly introduces the use of Docker for data science.
It covers topics such as container management and the creation of new Docker images.
8. Data Science Lifecycle
The Team Data Science Process lifecycle consists of the
following steps:
• Business Understanding
• Data Acquisition and Understanding
  • Data ingestion
  • Data exploration
  • Set up data pipeline
• Modeling
  • Feature engineering
  • Model training
• Deployment
  • Operationalize a model: deploy the model and pipeline to a
    production or production-like environment for application
    consumption.
• Customer Acceptance
13. Challenges in Data Science
The data science lifecycle highlights some challenges:
• Download and install libraries
• Manage versions and dependencies
• Upgrade libraries
• Isolate dependencies between projects
20. Containerization
Containers offer several benefits for
developers, data science teams, and operations teams:
• Abstraction of the host system away from the containerized
application
• Easy Scalability
• Simple Dependency Management and Application Versioning
• Extremely lightweight, isolated execution environments
• Shared Layering
• Composability and Predictability
24. Containerization in Data Science
Containerization solves many problems at once:
• Containers make it easy to use libraries with complicated setups
  • CPU version vs. GPU version (e.g. TensorFlow)
  • Different environments (e.g. Python 2 vs. Python 3)
  • Etc.
• They make output reproducible
• They simplify prototyping and deploying complex
algorithms
• They provide easy, isolated Python / R / Scala
data science development environments.
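The point about different environments can be illustrated with a small sketch. It assumes the official python:2.7 and python:3.6 images from Docker Hub, which are not part of the original slides, and a hypothetical helper named py_version. Depending on the installation, docker may need to be prefixed with sudo.

```shell
# Hypothetical helper: print the Python version shipped in a given image tag.
# --rm removes the container as soon as the command finishes, so nothing
# is left behind on the host.
py_version() {
    docker run --rm "python:$1" python --version
}

# Usage (requires a running Docker daemon):
#   py_version 2.7
#   py_version 3.6
```

Both interpreters are used side by side without installing either one on the host.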
26. Containerization vs Virtualization
Virtual Machines (VMs)
• Represent hardware-level virtualization
• Heavyweight
• Slow provisioning
• Limited performance
• Fully isolated and hence more secure
Containers
• Represent operating-system-level virtualization
• Lightweight
• Real-time provisioning and scalability
• Native performance
• Process-level isolation and hence less secure
27. Docker and Containerization
Figure 1: Containers isolate individual applications and use operating
system resources that have been abstracted by Docker. Containers can
be built by "layering", with multiple containers sharing underlying layers,
decreasing resource usage.
30. Run a Docker container
Docker runs processes in isolated containers. The docker run
command must specify an image to derive the container from. An
image developer can define image defaults related to:
• Detached or foreground running
• Container identification
• Network settings
• Runtime constraints on CPU and memory
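As a rough sketch of how these settings can be given explicitly at run time, the hypothetical function below starts a detached container with a name, a network, and resource limits. The function name, the limit values, and the choice of the default bridge network are illustrative assumptions; the image is the one used elsewhere in this presentation.

```shell
# Hypothetical helper: run the image detached with explicit runtime settings.
#   -t -d       allocate a pseudo-TTY and run in the background (detached)
#   --name      container identification
#   --network   network settings (bridge is Docker's default network)
#   --cpus      runtime constraint on CPU (at most 1.5 CPUs)
#   --memory    runtime constraint on memory (at most 2 GB)
run_limited() {
    docker run -t -d \
        --name "$1" \
        --network bridge \
        --cpus 1.5 \
        --memory 2g \
        alessandroadamo/ubuntu-ds-python3
}

# Usage (requires a running Docker daemon, possibly with sudo):
#   run_limited mycontainer-limited
```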
33. Interactive and Detached mode
Docker supports two running modes: interactive and
detached.
Interactive mode
$ sudo docker run -t -i --name mycontainer \
    alessandroadamo/ubuntu-ds-python3 /bin/bash
NB. To exit an interactive container, type the exit command.
Detached mode
$ sudo docker run -t -d -p 8888:8888 \
    -v /home/user/notebooks:/home/ds/notebooks \
    --name mycontainer-daemon \
    alessandroadamo/ubuntu-ds-python3
37. List Containers
To list information about container status, use the docker
ps command.
Running containers
$ sudo docker ps
All containers (running and stopped)
$ sudo docker ps -a
Latest created container
$ sudo docker ps -l
Container IDs only (quiet)
$ sudo docker ps -q
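These flags can be combined, and docker ps also accepts --filter and --format options. The sketch below shows one possible way to list only stopped containers; the helper names and the chosen output columns are illustrative assumptions.

```shell
# Hypothetical helper: show name and status of containers that have exited.
# -a includes stopped containers, --filter keeps only the exited ones,
# and --format selects the printed fields with a Go template.
list_exited() {
    docker ps -a \
        --filter "status=exited" \
        --format "{{.Names}}: {{.Status}}"
}

# Hypothetical helper: print only the IDs of all containers, running or not.
list_all_ids() {
    docker ps -a -q
}

# Usage (requires a running Docker daemon, possibly with sudo):
#   list_exited
#   list_all_ids
```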
41. Start and Stop Containers
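Start a container
$ sudo docker start mycontainer
Stop a container
$ sudo docker stop mycontainer
Attach to a running container
$ sudo docker attach mycontainer
Detach from a running container
[ Ctrl + P ] then [ Ctrl + Q ]
NB. This key sequence works when the container was started with -t -i. [ Ctrl + C ] sends an interrupt signal to the container's main process, which may stop the container instead of detaching from it.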
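Put together, a typical session might look like the sketch below. The helper name restart_and_attach is hypothetical, and mycontainer is assumed to be an existing container created with -t -i.

```shell
# Hypothetical helper: stop a container, start it again, then attach to it.
# docker stop asks the main process to terminate, docker start runs the same
# container again, and docker attach connects the terminal to its main process.
# Each step runs only if the previous one succeeded.
restart_and_attach() {
    docker stop "$1" &&
    docker start "$1" &&
    docker attach "$1"
}

# Usage (requires a running Docker daemon, possibly with sudo):
#   restart_and_attach mycontainer
```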
45. Building Process
• Docker can build images automatically by reading the
instructions from a Dockerfile.
• A Dockerfile is a text document that contains all the
commands a user could call on the command line to assemble
an image.
• Using docker build users can create an automated build that
executes several command-line instructions in succession.
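A minimal sketch of such an automated build follows. It writes a small Dockerfile into a temporary directory and wraps docker build in a function; the tag example/ds-base, the helper name, and the installed package are hypothetical choices made for illustration.

```shell
# Create an empty build context and write a two-instruction Dockerfile into it.
build_dir=$(mktemp -d)
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python3
EOF

# Hypothetical helper: build the image. -t names (tags) the result, and the
# last argument is the build context directory that contains the Dockerfile.
build_image() {
    docker build -t example/ds-base "$build_dir"
}

# Usage (requires a running Docker daemon, possibly with sudo):
#   build_image
```

Docker executes the instructions in order, so the RUN line operates on the filesystem provided by the FROM line.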
47. Dockerfile
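The following is an example of a minimal Ubuntu Linux Docker image:
FROM ubuntu:16.04
# MAINTAINER still works but is deprecated in favor of LABEL maintainer=...
MAINTAINER Alessandro Adamo "alessandro.adamo@gmail.com"
ENV REFRESHED_AT 2017-06-15
# -y answers the confirmation prompt, which would otherwise abort the build
RUN apt-get update && apt-get dist-upgrade -y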
53. Dockerfile Commands 1 / 3
Environment Variable
ENV <key> <value>
ENV <key>=<value>
Working Directory
WORKDIR ${foo}
Change User
USER username
Run a Command in a New Layer
RUN ["executable", "param1", "param2"]
Default Command for a Container
CMD ["executable", "param1", "param2"]
Metadata
LABEL version="1.0"
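The instructions above can be combined as in the following sketch, which writes an illustrative Dockerfile to a temporary directory. The user name ds, the NOTEBOOK_DIR variable, and the README.txt file are assumptions made for the example and are not taken from the original image.

```shell
dockerfile_dir=$(mktemp -d)
cat > "$dockerfile_dir/Dockerfile" <<'EOF'
FROM ubuntu:16.04
# Metadata attached to the image
LABEL version="1.0"
# Environment variable, available in later instructions and at run time
ENV NOTEBOOK_DIR=/home/ds/notebooks
# Shell-form RUN: create an unprivileged user and a directory it owns
RUN useradd --create-home ds && mkdir -p ${NOTEBOOK_DIR} && chown ds ${NOTEBOOK_DIR}
# Later instructions and the container's process run as this user
USER ds
# WORKDIR accepts variables defined with ENV
WORKDIR ${NOTEBOOK_DIR}
# Exec-form RUN: executed at build time in a new layer
RUN ["touch", "README.txt"]
# Default command when a container starts without an explicit command
CMD ["/bin/bash"]
EOF

# Show the generated file
cat "$dockerfile_dir/Dockerfile"
```

The heredoc delimiter is quoted so that ${NOTEBOOK_DIR} is written literally and expanded by Docker during the build rather than by the local shell.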