The case for Docker in multi-cloud enabled bioinformatics applications
Ahmed Ali, Mohamed M. ElKalioby, Mohamed Abouelhoda
Nile University, Egypt
Presented By
Mohamed M. El-Kalioby, MSc
Introduction
● Next generation sequencing technology has changed the
traditional bioinformatics practice
● Sophisticated multi-step workflows are used to transform the raw sequence data into knowledge.
● One NGS workflow can include tens of tasks and hundreds of
information sources integrated together to achieve the analysis
goals.
● Medical Variant Detection Workflow is an example of such
workflows.
Medical Variant Detection Workflow
(MVDW)
Medical Variant Detection Workflow (2)
● Multiple versions and instances of the workflow are needed
● Tools and parameters can be changed
● per user, where each one may require certain modules, annotation databases, and special post-processing;
● per experiment type, e.g., whole genome, whole exome, or RNA-seq in a single or multiplexed mode;
● per sequencing platform, e.g., Illumina, Ion Torrent, or any other.
Requirements
● Efficient Dynamic Deployment Strategy
● The deployed system should use HPC resources
● Able to consume cloud computing resources (private and public
clouds)
Virtualization Technology
● The whole system, with all modules, databases, and related dependencies, is packaged in a virtual machine (VM) image.
● These images can then be used to instantiate a virtual machine running in a private or public cloud.
● Examples from sequence analysis
● Crossbow for NGS read alignment & SNP calling,
● RSD-Cloud for comparative genomics
● … many more
Virtualization Technology (2)
● The traditional engine for running virtual machine instances is based on one of
● Oracle VirtualBox,
● KVM,
● Xen Hypervisor,
● VMware
Docker
● Docker provides a new level of virtualization
● The computing machine (including the operating system) is not virtualized.
● Only the application and its related dependencies are encapsulated in a ’virtual’ isolated process.
[Figure: (a) Software stack with virtual machines: infrastructure, operating system, virtual machine hypervisor, VM1 … VMn, each running one of APP1 … APPn. (b) Software stack with containers: infrastructure, operating system, container engine, containers, each running one of APP1 … APPn.]
Usage of Docker
[Figure: Docker usage. The Docker client asks the Docker server (daemon) to pull images, build images, run containers, and terminate containers. Images are downloaded from or uploaded to the Docker public registry, and container images can be built and pushed to a local registry. Containers run on top of the operating system and the infrastructure.]
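The pull, build, push, run, and terminate operations in the diagram map onto a handful of Docker command-line calls. The sketch below only assembles the argument lists and does not invoke Docker, so it runs on any machine. The helper names, the image name `example/bwa`, and the container name `bwa-1` are placeholders chosen for illustration and do not come from the slides.

```python
from typing import List


def pull_cmd(image: str) -> List[str]:
    """Command that downloads an image from a registry."""
    return ["docker", "pull", image]


def build_cmd(tag: str, context_dir: str = ".") -> List[str]:
    """Command that builds an image from the Dockerfile in context_dir."""
    return ["docker", "build", "-t", tag, context_dir]


def push_cmd(image: str) -> List[str]:
    """Command that uploads an image to the registry named in its tag."""
    return ["docker", "push", image]


def run_cmd(image: str, name: str, command: List[str]) -> List[str]:
    """Command that starts a detached, named container."""
    return ["docker", "run", "-d", "--name", name, image, *command]


def terminate_cmds(name: str) -> List[List[str]]:
    """Commands that stop and then remove a container."""
    return [["docker", "stop", name], ["docker", "rm", name]]


if __name__ == "__main__":
    # Print the command lines for one full life cycle (placeholder names).
    steps = [
        pull_cmd("example/bwa"),
        run_cmd("example/bwa", "bwa-1", ["bwa"]),
        *terminate_cmds("bwa-1"),
    ]
    for step in steps:
        print(" ".join(step))
```

On a machine where Docker is installed, each list could be handed to `subprocess.run(cmd, check=True)` to execute the step.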
Why Docker
● Reduced execution overhead compared to traditional whole
machine virtualization
● Provides an effective solution to the image portability
problem.
● Virtual machine images running in Amazon are not compatible with those running in Google and vice versa, which leads directly to duplicated work in preparing new images for each deployment.
Challenges
● Extra layers need to be built on top of Docker to enable the use of HPC resources
(computer cluster) and multi-cloud platforms
● Deployment in different commercial clouds is not an easy task.
● Each cloud platform has different APIs and a different business model.
● Images need to be compatible with different providers
Contribution
● Define a use case scenario for using Docker within a computer cluster for bioinformatics workflows.
● Evaluate its performance in comparison to native hardware and conventional virtual machines, in private and public clouds.
● We also present a new version of our multicloud elasticHPC, referred to as elasticHPC-Docker, which can
1. enable the user to deploy and run multi-step whole analysis workflows,
2. create a computer cluster with Docker-based applications and define a use case scenario for that,
3. support the use of private clouds as well as commercial clouds like Amazon and Google.
Containers in the Cloud
Google
● Google Cloud offers a container service in the form of two products:
1. Container-optimized virtual machine images, which include programs to run standard Docker images according to a user-defined file in YAML format.
2. Google Kubernetes Engine (GKE), which creates a cluster of virtual machines that can run Docker images. GKE is based on pods.
● Google has established the Google Container Registry (GCR).
● Cost:
● The container-optimized images run at no extra cost; the user pays the usual price of the virtual machines.
● GKE also runs at no extra cost for small clusters; for clusters of more than 5 nodes it charges an extra fee of $0.15 per hour per cluster on top of the usual machine price.
● GKE has two limitations:
1. It does not support Docker’s private images.
2. The cluster size in GKE cannot exceed 100 nodes.
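The GKE pricing and size limit quoted on this slide can be turned into a small cost estimate. This is a sketch under the figures stated here only ($0.15 per hour per cluster above 5 nodes, at most 100 nodes); the function name and parameters are illustrative, and current Google pricing may differ.

```python
def gke_hourly_cost(nodes: int,
                    vm_price_per_hour: float,
                    cluster_fee_per_hour: float = 0.15,
                    free_nodes: int = 5,
                    max_nodes: int = 100) -> float:
    """Hourly cost in USD of a GKE cluster under the pricing quoted on the slide."""
    if nodes < 1 or nodes > max_nodes:
        raise ValueError(f"cluster size must be between 1 and {max_nodes} nodes")
    vm_cost = nodes * vm_price_per_hour
    # The per-cluster fee applies only to clusters larger than `free_nodes`.
    fee = cluster_fee_per_hour if nodes > free_nodes else 0.0
    return vm_cost + fee


if __name__ == "__main__":
    # Ten machines at a hypothetical price of $0.504 per hour each.
    print(f"{gke_hourly_cost(10, 0.504):.3f} USD/hour")
```

For a ten-node cluster the fee adds a flat $0.15 per hour regardless of machine type, so its relative weight shrinks as the machines get more expensive.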
Amazon
● Amazon provides Elastic Container Service (ECS).
● ECS enables the deployment of Docker containers on Amazon EC2.
● Amazon uses docker-compose to manage docker containers.
● Docker-compose facilitates the process of setting up a multi-container application
by defining the application and all its dependencies in a single file using YAML
format.
● The instantiated machines include programs to automatically configure the
Docker environment.
● Amazon has its own image registry.
● Cost:
● The user pays the same price as for the usual instance types.
● If the load balancing service is selected, the user pays a small extra cost of $0.025 per hour and $0.008 per GB transferred between instances.
● Limitations:
● It does not support attaching EBS volumes to the running containers.
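The ECS cost items on this slide combine into a simple hourly estimate. The sketch below uses only the figures quoted here ($0.025 per hour and $0.008 per GB when load balancing is selected); the function name and parameters are illustrative, and actual AWS billing has more components.

```python
def ecs_hourly_cost(instances: int,
                    instance_price_per_hour: float,
                    load_balancer: bool = False,
                    gb_transferred: float = 0.0,
                    lb_fee_per_hour: float = 0.025,
                    transfer_fee_per_gb: float = 0.008) -> float:
    """Cost in USD of running an ECS cluster for one hour, per the slide's figures."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    cost = instances * instance_price_per_hour
    if load_balancer:
        # Both extra charges apply only when load balancing is selected.
        cost += lb_fee_per_hour + transfer_fee_per_gb * gb_transferred
    return cost


if __name__ == "__main__":
    # Four instances at a hypothetical $0.532 per hour, 100 GB moved in that hour.
    print(f"{ecs_hourly_cost(4, 0.532, True, 100):.3f} USD/hour")
```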
ElasticHPC-Docker
Features
● Ability to port any Docker image to, and run it on, either private or commercial clouds.
● Creation and management of a cluster of containers. The cluster can use a single machine or multiple machines.
● The computer cluster can have nodes from different cloud providers; i.e., some nodes can come from Amazon and some from Google.
● Ability to create and destroy containers at run time. This makes it possible to run multiple containers on the same machine, one at a time.
● The package supports scaling the virtual machines (worker nodes) of a running cluster up or down.
ElasticHPC-Docker
Features (2)
● The package allows mounting of virtual disks and establishment of a shared file system for the containers (the default option is NFS). In AWS, we use EBS volumes, and in Google we use persistent storage disks.
● elasticHPC-Docker automatically configures a job scheduler (including security settings among the different providers) among the containers. The default job scheduler is PBS Torque, but SGE is also supported.
● The current package includes many Docker specification files (Dockerfiles) for the most important tools for NGS data analysis. These include FASTX, BWA, and GATK.
● It also includes a number of structural bioinformatics tools, including AutoDock, FRODOCK, AMBER, and GROMACS, among others.
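Since the default scheduler is PBS Torque, a workflow step would typically be handed to the cluster as a PBS job script. The sketch below renders such a script as text; it is an assumption about how a user might submit work, not elasticHPC-Docker's own interface. The function name, job name, resource values, and the BWA command line are illustrative.

```python
def render_pbs_script(job_name: str,
                      command: str,
                      nodes: int = 1,
                      cores_per_node: int = 8,
                      walltime: str = "01:00:00") -> str:
    """Return the text of a PBS Torque job script that runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                         # job name
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",                # maximum run time
        'cd "$PBS_O_WORKDIR"',                         # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical alignment step; file names are placeholders.
    print(render_pbs_script("align", "bwa mem -t 8 ref.fa reads.fq > out.sam"), end="")
```

The resulting file would be submitted with `qsub`; with a shared file system in place, the input and output paths are visible from every node.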
EHPC-Docker (Use Case)
[Figure: EHPC-Docker cluster layout. The EHPC client runs on the user's PC and communicates with the EHPC-VM Manager on port 5000. The master node and each worker node run an EHPC-VM Manager (port 5000) and a container whose container service is reached on port 5555; ports 1 to 4999 and 5001 to 65535 are used for communication among the containerized services. Every node has an attached data volume, and the nodes share a file system (block storage).]
1. User downloads the EHPC-Docker client.
2. User runs the client to create a cluster on a supported cloud:
a. The client starts the master node.
b. The master node creates the rest of the cluster in parallel.
c. The master node distributes the URL of the image registry.
d. Master and worker nodes retrieve the image and start the containers.
e. Once done, the master node sets up the ports and finalizes the configuration by setting up the job scheduler and the shared storage.
The cluster is ready.
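The start-up sequence (steps a to e) can be sketched as a small in-memory simulation. Nothing here calls a cloud API: `Node`, `Cluster`, `start_node`, `create_cluster`, and the registry URL are hypothetical stand-ins, and the workers are created through a thread pool only to mirror the parallel creation in step b.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    role: str                      # "master" or "worker"
    registry_url: str = ""
    container_running: bool = False


@dataclass
class Cluster:
    master: Node
    workers: List[Node]
    scheduler_ready: bool = False
    shared_storage_ready: bool = False

    @property
    def nodes(self) -> List[Node]:
        return [self.master, *self.workers]


def start_node(name: str, role: str) -> Node:
    """Stand-in for booting a virtual machine on a cloud provider."""
    return Node(name=name, role=role)


def create_cluster(n_workers: int, registry_url: str) -> Cluster:
    master = start_node("master", "master")                      # step a
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        workers = list(pool.map(                                 # step b
            lambda i: start_node(f"worker-{i}", "worker"),
            range(n_workers)))
    cluster = Cluster(master=master, workers=workers)
    for node in cluster.nodes:                                   # step c
        node.registry_url = registry_url
    for node in cluster.nodes:                                   # step d
        node.container_running = True  # stands for: pull image, start container
    cluster.scheduler_ready = True                               # step e
    cluster.shared_storage_ready = True
    return cluster


if __name__ == "__main__":
    cluster = create_cluster(3, "registry.example.org/ngs-tools")
    print([node.name for node in cluster.nodes])
```

Keeping the registry URL as the only thing distributed to the nodes reflects the design on the slide: every node retrieves the same image itself instead of receiving a copy from the master.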
Experiments
● We conducted two experiments:
1. Measure the time for establishing container clusters over different cloud platforms.
2. Measure the performance of using Docker when running the variant detection workflow.
Experiment 1
1. GKE is faster than ECS
2. elasticHPC is faster than GKE
3. elasticHPC is close to ECS
Experiment 2
● For this experiment, we used an exome dataset from DePristo et al. of about 9 GB.
● (The exome is the set of NGS reads sequenced only from the coding regions of a genome.)
● The workflow was executed three times independently, on Google, AWS, and a private cloud based on OpenStack.
● In each cloud, the 9 GB input data is divided into blocks to be processed in parallel over the cluster nodes.
● For a fair comparison, we used machines with specifications as similar as possible:
● Amazon: m3.2xlarge (8 cores, Intel 2.5 GHz, 30 GB RAM, SSD disks, $0.532/hour),
● Google: n1-highmem-8 (8 cores, Intel 2.5 GHz, 52 GB RAM, SSD disks, $0.504/hour),
● OpenStack: a local machine with 8 cores and 56 GB RAM.
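Dividing the input into blocks for parallel processing can be illustrated on FASTQ data, where each read occupies four lines. The sketch assumes the reads are in FASTQ format and held in memory as a list of lines; a 9 GB file would be streamed instead. The slides do not say how elasticHPC-Docker itself splits the data, so the function below is only one plausible scheme: whole reads are distributed so that block sizes differ by at most one read.

```python
from typing import List

LINES_PER_READ = 4  # FASTQ record: header, sequence, '+', qualities


def split_fastq_lines(lines: List[str], n_blocks: int) -> List[List[str]]:
    """Split FASTQ lines into n_blocks contiguous blocks of whole reads."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    if len(lines) % LINES_PER_READ != 0:
        raise ValueError("input does not contain whole FASTQ records")
    n_reads = len(lines) // LINES_PER_READ
    base, extra = divmod(n_reads, n_blocks)
    blocks: List[List[str]] = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks receive one additional read.
        size = base + (1 if i < extra else 0)
        end = start + size * LINES_PER_READ
        blocks.append(lines[start:end])
        start = end
    return blocks


if __name__ == "__main__":
    reads = []
    for i in range(5):
        reads += [f"@read{i}", "ACGT", "+", "IIII"]
    for block in split_fastq_lines(reads, 2):
        print(len(block) // LINES_PER_READ, "reads")
```

Splitting on read boundaries matters because a block that starts in the middle of a record cannot be parsed by the aligner on the receiving node.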
Experiment 2
Physical Servers
Docker performance is very close to that of the physical servers
Experiment 2
Google Cloud
ElasticHPC is faster than GCE Containers
Experiment 2
Amazon Cloud
ElasticHPC is very close to Amazon ECS
Conclusion
● We introduced elasticHPC-Docker based on container technology.
● Our package enables the creation of a computer cluster with containerized applications and workflows in private and in different commercial clouds using a single interface.
● It includes options to run bioinformatics applications and workflows for large datasets.
● Through container technology, elasticHPC-Docker provides an efficient solution to interoperability among commercial clouds.
● It is efficient in practice, with reduced overhead, especially on local infrastructures.
● It is available on http://www.elastichpc.org
Thank You

 

Último (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

The Case For Docker In Multi-Cloud Enabled Bioinformatics Applications

  • 1. The case for Docker in multi-cloud enabled bioinformatics applications
      Ahmed Ali, Mohamed M. ElKalioby, Mohamed Abouelhoda
      Nile University, Egypt
      Presented by Mohamed M. El-Kalioby, MSc
  • 2. Introduction
      ● Next-generation sequencing (NGS) technology has changed traditional bioinformatics practice.
      ● Sophisticated multi-step workflows are used to transform raw sequence data into knowledge.
      ● One NGS workflow can include tens of tasks and hundreds of information sources, integrated to achieve the analysis goals.
      ● The Medical Variant Detection Workflow is an example of such a workflow.
  • 3. Medical Variant Detection Workflow (MVDW)
  • 4. Medical Variant Detection Workflow (2)
      ● Multiple versions and instances of the workflow are needed.
      ● Tools and parameters can change:
          ● per user, since each user may require certain modules, annotation databases, and special post-processing;
          ● per experiment type, e.g., whole genome, whole exome, or RNA-seq, in single or multiplexed mode;
          ● per sequencing platform: Illumina, Ion Torrent, or any other.
  • 5. Requirements
      ● An efficient, dynamic deployment strategy.
      ● The deployed system should use HPC resources.
      ● The system should be able to consume cloud computing resources (private and public clouds).
  • 6. Virtualization Technology
      ● The whole system, with all modules, databases, and related dependencies, is packaged in a virtual machine (VM) image.
      ● These images can then be used to instantiate a virtual machine running in a private or public cloud.
      ● Examples from sequence analysis:
          ● Crossbow for NGS read alignment and SNP calling,
          ● RSD-Cloud for comparative genomics,
          ● … many more.
  • 7. Virtualization Technology (2)
      ● The traditional engine for running virtual machine instances is one of:
          ● Oracle VirtualBox,
          ● KVM,
          ● Xen hypervisor,
          ● VMware.
  • 8. Docker
      ● Docker provides a new level of virtualization.
      ● The computing machine (including the operating system) is not virtualized.
      ● Only the application and its related dependencies are encapsulated in a "virtual" isolated process.
      [Figure: (a) software stack with virtual machines: infrastructure, operating system, hypervisor, VM1 … VMn, APP1 … APPn; (b) software stack with containers: infrastructure, operating system, container engine, containers running APP1 … APPn.]
  • 9. Usage of Docker
      [Figure: Docker client and Docker server (daemon). Client operations: pull image, build image, run container, terminate container. The daemon downloads/uploads images with the Docker public registry, builds/pushes container images to a local registry, and runs containers on the operating system of the infrastructure.]
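
The client operations named on this slide correspond to standard Docker CLI subcommands. The following is a minimal sketch in Python that only assembles the command lines and does not execute them. The base image, build directory, registry host, and container name are hypothetical placeholders, not values from the talk.

```python
import shlex
from typing import List


def docker_lifecycle(base_image: str, build_dir: str, tag: str,
                     container: str, command: List[str]) -> List[List[str]]:
    """Return the docker CLI calls for pull, build, push, run, and terminate."""
    return [
        # Pull an image from a registry.
        ["docker", "pull", base_image],
        # Build an image from the Dockerfile in build_dir and tag it.
        ["docker", "build", "-t", tag, build_dir],
        # Upload the image to the registry named in the tag.
        ["docker", "push", tag],
        # Run a detached, named container from the image.
        ["docker", "run", "-d", "--name", container, tag, *command],
        # Terminate the container.
        ["docker", "stop", container],
    ]


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    steps = docker_lifecycle(
        base_image="ubuntu:14.04",
        build_dir="./bwa",
        tag="registry.example.org/ngs/bwa:latest",
        container="bwa-job",
        command=["bwa", "index", "/data/ref.fa"],
    )
    for argv in steps:
        print(shlex.join(argv))
```

Keeping each call as an argument list rather than a shell string means the same lists could be passed to `subprocess.run` without quoting problems.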
  • 10. Why Docker
      ● Reduced execution overhead compared to traditional whole-machine virtualization.
      ● Provides an effective solution to the image portability problem.
      ● Virtual machine images that run on Amazon are not compatible with those that run on Google, and vice versa, which leads directly to duplicated work in preparing new images for each deployment.
  • 11. Challenges
      ● Extra layers need to be built on top of Docker to enable the use of HPC resources (computer clusters) and multi-cloud platforms.
      ● Deployment in different commercial clouds is not an easy task.
      ● Each cloud platform has different APIs and a different business model.
      ● Images are compatible with different providers.
  • 12. Contribution
      ● Define a use case scenario for using Docker within a computer cluster for bioinformatics workflows.
      ● Evaluate its performance in comparison to native hardware and conventional virtual machines, in private and public clouds.
      ● Present a new version of our multi-cloud elasticHPC, referred to as elasticHPC-Docker, which can:
          1. enable the user to deploy and run whole multi-step analysis workflows,
          2. create a computer cluster with Docker-based applications and define a use case scenario for it,
          3. support private clouds as well as commercial clouds such as Amazon and Google.
  • 13. Containers in the Cloud
  • 14. Google
      ● Google Cloud offers a container service in the form of two products:
          1. Container-optimized virtual machine images, which include programs to run standard Docker images according to a user-defined file in YAML format.
          2. Google Kubernetes Engine (GKE), which creates a cluster of virtual machines that can run Docker images. GKE is based on pods.
      ● Google has established the Google Container Registry (GCR).
      ● Cost:
          ● The container-optimized images and GKE run at no extra cost; the user pays the usual price of the virtual machines.
          ● For clusters larger than 5 nodes, GKE charges an extra fee of $0.15 per hour per cluster on top of the usual machine price.
      ● GKE has two limitations:
          1. It does not support Docker's private images.
          2. The cluster size in GKE cannot exceed 100 nodes.
  • 15. Amazon
      ● Amazon provides the Elastic Container Service (ECS).
      ● ECS enables the deployment of Docker containers on Amazon EC2.
      ● Amazon uses docker-compose to manage Docker containers.
      ● docker-compose simplifies setting up a multi-container application by defining the application and all its dependencies in a single YAML file.
      ● The instantiated machines include programs that automatically configure the Docker environment.
      ● Amazon has its own image registry.
      ● Cost:
          ● The user pays the same price as for the usual instance types.
          ● If the load balancing service is selected, the user pays a small extra cost of $0.025 per hour and $0.008 per GB transferred between instances.
      ● Limitations:
          ● It does not support attaching EBS volumes to running containers.
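
The cost rules quoted for Google and Amazon can be turned into a small calculator. This is only a sketch of the arithmetic using the figures stated on the slides (the GKE fee and node limits, the ECS load balancer charges); the function names are illustrative and current provider pricing may differ.

```python
GKE_FEE_PER_CLUSTER_HOUR = 0.15   # USD, charged only above the free node limit
GKE_FREE_NODE_LIMIT = 5           # clusters up to this size pay no extra fee
GKE_MAX_NODES = 100               # cluster size limit quoted on the Google slide

ECS_LB_FEE_PER_HOUR = 0.025       # USD, only if load balancing is selected
ECS_LB_FEE_PER_GB = 0.008         # USD per GB transferred between instances


def gke_cost(nodes: int, vm_price_per_hour: float, hours: float) -> float:
    """Cost of a GKE cluster: machine price plus the per-cluster fee."""
    if not 1 <= nodes <= GKE_MAX_NODES:
        raise ValueError("GKE cluster size must be between 1 and 100 nodes")
    fee = GKE_FEE_PER_CLUSTER_HOUR if nodes > GKE_FREE_NODE_LIMIT else 0.0
    return hours * (nodes * vm_price_per_hour + fee)


def ecs_cost(nodes: int, instance_price_per_hour: float, hours: float,
             load_balancer: bool = False, gb_transferred: float = 0.0) -> float:
    """Cost of an ECS cluster: instance price plus optional load balancing."""
    if nodes < 1:
        raise ValueError("cluster needs at least one node")
    cost = hours * nodes * instance_price_per_hour
    if load_balancer:
        cost += hours * ECS_LB_FEE_PER_HOUR + gb_transferred * ECS_LB_FEE_PER_GB
    return cost


if __name__ == "__main__":
    # Example: 8 nodes for one hour at the hourly prices quoted for Experiment 2.
    print(f"GKE: ${gke_cost(8, 0.504, 1):.3f}")
    print(f"ECS: ${ecs_cost(8, 0.532, 1, load_balancer=True, gb_transferred=9):.3f}")
```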
  • 16. ElasticHPC-Docker Features
      ● Ability to port any Docker image to private or commercial clouds and run it there.
      ● Creation and management of a cluster of containers. The cluster can use a single machine or multiple machines.
      ● The computer cluster can have nodes from different cloud providers; i.e., some nodes can come from Amazon and some from Google.
      ● Ability to create and destroy containers at run time. This makes it possible to run multiple containers on the same machine, one at a time.
      ● The package supports scaling the virtual machines (worker nodes) of a running cluster up and down.
  • 17. ElasticHPC-Docker Features (2)
      ● The package allows mounting virtual disks and establishing a shared file system for the containers (NFS by default). On AWS we use EBS volumes, and on Google we use persistent storage disks.
      ● elasticHPC-Docker automatically configures a job scheduler among the containers, including the security settings across the different providers. The default job scheduler is PBS Torque, but SGE is also supported.
      ● The current package includes many Docker specification files (Dockerfiles) for the most important NGS data analysis tools, including Fastx, BWA, and GATK.
      ● It also includes a number of structural bioinformatics tools, including AutoDock, Frodock, AMBER, and GROMACS, among others.
  • 18. EHPC-Docker (Use Case)
      [Figure: the EHPC client runs on the user's PC and communicates with the EHPC-VM Manager. One master node and three worker nodes (also labelled "Slave Node") each run an EHPC-VM Manager and a container and have an attached data volume; each node is annotated with ports 5000, 5555, 1:4999, and 5001:65535. Further labels: communication with the container service, communication among containerized services, shared file system (block storage).]
      1. The user downloads the EHPC-Docker client.
      2. The user runs the client to create a cluster on a supported cloud:
          a. The client starts the master node.
          b. The master node creates the rest of the cluster in parallel.
          c. The master node distributes the URL of the image registry.
          d. The master and worker nodes retrieve the image and start the containers.
          e. Once done, the master node sets up the ports and finalizes the configuration by setting up the job scheduler and the shared storage.
      The cluster is then ready.
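
The cluster creation steps (a) to (e) can be sketched as a small simulation. This is not the elasticHPC-Docker implementation: `start_node` is a hypothetical stand-in for a cloud API call, the registry URL is a placeholder, and no real machines or containers are created. The scheduler and file system defaults are the ones named on the features slide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    role: str                      # "master" or "worker"
    registry_url: str = ""
    container_running: bool = False


def start_node(name: str, role: str) -> Node:
    """Hypothetical stand-in for launching a VM through a cloud provider API."""
    return Node(name=name, role=role)


def create_cluster(n_workers: int, registry_url: str) -> Dict[str, object]:
    # (a) The client starts the master node.
    master = start_node("master", "master")

    # (b) The master node creates the rest of the cluster in parallel.
    with ThreadPoolExecutor() as pool:
        workers: List[Node] = list(
            pool.map(lambda i: start_node(f"worker-{i}", "worker"),
                     range(n_workers)))
    nodes = [master, *workers]

    # (c) The master node distributes the URL of the image registry.
    for node in nodes:
        node.registry_url = registry_url

    # (d) Every node retrieves the image and starts its container (simulated).
    for node in nodes:
        node.container_running = True

    # (e) Final configuration: job scheduler and shared storage defaults.
    return {"nodes": nodes, "scheduler": "PBS Torque", "shared_fs": "NFS"}


if __name__ == "__main__":
    cluster = create_cluster(3, "registry.example.org/ngs")
    for node in cluster["nodes"]:
        print(node.name, node.role, node.container_running)
```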
  • 19. Experiments
      ● We conducted two experiments:
          1. Measure the time to establish container clusters on different cloud platforms.
          2. Measure the performance of Docker when running the variant detection workflow.
  • 20. Experiment 1
      1. GKE is faster than ECS.
      2. elasticHPC is faster than GKE.
      3. elasticHPC is close to ECS.
  • 21. Experiment 2
      ● For this experiment, we used an exome dataset from DePristo et al. of about 9 GB.
      ● (The exome is the set of NGS reads sequenced only from the coding regions of a genome.)
      ● The workflow was executed three times independently on Google, AWS, and a private cloud based on OpenStack.
      ● In each cloud, the 9 GB input is divided into blocks that are processed in parallel over the cluster nodes.
      ● For a fair comparison, we used machines with specifications as similar as possible:
          ● Amazon: m3.2xlarge (8 cores, Intel 2.5 GHz, 30 GB RAM, SSD disks, $0.532/hour),
          ● Google: n1-highmem-8 (8 cores, Intel 2.5 GHz, 52 GB RAM, SSD disks, $0.504/hour),
          ● OpenStack: a local machine with 8 cores and 56 GB RAM.
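
Dividing the input into blocks for parallel processing can be sketched as follows. The slides do not say how the blocks were cut, so this assumes the simplest scheme, near-equal byte ranges with one block per node; the function name is illustrative.

```python
from typing import List, Tuple


def split_into_blocks(total_bytes: int, n_nodes: int) -> List[Tuple[int, int]]:
    """Split [0, total_bytes) into n_nodes contiguous half-open ranges
    whose sizes differ by at most one byte."""
    if total_bytes < 0 or n_nodes < 1:
        raise ValueError("need total_bytes >= 0 and n_nodes >= 1")
    base, extra = divmod(total_bytes, n_nodes)
    blocks = []
    start = 0
    for i in range(n_nodes):
        # The first `extra` blocks absorb the remainder, one byte each.
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Example: roughly 9 GB of input spread over a hypothetical 4-node cluster.
    for start, end in split_into_blocks(9 * 10**9, 4):
        print(start, end)
```

If the input is a FASTQ file, a real splitter would additionally have to move each cut to a record boundary, since one read spans four lines.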
  • 22. Experiment 2: Physical Servers
      ● Docker performance is very close to that of the physical servers.
  • 23. Experiment 2: Google Cloud
      ● ElasticHPC is faster than GCE containers.
  • 24. Experiment 2: Amazon Cloud
      ● ElasticHPC is very close to Amazon ECS.
  • 25. Conclusion
      ● We introduced elasticHPC-Docker, based on container technology.
      ● Our package enables the creation of a computer cluster with containerized applications and workflows in private and in different commercial clouds through a single interface.
      ● It includes options to run bioinformatics applications and workflows on large datasets.
      ● Through container technology, elasticHPC-Docker provides an efficient solution to interoperability among commercial clouds.
      ● It is efficient in practice, with reduced overhead, especially on local infrastructure.
      ● It is available at http://www.elastichpc.org