Latency is inevitable, and the best you can do is try to hide it. For some High Performance Computing (HPC) applications, latency can be hidden well in a cloud computing context; for others, not so much. In addition to providing a template you can apply to 'cloudifying' your own applications, we share emerging and future prospects for cloud-native applications; not surprisingly, containers (e.g., Docker) factor in. To listen to the webinar, please visit: http://www.univa.com/resources/webinar-hpc-in-the-cloud.php. To view the Q&A, please see http://blogs.univa.com/2016/02/webinar-qa-part1-hpc-in-the-cloud/ and http://blogs.univa.com/2016/02/webinar-qa-part2-hpc-in-the-cloud/.
2. Agenda
Commoditized cloud applications
Latency: The gating factor for HPC-in-the-Cloud
A very brief taxonomy of clouds
Latency notwithstanding …
Embarrassingly parallel HPC applications
The impact of large data volumes
The impact of execution time
Distributed-memory parallel HPC applications - MPI and beyond
The impact of containerization
The past, present and future viability of HPC in the cloud
11. Latency is physically a consequence of the limited velocity with which any physical interaction can propagate.
https://en.wikipedia.org/wiki/Latency_(engineering)
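The physical floor on latency can be estimated from distance alone. The following is a minimal sketch, assuming signals travel through optical fibre at roughly two-thirds of the speed of light in vacuum (a common rule of thumb, not a figure from the slides). The function name and the 4,000 km example distance are illustrative.

```python
# Lower bound on round-trip latency imposed by signal propagation speed.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBRE_VELOCITY_FACTOR = 2 / 3          # assumption: rule of thumb for optical fibre


def min_round_trip_ms(distance_km, velocity_factor=FIBRE_VELOCITY_FACTOR):
    """Return the minimum round-trip time in milliseconds over distance_km."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000.0


if __name__ == "__main__":
    # Hypothetical 4,000 km path between a data center and a cloud region.
    print(f"{min_round_trip_ms(4000):.1f} ms")
```

Under these assumptions the 4,000 km path cannot do better than about 40 ms per round trip, regardless of how much hardware is provisioned at either end.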
16. … if you have a network link with low bandwidth then it's an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.
It's the Latency, Stupid
https://rescomp.stanford.edu/~cheshire/rants/Latency.html
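The point of the quote can be made with a simple cost model. This is a sketch, assuming the time to move a message is one latency term plus size divided by aggregate bandwidth; the model, the function name and the example numbers are illustrative assumptions rather than anything taken from the webinar.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s, links=1):
    """Time to move size_bytes over `links` identical links used in parallel.

    Parallel links multiply the available bandwidth, but the latency
    term is paid in full no matter how many links there are.
    """
    if links < 1:
        raise ValueError("links must be at least 1")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth_bytes_per_s must be positive")
    return latency_s + size_bytes / (bandwidth_bytes_per_s * links)


if __name__ == "__main__":
    # Hypothetical link: 50 ms latency, 1 MB/s bandwidth.
    for size in (1_000, 1_000_000_000):
        one = transfer_time_s(size, 0.05, 1e6, links=1)
        eight = transfer_time_s(size, 0.05, 1e6, links=8)
        print(f"{size} bytes: 1 link {one:.4f} s, 8 links {eight:.4f} s")
```

In this model, large transfers benefit almost linearly from extra links, while small messages stay pinned near the 50 ms latency floor. Small, frequent messages are typical of tightly coupled MPI codes.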
20. www.univa.com
Cloud Taxonomy
Private Clouds
o Use containers and VMs to increase data center workflow by dynamically optimizing the configuration of the cluster based on job priority
Hybrid Clouds
o Combine servers in the cloud with a company’s data center servers, making it look like one seamless cluster
Public Clouds
o Quickly provision a cluster in the Cloud, and pay only for what you need
21. Use Cases
Building a physical Univa Grid Engine cluster
Creating a Univa Grid Engine cluster on Google Compute, Amazon EC2, Azure, OpenStack, …
Mixed clusters with more than one Cloud provider
Creating a mixed physical and VMware virtual Univa Grid Engine cluster on your own hardware
Creating an internal cluster that can ‘burst out’ to the Cloud on demand
24. Case Study: The Broad Institute
Challenge: Augment on-premises HPC resources with a cost-effective, scalable cloud-based offering for bioinformatics workloads
Solution: 50K cores on Google Compute Engine via Cycle Computing and Univa Grid Engine
Results
Ran 30 years of cancer research calculations in just a few hours
Made use of 1.4 million sequenced or genotyped biological samples
http://www.nextplatform.com/2015/09/08/google-cycle-computing-pair-for-broad-genomics-effort/
30. MPI Apps Remain a Challenge …
… for cloud use and containerization
Constrain MPI apps to mitigate concerns with latency
Run HPC on-premises OR in a cloud, but not between
Containers?
o Just say no???
Seek alternatives
Apache Spark ???
Message buses ???
Shifter ???
36. HPC as a Containerized Cloud-Based Service
http://insidehpc.com/2015/11/ubercloud-delivers-cae-as-a-service-with-univa-grid-engine-container-edition/
37. Cloud Native Computing Foundation (CNCF)
For current applications and services
Uptake of cloud computing remains an afterthought from a systems-architecture perspective
CNCF aims to introduce a cloud-native paradigm shift that emphasizes:
Containerization
Dynamic scheduling
Orientation around microservices
Making use of Kubernetes as a ‘seed technology’
#1 priority: Integrate the orchestration layer of the container ecosystem
Univa is a Founding Member
Along with Google, IBM, Intel, Red Hat and numerous others ...
Prototype implementations becoming available
https://cncf.io/
48. GPUs in the Cloud? The Top Four Reasons
1. You can realize possibilities using the cloud
a. You can scale up and scale out
2. You still realize the promise of GPU programmability
a. … via HPC in the cloud
3. Your use of the cloud is transparent
a. You’ve found ways to ‘hide’ latency
i. Constraints apply for MPI apps
4. Your go-to apps still work in the cloud
http://info.brightcomputing.com/Blog/bid/196290/The-Top-4-Reasons-You-Should-Try-Cloud-Based-GPUs-for-HPC
50. Docker
What is Docker?
Docker is a tool that packages an application, its filesystem, and all other dependencies into an easily distributable software package that can be installed and run on any modern Linux server.
What is a Software Container?
Similar to a Virtual Machine, but a single Operating System is shared.
o Faster than Virtual Machines
o Less overhead than Virtual Machines
o You can run more Software Containers on a machine than VMs.
Not a new concept: Sun Microsystems has ‘Solaris Zones’.
Why is Docker different?
54. Kubernetes
What is Kubernetes?
Kubernetes is a workload and service orchestration tool for containerized applications and services running on a cluster or cloud infrastructure.
Where did it come from?
It is derived from research work Google has been doing (called Omega), drawing on the experience Google has gained with its own in-house orchestration system (Borg) over the past 10+ years.
Why is it important?
Google wants Kubernetes to become a standard container orchestration platform for Clouds and Enterprises.
Running multiple containers on multiple machines is hard; you need Kubernetes.
56. “The wonderful thing about standards is that there are so many of them to choose from.”
https://en.wikiquote.org/wiki/Grace_Hopper
What most people think when “Cloud Computing” is mentioned …
What about computing? And, more importantly, HPC???
http://img04.deviantart.net/bd14/i/2013/046/3/d/space_the_final_frontier_by_unusualsuspex-d5v0h8m.jpg
There is a final frontier to consider ...
The final, make that ULTIMATE, frontier when it comes to HPC in the cloud.
Thanks to Jim Freemantle (OARS) for suggesting this illustration based on gaming.
Source: http://t2.rbxcdn.com/a9edb551eb372d1049b53bf66ca8e494
What is latency? The elapsed time between stimulus and response.
The ultimate limit …
Back to Star Trek for ideas …
https://improvdandies.files.wordpress.com/2014/06/cloaking-device-joke-section42.jpg
Latency can be ‘hidden’ …
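One common way to hide latency is to overlap communication with computation. The following is a minimal sketch of that idea, using a background thread to prefetch the next chunk of data while the current chunk is processed. The `fetch` and `compute` functions are hypothetical stand-ins (a sleep and a sum), not part of any Univa product.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(chunk_id, latency_s):
    """Stand-in for a remote read: waits latency_s, then returns data."""
    time.sleep(latency_s)
    return [chunk_id] * 4


def compute(chunk):
    """Stand-in for local work on one chunk."""
    return sum(chunk)


def run_overlapped(n_chunks, latency_s):
    """Prefetch chunk i+1 in the background while computing on chunk i."""
    if n_chunks <= 0:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0, latency_s)
        for i in range(n_chunks):
            chunk = pending.result()  # blocks only if the fetch is unfinished
            if i + 1 < n_chunks:
                pending = pool.submit(fetch, i + 1, latency_s)
            total += compute(chunk)   # runs while the next fetch is in flight
    return total


if __name__ == "__main__":
    print(run_overlapped(n_chunks=3, latency_s=0.01))
```

The latency is fully hidden only when the work per chunk takes at least as long as the fetch. With too little computation per chunk, the loop still stalls on `pending.result()`.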
+ The Cloud …
Granularity is a measure of the amount of computation that can take place before there is a need for synchronization or communication. Thus the ratio computation/communication serves as a proxy for the vertical axis of the figure.
Concurrency refers to an ability to carry out activities simultaneously. In other words, it is a measure of the degree of parallelism that is present.
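The granularity ratio can be turned into a rough estimate of how much wall-clock time goes to useful work. This is a sketch under a simplifying assumption that computation and communication alternate and do not overlap. The function names and the example timings are illustrative.

```python
def granularity(compute_s, communicate_s):
    """Ratio of computation time to communication time per step."""
    if communicate_s <= 0:
        raise ValueError("communicate_s must be positive")
    return compute_s / communicate_s


def compute_fraction(compute_s, communicate_s):
    """Fraction of wall time spent computing when there is no overlap."""
    return compute_s / (compute_s + communicate_s)


if __name__ == "__main__":
    # Hypothetical coarse-grained job: 10 s of work per 0.1 s of communication.
    print(granularity(10.0, 0.1), compute_fraction(10.0, 0.1))
    # Hypothetical fine-grained job: 1 ms of work per 50 ms of communication.
    print(granularity(0.001, 0.05), compute_fraction(0.001, 0.05))
```

Under these assumptions the coarse-grained job spends about 99% of its time computing and the fine-grained job about 2%. Fine-grained applications are therefore the ones most exposed to cloud latency.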
Description: The best machine is the one that already has most or all of the Docker image downloaded.
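This placement rule can be sketched as a simple selection over hosts. The code below is a hypothetical illustration, assuming each host reports the set of image layers it has cached. It is not Univa Grid Engine's actual scheduling logic, and all names and layer identifiers are invented for the example.

```python
def cached_fraction(image_layers, cached_layers):
    """Fraction of the image's layers that are already present on a host."""
    layers = set(image_layers)
    if not layers:
        return 1.0
    return len(layers & set(cached_layers)) / len(layers)


def pick_best_host(image_layers, host_caches):
    """Return the host with the largest cached fraction of the image.

    host_caches maps a host name to the set of layer ids it holds.
    Ties are broken alphabetically by host name to keep the choice stable.
    """
    if not host_caches:
        raise ValueError("no hosts to choose from")
    return max(sorted(host_caches),
               key=lambda host: cached_fraction(image_layers, host_caches[host]))


if __name__ == "__main__":
    hosts = {"node-b": {"l1"}, "node-a": {"l1", "l2"}, "node-c": set()}
    print(pick_best_host(["l1", "l2", "l3"], hosts))
```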
Description: Allow a user or Administrator to run any Docker Container in a Grid Engine Cluster.
Description: Running a container is very similar to running a standard batch job in Grid Engine. Containers provide a useful mechanism for running complex applications, and in Grid Engine you can put limits on the runtime, memory and CPU usage of a container to ensure it does not consume all of the resources on the machine.
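To illustrate what such limits look like at the container level, here is a sketch that assembles a `docker run` command line with the Docker CLI's `--memory` and `--cpus` options. It shows plain Docker usage and not Grid Engine's submission syntax. The helper name is hypothetical, and a wall-clock limit is left out because it would be enforced by the scheduler and not by a `docker run` flag.

```python
def docker_run_argv(image, command, memory=None, cpus=None):
    """Build an argv list for `docker run` with optional resource limits.

    memory: a Docker size string such as "2g" (passed to --memory).
    cpus:   number of CPUs as a positive number (passed to --cpus).
    """
    argv = ["docker", "run", "--rm"]
    if memory is not None:
        argv += ["--memory", memory]
    if cpus is not None:
        if cpus <= 0:
            raise ValueError("cpus must be positive")
        argv += ["--cpus", str(cpus)]
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Illustrative image and command; nothing is executed here.
    print(" ".join(docker_run_argv("ubuntu:22.04", ["echo", "hi"],
                                   memory="2g", cpus=1.5)))
```

The resulting list can be passed to `subprocess.run` on a machine where Docker is installed.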
Description: Docker Containers may require input files and generate output or error files. Since those files live inside the container, they are not normally available to the end user outside of it.
Description: Univa Grid Engine can run interactive commands in a cluster. Many organizations use this to run tools across the cluster from their custom scripts. Extending this to create a container and then run the interactive command gives Administrators more control over how their end users run applications in the Grid Engine Cluster.
Description: Keeping track of the resources used by a Docker Container allows companies to ensure that each project and team receives the correct amount of compute resources based on the business needs of the organization.
+ The Cloud …
Digging into traditional cloud apps in a little more detail …
For more on AJAX, please see https://en.wikipedia.org/wiki/Ajax_(programming).
A simple perspective of a typical cloud app ...
http://www0.cloudbootcamp.com/node/660946
http://ianlumb.files.wordpress.com/2008/04/desktop-software-figures0031.png
A simple perspective of a typical cloud app with sync’d data …
Google Gears has been supplanted by an analogous capability that was implemented in HTML5 (e.g., http://gearsblog.blogspot.ca/2011/03/stopping-gears.html).
http://ianlumb.files.wordpress.com/2008/04/desktop-software-figures0041.png
Alternate perspective ;-)
Many HPC apps, on the ground or in the cloud, are latency intolerant.
http://rusvesna.su/sites/default/files/styles/orign_wm/public/tiraspol_atakuet_kiev_i_kishinev_vvedeniem_poshliny_na_moloko.jpg?itok=sb6_OU8l - the milk part