www.univa.com
February 2016
High Performance
Computing in the
Cloud?
RECORDING
Q&A – Part 1 & Part 2
Agenda
• Commoditized cloud applications
• Latency: The gating factor for HPC-in-the-Cloud
• A very brief taxonomy of clouds
• Latency notwithstanding …
• Embarrassingly parallel HPC applications
• The impact of large data volumes
• The impact of execution time
• Distributed-memory parallel HPC applications - MPI and beyond
• The impact of containerization
• The past, present and future viability of HPC in the cloud
HPC???
http://img04.deviantart.net/bd14/i/2013/046/3/d/space_the_final_frontier_by_unusualsuspex-d5v0h8m.jpg
Latency
Latency is physically a consequence of the
limited velocity with which any physical
interaction can propagate.
https://en.wikipedia.org/wiki/Latency_(engineering)
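As a rough illustration of the definition above, the smallest possible delay over a path is its length divided by the propagation speed. The sketch below assumes signals in optical fiber travel at about two-thirds of the speed of light in vacuum; the 4,000 km path is a made-up example, not a figure from the talk.

```python
# Lower bound on latency from propagation speed alone
# (ignores switching, queuing and software overheads).
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3              # assumption: signal speed in fiber is ~2/3 c


def min_round_trip_ms(distance_km: float, factor: float = FIBER_FACTOR) -> float:
    """Return the smallest possible round-trip time in milliseconds."""
    one_way_s = distance_km / (C_VACUUM_KM_PER_S * factor)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Hypothetical 4,000 km path between a data center and a cloud region.
    print(f"{min_round_trip_ms(4000):.1f} ms")
```

No amount of hardware spending pushes a real link below this bound, which is why the talk treats latency as the gating factor.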
http://www.cartoonsidrew.com/2011/05/einsteins-speed-limit.html
Current consumer devices have appallingly bad latency …
It's the Latency, Stupid
https://rescomp.stanford.edu/~cheshire/rants/Latency.html
Latency
… if you have a network link with low bandwidth then
it's an easy matter of putting several in parallel to make
a combined link with higher bandwidth, but if you have
a network link with bad latency then no amount of
money can turn any number of them into a link with
good latency.
It's the Latency, Stupid
https://rescomp.stanford.edu/~cheshire/rants/Latency.html
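The point of the quote can be sketched with a simple cost model: delivery time is latency plus message size divided by bandwidth. Bonding several links multiplies the bandwidth term only. The function name and the numbers in the demo are illustrative assumptions, not measurements.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bps, links=1):
    """Time to deliver one message over `links` identical bonded links.

    Bonding multiplies bandwidth but leaves latency untouched.
    """
    return latency_s + (size_bytes * 8) / (bandwidth_bps * links)


if __name__ == "__main__":
    # Assumed link: 50 ms latency, 10 Mbit/s bandwidth.
    for links in (1, 10, 100):
        small = transfer_time_s(1_000, 0.05, 10e6, links)
        large = transfer_time_s(1_000_000_000, 0.05, 10e6, links)
        print(f"links={links:3d}  1 kB: {small:.4f} s  1 GB: {large:.1f} s")
```

Large transfers speed up almost linearly with the number of links, while small messages never complete faster than the 50 ms latency floor.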
Latency
… and the best we can do? Try to ‘HIDE’ it!
Cloud Taxonomy
• Private Clouds
  o Use containers and VMs to increase data center workflow by dynamically optimizing the configuration of the cluster based on job priority
• Hybrid Clouds
  o Combine servers in the cloud with a company’s data center servers, making it look like one seamless cluster
• Public Clouds
  o Quickly provision a cluster in the Cloud, and pay only for what you need
Use Cases
• Building a physical Univa Grid Engine cluster
• Creating a Univa Grid Engine cluster on Google Compute, Amazon EC2, Azure, OpenStack, …
• Mixed clusters with more than one Cloud provider
• Creating a mixed physical and VMware virtual Univa Grid Engine cluster on your own hardware
• Creating an internal cluster that can ‘burst out’ to the Cloud on demand
http://c59951.r51.cf2.rackcdn.com/4994-1182-lumb.pdf
Case Study: The Broad Institute
Challenge: Augment on-premise HPC resources with a cost-effective, scalable, cloud-based offering for bioinformatics workloads
Solution: 50K cores on Google Compute Engine via Cycle Computing and Univa Grid Engine
Results
• Ran 30 years of cancer research calculations in just a few hours
• Made use of 1.4 million sequenced or genotyped biological samples
http://www.nextplatform.com/2015/09/08/google-cycle-computing-pair-for-broad-genomics-effort/
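Workloads like this scale to tens of thousands of cores because each task is independent of every other task. A minimal sketch of that pattern follows; `analyze_sample` is a hypothetical stand-in for one unit of analysis, and a thread pool on one machine stands in for a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_sample(sample_id: int) -> int:
    """Stand-in for one independent analysis task (hypothetical workload)."""
    return sum(i * i for i in range(sample_id))


def run_all(sample_ids, workers=4):
    """Run every task independently; no task communicates with another."""
    # Threads keep the sketch portable. CPU-bound work would normally use
    # processes or, at cluster scale, one scheduler job per task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_sample, sample_ids))


if __name__ == "__main__":
    print(run_all(range(5)))
```

Because no task waits on a message from another, network latency between workers has little effect on total run time.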
Univa Short Jobs: Architecture
Workflow
Submission
• Policy control through launchers
• Pull vs push → fast
Copyright © 2016 Univa Corporation, All Rights Reserved.
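The "pull vs push" idea can be sketched generically: instead of a central dispatcher assigning each short task to a worker, idle workers ask a shared queue for the next task. This is an illustrative assumption about the pattern, not Univa's implementation; launchers and policy control are not modeled, and all names are hypothetical.

```python
import queue
import threading


def run_pull_workers(tasks, n_workers=3):
    """Workers pull short tasks from a shared queue until it is empty."""
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()   # pull: the worker asks for work
            except queue.Empty:
                return                     # nothing left, so the worker exits
            outcome = task()
            with lock:                     # protect the shared result list
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


if __name__ == "__main__":
    jobs = [lambda i=i: i * 2 for i in range(10)]
    print(sorted(run_pull_workers(jobs)))
```

With pulling, a worker starts its next task as soon as it finishes the previous one, with no per-task scheduling decision in between.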
http://c59951.r51.cf2.rackcdn.com/4994-1182-lumb.pdf
http://www.mellanox.com/page/performance_infiniband
MPI Apps Remain a Challenge …
• … for
  - cloud use
  - containerization
• Constrain MPI apps to mitigate concerns with latency
  - Run HPC on-premise OR in a cloud, but not between
• Containers?
  o Just say no???
• Seek alternatives
  - Apache Spark ???
  - Message busses ???
  - Shifter ???
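A back-of-the-envelope model shows why tightly coupled MPI applications should not span sites. It assumes each iteration performs some computation and then waits for a number of blocking messages; the latency values in the demo are illustrative assumptions, not measurements from the talk.

```python
def iteration_time_s(compute_s, messages, latency_s):
    """Cost of one tightly coupled iteration: compute plus blocking messages."""
    return compute_s + messages * latency_s


def parallel_efficiency(compute_s, messages, latency_s):
    """Fraction of each iteration spent on useful computation."""
    return compute_s / iteration_time_s(compute_s, messages, latency_s)


if __name__ == "__main__":
    compute_s, messages = 1e-3, 10           # assumed: 1 ms compute, 10 messages
    for label, latency_s in (("low-latency interconnect", 2e-6),
                             ("between sites", 1e-3)):
        eff = parallel_efficiency(compute_s, messages, latency_s)
        print(f"{label}: efficiency {eff:.2f}")
```

Under these assumptions efficiency falls from roughly 0.98 to roughly 0.09 when the same job is stretched across a high-latency link, which is the motivation for running on-premise or in a cloud, but not between.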
https://insights.sei.cmu.edu/assets/content/VM-Diagram.png
Full Docker Integration
• Docker test scripts are useful; however:
  - No interactive containers
  - No runtime resource usage for containers
  - No accounting for containers
• Complete Docker integration in progress:
  - Integrated into the Execution Daemon
  - Beta shipping now!!! will ship in November 2015
    o If you are interested please contact us!
Docker with Univa Grid Engine
• Launch Docker Container on best machine in cluster
  - Reduces the time wasted (it can be minutes) waiting for the Docker image to download from the Docker registry. Container runs faster, increasing throughput in the cluster.
• Run Docker Containers in a Univa Grid Engine Cluster
  - Business-critical containers are prioritized over other containers. Increases efficiency of the overall organization.
• Job Control and Limits for Docker Containers
  - Provides user and administrator control over containers running on Grid Engine Hosts.
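The "best machine" idea in the first bullet can be sketched as a placement rule: prefer a host that already holds the requested image, so no registry download is needed, and break ties by load. This is a hypothetical illustration, not Univa Grid Engine's actual scheduling logic; the `Host` type and `pick_host` function are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    load: float                                   # lower is better
    cached_images: set = field(default_factory=set)


def pick_host(hosts, image):
    """Prefer hosts that already hold the image, then the least loaded one."""
    # False sorts before True, so hosts with the image cached come first.
    return min(hosts, key=lambda h: (image not in h.cached_images, h.load))


if __name__ == "__main__":
    cluster = [
        Host("node-a", 0.1),
        Host("node-b", 0.9, {"app:1.0"}),
        Host("node-c", 0.5, {"app:1.0"}),
    ]
    print(pick_host(cluster, "app:1.0").name)
```

When no host has the image cached, the rule degrades to plain least-loaded placement.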
Docker Integration with Univa
• Accounting for Docker Containers
  - Keeps track of containers. Share policies require accounting.
• Data file Management for Docker Containers
  - Transparent access to input, output and error files. Simplifies the management of input and output files for Docker Containers and ensures any output or error files are moved to a location where the user can access them.
• Interactive Docker Containers
  - Good for debugging when containers don’t work correctly!
HPC as a Containerized Cloud Based Service
http://insidehpc.com/2015/11/ubercloud-delivers-cae-as-a-service-with-univa-grid-engine-container-edition/
Cloud Native Computing Foundation (CNCF)
• For current applications and services
  - Uptake of cloud computing remains an afterthought from a systems-architecture perspective
• CNCF aims to introduce a cloud-native paradigm shift that emphasizes:
  - Containerization
  - Dynamic scheduling
  - Orientation around micro services
• Making use of Kubernetes as a ‘seed technology’
  - #1 priority: Integrate the orchestration layer of the container ecosystem
• Univa is a Founding Member
  - Along with Google, IBM, Intel, Red Hat and numerous others ...
• Prototype implementations becoming available
https://cncf.io/
Latency
… and the best we can do? Try to ‘HIDE’ it!
THANK YOU
Ian Lumb
Solutions Architect
+1 630 303 9068 ilumb@univa.com
Hidden Slides
http://www0.cloudbootcamp.com/node/660946
https://c2.staticflickr.com/8/7174/6406442009_70cc52d8aa_b.jpg
http://runge.math.smu.edu/SMUHPC_workshop_Summer14/_images/flynn.png
Flynn’s
Taxonomy
GPUs in the Cloud? The Top Four Reasons
1. You can realize possibilities using the cloud
   a. You can scale up and scale out
2. You still realize the promise of GPU programmability
   a. … via HPC in the cloud
3. Your use of the cloud is transparent
   a. You’ve found ways to ‘hide’ latency
      i. Constraints apply for MPI apps
4. Your go-to apps still work in the cloud
http://info.brightcomputing.com/Blog/bid/196290/The-Top-4-Reasons-You-Should-Try-Cloud-Based-GPUs-for-HPC
https://aws.amazon.com/ec2/instance-types/
Docker
• What is Docker?
  - Docker is a tool that packages an application, filesystem, and all other dependencies into an easily distributable software package that can be installed and run on any modern Linux server.
• What is a Software Container?
  - Similar to a Virtual Machine, but a single Operating System is shared.
    o Faster than Virtual Machines
    o Less overhead than Virtual Machines
    o You can run more Software Containers on a machine than VMs.
  - Not a new concept; Sun Microsystems has ‘Solaris Zones’.
• Why is Docker different?
http://dockone.io/uploads/article/20150329/aa61c8ee04d815507d575c9d0a3c162f.png
Docker on Google Trends
Interest in Docker (US only)
Rapid growth since the end
of 2013 … continues …
Kubernetes
• What is Kubernetes?
  - Kubernetes is a workload and service orchestration tool for containerized applications and services running on a cluster or cloud infrastructure.
• Where did it come from?
  - It is derived from research work Google has been doing (called Omega), drawing on the experience Google has gained with its own in-house orchestration system (Borg) over the past 10+ years.
• Why is it important?
  - Google wants Kubernetes to become a standard container orchestration platform for Clouds and Enterprises.
  - Running multiple containers on multiple machines is hard; you need Kubernetes
“The wonderful thing
about standards is
that there are so
many of them to
choose from.”
https://en.wikiquote.org/wiki/Grace_Hopper
Cloud Computing
is bereft of standards!!!
...but, FLUSH with implementations!!!

Univa webinar: High Performance Computing (HPC) in the Cloud?

  • 1. www.univa.com February 2016 High Performance Computing in the Cloud? RECORDING Q&A – Part 1 & Part 2
  • 2.  Commoditized cloud applications  Latency: The gating factor for HPC-in-the-Cloud  A very brief taxonomy of clouds  Latency notwithstanding …  Embarrassingly parallel HPC applications  The impact of large data volumes  The impact of execution time  Distributed-memory parallel HPC applications - MPI and beyond  The impact of containerization  The past, present and future viability of HPC in the cloud Agenda
  • 3.
  • 5.  Commoditized cloud applications  Latency: The gating factor for HPC-in-the-Cloud  A very brief taxonomy of clouds  Latency notwithstanding …  Embarrassingly parallel HPC applications  The impact of large data volumes  The impact of execution time  Distributed-memory parallel HPC applications - MPI and beyond  The impact of containerization  The past, present and future viability of HPC in the cloud Agenda
  • 8.
  • 9.
  • 10.
  • 11. Latency is physically a consequence of the limited velocity with which any physical interaction can propagate. https://en.wikipedia.org/wiki/Latency_(engineering)
  • 13. Current consumer devices have appallingly bad latency … It's the Latency, Stupid https://rescomp.stanford.edu/~cheshire/rants/Latency.html
  • 15.
  • 16. … if you have a network link with low bandwidth then it's an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency. It's the Latency, Stupid https://rescomp.stanford.edu/~cheshire/rants/Latency.html
  • 17. Latency … and the best we can do? Try to ‘HIDE’ it!
  • 18.  Commoditized cloud applications  Latency: The gating factor for HPC-in-the-Cloud  A very brief taxonomy of clouds  Latency notwithstanding …  Embarrassingly parallel HPC applications  The impact of large data volumes  The impact of execution time  Distributed-memory parallel HPC applications - MPI and beyond  The impact of containerization  The past, present and future viability of HPC in the cloud Agenda
  • 19. Latency … and the best we can do? Try to ‘HIDE’ it!
  • 20. www.univa.com 20 Cloud Taxonomy  Private Clouds o Use containers and VMs to increase data center workflow by dynamically optimizing the configuration of the cluster based on job priority  Hybrid Clouds o Combine servers in the cloud with a company’s data center servers, making it look like one seamless cluster  Public Clouds o Quickly provision a cluster in the Cloud, and pay only for what you need
  • 21. www.univa.com 21 Use Cases  Building a physical Univa Grid Engine cluster  Creating a Univa Grid Engine cluster on Google Compute, Amazon EC2, Azure, OpenStack, …..  Mixed clusters with more than one Cloud provider  Creating a mixed physical and VMware virtual Univa Grid Engine cluster on your own hardware  Creating an internal cluster that can ‘burst out’ to the Cloud on demand
  • 22.  Commoditized cloud applications  Latency: The gating factor for HPC-in-the-Cloud  A very brief taxonomy of clouds  Latency notwithstanding …  Embarrassingly parallel HPC applications  The impact of large data volumes  The impact of execution time  Distributed-memory parallel HPC applications - MPI and beyond  The impact of containerization  The past, present and future viability of HPC in the cloud Agenda
  • 24. www.univa.com 24 Case Study: The Broad Institute Challenge: Augment on-premise HPC resources with cost-effective, scalable cloud based offering for bioinformatics workloads Solution: 50K cores on Google Compute Engine via Cycle Computing and Univa Grid Engine Results  Ran 30 years of cancer research calculations in just a few hours  Made use of 1.4 million sequenced or genotyped biological samples http://www.nextplatform.com/2015/09/08/google-cycle-computing-pair-for-broad-genomics-effort/
  • 26. Univa Short Jobs: Architecture, Workflow, Submission
      o Policy control through launchers
      o Pull vs. push: fast
      Copyright © 2016 Univa Corporation, All Rights Reserved.
  • 30. MPI Apps Remain a Challenge …
      o … for cloud use and for containerization
      o Constrain MPI apps to mitigate concerns with latency: run HPC on-premise OR in a cloud, but not between
      o Containers? Just say no???
      o Seek alternatives: Apache Spark ??? Message busses ??? Shifter ???
  • 33. Full Docker Integration
      o Docker test scripts are useful, however: no interactive containers; no runtime resource usage for containers; no accounting for containers
      o Complete Docker integration in progress: integrated into the Execution Daemon; Beta shipping now!!! will ship in November 2015
      o If you are interested please contact us!
  • 34. Docker with Univa Grid Engine
      o Launch Docker container on best machine in cluster: reduces the time wasted (it can be minutes) waiting for the Docker image to download from the Docker registry. The container runs faster, increasing throughput in the cluster.
      o Run Docker containers in a Univa Grid Engine cluster: business-critical containers are prioritized over other containers. Increases efficiency of the overall organization.
      o Job control and limits for Docker containers: provides user and administrator control over containers running on Grid Engine hosts.
  • 35. Docker Integration with Univa
      o Accounting for Docker containers: keeps track of containers. Share policies require accounting.
      o Data file management for Docker containers: transparent access to input, output and error files. Simplifies the management of input and output files for Docker containers and ensures any output or error files are moved to a location where the user can access them.
      o Interactive Docker containers: good for debugging when containers don’t work correctly!
  • 36. HPC as a Containerized Cloud-Based Service http://insidehpc.com/2015/11/ubercloud-delivers-cae-as-a-service-with-univa-grid-engine-container-edition/
  • 37. Cloud Native Computing Foundation (CNCF)
      o For current applications and services, uptake of cloud computing remains an afterthought from a systems-architecture perspective
      o CNCF aims to introduce a cloud-native paradigm shift that emphasizes containerization, dynamic scheduling, and orientation around microservices
      o Making use of Kubernetes as a ‘seed technology’
      o #1 priority: integrate the orchestration layer of the container ecosystem
      o Univa is a Founding Member, along with Google, IBM, Intel, Red Hat and numerous others ...
      o Prototype implementations becoming available
      o https://cncf.io/
  • 39. Latency … and the best we can do? Try to ‘HIDE’ it!
  • 40. www.univa.com THANK YOU Ian Lumb Solutions Architect +1 630 303 9068 ilumb@univa.com RECORDING Q&A – Part 1 & Part 2
  • 48. GPUs in the Cloud? The Top Four Reasons
      1. You can realize possibilities using the cloud: you can scale up and scale out
      2. You still realize the promise of GPU programmability … via HPC in the cloud
      3. Your use of the cloud is transparent: you’ve found ways to ‘hide’ latency (constraints apply for MPI apps)
      4. Your go-to apps still work in the cloud
      Source: http://info.brightcomputing.com/Blog/bid/196290/The-Top-4-Reasons-You-Should-Try-Cloud-Based-GPUs-for-HPC
  • 50. Docker
      o What is Docker? Docker is a tool that packages an application, its filesystem, and all other dependencies into an easily distributable software package that can be installed and run on any modern Linux server.
      o What is a software container? Similar to a virtual machine, but a single operating system is shared: faster than virtual machines, less overhead than virtual machines, and you can run more software containers on a machine than VMs.
      o Not a new concept: Sun Microsystems has ‘Solaris Zones’.
      o Why is Docker different?
  • 53. Docker on Google Trends: interest in Docker (US only). Rapid growth since the end of 2013 … continues …
  • 54. Kubernetes
      o What is Kubernetes? Kubernetes is a workload and service orchestration tool for containerized applications and services running on a cluster or cloud infrastructure.
      o Where did it come from? It is derived from research work Google has been doing (called Omega), drawing on the experience Google has gained with its own in-house orchestration system (Borg) over the past 10+ years.
      o Why is it important? Google wants Kubernetes to become a standard container orchestration platform for clouds and enterprises. Running multiple containers on multiple machines is hard; you need Kubernetes.
  • 56. “The wonderful thing about standards is that there are so many of them to choose from.” https://en.wikiquote.org/wiki/Grace_Hopper
  • 57. Cloud Computing is bereft of standards!!!
  • 58. Cloud Computing is bereft of standards!!! ...but, FLUSH with implementations!!!
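
Slide 34 describes launching a Docker container on the "best machine", and the accompanying note defines that as the machine that already has most or all of the Docker image downloaded. The selection rule can be sketched roughly as follows. This is only an illustration under stated assumptions: the function `best_host`, the host names and the layer IDs are all hypothetical, and nothing here represents Univa Grid Engine's actual scheduler or API.

```python
def best_host(image_layers, host_cache):
    """Return the host that already caches the most layers of the image.

    image_layers: set of layer IDs making up the image (hypothetical IDs).
    host_cache:   dict mapping host name -> set of layer IDs cached there.
    Ties go to the alphabetically first host so the result is deterministic.
    """
    if not host_cache:
        raise ValueError("no candidate hosts")
    # max() returns the first maximal element, so iterating over sorted
    # host names makes the tie-break explicit.
    return max(
        sorted(host_cache),
        key=lambda host: len(image_layers & host_cache[host]),
    )


# Hypothetical cluster state: node-b already holds two of the three layers.
image = {"base", "libs", "app"}
hosts = {
    "node-a": {"base"},
    "node-b": {"base", "libs"},
    "node-c": set(),
}
choice = best_host(image, hosts)
```

Counting cached layers is the simplest possible proxy for "least left to download"; a real scheduler would presumably also weigh layer sizes, load and job priority.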

Editor's Notes

  1. What most people think when “Cloud Computing” is mentioned …
  2. What about computing? And, more importantly, HPC???
  3. http://img04.deviantart.net/bd14/i/2013/046/3/d/space_the_final_frontier_by_unusualsuspex-d5v0h8m.jpg There is a final frontier to consider ...
  4. The final, make that ULTIMATE, frontier when it comes to HPC in the cloud.
  5. Thanks to Jim Freemantle (OARS) for suggesting this illustration based on gaming. Source: http://t2.rbxcdn.com/a9edb551eb372d1049b53bf66ca8e494
  6. What is latency? The elapsed time between stimulus and response.
  7. The ultimate limit …
  8. Back to Star Trek for ideas … https://improvdandies.files.wordpress.com/2014/06/cloaking-device-joke-section42.jpg
  9. Latency can be ‘hidden’ …
  10. + The Cloud …
  11. Granularity is a measure of the amount of computation that can take place before there is a need for synchronization or communication. Thus the ratio computation/communication serves as a proxy for the vertical axis of the figure. Concurrency refers to an ability to carry out activities simultaneously. In other words, it is a measure of the degree of parallelism that is present.
  13. Description: The best machine is the machine that already has most or all of the Docker image downloaded. Description: Allow a user or administrator to run any Docker container in a Grid Engine cluster.
  14. Description: Running a container is very similar to running a standard batch job in Grid Engine. Containers provide a useful mechanism for running complex applications, and in Grid Engine you can put limits on the runtime, memory and CPU usage of a container running on a machine to ensure it does not consume all of the resources on the machine. Description: Docker containers may require input files and generate output or error files. Since those files are created inside a container, they are not normally available to the end user outside of the container. Description: Univa Grid Engine can run interactive commands in a cluster. Many organizations use this to run tools across the cluster from their custom scripts. Extending this to create a container and then run the interactive command gives administrators more control over how their end users run applications in the Grid Engine cluster. Description: Keeping track of the resources used by a Docker container allows companies to ensure that each project and team in the company receives the correct amount of compute resources based on the business needs of the organization.
  15. + The Cloud …
  16. Digging into traditional cloud apps in a little more detail … For more on AJAX, please see https://en.wikipedia.org/wiki/Ajax_(programming).
  17. A simple perspective of a typical cloud app ... http://www0.cloudbootcamp.com/node/660946 http://ianlumb.files.wordpress.com/2008/04/desktop-software-figures0031.png
  18. A simple perspective of a typical cloud app with sync’d data … Google Gears has been supplanted by an analogous capability that was implemented in HTML5 (e.g., http://gearsblog.blogspot.ca/2011/03/stopping-gears.html). http://ianlumb.files.wordpress.com/2008/04/desktop-software-figures0041.png
  19. Alternate perspective ;-)
  20. Many HPC apps, on the ground or in the cloud, are latency intolerant. http://rusvesna.su/sites/default/files/styles/orign_wm/public/tiraspol_atakuet_kiev_i_kishinev_vvedeniem_poshliny_na_moloko.jpg?itok=sb6_OU8l - the milk part
  21. http://icdn4.digitaltrends.com/image/mod-291145_nvidia_tesla_k80_dual-gpu_accelerator_3qtr-515x343.jpg
  22. https://www.google.com/trends/explore#q=%2Fm%2F0wkcjgj&geo=US&cmpt=geo&tz=Etc%2FGMT%2B7
  23. OpenStack is an open source implementation of a platform for cloud computing; it is not a standard, or a collection of standards.
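
Note 11 defines granularity as the ratio of computation to communication, and the deck's recurring point is that latency can be hidden but not removed, while bandwidth can be bought by running links in parallel. A rough numerical sketch of those three ideas follows. The function names and all timings are made-up illustrations, and the model (step time as a sum or a max, transfer time as latency plus size over bandwidth) is a common simplification rather than anything taken from the slides.

```python
def granularity(compute_s, comm_s):
    """Computation/communication ratio; larger means coarser-grained."""
    return compute_s / comm_s


def step_time(compute_s, comm_s, overlap=False):
    """Time for one compute-and-communicate step.

    With overlap=True the communication proceeds while computation runs,
    so the shorter of the two is hidden behind the longer one.
    """
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s, links=1):
    """Latency plus serialization time over `links` parallel links.

    Parallel links multiply bandwidth but leave latency untouched.
    """
    return latency_s + size_bytes / (bandwidth_bytes_per_s * links)


# Illustrative numbers only (assumptions, not measurements).
coarse = granularity(10.0, 0.5)            # lots of work per message
fine = granularity(0.5, 0.5)               # as much talking as computing

exposed = step_time(10.0, 0.5)             # communication adds to the step
hidden = step_time(10.0, 0.5, overlap=True)  # communication is 'hidden'

one_link = transfer_time(1000, 0.1, 1000.0, links=1)
ten_links = transfer_time(1000, 0.1, 1000.0, links=10)
```

In this model adding links shrinks only the size/bandwidth term, so the transfer time approaches, but never drops below, the latency itself; coarse-grained work is what leaves room to hide that floor behind computation.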