This presentation is from NVIDIA GTC DC on Oct 23, 2018:
https://youtu.be/z5gEUL6dJRI
Corresponding Press Release: https://www.redhat.com/en/about/press-releases/red-hat-nvidia-align-open-source-solutions-fuel-emerging-workloads
Blog: https://www.redhat.com/en/blog/red-hat-and-nvidia-positioning-red-hat-enterprise-linux-and-openshift-primary-platforms-artificial-intelligence-and-other-gpu-accelerated-workloads
Demo Video: https://www.youtube.com/watch?v=9iVYjA_WJgU
1. Best practices for optimizing Red Hat platforms for large-scale datacenter deployments on DGX systems
Charlie Boyle, NVIDIA
Andre Beausoleil and Jeremy Eder, Red Hat
NVIDIA GTC, Washington, DC, October 2018
2. Agenda
● Relationship Overview
● Announcements / What’s New
● Tuned profile for DGX
● NGC Container Support overview
● RHEL, OpenShift, DGX-1 Integration Details
4. Summary of Announcements!
● NVIDIA DGX-1 is now CERTIFIED on Red Hat Enterprise Linux 7
● Support for using DGX nodes as workers in OpenShift 3.10 or later
● NGC containers can run on Red Hat Enterprise Linux and OpenShift
● Expanded Engineering Relationship
8. Open Source Project Collaboration
NOUVEAU DRIVER
● Key Red Hat maintainer: Ben Skeggs
● Qualified with new NVIDIA architectures
● Part of complete OSS toolchain for HMM
HETEROGENEOUS MEMORY MGMT. (HMM)
● Key Red Hat developer: Jerome Glisse
● Memory management between device & CPU
● Key developer simplification, not just NVIDIA
GPU-AWARE GCC (LIBGOMP)
● Key Red Hat maintainer: Jakub Jelinek
● OpenMP common library
NVIDIA vGPU & RHV
● Multiple vGPUs for compute and graphics workloads
11. Tuned
Tuning profile delivery mechanism
Red Hat ships tuned profiles that improve performance for many workloads...hopefully yours!
Okay, but why do I care ???
12. Tuned: Your Custom Profiles
Parents: balanced, desktop, latency-performance, throughput-performance, virtual-host, virtual-guest
Children: network-latency (inherits latency-performance), network-throughput (inherits throughput-performance)
Children/Grandchildren: Your Web Profile, Your Database Profile, Your Middleware Profile
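A minimal sketch of what such a child profile looks like on disk, assuming a hypothetical profile named my-database: it inherits everything from throughput-performance and overrides a single sysctl.

    # /etc/tuned/my-database/tuned.conf -- hypothetical custom child profile
    [main]
    include=throughput-performance   # inherit the parent profile wholesale

    [sysctl]
    vm.swappiness=1                  # override only what this workload needs

Activate it with `tuned-adm profile my-database`; `tuned-adm active` confirms which profile is in effect.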
16. Why OpenShift is the Best Choice
COMPREHENSIVE: Comprehensive portfolio of container products and services, including developer tools, security, application services, storage, and management.
CLOUD PARTNERS: Strong partnerships with cloud providers, ISVs, and CCSPs. Extensive container catalog of certified partner images.
CODE: Red Hat is the leading Kubernetes developer and contributor with Google. We make container development easy, reliable, and more secure.
CUSTOMERS: Most reference customers running in production. Years of experience running the OpenShift Online and OpenShift Dedicated services.
17. One Platform to...
OpenShift is the single platform to run any application:
● Old or new
● Monolithic or microservice
18. What does an OpenShift (OCP) Cluster look like?
DGX-1 server with Red Hat Enterprise Linux and OpenShift Container Platform (OCP)
20. Upstream First: Kubernetes Working Groups
● Resource Management Working Group
○ Features delivered (sketch below)
■ Device Plugins (GPU/Bypass/FPGA)
■ CPU Manager (exclusive cores)
■ Huge Pages support
○ Extensive roadmap
● Intel, IBM, Google, NVIDIA, Red Hat, many more...
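A rough sketch of how two of those features surface in a pod spec (the pod name and image are placeholders): integer CPU requests equal to limits put the pod in the Guaranteed QoS class, so the CPU Manager static policy, where enabled, pins it to exclusive cores; the nvidia.com/gpu extended resource comes from the device plugin.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pinned-gpu-pod          # hypothetical name
    spec:
      containers:
      - name: app
        image: nvcr.io/nvidia/cuda  # placeholder image
        resources:
          requests:
            cpu: "4"                # integer CPUs, requests == limits ->
            memory: 8Gi             # Guaranteed QoS; the CPU Manager static
          limits:                   # policy can then assign exclusive cores
            cpu: "4"
            memory: 8Gi
            nvidia.com/gpu: 1       # extended resource from the device plugin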
21. Upstream First: Kubernetes Working Groups
● Network Plumbing Working Group
○ Formalized Dec 2017
● Goal is to implement a pseudo-standard collection of CRDs for multiple networks, owned by sig-network, *out of tree* (sketch below)
● Separate control- and data-plane, overlapping IPs, fast data-plane
● IBM, Intel, Red Hat, Huawei, Cisco, Tigera...at least.
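The CRD the group converged on is the NetworkAttachmentDefinition, served by a meta-plugin such as Multus. A minimal sketch of a secondary macvlan network, with all names, interfaces, and addresses illustrative:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: macvlan-net                       # hypothetical network name
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": "eth1",
        "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
      }'

A pod opts in with the annotation k8s.v1.cni.cncf.io/networks: macvlan-net and keeps its default cluster network unchanged.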
23. OpenShift Cluster Topology
● Control Plane: three "master and etcd" nodes behind a load balancer (LB)
● Infrastructure: three "registry and router" nodes
● Compute and GPU Nodes: four DGX-1 servers
24. How to enable software to take advantage of "special" hardware
● Create Node Pools of the DGX-1 compute and GPU nodes (sketch below)
○ Mark them as "special"
○ Taints/Tolerations
○ Priority/Preemption
○ ExtendedResourceToleration
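A minimal sketch of the node-pool pattern; the node name and taint value are assumptions:

    # Taint a DGX node so that only pods which tolerate the taint land on it
    oc adm taint nodes dgx-node-1 nvidia.com/gpu=present:NoSchedule

    # ...and the matching toleration in the pod spec:
    tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"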
25. How to enable software to take advantage of "special" hardware
● Tune/Configure the OS (sketch below)
○ Tuned Profiles
○ CPU Isolation
○ sysctls
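One way to get CPU isolation with Tuned, sketched under the assumption that the cpu-partitioning profile package is installed; the core list is illustrative:

    # /etc/tuned/cpu-partitioning-variables.conf
    isolated_cores=4-39        # cores reserved for the workload (illustrative)

    tuned-adm profile cpu-partitioning

Where the profile changes kernel command-line arguments, a reboot is needed for them to take effect.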
26. How to enable software to take advantage of "special" hardware
● Optimize your workload (sketch below)
○ Dedicate CPU cores
○ Consume hugepages
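A minimal sketch of consuming huge pages from a pod; names are placeholders, and it assumes the node already has 2 MiB huge pages reserved (e.g. via vm.nr_hugepages):

    apiVersion: v1
    kind: Pod
    metadata:
      name: hugepages-pod               # hypothetical name
    spec:
      containers:
      - name: app
        image: registry.example.com/app # placeholder image
        resources:
          limits:
            hugepages-2Mi: 1Gi          # resource advertised by the kubelet
            memory: 2Gi                 # hugepages require a memory limit too
        volumeMounts:
        - name: hugepage
          mountPath: /dev/hugepages
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages             # backed by the node's huge page pool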
27. How to enable software to take advantage of "special" hardware
● Enable the Hardware (sketch below)
○ Install drivers
○ Deploy Device Plugin
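Roughly: once the driver is installed on the node, the plugin ships as a daemonset. The manifest file name and node name below are hypothetical:

    # Deploy the NVIDIA device plugin daemonset, then confirm GPUs are advertised
    oc create -f nvidia-device-plugin.yml
    oc describe node dgx-node-1 | grep nvidia.com/gpu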
28. How to enable software to take advantage of "special" hardware
● Consume the Device (sketch below)
○ KubeFlow Template deployment
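A sketch of instantiating such a template with oc process; the template and parameter names are assumptions:

    # Render the template with 4 GPUs requested and create the resulting objects
    oc process kubeflow-tf-training -p GPU_COUNT=4 | oc create -f -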
29. Soft or Hard Shared Cluster Partitioning?
Priority and Preemption
● Create PriorityClasses based on business goals (sketch below)
● Annotate pod specs with priorityClassName
● If all GPUs are used
○ A high-priority pod is queued
○ A low-priority pod is running
○ Kube will preempt the low-priority pod
■ And schedule the high-priority pod
● Ensures optimal density
Taints and Tolerations
● Taints are "node labels with policies"
○ You can taint a node like nvidia.com/gpu=value:NoSchedule
● A pod then has to "tolerate" the nvidia.com/gpu taint, otherwise it won't run on that node.
● This allows you to create "node pools"
● Could lead to under-utilized resources
● Might make sense for security or business rules
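A minimal PriorityClass sketch; the class name and value are illustrative. The API group was scheduling.k8s.io/v1beta1 in the Kubernetes 1.11 release behind OpenShift 3.11, and is scheduling.k8s.io/v1 in current releases:

    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: gpu-critical                # hypothetical class name
    value: 1000000                      # higher values preempt lower ones
    globalDefault: false
    description: "GPU jobs allowed to preempt lower-priority pods"

Pods opt in with priorityClassName: gpu-critical in their spec.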
30. OpenShift + NVIDIA Device Plugin on DGX
Stack diagram: Red Hat Enterprise Linux with the NVIDIA Driver at the base; libnvidia-container and the nvidia-container-runtime-hook wired into the Linux Container Runtime; OpenShift Container Platform on top, running the nvidia-device-plugin alongside the NGC GPU pods (NGC-gpu-pod-1, -2, -3).
31. OpenShift + NVIDIA Device Plugin on DGX
Diagram: the Device Plugin (daemonset) registers with the Kubelet and advertises the DGX-1's eight Volta GPUs; the Kube Scheduler then places a Benchmark pod, created with oc create, whose spec requests all of them:
resources:
  limits:
    nvidia.com/gpu: 8
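Stated as a complete manifest, with the NGC image tag illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: benchmark
    spec:
      restartPolicy: Never
      containers:
      - name: benchmark
        image: nvcr.io/nvidia/tensorflow:18.09-py3  # illustrative NGC image
        resources:
          limits:
            nvidia.com/gpu: 8     # all eight Volta GPUs in the DGX-1

    # oc create -f benchmark-pod.yml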
33. Demo
Demo video: https://www.youtube.com/watch?v=9iVYjA_WJgU
1. Log in to the OpenShift web console and land at the Service Catalog
2. Verify the NVIDIA device-plugin daemonset is running in the kube-system namespace
3. Show how you can get a console in any running container
4. Change to the nvidia namespace, and filter the catalog to only show NGC templates
5. Start a TensorRT Inference Server that uses 4 of the 8 GPUs in the DGX
6. Show logs of the TensorRT pod: it is consuming 4 GPUs and the model server is ready (curl output)
7. Go back to the Service Catalog and again filter by NGC images
8. Start an NGC Caffe framework pod, and configure it to use the remaining 4 GPUs
9. Show logs of the Caffe pod, show nvidia-smi, and show that this pod can access the inference server via curl (sketch below)
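Step 9's check looks roughly like this from inside the Caffe pod; the service name is an assumption, and /api/status was the status endpoint of the TensorRT Inference Server HTTP API of that era:

    # Query the inference server's status endpoint over the cluster network
    curl http://tensorrt-inference-server:8000/api/status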
35. Red Hat/NVIDIA Expanded Collaboration
● Driver Packaging
● Expanded DGX Testing
● Monitoring
● Heterogeneous Clusters
○ Resource API
● Topology Awareness
● Resource Quota API
36. References
● radanalytics templates for ML-workflow on OpenShift
● How to use GPUs with DevicePlugin in OpenShift 3.10
● Machine-Learning OpenShift Commons