HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPE demystifies deep learning for faster intelligence across all organizations
Edmondo Orlotti
HPC & AI Business Development Manager
October 2017
Data analytics and insights are fueling the digital transformation
• Enhanced customer experiences: personalized, real-time mobile insights for retail
• Improved products and services: genomics sequencing analytics for Life Sciences
• Optimized business processes: predictive maintenance insights for manufacturing
AI propels analytics and insights to a new dimension
Unleash automated intelligence from massive data volumes:
• Data protection and archival to mitigate risk
• HPE Fraud Detection using deep learning
• Infrastructure modernization for new data types and scale
• User behavioral analytics for the data center using machine learning
• Next-generation analytics for real-time business
• HPE Intelligent Edge real-time analytics with SAP Leonardo
• Insights from modeling and simulation
• Deep learning in HPC using GPU-accelerated computing
What’s all the “buzz” around AI?
Gain competitive advantage using the vibrant new market of AI
Source: McKinsey AI report, 2017
Overview of HPE’s GPU portfolio
HPE has a comprehensive, purpose-built portfolio for deep learning, covering compute ideal for training models in the data center, compute for both training models and inference at the edge, and edge analytics and inference engines.
Target segments: government, academia and industries; financial services; Life Sciences and health; government and academia; autonomous vehicles / manufacturing.
• HPE SGI 8600: petaflop scale for deep learning and HPC
• HPE Apollo 6500: the enterprise bridge to accelerated computing
• HPE Apollo sx40: maximize GPU capacity and performance with lower TCO
• HPE Apollo 2000: the bridge to enterprise scale-out architecture
• HPE Edgeline EL4000: unprecedented deep edge compute and high-capacity storage; open standards
• HPC storage: HPE Apollo 4520
• Choice of fabrics: Arista Networking, Intel® Omni-Path Architecture, Mellanox InfiniBand, HPE FlexFabric Network
• HPC Data Management Framework Software: large-scale storage virtualization and tiered data management platform
• AI software framework: easy setup and flexible OS, using Bright Computing’s distribution of deep learning software development components and workload management tool integration
• Services: advisory, professional and operational services; HPE Flexible Capacity; HPE Datacenter Care for Hyperscale
Introducing Tesla V100
TESLA V100: THE MOST ADVANCED DATA CENTER GPU EVER BUILT
• 5,120 CUDA cores
• 640 new Tensor Cores
• 7.5 FP64 TFLOPS | 15 FP32 TFLOPS
• 120 Tensor TFLOPS
• 20 MB SM register file | 16 MB cache | 16 GB HBM2 @ 900 GB/s
• 300 GB/s NVLink
VOLTA: A GIANT LEAP FOR DEEP LEARNING
[Charts, images per second: ResNet-50 training, V100 Tensor Cores vs. P100 FP32: 2.4x faster; ResNet-50 inference (TensorRT, 7 ms latency), V100 Tensor Cores vs. P100 FP16: 3.7x faster. V100 measured on pre-production hardware.]
The Fastest and Most Productive GPU for Deep Learning and HPC
• Volta architecture: most productive GPU
• Tensor Core: 120 programmable TFLOPS for deep learning
• Improved SIMT model: new algorithms
• Volta MPS: inference utilization
• Improved NVLink & HBM2: efficient bandwidth
INTRODUCING TESLA V100
• 21B transistors, 815 mm²
• 80 SMs (the full GV100 chip contains 84 SMs)
• 5,120 CUDA cores
• 640 Tensor Cores
• 16 GB HBM2 at 900 GB/s
• 300 GB/s NVLink
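The 7.5 / 15 / 120 TFLOPS figures quoted for V100 follow from unit count times work per clock times clock rate. The sketch below reproduces them under stated assumptions that do not appear on these slides: a boost clock of about 1.455 GHz, 2,560 FP64 units (32 per SM across 80 SMs), one fused multiply-add (2 FLOPs) per CUDA core per clock, and 64 FMAs per clock per Tensor Core (one 4x4x4 matrix multiply-accumulate).

```python
# Peak-throughput arithmetic for Tesla V100 (NVLink/SXM2 version).
# Assumptions, not taken from the slides: ~1.455 GHz boost clock,
# 2,560 FP64 units (32 per SM x 80 SMs), 64 FMAs/clock per Tensor Core.

BOOST_CLOCK_GHZ = 1.455  # assumed boost clock
FMA_FLOPS = 2            # one fused multiply-add counts as two FLOPs


def peak_tflops(units: int, flops_per_unit_per_clock: int,
                clock_ghz: float = BOOST_CLOCK_GHZ) -> float:
    """Peak TFLOPS = units x FLOPs per unit per clock x clock (GHz) / 1000."""
    return units * flops_per_unit_per_clock * clock_ghz / 1000.0


fp32_tflops = peak_tflops(5120, FMA_FLOPS)                # 5,120 CUDA cores
fp64_tflops = peak_tflops(2560, FMA_FLOPS)                # assumed FP64 unit count
tensor_tflops = peak_tflops(640, 4 * 4 * 4 * FMA_FLOPS)   # 640 Tensor Cores

if __name__ == "__main__":
    print(f"FP64:   {fp64_tflops:.2f} TFLOPS")
    print(f"FP32:   {fp32_tflops:.2f} TFLOPS")
    print(f"Tensor: {tensor_tflops:.1f} TFLOPS")
```

The results (about 7.45, 14.9 and 119.2 TFLOPS) round to the quoted figures. Inverting the same formula on the PCIe version's 112 TF DL figure implies a clock of roughly 1.37 GHz, which is consistent with that card's lower 250 W power budget.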
TESLA V100 ARCHITECTURE
Volta V100 SM:
• Completely new ISA
• Twice the schedulers
• Simplified issue logic
• Large, fast L1 cache
• Improved SIMT model
• Tensor acceleration
VOLTA NVLINK
• 300 GB/s
• 50% more links
• 28% faster signaling
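The 300 GB/s headline is an aggregate over all links in both directions. A small sketch of that arithmetic, assuming (from NVIDIA's Pascal and Volta material, not from this slide) 4 links at 20 GB/s per direction on P100 and 6 links at 25 GB/s per direction on V100:

```python
# Aggregate NVLink bandwidth arithmetic.
# Assumptions (not stated on the slide): P100 has 4 NVLink links at
# 20 GB/s per direction each; V100 has 6 links at 25 GB/s per direction each.


def aggregate_gb_per_s(links: int, gb_per_s_per_direction: float) -> float:
    """Total bidirectional bandwidth across all links, in GB/s."""
    return links * gb_per_s_per_direction * 2


p100_total = aggregate_gb_per_s(4, 20.0)  # 160 GB/s
v100_total = aggregate_gb_per_s(6, 25.0)  # 300 GB/s, the slide's headline figure

link_gain = 6 / 4              # 1.5x the links: the "50% more links"
per_link_gain = 25.0 / 20.0    # 1.25x data rate per link under these assumptions
overall_gain = v100_total / p100_total  # equals link_gain * per_link_gain

if __name__ == "__main__":
    print(f"P100: {p100_total:.0f} GB/s, V100: {v100_total:.0f} GB/s, "
          f"gain {overall_gain:.3f}x")
```

The per-link data-rate gain in this sketch is 25%; the slide's 28% refers to raw signaling speed, which this data-rate model does not attempt to reproduce.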
VOLTA MULTI-PROCESS SERVICE
[Figure: CPU processes A, B and C submit work through CUDA Multi-Process Service control to GPU execution on Volta GV100, with hardware-accelerated work submission and hardware isolation.]
Volta MPS enhancements:
• Reduced launch latency
• Improved launch throughput
• Improved quality of service with scheduler partitioning
• More reliable performance
• 3x more clients than Pascal
VOLTA: INDEPENDENT THREAD SCHEDULING
• Pascal: lock-free algorithms; threads cannot wait for messages
• Volta: starvation-free algorithms; threads may wait for messages
NEW TENSOR CORE BUILT FOR AI
Delivering 120 TFLOPS of DL performance
• Volta Tensor Core: a 4x4 matrix processing array, optimized for deep learning, computing D[FP32] = A[FP16] * B[FP16] + C[FP32]
• Volta-optimized cuDNN for all major frameworks
• Matrix data optimization: dense matrix of tensor compute
• Tensor-op conversion: FP32 to Tensor Op data for frameworks
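The Tensor Core operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated numerically to see what mixed precision means: inputs are rounded to FP16, while products and sums are kept in FP32. This is a NumPy sketch of the arithmetic only, not of the hardware or of any CUDA API; the helper names are hypothetical, and NumPy's summation order may differ from the hardware's. The second function shows how larger matrix products can be built by chaining 4x4x4 operations, feeding each FP32 result back in as the accumulator.

```python
import numpy as np


def tensor_core_mma(a, b, c):
    """Emulate one 4x4 Tensor Core op: D[FP32] = A[FP16] * B[FP16] + C[FP32].

    A and B are rounded to FP16. Every product of two FP16 values is exactly
    representable in FP32, so the tiles are up-cast and the multiply and
    accumulation are carried out in FP32.
    """
    a16 = np.asarray(a, dtype=np.float16)
    b16 = np.asarray(b, dtype=np.float16)
    c32 = np.asarray(c, dtype=np.float32)
    if not (a16.shape == b16.shape == c32.shape == (4, 4)):
        raise ValueError("Tensor Core tiles are 4x4")
    return a16.astype(np.float32) @ b16.astype(np.float32) + c32


def tiled_gemm_4xk(a, b, c):
    """Compute (4 x K) @ (K x 4) + C by chaining 4x4x4 ops; K must divide by 4."""
    a = np.asarray(a)
    b = np.asarray(b)
    k = a.shape[1]
    if a.shape[0] != 4 or b.shape != (k, 4) or k % 4 != 0:
        raise ValueError("expected A: 4xK, B: Kx4, K divisible by 4")
    d = np.asarray(c, dtype=np.float32)
    for k0 in range(0, k, 4):
        # The running FP32 result becomes the accumulator C for the next step.
        d = tensor_core_mma(a[:, k0:k0 + 4], b[k0:k0 + 4, :], d)
    return d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 64))
    b = rng.standard_normal((64, 4))
    d = tiled_gemm_4xk(a, b, np.zeros((4, 4)))
    print(d.dtype, d.shape)
```

Keeping the accumulator in FP32 is the design point: FP16 inputs halve storage and bandwidth, while FP32 accumulation avoids the rounding drift that long FP16 sums would accumulate.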
Over 80x DL Training Performance in 3 Years
[Chart: GoogLeNet training performance, speedup vs. K80 (axis 0x to 100x), for configurations from Q1 2015 to Q2 2017: 1x K80, 4x M40, 8x P100 and 8x V100, running cuDNN2 through cuDNN7.]
85% scale-out efficiency: scales to 64 GPUs with Microsoft Cognitive Toolkit
[Chart: multi-node training with NCCL 2.0 (ResNet-50), time to train: 64x V100: 1 hour; 8x V100: 7.4 hours; 8x P100: 18 hours. ResNet-50 training for 90 epochs on the 1.28M-image dataset, using Caffe2; V100 performance measured on pre-production hardware.]
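The ResNet-50 chart's footnote (90 epochs over 1.28M images) turns the wall-clock times into throughput, and the 8-GPU and 64-GPU V100 runs give a scaling efficiency. A sketch of that arithmetic using only the slide's numbers. Note that the 85% figure on the slide is stated for Microsoft Cognitive Toolkit, while this chart uses Caffe2 with an 8-GPU baseline, so the two efficiencies are different measurements and need not match.

```python
# Throughput and scaling arithmetic for the ResNet-50 multi-node chart.
# Inputs from the slide: 90 epochs over a 1.28M-image dataset, and times of
# 1 h (64x V100), 7.4 h (8x V100), 18 h (8x P100).

EPOCHS = 90
IMAGES_PER_EPOCH = 1_280_000
TOTAL_IMAGES = EPOCHS * IMAGES_PER_EPOCH  # 115.2 million images processed


def images_per_second(hours: float) -> float:
    """Aggregate training throughput implied by a wall-clock time."""
    return TOTAL_IMAGES / (hours * 3600.0)


def scaling_efficiency(base_gpus: int, base_hours: float,
                       scaled_gpus: int, scaled_hours: float) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    speedup = base_hours / scaled_hours
    ideal = scaled_gpus / base_gpus
    return speedup / ideal


runs = {"64x V100": (64, 1.0), "8x V100": (8, 7.4), "8x P100": (8, 18.0)}

if __name__ == "__main__":
    for name, (gpus, hours) in runs.items():
        total = images_per_second(hours)
        print(f"{name}: {total:,.0f} img/s total, {total / gpus:,.0f} img/s per GPU")
    eff = scaling_efficiency(8, 7.4, 64, 1.0)
    print(f"8 -> 64 V100 scaling efficiency: {eff:.1%}")
```

Under these inputs the 64-GPU run sustains about 32,000 images/s (500 per GPU) versus about 540 per GPU on 8 GPUs, an implied 8-to-64 efficiency of 92.5%.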
3X reduction in time to train over P100
[Chart: LSTM training (neural machine translation), time to train: 2x CPU: 15 days; 1x P100: 18 hours; 1x V100: 6 hours. Neural machine translation training for 13 epochs, German -> English, WMT15 subset; CPU = 2x Xeon E5-2699 v4; V100 performance measured on pre-production hardware.]
AI PERFORMANCE: 3X Faster DL Training Performance
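The LSTM chart's "3X" comes straight from its time-to-train numbers. This short sketch makes the ratios explicit, including the CPU baseline, using only the slide's figures (15 days, 18 hours, 6 hours).

```python
# Time-to-train speedups from the LSTM (neural machine translation) chart.
HOURS = {"2x CPU": 15 * 24.0, "1x P100": 18.0, "1x V100": 6.0}


def speedup(baseline: str, target: str) -> float:
    """How many times faster `target` trains than `baseline`."""
    return HOURS[baseline] / HOURS[target]


if __name__ == "__main__":
    for base, tgt in [("2x CPU", "1x P100"), ("2x CPU", "1x V100"),
                      ("1x P100", "1x V100")]:
        print(f"{tgt} vs {base}: {speedup(base, tgt):.0f}x")
```

The P100-to-V100 ratio is the slide's 3x; against the dual-socket CPU server the same numbers imply 20x (P100) and 60x (V100).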
VOLTA DELIVERS 3X MORE INFERENCE THROUGHPUT
Low-latency performance with V100 and TensorRT
TensorRT takes a trained neural network and produces a compiled real-time network: fuse layers, compact, optimize precision (FP32, FP16, INT8).
[Chart: 3x more throughput at 7 ms latency with V100 (ResNet-50); throughput at 7 ms in images/sec (axis 0 to 5,000); bars with latencies: CPU 33 ms, Tesla P100 (TensorFlow) 10 ms, Tesla P100 (TensorRT) 7 ms, Tesla V100 (TensorRT) 7 ms. CPU server: 2x Xeon E5-2660 v4; GPU: with P100, with V100 (@150 W); V100 performance measured on pre-production hardware.]
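The "optimize precision (FP32, FP16, INT8)" step means storing and computing on lower-precision values with a scale factor. A minimal sketch of generic per-tensor symmetric INT8 quantization follows; it is not TensorRT's API, and TensorRT chooses its scales through its own calibration process, which this sketch does not reproduce. The function names are hypothetical.

```python
import numpy as np


def quantize_int8_symmetric(x):
    """Map FP32 values to INT8 with one scale per tensor: q = round(x / scale).

    scale = max|x| / 127, so the largest-magnitude value maps to +/-127.
    """
    x = np.asarray(x, dtype=np.float32)
    amax = float(np.max(np.abs(x)))
    scale = amax / 127.0 if amax > 0.0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * np.float32(scale)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 101, dtype=np.float32)
    q, s = quantize_int8_symmetric(x)
    err = np.max(np.abs(dequantize_int8(q, s) - x))
    print(f"scale={s:.6f}, max abs error={err:.6f}")
```

Because rounding is to the nearest code, the reconstruction error stays within about half a quantization step (scale / 2). Choosing the scale from the maximum magnitude is the simplest option; outliers widen the step for every other value, which is why calibration-based scale selection exists.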
SINGLE UNIVERSAL GPU FOR ALL ACCELERATED WORKLOADS
V100 universal GPU boosts all accelerated workloads:
• HPC: 1.5X vs P100
• AI training: 3X vs P100
• AI inference: 3X vs P100
• Virtual desktop: 2X vs M60
OPTIMIZED FOR DATACENTER EFFICIENCY
80% perf at half the power; 40% more performance in a rack
• V100 Max Performance: 13 kW rack, 4 nodes of 8x V100, 13 ResNet-50 networks trained per day
• V100 Max Efficiency: 13 kW rack, 7 nodes of 8x V100, 18 ResNet-50 networks trained per day
(ResNet-50 training; Max Efficiency run with V100 @ 160 W; V100 performance measured on pre-production hardware.)
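The efficiency slide's three claims can be checked against its two rack configurations. A sketch of that arithmetic, assuming (not stated on this slide) that the Max Performance configuration runs each V100 at its 300 W rated power from the specification table. GPU power per rack counts only the GPUs; CPUs, memory, fans and networking are ignored.

```python
# Datacenter-efficiency arithmetic for the two 13 kW rack configurations.
# From the slide: 4 nodes -> 13 ResNet-50 networks/day (Max Performance),
# 7 nodes -> 18 networks/day (Max Efficiency, V100 @ 160 W), 8 GPUs per node.
# Assumption: Max Performance runs each V100 at its 300 W rated power.

GPUS_PER_NODE = 8
MAX_PERF = {"nodes": 4, "networks_per_day": 13, "gpu_watts": 300}
MAX_EFF = {"nodes": 7, "networks_per_day": 18, "gpu_watts": 160}


def networks_per_node(cfg: dict) -> float:
    return cfg["networks_per_day"] / cfg["nodes"]


def gpu_watts_per_rack(cfg: dict) -> int:
    """GPU power only; the rest of each node is not counted."""
    return cfg["nodes"] * GPUS_PER_NODE * cfg["gpu_watts"]


# Implied per-node throughput ratio: ~0.79, the "80% perf"
perf_ratio = networks_per_node(MAX_EFF) / networks_per_node(MAX_PERF)
# Per-GPU power ratio: ~0.53, the "half the power"
power_ratio = MAX_EFF["gpu_watts"] / MAX_PERF["gpu_watts"]
# Rack-level throughput gain: ~1.38, the "40% more performance in a rack"
rack_gain = MAX_EFF["networks_per_day"] / MAX_PERF["networks_per_day"]

if __name__ == "__main__":
    print(f"perf {perf_ratio:.2f}, power {power_ratio:.2f}, rack gain {rack_gain:.2f}")
    print(f"GPU watts per rack: {gpu_watts_per_rack(MAX_PERF)} vs "
          f"{gpu_watts_per_rack(MAX_EFF)}")
```

Both configurations keep GPU power (9,600 W and 8,960 W) below the 13 kW rack budget. The efficiency mode trades per-GPU speed for density: more nodes fit, so the rack as a whole trains more networks per day.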
TESLA V100 SPECIFICATIONS
For NVLink servers:
• Compute: 7.5 TF DP ∙ 15 TF SP ∙ 120 TF DL
• Memory: HBM2, 900 GB/s ∙ 16 GB
• Interconnect: NVLink (up to 300 GB/s) + PCIe Gen3 (up to 32 GB/s)
• Power: 300 W
For PCIe servers:
• Compute: 7 TF DP ∙ 14 TF SP ∙ 112 TF DL
• Memory: HBM2, 900 GB/s ∙ 16 GB
• Interconnect: PCIe Gen3 (up to 32 GB/s)
• Power: 250 W
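The specification figures also indicate when a kernel is limited by HBM2 bandwidth rather than compute. The sketch below uses the standard roofline model, an analysis lens added here rather than something the slides present. The ridge point is the arithmetic intensity (FLOPs per byte of memory traffic) above which peak compute, rather than the 900 GB/s of HBM2, becomes the bound.

```python
# Roofline arithmetic on the V100 specification figures (900 GB/s HBM2).
# The roofline model itself is an added assumption, not part of the slides.

HBM2_GB_PER_S = 900.0

PEAK_TFLOPS = {
    "NVLink FP64": 7.5, "NVLink FP32": 15.0, "NVLink Tensor": 120.0,
    "PCIe FP64": 7.0, "PCIe FP32": 14.0, "PCIe Tensor": 112.0,
}


def ridge_point(peak_tflops: float, bandwidth_gb_per_s: float = HBM2_GB_PER_S) -> float:
    """FLOPs per byte at which bandwidth x intensity equals peak compute."""
    return peak_tflops * 1e12 / (bandwidth_gb_per_s * 1e9)


def attainable_tflops(intensity_flops_per_byte: float, peak_tflops: float,
                      bandwidth_gb_per_s: float = HBM2_GB_PER_S) -> float:
    """Roofline bound: min(peak, intensity x bandwidth), in TFLOPS."""
    return min(peak_tflops, intensity_flops_per_byte * bandwidth_gb_per_s / 1000.0)


if __name__ == "__main__":
    for name, peak in PEAK_TFLOPS.items():
        print(f"{name:14s} ridge point: {ridge_point(peak):6.1f} FLOP/byte")
```

For example, a kernel doing 10 FLOPs per byte tops out at 9 TFLOPS even against a 15 TFLOPS FP32 peak. Tensor Core math needs well over 100 FLOPs per byte to be compute-bound, which is why data reuse in dense matrix tiles matters so much for reaching the DL figures.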
HPE enables an optimized deep learning experience
• Hardware infrastructure
• Deep learning services
• Applications: fraud detection, predictive maintenance, patient diagnostics
• Deep learning frameworks
• Data infrastructure
HPE Confidential
External announcement at NVIDIA GTC on May 10th, 2017
Thank you
Edmondo.Orlotti@HPE.com
December 2015, #c03880772

Top 10 Hubspot Development Companies in 2024
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability

  • 1. HPE demystifies deep learning for faster intelligence across all organizations. Edmondo Orlotti, HPC & AI Business Development Manager. October 2017.
  • 2. Data analytics and insights are fueling the digital transformation: enhanced customer experiences, improved products and services, and optimized business processes. Examples: personalized, real-time mobile insights for retail; genomics sequencing analytics for life sciences; predictive maintenance insights for manufacturing.
  • 3. AI propels analytics and insights to a new dimension: unleash automated intelligence from massive data volumes. Use cases: data protection and archival to mitigate risk; HPE Fraud Detection using deep learning; infrastructure modernization for new data types and scale; user behavioral analytics for the data center using machine learning; next-generation analytics for real-time business; HPE Intelligent Edge real-time analytics with SAP Leonardo; insights from modeling and simulation; deep learning in HPC using GPU-accelerated computing.
  • 4. What’s all the “buzz” around AI? Gain competitive advantage using the vibrant new market of AI. (Source: McKinsey AI report, 2017.)
  • 5. Overview of HPE’s GPU portfolio
  • 6. HPE has a comprehensive, purpose-built portfolio for deep learning. Compute roles: compute ideal for training models in the data center; compute for both training models and inference at the edge; edge analytics and inference engine. Systems: HPE SGI 8600 (petaflop scale for deep learning and HPC); HPE Apollo 6500 (the enterprise bridge to accelerated computing); HPE Apollo 2000 (the bridge to enterprise scale-out architecture); HPE Apollo sx40 (maximize GPU capacity and performance with lower TCO); HPE Edgeline EL4000 (unprecedented deep edge compute and high-capacity storage; open standards). Target segments: government, academia and industries; financial services; life sciences and health; government and academia; autonomous vehicles and manufacturing. HPC storage: HPE Apollo 4520. Choice of fabrics: Arista networking, Intel® Omni-Path Architecture, Mellanox InfiniBand, HPE FlexFabric Network. Software: AI software framework; HPC Data Management Framework Software (a large-scale storage virtualization and tiered data management platform); easy setup and flexible OS using Bright Computing’s distribution of deep learning software development components and workload management tool integration. Services: advisory, professional and operational services; HPE Flexible Capacity; HPE Datacenter Care for Hyperscale.
  • 8. Tesla V100: the most advanced data center GPU ever built. 5,120 CUDA cores; 640 new Tensor Cores; 7.5 FP64 TFLOPS | 15 FP32 TFLOPS; 120 Tensor TFLOPS; 20 MB SM register file | 16 MB cache | 16 GB HBM2 @ 900 GB/s; 300 GB/s NVLink.
  • 9. Volta: a giant leap for deep learning. [Chart: ResNet-50 training, images per second, V100 Tensor Cores vs. P100 FP32: 2.4x faster.] [Chart: ResNet-50 inference with TensorRT at 7 ms latency, images per second, V100 Tensor Cores vs. P100 FP16: 3.7x faster.] V100 measured on pre-production hardware.
  • 10. Introducing Tesla V100: the fastest and most productive GPU for deep learning and HPC. Volta architecture: most productive GPU. Tensor Core: 120 programmable TFLOPS for deep learning. Improved SIMT model: new algorithms. Volta MPS: inference utilization. Improved NVLink & HBM2: efficient bandwidth.
  • 11. Tesla V100 architecture: 21B transistors; 815 mm²; 80 SMs*; 5,120 CUDA cores; 640 Tensor Cores; 16 GB HBM2; 900 GB/s HBM2; 300 GB/s NVLink. *The full GV100 chip contains 84 SMs.
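The per-SM layout follows from the totals on the architecture slide. This Python sketch assumes the cores are spread evenly across the 80 enabled SMs (an assumption; the slide gives only totals) and projects the same layout onto the full 84-SM GV100 die.

```python
# Totals from the Tesla V100 architecture slide.
ENABLED_SMS = 80
FULL_GV100_SMS = 84
CUDA_CORES = 5120
TENSOR_CORES = 640

# Assumption: cores are distributed evenly, so the totals must divide cleanly.
assert CUDA_CORES % ENABLED_SMS == 0 and TENSOR_CORES % ENABLED_SMS == 0

cuda_per_sm = CUDA_CORES // ENABLED_SMS      # CUDA cores per SM
tensor_per_sm = TENSOR_CORES // ENABLED_SMS  # Tensor Cores per SM

# The same per-SM layout applied to all 84 SMs of the full chip.
full_chip_cuda = cuda_per_sm * FULL_GV100_SMS
full_chip_tensor = tensor_per_sm * FULL_GV100_SMS

print(cuda_per_sm, tensor_per_sm, full_chip_cuda, full_chip_tensor)  # 64 8 5376 672
```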
  • 12. Volta V100 SM: completely new ISA; twice the schedulers; simplified issue logic; large, fast L1 cache; improved SIMT model; Tensor acceleration.
  • 13. Volta NVLink: 300 GB/s; 50% more links; 28% faster signaling.
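The two NVLink improvements compound: more links times faster signaling per link. The sketch below multiplies them out. The Pascal P100 baseline of 160 GB/s aggregate NVLink bandwidth is an assumption not stated on the slide; with it, the estimate lands close to the quoted 300 GB/s (the slide's percentages are rounded).

```python
def scaled_bandwidth(base_gbps: float, link_factor: float, signaling_factor: float) -> float:
    """Aggregate bandwidth when both the link count and the per-link signaling rate grow."""
    return base_gbps * link_factor * signaling_factor

LINK_FACTOR = 1.5        # "50% more links"
SIGNALING_FACTOR = 1.28  # "28% faster signaling"

combined = LINK_FACTOR * SIGNALING_FACTOR  # overall bandwidth multiplier

# Assumption (not on this slide): P100 aggregate NVLink bandwidth of 160 GB/s.
P100_NVLINK_GBPS = 160.0
v100_estimate = scaled_bandwidth(P100_NVLINK_GBPS, LINK_FACTOR, SIGNALING_FACTOR)

print(f"combined factor: {combined:.2f}x")                # 1.92x
print(f"estimated V100 NVLink: {v100_estimate:.1f} GB/s")  # 307.2 GB/s vs. 300 GB/s quoted
```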
  • 14. Volta Multi-Process Service: hardware-accelerated work submission and hardware isolation. [Diagram: CPU processes A, B and C submit work through the CUDA Multi-Process Service control to a Volta GV100 for GPU execution.] Volta MPS enhancements: reduced launch latency; improved launch throughput; improved quality of service with scheduler partitioning; more reliable performance; 3x more clients than Pascal.
  • 15. Volta: independent thread scheduling. Pascal: lock-free algorithms; threads cannot wait for messages. Volta: starvation-free algorithms; threads may wait for messages.
  • 16. New Tensor Core built for AI: delivering 120 TFLOPS of DL performance. Volta Tensor Core: a 4x4 matrix processing array computing D[FP32] = A[FP16] × B[FP16] + C[FP32], optimized for deep learning. Volta-optimized cuDNN for all major frameworks: matrix data optimization (dense matrix of tensor compute) and Tensor Op conversion (FP32 to Tensor Op data for frameworks).
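The Tensor Core operation multiplies FP16 matrices and accumulates into FP32. This Python sketch emulates those numerics for one 4x4 tile, using the standard library's `struct` half- ("e") and single-precision ("f") formats to round values. It models arithmetic only, not the hardware; the accumulation order and the points where rounding happens are assumptions.

```python
import struct
from typing import List

Matrix = List[List[float]]
TILE = 4  # tile size from the slide: 4x4 matrices

def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (raises OverflowError beyond FP16 range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def tensor_core_mma(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Emulate D = A * B + C with FP16 inputs A, B and an FP32 accumulator C and result D."""
    d = [[0.0] * TILE for _ in range(TILE)]
    for i in range(TILE):
        for j in range(TILE):
            acc = to_fp32(c[i][j])
            for k in range(TILE):
                # A product of two FP16 values fits exactly in FP32, so only the running sum is rounded.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d

# Usage: multiplying by the identity exposes the FP16 rounding of B's inputs.
identity = [[1.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
b = [[0.1 * (i + j) for j in range(TILE)] for i in range(TILE)]
zeros = [[0.0] * TILE for _ in range(TILE)]
d = tensor_core_mma(identity, b, zeros)
print(d[0][1])  # 0.0999755859375, the FP16 value nearest to 0.1
```

Keeping C and D in FP32 is what limits error growth: each product is exact, and only the running sum is rounded, at single rather than half precision.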
  • 17. AI performance: 3X faster DL training performance. Over 80x DL training performance in 3 years [Chart: GoogLeNet training performance, speedup vs. K80, Q1 2015 to Q2 2017; systems 1x K80, 4x M40, 8x P100 and 8x V100 with cuDNN2, cuDNN3, cuDNN6 and cuDNN7]. 85% scale-out efficiency: scales to 64 GPUs with Microsoft Cognitive Toolkit [Chart: multi-node training with NCCL 2.0 (ResNet-50): 8x P100, 18 hours; 8x V100, 7.4 hours; 64x V100, 1 hour. ResNet-50 training for 90 epochs on the 1.28M-image dataset, using Caffe2]. 3X reduction in time to train over P100 [Chart: LSTM training (neural machine translation): 2x CPU, 15 days; 1x P100, 18 hours; 1x V100, 6 hours. Neural machine translation training for 13 epochs, German -> English, WMT15 subset; CPU = 2x Xeon E5-2699 v4]. V100 performance measured on pre-production hardware.
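The panel captions reduce to simple ratios. In this sketch, speedup is baseline time divided by new time, and scale-out efficiency is taken as achieved speedup divided by the ideal linear speedup from adding GPUs. That definition is an assumption, since the slide does not define the term. Applied to the LSTM panel, 18 hours on P100 versus 6 hours on V100 gives the quoted 3X, and 15 days on 2x CPU gives 60X.

```python
HOURS_PER_DAY = 24

def speedup(baseline_hours: float, new_hours: float) -> float:
    """How many times faster the new configuration completes the same training run."""
    return baseline_hours / new_hours

def scale_out_efficiency(base_gpus: int, base_hours: float, gpus: int, hours: float) -> float:
    """Achieved speedup divided by the ideal linear speedup from adding GPUs (assumed definition)."""
    ideal = gpus / base_gpus
    return speedup(base_hours, hours) / ideal

# LSTM (neural machine translation) panel: 2x CPU 15 days, 1x P100 18 hours, 1x V100 6 hours.
cpu_hours = 15 * HOURS_PER_DAY
v100_vs_p100 = speedup(18, 6)
v100_vs_cpu = speedup(cpu_hours, 6)

print(f"V100 vs P100: {v100_vs_p100:.1f}x")  # 3.0x
print(f"V100 vs 2x CPU: {v100_vs_cpu:.0f}x")  # 60x
```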
  • 18. Volta delivers 3X more inference throughput: low-latency performance with V100 and TensorRT. TensorRT takes a trained neural network and produces a compiled, real-time network by fusing layers, compacting the network, and optimizing precision (FP32, FP16, INT8). 3x more throughput at 7 ms latency with V100 (ResNet-50) [Chart: throughput at 7 ms (images/sec) for CPU, Tesla P100 (TensorFlow), Tesla P100 (TensorRT) and Tesla V100 (TensorRT); per-bar latencies shown as 33 ms, 10 ms, 7 ms and 7 ms]. CPU server: 2x Xeon E5-2660 v4; GPU: with P100 or V100 (@150W). V100 performance measured on pre-production hardware.
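Running in INT8 means mapping floating-point values onto 8-bit integers through a scale factor. TensorRT chooses its scales through calibration. The sketch below instead uses the simplest choice, symmetric per-tensor scaling by the maximum absolute value, so it illustrates the idea rather than TensorRT's own method. The `activations` values are hypothetical.

```python
from typing import List, Tuple

def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization with max-abs calibration (simplified)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clip to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    return [x * scale for x in q]

# Hypothetical activation values, for illustration only.
activations = [0.02, -1.5, 0.75, 3.0, -0.3]
q, scale = quantize_int8(activations)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - r) for a, r in zip(activations, restored))
print(q)
print(f"scale={scale:.5f}, max error={max_error:.5f}")
```

Because the largest magnitude maps exactly to 127, nothing is clipped, and the round-trip error stays within half a quantization step. Calibration methods trade some clipping of outliers for a finer step on typical values.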
  • 19. V100 universal GPU boosts all accelerated workloads: a single universal GPU for all accelerated workloads. HPC: 1.5X vs. P100; AI training: 3X vs. P100; AI inference: 3X vs. P100; virtual desktop: 2X vs. M60.
  • 20. Optimized for data center efficiency: 80% of the performance at half the power, and 40% more performance in a rack. V100 max performance: 13 kW rack, 4 nodes of 8x V100, 13 ResNet-50 networks trained per day. V100 max efficiency: 13 kW rack, 7 nodes of 8x V100, 18 ResNet-50 networks trained per day. ResNet-50 training; max-efficiency run with V100 @ 160W. V100 performance measured on pre-production hardware.
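The two headline claims can be recomputed from the rack figures. The sketch pairs 7 nodes with max-efficiency mode, which is an inference: more nodes fit in the same 13 kW when each GPU runs at 160 W. It also assumes max-performance mode runs each GPU at the 300 W rating of the NVLink board. The results are about 1.38x per rack ("40% more") and about 79% per-GPU performance at about 53% of the power ("80% perf at half the power").

```python
GPUS_PER_NODE = 8
# Slide figures; which mode gets 4 vs. 7 nodes is inferred, not stated explicitly.
max_perf = {"nodes": 4, "networks_per_day": 13}
max_eff = {"nodes": 7, "networks_per_day": 18}

def per_gpu_rate(cfg: dict) -> float:
    """ResNet-50 networks trained per day, per GPU."""
    return cfg["networks_per_day"] / (cfg["nodes"] * GPUS_PER_NODE)

rack_gain = max_eff["networks_per_day"] / max_perf["networks_per_day"]
per_gpu_ratio = per_gpu_rate(max_eff) / per_gpu_rate(max_perf)

# Assumption: max-performance mode runs each GPU at 300 W; max-efficiency mode at 160 W.
power_ratio = 160 / 300

print(f"rack throughput gain: {rack_gain:.2f}x")                        # 1.38x
print(f"per-GPU performance at 160 W: {per_gpu_ratio:.0%} of max mode")  # 79%
print(f"per-GPU power: {power_ratio:.0%} of 300 W")                      # 53%
```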
  • 21. Tesla V100 specifications (for NVLink servers | for PCIe servers):
    Compute: 7.5 TF DP ∙ 15 TF SP ∙ 120 TF DL | 7 TF DP ∙ 14 TF SP ∙ 112 TF DL
    Memory: HBM2, 900 GB/s, 16 GB | HBM2, 900 GB/s, 16 GB
    Interconnect: NVLink (up to 300 GB/s) + PCIe Gen3 (up to 32 GB/s) | PCIe Gen3 (up to 32 GB/s)
    Power: 300 W | 250 W
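The SP and DL rows of the specifications can be cross-checked against each other. Peak throughput is units × FLOPs per unit per clock × clock rate, so dividing each quoted figure by its unit count gives the clock rate it implies. The sketch makes two assumptions: each CUDA core retires one FMA (2 FLOPs) per clock, and each Tensor Core retires one 4x4x4 matrix multiply-accumulate (64 FMAs, 128 FLOPs) per clock. The 4x4x4 shape is inferred from the 4x4 array described for the Tensor Core, not stated outright. Under these assumptions both rows imply the same clock for each form factor: about 1.46 GHz for NVLink and about 1.37 GHz for PCIe.

```python
def implied_clock_ghz(tflops: float, units: int, flops_per_unit_per_clock: int) -> float:
    """Clock rate (GHz) implied by a peak figure: TFLOPS / (units * FLOPs per unit per clock)."""
    return tflops * 1e3 / (units * flops_per_unit_per_clock)

CUDA_CORES = 5120
TENSOR_CORES = 640
FP32_FLOPS_PER_CORE = 2                 # one fused multiply-add per clock (assumption)
TENSOR_FLOPS_PER_CORE = 4 * 4 * 4 * 2   # one 4x4x4 multiply-accumulate per clock (assumption)

for form_factor, sp_tf, dl_tf in [("NVLink", 15, 120), ("PCIe", 14, 112)]:
    sp_clock = implied_clock_ghz(sp_tf, CUDA_CORES, FP32_FLOPS_PER_CORE)
    dl_clock = implied_clock_ghz(dl_tf, TENSOR_CORES, TENSOR_FLOPS_PER_CORE)
    print(f"{form_factor}: {sp_clock:.3f} GHz from SP, {dl_clock:.3f} GHz from DL")
```

The DP figures are exactly half the SP figures in both columns, which is consistent with FP64 running at half the FP32 rate.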
  • 22. HPE enables an optimized deep learning experience, spanning applications (fraud detection, predictive maintenance, patient diagnostics), deep learning frameworks, data infrastructure, hardware infrastructure and deep learning services. HPE Confidential. External announcement at NVIDIA GTC on May 10th, 2017.