© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Lin Yuan, Yuxi (Darren) Hu
Distributed Training Using Apache MXNet with Horovod
Feb 11, 2019
Outline
• What is distributed model training
• Introduction to Apache MXNet and Horovod
• Integrating MXNet with Horovod
• Performance results
• Demo
Model Training 101
[Flowchart of the training loop: data and model produce gradients, the optimizer applies them, and the loop repeats until the model converges (done).]
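The loop above maps onto a few lines of MXNet Gluon. Below is a minimal, self-contained sketch with synthetic data and illustrative hyperparameters; it is not the exact code used later in this talk.

    import mxnet as mx
    from mxnet import autograd, gluon

    # Synthetic data stands in for a real dataset ("data").
    X = mx.nd.random.normal(shape=(1000, 20))
    y = mx.nd.random.randint(0, 10, shape=(1000,)).astype('float32')
    train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=64)

    net = gluon.nn.Dense(10, in_units=20)                 # "model"
    net.initialize(mx.init.Xavier())
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd',  # "optimizer"
                            {'learning_rate': 0.1})

    for epoch in range(3):                                # loop until "converge?"
        for data, label in train_data:
            with autograd.record():
                loss = loss_fn(net(data), label)          # forward pass
            loss.backward()                               # "gradients"
            trainer.step(data.shape[0])                   # parameter update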
The Growing Pains of DNNs
• Increasing model complexity
• ResNet50 network has over 25 million parameters [1]
• Huge training data
• ImageNet has 14,197,122 images [2]
• How to leverage computing resources
• Cost/energy efficiency
Model Training Going Distributed
Data Parallelism vs Model Parallelism
[Diagram, data parallelism: machine1…machinen each hold a full model replica and train on their own data shard (data1…datan), synchronizing through a global state of parameters.]
[Diagram, model parallelism: a single model is partitioned across machine1…machine4.]
Data Parallel: Parameter Server based approach
[Diagram: worker1…workern, each with its own data shard, model replica, and optimizer, exchange gradients and parameters with parameter servers server1, server2, ….]
Data Parallel: Ring-Allreduce based approach
[Diagram: worker1…worker4 arranged in a ring, each holding a model replica and its own data shard (data1…data4), exchanging gradient chunks with its neighbors.]
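To make the ring concrete, here is a small NumPy simulation of the two phases of ring-allreduce (scatter-reduce, then allgather). It is an illustrative sketch of the idea only, not Horovod's actual MPI/NCCL implementation.

    import numpy as np

    def ring_allreduce(grads):
        """grads: one equal-length 1-D gradient array per worker."""
        n = len(grads)
        chunks = [list(np.array_split(g.astype(float), n)) for g in grads]

        # Phase 1: scatter-reduce. In each of n-1 steps every worker sends one
        # chunk to its right neighbor, which adds it to its own copy. Afterwards
        # worker i holds the fully summed chunk (i + 1) % n.
        for step in range(n - 1):
            sends = [(i, (i - step) % n, chunks[i][(i - step) % n]) for i in range(n)]
            for i, c, payload in sends:
                chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + payload

        # Phase 2: allgather. The completed chunks are passed around the ring
        # for another n-1 steps until every worker holds every summed chunk.
        for step in range(n - 1):
            sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n]) for i in range(n)]
            for i, c, payload in sends:
                chunks[(i + 1) % n][c] = payload

        return [np.concatenate(chunks[i]) for i in range(n)]

    # Example: 4 workers, each holding a gradient vector filled with its rank.
    workers = [np.full(8, fill_value=w) for w in range(4)]
    print(ring_allreduce(workers)[0])   # every worker ends up with 0+1+2+3 = 6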
Apache MXNet
• Apache (incubating) open source project
• Framework for DNNs
• Created by academia (CMU and UW)
• Adopted by AWS as its DNN framework of choice, Nov 2016
• Widely used within Amazon
http://mxnet.io
Asynchronous Engine
Operations appear to return immediately, but are just pushed onto the engine queue on the backend.
Allows for much greater parallelism.
Serial or parallel? "The execution of any two functions when one of them modifies at least one common variable is serialized in their push order."
You must call wait_to_read() or similar to retrieve a value, and this blocks.
[Diagram: the frontend pushes operations onto the backend engine queue; synchronous vs. asynchronous execution, with wait_to_read() blocking until the result is ready.]
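A tiny illustration of this behavior (assuming MXNet is installed; runs on CPU):

    import mxnet as mx

    a = mx.nd.ones((1000, 1000))
    b = mx.nd.dot(a, a)      # returns immediately; the matmul is only pushed
                             # onto the backend engine's queue
    b.wait_to_read()         # blocks until the result has actually been computed
    print(b[0, 0])           # asnumpy()/print would also synchronize implicitly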
Horovod
• An open source framework (under the Linux Foundation) for distributed model training
• Supports TensorFlow, Keras, MXNet, and PyTorch
• Developed at Uber since Oct 2017
• Implements the ring-allreduce approach using MPI and NCCL
• MPI: a message-passing interface used to communicate between worker nodes
• NCCL: NVIDIA's library for efficient collective communication between GPUs
https://eng.uber.com/horovod/
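For reference, a minimal sketch of the Horovod primitives used throughout this talk (it assumes Horovod is built with MXNet support and the script is launched with mpirun):

    import horovod.mxnet as hvd

    hvd.init()                          # set up MPI/NCCL communication
    print('workers:', hvd.size())       # total number of workers
    print('rank:', hvd.rank())          # global rank of this worker
    print('local rank:', hvd.local_rank())  # rank within this host, used to pick a GPU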
Integrating MXNet with Horovod
[Diagram: MXNet provides the model and optimizer; Horovod broadcasts the initial parameters and allreduces the gradients; a distributed optimizer wraps the MXNet optimizer and applies the allreduced gradients as updates.]
Leverage the power of the asynchronous engine in MXNet
• The MXNet engine starts executing the operation asynchronously
• Task dependencies are taken care of automatically
• Improves the training throughput
[Diagram: horovod.broadcast and horovod.allreduce operations are submitted to the MXNet engine via PushAsync.]
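As a small example, a standalone allreduce on an MXNet NDArray; like any other operator, it is pushed to the engine and only blocks when the result is read. This sketch follows the Horovod 0.16-era MXNet API (average=True).

    import mxnet as mx
    import horovod.mxnet as hvd

    hvd.init()
    x = mx.nd.ones((4, 4)) * hvd.rank()
    avg = hvd.allreduce(x, average=True)   # asynchronous: pushed to the MXNet engine
    avg.wait_to_read()                     # block only when the value is needed
    print(avg)                             # element-wise mean across all workers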
Performance Optimization
• Mixed precision: using float32 for training and float16 for passing gradients
• Tensor fusion: combine all tensors that are ready to be reduced into one reduction operation
• Hierarchical allreduce (only supported with NCCL; see the sketch below)
• Aggregate SGD*: aggregate multiple weights in a single call to the optimizer to reduce synchronization overhead
* contributed by NVIDIA (https://devblogs.nvidia.com/new-optimizations-accelerate-deep-learning-training-gpu/)
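The fusion and hierarchical-allreduce knobs above are exposed through Horovod environment variables. A hedged sketch (variable names from Horovod 0.16; the values are purely illustrative):

    import os

    # Must be set before hvd.init(); in practice they are usually passed via `mpirun -x ...`.
    os.environ['HOROVOD_FUSION_THRESHOLD'] = str(64 * 1024 * 1024)  # tensor-fusion buffer size, bytes
    os.environ['HOROVOD_HIERARCHICAL_ALLREDUCE'] = '1'              # enable NCCL hierarchical allreduce

    import horovod.mxnet as hvd
    hvd.init()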
Next Steps
• Fused operators such as BatchNorm-ReLU and BatchNorm-Add-ReLU to reduce unnecessary data transfer between CPU and GPU memory*
• Provide a different layout (NHWC) to improve convolution operators on GPU*
* contributed by NVIDIA (https://devblogs.nvidia.com/new-optimizations-accelerate-deep-learning-training-gpu/)
Benchmark Setup
• Model and data
• ResNet50-v1b (~25 million parameters) [1]
• ImageNet (~14 million images) [2]
• Training setup
• batch size (per device): 256
• learning rate: 0.1 (scaled linearly with the number of GPUs; see the sketch below) [3]
• number of epochs: 90
• Software
• CUDA 9.2
• Ubuntu 16.04
• cuDNN 7.2.1
• NCCL 2.2.13
• OpenMPI 3.1.1
• Hardware
• GPU instance: p3.16xlarge (8 NVIDIA Tesla V100 GPUs, each with 5,120 CUDA cores and 640 Tensor cores)
• CPU instance: c5.18xlarge (72 vCPU and 144GiB memory)
• Network bandwidth: 25Gbps
[1] He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
[2] http://image-net.org/challenges/LSVRC/2015/
[3] Goyal et al., “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour”, CVPR 2018
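The linear learning-rate scaling rule [3] noted in the training setup boils down to one line once Horovod is initialized (sketch; 0.1 is the single-GPU base rate from this setup):

    import horovod.mxnet as hvd

    hvd.init()
    base_lr = 0.1                 # learning rate for 1 GPU at batch size 256
    lr = base_lr * hvd.size()     # scale linearly with the number of GPUs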
Scaling Efficiency
[Chart: training throughput (images/sec, 0 to 60,000) for the ResNet50 model with ImageNet data on 1, 8, 16, 32, and 64 NVIDIA Tesla V100 GPUs, comparing Parameter Server, Horovod, and Ideal scaling. At 64 GPUs the labels show 82.6% scaling efficiency (Horovod) vs. 48.7% (parameter server).]
Cost Comparison
• Adding extra machines as parameter servers can help increase throughput, at the cost of additional compute resources ($$)

Setup | Time to train (min) | Throughput (images/sec) | Top-1 Validation Accuracy | Cost ($$) *
Horovod on 8 p3.16xlarge | 43.5 | 44900 | 75.69% | 142
Parameter Server on 8 p3.16xlarge and 16 c5.18xlarge | 44.1 | 43482 | 74.81% | 190
Parameter Server (collocated) | 76 | 26500 | 74.72% | 248

* cost is calculated based on the AWS on-demand EC2 instance hourly rates for p3.16xlarge and c5.18xlarge
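As a rough sanity check of the cost column for the Horovod row, assuming an on-demand us-east-1 rate of roughly $24.48/hour for p3.16xlarge (a rate not stated on the slide; it varies by region and over time):

    hours = 43.5 / 60.0          # time to train, in hours
    cost = 8 * 24.48 * hours     # 8 p3.16xlarge instances billed per minute
    print(round(cost))           # ~142, matching the table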
MLPerf Benchmark of ResNet-50v1.5 on ImageNet*
Submitter | Hardware | Software | Time (mins) | Speedup
Reference | Pascal P100 | Unoptimized reference | 8831.3 | 1.0x
Google | TPUv2.512 + TPUv2.8 (260 cores) | TensorFlow 1.12 | 11.3 | 781.5x
Intel | 8x 2S SKX8180 (16 processors) | Intel Caffe 1.1.2a | 1312.8 | 6.7x
NVIDIA | 80x DGX-1 (640 Volta GPUs) | MXNet ngc18.11, cuDNN 7.4 | 6.2 | 1424.4x
*MLPerf: https://mlperf.org/results/
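The Speedup column is simply the unoptimized reference time divided by each submission's time, e.g.:

    reference_minutes = 8831.3
    nvidia_minutes = 6.2
    print(round(reference_minutes / nvidia_minutes, 1))   # 1424.4 -> "1424.4x"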
How to run distributed training using MXNet with Horovod
• Install MXNet
• We currently recommend building MXNet from source if you are running on machines with GCC 5.x and beyond: https://github.com/apache/incubator-mxnet
• If you are running on machines with GCC 4.x, you may install MXNet using pip:
• pip install mxnet-cu92
• Install Horovod
• Currently, MXNet support in Horovod requires building Horovod from source: https://github.com/uber/horovod
• Horovod 0.16.0 will include MXNet in the PyPI package:
• pip install horovod
• Run MPI on the cluster
• Specify the cluster in a host file
• mpirun -np <num of gpu/cpu devices> -H <hostfile> -bind-to none -map-by slot python <training script>
Changes needed in training script
Single GPU training:
import mxnet as mx
# Set context to GPU
context = mx.gpu(0)
# Build model
model = …
# Define hyperparameters
optimizer_params = …
# Create optimizer
opt = mx.optimizer.create(…)
# Initialize parameters
initializer = …
model.bind(data=…, label=…)
model.init_params(initializer)
# Train model
model.fit(train_data, optimizer=opt, num_epoch=…)

Distributed training in Horovod:
import mxnet as mx
import horovod.mxnet as hvd
# Initialize Horovod
hvd.init()
# Set context to GPU by local rank
context = mx.gpu(hvd.local_rank())
# Build model
model = …
# Define hyperparameters
optimizer_params = …
# Create distributed optimizer
opt = mx.optimizer.create(…)
opt = hvd.DistributedOptimizer(opt)
# Initialize parameters
initializer = …
model.bind(data=…, label=…)
model.init_params(initializer)
# Fetch and broadcast parameters
hvd.broadcast_parameters(model.get_params())
# Train model
model.fit(train_data, optimizer=opt, num_epoch=…)
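For completeness, here is a self-contained, runnable sketch that puts the same pieces together with the Gluon API and synthetic data. Hyperparameters, the synthetic dataset, and the kvstore=None choice are illustrative; see the Horovod MXNet examples for the full MNIST scripts.

    import mxnet as mx
    import horovod.mxnet as hvd
    from mxnet import autograd, gluon

    hvd.init()
    ctx = mx.gpu(hvd.local_rank()) if mx.context.num_gpus() > 0 else mx.cpu()

    # Synthetic data; in practice each worker reads its own shard of the dataset.
    X = mx.nd.random.normal(shape=(512, 784))
    y = mx.nd.random.randint(0, 10, shape=(512,)).astype('float32')
    train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=64)

    net = gluon.nn.Dense(10, in_units=784)
    net.initialize(mx.init.Xavier(), ctx=ctx)
    params = net.collect_params()

    # Wrap the optimizer, scale the learning rate, and broadcast initial parameters from rank 0.
    opt = mx.optimizer.create('sgd', learning_rate=0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    hvd.broadcast_parameters(params, root_rank=0)
    trainer = gluon.Trainer(params, opt, kvstore=None)

    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    for epoch in range(3):
        for data, label in train_data:
            data, label = data.as_in_context(ctx), label.as_in_context(ctx)
            with autograd.record():
                loss = loss_fn(net(data), label)
            loss.backward()                 # gradients are allreduced inside the optimizer
            trainer.step(data.shape[0])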
Demo
• MXNet + Horovod MNIST example Jupyter Notebook
• MXNet + Horovod MNIST example full scripts: Gluon, Module
How to Get Started with Apache MXNet on AWS
• Get started with Apache MXNet on AWS: https://aws.amazon.com/mxnet/get-started/
• Using Apache MXNet with Amazon SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html
• Contact: mxnet-info@amazon.com
Using Apache MXNet with AWS ML Services
• Amazon SageMaker: https://aws.amazon.com/sagemaker/
• Amazon SageMaker Neo: https://aws.amazon.com/sagemaker/neo/
• Amazon Elastic Inference: https://aws.amazon.com/machine-learning/elastic-inference/
• Amazon Reinforcement Learning: https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-sagemaker-announces-support-for-reinforcement-learning/
• AWS IoT Greengrass ML Inference: https://aws.amazon.com/greengrass/ml/
• Dynamic Training with Apache MXNet on AWS: https://aws.amazon.com/about-aws/whats-new/2018/11/introducing-dynamic-training-with-apache-mxnet/
Thank you for coming!
Q&A
Editor's Notes
  1. Thanks for coming to our meetup today. My colleague Darren and I will present training deep neural network models on multiple GPU instances using Apache MXNet with Horovod.
  2. First, I will give an overview of distributed model training. Next, I will briefly introduce MXNet, a deep learning library, and Horovod, a framework for distributed training. After that, I will describe how we support running MXNet on Horovod and show you some performance results we achieved. Finally, we will give you a short demo of running MXNet with Horovod on multiple hosts.
  3. This is the typical flow of today's model training, especially for deep neural networks.
  4. As DNNs have become popular models for machine learning applications, model training has become a challenging task.
  5. There are two trends in today's model training tasks. First, GPUs have become the dominant hardware architecture for training due to their massively parallel computing capability for matrix operations. Second, more training jobs are running on multiple nodes rather than on a single node.
  6. Ring-allreduce utilizes the network optimally if the tensors are large enough, but does not work as efficiently or quickly if they are very small. Tensor fusion using a fusion buffer gives up to a 65% improvement; hierarchical allreduce can further boost performance by 10% to 30%.