SlideShare uma empresa Scribd logo
1 de 66
Baixar para ler offline
Efficient Model Selection for
Deep Neural Networks on
Massively Parallel Processing Databases
Frank McQuillan
Feb 2020
Nikhil Kak Ekta Khanna Orhan Kislal Domino Valdano
Professor
Arun Kumar
Yuhao Zhang
Free and Open Source
Agenda
1. Training deep neural
networks in parallel
2. Model hopper parallelism
3. Model hopper on MPP
4. Examples
– Grid search
– AutoML (Hyperband)
Deep Learning
• Type of machine
learning inspired by
biology of the brain
Convolutional
neural
network
• Artificial neural
network have multiple
layers between input
and output
Problem: Training deep nets is painful
Batch size
8, 16, 64, 256
...
Model architecture
3 layer CNN, 5 layer
CNN, LSTM …
Learning rate
0.1, 0.01, 0.001,
0.0001 ...
Regularization
L2, L1, dropout,
batch norm ...
4 4 4 4
256 different configurations!
Need for speed → parallelism
Model accuracy = f(model architecture, hyperparameters, datasets)
Trial and error
1. Training Deep Neural Nets in Parallel
Source: https://medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1
Gradient Descent
Mini-batch SGD on a Single Machine
Model
Updated Model
η ∇
Learning
rate
Avg. of
gradients
x1 x2 y
1.1 2.3 0
0.9 1.6 1
0.6 1.3 1
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
One mini-
batch
Mini-batch SGD on a Single Machine
X1 X2 y
1.1 2.3 0
0.9 1.6 1
0.6 1.3 1
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
One iteration
(epoch)
One minibatch
Sequential
How to parallelize?
Models (tasks)
Replicate datasets on
each machine
Task Parallelism
Train multiple models at the same time
Task Parallelism
Con: wasted memory and storage
Task Parallelism
Con: wasted network bandwidth
Common FS or
data repo
Data Parallelism
Models (tasks)
Partition data
across machines
Train models one at a time
Data Parallelism
Queue
Training on one
mini-batch or full
partition
● Update model every iteration: bulk synchronous
parallelism (model averaging)
○ Convergence can be poor
Updates
● Update per mini-batch:
○ Sync parameter server
○ Async parameter server
○ Decentralized: MPI allreduce (Horovod)
○ High communication cost
Task Parallelism Data Parallelism
Model Hopper Parallelism
Let’s try to get the best of both worlds
2. Model Hopper Parallelism
Model Hopper Parallelism
Models (tasks)
Partition data
across machines
Model Hopper Parallelism
Training on
entire partition
Model Hopper Parallelism
Training on
entire partition
Model hopping
& training
Model Hopper Parallelism
Training on
entire partition
Model hopping
& training
Model hopping
& training
Model Hopper Parallelism
Training on
entire partition
Model hopping
& training
Model hopping
& trainingOne
iteration
One sub-
epoch
Model Hopper Parallelism
https://adalabucsd.github.io/papers/2019_Cerebro_DEEM.pdf
● 8-node GPU cluster
● Nvidia P100 GPUs
● Imagenet dataset
● 16 model
configurations using
VGG16 and
ResNet50
Model hopper
3. Model Hopper on Greenplum Database
Greenplum Database with GPUs
…
Master
Host
SQL
Segment Host Segment Host Segment Host Segment Host
GPU N
…
GPU 1
…
GPU N
…
GPU 1
GPUs only on
certain hosts for
cost reasons
… … … …
API -- load_model_selection_table()
SELECT load_model_selection_table(
'model_arch_table', -- model architecture table
'model_selection_table’, -- output table
ARRAY[...], -- model architecture ids
ARRAY[...], -- compile hyperparameters
ARRAY[...] -- fit hyperparameters
);
model_selection_table
API -- madlib_keras_fit_multiple_model()
SELECT madlib_keras_fit_multiple_model(
'data', -- data table
'trained_models', -- output table name
'model_selection_table', -- model selection table
100, -- number of iterations
True, -- use GPUs
... -- optional parameters
);
trained_models
4a. Grid Search Example Using Model Hopper
Test Setup
• 60k 32x32 color images in 10
classes, with 6k images per class
• 50k training images and 10k test
images
https://www.cs.toronto.edu/~kriz/cifar.html
• 4 hosts each with 32 vCPUs & 150 GB memory
& 4 NVIDIA Tesla P100 GPUs
• Greenplum 5 with 4 segments per host
• Apache MADlib 1.17
• Keras 2.2.4, TensorFlow 1.13.1
CIFAR-10
Approach to Training
Henri Cartier-Bresson
Area of promise
to fine tune
Model Configurations
Model architecture
Model id Type Weights
1 CNN 1.2M
2 CNN 552K
3 CNN 500K
Optimizer
Type Params
RMSprop lr=.0001
decay=1e-6
RMSprop lr=.001
decay=1e-6
Adam lr=.0001
Adam lr=.001
SGD r=.001
momentum=0.9
SGD r=.01
momentum=0.9
SGD r=.001
momentum=0.95
SGD r=.01
momentum=0.95
Batch size
Model id
32
64
128
256
3 4
8
96 configs
Too much information!
Run time: 2:47:57 on 16 workers/16 GPUS
Why not just use the most accurate?
82.3%
Because it is overfitting the data
Overfit
Overfit
Focus on models with low overfit
Models: 2 & 3
Optimizers: Adam & SGD
Batch sizes: 128 & 256
These charts only consider models with accuracy > 70%
Example with less overfitting
70.3%
Refine Model Configs
Model architecture
Model id Type Weights
1 CNN 1.2M
2 CNN 552K
3 CNN 500K
Optimizer
Type Params
RMSprop lr=.0001
decay=1e-6
RMSprop lr=.001
decay=1e-6
Adam lr=.0001
Adam lr=.001
SGD r=.001
momentum=0.9
SGD r=.01
momentum=0.9
SGD r=.001
momentum=0.95
SGD r=.01
momentum=0.95
Batch size
Model id
32
64
128
256
2 2
3
12 configs
(was 96)
Re-run the best 12 with more iterations
Run time: 1:53:06 on 16 workers/16 GPUS
Example good result
80.0%
sgd (lr=0.001,momentum=0.95)
batch size=256
Inference
Who am I ??? Model says:
dog 0.9532
deer 0.0258
bird 0.0093
4b. AutoML Example Using Model Hopper
Hyperband
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
https://arxiv.org/pdf/1603.06560.pdf
Initial models
(tasks)
train
ri iterations
...
train
ri iterations
Iteration 1
survivors
Iteration 2
survivors
Successive halving
Hyperband
Inputs:
● R: maximum amount of resource that can be allocated to a single
configuration
● 𝜂: controls the proportion of configurations discarded in each round of
successive halving
R= 81
𝜂 = 3
num model configsbracket num iterations
Model Configurations
Model architecture
Model id Type Weights
1 CNN 1.2M
2 CNN 552K
3 CNN 500K
Optimizer
Type Params
RMSprop lr=[.0001, 0.001]
decay=1e-6
Adam lr=[.0001, 0.001]
SGD lr=[.001, 0.01]
momentum=[0.9, 0.95]
Batch size
Model id
32
64
128
256
Hyperband
schedule
Note ranges specified,
not exact values
Again, a lot of information!
Run time: 3:21:04 on 16 workers/16 GPUS
Refine Model Configurations
Model architecture
Model id Type Weights
1 CNN 1.2M
2 CNN 552K
3 CNN 500K
Optimizer
Type Params
RMSprop lr=[.0001, 0.001]
decay=1e-6
Adam lr=[.0001, 0.0005]
SGD lr=[.001, 0.01]
lr=[.001, 0.005]
momentum=[0.9, 0.95]
Batch size
Model id
32
64
128
256
Adjust
Hyperband
schedule
Adjust ranges
Re-run the best with Hyperband again
Run time: 55:49 on 16 workers/16 GPUS
Example good result
78.2%
sgd (lr=0.00485,momentum=0.90919)
batch size=128
Summary
• Model hopper can efficiently train many deep nets at a
time on an MPP database
• Can add autoML methods on top of model hopper
• Areas of future work:
– Improve GPU efficiency
– Add more autoML methods
References
• Model hopper
– Supun Nakandala, Yuhao Zhang, and Arun Kumar, "Cerebro: Efficient and Reproducible Model Selection on
Deep Learning Systems," DEEM’30, June 30, 2019, Amsterdam, Netherlands
https://adalabucsd.github.io/papers/2019_Cerebro_DEEM.pdf
• Greenplum
– https://github.com/greenplum-db/gpdb
– https://greenplum.org/
• Apache MADlib
– https://github.com/apache/madlib
– http://madlib.apache.org/
• Jupyter notebooks
– https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
Thank you!
Backup Slides
Tuning Epochs on a Distributed System
Effect of cluster size
CIFAR-10, 2 CNNs with ~500k and ~1M weights
16 model configurations
5 passes per sub-epoch X 10 iterations =
50 epochs
5
10
(16 workers)(8 workers)(4 workers)
Effect of cluster size
1
50
CIFAR-10, 2 CNNs with ~500k and ~1M weights
16 model configurations
50 passes per sub-epoch X 1 iteration =
50 epochs
(16 workers)(8 workers)(4 workers)(1 worker)
Effect of passes per sub-epoch
CIFAR-10, 2 CNNs with ~500k and
~1M weights
16 model configurations
4 hosts X 4 workers per host = 16
workers
4 hosts X 4 GPUs per host = 16
GPUs
Other MOP Slides
Comparison of Parallel Training Approaches
Implementation on GPDB
1. Scheduling
2. Dispatch model to specific segment (worker)
3. Run training
4. No data motion
Implementation on GPDB -- Scheduling
One
iteration
One sub-
epoch
Round-robin
Implementation on GPDB -- Dispatching Models
DISTRIBUTED BY(__dist_key__)
DISTRIBUTED BY(__dist_key__)
Data table
1 2 3
1 2 3
Model table
JOIN
Implementation on GPDB -- Training
1 2 3
SELECT
FROM
GROUP BY
UDA
__dist_key__
Other Hyperband Slides
Hyperband Implementation on Greenplum
R= 81
𝜂 = 3
81 model configurations
1 iteration each
1 model configuration
81 iterations
Running 1 bracket at a time will result in idle machines
Hyperband Implementation on Greenplum
R= 81
𝜂 = 3
1+1+1+2+5 = 10 model configurations
81 iteration eachs
Run across brackets “on the diagonal” to keep machines busy

Mais conteúdo relacionado

Mais procurados

Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Akihiro Hayashi
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANWuhyun Rico Shin
 
Hadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm clusterHadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm clusterairbots
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Julien SIMON
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovSpark Summit
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
 
Keras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonKeras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonRafi Khan
 
Video Analytics At Scale: DL, CV, ML On Databricks Platform
Video Analytics At Scale: DL, CV, ML On Databricks PlatformVideo Analytics At Scale: DL, CV, ML On Databricks Platform
Video Analytics At Scale: DL, CV, ML On Databricks PlatformDatabricks
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowAltoros
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Josh Patterson
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodLin Yuan
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer BuildPetteriTeikariPhD
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Distributed Deep learning Training.
Distributed Deep learning Training.Distributed Deep learning Training.
Distributed Deep learning Training.Umang Sharma
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Abhishek Thakur
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA
 
How to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsHow to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsDatabricks
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learningAmgad Muhammad
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!DataWorks Summit
 

Mais procurados (20)

Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GAN
 
Hadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm clusterHadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm cluster
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Keras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonKeras: Deep Learning Library for Python
Keras: Deep Learning Library for Python
 
Video Analytics At Scale: DL, CV, ML On Databricks Platform
Video Analytics At Scale: DL, CV, ML On Databricks PlatformVideo Analytics At Scale: DL, CV, ML On Databricks Platform
Video Analytics At Scale: DL, CV, ML On Databricks Platform
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlow
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with Horovod
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Distributed Deep learning Training.
Distributed Deep learning Training.Distributed Deep learning Training.
Distributed Deep learning Training.
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
 
How to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsHow to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML models
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!
 

Semelhante a Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases

Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnJosef A. Habdank
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkSri Ambati
 
Optimize Your Machine Learning Workloads
Optimize Your Machine Learning WorkloadsOptimize Your Machine Learning Workloads
Optimize Your Machine Learning WorkloadsAmazon Web Services
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkDatabricks
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...Databricks
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 
Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Jameson Toole
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cachesqlserver.co.il
 
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARNMLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARNJosh Patterson
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalJunli Gu
 

Semelhante a Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases (20)

Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
Optimize Your Machine Learning Workloads
Optimize Your Machine Learning WorkloadsOptimize Your Machine Learning Workloads
Optimize Your Machine Learning Workloads
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache Spark
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for De...
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cache
 
C3 w1
C3 w1C3 w1
C3 w1
 
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARNMLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation final
 

Mais de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Mais de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Último

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases

  • 1. Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases Frank McQuillan Feb 2020
  • 2. Nikhil Kak Ekta Khanna Orhan Kislal Domino Valdano Professor Arun Kumar Yuhao Zhang
  • 3. Free and Open Source
  • 4. Agenda 1. Training deep neural networks in parallel 2. Model hopper parallelism 3. Model hopper on MPP 4. Examples – Grid search – AutoML (Hyperband)
  • 5. Deep Learning • Type of machine learning inspired by biology of the brain Convolutional neural network • Artificial neural network have multiple layers between input and output
  • 6. Problem: Training deep nets is painful Batch size 8, 16, 64, 256 ... Model architecture 3 layer CNN, 5 layer CNN, LSTM … Learning rate 0.1, 0.01, 0.001, 0.0001 ... Regularization L2, L1, dropout, batch norm ... 4 4 4 4 256 different configurations! Need for speed → parallelism Model accuracy = f(model architecture, hyperparameters, datasets) Trial and error
  • 7. 1. Training Deep Neural Nets in Parallel
  • 9. Mini-batch SGD on a Single Machine Model Updated Model η ∇ Learning rate Avg. of gradients x1 x2 y 1.1 2.3 0 0.9 1.6 1 0.6 1.3 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... One mini- batch
  • 10. Mini-batch SGD on a Single Machine X1 X2 y 1.1 2.3 0 0.9 1.6 1 0.6 1.3 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... One iteration (epoch) One minibatch Sequential
  • 12. Models (tasks) Replicate datasets on each machine Task Parallelism Train multiple models at the same time
  • 13. Task Parallelism Con: wasted memory and storage
  • 14. Task Parallelism Con: wasted network bandwidth Common FS or data repo
  • 15. Data Parallelism Models (tasks) Partition data across machines Train models one at a time
  • 16. Data Parallelism Queue Training on one mini-batch or full partition ● Update model every iteration: bulk synchronous parallelism (model averaging) ○ Convergence can be poor Updates ● Update per mini-batch: ○ Sync parameter server ○ Async parameter server ○ Decentralized: MPI allreduce (Horovod) ○ High communication cost
  • 17. Task Parallelism Data Parallelism Model Hopper Parallelism Let’s try to get the best of both worlds
  • 18. 2. Model Hopper Parallelism
  • 19. Model Hopper Parallelism Models (tasks) Partition data across machines
  • 20. Model Hopper Parallelism Training on entire partition
  • 21. Model Hopper Parallelism Training on entire partition Model hopping & training
  • 22. Model Hopper Parallelism Training on entire partition Model hopping & training Model hopping & training
  • 23. Model Hopper Parallelism Training on entire partition Model hopping & training Model hopping & trainingOne iteration One sub- epoch
  • 24. Model Hopper Parallelism https://adalabucsd.github.io/papers/2019_Cerebro_DEEM.pdf ● 8-node GPU cluster ● Nvidia P100 GPUs ● Imagenet dataset ● 16 model configurations using VGG16 and ResNet50 Model hopper
  • 25. 3. Model Hopper on Greenplum Database
  • 26. Greenplum Database with GPUs … Master Host SQL Segment Host Segment Host Segment Host Segment Host GPU N … GPU 1 … GPU N … GPU 1 GPUs only on certain hosts for cost reasons … … … …
  • 27. API -- load_model_selection_table() SELECT load_model_selection_table( 'model_arch_table', -- model architecture table 'model_selection_table’, -- output table ARRAY[...], -- model architecture ids ARRAY[...], -- compile hyperparameters ARRAY[...] -- fit hyperparameters ); model_selection_table
  • 28. API -- madlib_keras_fit_multiple_model() SELECT madlib_keras_fit_multiple_model( 'data', -- data table 'trained_models', -- output table name 'model_selection_table', -- model selection table 100, -- number of iterations True, -- use GPUs ... -- optional parameters ); trained_models
  • 29. 4a. Grid Search Example Using Model Hopper
  • 30. Test Setup • 60k 32x32 color images in 10 classes, with 6k images per class • 50k training images and 10k test images https://www.cs.toronto.edu/~kriz/cifar.html • 4 hosts each with 32 vCPUs & 150 GB memory & 4 NVIDIA Tesla P100 GPUs • Greenplum 5 with 4 segments per host • Apache MADlib 1.17 • Keras 2.2.4, TensorFlow 1.13.1 CIFAR-10
  • 31. Approach to Training Henri Cartier-Bresson Area of promise to fine tune
  • 32. Model Configurations Model architecture Model id Type Weights 1 CNN 1.2M 2 CNN 552K 3 CNN 500K Optimizer Type Params RMSprop lr=.0001 decay=1e-6 RMSprop lr=.001 decay=1e-6 Adam lr=.0001 Adam lr=.001 SGD r=.001 momentum=0.9 SGD r=.01 momentum=0.9 SGD r=.001 momentum=0.95 SGD r=.01 momentum=0.95 Batch size Model id 32 64 128 256 3 4 8 96 configs
  • 33. Too much information! Run time: 2:47:57 on 16 workers/16 GPUS
  • 34. Why not just use the most accurate? 82.3%
  • 35. Because it is overfitting the data Overfit Overfit
  • 36. Focus on models with low overfit Models: 2 & 3 Optimizers: Adam & SGD Batch sizes: 128 & 256 These charts only consider models with accuracy > 70%
  • 37. Example with less overfitting 70.3%
  • 38. Refine Model Configs Model architecture Model id Type Weights 1 CNN 1.2M 2 CNN 552K 3 CNN 500K Optimizer Type Params RMSprop lr=.0001 decay=1e-6 RMSprop lr=.001 decay=1e-6 Adam lr=.0001 Adam lr=.001 SGD r=.001 momentum=0.9 SGD r=.01 momentum=0.9 SGD r=.001 momentum=0.95 SGD r=.01 momentum=0.95 Batch size Model id 32 64 128 256 2 2 3 12 configs (was 96)
  • 39. Re-run the best 12 with more iterations Run time: 1:53:06 on 16 workers/16 GPUS
  • 40. Example good result 80.0% sgd (lr=0.001,momentum=0.95) batch size=256
  • 41. Inference Who am I ??? Model says: dog 0.9532 deer 0.0258 bird 0.0093
  • 42. 4b. AutoML Example Using Model Hopper
  • 43. Hyperband Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization https://arxiv.org/pdf/1603.06560.pdf Initial models (tasks) train ri iterations ... train ri iterations Iteration 1 survivors Iteration 2 survivors Successive halving
  • 44. Hyperband Inputs: ● R: maximum amount of resource that can be allocated to a single configuration ● 𝜂: controls the proportion of configurations discarded in each round of successive halving R= 81 𝜂 = 3 num model configsbracket num iterations
  • 45. Model Configurations Model architecture Model id Type Weights 1 CNN 1.2M 2 CNN 552K 3 CNN 500K Optimizer Type Params RMSprop lr=[.0001, 0.001] decay=1e-6 Adam lr=[.0001, 0.001] SGD lr=[.001, 0.01] momentum=[0.9, 0.95] Batch size Model id 32 64 128 256 Hyperband schedule Note ranges specified, not exact values
  • 46. Again, a lot of information! Run time: 3:21:04 on 16 workers/16 GPUS
  • 47. Refine Model Configurations Model architecture Model id Type Weights 1 CNN 1.2M 2 CNN 552K 3 CNN 500K Optimizer Type Params RMSprop lr=[.0001, 0.001] decay=1e-6 Adam lr=[.0001, 0.0005] SGD lr=[.001, 0.01] lr=[.001, 0.005] momentum=[0.9, 0.95] Batch size Model id 32 64 128 256 Adjust Hyperband schedule Adjust ranges
  • 48. Re-run the best with Hyperband again Run time: 55:49 on 16 workers/16 GPUS
  • 49. Example good result 78.2% sgd (lr=0.00485,momentum=0.90919) batch size=128
  • 50. Summary • Model hopper can efficiently train many deep nets at a time on an MPP database • Can add autoML methods on top of model hopper • Areas of future work: – Improve GPU efficiency – Add more autoML methods
  • 51. References • Model hopper – Supun Nakandala, Yuhao Zhang, and Arun Kumar, "Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems," DEEM’30, June 30, 2019, Amsterdam, Netherlands https://adalabucsd.github.io/papers/2019_Cerebro_DEEM.pdf • Greenplum – https://github.com/greenplum-db/gpdb – https://greenplum.org/ • Apache MADlib – https://github.com/apache/madlib – http://madlib.apache.org/ • Jupyter notebooks – https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
  • 54. Tuning Epochs on a Distributed System
  • 55. Effect of cluster size CIFAR-10, 2 CNNs with ~500k and ~1M weights 16 model configurations 5 passes per sub-epoch X 10 iterations = 50 epochs 5 10 (16 workers)(8 workers)(4 workers)
  • 56. Effect of cluster size 1 50 CIFAR-10, 2 CNNs with ~500k and ~1M weights 16 model configurations 50 passes per sub-epoch X 1 iteration = 50 epochs (16 workers)(8 workers)(4 workers)(1 worker)
  • 57. Effect of passes per sub-epoch CIFAR-10, 2 CNNs with ~500k and ~1M weights 16 model configurations 4 hosts X 4 workers per host = 16 workers 4 hosts X 4 GPUs per host = 16 GPUs
  • 59. Comparison of Parallel Training Approaches
  • 60. Implementation on GPDB 1. Scheduling 2. Dispatch model to specific segment (worker) 3. Run training 4. No data motion
  • 61. Implementation on GPDB -- Scheduling One iteration One sub- epoch Round-robin
  • 62. Implementation on GPDB -- Dispatching Models DISTRIBUTED BY(__dist_key__) DISTRIBUTED BY(__dist_key__) Data table 1 2 3 1 2 3 Model table JOIN
  • 63. Implementation on GPDB -- Training 1 2 3 SELECT FROM GROUP BY UDA __dist_key__
  • 65. Hyperband Implementation on Greenplum R= 81 𝜂 = 3 81 model configurations 1 iteration each 1 model configuration 81 iterations Running 1 bracket at a time will result in idle machines
  • 66. Hyperband Implementation on Greenplum R= 81 𝜂 = 3 1+1+1+2+5 = 10 model configurations 81 iteration eachs Run across brackets “on the diagonal” to keep machines busy