SlideShare uma empresa Scribd logo
1 de 51
Baixar para ler offline
Resource-efficient Deep
Learning Model Selection
on Apache Spark
Yuhao Zhang and Supun Nakandala
ADALab, University of California, San Diego
About us
▪ PHD students from ADALab at UCSD, advised by
Prof. Arun Kumar
▪ Our research mission: democratize data science
▪ More:
Supun Nakandala
https://scnakandala.github.io/
Yuhao Zhang
https://yhzhang.info/
ADALab
https://adalabucsd.github.io/
Introduction
Artificial Neural Networks (ANNs) are
revolutionizing many domains - “Deep Learning”
Problem: training deep nets is Painful!
Batch size?
8, 16, 64, 256 ...
Model architecture?
3 layer CNN,5 layer
CNN, LSTM…
Learning rate?
0.1, 0.01, 0.001,
0.0001 ...
Regularization?
L2, L1, Dropout,
Batchnorm ...
4 4 4 4
256 Different configurations !
Model performance = f(model architecture, hyperparameters, ...)
→Trial and error
Need for speed → $$$
(Distributed DL)
→ Better utilization of resources
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Introduction - mini-batch SGD
Model
Updated Model
η ∇
Learning
rate
Avg. of
gradients
X1 X2 y
1.1 2.3 0
0.9 1.6 1
0.6 1.3 1
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
One
mini-batch
The most popular algorithm family for
training deep nets
Introduction - mini-batch SGD
X1 X2 y
1.1 2.3 0
0.9 1.6 1
0.6 1.3 1
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
... ... ...
One epoch
One mini-batch
Sequential
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Models (tasks)
Machines with replicated
datasets
Task Parallelism - Problem Setting
(Embarrassing) Task Parallelism
Con: wasted storage
(Embarrassing) Task Parallelism
Con: wasted network
Shared FS or
data repo
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Data Parallelism - Problem Setting
Models(tasks)
Partitioned data
High data scalability
Data Parallelism
Queue
Training on one mini-batch
or full partition
● Update only per epoch: bulk synchronous parallelism
(model averaging)
○ Bad convergence
● Update per mini-batch: sync parameter server
○ + Async updates: async parameter server
○ + Decentralized: MPI allreduce (Horovod)
○ High communication cost
Updates
Task Parallelism
+ high throughput
- low data scalability
- memory/storage wastage
Data Parallelism
+ high data scalability
- low throughput
- high communication cost
Model Hopper Parallelism (Cerebro)
+ high throughput
+ high data scalability
+ low communication cost
+ no memory/storage wastage
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Model Hopper Parallelism -
Problem Setting
Models (tasks)
Partitioned data
Model Hopper Parallelism
Training on full
local partitions
One
sub-epoch
Model Hopper Parallelism
Training on full
local partitions
Model hopping
& training
One
sub-epoch
Model Hopper Parallelism
Training on full
local partitions
Model hopping
& training
Model hopping
& training
One
sub-epoch
Model Hopper Parallelism
Training on full
local partitions
Model hopping
& training
Model hopping
& trainingOne
epoch
One
sub-epoch
Heterogeneous Tasks
Time
Redundant sync barrier!
Queue
Randomized Scheduler
Time
Cerebro -- Data System with MOP
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
MOP (Cerebro)
on Spark Spark Driver
Cerebro
Scheduler
Spark Worker
Cerebro
Worker
Spark Worker
Cerebro
Worker
Distributed File System (HDFS, NFS)
Implementation Details
▪ Spark DataFrames converted to partitioned Parquet
and locally cached in workers
▪ TensorFlow threads run training on local data
partitions
▪ Model Hopping implemented via shared file system
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Example: Grid Search on
Model Selection + Hyperparameter
Search
▪ Two model architecture: {VGG16, ResNet50}
▪ Two learning rate: {1e-4, 1e-6}
▪ Two batch size: {32, 256}
Initialization
from pyspark.sql import SparkSession
import cerebro
spark = SparkSession.builder.master(...) # initialize spark
spark_backend = cerebro.backend.SparkBackend(
spark_context=spark.sparkContext, num_workers=num_workers
) # initialize cerebro
data_store = cerebro.storage.HDFSStore('hdfs://...') # set the shared data
storage
Define the Models
params = {'model_arch':['vgg16', 'resnet50'], 'learning_rate':[1e-4, 1e-6], 'batch_size':[32, 256]}
def estimator_gen_fn(params):
'''A model factory that returns an estimator,
given the input hyper-parameters, as well as model architectures'''
if params['model_arch'] == 'resnet50':
model = ... # tf.keras model
elif params['model_arch'] == 'vgg16':
model = ... # tf.keras model
optimizer = tf.keras.optimizers.Adam(lr=params['learning_rate']) # choose optimizer
loss = ... # define loss
estimator = cerebro.keras.SparkEstimator(model=model,
optimizer=optimizer,
loss=loss,
batch_size=params['batch_size'])
return estimator
Run Grid Search
df = ... # read data in as Spark DataFrame
grid_search = cerebro.tune.GridSearch(spark_backend,
data_store,
estimator_gen_fn,
params,
epoch=5,
validation=0.2,
feature_columns=['features'],
label_columns=['labels'])
model = grid_search.fit(df)
Outline
1. Background
a. Mini-batch SGD
b. Task Parallelism
c. Data Parallelism
2. Model Hopper Parallelism (MOP)
3. MOP on Apache Spark
a. Implementation
b. APIs
c. Tests
Tests - Setups - Hardware
▪ 9-node cluster, 1 master + 8 workers
▪ On each nodes:
▪ Intel Xeon 10-core 2.20 GHz CPU x 2
▪ 192 GB RAM
▪ Nvidia P100 GPU x 1
Tests - Setups - Workload
▪ Model selection + hyperparameter tuning on
ImageNet
▪ Adam optimizer
▪ Grid search space:
▪ Model architecture: {ResNet50, VGG16}
▪ Learning rate: {1e-4, 1e-6}
▪ Batch size: {32, 256}
▪ L2 regularization: {1e-4, 1e-6}
Tests - Results - Learning Curves
Tests - Results - Learning Curves
Tests - Results - Per Epoch Runtimes
* Horovod uses GPU kernels for communication. Thus, it has high GPU utilization.
Tests - Results - Runtimes
* Horovod uses GPU kernels for communication. Thus, it has high GPU utilization.
System
Runtime (hrs/epoch)
GPU Utili. (%)
Storage
Footprint (GiB)
Train Validation
TF PS - Async 8.6 250
Horovod 92.1 250
Cerebro-Spark 2.63 0.57 42.4 250
TF Model Averaging 1.94 0.03 72.1 250
Celery 1.69 0.03 82.4 2000
Cerebro-Standalone 1.72 0.05 79.8 250
Tests - Cerebro-Spark Gantt Chart
▪ Only overhead: stragglers randomly caused by TF 2.1 Keras Model saving/loading.
Overheads range from 1% to 300%
Stragglers
Tests - Cerebro-Spark Gantt Chart
▪ One epoch of training
▪ (Almost) optimal!
Tests - Cerebro-Standalone Gantt Chart
Other Available Hyperparameter
Tuning Algorithms
▪ PBT
▪ HyperBand
▪ ASHA
▪ Hyperopt
More Features to Come
▪ Grouped learning
▪ API for transfer learning
▪ Model parallelism
References
▪ Cerebro project site
▪ https://adalabucsd.github.io/cerebro-system
▪ Github repo
▪ https://github.com/adalabucsd/cerebro-system
▪ Blog post
▪ https://adalabucsd.github.io/research-blog/cerebro.html
▪ Tech report
▪ https://adalabucsd.github.io/papers/TR_2020_Cerebro.pdf
Questions?
Thank you!
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Resource-Efficient Deep Learning Model Selection on Apache Spark

Mais conteúdo relacionado

Mais procurados

Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Databricks
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
 

Mais procurados (20)

Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
 
Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
 
Best Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaBest Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and Delta
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
 
Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0
 
Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLP
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 

Semelhante a Resource-Efficient Deep Learning Model Selection on Apache Spark

Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
inside-BigData.com
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
Data Con LA
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架
hdhappy001
 

Semelhante a Resource-Efficient Deep Learning Model Selection on Apache Spark (20)

Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 
Towards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingTowards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and Benchmarking
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of Techniques
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 

Mais de Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Resource-Efficient Deep Learning Model Selection on Apache Spark

  • 1.
  • 2. Resource-efficient Deep Learning Model Selection on Apache Spark Yuhao Zhang and Supun Nakandala ADALab, University of California, San Diego
  • 3. About us ▪ PHD students from ADALab at UCSD, advised by Prof. Arun Kumar ▪ Our research mission: democratize data science ▪ More: Supun Nakandala https://scnakandala.github.io/ Yuhao Zhang https://yhzhang.info/ ADALab https://adalabucsd.github.io/
  • 4. Introduction Artificial Neural Networks (ANNs) are revolutionizing many domains - “Deep Learning”
  • 5. Problem: training deep nets is Painful! Batch size? 8, 16, 64, 256 ... Model architecture? 3 layer CNN,5 layer CNN, LSTM… Learning rate? 0.1, 0.01, 0.001, 0.0001 ... Regularization? L2, L1, Dropout, Batchnorm ... 4 4 4 4 256 Different configurations ! Model performance = f(model architecture, hyperparameters, ...) →Trial and error Need for speed → $$$ (Distributed DL) → Better utilization of resources
  • 6. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 7. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 8. Introduction - mini-batch SGD Model Updated Model η ∇ Learning rate Avg. of gradients X1 X2 y 1.1 2.3 0 0.9 1.6 1 0.6 1.3 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... One mini-batch The most popular algorithm family for training deep nets
  • 9. Introduction - mini-batch SGD X1 X2 y 1.1 2.3 0 0.9 1.6 1 0.6 1.3 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... One epoch One mini-batch Sequential
  • 10. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 11. Models (tasks) Machines with replicated datasets Task Parallelism - Problem Setting
  • 13. (Embarrassing) Task Parallelism Con: wasted network Shared FS or data repo
  • 14. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 15. Data Parallelism - Problem Setting Models(tasks) Partitioned data High data scalability
  • 16. Data Parallelism Queue Training on one mini-batch or full partition ● Update only per epoch: bulk synchronous parallelism (model averaging) ○ Bad convergence ● Update per mini-batch: sync parameter server ○ + Async updates: async parameter server ○ + Decentralized: MPI allreduce (Horovod) ○ High communication cost Updates
  • 17. Task Parallelism + high throughput - low data scalability - memory/storage wastage Data Parallelism + high data scalability - low throughput - high communication cost Model Hopper Parallelism (Cerebro) + high throughput + high data scalability + low communication cost + no memory/storage wastage
  • 18. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 19. Model Hopper Parallelism - Problem Setting Models (tasks) Partitioned data
  • 20. Model Hopper Parallelism Training on full local partitions One sub-epoch
  • 21. Model Hopper Parallelism Training on full local partitions Model hopping & training One sub-epoch
  • 22. Model Hopper Parallelism Training on full local partitions Model hopping & training Model hopping & training One sub-epoch
  • 23. Model Hopper Parallelism Training on full local partitions Model hopping & training Model hopping & trainingOne epoch One sub-epoch
  • 26. Cerebro -- Data System with MOP
  • 27. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 28. MOP (Cerebro) on Spark Spark Driver Cerebro Scheduler Spark Worker Cerebro Worker Spark Worker Cerebro Worker Distributed File System (HDFS, NFS)
  • 29. Implementation Details ▪ Spark DataFrames converted to partitioned Parquet and locally cached in workers ▪ TensorFlow threads run training on local data partitions ▪ Model Hopping implemented via shared file system
  • 30. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 31. Example: Grid Search on Model Selection + Hyperparameter Search ▪ Two model architecture: {VGG16, ResNet50} ▪ Two learning rate: {1e-4, 1e-6} ▪ Two batch size: {32, 256}
  • 32. Initialization from pyspark.sql import SparkSession import cerebro spark = SparkSession.builder.master(...) # initialize spark spark_backend = cerebro.backend.SparkBackend( spark_context=spark.sparkContext, num_workers=num_workers ) # initialize cerebro data_store = cerebro.storage.HDFSStore('hdfs://...') # set the shared data storage
  • 33. Define the Models params = {'model_arch':['vgg16', 'resnet50'], 'learning_rate':[1e-4, 1e-6], 'batch_size':[32, 256]} def estimator_gen_fn(params): '''A model factory that returns an estimator, given the input hyper-parameters, as well as model architectures''' if params['model_arch'] == 'resnet50': model = ... # tf.keras model elif params['model_arch'] == 'vgg16': model = ... # tf.keras model optimizer = tf.keras.optimizers.Adam(lr=params['learning_rate']) # choose optimizer loss = ... # define loss estimator = cerebro.keras.SparkEstimator(model=model, optimizer=optimizer, loss=loss, batch_size=params['batch_size']) return estimator
  • 34. Run Grid Search df = ... # read data in as Spark DataFrame grid_search = cerebro.tune.GridSearch(spark_backend, data_store, estimator_gen_fn, params, epoch=5, validation=0.2, feature_columns=['features'], label_columns=['labels']) model = grid_search.fit(df)
  • 35. Outline 1. Background a. Mini-batch SGD b. Task Parallelism c. Data Parallelism 2. Model Hopper Parallelism (MOP) 3. MOP on Apache Spark a. Implementation b. APIs c. Tests
  • 36. Tests - Setups - Hardware ▪ 9-node cluster, 1 master + 8 workers ▪ On each nodes: ▪ Intel Xeon 10-core 2.20 GHz CPU x 2 ▪ 192 GB RAM ▪ Nvidia P100 GPU x 1
  • 37. Tests - Setups - Workload ▪ Model selection + hyperparameter tuning on ImageNet ▪ Adam optimizer ▪ Grid search space: ▪ Model architecture: {ResNet50, VGG16} ▪ Learning rate: {1e-4, 1e-6} ▪ Batch size: {32, 256} ▪ L2 regularization: {1e-4, 1e-6}
  • 38. Tests - Results - Learning Curves
  • 39. Tests - Results - Learning Curves
  • 40. Tests - Results - Per Epoch Runtimes * Horovod uses GPU kernels for communication. Thus, it has high GPU utilization.
  • 41. Tests - Results - Runtimes * Horovod uses GPU kernels for communication. Thus, it has high GPU utilization. System Runtime (hrs/epoch) GPU Utili. (%) Storage Footprint (GiB) Train Validation TF PS - Async 8.6 250 Horovod 92.1 250 Cerebro-Spark 2.63 0.57 42.4 250 TF Model Averaging 1.94 0.03 72.1 250 Celery 1.69 0.03 82.4 2000 Cerebro-Standalone 1.72 0.05 79.8 250
  • 42. Tests - Cerebro-Spark Gantt Chart ▪ Only overhead: stragglers randomly caused by TF 2.1 Keras Model saving/loading. Overheads range from 1% to 300% Stragglers
  • 43. Tests - Cerebro-Spark Gantt Chart ▪ One epoch of training ▪ (Almost) optimal!
  • 45. Other Available Hyperparameter Tuning Algorithms ▪ PBT ▪ HyperBand ▪ ASHA ▪ Hyperopt
  • 46. More Features to Come ▪ Grouped learning ▪ API for transfer learning ▪ Model parallelism
  • 47. References ▪ Cerebro project site ▪ https://adalabucsd.github.io/cerebro-system ▪ Github repo ▪ https://github.com/adalabucsd/cerebro-system ▪ Blog post ▪ https://adalabucsd.github.io/research-blog/cerebro.html ▪ Tech report ▪ https://adalabucsd.github.io/papers/TR_2020_Cerebro.pdf
  • 50. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.