This deck covers techniques for optimizing deep learning models, with a focus on hyperparameter optimization. It describes SigOpt's approach, in which software automates repeatable tasks such as training orchestration and model tuning so that experts can focus on data science. SigOpt combines techniques including Bayesian optimization, multitask optimization, and infrastructure orchestration to improve model performance while reducing tuning cost and time.
3. Our model management philosophy
● Software Automates Repeatable Tasks: tasks that do not benefit from domain expertise (e.g., training orchestration, model tuning)
● Experts Focus on Data Science: tasks that benefit from domain expertise (e.g., metric-function selection)
● DevOps Builds and Maintains Proprietary Infrastructure: tasks that depend on your particular infrastructure (e.g., model lifecycle management, model deployment)
5. Hyperparameter Optimization
[Diagram: the model training and tuning landscape. Hyperparameter search, also known as model tuning, spans methods such as Grid Search, Random Search, Bayesian Optimization, and Evolutionary Algorithms, and extends to Deep Learning Architecture Search.]
6. How we optimize models
● Iterative, automated optimization (see the sketch below)
● We never access your data or models
● Built specifically for scalable enterprise use cases
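To make the loop concrete, here is a minimal sketch of the suggest/observe cycle through SigOpt's Python client. The parameter names, bounds, budget, and the train_and_evaluate stub are illustrative placeholders, not SigOpt defaults.

```python
# Minimal sketch of the iterative suggest/observe loop. Only parameter
# assignments and metric values cross the API boundary -- SigOpt never
# sees your data or model.
from sigopt import Connection

def train_and_evaluate(assignments):
    # Placeholder: train your model with the suggested hyperparameters
    # and return the metric value to maximize.
    return 0.0

conn = Connection(client_token="YOUR_API_TOKEN")
experiment = conn.experiments().create(
    name="CNN tuning (sketch)",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="batch_size", type="int", bounds=dict(min=32, max=512)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    observation_budget=120,
)

while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    accuracy = train_and_evaluate(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="accuracy", value=accuracy)],
    )
    experiment = conn.experiments(experiment.id).fetch()
```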
12. Easily track, manage, and reproduce experiments
● Uncover model insights with parameter importance
● Monitor performance improvement as the experiment progresses via the API, the web, or your mobile phone (see the sketch below)
● Cycle through analysis, suggestions, history, and other experiment insights
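Progress can also be polled programmatically. A small sketch, assuming the Python client's fetch and best_assignments endpoints; the experiment ID is a placeholder.

```python
# Sketch: monitor an experiment's progress and current best result
# through the REST API (the same data backs the web and mobile views).
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")
experiment = conn.experiments("EXPERIMENT_ID").fetch()
print(f"{experiment.progress.observation_count} of "
      f"{experiment.observation_budget} observations completed")

# Best configuration(s) observed so far.
best = conn.experiments(experiment.id).best_assignments().fetch()
for entry in best.data:
    print(entry.assignments, [v.value for v in entry.values])
```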
13. Benefits: Better, cheaper, faster model development
● 90% Cost Savings: maximize utilization of compute
https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/
● 10x Faster Time to Tune: less expert time per model
https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
● Better Performance: no free lunch, but optimize any model
https://arxiv.org/pdf/1603.09441.pdf
14. Overview of features behind SigOpt

Experiment Insights
● Reproducibility
● Intuitive web dashboards
● Cross-team permissions and collaboration
● Advanced experiment visualizations
● Organizational experiment analysis
● Parameter importance analysis

Optimization Engine
● Multimetric optimization
● Continuous, categorical, or integer parameters (illustrated below)
● Constraints and failure regions
● Up to 10k observations, 100 parameters
● Multitask optimization and high parallelism
● Conditional parameters

Enterprise Platform
● Infrastructure agnostic
● REST API
● Model agnostic
● Black-box interface
● Doesn't touch data
● Libraries for Python, Java, R, and MATLAB
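As one illustration of the engine's parameter and metric types, here is how an experiment mixing continuous, integer, and categorical parameters with two metrics might be declared through the Python client; all names, bounds, and values are illustrative assumptions.

```python
# Sketch: continuous, integer, and categorical parameters plus a
# multimetric objective, declared via the Python client.
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")
experiment = conn.experiments().create(
    name="feature demo (sketch)",
    parameters=[
        dict(name="learning_rate", type="double",    # continuous
             bounds=dict(min=1e-5, max=1e-1)),
        dict(name="num_layers", type="int",          # integer
             bounds=dict(min=2, max=8)),
        dict(name="activation", type="categorical",  # categorical
             categorical_values=["relu", "tanh", "elu"]),
    ],
    metrics=[
        dict(name="accuracy", objective="maximize"),
        dict(name="latency_ms", objective="minimize"),
    ],
    observation_budget=150,
)
```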
20. Training ResNet-50 on ImageNet takes 12 hours
Tuning 12 parameters requires at least 120 distinct models (roughly 10 per parameter)
That equals 1,440 hours, or 60 days, of training time
23. Start with a simple idea:
We can use information about “partially trained” models
to more efficiently inform hyperparameter tuning
24. Building on prior research related to successive halving and Bayesian
techniques, Multitask samples lower-cost tasks to inexpensively learn
about the model and accelerate full Bayesian Optimization.
Swersky, Snoek, and Adams, “Multi-Task Bayesian Optimization”
http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf
25. “Cheap approximations promise a route to tractability, but bias and
noise complicate their use. An unknown bias arises whenever a
computational model incompletely models a real-world phenomenon,
and is pervasive in applications.”
Poloczek, Wang, and Frazier, “Multi-Information Source Optimization”
https://papers.nips.cc/paper/7016-multi-information-source-optimization.pdf
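To make the multitask idea from slide 24 concrete: cheaper tasks (for example, training for a tenth of the epochs) are sampled to learn about the model at low cost. A minimal sketch, assuming the Python client's tasks support; the task names, costs, epoch counts, and training stub are illustrative.

```python
# Sketch: a multitask experiment where partial trainings carry a
# fractional cost relative to a full training run.
from sigopt import Connection

def train_for_epochs(assignments, epochs):
    # Placeholder: train with the suggested hyperparameters for the
    # given number of epochs and return validation accuracy.
    return 0.0

conn = Connection(client_token="YOUR_API_TOKEN")
experiment = conn.experiments().create(
    name="multitask CNN tuning (sketch)",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    # Cost is relative to the full task: the optimizer spends most of its
    # budget on cheap runs and reserves full trainings for promising regions.
    tasks=[
        dict(name="full", cost=1.0),     # e.g., 90 epochs
        dict(name="partial", cost=0.1),  # e.g., 9 epochs
    ],
    observation_budget=120,
)

while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    epochs = 90 if suggestion.task.name == "full" else 9
    accuracy = train_for_epochs(suggestion.assignments, epochs)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="accuracy", value=accuracy)],
    )
    experiment = conn.experiments(experiment.id).fetch()
```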
36. Case: Putting multitask optimization to the test
Goal: Benchmark the performance of Multitask and Early Termination methods across a broad
variety of tasks and strategies to get a more complete sense of performance
Model: CNN
Dataset: CIFAR-10
Methods:
● Multitask Optimization
● Early Termination (Hyperband; see the successive-halving sketch below)
● Random Search
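For contrast with the multitask approach, here is a generic sketch of the successive-halving idea behind Hyperband-style early termination: train many configurations briefly, keep the better half, and give survivors more budget each round. This illustrates the general technique, not Hyperband's exact schedule or SigOpt's implementation.

```python
import random

def successive_halving(configs, budget_per_round, train_partial, rounds=3):
    """Repeatedly score all surviving configs on a growing budget and
    keep only the better-performing half each round."""
    survivors = list(configs)
    for r in range(rounds):
        budget = budget_per_round * (2 ** r)  # double the budget each round
        scored = sorted(((train_partial(c, budget), c) for c in survivors),
                        reverse=True)
        survivors = [c for _, c in scored[: max(1, len(scored) // 2)]]
    return survivors[0]

# Toy usage: configs are learning rates; the fake trainer rewards rates
# near 1e-2, with noise that shrinks as the training budget grows.
def fake_train(lr, budget):
    return -abs(lr - 1e-2) + random.gauss(0, 1e-3 / budget)

candidates = [10 ** random.uniform(-5, -1) for _ in range(16)]
print("best learning rate:", successive_halving(candidates, 1, fake_train))
```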
37. Multitask shows best performance
Benchmark: Which optimization technique most
efficiently tunes 10 hyperparameters under
compute constraints?
39. Complexity of deep learning DevOps
Requirements escalate from the basic case to the advanced case:
● Training one model, no optimization
● Multiple GPUs per model
● Concurrent model configuration evaluations (see the sketch below)
● Concurrent optimization experiments
● Multiple users
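Concurrent configuration evaluations, for example, amount to several workers each running their own suggest/observe loop against the same experiment; SigOpt keeps multiple suggestions open at once. A minimal sketch, assuming the Python client; the worker count, budget split, and training stub are illustrative.

```python
# Sketch: four workers evaluate suggested configurations in parallel
# against one shared experiment.
from concurrent.futures import ThreadPoolExecutor
from sigopt import Connection

def train_and_evaluate(assignments):
    # Placeholder training stub.
    return 0.0

def worker(experiment_id, n_observations):
    conn = Connection(client_token="YOUR_API_TOKEN")
    for _ in range(n_observations):
        suggestion = conn.experiments(experiment_id).suggestions().create()
        value = train_and_evaluate(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            values=[dict(name="accuracy", value=value)],
        )

# 4 workers x 30 observations = a 120-observation budget.
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(worker, "EXPERIMENT_ID", 30)
```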
41. How it works: Command-line orchestration
1. Spin up and share training clusters: $ sigopt create cluster
2. Schedule optimization experiments with your containerized model: $ sigopt run -f orchestrate.yml
3. Integrate with the optimization API
4. Monitor experiment and infrastructure
50. Training ResNet-50 on ImageNet takes 12 hours, but with multitask optimization the average cost drops to roughly 4 hours per model
Tuning 12 parameters still requires at least 120 distinct models
That now equals 480 hours, or 20 days, of training time, down from 1,440 hours, or 60 days
While training on 10 machines, wall-clock time is 2 days
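The arithmetic behind slides 20 and 50, as a quick back-of-the-envelope check:

```python
# Tuning-time arithmetic from slides 20 and 50.
models = 120          # at least 10 distinct models per tuned parameter
full_cost_h = 12      # hours per full ResNet-50 training on ImageNet
multitask_avg_h = 4   # average hours per model once most runs are cheap partial tasks
machines = 10

baseline_h = models * full_cost_h       # 1,440 hours
multitask_h = models * multitask_avg_h  # 480 hours
print(baseline_h / 24)                  # 60 days sequentially
print(multitask_h / 24)                 # 20 days sequentially
print(multitask_h / machines / 24)      # 2 days of wall-clock time on 10 machines
```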
52. Thank you
Try SigOpt Orchestrate: https://sigopt.com/orchestrate
Free access for Academics & Nonprofits: https://sigopt.com/edu
Solution-oriented program for the Enterprise: https://sigopt.com/pricing
Leading applied optimization research: https://sigopt.com/research
… and we're hiring! https://sigopt.com/careers