In this talk, I describe some of the newest hyperparameter optimization libraries in Python, discuss their pros and cons, and help you choose the right one for your use case.
2. Materials
● Slides: link on Twitter @NeptuneML
● Blog posts on Medium @jakub.czakon:
○ Part 0: intro
○ Part 1: Scikit-Optimize
● Experiments on Neptune: experiments link
19. API: {fun}_minimize
● There are (hyper)hyperparameters
● Acquisition function:
○ ‘EI’ (expected improvement) and ‘PI’ (probability of improvement over the current best)
○ ‘LCB’ (lower confidence bound): trades off the predicted mean against uncertainty
● Exploration vs exploitation
○ xi for ‘EI’ and ‘PI’: low xi favors exploitation, high xi favors exploration
○ kappa for ‘LCB’: low kappa favors exploitation, high kappa favors exploration
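To make this concrete, here is a minimal sketch (mine, not from the deck; the toy objective and values are purely illustrative) of passing these knobs to one of the {fun}_minimize functions:

from skopt import gp_minimize

def objective(params):
    x, = params
    return (x - 2.0) ** 2  # toy function to minimize

result = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],  # search space for x
    acq_func="EI",             # one of 'EI', 'PI', 'LCB'
    xi=0.01,                   # used by 'EI'/'PI'; kappa would be used by 'LCB'
    n_calls=30,
    random_state=42,
)
print(result.x, result.fun)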
28. Speed & Parallelization
● Runs sequentially; you cannot distribute it across many machines
● You can parallelize the base estimator at every step with n_jobs (see the sketch below)
● If you have just one machine, it is fast
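As a sketch of the n_jobs point (the toy objective is again mine), the tree-based minimizers fit their base estimator on multiple cores while candidate points are still proposed one at a time:

from skopt import forest_minimize

def objective(params):
    x, = params
    return (x - 2.0) ** 2  # toy function to minimize

result = forest_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],
    n_calls=30,
    n_jobs=-1,  # parallelize the base estimator's fit over all cores
)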
29. Experimental Results
● An example training script is available here.
● I grid-searched over the acquisition functions, xi, and kappa.
● All the experiments are available in Neptune under the tag skopt
31. Conclusions: good
● Easy-to-use API and great documentation
● A lot of optimizers and tweaking options
● Awesome visualizations
● Solid gains over random search
● Fast if you are running sequentially on one machine
38. API: {fun}_minimize
● Selects configurations with the Tree-structured Parzen Estimator (TPE), like hyperopt (paper)
● Allows pruning (see the sketch below)
● Handles exceptions in the objective
● Doesn’t have callbacks
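Here is a minimal sketch of that API (my own illustration, not code from the slides; the learning-rate space and fake training loop are stand-ins for a real model):

import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    score = 0.0
    for step in range(10):
        score += lr                 # stand-in for one epoch of train/eval
        trial.report(score, step)   # report intermediate values...
        if trial.should_prune():    # ...so unpromising trials get pruned
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction="maximize")
# catch= lets the study survive exceptions raised in the objective
study.optimize(objective, n_trials=20, catch=(ValueError,))
print(study.best_params)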
47. Experimental Results
● An example training script is available here.
● No (hyper)hyperparameters to search over
● All the experiments are available in Neptune under the tag optuna
49. Conclusions: good
● Easy-to-use API
● Great documentation
● Can be easily distributed over a cluster of machines
● Has pruning
● Search space supports nesting (see the sketch below)
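For the nesting point above, a hedged illustration (again a toy example of mine) of a conditional search space, where some parameters exist only inside one branch of a categorical choice:

import optuna

def objective(trial):
    clf = trial.suggest_categorical("classifier", ["svm", "random_forest"])
    if clf == "svm":
        c = trial.suggest_float("svm_c", 1e-3, 1e3, log=True)  # svm-only parameter
        return 1.0 / (1.0 + abs(c - 1.0))                      # toy score
    depth = trial.suggest_int("rf_max_depth", 2, 32)           # rf-only parameter
    return 1.0 / (1.0 + abs(depth - 8))                        # toy score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)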
50. Conclusions: bad
● Only the TPE optimizer is available
● No visualizations
● *No gains over random search (with a 100-iteration budget)
51. Optuna is hyperopt with:
● a better API
● waaaay better documentation
● pruning
● exception handling
● simpler parallelization
55. HpBandSter
● Hyperband on steroids
● It has state-of-the-art algorithms:
○ Hyperband paper
○ BOHB (Bayesian Optimization + Hyperband) paper
● Distributed-computing-first API
58. API: server
● Workers communicate with the server to:
○ get the next parameter configuration
○ send back results
● You have to set this up even for the most basic problems (weird); see the sketch below
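A compressed sketch of that setup, following the pattern of HpBandSter's own examples (the worker class, toy loss, and budget values here are mine):

import ConfigSpace as CS
from hpbandster.core.nameserver import NameServer
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

class MyWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # budget is the resource (e.g. number of epochs) Hyperband assigns
        loss = (config["x"] - 2.0) ** 2 / budget
        return {"loss": loss, "info": {}}

config_space = CS.ConfigurationSpace()
config_space.add_hyperparameter(CS.UniformFloatHyperparameter("x", -5.0, 5.0))

# even this toy problem needs a nameserver and a worker
ns = NameServer(run_id="toy", host="127.0.0.1", port=None)
ns.start()
worker = MyWorker(run_id="toy", nameserver="127.0.0.1")
worker.run(background=True)

bohb = BOHB(configspace=config_space, run_id="toy",
            nameserver="127.0.0.1", min_budget=1, max_budget=9)
result = bohb.run(n_iterations=5)
bohb.shutdown(shutdown_workers=True)
ns.shutdown()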
75. Experimental Results
● An example training script is available here.
● A lot of (hyper)hyperparameters to search over
● All the experiments are available in Neptune under the tag hpbandster
77. Conclusions: good
● State-of-the-art algorithm
● Can be distributed over a cluster of machines
● Visualizations
● Search space (kind of) supports nesting
80. Results (mostly subjective)

|                         | Scikit-Optimize                  | Optuna                       | HpBandSter        | Hyperopt     |
| API / ease of use       | Great                            | Great                        | Difficult         | Good         |
| Documentation           | Great                            | Great                        | Ok(ish)           | Bad          |
| Speed / parallelization | Fast if sequential; no distribution | Great                     | Good              | Ok           |
| Visualizations          | Amazing                          | None                         | Very lib-specific | Some         |
| *Experimental results   | 0.8566 (100)                     | 0.8419 (100), 0.8597 (10000) | 0.8629 (100)      | 0.8420 (100) |

(*) Numbers in parentheses are iteration budgets.
81. Dream library
● Viz + callbacks: Scikit-Optimize
● API + docs + pruning + parallelization: Optuna
● Optimizers: HpBandSter
82. Dream library
Conversions between results objects are in neptune-contrib:

import neptunecontrib.hpo.utils as hpo_utils
results = hpo_utils.optuna2skopt(study)
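One illustration (mine) of why this conversion is useful: Scikit-Optimize's visualizations can then be reused on results that came from another library.

from skopt.plots import plot_convergence
plot_convergence(results)  # skopt plot on the converted Optuna study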
83. Recommendations
● If you don’t have a lot of resources, use Scikit-Optimize
● If you want state-of-the-art results and don’t care about the API/docs, use HpBandSter
● If you want good docs, a good API, and easy parallelization, use Optuna
84. Data science work sharing hub.
Track | Organize | Collaborate

Jakub Czakon
jakub@neptune.ml
@NeptuneML
https://medium.com/neptune-ml