Hyperparameter optimization landscape Berlin ML Group meetup 8/2019

Hyperparameter Optimization
(landscape) in Python
kuba@neptune.ml
@NeptuneML
https://medium.com/neptune-ml
Jakub Czakon

● Intro
● Methods
● Libraries + Evaluation Criteria
○ Scikit-Optimize
○ Optuna + Hyperopt
○ HpBandSter
● Results and Recommendations
Agenda

Model: learning rate, depth, feature fraction
data -> Model -> score
Intro

Model: learning rate, depth, feature fraction
Feature Engineering: bin_nr, groupby columns, lagging
Data Cleaning: imputation method, scaling method
Post-processing: thresholds
objective(params, data=data) -> score
Intro

● Grid search
● Random search
● Guided search
● Grad student search (still best)
Methods
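To make the first two bullets concrete, here is a minimal sketch of grid search and random search. The toy `objective`, the parameter ranges and the helper names are all assumptions made for illustration; in practice `objective` would train and evaluate a model.

```python
import itertools
import random


def objective(params):
    """Toy stand-in for an expensive train-and-evaluate call (higher is better)."""
    lr, depth = params["learning_rate"], params["depth"]
    return -(lr - 0.1) ** 2 - 0.001 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best one."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=objective)


def random_search(n_trials, seed=0):
    """Sample configurations independently at random and keep the best one."""
    rng = random.Random(seed)
    candidates = [{"learning_rate": rng.uniform(0.01, 0.5),
                   "depth": rng.randint(1, 30)}
                  for _ in range(n_trials)]
    return max(candidates, key=objective)


best_grid = grid_search({"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 6, 10]})
best_random = random_search(n_trials=9)
```

Both calls spend the same budget of 9 evaluations. Grid search only ever tries 3 distinct values per parameter, while random search tries 9.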

● Better configuration proposal
○ Objective function is estimated with surrogate models
○ Evolutionary methods
○ ...
● Faster objective function calculation
○ Bandit methods
○ Pruning
○ Estimating a score from the learning curve of NN
○ ...
Methods

Methods: surrogate models
objective(params) -> score (expensive)
surrogate(params) -> est_score (cheap)
Explore cheap: surrogate(params1), surrogate(params2), ..., surrogate(params1000) -> est_score1, est_score2, ..., est_score1000
Try expensive: objective(params2) = score2
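The "explore cheap, try expensive" loop can be sketched in a few lines. This is only an illustration under loud assumptions: the one-dimensional toy `objective` and the nearest-neighbour `surrogate` are made up here, and real libraries use TPE, Gaussian processes or random forests plus an acquisition function.

```python
import random


def objective(x):
    """Toy expensive objective to minimize (stands in for model training)."""
    return (x - 0.3) ** 2


def surrogate(x, history, k=3):
    """Cheap estimate: mean score of the k nearest already-evaluated points."""
    nearest = sorted(history, key=lambda item: abs(item[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def surrogate_search(n_calls=20, n_random_starts=5, n_candidates=1000, seed=0):
    rng = random.Random(seed)
    # Start with a few random evaluations of the expensive objective.
    history = [(x, objective(x))
               for x in (rng.random() for _ in range(n_random_starts))]
    for _ in range(n_calls - n_random_starts):
        # Explore cheap: score many candidates with the surrogate only.
        candidates = [rng.random() for _ in range(n_candidates)]
        best_guess = min(candidates, key=lambda x: surrogate(x, history))
        # Try expensive: evaluate just the most promising candidate.
        history.append((best_guess, objective(best_guess)))
    return min(history, key=lambda item: item[1])


best_x, best_score = surrogate_search()
```

This sketch always picks the candidate with the lowest estimate, so it only exploits. Acquisition functions exist to balance that against exploration.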

Methods: surrogate models
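objective(params) -> score
surrogate(params) -> est_score
est_score1, est_score2, ..., est_score1000
objective(params2) = score2
Surrogate models: TPE (Tree-structured Parzen Estimator), GP (Gaussian process), RF (random forest)
Acquisition functions: EI, PI, LCB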

Methods: bandit methods
objective(params, data=data) -> score
objective(params, budget=full) -> score
objective(params, budget=low) -> score
Estimate the score with lower fidelity runs

● Budget options:
○ Dataset size
○ Number of epochs
○ Time
○ Number of features
○ Number of CV-folds
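As an illustration of the "dataset size" option, the sketch below treats the budget as the fraction of rows used for evaluation. Everything in it is hypothetical: the threshold "model", the synthetic `rows` and the helper names only show the shape of an `objective(params, budget=...)` function.

```python
import random


def subsample(rows, budget, seed=0):
    """Return a random fraction `budget` (0 < budget <= 1) of the rows."""
    n_rows = max(1, int(len(rows) * budget))
    return random.Random(seed).sample(rows, n_rows)


def objective(params, rows, budget=1.0):
    """Low-fidelity objective: accuracy of a toy threshold rule on a subsample."""
    data = subsample(rows, budget)
    threshold = params["threshold"]
    correct = sum((x > threshold) == bool(label) for x, label in data)
    return correct / len(data)


# Synthetic data: the label is 1 exactly when the feature is at least 0.5.
rows = [(i / 100, int(i >= 50)) for i in range(100)]

full_score = objective({"threshold": 0.495}, rows, budget=1.0)  # all 100 rows
low_score = objective({"threshold": 0.495}, rows, budget=0.1)   # 10 rows only
```

The low-budget call costs a tenth of the evaluations and, for this toy problem, ranks the configuration the same way as the full run.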

● Successive halving: fixed resource, fixed budget, fixed number of runs
● Hyperband: random resource, grid search over the number of runs, within a fixed budget
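A minimal sketch of successive halving, assuming a common formulation: start many configurations on a small budget, keep the best 1/eta of them, and multiply the budget by eta each round. The `evaluate` function and the `quality` field are toy assumptions.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round while growing the budget."""
    survivors = list(configs)
    budget = min_budget
    rounds = []  # (number of configs evaluated, budget per config)
    while len(survivors) > 1:
        rounds.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget),
                        reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], rounds


def evaluate(config, budget):
    """Toy score (higher is better) that approaches config['quality'] as budget grows."""
    return config["quality"] * budget / (budget + 1)


configs = [{"quality": q / 10} for q in range(1, 10)]
winner, rounds = successive_halving(configs, evaluate)
```

With 9 configurations and eta=3, the schedule is 9 runs at budget 1 followed by 3 runs at budget 3.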

Methods: pruning
● Prune/abort runs that show little hope before they finish
● Isn’t it called early stopping?
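One way to see the difference from early stopping: early stopping usually watches a single run's own validation curve, while pruning compares a run against other runs. The sketch below uses a median rule as an example. The function name, the `warmup_steps` parameter and the rule itself are assumptions for illustration, not a description of what any particular library does by default.

```python
from statistics import median


def should_prune(partial_scores, finished_curves, warmup_steps=2):
    """Median rule (higher is better): prune a run whose latest score is below
    the median of what finished runs had reached at the same step."""
    step = len(partial_scores) - 1
    if step < warmup_steps or not finished_curves:
        return False
    reference = [curve[step] for curve in finished_curves if len(curve) > step]
    if not reference:
        return False
    return partial_scores[-1] < median(reference)


finished = [
    [0.50, 0.60, 0.70, 0.80],
    [0.40, 0.60, 0.75, 0.85],
    [0.50, 0.55, 0.60, 0.65],
]
weak_run = should_prune([0.30, 0.40, 0.50], finished)    # below the median 0.70
strong_run = should_prune([0.50, 0.70, 0.80], finished)  # above the median
```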

● Scikit-Optimize
● Optuna
● Hyperopt (sort of)
● HpBandSter
● Ray.tune (future work)
● … and many more
Libraries

● Algorithm
● API / ease of use
● Documentation
● Speed / Parallelization
● Visualization suite
● Experimental results
Evaluation Criteria

Algorithm
● Objective function estimated with surrogate models
○ Random Forests
○ Gradient Boosted Trees
○ Gaussian process
● Next run params selected via acquisition function
○ Expected Improvement
○ Probability of Improvement
○ Lower Confidence Bound
● No objective func calculation speedup mechanism

API
search space
+
objective
+
{fun}_minimize

API: search space
● Basic options:
○ skopt.space.Real
○ skopt.space.Integer
○ skopt.space.Categorical
● No support for nested search spaces

API: search space
SPACE = [
    skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'),
    skopt.space.Integer(1, 30, name='max_depth'),
    skopt.space.Integer(2, 100, name='num_leaves'),
    skopt.space.Integer(10, 1000, name='min_data_in_leaf'),
    skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'),
    skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform'),
]

API: objective
● Define a function to minimize!
● Decorate if you want to keep parameter names

API: objective
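# Plain version: takes keyword arguments.
# skopt minimizes, so a score to maximize is negated.
def objective(**params):
    return -1.0 * train_evaluate(X, y, **params)

# Decorated version: skopt passes a list of values,
# use_named_args maps them to the parameter names from SPACE.
@skopt.utils.use_named_args(SPACE)
def objective(**params):
    return -1.0 * train_evaluate(X, y, **params)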

API: {fun}_minimize
● A few optimizers to choose from
○ skopt.forest_minimize
○ skopt.gbrt_minimize
○ skopt.gp_minimize
● Accepts callbacks

API: {fun}_minimize
results = skopt.forest_minimize(objective, SPACE,
                                n_calls=100,
                                n_random_starts=10,
                                base_estimator='ET',
                                acq_func='LCB',
                                xi=0.02,
                                kappa=1.96)

API: {fun}_minimize
def monitor(res):
    neptune.send_metric('run_score', res.func_vals[-1])

results = skopt.forest_minimize(..., callback=[monitor])

API: {fun}_minimize
● There are (hyper)hyperparameters
● Acquisition function:
○ ‘EI’, ‘PI’: expected improvement, probability of improvement (maximized)
○ ‘LCB’: lower confidence bound, predicted mean minus kappa times the predicted standard deviation (minimized)
● Exploration vs exploitation
○ xi for ‘EI’ and ‘PI’: low xi favors exploitation, high xi favors exploration
○ kappa for ‘LCB’: low kappa favors exploitation, high kappa favors exploration
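The roles of xi and kappa are easier to see in the formulas. Below is a sketch of the three acquisition functions for a minimization problem, written with the standard library only. The function names and the two example candidates are assumptions, and the surrogate's predicted mean and standard deviation are simply given as numbers.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_confidence_bound(mean, std, kappa=1.96):
    """LCB (pick the minimum): optimistic estimate of the objective."""
    return mean - kappa * std


def probability_of_improvement(mean, std, best, xi=0.01):
    """PI (pick the maximum): chance of beating the best score by at least xi."""
    return _norm_cdf((best - xi - mean) / std)


def expected_improvement(mean, std, best, xi=0.01):
    """EI (pick the maximum): expected amount by which best - xi is beaten."""
    improvement = best - xi - mean
    z = improvement / std
    return improvement * _norm_cdf(z) + std * _norm_pdf(z)


best_so_far = 0.52
safe = {"mean": 0.50, "std": 0.01}       # well-explored region
uncertain = {"mean": 0.55, "std": 0.20}  # poorly explored region

# Low kappa prefers the safe point, high kappa prefers the uncertain one.
low_kappa_prefers_safe = (lower_confidence_bound(**safe, kappa=0.1)
                          < lower_confidence_bound(**uncertain, kappa=0.1))
high_kappa_prefers_uncertain = (lower_confidence_bound(**uncertain, kappa=1.96)
                                < lower_confidence_bound(**safe, kappa=1.96))
```

In this formulation, raising xi has the same effect as raising kappa: the well-explored point can no longer promise enough improvement, so the uncertain one wins.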

Documentation
● Amazing!
● Functions have docstrings.
● A lot of examples.

Visualizations
● Options:
○ skopt.plots.plot_convergence - score improvement
○ skopt.plots.plot_evaluations - space search evolution
○ skopt.plots.plot_objective - sensitivity
● Beautiful and very useful.

Visualizations: plot_convergence
skopt.plots.plot_convergence(results)

Visualizations: plot_convergence
skopt.plots.plot_convergence(results_list)

Visualizations: plot_evaluations
skopt.plots.plot_evaluations(results)

Visualizations: plot_objective
skopt.plots.plot_objective(results)

Speed & Parallelization
● Runs sequentially and you cannot distribute it across many machines
● You can parallelize base estimator at every run with n_jobs
● If you have just 1 machine it is fast

Conclusions: good
● Easy to use API and great documentation
● A lot of optimizers and tweaking options
● Awesome visualizations
● Solid gains over the random search
● Fast if you are running sequentially on 1 machine
● Active project support

Conclusions: bad
● Search space doesn’t support nesting
● No support for distributed computing

Algorithm
● Objective function estimated with the Tree-structured Parzen Estimator (TPE)
● Next run params selected via Expected Improvement
● Objective func calculation speedup via run pruning and successive halving (optionally)

API
search space & objective
+
{fun}_minimize

API: search space & objective
def objective(trial):
    params = OrderedDict([
        ('learning_rate', trial.suggest_loguniform('learning_rate', 0.01, 0.5)),
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100)),
        ('min_data_in_leaf', trial.suggest_int('min_data_in_leaf', 10, 1000)),
        ('feature_fraction', trial.suggest_uniform('feature_fraction', 0.1, 1.0)),
        ('subsample', trial.suggest_uniform('subsample', 0.1, 1.0))])
    score = -1.0 * train_evaluate(X, y, params)
    return score

● Basic options:
○ suggest_categorical
○ suggest_int , suggest_discrete_uniform
○ suggest_uniform , suggest_loguniform
● Nested search spaces
● Defined at run time (define-by-run, PyTorch-like)

classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
if classifier_name == 'SVC':
    svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
    classifier_obj = sklearn.svm.SVC(C=svc_c)
else:
    rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
    classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)
…
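Why defining the space at run time gives nesting for free can be shown with a toy stand-in. `ToyTrial` below is hypothetical and only mimics the idea with random sampling: parameters exist only if the code path that asks for them is executed.

```python
import math
import random


class ToyTrial:
    """Hypothetical define-by-run trial: records whatever the code asks for."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self._rng.choice(choices)
        return self.params[name]

    def suggest_loguniform(self, name, low, high):
        log_value = self._rng.uniform(math.log(low), math.log(high))
        self.params[name] = math.exp(log_value)
        return self.params[name]

    def suggest_int(self, name, low, high):
        self.params[name] = self._rng.randint(low, high)
        return self.params[name]


def sample_config(trial):
    """The search space is whatever this function happens to request."""
    classifier = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier == "SVC":
        trial.suggest_loguniform("svc_c", 1e-10, 1e10)
    else:
        trial.suggest_int("rf_max_depth", 2, 32)
    return trial.params


configs = [sample_config(ToyTrial(seed)) for seed in range(20)]
```

Each sampled configuration contains `svc_c` or `rf_max_depth`, never both, without declaring any conditional structure up front.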

API: {fun}_minimize
● Allows pruning
● Handles exceptions in objective
● Handles callbacks

study = optuna.create_study()
study.optimize(objective, n_trials=100)
results = study.trials_dataframe()
API: {fun}_minimize

API: {fun}_minimize: pruning
from optuna.integration import LightGBMPruningCallback

def objective(trial):
    params = OrderedDict([
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100))])
    pruning_callback = LightGBMPruningCallback(trial, 'auc')
    score = -1.0 * train_evaluate_with_pruning(X, y, params, pruning_callback)
    return score

def train_evaluate_with_pruning(X, y, params, callback):
    ...
    # Pass the callback argument through to LightGBM
    model = lgb.train(params, train_data, ... , callbacks=[callback])
    return model.best_score['valid']['auc']

API: {fun}_minimize: callbacks
def report_neptune(study, trial):
    neptune.send_metric('value', trial.value)
    neptune.send_metric('best_value', study.best_value)

study = optuna.create_study()
study.optimize(objective, n_trials=100, callbacks=[report_neptune])
Available in bleeding edge version from source*

Documentation
● Solid read-the-docs project,
● Docstrings, docstrings everywhere,
● A lot of examples.

Visualizations
● Options:
○ optuna.visualization.plot_intermediate_values
○ optuna.visualization.plot_optimization_history
● Basic monitoring
● Available in bleeding edge version from source*

● Can be easily distributed across one or many machines
● Has pruning to speed up unpromising runs

Speed & Parallelization: one
study.optimize(objective, n_trials=100, n_jobs=5)

Speed & Parallelization: many
optuna_search.py
…
study = optuna.Study(study_name='distributed-search', storage='sqlite:///example.db')
study.optimize(objective, n_trials=100)
...
terminal 1
$ optuna create-study --study-name "distributed-search" --storage "sqlite:///example.db"
terminal 2
$ python optuna_search.py
terminal 3
$ python optuna_search.py

Conclusions: good
● Easy to use API
● Great documentation
● Can be easily distributed over a cluster of machines
● Has pruning
● Has callbacks
● Search space supports nesting
● Active project support

Conclusions: bad
● Only TPE optimizer available
● Only some visualizations
● *No gains over the random search (with 100 iterations budget)

Optuna is hyperopt with:
● better API
● waaaay better documentation
● pruning (and halving available)
● exception handling
● simpler parallelization
● active project support

Should I swap hyperopt with optuna?

HpBandSter
https://www.automl.org/

● HyperBand on Steroids
● It has state-of-the-art algorithms
○ Hyperband
○ BOHB (Bayesian Optimization + Hyperband)
● Distributed-computing-first API
HpBandSter

Algorithm
● Objective function estimated with TPE
● Next run params selected via Expected Improvement
● Objective func calculation speedup via bandit with random budgets (Hyperband)

API
server
+
worker
+
optimizer

API: server
● Workers communicate with server to:
○ get next parameter configuration
○ send results
● You have to define it even for the most basic setups/problems (weird)

API: server
import hpbandster.core.nameserver as hpns
NS = hpns.NameServer(run_id=RUN_ID, host=HOST, port=PORT, working_directory=WORKING_DIRECTORY)
ns_host, ns_port = NS.start()

API: worker: objective
from hpbandster.core.worker import Worker

class TrainEvalWorker(Worker):
    ...
    def compute(self, config, budget, working_directory, *args, **kwargs):
        loss = -1.0 * train_evaluate(self.X, self.y, budget, config)
        return {'loss': loss,
                'info': {'auxiliary_stuff': 'worked'}}

API: worker: search space
● Basic options:
○ CSH.{Categorical/Ordinal}Hyperparameter
○ CSH.{Uniform/Normal}IntegerHyperparameter
○ CSH.{Uniform/Normal}FloatHyperparameter
● Nested search spaces with ifs

API: worker: search space
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

class TrainEvalWorker(Worker):
    ...
    @staticmethod
    def get_configspace():
        cs = CS.ConfigurationSpace()
        learning_rate = CSH.UniformFloatHyperparameter('learning_rate',
            lower=0.01, upper=0.5, default_value=0.01, log=True)
        subsample = CSH.UniformFloatHyperparameter('subsample',
            lower=0.1, upper=1.0, default_value=0.5, log=False)
        cs.add_hyperparameters([learning_rate, subsample])
        return cs

API: worker: connecting to server
worker = TrainEvalWorker(run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port)
worker.run(background=True)

API: optimizer
from hpbandster.optimizers import BOHB
optim = BOHB(configspace=worker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port,
             eta=3, min_budget=0.1, max_budget=1,
             num_samples=64, top_n_percent=15,
             min_bandwidth=1e-3, bandwidth_factor=3)
study = optim.run(n_iterations=100)
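To get a feel for what `eta`, `min_budget` and `max_budget` imply, the sketch below computes the budget levels, assuming the usual Hyperband geometric spacing that works downwards from `max_budget` in factors of `eta`. The helper name `budget_ladder` is hypothetical.

```python
import math


def budget_ladder(min_budget, max_budget, eta):
    """Budgets max_budget / eta**i that are not below min_budget, in ascending order."""
    n_rungs = int(math.log(max_budget / min_budget) / math.log(eta)) + 1
    return [max_budget * eta ** (-i) for i in range(n_rungs - 1, -1, -1)]


budgets = budget_ladder(min_budget=0.1, max_budget=1, eta=3)
```

With the values from the slide this gives three levels, roughly 0.11, 0.33 and 1. Under this assumption the smallest budget is 1/9 rather than exactly `min_budget`, because the levels are anchored at `max_budget`.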

API: optimizer: callbacks
class NeptuneLogger:
    def new_config(self, *args, **kwargs):
        pass

    def __call__(self, job):
        neptune.send_metric('run_score', job.result['loss'])
        neptune.send_text('run_parameters', str(job.kwargs['config']))

optim = BOHB(configspace=worker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port,
             result_logger=NeptuneLogger())

Documentation
● Decent Read-the-docs project,
● Missing docstrings in a lot of places,
● A bunch of examples.

Visualizations
● Options:
○ hpvis.losses_over_time - score improvement
○ hpvis.concurrent_runs_over_time - speed/parallelization
○ hpvis.finished_runs_over_time - budget adjustment
○ hpvis.correlation_across_budgets - budget adjustment
○ hpvis.performance_histogram_model_vs_random - sanity check
● Very lib/debug-specific but can be useful for tweaking

Visualizations: losses_over_time
all_runs = results.get_all_runs()
hpvis.losses_over_time(all_runs);

Visualizations: correlation_across_budgets
hpvis.correlation_across_budgets(results);

Visualizations: performance_histogram_model_vs_random
all_runs = results.get_all_runs()
id2conf = results.get_id2config_mapping()
hpvis.performance_histogram_model_vs_random(all_runs, id2conf);

● Can be easily distributed across threads/processes/machines

Speed & Parallelization: threads
workers = []
for i in range(N_WORKERS):
    w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5,
                        nameserver=ns_host, nameserver_port=ns_port)
    w.run(background=True)
    workers.append(w)

optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)

Speed & Parallelization: processes
workers = []
for i in range(N_WORKERS):
    w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5,
                        nameserver=ns_host, nameserver_port=ns_port)
    # background=False blocks, so this part is meant to run in a separate process per worker
    w.run(background=False)
    exit(0)

optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)

Speed & Parallelization: machines
Follow the example from the docs … but it is not obvious

Conclusions: good
● State-of-the-art algorithm
● Can be distributed over a cluster of machines
● Useful visualizations
● Search space supports nesting

Conclusions: bad
● Project is not very active
● Complicated API
● Missing docstrings

Results (mostly subjective)
                      | Scikit-Optimize         | Optuna                         | HpBandSter        | Hyperopt
API/ease of use       | Great                   | Great                          | Difficult         | Good
Documentation         | Great                   | Great                          | Ok(ish)           | Bad
Speed/Parallelization | Fast if sequential/None | Great                          | Good              | Ok
Visualizations        | Amazing                 | Basic                          | Very lib specific | Some
*Experimental results | 0.8566 (100)            | 0.8419 (100), 0.8597 (10000)   | 0.8629 (100)      | 0.8420 (100)

Dream library
Scikit-Optimize Visualizations
+
Optuna API + Docs + Pruning + Callbacks +
Parallelization
+
HpBandSter Optimizers

Conversions between results objects are in neptune-contrib
import neptunecontrib.hpo.utils as hpo_utils
results = hpo_utils.optuna2skopt(study)
Dream library

● If you don’t have a lot of resources - use Scikit-Optimize
● If you want to get SOTA and don’t care about API/Docs - use HpBandSter
● If you want good docs/api/parallelization - use Optuna
Recommendations

● Slides link on Twitter @NeptuneML or LinkedIn @neptune.ml
● Blog posts on Medium @jakub.czakon
● Experiments in Neptune tags skopt/optuna/hpbandster
○ Code
○ Best hyperparams and Hyper hyper params
○ learning curves
○ diagnostic charts
○ resource consumption charts
○ pickled results objects
Materials

Data science work sharing hub.
Track | Organize | Collaborate
kuba@neptune.ml
@NeptuneML
https://medium.com/neptune-ml
Jakub Czakon
