In this talk, I describe some of the newest hyperparameter optimization libraries in Python, discuss their pros and cons, and help you choose the right one for your use case.
2. Materials
● Slides: link on Twitter @NeptuneML
● Blog posts on Medium @jakub.czakon:
○ Part 0: intro
○ Part 1: Scikit-Optimize
● Experiments on Neptune: experiments link
19. API: {fun}_minimize
● There are (hyper)hyperparameters
● Acquisition function:
○ ‘EI’ (expected improvement) and ‘PI’ (probability of improvement over the current best)
○ ‘LCB’ (lower confidence bound): trades off the predicted mean against uncertainty
● Exploration vs exploitation
○ xi for ‘EI’ and ‘PI’: low xi favors exploitation, high xi favors exploration
○ kappa for ‘LCB’: low kappa favors exploitation, high kappa favors exploration
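To make this concrete, here is a minimal sketch (mine, not from the deck; the toy objective and values are purely illustrative) of passing these knobs to one of the {fun}_minimize functions:

from skopt import gp_minimize

def objective(params):
    x, = params
    return (x - 2.0) ** 2  # toy function to minimize

result = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],  # search space for x
    acq_func="EI",             # one of 'EI', 'PI', 'LCB'
    xi=0.01,                   # used by 'EI'/'PI'; kappa would be used by 'LCB'
    n_calls=30,
    random_state=42,
)
print(result.x, result.fun)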
28. Speed & Parallelization
● Runs sequentially; you cannot distribute it across many machines
● You can parallelize the base estimator at every step with n_jobs (see the sketch below)
● If you have just one machine, it is fast
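As a sketch of the n_jobs point (the toy objective is again mine), the tree-based minimizers fit their base estimator on multiple cores while candidate points are still proposed one at a time:

from skopt import forest_minimize

def objective(params):
    x, = params
    return (x - 2.0) ** 2  # toy function to minimize

result = forest_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],
    n_calls=30,
    n_jobs=-1,  # parallelize the base estimator's fit over all cores
)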
29. Experimental Results
● An example training script is available here.
● I grid-searched over the acquisition functions, xi, and kappa.
● All the experiments are available in Neptune under the tag skopt
31. Conclusions: good
● Easy-to-use API and great documentation
● A lot of optimizers and tweaking options
● Awesome visualizations
● Solid gains over random search
● Fast if you are running sequentially on one machine
38. API: {fun}_minimize
● Selects configurations with the Tree-structured Parzen Estimator (TPE), like hyperopt (paper)
● Allows pruning (see the sketch below)
● Handles exceptions in the objective
● Doesn’t have callbacks
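Here is a minimal sketch of that API (my own illustration, not code from the slides; the learning-rate space and fake training loop are stand-ins for a real model):

import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    score = 0.0
    for step in range(10):
        score += lr                 # stand-in for one epoch of train/eval
        trial.report(score, step)   # report intermediate values...
        if trial.should_prune():    # ...so unpromising trials get pruned
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction="maximize")
# catch= lets the study survive exceptions raised in the objective
study.optimize(objective, n_trials=20, catch=(ValueError,))
print(study.best_params)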
47. Experimental Results
● An example training script is available here.
● No (hyper)hyperparameters to search over
● All the experiments are available in Neptune under the tag optuna
49. Conclusions: good
● Easy-to-use API
● Great documentation
● Can be easily distributed over a cluster of machines
● Has pruning
● Search space supports nesting (see the sketch below)
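For the nesting point above, a hedged illustration (again a toy example of mine) of a conditional search space, where some parameters exist only inside one branch of a categorical choice:

import optuna

def objective(trial):
    clf = trial.suggest_categorical("classifier", ["svm", "random_forest"])
    if clf == "svm":
        c = trial.suggest_float("svm_c", 1e-3, 1e3, log=True)  # svm-only parameter
        return 1.0 / (1.0 + abs(c - 1.0))                      # toy score
    depth = trial.suggest_int("rf_max_depth", 2, 32)           # rf-only parameter
    return 1.0 / (1.0 + abs(depth - 8))                        # toy score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)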
50. Conclusions: bad
● Only the TPE optimizer is available
● No visualizations
● *No gains over random search (with a 100-iteration budget)
51. Optuna is hyperopt with:
● a better API
● waaaay better documentation
● pruning
● exception handling
● simpler parallelization
55. HpBandSter
● Hyperband on steroids
● It has state-of-the-art algorithms:
○ Hyperband paper
○ BOHB (Bayesian Optimization + Hyperband) paper
● Distributed-computing-first API
58. API: server
● Workers communicate with the server to:
○ get the next parameter configuration
○ send back results
● You have to set this up even for the most basic problems (weird); see the sketch below
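A compressed sketch of that setup, following the pattern of HpBandSter's own examples (the worker class, toy loss, and budget values here are mine):

import ConfigSpace as CS
from hpbandster.core.nameserver import NameServer
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

class MyWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # budget is the resource (e.g. number of epochs) Hyperband assigns
        loss = (config["x"] - 2.0) ** 2 / budget
        return {"loss": loss, "info": {}}

config_space = CS.ConfigurationSpace()
config_space.add_hyperparameter(CS.UniformFloatHyperparameter("x", -5.0, 5.0))

# even this toy problem needs a nameserver and a worker
ns = NameServer(run_id="toy", host="127.0.0.1", port=None)
ns.start()
worker = MyWorker(run_id="toy", nameserver="127.0.0.1")
worker.run(background=True)

bohb = BOHB(configspace=config_space, run_id="toy",
            nameserver="127.0.0.1", min_budget=1, max_budget=9)
result = bohb.run(n_iterations=5)
bohb.shutdown(shutdown_workers=True)
ns.shutdown()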
75. Experimental Results
● An example training script is available here.
● A lot of (hyper)hyperparameters to search over
● All the experiments are available in Neptune under the tag hpbandster
77. Conclusions: good
● State-of-the-art algorithm
● Can be distributed over a cluster of machines
● Visualizations
● Search space (kind of) supports nesting
80. Results (mostly subjective)

|                         | Scikit-Optimize                  | Optuna                       | HpBandSter        | Hyperopt     |
| API / ease of use       | Great                            | Great                        | Difficult         | Good         |
| Documentation           | Great                            | Great                        | Ok(ish)           | Bad          |
| Speed / parallelization | Fast if sequential; no distribution | Great                     | Good              | Ok           |
| Visualizations          | Amazing                          | None                         | Very lib-specific | Some         |
| *Experimental results   | 0.8566 (100)                     | 0.8419 (100), 0.8597 (10000) | 0.8629 (100)      | 0.8420 (100) |

(*) Numbers in parentheses are iteration budgets.
81. Dream library
● Viz + callbacks: Scikit-Optimize
● API + docs + pruning + parallelization: Optuna
● Optimizers: HpBandSter
82. Dream library
Conversions between results objects are in neptune-contrib:

import neptunecontrib.hpo.utils as hpo_utils
results = hpo_utils.optuna2skopt(study)
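One illustration (mine) of why this conversion is useful: Scikit-Optimize's visualizations can then be reused on results that came from another library.

from skopt.plots import plot_convergence
plot_convergence(results)  # skopt plot on the converted Optuna study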
83. Recommendations
● If you don’t have a lot of resources, use Scikit-Optimize
● If you want state-of-the-art results and don’t care about the API/docs, use HpBandSter
● If you want good docs, a good API, and easy parallelization, use Optuna
84. Data science work sharing hub.
Track | Organize | Collaborate

Jakub Czakon
jakub@neptune.ml
@NeptuneML
https://medium.com/neptune-ml