This document introduces automated machine learning (AutoML) and sequential uniform design-based hyperparameter optimization (SeqUDHO). It reviews existing hyperparameter optimization methods and proposes an approach based on sequential uniform designs. Numerical experiments demonstrate that SeqUDHO outperforms methods such as grid search, random search, and Bayesian optimization on both simulated complex surfaces and real-world classification tasks with the SVM, XGBoost, and CNN algorithms. Future work to improve the approach is outlined.
1. Introduction to AutoML SeqUD-based Hyperparameter Optimization Numerical Experiments
Automated Machine Learning via
Sequential Uniform Designs
Dr. Aijun Zhang
The University of Hong Kong
(Joint work with Zebin Yang (HKU) and Ji Zhu (Michigan))
October 2018
StatSoft.org 1
Outline of the presentation
1 Introduction to AutoML
Hyperparameter Optimization
Review of Existing Methods
Proposed Approach to AutoML
2 SeqUD-based Hyperparameter Optimization
Sequential Uniform Design
SeqUDHO Meta-algorithm
3 Numerical Experiments
Simulation Study
AutoML Experiments
What is AutoML (Automated Machine Learning)?
AutoML performs automated ML model selection and hyperparameter
configuration with the goal of maximizing ML prediction accuracy.
It also targets progressive automation of data preprocessing, feature
extraction/transformation, postprocessing and interpretation.
Growing Interest in AutoML
With the ultimate goal of making ML algorithms easy to use without
expert knowledge, several off-the-shelf AutoML packages have appeared:
Auto-WEKA 2.0: simultaneous selection of an ML algorithm and its
hyperparameters on WEKA (Kotthoff et al., JMLR 2017)
auto-sklearn: AutoML for Python scikit-learn (Feurer et al., NIPS 2015)
H2O AutoML: automated model selection and ensembling for H2O
AutoKeras: automated neural architecture search (Jin et al., 2018)
Google Cloud: AutoML (beta) for Translation, NLP, and Vision (2018)
A recent Forbes article claims that AutoML is set to become the future of
artificial intelligence.
Hyperparameter Optimization
Hyperparameter optimization, a.k.a. (hyper)parameter tuning, plays a
central role in AutoML pipelines.
Hyperparameters can be continuous, integer-valued or categorical, e.g.
regularization parameters, kernel bandwidths, tree depth, learning rate,
batch size, number of layers, type of activation.
Hyperparameter optimization is combinatorial in nature and is therefore
a challenging problem subject to the curse of dimensionality.
The tunability of ML hyperparameters is not well understood
(Probst et al., 2018); the evidence is mostly empirical.
Robustness and reproducibility of hyperparameter configuration depend
not only on the specific algorithm, but also on the specific dataset.
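The mixed parameter types and the combinatorial blow-up can be illustrated with a toy search space (the parameter names and grids below are illustrative, not taken from the talk):

```python
import itertools

# Illustrative mixed-type search space: continuous, integer and
# categorical hyperparameters, each discretized to a small grid.
space = {
    "learning_rate": [0.001, 0.01, 0.1],   # continuous, on a grid
    "tree_depth": [2, 4, 6, 8],            # integer
    "batch_size": [32, 64, 128],           # integer
    "activation": ["relu", "tanh"],        # categorical
}

# The grid grows multiplicatively -- the combinatorial blow-up the
# slide refers to: 3 * 4 * 3 * 2 = 72 configurations for 4 parameters.
n_configs = len(list(itertools.product(*space.values())))
print(n_configs)  # 72
```

Adding a fifth parameter with 5 levels would already push this to 360 runs, each requiring a full model fit.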
Hyperparameter Optimization: Existing Methods
Grid search: exhaustive search over grid combinations (most popular)
Random search: random sampling (Bergstra and Bengio, 2012)
Bayesian optimization: sequentially sampling one-point-at-a-time
through maximizing the expected improvement (Jones et al., 1998)
GP-EI: surface modeled by Gaussian process (Snoek et al., 2012)
SMAC: surface modeled by random forest (Hutter et al., 2011)
TPE: Tree-structured Parzen Estimator (Bergstra et al., 2011)
Genetic algorithm: Goldberg & Holland (Machine Learning 1988)
Reinforcement learning: DNN architecture search (Zoph and Le, 2016)
Grid Search vs. Random Search
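A minimal sketch of the budget argument made by Bergstra and Bengio (2012): with the same 16 evaluations, a grid tests only 4 distinct values per axis, while random search tests 16. The toy "accuracy" surface below is hypothetical, standing in for an expensive cross-validation score:

```python
import itertools
import random

# Hypothetical 2D objective whose optimum sits off the grid; it stands
# in for a cross-validated accuracy surface (higher is better).
def score(x, y):
    return -((x - 0.61) ** 2) - 0.1 * ((y - 0.37) ** 2)

# Grid search: 4 x 4 = 16 evaluations, but only 4 distinct values per axis.
grid_axis = [0.125, 0.375, 0.625, 0.875]
grid_best = max(score(x, y) for x, y in itertools.product(grid_axis, grid_axis))

# Random search: the same budget of 16 evaluations, but 16 distinct
# values per axis -- the key advantage when only a few axes matter.
rng = random.Random(0)
rand_best = max(score(rng.random(), rng.random()) for _ in range(16))

print(grid_best, rand_best)
```

Because `score` depends weakly on `y`, the effective search is nearly one-dimensional, which is exactly the regime where random search beats a grid of equal cost.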
Bayesian Optimization
E.g., the acquisition function used by GP-EI (Snoek et al., 2012):

αEI(x) = ∫_{y*}^{∞} (y − y*) p_GP(y | x) dy = σ(x) [ z*(x) Φ(z*(x)) + ϕ(z*(x)) ],

where y* is the observed maximum, (µ(x), σ²(x)) are the GP-predicted
(posterior) mean and variance, and z*(x) = (µ(x) − y*)/σ(x).
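The closed form above is easy to evaluate directly; a sketch using only the standard library (the helper name is ours, and we follow the slide's maximization convention):

```python
import math

def expected_improvement(mu, sigma, y_star):
    """EI(x) = sigma * [z* Phi(z*) + phi(z*)], z* = (mu - y*) / sigma,
    where (mu, sigma^2) are the GP posterior mean/variance at x and
    y* is the observed maximum."""
    if sigma <= 0.0:
        return max(mu - y_star, 0.0)  # degenerate case: no uncertainty
    z = (mu - y_star) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) PDF
    return sigma * (z * Phi + phi)
```

GP-EI proposes the next run at the x maximizing this acquisition: points with a high predicted mean or a high posterior uncertainty both score well, which balances exploitation against exploration.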
Proposed Approach to AutoML
We reformulate AutoML as a kind of computer experiment (CompExp).
Connections between AutoML and CompExp: (a) the blackbox response
surface can be complex; (b) the experiment is expensive to run.
Proposed Approach to AutoML
Within the CompExp framework, we propose the SeqUDHO meta-algorithm to
perform hyperparameter optimization for each candidate ML algorithm.
Key innovation: sequential uniform design with augmented runs
In a simulation study, the proposed SeqUDHO meta-algorithm is shown to
outperform existing methods.
Numerical experiments on real-world datasets demonstrate that SeqUDHO
has superior performance for the SVM, XGBoost and CNN algorithms.
SNTO Method for Global Optimization
Fang and Wang (1990) proposed the SNTO (sequential number-theoretic
optimization) method using NT-nets for global/blackbox optimization;
see Fang and Wang (1994, Chapter 3).
However, SNTO does not utilize existing runs in the subdomain.
This motivates us to develop an augmented uniform design method ...
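For intuition, the SNTO zooming scheme can be sketched as follows (plain random points stand in for an NT-net, and all names are ours; this is a structural sketch, not the Fang–Wang implementation):

```python
import random

def snto_minimize(f, lo, hi, n_per_stage=30, stages=6, shrink=0.5, seed=1):
    """SNTO-style zooming: at each stage, spread points over the current
    box, keep the best point, then contract the box around it."""
    rng = random.Random(seed)
    lo, hi = list(lo), list(hi)
    best_x, best_y = None, float("inf")
    for _ in range(stages):
        for _ in range(n_per_stage):
            x = [rng.uniform(a, b) for a, b in zip(lo, hi)]
            y = f(x)
            if y < best_y:
                best_x, best_y = x, y
        # Contract the search box around the incumbent. Points already
        # evaluated in the old box are NOT reused -- the limitation the
        # slide notes, which SeqUD's augmented designs address.
        half = [shrink * (b - a) / 2 for a, b in zip(lo, hi)]
        lo = [max(a, c - h) for a, c, h in zip(lo, best_x, half)]
        hi = [min(b, c + h) for b, c, h in zip(hi, best_x, half)]
    return best_x, best_y
```

Each contraction discards the uniformity of the previously evaluated runs inside the new subdomain, which is the inefficiency the augmented uniform design below is built to remove.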
Augmented Uniform Design
Given an initial design D1 with n1 runs, find an augmented design D2*
with n2 runs such that the combined design is as uniform as possible, i.e.

D2* ← argmin_{D2} ϕ([D1; D2]),

where ϕ(D) is a uniformity criterion, e.g. the centered L2-discrepancy (CD2).
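A brute-force sketch of the idea (helper names are ours; CD2 follows Hickernell's closed form, and a greedy pick from a finite candidate pool stands in for the stochastic search used by UniDOE):

```python
import math

def cd2(points):
    """Squared centered L2-discrepancy CD2(D)^2 of a design on [0,1]^d
    (Hickernell's closed form); smaller means more uniform."""
    n, d = len(points), len(points[0])
    t1 = (13.0 / 12.0) ** d
    t2 = sum(math.prod(1 + 0.5 * abs(xk - 0.5) - 0.5 * (xk - 0.5) ** 2
                       for xk in x) for x in points)
    t3 = sum(math.prod(1 + 0.5 * abs(xk - 0.5) + 0.5 * abs(yk - 0.5)
                       - 0.5 * abs(xk - yk) for xk, yk in zip(x, y))
             for x in points for y in points)
    return t1 - (2.0 / n) * t2 + t3 / n ** 2

def augment(d1, candidates, n2):
    """Greedily add n2 candidate points so the combined design [D1; D2]
    stays as uniform (low-CD2) as possible -- the criterion above."""
    design, pool = list(d1), list(candidates)
    for _ in range(n2):
        best = min(pool, key=lambda p: cd2(design + [p]))
        design.append(best)
        pool.remove(best)
    return design[len(d1):]
```

Because the criterion is evaluated on the combined design, the augmented runs avoid regions already covered by D1, unlike SNTO's fresh nets.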
Real-time SeqUD Construction
The R:UniDOE package (Zhang et al., 2018) performs stochastic search of
uniform designs via a stochastic/adaptive threshold-accepting (TA) algorithm.
https://CRAN.R-project.org/package=UniDOE
It supports real-time construction of sequential uniform designs (SeqUD)
with augmented runs.
R:UniDOE is used for our AutoML implementation.
SeqUDHO Meta-algorithm
1 Define the search space by converting all parameters to the unit hypercube.
Set Tmax (total run budget), J (multi-shooting number) and k = 1 (current stage).
2 Generate a design D with T = n1 UD runs. Evaluate CV(θ) and fit GP(θ).
3 While T ≤ Tmax:
Set k = k + 1. From D and GP-predicted QMC samples, find the top-J
centers {θ*_kj, j = 1, ..., J} whose subspaces have little overlap.
For j = 1, ..., J:
zoom the subspace into center θ*_kj with level doubling;
generate n_kj augmented runs in the subspace;
if T + n_kj > Tmax, break;
evaluate CV(θ) for the n_kj runs and set T = T + n_kj.
Update the SeqUD D with the T runs, and refit GP(θ).
4 Output the optimal θ* from all T evaluated runs.
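Stripped to its control flow, the loop looks roughly like this (random points replace the UD/augmented-UD runs from R:UniDOE, and the top evaluated runs replace the GP-guided centers, so this is a structural sketch only):

```python
import random

def sequd_ho(cv_score, d, t_max=60, n_stage=10, j_shots=2, seed=0):
    """Skeleton of the SeqUDHO loop on [0,1]^d: initial design, then
    repeated multi-shot zooming around the best runs so far."""
    rng = random.Random(seed)
    runs = []  # (theta, score) pairs over all stages
    # Stage 1: initial design over the whole unit hypercube.
    for _ in range(n_stage):
        theta = tuple(rng.random() for _ in range(d))
        runs.append((theta, cv_score(theta)))
    width = 0.5  # subspace half-width halves each zoom ("level doubling")
    while len(runs) + n_stage <= t_max:
        # Top-J centers among all evaluated runs (GP surrogate omitted).
        centers = [t for t, _ in sorted(runs, key=lambda r: -r[1])[:j_shots]]
        for c in centers:
            for _ in range(n_stage // j_shots):
                theta = tuple(min(1.0, max(0.0, ck + rng.uniform(-width, width)))
                              for ck in c)
                runs.append((theta, cv_score(theta)))
        width /= 2
    return max(runs, key=lambda r: r[1])  # optimal (theta, CV score)
```

The multi-shooting (J > 1) keeps several promising regions alive at once, which is what lets the method escape local modes on surfaces like the octopus-shaped example later.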
Simulation Study
To check the effectiveness of hyperparameter optimization, we consider two
kinds of complex surfaces as ground truth: a cliff-shaped function and an
octopus-shaped function.
Competitor Methods
Five existing methods are compared:
Grid search: still most popular today due to its simplicity
Random search: Bergstra and Bengio (JMLR 2012)
GP-EI (Snoek et al., NIPS 2012), based on GitHub: spearmint
SMAC (Hutter et al., 2011), based on GitHub: SMAC3
TPE (Bergstra et al., NIPS 2011), based on GitHub: hyperopt
Comparative Results
Figure: comparative results for (a) the cliff-shaped function and (b) the octopus-shaped function.
Sampling Points for Cliff-shaped Function
Figure: An example of evaluation trajectories on the cliff-shaped function.
Panels: (c) SeqUDHO, (d) GP-EI, (e) SMAC, (f) TPE, (g) Rand, (h) Grid.
Sampling Points for Octopus-shaped Function
Figure: An example of evaluation trajectories on the octopus-shaped function.
Panels: (a) SeqUDHO, (b) GP-EI, (c) SMAC, (d) TPE, (e) Rand, (f) Grid.
Testing Algorithm: XGBoost
The XGBoost (extreme gradient boosting) algorithm with 10 hyperparameters:
1 binary (choice of base model), 2 integer (maximum tree depth, number of
estimators) and 7 continuous (learning rate, min sample weights, min loss
reduction, ratio of samples in trees, ratio of variables in trees,
L2 regularization and L1 regularization).
Parameter tuning results for XGBoost under 5-fold CV accuracy (%):

Dataset   Rand    TPE     GP-EI   SMAC    SeqUDHO
Breast    75.77   76.18   76.22   76.22   76.18
ConVot    63.17   63.38   63.22   63.01   63.54
Credit    88.06   88.28   88.55   88.50   88.65
IonS      93.53   93.96   94.02   94.08   94.22
MamG      82.97   83.02   83.14   82.90   82.90
MBP       89.43   90.28   89.62   89.62   90.48
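Step 1 of the meta-algorithm maps such a mixed space to the unit hypercube; in the reverse direction, a point θ ∈ [0,1]^10 can be decoded into the ten XGBoost parameters above. The ranges below are illustrative assumptions, not the grids used in the talk:

```python
def decode_xgb_theta(theta):
    """Decode theta in [0,1]^10 into the 10 XGBoost hyperparameters
    listed on the slide; all ranges are illustrative assumptions."""
    t = list(theta)
    return {
        "booster": "gbtree" if t[0] < 0.5 else "gblinear",  # binary: base model
        "max_depth": 1 + int(t[1] * 9),                     # integer, 1..10
        "n_estimators": 50 + int(t[2] * 450),               # integer, 50..500
        "learning_rate": 10 ** (-3 + 3 * t[3]),             # log-uniform, 1e-3..1
        "min_child_weight": 1 + 9 * t[4],                   # min sample weights
        "gamma": 5 * t[5],                                  # min loss reduction
        "subsample": 0.5 + 0.5 * t[6],                      # ratio of samples
        "colsample_bytree": 0.5 + 0.5 * t[7],               # ratio of variables
        "reg_lambda": 10 ** (-3 + 5 * t[8]),                # L2 regularization
        "reg_alpha": 10 ** (-3 + 5 * t[9]),                 # L1 regularization
    }
```

Putting the learning rate and regularization strengths on a log scale before mapping to [0,1] is the standard way to make a uniform design cover several orders of magnitude evenly.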
Testing Algorithm: CNN
CNN (convolutional neural network) with three layers. Each layer is tuned
by its number of filters and kernel size. Global parameters include the
choice of optimizer, batch size, learning rate and L2 penalty.
MNIST data split: 8000 samples for training, 2000 samples for validation
and 50000 samples for testing.
Here, our AutoML target is to maximize the validation accuracy.
Testing Algorithm: CNN
Hyperparameter settings and optimization results:
The best CNN model selected by SeqUDHO is then tested on the 50K test
samples, achieving a testing accuracy of 98.05%.
AutoML Demonstration
Finally, we demonstrate how to use SeqUDHO for AutoML in practice.
Consider the mixture.example (R:ElemStatLearn) and seven benchmark
datasets from UCI ML repository, all with binary responses.
Consider three candidate ML algorithms (SVM, Random Forest,
XGBoost), each having different hyperparameter settings.
Example of AutoML output by SeqUDHO:
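The workflow behind such output is just a loop over candidate learners (a structural sketch with our own names; `tune` can be any hyperparameter optimizer with the signature shown, e.g. a SeqUD-style routine):

```python
def automl(candidates, tune, budget=50):
    """Tune each candidate ML algorithm on its own hyperparameter space
    and return the winner by tuned CV score.  `candidates` maps an
    algorithm name to (cv_score_fn, n_hyperparameters); `tune(cv_score,
    dim, budget)` must return (best_theta, best_score)."""
    results = {}
    for name, (cv_score, dim) in candidates.items():
        results[name] = tune(cv_score, dim, budget)
    best = max(results, key=lambda name: results[name][1])
    return best, results
```

Because each candidate has its own search-space dimension and decoding, the same tuner is reused unchanged across SVM, Random Forest and XGBoost.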
Future Work
1 To run simulation studies for high-dimensional blackbox optimization and
analyze the strengths and weaknesses of SeqUDHO versus Bayesian methods;
2 To improve the Gaussian process meta-modeling (with nugget effect)
through sequential approximation for non-stationary surfaces;
3 To investigate DNN architecture search with SeqUD, and compare with
genetic programming and reinforcement learning;
4 To investigate automated procedures for feature engineering, including
variable selection and transformation;
5 To develop an AutoML R/Python package based on the SeqUDHO meta-algorithm.
References
1. Bergstra, J., Bardenet, R., Bengio, Y. and Kegl, B. (2011). Algorithms for hyper-parameter
optimization. In NIPS, 2546–2554.
2. Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization.
Journal of Machine Learning Research, 13, 281–305.
3. Fang, K.T. and Wang, Y. (1990). A sequential number-theoretic method for optimization and
its applications in statistics. In Lecture Notes in Contemporary Mathematics, Science Press.
4. Fang, K.T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. CRC Press.
5. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F. (2015).
Efficient and robust automated machine learning. In NIPS, 2962–2970.
6. Goldberg, D.E. and Holland, J.H. (1988). Genetic algorithms and machine learning.
Machine Learning, 3(2), 95–99.
7. Huang, C.M., Lee, Y.J., Lin, D.K. and Huang, S.Y. (2007). Model selection for support
vector machines via uniform design. CSDA, 52(1), 335–346.
8. Hutter, F., Hoos, H.H. and Leyton-Brown, K. (2011). Sequential model-based optimization
for general algorithm configuration. In International Conference on Learning and Intelligent
Optimization, 507–523. Springer, Berlin, Heidelberg.
References
9. Jin, H., Song, Q., and Hu, X. (2018). Efficient neural architecture search with network
morphism. arXiv preprint arXiv:1806.10282.
10. Jones, D.R., Schonlau, M. and Welch, W.J. (1998). Efficient global optimization of
expensive black-box functions. Journal of Global Optimization, 13(4), 455–492.
11. Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F. and Leyton-Brown, K. (2017).
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA.
Journal of Machine Learning Research, 18(1), 826–830.
12. Probst, P., Bischl, B. and Boulesteix, A.L. (2018). Tunability: Importance of
hyperparameters of machine learning algorithms. arXiv:1802.09596.
13. Snoek, J., Larochelle, H. and Adams, R.P. (2012). Practical Bayesian optimization of
machine learning algorithms. In NIPS, 2951–2959.
14. Zhang, A., Li, H., Quan, S. and Yang, Z. (2018). UniDOE: uniform design of experiments.
R package version 1.0.2. https://CRAN.R-project.org/package=UniDOE
15. Zoph, B. and Le, Q.V. (2016). Neural architecture search with reinforcement learning.
arXiv:1611.01578.
Thank You!
Q&A, or email ajzhang@hku.hk.