This document introduces automated machine learning (AutoML) and sequential uniform design-based hyperparameter optimization (SeqUDHO). It reviews existing hyperparameter optimization methods and proposes an approach based on sequential uniform designs. Numerical experiments demonstrate that SeqUDHO outperforms methods such as grid search, random search, and Bayesian optimization on both simulated complex surfaces and real-world classification tasks with the SVM, XGBoost, and CNN algorithms. Future work to improve the approach is outlined.
1. Introduction to AutoML SeqUD-based Hyperparameter Optimization Numerical Experiments
Automated Machine Learning via
Sequential Uniform Designs
Dr. Aijun Zhang
The University of Hong Kong
(Joint work with Zebin Yang (HKU) and Ji Zhu (Michigan))
October 2018
StatSoft.org 1
Outline of the presentation
1 Introduction to AutoML
Hyperparameter Optimization
Review of Existing Methods
Proposed Approach to AutoML
2 SeqUD-based Hyperparameter Optimization
Sequential Uniform Design
SeqUDHO Meta-algorithm
3 Numerical Experiments
Simulation Study
AutoML Experiments
What is AutoML (Automated Machine Learning)?
AutoML performs automated ML model selection and hyperparameter
configuration with the goal of maximizing ML prediction accuracy.
It also targets progressive automation of data preprocessing, feature
extraction/transformation, postprocessing and interpretation.
Growing Interest in AutoML
With the ultimate goal of making ML algorithms easy to use without
expert knowledge, several off-the-shelf AutoML packages have appeared:
Auto-WEKA 2.0: simultaneous selection of an ML algorithm and its
hyperparameters on WEKA (Kotthoff et al., JMLR 2017)
auto-sklearn: AutoML for Python scikit-learn (Feurer et al., NIPS 2015)
H2O AutoML: automated model selection and ensembling for H2O
AutoKeras: automated neural architecture search (Jin et al., 2018)
Google Cloud: AutoML (beta) for Translation, NLP, and Vision (2018)
A recent Forbes article claims that AutoML is set to become the future of
artificial intelligence.
Hyperparameter Optimization
Hyperparameter optimization, a.k.a. (hyper)parameter tuning, plays a
central role in AutoML pipelines.
Hyperparameters can be continuous, integer-valued or categorical, e.g.
regularization parameters, kernel bandwidths, tree depth, learning rate,
batch size, number of layers, type of activation.
Hyperparameter optimization is combinatorial in nature and is therefore
a challenging problem subject to the curse of dimensionality.
The tunability of ML hyperparameters is not well understood
(Probst et al., 2018); the evidence is mostly empirical.
Robustness and reproducibility of hyperparameter configuration depend
not only on the specific algorithm, but also on the specific dataset.
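The mixed parameter types and the combinatorial blow-up can be illustrated with a toy search space (the parameter names and grids below are illustrative, not taken from the talk):

```python
import itertools

# Illustrative mixed-type search space: continuous, integer and
# categorical hyperparameters, each discretized to a small grid.
space = {
    "learning_rate": [0.001, 0.01, 0.1],   # continuous, on a grid
    "tree_depth": [2, 4, 6, 8],            # integer
    "batch_size": [32, 64, 128],           # integer
    "activation": ["relu", "tanh"],        # categorical
}

# The grid grows multiplicatively -- the combinatorial blow-up the
# slide refers to: 3 * 4 * 3 * 2 = 72 configurations for 4 parameters.
n_configs = len(list(itertools.product(*space.values())))
print(n_configs)  # 72
```

Adding a fifth parameter with 5 levels would already push this to 360 runs, each requiring a full model fit.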
Hyperparameter Optimization: Existing Methods
Grid search: exhaustive search over grid combinations (most popular)
Random search: random sampling (Bergstra and Bengio, 2012)
Bayesian optimization: sequentially sampling one-point-at-a-time
through maximizing the expected improvement (Jones et al., 1998)
GP-EI: surface modeled by Gaussian process (Snoek et al., 2012)
SMAC: surface modeled by random forest (Hutter et al., 2011)
TPE: Tree-structured Parzen Estimator (Bergstra et al., 2011)
Genetic algorithm: Goldberg & Holland (Machine Learning 1988)
Reinforcement learning: DNN architecture search (Zoph and Le, 2016)
Grid Search vs. Random Search
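A minimal sketch of the budget argument made by Bergstra and Bengio (2012): with the same 16 evaluations, a grid tests only 4 distinct values per axis, while random search tests 16. The toy "accuracy" surface below is hypothetical, standing in for an expensive cross-validation score:

```python
import itertools
import random

# Hypothetical 2D objective whose optimum sits off the grid; it stands
# in for a cross-validated accuracy surface (higher is better).
def score(x, y):
    return -((x - 0.61) ** 2) - 0.1 * ((y - 0.37) ** 2)

# Grid search: 4 x 4 = 16 evaluations, but only 4 distinct values per axis.
grid_axis = [0.125, 0.375, 0.625, 0.875]
grid_best = max(score(x, y) for x, y in itertools.product(grid_axis, grid_axis))

# Random search: the same budget of 16 evaluations, but 16 distinct
# values per axis -- the key advantage when only a few axes matter.
rng = random.Random(0)
rand_best = max(score(rng.random(), rng.random()) for _ in range(16))

print(grid_best, rand_best)
```

Because `score` depends weakly on `y`, the effective search is nearly one-dimensional, which is exactly the regime where random search beats a grid of equal cost.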
Bayesian Optimization
E.g., the acquisition function used by GP-EI (Snoek et al., 2012):

αEI(x) = ∫_{y*}^{∞} (y − y*) p_GP(y | x) dy = σ(x) [ z*(x) Φ(z*(x)) + ϕ(z*(x)) ],

where y* is the observed maximum, (µ(x), σ²(x)) are the GP-predicted
(posterior) mean and variance, and z*(x) = (µ(x) − y*)/σ(x).
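The closed form above is easy to evaluate directly; a sketch using only the standard library (the helper name is ours, and we follow the slide's maximization convention):

```python
import math

def expected_improvement(mu, sigma, y_star):
    """EI(x) = sigma * [z* Phi(z*) + phi(z*)], z* = (mu - y*) / sigma,
    where (mu, sigma^2) are the GP posterior mean/variance at x and
    y* is the observed maximum."""
    if sigma <= 0.0:
        return max(mu - y_star, 0.0)  # degenerate case: no uncertainty
    z = (mu - y_star) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) PDF
    return sigma * (z * Phi + phi)
```

GP-EI proposes the next run at the x maximizing this acquisition: points with a high predicted mean or a high posterior uncertainty both score well, which balances exploitation against exploration.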
Proposed Approach to AutoML
We reformulate AutoML as a kind of computer experiment (CompExp).
Connections between AutoML and CompExp: (a) the blackbox response
surface can be complex; (b) the experiment is expensive to run.
Proposed Approach to AutoML
Within the CompExp framework, we propose the SeqUDHO meta-algorithm to
perform hyperparameter optimization for each candidate ML algorithm.
Key innovation: sequential uniform design with augmented runs
In a simulation study, the proposed SeqUDHO meta-algorithm is shown to
outperform existing methods.
Numerical experiments on real-world datasets demonstrate that SeqUDHO
has superior performance for the SVM, XGBoost and CNN algorithms.
SNTO Method for Global Optimization
Fang and Wang (1990) proposed the SNTO (sequential number-theoretic
optimization) method using NT-nets for global/blackbox optimization;
see Fang and Wang (1994, Chapter 3).
However, SNTO does not utilize existing runs in the subdomain.
This motivates us to develop an augmented uniform design method ...
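For intuition, the SNTO zooming scheme can be sketched as follows (plain random points stand in for an NT-net, and all names are ours; this is a structural sketch, not the Fang–Wang implementation):

```python
import random

def snto_minimize(f, lo, hi, n_per_stage=30, stages=6, shrink=0.5, seed=1):
    """SNTO-style zooming: at each stage, spread points over the current
    box, keep the best point, then contract the box around it."""
    rng = random.Random(seed)
    lo, hi = list(lo), list(hi)
    best_x, best_y = None, float("inf")
    for _ in range(stages):
        for _ in range(n_per_stage):
            x = [rng.uniform(a, b) for a, b in zip(lo, hi)]
            y = f(x)
            if y < best_y:
                best_x, best_y = x, y
        # Contract the search box around the incumbent. Points already
        # evaluated in the old box are NOT reused -- the limitation the
        # slide notes, which SeqUD's augmented designs address.
        half = [shrink * (b - a) / 2 for a, b in zip(lo, hi)]
        lo = [max(a, c - h) for a, c, h in zip(lo, best_x, half)]
        hi = [min(b, c + h) for b, c, h in zip(hi, best_x, half)]
    return best_x, best_y
```

Each contraction discards the uniformity of the previously evaluated runs inside the new subdomain, which is the inefficiency the augmented uniform design below is built to remove.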
Augmented Uniform Design
Given an initial design D1 with n1 runs, find an augmented design D2*
with n2 runs such that the combined design is as uniform as possible, i.e.

D2* ← argmin_{D2} ϕ([D1; D2]),

where ϕ(D) is a uniformity criterion, e.g. the centered L2-discrepancy (CD2).
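A brute-force sketch of the idea (helper names are ours; CD2 follows Hickernell's closed form, and a greedy pick from a finite candidate pool stands in for the stochastic search used by UniDOE):

```python
import math

def cd2(points):
    """Squared centered L2-discrepancy CD2(D)^2 of a design on [0,1]^d
    (Hickernell's closed form); smaller means more uniform."""
    n, d = len(points), len(points[0])
    t1 = (13.0 / 12.0) ** d
    t2 = sum(math.prod(1 + 0.5 * abs(xk - 0.5) - 0.5 * (xk - 0.5) ** 2
                       for xk in x) for x in points)
    t3 = sum(math.prod(1 + 0.5 * abs(xk - 0.5) + 0.5 * abs(yk - 0.5)
                       - 0.5 * abs(xk - yk) for xk, yk in zip(x, y))
             for x in points for y in points)
    return t1 - (2.0 / n) * t2 + t3 / n ** 2

def augment(d1, candidates, n2):
    """Greedily add n2 candidate points so the combined design [D1; D2]
    stays as uniform (low-CD2) as possible -- the criterion above."""
    design, pool = list(d1), list(candidates)
    for _ in range(n2):
        best = min(pool, key=lambda p: cd2(design + [p]))
        design.append(best)
        pool.remove(best)
    return design[len(d1):]
```

Because the criterion is evaluated on the combined design, the augmented runs avoid regions already covered by D1, unlike SNTO's fresh nets.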
Real-time SeqUD Construction
The R:UniDOE package (Zhang et al., 2018) performs stochastic search of
uniform designs via a stochastic/adaptive threshold-accepting (TA) algorithm.
https://CRAN.R-project.org/package=UniDOE
It supports real-time construction of sequential uniform designs (SeqUD)
with augmented runs.
R:UniDOE is used for our AutoML implementation.
SeqUDHO Meta-algorithm
1 Define the search space by converting all parameters to the unit hypercube.
Set Tmax (total run budget), J (multi-shooting number) and k = 1 (current stage).
2 Generate a design D with T = n1 UD runs. Evaluate CV(θ) and fit GP(θ).
3 While T ≤ Tmax:
Set k = k + 1. From D and GP-predicted QMC samples, find the top-J
centers {θ*_kj, j = 1, ..., J} whose subspaces have little overlap.
For j = 1, ..., J:
zoom the subspace into center θ*_kj with level doubling;
generate n_kj augmented runs in the subspace;
if T + n_kj > Tmax, break;
evaluate CV(θ) for the n_kj runs and set T = T + n_kj.
Update the SeqUD D with the T runs, and refit GP(θ).
4 Output the optimal θ* from all T evaluated runs.
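Stripped to its control flow, the loop looks roughly like this (random points replace the UD/augmented-UD runs from R:UniDOE, and the top evaluated runs replace the GP-guided centers, so this is a structural sketch only):

```python
import random

def sequd_ho(cv_score, d, t_max=60, n_stage=10, j_shots=2, seed=0):
    """Skeleton of the SeqUDHO loop on [0,1]^d: initial design, then
    repeated multi-shot zooming around the best runs so far."""
    rng = random.Random(seed)
    runs = []  # (theta, score) pairs over all stages
    # Stage 1: initial design over the whole unit hypercube.
    for _ in range(n_stage):
        theta = tuple(rng.random() for _ in range(d))
        runs.append((theta, cv_score(theta)))
    width = 0.5  # subspace half-width halves each zoom ("level doubling")
    while len(runs) + n_stage <= t_max:
        # Top-J centers among all evaluated runs (GP surrogate omitted).
        centers = [t for t, _ in sorted(runs, key=lambda r: -r[1])[:j_shots]]
        for c in centers:
            for _ in range(n_stage // j_shots):
                theta = tuple(min(1.0, max(0.0, ck + rng.uniform(-width, width)))
                              for ck in c)
                runs.append((theta, cv_score(theta)))
        width /= 2
    return max(runs, key=lambda r: r[1])  # optimal (theta, CV score)
```

The multi-shooting (J > 1) keeps several promising regions alive at once, which is what lets the method escape local modes on surfaces like the octopus-shaped example later.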
Simulation Study
To check the effectiveness of hyperparameter optimization, we consider two
kinds of complex surfaces as ground truth: a cliff-shaped function and an
octopus-shaped function.
Competitor Methods
Five existing methods are compared:
Grid search: still most popular today due to its simplicity
Random search: Bergstra and Bengio (JMLR 2012)
GP-EI (Snoek et al., NIPS 2012), based on GitHub: spearmint
SMAC (Hutter et al., 2011), based on GitHub: SMAC3
TPE (Bergstra et al., NIPS 2011), based on GitHub: hyperopt
Comparative Results
Figure: comparative results for (a) the cliff-shaped function and (b) the octopus-shaped function.
Sampling Points for Cliff-shaped Function
Figure: An example of evaluation trajectories on the cliff-shaped function.
Panels: (c) SeqUDHO, (d) GP-EI, (e) SMAC, (f) TPE, (g) Rand, (h) Grid.
Sampling Points for Octopus-shaped Function
Figure: An example of evaluation trajectories on the octopus-shaped function.
Panels: (a) SeqUDHO, (b) GP-EI, (c) SMAC, (d) TPE, (e) Rand, (f) Grid.
Testing Algorithm: XGBoost
The XGBoost (extreme gradient boosting) algorithm with 10 hyperparameters:
1 binary (choice of base model), 2 integer (maximum tree depth, number of
estimators) and 7 continuous (learning rate, min sample weights, min loss
reduction, ratio of samples in trees, ratio of variables in trees,
L2 regularization and L1 regularization).
Parameter tuning results for XGBoost under 5-fold CV accuracy (%):

Dataset   Rand    TPE     GP-EI   SMAC    SeqUDHO
Breast    75.77   76.18   76.22   76.22   76.18
ConVot    63.17   63.38   63.22   63.01   63.54
Credit    88.06   88.28   88.55   88.50   88.65
IonS      93.53   93.96   94.02   94.08   94.22
MamG      82.97   83.02   83.14   82.90   82.90
MBP       89.43   90.28   89.62   89.62   90.48
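Step 1 of the meta-algorithm maps such a mixed space to the unit hypercube; in the reverse direction, a point θ ∈ [0,1]^10 can be decoded into the ten XGBoost parameters above. The ranges below are illustrative assumptions, not the grids used in the talk:

```python
def decode_xgb_theta(theta):
    """Decode theta in [0,1]^10 into the 10 XGBoost hyperparameters
    listed on the slide; all ranges are illustrative assumptions."""
    t = list(theta)
    return {
        "booster": "gbtree" if t[0] < 0.5 else "gblinear",  # binary: base model
        "max_depth": 1 + int(t[1] * 9),                     # integer, 1..10
        "n_estimators": 50 + int(t[2] * 450),               # integer, 50..500
        "learning_rate": 10 ** (-3 + 3 * t[3]),             # log-uniform, 1e-3..1
        "min_child_weight": 1 + 9 * t[4],                   # min sample weights
        "gamma": 5 * t[5],                                  # min loss reduction
        "subsample": 0.5 + 0.5 * t[6],                      # ratio of samples
        "colsample_bytree": 0.5 + 0.5 * t[7],               # ratio of variables
        "reg_lambda": 10 ** (-3 + 5 * t[8]),                # L2 regularization
        "reg_alpha": 10 ** (-3 + 5 * t[9]),                 # L1 regularization
    }
```

Putting the learning rate and regularization strengths on a log scale before mapping to [0,1] is the standard way to make a uniform design cover several orders of magnitude evenly.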
Testing Algorithm: CNN
CNN (convolutional neural network) with three layers. Each layer is tuned
by its number of filters and kernel size. Global parameters include the
choice of optimizer, batch size, learning rate and L2 penalty.
MNIST data split: 8000 samples for training, 2000 samples for validation
and 50000 samples for testing.
Here, our AutoML target is to maximize the validation accuracy.
Testing Algorithm: CNN
Hyperparameter settings and optimization results:
The best CNN model selected by SeqUDHO is then tested on the 50K test
samples, achieving a testing accuracy of 98.05%.
AutoML Demonstration
Finally, we demonstrate how to use SeqUDHO for AutoML in practice.
Consider the mixture.example (R:ElemStatLearn) and seven benchmark
datasets from UCI ML repository, all with binary responses.
Consider three candidate ML algorithms (SVM, Random Forest,
XGBoost), each having different hyperparameter settings.
Example of AutoML output by SeqUDHO:
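The workflow behind such output is just a loop over candidate learners (a structural sketch with our own names; `tune` can be any hyperparameter optimizer with the signature shown, e.g. a SeqUD-style routine):

```python
def automl(candidates, tune, budget=50):
    """Tune each candidate ML algorithm on its own hyperparameter space
    and return the winner by tuned CV score.  `candidates` maps an
    algorithm name to (cv_score_fn, n_hyperparameters); `tune(cv_score,
    dim, budget)` must return (best_theta, best_score)."""
    results = {}
    for name, (cv_score, dim) in candidates.items():
        results[name] = tune(cv_score, dim, budget)
    best = max(results, key=lambda name: results[name][1])
    return best, results
```

Because each candidate has its own search-space dimension and decoding, the same tuner is reused unchanged across SVM, Random Forest and XGBoost.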
Future Work
1 To run simulation studies for high-dimensional blackbox optimization and
analyze the strengths and weaknesses of SeqUDHO versus Bayesian methods;
2 To improve the Gaussian process meta-modeling (with nugget effect)
through sequential approximation for non-stationary surfaces;
3 To investigate DNN architecture search with SeqUD, and compare with
genetic programming and reinforcement learning;
4 To investigate automated procedures for feature engineering, including
variable selection and transformation;
5 To develop an AutoML R/Python package based on the SeqUDHO meta-algorithm.
References
1. Bergstra, J., Bardenet, R., Bengio, Y. and Kegl, B. (2011). Algorithms for hyper-parameter
optimization. In NIPS, 2546–2554.
2. Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization.
Journal of Machine Learning Research, 13, 281–305.
3. Fang, K.T. and Wang, Y. (1990). A sequential number-theoretic method for optimization and
its applications in statistics. In Lecture Notes in Contemporary Mathematics, Science Press.
4. Fang, K.T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. CRC Press.
5. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F. (2015).
Efficient and robust automated machine learning. In NIPS, 2962–2970.
6. Goldberg, D.E. and Holland, J.H. (1988). Genetic algorithms and machine learning.
Machine Learning, 3(2), 95–99.
7. Huang, C.M., Lee, Y.J., Lin, D.K. and Huang, S.Y. (2007). Model selection for support
vector machines via uniform design. CSDA, 52(1), 335–346.
8. Hutter, F., Hoos, H.H. and Leyton-Brown, K. (2011). Sequential model-based optimization
for general algorithm configuration. In International Conference on Learning and Intelligent
Optimization, 507–523. Springer, Berlin, Heidelberg.
References
9. Jin, H., Song, Q., and Hu, X. (2018). Efficient neural architecture search with network
morphism. arXiv preprint arXiv:1806.10282.
10. Jones, D.R., Schonlau, M. and Welch, W.J. (1998). Efficient global optimization of
expensive black-box functions. Journal of Global Optimization, 13(4), 455–492.
11. Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F. and Leyton-Brown, K. (2017).
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA.
Journal of Machine Learning Research, 18(1), 826–830.
12. Probst, P., Bischl, B. and Boulesteix, A.L. (2018). Tunability: Importance of
hyperparameters of machine learning algorithms. arXiv:1802.09596.
13. Snoek, J., Larochelle, H. and Adams, R.P. (2012). Practical Bayesian optimization of
machine learning algorithms. In NIPS, 2951–2959.
14. Zhang, A., Li, H., Quan, S. and Yang, Z. (2018). UniDOE: uniform design of experiments.
R package version 1.0.2. https://CRAN.R-project.org/package=UniDOE
15. Zoph, B. and Le, Q.V. (2016). Neural architecture search with reinforcement learning.
arXiv:1611.01578.
Thank You!
Q&A, or email ajzhang@hku.hk.