Initial-Population Bias in the Univariate Estimation of Distribution Algorithm
1. Initial-Population Bias in the Univariate
Estimation of Distribution Algorithm
Martin Pelikan and Kumara Sastry
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
pelikan@cs.umsl.edu
Download MEDAL Report No. 2009001
http://medal.cs.umsl.edu/files/2009001.pdf
Martin Pelikan and Kumara Sastry Initial-Population Bias in UMDA
2. Motivation
Importance of bias
Efficiency enhancements of EDAs may introduce bias.
Examples
Local search.
Injection of prior full or partial solutions.
Bias based on prior knowledge about the problem.
Bias may have positive or negative effects.
It is important to understand these effects.
This study
Study the effects of biasing the initial population.
Consider UMDA on onemax and noisy onemax.
Theory and experiment.
3. Outline
1. UMDA.
2. Basic model for bias.
3. Population size.
4. Number of generations.
5. Compare to hill climber.
6. Conclusions.
7. Future work.
4. Probability Vector as a Model
Probability vector, p
Store probability of 1 in each position.
p = (p1 , p2 , . . . , pn ).
pi is probability of 1 in position i.
Replace crossover/mutation by model building and sampling
Learn the probability vector from selected points.
Sample new points according to the learned vector.
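The learn-and-sample step can be illustrated with a minimal sketch. The function names are hypothetical, and bit strings are assumed to be stored as lists of 0/1 integers. The two example strings 11001 and 10101 yield the vector 1.0 0.5 0.5 0.0 1.0.

```python
import random

def learn_probability_vector(selected):
    """Return the frequency of 1s at each position of equal-length bit lists."""
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample_population(p, size, rng=random):
    """Draw `size` bit lists; bit i is set to 1 with probability p[i]."""
    return [[1 if rng.random() < p_i else 0 for p_i in p] for _ in range(size)]

# Illustrative input: the strings 11001 and 10101.
selected = [[1, 1, 0, 0, 1], [1, 0, 1, 0, 1]]
p = learn_probability_vector(selected)
new_points = sample_population(p, 4, random.Random(0))
```

Positions where the learned probability is 0.0 or 1.0 are fixed in every sampled string, which is why losing a bit value from the population is permanent under this model.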
5. Univariate Marginal Distribution Algorithm (UMDA)
UMDA (Mühlenbein & Paaß, 1996).
1. Generate random population of binary strings.
2. Selection (e.g. tournament selection).
3. Learn probability vector for selected solutions.
4. Sample probability vector to generate new solutions.
5. Incorporate new solutions into original population.
Example: probability vector (Mühlenbein & Paaß, 1996; Baluja, 1994)
Current population   Selected population   New population
11001                11001                 10101
10101                10101                 10001
01011                01011                 11101
11000                11000                 11001
Probability vector: 1.0 0.5 0.5 0.0 1.0
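The five steps listed for UMDA can be put together in a short sketch. It assumes onemax as the fitness function, binary tournament selection without replacement, and full replacement of the old population by the sampled one. All function names and parameter values are illustrative and do not come from the slides.

```python
import random

def onemax(x):
    """Fitness: number of 1s in the bit list."""
    return sum(x)

def binary_tournament(pop, fit, rng):
    """Two shuffled passes; each pass pairs neighbours and keeps the fitter one,
    so len(pop) winners are returned for an even-sized population."""
    winners = []
    for _ in range(2):
        order = list(range(len(pop)))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            winners.append(pop[a] if fit[a] >= fit[b] else pop[b])
    return winners

def umda(n, pop_size, max_gens, rng):
    """Return (generation in which the optimum first appears or None, final population)."""
    assert pop_size % 2 == 0, "this sketch pairs individuals, so pop_size must be even"
    # Step 1: random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for gen in range(max_gens):
        fit = [onemax(x) for x in pop]
        if max(fit) == n:
            return gen, pop
        # Step 2: selection.
        selected = binary_tournament(pop, fit, rng)
        # Step 3: learn the probability vector.
        p = [sum(s[i] for s in selected) / len(selected) for i in range(n)]
        # Steps 4 and 5: sample new solutions and replace the whole population.
        pop = [[1 if rng.random() < p_i else 0 for p_i in p] for _ in range(pop_size)]
    return None, pop

gen_found, final_pop = umda(n=20, pop_size=200, max_gens=100, rng=random.Random(1))
```

The population size of 200 for a 20-bit problem is deliberately generous so that the sketch converges reliably; it is not a tuned value.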
6. Assumptions
Algorithm
UMDA with binary tournament selection and full replacement.
Results should generalize to other selection methods with
fixed selection intensity.
Fitness
Deterministic onemax:
$$\mathrm{onemax}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$$
Noisy onemax:
$$\mathrm{onemax}_{noisy}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i + N(0, \sigma^2)$$
Results should generalize to other separable problems of
bounded order (if good model is used).
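The two fitness functions can be written down directly. This is a minimal sketch in which the noise term N(0, σ²) is drawn afresh on every evaluation; the function names and the `sigma` parameter (the standard deviation of the noise) are illustrative.

```python
import random

def onemax(x):
    """Deterministic onemax: the number of 1s in the bit list."""
    return sum(x)

def noisy_onemax(x, sigma, rng=random):
    """Onemax plus zero-mean Gaussian noise with standard deviation sigma."""
    return sum(x) + rng.gauss(0.0, sigma)

x = [1, 0, 1, 1]
rng = random.Random(0)
samples = [noisy_onemax(x, 1.0, rng) for _ in range(10000)]
mean_noisy = sum(samples) / len(samples)
```

Because the noise has zero mean, repeated evaluations of the same string average out to its deterministic onemax value, but a single comparison between two strings can be decided wrongly.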
7. Basic Model for Bias
Basic model
Introduce bias in the initial population.
Increase or decrease the initial proportion pinit of optimal bits.
Use the same bias for all string positions.
Examples
pinit = 0.2 pinit = 0.5 pinit = 0.8
00001 11110 11110
00001 01010 01011
01000 11101 01111
00010 00010 11111
10000 11011 10111
What to expect?
pinit grows ⇒ UMDA performance improves.
pinit decreases ⇒ UMDA performance suffers.
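A biased initial population of this kind can be generated by setting each bit to its optimal value with probability pinit, independently and identically for all positions. The sketch below assumes onemax, where the optimal value of every bit is 1; the function name is hypothetical.

```python
import random

def biased_population(pop_size, n, p_init, rng=random):
    """Each bit is 1 (the optimal value for onemax) with probability p_init."""
    return [[1 if rng.random() < p_init else 0 for _ in range(n)]
            for _ in range(pop_size)]

def proportion_of_ones(pop):
    """Fraction of 1s over all bits in the population."""
    return sum(sum(x) for x in pop) / (len(pop) * len(pop[0]))

rng = random.Random(0)
low_bias = biased_population(200, 500, 0.2, rng)
high_bias = biased_population(200, 500, 0.8, rng)
```

Setting p_init = 0.5 recovers the usual uniformly random initialization, which serves as the base case.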
8. Theoretical Model for Deterministic Onemax
Population size
Gambler’s ruin population-sizing model (Harik et al., 1997).
Population sizing bound
$$N = -\frac{1}{4 p_{init}} \ln\alpha \sqrt{\pi n}$$
Number of generations
Convergence model (Thierens & Goldberg, 1994).
Number of generations bound
$$G = \left(\frac{\pi}{2} - \arcsin(2 p_{init} - 1)\right) \sqrt{\pi n}$$
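The two bounds are easy to evaluate numerically. In the sketch below the function names are illustrative, and `alpha` is assumed to be the tolerated failure probability (between 0 and 1), as is usual in gambler's ruin models; the slides do not define it.

```python
import math

def population_size_bound(n, p_init, alpha):
    """N = -(1 / (4 p_init)) * ln(alpha) * sqrt(pi * n).
    alpha: assumed tolerated failure probability, 0 < alpha < 1."""
    return -math.log(alpha) * math.sqrt(math.pi * n) / (4.0 * p_init)

def generations_bound(n, p_init):
    """G = (pi / 2 - arcsin(2 p_init - 1)) * sqrt(pi * n)."""
    return (math.pi / 2.0 - math.asin(2.0 * p_init - 1.0)) * math.sqrt(math.pi * n)

n = 500
for p_init in (0.2, 0.5, 0.8):
    print(p_init,
          round(population_size_bound(n, p_init, alpha=0.01), 1),
          round(generations_bound(n, p_init), 1))
```

Halving p_init doubles the population-size bound, while the generations bound shrinks to zero as p_init approaches 1.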
9. Deterministic Onemax: Theoretical Speedup
Speedup factors
By what factor does each quantity change compared to pinit = 0.5?
Values below 1 mean the biased run is cheaper than the base case.
Population size:
$$\eta_N = \frac{1}{2 p_{init}}$$
Number of generations:
$$\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$$
Number of evaluations:
$$\eta_E = \frac{1}{2 p_{init}} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$$
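The three factors can be tabulated with a few lines of code. This is a sketch with illustrative function names; each factor is the ratio of the bound at a given pinit to the bound at pinit = 0.5, and the evaluations factor is taken as the product of the other two.

```python
import math

def eta_population(p_init):
    """Population-size factor: 1 / (2 p_init)."""
    return 1.0 / (2.0 * p_init)

def eta_generations(p_init):
    """Generations factor: 1 - 2 arcsin(2 p_init - 1) / pi."""
    return 1.0 - 2.0 * math.asin(2.0 * p_init - 1.0) / math.pi

def eta_evaluations(p_init):
    """Evaluations factor: product of the population and generations factors."""
    return eta_population(p_init) * eta_generations(p_init)

for p_init in (0.2, 0.5, 0.8):
    print(p_init,
          round(eta_population(p_init), 3),
          round(eta_generations(p_init), 3),
          round(eta_evaluations(p_init), 3))
```

All three factors equal 1 at p_init = 0.5, drop below 1 for larger p_init, and exceed 1 for smaller p_init.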
10. Experimental Setup
Basic setup
Binary tournament selection without replacement.
Full replacement (no elitism or niching).
Problems of n = 100 to n = 500 tested (focus on n = 500).
Population size set by bisection so that 10 out of 10 independent runs succeed,
where a successful run reaches a 95% optimal solution.
Bisection repeated 10 times for each setting.
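A bisection search for the population size could look like the sketch below. The slides do not give the details, so the doubling phase, the 10% stopping tolerance, and the `all_runs_succeed` callback (which would run the 10 independent UMDA runs and report whether all succeeded) are assumptions made for illustration. The search also assumes that success is monotone in the population size.

```python
def bisect_population_size(all_runs_succeed, start=10, tolerance=0.1):
    """Find a small population size N for which all_runs_succeed(N) is True.

    all_runs_succeed: hypothetical callback, N -> bool.
    tolerance: relative width of the final bracket (an assumed stopping rule).
    """
    # Phase 1: double the population size until the runs succeed.
    high = start
    while not all_runs_succeed(high):
        high *= 2
    low = high // 2  # largest size that failed, or an untested size if start succeeded

    # Phase 2: shrink the bracket [low, high] around the success threshold.
    while high - low > 1 and (high - low) / high > tolerance:
        mid = (low + high) // 2
        if all_runs_succeed(mid):
            high = mid
        else:
            low = mid
    return high

# Stand-in predicate for demonstration only: pretend sizes of 137 or more succeed.
result = bisect_population_size(lambda size: size >= 137)
```

Because real runs are stochastic, repeating the whole search several times and averaging the results reduces the variance of the estimate.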
Observed statistics
Population size.
Number of generations.
Number of evaluations.
11. Deterministic Onemax: Speedup and Slowdown
[Figure 2: The factor by which the population size, the number of generations, and the number of evaluations should change with varying pinit compared to the base case with pinit = 0.5. The three factors are based on the population-sizing and time-to-convergence models. The results are shown as speedup and slowdown curves. Panels: Speedup, Slowdown. Curves: number of evaluations, population size, number of generations, base case. x-axis: pinit.]
Empirical results confirm intuition.
Positive bias improves performance.
Negative bias worsens performance.
12. Deterministic Onemax: Experiments vs. Theory
[Figure 3: Effects of initial-population bias on UMDA performance without external noise. Panels: (a) Population size, (b) Number of generations, and number of evaluations. Curves: Experiment, Theory. x-axis: pinit.]
Empirical results match theory.
Theory makes conservative estimates.
Empirical results confirm intuition.
13. Theoretical Model for Noisy Onemax: Population Size
Population size
Gambler’s ruin population-sizing model (Harik et al., 1997).
Variance of external noise given in terms of fitness variance:
$$\sigma^2_{noise} = \beta \times \sigma^2_{fitness}$$
Population sizing bound becomes
$$N = -\frac{1}{4 p_{init}} \ln\alpha \sqrt{\pi n (1 + \beta)}$$
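The noisy bound differs from the deterministic one only by the factor sqrt(1 + β), which the following sketch makes explicit. The function name is illustrative, and `alpha` is again assumed to be the tolerated failure probability, which the slides do not define.

```python
import math

def noisy_population_size_bound(n, p_init, alpha, beta):
    """N = -(1 / (4 p_init)) * ln(alpha) * sqrt(pi * n * (1 + beta)).
    beta: ratio of external-noise variance to fitness variance.
    alpha: assumed tolerated failure probability, 0 < alpha < 1."""
    return -math.log(alpha) * math.sqrt(math.pi * n * (1.0 + beta)) / (4.0 * p_init)

deterministic = noisy_population_size_bound(500, 0.5, 0.01, beta=0.0)
noisy = noisy_population_size_bound(500, 0.5, 0.01, beta=1.0)
growth = noisy / deterministic  # equals sqrt(1 + beta)
```

Since the noise factor does not involve p_init, it cancels when the bound is divided by its value at p_init = 0.5.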
14. Theoretical Model for Noisy Onemax: Generations
Number of generations
Convergence model (Miller & Goldberg, 1994; Sastry, 2001;
Goldberg, 2002).
Difficult to solve analytically for arbitrary pinit .
Effects of pinit modeled by an empirical fit.
Number of generations bound
$$G = \frac{\pi}{2} \sqrt{\pi n} \sqrt{1 + \beta} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$$
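A short numerical sketch of the generations bound shows how noise and bias enter as separate factors. The function name is illustrative.

```python
import math

def noisy_generations_bound(n, p_init, beta):
    """G = (pi / 2) * sqrt(pi * n) * sqrt(1 + beta) * (1 - 2 arcsin(2 p_init - 1) / pi)."""
    bias_factor = 1.0 - 2.0 * math.asin(2.0 * p_init - 1.0) / math.pi
    return (math.pi / 2.0) * math.sqrt(math.pi * n) * math.sqrt(1.0 + beta) * bias_factor

base = noisy_generations_bound(500, 0.5, beta=1.0)
biased = noisy_generations_bound(500, 0.8, beta=1.0)
ratio = biased / base  # the bias factor alone; beta cancels
```

With beta = 0 the expression reduces to the deterministic bound, because (π/2)(1 − 2 arcsin(2pinit − 1)/π) equals π/2 − arcsin(2pinit − 1).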
15. Noisy Onemax: Theoretical Speedup
Speedup factors same as for deterministic case!
Population size:
$$\eta_N = \frac{1}{2 p_{init}}$$
Number of generations:
$$\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$$
Number of evaluations:
$$\eta_E = \frac{1}{2 p_{init}} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$$
16. Noisy Onemax: Experiments vs. Theory for β = 1
[Figure 5: Effects of initial-population bias on UMDA performance; external noise variance $\sigma^2_N = \sigma^2_F = 0.25n$. Panels: (a) Population size, (b) Number of generations, and number of evaluations. Curves: Experiment, Theory. x-axis: pinit.]
Empirical results match theory.
Population sizing remains a conservative estimate.
Note: β = 1 is a lot of noise (noise variance equal to overall fitness variance).
17. Compare to Hill Climber on Deterministic Case
[Figure: 500-bit deterministic onemax and its comparison to UMDA. Panel (b): Comparison of UMDA and HC. Curves: UMDA, Hill Climbing. y-axis: number of evaluations (x 10^4); x-axis: pinit.]
HC performs well regardless of bias.
This agrees with theory (Mühlenbein, 1992).
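For orientation, a simple stochastic hill climber is sketched below. The slides do not specify which hill climber was used, so this variant is an assumption: it flips each bit with probability 1/n and keeps the offspring when it is not worse. All names and parameter values are illustrative.

```python
import random

def hill_climb(n, p_init, max_evals, rng):
    """Assumed HC variant on deterministic onemax.
    Returns (best fitness reached, number of fitness evaluations used)."""
    # Biased start: each bit is 1 with probability p_init.
    x = [1 if rng.random() < p_init else 0 for _ in range(n)]
    fx = sum(x)
    evals = 1
    while fx < n and evals < max_evals:
        # Bit-flip mutation with rate 1/n.
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fy = sum(y)
        evals += 1
        if fy >= fx:  # accept the offspring if it is not worse
            x, fx = y, fy
    return fx, evals

best, evals_used = hill_climb(n=50, p_init=0.1, max_evals=200000, rng=random.Random(3))
```

A climber of this kind never discards a correct bit on deterministic onemax unless it gains at least as many in return, so a poor starting point costs it extra steps but cannot cause it to fail.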
18. Compare to Hill Climber on Noisy Case
Performance of HC becomes poor with noise!
β     n    pinit   HC evaluations   UMDA evaluations
0.5   10   0.1     4,449            1,210
0.5   25   0.1     2,125,373        1,886
0.5   10   0.5     11,096           66
0.5   25   0.5     8,248,140        169
1.0   5    0.1     215              574
1.0   15   0.1     5,691,725        1,210
1.0   5    0.5     64               20
1.0   15   0.5     15,738,168       64
19. Conclusions
We have good theoretical understanding of the effects of one
type of initial-population bias on performance of UMDA on
deterministic and noisy onemax.
Effects of bias match intuition
Good bias improves performance.
Bad bias worsens performance.
Effects of bias are independent of noise.
Experimental results match theory.
20. Future Work
Study specific efficiency enhancement techniques and the bias
they introduce, and apply the theory developed here to
estimate the final effects.
Extend this work to other types of bias.
Extend this work to other evolutionary algorithms, especially
the standard genetic algorithms with two-parent
recombination and EDAs with multivariate models (e.g. BOA
and ecGA).
Eliminate the empirical fit from the model for the noisy
onemax.
21. Acknowledgments
NSF; NSF CAREER grant ECS-0547013.
U.S. Air Force, AFOSR; FA9550-06-1-0096.
University of Missouri; High Performance Computing
Collaboratory sponsored by Information Technology Services;
Research Award; Research Board.