Initial-Population Bias in the Univariate Estimation of Distribution Algorithm

Martin Pelikan and Kumara Sastry

Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
pelikan@cs.umsl.edu

Download MEDAL Report No. 2009001
http://medal.cs.umsl.edu/files/2009001.pdf

Motivation

Importance of bias
Efficiency enhancements of EDAs may introduce bias.
Examples
Local search.
Injection of prior full or partial solutions.
Bias based on prior knowledge about the problem.
Bias may have positive or negative effects.
It is important to understand these effects.

This study
Study the effects of biasing the initial population.
Consider UMDA on onemax and noisy onemax.
Theory and experiment.


Outline

1. UMDA.

2. Basic model for bias.

3. Population size.

4. Number of generations.

5. Compare to hill climber.

6. Conclusions.

7. Future work.


Probability Vector as a Model

Probability vector, p
Store probability of 1 in each position.
$p = (p_1, p_2, \ldots, p_n)$.
$p_i$ is the probability of a 1 in position $i$.

Replace crossover/mutation by model building and sampling
Learn the probability vector from selected points.
Sample new points according to the learned vector.
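
The two operations above can be sketched in a few lines of Python. This is only an illustration: the function names `learn_probability_vector` and `sample_population` and the two selected strings are assumptions, not part of the slides.

```python
import random

def learn_probability_vector(selected):
    """Return the frequency of 1s in each position of the selected strings."""
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample_population(p, size, rng):
    """Draw `size` binary strings; bit i is set to 1 with probability p[i]."""
    return [[1 if rng.random() < p_i else 0 for p_i in p] for _ in range(size)]

# Two hypothetical selected strings, 11001 and 10101.
selected = [[1, 1, 0, 0, 1], [1, 0, 1, 0, 1]]
p = learn_probability_vector(selected)  # [1.0, 0.5, 0.5, 0.0, 1.0]
new_points = sample_population(p, size=4, rng=random.Random(0))
```

Positions whose probability is exactly 0 or 1 are fixed in every sampled string, so only the second and third bits can vary here.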


Univariate Marginal Distribution Algorithm (UMDA)
UMDA (Mühlenbein & Paaß, 1996).
1. Generate random population of binary strings.
2. Selection (e.g. tournament selection).
3. Learn probability vector for selected solutions.
4. Sample probability vector to generate new solutions.
5. Incorporate new solutions into original population.

Example: Probability Vector
(Mühlenbein & Paaß, 1996), (Baluja, 1994)

Current population   Selected population   New population
11001                11001                 10101
10101                10101                 10001
01011                01011                 11101
11000                11000                 11001

Probability vector: 1.0 0.5 0.5 0.0 1.0

Source: Martin Pelikan, Probabilistic Model-Building GAs
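
The five UMDA steps can be put together as a short loop. The sketch below is a minimal reading of the slides, not the authors' implementation: it assumes binary tournament selection without replacement, full replacement of the population, an even `pop_size`, and a stopping rule (every probability equal to 0 or 1) chosen here for illustration. The function name `umda` and its parameters are hypothetical.

```python
import random

def umda(fitness, n, pop_size, max_gens, p_init=0.5, seed=0):
    """UMDA sketch; pop_size is assumed to be even."""
    rng = random.Random(seed)
    # Step 1: random initial population (p_init = 0.5 is the unbiased case).
    pop = [[int(rng.random() < p_init) for _ in range(n)] for _ in range(pop_size)]
    p = [p_init] * n
    for gen in range(1, max_gens + 1):
        scores = [fitness(x) for x in pop]
        # Step 2: binary tournament without replacement; two passes over a
        # shuffled population yield pop_size winners in total.
        selected = []
        for _ in range(2):
            order = list(range(pop_size))
            rng.shuffle(order)
            for a, b in zip(order[::2], order[1::2]):
                selected.append(pop[a] if scores[a] >= scores[b] else pop[b])
        # Step 3: learn the probability vector from the selected strings.
        p = [sum(x[i] for x in selected) / len(selected) for i in range(n)]
        if all(p_i in (0.0, 1.0) for p_i in p):
            return p, gen  # every position has converged
        # Steps 4-5: sample new strings and fully replace the population.
        pop = [[int(rng.random() < p_i) for p_i in p] for _ in range(pop_size)]
    return p, max_gens

# Deterministic onemax is simply the number of 1s, so `sum` serves as fitness.
p, generations = umda(sum, n=20, pop_size=100, max_gens=200)
```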

Assumptions
Algorithm
UMDA with binary tournament selection and full replacement.
Results should generalize to other selection methods with
fixed selection intensity.

Fitness
Deterministic onemax:
$\mathrm{onemax}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$

Noisy onemax:
$\mathrm{onemax}_{noisy}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i + N(0, \sigma^2)$

Results should generalize to other separable problems of
bounded order (if good model is used).
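
Both fitness functions are one-liners in Python. In this sketch the names `onemax` and `noisy_onemax` and the `rng` parameter are illustrative choices; note that `sigma` is the standard deviation of the noise, so its square is the variance $\sigma^2$ in the formula.

```python
import random

def onemax(x):
    """Deterministic onemax: the number of 1s in the binary string x."""
    return sum(x)

def noisy_onemax(x, sigma, rng=random):
    """Onemax plus zero-mean Gaussian noise with standard deviation sigma."""
    return sum(x) + rng.gauss(0.0, sigma)

value = onemax([1, 0, 1, 1, 0])  # 3
```

Every call to `noisy_onemax` draws fresh noise, so evaluating the same string twice generally gives two different values.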


Basic Model for Bias

Basic model
Introduce bias in the initial population.
Increase or decrease the initial proportion pinit of optimal bits.
Use the same bias for all string positions.
Examples
pinit = 0.2 pinit = 0.5 pinit = 0.8
00001 11110 11110
00001 01010 01011
01000 11101 01111
00010 00010 11111
10000 11011 10111
What to expect?
pinit grows ⇒ UMDA performance improves.
pinit decreases ⇒ UMDA performance suffers.
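
A biased initial population of this kind can be generated by setting each bit to the optimal value with probability pinit. The sketch below assumes onemax, where the optimal bit is always 1; the function name `biased_population` is hypothetical.

```python
import random

def biased_population(pop_size, n, p_init, rng=None):
    """Create pop_size strings of n bits; each bit is 1 with probability p_init."""
    rng = rng or random.Random()
    return [[1 if rng.random() < p_init else 0 for _ in range(n)]
            for _ in range(pop_size)]

pop = biased_population(pop_size=200, n=50, p_init=0.8, rng=random.Random(1))
fraction_of_ones = sum(map(sum, pop)) / (200 * 50)  # close to 0.8
```

The same bias is applied to every string position, as the basic model requires.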


Theoretical Model for Deterministic Onemax

Population size
Gambler’s ruin population-sizing model (Harik et al., 1997).
Population sizing bound
$N = -\frac{1}{4 p_{init}} \ln\alpha \, \sqrt{\pi n}$

Number of generations
Convergence model (Thierens & Goldberg, 1994).
Number of generations bound
$G = \left( \frac{\pi}{2} - \arcsin(2 p_{init} - 1) \right) \sqrt{\pi n}$
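
To get a feel for the two bounds, they can be evaluated numerically. The sketch below reads them as $N = -\ln\alpha \, \sqrt{\pi n} / (4 p_{init})$ and $G = (\pi/2 - \arcsin(2 p_{init} - 1)) \sqrt{\pi n}$. The slides do not define $\alpha$; treating it as the tolerated failure probability of the gambler's ruin model, and the value 0.01 used below, are assumptions.

```python
import math

def population_size_bound(n, p_init, alpha):
    """Gambler's ruin population-sizing bound for deterministic onemax."""
    return -math.log(alpha) * math.sqrt(math.pi * n) / (4.0 * p_init)

def generations_bound(n, p_init):
    """Convergence-time bound for deterministic onemax."""
    return (math.pi / 2.0 - math.asin(2.0 * p_init - 1.0)) * math.sqrt(math.pi * n)

for p_init in (0.1, 0.5, 0.9):
    N = population_size_bound(500, p_init, alpha=0.01)
    G = generations_bound(500, p_init)
    print(f"p_init={p_init}: N ~ {N:.0f}, G ~ {G:.1f}")
```

Halving pinit doubles the population-size bound, while the generations bound shrinks to zero as pinit approaches 1.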


Deterministic Onemax: Theoretical Speedup

Speedup factors
By what factor does each quantity change compared to
pinit = 0.5? Values below 1 mean a speedup, values above 1 a slowdown.
Population size:
$\eta_N = \frac{1}{2 p_{init}}$
Number of generations:
$\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$
Number of evaluations:
$\eta_E = \frac{1}{2 p_{init}} \left( 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi} \right)$
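
The three factors are easy to tabulate. The sketch below reads each factor as the ratio of a quantity at the given pinit to its value at pinit = 0.5, so the number of evaluations changes by the product of the other two factors. The function names are illustrative.

```python
import math

def eta_n(p_init):
    """Factor by which the population size changes relative to p_init = 0.5."""
    return 1.0 / (2.0 * p_init)

def eta_g(p_init):
    """Factor by which the number of generations changes."""
    return 1.0 - 2.0 * math.asin(2.0 * p_init - 1.0) / math.pi

def eta_e(p_init):
    """Factor by which the number of evaluations changes (eta_n * eta_g)."""
    return eta_n(p_init) * eta_g(p_init)

for p_init in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p_init, round(eta_n(p_init), 3), round(eta_g(p_init), 3), round(eta_e(p_init), 3))
```

For example, at pinit = 0.75 the factors are 2/3 for the population size and 2/3 for the number of generations, so the number of evaluations drops to 4/9 of the base case.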


Experimental Setup

Basic setup
Binary tournament selection without replacement.
Full replacement (no elitism or niching).
Problems of n = 100 to n = 500 tested (focus on n = 500).
Population size set using bisection to ensure that 10 out of 10
independent runs succeed with a 95% optimal solution.
Bisection repeated 10 times for each setting.

Observed statistics
Population size.
Number of generations.
Number of evaluations.
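
The bisection step of the setup can be sketched as follows. The slides do not give the details, so the doubling phase, the 10% stopping tolerance, and the callback `all_runs_succeed` (which would run the 10 independent UMDA runs for a given population size and report whether all of them succeeded) are assumptions. The sketch also assumes that success is monotone in the population size.

```python
def bisect_population_size(all_runs_succeed, start=10, tolerance=0.1):
    """Return a near-minimal population size N with all_runs_succeed(N) true."""
    # Phase 1: double the population size until all runs succeed.
    high = start
    while not all_runs_succeed(high):
        high *= 2
    low = high // 2
    # Phase 2: shrink the interval between `low` and the succeeding `high`.
    while high - low > 1 and (high - low) / high > tolerance:
        mid = (low + high) // 2
        if all_runs_succeed(mid):
            high = mid
        else:
            low = mid
    return high

# Stand-in predicate for illustration: pretend sizes of 137 and above succeed.
estimate = bisect_population_size(lambda N: N >= 137)
```

Because UMDA runs are stochastic, repeating the whole bisection several times and averaging, as the setup describes, smooths out the run-to-run variation.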


Deterministic Onemax: Speedup and Slowdown

[Figure 2: two panels, Speedup and Slowdown, each plotting the number
of evaluations, the population size, the number of generations and the
base case against pinit (0 to 1). Regions are marked as faster or
slower than pinit = 0.5.]

Figure 2: The factor by which the population size, the number of
generations and the number of evaluations should change with varying
pinit compared to the base case with pinit = 0.5. The three factors
are based on the population-sizing and time-to-convergence models.
The results are shown as speedup and slowdown curves.

Empirical results confirm intuition.
Positive bias improves performance.
Negative bias worsens performance.


Deterministic Onemax: Experiments vs. Theory

[Figure 3: Effects of initial-population bias on UMDA performance
without external noise. Panels: (a) Population size, (b) Number of
generations, and number of evaluations; each compares Experiment with
Theory as a function of pinit (0 to 1).]

Empirical results match theory.
Theory makes conservative estimates.
Empirical results confirm intuition.

Theoretical Model for Noisy Onemax: Population Size

Population size
Gambler’s ruin population-sizing model (Harik et al., 1997).
Variance of external noise given in terms of fitness variance:
$\sigma^2_{noise} = \beta \times \sigma^2_{fitness}$

Population sizing bound becomes
$N = -\frac{1}{4 p_{init}} \ln\alpha \, \sqrt{\pi n (1 + \beta)}$


Theoretical Model for Noisy Onemax: Generations

Convergence model (Miller & Goldberg, 1994; Sastry, 2001;
Goldberg, 2002).
Difficult to solve analytically for arbitrary pinit .
Effects of pinit modeled by an empirical fit.
Number of generations bound
$G = \frac{\pi}{2} \sqrt{\pi n} \, \sqrt{1 + \beta} \left( 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi} \right)$
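
The noisy bounds can be evaluated the same way as the deterministic ones. The sketch below reads them as the deterministic bounds multiplied by $\sqrt{1 + \beta}$; the function names, and the interpretation of $\alpha$ as a tolerated failure probability, are assumptions.

```python
import math

def noisy_population_size_bound(n, p_init, alpha, beta):
    """Population-sizing bound with noise variance beta times fitness variance."""
    return -math.log(alpha) * math.sqrt(math.pi * n * (1.0 + beta)) / (4.0 * p_init)

def noisy_generations_bound(n, p_init, beta):
    """Generations bound for noisy onemax, including the empirical p_init fit."""
    bias_factor = 1.0 - 2.0 * math.asin(2.0 * p_init - 1.0) / math.pi
    return (math.pi / 2.0) * math.sqrt(math.pi * n) * math.sqrt(1.0 + beta) * bias_factor

# The noise-induced increase does not depend on p_init.
ratio_low = noisy_generations_bound(500, 0.2, 1.0) / noisy_generations_bound(500, 0.2, 0.0)
ratio_high = noisy_generations_bound(500, 0.8, 1.0) / noisy_generations_bound(500, 0.8, 0.0)
```

Since $\beta$ enters both bounds only through the factor $\sqrt{1 + \beta}$, it cancels when a bound at some pinit is divided by the same bound at pinit = 0.5.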


Noisy Onemax: Theoretical Speedup

Speedup factors same as for deterministic case!
Population size:
$\eta_N = \frac{1}{2 p_{init}}$
Number of generations:
$\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$
Number of evaluations:
$\eta_E = \frac{1}{2 p_{init}} \left( 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi} \right)$


Noisy Onemax: Experiments vs. Theory for β = 1

[Figure 5: effects of initial-population bias on UMDA performance for
$\sigma_N^2 = \sigma_F^2 = 0.25n$. Panels: (a) Population size,
(b) Number of generations; each compares Experiment with Theory as a
function of pinit (0 to 1).]

Empirical results match theory.
Population sizing remains a conservative estimate.
Note: β = 1 is a lot of noise (noise variance equal to overall
fitness variance).

Figure 8 visualizes the effects of external noise on the number of

Compare to Hill Climber on Deterministic Case
[Figure: performance of HC on 500-bit deterministic onemax and its
comparison to UMDA, as a function of pinit (0 to 1). Panel (a)
compares Experiment with Theory; panel (b), Comparison of UMDA and HC,
compares UMDA with Hill Climbing. Theory (Mühlenbein, 1992) is used to
provide an upper bound.]

Performance of HC is great regardless of bias.
This agrees with theory (Mühlenbein, 1992).

Compare to Hill Climber on Noisy Case

Performance of HC becomes poor with noise!

β     n    pinit   HC evaluations   UMDA evaluations
0.5   10   0.1              4,449              1,210
0.5   25   0.1          2,125,373              1,886
0.5   10   0.5             11,096                 66
0.5   25   0.5          8,248,140                169
1.0    5   0.1                215                574
1.0   15   0.1          5,691,725              1,210
1.0    5   0.5                 64                 20
1.0   15   0.5         15,738,168                 64
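
The slides do not say which hill climber was used. For readers who want to experiment, the sketch below assumes a simple (1+1)-style climber that flips each bit with probability 1/n and accepts the candidate if its fitness value is at least as good. The function name, the stopping rule (stop once the true all-ones optimum is held, which presumes onemax) and the evaluation budget are assumptions.

```python
import random

def hill_climber(fitness, n, p_init, max_evals, rng):
    """(1+1)-style hill climber sketch; returns (final string, evaluations used)."""
    current = [int(rng.random() < p_init) for _ in range(n)]
    current_fit = fitness(current)
    evals = 1
    # Stop at the all-ones optimum of onemax or when the budget runs out.
    while sum(current) < n and evals < max_evals:
        # Flip each bit independently with probability 1/n.
        candidate = [1 - b if rng.random() < 1.0 / n else b for b in current]
        candidate_fit = fitness(candidate)
        evals += 1
        if candidate_fit >= current_fit:
            current, current_fit = candidate, candidate_fit
    return current, evals

best, evals_used = hill_climber(sum, n=20, p_init=0.5, max_evals=100000,
                                rng=random.Random(0))
```

With a noisy fitness function, the accept step in this sketch compares single noisy evaluations and keeps the stored value of the current string, so a wrong comparison can mislead the search.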


Conclusions

We have good theoretical understanding of the effects of one
type of initial-population bias on performance of UMDA on
deterministic and noisy onemax.
Effects of bias match intuition
Good bias improves performance.
Bad bias worsens performance.
Effects of bias are independent of noise.
Experimental results match theory.


Future Work

Study specific efficiency enhancement techniques and the bias
they introduce, and apply the theory developed here to
estimate the final effects.
Extend this work to other types of bias.
Extend this work to other evolutionary algorithms, especially
the standard genetic algorithms with two-parent
recombination and EDAs with multivariate models (e.g. BOA
and ecGA).
Eliminate the empirical fit from the model for the noisy
onemax.


Acknowledgments

NSF; NSF CAREER grant ECS-0547013.
U.S. Air Force, AFOSR; FA9550-06-1-0096.
University of Missouri; High Performance Computing
Collaboratory sponsored by Information Technology Services;
Research Award; Research Board.

