On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study

Abdel Salam Sayyad
Katerina Goseva-Popstojanova
Tim Menzies
Hany Ammar
West Virginia University, USA
International Workshop on Replication in Software
Engineering Research (RESER)
Oct 9, 2013

Sound bites
Search-based Software Engineering
Is here… to stay.
A helper… Not an alternative to human SE

Randomness…
is an essential part of Search Algorithms
… hence the need for statistical examination (A lot to learn from Empirical SE)

Parameter Tuning
A real problem…
Default values (rules of thumb) do exist… and (sadly?) they are being followed

Default parameter values fail to optimize performance…
… As seen in the original study, and in this replication…
No Free Lunch Theorems for Optimization [Wolpert and Macready ’97]
the same parameter values don’t optimize all algorithms for all problems.

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Searching for what?
• Correct solutions…
– Conform to system relationships and constraints.

• Optimal solutions…
– Achieve user objectives/preferences…

• Complex problems have big Search spaces
– Exhaustive search not a practical idea.

Genetic Algorithm
• Start with a large population of candidate
solutions… (How large?)
• Evaluate the fitness of your solutions.
• Let your candidate solutions crossover –
exchange genes… (How often?)
• Mutate a small portion of your solutions.
(How small?)
• How do those choices affect performance?
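
The three "how much?" questions above correspond to three tunable parameters. A minimal sketch, assuming a toy OneMax fitness (count the 1 bits), binary tournament selection, and single-point crossover; none of these choices come from the study, and `run_ga` with its parameter names is hypothetical:

```python
import random

def run_ga(n_bits=30, pop_size=100, crossover_rate=0.8, mutation_rate=None,
           generations=50, seed=0):
    """Toy single-objective GA on OneMax (illustrative, not the study's setup)."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_bits  # rule of thumb: about one bit per string
    fitness = sum  # OneMax: number of 1 bits in the string
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < crossover_rate:      # how often to exchange genes
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(n_bits):
                    if rng.random() < mutation_rate:  # how small a portion to mutate
                        child[i] ^= 1
                children.append(child)
        pop = children[:pop_size]
    return max(fitness(ind) for ind in pop)
```

Calling `run_ga` with different `pop_size`, `crossover_rate`, and `mutation_rate` values (and different seeds) is one way to see how much the outcome depends on those choices.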

Multi-objective Optimization

The Pareto Front

Higher-level Decision Making

The Chosen Solution


Survival of the fittest
(according to NSGA-II [Deb et al. 2002])
Boolean dominance (x Dominates y, or does not):
- In no objective is x worse than y
- In at least one objective, x is better than y
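
The two conditions above translate directly into a predicate. A minimal sketch, assuming every objective is to be minimized and solutions are tuples of objective values; the helper names `dominates` and `non_dominated` are illustrative:

```python
def dominates(x, y):
    """True if x Pareto-dominates y, with all objectives minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))        # in no objective is x worse
    strictly_better = any(a < b for a, b in zip(x, y))  # in at least one, x is better
    return no_worse and strictly_better

def non_dominated(points):
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Note that the answer is strictly yes or no: the predicate says nothing about how much one solution is better than another.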

Crowd pruning


Indicator-Based Evolutionary Algorithm (IBEA) [Zitzler and Künzli ’04]
1) For {old generation + new generation} do
– Add up every individual’s amount of dominance with respect to everyone else; call this sum F

– Sort all individuals by F
– Delete worst, recalculate, delete worst, recalculate, …

2) Then, standard GA (crossover, mutation) on the survivors → Create a new generation → Back to 1.
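
Step 1 can be sketched as follows. The slide's formula for F did not survive extraction, so this sketch assumes the additive epsilon indicator and the exponential fitness form commonly associated with IBEA, with all objectives minimized; the scaling constant `kappa=0.05` and the function names are assumptions, not values taken from the slides:

```python
import math

def eps_indicator(a, b):
    """Additive epsilon: smallest shift that makes a weakly dominate b (minimization)."""
    return max(ai - bi for ai, bi in zip(a, b))

def ibea_fitness(pop, kappa=0.05):
    """F(x) = sum over y != x of -exp(-I(y, x) / (c * kappa)); higher is better."""
    n = len(pop)
    # ind[i][j] holds I(pop[j], pop[i]): how strongly j presses on i.
    ind = [[eps_indicator(pop[j], pop[i]) for j in range(n)] for i in range(n)]
    # c rescales indicator values; fall back to 1.0 if all of them are zero.
    c = max((abs(ind[i][j]) for i in range(n) for j in range(n) if i != j),
            default=0.0) or 1.0
    return [sum(-math.exp(-ind[i][j] / (c * kappa)) for j in range(n) if j != i)
            for i in range(n)]

def environmental_selection(pop, size, kappa=0.05):
    """Delete the worst individual, recalculate F, and repeat until `size` remain."""
    pop = list(pop)
    while len(pop) > size:
        f = ibea_fitness(pop, kappa)
        pop.pop(f.index(min(f)))
    return pop
```

Unlike Boolean dominance, F is a continuous quantity, so two mutually non-dominated individuals can still be ranked against each other.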

NSGA-II… the default algorithm
• Much prior work in SBSE (*) used NSGA-II
• Didn’t state why!

(*) Sayyad and Ammar, RAISE’13


The Original Study
• A. Arcuri and G. Fraser, "On Parameter Tuning in Search
Based Software Engineering," in Proc. SSBSE, 2011, pp.
33-47.
• A. Arcuri and G. Fraser, "Parameter Tuning or Default
Values? An Empirical Investigation in Search-Based
Software Engineering," Empirical Software Engineering,
Feb 2013.

• Problem: generating test vectors for object-oriented software.
• Fitness function: percentage of test coverage.

Results of original study
• Different parameter settings cause very large
variance in the performance.
• Default parameter settings perform relatively well,
but are far from optimal on individual problem
instances.


Feature–oriented domain analysis [Kang 1990]
• Feature models = a lightweight method for defining a space of options
• De facto standard for modeling variability, e.g. Software Product Lines

[Figure: feature model diagram annotated with cross-tree constraints]

What are the user preferences?
• Suppose each feature had the following metrics:
1. Boolean USED_BEFORE?
2. Integer DEFECTS
3. Real COST
• Show me the space of “best options” according to the objectives:
1. Satisfy the most domain constraints (0 ≤ #violations ≤ 100%)
2. Offer the most features
3. Maximize the number of features that were used before (promote reuse)
4. Minimize overall known defects.
5. Minimize cost.
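
A minimal sketch of how one candidate product could be scored on these five objectives. It assumes a configuration is a set of selected feature names and each constraint is a predicate over that set; the two "maximize" objectives are recast as counts to minimize so that all five point the same way. The `Feature` class and `evaluate` function are illustrative, not the study's code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    name: str
    used_before: bool
    defects: int
    cost: float

def evaluate(selected, all_features, constraints):
    """Return the five objectives as a tuple in which lower is always better."""
    chosen = [f for f in all_features if f.name in selected]
    violations = sum(1 for rule in constraints if not rule(selected))
    missing = len(all_features) - len(chosen)            # offer most features
    not_used_before = sum(1 for f in chosen if not f.used_before)  # promote reuse
    defects = sum(f.defects for f in chosen)
    cost = sum(f.cost for f in chosen)
    return (violations, missing, not_used_before, defects, cost)
```

A "requires" cross-tree constraint can be written as, for example, `lambda s: "camera" not in s or "wifi" in s`, where the feature names are made up for illustration.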


Previous Work [Sayyad et al. ICSE’13]
• IBEA (continuous dominance criterion) beats NSGA-II and a host of other algorithms based on the Boolean dominance criterion.
• Especially with a high number of objectives.
• Quality indicators:
– Percentage of conforming (useable) solutions
• We’re interested in 100% conforming solutions.

– Hypervolume (how close to optimal?)
– Spread (how diverse?)
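
To make the hypervolume indicator concrete, here is a minimal sketch for the two-objective case only, assuming both objectives are minimized and a reference point `ref` bounds the region of interest. The study itself uses five objectives, where the computation is more involved; `hypervolume_2d` is an illustrative name:

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two minimized objectives)."""
    # Ignore points that do not lie strictly inside the reference box.
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_y = 0.0, ref[1]
    for x, y in inside:
        if y < best_y:  # only non-dominated points add a new horizontal strip
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area
```

A larger value means the solution set covers more of the objective space, which is why it serves as a proxy for closeness to the optimal front.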


What are “default settings”?
• Population size = 100
• Crossover rate = 80%
– 60% < Crossover rate < 90%
• [A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing. Springer, 2003.]

• Mutation rate = 1/Features
• [one bit out of the whole string]
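
As a worked instance of these rules of thumb, the defaults can be computed from the number of features in a model. The function name and dictionary keys below are hypothetical:

```python
def default_settings(n_features):
    """Rule-of-thumb GA parameters as listed on this slide."""
    if n_features < 1:
        raise ValueError("n_features must be positive")
    return {
        "population_size": 100,
        "crossover_rate": 0.80,             # inside the 60%-90% guideline
        "mutation_rate": 1.0 / n_features,  # about one bit per string
    }
```

For a model with 50 features (an arbitrary example size), the default mutation rate works out to 1/50 = 0.02 per bit.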

Results [10 sec / algorithm / FM]


Answer to RQ1
• RQ1: How Large is the Potential Impact of a
Wrong Choice of Parameter Settings?
• We confirm Arcuri and Fraser’s conclusion:
“Different parameter settings cause very large
variance in the performance.”


Answer to RQ2
• RQ2: How Does a “Default” Setting Compare to the
Best and Worst Achievable Performance?
• Arcuri and Fraser concluded that: “Default parameter
settings perform relatively well, but are far from
optimal on individual problem instances.”
• We make a stronger conclusion: “Default parameter
settings perform generally poorly, but might perform
relatively well on individual problem instances.”

Answer to RQ3
• RQ3: How does the performance of IBEA’s
best tuning compare to NSGA-II’s best
tuning?

• Our results show that “IBEA’s best tuning
performs generally much better than NSGA-II’s
best tuning.”


RQ4: Parameter Training
• Find best tuning for a group of problem instances, apply it
to a new problem instance, would it be best tuning for the
new problem?
• Arcuri and Fraser concluded that: “Tuning should be done
on a very large sample of problem instances. Otherwise, the
obtained parameter settings are likely to be worse than
arbitrary default values.”
• Our conclusion: “Tuning on a sample of problem instances
does not, in general, result in the best parameter values for
a new problem instance, but the obtained settings are
generally better than the default settings.”
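
One plausible reading of this train-then-apply question is a leave-one-out check. The sketch below assumes a table `perf[instance][setting]` of scores where higher is better, and picks the setting with the best mean score on the training instances; the slides do not specify the exact procedure, so the function names and the aggregation rule are assumptions:

```python
def best_setting(perf, instances):
    """Setting with the highest mean score over the given instances."""
    settings = perf[instances[0]].keys()
    return max(settings,
               key=lambda s: sum(perf[i][s] for i in instances) / len(instances))

def leave_one_out(perf, default):
    """Per held-out instance: (score of tuned setting, best possible, default score)."""
    report = {}
    for held_out in perf:
        training = [i for i in perf if i != held_out]
        tuned = best_setting(perf, training)   # tune on everything else
        report[held_out] = (perf[held_out][tuned],
                            max(perf[held_out].values()),
                            perf[held_out][default])
    return report
```

Comparing the first entry of each triple with the second shows whether training found the best tuning for the new instance; comparing it with the third shows whether it at least beat the default.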

Conclusion
• Default parameter values fail
to optimize performance…

• And, sadly, many SBSE
researchers choose “default”
algorithms (e.g. NSGA-II) along
with “default” parameters.
• Alternatives?
– Parameter control
– Adaptive parameter control
– A long way to go!

Acknowledgment
This research work was funded by the Qatar National Research Fund under the National Priorities Research Program
