On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study
1. On Parameter Tuning in Search-Based
Software Engineering:
A Replicated Empirical Study
Abdel Salam Sayyad
Katerina Goseva-Popstojanova
Tim Menzies
Hany Ammar
West Virginia University, USA
International Workshop on Replication in Software
Engineering Research (RESER)
Oct 9, 2013
2. Sound bites
Search-based Software Engineering
Is here… to stay.
A helper… Not an alternative to human SE
Randomness…
is an essential part of Search Algorithms
… hence the need for statistical examination (A lot to learn from Empirical SE)
Parameter Tuning
A real problem…
Default values (rules of thumb) do exist… and (sadly?) they are being followed
Default parameter values fail to optimize performance…
… As seen in the original study, and in this replication…
No Free Lunch Theorems for Optimization [Wolpert and Macready ’97]:
the same parameter values don’t optimize all algorithms for all problems.
5. Searching for what?
• Correct solutions…
– Conform to system relationships and constraints.
• Optimal solutions…
– Achieve user objectives/preferences…
• Complex problems have big Search spaces
– Exhaustive search not a practical idea.
6. Genetic Algorithm
• Start with a large population of candidate
solutions… (How large?)
• Evaluate the fitness of your solutions.
• Let your candidate solutions crossover –
exchange genes… (How often?)
• Mutate a small portion of your solutions.
(How small?)
• How do those choices affect performance?
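The steps above can be sketched as a toy genetic algorithm. This is only a minimal illustration, not the setup used in the study: the function name `run_ga`, the bit-string encoding, binary tournament selection, single-point crossover, and the OneMax fitness (count of 1 bits) are all assumptions made for the example. The three questions on the slide map to `pop_size`, `crossover_rate`, and `mutation_rate`.

```python
import random


def run_ga(fitness, n_genes, pop_size=100, crossover_rate=0.8,
           mutation_rate=None, generations=50, seed=0):
    """Toy single-objective GA over bit strings (illustrative only)."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_genes  # roughly one flipped bit per individual

    # Start with a population of random candidate solutions.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(population):
        # Binary tournament: the fitter of two randomly chosen individuals.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop)[:], tournament(pop)[:]
            # Crossover: exchange genes with probability crossover_rate.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Mutation: flip each bit with probability mutation_rate.
            for child in (p1, p2):
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]

    return max(pop, key=fitness)


# OneMax toy problem: fitness is simply the number of 1 bits.
best = run_ga(fitness=sum, n_genes=20)
```

Re-running with different values of the three parameters (and different seeds) is the simplest way to see how much the outcome depends on them.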
8. Survival of the fittest
(according to NSGA-II [Deb et al. 2002])
Boolean dominance (x Dominates y, or does not):
- In no objective is x worse than y
- In at least one objective, x is better than y
Crowd pruning
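The Boolean dominance test can be written in a few lines. The sketch below assumes every objective is minimized and represents a solution as a tuple of objective values; the helper names `dominates` and `non_dominated` are made up for illustration. It covers only the dominance check, not the full NSGA-II sorting and crowd pruning.

```python
def dominates(x, y):
    """True if x dominates y, assuming all objectives are minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))      # in no objective is x worse
    strictly_better = any(a < b for a, b in zip(x, y))  # better in at least one
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


front = non_dominated([(1, 3), (2, 2), (3, 3)])
```

Note that the answer is yes or no: the test says nothing about how much better x is than y.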
9. Indicator-Based Evolutionary
Algorithm (IBEA) [Zitzler and Künzli ’04]
1) For {old generation + new generation} do
– Add up every individual’s amount of dominance with
respect to everyone else
– Sort all individuals by that sum (the fitness F)
– Delete worst, recalculate, delete worst, recalculate, …
2) Then, standard GA (cross-over, mutation) on the
survivors: create a new generation, then go back to 1.
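Step 1 can be sketched as follows. The assumptions here are not from the slides: the "amount of dominance" is taken to be the additive epsilon indicator, all objectives are minimized and already scaled to [0, 1], and the scaling factor `kappa=0.05` is an arbitrary illustrative value. The function names are hypothetical, and fitness is recomputed from scratch after each deletion for clarity rather than speed.

```python
import math


def eps_indicator(a, b):
    """Additive epsilon indicator: the smallest shift that makes a
    weakly dominate b (minimization)."""
    return max(ai - bi for ai, bi in zip(a, b))


def ibea_fitness(pop, kappa=0.05):
    """F(x) = sum over all other y of -exp(-I(y, x) / kappa).
    The more strongly others dominate x, the lower F(x)."""
    return [
        sum(-math.exp(-eps_indicator(y, x) / kappa)
            for j, y in enumerate(pop) if j != i)
        for i, x in enumerate(pop)
    ]


def ibea_select(pop, keep, kappa=0.05):
    """Delete the worst individual, recalculate, and repeat."""
    pop = list(pop)
    while len(pop) > keep:
        fit = ibea_fitness(pop, kappa)
        worst = min(range(len(pop)), key=fit.__getitem__)
        pop.pop(worst)
    return pop


# Old generation + new generation, as tuples of objective values.
merged = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6)]
survivors = ibea_select(merged, keep=3)
```

Unlike the Boolean test, the indicator is continuous, so an individual that is only slightly dominated is penalized less than one that is dominated by a wide margin.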
10. NSGA-II… the default algorithm
• Much prior work in SBSE (*)
Used NSGA-II
Didn’t state why!
(*) Sayyad and Ammar, RAISE’13
12. The Original Study
• A. Arcuri and G. Fraser, "On Parameter Tuning in Search
Based Software Engineering," in Proc. SSBSE, 2011, pp.
33-47.
• A. Arcuri and G. Fraser, "Parameter Tuning or Default
Values? An Empirical Investigation in Search-Based
Software Engineering," Empirical Software Engineering,
Feb 2013.
• Problem: generating test vectors for object-oriented software.
• Fitness function: percentage of test coverage.
13. Results of original study
• Different parameter settings cause very large
variance in the performance.
• Default parameter settings perform relatively well,
but are far from optimal on individual problem
instances.
15. Feature-oriented domain analysis [Kang 1990]
• Feature models = a
lightweight method for
defining a space of options
• De facto standard for
modeling variability, e.g.
Software Product Lines
Cross-Tree Constraints
16. What are the user preferences?
• Suppose each feature had the following metrics:
1. Boolean USED_BEFORE?
2. Integer DEFECTS
3. Real COST
• Show me the space of “best options” according to the objectives:
1. That satisfies most domain constraints (0 ≤ #violations ≤ 100%)
2. That offers most features
3. Maximize the number of features that were used before. (promote re-use)
4. Minimize overall known defects.
5. Minimize cost.
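As a rough sketch, the five objectives can be computed for one candidate product as below. Everything concrete here is hypothetical: the `Feature` record, the representation of a constraint as a function over the selected feature names, the example features, and the choice to negate the two "maximize" objectives so that all five values are minimized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    used_before: bool
    defects: int
    cost: float


def evaluate(selected, constraints):
    """Return the five objective values for a set of selected features.
    All values are to be minimized; 'maximize' objectives are negated."""
    names = {f.name for f in selected}
    violations = sum(1 for satisfied in constraints if not satisfied(names))
    return (
        violations,                                   # 1. constraint violations
        -len(selected),                               # 2. offer most features
        -sum(1 for f in selected if f.used_before),   # 3. promote re-use
        sum(f.defects for f in selected),             # 4. known defects
        sum(f.cost for f in selected),                # 5. cost
    )


# Hypothetical example: "gps" requires "maps", which is not selected.
product = [Feature("gps", True, 2, 10.0), Feature("camera", False, 5, 20.5)]
rules = [lambda names: "gps" not in names or "maps" in names]
objectives = evaluate(product, rules)
```

A multi-objective search then compares such tuples across candidate products instead of collapsing them into a single score.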
17. Previous Work [Sayyad et al. ICSE’13]
• IBEA (continuous dominance criterion) beats NSGA-II
and a host of other algorithms based on Boolean
dominance criterion.
• Especially with a high number of objectives.
• Quality indicators:
– Percentage of conforming (useable) solutions
• We’re interested in 100% conforming solutions.
– Hypervolume (how close to optimal?)
– Spread (how diverse?)
19. What are “default settings”?
• Population size = 100
• Crossover rate = 80%
– 60% < Crossover rate < 90%
• [A. E. Eiben and J. E. Smith, Introduction to Evolutionary
Computing. Springer, 2003.]
• Mutation rate = 1/Features
• [one bit out of the whole string]
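Written out as a configuration, the defaults look like this. The function name and dictionary keys are made up for the example; only the three values come from the slide. The mutation rate depends on the size of the feature model, so it is computed rather than fixed.

```python
def default_settings(n_features):
    """Rule-of-thumb defaults for a model with n_features features."""
    return {
        "population_size": 100,
        "crossover_rate": 0.8,              # within the 60%..90% range
        "mutation_rate": 1.0 / n_features,  # about one bit per string
    }


settings = default_settings(50)
```

For a 50-feature model this gives a mutation rate of 2%, so on average one feature is flipped per candidate solution.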
22. Answer to RQ1
• RQ1: How Large is the Potential Impact of a
Wrong Choice of Parameter Settings?
• We confirm Arcuri and Fraser’s conclusion:
“Different parameter settings cause very large
variance in the performance.”
23. Answer to RQ2
• RQ2: How Does a “Default” Setting Compare to the
Best and Worst Achievable Performance?
• Arcuri and Fraser concluded that: “Default parameter
settings perform relatively well, but are far from
optimal on individual problem instances.”
• We make a stronger conclusion: “Default parameter
settings perform generally poorly, but might perform
relatively well on individual problem instances.”
24. Answer to RQ3
• RQ3: How does the performance of IBEA’s
best tuning compare to NSGA-II’s best
tuning?
• Our results show that “IBEA’s best tuning
performs generally much better than NSGA-II’s
best tuning.”
25. RQ4: Parameter Training
• Find best tuning for a group of problem instances, apply it
to a new problem instance, would it be best tuning for the
new problem?
• Arcuri and Fraser concluded that: “Tuning should be done
on a very large sample of problem instances. Otherwise, the
obtained parameter settings are likely to be worse than
arbitrary default values.”
• Our conclusion: “Tuning on a sample of problem instances
does not, in general, result in the best parameter values for
a new problem instance, but the obtained settings are
generally better than the default settings.”
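The training procedure in RQ4 can be sketched as: pick the setting with the best average quality on the training instances, then check how it fares on a held-out instance. The numbers below are invented purely to show the mechanics and are not results from either study; the setting names, instance names, and the use of a single "higher is better" score are all assumptions.

```python
from statistics import mean


def best_setting(scores, instances):
    """Setting with the highest mean score over the given instances."""
    return max(scores, key=lambda s: mean(scores[s][i] for i in instances))


# Invented toy scores: scores[setting][problem_instance], higher is better.
scores = {
    "default": {"A": 0.20, "B": 0.30, "C": 0.25},
    "tuned_1": {"A": 0.80, "B": 0.70, "C": 0.50},
    "tuned_2": {"A": 0.60, "B": 0.60, "C": 0.90},
}

trained = best_setting(scores, ["A", "B"])   # tune on instances A and B
best_for_new = best_setting(scores, ["C"])   # what C alone would have picked
```

In this toy table the trained setting is not the best one for the new instance C, yet it still scores higher on C than the default does, which is the shape of outcome the conclusion above describes.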
27. Conclusion
• Default parameter values fail
to optimize performance…
• And, sadly, many SBSE
researchers choose “default”
algorithms (e.g. NSGA-II) along
with “default” parameters.
• Alternatives?
– Parameter control
– Adaptive parameter control
– A long way to go!
Acknowledgment
This research work was funded by the Qatar National Research Fund under
the National Priorities Research Program.