On Parameter Tuning in Search-Based
Software Engineering:
A Replicated Empirical Study
Abdel Salam Sayyad
Katerina Goseva-Popstojanova
Tim Menzies
Hany Ammar
West Virginia University, USA
International Workshop on Replication in Software
Engineering Research (RESER)
Oct 9, 2013
Sound bites
Search-based Software Engineering
Is here… to stay.
A helper… Not an alternative to human SE

Randomness…
is an essential part of Search Algorithms
… hence the need for statistical examination (A lot to learn from Empirical SE)

Parameter Tuning
A real problem…
Default values (rules of thumb) do exist… and (sadly?) they are being followed

Default parameter values fail to optimize performance…
… As seen in the original study, and in this replication…
No Free Lunch Theorems for Optimization [Wolpert and Macready ’97]:
the same parameter values don’t optimize all algorithms for all problems.
Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion
Searching for what?
• Correct solutions…
– Conform to system relationships and constraints.

• Optimal solutions…
– Achieve user objectives/preferences…

• Complex problems have big search spaces
– Exhaustive search is not practical.
Genetic Algorithm
• Start with a large population of candidate
solutions… (How large?)
• Evaluate the fitness of your solutions.
• Let your candidate solutions crossover –
exchange genes… (How often?)
• Mutate a small portion of your solutions.
(How small?)
• How do those choices affect performance?
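The three questions above map directly onto three parameters. A minimal sketch of a generational GA follows, assuming a toy bit-counting fitness function (`one_max`), binary tournament selection, and one-point crossover; none of these choices come from the slides, and the names are hypothetical.

```python
import random


def one_max(bits):
    """Toy fitness (an assumption for illustration): number of 1-bits."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=100, crossover_rate=0.8, mutation_rate=None,
           generations=50, seed=0):
    """Run a minimal generational GA and return the best fitness found."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_bits  # on average one flipped bit per string
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if one_max(a) >= one_max(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_rate:  # "How often?"
                cut = rng.randint(1, n_bits - 1)
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < mutation_rate:  # "How small?"
                        child[i] = 1 - child[i]
                nxt.append(child)
        pop = nxt[:pop_size]  # "How large?"
    return max(one_max(ind) for ind in pop)
```

Calling `run_ga` with different `pop_size`, `crossover_rate`, and `mutation_rate` values (and several seeds) is one simple way to see how those choices affect performance.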
Multi-objective Optimization

The Pareto Front

Higher-level Decision Making

The Chosen Solution
Survival of the fittest
(according to NSGA-II [Deb et al. 2002])
Boolean dominance (x Dominates y, or does not):
- In no objective is x worse than y
- In at least one objective, x is better than y
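The two conditions above can be written as a short predicate. This is a sketch that assumes every objective is minimized and that solutions are given as tuples of objective values; the helper `non_dominated` is a hypothetical name added for illustration.

```python
def dominates(x, y):
    """Return True if x dominates y, assuming every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))        # in no objective is x worse
    strictly_better = any(a < b for a, b in zip(x, y))  # in at least one, x is better
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The result is Boolean: a solution that is slightly dominated and one that is heavily dominated are treated the same.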

Crowd pruning
Indicator-Based Evolutionary Algorithm (IBEA) [Zitzler and Künzli ’04]
1) For {old generation + new generation} do
– Add up every individual’s amount of dominance with
respect to everyone else

– Sort all instances by F (the summed dominance value from the previous step)
– Delete worst, recalculate, delete worst, recalculate, …

2) Then, standard GA (cross-over, mutation) on the
survivors → Create a new generation → Back to 1.
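Step 1 can be sketched as follows. The sketch assumes the additive epsilon indicator as the "amount of dominance", minimized objectives already scaled to [0, 1], and a scaling constant `kappa = 0.05`; the slides do not state these details, so treat them as assumptions. For simplicity it recomputes all fitness values after each deletion rather than updating them incrementally.

```python
import math


def eps_indicator(a, b):
    """Additive epsilon indicator: the shift needed for a to be no worse than b."""
    return max(ai - bi for ai, bi in zip(a, b))


def ibea_fitness(pop, kappa=0.05):
    """F(x) = sum over all other y of -exp(-I(y, x) / (c * kappa))."""
    n = len(pop)
    ind = [[eps_indicator(pop[i], pop[j]) for j in range(n)] for i in range(n)]
    c = max(abs(v) for row in ind for v in row) or 1.0  # scale by largest |I|
    return [
        sum(-math.exp(-ind[j][i] / (c * kappa)) for j in range(n) if j != i)
        for i in range(n)
    ]


def ibea_select(pop, survivors, kappa=0.05):
    """Delete the worst individual, recalculate, and repeat."""
    pop = list(pop)
    while len(pop) > survivors:
        fit = ibea_fitness(pop, kappa)
        worst = min(range(len(pop)), key=fit.__getitem__)
        pop.pop(worst)
    return pop
```

Unlike Boolean dominance, the fitness here is continuous: an individual that is dominated by a wide margin scores much lower than one that is barely dominated.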
NSGA-II… the default algorithm
• Much prior work in SBSE (*):
– Used NSGA-II
– Didn’t state why!

(*) Sayyad and Ammar, RAISE’13
The Original Study
• A. Arcuri and G. Fraser, "On Parameter Tuning in Search
Based Software Engineering," in Proc. SSBSE, 2011, pp.
33-47.
• A. Arcuri and G. Fraser, "Parameter Tuning or Default
Values? An Empirical Investigation in Search-Based
Software Engineering," Empirical Software Engineering,
Feb 2013.

• Problem: generating test vectors for object-oriented software.
• Fitness function: percentage of test coverage.
Results of original study
• Different parameter settings cause very large
variance in the performance.
• Default parameter settings perform relatively well,
but are far from optimal on individual problem
instances.

Feature-oriented domain analysis [Kang 1990]
• Feature models = a lightweight method for defining a space of options
• De facto standard for modeling variability, e.g. Software Product Lines

[Figure: feature model annotated with cross-tree constraints]
What are the user preferences?
• Suppose each feature had the following metrics:
1. Boolean USED_BEFORE?
2. Integer DEFECTS
3. Real COST
• Show me the space of “best options” according to the objectives:
1. Satisfy the most domain constraints (0 ≤ #violations ≤ 100%)
2. Offer the most features
3. Maximize the number of features that were used before (promote re-use)
4. Minimize overall known defects
5. Minimize cost
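One way to turn the five objectives above into numbers is sketched below. It assumes each objective is restated as "lower is better" (for example, counting features left out instead of features offered) and that domain constraints are supplied as predicates over the set of selected feature names; the `Feature` class and `objectives` function are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    used_before: bool  # Boolean USED_BEFORE?
    defects: int       # Integer DEFECTS
    cost: float        # Real COST


def objectives(selected, features, constraints):
    """Return the five objective values for one configuration (all minimized)."""
    chosen = [f for f in features if f.name in selected]
    violations = sum(1 for rule in constraints if not rule(selected))
    return (
        violations,                                   # 1. constraint violations
        len(features) - len(chosen),                  # 2. features left out
        sum(1 for f in chosen if not f.used_before),  # 3. features not used before
        sum(f.defects for f in chosen),               # 4. known defects
        sum(f.cost for f in chosen),                  # 5. cost
    )
```

A multi-objective search would evaluate this tuple for every candidate configuration and compare the tuples by dominance.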

Previous Work [Sayyad et al. ICSE’13]
• IBEA (continuous dominance criterion) beats NSGA-II
and a host of other algorithms based on Boolean
dominance criterion.
• Especially with a high number of objectives.
• Quality indicators:
– Percentage of conforming (useable) solutions
• We’re interested in 100% conforming solutions.

– Hypervolume (how close to optimal?)
– Spread (how diverse?)
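Two of these indicators are easy to sketch. The example below is a simplification that assumes only two minimized objectives and a user-chosen reference point `ref` (the study itself uses five objectives, where hypervolume needs a more general algorithm); `violations` is assumed to hold one constraint-violation count per solution.

```python
def percent_conforming(violations):
    """Percentage of solutions with zero constraint violations."""
    return 100.0 * sum(1 for v in violations if v == 0) / len(violations)


def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by `ref` (two minimized objectives)."""
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y < best_y:  # dominated points add no new area
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area
```

A larger hypervolume means the front covers more of the objective space between itself and the reference point.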

Setup

What are “default settings”?
• Population size = 100
• Crossover rate = 80%
– 60% < Crossover rate < 90%
• [A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing. Springer, 2003.]

• Mutation rate = 1/Features
• [one bit out of the whole string]
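The defaults above can be written down as a small configuration, next to a grid of alternatives to tune over. The default values come from this slide; the grid values (`pop_sizes`, `crossover_rates`, `mutation_multipliers`) are hypothetical and are not the settings explored in the study.

```python
from itertools import product


def default_settings(n_features):
    """Default values from the slide; mutation flips about one bit per string."""
    return {
        "population_size": 100,
        "crossover_rate": 0.8,
        "mutation_rate": 1.0 / n_features,
    }


def tuning_grid(n_features,
                pop_sizes=(50, 100, 200),
                crossover_rates=(0.6, 0.8, 0.9),
                mutation_multipliers=(0.5, 1.0, 2.0)):
    """Enumerate candidate settings (grid values are illustrative assumptions)."""
    base = 1.0 / n_features
    return [
        {"population_size": p, "crossover_rate": c, "mutation_rate": m * base}
        for p, c, m in product(pop_sizes, crossover_rates, mutation_multipliers)
    ]
```

Including the default setting as one point of the grid makes it straightforward to compare it against the best and worst settings found.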
Research Questions

Results [10 sec / algorithm / FM]

Answer to RQ1
• RQ1: How Large is the Potential Impact of a
Wrong Choice of Parameter Settings?
• We confirm Arcuri and Fraser’s conclusion:
“Different parameter settings cause very large
variance in the performance.”

Answer to RQ2
• RQ2: How Does a “Default” Setting Compare to the
Best and Worst Achievable Performance?
• Arcuri and Fraser concluded that: “Default parameter
settings perform relatively well, but are far from
optimal on individual problem instances.”
• We make a stronger conclusion: “Default parameter
settings perform generally poorly, but might perform
relatively well on individual problem instances.”
Answer to RQ3
• RQ3: How does the performance of IBEA’s
best tuning compare to NSGA-II’s best
tuning?

• Our results show that “IBEA’s best tuning
performs generally much better than NSGA-II’s
best tuning.”

RQ4: Parameter Training
• Find best tuning for a group of problem instances, apply it
to a new problem instance, would it be best tuning for the
new problem?
• Arcuri and Fraser concluded that: “Tuning should be done
on a very large sample of problem instances. Otherwise, the
obtained parameter settings are likely to be worse than
arbitrary default values.”
• Our conclusion: “Tuning on a sample of problem instances
does not, in general, result in the best parameter values for
a new problem instance, but the obtained settings are
generally better than the default settings.”
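The training question can be framed as a leave-one-out procedure. The sketch below assumes a table `scores[setting][instance]` of already-measured performance values (higher is better) and picks the training winner by mean score; both the table layout and the mean-based choice are assumptions for illustration, not the study's exact protocol.

```python
def leave_one_out(scores):
    """For each held-out instance, compare the trained setting with the true best."""
    settings = list(scores)
    instances = list(next(iter(scores.values())))
    report = {}
    for held_out in instances:
        train = [i for i in instances if i != held_out]
        # Best tuning for the group of training instances (by mean score).
        trained = max(settings,
                      key=lambda s: sum(scores[s][i] for i in train) / len(train))
        # Best tuning for the new instance, known only in hindsight.
        best = max(settings, key=lambda s: scores[s][held_out])
        report[held_out] = {
            "trained": trained,
            "best": best,
            "trained_is_best": trained == best,
        }
    return report
```

Adding the default setting as one row of the table allows the trained setting's score on the held-out instance to be compared with the default's score as well.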
Conclusion
• Default parameter values fail to optimize performance…
• And, sadly, many SBSE researchers choose “default” algorithms (e.g. NSGA-II) along with “default” parameters.
• Alternatives?
– Parameter control
– Adaptive parameter control
– A long way to go!

Acknowledgment
This research work was funded by the Qatar National Research Fund under the National Priorities Research Program.

Install Stable Diffusion in windows machine
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study

  • 1. On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study Abdel Salam Sayyad Katerina Goseva-Popstojanova Tim Menzies Hany Ammar West Virginia University, USA International Workshop on Replication in Software Engineering Research (RESER) Oct 9, 2013
  • 2. Sound bites. Search-based Software Engineering: Is here… to stay. A helper… Not an alternative to human SE. Randomness… is an essential part of Search Algorithms … hence the need for statistical examination (A lot to learn from Empirical SE). Parameter Tuning: A real problem… Default values (rules of thumb) do exist… and (sadly?) they are being followed. Default parameter values fail to optimize performance… … As seen in the original study, and in this replication… No Free Lunch Theorems for Optimization [Wolpert and Macready ’97]: the same parameter values don’t optimize all algorithms for all problems.
  • 3. Roadmap ① Randomness of Search ② The original study ③ The replication ④ Conclusion
  • 4. Roadmap ① Randomness of Search ② The original study ③ The replication ④ Conclusion
  • 5. Searching for what? • Correct solutions… – Conform to system relationships and constraints. • Optimal solutions… – Achieve user objectives/preferences… • Complex problems have big search spaces – Exhaustive search not a practical idea.
  • 6. Genetic Algorithm • Start with a large population of candidate solutions… (How large?) • Evaluate the fitness of your solutions. • Let your candidate solutions crossover – exchange genes… (How often?) • Mutate a small portion of your solutions. (How small?) • How do those choices affect performance?
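
The three questions on the Genetic Algorithm slide (how large, how often, how small) correspond to three numeric parameters. The following is a minimal sketch of where each one enters a GA loop, not the setup used in either study: the OneMax fitness (count of 1-bits), binary tournament selection, one-point crossover, single-elite survival, and the function name `run_ga` are all assumptions chosen for illustration.

```python
import random


def run_ga(num_bits, population_size, crossover_rate, mutation_rate,
           generations, seed=0):
    """Toy GA on OneMax; returns the best fitness found.

    Assumes num_bits >= 2 and population_size >= 2.
    """
    rng = random.Random(seed)
    fitness = sum  # OneMax: fitness is the number of 1-bits (illustrative only)

    def tournament(pop):
        # Binary tournament: the fitter of two randomly chosen individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # "How large?" -> population_size
    pop = [[rng.randint(0, 1) for _ in range(num_bits)]
           for _ in range(population_size)]

    for _ in range(generations):
        children = [max(pop, key=fitness)]  # keep the current best (elitism)
        while len(children) < population_size:
            p1, p2 = tournament(pop), tournament(pop)
            # "How often?" -> crossover_rate
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, num_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # "How small?" -> mutation_rate, applied independently per bit
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children

    return max(fitness(ind) for ind in pop)
```

Calling `run_ga` with different values for the three rates on the same seed gives a quick feel for how sensitive the outcome is to these choices, which is the question the studies examine at scale.
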
  • 7. Multi-objective Optimization: The Pareto Front → Higher-level Decision Making → The Chosen Solution
  • 8. Survival of the fittest (according to NSGA-II [Deb et al. 2002]) Boolean dominance (x dominates y, or does not): - In no objective is x worse than y - In at least one objective, x is better than y. Crowd pruning
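
The Boolean dominance test on the NSGA-II slide can be written directly from its two conditions. This is a small sketch that assumes every objective is minimized; the function names `dominates` and `non_dominated` are illustrative and do not come from any particular library.

```python
from typing import List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x dominates y, assuming all objectives are minimized."""
    # Condition 1: in no objective is x worse than y.
    no_worse = all(a <= b for a, b in zip(x, y))
    # Condition 2: in at least one objective, x is better than y.
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The result is strictly yes or no: two solutions that each win on a different objective are simply incomparable, however large or small the gap between them.
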
  • 9. Indicator-Based Evolutionary Algorithm (IBEA) [Zitzler and Künzli ’04] 1) For {old generation + new generation} do – Add up every individual’s amount of dominance with respect to everyone else (call this sum F) – Sort all individuals by F – Delete worst, recalculate, delete worst, recalculate, … 2) Then, standard GA (cross-over, mutation) on the survivors → Create a new generation → Back to 1.
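
Step 1 of the IBEA slide (sum up dominance amounts, delete the worst, recalculate) can be sketched as follows. The slide does not say which indicator is used, so this sketch assumes the additive epsilon indicator with minimized objectives and a scaling factor `kappa = 0.05`; the names `eps_indicator` and `ibea_select` are hypothetical.

```python
import math


def eps_indicator(a, b):
    """Smallest shift that makes a weakly dominate b (minimization assumed)."""
    return max(ai - bi for ai, bi in zip(a, b))


def ibea_select(points, keep, kappa=0.05):
    """Repeatedly delete the individual with the worst fitness F."""
    pool = list(points)
    n = len(pool)
    # Normalize by the largest absolute indicator value to keep exp() in range.
    scale = max((abs(eps_indicator(pool[i], pool[j]))
                 for i in range(n) for j in range(n) if i != j),
                default=0.0) or 1.0

    def loss(a, b):
        # How much individual a hurts b's fitness: large when a dominates b.
        return math.exp(-eps_indicator(a, b) / (scale * kappa))

    # F(x) = minus the summed loss that everyone else inflicts on x.
    fitness = [-sum(loss(pool[j], pool[i]) for j in range(n) if j != i)
               for i in range(n)]

    while len(pool) > keep:
        worst = min(range(len(pool)), key=lambda i: fitness[i])
        removed = pool.pop(worst)
        fitness.pop(worst)
        # "Recalculate": take back the loss the removed individual inflicted.
        fitness = [f + loss(removed, p) for f, p in zip(fitness, pool)]
    return pool
```

Unlike the Boolean test, the fitness here is continuous: a solution that is dominated by a wide margin is penalized far more than one that is barely dominated.
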
  • 10. NSGA-II… the default algorithm • Much prior work in SBSE (*): used NSGA-II, didn’t state why! (*) Sayyad and Ammar, RAISE’13
  • 11. Roadmap ① Randomness of Search ② The original study ③ The replication ④ Conclusion
  • 12. The Original Study • A. Arcuri and G. Fraser, "On Parameter Tuning in Search Based Software Engineering," in Proc. SSBSE, 2011, pp. 33-47. • A. Arcuri and G. Fraser, "Parameter Tuning or Default Values? An Empirical Investigation in Search-Based Software Engineering," Empirical Software Engineering, Feb 2013. • Problem: generating test vectors for object-oriented software. • Fitness function: percentage of test coverage.
  • 13. Results of original study • Different parameter settings cause very large variance in the performance. • Default parameter settings perform relatively well, but are far from optimal on individual problem instances.
  • 14. Roadmap ① Randomness of Search ② The original study ③ The replication ④ Conclusion
  • 15. Feature-oriented domain analysis [Kang 1990] • Feature models = a lightweight method for defining a space of options • De facto standard for modeling variability, e.g. Software Product Lines [Figure: feature model with cross-tree constraints]
  • 16. What are the user preferences? • Suppose each feature had the following metrics: 1. Boolean USED_BEFORE? 2. Integer DEFECTS 3. Real COST • Show me the space of “best options” according to the objectives: 1. That satisfies most domain constraints (0 ≤ #violations ≤ 100%) 2. That offers most features 3. Maximize overall features that were used before (promote re-use). 4. Minimize overall known defects. 5. Minimize cost.
  • 17. Previous Work [Sayyad et al. ICSE’13] • IBEA (continuous dominance criterion) beats NSGA-II and a host of other algorithms based on Boolean dominance criterion. • Especially with a high number of objectives. • Quality indicators: – Percentage of conforming (useable) solutions • We’re interested in 100% conforming solutions. – Hypervolume (how close to optimal?) – Spread (how diverse?)
  • 19. What are “default settings”? • Population size = 100 • Crossover rate = 80% – 60% < Crossover rate < 90% • [A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing. Springer, 2003.] • Mutation rate = 1/Features • [one bit out of the whole string]
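
The default settings on this slide can be captured as a small configuration object. This is only a sketch: the class and function names are hypothetical, and the only values taken from the slide are the population size of 100, the 80% crossover rate, and the mutation rate of one over the number of features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GASettings:
    population_size: int
    crossover_rate: float
    mutation_rate: float


def default_settings(num_features: int) -> GASettings:
    """Rule-of-thumb defaults as listed on the slide."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return GASettings(
        population_size=100,
        crossover_rate=0.8,                # slide: 60% < crossover rate < 90%
        mutation_rate=1.0 / num_features,  # on average one bit per string
    )
```

Because the mutation rate depends on the feature count, a model with 50 features gets a rate of 0.02, while a larger model gets a proportionally smaller one.
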
  • 21. Results [10 sec / algorithm / FM]
  • 22. Answer to RQ1 • RQ1: How Large is the Potential Impact of a Wrong Choice of Parameter Settings? • We confirm Arcuri and Fraser’s conclusion: “Different parameter settings cause very large variance in the performance.”
  • 23. Answer to RQ2 • RQ2: How Does a “Default” Setting Compare to the Best and Worst Achievable Performance? • Arcuri and Fraser concluded that: “Default parameter settings perform relatively well, but are far from optimal on individual problem instances.” • We make a stronger conclusion: “Default parameter settings perform generally poorly, but might perform relatively well on individual problem instances.”
  • 24. Answer to RQ3 • RQ3: How does the performance of IBEA’s best tuning compare to NSGA-II’s best tuning? • Our results show that “IBEA’s best tuning performs generally much better than NSGA-II’s best tuning.”
  • 25. RQ4: Parameter Training • If we find the best tuning for a group of problem instances and apply it to a new problem instance, would it be the best tuning for the new problem? • Arcuri and Fraser concluded that: “Tuning should be done on a very large sample of problem instances. Otherwise, the obtained parameter settings are likely to be worse than arbitrary default values.” • Our conclusion: “Tuning on a sample of problem instances does not, in general, result in the best parameter values for a new problem instance, but the obtained settings are generally better than the default settings.”
  • 26. Roadmap ① Randomness of Search ② The original study ③ The replication ④ Conclusion
  • 27. Conclusion • Default parameter values fail to optimize performance… • And, sadly, many SBSE researchers choose “default” algorithms (e.g. NSGA-II) along with “default” parameters. • Alternatives? – Parameter control – Adaptive parameter control – A long way to go! Acknowledgment: This research work was funded by the Qatar National Research Fund under the National Priorities Research Program.