Understanding the Value of Software Engineering Technologies
1. Understanding the Value of Software Engineering Technologies
Phillip Green, Tim Menzies, Steven Williams, Oussama El-Rawas
WVU, USA
ASE’09, November, Auckland, New Zealand
2. Many technologies are good
• Model checking is a good thing
• Runtime verification is a good thing
• Test generation is a good thing
• Mining OO patterns is a good thing
• X is a good thing
• Y is a good thing
• Z is a …
• etc.
Automated SE is a “good thing”
3. But which are better?
• Is “technology1” more cost-effective than “technology2”?
• What is a “technology”?
– Paper-based; e.g.
• orthogonal defect classification
– Tool-based; e.g.
• functional programming languages
• execution and testing tools
• automated formal analysis
– Process-based; e.g.
• changing an organization’s hiring practices
• an agile continual renegotiation of the requirements
Better than others, in the context of a particular project?
4. This talk
• When making relative assessments of the costs/benefits of SE technologies,
– context changes everything
– claims of relative cost benefits of tools are context-dependent
• The “local lessons effect”
– ideas that are useful in general
– may not be the most useful in particular
• Tools to find context-dependent local lessons
– “W”: a case-based reasoner (goal: reduce effort)
– “NOVA”: a more intricate tool (goal: reduce effort, time, and defects)
• Take-home lessons:
– beware blanket claims of “this is better”
– always define the context in which you tested your tools
– compare ASE to non-ASE tools
The real world is a special case
5. Roadmap
• Round 1:
– “W”: a very simple “local lessons” finder
• Discussion
– What’s wrong with “W”?
• Round 2:
– “NOVA”
– Results from NOVA
• Conclusions
8. “W”: Simpler (Bayesian) Contrast Set Learning (in linear time)
• “best” = target class
• “rest” = other classes
• x = any range (e.g. sex = female)
• f(x|c) = frequency of x in class c
• b = f(x | best) / F(best)
• r = f(x | rest) / F(rest)
• LOR = log(odds ratio) = log(b/r)
– normalize LOR from 0…max to 1…100
– e = 2.7183…
– p = F(B) / (F(B) + F(R))
– P(B) = 1 / (1 + e^(−ln(p/(1−p)) − s)), where s sums the scores of the selected ranges
Mozina: KDD’04
“W”:
1) Discretize data and outcomes
2) Count frequencies of ranges in classes
3) Sort ranges by LOR
4) Greedy search on top-ranked ranges
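Steps 2 and 3 can be sketched in a few lines (an illustration only, not the authors’ code; all names and the `min_b` cutoff are assumptions):

```python
from collections import Counter
from math import log

def lor_scores(best, rest, min_b=0.0):
    """Rank ranges by log-odds ratio: how much more common is range x
    in the 'best' cases than in the 'rest' cases?"""
    b_counts = Counter(x for case in best for x in case)
    r_counts = Counter(x for case in rest for x in case)
    scores = {}
    for x in b_counts:
        b = b_counts[x] / len(best)   # f(x | best) / F(best)
        r = r_counts[x] / len(rest)   # f(x | rest) / F(rest)
        if b > r > 0 and b > min_b:   # keep only ranges that favor "best"
            scores[x] = log(b / r)    # LOR = log(odds ratio)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, a range appearing in every “best” case but only a third of the “rest” cases scores log(3) and sorts to the top.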
9. Preliminaries: “W” + CBR
• “Query”
– the kind of project you want to analyze; e.g.
• analysts not so clever
• high-reliability system
• small KLOC
• “Cases”
– historical records, with their development effort
• Output:
– a recommendation on how to change our projects in order to reduce development effort
11–16. Cases (figure, built up over six slides): the “W” pipeline

Cases map features F to a utility; F = controllables + others.

1) Split the cases into train and test sets.
2) Given a query (query ⊆ ranges), use k-NN on the train set to find the relevant cases.
3) Sort the relevant cases by utility into “best” and “rest”.
4) Score each range x:
   b = F(x | best) / F(best)
   r = F(x | rest) / F(rest)
   if controllable(x) && b > r && b > min
   then score(x) = log(b/r)
   else score(x) = 0
   fi
5) Let S = all x, sorted descending by score.
6) Build treatments by adding the top-ranked ranges to the query:
   query_i* = query + ∪_i S_i   (treated_i)
7) Use k-NN on the test set to evaluate each treated query; compare the utility distributions (median, spread) of q_0* (“as is”) against q_i* (“to be”).
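Steps 5–7 above can be sketched as follows (hypothetical code; a simple subset match stands in for the k-NN step, and all names are assumptions):

```python
from statistics import median

def apply_treatments(query, ranked_ranges, test_cases):
    """Grow the treatment one top-scored range at a time
    (query_i* = query + U_i S_i) and report the median utility
    of the test cases matching each treated query."""
    results = []
    treated = set(query)
    for x, _score in ranked_ranges:
        treated.add(x)
        # stand-in for k-NN: keep test cases consistent with the treated query
        matches = [utility for ranges, utility in test_cases
                   if treated <= set(ranges)]
        if matches:
            results.append((len(treated), median(matches)))
    return results
```

Each tuple pairs a treatment size with the median utility (e.g. effort) of the matching test cases, giving the “as is” versus “to be” comparison.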
17. Results (distribution of development efforts in q_i*)
Cases from promisedata.org/data
• Median = 50th percentile
• Spread = 75th − 25th percentile
• Improvement = (X − Y) / X
– X = as is
– Y = to be
– more is better
Usually:
• spread improvement ≥ 75%
• median improvement ≥ 60%
[Scatter plot: median improvement (x-axis, −50% to 150%) vs. spread improvement (y-axis, 0% to 150%); cases from http://promisedata.org]
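The two improvement scores are straightforward to compute (sketch; note the standard library’s `quantiles` uses the exclusive method by default, which may differ slightly from the percentile rule used on the slide):

```python
from statistics import median, quantiles

def improvements(as_is, to_be):
    """Improvement = (X - Y) / X for both the median and the spread
    (75th - 25th percentile) of two effort distributions; more is better."""
    def spread(xs):
        q = quantiles(xs, n=4)     # q[0] = 25th pct, q[2] = 75th pct
        return q[2] - q[0]
    return {"median": (median(as_is) - median(to_be)) / median(as_is),
            "spread": (spread(as_is) - spread(to_be)) / spread(as_is)}
```

Halving every effort value halves both the median and the spread, so both improvements come out at 50%.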
19. Roadmap
• Round 1:
– “W”: a very simple “local lessons” finder
• Discussion
– What’s wrong with “W”?
• Round 2:
– “NOVA”
– Results from NOVA
• Conclusions
22. (Cases figure repeated)
A greedy linear-time search?
• Need to use much better search algorithms
• Simulated annealing, beam search, A*, ISSAMP, MaxWalkSat
• SEESAW (home brew)
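For contrast with the greedy pass, here is a generic skeleton of one of the listed alternatives, simulated annealing (a textbook sketch under my own naming, not SEESAW or the talk’s implementation):

```python
import math
import random

def simulated_annealing(score, neighbor, init, kmax=1000, seed=1):
    """Maximize score(). Unlike greedy search, sometimes accept a worse
    candidate (with probability exp(-delta/t)) to escape local optima."""
    rng = random.Random(seed)
    current = best = init
    for k in range(kmax):
        t = 1.0 - k / kmax                    # temperature cools linearly to 0
        cand = neighbor(current, rng)
        delta = score(current) - score(cand)  # > 0 means cand is worse
        if delta < 0 or (t > 0 and rng.random() < math.exp(-delta / t)):
            current = cand
        if score(current) > score(best):      # remember the best ever seen
            best = current
    return best
```

On a toy objective such as maximizing −(x − 3)² over the integers, the walk reliably settles on x = 3.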
25. (Cases figure repeated)
Just trying to reduce effort?
• What about development time?
• What about the number of defects?
• What about different business contexts?
e.g. “racing to market” vs. “mission-critical” apps
28. (Cases figure repeated)
Is nearest neighbor causing conclusion instability?
• Q: How to smooth the bumps between the samples?
• A: Don’t apply constraints to the data; apply them as model inputs instead
31. (Cases figure repeated)
Just one test?
• What about looking for stability in “N” repeats?
32–35. (Cases figure repeated, annotated step by step)
What’s needed:
1. More tests
2. More models
3. More goals
4. More search
36. Roadmap
• Round 1:
– “W”: a very simple “local lessons” finder
• Discussion
– What’s wrong with “W”?
• Round 2:
– “NOVA”
– Results from NOVA
• Conclusions
37. COCOMO
• Time to build it (calendar months)
• Effort to build it (total staff-months)
COQUALMO
• Defects per 1000 lines of code
Estimate = model(p, t)
• p = project options
• t = tuning options
• Normal practice: adjust “t” using local data
• NOVA: sample randomly across all tunings ever seen before
USC COCOMO suite (Boehm 1981, 2000)
More models (2)
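The COCOMO-style effort form is, roughly (a sketch; the default a and b below are the published COCOMO-II.2000 nominal tunings, which normal practice re-calibrates from local data and which NOVA instead samples across):

```python
def cocomo_effort(kloc, effort_multipliers, scale_factors, a=2.94, b=0.91):
    """COCOMO-II style effort estimate in staff-months:
    PM = a * KLOC^E * prod(EM), where E = b + 0.01 * sum(SF).
    effort_multipliers (EM) and scale_factors (SF) come from the
    project options p; (a, b) and the tables behind EM/SF are the
    tuning options t."""
    e = b + 0.01 * sum(scale_factors)
    pm = a * (kloc ** e)
    for em in effort_multipliers:
        pm *= em
    return pm
```

Because the multipliers enter as a plain product, doubling one effort multiplier exactly doubles the estimate; the tunings (a, b) instead shift and bend the size curve.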
38. B = BFC
Goal #1: better, faster, cheaper
• Try to minimize development time, development effort, and # defects
X = XPOS
Goal #2: minimize risk exposure
• Rushing to beat the competition: get to market as soon as you can, without too many defects
More goals (3)
40. Data sets
• OSP = Orbital Space Plane GNC
• OSP2 = second-generation GNC
• Flight = JPL flight systems
• Ground = JPL ground systems
For each data set:
• Search N = 20 times (with SEESAW)
• Record how often decisions are found
Four data sets, repeated N = 20 times
More tests (1)
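Counting decision stability across the N = 20 repeats is simple to sketch (illustrative code; the decision labels in the test are made up):

```python
from collections import Counter

def decision_frequencies(runs):
    """Given the set of decisions selected in each repeated search,
    report the percentage of runs in which each decision appears;
    decisions that recur across repeats are the stable ones."""
    counts = Counter(d for run in runs for d in set(run))
    return {d: 100.0 * c / len(runs) for d, c in counts.items()}
```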
41. [Chart: frequency (%) with which each range is selected across 20 repeats; left column = better, faster, cheaper (BFC); right column = minimize risk exposure (XPOS, rushing to market)]
• If high, then selected more often under BFC
• If low, then usually selected under XPOS
• If 50%, then selected equally under BFC and XPOS
• (ignore all ranges found < 50%)
Mostly: if selected by one, rejected by the other.
“Value” (business context) changes everything
42. And what of defect removal techniques?
[Chart columns: better, faster, cheaper (BFC) vs. minimize risk exposure (XPOS, rushing to market)]
• Aa = automated analysis
• Etat = execution testing and tools
• Pr = peer review
Stopping defect introduction is better than defect removal.
43. Roadmap
• Round 1:
– “W”: a very simple “local lessons” finder
• Discussion
– What’s wrong with “W”?
• Round 2:
– “NOVA”
– Results from NOVA
• Conclusions
44. Conclusion
• When making relative assessments of the costs/benefits of SE technologies,
– context changes everything
– claims of relative cost benefits of tools are context-dependent
• The “local lessons effect”
– ideas that are useful in general
– may not be the most useful in particular
• Tools to find context-dependent local lessons
– “W”: a case-based reasoner (goal: reduce effort)
– “NOVA”: a more intricate tool (goal: reduce effort, time, and defects)
• Take-home lessons:
– beware blanket claims of “this is better”
– always define the context in which you tested your tools
– compare ASE to non-ASE tools
The real world is a special case