I used these slides during my presentation at BeNeVol 2010 in Lille, France.
Paper:
Vasilescu B, Serebrenik A and van den Brand MGJ (2010), "Comparative study of software metrics' aggregation techniques", In Proceedings of the 9th Belgian-Netherlands Software Evolution Seminar, pp. 80-84.
Benevol 2010
1. Software metrics are usually right-skewed
[Figure: Histogram of SLOC(org.argouml.ui). x-axis: SLOC for classes in org.argouml.ui (0 to 500); y-axis: Frequency (0 to 25).]
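To make "right-skewed" concrete, here is a minimal sketch that compares the mean and the median and computes the moment-based skewness coefficient. The SLOC values are hypothetical toy data, not measurements from ArgoUML, and `sample_skewness` is an illustrative helper name.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive for a long right tail)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical SLOC values for the classes of one package (toy data).
sloc = [12, 15, 18, 20, 22, 25, 30, 41, 95, 480]

print("mean:    ", mean(sloc))
print("median:  ", median(sloc))
print("skewness:", sample_skewness(sloc))
```

In a right-skewed sample a few very large classes pull the mean well above the median, so the mean alone says little about a typical class.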
2. 2/11
Aggregation of software metrics using the
“softnometric” index
Bogdan Vasilescu
b.n.vasilescu@student.tue.nl
Eindhoven University of Technology
The Netherlands
March 9, 2011
3. Aggregation techniques 3/11
Classical: Mean, Sum, Cardinality
Distribution fitting: Log-normal, Exponential, Negative binomial
Inequality indices: Theil, Gini, Kolm, Atkinson
5. Gini index 4/11
The Gini index is based on the Lorenz curve:
proportion of the total income of the population (y-axis)
cumulatively earned by the bottom x% of the people.
0 perfect equality: every person receives the same income.
1 perfect inequality: one person receives all the income.
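$I_{\mathrm{Gini}}(X) = \frac{A}{A + B}$
where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve.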
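As an illustration of the definition above, the following sketch computes the Gini index of a finite sample with the standard sorted-rank formula, which is equivalent to the area ratio on the empirical Lorenz curve. The function name `gini` and the zero-total convention are assumptions made for this example. For a sample of n values the maximum is 1 − 1/n, which approaches 1 as n grows.

```python
def gini(values):
    """Gini index of non-negative values using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch: nothing to distribute
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))   # everyone receives the same amount
print(gini([0, 0, 0, 10]))  # one element receives everything
```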
7. Theoretical comparison 5/11
Criteria:
Domain → determines applicability
Range → determines interpretation
Invariance
• w.r.t. addition → e.g. for LOC, a header of the same size in every file can be ignored
• w.r.t. multiplication → e.g. for LOC, percentages and absolute values give the same result
Decomposability → explain inequality by partitioning the
population into groups
8. Theoretical comparison 6/11
Agg. technique    Domain   Range           Invariance   Decomposability
Mean              R        R               -            N/A
Sum               R        R               -            N/A
Cardinality       R        N               -            N/A
Gini Index        R+       [0, 1]          mult.        -
                  R        R               mult.        -
Theil Index       R+       [0, log n]      mult.        yes
Kolm Index        R        R+              add.         yes
Atkinson Index    R+       [0, 1 − 1/n]    mult.        -
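The invariance column can be checked numerically. The sketch below uses the common textbook definitions of the Theil, Atkinson, and Kolm indices. The parameter choices (`epsilon=0.5` for Atkinson, `alpha=1.0` for Kolm) are assumptions made for this example and are not stated on the slides. The data is a toy sample.

```python
import math
from statistics import mean


def theil(values):
    """Theil index: mean of (x/mu) * log(x/mu); zero values contribute 0."""
    mu = mean(values)
    return sum((x / mu) * math.log(x / mu) for x in values if x > 0) / len(values)


def atkinson(values, epsilon=0.5):
    """Atkinson index for 0 < epsilon < 1 (epsilon is an assumed parameter)."""
    mu = mean(values)
    power = 1.0 - epsilon
    equally_distributed = mean([x ** power for x in values]) ** (1.0 / power)
    return 1.0 - equally_distributed / mu


def kolm(values, alpha=1.0):
    """Kolm index with assumed parameter alpha > 0 (may overflow for huge spreads)."""
    mu = mean(values)
    return math.log(mean([math.exp(alpha * (mu - x)) for x in values])) / alpha


x = [10, 20, 30, 140]            # toy sample
scaled = [3 * v for v in x]      # multiplication by a constant
shifted = [v + 7 for v in x]     # addition of a constant

print(theil(x), theil(scaled))        # equal: invariant w.r.t. multiplication
print(atkinson(x), atkinson(scaled))  # equal: invariant w.r.t. multiplication
print(kolm(x), kolm(shifted))         # equal: invariant w.r.t. addition
print(theil(x), theil(shifted))       # differ: Theil is not addition-invariant
```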
9. Empirical comparison 7/11
Research questions:
Does LOC relate to bugs?
Do the aggregation techniques influence the presence/strength of
this relation?
Is there any difference between the aggregation techniques?
Do they express the same thing?
11. Empirical comparison 8/11
Case study: ArgoUML
Open-source, ∼ 1200 Java classes, ∼ 100 packages.
Methodology:
Tool chain to automatically process issue tracker and version
control system data.
Mapped defects to Java classes and then packages.
Measured SLOC of each class, aggregated to package level.
For each aggregation technique, statistically studied correlation
with bugs.
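The last two methodology steps can be sketched as follows. This is an illustration under stated assumptions, not the tool chain used in the study: the class names, SLOC values, and defect counts are hypothetical toy data, and Kendall's tau-a is used only as one possible rank correlation, since the slides do not name the coefficient.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def aggregate_by_package(class_sloc, aggregator):
    """Group class-level values by package and apply one aggregation function."""
    groups = defaultdict(list)
    for qualified_name, sloc in class_sloc.items():
        package = qualified_name.rsplit(".", 1)[0]  # drop the class name
        groups[package].append(sloc)
    return {package: aggregator(values) for package, values in groups.items()}


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)  # +1, -1, or 0 for ties
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: SLOC per class and defects per package.
class_sloc = {
    "app.ui.Window": 120, "app.ui.Dialog": 40, "app.ui.Icon": 10,
    "app.model.Node": 60, "app.model.Edge": 55,
    "app.util.Strings": 30, "app.util.Files": 300,
}
defects = {"app.ui": 4, "app.model": 1, "app.util": 6}

for name, aggregator in [("mean", mean), ("sum", sum), ("cardinality", len)]:
    per_package = aggregate_by_package(class_sloc, aggregator)
    packages = sorted(per_package)
    tau = kendall_tau_a([per_package[p] for p in packages],
                        [defects[p] for p in packages])
    print(f"{name}: tau = {tau:.2f}")
```

Any function that maps a list of numbers to one number, such as an inequality index, can be passed as `aggregator` in the same way.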
12. Results 9/11
             mean   I_Gini   I_Theil   I_Kolm    I_Atkinson   defects
mean                0.170    0.192     0.676¹    0.203        0.0096
I_Gini                       0.908     0.467     0.903        0.27
I_Theil                                0.488     0.918        0.273
I_Kolm                                           0.501        0.119
I_Atkinson                                                    0.229
I_Gini, I_Theil and I_Atkinson show the strongest correlation with the number of defects, and these correlations are statistically significant.
However, these three indices are also highly and significantly correlated with one another.
The mean shows the lowest correlation with the number of defects.
¹ Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface on the original slide.
13. Threats to validity 10/11
No control over the issue tracker → mapping of defects to classes.
• bugs missing from the issue tracker.
• bug fixes not showing up in the commit log.
How representative is the case? How about the version?
• replicate on more systems and more versions.
Is LOC the most suitable metric?
• replicate with more metrics.
14. Conclusions 11/11
Software metrics are not distributed normally.
[Figure: Histogram of SLOC(org.argouml.ui). x-axis: SLOC for classes in org.argouml.ui (0 to 500); y-axis: Frequency (0 to 25).]
Theoretical comparison.
Agg. technique    Domain   Range           Invariance   Decomposability
Mean              R        R               -            N/A
Sum               R        R               -            N/A
Cardinality       R        N               -            N/A
Gini Index        R+       [0, 1]          mult.        -
                  R        R               mult.        -
Theil Index       R+       [0, log n]      mult.        yes
Kolm Index        R        R+              add.         yes
Atkinson Index    R+       [0, 1 − 1/n]    mult.        -
Empirical comparison.
           mean   Gini    Theil   Kolm    Atkinson   defects
mean              0.170   0.192   0.676   0.203      0.0096
Gini                      0.908   0.467   0.903      0.27
Theil                             0.488   0.918      0.273
Kolm                                      0.501      0.119
Atkinson                                             0.229
Classical aggregation techniques have problems when distributions are skewed. Inequality indices look more promising.