Seeing the forest for the trees, UMons 2011
1. Seeing the forest for the trees
Bogdan Vasilescu
b.n.vasilescu@tue.nl
http://www.win.tue.nl/~bvasiles/
Software Engineering and Technology group
Eindhoven University of Technology
November 23, 2011
2. Eindhoven 2/21
/ department of mathematics and computer science
5. Computer Science @TU/e 3/21
Section Model Driven Software Engineering (MDSE)
Group Software Engineering and Technology (SET)
Mark van den Brand, Alexander Serebrenik
6. Interested in . . . 4/21
Software evolution
Aggregation of code metrics
Activity in open-source projects
Computational geometry
8. Aggregation of software metrics 5/21
Maintaining a software system is like renovating a house.
Maintainability assessment precedes changing the software.
Metrics are often applied to measure maintainability.
But metrics are defined at a low level (method, class).
We need aggregation techniques.
10. Traditional aggregation techniques 7/21
Standard summary statistics: mean, median, ...
Red line – mean; blue line – median
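A minimal sketch of why the two summary statistics can tell different stories about the same system. The SLOC values are hypothetical, chosen only to mimic many small classes and one very large one.

```python
from statistics import mean, median

# Hypothetical SLOC-per-class values: many small classes, one very large one.
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 2500]

# The mean is pulled up by the single large class.
print(f"mean   = {mean(sloc):.1f}")
# The median does not reflect how large that class is.
print(f"median = {median(sloc):.1f}")
```

Neither number alone describes how the code is spread over the classes.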
11. Recent trend: Inequality indices 8/21
Econometrics: measure/explain the inequality of income or wealth.
Software metrics and econometric variables have distributions with
similar shapes.
[Figure: two histograms. Left: Source Lines of Code, freecol-0.9.4 (x-axis: SLOC per class; y-axis: Frequency). Right: Household income in Ilocos, Philippines (1998) (x-axis: Income; y-axis: Frequency).]
12. Degree of concentration of functionality 9/21
[Figure: Lorenz curve for SLOC in Hibernate 3.6.0-beta4. x-axis: % Classes (0.0 to 1.0); y-axis: % SLOC (0.0 to 1.0). A is the area between the line of equality and the Lorenz curve; B is the area under the Lorenz curve.]
$I_{\mathrm{Gini}} = \frac{A}{A + B} = 2A$
$I_{\mathrm{Hoover}}$: the maximum vertical distance between the line of equality and the Lorenz curve.
Measure inequality between:
individuals (e.g., classes)
groups (e.g., components)
Often desirable to assess the contribution of the inequality
between the groups.
Decomposable indices
Root-cause analysis
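A sketch of how the two indices can be read off a Lorenz curve, assuming the usual meaning of the figure labels: A is the area between the diagonal and the curve, B the area under the curve, so A + B = 1/2. The function names and sample values are illustrative, not taken from the talk.

```python
def lorenz_points(values):
    """Cumulative (share of classes, share of SLOC) points of the Lorenz curve."""
    xs = sorted(values)
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini = 2A, where A = 1/2 - B and B is the area under the Lorenz curve."""
    pts = lorenz_points(values)
    # Trapezoid rule over the piecewise-linear curve gives B exactly.
    b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * b


def hoover(values):
    """Hoover = largest vertical gap between the diagonal and the Lorenz curve."""
    return max(p - share for p, share in lorenz_points(values))


# Hypothetical SLOC per class.
print(gini([10, 10, 10, 10]), hoover([10, 10, 10, 10]))  # perfectly equal
print(gini([0, 0, 0, 100]), hoover([0, 0, 0, 100]))      # all code in one class
```

Because the curve is convex and piecewise linear, checking the gap at its vertices is enough to find the maximum.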
15. Traceability via decomposability 10/21
Which individuals (classes in package) contribute to 80% of the
inequality (of SLOC)?
Which class contributes the most to the inequality?
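One way to make these questions concrete, using the Theil index as an example of a decomposable index. The slide does not say which index or attribution rule is meant, so the choice of Theil, the package names and the SLOC values below are assumptions for illustration.

```python
from math import log


def theil_terms(values):
    """Per-class terms of the Theil index; the index is their sum (values > 0)."""
    n = len(values)
    mu = sum(values) / n
    return [(x / mu) * log(x / mu) / n for x in values]


def theil(values):
    return sum(theil_terms(values))


def decompose(groups):
    """Split the Theil index of all values into within- and between-group parts."""
    everything = [x for g in groups.values() for x in g]
    total = sum(everything)
    mu = total / len(everything)
    within = between = 0.0
    for g in groups.values():
        share = sum(g) / total          # share of all SLOC held by this group
        mu_g = sum(g) / len(g)
        within += share * theil(g)
        between += share * log(mu_g / mu)
    return within, between


# Hypothetical SLOC per class, grouped by package.
packages = {"core": [120, 150, 900], "util": [30, 40, 50, 60], "ui": [200, 220]}
all_classes = [x for g in packages.values() for x in g]

within, between = decompose(packages)
print(f"total = {theil(all_classes):.4f}, within = {within:.4f}, between = {between:.4f}")

# The class with the largest individual term.
terms = theil_terms(all_classes)
print("largest term:", all_classes[terms.index(max(terms))], "SLOC")
```

The within and between parts add up to the index of the whole system, which is what allows the inequality to be traced back to groups and individuals.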
16. Other properties of inequality indices 11/21
Symmetry
Inequality stays the same for any permutation of the population.
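A small check of symmetry, sketched with the Gini index written as a normalised mean absolute difference. The SLOC values are hypothetical.

```python
import random


def gini(values):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


sloc = [12, 40, 7, 300, 55, 18]     # hypothetical SLOC per class
shuffled = sloc[:]
random.Random(0).shuffle(shuffled)  # same classes, different order

print(gini(sloc), gini(shuffled))
```

The formula sums over all pairs of classes, so the order in which the classes are listed cannot matter.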
19. Other properties of inequality indices 12/21
Population principle
Inequality does not change if the population is replicated any number of
times.
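The population principle can be checked in the same way. This sketch uses the Hoover index in its closed form (half the relative mean absolute deviation); the values are hypothetical.

```python
def hoover(values):
    """Hoover index: sum of absolute deviations from the mean over twice the total."""
    total = sum(values)
    mu = total / len(values)
    return sum(abs(x - mu) for x in values) / (2 * total)


sloc = [12, 40, 7, 300, 55, 18]   # hypothetical SLOC per class
replicated = sloc * 3             # every class appears three times

print(hoover(sloc), hoover(replicated))
```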
22. Other properties of inequality indices 13/21
Transfers principle
A transfer from a rich man to a poor man (without reversing their
position) should decrease inequality.
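A sketch of the transfers principle for the Gini index. The `transfer` helper and the values are hypothetical; the helper refuses transfers that would reverse the order of the two individuals.

```python
def gini(values):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


def transfer(values, rich, poor, amount):
    """Move `amount` from index `rich` to index `poor` without reversing their order."""
    if values[rich] - amount < values[poor] + amount:
        raise ValueError("transfer would reverse the two positions")
    result = list(values)
    result[rich] -= amount
    result[poor] += amount
    return result


before = [10, 20, 30, 100]
after = transfer(before, rich=3, poor=0, amount=20)

print(before, gini(before))
print(after, gini(after))
```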
26. Other properties of inequality indices 14/21
Scale invariance: Gini, Theil, Atkinson, Hoover
Inequality does not change if all values are multiplied by the same
constant.
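A sketch of scale invariance for the Theil index, contrasted with adding a constant, which does change the result. The values and the factors are hypothetical.

```python
from math import log


def theil(values):
    """Theil index; requires strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values) / n


sloc = [10, 20, 30, 100]             # hypothetical SLOC per class
scaled = [1000 * x for x in sloc]    # every value multiplied by the same constant
shifted = [x + 1000 for x in sloc]   # every value increased by the same constant

print(theil(sloc), theil(scaled), theil(shifted))
```

Only the ratios x / mean enter the formula, so multiplying every value by a constant cancels out, while adding a constant does not.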
28. Summary 15/21
Ineq. index                       Sym.  Inv.  Dec.  Pop.  Tra.
$I_{\mathrm{Gini}}$                      ×
$I_{\mathrm{Theil}}$                     ×
$I_{\mathrm{MLD}}$                       ×
$I_{\mathrm{Hoover}}$                    ×
$I_{\mathrm{Atkinson}}^{\alpha}$         ×
$I_{\mathrm{Kolm}}^{\beta}$              +
Inv.: × = invariant under multiplication by a constant (scale invariance); + = invariant under addition of a constant (translation invariance).
Problems include:
Domain is not always $\mathbb{R}^n$.
No distinction between all values equal but low, and all values
equal but high.
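Both problems can be reproduced in a few lines. This sketch uses the mean logarithmic deviation (MLD); the `is_defined` helper and the sample values are hypothetical.

```python
from math import log
from statistics import mean


def mld(values):
    """Mean logarithmic deviation; only defined for strictly positive values."""
    mu = mean(values)
    return mean(log(mu / x) for x in values)


def is_defined(index, values):
    """Report whether `index` can be computed for `values` at all."""
    try:
        index(values)
        return True
    except (ValueError, ZeroDivisionError):
        return False


# All values equal but low, and all values equal but high: the index is 0 in
# both cases, while the mean tells the two systems apart.
print(mld([1, 1, 1, 1]), mld([1000, 1000, 1000, 1000]))

# A metric value of 0 falls outside the domain of the index.
print(is_defined(mld, [1, 3, 5]), is_defined(mld, [0, 3, 5]))
```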
30. Our research 16/21
31. Which are redundant? 17/21
$I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Atkinson}}$, and $I_{\mathrm{Hoover}}$ always convey the same information.
[Figure: two panels (SLOC and DIT), y-axis from -1.0 to 1.0, one entry per pair of indices, with a percentage printed under each pair.
SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%).
DIT: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%).]
32. Is the correlation meaningful? 18/21
Superlinear (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Gini}}$) and chaotic (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Kolm}}$) patterns can be observed in the scatter plots.
[Figure: two scatter plots for compiere. Left: Theil (SLOC) against Gini (SLOC), Kendall: 0.94, p-value: 0.00. Right: Theil (SLOC) against Kolm (SLOC), Kendall: 0.25, p-value: 0.01.]
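A sketch of how a Kendall correlation between two indices could be computed over a set of packages. The slides do not state which variant of Kendall's coefficient was used; the version below is tau-a (no tie correction), and the packages are hypothetical.

```python
from itertools import combinations


def gini(values):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


def hoover(values):
    """Hoover index: sum of absolute deviations from the mean over twice the total."""
    total = sum(values)
    mu = total / len(values)
    return sum(abs(x - mu) for x in values) / (2 * total)


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical packages, each a list of SLOC-per-class values.
packages = [
    [50, 50, 50, 50],
    [10, 20, 30, 100],
    [5, 5, 5, 500],
    [40, 60, 80, 100],
    [1, 2, 3, 4, 200],
]
gini_scores = [gini(p) for p in packages]
hoover_scores = [hoover(p) for p in packages]
tau = kendall_tau_a(gini_scores, hoover_scores)
print(f"Kendall tau-a (Gini vs. Hoover) = {tau:.2f}")
```

A rank correlation only says that two indices order the packages in the same way; it says nothing about the shape of the relation, which is why the scatter plots are inspected as well.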
33. Does the aggregation level matter? 19/21
Changing the aggregation level to class level does not affect the
correlation between various aggregation techniques as measured at
package level.
[Figure: three plots of the Kendall correlation coefficient (y-axis, -1.0 to 1.0) for SLOC: Gini - Theil (100%), Theil - Atkinson (100%), Theil - MLD (100%).]
35. References 21/21
A. Serebrenik and M. G. J. van den Brand.
Theil index for aggregation of software metrics values.
In Int. Conf. on Software Maintenance, pages 1–9. IEEE, 2010.
B. Vasilescu.
Analysis of advanced aggregation techniques for software metrics.
Master’s thesis, Eindhoven, The Netherlands, July 2011.
B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand.
By no means: A study on aggregating software metrics.
In 2nd International Workshop on Emerging Trends in Software Metrics,
Honolulu, Hawaii, USA, 2011.
B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand.
You can’t control the unfamiliar: A study on the relations between
aggregation techniques for software metrics.
In Int. Conf. on Software Maintenance. IEEE, 2011.