Physical and
Conceptual
Identifier
Dispersion
Venera
Arnaoudova, Laleh
Eshkevari, Rocco
Oliveto, Yann-Gaël
Guéhéneuc,
Giuliano Antoniol
Introduction
Our study
Dispersion
measures
Our study - refined
Case study
RQ1 – Metric Relevance
RQ2 – Relation to Faults
Conclusions and
future work
Physical and Conceptual Identifier Dispersion:
Measures and Relation to Fault Proneness
Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto,
Yann-Gaël Guéhéneuc, Giuliano Antoniol
SOCCER Lab. – DGIGL, École Polytechnique de Montréal, Qc, Canada
SE@SA Lab – DMI, University of Salerno, Salerno, Italy
Ptidej Team – DGIGL, École Polytechnique de Montréal, Qc, Canada
September 15, 2010
SOftware Cost-effective Change and Evolution Research Lab
Software Engineering @ SAlerno
Pattern Trace Identification, Detection, and Enhancement in Java
Outline
Introduction
Our study
Dispersion measures
Our study - refined
Case study
RQ1 – Metric Relevance
RQ2 – Relation to Faults
Conclusions and future work
Introduction
Fault identification
size (e.g., [Gyimóthy et al., 2005])
cohesion (e.g., [Liu et al., 2009])
coupling (e.g., [Marcus et al., 2008])
number of changes (e.g., [Zimmermann et al., 2007])
Importance of linguistic information
program comprehension (e.g.,
[Takang et al., 1996, Deissenboeck and Pizka, 2006,
Haiduc and Marcus, 2008, Binkley et al., 2009])
code quality (e.g., [Marcus et al., 2008,
Poshyvanyk and Marcus, 2006, Butler et al., 2009])
Our study
Term dispersion
We are interested in studying the relation between term
dispersion and the quality of the source code.
term: basic component of identifiers
dispersion: the way terms are scattered among different entities (attributes and methods)
quality: absence of faults
Example: What is the impact of using getRelativePath,
returnAbsolutePath, and setPath as method names on
the fault proneness of those methods?
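The slides do not say how identifiers are broken into terms. As a minimal sketch, assuming a plain camel-case split (the function name `split_identifier` and the regular expression are illustrative choices, not the authors' tooling), the three method names in the example share the term "path":

```python
import re

def split_identifier(identifier):
    """Split a camelCase identifier into lower-cased terms.

    Assumption: terms are delimited by case changes and digit runs.
    The slides do not describe the actual splitting procedure.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]

methods = ["getRelativePath", "returnAbsolutePath", "setPath"]
terms_by_method = {name: split_identifier(name) for name in methods}

# Terms that occur in every one of the three method names.
shared = set.intersection(*(set(terms) for terms in terms_by_method.values()))

for name, terms in terms_by_method.items():
    print(name, terms)
print("shared terms:", shared)
```

A term such as "path" that recurs across several entities is the kind of term whose dispersion the study measures.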
Dispersion measures (1/3)
Physical dispersion - Entropy
[Figure: terms (fee, foo, bar) plotted against entities (E1 to E5), with the resulting entropy of each term.]
The circle indicates the occurrences of a term in an entity.
The larger the circle, the higher the number of occurrences.
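The slide shows the idea graphically but gives no formula. A minimal sketch, assuming term entropy is the Shannon entropy of a term's occurrence distribution over entities (the function name `term_entropy` and all counts below are hypothetical):

```python
import math

def term_entropy(occurrences):
    """Shannon entropy (in bits) of a term's occurrences across entities.

    `occurrences` maps an entity name to the number of times the term
    occurs in it. This definition is an assumption for illustration;
    the slides do not state the formula.
    """
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in occurrences.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical occurrence counts over the five entities of the figure.
fee = {"E1": 4, "E2": 0, "E3": 0, "E4": 0, "E5": 0}  # concentrated in one entity
bar = {"E1": 1, "E2": 1, "E3": 1, "E4": 1, "E5": 1}  # spread evenly

print("fee:", term_entropy(fee))
print("bar:", term_entropy(bar))
```

Under this assumption a term confined to a single entity has zero entropy, and a term spread evenly over all entities has the maximum, log2 of the number of entities.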
Dispersion measures (2/3)
Conceptual dispersion - Context Coverage
[Figure: entities E1 to E5 grouped into entity contexts C1 to C4.]
Entity contexts are identified taking into account the terms contained in the entities.
[Figure: terms (fee, foo, bar) plotted against contexts (C1 to C4), with the resulting context coverage of each term.]
The star indicates that the term appears in the particular context.
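The slide does not define context coverage numerically. As a deliberately simplified proxy, not the authors' definition, the sketch below scores a term by the fraction of contexts in which it appears; the context contents and the name `context_coverage` are hypothetical:

```python
def context_coverage(term, contexts):
    """Fraction of contexts in which `term` appears.

    Simplified proxy chosen for illustration only; the slides show the
    idea (stars in a term-by-context grid) but not the actual formula.
    """
    if not contexts:
        return 0.0
    hits = sum(1 for terms in contexts.values() if term in terms)
    return hits / len(contexts)

# Hypothetical contexts, each represented by the set of terms it contains.
contexts = {
    "C1": {"fee", "foo"},
    "C2": {"foo"},
    "C3": {"foo", "bar"},
    "C4": {"foo"},
}

for term in ("fee", "foo", "bar"):
    print(term, context_coverage(term, contexts))
```

With these made-up contexts, "foo" appears everywhere and gets the highest score, while "fee" and "bar" each appear in a single context.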
Dispersion measures (3/3)
Aggregated metric - numHEHCC
[Figure: terms plotted by entropy (H) against context coverage (CC); the thresholds th_H and th_CC split the plane into four regions.]
Low H, low CC: the term is used in few identifiers and in similar contexts.
High H, low CC: the term is used in many identifiers and in similar contexts.
Low H, high CC: the term is used in few identifiers and in different contexts.
High H, high CC: the term is used in many identifiers and in different contexts (!).
For each entity, numHEHCC counts the number of terms in the last region, i.e., terms with high entropy and high context coverage.
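The counting step can be sketched as follows. The entropy and coverage values, the thresholds, and the choice of a strict comparison are all assumptions made for illustration; the slides do not state how the thresholds are chosen.

```python
def num_hehcc(entity_terms, entropy, coverage, th_h, th_cc):
    """Count the distinct terms of one entity whose entropy and context
    coverage both exceed their thresholds.

    Assumptions: strict '>' comparison and distinct terms; neither is
    specified on the slides.
    """
    return sum(
        1
        for term in entity_terms
        if entropy[term] > th_h and coverage[term] > th_cc
    )

# Hypothetical per-term measurements.
entropy = {"get": 2.1, "path": 1.8, "relative": 0.3}
coverage = {"get": 0.9, "path": 0.4, "relative": 0.2}

# Terms of a single entity, e.g. a method named getRelativePath.
entity_terms = {"get", "relative", "path"}

count = num_hehcc(entity_terms, entropy, coverage, th_h=1.5, th_cc=0.5)
print("numHEHCC:", count)
```

Here only "get" lies above both thresholds: "path" has high entropy but low coverage, and "relative" is low on both.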
Our study - refined
(1/2)
Research question 1
RQ1 – Metric Relevance: Does numHEHCC capture
characteristics different from size?
Our belief: Yes, it does, although we expect some overlap.
To this end, we verify the following:
1. To what extent do numHEHCC and size vary together?
2. Can size explain numHEHCC?
3. Does numHEHCC bring additional information to size for fault explanation?
Our study - refined
(2/2)
Research question 2
RQ2 – Relation to Faults: Do term entropy and
context coverage help to explain the presence of faults
in an entity?
Our belief: Yes, they do!
How?
1. Estimate the risk of being faulty when entities contain
terms with high entropy and high context coverage.
Objects
ArgoUML v0.16 – a UML modeling CASE tool.
Rhino v1.4R3 – a JavaScript/ECMAScript interpreter
and compiler.
Program   LOC      # Entities   # Terms
ArgoUML   97,946   12,423       2,517
Rhino     18,163   1,624        949
We consider as entities both methods and attributes.
Case study
RQ1 – Metric Relevance (1/3)
Results for RQ1 – Metric Relevance
To what extent do numHEHCC and size vary together?
ArgoUML: 40%
Rhino: 43%
[Figure: correlation between numHEHCC and LOC.]
Case study
RQ1 – Metric Relevance (2/3)
Results for RQ1 – Metric Relevance
Can size explain numHEHCC?
ArgoUML: 17%
Rhino: 19%
Composition of numHEHCC.
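The slides do not name the statistics behind the two sets of percentages. A sketch of one plausible reading, assuming Pearson correlation for "vary together" and the R² of a least-squares fit of numHEHCC on LOC for "explain" (the data points are hypothetical):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def r_squared(xs, ys):
    """R^2 of the least-squares line ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical per-entity measurements.
loc = [5, 12, 20, 33, 47, 60]
num_hehcc = [0, 2, 1, 4, 2, 5]

r = pearson_r(loc, num_hehcc)
r2 = r_squared(loc, num_hehcc)
print("r =", r, "R^2 =", r2)
```

With a single predictor, R² equals the square of r. Under this reading, 0.40² = 0.16 and 0.43² ≈ 0.18 are close to the reported 17% and 19%, but the slides do not confirm that these are the statistics used.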
Case study
RQ1 – Metric Relevance (3/3)
Results for RQ1 – Metric Relevance (cont’d)
Does numHEHCC bring additional information to size
for fault explanation?
Model       Variable       Coefficient   p-value
M_ArgoUML   Intercept      -1.688e+00    2e-16
            LOC             7.703e-03    8.34e-10
            numHEHCC        7.490e-02    1.42e-05
            LOC:numHEHCC   -2.819e-04    0.000211
M_Rhino     Intercept      -4.9625130    2e-16
            LOC             0.0041486    0.17100
            numHEHCC        0.2446853    0.00310
            LOC:numHEHCC   -0.0004976    0.29788
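To show how such coefficients would be read, here is a sketch that assumes the models are logistic regressions with a LOC:numHEHCC interaction term (the slides list the coefficients but do not name the model family). The coefficients are the ArgoUML ones from the table; the LOC and numHEHCC inputs are hypothetical.

```python
import math

# ArgoUML coefficients, copied from the table.
INTERCEPT = -1.688
B_LOC = 7.703e-03
B_NUM_HEHCC = 7.490e-02
B_INTERACTION = -2.819e-04

def fault_probability(loc, num_hehcc):
    """Predicted fault probability under an assumed logistic model:
    logit = b0 + b1*LOC + b2*numHEHCC + b3*LOC*numHEHCC.
    """
    logit = (
        INTERCEPT
        + B_LOC * loc
        + B_NUM_HEHCC * num_hehcc
        + B_INTERACTION * loc * num_hehcc
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical entities of equal size but different numHEHCC.
p_low = fault_probability(loc=50, num_hehcc=0)
p_high = fault_probability(loc=50, num_hehcc=5)
print("numHEHCC = 0:", p_low)
print("numHEHCC = 5:", p_high)
```

Under this assumed model the marginal effect of numHEHCC on the logit is `B_NUM_HEHCC + B_INTERACTION * loc`, which stays positive for entities shorter than roughly 266 LOC because the interaction coefficient is negative.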
Case study
Results for RQ2 – Relation to Faults (1/1)
The risk of being faulty when entities contain terms with high entropy and high context coverage.
[Figure: all entities, with a subset labelled numHEHCC covering 10% of the entities.]
Risk of being faulty?
ArgoUML: 2 times higher
Rhino: 6 times higher
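The slide does not say how the risk is computed. Two common choices are the relative risk and the odds ratio of a 2x2 table (faulty or not, inside or outside the selected subset). The sketch below shows both on hypothetical counts, which are not data from the study:

```python
def relative_risk(faulty_in, total_in, faulty_out, total_out):
    """Ratio of the fault rate inside the subset to the rate outside it."""
    return (faulty_in / total_in) / (faulty_out / total_out)

def odds_ratio(faulty_in, total_in, faulty_out, total_out):
    """Ratio of the odds of being faulty inside the subset to the odds outside."""
    odds_in = faulty_in / (total_in - faulty_in)
    odds_out = faulty_out / (total_out - faulty_out)
    return odds_in / odds_out

# Hypothetical counts: 1,000 entities, 100 of them in the selected subset.
faulty_in, total_in = 30, 100      # selected subset
faulty_out, total_out = 90, 900    # remaining entities

rr = relative_risk(faulty_in, total_in, faulty_out, total_out)
oratio = odds_ratio(faulty_in, total_in, faulty_out, total_out)
print("relative risk:", rr)
print("odds ratio:", oratio)
```

The two measures differ whenever faults are not rare, so a statement such as "2 times higher" needs the underlying measure to be interpreted precisely.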
Conclusions and future work
Conclusions
Entropy and context coverage, together, capture
characteristics different from size!
Entropy and context coverage, together, help to explain
the presence of faults in entities!
Future directions
Replicate the study on other systems.
Use entropy and context coverage to suggest
refactoring.
Study the impact of lexicon evolution on entropy and
context coverage.
Thank you!
Questions?
Binkley, D., Davis, M., Lawrie, D., and Morrell, C.
(2009).
To CamelCase or Under_score.
In Proceedings of 17th IEEE International Conference on
Program Comprehension. IEEE CS Press.
Butler, S., Wermelinger, M., Yu, Y., and Sharp, H.
(2009).
Relating identifier naming flaws and code quality: An
empirical study.
In Proceedings of the 16th Working Conference on
Reverse Engineering, pages 31–35. IEEE CS Press.
Deissenboeck, F. and Pizka, M. (2006).
Concise and consistent naming.
Software Quality Journal, 14(3):261–282.
Gyimóthy, T., Ferenc, R., and Siket, I. (2005).
Empirical validation of object-oriented metrics on open
source software for fault prediction.
IEEE Transactions on Software Engineering,
31(10):897–910.
Haiduc, S. and Marcus, A. (2008).
On the use of domain terms in source code.
In Proceedings of 16th IEEE International Conference on
Program Comprehension, pages 113–122. IEEE CS
Press.
Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and
Chrisochoides, N. (2009).
Modelling class cohesion as mixtures of latent topics.
In Proceedings of 25th IEEE International Conference on
Software Maintenance, pages 233–242, Edmonton,
Canada. IEEE CS Press.
Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008).
Using the conceptual cohesion of classes for fault
prediction in object-oriented systems.
IEEE Transactions on Software Engineering,
34(2):287–300.
Poshyvanyk, D. and Marcus, A. (2006).
The conceptual coupling metrics for object-oriented
systems.
In Proceedings of 22nd IEEE International Conference on
Software Maintenance, pages 469–478. IEEE CS Press.
Takang, A., Grubb, P., and Macredie, R. (1996).
The effects of comments and identifier names on
program comprehensibility: an experiential study.
Journal of Programming Languages, 4(3):143–167.
Zimmermann, T., Premraj, R., and Zeller, A. (2007).
Predicting defects for Eclipse.
In Proceedings of the Third International Workshop on
Predictor Models in Software Engineering.
Enabling multi tenancy(An Industrial Experience Report)Enabling multi tenancy(An Industrial Experience Report)
Enabling multi tenancy(An Industrial Experience Report)
 
Ponsini automatic slides
Ponsini automatic slidesPonsini automatic slides
Ponsini automatic slides
 
Studying the impact of dependency network measures on software quality
Studying the impact of dependency network measures on software quality	Studying the impact of dependency network measures on software quality
Studying the impact of dependency network measures on software quality
 
Icsm2010 Announcement
Icsm2010 AnnouncementIcsm2010 Announcement
Icsm2010 Announcement
 

Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness

  • 1. Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness. Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto, Yann-Gaël Guéhéneuc, Giuliano Antoniol. SOCCER Lab. (SOftware Cost-effective Change and Evolution Research Lab) – DGIGL, École Polytechnique de Montréal, Qc, Canada; SE@SA Lab (Software Engineering @ SAlerno) – DMI, University of Salerno, Salerno, Italy; Ptidej Team (Pattern Trace Identification, Detection, and Enhancement in Java) – DGIGL, École Polytechnique de Montréal, Qc, Canada. September 15, 2010.
  • 2. Outline: Introduction; Our study; Dispersion measures; Our study - refined; Case study (RQ1 – Metric Relevance, RQ2 – Relation to Faults); Conclusions and future work.
  • 3. Introduction. Fault identification: size (e.g., [Gyimóthy et al., 2005]); cohesion (e.g., [Liu et al., 2009]); coupling (e.g., [Marcus et al., 2008]); number of changes (e.g., [Zimmermann et al., 2007]). Importance of linguistic information: program comprehension (e.g., [Takang et al., 1996, Deissenboeck and Pizka, 2006, Haiduc and Marcus, 2008, Binkley et al., 2009]); code quality (e.g., [Marcus et al., 2008, Poshyvanyk and Marcus, 2006, Butler et al., 2009]).
  • 4. Our study: term dispersion. We are interested in studying the relation between term dispersion and the quality of the source code. Term: the basic component of identifiers. Dispersion: the way terms are scattered among different entities (attributes and methods). Quality: absence of faults. Example: What is the impact of using getRelativePath, returnAbsolutePath, and setPath as method names on the fault proneness of those methods?
  • 5. Dispersion measures (1/3): physical dispersion, entropy. [Figure: terms (fee, foo, bar) against entities E1 to E5, with the entropy of each term.] A circle indicates the occurrences of a term in an entity; the larger the circle, the higher the number of occurrences.
  • 6. Dispersion measures (2/3): conceptual dispersion, context coverage. [Figure: entities E1 to E5 grouped into entity contexts C1 to C4.] Entity contexts are identified taking into account the terms contained in the entities. [Figure: terms (fee, foo, bar) against contexts C1 to C4, with the context coverage of each term.] A star indicates that the term appears in the particular context.
  • 7–17. Dispersion measures (3/3): aggregated metric, numHEHCC. [Figure, built up over eleven animation frames: a plane with entropy (H) and context coverage (CC) axes, split into four regions by the thresholds th_H and th_CC.] The regions are annotated in turn:
      H: used in few identifiers; CC: used in similar contexts.
      H: used in many identifiers; CC: used in similar contexts.
      H: used in few identifiers; CC: used in different contexts.
      H: used in many identifiers; CC: used in different contexts (flagged with "!").
      For each entity, numHEHCC counts the number of such terms, that is, terms used in many identifiers and in different contexts.
  • 18. Our study - refined (1/2). Research question 1, RQ1 – Metric Relevance: Does numHEHCC capture characteristics different from size? Our belief: Yes, it does, although we expect some overlap. To this end, we verify the following: 1. To what extent do numHEHCC and size vary together? 2. Can size explain numHEHCC? 3. Does numHEHCC bring additional information to size for fault explanation?
  • 19. Our study - refined (2/2). Research question 2, RQ2 – Relation to Faults: Do term entropy and context coverage help to explain the presence of faults in an entity? Our belief: Yes, they do! How? 1. Estimate the risk of being faulty when entities contain terms with high entropy and high context coverage.
  • 20. Objects. ArgoUML v0.16: a UML modeling CASE tool. Rhino v1.4R3: a JavaScript/ECMAScript interpreter and compiler. We consider as entities both methods and attributes.
      Program   LOC      # Entities   # Terms
      ArgoUML   97,946   12,423       2,517
      Rhino     18,163   1,624        949
  • 21. Case study: results for RQ1 – Metric Relevance (1/3). To what extent do numHEHCC and size vary together? Correlation between numHEHCC and LOC: ArgoUML 40%; Rhino 43%.
  • 22. Case study: results for RQ1 – Metric Relevance (2/3). Can size explain numHEHCC? ArgoUML: 17%; Rhino: 19%. [Figure: composition of numHEHCC.]
  • 23. Case study: results for RQ1 – Metric Relevance (3/3, cont'd). Does numHEHCC bring additional information to size for fault explanation?
      Model       Variable        Coefficient    p-value
      M_ArgoUML   Intercept       -1.688e+00     2e-16
                  LOC             7.703e-03      8.34e-10
                  numHEHCC        7.490e-02      1.42e-05
                  LOC:numHEHCC    -2.819e-04     0.000211
      M_Rhino     Intercept       -4.9625130     2e-16
                  LOC             0.0041486      0.17100
                  numHEHCC        0.2446853      0.00310
                  LOC:numHEHCC    -0.0004976     0.29788
  • 24–28. Case study: results for RQ2 – Relation to Faults (1/1). The risk of being faulty when entities contain terms with high entropy and high context coverage. [Figure, built up over five animation frames: all entities, with the subset labelled "numHEHCC: 10% of the entities" highlighted.] Risk of being faulty? ArgoUML: 2 × higher; Rhino: 6 × higher.
  • 29. Conclusions and future work. Conclusions: Entropy and context coverage, together, capture characteristics different from size! Entropy and context coverage, together, help to explain the presence of faults in entities! Future directions: Replicate the study on other systems. Use entropy and context coverage to suggest refactoring. Study the impact of lexicon evolution on entropy and context coverage.
  • 30. Thank you! Questions?
  • 31. References (1/3).
      Binkley, D., Davis, M., Lawrie, D., and Morrell, C. (2009). To CamelCase or Under_score. In Proceedings of 17th IEEE International Conference on Program Comprehension. IEEE CS Press.
      Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2009). Relating identifier naming flaws and code quality: An empirical study. In Proceedings of the 16th Working Conference on Reverse Engineering, pages 31–35. IEEE CS Press.
      Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):261–282.
      Gyimóthy, T., Ferenc, R., and Siket, I. (2005). Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897–910.
  • 32. References (2/3).
      Haiduc, S. and Marcus, A. (2008). On the use of domain terms in source code. In Proceedings of 16th IEEE International Conference on Program Comprehension, pages 113–122. IEEE CS Press.
      Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009). Modelling class cohesion as mixtures of latent topics. In Proceedings of 25th IEEE International Conference on Software Maintenance, pages 233–242, Edmonton, Canada. IEEE CS Press.
      Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008). Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Transactions on Software Engineering, 34(2):287–300.
  • 33. References (3/3).
      Poshyvanyk, D. and Marcus, A. (2006). The conceptual coupling metrics for object-oriented systems. In Proceedings of 22nd IEEE International Conference on Software Maintenance, pages 469–478. IEEE CS Press.
      Takang, A., Grubb, P., and Macredie, R. (1996). The effects of comments and identifier names on program comprehensibility: an experiential study. Journal of Program Languages, 4(3):143–167.
      Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering.
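
A rough sketch of how the dispersion measures on the "Dispersion measures" slides might be computed. The slides give no formulas, so everything here is an assumption made for illustration: entropy is taken as the Shannon entropy of a term's occurrences across entities, normalized to [0, 1]; context coverage is taken as the fraction of entity contexts in which the term appears; and the entity-to-context mapping is supplied by hand rather than derived from the terms. The function names, the toy data (fee, foo, bar over E1 to E5 and C1 to C4), and the threshold values are hypothetical and not taken from the study.

```python
import math
from collections import Counter, defaultdict


def term_entropy(entity_terms):
    """Normalized Shannon entropy of each term's occurrences over entities.

    Assumption: normalizing by log2(number of entities) keeps values in [0, 1].
    """
    counts = defaultdict(Counter)  # term -> Counter(entity -> occurrences)
    for entity, terms in entity_terms.items():
        for term in terms:
            counts[term][entity] += 1
    max_h = math.log2(len(entity_terms)) if len(entity_terms) > 1 else 1.0
    result = {}
    for term, per_entity in counts.items():
        total = sum(per_entity.values())
        h = sum(-(c / total) * math.log2(c / total) for c in per_entity.values())
        result[term] = h / max_h
    return result


def context_coverage(entity_terms, entity_context):
    """Fraction of contexts in which each term appears (assumed definition)."""
    all_contexts = set(entity_context.values())
    seen = defaultdict(set)  # term -> set of contexts containing it
    for entity, terms in entity_terms.items():
        for term in terms:
            seen[term].add(entity_context[entity])
    return {term: len(ctx) / len(all_contexts) for term, ctx in seen.items()}


def num_hehcc(entity_terms, entity_context, th_h, th_cc):
    """Per entity, count distinct terms above both thresholds."""
    h = term_entropy(entity_terms)
    cc = context_coverage(entity_terms, entity_context)
    flagged = {t for t in h if h[t] > th_h and cc[t] > th_cc}
    return {e: len(flagged & set(terms)) for e, terms in entity_terms.items()}


# Hypothetical toy data: terms contained in each entity, and a hand-assigned context.
entity_terms = {
    "E1": ["fee", "foo"],
    "E2": ["foo", "bar"],
    "E3": ["foo"],
    "E4": ["foo", "bar", "bar"],
    "E5": ["foo"],
}
entity_context = {"E1": "C1", "E2": "C2", "E3": "C3", "E4": "C4", "E5": "C4"}

if __name__ == "__main__":
    print(term_entropy(entity_terms))
    print(context_coverage(entity_terms, entity_context))
    print(num_hehcc(entity_terms, entity_context, th_h=0.5, th_cc=0.5))
```

In this toy data only foo is spread evenly over all entities and all contexts, so it is the only term that lands in the "many identifiers, different contexts" region, and every entity gets a count of one. How the thresholds th_H and th_CC were chosen in the study is not stated on the slides.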