Statistical distributions of software metrics: do
                      they matter?

                                     Israel Herraiz

                          Technical University of Madrid


                         israel.herraiz@upm.es


                               Grab these slides from
     http://slideshare.net/herraiz/statistical-distributions-of-metrics




Israel Herraiz, UPM       Statistical distributions of software metrics: do they matter?   1/17
Outline



1    Some background


2    Statistical properties of software metrics


3    Evidence of impact on quality


4    Summary of findings and further work




A (not so) long time ago...



Statistical distribution of software metrics
Software size follows a double Pareto distribution
“Towards a theoretical model for software growth” MSR 2007

More recently
Not only size, but some OO metrics too (and some complexity metrics)
“On the Statistical Distribution of Object-Oriented System Properties” WETSoM 2012




OK, but what is that double Pareto thing?
[Figure: P[X > x] versus SLOC on logarithmic axes, showing the data together with the double Pareto and lognormal fits]
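A curve such as P[X > x] in the figure is an empirical complementary cumulative distribution. The following is a minimal sketch of how one could be computed from a list of file sizes. The function name and the SLOC values are made up for illustration and are not the data behind the slides.

```python
from bisect import bisect_right


def empirical_ccdf(values):
    """Return (x, P[X > x]) pairs for each distinct x in values."""
    data = sorted(values)
    n = len(data)
    points = []
    for x in sorted(set(data)):
        # bisect_right gives the number of observations <= x
        greater = n - bisect_right(data, x)
        points.append((x, greater / n))
    return points


# Hypothetical SLOC counts for a handful of files (illustration only)
sloc = [10, 10, 25, 40, 40, 40, 120, 900, 15000, 15000]
ccdf = empirical_ccdf(sloc)
for x, p in ccdf:
    print(x, p)
```

Plotting these pairs on log-log axes gives the kind of view used in the figure, where a power law tail shows up as a roughly straight line.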
But does it matter?

Most of the files are on the lognormal side
[Figure: bar chart of % Files for C, C++, Java, Python and Lisp]

But the power law minority matters a lot
[Figure: bar chart of % SLOC for C, C++, Java, Python and Lisp]

Large files have a large impact

Size estimation models
Some software size estimation models are based on the log-normality of size
metrics. These models systematically underestimate the size of software.

[Figure: relative error (RE) of the size estimate versus SLOC, one panel each for C, C++, Java and Python]

“On the distribution of source code file sizes” ICSOFT 2011
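The underestimation can be illustrated with a small synthetic experiment. The sketch below is not the estimation model evaluated in the paper. It assumes a lognormal body plus a small power law tail, with all parameters (MU, SIGMA, XMIN, the tail exponent and the 990/10 split) chosen arbitrarily for illustration. It fits a single lognormal to all files from the mean and standard deviation of the log-sizes, then compares the total size implied by that fit with the actual total.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Synthetic, deterministic "file sizes" (illustration only, not the slides' data):
# a lognormal body plus a small power law tail above a hypothetical xmin.
N_BODY, N_TAIL = 990, 10
MU, SIGMA = 4.0, 1.0                 # assumed lognormal parameters of the body
XMIN, TAIL_EXPONENT = 2000.0, 1.0    # assumed tail: P[X > x] = (x / XMIN) ** -1

std_normal = NormalDist()
# Evenly spaced quantiles instead of random draws, so the result is reproducible
body = [math.exp(MU + SIGMA * std_normal.inv_cdf((i + 0.5) / N_BODY))
        for i in range(N_BODY)]
tail = [XMIN * (1 - (j + 0.5) / N_TAIL) ** (-1.0 / TAIL_EXPONENT)
        for j in range(N_TAIL)]
sizes = body + tail

# Fit one lognormal to all files using the mean and std of the log-sizes
logs = [math.log(s) for s in sizes]
mu_hat, sigma_hat = fmean(logs), pstdev(logs)

# Total size implied by the fitted lognormal versus the actual total
predicted_total = len(sizes) * math.exp(mu_hat + sigma_hat ** 2 / 2)
actual_total = sum(sizes)
relative_error = 100 * (predicted_total - actual_total) / actual_total
print(f"predicted={predicted_total:.0f} actual={actual_total:.0f} "
      f"RE={relative_error:.1f}%")
```

With these assumed parameters the relative error is negative: the handful of tail files carries a large share of the total size, and the single lognormal fit does not account for it.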
Parameters of the statistical distribution

Power law parameters: λ and xmin
Transition from lognormal to power law
[Figure: P[X > x] versus SLOC on logarithmic axes, showing the data together with the double Pareto and lognormal fits]
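The slide does not say how λ and xmin were estimated. As a hedged sketch of one common approach: for a continuous power law with density proportional to x^(-alpha) above a given xmin, the maximum likelihood estimate of the exponent is alpha = 1 + n / Σ ln(x_i / xmin). Whether the slide's λ corresponds to this alpha or to the exponent of P[X > x] (which is alpha - 1) is not stated, and choosing xmin itself is not covered here. The function name and the sample are hypothetical.

```python
import math


def powerlaw_exponent_mle(values, xmin):
    """MLE of alpha for a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal xmin")
    return 1.0 + len(tail) / log_sum


# Hypothetical sample: 200 exact quantiles of a power law with alpha = 2
# above xmin = 1000, plus three small files that fall below xmin.
xmin = 1000.0
sample = [xmin / (1 - (j + 0.5) / 200) for j in range(200)] + [50.0, 120.0, 700.0]
alpha_hat = powerlaw_exponent_mle(sample, xmin)
print(f"alpha_hat = {alpha_hat:.3f}")
```

Only the observations at or above xmin enter the estimate, so the files on the lognormal side do not influence the fitted exponent.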

Probability of finding defects


Probability of finding defects
We have seen that files above xmin account for 40% of total size, while being
only about 1% of the files.
What about defects? Probability of finding defects in three software
projects (using CYCLO as metric)

                      Project             Below xmin               Above xmin
                      Apache                   .4178                   .7708
                      OpenIntents              .2500                   .7500
                      Zxing                    .2143                   .4161

* Data extracted from “ReLink: Recovering Links between Bugs and Changes” FSE
2011.
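To make the gap in the table easier to read, the short sketch below divides the probability above xmin by the probability below xmin for each project. The values are copied from the table; the ratio is only a convenience for comparison and is not a statistic reported in the slides.

```python
# Probabilities of finding defects, copied from the table above (CYCLO as metric)
defect_probability = {
    "Apache": (0.4178, 0.7708),
    "OpenIntents": (0.2500, 0.7500),
    "Zxing": (0.2143, 0.4161),
}


def ratio_above_to_below(below, above):
    """How many times more likely a defect is above xmin than below it."""
    return above / below


ratios = {project: ratio_above_to_below(below, above)
          for project, (below, above) in defect_probability.items()}
for project, ratio in ratios.items():
    print(f"{project}: {ratio:.2f}x")
```

In all three projects the probability of finding defects is roughly two to three times higher above xmin than below it.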



Probability of finding defects




Probability of finding defects (normalized metrics)
Using CYCLO / WMC as metric (cyclomatic complexity per LOC)

                      Project             Below xmin               Above xmin
                      Apache                   .4159                   .6296
                      OpenIntents              .2813                   .5417
                      Zxing                    .3181                   .2389




Probability of finding defects

Defects density (only pre-release defects)
Using Number of Methods and number of pre-release defects per LOC

[Figure: two histograms, one for files below xmin and one for files above xmin]

                      Below xmin: Avg. Dens. = .2685
                      Above xmin: Avg. Dens. = .4565

* Data obtained from “Predicting Defects for Eclipse” PROMISE 2007

Probability of finding defects

Defects density (only post-release defects)
Using Number of Methods and number of post-release defects per LOC

[Figure: two histograms, one for files below xmin and one for files above xmin]

                      Below xmin: Avg. Dens. = .1437
                      Above xmin: Avg. Dens. = .2690

Probability of finding defects
Defects density (pre + post-release defects)
Using CYCLO/SLOC and number of total defects per LOC

[Figure: two panels on logarithmic axes, including Pr(X ≥ x) versus x]

                      Below xmin: Avg. Dens. = .3335 (>9000 files)
                      Above xmin: Avg. Dens. = .7747 (364 files)
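The slides do not state exactly how the average density is computed. The sketch below assumes it is the per-file mean of defects per LOC, taken separately for files below and above xmin on the chosen metric. The record layout, the function name, the threshold and the sample values are all hypothetical.

```python
from statistics import fmean


def average_density_by_side(files, xmin):
    """Split files at xmin on the metric and average defects per LOC on each side."""
    below = [f["defects"] / f["loc"] for f in files if f["metric"] < xmin]
    above = [f["defects"] / f["loc"] for f in files if f["metric"] >= xmin]
    return (fmean(below) if below else None,
            fmean(above) if above else None)


# Hypothetical file-level records (illustration only)
files = [
    {"metric": 0.05, "loc": 200, "defects": 2},
    {"metric": 0.10, "loc": 100, "defects": 0},
    {"metric": 0.40, "loc": 50, "defects": 3},
    {"metric": 0.60, "loc": 400, "defects": 8},
]
below_avg, above_avg = average_density_by_side(files, xmin=0.3)
print(below_avg, above_avg)
```

A pooled alternative (total defects divided by total LOC on each side) would weight large files more heavily, so the two definitions can give different averages.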
Summary and further work

Summary of preliminary findings
    Some metrics have a transition from lognormal to power law
    Clear relation between normalized metrics and defects density
    Although the threshold might not be perfect (e.g., you might find a
      high defects density in a file on the lower side), it greatly reduces
      the search space for potentially problematic files

Further work
    Verify in more projects
        Do you have defects data at the file level?
    Find an explanation for the transition and its influence on quality
    How do the statistical parameters change over time? Do defects
      evolve accordingly?

Statistical Distribution of Metrics

  • 1. Statistical distributions of software metrics: do they matter? Israel Herraiz, Technical University of Madrid (UPM), israel.herraiz@upm.es. Grab these slides from http://slideshare.net/herraiz/statistical-distributions-of-metrics
  • 2. Outline: (1) Some background; (2) Statistical properties of software metrics; (3) Evidence of impact on quality; (4) Summary of findings and further work.
  • 3. Section 1: Some background
  • 4. A (not so) long time ago... Statistical distribution of software metrics: software size follows a double Pareto distribution ("Towards a theoretical model for software growth", MSR 2007). More recently: not only size, but some OO metrics too, and some complexity metrics ("On the Statistical Distribution of Object-Oriented System Properties", WETSoM 2012).
  • 5. OK, but what is that double Pareto thing? [Figure: P[X > x] against SLOC on logarithmic axes, showing the data together with double Pareto and lognormal fits.]
  • 6. But does it matter? Most of the files are on the lognormal side. [Figure: bar chart of % Files for C, C++, Java, Python, and Lisp.]
  • 7. But does it matter? Most of the files are on the lognormal side, but the power law minority matters a lot. [Figure: two bar charts, % Files and % SLOC, for C, C++, Java, Python, and Lisp.]
  • 8. Large files have a large impact. Size estimation models: some software size estimation models are based on the log-normality of size metrics. These models systematically underestimate the size of software. [Figure: four panels (C, C++, Java, Python) plotting RE against SLOC.] Source: "On the distribution of source code file sizes", ICSOFT 2011.
  • 9. Section 2: Statistical properties of software metrics
  • 10. Parameters of the statistical distribution. Power law parameters: λ and xmin. Transition from lognormal to power law. [Figure: P[X > x] against SLOC on logarithmic axes, showing the data together with double Pareto and lognormal fits.]
  • 11. Section 3: Evidence of impact on quality
  • 12. Probability of finding defects. We have seen that files above xmin account for 40% of total size while being only about 1% of the files. What about defects? Probability of finding defects in three software projects (using CYCLO as metric):
        Project       Below xmin   Above xmin
        Apache        .4178        .7708
        OpenIntents   .2500        .7500
        Zxing         .2143        .4161
      * Data extracted from "ReLink: Recovering Links between Bugs and Changes", FSE 2011.
  • 13. Probability of finding defects (normalized metrics). Using CYCLO / WMC as metric (cyclomatic complexity per LOC):
        Project       Below xmin   Above xmin
        Apache        .4159        .6296
        OpenIntents   .2813        .5417
        Zxing         .3181        .2389
  • 14. Probability of finding defects. Defects density (only pre-release defects), using Number of Methods and number of pre-release defects per LOC. [Figure: two panels, one for files below xmin and one for files above xmin.] Average density: .2685 below xmin, .4565 above xmin.
      * Data obtained from "Predicting Defects for Eclipse", PROMISE 2007.
  • 15. Probability of finding defects. Defects density (only post-release defects), using Number of Methods and number of post-release defects per LOC. [Figure: two panels, one for files below xmin and one for files above xmin.] Average density: .1437 below xmin, .2690 above xmin.
  • 16. Probability of finding defects. Defects density (pre + post-release defects), using CYCLO/SLOC and number of total defects per LOC. [Figure: Pr(X ≥ x) against x on logarithmic axes, with panels for files below and above xmin.] Average density: .3335 below xmin (>9000 files), .7747 above xmin (364 files).
  • 17. Section 4: Summary of findings and further work
  • 18. Summary and further work.
      Summary of preliminary findings: Some metrics have a transition from lognormal to power law. There is a clear relation between normalized metrics and defects density. Although the threshold might not be perfect (e.g., you might find a high defects density in a lower side file), it greatly reduces the search space for potentially problematic files.
      Further work: Verify in more projects. Do you have defects data at the file level? Find an explanation for the transition and its influence on quality. How do the statistical parameters change over time? Do defects evolve accordingly?
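
The threshold idea from the summary (split files at xmin, then compare each side) can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's tooling: `SideStats`, `split_at_threshold`, `tail_exponent_mle`, and the toy data are hypothetical names and values invented here. It assumes xmin is already known, counts a file as "above" when its metric value is at least xmin, and uses the standard maximum-likelihood estimator for a continuous power-law tail. Whether that exponent corresponds exactly to the λ on slide 10 is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SideStats:
    files: int          # number of files on this side of the threshold
    size_share: float   # fraction of the total metric value held by this side
    defect_prob: float  # fraction of files on this side with at least one defect


def split_at_threshold(records, x_min):
    """Split (metric_value, has_defect) pairs at x_min.

    Returns (below, above); a file is "above" when metric_value >= x_min.
    """
    records = list(records)
    total = sum(value for value, _ in records)
    sides = []
    for keep_above in (False, True):
        side = [(v, d) for v, d in records if (v >= x_min) == keep_above]
        n = len(side)
        sides.append(SideStats(
            files=n,
            size_share=sum(v for v, _ in side) / total if total else 0.0,
            defect_prob=sum(1 for _, d in side if d) / n if n else 0.0,
        ))
    return sides[0], sides[1]


def tail_exponent_mle(values, x_min):
    """Continuous power-law MLE on the tail x >= x_min.

    Assumes a tail density proportional to x ** (-alpha) and returns
    alpha = 1 + n / sum(ln(x / x_min)).
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Toy data, made up for illustration: (metric value, file has a defect?)
toy = [(20, False), (35, False), (50, True), (80, False), (900, True), (1500, True)]
below, above = split_at_threshold(toy, x_min=500)
print(below)
print(above)
```

Comparing `below.defect_prob` with `above.defect_prob` corresponds to one row of the tables on slides 12 and 13, and `above.size_share` corresponds to the share of total size held by files above xmin. Estimating xmin itself is a separate step that this sketch leaves out.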