SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
Is it a Bug or an Enhancement?
         A Text-based Approach to
          Classify Change Requests
              Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta,
                     Foutse Khomh and Yann-Gaël Guéhéneuc


                               CASCON [2008]
                                   October 30, 2008




CASCON [2008] October 27-30                                   © Khomh 2008
Context
   Bug tracking systems are valuable assets for managing
   maintenance activities.

   They collect many different kinds of issues: requests for defect
   fixing, enhancements, refactoring/restructuring activities and
   organizational issues. But, these are simply labeled as bug for
   lack of a better classification.

   In recent years, the literature reported contributions on merging
   data from CVS and bug reports to identify whether CVS changes
   are related to bug fixes, to detect co-changes and to study
   evolution patterns.




CASCON [2008] October 27-30       2                         © Khomh 2008
Related Works
These recent years, there have been many studies based on
    data from BTS

       Sliwerski et al. introduced a refined approach to identify
       whether a change induced a bug fix.

       Runeson et al. investigated using Natural Language
       Processing techniques to identify of duplicate defect reports.

       But, none of these works deeply investigated the kinds of data
       stored in BTS, such as the Mozilla's Bugzilla BTS.




CASCON [2008] October 27-30        3                         © Khomh 2008
Research questions
    In this paper, we study the consistency of data contain in
    BTS.
To achieve that,

       We manually classified 1,800 issues extracted from the BTS
       of Mozilla, Eclipse, and JBoss using simple majority voting.

       We used machine learning techniques to perform automatic
       classifications.




CASCON [2008] October 27-30       4                        © Khomh 2008
Background: BTS (Bug Tracking System)

         Developer views




                                                  Te
         the bug description




                                                     s
                                                    te
                                                       r po
                                                            s
                                                           ts
                                                                th
                                                                 e
                                                                     bu
                                                                        g
                                                                        in
                                                                          to
                                                                            BT
                                                                               T
                                                                               Tester verifies
       Developper                                                                the bug
       makes and error




                               Error results in
                                  a fault
      In this study, we’ll focus on the two most popular bug
      tracking systems: Bugzilla and Jira

CASCON [2008] October 27-30            5                                                 © Khomh 2008
Research questions
      We answered the following questions:
      RQ1: Issue classification.
      To what extent the information contained in issues posted on
      bug tracking systems can be used to classify such issues,
      distinguishing bugs (i.e., corrective maintenance) from other
      activities (e.g., enhancement, refactoring. . . )?

      RQ2: Discriminating terms.
      What are the terms/fields that machine learning techniques
      use to discern bugs from other issues?

      RQ3: Comparison with grep.
      Do machine learning techniques perform better than grep and
      regular expression matching in general, techniques often used
      to analyze Concurrent Versions Systems (CVS)/SubVersion
      (SVN) logs and classify commits between bugs and other
      activities?

CASCON [2008] October 27-30       6                        © Khomh 2008
Objects of our study
    We perform our study using three well-known, industrial-
    strength, open-source systems.
   - Eclipse,
   - Mozilla
   - JBoss
    We use a RSS feeder to extract the 3,207 issues classified as
    “Resolved”.

    We select issues with the “Resolved” or “Closed” status to
    avoid duplicated bugs, rejected issues, or issues awaiting
    triage.



CASCON [2008] October 27-30      7                        © Khomh 2008
Automatic classification

    We first use a feature selection algorithm to select a subset of
    the available features with which to perform the automatic
    classification.

    Automatic classifiers require a labeled corpus, a set of tagged
    BTS issues acting as the Oracle for the training.

    Then, each automatic classifier is trained on a set of BTS
    issues and its performance is evaluated using cross validation.




CASCON [2008] October 27-30       8                        © Khomh 2008
Automatic classification
The automatic classification of BTS issues is performed using the
  Weka tool (http://www.cs.waikato.ac.nz/ml/weka/), in
  particular using:
  the symmetrical uncertainty attribute selector,

    the standard probabilistic naive Bayes classifier,

    the alternating decision tree (ADTree), and

    the linear logistic regression classifier.




CASCON [2008] October 27-30         9                    © Khomh 2008
Construction of the Oracle
   We randomly sample and manually classify 600 issues for
   each system, for a total of 1,800 distinct issues.

   We organize the issues in bundles of 150 entries each.

   For every subset, we ask three software engineers to classify
   the issues manually. Stating if the issues are a corrective
   maintenance (bugs) or a non-corrective maintenance
   (enhancement, refactoring, re-documentation, or other, i.e.,
   non bug).

   The classifications go through a simple majority vote and a
   decision on the status of each issue is made.

CASCON [2008] October 27-30     10                          © Khomh 2008
Construction of the Oracle
Decision rule
   An entry is considered a corrective maintenance if at least
  two out of three engineers classified it as a corrective
  maintenance (hereby referred to as “bug”).

    Otherwise the entry is considered as a non-corrective
   maintenance (hereby referred to as “non bug”).
  The classification yields the following results:




CASCON [2008] October 27-30     11                          © Khomh 2008
Terms extraction and indexing
 Step 1: Term extraction from bug reporting systems
   Pruning punctuation and special characters
   Camel case, “-”, and “_” word splitting

 Step 2: Stemming
   using the R implementation of the Porter stemmer
      (http://www.r-project.org)
      Note: stop words are not removed since terms such as
      “not”, “might”, “should” contributes to the classification
      and are actually selected

 Step 3: Term indexing
   Terms are indexed using the term frequency (tf)
   We didn’t use tf-idf since we don’t want to penalize terms
   appearing in many documents

CASCON [2008] October 27-30        12                        © Khomh 2008
Results of the automatic classification




 Mozilla: automatic classification confusion matrices (in bold percentage
                           of correct decisions).




CASCON [2008] October 27-30        13                          © Khomh 2008
Results of the automatic classification




  Eclipse: automatic classification confusion matrices (in bold percentage
                            of correct decisions).




CASCON [2008] October 27-30         14                          © Khomh 2008
Results of the automatic classification




JBoss: automatic classification confusion matrices (in bold percentage of
                            correct decisions).




CASCON [2008] October 27-30        15                          © Khomh 2008
Discriminating Terms
    we also studied the features
    that are used to perform the
    classification.

    Positive coefficients lead the
    classification tend towards a
    bug classification, while
    negative coefficients towards
    a non-bug classification.

    Terms such as “crash”,
    “critic”, “broken”, “when” lead
    to classifying the issue as a
    “bug”.

    Terms such as “should”,
    “implement”, “support” cause
    a classification as “non-bug“.
                              Mozilla: Example of ADtree

CASCON [2008] October 27-30           16                   © Khomh 2008
Discriminating Terms
    Positive coefficients lead the
    classification tend towards a
    non-bug classification, while
    negative coefficients towards
    a bug classification.

    Terms having a high influence
    for the “bug” classification
    are: “except(ion)”, “fail”,
    “npe” (null-pointer exception),
    “error”, “correct”,
    “termin(ation)”, “invalid”.

    Terms such as “provid(e)”,
    “add”, possibly indicate a non-
    bug issue.
                             Eclipse: Logistic Regression Coefficients

CASCON [2008] October 27-30           17                        © Khomh 2008
Comparison with grep

    To assess the usefulness of the machine-learning classifiers, it
    is useful to compare their performance with those of the
    simplest classifier that developers would have used: string
    and regular expression matching, e.g. using the Unix utility
    grep.

    We classify issues my means of the following grep regular
    expression to maximize retrieval:




CASCON [2008] October 27-30       18                       © Khomh 2008
Comparison with grep

    Each hit on the filtered textual information of the 1,800
    manually-classified bugs was considered as a detected bug;
    multiple hits on the same issues were not counted.




       Mozilla grep confusion matrix for manually classified bugs (in bold
                        percentage of correct decisions).



CASCON [2008] October 27-30           19                          © Khomh 2008
Comparison with grep



            Eclipse grep confusion matrix for manually classified bugs (in
                        bold percentage of correct decisions).




                JBoss grep confusion matrix for manually classified bugs (in
                          bold percentage of correct decisions).

CASCON [2008] October 27-30             20                          © Khomh 2008
Threats to the Validity
Internal validity
  We attempted to avoid any bias in the building of the oracle
  and of the classifiers by first classifying each issues manually
  with-out making any choice on the classifiers to be used.

External validity

    We randomly selected and manually classified our issues. We
    obtained a confident level of 95% and a confidence interval
    of 10% for precision and recall.

    Although the approach is perfectly applicable to other
    systems, we do not know whether the same results will be
    obtained.


CASCON [2008] October 27-30      21                        © Khomh 2008
Conclusion
   We showed that linguistic information contained in BTS
   entries is sufficient to automatically distinguish corrective
   maintenance from other activities.

   This is relevant in that it opens the possibility of building
   automatic routing systems, i.e., systems that automatically
   classify submitted tickets and route them to the maintenance
   team (bugs) or to team leader (enhancement requests and
   other issues).
   Certain terms and fields lead to more discriminating classifiers
   between “bug” and “non-bug” issues.

     A naive approach, using grep, is no match for the classifiers
   built using our oracle.

CASCON [2008] October 27-30       22                         © Khomh 2008
Conclusion
    We can report that, out of the 1,800 manually-classified
    issues, less than half are related to corrective
    maintenance.



    Therefore, bug tracking systems, in open-source
    development, have a far more complex use than simple
    bookkeeping of corrective maintenance.



    Study based on BTS issues should carefully consider
    what issues are used to build their predictive models.


CASCON [2008] October 27-30      23                       © Khomh 2008
Future Work

Our future work includes studying the relation between:

    Bugs and design patterns,

    Bugs and design defects.




CASCON [2008] October 27-30     24                        © Khomh 2008
Questions




      Thank you for your attention !

CASCON [2008] October 27-30   25       © Khomh 2008
Feature selection
    Not all features (terms) contribute to increase precision and recall
    Also, need to build a simple model easy to be interpreted and
    reused (external validity)
       Occam’s razor principle
    Feature selection therefore necessary
    Some algorithms (e.g. decision tree) already have a feature
    selection, others not
    Symmetric uncertainty selector:
       select attributes that well correlate with the class and have little
       intercorrelation




CASCON [2008] October 27-30           26                           © Khomh 2008
Naïve Bayes Classifier
    Probabilistic classifier that applies the Bayes theorem assuming
    a strong (naïve) assumption of independence among features
    Selects the most likely classification C for the feature values F1,
    F2, … Fn




                              p(C) p(F1 ,..., Fn | C)
        p(C | F1 ,..., Fn ) =
                                 p(F 1 ,..., Fn )
CASCON [2008] October 27-30           27                           © Khomh 2008
Classification tree
Mapping from observations
about an item to conclusions                            outlook
about its target value
Leaves represent classifications          sunny                            rainy
Branches represent                                      overcast
conjunctions (questions) on
features that lead to those           humidity                        windy
classifications


                                   high     normal                 false           true
                                                           yes


                                    no            yes              yes            no



CASCON [2008] October 27-30           28                                 © Khomh 2008
Logistic regression
Models relationship
between set of variables xi
     dichotomous (binary)
     variable Y                                  1.0
Very common in software




                              defect-proneness
engineering                                      0.8
   E.g. classification of


                              Probability of
   faulty classes                                0.6

                                                 0.4

                                                 0.2

                                                 0.0
                                                        Xi




CASCON [2008] October 27-30                        29        © Khomh 2008

Mais conteúdo relacionado

Destaque

Document defect tracking for improving product quality and productivity
Document   defect tracking for improving product quality and productivityDocument   defect tracking for improving product quality and productivity
Document defect tracking for improving product quality and productivity
ch_tabitha7
 
Bug tracking system(synopsis)
Bug tracking system(synopsis)Bug tracking system(synopsis)
Bug tracking system(synopsis)
happiness09
 
Software Testing Life Cycle
Software Testing Life CycleSoftware Testing Life Cycle
Software Testing Life Cycle
Udayakumar Sree
 

Destaque (9)

Discussion Paper: Bugs Tracking
Discussion Paper: Bugs TrackingDiscussion Paper: Bugs Tracking
Discussion Paper: Bugs Tracking
 
Moving Towards Zero Defects with Specification by Example
Moving Towards Zero Defects with Specification by ExampleMoving Towards Zero Defects with Specification by Example
Moving Towards Zero Defects with Specification by Example
 
Document defect tracking for improving product quality and productivity
Document   defect tracking for improving product quality and productivityDocument   defect tracking for improving product quality and productivity
Document defect tracking for improving product quality and productivity
 
Defect Tracking Software Project Presentation
Defect Tracking Software Project PresentationDefect Tracking Software Project Presentation
Defect Tracking Software Project Presentation
 
Bug tracking system(synopsis)
Bug tracking system(synopsis)Bug tracking system(synopsis)
Bug tracking system(synopsis)
 
Bug Tracking System
Bug Tracking SystemBug Tracking System
Bug Tracking System
 
A Bug Tracking System Is A Software Application
A Bug Tracking System Is A Software ApplicationA Bug Tracking System Is A Software Application
A Bug Tracking System Is A Software Application
 
Software Testing Life Cycle
Software Testing Life CycleSoftware Testing Life Cycle
Software Testing Life Cycle
 
Bug tracking system ppt
Bug tracking system pptBug tracking system ppt
Bug tracking system ppt
 

Semelhante a CASCON08.ppt

Quality Assurance. Quality Assurance Approach. White Box
Quality Assurance. Quality Assurance Approach. White BoxQuality Assurance. Quality Assurance Approach. White Box
Quality Assurance. Quality Assurance Approach. White Box
Kimberly Jones
 
36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure
Washington Garcia
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
Martin Pinzger
 
Pointcut rejuvenation
Pointcut rejuvenationPointcut rejuvenation
Pointcut rejuvenation
Ravi Theja
 

Semelhante a CASCON08.ppt (20)

Cascon08.ppt
Cascon08.pptCascon08.ppt
Cascon08.ppt
 
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSUSING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
 
IRJET-Automatic Bug Triage with Software
IRJET-Automatic Bug Triage with Software IRJET-Automatic Bug Triage with Software
IRJET-Automatic Bug Triage with Software
 
Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing...
Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing...Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing...
Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing...
 
Quality Assurance. Quality Assurance Approach. White Box
Quality Assurance. Quality Assurance Approach. White BoxQuality Assurance. Quality Assurance Approach. White Box
Quality Assurance. Quality Assurance Approach. White Box
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated Process
 
IJET-V2I6P28
IJET-V2I6P28IJET-V2I6P28
IJET-V2I6P28
 
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
 
36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure
 
IRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software TechnologiesIRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software Technologies
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
 
DEVELOPMENT OF A MULTIAGENT BASED METHODOLOGY FOR COMPLEX SYSTEMS
DEVELOPMENT OF A MULTIAGENT BASED METHODOLOGY FOR COMPLEX SYSTEMSDEVELOPMENT OF A MULTIAGENT BASED METHODOLOGY FOR COMPLEX SYSTEMS
DEVELOPMENT OF A MULTIAGENT BASED METHODOLOGY FOR COMPLEX SYSTEMS
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 
Fake Reviews Detection using Supervised Machine Learning
Fake Reviews Detection using Supervised Machine LearningFake Reviews Detection using Supervised Machine Learning
Fake Reviews Detection using Supervised Machine Learning
 
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine LearningIRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
 
Pointcut rejuvenation
Pointcut rejuvenationPointcut rejuvenation
Pointcut rejuvenation
 
Aa03101540158
Aa03101540158Aa03101540158
Aa03101540158
 
IRJET- Automatic Database Schema Generator
IRJET- Automatic Database Schema GeneratorIRJET- Automatic Database Schema Generator
IRJET- Automatic Database Schema Generator
 
Pesttt testing
Pesttt testingPesttt testing
Pesttt testing
 

Mais de Ptidej Team

Mais de Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

CASCON08.ppt

  • 1. Is it a Bug or an Enhancement? A Text-based Approach to Classify Change Requests Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh and Yann-Gaël Guéhéneuc CASCON [2008] October 30, 2008 CASCON [2008] October 27-30 © Khomh 2008
  • 2. Context Bug tracking systems are valuable assets for managing maintenance activities. They collect many different kinds of issues: requests for defect fixing, enhancements, refactoring/restructuring activities and organizational issues. But, these are simply labeled as bug for lack of a better classification. In recent years, the literature reported contributions on merging data from CVS and bug reports to identify whether CVS changes are related to bug fixes, to detect co-changes and to study evolution patterns. CASCON [2008] October 27-30 2 © Khomh 2008
  • 3. Related Works These recent years, there have been many studies based on data from BTS Sliwerski et al. introduced a refined approach to identify whether a change induced a bug fix. Runeson et al. investigated using Natural Language Processing techniques to identify of duplicate defect reports. But, none of these works deeply investigated the kinds of data stored in BTS, such as the Mozilla's Bugzilla BTS. CASCON [2008] October 27-30 3 © Khomh 2008
  • 4. Research questions In this paper, we study the consistency of data contain in BTS. To achieve that, We manually classified 1,800 issues extracted from the BTS of Mozilla, Eclipse, and JBoss using simple majority voting. We used machine learning techniques to perform automatic classifications. CASCON [2008] October 27-30 4 © Khomh 2008
  • 5. Background: BTS (Bug Tracking System) Developer views Te the bug description s te r po s ts th e bu g in to BT T Tester verifies Developper the bug makes and error Error results in a fault In this study, we’ll focus on the two most popular bug tracking systems: Bugzilla and Jira CASCON [2008] October 27-30 5 © Khomh 2008
  • 6. Research questions We answered the following questions: RQ1: Issue classification. To what extent the information contained in issues posted on bug tracking systems can be used to classify such issues, distinguishing bugs (i.e., corrective maintenance) from other activities (e.g., enhancement, refactoring. . . )? RQ2: Discriminating terms. What are the terms/fields that machine learning techniques use to discern bugs from other issues? RQ3: Comparison with grep. Do machine learning techniques perform better than grep and regular expression matching in general, techniques often used to analyze Concurrent Versions Systems (CVS)/SubVersion (SVN) logs and classify commits between bugs and other activities? CASCON [2008] October 27-30 6 © Khomh 2008
  • 7. Objects of our study We perform our study using three well-known, industrial- strength, open-source systems. - Eclipse, - Mozilla - JBoss We use a RSS feeder to extract the 3,207 issues classified as “Resolved”. We select issues with the “Resolved” or “Closed” status to avoid duplicated bugs, rejected issues, or issues awaiting triage. CASCON [2008] October 27-30 7 © Khomh 2008
  • 8. Automatic classification We first use a feature selection algorithm to select a subset of the available features with which to perform the automatic classification. Automatic classifiers require a labeled corpus, a set of tagged BTS issues acting as the Oracle for the training. Then, each automatic classifier is trained on a set of BTS issues and its performance is evaluated using cross validation. CASCON [2008] October 27-30 8 © Khomh 2008
  • 9. Automatic classification The automatic classification of BTS issues is performed using the Weka tool (http://www.cs.waikato.ac.nz/ml/weka/), in particular using: the symmetrical uncertainty attribute selector, the standard probabilistic naive Bayes classifier, the alternating decision tree (ADTree), and the linear logistic regression classifier. CASCON [2008] October 27-30 9 © Khomh 2008
  • 10. Construction of the Oracle We randomly sample and manually classify 600 issues for each system, for a total of 1,800 distinct issues. We organize the issues in bundles of 150 entries each. For every subset, we ask three software engineers to classify the issues manually. Stating if the issues are a corrective maintenance (bugs) or a non-corrective maintenance (enhancement, refactoring, re-documentation, or other, i.e., non bug). The classifications go through a simple majority vote and a decision on the status of each issue is made. CASCON [2008] October 27-30 10 © Khomh 2008
  • 11. Construction of the Oracle Decision rule An entry is considered a corrective maintenance if at least two out of three engineers classified it as a corrective maintenance (hereby referred to as “bug”). Otherwise the entry is considered as a non-corrective maintenance (hereby referred to as “non bug”). The classification yields the following results: CASCON [2008] October 27-30 11 © Khomh 2008
  • 12. Terms extraction and indexing Step 1: Term extraction from bug reporting systems Pruning punctuation and special characters Camel case, “-”, and “_” word splitting Step 2: Stemming using the R implementation of the Porter stemmer (http://www.r-project.org) Note: stop words are not removed since terms such as “not”, “might”, “should” contributes to the classification and are actually selected Step 3: Term indexing Terms are indexed using the term frequency (tf) We didn’t use tf-idf since we don’t want to penalize terms appearing in many documents CASCON [2008] October 27-30 12 © Khomh 2008
  • 13. Results of the automatic classification Mozilla: automatic classification confusion matrices (in bold percentage of correct decisions). CASCON [2008] October 27-30 13 © Khomh 2008
  • 14. Results of the automatic classification Eclipse: automatic classification confusion matrices (in bold percentage of correct decisions). CASCON [2008] October 27-30 14 © Khomh 2008
  • 15. Results of the automatic classification JBoss: automatic classification confusion matrices (in bold percentage of correct decisions). CASCON [2008] October 27-30 15 © Khomh 2008
  • 16. Discriminating Terms we also studied the features that are used to perform the classification. Positive coefficients lead the classification tend towards a bug classification, while negative coefficients towards a non-bug classification. Terms such as “crash”, “critic”, “broken”, “when” lead to classifying the issue as a “bug”. Terms such as “should”, “implement”, “support” cause a classification as “non-bug“. Mozilla: Example of ADtree CASCON [2008] October 27-30 16 © Khomh 2008
  • 17. Discriminating Terms Positive coefficients lead the classification tend towards a non-bug classification, while negative coefficients towards a bug classification. Terms having a high influence for the “bug” classification are: “except(ion)”, “fail”, “npe” (null-pointer exception), “error”, “correct”, “termin(ation)”, “invalid”. Terms such as “provid(e)”, “add”, possibly indicate a non- bug issue. Eclipse: Logistic Regression Coefficients CASCON [2008] October 27-30 17 © Khomh 2008
  • 18. Comparison with grep To assess the usefulness of the machine-learning classifiers, it is useful to compare their performance with those of the simplest classifier that developers would have used: string and regular expression matching, e.g. using the Unix utility grep. We classify issues my means of the following grep regular expression to maximize retrieval: CASCON [2008] October 27-30 18 © Khomh 2008
  • 19. Comparison with grep Each hit on the filtered textual information of the 1,800 manually-classified bugs was considered as a detected bug; multiple hits on the same issues were not counted. Mozilla grep confusion matrix for manually classified bugs (in bold percentage of correct decisions). CASCON [2008] October 27-30 19 © Khomh 2008
  • 20. Comparison with grep Eclipse grep confusion matrix for manually classified bugs (in bold percentage of correct decisions). JBoss grep confusion matrix for manually classified bugs (in bold percentage of correct decisions). CASCON [2008] October 27-30 20 © Khomh 2008
  • 21. Threats to the Validity Internal validity We attempted to avoid any bias in the building of the oracle and of the classifiers by first classifying each issues manually with-out making any choice on the classifiers to be used. External validity We randomly selected and manually classified our issues. We obtained a confident level of 95% and a confidence interval of 10% for precision and recall. Although the approach is perfectly applicable to other systems, we do not know whether the same results will be obtained. CASCON [2008] October 27-30 21 © Khomh 2008
  • 22. Conclusion We showed that linguistic information contained in BTS entries is sufficient to automatically distinguish corrective maintenance from other activities. This is relevant in that it opens the possibility of building automatic routing systems, i.e., systems that automatically classify submitted tickets and route them to the maintenance team (bugs) or to team leader (enhancement requests and other issues). Certain terms and fields lead to more discriminating classifiers between “bug” and “non-bug” issues. A naive approach, using grep, is no match for the classifiers built using our oracle. CASCON [2008] October 27-30 22 © Khomh 2008
  • 23. Conclusion We can report that, out of the 1,800 manually-classified issues, less than half are related to corrective maintenance. Therefore, bug tracking systems, in open-source development, have a far more complex use than simple bookkeeping of corrective maintenance. Study based on BTS issues should carefully consider what issues are used to build their predictive models. CASCON [2008] October 27-30 23 © Khomh 2008
  • 24. Future Work Our future work includes studying the relation between: Bugs and design patterns, Bugs and design defects. CASCON [2008] October 27-30 24 © Khomh 2008
  • 25. Questions Thank you for your attention ! CASCON [2008] October 27-30 25 © Khomh 2008
  • 26. Feature selection Not all features (terms) contribute to increase precision and recall Also, need to build a simple model easy to be interpreted and reused (external validity) Occam’s razor principle Feature selection therefore necessary Some algorithms (e.g. decision tree) already have a feature selection, others not Symmetric uncertainty selector: select attributes that well correlate with the class and have little intercorrelation CASCON [2008] October 27-30 26 © Khomh 2008
  • 27. Naïve Bayes Classifier Probabilistic classifier that applies the Bayes theorem assuming a strong (naïve) assumption of independence among features Selects the most likely classification C for the feature values F1, F2, … Fn p(C) p(F1 ,..., Fn | C) p(C | F1 ,..., Fn ) = p(F 1 ,..., Fn ) CASCON [2008] October 27-30 27 © Khomh 2008
  • 28. Classification tree Mapping from observations about an item to conclusions outlook about its target value Leaves represent classifications sunny rainy Branches represent overcast conjunctions (questions) on features that lead to those humidity windy classifications high normal false true yes no yes yes no CASCON [2008] October 27-30 28 © Khomh 2008
  • 29. Logistic regression Models relationship between set of variables xi dichotomous (binary) variable Y 1.0 Very common in software defect-proneness engineering 0.8 E.g. classification of Probability of faulty classes 0.6 0.4 0.2 0.0 Xi CASCON [2008] October 27-30 29 © Khomh 2008