Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung
                 Tsinghua University, China
The Hong Kong University of Science and Technology, Hong Kong
• The links between fixed bugs and committed
  changes are important:
  – for measuring software quality
  – for constructing defect prediction models

[Figure: fixed bugs (BugZilla) linked to committed changes (CVS/SVN)]
• To discover the links: mine software repositories
• Heuristics traditionally used to collect links
  between bugs and changes:
  – Searching for keywords (such as “Fixed” or
    “Bug”) and bug IDs
[Figure: software repositories; labels: Bugzilla, Mailings, Source Code, CVS/SVN, Execution traces, Crash, Requirements, Developer, Logs, …]
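As a rough illustration of the keyword and bug-ID heuristic above, here is a minimal Python sketch. The two regular expressions, the function name `explicit_links`, and the `known_bug_ids` parameter are assumptions made for this example; the slides do not give the exact patterns used.

```python
import re

# Hypothetical patterns: a fix-related keyword, and a number that looks like a bug ID.
KEYWORD = re.compile(r"\b(?:fix(?:e[sd])?|bugs?|issues?)\b", re.IGNORECASE)
BUG_ID = re.compile(r"(?:#|\b(?:bug|issue)\s*#?\s*)(\d+)", re.IGNORECASE)


def explicit_links(change_log, known_bug_ids):
    """Return the known bug IDs that a change log refers to explicitly."""
    if not KEYWORD.search(change_log):
        return set()  # no keyword: the heuristic finds no link
    mentioned = {int(number) for number in BUG_ID.findall(change_log)}
    # Keep only numbers that are real bug IDs in the bug tracker.
    return mentioned & known_bug_ids
```

Under this sketch, a log such as "refactor decoder" yields no link even if the change did fix a bug: any commit whose log lacks a recognisable keyword or ID is invisible to the heuristic.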
Defective




Missing Links!




Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009
• Missing bug reference in change log




• Irregular bug reference formats:
  – “issue 681”, “bug 232”, “Fixed for #239”, “see #149”, “solve problem 681”
  – Typos: “Fic 239”
• To recover the missing links, we studied many
  bug reports (including comments) and change
  logs
• We have identified the following features of links:
   – Time interval: the bug-fix time and change committed
     time are close




• Time interval between bug-fix time and
  change committed time
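The time-interval feature can be sketched as a simple window check. This is only an illustration: the symmetric window, the function name `within_interval`, and the `max_gap` parameter are assumptions, and the slides do not state the actual threshold.

```python
from datetime import datetime, timedelta


def within_interval(bug_fixed_at, committed_at, max_gap):
    """True if the change was committed within max_gap of the bug-fix time."""
    return abs(bug_fixed_at - committed_at) <= max_gap


# Hypothetical timestamps and a hypothetical 24-hour window.
fixed = datetime(2011, 3, 1, 12, 0)
commit = datetime(2011, 3, 1, 9, 30)
print(within_interval(fixed, commit, timedelta(hours=24)))
```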




• Through empirical studies, we have identified
  the following features of links:


  – Bug owner and change committer: they are often
    the same person, or have mapping relationships




• Bug owner and change committer: mapping relationship

  Bug Owner                    Change Committer       Project
  dswitkin@gmail.com           dswitkin               ZXing
  dswitkin@google.com          dswitkin@google.com    ZXing
  srowen@gmail.com             srowen                 ZXing
  pelili0101@googlemail.com    peli0101               OpenIntents
  Will Rowe                    Wrowe                  Apache
  Erik Abele                   Erikabele              Apache
Bug owner and change committer




• Through empirical studies, we have identified
  the following features of links:




  – Text similarity: the textual descriptions in the bug
    report are often similar to those in the change
    logs.

• Text similarity: the texts are similar
  – Using IR (information retrieval) technology to
    measure similarity
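One common way to measure such similarity is cosine similarity over term counts. The sketch below uses it as a stand-in, since the slides say only that IR technology is used; the tokenizer and the function names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)
```

A bug report and a change log that share terms such as a class name or an error message score above zero; texts with no terms in common score exactly zero.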
• To determine the criteria for the features, we learn
  from the explicit links that can be identified
  through traditional heuristics:
  – For the time interval feature and the text similarity
    feature, we exhaustively search for the combination
    of the two threshold values that achieves the
    maximum F-measure.
  – For the mappings between bug owners and
    change committers, we also learn them from the
    explicit links.
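The exhaustive search for the two thresholds can be sketched as a grid search that scores every combination against the explicit links. The data layout (tuples of bug ID, change ID, time gap in hours, similarity), the candidate grids, and the function names are assumptions for illustration, not details from the slides.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_thresholds(candidates, explicit_links, max_gaps, min_similarities):
    """Try every (max_gap, min_similarity) pair; return (score, max_gap, min_similarity).

    candidates: iterable of (bug_id, change_id, gap_hours, similarity).
    explicit_links: set of (bug_id, change_id) found by traditional heuristics.
    """
    best = (0.0, None, None)
    for max_gap in max_gaps:
        for min_sim in min_similarities:
            predicted = {(bug, change)
                         for bug, change, gap, sim in candidates
                         if gap <= max_gap and sim >= min_sim}
            tp = len(predicted & explicit_links)
            fp = len(predicted - explicit_links)
            fn = len(explicit_links - predicted)
            score = f_measure(tp, fp, fn)
            if score > best[0]:  # keep the first combination reaching the maximum
                best = (score, max_gap, min_sim)
    return best
```

The grid is scanned in full, so the cost grows with the product of the two grid sizes; for a handful of candidate values per feature this stays cheap.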

• Determine time interval and similarity threshold:
  search step by step for the optimal similarity
  threshold and time interval values
• Determine mapping relationship between bug
  owners and change committers:
  find the possible mappings from the explicit links
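Learning the owner-to-committer mappings from explicit links might look like the following sketch. The `min_support` cut-off and the function names are hypothetical; the slides say only that possible mappings are found from the explicit links.

```python
from collections import Counter, defaultdict


def learn_mappings(explicit_links, min_support=1):
    """Collect, per bug owner, the committers seen on explicitly linked changes.

    explicit_links: iterable of (bug_owner, change_committer) pairs.
    min_support: hypothetical minimum number of co-occurrences to accept a mapping.
    """
    counts = defaultdict(Counter)
    for owner, committer in explicit_links:
        counts[owner][committer] += 1
    return {owner: {c for c, n in seen.items() if n >= min_support}
            for owner, seen in counts.items()}


def same_person(owner, committer, mappings):
    """True if the names are identical or a learned mapping connects them."""
    return owner == committer or committer in mappings.get(owner, set())
```

Raising `min_support` trades coverage for fewer accidental mappings, for example when one developer occasionally commits a fix on behalf of another.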
• To obtain the ground truth (“golden set” of links)
  • For ZXing and OpenIntents, we manually identify the links
  • For Apache, we use the data provided by Bird et al. (annotated
    by an Apache core developer)
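• Four possible outcomes for a candidate bug-change pair
  –   We identify it as a link and it is a true link → TP
  –   We identify it as a link but it is not a true link → FP
  –   We do not identify it as a link but it is a true link → FN
  –   We do not identify it as a link and it is not a true link → TN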
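• Evaluation Metrics
  $$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$
  $$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$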
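As a worked example with hypothetical counts (not results from the study), suppose an approach reports 10 links of which 8 are true, and misses 4 true links, so that TP = 8, FP = 2, and FN = 4:

```latex
\[
\text{Precision} = \frac{8}{8 + 2} = 0.8,
\qquad
\text{Recall} = \frac{8}{8 + 4} \approx 0.667
\]
\[
\text{F-measure}
  = \frac{2 \times \frac{4}{5} \times \frac{2}{3}}{\frac{4}{5} + \frac{2}{3}}
  = \frac{16/15}{22/15}
  = \frac{8}{11} \approx 0.727
\]
```

The F-measure sits between precision and recall but closer to the lower of the two, which is why it penalises an approach that gains precision by missing many links.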
[Figure: Performance of ReLink in the Apache project, comparing ReLink with the traditional heuristics on precision, recall, and F-measure; axis from 0.65 to 0.9]
• What can we do with the recovered links?
  – Improving maintainability measurement:
    the percentage of bug-fixing changes,
    the percentage of buggy files,
    mean time to fix
  – Constructing better software defect
    prediction models
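To make the three maintainability measures concrete, here is a small sketch that derives them from a set of links. The data layout and the exact definitions (for instance, measuring time to fix from bug opening to bug fix, in days) are assumptions for illustration; the slides do not define the measures precisely.

```python
from datetime import datetime


def maintainability_measures(changes, links, bugs):
    """Compute three measures from bug-change links (input layout is hypothetical).

    changes: {change_id: set of files touched}
    links:   set of (bug_id, change_id) pairs
    bugs:    {bug_id: (opened_at, fixed_at)} as datetime pairs
    """
    fixing_changes = {c for _, c in links if c in changes}
    all_files = set().union(*changes.values())
    buggy_files = set().union(*(changes[c] for c in fixing_changes))
    linked_bugs = {b for b, _ in links if b in bugs}
    days_to_fix = [(bugs[b][1] - bugs[b][0]).total_seconds() / 86400
                   for b in linked_bugs]

    def ratio(part, whole):
        return part / whole if whole else 0.0

    return {
        "bug_fixing_changes": ratio(len(fixing_changes), len(changes)),
        "buggy_files": ratio(len(buggy_files), len(all_files)),
        "mean_days_to_fix": ratio(sum(days_to_fix), len(days_to_fix)),
    }


# Hypothetical data: four changes, two of which are linked to bugs.
changes = {"c1": {"a.java", "b.java"}, "c2": {"c.java"},
           "c3": {"a.java"}, "c4": {"d.java"}}
links = {(1, "c1"), (2, "c3")}
bugs = {1: (datetime(2011, 1, 1), datetime(2011, 1, 3)),
        2: (datetime(2011, 1, 10), datetime(2011, 1, 14))}
measures = maintainability_measures(changes, links, bugs)
```

Every link that is missing from `links` lowers the first two measures, which is how missing links bias the measurement.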
• Maintainability Measurement:




• Defect Prediction




  ReLink can improve the performance of defect prediction!
• The quality of golden set of links can’t be
  completely assured

• All the datasets are collected from open source
  projects

• The approach needs to be verified in more
  projects

• We propose ReLink to recover the missing
  links
• The recovered links have a positive impact on
  follow-up software maintenance studies,
  including defect prediction and maintainability
  measurement.
• Future work:
  – Further improving the performance of ReLink
  – Applying ReLink to more projects, including industrial
    projects
Thank you!

Dr Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: hongyu@tsinghua.edu.cn
Web: http://sites.google.com/site/hongyujohn/



ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)

  • 1. Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung. Tsinghua University, China; The Hong Kong University of Science and Technology, Hong Kong
  • 2. • The links between fixed bugs and committed changes are important: – for measuring software quality – for constructing defect prediction models. [Diagram: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
  • 3. • To discover the links: mining software repositories! • Heuristics traditionally used to collect links between bugs and changes: searching for keywords (such as “Fixed” or “Bug”) and bug IDs. [Diagram: data sources labeled Bugzilla, mailings, source code, CVS/SVN, execution traces, requirements, crash, developer, logs, …]
  • 5. Missing Links! Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009
  • 6. • Missing bug reference in change log • Irregular bug reference formats: “issue 681”, “bug 232”, “Fixed for #239”, “see #149”, “solve problem 681” • Typos: “Fic 239”
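The keyword-and-ID heuristic from slides 3 and 6 can be sketched as a regular expression. This is only an illustration: the pattern `BUG_REF` and the helper `find_bug_ids` are hypothetical names, and the keyword list is an assumption, not the exact heuristic used in the studies cited in the talk.

```python
import re

# Assumed pattern: a keyword ("fix", "bug", "issue" and simple variants)
# followed within a few characters by a number, or a bare "#<number>".
BUG_REF = re.compile(
    r"(?:\b(?:fix(?:e[sd])?|bugs?|issues?)\b[^\d#]{0,10}#?|#)(\d+)",
    re.IGNORECASE,
)

def find_bug_ids(change_log: str) -> list[int]:
    """Return the bug IDs that the keyword heuristic finds in a change log."""
    return [int(bug_id) for bug_id in BUG_REF.findall(change_log)]

# Formats quoted on slide 6: some are caught, others are missed.
for message in ["issue 681", "Fixed for #239", "see #149",
                "solve problem 681", "Fic 239"]:
    print(f"{message!r:22} -> {find_bug_ids(message)}")
```

With this pattern, “solve problem 681” and the typo “Fic 239” yield no bug ID, which is the kind of missing link the talk sets out to recover.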
  • 7. • To recover the missing links, we studied many bug reports (including comments) and change logs • We have identified the following features of links: – Time interval: the bug-fix time and change committed time are close
  • 8. • Time interval between bug-fix time and change committed time
  • 9. • Through empirical studies, we have identified the following features of links: – Bug owner and change committer: they are often the same person, or have mapping relationships
  • 10. • Bug owner and change committer mapping relationship:
      Bug Owner | Change Committer | Project
      dswitkin@gmail.com | dswitkin | ZXing
      dswitkin@google.com | dswitkin@google.com | ZXing
      srowen@gmail.com | srowen | ZXing
      pelili0101@googlemail.com | peli0101 | OpenIntents
      Will Rowe | Wrowe | Apache
      Erik Abele | Erikabele | Apache
  • 11. Bug owner and change committer
  • 12. • Through empirical studies, we have identified the following features of links: – Text similarity: the textual descriptions in the bug report are often similar to those in the change logs.
  • 13. • Text similarity: the texts are similar! We use information retrieval (IR) techniques to measure the similarity
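One common IR-style measure for comparing a bug report with a change log is cosine similarity over term-frequency vectors. The sketch below is an assumption made for illustration (the slides do not say which IR technique or preprocessing is used), and `tokenize` and `cosine_similarity` are hypothetical helper names.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm = math.sqrt(
        sum(v * v for v in freq_a.values()) * sum(v * v for v in freq_b.values())
    )
    return dot / norm if norm else 0.0

bug_report = "Application crashes when rotating the screen"
change_log = "Avoid crash when rotating screen"
print(round(cosine_similarity(bug_report, change_log), 3))
```

A real pipeline would typically add stemming, stop-word removal, and TF-IDF weighting; those steps are left out here to keep the sketch short.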
  • 15. • To determine the criteria for these features, we learn from the explicit links that can be identified through traditional heuristics: – For the time interval and text similarity features, we exhaustively search for the combination of the two threshold values that achieves the maximum F-measure. – For the mappings between bug owners and change committers, we also learn them from the explicit links.
  • 16. • Determine the time interval and similarity thresholds: search step by step for the optimal similarity threshold and time interval values
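The exhaustive search over the two thresholds can be sketched as a small grid search that keeps the pair with the highest F-measure. The candidate format, the grid values, and the function name `search_thresholds` are assumptions made for this example; the slides do not give the actual step sizes or data layout.

```python
from itertools import product

def search_thresholds(candidates, sim_grid, gap_grid):
    """Try every (similarity, time gap) threshold pair and return
    (best_f_measure, best_similarity_threshold, best_gap_threshold).

    candidates: list of (similarity, gap_hours, is_true_link) tuples,
    assumed to come from the explicit links found by traditional heuristics.
    """
    best = (-1.0, None, None)
    for sim_t, gap_t in product(sim_grid, gap_grid):
        tp = fp = fn = 0
        for similarity, gap_hours, is_true_link in candidates:
            predicted = similarity >= sim_t and gap_hours <= gap_t
            if predicted and is_true_link:
                tp += 1
            elif predicted:
                fp += 1
            elif is_true_link:
                fn += 1
        if tp == 0:
            score = 0.0
        else:
            precision, recall = tp / (tp + fp), tp / (tp + fn)
            score = 2 * precision * recall / (precision + recall)
        if score > best[0]:  # keep the first pair that reaches the maximum
            best = (score, sim_t, gap_t)
    return best

# Made-up candidates: (text similarity, hours between fix and commit, true link?)
candidates = [(0.9, 1, True), (0.8, 5, True), (0.3, 2, False), (0.7, 50, False)]
print(search_thresholds(candidates, [0.2, 0.5, 0.8], [4, 24, 72]))
```

On this toy data the search settles on a similarity threshold of 0.5 and a 24-hour interval, the first pair that separates the true links from the false ones.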
  • 17. • Determine the mapping relationship between bug owners and change committers: find the possible mappings from the explicit links
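Learning the owner-to-committer mappings from explicit links can be sketched as collecting, for each bug owner, the committer IDs seen on already-linked changes. The function name `learn_owner_committer_map` and the input format are hypothetical; the sample pairs reuse identities from the mapping table on slide 10.

```python
from collections import defaultdict

def learn_owner_committer_map(explicit_links):
    """Map each bug owner to the set of committers observed in explicit links.

    explicit_links: iterable of (bug_owner, change_committer) pairs, assumed
    to be extracted from links found by the traditional heuristics.
    """
    mapping = defaultdict(set)
    for bug_owner, change_committer in explicit_links:
        mapping[bug_owner].add(change_committer)
    return dict(mapping)

explicit_links = [
    ("dswitkin@gmail.com", "dswitkin"),
    ("dswitkin@gmail.com", "dswitkin"),
    ("srowen@gmail.com", "srowen"),
    ("Will Rowe", "Wrowe"),
]
owner_to_committers = learn_owner_committer_map(explicit_links)
print(owner_to_committers["dswitkin@gmail.com"])
```

A candidate link whose committer appears in the owner's learned set would then satisfy the owner/committer feature.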
  • 18. • To obtain the ground truth (“golden set” of links) • For ZXing and OpenIntents, we manually identify the links • For Apache, we use the data provided by Bird et al. (annotated by an Apache core developer)
  • 19. • Four possible outcomes: – A link we identify is a true link → TP – A link we identify is not a true link → FP – A link we miss is a true link → FN – A link we miss is not a true link → TN • Evaluation metrics: $\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$, $\mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
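The metrics on slide 19 can be computed by comparing the recovered links with the golden set as two sets. This is a minimal sketch: the `evaluate` helper and the (bug ID, revision) link representation are assumptions for illustration, and the sample links are made up.

```python
def evaluate(recovered: set, golden: set) -> tuple[float, float, float]:
    """Return (precision, recall, f_measure) of recovered links vs. the golden set."""
    tp = len(recovered & golden)   # identified and true
    fp = len(recovered - golden)   # identified but not true
    fn = len(golden - recovered)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Made-up links represented as (bug_id, revision) pairs.
recovered = {(1, "r10"), (2, "r11"), (3, "r12")}
golden = {(1, "r10"), (2, "r11"), (4, "r13")}
print(evaluate(recovered, golden))
```

True negatives are not needed for any of the three metrics, so the sketch does not count them.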
  • 20. [Chart: Performance of ReLink in the Apache project, comparing the precision, recall, and F-measure of ReLink with those of the traditional heuristics]
  • 22. • What can we do with the recovered links? – Improving maintainability measurement: the percentage of bug-fixing changes, the percentage of buggy files, mean time to fix – Constructing better software defect prediction models
  • 25. • Defect prediction: ReLink can improve the performance of defect prediction!
  • 26. • The quality of the golden set of links can’t be completely assured • All the datasets are collected from open source projects • The approach needs to be verified in more projects
  • 27. • We propose ReLink to recover the missing links • The recovered links have a positive impact on follow-up software maintenance studies, including defect prediction and maintainability measurement • Future work: – Further improving the performance of ReLink – Applying it to more projects, including industrial projects
  • 28. Thank you! Dr Hongyu Zhang, School of Software, Tsinghua University, Beijing 100084, China. Email: hongyu@tsinghua.edu.cn Web: http://sites.google.com/site/hongyujohn/