Late Propagation in Software Clones
Liliane Barbour, Foutse Khomh, and Ying Zou

Late Propagation (LP)
• Definition: an inconsistent change that diverges a clone pair, later
  followed by a consistent, re-synchronizing change.
• It can be risky because failure to propagate changes between the clones
  in a clone pair can lead to faults.
• In our work, we found that 8-21% of genealogies contain a late
  propagation.

LP With Propagation Example from ArgoUML
// Clone A, Revision 595
addField(new UMLComboBox(typeModel),1,0,0);

// Clone B, Revision 595
addField(new UMLComboBox(classifierModel),2,0,0);

// Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this,"NavClass",
         new UMLComboBox(typeModel)),1,0,0);

// Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this,"NavClass",
         new UMLComboBox(classifierModel)),2,0,0);

[Diagram: Clone A and Clone B are consistent at Revision 595; the diverging change occurs at Revision 602; the re-synchronizing change occurs at Revision 604.]

LP Without Propagation Example from Ant
// Clone A, Revision 270250
if( destFile == null )
{
   destFile = new File(destDir,file.getName());
}

// Clone B, Revision 270250
if (destFile == null ) {
   destFile = new File(destDir,file.getName());
}

// Diverging Change: Clone A, Revision 270264
if ( m_destFile == null )
{
   m_destFile = new File(m_destDir,m_file.getName());
}

// Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
   destFile = new File(destDir,file.getName());
}

[Diagram: Clone A and Clone B are consistent at Revision 270250; the diverging change occurs at Revision 270264; the re-synchronizing change occurs at Revision 271109.]

Types of Late Propagation
                                           Modified During   Modified During the   Modified During
Propagation Category              LP Type  Diverging Change  Period of Divergence  Re-synchronizing Change
Propagation Always Occurs         LP1      A                 A                     B
                                  LP2      A                 A and B               B
                                  LP3      A                 A                     A and B
Propagation May or May Not Occur  LP4      A                 A and B               A
                                  LP5      A                 A and B               A and B
                                  LP6      A and B           A and B               A or B
                                  LP7      A and B           A and B               A and B
Propagation Never Occurs          LP8      A                 A                     A

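The table of LP types can be read as a lookup from three observations (which clones were modified during the diverging change, during the period of divergence, and during the re-synchronizing change) to an LP type. A minimal Java sketch of that lookup follows. The class, enum, and method names are hypothetical and are not taken from the authors' tooling; combinations that the table does not list yield an empty result.

```java
import java.util.Optional;

/** Hypothetical sketch: maps the three "Modified During" columns to an LP type. */
final class LpTypeClassifier {

    /** Which clones of the pair were modified in a given phase. */
    enum Modified { A, B, A_AND_B }

    enum LpType { LP1, LP2, LP3, LP4, LP5, LP6, LP7, LP8 }

    /**
     * Returns the LP type for a combination listed in the table,
     * or Optional.empty() if the combination does not appear there.
     */
    static Optional<LpType> classify(Modified diverging, Modified divergence, Modified resync) {
        if (diverging == Modified.A && divergence == Modified.A) {
            switch (resync) {
                case B:       return Optional.of(LpType.LP1);
                case A_AND_B: return Optional.of(LpType.LP3);
                case A:       return Optional.of(LpType.LP8);
            }
        }
        if (diverging == Modified.A && divergence == Modified.A_AND_B) {
            switch (resync) {
                case B:       return Optional.of(LpType.LP2);
                case A:       return Optional.of(LpType.LP4);
                case A_AND_B: return Optional.of(LpType.LP5);
            }
        }
        if (diverging == Modified.A_AND_B && divergence == Modified.A_AND_B) {
            // LP6: the re-synchronizing change touches one clone (A or B); LP7: both.
            return Optional.of(resync == Modified.A_AND_B ? LpType.LP7 : LpType.LP6);
        }
        return Optional.empty();
    }
}
```

For the ArgoUML example, assuming no further changes to Clone B during the period of divergence, `classify(Modified.A, Modified.A, Modified.B)` corresponds to LP1.
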
Research Questions
RQ1: Are there different types of LP?
RQ2: Are some types of LP more fault-prone than others?
RQ3: Which type of LP experiences the highest proportion of faults?

Subject Systems
System   # LOC  # Revisions  # Genealogies  # LP        # Genealogies  # LP
                             (CCFinder)     (CCFinder)  (Simian)       (Simian)
ArgoUML  3.1M   18k          14k            1.1k        111            23
Ant      2.3M   1.0M         30k            4.7k        461            80

Our Approach

Mining the SVN
• Use J-Rex to mine the SVN repository.
• Heuristics (Mockus et al., 2000) are used to identify the reason for
  each commit.
• Snapshots of all revisions of each Java file are stored in an XML file.
• Test files are removed.

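The slides do not list the heuristics used to identify the reason for a commit. One plausible form is a keyword check on the commit message, sketched below in Java. The keyword list, class name, and method name are assumptions made for illustration and should not be read as the actual heuristics of Mockus et al. or of this study.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: flags commit messages that look like fault fixes. */
final class CommitReasonHeuristic {

    // Assumed keyword list, for illustration only.
    private static final List<String> FAULT_KEYWORDS =
            Arrays.asList("fix", "bug", "fault", "defect", "error");

    /** Returns true if any word of the message starts with one of the keywords. */
    static boolean looksLikeFaultFix(String commitMessage) {
        if (commitMessage == null) {
            return false;
        }
        // Split into lowercase words so that "prefix" does not match "fix".
        String[] words = commitMessage.toLowerCase(Locale.ROOT).split("[^a-z]+");
        for (String word : words) {
            for (String keyword : FAULT_KEYWORDS) {
                if (word.startsWith(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Matching on word prefixes accepts variants such as "fixed" and "bugs" while avoiding accidental substring hits; a real classifier would need a validated keyword set.
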
Clone Detection
• The contents of each method revision are extracted into individual
  files.
• Clone detection is performed once on all snapshots.
• Two existing clone detection tools are used:
   – Simian (text-based) and CCFinder (token-based)

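To give a feel for what a text-based comparison of two extracted method files involves, the Java sketch below normalizes whitespace line by line and reports the longest run of consecutive identical lines shared by two fragments. This is a naive stand-in written for illustration; it is not the algorithm used by Simian or CCFinder, and the class name, method names, and the `minLines` threshold are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a naive, line-based textual clone check. */
final class NaiveTextCloneCheck {

    /** Trims each line, collapses whitespace runs, and drops blank lines. */
    static List<String> normalize(String source) {
        List<String> lines = new ArrayList<>();
        for (String raw : source.split("\n")) {
            String line = raw.trim().replaceAll("\\s+", " ");
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Length of the longest run of consecutive identical normalized lines in a and b. */
    static int longestSharedRun(String a, String b) {
        List<String> x = normalize(a);
        List<String> y = normalize(b);
        int best = 0;
        // prev[j] holds the length of the shared run ending at x[i-2], y[j-1].
        int[] prev = new int[y.size() + 1];
        for (int i = 1; i <= x.size(); i++) {
            int[] cur = new int[y.size() + 1];
            for (int j = 1; j <= y.size(); j++) {
                if (x.get(i - 1).equals(y.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    /** Treats two fragments as a clone pair if they share at least minLines lines in a row. */
    static boolean isClonePair(String a, String b, int minLines) {
        return longestSharedRun(a, b) >= minLines;
    }
}
```

Because the comparison is line-based, differences in brace placement (as between Clone A and Clone B in the Ant example) break a run; token-based tools are less sensitive to such layout changes.
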
Building Clone Genealogies
• Build clone genealogies using the existing clone list.
• Query the SVN using diff to track changes to each clone in a clone
  pair over time.
• If a change modifies one of the clones in a clone pair, query the
  clone list for a matching clone.

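Once a genealogy records, for each change, whether the clone pair is still consistent, late propagation can be recognized as a consistent state followed by one or more inconsistent states and then a consistent state again. The Java sketch below shows that check under the assumption that the per-change consistency flags have already been obtained from the clone list; the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: detects late propagation in a clone genealogy. */
final class LatePropagationDetector {

    /**
     * Each element is true if the clone pair is consistent after that change.
     * Returns true if the pair diverges from a consistent state and is
     * later re-synchronized.
     */
    static boolean hasLatePropagation(List<Boolean> consistentAfterEachChange) {
        boolean seenConsistent = false;
        boolean diverged = false;
        for (boolean consistent : consistentAfterEachChange) {
            if (consistent) {
                if (diverged) {
                    return true; // re-synchronizing change after a divergence
                }
                seenConsistent = true;
            } else if (seenConsistent) {
                diverged = true; // diverging change after a consistent state
            }
        }
        return false;
    }
}
```

For the Ant example, revisions 270250, 270264, and 271109 would give the sequence consistent, inconsistent, consistent, which this check reports as a late propagation.
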
RQ1: Are there different types of LP?
[Figure: Breakdown of LP Type by System. Bar chart; x-axis: LP Types (LP1 to LP8); y-axis: Percentage of All LP Occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder.]
There is representation from multiple types of LP and across all
categories of LP.

RQ2: Are some types of LP more fault-prone than others?
Part 1: Is late propagation fault-prone?
Part 2: Are specific types of late propagation more fault-prone?

Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. Bar chart; y-axis: Odds Ratio (0 to 4); bars: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
In all significant cases, the odds ratio is greater than 1. Therefore,
LP genealogies are more fault-prone than non-LP genealogies.

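For readers unfamiliar with the measure, an odds ratio compares the odds of a fault in one group with the odds in another. With a faulty and b fault-free LP genealogies, and c faulty and d fault-free non-LP genealogies, the odds ratio is (a/b)/(c/d) = ad/(bc); a value above 1 means faults are more likely in the LP group. The Java sketch below computes it. The class name, method name, and the counts in the usage note are hypothetical and are not results from the study.

```java
/** Hypothetical sketch: odds ratio from a 2x2 table of genealogy counts. */
final class OddsRatio {

    /**
     * Computes (lpFaulty / lpClean) / (nonLpFaulty / nonLpClean),
     * written as a single fraction to avoid intermediate divisions.
     */
    static double oddsRatio(long lpFaulty, long lpClean, long nonLpFaulty, long nonLpClean) {
        if (lpClean == 0 || nonLpFaulty == 0) {
            throw new IllegalArgumentException("odds ratio is undefined when lpClean or nonLpFaulty is zero");
        }
        return ((double) lpFaulty * nonLpClean) / ((double) lpClean * nonLpFaulty);
    }
}
```

With made-up counts of 30 faulty and 70 fault-free LP genealogies against 10 faulty and 90 fault-free non-LP genealogies, `oddsRatio(30, 70, 10, 90)` evaluates to 2700/700, roughly 3.86.
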
Part 2: Are specific types of late propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Odds Ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.

RQ2 Observations
• Some LP types are not more fault-prone than non-LP genealogies
  (i.e., their odds ratio is < 1).
• Some types that make up a small proportion of LP instances have a
  very high odds ratio.
• LP7 and LP8 occur frequently but have low odds ratios.
Each type of LP has a different level of fault-proneness.

RQ3: Which type of LP experiences the highest proportion of faults?
[Figure: Percentage of Fault Occurrences Broken Down by LP Type. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Percentage of Fault Occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.

RQ3 Observations
• LP7 and LP8 contribute a large proportion of the faults but have
  lower odds ratios (RQ2).
   – When faults occur, they occur in large numbers.
• Overall, LP7 and LP8 are the most dangerous; the fault-proneness of
  the other types is system-dependent.
The proportion of faults is different for each LP type.

Conclusion
• In general, LP genealogies are more fault-prone than non-LP
  genealogies.
• LP7 and LP8 are the riskiest in terms of their fault-proneness and
  magnitude of faults.
   – LP8 contains no propagation of changes.
   – LP7 may or may not contain any propagation of changes.
• Fault-proneness and fault occurrence depend on both the LP type and
  the system.