O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

Mining Co-Change Information to Understand when Build Changes are Necessary

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Carregando em…3
×

Confira estes a seguir

1 de 89 Anúncio

Mining Co-Change Information to Understand when Build Changes are Necessary

Baixar para ler offline

As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.

As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.

Anúncio
Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Quem viu também gostou (19)

Anúncio

Semelhante a Mining Co-Change Information to Understand when Build Changes are Necessary (20)

Mais recentes (20)

Anúncio

Mining Co-Change Information to Understand when Build Changes are Necessary

  1. 1. Mining Co-Change Information to Understand when Build Changes are Necessary @shane_mcintosh Shane McIntosh Bram Adams Meiyappan Nagappan Ahmed E. Hassan
  2. 2. What is a build system? Source code 2
  3. 3. What is a build system? Source code Deliverable 2
  4. 4. Build systems describe how sources are! .c .cc .tex .o .o .dvi .a .exe .pdf .deb translated into deliverables 3
  5. 5. Continuous Integration:! Enabled by the build system 4 .c .mk
  6. 6. Continuous Integration:! Enabled by the build system Commit 4 Commit 9719cf0 .c .mk
  7. 7. Continuous Integration:! Enabled by the build system Commit 4 Build Commit 9719cf0 .c .mk
  8. 8. Continuous Integration:! Enabled by the build system Commit 4 Build Test Commit 9719cf0 .c .mk
  9. 9. Commit 9719cf0 Continuous Integration:! Enabled by the build system Commit 4 Build Test Report Commit 9719cf0 was successfully integrated .c .mk
  10. 10. Commit 9719cf0 Continuous Integration:! Enabled by the build system Commit 4 Build Test Report Commit 9719cf0 was successfully integrated .c .mk
  11. 11. “...nothing can be said to be certain, except death and taxes” - Benjamin Franklin The Build “Tax” Up to 27% of source changes require build changes, too! An Empirical Study of Build Maintenance Effort! S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011] 5
  12. 12. Neglected build maintenance! is a frequent cause build breaks 6 .c .mk
  13. 13. Commit Neglected build maintenance! is a frequent cause build breaks 6 Commit aedd38 .c .mk
  14. 14. Commit Neglected build maintenance! is a frequent cause build breaks 6 Commit aedd38 .c .mk
  15. 15. Commit Neglected build maintenance! is a frequent cause build breaks 6 Build Commit aedd38 .c .mk
  16. 16. Commit Neglected build maintenance! is a frequent cause build breaks 6 Build Test Commit aedd38 .c .mk
  17. 17. Commit Neglected build maintenance! is a frequent cause build breaks 6 Build Test Commit aedd38 .c .mk
  18. 18. Commit Neglected build maintenance! is a frequent cause build breaks 6 Build Test Report Commit aedd38 .c .mk Commit aedd38 broke the build!
  19. 19. Neglected build maintenance! can even impact end users 7
  20. 20. Neglected build maintenance! can even impact end users 7 Not working due to linking of incorrect SQLite library version
  21. 21. Neglected build maintenance! can even impact end users 7 When are build! changes necessary? Not working due to linking of incorrect SQLite library version
  22. 22. Overview of the studied systems 8 C++ Java
  23. 23. Overview of the studied systems 8 C++ Java
  24. 24. Overview of the studied systems 8 C++ Java 13 years of historical data
  25. 25. Overview of the studied systems 8 C++ Java 13 years of historical data Single build system for Firefox, Thunderbird, etc.
  26. 26. Overview of the studied systems 8 C++ Java 13 years of historical data Single build system for Firefox, Thunderbird, etc. Total of 16 years of historical data
  27. 27. Overview of the studied systems 8 C++ Java 13 years of historical data Single build system for Firefox, Thunderbird, etc. Total of 16 years of historical data Proprietary and open source systems
  28. 28. Semi-automatic identification of file types using naming conventions .m..cck ..ad.caht 9
  29. 29. Semi-automatic identification of file types using naming conventions .c .mk .c Source Build Test .h .dat .ac Source Test Build 9
  30. 30. Grouping related changes according to the work items that they address 10
  31. 31. Grouping related changes according to the work items that they address .c .c .c Changes .mk 10
  32. 32. Grouping related changes according to the work items that they address Missed code! in #2121 .c .c .c Add feature! #2121 Fix for! bug #1234 Changes Transactions .mk 10
  33. 33. Grouping related changes according to the work items that they address Missed code! 2121 in #2121 .c .c .c Add feature! #2121 Fix for! bug #1234 1234 Changes Transactions Work items .mk 10
  34. 34. 11
  35. 35. ! Case study structure 11
  36. 36. ! Case study structure 11 (RQ1)! Co-change frequency .c ? .mk
  37. 37. ! Case study structure (RQ2)! Co-change prediction accuracy 11 (RQ1)! Co-change frequency .c ? .mk
  38. 38. ! Case study structure (RQ2)! Co-change prediction accuracy 11 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators
  39. 39. ! Case study structure (RQ2)! Co-change prediction accuracy 12 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators
  40. 40. The proportion of build changes that are accompanied by source code changes 13 .c ? .mk
  41. 41. The proportion of build changes that are accompanied by source code changes .mk .c .c .mk .mk .c 13 .c ? .mk
  42. 42. The proportion of build changes that are accompanied by source code changes 13 .c ? .mk .c .c Work! items .mk 1 .mk .mk .c 2 3
  43. 43. The proportion of build changes that are accompanied by source code changes 13 .c ? .mk .c .c Work! items .mk 1 .mk .mk .c 2 3 2 build co-changes 3 total build changes
  44. 44. Build changes are often accompanied by changes to the source code Mozilla Eclipse Lucene Jazz Build ⇒ Source 86% 82% 44% 72% Build ⇒ Test 29% 36% 41% 36% Build ⇒ Src/Test 88% 82% 53% 77% 14 .c ? .mk
  45. 45. Build changes are occasionally accompanied by changes to test code Mozilla Eclipse Lucene Jazz Build ⇒ Source 86% 82% 44% 72% Build ⇒ Test 29% 36% 41% 36% Build ⇒ Src/Test 88% 82% 53% 77% 15 .c ? .mk
  46. 46. 53%-88% of build changes are accompanied by changes to source or test code Mozilla Eclipse Lucene Jazz Build ⇒ Source 86% 82% 44% 72% Build ⇒ Test 29% 36% 41% 36% Build ⇒ Src/Test 88% 82% 53% 77% 16 .c ? .mk
  47. 47. ! Case study structure (RQ2)! Co-change prediction accuracy 17 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators Most build changes are accompanied by source/test code changes
  48. 48. ! Case study structure (RQ2)! Co-change prediction accuracy 18 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators Most build changes are accompanied by source/test code changes
  49. 49. Metrics that may indicate whether a code change will require an accompanying build co-change 19
  50. 50. Metrics that may indicate whether a code change will require an accompanying build co-change 19 Language-agnostic
  51. 51. Metrics that may indicate whether a code change will require an accompanying build co-change 19 Language-agnostic +++ File added
  52. 52. Metrics that may indicate whether a code change will require an accompanying build co-change 19 Language-agnostic +++ File added Historical .c .mk tendencies
  53. 53. Metrics that may indicate whether a code change will require an accompanying build co-change Language-agnostic Language-aware 19 +++ File added Historical .c .mk tendencies
  54. 54. Metrics that may indicate whether a code change will require an accompanying build co-change Language-agnostic Language-aware 19 +++ File added Historical .c .mk tendencies Changed dependencies import
  55. 55. Metrics that may indicate whether a code change will require an accompanying build co-change Language-agnostic Language-aware 19 +++ File added Historical .c .mk tendencies Changed dependencies import Changed! conditional compilation #ifdef
  56. 56. We train classifiers to identify code changes that require build changes .mk 1 2 20 Work! items .c .c .c Random forest classification model Build change necessary No build change necessary
  57. 57. We train classifiers to identify code changes that require build changes .mk 1 2 20 Work! items .c Ran.dcom .fcorest classification model Build change necessary No build change necessary
  58. 58. We train classifiers to identify code changes that require build changes .mk 1 2 21 Work! items .c Random forest classification model Build change! necessary No build change necessary
  59. 59. We train classifiers to identify code changes that require build changes .mk 1 2 22 Work! items .c .c .c Random forest classification model Build change necessary No build change necessary
  60. 60. We train classifiers to identify code changes that require build changes .mk 1 2 22 Work! items .c .c .c Random forest classification model Build change necessary No build change necessary
  61. 61. We train classifiers to identify code changes that require build changes .mk 1 2 23 Work! items .c .c Random forest classification model Build change necessary No build change! necessary
  62. 62. Language-aware metrics add significant explanatory power Mozilla Eclipse Lucene Jazz 24 AUC (language-agnostic metrics) 0.84 0.66 0.71 0.57 AUC (language-agnostic and aware metrics) 0.88 0.69 0.78 0.60
  63. 63. Mozilla classifiers are highly accurate Mozilla Eclipse Lucene Jazz 25 AUC (language-agnostic metrics) 0.84 0.66 0.71 0.57 AUC (language-agnostic and aware metrics) 0.88 0.69 0.78 0.60
  64. 64. Java classifiers are less accurate than Mozilla one Mozilla Eclipse Lucene Jazz 26 AUC (language-agnostic metrics) 0.84 0.66 0.71 0.57 AUC (language-agnostic and aware metrics) 0.88 0.69 0.78 0.60
  65. 65. Java build changes are rarely! driven by code changes Category Task Amount Correctly System structure Refactorings 19 (25%) class8ified General build 27 maintenance Build tool configuration 15 (20%) 0 Build defects 6 (8%) 0 Release engineering Add platform support 12 (16%) 2 Packaging fixes 12 (16%) 3 Library versioning 8 (11%) 0 Test maintenance Test infrastructure 3 (4%) 0
  66. 66. ! Case study structure (RQ2)! Co-change prediction accuracy 28 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators Most build changes are accompanied by source/test code changes AUCC++ 0.88, AUCJava 0.60-0.88 Java build changes are rarely motivated by code changes
  67. 67. ! Case study structure (RQ2)! Co-change prediction accuracy 29 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators Most build changes are accompanied by source/test code changes AUCC++ 0.88, AUCJava 0.60-0.88 Java build changes are rarely motivated by code changes
  68. 68. Measuring variable importance in random forest classifiers 30 Var1 Var2 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 777 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3
  69. 69. Measuring variable importance in random forest classifiers 30 Var1 Var2 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 777 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 No Yes Yes
  70. 70. Measuring variable importance in random forest classifiers 30 Var1 Var2 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 777 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 No Yes Yes Misclassification rate of 1 ÷ 3 = 0.33
  71. 71. Measuring variable importance in random forest classifiers 30 Var1 Var2 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 777 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 No Yes Yes Randomly permute values
  72. 72. Measuring variable importance in random forest classifiers 31 Var1 Var2 777 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3
  73. 73. Measuring variable importance in random forest classifiers 31 Var1 Var2 777 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 Yes Yes Yes
  74. 74. Measuring variable importance in random forest classifiers 31 Var1 Var2 777 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 Yes Yes Yes Misclassification rate of 2 ÷ 3 = 0.67
  75. 75. Measuring variable importance in random forest classifiers Var1 randomness increases misclassification rate by 0.33 31 Var1 Var2 777 500 Random Forests! L. Breiman [Machine Learning 2001] 150 Build! change? No No Yes Var3 2 456 111 65 4 13 Random forest classification model Classification Work! Item 1 2 3 Yes Yes Yes Misclassification rate of 2 ÷ 3 = 0.67
  76. 76. Variable importance in the! Mozilla classifier Mozilla Eclipse−core ● ● ● ● ● ● ●● ●● 0.10 0.05 0.00 Non−core dependencies Conditional compilation Test deleted Source modified Test added Source deleted Source added Prior build co−changes Test modified Source renamed Test renamed Test added Source deleted Source added Number of files Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed Number of files Test renamed Source added Test Source Variable Importance Score 32
  77. 77. Variable importance in the! Mozilla classifier Mozilla Eclipse−core ● ● ● ● ● ● ●● ●● 0.10 0.05 0.00 Non−core dependencies Conditional compilation Test deleted Source modified Test added Source deleted Source added Prior build co−changes Test modified Source renamed Test renamed Test added Source deleted Source added Number of files Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed Number of files Test renamed Source added Test Source Variable Importance Score 32 Structure-altering changes are most important in C++ classifiers
  78. 78. Variable importance in the! Mozilla classifier Mozilla Eclipse−core ● ● ● ● ● ● ●● ●● 0.10 0.05 0.00 Non−core dependencies Conditional compilation Test deleted Source modified Test added Source deleted Source added Prior build co−changes Test modified Source renamed Test renamed Test added Source deleted Source added Number of files Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed Number of files Test renamed Source added Test Source Variable Importance Score 32 Structure-altering changes are most important in C++ classifiers Historical co-change tendencies are also important
  79. 79. Variable importance in Java classifiers Eclipse−core Lucene Jazz Non−core dependencies Prior build co−changes Test deleted Source modified 33 ● ●● ●● ● ● ● Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed added Test added Source deleted Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Non−core dependencies Test deleted Source modified Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Test renamed
  80. 80. Variable importance in Java classifiers Eclipse−core Lucene Jazz Non−core dependencies Prior build co−changes Test deleted Source modified 33 ● ●● ●● ● ● ● Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed added Test added Source deleted Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Non−core dependencies Test deleted Source modified Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Test renamed Structure-altering is less important than in Mozilla
  81. 81. Variable importance in Java classifiers Eclipse−core Lucene Jazz Non−core dependencies Prior build co−changes Test deleted Source modified 33 ● ●● ●● ● ● ● Non−core dependencies Prior build co−changes Test deleted Source modified Test modified Source renamed added Test added Source deleted Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Non−core dependencies Test deleted Source modified Test added Source deleted Number of files Test renamed Source added Test modified Source renamed Test renamed Structure-altering is less important than in Mozilla Java classifiers rely more on historical co-change tendencies
  82. 82. ! Case study structure (RQ2)! Co-change prediction accuracy 34 (RQ1)! Co-change frequency .c ? .mk (RQ3)! Most powerful co-change indicators Most build changes are accompanied by source/test code changes AUCC++ 0.88, AUCJava 0.60-0.88 Java build changes are rarely motivated by code changes C++: Structure-altering changes Java: Historical co-change tendencies

×