Mining Source Code Improvement Patterns from Similar Code Review
Yuki Ueda (1), Takashi Ishio (1), Akinori Ihara (2), Kenichi Matsumoto (1)
(1) Nara Institute of Science and Technology
(2) Wakayama University
13th International Workshop on Software Clones (IWSC’19)
Background Approach Result Summary
Contents
• Goal: Reduce code review cost
• Approach: Detect code improvement patterns that appear in reviews
• Evaluation: Measure the patterns' frequency and accuracy
Code review process: Reviewers suggest code fixes
(1) Submit: the patch author submits a patch to the reviewer.
    Pre-review patch (initial patch):
    - i=key
    + i=dic["key"]
(2) Review, fix suggestion: the reviewer tells the author "You should fix".
(3) Integrate: the reviewed patch is integrated into the project.
    Reviewed patch (integrated patch):
    - i=key
    + i_=_dic["KEY"]
Problem: Reviewers need to check several times
(1) Submit: the patch author submits the pre-review patch (initial patch).
    - i=key
    + i=dic["key"]
(2) to (n) Review, fix suggestion: the reviewer comments repeatedly, e.g. "String should be lower" and "Waste space".
(n) Integrate: the reviewed patch (integrated patch) is integrated into the project.
    - i=key
    + i_=_dic["KEY"]
Goal: Reduce Similar Reviews Automatically
(1) Submit: the patch author submits a patch to the auto review system.
(2) Review, fix suggestion: the system answers the author, e.g. "A similar patch was fixed in the past like...".
(3) Review request: the patch is then passed to the reviewer.
Approach: Detect Patterns from Reviewed Patch Diffs
Dataset:
    Pre-review patch: i=dic["key"]
    Reviewed patch:   i=dic["KEY"]
Detect:
    Pattern: if a patch has "key", it will be "KEY"
Use:
    The auto review system applies the pattern to a new patch from the patch author:
    print("key") becomes print("KEY")
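The "use" step can be pictured as a plain token replacement. The following is only a minimal sketch, assuming a pattern is stored as a (before, after) pair of strings; `apply_pattern` is a hypothetical helper name and not the authors' implementation.

```python
def apply_pattern(line: str, before: str, after: str) -> str:
    """Return the line with the mined 'before' token replaced by 'after'."""
    # Only rewrite when the pre-review token is actually present.
    if before in line:
        return line.replace(before, after)
    return line


# Pattern taken from the slide: "key" becomes "KEY".
pattern = ('"key"', '"KEY"')
suggestion = apply_pattern('print("key")', *pattern)
print(suggestion)
```

A line that does not contain the pre-review token is returned unchanged, so the system stays silent when no pattern applies.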
Detect Code Improved Pattern (1/2): Divide Patch Diff into Chunks
Patch diff:
    - if i␣==␣0:
    + if i==0:
          break
    - i=dic["key"]
    + i=dic.get("key")
Chunks:
    - i=dic["key"]
    + i=dic.get("key")

    - if i␣==␣0:
    + if i==0:
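One way to picture the chunking step is to group consecutive changed lines of a diff. This is a sketch under the assumption that a chunk is a maximal run of `-`/`+` lines separated by unchanged lines; the slides do not give the exact definition, and `split_into_chunks` is a hypothetical name.

```python
from typing import List


def split_into_chunks(diff_lines: List[str]) -> List[List[str]]:
    """Group consecutive changed lines ('-' or '+') into chunks."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in diff_lines:
        if line.startswith(("-", "+")):
            current.append(line)
        elif current:
            # An unchanged line closes the chunk being built.
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


diff = [
    "- if i == 0:",
    "+ if i==0:",
    "      break",
    '- i=dic["key"]',
    '+ i=dic.get("key")',
]
chunks = split_into_chunks(diff)
for chunk in chunks:
    print(chunk)
```

The unchanged `break` line separates the two edits, so the example diff yields two chunks.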
Detect Code Improved Pattern (2/2): Get Patterns by Sequential Pattern Mining
Chunk:
    - i=dic["key"]
    + i=dic.get("key")
Example patterns:
    Length 2: <i=dic, - [>
    Length 3: <i=dic, - [, + .get(>
    Length 4: <i=dic, - [, + .get(, "key">
    Remaining tokens: - ] and + )
Keep frequently appearing and longer patterns
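The mining step can be illustrated with a naive frequent-subsequence counter. This is a sketch only: the slides do not name the mining algorithm, exhaustive enumeration is exponential in chunk length, and the second chunk, `mine_patterns`, `min_support`, and `max_length` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def mine_patterns(chunks: List[List[str]], min_support: int,
                  max_length: int = 4) -> Dict[Pattern, int]:
    """Count ordered token subsequences and keep the frequent ones."""
    support: Counter = Counter()
    for tokens in chunks:
        seen = set()
        for length in range(2, max_length + 1):
            # combinations() preserves order, so each result is a subsequence.
            for sub in combinations(tokens, length):
                seen.add(sub)
        # Count each pattern at most once per chunk.
        support.update(seen)
    return {p: n for p, n in support.items() if n >= min_support}


# Tokenized chunks; the second one is a made-up example.
chunks = [
    ["i=dic", "- [", "+ .get(", '"key"', "- ]", "+ )"],
    ["j=dic", "- [", "+ .get(", '"name"', "- ]", "+ )"],
]
frequent = mine_patterns(chunks, min_support=2)
longest = max(frequent, key=len)
print(longest, frequent[longest])
```

Only the tokens shared by both chunks survive the support threshold, and preferring the longest surviving pattern mirrors the "frequent and longer" rule on the slide.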
Pattern Evaluation
Example pattern: <i=dic, + .get(, - ], + )>
Appeared time: number of patch pairs in which
    the pre-review patch has: i=dic ]
    the reviewed patch has: .get( )
e.g.)
    Pre-review patch: i=dic["key"]
    Reviewed patch: i=dic.get("key")    Count
    Reviewed patch: i=dic["KEY"]        NOT count
Accuracy: ratio of patch pairs in which
    the reviewed patch has: .get( )
among the patch pairs in which
    the pre-review patch has: i=dic ]
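The two measures can be sketched as follows. This assumes one reading of the slide: "appeared time" counts pairs whose pre-review patch contains the removed side and whose reviewed patch contains the added side, and accuracy divides that count by the number of pairs whose pre-review patch contains the removed side. Substring matching and the names `has_all` and `evaluate` are simplifications introduced here.

```python
from typing import List, Sequence, Tuple

PatchPair = Tuple[str, str]  # (pre-review patch, reviewed patch)


def has_all(text: str, tokens: Sequence[str]) -> bool:
    """True if every token occurs in the text (order is ignored here)."""
    return all(token in text for token in tokens)


def evaluate(pairs: List[PatchPair], removed: Sequence[str],
             added: Sequence[str]) -> Tuple[int, float]:
    """Return (appeared time, accuracy) for one pattern."""
    # Pairs whose pre-review patch contains the pattern's pre-review side.
    candidates = [pair for pair in pairs if has_all(pair[0], removed)]
    # Of those, pairs whose reviewed patch contains the reviewed side.
    matched = [pair for pair in candidates if has_all(pair[1], added)]
    accuracy = len(matched) / len(candidates) if candidates else 0.0
    return len(matched), accuracy


pairs = [
    ('i=dic["key"]', 'i=dic.get("key")'),  # counted
    ('i=dic["key"]', 'i=dic["KEY"]'),      # not counted
]
appeared, accuracy = evaluate(pairs, ["i=dic", "]"], [".get(", ")"])
print(appeared, accuracy)
```

With the two pairs from the slide, one of the two candidate pairs matches, so the pattern appears once with an accuracy of one half.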
Target
Project                          OpenStack
Language                         Python3
Time period                      2011-2016
# Patches                        173,749
# Chunks for detecting patterns  555,050
# Chunks for evaluating patterns 61,673
8 Frequently Appeared Patterns
Thresholds:
    Appeared time > 300
    Accuracy > 10%
Total: 8 patterns
Cover: 32.3% (19,940 / 61,673) of similar patches
Accuracy: 45.9%
Examples:
    - assertEquals() + assertEqual()
    - xrange() + range()
    Why?: Support for Python 2 to 3 changes
    - assertTrue(x in array) + assertIn(x, array)
    Why?: Improve readability
    - self.stub_out() + self.stubs.Set()
    Why?: Support for OpenStack's library dependency changes
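The threshold step and the coverage figure can be reproduced with a few lines. The per-pattern statistics below are invented placeholders, since the slides do not list per-pattern values; only the two thresholds and the 19,940 / 61,673 coverage figures come from the slide. `select_patterns` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (appeared time, accuracy) per pattern; these numbers are made up.
stats: Dict[str, Tuple[int, float]] = {
    "pattern_a": (450, 0.52),
    "pattern_b": (1200, 0.04),
    "pattern_c": (120, 0.80),
}


def select_patterns(stats: Dict[str, Tuple[int, float]],
                    min_appeared: int = 300,
                    min_accuracy: float = 0.10) -> List[str]:
    """Keep patterns above both thresholds from the slide."""
    return [name for name, (appeared, accuracy) in stats.items()
            if appeared > min_appeared and accuracy > min_accuracy]


selected = select_patterns(stats)
coverage = 19_940 / 61_673  # figures reported on the slide
print(selected, f"{coverage:.1%}")
```

A pattern must pass both thresholds: a frequent but rarely adopted pattern and an accurate but rare pattern are both dropped.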
Patterns are discussed on StackOverflow
    - assertEquals() + assertEqual()
    - xrange() + range()
    Why?: Support for Python 2 to 3 changes
    - assertTrue(x in array) + assertIn(x, array)
    Why?: Improve readability
    - self.stub_out() + self.stubs.Set()
    Why?: Support for OpenStack's library dependency changes
For Automatic Code Review: Work as a GitHub Bot
Participants: patch author, bot, reviewer
Bot: "I fixed it"
Reviewer: "OK"
Sample URL: https://github.com/Ikuyadeu/ExtentionTest/pull/9
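A bot of this kind could propose fixes by applying the detected patterns as rewrite rules. The sketch below is an illustration, not the bot behind the sample URL: the regular expressions and the `suggest_fix` name are assumptions, and only three of the patterns from the slides are encoded.

```python
import re

# (regex, replacement) pairs; the regexes are illustrative assumptions.
RULES = [
    (re.compile(r"\bassertEquals\("), "assertEqual("),
    (re.compile(r"\bxrange\("), "range("),
    (re.compile(r"\bassertTrue\((\w+) in (\w+)\)"), r"assertIn(\1, \2)"),
]


def suggest_fix(line: str) -> str:
    """Apply every rewrite rule to one line of a submitted patch."""
    for regex, replacement in RULES:
        line = regex.sub(replacement, line)
    return line


for original in ("self.assertEquals(a, b)",
                 "for i in xrange(10):",
                 "self.assertTrue(x in array)"):
    print(original, "=>", suggest_fix(original))
```

Regex rules like these only handle simple arguments; a production bot would more likely work on tokens or a syntax tree.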
vs Other Tool (1 / 2): Static Analysis Tool
Static analysis tool (pylint): fixes based on the language
    Example: FOO=0, foo_=_0
    Warnings: "Bad name", "Waste space"
    Other tools: ESLint, PMD, Checkstyle
This research: project-specific changes
    self.stub_out() / self.stubs.Set(): old library dependency
    xrange() / range(): language definition
vs Other Tool (2 / 2)
IntelliCode: Choose the best rule set from a large rule set
• Invalid-name
• Bad-continuation
• Wrong-import-order
This study: Find a NEW pattern set from history
• disk2disk_api
• stubs.Set2stub_out
• assert-equals2equal
Support project-specific problems
Support changes of environment
Future Work
• Which pattern should the bot choose: the most frequently appearing pattern or the highest-accuracy pattern?
• Compare with patterns from other projects and languages
• Evaluate by submitting pull requests and measuring the ratio of accepted to submitted pull requests
Summary
📧ueda.yuki.un7@is.naist.jp
