Mining Source Code Improvement Patterns from Similar Code Review Works
Yuki Ueda¹, Takashi Ishio¹, Akinori Ihara², Kenichi Matsumoto¹
¹ Nara Institute of Science and Technology
² Wakayama University
13th International Workshop on Software Clones (IWSC’19)
Background Approach Result Summary
Contents
• Goal: Reduce code review cost
• Approach: Detect code improvement patterns that appeared in code reviews
• Evaluation: Measure the patterns' frequency and accuracy
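Problem:
Reviewers need to check a patch several times
Figure: review workflow between the patch author, the reviewer, and the project.
(1) Submit: the patch author submits a patch
(2) to (n) Review, Fix suggestion: the reviewer reviews the patch and suggests fixes
(n) Integrate: the reviewed patch is integrated into the project
Pre-Review Patch (Initial Patch):
- i=key
+ i_=_dic["KEY"]
Reviewer comments: "String should be lower", "Waste space"
Reviewed Patch (Integrated Patch):
- i=key
+ i=dic["key"]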
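Goal:
Reduce Similar Reviews Automatically
Figure: review workflow with an Auto Review System between the patch author and the reviewer.
(1) Submit: the patch author submits a patch
(2) Review, Fix suggestion: the Auto Review System answers "Similar patch is fixed in the past like.."
(3) Review request: the patch is passed on to the reviewer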
Approach:
Detect Pattern from Reviewed Patch Diff
Dataset (pairs of patches):
Pre-Review Patch: i=dic["key"]
Reviewed Patch: i=dic["KEY"]
Detect
Pattern:
If a patch has "key", it will be "KEY"
Use
Pattern: If a patch has "key", it will be "KEY"
The patch author submits print("key"); the Auto Review System uses the pattern and suggests print("KEY")
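
The "Use" step can be sketched as a token replacement on a newly submitted line. This is only an illustration, not the authors' tool: the function name `suggest_fix` and its parameters are hypothetical, and it assumes a single line of syntactically complete Python.

```python
import io
import tokenize


def suggest_fix(line, removed, added):
    """Replace each token equal to `removed` with `added` in one line of code."""
    tokens = list(tokenize.generate_tokens(io.StringIO(line).readline))
    fixed = line
    # Walk right to left so earlier column offsets stay valid after a replacement.
    for tok in reversed(tokens):
        if tok.string == removed:
            fixed = fixed[:tok.start[1]] + added + fixed[tok.end[1]:]
    return fixed


# Pattern from the slide: if a patch has "key", it will be "KEY".
suggestion = suggest_fix('print("key")', '"key"', '"KEY"')
print(suggestion)
```

Matching on tokens rather than raw text means that the same characters inside a comment or a longer string literal are left alone.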
Detect Code Improved Pattern (1/2):
Divide Patch Diff into Chunks
Patch diff:
- if i␣==␣0:
+ if i==0:
  break
- i=dic["key"]
+ i=dic.get("key")
Chunk:
- i=dic["key"]
+ i=dic.get("key")
Chunk:
- if i␣==␣0:
+ if i==0:
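
A minimal sketch of the chunk division, assuming the diff is given as a list of lines with file headers already stripped, and that a chunk is a run of consecutive "-"/"+" lines ended by an unchanged line. The name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(diff_lines):
    """Group consecutive removed/added lines of a diff into chunks."""
    chunks, current = [], []
    for line in diff_lines:
        if line.startswith(("-", "+")):
            current.append(line)
        elif current:
            # An unchanged (context) line closes the current chunk.
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


diff = [
    "- if i == 0:",
    "+ if i==0:",
    "  break",
    '- i=dic["key"]',
    '+ i=dic.get("key")',
]
chunks = split_into_chunks(diff)
for chunk in chunks:
    print(chunk)
```

For the diff on the slide this yields two chunks, one for the `if` line and one for the `dic` line.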
Detect Code Improved Pattern (2/2):
Get Patterns by Sequential Pattern Mining
Chunk:
- i=dic["key"]
+ i=dic.get("key")
Candidate patterns ("-" marks a removed token, "+" an added token):
Length 2: i=dic - [
Length 2: i=dic - ]
Length 2: i=dic + )
Length 3: i=dic - [ + .get(
Length 4: i=dic - [ + .get( "key"
Keep Frequently Appearing and Longer Patterns
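
The idea of keeping frequent and longer patterns can be illustrated with a brute-force sketch. It is not the mining algorithm used in the study: it simply enumerates ordered token subsequences per chunk, counts in how many chunks each appears, and drops a pattern when a longer one with the same count contains it. The function names, `min_support`, `max_len`, and the example chunks are all assumptions.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(short, long):
    """True if the tokens of `short` appear in `long` in the same order."""
    it = iter(long)
    return all(tok in it for tok in short)


def mine_patterns(chunks, min_support, max_len=4):
    """Return frequent token subsequences, preferring longer ones."""
    support = Counter()
    for tokens in chunks:
        seen = set()
        for n in range(2, max_len + 1):
            seen.update(combinations(tokens, n))  # order-preserving subsequences
        support.update(seen)  # count each pattern once per chunk
    frequent = {p: c for p, c in support.items() if c >= min_support}
    kept = {}
    for p, c in frequent.items():
        covered = any(
            len(q) > len(p) and frequent[q] == c and is_subsequence(p, q)
            for q in frequent
        )
        if not covered:
            kept[p] = c
    return kept


# Hypothetical tokenized chunks, invented for illustration.
chunks = [
    ["i=dic", "- [", "+ .get(", '"key"', "- ]", "+ )"],
    ["j=dic", "- [", "+ .get(", '"name"', "- ]", "+ )"],
    ["i=dic", "- [", "+ .get(", '"id"', "- ]", "+ )"],
]
patterns = mine_patterns(chunks, min_support=3)
print(patterns)
```

Enumerating all subsequences grows quickly with chunk length, so a real implementation would use a dedicated sequential pattern mining algorithm instead.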
Pattern Evaluation
e.g. Pattern: i=dic + .get( - ] + )
Appeared Time: Number of Patch Pairs in which
  the Pre-Reviewed Patch has: i=dic ]
  the Reviewed Patch has: .get( )
e.g.) Appeared Time for the Pattern i=dic + .get( - ] + ):
Pre-Reviewed Patch: i=dic["key"]
Reviewed Patch: i=dic.get("key") (Count)
Reviewed Patch: i=dic["KEY"] (NOT Count)
Accuracy: Ratio of Patch Pairs in which
  the Pre-Reviewed Patch has: i=dic ]
  the Reviewed Patch has: .get( )
among the Patch Pairs in which the Pre-Reviewed Patch has: i=dic ]
(e.g. Pattern: i=dic + .get( - ] + ))
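
The two measures can be sketched as follows. This is a simplified reading of the slides, not the authors' evaluation script: tokens are matched as plain substrings, accuracy is assumed to be taken over the patch pairs whose pre-reviewed patch contains the pattern's unchanged and removed tokens, and `evaluate_pattern` is a hypothetical name.

```python
def evaluate_pattern(patch_pairs, before_tokens, after_tokens):
    """Return (appeared_time, accuracy) for one pattern.

    patch_pairs: list of (pre_reviewed_text, reviewed_text) tuples.
    """
    def has_all(text, tokens):
        return all(tok in text for tok in tokens)

    # Pairs whose pre-reviewed patch contains the "before" side of the pattern.
    candidates = [pair for pair in patch_pairs if has_all(pair[0], before_tokens)]
    # Of those, pairs whose reviewed patch contains the added tokens.
    appeared = sum(1 for _, reviewed in candidates if has_all(reviewed, after_tokens))
    accuracy = appeared / len(candidates) if candidates else 0.0
    return appeared, accuracy


pairs = [
    ('i=dic["key"]', 'i=dic.get("key")'),  # Count
    ('i=dic["key"]', 'i=dic["KEY"]'),      # NOT Count
]
appeared, accuracy = evaluate_pattern(pairs, ["i=dic", "]"], [".get(", ")"])
print(appeared, accuracy)
```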
Target
Project                          OpenStack
Language                         Python3
Time Period                      2011-2016
# Patches                        173,749
# Chunks for Detecting Patterns  555,050
# Chunks for Evaluating Patterns 61,673
8 Frequently Appeared Patterns
- assertEquals() + assertEqual()
- xrange() + range()
Why?: Support for Python 2 to 3 changes
- assertTrue(x in array) + assertIn(x, array)
Why?: Improve readability
- self.stubs.Set() + self.stub_out()
Why?: Support for OpenStack's library dependency changes
Thresholds:
Appeared time > 300
Accuracy > 10%
Total 8 patterns
Cover: 32.3% (19,940 / 61,673) similar patches
Accuracy: 45.9%
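
The thresholds and the coverage figure can be sketched as below. The per-pattern statistics are invented for illustration and are not results from the study; `select_patterns` is a hypothetical helper. Only the counts 19,940 and 61,673 come from the slide.

```python
def select_patterns(stats, min_appeared=300, min_accuracy=0.10):
    """Keep patterns whose appeared time and accuracy both exceed the thresholds."""
    return [
        name
        for name, (appeared, accuracy) in stats.items()
        if appeared > min_appeared and accuracy > min_accuracy
    ]


# Hypothetical statistics: name -> (appeared time, accuracy).
stats = {
    "pattern_a": (450, 0.52),
    "pattern_b": (1200, 0.04),  # frequent but rarely the actual fix
    "pattern_c": (120, 0.80),   # accurate but too rare
}
selected = select_patterns(stats)

# Coverage as reported on the slide: covered chunks / evaluation chunks.
coverage = 19_940 / 61_673
print(selected, f"{coverage:.1%}")
```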
Patterns are discussed on Stack Overflow
- assertEquals() + assertEqual()
- xrange() + range()
Why?: Support for Python 2 to 3 changes
- assertTrue(x in array) + assertIn(x, array)
Why?: Improve readability
- self.stubs.Set() + self.stub_out()
Why?: Support for OpenStack's library dependency changes
For Automatic Code Review:
Work as a GitHub Bot
Figure: pull request conversation between the bot, the patch author, and the reviewer.
Bot (to the patch author): "I fixed it"
Reviewer: "OK"
Sample URL: https://github.com/Ikuyadeu/ExtentionTest/pull/9
vs Other Tool (1 / 2)
Static Analysis Tool (pylint): Fix based on Language
e.g. FOO=0, foo_=_0: "Bad name", "Waste space"
Other tools: ESLint, PMD, Checkstyle
This research: Project-specific changes
- self.stubs.Set() + self.stub_out(): Old library dependency
- xrange() + range(): Language definition
vs Other Tool (2 / 2)
IntelliCode: Choose the best rule set from a large rule set
• Invalid-name
• Bad-continuation
• Wrong-import-order
This Study: Find a NEW pattern set from history
• disk2disk_api
• stubs.Set2stub_out
• assert-equals2equal
Support project-specific problems
Support changes of environment
Future Work
• Which pattern should the bot choose?
The most frequently appearing pattern, or the pattern with the highest accuracy
• Compare with the patterns of other projects and languages
• Evaluate by submitting pull requests and measuring the ratio of
accepted to submitted pull requests