Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by “tweaking” the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) the current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievements with respect to the different tuning parameters implemented in APR systems.
You Cannot Fix What You Cannot Find!
An Investigation of Fault Localization Bias in
Benchmarking Automated Program Repair Systems
Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim,
Jacques Klein and Yves Le Traon
SnT, University of Luxembourg, Luxembourg
Xi’an, China, 12th ICST, April 24, 2019
> Automated Program Repair (APR) Basic Process
[Pipeline diagram] A buggy program, with its passing and failing tests, is first fed to Fault Localization (FL): FL tools produce a ranked list of suspicious code locations. Patch Generation then applies the APR approach to these locations to produce patch candidates. Finally, Patch Validation runs the tests against each candidate: if any test fails, repair continues; once all tests pass, the candidate is retained as a plausible patch.
Across APR tools, the inputs (benchmark bugs) and the patch validation step are the same, and patch generation is each tool's contribution. However, there is no disclosed discussion of the impact of fault localization on APR tool repair performance.
> Repair Performance Assessment of APR Tools
Benchmark: Defects4J

Project | SimFix | jGP | jKali | Nopol | ACS | HDR    | ssFix | ELIXIR | JAID
Chart   |   4    |  0  |   0   |   1   |  2  | -(2)   |   3   |   4    | 2(4)
Closure |   6    |  0  |   0   |   0   |  0  | -(7)   |   2   |   0    | 5(9)
Math    |  14    |  5  |   1   |   1   | 12  | -(7)   |  10   |  12    | 1(7)
Lang    |   9    |  0  |   0   |   3   |  3  | -(6)   |   5   |   8    | 1(5)
Time    |   1    |  0  |   0   |   0   |  1  | -(1)   |   0   |   2    | 0(0)
Total   |  34    |  5  |   1   |   5   | 18  | 13(23) |  20   |  26    | 9(25)

TABLE I. Excerpted from [33], with the caption "Correct patches generated by different techniques".
BIASED: if the fault localization configurations of APR tools differ from one another, this kind of comparison can be biased or misleading.
> Fault Localization Bias in APR
[The same APR pipeline diagram as before, this time highlighting the Fault Localization (FL) step as the source of bias.]
> Research Questions
• RQ1: How do APR systems leverage fault localization techniques?
• RQ2: How many bugs from a common APR benchmark are actually localizable?
• RQ3: To what extent can APR performance vary by adopting different fault localization configurations?
1. State-of-the-art APR systems in the literature add adaptations to the usual FL process to improve its accuracy. Unfortunately, researchers have so far left the contribution of this improvement to the overall repair performance undisclosed, leading to biased comparisons.
2. APR systems do not fully disclose their fault localization tuning parameters, thus preventing reliable replication and comparison.
> Take-away Message
> How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools?
Localizability example (a Defects4J Chart bug in Week.java):

--- a/src/org/jfree/data/time/Week.java
+++ b/src/org/jfree/data/time/Week.java
@@ -173,1 +173,1 @@
 public Week(Date time, TimeZone zone) {
     // defer argument checking...
-    this(time, RegularTimePeriod.DEFAULT_TIME_ZONE, Locale.getDefault());
+    this(time, zone, Locale.getDefault());
 }

Localizability is assessed at three granularities:
File Level
Method Level
Line Level
> How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools?
Number of Defects4J bugs localized with GZoltar + Ochiai:

Project | # Bugs | File (GZ-0.1.1 / GZ-1.6.0) | Method (GZ-0.1.1 / GZ-1.6.0) | Line (GZ-0.1.1 / GZ-1.6.0)
Chart   |   26   |  25 /  25                  |  22 /  24                    |  22 /  24
Closure |  133   | 113 / 128                  |  78 /  96                    |  78 /  95
Lang    |   65   |  54 /  64                  |  32 /  59                    |  29 /  57
Math    |  106   | 101 / 105                  |  92 / 100                    |  91 / 100
Mockito |   38   |  25 /  26                  |  22 /  24                    |  21 /  23
Time    |   27   |  26 /  26                  |  22 /  22                    |  22 /  22
Total   |  395   | 344 / 374                  | 268 / 325                    | 263 / 321
One third of bugs in the Defects4J dataset cannot be localized at line
level by the commonly used automated fault localization tool.
> How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools?
Ranking metric | GZoltar-0.1.1 (File / Method / Line) | GZoltar-1.6.0 (File / Method / Line)

Top-1 position:
Tarantula | 171 / 101 /  45 | 169 / 106 /  35
Ochiai    | 173 / 102 /  45 | 172 / 111 /  38
DStar     | 173 / 102 /  45 | 175 / 114 /  40
Barinel   | 171 / 101 /  45 | 169 / 107 /  36
Opt2      | 175 /  97 /  39 | 179 / 115 /  39
Muse      | 170 /  98 /  40 | 178 / 118 /  41
Jaccard   | 173 / 102 /  45 | 171 / 112 /  39

Top-10 position:
Tarantula | 240 / 180 / 135 | 242 / 189 / 144
Ochiai    | 244 / 184 / 140 | 242 / 191 / 145
DStar     | 245 / 184 / 139 | 242 / 190 / 142
Barinel   | 240 / 180 / 135 | 242 / 190 / 145
Opt2      | 237 / 168 / 128 | 239 / 184 / 135
Muse      | 234 / 169 / 129 | 239 / 186 / 140
Jaccard   | 245 / 184 / 139 | 241 / 188 / 142

Only a small fraction of bugs can be localized at top positions in the ranked list of suspicious locations.
> Impact of Effective Ranking in Fault Localization
APR tools are prone to correctly fix the subset of
Defects4J bugs that can be accurately localized.
1. One third of bugs in the Defects4J dataset cannot be localized by the
commonly used automated fault localization tool.
2. APR tools are prone to correctly fix the subset of Defects4J bugs that
can be accurately localized.
> Take-away Message
> kPAR with four different fault localization configurations
Normal FL: relies on the ranked list of suspicious code locations reported by a given FL tool.
File Assumption: assumes that the faulty code files are known.
Method Assumption: assumes that the faulty methods are known.
Line Assumption: assumes that the faulty code lines are known; no fault localization is then used.

kPAR: a Java implementation of PAR (Kim et al., ICSE 2013), using GZoltar-0.1.1 with Ochiai.
> Repair Performance of kPAR with Four FL Configurations
FL Conf.          | Chart (C) | Closure (Cl) | Lang (L) | Math (M) | Mockito (Moc) | Time (T) | Total
Normal FL         |   3/10    |     5/9      |   1/8    |   7/18   |      1/2      |   1/2    | 18/49
File Assumption   |   4/7     |     6/13     |   1/8    |   7/15   |      2/2      |   2/3    | 22/48
Method Assumption |   4/6     |     7/16     |   1/7    |   7/15   |      2/2      |   2/3    | 23/49
Line Assumption   |   7/8     |    11/16     |   4/9    |   9/16   |      2/2      |   3/4    | 36/55

Number of Defects4J bugs fixed by kPAR with four FL configurations (correctly fixed / plausibly fixed).

Bugs correctly fixed per configuration:
- Normal FL: C-1, 4, 7; Cl-2, 38, 62, 63, 73; L-59; M-15, 33, 58, 70, 75, 85, 89; Moc-38; T-7.
- File Assumption additionally: C-26; Cl-4; Moc-29; T-19.
- Method Assumption additionally: Cl-10.
- Line Assumption additionally: C-8, 14, 19; Cl-31, 38, 40, 70; L-6, 22, 24; M-4, 82; T-26.
With better fault localization results, kPAR can correctly fix more bugs.
> Take-away Message
Accuracy of fault localization has a
direct and substantial impact on the
performance of APR repair pipelines.
> Discussions
1. Full disclosure of FL parameters.
2. Qualification of APR performance.
3. Patch generation step vs. repair pipeline.
4. Sensitivity of the search space.
29. 28
> Summary
Editor's Notes
Good afternoon, everyone. My name is Jacques Klein; I am a professor at the University of Luxembourg.
Today, I am going to present our recent work, entitled "You Cannot Fix What You Cannot Find!".
The main outcome of this work is the demonstration of bias introduced by fault localization in the APR pipeline.
Before starting, let me introduce the authors of this paper.
Now, let’s take a look at the basic process of automated program repair.
The first process is Fault localization that identifies the code positions that are likely to be buggy.
The second one is patch generation: mutating the buggy code to generate patch candidates with the proposed approach.
The last process is to validate whether the generated patch can fix the bug or not.
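The three-step loop just described can be sketched end to end. This is a toy illustration on a "program" represented as a list of integers; `repair`, `localize`, `mutate`, and `run_tests` are hypothetical names invented for this sketch, not the API of any real APR tool.

```python
def run_tests(program, tests):
    """Step 3: patch validation -- each test is an (index, expected) pair."""
    return all(program[i] == expected for i, expected in tests)

def localize(program, tests):
    """Step 1: toy fault localization -- rank positions failing a test first."""
    failing = [i for i, expected in tests if program[i] != expected]
    passing = [i for i in range(len(program)) if i not in failing]
    return failing + passing  # ranked list of suspicious locations

def mutate(program, pos):
    """Step 2: patch generation -- try a few candidate values at `pos`."""
    for value in range(3):
        candidate = list(program)
        candidate[pos] = value
        yield candidate

def repair(program, tests):
    for pos in localize(program, tests):        # most suspicious first
        for candidate in mutate(program, pos):  # mutate one location at a time
            if run_tests(candidate, tests):     # re-run the whole suite
                return candidate                # first plausible patch wins
    return None  # no plausible patch found

buggy = [1, 9, 2]                  # position 1 should hold 0
tests = [(0, 1), (1, 0), (2, 2)]
print(repair(buggy, tests))        # [1, 0, 2]
```

Because the first plausible patch wins, the order of the ranked list produced by the FL step directly shapes what the repair loop can reach, which is the bias examined in this talk.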
Then, how do we evaluate the repair performance of APR tools?
To assess the repair performance of automated program repair tools,
researchers propose to compare APR tools, on a dataset with a limited number of bugs, by the number of bugs fixed with plausible patches and with correct patches, respectively.
For example, this table is excerpted from the SimFix paper; it shows the number of Defects4J bugs correctly fixed by different APR tools.
OK, here is a question: is this kind of comparison fair enough?
OK, let’s go back to the APR basic process.
The inputs of the APR tools are the same, since they come from the same benchmark, Defects4J.
The last process is patch validation, which is conducted through regression testing, so it is also the same across APR tools.
The third process is patch generation, the key contribution of each APR tool, which determines how patches are generated for bugs.
This third process is highly dependent on the fault localization process, which identifies which code should be modified by the APR tool to generate patches.
However, there is no disclosed discussion of the impact of fault localization on APR tool repair performance.
If the fault localization of APR tools differs from tool to tool, this kind of comparison could be biased or misleading.
Why could the proposed comparison be biased if APR tools use different fault localization tools?
Now, let’s take a look at the fault localization techniques used in automated program repair for Java programs.
In the state-of-the-art APR tools for Java programs, the widely used fault localization technique is spectrum-based fault localization, such as the GZoltar framework.
Spectrum-based fault localization relies on the ranking metric to calculate the suspiciousness value of each code statement.
Here, we list three ranking metrics, Ochiai, Tarantula, and Barinel, used to calculate the suspiciousness value of each statement in the buggy program.
In these formulas, the lower-case letter s denotes one statement; failed(s) denotes the number of failed test cases that execute this statement;
passed(s) denotes the number of passed test cases that execute this statement;
failed(¬s) denotes the number of failed test cases that do not execute this statement;
and passed(¬s) denotes the number of passed test cases that do not execute this statement.
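The three ranking metrics named above can be written out as functions of the failed(s)/passed(s) counts just defined. These are the common textbook formulations of Tarantula, Ochiai, and Barinel, not GZoltar's own implementation; the function names and the zero-division guards are my own.

```python
import math

# failed_s / passed_s: failing / passing tests that execute statement s;
# total_failed / total_passed: totals over the whole test suite (assumed > 0).

def tarantula(failed_s, passed_s, total_failed, total_passed):
    f = failed_s / total_failed
    p = passed_s / total_passed
    return f / (f + p) if f + p > 0 else 0.0

def ochiai(failed_s, passed_s, total_failed, total_passed):
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom > 0 else 0.0

def barinel(failed_s, passed_s, total_failed, total_passed):
    executed = failed_s + passed_s
    return 1 - passed_s / executed if executed > 0 else 0.0

# A statement covered by 2 of 2 failing tests and 1 of 8 passing tests:
print(ochiai(2, 1, 2, 8))  # 2 / sqrt(2 * 3), about 0.816
```

Each metric rewards statements executed mostly by failing tests; the differences lie only in how the counts are weighted, which is why the ranked lists they produce differ.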
OK, with this kind of spectrum-based fault localization, the suspicious code positions of a buggy program are reported in this way.
The first column lists the suspicious class name, the second column lists the suspicious line, and the last column lists the suspiciousness value.
APR tools try to mutate each suspicious line one by one until the first plausible patch is generated.
So, this list of suspicious code positions can affect the repair performance of APR tools.
So, if their fault localization techniques differ, APR tools can show different repair performance even if they use the same patch generation technique.
In this work, we investigate the fault localization bias in automated program repair systems with these three research questions.
To answer RQ1: We first investigate FL techniques used in APR systems in the literature.
This reveals which FL tool and formula are integrated for each APR system.
This table illustrates the fault localization techniques used by APR tools.
Most APR tools use the GZoltar framework (version 0.1.1) with the Ochiai ranking metric to identify bug positions.
Unspecified or unconfirmed information about an APR tool is marked with '?'. If an APR tool does not use any supplementary information for FL, the corresponding table cell is marked with '∅'.
Furthermore, HDRepair assumes that the faulty method is known to improve fault localization accuracy.
ACS leverages predicate switching, ssFix uses statements in crashed stack trace, and SimFix leverages test case purification to improve the fault localization accuracy.
Now, let’s take a look at the second research question.
To answer the second research question, we define the localizability with three levels: file, method, and line level.
File level: we consider that the faulty code is accurately localized if an FL tool reports at least one line from the buggy code file as suspicious.
Method Level: we consider that the faulty code is accurately localized if at least one code line in the buggy method is reported by an FL tool as suspicious.
Line level: we consider that the faulty code is accurately localized if suspicious code lines reported by an FL tool contain any of the buggy code lines.
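The three definitions above can be expressed as a single check. The (file, method, line) triple representation and the function name `localizable` are illustrative assumptions for this sketch, not the paper's implementation.

```python
def localizable(suspicious, buggy, level):
    """Is the bug localized at the given granularity?

    `suspicious` is the FL tool's reported list, `buggy` the ground-truth
    bug positions; both hold (file, method, line) triples.
    """
    for s_file, s_method, s_line in suspicious:
        for b_file, b_method, b_line in buggy:
            if level == "file" and s_file == b_file:
                return True
            if level == "method" and (s_file, s_method) == (b_file, b_method):
                return True
            if level == "line" and (s_file, s_method, s_line) == (b_file, b_method, b_line):
                return True
    return False

buggy = [("Week.java", "Week", 175)]
reported = [("Week.java", "Week", 173), ("Day.java", "Day", 10)]
print(localizable(reported, buggy, "file"))    # True: right file reported
print(localizable(reported, buggy, "method"))  # True: right method reported
print(localizable(reported, buggy, "line"))    # False: exact line missed
```

The example makes the hierarchy concrete: a bug can count as localized at file and method level while still being missed at line level, which is exactly the gap the next table quantifies.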
This table shows the number of bugs that can be localized at the three granularities with two versions of GZoltar.
We note that some bugs cannot be localized by GZoltar at all.
APR tools repair bugs by mutating the suspicious statements one by one, from the highest suspiciousness value to the lowest.
If we only consider the suspicious statements with the highest suspiciousness value at line granularity, fewer than 50 bugs can be localized.
If we enlarge the search space of suspicious statements to the top 10 at line granularity, only one third of the bugs can be localized.
To understand the effects of fault localization on automated program repair, we further investigate the localization ranking positions of fixed and unfixed bugs.
We measure Reciprocal = 1 / bugpos, where bugpos refers to the position of the actual bug location in the ranked list of suspicious locations reported by the FL step.
If the bug location can be found higher in the ranked list, the value of Reciprocal is closer to 1, meaning the exact buggy statements received higher suspiciousness values.
This figure shows that most fixed bugs are fixed because they can be localized more accurately than the unfixed bugs.
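The Reciprocal metric just defined is a one-liner. This sketch assumes a convention of returning 0 when the bug location is absent from the ranked list; the slides do not specify that case.

```python
def reciprocal(suspicious_ranking, bug_location):
    """Reciprocal = 1 / bugpos; 0 if the bug never appears in the ranking."""
    try:
        # index() is 0-based, bugpos is 1-based.
        return 1.0 / (suspicious_ranking.index(bug_location) + 1)
    except ValueError:
        return 0.0

print(reciprocal(["a", "b", "c"], "a"))  # 1.0: bug ranked first
print(reciprocal(["a", "b", "c"], "c"))  # about 0.333: bug ranked third
```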
To what extent can APR performance vary by adopting different fault localization configurations?
To answer this question, we implemented kPAR, a Java implementation of PAR, with four different fault localization configurations.
File Assumption: it means that only the suspicious statements in the buggy code file are considered as the input of APR tools to generate patch candidates.
Method Assumption: it means that only the suspicious statements in the buggy methods are considered as the input of APR tools.
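The four configurations can be sketched as filters over the FL output. The (file, method, line) tuples and the name `apply_assumption` are illustrative assumptions, not kPAR's actual code.

```python
def apply_assumption(ranked_locations, buggy_locations, assumption):
    """Filter the ranked FL output against known ground-truth bug positions."""
    if assumption == "file":
        # Keep only suspicious statements in the known buggy files.
        keep = {f for f, _, _ in buggy_locations}
        return [loc for loc in ranked_locations if loc[0] in keep]
    if assumption == "method":
        # Keep only suspicious statements in the known buggy methods.
        keep = {(f, m) for f, m, _ in buggy_locations}
        return [loc for loc in ranked_locations if (loc[0], loc[1]) in keep]
    if assumption == "line":
        # Exact buggy lines are known: no fault localization is used.
        return list(buggy_locations)
    return list(ranked_locations)  # Normal FL: ranked list used as-is

ranked = [("A.java", "m1", 5), ("B.java", "m2", 9), ("A.java", "m3", 7)]
buggy = [("A.java", "m1", 5)]
print(apply_assumption(ranked, buggy, "file"))    # both A.java entries survive
print(apply_assumption(ranked, buggy, "method"))  # only ("A.java", "m1", 5)
```

Each stronger assumption shrinks the search space the patch generator must explore, which is why the table below shows kPAR fixing more bugs as the assumption tightens.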
In this table, each cell contains two numbers: the left one is the number of bugs correctly fixed by kPAR, and the right one is the number of bugs plausibly fixed by kPAR.
According to the results shown in this table, we note that, with better fault localization results, kPAR can correctly fix more bugs.
Additionally, the bugs correctly fixed under a less accurate fault localization configuration can also be correctly fixed under a more accurate one.
1. Given that many APR systems do not release their source code, it is important that experimental reports clearly indicate the protocol used for fault localization. (Preferably, authors should strive to assess their performance under a standard and replicable configuration of fault localization.)
2. To ensure that novel approaches to APR are indeed improving over the state-of-the-art, authors must qualify the performance gain brought by the different ingredients of their approaches.
3. There are two distinct directions of repair benchmarking that APR researchers should consider. In the first, a novel contribution to the patch generation problem must be assessed directly by assuming a perfect fault localization. In the second, for ensuring realistic assessment w.r.t. industry adoption, the full pipeline should be tested with no assumptions on fault localization accuracy.
4. Given that fault localization produces a ranked list of suspicious locations, it is essential to characterize whether exact locations are strictly necessary for the APR approach to generate the correct patches. For example, an APR system may not focus only on a suspected location but on the context around this location. APR approaches may also use heuristics to curate the FL results.
Voilà.
To conclude, I repeat the key message: to avoid proposing biased assessments of the APR pipeline, take care of the FL step.