1. General Summary of Program Repair, and Semantic Repair
Abhik Roychoudhury
National University of Singapore
Dagstuhl seminar, 2017
2. Bug Fixing
o Most software has many bugs.
o Security-related bugs should be fixed before malicious users exploit them.
o Bugs often remain unfixed months after they are reported.
o E.g. Bug 18665 of glibc
• Reported and responded to in July 2015
• Patched in Feb 2016
• CVSS score: 8.1 / 10 (buffer overflow)
o “Thanks for the bug report. Do you have a test case that triggers this scenario? Do you have a patch or suggested fix?”
3. (Why) Program Repair
1. “Patches as better bug reports” [Weimer 2006].
2. Automating the simple one-line fixes as patch suggestions
• Work with companies with commercial testing tools.
• Automating targeted repair techniques with template fixes, e.g. for overflows.
3. Grading and understanding of programming assignments
• … if only the education business takes off
4. …
Note: 2 & 3 are very different businesses.
4. DARPA CGC
A team of hackers won $2 million by building a machine that could hack better than they could
Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
DARPA Cyber Grand Challenge -> Automation of Security [detecting and fixing vulnerabilities automatically]
5. (Troubles with) Repair
• Weak description of intended behavior / correctness criterion e.g. tests
• Possibility to use “Bugs as deviant behavior” philosophy
• Weak applicability of repair techniques e.g. only overflow errors
• Large search space of candidate patches for general-purpose repair tools.
• Patch suggestions and Interactive Repair
6. Correctness Criterion
• Assertions or Specifications
o May be suitable for targeted repair e.g. access control policy
• Bugs as deviant behavior
o A property which is rarely violated – dynamic invariants!
o Make sure that it is never violated [Clearview paper, SOSP 2009]
• Test-driven repair
o Repair based on test cases, to pass them.
o Most works we talk about use this criterion.
o Brings us to issues like strength of test oracle, quality of test-suite …
7. Large search space – syntax directed view
1. Where to fix – in which line?
2. Generate the candidate patches in this line.
3. Validate the candidate patches.
8. Large search space – semantic view
1. Where to fix – in which line?
2. What values should be returned by these lines? <inp=1, ret=0>
3. What are the expressions which will return these values?
9. High level view
[Figure: Test input, Program, Concrete Execution (concrete values), Symbolic execution, Expected output of program. Output: Value-set or Constraint]
10. General purpose repair
• … given a test-suite [Conceptual characterization]
o Generate-and-test patches (GenProg)
o Specification inference and patch synthesis
• Infer specification or properties about the patch to be synthesized.
• Meet the specification by enumeration, or by solving constraints.
• Various works – SemFix, Nopol, SPR, …
o Ordering of search space of patches
• Use minimality to prioritize the search space.
• Use learning approaches to prioritize the search space.
o Patch templates can be learnt from human fixes.
11. General purpose repair
• … given a test-suite [Technical characterization]
o Generate-and-test patches (heuristic search)
• Use a well-known search framework GP for program repair
o Specification inference and patch synthesis
• Infer specification or properties about the patch to be synthesized.
• Meet the specification by searching in a space, or by solving constraints.
• Develop a customized search algorithm for each of the repair sub-problems, or use symbolic execution to infer specifications about the patch.
o Embed a patch quality criterion in repair.
• Use minimality to prioritize the search space.
• Patch templates can be learnt from human fixes, or favor small fixes.
• Machine learning is used to re-order the search space.
12. Specification Inference
• Infer specification or properties about the patch to be synthesized.
o Meet the specification by searching in a space, or by solving constraints.
o Develop a customized search algorithm for each of the repair sub-problems, or use symbolic execution to infer specifications about the patch.
1. Where to fix – in which line?
2. What values should be returned by these lines? <inp=1, ret=0>
3. What are the expressions which will return these values?
a. Enumerate values within a restricted domain e.g. T/F values for conditions [SPR]
b. Use symbolic exec. to get sample values. [Angelix]
c. Use symbolic exec. to infer all possible values as constraint. [SemFix]
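Option (a) can be illustrated with a minimal sketch. It assumes a hypothetical function classify_with whose single suspicious condition can be overridden from outside; the names classify_with, test_case and find_condition_value are made up for this illustration and are not taken from SPR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program under repair: the condition below is suspected
 * to be wrong. forced < 0 keeps the original condition; forced == 0 or
 * forced == 1 overrides its outcome. */
int classify_with(int x, int forced)
{
    int cond = (forced < 0) ? (x > 10) : forced; /* suspicious condition */
    return cond ? 1 : 0;
}

struct test_case { int input; int expected; };

/* For one test, enumerate the restricted domain {0, 1} of condition
 * outcomes and return the first outcome that makes the test pass,
 * or -1 if neither outcome does. */
int find_condition_value(const struct test_case *t)
{
    assert(t != NULL);
    for (int v = 0; v <= 1; v++) {
        if (classify_with(t->input, v) == t->expected)
            return v;
    }
    return -1;
}
```

The outcomes collected per test then act as the specification that a synthesized condition has to meet.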
13. Interactive Repair
RQ1: Can users help the tool to improve the accuracy of the fix localization process?
RQ2: Can users help the tool to quickly and effectively find a correct patch?
● Interactive Fault Localization Using Test Information
○ Recommend checking points or breakpoints
○ Patch suggestions at or around break-points
● Iterative Bug Isolation
14. Interactive Repair
if( a || b)
Branch is never executed (lines 2, 3)
void getLargest(int a, int b, int c){
  if( a > b && b > a)      // Branch is never executed
    printf("%d", b);
  else if( b >= a && b >= c )
    printf("%d", b);
  else if( c >= a && c >= b )
    printf("%d", c);
}
Suggested patches:
• Change condition to a > b && a > c
• Remove b > a
• Remove branch
Automatic breakpoint insertion
Anti-patterns as fault explanation in natural language
• a > b && b > a is a trivially false condition
Multiple buggy locations
15. Interactive Repair
• Iterative Bug Isolation
if( a || b)
Expected c but got b (line 3)
void getLargest(int a, int b, int c){
  if( a > b && a > c)
    printf("%d", b);       // Expected c but got b
  else if( b >= a && b >= c )
    printf("%d", b);
  else if( c >= a && c >= b )
    printf("%d", c);
}
• Change b to a
Interactive & iterative fault localization
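Applying both suggestions from the two iterations (change the condition to a > b && a > c, then change b to a) gives the version sketched below. As an assumption made only for testability, the function returns the value instead of printing it, and the name get_largest is illustrative.

```c
#include <assert.h>

/* Largest of a, b, c after both suggested patches are applied. */
int get_largest(int a, int b, int c)
{
    if (a > b && a > c)           /* was: a > b && b > a (never true) */
        return a;                 /* was: b */
    else if (b >= a && b >= c)
        return b;
    else
        return c;                 /* remaining case: c >= a && c >= b */
}

/* Small self-check in the spirit of the tests that drive the tool. */
void check_get_largest(void)
{
    assert(get_largest(3, 2, 1) == 3);
    assert(get_largest(1, 3, 2) == 3);
    assert(get_largest(1, 2, 3) == 3);
}
```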
16. Syntax and semantics based
Syntax-based Schematic
for e ∈ SearchSpace do
  validate e
done
Semantics-based Schematic
for partition π : ∃α. π(α) do
  synthesize e
done
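The syntax-based schematic can be sketched as a plain loop over candidate expressions. This is a minimal sketch, not any tool's implementation: the program and tests are borrowed from the is_upward example later in the deck, and the names candidate_fn, run_patched, validate and search are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct test_case { int inhibit, up_sep, down_sep, expected; };

/* A candidate patch: an expression that replaces the suspicious one. */
typedef int (*candidate_fn)(int inhibit, int up_sep, int down_sep);

int cand_down_sep(int i, int u, int d)    { (void)i; (void)u; return d; }
int cand_up_sep(int i, int u, int d)      { (void)i; (void)d; return u; }
int cand_up_plus_100(int i, int u, int d) { (void)i; (void)d; return u + 100; }

/* The program under repair with the suspicious expression replaced by e. */
int run_patched(candidate_fn e, const struct test_case *t)
{
    int bias = t->inhibit ? e(t->inhibit, t->up_sep, t->down_sep) : t->up_sep;
    return bias > t->down_sep;
}

/* validate e: does the patched program pass every test? */
int validate(candidate_fn e, const struct test_case *tests, size_t n)
{
    assert(e != NULL && (tests != NULL || n == 0));
    for (size_t k = 0; k < n; k++)
        if (run_patched(e, &tests[k]) != tests[k].expected)
            return 0;
    return 1;
}

/* for e in SearchSpace do validate e done.
 * Returns the index of the first valid candidate, or -1 if none is valid. */
int search(const candidate_fn *space, size_t m,
           const struct test_case *tests, size_t n)
{
    for (size_t j = 0; j < m; j++)
        if (validate(space[j], tests, n))
            return (int)j; /* the loop can stop at the first valid patch */
    return -1;
}
```

Each candidate costs one run of the whole test suite, so the cost grows with the size of the search space.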
17. Comparison
Syntax-based Schematic
for e ∈ SearchSpace do
  validate e // break if possible
done
Semantics-based Schematic
for partition π : ∃α. π(α) do
  synthesize e // cannot break
done
Syntax-based Schematic
for e ∈ SearchSpace do // long loop
  validate e
done
Semantics-based Schematic
for partition π : ∃α. π(α) do
  // efficient grouping
  synthesize e
done
18. Expand the schematic
Semantics-based Schematic
for partition π : ∃α. π(α) do
  synthesize e
done
Semantics-based Schematic
for path π : ∃α. π(α) do
  synthesize e
done
Semantics-based Schematic
for each path do
  Get repair constraint and
  solve to construct e
done
Semantics-based Schematic
Get repair constraint from tests;
conjoin repair constraint from each test.
19. Conjure up a function
Buggy Program
…
var = a + b - c;   (marked x)
Failing test input → Concrete Execution → Symbolic Execution with x as the only unknown → Path conditions, Output Expressions
x = f(Live Vars)
Get properties of function f via symbolic execution.
Construct a function f which satisfies these properties!
20. Example
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        0       100       0                0                pass
1        11      110       0                1                fail
0        100     50        1                1                pass
1        -20     60        0                1                fail
0        0       10        0                0                pass
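The Result column can be reproduced by running the program on the five tests. The sketch below is a minimal harness under assumed names (test_case, TESTS, count_failing); the patched variant uses the expression up_sep + 100 given in the comment on line 4.

```c
#include <assert.h>
#include <stddef.h>

struct test_case { int inhibit, up_sep, down_sep, expected; };

/* The five tests from the table. */
const struct test_case TESTS[] = {
    {1,   0, 100, 0},
    {1,  11, 110, 1},
    {0, 100,  50, 1},
    {1, -20,  60, 1},
    {0,   0,  10, 0},
};
const size_t NUM_TESTS = sizeof TESTS / sizeof TESTS[0];

/* The program as shown on the slide (buggy assignment on line 4). */
int is_upward_buggy(int inhibit, int up_sep, int down_sep)
{
    int bias;
    if (inhibit)
        bias = down_sep;
    else
        bias = up_sep;
    return bias > down_sep;
}

/* Same program with line 4 replaced by bias = up_sep + 100. */
int is_upward_patched(int inhibit, int up_sep, int down_sep)
{
    int bias;
    if (inhibit)
        bias = up_sep + 100;
    else
        bias = up_sep;
    return bias > down_sep;
}

/* Number of tests whose observed output differs from the expected one. */
size_t count_failing(int (*prog)(int, int, int))
{
    assert(prog != NULL);
    size_t failing = 0;
    for (size_t k = 0; k < NUM_TESTS; k++) {
        const struct test_case *t = &TESTS[k];
        if (prog(t->inhibit, t->up_sep, t->down_sep) != t->expected)
            failing++;
    }
    return failing;
}
```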
21. Repair Constraint
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        11      110       0                1                fail
Line 4: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = true
Line 7: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X > 110
Line 8: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X ≤ 110
22. Repair Constraint
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = f(inhibit, up_sep, down_sep);
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Input: inhibit == 1, up_sep == 11, down_sep == 110
Symbolic Execution
f(1,11,110) > 110
23. Function synthesis
• Instead of solving
• Select primitive components to be used by the synthesized program based on complexity
• Look for a program that uses only these primitive components and satisfies the repair constraint
o Done via another constraint solving problem – pgm. synthesis
• Solving the repair constraint is the key, not how it is solved
• Enumerate expressions over a given set of components / operators
o Enforce axioms of the operators
o If candidate repair contains a constant, solve using SMT
Repair Constraint:
f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
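A minimal sketch of enumerating expressions against this repair constraint follows. It assumes a deliberately tiny component set (one variable plus one constant), and it replaces the SMT query for the constant with a bounded brute-force search; all names (conjunct, satisfies, synthesize) are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum var { VAR_INHIBIT, VAR_UP_SEP, VAR_DOWN_SEP, VAR_COUNT };

/* One conjunct: f(args) > bound if greater is set, else f(args) <= bound. */
struct conjunct {
    int args[VAR_COUNT];
    int greater;
    int bound;
};

const struct conjunct REPAIR_CONSTRAINT[] = {
    { {1,  11, 110}, 1, 110 },  /* f(1,11,110) > 110 */
    { {1,   0, 100}, 0, 100 },  /* f(1,0,100) <= 100 */
    { {1, -20,  60}, 1,  60 },  /* f(1,-20,60) > 60  */
};
const size_t NUM_CONJUNCTS =
    sizeof REPAIR_CONSTRAINT / sizeof REPAIR_CONSTRAINT[0];

/* Does the candidate f = v + c satisfy every conjunct? */
int satisfies(enum var v, int c)
{
    for (size_t k = 0; k < NUM_CONJUNCTS; k++) {
        const struct conjunct *q = &REPAIR_CONSTRAINT[k];
        int value = q->args[v] + c;
        int ok = q->greater ? (value > q->bound) : (value <= q->bound);
        if (!ok)
            return 0;
    }
    return 1;
}

/* Enumerate candidates v + c with c in [-limit, limit] (limit is assumed
 * small enough that v + c cannot overflow). Returns 1 and stores the
 * first solution found, or returns 0 if there is none in range. */
int synthesize(int limit, enum var *out_var, int *out_const)
{
    assert(limit >= 0 && out_var != NULL && out_const != NULL);
    for (int v = 0; v < VAR_COUNT; v++) {
        for (int c = -limit; c <= limit; c++) {
            if (satisfies((enum var)v, c)) {
                *out_var = (enum var)v;
                *out_const = c;
                return 1;
            }
        }
    }
    return 0;
}
```

With a large enough limit, the only candidate of this shape that meets all three conjuncts is up_sep + 100.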
24. Patch as minimal change
Failing tests → Debugging → DSE → Synthesis
Failing tests → MaxSMT solver
Conjure a function which represents minimal change to buggy program.
25. Example
Buggy program:
if (x > y)
  if (x > z)
    out = 10;
  else
    out = 20;
else
  out = 30;
return out;
DirectFix patch:
if (x >= y)
  if (x >= z)
    out = 10;
  else
    out = 20;
else
  out = 30;
return out;
SemFix patch:
if (x > y)
  if (x > z)
    out = 10;
  else
    out = 20;
else
  out = 30;
return ((x==y) ? ((x==z) ? 10 : 20) : out);
Test cases: all possible orderings of x, y, z
26. No fault localization
int foo(int x, int y){
  if (x > y)
    y = y + 1;
  else
    y = y - 1;
  return y + 2;
}
Test: foo(0,0) == 3?
(x1 = 0 ∧ y1 = 0 ∧ result = 3)
∧ (if (x1 > y1) then (y2 = y1 + 1) else (y2 = y1 - 1))
∧ (result = y2 + 2)
= UNSAT
27. Constraint = Whole Pgm.
(x1 = 0 ∧ y1 = 0 ∧ result = 3)
∧ (if (x1 > y1) then (y2 = y1 + 1) else (y2 = y1 - 1))
∧ (result = y2 + 2)
= UNSAT
(x1 = 0 ∧ y1 = 0 ∧ result = 3)
∧ (if (x1 >= y1) then (y2 = y1 + 1) else (y2 = y1 - 1))
∧ (result = y2 + 2)
= SAT
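The UNSAT/SAT contrast can be checked by direct execution. The sketch below makes the relational operator of foo a parameter and tries alternatives until the test foo(0,0) == 3 holds; this enumeration is only a stand-in for the MaxSMT solving step, and the names rel_op, foo_with and first_satisfying_op are assumptions for illustration.

```c
#include <assert.h>

enum rel_op { OP_GT, OP_GE, OP_LT, OP_LE, OP_EQ, OP_NE, OP_COUNT };

/* Evaluate "x op y" for the chosen relational operator. */
int apply(enum rel_op op, int x, int y)
{
    switch (op) {
    case OP_GT: return x > y;
    case OP_GE: return x >= y;
    case OP_LT: return x < y;
    case OP_LE: return x <= y;
    case OP_EQ: return x == y;
    case OP_NE: return x != y;
    default:    return 0;
    }
}

/* foo with its condition's operator made a parameter; OP_GT is the
 * original program. */
int foo_with(enum rel_op op, int x, int y)
{
    if (apply(op, x, y))
        y = y + 1;
    else
        y = y - 1;
    return y + 2;
}

/* Return the first operator (original first) for which the test passes,
 * or -1 if no operator in the list does. */
int first_satisfying_op(int x, int y, int expected)
{
    for (int op = 0; op < OP_COUNT; op++)
        if (foo_with((enum rel_op)op, x, y) == expected)
            return op;
    return -1;
}
```

Trying the original operator first mirrors the preference for patches that stay close to the buggy program.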
30. Remember the schematic?
Semantics-based Schematic
for partition π : ∃α. π(α) do
  synthesize e
done
Semantics-based Schematic
for path π : ∃α. π(α) do
  synthesize e
done
Semantics-based Schematic
for path π : ∃α. π(α) do
  for all tests t get constraint φ_t
  Solve ⋀_t φ_t to construct e
done
Semantics-based Schematic
for path π : ∃α. π(α) do
  Get repair constraint
  Solve to construct e
done
31. Value based “Constraint”
Semantics-based Schematic
for path π : ∃α. π(α) do
  for all tests t get constraint φ_t
  Solve ⋀_t φ_t to construct e
done
Instead of representing φ_t as an SMT constraint, represent it using values.
A value that is arbitrarily set during execution for a selected expression and that makes the program pass.
Can be found by solving the path condition of a failing test case (I, O):
pathcondition(α) ∧ input = I ∧ output = O
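For intuition, such a value can also be found by brute force on a small example. The sketch below reuses the is_upward program from the earlier slides, overrides the suspicious expression on line 4 with an externally supplied value, and scans a range for a value that makes a failing test pass. The range scan stands in for solving the path condition, and the names is_upward_with and find_value are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* is_upward with the expression on line 4 replaced by the value x
 * (hypothetical instrumentation of the suspicious expression). */
int is_upward_with(int inhibit, int up_sep, int down_sep, int x)
{
    int bias;
    if (inhibit)
        bias = x;
    else
        bias = up_sep;
    return bias > down_sep;
}

/* Scan [lo, hi] for the smallest value that makes the test pass.
 * Returns 1 and stores it in *out, or returns 0 if none exists. */
int find_value(int inhibit, int up_sep, int down_sep, int expected,
               int lo, int hi, int *out)
{
    assert(out != NULL && lo <= hi && hi < INT_MAX);
    for (int x = lo; x <= hi; x++) {
        if (is_upward_with(inhibit, up_sep, down_sep, x) == expected) {
            *out = x;
            return 1;
        }
    }
    return 0;
}
```

For the failing test <1, 11, 110> with expected output 1, any value above 110 works, which agrees with the path condition X > 110 from the Repair Constraint slide.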
34. Repair Constraint
• SemFix work (ICSE 2013)
o Example: for an identified expression e to be fixed
• [ e > 0 ] ∧ f(t) == e for each test t
• DirectFix work (ICSE 2015)
o Whole Program as repair constraint
o Use the principle of minimality to synthesize a minimal patch.
• Angelix work (ICSE 2016)
o Example: for identified expressions e1, e2, … to be fixed
o [ (e == 1) ∨ (e == 2) ∨ (e == 3) ] ∧ f(t) == e for each test t.
o [ (e1 == 0 ∧ e2 == 1) ∨ (e1 == 1 ∧ e2 == 0) ] ∧ f(t) == e1 ∧ g(t) == e2 for each test t.
38. “Latest” Results
if (hbtype == TLS1_HB_REQUEST) {
  ...
  memcpy(bp, pl, payload);
  ...
}
(a) The buggy part of the Heartbleed-vulnerable OpenSSL
if (hbtype == TLS1_HB_REQUEST
    && payload + 18 < s->s3->rrec.length) {
  ...
}
(b) A fix generated automatically
if (1 + 2 + payload + 16 > s->s3->rrec.length)
  return 0;
...
if (hbtype == TLS1_HB_REQUEST) {
  ...
}
else if (hbtype == TLS1_HB_RESPONSE) {
  ...
}
return 0;
(c) The developer-provided repair
The Heartbleed Bug is a serious vulnerability in the popular
OpenSSL cryptographic software library. This weakness allows
stealing the information protected, under normal conditions, by the
SSL/TLS encryption used to secure the Internet. SSL/TLS provides
communication security and privacy over the Internet for
applications such as web, email, instant messaging (IM) and some
virtual private networks (VPNs).
--- Source: heartbleed.com