2. Results
[Diagram: NL sentence → Source code]
It works well!
− Implemented an automated tool
− Performed a case study using JDraw
• Recovered traceability between 7 sentences and code
• Obtained more accurate results than in the cases without using the ontology
3. Background
Documentation-to-code traceability links are important
− For reducing maintenance costs
− For software reuse & extension
Our focus: recovering sentence-to-code traceability
− In some software products, there is only documentation of simple sentences without any detailed description
• E.g., by use of agile/bazaar-style processes
6. Example of Sentence
JDraw (http://jdraw.sf.net/)
"plain, filled and gradient filled rectangles"
Just a single sentence, no detailed documents
7. Aim
To detect the set of methods related to the input sentence
[Diagram: the NL sentence "Users can draw a plain oval." (a set of words) is mapped to a subset of the source code's methods, among drawOval(), writeLog(), getCanvas(), getColor(), setPixel(), getColorPallete(), DrawPanel(), OvalTool()]
8. Challenge
How to find the correct set?
− Word similarity: leads to false positives/negatives
[Diagram: matching sentence words against method names and bodies (e.g., void setPixel(...) { ... draw ... }) produces both false positives and false negatives]
9. Challenge
How to find the correct set?
− Word similarity: leads to false positives/negatives
− Method invocation: leads to false positives
[Diagram: following every invocation from drawOval() also pulls in writeLog(), a false positive]
10. Challenge
Another criterion is required
− To judge whether a method invocation is needed
− Considering the problem domain
[Diagram: among the invocations from drawOval(), the one to writeLog() is not important, while others are important]
11. Domain Ontology
Formally representing the knowledge of the target problem domain
− As relationships between concepts (words)
[Diagram: an ontology for painting tools (excerpt) with concepts "canvas", "draw", "oval", and "color"; the concept "canvas" is a possible target to "draw"; the function "draw" concerns the concept "color"]
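The excerpt above can be sketched as a small set of directed relationships between concepts. A minimal Java sketch, where the class name, relation labels, and storage format are illustrative assumptions rather than the tool's actual design:

```java
import java.util.*;

// A domain ontology as a set of directed, labeled relationships between
// concepts. Only the two relationships stated in the excerpt are encoded.
class PaintingOntology {
    private final Set<String> relations = new HashSet<>();

    void add(String from, String label, String to) {
        relations.add(from + " -" + label + "-> " + to);
    }

    // True if any relationship links the two concepts, in either direction.
    boolean related(String a, String b) {
        for (String r : relations) {
            if (r.startsWith(a + " ") && r.endsWith(" " + b)) return true;
            if (r.startsWith(b + " ") && r.endsWith(" " + a)) return true;
        }
        return false;
    }

    static PaintingOntology excerpt() {
        PaintingOntology o = new PaintingOntology();
        o.add("draw", "target", "canvas");   // "canvas" is a possible target to "draw"
        o.add("draw", "concerns", "color");  // the function "draw" concerns "color"
        return o;
    }
}
```

A real ontology would also carry relation semantics (e.g., is-a vs. part-of); a flat pair set is enough to illustrate the matching used later.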
12. Solution
Choosing method invocations using the domain ontology
[Diagram: the sentence "Users can draw a plain oval." and the ontology (canvas, draw, oval, color) together select which invocations from drawOval() to follow, e.g., getCanvas() and getColor() rather than writeLog()]
14. Procedure
NL sentence (a set of words) → Source code (a set of methods and their invocations)
1. Extracting call-graph
• By static analysis
2. Extracting words
• Stemming
• Removing stopwords
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs
[Diagram, built up over slides 14–18: a call-graph over methods m1–m8, each annotated with a word set such as {..., draw}, {..., oval}, {..., color}, {..., pixel}, {..., scale}, {..., log}; two FCGs, Sa and Sb, are extracted and scored (Sa: 100, Sb: 85)]
19. Extracting FCGs
NL sentence (a set of words: {wa, wb}) → Source code (a set of methods and their invocations)
1. Root selection
− Choose the methods having the words in the input sentence
− These words are tagged as roles
2. Traversal
− Traverse method invocations forwards from the roots if the invocation satisfies one of the traversal rules
Traversal rules:
− Rule #1 (Sentence-based): holds if the callee method has a word in the given sentence
− Rule #2 (Ontology-based): holds if the method invocation matches the relationships in the given ontology
− Rule #3 (Inheritance): holds if the callee method has a word in the roles of the caller method
[Diagram, built up over slides 19–24: a call-graph over methods m1–m8; matched methods carry roles such as {wa}, {wb}, {wc}]
25. Ontology-Based Rule
Holds if the invocation matches the relationships in the given ontology
[Diagram: the caller drawOval(), role {draw, oval}, invokes the callee getCanvas(), role {canvas}; the ontology relationship "canvas is a possible target to draw" makes the rule hold]
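One way to read this rule: the invocation is followed if some role word of the caller is related in the ontology to some role word of the callee. A hedged Java sketch, where the pair encoding and helper names are assumptions:

```java
import java.util.*;

// Sketch of the ontology-based traversal rule: an invocation caller -> callee
// matches if any caller role word and any callee role word are related in the
// ontology. The pair set below is an illustrative stand-in for a real ontology.
class OntologyRule {
    // Unordered related-concept pairs from the painting-tool ontology excerpt.
    static final Set<String> RELATED = new HashSet<>(Arrays.asList(
            "draw|canvas", "canvas|draw",
            "draw|color",  "color|draw"));

    static boolean related(String a, String b) {
        return RELATED.contains(a + "|" + b);
    }

    // Does the invocation match an ontology relationship?
    static boolean holds(Set<String> callerRole, Set<String> calleeRole) {
        for (String a : callerRole)
            for (String b : calleeRole)
                if (related(a, b)) return true;
        return false;
    }
}
```

With the slide's example, `holds({draw, oval}, {canvas})` succeeds via the draw–canvas relationship, while `holds({draw, oval}, {write, log})` does not.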
26. Extracting FCGs
1. Root selection
2. Traversal
− Traverse method invocations from the roots
− Traversed methods will be an FCG
[Diagram: traversal over methods m1–m8 yields two FCGs, Sa and Sb]
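The two steps above, root selection followed by rule-guided traversal, can be sketched as a breadth-first walk over the call-graph with a pluggable rule predicate. This is an illustrative sketch under assumed data structures, not the tool's actual implementation:

```java
import java.util.*;
import java.util.function.BiPredicate;

// Sketch of FCG extraction: starting from a root method (one whose name
// contains a sentence word), traverse invocations forwards while a
// traversal rule accepts the (caller, callee) edge.
class FcgExtractor {
    // callGraph: method -> methods it invokes
    static Set<String> extract(String root,
                               Map<String, List<String>> callGraph,
                               BiPredicate<String, String> rule) {
        Set<String> fcg = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        fcg.add(root);
        work.add(root);
        while (!work.isEmpty()) {
            String caller = work.poll();
            for (String callee : callGraph.getOrDefault(caller, List.of())) {
                // Follow the invocation only if some traversal rule holds.
                if (rule.test(caller, callee) && fcg.add(callee)) {
                    work.add(callee);
                }
            }
        }
        return fcg;
    }
}
```

In the real procedure the predicate would be the disjunction of the three rules (sentence-based, ontology-based, inheritance); here any `BiPredicate` can be plugged in.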
27. Prioritizing FCGs
Using a weighting scheme
Criteria: we prioritize FCGs which
1. include methods having important roles as their names
2. include method invocations matching the relationships in the ontology and/or the sentence
3. cover many words in the input sentence
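The three criteria might be combined into a single score as sketched below. The slides do not give concrete weights, so the weights (10, 5, 1) and the input representation are pure assumptions:

```java
import java.util.*;

// Sketch of a weighting scheme for prioritizing FCGs. The three terms follow
// the slide's criteria; the weights are illustrative, not the tool's values.
class FcgScorer {
    static int score(Set<String> roleWords,       // role words found in the FCG's method names
                     int matchingInvocations,     // invocations matching ontology/sentence
                     Set<String> sentenceWords) { // words of the input sentence
        Set<String> covered = new HashSet<>(roleWords);
        covered.retainAll(sentenceWords);         // criterion 3: sentence-word coverage
        return 10 * roleWords.size()              // criterion 1: methods with important roles
             +  5 * matchingInvocations           // criterion 2: matching invocations
             +  1 * covered.size();
    }
}
```

FCGs are then ranked by this score in descending order.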
28. Case Study
Evaluation target: JDraw 1.1.5
− We picked 7 sentences from JDraw's manual on the Web
− Prepared an ontology for painting tools (including 38 concepts and 45 relationships)
− Prepared control (answer) sets of methods, built by an expert
Evaluation criteria
− Calculating precision and recall values by comparing the extracted sets with the control sets
29. Results
                                                   With ontology    Without ontology
Input sentences                                    Prec.  Recall    Prec.  Recall
1. "plain, filled and gradient filled rectangles"  0.83   0.94      1.00   0.19
2. "plain, filled and gradient filled ovals"       0.82   0.98      1.00   0.21
3. "image rotation"                                1.00   0.35      0.00   0.00
4. "image scaling"                                 0.22   0.68      1.00   0.58
5. "save JPEGs of configurable quality"            0.40   1.00      0.67   1.00
6. "colour reduction"                              0.74   0.95      0.74   0.95
7. "grayscaling"                                   -      -         -      -
We picked the FCGs having the highest F-measure from the results ranked in the top 5
− F-measure = 2 / (1/precision + 1/recall)
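The evaluation measures above can be computed as follows; the method and set names are illustrative:

```java
import java.util.*;

// Precision, recall, and F-measure of an extracted method set against a
// control (answer) set, as used in the case study.
class Metrics {
    static double precision(Set<String> extracted, Set<String> control) {
        Set<String> tp = new HashSet<>(extracted);
        tp.retainAll(control);  // true positives
        return extracted.isEmpty() ? 0.0 : (double) tp.size() / extracted.size();
    }

    static double recall(Set<String> extracted, Set<String> control) {
        Set<String> tp = new HashSet<>(extracted);
        tp.retainAll(control);
        return control.isEmpty() ? 0.0 : (double) tp.size() / control.size();
    }

    // F-measure = 2 / (1/p + 1/r), i.e., the harmonic mean 2pr / (p + r)
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For example, the first sentence's with-ontology values (0.83, 0.94) give an F-measure of about 0.88.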
30. Results
(Same results table as slide 29.)
Accurate results by using the ontology
− precision > 0.7 for 3 cases
− recall > 0.9 for 4 cases
31. Results
(Same results table as slide 29.)
Improvement by using the ontology
− Improved recall for the 1st and 2nd cases
− Detected traceability for the 3rd case
Domain ontology gives us valuable guides for sentence-to-code traceability recovery.
32. Future Work
Case study++
− Larger
− Other/multiple domains
Supporting ontology construction
Improving NLP techniques
Combination with other techniques
− Dynamic analysis
− Relevance feedback
33. Summary
(Recap of the example sentence, the ontology-based solution, the FCG extraction procedure, and the results: domain ontology gives us valuable guides for sentence-to-code traceability recovery.)
35. System Overview
[Diagram: Inputs are the sentence, the source code, and the domain ontology. "Extracting Words" (splitting, stemming, removing stop words, ...) turns the sentence into words, and identifier extraction turns the code into words; static analysis builds the call-graph; "Traversing call-graph" with the domain ontology yields functional call-graphs; "Prioritizing" outputs ordered functional call-graphs.]
36. Extracting Call Graph
Statically extracting invocations
Consideration of overridden methods

class Tool { ... }
class OvalTool extends Tool { ... }
class Drawer {
  public void startDraw() {
    Tool tool = Tool.getCurrent();
    tool.draw();
  }
}

[Diagram: from Drawer#startDraw, the call tool.draw() is extracted as edges to both Tool#draw and OvalTool#draw]
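The overriding-aware expansion shown above might look like the sketch below. The data structures are illustrative; a real extractor would derive the overrider map from the class hierarchy via static analysis:

```java
import java.util.*;

// Sketch of call resolution that accounts for overriding: a call to
// Tool#draw is expanded into edges to the declared method and to every
// overriding implementation in a subclass.
class CallGraphBuilder {
    // overriders: class -> subclasses that override the called method
    static List<String> resolveCall(String declaringClass, String method,
                                    Map<String, List<String>> overriders) {
        List<String> targets = new ArrayList<>();
        targets.add(declaringClass + "#" + method);
        for (String sub : overriders.getOrDefault(declaringClass, List.of()))
            targets.add(sub + "#" + method);
        return targets;
    }
}
```

For the slide's example, resolving `tool.draw()` yields edges to both `Tool#draw` and `OvalTool#draw`.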
37. Extracting Words
From source code
− Extracting identifiers (startsWith, startDraw, ...)
− Tokenizing camelCases (startsWith → {starts, with})
From sentence
− "draw a circle" → {draw, a, circle}, then normalized
Normalization
− Stemming
− Removing stop words
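The tokenization and normalization steps above can be sketched as follows. The stop-word list is an illustrative placeholder, and real stemming (e.g., a Porter stemmer) is omitted:

```java
import java.util.*;

// Sketch of the word-extraction step: split camelCase identifiers,
// lowercase the parts, and drop stop words.
class WordExtractor {
    // Illustrative stop-word list; a real pipeline would use a standard one.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "can", "users");

    // startsWith -> [starts, with]; startDraw -> [start, draw]
    static List<String> tokenizeCamelCase(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split("(?=[A-Z])"))  // split before capitals
            if (!part.isEmpty()) words.add(part.toLowerCase());
        return words;
    }

    static Set<String> normalize(List<String> words) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : words)
            if (!STOP_WORDS.contains(w)) result.add(w);  // stemming omitted
        return result;
    }
}
```

With the slide's sentence, "draw a circle" normalizes to {draw, circle}.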
38. Bad Results
(Same results table as slide 29.)
Bad effects, no improvement
− Reason: use of external modules
− Reason: use of compound words (grayscale vs. grayScale)