2. Results
[Diagram: NL sentence → Source code]
It works well!
− Implemented an automated tool
− Performed a case study using JDraw
• Recovered traceability between 7 sentences and code
• Obtained more accurate results than in the cases without using the ontology
3. Background
Documentation-to-code traceability links are important
− For reducing maintenance costs
− For software reuse & extension
Our focus: recovering sentence-to-code traceability
− In some software products, there is only documentation of simple sentences without any detailed description
• E.g., by use of agile/bazaar-style processes
6. Example of Sentence
JDraw (http://jdraw.sf.net/)
"plain, filled and gradient filled rectangles"
Just a single sentence, no detailed documents
7. Aim
To detect the set of methods related to the input sentence
[Diagram: the NL sentence "Users can draw a plain oval." (a set of words) is mapped to a subset of the source code's methods, among drawOval(), writeLog(), getCanvas(), getColor(), setPixel(), getColorPallete(), DrawPanel(), OvalTool()]
8. Challenge
How to find the correct set?
− Word similarity: leads to false positives/negatives
[Diagram: matching sentence words against method names and bodies (e.g., void setPixel(...) { ... draw ... }) produces both false positives and false negatives]
9. Challenge
How to find the correct set?
− Word similarity: leads to false positives/negatives
− Method invocation: leads to false positives
[Diagram: following every invocation from drawOval() also pulls in writeLog(), a false positive]
10. Challenge
Another criterion is required
− To judge whether a method invocation is needed
− Considering the problem domain
[Diagram: among the invocations from drawOval(), the one to writeLog() is not important, while others are important]
11. Domain Ontology
Formally representing the knowledge of the target problem domain
− As relationships between concepts (words)
[Diagram: an ontology for painting tools (excerpt) with concepts "canvas", "draw", "oval", and "color"; the concept "canvas" is a possible target to "draw"; the function "draw" concerns the concept "color"]
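The excerpt above can be sketched as a small set of directed relationships between concepts. A minimal Java sketch, where the class name, relation labels, and storage format are illustrative assumptions rather than the tool's actual design:

```java
import java.util.*;

// A domain ontology as a set of directed, labeled relationships between
// concepts. Only the two relationships stated in the excerpt are encoded.
class PaintingOntology {
    private final Set<String> relations = new HashSet<>();

    void add(String from, String label, String to) {
        relations.add(from + " -" + label + "-> " + to);
    }

    // True if any relationship links the two concepts, in either direction.
    boolean related(String a, String b) {
        for (String r : relations) {
            if (r.startsWith(a + " ") && r.endsWith(" " + b)) return true;
            if (r.startsWith(b + " ") && r.endsWith(" " + a)) return true;
        }
        return false;
    }

    static PaintingOntology excerpt() {
        PaintingOntology o = new PaintingOntology();
        o.add("draw", "target", "canvas");   // "canvas" is a possible target to "draw"
        o.add("draw", "concerns", "color");  // the function "draw" concerns "color"
        return o;
    }
}
```

A real ontology would also carry relation semantics (e.g., is-a vs. part-of); a flat pair set is enough to illustrate the matching used later.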
12. Solution
Choosing method invocations using the domain ontology
[Diagram: the sentence "Users can draw a plain oval." and the ontology (canvas, draw, oval, color) together select which invocations from drawOval() to follow, e.g., getCanvas() and getColor() rather than writeLog()]
14. Procedure
NL sentence (a set of words) → Source code (a set of methods and their invocations)
1. Extracting call-graph
• By static analysis
2. Extracting words
• Stemming
• Removing stopwords
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs
[Diagram, built up over slides 14–18: a call-graph over methods m1–m8, each annotated with a word set such as {..., draw}, {..., oval}, {..., color}, {..., pixel}, {..., scale}, {..., log}; two FCGs, Sa and Sb, are extracted and scored (Sa: 100, Sb: 85)]
19. Extracting FCGs
NL sentence (a set of words: {wa, wb}) → Source code (a set of methods and their invocations)
1. Root selection
− Choose the methods having the words in the input sentence
− These words are tagged as roles
2. Traversal
− Traverse method invocations forwards from the roots if the invocation satisfies one of the traversal rules
Traversal rules:
− Rule #1 (Sentence-based): holds if the callee method has a word in the given sentence
− Rule #2 (Ontology-based): holds if the method invocation matches the relationships in the given ontology
− Rule #3 (Inheritance): holds if the callee method has a word in the roles of the caller method
[Diagram, built up over slides 19–24: a call-graph over methods m1–m8; matched methods carry roles such as {wa}, {wb}, {wc}]
25. Ontology-Based Rule
Holds if the invocation matches the relationships in the given ontology
[Diagram: the caller drawOval(), role {draw, oval}, invokes the callee getCanvas(), role {canvas}; the ontology relationship "canvas is a possible target to draw" makes the rule hold]
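One way to read this rule: the invocation is followed if some role word of the caller is related in the ontology to some role word of the callee. A hedged Java sketch, where the pair encoding and helper names are assumptions:

```java
import java.util.*;

// Sketch of the ontology-based traversal rule: an invocation caller -> callee
// matches if any caller role word and any callee role word are related in the
// ontology. The pair set below is an illustrative stand-in for a real ontology.
class OntologyRule {
    // Unordered related-concept pairs from the painting-tool ontology excerpt.
    static final Set<String> RELATED = new HashSet<>(Arrays.asList(
            "draw|canvas", "canvas|draw",
            "draw|color",  "color|draw"));

    static boolean related(String a, String b) {
        return RELATED.contains(a + "|" + b);
    }

    // Does the invocation match an ontology relationship?
    static boolean holds(Set<String> callerRole, Set<String> calleeRole) {
        for (String a : callerRole)
            for (String b : calleeRole)
                if (related(a, b)) return true;
        return false;
    }
}
```

With the slide's example, `holds({draw, oval}, {canvas})` succeeds via the draw–canvas relationship, while `holds({draw, oval}, {write, log})` does not.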
26. Extracting FCGs
1. Root selection
2. Traversal
− Traverse method invocations from the roots
− Traversed methods will be an FCG
[Diagram: traversal over methods m1–m8 yields two FCGs, Sa and Sb]
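The two steps above, root selection followed by rule-guided traversal, can be sketched as a breadth-first walk over the call-graph with a pluggable rule predicate. This is an illustrative sketch under assumed data structures, not the tool's actual implementation:

```java
import java.util.*;
import java.util.function.BiPredicate;

// Sketch of FCG extraction: starting from a root method (one whose name
// contains a sentence word), traverse invocations forwards while a
// traversal rule accepts the (caller, callee) edge.
class FcgExtractor {
    // callGraph: method -> methods it invokes
    static Set<String> extract(String root,
                               Map<String, List<String>> callGraph,
                               BiPredicate<String, String> rule) {
        Set<String> fcg = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        fcg.add(root);
        work.add(root);
        while (!work.isEmpty()) {
            String caller = work.poll();
            for (String callee : callGraph.getOrDefault(caller, List.of())) {
                // Follow the invocation only if some traversal rule holds.
                if (rule.test(caller, callee) && fcg.add(callee)) {
                    work.add(callee);
                }
            }
        }
        return fcg;
    }
}
```

In the real procedure the predicate would be the disjunction of the three rules (sentence-based, ontology-based, inheritance); here any `BiPredicate` can be plugged in.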
27. Prioritizing FCGs
Using a weighting scheme
Criteria: we prioritize FCGs which
1. include methods having important roles as their names
2. include method invocations matching the relationships in the ontology and/or the sentence
3. cover many words in the input sentence
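The three criteria might be combined into a single score as sketched below. The slides do not give concrete weights, so the weights (10, 5, 1) and the input representation are pure assumptions:

```java
import java.util.*;

// Sketch of a weighting scheme for prioritizing FCGs. The three terms follow
// the slide's criteria; the weights are illustrative, not the tool's values.
class FcgScorer {
    static int score(Set<String> roleWords,       // role words found in the FCG's method names
                     int matchingInvocations,     // invocations matching ontology/sentence
                     Set<String> sentenceWords) { // words of the input sentence
        Set<String> covered = new HashSet<>(roleWords);
        covered.retainAll(sentenceWords);         // criterion 3: sentence-word coverage
        return 10 * roleWords.size()              // criterion 1: methods with important roles
             +  5 * matchingInvocations           // criterion 2: matching invocations
             +  1 * covered.size();
    }
}
```

FCGs are then ranked by this score in descending order.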
28. Case Study
Evaluation target: JDraw 1.1.5
− We picked 7 sentences from JDraw's manual on the Web
− Prepared an ontology for painting tools (including 38 concepts and 45 relationships)
− Prepared control (answer) sets of methods, built by an expert
Evaluation criteria
− Calculating precision and recall values by comparing the extracted sets with the control sets
29. Results
                                                   With ontology    Without ontology
Input sentences                                    Prec.  Recall    Prec.  Recall
1. "plain, filled and gradient filled rectangles"  0.83   0.94      1.00   0.19
2. "plain, filled and gradient filled ovals"       0.82   0.98      1.00   0.21
3. "image rotation"                                1.00   0.35      0.00   0.00
4. "image scaling"                                 0.22   0.68      1.00   0.58
5. "save JPEGs of configurable quality"            0.40   1.00      0.67   1.00
6. "colour reduction"                              0.74   0.95      0.74   0.95
7. "grayscaling"                                   -      -         -      -
We picked the FCGs having the highest F-measure from the results ranked in the top 5
− F-measure = 2 / (1/precision + 1/recall)
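The evaluation measures above can be computed as follows; the method and set names are illustrative:

```java
import java.util.*;

// Precision, recall, and F-measure of an extracted method set against a
// control (answer) set, as used in the case study.
class Metrics {
    static double precision(Set<String> extracted, Set<String> control) {
        Set<String> tp = new HashSet<>(extracted);
        tp.retainAll(control);  // true positives
        return extracted.isEmpty() ? 0.0 : (double) tp.size() / extracted.size();
    }

    static double recall(Set<String> extracted, Set<String> control) {
        Set<String> tp = new HashSet<>(extracted);
        tp.retainAll(control);
        return control.isEmpty() ? 0.0 : (double) tp.size() / control.size();
    }

    // F-measure = 2 / (1/p + 1/r), i.e., the harmonic mean 2pr / (p + r)
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For example, the first sentence's with-ontology values (0.83, 0.94) give an F-measure of about 0.88.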
30. Results
(Same results table as slide 29.)
Accurate results by using the ontology
− precision > 0.7 for 3 cases
− recall > 0.9 for 4 cases
31. Results
(Same results table as slide 29.)
Improvement by using the ontology
− Improved recall for the 1st and 2nd cases
− Detected traceability for the 3rd case
Domain ontology gives us valuable guides for sentence-to-code traceability recovery.
32. Future Work
Case study++
− Larger
− Other/multiple domains
Supporting ontology construction
Improving NLP techniques
Combination with other techniques
− Dynamic analysis
− Relevance feedback
33. Summary
(Recap of the example sentence, the ontology-based solution, the FCG extraction procedure, and the results: domain ontology gives us valuable guides for sentence-to-code traceability recovery.)
35. System Overview
[Diagram: Inputs are the sentence, the source code, and the domain ontology. "Extracting Words" (splitting, stemming, removing stop words, ...) turns the sentence into words, and identifier extraction turns the code into words; static analysis builds the call-graph; "Traversing call-graph" with the domain ontology yields functional call-graphs; "Prioritizing" outputs ordered functional call-graphs.]
36. Extracting Call Graph
Statically extracting invocations
Consideration of overridden methods

class Tool { ... }
class OvalTool extends Tool { ... }
class Drawer {
  public void startDraw() {
    Tool tool = Tool.getCurrent();
    tool.draw();
  }
}

[Diagram: from Drawer#startDraw, the call tool.draw() is extracted as edges to both Tool#draw and OvalTool#draw]
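The overriding-aware expansion shown above might look like the sketch below. The data structures are illustrative; a real extractor would derive the overrider map from the class hierarchy via static analysis:

```java
import java.util.*;

// Sketch of call resolution that accounts for overriding: a call to
// Tool#draw is expanded into edges to the declared method and to every
// overriding implementation in a subclass.
class CallGraphBuilder {
    // overriders: class -> subclasses that override the called method
    static List<String> resolveCall(String declaringClass, String method,
                                    Map<String, List<String>> overriders) {
        List<String> targets = new ArrayList<>();
        targets.add(declaringClass + "#" + method);
        for (String sub : overriders.getOrDefault(declaringClass, List.of()))
            targets.add(sub + "#" + method);
        return targets;
    }
}
```

For the slide's example, resolving `tool.draw()` yields edges to both `Tool#draw` and `OvalTool#draw`.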
37. Extracting Words
From source code
− Extracting identifiers (startsWith, startDraw, ...)
− Tokenizing camelCases (startsWith → {starts, with})
From sentence
− "draw a circle" → {draw, a, circle}, then normalized
Normalization
− Stemming
− Removing stop words
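The tokenization and normalization steps above can be sketched as follows. The stop-word list is an illustrative placeholder, and real stemming (e.g., a Porter stemmer) is omitted:

```java
import java.util.*;

// Sketch of the word-extraction step: split camelCase identifiers,
// lowercase the parts, and drop stop words.
class WordExtractor {
    // Illustrative stop-word list; a real pipeline would use a standard one.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "can", "users");

    // startsWith -> [starts, with]; startDraw -> [start, draw]
    static List<String> tokenizeCamelCase(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split("(?=[A-Z])"))  // split before capitals
            if (!part.isEmpty()) words.add(part.toLowerCase());
        return words;
    }

    static Set<String> normalize(List<String> words) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : words)
            if (!STOP_WORDS.contains(w)) result.add(w);  // stemming omitted
        return result;
    }
}
```

With the slide's sentence, "draw a circle" normalizes to {draw, circle}.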
38. Bad Results
(Same results table as slide 29.)
Bad effects, no improvement
− Reason: use of external modules
− Reason: use of compound words (grayscale vs. grayScale)