Program comprehension is a crucial activity, preliminary to any software maintenance task. Such an activity can be difficult when the source code is not adequately documented, or the documentation is outdated. Unlike the many existing software re-documentation approaches, which are based on different kinds of code analysis, this paper describes CODES (mining sourCe cOde Descriptions from developErs diScussions), a tool that applies a "social" approach to software re-documentation. Specifically, CODES extracts candidate method documentation from StackOverflow discussions and creates Javadoc descriptions from it. We evaluated CODES by mining Lucene and Hibernate method descriptions. The results indicate that CODES is able to extract descriptions for 20% and 28% of the Lucene and Hibernate methods, with a precision of 84% and 91%, respectively.
Demo URL: http://youtu.be/Rnc5ni1AAzc
Demo Web Page: www.ing.unisannio.it/spanichella/pages/tools/CODES
1. CODES: mining sourCe cOde Descriptions from developErs diScussions
Carmine Vassallo, Sebastiano Panichella, Massimiliano Di Penta, Gerardo Canfora
2. S. Panichella, J. Aponte, M. Di Penta, A. Marcus, G. Canfora - ICPC 2012
3. Bug Report and Mailing List
Such communications are…
• unstructured
• usually not explicitly meant to describe specific parts of the source code.
Example: Eclipse (Java system)
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods
..........................
Problem seems to come from searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) in org.eclipse.jdt.internal.debug.ui.launcher.MainMethodSearchEngine, there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) if includesSubtypes flag is ON. This add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main!
..........................
(On the slide, parameters are highlighted in orange and keywords such as "call" and "method" in green.)
4. CODES: Approach for Mining Method Descriptions
• Step 1: Downloading SO discussions relying on its REST interface and tracing them onto classes
• Step 2: Extracting paragraphs
• Step 3: Tracing paragraphs onto methods (discards paragraphs of discussions with 0 votes)
• Step 4: Heuristic-based filtering
• Step 5: Similarity-based filtering
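Steps 2 and 3 of the slide above can be illustrated with a minimal sketch. It is not the CODES implementation: the `Post` shape, the blank-line paragraph split, and the whole-word pattern are assumptions made for illustration, and posts are skipped when they have no positive votes (the slide only mentions 0 votes).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """A question or answer from a downloaded discussion (hypothetical shape)."""
    body: str
    votes: int


def extract_paragraphs(body: str) -> List[str]:
    """Step 2 (assumed): split a post body into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def mentions_method(paragraph: str, method_name: str) -> bool:
    """Step 3 (assumed pattern): the method name appears as a whole word."""
    pattern = r"\b" + re.escape(method_name) + r"\b"
    return re.search(pattern, paragraph) is not None


def trace_paragraphs(posts: List[Post], method_name: str) -> List[str]:
    """Collect paragraphs that mention the method, skipping unvoted posts."""
    traced = []
    for post in posts:
        if post.votes <= 0:  # assumption: treat negative scores like 0 votes
            continue
        for paragraph in extract_paragraphs(post.body):
            if mentions_method(paragraph, method_name):
                traced.append(paragraph)
    return traced
```

The whole-word match keeps `addDocument` from being traced onto a paragraph that only mentions `addDocuments`; a real tracer would probably also need to handle overloaded methods and qualified names.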
The core approach behind CODES builds on the one defined in our previous ICPC 2012 paper, «Mining Source Code Descriptions from Developer Communications».
That work was motivated by the conviction that documentation is very often scarce, incomplete, and out of date. Mining source code descriptions can therefore be very important for re-documenting code or complementing code comments.
We argue that mailing lists and issue trackers can be a useful source of information to help understand source code.
Thus, in that paper we defined an approach that mines Java method descriptions from developers' discussions in mailing lists and issue trackers.
Indeed, bug reports and mailing lists often contain source code descriptions at different levels of abstraction. In this example of a bug report from Eclipse (a Java system), we can see a good description of a method of the Java class “MainMethodSearchEngine”. Examples like this motivated our previous approach to mining method descriptions.
In the same way, we can find a similar description in an email from Apache Lucene (another Java system). It is important to note that such “useful” descriptions very often contain relevant keywords like “call” or “invoke”.
However, such descriptions are also frequently present in discussions on StackOverflow; thus, we adapted our approach to this Question & Answer site.
We therefore implemented CODES (“mining sourCe cOde Descriptions from developErs diScussions”), which starts by selecting a Java method (or methods) to re-document and finds related descriptions on StackOverflow.
CODES consists of 5 steps:
Step 1: Downloading SO discussions relying on its REST interface and tracing them onto classes.
Step 2: Extracting paragraphs from those discussions.
Step 3: Tracing paragraphs onto methods (using regular expressions), discarding discussions or answers with 0 votes.
Step 4: Heuristic-based filtering, which verifies that a paragraph meets some patterns and keeps paragraphs containing syntactic descriptions, descriptions of method parameters, descriptions related to method invocations, and so on.
Step 5: Similarity-based filtering, which checks that the method descriptions found on SO agree with the source code by computing the textual similarity between the source code and each description, and discarding descriptions with a similarity lower than 0.4.
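A minimal sketch of Steps 4 and 5 follows. It rests on several assumptions that the text does not confirm: the heuristic is reduced to a keyword check on the stems "call" and "invok", the textual similarity is taken to be cosine similarity over raw term frequencies, and camelCase identifiers are split before comparison. Only the 0.4 threshold comes from the text.

```python
import math
import re
from collections import Counter

# Hypothetical keyword stems; the text only names "call" and "invoke".
KEYWORD_STEMS = ("call", "invok")
SIMILARITY_THRESHOLD = 0.4  # threshold stated in the text


def tokenize(text):
    """Lowercase word tokens, splitting camelCase identifiers first."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def passes_heuristics(paragraph):
    """Step 4 (simplified): keep paragraphs containing an invocation keyword."""
    return any(
        token.startswith(stem)
        for token in tokenize(paragraph)
        for stem in KEYWORD_STEMS
    )


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-frequency vectors (assumed measure)."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_descriptions(paragraphs, method_source):
    """Step 5: drop paragraphs whose similarity to the code is below 0.4."""
    return [
        p for p in paragraphs
        if passes_heuristics(p)
        and cosine_similarity(p, method_source) >= SIMILARITY_THRESHOLD
    ]
```

Splitting camelCase before comparing lets a prose mention of "document" match the identifier `addDocument`; without it, code and natural-language text would share very few terms. A production version would likely add stop-word removal and stemming.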
Improve the approach with the aim of increasing the precision while keeping the method coverage as high as possible.
Further validate the proposed approach on a larger set of systems.
Investigate enhancing CODES by improving its usability and adding new features, e.g., for re-documenting classes or packages.