One of the main challenges faced by today’s developers is keeping up with the staggering amount of source code that needs to be read and understood. In order to help developers with this problem and reduce the costs associated with it, one solution is to use simple textual descriptions of source code entities that developers can grasp easily, while capturing the code semantics precisely. We propose an approach to automatically determine such descriptions, based on automated text summarization technology and structural information.
2. Developers read source code
• Before performing maintenance on a system, developers need to understand its source code
• During comprehension, programmers search and browse the code
3. Skimming vs. reading code
• Skimming (Starke’09): quickly reading the names of software artifacts
+ Fast
– Insufficient information
– Shallow understanding
• Reading in depth
– Slow
– Too much information
+ Deeper understanding
4. Code summaries
• Automatically generated, short, yet accurate descriptions of source code entities
• They give more information than just the header or the name of an artifact
• Significantly shorter and faster to read than the source code they summarize
5. What should we summarize?
• Code
– Packages
– Classes
– Methods
– Method sequences
– Etc.
• Other artifacts
– Bug reports (ICSE 2010 - S. Rastkar, G. Murphy, G. Murray)
– E-mails
– Etc.
6. What should we include in code summaries?
• Semantic information
– What does the source code do?
– Identifiers and comments that capture the main concepts
• Structural information
– How does the code work?
– Class relationships, callers and callees, members of a class, etc.
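To make the "structural information" bullet concrete, here is a minimal sketch that collects the members of a class and the callees of each method through static analysis. It is an illustration only: it uses Python's standard `ast` module on a made-up `Playlist` class, and the names `SOURCE` and `structural_facts` are hypothetical, not part of the approach described in the talk.

```python
import ast

# Hypothetical toy class used only for this illustration.
SOURCE = '''
class Playlist:
    def __init__(self):
        self.tracks = []

    def add(self, track):
        self.tracks.append(track)
        self.notify()

    def notify(self):
        print(len(self.tracks))
'''


def structural_facts(source):
    """Return {class_name: {method_name: sorted list of called names}}."""
    facts = {}
    tree = ast.parse(source)
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        methods = {}
        for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
            callees = set()
            # Walk the method body and record the name of every call target.
            for node in ast.walk(fn):
                if isinstance(node, ast.Call):
                    target = node.func
                    if isinstance(target, ast.Attribute):
                        callees.add(target.attr)   # e.g. self.notify()
                    elif isinstance(target, ast.Name):
                        callees.add(target.id)     # e.g. print(...)
            methods[fn.name] = sorted(callees)
        facts[cls.name] = methods
    return facts


if __name__ == "__main__":
    for cls_name, methods in structural_facts(SOURCE).items():
        for method, callees in methods.items():
            print(f"{cls_name}.{method} calls {callees}")
```

The sketch resolves calls by name only; a real analysis would also need type information to tell which class a callee belongs to.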
8. How should we generate code summaries?
• Semantic information: automatic text summarization
– Machine Learning
– Discourse-based approaches
– Term-based Text Retrieval techniques
• Structural information: static analysis
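As one possible reading of the "term-based Text Retrieval" bullet, the sketch below splits identifiers into terms and ranks them with a simple tf-idf weight, keeping the top k as a lexical summary. The weighting formula, the stop-word list, the sample methods, and the names `split_identifiers` and `top_terms` are all assumptions made for illustration; the talk does not specify them.

```python
import math
import re
from collections import Counter

# Assumed stop-word list: a few English words and Java keywords.
STOP_WORDS = {"the", "a", "of", "to", "and", "is", "in", "if", "for",
              "return", "void", "int", "public", "private", "new",
              "this", "null"}


def split_identifiers(text):
    """Split code text into lower-case terms, breaking camelCase and snake_case."""
    terms = []
    for token in re.findall(r"[A-Za-z]+", text):
        # Acronyms (XML) stay whole; otherwise split before each capital.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
        terms.extend(p.lower() for p in parts)
    return [t for t in terms if t not in STOP_WORDS and len(t) > 1]


def top_terms(target, corpus, k=5):
    """Rank the terms of `target` by tf-idf against `corpus`; return the top k."""
    docs = [set(split_identifiers(d)) for d in corpus]
    tf = Counter(split_identifiers(target))
    n = len(docs)

    def idf(term):
        df = sum(1 for d in docs if term in d)
        return math.log((1 + n) / (1 + df)) + 1  # smoothed idf

    scored = {t: count * idf(t) for t, count in tf.items()}
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical method bodies standing in for a real code base.
CORPUS = [
    "public void addTrack(Track track) { playlist.add(track); updateTrackCount(); }",
    "public void removeTrack(Track track) { playlist.remove(track); updateTrackCount(); }",
    "public String getVolumeLabel(int volume) { return formatVolume(volume) + volumeUnit; }",
]

if __name__ == "__main__":
    print(top_terms(CORPUS[2], CORPUS))
```

Treating each method as a document keeps the example small; other granularities (class, file) would work the same way.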
9. How can we evaluate code summaries?
• How good are the automatic summaries when compared to manual ones?
• How useful are the automatic code summaries for SE tasks?
10. Preliminary evaluation
• Compared automatic code summaries with developer-written code summaries
• 6 developers, 12 methods in aTunes
• Used only lexical information: the 5 most relevant terms
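One simple way to compare an automatic term summary with a developer's is set overlap. The sketch below computes precision and recall over terms; this is an assumed measure for illustration, not necessarily the one used in the study, and the term lists are made up.

```python
def overlap(automatic, manual):
    """Return (precision, recall) of automatic terms against manual terms."""
    auto, man = set(automatic), set(manual)
    shared = auto & man
    precision = len(shared) / len(auto) if auto else 0.0
    recall = len(shared) / len(man) if man else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical term lists, not data from the evaluation.
    automatic = ["volume", "format", "get", "label", "string"]
    manual = ["volume", "label", "unit", "format"]
    precision, recall = overlap(automatic, manual)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```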
11. Results
• Automatic source code summaries reflect developers’ summaries well
• Text Retrieval techniques work as well on source code as on natural language in reflecting human summaries
• Developers make use of structural information in their code summaries:
– Method name terms
– Class name terms
– Formal parameter type terms
12. What are we doing now?
• What type of structural information, and how much of it, should be included in code summaries?
• How do developers generate summaries?
• Are different summaries needed for different tasks?
• How useful are the code summaries for SE tasks?
• Etc.
13. In summary…
• Automatic code summaries:
– Short yet accurate descriptions of source code
– Can reduce the effort of program comprehension
– Embed both semantic and structural information
– Can be generated for a variety of software entities
• Visit my poster (HINT: look for the huge and colorful one)
• www.cs.wayne.edu/~severe and www.cs.wayne.edu/~shaiduc
• sonja@wayne.edu