Source Code Comments: the forgotten software check
How comments help to find problems
ICSSEA '15
J. DERN, Valeo
Code Comment Analysis
Jérôme DERN
Software Quality Manager
Contact:
+33 1 48 84 56 85
jerome.dern@valeo.com
@jeromedern
https://fr.linkedin.com/in/jeromede
29/03/2015 | 2
Source Code Comments: The forgotten check
Introduction
Source code comments
A lot of code tools exist
Code-related tools are numerous:
IDEs (editors),
Unit test tools,
Static analysis tools,
Control flow analysis tools,
Data flow analysis tools,
Runtime-oriented tools,
Naming rule checkers,
Time and stack tools,
Reverse documentation builders…
But no tool focuses on source code comments
Only two very limited freeware tools are available (one for Java and the other for C++)
Very limited and specific: they don't cover the problematic categories presented later in this presentation
Why this idea of a comment analysis tool?
During a peer review, I discovered that a particular source code included a lot of TODOs
I decided to search for all occurrences of TODO in the whole source code
I found some other questionable practices
But searching without a dedicated tool was very limited and time-consuming…
Why don't tools consider comments?
We all focus on executable lines of code
Bugs are assumed to be located only on executable lines
Comments are "inactive", so they couldn't have bugs
Nobody imagines that problems can be found by looking in comments
Analyzing comments is not always easy
The forgotten check
But comments are essential for
Maintenance
Code understanding
Code documentation
But they may also reveal a lot of potential code problems
Handling comments
First, capturing comments is quite easy
In C, comments begin with /* and end with */, and are not nested, as defined by the ISO C standard ISO/IEC 9899:1990
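That definition translates directly into a capture step. A minimal sketch in Python, assuming the analyzer works on whole-file strings; note that a regex this simple also matches comment-like text inside string literals, which a real tool would have to exclude.

```python
import re

# ISO C block comments are not nested, so the first "*/" after a "/*"
# always closes the comment; a non-greedy match is therefore enough.
C_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def capture_comments(source: str) -> list[str]:
    """Return every /* ... */ comment found in a C source string."""
    return C_BLOCK_COMMENT.findall(source)
```

The DOTALL flag lets the pattern span multi-line comments, which is the common case in C headers.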
In C++, C, C#, D, Go, Java, JavaScript, PHP, Object Pascal, ActionScript, Objective-C and Swift:
Comments start with //
and end at the end of the line
The C notations /* and */ are supported by all of them except Object Pascal
In most assembly languages, comments start with ";" and end at the end of the line.
This notation is also used in Lisp, Scheme, Clojure and AutoIt
In all these languages, comments
have a clear start and end,
and are not nested (except in Swift)
So it is easy to capture them whatever language is used
But some specific practices, like conditional compilation directives, may be "used" to add comments…
Example in C, C++ and Objective-C:
#if 0
This is also a comment
#endif
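Such regions can be captured with one more pattern. A minimal sketch covering only the naive case: it does not track nested preprocessor conditionals, so an inner #if inside the disabled region would make it stop at the wrong #endif.

```python
import re

# Naive capture of "#if 0 ... #endif" regions used as comments.
# Caveat: nested conditionals inside the region are not tracked.
IF0_REGION = re.compile(r"^[ \t]*#if\s+0\b.*?^[ \t]*#endif",
                        re.DOTALL | re.MULTILINE)

def capture_if0_regions(source: str) -> list[str]:
    """Return every '#if 0 ... #endif' block found in the source."""
    return IF0_REGION.findall(source)
```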
If the difficulty is not related to comment capture, where is it?
Comment processing may be a little bit complex
Natural language handling is not so easy
Commented code detection can be tricky
The difficulties to handle comments
To fully understand why comments may be seen as difficult to analyze, we must clarify what can be expected from a comment analysis
Categories
Categories of checks
Defects detected by comment analysis
Comments not in English [NotEnglish]
Comments not in English
This may happen when
development teams are located in a country that is not natively English-speaking,
several development teams work at the same time but not in the same country,
or legacy code is used
Why is [NotEnglish] important?
Comments not in English are not understandable by all development teams
Example: it is a major issue for a team in China to understand source code with comments in French
Bad practices [Bad practice]
What can be considered [Bad practice]?
Abuse, slang & other inappropriate comments
Comments that are not related to code activities, like jokes or personal life comments
Fancy comments, emoticons and personal opinions on the code
But also…
The presence of keywords to disable static analysis rules
Profanity in source code [Profanity]
Is there really [Profanity] in source code?
In Europe it is rare to find profanity in source code, but it is a very common practice elsewhere
Surprisingly, it depends on the computer language used…
(data coming from open source analysis)
Licensed & Open Source usage [Licensed]
Why can [Licensed] be a problem?
Using Open Source in commercial products may lead to opening the product's source code
Commercial tools can detect open source software, but they are expensive, and it is often preferable to detect such a case as soon as possible
Well-known example:
The French ISP Free had a big problem with infringement of the GPL license by using "Busybox" and "Iptables"
Comments may reveal the use of Open Source software: licenses, algorithms, web links, emails…
These kinds of comments may reveal that some portions of code are copied, adapted or inspired from source code found on the web
Questioning comments [Problematic]
What are [Problematic] comments?
 Comments that may reveal developers' doubts about
– source code correctness,
– the algorithm chosen,
– data, strategy, etc.
Unfinished software [UnFinished]
What are [UnFinished] comments?
Comments that give clues that the software is not finished
For example, special keywords used by developers to indicate that the code
– is not finished,
– may be optimized,
– has to be updated
"Commented out" code [CommentedCode]
Why should [CommentedCode] be pointed out?
It is a very bad practice that should not be allowed, because it is a symptom of
– unfinished source code
– code removed to ease some R&D tests (like integration tests) but not cleaned up for the released version
It is a poor practice that is difficult to justify to customers
It is forbidden by MISRA-C 2004 rule 2.4, MISRA-C 2012 Dir 4.4, and MISRA C++ rule 2-7-3
Commented code is difficult to detect because the code may be
– unfinished (not understandable by a compiler)
– not working anymore (using old or deleted definitions)
– a mix of code and human language
Missing best practices [BestPractice]
Lack of respect of [BestPractice] can be:
Copyright header missing
Mandatory keywords for development tools missing
– Configuration Management tool keywords (used to automatically store the history of file modifications in a comment)
– Documentation management tool keywords (used to generate software documentation automatically)
Comment/Code ratio violations
– important for measuring code reusability & maintainability
– can be calculated by some commercial tools, such as static analyzers
– a real ratio eliminates all mandatory comments (company headers, function headers…), "commented out" code and trivial comments
Additional statistics can include the percentage of each comment problem category
Comments checking
Automatic checking of comments
Automating comment checks
Comments can be checked manually
during cross-review of source code, if the reviewer is aware of the defect categories presented above,
or during some Agile practices, like pair programming
But
this is a huge amount of work, a sort of "Augean stables"
Not all teams do cross-reviews and pair programming on all source code
Human checks may fail more easily than a tool, even if humans can spot more subtle points
The main challenges of automatic comment analysis are
foreign language detection
and commented code detection
How can these two challenges be solved?
Foreign language detection
This could be done with complex natural-language semantic analysis, but that is really not needed
Since Zipf and his "Selected studies of the principle of relative frequency in language", we all know that some words are more frequently used than others in a given human language
A list of the most used words that do not exist in English can be used to detect a foreign language
A list of 30 to 70 words may be enough to detect a particular language
– Accurate French language detection can be achieved with 60 words, including technical words for short-sentence detection
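The word-list idea can be sketched in a few lines. The marker set below is a short illustrative sample, not the ~60-word list the tool actually uses, and the threshold value is likewise an assumption.

```python
# Word-list language detection sketch (Zipf-style frequency argument).
# FRENCH_MARKERS is a small illustrative sample of frequent French
# words that are not also English words.
FRENCH_MARKERS = {
    "le", "la", "les", "des", "une", "est", "pour", "dans", "avec",
    "sur", "pas", "nous", "vous", "qui", "sont", "donc", "entre",
}

def looks_french(comment: str, threshold: int = 2) -> bool:
    """Flag a comment as French once enough marker words appear."""
    hits = sum(1 for w in comment.lower().split() if w in FRENCH_MARKERS)
    return hits >= threshold
```

Requiring two or more hits keeps a stray loanword from triggering the check on an otherwise English comment.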
Another similar method is to check for the presence of a sufficient amount of English (and technical) words in the comment
It may generate more false positives
But this method is not tied to any particular language other than English
"Commented out" code detection
A complex solution would be to embed a specific language analyzer in the tool
Specific, because we have to deal with incomplete code, old code, partial code, and mixed code & human language (pseudo code, comments, …)
As for human language detection, there is a simple solution
– Code detection can be done partially, but nearly well enough, by capturing typical computer language grammar and tokens
– This can be done using simple regular expressions
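A token-based sketch of that idea, with illustrative patterns; a real rule set would be larger and tuned per computer language.

```python
import re

# Heuristic: a comment containing typical C grammar tokens is probably
# commented-out code. These patterns are illustrative, not exhaustive.
CODE_TOKENS = [
    re.compile(r";\s*$"),                                  # statement terminator
    re.compile(r"\b(if|else|for|while|return|switch)\b\s*[({]"),
    re.compile(r"[A-Za-z_]\w*\s*=[^=]"),                   # assignment, not ==
    re.compile(r"#\s*(if|ifdef|ifndef|endif|define)\b"),   # preprocessor
]

def looks_like_code(comment_body: str) -> bool:
    """True when any code-like pattern matches the comment body."""
    return any(p.search(comment_body) for p in CODE_TOKENS)
```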
All other kinds of checks
Since foreign languages are already detected and reported, we can assume that all other categories of checks are applied to English comments
All other kinds of checks can be done by checking for the presence or combination of certain keywords in comments
Example for Unfinished: TBD, TBC, TODO, …
Some keywords may be checked case-insensitively; others must not be
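A sketch of keyword tagging with per-keyword case handling. Treating TBD/TBC as upper-case-only is an assumption for illustration, not a documented rule of the tool.

```python
import re

# Keyword tagging sketch for the [UnFinished] category. TODO is matched
# case-insensitively; TBD and TBC only in upper case (assumption: their
# lower-case forms are too likely to collide with ordinary text).
UNFINISHED_KEYWORDS = [
    (re.compile(r"\btodo\b", re.IGNORECASE), "TODO"),
    (re.compile(r"\bTBD\b"), "TBD"),
    (re.compile(r"\bTBC\b"), "TBC"),
]

def unfinished_markers(comment: str) -> list[str]:
    """Return the [UnFinished] keywords found in one comment."""
    return [name for pat, name in UNFINISHED_KEYWORDS if pat.search(comment)]
```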
We need some exclusion rules to avoid catching too many false positives
Example:
– else /* if (u8Locking == InProcess) */
– If this truly is commented code, it is more a documentation of which "if" the "else" refers to…
Same for:
– #endif /* #ifdef NVRAM_MODULE */
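The two examples above can become entries in an exclusion list applied before a finding is reported. The patterns below are a sketch covering just these two cases.

```python
import re

# Exclusion-list sketch: commented code that only documents which "if"
# an "else" belongs to, or which "#ifdef" an "#endif" closes, is a
# known false positive and should not be reported.
EXCLUSIONS = [
    re.compile(r"^\s*else\s*/\*\s*if\s*\(.*\)\s*\*/"),
    re.compile(r"^\s*#\s*endif\s*/\*\s*#?\s*if(n?def)?\b.*\*/"),
]

def is_excluded(source_line: str) -> bool:
    """True when the line matches a documented false-positive pattern."""
    return any(p.search(source_line) for p in EXCLUSIONS)
```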
Obtained results
Managing false positives
In foreign language detection, the false positive rate was around 15%, but detection was very accurate, even for a few foreign words
Managing false positives
The commented code false positive rate is around 10%
Lowering it leads to more missed positives
False positives were reduced
by choosing only strategic keywords,
by implementing an exclusion list composed of regular expressions,
by not reporting company header comments,
and by adding a list of exclusion paths for COTS, compiler libraries, legacy code…
Missed true positives
Difficult to evaluate, except for foreign language and commented code
 On foreign language, it is 5% for sentences of 2 or more words, but from 4 words on it is less than 0.5%
 On commented code, it is around 3% (due to partial lines of code, and "typedef" and "macro" usage that may not be captured by regular expressions)
Missed true positives
On other kinds of checks, missed positives are below 10% and can be reduced by adding new keywords
 This can be done during cross-review, when a problematic comment is reported by the tool
Automating comment checks in several computer languages
Capturing comments is quite easy in several computer languages (small configuration)
Analyzing natural-language comments does not depend on the computer language used
Detecting "commented out" code is computer-language dependent; new languages can be added with a small new set of regular expressions and an exclusion list
Automating comment analysis
The tool must capture comments & process them
The tool must be easy to use, with a minimal need for configuration
The tool must detect the computer language used
The tool must be configurable for
– adding computer languages by just adding some configuration files
– categories of problematic comments
Automating comment analysis
– The tool must produce an "easy to use" report
– The tool must produce a real comment ratio metric
– The tool must provide statistics per project
Automating comment analysis
This tool exists: the Comment Analyser tool can parse C and ASM, and can easily be extended by configuration to manage
– other human languages (detection),
– other computer languages,
– other categories,
– new keywords in existing categories
Global results
45 projects analyzed in C and ASM,
4 million SLOC analyzed
3% of comments reveal problems
The mean comment/code ratio is 2.83
The most common issues are "not in English" and "commented out" code
Results global view
45 various projects from 2.3 to 644 KSLOC of C & ASM automotive embedded source code, from 2002 to 2015
Comment analysis, 2002 to 2015 (chart): [NOTENGLISH] 59%, [COMMENTEDCODE] 18%, [BESTPRACTICE] 12%, [BADPRACTICE] 7%, [PROBLEMATIC] 3%, [UNFINISHED] 1%, [PROFANITY] 0%, [LICENSED] 0%
Results global: recent projects
22 various projects of C & ASM from 2011 to 2015: reported comments dropped from 4.72% to 1.22%
Comment analysis, 2011 to 2015 (chart): [BADPRACTICE] 27%, [BESTPRACTICE] 26%, [COMMENTEDCODE] 24%, [NOTENGLISH] 15%, [PROBLEMATIC] 6%, [UNFINISHED] 1%, [LICENSED] 1%, [PROFANITY] 0%
Comment ratio comparison between QAC and "Comment Analyser"
The QAC static analysis tool computes the STCDN metric as
– the number of visible characters in comments, divided by the number of visible characters outside comments.
– Comment delimiters are ignored.
– Whitespace characters in strings are treated as visible characters.
The Comment Analyser tool computes this metric as
– NBCMT = number of useful characters in comments: multiple whitespace characters are treated as one character, trivial comments (e.g. /*******************/) are simplified (=> /*[…]*/), and commented code is removed
– NBPRG = number of useful characters outside comments
– STCDN² = NBCMT / NBPRG
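The NBCMT/NBPRG definition can be sketched directly. Counting the simplified placeholder /*[...]*/ for a trivial comment, and requiring commented code to have been filtered out beforehand, follow the slide; the exact character-counting rules of the real tool are assumptions here.

```python
import re

def useful_chars(text: str) -> int:
    """Count characters after collapsing each whitespace run to one."""
    return len(re.sub(r"\s+", " ", text.strip()))

def stcdn2(comments: list[str], program_text: str) -> float:
    """Sketch of STCDN2 = NBCMT / NBPRG. Assumes commented-out code was
    already removed from `comments`. Trivial decorative comments such
    as /*******/ count as the simplified placeholder /*[...]*/."""
    nbcmt = 0
    for c in comments:
        if re.fullmatch(r"/\*[*\s=-]*\*/", c):
            nbcmt += len("/*[...]*/")     # simplified trivial comment
        else:
            nbcmt += useful_chars(c)
    nbprg = useful_chars(program_text)
    return nbcmt / nbprg if nbprg else 0.0
```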
Comment ratio comparison between QAC and "Comment Analyser"
On a project of 194K lines, QAC gives a mean STCDN value of 3.15 (3.15 characters in comments for 1 character not in comments), while the Comment Analyser tool gives 2.4 (2.4 characters in comments for 1 character not in comments).
The STCDN² metric is significantly different from the one calculated by QAC.
This means that the classic way of calculating the comment/code ratio is not accurate.
Example of tool output for a given project
Project: XXXX Comments analysis report
Project base directory: E:\User\XXXX\04-Software\04-Coding\01-Sources
E:\User\XXXX\04-Software\04-Coding\01-Sources\Application\ACC_IG\ACCIG_Manage.c (Rev. 1.23 by Y@valeo.com)
000894 [COMMENTEDCODE] //(WRP_GetData(flg, EvtBCMEmgcStop) == TRUE)
000901 [COMMENTEDCODE] //(WRP_GetData(flg, EvtBCMEmgcStop) == TRUE)
E:\User\XXXX\04-Software\04-Coding\01-Sources\Application\ACC_IG\ACCIG_Manage.h (Rev. 1.4 by Y@valeo.com)
000099 [COMMENTEDCODE] /* #ifndef ACCIG_MANAGE_H */
000266 [COMMENTEDCODE] /*LOC_u8ComCID_State = u8COMCID_BUSY; */
E:\User\XXXX\04-Software\04-Coding\01-Sources\Application\DIARC\DIA_SendBF\DIA_RCSendBF_loc.h (Rev. 1.4 by Y@valeo.com)
000118 [COMMENTEDCODE] /* define u8REQ_LF_EM_ANT_INS_ALL ((uint8) 0x40) */
000134 [COMMENTEDCODE] /* define u8LF_ALL_EXT_ANT ((uint8) 0x80) */
001415 [UNFINISHED] /* TODO: remove ASIC activation */
001764 [COMMENTEDCODE] /* CanACC_BDB: 0 = OFF; 1 = ON */
001863 [COMMENTEDCODE] /*EVT_strEntries.u8EvtBSTPE = FALSE;*/
001869 [COMMENTEDCODE] /*EVT_strEntries.bEvtBDB1S01st = FALSE;*/
001873 [UNFINISHED] // TODO: confirm filtering
001925 [COMMENTEDCODE] /* &&(WRP_GetData(u8,EsclState) == HFS_u8ESCL_UNL) */
…
Results detailed view
Example of warnings per type: commented code
/* CAR_u8ComTypeInProgress == CAR_u8COM_TYP_NO_COM */
// level = BATT_u8BattLevel
#if 0 /* Not used in 128 bit key */ if( (((uint8) u8NB_COL_KEY) > ((uint8) 6)) && ( ((uint8) (u8Index % u8NB_COL_KEY)) == ((uint8) 4) )) {
/* { _CLI } */
//UTL_u8Memcpy( &strTrpConfig.au8TrpSecretKey[0], WRP_GetData(au8,StartKeys),AUT_u8eSIZEOF_ISK_CODE)
/*(uint8)*/
/*ASM_vidChecksStack() */
Example of warnings per type: Bad practice
/* PRQA S 3198 -- */ (disabling the static analysis tool in source code)
/* game over for the windowed mode, return to immediate mode and counter to 0 */
Missing PVCS $Workfile:$Revision:$Log:$Modtime: or $Date: keywords!
Example of warnings per type: Unfinished
/* TODO */
/*TBD: in case of end of stop field success/error/timeout*/
//todo :LINVHStart used ?
#if 0 case u8LCK_END_ST : { // TODO: Remove ??
// TODO: verify if ( LNK[...]
/* xxx x xxxx [...]*/
/* Toff = xxxx ms (unit = 13,5115 µs) */
// TODO: CALL DET
/* TBC */
Example of warnings per type: Problematic
If (mode /*=*/=RUNNING) […] <= may be a bug, and for sure a very bad practice
// BUG : Result is written although not expected + Null pointer provided + Pointer not tested
/* Bug : one extra command was sent => LF carrier of length 0 !!! */
/* Temporary workaround: due to a bug in the ASIC software, the ASIC is not woken up by the request [...]*/
/* Compiler bug: __transponder_reset must be used else address RESET_SUBCOMMAND is not linked */
/* direct write in EVT struct !! */
/* Index overflow !!! */
/* !ONLY USED FOR TESTS! */
/* Reset can never be prevented => Problem !!! */
/* ERROR !!! */
Example of warnings per type: Not English
/* timeout sur ACK 1 */ ("timeout on ACK 1")
/* si le polling est en cours => signal à EVT pour activation fonction */ ("if polling is in progress => signal to EVT to activate the function")
/* Accès ML avec calibrage */ ("ML access with calibration")
/* eomA et eomB */ ("eomA and eomB")
/* > Plus rien ne doit figurer après ce point. */ ("nothing more may appear after this point")
#if 0 /* Lecture entrées directes */ LOC_au8TabEntriesBruts[INDEX_ACC] = [...]Dio_ReadChannel(DIO_UC_INFO_ACC) ^ (DIO_UC_INFO_ACC_MASK) LOC_au8TabEnt[...] ("reading direct inputs")
// 0 pour +1 minimum ("0 for +1 minimum")
/* Durée du filtrage des entrées capteurs x période tâche = 6 * 2ms = 8 ms + latence prise en compte soit x ms en veille => filtrage de X ms et y ms e[...] ("sensor input filtering duration x task period = 6 * 2 ms = 8 ms + latency taken into account, i.e. x ms in standby => filtering of X ms and y ms…")
Future Work
Possible future work
Today this is a standalone tool, but it could also be provided as an add-on to some static analysis tools, like QAC-C and QAC-C++
 A link between executable source code and comments could be added in order to
– ensure the presence of comments on strategic lines of code (structure members, "if" statements, "while" statements, …)
– etc…
Creating an add-on to the configuration management tool to check that modified code also includes comment additions or modifications
– to ensure that comments are always up to date
References
Early, previous and related works
 G. Kingsley Zipf: "Selected studies of the principle of relative frequency in language", Harvard University Press, Cambridge, MA, USA, 1932. 51 pp. LCCN P123 .Z5.
 Lin Tan, Ding Yuan and Yuanyuan Zhou: "HotComments: How to Make Program Comments More Useful?", Department of Computer Science, University of Illinois at Urbana-Champaign, {lintan2, dyuan3, yyzhou}@cs.uiuc.edu
 W. E. Howden: "Comments analysis and programming errors", IEEE Trans. Softw. Eng., 1990.
 Z. Li and Y. Zhou: "PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code", FSE '05.
 D. Steidl, B. Hummel, E. Juergens: "Quality Analysis of Source Code Comments", CQSE GmbH, Garching b. München, Germany.
 N. Khamis, R. Witte, and J. Rilling: "Automatic Quality Assessment of Source Code Comments: the JavadocMiner", NLDB '10, 2010.
 M.-A. Storey, J. Ryall, R. I. Bull, D. Myers, and J. Singer: "TODO or To Bug: Exploring How Task Annotations Play a Role in the Work Practices of Software Developers", ICSE '08, 2008.
 A. T. T. Ying, J. L. Wright, and S. Abrams: "Source code that talks: an exploration of Eclipse task comments and their implication to repository mining", MSR '05, 2005.
 L. Tan, D. Yuan, and Y. Zhou: "HotComments: How to Make Program Comments More Useful?", HOTOS '07, 2007.
 D. J. Lawrie, H. Feild, and D. Binkley: "Leveraged Quality Assessment using Information Retrieval Techniques", ICPC '06, 2006.
 Z. M. Jiang and A. E. Hassan: "Examining the Evolution of Code Comments in PostgreSQL", MSR '06, 2006.
 B. Fluri, M. Wursch, and H. C. Gall: "Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes", WCRE '07, 2007.
 J. Tang, H. Li, Y. Cao, and Z. Tang: "Email data cleaning", KDD '05, 2005.
 A. Bacchelli, M. D'Ambros, and M. Lanza: "Extracting Source Code from E-Mails", ICPC '10, 2010.
"Comments" and Questions?
Any questions?
Any comments? "Human language" comments, of course…
Automatic Comment Analysis
THANK YOU!
DevOps – Don’t Be Left Behind
 
Technical debt management strategies
Technical debt management strategiesTechnical debt management strategies
Technical debt management strategies
 
O'Reilly Webinar Five Mistakes Log Analysis
O'Reilly Webinar Five Mistakes Log AnalysisO'Reilly Webinar Five Mistakes Log Analysis
O'Reilly Webinar Five Mistakes Log Analysis
 

Source Code Comment Analysis Reveals Hidden Problems

  • 1. Source Code Comments, the forgotten software check: how comments help to find problems. ICSSEA '15. J. DERN, Valeo
  • 2. Code Comment Analysis. Jérôme DERN, Software Quality Manager. Contact: +33 1 48 84 56 85, jerome.dern@valeo.com, @jeromedern, https://fr.linkedin.com/in/jeromede
  • 3. Source Code Comments: The forgotten check. Introduction
  • 5. A lot of code tools exist
  • 6. Source Code Comments: The forgotten check. Code-related tools are numerous: IDEs (editors), unit test tools, static analysis tools, control flow analysis tools, data flow analysis tools, runtime-oriented tools, naming rule checkers, time and stack tools, reverse documentation builders…
  • 7. Source Code Comments: The forgotten check. But no tool focuses on source code comments. Only two very limited freeware tools are available (one for Java, the other for C++). They are very limited and specific: they do not cover the problem categories presented later in this presentation.
  • 8. Why this idea of a comment analysis tool? During a peer review I discovered that a particular source code included a lot of TODOs. I decided to search for all occurrences of TODO in the whole source code, and found some other questionable practices.
  • 9. Source Code Comments: The forgotten check. But searching without a dedicated tool was very limited and time-consuming…
  • 10. Source Code Comments: The forgotten check. Why don't tools consider comments?
  • 11. Source Code Comments: The forgotten check. We all focus on executable lines of code: bugs are only located on executable lines; comments are "inactive", so they could not have bugs; nobody imagines that problems can be found by looking in comments; and analyzing comments is not always easy.
  • 12. Source Code Comments: The forgotten check
  • 13. The forgotten check. But comments are essential for maintenance, code understanding and code documentation. They may also reveal a lot of potential code problems.
  • 14. The forgotten check. Comments are essential for…
  • 15. Handling comments. First, capturing comments is quite easy.
  • 16. Handling comments. In C, comments begin with /* and end with */, and they are not nested, as defined by the ISO C standard ISO/IEC 9899:1990.
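Since ISO C block comments do not nest, the first "*/" always closes a comment, so a non-greedy regular expression is enough to capture them. A minimal sketch in Python (the helper name is ours, not the tool's; comment markers inside string literals are deliberately ignored here):

```python
import re

# Non-greedy ".*?" with DOTALL: because ISO C comments do not nest,
# the first "*/" always terminates the comment.
C_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def extract_c_comments(source: str) -> list[str]:
    """Return every /* ... */ block comment found in a C source string.

    A production tool must also skip string literals, where the
    sequence /* does not start a comment.
    """
    return C_BLOCK_COMMENT.findall(source)
```

For example, extract_c_comments("int x; /* counter */ y; /* TODO */") returns ["/* counter */", "/* TODO */"].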
  • 17. Handling comments. In C++, C, C#, D, Go, Java, JavaScript, PHP, Object Pascal, ActionScript, Objective-C and Swift, comments start with // and end at the end of the line. The C /* and */ notation is supported by all of them except Object Pascal.
  • 18. Handling comments. In most assembly languages, comments start with ";" and end at the end of the line. This notation is also used in Lisp, Scheme, Clojure and AutoIt.
  • 19. Handling comments. In all languages, comments have a clear start and end and are not nested (except in Swift). So it is easy to capture them, whatever language is used.
  • 20. Handling comments. But some specific practices, like conditional compilation directives, may be "used" to add comments. Example in C, C++ and Objective-C: #if 0 This is also a comment #endif
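Such "#if 0 ... #endif" pseudo-comments can be captured with a small line scanner; the sketch below is illustrative only (a depth counter keeps nested preprocessor conditionals inside the disabled region together; directives hidden in string literals or continued across lines are not handled):

```python
import re

def find_if0_blocks(source: str) -> list[str]:
    """Return the bodies of "#if 0 ... #endif" pseudo-comment blocks."""
    blocks: list[str] = []
    depth, current = 0, []
    for line in source.splitlines():
        stripped = line.strip()
        if depth == 0:
            # Only "#if 0" opens a pseudo-comment region.
            if re.match(r"#\s*if\s+0\b", stripped):
                depth, current = 1, []
        elif re.match(r"#\s*if(def|ndef)?\b", stripped):
            depth += 1            # nested conditional inside the region
            current.append(line)
        elif re.match(r"#\s*endif\b", stripped):
            depth -= 1
            if depth == 0:
                blocks.append("\n".join(current))
            else:
                current.append(line)
        else:
            current.append(line)
    return blocks
```

Applied to the slide's example, find_if0_blocks("#if 0\nThis is also a comment\n#endif\n") returns ["This is also a comment"].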
  • 21. Handling comments. If the difficulty is not related to comment capture, where is it?
  • 22. Handling comments. Comment processing may be a little bit complex: natural language handling is not so easy, and commented-code detection can be tricky.
  • 24. The difficulties of handling comments. To fully understand why comments may be seen as difficult to analyze, we must clarify what can be expected from a comment analysis.
  • 25. Categories. Categories of checks
  • 26. Defects detected by comment analysis
  • 27. Defects detected by comment analysis. Comments not in English [NotEnglish]
  • 28. Comments not in English. This may happen when: development teams are located in a country that is not natively English-speaking; several development teams work at the same time but not in the same country; or legacy code is used.
  • 29. Comments not in English. Why is [NotEnglish] important? Comments not in English are not understandable by all development teams. Example: it is a major issue for a team in China to understand source code with comments in French.
  • 30. Defects detected. Bad practices [BadPractice]
  • 31. Bad practices. What can be considered a [BadPractice]? Abuse, slang and other inappropriate comments; comments that are not related to code activities, like jokes and personal-life comments; fancy comments, emoticons and personal opinions on the code.
  • 32. Bad practices. But also… the presence of keywords that disable static analysis rules.
  • 33. Defects detected. Profanity in source code [Profanity]
  • 34. Profanity. Is there really [Profanity] in source code? In Europe it is rare to find profanity in source code, but it is a very common practice elsewhere. Surprisingly, it depends on the computer language used…
  • 35. Profanity (data coming from open source analysis)
  • 36. Defects detected by comment analysis. Licensed & open source usage [Licensed]
  • 37. Licensed. Why can [Licensed] be a problem? Using open source in commercial products may force the product source code to be opened. Commercial tools can detect open source software, but they are expensive, and it is often preferable to detect such a case as soon as possible.
  • 38. Licensed. Well-known example: the French ISP Free had a big problem with GPL license infringement by using "Busybox" and "Iptables".
  • 39. Licensed. Comments may reveal the use of open source software: licenses, algorithms, web links, emails… These kinds of comments may reveal that some portions of code were copied, adapted or inspired from source code found on the web.
  • 40. Defects detected by comment analysis. Questioning comments [Problematic]
  • 41. Defects detected by comment analysis. What are [Problematic] comments? Comments that may reveal developers' doubts about source code correctness, the algorithm chosen, the data, the strategy, etc.
  • 42. Defects detected by comment analysis. Unfinished software [UnFinished]
  • 43. Defects detected by comment analysis. What is [UnFinished]? Comments that give clues that the software is not finished: for example, special keywords used by developers to indicate that the code is not finished, may be optimized, or has to be updated.
  • 44. Defects detected by comment analysis. "Commented out" code [CommentedCode]
  • 45. Defects detected by comment analysis. Why should [CommentedCode] be pointed out? It is a very bad practice that should not be allowed, because it is a symptom of unfinished source code, or of code removed to ease some R&D tests (like integration tests) but possibly not removed for the released version. It is a poor practice that is difficult to justify to customers, and it is forbidden by MISRA-C 2004 rule 2.4, MISRA-C 2012 Dir 4.4 and MISRA C++ rule 2-7-3.
  • 46. Defects detected by comment analysis. Why should [CommentedCode] be pointed out?
  • 47. Defects detected by comment analysis. Commented code is difficult to detect because the code may be: unfinished (not understandable by a compiler); no longer working (using old or deleted definitions); or a mix of code and human language.
  • 48. Defects detected by comment analysis. Missing best practices [BestPractice]
  • 49. Defects detected by comment analysis. Lack of respect of [BestPractice] can be: a missing copyright header; missing mandatory keywords for development tools, such as configuration management tool keywords (used to automatically store the history of file modifications in a comment) or documentation management tool keywords (used to automatically generate software documentation).
  • 50. Defects detected by comment analysis. Comment/code ratio violations: important for measuring code reusability & maintainability; can be calculated by some commercial tools like static analyzers. A real ratio eliminates all mandatory comments (company headers, function headers…), "commented out" code and trivial comments. Additional statistics can include the percentage of comments per problem category.
  • 51. Comments checking. Automatic checking of comments
  • 52. Automating comment checks. Comments can be checked manually: during code cross-review, if the cross-reader is aware of the defect categories presented earlier; or during some Agile practices like pair programming.
  • 53. Automating comment checks. But: this is a huge piece of work, a sort of "Augean stables"; not all teams are doing cross-reviews and pair programming on all source code; and human checks may fail more easily than a tool, even if humans can spot cleverer points.
  • 54. Automating comment checks. The main challenges of automatic comment analysis are foreign language detection and commented-code detection. How can these two challenges be resolved?
  • 55. Automating comment checks. Foreign language detection: this could be done by complex natural-language semantic analysis, but that is really not needed.
  • 56. Automating comment checks. Since Zipf and his "Selected studies of the principle of relative frequency in language", we all know that some words are used more frequently than others in a given human language. A list of the most-used words that do not exist in English can be used to detect a foreign language. A list of 30 to 70 words may be enough to detect a particular language: accurate French language detection can be achieved with 60 words, including technical words for short-sentence detection.
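This frequency-list approach can be sketched as follows; the marker-word list and the threshold below are illustrative only, not the 60-word list actually used by the tool:

```python
# Illustrative subset of high-frequency French function words; the tool
# described in the talk uses a curated list of 60 words, not shown here.
FRENCH_MARKERS = {
    "le", "la", "les", "des", "une", "est", "sur", "avec", "pour",
    "dans", "pas", "sont", "être", "cette", "si", "en", "et",
}

def looks_french(comment: str, threshold: int = 2) -> bool:
    """Flag a comment as French when enough marker words appear."""
    # Strip punctuation and comment delimiters before matching.
    words = (w.strip(".,;:!?*/()") for w in comment.lower().split())
    hits = sum(1 for w in words if w in FRENCH_MARKERS)
    return hits >= threshold
```

For example, looks_french("si le polling est en cours") is true, while an English comment matches no markers.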
  • 57. Automating comment checks. Another similar method is to check for the presence of a sufficient amount of English (and technical) words in a comment. It may generate more false positives, but this method is not tied to any particular language other than English.
  • 58. Automating comment checks. "Commented out" code detection: a complex solution would be to embed a specific language analyzer in the tool. Specific, because we have to deal with incomplete code, old code, partial code, and mixes of code and human language (pseudo-code, comments, …).
  • 59. Automating comment checks. As for human language detection, there is a simple solution: code detection can be done partially, but nearly well enough, by capturing typical computer language grammar and tokens. This can be done using simple regular expressions.
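A sketch of such token-based detection; the regular expressions and the one-hit threshold are illustrative, not the tool's actual rule set:

```python
import re

# Code-shaped token patterns. Some, like the keyword pattern, will also
# match English prose; slide 62's exclusion rules exist for that reason.
CODE_TOKENS = [
    re.compile(r";\s*$"),                                  # trailing semicolon
    re.compile(r"\b(if|else|for|while|return|switch|case)\b"),
    re.compile(r"[A-Za-z_]\w*\s*\([^)]*\)"),               # call-like shape
    re.compile(r"[A-Za-z_]\w*\s*(==|!=|\+\+|--|=)"),       # assign/compare
    re.compile(r"#\s*(if|ifdef|ifndef|endif|define|include)\b"),
]

def looks_like_code(comment_text: str, min_hits: int = 1) -> bool:
    """Heuristic: does a comment body contain code-shaped tokens?"""
    hits = sum(1 for pattern in CODE_TOKENS if pattern.search(comment_text))
    return hits >= min_hits
```

For example, looks_like_code("u8Index = u8Index + 1;") is true, while a plain English phrase with no code tokens is not flagged.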
  • 60. Automating comment checks. All other kinds of checks: as foreign languages are already detected and reported, we can assume that all other categories of checks are done on English text.
  • 61. Automating comment checks. All other kinds of checks can be done by checking for the presence or combination of some keywords in comments. Example for Unfinished: TBD, TBC, TODO, … Some keywords may be checked case-insensitively; some others must not be.
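A sketch of such keyword matching; the split between case-insensitive and case-sensitive markers below is our assumption for illustration (short acronyms like TBD are matched exactly, since their lowercase forms could occur in ordinary text or identifiers):

```python
import re

# Illustrative marker lists, not the tool's actual keyword configuration.
CASE_INSENSITIVE = [r"\btodo\b", r"\bfixme\b"]
CASE_SENSITIVE = [r"\bTBD\b", r"\bTBC\b"]

def unfinished_markers(comment: str) -> bool:
    """True when a comment carries an 'unfinished software' marker."""
    if any(re.search(p, comment, re.IGNORECASE) for p in CASE_INSENSITIVE):
        return True
    return any(re.search(p, comment) for p in CASE_SENSITIVE)
```

For example, "// TODO: confirm filtering" and "/* TBC */" are flagged, while lowercase "tbc" in running text is not.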
  • 62. Automating comment checks. We need some exclusion rules to avoid catching too many false positives. Example: else /* if (u8Locking == InProcess) */. If this really is commented code, it is rather documentation of which "if" the "else" refers to. The same holds for: #endif /* #ifdef NVRAM_MODULE */
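The two idioms on this slide can be whitelisted with simple patterns; this is a sketch, not the tool's full exclusion list:

```python
import re

# Exclusion patterns for comments that look like code but are really
# documentation of the surrounding construct.
EXCLUSIONS = [
    re.compile(r"^\s*else\s*/\*\s*if\b"),      # else /* if (...) */
    re.compile(r"#\s*endif\s*/\*\s*#?\s*if"),  # #endif /* #ifdef ... */
]

def is_excluded(source_line: str) -> bool:
    """True when a code-shaped comment is a known documentation idiom."""
    return any(pattern.search(source_line) for pattern in EXCLUSIONS)
```

A commented-code report is then only raised for lines where is_excluded is false.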
  • 63. Automating comment checks. Obtained results
  • 64. Automating comment checks. Managing false positives: in foreign language detection, the false positive rate was around 15%, but the method is very accurate and detects even a few foreign words.
  • 65. Automating comment checks. Managing false positives: the commented-code false positive rate is around 10%. Lowering it leads to more missed positives.
  • 66. Automating comment checks. False positives were reduced by: choosing only strategic keywords; implementing an exclusion list composed of regular expressions; not reporting company header comments; and adding a list of exclusion paths for COTS, compiler libraries, legacy code…
  • 67. Automating comment checks. Missed true positives are difficult to evaluate, except for foreign language and commented code. On foreign language it is 5% for sentences of 2 or more words, but from 4 words on it is less than 0.5%. On commented code it is around 3% (due to partial lines of code and to "typedef" and "macro" usage that may not be captured by regular expressions).
  • 68. Automating comment checks. Missed true positives: on the other kinds of checks, missed positives are below 10% and can be reduced by adding new keywords. This can be done during cross-review, when a problematic comment reported by the tool is seen.
  • 69. Automating comment checks. Automating comment checks in several computer languages: capturing comments is quite easy in several computer languages (a small configuration); analyzing natural-language comments does not depend on the computer language used; detecting "commented out" code is computer-language dependent, and new languages can be added with a new small set of regular expressions and an exclusion list.
  • 70. Automating comment checks. Automating comment analysis: the tool must capture comments and process them; it must be easy to use, with a minimal need for configuration; it must detect the computer language used; and it must be configurable, both for adding computer languages by just adding some configuration files and for the categories of problematic comments.
  • 71. Automating comment checks. Automating comment analysis: the tool must produce an "easy to use" report, a real comment ratio metric, and statistics for the project.
  • 72. Automating comment checks. Automating comment analysis: this tool exists. The Comment Analyser tool can parse C and ASM and can easily be extended by configuration to manage other human languages (detection), other computer languages, other categories, and new keywords in existing categories.
  • 73. Global results. 45 projects analyzed in C and ASM, 4 million SLOC analyzed. 3% of comments reveal problems. The mean comment/code ratio is 2.83. The most common issues are "not in English" and "commented out" code.
  • 74. Results, global view. 45 various projects, from 2.3 to 644 KSLOC of C & ASM automotive embedded source code, from 2002 to 2015. Comment analysis 2002 to 2015: [NOTENGLISH] 59%, [COMMENTEDCODE] 18%, [BESTPRACTICE] 12%, [BADPRACTICE] 7%, [PROBLEMATIC] 3%, [UNFINISHED] 1%, [PROFANITY] 0%, [LICENSED] 0%.
  • 75. Results, global: recent projects. 22 various projects of C & ASM from 2011 to 2015: reported comments dropped from 4.72% to 1.22%. Comment analysis 2011 to 2015: [BADPRACTICE] 27%, [BESTPRACTICE] 26%, [COMMENTEDCODE] 24%, [NOTENGLISH] 15%, [PROBLEMATIC] 6%, [LICENSED] 1%, [UNFINISHED] 1%, [PROFANITY] 0%.
  • 76. Results, global view. Comment ratio comparison between QAC and "Comment Analyser". The QAC static analysis tool computes the STCDN metric as the number of visible characters in comments divided by the number of visible characters outside comments. Comment delimiters are ignored. Whitespace characters in strings are treated as visible characters.
  • 77. Results, global view. The Comment Analyser tool computes this metric as follows: NBCMT = the number of useful characters in comments, where multiple whitespace characters are treated as one character, trivial comments (e.g. /*******************/) are simplified (=> /*[…]*/), and commented code is removed; NBPRG = the number of useful characters outside comments; STCDN² = NBCMT/NBPRG.
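The STCDN² computation can be sketched as below, assuming the inputs are the comment and non-comment character streams after trivial comments and commented-out code have already been removed by earlier passes:

```python
import re

def stcdn2(comment_chars: str, program_chars: str) -> float:
    """Compute STCDN2 = NBCMT / NBPRG as defined on this slide.

    Runs of whitespace count as a single useful character; stripping
    trivial comments and commented-out code is assumed to happen
    before this function is called.
    """
    nbcmt = len(re.sub(r"\s+", " ", comment_chars.strip()))
    nbprg = len(re.sub(r"\s+", " ", program_chars.strip()))
    return nbcmt / nbprg if nbprg else 0.0
```

For example, stcdn2("a  b", "xy") collapses the double space and returns 3/2 = 1.5.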
  • 78. Results, global view. Comment ratio comparison between QAC and "Comment Analyser": on a project of 194K lines, QAC gives a mean STCDN value of 3.15 (3.15 characters in comments for every character not in a comment), while the Comment Analyser tool gives 2.4 (2.4 characters in comments for every character not in a comment). The STCDN² metric is significantly different from the one calculated by QAC. This means that the classic way of calculating the comment/code ratio is not accurate.
  • 79. Results, global view. Example of tool output for a given project:
Project: XXXX Comments analysis report
Project base directory: E:UserXXXX04-Software04-Coding01-Sources
E:UserXXXX04-Software04-Coding01-SourcesApplicationACC_IGACCIG_Manage.c (Rev. 1.23 by Y@valeo.com)
000894 [COMMENTEDCODE] //(WRP_GetData(flg, EvtBCMEmgcStop) == TRUE)
000901 [COMMENTEDCODE] //(WRP_GetData(flg, EvtBCMEmgcStop) == TRUE)
E:UserXXXX04-Software04-Coding01-SourcesApplicationACC_IGACCIG_Manage.h (Rev. 1.4 by Y@valeo.com)
000099 [COMMENTEDCODE] /* #ifndef ACCIG_MANAGE_H */
000266 [COMMENTEDCODE] /*LOC_u8ComCID_State = u8COMCID_BUSY; */
E:UserXXXX04-Software04-Coding01-SourcesApplicationDIARCDIA_SendBFDIA_RCSendBF_loc.h (Rev. 1.4 by Y@valeo.com)
000118 [COMMENTEDCODE] /* define u8REQ_LF_EM_ANT_INS_ALL ((uint8) 0x40) */
000134 [COMMENTEDCODE] /* define u8LF_ALL_EXT_ANT ((uint8) 0x80) */
001415 [UNFINISHED] /* TODO: remove ASIC activation */
001764 [COMMENTEDCODE] /* CanACC_BDB: 0 = OFF; 1 = ON */
001863 [COMMENTEDCODE] /*EVT_strEntries.u8EvtBSTPE = FALSE;*/
001869 [COMMENTEDCODE] /*EVT_strEntries.bEvtBDB1S01st = FALSE;*/
001873 [UNFINISHED] // TODO: confirm filtering
001925 [COMMENTEDCODE] /* &&(WRP_GetData(u8,EsclState) == HFS_u8ESCL_UNL) */
…
  • 80. Results, detailed view. Example of warnings per type: commented code
/* CAR_u8ComTypeInProgress == CAR_u8COM_TYP_NO_COM */
// level = BATT_u8BattLevel
#if 0 /* Not used in 128 bit key */ if( (((uint8) u8NB_COL_KEY) > ((uint8) 6)) && ( ((uint8) (u8Index % u8NB_COL_KEY)) == ((uint8) 4) )) { /* { _CLI } */
//UTL_u8Memcpy( &strTrpConfig.au8TrpSecretKey[0], WRP_GetData(au8,StartKeys),AUT_u8eSIZEOF_ISK_CODE)
/*(uint8)*/
/*ASM_vidChecksStack() */
  • 81. Results, detailed view. Example of warnings per type: bad practice
/* PRQA S 3198 -- */ (disabling a static analysis tool in source code)
/* game over for the windowed mode, return to immediate mode and counter to 0 */
Missing PVCS $Workfile:, $Revision:, $Log:, $Modtime: or $Date: keywords!
  • 82. Results, detailed view. Example of warnings per type: unfinished
/* TODO */
/*TBD: in case of end of stop field success/error/timeout*/
//todo :LINVHStart used ?
#if 0 case u8LCK_END_ST : { // TODO: Remove ??
// TODO: verify if ( LNK[...]
/* xxx x xxxx [...]*/
/* Toff = xxxx ms (unit = 13,5115 µs) */
// TODO: CALL DET
/* TBC */
  • 83. Results, detailed view. Example of warnings per type: problematic
If (mode /*=*/=RUNNING) […] <= may be a bug! And for sure a very bad practice
// BUG : Result is written although not expected + Null pointer provided + Pointer not tested
/* Bug : one extra command was sent => LF carrier of length 0 !!! */
/* Temporary workaround: due to a bug in the ASIC software, the ASIC is not woken up by the request [...] */
/* Compiler bug: __transponder_reset must be used else address RESET_SUBCOMMAND is not linked */
/* direct write in EVT struct !! */
/* Index overflow !!! */
/* !ONLY USED FOR TESTS! */
/* Reset can never be prevented => Problem !!! */
/* ERROR !!! */
  • 84. Results, detailed view. Example of warnings per type: not English
/* timeout sur ACK 1 */
/* si le polling est en cours => signal à EVT pour activation fonction */
/* Accès ML avec calibrage */
/* eomA et eomB */
/* > Plus rien ne doit figurer après ce point. */
#if 0 /* Lecture entrées directes */ LOC_au8TabEntriesBruts[INDEX_ACC] = [...]Dio_ReadChannel(DIO_UC_INFO_ACC) ^ (DIO_UC_INFO_ACC_MASK) [...]
// 0 pour +1 minimum
/* Durée du filtrage des entrées capteurs x période tâche = 6 * 2ms = 8 ms + latence prise en compte soit x ms en veille => filtrage de X ms et y ms e[...]
  • 85. Automating comment checks. Future work
  • 86. Automating comment checks. Possible future work: today this is a stand-alone tool, but it could also be provided as an add-on to some static analysis tools like QAC-C or QAC-C++. A link between executable source code and comments could be added in order to ensure the presence of comments on strategic lines of code (structure members, "if" statements, "while" statements, …), etc.
  • 87. Automating comment checks. Creating an add-on to the configuration management tool to check that modified code also includes comment additions or modifications, so as to ensure that comments are always up to date.
  • 89. References. Early, previous and related works:
G. Kingsley Zipf: "Selected studies of the principle of relative frequency in language", Harvard University Press, Cambridge, MA, USA, 1932. 51 pp. LCCN P123 .Z5.
Lin Tan, Ding Yuan and Yuanyuan Zhou: "HotComments: How to Make Program Comments More Useful?", Department of Computer Science, University of Illinois at Urbana-Champaign, {lintan2, dyuan3, yyzhou}@cs.uiuc.edu.
W. E. Howden: "Comments analysis and programming errors", IEEE Trans. Softw. Eng., 1990.
Z. Li and Y. Zhou: "PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code", FSE '05.
D. Steidl, B. Hummel, E. Juergens: "Quality Analysis of Source Code Comments", CQSE GmbH, Garching b. München, Germany.
  • 90. References.
N. Khamis, R. Witte, and J. Rilling: "Automatic Quality Assessment of Source Code Comments: the JavadocMiner", NLDB '10, 2010.
M.-A. Storey, J. Ryall, R. I. Bull, D. Myers, and J. Singer: "TODO or To Bug: Exploring How Task Annotations Play a Role in the Work Practices of Software Developers", ICSE '08, 2008.
A. T. T. Ying, J. L. Wright, and S. Abrams: "Source code that talks: an exploration of Eclipse task comments and their implication to repository mining", MSR '05, 2005.
L. Tan, D. Yuan, and Y. Zhou: "HotComments: How to Make Program Comments More Useful?", HOTOS '07, 2007.
D. J. Lawrie, H. Feild, and D. Binkley: "Leveraged Quality Assessment using Information Retrieval Techniques", ICPC '06, 2006.
  • 91. References.
Z. M. Jiang and A. E. Hassan: "Examining the Evolution of Code Comments in PostgreSQL", MSR '06, 2006.
B. Fluri, M. Wursch, and H. C. Gall: "Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes", WCRE '07, 2007.
J. Tang, H. Li, Y. Cao, and Z. Tang: "Email data cleaning", KDD '05, 2005.
A. Bacchelli, M. D'Ambros, and M. Lanza: "Extracting Source Code from E-Mails", ICPC '10, 2010.
  • 92. Automating comment checks. "Comments" and questions?
  • 93. Comments and questions. Any questions? Any comments? "Human language" comments, of course…
  • 94. Automatic Comment Analysis. THANK YOU!