The document discusses research on providing automated suggestions to developers on logging decisions. It summarizes the presenter's background and experiences working in software quality and testing. The research focuses on studying logging statements across seven open source systems to understand logging characteristics and locations. A deep learning model is trained on code block features to suggest logged and non-logged blocks with reasonable accuracy. Evaluation shows syntactic features achieve the best results, and cross-system models can still provide useful suggestions, as different systems may share similar logging guidelines.
5. Main research area: improving
software quality and testing process
Software System
Performance
counters
System logs
Software tests Bug reports
…
Software
developers
Applying code analysis, machine learning, and
data analytics to provide automated support to
developers
I will focus on my research on
software logging in this talk
6. Logs are often the only source of
information for production systems
System running in
production
Operator Developer
7. Logs can be used to assist various
software development tasks
Performance
Analysis
Requirement
Tracking
Debugging
Monitoring
8. What is a Logging Statement & Trade-off of Logging
…
} catch (Exception e) {
    LOG.warn("Can not parse job id from {} ", path, e);
}
Verbosity Level (warn), Static Message ("Can not parse job id from {} "), Dynamic Variables (path, e)
Logging
Too much: performance overhead;
too many trivial logs
Too little: missing important information
Logs record important runtime info,
but with trade-offs
9. What is a Logging Statement & Trade-off of Logging
Deciding where to log is
challenging
“Logging and tracing is (IMO) a fine art, knowing what
to log and where takes experience.”
10. Where do developers log?
Studying where developers log and
providing recommendations
Can we leverage existing code to
recommend logging locations?
11. Source Code Logging statements
(with surrounding code)
Our process of studying and
providing logging suggestions
13. Source Code Logging statements
(with surrounding code)
Manual study on
logging code
Our process of studying and
providing logging suggestions
14. Manual study to
understand their
characteristics
Randomly sample 375
out of 14.9K logging
statements and their
surrounding code
Manually studying logging code
and its location
15. Manually studying logging code
and its location
We uncover 6 categories of logging
locations, and the relationship between
logging statements and code
16. Category 1: Exception information logging in catch blocks
Categories of logging locations
Semantic information
Syntactic
information
The logging statements often record messages
or execution info related to the prior try block.
17. Categories of logging locations
Category 2: Execution state logging in branch blocks
Semantic information
Syntactic
information
Logging statements often record execution
states in different branches.
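As a hedged illustration of Categories 1 and 2, the sketch below logs exception information in a catch block and execution state in each branch. The class, method names, and messages are hypothetical and are not taken from the studied systems; it uses `java.util.logging` only so that it stays self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example; names and messages are illustrative assumptions.
class LoggingLocations {
    private static final Logger LOG = Logger.getLogger(LoggingLocations.class.getName());

    // Category 1: the catch block logs information about the failed try block.
    static int parsePort(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            LOG.log(Level.WARNING, "Cannot parse port from \"" + raw + "\", using " + fallback, e);
            return fallback;
        }
    }

    // Category 2: each branch logs the execution state it represents.
    static String chooseMode(int port) {
        if (port < 1024) {
            LOG.info("Port " + port + " is privileged; starting in restricted mode");
            return "restricted";
        } else {
            LOG.info("Port " + port + " is unprivileged; starting in normal mode");
            return "normal";
        }
    }
}
```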
18. Categories of logging locations
Category 3: Logging the beginning/end of a method block
(method execution)
public void removeJob(JobID jobId){
    ...
    // the end of the method
    log.info("Removed jobId {} from Zookeeper", jobId);
}
Logging statements often record the
beginning or end of method execution.
Related to the semantics
of the method
19. Source Code Logging statements
(with surrounding code)
Manual study on
logging code
Our process of studying and
providing logging suggestions
Extracting Code Block
Feature
(Syntactic, Semantic,
Fusion)
21. Extracting code block features
if(Strings.isEmpty(sessionID)) {
    handleError();
    return;
}
Syntactic: IfStatement, MethodInvocation, MethodInvocation, ReturnStatement
Semantic: string, is, empti, session, id, handle, error
Fusion: IfStatement, MethodInvocation, string, is, empty, session, id, MethodInvocation, handle, error, ReturnStatement
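The semantic features above can be approximated with a minimal sketch, assuming they come from splitting identifiers on camelCase boundaries and lowercasing the parts. The class `SemanticTokens`, its keyword list, and its regular expressions are illustrative assumptions, not the authors' implementation. The sketch does no stemming, so it yields "strings" and "empty" where the slide shows "string" and "empti". Syntactic features would need a Java parser to obtain AST node types and are not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates semantic tokens of a code block.
class SemanticTokens {
    // Small illustrative subset of Java keywords to skip (not exhaustive).
    private static final Set<String> KEYWORDS = Set.of(
        "if", "else", "return", "for", "while", "try", "catch", "new", "throw");

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Boundaries: lowercase/digit before uppercase, or uppercase before an
    // uppercase letter that starts a new word (e.g. "HTTPServer").
    private static final Pattern CAMEL_BOUNDARY =
        Pattern.compile("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    static List<String> extract(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(code);
        while (m.find()) {
            String identifier = m.group();
            if (KEYWORDS.contains(identifier)) {
                continue; // keywords belong to the syntactic view, not the semantic one
            }
            for (String part : CAMEL_BOUNDARY.split(identifier)) {
                if (!part.isEmpty()) {
                    tokens.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }
}
```

Calling `SemanticTokens.extract` on the `if` block from the slide returns the words strings, is, empty, session, id, handle, error, in that order.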
22. Source Code Logging statements
(with surrounding code)
Manual study on
logging code
Our process of studying and
providing logging suggestions
Extracting Code Block
Feature
(Syntactic, Semantic,
Fusion)
Deep Learning
Framework
23. Our process of studying and
providing logging suggestions
[Diagram: Source Code → Code Block Features → Word Embedding Layer → RNN Layer (LSTM, a chain of RNN cells) → Output Layer]
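For reference, the RNN layer can be written out using the standard LSTM cell equations. This is the textbook formulation, given as an assumption about the architecture; the slides do not state which LSTM variant or output function is used. Here $x_t$ is the embedding of the $t$-th feature token, $h_t$ the hidden state, $c_t$ the cell state, and the final sigmoid output for a logged vs. non-logged decision is likewise an assumption.

```latex
\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \sigma(w^\top h_T + b) % assumed binary output: logged vs. non-logged
\end{align}
```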
24. Source Code Logging statements
(with surrounding code)
Manual study on
logging code
Our process of studying and
providing logging suggestions
Extracting Code Block
Feature
(Syntactic, Semantic,
Fusion)
Deep Learning
Framework
Suggestion Results
(logged vs. non-logged block)
25. Research Questions
RQ1 RQ2
How effective are different block
features when suggesting logging
locations?
Are the trained models
transferable to other systems?
Evaluation of our logging location
suggestion models
27. • For each system, we use 60% for training, 20%
for validation, and 20% for testing.
• Compute balanced accuracy for evaluation
• How well can the model suggest logged and
non-logged code blocks
Process and metrics for DL model
evaluation
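The evaluation setup above can be sketched as follows, assuming a random shuffle before the 60/20/20 split (the slides do not say how the split is drawn) and the usual definition of balanced accuracy as the mean of the true positive rate and the true negative rate. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the split and the metric; not the authors' code.
class EvaluationSketch {
    // Returns [train, validation, test] holding 60%, 20%, and 20% of the blocks.
    static <T> List<List<T>> split(List<T> blocks, long seed) {
        List<T> shuffled = new ArrayList<>(blocks);
        Collections.shuffle(shuffled, new Random(seed)); // assumption: random split
        int n = shuffled.size();
        int trainEnd = n * 60 / 100; // integer arithmetic avoids rounding surprises
        int valEnd = n * 80 / 100;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.add(new ArrayList<>(shuffled.subList(trainEnd, valEnd)));
        parts.add(new ArrayList<>(shuffled.subList(valEnd, n)));
        return parts;
    }

    // Balanced accuracy = (TP / (TP + FN) + TN / (TN + FP)) / 2.
    static double balancedAccuracy(long tp, long tn, long fp, long fn) {
        if (tp + fn == 0 || tn + fp == 0) {
            throw new IllegalArgumentException("both classes must be present");
        }
        double truePositiveRate = (double) tp / (tp + fn);
        double trueNegativeRate = (double) tn / (tn + fp);
        return (truePositiveRate + trueNegativeRate) / 2.0;
    }
}
```

Balanced accuracy is a reasonable fit when non-logged blocks far outnumber logged ones, because a model that always answers "non-logged" scores only 0.5 rather than a misleadingly high plain accuracy.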
28. Balanced Accuracy of different block features
[Bar chart: balanced accuracy, y-axis from 50 to 90, for Try-Catch, Branching, Looping, and Method blocks, comparing Syntactic, Semantic, and Fusion features. Value labels shown: 85.8, 77.4, 69.0, 63.2]
Process and metrics for DL model
evaluation
Models trained using syntactic features
achieve the best results.
29. Studying the overlap among the results
using three different features
[Figure: overlap among the three features for True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN)]
High overlap in TNs shows that non-logged code has distinct characteristics
that are captured by all features. Syntactic features have the fewest FNs.
20.1% of the TPs are missed by syntactic features but captured by the two other
block features. There are only small overlaps in FPs among the features.
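An overlap analysis of this kind can be sketched with plain set operations. The example assumes each feature type yields a set of block identifiers it classified correctly as logged, and that a share is taken over the union of the three sets; the slides do not state the denominator, so that choice is an assumption, as are all names below.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of comparing true positives across three feature types.
class OverlapSketch {
    // Blocks found by both `second` and `third` but missed by `first`.
    static <T> Set<T> missedByFirstCaughtByBoth(Set<T> first, Set<T> second, Set<T> third) {
        Set<T> result = new HashSet<>(second);
        result.retainAll(third);  // keep what both other features found
        result.removeAll(first);  // drop what the first feature also found
        return result;
    }

    // Size of `subset` relative to the union of the three sets (assumed denominator).
    static <T> double shareOfUnion(Set<T> subset, Set<T> a, Set<T> b, Set<T> c) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        union.addAll(c);
        return union.isEmpty() ? 0.0 : (double) subset.size() / union.size();
    }
}
```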
30. Manually Studying FPs and FNs
Some misclassifications may
actually be correct
We further manually study a sample of the False Positives and
False Negatives in our suggestion results.
We find that a large portion of the
FPs and FNs may be considered TPs and TNs.
An example of an FP: the object state is saved to a JSON file instead
of a log file.
The actual performance of our model may be
even better due to the diverse nature of how
developers write logging code.
31. Research Questions
RQ1 RQ2
How effective are different block
features when suggesting logging
locations?
Are the trained models
transferable to other systems?
Evaluation of our logging location
suggestion models
We suggest logging
locations with reasonable
accuracy. Different features
capture different logging
info in the code.
33. Training a model using
syntactic features
Training cross-system models
Apply the models on
other systems
34. Balanced Accuracy for cross-system suggestions
RQ2: Are the trained models transferable to other systems?
[Bar chart: balanced accuracy, y-axis from 10 to 90, Within vs. Cross, per system (axis labels recovered: Cassandra, Flink, Kafka, Zookeeper). Ratio labels shown: 81.7%, 80.0%, 84.6%, 83.9%, 88.4%, 80.1%, 91.7%]
The percentage is the ratio of Cross against Within.
Although decreased, cross-system suggestion still achieves
reasonable performance compared to within-system suggestion.
Results of cross-system suggestion
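The Cross-against-Within ratio is a simple quotient of the two balanced accuracies for a system. A minimal sketch, with a hypothetical class name and with inputs that are placeholders rather than values from the study:

```java
// Hypothetical helper for the Cross / Within ratio reported as a percentage.
class CrossWithinRatio {
    static double ratioPercent(double crossBalancedAccuracy, double withinBalancedAccuracy) {
        if (withinBalancedAccuracy <= 0.0) {
            throw new IllegalArgumentException("within-system balanced accuracy must be positive");
        }
        return 100.0 * crossBalancedAccuracy / withinBalancedAccuracy;
    }
}
```

For example, a cross-system balanced accuracy of 60 against a within-system value of 80 gives a ratio of 75%.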
35. Research Questions
RQ1 RQ2
How effective are different block
features when suggesting logging
locations?
Are the trained models
transferable to other systems?
Evaluation of our logging location
suggestion models
We suggest logging
locations with reasonable
accuracy. Different features
capture different logging
info in the code.
Different systems may
share a similar implicit
logging guideline.