Explainable Artificial Intelligence (XAI) to Predict and Explain Future Software Defects
Dr. Chakkrit (Kla) Tantithamthavorn
Monash University, Melbourne, Australia.
chakkrit@monash.edu
@klainfo | http://chakkrit.com
Software bugs globally cost $2.84 trillion.
https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2018-report/The-Cost-of-Poor-Quality-Software-in-the-US-2018-Report.pdf
A failure to eliminate defects in safety-critical systems can result in serious injury, loss of life, and disasters.
https://news.microsoft.com/en-au/features/direct-costs-associated-with-cybersecurity-incidents-costs-australian-businesses-29-billion-per-annum/
Cybersecurity incidents cost $59.5 billion annually in the US and $29 billion annually in Australia.
Software evolves extremely fast:
50% of Google's code base changes every month.
Windows 8 involved 100K+ code changes.
Software is written in multiple languages, by many people, over a long period of time, in order to fix bugs, add new features, and improve code quality every day.
And software is released faster, at massive scale (e.g., every 6 months, or even every 6 weeks).
How to find bugs?
Use unit testing to test functional correctness, but manual testing for all files is time-consuming.
Use static analysis tools to check code quality.
Use code review to find bugs and check code quality.
Use CI/CD to automatically build, test, and merge with confidence.
Others: UI testing, fuzzing, load/performance testing, etc.
QA activities take too much time (~50% of a project)
• Large and complex code base: 1 billion lines of code
• > 10K developers in 40+ office locations
• 5K+ projects under active development
• 17K code reviews per day
• 100 million test cases run per day
Given limited time, how can we effectively prioritise QA resources on the riskiest program elements?
Google's rule: All changes must be reviewed*
https://www.codegrip.tech/productivity/what-is-googles-internal-code-review-process/
https://eclipsecon.org/2013/sites/eclipsecon.org.2013/files/2013-03-24%20Continuous%20Integration%20at%20Google%20Scale.pdf
Within 6 months, 1K developers perform 80K+ code reviews (~77 reviews per person) for 30K+ code changes in one release.
Software Analytics = Software Data + Data Analytics
26 million developers
57 million repositories
100 million pull requests + code reviews + CI logs + test logs + Docker config files + others
WHY DO WE NEED SOFTWARE ANALYTICS?
To make informed decisions, glean actionable insights, and build empirical theories
PROCESS IMPROVEMENT
How do code review practices and rapid releases impact software quality?
PRODUCTIVITY IMPROVEMENT
How do continuous integration practices impact team productivity?
QUALITY IMPROVEMENT
Why do programs crash?
How to prevent bugs in the future?
EMPIRICAL THEORY BUILDING
A Theory of Software Quality
A Theory of Effort/Cost Estimation
Beyond predicting defects
AI/ML IS SHAPING
SOFTWARE ENGINEERING
IMPROVE SOFTWARE QUALITY
Predict defects, vulnerabilities, malware
Generate test cases
IMPROVE PRODUCTIVITY
Generate UI/requirements/code/comments
Predict developer/team productivity
Recommend developers/reviewers
Identify developer turnover
AI/ML MODELS FOR SOFTWARE DEFECTS
Focus on predicting and explaining future software defects, and on building empirical theories:
Predicting future software defects so practitioners can effectively optimize limited resources.
Building empirically-grounded theories of software quality.
Explaining what makes software fail so managers can develop the most effective improvement plans.
ANALYTICAL MODELLING FRAMEWORK
MAME: Mining, Analyzing, Modelling, Explaining
[Figure: the MAME pipeline — Raw Data → (Mining) → Clean Data → (Analyzing) → Correlation → (Modelling) → Analytical Models → (Explaining) → Knowledge]
MINING SOFTWARE DEFECTS
STEP 1: EXTRACT DATA
[Figure: raw data comes from two sources — an Issue Tracking System (ITS), which provides issue reports, and a Version Control System (VCS), which provides code changes, code snapshots, and commit logs.]
Reference: https://github.com/apache/lucene-solr/tree/662f8dd3423b3d56e9e1a197fe816393a33155e2
What are the source files in this release?
MINING SOFTWARE DEFECTS
STEP 2: COLLECT METRICS
Reference: https://github.com/apache/lucene-solr/commit/662f8dd3423b3d56e9e1a197fe816393a33155e2
How many lines were added or deleted?
Who edited this file?
CODE METRICS: Size, Code Complexity, Cognitive Complexity, OO Design (e.g., coupling, cohesion)
PROCESS METRICS: Development Practices (e.g., #commits, #dev, churn, #pre-release defects, change complexity)
HUMAN FACTORS: Code Ownership, #MajorDevelopers, #MinorDevelopers, Author Ownership, Developer Experience

MINING SOFTWARE DEFECTS
STEP 3: IDENTIFY DEFECTS
Reference: https://issues.apache.org/jira/browse/LUCENE-4128
Issue Reference ID
Bug / New Feature
Which releases are affected?
Which commits belong to this issue report?
Was this report created after the release of interest?
Which files were changed to fix the defect?
[Figure: linking issue reports to code changes produces a defect dataset.]
Check the "Mining Software Defects" paper [Yatish et al., ICSE 2019]
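The issue-commit linking step can be approximated with a simple heuristic: scan each commit message for a JIRA-style issue key (e.g., LUCENE-4128). A minimal sketch in R, assuming a hypothetical data frame commits with a message column (illustrative only, not the exact tooling behind the curated datasets):
# A minimal linking heuristic (hypothetical data, for illustration)
> commits <- data.frame(
    hash = c("662f8dd", "a1b2c3d"),
    message = c("LUCENE-4128: fix scoring overflow", "minor cleanup, no issue key"),
    stringsAsFactors = FALSE)
# Extract the first JIRA-style issue key from each commit message
> keys <- regmatches(commits$message, gregexpr("LUCENE-[0-9]+", commits$message))
> commits$issue <- vapply(keys, function(k) if (length(k) > 0) k[1] else NA_character_,
                          character(1))
# -> each commit now carries its linked issue key (or NA if unlinked)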
LABELLING SOFTWARE DEFECTS
[Figure: a timeline of changes and issues around Release 1.0 — commit C1 fixed issue ID=1 (v=1.0) in A.java; C2 fixed ID=2 (v=0.9) in B.java; C3 fixed ID=3 (v=1.0) in C.java; C4 fixed ID=4 (v=1.0) in D.java. ID indicates a defect report ID, C indicates a commit hash, and v indicates the affected release(s).]
Post-release defects are defined as modules that are fixed for a defect report that affected a release of interest.
FILE     LABEL
A.java   DEFECTIVE
B.java   CLEAN
C.java   DEFECTIVE
D.java   DEFECTIVE
Yatish et al., Mining Software Defects: Should We Consider Affected Releases?, In ICSE’19
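Once the links exist, the labelling rule is mechanical. A minimal sketch in R, assuming a hypothetical fixes table with one row per fixed file and the release its defect report affects:
# Hypothetical (file, affected release) pairs from linked fixing commits
> fixes <- data.frame(
    file = c("A.java", "B.java", "C.java", "D.java"),
    affected = c("1.0", "0.9", "1.0", "1.0"),
    stringsAsFactors = FALSE)
> release <- "1.0"
# A module is DEFECTIVE if it was fixed for a report affecting the release
> defective <- unique(fixes$file[fixes$affected == release])
> files <- unique(fixes$file)
> data.frame(file = files, label = ifelse(files %in% defective, "DEFECTIVE", "CLEAN"))
# -> reproduces the table above: A, C, and D are DEFECTIVE; B is CLEAN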
HIGHLY-CURATED DATASETS
32 releases that span across 9 open-source software systems
Name      %DefectiveRatio   KLOC
ActiveMQ  6%-15%            142-299
Camel     2%-18%            75-383
Derby     14%-33%           412-533
Groovy    3%-8%             74-90
HBase     20%-26%           246-534
Hive      8%-19%            287-563
JRuby     5%-18%            105-238
Lucene    3%-24%            101-342
Wicket    4%-7%             109-165
Each dataset has 65 software metrics
• 54 code metrics
• 5 process metrics
• 6 ownership metrics
https://awsm-research.github.io/Rnalytica/
Yatish et al., Mining Software Defects: Should We Consider Affected Releases?, In ICSE’19
ANALYTICAL MODELLING FRAMEWORK
MAME: Mining, Analyzing, Modelling, Explaining
[Figure: the MAME pipeline again — Raw Data → (Mining) → Clean Data → (Analyzing) → Correlation → (Modelling) → Analytical Models → (Explaining) → Knowledge]
SOFTWARE DEFECT MODELLING FRAMEWORK
Using well-established AI/ML learning algorithms
[Figure: training data and a learning algorithm produce a black-box model; for a file such as A.java, the model outputs a prediction, e.g., "A.java is likely to be defective (P=0.90)", so that developers can make an informed decision.]
Why is A.java defective?
Why is A.java defective rather than clean?
Why is file A.java defective, while file B.java is clean?
Article 22 of the European Union's General Data Protection Regulation:
"The use of data in decision-making that affects an individual or group requires an explanation for any decision made by an algorithm."
http://www.privacy-regulation.eu/en/22.htm
AI/ML BRINGS CONCERNS TO REGULATORS
FAT: Fairness, Accountability, and Transparency
What if AI-assisted productivity analytics tend to promote males more than females?
Do AI systems conform to regulation and legislation?
Do we understand how the machines work? Why do models make those predictions?
EXPLAINABLE ARTIFICIAL INTELLIGENCE (XAI)
A suite of AI/ML techniques that produce accurate predictions, while being able to explain those predictions
[Figure: training data and a learning algorithm produce a black-box model; the prediction (e.g., "A.java is likely to be defective (P=0.90)") is paired with an explainable interface, so the system provides an explanation that justifies its prediction to the user.]
EXPLAINING A BLACK-BOX MODEL
Model-Specific Techniques (e.g., ANOVA for Regression, Variable Importance for Random Forest)
[Figure: model-specific interpretation techniques (e.g., VarImp) explain a black-box model by identifying the most important features based on the training data — a global explanation; the model itself produces predictions (e.g., a prediction score of 90%) on unseen data.]
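As a concrete example of a model-specific global explanation, the permutation-based variable importance of a random forest can be read directly from the fitted model. A sketch, assuming the randomForest package and the data/indep/dep objects introduced in the hands-on tutorial later in this deck:
# A sketch of a global explanation via permutation importance
> library(randomForest)
> rf <- randomForest(x = data[, indep], y = data[, dep], importance = TRUE)
> importance(rf, type = 1)  # MeanDecreaseAccuracy for each metric
> varImpPlot(rf)            # visualize the most important metrics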
Explaining an individual prediction means knowing how each feature contributes to the final probability of that prediction.
EXPLAINING AN INDIVIDUAL PREDICTION
Model-Agnostic Techniques to Generate an Outcome Explanation
[Figure: on top of the black-box model and its global explanation, model-agnostic techniques take an individual prediction on unseen data (e.g., a prediction score of 90%) and generate an instance explanation for it.]
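LIME is one widely-used model-agnostic technique for instance explanations: it fits a simple local model around the instance being explained. A sketch, assuming the lime and caret packages (an illustration, not the exact tooling behind this deck):
# A sketch of a model-agnostic instance explanation with LIME
> library(caret)
> library(lime)
> model <- train(x = data[, indep], y = data[, dep], method = "rf")
> explainer <- lime(data[, indep], model)
# Explain the prediction for one file via its top-5 contributing metrics
> explanation <- explain(data[1, indep], explainer, n_labels = 1, n_features = 5)
> plot_features(explanation)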
WHY IS A.JAVA DEFECTIVE?
Explaining the importance of each metric that contributes to the final probability of each prediction
[Figure: a per-prediction break-down — the contributions of MAJOR_LINE = 2, ADEV = 12, CountDeclMethodPrivate = 6, CountDeclMethodPublic = 44, CountClassCoupled = 16, and the remaining 21 variables sum to the final prognosis of 0.832.]
#ActiveDevelopers contributed the most to the likelihood of this module being defective.
A Quality Improvement Plan: "A policy to maintain the maximum number of (two) developers who can edit a module in the past (six) months."
Software Analytics in Action
A Hands-on Tutorial on Analyzing and Modelling Software Data
Dr. Chakkrit (Kla) Tantithamthavorn
Monash University, Melbourne, Australia.
chakkrit@monash.edu
@klainfo | http://chakkrit.com
[Figure: the data analytics pipeline — (1) Data Collection from a software repository into a software dataset; (2) Data Cleaning and Filtration into a clean dataset; (3) Metrics Extraction and Normalization into a studied dataset (outcome, studied metrics, control metrics); (4) Descriptive Analytics, e.g., the (+/-) relationship of each metric to the outcome; data sampling into training and testing corpora; (7) Model Construction of a statistical model with classifier parameters; (8) Model Validation with performance measures, yielding predictions and performance estimates (predictive analytics); and (9) Model Analysis and Interpretation, yielding importance scores and patterns (prescriptive analytics).]
Challenges of the Data Analytics Pipeline
How to clean data? How to collect ground truths?
Should we rebalance the data? Should we apply feature reduction?
Are features correlated? Which ML technique is best?
Which model validation technique should I use?
What is the benefit of optimising ML parameters?
How to analyse or explain the ML models?
What is the best data analytics pipeline for software defects?
ANALYZING AND MODELLING SOFTWARE DEFECTS
Mining Software Data: Affected Releases [ICSE'19], Issue Reports [ICSE'15]
Analyzing Software Data: Control Features [ICSE-SEIP'18], Feature Selection [ICSME'18], Correlation Analysis [TSE'19]
Modelling Software Data: Class Imbalance [TSE'19], Parameters [ICSE'16, TSE'18], Model Validation [TSE'17], Measures [ICSE-SEIP'18]
Explaining Software Data: Model Statistics [ICSE-SEIP'18], Interpretation [TSE'19]
Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP'18
MSR'19 Education
RUN JUPYTER + R ANYTIME AND ANYWHERE
http://github.com/awsm-research/tutorial
Shift + Enter to run a cell
EXAMPLE DATASET
# Load a defect dataset
> source("import.R")
> eclipse <- loadDefectDataset("eclipse-2.0")
> data <- eclipse$data
> indep <- eclipse$indep
> dep <- eclipse$dep
> data[,dep] <- factor(data[,dep])
6,729 files, 32 metrics, 14% defective ratio
Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP’18, pages 286-295.
# Understand your dataset (describe() comes from the Hmisc package)
> describe(data)
data
33 Variables    6729 Observations
-------------------------------------------------
CC_sum
       n  missing  distinct  Mean
    6729        0       268  26.9
lowest : 0 1, highest: 1052 1299
-------------------------------------------------
post
       n  missing  distinct
    6729        0         2
Value      FALSE  TRUE
Frequency   5754   975
Proportion 0.855 0.145
[Zimmermann et al., PROMISE'07]
Is program complexity associated with software quality?
BUILDING A THEORY OF SOFTWARE QUALITY
# Develop a logistic regression model
> m <- glm(post ~ CC_max, data = data, family = "binomial")
# Print a model summary
> summary(m)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.490129   0.051777  -48.09   <2e-16 ***
CC_max       0.104319   0.004819   21.65   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
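For interpretation, the coefficient on the logit scale can be converted into an odds ratio (a short follow-up on the model above):
# exp() of a logistic coefficient gives an odds ratio: each additional
# unit of CC_max multiplies the odds of a post-release defect by ~1.11
> exp(coef(m)["CC_max"])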
INTRO: BASIC REGRESSION ANALYSIS
Theoretical Assumptions
1. Binary dependent variable and ordinal independent variables
2. Observations are independent
3. No (multi-)collinearity among independent variables
4. Assume a linear relationship between the logit of the outcome
and each variable
# Visualize the relationship of the studied variable
> install.packages("effects")
> library(effects)
> plot(allEffects(m))
[Figure: CC_max effect plot — the predicted probability of post-release defects (post, 0.2-0.8) rises as CC_max increases from 0 to 300.]
Which factors share the strongest association with software quality?
BUILDING A THEORY OF SOFTWARE QUALITY
BEST PRACTICES FOR ANALYTICAL MODELLING
7 DOs and 3 DON'Ts
DOs: (1) Include control features; (2) Remove correlated features; (3) Build interpretable models; (4) Explore different settings; (5) Use out-of-sample bootstrap; (6) Summarize by a Scott-Knott test; (7) Visualize the relationship.
DON'Ts: (1) Don't use ANOVA Type-I; (2) Don't optimize probability thresholds; (3) Don't solely use the F-measure.
STEP 1: INCLUDE CONTROL FEATURES
Control features are features that are not of interest even though they could affect the outcome of a model (e.g., lines of code when modelling defects).
[Figure: three groups of metrics associated with software defects — code metrics (size, OO design, e.g., coupling and cohesion, program complexity); process metrics (#commits, #dev, churn, #pre-release defects, change complexity); and human factors (code ownership, #MinorDevelopers, experience).]
Principles of designing factors:
1. Easy/simple measurement
2. Explainable and actionable
3. Support decision making
Tantithamthavorn et al., An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. ICSE-SEIP'18
STEP 1: INCLUDE CONTROL FEATURES
The risks of not including control features (e.g., lines of code)

# post ~ CC_max + PAR_max + FOUT_max
> m1 <- glm(post ~ CC_max + PAR_max + FOUT_max, data = data, family = "binomial")
> anova(m1)
Analysis of Deviance Table
Model: binomial, link: logit
Response: post
Terms added sequentially (first to last)
         Df Deviance Resid. Df Resid. Dev
NULL                      6728     5568.3
CC_max    1   600.98      6727     4967.3
PAR_max   1   131.45      6726     4835.8
FOUT_max  1    60.21      6725     4775.6
Complexity is the top rank.

# post ~ TLOC + CC_max + PAR_max + FOUT_max
> m2 <- glm(post ~ TLOC + CC_max + PAR_max + FOUT_max, data = data, family = "binomial")
> anova(m2)
Analysis of Deviance Table
Model: binomial, link: logit
Response: post
Terms added sequentially (first to last)
         Df Deviance Resid. Df Resid. Dev
NULL                      6728     5568.3
TLOC      1   709.19      6727     4859.1
CC_max    1    74.56      6726     4784.5
PAR_max   1    63.35      6725     4721.2
FOUT_max  1    17.41      6724     4703.8
Lines of code is the top rank.

Conclusions may change when including control features.
STEP 2: REMOVE CORRELATED FEATURES
The state of practice in software engineering:
"82% of SE datasets have correlated features"
"63% of SE studies do not mitigate correlated features"
Why? Most metrics are aggregated.
Collinearity is a phenomenon in which one feature can be linearly predicted by another feature.
Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
STEP 2: REMOVE CORRELATED FEATURES
The risks of not removing correlated factors

Model1: post ~ CC_max + CC_avg + PAR_max + FOUT_max
Model2: post ~ CC_avg + CC_max + PAR_max + FOUT_max
CC_max is highly correlated with CC_avg.

           Model 1   Model 2
CC_max          74        19
CC_avg           2        58
PAR_max         16        16
FOUT_max         7         7

The values indicate the contribution of each factor to the model (from an ANOVA analysis). Conclusions may change when reordering the correlated features.
Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
STEP 2: REMOVE CORRELATED FACTORS
Using Spearman's correlation analysis to detect collinearity
# Visualize Spearman's correlation for all metrics using hierarchical clustering
> library(rms)
> plot(varclus(as.matrix(data[,indep]), similarity="spear", trans="abs"))
> abline(h=0.3, col="red")
[Figure: a variable-clustering dendrogram (Spearman ρ, 0.2-1.0) over all metrics; the avg/max/sum aggregations (e.g., NSM_*, NSF_*, PAR_*, FOUT_*, MLOC_*, NBD_*, CC_*, NOF_*, NOM_*, plus TLOC) cluster into 7 groups above the 0.3 line.]
[Figure: the same dendrogram, annotated — clusters above the 0.3 threshold are groups of correlated metrics; the remaining metrics are non-correlated.]
Using domain knowledge, manually select one metric per group. After mitigating correlated metrics, we should have 9 factors (7 group representatives + 2 non-correlated metrics).
How to automatically mitigate (multi-)collinearity?
AutoSpearman (1) removes constant factors, and (2) selects the one factor in each group that shares the least correlation with the factors outside that group.
Jiarpakdee et al.: AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models. ICSME'18
STEP 2: REMOVE CORRELATED FACTORS
# Run AutoSpearman
> library(Rnalytica)
> filterindep <- AutoSpearman(data, indep)
> plot(varclus(as.matrix(data[, filterindep]), similarity="spear", trans="abs"))
> abline(h=0.3, col="red")
[Figure: after AutoSpearman, the 9 selected metrics (NSF_avg, NSM_avg, PAR_avg, pre, NBD_avg, NOT, ACD, NOF_avg, NOM_avg) all cluster below the 0.3 correlation threshold.]
Jiarpakdee et al.: AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models. ICSME'18
STEP 3: BUILD AND EXPLAIN DECISION TREES
R implementation of a decision-tree model (C5.0)
# Build a C5.0 tree-based model (from the C50 package)
> library(C50)
> tree.model <- C5.0(x = data[,indep], y = data[,dep])
> summary(tree.model)
Read 6,729 cases (10 attributes) from undefined.data
Decision tree:
pre <= 1:
:...NOM_avg <= 17.5: FALSE (4924/342)
:   NOM_avg > 17.5:
:   :...NBD_avg > 1.971831:
:       :...ACD <= 2: TRUE (51/14)
:       :   ACD > 2: FALSE (5)
...
Tantithamthavorn et al: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16
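Once fitted, the tree can score new modules; a short usage sketch with the C50 predict interface:
# Score the first three files: predicted classes and class probabilities
> predict(tree.model, newdata = data[1:3, indep])
> predict(tree.model, newdata = data[1:3, indep], type = "prob")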
# Plot the decision tree-based model
> plot(tree.model)
[Figure: the fitted C5.0 tree — the root splits on pre (#pre-release defects), followed by splits on NOM_avg, NBD_avg, PAR_avg, ACD, NOF_avg, NSF_avg, NSM_avg, and pre again; each leaf shows the number of modules it covers and the proportion classified TRUE (defective) versus FALSE (clean).]
STEP 3: BUILD AND EXPLAIN RULE MODELS
R implementation of a rule-based model (C5.0)
# Build a C5.0 rule-based model
> rule.model <- C5.0(x = data[, indep], y = data[,dep], rules = TRUE)
> summary(rule.model)
Rules:
Rule 1: (2910/133, lift 1.1)
    pre <= 6
    NBD_avg <= 1.16129
    -> class FALSE [0.954]
Rule 2: (3680/217, lift 1.1)
    pre <= 2
    NOM_avg <= 6.5
    -> class FALSE [0.941]
Rule 3: (4676/316, lift 1.1)
    pre <= 1
    NBD_avg <= 1.971831
    NOM_avg <= 64
    -> class FALSE [0.932]
...
Rule 13: (56/19, lift 4.5)
    pre <= 1
    NBD_avg > 1.971831
    NOM_avg > 17.5
    -> class TRUE [0.655]
Rule 14: (199/70, lift 4.5)
    pre > 1
    NBD_avg > 1.012195
    NOM_avg > 23.5
    -> class TRUE [0.647]
Rule 15: (45/16, lift 4.4)
    pre > 2
    pre <= 6
    NBD_avg > 1.012195
    PAR_avg > 1.75
    -> class TRUE [0.638]
Tantithamthavorn et al: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16
STEP 3: BUILD AND EXPLAIN RANDOM FOREST MODELS
R implementation of a random forest model
# Build a random forest model (from the randomForest package)
> library(randomForest)
> f <- as.formula(paste("RealBug", '~', paste(indep, collapse = "+")))
> rf.model <- randomForest(f, data = data, importance = TRUE)
> print(rf.model)
Call:
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1
        OOB estimate of error rate: 12.3%
Confusion matrix:
      FALSE TRUE class.error
FALSE   567   42  0.06896552
TRUE     57  139  0.29081633
# Plot the variable importance of the random forest model
> plot(rf.model)
[Figure: variable importance by MeanDecreaseAccuracy (roughly 10-70) — pre, NOM_avg, NBD_avg, PAR_avg, NOF_avg, NSM_avg, NSF_avg, ACD, and NOT, from most to least important.]
STEP 4: EXPLORE DIFFERENT SETTINGS
The risks of using default parameter settings
87% of the widely-used classification techniques require at least one parameter setting [ICSE'16], e.g., #trees for random forest, #clusters for k-nearest neighbors, #hidden layers for neural networks.
"80% of the top-50 highly-cited defect studies rely on a default setting" [IST'16]
Tantithamthavorn et al: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16
Fu et al. Tuning for software analytics: Is it really necessary? IST'16
STEP 4: EXPLORE DIFFERENT SETTINGS
[Figure: the tuning workflow — generate training and testing samples from the dataset, then build models with different settings via random search or differential evolution. The AUC (0.5-0.9) improves markedly for C5.0 (1 trial vs 100 trials) and modestly for random forest (10 vs 100 trees), relative to GLM.]
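Random search and differential evolution are two options; as a simpler illustration, a grid search with the caret package explores candidate settings under resampling (a sketch under that assumption, not the deck's exact tooling):
# A grid search over random forest's mtry parameter (caret package).
# caret needs outcome levels that are valid R names when computing
# class probabilities, hence the relabelling below.
> library(caret)
> y <- factor(ifelse(data[, dep] == "TRUE", "defective", "clean"))
> ctrl <- trainControl(method = "boot", number = 25,
                       classProbs = TRUE, summaryFunction = twoClassSummary)
> tuned <- train(x = data[, indep], y = y, method = "rf", metric = "ROC",
                 tuneGrid = expand.grid(mtry = c(1, 3, 5, 9)),
                 trControl = ctrl)
> tuned$bestTune  # the best-performing setting under the bootstrap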
STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP
To estimate how well a model will perform on unseen data
Holdout Validation (e.g., 70% training / 30% testing): 50% holdout, 70% holdout, repeated 50% holdout, repeated 70% holdout.
k-Fold Cross-Validation (repeat k times): leave-one-out CV, 2-fold CV, 10-fold CV, repeated 10-fold CV.
Bootstrap Validation (repeat N times): ordinary bootstrap, optimism-reduced bootstrap, out-of-sample bootstrap, .632 bootstrap.
Tantithamthavorn et al: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17
STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP
R implementation of out-of-sample bootstrap and 10-fold cross-validation
Tantithamthavorn et al: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17

# Out-of-sample bootstrap validation
> for(i in seq(1,100)){
    set.seed(1234+i)
    indices <- sample(nrow(data), replace=TRUE)
    training <- data[indices,]
    testing <- data[-indices,]
    …
  }

# 10-fold cross-validation (createFolds is from the caret package)
> indices <- createFolds(data[, dep], k = 10, list = TRUE, returnTrain = TRUE)
> for(i in seq(1,10)){
    training <- data[indices[[i]],]
    testing <- data[-indices[[i]],]
    …
  }
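The elided loop bodies typically fit a model on the training sample and evaluate it on the testing sample; one possible body, sketched with the pROC package and a model formula f as used elsewhere in this tutorial:
# Inside each iteration: fit on training, compute AUC on testing (a sketch)
> library(pROC)
> m <- glm(f, data = training, family = "binomial")
> prob <- predict(m, newdata = testing, type = "response")
> auc(roc(testing[, dep], prob))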
[Figure: AUC distributions over 100 repetitions — the out-of-sample bootstrap yields higher and more stable estimates (≈0.78-0.84) than 10x10-fold CV.]
More accurate and more stable performance estimates [TSE'17]

STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP
The risks of using 10-fold CV on small datasets
With 100 modules and a 5% defective rate, 10-fold cross-validation splits the data into folds of 10 modules each (Fold 1, ..., Fold 10), so there is a high chance that a testing sample does not contain any defective modules.
Out-of-sample bootstrap: the training set is a bootstrap sample (a sample drawn with replacement, of the same size as the original sample); the testing set is the ~36.8% of modules that do not appear in the bootstrap sample. A bootstrap sample is nearly representative of the original dataset.
Tantithamthavorn et al: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17
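The ~36.8% follows from the chance that a given module is never drawn in n draws with replacement, (1 - 1/n)^n, which tends to 1/e ≈ 0.368; a one-line check in R:
# Fraction of modules left out of one bootstrap sample (≈ 1/e ≈ 0.368)
> n <- 10000
> mean(!(seq_len(n) %in% sample(n, replace = TRUE)))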
STEP 6: SUMMARIZE BY A SCOTT-KNOTT ESD TEST
To statistically determine the ranks of the most significant metrics
# Run a ScottKnott-ESD test (sk_esd is from the ScottKnottESD package;
# Anova is from the car package)
> importance <- NULL
> indep <- AutoSpearman(data, eclipse$indep)
> f <- as.formula(paste("post", '~', paste(indep, collapse = "+")))
> for(i in seq(1,100)){
    indices <- sample(nrow(data), replace=TRUE)
    training <- data[indices,]
    m <- glm(f, data = training, family="binomial")
    importance <- rbind(importance, Anova(m, type="2", test="LR")$"LR Chisq")
  }
> importance <- data.frame(importance)
> colnames(importance) <- indep
> sk_esd(importance)
Groups:
    pre NOM_avg NBD_avg     ACD NSF_avg PAR_avg     NOT NSM_avg NOF_avg
      1       2       3       4       5       6       7       7       8
[Figure: box plots of the LR Chi-square importance scores (0-300) across the 100 bootstrap samples, grouped by Scott-Knott ESD rank — pre (1), NOM_avg (2), NBD_avg (3), ACD (4), NSF_avg (5), PAR_avg (6), NOT and NSM_avg (7), NOF_avg (8).]
Each rank has a statistically significant difference with a non-negligible effect size [TSE'17].
STEP 7: VISUALIZE THE RELATIONSHIP
To understand the relationship between the studied metric and the outcome
# Visualize the relationship of the studied variable
> library(effects)
> indep <- AutoSpearman(data, eclipse$indep)
> f <- as.formula(paste("post", '~', paste(indep, collapse = "+")))
> m <- glm(f, data = data, family="binomial")
> plot(effect("pre", m))
[Figure: pre effect plot — the predicted probability of post-release defects (post, 0.2-0.8) increases as pre (#pre-release defects) grows from 0 to 70.]
FIRST, DON'T USE ANOVA TYPE-I
To measure the significance/contribution of each metric to the model
# ANOVA Type-I (R's built-in anova() is sequential, i.e., Type-I;
# here m is assumed to be glm(post ~ NSF_max + NSM_max + NOF_max + ACD, ...))
> anova(m)
         Df Deviance Resid. Df Resid. Dev
NULL                      6728     5568.3
NSF_max   1   45.151      6727     5523.1
NSM_max   1   17.178      6726     5505.9
NOF_max   1   50.545      6725     5455.4
ACD       1   43.386      6724     5412.0
ANOVA Type-I measures the improvement in the Residual Sum of Squares (RSS), i.e., the unexplained variance, as each metric is sequentially added into the model:
RSS(post ~ NSF_max) - RSS(post ~ 1) = 45.151
RSS(post ~ NSF_max + NSM_max) - RSS(post ~ NSF_max) = 17.178
Jiarpakdee et al: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
# ANOVA Type-II (Anova() is from the car package)
> Anova(m)
Analysis of Deviance Table (Type II tests)
Response: post
        LR Chisq Df Pr(>Chisq)
NSF_max   10.069  1   0.001508 **
NSM_max   17.756  1  2.511e-05 ***
NOF_max   21.067  1  4.435e-06 ***
ACD       43.386  1  4.493e-11 ***
ANOVA Type-II measures the improvement in the Residual Sum of Squares (RSS), i.e., the unexplained variance, when the metric under examination is added to the model after all the other metrics: RSS(post ~ all except the studied metric) - RSS(post ~ all metrics). For example:
glm(post ~ X2 + X3 + X4, data=data)$deviance - glm(post ~ X1 + X2 + X3 + X4, data=data)$deviance
DON'T USE ANOVA TYPE-I
Instead, future studies must use ANOVA Type-II/III
Jiarpakdee et al: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19

Model1: post ~ NSF_max + NSM_max + NOF_max + ACD
Model2: post ~ NSM_max + ACD + NSF_max + NOF_max (reordered)

          Model 1           Model 2
          Type 1   Type 2   Type 1   Type 2
ACD          28%      47%      49%      47%
NOF_max      32%      23%      13%      23%
NSM_max      11%      19%      31%      19%
NSF_max      29%      11%       7%      11%

Type-I contributions change when the metrics are reordered; Type-II contributions do not.
DON'T SOLELY USE F-MEASURES
Other (domain-specific) practical measures should also be included.
Threshold-independent measures:
- Area Under the ROC Curve (AUC) = the ability to discriminate between the 2 outcomes.
Ranking measures:
- Precision@20%LOC = the precision when inspecting the top 20% of LOC.
- Recall@20%LOC = the recall when inspecting the top 20% of LOC.
- Initial False Alarm (IFA) = the number of false alarms before finding the first bug [Xia ICSME'17].
Effort-aware measures:
- Popt = an effort-based cumulative lift chart [Mende PROMISE'09].
- Inspection Effort = the amount of effort (LOC) required to find the first bug [Arisholm JSS'10].
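These ranking measures are straightforward to compute by hand; a sketch of Recall@20%LOC as a hypothetical helper (names are illustrative, not from the cited papers):
# Recall@20%LOC: rank modules by predicted risk, "inspect" from the top
# until 20% of the total LOC is covered, and report the fraction of all
# defective modules caught within that inspection budget
> recall_at_20loc <- function(prob, loc, actual) {
    ord <- order(prob, decreasing = TRUE)           # riskiest first
    inspected <- cumsum(loc[ord]) <= 0.2 * sum(loc) # 20% LOC budget
    sum(actual[ord][inspected]) / sum(actual)
  }
# e.g., recall_at_20loc(prob, data$TLOC, data[, dep] == "TRUE")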
DON'T SOLELY USE F-MEASURES
The risks of changing probability thresholds
[Figure: F-measure (0-0.7) at probability thresholds 0.2, 0.5, and 0.8 for C5.0 (1 trial and 100 trials), random forest (10 and 100 trees), and GLM — the ranking of techniques shifts as the threshold changes.]
Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP'18
DO NOT IMPLY CAUSATION
Avoid causal phrasings such as "Complexity is the root cause of software defects" or "Software defects are caused by high code complexity". Instead: "Complexity shares the strongest association with defect-proneness".
PH.D. SCHOLARSHIP
• Tuition Fee Waivers
• $28,000 Yearly Stipend
• Travel Funding
• A University-Selected Laptop (e.g., MacBook Pro)
• Access to HPC/GPU clusters (4,112 CPU cores, 168 GPU co-processors, 3PB storage) + NVIDIA DGX1-V
7 Skills You Will Develop:
1. Written Communication Skills
2. Research
3. Public Speaking
4. Project Management
5. Leadership
6. Critical Thinking Skills
7. Team Collaboration

 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Software Defects

  • 11. AI/ML MODELS FOR SOFTWARE DEFECTS. Focus on predicting and explaining future software defects, and on building empirical theories: (1) predicting future software defects so practitioners can effectively optimise limited resources; (2) explaining what makes software fail so managers can develop the most effective improvement plans; (3) building empirically-grounded theories of software quality.
  • 12. ANALYTICAL MODELLING FRAMEWORK (MAME): Mining, Analyzing, Modelling, Explaining. [Pipeline figure: Raw Data → Mining → Clean Data → Analyzing → Correlations → Modelling → Analytical Models → Explaining → Knowledge]
  • 13. MINING SOFTWARE DEFECTS. STEP 1: EXTRACT DATA. [Figure: raw data is extracted from two sources: the Issue Tracking System (ITS), which yields issue reports, and the Version Control System (VCS), which yields code changes, code snapshots, and the commit log]
  • 14. MINING SOFTWARE DEFECTS. STEP 2: COLLECT METRICS. What are the source files in this release? Reference: https://github.com/apache/lucene-solr/tree/662f8dd3423b3d56e9e1a197fe816393a33155e2
  • 15. MINING SOFTWARE DEFECTS. STEP 2: COLLECT METRICS. How many lines are added or deleted? Who edited this file? Reference: https://github.com/apache/lucene-solr/commit/662f8dd3423b3d56e9e1a197fe816393a33155e2
  • 16. MINING SOFTWARE DEFECTS. STEP 2: COLLECT METRICS. CODE METRICS: size, code complexity, cognitive complexity, OO design (e.g., coupling, cohesion). PROCESS METRICS: development practices (e.g., #commits, #dev, churn, #pre-release defects, change complexity). HUMAN FACTORS: code ownership, #MajorDevelopers, #MinorDevelopers, author ownership, developer experience.
  • 17. MINING SOFTWARE DEFECTS. STEP 3: IDENTIFY DEFECTS. From each issue report we extract: the issue reference ID; whether it is a bug or a new feature; which releases are affected; which commits belong to the issue report; and whether the report was created after the release of interest. Reference: https://issues.apache.org/jira/browse/LUCENE-4128
  • 18. MINING SOFTWARE DEFECTS. STEP 3: IDENTIFY DEFECTS. Linking issue reports to their fixing commits identifies which files were changed to fix each defect, producing the defect dataset. Check the "Mining Software Defects" paper [Yatish et al., ICSE 2019].
  • 19. LABELLING SOFTWARE DEFECTS. Post-release defects are defined as modules that are fixed for a defect report that affected the release of interest (ID indicates a defect report ID, C indicates a commit hash, v indicates the affected release(s)). Example for release 1.0: C1 fixed ID-1 (v=1.0) in A.java → DEFECTIVE; C2 fixed ID-2 (v=0.9) in B.java → CLEAN; C3 fixed ID-3 (v=1.0) in C.java → DEFECTIVE; C4 fixed ID-4 (v=1.0) in D.java → DEFECTIVE. Yatish et al., Mining Software Defects: Should We Consider Affected Releases?, In ICSE'19
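To make the labelling rule concrete, here is a minimal sketch in R; the fixes data frame is hypothetical toy data that mirrors the example above.

# Minimal sketch of release-aware defect labelling (hypothetical toy data).
# Each row links a defect-fixing commit to the file it changed and to the
# release(s) that the fixed defect report affected.
fixes <- data.frame(
  file     = c("A.java", "B.java", "C.java", "D.java"),
  issue_id = c(1, 2, 3, 4),
  affected = c("1.0", "0.9", "1.0", "1.0")
)
release <- "1.0"
# A file is DEFECTIVE for the release of interest only if it was fixed for
# a defect report that affected that release.
fixes$label <- ifelse(fixes$affected == release, "DEFECTIVE", "CLEAN")
fixes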
  • 20. HIGHLY-CURATED DATASETS: 32 releases that span across 9 open-source software systems. Each dataset has 65 software metrics: 54 code metrics, 5 process metrics, and 6 ownership metrics. https://awsm-research.github.io/Rnalytica/
Name (%DefectiveRatio, KLOC):
ActiveMQ (6%-15%, 142-299)
Camel (2%-18%, 75-383)
Derby (14%-33%, 412-533)
Groovy (3%-8%, 74-90)
HBase (20%-26%, 246-534)
Hive (8%-19%, 287-563)
JRuby (5%-18%, 105-238)
Lucene (3%-24%, 101-342)
Wicket (4%-7%, 109-165)
Yatish et al., Mining Software Defects: Should We Consider Affected Releases?, In ICSE'19
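These datasets ship with the Rnalytica package used later in this tutorial; a minimal sketch of loading one of them, assuming the package installs from the awsm-research GitHub repository and that "activemq-5.0.0" is one of the released dataset names (both are assumptions, not confirmed by this deck).

# Minimal sketch: load one of the curated defect datasets via Rnalytica.
# The GitHub path and the dataset name are assumptions.
# devtools::install_github("awsm-research/Rnalytica")
library(Rnalytica)
ds <- loadDefectDataset("activemq-5.0.0")
dim(ds$data)              # modules x metrics
table(ds$data[, ds$dep])  # defective vs. clean modules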
  • 21. ANALYTICAL MODELLING FRAMEWORK (MAME) revisited: with the clean data mined, the remaining steps are Analyzing (correlations), Modelling (analytical models), and Explaining (knowledge).
  • 22. SOFTWARE DEFECT MODELLING FRAMEWORK, using well-established AI/ML learning algorithms: training data is fed to a learning algorithm to build a black-box model; for a file such as A.java, the model predicts that it is likely to be defective (P=0.90), so developers can make an informed decision.
  • 23. SOFTWARE DEFECT MODELLING FRAMEWORK. The black-box prediction alone leaves developers' questions unanswered: Why is A.java defective? Why is A.java predicted as defective rather than clean? Why is file A.java defective, while file B.java is clean?
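As a concrete picture of the prediction step itself, here is a minimal sketch; it uses the data/indep/dep variables from the Eclipse dataset introduced in the hands-on part below (slide 35), and the 70/30 split is an illustrative assumption.

# Minimal sketch: train a model and score unseen files.
# Assumes data/indep/dep as loaded in slide 35.
set.seed(1)
idx      <- sample(nrow(data), 0.7 * nrow(data))
training <- data[idx, ]
testing  <- data[-idx, ]
f <- as.formula(paste(dep, "~", paste(indep, collapse = " + ")))
m <- glm(f, data = training, family = "binomial")
p <- predict(m, newdata = testing, type = "response")  # P(defective)
head(sort(p, decreasing = TRUE))  # the riskiest unseen modules first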
  • 24. Article 22 of the European Union's General Data Protection Regulation: "The use of data in decision-making that affects an individual or group requires an explanation for any decision made by an algorithm." http://www.privacy-regulation.eu/en/22.htm
  • 25. AI/ML BRINGS CONCERNS TO REGULATORS. FAT: Fairness, Accountability, and Transparency. What if AI-assisted productivity analytics tend to promote males more than females? Do the AI systems conform to regulation and legislation? Do we understand how machines work? Why do models make those predictions?
  • 26. EXPLAINABLE ARTIFICIAL INTELLIGENCE (XAI): a suite of AI/ML techniques that produce accurate predictions while being able to explain such predictions. Alongside the prediction (e.g., A.java is likely to be defective, P=0.90), an explainable interface provides an explanation that justifies the prediction to the user.
  • 27. EXPLAINING A BLACK-BOX MODEL. Model-specific interpretation techniques (e.g., ANOVA for regression, Variable Importance for random forests) explain the black-box model as a whole, identifying the most important features based on the training data: a global explanation.
  • 28. EXPLAINING AN INDIVIDUAL PREDICTION. Model-agnostic techniques generate an outcome explanation for a single prediction: how does each feature contribute to the final probability (e.g., a prediction score of 90%) of that prediction? Unlike model-specific global explanations, these instance explanations are produced per prediction on unseen data.
  • 29. WHY IS A.JAVA DEFECTIVE? Explaining the importance of each metric that contributes to the final probability of the prediction. [Break-down chart: MAJOR_LINE = 2 contributes 0.268; ADEV = 12 contributes 0.332; CountDeclMethodPrivate = 6 contributes 0.169; CountDeclMethodPublic = 44 contributes 0.036; CountClassCoupled = 16 contributes 0.02; the remaining 21 variables contribute 0.007; final_prognosis = 0.832.] #ActiveDevelopers (ADEV) contributed the most to the likelihood of this module being defective.
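The deck does not name a specific tool for generating such instance explanations; as one concrete option, here is a minimal sketch using the lime package in R. The caret-trained random forest, the relabelled outcome levels, and the training/testing split are assumptions, not the setup used to produce the chart above.

# Minimal sketch of a model-agnostic instance explanation with LIME.
# Assumes data/indep/dep (slide 35) and a training/testing split.
library(caret)   # also requires the randomForest package for method="rf"
library(lime)
fit <- train(x = training[, indep],
             y = factor(training[, dep], labels = c("clean", "defective")),
             method = "rf")
explainer   <- lime(training[, indep], fit)
explanation <- explain(testing[1, indep], explainer,
                       n_labels = 1, n_features = 5)
plot_features(explanation)  # per-metric contributions for this one module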
  • 30. A Quality Improvement Plan: "A policy to maintain the maximum number of (two) developers who can edit a module in the past (six) months."
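One way to monitor such a policy is to count the distinct recent authors of each module from the mined commit log; a minimal sketch, where the commits data frame and its columns (file, author, date) are hypothetical.

# Minimal sketch: flag modules edited by more than two developers in the
# past six months. The `commits` data frame (file, author, date) is
# hypothetical, mined from the VCS as described earlier.
library(dplyr)
violations <- commits %>%
  filter(date >= Sys.Date() - 180) %>%
  group_by(file) %>%
  summarise(n_devs = n_distinct(author), .groups = "drop") %>%
  filter(n_devs > 2)
violations  # modules that currently violate the two-developer policy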
  • 31. Software Analytics in Action: A Hands-on Tutorial on Analyzing and Modelling Software Data. Dr. Chakkrit (Kla) Tantithamthavorn, Monash University, Melbourne, Australia. chakkrit@monash.edu @klainfo http://chakkrit.com
  • 32. CHALLENGES OF THE DATA ANALYTICS PIPELINE. [Pipeline figure: (1) data collection from a software repository; (2) data sampling; (3) data cleaning and filtration; (4) metrics extraction and normalization; descriptive analytics of each metric's relationship to the outcome; (7) model construction with classifier parameters; (8) model validation with performance measures over training and testing corpora; (9) model analysis and interpretation via importance scores and patterns; feeding predictive and prescriptive analytics.] Each step raises questions: How to clean data? How to collect ground truths? Should we rebalance the data? Are features correlated? Should we apply feature reduction? Which ML technique is best? What is the benefit of optimising ML parameters? Which model validation technique should I use? How to analyse or explain the ML models? What is the best data analytics pipeline for software defects?
  • 33. ANALYZING AND MODELLING SOFTWARE DEFECTS: our studies across the pipeline. Mining software data: Affected Releases [ICSE'19], Issue Reports [ICSE'15]. Analyzing software data: Control Features [ICSE-SEIP'18], Feature Selection [ICSME'18], Correlation Analysis [TSE'19]. Modelling software data: Class Imbalance [TSE'19], Parameters [ICSE'16, TSE'18], Model Validation [TSE'17], Measures [ICSE-SEIP'18]. Explaining software data: Model Statistics [ICSE-SEIP'18], Interpretation [TSE'19]. Education: MSR'19. Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP'18
  • 34. RUN JUPYTER + R ANYTIME AND ANYWHERE http://github.com/awsm-research/tutorial Shift + Enter to run a cell
  • 35. EXAMPLE DATASET: 6,729 files, 32 metrics, 14% defective ratio [Zimmermann et al., PROMISE'07].
# Load a defect dataset
> source("import.R")
> eclipse <- loadDefectDataset("eclipse-2.0")
> data <- eclipse$data
> indep <- eclipse$indep
> dep <- eclipse$dep
> data[,dep] <- factor(data[,dep])
# Understand your dataset (describe() is from the Hmisc package)
> library(Hmisc)
> describe(data)
data
33 Variables 6729 Observations
-------------------------------------------------
CC_sum
      n  missing distinct     Mean
   6729        0      268     26.9
lowest : 0 1, highest: 1052 1299
-------------------------------------------------
post
      n  missing distinct
   6729        0        2
Value      FALSE  TRUE
Frequency   5754   975
Proportion 0.855 0.145
Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP'18, pages 286-295.
  • 36. BUILDING A THEORY OF SOFTWARE QUALITY: Is program complexity associated with software quality?
  • 37. INTRO: BASIC REGRESSION ANALYSIS. Theoretical assumptions: 1. binary dependent variable and ordinal independent variables; 2. observations are independent; 3. no (multi-)collinearity among independent variables; 4. a linear relationship between the logit of the outcome and each variable.
# Develop a logistic regression (family="binomial" is required for a
# logistic model; the z values below come from the binomial fit)
> m <- glm(post ~ CC_max, data = data, family = "binomial")
# Print a model summary
> summary(m)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.490129   0.051777  -48.09   <2e-16 ***
CC_max       0.104319   0.004819   21.65   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Visualize the relationship of the studied variable
> install.packages("effects")
> library(effects)
> plot(allEffects(m))
[Figure: CC_max effect plot; the probability of post-release defects rises from about 0.2 to 0.8 as CC_max grows from 0 to 300]
  • 38. BUILDING A THEORY OF SOFTWARE QUALITY: Which factors share the strongest association with software quality?
  • 39. BEST PRACTICES FOR ANALYTICAL MODELLING: 7 DOs and 3 DON'Ts. DOs: (1) include control features; (2) remove correlated features; (3) build interpretable models; (4) explore different settings; (5) use out-of-sample bootstrap; (6) summarize by a Scott-Knott test; (7) visualize the relationship. DON'Ts: (1) don't use ANOVA Type-I; (2) don't optimize probability thresholds; (3) don't solely use F-measure.
  • 40. STEP 1: INCLUDE CONTROL FEATURES. Control features are features that are not of interest even though they could affect the outcome of a model (e.g., lines of code when modelling defects). Candidate features for defect models: size, OO design (e.g., coupling, cohesion), and program complexity; process features (#commits, #dev, churn, #pre-release defects, change complexity); human factors (code ownership, #MinorDevelopers, experience). Principles of designing factors: 1. easy/simple measurement; 2. explainable and actionable; 3. support decision making. Tantithamthavorn et al., An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. ICSE-SEIP'18
  • 41. STEP 1: INCLUDE CONTROL FEATURES. The risks of not including control features (e.g., lines of code):
# post ~ CC_max + PAR_max + FOUT_max
> m1 <- glm(post ~ CC_max + PAR_max + FOUT_max, data = data, family="binomial")
> anova(m1)
Analysis of Deviance Table
Model: binomial, link: logit
Response: post
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev
NULL                       6728     5568.3
CC_max     1   600.98      6727     4967.3
PAR_max    1   131.45      6726     4835.8
FOUT_max   1    60.21      6725     4775.6
Complexity (CC_max) is the top rank.
# post ~ TLOC + CC_max + PAR_max + FOUT_max
> m2 <- glm(post ~ TLOC + CC_max + PAR_max + FOUT_max, data = data, family="binomial")
> anova(m2)
Analysis of Deviance Table
Model: binomial, link: logit
Response: post
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev
NULL                       6728     5568.3
TLOC       1   709.19      6727     4859.1
CC_max     1    74.56      6726     4784.5
PAR_max    1    63.35      6725     4721.2
FOUT_max   1    17.41      6724     4703.8
Lines of code (TLOC) is the top rank. Conclusions may change when including control features.
  • 42. STEP 2: REMOVE CORRELATED FEATURES. Collinearity is a phenomenon in which one feature can be linearly predicted by another feature. The state of practice in software engineering: 82% of SE datasets have correlated features, yet 63% of SE studies do not mitigate them. Why? Most metrics are aggregated. Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
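Aggregation-induced correlation is easy to spot in the example dataset used in this tutorial; for instance, the average and maximum McCabe complexity of a file move together.

# Minimal sketch: aggregated variants of the same metric are often highly
# correlated (average vs. maximum complexity per file, eclipse-2.0 data).
cor(data$CC_avg, data$CC_max, method = "spearman")
# A coefficient close to 1 signals collinearity that must be mitigated
# before interpreting the model.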
  • 43. STEP 2: REMOVE CORRELATED FEATURES. The risks of not removing correlated factors (CC_max is highly correlated with CC_avg). Model 1: post ~ CC_max + CC_avg + PAR_max + FOUT_max. Model 2: post ~ CC_avg + CC_max + PAR_max + FOUT_max. Contribution of each factor to the model (from ANOVA analysis):
CC_max: 74 (Model 1) vs. 19 (Model 2)
CC_avg: 2 vs. 58
PAR_max: 16 vs. 16
FOUT_max: 7 vs. 7
Conclusions may change when reordering the correlated features. Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
  • 44. STEP 2: REMOVE CORRELATED FACTORS. Using Spearman's correlation analysis to detect collinearity:
# Visualize Spearman's correlation for all metrics using a hierarchical clustering
> library(rms)
> plot(varclus(as.matrix(data[,indep]), similarity="spear", trans="abs"))
> abline(h=0.3, col="red")
[Figure: hierarchical clustering of the 32 metrics by |Spearman ρ|; cutting the dendrogram at 0.3 reveals 7 groups of correlated metrics]
  • 45. STEP 2: REMOVE CORRELATED FACTORS. Same analysis as above, annotated: use domain knowledge to manually select one metric from each group of correlated metrics. After mitigating correlated metrics, we should have 9 factors (7 group representatives + 2 non-correlated metrics).
  • 46. STEP 2: REMOVE CORRELATED FACTORS. How to automatically mitigate (multi-)collinearity? AutoSpearman (1) removes constant factors, and (2) selects from each group the one factor that shares the least correlation with the factors outside that group. Jiarpakdee et al.: AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models. ICSME'18
  • 47. STEP 2: REMOVE CORRELATED FACTORS. Running AutoSpearman:
# Run AutoSpearman
> library(Rnalytica)
> filterindep <- AutoSpearman(data, indep)
> plot(varclus(as.matrix(data[, filterindep]), similarity="spear", trans="abs"))
> abline(h=0.3, col="red")
[Figure: the 9 surviving metrics (NSF_avg, NSM_avg, PAR_avg, pre, NBD_avg, NOT, ACD, NOF_avg, NOM_avg) show only weak pairwise correlations, all below the 0.3 cut-off]
Jiarpakdee et al.: AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models. ICSME'18
  • 48. STEP 3: BUILD AND EXPLAIN DECISION TREES. R implementation of a decision-tree-based model (C5.0):
# Build a C5.0 tree-based model (C5.0 is from the C50 package)
> library(C50)
> tree.model <- C5.0(x = data[,indep], y = data[,dep])
> summary(tree.model)
Read 6,729 cases (10 attributes) from undefined.data
Decision tree (excerpt):
pre <= 1:
:...NOM_avg <= 17.5: FALSE (4924/342)
:   NOM_avg > 17.5:
:   :...NBD_avg > 1.971831:
:       :...ACD <= 2: TRUE (51/14)
:       :   ACD > 2: FALSE (5)
# Plot a Decision Tree-based model
> plot(tree.model)
[Figure: the fitted decision tree; the root splits on pre (pre-release defects), with further splits on NOM_avg, NBD_avg, PAR_avg, NOF_avg, NSM_avg, NSF_avg, and ACD]
Tantithamthavorn et al.: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16
  • 49. STEP 3: BUILD AND EXPLAIN RULES MODELS. R implementation of a rules-based model (C5.0):
# Build a C5.0 rule-based model
> rule.model <- C5.0(x = data[, indep], y = data[,dep], rules = TRUE)
> summary(rule.model)
Rules:
Rule 1: (2910/133, lift 1.1)
    pre <= 6
    NBD_avg <= 1.16129
    -> class FALSE [0.954]
Rule 2: (3680/217, lift 1.1)
    pre <= 2
    NOM_avg <= 6.5
    -> class FALSE [0.941]
Rule 3: (4676/316, lift 1.1)
    pre <= 1
    NBD_avg <= 1.971831
    NOM_avg <= 64
    -> class FALSE [0.932]
...
Rule 13: (56/19, lift 4.5)
    pre <= 1
    NBD_avg > 1.971831
    NOM_avg > 17.5
    -> class TRUE [0.655]
Rule 14: (199/70, lift 4.5)
    pre > 1
    NBD_avg > 1.012195
    NOM_avg > 23.5
    -> class TRUE [0.647]
Rule 15: (45/16, lift 4.4)
    pre > 2
    pre <= 6
    NBD_avg > 1.012195
    PAR_avg > 1.75
    -> class TRUE [0.638]
Tantithamthavorn et al.: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16
  • 50. STEP 3: BUILD AND EXPLAIN RF MODELS. R implementation of a Random Forest model:
# Build a random forest model
> library(randomForest)
> f <- as.formula(paste("RealBug", '~', paste(indep, collapse = "+")))
> rf.model <- randomForest(f, data = data, importance = TRUE)
> print(rf.model)
Call:
 Type of random forest: classification
 Number of trees: 500
 No. of variables tried at each split: 1
 OOB estimate of error rate: 12.3%
Confusion matrix:
      FALSE TRUE class.error
FALSE   567   42  0.06896552
TRUE     57  139  0.29081633
# Plot the variable importance of the Random Forest model
# (the slide shows a variable importance plot, so varImpPlot
# rather than plot is the matching call)
> varImpPlot(rf.model)
[Figure: variable importance by mean decrease in accuracy; pre, NOM_avg, and NBD_avg rank highest, while NOT and ACD rank lowest]
  • 51. STEP 4: EXPLORE DIFFERENT SETTINGS. The risks of using default parameter settings: 87% of the widely-used classification techniques require at least one parameter setting [ICSE'16], e.g., #trees for random forest, #clusters for k-nearest neighbors, #hidden layers for neural networks. Yet 80% of the top-50 highly-cited defect studies rely on a default setting [IST'16]. Tantithamthavorn et al.: Automated parameter optimization of classification techniques for defect prediction models. ICSE'16. Fu et al.: Tuning for software analytics: Is it really necessary? IST'16
  • 52. STEP 4: EXPLORE DIFFERENT SETTINGS. The risks of using default parameter settings. [Figure: from a dataset, training and testing samples are generated; models are built with different settings explored via random search or differential evolution. Boxplots of AUC (0.5-0.9) across C5.0 (1 trial vs. 100 trials), random forest (10 trees vs. 100 trees), and GLM show a clear AUC improvement for C5.0 and for RF when their parameters are tuned]
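A lightweight way to explore settings rather than accept defaults is a small tuning grid; a minimal sketch with the caret package, where the grid values, the resampling setup, and the relabelled outcome levels are illustrative assumptions (the deck mentions random search and differential evolution as alternative search strategies).

# Minimal sketch: explore random-forest settings instead of the default.
# Assumes data/indep/dep as loaded in slide 35.
library(caret)   # also requires the randomForest package for method="rf"
ctrl <- trainControl(method = "boot", number = 25)
grid <- expand.grid(mtry = c(2, 4, 6, 8))   # candidate settings
fit <- train(x = data[, indep],
             y = factor(data[, dep], labels = c("clean", "defective")),
             method = "rf", trControl = ctrl, tuneGrid = grid)
fit$bestTune   # the setting with the best resampled performance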
  • 53. STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP to estimate how well a model will perform on unseen data. Families of model validation techniques: holdout validation (50% holdout, 70% holdout, repeated 50%/70% holdout); k-fold cross-validation (leave-one-out CV, 2-fold CV, 10-fold CV, repeated 10-fold CV); and bootstrap validation (ordinary bootstrap, optimism-reduced bootstrap, out-of-sample bootstrap, .632 bootstrap). Tantithamthavorn et al.: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17
  • 54. STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP. R implementation of out-of-sample bootstrap and 10-fold cross-validation:
# Out-of-sample Bootstrap Validation
> for(i in seq(1,100)){
>   set.seed(1234+i)
>   indices <- sample(nrow(data), replace=TRUE)
>   training <- data[indices,]
>   testing <- data[-indices,]
>   …
> }
# 10-Fold Cross-Validation (createFolds is from the caret package)
> library(caret)
> indices <- createFolds(data[, dep], k = 10, list = TRUE, returnTrain = TRUE)
> for(i in seq(1,10)){
>   training <- data[indices[[i]],]
>   testing <- data[-indices[[i]],]
>   …
> }
[Figure: boxplots of AUC for 100-repeat bootstrap vs. 10x10-fold CV (about 0.75-0.84); the out-of-sample bootstrap yields more accurate and more stable performance estimates [TSE'17]]
Tantithamthavorn et al.: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17
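The elided loop bodies (…) would train and evaluate a model on each split; a minimal sketch of a single out-of-sample bootstrap iteration, assuming a logistic regression model and the pROC package for computing AUC.

# Minimal sketch: one out-of-sample bootstrap iteration with an AUC
# estimate. pROC is an assumption; any AUC implementation would do.
library(pROC)
set.seed(1234)
indices  <- sample(nrow(data), replace = TRUE)
training <- data[indices, ]
testing  <- data[-indices, ]
f <- as.formula(paste(dep, "~", paste(indep, collapse = " + ")))
m <- glm(f, data = training, family = "binomial")
prob <- predict(m, newdata = testing, type = "response")
auc(roc(testing[, dep], prob))  # performance on the held-out rows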
  • 55. STEP 5: USE OUT-OF-SAMPLE BOOTSTRAP. The risks of using 10-fold CV on small datasets: with 100 modules and a 5% defective rate, there is a high chance that a testing fold does not contain any defective modules. In contrast, the out-of-sample bootstrap draws a training sample with replacement of the same size as the original sample and tests on the modules that do not appear in the bootstrap sample (~36.8% of the original); a bootstrap sample is nearly representative of the original dataset. Tantithamthavorn et al.: An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. TSE'17
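The "high chance" can be quantified: drawing a 10-module test fold from 100 modules of which 5 are defective is a hypergeometric draw, so the probability that a given fold contains zero defective modules is roughly 58%.

# Probability that a 10-module test fold drawn from 100 modules
# (5 defective, 95 clean) contains zero defective modules.
dhyper(0, m = 5, n = 95, k = 10)  # ~0.58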
  • 56. STEP 6: SUMMARIZE BY A SCOTT-KNOTT ESD TEST to statistically determine the ranks of the most significant metrics:
# Run a ScottKnottESD test (Anova is from the car package,
# sk_esd from the ScottKnottESD package)
> library(car)
> library(ScottKnottESD)
> importance <- NULL
> indep <- AutoSpearman(data, eclipse$indep)
> f <- as.formula(paste("post", '~', paste(indep, collapse = "+")))
> for(i in seq(1,100)){
>   indices <- sample(nrow(data), replace=TRUE)
>   training <- data[indices,]
>   m <- glm(f, data = training, family="binomial")
>   importance <- rbind(importance, Anova(m, type="2", test="LR")$"LR Chisq")
> }
> importance <- data.frame(importance)
> colnames(importance) <- indep
> sk_esd(importance)
Groups:
    pre NOM_avg NBD_avg    ACD NSF_avg PAR_avg    NOT NSM_avg NOF_avg
      1       2       3      4       5       6      7       7       8
[Figure: boxplots of each metric's importance score across the 100 bootstrap models, ordered by rank] Each rank has a statistically significant difference with a non-negligible effect size [TSE'17].
  • 57. STEP 7: VISUALIZE THE RELATIONSHIP to understand the relationship between the studied metric and the outcome:
# Visualize the relationship of the studied variable
> library(effects)
> indep <- AutoSpearman(data, eclipse$indep)
> f <- as.formula(paste("post", '~', paste(indep, collapse = "+")))
> m <- glm(f, data = data, family="binomial")
> plot(effect("pre", m))
[Figure: pre effect plot; the probability of post-release defects rises from about 0.2 to 0.8 as the number of pre-release defects grows from 0 to 70]
  • 58. FIRST, DON'T USE ANOVA TYPE-I to measure the significance/contribution of each metric to the model.
# ANOVA Type-I
> m <- glm(post ~ NSF_max + NSM_max + NOF_max + ACD, data = data, family="binomial")
> anova(m)
        Df Deviance Resid. Df Resid. Dev
NULL                     6728     5568.3
NSF_max  1   45.151      6727     5523.1
NSM_max  1   17.178      6726     5505.9
NOF_max  1   50.545      6725     5455.4
ACD      1   43.386      6724     5412.0
ANOVA Type-I measures the improvement in the Residual Sum of Squares (RSS), i.e., the unexplained variance, as each metric is sequentially added into the model: RSS(post ~ 1) - RSS(post ~ NSF_max) = 45.151; RSS(post ~ NSF_max) - RSS(post ~ NSF_max + NSM_max) = 17.178. Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
  • 59. FIRST, DON'T USE ANOVA TYPE-I; use ANOVA Type-II to measure the significance/contribution of each metric to the model.
# ANOVA Type-II (Anova is from the car package)
> Anova(m)
Analysis of Deviance Table (Type II tests)
Response: post
        LR Chisq Df Pr(>Chisq)
NSF_max   10.069  1   0.001508 **
NSM_max   17.756  1  2.511e-05 ***
NOF_max   21.067  1  4.435e-06 ***
ACD       43.386  1  4.493e-11 ***
ANOVA Type-II measures the improvement in the RSS when the metric under examination is added after all the other metrics: RSS(post ~ all except the studied metric) - RSS(post ~ all metrics); e.g., for X1: glm(post ~ X2 + X3 + X4, data=data)$deviance - glm(post ~ X1 + X2 + X3 + X4, data=data)$deviance. Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
  • 60. DON'T USE ANOVA TYPE-I; instead, future studies must use ANOVA Type-II/III. Model 1: post ~ NSF_max + NSM_max + NOF_max + ACD. Model 2 (reordered): post ~ NSM_max + ACD + NSF_max + NOF_max. Contribution per metric (Type-I / Type-II):
ACD: Model 1 = 28% / 47%; Model 2 = 49% / 47%
NOF_max: Model 1 = 32% / 23%; Model 2 = 13% / 23%
NSM_max: Model 1 = 11% / 19%; Model 2 = 31% / 19%
NSF_max: Model 1 = 29% / 11%; Model 2 = 7% / 11%
Type-I attributions change when the correlated metrics are reordered; Type-II attributions do not. Jiarpakdee et al.: The Impact of Correlated Metrics on the Interpretation of Defect Models. TSE'19
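A minimal sketch reproducing this reordering check on the Eclipse example dataset (car::Anova provides the Type-II test):

# Minimal sketch: Type-I attributions depend on the order of terms,
# while Type-II attributions do not.
library(car)
m1 <- glm(post ~ NSF_max + NSM_max + NOF_max + ACD,
          data = data, family = "binomial")
m2 <- glm(post ~ NSM_max + ACD + NSF_max + NOF_max,
          data = data, family = "binomial")
anova(m1); anova(m2)                          # Type-I: changes with order
Anova(m1, type = "2"); Anova(m2, type = "2")  # Type-II: identical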
  • 61. DON'T SOLELY USE F-MEASURES; other (domain-specific) practical measures should also be included. Threshold-independent measures: Area Under the ROC Curve (AUC) = the discrimination ability to classify the 2 outcomes. Ranking measures: Precision@20%LOC = the precision when inspecting the top 20% of LOC; Recall@20%LOC = the recall when inspecting the top 20% of LOC; Initial False Alarm (IFA) = the number of false alarms before finding the first bug [Xia ICSME'17]. Effort-aware measures: Popt = an effort-based cumulative lift chart [Mende PROMISE'09]; Inspection Effort = the amount of effort (LOC) required to find the first bug [Arisholm JSS'10].
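These ranking measures are straightforward to compute once modules are ranked by predicted risk; a minimal sketch of Recall@20%LOC, where prob (predicted probabilities), the TLOC column, and the logical defect labels are assumptions carried over from the earlier sketches.

# Minimal sketch: Recall@20%LOC. Rank modules by predicted risk, walk
# down the ranking until 20% of the total LOC has been inspected, and
# measure the fraction of all defective modules found within that budget.
recall_at_20loc <- function(prob, loc, actual) {
  ord       <- order(prob, decreasing = TRUE)   # riskiest modules first
  effort    <- cumsum(loc[ord]) / sum(loc)      # cumulative inspected LOC
  inspected <- ord[effort <= 0.20]              # top-20%-LOC budget
  sum(actual[inspected]) / sum(actual)
}
recall_at_20loc(prob, testing$TLOC, testing[, dep] == "TRUE")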
  • 62. DON'T SOLELY USE F-MEASURES. The risks of changing probability thresholds: [Figure: boxplots of F-measure computed at thresholds 0.2, 0.5, and 0.8 for C5.0 (1 and 100 trials), random forest (10 and 100 trees), and GLM; the scores, and hence the apparent ranking of techniques, shift with the chosen threshold]. Tantithamthavorn and Hassan. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges. In ICSE-SEIP'18
  • 63. DO NOT IMPLY CAUSATION. Avoid causal phrasings such as "complexity is the root cause of software defects" or "software defects are caused by the high code complexity"; an association-based study only supports statements such as "complexity shares the strongest association with defect-proneness."
  • 64. PH.D. SCHOLARSHIP. Benefits: tuition fee waivers; $28,000 yearly stipend; travel funding; a university-selected laptop (e.g., MacBook Pro); access to HPC/GPU clusters (4,112 CPU cores, 168 GPU co-processors, 3PB) + NVIDIA DGX1-V. 7 skills you will develop: 1. written communication; 2. research; 3. public speaking; 4. project management; 5. leadership; 6. critical thinking; 7. team collaboration.