Comparing Text Mining Algorithms for Predicting the Severity of a Reported Bug
Proceedings of the 15th European Conference on Software Maintenance and Reengineering
Ahmed Lamkanfi, Serge Demeyer, Quinten David Soetens, Tim Verdonck
Monday 7 March 2011
Severity of a bug is important
✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs
Priority is not severity!
✓ e.g.: a crash occurring only at a small user base may have a low priority
Severity is technical
✓ Severity varies:
➡ trivial, minor, normal, major, critical and blocker
➡ clear guidelines exist to classify the severity of bug reports
✓ Bugs are grouped according to products and components
➡ e.g.: UI, SWT, Debug are components of the product Eclipse
Approach (Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10):
(1) & (2) Extract and preprocess the bug reports from the bug database
(3) Train a predictor on the extracted bug reports
(4) Predict the severity of a new report
➡ prediction: non-severe (minor, trivial) or severe (major, critical, blocker)
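Steps (1)-(4) can be sketched with a minimal bag-of-words multinomial Naive Bayes classifier. This is an illustrative toy, not the preprocessing or implementation used in the study: the tokenizer, the Laplace smoothing and the four training reports are all assumptions.

```python
import re
from collections import Counter
from math import log

def tokenize(text):
    # step (2), heavily simplified: lowercase and split on non-letters;
    # a real pipeline would also remove stop words and apply stemming
    return re.findall(r"[a-z]+", text.lower())

def train_nb(reports):
    """Step (3): train on a list of (summary_text, label) pairs."""
    term_counts = {}        # label -> Counter of term frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in reports:
        doc_counts[label] += 1
        counts = term_counts.setdefault(label, Counter())
        for term in tokenize(text):
            counts[term] += 1
            vocab.add(term)
    return term_counts, doc_counts, vocab

def predict_nb(model, text):
    """Step (4): pick the label maximizing log prior + log likelihoods."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in term_counts.items():
        score = log(doc_counts[label] / total_docs)
        total_terms = sum(counts.values())
        for term in tokenize(text):
            # Laplace smoothing so unseen terms do not zero out the score
            score += log((counts[term] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [  # invented examples, purely for illustration
    ("crash when opening editor", "severe"),
    ("application hangs on startup crash", "severe"),
    ("typo in preferences dialog label", "non-severe"),
    ("wrong mnemonic in wizard page", "non-severe"),
]
model = train_nb(training)
print(predict_nb(model, "editor crash on startup"))  # → severe
```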
Which text mining algorithm should we use when predicting the severity?
➡ Support Vector Machines, Naive Bayes, Naive Bayes Multinomial, 1-Nearest Neighbor
Secondary questions:
How much training is necessary?
➡ investigate the Learning Curve
What are the characteristics of the prediction algorithm?
➡ extract keywords
The four classifiers compared: Support Vector Machines, 1-Nearest Neighbor, Naive Bayes, Naive Bayes Multinomial
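Of the four, the 1-Nearest Neighbor approach is the simplest to sketch: label a new report with the label of its single most similar training report. The cosine similarity over raw term counts and the training data below are assumptions for illustration; the paper does not prescribe this particular similarity measure.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    # bag-of-words term-frequency vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_1nn(training, text):
    """Return the label of the most similar (report, label) pair."""
    vec = vectorize(text)
    best = max(training, key=lambda pair: cosine(vectorize(pair[0]), vec))
    return best[1]

training = [  # invented examples
    ("crash on save", "severe"),
    ("hang during startup", "severe"),
    ("misaligned button in dialog", "non-severe"),
    ("typo in error message", "non-severe"),
]
print(predict_1nn(training, "crash during save"))  # → severe
```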
Evaluation:
✓ Receiver Operating Characteristic (ROC) curve
✓ Area Under Curve (AUC): 0.5 is random prediction; 1.0 is perfect classification
✓ deals with unbalanced category distributions
[Figure: example ROC curves of three classifiers; true positive rate vs. false positive rate]
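AUC has a rank interpretation that also explains why it copes with unbalanced classes: it equals the probability that a randomly chosen severe report is scored above a randomly chosen non-severe one. A minimal sketch of that pairwise formulation (the scores and labels are invented):

```python
def auc(scores, labels):
    """Area Under the ROC Curve via the rank-sum formulation:
    the fraction of (positive, negative) pairs in which the
    positive example receives the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# a perfect classifier ranks every severe report above every non-severe one
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0
# a constant score gives 0.5, i.e. random prediction
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # → 0.5
```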
Cases for our study:

Table II. Basic numbers about the selected components for respectively Eclipse and GNOME

Product     Component        Non-severe bugs   Severe bugs
Eclipse     SWT              696               3218
Eclipse     User Interface   1485              3351
JDT         User Interface   1470              1554
Eclipse     Debug            327               485
CDT         Debug            60                205
GEF         Draw2D           36                83
Evolution   Mailer           2537              7291
Evolution   Calendar         619               2661
GNOME       Panel            332               1297
Metacity    General          331               293
GStreamer   Core             93                352
Nautilus    CD-Burner        73                355

The selected projects use Bugzilla as their bug tracking system. Evaluation uses 10-fold cross-validation: the complete set of available bug reports is first split randomly into 10 subsets. These subsets are split in a stratified manner, meaning that the distribution of the severities in the subsets respects the distribution of the severities in the complete set of bug reports.
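The stratified split described above can be sketched by dealing each severity class round-robin over the k folds, so every fold keeps the overall class ratio. This is a simplified stand-in for the actual tooling, with invented data:

```python
import random

def stratified_folds(reports, k=10, seed=0):
    """Split (report, label) pairs into k folds so each fold
    preserves the overall severe / non-severe ratio."""
    rng = random.Random(seed)
    by_label = {}
    for item in reports:
        by_label.setdefault(item[1], []).append(item)
    folds = [[] for _ in range(k)]
    for items in by_label.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)  # deal round-robin per label
    return folds

# invented corpus: 30 severe and 70 non-severe reports
reports = [("bug %d" % i, "severe") for i in range(30)] + \
          [("bug %d" % i, "non-severe") for i in range(70)]
folds = stratified_folds(reports, k=10)
# each of the 10 folds holds 10 reports: 3 severe and 7 non-severe
print([sum(1 for _, y in f if y == "severe") for f in folds])
```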
Which text mining algorithm to use? (1/2)
[Figure: ROC curves of NB, NB Multinomial, 1-NN and SVM for Eclipse / SWT and Evolution / Mailer; true positive rate vs. false positive rate]
Which text mining algorithm to use? (2/2)

Table IV. Area Under Curve results from the different components (* best per component)

Product / Component    NB      NB Mult.   1-NN    SVM
Eclipse / SWT          0.74    0.83*      0.62    0.76
JDT / UI               0.69    0.75*      0.63    0.71
Eclipse / UI           0.70    0.80*      0.61    0.79
Eclipse / Debug        0.72    0.76*      0.67    0.73
GEF / Draw2D           0.59*   0.55       0.51    0.48
CDT / Debug            0.68    0.70*      0.52    0.69
Evolution / Mailer     0.84    0.89*      0.73    0.87
Evolution / Calendar   0.86    0.90*      0.78    0.86
GNOME / Panel          0.89    0.90*      0.78    0.86
Metacity / General     0.72    0.76*      0.69    0.71
GStreamer / Core       0.74    0.76*      0.65    0.73
Nautilus / CD-Burner   0.93*   0.93*      0.81    0.91

From these results we notice a winner: the Naive Bayes Multinomial classifier. We also observe that the Support Vector Machines classifier is nearly as accurate as the Naive Bayes Multinomial classifier, that accuracy decreases in the case of the standard Naive Bayes classifier, and that the 1-Nearest Neighbor based approach tends to be the least accurate. The same conclusions can also be drawn from the other selected cases based on an analysis of the Area Under Curve measures in Table IV.
How much need for training?
[Figure: learning curves (accuracy vs. number of training reports) of NB, NB mult., 1-NN and SVM for Eclipse / SWT (up to 1400 reports) and Evolution / Mailer (up to 3000 reports)]
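The learning-curve experiment reduces to: train on increasingly large subsets and measure accuracy on held-out reports, repeating to smooth out the randomness. A skeleton of that loop, with a trivial majority-class classifier standing in for the real text miners and invented data:

```python
import random

def majority_train(train_set):
    # placeholder for a real classifier: remember the most common label
    labels = [y for _, y in train_set]
    return max(set(labels), key=labels.count)

def majority_predict(model, text):
    # the majority-class model ignores the report text entirely
    return model

def learning_curve(reports, sizes, trials=20, seed=0):
    """Held-out accuracy as a function of training-set size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        correct = total = 0
        for _ in range(trials):
            rng.shuffle(reports)
            train_set, test_set = reports[:n], reports[n:]
            model = majority_train(train_set)
            for text, label in test_set:
                correct += (majority_predict(model, text) == label)
                total += 1
        curve.append(correct / total)
    return curve

# invented corpus: 60 severe, 40 non-severe reports
reports = [("crash", "severe")] * 60 + [("typo", "non-severe")] * 40
print(learning_curve(reports, sizes=[5, 20, 50]))
```

With a real classifier, plotting this curve shows at which training size the predictions stabilize.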
Classifier characteristics

Table VI. Top most significant terms of each severity, extracted from the resulting Naive Bayes Multinomial classifier of two cases

Eclipse JDT UI
➡ non-severe: quick, fix, dialog, type, gener, code, set, javadoc, wizard, mnemon, messag, prefer, import, method, manipul, button, warn, page, miss, wrong, extract, label, add, quickfix, pref, constant, enabl, icon, paramet, constructor
➡ severe: npe, java, file, package, open, junit, eclips, editor, folder, problem, project, cannot, delete, rename, error, view, search, fail, intern, broken, run, explore, cause, perspect, jdt, classpath, hang, resourc, save, crash

Evolution Mailer
➡ non-severe: message, not, button, change, dialog, display, doesnt, header, list, open, read, search, select, show, start, signature, text, cancel, onli, appli, unread, view, window, junk, make, prefer, load, pad, ad, content, tree, user, automat, startup, subscribe, mode, import, press, print, sometimes
➡ severe: crash, evolut, mail, email, imap, click, evo, inbox, mailer, server, hang, bodi, sourc, sigsegv, another, encrypt, warn, segment
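One way to obtain such keyword lists is to rank terms by the smoothed log-ratio of their relative frequency in severe versus non-severe reports, which approximates what a trained Naive Bayes Multinomial model finds indicative. The ranking function and the toy data below are assumptions, not the paper's exact extraction procedure:

```python
from collections import Counter
from math import log

def top_terms(reports, label, other, n=5):
    """Rank terms by the smoothed log-ratio of their frequency in one
    severity class against the other: a rough keyword extractor."""
    counts = {label: Counter(), other: Counter()}
    for text, y in reports:
        counts[y].update(text.lower().split())
    vocab = set(counts[label]) | set(counts[other])
    totals = {y: sum(c.values()) for y, c in counts.items()}
    def ratio(t):
        # Laplace-smoothed relative frequencies, as in the NB model
        p = (counts[label][t] + 1) / (totals[label] + len(vocab))
        q = (counts[other][t] + 1) / (totals[other] + len(vocab))
        return log(p / q)
    return sorted(vocab, key=ratio, reverse=True)[:n]

reports = [  # invented examples
    ("crash crash hang npe", "severe"),
    ("crash sigsegv hang", "severe"),
    ("typo mnemonic label wizard", "non-severe"),
    ("typo dialog label", "non-severe"),
]
print(top_terms(reports, "severe", "non-severe", n=3))  # crash-related terms rank highest
```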
Conclusions
✓ Naive Bayes Multinomial is the most accurate predictor
✓ More training results in more stable predictions
✓ Characteristics of classifiers tend to be component specific