Comparing Text Mining Algorithms for Predicting the Severity of a Reported Bug
Proceedings of the 15th European Conference on Software Maintenance and Reengineering
Ahmed Lamkanfi, Serge Demeyer, Quinten David Soetens, Tim Verdonck
Monday 7 March 2011
Severity of a bug is important
✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs
Priority is not severity!
✓ e.g.: a crash occurring only at a small user base may have a low priority
Severity is technical
✓ Severity varies:
➡ trivial, minor, normal, major, critical and blocker
➡ clear guidelines exist to classify the severity of bug reports
✓ Bugs are grouped according to products and components
➡ e.g.: UI, SWT, Debug are components of the product Eclipse
Approach (Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10):
(1) & (2) Extract and preprocess the bug reports from the bug database
(3) Train a predictor on the extracted bug reports
(4) Predict the severity of a new report
➡ prediction: non-severe (minor, trivial) or severe (major, critical, blocker)
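Steps (1)-(4) can be sketched with a minimal bag-of-words multinomial Naive Bayes classifier. This is an illustrative toy, not the preprocessing or implementation used in the study: the tokenizer, the Laplace smoothing and the four training reports are all assumptions.

```python
import re
from collections import Counter
from math import log

def tokenize(text):
    # step (2), heavily simplified: lowercase and split on non-letters;
    # a real pipeline would also remove stop words and apply stemming
    return re.findall(r"[a-z]+", text.lower())

def train_nb(reports):
    """Step (3): train on a list of (summary_text, label) pairs."""
    term_counts = {}        # label -> Counter of term frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in reports:
        doc_counts[label] += 1
        counts = term_counts.setdefault(label, Counter())
        for term in tokenize(text):
            counts[term] += 1
            vocab.add(term)
    return term_counts, doc_counts, vocab

def predict_nb(model, text):
    """Step (4): pick the label maximizing log prior + log likelihoods."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in term_counts.items():
        score = log(doc_counts[label] / total_docs)
        total_terms = sum(counts.values())
        for term in tokenize(text):
            # Laplace smoothing so unseen terms do not zero out the score
            score += log((counts[term] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [  # invented examples, purely for illustration
    ("crash when opening editor", "severe"),
    ("application hangs on startup crash", "severe"),
    ("typo in preferences dialog label", "non-severe"),
    ("wrong mnemonic in wizard page", "non-severe"),
]
model = train_nb(training)
print(predict_nb(model, "editor crash on startup"))  # → severe
```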
Which text mining algorithm should we use when predicting the severity?
➡ Support Vector Machines, Naive Bayes, Naive Bayes Multinomial, 1-Nearest Neighbor
Secondary questions:
How much training is necessary?
➡ investigate the Learning Curve
What are the characteristics of the prediction algorithm?
➡ extract keywords
The four classifiers compared: Support Vector Machines, 1-Nearest Neighbor, Naive Bayes, Naive Bayes Multinomial
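Of the four, the 1-Nearest Neighbor approach is the simplest to sketch: label a new report with the label of its single most similar training report. The cosine similarity over raw term counts and the training data below are assumptions for illustration; the paper does not prescribe this particular similarity measure.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    # bag-of-words term-frequency vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_1nn(training, text):
    """Return the label of the most similar (report, label) pair."""
    vec = vectorize(text)
    best = max(training, key=lambda pair: cosine(vectorize(pair[0]), vec))
    return best[1]

training = [  # invented examples
    ("crash on save", "severe"),
    ("hang during startup", "severe"),
    ("misaligned button in dialog", "non-severe"),
    ("typo in error message", "non-severe"),
]
print(predict_1nn(training, "crash during save"))  # → severe
```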
Evaluation:
✓ Receiver Operating Characteristic (ROC) curve
✓ Area Under Curve (AUC): 0.5 is random prediction; 1.0 is perfect classification
✓ deals with unbalanced category distributions
[Figure: example ROC curves of three classifiers; true positive rate vs. false positive rate]
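AUC has a rank interpretation that also explains why it copes with unbalanced classes: it equals the probability that a randomly chosen severe report is scored above a randomly chosen non-severe one. A minimal sketch of that pairwise formulation (the scores and labels are invented):

```python
def auc(scores, labels):
    """Area Under the ROC Curve via the rank-sum formulation:
    the fraction of (positive, negative) pairs in which the
    positive example receives the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# a perfect classifier ranks every severe report above every non-severe one
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0
# a constant score gives 0.5, i.e. random prediction
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # → 0.5
```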
Cases for our study:

Table II. Basic numbers about the selected components for respectively Eclipse and GNOME

Product     Component        Non-severe bugs   Severe bugs
Eclipse     SWT              696               3218
Eclipse     User Interface   1485              3351
JDT         User Interface   1470              1554
Eclipse     Debug            327               485
CDT         Debug            60                205
GEF         Draw2D           36                83
Evolution   Mailer           2537              7291
Evolution   Calendar         619               2661
GNOME       Panel            332               1297
Metacity    General          331               293
GStreamer   Core             93                352
Nautilus    CD-Burner        73                355

The selected projects use Bugzilla as their bug tracking system. Evaluation uses 10-fold cross-validation: the complete set of available bug reports is first split randomly into 10 subsets. These subsets are split in a stratified manner, meaning that the distribution of the severities in the subsets respects the distribution of the severities in the complete set of bug reports.
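The stratified split described above can be sketched by dealing each severity class round-robin over the k folds, so every fold keeps the overall class ratio. This is a simplified stand-in for the actual tooling, with invented data:

```python
import random

def stratified_folds(reports, k=10, seed=0):
    """Split (report, label) pairs into k folds so each fold
    preserves the overall severe / non-severe ratio."""
    rng = random.Random(seed)
    by_label = {}
    for item in reports:
        by_label.setdefault(item[1], []).append(item)
    folds = [[] for _ in range(k)]
    for items in by_label.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)  # deal round-robin per label
    return folds

# invented corpus: 30 severe and 70 non-severe reports
reports = [("bug %d" % i, "severe") for i in range(30)] + \
          [("bug %d" % i, "non-severe") for i in range(70)]
folds = stratified_folds(reports, k=10)
# each of the 10 folds holds 10 reports: 3 severe and 7 non-severe
print([sum(1 for _, y in f if y == "severe") for f in folds])
```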
Which text mining algorithm to use? (1/2)
[Figure: ROC curves of NB, NB Multinomial, 1-NN and SVM for Eclipse / SWT and Evolution / Mailer; true positive rate vs. false positive rate]
Which text mining algorithm to use? (2/2)

Table IV. Area Under Curve results from the different components (* best per component)

Product / Component    NB      NB Mult.   1-NN    SVM
Eclipse / SWT          0.74    0.83*      0.62    0.76
JDT / UI               0.69    0.75*      0.63    0.71
Eclipse / UI           0.70    0.80*      0.61    0.79
Eclipse / Debug        0.72    0.76*      0.67    0.73
GEF / Draw2D           0.59*   0.55       0.51    0.48
CDT / Debug            0.68    0.70*      0.52    0.69
Evolution / Mailer     0.84    0.89*      0.73    0.87
Evolution / Calendar   0.86    0.90*      0.78    0.86
GNOME / Panel          0.89    0.90*      0.78    0.86
Metacity / General     0.72    0.76*      0.69    0.71
GStreamer / Core       0.74    0.76*      0.65    0.73
Nautilus / CD-Burner   0.93*   0.93*      0.81    0.91

From these results we notice a winner: the Naive Bayes Multinomial classifier. We also observe that the Support Vector Machines classifier is nearly as accurate as the Naive Bayes Multinomial classifier, that accuracy decreases in the case of the standard Naive Bayes classifier, and that the 1-Nearest Neighbor based approach tends to be the least accurate. The same conclusions can also be drawn from the other selected cases based on an analysis of the Area Under Curve measures in Table IV.
How much need for training?
[Figure: learning curves (accuracy vs. number of training reports) of NB, NB mult., 1-NN and SVM for Eclipse / SWT (up to 1400 reports) and Evolution / Mailer (up to 3000 reports)]
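The learning-curve experiment reduces to: train on increasingly large subsets and measure accuracy on held-out reports, repeating to smooth out the randomness. A skeleton of that loop, with a trivial majority-class classifier standing in for the real text miners and invented data:

```python
import random

def majority_train(train_set):
    # placeholder for a real classifier: remember the most common label
    labels = [y for _, y in train_set]
    return max(set(labels), key=labels.count)

def majority_predict(model, text):
    # the majority-class model ignores the report text entirely
    return model

def learning_curve(reports, sizes, trials=20, seed=0):
    """Held-out accuracy as a function of training-set size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        correct = total = 0
        for _ in range(trials):
            rng.shuffle(reports)
            train_set, test_set = reports[:n], reports[n:]
            model = majority_train(train_set)
            for text, label in test_set:
                correct += (majority_predict(model, text) == label)
                total += 1
        curve.append(correct / total)
    return curve

# invented corpus: 60 severe, 40 non-severe reports
reports = [("crash", "severe")] * 60 + [("typo", "non-severe")] * 40
print(learning_curve(reports, sizes=[5, 20, 50]))
```

With a real classifier, plotting this curve shows at which training size the predictions stabilize.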
Classifier characteristics

Table VI. Top most significant terms of each severity, extracted from the resulting Naive Bayes Multinomial classifier of two cases

Eclipse JDT UI
➡ non-severe: quick, fix, dialog, type, gener, code, set, javadoc, wizard, mnemon, messag, prefer, import, method, manipul, button, warn, page, miss, wrong, extract, label, add, quickfix, pref, constant, enabl, icon, paramet, constructor
➡ severe: npe, java, file, package, open, junit, eclips, editor, folder, problem, project, cannot, delete, rename, error, view, search, fail, intern, broken, run, explore, cause, perspect, jdt, classpath, hang, resourc, save, crash

Evolution Mailer
➡ non-severe: message, not, button, change, dialog, display, doesnt, header, list, open, read, search, select, show, start, signature, text, cancel, onli, appli, unread, view, window, junk, make, prefer, load, pad, ad, content, tree, user, automat, startup, subscribe, mode, import, press, print, sometimes
➡ severe: crash, evolut, mail, email, imap, click, evo, inbox, mailer, server, hang, bodi, sourc, sigsegv, another, encrypt, warn, segment
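One way to obtain such keyword lists is to rank terms by the smoothed log-ratio of their relative frequency in severe versus non-severe reports, which approximates what a trained Naive Bayes Multinomial model finds indicative. The ranking function and the toy data below are assumptions, not the paper's exact extraction procedure:

```python
from collections import Counter
from math import log

def top_terms(reports, label, other, n=5):
    """Rank terms by the smoothed log-ratio of their frequency in one
    severity class against the other: a rough keyword extractor."""
    counts = {label: Counter(), other: Counter()}
    for text, y in reports:
        counts[y].update(text.lower().split())
    vocab = set(counts[label]) | set(counts[other])
    totals = {y: sum(c.values()) for y, c in counts.items()}
    def ratio(t):
        # Laplace-smoothed relative frequencies, as in the NB model
        p = (counts[label][t] + 1) / (totals[label] + len(vocab))
        q = (counts[other][t] + 1) / (totals[other] + len(vocab))
        return log(p / q)
    return sorted(vocab, key=ratio, reverse=True)[:n]

reports = [  # invented examples
    ("crash crash hang npe", "severe"),
    ("crash sigsegv hang", "severe"),
    ("typo mnemonic label wizard", "non-severe"),
    ("typo dialog label", "non-severe"),
]
print(top_terms(reports, "severe", "non-severe", n=3))  # crash-related terms rank highest
```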
Conclusions
✓ Naive Bayes Multinomial is the most accurate predictor
✓ More training results in more stable predictions
✓ Characteristics of classifiers tend to be component specific