The document describes a study that aimed to predict the severity of reported software bugs by analyzing their textual descriptions. The study used a Bayesian classifier trained on bug reports from Mozilla, Eclipse, and GNOME to classify bug severity. Results showed the approach could predict severity with reasonable precision and recall when using short descriptions and training per component. Combining components required a larger training set.
1. Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10
Predicting the Severity of a Reported Bug
Ahmed Lamkanfi, Serge Demeyer | Emanuel Giger | Bart Goethals
Ansymo | s.e.a.l. | ADReM
5. Severity of a bug is important
✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs
10. ✓ Severity varies:
➡ trivial, minor, normal, major, critical and blocker
➡ clear guidelines exist to classify the severity of bug reports
✓ Both a short and a longer description of the problem
✓ Bugs are grouped according to products and components
➡ e.g.: plug-ins and bookmarks are components of the product Firefox
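The report structure sketched above can be modeled with a small record type. This is an illustrative sketch only: the field names and the example report are invented, not the actual Bugzilla schema.

```python
# Illustrative model of a Bugzilla-style bug report as described in the slides:
# a severity level, a short and a longer description, and a product/component
# grouping. Field names are assumptions, not the real Bugzilla field names.
from dataclasses import dataclass

@dataclass
class BugReport:
    product: str      # e.g. "Firefox"
    component: str    # e.g. "Bookmarks"
    severity: str     # trivial | minor | normal | major | critical | blocker
    short_desc: str   # one-line summary
    long_desc: str    # full description of the problem

# Hypothetical example report
report = BugReport(
    product="Firefox",
    component="Bookmarks",
    severity="major",
    short_desc="Bookmarks toolbar disappears after restart",
    long_desc="After restarting the browser, the bookmarks toolbar ...",
)
```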
15. Can we accurately predict the severity of a reported bug by analyzing its textual descriptions?
Also the following questions:
➡ Potential indicators?
➡ Short versus long description?
➡ Per component versus cross-component?
21. We use text mining to classify bug reports
• Bayesian classifier, based on the probabilistic occurrence of words
• a training and an evaluation period
• in the first instance, per component
Reports are mapped onto two classes, with normal bugs left undecided:
➡ Non-severe bugs: trivial, minor
➡ Default (undecided): normal
➡ Severe bugs: major, critical, blocker
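The classifier set-up above can be sketched as a minimal Naive Bayes text classifier over the words of a bug description. This is an assumption-laden sketch, not the paper's implementation: the toy training reports, the Laplace smoothing, and the two-class labels are illustrative only.

```python
# Minimal Naive Bayes sketch of the per-component set-up described above.
# The toy reports below are invented examples, not data from the study.
import math
from collections import Counter, defaultdict

def train(reports):
    """reports: list of (description, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of reports
    vocab = set()
    for text, label in reports:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        # log P(label) + sum of log P(word | label), Laplace-smoothed
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training set (invented, not from the paper's data)
reports = [
    ("crash on startup data loss", "severe"),
    ("browser hangs and crashes", "severe"),
    ("typo in preferences dialog", "non-severe"),
    ("minor cosmetic misalignment in toolbar", "non-severe"),
]
model = train(reports)
print(classify("crashes when loading page", model))  # → severe
```

Training per component, as the slides propose, simply means building one such model from the reports of a single component.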
22. Evaluation of the approach:
✓ precision and recall
✓ cases drawn from the open-source community: Mozilla, Eclipse and GNOME
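The two evaluation measures can be computed per severity class from the prediction outcomes. The sketch below shows the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN)); the example predictions are hypothetical, not results from the study.

```python
# Precision and recall for one class, as used to evaluate the classifier.
def precision_recall(predicted, actual, positive="severe"):
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted severe, how many truly are
    recall = tp / (tp + fn) if tp + fn else 0.0     # of truly severe, how many were found
    return precision, recall

# Hypothetical predictions over five reports
pred   = ["severe", "severe", "non-severe", "severe", "non-severe"]
actual = ["severe", "non-severe", "non-severe", "severe", "severe"]
print(precision_recall(pred, actual))  # precision 2/3, recall 2/3
```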
29. How does the approach perform when using the longer description?

component                 Non-severe            Severe
                          precision   recall    precision   recall
Mozilla: Layout           0.583       0.961     0.890       0.314
Mozilla: Bookmarks        0.536       0.963     0.820       0.166
Mozilla: Firefox general  0.578       0.948     0.856       0.308
Eclipse: UI               0.548       0.976     0.892       0.197
Eclipse: JDT-UI           0.547       0.973     0.881       0.195
Eclipse: JDT-Text         0.570       0.988     0.955       0.257
33. How does the approach perform when combining bugs from different components?

project       Non-severe            Severe
              precision   recall    precision   recall
Mozilla       0.704       0.750     0.733       0.685
Eclipse       0.693       0.553     0.628       0.755
GNOME         0.817       0.737     0.760       0.835

Much larger training set necessary:
✓ ± 2000 reports instead of ± 500 per severity!
34. Conclusions
✓ It is possible to predict the severity of a reported bug
✓ The short description is the better source for predictions
✓ The cross-component approach works, but requires more training samples