An important question for managing open source communities is how to allocate volunteers' time among the many tasks. One time-consuming task is reviewing new bug reports. My presentation consists of three parts. First, based on a detailed analysis of the Firefox Bugzilla database, I will present graphs showing the role of community members in fixing bugs over the period 1999-2009. Second, I will focus on the role of an open source community as an information repository, how community members build such a repository, and how understanding this artifact shortens repair times. Finally, I will talk about some small tools I developed to help improve the bug-fixing process. One of these tools predicts which bug reports will get fixed, based only on the information in the initial bug report. My goal is to inform open source community members and to provide empirical evidence that might support adding new functionality to Bugzilla, so that new bugs are ranked not just by submission date but by how likely they are to be fixed.
How Shallow is a Bug - How Open Source Communities (Help) Fix Bugs
1. Ranking the Bugs
How Open Source Communities (Help) Fix Software Defects
Diederik van Liere
University of Toronto / Erasmus University Rotterdam
FSOSS SENECA 2009
2. Who am I?
• Post-doc researcher at the strategy department, Rotman School of Management / University of Toronto
• PhD from the information & decision sciences department at the Rotterdam School of Management
• “Networkophile”
• My research focuses on the intersection of social & digital networks and open source software
• Mosaic / Netscape / Firefox user since 1994
• Blog: www.network-labs.org
8. Physical distance does not affect post-release fault rates; distance in the organizational chart does
As quoted by Greg Wilson, StackOverflow Days (Nagappan et al. (2008) & Bird et al. (2009))
9. Data Collection
• A community member is someone who posted at least one message on bugzilla.mozilla.org
• Date of entry: date of the first message posted
• Date of exit: one month after the last message posted
• Collected bug reports filed on bugzilla.mozilla.org with IDs 1 to 480,000 whose product is Firefox, Core or SeaMonkey: 320,655 bug reports in total
• This covers the period from late 1998 to March 1, 2009
• Tried to match different email addresses to a single developer (more about this later)
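The entry and exit rules above can be made concrete with a short sketch. It assumes messages arrive as hypothetical (email, date) pairs, approximates "one month" as 30 days, and reduces address matching to a case-insensitive comparison; the talk's actual matching procedure is not described here. The sample filter uses the ID range and products from the slide, with a hypothetical (bug_id, product) record layout.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample filter: ID range and products follow the slide,
# but the (bug_id, product) record layout is an assumption.
SAMPLE_PRODUCTS = {"Firefox", "Core", "SeaMonkey"}


def in_sample(bug_id, product):
    """True if a bug report falls inside the collected ID range and products."""
    return 1 <= bug_id <= 480_000 and product in SAMPLE_PRODUCTS


def membership_periods(messages):
    """Map each community member to an (entry_date, exit_date) pair.

    messages: iterable of (email, posted_on) tuples, posted_on a datetime.date.
    Entry is the date of the first message; exit is one month after the last
    message, approximated here as 30 days (an assumption).
    """
    posted = defaultdict(list)
    for email, posted_on in messages:
        # Naive alias matching (assumption): case-insensitive comparison only.
        posted[email.strip().lower()].append(posted_on)
    return {
        member: (min(days), max(days) + timedelta(days=30))
        for member, days in posted.items()
    }


if __name__ == "__main__":
    msgs = [
        ("Dev@example.org", date(2005, 1, 10)),
        ("dev@example.org", date(2005, 3, 1)),
        ("qa@example.org", date(2006, 6, 6)),
    ]
    print(membership_periods(msgs))
```

Keying on the normalized address means that, once a better alias-matching step exists, only that one line needs to change.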
16. Figure: model linking bug complexity, quality of user contribution, community churn and understanding of the information repository to the time needed to verify a bug report and the time needed to fix a bug
17. Figure: the model above, labelled Stage 1
18. Figure: the model above, labelled Stage 2
19. Estimation & Variables
• Weibull regression (an accelerated failure time model, also used to predict time to failure of hard disks)
• Unit of analysis: bug report
• Quality of bug report ranges from 0 to 4:
  • Sum of the presence of ‘steps to reproduce’, ‘stack trace’, ‘screenshot’ & ‘version information’
• Understanding of information repository: average experience of the bug discussants who mark bugs as duplicates
• Churn rate community:
• Bug complexity: centrality in the bug dependency network
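For reference, the textbook form of a Weibull accelerated failure time model is written out below. It is a sketch: the covariate vector $\mathbf{x}_i$ is assumed to hold the bug-report variables on this slide, and the exact specification used in the study is not given here. $T_i$ is the time to verify (or fix) bug report $i$, and $\varepsilon_i$ follows the standard minimum extreme value distribution.

```latex
\log T_i = \mu_i + \sigma\,\varepsilon_i,
\qquad \mu_i = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i,
\qquad \Pr(\varepsilon_i > u) = \exp\!\left(-e^{u}\right)

S(t \mid \mathbf{x}_i) = \Pr(T_i > t)
  = \Pr\!\left(\varepsilon_i > \frac{\log t - \mu_i}{\sigma}\right)
  = \exp\!\left[-\left(t\,e^{-\mu_i}\right)^{1/\sigma}\right]

t_{0.5}(\mathbf{x}_i) = e^{\mu_i}\,(\ln 2)^{\sigma}
```

This is a Weibull distribution with shape $1/\sigma$ and scale $e^{\mu_i}$. Raising covariate $x_{ij}$ by one unit multiplies the median (and every other quantile) of $T_i$ by $e^{\beta_j}$. A negative $\beta_j$ therefore means a shorter expected time to verify or fix a bug.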
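The 0 to 4 quality score can be sketched as a count of the four elements. The slide does not say how each element was detected, so the regular expressions below are illustrative assumptions only. The screenshot is assumed to be known from the report's attachments and is passed in as the hypothetical flag `has_screenshot`.

```python
import re

# Hypothetical detection patterns: the slide lists the four elements but not how
# they were detected, so these regular expressions are illustrative only.
TEXT_CHECKS = {
    "steps to reproduce": re.compile(r"steps to reproduce|\bstr:", re.IGNORECASE),
    "stack trace": re.compile(r"stack ?trace|#\d+\s+0x[0-9a-f]+", re.IGNORECASE),
    "version information": re.compile(r"\b(?:firefox|version)[ /]\d+(?:\.\d+)*", re.IGNORECASE),
}


def quality_score(description, has_screenshot):
    """Return a 0-4 score: one point per element present in the bug report.

    Three elements are searched for in the report text; the fourth, the
    screenshot, is taken from the (assumed) attachment flag has_screenshot.
    """
    found = sum(1 for pattern in TEXT_CHECKS.values() if pattern.search(description))
    return found + (1 if has_screenshot else 0)


if __name__ == "__main__":
    print(quality_score("Steps to reproduce: open a tab. Firefox/3.5.2 hangs.", True))
```

Because the score is an unweighted sum, each element counts equally. Keeping the patterns in one dictionary makes it easy to swap in better detectors without touching the scoring logic.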
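The slide does not specify which centrality measure was used. As one plausible choice, the sketch below computes normalized degree centrality over a bug dependency network built from hypothetical (bug_id, other_bug_id) pairs, such as pairs taken from Bugzilla's "depends on" and "blocks" fields.

```python
from collections import defaultdict


def degree_centrality(dependencies):
    """Normalized degree centrality of bugs in a dependency network.

    dependencies: iterable of (bug_id, other_bug_id) pairs (hypothetical input
    format). Edges are treated as undirected and duplicates are ignored. Bugs
    with no dependencies do not appear in the result; treat them as 0.
    """
    neighbours = defaultdict(set)
    for a, b in dependencies:
        if a != b:  # ignore self-references
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {bug: 0.0 for bug in neighbours}
    # Divide by n - 1, the largest possible number of neighbours.
    return {bug: len(adj) / (n - 1) for bug, adj in neighbours.items()}


if __name__ == "__main__":
    print(degree_centrality([(1, 2), (1, 3), (1, 4), (2, 3)]))
```

Degree centrality only counts direct dependencies. Measures such as betweenness would also capture a bug's position in longer dependency chains, at higher computational cost.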
23. Understanding what’s in the information repository shortens the time to verify
Figure: understanding of the information repository leads to less time needed to verify a bug report
27. Implications
• Retention of community members is key
• Get community members through the learning curve as quickly as possible
• Extend Bugzilla with prediction functionality to assist in allocating resources
• Funnel first-time bug reporters to a ‘user assistance area’
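The idea of extending Bugzilla so that new bugs are ranked by how likely they are to be fixed can be sketched as a scoring function followed by a sort. The talk does not say which model its prediction tool uses, so this sketch uses a logistic scoring model with placeholder coefficients. The `reporter_experience` feature (for example, years since the reporter's first message) is likewise a hypothetical choice of input that is available when the bug is filed.

```python
import math
from dataclasses import dataclass


@dataclass
class BugReport:
    bug_id: int
    quality: int                # 0-4 score as defined on the Estimation & Variables slide
    reporter_experience: float  # hypothetical feature, e.g. years since first message


# Placeholder coefficients (assumption): in practice they would be estimated
# from historical bug reports with known outcomes.
WEIGHTS = {"intercept": -1.5, "quality": 0.6, "reporter_experience": 0.3}


def fix_probability(bug):
    """Logistic model: probability that the bug report eventually gets fixed."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["quality"] * bug.quality
         + WEIGHTS["reporter_experience"] * bug.reporter_experience)
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_fix_likelihood(bugs):
    """Order new bug reports from most to least likely to be fixed."""
    return sorted(bugs, key=fix_probability, reverse=True)


if __name__ == "__main__":
    queue = [BugReport(1, 0, 0.0), BugReport(2, 4, 1.0), BugReport(3, 2, 0.0)]
    print([b.bug_id for b in rank_by_fix_likelihood(queue)])
```

Sorting by predicted probability rather than by submission date keeps the triage queue unchanged in content and only changes its order. Reviewers could still fall back to date order for ties.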
34. Literature References
• E. S. Raymond, “The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary”, Sebastopol, CA: O'Reilly & Associates, Inc., 1999. Available at: http://www.catb.org/~esr/writings/cathedral-bazaar/
• N. Nagappan, et al., “The Influence of Organizational Structure on Software Quality: An Empirical Case Study”, International Conference on Software Engineering, 2008. Available at: http://portal.acm.org/citation.cfm?id=1368160
• T. Zimmermann, et al., “Predicting Defects Using Network Analysis on Dependency Graphs”, International Conference on Software Engineering, 2008. Available at: http://portal.acm.org/citation.cfm?id=1368161
• C. Bird, et al., “Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista”, Communications of the ACM, 2009. Available at: http://macbeth.cs.ucdavis.edu/distributed.pdf