The recent and fast expansion of OSS (Open-source software) communities has fostered research on how open source projects evolve and how their communities interact. Several research studies show that the inflow of new developers plays an important role in the longevity and the success of OSS projects. Beside that they also discovered that an high percentage of newcomers tend to leave the project because of the socio-technical barriers they meet when they join the project. However, such research effort did not generate yet concrete results in support retention and training of project newcomers. In this thesis dissertation we investigated problems arising when newcomers join software projects, and possible solutions to support them. Specifically, we studied (i) how newcomers behave during development activities and how they interact with others developers with the aim at (ii) developing tools and/or techniques for supporting them during the integration in the development team. Thus, among the various recommenders, we defined (i) a tool able to suggest appropriate mentors to newcomers during the training stage; then, with the aim at supporting newcomers during program comprehension we defined other two recommenders: a tool that (ii) generates high quality source code summaries and another tool able to (iii) provide descriptions of specific source code elements. For future work, we plan to improve the proposed recommenders and to integrate other kind of recommenders to better support newcomers in OSS projects.
3. Team
1 Team
2 Team
n
...
Open Source Projects (OSS)
3
4. Team
1 Team
2 Team
n
...
New
Features
Bugs
fixing
...................
...................
...................
Open Source Projects (OSS)
New
Features
Bugs
fixing
...................
...................
...................
New
Features
Bugs
fixing
...................
...................
...................
4
9. Dealing with Newcomers in Open
Source Projects
9
1)
Recruitment
2)
Selection
Kraut
et
al.
10. Dealing with Newcomers in Open
Source Projects
10
1)
Recruitment
2)
Selection
3)
Retention
Kraut
et
al.
11. Dealing with Newcomers in Open
Source Projects
11
4)
Socialization
1)
Recruitment
2)
Selection
3)
Retention
Kraut
et
al.
12. Dealing with Newcomers in Open
Source Projects
12
Kraut
et
al.
[1]
Steinmacher
et
al.
“Why
do
newcomers
abandon
open
source
software
projects?”
CHASE
2013
[2]
Zhou
et
al.
“Does
the
initial
environment
impact
the
future
of
developers?”
ICSE
2011
[3]
Dagenais
et
al.
“Moving
into
a
new
software
project
landscape”
ICSE
2010
[4]
Steinmacher
et
al.
“How
to
support
newcomers
on
boarding
to
open
source
software
projects”
14.
Perform
Development
Tasks
Mentoring
Team
Collaborations
14
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers”
(CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape”
(ICSE 2011)
15.
Perform
Development
Tasks
Mentoring
Recommenders
Team
Collaborations
15
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers”
(CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape”
(ICSE 2011)
16. 1)
Recommend
Mentors
Perform
Development
Tasks
Mentoring
Recommenders
Team
Collaborations
16
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers”
(CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape”
(ICSE 2011)
17. 1)
Recommend
Mentors
2)
Supporting
Source
Code
Comprehension
and
Re-‐documentation
Perform
Development
Tasks
Mentoring
Recommenders
Team
Collaborations
17
Newcomer Training Process
3)
Detecting
Emerging
Teams
Steinmacher et al. “The hard life of open source software project newcomers”
(CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape”
(ICSE 2011)
22. Thesis Structure
• PART I : analyzing data from software repositories to support
team work.
• rs developers and support the team work.
• PART II : analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si
• on ta
• sk.
• PART III : developing recommenders to support concretely
project newcomers
22
23. • PART I: analyzing data from software repositories to support
team work.
• rs developers and support the team work.
• PART II : analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si
• on ta
• sk.
• PART III : developing recommenders to support concretely
project newcomers
23
Thesis Structure
24. • PART I: analyzing data from software repositories to support
team work.
• rs developers and support the team work.
• PART II: analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si
• on ta
• sk.
• PART III : presents recommenders to support concretely
project newcomers
24
Thesis Structure
25. • PART I: analyzing data from software repositories to support
team work.
• rs developers and support the team work.
• PART II: analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si
• on ta
• sk.
• PART III: developing recommenders to support concretely
project newcomers.
25
Thesis Structure
27. How Developers’ Collaborations Networks
Identified from Different Sources Differ?
IRC
CHAT
LOG
VERSIONING
SYSTEM
ISSUE
TRACKER
MAILING
LIST
Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano Antoniol
How
Developers’
Collaborations
Identified
from
Different
Sources
Tell
us
About
Code
Changes.
The 30th International Conference on Software Maintenance and Evolution (IEEE ICSME 2014)
27
28. Example: Hibernate OSS Project
28
How Developers’ Collaborations Networks
Identified from Different Sources Differ?
29. Example: Hibernate OSS Project
29
How Developers’ Collaborations Networks
Identified from Different Sources Differ?
30. Example: Hibernate OSS Project
30
How Developers’ Collaborations Networks
Identified from Different Sources Differ?
31. Example: Hibernate OSS Project
31
How Developers’ Collaborations Networks
Identified from Different Sources Differ?
32. Example: Hibernate OSS Project
32
How Developers’ Collaborations Networks
Identified from Different Sources Differ?
33. Developers Overlap between
Different Sources
Apache Httpd
Apache Lucene
Samba
Hibernate
ISSUE and CHAT
ISSUE and MAIL
<
35% 56%
MAIL and CHAT
MAIL and ISSUE
<
50%
86%
33
34. Overlap of Developers Social Links
ISSUE and CHAT
ISSUE and MAIL
<
26% 38%
MAIL and CHAT
MAIL and ISSUE
<
20%
30%
34
Apache Httpd
Apache Lucene
Samba
Hibernate
35. During an IRC Chat Meeting
PROJECT: Hibernate
“is there a better way?
dunno like I said this is
brainstorming and I have
not given lots of thought to
these cases”
“but we also need to
create the attributes
and values in
the entity binding..”
35
36. PROJECT: Hibernate
“is there a better way?
dunno like I said this is
brainstorming and I have
not given lots of thought to
these cases”
1) Brainstorming
36
“however planning
a pure standalone
test suite would
make things
easier...”
During an IRC Chat Meeting
37. PROJECT: Hibernate
“is there a better way?
dunno like I said this is
brainstorming and I have
not given lots of thought to
these cases”
“however planning
a pure standalone
test suite would
make things
easier...”
1) Brainstorming
2) Planning
(e.g. Testing
activities)
37
During an IRC Chat Meeting
38. PROJECT: Hibernate
“okay I think it is a bug
and I’m going to create
a jira first”
“however planning
a pure standalone
test suite would
make things
easier...”
1) Brainstorming
2) Planning
(e.g. Testing
activities)
3) Open an
Issue
38
During an IRC Chat Meeting
40. By
use
FUZZY
CLUSTER
ALGORITHMS
Teams
Identification
from
Emergent
Collaborations
40
Analysis of the Evolution
of Teams: How?
Sebastiano Panichella, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto:
How the evolution of emerging collaborations relates to code changes: an empirical study.
The 22nd International Conference on Program Comprehension (IEEE ICPC 2014)
41. Teams
Merge
in
a
New
Release:
-‐
in
20%-‐35%
of
the
cases
Teams
Split
in
a
New
Release:
-‐
In
15%-‐35%
of
the
cases
How do Emerging Collaborations Change
across Software Releases?
Teams
Disappeared:
22%-‐45%
Teams
Survived:
50%-‐70%
41
42. Analysis of Developers’
Communication
1)
Analyzing
developers
collaboration/communication
through
specific
channels
would
only
provide
a
partial
view
of
the
reality.
2)
Social
network
recommenders
should
not
limit
their
information
mining
a
single
source.
3)
Issue
and
mail
can
be
used
to
identify
leaders
with
high
accuracy.
PART I
Summary
42
43. Analysis of Developers’
Communication
1)
Analysing
developers
collaboration/communication
from
mailing
list
and
issue
tracker
to
profile
developers
roles.
PART I
Possible
Future
Directions:
43
2)
Analysing
developers
collabora]on/communica]on
from
mailing
list
and
issue
tracker
to
profile
teams
of
developers
specialized
in
the
topics
of
interest
of
newcomers.
45. 45
PART II – Experiment A
PART II – Experiment B
Two Empirical Studies Aimed at Understanding
46. 46
PART II – Experiment A
How such documentation is browsed
by developers to perform
maintenance activities?
PART II – Experiment B
Two Empirical Studies Aimed at Understanding
47. PART II – Experiment A
How such documentation is browsed
by developers to perform
maintenance activities?
PART II – Experiment B
What code elements are often used
by humans when labeling a source
code artifacts?
word1
word2
word3
word4
word5
word6
47
Two Empirical Studies Aimed at Understanding
48. Maintenance Tasks
Bug
Fixing:
Add
a
new
feature:
Improve
existing
features:
48
G. Bavota, G. Canfora, M. Di Penta, R.Oliveto, Sebastiano Panichella
An Empirical Investigation on Documentation Usage Patterns in Maintenance Tasks.
The 29th International Conference on Software Maintenance (ICSM 2013)
• 33
participants
49. How Much Time did Participants Spend on
Different Kinds of Artifacts?
72%
13%10%
3%2%
49
50. 72%
13%10%
3%2%
Undergraduates
students
used
Source
Code
and
Javadoc
significantly
more
than
Graduate
students
Undergraduate
Students
50
How Much Time did Participants Spend on
Different Kinds of Artifacts?
52. More
experienced
participants
use
a
more
“Integrated
approach”
S=
Sequence
Diagram
D=
Class
Diagram
U=
Use
Case
J=
Javadoc
52
Most Frequent Navigation Patterns
Before Reaching Source Code
18%$
8%$
2%$
2%$
1%$
1%$
3%$
0%$
1%$
0%$
4%$
12%$
16%$
12%$
10%$
4%$
4%$
1%$
2%$
1%$
1%$
5%$
0%# 2%# 4%# 6%# 8%# 10%# 12%# 14%# 16%# 18%# 20%#
S##
D##
(SD)+##
(US)+##
U(SD)+##
(DS)+##
J##
U#
S(US)+##
SU(SD)+##
Other##
Graduate#
Undegraduate#
53. PART II – Experiment B
What Code Elements are Often
Used by Humans When Labeling
a Source Code Artifact?
word1
word2
word3
word4
word5
word6
53
54. • Object:
Experiment B: Context
eXVantage
(industrial
test
data
generation
tool)
• Subjects:
38
Bachelor
Student
CS
…
book
hotel
room
reservation
arrival
departure
smoking
double
card
breakfastJava
Class
54
55. Experiment B: ContextComparison of Different Labeling Techniques
Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano
Panichella : Labeling Source Code with Information Retrieval Methods:An Empirical Study. EMSE 2014.
55
56. Experiment B: ContextComparison of Different Labeling Techniques
Data
extracted
from
signature
of
methods
match
very
well
the
mental
model
of
newcomers
when
describing
source
code
Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano
Panichella : Labeling Source Code with Information Retrieval Methods:An Empirical Study. EMSE 2014.
56
57. How Developers Browse and
Understand Software Artifacts
1) Newcomers spend more time to analyze low-level artifacts as compared to high-level artifacts
2) Heuristics based on data extracted form signature of methods are able to match very well
the mental model of newcomers when describing source code elements
PART II
Summary
57
58. How Developers Browse and
Understand Software Artifacts
1) Develop new recommenders able to suggest to newcomers how to proper
navigate software artefacts during maintenance tasks
2) Build on top of the the automatic labelling approach more accurate summarization
techniques to support newcomers during program comprehension and
maintenance activity.
PART II
Possible
Future
Directions:
58
60. Two Recommenders to Support
Project Newcomers
PART III - A)
Suggest
Appropriate
Mentors
to
Help
Newcomers
in
Open
Source
Projects
PART III – B)
Mining
Source
Code
Descriptions
from
Developers’
Communication
to
Improve
Newcomers’
Program
Comprehension
60
61. PART III – A)
Suggest Appropriate Mentors to Help
Newcomers in Open Source Projects
61
62. Mentoring
of
Project
Newcomers
is
Highly
Desirable….
Previous Work
Dagenais
et
al.
ICSE
2010
MENTOR
62
64. Characteristics of a Good Mentor
64
Enough
ability
to
help
other
people.
Enough
expertise
about
the
topic
of
interest
for
the
newcomer.
https://community.apache.org/mentoringprogramme.html
65. 1)
Find
Past
Successful
Mentors
2)
Suggest
Mentors
Having
Specific
Skills
YODA
(Young and newcOmer Developer Assistant)
65
Approach for Mentors Identification in Open Source Projects
Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, Sebastiano Panichella:
Who is Going to Mentor Newcomers in Open Source Projects?
International Symposium on the Foundations of Software Engineering (SIGSOFT FSE 2012)
http://www.ifi.uzh.ch/seal/people/panichella/tools.html
66. Score
Computed
Aggregating
the
Factors
in
a
Weighted
Sum
Identify Past Successful Mentors
∑=
5
1i
ii fw
F1
F2 >
F3
F4
-‐
1st
F5
74. 74
Is it Possible to Recommend Mentors
To Project Newcomers?
85%$
30%$
100%$
64%$
94%$
81%$
24%$
100%$
77%$
82%$
0%$
10%$
20%$
30%$
40%$
50%$
60%$
70%$
80%$
90%$
100%$
Apache$ FreeBSD$ PostgreSQL$ Python$ Samba$
Precision$
Mentor$RecommendaGons:$Precision$on$Top$1$and$Top$2$
Top$1$
Top$2$
75. 75
Results When are Used Both Mails and Issues
80%$
67%$
90%$ 91%$ 92%$
72%$
45%$
89%$
84%$
76%$
0%$
10%$
20%$
30%$
40%$
50%$
60%$
70%$
80%$
90%$
100%$
Apache$ FreeBSD$ PostgreSQL$ Python$ Samba$
Precision$
Mentor$RecommendaGons:$Precision$on$Top$1$and$Top$2$
Top$1$
Top$2$
76. 76
Mentor
Recommenda]ons:
Precision
on
Top
1
and
Top
2
Precision
0%
25%
50%
75%
100%
Apache FreeBSD PostgreSQL Python Samba
76%
84%
89%
45%
72%
92%91%90%
67%
80%
Top
1
Top
2
YODA Make it Possible
to Recommend Mentors
It is Possible to Recommend Mentors
To Project Newcomers?
77. PART III – B)
Mining
Source
Code
Descriptions
from
Developer
Communications
to
Improve
Newcomers
Program
Comprehension
77
79. Mining Summaries
We argue that messages exchanged among contributors/developers are a
useful source of information to help understanding source code.
In such situations developers need to infer knowledge from,
the
source
code
itself
Newcomer
Can
find
Source
code
description
79
80. Mining Summaries
We argue that messages exchanged among contributors/developers are a
useful source of information to help understanding source code.
In such situations developers need to infer knowledge from,
the
source
code
itself
source
code
descriptions
in
external
artifacts.
Newcomer
Can
find
Source
code
description
..................................................
When call the method IndexSplitter.split(File
destDir, String[] segs) from the Lucene cotrib
directory(contrib/misc/src/java/org/apache/
lucene/index) it creates an index with
segments descriptor file with wrong data.
Namely wrong is the number representing the
name of segment that would be created next in
this index.
..................................................
CLASS:
IndexSplitter METHOD:
split
80
82. • Step 1: Downloading SO discussions relying on
its REST interface and tracing them onto classes
• Step 2: Extracting paragraphs
• Step 3: Tracing paragraphs onto methods
( Discards Paragraphs of discussions with 0 Votes)
• Step 4: Heuristic based Filtering
• Step 5: Similarity based Filtering
CODES:
Approach for Mining Method Descriptions
CarmineVassallo, Sebastiano Panichella, Massimiliano Di Penta, Gerardo Canfora:
CODES: mining source code descriptions from developers discussions.
BEST TOOL AWARD at the 22nd International Conference on Program Comprehension (IEEE ICPC 2014)
82
http://www.ifi.uzh.ch/seal/people/panichella/tools.html
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta,Andrian Marcus, Gerardo Canfora:
Mining source code descriptions from developer communications.
International Conference on Program Comprehension (IEEE ICPC 2012)
83. Help Newcomer Program Comprehension with
extraction of summaries of code elements from
Newcomer
Q&A
SITE
ISSUE
TRACKER
MAILING
LIST
83
Supporting Software Development
85. Approach Precision vs.
Number of Method Covered
We
mine
useful
java
methods
description
from
developers
discussions
85
86.
PART III
Recommenders
1)
YODA
make
it
possible
to
recommend
mentors
with
a
precision
higher
than
67%
3)
Combining
Mails
and
Issues
improve
recommenders’
performance
2)
CODES
identifies
relevant
descriptions
with
a
precision
higher
than
79%
86
89. Future Work…
Analysing
developers
collaboration/communication
from
mailing
list
and
issue
tracker
to
profile
developers
roles
or
teams.
We
will
aim
at
building
recommenders
to
help
newcomer
in
the
choice
of
appropriate
patterns
to
navigate
software
documentation
during
maintenance
tasks.
New
Recommenders
89
Tester
Maintainer
Mentor
90. Future Work…
Develop
new
recommenders
able
to
describing
dependencies
with
third-‐party
libraries
and
help
newcomers
to
avoid
changes
that
could
break
the
dependency
with
used
libraries.
New
Recommenders
90
Build
on
top
of
the
the
automatic
labelling
approach
more
accurate
summarization
techniques
to
support
newcomers
during
program
comprehension
92. Advice to Young
Newcomers (Researchers)
92
Pleasures
and
rewards
of
research:
-‐ PASSION
is
the
most
important
thing!
-‐ Be
aware
that
with
this
work
you
will
not
earn
so
much
moneys.
If
you
have
ANY
DOUBT
don’t
do
it!
-‐
WORK
HARD
(even
in
the
week-‐end)
and
do
good
work!
Do
you
really
want
to
be
a
PhD
student?
93. Ways to fail a Ph.D
93
Learn too much…
- Some students go to Ph.D. school because they want to learn.
- Let there be no mistake: Ph.D. school involves a lot of learning.
But, it requires focused learning directed toward an eventual thesis.
- Taking (or sitting in on) non-required classes outside one's focus
is almost always a waste of time, and it's always unnecessary.
Expect perfection…
Perfection cannot be attained. It is approached in the limit…
Something that is “good enough” can be achieved…
Procrastinate…
Chronic perfectionists also tend to be procrastinators.
So do eternal students with a drive to learn instead of research.
94. Ways to fail a Ph.D
94
Ignore Conference Meetings…
Start complaining instead working…
- Complaining about some of the following staff instead working is a big mistake:
a) Your working environment.
b) The absence of your advisor.
c) Anything else.
-> “PhD student” doesn’t means “student”
…It means, “Young Researcher”…
…we can expect ADVISES from our advisor NOT ideas…
- Conferences are the best places to be aware of “new emerging ideas”
- Be in contact with other peers and share ideas. “Best idea born in a collaborative
matter”.
95. Thing to do during a Ph.D
95
Be focused!
Be focused on real problems!
- It is important, certainly work in a wider research has a wide impact, however,
it should be around a precise problem.
A wide research will get you better job offers, but proven depth is more likely to get you
fame.
- The best research often comes from REAL PROBLEMS: I think that much
academic research is, well, academic.
- Talk with local companies: you may find that your best ideas, even general
theoretical ideas, arise from their needs.
Validate your research work!
- Validate your research ideas is a very effort consuming task, it is very
important try to involve developers (doing a survey or controlled experiments)
in this loop.