Presentation of the paper "Primers or Reminders? The Effects of Existing Review Comments on Code Review" published at ICSE 2020.
Authors:
Davide Spadini, Gül Calikli, Alberto Bacchelli
Link to the paper: https://research.tudelft.nl/en/publications/primers-or-reminders-the-effects-of-existing-review-comments-on-c
Primers or Reminders? The Effects of Existing Review Comments on Code Review
1. Primers or Reminders?
The Effects of Existing Review
Comments on Code Review
Davide Spadini, Gül Calikli, Alberto Bacchelli
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642954
2. Davide Spadini, Gül Calikli, Alberto Bacchelli
@DavideSpadini ishepard
3. Motivation
Peer review of manuscripts
[Figure: first page of the paper "PyDriller: Python Framework for Mining Software Repositories" by Davide Spadini, Maurício Aniche, and Alberto Bacchelli (ESEC/FSE ’18)]
Code review
4. Motivation
Peer review of manuscripts
- Asynchronous
Code review
- Asynchronous
5. Motivation
Peer review of manuscripts
- Asynchronous
- 1 reviewer per peer review
Code review
- Asynchronous
- 1 reviewer per code review
6. Motivation
Peer review of manuscripts
- Asynchronous
- 1 reviewer per peer review
- Reviewers judge the manuscript independently from each other
Code review
- Asynchronous
- 1 reviewer per code review
- Reviews are immediately visible to the other reviewers
Could this visibility bias the other reviewers?
7. Availability Bias
• Availability bias is one type of cognitive bias
• It is the tendency to overestimate the likelihood of events with greater
availability in memory (recent memories)
• By reading other reviewers’ comments, the reviewer might be biased toward finding the same types of errors, thus resulting in a biased code review outcome.
• Are reviewers biased by other reviewers’ comments? Should we change the code review process and do something similar to a manuscript review?
8. Research Questions
What is the effect of priming a reviewer with comments on …
RQ1: … a bug type that is not normally considered?
RQ2: … a bug type that is normally considered?
9. Demographics, confounders
Treatment Group: with 1 comment of a previous reviewer
Control Group: without comments
Bug types under review in both groups:
- Normally considered: Corner Case
- Not normally considered: NPE on parameters
10. Treatment Group: with 1 comment of a previous reviewer
Control Group: without comments
Bug types under review in both groups:
- Normally considered: Corner Case
- Not normally considered: NPE on parameters
Followed by: questions on the code review
11.
12.
13.
14. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not
normally considered?
15. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered?
Reviewers primed on a not commonly considered bug are more likely to find other occurrences of this type of bug. However, this does not prevent them from also finding other types of bugs.
40%: “Extremely influenced”
40%: “Very influenced”
20%: “Somewhat influenced”
16. RQ2: What is the effect of priming a reviewer with comments on a bug type
that is normally considered?
17. RQ2: Results
50%: “Extremely influenced”
10%: “Somewhat influenced”
40%: “Slightly/Not Influenced”
Reviewers primed on an algorithmic bug perceive an influence, but are as likely as the others to find algorithmic bugs. Furthermore, primed participants did not capture fewer bugs of the other type.
18. Closing the circle
Peer review of manuscripts
Code review