Primers or Reminders?
The Effects of Existing Review Comments on Code Review
Davide Spadini, Gül Calikli, Alberto Bacchelli
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642954
@DavideSpadini ishepard
Motivation
Peer review of manuscripts
PyDriller: Python Framework for Mining Software Repositories
Davide Spadini
Delft University of Technology
Software Improvement Group
Delft, The Netherlands
d.spadini@sig.eu
Maurício Aniche
Delft University of Technology
Delft, The Netherlands
m.f.aniche@tudelft.nl
Alberto Bacchelli
University of Zurich
Zurich, Switzerland
bacchelli@i.uzh.ch
ABSTRACT
Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repository. In this paper, we present PyDriller, a Python Framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python Framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity.
URL: https://github.com/ishepard/pydriller,
Materials: https://doi.org/10.5281/zenodo.1327363,
Pre-print: https://doi.org/10.5281/zenodo.1327411
CCS CONCEPTS
• Software and its engineering;
KEYWORDS
Mining Software Repositories, GitPython, Git, Python
ACM Reference Format:
Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’18), November 4–9, 2018, Lake Buena Vista, FL, USA. ACM, New York, NY, USA, 4 pages.
https://doi.org/10.1145/3236024.3264598
1 INTRODUCTION
Mining software repository (MSR) techniques allow researchers to
analyze the information generated throughout the software devel-
follow [20], predicting classes that are more prone to change/defects [3, 6, 16, 17], and identifying the core developers of a software team to transfer knowledge [12].
Among the different sources of information researchers can use, version control systems, such as Git, are among the most used ones. Indeed, version control systems provide researchers with precise information about the source code, its evolution, the developers of the software, and the commit messages (which explain the reasons for changing).
Nevertheless, extracting information from Git repositories is not trivial. Indeed, many frameworks can be used to interact with Git (depending on the preferred programming language), such as GitPython [1] for Python, or JGit for Java [8]. However, these tools are often difficult to use. One of the main reasons for such difficulty is that they encapsulate all the features from Git, hence, developers are forced to write long and complex implementations to extract even simple data from a Git repository.
In this paper, we present PyDriller, a Python framework that helps developers to mine software repositories. PyDriller provides developers with simple APIs to extract information from a Git repository, such as commits, developers, modifications, diffs, and source code. Moreover, as PyDriller is a framework, developers can further manipulate the extracted data and quickly export the results to their preferred formats (e.g., CSV files and databases).
To evaluate the usefulness of our tool, we compare it with the state-of-the-art Python framework GitPython, in terms of implementation complexity, performance, and memory consumption. Our results show that PyDriller requires significantly fewer lines of code to perform the same task when compared to GitPython, with only a small drop in performance. Also, we asked six developers to perform tasks with both tools and found that all developers spend less time in learning and implementing tasks in PyDriller.
2 PYDRILLER
PyDriller is a wrapper around GitPython that eases the extraction of information from Git repositories. The most significant differ-
Peer review of manuscripts
- Asynchronous
-  1 reviewer per peer review
- Reviewers judge the manuscript independently from each other

Code review
- Asynchronous
-  1 reviewer per code review
- Reviews are immediately visible to the other reviewers
Could this visibility bias the other reviewers?
Availability Bias
• Availability bias is one type of cognitive bias
• It is the tendency to overestimate the likelihood of events with greater availability in memory (recent memories)
• By reading other reviewers’ comments, the reviewer might be biased toward finding the same types of errors, thus resulting in a biased code review outcome.
• Are reviewers biased by other reviewers’ comments? Should we change the code review process and do something similar to a manuscript review?
Research Questions
What is the effect of priming a reviewer with comments on …
RQ1: … a bug type that is not normally considered?
RQ2: … a bug type that is normally considered?
Demographics, confounders
Treatment Group: review with 1 comment of a previous reviewer
Control Group: review without comments
Bug types, in both groups:
- Normally considered: Corner Case
- Not normally considered: NPE on parameters
Questions on the code review
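The two bug types named in the study design, a corner case and an NPE on parameters, can be pictured with a small sketch. This is only an illustration under stated assumptions: the functions below are hypothetical and are not the code used in the experiment. NPE (NullPointerException) is a Java term; Python is used here, where the closest counterpart is an AttributeError raised when a method is called on a None argument.

```python
from typing import Optional, Sequence


def average(values: Sequence[float]) -> float:
    """Hypothetical corner-case bug: an empty sequence divides by zero."""
    return sum(values) / len(values)


def shout(name: Optional[str]) -> str:
    """Hypothetical missing-None-check bug: a None argument raises
    AttributeError, the Python analogue of an NPE on a parameter."""
    return name.upper() + "!"


def average_fixed(values: Sequence[float]) -> float:
    """Handles the empty-input corner case explicitly."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def shout_fixed(name: Optional[str]) -> str:
    """Validates the parameter before using it."""
    if name is None:
        raise ValueError("name must not be None")
    return name.upper() + "!"
```

Both defects are silent on typical inputs and only surface on an empty sequence or a None argument, which is why a reviewer has to look for them deliberately.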
RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered?
Reviewers primed on a not commonly considered bug are more likely to find other occurrences of this type of bug. However, this does not prevent them from also finding other types of bugs.
40%: “Extremely influenced”
40%: “Very influenced”
20%: “Somewhat influenced”
RQ2: What is the effect of priming a reviewer with comments on a bug type that is normally considered?
RQ2: Results
50%: “Extremely influenced”
10%: “Somewhat influenced”
40%: “Slightly/Not influenced”
Reviewers primed on an algorithmic bug perceive an influence, but are as likely as the others to find algorithmic bugs. Furthermore, primed participants did not capture fewer bugs of the other type.
Closing the circle
Peer review of manuscripts
Code review
 
Big Data projects.pdf
Big Data projects.pdfBig Data projects.pdf
Big Data projects.pdfssuserf0a206
 
From Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet ProtocolFrom Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet ProtocolStefan Ianta
 
OpenChain Japan Work Group - Meeting 27
OpenChain Japan Work Group - Meeting 27OpenChain Japan Work Group - Meeting 27
OpenChain Japan Work Group - Meeting 27Shane Coughlan
 

Semelhante a Primers or Reminders? The Effects of Existing Review Comments on Code Review (20)

The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigm
 
A $5 Billion Value (Linux Foundation, 2015)
A $5 Billion Value (Linux Foundation, 2015)A $5 Billion Value (Linux Foundation, 2015)
A $5 Billion Value (Linux Foundation, 2015)
 
lfpub_cp_cost_estimate2015 (1)
lfpub_cp_cost_estimate2015 (1)lfpub_cp_cost_estimate2015 (1)
lfpub_cp_cost_estimate2015 (1)
 
Building a Cyber Threat Intelligence Knowledge Management System (Paris Augus...
Building a Cyber Threat Intelligence Knowledge Management System (Paris Augus...Building a Cyber Threat Intelligence Knowledge Management System (Paris Augus...
Building a Cyber Threat Intelligence Knowledge Management System (Paris Augus...
 
Ecosystem WG
Ecosystem WGEcosystem WG
Ecosystem WG
 
IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdfData Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618
 
Complex Made Simple @ LF Energy Conference in Paris
Complex Made Simple @ LF Energy Conference in ParisComplex Made Simple @ LF Energy Conference in Paris
Complex Made Simple @ LF Energy Conference in Paris
 
OSS Projects Knowledge Mining with CROSSMINER, OW2con'18, June 7-8, 2018
OSS Projects Knowledge Mining with CROSSMINER, OW2con'18, June 7-8, 2018OSS Projects Knowledge Mining with CROSSMINER, OW2con'18, June 7-8, 2018
OSS Projects Knowledge Mining with CROSSMINER, OW2con'18, June 7-8, 2018
 
Easing IoT Development for Novice Programmers Through Code Recipes
Easing IoT Development for Novice Programmers Through Code RecipesEasing IoT Development for Novice Programmers Through Code Recipes
Easing IoT Development for Novice Programmers Through Code Recipes
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
Seven Ways to Boost Artificial Intelligence Research
Seven Ways to Boost Artificial Intelligence ResearchSeven Ways to Boost Artificial Intelligence Research
Seven Ways to Boost Artificial Intelligence Research
 
Self-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipesSelf-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipes
 
Big Data projects.pdf
Big Data projects.pdfBig Data projects.pdf
Big Data projects.pdf
 
From Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet ProtocolFrom Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet Protocol
 
OpenChain Japan Work Group - Meeting 27
OpenChain Japan Work Group - Meeting 27OpenChain Japan Work Group - Meeting 27
OpenChain Japan Work Group - Meeting 27
 

Mais de Delft University of Technology

Mais de Delft University of Technology (7)

Investigating Severity Thresholds for Test Smells
Investigating Severity Thresholds for Test SmellsInvestigating Severity Thresholds for Test Smells
Investigating Severity Thresholds for Test Smells
 
Test-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical StudyTest-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical Study
 
Practices and Tools for Better Software Testing
Practices and Tools for  Better Software TestingPractices and Tools for  Better Software Testing
Practices and Tools for Better Software Testing
 
PyDriller: Python Framework for Mining Software Repositories
PyDriller: Python Framework for Mining Software RepositoriesPyDriller: Python Framework for Mining Software Repositories
PyDriller: Python Framework for Mining Software Repositories
 
When Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review TestsWhen Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review Tests
 
On The Relation of Test Smells to Software Code Quality
On The Relation of Test Smells to Software Code QualityOn The Relation of Test Smells to Software Code Quality
On The Relation of Test Smells to Software Code Quality
 
To Mock or Not To Mock
To Mock or Not To MockTo Mock or Not To Mock
To Mock or Not To Mock
 

Último

Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 

Último (20)

young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 

Primers or Reminders? The Effects of Existing Review Comments on Code Review

  • 1. Primers or Reminders? The Effects of Existing Review Comments on Code Review Davide Spadini, Gül Calikli, Alberto Bacchelli This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642954
  • 2. Davide Spadini, Gül Calikli, Alberto Bacchelli @DavideSpadini ishepard Primers or Reminders? The Effects of Existing Review Comments on Code Review
  • 3. Motivation. Peer review of manuscripts [slide image: first page of "PyDriller: Python Framework for Mining Software Repositories", Davide Spadini, Maurício Aniche, and Alberto Bacchelli, ESEC/FSE ’18, https://doi.org/10.1145/3236024.3264598]. Code review.
  • 4. Motivation. Peer review of manuscripts [slide image: first page of the PyDriller paper]: Asynchronous. Code review: Asynchronous.
  • 5. Motivation. Peer review of manuscripts [slide image: first page of the PyDriller paper]: Asynchronous; 1 reviewer per peer review. Code review: Asynchronous; 1 reviewer per code review.
  • 6. Motivation Peer review of manuscripts: - Asynchronous - 1 reviewer per peer review - Reviewers judge the manuscript independently from each other. Code review: - Asynchronous - 1 reviewer per code review - Reviews are immediately visible to the other reviewers. Could this visibility bias the other reviewers?
  • 7. Availability Bias • Availability bias is one type of cognitive bias • It is the tendency to overestimate the likelihood of events with greater availability in memory (recent memories) • By reading other reviewers’ comments, the reviewer might be biased toward finding the same types of errors, resulting in a biased code review outcome. • Are reviewers biased by other reviewers’ comments? Should we change the code review process and do something similar to a manuscript review?
  • 8. Research Questions What is the effect of priming a reviewer with comments on … RQ1: … a bug type that is not normally considered? RQ2: … a bug type that is normally considered?
  • 9. Demographics, confounders. Treatment Group: review with 1 comment of a previous reviewer. Control Group: review without comments. Bug types reviewed in both groups: Normally considered: Corner Case; Not normally considered: NPE on parameters
  • 10. Treatment Group: review with 1 comment of a previous reviewer. Control Group: review without comments. Bug types reviewed in both groups: Normally considered: Corner Case; Not normally considered: NPE on parameters. Followed by: Questions on the code review
  • 11.
  • 12.
  • 13.
  • 14. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered?
  • 15. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered? Reviewers primed on a not commonly considered bug are more likely to find other occurrences of this type of bug. However, this does not prevent them from also finding other types of bugs. 40%: “Extremely influenced” 40%: “Very influenced” 20%: “Somewhat influenced”
  • 16. RQ2: What is the effect of priming a reviewer with comments on a bug type that is normally considered?
  • 17. RQ2: Results 50%: “Extremely influenced” 10%: “Somewhat influenced” 40%: “Slightly/Not influenced” Reviewers primed on an algorithmic bug perceive an influence, but are as likely as the others to find algorithmic bugs. Furthermore, primed participants did not capture fewer bugs of the other type.
  • 18. Closing the circle Peer review of manuscripts Code review