Mining the Modern Code Review Repositories: A Dataset of People, Process and Product (MSR 2016)

Slides for the data paper "Mining the Modern Code Review Repositories: A Dataset of People, Process and Product" in the proceedings of the 13th International Conference on Mining Software Repositories (MSR 2016), Austin, TX, May 2016.

Xin Yang, Raula G. Kula, Norihiro Yoshida, Hajimu Iida
Osaka University, Japan; Nagoya University, Japan; NAIST, Japan
MSR 2016 data showcase, May 14–15, 2016, Austin, Texas

1. Mining the Modern Code Review Repositories: A Dataset of People, Process and Product
   Xin Yang, Raula G. Kula, Norihiro Yoshida, Hajimu Iida
   Osaka University, Japan; Nagoya University, Japan; NAIST, Japan
   MSR 2016 data showcase, May 14–15, 2016, Austin, Texas

2. An Overview of the Code Review Dataset
   ● Code Review
   ● Source Code
   ● Human / Social

3-5. Why we made this dataset?
   Our previous work (Hamasaki et al., MSR '13)*: our JSON-based dataset
   Some feedback:
   ● “Hard to query...”
   ● “Hard to convert...”
   ● “Unable to access the source code...”
   Script
   *Hamasaki et al., “Who does what during a code review? Datasets of OSS peer review repositories”, MSR '13

6. Typical Modern Code Review Process

7. You can mine from three different aspects: People, Process, Product

8. Dataset Statistics (updated to May 2015)
                 OpenStack  LibreOffice  AOSP     Qt       Eclipse
   Time          4 years    3 years      7 years  4 years  3 years
   Repositories  611        20           567      111      189
   Patches       173,749    13,597       63,610   110,172  9,168
   Participants  5,091      437          3,334    1,437    759

9. Download the Dataset: goo.gl/Wi4UoJ

Editor's Notes

The dataset combines code review data from five successful OSS projects, source code from Git, and human and social information (anonymized usernames and email addresses).
Our previous work at MSR 2013 provided a dataset in JSON format and a refined dataset in CSV format. Over the past three years we have received a lot of feedback from our dataset users. Some users complained that: ... Thus, we improved the dataset by converting the JSON into a MySQL database and by providing shell scripts to access the source code...
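To illustrate why a relational form is easier to query than raw JSON, here is a minimal sketch that loads a few JSON review records into a SQL table and runs an aggregate query. It is not the authors' conversion script. The field names (`change_id`, `project`, `owner`, `status`), the `changes` table, and the sample records are all hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so that the example is self-contained.

```python
import json
import sqlite3

# Hypothetical JSON export: one object per code review change.
# Field names and values are illustrative only, not the dataset's schema.
RAW_JSON = """
[
  {"change_id": 1, "project": "demo/core", "owner": "user_a", "status": "MERGED"},
  {"change_id": 2, "project": "demo/core", "owner": "user_b", "status": "ABANDONED"},
  {"change_id": 3, "project": "demo/docs", "owner": "user_a", "status": "MERGED"}
]
"""


def load_changes(conn, raw_json):
    """Create a (hypothetical) `changes` table and insert one row per JSON record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes ("
        "change_id INTEGER PRIMARY KEY, project TEXT, owner TEXT, status TEXT)"
    )
    records = json.loads(raw_json)
    # Named placeholders let each JSON object (a dict) bind directly to a row.
    conn.executemany(
        "INSERT INTO changes (change_id, project, owner, status) "
        "VALUES (:change_id, :project, :owner, :status)",
        records,
    )
    conn.commit()
    return len(records)


def merged_per_project(conn):
    """Count merged changes per project: one SQL statement instead of a JSON traversal."""
    cur = conn.execute(
        "SELECT project, COUNT(*) FROM changes "
        "WHERE status = 'MERGED' GROUP BY project ORDER BY project"
    )
    return cur.fetchall()


# sqlite3 is an assumption used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
inserted = load_changes(conn, RAW_JSON)
print(inserted, merged_per_project(conn))
```

Once the records are in tables, a question such as "how many merged changes per project" becomes a single `GROUP BY` query, which is the kind of convenience the "Hard to query..." feedback asks for.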
This is a typical modern code review (MCR) process. Authors create and update their patches (changes). Reviewers perform code reviews on the changes and send feedback to the authors. Continuous Integration (CI) tools build and test the changes. After several rounds of revision, the changes pass review and are integrated into the code repositories.
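The review loop described in this note can be sketched as a small state model. This is only an illustration under stated assumptions: the `Change` and `Revision` classes, the status names, and the rule that a change is integrated once both the reviewers and the CI accept the same revision are hypothetical simplifications, not a description of how Gerrit or the dataset represents reviews.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    review_approved: bool  # reviewers' verdict on this revision
    ci_passed: bool        # CI build-and-test result for this revision


@dataclass
class Change:
    author: str
    revisions: list = field(default_factory=list)
    status: str = "OPEN"  # hypothetical status names: OPEN, MERGED

    def submit_revision(self, review_approved, ci_passed):
        """The author uploads a revision; reviewers and CI tools respond to it."""
        if self.status != "OPEN":
            raise ValueError("change is already closed")
        revision = Revision(len(self.revisions) + 1, review_approved, ci_passed)
        self.revisions.append(revision)
        # Assumed rule: integrate only when reviewers and CI both accept the revision.
        if review_approved and ci_passed:
            self.status = "MERGED"
        return self.status


change = Change(author="author_1")
change.submit_revision(review_approved=False, ci_passed=True)  # reviewers request changes
change.submit_revision(review_approved=True, ci_passed=False)  # CI build or tests fail
final_status = change.submit_revision(review_approved=True, ci_passed=True)
print(final_status, len(change.revisions))
```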
Our dataset captures three different aspects of the code review process. First, how developers, reviewers and CI tools collaborate (see People). Second, what the life cycle of a change looks like, from initial commit to final decision (see Process). Finally, what the product of code review is (see Product).
Some basic statistics about our dataset. We retrieved data from five large, successful OSS projects: OpenStack, LibreOffice, AOSP, Qt and Eclipse.
Time: how long the project has used Gerrit code review (counted from when it adopted Gerrit).
Repositories: how many repositories are involved.
Patches: how many changes have been created.
Participants: how many people have participated.
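As a small worked example, the figures from the Dataset Statistics slide can be turned into derived measures such as patches per participant. The numbers below are copied from the slide. Pairing each column with a project name assumes that the slide's columns follow the order in which this note lists the projects. The dictionary layout and function names are illustrative only.

```python
# Figures from the "Dataset Statistics (updated to May 2015)" slide.
# Assumption: slide columns follow the project order given in the note
# (OpenStack, LibreOffice, AOSP, Qt, Eclipse).
STATS = {
    "OpenStack":   {"years": 4, "repositories": 611, "patches": 173_749, "participants": 5_091},
    "LibreOffice": {"years": 3, "repositories": 20,  "patches": 13_597,  "participants": 437},
    "AOSP":        {"years": 7, "repositories": 567, "patches": 63_610,  "participants": 3_334},
    "Qt":          {"years": 4, "repositories": 111, "patches": 110_172, "participants": 1_437},
    "Eclipse":     {"years": 3, "repositories": 189, "patches": 9_168,   "participants": 759},
}


def patches_per_participant(stats):
    """Average number of patches per participant for each project."""
    return {
        name: round(row["patches"] / row["participants"], 1)
        for name, row in stats.items()
    }


def total_patches(stats):
    """Total number of patches across all projects."""
    return sum(row["patches"] for row in stats.values())


ratios = patches_per_participant(STATS)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:12s} {ratio:6.1f} patches per participant")
print("total patches:", total_patches(STATS))
```

Participant counts are not summed across projects here, because the same person may take part in more than one project.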
You can download our dataset here and now!
