Presented at: The 44th IEEE/ACM International Conference on Software Engineering (ICSE 2022)
Date of Conference: May 2022
Conference Location: Virtual & Pittsburgh, PA, USA
This paper was originally published in the Empirical Software Engineering journal
The preprint is available at: https://arxiv.org/pdf/2110.12229
A video of the presentation is available at: https://youtu.be/suWRL2nmxMs
How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow
1. How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow
Anthony Peruma · Steven Simmons · Eman AlOmar · Christian Newman · Mohamed Mkaouer · Ali Ouni
ICSE Journal-First Paper
2. Software Refactoring
An essential part of software maintenance and evolution
Improves the internal quality of the system and reduces its technical debt
Research in refactoring is well-established
➢ Detection of refactoring opportunities & code recommendations
3. Refactoring research is continually evolving
Are developers applying refactorings in the same environments, on problems with the same characteristics and context, as researchers assume?
• Refactoring is no longer about correcting code smells
• Industry projects are complex and require more complicated solutions
• Prior studies interviewed developers
4. GOAL
Understand the trends and challenges around developer discussions on software refactoring concepts and activities
5. Stack Overflow
The most popular programming-specific question and answer forum
Over 19 million questions and one million users
6. Research Questions
RQ1: How have refactoring discussions on Stack Overflow grown over the years?
RQ2: What do developers discuss in refactoring-based Stack Overflow posts?
RQ3: Which topics are the most popular and difficult among refactoring-related questions?
8. Experiment design
Posts – Questions, Answers & Accepted Answers
Tags – Associated with a question
Score – The higher the score, the better
View Count – Number of times the post was viewed
Posts with the refactor tag
Posts having ‘refactor’ in the title
Quantitative – database queries and custom code
Qualitative – manually analyzing a statistically significant sample
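The slides do not say how the statistically significant sample was sized. One common way to size such a sample is Cochran's formula with a finite population correction. The sketch below assumes a 95% confidence level (z = 1.96), a 5% margin of error, and maximum variability (p = 0.5). These parameters and the function name are assumptions for illustration only.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction.

    z, margin, and p are assumed defaults, not values reported in the slides.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2          # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # shrink for a finite population


if __name__ == "__main__":
    # 9,489 is the number of refactoring questions reported in the deck.
    print(sample_size(9489))
```

Under these assumed parameters, a population of 9,489 questions calls for a sample of a few hundred questions; a stricter margin would raise that number.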
9. Anatomy of a post
QUESTION: title, body, tags, score, views
ANSWER: score, accepted answer
13. RQ 1: How have refactoring discussions on Stack Overflow grown over the years?
RQ 1.1: How have refactoring posts grown throughout the years?
RQ 1.2: What is the distribution of questions and answers among developers?
RQ 1.3: What are the tags that are associated with refactoring questions?
14. RQ 1.1: How have refactoring posts grown throughout the years?
Approach:
• Extract all questions that had the term ‘refactor’ in either the title or tag
• Extract all answers (i.e., accepted and non-accepted) associated with the questions
Findings:
• 9,489 questions, of which 828 did not have an associated answer
• Median time between a question and its first answer is 0.27 hours
• While the number of questions and accepted answers has increased yearly, the size of each yearly increase has been falling
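The extraction and the time-to-first-answer measurement above can be sketched as follows. This is a minimal illustration on hand-made rows, not the study's actual queries. The field names follow the Stack Overflow data dump (`Id`, `Title`, `Tags`, `ParentId`, `CreationDate`), while the `<tag>` encoding of `Tags` and the helper names are assumptions.

```python
from datetime import datetime
from statistics import median

# Hypothetical rows for illustration only.
questions = [
    {"Id": 1, "Title": "How do I refactor this loop?", "Tags": "<java><loops>",
     "CreationDate": "2020-01-01T10:00:00"},
    {"Id": 2, "Title": "Splitting a large class", "Tags": "<c#><refactoring>",
     "CreationDate": "2020-02-01T08:00:00"},
    {"Id": 3, "Title": "Sorting a list", "Tags": "<python>",
     "CreationDate": "2020-03-01T09:00:00"},
    {"Id": 4, "Title": "Refactor nested callbacks", "Tags": "<javascript>",
     "CreationDate": "2020-04-01T12:00:00"},
]
answers = [
    {"ParentId": 1, "CreationDate": "2020-01-01T11:00:00"},
    {"ParentId": 1, "CreationDate": "2020-01-01T10:15:00"},
    {"ParentId": 2, "CreationDate": "2020-02-01T08:30:00"},
    {"ParentId": 3, "CreationDate": "2020-03-01T09:05:00"},
]


def is_refactoring_question(question):
    """True when 'refactor' appears in the title or in any tag."""
    return "refactor" in question["Title"].lower() or "refactor" in question["Tags"].lower()


def first_answer_times(answers):
    """Map each question Id to the timestamp of its earliest answer."""
    first = {}
    for answer in answers:
        when = datetime.fromisoformat(answer["CreationDate"])
        pid = answer["ParentId"]
        if pid not in first or when < first[pid]:
            first[pid] = when
    return first


selected = [q for q in questions if is_refactoring_question(q)]
first = first_answer_times(answers)
unanswered = [q["Id"] for q in selected if q["Id"] not in first]
delays_hours = [
    (first[q["Id"]] - datetime.fromisoformat(q["CreationDate"])).total_seconds() / 3600
    for q in selected if q["Id"] in first
]
median_delay = median(delays_hours)
```

Matching on the substring `refactor` also catches `refactoring` and `refactored`, which appears to be the intent of the approach described above.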
15. RQ 1.2: What is the distribution of questions and answers among developers?
Approach:
• Utilize the OwnerUserId field to identify the creator of a post
Findings:
• 7,795 distinct users are responsible for creating all refactoring questions
• Most developers who ask questions tend to only ask questions and not answer them
• Most developers ask only one refactoring question
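A minimal sketch of how the `OwnerUserId` field can yield these distributions. The rows are made up; `PostTypeId` (1 for a question, 2 for an answer) follows the Stack Overflow data dump, and the variable names are illustrative rather than taken from the study.

```python
from collections import Counter

# Hypothetical posts for illustration only.
posts = [
    {"PostTypeId": 1, "OwnerUserId": 101},
    {"PostTypeId": 1, "OwnerUserId": 101},
    {"PostTypeId": 1, "OwnerUserId": 102},
    {"PostTypeId": 1, "OwnerUserId": 103},
    {"PostTypeId": 2, "OwnerUserId": 103},
    {"PostTypeId": 2, "OwnerUserId": 200},
]

# Number of questions asked per user.
questions_per_user = Counter(p["OwnerUserId"] for p in posts if p["PostTypeId"] == 1)
# Users who wrote at least one answer.
answerers = {p["OwnerUserId"] for p in posts if p["PostTypeId"] == 2}

distinct_askers = len(questions_per_user)
ask_only_users = sorted(u for u in questions_per_user if u not in answerers)
single_question_users = sorted(u for u, n in questions_per_user.items() if n == 1)
```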
16. RQ 1.3: What are the tags that are associated with refactoring questions?
Approach:
• Extract all distinct tags from all refactoring posts
• Manual review of the tags
Findings:
• 3,053 distinct tags
• Top five tags are related to programming languages (or web frameworks) – Java, C#, JavaScript, Ruby on Rails, and Ruby
• Constant rise in JavaScript questions
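Tag extraction of this kind might look like the sketch below. It assumes the `Tags` field stores tags in angle brackets (for example `<java><refactoring>`), as in older Stack Overflow data dumps. The sample rows and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical Tags values for illustration only.
tag_fields = [
    "<java><refactoring>",
    "<c#><refactoring>",
    "<java><design-patterns>",
]


def parse_tags(tag_field):
    """Split an angle-bracket encoded Tags value into individual tag names."""
    return re.findall(r"<([^>]+)>", tag_field)


tag_counts = Counter(tag for field in tag_fields for tag in parse_tags(field))
distinct_tags = set(tag_counts)
```

Calling `tag_counts.most_common(5)` then gives a candidate list for the manual review step.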
17. RQ 1 Summary
How have refactoring discussions on Stack Overflow grown over the years?
• Stack Overflow is a popular venue for refactoring discussions between developers
• Refactoring questions usually receive a response in a short period of time
• There is a rise in questions around dynamically typed languages such as JavaScript
• Most tags are on algorithm and programming concepts, followed by frameworks
18. RQ 2: What do developers discuss in refactoring-based Stack Overflow posts?
RQ 2.1: What are the frequent terms utilized by developers in refactoring discussions?
RQ 2.2: To what extent do traditional refactoring opportunities, known in existing literature, match with the challenges faced by developers in Stack Overflow posts?
RQ 2.3: What are the topics around software refactoring that are being asked by developers?
19. RQ 2.1: What are the frequent terms utilized by developers in refactoring discussions?
Approach:
• Extract the top keywords as bigrams from question posts
• Check for terms that correspond to refactoring operations
Findings:
• The IDE ‘visual studio’ plays an important part in refactoring discussions – the IDE supports multiple languages
• ‘refactoring tool’ shows the importance of, and reliance on, tools and IDEs in refactoring activities
• ‘legacy code’ highlights a common reason why developers request support with refactoring
• Code extraction and moving are frequently discussed
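Bigram extraction can be sketched as below. This is a simplified illustration, not the study's pipeline: the stopword list is a tiny assumed sample, the tokenizer is a plain regular expression, and the titles are invented.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "in", "for", "to", "of", "and", "with", "my"}


def bigrams(text):
    """Return adjacent word pairs, skipping any pair that contains a stopword."""
    tokens = re.findall(r"[a-z#+]+", text.lower())
    return [
        (left, right)
        for left, right in zip(tokens, tokens[1:])
        if left not in STOPWORDS and right not in STOPWORDS
    ]


# Hypothetical question titles.
titles = [
    "Refactoring legacy code in Visual Studio",
    "Best refactoring tool for legacy code",
    "Visual Studio refactoring tool",
]

bigram_counts = Counter(pair for title in titles for pair in bigrams(title))
```

Pairs are formed before stopwords are filtered, so words separated by a stopword (such as "code in visual") are never treated as adjacent.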
20. RQ 2.2: To what extent do traditional refactoring opportunities, known in existing literature, match with the challenges faced by developers in Stack Overflow posts?
Approach:
• Occurrence of Self-Affirmed Refactoring terms in questions
Findings:
• Frequent mention of key internal quality attributes -- dependency, inheritance
• Use of terms such as ‘clean up’ or ‘redesign’ to discuss refactorings
• Non-functional attribute discussion around ‘readability’, ‘efficiency’, and ‘performance’
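Counting term occurrences across questions might be done as in this sketch. The term list contains only the terms named on this slide, not the full Self-Affirmed Refactoring vocabulary, and the sample questions and function name are hypothetical.

```python
import re
from collections import Counter

# Only the terms quoted on the slide; the complete term list is not shown in the deck.
TERMS = ["dependency", "inheritance", "clean up", "redesign",
         "readability", "efficiency", "performance"]


def questions_mentioning(questions, terms=TERMS):
    """Count, for each term, how many questions mention it at least once."""
    counts = Counter()
    for text in questions:
        lowered = text.lower()
        for term in terms:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                counts[term] += 1
    return counts


# Hypothetical question texts.
sample_questions = [
    "How can I clean up this class to improve readability?",
    "Redesign to remove a circular dependency",
    "Does this refactoring hurt performance or readability?",
]

term_counts = questions_mentioning(sample_questions)
```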
21. RQ 2.3: What are the topics around software refactoring that are being asked by developers?
Approach:
• Topic modeling analysis using latent Dirichlet allocation
• Includes text-preprocessing
• Use of topic coherence, perplexity and visualization to determine the optimum number of topics
• Manual analysis of a statistically significant sample of questions
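The slides name topic coherence as one criterion for choosing the number of topics but do not say which coherence measure was used. As an illustration only, the sketch below computes the UMass coherence of one topic's top words from document co-occurrence counts. Training the LDA model itself is omitted, and the documents and word lists are invented.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs.

    top_words should be ordered from most to least probable in the topic.
    D(...) is the number of documents containing all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denominator = doc_freq(top_words[l])
            if denominator:  # skip words that never occur in the corpus
                score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / denominator)
    return score


# Hypothetical tokenized documents.
documents = [
    ["rename", "class", "ide"],
    ["rename", "ide", "tool"],
    ["sql", "query", "index"],
    ["sql", "index", "table"],
]

coherent_topic = umass_coherence(["rename", "ide"], documents)
mixed_topic = umass_coherence(["rename", "sql"], documents)
```

Words that tend to appear in the same documents score higher, so one would compare the average coherence of models trained with different topic counts and weigh it alongside perplexity and visualization.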
22. RQ 2.3: What are the topics around software refactoring that are being asked by developers?
Findings:
Code Optimization
• Simplifying code structures
• Improve readability and reusability
• Reduce lengthy switch-case statements, loops, and duplicate code
Tools and IDEs
• Perform complex refactorings
• Renaming software artifacts
Architecture and Design Patterns
• Accumulated code updates violate design principles
• Applying SOLID, DRY, SRP, and KISS principles
Unit Testing
• Challenges with evolving the test suite alongside the source code
Database
• Business logic within SQL scripts grows in length and complexity
• Challenges with readability, design principles, and system performance
23. RQ 2 Summary
What do developers discuss in refactoring-based Stack Overflow posts?
• Refactoring discussions revolve around five topics – Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database
• Maintainability is a key concern
• Improving readability and reusability is of utmost concern
• Challenges in synchronizing refactoring changes across software engineering artifacts
24. RQ 3: Which topics are the most popular and difficult among refactoring-related questions?
25. Which topics are the most popular and difficult among refactoring-related questions?
Approach:
• Measure popularity using a question's view count, favorite count, and score
• Measure difficulty: questions without answers, without accepted answers and median time for an accepted answer
Findings:
• Questions on Tools/IDEs are the most popular; Database questions are the least popular
• Tools/IDE questions get more views than Code Optimization questions
• Questions on Tools/IDEs go unanswered more often than questions on other topics
• Code Optimization questions are less challenging to answer
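The popularity and difficulty measures above can be aggregated per topic roughly as follows. This is a sketch on invented rows. `ViewCount`, `FavoriteCount`, `Score`, and `AnswerCount` mirror Stack Overflow data dump fields, while `topic` and `hours_to_accepted` are hypothetical derived columns, and medians are an assumed choice of aggregate.

```python
from statistics import median

# Hypothetical labeled questions; hours_to_accepted is None when no answer was accepted.
questions = [
    {"topic": "Tools and IDEs", "ViewCount": 1200, "FavoriteCount": 3, "Score": 5,
     "AnswerCount": 0, "hours_to_accepted": None},
    {"topic": "Tools and IDEs", "ViewCount": 800, "FavoriteCount": 1, "Score": 3,
     "AnswerCount": 2, "hours_to_accepted": 4.0},
    {"topic": "Code Optimization", "ViewCount": 300, "FavoriteCount": 0, "Score": 1,
     "AnswerCount": 3, "hours_to_accepted": 0.5},
    {"topic": "Code Optimization", "ViewCount": 500, "FavoriteCount": 2, "Score": 2,
     "AnswerCount": 1, "hours_to_accepted": 1.5},
]


def topic_metrics(questions):
    """Aggregate popularity and difficulty measures for each topic."""
    by_topic = {}
    for q in questions:
        by_topic.setdefault(q["topic"], []).append(q)

    metrics = {}
    for topic, qs in by_topic.items():
        accepted = [q["hours_to_accepted"] for q in qs if q["hours_to_accepted"] is not None]
        metrics[topic] = {
            # Popularity
            "median_views": median(q["ViewCount"] for q in qs),
            "median_favorites": median(q["FavoriteCount"] for q in qs),
            "median_score": median(q["Score"] for q in qs),
            # Difficulty
            "pct_without_answer": 100 * sum(q["AnswerCount"] == 0 for q in qs) / len(qs),
            "pct_without_accepted": 100 * (len(qs) - len(accepted)) / len(qs),
            "median_hours_to_accepted": median(accepted) if accepted else None,
        }
    return metrics


metrics = topic_metrics(questions)
```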
28. Research/Academic community
• Course curriculum to reflect real-world settings
• Adaptation of refactoring operations for multiple programming languages and artifact types
• Improve and extend the applicability of readability quality metrics
• Expand the study and applicability of reusability beyond source code
29. Tool/IDE vendor community
• Automatic synchronization between project artifacts
• Enhanced rename refactoring functionality
• Enhance the user experience
30. Developer community
• Extend coding standards utilized in projects to support naming standards for all project artifacts
• Integrating code quality tools into the build process for the early detection of poor coding practices
• Perform frequent and early peer-reviews on all project artifacts
32. Conclusion
A quantitative and qualitative analysis of refactoring questions asked by developers on Stack Overflow
Findings:
• Stack Overflow is a popular venue for developers to seek assistance with refactoring
• Growth in refactoring dynamically typed code such as Python and JavaScript
• Most questions are around optimizing source code to improve readability and reusability
• Refactoring is not limited to source code – database and unit testing artifact refactoring is common
• Tools are also a popular discussion topic among developers
Preprint: https://arxiv.org/abs/2110.12229