SlideShare uma empresa Scribd logo
1 de 30
Baixar para ler offline
Understanding the
  evolution of software
  project communities
       Tom Mens & Mathieu Goeminne
Service de Génie Logiciel, Institut d’Informatique
   Faculté des Sciences, Université de Mons
Research topic

       • Study of open source software evolution
       • Taking into account the community (social
           network) of persons surrounding a
           software project (developers, users, ...)
       • By analysing and combining data from
           different types of repositories


Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Goals
       •   Understand (through repository mining and analysis)
           •   How software quality is impacted by how the community is
               structured

           •   How do software product, project and community co-evolve

           •   How developers and users interact and influence one another

       •   Improve (through guidelines and tool support)
           •   The way in which communities should be structured

           •   The quality of the software process and product

           •   The interaction between developers and users

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Questions
   •   Is there a core group (of developers and/or users) being significantly
       more active than the others?
   •   How does a person contribute to different types of activities?
   •   Are the recurrent patterns of activity in the community?
   •   How do particular “events” impact the project?
   •   How does developer intake and turnover affect the project?
   •   Do we find evidence of Pareto principle (inequality of activity)
   •   Is there a “bus factor” effect?
   •   How does the activity distribution evolve over time?
   •   How does the activity distribution vary across different projects?

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Why open source?
       • Free access to source code, defect data,
           developer and user communication
       • Observable communities
       • Observable activities
       • Increasing popularity for personal and
           commercial use
       • A huge range of community and software
           sizes
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Methodology
       •   Exploit available data from different repositories
           •   code repositories
           •   mail repositories (mailing lists)
           •   bug repositories (bug trackers)

       •   Select open source projects
       •   Use Herdsman framework
           •   Based on FLOSSMetrics data extraction
           •   Use identity merging tool
           •   Use of econometrics
           •   Use of statistical analysis and visualisation

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Selected Projects
       • Criteria for selecting projects
          •    Availability of data from repositories
          •    Data processable by FLOSSMetrics tools
              •      CVSAnaly2, MLStats, Bicho
          •    Size of considered projects: persons
               involved, code size, activity in each
               repository
          •    Age of considered projects
Université de Mons    Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Selected Projects
                     Brasero               Evince            Subversion                Wine
  versioning
   system
                        git                   svn                   svn                  git

  age (years)             8                   11                    11                   11

 size (KLOC)           107                   580                   422                 2001

  #commits            4100                  4000                 51529                74500

    #mails             460                  1800                 24673                14000

     #bugs             250                   950                                       3300

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Herdsman framework
       • Exploiting available data from different
           repositories
          • code repositories: to detect committer
               activity
          • mail repositories (mailing lists): to detect
               mailer activity
          • bug repositories (bug trackers): to detect
               bug-related activities
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Herdsman Framework




Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Identity Merging
Comparing Merge Algorithms
• Based on a reference merge model
  • manually created
  • iterative approach
  • relying on information contained in different
  files (COMMITTERS, MAINTAINERS, AUTHORS, NEWS, README)
• Compute, for each algorithm, precision and
recall w.r.t. reference model
Reference merge model
 Brasero                     Evince




  Before: one repository account = one person
  After: one person may have multiple accounts
Comparing Merge
                 Algorithms
       • Simple
       • Bird (code and mail repositories only)
        • based on Levenshtein distance
       • Bird extended for bug repositories
       • Improved
        • Combining ideas from Bird and Robles
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Comparing Merge Algorithms
  Brasero         Evince
Comparing Merge Algorithms
 (varying parameter values) - Evince
Simple                    Improved




Bird




                        Bird extended
Activity distribution

       • How are developer activities (commits,
           mails, bug fixes) distributed?
          • For a single release: do we observe an
               unequal distribution ?
          • Over time: do we observe a change in
               this distribution ?


Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Activity distribution
             For a single release
       • Evidence of Pareto principle (20/80 rule)?
        • Most activity is carried out by a small
               group of persons.
          • Typically : 20% do 80% of the job.
       • Doesn’t necessarily imply that the activity
           distribution follows a Pareto law


Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Activity distribution
   1

 0.9
                                                                                    Brasero                       commits
 0.8

 0.7
                                                                                                                  mails
 0.6
                                                                                                                  br changes
 0.5

 0.4
                                                                                     Evince
 0.3

 0.2




                                                                                      Wine
 0.1

   0
       0     0.1   0.2   0.3   0.4   0.5   0.6     0.7   0.8   0.9   1




 1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

 0
       0     0.1   0.2   0.3   0.4   0.5   0.6     0.7   0.8   0.9   1




           Université de Mons                    Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Activity distribution
                Over time
       • Econometrics
          • express inequality in a distribution
          • aggregation metrics:
               Gini, Hoover, Theil (normalised)
          • Values between 0 and 1
           • 0 = perfect equality; 1 = perfect
                     inequality

Université de Mons    Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Activity distribution
     (aggregation indices for Evince)
     1


    0.9


    0.8


    0.7


    0.6


    0.5


    0.4                                                       Gini
                                                              Hoover
    0.3
                                                              Theil (normalised)
    0.2


    0.1


     0
     Apr-99 Dec-99 Aug-00 Apr-01 Dec-01 Aug-02 Apr-03 Dec-03 Aug-04 Apr-05 Dec-05 Aug-06 Apr-07 Dec-07 Aug-08




Université de Mons        Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany            Mathieu Goeminne & Tom Mens
Activity distribution
                                      (Gini index)
  $"
!#,"                                                                                                                                         Brasero
!#+"
!#*"
!#)"
!#("
!#'"                 .4556/7"
                                                                                                                                               Evince
!#&"                 58697"
!#%"                 :;"0".<8=>?7"
!#$"
  !"
  -./0!)"    1230!*"    -./0!*"    1230!+"    -./0!+"     1230!,"    -./0!,"      1230$!"   -./0$!"
                                                                                                                                                 Wine
  $"                                                                                                    $"
!#,"                                                                                                  !#,"
!#+"                                                                                                  !#+"
!#*"                                                                                                  !#*"
!#)"                                                                                                  !#)"                 2.44536"
!#("                                                                                                  !#("                 47586"
!#'"                                                                     1233456"                     !#'"                 9:;"<=>.<3"2?7@;=6"
!#&"                                                                     37486"                       !#&"
!#%"                                                                     9:;"/<.2/5"1=7>;<6"          !#%"
!#$"                                                                                                  !#$"
  !"                                                                                                    !"
  -./0,," -./0!!" -./0!$" -./0!%" -./0!&" -./0!'" -./0!(" -./0!)" -./0!*" -./0!+" -./0!," -./0$!"       -./0,+" -./0,," 1230!!" 1230!$" 1230!%" 1230!&" 1230!'" 1230!(" 1230!)" 1230!*" 1230!+" 1230!," 1230$!"




       Université de Mons                            Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany                                                       Mathieu Goeminne & Tom Mens
Activity Distribution
              Conclusion

       • Activity distributions seem to become
           more and more unequally distributed
       • The Pareto principle is clearly present in
           studied projects



Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Activity distribution
              Future work
       • Identify the type of statistical distribution
       • Use sliding windows for studying activity
           distribution over time
          • useful to detect impact of personnel
               turnover
          • ignore persons that have become
               inactive, and discover new active persons.

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Identifying core groups
       • Display Venn diagrams of most active (top
           20) persons, according to each definition of
           activity
           (committing, mailing, bug report changing)
       • For each person, show the percentage of
           activity attributable to this person
       • Take into account identity merges
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Identifying core groups
                                                                   Brasero

                                                                     Evince

                                                                     Wine




Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Identifying core groups
             Conclusion

       • For Brasero and Evince, the activity is led
           by a limited number of persons involved in
           2 or 3 of the defined activities.
       • For Wine, it seems not to be the case.

Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Identifying core groups
            Future work
       • Automate this process for the entire
           project community
       • Study the evolution of core groups over
           time
          • Does a core group remain stable or does
               it change often? Why?
       • Can we find evidence for a “bus factor”?
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
More future work
       • Study correlation between community
           structure (social network) and source code
           quality (as computed using software metrics).
       • Extend and refine types of activity:
         • different types of commit activity (doc,
               source code, test, etc.); of mail activity
               (information, asking, answering, etc.); of bug
               repository activity (bug creation,
               modification and commenting)
Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens
Thank you



Université de Mons   Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany   Mathieu Goeminne & Tom Mens

Mais conteúdo relacionado

Semelhante a Understanding the evolution of software project communities

Bugs tracking at a large scale in the FLOSS ecosystem
Bugs tracking at a large scale in the FLOSS ecosystemBugs tracking at a large scale in the FLOSS ecosystem
Bugs tracking at a large scale in the FLOSS ecosystem
olberger
 
GoOpen 2010: Reidar Conradi
GoOpen 2010: Reidar ConradiGoOpen 2010: Reidar Conradi
GoOpen 2010: Reidar Conradi
Friprogsenteret
 

Semelhante a Understanding the evolution of software project communities (20)

MOD2014-Mens-Lecture4
MOD2014-Mens-Lecture4MOD2014-Mens-Lecture4
MOD2014-Mens-Lecture4
 
Howison si2 keynote
Howison si2 keynoteHowison si2 keynote
Howison si2 keynote
 
Continuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul StackContinuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul Stack
 
Microservices, DevOps, and Containers with OpenShift and Fabric8
Microservices, DevOps, and Containers with OpenShift and Fabric8Microservices, DevOps, and Containers with OpenShift and Fabric8
Microservices, DevOps, and Containers with OpenShift and Fabric8
 
Open Source and Open Standards, the Future of ECM? IRMS Conference April 2011
Open Source and Open Standards, the Future of ECM? IRMS Conference April 2011Open Source and Open Standards, the Future of ECM? IRMS Conference April 2011
Open Source and Open Standards, the Future of ECM? IRMS Conference April 2011
 
Bugs tracking at a large scale in the FLOSS ecosystem
Bugs tracking at a large scale in the FLOSS ecosystemBugs tracking at a large scale in the FLOSS ecosystem
Bugs tracking at a large scale in the FLOSS ecosystem
 
DevOps Beyond the Buzzwords: Culture, Tools, & Straight Talk
DevOps Beyond the Buzzwords: Culture, Tools, & Straight TalkDevOps Beyond the Buzzwords: Culture, Tools, & Straight Talk
DevOps Beyond the Buzzwords: Culture, Tools, & Straight Talk
 
Why FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave GruberWhy FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave Gruber
 
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsSciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
 
Governing services, data, rules, processes and more
Governing services, data, rules, processes and moreGoverning services, data, rules, processes and more
Governing services, data, rules, processes and more
 
Sustainability in Scientific Software: Ecosystem complexity and Software Vis...
Sustainability in Scientific Software:Ecosystem complexityandSoftware Vis...Sustainability in Scientific Software:Ecosystem complexityandSoftware Vis...
Sustainability in Scientific Software: Ecosystem complexity and Software Vis...
 
Howison i conf-transition
Howison i conf-transitionHowison i conf-transition
Howison i conf-transition
 
Software Ecosystems = Big Data
Software Ecosystems = Big DataSoftware Ecosystems = Big Data
Software Ecosystems = Big Data
 
#OSSPARIS19 - Do not be afraid to be forked ! - YOAV KUTNER, Oro Inc.
#OSSPARIS19 - Do not be afraid to be forked ! - YOAV KUTNER, Oro Inc.#OSSPARIS19 - Do not be afraid to be forked ! - YOAV KUTNER, Oro Inc.
#OSSPARIS19 - Do not be afraid to be forked ! - YOAV KUTNER, Oro Inc.
 
DevOps
DevOpsDevOps
DevOps
 
Seronto Process
Seronto ProcessSeronto Process
Seronto Process
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
 
GoOpen 2010: Reidar Conradi
GoOpen 2010: Reidar ConradiGoOpen 2010: Reidar Conradi
GoOpen 2010: Reidar Conradi
 
Open Bugs & Development Stages
Open Bugs & Development StagesOpen Bugs & Development Stages
Open Bugs & Development Stages
 

Mais de Tom Mens

Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Tom Mens
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)
Tom Mens
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package Managers
Tom Mens
 

Mais de Tom Mens (20)

How to be(come) a successful PhD student
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
 
Recognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentRecognising bot activity in collaborative software development
Recognising bot activity in collaborative software development
 
A Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubA Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHub
 
The (r)evolution of CI/CD on GitHub
 The (r)evolution of CI/CD on GitHub The (r)evolution of CI/CD on GitHub
The (r)evolution of CI/CD on GitHub
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the Future
 
Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?
 
On the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubOn the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHub
 
On backporting practices in package dependency networks
On backporting practices in package dependency networksOn backporting practices in package dependency networks
On backporting practices in package dependency networks
 
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
 
Lost in Zero Space
Lost in Zero SpaceLost in Zero Space
Lost in Zero Space
 
Evaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messagesEvaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messages
 
Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!
 
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
 
On the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsOn the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystems
 
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
 
SecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsSecoHealth 2019 Research Achievements
SecoHealth 2019 Research Achievements
 
SECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarSECO-Assist 2019 research seminar
SECO-Assist 2019 research seminar
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package Managers
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Último (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 

Understanding the evolution of software project communities

  • 1. Understanding the evolution of software project communities Tom Mens & Mathieu Goeminne Service de Génie Logiciel, Institut d’Informatique Faculté des Sciences, Université de Mons
  • 2. Research topic • Study of open source software evolution • Taking into account the community (social network) of persons surrounding a software project (developers, users, ...) • By analysing and combining data from different types of repositories Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 3. Goals • Understand (through repository mining and analysis) • How software quality is impacted by how the community is structured • How do software product, project and community co-evolve • How developers and users interact and influence one another • Improve (through guidelines and tool support) • The way in which communities should be structured • The quality of the software process and product • The interaction between developers and users Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 4. Questions • Is there a core group (of developers and/or users) being significantly more active than the others? • How does a person contribute to different types of activities? • Are the recurrent patterns of activity in the community? • How do particular “events” impact the project? • How does developer intake and turnover affect the project? • Do we find evidence of Pareto principle (inequality of activity) • Is there a “bus factor” effect? • How does the activity distribution evolve over time? • How does the activity distribution vary across different projects? Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 5. Why open source? • Free access to source code, defect data, developer and user communication • Observable communities • Observable activities • Increasing popularity for personal and commercial use • A huge range of community and software sizes Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 6. Methodology • Exploit available data from different repositories • code repositories • mail repositories (mailing lists) • bug repositories (bug trackers) • Select open source projects • Use Herdsman framework • Based on FLOSSMetrics data extraction • Use identity merging tool • Use of econometrics • Use of statistical analysis and visualisation Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 7. Selected Projects • Criteria for selecting projects • Availability of data from repositories • Data processable by FLOSSMetrics tools • CVSAnaly2, MLStats, Bicho • Size of considered projects: persons involved, code size, activity in each repository • Age of considered projects Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 8. Selected Projects Brasero Evince Subversion Wine versioning system git svn svn git age (years) 8 11 11 11 size (KLOC) 107 580 422 2001 #commits 4100 4000 51529 74500 #mails 460 1800 24673 14000 #bugs 250 950 3300 Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 9. Herdsman framework • Exploiting available data from different repositories • code repositories: to detect committer activity • mail repositories (mailing lists): to detect mailer activity • bug repositories (bug trackers): to detect bug-related activities Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 10. Herdsman Framework Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 12. Comparing Merge Algorithms • Based on a reference merge model • manually created • iterative approach • relying on information contained in different files (COMMITTERS, MAINTAINERS, AUTHORS, NEWS, README) • Compute, for each algorithm, precision and recall w.r.t. reference model
  • 13. Reference merge model Brasero Evince Before: one repository account = one person After: one person may have multiple accounts
  • 14. Comparing Merge Algorithms • Simple • Bird (code and mail repositories only) • based on Levenshtein distance • Bird extended for bug repositories • Improved • Combining ideas from Bird and Robles Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 15. Comparing Merge Algorithms Brasero Evince
  • 16. Comparing Merge Algorithms (varying parameter values) - Evince Simple Improved Bird Bird extended
  • 17. Activity distribution • How are developer activities (commits, mails, bug fixes) distributed? • For a single release: do we observe an unequal distribution ? • Over time: do we observe a change in this distribution ? Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 18. Activity distribution For a single release • Evidence of Pareto principle (20/80 rule)? • Most activity is carried out by a small group of persons. • Typically : 20% do 80% of the job. • Doesn’t necessarily imply that the activity distribution follows a Pareto law Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 19. Activity distribution 1 0.9 Brasero commits 0.8 0.7 mails 0.6 br changes 0.5 0.4 Evince 0.3 0.2 Wine 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 20. Activity distribution Over time • Econometrics • express inequality in a distribution • aggregation metrics: Gini, Hoover, Theil (normalised) • Values between 0 and 1 • 0 = perfect equality; 1 = perfect inequality Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 21. Activity distribution (aggregation indices for Evince) 1 0.9 0.8 0.7 0.6 0.5 0.4 Gini Hoover 0.3 Theil (normalised) 0.2 0.1 0 Apr-99 Dec-99 Aug-00 Apr-01 Dec-01 Aug-02 Apr-03 Dec-03 Aug-04 Apr-05 Dec-05 Aug-06 Apr-07 Dec-07 Aug-08 Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 22. Activity distribution (Gini index) $" !#," Brasero !#+" !#*" !#)" !#(" !#'" .4556/7" Evince !#&" 58697" !#%" :;"0".<8=>?7" !#$" !" -./0!)" 1230!*" -./0!*" 1230!+" -./0!+" 1230!," -./0!," 1230$!" -./0$!" Wine $" $" !#," !#," !#+" !#+" !#*" !#*" !#)" !#)" 2.44536" !#(" !#(" 47586" !#'" 1233456" !#'" 9:;"<=>.<3"2?7@;=6" !#&" 37486" !#&" !#%" 9:;"/<.2/5"1=7>;<6" !#%" !#$" !#$" !" !" -./0,," -./0!!" -./0!$" -./0!%" -./0!&" -./0!'" -./0!(" -./0!)" -./0!*" -./0!+" -./0!," -./0$!" -./0,+" -./0,," 1230!!" 1230!$" 1230!%" 1230!&" 1230!'" 1230!(" 1230!)" 1230!*" 1230!+" 1230!," 1230$!" Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 23. Activity Distribution Conclusion • Activity distributions seem to become more and more unequally distributed • The Pareto principle is clearly present in studied projects Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 24. Activity distribution Future work • Identify the type of statistical distribution • Use sliding windows for studying activity distribution over time • useful to detect impact of personnel turnover • ignore persons that have become inactive, and discover new active persons. Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 25. Identifying core groups • Display Venn diagrams of most active (top 20) persons, according to each definition of activity (committing, mailing, bug report changing) • For each person, show the percentage of activity attributable to this person • Take into account identity merges Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 26. Identifying core groups Brasero Evince Wine Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 27. Identifying core groups Conclusion • For Brasero and Evince, the activity is led by a limited number of persons involved in 2 or 3 of the defined activities. • For Wine, it seems not to be the case. Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 28. Identifying core groups Future work • Automate this process for the entire project community • Study the evolution of core groups over time • Does a core group remain stable or does it change often? Why? • Can we find evidence for a “bus factor”? Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 29. More future work • Study correlation between community structure (social network) and source code quality (as computed using software metrics). • Extend and refine types of activity: • different types of commit activity (doc, source code, test, etc.); of mail activity (information, asking, answering, etc.); of bug repository activity (bug creation, modification and commenting) Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens
  • 30. Thank you Université de Mons Thursday 7 april 2011, SATTOSE seminar, Koblenz, Germany Mathieu Goeminne & Tom Mens