Are FLOSS Developers Committing to CVS/SVN as much
          as they are Talking in Mailing Lists?
Challenges for Integrating Data from Multiple Repositories




                              Sulayman K. Sowe, I. Samoladas, I. Stamelos, L. Angelis
                                  Dept. of Informatics, Aristotle University, Greece.
                                                sksowe@csd.auth.gr

               3rd International Workshop on Public Data about Software Development (WoPDaSD)
                                        10th September 2008, Milan, Italy.




This research is partially sponsored by the FLOSSMetrics project (Ref. No. FP6-IST5-033547), http://flossmetrics.org/
and the SQO-OSS project (Ref. No. FP6-IST-5-033331), http://www.sqo-oss.eu/

                                          WoPDaSD                                                                       ~.1
In this presentation...
  ➲   Nomadic life of FLOSS developers
         Motivation for this research:
         Research hypothesis

  ➲   Methodology in brief
         Data & Source
         Identification of developers from SVN & Lists

  ➲   Results & Discussion

  ➲   Summary & conclusion
         Ongoing research
Nomadic life of FLOSS developers




➲    Like the Fulani nomads of the West African plains,
     FLOSS developers are not bound to a single territory
     and are free to:
     participate in other projects or communities,
     use and reuse software/bits of code from other projects,
     suggest, argue for or against requirements, specs, etc. in
      projects where they have the least commit rights,
     use different identities (usernames, emails), etc.


Motivation for this research
➲       Why research FLOSS developers or nomads?
          To understand the collaborative nature of developing FLOSS in
          terms of developer participation (code commits and email postings)
          in multiple repositories: SVN and mailing lists.
➲       Research hypothesis:
        If mailing lists are the main communication veins in most projects,
        then CVS/SVN is a collection of arteries. Thus,
          FLOSS developers code and participate in list discussions:
           H0: "FLOSS developers contribute equally to the code
               repository and mailing lists"; alternative
           H1: "FLOSS developers contribute more to the code repository
               than to mailing lists".




Methodology…Data & Source
➲   Retrieve data for 14 projects from the FLOSSMetrics
    retrieval system
       Mailing list data dumps (.sql file format)
       SVN data dumps (.sql file format)




Initial (Raw) Data
 ➲   How many SVN committers and mailing list posters are there in each project?




[Figure: two panels, "SVN Commits" and "ML Posts"]




Methodology…Identification of developers
 ➲   The main problem in studying developers' activities in multiple
     repositories is identification:
     ➲   Is committer A in the SVN of project X the same person
         as poster A in the mailing lists of project X?
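The identification question can be illustrated with a minimal Python sketch. The matching heuristic here (lowercase the name, drop the email domain, strip non-alphanumeric characters) is an assumption made for illustration, not the procedure used in the study, and the sample usernames and addresses are hypothetical.

```python
import re

def normalize_identity(raw: str) -> str:
    """Reduce a username or email address to a comparable key.

    Heuristic (assumption): lowercase, drop the email domain, and strip
    everything that is not a letter or digit.
    """
    local_part = raw.strip().lower().split("@", 1)[0]
    return re.sub(r"[^a-z0-9]", "", local_part)

def match_identities(committers, posters):
    """Map each SVN committer to the mailing list addresses sharing its key."""
    by_key = {}
    for address in posters:
        by_key.setdefault(normalize_identity(address), []).append(address)
    return {
        name: by_key[normalize_identity(name)]
        for name in committers
        if normalize_identity(name) in by_key
    }

# Hypothetical sample data, not taken from the study.
svn_committers = ["jdoe", "a.smith", "mlee"]
ml_posters = ["JDoe@example.org", "asmith@example.com", "someone@example.net"]
matches = match_identities(svn_committers, ml_posters)
print(matches)
```

A heuristic like this can produce both false matches and misses (for example, two different people sharing a local part), so any real pipeline would likely need manual checking of the matched pairs.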




Results & Discussion…1
➲       The query result for each project gave us the developers who co-occur in both SVN
        and the mailing lists.
➲       N = 486 for all 14 projects.
            Percentage of developers in both repositories:
              In 8 projects = 57.14%
              In 4 projects = 90.11%
              In 2 projects = 80.21%
➲       What is going on in ibatis and turbine?
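As a rough illustration of the co-occurrence figure, the sketch below intersects two sets of already-resolved developer identities for one project. The names are hypothetical, and taking the share relative to the number of distinct committers is an assumption, since the slide does not state the denominator.

```python
def co_occurrence(committers, posters):
    """Return the developers found in both sources and their share.

    Assumption: the share is taken relative to the number of distinct
    committers; the slide does not state the denominator.
    """
    committer_set, poster_set = set(committers), set(posters)
    both = committer_set & poster_set
    share = 100.0 * len(both) / len(committer_set) if committer_set else 0.0
    return both, share

# Hypothetical, already-resolved developer identities for one project.
svn_committers = ["alice", "bob", "carol", "dave"]
ml_posters = ["bob", "carol", "erin"]
both, share = co_occurrence(svn_committers, ml_posters)
print(sorted(both), f"{share:.2f}%")
```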




Results & Discussion...2
➲       Distribution of commits & posts
        Commits dominate posts
        Mean commits per developer > mean posts per developer
        Developers are committing more to SVN than they are posting to mailing lists,
         EXCEPT in ibatis and turbine.
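The per-developer comparison can be sketched as follows. The counts are hypothetical and only show how the two means would be compared for a single project; they are not data from the 14 projects.

```python
from statistics import mean

# Hypothetical per-developer activity for one project: name -> (commits, posts).
activity = {"dev_a": (120, 40), "dev_b": (30, 10), "dev_c": (6, 25)}

mean_commits = mean(commits for commits, _ in activity.values())
mean_posts = mean(posts for _, posts in activity.values())

# True when the project follows the "commits dominate posts" pattern.
commits_dominate = mean_commits > mean_posts
print(mean_commits, mean_posts, commits_dominate)
```

Note that a single developer (dev_c here) can post more than they commit even when the project-level means point the other way.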




Results & Discussion...3
➲ Relationship between commits and posts
➲   The overall correlation between commits and posts is statistically significant
    (marked with *, for p < 0.05).
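The slide does not say which correlation coefficient was used. As one possibility, the sketch below computes Pearson's r and the corresponding t statistic on hypothetical commit and post counts; both the choice of Pearson's r and the data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero (n - 2 degrees of freedom).

    Undefined when |r| equals 1.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical per-developer counts, not data from the study.
commits = [10, 20, 30, 40, 50]
posts = [12, 18, 33, 35, 52]
r = pearson_r(commits, posts)
t = t_statistic(r, len(commits))
print(round(r, 3), round(t, 2))
```

The t value would then be compared against the t distribution with n - 2 degrees of freedom; with 3 degrees of freedom, the two-sided critical value at p = 0.05 is about 3.182.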




Results & Discussion...4
➲       Developers' contribution in terms of commits and posts
        The Wilcoxon signed-rank test applied to mean values shows an almost 50-50 split
         between projects where commits = posts (green) and commits > posts (yellow),
         with only the turbine project showing otherwise.
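For readers unfamiliar with the test, here is a minimal sketch of the Wilcoxon signed-rank statistic for paired observations. How the pairs were formed in the study is not stated on the slide, so the paired values below are hypothetical. Zero differences are dropped and tied absolute differences receive average ranks, which is the usual convention.

```python
def wilcoxon_signed_rank(xs, ys):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus

# Hypothetical paired values (e.g. commits vs. posts), not from the study.
commits = [12, 7, 9, 15, 4]
posts = [10, 8, 9, 11, 7]
w_plus, w_minus = wilcoxon_signed_rank(commits, posts)
print(w_plus, w_minus)
```

The smaller of the two rank sums is typically compared against a critical value (or a normal approximation for larger samples) to decide whether commits and posts differ.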




Summary & conclusion
➲   FLOSS developers are coding as much as they are
    talking. They contribute equally to code repositories
    and mailing lists: H0 supported.
➲   However, in almost all the projects, developers made
    more commits than posts: H1 supported.

➲   Why are turbine and ibatis outliers?
         Maybe the highly prolific developer is making more posts than commits, at
          a ratio of 4:1.
         Something peculiar about the composition of Apache-related projects
➲   Ongoing aspects of this research
          Automate the data collection and identification process.
          Analyze a total of 60 or more projects from the FM retrieval system.
          Add a quality dimension to the committers variable:
            Categorize commits: modifications, deletions, additions, code related,
             documentation (reports, readme, etc.)
            Time scale/sliding frames: the evolution of commits and posts over a
             given period.



Thank you for your attention
        Questions?
         Comments
Suggestions for improvement




