Linguistic diversity in
open-source development


 Bogdan Vasilescu
 Alexander Serebrenik
 Mark van den Brand
Motivation


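[Figure: a project community and the languages its developers "speak": C, C++, HTML, Lisp, XML, Java, Python, Unix shell, … Speech bubbles: "I 'speak' Java"; "I 'speak' Java and Python"; "I 'speak' Python".]
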
/ Mathematics and Computer Science                                      23-4-2012   PAGE 1
Motivation

If [developer, shown as a picture] leaves the project, what is the risk of not
finding replacement developers that speak Python?

[Figure, first panel: No risk, plenty of other Python developers to choose from]
[Figure, second panel: What about now?]




Linguistic diversity

      • Greenberg (1956)
           • compare geographic regions
           • probability that two random individuals do not speak the
             same language




Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Simple model
                     • everyone speaks exactly one language
                     • languages are independent


                A = 1 - \sum_{\ell \in L} p_\ell^2,    p_\ell = \frac{|S_\ell|}{|P|}
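
A minimal sketch of the simple model, assuming each individual is recorded with exactly one language and that the share of a language is its number of speakers divided by the population size. The function name `diversity_a` and the toy project are hypothetical, chosen only for illustration.

```python
from collections import Counter


def diversity_a(speakers):
    """Simple-model diversity: 1 minus the sum of squared language shares."""
    if not speakers:
        raise ValueError("need at least one individual")
    total = len(speakers)
    counts = Counter(speakers)  # number of individuals per language
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical project: one language per developer.
project = ["Java", "Java", "Python", "C"]      # shares 0.5, 0.25, 0.25
print(diversity_a(project))
print(diversity_a(["Java"] * 4))               # everyone speaks the same language
```

With a single shared language the value drops to zero; it grows as the population spreads over more languages.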




Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Related-languages model
                     • everyone speaks exactly one language
                     • languages are similar

                B = 1 - \sum_{\ell, m \in L} p_\ell \, p_m \, sim(\ell, m),    p_\ell = \frac{|S_\ell|}{|P|}

                0 \le sim(\ell, m) \le 1
                sim(\ell, \ell) = 1
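
A small sketch of the related-languages model, assuming language shares are given as a dictionary and similarity as a table of ordered pairs. The helper `make_sim`, the function name `diversity_b`, and all numbers are hypothetical; unlisted pairs are treated as similarity 0, which is an assumption of this sketch only.

```python
def diversity_b(shares, sim):
    """Related-languages diversity: 1 - sum over ordered pairs of p_l * p_m * sim(l, m)."""
    return 1.0 - sum(
        shares[l] * shares[m] * sim(l, m) for l in shares for m in shares
    )


def make_sim(table):
    """Build sim(l, m) from a dict of ordered pairs; sim(l, l) is fixed to 1."""
    def sim(l, m):
        return 1.0 if l == m else table.get((l, m), 0.0)
    return sim


shares = {"Java": 0.5, "C": 0.5}                       # hypothetical shares
unrelated = make_sim({})                               # no cross-language similarity
related = make_sim({("Java", "C"): 0.5, ("C", "Java"): 0.5})

print(diversity_b(shares, unrelated))  # reduces to the simple-model value
print(diversity_b(shares, related))    # similarity lowers the diversity
```

When all cross-language similarities are zero, the sum keeps only the squared shares, so the result coincides with the simple model.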
Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Polyglot related-languages model
                     • everyone speaks at least one language
                     • languages are similar
          F = 1 - \sum_{s, t \in P(L)} p_s \, p_t \, \frac{\sum_{\ell \in s, \, m \in t} sim(\ell, m)}{|s| \cdot |t|},    p_s = \frac{|X_s|}{|P|}

          L = \{A, B, C\},    P(L) = \{A, B, C, AB, AC, BC, ABC\}
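
A hedged sketch of the polyglot model, assuming each group of individuals is identified by the non-empty set of languages it speaks and that the similarity between two sets is the average pairwise similarity of their languages. The names `diversity_f` and `sim`, and the toy shares, are hypothetical.

```python
def diversity_f(groups, sim):
    """Polyglot diversity: 1 - sum over language sets s, t of
    p_s * p_t * (average of sim(l, m) for l in s, m in t)."""
    total = 0.0
    for s, p_s in groups.items():
        for t, p_t in groups.items():
            avg = sum(sim(l, m) for l in s for m in t) / (len(s) * len(t))
            total += p_s * p_t * avg
    return 1.0 - total


def sim(l, m):
    """Toy similarity (assumption): languages are similar only to themselves."""
    return 1.0 if l == m else 0.0


# Hypothetical community: half speak only A, half speak both A and B.
groups = {
    frozenset({"A"}): 0.5,
    frozenset({"A", "B"}): 0.5,
}
print(diversity_f(groups, sim))
```

If every group speaks exactly one language, the averaging step disappears and the computation falls back to the related-languages model.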

Our risk measure

    • Probability that two random individuals do not speak the
      same language
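
          F = 1 - \sum_{s, t \in P(L)} p_s \, p_t \, \frac{\sum_{\ell \in s, \, m \in t} sim(\ell, m)}{|s| \cdot |t|}

    • Risk of not finding developers that "speak" \ell

          risk(\ell) = 1 - \sum_{s \in P(L)} p_s \, \max_{k \in s} sim(k \Rightarrow \ell)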
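
The risk measure can be sketched as follows, assuming groups are keyed by the set of languages they speak and that similarity is directional (from a known language to the target). The function names, the toy community, and the similarity values are all hypothetical.

```python
def risk(target, groups, sim):
    """Risk for a target language: 1 - sum over language sets s of
    p_s * (best similarity from any language in s to the target)."""
    return 1.0 - sum(
        p_s * max(sim(k, target) for k in s) for s, p_s in groups.items()
    )


# Hypothetical directional similarities; unlisted pairs count as 0.
TABLE = {("C", "Python"): 0.5}


def sim(k, target):
    return 1.0 if k == target else TABLE.get((k, target), 0.0)


# Hypothetical community shares per language set.
groups = {
    frozenset({"C"}): 0.5,
    frozenset({"C", "Python"}): 0.25,
    frozenset({"Lisp"}): 0.25,
}
print(risk("Python", groups, sim))  # C speakers partly cover Python
print(risk("Lisp", groups, sim))    # only the Lisp group covers Lisp
```

In this toy setting Python is spoken by a quarter of the community, yet its risk is lower than that of Lisp because the C speakers count as partial replacements.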




StackOverflow.com




User tags




Similarity measure

      •    Reverend Gonzo: Java, C, C++, C#, Python,…
      •    Alexander Serebrenik: Prolog, SQL, C++,…
      •    Bogdan Vasilescu: Python,…
      •    Jon Skeet: C#, Java, ASP.net, XML,…
      •    … > 400,000

      • Association rule mining:
           • "C => Java"
           • sim(k \Rightarrow \ell) = conf(k \Rightarrow \ell) = \frac{n_{Both}}{n_{Left}}
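
A minimal sketch of the confidence computation, assuming nLeft counts the users tagged with the left-hand language and nBoth counts the users tagged with both languages. The user records below are invented toy data, not StackOverflow data, and the function name `confidence` is hypothetical.

```python
def confidence(users_tags, left, right):
    """Confidence of the rule left => right: users with both / users with left."""
    n_left = sum(1 for tags in users_tags if left in tags)
    n_both = sum(1 for tags in users_tags if left in tags and right in tags)
    return n_both / n_left if n_left else 0.0


# Hypothetical users, each described by a set of language tags.
users = [
    {"Java", "C", "C++"},
    {"C", "C++"},
    {"Python"},
    {"Java", "C#"},
    {"Java"},
]
print(confidence(users, "C", "Java"))   # 1 of the 2 C users also has Java
print(confidence(users, "Java", "C"))   # 1 of the 3 Java users also has C
```

The two printed values differ, which illustrates that a similarity defined this way is directional: "C => Java" and "Java => C" need not agree.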


Similarity measure - results




     • Assembly posts: 44
     • Assembly + Java developers: > 1000
     => When in need of Java developers, ask the Assembly guys


Case study - Emacs

  • 1985-2012: C, Emacs Lisp, C++, Java, Lisp, Python, M4, … (26)

                                                 Exotic languages
                                                 High/low risk




Case study - Emacs

  • C: spoken by half of the community + similar to other languages => low risk
  • Python: spoken very sporadically + similar to other languages => low risk




Conclusions

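   What is the risk of not finding developers that speak Python?

   • Risk measure: risk(\ell) = 1 - \sum_{s \in P(L)} p_s \, \max_{k \in s} sim(k \Rightarrow \ell)
   • Similarity measure (StackOverflow)
      • "C => Java": sim(k \Rightarrow \ell) = conf(k \Rightarrow \ell) = \frac{n_{Both}}{n_{Left}}

   [Figure labels: Low risk; Depends on similarity]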





IPA Spring Days 2012
