  1. Harnessing Twitter to Support Serendipitous Learning of Developers
     Abhishek Sharma, Yuan Tian, Agus Sulistya, David Lo (School of Information Systems, Singapore Management University) and Aiko Fallas Yamashita (Oslo and Akershus University, Norway)
     24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017)
  2-4. Why Twitter for Learning
     • Keeping up to date is a big challenge (Storey et al., TSE'16)
     • Twitter is used by software developers to share important information (Tian et al., MSR'12)
     • Twitter enables serendipitous (pleasant and undirected) learning for developers (Singer et al., ICSE'14)
     [photo: https://unsplash.com/photos/HAIPJ8PyeL8]
  5-8. Challenges
     • Finding useful articles is not easy
     • Developers need to identify many relevant Twitter users to follow and sieve through a large number of tweets/URLs (Singer et al., ICSE'14)
     • Too much information can make learning via Twitter an unpleasant experience
     [photo: https://unsplash.com/photos/yD5rv8_WzxA]
  9. This Study
     • Can we automatically extract popular and relevant URLs from Twitter for developers?
     • In this work, we:
       – propose 14 features to characterize a URL
       – evaluate a supervised and an unsupervised approach to recommend URLs harvested from Twitter
  10-14. Methodology (1): Collecting Seed Data
     • Get a list of seed Twitter users (http://www.noop.nl/2009/02/twitter-top-100-for-softwaredevelopers.htm)
     • Get a larger set of people who follow (or are followed by) >= 5 seed users, resulting in 85,171 Twitter users
     • Collect the tweets generated by these users over a 1-month period (Nov '15)
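The seed-expansion step above can be sketched in a few lines. The follower data and user names below are hypothetical (the authors gathered real follow relations via the Twitter API); only the ">= 5 links to seed users" rule comes from the slides.

```python
from collections import defaultdict

# Hypothetical seed accounts, standing in for the "top 100" list on the slide.
SEED_USERS = {"martinfowler", "unclebobmartin", "kentbeck", "jboner", "wycats"}

def expand_seeds(follow_edges, seeds, min_links=5):
    """follow_edges: iterable of (follower, followee) pairs.
    Returns non-seed users who follow, or are followed by, >= min_links seed users."""
    links = defaultdict(set)  # user -> set of seed users they are connected to
    for follower, followee in follow_edges:
        if followee in seeds and follower not in seeds:
            links[follower].add(followee)
        if follower in seeds and followee not in seeds:
            links[followee].add(follower)
    return {u for u, s in links.items() if len(s) >= min_links}

edges = [("alice", s) for s in SEED_USERS] + [("bob", "martinfowler")]
print(expand_seeds(edges, SEED_USERS))  # alice follows all 5 seeds, bob only 1
```

Counting distinct seed connections (a set, not a tally) ensures a user who retweets one seed heavily is not mistaken for someone embedded in the developer community.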
  15-20. Methodology (2): URL Extraction
     • Find tweets that contain the keyword "java" (2,104 tweets)
     • Among these, find tweets that contain a URL (1,606 tweets)
     • Extract the URLs (e.g., http://ow.ly/UIxwS, http://bit.ly/1OFsZSj, http://goo.gl/IGxGlo)
     • Expand short URLs (770 expanded URLs)
     • Resolve duplicate/broken URLs (577 URLs remain)
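The keyword-filter, extraction, and de-duplication steps can be sketched as below. The tweets are invented for illustration, and the short-URL expansion step (which requires network access, e.g. following HTTP redirects) is deliberately omitted.

```python
import re

# A deliberately simple URL pattern; production code would use a stricter one.
URL_RE = re.compile(r"https?://\S+")

def extract_urls(tweets, keyword="java"):
    """Filter tweets containing the keyword, pull out their URLs,
    and resolve duplicates while preserving first-seen order."""
    matching = [t for t in tweets if keyword in t.lower()]
    urls = []
    for t in matching:
        urls.extend(URL_RE.findall(t))
    return list(dict.fromkeys(urls))  # order-preserving de-duplication

tweets = [
    "Great Java 8 streams tutorial http://bit.ly/1OFsZSj",
    "Java performance tips http://bit.ly/1OFsZSj",
    "Unrelated tweet http://example.com/python",
]
print(extract_urls(tweets))  # ['http://bit.ly/1OFsZSj']
```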
  21. Methodology (3): Feature Extraction
     • 14 features extracted, in three groups: content, popularity, and network
  22-26. Methodology (3): Feature Extraction – Content
     • Cosine similarity between the keyword and
       – the tweet text (CosSimT)
       – the user profile text (CosSimP)
       – the webpage text (CosSimW)
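A minimal sketch of these content features, using plain term-frequency vectors; the paper's exact tokenization and weighting are not shown on the slide, so whitespace tokenization here is an assumption.

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    """Cosine similarity between two texts over term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# CosSimT: keyword vs. tweet text. CosSimP and CosSimW are computed the
# same way against the user profile text and the webpage text.
print(cosine_sim("java", "new java concurrency tutorial"))  # 0.5
```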
  27-32. Methodology (3): Feature Extraction – Network and Popularity
     • Network: estimate the importance of users through
       – centrality scores
       – PageRank
     • Popularity: number of times the tweets containing the URL were
       – retweeted
       – liked
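The PageRank-style network feature can be sketched with a small power iteration over a follower graph. The graph below is invented; the paper builds it from follow relations among the collected Twitter users, and its exact damping and iteration settings are not given on the slide.

```python
def pagerank(graph, damping=0.85, iters=50):
    """graph: dict mapping a user to the list of users they follow.
    Returns an importance score per user via power iteration."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            if outs:
                share = damping * rank[n] / len(outs)  # split rank among followees
                for v in outs:
                    new[v] += share
        rank = new
    return rank

# Hypothetical follow relations: two users follow "expert".
follows = {"alice": ["expert"], "bob": ["expert"], "expert": ["alice"]}
ranks = pagerank(follows)
print(ranks)  # "expert" accumulates the most rank
```

A URL tweeted by a high-rank user then inherits a higher network feature score than one tweeted only by peripheral accounts.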
  33-35. Methodology (4): Labelling the URLs
     • Labelled independently by 2 persons, each with more than 4 years of professional Java programming experience (one a PhD student, the other a Research Engineer)
     • Both persons sat together to resolve disagreements
     • URLs were assigned relevance scores from 0 to 3
  36. Methodology (5): Recommendation
     • Unsupervised (Borda Count): assigns ranking points to each feature score of a URL and then combines the scores
     • Supervised (Learning to Rank): learns a ranking function based on a weighted sum of a URL's features
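The unsupervised Borda-count combination can be sketched as follows. The three feature columns and URLs below are invented for illustration (the paper combines its 14 content, popularity, and network features).

```python
def borda_rank(feature_scores):
    """feature_scores: dict url -> list of feature values (higher is better).
    Each feature ranks the URLs and awards points (best gets len(urls)-1,
    worst gets 0); points are summed across features to produce the final order."""
    urls = list(feature_scores)
    n_feats = len(next(iter(feature_scores.values())))
    points = {u: 0 for u in urls}
    for f in range(n_feats):
        ordered = sorted(urls, key=lambda u: feature_scores[u][f])
        for pts, u in enumerate(ordered):
            points[u] += pts
    return sorted(urls, key=points.get, reverse=True)

scores = {
    "http://a.example": [0.9, 0.2, 0.8],
    "http://b.example": [0.4, 0.9, 0.7],
    "http://c.example": [0.1, 0.1, 0.1],
}
print(borda_rank(scores))  # a and b outrank c on every feature
```

Because only per-feature ranks matter, Borda count needs no training labels and is insensitive to the raw scale of each feature, which is what makes it usable as the unsupervised baseline.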
  37-38. RQ1: Effectiveness of Our Approach
     • Metric: NDCG (Normalized Discounted Cumulative Gain), which measures the capability to place highly relevant URLs at the top ranks; scores range from 0 to 1, and scores closer to 1 indicate better performance
     • Results: the supervised approach achieves an NDCG score of 0.832, the unsupervised approach 0.719
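The evaluation metric can be sketched directly from its definition, using the 0-3 relevance labels from the labelling step. The label sequence below is illustrative, not from the paper's data; the logarithmic discount is the standard NDCG formulation.

```python
import math

def dcg(labels):
    """Discounted cumulative gain: relevance discounted by log2 of the rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels))

def ndcg(ranked_labels):
    """DCG of the produced ranking, normalized by the DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal else 0.0

# Relevance labels of URLs in the order a recommender returned them:
# a near-ideal ranking (only the last two items are swapped) scores close to 1.
print(round(ndcg([3, 2, 0, 1]), 3))
```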
  39. RQ2: Sensitivity of Supervised Approach to Training Data
     • NDCG scores stay stable, between 0.825 and 0.847, as k (the number of folds used) varies from 10 down to 2 (e.g., 0.832 at k=10, 0.847 at k=3, 0.843 at k=2)
  40-46. Threats to Validity
     • Subjectivity in the labelling process: asked 2 persons to label independently
     • Only 1 domain: evaluate more domains in future work
     • Suitability of the evaluation metric: used NDCG, which is a standard metric
  47. Conclusion and Future Work
     • Supervised and unsupervised approaches show promise in recommending URLs
     • Future work:
       – automatically categorize the recommended URLs
       – build an automated system to recommend relevant URLs
  48. Feedback/Advice
     • What additional resources can we consider for mining URLs?
     • How can we infer developer interests automatically?
     Thank you!
