A Benchmark Study on
Sentiment Analysis for
Software Engineering Research
Nicole Novielli
@NicoleNovielli
Filippo Lanubile
@lanubile
Daniela Girardi
@DanielaGirard91
Sentiment analysis for software engineering
Actionable insights for:
Collaborative software development
– Security concerns detection (Pletea et al., MSR’14)
– Impact on productivity (Ortu et al., MSR’15)
– Early burnout discovery (Mantyla et al., MSR’15)
– Anger detection (Gachechiladze et al., ICSE-NIER’17)
Collaborative knowledge sharing
– Empirically-driven guidelines for question writing (Calefato et al., IST 2018)
Requirements engineering
– User feedback (Guzman and Maalej, RE’14)
– App improvement (Panichella et al., ICSME’14)
Off-the-shelf tools for sentiment analysis
Approach | Output | Validated on
Supervised learning, bag-of-words | Probabilities: p(positive), p(negative), p(neutral) | Movie reviews, tweets
Supervised learning | Sentiment score in [0, 4]: 0 = very negative, 2 = neutral, 4 = very positive | Movie reviews
Lexicon-based, dictionaries with a priori polarity scores in [-5, 5] | Sentiment scores: negative in [-5, -1], positive in [1, 5], neutral = (-1, 1) | Social media: YouTube, Twitter, MySpace, …
http://sentistrength.wlv.ac.uk/
http://text-processing.com/
http://nlp.stanford.edu/sentiment/
Are off-the-shelf sentiment analysis tools
reliable for software engineering research?
RQ1: Do different sentiment analysis tools agree with emotions of software developers?
RQ2: Do sentiment analysis tools agree with each other?
RQ3: Do different sentiment analysis tools lead to contradictory results in a software engineering study?
RQ4: How does the choice of a sentiment analysis tool affect conclusion validity?
The tools disagree with each other
Poor performance on technical texts
Disagreement can lead to diverging conclusions
Need for software engineering (SE)-specific tools for sentiment analysis
SE-specific sentiment analysis tools
Supervised
• Senti4SD (Calefato et al., EMSE 2017)
• SentiCR (Ahmed et al., ASE’17)
Lexicon-based
• SentiStrength-SE (Islam and Zibran, MSR’17)
F. Calefato, F. Lanubile, F. Maiorano, and N. Novielli. Sentiment Polarity Detection for Software Development. EMSE, 2017.
T. Ahmed, A. Bosu, A. Iqbal, and S. Rahimi. SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions. ASE, 2017.
M.D.R. Islam and M.F. Zibran. Leveraging Automated Sentiment Analysis in Software Engineering. MSR, 2017.
Our replication
Research questions
Original study (off-the-shelf tools)
RQ1: Do different sentiment analysis tools agree with emotions of software developers?
RQ2: Do sentiment analysis tools agree with each other?
• NLTK
• Stanford NLP
• Alchemy API
• SentiStrength
Our replication (SE-specific tools)
RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
RQ2: Do SE-specific sentiment analysis tools agree with each other?
• Senti4SD (Calefato et al., EMSE 2017)
• SentiCR (Ahmed et al., ASE’17)
• SentiStrength-SE (Islam and Zibran, MSR’17)
• SentiStrength (baseline)
Our replication
Gold standard datasets
Model-driven annotation
• 392 comments (Murgia et al., MSR’14)
• 5869 comments (Murgia et al., MSR’16)
• 4423 questions, answers, and comments (Calefato et al., EMSE 2017)
Model-driven annotation of emotions
Emotion | Original study | Our replication
Love | Positive | Positive
Joy | Positive | Positive
Surprise | Positive | Ambiguous
Anger | Negative | Negative
Sadness | Negative | Negative
Fear | Negative | Negative
No emotion | Neutral | Neutral
(Shaver et al., 1987)
Mapping emotions to polarity
I'm happy with the approach and
the code looks good
Joy -> Positive Polarity
Joy Happiness Satisfaction
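The emotion-to-polarity mapping can be sketched as a lookup table. This is only an illustration of the table from the slide, not the annotation tooling used in the study; the names `ORIGINAL_STUDY`, `REPLICATION`, and `polarity_of` are hypothetical, and returning the string "ambiguous" for surprise is an assumption about how one might carry that label forward.

```python
# Polarity assigned to each basic emotion (Shaver et al., 1987), as listed in
# the mapping table. All names in this sketch are illustrative.
ORIGINAL_STUDY = {
    "love": "positive",
    "joy": "positive",
    "surprise": "positive",
    "anger": "negative",
    "sadness": "negative",
    "fear": "negative",
}

# The replication differs only in how surprise is treated.
REPLICATION = {**ORIGINAL_STUDY, "surprise": "ambiguous"}


def polarity_of(emotion, mapping=REPLICATION):
    """Return the polarity for an emotion label; no emotion maps to neutral."""
    if emotion is None:
        return "neutral"
    return mapping[emotion.lower()]


# The slide's example sentence is annotated with joy.
print(polarity_of("Joy"))
print(polarity_of("surprise"))
print(polarity_of("surprise", ORIGINAL_STUDY))
print(polarity_of(None))
```

Keeping the two mappings as separate dictionaries makes the single difference between the original study and the replication explicit.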
Our replication
Gold standard datasets
Model-driven annotation
• 392 comments (Murgia et al., MSR’14)
• 5869 comments (Murgia et al., MSR’16)
• 4423 questions, answers, and comments (Calefato et al., EMSE 2017)
Ad-hoc annotation
• 1500 Q&A sentences on Java libraries (Lin et al., ICSE’18)
• 1600 comments from code review (Ahmed et al., ASE’17)
Model-driven vs. ad-hoc annotation
Aspect | Model-driven | Ad-hoc
Theoretical models | Yes | No
Training of raters | Yes | No
Guidelines for annotation | Based on taxonomy | Based on subjective perception
Research questions
Original study
RQ1: Do different sentiment analysis tools agree with emotions of software developers?
RQ2: Do sentiment analysis tools agree with each other?
Our replication
RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
RQ2: Do SE-specific sentiment analysis tools agree with each other?
RQ3: To what extent does the labeling approach affect the performance of SE-specific sentiment analysis tools?
Metrics
Our replication
• Weighted Cohen’s Kappa (1968)
• Text categorization metrics (Sebastiani, 2002)
– Precision
– Recall
– F-measure
J. Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 4, 213-220.
F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34,1, 1-47.
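For reference, the three text categorization metrics can be computed per polarity class as in the sketch below. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall), which the slide does not specify; the function name `per_class_scores` and the toy labels are made up for illustration.

```python
def per_class_scores(gold, predicted, label):
    """Precision, recall, and F-measure (F1) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    # Guard against empty denominators when a class is never predicted
    # or never present in the gold standard.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Toy data, invented for this example.
gold = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]

for label in ("positive", "negative", "neutral"):
    p, r, f = per_class_scores(gold, pred, label)
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
```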
Weighted Cohen’s Kappa
Disagreement: strong vs. mild
Weight | negative | neutral | positive
negative | 0 | 1 | 2
neutral | 1 | 0 | 1
positive | 2 | 1 | 0
Interpretation (Viera and Garrett, 2005)
• less than chance if κ ≤ 0
• slight if 0.01 ≤ κ ≤ 0.20
• fair if 0.21 ≤ κ ≤ 0.40
• moderate if 0.41 ≤ κ ≤ 0.60
• substantial if 0.61 ≤ κ ≤ 0.80
• almost perfect if 0.81 ≤ κ ≤ 1
A.J. Viera, J.M. Garrett. 2005. Understanding interobserver agreement: the kappa statistic. Family Medicine, 37,5, 360–363
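A minimal sketch of weighted kappa using the disagreement weights from the slide, computed as 1 minus the ratio of observed to chance-expected weighted disagreement. The function names `weighted_kappa` and `interpret` and the toy labels are assumptions made for illustration; the study's own scripts are not shown in the slides.

```python
LABELS = ("negative", "neutral", "positive")

# Disagreement weights from the slide:
# 0 = agreement, 1 = mild disagreement, 2 = strong disagreement.
WEIGHTS = (
    (0, 1, 2),
    (1, 0, 1),
    (2, 1, 0),
)


def weighted_kappa(rater_a, rater_b):
    """Weighted Cohen's kappa for two equally long lists of polarity labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    index = {label: i for i, label in enumerate(LABELS)}
    k, n = len(LABELS), len(rater_a)

    # Contingency table of counts: rows = rater A, columns = rater B.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement actually observed vs. expected by chance.
    observed = sum(WEIGHTS[i][j] * counts[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(WEIGHTS[i][j] * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one label")
    return 1 - observed / expected


def interpret(kappa):
    """Agreement bands from the interpretation list (Viera and Garrett, 2005)."""
    if kappa <= 0:
        return "less than chance"
    for upper, name in ((0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return name
    return "almost perfect"


# Toy data, invented for this example.
tool = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
gold = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]
kappa = weighted_kappa(tool, gold)
print(round(kappa, 2), interpret(kappa))
```

With these weights a positive/negative confusion costs twice as much as a confusion involving neutral, which is the strong vs. mild distinction drawn on the slide.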
Experimental setting
• Gold standard datasets split by stratified sampling: train 70%, test 30%
• Training of supervised tools on the train set: Senti4SD (updated model), SentiCR (updated model)
• Assessment of performance on the test set: Senti4SD and SentiCR updated models, SentiStrength-SE, SentiStrength
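The 70/30 stratified split can be sketched as follows: items are grouped by polarity label and each group is split separately, so both partitions keep roughly the same class distribution. The function name `stratified_split`, the fixed seed, and the toy data are assumptions for illustration, not the procedure used by the authors.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, train_fraction=0.7, seed=42):
    """Split (item, label) pairs so each label keeps the same train share."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train += [(item, label) for item in group[:cut]]
        test += [(item, label) for item in group[cut:]]
    return train, test


# Toy corpus, invented for this example: 10 positive, 20 neutral, 10 negative.
labels = ["positive"] * 10 + ["neutral"] * 20 + ["negative"] * 10
texts = [f"comment {i}" for i in range(len(labels))]

train, test = stratified_split(texts, labels)
print(len(train), len(test))
```

Supervised tools would then be retrained on `train` only, and every tool would be scored on `test`.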
RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
Tools vs. manual annotation
• Original study (off-the-shelf tools): fair agreement
• Our replication (SE-specific tools): substantial agreement
Opportunistic
sampling using
SentiStrength
(Calefato et al., EMSE 2017)
Our replication vs. manual annotation
RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
• SE-specific optimization improves the classification accuracy
• Retraining supervised tools produces better performance
• Comparable performance for SentiStrength-SE (lexicon-based)
RQ2: Do SE-specific sentiment analysis tools agree with each other?
• Our replication: from substantial to perfect agreement
• Original study (off-the-shelf tools): from less than chance to fair agreement
RQ3: To what extent does the labeling approach affect the performance of SE-specific sentiment analysis tools?
Model-driven annotation
• From substantial to perfect agreement, also between supervised and lexicon-based tools
Ad-hoc annotation
• From fair to moderate agreement
• Better agreement for supervised approaches
Error analysis
Manual inspection of texts misclassified by all tools
Polar facts but neutral sentiment
‘I tried the following and it returns nothing’
---
‘This creates an unnecessary garbage list.
Sets.newHashSet should accept an Iterable.’
Error analysis
General error
Broken syntax as in ‘wontbe so bad’
---
Idiomatic expression, as in ‘Are you out of mind?’
Error analysis
Politeness
Context-dependent interpretation of
politeness by raters
‘Thank you’ vs. ‘Thank you!’
Lessons learned
• Reliable sentiment analysis in software engineering is possible
– Tuning of tools for software engineering improves classification accuracy
– SE-specific tools agree with manual annotation
– SE-specific tools agree with each other
• Grounding research on theoretical models of affect is recommended
– The choice depends on the research goals: polarity vs. fine-grained emotions, emotions vs. attitudes, etc.
– Preliminary sanity check is always recommended

A Benchmark Study on Sentiment Analysis for Software Engineering Research

  • 1. A Benchmark Study on Sentiment Analysis for Software Engineering Research Nicole Novielli @NicoleNovielli Filippo Lanubile @lanubile Daniela Girardi @DanielaGirard91
  • 2. Sentiment analysis for software engineering Collaborative software development – Security concerns detection (Pletea et al., MSR’14) – Impact on productivity (Ortu et al., MSR‘15) – Early burnout discovery (Mantyla et al. MSR’15) – Anger detection (Gachechiladze et al., ICSE-NIER‘17) Collaborative knowledge sharing – Empirically-driven guidelines for question writing (Calefato et al., IST 2018) Requirements engineering – User feedback (Guzman and Maalej, RE‘14) – App improvement (Panichella et al., ICSME ‘14) Actionable insights for
  • 3. Off-the-shelf tools for sentiment analysis Approach Ouput Validated on Supervised learning Bag-of-words Probabilities: • p(positive) • p(negative) • p(neutral) Movie reviews Tweets Supervised learning Sentiment score in [0,4]: • 0 = very negative • 2 = neutral • 4 = very positive Movie reviews Lexicon-based Dictionaries with a priori polarity scores in [-5, 5] Sentiment scores • Negative in [-5, -1] • Positive in [1,5] • Neutral = (-1,1) Social media: • YouTube • Twitter • MySpace • … http://sentistrength.wlv.ac.uk/ http://text-processing.com/ http://nlp.stanford.edu/sentiment/ Are off-the-shelf sentiment analysis tools reliable for software engineering research?
  • 4. RQ1: Do different sentiment analysis tools agree with emotions of software developers? The tools disagree with each other Poor performance on technical texts Disagreement can lead to diverging conclusions RQ2: Do sentiment analysis tools agree with each other? RQ3: Do different sentiment analysis tools lead to contradictory results in software engineering study? RQ4: How does the choice of a sentiment analysis tool affect conclusion validity? Need for Software engineering (SE) specific tools for sentiment analysis
  • 5. SE-specific sentiment analysis tools • Senti4SD (Calefato et al., EMSE 2017) • SentiCR (Ahmed et al., ASE’17) • SentiStrength-SE (Islam and Zibran, MSR’17) Supervised Lexicon-based F. Calefato, F. Lanubile, F. Maiorano, N. Novielli. Sentiment Polarity Detection for Software Development. EMSE, 2017. T. Ahmed, A. Bosu, A. Iqbal, and S. Rahimi. SentiCR: a customized sentiment analysis tool for code review interactions. ASE 2017. M.D.R. Islam and M.F. Zibran. Leveraging automated sentiment analysis in software engineering. MSR 2017.
  • 6. Our replication Research questions RQ1: Do different sentiment analysis tools agree with emotions of software developers? RQ2: Do sentiment analysis tools agree with each other? RQ2: Do SE-specific sentiment analysis tools agree with each other? RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • Senti4SD (Calefato et al., EMSE 2017) • SentiCR (Ahmed et al., ASE’17) • SentiStrength-SE (Islam and Zibran, MSR’17) • SentiStrength (baseline) • NLTK • Stanford NLP • Alchemy API • SentiStrength SE-specific Off-the-shelf
  • 7. Our replication Gold standard datasets 392 comments (Murgia et al., MSR’14) 5869 comments (Murgia et al., MSR’16) 4423 Qs, As, Cs (Calefato et al., EMSE 2017) Model-driven annotation
  • 8. Model-driven annotation of emotions Emotion: original study / our replication: Love: Positive / Positive; Joy: Positive / Positive; Surprise: Positive / Ambiguous; Anger: Negative / Negative; Sadness: Negative / Negative; Fear: Negative / Negative; No emotion: Neutral / Neutral (Shaver et al., 1987) Mapping emotions to polarity ‘I'm happy with the approach and the code looks good’ Joy -> Positive Polarity Joy Happiness Satisfaction
  • 9. Our replication Gold standard datasets 392 comments (Murgia et al., MSR’14) 5869 comments (Murgia et al., MSR’16) 4423 Qs, As, Cs (Calefato et al., EMSE 2017) Model-driven annotation 1500 sentences QA on Java libraries (Lin et al., ICSE’18) 1600 comments from code review (Ahmed et al., ASE’17) Ad-hoc annotation
  • 10. Model-driven vs. ad-hoc annotation (model-driven / ad-hoc): Theoretical models: Yes / No; Training of raters: Yes / No; Guidelines for annotation: Based on taxonomy / Based on subjective perception
  • 11. Research questions RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Our replication RQ1: Do different sentiment analysis tools agree with emotions of software developers? RQ2: Do sentiment analysis tools agree with each other? RQ2: Do SE-specific sentiment analysis tools agree with each other? RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
  • 12. Metrics Our replication • Weighted Cohen’s Kappa (1968) • Weighted Cohen’s Kappa (1968) • Text categorization metrics (Sebastiani, 2002) – Precision – Recall – F-measure J. Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 4, 213-220. F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34, 1, 1-47.
  • 13. Weighted Cohen’s Kappa Disagreement: strong vs. mild (columns: negative, neutral, positive): negative 0 1 2; neutral 1 0 1; positive 2 1 0 Interpretation (Viera and Garrett, 2005) • less than chance if κ ≤ 0 • slight if 0.01 ≤ κ ≤ 0.20 • fair if 0.21 ≤ κ ≤ 0.40 • moderate if 0.41 ≤ κ ≤ 0.60 • substantial if 0.61 ≤ κ ≤ 0.80 • almost perfect if 0.81 ≤ κ ≤ 1 A.J. Viera, J.M. Garrett. 2005. Understanding interobserver agreement: the kappa statistic. Family Medicine, 37, 5, 360–363.
  • 14. Experimental setting Gold standard datasets Train 70% Stratified sampling Senti4SD updated model SentiCR updated model Training of supervised tools Test 30% SentiStrength-SE SentiStrength Assessment of performance
  • 15. Our replication: SE-specific tools vs. manual annotation Original study: off-the-shelf tools RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? Fair agreement Substantial agreement
  • 16. Original study: off-the-shelf tools Our replication: SE-specific tools vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? Opportunistic sampling using SentiStrength (Calefato et al., EMSE 2017)
  • 17. Our replication vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • SE-specific optimization improves the classification accuracy • Retraining supervised tools produces better performance
  • 18. Our replication vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • SE-specific optimization improves the classification accuracy • Retraining supervised tools produces better performance • Comparable performance for SentiStrength-SE (lexicon-based)
  • 19. RQ2: Do SE-specific sentiment analysis tools agree with each other? Our replication Original study: off-the-shelf tools From substantial to perfect agreement From less than chance to fair agreement
  • 20. RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Model-driven annotation Ad-hoc annotation
  • 21. RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Model-driven annotation Ad-hoc annotation • From substantial to perfect agreement also between supervised and lexicon-based tools • From fair to moderate agreement • Better agreement for supervised approaches
  • 22. Error analysis Manual inspection of texts misclassified by all tools
  • 24. Error analysis Polar facts but neutral sentiment ‘I tried the following and it returns nothing’ --- ‘This creates an unnecessary garbage list. Sets.newHashSet should accept an Iterable.’
  • 25. Error analysis General error Broken syntax as in ‘wontbe so bad’ --- Idiomatic expression ‘Are you out of mind?’
  • 26. Error analysis Politeness Context-dependent interpretation of politeness by raters ‘Thank you’ vs. ‘Thank you!’
  • 27. Lessons learned • Reliable sentiment analysis in software engineering is possible
  • 28. Lessons learned • Reliable sentiment analysis in software engineering is possible • Tuning of tools for software engineering improves classification accuracy • SE-specific tools agree with manual annotation • SE-specific tools agree with each other
  • 29. Lessons learned • Reliable sentiment analysis in software engineering is possible • Tuning of tools for software engineering improves classification accuracy • SE-specific tools agree with manual annotation • SE-specific tools agree with each other • Grounding research on theoretical models of affect is recommended • The choice depends on the research goals: polarity vs. fine-grained emotions, emotions vs. attitudes, etc.
  • 30. Lessons learned • Reliable sentiment analysis in software engineering is possible • Tuning of tools for software engineering improves classification accuracy • SE-specific tools agree with manual annotation • SE-specific tools agree with each other • Grounding research on theoretical models of affect is recommended • The choice depends on the research goals: polarity vs. fine-grained emotions, emotions vs. attitudes, etc. • Preliminary sanity check is always recommended
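
The weighted kappa on slide 13 can be illustrated with a small sketch. It assumes the disagreement weights shown on that slide (0 for agreement, 1 for a mild disagreement involving neutral, 2 for a strong negative/positive disagreement) and the interpretation bands attributed to Viera and Garrett (2005). The function names `weighted_kappa` and `interpret` are hypothetical and are not taken from the study's own scripts.

```python
from collections import Counter

# Polarity classes in ordinal order; the disagreement weight between two
# labels is the distance between their positions, which reproduces the
# slide-13 matrix [[0, 1, 2], [1, 0, 1], [2, 1, 0]].
LABELS = ["negative", "neutral", "positive"]


def weighted_kappa(rater_a, rater_b, labels=LABELS):
    """Weighted Cohen's kappa: 1 - observed / expected weighted disagreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    index = {label: i for i, label in enumerate(labels)}
    observed = Counter(zip(rater_a, rater_b))  # joint label counts
    marginal_a = Counter(rater_a)              # label counts for rater A
    marginal_b = Counter(rater_b)              # label counts for rater B
    observed_dis = 0.0
    expected_dis = 0.0
    for a in labels:
        for b in labels:
            weight = abs(index[a] - index[b])
            observed_dis += weight * observed[(a, b)] / n
            expected_dis += weight * (marginal_a[a] / n) * (marginal_b[b] / n)
    if expected_dis == 0:
        # Both raters used one and the same label only: kappa is undefined.
        return float("nan")
    return 1.0 - observed_dis / expected_dis


def interpret(kappa):
    """Map a kappa value to the agreement bands listed on slide 13."""
    if kappa <= 0:
        return "less than chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


tool = ["negative", "neutral", "positive", "positive"]
gold = ["negative", "neutral", "positive", "negative"]
kappa = weighted_kappa(tool, gold)
print(round(kappa, 2), interpret(kappa))
```

In this toy example the single strong disagreement (positive vs. negative) gives an observed weighted disagreement of 0.5 against an expected 1.0, so kappa is 0.5, which falls in the "moderate" band.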

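The text categorization metrics named on slide 12 (precision, recall, F-measure) can be sketched per polarity class as follows. This is a minimal illustration in plain Python, assuming a three-class polarity task and the balanced F-measure (F1). The function name `per_class_scores` and the toy labels are hypothetical, and the slides do not state how the study aggregated scores across classes.

```python
def per_class_scores(gold, predicted, labels=("negative", "neutral", "positive")):
    """Return {label: (precision, recall, f1)} for each polarity class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores = {}
    pairs = list(zip(gold, predicted))
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


gold_labels = ["positive", "positive", "negative", "neutral"]
tool_labels = ["positive", "negative", "negative", "neutral"]
for label, (p, r, f) in per_class_scores(gold_labels, tool_labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting the scores per class rather than as a single accuracy figure makes it visible when a tool does well on the neutral class but poorly on the positive or negative ones.
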
Editor's Notes

  1. Better performance for supervised approaches
  2. Better performance for supervised approaches
  3. Mainly observed in ad hoc annotation datasets