3. ● Growing availability and popularity of opinion-rich resources such as online
review sites, personal blogs and microblogging websites like Twitter.
● A major challenge is to build technology that detects and summarizes the
overall sentiment expressed on such websites
● Automatically extracting sentiment from a given block of text or tweet
● Marketers can use this to research public opinion of their company and
products, or to analyze customer satisfaction
● Organizations can also use this to gather critical feedback about problems
in newly released products
● To promote research that will lead to a better understanding of how
sentiment is conveyed in tweets and texts, the SemEval (Semantic Evaluation)
2014 organizers ran a task (Task 9) on sentiment analysis in Twitter
Introduction
4. “Given a message, classify whether the message is of positive, negative, or
neutral sentiment. For messages conveying both a positive and negative
sentiment, whichever is the stronger sentiment should be chosen.”
● Two approaches:
1. Naive-Bayes Classifier
a. Pre-processing of tweets
Lower-casing, @usernames, URLs, #hashtags, punctuation,
additional spaces
b. Feature Vector Creation
Unigram model, trained and tested using the NLTK library
(see the sketch after this slide)
2. Support Vector Machine (SVM)
a. Pre-processing of tweets
CMU tokenizer, POS tagging, URLs, @usernames, negation
handling, lower-casing
b. Feature Vector Creation
POS tags, word n-grams, emoticons, all-caps, lexicon scores,
word clusters, punctuation, elongated words
Message Polarity Classification
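A minimal sketch of the first approach above (Naive Bayes over unigram features, trained with NLTK as the slide states). The regular expressions, placeholder tokens and the tiny training set are illustrative assumptions, not details taken from the original system.

```python
import re
import nltk

def preprocess(tweet):
    """Normalise a raw tweet: lower case, URLs, @usernames,
    #hashtags, punctuation and additional spaces."""
    t = tweet.lower()
    t = re.sub(r'https?://\S+|www\.\S+', ' URL ', t)   # replace URLs with a placeholder
    t = re.sub(r'@\w+', ' AT_USER ', t)                # replace @username mentions
    t = re.sub(r'#(\w+)', r'\1', t)                    # keep the hashtag word, drop '#'
    t = re.sub(r'[^\w\s]', ' ', t)                     # strip punctuation
    return re.sub(r'\s+', ' ', t).strip()              # collapse additional spaces

def unigram_features(tweet):
    """Unigram (bag-of-words) feature dict in the form NLTK classifiers expect."""
    return {word: True for word in preprocess(tweet).split()}

# Toy labelled data; the real system would use the SemEval-2014 Task 9 training tweets.
labelled = [("I love the new phone :)", "positive"),
            ("worst update ever", "negative"),
            ("the release is out today", "neutral")]
train_set = [(unigram_features(text), label) for text, label in labelled]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(classifier.classify(unigram_features("Loving this release!")))
```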
5. “Given a message containing a marked instance of a word or phrase,
determine whether that instance is positive, negative or neutral in that context.”
1. Lexicon Used
NRC Hashtag Sentiment Lexicon and Sentiment140 Lexicon
2. Pre-processing of tweets
CMU Tokenizer, POS tagging, @usernames, URLs, negation handling,
lower-casing
3. Features Used
POS Tags, Word N-Grams, Emoticons, All-Caps, Lexicon Scores,
Punctuation, Elongated Words, Linguistic Features
(see the sketch after this slide)
4. Semantic Features
Adjective, Modifier, Verb-modifier, Subjective Relationship,
Dependency, etc.
Contextual Polarity Disambiguation
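A minimal sketch of how the hand-crafted SVM features listed for both subtasks might be assembled. The slides only name the features and the two lexicons; the tab-separated lexicon format, the use of scikit-learn's LinearSVC, and the toy data are assumptions, and the POS-tag, word n-gram, cluster and semantic features are omitted for brevity.

```python
import re
from sklearn.svm import LinearSVC

def load_lexicon(path):
    """Load an NRC Hashtag / Sentiment140 style lexicon, assumed to hold
    one 'term<TAB>score' entry per line."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            term, score = line.rstrip("\n").split("\t")[:2]
            scores[term] = float(score)
    return scores

def features(tokens, lexicon):
    """A few of the listed features: lexicon scores, all-caps tokens,
    elongated words, punctuation runs and simple emoticons."""
    lex = [lexicon.get(t.lower(), 0.0) for t in tokens]
    return [
        sum(lex),                                               # total lexicon score
        sum(1 for s in lex if s > 0),                           # number of positive terms
        sum(1 for t in tokens if t.isupper() and len(t) > 1),   # ALL-CAPS tokens
        sum(1 for t in tokens if re.search(r"(\w)\1{2,}", t)),  # elongated words ("sooo")
        sum(1 for t in tokens if re.fullmatch(r"[!?.]+", t)),   # runs of punctuation
        sum(1 for t in tokens if re.fullmatch(r"[:;=]-?[()DPp]", t)),  # simple emoticons
    ]

# Toy usage; in the described system the tokens come from the CMU tokenizer and
# the scores from the NRC Hashtag Sentiment and Sentiment140 lexicons.
lexicon = {"love": 1.2, "worst": -1.5}
X = [features(t.split(), lexicon) for t in ["I LOVE this :)", "worst update everrr !!"]]
y = ["positive", "negative"]
clf = LinearSVC().fit(X, y)
print(clf.predict([features("love it sooo much".split(), lexicon)]))
```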
6. Experiment with different features using SVM     Accuracy Achieved (in %)
   Only unigrams                                     63.8554
   Without sentiment scores                          64.0275
   Bigram with thresholding (d = 1)                  64.3718
   All features + Trigrams                           65.4045
   All features                                      66.6093
   All features without bigrams                      67.2978

   Experiment using SVM                Recall    Precision   F-Score
   Bigrams with Thresholding (d = 1)   61.6582   63.5262     62.2186
   Bigrams without Thresholding        63.8728   66.2443     64.6489
   Without Bigrams                     64.7117   66.3478     65.2356
   Results (Message Polarity Classification)
7. Results (Contextual Polarity Disambiguation)
   Experiment with different features using SVM                          Accuracy Achieved (in %)
   All features (with 1000 test data-points, 21673 train data-points)    85.9
   All features (with 10000 test data-points, 11673 train data-points)   87.71

   Experiment using SVM   Recall    Precision   F-Score
   All Features (1K)      78.4063   79.7105     70.0381
   All Features (10K)     81.2385   82.4210     81.7664
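A note on the metrics in both results tables: the slides do not define the F-Score column, but it is presumably the standard per-class F1, \( F_c = \frac{2\,P_c\,R_c}{P_c + R_c} \), with the reported Recall, Precision and F-Score each macro-averaged over the sentiment classes (the official SemEval-2014 Task 9 score averages the F1 of the positive and negative classes). Because each column is averaged separately, the F-Score need not equal the harmonic mean of the Recall and Precision values shown.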