TUTORIAL ON SENTIMENT ANALYSIS
Fabio Benedetti
Outline
• Introduction to vocabularies used in sentiment analysis
• Description of GitHub project
• Twitter Dev & script for download of tweets
• Simple sentiment classification with AFINN-111
• Define sentiment scores of new words
• Sentiment classification with SentiWordNet
• Document sentiment classification
AFINN-111
• AFINN is a list of English words rated for sentiment.
• Scores range from -5 (negative) to +5 (positive).
• AFINN-111: newest version at the time of writing, with 2477 words and phrases.

…
Abilities 2
Ability 2
Aboard 1
Absentee -1
…
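
As an illustration of the word list format, here is a minimal Python sketch that parses AFINN-style lines into a dictionary. It assumes one entry per line, with the term and its score separated by a tab, and lower-cased terms; the function name load_afinn and the in-memory sample are hypothetical and are not taken from the project scripts.

```python
import io


def load_afinn(lines):
    """Parse AFINN-style lines ("term<TAB>score") into a dict of int scores."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # A term may be a phrase containing spaces, so split on the last tab only.
        term, value = line.rsplit("\t", 1)
        scores[term] = int(value)
    return scores


# In-memory sample built from the entries shown above (assumed lower-cased).
# A real run would pass an open file handle for AFINN-111.txt instead.
sample = io.StringIO("abilities\t2\nability\t2\naboard\t1\nabsentee\t-1\n")
afinn = load_afinn(sample)
print(afinn["absentee"])  # -1
```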
WordNet
• WordNet is a lexical database for the English language that groups English words into sets of synonyms called synsets.
• WordNet distinguishes between:
• nouns
• verbs
• adjectives
• adverbs
[Figure: WordNet synsets labeled SYNSET1, SYNSET2, SYNSET4, SYNSET#]
• SentiWordNet is an extension of WordNet that adds three measures to each synset:
• PosScore [0,1]: positivity measure
• NegScore [0,1]: negativity measure
• ObjScore [0,1]: objectivity measure

ObjScore = 1 - (PosScore + NegScore)

POS  ID        PosScore  NegScore  SynsetTerms      Gloss
a    00016135  0         0.25      rank#5           growing profusely; "rank jungle vegetation"
a    00016247  0.125     0.5       superabundant#1  most excessively abundant

• SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining
• http://sentiwordnet.isti.cnr.it/
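
To make the ObjScore relation concrete, the following Python sketch parses one SentiWordNet-style record and derives the objectivity score. It assumes tab-separated fields in the order POS, ID, PosScore, NegScore, SynsetTerms, Gloss. The sample line is assembled from the example values above, and the pairing of scores to that synset is an assumption; the names SentiSynset and parse_line are hypothetical.

```python
from typing import NamedTuple


class SentiSynset(NamedTuple):
    pos: str
    synset_id: str
    pos_score: float
    neg_score: float
    terms: list
    gloss: str

    @property
    def obj_score(self):
        # ObjScore = 1 - (PosScore + NegScore)
        return 1.0 - (self.pos_score + self.neg_score)


def parse_line(line):
    """Split one tab-separated record into a SentiSynset (field order assumed)."""
    pos, synset_id, pos_score, neg_score, terms, gloss = (
        line.rstrip("\n").split("\t", 5)
    )
    return SentiSynset(
        pos, synset_id, float(pos_score), float(neg_score), terms.split(), gloss
    )


# Sample record; the score-to-synset pairing is assumed for illustration.
row = parse_line(
    "a\t00016247\t0.125\t0.5\tsuperabundant#1\tmost excessively abundant"
)
print(row.obj_score)  # 0.375
```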
Project on GitHub
• https://github.com/linkTDP/BigDataAnalysis_TweetSentiment

• AFINN-111.txt
• SentiWordNet_3.0.0_20130122.txt
• config.json
• ExtractTweet.py
• DeriveTweetSentimentEasy.py
• NewTermSentimentInference.py
• SentiWordnet.py
• DocumentSentimentClassification.py
config.json & ExtractTweet.py (1)
This script downloads tweets into a CSV file and is configured through config.json.
The authentication fields that must be set are:
• consumer_key
• consumer_secret
• access_token
• access_token_secret

These fields can be obtained from https://dev.twitter.com by creating an account and an application.
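
A minimal Python sketch of how the authentication fields could be read and checked before any request is made. It assumes config.json is a flat JSON object keyed by the field names listed above; the function load_config and the placeholder values are hypothetical and do not reproduce ExtractTweet.py.

```python
import json

REQUIRED_AUTH_FIELDS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_config(text):
    """Parse the JSON text and fail early if an authentication field is unset."""
    config = json.loads(text)
    missing = [name for name in REQUIRED_AUTH_FIELDS if not config.get(name)]
    if missing:
        raise ValueError("missing authentication fields: " + ", ".join(missing))
    return config


# Placeholder credentials; real values come from the Twitter developer site.
example = json.dumps(
    {
        "consumer_key": "KEY",
        "consumer_secret": "SECRET",
        "access_token": "TOKEN",
        "access_token_secret": "TOKEN_SECRET",
    }
)
config = load_config(example)
```

In a real run the text would be read from config.json on disk rather than built in memory.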
Twitter Developers
• Create an account on the site: https://dev.twitter.com/
config.json & ExtractTweet.py (2)
Other fields:
• file_name (name of the .csv output file)
• count (number of tweets to download)
• filter (a word used to filter the tweets in the output)

The CSV file produced can be used as input to the other three scripts.
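
The effect of count and filter can be sketched as follows. This is an illustration under assumptions, not the logic of ExtractTweet.py: the filter is applied here as a case-insensitive substring match on already downloaded text, the CSV has a single text column, and the function write_tweets is hypothetical.

```python
import csv
import io


def write_tweets(tweets, out, count, filter_word):
    """Write up to `count` tweets containing `filter_word`, one per CSV row."""
    writer = csv.writer(out)
    written = 0
    for text in tweets:
        if written >= count:
            break
        # Assumed filter semantics: case-insensitive substring match.
        if filter_word.lower() in text.lower():
            writer.writerow([text])
            written += 1
    return written


# In-memory demonstration; a real run would open the file named by file_name.
out = io.StringIO()
n = write_tweets(
    ["I love python", "rainy day", "Python rocks", "more python"],
    out,
    count=2,
    filter_word="python",
)
print(n)  # 2
```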
DeriveTweetSentimentEasy.py
This script uses AFINN-111 as its vocabulary.
In AFINN-111 each word's score is negative or positive according to the word's sentiment.
A very rudimentary sentiment score for a tweet can therefore be computed by summing the scores of its words.

Issue:
Not all words are present in AFINN-111.
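
The summing approach can be sketched in a few lines of Python. The lexicon below is a tiny illustrative stand-in, and its values should not be read as the actual AFINN-111 entries; the function tweet_score and the simple regex word splitter are assumptions. Words missing from the lexicon contribute 0, which is exactly the issue noted above.

```python
import re

# Illustrative stand-in for the AFINN-111 dictionary (values are assumptions).
SAMPLE_LEXICON = {"good": 3, "bad": -3, "hilarious": 2}


def tweet_score(text, lexicon):
    """Sum the scores of the words in the text; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(tweet_score("This is good, really good", SAMPLE_LEXICON))  # 6
print(tweet_score("Good start, bad ending", SAMPLE_LEXICON))     # 0
```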
NewTermSentimentInference.py
SentiWordnet.py
This script uses SentiWordNet as its vocabulary. The implemented algorithm is inspired by:
Hamouda, Alaa, and Mohamed Rohaim. "Reviews classification using sentiwordnet lexicon." World Congress on Computer Science and Information Technology. 2011.
http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon
Sentiment Classification Phases
Tweet → Tokenization → Speech Tagging → WordNet WSD → SentiWordNet Interpretation → Sentiment Orientation → Tweet Classified
Tokenization & Speech Tagging
• Tokenization: splits the text into very simple tokens such as numbers, punctuation and words of different types.
• Speech Tagging (part-of-speech tagging): annotates each word with a tag based on its role in the tweet.

Francesco  speaks  English  well
noun       verb    noun     adverb
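
The two steps can be illustrated with a small, dependency-free Python sketch. The regex tokenizer and the dictionary lookup tagger are simplified stand-ins chosen for this example: a real pipeline would use a trained tagger, and the TOY_TAGS lexicon is hypothetical.

```python
import re

# Numbers (optionally with decimals), words (optionally with an apostrophe),
# or any single punctuation character.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into number, word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


# Hypothetical hand-written lexicon standing in for a trained tagger.
TOY_TAGS = {
    "francesco": "noun",
    "speaks": "verb",
    "english": "noun",
    "well": "adverb",
}


def tag(tokens, lexicon=TOY_TAGS):
    """Attach a tag to each token; tokens not in the lexicon get 'unknown'."""
    return [(token, lexicon.get(token.lower(), "unknown")) for token in tokens]


print(tag(tokenize("Francesco speaks English well")))
```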
Word Sense Disambiguation
WSD techniques aim to determine the meaning of each word in its context.

Here, disambiguation means selecting, for each word in a tweet, the WordNet synset that best represents that word in its context.
Word Sense Disambiguation (2)
I have implemented a simple (and inaccurate) WSD algorithm using NLTK (a Python library for NLP).
Each synset in WordNet has a brief textual description called a gloss.
Intuitively, the algorithm chooses as the word's synset the one whose gloss contains the largest number of words present in the tweet.
If no gloss shares a word with the tweet, the algorithm chooses the first synset, which is usually the most frequently used.

Issue:
A tweet is very short (at most 140 characters), so this algorithm can disambiguate a word's sense poorly.
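
A minimal sketch of the gloss-overlap idea, written without NLTK so that it runs on its own. The sense inventory is hypothetical: the sense names and glosses below are invented for illustration and are not real WordNet entries. Stop words are not removed, which is one reason such an approach can be inaccurate.

```python
import re


def choose_sense(senses, context_words):
    """Pick the sense whose gloss shares the most words with the context.

    `senses` is a list of (name, gloss) pairs in dictionary order, so the
    first entry plays the role of the most frequently used sense and is the
    fallback when no gloss overlaps with the context.
    """
    context = {word.lower() for word in context_words}
    best_name, best_overlap = senses[0][0], 0
    for name, gloss in senses:
        gloss_words = set(re.findall(r"[a-z]+", gloss.lower()))
        overlap = len(gloss_words & context)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


# Hypothetical senses with invented glosses (not real WordNet data).
BANK_SENSES = [
    ("bank#1", "sloping land beside a body of water"),
    ("bank#2", "a financial institution that holds each deposit account"),
]

tweet = "I opened a deposit account at the bank".split()
print(choose_sense(BANK_SENSES, tweet))  # bank#2
```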
SentiWordNet Interpretation
Given a synset (after the WSD phase), we can look up in SentiWordNet the sentiment scores associated with it.

Tweet:
@BonksMullet @chet_sellers This is very accurate and hilarious. Well done :)

Synset (after WSD):
accurate#1: conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale"

SentiWordNet scores:
Pos_score  Neg_score  Obj_score
0.5        0          0.5
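
The lookup step might be sketched as an index from "term#sense" keys to score triples. This is an assumption about how the data could be organised, not the code of SentiWordnet.py; the function build_index is hypothetical, and the synset ID in the sample row is a placeholder because the real ID is not given above.

```python
def build_index(rows):
    """Map each 'term#sense' key to (pos_score, neg_score, obj_score)."""
    index = {}
    for pos, synset_id, pos_score, neg_score, terms in rows:
        obj_score = 1.0 - (pos_score + neg_score)
        # One synset can list several terms, separated by spaces.
        for term in terms.split():
            index[term] = (pos_score, neg_score, obj_score)
    return index


# Scores taken from the example above; "00000000" is a placeholder ID.
rows = [("a", "00000000", 0.5, 0.0, "accurate#1")]
index = build_index(rows)
print(index["accurate#1"])  # (0.5, 0.0, 0.5)
```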
Sentiment Orientation
Sentiment Orientation (1)
Sentiment Orientation (2)
Tweet Classified
Open issues
• A tweet's text is too short for most WSD techniques to work well.
• Short texts such as tweets or Facebook comments use a particular slang that needs ad hoc processing techniques.

Further reading:
• Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. 2011. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media (LSM '11).
• Gokulakrishnan, B.; Priyanthan, P.; Ragavan, T.; Prasath, N.; Perera, A., "Opinion mining and sentiment analysis on a Twitter data stream," Advances in ICT for Emerging Regions (ICTer), 2012 International Conference on.
Example of Document Sentiment Classification
DocumentSentimentClassification.py
Implementation of the document classification algorithm seen in the lesson:

Turney, Peter D., and Michael L. Littman. "Measuring praise and criticism: Inference of semantic orientation from association." ACM Transactions on Information Systems (TOIS) 21.4 (2003): 315-346.
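
For orientation, the semantic orientation measure from this line of work (SO-PMI) can be computed from search hit counts. The sketch below uses a single positive and a single negative paradigm word, whereas Turney and Littman sum over sets of paradigm words. The function name, the smoothing constant and the sample counts are assumptions, and this is not claimed to match DocumentSentimentClassification.py.

```python
import math


def so_pmi(hits_near_pos, hits_near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a phrase from hit counts.

    hits_near_pos / hits_near_neg: hits for the phrase near the positive /
    negative paradigm word. hits_pos / hits_neg: hits for each paradigm word
    alone. The smoothing term (an assumed value) avoids division by zero.
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


# Made-up hit counts for illustration only.
print(so_pmi(200, 50, 1000, 1000) > 0)  # True: leans positive
print(so_pmi(100, 100, 1000, 1000))     # 0.0: no preference
```

A positive value suggests the phrase co-occurs more with the positive paradigm word, a negative value the opposite; a document can then be classified from the average orientation of its phrases.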
Parameters
Parameters (at the start of the code):
• FILE_NAME = "name of the .txt file to classify"
• API_KEY_BING = "Bing API key"
• API_KEY_GOOGLE = "API key for the Google Custom Search API"
• USE_GOOGLE = (Boolean) enables (True) or disables (False) the use of the Google Custom Search API

The Google API is limited to 100 free queries per day.
Libraries
• NLTK – Natural Language Toolkit
• tokenizers/punkt/english.pickle (NLTK data module)
• Requests
• math (Python standard library)
• urllib2 (Python 2 standard library)
• google-api-python-client
• https://code.google.com/p/google-api-python-client/

The third-party libraries can be installed using pip:
pip install <library name>
Bing API
• https://datamarket.azure.com/dataset/bing/search
Bing API - Key
Google API – Custom Search
• https://cloud.google.com/console#/project
Google API – Custom Search (1)
References
• AFINN-111 - http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
• SentiWordNet - http://sentiwordnet.isti.cnr.it/
• SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining - http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf
• Reviews Classification Using SentiWordNet Lexicon - http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon
• Using SentiWordNet and Sentiment Analysis for Detecting Radical Content on Web Forums - http://www.jeremyellman.com/jeremy_unn/pdfs/1_____Chalothorn_Ellman_SKIMA_2012.pdf
• From tweets to polls: Linking text sentiment to public opinion time series - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
• Natural Language Toolkit - http://nltk.org/
• Twitter Developers - https://dev.twitter.com/
• Tweepy - https://github.com/tweepy/tweepy

• Python csv - http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-inpython/

Sentiment analysis tutorial: Introduction to vocabularies, GitHub project and Twitter API

  • 2. Outline • Introduction to vocabularies used in sentiment analysis • Description of GitHub project • Twitter Dev & script for download of tweets • Simple sentiment classification with AFINN-111 • Define sentiment scores of new words • Sentiment classification with SentiWordNet • Document sentiment classification
  • 3. AFINN-111 • AFINN is a list of English words rated for sentiment score. • between -5 (negative) to +5 (positive). • AFINN-111: Newest version with 2477 words and phrases. … Abilities 2 Ability 2 Aboard 1 Absentee -1 …
  • 4. WordNet • WordNet is lexical database for the English language that groups English word into set of synonyms called synset • WordNet distinguishes between : • nouns • verbs • adjectives • adverbs SYNSET# SYNSET4 SYNSET2 SYNSET1
  • 5. • SentiWordNet is an extension of WordNet that adds for each synset 3 measures: • PosScore [0,1] : positivity measure • NegScore [0,1]: negativity measure • ObjScore [0,1]: objective measure ObjScore a a 00016135 00016247 0 0.125 = 1 – (PosScore + NegScore ) 0.25 rank#5 0.5 superabundant#1 growing profusely; "rank jungle vegetation" most excessively abundant • SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining • http://sentiwordnet.isti.cnr.it/
  • 6. Project on GitHub • https://github.com/linkTDP/BigDataAnalysis_TweetSentiment • AFINN-111.txt • SentiWordNet_3.0.0_20130122.txt • config.json • ExtractTweet.py • DeriveTweetSentimentEasy.py • NewTermSentimentInference.py • SentiWordnet.py • DocumentSentimentClassification.py
  • 7. config.json & ExtractTweet.py (1) This script downloads tweets to a CSV file and is configured through config.json. The authentication fields that must be set are: • consumer_key • consumer_secret • access_token • access_token_secret These fields can be obtained from https://dev.twitter.com by creating an account and an application.
  • 8. Twitter Developers • Create an account on the site: https://dev.twitter.com/
  • 10. config.json & ExtractTweet.py (2) Other fields: • file_name (name of the .csv output file) • count (number of tweets to download) • filter (a word used to filter the tweets in the output) The CSV file produced can be used as input for the other three scripts.
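As a rough illustration of how such a configuration could be read and checked, here is a minimal Python 3 sketch. The field names come from the slides; the flat JSON layout, the placeholder values and the `load_config` helper are assumptions and may differ from the actual config.json in the repository.

```python
import json

# Field names listed on the slides.
AUTH_FIELDS = ("consumer_key", "consumer_secret",
               "access_token", "access_token_secret")
OTHER_FIELDS = ("file_name", "count", "filter")


def load_config(text):
    """Parse the JSON text and fail early if a field is missing."""
    config = json.loads(text)
    missing = [name for name in AUTH_FIELDS + OTHER_FIELDS if name not in config]
    if missing:
        raise ValueError("missing config fields: " + ", ".join(missing))
    return config


# Placeholder values only; real keys come from the Twitter developer site.
sample = """
{
  "consumer_key": "YOUR_CONSUMER_KEY",
  "consumer_secret": "YOUR_CONSUMER_SECRET",
  "access_token": "YOUR_ACCESS_TOKEN",
  "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
  "file_name": "tweets.csv",
  "count": 100,
  "filter": "python"
}
"""
config = load_config(sample)
print(config["file_name"], config["count"], config["filter"])
```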
  • 11. DeriveTweetSentimentEasy.py This script uses AFINN-111 as its vocabulary. In AFINN-111 each word's score is negative or positive according to the word's sentiment, so a very rudimentary sentiment score for a tweet can be calculated by summing the scores of its words. Issue: not all words are present in AFINN-111.
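The summing idea can be sketched in a few lines of Python 3. This is not the code of DeriveTweetSentimentEasy.py; it assumes the AFINN-111.txt layout of one tab-separated `term<TAB>score` pair per line, and the helper names `load_afinn` and `tweet_score` are hypothetical. Words missing from the list contribute 0, which is the issue noted on the slide.

```python
def load_afinn(lines):
    """Build a {term: score} dict from 'term<TAB>score' lines."""
    scores = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        scores[term] = int(score)
    return scores


def tweet_score(text, scores):
    """Sum the scores of the words in the tweet; unknown words count as 0."""
    return sum(scores.get(word, 0) for word in text.lower().split())


# The four entries shown on the AFINN-111 slide.
sample_lines = ["abilities\t2", "ability\t2", "aboard\t1", "absentee\t-1"]
afinn = load_afinn(sample_lines)

print(tweet_score("All aboard with ability", afinn))
print(tweet_score("The absentee landlord", afinn))
```

With the real file, `load_afinn(open("AFINN-111.txt"))` would load the full list. Splitting on whitespace ignores the multi-word phrases in AFINN-111 and does not strip punctuation, so this is only a first approximation.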
  • 13. SentiWordnet.py This script uses SentiWordNet as its vocabulary. The algorithm it implements is inspired by: Hamouda, Alaa, and Mohamed Rohaim. "Reviews classification using sentiwordnet lexicon." World Congress on Computer Science and Information Technology. 2011. http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon
  • 15. Tokenization & Part-of-Speech Tagging • Tokenization: splits the text into very simple tokens such as numbers, punctuation and words of different types. • Part-of-speech tagging: annotates each word with a tag based on its role in the tweet. Example: Francesco (noun) speaks (verb) English (noun) well (adverb)
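The two steps can be illustrated with a self-contained Python 3 sketch. It deliberately avoids NLTK so that it runs without extra data files: the regular expression is a simple stand-in for a real tokenizer, and the `TOY_TAGS` lexicon is a hypothetical stand-in for a trained tagger that only knows the words of the slide's example.

```python
import re

# Words (letters, digits, underscore) or single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


# Toy lexicon covering only the example sentence; a real tagger is trained
# on annotated text and uses the context of each word.
TOY_TAGS = {
    "francesco": "noun",
    "speaks": "verb",
    "english": "noun",
    "well": "adverb",
}


def tag(tokens):
    """Attach a tag to each token using the toy lexicon."""
    tagged = []
    for token in tokens:
        if re.fullmatch(r"\w+", token):
            tagged.append((token, TOY_TAGS.get(token.lower(), "unknown")))
        else:
            tagged.append((token, "punctuation"))
    return tagged


tokens = tokenize("Francesco speaks English well.")
tagged = tag(tokens)
print(tokens)
print(tagged)
```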
  • 16. Word Sense Disambiguation WSD techniques aim to determine the meaning of every word in its context. In this case, disambiguation consists of selecting, for each word in a tweet, the WordNet synset that best represents the word in its context.
  • 17. Word Sense Disambiguation (2) I have implemented a simple (and inaccurate) WSD algorithm using NLTK (a Python library for NLP). Each synset in WordNet has a brief textual description called a gloss. Very intuitively, this algorithm chooses as the synset of a word the one whose gloss contains the largest number of words present in the tweet. If no gloss matches any of the tweet's words, the algorithm chooses the first synset, which is usually the most frequently used. Issue: the text of a tweet is very short (max 140 characters), so this algorithm may disambiguate a word's sense poorly.
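The gloss-overlap rule described above can be sketched as follows in Python 3. This is not the author's NLTK code: the candidate synsets are passed in as plain `(name, gloss)` pairs, and the two glosses for "bank" are invented for the example rather than taken from WordNet.

```python
def choose_synset(word, tweet_tokens, synsets):
    """Pick the synset whose gloss shares the most words with the tweet.

    synsets: list of (name, gloss) pairs, most frequent sense first.
    Falls back to the first synset when no gloss overlaps the tweet.
    """
    context = {token.lower() for token in tweet_tokens} - {word.lower()}
    best_name, best_overlap = synsets[0][0], 0
    for name, gloss in synsets:
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:  # ties keep the earlier (more frequent) sense
            best_name, best_overlap = name, overlap
    return best_name


# Hypothetical glosses, for illustration only.
bank_synsets = [
    ("bank#1", "sloping land beside a body of water"),
    ("bank#2", "a financial institution that accepts deposits"),
]

matched = choose_synset("bank", ["my", "bank", "accepts", "deposits", "online"],
                        bank_synsets)
fallback = choose_synset("bank", ["nice", "bank"], bank_synsets)
print(matched, fallback)
```

Because common words such as "a" or "of" also count as overlaps here, a practical version would probably filter stop words before comparing.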
  • 18. SentiWordNet Interpretation Given a synset (after the WSD phase), we can look up in SentiWordNet the sentiment score associated with it. • Tweet: "@BonksMullet @chet_sellers This is very accurate and hilarious. Well done :)" • WSD synset: accurate#1, conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale" • SentiWordNet score: Pos_score 0.5, Neg_score 0, Obj_score 0.5
  • 23. Open issues • A tweet's text is too short for most WSD techniques to work well. • Short texts of this kind (tweets or Facebook comments) use a particular slang that needs ad hoc techniques to be processed. Further reading: • Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. 2011. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media (LSM '11) • Gokulakrishnan, B.; Priyanthan, P.; Ragavan, T.; Prasath, N.; Perera, A., "Opinion mining and sentiment analysis on a Twitter data stream," Advances in ICT for Emerging Regions (ICTer), 2012 International Conference on.
  • 24. Example of Document Sentiment Classification DocumentSentimentClassification.py Implementation of the document classification algorithm seen in the lesson: Turney, Peter D., and Michael L. Littman. "Measuring praise and criticism: Inference of semantic orientation from association." ACM Transactions on Information Systems (TOIS) 21.4 (2003): 315-346.
  • 25. Parameters (at the start of the code): • FILE_NAME = "name of the .txt file on which you want to run the classification" • API_KEY_BING = "Bing API key" • API_KEY_GOOGLE = "API key for the Custom Search API" • USE_GOOGLE = (Boolean) enable (True) or disable (False) the use of the Google Custom Search API. Note: the Google API allows only 100 free queries per day.
  • 26. Libraries • NLTK (Natural Language Toolkit) • tokenizers/punkt/english.pickle module • Requests • math • urllib2 • google-api-python-client • https://code.google.com/p/google-api-python-client/ The third-party libraries can be installed using pip: pip install <library name> (math and urllib2 are part of the Python 2 standard library)
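The idea of inferring semantic orientation from association can be sketched in Python 3 as below. This is a simplified form with a single positive and a single negative reference word, not the author's script: the hit counts are invented, and in practice they would come from search-engine queries (Bing or Google, per the parameters of the script). The function names and the 0.01 smoothing constant are assumptions made for the sketch.

```python
import math

SMOOTHING = 0.01  # avoids division by zero when a co-occurrence count is 0


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg):
    """log2 ratio of association with the positive vs. the negative word.

    near_pos / near_neg: hits for the phrase near the positive / negative word.
    hits_pos / hits_neg: hits for the positive / negative word alone.
    """
    return math.log2(((near_pos + SMOOTHING) * hits_neg) /
                     ((near_neg + SMOOTHING) * hits_pos))


def classify(phrase_counts, hits_pos, hits_neg):
    """Average the orientation of all phrases; positive if the mean is > 0."""
    scores = [semantic_orientation(p, n, hits_pos, hits_neg)
              for p, n in phrase_counts]
    mean = sum(scores) / len(scores)
    return ("positive" if mean > 0 else "negative"), mean


# Invented (near_pos, near_neg) counts for the phrases of two documents.
label_a, mean_a = classify([(200, 50), (120, 100)], hits_pos=1000, hits_neg=1000)
label_b, mean_b = classify([(10, 400)], hits_pos=1000, hits_neg=1000)
print(label_a, label_b)
```

The cited paper works with sets of positive and negative reference words rather than one of each; the single-pair version is used here only to keep the arithmetic easy to follow.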
  • 28. Bing API - Key
  • 29. Google API – Custom Search • https://cloud.google.com/console#/project
  • 31. Google API – Custom Search (1)
  • 34. References • AFINN-111 - http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010 • SentiWordNet - http://sentiwordnet.isti.cnr.it/ • SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining - http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf • Reviews Classification Using SentiWordNet Lexicon - http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon • Using SentiWordNet and Sentiment Analysis for Detecting Radical Content on Web Forums - http://www.jeremyellman.com/jeremy_unn/pdfs/1_____Chalothorn_Ellman_SKIMA_2012.pdf • From tweets to polls: Linking text sentiment to public opinion time series - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
  • 35. References • Natural Language Toolkit - http://nltk.org/ • Twitter Developers - https://dev.twitter.com/ • Tweepy - https://github.com/tweepy/tweepy • Python csv - http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-in-python/