Twitter sentiment-analysis Jiit2013-14
(Twitter Sentiment Analysis)
Major Project Presentation
Piyush Aggarwal Rachit Goel
Department of CSE/IT
1. Problem Statement
3. Data Collection
4. Data Pre-Processing
5. Classification of Tweets
7. Future Scope
A major benefit of social media is that we can see the good
and bad things people say about a particular brand or
The bigger your company gets, the more difficult it becomes to keep
a handle on how everyone feels about your brand. For
large companies with thousands of daily mentions on social
media, news sites and blogs, it’s extremely difficult to do
To combat this problem, sentiment analysis software is
necessary. Such software can be used to evaluate
people's sentiment about a particular brand or personality.
TWEEZER = TWEEts + analyZER
This product (Tweezer) introduces a novel approach for
automatically classifying the sentiment of Twitter
messages. These messages are classified as positive,
neutral or negative with respect to a query term or
keyword entered by a user.
Introduction: What is Tweezer!!
1. Data Streaming: To perform sentiment analysis
we need Twitter data consisting of tweets about a
particular keyword or query term. To collect the
tweets we used the Twitter public API, which is
available to the general public for free. It is the part of
#NOTE: Tweets are short messages, restricted to 140
characters in length. Due to the nature of this microblogging
service (quick and short messages), people use
acronyms, make spelling mistakes, use emoticons and
other characters that express special meanings.
Pre-processing removes the unwanted words from
tweets that do not carry any sentiment.
1. Emoticons: 170 emoticons are identified
and removed.
2. URLs: a URL does not signify any sentiment, so it is replaced with
the token |URL|.
Data Pre Processing
3. Stop words: words such as "a", "is", "the" do not
indicate any sentiment.
4. Usernames and hashtags: the @ symbol precedes a
username and # marks a topic; both are replaced with AT_USER.
5. Repeated letters: huuuungry, huuuuuuungry and
huuuuuuuuuungry are all reduced to the token "huungry".
6. Slang words: non-English words.
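The pre-processing steps listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual code: the emoticon and stop-word lists are tiny stand-ins (the slides mention 170 emoticons but do not list them), the regular expressions and the function name `preprocess` are assumptions, and slang handling (step 6) is left out because the slides do not describe how it is done.

```python
import re
from typing import List

# Assumed, abbreviated lists for illustration only.
EMOTICONS = [":)", ":-)", ":(", ":-(", ":D", ";)", ":P"]
STOP_WORDS = {"a", "is", "the"}

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
EMOTICON_RE = re.compile("|".join(re.escape(e) for e in EMOTICONS))
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
# Three or more repeats of the same letter, e.g. "uuuu".
REPEAT_RE = re.compile(r"([a-zA-Z])\1{2,}")


def preprocess(tweet: str) -> List[str]:
    """Return the cleaned tokens of a tweet."""
    text = URL_RE.sub(" |URL| ", tweet)                   # step 2: URLs
    text = EMOTICON_RE.sub(" ", text)                     # step 1: emoticons
    text = MENTION_OR_HASHTAG_RE.sub(" AT_USER ", text)   # step 4: @user, #topic
    text = REPEAT_RE.sub(r"\1\1", text)                   # step 5: huuuungry -> huungry

    tokens = []
    for raw in text.split():
        if raw in ("|URL|", "AT_USER"):
            tokens.append(raw)  # keep placeholder tokens untouched
            continue
        word = raw.strip(".,!?;:\"'()").lower()
        if word and word not in STOP_WORDS:               # step 3: stop words
            tokens.append(word)
    return tokens


print(preprocess("@bob I am huuuuungry!!! The pizza is great :) http://t.co/abc #food"))
# ['AT_USER', 'i', 'am', 'huungry', 'pizza', 'great', '|URL|', 'AT_USER']
```

URLs are replaced before emoticons so that characters inside a link are never mistaken for an emoticon.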
Different ways of classification:
Binary classification: a two-way categorization,
i.e. positive or negative.
3-tier: tweets are categorized as Positive,
Negative or Neutral.
5-tier: tweets are bucketed into five classes, namely:
Extremely Positive, Positive, Neutral, Negative and
Characterization of Tweets
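As a rough illustration of the binary and 3-tier schemes, the sketch below maps a polarity score to a label. The slides do not say how Tweezer scores a tweet, so the word lists, the `lexicon_score` function, the score range of -1 to 1 and the `neutral_band` threshold are all assumptions made for this example. The 5-tier scheme is omitted because its class list is cut off in the slides.

```python
from typing import List

# Hypothetical mini-lexicons; a real system would use a much larger dictionary.
POSITIVE_WORDS = {"good", "great", "love", "awesome"}
NEGATIVE_WORDS = {"bad", "hate", "terrible", "awful"}


def lexicon_score(tokens: List[str]) -> float:
    """Assumed scoring rule: (positive hits - negative hits) / token count."""
    if not tokens:
        return 0.0
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return (pos - neg) / len(tokens)


def binary_label(score: float) -> str:
    """Two-way categorization: positive or negative."""
    return "positive" if score >= 0 else "negative"


def three_tier_label(score: float, neutral_band: float = 0.1) -> str:
    """Scores within +/- neutral_band of zero are treated as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


score = lexicon_score(["great", "pizza"])
print(score, binary_label(score), three_tier_label(score))
```

Note that the binary scheme has to push a zero score into one of the two classes; here it defaults to positive, which is an arbitrary choice.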
The types of sarcasm found on Twitter are as follows:
Positive words with a negative smiley.
Negative words with a positive smiley.
Sarcasm related to facts, which includes spoofs,
sarcastic recreations, etc.
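The first two sarcasm types can be flagged with a simple mismatch check between words and smileys, as in the sketch below. The word and smiley sets and the function name `sarcasm_cue` are hypothetical, and the check has to run on the raw tweet, before pre-processing strips the emoticons. Fact-based sarcasm (spoofs and the like) is not covered by this heuristic.

```python
from typing import Optional

# Hypothetical mini-lexicons for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "awesome"}
NEGATIVE_WORDS = {"bad", "hate", "terrible", "awful"}
POSITIVE_SMILEYS = {":)", ":-)", ":D"}
NEGATIVE_SMILEYS = {":(", ":-("}


def sarcasm_cue(tweet: str) -> Optional[str]:
    """Return the sarcasm type suggested by a word/smiley mismatch, if any."""
    raw_tokens = tweet.split()
    words = {t.strip(".,!?").lower() for t in raw_tokens}

    has_pos_word = bool(words & POSITIVE_WORDS)
    has_neg_word = bool(words & NEGATIVE_WORDS)
    # Smileys are matched on the raw tokens so that ":D" keeps its case.
    has_pos_smiley = any(t in POSITIVE_SMILEYS for t in raw_tokens)
    has_neg_smiley = any(t in NEGATIVE_SMILEYS for t in raw_tokens)

    if has_pos_word and has_neg_smiley:
        return "positive words with negative smiley"
    if has_neg_word and has_pos_smiley:
        return "negative words with positive smiley"
    return None


print(sarcasm_cue("I love Mondays :("))   # positive words with negative smiley
print(sarcasm_cue("This is terrible :)"))  # negative words with positive smiley
print(sarcasm_cue("Nice day"))             # None
```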
1. Data Pre-Processing using more parameters to get
2. Updating the dictionary with new synonyms and
antonyms of already existing words.
3. Web-Application can be converted to Mobile
4. Context-aware sentiment analysis may be implemented
in the future to improve accuracy.