General Framework for
Sentiment Analysis of Twitter Data
with Special Attention Towards
Improving Health Awareness
B. J. Gunasekara
Supervisor - Dr R. D. Nawarathna
Introduction
Social networking encourages users to express their ideas & views on their day-to-day lifestyle
Department of Stat. & Comp. Sc., Faculty of
Science, University of Peradeniya
Social Media Analytics
• The practice of gathering data from web
resources like blogs and social media and
analyzing that data
• Applications
– Big Data Analysis
– Survey & Marketing
– Decision Making
Twitter
“To give everyone the
power to create and
share ideas and
information instantly,
without barriers”
288 Million Monthly Active Users
500 Million Tweets Sent Per Day
152,000+ Tweets by Healthcare
professionals per Day
Tell your story with 140 characters
Content of a tweet
– Textual content
– User mentions
– Hashtags
– URLs
– Location
Most tweets carry little informational value on their own,
but a collection of tweets can provide valuable insight into a population
One voice can make a difference…
But a million can change the world!
#LetDoctorsBeDoctors #ChildhoodCancer
#BreastCancer
#digitalhealth
#ObesityCareWeek
#Parkinsons #Lymphoma
#Migraine
Importance of Improving Health Literacy
• Maintain personal health & wellbeing
• Save on your medical costs
• Avoid misinterpretations
– chemo isn't so nice. Bad dreams
– I really am surprised at how bad the side-effects are from #chemo this time. It's taken me by surprise a bit. Not good.
– hospitals are the worst!! hate the medicine like smell lingering in the air why did my life become so bad hate #chemo ahhh
– Don't let chemotherapy take away your 'you' !!! find your fab again with @Baldlybeautiful
– My dads experimental chemo has officially stopped his tumors from growing for an entire year now
Natural Language Processing
• Natural language processing (NLP) is the field concerned with the linguistic interaction between humans and computers.
• Main tasks
– Information Extraction
– Semantic Parsing
– Text To 3D Scene Generation
– Sentiment And Social Meaning
– Machine Translation
– Dialog And Speech Processing
– Automatic Summarization
– Text Segmentation
Sentiment Analysis (Opinion Mining)
• Sentiment analysis is the extraction of subjective information from a document using NLP, text analysis and computational linguistics.
• Basic tasks
– Polarity classification
– Subjectivity/objectivity identification
– Feature/aspect-based sentiment analysis
Related Work
• Language feature analysis
• Special frameworks
– Autoregressive Moving Average (ARMA)
– Latent Dirichlet Allocation (LDA)
– Ailment Topic Aspect Model (ATAM)
• Derivations from existing models
– BioCaster Ontology, an extant knowledge model of laymen's terms
Problem Statement
• Perform a sentiment analysis aimed at improving health awareness by analyzing the typical public reaction to common illnesses and treatments in the Twitter community.
Methodology
• The proposed method is based on POS-tagged bigrams with a Naïve Bayes classifier.
Feature Extraction
• Tweet: “200 lives were lost, coz of this massive dengue outbreak”
• Unigrams: ['lives', 'lost', 'coz', 'massive', 'dengue', 'outbreak']
• Bigrams: ['lives_lost', 'lost_coz', 'coz_massive', 'massive_dengue', 'dengue_outbreak']
• POS tagging: [('lives', 'NNS'), ('were', 'VBD'), ('lost', 'VBN'), ('coz', 'NN'), ('of', 'IN'), ('this', 'DT'), ('massive', 'JJ'), ('dengue', 'NN'), ('outbreak', 'NN')]
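The unigram and bigram lists on this slide can be reproduced with a few lines of plain Python. This is only a sketch: the stop-word list and the rule of keeping alphabetic tokens only (which drops "200") are assumptions chosen to match the slide's output, not the project's actual preprocessing.

```python
import re

# Hypothetical stop-word list, just large enough for this example.
STOP_WORDS = {"were", "of", "this", "the", "a", "an", "is", "are"}

def extract_unigrams(tweet):
    """Lowercase the tweet, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def extract_bigrams(unigrams):
    """Join each pair of adjacent unigrams with an underscore."""
    return [a + "_" + b for a, b in zip(unigrams, unigrams[1:])]

tweet = "200 lives were lost, coz of this massive dengue outbreak"
unigrams = extract_unigrams(tweet)
bigrams = extract_bigrams(unigrams)
print(unigrams)
print(bigrams)
```

Note that the bigrams are built from the filtered unigram list, which is why 'lost_coz' appears even though "lost" and "coz" are separated by a comma in the tweet.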
Bigram vs. Unigram
• The frequency distribution of bigrams in a string is used for simple statistical analysis of text.
• Unlike a unigram, a bigram ties each word to its neighboring word (increased long-tail specificity).
• The classifier has more context for predicting the label than when relying on a single word.
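A small sketch of the point about context, using made-up toy tweets (not data from the project): the unigram "good" is counted the same whether or not it is negated, while the bigram counts keep "not good" apart from "chemo good".

```python
from collections import Counter

def bigram_counts(token_lists):
    """Count adjacent word pairs across a collection of tokenized tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Toy, invented examples for illustration only.
tweets = [
    ["chemo", "not", "good"],
    ["last", "chemo", "good", "news"],
    ["chemo", "not", "good", "today"],
]

unigram_freq = Counter(t for tokens in tweets for t in tokens)
bigram_freq = bigram_counts(tweets)

print(unigram_freq["good"])            # all three uses look alike
print(bigram_freq[("not", "good")])    # negated uses
print(bigram_freq[("chemo", "good")])  # non-negated use
```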
POS Tagging
• The process of labeling each word with its part of speech, based on both its definition and its context.
• Mainly nouns & adjectives were considered.
• Adjectives modify a noun, adding value and a better sense of its meaning.
– Penn Treebank
– Brown Corpus
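One possible reading of "POS-tagged bigrams using nouns & adjectives" is sketched below: keep only tokens whose Penn Treebank tag starts with NN or JJ, then pair adjacent survivors. The slides do not spell out the exact construction, so the function name and the filtering rule are assumptions. The tagged input is the dengue example from the Feature Extraction slide.

```python
# Tagged tweet as shown on the Feature Extraction slide.
tagged = [
    ("lives", "NNS"), ("were", "VBD"), ("lost", "VBN"), ("coz", "NN"),
    ("of", "IN"), ("this", "DT"), ("massive", "JJ"), ("dengue", "NN"),
    ("outbreak", "NN"),
]

# Penn Treebank prefixes: NN* = nouns, JJ* = adjectives.
KEPT_TAG_PREFIXES = ("NN", "JJ")

def pos_filtered_bigrams(tagged_tokens, prefixes=KEPT_TAG_PREFIXES):
    """Keep nouns/adjectives only, then join adjacent kept words."""
    kept = [word for word, tag in tagged_tokens if tag.startswith(prefixes)]
    return [a + "_" + b for a, b in zip(kept, kept[1:])]

pos_bigrams = pos_filtered_bigrams(tagged)
print(pos_bigrams)
```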
Naïve Bayes classifier
• Based on Bayes' theorem
• It assumes that, given the class, each attribute is independent of all other attributes.
• Well suited to categorical data: the probabilities are easy to calculate as ratios of counts.
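The "ratios" idea can be sketched as a tiny from-scratch classifier. This is an illustration only, not the NLTK implementation used in the project: the class name, the add-one smoothing and the toy training features are all assumptions.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples: list of (features, label), features being a list of strings
        self.label_counts = Counter(label for _, label in samples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in samples:
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        self.total = sum(self.label_counts.values())
        return self

    def log_score(self, features, label):
        # log P(label) + sum over features of log P(feature | label)
        score = math.log(self.label_counts[label] / self.total)
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for f in features:
            score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def predict(self, features):
        return max(self.label_counts,
                   key=lambda lbl: self.log_score(features, lbl))

# Invented toy training data; features are bigram strings.
train = [
    (["dengue_outbreak", "lives_lost"], "negative"),
    (["largest_epidemic", "dengue_outbreak"], "negative"),
    (["dengue_vaccine", "good_news"], "positive"),
    (["clean_bill", "good_news"], "positive"),
]
model = TinyNaiveBayes().fit(train)
prediction = model.predict(["dengue_outbreak", "largest_epidemic"])
print(prediction)
```

Each probability is a ratio of counts, and the independence assumption is what lets the per-feature log-probabilities simply be added.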
System Implementation
• Python 3.4
– operator - Functional interface to built-in operators
– itertools - Functions creating iterators for efficient looping
– re - Searching within and changing text using regular expressions
• NLTK
– probability - Classes for representing and processing probabilistic information
– classify - Classifiers
– metrics - Testing & validation
• Matplotlib & Pylab
• Tkinter
Experimental Setup
• Specific health topics, illnesses and treatments were selected using WebMD and Mayo Clinic.
• Tweets related to those issues were collected using the NodeXL tool.
• Data was collected over a period of time to ensure that it does not contain unusual outliers.
• Training sets
– The datasets were distributed among groups of 10 people each, and each tweet was labeled with the tag chosen by the majority.
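The majority-vote labeling step could look like the sketch below. How ties were resolved is not stated in the slides, so returning None for a tie is an assumption, as is the function name.

```python
from collections import Counter

def majority_label(votes):
    """Return the most frequent tag among annotator votes, or None on a tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: ties are left unlabeled
    return ranked[0][0]

# Ten hypothetical annotators voting on one tweet.
votes = ["negative"] * 6 + ["positive"] * 4
label = majority_label(votes)
print(label)
```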
• Both Naïve Bayes and Maximum Entropy classifiers were used.
• Experiments were carried out for different combinations of unigrams and bigrams with part-of-speech (POS) tagging.
• Performance was evaluated with cross-validation.
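A minimal sketch of how a k-fold cross-validation split can be generated. The number of folds is not given in the slides; k = 10 is used here purely as an assumption, with 472 samples to match the size of the Dengue dataset.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

splits = list(k_fold_splits(472, 10))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold; in practice the data would usually be shuffled before splitting, and the reported score would be the average over the folds.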
Datasets
Name      Content (keywords)  # Tweets  From              To              Classified  Polarity Ratio (Negative:Positive)
Dengue    Dengue              472       27/04/2015 20:29  1/7/2015 15:14  Yes         323:149
H1N1      H1N1, Influenza     548       24/06/15 1:45     30/06/15 14:57  Yes         314:234
Chemo-I   Chemotherapy        170       12/10/15 7:12     22/10/15 14:37  Yes         72:98
Chemo-II  Chemotherapy        734       12/10/2015 12:04  22/10/15 14:37  No          -
Experiment 1: Dengue Dataset
Dengue, Dengue Vaccine
                    Naïve Bayes                            MaxEnt
                    Unigrams  Bigrams  POS-Tagged Bigrams  Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            72.52     75.50    81.82               68.68     70.32    76.06
Weighted Precision  74.26     74.40    81.69               72.42     65.91    61.28
Weighted Recall     70.70     73.77    82.26               67.30     70.82    57.70
Weighted F-measure  70.90     71.00    79.84               67.55     60.57    58.72
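The slides do not define "weighted" precision, recall and F-measure. One common reading, sketched here as an assumption, is the per-class score averaged with each class's share of the samples as its weight. The confusion counts below are invented for illustration. Under this reading weighted recall always equals accuracy, which is not the case in the reported figures, so the project's definition may differ (for example, scores averaged across cross-validation folds).

```python
def weighted_scores(confusion):
    """confusion[actual][predicted] -> count.

    Returns support-weighted (precision, recall, f_measure).
    """
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    precision = recall = f_measure = 0.0
    for label in labels:
        tp = confusion[label].get(label, 0)
        support = sum(confusion[label].values())
        predicted = sum(confusion[actual].get(label, 0) for actual in labels)
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return precision, recall, f_measure

# Invented counts: 50 negative and 50 positive tweets.
confusion = {
    "negative": {"negative": 40, "positive": 10},
    "positive": {"negative": 20, "positive": 30},
}
precision, recall, f_measure = weighted_scores(confusion)
print(round(precision, 4), round(recall, 4), round(f_measure, 4))
```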
Accuracy
[Chart: Naïve Bayes vs. Maximum Entropy; series: Unigrams, Bigrams, POS-Tagged Bigrams]
Weighted F-measure
[Chart: Naïve Bayes vs. Maximum Entropy; series: Unigrams, Bigrams, POS-Tagged Bigrams]
Experiment 2: H1N1 Dataset
H1N1, Influenza
                    Naïve Bayes
                    Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            67.43     70.59    76.04
Weighted Precision  67.52     70.62    76.09
Weighted Recall     67.95     70.44    76.05
Weighted F-measure  65.69     70.08    75.78
Experiment 3: Chemo-I Dataset
Chemotherapy
                    Naïve Bayes
                    Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            75.88     76.47    78.24
Weighted Precision  78.23     78.66    79.96
Weighted Recall     75.10     75.60    77.16
Weighted F-measure  75.69     76.25    77.93
Polarity Checker : Dataset Analysis
Polarity Checker : Top Stories
Positive
1. #Dengue News: Scientists identify the skin immune cells targeted by the dengue virus
2. Co-ordination meet of BBMP Health and edu. dept. regarding control and prevention of Dengue and Chikungunya fever spread by Mosquito bite. (1/5)
3. Well that's a 1st! Malaysia Dept of Health officials doing house to house calls looking for dengue hot spots!! Clean bill of health here!
4. @PascalBarollier Fantastic! Thanks for helping our tribe put a face to dengue global leaders won't forget.
5. @DengueInfo Thank you for helping us get the word out on Dengue Tribe! To help put a face to dengue, join here
Negative
1. United Nations News Centre - At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency:
2. #MyiTimes Country faces largest dengue epidemic ever - KUALA LUMPUR: The country is probably facing the largest dengue problem
3. #Dengue News: Country faces largest dengue epidemic ever - Free Malaysia Today
4. Country faces largest dengue epidemic ever: The number of deaths has doubled this year compared to the same period…
5. #Yemen Yemen: At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency says
Polarity Checker : Text Analysis
Buzzmeter
Buzzmeter : Unigram vs. Bigram
Buzzmeter : Unigram vs. Bigram
• Chemo radiation
• Breast cancer
• Last chemo
• Cancer awareness
Conclusion
• This research presents a sentiment analysis with special attention towards improving health awareness. It can:
– automatically classify a given tweet
– generate the general attitude of a given set of tweets, with top stories
– track the most commonly used words/phrases in health-related tweets
• POS-tagged bigrams using nouns + adjectives with the Naïve Bayes method produced the best overall performance.
Future Recommendations
• Real-time Twitter data analysis
• Web plug-ins
• Mobile apps
• Identifying how a disease spreads, threatened areas & age groups
• Health alert/warning system
Questions?
Thank You!!!
General Framework for Sentiment Analysis of Twitter Data, with Special Attention towards Improving Health Awareness - Final Year Research Project

  • 1. General Framework for Sentiment Analysis of Twitter Data with Special Attention Towards Improving Health Awareness B. J. Gunasekara Supervisor - Dr R. D. Nawarathna
  • 2. Introduction Social networking encourages users to express their ideas & views on their day-to-day life style Department of Stat. & Comp. Sc., Faculty of Science, University of Peradeniya 2
  • 3. Social Media Analytics • The practice of gathering data from web resources like blogs and social media and analyzing that data • Applications  Big Data Analysis  Survey & Marketing  Decision Making Department of Stat. & Comp. Sc., Faculty of Science, University of Peradeniya 3
  • 4. Twitter
      “To give everyone the power to create and share ideas and information instantly, without barriers”
  • 5. 288 Million Monthly Active Users
      500 Million Tweets Sent Per Day
      152,000+ Tweets by Healthcare Professionals per Day
  • 6. Content of a tweet
      Tell your story with 140 characters
          - Textual content
          - User mentions
          - Hashtags
          - URLs
          - Location
  • 7. Most tweets carry little informational value on their own, but a collection of tweets can provide valuable insight into a population.
  • 8. One voice can make a difference… But a million can change the world!
      #LetDoctorsBeDoctors #ChildhoodCancer #BreastCancer #digitalhealth #ObesityCareWeek #Parkinsons #Lymphoma #Migraine
  • 9. Importance of Improving Health Literacy
      • Maintain personal health & wellbeing
      • Save on your medical costs
      • Avoid misinterpretations
          - chemo isn't so nice. Bad dreams
          - I really am surprised at how bad the side-effects are from #chemo this time. It's taken me by surprise a bit. Not good.
          - hospitals are the worst!! hate the medicine like smell lingering in the air why did my life become so bad
          - hate #chemo ahhh
          - Don't let chemotherapy take away your 'you'!!! find your fab again with @Baldlybeautiful
          - My dads experimental chemo has officially stopped his tumors from growing for an entire year now
  • 10. Natural Language Processing
      • Natural Language Processing (NLP) is the field concerned with the linguistic interaction between humans and computers.
      • Main tasks:
          - Information Extraction
          - Semantic Parsing
          - Text To 3D Scene Generation
          - Sentiment And Social Meaning
          - Machine Translation
          - Dialog And Speech Processing
          - Automatic Summarization
          - Text Segmentation
  • 11. Sentiment Analysis (Opinion Mining)
      • Sentiment analysis is the extraction of subjective information from a document using NLP, text analysis and computational linguistics.
      • Basic tasks
          - Polarity classification
          - Subjectivity/objectivity identification
          - Feature/aspect-based sentiment analysis
  • 12. Related Work
      • Language feature analysis
      • Special frameworks
          - Autoregressive Moving Average (ARMA)
          - Latent Dirichlet Allocation (LDA)
          - Ailment Topic Aspect Model (ATAM)
      • Derivations from existing models
          - BioCaster Ontology, an extant knowledge model of laymen’s terms
  • 13. Problem Statement
      • Perform a sentiment analysis aimed at improving health awareness, by analyzing the typical public reaction to common illnesses and treatments in the Twitter community.
  • 14. Methodology
      • The proposed method is based on POS-tagged bigrams with a Naïve Bayes classifier.
  • 15. (Figure only)
  • 16. Feature Extraction
      • Tweet: “200 lives were lost, coz of this massive dengue outbreak”
      • Unigrams: ['lives', 'lost', 'coz', 'massive', 'dengue', 'outbreak']
      • Bigrams: ['lives_lost', 'lost_coz', 'coz_massive', 'massive_dengue', 'dengue_outbreak']
      • POS tagging: [('lives', 'NNS'), ('were', 'VBD'), ('lost', 'VBN'), ('coz', 'NN'), ('of', 'IN'), ('this', 'DT'), ('massive', 'JJ'), ('dengue', 'NN'), ('outbreak', 'NN')]
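The unigram and bigram lists on the Feature Extraction slide can be reproduced with a short sketch. The slides do not show the preprocessing code, so the stop-word list and the rule of dropping numeric tokens below are assumptions chosen only to match the slide's example; the function names are hypothetical.

```python
import re

# Hypothetical stop-word list, just large enough for the slide's example.
# A real run would need a fuller list.
STOPWORDS = {"were", "of", "this", "the", "a", "an", "is", "are", "was"}


def unigrams(tweet):
    """Lowercase the tweet, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bigrams(tokens):
    """Join each pair of adjacent tokens with an underscore."""
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


tweet = "200 lives were lost, coz of this massive dengue outbreak"
uni = unigrams(tweet)
bi = bigrams(uni)
print(uni)
print(bi)
```

Note that the bigrams are built from the filtered unigram list, which is why 'lives_lost' appears even though "were" sits between the two words in the tweet.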
  • 17. Bigram vs. Unigram
      • The frequency distribution of bigrams in a string is used for simple statistical analysis of text.
      • Unlike a unigram, each bigram carries a neighboring word (increased long-tail specificity).
      • The classifier has more context for predicting the label than when relying on a single word.
  • 18. POS Tagging
      • The process of labeling each word with its part of speech, based on both its definition and its context.
      • Mainly nouns & adjectives were considered.
      • Adjectives modify a noun and so sharpen its sense.
          - Penn Treebank
          - Brown Corpus
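One way to read "POS-tagged bigrams using nouns + adjectives" is to keep only noun and adjective tokens and then pair the survivors. The slides do not spell out how tags and bigrams were combined, so the sketch below is an assumption, not the author's code. The tagged tokens are copied from the Feature Extraction slide; the helper names are hypothetical.

```python
# Tagged tokens copied from the Feature Extraction slide (Penn Treebank tags).
tagged = [("lives", "NNS"), ("were", "VBD"), ("lost", "VBN"), ("coz", "NN"),
          ("of", "IN"), ("this", "DT"), ("massive", "JJ"), ("dengue", "NN"),
          ("outbreak", "NN")]


def noun_adj_tokens(tagged_tokens):
    """Keep words whose tag marks a noun (NN*) or an adjective (JJ*)."""
    return [word for word, tag in tagged_tokens if tag.startswith(("NN", "JJ"))]


def bigrams(tokens):
    """Join each pair of adjacent tokens with an underscore."""
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


kept = noun_adj_tokens(tagged)
pos_bigrams = bigrams(kept)
print(kept)
print(pos_bigrams)
```

Under this reading the verb "lost" is dropped, so the features differ from the plain bigram list on the slide.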
  • 19. Naïve Bayes classifier
      • Based on Bayes' theorem
      • It assumes that, given the class, each attribute is independent of all other attributes.
      • Well suited to categorical data: the probabilities are easy to calculate as ratios of counts.
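The independence assumption means a class score is just the class prior multiplied by one per-feature ratio for each feature. The sketch below is a minimal standard-library illustration with add-one smoothing, not the classifier used in the project (the slides list NLTK's classify module for that). The class name and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum over features of log P(feature | class)
            score = math.log(n_docs / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set of bigram features.
train_docs = [["dengue_outbreak", "lives_lost"],
              ["largest_epidemic", "dengue_outbreak"],
              ["clean_bill", "good_news"],
              ["vaccine_works", "good_news"]]
train_labels = ["negative", "negative", "positive", "positive"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["dengue_outbreak"]))
print(model.predict(["good_news"]))
```

Working in log space avoids underflow when many small ratios are multiplied together.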
  • 20. System Implementation
      • Python 3.4
          - operator: functional interface to built-in operators
          - itertools: functions creating iterators for efficient looping
          - re: searching within and changing text using formal patterns
      • NLTK
          - probability: classes for representing and processing probabilistic information
          - classify: classifiers
          - metrics: testing & validation
      • Matplotlib & Pylab
      • Tkinter
  • 21. Experimental Setup
      • Specific health topics, illnesses and treatments were selected using WebMD and Mayo Clinic.
      • Tweets related to those issues were collected using the NodeXL tool.
      • Data was collected over a period of time to ensure that it does not contain any strange outliers.
      • Training sets: the datasets were distributed among groups of 10 people each, and each tweet was assigned the label chosen by the majority.
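The majority-label rule for the training sets can be sketched as follows. The slides do not say how a 5:5 split among the 10 annotators was resolved, so the tie behaviour here (the label seen first wins) is an assumption, and the function name and votes are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label; on a tie, the label seen first wins."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical votes from one group of 10 annotators for a single tweet.
votes = ["negative"] * 7 + ["positive"] * 3
print(majority_label(votes))
```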
  • 22. Experimental Setup (continued)
      • Both Naïve Bayes and Maximum Entropy classifiers were used.
      • Experiments were carried out for different combinations of unigram and bigram features and part-of-speech (POS) tagging.
      • The performance was evaluated with cross-validation.
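Cross-validation splits the labelled tweets into k folds and tests on each fold in turn while training on the rest. The slides do not give the number of folds, so k = 10 below is an assumption; 472 is the size of the Dengue dataset from the Datasets slide. The function name is hypothetical.

```python
def k_fold_indices(n_items, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # The first (n_items % k) folds get one extra item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(472, 10))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

In practice the data would be shuffled before splitting so that each fold has a similar mix of positive and negative tweets.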
  • 23. Datasets
      Name      Content (keywords)  Tweets  From              To              Classified  Polarity Ratio (Negative:Positive)
      Dengue    Dengue              472     27/04/2015 20:29  1/7/2015 15:14  Yes         323:149
      H1N1      H1N1, Influenza     548     24/06/15 1:45     30/06/15 14:57  Yes         314:234
      Chemo-I   Chemotherapy        170     12/10/15 7:12     22/10/15 14:37  Yes         72:98
      Chemo-II  Chemotherapy        734     12/10/2015 12:04  22/10/15 14:37  No          -
  • 24. Experiment 1: Dengue Dataset (Dengue, Dengue Vaccine)
                          Naïve Bayes                            MaxEnt
                          Unigrams  Bigrams  POS-Tagged Bigrams  Unigrams  Bigrams  POS-Tagged Bigrams
      Accuracy            72.52     75.50    81.82               68.68     70.32    76.06
      Weighted Precision  74.26     74.40    81.69               72.42     65.91    61.28
      Weighted Recall     70.70     73.77    82.26               67.30     70.82    57.70
      Weighted F-measure  70.90     71.00    79.84               67.55     60.57    58.72
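The slides do not define how the weighted precision, recall and F-measure were computed. One common definition weights each class's score by its share of the gold labels, sketched below with hypothetical toy labels. Under this particular definition weighted recall always equals accuracy, whereas the table reports different values for the two, so the author's weighting evidently differs from this sketch.

```python
from collections import Counter


def weighted_scores(gold, predicted):
    """Support-weighted precision, recall and F1 over the labels in `gold`."""
    support = Counter(gold)
    total = len(gold)
    precision = recall = f1 = 0.0
    for label, n_gold in support.items():
        tp = sum(1 for g, p in zip(gold, predicted) if g == p == label)
        n_pred = sum(1 for p in predicted if p == label)
        p_l = tp / n_pred if n_pred else 0.0
        r_l = tp / n_gold
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = n_gold / total
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return precision, recall, f1


# Hypothetical toy labels, not data from the experiments.
gold = ["neg", "neg", "neg", "pos"]
pred = ["neg", "neg", "pos", "pos"]
w_precision, w_recall, w_f1 = weighted_scores(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(w_precision, w_recall, w_f1, accuracy)
```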
  • 25. Accuracy [bar chart: Naïve Bayes vs. Maximum Entropy; series: Unigrams, Bigrams, POS-Tagged Bigrams]
  • 26. Weighted F-measure [bar chart: Naïve Bayes vs. Maximum Entropy; series: Unigrams, Bigrams, POS-Tagged Bigrams]
  • 27. Experiment 2: H1N1 Dataset (H1N1, Influenza)
      Naïve Bayes         Unigrams  Bigrams  POS-Tagged Bigrams
      Accuracy            67.43     70.59    76.04
      Weighted Precision  67.52     70.62    76.09
      Weighted Recall     67.95     70.44    76.05
      Weighted F-measure  65.69     70.08    75.78
  • 28. Experiment 3: Chemo-I Dataset (Chemotherapy)
      Naïve Bayes         Unigrams  Bigrams  POS-Tagged Bigrams
      Accuracy            75.88     76.47    78.24
      Weighted Precision  78.23     78.66    79.96
      Weighted Recall     75.10     75.60    77.16
      Weighted F-measure  75.69     76.25    77.93
  • 29. Polarity Checker: Dataset Analysis
  • 30. Polarity Checker: Top Stories
      1. Positive: #Dengue News: Scientists identify the skin immune cells targeted by the dengue virus
         Negative: United Nations News Centre - At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency:
      2. Positive: Co-ordination meet of BBMP Health and edu. dept. regarding control and prevention of Dengue and Chikungunya fever spread by Mosquito bite. (1/5) #MyiTimes
         Negative: Country faces largest dengue epidemic ever - KUALA LUMPUR: The country is probably facing the largest dengue problem
      3. Positive: Well that's a 1st! Malaysia Dept of Health officials doing house to house calls looking for dengue hot spots!! Clean bill of health here!
         Negative: #Dengue News: Country faces largest dengue epidemic ever - Free Malaysia Today
      4. Positive: @PascalBarollier Fantastic! Thanks for helping our tribe put a face to dengue global leaders won't forget.
         Negative: Country faces largest dengue epidemic ever: The number of deaths has doubled this year compared to the same period…
      5. Positive: @DengueInfo Thank you for helping us get the word out on Dengue Tribe! To help put a face to dengue, join here
         Negative: #Yemen Yemen: At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency says
  • 31. Polarity Checker: Text Analysis
  • 32. Buzzmeter
  • 33. Buzzmeter: Unigram vs. Bigram
  • 34. Buzzmeter: Unigram vs. Bigram
      • Chemo radiation
      • Breast cancer
      • Last chemo
      • Cancer awareness
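The slides do not describe how the Buzzmeter ranks phrases. Assuming it counts how often each bigram occurs across a set of tweets, a minimal sketch looks like this; the function name and the toy token lists are hypothetical.

```python
from collections import Counter


def top_phrases(token_lists, n=2):
    """Count adjacent word pairs across all tweets and return the n most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(a + " " + b for a, b in zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Hypothetical pre-tokenized tweets, not data from the experiments.
tweets = [["last", "chemo", "today"],
          ["breast", "cancer", "awareness"],
          ["last", "chemo", "done"],
          ["breast", "cancer", "walk"],
          ["last", "chemo"]]
buzz = top_phrases(tweets)
print(buzz)
```

Counting pairs rather than single words is what lets phrases such as "breast cancer" surface as a unit instead of as two unrelated frequent words.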
  • 35. Conclusion
      • This research presents a sentiment analysis framework with special attention towards improving health awareness:
          - automatic classification of a given tweet
          - generating the general attitude of a given set of tweets, with top stories
          - tracking the most commonly used words/phrases in health-related tweets
      • POS-tagged bigrams (nouns + adjectives) with the Naïve Bayes method produced the best overall performance.
  • 36. Future Recommendations
      • Real-time Twitter data analysis
      • Web plug-ins
      • Mobile apps
      • Identifying the spreading pattern of a disease, threatened areas & age groups
      • Health alert/warning system
  • 37. Questions?
  • 38. Thank You!!!