SlideShare uma empresa Scribd logo
1 de 64
Baixar para ler offline
Different types of analysis Sentiment analysis Next meetings
Big Data and Automated Content Analysis
Week 4 – Monday
»Sentiment Analysis«
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam
26 February 2017
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Today
1 Different types of analysis
What can we do?
Systematizing analytical approaches
2 Data analysis 1: Sentiment analysis
What is it?
Bag-of-words approaches
Advanced approaches
A sentiment analysis tailored to your needs!
Packages for sentiment analysis
A receipe
Machine Learning as alternative
3 Take-home message, next meetings, & exam
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What we already can do
with regard to data collection:
• query a (JSON-based) API (GoogleBooks, Twitter)
• handle CSV files
• handle JSON files
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What we already can do
with regard to data collection:
• query a (JSON-based) API (GoogleBooks, Twitter)
• handle CSV files
• handle JSON files
with regard to analysis:
Not much. We counted some frequencies and calculated some
averages.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What can we do?
Data analysis: Overview
What can we do?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What can we do?
Data analysis: Overview
What can we do?
What do you think? What are interesting methods to
analyze large data sets (like, e.g., social media data? What
questions can they answer?)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What can we do?
What else can we do?
For example
• sentiment analysis
• automated coding with regular expressions
• natural language processing
• supervised and unsupervised machine learning
• network analysis
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What can we do?
What else can we do?
Or ideally. . .
. . . a combination of these techniques.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Overview
Systematizing analytical approaches
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the structure
• Number of Tweets over time
• singleton/retweet ratio
• Distribution of number of Tweets per user
• Interaction networks
Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities.
International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the structure
• Number of Tweets over time
• singleton/retweet ratio
• Distribution of number of Tweets per user
• Interaction networks
⇒ Focus on the amount of content and on the question who
interacts with whom, not on what is said
Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities.
International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the content
• Sentiment analysis
• Word frequencies, searchstrings
• Co-word analysis (⇒frames)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the content
• Sentiment analysis
• Word frequencies, searchstrings
• Co-word analysis (⇒frames)
⇒ Focus on what is said
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Systematizing analytical approaches
⇒ It depends on your reserach question which approach is
more interesting!
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Systematizing analytical approaches
Automated Content Analysis
Methodological approach
deductive inductive
Typical research interests
and content features
Common statistical
procedures
visibility analysis
sentiment analysis
subjectivity analysis
Counting and
Dictionary
Supervised
Machine Learning
Unsupervised
Machine Learning
frames
topics
gender bias
frames
topics
string comparisons
counting
support vector machines
naive Bayes
principal component analysis
cluster analysis
latent dirichlet allocation
semantic network analysis
Boumans, J.W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content
analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4, 1. 8–23.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
Data analysis 1:
Sentiment analysis
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
• polarity: negative—positive
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
• polarity: negative—positive
• subjectivity: neutral—subjective *
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
• polarity: negative—positive
• subjectivity: neutral—subjective *
• advanced approaches: different emotions
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
• polarity: negative—positive
• subjectivity: neutral—subjective *
• advanced approaches: different emotions
* Less sophisticated approaches do not see this as a sperate dimension but
simply calculate objectivity = 1 − (negativity + positivity)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
Applications
Who uses it?
• Companies
• especially for Web Analytics
• Social Scientists
• applications in data journalism, politics, . . .
Many references to examples in Mostafa (2013).
⇒ Cases in which you have a huge amount of data or real-time
data and you want to get an idea of the tone.
Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert
Systems with Applications, 40(10), 4241– 4251. doi:10.1016/j.eswa.2013.01.019
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
What is it?
Example
1 >>> sentiment("Great service by @NSHighspeed")
2 (0.8, 0.75)
3 >>> sentiment("Bad service by @NSHighspeed")
4 (-0.6166666666666667, 0.6666666666666666)
(polarity, subjectivity) with
−1 ≤ polarity ≤ +1
0 ≤ subjectivity ≤ +1 )
This is the module pattern.nl, available for Python 2 via pip. The development branch on github supports Python
3.
De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Data analysis 1: Sentiment analysis
Bag-of-words approaches
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
How does it work?
• We take each word of a text and look if it’s positive or
negative.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
How does it work?
• We take each word of a text and look if it’s positive or
negative.
• Most simple way: compare it with a list of negative words and
with a list of positive words (That’s what Mostafa (2013) did)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
How does it work?
• We take each word of a text and look if it’s positive or
negative.
• Most simple way: compare it with a list of negative words and
with a list of positive words (That’s what Mostafa (2013) did)
• More advanced: look up a subjectivity score from a table
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
How does it work?
• We take each word of a text and look if it’s positive or
negative.
• Most simple way: compare it with a list of negative words and
with a list of positive words (That’s what Mostafa (2013) did)
• More advanced: look up a subjectivity score from a table
• e.g., add up the scores and average them.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
How to do this
If you were to run an analyis like the one by Mostafa (2013), how
could you do this?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
How to do this
(given a string tekst that you want to analyze and two lists of strings with negative
and positive words, lijstpos=["great","fantastic",...,"perfect"] and
lijstneg)
1 sentiment=0
2 for woord in tekst.split():
3 if woord in lijstpos:
4 sentiment=sentiment+1 #same as sentiment+=1
5 elif woord in lijstneg:
6 sentiment=sentiment-1 #same as sentiment-=1
7 print (sentiment)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Do we need to have the lists in our program itself?
No.
You could have them in a separate text file, one per row, and then
read that file directly to a list.
1 poslijst=open("filewithonepositivewordperline.txt").read().splitlines()
2 neglijst=open("filewithonenegativewordperline.txt").read().splitlines()
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
More advanced versions
• CSV files or similar tables with weights
• Or some kind of dict?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Mustafa 2013: Interpreting the output
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Mustafa 2013: Interpreting the output
Your thoughts?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Mustafa 2013: Interpreting the output
Your thoughts?
• each word
counts equally
(1)
• many tweets
contain no
words from the
list. What does
this mean?
• Ways to
improve BOW
approaches?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar
verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande
politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
pro
• easy to implement
• easy to modify:
• add or remove words
• make new lists for other languages, other categories (than
positive/negative), . . .
• easy to understand (transparency, reproducability)
e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar
verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande
politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Bag-of-words approaches
Bag-of-words approaches
con
• simplistic assumptions
• e.g., intensifiers cannot be interpreted ("really" in "really
good" or "really bad")
• or, even more important, negations.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Data analysis 1: Sentiment analysis
Advanced approaches
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Improving the BOW approach
Example: The Sentistrenght algorithm
• −5 . . . − 1 and +1 . . . + 5
• spelling correction
• "booster word list" for strengthening/weakening the effect of
the following word
• interpreting repeated letters ("baaaaaad"), CAPITALS and !!!
• idioms
• negation
• ldots
Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the
American Society for Information Science and Technology, 63(1), 163-173.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Advanced approaches
Take the structure of a text into account
• Try to apply linguistics concepts to identify sentence structure
• can identify negations
• can interpret intensifiers
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Example
1 from pattern.nl import sentiment
2 >>> sentiment("Great service by @NSHighspeed")
3 (0.8, 0.75)
4 >>> sentiment("Really")
5 (0.0, 1.0)
6 >>> sentiment("Really Great service by @NSHighspeed")
7 (1.0, 1.0)
(polarity, subjectivity) with
−1 ≤ polarity ≤ +1
0 ≤ subjectivity ≤ +1 )
Unlike in pure bag-of-words approaches, here, the overall sentiment is not just
the sum or the average of its parts!
De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Advanced approaches
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Advanced approaches
pro
• understand intensifiers or negation
• thus: higher accuracy
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Advanced approaches
Advanced approaches
pro
• understand intensifiers or negation
• thus: higher accuracy
con
• Black box? Or do we understand the algorithm?
• Difficult to adapt to own needs
• really much better results?
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
A sentiment analysis tailored to your needs!
Data analysis 1: Sentiment analysis
A sentiment analysis tailored to your needs!
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
A sentiment analysis tailored to your needs!
A sentiment analysis tailored to your needs!
Identifying suicidal texts
• Bag-of-words-approach with very specific dictionary
• added negation
• added regular expression search for key phrases
• Very specific design requirements: False positives are OK,
false negatives not!
Huang, Y.-P., Goh, T., & Liew, C.L. (2007). Hunting suicide notes in web 2.0 – preliminary findings. Ninth IEEE
International Symposium on Multimedia. Retrieved from
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4476021
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
A sentiment analysis tailored to your needs!
Already this still relatively simple approach seems to work
satisfactory, but if 106 scientists from 24 competing teams (!)
work on it, they can
Pestian, J.P.; Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew,
C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), p. 3-16.
Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
A sentiment analysis tailored to your needs!
Already this still relatively simple approach seems to work
satisfactory, but if 106 scientists from 24 competing teams (!)
work on it, they can
group suidide notes by these characteristics:
• swear
• family
• friend
• positive emotion
• negative emotion
• anxiety
• anger
• sad
• cognitive process
• biology
• sexual
• ingestion
• religion
• death
Pestian, J.P.; Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew,
C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), p. 3-16.
Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render
Big Data and Automated Content Analysis Damian Trilling
Packages for sentiment analysis
Different types of analysis Sentiment analysis Next meetings
Packages for sentiment analysis
Which packages are easy to use?
vader pro: in NLTK module, con: English only
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Packages for sentiment analysis
Which packages are easy to use?
vader pro: in NLTK module, con: English only
pattern pro: mutiple languages (including Dutch), con:
porting to Python 3 is still under development (but in
principle, it works)
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Packages for sentiment analysis
Which packages are easy to use?
vader pro: in NLTK module, con: English only
pattern pro: mutiple languages (including Dutch), con:
porting to Python 3 is still under development (but in
principle, it works)
sentistrength pro: multiple languages, widely used, con: needs
Python wrapper, license
vader: Chapter 6.3; pattern: Chapter 6.5; sentistrength: Chapter 6.4
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Packages for sentiment analysis
Which packages are easy to use?
vader pro: in NLTK module, con: English only
pattern pro: mutiple languages (including Dutch), con:
porting to Python 3 is still under development (but in
principle, it works)
sentistrength pro: multiple languages, widely used, con: needs
Python wrapper, license
vader: Chapter 6.3; pattern: Chapter 6.5; sentistrength: Chapter 6.4
BUT: Keep in mind that the results of any
off-the-shelf-package might be biased and/or noisy in your
domain!
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
A receipe
A possible receipe for doing your sentiment analysis
1 Construct a list data of strings with your input data
2 Create an empty list sent for storing the results
3 For each text t in data, estimate the sentiment of t and
append the result to sent1
4 Confirm that len(data) == len(sent)
5 use zip() and a csv.writer to write input and output next
to each other to a csv file.
1
use multiple lists instead if you estimate for instance subjectivity and
polarity
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Machine Learning as alternative
Superivsed ML (⇒ week7)
An alternative state-of-the-art approach:
Use supervised machine learning
• Instead of defining rules, hand-code (“annotate”) the
sentiment of some tweets manually and let the computer find
out which words or characters (“features”) predict sentiment
• Then use this model to predict sentiment for other tweets
• Essentially the same like what you know since the second year
of your Bachelor: regression analysis (but now with DV
sentiment and IV’s word occurrences)
Gonzalez-Bailon, S., & Paltoglou, G. (2015). Signals of public opinion in online communication: A comparison of
methods and data sources. The ANNALS of the American Academy of Political and Social Science, 659(1),
95-–107.
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Take-home message
Mid-term take-home exam
Next meetings
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Take-home messages
What you should be familiar with:
• You should have completely understood last week’s exercise.
Re-read it if neccessary.
• Approaches to the analysis (e.g., structure vs. content)
• Types of sentiment analysis, application areas, pros and cons
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Mid-term take home exam
Week 5: Wednesday, 7 March, to Sunday, 11 March
• You get the exam on Wednesday at the end of the meeting
• Answers have to be handed in no later than Sunday, 23.59
• 30% of final grade
• 3 questions:
1 Literature question: E.g., different methods (“Explain how. . . is
done”) and/or epistemological or theoretical implications
(“What does this mean for social-scientific research?”)
2 Empirical question (conceptual)
3 Empirical question (actual programming task)
If you fully understood all exercises until now, it shouldn’t be
difficult and won’t take too long. But give yourself a lot of
buffer time!!!
Big Data and Automated Content Analysis Damian Trilling
Different types of analysis Sentiment analysis Next meetings
Next meetings
Wednesday, 28–2
Task for during the meeting: Conduct an own sentiment analysis!
You can bring your own data (you will probably learn more!), but
then already think about how to write some script to read the data
(as we did last week and as described in Chapter 5).
Big Data and Automated Content Analysis Damian Trilling

Mais conteúdo relacionado

Mais procurados

Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Rich Heimann
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?Frank van Harmelen
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedJonathan Mugan
 
Presentation1.pdf
Presentation1.pdfPresentation1.pdf
Presentation1.pdfZixunZhou
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 

Mais procurados (20)

BDACA - Lecture6
BDACA - Lecture6BDACA - Lecture6
BDACA - Lecture6
 
BDACA1516s2 - Lecture7
BDACA1516s2 - Lecture7BDACA1516s2 - Lecture7
BDACA1516s2 - Lecture7
 
BDACA1516s2 - Lecture6
BDACA1516s2 - Lecture6BDACA1516s2 - Lecture6
BDACA1516s2 - Lecture6
 
BDACA1516s2 - Lecture5
BDACA1516s2 - Lecture5BDACA1516s2 - Lecture5
BDACA1516s2 - Lecture5
 
BDACA1516s2 - Lecture2
BDACA1516s2 - Lecture2BDACA1516s2 - Lecture2
BDACA1516s2 - Lecture2
 
Analyzing social media with Python and other tools (1/4)
Analyzing social media with Python and other tools (1/4)Analyzing social media with Python and other tools (1/4)
Analyzing social media with Python and other tools (1/4)
 
BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
 
BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2
 
BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7
 
BD-ACA week3a
BD-ACA week3aBD-ACA week3a
BD-ACA week3a
 
BDACA1617s2 - Lecture6
BDACA1617s2 - Lecture6BDACA1617s2 - Lecture6
BDACA1617s2 - Lecture6
 
BDACA1617s2 - Lecture5
BDACA1617s2 - Lecture5BDACA1617s2 - Lecture5
BDACA1617s2 - Lecture5
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 
BD-ACA Week8a
BD-ACA Week8aBD-ACA Week8a
BD-ACA Week8a
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
Presentation1.pdf
Presentation1.pdfPresentation1.pdf
Presentation1.pdf
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 

Semelhante a BDACA - Lecture4

Lane-SlidesMania.pptx
Lane-SlidesMania.pptxLane-SlidesMania.pptx
Lane-SlidesMania.pptxAngeCustodio
 
How to Analyze Data (1).pptx
How to Analyze Data (1).pptxHow to Analyze Data (1).pptx
How to Analyze Data (1).pptxInfosectrain3
 
Social CI: A Work method and a tool for Competitive Intelligence Networking
Social CI: A Work method and a tool for Competitive Intelligence NetworkingSocial CI: A Work method and a tool for Competitive Intelligence Networking
Social CI: A Work method and a tool for Competitive Intelligence NetworkingComintelli
 
Merriam ch 8 5.26.10
Merriam ch 8 5.26.10Merriam ch 8 5.26.10
Merriam ch 8 5.26.10Daberkow
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using Rsantoshi mangalgi
 
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small Library
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small LibraryDIY ERM (Do-It-Yourself Electronic Resources Management) for the Small Library
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small LibraryNASIG
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3Dave King
 
Citi Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics PresentationCiti Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics PresentationMarquis Cabrera
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand wordsMasum Billah
 
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017Investigating Performance: Design & Outcomes with xAPI | LSCon 2017
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017HT2 Labs
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesYuchen Zhao
 
Diamonds in the Rough (Sentiment(al) Analysis
Diamonds in the Rough (Sentiment(al) AnalysisDiamonds in the Rough (Sentiment(al) Analysis
Diamonds in the Rough (Sentiment(al) AnalysisScott K. Wilder
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectbodaceacat
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSara-Jayne Terp
 

Semelhante a BDACA - Lecture4 (20)

BD-ACA week4a
BD-ACA week4aBD-ACA week4a
BD-ACA week4a
 
BDACA1516s2 - Lecture1
BDACA1516s2 - Lecture1BDACA1516s2 - Lecture1
BDACA1516s2 - Lecture1
 
Lane-SlidesMania.pptx
Lane-SlidesMania.pptxLane-SlidesMania.pptx
Lane-SlidesMania.pptx
 
How to Analyze Data (1).pptx
How to Analyze Data (1).pptxHow to Analyze Data (1).pptx
How to Analyze Data (1).pptx
 
BD-ACA week1b
BD-ACA week1bBD-ACA week1b
BD-ACA week1b
 
BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1
 
Social CI: A Work method and a tool for Competitive Intelligence Networking
Social CI: A Work method and a tool for Competitive Intelligence NetworkingSocial CI: A Work method and a tool for Competitive Intelligence Networking
Social CI: A Work method and a tool for Competitive Intelligence Networking
 
Merriam ch 8 5.26.10
Merriam ch 8 5.26.10Merriam ch 8 5.26.10
Merriam ch 8 5.26.10
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using R
 
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small Library
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small LibraryDIY ERM (Do-It-Yourself Electronic Resources Management) for the Small Library
DIY ERM (Do-It-Yourself Electronic Resources Management) for the Small Library
 
BDACA - Lecture1
BDACA - Lecture1BDACA - Lecture1
BDACA - Lecture1
 
new.pptx
new.pptxnew.pptx
new.pptx
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
Citi Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics PresentationCiti Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics Presentation
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand words
 
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017Investigating Performance: Design & Outcomes with xAPI | LSCon 2017
Investigating Performance: Design & Outcomes with xAPI | LSCon 2017
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
 
Diamonds in the Rough (Sentiment(al) Analysis
Diamonds in the Rough (Sentiment(al) AnalysisDiamonds in the Rough (Sentiment(al) Analysis
Diamonds in the Rough (Sentiment(al) Analysis
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 

Mais de Department of Communication Science, University of Amsterdam (9)

BDACA - Tutorial5
BDACA - Tutorial5BDACA - Tutorial5
BDACA - Tutorial5
 
BDACA - Lecture5
BDACA - Lecture5BDACA - Lecture5
BDACA - Lecture5
 
BDACA - Lecture2
BDACA - Lecture2BDACA - Lecture2
BDACA - Lecture2
 
BDACA - Tutorial1
BDACA - Tutorial1BDACA - Tutorial1
BDACA - Tutorial1
 
Media diets in an age of apps and social media: Dealing with a third layer of...
Media diets in an age of apps and social media: Dealing with a third layer of...Media diets in an age of apps and social media: Dealing with a third layer of...
Media diets in an age of apps and social media: Dealing with a third layer of...
 
Conceptualizing and measuring news exposure as network of users and news items
Conceptualizing and measuring news exposure as network of users and news itemsConceptualizing and measuring news exposure as network of users and news items
Conceptualizing and measuring news exposure as network of users and news items
 
Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"
 
Data Science: Case "Political Communication 1/2"
Data Science: Case "Political Communication 1/2"Data Science: Case "Political Communication 1/2"
Data Science: Case "Political Communication 1/2"
 
Should we worry about filter bubbles?
Should we worry about filter bubbles?Should we worry about filter bubbles?
Should we worry about filter bubbles?
 

Último

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 

Último (20)

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 

BDACA - Lecture4

  • 1. Different types of analysis Sentiment analysis Next meetings Big Data and Automated Content Analysis Week 4 – Monday »Sentiment Analysis« Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 26 February 2017 Big Data and Automated Content Analysis Damian Trilling
  • 2. Different types of analysis Sentiment analysis Next meetings Today 1 Different types of analysis What can we do? Systematizing analytical approaches 2 Data analysis 1: Sentiment analysis What is it? Bag-of-words approaches Advanced approaches A sentiment analysis tailored to your needs! Packages for sentiment analysis A receipe Machine Learning as alternative 3 Take-home message, next meetings, & exam Big Data and Automated Content Analysis Damian Trilling
  • 3. Different types of analysis Sentiment analysis Next meetings What we already can do with regard to data collection: • query a (JSON-based) API (GoogleBooks, Twitter) • handle CSV files • handle JSON files Big Data and Automated Content Analysis Damian Trilling
  • 4. Different types of analysis Sentiment analysis Next meetings What we already can do with regard to data collection: • query a (JSON-based) API (GoogleBooks, Twitter) • handle CSV files • handle JSON files with regard to analysis: Not much. We counted some frequencies and calculated some averages. Big Data and Automated Content Analysis Damian Trilling
  • 5. Different types of analysis Sentiment analysis Next meetings What can we do? Data analysis: Overview What can we do? Big Data and Automated Content Analysis Damian Trilling
  • 6. Different types of analysis Sentiment analysis Next meetings What can we do? Data analysis: Overview What can we do? What do you think? What are interesting methods to analyze large data sets (like, e.g., social media data? What questions can they answer?) Big Data and Automated Content Analysis Damian Trilling
  • 7. Different types of analysis Sentiment analysis Next meetings What can we do? What else can we do? For example • sentiment analysis • automated coding with regular expressions • natural language processing • supervised and unsupervised machine learning • network analysis Big Data and Automated Content Analysis Damian Trilling
  • 8. Different types of analysis Sentiment analysis Next meetings What can we do? What else can we do? Or ideally. . . . . . a combination of these techniques. Big Data and Automated Content Analysis Damian Trilling
  • 9. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Overview Systematizing analytical approaches Big Data and Automated Content Analysis Damian Trilling
  • 10. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Systematizing analytical approaches Taking the example of Twitter: Analyzing the structure • Number of Tweets over time • singleton/retweet ratio • Distribution of number of Tweets per user • Interaction networks Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095 Big Data and Automated Content Analysis Damian Trilling
  • 11. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Systematizing analytical approaches Taking the example of Twitter: Analyzing the structure • Number of Tweets over time • singleton/retweet ratio • Distribution of number of Tweets per user • Interaction networks ⇒ Focus on the amount of content and on the question who interacts with whom, not on what is said Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095 Big Data and Automated Content Analysis Damian Trilling
  • 12. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Systematizing analytical approaches Taking the example of Twitter: Analyzing the content • Sentiment analysis • Word frequencies, searchstrings • Co-word analysis (⇒frames) Big Data and Automated Content Analysis Damian Trilling
  • 13. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Systematizing analytical approaches Taking the example of Twitter: Analyzing the content • Sentiment analysis • Word frequencies, searchstrings • Co-word analysis (⇒frames) ⇒ Focus on what is said Big Data and Automated Content Analysis Damian Trilling
  • 14. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Systematizing analytical approaches ⇒ It depends on your reserach question which approach is more interesting! Big Data and Automated Content Analysis Damian Trilling
  • 15. Different types of analysis Sentiment analysis Next meetings Systematizing analytical approaches Automated Content Analysis Methodological approach deductive inductive Typical research interests and content features Common statistical procedures visibility analysis sentiment analysis subjectivity analysis Counting and Dictionary Supervised Machine Learning Unsupervised Machine Learning frames topics gender bias frames topics string comparisons counting support vector machines naive Bayes principal component analysis cluster analysis latent dirichlet allocation semantic network analysis Boumans, J.W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4, 1. 8–23. Big Data and Automated Content Analysis Damian Trilling
  • 16. Different types of analysis Sentiment analysis Next meetings What is it? Data analysis 1: Sentiment analysis Big Data and Automated Content Analysis Damian Trilling
  • 17. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts Big Data and Automated Content Analysis Damian Trilling
  • 18. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text Big Data and Automated Content Analysis Damian Trilling
  • 19. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative—positive Big Data and Automated Content Analysis Damian Trilling
  • 20. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative—positive • subjectivity: neutral—subjective * Big Data and Automated Content Analysis Damian Trilling
  • 21. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative—positive • subjectivity: neutral—subjective * • advanced approaches: different emotions Big Data and Automated Content Analysis Damian Trilling
  • 22. Different types of analysis Sentiment analysis Next meetings What is it? What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative—positive • subjectivity: neutral—subjective * • advanced approaches: different emotions * Less sophisticated approaches do not see this as a sperate dimension but simply calculate objectivity = 1 − (negativity + positivity) Big Data and Automated Content Analysis Damian Trilling
  • 23. Different types of analysis Sentiment analysis Next meetings What is it? Applications Who uses it? • Companies • especially for Web Analytics • Social Scientists • applications in data journalism, politics, . . . Many references to examples in Mostafa (2013). ⇒ Cases in which you have a huge amount of data or real-time data and you want to get an idea of the tone. Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications, 40(10), 4241– 4251. doi:10.1016/j.eswa.2013.01.019 Big Data and Automated Content Analysis Damian Trilling
  • 24. Different types of analysis Sentiment analysis Next meetings What is it? Example 1 >>> sentiment("Great service by @NSHighspeed") 2 (0.8, 0.75) 3 >>> sentiment("Bad service by @NSHighspeed") 4 (-0.6166666666666667, 0.6666666666666666) (polarity, subjectivity) with −1 ≤ polarity ≤ +1 0 ≤ subjectivity ≤ +1 ) This is the module pattern.nl, available for Python 2 via pip. The development branch on github supports Python 3. De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067. Big Data and Automated Content Analysis Damian Trilling
  • 25. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Data analysis 1: Sentiment analysis Bag-of-words approaches Big Data and Automated Content Analysis Damian Trilling
  • 26. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches How does it work? • We take each word of a text and look if it’s positive or negative. Big Data and Automated Content Analysis Damian Trilling
  • 27. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did) Big Data and Automated Content Analysis Damian Trilling
  • 28. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did) • More advanced: look up a subjectivity score from a table Big Data and Automated Content Analysis Damian Trilling
  • 29. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did) • More advanced: look up a subjectivity score from a table • e.g., add up the scores and average them. Big Data and Automated Content Analysis Damian Trilling
  • 30. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches How to do this If you were to run an analyis like the one by Mostafa (2013), how could you do this? Big Data and Automated Content Analysis Damian Trilling
  • 31. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches How to do this (given a string tekst that you want to analyze and two lists of strings with negative and positive words, lijstpos=["great","fantastic",...,"perfect"] and lijstneg) 1 sentiment=0 2 for woord in tekst.split(): 3 if woord in lijstpos: 4 sentiment=sentiment+1 #same as sentiment+=1 5 elif woord in lijstneg: 6 sentiment=sentiment-1 #same as sentiment-=1 7 print (sentiment) Big Data and Automated Content Analysis Damian Trilling
  • 32. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Do we need to have the lists in our program itself? No. You could have them in a separate text file, one per row, and then read that file directly to a list. 1 poslijst=open("filewithonepositivewordperline.txt").read().splitlines() 2 neglijst=open("filewithonenegativewordperline.txt").read().splitlines() Big Data and Automated Content Analysis Damian Trilling
  • 33.
  • 34. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches More advanced versions • CSV files or similar tables with weights • Or some kind of dict? Big Data and Automated Content Analysis Damian Trilling
  • 35.
  • 36.
  • 37. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Mustafa 2013: Interpreting the output Big Data and Automated Content Analysis Damian Trilling
  • 38. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Mustafa 2013: Interpreting the output Your thoughts? Big Data and Automated Content Analysis Damian Trilling
  • 39. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Mustafa 2013: Interpreting the output Your thoughts? • each word counts equally (1) • many tweets contain no words from the list. What does this mean? • Ways to improve BOW approaches? Big Data and Automated Content Analysis Damian Trilling
  • 40. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam. Big Data and Automated Content Analysis Damian Trilling
  • 41. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches pro • easy to implement • easy to modify: • add or remove words • make new lists for other languages, other categories (than positive/negative), . . . • easy to understand (transparency, reproducability) e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam. Big Data and Automated Content Analysis Damian Trilling
  • 42. Different types of analysis Sentiment analysis Next meetings Bag-of-words approaches Bag-of-words approaches con • simplistic assumptions • e.g., intensifiers cannot be interpreted ("really" in "really good" or "really bad") • or, even more important, negations. Big Data and Automated Content Analysis Damian Trilling
  • 43. Different types of analysis Sentiment analysis Next meetings Advanced approaches Data analysis 1: Sentiment analysis Advanced approaches Big Data and Automated Content Analysis Damian Trilling
  • 44. Different types of analysis Sentiment analysis Next meetings Advanced approaches Improving the BOW approach Example: The Sentistrenght algorithm • −5 . . . − 1 and +1 . . . + 5 • spelling correction • "booster word list" for strengthening/weakening the effect of the following word • interpreting repeated letters ("baaaaaad"), CAPITALS and !!! • idioms • negation • ldots Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173. Big Data and Automated Content Analysis Damian Trilling
  • 45. Different types of analysis Sentiment analysis Next meetings Advanced approaches Advanced approaches Take the structure of a text into account • Try to apply linguistics concepts to identify sentence structure • can identify negations • can interpret intensifiers Big Data and Automated Content Analysis Damian Trilling
  • 46. Different types of analysis Sentiment analysis Next meetings Advanced approaches Example 1 from pattern.nl import sentiment 2 >>> sentiment("Great service by @NSHighspeed") 3 (0.8, 0.75) 4 >>> sentiment("Really") 5 (0.0, 1.0) 6 >>> sentiment("Really Great service by @NSHighspeed") 7 (1.0, 1.0) (polarity, subjectivity) with −1 ≤ polarity ≤ +1 0 ≤ subjectivity ≤ +1 ) Unlike in pure bag-of-words approaches, here, the overall sentiment is not just the sum or the average of its parts! De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067. Big Data and Automated Content Analysis Damian Trilling
  • 47. Different types of analysis Sentiment analysis Next meetings Advanced approaches Advanced approaches Big Data and Automated Content Analysis Damian Trilling
  • 48. Different types of analysis Sentiment analysis Next meetings Advanced approaches Advanced approaches pro • understand intensifiers or negation • thus: higher accuracy Big Data and Automated Content Analysis Damian Trilling
  • 49. Different types of analysis Sentiment analysis Next meetings Advanced approaches Advanced approaches pro • understand intensifiers or negation • thus: higher accuracy con • Black box? Or do we understand the algorithm? • Difficult to adapt to own needs • really much better results? Big Data and Automated Content Analysis Damian Trilling
  • 50. Different types of analysis Sentiment analysis Next meetings A sentiment analysis tailored to your needs! Data analysis 1: Sentiment analysis A sentiment analysis tailored to your needs! Big Data and Automated Content Analysis Damian Trilling
  • 51. Different types of analysis Sentiment analysis Next meetings A sentiment analysis tailored to your needs! A sentiment analysis tailored to your needs! Identifying suicidal texts • Bag-of-words-approach with very specific dictionary • added negation • added regular expression search for key phrases • Very specific design requirements: False positives are OK, false negatives not! Huang, Y.-P., Goh, T., & Liew, C.L. (2007). Hunting suicide notes in web 2.0 – preliminary findings. Ninth IEEE International Symposium on Multimedia. Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4476021 Big Data and Automated Content Analysis Damian Trilling
  • 52. Different types of analysis Sentiment analysis Next meetings A sentiment analysis tailored to your needs! Already this still relatively simple approach seems to work satisfactory, but if 106 scientists from 24 competing teams (!) work on it, they can Pestian, J.P.; Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew, C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), p. 3-16. Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render Big Data and Automated Content Analysis Damian Trilling
  • 53. Different types of analysis Sentiment analysis Next meetings A sentiment analysis tailored to your needs! Already this still relatively simple approach seems to work satisfactory, but if 106 scientists from 24 competing teams (!) work on it, they can group suidide notes by these characteristics: • swear • family • friend • positive emotion • negative emotion • anxiety • anger • sad • cognitive process • biology • sexual • ingestion • religion • death Pestian, J.P.; Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew, C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), p. 3-16. Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render Big Data and Automated Content Analysis Damian Trilling
  • 55. Different types of analysis Sentiment analysis Next meetings Packages for sentiment analysis Which packages are easy to use? vader pro: in NLTK module, con: English only Big Data and Automated Content Analysis Damian Trilling
  • 56. Different types of analysis Sentiment analysis Next meetings Packages for sentiment analysis Which packages are easy to use? vader pro: in NLTK module, con: English only pattern pro: mutiple languages (including Dutch), con: porting to Python 3 is still under development (but in principle, it works) Big Data and Automated Content Analysis Damian Trilling
  • 57. Different types of analysis Sentiment analysis Next meetings Packages for sentiment analysis Which packages are easy to use? vader pro: in NLTK module, con: English only pattern pro: mutiple languages (including Dutch), con: porting to Python 3 is still under development (but in principle, it works) sentistrength pro: multiple languages, widely used, con: needs Python wrapper, license vader: Chapter 6.3; pattern: Chapter 6.5; sentistrength: Chapter 6.4 Big Data and Automated Content Analysis Damian Trilling
  • 58. Different types of analysis Sentiment analysis Next meetings Packages for sentiment analysis Which packages are easy to use? vader pro: in NLTK module, con: English only pattern pro: mutiple languages (including Dutch), con: porting to Python 3 is still under development (but in principle, it works) sentistrength pro: multiple languages, widely used, con: needs Python wrapper, license vader: Chapter 6.3; pattern: Chapter 6.5; sentistrength: Chapter 6.4 BUT: Keep in mind that the results of any off-the-shelf-package might be biased and/or noisy in your domain! Big Data and Automated Content Analysis Damian Trilling
  • 59. Different types of analysis Sentiment analysis Next meetings A receipe A possible receipe for doing your sentiment analysis 1 Construct a list data of strings with your input data 2 Create an empty list sent for storing the results 3 For each text t in data, estimate the sentiment of t and append the result to sent1 4 Confirm that len(data) == len(sent) 5 use zip() and a csv.writer to write input and output next to each other to a csv file. 1 use multiple lists instead if you estimate for instance subjectivity and polarity Big Data and Automated Content Analysis Damian Trilling
  • 60. Different types of analysis Sentiment analysis Next meetings Machine Learning as alternative Superivsed ML (⇒ week7) An alternative state-of-the-art approach: Use supervised machine learning • Instead of defining rules, hand-code (“annotate”) the sentiment of some tweets manually and let the computer find out which words or characters (“features”) predict sentiment • Then use this model to predict sentiment for other tweets • Essentially the same like what you know since the second year of your Bachelor: regression analysis (but now with DV sentiment and IV’s word occurrences) Gonzalez-Bailon, S., & Paltoglou, G. (2015). Signals of public opinion in online communication: A comparison of methods and data sources. The ANNALS of the American Academy of Political and Social Science, 659(1), 95-–107. Big Data and Automated Content Analysis Damian Trilling
  • 61. Different types of analysis Sentiment analysis Next meetings Take-home message Mid-term take-home exam Next meetings Big Data and Automated Content Analysis Damian Trilling
  • 62. Different types of analysis Sentiment analysis Next meetings Take-home messages What you should be familiar with: • You should have completely understood last week’s exercise. Re-read it if neccessary. • Approaches to the analysis (e.g., structure vs. content) • Types of sentiment analysis, application areas, pros and cons Big Data and Automated Content Analysis Damian Trilling
  • 63. Different types of analysis Sentiment analysis Next meetings Mid-term take home exam Week 5: Wednesday, 7 March, to Sunday, 11 March • You get the exam on Wednesday at the end of the meeting • Answers have to be handed in no later than Sunday, 23.59 • 30% of final grade • 3 questions: 1 Literature question: E.g., different methods (“Explain how. . . is done”) and/or epistemological or theoretical implications (“What does this mean for social-scientific research?”) 2 Empirical question (conceptual) 3 Empirical question (actual programming task) If you fully understood all exercises until now, it shouldn’t be difficult and won’t take too long. But give yourself a lot of buffer time!!! Big Data and Automated Content Analysis Damian Trilling
  • 64. Different types of analysis Sentiment analysis Next meetings Next meetings Wednesday, 28–2 Task for during the meeting: Conduct an own sentiment analysis! You can bring your own data (you will probably learn more!), but then already think about how to write some script to read the data (as we did last week and as described in Chapter 5). Big Data and Automated Content Analysis Damian Trilling