1. Why should I learn Python?
Why should I learn Python?
Explaining a basic Python script
Your turn
Questions?
Python in the Social Sciences
A brief introduction by means of real-life
examples
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Department of Communication Science
Universiteit van Amsterdam
Coding Culture, Utrecht, 5 March 2014
5. What I won’t do today
I won’t give you a structured introduction to
• variables
• commands
• data types
• ...
and all the other technical stuff.
You’ll do that yourself over the next few weeks.
Instead, I’ll give you some examples of what you can do with the
knowledge you’re going to acquire.
6. Why should I learn Python?
Some examples
9. A recent bachelor thesis
Tone in tweets
Imagine you want to know something about someone’s behavior on
Twitter, or how a specific topic is discussed there.
Do you really want to go through thousands of tweets by hand?
13. So you’d better think about automating your coding
Finding out how negative or positive politicians are towards
their opponents
The student took lists of positive and negative words and made
additional lists naming each politician’s opponents.
She used a Python script to check which type of words was used to
refer to opponents.
For further analysis, the results were imported into SPSS.
Schut, L. (2013). Verenigde Staten vs. Verenigd Koninkrijk: Een automatische inhoudsanalyse naar verklarende
factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en
politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
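The dictionary approach described above can be sketched in a few lines of Python 3. This is only an illustration, not the script from the thesis: the word lists, the opponent names, and the function name tone_towards_opponents are all made up, and a tweet is only scored if it mentions an opponent.

```python
import re

# Hypothetical word lists; a real study would use validated dictionaries.
POSITIVE = {"great", "strong", "honest"}
NEGATIVE = {"weak", "wrong", "failed"}
OPPONENTS = {"smith", "jones"}  # made-up opponent names


def tone_towards_opponents(tweet):
    """Count positive and negative words in a tweet that mentions an opponent.

    Returns None if no opponent is mentioned.
    """
    words = re.findall(r"[a-z']+", tweet.lower())
    if not OPPONENTS.intersection(words):
        return None
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return {"positive": positive, "negative": negative}


tweets = [
    "Smith was wrong on taxes and has failed our schools",
    "A great day for our campaign",
]
for tweet in tweets:
    print(tone_towards_opponents(tweet))
```

The resulting counts could then be written to a CSV file and imported into a statistics package for further analysis.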
16. Frame adoption on Twitter
Which phrases used by Merkel and Steinbrück on TV make it
to the #tvduell discussion on Twitter?
As part of the project, I wrote a Python script to identify word
co-occurrences on Twitter. The script produced not only lists with
word counts, but also a GDF file that could be used for
visualization.
17.
#!/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7
# -*- coding: utf-8 -*-
from __future__ import division
from itertools import combinations
from collections import defaultdict
from collections import Counter
from unicsv import CsvUnicodeReader
import codecs, cStringIO, sys, re, unicodedata, os

gdfbestand="resultaten/netwerk.gdf"
wordsplitbestand="resultaten/wordsplit.csv"
tempbestand="allewoorden.tmp"

minedgeweight=20
cooc=defaultdict(int)
tweets=[]

print "\nReading tweet nr. "
reader=CsvUnicodeReader(open(wordsplitbestand,"r"))
i=0
for row in reader:
    i=i+1
    # skip first row, as it contains column headers
    if i>1:
        print "\r",str(i)," ",
        sys.stdout.flush()
        tweets.append(row[9])
18.
f = codecs.open(tempbestand, 'wb', encoding="utf-8")
i=0
print "Making tempfile to count word frequencies"
allestems=[]
for tweet in tweets:
    for stems in tweet.split():
        allestems.append(stems)
for k in range(0,len(allestems)):
    f.write(allestems[k]+"\n")
# close the tempfile so that everything is written before we read it
f.close()

print "Counting..."
c=Counter()
with codecs.open(tempbestand,"rb", encoding="utf-8") as r:
    for l in r:
        c[l.rstrip('\n')] += 1
os.remove(tempbestand)

f = codecs.open(gdfbestand, 'wb', encoding="utf-8")
for tweet in tweets:
    words=tweet.split()
    for a,b in combinations(words,2):
        if a!=b:
            cooc[(a,b)]+=1
19.
f.write("nodedef>name VARCHAR, width DOUBLE\n")
algenoemd=[]
verwijderen=[]
for k in cooc:
    if cooc[k]<minedgeweight:
        verwijderen.append(k)
    else:
        if k[0] not in algenoemd:
            f.write(k[0]+","+str(c[k[0]])+"\n")
            algenoemd.append(k[0])
        if k[1] not in algenoemd:
            f.write(k[1]+","+str(c[k[1]])+"\n")
            algenoemd.append(k[1])
for k in verwijderen:
    del cooc[k]
f.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
for k, v in cooc.iteritems():
    regel= ",".join(k)+","+str(v)
    f.write(regel+"\n")
f.close()
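For readers on Python 3, the core of the script above (count word pairs per tweet, then keep only the pairs above a threshold) can be sketched as follows. This is a simplified variation, not a port: the toy tweets and the threshold of 2 are made up, and each pair is sorted so that (a, b) and (b, a) count as the same edge, which the script above does not do.

```python
from collections import Counter
from itertools import combinations

# Toy data standing in for the stemmed tweets; purely illustrative.
tweets = [
    "merkel steinbrueck tvduell",
    "merkel tvduell",
    "steinbrueck tvduell merkel",
]
min_edge_weight = 2  # hypothetical threshold (the script above uses 20)

word_counts = Counter()
cooc = Counter()
for tweet in tweets:
    words = tweet.split()
    word_counts.update(words)
    # set() drops repeated words; sorted() makes each pair order-independent
    for pair in combinations(sorted(set(words)), 2):
        cooc[pair] += 1

# Keep only edges that occur often enough
edges = {pair: n for pair, n in cooc.items() if n >= min_edge_weight}

# Build the GDF text: a node section followed by an edge section
nodes = sorted({word for pair in edges for word in pair})
lines = ["nodedef>name VARCHAR, width DOUBLE"]
lines += ["{},{}".format(w, word_counts[w]) for w in nodes]
lines.append("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE")
lines += ["{},{},{}".format(a, b, n) for (a, b), n in sorted(edges.items())]
print("\n".join(lines))
```

Using a Counter for the word frequencies also avoids the temporary file; whether that is preferable depends on how much data has to fit in memory.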
21. Why should I learn Python?
Summing up what you can use it for
24. One tool to rule them all?
Of course there are ready-made tools for some of the questions we
want to answer. But for many, there aren’t. Python lets us build
exactly the tool we need.
And it’s fun!
25. 1st group of tasks
Highly repetitive tasks
Simple tasks (counting things, comparing texts, ...) that can be
described in a formalized way. Automating them saves time even with
few cases, and there is virtually no upper limit on the number of cases.
Example: Retweets start with RT, optionally followed by a space,
and some letters. So it is very easy to identify them automatically
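A rule like this translates almost directly into a regular expression. The sketch below is one possible reading of the rule ("RT", an optional space, then an @ and a user name); the exact pattern, the function name is_retweet, and the example tweets are assumptions for illustration.

```python
import re

# "RT" at the start, an optional space, then @ and a user name.
# The exact pattern is an assumption; adapt it to your own data.
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """Return True if the tweet looks like a retweet."""
    return RETWEET.match(tweet) is not None


tweets = [
    "RT @greenami1: De winnaars en verliezers van de #klimaattop",
    "De winnaars en verliezers van de #klimaattop",
    "RTL Nieuws over de #klimaattop",
]
retweets = [t for t in tweets if is_retweet(t)]
print(len(retweets), "of", len(tweets), "tweets are retweets")
```

Requiring the @ keeps tweets that merely begin with the letters "RT" (such as the third example) from being counted as retweets.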
28. 2nd group of tasks
Tasks for which specific Python modules exist
There are thousands of modules suitable for text analysis. You
basically only have to write code for data input and output.
Example: Sentiment analysis
30. 3rd group of tasks
APIs, RSS, web scraping ...
You can use Python if you want to collect and store information.
Example: Collecting bios of Twitter users, scraping the web (data
journalism!), downloading Facebook data
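As a small illustration of the collect-and-store idea, the sketch below reads headlines from an RSS feed using only the Python 3 standard library. To stay self-contained it parses a made-up feed from a string; the feed content, the URLs, and the function name parse_feed are assumptions, and in practice the XML would be downloaded first (for example with urllib.request.urlopen).

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; a real one would be downloaded from a news site.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item>
      <title>Climate summit opens in Warsaw</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Delegates disagree on targets</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


items = parse_feed(FEED)

# Store the result as CSV (written to a string here instead of a file)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "link"])
writer.writerows(items)
print(buffer.getvalue())
```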
34. Why we should use Python in the social sciences
It is a programming language
• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to
process
• You can run it on every platform
• And yet it is easy to learn!
It is widely used for content analysis
• Many online resources and toolkits
• Books about NLP and Web Scraping with Python
38. Think of the following task
RQ: What are the differences in terms of actors mentioned
between Israeli and Palestinian news coverage?
1. The data structure: You have a folder with articles
2. The desired output: You want a table with the file names and
   a column per actor, counting how often they are mentioned
3. A typical task for a short Python script!
39. You need something like this:
for every file in folder:
    read the file
    count actors
    add new row to table with filename and actor counts
save table
(such a notation is called pseudo-code)
40. And in Python, it’s not that different!
41.
import re, csv
from os import listdir
from os.path import isfile, join

# folder that contains the articles
mypath = r"C:\Users\Ricarda\Documents\Artikelen"
# "Israel", later followed by minister, politician... or authorit...
regex54 = re.compile(r'Israel.*(?:minister|politician.*|[Aa]uthorit)')
filename_list=[]
matchcount54=0
matchcount54_list=[]
onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
for f in onlyfiles:
    # count the matches in one article, line by line
    matchcount54=0
    artikel=open(join(mypath,f),"r")
    for line in artikel:
        matches54 = regex54.findall(line)
        for word in matches54:
            matchcount54=matchcount54+1
    filename_list.append(f)
    matchcount54_list.append(matchcount54)
    artikel.close()
# save a table with one row per file: file name, number of matches
output=zip(filename_list,matchcount54_list)
writer = csv.writer(open("overzichtstabel.csv", 'wb'))
writer.writerows(output)
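The script above counts a single actor. A Python 3 sketch of the full table from the task description (one row per file, one column per actor) could look like this. The actor patterns, the output file name, and the function names are assumptions made for illustration, and the demo runs on a temporary folder with two made-up articles instead of real data.

```python
import csv
import re
import tempfile
from pathlib import Path

# Hypothetical actor patterns; a real codebook would be more elaborate.
ACTORS = {
    "israeli_government": re.compile(r"Israeli (?:minister|government|authorit)"),
    "palestinian_authority": re.compile(r"Palestinian [Aa]uthorit"),
}


def count_actors(text):
    """Return how often each actor pattern occurs in the text."""
    return {name: len(rx.findall(text)) for name, rx in ACTORS.items()}


def build_table(folder):
    """Return one row per file: the file name followed by the actor counts."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            counts = count_actors(path.read_text(encoding="utf-8"))
            rows.append([path.name] + [counts[name] for name in ACTORS])
    return rows


# Demo on a temporary folder with two made-up articles
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.txt").write_text(
        "The Israeli minister met the Palestinian Authority.", encoding="utf-8")
    Path(folder, "b.txt").write_text(
        "Israeli authorities and the Israeli government responded.", encoding="utf-8")
    table = build_table(folder)
    out_path = Path(folder, "overview.csv")  # hypothetical output name
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename"] + list(ACTORS))
        writer.writerows(table)
    print(out_path.read_text(encoding="utf-8"))
```

Keeping the patterns in a dictionary means that adding an actor adds a column without touching the rest of the code.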
42. Explaining a basic Python script:
Pseudo-code
43. We collected tweets about the UNFCCC conference with
yourTwapperkeeper.
Our task: Identify all tweets that include a reference to Poland
Let’s start with some pseudo-code!
open csv-table
for each line:
    append column 1 to a list of tweets
    append column 3 to a list of corresponding users
    look for searchstring in column 1
    append search result to a list of results
save lists to a new csv-file
44. Explaining a basic Python script:
Python code
45.
#!/usr/bin/python
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
user_list=[]
tweet_list=[]
search_list=[]
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
print "Opening "+inputfilename
reader=CsvUnicodeReader(open(inputfilename,"r"))
for row in reader:
    tweet_list.append(row[0])
    user_list.append(row[2])
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
46.
#!/usr/bin/python
# We start with importing some modules:
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

# Let us define two variables that contain
# the names of the files we want to use
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
47.
# We create some empty lists that we will use later on.
# A list can hold several values
# and is denoted by square brackets.
user_list=[]
tweet_list=[]
search_list=[]
48.
# What do we want to look for?
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# Enough preparation, let the program begin!
# We tell the user what is going on...
print "Opening "+inputfilename

# ... and call the module that reads the input file.
reader=CsvUnicodeReader(open(inputfilename,"r"))
49.
# Now we read the file line by line.
# The indented block is repeated for each row
# (thus, each tweet)
for row in reader:
    # append data from the current row to our lists.
    # Note that we start counting with 0.
    tweet_list.append(row[0])
    user_list.append(row[2])

    # Let us count how often our searchstring is used
    # in this tweet
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
50.
# Time to put all the data in one container
# and save it:

print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
52. Explaining a basic Python script:
The output (myoutput.csv)
53.
tweet,user,how often is Poland mentioned?
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
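The script shown earlier imports a helper module, unicsv, which (judging by its name) handles Unicode CSV files under Python 2. In Python 3 the built-in csv module works with Unicode text directly, so a rough equivalent can be sketched as follows. This is not the author's script: the function name count_poland_mentions is made up, and the demo reads from and writes to in-memory strings with invented rows rather than mytweets.csv and myoutput.csv. The column positions (tweet in column 0, user in column 2) follow the script.

```python
import csv
import io
import re

SEARCH = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")


def count_poland_mentions(infile, outfile):
    """Read tweets from infile and write tweet, user and match count to outfile."""
    writer = csv.writer(outfile)
    writer.writerow(["tweet", "user", "how often is Poland mentioned?"])
    for row in csv.reader(infile):
        tweet, user = row[0], row[2]
        writer.writerow([tweet, user, len(SEARCH.findall(tweet))])


# Made-up input with the tweet in column 0 and the user in column 2
source = io.StringIO(
    "Klimaattop in #Warschau begint,2013-11-11,user_a\n"
    "Niets over het klimaat,2013-11-12,user_b\n"
)
result = io.StringIO()
count_poland_mentions(source, result)
print(result.getvalue())
```

With real files, both would be opened with newline="" and an explicit encoding, for example open("mytweets.csv", newline="", encoding="utf-8").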
55. Try it yourself!
56. Will you join in?
57. Questions or comments?
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net