The data

The script

Your turn

Questions?

Hands-on Workshop
Big (Twitter) Data
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Department of Communication Science
Universiteit van Amsterdam

30 January 2014
10.45
#bigdata

Damian Trilling

In this session (2/4):
1 The data

Recording tweets with yourTwapperkeeper
CSV-files
Other ways to collect tweets
Not that different: Facebook posts
2 The script

Pseudo-code
Python code
The output
3 Your turn
4 Questions?


The data:
Recording tweets with yourTwapperkeeper
http://datacollection.followthenews-uva.cloudlet.sara.nl


yourTwapperkeeper

Storage
Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database.
You tell it once which data to collect, and then wait a few months.


Retrieving the data
You could access the MySQL database directly, but yourTwapperkeeper has a convenient interface for exporting the data to a format we can use for the analysis (a CSV file).


The data:
CSV-files


CSV-files

The format of our choice
• Virtually all programs can read it
• Human-readable, even in a simple text editor: plain text, with a comma (or a semicolon) marking the column breaks
• The format itself imposes no size limit
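Since a CSV file is plain text, the standard-library csv module of Python 3 can parse it directly. The following sketch is not part of the workshop materials; it reads two tiny made-up tables with the same content, one comma-separated and one semicolon-separated, and shows that only the delimiter argument differs.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse CSV text into a list of rows; each row is a list of strings."""
    # io.StringIO lets the csv module treat a string as if it were an open file.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Made-up example data: the same table, once with commas, once with semicolons.
comma_rows = read_rows("text,from_user\nhello world,alice\n")
semicolon_rows = read_rows("text;from_user\nhello world;alice\n", delimiter=";")

print(comma_rows)                    # [['text', 'from_user'], ['hello world', 'alice']]
print(comma_rows == semicolon_rows)  # True
```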


text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,,henklbr,407085917011079169,118374840,nl,web,http://pbs.twimg.com/profile_images/378800000673845195/b47785b1595e6a1c63b93e463f3d0ccc_normal.jpeg,,0,0,Sun Dec 01 09:57:00 +0000 2013,1385891820
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,,Europarl_NL,406058792573730816,37623918,en,<a href="http://www.hootsuite.com" rel="nofollow">HootSuite</a>,http://pbs.twimg.com/profile_images/2943831271/b6631b23a86502fae808ca3efde23d0d_normal.png,,0,0,Thu Nov 28 13:55:35 +0000 2013,1385646935
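To see how the header line maps onto the fields, the Python 3 sketch below parses the two example rows above with csv.DictReader, which uses the first line as column names. It also checks something that these two rows suggest but that is inferred here rather than taken from yourTwapperkeeper's documentation: the time column appears to hold the Unix timestamp (seconds since 1 January 1970, UTC) of created_at. The helper name to_unix is hypothetical.

```python
import csv
import io
from datetime import datetime

# Header and the two example rows shown above, rejoined into one string.
SAMPLE = (
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,"
    "profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,"
    "created_at,time\n"
    ":-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop "
    "http://t.co/Udjpk48EIB,,henklbr,407085917011079169,118374840,nl,web,"
    "http://pbs.twimg.com/profile_images/378800000673845195/"
    "b47785b1595e6a1c63b93e463f3d0ccc_normal.jpeg,,0,0,"
    "Sun Dec 01 09:57:00 +0000 2013,1385891820\n"
    "Wat zijn de resulaten vd #klimaattop in #Warschau waard? "
    "@EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec "
    "http://t.co/4Lmiaopf60,,Europarl_NL,406058792573730816,37623918,en,"
    '<a href="http://www.hootsuite.com" rel="nofollow">HootSuite</a>,'
    "http://pbs.twimg.com/profile_images/2943831271/"
    "b6631b23a86502fae808ca3efde23d0d_normal.png,,0,0,"
    "Thu Nov 28 13:55:35 +0000 2013,1385646935\n"
)

# Twitter's classic timestamp format, e.g. "Sun Dec 01 09:57:00 +0000 2013".
TWITTER_DATE = "%a %b %d %H:%M:%S %z %Y"


def to_unix(created_at):
    """Hypothetical helper: convert created_at to seconds since 1970-01-01 UTC."""
    return int(datetime.strptime(created_at, TWITTER_DATE).timestamp())


# DictReader uses the header line as keys, so columns can be accessed by name.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["from_user"], row["iso_language_code"],
          to_unix(row["created_at"]) == int(row["time"]))
```

Accessing columns by name (row["from_user"]) rather than by position (row[2]) keeps a script working if an export tool adds or reorders columns.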


The data:
Other ways to collect tweets


Other ways to collect tweets
Again, we want a CSV file...
• If you want tweets per person: www.allmytweets.net
• Tweets from up to six days back: www.scraperwiki.com
• Buy the data from a commercial vendor
• TCAT (from the team at DMI/Media Studies)
• For specific purposes, write your own Python script to access the Twitter API (if you like, I can show you more about this tomorrow)


The data:
Not that different: Facebook posts


Not that different: Facebook posts
Have a look at netvizz
• Gephi files for network analysis
• ... and a tab-separated file (essentially the same as CSV) with the content
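A tab-separated file can be read the same way as a CSV file; with Python 3's csv module only the delimiter changes. The sketch below uses a made-up two-column extract: the column names post_id and post_message are placeholders, not netvizz's actual output format. It also shows one practical advantage of tabs: commas inside post texts do not split the columns.

```python
import csv
import io

# Made-up example: column names are placeholders, not the real netvizz layout.
tsv_text = "post_id\tpost_message\n1\tHallo, Warschau!\n2\tGeen nieuws\n"

# Same reader as for CSV; only the delimiter is a tab instead of a comma.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
header, posts = rows[0], rows[1:]

print(header)  # ['post_id', 'post_message']
print(posts)   # [['1', 'Hallo, Warschau!'], ['2', 'Geen nieuws']]
```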

An alternative: Facepager
• Tool to query different APIs (e.g. Twitter and Facebook) and to store the results in a CSV table
• http://www.ls1.ifkw.uni-muenchen.de/personen/wiss_ma/keyling_till/software.html


The script:
Pseudo-code


Our task: Identify all tweets that include a reference to Poland
Let’s start with some pseudo-code!
open csv-table
for each line:
    append column 1 to a list of tweets
    append column 3 to a list of corresponding users
    look for searchstring in column 1
    append search result to a list of results
save lists to a new csv-file
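The pseudo-code translates almost line by line into Python. The sketch below is a Python 3 version using only the standard library; it is not the workshop script (which uses Python 2 and the unicsv helper module). To run without files it works on in-memory text; the function name search_tweets, the column header matches and the sample tweets are made up for illustration.

```python
import csv
import io
import re


def search_tweets(csv_text, pattern):
    """Hypothetical helper following the pseudo-code: for every line keep
    column 1 (tweet), column 3 (user) and the number of pattern matches in
    column 1, then write the three lists to a new CSV table (as a string)."""
    tweets, users, results = [], [], []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line (assumes the input has one)
    for row in reader:
        tweets.append(row[0])  # column 1: the tweet text
        users.append(row[2])   # column 3: the user name
        results.append(len(re.findall(pattern, row[0])))  # search column 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tweet", "user", "matches"])
    writer.writerows(zip(tweets, users, results))
    return out.getvalue()


# Made-up input with the same first three columns as the export shown earlier.
sample = "text,to_user_id,from_user\nNaar Warschau!,,anna\nGeen nieuws,,bob\n"
print(search_tweets(sample, r"[Ww]arschau"))
```

With a real file, io.StringIO(csv_text) would be replaced by open("mytweets.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption about the export.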


The script:
Python code

#!/usr/bin/python
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
user_list=[]
tweet_list=[]
search_list=[]
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
print "Opening "+inputfilename
reader=CsvUnicodeReader(open(inputfilename,"r"))
for row in reader:
    tweet_list.append(row[0])
    user_list.append(row[2])
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)

#!/usr/bin/python
# We start by importing some modules:
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

# Let us define two variables that contain
# the names of the files we want to use
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"


# We create some empty lists that we will use later on.
# A list can contain several elements
# and is denoted by square brackets.
user_list=[]
tweet_list=[]
search_list=[]


# What do we want to look for?
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# Enough preparation, let the program begin!
# We tell the user what is going on...
print "Opening "+inputfilename

# ... and create a reader object (from the unicsv module) for the input file.
reader=CsvUnicodeReader(open(inputfilename,"r"))
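To see what searchstring1 actually catches, it helps to run it on a few test sentences (Polen and Pool are Dutch for Poland and a Pole; Warschau and Warszawa are the Dutch and Polish names of Warsaw). The Python 3 sketch below uses made-up sentences. It shows that findall() returns every non-overlapping match as a list, and it reveals a side effect: because the pattern has no word boundaries, [Pp]ool also matches inside unrelated words such as "carpool".

```python
import re

# The search pattern from the workshop script.
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# findall() returns a list with every non-overlapping match, in order.
hits = searchstring1.findall("Klimaattop in Warschau: wat wil Polen?")
print(hits)        # ['Warschau', 'Polen']

# Without word boundaries, the pattern also fires inside other words:
# 'pool' in 'carpool' is probably a false positive for our purpose.
false_hits = searchstring1.findall("We carpool to the meeting")
print(false_hits)  # ['pool']
```

Adding word boundaries (\b) would avoid such hits, but would also drop intended ones such as Poolse (Polish), so the pattern is a trade-off worth checking against your own data.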


# Now we read the file line by line.
# The indented block is repeated for each row
# (thus, each tweet)
for row in reader:
    # Append data from the current row to our lists.
    # Note that Python starts counting at 0,
    # so row[0] is column 1 and row[2] is column 3.
    tweet_list.append(row[0])
    user_list.append(row[2])
    # Let us count how often our searchstring occurs
    # in this tweet
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
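The inner for-loop simply counts the elements of matches1, so it computes the same number as Python's built-in len(). A quick check in Python 3 with a made-up tweet:

```python
import re

searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
tweet = "Van Warschau naar Warszawa: Polen op de klimaattop"  # made-up tweet

matches1 = searchstring1.findall(tweet)

# Counting with a loop, as in the workshop script ...
matchcount1 = 0
for word in matches1:
    matchcount1 = matchcount1 + 1

# ... gives the same result as the built-in len().
print(matchcount1, len(matches1))  # 3 3
```

Inside the workshop loop, the counting lines could therefore be shortened to search_list.append(len(matches1)); the explicit loop shows step by step what happens.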


# Time to put all the data in one container
# and save it:
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
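zip() pairs the i-th elements of the three lists into one row each, and the headers line uses the same trick to build a single row out of three one-element lists. The Python 3 sketch below (with made-up list contents) makes both visible and writes the result with the standard-library csv.writer in place of unicsv; in Python 3, zip() returns an iterator, hence the list() calls.

```python
import csv
import io

# Made-up contents for the three lists built by the loop.
tweet_list = ["Naar Warschau!", "Geen nieuws"]
user_list = ["anna", "bob"]
search_list = [1, 0]

outputdata = list(zip(tweet_list, user_list, search_list))
print(outputdata)  # [('Naar Warschau!', 'anna', 1), ('Geen nieuws', 'bob', 0)]

headers = list(zip(["tweet"], ["user"], ["how often is Poland mentioned?"]))
print(headers)     # [('tweet', 'user', 'how often is Poland mentioned?')]

# Write the header row and the data rows to an in-memory "file".
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(headers)
writer.writerows(outputdata)
print(out.getvalue())
```

zip() stops at the end of the shortest list, so the three lists must have the same length; the workshop loop guarantees this by appending to all three lists for every row.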


The script:
The output (myoutput.csv)


tweet,user,how often is Poland mentioned?
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
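One possible first step in analysing myoutput.csv is counting how many tweets mention Poland at least once. The Python 3 sketch below embeds the header and four example rows shown above so it runs on its own. With the real file, one would read from open("myoutput.csv", newline="", encoding="utf-8") instead; the UTF-8 encoding is an assumption about what unicsv writes.

```python
import csv
import io

# The header and the four example rows shown above.
output_text = (
    "tweet,user,how often is Poland mentioned?\n"
    ":-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop "
    "http://t.co/Udjpk48EIB,henklbr,0\n"
    "Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment "
    "ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1\n"
    "RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop "
    "in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1\n"
    "De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau "
    "(interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1\n"
)

# Read the third column by its header name and convert it to an integer.
reader = csv.DictReader(io.StringIO(output_text))
counts = [int(row["how often is Poland mentioned?"]) for row in reader]

# A tweet "mentions Poland" if at least one match was found.
n_mentioning = sum(1 for c in counts if c > 0)
print(n_mentioning, "of", len(counts), "tweets mention Poland")  # 3 of 4 tweets mention Poland
```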


Try it yourself!
We’ll help you get started. Please go to http://beehub.nl/bigdata-cw/workshop and download the workshop files. Save the Python files unicsv.py and myfirstscript.py, as well as the dataset mytweets.csv, in a new folder called workshop on your H-drive.
When you are done, start Python (GUI) from the Windows Start Menu.


Recap
1 The data

Recording tweets with yourTwapperkeeper
CSV-files
Other ways to collect tweets
Not that different: Facebook posts
2 The script

Pseudo-code
Python code
The output
3 Your turn
4 Questions?


This afternoon

Your own script


Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Analyzing social media with Python and other tools (2/4)

  • 1. The data The script Your turn Questions? Hands-on-Workshop Big (Twitter) Data Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 30 January 2014 10.45 #bigdata Damian Trilling
  • 2. The data The script Your turn Questions? In this sesion (2/4): 1 The data Recording tweets with yourTwapperkeeper CSV-files Other ways to collect tweets Not that different: Facebook posts 2 The script Pseudo-code Python code The output 3 Your turn 4 Questions? #bigdata Damian Trilling
  • 3. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper The data: Recording tweets with yourTwapperkeeper http://datacollection.followthenews-uva.cloudlet.sara.nl #bigdata Damian Trilling
  • 4. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper #bigdata Damian Trilling
  • 5. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper Storage Continuosly calls the Twitter-API and saves all tweets containing specific hashtags to a mySQL-database. You tell it once which data to collect – and wait some months. #bigdata Damian Trilling
  • 6. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper #bigdata Damian Trilling
  • 7. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper Retrieving the data You could access the MySQL-database directly. But yourTwapperkeeper has a nice interface that allows you to export the data to a format we can use for the analysis. #bigdata Damian Trilling
  • 8.
  • 9.
  • 10.
  • 11. The data The script Your turn Questions? CSV-files The data: CSV-files #bigdata Damian Trilling
  • 12. The data The script Your turn Questions? CSV-files CSV-files The format of our choice • All programs can read it • Even human-readable in a simple text editor: • Plain text, with a comma (or a semicolon) denoting column breaks • No limits regarging the size #bigdata Damian Trilling
  • 13. The data The script Your turn Questions? CSV-files 1 2 3 text,to_user_id,from_user,id,from_user_id, iso_language_code,source,profile_image_url,geo_type, geo_coordinates_0,geo_coordinates_1,created_at,time :-) #Lectrr #wereldleiders #uitspraken #Wikileaks # klimaattop http://t.co/Udjpk48EIB,,henklbr ,407085917011079169,118374840,nl,web,http://pbs.twimg. com/profile_images/378800000673845195/ b47785b1595e6a1c63b93e463f3d0ccc_normal.jpeg,,0,0,Sun Dec 01 09:57:00 +0000 2013,1385891820 Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,,Europarl_NL ,406058792573730816,37623918,en,<a href="http://www. hootsuite.com" rel="nofollow">HootSuite</a>,http://pbs .twimg.com/profile_images/2943831271/ b6631b23a86502fae808ca3efde23d0d_normal.png,,0,0,Thu Nov 28 13:55:35 +0000 2013,1385646935 #bigdata Damian Trilling
  • 14. The data: Other ways to collect tweets
  • 15. Other ways to collect tweets
    Again, we want a CSV file...
    • If you want tweets per person: www.allmytweets.net
    • Up to six days back: www.scraperwiki.com
    • Buy the data from a commercial vendor
    • TCAT (from the team at DMI/media studies)
    • For specific purposes, write your own Python script that accesses the Twitter API (if you want, I can show you more about this tomorrow)
  • 16. The data: Not that different: Facebook posts
  • 18. Not that different: Facebook posts
    Have a look at netvizz:
    • Gephi files for network analysis
    • ... and a tab-separated file (essentially the same as CSV) with the content
    An alternative: Facepager
    • A tool to query different APIs (among others, Twitter and Facebook) and store the results in a CSV table
    • http://www.ls1.ifkw.uni-muenchen.de/personen/wiss_ma/keyling_till/software.html
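A tab-separated file differs from CSV only in its delimiter, so the same reader works once you pass `delimiter="\t"`. A sketch with a made-up snippet; the column names `post_message` and `post_author` are hypothetical and not netvizz's actual export schema:

```python
import csv
import io

# Hypothetical two-column, tab-separated snippet (not real netvizz output).
tsv_text = "post_message\tpost_author\nKlimaattop in Warschau\tExample Page\n"

# Identical to reading CSV, except for the delimiter argument.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(rows)  # [['post_message', 'post_author'], ['Klimaattop in Warschau', 'Example Page']]
```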
  • 20. The script: Pseudo-code
  • 21. Pseudo-code
    Our task: identify all tweets that include a reference to Poland.
    Let's start with some pseudo-code!
        open csv-table
        for each line:
            append column 1 to a list of tweets
            append column 3 to a list of corresponding users
            look for searchstring in column 1
            append search result to a list of results
        save lists to a new csv-file
  • 22. The script: Python code
  • 23.
        #!/usr/bin/python
        from unicsv import CsvUnicodeReader
        from unicsv import CsvUnicodeWriter
        import re
        inputfilename="mytweets.csv"
        outputfilename="myoutput.csv"
        user_list=[]
        tweet_list=[]
        search_list=[]
        searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
        print "Opening "+inputfilename
        reader=CsvUnicodeReader(open(inputfilename,"r"))
        for row in reader:
            tweet_list.append(row[0])
            user_list.append(row[2])
            matches1 = searchstring1.findall(row[0])
            matchcount1=0
            for word in matches1:
                matchcount1=matchcount1+1
            search_list.append(matchcount1)
        print "Constructing data matrix"
        outputdata=zip(tweet_list,user_list,search_list)
        headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
        print "Write data matrix to ",outputfilename
        writer=CsvUnicodeWriter(open(outputfilename,"wb"))
        writer.writerows(headers)
        writer.writerows(outputdata)
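The script above is written for Python 2 and relies on the workshop's `unicsv.py` helper. If you are on Python 3, a roughly equivalent sketch needs only the standard `csv` and `re` modules, because Python 3 handles Unicode text natively. Assumptions: the input is UTF-8, tweet text sits in column 0 and the user name in column 2, and, unlike the original, the sketch skips the first (header) line of the export. The function names `build_rows` and `main` are our own, hypothetical choices:

```python
#!/usr/bin/env python3
# Python 3 sketch of the workshop script, standard library only.
# Assumptions: UTF-8 input with a header line, tweet text in column 0,
# user name in column 2 (as in the yourTwapperkeeper export).
import csv
import re

# Same search pattern as in the original script.
searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")


def build_rows(rows):
    """Turn input rows into output rows: [tweet, user, number of matches]."""
    output = [["tweet", "user", "how often is Poland mentioned?"]]
    for row in rows:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        tweet, user = row[0], row[2]
        # len() of the match list replaces the explicit counting loop
        output.append([tweet, user, len(searchstring1.findall(tweet))])
    return output


def main(inputfilename="mytweets.csv", outputfilename="myoutput.csv"):
    print("Opening " + inputfilename)
    with open(inputfilename, newline="", encoding="utf-8") as infile:
        reader = csv.reader(infile)
        next(reader, None)  # skip the header line (the original script does not)
        output = build_rows(reader)
    print("Write data matrix to " + outputfilename)
    with open(outputfilename, "w", newline="", encoding="utf-8") as outfile:
        csv.writer(outfile).writerows(output)
```

To process `mytweets.csv` in the current folder, add a line `main()` at the bottom of the file or type `main()` in the Python shell. Remove the `next(reader, None)` line if your file has no header.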
  • 24. Python code
        #!/usr/bin/python
        # We start by importing some modules:
        from unicsv import CsvUnicodeReader
        from unicsv import CsvUnicodeWriter
        import re
        # Let us define two variables that contain
        # the names of the files we want to use
        inputfilename="mytweets.csv"
        outputfilename="myoutput.csv"
  • 25. Python code
        # We create some empty lists that we will use later on.
        # A list can hold several values
        # and is denoted by square brackets.
        user_list=[]
        tweet_list=[]
        search_list=[]
  • 26. Python code
        # What do we want to look for?
        searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
        # Enough preparation, let the program begin!
        # We tell the user what is going on...
        print "Opening "+inputfilename
        # ... and call the module that reads the input file.
        reader=CsvUnicodeReader(open(inputfilename,"r"))
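To check what the pattern actually catches, you can try it on a few strings in the Python shell. A short sketch in Python 3 syntax (the test strings are made up) that also shows `re.IGNORECASE` as one alternative to writing out `[Pp]`-style character classes:

```python
import re

# The pattern from the script: [Pp] accepts "P" or "p", and | separates alternatives.
searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")

print(searchstring1.findall("#klimaattop in #Warschau"))     # ['Warschau']
print(searchstring1.findall("Polen en de Poolse regering"))  # ['Polen', 'Pool']
print(searchstring1.findall("WARSCHAU"))                     # [] (all caps is missed)

# With re.IGNORECASE, any mix of upper and lower case matches.
anycase = re.compile(r"polen|pool|warschau|warszawa", re.IGNORECASE)
print(anycase.findall("WARSCHAU"))                           # ['WARSCHAU']
```

Because the pattern matches substrings, `[Pp]ool` fires inside *Poolse* (as intended) but equally on the English word *pool*, so it is worth checking a sample of hits by hand.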
  • 27. Python code
        # Now we read the file line by line.
        # The indented block is repeated for each row
        # (thus, each tweet).
        for row in reader:
            # Append data from the current row to our lists.
            # Note that we start counting with 0.
            tweet_list.append(row[0])
            user_list.append(row[2])
            # Let us count how often our searchstring is used
            # in this tweet.
            matches1 = searchstring1.findall(row[0])
            matchcount1=0
            for word in matches1:
                matchcount1=matchcount1+1
            search_list.append(matchcount1)
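The inner counting loop simply counts the items in the list that `findall()` returns. A sketch with a made-up tweet (Python 3 syntax) showing that the built-in `len()` gives the same number:

```python
import re

searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")
tweet = "Warschau of Warszawa? #klimaattop in #Warschau"  # made-up example

# The counting loop used in the script...
matches1 = searchstring1.findall(tweet)
matchcount1 = 0
for word in matches1:
    matchcount1 = matchcount1 + 1

# ...gives the same result as asking for the length of the list directly.
print(matches1)                     # ['Warschau', 'Warszawa', 'Warschau']
print(matchcount1, len(matches1))   # 3 3
```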
  • 28. Python code
        # Time to put all the data in one container
        # and save it:
        print "Constructing data matrix"
        outputdata=zip(tweet_list,user_list,search_list)
        headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
        print "Write data matrix to ",outputfilename
        writer=CsvUnicodeWriter(open(outputfilename,"wb"))
        writer.writerows(headers)
        writer.writerows(outputdata)
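`zip()` is what turns the three separate lists into rows: it pairs up the first elements of each list, then the second ones, and so on. A small sketch with made-up list contents; note that in Python 3, unlike the Python 2 used in the script, `zip()` returns an iterator, hence the `list()` call:

```python
# Three parallel lists, as built up by the loop in the script (made-up contents).
tweet_list = ["tweet A", "tweet B"]
user_list = ["henklbr", "Europarl_NL"]
search_list = [0, 1]

# zip() combines the i-th elements of each list into one row (a tuple).
outputdata = list(zip(tweet_list, user_list, search_list))
print(outputdata)  # [('tweet A', 'henklbr', 0), ('tweet B', 'Europarl_NL', 1)]

# The header trick from the script: zipping three one-element lists
# yields exactly one row containing the three column names.
headers = list(zip(["tweet"], ["user"], ["how often is Poland mentioned?"]))
print(headers)     # [('tweet', 'user', 'how often is Poland mentioned?')]
```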
  • 30. The script: myoutput.csv
  • 31. The output
    tweet,user,how often is Poland mentioned?
    :-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
    Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
    RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
    De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
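Once `myoutput.csv` exists, a possible next step is to count how many tweets mention Poland at all. A sketch in Python 3, assuming the file looks like the excerpt above; the tweets are shortened here, and this is just one way of summarising the output:

```python
import csv
import io

# Shortened copy of the myoutput.csv excerpt; with the real file you would use
# open("myoutput.csv", newline="", encoding="utf-8") instead of io.StringIO.
output_text = (
    "tweet,user,how often is Poland mentioned?\n"
    ":-) #Lectrr #wereldleiders #klimaattop,henklbr,0\n"
    "Wat zijn de resulaten vd #klimaattop in #Warschau waard?,Europarl_NL,1\n"
    "RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau,LarsMoratis,1\n"
    "De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau,greenami1,1\n"
)

rows = list(csv.DictReader(io.StringIO(output_text)))
column = "how often is Poland mentioned?"

# Users whose tweet mentions Poland at least once (CSV values are strings, hence int()).
mentioning = [row["user"] for row in rows if int(row[column]) > 0]

print(len(rows), "tweets,", len(mentioning), "mention Poland")  # 4 tweets, 3 mention Poland
print(mentioning)  # ['Europarl_NL', 'LarsMoratis', 'greenami1']
```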
  • 32. The output
  • 33. Try it yourself!
    We'll help you get started. Please go to http://beehub.nl/bigdata-cw/workshop and download the following files:
    • the Python files unicsv.py and myfirstscript.py
    • the dataset mytweets.csv
    Save them in a new folder called workshop on your H-drive. When you are done, start Python (GUI) from the Windows Start Menu.
  • 34. Recap
    1. The data
       • Recording tweets with yourTwapperkeeper
       • CSV-files
       • Other ways to collect tweets
       • Not that different: Facebook posts
    2. The script
       • Pseudo-code
       • Python code
       • The output
    3. Your turn
    4. Questions?
  • 35. This afternoon: your own script
  • 36. Questions or comments?
    Damian Trilling
    d.c.trilling@uva.nl
    @damian0604
    www.damiantrilling.net