Social network analysis for journalists using the Twitter API
Introduction
Social network analysis allows us to identify the players in a social network and how they are related to each other. For example, I might want to identify people who are involved in a certain topic, either to interview them or to understand which groups are engaging in the debate.
What you’ll need:
● Gephi (http://gephi.org)
● OpenRefine (http://openrefine.org)
● The sample spreadsheet
(https://docs.google.com/a/okfn.org/spreadsheet/ccc?key=0Aq9agjil66PydDlORHRQQlFEckRtYkNVbS15bjd2Vmc#gid=0)
● A sample dataset
(http://datahub.io/dataset/ddj-2013-04-5-2013-04-18/resource/3163ceb8-63f4-4901-9387-dab3f2b86157)
● Bonus: the Twitter-search-to-graph tool from:
https://github.com/mihi-tr/twsearch/raw/master/dist/twitter-search/twsearch.jar
Step 1: Basic Social Networks
Throughout this exercise we will use Gephi for graph analysis and visualization. Let’s start by getting a small graph into Gephi.
Take a look at the sample spreadsheet: this is data from a fictional case you are investigating.
In your country the minister of health (Mark Illinger) recently bought 500,000 respiration masks from a company (Clearsky-Health) during a flu scare that turned out to be unfounded. The masks were never used and are rotting away in the basement of the ministry. During your investigation you found that, around the time of this deal, Clearsky-Health was advised by Flowingwater Consulting and paid it a large sum for its services. Flowingwater Consulting is owned by Adele Meral-Poisson, a well-known lobbyist and the wife of Mark Illinger.
We don’t need network analysis to understand this fictional case, but it helps in understanding the sample spreadsheet. Gephi can import spreadsheets like this through its CSV import. Let’s do this.
Walkthrough: Importing CSV into Gephi
1. Save the sample spreadsheet as CSV (or click Download as → Comma-separated values if using Google Spreadsheets).
2. Start Gephi.
3. Select File → Open.
4. Select the CSV file saved from the sample spreadsheet.
5. You will get an import report: check whether the numbers of nodes and edges seem correct and that no errors are reported.
6. The default values are OK for many graphs of this type. If the links between the objects in your spreadsheet are not one-way but mutual (e.g. lists of friendships, relationships etc.), select Undirected instead of Directed.
7. For now we’ll go with Directed, so click “OK” to import the graph.
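The sample spreadsheet itself is not reproduced here, so the following is only a minimal sketch, assuming it holds one relationship per row with the source in the first column and the target in the second. The edge list is our own reading of the fictional story, and `write_edge_list` and `count_edges` are hypothetical helpers. No header row is written because, as the Twitter example later in this tutorial shows, header cells such as “from” and “to” can end up as nodes. The edge count illustrates the directed/undirected choice: a mutual relationship stored in both directions is two edges when directed but one when undirected.

```python
import csv
import io

# Our reading of the fictional case as (source, target) pairs, one per row.
EDGES = [
    ("Mark Illinger", "Clearsky-Health"),                # the ministry bought the masks
    ("Clearsky-Health", "Flowingwater Consulting"),      # paid for consulting services
    ("Adele Meral-Poisson", "Flowingwater Consulting"),  # owns the consulting company
    ("Adele Meral-Poisson", "Mark Illinger"),            # married ...
    ("Mark Illinger", "Adele Meral-Poisson"),            # ... the same tie, other direction
]

def write_edge_list(edges, out):
    """Write (source, target) pairs as two-column CSV rows, without a header row."""
    writer = csv.writer(out)
    writer.writerows(edges)

def count_edges(edges, directed=True):
    """Count distinct edges; an undirected graph treats (a, b) and (b, a) as one edge."""
    if directed:
        return len(set(edges))
    return len({frozenset(pair) for pair in edges})

buffer = io.StringIO()
write_edge_list(EDGES, buffer)
print(buffer.getvalue())
print("directed:", count_edges(EDGES), "undirected:", count_edges(EDGES, directed=False))
```

To save a file for Gephi instead of printing, pass a file opened with `open("case.csv", "w", newline="")` to `write_edge_list`.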
Now that we have imported our simple graph and already see something on the screen, let’s make it a little nicer by playing around with Gephi a bit.
Walkthrough: Basic layout in Gephi
See the grey nodes there? Let’s make this graph a little easier to read.
1. Click on the big fat “T” at the bottom of the graph window to show labels.
2. Let’s zoom a bit: click on the button at the lower right of the graph window to open the larger menu.
3. You should now see a zoom slider; slide it around to make your graph a little bigger.
4. You can click on individual nodes and drag them around to arrange them more neatly.
Step 2: Getting data out of Twitter
Now that we have this, let’s get some data out of Twitter. We’ll use a Twitter search for a particular hashtag to find out who talks about it, with whom, and what they talk about.
Twitter documents its search API here:
https://dev.twitter.com/docs/api/1/get/search
It all boils down to using https://search.twitter.com/search.json?q=%23tag (%23 is the URL-encoded # character, so %23ijf corresponds to #ijf). If you open the link in a browser you will get the data in JSON format, which is ideal for computers to read but rather hard for humans. Luckily, Refine can help with this and turn the information into a table. (If you’ve never worked with Refine before, consider having a quick look at the “cleaning data with Refine” recipe at the School of Data: http://schoolofdata.org/handbook/recipes/cleaning-data-with-refine/)
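The %23 encoding is ordinary URL percent-encoding, so the search URL can also be built programmatically. Below is a minimal sketch using Python’s standard `urllib.parse.quote`; `search_url` is a hypothetical helper. Note that the v1 search.twitter.com endpoint this tutorial relies on has since been retired by Twitter, so the sketch only demonstrates how the URL is constructed.

```python
from urllib.parse import quote

# The v1 endpoint used in this tutorial (no longer served by Twitter).
SEARCH_ENDPOINT = "https://search.twitter.com/search.json"

def search_url(term: str) -> str:
    """Percent-encode a search term such as '#ijf' and append it as the q parameter."""
    # safe="" makes sure reserved characters like '#' and '/' are encoded as well.
    return SEARCH_ENDPOINT + "?q=" + quote(term, safe="")

print(search_url("#ijf"))       # https://search.twitter.com/search.json?q=%23ijf
print(search_url("#ddj data"))  # the space is encoded as %20
```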
Walkthrough: Getting JSON data from web APIs into Refine
1. Open Refine.
2. Click Create Project.
3. Select “Web Addresses”.
4. Enter the following URL: https://search.twitter.com/search.json?q=%23ijf (this searches for the #ijf hashtag on Twitter).
5. Click “Next”.
6. You will get a preview window showing you nicely formatted JSON.
7. Hover over the curly bracket inside “results” and click it: this selects the results as the data to import into a table.
8. Now name your project and click “Create project” to get the final table.
By now we have all the tweets in a table. As you can see, there is a ton of information for each tweet. We’re interested in who communicates with whom and about what, so the columns we care about are “text” and “from_user”; let’s delete all the others (use “All → Edit columns → Re-order / remove columns”).
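What Refine does when you click the “results” bracket and then drop the other columns can be sketched in a few lines of Python. This assumes, as described above, that the response is a JSON object whose “results” key holds a list of tweet objects with “from_user” and “text” fields; the sample response is made up for illustration, and `tweets_to_rows` is a hypothetical helper.

```python
import json

# A made-up response in the shape described above: {"results": [tweet, ...]}.
SAMPLE_RESPONSE = """
{"results": [
  {"from_user": "peppemanzo", "text": "Off to #ijf with @alice", "id": 1},
  {"from_user": "alice", "text": "@peppemanzo see you at #ijf #ddj", "id": 2}
]}
"""

def tweets_to_rows(raw_json):
    """Return one (from_user, text) row per tweet, discarding all other fields."""
    data = json.loads(raw_json)
    return [(tweet["from_user"], tweet["text"]) for tweet in data["results"]]

for row in tweets_to_rows(SAMPLE_RESPONSE):
    print(row)
```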
The from_user column lacks the @ that precedes usernames inside tweets. Since we want to extract usernames from the tweet text later, let’s add a new column that writes the sender the way tweets do, as @username. This will involve a tiny bit of programming, but don’t be afraid: it’s not rocket science.
Walkthrough: Adding a new column in Refine
1. On your from_user column select “Edit column → Add column based on this column...”.
2. Whoa: Refine wants us to write a little code to tell it what the new column looks like.
3. Let’s program, then. Later on we’ll do something that the built-in expression language (GREL) doesn’t let us do; luckily Refine offers two alternatives, Jython (basically Python) and Clojure. We’ll go for Clojure, as we’ll need it later.
4. Select Clojure as your language.
5. We want to prepend “@” to each name (here value refers to the value in each row).
6. Enter (str "@" value) into the expression field.
7. See how the value has changed from peppemanzo to @peppemanzo. What happened? In Clojure, str concatenates its arguments into a single string, so (str "@" value) combines the string "@" with the string in value, which is what we wanted.
8. Now simply name your column (e.g. “from”) and click OK: you will have a new column.
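For comparison, the same column operation outside Refine is a plain string concatenation applied to every row. A minimal Python sketch, where `add_at_column` is a hypothetical name that mirrors the Clojure expression (str "@" value):

```python
def add_at_column(from_users):
    """Prepend '@' to every bare username, like (str "@" value) applied row by row."""
    return ["@" + name for name in from_users]

print(add_at_column(["peppemanzo", "alice"]))  # ['@peppemanzo', '@alice']
```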
OK, we have the first part of our graph, the sender. Now let’s see what the users talk about. This will get a lot more complicated, but don’t worry, we’ll walk you through it.
Walkthrough: Extracting Users and Hashtags from Tweets
1. Let’s start by adding a new column based on the text column.
2. The first thing we want to do is split the tweet into words. We can do so by entering (.split value " ") into the expression field (make sure your language is still Clojure).
3. Our tweet now looks very different: it has been turned into an “array” of words (an array is simply a collection; you can recognize it by the square brackets).
4. We don’t actually want all the words, do we? We only want those starting with @ or #, i.e. users and hashtags (so we can see who’s talking with whom about what), so we need to filter our array.
5. Filtering in Clojure works with the filter function. It takes a filter function and an array; the filter function simply decides whether each value should be kept or not. In our case the filter function looks like #(contains? #{\# \@} (first %)). Looks like comic-book characters swearing? Don’t worry: contains? checks whether something is in something else, here whether the first character of the word, (first %), is either # or @, i.e. a member of the set #{\# \@} (\# and \@ are how Clojure writes the characters # and @). Exactly what we want. Let’s extend our expression to (filter #(contains? #{\# \@} (first %)) (.split value " ")).
6. Whoohaa, that seems to have worked! Now the only thing left is to create a single value out of it. Remember, we can do so by using str as above.
7. If we do this straight away we run into a problem: before, we used str as (str "1st" "2nd"), but now we would be calling (str ["1st" "2nd"]) because we have an array. Clojure helps us here with the apply function: (apply str ["1st" "2nd"]) turns into (str "1st" "2nd"). Let’s do so: (apply str (filter #(contains? #{\# \@} (first %)) (.split value " "))).
8. That seems to have worked. Do you spot the problem, though?
9. Exactly: the words are joined without a clear separator. The easiest way to add one is to interpose a character (e.g. a comma) between all the elements of the array; Clojure does this with the interpose function: (interpose "," [1 2 3]) returns (1 "," 2 "," 3). Let’s extend our formula.
10. So our final expression is:
(apply str (interpose "," (filter #(contains? #{\# \@} (first %)) (.split value " "))))
It looks complicated, but remember, we built it up from the ground.
11. Great: we can now extract who talks to whom! Name your column (the cleanup steps below call it “to”) and click “OK” to continue.
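To check what the final Clojure expression produces, here is a Python sketch that mirrors it step by step; `extract_mentions` is a hypothetical name, not part of Refine. One small difference: Java’s String.split drops trailing empty strings while Python’s str.split(" ") keeps them, but empty strings never start with # or @, so both versions filter them out and give the same result.

```python
def extract_mentions(text: str) -> str:
    """Keep the words of a tweet that start with '#' or '@', joined by commas."""
    # Clojure original:
    # (apply str (interpose "," (filter #(contains? #{\# \@} (first %)) (.split value " "))))
    words = text.split(" ")                                 # (.split value " ")
    kept = [word for word in words if word[:1] in ("#", "@")]  # the filter step
    return ",".join(kept)                                   # interpose "," + apply str

print(extract_mentions("RT @alice: great talk at #ijf by @bob"))  # @alice:,#ijf,@bob
```

The trailing “:” in “@alice:” is exactly the kind of leftover character the cleanup steps remove.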
Now we have extracted who talks with whom, but the format is still different from what Gephi needs. So let’s clean up the data into the right format for Gephi.
Walkthrough: Cleaning up
1. First, let’s remove the two columns we don’t need anymore, the text column and the original from_user column: use “All → Edit columns → Re-order / remove columns”.
2. Make sure your “from” column is the first column.
3. Now let’s split up the “to” column so we have one entry per row: use “to → Edit cells → Split multi-valued cells” and enter “,” as the separator.
4. Make sure to switch back to “rows” mode.
5. Now let’s fill the empty cells: select “from → Edit cells → Fill down”.
6. Notice that some characters in there don’t belong to names (e.g. “:”). Let’s remove them.
7. Select “to → Edit cells → Transform...”.
8. To remove the colons, our transformation is (.replace value ":" "").
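The cleanup steps can be mirrored in Python as a sketch. `to_edges` is a hypothetical helper that takes (sender, comma-separated targets) rows, as produced by the expression above, and returns one (sender, target) pair per mention. Splitting the targets and repeating the sender for each pair plays the role of “Split multi-valued cells” plus “Fill down”. One deliberate difference: tweets without any mentions are skipped here, whereas in Refine they remain as rows with an empty “to” cell.

```python
def to_edges(rows):
    """Turn (sender, "to1,to2,...") rows into one (sender, target) pair per mention."""
    edges = []
    for sender, targets in rows:
        for target in targets.split(","):          # split multi-valued cells on ","
            target = target.replace(":", "")       # like (.replace value ":" "")
            if target:                             # skip empty cells (no mentions)
                edges.append((sender, target))     # sender repeated, like fill down
    return edges

rows = [("@alice", "@bob:,#ijf"), ("@carol", "")]
print(to_edges(rows))  # [('@alice', '@bob'), ('@alice', '#ijf')]
```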
You’ve now cleaned your data and prepared it for Gephi, so let’s make some graphs! Export the file as CSV and open it in Gephi as above.
A small network from a Twitter Search
Let’s play with the network we got through Google Refine:
1. Open the CSV file from Google Refine in Gephi.
2. Look around the graph: you’ll soon see that several nodes don’t really make sense, “from” and “to” for example. Let’s remove them.
3. Switch Gephi to the “Data Laboratory” view.
4. This view shows you the nodes and edges found.
5. You can delete nodes by right-clicking on them (you can also add new nodes).
6. Delete “from”, “to” and “#ijf”: since #ijf was our search term, it is mentioned in every tweet.
7. Activate the labels. It’s pretty messy right now, so let’s add some layout: simply select the algorithm under Layout and click “play”, then watch how the graph changes.
8. Generally, combining “Force Atlas” with “Fruchterman Reingold” gives nice results. Add “Label Adjust” to make sure labels do not overlap.
9. Now let’s make some more adjustments: let’s scale the labels by how often things are mentioned. Select label size in the Ranking menu.
10. Select “Degree” as the rank parameter.
11. Click “Apply”. You might need to run the “Label Adjust” layout again to avoid overlapping labels.
12. With this simple trick, we can see which topics and people are mentioned most frequently.
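The node removal and degree ranking above can be sketched on a plain edge list. This assumes Gephi’s plain “Degree” counts edges in both directions; `degree_ranking` is a hypothetical helper, and the edges are made up, including the (“from”, “to”) edge that a header row turns into.

```python
from collections import Counter

def degree_ranking(edges, drop=()):
    """Rank nodes by degree (edges in plus edges out), ignoring dropped nodes.

    Edges touching a dropped node are removed with it, as when you delete a
    node in the Data Laboratory.
    """
    dropped = set(drop)
    degree = Counter()
    for source, target in edges:
        if source in dropped or target in dropped:
            continue
        degree[source] += 1
        degree[target] += 1
    return degree.most_common()

edges = [("@alice", "#ijf"), ("@alice", "@bob"), ("@carol", "@bob"),
         ("@bob", "#ddj"), ("from", "to")]
print(degree_ranking(edges, drop=("from", "to", "#ijf")))
```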
Great, but there is one downside: the data we can get via Google Refine is very limited. So let’s explore another route.
A larger network from a Twitter search
Now that we have analyzed a small network from a search, let’s deal with a bigger one. This one comes from a week of searching for the Twitter hashtag #ddj (you can download it here: http://datahub.io/dataset/ddj-2013-04-5-2013-04-18/resource/3163ceb8-63f4-4901-9387-dab3f2b86157).
The file is in GEXF format, a format for exchanging graph data.
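To see roughly what such a file contains, here is a minimal sketch that writes an edge list as a GEXF 1.2 document with Python’s standard `xml.etree.ElementTree`. The element and attribute names follow the GEXF 1.2 format (gexf, graph, nodes/node, edges/edge); the datahub sample file may carry additional attributes, and `edges_to_gexf` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def edges_to_gexf(edges):
    """Build a minimal directed GEXF 1.2 document from (source, target) label pairs."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    ids = {}  # label -> node id, assigned in order of first appearance
    for pair in edges:
        for label in pair:
            if label not in ids:
                ids[label] = str(len(ids))
                ET.SubElement(nodes_el, "node", {"id": ids[label], "label": label})

    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge",
                      {"id": str(i), "source": ids[source], "target": ids[target]})

    return ET.tostring(root, encoding="unicode")

print(edges_to_gexf([("@alice", "#ddj"), ("@alice", "@bob")]))
```

Writing the returned string to a file ending in .gexf should give something Gephi can open with File → Open.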
Walkthrough: Network analysis using Gephi
1. Open the sample graph file in Gephi.
2. Go to the Data Laboratory view and remove the #ddj node.
3. Enable node labels.
4. Scale labels by Degree (the number of edges attached to a node).
5. Apply “Force Atlas”, “Fruchterman Reingold” and “Label Adjust” (remember to stop the first two after a while).
6. Now you should have a clear view of the network.
7. Now let’s perform some analysis. One thing we are interested in is who is central and who is not; in other words, who is talking and who is being talked to.
8. For this we will run statistics (found in the Statistics tab on the right). We will use the “Network Diameter” statistic first: it computes eccentricity, betweenness centrality and closeness centrality. Betweenness centrality measures how often a node lies on the shortest paths between other nodes, i.e. which nodes connect other nodes. In our terms, nodes with high betweenness centrality are communication leaders, while nodes with low betweenness centrality are topics.
9. Now that we have run the statistic, we can color the labels accordingly: select the label color ranking and “Betweenness Centrality”.
10. Pick colors as you like them. I prefer light colors on a dark background.
11. Now let’s do something different: let’s try to detect the different groups of people involved in the discussion. This is done with the “Modularity” statistic.
12. Color your labels using “Modularity Class”: now you see the different clusters of people involved in the discussion.
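Gephi computes these values for you, but the idea behind betweenness centrality is compact enough to sketch. Below is a plain-Python version of Brandes’ algorithm for an unweighted, directed graph given as a dict of successor lists. For each node it adds up how many shortest paths between other pairs of nodes pass through it, splitting the credit when there are several shortest paths. The graph representation and function name are our own assumptions, and Gephi may report normalized or otherwise scaled values, so absolute numbers need not match.

```python
from collections import deque

def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, directed graph.

    graph maps each node to a list of the nodes it points to (e.g. a tweet
    author to the users and hashtags mentioned). Scores are raw, not normalized.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    score = dict.fromkeys(nodes, 0.0)

    for source in nodes:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

conversation = {"@alice": ["@bob"], "@bob": ["#ddj", "@carol"]}
for node, value in sorted(betweenness(conversation).items(), key=lambda kv: -kv[1]):
    print(node, value)
```

Here @bob sits on the only paths from @alice to #ddj and @carol, so he gets the highest score, the role the walkthrough calls a communication leader.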
Now we have analyzed a bigger network and found the important players and the different groups active in the discussion, all by searching Twitter and storing the results.
Bonus: Scraping the Twitter search with a small Java utility
If you have downloaded the .jar file mentioned above: it’s a scraper that extracts persons and hashtags from Twitter. Think of what we did previously, but automated. To run it use:
java -jar twsearch.jar "#ijf" 0 ijf.gexf
This will search for #ijf on Twitter every 20 seconds and write the results to the file ijf.gexf (GEXF is a graph format understood by Gephi). To end data collection, press Ctrl-C. Simple, isn’t it? In fact, the utility only needs Java to run, but it is written entirely in Clojure (the language we used to work with the tweets above).