1. Social Media Data Collection & Network Analysis with Netlytic and R
Anatoliy Gruzd
gruzd@ryerson.ca
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
HKBU, Hong Kong
Dec 3, 2015
Twitter: @gruzd ANATOLIY GRUZD 1
4. Social media sites have become an integral part of our daily lives!
Growth of Social Media Data
• Facebook: 1.5B users
• Instagram: 400M users
• Twitter: 300M users
5. Decision Making in domains such as Politics, Health Care and Education
How to Make Sense of Social Media Data?
• Self-collected/reported
• Public APIs
• Data Resellers
6. How to Make Sense of Social Media Data?
Big Data Technology
Credit: Nathan Lapierre
7. Social Media Analytics Tools
http://socialmedialab.ca/apps/social-media-toolkit/
8. Data -> Visualizations -> Understanding
How to Make Sense of Social Media Data?
9. How to Make Sense of Social Media Data?
Example: Geo-based Analysis
10. How to Make Sense of Social Media Data?
Example: Geo-based Analysis
Geography of Twitter Networks
11. How to Make Sense of Social Media Data?
Example: Geo-based + Content Analysis
Tracking Hate Speech on Twitter
Source: http://www.fenuxe.com/tag/geo-coded
12. Social Network Analysis (SNA)
• Nodes = People
• Edges/Ties (lines) = Relations: “Who retweeted/replied to/mentioned whom”
How to Make Sense of Social Media Data?
13. Advantages of Social Network Analysis
Makes it much easier to understand what is going on in a group.
Once the network is discovered, we can find out:
• How people interact with each other,
• Who the most/least active members are,
• Who is influential in a group,
• Who is susceptible to being influenced, etc.
[Figure legend: Liberal, Conservative, Spam, Unknown & Undecided, NDP, Left, Green, Bloc, Other]
Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather Flock Together on Twitter? Policy & Internet.
14. How Do We Collect Information About Online Social Networks?
Common approach for collecting social network data:
• Surveys or interviews
Self-reported social network data may not be available or accurate.
Problems with surveys or interviews:
• Time-consuming
• Questions can be too sensitive
• Answers are subjective or incomplete
• Participants can forget people and interactions
• Different people perceive events and relationships differently
15. Studying Online Social Networks
http://www.visualcomplexity.com/vc
• Forum networks
• Blog networks
• Friends’ networks (Facebook, Twitter, Google+, etc.)
• Networks of like-minded people (YouTube, Flickr, etc.)
16. Goal: Automated Networks Discovery
Challenge: Figuring out which content-based features of online interactions can help to uncover nodes and ties between group members
How Do We Collect Information About Online Social Networks?
17. Automated Discovery of Social Networks
Emails (example nodes: Nick, Rick, Dick)
• Nodes = People
• Ties = “Who talks to whom”
• Tie strength = The number of messages exchanged between individuals
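The email definitions above (nodes, ties, tie strength) can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Netlytic: the `messages` list is hypothetical sample data, and `build_weighted_ties` is a name introduced here for the example.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per email.
messages = [
    ("Nick", "Rick"),
    ("Rick", "Nick"),
    ("Nick", "Dick"),
    ("Nick", "Rick"),
]

def build_weighted_ties(messages):
    """Count the messages exchanged between each pair of people.

    Direction is ignored, so Nick -> Rick and Rick -> Nick
    both strengthen the same tie.
    """
    ties = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # skip messages sent to oneself
            ties[frozenset((sender, recipient))] += 1
    return ties

ties = build_weighted_ties(messages)
nodes = sorted({person for pair in ties for person in pair})

for pair, strength in ties.items():
    print(" - ".join(sorted(pair)), "strength:", strength)
```

With this sample data the Nick/Rick tie has strength 3 and the Nick/Dick tie has strength 1.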
18. Automated Discovery of Social Networks
“Many to Many” Communication
Chat, Mailing listserv, Forum, Comments
19. Automated Discovery of Social Networks
Twitter Networks
Example nodes: @John, @Peter, @Paul
• Nodes = People
• Ties = “Who retweeted/replied to/mentioned whom”
• Tie strength = The number of retweets, replies or mentions
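As a rough sketch of how such ties could be pulled out of tweet text, the Python below classifies each @username by its position in the tweet. The heuristics (a leading "RT @" means retweet, a leading "@" means reply) are assumptions made for illustration; a real collector would more likely rely on the metadata returned by the Twitter API. The sample tweets and the `extract_ties` helper are hypothetical.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def extract_ties(author, text):
    """Return (source, target, connection_type) tuples for one tweet."""
    ties = []
    for position, name in enumerate(MENTION.findall(text)):
        if name == author:
            continue  # ignore self-mentions
        if position == 0 and text.startswith("RT @"):
            kind = "retweet"
        elif position == 0 and text.startswith("@"):
            kind = "reply"
        else:
            kind = "mention"
        ties.append((author, name, kind))
    return ties

# Hypothetical sample tweets as (author, text) pairs.
tweets = [
    ("John", "RT @Peter: great talk"),
    ("John", "@Paul thanks!"),
    ("Paul", "Lunch with @Peter and @John"),
    ("John", "RT @Peter: slides are up"),
]

# Tie strength = number of retweets, replies or mentions from source to target.
strength = Counter()
for author, text in tweets:
    for source, target, _kind in extract_ties(author, text):
        strength[(source, target)] += 1
```

Here the tie John -> Peter ends up with strength 2, because John retweeted Peter twice.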
20. Automated Discovery of Social Networks
Twitter Data Examples
Network Ties
@Cheeflo -> @JoeProf
@Cheeflo -> @VMosco
@JoeProf -> @VMosco
Network Tie
@Gruzd -> @SidneyEve
Connection type: Mention
Connection type: Reply
23. Sample Twitter Searches
#ELECTION2016 #HONGKONG
3557 records (Dec 3, 2015); 1394 records (Oct 29, 2015)
What do these visualizations tell us?
24. SNA Measures
Micro-level
In-degree centrality
Out-degree centrality
Betweenness centrality
Other centrality measures (e.g., closeness, eigenvector)
Macro-level
Density
Diameter
Reciprocity
Centralization
Modularity
25. SNA Measures
Micro-level: In-degree centrality
In-degree suggests “prestige”, highlighting the Twitter users who are mentioned or replied to most often
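To make the in-degree idea concrete, here is a minimal Python sketch that counts incoming ties per user. The edge list reuses the three ties from the Twitter data example earlier in the deck; the `in_degree` function name is introduced here, and each tie is counted once (unweighted), which is an assumption of this sketch.

```python
from collections import Counter

# Directed ties (source -> target) from the Twitter data example.
edges = [
    ("Cheeflo", "JoeProf"),
    ("Cheeflo", "VMosco"),
    ("JoeProf", "VMosco"),
]

def in_degree(edges):
    """Number of incoming ties per user; users with none get 0."""
    nodes = {user for edge in edges for user in edge}
    incoming = Counter(target for _source, target in edges)
    return {user: incoming[user] for user in nodes}

scores = in_degree(edges)
# Rank users from most to least mentioned/replied to.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this small network @VMosco has the highest in-degree (2), followed by @JoeProf (1) and @Cheeflo (0).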
26. In-degree centrality
#HongKong Twitter network
SEVENTEEN (SVT) is a South Korean boy group formed by Pledis Entertainment
27. SNA Measures
Micro-level: Out-degree centrality
Out-degree reveals active Twitter users with a good awareness of others in the network
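A comparable sketch for out-degree counts how many distinct accounts each user has retweeted, replied to or mentioned. Counting distinct targets rather than total messages is an assumption of this example; the repeated tie in the sample data is hypothetical and is included only to show that it is counted once.

```python
from collections import defaultdict

# Directed ties (source -> target).
ties = [
    ("Cheeflo", "JoeProf"),
    ("Cheeflo", "VMosco"),
    ("Cheeflo", "VMosco"),  # hypothetical repeat, counted once
    ("JoeProf", "VMosco"),
]

def out_degree(ties):
    """Number of distinct accounts each user reached out to."""
    targets = defaultdict(set)
    for source, target in ties:
        targets[source].add(target)
        targets.setdefault(target, set())  # keep users who only receive ties
    return {user: len(reached) for user, reached in targets.items()}

scores = out_degree(ties)
```

Here @Cheeflo has the highest out-degree (2), while @VMosco, who only receives ties, has 0.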
29. SNA Measures
Micro-level: Betweenness centrality
Betweenness shows actors who are located on the largest number of information paths and who often connect different groups of users in the network
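One common way to compute betweenness is Brandes' algorithm, sketched below in plain Python for an unweighted, directed network. This is offered as an illustration under stated assumptions, not as the implementation used by any particular tool: scores are unnormalized and count ordered (source, target) pairs, and the five-node network with a "bridge" user between two groups is hypothetical.

```python
from collections import defaultdict, deque

def betweenness(nodes, edges):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)

    score = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search counting shortest paths from `start`.
        order = []
        predecessors = {node: [] for node in nodes}
        paths = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        paths[start], distance[start] = 1, 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = {node: 0.0 for node in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != start:
                score[w] += dependency[w]
    return score

# Hypothetical network: two small groups joined only through "bridge".
links = [("a2", "a1"), ("a1", "bridge"), ("bridge", "b1"), ("b1", "b2")]
edges = links + [(t, s) for s, t in links]  # make every tie two-way
nodes = ["a1", "a2", "bridge", "b1", "b2"]

scores = betweenness(nodes, edges)
```

In this example "bridge" gets the highest score (8.0), because every path between the two groups passes through it, while the peripheral users a2 and b2 score 0.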
30. Betweenness centrality
#HongKong Twitter network
Note: A fan (retweets/replies to messages from two different fan communities/sites)
40. SNA Measures
Macro-level: Modularity
Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and paying attention to each other (values closer to 0), or whether it consists of different conversations and communities with weak overlap (values closer to 1).
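The modularity description above can be illustrated with Newman's modularity formula for an undirected, unweighted network. In this hedged sketch the community assignment is supplied by hand, whereas an analysis tool would normally derive it with a community detection algorithm; the two-triangle network, the `modularity` function and the community labels are all hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman's modularity Q for an undirected network and a given partition.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Hypothetical network: two triangles joined by a single tie.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "b1"),
]
two_groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
one_group = {user: "all" for user in two_groups}

q_split = modularity(edges, two_groups)   # about 0.357
q_single = modularity(edges, one_group)   # 0.0
```

Treating the whole network as one group gives 0, while splitting it into the two triangles gives a clearly positive value, which matches the reading that higher values point to distinct communities with weak overlap.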