This presentation provides an overview of some of the data extractions that may be achieved on social media platforms using their respective APIs and a free open-source tool (NodeXL).
Extracting Social Network Data and Multimedia Communications from Social Media Platforms for Analysis and Decision-making
1. EXTRACTING SOCIAL NETWORK DATA
AND MULTIMEDIA COMMUNICATIONS
FROM SOCIAL MEDIA PLATFORMS FOR
ANALYSIS AND DECISION-MAKING
Shalin Hai-Jew
2014 Big XII Teaching and Learning
Conference
Oklahoma State University
Stillwater, Oklahoma
Aug. 4 – 5, 2014
2. PRESENTATION OVERVIEW
Electronic Commons
Academic Environment Analysis and Decision-making (from E-SNA)
Examples of Social Network Data Graphs
Electronic Social Network Analysis (E-SNA) / Social Physics
Social Media Platform Types
Microblogging: Twitter
Content-Based Social Platforms: YouTube, Flickr
Web Networks
NodeXL (Network Overview, Discovery and Exploration for Excel)
Review
Tools
4. WELCOMES AND SELF-INTROS
Please introduce yourself as your digital alter-ego. What does your electronic alter-
ego look like on, say, Twitter? Facebook? Flickr? YouTube? How accurate is your
digital doppelganger to your real-world self? Why?
If analyst(s) were to conduct an “inference attack” on your electronic presence, what could they find
out? What could they infer in terms of data leakage and unintended communications (latent
information)?
If electronic presence is a kind of social performance, how is it best performed, and
why?
What are your experiences with social media platforms? Which do you prefer, and
why? Have your preferences changed over time?
What would you like to learn about electronic social network analysis?
5. THE CONTEXT
To provide a rationale for the
use of electronic social network
analysis to benefit the
(teaching and learning, and
other) work of universities
Note: This presentation was designed to
introduce some basic electronic social
network analysis capabilities, not teach
the audience directly how to do the
work, which is beyond the purview of the
presentation.
6. THE ELECTRONIC COMMONS
A “chokepoint” for social issues as a commons
A way to reach many technologically and socially
A way to trigger mass actions (attitudes, beliefs, actions), potentially in a viral or cascading way…as an influence agent
A fantasy space where “egos” may assume audiences (that may be non-existent)
A fantasy space where “egos” may assume non-audiences (the assumption of narrow-casting) when it
may be broadcasting (unintended audiences along with the intended ones)
Re-creation of social power structures from the real-world into the virtual
In-group and out-groups
Social performances, posing
Social codes and meanings
Mixed interests and motives
Low cost of indulging curiosities, particularly in an automated and scalable way
7. THE ELECTRONIC COMMONS (CONT.)
Certain individuals (demographics) in certain social media platforms
Limited big data sharing (value to the data and the identities)
Application programming interfaces (APIs) to access shadow databases
Importance of maintaining trust with clients
Private accounts (vs. public ones)
8. AFFORDANCES AND ENABLEMENTS FOR
INSTITUTIONS OF HIGHER EDUCATION
What are ways that universities have benefitted from the Web? Social media?
How can universities continue building on these affordances? What innovations can
people use to build on these effects?
What are some ways that universities can harness electronic social network analysis
(e-SNA) for their various professional / formal and professional / informal
objectives?
9. ACADEMIC ENVIRONMENT ANALYSIS AND
DECISION-MAKING (FROM E-SNA)
What is the social media presence of the university?
Who are its closest partners in terms of exchanging messages or sharing social media contents?
What are the contents of the messages? What are the main expressed sentiments?
If the university is considering partnering with an organization, what may be
learned about this organization based on its social media presence?
Who are the most active participants in a #hashtag conversation about some aspect
of the university? Who is the “mayor of the hashtag” (per Marc A. Smith’s term)?
Why?
What conversations are occurring around the events being hosted on or around
campus?
10. ACADEMIC ENVIRONMENT ANALYSIS AND
DECISION-MAKING (FROM E-SNA)(CONT.)
If there is a controversial or trending issue, what are the main sentiments being
expressed? Who and which ad hoc groups are expressing what sentiments? How
may the university take part constructively?
If a flash mob action is being planned around campus, how can campus
administrators and law enforcement personnel know about what is happening?
If there is a university-related issue that may be inspired, organized, and
maintained using social media, how can universities harness social media to
constructive ends?
Is there mis-use of the university name and brand? Are there fraudulently created
social media accounts linked to the university? (After de-aliasing, who is actually
behind such accounts?) How can social media platform information be used to
geolocate events to physical spaces, and aliases to actual people?
11. ACADEMIC ENVIRONMENT ANALYSIS AND
DECISION-MAKING (FROM E-SNA)(CONT.)
What sorts of images and video are being shared (that are associated with the
university) on microblogging sites? On content sharing sites?
In terms of digital content tagging, what are the most common words linked to the
university (or its student groups, colleges, public figures, and other associated groups
and individuals)?
If there is a desire to change public perceptions, how may social media platforms be used
constructively? What are the ethical rules of engagement?
How may a university maintain relationships with its various constituencies through
social media? Its political partners? Its corporate partners? Its alumni? Its donors?
Its current learners? Its current learners’ families? And then, further, how can e-SNA
be used to maintain understandings of these interchanges and interrelationships?
13. GRAPH 1A: A #HASHTAG CONVERSATION ON
TWITTER (FLU)
Note: Please click on the various
graphs to link to them on the
NodeXL Graph Gallery. Datasets
may be downloaded there for many
of these data extractions.
The data structures can be depicted
in a variety of ways based on a
number of layout algorithms.
14. GRAPH 1B: A TWITTER #HASHTAG
CONVERSATION (#BRAG)
15. GRAPH 2: AN #EVENTGRAPH ON TWITTER
(MERLOT)
26. A NOTE ABOUT WEB NETWORK GRAPHS
Third-party VOSON (Virtual Observatory for the Study of Online Networks) tool
out of the Australian National University (with an add-in to NodeXL)
Maltego Tungsten
27. (E-) SOCIAL NETWORK ANALYSIS AND
SOCIAL PHYSICS
To summarize some of the
basic concepts of social
network analysis as applied to
electronic spaces
29. “SOCIAL PHYSICS”
Identifying the latent “laws” of human interactions with each other at macro and
micro levels
Laws of affiliation and association (over time): homophily, heterophily
Laws of attraction and aversion
Laws of human patterning socially (and others)
Laws of human uses of physical spaces
Laws of systemic change
Laws of social frictions and large-scale combat
30. STATISTICAL MEASURES
Global Network Measures
Betweenness centrality: Number of
shortest paths between pairs of other
nodes that pass through a given node
(info moves along the shortest paths and
closest ties); how much of a bridge a
node is for network connectivity
Closeness centrality: Geodesic path
distance between a node and every
other node (farness as sum of all
distances to all other nodes; closeness as
inverse of farness)
Node-Level (Local) Measures
Degree centrality: In-degree and out-
degree (relative popularity)
Clustering coefficient: Embeddedness
of single nodes in cliques or ego
neighborhoods with its alters
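Note: The measures on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not how NodeXL computes its metrics; the toy edge list and all function names are assumptions made for the example. It treats the network as undirected and connected, and uses Brandes' algorithm for betweenness.

```python
from collections import deque

# Hypothetical undirected toy network (not data from the presentation).
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Geodesic (shortest-path) distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def degree(adj):
    """Degree: number of direct ties per node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}

def closeness(adj):
    """Closeness as the inverse of farness (assumes a connected network)."""
    result = {}
    for n in adj:
        farness = sum(bfs_distances(adj, n).values())
        result[n] = 1.0 / farness if farness else 0.0
    return result

def clustering(adj):
    """Local clustering: share of a node's neighbor pairs that are linked."""
    result = {}
    for n, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[n] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        result[n] = 2.0 * links / (k * (k - 1))
    return result

def betweenness(adj):
    """Brandes' algorithm, unweighted; each undirected pair counted once."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        stack = []
        preds = {n: [] for n in adj}
        sigma = {n: 0 for n in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {n: b / 2.0 for n, b in score.items()}

adj = build_adjacency(EDGES)
for name, metric in [("degree", degree), ("closeness", closeness),
                     ("clustering", clustering), ("betweenness", betweenness)]:
    print(name, metric(adj))
```

In this toy network, "cat" bridges the triangle and the chain, so it carries the highest betweenness, while "ann" and "bob" sit in a closed triangle and have the highest clustering values.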
31. STATISTICAL MEASURES(CONT.)
Global Network Measures
Eigenvector centrality (diversity): A node's
score depends on the scores of the nodes it
is connected to, so ties to higher-value or
popular nodes result in a higher value
(values between 0 and 1) as a measure of
relative influence
Clustering coefficient: Aggregation of
multiple nodes based on similarity (like co-
occurrence) or connectivity, and expressed as
proximity or closeness visually; may be a
measure of transitivity
Motif Structures
Dyads, triads, and other structured sub-
groupings
Local and experiential for the nodes in terms of
structured connections
May (fractals) / may not be reflective of the
overall structure
Global motif censuses (counts of
occurrences of various types of motif
structures in a whole network)
Structural holes as indicators of potential
openings for nodes and links (to build
resilience)
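Note: Eigenvector centrality, as summarized on this slide, can be approximated by power iteration. The sketch below is a minimal illustration under stated assumptions: the toy edge list and the function name are hypothetical, the network is undirected and connected, and scores are rescaled so the top node has a value of 1. Tools such as NodeXL may normalize differently.

```python
# Hypothetical undirected toy network (not data from the presentation).
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]

def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Power iteration; returns scores scaled so the maximum is 1.0."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores.
        # Keeping the own score (iterating on A + I) leaves the ranking
        # unchanged and avoids oscillation on bipartite networks.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        new = {n: x / top for n, x in new.items()}
        if max(abs(new[n] - score[n]) for n in adj) < tol:
            return new
        score = new
    return score

for node, value in sorted(eigenvector_centrality(EDGES).items()):
    print(node, round(value, 3))
```

A node connected to well-connected nodes ends up with a higher score than a node with the same number of ties to peripheral nodes, which is the sense of "relative influence" used above.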
32. STRUCTURE MINING
Structure of social relationships as an indicator of…
Type of social organization
An embedded power structure
An expression of interdependent and intermixed personalities
Network diffusion of information, power, and other transmissible phenomena
Geodesic structures and distances and paths
Static slice-in-time representations but actual dynamical (changing) realities
(“A Brief Overview of Social Network Analysis”)
33. NODES AND LINKS (IN TERMS OF SOCIAL MEDIA
PLATFORMS)
Entities
Individuals, organizations, governments,
non-profits, political groups, and others
People, robots, and cyborgs
Relationships
Follower, following
Tweets, re-tweets, replies-to, mentions
Comments on videos and response
videos
Co-occurrence of related tags networks
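Note: To make the node-and-link idea concrete, the sketch below turns a few microblog messages into a directed edge list with relationship labels. The sample records, account names, and the simple text rules (a leading "RT @" read as a re-tweet, a leading "@" read as a reply-to) are all assumptions for illustration; real platform data carries explicit fields for these relationships.

```python
import re

# Hypothetical sample records; account names are invented.
TWEETS = [
    {"user": "alice", "text": "Great session with @bob and @carol today #flu"},
    {"user": "bob", "text": "@alice thanks for the notes! #flu"},
    {"user": "dave", "text": "RT @alice: Great session with @bob and @carol today #flu"},
]

MENTION = re.compile(r"@(\w+)")

def edges_from_tweets(tweets):
    """Return (source, target, relationship) tuples, one per @-reference."""
    edges = []
    for tweet in tweets:
        source = tweet["user"].lower()
        text = tweet["text"]
        targets = [t.lower() for t in MENTION.findall(text)]
        # Assumed convention: only the first @-reference can be a
        # re-tweet or reply-to target; the rest are plain mentions.
        if text.startswith("RT @"):
            first = "retweet"
        elif text.startswith("@"):
            first = "reply-to"
        else:
            first = "mention"
        for i, target in enumerate(targets):
            if target == source:
                continue  # skip self-loops
            edges.append((source, target, first if i == 0 else "mention"))
    return edges

for edge in edges_from_tweets(TWEETS):
    print(edge)
```

Each tuple is one link in the graph: the accounts are the nodes, and the relationship label records which kind of tie connects them.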
34. ON TWITTER
To give a sense of the various
network graphs possible from
the Twitter microblogging site
(with multimedia scraping)
35. ABOUT TWITTER
255 million monthly active users
500 million Tweets (140-character microblogging messages) a day
Nearly 80% of active users on mobile
77% of accounts outside U.S.
Support for over 35 languages
Vine (looping video sharing on mobile) with more than 40 million users
Verified accounts
[Twitter created by a four-man team in 2006 and incorporated in 2007 (About
Twitter FactSheet)]
36. TYPES OF INFORMATION AVAILABLE
#Hashtag conversations (tagged conversations)
#Hashtag eventgraphs (event-based)
Keyword networks (multi-topic)
User networks (ego-based)
List networks (topic-based)
37. SOME E-SNA CHALLENGES WITH THIS SOCIAL
MEDIA PLATFORM
Word disambiguation
Only about 1 in 100 Tweets with geolocation data (which is often noisy)
Rate-limiting
Search goes back only about a week (no deep historical searches without paying a
third-party company with access)
Enables extractions of Tweet streams as datasets
Limits for some languages (requiring a URL decoder / encoder for readability)
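Note: The URL decoding step mentioned for some languages can also be done locally. This minimal sketch uses Python's standard urllib.parse module; the helper name and the sample term are assumptions for illustration, and it presumes the extracted text is percent-encoded UTF-8.

```python
from urllib.parse import quote, unquote

def decode_term(encoded):
    """Turn percent-encoded UTF-8 text back into readable characters."""
    return unquote(encoded, encoding="utf-8", errors="replace")

# Round trip on a sample Cyrillic term (hypothetical example data).
encoded = quote("привет")
print(encoded)
print(decode_term(encoded))
```

Bytes that cannot be decoded are replaced with a placeholder character rather than raising an error, which keeps a batch of labels processing even when a few are malformed.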
38. ON FLICKR
To provide a sense of what
network data may be extracted
from the Yahoo Flickr imagery
and video repository
39. ABOUT FLICKR
Hosts imagery and video
Over 90 million registered members
3.5 million new images uploaded daily
Hosting over 6 billion images as of 2011
Free accounts offering a terabyte of storage per individual
Enables public and private accounts
Enables Creative Commons licensure of contents and CC-Search access
[Created by Ludicorp in 2004 and sold to Yahoo in 2005]
40. TYPES OF INFORMATION AVAILABLE
Related Tags Networks on Flickr
(Multi-lingual) tags as a form of
metadata describing the imagery and
videos
Related tags (networks of tags that co-
occur and may be expressed as
clustered text-based graphs)
Graphs may be partitioned for more visual
clarity
Scraped imagery may be embedded in the
graphs
User Networks / Groups on Flickr
Ego neighborhoods of individual or
group contributors to Flickr
“Alters” (nodes with direct ties) to the
user network in Flickr
Follower / following
Reply-to
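Note: A related tags network is built from tag co-occurrence: two tags are linked when they appear on the same item, and the link grows stronger each time that happens. The sketch below is a minimal illustration; the photo records, tag values, and the min_weight threshold are assumptions, not Flickr API output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical photo records with user-supplied tags.
PHOTOS = [
    {"id": "p1", "tags": ["campus", "autumn", "library"]},
    {"id": "p2", "tags": ["campus", "library"]},
    {"id": "p3", "tags": ["Autumn", "campus"]},
    {"id": "p4", "tags": ["football"]},
]

def related_tag_edges(photos, min_weight=1):
    """Return {(tag_a, tag_b): count} for tags that co-occur on a photo."""
    weights = Counter()
    for photo in photos:
        # Normalize case and whitespace, and drop duplicate tags per photo.
        tags = sorted({t.strip().lower() for t in photo["tags"]})
        for a, b in combinations(tags, 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}

for (a, b), weight in sorted(related_tag_edges(PHOTOS).items()):
    print(a, b, weight)
```

Raising min_weight keeps only the tag pairs that co-occur repeatedly, which is one simple way to thin a dense tag graph before visualizing it.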
41. SOME E-SNA CHALLENGES WITH THIS SOCIAL
MEDIA PLATFORM
Disambiguation of terms
Reliance on informal tagging and folksonomies
Dealing with metadata and not the multimedia directly
Limits for some languages (requiring URL decoder / encoder for some languages,
namely Cyrillic and Arabic)
42. ON YOUTUBE
To give a sense of the content
networks available on Google’s
YouTube video collection
43. ABOUT YOUTUBE
Over a billion unique users each month on YouTube
Six billion hours of video watched monthly
100 hours of video uploaded each minute
Localized in 61 countries and as many languages
80% of traffic from outside the U.S. (YouTube Statistics)
Adobe Flash video format and HTML 5 format
[Founded in 2005 by a three-man development team and purchased by Google in
2006]
44. TYPES OF INFORMATION AVAILABLE
User networks (user accounts and connections with other user accounts)
Thumbnail screengrabs possible
Video networks (videos about a particular topic)
Thumbnail screengrabs possible
45. SOME E-SNA CHALLENGES WITH THIS SOCIAL
MEDIA PLATFORM
Based on metadata, not the direct videos
Would be richer if drawn from the scripts of the video contents
46. ON THE WEB
To provide a sense of what
may be captured in terms of
Web networks
47. TYPES OF INFORMATION AVAILABLE
Ties between websites
URLs linked to a geographical location (and vice versa)
Technological understructure of websites
Relatedness ties between various types of electronic information (and the
enablement of transforms or the changing of one type of electronic information to
another)
Scraping of files (PDF) and imagery (with EXIF data)
Re-identification of aliases
48. SOME E-SNA CHALLENGES WITH THIS
INFORMATION SOURCE
High levels of ambiguity
Past data leaving trails (even if the information may not be current)
Involves the public web only, not the hidden Web
Requires a commercial tool for efficiency and coherence
51. GENERAL SEQUENCE
1. Define a research question (that is answerable with this type of data query).
2. Formulate a strategy to use the tool to extract information from a particular social
media platform.
3. Start NodeXL. Ensure that there is Internet connectivity. Set up the data
extraction parameters. Run the data extraction.
4. Process the data. Create the graph visualization.
5. Analyze the graph metrics. Analyze the graph visualization. Analyze
complementary information from other sources.
6. Use the information to make a decision or create a strategy.
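Note: Steps 4 and 5 of the sequence can also be carried out on an exported edge list outside the tool. The sketch below is an illustration under stated assumptions: it presumes the edges were saved as a CSV with "Vertex 1" and "Vertex 2" columns (the column names and sample rows here are assumed for the example), and it ranks accounts by in-degree as a rough first look at who draws the most attention in a conversation.

```python
import csv
import io
from collections import Counter

# Hypothetical exported edge list; column names are an assumption.
EDGE_CSV = """Vertex 1,Vertex 2,Relationship
alice,bob,Mentions
carol,bob,Replies to
dave,bob,Mentions
bob,alice,Replies to
"""

def load_edges(handle):
    """Read (source, target) pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(row["Vertex 1"], row["Vertex 2"]) for row in reader]

def rank_by_in_degree(edges, top=3):
    """Return the most-referenced nodes as (node, in_degree) pairs."""
    counts = Counter(target for _, target in edges)
    return counts.most_common(top)

edges = load_edges(io.StringIO(EDGE_CSV))
print(rank_by_in_degree(edges))
```

To run this against a real file, pass an open file handle to load_edges in place of the io.StringIO wrapper. The ranking is only a starting point; the graph visualization and complementary sources in step 5 still need human interpretation.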
52. TOOL CAPABILITIES
Data extraction from a range of social media
platforms
Graph visualization using a dozen different
grouping (clustering) visualizations and overall graph
visualizations
55. LAYERS OF DEPENDENCIES
From near-to-far
Local computer and its processing
Connectivity speed to the Internet
NodeXL
Access to the social media platform
Whitelisting
Rate limiting (and time-of-day for access)
Particular search terms “forbidden”
Data processing with NodeXL
Data visualization (with NodeXL or another tool)
Data analysis
Re-run? Additional data extractions?
60. A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA
Surfacing Hidden or Latent Information
Who (which nodes) is most active in an event or conversation or other phenomenon?
What is he/she/they/it asserting (as an influence agent) via text? via imagery? via
video?
Scalability
This scalable approach enables analysis of both small-scale and (relatively) large-
scale data, and everything in between. At some point, the human has to come in to
analyze what’s found and to advance the work…but computers can do all the heavy
lifting.
61. A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA
(CONT.)
Machine-Enhanced Sentiment Analysis
Gist of a Tweetstream related to a user account or related accounts, a hashtag
conversation, an eventgraph, a photostream, a videostream
Embedded meanings and sentiments (the meaning, the direction and the strength of
that emotion, the cultural and social-based valence whether positive or negative)
Fine-tuning the automated analysis of texts
Machine reading of imagery
Human-informed processes (at virtually every step)
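Note: The direction and strength of sentiment in a message stream can be roughly estimated with a word list. The sketch below is a deliberately small illustration; the lexicon, its weights, and the one-word negation rule are assumptions for the example, not a validated instrument, and real analyses need much larger lexicons plus human review.

```python
import re

# Tiny hypothetical lexicon: positive weights for positive words.
LEXICON = {"great": 2, "love": 2, "good": 1, "bad": -1, "awful": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum lexicon weights; a negator flips only the very next word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        score += -value if negate else value
        negate = False
    return score

def sentiment_label(text):
    """Map the score to a coarse positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for message in ["I love this great campus", "not good", "awful food, bad parking"]:
    print(message, sentiment_score(message), sentiment_label(message))
```

The sign of the score gives the direction of the sentiment and the magnitude gives a crude strength; sarcasm, slang, and culture-specific meanings are exactly where the human-informed steps come in.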
62. OTHER DATA EXTRACTION AND GRAPH
VISUALIZATION TOOLS
NCapture on Chrome and Internet Explorer (NVivo 10 on Windows)
CEMap on AutoMap with ORA NetScenes
Maltego Tungsten™
* All the above have other purposes and
capabilities beyond the limited use cases
shown here.
63. REFERENCES
Hansen, D.L., Shneiderman, B., & Smith, M.A. (2011). Analyzing Social Media
Networks with NodeXL: Insights from a Connected World. Boston: Elsevier.
(available digitally on ScienceDirect)
NodeXL on CodePlex (downloadables)
64. LIVE DEMO? QUESTIONS? COMMENTS?
Audience suggestions for targets?
Any questions about this presentation? About
e-social network analysis? The software
tools? The social media platforms?
Questions about research you might
want to embark on using this
methodology and these tools?
65. CONCLUSION AND CONTACT
Dr. Shalin Hai-Jew
Instructional Designer
Information Technology Assistance Center (iTAC)
Kansas State University
212 Hale Library
Manhattan, KS 66506-1200
785-532-5262 (work phone)
shalin@k-state.edu