1. Analyzing Events Through the Lens of Social Media
Debanjan Mahata (dxmahata@ualr.edu)
Nitin Agarwal (nxagarwal@ualr.edu)
University of Arkansas at Little Rock
This work is supported in part by grants from the US Office of Naval Research (ONR)
and US National Science Foundation (NSF)
2. Outline
• Introduction
• Motivation
• Challenges
• Proposed Framework
• Data collection and processing
• Experiments: Results and Analysis
• Looking Ahead
7. Social Media’s Influence
• Social media played a phenomenal role in organizing these events
• Citizen journalism at its best
8. Goals of the Research
• We study how social media can be leveraged to analyze
– Events and their characteristics
– Coverage differences from mainstream media
– Socio-demographic and socio-technical behavioral patterns
– and explore further implications of the research
9. Challenges
• Identifying the right social media sources
• Language barrier
• Colloquial usage, misspellings, sparse links
• Extracting relevant information from the sources
– Entity extraction and resolution
• Evaluation is difficult due to the lack of benchmark datasets
11. Proposed Methodology
• Identifying the right social media sources
Specificity (κ) of a source ‘S’ for an event ‘E’
$$IG(E, S) = H(E) - H(E \mid S) = \sum_{e \in E} p(e) \log \frac{1}{p(e)} - \sum_{e \in E,\, s \in S} p(e, s) \log \frac{p(s)}{p(e, s)}$$
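A minimal sketch of the information gain IG(E, S) = H(E) − H(E | S), estimated from co-occurrence counts. The input format (a list of `(event, source)` observations), the function names, and the base-2 logarithm are assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy: sum of p * log2(1 / p) over non-zero probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def information_gain(pairs):
    """Estimate IG(E, S) = H(E) - H(E | S) from (event, source) observations.

    `pairs` is a hypothetical input format: one tuple per observed
    mention of an event in a source.
    """
    n = len(pairs)
    joint = Counter(pairs)                       # counts for p(e, s)
    event_counts = Counter(e for e, _ in pairs)  # counts for p(e)
    source_counts = Counter(s for _, s in pairs) # counts for p(s)

    h_e = entropy(c / n for c in event_counts.values())
    # H(E | S) = sum over (e, s) of p(e, s) * log(p(s) / p(e, s))
    h_e_given_s = sum(
        (c / n) * math.log2((source_counts[s] / n) / (c / n))
        for (_, s), c in joint.items()
    )
    return h_e - h_e_given_s
```

When each source writes about exactly one event, knowing the source removes all uncertainty about the event and the gain equals H(E); when sources cover all events evenly, the gain is zero.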
12. Proposed Methodology
• Identifying the right social media sources
Closeness (τ) of a term/entity ‘e’ to an event ‘E’
τ = P(e, E) = P(E) P(e | E)
P(e | E) = ef-iEf = ef(e, E) × iEf(e)
• Creating Event dictionaries
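The slide gives only the product form ef(e, E) × iEf(e) and does not define the two factors. The sketch below assumes definitions by analogy with tf-idf: ef is the relative frequency of an entity among the mentions collected for an event, and iEf is the logarithm of the number of events divided by the number of events in which the entity appears. These definitions, the input format, and the function name are assumptions, not the authors' stated formulation.

```python
import math
from collections import Counter


def ef_ief(event_entities):
    """Score entities per event with an assumed tf-idf-style ef-iEf.

    `event_entities` maps an event name to the list of entity mentions
    extracted for that event (hypothetical input format).
    Returns {event: {entity: score}}.
    """
    n_events = len(event_entities)

    # Number of events in which each entity appears at least once.
    events_containing = Counter()
    for mentions in event_entities.values():
        events_containing.update(set(mentions))

    scores = {}
    for event, mentions in event_entities.items():
        counts = Counter(mentions)
        total = len(mentions)
        scores[event] = {
            # ef: relative frequency within the event; iEf: log inverse event frequency
            entity: (count / total) * math.log(n_events / events_containing[entity])
            for entity, count in counts.items()
        }
    return scores
```

Under these assumptions an entity that appears in every event scores zero for all of them, which is the behavior one would want from a measure meant to separate event-specific entities from generic ones.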
13. Construction of Event Dictionaries
• Reference point to construct event vocabulary
• Independent of the sources
• Globalvoicesonline.org
• Extract entities from the Global Voices Online source
• Use closeness measure to order the entities based on relevance to the event
– Event-specific dictionary
– Event category-specific dictionary
Top 5 entities in the event-specific and event category-specific dictionaries:
– Egyptian revolution specific dictionary: Tahrir Square, Egyptian government, Gigi Ibrahim, Alexandria, Wael Abbas, …
– Libyan revolution specific dictionary: Tripoli, Muammar Al Gaddafi, North Atlantic Treaty Organization, Chad, United Kingdom, …
– Tunisian revolution specific dictionary: Tunisian government, Lin Ben Mhenni, Samir Feriani, Kasbah Square, RCD, …
– Socio-political (global) event dictionary: Twitter, Iranian Government, Tear gas devices, Facebook, Big Social network, …
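Ordering entities by closeness to build a dictionary can be sketched as a simple top-k selection. The function name, the `top_k` parameter, and the alphabetical tie-break are assumptions for illustration, and any scores passed in a usage example would be placeholders rather than values from the study.

```python
def build_event_dictionary(closeness_by_entity, top_k=5):
    """Return the `top_k` entities ordered by descending closeness.

    `closeness_by_entity` maps entity name to a closeness score
    (hypothetical input format). Ties are broken alphabetically so
    the ordering is deterministic.
    """
    ranked = sorted(
        closeness_by_entity.items(),
        key=lambda item: (-item[1], item[0]),
    )
    return [entity for entity, _ in ranked[:top_k]]
```

Calling `build_event_dictionary(scores, top_k=5)` on the closeness scores for one event would yield a list in the same form as the top-5 entries shown on the slide.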
14. Data collection
• Collected using Google Blog Search
• From blogspot.com
Event | Query Term | Number of Blogs | Dates
Egyptian Revolution | “egyptian revolution” OR “egypt protest” | 579 | 25th January, 2011 – 7th December, 2011
Libyan Revolution | “libyan revolution” OR “libya protest” | 600 | 15th February, 2011 – 7th December, 2011
Tunisian Revolution | “tunisian revolution” OR “tunisia protest” | 484 | 17th December, 2010 – 7th December, 2011
15. Data Description
• Blogger specific
– URL
– Work information
– Gender
– Blogs followed
– Blogs owned
• Blog specific
– URL
– Blogging tags
– Topic Category
• Blog post specific
– URL
– Timestamp
– Text
– Outlinks
– Language
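One way to hold the collected fields in code is a set of small record types, one per level of the data. The grouping of fields into blogger, blog, and blog post, the class names, the Python types, and the optional defaults are all assumptions; the slide lists only the field names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Blogger:
    """Blogger-level fields (grouping assumed from the slide layout)."""
    url: str
    work_information: Optional[str] = None
    gender: Optional[str] = None
    blogs_followed: List[str] = field(default_factory=list)
    blogs_owned: List[str] = field(default_factory=list)


@dataclass
class Blog:
    """Blog-level fields."""
    url: str
    blogging_tags: List[str] = field(default_factory=list)
    topic_category: Optional[str] = None


@dataclass
class BlogPost:
    """Post-level fields."""
    url: str
    timestamp: datetime
    text: str
    outlinks: List[str] = field(default_factory=list)
    language: Optional[str] = None
```

Profile fields are optional here on the assumption that bloggers do not always fill them in.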
24. Conclusions
• Relevance of social media in various events
• Methodology to analyze events via social media
• Associated challenges
• Proposed measures to identify specific sources with respect
to atomic information units/entities
• Evaluation framework
• Popular sources may not be specific
• Localized sources tend to be more specific
• Expand the dataset, include more and various types of events
• Use as apparatus to analyze social movements, collective
actions, marketing research, etc.
26. Observation
• Socio-demographic
– Location
– Age
– Gender
– Profession (occupation, industry)
– etc.
• Socio-technical
– Links
– Devices
– Other social media profiles
• Network of bloggers from the extracted data
27. Specificity
$$\kappa = IG(E_i, S_k) = H(E_i) - H(E_i, S_k)$$
$\kappa = -H(E_i, S_k)$, expressed in terms of the sums $\sum_{i=1}^{n} f_i \tau_i$ and $\sum_{i=1}^{n} f_i$
In 2006, Time magazine selected “You” as the Person of the Year. In 2011, Web 2.0/social media enabled, or more precisely helped, people topple decades-old authoritarian regimes in the MENA region. Essentially, what people had tried to do for the last 40 years, social media helped to accomplish in 5 years.
Social media has irreversibly transformed how people communicate, organize, mobilize, and respond.
Formation of collective action, manifestation of social movements, etc.
Hundreds of millions of blogs, billions of tweets, several thousand YouTube videos. Tons of region-specific sources. Strictly network-based approaches do not usually perform well.
The Georgian Cyber Campaign (2008) brought Internet traffic to a standstill in the Republic of Georgia. The attacks, which coincided with the Russian military’s invasion of Georgia, were carried out in large part by civilians and Russian crime gangs. The attacks were significant in that they made it almost impossible for citizens and officials to communicate about what was happening on the ground during the military operation. According to a US Cyber Consequences Unit (US-CCU) August 2009 special report on this cyber campaign, social networking forums were the primary means used to recruit and arm the attackers. Social media has a key role in monitoring and tracking cyber-threats.
Chicken-and-egg problem: to identify a good event dictionary you need a good source, and to identify a good source you need an event dictionary.
ef-iEf is a generalization of the content analysis measure tf-idf.
Alchemy API was used to extract entities. Our approach looks at the quality (closeness) of the entities and not just their quantity, so it is robust to the skewed distribution depicted above.
Also motivates the need for studying region-specific often non-English language sources.
To select the highly specific sources, we propose a novel ‘specificity’ measure, which estimates the unique information that a source (S_k) can offer vis-à-vis an event (E_i). Note that a source’s specificity is always estimated with respect to a given event. The measure draws upon the theory of information gain and is defined as κ = IG(E_i, S_k) = H(E_i) − H(E_i, S_k), where IG(E_i, S_k) denotes the information gain for a source S_k related to an event E_i; S_k is the k-th source in the set of sources for the event E_i; E_i is the i-th event in the set of events selected for the study; H(E_i) is the total entropy for the event E_i; and H(E_i, S_k) is the total entropy of the source S_k related to the event E_i. Since H(E_i) is constant for every event, IG(E_i, S_k) varies only with −H(E_i, S_k). So we only calculate the values of H(E_i, S_k) in order to find κ. The formulation of κ discussed above is generic. Section 6.3.1 explains a specific implementation.