#ICCSS2015 - Computational Human Security Analytics using "Big Data"
1. Computational Human Security Analytics using “Big Data”
Pete Burnap & Matt Williams
Social Data Science Lab
School of Computer Science and Informatics & School of
Social Sciences
Cardiff University
@pbFeed @mattlwilliams
@socdatalab
2. COSMOS Web Observatory – cosmosproject.net
Integrated
Open (“plug and play”)
Scalable (MongoDB data stores/
Hadoop Back End)
Burnap, P. et al. (2014) ‘COSMOS: Towards an Integrated and Scalable Service for Analyzing Social Media
on Demand’, International Journal of Parallel, Emergent and Distributed Systems
Usable – developed with social
scientists for social scientists
Reproducible/Citable Research
- export/share workflow
3. Web Observatory Features
• Data Collection and Curation
– Persistent connection to Twitter 1% Stream (~4
billion)
– Geocoded tweets from UK (~200 million annually)
– Bespoke keyword-driven Twitter collections (on crime
and security)
– ONS/Police API
– Drag and drop RSS
– Import CSV/JSON
– …Web enabled so push/pull data from anywhere (i.e.
other observatories!)
4. Web Observatory Features
• Data Transformation
– Word Frequency
– Point data frequency over time
– Social Network Analysis
– Geospatial Clustering
– Sentiment Analysis
– Demographic Analysis (gender, location, age,
occupation/social class) (Sloan et al., 2015, PLoS ONE)
– …API to plug new modules and benchmark tools…plus
transform data via other observatories
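The first two transformations above (word frequency and point data frequency over time) can be sketched in a few lines of Python. This is a minimal illustration only, not the COSMOS implementation: the function names, the hourly bucket size, the timestamp format and the sample data are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
import re


def word_frequency(texts):
    """Count lowercase word tokens across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def frequency_over_time(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count items per hour, given timestamp strings in the assumed format."""
    buckets = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, fmt)
        buckets[dt.strftime("%Y-%m-%d %H:00")] += 1
    # Return the buckets in chronological order.
    return dict(sorted(buckets.items()))


# Hypothetical sample data, for illustration only.
sample_texts = ["Police confirm incident", "Police appeal for calm"]
sample_times = ["2015-01-01 14:20:05", "2015-01-01 14:55:41", "2015-01-01 15:02:10"]

freq = word_frequency(sample_texts)
hourly = frequency_over_time(sample_times)
```

Here `freq.most_common(10)` would give the ten most frequent tokens, and `hourly` maps each hour to the number of items posted in it.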
5. Supervised Machine Learning & Cyber Hate Speech
• The human-annotated hate speech sample contained numerous calls for collective
action and hateful incitement towards social groups exhibiting protected
characteristics.
• For instance, there were exclamations such as “send them home”, “get them out”, and
“should be hung”
• Implemented the Stanford Lexical Parser, along with a context-free lexical parsing
model, to extract typed dependencies within the tweet text (de Marneffe et al., 2006).
• Typed dependencies provide a representation of grammatical relationships in a
sentence (or tweet in this case) that can be used as features for classification.
“Totally fed up with the way this country has turned into a haven for terrorists. Send them
all back home”.
• [root(ROOT-0, Send-1), nsubj(home-5, them-2), det(home-5, all-3), amod(home-5,
back-4), xcomp(Send-1, home-5)]
• Linguistically, therefore, the term ‘them’ is associated with ‘home’ in a relational sense.
Sociologically, this is an “othering” phrase
• Combining linguistics and sociology potentially provides a richer set of features for
more nuanced classification of hate speech than a bag-of-words (BoW) approach
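As a rough sketch of how typed dependencies could become classifier features, the parser output shown above can be split into relation(governor, dependent) triples and counted. This is an illustrative assumption, not the authors' pipeline: the regular expression, the lowercasing, the dropping of token positions and the function names are all hypothetical choices, and the parser itself is not invoked here.

```python
import re
from collections import Counter

# Matches items such as "nsubj(home-5, them-2)" in the parser's text output.
DEP_PATTERN = re.compile(r"(\w+)\(([^\s,()]+)-(\d+),\s*([^\s,()]+)-(\d+)\)")


def parse_typed_dependencies(dep_string):
    """Return (relation, governor, dependent) triples, lowercased, without positions."""
    return [
        (rel, gov.lower(), dep.lower())
        for rel, gov, _gov_idx, dep, _dep_idx in DEP_PATTERN.findall(dep_string)
    ]


def dependency_features(dep_string):
    """Count each relation(governor,dependent) triple as one sparse feature."""
    triples = parse_typed_dependencies(dep_string)
    return Counter(f"{rel}({gov},{dep})" for rel, gov, dep in triples)


# The dependency list from the slide, copied as a plain string.
example = (
    "[root(ROOT-0, Send-1), nsubj(home-5, them-2), det(home-5, all-3), "
    "amod(home-5, back-4), xcomp(Send-1, home-5)]"
)
features = dependency_features(example)
```

The resulting counts, for example `nsubj(home,them)`, could sit alongside BoW counts in a feature vector, so a classifier sees the "them"/"home" relation and not just the two words in isolation.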
7. Theory-driven Experimental Design
• Modeling the spread of cyber hate following a national security event
– Does cyber hate get propagated? (size)
– Does cyber hate continue to be propagated for a long time? (survival)
• Study of the process of action, reaction and amplification (Cohen 1972)
• Moral panics: process of impact, inventory and reaction
• Social response is partly responsible for deepening the impact of an event; SM
reactions then act as a force amplifier
8. Impact
Impact “during which the disaster strikes and the immediate
unorganised response to the death, injury and destruction
takes place”: Initial reaction and diffusion on SM
9. Inventory
Inventory “during which those exposed to the disaster begin to
form a preliminary picture of what has happened and of their
own condition”: Diffusion of rumour and hate on SM
10. Reaction
Reaction “images in the inventory were crystallized into more
organised opinions and attitudes”: Diffusion of wider issues on
SM – immigration, religion, security etc.
11. Size Results
[Figure: Increased likelihood of retweet (all p < 0.05). Axis scale: -100 to 700.
Factors: Far Right Political, Political, Police, Media, Cyberhate, News (per 100 stories),
Google (per 100 searches), Sentiment, URL, Hashtag]
12. Survival Results
[Figure: Kaplan-Meier Survival Estimates for Tweets Containing Cyberhate. Vertical axis:
0.00 to 1.00. Horizontal axis: Analysis Time (Seconds), 0 to 1000000. Groups: No Cyberhate,
Moderate Cyberhate, Extreme Cyberhate]
[Figure: Kaplan-Meier Survival Estimates for Tweet Agent Type. Vertical axis: 0.00 to 1.00.
Horizontal axis: Analysis Time (Seconds), 0 to 1000000. Groups: News Agent, Police Agent,
Political Agent, Far Right Political Agent, Other Agent]
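For readers unfamiliar with the Kaplan-Meier estimator named in the figures, the sketch below computes it from scratch. It is a generic textbook version under stated assumptions, not the analysis code behind these slides: the "event" is taken to be a tweet ceasing to be retweeted, tweets still active at the end of observation are treated as censored, and the durations are hypothetical values in seconds.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event or until censoring
    observed:  True if the event was observed, False if censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all subjects that share the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical durations in seconds, for illustration only.
durations = [60, 120, 120, 300, 600]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Running the estimator separately per group (for example by cyberhate level or by agent type) and plotting each curve against time would produce step plots of the kind described in the figures.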
13. References
Williams, M. L. and Burnap, P. (2015) ‘Cyberhate on Social Media in the Aftermath of
Woolwich: A Case Study in Computational Criminology and Big Data’, British Journal of
Criminology
Burnap, P. and Williams, M. (2015) ‘Cyber Hate Speech on Twitter: An Application of
Machine Classification and Statistical Modeling for Policy and Decision Making’,
Policy & Internet (7:2)
Burnap, P., Williams, M. L. et al. (2014) ‘Tweeting the Terror: Modelling the Social
Media Reaction to the Woolwich Terrorist Attack’, Social Network Analysis and Mining (4:2)