This multidisciplinary research investigates Twitter posts related to sexual assault and rape myths by characterizing and detecting the types of malicious intent that foster beliefs discrediting women and perpetuating rape myths. We analyze the narrative contexts in which such malicious intent is expressed and discuss the implications for gender violence policy design.
Pandey, R., Purohit, H., Stabile, B., & Grant, A. (2018). Distributional Semantics Approach to Detect Intent in Twitter Conversations on Sexual Assaults. IEEE/WIC/ACM Web Intelligence. ArXiv preprint: https://arxiv.org/abs/1810.01012
Detect Policy-affecting Intent in Twitter Conversations for Rape and Sexual Assaults - Web-Intelligence 2018
1. DISTRIBUTIONAL SEMANTICS APPROACH TO DETECT INTENT IN
TWITTER CONVERSATIONS ON SEXUAL ASSAULTS
@gizmowiki @hemant_pt @bstabile1 @aubreyleigh86
Rahul Pandey, Hemant Purohit, Bonnie Stabile, Aubrey Grant
2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI’18)
Santiago, Chile
Dec 05, 2018
George Mason University, USA
2. Distributional Semantics Approach to Detect Intent in Twitter Conversations on Sexual Assaults, WI’18
Outline
¨ Motivation
¤ Social media for Gender-based Violence (GBV)
¤ Problem: mining malicious user intent related to undermining GBV laws & policies
¤ User intent mining on social media
¨ Intent Classification Framework
¤ Social Construction Theory
¤ Malicious intent typology
¤ Distributional semantics for learning intent
¨ Experiments
¤ Data collection and annotation
¤ Evaluation schemes
¨ Result Analysis
¤ Prediction results analysis
¤ Topic content analysis
¤ Psycholinguistics analysis
¨ Discussion: Lessons, Limitations, and Future Work
*Disclaimer: Illustrations may contain abusive language due to the context of GBV.
3. Gender-based Violence (GBV): Major public health crisis
4. How Can Info. Technology Help Curb GBV?
WHO recommendations:
We have many laws & policies, but lack scalable tools for studying the effectiveness of policies!
5. Leveraging Social Media for Curbing GBV
Social media is a cheaper way to directly listen and learn about people’s thinking on GBV!
[Purohit, Banerjee, Hampton, Shalin, Bhandutia, & Sheth, First Monday 2016]
Image Source: http://blog.ungei.org/16days-preventing-gender-based-violence-in-schools-in-asia-pacific/ ;
https://purposefullyscarred.wordpress.com/2013/05/07/the-danger-of-rape-myths-and-how-they-could-influence-court-anonymity
Types of GBV
6. Study User Intent on Social Media
[Figure: spectrum of user intent on social media, running from Social Good (positive effects) through Social Bad to Social Ugly (negative effects)]
Intent types shown, in order: Help Offering, Emotionally Supporting, Expertise Sharing, Joking, Marketing, Manipulating, Harassing, Bullying, Propagandizing, Deceiving, Rumoring, Accusing, Sensationalizing
[Purohit & Pandey (LNSN 2019). Intent Mining for the Good, Bad & Ugly Use of Social Web: Concepts, Methods, and Challenges.]
7. Malicious Intent Examples
[Figure labels: express doubts; express belief]
8. Problem
¨ RQ. How can we identify social media messages with malicious intent that fosters beliefs discrediting women and spreading rape myths?
e.g., “white women have lied about rape against black men for generations”
¨ Impact:
¤ Inform policymakers about public perception to help in the formulation and revision of laws and policies, and to improve policy outcomes
9. Related Work: Mining user intent on web
Well-known in Web Search:
Query Intent in Information Retrieval/Search
• Jansen et al. [IPM 2008]
Emerging area in social media research, e.g.,
Buying-Selling Intent
• Hollerit et al. [WWW 2013]
Help Seeking-Offering Intent
• Purohit et al. [SocialCom 2015]
Travel & Food Recommendation Intent
• Wang et al. [AAAI 2015]
Different from search intent due to conversational expectations
Our challenge: ambiguous intents for public beliefs, unlike more specific commercial/help intent
10. Solution: Contributions
¨ A novel theory-guided policy-affecting intent typology and classification framework
n Novel application of Social Construction Theory
n Policy-affecting intent types: {Accusational, Validational, Sensational}
n Distributional semantics approach for learning intent
¨ A large-scale multidisciplinary study of the prevalence of policy-affecting intents and their contexts
12. Social Construction Theory: Framework for studying effects of policies
Actors are placed by power (Strong / Weak) and by social construction (Positive / Negative):
• Advantaged (strong power, positive construction): Athletes, breadwinners, men with potential
• Contenders (strong power, negative construction): Promiscuous, feminist or abrasive
• Dependents (weak power, positive construction): Innocents, victims, not blameworthy
• Deviants (weak power, negative construction): Liars, “sluts” or vengeful women
Public perception can hurt these actors the most!
13. Malicious Intent Typology: Based on actors identified by domain experts in the Social Construction framework
• Accusational: express doubts about or undermine accusers
• Validational: express belief in the accused or accuser
• Sensational: focus more on politics or provocation than on the issue of rape or sexual assault
14. Malicious Intent Classification: Challenges
¨ A semantic text classification problem
¨ Varied intentions in one message
n Complicates natural language understanding
¨ Lack of sufficient contextual details
n Sparse intent cues
¨ Poor learning of the useful regularities due to
surrounding noise
n Multiple textual forms for same intent
15. Distributional Semantics-based Classification Framework
The distributional semantics approach helps capture long-term dependency features for ambiguous intent.
16. Distributional Semantics-based Classification Framework
¨ Fixed Feature Extractor
¤ Training a Convolutional Neural Network (CNN)
n Embedding layer initialized with Google’s word2vec because the labeled data is small-scale
¨ Logistic Regression Classifier
n Using the values of the CNN’s fully connected layer as fixed feature codes for the training inputs
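As a rough illustration of the "fixed feature extractor" idea, the sketch below runs only the forward pass of a tiny 1D convolution with max-over-time pooling over word embeddings and returns the pooled activations as a feature code. The vocabulary, embedding values, filter weights, and the name `feature_code` are all made up for illustration. In the slides the embeddings are initialized from word2vec, the CNN is trained first, and the codes come from a fully connected layer, none of which this sketch reproduces.

```python
import random

random.seed(0)

EMB_DIM = 4      # toy embedding size (word2vec vectors are much larger)
WINDOW = 2       # convolution window, in tokens
N_FILTERS = 3    # number of filters = length of the feature code

# Hypothetical stand-in for pretrained embeddings; a real pipeline would load word2vec.
VOCAB = ["she", "lied", "about", "it", "<unk>"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in VOCAB}

# Hypothetical stand-in for trained filter weights: one flat vector per filter.
FILTERS = [[random.uniform(-1, 1) for _ in range(WINDOW * EMB_DIM)]
           for _ in range(N_FILTERS)]


def feature_code(tokens):
    """Convolve each filter over token windows, apply ReLU, then max-pool over time."""
    vectors = [EMBEDDINGS.get(t, EMBEDDINGS["<unk>"]) for t in tokens]
    # Pad short messages with zero vectors so at least one window exists.
    while len(vectors) < WINDOW:
        vectors.append([0.0] * EMB_DIM)
    code = []
    for filt in FILTERS:
        activations = []
        for start in range(len(vectors) - WINDOW + 1):
            window = [x for vec in vectors[start:start + WINDOW] for x in vec]
            score = sum(w * x for w, x in zip(filt, window))
            activations.append(max(0.0, score))  # ReLU
        code.append(max(activations))            # max-over-time pooling
    return code


code = feature_code("she lied about it".split())
print(len(code), code)
```

Each message is reduced to a fixed-length list of numbers, which could then serve as one input row for a separately trained logistic regression classifier.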
18. Experiments: Data collection
¨ Twitter data
n Twitter Streaming API
n Keyword-based (‘filter/track’) method
n CitizenHelper System [Karuna et al. ICWSM’17]
n Seed keywords (‘rape’ and ‘sexual assault’)
n Total collection: 5,434,784 tweets
¨ Myth-set
¤ Filter: {lie, lying, lied, liar, hoax, fake, false, fabricated, made up}
¤ 112,369 tweets
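The Myth-set filter above can be sketched as a keyword match over collected messages. The slides do not say how matching was done, so whole-word, case-insensitive matching and exact-duplicate removal are assumptions here, and the function names and sample messages are hypothetical.

```python
import re

# Filter terms listed on the slide for building the Myth-set.
MYTH_TERMS = ["lie", "lying", "lied", "liar", "hoax", "fake",
              "false", "fabricated", "made up"]

# Assumption: whole-word, case-insensitive matching (not stated in the slides).
MYTH_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in MYTH_TERMS) + r")\b",
    re.IGNORECASE,
)


def in_myth_set(text):
    """Return True if the message contains at least one filter term."""
    return MYTH_PATTERN.search(text) is not None


def unique_matches(messages):
    """Keep matching messages, dropping exact duplicates while preserving order."""
    seen = set()
    kept = []
    for msg in messages:
        key = msg.strip().lower()
        if in_myth_set(msg) and key not in seen:
            seen.add(key)
            kept.append(msg)
    return kept


# Hypothetical sample messages.
sample = [
    "that report was a HOAX",
    "nothing relevant here",
    "that report was a hoax",
    "the story was made up",
]
print(unique_matches(sample))
```

With whole-word matching, a word such as "believe" does not trigger the "lie" term, but inflections such as "lies" are also missed unless they are added to the list.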
19. Experiments: Annotation
¨ Data annotation for training
n 2500 randomly sampled unique messages from Myth-set
n Figure Eight crowdsourcing platform and student volunteers
n confidence_score > 66%
n final set: 1163 messages
¨ Label distribution
n Accusational: 46%
n Validational: 14%
n Sensational: 30%
n Other: 10%
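The confidence threshold used during annotation can be illustrated with a simple agreement filter. This is a simplified sketch: it uses the plain fraction of annotators who chose the majority label, whereas the platform's own confidence score is assumed to also account for annotator reliability, which is not modeled here. The message IDs and judgments are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.66  # slide: confidence_score > 66%


def aggregate(judgments):
    """Return (majority_label, agreement) for one message's list of labels."""
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


def confident_labels(annotations):
    """Keep only messages whose agreement exceeds the threshold."""
    kept = {}
    for msg_id, judgments in annotations.items():
        label, agreement = aggregate(judgments)
        if agreement > THRESHOLD:
            kept[msg_id] = label
    return kept


# Hypothetical judgments for three messages.
annotations = {
    "m1": ["Accusational", "Accusational", "Accusational"],
    "m2": ["Sensational", "Validational", "Other"],
    "m3": ["Sensational", "Sensational", "Other"],
}
print(confident_labels(annotations))
```

Messages without a sufficiently clear majority are dropped, which is one way a sample of 2500 messages can shrink to a smaller final training set.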
20. Experiments: Evaluation schemes
¨ Baseline classifiers
¤ [B1] BoW Features + Logistic Regression (LR) Model: Using Bag-of-Words features to train a logistic regression model
¤ [B2] word2vec Average Features + LR Model: Using averaged word2vec features to train a logistic regression model
¤ [B3] Random Initialized Embedding + CNN: Using randomly initialized embeddings and training a CNN
¤ [B4] word2vec Initialized Embedding + CNN (Softmax): Initializing the embeddings with word2vec and training a CNN
¤ [B5] CNN (Randomly Initialized) Codes + LR: Training a CNN with randomly initialized embeddings and using its outputs as feature codes to train an LR model
¨ Proposed classifier
¤ [P] CNN (word2vec Adapted) Codes + LR: Training a CNN with word2vec-initialized embeddings and using its outputs as feature codes to train an LR model
¨ Performance metrics: avg. accuracy and micro-F1 score across 10-fold CV
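The performance metrics can be made concrete with a small sketch of per-class F1, micro-F1, and a 10-fold split. The function names and toy labels are hypothetical, and the contiguous, unshuffled fold assignment is an assumption, since the slides do not describe how folds were drawn.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 score for each label, computed from per-class TP/FP/FN counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def micro_f1(y_true, y_pred):
    """Micro-F1 pools TP/FP/FN over classes; for single-label data it equals accuracy."""
    if not y_true:
        return 0.0
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - tp  # each error counts as one FP and one FN
    return 2 * tp / (2 * tp + 2 * errors)


def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Toy labels: A = Accusational, S = Sensational, V = Validational.
y_true = ["A", "A", "S", "V"]
y_pred = ["A", "S", "S", "V"]
print(per_class_f1(y_true, y_pred), micro_f1(y_true, y_pred))
```

In a cross-validated evaluation, the scores would be computed on each held-out fold and then averaged across the ten folds.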
22. Results: Malicious intent classification
Proposed approach is better due to the power of distributional semantics features!
23. Analysis: Prediction results
¨ Prediction on Myth-set
- 31,129 unique messages
¨ Observations
Ø Prevalence of “Accusational” intent messages
e.g., “crazy how a woman could lie &say a man raped her &people believe her url”
Ø ~1/3 of messages are “Sensational”, indicating that social media serves as a channel to propagate agendas
e.g., “bill clinton who is been impeached, disbarred, accused of rape, other sexual
misconduct, lying under oath is about tell”
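Prevalence figures like those above come down to counting predicted labels. A minimal sketch, assuming the classifier's output is available as a plain list of label strings; the function name and the example predictions are hypothetical and are not the study's data.

```python
from collections import Counter


def prevalence(predicted_labels):
    """Return each label's share of all predictions, most frequent first."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


# Hypothetical predictions for six messages.
predictions = ["Accusational"] * 3 + ["Sensational"] * 2 + ["Other"]
for label, share in prevalence(predictions).items():
    print(f"{label}: {share:.0%}")
```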
24. Analysis: Topical content
¨ Trained LDA model for topic modeling using MALLET
¨ Observations:
1. “Accusational” intent has the context of specific target groups
2. “Sensational” intent focuses on politics and current affairs
3. “Validational” intent has the context of verifying or validating the facts
Top topic words per intent:
Accusational
TOPIC 1: rape url raped lie false women lying case men girls time white made black rapes saint accused money rapist shit
TOPIC 2: man asaram police hoax victims found year allegation stone forced fined number fact revenge females delhi free media lot rolling
TOPIC 3: lied bapu filed assault allegations real accuser stop guilty report lives hate charge feminists lies attention assaulted derrick trial support
Sensational
TOPIC 1: url lie fake sexual lied steal made charges accused criminal trial called court taxes raping muslim defended abuse white real
TOPIC 2: rapes year kill donald corrupt cheat vote hoax muslims pedophile proven laughed benghazi war hate calls ass world shit
TOPIC 3: rape lying liar trump hillary raped women bill clinton assault victim false victims case murder racist media man support time
Validational
TOPIC 1: fake victims accused victim reported girl case fuck proven called white claim culture life making falsely understand reason stories rapists
TOPIC 2: liar claims stop thing police report accusation innocent raping evidence low guilty actual charges feel talking woman hard lies assume
TOPIC 3: lying women assault lied girls true ppl child bad problem year makes wrong good things call hate world calling assaults
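To show roughly how topic word lists like these are produced, the sketch below implements a minimal collapsed Gibbs sampler for LDA in plain Python. It is not MALLET, which the slides used, and it is not the authors' code. The hyperparameters `alpha` and `beta`, the iteration count, the function names, and the toy documents are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                       # n(d, k)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                                     # n(k)
    assign = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]


# Hypothetical toy documents (already tokenized).
docs = [
    ["court", "trial", "charges"],
    ["vote", "media", "election"],
    ["court", "charges", "evidence"],
    ["media", "vote", "election"],
]
topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word, n=2))
```

On a corpus this small the topics are not meaningful. The sketch only shows the mechanics of repeatedly reassigning each token to a topic and then reading off the highest-count words per topic.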
25. Analysis: Psycholinguistic
¨ Performed psychometric analysis using the popular Linguistic Inquiry and Word Count (LIWC) software
n ~2000 random samples of predicted posts for each intent type
¨ Observations:
1. Casual writing styles for “Accusational” and “Validational”
2. Validational intent messages express greater certainty in
the context of communication
3. Sensational intent messages use greater expression of
power and negative emotion
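Dictionary-based tools of this kind typically report, for each category, the percentage of a text's words that fall in that category's word list. The sketch below illustrates that idea only. The tiny lexicon is invented for the example and is not the LIWC dictionary, and the category names and the function `category_rates` are hypothetical.

```python
import re

# Toy lexicon for illustration only; it is NOT the LIWC dictionary.
TOY_LEXICON = {
    "certainty": {"always", "never", "definitely", "clearly"},
    "power": {"president", "court", "order", "control"},
    "negative_emotion": {"hate", "angry", "awful"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return the percentage of tokens that fall in each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in lexicon}
    return {
        cat: 100.0 * sum(tok in words for tok in tokens) / len(tokens)
        for cat, words in lexicon.items()
    }


rates = category_rates("They clearly never hate the court")
print(rates)
```

Averaging such rates over the sampled posts of each intent type would allow comparisons like those in the observations above, for example certainty across Validational messages versus the other types.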
27. Discussion: Lessons
¨ Novel adaptation of the framework of social construction theory
¤ Discovered three types of policy-affecting intent categories
¨ A scalable alternative for policy-relevant data collection
¤ Provide assistance to the policy analysts on gender-based violence
¤ Complement the existing costly, survey-driven methods
¤ Validate a novel design of intent classifier for short-text social posts
¨ Observed “Accusational” intent messages to be the most prevalent on social media
¤ These target the credibility of women and highlight the ‘advantaged’ status of accused men
28. Discussion: Limitations & future work
¨ Limitations
¤ Keyword dependent
¤ Fixed time duration
¤ Only English language
¨ Future Work
¤ Study the presence of intents in randomly collected public data
¤ Incorporate perceptual and cognitive features as seen in LIWC
¤ Compare policy-affecting intentional context across platforms
29. Conclusions
¨ Presented the first quantitative analysis of policy-affecting intent expressed on social media regarding rape and sexual assault, through a novel application of social construction theory
¨ Demonstrated a novel CNN-based policy-affecting malicious intent classifier with micro F1-score up to 97% for an intent class
¤ Representation of short text using external knowledge from word2vec embeddings
¤ Better features for efficient learning compared to all baselines
¨ Observed that “Accusational” and “Sensational” intents were most common on social media for rape myths & sexual assault, with a focus on targeted groups (e.g., gender/occupation)
30. Questions?
PAPER: https://arxiv.org/abs/1810.01012
CONTACT: rpandey4@gmu.edu, hpurohit@gmu.edu
Acknowledgement: Image sources, Human_Info_lab students as well as sponsors
Grant Support: IIS #1657379