LREC 2016 Workshop. Resources and Processing of Linguistic and Extra-Linguistic Data from People with Various Forms of Cognitive/Psychiatric Impairments (RaPID-2016)
Paper: https://www.researchgate.net/publication/304320133_Mining_Auditory_Hallucinations_from_Unsolicited_Twitter_Posts
Abstract:
Auditory hallucinations are common in people who experience psychosis and psychotic-like phenomena. This exploratory study aimed to establish the feasibility of harvesting and mining datasets from unsolicited Twitter posts to identify potential auditory hallucinations. To this end, several search queries were defined to collect posts from Twitter. A training sample was annotated by research psychologists for relatedness to auditory hallucinatory experiences and a text classifier was trained on that dataset to identify tweets related to auditory hallucinations. A number of features were used including sentiment polarity and mentions of specific semantic classes, such as fear expressions, communication tools and abusive language. We then used the classification model to generate a dataset with potential mentions of auditory hallucinatory experiences. A preliminary analysis of a dataset (N = 4957) revealed that posts linked to auditory hallucinations were associated with negative sentiments. In addition, such tweets had a higher proportionate distribution between the hours of 11pm and 5am in comparison to other tweets.
Mining auditory hallucinations from unsolicited Twitter posts
1. Mining auditory hallucinations from unsolicited Twitter posts
M. Belousov, M. Dinev, R. M. Morris, N. Berry, S. Bucci, G. Nenadic
University of Manchester
Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research
Portorož, May 2016
2. Mining auditory hallucinations from unsolicited Twitter posts
schizophrenia
hearing voices
mental
psychosis
symptom
sound
health
3. Mining auditory hallucinations from unsolicited Twitter posts
social network
brief message
fewer than 140 characters
310M active users
share opinions
spontaneous unforced
unasked-for
4. Mining auditory hallucinations from unsolicited Twitter posts
knowledge discovery
exploratory
patterns
unseen
data
analysis
5. Mining auditory hallucinations from unsolicited Twitter posts
schizophrenia
hearing voices
mental
psychosis
symptom
sound
health
knowledge discovery
patterns
unseen
social network
brief message
fewer than 140 characters
320M active users
share opinions
spontaneous unforced
unasked-for
7. Research aim
Q: Is it feasible to generate useful datasets from unsolicited Twitter posts regarding auditory hallucinatory experiences to support psychological investigations?
A: Classification model that can predict whether a given post is related to hallucinatory experiences.
8. Potentially related posts
✅ I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy
✅ If hallucinating is thought of as hearing voices that are not actually real, then these painkillers are causing me to hallucinate like mad
All Twitter posts were paraphrased to preserve anonymity
9. Unrelated posts
❌ My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
❌ So I was convinced I was hearing stuff. It was so funny because the noise was coming from the kitchen but I thought I was hallucinating
All Twitter posts were paraphrased to preserve anonymity
10. Iterative workflow
• Define search queries
• Collect unique posts from Twitter
• Annotate posts & explore data
• Predict relatedness of posts to hallucinatory experiences
• Analyse data
• Redefine search queries, then repeat
11. Data collection
Search query
hallucinating hearing
(“hear things” OR “hearing things”) “in my head”
hearing scary things “in my head”
(hear OR hearing) (“other people” OR “other ppls” OR “other ppl”) thoughts
(voice OR voices) (commenting OR criticising) (scary OR frightening OR “everything I do”)
(hear OR hearing) (voice OR voices) (god OR angel OR allah OR spirit OR soul OR “holy spirit” OR djinn OR jinn)
(hear OR hearing) (voice OR voices) (scary OR devil OR demon OR daemon OR evil OR “evil spirit”)
List of defined search queries for Twitter Search API
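Queries of this shape can be assembled from term lists rather than typed by hand. The following is a minimal sketch, not the authors' code: the helper names `quote`, `any_of` and `build_query` are hypothetical, and it assumes the usual Twitter search convention that a space between terms acts as an implicit AND.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as exact phrases."""
    return f'"{term}"' if " " in term else term


def any_of(*terms: str) -> str:
    """Build a parenthesised OR group, e.g. (hear OR hearing)."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(*parts: str) -> str:
    """Join groups and bare terms with spaces (assumed to mean AND)."""
    return " ".join(parts)


# Rebuild one of the queries listed on the slide from its term groups.
religious_voices = build_query(
    any_of("hear", "hearing"),
    any_of("voice", "voices"),
    any_of("god", "angel", "allah", "spirit", "soul", "holy spirit", "djinn", "jinn"),
)
print(religious_voices)
```

Keeping the term groups as plain lists makes the "redefine search queries" step of the workflow a matter of editing data rather than query strings.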
12. Data annotation
Data annotation process
• Two research psychologists manually annotated posts:
  • Assign classes: related or unrelated to hallucinations
  • Highlight specific phrases to explain their decisions
• The highlighted words and phrases were later used to identify characteristics of each classification category
RESULT: 401 annotated examples: 94 related to hallucinatory experiences
• The observed inter-annotator agreement (IAA) was 0.85 on 41 examples (10% of the final annotated set)
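The slide does not say which agreement measure was used. As an illustration only, the sketch below computes two common choices for a pair of annotators: raw observed agreement and Cohen's kappa, which corrects for chance agreement. The label lists are hypothetical and are not the study's data.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which both annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = observed_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both annotators independently pick the same label.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six posts (illustration only).
annotator_1 = ["related", "unrelated", "unrelated", "related", "unrelated", "unrelated"]
annotator_2 = ["related", "unrelated", "related", "related", "unrelated", "unrelated"]

print(round(observed_agreement(annotator_1, annotator_2), 3))
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

With imbalanced classes such as 94 related versus 307 unrelated posts, raw agreement can look high by chance alone, which is why a chance-corrected measure is often reported alongside it.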
13. Data exploration: semantic classes
• Relative (father, friend)
• Communication Tool (phone)
• Audio Device (headphones, TV)
• Drug (cannabis, painkillers)
• Audio Recording (voicemail)
• Possible Hallucination (seeing things, in my head)
• Audio & Visual Media, Apps (song, YouTube, Siri)
• Religious Term (prayer)
• Emotional Support (helpline)
• Own Voice Indicator (my voice, our own voice)
• Fear Expression (scared, creepy)
• Abusive Language (sh*t, hell)
• Stigmatising Language (crazy, insane)
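One simple way to turn such classes into features is dictionary lookup. The sketch below is an assumption about how mentions could be counted, not the authors' implementation: the lexicon holds only some of the example terms shown on the slide, and `count_mentions` is a hypothetical helper using whole-word, case-insensitive matching.

```python
import re

# Example terms copied from the slide; real lexicons would be much larger.
SEMANTIC_CLASSES = {
    "Relative": ["father", "friend"],
    "Communication Tool": ["phone"],
    "Audio Device": ["headphones", "tv"],
    "Drug": ["cannabis", "painkillers"],
    "Possible Hallucination": ["seeing things", "in my head"],
    "Fear Expression": ["scared", "creepy"],
    "Stigmatising Language": ["crazy", "insane"],
}


def count_mentions(text: str) -> dict:
    """Count whole-word, case-insensitive mentions of each semantic class."""
    counts = {}
    for name, terms in SEMANTIC_CLASSES.items():
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[name] = len(hits)
    return counts


print(count_mentions("So scared, the voice is in my head or maybe it is the TV. Crazy"))
```

Classes with no match are omitted from the result, so a downstream feature vector would need to fill in zeros for them.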
14. Text classification pipeline
raw (unstructured) text → Text Preprocessing → corrected text → Information Extraction → structured text → Classification → label
Raw text: Im hearing a scary voice rn,idk if it’s in my head or in TV..craazy
Corrected text: I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy
Structured text: fear expr. (scary), possible hallucination (in my head), audio device (television), stigmatising lang. (Crazy)
POS tags: O V V D A N R R O V V P / A L P D N & P N
Label: related to hallucinatory experience
POS tagset from Gimpel et al. (2011): O - personal pronoun, V - verb, D - determiner, etc.
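The preprocessing step can be pictured as a lookup-based normaliser. This is a rough sketch under strong assumptions: the table `NORMALISATION` is hypothetical and covers only the slide's example post, whereas a real system would need a proper spelling and abbreviation resource. It also does not restore spacing or capitalisation as the slide's corrected text does.

```python
import re

# Hypothetical lookup table covering only the example post on the slide.
NORMALISATION = {
    "im": "I am",
    "rn": "right now",
    "idk": "I don't know",
    "tv": "television",
    "craazy": "crazy",
}


def normalise(text: str) -> str:
    """Replace known abbreviations and misspellings, leaving other tokens as-is."""
    def expand(match):
        word = match.group(0)
        return NORMALISATION.get(word.lower(), word)

    # Words are runs of letters and apostrophes; punctuation is left untouched.
    return re.sub(r"[A-Za-z']+", expand, text)


raw = "Im hearing a scary voice rn,idk if it's in my head or in TV..craazy"
print(normalise(raw))
```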
15. Information extraction
My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
Extracted features: Neg. sentiment, Relative [1], NE (person) [1], NE (misc) [1], POS Tags
* Stanford NER using 4-class model trained on the CoNLL 2003 data
16. Information extraction
My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
Extracted features: Neg. sentiment, Relative [1], NE (person) [1], NE (misc) [1], POS Tags
Key phrase extraction: hear this weird high-pitched voice
Key phrase features: Neg. sentiment, Weird / strange [1], POS Tags (V D A A N)
* Stanford NER using 4-class model trained on the CoNLL 2003 data
17. Groups of features
Feature group | Features
Mentions of semantic classes | mentions of each semantic class
Key phrases | sentiment polarity, sem. classes, POS tags
Part-of-speech tags | nouns, verbs, adjectives, etc.
Sentiment polarity | positive, negative or neutral
Popularity of the post | likes, retweets
Use of nonstandard language | spelling mistakes, abbreviations
Number of Twitter entities | URLs, #hashtags, @mentions
Named entities | persons, locations, organisations
Lexical distribution | sentences, words, characters
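Two of these groups, the number of Twitter entities and the lexical distribution, need nothing more than pattern matching. The sketch below is an illustration with hypothetical names (`surface_features` and the three regular expressions); the sentence splitter is deliberately naive and the patterns are simplifications of how Twitter itself recognises entities.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def surface_features(text: str) -> dict:
    """Count Twitter entities plus sentences, words and characters in a post."""
    # Strip URLs first so '#' fragments inside links are not counted as hashtags.
    without_urls = URL_RE.sub(" ", text)
    sentences = [s for s in re.split(r"[.!?]+", without_urls) if s.strip()]
    return {
        "urls": len(URL_RE.findall(text)),
        "hashtags": len(HASHTAG_RE.findall(without_urls)),
        "mentions": len(MENTION_RE.findall(without_urls)),
        "sentences": len(sentences),
        "words": len(without_urls.split()),  # whitespace-separated tokens
        "characters": len(text),
    }


post = "Hearing voices again tonight #insomnia. Anyone else @friend? https://example.com/a"
print(surface_features(post))
```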
18. Classification scenario
• 401 labelled examples: 94 related; 307 unrelated
• Three different types of classification methods:
  • Naive Bayes (probabilistic model)
  • Support Vector Machine (geometric model)
  • AdaBoost (boosting of the tree model)
• Compare performance with simple baseline: tf-idf features
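For readers unfamiliar with the baseline, tf-idf weights a term by how often it occurs in a post and how rare it is across all posts. The slide does not state which tf-idf variant was used; the sketch below assumes the plain form tf × log(N / df) with whitespace tokenisation, and the three example documents are made up.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical posts (illustration only).
docs = [
    "hearing voices in my head",
    "hearing the tv next door",
    "voices on the radio",
]
vectors = tfidf(docs)
print(vectors[0])
```

In the first document, "head" gets a higher weight than "hearing" because "hearing" also appears in another document.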
19. Evaluation
Classification performance of various classification methods on two different sets of features
Based on ten experiments of stratified 10-fold cross validation
[Bar chart: F2-score (axis 0 to 0.9) for NB, SVM and AdaBoost with proposed features and baseline features. Bar values as extracted: 0.711, 0.751, 0.486, 0.772, 0.743, 0.831. 🏆 marks the best result.]
The baseline features outperform the proposed features only with SVM, and the difference is not significant (p-value = 0.375)
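The F2-score used here is the F-beta measure with beta = 2, which weights recall more heavily than precision. A minimal sketch of the computation follows; the function name, the `positive` parameter and the label lists are hypothetical and are not the study's data.

```python
def fbeta_score(y_true, y_pred, beta=2.0, positive="related"):
    """F-beta for one positive class; beta > 1 favours recall over precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical gold labels and predictions for ten posts.
y_true = ["related"] * 4 + ["unrelated"] * 6
y_pred = ["related"] * 3 + ["unrelated"] + ["related"] * 2 + ["unrelated"] * 4

f2 = fbeta_score(y_true, y_pred, beta=2.0)
f1 = fbeta_score(y_true, y_pred, beta=1.0)
print(round(f2, 3), round(f1, 3))
```

In this example recall (0.75) is higher than precision (0.6), so F2 comes out above F1. Favouring recall suits a screening setting in which missing a related post costs more than reviewing an unrelated one.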
20. Contribution of features
Features | F2-score
Mentions of semantic classes * | 0.769 ▼
Key phrases * | 0.788 ▼
Part-of-speech tags | 0.817 ▼
Sentiment polarity * | 0.818 ▼
Popularity of the post | 0.828 ▼
Use of nonstandard language | 0.831 ▬
Number of Twitter entities | 0.832 ▲
Named entities | 0.832 ▲
Lexical distribution | 0.833 ▲
All features | 0.831 ▲
* Statistically significant differences are marked with an asterisk
21. Error analysis (highlights)
Text | Predicted | Actual
I do not hear voices, I am not paranoid | ✅ Related | ❌ Unrelated
I’m hallucinating I’m hearing hawks! Oh hang on, it is just the television | ✅ Related | ❌ Unrelated
The voices which I hear every night tell me to do it | ❌ Unrelated | ✅ Related
All Twitter posts were paraphrased to preserve anonymity
22. Generating dataset for analysis
1. Take best-performing classification model
2. Predict relatedness for unlabelled examples
3. Combine with 401 labelled (annotated) examples
RESULT: 4957 examples: 546 potentially related to hallucinatory experiences *
* e.g. the national survey by Wiles et al. (2006) identified only 62 cases
24. Preliminary data analysis
[Bar chart: Related vs Unrelated posts, percentage axis 0 to 100. Values as extracted: 72%, 19%, 28%, 81%.]
• Negative sentiments were significantly associated with posts that indicated the occurrence of auditory hallucinations
• Posts linked to auditory hallucinations were proportionally more frequent than other posts between the hours of 11pm and 5am
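The posting-time comparison amounts to computing, for each group of posts, the share that falls in the 11pm to 5am window. The sketch below shows that calculation with hypothetical timestamps. It assumes the timestamps are already in the poster's local time, which the slides do not confirm, and it treats 11pm as inclusive and 5am as exclusive.

```python
from datetime import datetime


def is_night(ts: datetime) -> bool:
    """True for posts made from 11pm (inclusive) to 5am (exclusive)."""
    return ts.hour >= 23 or ts.hour < 5


def night_share(timestamps) -> float:
    """Fraction of posts that fall in the night-time window."""
    timestamps = list(timestamps)
    if not timestamps:
        return 0.0
    return sum(is_night(t) for t in timestamps) / len(timestamps)


# Hypothetical posting times (illustration only).
related_posts = [
    datetime(2016, 3, 1, 23, 30),
    datetime(2016, 3, 2, 2, 15),
    datetime(2016, 3, 2, 5, 0),
    datetime(2016, 3, 2, 14, 0),
]
print(night_share(related_posts))
```

Computing `night_share` separately for related and unrelated posts gives the two proportions being compared.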
25. Summary
• Experimental methodology to harvest and mine datasets from unsolicited Twitter posts to identify potential psychotic(-like) experiences
• Classification model that can relatively accurately predict the relatedness of posts to auditory hallucinations
• Preliminary data analysis that identified interesting patterns in sentiment polarity and posting time
• Future research: investigate expressions of sleep in Twitter users who report a diagnosis of a psychosis-related disorder
26. Questions?
Acknowledgements
Centre for Doctoral Training, School of Computer Science, University of Manchester
Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research
School of Psychological Sciences, University of Manchester