Screening Twitter Users for Depression and PTSD
1. Screening Twitter Users for
Depression and PTSD with
Lexical Decision Lists
Ted Pedersen
University of Minnesota, Duluth
tpederse@d.umn.edu
2. Motivations
● Interesting classification task
● Even more interesting to identify vocabulary
that indicates depression or PTSD
● Or tendency to self-report?
● Focused on decision lists, a simple machine
learning method that learns a
human-interpretable model
3. Decision Lists
● All tweets for each user kept on a single line (to avoid splitting)
● Text is lowercased; anything not alphanumeric is removed
● Lines (one per user) randomly shuffled
● Ngram features learned from the first 8 million words of training data
for each condition
● Ngrams may be bigrams or any length 1-6
● Ngrams made up of stopwords removed (or not)
● Ngrams weighted by frequency (or binary weighting)
● Eight different decision lists learned, one per combination of these choices
● System2 most accurate: Ngrams 1-6, stopwords removed, and binary weighting
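The preprocessing and Ngram steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the system's actual code: the stoplist is a hypothetical placeholder, and replacing each run of non-alphanumeric characters with a space (rather than deleting it) is an assumption, chosen because features such as "don t" and "http t co" in the results suggest punctuation became a word boundary. The function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stoplist for illustration only; the slides do not list the
# stopwords that were actually used.
STOPWORDS = {"a", "and", "i", "it", "of", "the", "to", "you"}


def normalize(text):
    """Lowercase, turn every run of non-alphanumeric characters into a
    single space (assumption), and split on whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def ngrams(tokens, min_n=1, max_n=6, stoplist=None):
    """Yield every Ngram of length min_n..max_n as a space-joined string.
    With a stoplist, skip Ngrams made up only of stopwords."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if stoplist is not None and all(t in stoplist for t in gram):
                continue
            yield " ".join(gram)


def count_ngrams(user_lines, word_limit=8_000_000, **ngram_options):
    """Count Ngrams over the first `word_limit` words of one condition's
    training data. Each element of user_lines holds all tweets for one user."""
    counts = Counter()
    remaining = word_limit
    for line in user_lines:
        if remaining <= 0:
            break
        tokens = normalize(line)[:remaining]  # stop exactly at the word limit
        remaining -= len(tokens)
        counts.update(ngrams(tokens, **ngram_options))
    return counts
```

For example, `normalize("Don't stop! http://t.co/x")` returns `['don', 't', 'stop', 'http', 't', 'co', 'x']`, which is how Ngrams like "don t" and "http t co" can arise.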
4. Decision Lists
● Any Ngram that meets the previous three conditions
and occurs at least 50 times more often in one
condition than the other is selected as a feature
● Since each comparison is binary (DvC, PvC, DvP),
an Ngram's frequency in one condition counts as
positive and in the other as negative
● Ngrams that occur about the same number of
times in both conditions are not especially
indicative or interesting
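Feature selection can be sketched as a comparison of per-condition Ngram counts. The slide does not say whether "50 times more often" means a count difference or a ratio; this sketch assumes a raw count difference of at least 50 (the `min_diff` parameter). The function name and the A/B condition labels are hypothetical.

```python
from collections import Counter


def build_decision_list(counts_a, counts_b, min_diff=50, binary=True):
    """Select Ngrams whose counts differ by at least min_diff between two
    conditions (A vs B). Assumption: "at least 50 times more often" is read
    as a raw count difference, not a ratio.

    Weights are positive for condition A and negative for condition B:
    +1/-1 with binary weighting, the signed count difference otherwise.
    counts_a and counts_b are Counters, so missing Ngrams count as 0.
    """
    decision_list = {}
    for gram in set(counts_a) | set(counts_b):
        diff = counts_a[gram] - counts_b[gram]
        if abs(diff) >= min_diff:
            decision_list[gram] = (1 if diff > 0 else -1) if binary else diff
    return decision_list
```

With counts of 120 vs. 20 for "please" and 10 vs. 90 for "lol", both are selected (+1 and -1 with binary weights), while an Ngram seen 30 vs. 25 times is dropped. If a ratio was intended, only the `abs(diff) >= min_diff` test would change.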
5. Running Decision List
● For each Ngram in a tweet, check whether it
is in the decision list
● If using frequency weighting, add the Ngram's value
(positive or negative) to an overall score
● If using binary weighting, add 1 or -1 to the
overall score
● Do this for all of a user's tweets; if the overall
score is > 0, assign one class, if <= 0, the other
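The scoring procedure above is a sum over every Ngram occurrence in a user's tweets. A minimal sketch follows; it assumes the decision list is a dict mapping Ngram strings to weights (+1/-1 for binary weighting, the signed frequency value otherwise), and `classify_user` is a hypothetical name. The normalization helpers are repeated so the block runs on its own.

```python
import re


def normalize(text):
    """Lowercase and replace runs of non-alphanumeric characters with spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def ngrams(tokens, min_n=1, max_n=6):
    """Yield every Ngram of length min_n..max_n as a space-joined string."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def classify_user(tweets, decision_list, positive="A", negative="B"):
    """Sum the weights of every Ngram occurrence, across all of a user's
    tweets, that appears in the decision list. A score > 0 gives the
    positive class; a score <= 0 gives the negative class."""
    score = 0
    for tweet in tweets:
        for gram in ngrams(normalize(tweet)):
            score += decision_list.get(gram, 0)  # Ngrams not in the list add 0
    label = positive if score > 0 else negative
    return label, score
```

The same function covers both weighting schemes, since only the stored weights differ. A tie (score exactly 0) falls to the second class, as on the slide.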
6. Decision List
● Decision lists often make a classification based on
the single most indicative feature found
● Elected instead to use all features found in a user's
tweets to provide a more nuanced decision
● System2 decision list has
● 18,617 features (DvC)
● 21,145 features (DvP)
● 17,936 features (PvC)
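To make the contrast concrete, here is a hedged sketch of a classic first-match decision list next to the all-features sum used here. Ranking by absolute count difference is an assumption for illustration; classic decision lists rank rules by some strength score, often a log-likelihood ratio. The function names are hypothetical.

```python
def first_match(grams, weights, positive="A", negative="B"):
    """Classic decision list: rank features by |weight| and let the single
    strongest feature present decide. Returns None if no feature is present."""
    present = [g for g in grams if g in weights]
    if not present:
        return None
    strongest = max(present, key=lambda g: abs(weights[g]))
    return positive if weights[strongest] > 0 else negative


def sum_all(grams, weights, binary=True, positive="A", negative="B"):
    """All-features approach: every feature present contributes, either
    +1/-1 (binary) or its signed weight (frequency)."""
    score = 0
    for g in grams:
        if g in weights:
            w = weights[g]
            score += (1 if w > 0 else -1) if binary else w
    return positive if score > 0 else negative
```

With weights {please: 100, lol: -30, gt: -40} and a user whose tweets contain all three, first-match decides on "please" (A), the binary sum gives 1 - 1 - 1 = -1 (B), and the frequency sum gives 30 (A). Cases like this are where using all features can change the decision.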
7. Results?
DvP DvC PvC
System2 .769 .736 .720
System1 .760 .731 .721
Random .471 .492 .489
● System2 and System1 are identical except that
System2 uses a stoplist while System1 does not
● Both use Ngrams 1-6 and binary weighting
8. Top 10 Features
● DvC
● Depression : ud83c, please, love, follow, ufe0f, re, f*cking, love you, im, udf38
● Control : http, http t co, http t, co, t co, ud83d, lol, u2764 u2764 -, u2764 u2764 u2764, u2764 u2764 u2764 u2764
● PvC
● PTSD : u2026, co, t co, u043e, u0430, u0435, thank, thank you, please, u0438
● Control : ud83d, rt, ude02, ud83d ude02, gt, u2764 -, lol, u201c, ude02 ud83d -, ud83d ude02 ud83d
● DvP
● Depression : ud83d, ud83c, rt, love, ude02, ud83d ude02, im, follow, don t, don, love you
● PTSD : co, t co, http -, http t, http t co, u2026, amp, news, thanks, answer
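Tokens such as ud83d, ude02, u2764, and ufe0f look like \uXXXX escape codes whose backslash was stripped by the alphanumeric filter. The slides do not describe the raw tweet encoding, so the sketch below assumes the text was stored with JSON-style ASCII escapes (the default of Python's `json.dumps`) before lowercasing. Under that assumption, an emoji outside the Basic Multilingual Plane becomes two UTF-16 surrogate tokens, which `decode_pair` (a hypothetical helper) maps back to a character.

```python
import json
import re


def escaped_tokens(text):
    r"""Assumption: tweets were serialized with \uXXXX escapes (as json.dumps
    does by default) before lowercasing and the alphanumeric filter."""
    escaped = json.dumps(text)  # ensure_ascii=True: non-ASCII -> \uXXXX
    return re.sub(r"[^a-z0-9]+", " ", escaped.lower()).split()


def decode_pair(high, low):
    """Rebuild the character behind two surrogate tokens like 'ud83d', 'ude02'."""
    hi = int(high[1:], 16) - 0xD800  # top 10 bits of (code point - 0x10000)
    lo = int(low[1:], 16) - 0xDC00   # bottom 10 bits
    return chr(0x10000 + (hi << 10) + lo)
```

For instance, `decode_pair("ud83d", "ude02")` gives U+1F602 (face with tears of joy) and `decode_pair("ud83c", "udf38")` gives U+1F338 (cherry blossom). Read the same way, u2764 is U+2764 (heavy black heart), ufe0f is a variation selector, u2026 is a horizontal ellipsis, and u0430, u0435, u043e, u0438 are Cyrillic letters.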
9. Lessons
● Standard machine learning algorithms can
perform well at this task
● Even very simple ones like our decision lists
● Emoticons and emoji are often strong indicators
● Ngrams of varying length combined with binary
weights attained the best results
● Frequency weighting performed very poorly
● Stoplist has minimal impact
10. Discussion
● How typical is it to self-report depression or PTSD?
● Is desire to self-report an indicator of something else?
● Do untreated / undiagnosed users look different?
● How common are these conditions?
● PTSD : 7-8% (www.ptsd.va.gov)
● Depression : 17% (www.adaa.org)
● Typical to have multiple diagnoses
● PTSD + Depression
● Anxiety + Depression
11. A case of self-reporting
Which is worse, cancer or depression? The answer
is clear. Depression is worse: depression makes
you want to die and cancer doesn’t.
I’ve spent all my adult life with depression lurking. I
haven’t mentioned it to very many people at all. For
the first ten years I talked about it to nobody at all,
for the next decade only Gill and therapists ...
12. Adam Kilgarriff
● Posted to blog May 3, 2015. Died
May 16 at age 55.
● https://blog.kilgarriff.co.uk/?p=101