Screening Twitter Users for
Depression and PTSD with
Lexical Decision Lists
Ted Pedersen
University of Minnesota, Duluth
tpederse@d.umn.edu
Motivations
● Interesting classification task
● Even more interesting to identify vocabulary
that indicates depression or PTSD
● Or tendency to self-report?
● Focused on decision lists, a simple machine
learning method that learns a human
interpretable model
Decision Lists
● All tweets for each user kept on single line (to avoid splitting)
● Text is lowercased, anything not alpha-numeric is removed
● Randomly shuffled
● Ngram features learned from first 8 million words in training data
for each condition
● Ngrams may be bigrams or any length 1-6
● Ngrams made up of stopwords removed (or not)
● Ngrams weighted by frequency (or binary)
● Eight different decision lists learned
● System2 most accurate : Ngrams 1-6, stoplist, and binary weighting
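A minimal Python sketch of the preprocessing and Ngram extraction steps listed above. It is an illustration, not the code used for the Duluth systems. The function names are hypothetical, and replacing non-alphanumeric characters with a space (rather than deleting them) is an assumption suggested by features such as "http t co" and "don t".

```python
import re

def normalize(text):
    """Lowercase, then turn each run of non-alphanumeric characters into one space.
    Assumption: ASCII letters and digits only, matching the escaped features on the slides."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def ngrams(tokens, min_n=1, max_n=6):
    """Yield every Ngram of length min_n..max_n as a space-joined string."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def is_all_stopwords(ngram, stoplist):
    """True when every token of the Ngram is in the stoplist (hypothetical helper)."""
    return all(tok in stoplist for tok in ngram.split())

if __name__ == "__main__":
    tokens = normalize("Don't miss http://t.co/x").split()
    print(list(ngrams(tokens, 1, 2)))
```

Setting min_n and max_n both to 2 gives a bigram-only variant, and the stoplist filter can simply be skipped for the variants that keep stopwords.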
Decision Lists
● Any Ngram that meets previous three conditions
and occurs at least 50 times more often in one
condition than the other is selected as a feature
● Since conditions are binary (DvC, PvC, DvP)
frequency in one condition is positive while the
other is negative
● Ngrams that occur about the same number of
times in both conditions not especially indicative
or interesting
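The selection rule above can be sketched as follows. This is a hedged illustration: it reads "at least 50 times more often" as a difference of 50 or more in raw counts, which is an assumption (a ratio reading is also possible), and it uses the signed count difference as the frequency weight, which is also an assumption. The function and parameter names are hypothetical.

```python
from collections import Counter

def build_decision_list(counts_a, counts_b, min_diff=50, binary=True):
    """Return {ngram: weight}. Positive weights favour condition A, negative favour B.

    counts_a and counts_b are Counters of Ngram frequencies for the two conditions.
    Assumption: the threshold applies to the absolute difference in counts.
    """
    decision_list = {}
    for ngram in set(counts_a) | set(counts_b):
        diff = counts_a[ngram] - counts_b[ngram]  # a Counter returns 0 for unseen Ngrams
        if abs(diff) >= min_diff:
            if binary:
                decision_list[ngram] = 1 if diff > 0 else -1
            else:
                decision_list[ngram] = diff  # assumed form of the frequency weight
    return decision_list

if __name__ == "__main__":
    a = Counter({"love": 120, "lol": 10, "the": 500})
    b = Counter({"love": 20, "lol": 90, "the": 480})
    print(build_decision_list(a, b))
```

In the toy counts, "the" occurs about equally often in both conditions, so it falls below the threshold and is left out of the list.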
Running Decision List
● For each Ngram in tweet, check to see if it
is in decision list
● If using frequency weight, add value (positive
or negative) of the Ngram to an overall score
● If using binary weight, add 1 or -1 to overall
score
● Do this for all tweets for a user, if overall
score > 0 then one class, <= 0 the other
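A small sketch of the scoring step described above, assuming the decision list is a dictionary from Ngram to signed weight (1 or -1 for binary weighting, a signed frequency otherwise). The names classify_user, positive, and negative are hypothetical, and the text is assumed to be already lowercased and stripped of non-alphanumerics.

```python
def ngrams(tokens, max_n=6):
    """Yield every Ngram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def classify_user(text, decision_list, positive="A", negative="B"):
    """Sum the weights of all Ngrams found in the decision list.

    A score above 0 gives the positive class; a score of 0 or below gives the other.
    """
    score = sum(decision_list.get(g, 0) for g in ngrams(text.split()))
    label = positive if score > 0 else negative
    return label, score

if __name__ == "__main__":
    toy_list = {"love": 1, "love you": 1, "lol": -1}
    print(classify_user("i love you lol", toy_list))
```

Because every matching Ngram contributes, overlapping features such as "love" and "love you" both count toward the score, and a user with no matching features gets a score of 0 and so falls into the second class.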
Decision List
● Decision lists often make a classification after
finding the most indicative feature
● Elected to use all features found in user tweets
to provide more nuanced decision
● System2 decision list has
● 18,617 features (DvC)
● 21,145 features (DvP)
● 17,936 features (PvC)
Results?
          DvP   DvC   PvC
System2  .769  .736  .720
System1  .760  .731  .721
Random   .471  .492  .489
● System2 and System1 are identical except
that 2 uses a stoplist while 1 does not
● Both use Ngrams 1-6 and binary weighting
Top 10 Features
● DvC
● Depression : ud83c, please, love, follow, ufe0f, re, f*cking, love you, im, udf38
● Control : http, http t co, http t, co, t co, ud83d, lol, u2764 u2764 -, u2764 u2764
u2764, u2764 u2764 u2764 u2764
● PvC
● PTSD : u2026, co, t co, u043e, u0430, u0435, thank, thank you, please, u0438
● Control : ud83d, rt, ude02, ud83d ude02, gt, u2764 -, lol, u201c, ude02 ud83d -,
ud83d ude02 ud83d
● DvP
● Depression : ud83d, ud83c, rt, love, ude02, ud83d ude02, im, follow, don t, don,
love you
● PTSD : co, t co, http -, http t, http t co, u2026, amp, news, thanks, answer
Lessons
● Standard machine learning algorithms can
perform well at this task
● Even very simple ones like our decision lists
● Emoticons and Emoji are often strong indicators
● Ngrams of varying length combined with binary
weights attained best results
● Frequency weighting very poor
● Stoplist has minimal impact
Discussion
● How typical is it to self-report depression or PTSD?
● Is desire to self-report an indicator of something else?
● Do untreated / undiagnosed users look different?
● How common are these conditions?
● PTSD : 7-8% (www.ptsd.va.gov)
● Depression : 17% (www.adaa.org)
● Typical to have multiple diagnoses
● PTSD + Depression
● Anxiety + Depression
A case of self-reporting
Which is worse, cancer or depression? The answer
is clear. Depression is worse: depression makes
you want to die and cancer doesn’t.
I’ve spent all my adult life with depression lurking. I
haven’t mentioned it to very many people at all. For
the first ten years I talked about it to nobody at all,
for the next decade only Gill and therapists ...
Adam Kilgarriff
● Posted to blog May 3, 2015. Died
May 16 at age 55.
● https://blog.kilgarriff.co.uk/?p=101
