Mining Text, Survey, Twitter & RSS Data 
Using DiscoverText
Dr. Stuart W. Shulman
Founder & CEO, Texifter
Emergent properties found in very well‐read texts, 
such as the character type “extremist agent of the law”
My Roots as a Coder
Dissertation Data (1997‐1999)
Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education
Exploration
Speculation
Coding
Validation
Circa 1999
May 2001
Council for Excellence in Government
June 2002
National Defense University
Purist
A Spectrum of Methods Approaches
deep immersion
closeness to data
antipathy to numbers
credible interpretation
in‐depth analysis
contextual
subjective
experimental 
mixed method
adaptive hybrid
flexible approach
interdisciplinary
open minded
quantitative
focus on error
measurement critical
validity and reliability
replication & objectivity
generalization
hypotheses
Positivist
Pluralist
An Important Book in the Journey
Other Very Important Books
Text Classification
A problem for 2.5 millennia
Plato argued it is frustrating
Software cannot remove the problem
It can expose problems humans must fix
Grimmer & Stewart 
“Text as Data” Political Analysis (2013)
Volume is a problem for scholars
Coders are expensive
Groups struggle to accurately label text at scale
Validation of both humans and machines is “essential”
Some models are easier to validate than others
All models are wrong
Automated models enhance/amplify, but don’t replace humans
There is no one right way to do this
“Validate, validate, validate”
“What should be avoided then, is the blind use of 
any method without a validation step.” 
Computer Science & NSF Influence:
Measure Everything!
How fast?
How reliable?
How accurate?
Valid?
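The accuracy question can be made concrete by scoring machine labels against human labels. The following is a minimal sketch, assuming a set of items that humans have already coded; the function name `validation_report` and the example labels are hypothetical and are not taken from DiscoverText.

```python
from collections import Counter

def validation_report(gold, predicted, positive):
    """Compare machine labels with human 'gold' labels for one target class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            counts["tp"] += 1   # machine and human both say positive
        elif p == positive:
            counts["fp"] += 1   # machine says positive, human disagrees
        elif g == positive:
            counts["fn"] += 1   # machine missed a positive item
        else:
            counts["tn"] += 1   # both say not positive
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = len(gold)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical example data: five items coded by a human and by a classifier.
gold = ["relevant", "relevant", "irrelevant", "irrelevant", "relevant"]
machine = ["relevant", "irrelevant", "irrelevant", "relevant", "relevant"]
report = validation_report(gold, machine, positive="relevant")
print(report)
```

Reporting precision and recall alongside accuracy matters when one class is rare, because a classifier can score high accuracy while missing most of the rare class.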
Labeling, Tagging, or Annotation
Improves Machine Learning Over Time
A Labeling Interface Built for Speed
Redacted
Redacted
Samples for Human Coding 
& Machine Classification 
Inter‐Rater Reliability is Only One Factor
Understanding the landscape of human interpretation better 
prepares us to face the challenge of machine classification
Fleiss’ Kappa: The Level of 
Agreement Beyond Chance
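For reference, Fleiss' kappa is (P_observed - P_expected) / (1 - P_expected), where P_observed is the mean pairwise agreement per item and P_expected is the agreement expected from the overall category proportions. A minimal sketch of the standard formula follows, assuming every item is coded by the same number of coders; the function name and the example labels are illustrative only.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """ratings: list of items, each a list with one label per coder.

    Every item must have the same number of coders (at least two).
    """
    if not ratings:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number of coders (>= 2)")

    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of coder pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / len(ratings)
    total = len(ratings) * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (observed - expected) / (1 - expected)

# Hypothetical example: four items, three coders each.
ratings = [
    ["pos", "pos", "pos"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "neg", "neg"],
]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3))
```

A value of 1 means perfect agreement, and a value near 0 means agreement is no better than chance.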
Adjudicate Coder Disagreement
Enhanced Machine Learning
“CoderRank is to text analytics what PageRank was to 
search. Just as Google said not all web pages are 
created equal, Texifter argues that not all humans are 
created equal. When training machines, it is best to 
rely most on the humans most likely to create a valid 
observation. We proposed a unique way to rank 
humans on trust and knowledge vectors.”
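The slides do not describe how CoderRank is computed, so the sketch below is not Texifter's algorithm. It illustrates only the general idea of relying most on the most trustworthy coders, under a simple assumption: a coder's trust is their agreement rate with adjudicated items, and training labels are chosen by trust-weighted vote. All names and data are hypothetical.

```python
from collections import defaultdict

def coder_trust(codings, gold):
    """codings: {coder: {item_id: label}}; gold: {item_id: adjudicated label}.

    Returns each coder's agreement rate with the adjudicated labels.
    """
    trust = {}
    for coder, labels in codings.items():
        scored = [item for item in labels if item in gold]
        hits = sum(1 for item in scored if labels[item] == gold[item])
        trust[coder] = hits / len(scored) if scored else 0.0
    return trust

def weighted_label(item_id, codings, trust):
    """Pick the label with the largest total trust among coders who coded the item."""
    votes = defaultdict(float)
    for coder, labels in codings.items():
        if item_id in labels:
            votes[labels[item_id]] += trust[coder]
    return max(votes, key=votes.get) if votes else None

# Hypothetical example: items 1-3 have been adjudicated, item 4 has not.
codings = {
    "ann": {1: "politics", 2: "football", 3: "politics", 4: "football"},
    "bob": {1: "football", 2: "football", 3: "football", 4: "politics"},
    "cy":  {1: "politics", 2: "politics", 3: "football", 4: "politics"},
}
gold = {1: "politics", 2: "football", 3: "politics"}

trust = coder_trust(codings, gold)
label = weighted_label(4, codings, trust)
print(trust, label)
```

In this example a simple majority would label item 4 "politics", while the trust-weighted vote follows the one coder who matched every adjudicated item.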
CoderRank 1
CoderRank 2
CoderRank 3
CoderRank 4
CoderRank 5
CoderRank 6
CoderRank 7
CoderRank 8
CoderRank 9
Iterate Human & Machine Learning
Word Sense Disambiguation (Relevance)
“Patriots” Football Versus Politics
New Work on Fake News Detection
•A free and open source software option (CAT)
•Web‐based crowd source collaborative tools
•Measurement innovations 
•Free real time Twitter data collection
•Random sampling and keystroke coding
•Advanced search and filtering
•Deduplication and clustering algorithms 
•Custom machine‐learning classifiers
•Word sense disambiguation
•CoderRank for enhanced machine learning
What Can CAT & DiscoverText Contribute?
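Two of the listed steps, deduplication and random sampling, can be sketched in a few lines. This is a generic illustration using only the Python standard library, not the DiscoverText implementation. It assumes exact-match deduplication after light normalization (near-duplicate clustering would need a similarity measure instead), and the example posts are invented.

```python
import hashlib
import random
import re

def normalize(text):
    """Lowercase, drop URLs and punctuation, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def deduplicate(texts):
    """Keep the first occurrence of each text, compared after normalization."""
    seen = set()
    unique = []
    for text in texts:
        key = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

def draw_sample(texts, size, seed=0):
    """Draw a reproducible simple random sample for human coding."""
    rng = random.Random(seed)
    return rng.sample(texts, min(size, len(texts)))

# Hypothetical example posts.
tweets = [
    "Go Patriots! http://example.com/a",
    "go patriots!!",
    "Patriots rally at the statehouse",
    "Patriots rally at the statehouse.",
    "Unrelated post about soil fertility",
]
unique = deduplicate(tweets)
sample = draw_sample(unique, size=2, seed=42)
print(len(unique), sample)
```

Fixing the seed makes the sample repeatable, so a second team can code the same items when checking reliability.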
Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Editor Emeritus, Journal of Information Technology & Politics
Contact Information
Email: stu@texifter.com
Twitter: @stuartwshulman
Thanks for Listening!
