Slides delivered as a part of #CAQDAS14.
In 1989 the Department of Sociology at the University of Surrey convened the world's first conference on qualitative software, which brought together qualitative methodologists and software developers to debate the pros and cons of using technology for qualitative data analysis. The result was a book (Fielding & Lee (1991) Using Computers in Qualitative Research, Sage Publications), the establishment of the CAQDAS Networking Project, and many further conferences on the topic over the years.
This conference will be another opportunity for methodologists, developers and researchers to come together and debate the issues. There will be keynote papers by leading experts in the field, software support clinics and opportunities to present work in progress.
http://www.surrey.ac.uk/sociology/files/Programme%20.pdf
Measuring reliability and validity in human coding and machine classification
1. Measuring Reliability and Validity in Human Coding and Machine Classification
Dr. Stuart Shulman
May 2, 2014
CAQDAS Conference 2014
“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
2.
3. Acknowledgements
• This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
– EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
– EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
– SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
– IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
– SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
– IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
5. Qualitative Methods: Genes, Taste, or Tactic?
• Qualitative by birth or choice?
– Some look to words as an alternative to number crunching
– Others rooted in rich and meaningful interpretive traditions
• Another group is fluent in both qual & quant
– Mixed methods open up rather than limit fields of knowledge
• One central goal is valid inferences about phenomena
– Replicable and transparent methods
– Attention to error and corrective measures
– Internal and external validation of results
• Using computers for qualitative data analysis helps, but…
– Rigor still originates with the research design, not the technology
– Software makes better organization and efficiency possible
– Coders enable the researcher to step back while scaling up
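The slide's "attention to error" in human coding is often made concrete with an inter-coder agreement statistic. The following is a minimal sketch, assuming two coders have labeled the same items; it uses Cohen's kappa, which the slides do not name, and the labels and the function name `cohen_kappa` are hypothetical illustrations rather than the author's method or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each coder's marginal code frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders applied one identical code to every item
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same six comments.
coder_1 = ["pro", "con", "con", "pro", "con", "pro"]
coder_2 = ["pro", "con", "pro", "pro", "con", "con"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Raw percent agreement here is 4 of 6, but half of that would be expected by chance with two evenly used codes, which is why kappa comes out lower than the raw figure.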
6. A spectrum of approaches to working with qualitative data
Different types of knowledge claims depending where you sit
Purist: deep immersion; closeness to data; antipathy to numbers; credible interpretation; in-depth analysis; contextual; subjective
Pluralist: experimental; mixed method; adaptive; hybrid; flexible approach; interdisciplinary
Positivist: quantitative; focus on error; measurement critical; validity and reliability; replication & objectivity; generalization; hypotheses
These choices philosophical, ideological, political and ethical
7. Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
9. [Figure labels] Relations between Classes; Rates and Terms for Credit; Farm Profitability; Cost of Living; Soil Fertility; Education
Exploration; Speculation; Coding; Validation
10. Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques
• Information Retrieval (IR)
– Search and cluster topics and cross-correlate by stakeholders
• Natural Language Processing (NLP)
– Grouped by opinion and writer type
[Chart: Con vs. Pro; axis ticks 5,000 to 25,000]
Par 2.2(a1)
• Con:
– 150, 818: “impossible to maintain”
– 272: “too expensive for elderly”
• Pro:
– 169, 213, 391, 392, 394: “already being done in Alaska”
– 18: “extend to children”
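The Con/Pro display for Par 2.2(a1) can be pictured as a grouping of comments by position and stated reason. The sketch below assumes the numbers on the slide are comment identifiers (the slide does not say so) and hand-enters the positions rather than deriving them with any IR or NLP technique; the function name `group_by_position` is hypothetical.

```python
from collections import defaultdict

# Rows transcribed from the slide for Par 2.2(a1).
# Assumption: each number is a comment identifier.
comments = [
    (150, "con", "impossible to maintain"),
    (818, "con", "impossible to maintain"),
    (272, "con", "too expensive for elderly"),
    (169, "pro", "already being done in Alaska"),
    (213, "pro", "already being done in Alaska"),
    (391, "pro", "already being done in Alaska"),
    (392, "pro", "already being done in Alaska"),
    (394, "pro", "already being done in Alaska"),
    (18, "pro", "extend to children"),
]


def group_by_position(rows):
    """Group comment identifiers by position, then by quoted reason."""
    grouped = defaultdict(lambda: defaultdict(list))
    for comment_id, position, reason in rows:
        grouped[position][reason].append(comment_id)
    return grouped


grouped = group_by_position(comments)
for position in ("con", "pro"):
    print(position.capitalize() + ":")
    for reason, ids in grouped[position].items():
        print("  " + ", ".join(str(i) for i in ids) + ': "' + reason + '"')
```

In a real pipeline the position and reason for each comment would come from human coding or a classifier; the grouping step itself stays the same.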
11. Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
20. Over 13,000 hours of video and audio were recorded of the public spaces in an LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
29. Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor Emeritus, Journal of Information Technology & Politics
stu@texifter.com
http://people.umass.edu/stu/
@stuartwshulman