Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion
Complex sampling in latent variable models
Daniel Oberski
Department of methodology and statistics
Complex sampling in latent variable models Daniel Oberski
• When doing latent class analysis, factor analysis, IRT, or
structural equation modeling, should you use sampling
weights, stratification, and clustering variables?
• What is complex about surveys?
• What is ``pseudo'' about pseudo-maximum likelihood?
• What are design effects and what makes them so deft?
Outline
1. Complex surveys
2. Latent variable models (LVM)
3. Estimation of LVM under complex sampling
4. Effect on LVM
5. Conclusion
Does it make a difference?
Unweighted regression
Weighted regression
Source: 1988 National Maternal and Infant Health Survey (Korn and Graubard, 1995).
Latent class analysis of eating vegetables
Unweighted LCA
                 Low    High
Latent class     33%    77%
Recall 1 high    60%    80%
Recall 2 high    51%    82%
Recall 3 high    40%    81%
Recall 4 high    46%    79%

LCA using weights
                 Low    High
Latent class     18%    82%
Recall 1 high    46%    78%
Recall 2 high    39%    76%
Recall 3 high    28%    77%
Recall 4 high    39%    73%
Source: The continuing Survey of Food Intakes by Individuals (Patterson et al., 2002).
Sample surveys, ``linear estimators''
Sample surveys
Purposes:
• Descriptive;
• Analytic.
Assessment of Health Status and Social Determinants
of Health (Padgol village, Gujarat, India).
Source: Boston U. India Research and Outreach Initiative.
Sample surveys
The idea of a sample survey: we can generalize from a sample to a
population if the sample is ``like'' the population (the
``representative method'').
Sample of people ``like'' the population?
• Neyman (1934) figured this would be true on average if
you draw a random sample;
• This is the theory we still use today.
``Linear estimator'':
\[
E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} y_i \right] = N^{-1} \sum_{i \in \text{population}} y_i ,
\]
and generally
\[
m_n \xrightarrow{d} N[\mu, \mathrm{var}(m_n)]
\]
``Design-consistent''
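The unbiasedness statement above can be checked by brute force on a tiny example. The sketch below is only an illustration: the eight population values and the sample size are made up, and it assumes simple random sampling without replacement, so that every possible sample is equally likely.

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population of N = 8 values (made up for illustration).
population = [3.0, 7.5, 1.2, 9.8, 4.4, 6.1, 2.7, 8.3]
n = 3  # sample size

# Under simple random sampling without replacement, each of the C(N, n)
# possible samples is equally likely, so the design expectation of the
# sample mean is the plain average over all possible samples.
sample_means = [mean(s) for s in combinations(population, n)]

design_expectation = mean(sample_means)
population_mean = mean(population)

print(f"number of possible samples: {len(sample_means)}")
print(f"average of sample means:    {design_expectation:.6f}")
print(f"population mean:            {population_mean:.6f}")
```

The two printed means should agree up to floating-point error, even though any single sample mean can be far from the population mean.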
``Linear estimator''
• Most of the time when people talk about ``linear
estimators'', they are thinking about means and totals.
• But a proportion is a linear estimator too;
• for ex., proportion observed for response patterns:
Pattern  Prop.    Pattern  Prop.
1111     0.226    0111     0.090
1110     0.087    0110     0.047
1101     0.092    0101     0.046
1100     0.049    0100     0.030
1011     0.085    0011     0.045
1010     0.048    0010     0.028
1001     0.049    0001     0.029
1000     0.029    0000     0.022

→ LCA estimates:
      Latent class
      1       2
y1    0.77    0.56
y2    0.78    0.55
y3    0.76    0.55
y4    0.78    0.54

• Even the (co)variance is a linear estimator, if you redefine
\( d := (y - E(Y))(y - E(Y))^T \): then \( \mathrm{var}(y) = (n - 1)^{-1} \sum d \)
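The last bullet can be made concrete with a short numerical sketch. The data here are simulated purely for illustration, and the sample mean is used in place of E(Y), which is an assumption of this example rather than something stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations on 3 variables (illustration only).
y = rng.normal(size=(200, 3))
n = y.shape[0]

# One "redefined" observation d_i per unit: the outer product of its
# deviation from the mean (the sample mean stands in for E(Y) here).
dev = y - y.mean(axis=0)
d = np.einsum("ij,ik->ijk", dev, dev)  # shape (n, 3, 3)

# The covariance matrix is then just a scaled sum over the d_i,
# i.e. a linear estimator in the redefined variable d.
cov_from_d = d.sum(axis=0) / (n - 1)
print(np.round(cov_from_d, 3))
```

The result should match what `np.cov(y, rowvar=False)` returns, which uses the same n - 1 divisor by default.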
Complications → ``complex surveys'':
• Clustering
• Stratification
• Selection with unequal probabilities πi
Equivalently: the observations are not independently and identically distributed (iid)
Clustering
Simple random sampling: a lot of driving
A simple random sample of voter locations in the US.
Source: Lumley (2010).
Source: Heeringa et al. (2010)
Samples are clustered for several reasons:
• Geographic clustering of elements for household surveys
reduces interviewing costs by amortizing travel and
related expenditures over a group of observations. E.g.:
NCS-R, National Health and Nutrition Examination Survey
(NHANES), Health and Retirement Study (HRS)
• Sample elements may not be individually identified on the
available sampling frames but can be linked to aggregate
cluster units (e.g., voters at precinct polling stations,
students in colleges and universities). The available
sampling frame often identifies only the cluster groupings.
• One or more stages of the sample are deliberately
clustered to enable the estimation of multilevel models
and components of variance in variables of interest (e.g.,
students in classes, classes within schools).
(Heeringa et al., 2010, p. 28)
Stratification
Sample stratified by region
Stratified sampling serves several purposes:
• Relative to an SRS of equal size, smaller standard errors
• Disproportionately allocate the sample to subpopulations,
that is, to oversample specific subpopulations to ensure
sufficient sample sizes for analysis.
(Heeringa et al., 2010, p. 32)
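The first purpose (smaller standard errors than an SRS of equal size) can be illustrated with a small simulation. This is a sketch under made-up assumptions: four equally sized strata whose means differ, proportionate allocation, and sample sizes chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 4 strata of 500 units whose means differ.
stratum_means = [0.0, 2.0, 4.0, 6.0]
strata = [rng.normal(loc=m, scale=1.0, size=500) for m in stratum_means]
population = np.concatenate(strata)

n_total = 40                              # overall sample size
n_per_stratum = n_total // len(strata)    # proportionate allocation
reps = 2000

srs_means = np.empty(reps)
strat_means = np.empty(reps)
for r in range(reps):
    # Simple random sample without replacement from the whole population.
    srs_means[r] = rng.choice(population, size=n_total, replace=False).mean()
    # Stratified sample: an SRS of equal size within every stratum.
    # The strata are equally large, so the plain average of the stratum
    # sample means estimates the population mean.
    strat_means[r] = np.mean(
        [rng.choice(s, size=n_per_stratum, replace=False).mean() for s in strata]
    )

se_srs = srs_means.std(ddof=1)
se_strat = strat_means.std(ddof=1)
print(f"SE of the mean under SRS:        {se_srs:.3f}")
print(f"SE of the mean under stratified: {se_strat:.3f}")
```

The gain comes from removing between-stratum variation from the sampling error, so it shrinks when the stratum means are similar.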
Unequal probabilities of selection
Common reasons for varying probabilities of case selection in
sample surveys include (Heeringa et al., 2010, pp. 38-43):
• Disproportionate sampling within strata to
  • achieve an optimally allocated sample
  • deliberately increase precision for subpopulations
• Differential sampling of subpopulations, e.g. NHANES
oversampling of people with disabilities.
• Subsampling of observational units within sample clusters,
e.g. selecting a single random respondent from the
eligible members of sample households.
• Sampling probabilities that can be obtained only in the
process of the survey data collection, e.g. in a random
digit dialing (RDD) telephone survey, the number of distinct
landline telephone numbers
• Nonresponse
Linear estimators in complex samples
Problems:
Bias: If some (types of) people have a differing chance πi
of being in the sample, usual sample statistics will not (on
average) equal the population quantities anymore.
Variance: Affected by clustering/stratification.
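If \( \hat\mu_n := n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i \), notice:
\[
E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i \right]
  = N^{-1} \sum_{i \in \text{population}} \frac{\pi_i}{\pi_i} y_i
  = N^{-1} \sum_{i \in \text{population}} y_i
\]
(With the leading factor \( n^{-1} \), \( \pi_i \) is to be read relative to the
equal-probability value \( n/N \), so that \( \pi_i = 1 \) under simple random sampling.)
Solutions:
• The weighted estimator \( \hat\mu_n \) is unbiased (Horvitz and Thompson, 1952);
• The variance of the weighted estimate, \( \mathrm{var}(\hat\mu_n) \), can be obtained under
clustering and stratification.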
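A small simulation can show both the bias of the unweighted mean and the unbiasedness of the weighted one. The sketch below rests on several assumptions that are not in the slides: the population values are simulated, the design is Poisson sampling (each unit is included independently with its own probability), and the estimator is written with actual inclusion probabilities, i.e. as the Horvitz-Thompson mean \( N^{-1} \sum_{i \in \text{sample}} y_i / \pi_i \).

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical finite population (values made up for illustration).
N = 2_000
y = rng.normal(loc=10.0, scale=2.0, size=N)
population_mean = y.mean()

# Assumed design: units with larger y are more likely to be selected.
pi = np.clip(0.05 + 0.02 * (y - y.min()), 0.01, 0.90)

reps = 2_000
unweighted = np.empty(reps)
weighted = np.empty(reps)
for r in range(reps):
    # Poisson sampling: unit i enters the sample independently with prob. pi[i].
    in_sample = rng.random(N) < pi
    ys, pis = y[in_sample], pi[in_sample]
    unweighted[r] = ys.mean()              # ignores the unequal probabilities
    weighted[r] = np.sum(ys / pis) / N     # Horvitz-Thompson estimate of the mean

print(f"population mean:          {population_mean:.3f}")
print(f"avg. unweighted estimate: {unweighted.mean():.3f}")
print(f"avg. weighted estimate:   {weighted.mean():.3f}")
```

Across replications the weighted estimates should center on the population mean, while the unweighted ones sit above it because high values of y are overrepresented.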
Latent variable modeling
Latent variable modeling (LVM)
• (Confirmatory) factor analysis (CFA);
• Structural Equation Modeling (SEM);
• Latent Class Analysis/Modeling (LCA/LCM);
• Latent trait modeling;
• Item Response Theory (IRT) models;
• Mixture models;
• Random effects/hierarchical/multilevel models;
• ``Anchoring vignettes'' models;
• ... etc.
• Proportions can be turned into an LC or IRT analysis;
• Covariances can be turned into a SEM analysis.
Definition
Latent variable model estimation: a way of turning observed
covariances/proportions (``moments'') into LVM parameter
estimates.
\( \mathrm{LVM}: m_n \to \hat\theta_n \)
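\( \mathrm{LVM}: m_n \to \hat\theta_n \)
Example: confirmatory factor analysis (CFA) with 1 factor, 3
indicators:
\[
\begin{aligned}
\hat\lambda_{11} &= \sqrt{\mathrm{cor}(y_1, y_2)\,\mathrm{cor}(y_1, y_3)/\mathrm{cor}(y_2, y_3)} \\
\hat\lambda_{21} &= \sqrt{\mathrm{cor}(y_1, y_2)\,\mathrm{cor}(y_2, y_3)/\mathrm{cor}(y_1, y_3)} \\
\hat\lambda_{31} &= \sqrt{\mathrm{cor}(y_1, y_3)\,\mathrm{cor}(y_2, y_3)/\mathrm{cor}(y_1, y_2)}
\end{aligned}
\]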
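The three formulas translate directly into a few lines of code. In this sketch the function name `cfa_loadings` and the second set of correlations are made up for illustration; only the formulas themselves come from the slide.

```python
import math

def cfa_loadings(r12, r13, r23):
    """Closed-form standardized loadings for a 1-factor, 3-indicator CFA."""
    return (
        math.sqrt(r12 * r13 / r23),
        math.sqrt(r12 * r23 / r13),
        math.sqrt(r13 * r23 / r12),
    )

# If every loading is sqrt(0.5) (about 0.707), every implied correlation is 0.5,
# and the formulas recover the loadings from those correlations.
print([round(lam, 3) for lam in cfa_loadings(0.5, 0.5, 0.5)])

# Hypothetical unequal correlations, made up for illustration.
l1, l2, l3 = cfa_loadings(0.42, 0.48, 0.56)
# The model implies cor(y_j, y_k) = lambda_j * lambda_k, so the loadings
# reproduce the correlations they were computed from.
print(round(l1 * l2, 3), round(l1 * l3, 3), round(l2 * l3, 3))
```

Because the estimates are a deterministic function of the three correlations, any bias or sampling variation in the correlations carries over to the loadings.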
Inference in latent variable models under simple random
sampling
Data generating process:
Model (superpopulation) → Finite population → Sample
Inference:
Model ← Finite population ← Sample
(Fuller, 2009).
Superpopulation → Finite population of 100 subjects
Superpopulation loadings: 0.707
[Figure: scatterplot matrix of y1, y2, y3 for the finite population. Correlations: cor(y1, y2) = 0.442, cor(y1, y3) = 0.475, cor(y2, y3) = 0.321.]
Finite-population loadings:
y1: 0.810
y2: 0.546
y3: 0.587
Simple random sample (SRS) of 20 from finite pop.
[Figure: scatterplot matrix of y1, y2, y3, with correlations reported overall and for groups 1 and 2.
cor(y1, y2): 0.442 overall; 1: 0.425; 2: 0.568
cor(y1, y3): 0.475 overall; 1: 0.361; 2: 0.668
cor(y2, y3): 0.321 overall; 1: 0.258; 2: 0.543]
(Superpopulation loadings: 0.707)
SRS factor loading estimates:
y1: 0.836
y2: 0.679
y3: 0.800
Inference from SRS to superpopulation
Superpopulation ← ← Sample
[Figure: scatterplot matrix of y1, y2, y3 with points grouped by ``simple.random'' (1, 2).
cor(y1, y2): 0.442 overall; 1: 0.425; 2: 0.568
cor(y1, y3): 0.475 overall; 1: 0.361; 2: 0.668
cor(y2, y3): 0.321 overall; 1: 0.258; 2: 0.543]
Superpopulation loadings:
\( \lambda_{11} \): 0.707
\( \lambda_{21} \): 0.707
\( \lambda_{31} \): 0.707
Avg. (sd) loading over 10,000 samples:
\( \hat\lambda_{11} \): 0.707 (0.125)
\( \hat\lambda_{21} \): 0.722 (0.127)
\( \hat\lambda_{31} \): 0.711 (0.122)
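A repeated-sampling experiment of this kind can be sketched as follows. The setup is an assumption and deliberately simpler than the one on the slide: samples of size 200 are drawn directly from a normal one-factor superpopulation with loadings of about 0.707, and only 2,000 replications are used, so the spread will not match the standard deviations reported above.

```python
import numpy as np

rng = np.random.default_rng(1)
LOADING = np.sqrt(0.5)   # about 0.707 for all three indicators

def draw_sample(n):
    """Draw n observations of (y1, y2, y3) from a standardized 1-factor model."""
    eta = rng.normal(size=n)                                  # latent factor
    eps = rng.normal(size=(n, 3)) * np.sqrt(1 - LOADING**2)   # unique parts
    return LOADING * eta[:, None] + eps

def cfa_loadings(R):
    """Closed-form loadings of a 1-factor, 3-indicator CFA from a correlation matrix."""
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    return np.sqrt(np.array([r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12]))

reps, n = 2_000, 200     # assumed sizes, chosen for illustration
estimates = np.array(
    [cfa_loadings(np.corrcoef(draw_sample(n), rowvar=False)) for _ in range(reps)]
)

print("average loading:", np.round(estimates.mean(axis=0), 3))
print("sd of loading:  ", np.round(estimates.std(axis=0, ddof=1), 3))
```

With very small samples, a sample correlation can turn out negative, in which case the closed-form square root is undefined; the larger sample size here is chosen to avoid that.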
Complex sampling affects latent variable modeling
\( \mathrm{LVM}: m_n \to \hat\theta_n \)
This means that:
• bias in covariances/proportions (moments) leads to bias in
LVM parameter estimates;
• any across-sample variation in latent variable parameter
estimates is entirely due to variation in the sample
moments used to estimate them.
• With more observed variables (moments), use Maximum
Likelihood (ML) to get estimates, but the above still holds.
• MLE: \( \hat\theta_n = \arg\max_\theta L(\theta; \hat\mu_n) \)
\( \mathrm{LVM}: m_n \to \hat\theta_n \), so bias in \( m_n \) means bias in \( \hat\theta_n \)
• One solution: correctly modeling all aspects of the
sampling design.
• Another solution:
replacing the observed moments with
design-consistent moments will provide
design-consistent estimates =
``pseudo-maximum likelihood'' (PML).
\( \hat\mu_n \to \hat\theta_n \)
• (A third solution: weighted least squares, with less than
satisfactory results)
(Skinner et al., 1989, chapter 3)
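The idea of swapping in design-consistent moments can be sketched for the three-indicator CFA example, where the estimates are a closed-form function of the correlations. Everything specific in the sketch below is an assumption made for illustration: the simulated finite population, the informative Poisson design that oversamples units with y1 > 0, and the helper names `cfa_loadings` and `weighted_corr`. It shows point estimates only, not standard errors.

```python
import numpy as np

rng = np.random.default_rng(7)

def cfa_loadings(R):
    """Closed-form loadings of a 1-factor, 3-indicator CFA from a correlation matrix."""
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    return np.sqrt(np.array([r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12]))

def weighted_corr(y, w):
    """Correlation matrix computed from weighted means and (co)variances."""
    w = w / w.sum()
    dev = y - w @ y                       # deviations from the weighted mean
    cov = (dev * w[:, None]).T @ dev      # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical finite population from a 1-factor model with loadings near 0.707.
N = 200_000
lam = np.sqrt(0.5)
eta = rng.normal(size=N)
y = lam * eta[:, None] + np.sqrt(1 - lam**2) * rng.normal(size=(N, 3))

# Assumed informative design: units with y1 > 0 are oversampled tenfold.
pi = np.where(y[:, 0] > 0, 0.20, 0.02)
in_sample = rng.random(N) < pi            # Poisson sampling
ys, ws = y[in_sample], 1.0 / pi[in_sample]

target = cfa_loadings(np.corrcoef(y, rowvar=False))    # finite-population values
naive = cfa_loadings(np.corrcoef(ys, rowvar=False))    # ignores the design
pml = cfa_loadings(weighted_corr(ys, ws))              # weighted moments plugged in

print("finite population:", np.round(target, 3))
print("unweighted sample:", np.round(naive, 3))
print("weighted sample:  ", np.round(pml, 3))
```

The weighted estimates should land close to the finite-population loadings, whereas the unweighted loading for y1 is pulled down because selecting on y1 restricts its range in the sample.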
The variance of the PMLE is obtained from a sandwich (linearization) estimate, which in turn depends on the variance of the design-consistent moment estimates (the "meat"):
$$\operatorname{var}(\hat{\theta}_n) = (\Delta^T V \Delta)^{-1} \Delta^T V \cdot \operatorname{var}(\hat{\mu}_n) \cdot V \Delta (\Delta^T V \Delta)^{-1}$$
$V$: depends on distributional assumptions (= ML)
$\Delta$: depends on the specific model (= LVM)
$\operatorname{var}(\hat{\mu}_n)$: depends on the variance of means/proportions/covariances under complex sampling
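As a rough numerical illustration of the sandwich formula, the sketch below evaluates it for a toy model with a single parameter, so that the "bread" $(\Delta^T V \Delta)^{-1}$ is a scalar. Everything here is assumed for illustration: the helper names, the choice $V = I$ (taken to be symmetric), and the values in `var_mu`.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sandwich_variance_one_param(delta, v, var_mu):
    """(D'VD)^-1 D'V var(mu) V D (D'VD)^-1 for one parameter, symmetric V."""
    dt_v = matmul(transpose(delta), v)                  # 1 x p row vector D'V
    bread = 1.0 / matmul(dt_v, delta)[0][0]             # scalar (D'VD)^-1
    meat = matmul(matmul(dt_v, var_mu), transpose(dt_v))[0][0]
    return bread * meat * bread


# Toy model: two moments that both equal theta, so Delta = (1, 1)'.
delta = [[1.0], [1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]            # assumed weight matrix
var_mu = [[0.04, 0.01], [0.01, 0.09]]   # assumed design-based var of moments

var_theta = sandwich_variance_one_param(delta, v, var_mu)
print(var_theta)
```

Only `var_mu` changes when the sampling design changes; `delta` and `v` stay the same, which is why the design enters the standard errors through the "meat" alone.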
pseudo-
1. supposed or purporting to be but not really so; false; not genuine: pseudonym | pseudoscience.
2. resembling or imitating: pseudohallucination | pseudo-French.
ORIGIN from Greek pseudēs ‘false,’ pseudos ‘falsehood.’
Source: New Oxford American Dictionary
Pseudo-ML
Why ML?
• Consistently estimates parameters aggregated over clusters and strata;
• Estimates the "MLE that would be obtained by fitting the model to the population data".
Why pseudo?
• Not exactly equal to the MLE obtained by correctly modeling all aspects of the sampling design;
• Not asymptotically optimal.
Why PML?
• Aggregate parameters may be of interest;
• No assumptions about, or modeling of, the design are necessary.
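One common way to write PML is as the maximizer of a weighted log-likelihood, $\sum_i w_i \log f(y_i; \theta)$. The sketch below assumes that formulation and a deliberately simple Bernoulli model rather than a latent variable model; the data and weights are hypothetical. It checks by grid search that the maximizer coincides with the weighted proportion, which is what a census-level MLE would target.

```python
import math


def pseudo_loglik(p, y, w):
    """Weighted (pseudo) Bernoulli log-likelihood: sum_i w_i * log f(y_i; p)."""
    return sum(wi * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
               for yi, wi in zip(y, w))


y = [1, 0, 1, 1, 0]
w = [10.0, 40.0, 10.0, 10.0, 30.0]   # hypothetical weights 1 / pi_i

grid = [i / 100 for i in range(1, 100)]
p_pml = max(grid, key=lambda p: pseudo_loglik(p, y, w))

p_weighted = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
p_unweighted = sum(y) / len(y)
print(p_pml, p_weighted, p_unweighted)
```

The unweighted proportion differs because the respondents with `y = 0` carry most of the weight in this made-up sample.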
The effect of clustering
Cluster sample of 20 from finite population
[Figure: scatterplot matrix of y1, y2, and y3, with points in two groups (1, 2). Correlations shown in the panels: 0.442 (1: 0.412, 2: 0.49); 0.475 (1: 0.504, 2: 0.455); 0.321 (1: 0.352, 2: 0.205).]
(Superpopulation loadings: 0.707)
Cluster sample loading estimates:
y1: 0.997
y2: 0.491
y3: 0.456
Superpopulation inference from SRS to superpopulation
Superpopulation ← Sample
[Figure: scatterplot matrix of y1, y2, and y3, with points grouped by simple.random (1, 2). Correlations shown in the panels: 0.442 (1: 0.425, 2: 0.568); 0.475 (1: 0.361, 2: 0.668); 0.321 (1: 0.258, 2: 0.543).]
Superpopulation:
$\lambda_{11}$: 0.707
$\lambda_{21}$: 0.707
$\lambda_{31}$: 0.707
Avg. (sd) loading over 10,000 samples:
$\hat{\lambda}_{11}$: 0.665 (0.157)
$\hat{\lambda}_{21}$: 0.699 (0.140)
$\hat{\lambda}_{31}$: 0.703 (0.145)
The effect of cluster sampling on factor analysis
                  $\hat{\lambda}_{11}$      $\hat{\lambda}_{21}$      $\hat{\lambda}_{31}$
                  avg     (sd)      avg     (sd)      avg     (sd)
Population:       0.707             0.707             0.707
SRS:              0.707   (0.125)   0.722   (0.127)   0.711   (0.122)
Cluster sample:   0.665   (0.157)   0.699   (0.140)   0.703   (0.145)
deft                      1.26              1.10              1.19
deff                      1.58              1.22              1.41
% Var. incr.              58%               22%               41%
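The last three rows of the table follow from the two rows of standard deviations. A short sketch, assuming deft is the ratio of the cluster-sample sd to the SRS sd and deff is its square (the dictionary keys are just labels chosen here):

```python
sd_srs = {"lambda11": 0.125, "lambda21": 0.127, "lambda31": 0.122}
sd_cluster = {"lambda11": 0.157, "lambda21": 0.140, "lambda31": 0.145}

design_effects = {}
for name in sd_srs:
    deft = sd_cluster[name] / sd_srs[name]   # ratio of standard deviations
    deff = deft ** 2                         # ratio of variances
    pct_increase = 100.0 * (deff - 1.0)      # % increase in variance
    design_effects[name] = (deft, deff, pct_increase)

for name, (deft, deff, pct) in design_effects.items():
    print(f"{name}: deft={deft:.2f} deff={deff:.2f} var. incr.={pct:.0f}%")
```

Rounded to two decimals, these reproduce the deft, deff, and percentage rows above.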
Design effects' deftness
• "Design effect" or deff $= \operatorname{var}_{\text{clus}}(\hat{\theta}) / \operatorname{var}_{\text{srs}}(\hat{\theta})$ (Kish, 1965);
• deff is the factor by which the variance increases relative to a simple random sampling design;
• deft $= \sqrt{\text{deff}}$ is the corresponding factor for standard errors;
• In practice deff and deft have to be estimated, and we use the sandwich estimator of variance.
Useful for:
• Seeing to what extent it makes a difference to take complex sampling into account;
• Identifying parameters that are more or less affected;
• Sample size and power calculations.
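For the sample size use, a common rule of thumb is Kish's approximation: a complex sample of size n carries roughly the information of an SRS of size n / deff. The sketch below assumes that approximation; the function names and the sample sizes are made up, and the deff value is the 1.58 reported for the first loading in the cluster-sampling table.

```python
def effective_sample_size(n, deff):
    """Approximate SRS-equivalent sample size of a complex sample of size n."""
    return n / deff


def required_sample_size(n_srs, deff):
    """Approximate complex-sample size needed to match an SRS of size n_srs."""
    return n_srs * deff


deff_lambda11 = 1.58   # from the cluster-sampling table
n_eff = effective_sample_size(1000, deff_lambda11)
n_needed = required_sample_size(500, deff_lambda11)
print(round(n_eff), round(n_needed))
```

Because deff differs per parameter, the same design can imply different effective sample sizes for different loadings.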
The effect of unequal probabilities of selection
Sampling with probability correlated with factor x
Sampling with probability correlated with $x^2$
                  $\hat{\lambda}_{11}$      $\hat{\lambda}_{21}$      $\hat{\lambda}_{31}$
                  avg     (sd)      avg     (sd)      avg     (sd)
Population:       0.707             0.707             0.707
SRS:              0.707   (0.125)   0.722   (0.127)   0.711   (0.122)
Selection probability proportional to latent factor x:
Unweighted:       0.679   (0.137)   0.683   (0.138)   0.692   (0.137)
  Bias / deft:    -4%     1.13      -3%     1.12      -2%     1.12
Weighted:         0.687   (0.143)   0.698   (0.143)   0.703   (0.143)
  Bias / deft:    -3%     1.18      -1%     1.16      -1%     1.17
Selection probability proportional to $x^2$:
Unweighted:       0.845   (0.060)   0.842   (0.061)   0.843   (0.061)
  Bias / deft:    20%     0.495     19%     0.492     19%     0.497
Weighted:         0.750   (0.139)   0.739   (0.141)   0.737   (0.137)
  Bias / deft:    6%      1.149     5%      1.145     4%      1.123
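The bias percentages in the table are consistent with relative bias measured against the population loading of 0.707. A small sketch under that assumption (the dictionary layout is chosen here for illustration; the deft column is not recomputed because its baseline is not stated on the slide):

```python
TRUE_LOADING = 0.707


def relative_bias_pct(avg_estimate, true_value=TRUE_LOADING):
    """Relative bias of an average estimate, in percent of the true value."""
    return 100.0 * (avg_estimate - true_value) / true_value


# Average loading estimates from the table, by selection mechanism and analysis.
avg_loadings = {
    ("x", "unweighted"): [0.679, 0.683, 0.692],
    ("x", "weighted"): [0.687, 0.698, 0.703],
    ("x^2", "unweighted"): [0.845, 0.842, 0.843],
    ("x^2", "weighted"): [0.750, 0.739, 0.737],
}

bias = {key: [round(relative_bias_pct(v)) for v in values]
        for key, values in avg_loadings.items()}
for key, values in bias.items():
    print(key, values)
```

The contrast between the two unweighted rows is the point of the slide: selection on x itself barely moves the loadings, while selection on $x^2$ inflates them by about a fifth.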
When does weighting make a difference for point estimates of latent variable models?
• (Usually) when weights represent omitted variable(s) that interact with observed or latent variables;
• (Sometimes, e.g. IRT, LCA) when selection is correlated with a dependent variable;
• When the model is strongly misspecified:
[Figure: y1 plotted against x. True curve (black line), overall linear regression line (green), and regression lines from unequal selection/weights.]
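The misspecification case can be imitated with a small simulation. This is a sketch under assumptions not taken from the slides: the true curve is taken to be log(x), the fitted model is a straight line, and units are drawn by Poisson sampling with inclusion probability proportional to x. The population size, noise level, seed, and function name are all arbitrary choices.

```python
import math
import random


def wls_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


rng = random.Random(1)
N = 200_000
pop_x = [rng.uniform(0.5, 2.0) for _ in range(N)]
pop_y = [math.log(xi) + rng.gauss(0.0, 0.2) for xi in pop_x]  # curved truth

# Poisson sampling: unit i is included with probability pi_i proportional to x_i.
n_target = 20_000
total_x = sum(pop_x)
pi = [n_target * xi / total_x for xi in pop_x]
sample = [i for i in range(N) if rng.random() < pi[i]]

xs = [pop_x[i] for i in sample]
ys = [pop_y[i] for i in sample]
ws = [1.0 / pi[i] for i in sample]

slope_census = wls_slope(pop_x, pop_y, [1.0] * N)         # line fitted to everyone
slope_unweighted = wls_slope(xs, ys, [1.0] * len(xs))     # ignores the design
slope_weighted = wls_slope(xs, ys, ws)                    # weights 1 / pi_i
print(slope_census, slope_unweighted, slope_weighted)
```

Selection here depends only on x, so a correctly specified regression would not need weights; it is the straight line fitted to a curve that makes the unweighted slope drift away from the census slope, while the weighted slope stays close to it.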
Should you weight?
1. Purpose of the analysis: analytical versus descriptive;
2. Anticipated bias from an unweighted analysis;
3. If unweighted analysis is unbiased, relative magnitude of inefficiency resulting from a weighted analysis;
4. Whether variables are available and known to model the sample design instead of weighting the analysis.
(Patterson et al., 2002, p. 727)
Conclusions
• Surveys are not usually simple random samples (or iid);
• The sample design may bias the results of latent variable modeling (confidence intervals, significance tests, fit measures, parameter estimates);
• Pseudo-maximum likelihood can take the design into account without additional assumptions;
• Implemented in software. SEM: lavaan.survey in R;
• Nonparametric correction for the design;
• "Aggregate modeling";
• Payment is in variance (efficiency);
• The alternative is modeling the effects of strata, clusters, and the covariates behind them: "disaggregate modeling".
Thank you for your attention!
Daniel Oberski
doberski@uvt.nl
http://daob.org/
References
Fuller, W. A. (2009). Sampling statistics. Wiley, New York.
Heeringa, S., West, B., and Berglund, P. (2010). Applied survey data analysis.
Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685.
Kish, L. (1965). Survey sampling. Wiley, New York.
Korn, E. and Graubard, B. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49(3):291–295.
Lumley, T. (2010). Complex surveys: a guide to analysis using R. Wiley.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4):558–625.
Patterson, B., Dayton, C., and Graubard, B. (2002). Latent class analysis of complex sample survey data. Journal of the American Statistical Association, 97(459):721–741.
Skinner, C., Holt, D., and Smith, T. (1989). Analysis of complex surveys. John Wiley & Sons.
Complex sampling in latent variable models

  • 1. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Complex sampling in latent variable models Daniel Oberski Department of methodology and statistics Complex sampling in latent variable models Daniel Oberski
  • 2. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion • When doing latent class analysis, factor analysis, IRT, or structural equation modeling, should you use sampling weights, stratification, and clustering variables? • What is complex about surveys? • What is ``pseudo'' about pseudo-maximum likelihood? • What are design effects and what makes them so deft? Complex sampling in latent variable models Daniel Oberski
  • 3. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Outline ..1 Complex surveys ..2 Latent variable models (LVM) ..3 Estimation of LVM under complex sampling ..4 Effect on LVM ..5 Conclusion Complex sampling in latent variable models Daniel Oberski
  • 4. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Does it make a difference? Complex sampling in latent variable models Daniel Oberski
  • 5. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Unweighted regression Weighted regression Source: 1988 National Maternal and Infant Health Survey (Korn and Graubard, 1995). Complex sampling in latent variable models Daniel Oberski
  • 6. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Unweighted regression Weighted regression Source: 1988 National Maternal and Infant Health Survey (Korn and Graubard, 1995). Complex sampling in latent variable models Daniel Oberski
  • 7. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Latent class analysis of eating vegetables Unweighted LCA Low High Latent class 33% 77% Recall 1 high 60% 80% Recall 2 high 51% 82% Recall 3 high 40% 81% Recall 4 high 46% 79% Source: The continuing Survey of Food Intakes by Individuals (Patterson et al., 2002). Complex sampling in latent variable models Daniel Oberski
  • 8. Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion Latent class analysis of eating vegetables Unweighted LCA Low High Latent class 33% 77% Recall 1 high 60% 80% Recall 2 high 51% 82% Recall 3 high 40% 81% Recall 4 high 46% 79% LCA using weights Low High Latent class 18% 82% Recall 1 high 46% 78% Recall 2 high 39% 76% Recall 3 high 28% 77% Recall 4 high 39% 73% Source: The continuing Survey of Food Intakes by Individuals (Patterson et al., 2002). Complex sampling in latent variable models Daniel Oberski
  • 9. Sample surveys, "linear estimators"
  • 10. Sample surveys. Purposes: • Descriptive; • Analytic. [Photo: Assessment of Health Status and Social Determinants of Health (Padgol village, Gujarat, India). Source: Boston U. India Research and Outreach Initiative.]
  • 11. Sample surveys. Idea of a sample survey: we can generalize from a sample to a population if the sample is "like" the population (the "representative method").
  • 15. Sample of people "like" the population?
      • Neyman (1934) figured this would be true on average if you draw a random sample;
      • This is the theory we still use today.
      "Linear estimator": $E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} y_i \right] = N^{-1} \sum_{i \in \text{population}} y_i$, and generally $m_n \xrightarrow{d} N[\mu, \operatorname{var}(m_n)]$ ("design-consistent").
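
The "true on average" claim can be checked by brute force on a tiny example. The sketch below is only an illustration: the six population values and the sample size are made up, and it enumerates every possible simple random sample rather than simulating.

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population of N = 6 values (made up for illustration).
population = [3.0, 7.0, 8.0, 12.0, 15.0, 21.0]
n = 3  # sample size

# Under simple random sampling every subset of size n is equally likely,
# so the design expectation is the plain average over all possible samples.
sample_means = [mean(s) for s in combinations(population, n)]
design_expectation = mean(sample_means)
population_mean = mean(population)

print(design_expectation, population_mean)
```

Individual sample means vary, but their average over all 20 possible samples coincides with the population mean, which is what the expectation $E_\pi$ in the formula expresses.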
  • 23. "Linear estimator"
      • Most of the time when people talk about "linear estimators", they are thinking about means and totals.
      • But a proportion is a linear estimator too;
      • e.g., the observed proportion for each response pattern:
          Pattern  Prop.   Pattern  Prop.
          1111     0.226   0111     0.090
          1110     0.087   0110     0.047
          1101     0.092   0101     0.046
          1100     0.049   0100     0.030
          1011     0.085   0011     0.045
          1010     0.048   0010     0.028
          1001     0.049   0001     0.029
          1000     0.029   0000     0.022
        → LCA estimates:
                Latent class
                1       2
          y1    0.77    0.56
          y2    0.78    0.55
          y3    0.76    0.55
          y4    0.78    0.54
      • Even the (co)variance is a linear estimator, if you define $d := (y - E(Y))(y - E(Y))^T$: then $\operatorname{var}(y) = (n - 1)^{-1} \sum d$.
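
The last point can be made concrete with a few lines of NumPy. This is a minimal sketch on made-up data (50 random cases, 3 variables); the sample mean stands in for $E(Y)$, which is an assumption of the sketch rather than something the slide specifies.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=(50, 3))  # hypothetical data: 50 cases, 3 variables

n = y.shape[0]
centered = y - y.mean(axis=0)  # sample mean used in place of E(Y)

# d_i is the outer product of the i-th centered observation with itself,
# so each case contributes one 3 x 3 matrix.
d = np.einsum("ij,ik->ijk", centered, centered)  # shape (n, 3, 3)

# The covariance matrix is then just a (rescaled) sum over cases of d_i.
cov_from_d = d.sum(axis=0) / (n - 1)

print(np.allclose(cov_from_d, np.cov(y, rowvar=False)))
```

Because the covariance matrix is a sum of per-case contributions, the machinery for means and totals in complex samples carries over to it.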
  • 24. Complications → "complex surveys": • Clustering; • Stratification; • Selection with unequal probabilities $\pi_i$. Equivalently: not independently and identically distributed (iid).
  • 25. Clustering
  • 26. Simple random sampling: a lot of driving. [Map: a simple random sample of voter locations in the US. Source: Lumley (2010).]
  • 27. [Figure. Source: Heeringa et al. (2010).]
  • 30. Sample clustering for several reasons:
      • Geographic clustering of elements for household surveys reduces interviewing costs by amortizing travel and related expenditures over a group of observations. E.g.: NCS-R, National Health and Nutrition Examination Survey (NHANES), Health and Retirement Study (HRS).
      • Sample elements may not be individually identified on the available sampling frames but can be linked to aggregate cluster units (e.g., voters at precinct polling stations, students in colleges and universities). The available sampling frame often identifies only the cluster groupings.
      • One or more stages of the sample are deliberately clustered to enable the estimation of multilevel models and components of variance in variables of interest (e.g., students in classes, classes within schools).
      (Heeringa et al., 2010, p. 28)
  • 31. Stratification
  • 32. Sample stratified by region
  • 33. Stratified sampling serves several purposes:
      • Relative to an SRS of equal size, smaller standard errors;
      • Disproportionately allocate the sample to subpopulations, that is, oversample specific subpopulations to ensure sufficient sample sizes for analysis.
      (Heeringa et al., 2010, p. 32)
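
The first purpose, smaller standard errors than an SRS of equal size, can be illustrated by exact enumeration. The sketch below assumes a made-up population of eight values in two very different strata and compares an SRS of 4 with a proportionate stratified sample of 2 per stratum; none of these numbers come from the slides.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical population: two strata with very different levels.
stratum_a = [1.0, 2.0, 3.0, 4.0]
stratum_b = [11.0, 12.0, 13.0, 14.0]
population = stratum_a + stratum_b

# Simple random sample of size 4: all 70 subsets are equally likely.
srs_means = [mean(s) for s in combinations(population, 4)]

# Proportionate stratified sample: 2 units from each stratum (36 samples).
strat_means = [
    mean(a + b)
    for a, b in product(combinations(stratum_a, 2), combinations(stratum_b, 2))
]

# Exact design variance of the sample mean under each design.
var_srs = pvariance(srs_means)
var_strat = pvariance(strat_means)

print(mean(srs_means), mean(strat_means))  # both equal the population mean
print(var_srs, var_strat)                  # stratified variance is smaller
```

Both designs are unbiased for the population mean here, but stratification removes the between-stratum component from the sampling variance. The gain is large in this example only because the strata were deliberately chosen to differ a lot.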
  • 34. Unequal probabilities of selection
  • 42. Common reasons for varying probabilities of case selection in sample surveys include (Heeringa et al., 2010, pp. 38-43):
      • Disproportionate sampling within strata to
          • achieve an optimally allocated sample;
          • deliberately increase precision for subpopulations.
      • Differentially sampling subpopulations, e.g. NHANES oversampling of people with disabilities.
      • Subsampling of observational units within sample clusters, e.g. selecting a single random respondent from the eligible members of sample households.
      • Sampling probabilities that can be obtained only in the process of the survey data collection, e.g. in a random digit dialing (RDD) telephone survey, the number of distinct landline telephone numbers.
      • Nonresponse.
  • 43. Linear estimators in complex samples
  • 46. Problems:
      Bias: if some (types of) people have a differing chance $\pi_i$ of being in the sample, the usual sample statistics will no longer (on average) equal the population quantities.
      Variance: affected by clustering/stratification.
      If $\hat\mu_n := n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i$, notice:
      $E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i \right] = N^{-1} \sum_{i \in \text{population}} \frac{\pi_i}{\pi_i} y_i = N^{-1} \sum_{i \in \text{population}} y_i$
      Solutions:
      • The weighted estimator $\hat\mu_n$ is unbiased (Horvitz and Thompson, 1952);
      • The variance of the weighted estimate, $\operatorname{var}(\hat\mu_n)$, can be obtained under clustering and stratification.
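
The bias problem and the weighting fix can both be seen by enumerating a small unequal-probability design. The sketch below is an illustration under stated assumptions: the population values are invented, the small stratum is oversampled ($\pi = 1/2$ versus $\pi = 1/3$), and the weighted total is divided by the population size $N$, which is the usual Horvitz-Thompson normalization for a mean rather than the slide's notation.

```python
from itertools import combinations, product
from statistics import mean

# Hypothetical population: a large low-valued stratum and a small
# high-valued stratum that is deliberately oversampled.
stratum_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # sample 2 of 6 -> pi = 1/3
stratum_b = [20.0, 30.0]                    # sample 1 of 2 -> pi = 1/2

N = len(stratum_a) + len(stratum_b)
pi_a = 2 / len(stratum_a)
pi_b = 1 / len(stratum_b)

unweighted, weighted = [], []
# All 15 * 2 = 30 possible samples are equally likely under this design.
for a, b in product(combinations(stratum_a, 2), combinations(stratum_b, 1)):
    unweighted.append(mean(a + b))
    # Weight each sampled value by 1 / pi_i, then divide by N (assumption).
    weighted.append((sum(a) / pi_a + sum(b) / pi_b) / N)

population_mean = mean(stratum_a + stratum_b)
print(population_mean, mean(unweighted), mean(weighted))
```

Averaged over all possible samples, the unweighted mean overshoots the population mean because the high-valued stratum is overrepresented, while the inverse-probability-weighted estimator recovers it.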
  • 47. Latent variable modeling
  • 48. Latent variable modeling (LVM)
      • (Confirmatory) factor analysis (CFA);
      • Structural equation modeling (SEM);
      • Latent class analysis/modeling (LCA/LCM);
      • Latent trait modeling;
      • Item response theory (IRT) models;
      • Mixture models;
      • Random effects/hierarchical/multilevel models;
      • "Anchoring vignettes" models;
      • ... etc.
  • 50. • Proportions can be turned into an LC or IRT analysis;
      • Covariances can be turned into a SEM analysis.
      Definition. Latent variable model estimation: a way of turning observed covariances/proportions ("moments") into LVM parameter estimates. LVM: $m_n \to \hat\theta_n$
  • 51. LVM: $m_n \to \hat\theta_n$
      Example: confirmatory factor analysis (CFA) with 1 factor, 3 indicators:
      $\hat\lambda_{11} = \sqrt{\operatorname{cor}(y_1, y_2)\,\operatorname{cor}(y_1, y_3) / \operatorname{cor}(y_2, y_3)}$
      $\hat\lambda_{21} = \sqrt{\operatorname{cor}(y_1, y_2)\,\operatorname{cor}(y_2, y_3) / \operatorname{cor}(y_1, y_3)}$
      $\hat\lambda_{31} = \sqrt{\operatorname{cor}(y_1, y_3)\,\operatorname{cor}(y_2, y_3) / \operatorname{cor}(y_1, y_2)}$
  • 52. Inference in latent variable models under simple random sampling
  • 59. Data generating process: Model (superpopulation) → Finite population → Sample
      Inference: Model ← Finite population ← Sample
      (Fuller, 2009).
  • 60. Superpopulation → Finite population of 100 subjects
      Superpopulation loadings: 0.707
      [Scatterplot matrix of y1, y2, y3 in the finite population. Corr(y1, y2): 0.442; Corr(y1, y3): 0.475; Corr(y2, y3): 0.321.]
      Loadings in the finite population: y1: 0.810, y2: 0.546, y3: 0.587
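
Applying the one-factor, three-indicator formulas from slide 51 to the three correlations on this slide approximately reproduces the loadings shown. The sketch assumes the correlations pair up as (y1, y2) = 0.442, (y1, y3) = 0.475, (y2, y3) = 0.321, which is read off the scatterplot-matrix layout; the function name is hypothetical.

```python
from math import sqrt


def one_factor_loadings(r12, r13, r23):
    """Closed-form loadings for a one-factor model with three indicators.

    Each loading is the square root of the product of the two correlations
    involving that indicator, divided by the remaining correlation.
    """
    return (
        sqrt(r12 * r13 / r23),  # loading of y1
        sqrt(r12 * r23 / r13),  # loading of y2
        sqrt(r13 * r23 / r12),  # loading of y3
    )


# Correlations as printed on the slide (pairing is an assumption, see above).
loadings = one_factor_loadings(0.442, 0.475, 0.321)
print([round(value, 2) for value in loadings])
```

The results agree with the slide's 0.810, 0.546, 0.587 to two decimals; the small differences in the third decimal come from using correlations that were already rounded.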
  • 61. Simple random sample (SRS) of 20 from finite pop.
      [Scatterplot matrix of y1, y2, y3. Correlations, overall and by group (1, 2): Cor(y1, y2): 0.442 (1: 0.425, 2: 0.568); Cor(y1, y3): 0.475 (1: 0.361, 2: 0.668); Cor(y2, y3): 0.321 (1: 0.258, 2: 0.543).]
      (Superpopulation loadings: 0.707)
      SRS factor loading estimates: y1: 0.836, y2: 0.679, y3: 0.800
  • 62. Inference from SRS to superpopulation: Superpopulation ← ← Sample
      [Same scatterplot matrix and correlations as slide 61; legend "simple.random": 1, 2.]
      Superpopulation loadings: $\lambda_{11}$: 0.707, $\lambda_{21}$: 0.707, $\lambda_{31}$: 0.707
      ← ← Avg. (sd) loading over 10,000 samples: $\hat\lambda_{11}$: 0.707 (0.125), $\hat\lambda_{21}$: 0.722 (0.127), $\hat\lambda_{31}$: 0.711 (0.122)
  • 63. Complex sampling affects latent variable modeling
  • 64. LVM: $m_n \to \hat\theta_n$. This means that:
      • bias in covariances/proportions (moments) leads to bias in LVM parameter estimates;
      • any across-sample variation in latent variable parameter estimates is entirely due to variation in the sample moments used to estimate them.
      • With more observed variables (moments), use maximum likelihood (ML) to get estimates, but the above still holds.
      • MLE: $\hat\theta_n = \arg\max_\theta L(\theta; \hat\mu_n)$
  • 68. LVM: $m_n \to \hat\theta_n$, so bias in $m_n$ means bias in $\hat\theta_n$.
      • One solution: correctly modeling all aspects of the sampling design.
      • Another solution: replacing the observed moments with design-consistent moments provides design-consistent estimates; this is "pseudo-maximum likelihood" (PML): $\hat\mu_n \to \hat\theta_n$
      • (A third solution: weighted least squares, with less than satisfactory results.)
      (Skinner et al., 1989, chapter 3)
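
The PML idea of swapping in design-consistent moments can be sketched for the covariance case. The code below is an illustration only: the data and inclusion probabilities are simulated, the function name is hypothetical, and normalizing the weights to sum to one is one common choice rather than necessarily the one used in the talk. It computes only the weighted moments; the model-fitting step that would consume them is not shown.

```python
import numpy as np


def weighted_covariance(y, weights):
    """Covariance matrix with each case weighted by 1 / pi_i.

    Weights are rescaled to sum to one (an assumption of this sketch).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = w @ y                      # weighted mean vector
    centered = y - mu
    return (centered * w[:, None]).T @ centered


rng = np.random.default_rng(1)
y = rng.normal(size=(200, 3))         # hypothetical data: 200 cases, 3 variables
pi = rng.uniform(0.1, 0.9, size=200)  # hypothetical inclusion probabilities

S_w = weighted_covariance(y, 1.0 / pi)

# Cross-check against NumPy's own weighted covariance.
print(np.allclose(S_w, np.cov(y, rowvar=False, aweights=1.0 / pi, ddof=0)))
```

In a PML analysis, a matrix like `S_w` would take the place of the ordinary sample covariance matrix in the usual fitting function.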
  • 73. The variance of the PMLE is obtained by a sandwich (linearization) estimate. This in turn depends on the variance of the design-consistent moment estimates (the "meat").
      $\operatorname{var}(\hat\theta_n) = (\Delta^T V \Delta)^{-1} \Delta^T V \cdot \operatorname{var}(\hat\mu_n) \cdot V \Delta (\Delta^T V \Delta)^{-1}$
      $V$: depends on distributional assumptions (= ML)
      $\Delta$: depends on the specific model (= LVM)
      $\operatorname{var}(\hat\mu_n)$: depends on the variance of means/proportions/covariances under complex sampling
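
The sandwich formula is a direct matrix product, as the following sketch shows. All matrices here are random stand-ins, not quantities from any real model: `delta` plays the role of $\Delta$, `V` of the weight matrix, and `gamma` of $\operatorname{var}(\hat\mu_n)$; the function name is hypothetical.

```python
import numpy as np


def sandwich_variance(delta, V, gamma):
    """(D'VD)^-1 D'V * gamma * VD (D'VD)^-1, with gamma = var of the moments."""
    bread = np.linalg.inv(delta.T @ V @ delta)
    return bread @ delta.T @ V @ gamma @ V @ delta @ bread


rng = np.random.default_rng(2)
delta = rng.normal(size=(6, 3))  # hypothetical Jacobian: 6 moments, 3 parameters
A = rng.normal(size=(6, 6))
V = A @ A.T + 6 * np.eye(6)      # hypothetical symmetric positive-definite matrix

# Special case: if var(moments) equals V^-1, the sandwich collapses to the
# "bread" alone.
naive = np.linalg.inv(delta.T @ V @ delta)
collapsed = sandwich_variance(delta, V, np.linalg.inv(V))
print(np.allclose(collapsed, naive))

# If the moments are twice as variable (say, because of clustering),
# the parameter variance doubles too.
inflated = sandwich_variance(delta, V, 2.0 * np.linalg.inv(V))
print(np.allclose(inflated, 2.0 * naive))
```

The two checks show why only the "meat" needs to change under complex sampling: $V$ and $\Delta$ stay as they are, and any inflation in the variance of the moments passes straight through to the parameter estimates.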
  • 74. pseudo-
      1. supposed or purporting to be but not really so; false; not genuine: pseudonym | pseudoscience.
      2. resembling or imitating: pseudohallucination | pseudo-French.
      ORIGIN from Greek pseudēs ‘false,’ pseudos ‘falsehood.’
      Source: New Oxford American Dictionary
  • 79. Pseudo-ML
    Why ML?
      • Consistently estimate parameters aggregated over clusters and strata;
      • Estimates "MLE that would be obtained by fitting the model to the population data".
    Why pseudo?
      • Not exactly equal to the MLE obtained by correctly modeling all aspects of the sampling design;
      • Not asymptotically optimal.
    Why PML?
      • Aggregate parameters may be of interest;
      • No assumptions/modeling on design necessary.
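A minimal sketch of the "pseudo" idea, assuming the common formulation in which the PML estimate maximizes a log-likelihood weighted by inverse inclusion probabilities. That formulation, the normal model, and all names and numbers below are illustrative assumptions rather than material from the slides. For a normal model the maximizer is simply the weighted mean and weighted variance.

```python
import numpy as np

def weighted_loglik(mu, sigma2, y, w):
    """Weighted (pseudo) log-likelihood of a normal model."""
    return np.sum(w * (-0.5 * np.log(2 * np.pi * sigma2)
                       - 0.5 * (y - mu) ** 2 / sigma2))

def pml_normal(y, w):
    """Closed-form maximizer of the weighted normal log-likelihood."""
    mu = np.sum(w * y) / np.sum(w)
    sigma2 = np.sum(w * (y - mu) ** 2) / np.sum(w)
    return mu, sigma2

# Hypothetical sample values and inclusion probabilities.
y = np.array([1.2, 0.4, 2.5, 3.1, 0.9])
pi = np.array([0.10, 0.10, 0.40, 0.40, 0.20])
w = 1.0 / pi  # sampling weights

mu_hat, sigma2_hat = pml_normal(y, w)
print(mu_hat, sigma2_hat)
```

Units with a small inclusion probability stand in for many population units, so they count more heavily in the estimate.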
  • 80. The effect of clustering
  • 81. Cluster sample of 20 from finite population
    [Figure: scatterplot matrix of y1, y2, y3 with panel correlations (overall; group 1; group 2): 0.442 (0.412; 0.49), 0.475 (0.504; 0.455), 0.321 (0.352; 0.205)]
    (Superpopulation loadings: 0.707)
    Cluster sample loading estimates: y1: 0.997, y2: 0.491, y3: 0.456
  • 82. Superpopulation inference from SRS to superpopulation
    Superpopulation ← Sample
    [Figure: scatterplot matrix of y1, y2, y3, grouped by "simple.random" (1, 2), with panel correlations (overall; group 1; group 2): 0.442 (0.425; 0.568), 0.475 (0.361; 0.668), 0.321 (0.258; 0.543)]
    Superpopulation loadings: $\lambda_{11}$: 0.707, $\lambda_{21}$: 0.707, $\lambda_{31}$: 0.707
    Avg. (sd) loading over 10,000 samples: $\hat{\lambda}_{11}$: 0.665 (0.157), $\hat{\lambda}_{21}$: 0.699 (0.140), $\hat{\lambda}_{31}$: 0.703 (0.145)
  • 83. The effect of cluster sampling on factor analysis

                        λ̂11             λ̂21             λ̂31
                        avg   (sd)      avg   (sd)      avg   (sd)
    Population:         0.707           0.707           0.707
    SRS:                0.707 (0.125)   0.722 (0.127)   0.711 (0.122)
    Cluster sample:     0.665 (0.157)   0.699 (0.140)   0.703 (0.145)
    deft                1.26            1.10            1.19
    deff                1.58            1.22            1.41
    % var. increase     58%             22%             41%
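The deft and deff rows of the table on slide 83 follow directly from the two rows of standard deviations. A short sketch of the arithmetic (variable names are illustrative):

```python
import numpy as np

# Standard deviations of the loading estimates from the table on slide 83.
sd_srs = np.array([0.125, 0.127, 0.122])      # simple random sampling
sd_cluster = np.array([0.157, 0.140, 0.145])  # cluster sampling

deft = sd_cluster / sd_srs           # ratio of standard errors
deff = deft ** 2                     # ratio of variances
pct_var_increase = 100 * (deff - 1)  # variance increase in percent

print(np.round(deft, 2))
print(np.round(deff, 2))
print(np.round(pct_var_increase))
```

Rounded to two decimals, this reproduces the deft, deff and percentage rows reported on the slide.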
  • 87. Design effects' deftness
      • "Design effect" or deff $= \operatorname{var}_{\text{clus}}(\hat{\theta}) / \operatorname{var}_{\text{srs}}(\hat{\theta})$ (Kish, 1965);
      • deff is the factor by which the variance increases relative to a simple random sampling design;
      • deft $= \sqrt{\text{deff}}$ is the corresponding factor for standard errors;
      • In practice deff/deft have to be estimated, and we use the sandwich estimator of variance.
    Useful for:
      • Seeing to what extent it makes a difference to take complex sampling into account;
      • Identifying parameters that are more or less affected;
      • Sample size and power calculations.
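For the sample size use case, a design effect is commonly converted into an effective sample size, n / deff. The sketch below assumes that standard conversion; the function names and the sample sizes are hypothetical, and the deff value 1.58 is the one reported for the first loading on slide 83.

```python
import math

def deft_from_deff(deff):
    """Standard-error inflation factor implied by a design effect."""
    return math.sqrt(deff)

def effective_sample_size(n, deff):
    """Size of a simple random sample with the same precision as n complex-sample units."""
    return n / deff

def required_sample_size(n_srs, deff):
    """Complex-sample size needed to match the precision of an SRS of size n_srs."""
    return n_srs * deff

deff = 1.58
print(deft_from_deff(deff))
print(effective_sample_size(1000, deff))
print(required_sample_size(500, deff))
```

Because deff differs between parameters, the effective sample size is parameter-specific too.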
  • 88. The effect of unequal probabilities of selection
  • 89. Sampling with probability correlated with factor x
  • 90. Sampling with probability correlated with $x^2$
  • 92. The effect of unequal selection probabilities on factor loadings

                        λ̂11             λ̂21             λ̂31
                        avg   (sd)      avg   (sd)      avg   (sd)
    Population:         0.707           0.707           0.707
    SRS:                0.707 (0.125)   0.722 (0.127)   0.711 (0.122)

    Selection probability proportional to latent factor x:
    Unweighted:         0.679 (0.137)   0.683 (0.138)   0.692 (0.137)
      Bias / deft       -4%   1.13      -3%   1.12      -2%   1.12
    Weighted:           0.687 (0.143)   0.698 (0.143)   0.703 (0.143)
      Bias / deft       -3%   1.18      -1%   1.16      -1%   1.17

    Selection probability proportional to x²:
    Unweighted:         0.845 (0.060)   0.842 (0.061)   0.843 (0.061)
      Bias / deft       20%   0.495     19%   0.492     19%   0.497
    Weighted:           0.750 (0.139)   0.739 (0.141)   0.737 (0.137)
      Bias / deft       6%    1.149     5%    1.145     4%    1.123
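The qualitative pattern in the x² rows (unweighted loadings pushed upward, weighted loadings close to the population value) can be reproduced with a rough simulation. This is not the simulation behind the slide: the population size, the Poisson sampling scheme, the 0.1 offset in the size measure, and the closed-form three-indicator loading formula are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical finite population: one factor x, three indicators with
# standardized loading sqrt(0.5), roughly the 0.707 used on the slides.
N = 400_000
lam = np.sqrt(0.5)
x = rng.standard_normal(N)
y = lam * x[:, None] + np.sqrt(0.5) * rng.standard_normal((N, 3))

# Poisson sampling with inclusion probability proportional to x^2 + 0.1.
# The 0.1 offset is an assumption that keeps the weights bounded.
size = x ** 2 + 0.1
pi = np.minimum(1.0, 20_000 * size / size.sum())
sampled = rng.random(N) < pi
ys, w = y[sampled], 1.0 / pi[sampled]

def loadings(data, weights=None):
    """Standardized one-factor loadings from the three pairwise correlations."""
    c = np.cov(data, rowvar=False, aweights=weights)
    d = np.sqrt(np.diag(c))
    r = c / np.outer(d, d)
    return np.array([
        np.sqrt(r[0, 1] * r[0, 2] / r[1, 2]),
        np.sqrt(r[0, 1] * r[1, 2] / r[0, 2]),
        np.sqrt(r[0, 2] * r[1, 2] / r[0, 1]),
    ])

unweighted = loadings(ys)
weighted = loadings(ys, w)
print("unweighted:", unweighted.round(3))
print("weighted:  ", weighted.round(3))
```

Selecting on x² oversamples units with extreme factor scores, which inflates the factor variance in the sample and therefore the standardized loadings. Inverse-probability weights undo that distortion at the cost of extra variance.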
  • 96. When does weighting make a difference for point estimates of latent variable models?
      • (Usually) when weights represent omitted variable(s) that interact with observed or latent variables;
      • (Sometimes, e.g. IRT, LCA) when selection is correlated with a dependent variable;
      • When the model is strongly misspecified:
    [Figure: y1 plotted against x. True curve (black line), overall linear regression line (green), and regression lines from unequal selection/weights]
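The misspecification case can be illustrated with a noise-free toy example: a straight line is fitted to a curved relationship, once with equal weights and once with weights that favor large x. The true curve (a logarithm) and the weights are assumptions for this sketch, not the ones behind the figure.

```python
import numpy as np

x = np.linspace(0.5, 2.0, 200)
y = np.log(x)  # assumed true curve; any nonlinear curve would do

def wls_line(x, y, w):
    """Weighted least-squares line minimizing sum(w * residual**2)."""
    # np.polyfit applies its `w` argument to the unsquared residuals,
    # so sqrt(w) is passed to obtain weights w on the squared residuals.
    slope, intercept = np.polyfit(x, y, deg=1, w=np.sqrt(w))
    return slope, intercept

slope_equal, _ = wls_line(x, y, np.ones_like(x))
slope_unequal, _ = wls_line(x, y, x ** 2)  # hypothetical weights favoring large x

print(slope_equal, slope_unequal)
```

If the true relationship were linear, both fits would recover the same line. The slopes differ only because the linear model is wrong, so the "best" line depends on which region of x the weights emphasize.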
  • 97. Should you weight?
      1. Purpose of the analysis: analytical versus descriptive;
      2. Anticipated bias from an unweighted analysis;
      3. If unweighted analysis is unbiased, relative magnitude of inefficiency resulting from a weighted analysis;
      4. Whether variables are available and known to model the sample design instead of weighting the analysis.
    (Patterson et al., 2002, p. 727)
  • 98. Conclusions
      • Surveys are not usually simple random samples (or iid);
      • Sample design may bias the results of latent variable modeling (confidence intervals, significance tests, fit measures, parameter estimates);
      • Pseudo-maximum likelihood can take the design into account without additional assumptions;
      • Implemented in software. SEM: lavaan.survey in R.
    Pseudo-maximum likelihood:
      • Nonparametric correction for the design;
      • "Aggregate modeling";
      • Payment is in variance (efficiency);
      • Alternative is modeling the effects of strata, clusters, and the covariates behind the design: "disaggregate modeling".
  • 99. Thank you for your attention! Daniel Oberski doberski@uvt.nl http://daob.org/
  • 100. References
      • Fuller, W. A. (2009). Sampling statistics. Wiley, New York.
      • Heeringa, S., West, B., and Berglund, P. (2010). Applied survey data analysis.
      • Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685.
      • Kish, L. (1965). Survey sampling. Wiley, New York.
      • Korn, E. and Graubard, B. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49(3):291–295.
      • Lumley, T. (2010). Complex surveys: a guide to analysis using R. Wiley.
      • Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4):558–625.
      • Patterson, B., Dayton, C., and Graubard, B. (2002). Latent class analysis of complex sample survey data. Journal of the American Statistical Association, 97(459):721–741.
      • Skinner, C., Holt, D., and Smith, T. (1989). Analysis of complex surveys. John Wiley & Sons.