A measure to evaluate latent variable model fit by sensitivity analysis

Latent variable models involve restrictions on the data that can be formulated in terms of "misspecifications": restrictions with a model-based meaning. Examples include zero cross-loadings and local dependencies, as well as “measurement invariance” or “differential item functioning”. If incorrect, misspecifications can potentially disturb the main purpose of the latent variable analysis—seriously so in some cases.
Recently, I proposed to evaluate whether a particular analysis at hand is such a case or not.
To do this, I define a measure based on the likelihood of the restricted model that approximates the change in the parameters of interest if the misspecification were freed, the EPC-interest. The main idea is to examine the EPC-interest and free those misspecifications that are “important” while ignoring those that are not. I have implemented the EPC-interest in the lavaan software for structural equation modeling and the Latent Gold software for latent class analysis.
This approach can resolve several problems and inconsistencies in the current practice of model fit evaluation used in latent variable analysis, something I illustrate using analyses from the “measurement invariance” literature and from item response theory.

Published in: Science

A measure to evaluate latent variable model fit by sensitivity analysis

  1. A measure to evaluate latent variable model fit by sensitivity analysis
     Daniel Oberski, Department of Methodology and Statistics
     Dept of Statistics, Leiden University
     Latent variable model fit by sensitivity analysis, Daniel Oberski
  2. Latent variable models: what do they assume and what are they good for?
  3. [Path diagram: latent variable ξ and indicators y1, y2, ..., yJ]
     \[ p(y) = \sum_{\xi} p(\xi) \prod_{j=1}^{J} p(y_j \mid \xi) \]
  4. [Path diagram: latent variable ξ and indicators y1, y2, ..., yJ]
     \[ p(y) = \sum_{\xi} p(\xi)\, p(y_1, y_2 \mid \xi) \prod_{j=3}^{J} p(y_j \mid \xi) \]
  5. Example
     Goal: estimate false positives and false negatives in four diagnostic tests for C. trachomatis infection:
     • y1: Ligase chain reaction (LCR) test (Yes/No);
     • y2: Polymerase chain reaction (PCR) test (Yes/No);
     • y3: DNA probe test (DNAP) (Yes/No);
     • y4: Culture (CULT) (Yes/No).
     Tool: 2-latent class model (diseased or non-diseased).
     (Original data from Dendukuri et al. 2009)
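As a rough illustration of what the two-class tool computes, the sketch below evaluates the local independence formula from slide 3 for every response pattern of four binary tests. All parameter values (`class_prob`, `pos_prob`) are made up for illustration and are not estimates from the Dendukuri et al. data.

```python
import itertools
import numpy as np

# Hypothetical parameters (NOT estimates from the original data).
class_prob = np.array([0.1, 0.9])  # p(xi): diseased, non-diseased
# p(y_j = 1 | xi): rows are classes, columns are the four tests.
pos_prob = np.array([
    [0.95, 0.90, 0.85, 0.80],  # diseased: sensitivities
    [0.02, 0.03, 0.01, 0.01],  # non-diseased: false-positive rates
])


def pattern_probability(y, class_prob, pos_prob):
    """p(y) = sum_xi p(xi) * prod_j p(y_j | xi) under local independence."""
    y = np.asarray(y)
    # Probability of each observed item response within each class.
    item_prob = np.where(y == 1, pos_prob, 1.0 - pos_prob)
    return float(np.sum(class_prob * item_prob.prod(axis=1)))


patterns = list(itertools.product([0, 1], repeat=4))
probs = {y: pattern_probability(y, class_prob, pos_prob) for y in patterns}
total = sum(probs.values())  # the 16 pattern probabilities sum to one
```

A local dependence between two tests, as on slide 4, would replace the product of their two item probabilities with a joint probability for the pair within each class.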
  6. Assume: [path diagram of ξ and y1, y2, ..., yJ]
     But really: [alternative path diagram of the same variables]
     What difference does it make for the goal: false positives and false negatives? (simulation by Van Smeden et al., submitted)
  7. [Path diagram: latent variable ξ, indicators y1, y2, ..., yJ, and covariate x]
     \[ p(y) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi) \]
  8. [Path diagram: latent variable ξ, indicators y1, y2, ..., yJ, and covariate x]
     \[ p(y) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi, x) \]
  9. Example
     Goal: estimate gender differences in "valuing Stimulation".
     Response scale: (1) Very much like me; (2) Like me; (3) Somewhat like me; (4) A little like me; (5) Not like me; (6) Not like me at all.
     • impdiff: (S)he looks for adventures and likes to take risks. (S)he wants to have an exciting life.
     • impadv: (S)he likes surprises and is always looking for new things to do. (S)he thinks it is important to do lots of different things in life.
     Tool: Structural Equation Model for European Social Survey data (n = 18519 men and 16740 women).
     (Original study by Schwarz et al. 2005)
  10. Assume: [path diagram of ξ, y1, y2, ..., yJ and x]
     But really (?): [alternative path diagram of the same variables]
     What difference does it make for the goal: true gender differences in values? (re-analysis of data by Oberski 2014)
     [Figure: latent mean difference estimate ± 2 s.e. (axis from -0.2 to 0.2, ranging from "Women value more" to "Men value more") for each "human value" factor (AC, PO, ST, SD, HE, CO, TR, SE, UN, BE), under two models: scalar invariance, and free intercept for 'Adventure'.]
  11. PROBLEM
     The original authors found that the conditional independence model fit the data "approximately" (p. 1013)...
     "Chi-square deteriorated significantly, Δχ²(19) = 3313, p < .001, but CFI did not change. Change in chi-square is highly sensitive with large sample sizes and complex models. The other indices suggested that scalar invariance might be accepted (CFI = .88, RMSEA = .04, CI = .039–.040, PCLOSE = 1.0)."
     ... but unfortunately this "acceptable" misspecification could reverse their conclusions!
  12. Numbers that indicate how well the model fits the data
     • Likelihood ratio vs. saturated
     • Information-based criteria: AIC, BIC, CAIC, ...
     • Bivariate residuals (Maydeu & Joe 2005; Oberski, Van Kollenburg & Vermunt 2013)
     • Score/Lagrange multiplier tests, "modification index", "expected parameter change" (EPC) (Saris, Satorra & Sörbom 1989; Oberski & Vermunt 2013; Oberski & Vermunt accepted)
     "Fit indices":
     • RMSEA: $\sqrt{\dfrac{(\chi^2/df) - 1}{N - 1}}$
     • CFI: $\dfrac{(\chi^2_{null} - df_{null}) - (\chi^2 - df)}{\chi^2_{null} - df_{null}}$
     • Lots of others: TLI, NFI, NNFI, RFI, IFI, RNI, RMR, SRMR1-3, GFI, AGFI, MFI, ECVI, ...
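The two fit-index formulas on this slide are simple enough to compute by hand. The sketch below does so for made-up chi-square values; the function names and inputs are illustrative, and truncating the RMSEA at zero when χ² < df is a common convention that is assumed here rather than taken from the slide.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt(((chi2 / df) - 1) / (N - 1)).

    Truncated at zero when chi2 < df (an assumed convention, not on the slide).
    """
    return math.sqrt(max(chi2 / df - 1.0, 0.0) / (n - 1))


def cfi(chi2, df, chi2_null, df_null):
    """CFI = [(chi2_null - df_null) - (chi2 - df)] / (chi2_null - df_null)."""
    return ((chi2_null - df_null) - (chi2 - df)) / (chi2_null - df_null)


# Hypothetical values, chosen only to exercise the formulas.
example_rmsea = rmsea(chi2=250.0, df=100, n=1001)
example_cfi = cfi(chi2=250.0, df=100, chi2_null=2000.0, df_null=120)
```

Both indices depend only on the overall chi-square, so neither says anything about whether a specific misspecification moves the parameter of interest.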
  13. What is the problem?
     • We do latent variable modeling with a goal in mind.
     • But the latent variable model might be misspecified.
     • The appropriate question: "will that affect my goal?"
     • The actual question: "do the data fit the model in the population" (LR) or "are the model and the data far apart relative to model complexity" (RMSEA etc.)
     What is the solution?
     Evaluate directly what effect possible misspecifications have on the goal of the analysis.
  14. How to evaluate directly what effect possible misspecifications have on the goal of the analysis.
  15. Two ideas to evaluate the effect of misspecifications
     (1) Try out all possible models with misspecifications, calculate the estimates of interest under these models and evaluate whether these are substantively different.
         Advantage: does the job.
         Disadvantage: there may be too many alternative models. Also: are applied researchers really going to do this?
     (2) Use EPC-interest: expected change in free parameters.
         Advantage: does the job without the need to estimate any alternative models.
         Disadvantage: is an approximation (though a reasonable one).
  16. EPC-interest applied to Stimulation example
     • After fitting the full scalar invariance model, the effect size estimate of the sex difference in Stimulation is +0.214 (s.e. 0.0139).
     • But the EPC-interest of the equal "Adventure" item intercept is -0.243.
     • So the EPC-interest suggests that the conclusion can be reversed by freeing a misspecified scalar invariance restriction.
     • The actual change when freeing this intercept is very close to the EPC-interest: -0.235.
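The reversal follows from simple arithmetic on the numbers reported on this slide. A minimal sketch (variable names are illustrative):

```python
# Numbers as reported on the slide.
estimate = 0.214        # sex difference under full scalar invariance
epc_interest = -0.243   # expected change if the "Adventure" intercept is freed
actual_change = -0.235  # change observed after actually freeing it

predicted_estimate = estimate + epc_interest   # about -0.029
observed_estimate = estimate + actual_change   # about -0.021

# The sign of the estimated difference flips in both cases.
sign_flips = estimate > 0 and predicted_estimate < 0 and observed_estimate < 0
print(round(predicted_estimate, 3), round(observed_estimate, 3), sign_flips)
```

Both the predicted and the observed estimate end up slightly below zero, so the direction of the estimated gender difference changes.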
  17. EPC-interest: how does it work?
  18. • Let's say there is a restricted model whose purpose is to estimate its parameters, $\theta$, or some linear function of them such as a subselection, $P\theta$.
     • We could parameterize these restrictions as $\psi = 0$. For example, $\psi$ could be the direct effect of gender on "Adventure", or a loglinear dependence between DNA tests.
     • The maximum likelihood estimates are then
       \[ \hat\theta = \arg\max_{\theta} L(\theta, \psi = 0) \]
     Question: how much would $\hat\theta$ change if we freed $\psi$?
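  19. How much would $\hat\theta$ change if we freed $\psi$?
     The trick is to consider the estimate of $\theta$ we would get under $\psi \neq 0$; that is, $\tilde\theta = \arg\max L(\theta, \psi)$.
     As it turns out, we don't actually need $\tilde\theta$, since
     \[ \tilde\theta - \hat\theta = \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi} D^{-1} \left[ \frac{\partial L(\theta, \psi)}{\partial \psi} \bigg|_{\theta = \hat\theta} \right] + O(\delta'\delta), \]
     where $H$ is a Hessian, $D = \hat H_{\psi\psi} - \hat H_{\theta\psi}' \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi}$, and $\delta$ is the "overall wrongness" of the model, $(\psi', \theta' - \hat\theta')'$.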
  20. How much would $\hat\theta$ change if we freed $\psi$?
     Dropping the approximation term (assuming the model parameters are not "too far" from the truth), we get the approximation
     \[ \text{EPC-interest} = -P \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi}\, \text{EPC-self} \approx -P \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi} \left( \psi - \hat\psi \right) \]
     For those of you familiar with Structural Equation Modeling (or who attended my 2013 MBC2 talk), "EPC-self" is the usual "expected parameter change" in the fixed parameter vector, i.e. the size of the misspecification.
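To see the formula at work, here is a toy sketch that is not taken from the talk: a Gaussian linear regression with known unit error variance, where the restriction ψ = 0 omits a regressor `z`. The data-generating values are invented. Because this log-likelihood is exactly quadratic, the O(δ'δ) term vanishes and the EPC-interest reproduces the actual change in θ; in a real latent variable model it would only approximate it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)             # omitted regressor, correlated with x
y = 1.0 * x + 0.3 * z + rng.normal(size=n)   # hypothetical data-generating model

# Restricted model (psi = 0): y = theta * x + e, log-lik L = -0.5 * sum(e^2).
theta_hat = (x @ y) / (x @ x)

# Hessian blocks of L(theta, psi) = -0.5 * sum((y - theta*x - psi*z)^2).
H_tt = -(x @ x)
H_tp = -(x @ z)
H_pp = -(z @ z)
# Score with respect to psi, evaluated at the restricted estimates.
score_psi = z @ (y - theta_hat * x)

D = H_pp - H_tp * H_tp / H_tt
epc_self = -score_psi / D                    # expected change in psi itself
epc_interest = -(H_tp / H_tt) * epc_self     # expected change in theta (P = 1)

# Check against actually freeing psi and refitting.
X = np.column_stack([x, z])
theta_tilde, psi_tilde = np.linalg.lstsq(X, y, rcond=None)[0]
actual_change = theta_tilde - theta_hat
```

Here `epc_interest` is obtained from the restricted fit alone, which is the practical appeal of the approach: no alternative model has to be estimated.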
  21. Monte Carlo simulation: EPC-interest is a good approximation to the actual change in parameters of interest when freeing an equality restriction.
     Average over 200 replications:

     Δν1   n_g   EPC-self   Δα̂      Δα̂ bias   EPC-interest   EPC-interest bias
     0.1    50   0.064      0.240   -0.040     -0.034          0.005
     0.3    50   0.213      0.313   -0.113     -0.113         -0.001
     0.8    50   0.657      0.505   -0.305     -0.401         -0.096
     0.1   100   0.058      0.231   -0.031     -0.031          0.000
     0.3   100   0.203      0.323   -0.123     -0.109          0.014
     0.8   100   0.619      0.492   -0.292     -0.370         -0.077
     0.1   500   0.063      0.233   -0.033     -0.033          0.000
     0.3   500   0.208      0.307   -0.107     -0.112         -0.005
     0.8   500   0.598      0.501   -0.301     -0.349         -0.048
  22. Another example showcasing EPC-interest
  23. Ranking data in 48 WVS countries

     Set   Option #   M/P   Value wording
     A     1          M     A high level of economic growth
     A     2          M     Making sure this country has strong defense forces
     A     3          P     Seeing that people have more say about how things are done at their jobs and in their communities
     A     4          P     Trying to make our cities and countryside more beautiful
     B     1          M     Maintaining order in the nation
     B     2          P     Giving people more say in important government decisions
     B     3          M     Fighting rising prices
     B     4          P     Protecting freedom of speech
     C     1          M     A stable economy
     C     2          P     Progress toward a less impersonal and more humane society
     C     3          P     Progress toward a society in which ideas count more than money
     C     4          M     The fight against crime
  24. Figure: Graphical representation of the multilevel latent class regression model for (post)materialism measured by three partial ranking tasks. Observed variables are shown in rectangles while unobserved ("latent") variables are shown in ellipses.
  25. Latent class ranking model with 4 choices
     Each ranking set, for example set A:
     \[ P(A_{1ic} = a_1, A_{2ic} = a_2 \mid X_{ic} = x) = \frac{\omega_{a_1 x}}{\sum_{k} \omega_{kx}} \cdot \frac{\omega_{a_2 x}}{\sum_{k \neq a_1} \omega_{kx}}, \]
     where $\omega_{kx}$ is the "utility" of object $k$ for respondents in class $x$.
     Multilevel structure to account for the countries, using a group class variable $G$:
     \[ P(X_{ic} = x \mid Z_{1ic} = z_1, Z_{2ic} = z_2, G_c = g) = \frac{\exp(\alpha_x + \gamma_{1x} z_1 + \gamma_{2x} z_2 + \beta_{gx})}{\sum_{t} \exp(\alpha_t + \gamma_{1t} z_1 + \gamma_{2t} z_2 + \beta_{gt})} \]
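The ranking probability for one set can be sketched in a few lines. The utilities below are invented for illustration, and options are indexed from 0; the function simply multiplies the probability of the first choice among all four options by the probability of the second choice among the remaining three.

```python
import itertools


def ranking_probability(first, second, utilities):
    """P(first choice, second choice | class) for positive utilities omega_k."""
    if first == second:
        raise ValueError("first and second choice must differ")
    total = sum(utilities)
    p_first = utilities[first] / total
    # The second choice is made among the options left after the first.
    p_second = utilities[second] / (total - utilities[first])
    return p_first * p_second


# Hypothetical utilities of the four options in one set for one latent class.
utilities = [4.0, 3.0, 2.0, 1.0]
pairs = list(itertools.permutations(range(4), 2))
probs = {pair: ranking_probability(pair[0], pair[1], utilities) for pair in pairs}
total_prob = sum(probs.values())  # the 12 ordered pairs sum to one
```

With these utilities, ranking option 0 first and option 1 second has probability (4/10) × (3/6) = 0.2.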
  26. Multilevel latent class model with covariates for rankings
     \[ L(\theta) = P(A_1, A_2, B_1, B_2, C_1, C_2 \mid Z_1, Z_2) = \prod_{c=1}^{C} \sum_{G} P(G_c) \prod_{i=1}^{n_c} \sum_{X} P(X_{ic} \mid Z_{1ic}, Z_{2ic}, G_c)\, P(A_{1ic}, A_{2ic} \mid X_{ic})\, P(B_{1ic}, B_{2ic} \mid X_{ic})\, P(C_{1ic}, C_{2ic} \mid X_{ic}) \]
     Goal: estimate $\gamma$ (especially its sign).
     Possible problem: violations of scalar and metric measurement invariance (DIF), parameterized respectively as $\tau^*$ and $\lambda^*$.
     Solution: see if these matter for the sign of $\gamma$.
  27. Table: Full invariance multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of six sets of possible misspecifications (columns 5–10).

                                        EPC-interest for...
                                        τ*_jkg, ranking task           λ*_jkxg, ranking task
                      Est.     s.e.     1        2        3           1        2
     Class 1 GDP     -0.035   (0.007)  -0.013    0.021   -0.002       0.073    0.252
     Class 2 GDP     -0.198   (0.012)  -0.018   -0.035    0.015      -0.163   -0.058
     Class 1 Women    0.013   (0.001)  -0.006    0.002    0.000      -0.003    0.029
     Class 2 Women   -0.037   (0.001)   0.007   -0.003    0.002      -0.006   -0.013
  28. Table: Partially invariant multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of four sets of remaining possible misspecifications (columns 5–7 and 10).

                                        EPC-interest for non-invariance of...
                                        τ*_kg, ranking task            λ*_kxg, ranking task
                      Est.     s.e.     1        2        3           1        2        3
     Class 1 GDP     -0.127   (0.008)  -0.015   -0.003    0.002                         0.097
     Class 2 GDP      0.057   (0.011)  -0.043   -0.013    0.002                         0.161
     Class 1 Women    0.008   (0.001)  -0.002    0.000    0.002                         0.001
     Class 2 Women    0.020   (0.001)  -0.007   -0.001    0.002                         0.007
  29. Figure: Estimated probability of choosing each class as a function of the covariates of interest under the final model. [Panels: % women in parliament and GDP per capita; x-axis: covariate level, from minimum to maximum; y-axis: probability of class; lines: Mixed, Postmaterialist, Materialist.]
  30. [Figure: class posterior (axis from 0.0 to 0.8) for each country, labelled by country code from ARM to ZWE, in three panels (Class 1 "Materialist", Class 2 "Postmaterialist", Class 3 "Mixed"), plotted once against % women in parliament and once against ln(GDP per capita).]
  31. What has been gained by using EPC-interest: I am fairly confident here that there truly is "approximate measurement invariance", in the sense that any violations of measurement invariance do not bias the primary conclusions. I think attaining this goal is the main purpose of model fit evaluation.
  32. Conclusion
  33. Conclusion
     • Latent variable modeling is often performed for a purpose.
     • Model fit evaluation should then be done because violations of assumptions can disturb this purpose.
     • The EPC-interest was introduced to look into this.
     • It evaluates the change in the parameter(s) of interest that would result if a restriction that parameterizes a potential violation of assumptions were freed.
  34. Implemented in SEM software lavaan for R:
     • Oberski (2014). Evaluating Sensitivity of Parameters of Interest to Measurement Invariance in Latent Variable Models. Political Analysis, 22 (1).
     Implemented in LCA software Latent Gold:
     • Oberski, Vermunt & Moors (submitted). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Under review.
     • Oberski & Vermunt (2014). A model-based approach to goodness-of-fit evaluation in item response theory. Measurement, 11, 117–122.
     • Nagelkerke, Oberski, & Vermunt (accepted). Goodness-of-fit of Multilevel Latent Class Models for Categorical Data. Sociological Methodology.
     • Oberski & Vermunt (conditionally accepted). The Expected Parameter Change (EPC) for Local Dependence Assessment in Binary Data Latent Class Models. Psychometrika.
  35. Thank you for your attention!
     Daniel Oberski, doberski@uvt.nl
     See http://daob.nl/publications for full texts & code
  36. SEM regression coefficient example
     European Sociological Review 2008, 24(5), 583–599
  37. SEM regression coefficient example
     [Figure: regression coefficients (axis from -1.0 to 1.0) for 19 countries (Sweden, Denmark, Austria, Switzerland, Netherlands, Germany, Ireland, Spain, Norway, Hungary, Finland, Portugal, France, Belgium, Slovenia, United Kingdom, Greece, Czech Republic, Poland); panel labels: Conservation, Self-transcendence, ALLOW, NOCOND.]
  38. SEM regression coefficient example
     EPC-interest statistics of at least 0.1 in absolute value with respect to the latent variable regression coefficients.

                                            Metric invariance (loading) restriction "Conditions → Work skills" in...
                                            Slovenia   France    Hungary   Ireland
     EPC-interest w.r.t.:
       Conditions → Self-transcendence     -0.073     -0.092    -0.067     0.073
       Conditions → Conservation            0.144      0.139     0.123    -0.113
     SEPC-self                              0.610      0.692     0.759    -0.514
  39. SEM regression coefficient example
     What has been gained by using EPC-interest:
     • Full metric invariance model: "close fit";
     • EPC-interest still detects threats to cross-country comparisons of regression coefficients;
     • MI and EPC-self do not detect these particular misspecifications;
     • MI and EPC-self detect other misspecifications;
     • Looking at EPC-interest reveals that these do not affect the cross-country comparisons of regression coefficients.
