Bootstrap estimation of variance from ROC curve analysis of NHANES complex survey data

The secondary data analyst is fundamentally hindered from implementing bootstrap variance estimation for complex survey data because the information needed to adjust bootstrap replicate weights for post-stratification and non-response is usually not publicly available. Taking this as a given, we ignored post-stratification and non-response weight adjustments, implemented the replicate weight adjustments described in Rust & Rao 1996, and then used the bootstrap (1000 replicates) to estimate a simple weighted statistic (the sum) of the tobacco smoke biomarker NNAL in urine from complex survey data (NHANES), for comparison with the Taylor series linearization estimate for the full survey sample. The bootstrap and Taylor series estimates were very close (CV = 0.52%). We therefore used the bootstrap to estimate variances for the optimal cutpoint and c-index from ROC curve analysis of NHANES urinary NNAL data to discriminate smokers from non-smokers. The optimal cutpoint was 19.92 ng/g Cr NNAL [bootstrap CI 19.77 to 20.08; CV = 12.39%] and the c-index (equivalent to the area under the ROC curve) was 0.98978 [bootstrap CI 0.98970 to 0.98985; CV = 0.11%].
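The replicate-weight bootstrap summarized in this abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' SAS code: it assumes the Rao & Wu (1988) rescaling bootstrap with n_h − 1 PSUs drawn with replacement in each stratum (the form of the poster's replicate-weight formula), applies no post-stratification or non-response re-adjustment, and uses the with-replacement linearization formula for the variance of a weighted total. The function names and the synthetic toy design are hypothetical; nothing here reproduces NHANES results.

```python
import numpy as np


def rao_wu_replicate_weights(w, stratum, psu, n_reps=1000, seed=0):
    """Rescaling bootstrap replicate weights (Rao & Wu 1988) with m_h = n_h - 1.

    In each stratum h, n_h - 1 PSUs are drawn with replacement from the n_h
    sampled PSUs; w_hij^(t) = w_hij * n_h / (n_h - 1) * r_hi^(t), where
    r_hi^(t) is the number of times PSU i was drawn (0 if never drawn).
    No post-stratification or non-response re-adjustment is applied.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    reps = np.empty((n_reps, w.size))
    for h in np.unique(stratum):
        in_h = stratum == h
        psus_h, psu_idx = np.unique(psu[in_h], return_inverse=True)
        n_h = psus_h.size
        if n_h < 2:
            raise ValueError(f"stratum {h} needs at least 2 PSUs")
        # r[t, i]: times PSU i is drawn in replicate t (n_h - 1 equal-probability draws)
        r = rng.multinomial(n_h - 1, np.full(n_h, 1.0 / n_h), size=n_reps)
        reps[:, in_h] = w[in_h] * (n_h / (n_h - 1)) * r[:, psu_idx]
    return reps


def taylor_var_total(y, w, stratum, psu):
    """Linearization variance of a weighted total, PSUs treated as drawn with
    replacement within strata: sum_h n_h/(n_h-1) * sum_i (z_hi - zbar_h)^2,
    where z_hi is the weighted PSU total sum_j w_hij * y_hij."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    v = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        _, psu_idx = np.unique(psu[in_h], return_inverse=True)
        z = np.bincount(psu_idx, weights=w[in_h] * y[in_h])
        n_h = z.size
        v += n_h / (n_h - 1) * np.sum((z - z.mean()) ** 2)
    return v


def bootstrap_var(stats):
    """Bootstrap variance across replicate estimates (divisor B - 1)."""
    return np.var(stats, ddof=1)


# Synthetic toy design (NOT NHANES data): 15 strata x 2 PSUs x 20 respondents.
rng = np.random.default_rng(1)
stratum = np.repeat(np.arange(15), 40)
psu = np.tile(np.repeat([1, 2], 20), 15)
w = rng.uniform(500, 1500, size=stratum.size)
y = rng.lognormal(mean=1.0, sigma=1.0, size=stratum.size)

reps = rao_wu_replicate_weights(w, stratum, psu, n_reps=1000, seed=2)
v_boot = bootstrap_var(reps @ y)          # variance of the 1000 replicate sums
v_tsl = taylor_var_total(y, w, stratum, psu)
total = np.sum(w * y)
print(f"CV bootstrap = {np.sqrt(v_boot) / total:.2%}, CV Taylor = {np.sqrt(v_tsl) / total:.2%}")
```

With two PSUs per stratum, as in the toy design, each replicate keeps exactly one PSU per stratum at double weight, and for a linear statistic such as the sum the bootstrap variance equals the linearization variance in expectation. That is why agreement on the sum is a reasonable first check before moving on to nonlinear statistics such as ROC parameters.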

B. Rey deCastro, Yang Xia, Lee-Yang Wong; CDC National Center for Environmental Health, Atlanta, GA, United States
Email: cdcinfo@cdc.gov | Web: www.cdc.gov
National Center for Environmental Health, Division of Laboratory Sciences

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Introduction
• The secondary data analyst is hindered from implementing bootstrap variance estimation for complex survey data because the information needed to adjust bootstrap replicate weights for post-stratification and non-response is usually not publicly available.
• Taking this as a given, we implemented the replicate weight adjustments described in Rao & Wu 1988 in order to evaluate the effects of ignoring post-stratification and non-response weight adjustments in bootstrap variance estimation.

Materials & methods
• Bootstrap estimation (1000 replicates) of a simple weighted statistic (the sum) and its variance from replicate subsamples of complex survey data (NHANES), compared with Taylor series linearization for the full survey sample
• Bootstrap estimation (1000 replicates) of the optimal cutpoint and c-index from ROC curve analysis of NHANES urinary NNAL data to discriminate smokers from non-smokers
• Survey data: United States National Health and Nutrition Examination Survey 2007-2010 (NHANES; n = 20,686 participants) on urinary 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), a biomarker of tobacco smoke carcinogens
• Replicate weight adjustment (Rao & Wu 1988):

$$
w_{hij}^{(t)} =
\begin{cases}
0 & \text{if PSU } i \text{ in stratum } h \text{ is not selected in replicate } t \\
w_{hij} \times \dfrac{n_h}{n_h - 1} \times r_{hi}^{(t)} & \text{otherwise}
\end{cases}
$$

  where $w_{hij}$ is the full-sample weight of respondent $j$ in primary sampling unit (PSU) $i$ of stratum $h$, $n_h$ is the number of PSUs in stratum $h$, and $r_{hi}^{(t)}$ is the number of times PSU $i$ is selected in replicate $t$ when $n_h - 1$ PSUs are drawn with replacement within each stratum
• Software: SAS 9.3_M1 (SURVEYSELECT, SURVEYLOGISTIC, LOGISTIC)

Results
Sum: Bootstrap vs. Taylor series linearization
• The bootstrap and Taylor series linearization estimates of the sum of NNAL were within 0.80 percent of each other, with CVs of 9.06 and 8.95 percent, respectively
• Finding the bootstrap's performance for the sum acceptable, we proceeded to use the bootstrap to estimate variances for the optimal cutpoint and c-index from ROC curve analysis of NHANES urinary NNAL data to discriminate smokers from non-smokers
• The optimal bootstrap cutpoint was 19.92 ng/g Cr NNAL [bootstrap 2.5th to 97.5th percentile: 15.23 to 29.19; CV = 12.39%] and the c-index (equivalent to the area under the ROC curve) was 0.98978 [0.98760 to 0.99198; CV = 0.11%], with sensitivity 0.96243 [0.94866 to 0.97347; CV = 0.65%] and specificity 0.94920 [0.93764 to 0.96426; CV = 0.61%]
• The empirical ROC curve from an example bootstrap replicate (below) provides additional evidence that the ROC curve statistics are well estimated
[Figure: ROC curve analysis (empirical ROC curve from an example bootstrap replicate)]
• Sample-weighted ROC curve statistics estimated in this way from NHANES data are representative of the United States civilian, non-institutionalized population

Conclusions
• Bootstrap estimation of the sample-weighted sum compared favorably with the estimate by Taylor series linearization
• This result suggests that additional adjustment of bootstrap replicate weights for post-stratification and non-response would have had a negligible effect for these complex survey data
• This result also suggests that adjusting replicate weights for post-stratification and non-response in bootstrap estimation of variance for ROC curve parameters may likewise be negligible for these data
• Sample-weighted ROC curve analysis yielded an optimal cutpoint for urinary NNAL concentration (creatinine-adjusted) that discriminates tobacco smokers from non-smokers with excellent sensitivity and specificity in a sample representative of the United States civilian, non-institutionalized population

Future work
• Prospects are favorable for adapting this approach to domain (i.e., subpopulation) analysis
• Results from SAS will be compared with implementations of Rao & Wu 1988 in R and STATA

Further information
• SAS code is available from the authors upon request
rdecastro@cdc.gov, +1 770-488-0162, CDC NCEH, 4770 Buford Hwy, Mailstop F 47, Atlanta GA 30341-3717

Literature cited
1. Efron, B., and R. Tibshirani. 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability. New York: Chapman & Hall.
2. Rao, J. N. K., and C. F. J. Wu. 1988. Resampling inference with complex survey data. Journal of the American Statistical Association 83: 231–241.
3. Rao, J. N. K., C. F. J. Wu, and K. Yue. 1992. Some recent work on resampling methods for complex surveys. Survey Methodology 18(2): 209–217.
4. Rust, K. F., and J. N. K. Rao. 1996. Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research 5: 283–310.

Acknowledgements
The authors would like to thank the following people for their insightful suggestions for improving this analysis: Patricia A. Berglund, Steven G. Heeringa, Lisa Mirel, Van L. Parsons, J.N.K. Rao, Connie Sosnoff, and Bradley T. West
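The ROC quantities reported in the Results can be recomputed once per replicate weight vector, as in the following Python sketch. It is an illustration under explicit assumptions, not the authors' SAS LOGISTIC/SURVEYLOGISTIC procedure: the poster does not state which criterion defines the "optimal" cutpoint, so the weighted Youden index (sensitivity + specificity − 1) is assumed here; the c-index is taken as the weighted empirical area under the ROC curve with ties counted as one half; and CV is assumed to be the replicate standard deviation divided by the full-sample estimate. All function names are hypothetical, and `reps` can be any (B × n) replicate-weight matrix, such as rescaling-bootstrap weights.

```python
import numpy as np


def weighted_auc(score, is_case, w):
    """Sample-weighted c-index (empirical AUC): the weighted share of
    (case, non-case) pairs in which the case scores higher; ties count 1/2."""
    score, w = np.asarray(score, float), np.asarray(w, float)
    is_case = np.asarray(is_case, bool)
    order = np.argsort(score[~is_case])
    neg_s, neg_w = score[~is_case][order], w[~is_case][order]
    cum = np.concatenate(([0.0], np.cumsum(neg_w)))  # cum[k] = weight of first k non-cases
    pos_s, pos_w = score[is_case], w[is_case]
    lo = np.searchsorted(neg_s, pos_s, side="left")   # non-cases strictly below
    hi = np.searchsorted(neg_s, pos_s, side="right")  # non-cases at or below
    credit = cum[lo] + 0.5 * (cum[hi] - cum[lo])
    return np.sum(pos_w * credit) / (pos_w.sum() * neg_w.sum())


def youden_cutpoint(score, is_case, w):
    """Cutpoint maximizing weighted sensitivity + specificity - 1 (assumed
    criterion), classifying a respondent as a case when score >= cutpoint."""
    score, w = np.asarray(score, float), np.asarray(w, float)
    is_case = np.asarray(is_case, bool)
    cuts = np.unique(score[w > 0])  # candidate cutpoints: observed values with weight

    def weight_at_or_above(s, ws):
        order = np.argsort(s)
        s, ws = s[order], ws[order]
        tail = np.concatenate((np.cumsum(ws[::-1])[::-1], [0.0]))  # tail[k] = sum ws[k:]
        return tail[np.searchsorted(s, cuts, side="left")]

    pos_w, neg_w = w[is_case], w[~is_case]
    sens = weight_at_or_above(score[is_case], pos_w) / pos_w.sum()
    spec = 1.0 - weight_at_or_above(score[~is_case], neg_w) / neg_w.sum()
    k = np.argmax(sens + spec - 1.0)
    return cuts[k], sens[k], spec[k]


def replicate_summary(estimate, replicate_estimates):
    """2.5th/97.5th percentiles of the replicates and CV = replicate SD / estimate."""
    r = np.asarray(replicate_estimates, float)
    lo, hi = np.percentile(r, [2.5, 97.5])
    return lo, hi, np.std(r, ddof=1) / estimate


def bootstrap_roc(score, is_case, w, reps):
    """Full-sample c-index and cutpoint, each as (estimate, 2.5%, 97.5%, CV)
    over the rows of a (B x n) replicate-weight matrix `reps`."""
    auc = weighted_auc(score, is_case, w)
    cut = youden_cutpoint(score, is_case, w)[0]
    auc_r = [weighted_auc(score, is_case, wt) for wt in reps]
    cut_r = [youden_cutpoint(score, is_case, wt)[0] for wt in reps]
    return {"c_index": (auc, *replicate_summary(auc, auc_r)),
            "cutpoint": (cut, *replicate_summary(cut, cut_r))}


# Toy usage with synthetic data (not NHANES): cases tend to score higher.
rng = np.random.default_rng(0)
is_case = rng.random(400) < 0.3
score = rng.lognormal(np.where(is_case, 4.0, 1.0), 1.0)
w = rng.uniform(500, 1500, 400)
print(weighted_auc(score, is_case, w), youden_cutpoint(score, is_case, w))
```

Observations whose PSU was not drawn carry zero replicate weight, so they drop out of both statistics automatically, and each function costs O(n log n) per replicate because it relies on sorting and `np.searchsorted` rather than comparing all case/non-case pairs.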
