Sample size for binary logistic prediction models:
Beyond events per variable criteria
Maarten van Smeden, PhD
Leiden University Medical Center

Senior researcher

MEMTAB 2018

Utrecht, July 3
Slides available at: https://www.slideshare.net/MaartenvanSmeden/presentations MEMTAB, Utrecht, July 3 2018
Sample size prediction modeling literature (2018)
Events per variable (EPV)
Critique

• Flimsy supporting evidence for 10 EPV rule [1]

• 50 EPV rule more realistic with traditional variable selection techniques [2]

• 5 EPV sufficient to reduce (average) overfitting after “modern” shrinkage [3]

• EPV only part of sample size story [4]

[1] van Smeden et al., BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3

[2] Steyerberg et al., Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000430)19:8<1059::AID-SIM412>3.0.CO;2-0 

[3] Pavlou et al., Stat Med, 2016, doi: 10.1002/sim.6782

[4] Ogundimu et al., JCE, 2016, doi: 10.1016/j.jclinepi.2016.02.031
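
To make the rule of thumb concrete: EPV is the number of events (conventionally the less frequent outcome class) divided by the number of candidate predictor parameters. The sketch below shows the arithmetic only; the function names and the illustrative inputs (8 candidate predictors, a target of 10 EPV) are assumptions for this example and do not come from the slides.

```python
import math


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of events / number of candidate predictor parameters."""
    return n_events / n_predictors


def n_for_target_epv(target_epv: float, n_predictors: int, events_fraction: float) -> int:
    """Total sample size at which the expected number of events
    reaches target_epv * n_predictors."""
    required_events = target_epv * n_predictors
    return math.ceil(required_events / events_fraction)


# Illustrative inputs only (not taken from the slides).
for fraction in (1 / 2, 1 / 4, 1 / 16):
    print(f"events fraction {fraction:.4f}: N = {n_for_target_epv(10, 8, fraction)}")
```

The same EPV maps to very different total sample sizes depending on the events fraction, which is one way to read the point that EPV is only part of the sample size story.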
EPV forgets about the intercept?
New sample size criterion: rMSPE
Root Mean Squared Prediction Error (rMSPE): 

standard deviation of out-of-sample probability prediction error

Rationale: since clinical prediction is about probability estimation, a
sample size criterion should be based on allowable error rates in these
estimates
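
A minimal sketch of how this quantity could be computed once "true" and estimated probabilities are available for the same individuals, for example in a simulation where the data-generating probabilities are known. The function name and the four example values are hypothetical.

```python
import math


def rmspe(true_probs, estimated_probs):
    """Root mean squared difference between 'true' and estimated probabilities."""
    if len(true_probs) != len(estimated_probs):
        raise ValueError("inputs must have the same length")
    if not true_probs:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - p_hat) ** 2 for p, p_hat in zip(true_probs, estimated_probs)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for four individuals in a validation set.
true_probs = [0.10, 0.20, 0.50, 0.80]
estimated_probs = [0.13, 0.16, 0.55, 0.80]
print(round(rmspe(true_probs, estimated_probs), 4))
```

In real data the "true" probabilities are unobservable, so this calculation is only directly available in simulation settings.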
*Coverage property not guaranteed: assuming errors are IID normal
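
One plausible reading of this footnote, assumed here rather than taken from the slides, is that an rMSPE value is turned into an approximate 95% region of plus or minus 1.96 × rMSPE around the predicted probability. The sketch below checks that reading under exactly the stated assumption (IID normal errors with mean zero); the function name and the rMSPE value of 0.05 are hypothetical.

```python
import random


def empirical_coverage(rmspe_value, n_errors=20000, seed=1):
    """Share of simulated prediction errors that fall within +/- 1.96 * rMSPE.

    ASSUMPTION: errors are IID normal with mean zero and standard
    deviation equal to rmspe_value.
    """
    rng = random.Random(seed)
    half_width = 1.96 * rmspe_value
    errors = [rng.gauss(0.0, rmspe_value) for _ in range(n_errors)]
    return sum(abs(e) <= half_width for e in errors) / n_errors


# Hypothetical rMSPE of 0.05, i.e. a region of roughly +/- 0.098.
print(empirical_coverage(0.05))
```

Because probabilities are bounded between 0 and 1, errors for predictions close to either bound cannot be exactly normal, which is one reason the coverage is approximate at best.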
Unfortunately no closed form solution for out-of-sample rMSPE
Simulation study
• 4,032 simulation conditions (factorial design)

simulation factors: EPV (3 to 50), number of candidate predictors (4 to 12),
events fraction (1/16 to 1/2), area under ROC curve (0.65 to 0.85), distribution
and correlation of predictors, number of noise variables

• 5,000 replications per condition, giving more than 20 million simulation runs (4,032 × 5,000 = 20.16 million)
• Each run: generate pairs of derivation data and validation data
(large, with 5,000 expected events) and develop + validate various
logistic prediction models

• Will focus on maximum likelihood logistic regression

• Simulation meta models: fit linear (Ridge) regression models to predict
simulation outcome (rMSPE) from simulation factors
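
A single simulation run of the kind described above can be sketched as follows. This is a simplified illustration and not the study's code: it assumes independent standard normal predictors, a correctly specified model, a hypothetical coefficient vector, and a fixed validation size instead of a validation set with 5,000 expected events. Maximum likelihood estimation is done with a plain Newton-Raphson loop in NumPy.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_ml(X, y, max_iter=25, tol=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def draw_data(rng, n, true_beta):
    """Draw predictors, true outcome probabilities and binary outcomes."""
    n_predictors = len(true_beta) - 1
    X = np.column_stack([np.ones(n), rng.standard_normal((n, n_predictors))])
    true_prob = sigmoid(X @ true_beta)
    y = rng.binomial(1, true_prob)
    return X, true_prob, y


def simulate_rmspe(rng, n_derivation, true_beta, n_validation=5000):
    """One run: derive a model, then compare estimated with true
    probabilities in an independent validation set."""
    X_dev, _, y_dev = draw_data(rng, n_derivation, true_beta)
    X_val, true_prob_val, _ = draw_data(rng, n_validation, true_beta)
    beta_hat = fit_logistic_ml(X_dev, y_dev)
    estimated_prob_val = sigmoid(X_val @ beta_hat)
    return float(np.sqrt(np.mean((true_prob_val - estimated_prob_val) ** 2)))


rng = np.random.default_rng(2018)
true_beta = np.array([-1.0, 0.5, 0.5, 0.5])  # hypothetical intercept and effects
for n in (100, 400, 1600):
    runs = [simulate_rmspe(rng, n, true_beta) for _ in range(20)]
    print(n, round(float(np.mean(runs)), 3))
```

Averaging over replications within each condition gives the expected rMSPE, which is the outcome the meta-models are then fitted to.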
Simulation meta models
rMSPE

• Meta-model with 3 (of 7) factors: N, events fraction and number of
(candidate) predictors: R² = 0.992
• (Meta-model with only EPV as factor: R² = 0.432)
https://mvansmeden.shinyapps.io/BeyondEPV/
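
The meta-modelling step can be illustrated with a closed-form ridge regression. Everything in the sketch below is an assumption made for illustration: the toy "simulation results" are generated inside the code with arbitrary coefficients, and the log-log form of the meta-model is a choice for this example. The slides state only that linear (Ridge) regression was used to predict rMSPE from the simulation factors, so the R² printed here has no relation to the values reported above.

```python
import numpy as np


def fit_ridge(X, y, penalty=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# TOY stand-in for a table of simulation conditions and outcomes.
# The generating coefficients below are arbitrary, not study results.
rng = np.random.default_rng(2018)
n_conditions = 500
N = rng.integers(100, 5000, n_conditions)
events_fraction = rng.choice([1 / 16, 1 / 8, 1 / 4, 1 / 2], n_conditions)
P = rng.integers(4, 13, n_conditions)
log_rmspe = (-0.5 * np.log(N) + 0.3 * np.log(events_fraction)
             + 0.5 * np.log(P) + rng.normal(0.0, 0.05, n_conditions))

# Meta-model: predict the (log) outcome from three (log) design factors.
X = np.column_stack([np.log(N), np.log(events_fraction), np.log(P)])
intercept, coef = fit_ridge(X, log_rmspe, penalty=1.0)
r2 = r_squared(log_rmspe, intercept + X @ coef)
print(np.round(coef, 2), round(float(r2), 3))
```

Once fitted, a meta-model of this kind can be inverted to find the N at which the predicted rMSPE drops below an allowable value, given the expected events fraction and the number of candidate predictors.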
In press
Thanks to Richard Riley for commenting on early draft
Final remarks
• 10 EPV prediction models can produce widely inaccurate probability
estimates

• New sample size criterion - based on rMSPE - could be accurately
approximated by predictable data characteristics

• Validation, analytical work, and extensions still need to be done

• Our new sample size calculation shiny-app is “Beta”; can be used to
approximate rMSPE for settings that stay close to our simulation
design (article in press)

• One sample size criterion probably isn’t always enough. Notably, low events
fraction settings may come with a low rMSPE and a high need for shrinkage
Final remarks
Binary logistic regression sample size recommendations

1. Think about allowable probability prediction error (e.g. in terms of 95%
coverage regions)

2. If you can, run a realistic simulation study

3. If you can’t do 2, use our shiny-app with caution to calculate minimal
sample size
https://mvansmeden.shinyapps.io/BeyondEPV/
Logistic prediction models
Schmidt et al., Schizo Bulletin, 2017, doi: 10.1093/schbul/sbw098
Damen et al., BMJ, 2017, doi: 10.1136/bmj.i2416
Collins et al., BMC MRM, 2014, doi: 10.1186/1471-2288-14-40
Collins et al., BMC Med, 2011, doi: 10.1186/1741-7015-9-103
Bouwmeester et al., PLoS Med, 2012, doi: 10.1371/journal.pmed.1001221
New sample size criterion
Use expected root Mean Squared Prediction Error (rMSPE)

Interpretation: standard deviation of expected out-of-sample probability
prediction error

$$\mathrm{rMSPE} = \sqrt{E\left[(\pi_i - \hat{\pi}_i)^2\right]}$$

where $\pi_i$ are the unobservable “true” probabilities that would have been
obtained had the prediction model been derived with the correct
functional form and an infinite sample size, and $\hat{\pi}_i$ are the estimated
probabilities from the derived model in a large external set of similar
individuals (“out-of-sample”).
Difference between estimated probability from a prediction model
when applied in large sample validation study vs “true” probability
obtained when the same model would have been derived from an
infinitely large sample

Mais conteúdo relacionado

Mais procurados

Improving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxImproving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxMaarten van Smeden
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwaresDr.ammara khakwani
 
Structural equation-models-introduction-kimmo-vehkalahti-2013
Structural equation-models-introduction-kimmo-vehkalahti-2013Structural equation-models-introduction-kimmo-vehkalahti-2013
Structural equation-models-introduction-kimmo-vehkalahti-2013Kimmo Vehkalahti
 
Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1Taddesse Kassahun
 
Introduction to Research Designs in Public Health.pdf
Introduction to Research Designs in Public Health.pdfIntroduction to Research Designs in Public Health.pdf
Introduction to Research Designs in Public Health.pdfAugustineGatimuNjugu
 
Clinical Healthcare Data Analytics
Clinical Healthcare Data AnalyticsClinical Healthcare Data Analytics
Clinical Healthcare Data Analyticsdansouk
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminarTejas Jagtap
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1Gautam Kumar
 
Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)Ahmed Negida
 
Data Analysis, Presentation and Interpretation of Data
Data Analysis, Presentation and Interpretation of DataData Analysis, Presentation and Interpretation of Data
Data Analysis, Presentation and Interpretation of DataRoqui Malijan
 
Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...Kazuki Yoshida
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxACSRM
 
Research design new ppt
Research design new pptResearch design new ppt
Research design new pptRekha Marbate
 
Systematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxSystematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxDr. Anik Chakraborty
 

Mais procurados (20)

Data Analysis
Data AnalysisData Analysis
Data Analysis
 
Improving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxImproving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradox
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwares
 
Structural equation-models-introduction-kimmo-vehkalahti-2013
Structural equation-models-introduction-kimmo-vehkalahti-2013Structural equation-models-introduction-kimmo-vehkalahti-2013
Structural equation-models-introduction-kimmo-vehkalahti-2013
 
Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1
 
Introduction to Research Designs in Public Health.pdf
Introduction to Research Designs in Public Health.pdfIntroduction to Research Designs in Public Health.pdf
Introduction to Research Designs in Public Health.pdf
 
Clinical Healthcare Data Analytics
Clinical Healthcare Data AnalyticsClinical Healthcare Data Analytics
Clinical Healthcare Data Analytics
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminar
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
 
Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)
 
Data Analysis, Presentation and Interpretation of Data
Data Analysis, Presentation and Interpretation of DataData Analysis, Presentation and Interpretation of Data
Data Analysis, Presentation and Interpretation of Data
 
Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...
 
SAMPLE SIZE, CONSENT, STATISTICS
SAMPLE SIZE, CONSENT, STATISTICSSAMPLE SIZE, CONSENT, STATISTICS
SAMPLE SIZE, CONSENT, STATISTICS
 
Survival analysis
Survival analysisSurvival analysis
Survival analysis
 
Presentation of data
Presentation of dataPresentation of data
Presentation of data
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptx
 
Research process
Research processResearch process
Research process
 
Research design new ppt
Research design new pptResearch design new ppt
Research design new ppt
 
Data analysis
Data analysisData analysis
Data analysis
 
Systematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxSystematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptx
 

Semelhante a Sample size for binary logistic prediction models: Beyond events per variable criteria

An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...recsysfr
 
March 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production ForecastingMarch 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production ForecastingDavid Fulford
 
Revealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘PerspectivesRevealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘PerspectivesSebastian Feuerstack
 
Big data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport modelsBig data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport modelsLuuk Brederode
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class QuantUniversity
 
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...Anna Fensel
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)yguarata
 
Story behind Microelectronic Circuits
Story behind Microelectronic CircuitsStory behind Microelectronic Circuits
Story behind Microelectronic CircuitsHoopeer Hoopeer
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfRTEFGDFGJU
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sanjeev Deshmukh
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...Martin Scharm
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docbutest
 
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
Subject:	Ex-post impact evaluations of energy efficiency policies in EuropeSubject:	Ex-post impact evaluations of energy efficiency policies in Europe
Subject: Ex-post impact evaluations of energy efficiency policies in EuropeLeonardo ENERGY
 
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...QuantUniversity
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...Martin Scharm
 
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...Stefan Bischof
 
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National LaboratoryRisk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National LaboratoryGlobal CCS Institute
 
2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabilisteCdiscount
 

Semelhante a Sample size for binary logistic prediction models: Beyond events per variable criteria (20)

An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
 
March 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production ForecastingMarch 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production Forecasting
 
Revealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘PerspectivesRevealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘Perspectives
 
Big data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport modelsBig data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport models
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
 
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
 
Story behind Microelectronic Circuits
Story behind Microelectronic CircuitsStory behind Microelectronic Circuits
Story behind Microelectronic Circuits
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdf
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
 
e:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMSe:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMS
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.doc
 
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
Subject:	Ex-post impact evaluations of energy efficiency policies in EuropeSubject:	Ex-post impact evaluations of energy efficiency policies in Europe
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
 
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
 
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
 
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National LaboratoryRisk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
 
2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste
 
ABB Scheduling.pdf
ABB Scheduling.pdfABB Scheduling.pdf
ABB Scheduling.pdf
 

Mais de Maarten van Smeden

Rage against the machine learning 2023
Rage against the machine learning 2023Rage against the machine learning 2023
Rage against the machine learning 2023Maarten van Smeden
 
A gentle introduction to AI for medicine
A gentle introduction to AI for medicineA gentle introduction to AI for medicine
A gentle introduction to AI for medicineMaarten van Smeden
 
Improving epidemiological research: avoiding the statistical paradoxes and fa...
Improving epidemiological research: avoiding the statistical paradoxes and fa...Improving epidemiological research: avoiding the statistical paradoxes and fa...
Improving epidemiological research: avoiding the statistical paradoxes and fa...Maarten van Smeden
 
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...Maarten van Smeden
 
Guideline for high-quality diagnostic and prognostic applications of AI in he...
Guideline for high-quality diagnostic and prognostic applications of AI in he...Guideline for high-quality diagnostic and prognostic applications of AI in he...
Guideline for high-quality diagnostic and prognostic applications of AI in he...Maarten van Smeden
 
Prognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient healthPrognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient healthMaarten van Smeden
 
Algorithm based medicine: old statistics wine in new machine learning bottles?
Algorithm based medicine: old statistics wine in new machine learning bottles?Algorithm based medicine: old statistics wine in new machine learning bottles?
Algorithm based medicine: old statistics wine in new machine learning bottles?Maarten van Smeden
 
Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...Maarten van Smeden
 
Five questions about artificial intelligence
Five questions about artificial intelligenceFive questions about artificial intelligence
Five questions about artificial intelligenceMaarten van Smeden
 
Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19Maarten van Smeden
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyondMaarten van Smeden
 
Correcting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingCorrecting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingMaarten van Smeden
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Maarten van Smeden
 
Living systematic reviews: now and in the future
Living systematic reviews: now and in the futureLiving systematic reviews: now and in the future
Living systematic reviews: now and in the futureMaarten van Smeden
 

Mais de Maarten van Smeden (20)

Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
UMC Utrecht AI Methods Lab
UMC Utrecht AI Methods LabUMC Utrecht AI Methods Lab
UMC Utrecht AI Methods Lab
 
Rage against the machine learning 2023
Rage against the machine learning 2023Rage against the machine learning 2023
Rage against the machine learning 2023
 
A gentle introduction to AI for medicine
A gentle introduction to AI for medicineA gentle introduction to AI for medicine
A gentle introduction to AI for medicine
 
Associate professor lecture
Associate professor lectureAssociate professor lecture
Associate professor lecture
 
Improving epidemiological research: avoiding the statistical paradoxes and fa...
Improving epidemiological research: avoiding the statistical paradoxes and fa...Improving epidemiological research: avoiding the statistical paradoxes and fa...
Improving epidemiological research: avoiding the statistical paradoxes and fa...
 
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
 
Guideline for high-quality diagnostic and prognostic applications of AI in he...
Guideline for high-quality diagnostic and prognostic applications of AI in he...Guideline for high-quality diagnostic and prognostic applications of AI in he...
Guideline for high-quality diagnostic and prognostic applications of AI in he...
 
Predictimands
PredictimandsPredictimands
Predictimands
 
Prognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient healthPrognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient health
 
Algorithm based medicine
Algorithm based medicineAlgorithm based medicine
Algorithm based medicine
 
Algorithm based medicine: old statistics wine in new machine learning bottles?
Algorithm based medicine: old statistics wine in new machine learning bottles?Algorithm based medicine: old statistics wine in new machine learning bottles?
Algorithm based medicine: old statistics wine in new machine learning bottles?
 
Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...
 
Five questions about artificial intelligence
Five questions about artificial intelligenceFive questions about artificial intelligence
Five questions about artificial intelligence
 
Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
 
Correcting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingCorrecting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confounding
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
 
Living systematic reviews: now and in the future
Living systematic reviews: now and in the futureLiving systematic reviews: now and in the future
Living systematic reviews: now and in the future
 
Voorspelmodellen en COVID-19
Voorspelmodellen en COVID-19Voorspelmodellen en COVID-19
Voorspelmodellen en COVID-19
 


Sample size for binary logistic prediction models: Beyond events per variable criteria

  • 1. Sample size for binary logistic prediction models: Beyond events per variable criteria Maarten van Smeden, PhD Leiden University Medical Center Senior researcher MEMTAB 2018 Utrecht, July 3
  • 2. Slides available at: https://www.slideshare.net/MaartenvanSmeden/presentations MEMTAB, Utrecht, July 3 2018 Sample size prediction modeling literature (2018)
  • 3. Events per variable (EPV)
  • 4. Events per variable (EPV)
  • 5. Events per variable (EPV): critique
      • Flimsy supporting evidence for the 10 EPV rule [1]
      • A 50 EPV rule is more realistic with traditional variable selection techniques [2]
      • 5 EPV is sufficient to reduce (average) overfitting after “modern” shrinkage [3]
      • EPV is only part of the sample size story [4]
      [1] van Smeden et al., BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3
      [2] Steyerberg et al., Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000430)19:8<1059::AID-SIM412>3.0.CO;2-0
      [3] Pavlou et al., Stat Med, 2016, doi: 10.1002/sim.6782
      [4] Ogundimu et al., JCE, 2016, doi: 10.1016/j.jclinepi.2016.02.031
  • 6. EPV forgets about the intercept?
  • 7. New sample size criterion: rMSPE. Root Mean Squared Prediction Error (rMSPE): standard deviation of the out-of-sample probability prediction error. Rationale: since clinical prediction is about probability estimation, a sample size criterion should be based on the allowable error in these estimates.
  • 10. *Coverage property not guaranteed: assuming errors are IID normal
  • 12. Unfortunately no closed form solution for out-of-sample rMSPE
  • 15. Simulation study
      • 4,032 simulation conditions (factorial design). Simulation factors: EPV (3 to 50), number of candidate predictors (4 to 12), events fraction (1/16 to 1/2), area under the ROC curve (0.65 to 0.85), distribution and correlation of predictors, number of noise variables
      • 5,000 replications per condition, giving more than 20 million simulation runs
      • Each run: generate pairs of derivation data and validation data (large, with 5,000 expected events) and develop + validate various logistic prediction models
      • Will focus on maximum likelihood logistic regression
      • Simulation meta-models: fit linear (ridge) regression models to predict the simulation outcome (rMSPE) from the simulation factors
  • 16. Simulation meta-models for rMSPE
      • Meta-model with 3 (of 7) factors, namely N, events fraction and number of (candidate) predictors: R² = 0.992
      • (Meta-model with only EPV as factor: R² = 0.432)
      https://mvansmeden.shinyapps.io/BeyondEPV/
  • 18. In press. Thanks to Richard Riley for commenting on an early draft
  • 19. Final remarks
      • 10 EPV prediction models can produce wildly inaccurate probability estimates
      • The new sample size criterion, based on rMSPE, could be accurately approximated from predictable data characteristics
      • Validation, analytical work, and extensions still need to be done
      • Our new sample size calculation shiny-app is “Beta”; it can be used to approximate rMSPE for settings that stay close to our simulation design (article in press)
      • One sample size criterion probably isn’t always enough. Notably, settings with a low events fraction may come with a low rMSPE and a high need for shrinkage
  • 20. Final remarks: binary logistic regression sample size recommendations
      1. Think about the allowable probability prediction error (e.g. in terms of 95% coverage regions)
      2. If you can, run a realistic simulation study
      3. If you can’t do 2, use our shiny-app with caution to calculate the minimal sample size
  • 21. https://mvansmeden.shinyapps.io/BeyondEPV/
  • 24. Logistic prediction models: Schmidt et al., Schizo Bulletin, 2017, doi: 10.1093/schbul/sbw098; Damen et al., BMJ, 2017, doi: 10.1136/bmj.i2416; Collins et al., BMC MRM, 2014, doi: 10.1186/1471-2288-14-40; Collins et al., BMC Med, 2011, doi: 10.1186/1741-7015-9-103; Bouwmeester et al., PLoS Med, 2012, doi: 10.1371/journal.pmed.1001221.
  • 25. New sample size criterion: use the expected root Mean Squared Prediction Error (rMSPE), $\mathrm{rMSPE} = \sqrt{E[(\pi_i - \hat{\pi}_i)^2]}$. Interpretation: standard deviation of the expected out-of-sample probability prediction error. Here $\pi_i$ are the unobservable “true” probabilities that would have been obtained had the prediction model been derived with the correct functional form and an infinite sample size; $\hat{\pi}_i$ are the estimated probabilities from the derived model in a large external set of similar individuals (“out-of-sample”).
  • 26. rMSPE is based on the difference between the estimated probability from a prediction model when applied in a large-sample validation study and the “true” probability that would be obtained if the same model were derived from an infinitely large sample
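
Because the slides note that out-of-sample rMSPE has no closed form solution, it has to be approximated by simulation. The following is a minimal Python sketch of that idea, not the authors' simulation code. It assumes a correctly specified model with independent standard normal predictors, so the data-generating probabilities can stand in for the “true” probabilities. The function names, the coefficient vector `beta_true`, and the fixed validation size `n_val` (rather than a set sized to 5,000 expected events) are all illustrative assumptions.

```python
import numpy as np


def fit_logistic_ml(X, y, n_iter=25, tol=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson.

    Illustrative only: no handling of (quasi-)separation.
    """
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30, 30)       # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Xd.T @ (y - p)                   # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.lstsq(hess, grad, rcond=None)[0]
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_prob(beta, X):
    """Predicted probabilities for coefficients beta (intercept first)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))


def rmspe(true_prob, est_prob):
    """Root mean squared difference between true and estimated probabilities."""
    diff = np.asarray(true_prob, dtype=float) - np.asarray(est_prob, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def simulate_rmspe(n_dev, beta_true, n_val=20000, n_rep=50, seed=1):
    """Average out-of-sample rMSPE over n_rep simulated derivation sets.

    Assumption: predictors are independent standard normal and the fitted
    model has the same functional form as the data-generating model.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.asarray(beta_true, dtype=float)
    k = len(beta_true) - 1
    X_val = rng.standard_normal((n_val, k))     # large validation set
    pi_val = predict_prob(beta_true, X_val)     # "true" probabilities
    results = []
    for _ in range(n_rep):
        X_dev = rng.standard_normal((n_dev, k))
        y_dev = rng.binomial(1, predict_prob(beta_true, X_dev))
        beta_hat = fit_logistic_ml(X_dev, y_dev)
        results.append(rmspe(pi_val, predict_prob(beta_hat, X_val)))
    return float(np.mean(results))
```

Calling `simulate_rmspe(n_dev, beta_true)` for a grid of derivation sample sizes gives a rough curve of expected rMSPE against N for one assumed scenario; the smallest N whose rMSPE falls below the allowable prediction error is then a candidate minimal sample size under those assumptions.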