OCTRI PSS Simulations in R Seminar.pdf

DATE: SEPTEMBER 21, 2023
PRESENTED BY: ROBIN BAUDIER, MSPH, MFA, STAFF BIOSTATISTICIAN WITH THE BDP
Power and Sample Size Simulations in R
OCTRI Research Forum
TOPICS COVERED
Brief overviews of
• Power and Sample Size (PSS) concepts
• Monte Carlo Simulations
• Using Monte Carlo Simulations to calculate PSS
How to perform PSS simulations in R
• Simulating variables
• Using loops
• Looping tests on simulated datasets to calculate power
• Examples of complex PSS simulations
• Packages that can make things a little easier
POWER & SAMPLE SIZE CALCULATIONS
A VERY BRIEF OVERVIEW
FOR MORE INFORMATION ON THE BASICS OF PSS CALCULATIONS
• Check out the OCTRI PSS 101 Seminar, co-developed by Meike Niederhausen, PhD and Alicia Johnson, MPH, and presented most recently by Alicia Johnson in July:
  • RECORDING: https://echo360.org/media/bd0064c8-327c-4b56-8279-9558cd307125/public
  • SLIDES: https://drive.google.com/file/d/1691RWaDzzRkjbdhbfjfxO7yKESRUjuAn/view
Image adapted from: https://towardsdatascience.com/5-ways-to-increase-statistical-power-377c00dd0214
What is power?
Power (1 - β) is the probability of correctly rejecting the null hypothesis.
α is the probability of incorrectly rejecting the null hypothesis.
[Figure: null and alternative distributions with the critical value and the regions 1 - α and 1 - β marked]
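Statistical test example: one-tailed t-test
α = 0.05
Reject the null hypothesis if t > 1.645 (the large-sample critical value), with d.f. = n1 + n2 - 2
[Figure: null and alternative distributions of t, marking 0, the critical value 1.645, and the regions 1 - α and 1 - β]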
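As a rough illustration of the one-tailed test above, the critical value and power can be computed in base R. The group size, effect size, and SD below are illustrative assumptions, not values from the seminar. Note that 1.645 is the normal (large-sample) critical value; with finite degrees of freedom the t critical value is slightly larger.

```r
# Illustrative values (assumptions, not from the seminar)
n_per_group <- 30      # participants per group
alpha       <- 0.05    # one-tailed type I error rate
delta       <- 0.5     # hypothesized mean difference
sigma       <- 1       # common standard deviation

# Degrees of freedom for a two-sample t-test: n1 + n2 - 2
df <- 2 * n_per_group - 2

# One-tailed critical values
t_crit <- qt(1 - alpha, df = df)   # t distribution with df degrees of freedom
z_crit <- qnorm(1 - alpha)         # normal approximation, about 1.645

# Analytic power (1 - beta) for the same design
pw <- power.t.test(n = n_per_group, delta = delta, sd = sigma,
                   sig.level = alpha, type = "two.sample",
                   alternative = "one.sided")

cat("t critical value:", round(t_crit, 3), "\n")
cat("z critical value:", round(z_crit, 3), "\n")
cat("Power:", round(pw$power, 3), "\n")
```

When a closed-form calculation like this exists for the planned test, it is usually quicker than a simulation and makes a useful check on simulated results.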
Prediction example: bootstrapping ML model
How do you determine whether a machine learning (ML) model performs better than chance? You can create a null distribution by randomly permuting the outcome variable in your dataset a large number of times (e.g., 999) and running the ML model on each permuted dataset (strictly speaking, this is a permutation test).
H0: R² = 0
HA: R² > 0
p-value = 1 - (proportion of random R² < real-data model R²)
α = 0.05: reject H0 if the proportion of random R² values less than the model R² is ≥ 95%
[Figure: null distribution of R² from permuted data, with the ML model R² marked]
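The permutation approach described above can be sketched in a few lines of R. This is a minimal sketch under stated assumptions: the data are simulated, a simple linear model stands in for the ML model, and `model_r2` is a hypothetical helper name.

```r
set.seed(2023)

# Simulated example data (an assumption; substitute your own dataset)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Hypothetical helper: R-squared from a simple linear model,
# standing in for whatever ML model is being evaluated
model_r2 <- function(outcome, predictor) {
  summary(lm(outcome ~ predictor))$r.squared
}

observed_r2 <- model_r2(y, x)

# Null distribution: refit after randomly permuting the outcome
n_perm  <- 999
null_r2 <- replicate(n_perm, model_r2(sample(y), x))

# p-value as defined above: 1 - (proportion of permuted R2 below the observed R2)
p_value <- 1 - mean(null_r2 < observed_r2)

cat("Observed R2:", round(observed_r2, 3), "\n")
cat("Permutation p-value:", round(p_value, 3), "\n")
```

Permuting the outcome breaks any real association with the predictor, so the permuted R² values show what the model achieves by chance alone.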
PSS simulation example:
In PSS simulations, rather than modeling the null distribution as you do in bootstrapping, you simulate the HA distribution (with a specified N, effect size, SD, etc.) a large number of times and assess what proportion of statistical tests performed on the simulated data are able to detect the effect of interest.
What is the probability of correctly rejecting the null hypothesis (power) with the specified effect size, N, and α?
Power = proportion of models (performed on data simulated with HA distributions) with p-values < α
[Figure: HA models with p < α versus HA models with p ≥ α]
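A minimal sketch of this idea for a two-group comparison is shown here. The sample size, effect size, SD, and number of simulations are illustrative assumptions, and a two-sample t-test stands in for whatever test is actually planned.

```r
set.seed(2023)

# Design assumptions (illustrative values, not from the seminar)
n_per_group <- 50     # N per group
effect_size <- 0.5    # true mean difference under HA
sigma       <- 1      # standard deviation in each group
alpha       <- 0.05
n_sims      <- 2000   # number of simulated datasets

# Simulate one dataset under HA, run the test, and return its p-value
simulate_p <- function() {
  control   <- rnorm(n_per_group, mean = 0,           sd = sigma)
  treatment <- rnorm(n_per_group, mean = effect_size, sd = sigma)
  t.test(treatment, control)$p.value
}

p_values <- replicate(n_sims, simulate_p())

# Power = proportion of simulated tests with p < alpha
power_est <- mean(p_values < alpha)
cat("Estimated power:", round(power_est, 3), "\n")
```

For a design this simple, the estimate can be checked against `power.t.test(n = 50, delta = 0.5, sd = 1)`. The two should agree to within simulation error, which shrinks as `n_sims` grows.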
What if you don’t have good power?
Also need to know:
• What’s your data distribution and structure?
• What statistical test/model are you planning to use?
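To illustrate why these two questions matter, here is a small sketch in which the outcome is binary rather than continuous, so both the simulation step and the planned model change. The event probabilities and sample size are assumptions chosen only for illustration.

```r
set.seed(2023)
n <- 200

# Binary predictor (e.g., treatment arm); all values here are assumptions
group <- rbinom(n, size = 1, prob = 0.5)

# Binary outcome: event probability 0.30 in control, 0.50 in treatment
p_event <- ifelse(group == 1, 0.50, 0.30)
event   <- rbinom(n, size = 1, prob = p_event)

# The planned model must match the data: logistic regression for a binary outcome
fit     <- glm(event ~ group, family = binomial)
p_value <- summary(fit)$coefficients["group", "Pr(>|z|)"]

cat("p-value for group effect:", signif(p_value, 3), "\n")
```

Count, time-to-event, or clustered outcomes would call for different generators (for example `rpois`) and different models, but the pattern of simulating the data and then fitting the planned model stays the same.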
POWER & SAMPLE SIZE SIMULATIONS
WHEN, WHAT & HOW
WHEN SHOULD YOU USE A PSS SIMULATION?
• When there isn’t an easier way. Before you start down the road of power simulations, check whether any R packages exist for the power simulation you want to perform!
PSS R PACKAGES
https://med.und.edu/research/daccota/_files/pdfs/berdc_resource_pdfs/sample_size_r_module.pdf
WHAT IS A MONTE CARLO SIMULATION?
• Monte Carlo simulation (also known as the Monte Carlo method or multiple probability simulation) is a mathematical technique, developed in the 1940s by John von Neumann and Stanislaw Ulam, that is used to estimate the possible outcomes of an uncertain event (e.g., modeling nuclear fission to develop atomic bombs).
• Because the element of chance (similar to a game of roulette) is core to the modeling approach, it’s named after the well-known casino district in Monaco.
• Monte Carlo simulations are used in a wide range of fields, including financial risk assessment and long-term forecasting, estimating the duration or cost of projects, analyzing weather patterns and traffic flow, and many applications in biomedical research.
HOW DOES A MONTE CARLO FORECASTING SIMULATION WORK?
• Three basic steps:
  • Specify probability distributions of the independent variables (distribution, range, correlations, etc.)
  • Set up your hypothesized predictive model, including the dependent (outcome) variable and the independent variable(s) (predictors & covariates)
  • Run simulations of your model repeatedly, using generated random values of the independent variables every time to create a predicted value of the outcome. Do this until enough results are gathered to make a probability distribution of your forecasted outcome.
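The three forecasting steps above can be sketched with a toy project-cost forecast. Every distribution, parameter, and variable name here is an assumption made up for illustration.

```r
set.seed(2023)
n_sims <- 10000

# Step 1: distributions of the independent variables (assumed for illustration)
labor_hours <- rnorm(n_sims, mean = 400, sd = 50)   # hours of work
hourly_rate <- runif(n_sims, min = 40, max = 60)    # cost per hour
materials   <- rlnorm(n_sims, meanlog = log(5000), sdlog = 0.3)

# Step 2: hypothesized model linking the predictors to the outcome
total_cost <- labor_hours * hourly_rate + materials

# Step 3: summarize the probability distribution of the forecasted outcome
forecast <- quantile(total_cost, probs = c(0.05, 0.50, 0.95))
print(round(forecast))
```

Because R's random generators are vectorized, all 10,000 draws are made at once here. An explicit loop would give the same result and may be easier to extend when the model is more complex.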
HOW DOES A MONTE CARLO PSS SIMULATION WORK?
• Three basic steps:
  • Specify probability distributions of the independent and dependent variables (distribution, range, correlations, etc.) based on your hypothesized predictive model of the true relationship between variables.
  • Run simulations repeatedly, generating random values of the independent and dependent variables every time, and test your model’s ability to detect their true relationship at the specified N, effect size, and alpha. Do this repeatedly to assess what proportion of your simulated results can detect the true relationship between predictor and outcome (i.e., p-value < alpha). This is your power.
  • If you’re happy with your power from the previous step, skip this step. If your power is too low (e.g., < 0.80), increase your N or effect size. If your power is higher than you need, you can lower the N or effect size to find the minimum detectable at the desired power.
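The three PSS steps above can be sketched as follows for a simple linear regression. This is a sketch under stated assumptions rather than the seminar's own code: `sim_power` is a hypothetical helper, and the effect size, SD, candidate sample sizes, and number of simulations are illustrative.

```r
set.seed(2023)

# Hypothetical helper: estimate power for a simple linear regression
# y = beta * x + error, at a given N, effect size, and alpha
sim_power <- function(n, beta, sigma = 1, alpha = 0.05, n_sims = 500) {
  p_values <- replicate(n_sims, {
    x <- rnorm(n)                          # Step 1: simulate the predictor
    y <- beta * x + rnorm(n, sd = sigma)   # ...and the outcome under HA
    fit <- lm(y ~ x)                       # Step 2: fit the planned model
    summary(fit)$coefficients["x", "Pr(>|t|)"]
  })
  mean(p_values < alpha)                   # proportion detecting the effect
}

# Step 3: vary N until the target power is reached
candidate_n <- seq(20, 200, by = 20)
power_by_n  <- sapply(candidate_n, sim_power, beta = 0.3)

results <- data.frame(n = candidate_n, power = power_by_n)
print(results)

# Smallest candidate N whose estimated power is at least 0.80
min_n <- results$n[which(results$power >= 0.80)[1]]
cat("Smallest candidate N with power >= 0.80:", min_n, "\n")
```

With only 500 simulations per sample size, each power estimate carries a couple of percentage points of simulation error, so it is worth rerunning with a larger `n_sims` near the chosen N before settling on a final number.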
YES, BUT HOW DO YOU ACTUALLY DO THIS IN R?
• The rest of this seminar will take place in the RStudio environment. The code and data used are available in Posit Cloud by following this link:
https://posit.cloud/spaces/409887/join?access_code=VJ50mLqZ55QM8kAGEAGZfd5ZPnOfCw08LvW8i7Qv
• The next few slides will briefly walk you through the process of accessing the seminar code, which can be done now or at a later date using the above link.
JOINING POSIT SPACE
• Sign up for a free Posit Cloud account: https://posit.cloud/plans/free
• After you are logged in, click the link to the shared workspace: https://posit.cloud/spaces/409887/join?access_code=VJ50mLqZ55QM8kAGEAGZfd5ZPnOfCw08LvW8i7Qv
• When prompted, say “Yes” to Join Space
OPEN PROJECT
• Click this to open a temporary copy
• Click + to make yourself a permanent copy
OPEN PSS SIMULATIONS IN R.RMD FILE
Open code
Questions?