A General Framework for
Multiple Testing Dependence
Jeffrey Leek
Johns Hopkins University School of Medicine
High-dimensional multiple hypothesis testing is common.
Problem:
Dependence between tests can result in incorrect statistical
and scientific results.
A solution:
Define and address multiple testing dependence at the
level of the data – not the P-values.
Big Picture Ideas
High-Dimensional Multiple Testing Is Common
Spatial Epidemiology
Brain Imaging
Molecular Biology
Inflammation and the Host Response to Injury
mRNA expression: ~50,000 genes
Clinical data: >150 clinical variables
Patients: Patient 1, Patient 2, …, Patient 166
MOF measures severity of injury
Data at Initial Time Point
Multiple Organ Failure
Simple Analysis
1. Fit the model to the data, $x_i$, for gene $i$:
$x_i = a_i + b_i\,\mathrm{MOF} + e_i$
2. Calculate P-values for testing the hypotheses:
$H_0: b_i = 0$ vs. $H_1: b_i \neq 0$
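The two steps of the simple analysis can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the slides: the data are simulated, `mof` is a made-up continuous score, and the helper name `per_gene_pvalues` is hypothetical.

```python
import numpy as np
from scipy import stats

def per_gene_pvalues(X, y):
    """Fit x_i = a_i + b_i * y + e_i for every row of X; test H0: b_i = 0."""
    m, n = X.shape
    yc = y - y.mean()                          # centering absorbs the intercept a_i
    Xc = X - X.mean(axis=1, keepdims=True)
    b = Xc @ yc / (yc @ yc)                    # least-squares slope for each gene
    resid = Xc - np.outer(b, yc)
    sigma2 = (resid ** 2).sum(axis=1) / (n - 2)
    t = b / np.sqrt(sigma2 / (yc @ yc))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)    # two-sided P-value
    return b, p

# Simulated stand-in for the expression data (assumption, not the real study).
rng = np.random.default_rng(0)
n, m = 30, 2000
mof = rng.normal(size=n)
X = rng.normal(size=(m, n))
X[:100] += 1.5 * mof                           # first 100 genes truly depend on MOF
b_hat, pvals = per_gene_pvalues(X, mof)
```

Each gene is tested on its own here, which is exactly the setting where dependence between the rows of the noise can distort the collection of P-values.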
Four “Replicated” Studies
[Figure: P-value histograms (frequency vs. P-value) for Phase 1, Phase 2, Phase 3, and Phase 4]
•  Data for test i: $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$
•  “Primary variable(s)”: $Y = (y_1, y_2, \ldots, y_n)$
•  Model: $x_{ij} = a_i + \sum_{k=1}^{d} b_{ik}\, s_k(y_j) + e_{ij}$
•  Hypothesis test i: $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$
{m hypothesis tests, n observations per test}
Start With The Whole Data
$X = B\,S(Y) + E$ (rows of X: tests; columns: observations)
Underlying Model
A Simple Simulated Example
[Figure: simulated data (genes by arrays), independent E vs. dependent E]
Null P-Value Distributions
[Figure: null P-value histograms (frequency vs. P-value), four panels each for independent E and dependent E]
Null P-Value Distributions
[Figure: null P-value histograms for independent E and dependent E, with panels labeled by correlation |ρ| = 0.40, 0.31, 0.10, 0.00]
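A small simulation can illustrate why null P-value histograms differ from data set to data set when E is dependent. The sketch below is not the simulation from the talk: the one-factor form of the dependent noise, the sample sizes, and the function name `null_rejection_rates` are all assumptions made for illustration. Every test is null, so each P-value is marginally uniform, yet the fraction falling below 0.05 varies far more across data sets under dependence.

```python
import numpy as np
from scipy import stats

def null_rejection_rates(dependent, n_sims=200, m=500, n=20, seed=0):
    """Fraction of null tests with P < 0.05 in each simulated data set."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0.0, 1.0], n // 2)          # primary variable: two groups
    yc = y - y.mean()
    rates = np.empty(n_sims)
    for s in range(n_sims):
        E = rng.normal(size=(m, n))            # independent noise
        if dependent:
            g = rng.normal(size=n)             # hidden factor shared by all tests
            gamma = rng.normal(size=(m, 1))    # test-specific loadings
            E = E + gamma * g
        Ec = E - E.mean(axis=1, keepdims=True)
        b = Ec @ yc / (yc @ yc)
        resid = Ec - np.outer(b, yc)
        se = np.sqrt((resid ** 2).sum(axis=1) / (n - 2) / (yc @ yc))
        p = 2 * stats.t.sf(np.abs(b / se), df=n - 2)
        rates[s] = (p < 0.05).mean()
    return rates

ind = null_rejection_rates(dependent=False)
dep = null_rejection_rates(dependent=True)
print("independent E: mean %.3f, sd %.3f" % (ind.mean(), ind.std()))
print("dependent E:   mean %.3f, sd %.3f" % (dep.mean(), dep.std()))
```

In this toy setup the spread of the per-data-set rejection rate is the quantity to watch: any single study sees only one draw of the hidden factor.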
Null Distribution Behavior
[Figure: panels for dependent E and independent E]
False Discovery Rate Estimates
[Figure: panels for independent E and dependent E]
Ranking Estimates
[Figure: panels for independent E and dependent E]
When To Address Dependence?
1. Data X
2. Fit model $X = BS + E$
3. Obtain $\hat{B}$ and R
4. Form test-statistics and null distribution
5. Calculate P-values
6. Form P-value threshold
Existing Approaches
Empirical null approaches
modify the null distribution at
the test-statistic level
Dependence adjustments
conservatively modify
the P-value threshold
Examples of Existing Approaches
•  Empirical Null
– Devlin and Roeder Biometrics (1999)
– Efron JASA (2004)
– Schwartzman AOAS (2008)
•  Error Rate Adjustments
– Benjamini and Yekutieli Annals of Statistics (2001)
– Romano, Shaikh, and Wolf Test (2001)
– Dudoit, Gilbert, van der Laan Biometrical Journal (2008)
Our Approach
Fit the model:
X = BS + ΓG + U
where G is a valid dependence
kernel
Dependence and bias are no longer present at any of these steps;
standard methods can be used.
New Dependence Definitions
Definition – Data X are population-level multiple testing
dependent if:
Definition - Data X are estimation-level multiple testing
dependent if:
Leek and Storey (2008)
Structure in E
[Figure: genes by arrays, with MOF1 marked; panels: signal + dependent noise, dependent noise, independent noise]
Decomposing E
$X = BS + E$: data = primary variables + random variation
$X = BS + H + U$: data = primary variables + dependent variation + independent variation
$X = BS + \Gamma G + U$, with $H = \Gamma G$: data = primary variables + dependence kernel + independent variation
(X is tests by observations.)
Theorem Let the data be distributed according to the
model $X = BS + E$.
Suppose that for each $e_i$ there is no Borel measurable
function, $g$, such that $e_i = g(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_m)$ almost
surely. Then there exist matrices $\Gamma_{m \times r}$, $G_{r \times n}$ ($r \le n$) and
$U_{m \times n}$ such that $X = BS + \Gamma G + U$,
where the rows of $U$ are independent, $u_i \neq 0$, and
$u_i = h_i(e_i)$ for a non-random Borel measurable function $h_i$.
Leek and Storey (2008)
Dependence Kernel
Leek and Storey (2008)
Definition – Dependence Kernel
An r ×n matrix G forms a dependence kernel for the data X, if
the following equality holds:
X = BS + E
= BS + ΓG + U
where the rows of U are independent.
Fitting S & G Results In Independent Tests
Leek and Storey (2008)
Theorem Let G be any valid dependence kernel for the data X.
Suppose that the model $x_i = b_i S + \gamma_i G + u_i$
is fit by least squares, resulting in residuals $r_i = x_i - \hat{b}_i S - \hat{\gamma}_i G$.
If the row space jointly spanned by S and G has dimension less
than n, then the $r_i$ and the $\hat{b}_i$ are jointly independent given S
and G.
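A rough numerical check of the idea behind this result: when G is left out of the fit, the residuals still carry a dominant shared direction, and when S and G are fit together that structure disappears. The sketch assumes simulated data with a single known kernel row; the diagnostic `top_share` (share of residual variance on the leading singular vector) is an illustrative choice, not a test of independence from the talk.

```python
import numpy as np

def top_share(R):
    """Share of residual variance carried by the leading singular vector."""
    s = np.linalg.svd(R, compute_uv=False)
    return s[0] ** 2 / (s ** 2).sum()

def ls_residuals(X, D):
    """Row-wise least-squares residuals of X on the rows of D (k x n)."""
    return X - X @ np.linalg.pinv(D) @ D

rng = np.random.default_rng(2)
m, n = 500, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])  # intercept + group
G = np.tile([1.0, -1.0], n // 2)[None, :]                   # assumed known kernel
B = rng.normal(size=(m, 2))
Gamma = 2.0 * rng.normal(size=(m, 1))
U = rng.normal(size=(m, n))                                 # independent rows
X = B @ S + Gamma @ G + U

share_S = top_share(ls_residuals(X, S))                     # G ignored
share_SG = top_share(ls_residuals(X, np.vstack([S, G])))    # S and G both fit
print(round(share_S, 2), round(share_SG, 2))
```

In practice G is unknown, which is why the talk turns to estimating it from the data.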
$X = BS + \Gamma G + U$: data = primary variables + dependence kernel + independent variation (tests by observations)
A “Blessing” of Dimensionality
Iteratively Reweighted Surrogate Variable Analysis
Whole data: $X = BS + \Gamma G + U$
Test i data: $x_i = b_i S + \gamma_i G + u_i$
1.  Estimate the row dimension, $\hat{r}$, of G.
2.  Form an initial estimate $\hat{G}^{(0)}$ equal to the first $\hat{r}$ right
singular vectors of $R = X - \hat{B}S$.
Iterate for b = 0,…,B:
3.  Estimate the probability weight Pr(G & !S) for each test i, given $\hat{G}^{(b)}$.
4.  Weight the ith row of X by its estimated probability weight and
set $\hat{G}^{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the
weighted matrix.
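The loop above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the weight `(1 - p_G) * p_S` built from F-test P-values is a deliberately crude stand-in for the posterior probability estimate described on the later slides, rows are centered before the weighted SVD (an assumption), and the names `f_pvalues` and `irw_sva_sketch` are hypothetical.

```python
import numpy as np
from scipy import stats

def f_pvalues(X, full, reduced):
    """Row-wise F-test P-values for nested designs (variables in rows, k x n)."""
    n = X.shape[1]
    def rss(D):
        coef = X @ np.linalg.pinv(D)           # least squares: X ~ coef @ D
        return ((X - coef @ D) ** 2).sum(axis=1)
    df1 = full.shape[0] - reduced.shape[0]
    df2 = n - full.shape[0]
    F = ((rss(reduced) - rss(full)) / df1) / (rss(full) / df2)
    return stats.f.sf(F, df1, df2)

def irw_sva_sketch(X, S, r_hat, n_iter=5):
    n = X.shape[1]
    ones = np.ones((1, n))
    S1 = np.vstack([ones, S])                  # primary variables plus intercept
    R = X - X @ np.linalg.pinv(S1) @ S1        # step 2: residuals R = X - B_hat S
    G = np.linalg.svd(R, full_matrices=False)[2][:r_hat]
    for _ in range(n_iter):
        full = np.vstack([S1, G])
        p_G = f_pvalues(X, full, S1)                     # small if gamma_i != 0
        p_S = f_pvalues(X, full, np.vstack([ones, G]))   # small if b_i != 0
        w = (1.0 - p_G) * p_S                  # crude stand-in for Pr(G & !S)
        Xw = w[:, None] * (X - X.mean(axis=1, keepdims=True))
        G = np.linalg.svd(Xw, full_matrices=False)[2][:r_hat]   # step 4
    return G

# Toy data (assumption): one hidden factor, chosen orthogonal to the group label.
rng = np.random.default_rng(1)
m, n = 1000, 20
group = np.repeat([0.0, 1.0], n // 2)
G_true = np.tile([1.0, -1.0], n // 2)
X = rng.normal(size=(m, n))
X[:100] += 2.0 * group                                   # tests affected by S only
X[300:700] += rng.normal(size=(400, 1)) * G_true         # tests affected by G only
G_hat = irw_sva_sketch(X, group[None, :], r_hat=1)
agreement = abs(np.corrcoef(G_hat[0], G_true)[0, 1])
```

The toy factor is orthogonal to the primary variable so the check is easy to reason about; the harder and more interesting case is a kernel correlated with S, which this sketch does not attempt to validate.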
An Example of the IRW-SVA Algorithm
[Figure, repeated over successive slides: panels for the data, true G, estimate of G, and Pr(G & !S)]
1.  Buja and Eyuboglu (1992) proposed a
permutation approach.
2.  Patterson, Price, and Reich (2006) proposed a
sequential testing strategy based on Tracy-Widom
theory.
3.  Leek (in preparation) proposes an eigenvalue
estimator that is consistent in the number of
tests.
Estimating The Row Dimension of G
1.  Assume the data follow $X = BS + \Gamma G + U$, where G
and S have row dimensions r and d, $r + d < n$.
2.  Calculate the singular values $s_1, \ldots, s_n$ of X and choose
b such that $r + d < b$.
3.  Calculate the eigenvalues $\lambda_1, \ldots, \lambda_n$ of
$\frac{1}{m} R^T R - s_b^2 P$,
where $P = I - S^T (S S^T)^{-1} S$ and $R = XP$.
4.  Set $\hat{r} = \sum_{j=1}^{n} 1(\lambda_j > m^{-1/3})$.
Estimating The Row Dimension of G
Theorem As $m \to \infty$, $\hat{r} = \sum_{j=1}^{n} 1(\lambda_j > m^{-1/3})$
is a consistent estimate of the row dimension of G,
provided that:
(1) $u_{ij}$ are independent
(2) $E[u_{ij}] = 0$
(3) $E[u_{ij}^2] = \sigma_i^2 < M_1$
(4) $E[u_{ij}^4] < M_2$
(5) $\lim_{m \to \infty} \frac{1}{m} \Gamma^T \Gamma$ is positive definite with unique eigenvalues
Leek (In Prep.)
Estimating The Row Dimension of G
Break The Estimation Into Two Components
Estimating the Probability Weights
1.  Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses.
2.  Bootstrap from the conditional null model to obtain null
statistics $F_1^{0k}, \ldots, F_m^{0k}$, k = 1,…,K.
3.  From Bayes’ Theorem, the posterior probability that test i is null is
$\pi_0 g_0(F_i) / [\pi_0 g_0(F_i) + (1 - \pi_0) g_1(F_i)]$,
where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$.
4.  Estimate the ratio of the densities with a non-parametric
logistic regression where the $F_i$ are “successes” and the $F_i^{0k}$ are
“failures” (Anderson and Blair 1982).
5.  Estimate $\pi_0$ according to Storey (2002).
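Step 5 can be illustrated with the usual P-value form of the Storey (2002) estimator, $\hat{\pi}_0(\lambda) = \#\{p_i > \lambda\} / (m(1 - \lambda))$. The slides work with F-statistics, so applying it to simulated P-values, the choice $\lambda = 0.5$, and the function name `storey_pi0` are assumptions made for this sketch.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Estimate the null proportion from the P-values above a cutoff lam."""
    p = np.asarray(pvalues)
    return min(1.0, (p > lam).mean() / (1.0 - lam))

# Simulated mixture (assumption): 80% null tests, 20% with small P-values.
rng = np.random.default_rng(3)
p_null = rng.uniform(size=8000)
p_alt = rng.beta(0.2, 8.0, size=2000)
pi0_hat = storey_pi0(np.concatenate([p_null, p_alt]))
print(round(pi0_hat, 3))
```

The estimator relies on P-values above the cutoff coming almost entirely from null tests, so it tends to be conservative (biased upward) when some alternative P-values are large.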
Estimating the Probability Weights
Estimate of posterior
probability bi ≠ 0.
SVA-Adjusted Analysis
1.  Estimate G with IRW-SVA.
2.  Fit the model $X = BS + \Gamma \hat{G} + U$.
3.  Test the hypotheses $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$.
A Simple Simulated Example
[Figure: simulated data (genes by arrays), independent E vs. dependent E]
Null Distribution Behavior
[Figure: panels for independent E, dependent E, and dependent E + IRW-SVA]
False Discovery Rate Estimates
[Figure: Q-value vs. true false discovery rate; panels for independent E, dependent E, and dependent E + IRW-SVA]
Ranking Estimates
[Figure: average ranking by t-statistic vs. ranking by true signal to noise; panels for independent E, dependent E, and dependent E + IRW-SVA]
Four “Replicated” Studies
[Figure: P-value histograms (frequency vs. P-value), two rows of panels for Phase 1, Phase 2, Phase 3, and Phase 4]
Functional Enrichment Across Phases
[Figure: percent of total significant pathways vs. number of phases in which a significant pathway appears (1 of 4, 2 of 4, 3 of 4, 4 of 4); unadjusted vs. IRW-SVA adjusted]
•  High-dimensional hypothesis testing is common.
•  Dependence between tests can result in incorrect
statistical and scientific inference.
•  We can define and address dependence at the
level of the model using the dependence kernel.
•  IRW-SVA can be used to improve inference in
high-dimensional multiple hypothesis testing.
Summary
Future Work
•  Multiple Testing
– Develop dependence kernel estimates for spatial data
– Develop diagnostic tests for multiple testing procedures
•  High-Dimensional Asymptotics
– Extend methods for asymptotic SVD to binary data
•  Feature Selection for High-Dimensional Classifiers
– Extensions of top-scoring pairs (TSP) to survival data
– Theoretical connections to LDA and SVM
– Embedding TSP in a logic regression framework
Thank You
Estimating The Row Dimension of G
Buja and Eyuboglu (1992)
1.  Calculate the residuals $R = X - \hat{B}S$.
2.  Calculate the singular values of R, $d_1, \ldots, d_n$.
For k = 1,…,K do steps 3-4:
3.  Permute each row of R individually to get $R_0$.
4.  Take the SVD of the residuals $R^* = R_0 - \hat{B}_0 S$ to
obtain null singular values $d_{i0}^{k}$.
5.  Compare $d_i$ to $d_{i0}^{k}$ for k = 1,…,K to calculate a P-value
for the ith right singular vector.
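The permutation steps can be sketched directly with NumPy. This is a minimal illustration under assumptions: the simulated data with two hidden factors, K = 50 permutations, comparing raw singular values, and reading off $\hat{r}$ as the number of leading singular vectors with P ≤ 0.05 are choices made here, and `permutation_pvalues` is a hypothetical name.

```python
import numpy as np

def permutation_pvalues(X, S, K=50, seed=0):
    """Permutation P-values for the singular values of the residual matrix."""
    rng = np.random.default_rng(seed)
    proj = np.linalg.pinv(S) @ S               # projection onto the row space of S
    R = X - X @ proj                           # step 1: R = X - B_hat S
    d = np.linalg.svd(R, compute_uv=False)     # step 2
    exceed = np.zeros_like(d)
    for _ in range(K):
        R0 = rng.permuted(R, axis=1)           # step 3: shuffle each row separately
        R_star = R0 - R0 @ proj                # step 4: refit S, take residuals
        d0 = np.linalg.svd(R_star, compute_uv=False)
        exceed += d0 >= d                      # step 5: null at least as large
    return d, exceed / K

# Toy data (assumption): two hidden factors on top of an intercept and a group.
rng = np.random.default_rng(4)
m, n = 500, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])
G = np.vstack([np.tile([1.0, -1.0], n // 2),
               np.tile([1.0, 1.0, -1.0, -1.0], n // 4)])
X = rng.normal(size=(m, 2)) @ S + rng.normal(size=(m, 2)) @ G + rng.normal(size=(m, n))
d, p = permutation_pvalues(X, S)
r_hat = int(np.argmax(p > 0.05))               # leading run of significant vectors
print(r_hat)
```

Only the leading run of small P-values is counted because the trailing singular values of R are numerically zero once S has been projected out.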
Why Does This Work?
Leek and Storey (2007), Leek and Storey (2008)
Useful Fact:
X = BS + E
= BS + ΓG + U
= BS + ΛH + U
if G and H have the same row space.
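A quick numerical illustration of this fact, under the assumption that H is obtained from G by an invertible r × r matrix A (so that H = AG and Λ = ΓA⁻¹); the matrices below are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(5)
m, r, n = 6, 2, 8
Gamma = rng.normal(size=(m, r))
G = rng.normal(size=(r, n))

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                     # any invertible r x r matrix
H = A @ G                                      # a different basis for the same rows
Lam = Gamma @ np.linalg.inv(A)                 # coefficients adjusted to match

same_fit = np.allclose(Gamma @ G, Lam @ H)     # the dependent part is unchanged
print(same_fit)
```

The practical upshot is that an estimate of G only needs to recover the span of the kernel, not its particular rows.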
•  References:
Benjamini Y and Hochberg Y. (1995), “Controlling the false discovery rate – a
practical and powerful approach to multiple testing.” JRSSB, 57: 289-300.
De Castro MC, Monte-Mor RL, Sawyer DO, and Singer BH. (2005),
“Malaria risk on the Amazon frontier.” PNAS, 103: 2452-2457.
Devlin B and Roeder K. (1999), “Genomic control for association studies.”
Biometrics, 55: 997-1004.
Efron B. (2004) “Large-scale simultaneous hypothesis testing: The choice of a
null hypothesis.” JASA, 99: 96-104.
Leek JT and Storey JD. (2008) “A general framework for multiple testing
dependence.” Proceedings of the National Academy of Sciences , 105:
18718-18723.
Leek JT and Storey JD. (2007) “Capturing heterogeneity in gene expression
studies by ‘Surrogate Variable Analysis’.” PLoS Genetics, 3: e161.
Taylor JE and Worsley KJ. (2007) “Detecting sparse signals in random fields,
with applications to brain mapping.” JASA, 102: 913-928.
Thank You
1.  Perform each hypothesis test individually.
2.  Obtain the test-statistic for each test.
3.  Compare distribution of test-statistics to the
theoretical null distribution.
4.  Adjust theoretical null so that it matches the
observed statistics in a low signal region.
[Figure: empirical null vs. theoretical null, Efron (2004)]
Empirical Null Results in Incorrect Null Distribution
Dep. Kernel
•  Observed statistics or observed P-values come
from mixture distribution:
π0g0 + π1g1
•  Dependence distorts g0 … can go either way:
•  Must use full data set to capture dependence
With Confounding Empirical Null is Ill-Posed

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
8517ijaia06
8517ijaia068517ijaia06
8517ijaia06
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Introductory maths analysis chapter 10 official
Introductory maths analysis   chapter 10 officialIntroductory maths analysis   chapter 10 official
Introductory maths analysis chapter 10 official
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
Lesson 23: Antiderivatives (Section 021 handout)
Lesson 23: Antiderivatives (Section 021 handout)Lesson 23: Antiderivatives (Section 021 handout)
Lesson 23: Antiderivatives (Section 021 handout)
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017
 

Semelhante a JHU Job Talk

GonzalezGinestetResearchDay2016
GonzalezGinestetResearchDay2016GonzalezGinestetResearchDay2016
GonzalezGinestetResearchDay2016
Pablo Ginestet
 
Novel set approximations in generalized multi valued decision information sys...
Novel set approximations in generalized multi valued decision information sys...Novel set approximations in generalized multi valued decision information sys...
Novel set approximations in generalized multi valued decision information sys...
Soaad Abd El-Badie
 
An improved demspter shafer algorithm for resolving conflicting events
An improved demspter shafer algorithm for resolving conflicting eventsAn improved demspter shafer algorithm for resolving conflicting events
An improved demspter shafer algorithm for resolving conflicting events
Gauravv Prabhu
 

Semelhante a JHU Job Talk (20)

Bayesian statistics intro using r
Bayesian statistics intro using rBayesian statistics intro using r
Bayesian statistics intro using r
 
Hmisiri nonparametrics book
Hmisiri nonparametrics bookHmisiri nonparametrics book
Hmisiri nonparametrics book
 
ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
 
GonzalezGinestetResearchDay2016
GonzalezGinestetResearchDay2016GonzalezGinestetResearchDay2016
GonzalezGinestetResearchDay2016
 
Data mining assignment 2
Data mining assignment 2Data mining assignment 2
Data mining assignment 2
 
Lec12-Probability (1).ppt
Lec12-Probability (1).pptLec12-Probability (1).ppt
Lec12-Probability (1).ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Novel set approximations in generalized multi valued decision information sys...
Novel set approximations in generalized multi valued decision information sys...Novel set approximations in generalized multi valued decision information sys...
Novel set approximations in generalized multi valued decision information sys...
 
An improved demspter shafer algorithm for resolving conflicting events
An improved demspter shafer algorithm for resolving conflicting eventsAn improved demspter shafer algorithm for resolving conflicting events
An improved demspter shafer algorithm for resolving conflicting events
 
More investment in Research and Development for better Education in the future?
More investment in Research and Development for better Education in the future?More investment in Research and Development for better Education in the future?
More investment in Research and Development for better Education in the future?
 
02.bayesian learning
02.bayesian learning02.bayesian learning
02.bayesian learning
 
02.bayesian learning
02.bayesian learning02.bayesian learning
02.bayesian learning
 
bayesian learning
bayesian learningbayesian learning
bayesian learning
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
 
Bayesian statistics using r intro
Bayesian statistics using r   introBayesian statistics using r   intro
Bayesian statistics using r intro
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
Basic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QMBasic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QM
 
Basic concepts of probability
Basic concepts of probability Basic concepts of probability
Basic concepts of probability
 

Mais de jtleek

Mais de jtleek (11)

Data science as a science
Data science as a scienceData science as a science
Data science as a science
 
JHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the ScenesJHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the Scenes
 
Fixing the leaks in the pipeline from public genomics data to the clinic
Fixing the leaks in the pipeline from public genomics data to the clinicFixing the leaks in the pipeline from public genomics data to the clinic
Fixing the leaks in the pipeline from public genomics data to the clinic
 
Evidence based data analysis
Evidence based data analysisEvidence based data analysis
Evidence based data analysis
 
Evidence based data analysis
Evidence based data analysisEvidence based data analysis
Evidence based data analysis
 
Leek romesf-2015
Leek romesf-2015Leek romesf-2015
Leek romesf-2015
 
The Largest Data Science Program in the World: The Johns Hopkins Data Science...
The Largest Data Science Program in the World: The Johns Hopkins Data Science...The Largest Data Science Program in the World: The Johns Hopkins Data Science...
The Largest Data Science Program in the World: The Johns Hopkins Data Science...
 
Flash talk about Johns Hopkins Biostatistics Genomics Group
Flash talk about Johns Hopkins Biostatistics Genomics GroupFlash talk about Johns Hopkins Biostatistics Genomics Group
Flash talk about Johns Hopkins Biostatistics Genomics Group
 
10 things statistics taught us about big data
10 things statistics taught us about big data10 things statistics taught us about big data
10 things statistics taught us about big data
 
Big data and statisticians
Big data and statisticiansBig data and statisticians
Big data and statisticians
 
Data Science Education at JHSPH
Data Science Education at JHSPHData Science Education at JHSPH
Data Science Education at JHSPH
 

Último

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Último (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 

JHU Job Talk

  • 1. A General Framework for Multiple Testing Dependence Jeffrey Leek Johns Hopkins University School of Medicine
  • 2. High-dimensional multiple hypothesis testing is common. Problem: Dependence between tests can result in incorrect statistical and scientific results. A solution: Define and address multiple testing dependence at the level of the data – not the P-values. Big Picture Ideas
  • 3. High-Dimensional Multiple Testing Is Common Spatial EpidemiologyBrain Imaging Molecular Biology
  • 4. 4 Inflammation and the Host Response to Injury mRNA Expression ~50,000 genes Clinical Data >150 clinical variables Patient 1 Patient 2 Patient 166…. MOF measures severity of injury
  • 5. Data at Initial Time Point Multiple Organ Failure
  • 6. Simple Analysis 1. Fit the model to the data, xi, for gene i: xi = ai + biMOF + ei 2. Calculate P-values for testing the hypotheses: H0: bi = 0 vs. H1: bi ≠ 0 3
  • 7. Four “Replicated” Studies Phase 1 Phase 3 Phase 2 Phase 4 P-value P-value P-value P-value Frequency Frequency Frequency Frequency
  • 8. •  Data for test i: •  “Primary variable(s)”: •  Model: •  Hypothesis test i: € xi = xi1,xi2,…,xin( ) € Y = y1,y2,…,yn( ) € xij = ai + biksk y j( ) k=1 d ∑ + eij H0i :bi ∈ Ω0 H1i :bi ∈ Ω1 {m hypothesis tests, n observations per test} Start With The Whole Data
  • 9. = + X = B S(Y) + E observations tests Underlying Model
  • 10. A Simple Simulated Example [Figure: simulated data with Independent E and with Dependent E; axis labels: Genes, Arrays].
  • 11. Null P-Value Distributions [Figure: null P-value histograms for Independent E and Dependent E; x-axis: P-value, y-axis: Frequency].
  • 12. Null P-Value Distributions [Figure: null P-value histograms for Independent E and Dependent E at correlation |ρ| = 0.40, 0.31, 0.10, and 0.00; x-axis: P-value, y-axis: Frequency].
  • 14. False Discovery Rate Estimates [Figure: panels for Independent E and Dependent E].
  • 16. When To Address Dependence? Data X → Fit model $X = BS + E$ → Obtain $\hat{B}$ and $R$ → Form test statistics and null distribution → Calculate P-values → Form P-value threshold.
  • 17. When To Address Dependence? Existing Approaches. Empirical null approaches modify the null distribution at the test-statistic level. Dependence adjustments conservatively modify the P-value threshold.
  • 18. Examples of Existing Approaches. Empirical null: Devlin and Roeder, Biometrics (1999); Efron, JASA (2004); Schwartzman, AOAS (2008). Error rate adjustments: Benjamini and Yekutieli, Annals of Statistics (2001); Romano, Shaikh, and Wolf, Test (2001); Dudoit, Gilbert, and van der Laan, Biometrical Journal (2008).
  • 19. When To Address Dependence? Our Approach. Fit the model $X = BS + \Gamma G + U$, where G is a valid dependence kernel.
  • 20. When To Address Dependence? Our Approach. Fit the model $X = BS + \Gamma G + U$, where G is a valid dependence kernel. Dependence and bias are no longer present at any of the pipeline steps (fit model, form test statistics and null distribution, calculate P-values, form P-value threshold); standard methods can be used.
  • 21. New Dependence Definitions (Leek and Storey 2008). Definition: Data X are population-level multiple testing dependent if: [equation not captured in the transcript]. Definition: Data X are estimation-level multiple testing dependent if: [equation not captured in the transcript].
  • 22. Structure in E [Figure: panels labeled Signal + Dependent Noise, Dependent Noise, and Independent Noise; axis labels: Array, Genes, MOF1].
  • 23. Decomposing E. $X = BS + E$, with X the data (rows: tests, columns: observations), S the primary variables, and E the random variation.
  • 24. Decomposing E. $X = BS + H + U$, with X the data, S the primary variables, H the dependent variation, and U the independent variation.
  • 25. Decomposing E. $X = BS + \Gamma G + U$, with X the data, S the primary variables, G the dependence kernel ($H = \Gamma G$), and U the independent variation.
  • 26. Decomposing E. Theorem (Leek and Storey 2008): Let the data be distributed according to the model $X = BS + E$. Suppose that for each $e_i$ there is no Borel measurable function g such that $e_i = g(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_m)$ almost surely. Then there exist matrices $\Gamma_{m \times r}$, $G_{r \times n}$ ($r \le n$), and $U_{m \times n}$ such that $X = BS + \Gamma G + U$, where the rows of U are independent, $u_i \neq 0$, and $u_i = h_i(e_i)$ for a non-random Borel measurable function $h_i$.
  • 27. Dependence Kernel (Leek and Storey 2008). Definition: An $r \times n$ matrix G forms a dependence kernel for the data X if the following equality holds: $X = BS + E = BS + \Gamma G + U$, where the rows of U are independent.
  • 28. Fitting S & G Results In Independent Tests. Theorem (Leek and Storey 2008): Let G be any valid dependence kernel for the data X. Suppose that the model $x_i = b_i S + \gamma_i G + u_i$ is fit by least squares, resulting in residuals $r_i = x_i - \hat{b}_i S - \hat{\gamma}_i G$. If the row space jointly spanned by S and G has dimension less than n, then the $r_i$ and the $\hat{b}_i$ are jointly independent given S and G and: [equation not captured in the transcript].
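The effect of including a dependence kernel in the least squares fit can be checked numerically. The sketch below assumes the kernel is known exactly (a single hypothetical row `g`) and that all tests are null; the helper `coef_tstats` and the simulation settings are illustrative choices, not taken from the talk.

```python
import numpy as np

def coef_tstats(X, design, j):
    """t-statistics for column j of `design`, fitting every row of X by least squares."""
    n, p = design.shape
    coef, _, _, _ = np.linalg.lstsq(design, X.T, rcond=None)
    resid = X.T - design @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)
    var_j = np.linalg.inv(design.T @ design)[j, j]
    return coef[j] / np.sqrt(sigma2 * var_j)

rng = np.random.default_rng(2)
m, n = 5000, 20
y = np.repeat([0.0, 1.0], n // 2)              # primary variable (one row of S)
g = np.zeros(n)
g[3:13] = 1.0                                  # dependence kernel G (r = 1), assumed known
gamma = rng.normal(scale=2.0, size=(m, 1))
X = gamma * g + rng.normal(size=(m, n))        # null data: B = 0, X = Gamma G + U

S_only = np.column_stack([np.ones(n), y])      # columns: intercept, y
S_and_G = np.column_stack([np.ones(n), y, g])  # columns: intercept, y, g
t_unadjusted = coef_tstats(X, S_only, 1)
t_adjusted = coef_tstats(X, S_and_G, 1)
print(t_unadjusted.var(), t_adjusted.var())
```

Fitting S alone leaves the shared factor in the residuals and inflates the null statistics; fitting S and G together brings their spread back to roughly what a t distribution with n - 3 degrees of freedom would give.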
  • 29. A “Blessing” of Dimensionality. $X = BS + \Gamma G + U$, with X the data (rows: tests, columns: observations), S the primary variables, G the dependence kernel, and U the independent variation.
  • 30. Iteratively Reweighted Surrogate Variable Analysis. Whole data: $X = BS + \Gamma G + U$. Test i data: $x_i = b_i S + \gamma_i G + u_i$. 1. Estimate the row dimension, $\hat{r}$, of G. 2. Form an initial estimate $\hat{G}^{(0)}$ equal to the first $\hat{r}$ right singular vectors of $R = X - \hat{B}S$. Iterate for b = 0,…,B: 3. Estimate $\Pr(\gamma_i \neq 0 \text{ and } b_i = 0)$ for each test i, given the current estimate $\hat{G}^{(b)}$. 4. Weight the ith row of X by this probability and set $\hat{G}^{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the weighted matrix.
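Steps 2 and 4 of the algorithm can be sketched with two SVD calls. This is a rough illustration under stated assumptions rather than the IRW-SVA implementation: the weights that step 3 would estimate are replaced by an oracle 0/1 indicator built from the simulated truth, rows are centered before the weighted SVD, only one reweighting pass is shown, and the function names are hypothetical.

```python
import numpy as np

def initial_kernel_estimate(X, S, r):
    """Step 2: first r right singular vectors of R = X - B_hat S."""
    B_hat = X @ S.T @ np.linalg.inv(S @ S.T)      # least squares fit, row by row
    R = X - B_hat @ S
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:r]

def reweighted_kernel_estimate(X, weights, r):
    """Step 4: first r right singular vectors of the row-weighted data."""
    Xc = X - X.mean(axis=1, keepdims=True)        # assumption: drop each row's intercept
    _, _, Vt = np.linalg.svd(weights[:, None] * Xc, full_matrices=False)
    return Vt[:r]

rng = np.random.default_rng(3)
m, n = 1000, 20
y = np.repeat([0.0, 1.0], n // 2)
S = np.vstack([np.ones(n), y])                    # d x n, with an intercept row
g = np.zeros(n)
g[3:13] = 1.0                                     # true G, correlated with y
b = np.zeros(m)
b[:200] = rng.normal(size=200)                    # tests 0..199 have b_i != 0
gamma = np.zeros(m)
gamma[100:700] = rng.normal(size=600)             # tests 100..699 have gamma_i != 0
X = np.outer(b, y) + np.outer(gamma, g) + rng.normal(size=(m, n))

G0 = initial_kernel_estimate(X, S, r=1)
# Oracle stand-in for step 3: weight 1 where gamma_i != 0 and b_i = 0, else 0.
weights = ((gamma != 0) & (b == 0)).astype(float)
G1 = reweighted_kernel_estimate(X, weights, r=1)

corr_initial = abs(np.corrcoef(G0[0], g)[0, 1])
corr_reweighted = abs(np.corrcoef(G1[0], g)[0, 1])
print(round(corr_initial, 3), round(corr_reweighted, 3))
```

The initial estimate is forced to be orthogonal to S, so it cannot recover the part of G that is correlated with the primary variable. Reweighting toward tests affected by G but not by S removes that restriction, which is the motivation for iterating.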
  • 31. An Example of the IRW-SVA Algorithm [Figure: panels labeled The Data, True G, Estimate of G, and Pr(G & !S)].
  • 39. Estimating The Row Dimension of G. 1. Buja and Eyuboglu (1992) proposed a permutation approach. 2. Patterson, Price, and Reich (2006) proposed a sequential testing strategy based on Tracy-Widom theory. 3. Leek (in preparation) proposes an eigenvalue estimator that is consistent in the number of tests.
  • 40. Estimating The Row Dimension of G. 1. Assume the data follow $X = BS + \Gamma G + U$, where G and S have row dimensions r and d, with $r + d < n$. 2. Calculate the singular values $s_1, \ldots, s_n$ of X and choose b such that $r + d < b$. 3. Calculate the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\frac{1}{m} R^T R - s_b^2 P$, where $P = I - S(S^T S)^{-1} S^T$ and $R = XP$. 4. Set $\hat{r} = \sum_{j=1}^{n} 1(\lambda_j > m^{-1/3})$.
  • 41. Estimating The Row Dimension of G. Theorem (Leek, in prep.): As $m \to \infty$, $\hat{r} = \sum_{j=1}^{n} 1(\lambda_j > m^{-1/3})$ is a consistent estimate of the row dimension of G, provided that: (1) the $u_{ij}$ are independent; (2) $E[u_{ij}] = 0$; (3) $E[u_{ij}^2] = \sigma_i^2 < M_1$; (4) $E[u_{ij}^4] < M_2$; (5) $\lim_{m \to \infty} \frac{1}{m} \Gamma^T \Gamma$ is positive definite with unique eigenvalues.
  • 43. Break The Estimation Into Two Components
  • 46. Estimating the Probability Weights. 1. Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses: [equation not captured in the transcript]. 2. Bootstrap from the conditional null model to obtain null statistics $F_1^{0k}, \ldots, F_m^{0k}$, k = 1,…,K. 3. From Bayes' Theorem: [equation not captured in the transcript], where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$. 4. Estimate the ratio of the densities with a non-parametric logistic regression where the $F_i$ are “successes” and the $F_i^{0k}$ are “failures” (Anderson and Blair 1982). 5. Estimate $\pi_0$ according to Storey (2002).
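Under the two-group mixture stated in step 3, Bayes' theorem gives the posterior below. This is a reconstruction from the mixture alone and may not match the exact expression on the slide; the symbols $g$ and $p(F)$ are introduced here for illustration.

```latex
\[
\Pr(b_i = 0 \mid F_i)
  = \frac{\pi_0\, g_0(F_i)}{\pi_0\, g_0(F_i) + (1 - \pi_0)\, g_1(F_i)}
  = \pi_0\, \frac{g_0(F_i)}{g(F_i)},
\qquad g = \pi_0 g_0 + (1 - \pi_0) g_1 .
\]
% With m observed statistics ("successes") and mK bootstrap null statistics
% ("failures"), a logistic regression fit p(F) estimates
\[
\frac{p(F)}{1 - p(F)} = \frac{m\, g(F)}{mK\, g_0(F)}
\quad\Longrightarrow\quad
\frac{g_0(F)}{g(F)} = \frac{1 - p(F)}{K\, p(F)} .
\]
```

This shows why only the density ratio and $\pi_0$ are needed: the posterior probability that $b_i \neq 0$ is one minus the first expression.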
  • 47. Estimating the Probability Weights [Figure: estimate of the posterior probability that $b_i \neq 0$].
  • 48. SVA-Adjusted Analysis. 1. Estimate G with IRW-SVA. 2. Fit the model $X = BS + \Gamma \hat{G} + U$. 3. Test the hypotheses $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$.
  • 49. A Simple Simulated Example [Figure: simulated data with Independent E and with Dependent E; axis labels: Genes, Arrays].
  • 50. Null Distribution Behavior [Figure: panels for Dependent E, Independent E, and Dependent E + IRW-SVA].
  • 51. False Discovery Rate Estimates [Figure: panels for Independent E, Dependent E, and Dependent E + IRW-SVA; axis labels: Q-value, True False Discovery Rate].
  • 52. Ranking Estimates [Figure: panels for Independent E, Dependent E, and Dependent E + IRW-SVA; axis labels: Ranking by True Signal to Noise, Average Ranking by T-Statistic].
  • 53. Inflammation and the Host Response to Injury. mRNA expression: ~50,000 genes. Clinical data: >150 clinical variables. Patients: Patient 1, Patient 2, …, Patient 166. MOF1 measures severity of injury.
  • 54. Four “Replicated” Studies [Figure: P-value histograms for Phase 1, Phase 2, Phase 3, and Phase 4; x-axis: P-value, y-axis: Frequency].
  • 55. Functional Enrichment Across Phases [Figure: percent of total significant pathways by number of phases in which a significant pathway appears (1 of 4, 2 of 4, 3 of 4, 4 of 4), Unadjusted versus IRW-SVA Adjusted].
  • 56. Summary. High-dimensional hypothesis testing is common. Dependence between tests can result in incorrect statistical and scientific inference. We can define and address dependence at the level of the model using the dependence kernel. IRW-SVA can be used to improve inference in high-dimensional multiple hypothesis testing.
  • 57. Future Work. Multiple testing: develop dependence kernel estimates for spatial data; develop diagnostic tests for multiple testing procedures. High-dimensional asymptotics: extend methods for asymptotic SVD to binary data. Feature selection for high-dimensional classifiers: extensions of top-scoring pairs (TSP) to survival data; theoretical connections to LDA and SVM; embedding TSP in a logic regression framework.
  • 59. Estimating The Row Dimension of G (Buja and Eyuboglu 1992). 1. Calculate the residuals $R = X - \hat{B}S$. 2. Calculate the singular values of R, $d_1, \ldots, d_n$. For k = 1,…,K do steps 3-4: 3. Permute each row of R individually to get $R_0$. 4. Take the SVD of the residuals $R^* = R_0 - \hat{B}_0 S$ to obtain null singular values $d_{i0}^k$. 5. Compare $d_i$ to $d_{i0}^k$ for k = 1,…,K to calculate a P-value for the ith right singular vector.
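The permutation steps above can be sketched as follows. The function names, the simulated data, and the `(1 + count) / (K + 1)` P-value rule in step 5 are assumptions of this sketch; the slide only says to compare the observed and null singular values.

```python
import numpy as np

def fit_residuals(X, S):
    """Residuals R = X - B_hat S from a row-by-row least squares fit on S."""
    B_hat = X @ S.T @ np.linalg.inv(S @ S.T)
    return X - B_hat @ S

def permutation_pvalues(X, S, K=20, seed=0):
    """Permutation P-values for the leading right singular vectors of R."""
    rng = np.random.default_rng(seed)
    n_comp = X.shape[1] - S.shape[0]                 # components not absorbed by S
    R = fit_residuals(X, S)                          # step 1
    d = np.linalg.svd(R, compute_uv=False)[:n_comp]  # step 2
    exceed = np.zeros(n_comp)
    for _ in range(K):
        # Step 3: permute the entries of each row independently.
        order = rng.random(R.shape).argsort(axis=1)
        R0 = np.take_along_axis(R, order, axis=1)
        # Step 4: refit on S and take the null singular values.
        d0 = np.linalg.svd(fit_residuals(R0, S), compute_uv=False)[:n_comp]
        exceed += d0 >= d
    # Step 5: the (1 + count) / (K + 1) rule is an assumption of this sketch.
    return (1.0 + exceed) / (K + 1.0)

rng = np.random.default_rng(4)
m, n = 500, 20
y = np.repeat([0.0, 1.0], n // 2)
S = np.vstack([np.ones(n), y])
g = np.zeros(n)
g[3:13] = 1.0
gamma = rng.normal(scale=3.0, size=(m, 1))
X = gamma * g + rng.normal(size=(m, n))              # one true dependence component
p_vals = permutation_pvalues(X, S, K=20)
print(np.round(p_vals[:3], 3))
```

Counting the components with small P-values then gives an estimate of the row dimension of G; in this toy setup only the first component should stand out.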
  • 60. Why Does This Work? (Leek and Storey 2007; Leek and Storey 2008). Useful fact: $X = BS + E = BS + \Gamma G + U = BS + \Lambda H + U$ if G and H have the same column space.
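A short derivation of the useful fact, reading “G and H have the same column space” as $H = AG$ for some invertible $r \times r$ matrix $A$ (an interpretation made here, since G is $r \times n$):

```latex
\[
\Gamma G = \Gamma A^{-1} A G = (\Gamma A^{-1})(A G) = \Lambda H,
\qquad \Lambda := \Gamma A^{-1},
\]
\[
\text{so}\quad X = BS + \Gamma G + U = BS + \Lambda H + U
\quad\text{with the same } U .
\]
```

Under this reading, G only needs to be recovered up to an invertible linear transformation, which is what an estimate built from singular vectors can provide.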
  • 61. Thank You. References: Benjamini Y and Hochberg Y. (1995) “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” JRSSB, 57: 289-300. De Castro MC, Monte-Mor RL, Sawyer DO, and Singer BH. (2005) “Malaria risk on the Amazon frontier.” PNAS, 103: 2452-2457. Devlin B and Roeder K. (1999) “Genomic control for association studies.” Biometrics, 55: 997-1004. Efron B. (2004) “Large-scale simultaneous hypothesis testing: The choice of a null hypothesis.” JASA, 99: 96-104. Leek JT and Storey JD. (2008) “A general framework for multiple testing dependence.” PNAS, 105: 18718-18723. Leek JT and Storey JD. (2007) “Capturing heterogeneity in gene expression studies by ‘Surrogate Variable Analysis’.” PLoS Genetics, 3: e161. Taylor JE and Worsley KJ. (2007) “Detecting sparse signals in random fields, with applications to brain mapping.” JASA, 102: 913-928.
  • 62. Empirical Null. 1. Perform each hypothesis test individually. 2. Obtain the test statistic for each test. 3. Compare the distribution of test statistics to the theoretical null distribution. 4. Adjust the theoretical null so that it matches the observed statistics in a low signal region.
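A crude version of step 4 can be sketched by matching a normal null to the central part of the observed statistics. Using the median and interquartile range is a simplification chosen here for brevity; it stands in for, and is not the same as, the fitting procedures in the empirical null literature cited earlier. The simulated z-statistics are hypothetical.

```python
import math
import numpy as np

def empirical_null(z):
    """Null mean and sd estimated from the central part of the statistics."""
    q25, q50, q75 = np.quantile(z, [0.25, 0.5, 0.75])
    return q50, (q75 - q25) / 1.349      # the IQR of a normal is about 1.349 sd

def empirical_null_pvalues(z):
    """Two-sided P-values against the estimated N(mu0, sd0^2) null."""
    mu0, sd0 = empirical_null(z)
    return np.array([math.erfc(abs(v - mu0) / (sd0 * math.sqrt(2.0))) for v in z])

# Hypothetical statistics: a shifted, over-dispersed null plus 5% true signals.
rng = np.random.default_rng(5)
null_z = rng.normal(loc=0.3, scale=1.5, size=4750)
signal_z = rng.normal(loc=4.0, scale=1.0, size=250)
z = np.concatenate([null_z, signal_z])
mu0, sd0 = empirical_null(z)
p_emp = empirical_null_pvalues(z)
print(round(mu0, 2), round(sd0, 2))
```

The adjustment uses only the test statistics, so it cannot distinguish a null that was distorted by dependence from one that was shifted by real signal.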
  • 65. Empirical Null Results in Incorrect Null Distribution [Figure: panel labeled Dep. Kernel].
  • 66. With Confounding Empirical Null is Ill-Posed. Observed statistics or observed P-values come from a mixture distribution: $\pi_0 g_0 + \pi_1 g_1$. Dependence distorts $g_0$, and the distortion can go either way. The full data set must be used to capture dependence.