EWAS and the exposome: Mt Sinai in Brescia 052119

EWAS and the elusive architecture of
exposome variance in phenotype
Chirag J Patel

Mt. Sinai Exposome Symposium

Brescia, Italy

5/21/2019
chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org

I have a ticket to the opera

(tomorrow evening, 8pm)!

Please contact me if you would like to go
@chiragjp

P = G + EType 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Diet + Nutrients

Pollutants

Drugs

σ2G
σ2P
H2 =
Heritability (H2) is the range of phenotypic
variability attributed to genetic variability in a
population
This estimate captures the genetic architecture of
phenotype, important for way we model genetic
risk for disease

For example:
(1) Is my G-P association speciﬁed correctly?
Or, how much of the P variation is additive vs. interaction?

P = a + b1*g1

vs.

P = a + b1*g1 + z1*c1

vs.

P = a + b1*g1 + b2*g2 + b2*g1*g2 + z1*c1
Stratiﬁcation (confounding)?
Epistasis (interaction)?

For example:
(2) How much P variation explained by what we can measure?
evolut
partic
eases;
tase 1)
well a
biolog
The
captur
implem
STRU
revert
subset
librium
clearly
−log10(P)
0
5
10
15
Chromosome
22
X
21
20
19
18
17
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
80
60
40
100
rvedteststatistic
a
b
NATURE|Vol 447|7 June 2007
WTCCC, 2007
AA Aa aa
case
control
ie, how much variation in P captured by GWAS?
Image credit: illumina

What is the exposomic architecture of complex
phenotype?
?
σ2P
E2 =
Important for way we model exposomic risk for
disease, including the role of mixtures!

E2
Combination of shared and speciﬁc environment
σ2E = σ2shared + σ2speciﬁc + error

🏠
σ2shared
σ2P
C2 =
Shared E (C2) is the range of phenotypic variability
attributed to shared household or geography
(but not genetics)

Lakhani et al., Nature Genetics 2019
Decompose the mixture of genetics, shared E
(C2), and indicators of shared E (air pollution,
weather, and socioeconomic status)
chirag “the better” braden
Arjun Manrai
(BCH CHIP)
http://apps.chiragjpgroup.org/catch/

To decompose P, G, and E, we need to
measure them all simultaneously!

Decomposing the mixture of genetics, shared E (C2), and indicators of
shared E (air pollution, weather, and socioeconomic status)
in real-world data
+ Disease (ICD9/ICD10),

procedures, drugs, labs

N ~ 45M
chirag “the better” braden
Jian Yang
Peter M Visscher
Arjun Manrai
(BCH CHIP)
insurance claims
Weather
Air Pollution
Census SES

Claims Analysis of Twin Correlation and
Heritability (CATCH)
vulture.com

Amassing (the largest) twin and sibling cohort
in the US to estimate G and E in ~500 P
• Assume familial relationships in
subscriber groups

• Subscriber group less than 15 members

• Both members are child of primary
subscriber (e.g., employed individual)

• Same date of birth
• Year of birth occurs on or after 1985

• Member enrollment greater than 36
months
Same Sex -
Female
17,919
Same Sex -
Male
17,835
Opposite
Sex
20,642
total 56,396
Largest collection of twins in US (next largest has ~28k pairs)
Largest collection of twins in US (next largest has ~28k pairs)
724K siblings!

Where do we get E indicators?

Exposome Data Warehouse (~1TB)
Geographical information system-enabled
database to map individuals to E

US distribution of twins and siblings that can be “linked”
to variation in air pollution, climate, and socioeconomic
status

We mapped 13360 ICD9 billing codes to 1809
PheWAS codes (in addition to 95 Mendelian
disorders)
Denny, Bastarache, et al. 2013

Rzhetsky, White et al. 2013
CARDIOVASCULAR
hypertension (401)
cardiac dysrhythmias (427)
DIGESTIVE
irritable bowel syndrome (564.1)
ENDOCRINE
type 2 diabetes (250.1)
type 1 diabetes (250.2)
(and 11 more phenotype groups)

h2 = 2(rmz - rdz)

c2 = 2rdz - rmz

In a twin study, h2 and c2 can be
estimated using Falconer’s formula
Tetrachoric correlation to estimate rmz & rdz
h2 : narrow-sense heritability

c2 : shared environment

rmz: correlation of phenotype between identical twins

rdz: correlation of phenotype between fraternal twins

… but we do not know the zygosity status of
claimants…
But we do know:

Opposite sex twins: all fraternal

Same Sex twins 👯 : mixture of identical and fraternal

We can estimate the proportion of fraternal and
identical twins using opposite sex twin prevalence
Weinberg, 1902

Benyamin, et al, 2005, 2006
P(mz) ~ 1 - 2(NOS / Nall) = 0.26
p(ss) = Nss / Nall = 0.63
p = P(mz|ss) = P(mz) / P(ss) = 0.41
h2 = 2/p (rss - ros)
c2 = (ros(p+1) - rss) / p

Patient cohorts in the “real-world” :
overall heritability (0.32) and shared environment (0.09): a
global view among 560 phenotypes

gives a nuanced view of G and E
CaTCH: Claims analysis of Twin Correlation and Heritability
US-based, ages < 25
statistic
Phenotype category
Overall (0.32) Endocrine (0.4) Metabolic (0.4)

Decomposing G (h2), shared E (c2) and factors of shared E

(SES, air quality, and temperature)

in 2 candidate phenotypes:

Lyme and Obesity
temperature/seasonality = 5% of shared environment
SES explains 10% of shared E
signiﬁcant total genetics
signiﬁcant total shared E
zero genetics
moderate shared environment

h2 and c2 estimates for 560 phenotypes versus statistical signiﬁcance :

326/560 traits (>50%) have a heritable and 180/560 (32%) had a shared
environment component!
r=0.817

… but air pollution, climate, and geocoded SES
play a modest role in total shared environment (c2)
r=0.817

56K twins and 700K siblings in a massive health insurance cohort
point to complex and elusive variation in 560 phenotypes
560phenotypes
h2=0.3
c2=0.1 (🏠)
?
https://rdcu.be/boZeV

• 58.2% of P had non-zero h2

• 32% of P had a non-zero c2

• 87.9% had non-zero age fixed effect

• 50% had non-zero sex fixed effect

• Shared c2 != pollution, income, climate

• 0.32 + 0.09 = 0.41
• 1 - 0.41 = 0.59!
The shared environment and genetic contribution in phenotype
is complex and modest:
Where is the rest of the variation on phenotype?
Where is the rest of the variation in P?

Explaining the missing variation in P with the Exposome:
A data-driven paradigm for robust discovery of E
Wild, 2005, 2012
Ioannidis , 2009

Rappaport and Smith, 2010, 2011

Buck-Louis and Sundaram 2012

Miller and Jones, 2014

Patel CJ and Ioannidis JPAI, 2014ab
Ioannidis, 2016
Manrai et al 2017

Hypothesis - on average, explaining only 10% of the stuﬀ in red
560phenotypes
h2=0.3
c2=0.1 (🏠)
~10%
https://rdcu.be/boZeV

Challenges in EWAS for identification of specific E factors, E
mixtures, and variance explained (E2) of the exposome
Power (both low and large sample sizes)
Model misspecification and mis?-interpretation
Dense correlational web of the exposome
Time-dependence of the exposome-phenome associations
ARPH 2016

JAMA 2014

JECH 2014
Scalable measurement of the exposome

Red: positive ρ

Blue: negative ρ

thickness: |ρ|
for each pair of E:

Spearman ρ

(575 factors: 81,937 correlations)
Correlation globes paint a complex view of the exposome:
average correlation of < 0.3
permuted data to produce

“null ρ”

sought replication in > 1
cohort
Pac Symp Biocomput. 2015

JECH. 2015

Chung et al., ES&T 2018
Eﬀective number of
variables:

500 (10% decrease)

Co-E patterns between

females and males are similar
females
males
… however:
Co-E within household

weaker!
Chung et al., ES&T 2018
Jake Chung

High-throughput data analytics to mitigate analytical challenges of
exposome-based research:

What model do we use, how do we assess them, and how to
interpret them?
Agier, Portengen et al, EHP 2016

Unclear what model to use in the context of adjustment
variables
Estimating the Vibration of Eﬀects (or Risk)
Variable of Interest
e.g., 1 SD of log(serum Vitamin D)
Adjusting Variable Set
n=13
All-subsets Cox regression
213+ 1 = 8,193 models
SES [3rd tertile]
education [>HS]
race [white]
body mass index [normal]
total cholesterol
any heart disease
family heart disease
any hypertension
any diabetes
any cancer
current/past smoker [no smoking]
drink 5/day
physical activity
Data Source
NHANES 1999-2004
417 variables of interest
time to death
N≧1000 (≧100 deaths)
effect sizes
p-values
●
●
●
●
●
●
●
●
●
●
●
●
●●
0
1
2
3
4
5
6
7
8
9
10
11
1213
1
50
99
1 50 99
5.0
7.5
−log10(pvalue)
Vitamin D (1SD(log))
RHR = 1.14
RPvalue = 4.68
A
B
C D
E
median p-value/HR for k
percentile indicator
JCE, 2015
●
●
●
●
●
●
●
●
●
●
●
●
●●
0
1
2
3
4
5
6
7
8
9
10
11
1213
1
50
99
1 50 99
2.5
5.0
7.5
0.64 0.68 0.72 0.76
Hazard Ratio
−log10(pvalue)
RHR = 1.14
RP = 4.68
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
1
2
3
4
5
6
7
8
9
10
11
12
13
1
50
99
1 50 99
1
2
3
4
0.75 0.80 0.85 0.90
Hazard Ratio
−log10(pvalue)
Thyroxine (1SD(log))
RHR = 1.15
RP = 2.90
http://bit.ly/eﬀectvibration

The Vibration of Eﬀects:
Vitamin D and Thyroxine and attenuated risk in mortality
JCE, 2015
●
●
●
●
●
●
●
●
●
●
●
●
●●
0
1
2
3
4
5
6
7
8
9
10
11
1213
1
50
99
1 50 99
2.5
5.0
7.5
0.64 0.68 0.72 0.76
Hazard Ratio
−log10(pvalue)
RHR = 1.14
RP = 4.68
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
1
2
3
4
5
6
7
8
9
10
11
12
13
1
50
99
1 50 99
1
2
3
4
0.75 0.80 0.85 0.90
Hazard Ratio
−log10(pvalue)
Thyroxine (1SD(log))
RHR = 1.15
RP = 2.90

●
●
●
●
●
9
10
111213
1
5
10
1.3
−log10(pvalue)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
1
2
3
4
5
6
7
8
9
10
111213
1
50
99
1 50 99
5
10
1.3 1.4 1.5 1.6
Hazard Ratio
−log10(pvalue)
Cadmium (1SD(log))
adjustment=current_past_smoking
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
1
2
3
4
5
6
7
8
9
10
111213
1
50
99
1 50 99
5
10
1.3 1.4 1.5 1.6
Hazard Ratio
−log10(pvalue)
Cadmium (1SD(log))
RHR = 1.29
RP = 8.29
The Vibration of Eﬀects: shifts in the eﬀect size distribution
due to select adjustments (e.g., adjusting cadmium levels
JCE, 2015

pcb
b-carotene
C-reactive protein
cotinine

JCE, 2015
Risk and significance depends on modeling scenario!
The Vibration of Effects: beware of the Janus effect

(both risk and protection?!)
“risk”“protection”
“significant”
http://bit.ly/effectvibration

How do we interpret the complex models?:
What is a mixture? - an integration of correlated E? Interaction?
Curr Epidemiol Rep 2017
P = a + b1*e1 + b2*e2 + b3*e3

vs.

P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2

vs.
P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2

+ b5*e1*e2 + b6*e2*e3 + b7*e1*e3 + b8*e1*e2*e3
All terms signiﬁcant?

Only interaction terms?
Additive?
Simple interaction?
All 2 way/n-way?

Our current exposomic research cohorts may be underpowered
for complex modeling!
Chung et al, Environment International, 2019

… or overpowered for detection of simple
associations!

Emerging large sample sized cohorts will expose massive mis-
misspeciﬁcation, or confounding, and correlation amongst E!
Manrai et al, American Journal of Epidemiology 2019
EWAS in telomere length
(Patel et al., IJE 2017)
Adjusted for age, poverty, ethnicity, and sex
GWAS in telomere length
(Pearson and Manolio, JAMA 2008)
Before and after confounding adjustment

Aside from residual confounding and mis-speciﬁcation:
Modest association sizes and R2— if eﬀects — may be diluted
by the complex phenomenon of individual exposure and time
Athersuch Bioanalysis 2012
Cumulative (cadmium, PCB)
Constant, but excreted (phenols, vitamins)
Intervention (drugs)
Seasonal (allergen)
In-utero
Not shown: Diurnal

In conclusion:
In complex traits and phenotypes, we are missing much
of phenotypic variation of the population.
H2 ~ 0.3
C2 ~ 0.1

Challenges in EWAS for identification of specific E factors, E
mixtures, and variance explained (E2) of the exposome
Power (both low and large sample sizes)
Model misspecification and mis(?)-interpretation
Dense correlational web of the exposome
Time-dependence of the exposome-phenome associations
ARPH 2016

JAMA 2014

JECH 2014
Scalable measurement of the exposome

Now hiring!

Data scientist for translational exposome informatics!
phenome exposome genome
individuals
= +
diabetes
telom
eres
airpollution
nutrients
inﬂuenza
FTOTCF7L2
thousands of environmental exposures millions of genetic variants
Develop data science tools for dissecting relationships between the
phenome, exposome, and genome to optimize medical decision
making in the era of massive data.
(you)
your phenome
P E G
@chiragjp


Harvard DBMI
Susanne Churchill

Nathan Palmer

Sophia Mamousette

Sunny Alvear

Chirag J Patel


@chiragjp

www.chiragjpgroup.org
NIH Common Fund

Big Data to Knowledge
Acknowledgements
RagGroup
Jake Chung
Kajal Claypool
Chirag Lakhani
Danielle Rasooly

Alan LeGoallec

Braden Tierney

Yixuan He
Mentioned Collaborators
Arjun Manrai
John Ioannidis

Peter Visscher

EWAS and the exposome: Mt Sinai in Brescia 052119

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to EWAS and the exposome: Mt Sinai in Brescia 052119

Similar to EWAS and the exposome: Mt Sinai in Brescia 052119 (19)

Recently uploaded

Recently uploaded (20)

EWAS and the exposome: Mt Sinai in Brescia 052119