Big data exposome and pediatric outcomes
1. Big data streams to elucidate the role of
environmental exposures in pediatric
outcomes
Chirag J Patel
Hot Topics!
12/8/2016
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
3. P = G + E
Phenotypes are a function of inherited and
environmental factors
4. P = G + E
Phenotype: Type 2 Diabetes, Cancer, Birthweight, Birth timing
Phenotypes are a function of inherited and
environmental factors
5. P = G + E
Phenotype: Type 2 Diabetes, Cancer, Birthweight, Birth timing
Genome: polymorphisms
Phenotypes are a function of inherited and
environmental factors
6. P = G + E
Phenotype: Type 2 Diabetes, Cancer, Birthweight, Birth timing
Genome: polymorphisms
Environment: Infectious agents, Nutrients, Pollutants, Drugs
Phenotypes are a function of inherited and
environmental factors
7. P = G + E
However: we lack methods to discover the role of E in
phenotypes and disease for precision medicine.
8. ... and the case is different with genetics (e.g.,
genomics)!
over 1,400
Genome-wide Association Studies (GWAS)
NHGRI GWAS Catalog
https://www.genome.gov/
G:
dad mom
me
12. $H^2 = \sigma^2_G / \sigma^2_P$
Heritability ($H^2$) is the share of phenotypic variance
attributed to genetic variance in a population:
an indicator of the proportion of phenotypic
differences attributed to G.
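As a rough illustration of the ratio above, the sketch below computes $H^2$ from variance components. The numbers are hypothetical (they are not from the slides), and the decomposition Var(P) = Var(G) + Var(E) assumes G and E are uncorrelated with no interaction, which is a simplification.

```python
def heritability(var_g: float, var_p: float) -> float:
    """Return H^2 = Var(G) / Var(P), the share of phenotypic variance due to G."""
    if var_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    if var_g < 0 or var_g > var_p:
        raise ValueError("genetic variance must lie between 0 and Var(P)")
    return var_g / var_p


# Hypothetical variance components (illustrative only).
# Assumption: G and E are uncorrelated, so Var(P) = Var(G) + Var(E).
var_g = 2.5
var_e = 7.5
h2 = heritability(var_g, var_g + var_e)
print(f"H^2 = {h2:.2f}")
```

Under these assumptions, whatever share of the variance is not attributed to G is left to E, which is the gap the rest of the talk addresses.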
13. Height is an example of a heritable trait:
Francis Galton shows how it's done (1887)
The mid-height of 205 parents explained
60% of the variability in 928 offspring ($\sigma^2_G / \sigma^2_P$)
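One way to read "variability explained" is as the squared correlation between mid-parent height and offspring height. The sketch below computes that quantity on a small made-up data set; the heights are hypothetical and are not Galton's records, and treating this ratio as a heritability estimate rests on strong simplifying assumptions (for example, no shared-environment effect).

```python
from statistics import mean


def variance_explained(x, y):
    """Squared Pearson correlation: share of variance in y explained by a line in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical heights in inches (illustrative only, not Galton's data).
midparent = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring = [65.0, 67.5, 67.0, 70.5, 70.0]

r2 = variance_explained(midparent, offspring)
print(f"variance in offspring height explained by mid-parent height: {r2:.2f}")
```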
14. Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
Eye color
Hair curliness
Type−1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Birthweight
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Preterm Birth
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type−2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Parturition Timing
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
[Figure: bar chart of heritability by trait; x-axis: Heritability, Var(G)/Var(Phenotype), 0 to 100%]
15. Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
[Figure: heritability bar chart, same traits and axis as slide 14]
Highlighted: Type 2 Diabetes (25%)
16. Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
[Figure: heritability bar chart, same traits and axis as slide 14]
Highlighted: Type 2 Diabetes (25%); Heart Disease (30-60%)
17. Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
[Figure: heritability bar chart, same traits and axis as slide 14]
Highlighted: Preterm Birth (37%); Birthweight (40%); Timing (20%)
18. Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
[Figure: heritability bar chart, same traits and axis as slide 14]
H2 < 50%
19. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
[Figure: "Characterizing the exposome" diagram, shown over an excerpt from a PERSPECTIVES article (www.sciencemag.org)]
Exposome
External environment: radiation, diet, pollution, infections, drugs, life-style, stress
Internal chemical environment: xenobiotics, inflammation, preexisting disease, lipid peroxidation, oxidative stress, gut flora
Chemical classes: reactive electrophiles, metals, endocrine disrupters, immune modulators, receptor-binding proteins
Caption: Characterizing the exposome. The exposome represents the combined exposures from all sources that reach the internal chemical environment. Toxicologically important classes of exposome chemicals are shown. Signatures and biomarkers can detect these agents in blood or serum.
Article excerpt (partly clipped on the slide):
"[...] critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting "snapshots" of critical portions of a person's exposome during different stages of life. At one extreme is a "bottom-up" strategy in which all chemicals in each external source of a subject's exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a "top-down" strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject's blood. This would require only a single blood specimen at each time point and would relate directly [...]"
"[...] (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15)."
"Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples. With successful characterization of both [...]"
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
20. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure?
[Figure: exposome diagram with article excerpt and caption, same as slide 19]
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
21. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure? how to measure?
[Figure: exposome diagram with article excerpt and caption, same as slide 19]
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
22. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure? how to measure?
[Figure: exposome diagram with article excerpt and caption, same as slide 19]
how to analyze in relation to health?
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
23. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure? how to measure?
[Figure: exposome diagram with article excerpt and caption, same as slide 19]
how to analyze in relation to health?
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
24. Explaining the other 50%:
A new data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure? how to measure?
[Figure: exposome diagram with article excerpt and caption, same as slide 19]
“A more comprehensive view of environmental exposure is
needed ... to discover major causes of diseases...”
how to analyze in relation to health?
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
25. Connecting Environmental Exposure with Disease:
Missing the “System” of Exposures?
[2x2 table: exposure (E+, E-) by disease status (diseased, non-diseased); cell counts shown as "?"]
We are exposed to many things, but studies do not assess the multiplicity.
Fragmented literature of associations.
Challenge to discover E associated with disease.
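To make the 2x2 framing concrete, the sketch below computes a relative risk and an odds ratio for one exposure, plus a Bonferroni-adjusted significance threshold for the case where many exposures are tested at once. The counts and the number of tests are hypothetical, and Bonferroni is only one simple way to account for multiplicity; the slides do not say which correction is used.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """a: exposed and diseased, b: exposed and non-diseased,
    c: unexposed and diseased, d: unexposed and non-diseased."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("relative risk is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio of the same 2x2 table."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return (a * d) / (b * c)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical counts for a single exposure (illustrative only).
rr = relative_risk(30, 70, 15, 85)
oratio = odds_ratio(30, 70, 15, 85)

# Hypothetical: 50 exposures tested against the same outcome.
threshold = bonferroni_threshold(0.05, 50)

print(f"relative risk = {rr:.2f}, odds ratio = {oratio:.2f}")
print(f"per-test threshold for 50 tests = {threshold:.4f}")
```

Testing one exposure at a time at the usual 0.05 level, without such an adjustment, is one way a fragmented literature of associations can arise.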
26. Example of fragmentation and vibration of effects:
Is everything we eat associated with cancer?
AJCN, 2012
JCE, 2015
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
27. Example of fragmentation and vibration of effects:
Is everything we eat associated with cancer?
AJCN, 2012
JCE, 2015
Of 50, 40 were studied in relation to cancer risk
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
28. Example of fragmentation and vibration of effects:
Is everything we eat associated with cancer?
AJCN, 2012
JCE, 2015
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies [...] outliers are not shown (effect estimates >10).
Of 50, 40 were studied in relation to cancer risk
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
29. Example of fragmentation and vibration of effects:
Is everything we eat associated with cancer?
AJCN, 2012
JCE, 2015
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies [...] outliers are not shown (effect estimates >10).
Of 50, 40 were studied in relation to cancer risk
Weak statistical evidence:
non-replicated
inconsistent effects
non-standardized
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
30. Example of fragmentation and vibration of effects:
Is everything we eat associated with cancer?
AJCN, 2012
JCE, 2015
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies [...] outliers are not shown (effect estimates >10).
Of 50, 40 were studied in relation to cancer risk
Weak statistical evidence:
non-replicated
inconsistent effects
non-standardized
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
[Figure axis: relative risk, running from protection to risk]
33. Gold standard for breadth of human exposure information:
National Health and Nutrition Examination Survey¹
since the 1960s
now continuous, in two-year cycles: 1999 onwards
10,000 participants per survey
The sample for the survey is selected to represent the U.S. population
of all ages. To produce reliable statistics, NHANES over-samples persons
60 and older, African Americans, and Hispanics.
Since the United States has experienced dramatic growth in the number of
older people during this century, the aging population has major
implications for health care needs, public policy, and research
priorities. NCHS is working with public health agencies to increase the
knowledge of the health status of older Americans. NHANES has a primary
role in this endeavor.
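Because some groups are over-sampled, an unweighted average over participants does not describe the U.S. population; each participant has to be weighted by how many people they represent. The sketch below shows the idea with a plain weighted mean. The values and weights are hypothetical, and a real NHANES analysis would use the sample weights and survey-design variables released with the data, which also affect variance estimates.

```python
def weighted_mean(values, weights):
    """Weighted average: each value counts in proportion to its sample weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical participants: 1 = has the condition, 0 = does not.
# The first two come from an over-sampled group, so each represents
# fewer people in the population (smaller weight).
has_condition = [1, 1, 0, 0]
sample_weights = [500.0, 500.0, 2000.0, 2000.0]

unweighted = sum(has_condition) / len(has_condition)
weighted = weighted_mean(has_condition, sample_weights)

print(f"unweighted prevalence: {unweighted:.2f}")
print(f"weighted prevalence:   {weighted:.2f}")
```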
All participants visit the physician. Dietary interviews and body
measurements are included for everyone. All but the very young have a
blood sample taken and will have a dental screening. Depending upon the
age of the participant, the rest of the examination includes tests and
procedures to assess the various aspects of health listed above. In
general, the older the individual, the more extensive the examination.
Survey Operations
Health interviews are conducted in respondents’ homes. Health
measurements are performed in specially-designed and equipped mobile
centers, which travel to locations throughout the country. The study
team consists of a physician, medical and health technicians, as well as
dietary and health interviewers. Many of the study staff are bilingual
(English/Spanish).
An advanced computer system using high-end servers, desktop PCs, and
wide-area networking collect and process all of the NHANES data, nearly
eliminating the need for paper forms and manual coding operations. This
system allows interviewers to use notebook computers with electronic
pens. The staff at the mobile center can automatically transmit data
into data bases through such devices as digital scales and stadiometers.
Touch-sensitive computer screens let respondents enter their own
responses to certain sensitive questions in complete privacy. Survey
information is available to NCHS staff within 24 hours of collection,
which enhances the capability of collecting quality data and increases
the speed with which results are released to the public.
In each location, local health and government officials are notified of
the upcoming survey. Households in the study area receive a letter from
the NCHS Director to introduce the survey. Local media may feature
stories about the survey.
NHANES is designed to facilitate and encourage participation.
Transportation is provided to and from the mobile center if necessary.
Participants receive compensation and a report of medical findings is
given to each participant. All information collected in the survey is
kept strictly confidential. Privacy is protected by public laws.
Uses of the Data
Information from NHANES is made available through an extensive series of
publications and articles in scientific and technical journals. For data
users and researchers throughout the world, survey data are available on
the internet and on easy-to-use CD-ROMs.
Research organizations, universities, health care providers, and
educators benefit from survey information. Primary data users are
federal agencies that collaborated in the design and development of the
survey. The National Institutes of Health, the Food and Drug
Administration, and CDC are among the agencies that rely upon NHANES to
provide data essential for the implementation and evaluation of program
activities. The U.S. Department of Agriculture and NCHS cooperate in
planning and reporting dietary and nutrition information from the
survey. NHANES’ partnership with the U.S. Environmental Protection
Agency allows continued study of the many important environmental
influences on our health.
• Physical fitness and physical functioning
• Reproductive history and sexual behavior
• Respiratory disease (asthma, chronic bronchitis, emphysema)
• Sexually transmitted diseases
• Vision
1 http://www.cdc.gov/nchs/nhanes.htm
34. Gold standard for breadth of human exposure information:
National Health and Nutrition Examination Survey¹
since the 1960s
now continuous, in two-year cycles: 1999 onwards
10,000 participants per survey
The sample for the survey is selected to represent the U.S. population
of all ages. To produce reliable statistics, NHANES over-samples persons
60 and older, African Americans, and Hispanics.
Since the United States has experienced dramatic growth in the number of
older people during this century, the aging population has major
implications for health care needs, public policy, and research
priorities. NCHS is working with public health agencies to increase the
knowledge of the health status of older Americans. NHANES has a primary
role in this endeavor.
All participants visit the physician. Dietary interviews and body
measurements are included for everyone. All but the very young have a
blood sample taken and will have a dental screening. Depending upon the
age of the participant, the rest of the examination includes tests and
procedures to assess the various aspects of health listed above. In
general, the older the individual, the more extensive the examination.
Survey Operations
Health interviews are conducted in respondents’ homes. Health
measurements are performed in specially-designed and equipped mobile
centers, which travel to locations throughout the country. The study
team consists of a physician, medical and health technicians, as well as
dietary and health interviewers. Many of the study staff are bilingual
(English/Spanish).
An advanced computer system using high-end servers, desktop PCs, and
wide-area networking collect and process all of the NHANES data, nearly
eliminating the need for paper forms and manual coding operations. This
system allows interviewers to use notebook computers with electronic
pens. The staff at the mobile center can automatically transmit data
into data bases through such devices as digital scales and stadiometers.
Touch-sensitive computer screens let respondents enter their own
responses to certain sensitive questions in complete privacy. Survey
information is available to NCHS staff within 24 hours of collection,
which enhances the capability of collecting quality data and increases
the speed with which results are released to the public.
In each location, local health and government officials are notified of
the upcoming survey. Households in the study area receive a letter from
the NCHS Director to introduce the survey. Local media may feature
stories about the survey.
NHANES is designed to facilitate and encourage participation.
Transportation is provided to and from the mobile center if necessary.
Participants receive compensation and a report of medical findings is
given to each participant. All information collected in the survey is
kept strictly confidential. Privacy is protected by public laws.
Uses of the Data
Information from NHANES is made available through an extensive series of
publications and articles in scientific and technical journals. For data
users and researchers throughout the world, survey data are available on
the internet and on easy-to-use CD-ROMs.
Research organizations, universities, health
care providers, and educators benefit from
survey information. Primary data users are
federal agencies that collaborated in the de-
sign and development of the survey. The
National Institutes of Health, the Food and
Drug Administration, and CDC are among the
agencies that rely upon NHANES to provide
data essential for the implementation and
evaluation of program activities. The U.S.
Department of Agriculture and NCHS coop-
erate in planning and reporting dietary and
nutrition information from the survey.
NHANES’ partnership with the U.S. Environ-
mental Protection Agency allows continued
study of the many important environmental
influences on our health.
• Physical fitness and physical functioning
• Reproductive history and sexual behavior
• Respiratory disease (asthma, chronic bron-
chitis, emphysema)
• Sexually transmitted diseases
• Vision
1 http://www.cdc.gov/nchs/nhanes.htm
>250 exposures (serum + urine)
>1,000 genetic loci
>85 quantitative clinical traits (e.g., serum glucose, lipids, body mass index)
35. What maternal E are associated with preterm birth (< 37 weeks)?
Reprod Tox, 2014
40. What maternal E are associated with preterm birth (< 37 weeks)?:
What did we screen in moms?
Infectious agents (24): hepatitis, HIV, Staph. aureus
Nutrients and vitamins (32): vitamin D, carotenes
Plastics and consumables (49): phthalates, bisphenol A
Pesticides and air-related pollutants (95): atrazine; cadmium; hydrocarbons; polychlorinated biphenyls; volatile organic compounds
45. What maternal E are associated with preterm birth (< 37 weeks)?:
Method for screening for associations
Reprod Tox, 2014
NHANES 1999-2006: 5772 reporting live births
Pregnant year prior to survey? 842 participants
Any child born preterm (< 37 weeks)?
Any preterm birth: N=62
No preterm birth: N=718
logistic regression, adjusted for age, race, poverty/income, education, number of births
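A minimal sketch of the screening step, assuming one covariate-adjusted logistic model is fit per exposure and each exposure is scaled to 1 SD. The slide names only "logistic regression" and the adjustment set; the function names (`fit_logistic`, `screen_exposures`), the simulated data, and the choice to ignore NHANES survey weights are all assumptions made for illustration.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate for the SEs.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def screen_exposures(exposures, covariates, outcome):
    """Fit one adjusted model per exposure; return {name: (OR per 1 SD, p-value)}."""
    results = {}
    for name, values in exposures.items():
        z = (values - values.mean()) / values.std()  # scale exposure to 1 SD
        X = np.column_stack([np.ones(len(outcome)), z, covariates])
        beta, se = fit_logistic(X, outcome)
        wald = beta[1] / se[1]
        p_value = math.erfc(abs(wald) / math.sqrt(2.0))  # two-sided normal p-value
        results[name] = (math.exp(beta[1]), p_value)
    return results


# Simulated stand-in data (NOT NHANES): one true signal and one null exposure.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
parity = rng.integers(1, 5, size=n).astype(float)
signal = rng.normal(size=n)
null = rng.normal(size=n)
logit = -2.5 + 0.7 * signal + 0.2 * age
preterm = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

results = screen_exposures(
    {"exposure_signal": signal, "exposure_null": null},
    np.column_stack([age, parity]),
    preterm,
)
```

Each entry of `results` is one point of a volcano plot: the odds ratio per 1 SD and its p-value.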
48. What maternal E are associated with preterm birth (< 37 weeks)?:
Volcano plot of 201 associations
Reprod Tox, 2014
[Figure: volcano plot; x-axis: odds ratio (0 to 6); y-axis: −log10(p-value) (0 to 2)]
Labeled associations:
urine bisphenol A (OR: 1.9)
serum iron (OR: 1.6)
urine Cs (OR: 1.9)
urine hydroxypyrene (OR: 1.8)
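A small sketch of how volcano-plot coordinates can be derived from screening output: the x value is the effect size (here the odds ratio) and the y value is −log10 of the p-value, so stronger evidence sits higher on the plot. The function name `volcano_points` and the toy inputs are hypothetical; the slide reports odds ratios but not the underlying p-values, so none are reproduced here.

```python
import math


def volcano_points(results):
    """Map {name: (odds_ratio, p_value)} to (name, odds_ratio, -log10 p), strongest first."""
    points = [(name, or_, -math.log10(p)) for name, (or_, p) in results.items()]
    return sorted(points, key=lambda point: point[2], reverse=True)


# Toy values for illustration only (not results from the study).
toy = {
    "exposure_a": (1.9, 0.01),
    "exposure_b": (1.1, 0.5),
    "exposure_c": (0.7, 0.04),
}
points = volcano_points(toy)
```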
52. Tentative evaluation of higher Bisphenol A levels in moms who gave
preterm birth in a tertiary clinic
Reprod Tox, 2014
Lucile Packard Children’s Hospital
37 consenting mothers with urine bisphenol A (during gestation)!
Child born preterm?
Preterm (N=16): 0.07 ug/mL
No Preterm (N=21): 0.03 ug/mL
Odds Ratio (1SD change): 3.5 (p=0.1)
Adjusted for age, race, creatinine, gestational age
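To see why this evaluation is called tentative, the reported odds ratio and p-value can be turned into an approximate confidence interval. This is a sketch under a stated assumption: that p=0.1 is a two-sided Wald test on the log odds ratio, which the slide does not say (with 37 mothers a different test may have been used). The helper name `wald_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(odds_ratio, p_value, level=0.95):
    """Back out a Wald confidence interval from an odds ratio and its two-sided p-value."""
    log_or = math.log(odds_ratio)
    z_observed = NormalDist().inv_cdf(1.0 - p_value / 2.0)  # |z| implied by the p-value
    std_error = abs(log_or) / z_observed
    z_critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (
        math.exp(log_or - z_critical * std_error),
        math.exp(log_or + z_critical * std_error),
    )


# Values taken from the slide: OR 3.5 per 1 SD, p = 0.1.
lower, upper = wald_interval(3.5, 0.1)
```

Under that assumption the interval spans 1, which is consistent with p > 0.05 and with the small sample.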
53. The spectrum of associations depends on age:
What E factors are associated with
mortality?
58. EWAS to search for
exposures and behaviors associated with all-cause mortality.
IJE, 2013
NHANES: 1999-2004, National Death Index linked mortality
246 behaviors and exposures (serum/urine/self-report)
NHANES: 1999-2001
  N=330 to 6008 (26 to 655 deaths), ~5.5 years of followup
  Cox proportional hazards: baseline exposure and time to death
  False discovery rate < 5%
NHANES: 2003-2004
  N=177 to 3258 (20 to 202 deaths), ~2.8 years of followup
  p < 0.05
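The two NHANES boxes above read as a two-stage screen: control the false discovery rate in the first cohort, then require p < 0.05 in the second. A minimal sketch of that filter follows, assuming the Benjamini-Hochberg procedure for the FDR step (the slide states only "False discovery rate < 5%" and does not name the method). The Cox model fits themselves are not shown; the function names and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def two_stage_hits(discovery_p, replication_p, fdr=0.05, alpha=0.05):
    """Keep exposures with q < fdr in discovery and p < alpha in replication."""
    names = list(discovery_p)
    q_values = benjamini_hochberg([discovery_p[name] for name in names])
    return [
        name
        for name, q in zip(names, q_values)
        if q < fdr and replication_p[name] < alpha
    ]


# Toy p-values for illustration only (not results from the study).
discovery = {"a": 0.0001, "b": 0.004, "c": 0.03, "d": 0.6}
replication = {"a": 0.01, "b": 0.2, "c": 0.03, "d": 0.01}
q_values = benjamini_hochberg(list(discovery.values()))
hits = two_stage_hits(discovery, replication)
```

In the toy example, "b" passes the FDR step but fails replication, and "d" replicates by chance but never passed the first stage, so neither is reported.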
66. EWAS (re)-identifies factors associated with all-cause mortality:
Volcano plot of 200 associations
[Figure: volcano plot; x-axis: adjusted hazard ratio (0.4 to 2.8); y-axis: -log10(p-value) (0 to 8)]
Labeled exposures and behaviors:
1 Physical Activity
2 Does anyone smoke in home?
3 Cadmium
4 Cadmium, urine
5 Past smoker
6 Current smoker
7 trans-lycopene
Plot annotations: serum lycopene [1 SD]; past smoker?; current smoker?; anyone smoke in home?; serum and urine cadmium [1 SD]; physical activity [low, moderate, high activity]*
*derived from METs per activity and categorized by Health.gov guidelines
Age, sex, income, education, race/ethnicity, occupation [in red]:
1 age (10 year increment); 2 SES_1; 3 male; 4 SES_0; 5 black; 6 SES_2; 7 SES_3; 8 education_hs;
9 other_eth; 10 mexican; 11 occupation_blue_semi; 12 education_less_hs; 13 occupation_never;
14 occupation_blue_high; 15 occupation_white_semi; 16 other_hispanic
Called out on the plot: age (10 years); income (quintile 1); income (quintile 2); income (quintile 3); male; black
R2 ~ 2%
67. Remember: >50% of disease risk and phenotypic variability is in E!
[Figure: bar chart of heritability, Var(G)/Var(Phenotype), on a 0 to 100 axis for complex traits and diseases, listed from eye color, hair curliness, and type-1 diabetes through leukemia and stomach cancer; annotation: H2 < 50%. Source: SNPedia.com]
68. Where can it be found?
>50% of disease risk and phenotypic variability
is in E!
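The ">50% is in E" statement follows from H2 < 50% only under the additive model P = G + E with G and E independent, so that Var(P) = Var(G) + Var(E). A small simulation sketch of that arithmetic is below; the variance split of 0.4 and 0.6 is an arbitrary illustrative choice, not an estimate for any trait.

```python
import numpy as np

# Simulate P = G + E with independent components (illustrative variances only).
rng = np.random.default_rng(0)
n = 100_000
g = rng.normal(0.0, np.sqrt(0.4), n)  # genetic component, Var(G) = 0.4
e = rng.normal(0.0, np.sqrt(0.6), n)  # environmental component, Var(E) = 0.6
p = g + e

h2 = g.var() / p.var()       # heritability: Var(G) / Var(P)
e_share = e.var() / p.var()  # share of phenotypic variance attributed to E
```

When G and E are correlated or interact, the two shares no longer sum to 1, so the split is a property of this simple model rather than a general fact.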
69. Studying the Elusive Environment in Large Scale
It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.¹ Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.¹ Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.

To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).²,³ The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al⁴ screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.

There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study- [...]

[...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.

Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).

However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- [...]

VIEWPOINT (Opinion)
Chirag J. Patel, PhD, Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
How can we proceed to study the elusive environment in
large scale for discovery-based research?
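One way to make "apply EWAS to multiple data sets and examine consistency" concrete is to pool each exposure's estimate across cohorts and quantify heterogeneity. The sketch below uses a fixed-effect inverse-variance pool with Cochran's Q and the I² statistic; this is one possible choice, not a method named on the slide, and the function name and toy estimates are hypothetical.

```python
def pooled_estimate(estimates, std_errors):
    """Inverse-variance pooled estimate with Cochran's Q and I^2 heterogeneity."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: fraction of variation across data sets beyond what chance would explain.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Toy log hazard ratios from three hypothetical cohorts (illustration only).
consistent = pooled_estimate([0.50, 0.55, 0.45], [0.1, 0.1, 0.1])
inconsistent = pooled_estimate([0.50, -0.40, 0.10], [0.1, 0.1, 0.1])
```

An exposure like the second toy case, whose sign flips between cohorts, would be flagged by a high I² even if one cohort alone looked significant.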
70. Studying the Elusive Environment in Large Scale
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
How can we proceed to study the elusive environment in
large scale for discovery-based research?
•new ‘omics technologies
71. Studying the Elusive Environment in Large Scale
Itispossiblethatmorethan50%ofcomplexdiseaserisk
isattributedtodifferencesinanindividual’senvironment.1
Airpollution,smoking,anddietaredocumentedenviron-
mental factors affecting health, yet these factors are but
a fraction of the “exposome,” the totality of the exposure
loadoccurringthroughoutaperson’slifetime.1
Investigat-
ing one or a handful of exposures at a time has led to a
highly fragmented literature of epidemiologic associa-
tions. Much of that literature is not reproducible, and se-
lectivereportingmaybeamajorreasonforthelackofre-
producibility. A new model is required to discover
environmental exposures associated with disease while
mitigating possibilities of selective reporting.
Toremedythelackofreproducibilityandconcernsof
validity, multiple personal exposures can be assessed si-
multaneously in terms of their association with a condi-
tion or disease of interest; the strongest associations can
then be tentatively validated in independent data sets
(eg, as done in references 2 and 3).2,3
The main advan-
tages of this process include the ability to search the list
ofexposuresandadjustformultiplicitysystematicallyand
reportalltheprobedassociationsinsteadofonlythemost
significant results. The term “environment-wide associa-
tion studies” (EWAS) has been used to describe this ap-
proach (an analogy to genome-wide association stud-
ies).Forexample,Wangetal4
screenedmorethan2000
chemicalsinserumtodiscoverendogenousexposuresas-
sociated with risk for cardiovascular disease.
There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study- [...]
[...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.
Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).
However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- [...]
VIEWPOINT
Chirag J. Patel, PhD: Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc: Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
Opinion
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
•longitudinal publicly available data
How can we proceed to study the elusive environment in
large scale for discovery-based research?
•new ‘omics technologies
72. Studying the Elusive Environment in Large Scale
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
•longitudinal publicly available data
How can we proceed to study the elusive environment in
large scale for discovery-based research?
High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through [...]
[...] US federally funded gene expression experiment data be d[...]ited in public repositories such as the Gene Expression Omnibus [...] repository has been instrumental in development of technolo[...] measurement of gene expression, data standardization, and [...]
Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
Panels: A, Serum cotinine (37 total correlations); B, Serum total mercury (42 total correlations); C, Serum cadmium (68 total correlations); D, Serum trans-β-carotene (68 total correlations)
Edge legend: negative correlation, positive correlation
Node categories: infectious agents, pollutants, nutrients and vitamins, demographic attributes
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronges[...] The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
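The construction described in the figure caption (pairwise correlations between exposures, an edge wherever the absolute correlation exceeds 0.2, node size given by the number of edges) can be illustrated with a small sketch. The four variables and their simulated values below are hypothetical stand-ins, not NHANES measurements, and plain Pearson correlation is assumed because the caption does not name the correlation measure.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(data, threshold=0.2):
    """Edges (a, b, r) for every pair whose |r| exceeds the threshold."""
    edges = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) > threshold:
            edges.append((a, b, r))
    return edges


def degree(edges):
    """Number of edges per node (drives node size in the globe)."""
    counts = {}
    for a, b, _ in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Hypothetical exposures sharing one latent source (for example, diet).
rng = random.Random(0)
n = 1000
shared = [rng.gauss(0, 1) for _ in range(n)]
data = {
    "nutrient_a": [s + rng.gauss(0, 1) for s in shared],
    "nutrient_b": [s + rng.gauss(0, 1) for s in shared],
    "pollutant_c": [-s + rng.gauss(0, 1) for s in shared],
    "unrelated_d": [rng.gauss(0, 1) for _ in range(n)],
}

edges = correlation_edges(data)
for a, b, r in edges:
    sign = "positive" if r > 0 else "negative"
    print(f"{a} -- {b}: r = {r:.2f} ({sign})")
print(degree(edges))
```

In this toy network each of the three linked variables is connected to the other two, so intervening on any one of them in practice would be hard to separate from the others, which is the difficulty the globes are meant to convey.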
Opinion Viewpoint
•data mining and informatics to tackle complexity
what causes what?
confounding
•new ‘omics technologies
73. Studying the Elusive Environment in Large Scale
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
•longitudinal publicly available data
How can we proceed to study the elusive environment in
large scale for discovery-based research?
•data mining and informatics to tackle complexity
what causes what?
confounding
•new ‘omics technologies
EWAS
74. Studying the Elusive Environment in Large Scale
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
•longitudinal publicly available data
How can we proceed to study the elusive environment in
large scale for discovery-based research?
•data mining and informatics to tackle complexity
what causes what?
confounding
•new ‘omics technologies
EWAS
75. with Paul Avillach, Michael McDuffie, Jeremy Easton-Marks,
Cartik Saravanamuthu and the BD2K PIC-SURE team
40K participants
>1000 indicators of exposure
Data and API available now
http://nhanes.hms.harvard.edu
Download all the data:
NHANES exposome browser