Intro to Biomedical Informatics 701

Bioinformatics for discovery:
Introduction to GWAS and EWAS
BMI 701: Introduction to Biomedical Informatics

12/1/2015
chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
Chirag J Patel

P = G + E
Type 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Nutrients

Pollutants

Drugs
Complex traits are a function of genes and
environment...

We are great at G investigation!
over 2000

Genome-wide Association Studies (GWAS)

https://www.ebi.ac.uk/gwas/
G

>2,000 traits/diseases

>15,000 SNPs

>16,000 SNP-trait associations
https://www.ebi.ac.uk/gwas/

Dissecting G in P:
What is a Genome-wide Association Study?
Hypothesis-free “search engine” for genetic variants

associated with a complex trait or disease

in unrelated populations
[Figure: the same 2x2 table (alleles A vs. a, by diseased vs. non-diseased) is tested for each SNP in turn, from SNP(A)/SNP(a) through SNP(Z)/SNP(z), genome-wide]

A new paradigm of GWAS for discovery of G in P:
Human Genome Project to GWAS
Sequencing of the genome
2001
HapMap project:
http://hapmap.ncbi.nlm.nih.gov/
Characterize common variation
2001-current day
High-throughput variant
assay
< $99 for ~1M variants
Measurement tools
~2003 (ongoing)
ARTICLES
Genome-wide association study of 14,000
cases of seven common diseases and
3,000 shared controls
The Wellcome Trust Case Control Consortium*
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the
identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip
500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major
diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at
P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1
diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these
signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found
compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a [...]
Vol 447|7 June 2007|doi:10.1038/nature05911
Nature, 2007
Comprehensive, high-throughput analyses
GWAS

Number of raw publications with subject of “GWAS”
[Figure: bar chart of the number of publications per year, 1994-2013 (y-axis: 0 to 3,000); PubMed MeSH terms: human + GWAS. The annotations on the timeline follow.]
Risch + Merikangas
linkage vs. association
human genome sequenced
GWAS
age-related macular degeneration
mega-meta-GWAS
WTCCC
GWAS is relevant today (even with NGS around the corner)

Geneticists have made substantial progress in identifying the genetic basis of many human diseases, at least those with conspicuous determinants. These successes include Huntington's disease, Alzheimer's disease, and some forms of breast cancer. However, the detection of genetic factors for complex diseases, such as schizophrenia, bipolar disorder, and diabetes, has been far more complicated. There have been numerous reports of genes or loci that might underlie these disorders, but few of these findings have been replicated. The modest nature of the gene effects for these disorders likely explains the contradictory and inconclusive claims about their identification. Despite the small effects of such genes, the magnitude of their attributable risk (the proportion of people affected due to them) may be large because they are quite frequent in the population, making them of public health significance.

Has the genetic study of complex disorders reached its limits? The persistent lack of replicability of these reports of linkage between various loci and complex diseases might imply that it has. We argue below that [...]

[...]age analysis we have chosen for this argument is a popular current paradigm in which pairs of siblings, both with the disease, are examined for sharing of alleles at multiple sites in the genome defined by genetic markers. The more often the affected siblings share the same allele at a particular site, the more likely the site is close to the disease gene. Using the formulas in (1), we calculate the expected proportion Y of alleles shared by a pair of affected siblings for the best possible case, that is, a closely linked marker locus (recombination fraction θ = 0) that is fully informative (heterozygosity = 1) (2), as

$$Y = \frac{1 + w}{2 + w}, \quad \text{where } w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2}$$

If there is no linkage of a marker at a particular site to the disease, the siblings would be expected to share alleles 50% of the time; that is, Y would equal 0.5. Values of Y for various values of p and γ are given in the third column of the table. For an allele of moderate frequency (p is 0.1 to 0.5) that con[...]

[The right-hand column of the excerpt is cropped in the slide. The visible fragments concern the low power of linkage analysis for loci of modest effect and introduce an association test, the transmission/disequilibrium test.]

The Future of Genetic Studies of
Complex Human Diseases
Neil Risch and Kathleen Merikangas
Science, 1996
A new paradigm is needed for discovery!

Single nucleotide polymorphisms (SNPs):
How many SNPs are in the human genome?
>3,000,000,000 bases in human genome
SNPs appear about once every ~1,000 bases
~3,000,000 SNPs
40-60% have minor allele frequency <5%

GWAS focus on frequency >5%
HapMap Consortium, 2010

Can’t measure everything:
Tag SNPs and Linkage Disequilibrium (LD)
LD = co-occurrence of SNPs in a contiguous region
Bush and Moore, 2012
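
One common way to put a number on LD between two biallelic SNPs is the squared correlation (r²) of their alleles across haplotypes. The sketch below is a minimal illustration, assuming phased haplotypes coded 0/1; the function name `ld_r2` and the toy haplotypes are made up for this example and are not from the lecture.

```python
def ld_r2(snp1, snp2):
    """Squared correlation (r^2) between two SNPs across phased haplotypes.

    snp1, snp2: equal-length lists of 0/1 allele codes, one entry per haplotype.
    """
    n = len(snp1)
    p1 = sum(snp1) / n                                  # frequency of allele "1" at SNP 1
    p2 = sum(snp2) / n                                  # frequency of allele "1" at SNP 2
    p12 = sum(a * b for a, b in zip(snp1, snp2)) / n    # frequency of the 1-1 haplotype
    d = p12 - p1 * p2                                   # disequilibrium coefficient D
    return d * d / (p1 * (1 - p1) * p2 * (1 - p2))


# Hypothetical toy data: 8 haplotypes typed at three SNPs
tag = [1, 1, 1, 1, 0, 0, 0, 0]
linked = [1, 1, 1, 0, 0, 0, 0, 0]      # mostly travels with the tag SNP
unlinked = [1, 0, 1, 0, 1, 0, 1, 0]    # independent of the tag SNP

r2_linked = ld_r2(tag, linked)         # high r^2: the tag SNP is a good proxy
r2_unlinked = ld_r2(tag, unlinked)     # r^2 of zero: the tag SNP carries no information
```

A tag SNP is useful only for the SNPs it is strongly correlated with, which is why array content is chosen per LD block.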

The phenomenon of LD makes GWAS possible:
How and why?: Indirect association
[Excerpt from Bush and Moore, 2012; the surrounding article text is cropped in the slide]
Figure 3. Indirect Association. Genotyped SNPs often lie in a region of high linka[...] will be statistically associated with disease as a surrogate for the disease SNP throu[...]
doi:10.1371/journal.pcbi.1002822.g003
Bush and Moore, 2012
LD blocks

Can’t measure everything:
Tag SNPs and Linkage Disequilibrium
Tag SNPs are common proxies for other SNPs

500K - 1M per chip
[...]tified significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model. [...]
[Figure: panels a-d plot –log10[P] for the SNPs genotyped across each associated region, with some SNPs marked by asterisks; loci labelled SLC30A8, IDE, EXT2 and ALX4]
LD block
2 alleles are correlated because they are inherited
together
Sladek et al, 2007

image: www.lifa-core.de/
Digitizing SNPs:
e.g., Illumina Infinium Array
image: illumina.com

Assessing Thousands of Factors Simultaneously:
Data-driven search for differences in SNP frequencies
~100,000 - ~1,000,000 association tests
disease cases
healthy controls
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
disease cases
healthy controls

Associating One SNP with Disease
Case-Control Study Design
SNP (A/a) --?--> Disease

                          A    a
diseased (cases)
non-diseased (controls)

What is an “Odds Ratio”?
SNP (A/a) --?--> Disease

                          A    a
diseased (cases)          c    d
non-diseased (controls)   x    y

Chi-squared test
Odds Ratio a vs A:
Odds of disease with allele a
vs.
Odds of disease with allele A
1: equal odds (no difference)

>1: increased odds (increased risk)

<1: decreased odds (decreased risk)

Calculating the Odds Ratio
SNP (A/a) --?--> Disease

                          A    a
diseased (cases)          c    d
non-diseased (controls)   x    y

Chi-squared test

Odds Ratio a vs A:

$$\mathrm{OR} = \frac{\text{odds with allele } a}{\text{odds with allele } A} = \frac{[d/(d+y)]\,/\,[y/(d+y)]}{[c/(c+x)]\,/\,[x/(c+x)]} = \frac{d/y}{c/x} = \frac{dx}{cy}$$
How would you interpret an OR of 2?
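
As a quick check on the formula, the odds ratio can be computed directly from the four cells of the 2x2 table. This is a minimal sketch using the slide's cell labels (c, d, x, y); the counts are hypothetical and chosen only so that the result is easy to read.

```python
def odds_ratio(c, d, x, y):
    """Odds ratio for allele a vs. allele A from the 2x2 table.

                     A    a
    diseased         c    d
    non-diseased     x    y
    """
    odds_a = d / y            # odds of disease among copies of allele a
    odds_A = c / x            # odds of disease among copies of allele A
    return odds_a / odds_A    # algebraically equal to (d * x) / (c * y)


# Hypothetical counts: allele a is seen 200 times in cases and 100 times in controls;
# allele A is seen 300 times in cases and 300 times in controls.
example_or = odds_ratio(c=300, d=200, x=300, y=100)
```

With these counts the odds of disease are 2 to 1 for allele a and 1 to 1 for allele A, so the odds of disease with allele a are twice the odds with allele A.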

Cohort Study Design
SNP (A/a) --?--> Disease
vs.
SNP (A/a) --?--> Non-diseased
•Direct measure of risk vs. odds ratio

•Need to wait!
•If incidence is low, N needs to be large!
Cox survival regression

Relative Risk
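
In a cohort, risk can be measured directly, so the relative risk and the odds ratio can be compared on the same data. The sketch below assumes a hypothetical cohort of 1,000 carriers and 1,000 non-carriers of a risk allele; all counts and function names are illustrative.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease risk (incidence proportion) in exposed vs. unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def cohort_odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease odds in exposed vs. unexposed, from the same cohort counts."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Hypothetical cohort: 20 of 1,000 carriers and 10 of 1,000 non-carriers develop disease
rr = relative_risk(20, 1000, 10, 1000)
cohort_or = cohort_odds_ratio(20, 1000, 10, 1000)
```

When the disease is rare, as in this made-up cohort, the odds ratio is close to the relative risk; the two diverge as incidence rises.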

Models to associate genotypes with disease
Examples for a case-control study
Disease (ND=4): AA, AA, Aa, Aa
Non-diseased (NC=4): aa, aa, Aa, Aa

                A    a
diseased        6    2
non-diseased    2    6

OR A (vs a)

OR a (vs A)
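
The allelic test on this small table can be worked through by hand. The sketch below computes both odds ratios and a Pearson chi-squared statistic (1 degree of freedom, no continuity correction) for the slide's allele counts; the function names are illustrative, and with only 16 alleles the p-value is shown purely to demonstrate the arithmetic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_1df_p_value(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Allele counts from the slide: diseased 6 A / 2 a, non-diseased 2 A / 6 a
or_A_vs_a = (6 / 2) / (2 / 6)    # odds of disease with A relative to a
or_a_vs_A = (2 / 6) / (6 / 2)    # the reciprocal
chi2 = chi_square_2x2(6, 2, 2, 6)
p_value = chi_square_1df_p_value(chi2)
```

The two odds ratios are reciprocals of each other (9 and 1/9), so the choice of reference allele changes the direction of the estimate but not the test.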

Genotypic Test (“2 or 1 df test”)
Diseased (ND=4): AA, AA, Aa, Aa
Non-diseased (NC=4): aa, aa, Aa, Aa

                AA   Aa   aa
diseased         2    2    0
non-diseased     0    2    2

OR AA (vs. Aa)

OR aa (vs. Aa)
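
The allelic and genotypic tables are two tallies of the same eight individuals. A minimal sketch of both tallies, assuming the case and control genotypes implied by the slide's counts (the helper names are illustrative):

```python
from collections import Counter

# Genotypes consistent with the slide's counts (ND = 4 diseased, NC = 4 non-diseased)
diseased = ["AA", "AA", "Aa", "Aa"]
non_diseased = ["aa", "aa", "Aa", "Aa"]


def genotype_counts(genotypes):
    """Counts of AA, Aa and aa individuals: one row of the genotypic table."""
    counts = Counter(genotypes)
    return [counts[g] for g in ("AA", "Aa", "aa")]


def allele_counts(genotypes):
    """Counts of A and a alleles: one row of the allelic table (two alleles per person)."""
    alleles = Counter("".join(genotypes))
    return [alleles["A"], alleles["a"]]


genotypic_table = [genotype_counts(diseased), genotype_counts(non_diseased)]
allelic_table = [allele_counts(diseased), allele_counts(non_diseased)]
```

The genotypic table keeps heterozygotes as their own column, which is what allows a 2 df test to detect non-additive effects that the allelic table collapses.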

Associating One SNP with Quantitative Trait
(e.g., height, weight, cholesterol)
[Figure: two plots of a quantitative trait (height) by genotype: SNP rs1234 (GG, GC, CC) and SNP rs123456 (CC, CT, TT)]

Associating One SNP with Quantitative Trait
Linear Regression and Additive Risk Model
y = ɑ + βx + ε
[Figure: height plotted against genotype at SNP rs123456, coded CC (0), CT (1), TT (2), with a fitted line of intercept ɑ and slope β]
height = ɑ + βx
x = 0 if individual is CC
x = 1 if individual is CT
x = 2 if individual is TT
T = risk allele
ɑ: expected height with 0 risk alleles (CC)
β: change in height for each additional risk allele
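
The additive model is an ordinary least-squares line through the trait values, with genotype coded as the number of risk alleles. A minimal sketch with made-up heights (the data and the function name `fit_additive_model` are illustrative assumptions):

```python
def fit_additive_model(risk_allele_counts, trait):
    """Ordinary least squares for trait = alpha + beta * x, x = number of risk alleles."""
    n = len(risk_allele_counts)
    mean_x = sum(risk_allele_counts) / n
    mean_y = sum(trait) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(risk_allele_counts, trait))
    sxx = sum((x - mean_x) ** 2 for x in risk_allele_counts)
    beta = sxy / sxx                   # change in trait per additional risk allele
    alpha = mean_y - beta * mean_x     # expected trait value with 0 risk alleles
    return alpha, beta


# Hypothetical data: genotypes CC, CC, CT, CT, TT, TT coded as counts of the T allele
x = [0, 0, 1, 1, 2, 2]
height = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0]

alpha, beta = fit_additive_model(x, height)
```

In a real GWAS the same regression is run once per SNP, usually with covariates such as age, sex and ancestry added to the model.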

Prototypical “Manhattan plot” to visualize
associations
Science, 2007
~100,000 - ~1,000,000 association tests
[Figure: Manhattan plot of −log10(P) (y-axis, 0 to 15) against chromosome (x-axis, chromosomes 1 to 22 and X); source page: NATURE|Vol 447|7 June 2007. The surrounding article text and a second panel are cropped in the slide.]

[Excerpt from the WTCCC paper; the surrounding article text is cropped in the slide]
[Figure: seven Manhattan plots, one per disease, of −log10(P) (y-axis, 0 to 15) against chromosome (x-axis, chromosomes 1 to 22 and X). Panels: Bipolar disorder, Coronary artery disease, Crohn’s disease, Hypertension, Rheumatoid arthritis, Type 1 diabetes, Type 2 diabetes]
Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases −log10 of the trend test P value for quality-control-positive SNPs, excluding [...] Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10^-5 highlighted in green. All panels are truncated at [...]

Type I Error:
False Positives!
What is a p-value?
The probability of observing a result at least as extreme as the one obtained if there is no true difference (H0)
Many tests: some will be significant (low p-value) by chance alone!
100 tests at a significance level of 0.05...
how many would be significant by chance?
Bonferroni “correction”:

Divide the 0.05 significance level by the number of tests
e.g., 1M SNPs: 0.05 / (1 × 10^6) = 5 × 10^-8
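
The arithmetic behind the multiple-testing problem and the Bonferroni threshold is short enough to write out. A minimal sketch, assuming independent tests and a family-wise significance level of 0.05 (the helper name `is_significant` is illustrative):

```python
alpha = 0.05

# Expected number of false positives when every null hypothesis is true
expected_false_positives_100_tests = 100 * alpha

# Bonferroni threshold for a GWAS with one million SNPs
n_snps = 1_000_000
bonferroni_threshold = alpha / n_snps


def is_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes the Bonferroni-corrected threshold alpha / n_tests."""
    return p_value < alpha / n_tests


borderline = is_significant(1e-7, n_snps)   # small, but not genome-wide significant
strong = is_significant(1e-9, n_snps)       # passes the corrected threshold
```

Bonferroni is conservative when tests are correlated, as SNPs in LD are, which is one reason 5 × 10^-8 is used as a convention rather than recomputed for each array.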

QQplot:
Distribution of observed p-values vs. H0 p-values
[Figure: histogram of runif(10000), a random uniform distribution; x-axis: 0.0 to 1.0, y-axis: frequency. Caption: p-values under H0]
[Figure: histogram of gwas$P.value; x-axis: 0.0 to 1.0, y-axis: frequency. Caption: p-values of GWAS in Total Cholesterol; Global Lipids Consortium, 2012]

QQplot:
Distribution of observed p-values vs. H0 p-values
[Figure: histogram of gwas$P.value; x-axis: 0.0 to 1.0, y-axis: frequency. Caption: p-values of GWAS in Total Cholesterol]
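
A QQ plot pairs each observed p-value, after sorting, with the p-value expected at that rank under H0, where p-values are uniform on (0, 1). The sketch below builds those pairs on the −log10 scale; it is a Python stand-in for the slide's `runif(10000)`, using simulated null p-values rather than real GWAS results, and `qq_points` is an illustrative name.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p), sorted from smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under H0 the i-th smallest of n uniform p-values is expected near (i - 0.5) / n
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


random.seed(0)
# Simulated null p-values in (0, 1]; 1 - random() avoids taking log10 of zero
null_p_values = [1.0 - random.random() for _ in range(10000)]
points = qq_points(null_p_values)
```

For null data the points fall along the diagonal; a GWAS with true associations departs from the diagonal in the upper tail, while a departure along the whole line suggests systematic bias such as population stratification.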

Which diseases show evidence of association?
Examining the QQplot of test statistics in WTCCC
[...]sent study cannot provide conclusive exclusion of any given gene. This is the consequence of several factors including: less-than-complete coverage of common variation genome-wide on the Affymetrix chip; poor coverage (by design) of rare variants, including many structural variants (thereby reducing power to detect rare, penetrant, alleles) (ref. 25); difficulties with defining the full genomic extent of the gene of interest; and, despite the sample size, relatively low power to detect, at levels of [...]
[...] already allow us, for selected diseases, to highlight pathways and mechanisms of particular interest. Naturally, extensive resequencing and fine-mapping work, followed by functional studies will be required before such inferences can be translated into robust statements about the molecular and physiological mechanisms involved.
We turn now to a discussion of the main findings for each disease, focusing here only on the most significant and interesting results [...]
[Figure: seven quantile-quantile plots of observed test statistic (y-axis) against expected chi-squared value (x-axis), both 0 to 30. Panels: BD, CAD, CD, HT, RA, T1D, T2D]
Figure 3 | Quantile-quantile plots for seven genome-wide scans. For each of the seven disease collections, a quantile-quantile plot of the results of the trend test is shown in black for all SNPs that pass the standard project filters, have a minor allele frequency > 1% and missing data rate < 1%. SNPs that [...] 360,000 SNPs. SNPs at which the test statistic exceeds 30 are represented by triangles. Additional quantile-quantile plots, which also exclude all SNPs located in the regions of association listed in Table 3, are superimposed in blue (for BD, the exclusion of these SNPs has no visible effect on the plot, and [...]

Observational associations do not equal causation...

Ice Cream <-> Drowning
Confounding bias
What is a confounder?
Summer!
?
A confounder is correlated with both the “risk” factor and the disease,
leading to invalid inference.

A common source of bias in observational studies (e.g., case-control,
cohort, etc.)

Population Stratification:
A source of possible confounding in GWAS
SNP --?--> Disease
race/ethnicity (possible confounder)
Ancestry is correlated with both allele frequency and disease

GWAS are done on specific populations separately.

(most have been done in populations of European ancestry)
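
A small numerical example shows how pooling two populations can create an association that exists in neither. The allele counts below are invented so that the odds ratio is exactly 1 within each population, while allele frequency and the case fraction both differ between them; `allelic_or` is an illustrative helper.

```python
def allelic_or(case_a, case_A, ctrl_a, ctrl_A):
    """Odds ratio for allele a vs. A from allele counts in cases and controls."""
    return (case_a / case_A) / (ctrl_a / ctrl_A)


# Hypothetical allele counts. Population 1: allele a is common and cases are over-sampled.
pop1 = dict(case_a=80, case_A=20, ctrl_a=40, ctrl_A=10)
# Population 2: allele a is rare and cases are under-sampled.
pop2 = dict(case_a=10, case_A=40, ctrl_a=20, ctrl_A=80)

# Analysing everyone together, ignoring ancestry
pooled = {key: pop1[key] + pop2[key] for key in pop1}

or_pop1 = allelic_or(**pop1)        # no association within population 1
or_pop2 = allelic_or(**pop2)        # no association within population 2
or_pooled = allelic_or(**pooled)    # spurious association from pooling
```

Here the pooled odds ratio is 2.25 even though the allele has no effect in either group, which is why GWAS analyse ancestry groups separately or adjust for ancestry.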

Mediation
SNPs indicative of a mediator factor?
Example: FTO and Type 2 Diabetes
FTO --> Body Mass --?--> Diabetes
Association between FTO and Type 2 Diabetes via BMI?
... or does FTO have an independent role in Type 2 Diabetes...?

PLINK:
(Standard) Whole Genome Analysis Software

http://pngu.mgh.harvard.edu/~purcell/plink/
•cited >9000 times since 2007

•allele frequency

•linkage disequilibrium (LD)

•data manipulation/filtering

•association: allelic, genotypic models

•chi-square

•logistic

•linear

Examples:

GWASs in Type 2 Diabetes

Type 2 Diabetes Mellitus:
A complex, multifactorial disease
•Insulin production vs. use

•beta-cell function

•insulin sensitivity (BMI)

•Moves glucose from blood into
cells

•Complications arise due to
glucose in blood, hyperglycemia
•diagnosed by blood glucose
levels

CDC,
family history: 25%
body weight, diet, lifestyle, age

ARTICLES
A genome-wide association study
identifies novel risk loci for type 2 diabetes
Robert Sladek, Ghislain Rocheleau*, Johan Rung*, Christian Dina*, Lishuang Shen, David Serre, Philippe Boutin, Daniel Vincent, Alexandre Belisle, Samy Hadjadj, Beverley Balkau, Barbara Heude, Guillaume Charpentier, Thomas J. Hudson, Alexandre Montpetit, Alexey V. Pshezhetsky, Marc Prentki, Barry I. Posner, David J. Balding, David Meyre, Constantin Polychronakos & Philippe Froguel
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of which were hitherto unknown. A systematic search for these variants was recently made possible by the development of high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935 single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2 gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.

The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is thought to be due to environmental factors, such as increased availability of food and decreased opportunity and motivation for physical activity, acting on genetically susceptible individuals. The heritability of T2DM is one of the best established among common diseases and, consequently, genetic risk factors for T2DM have been the subject of intense research (ref. 1). Although the genetic causes of many monogenic forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial and other syndromic types of diabetes mellitus) have been elucidated, few variants leading to common T2DM have been clearly identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25) of developing T2DM (ref. 1). Linkage studies have reported many T2DM-linked chromosomal regions and have identified putative, causative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs 4, 5) and ACDC (also called ADIPOQ) (ref. 6). In parallel, candidate-gene studies have reported many T2DM-associated loci, with coding variants in the nuclear receptor PPARG (P12A) (ref. 7) and the potassium channel KCNJ11 (E23K) (ref. 8) being among the very few that have been convincingly replicated. The strongest known (odds ratio ≈ 1.7) T2DM association (ref. 9) was recently mapped to the transcription factor TCF7L2 and has been consistently replicated in multiple populations (refs 10–20).

Subjects and study design
The recent availability of high-density genotyping arrays, which combine the power of association studies with the systematic nature of a genome-wide search, led us to undertake a two-stage, genome-wide association study to identify additional T2DM susceptibility loci (Supplementary Fig. 1). In the first stage of this study, we obtained genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in 1,363 T2DM cases and controls (Supplementary Table 1). In order to enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were selected to have at least one affected first degree relative and age at onset under 45 yr (excluding patients with maturity onset diabetes in the young). Furthermore, in order to decrease phenotypic heterogeneity and to enrich for variants determining insulin resistance and β-cell dysfunction through mechanisms other than severe obesity, we initially studied diabetic patients with a body mass index (BMI) < 30 kg m^-2. Control subjects were selected to have fasting blood glucose < 5.7 mmol l^-1 in DESIR, a large prospective cohort for the study of insulin resistance in French subjects (ref. 22).

Genotypes for each study subject were obtained using two platforms: Illumina Infinium Human1 BeadArrays, which assay 109,365 SNPs chosen using a gene-centred design; and Human Hap300 BeadArrays, which assay 317,503 SNPs chosen to tag haplotype blocks identified by the Phase I HapMap (ref. 23). Of the 409,927 markers that passed quality control (Supplementary Tables 2 and 3), genotypes were obtained for an average of 99.2% (Human1) and 99.4% (Hap300) of markers for each subject with a reproducibility of > 99.9% (both platforms). Forty-three subjects were removed from analysis because of evidence of intercontinental admixture (Supplementary Fig. 3) and an additional four because their genotype-determined gender disagreed with clinical records. In total, T2DM association was tested for 100,764 (Human1) and 309,163 (Hap300) SNPs representing 392,935 unique loci (Fig. 1). Because of unequal male/female ratios in our cases and controls, we analysed the 12,666 sex-chromosome SNPs separately for each gender.
*These authors contributed equally to this work.
Nature, 2/2007
Genome-Wide Association Analysis
Identifies Loci for Type 2 Diabetes
and Triglyceride Levels
Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University,
and Novartis Institutes for BioMedical Research*†
New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into
disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464
patients with T2D and 1467 matched controls, each characterized for measures of glucose
metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D),
we identified and confirmed three loci associated with T2D—in a noncoding region near CDKN2A
and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1—and replicated associations near
HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and
confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum
triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions
illustrates the ability of genome-wide association studies to provide potentially important clues to
the pathogenesis of common diseases.
Type 2 diabetes, obesity, and cardiovascular risk factors are caused by a combination of genetic susceptibility, environment, behavior, and chance. Whole-genome association studies (WGAS) offer a new approach to gene discovery unbiased with regard to presumed functions or locations of causal variants. This approach is based on Fisher’s theory for additive effects at common alleles (1); human heterozy[...] to purifying selection, and has been made possible by genomic advances such as the human genome sequence, SNP and HapMap databases, and genotyping arrays (3).

We studied 1464 patients with T2D and 1467 controls from Finland and Sweden, each characterized for 18 clinical traits: anthropometric measures, glucose tolerance and insulin secretion, lipids and apolipoproteins, and blood [...] applying stringent quality-control filters, high-quality genotypes for 386,731 common SNPs were obtained (4). To extend the set of putative causal alleles tested for association, we developed 284,968 additional multimarker (haplotype) tests based on these SNP genotypes (5, 6). The 671,699 allelic tests capture (correlation coefficient r² ≥ 0.8) 78% of common SNPs in HapMap CEU (3).

Each SNP and haplotype test was assessed for association to T2D and each of 18 traits with the software package PLINK (http://pngu.mgh.harvard.edu/purcell/plink/). For T2D, a weighted meta-analysis was used to combine results for the population-based and family-based subsamples (4). For quantitative traits, multivariable linear or logistic regression with or without covariates was performed (4). Association results for each SNP, haplotype test, and phenotype are available (www.broad.mit.edu/diabetes/).

In genome-wide analysis involving hundreds of thousands of statistical tests, modest levels of bias imposed on the null distribution can overwhelm a small number of true results. We used three strategies to search for evidence of systematic bias from unrecognized population structure, the analytical approach, and genotyping artifacts (7, 8). First, we examined the distribution of P-values in the population-based sample, observing a close match to that expected for a null distribution (genomic inflation factor λGC = 1.05 for T2D). Second, we calculated [...]
Genetics, St Michael’s Hospital, Southwell Street, Bristol,
BS2 8EG, UK. 14
Institute of Human Genetics, International
Centre for Life, Central Parkway, Newcastle upon Tyne, NE1
3BZ, UK. 15
Institute of Medical Genetics, University
Hospital of Wales, Heath Park, Cardiff, CF14 4XW, UK.
16
Department of Clinical Genetics, Alder Hey Children’s
Hospital, Eaton Road, Liverpool L12 2AP, UK. 17
Clinical
Genetics Centre, Argyll House, Foresterhill, Aberdeen,
AB25 2ZR, UK. 18
Clinical Genetics, 7th Floor New Guy’s
House, Guy’s
UK. 19
Clinical
Belvoir Park H
20
Clinical and
Health, 30 G
21
Department
Trust, Box 13
22
Department
of Chester Ho
23
Department
Road, Headin
Supporting
www.sciencema
Materials and
Figs. S1 to S8
Tables S1 to S
References
9 March 2007
Published onli
10.1126/scien
Include this in
A Genome-Wide Association Study of
Type 2 Diabetes in Finns Detects
Multiple Susceptibility Variants
Laura J. Scott,1 Karen L. Mohlke,2 Lori L. Bonnycastle,3 Cristen J. Willer,1 Yun Li,1
William L. Duren,1 Michael R. Erdos,3 Heather M. Stringham,1 Peter S. Chines,3
Anne U. Jackson,1 Ludmila Prokunina-Olsson,3 Chia-Jen Ding,1 Amy J. Swift,3
Narisu Narisu,3 Tianle Hu,1 Randall Pruim,4 Rui Xiao,1 Xiao-Yi Li,1 Karen N. Conneely,1
Nancy L. Riebow,3 Andrew G. Sprau,3 Maurine Tong,3 Peggy P. White,1 Kurt N. Hetrick,5
Michael W. Barnhart,5 Craig W. Bark,5 Janet L. Goldstein,5 Lee Watkins,5 Fang Xiang,1
Jouko Saramies,6 Thomas A. Buchanan,7 Richard M. Watanabe,8,9 Timo T. Valle,10
Leena Kinnunen,10,11 Gonçalo R. Abecasis,1 Elizabeth W. Pugh,5 Kimberly F. Doheny,5
Richard N. Bergman,9 Jaakko Tuomilehto,10,11,12 Francis S. Collins,3* Michael Boehnke1*
Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has
been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161
Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000
single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million
autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants
that predispose to T2D, compared our T2D association results with the results of two similar studies,
and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls.
We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute
to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the
ria (8). We
ciation with
the log-odd
(8). We ob
versus 31.6
P values <
against the
with a large
consistent w
SNPs that
also sugges
trols by birt
successful;
genomic co
Analysi
allowed us
variation in
portion, w
(8, 13) that
equilibrium
Centre d’E
(Utah resid
1
Department
Genetics, Uni
USA. 2
Depar
Science, 6/2007
Study design: Richa Saxena1–6 and Valeriya Lyssenko7 (Team Leaders), Peter Almgren,7 Paul I. W. de Bakker,1–6 Noël P. Burtt,1 Jose C. Florez,1–6 Hong Chen,8 Joanne Meyer,8 Joel N. Hirschhorn,1,6,9–11 Mark J. Daly,1–3,5 Thomas E. Hughes,8 Leif Groop,7,12 David Altshuler1–6 (Chair)
Clinical characterization and phenotypes: Valeriya Lyssenko7 and Richa Saxena1–6 (Team Leaders), Peter Almgren,7 Kristin Ardlie,1 Kristina Bengtsson Boström,13 Noël P. Burtt,1 Hong Chen,8 Jose C. Florez,1–6 Bo Isomaa,14,15 Sekar Kathiresan,1,3,5 Guillaume Lettre,1,6,9–11 Ulf Lindblad,16 Helen N. Lyon,1,6,9–11 Olle Melander,7 Christopher Newton-Cheh,1–3,5 Peter Nilsson,17 Marju Orho-Melander,7 Lennart Råstam,16 Elizabeth K. Speliotes,1,3,6,9–11 Marja-Riitta Taskinen,12 Tiinamaija Tuomi,12,15 Benjamin F. Voight,1–3,5 David Altshuler,1–6 Joel N. Hirschhorn,1,6,9–11 Thomas E. Hughes,8 Leif Groop7,12 (Chair)
DNA sample QC and diabetes replication genotyping: Candace Guiducci1 and Valeriya Lyssenko7 (Team Leaders), Anna Berglund,7 Joyce Carlson,18 Lauren Gianniny,1 Rachel Hackett,1 Liselotte Hall,18 Johan Holmkvist,7 Esa Laurila,7 Marju Orho-Melander,7 Marketa Sjögren,7 Maria Sterner,18 Aarti Surti,1 Margareta Svensson,7 Malin Svensson,7 Ryan Tewhey,1 Noël P. Burtt1 (Chair)
Whole genome scan genotyping: Brendan Blumenstiel1 (Team Leader), Melissa Parkin,1 Matthew DeFelice,1 Candace Guiducci,1 Ryan Tewhey,1 Rachel Barry,1 Wendy Brodeur,1 Noël P. Burtt,1 Jody Camarata,1 Nancy Chia,1 Mary Fava,1 John Gibbons,1 Bob Handsaker,1 Claire Healy,1 Kieu Nguyen,1 Casey Gates,1 Carrie Sougnez,1 Diane Gage,1 Marcia Nizzari,1 David Altshuler,1–6 Stacey B. Gabriel1 (Chair)
GCKR replication genotyping and analysis (Malmö Diet and Cancer Study): Sekar Kathiresan1,3,5 (Team Leader), Candace Guiducci,1 Aarti Surti,1 Noël P. Burtt,1 Olle Melander,7 Marju Orho-Melander7 (Chair)
Statistical analysis: Benjamin F. Voight1–3,5 and Paul I. W. de Bakker1–6 (Team Leaders), Richa Saxena,1–6 Valeriya Lyssenko,7 Peter Almgren,7 Noël P. Burtt,1 Hong Chen,8 Gung-Wei Chirn,8 Qicheng Ma,8 Hemang Parikh,7 Delwood Richardson,8 Darrell Ricke,8 Jeffrey J. Roix,8 Leif Groop,7,12 Shaun Purcell,1,2 David Altshuler,1–6 Mark J. Daly1–3,5 (Chair)
1 Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA.
2 Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA.
3 Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA.
4 Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA.
5 Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
6 Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
7 Department of Clinical Sciences, Diabetes and Endocrinology Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden.
8 Diabetes and Metabolism Disease Area, Novartis Institutes for BioMedical Research, 100 Technology Square, Cambridge, MA 02139, USA.
9 Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA.
10 Division of Endocrinology, Children’s Hospital, Boston, MA 02115, USA.
11 Division of Genetics, Children’s Hospital, Boston, MA 02115, USA.
12 Department of Medicine, Helsinki University Hospital, University of Helsinki, Helsinki, Finland.
13 Skaraborg Institute, Skövde, Sweden.
14 Malmska Municipal Health Center and Hospital, Jakobstad, Finland.
15 Folkhälsan Research Center, Helsinki, Finland.
16 Department of Clinical Sciences, Community Medicine Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden.
17 Department of Clinical Sciences, Medicine Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden.
18 Clinical Chemistry, University Hospital Malmö, Lund University, Malmö, Sweden.
19 Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.
Supporting Online Material
www.sciencemag.org/cgi/content/full/1142358/DC1
Materials and Methods
Figs. S1 and S2
Tables S1 to S6
References
9 March 2007; accepted 20 April 2007
Published online 26 April 2007;
10.1126/science.1142358
Include this information when citing this paper.
Replication of Genome-Wide
Association Signals in UK Samples
Reveals Risk Loci for Type 2 Diabetes
Eleftheria Zeggini,1,2* Michael N. Weedon,3,4* Cecilia M. Lindgren,1,2* Timothy M. Frayling,3,4*
Katherine S. Elliott,2 Hana Lango,3,4 Nicholas J. Timpson,2,5 John R. B. Perry,3,4
Nigel W. Rayner,1,2 Rachel M. Freathy,3,4 Jeffrey C. Barrett,2 Beverley Shields,4
Andrew P. Morris,2 Sian Ellard,4,6 Christopher J. Groves,1 Lorna W. Harries,4
Jonathan L. Marchini,7 Katharine R. Owen,1 Beatrice Knight,4 Lon R. Cardon,2
Mark Walker,8 Graham A. Hitman,9 Andrew D. Morris,10 Alex S. F. Doney,10
The Wellcome Trust Case Control Consortium (WTCCC),† Mark I. McCarthy,1,2‡§ Andrew T. Hattersley3,4‡
The molecular mechanisms involved in the development of type 2 diabetes are poorly
understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938
population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect
replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls
and by integration of our findings with equivalent data from other international consortia. We
detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and
IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings
provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of
Here, we describe how integration of data
from the WTCCC scan and our own replication
studies with similar information generated by the
Diabetes Genetics Initiative (DGI) (6) and the
Finland–United States Investigation of NIDDM
Genetics (FUSION) (7) has identified several
additional susceptibility variants for T2D.
In the WTCCC study, analysis of 490,032
autosomal SNPs in 16,179 samples yielded
459,448 SNPs that passed initial quality control
(5). We considered only the 393,453 autosomal
SNPs with minor allele frequency (MAF) ex-
ceeding 1% in both cases and controls and no
extreme departure from Hardy-Weinberg equilibrium
(P < 10−4 in cases or controls) (8). This
T2D-specific data set shows no evidence of sub-
stantial confounding from population substruc-
ture and genotyping biases (8).
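The SNP filter described above (minor allele frequency above 1% and no extreme departure from Hardy-Weinberg equilibrium at P < 10−4) can be sketched as follows. This is a minimal illustration, not the WTCCC pipeline: the function names are hypothetical, the Hardy-Weinberg check uses a plain 1-df chi-square test rather than whatever exact test the study applied, and the genotype counts are made up.

```python
from statistics import NormalDist


def hwe_p_value(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # With 1 df, chi2 = z**2, so the upper-tail p-value is 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))


def minor_allele_frequency(n_aa, n_ab, n_bb):
    p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(p, 1 - p)


def passes_qc(counts, maf_min=0.01, hwe_p_min=1e-4):
    """Keep a SNP only if MAF exceeds maf_min and the HWE p-value is not below hwe_p_min."""
    return minor_allele_frequency(*counts) > maf_min and hwe_p_value(*counts) >= hwe_p_min


# Hypothetical genotype counts (AA, AB, BB) for illustration only.
balanced = (250, 500, 250)   # exactly in Hardy-Weinberg proportions
no_hets = (500, 0, 500)      # no heterozygotes at all: a typical genotyping artifact
keep_balanced = passes_qc(balanced)
keep_no_hets = passes_qc(no_hets)
```

Because the text applies the filter "in both cases and controls", a SNP would be kept only if `passes_qc` holds for the case counts and for the control counts separately. The normal-tail shortcut loses precision for extremely small p-values, which does not matter at a 10−4 cutoff.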
To distinguish true associations from those
reflecting fluctuations under the null or residual
errors arising from aberrant allele calling, we first
submitted putative signals from the WTCCC study
to additional quality control, including cluster-
plot visualization and validation genotyping on
REPORTS

ARTICLES
A genome-wide association study
identifies novel risk loci for type 2 diabetes
Robert Sladek1,2,4, Ghislain Rocheleau1*, Johan Rung4*, Christian Dina5*, Lishuang Shen1, David Serre1,
Philippe Boutin5, Daniel Vincent4, Alexandre Belisle4, Samy Hadjadj6, Beverley Balkau7, Barbara Heude7,
Guillaume Charpentier8, Thomas J. Hudson4,9, Alexandre Montpetit4, Alexey V. Pshezhetsky10, Marc Prentki10,11,
Barry I. Posner2,12, David J. Balding13, David Meyre5, Constantin Polychronakos1,3 & Philippe Froguel5,14
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of
which were hitherto unknown. A systematic search for these variants was recently made possible by the development of
high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935
single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype
frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified
four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2
gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in
insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell
development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk
and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.
The rapidly increasing prevalence of type 2 diabetes mellitus (T2DM) is
thought to be due to environmental factors, such as increased availabil-
ity of food and decreased opportunity and motivation for physical
activity, acting on genetically susceptible individuals. The heritability
of T2DM is one of the best established among common diseases and,
consequently, genetic risk factors for T2DM have been the subject of
intense research (ref. 1). Although the genetic causes of many monogenic
forms of diabetes (maturity onset diabetes in the young, neonatal mitochondrial
and other syndromic types of diabetes mellitus) have been
elucidated, few variants leading to common T2DM have been clearly
identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25)
of developing T2DM (ref. 1). Linkage studies have reported many
T2DM-linked chromosomal regions and have identified putative, cau-
sative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs
genotypes for 392,935 single-nucleotide polymorphisms (SNPs) in
1,363 T2DM cases and controls (Supplementary Table 1). In order to
enrich for risk alleles (ref. 21), the diabetic subjects studied in stage 1 were
selected to have at least one affected first degree relative and age at
onset under 45 yr (excluding patients with maturity onset diabetes in
the young). Furthermore, in order to decrease phenotypic heterogeneity
and to enrich for variants determining insulin resistance and
β-cell dysfunction through mechanisms other than severe obesity, we
initially studied diabetic patients with a body mass index (BMI)
<30 kg m−2. Control subjects were selected to have fasting blood
glucose <5.7 mmol l−1 in DESIR, a large prospective cohort for the
study of insulin resistance in French subjects (ref. 22).
Genotypes for each study subject were obtained using two plat-
Sladek, 2007
How many SNPs (p-value?)
European-based; N ~ 1000
cases: high fasting blood glucose/non-obese

controls: non-obese

Human Hap300 chip, showing no T2DM association in stage 1
(P > 0.01) and separated by at least 100 kb. Using the first principal
component as a covariate for ancestry differences between cases and
controls, we tested for association between rs932206 and disease
status. Our result suggests that this apparent association is largely
BMI on the association between marker and disease, as it is asymp-
totically equivalent to the Armitage trend test used to detect asso-
ciation in stages 1 and 2. None of the associations (Supplementary
Table 7) was substantially changed by considering the effects of these
covariates.
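The Armitage trend test mentioned in the excerpt above can be sketched in a few lines. This is a minimal illustration under stated assumptions: genotype counts are ordered by the number of copies of the tested allele (0, 1, 2), the additive weights (0, 1, 2) are used, the function name is hypothetical, and the example counts are invented rather than taken from the study.

```python
from statistics import NormalDist


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts ordered by allele dose."""
    r = sum(case_counts)                      # number of cases
    s = sum(control_counts)                   # number of controls
    n = r + s
    totals = [c + d for c, d in zip(case_counts, control_counts)]
    t_cases = sum(w * c for w, c in zip(weights, case_counts))
    t_all = sum(w * m for w, m in zip(weights, totals))
    t2_all = sum(w * w * m for w, m in zip(weights, totals))
    # chi2 = N (N*sum(t r) - R*sum(t n))^2 / (R S (N*sum(t^2 n) - (sum(t n))^2))
    # The denominator is zero for a monomorphic SNP; filter those out beforehand.
    chi2 = n * (n * t_cases - r * t_all) ** 2 / (r * s * (n * t2_all - t_all ** 2))
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))  # 1-df chi-square tail
    return chi2, p_value


# Invented counts: (0 copies, 1 copy, 2 copies) of the tested allele.
cases = (300, 500, 200)
controls = (400, 450, 150)
chi2_stat, trend_p = armitage_trend_test(cases, controls)
```

The trend test asks whether disease risk changes steadily with each extra copy of the allele, which is why it lines up with a logistic regression on allele dose when covariates such as BMI are added.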
[Figure 1 panels: one plot per chromosome (1–22 and X); y-axis ticks at 1, 3, 5, except chromosome 10 at 5, 10, 15.]
Figure 1 | Graphical summary of stage 1 association results. T2DM
association was determined for SNPs on the Human1 and Hap300 chips. The
x axis represents the chromosome position from pter; the y axis shows
−log10[pMAX], the P-value obtained by the MAX statistic, for each SNP
(note the different scale on the y axis of the chromosome 10 plot). SNPs that
passed the cutoff for a fast-tracked second stage are highlighted in red.
Sladek, 2007

Identification of four novel T2DM loci
Our fast-track stage 2 genotyping confirmed the reported association
for rs7903146 (TCF7L2) on chromosome 10, and in addition iden-
tified significant associations for seven SNPs representing four new
T2DM loci (Table 1). In all cases, the strongest association for the
MAX statistic (see Methods) was obtained with the additive model.
The most significant of these corresponds to rs13266634, a non-synonymous
SNP (R325W) in SLC30A8, located in a 33-kb linkage
disequilibrium block on chromosome 8, containing only the 3′ end
of this gene (Fig. 2a). SLC30A8 encodes a zinc transporter expressed
solely in the secretory vesicles of β-cells and is thus implicated in the
final stages of insulin biosynthesis, which involve co-crystallization
Table 1 | Confirmed association results
SNP | Chromosome | Position (nucleotides) | Risk allele | Major allele | MAF (case) | MAF (ctrl) | Odds ratio (het) | Odds ratio (hom) | PAR | λs | Stage 2 pMAX | Stage 2 pMAX (perm) | Stage 1 pMAX | Stage 1 pMAX (perm) | Nearest gene
rs7903146 | 10 | 114,748,339 | T | C | 0.406 | 0.293 | 1.65 ± 0.19 | 2.77 ± 0.50 | 0.28 | 1.0546 | 1.5 × 10−34 | <1.0 × 10−7 | 3.2 × 10−17 | <3.3 × 10−10 | TCF7L2
rs13266634 | 8 | 118,253,964 | C | C | 0.254 | 0.301 | 1.18 ± 0.25 | 1.53 ± 0.31 | 0.24 | 1.0089 | 6.1 × 10−8 | 5.0 × 10−7 | 2.1 × 10−5 | 1.8 × 10−5 | SLC30A8
rs1111875 | 10 | 94,452,862 | G | G | 0.358 | 0.402 | 1.19 ± 0.19 | 1.44 ± 0.24 | 0.19 | 1.0069 | 3.0 × 10−6 | 7.4 × 10−6 | 9.1 × 10−6 | 7.3 × 10−6 | HHEX
rs7923837 | 10 | 94,471,897 | G | G | 0.335 | 0.377 | 1.22 ± 0.21 | 1.45 ± 0.25 | 0.20 | 1.0065 | 7.5 × 10−6 | 2.2 × 10−5 | 3.4 × 10−6 | 2.5 × 10−6 | HHEX
rs7480010 | 11 | 42,203,294 | G | A | 0.336 | 0.301 | 1.14 ± 0.13 | 1.40 ± 0.25 | 0.08 | 1.0041 | 1.1 × 10−4 | 2.9 × 10−4 | 1.5 × 10−5 | 1.2 × 10−5 | LOC387761
rs3740878 | 11 | 44,214,378 | A | A | 0.240 | 0.272 | 1.26 ± 0.29 | 1.46 ± 0.33 | 0.24 | 1.0046 | 1.2 × 10−4 | 2.8 × 10−4 | 1.8 × 10−5 | 1.3 × 10−5 | EXT2
rs11037909 | 11 | 44,212,190 | T | T | 0.240 | 0.271 | 1.27 ± 0.30 | 1.47 ± 0.33 | 0.25 | 1.0045 | 1.8 × 10−4 | 4.5 × 10−4 | 1.8 × 10−5 | 1.3 × 10−5 | EXT2
rs1113132 | 11 | 44,209,979 | C | C | 0.237 | 0.267 | 1.15 ± 0.27 | 1.36 ± 0.31 | 0.19 | 1.0044 | 3.3 × 10−4 | 8.1 × 10−4 | 3.7 × 10−5 | 2.9 × 10−5 | EXT2
Significant T2DM associations were confirmed for eight SNPs in five loci. Allele frequencies, odds ratios (with 95% confidence intervals) and PAR were calculated using only the stage 2 data. Allele
frequencies in the controls were very close to those reported for the CEU set (European subjects genotyped in the HapMap project). Induced sibling recurrent risk ratios (λs) were estimated using
stage 2 genotype counts for the control subjects and assuming a T2DM prevalence of 7% in the French population. hom, homozygous; het, heterozygous; major allele, the allele with the higher
frequency in controls; pMAX, P-value of the MAX statistic from the χ² distribution; pMAX (perm), P-value of the MAX statistic from the permutation-derived empirical distribution (pMAX and
pMAX (perm) are adjusted for variance inflation); risk allele, the allele with higher frequency in cases compared with controls.
[Figure 2, panels a and b: −log10[P] (y-axis ticks 0, 2, 4) across the SLC30A8 region and the IDE–KIF11–HHEX region.]
NATURE|Vol 445|22 February 2007 ARTICLES
Sladek, 2007
[Figure 1 excerpt (zoomed): per-chromosome plots; the y axis shows −log10[pMAX], the P-value obtained by the MAX statistic, for each SNP.]
How would you interpret the p-values?
Odds ratios?
Confirmed 8 SNPs with N ~ 1000
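One way to approach the two questions above is to recompute a simple odds ratio from the Table 1 allele frequencies and to compare p-values against a multiple-testing threshold. The sketch below is illustrative only: the function names are hypothetical, the allelic odds ratio is a cruder quantity than the genotype-specific (het/hom) odds ratios reported in Table 1, and a Bonferroni correction over the 392,935 stage 1 SNPs is an assumption, not the paper's stated cutoff.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele among case chromosomes over the odds among controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# rs7903146 (TCF7L2): the T allele has frequency 0.406 in cases and 0.293 in controls (Table 1).
or_tcf7l2 = allelic_odds_ratio(0.406, 0.293)

# 392,935 SNPs were tested in stage 1.
threshold = bonferroni_threshold(392_935)

print(f"allelic OR for rs7903146: {or_tcf7l2:.2f}")
print(f"Bonferroni threshold: {threshold:.2e}")
```

An odds ratio near 1.65 means each T allele raises the odds of T2D by roughly 65%, a modest effect. The p-value only says how surprising the frequency difference would be if the SNP had no effect. With hundreds of thousands of tests, a p-value of 0.01 is expected by chance thousands of times, hence the much stricter threshold.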

Scaling up discovery by combining populations:

meta-analyses

g the Diabetes Genetics
nvestigation of NIDDM
nd (iv) the Framingham
omponent studies (n =
ry Table 1 online.
aring, the four consortia
n 10 and 20 SNPs promi-
their individual, interim,
mentary Table 2 online).
oci with consistent effects
dies. Two of these repre-
6PC2 and GCK. In addi-
nerated evidence for an
NPs around the MTNR1B
rs1387153, P = 2.2 ×
10−11; DFS: rs10830963,
5.8 × 10−4, for the most
ch analysis). The associa-
d on formal meta-analysis
r exclusion of individuals
= 1.1 × 10−57; rs4607517
NR1B), P = 3.2 × 10−50;
pplementary Table 3 and
ent efforts to harmonize
(including the additional
data from the WTCCC, DGI and FUSION scans)10 (Supplementary
Note). We found strong evidence that the minor G allele of
rs10830963 was associated with increased risk of T2D (odds ratio =
1.09 (1.05–1.12), P = 3.3 × 10−7; Fig. 2 and Supplementary Table 6
online). The possibility that the fasting glucose association might
Study ID          OR (95% CI)          Weight (%)
DGI               1.12 (0.96, 1.30)    4.61
FUSION            1.20 (1.03, 1.39)    4.89
WTCCC             1.07 (0.95, 1.20)    8.03
deCODE            1.14 (1.03, 1.27)    9.58
KORA              1.00 (0.84, 1.19)    3.53
Rotterdam         1.17 (1.04, 1.30)    8.75
CCC               1.07 (0.88, 1.31)    2.69
ADDITION/ELY      1.16 (1.02, 1.33)    6.04
Norfolk           1.00 (0.90, 1.10)    10.56
UKT2DGC           1.03 (0.96, 1.10)    23.18
OxGN/58BC         0.91 (0.75, 1.10)    2.85
FUSION Stage 2    1.15 (1.02, 1.30)    7.41
METSIM            1.16 (1.03, 1.30)    7.90
Overall (I² = 26.6%, P = 0.176)    1.09 (1.05, 1.12)    100.00
Meta-analysis P value = 3.3 × 10−7
[Forest plot x axis: odds ratio, 0.722 to 1.39, reference line at 1]
Figure 2 Association of rs10830963 with type 2 diabetes (T2D) in 13 case-
control studies.
VOLUME 41 | NUMBER 1 | JANUARY 2009 NATURE GENETICS
Meta-analysis of SNP rs10830963:
Combining findings from multiple cohorts
Prokopenko, 2009
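The pooled estimate in the forest plot can be approximated with a fixed-effect, inverse-variance meta-analysis of the per-study odds ratios. The sketch below is an illustration under assumptions, not the authors' code: standard errors are back-calculated from the rounded 95% confidence limits, the pairing of study labels with rows follows the order in which they appear in the extracted figure, and the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# (study, odds ratio, lower 95% limit, upper 95% limit) as read from the forest plot.
STUDIES = [
    ("DGI", 1.12, 0.96, 1.30),
    ("FUSION", 1.20, 1.03, 1.39),
    ("WTCCC", 1.07, 0.95, 1.20),
    ("deCODE", 1.14, 1.03, 1.27),
    ("KORA", 1.00, 0.84, 1.19),
    ("Rotterdam", 1.17, 1.04, 1.30),
    ("CCC", 1.07, 0.88, 1.31),
    ("ADDITION/ELY", 1.16, 1.02, 1.33),
    ("Norfolk", 1.00, 0.90, 1.10),
    ("UKT2DGC", 1.03, 0.96, 1.10),
    ("OxGN/58BC", 0.91, 0.75, 1.10),
    ("FUSION Stage 2", 1.15, 1.02, 1.30),
    ("METSIM", 1.16, 1.03, 1.30),
]


def fixed_effect_meta(studies):
    """Inverse-variance pooling of log odds ratios; returns (OR, (low, high), I2 in %)."""
    z95 = NormalDist().inv_cdf(0.975)
    betas, weights = [], []
    for _, odds_ratio, low, high in studies:
        se = (log(high) - log(low)) / (2 * z95)  # SE of log(OR) recovered from the CI
        betas.append(log(odds_ratio))
        weights.append(1 / se ** 2)
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = sqrt(1 / total)
    # Cochran's Q and I2 describe how much the studies disagree beyond chance.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (exp(pooled - z95 * pooled_se), exp(pooled + z95 * pooled_se))
    return exp(pooled), ci, i2


pooled_or, (ci_low, ci_high), i2 = fixed_effect_meta(STUDIES)
print(f"pooled OR = {pooled_or:.2f} ({ci_low:.2f}, {ci_high:.2f}), I2 = {i2:.1f}%")
```

Most single studies have confidence intervals that cross 1, yet the pooled interval does not: large, precise studies get large weights, and combining cohorts shrinks the standard error. Because the inputs are rounded, the result should land close to, but not exactly on, the figure's reported values.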

ARTICLES
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of
European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls,
we identified 12 new T2D association signals with combined P < 5 × 10−8. These include a second independent signal at the
KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of
overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both
beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in
cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals
influencing apparently unrelated complex traits.
Type 2 diabetes (T2D) is characterized by insulin resistance and
deficient beta-cell function1. The escalating prevalence of T2D and
the limitations of currently available preventative and therapeutic
options highlight the need for a more complete understanding of
T2D pathogenesis. To date, approximately 25 genome-wide significant
common variant associations with T2D have been described, mostly
through genome-wide association (GWA) analyses2–13. The identities
of the variants and genes mediating the susceptibility effects at most
of these signals have yet to be established, and the known variants
account for less than 10% of the overall estimated genetic contribution
to T2D predisposition. Although some of the unexplained heritability
will reflect variants poorly captured by existing GWA platforms, we
reasoned that an expanded meta-analysis of existing GWA data would
the inverse-variance method (Online Methods, Fig. 1, Supplementary
Tables 1 and 2 and Supplementary Note). We observed only modest
genomic control inflation (λgc = 1.07), suggesting that the observed
results were not due to population stratification. After removing SNPs
within established T2D loci (Supplementary Table 3), the result-
ing quantile-quantile plot was consistent with a modest excess of
disease associations of relatively small effect (Supplementary Note).
Weak evidence for association at HLA variants strongly associated
with autoimmune forms of diabetes (Supplementary Table 3 and
Supplementary Note) suggested some case admixture involving
subjects with type 1 diabetes or latent autoimmune diabetes of adult-
hood; however, failure to detect T2D associations at other non-HLA
type 1 diabetes susceptibility loci (for example, INS, PTPN22 and
Twelve type 2 diabetes susceptibility loci identified
through large-scale association analysis
Voight, 2010
Meta-analyses for T2D:
N>40K and 90K identifies >30 loci among 2,400,000 SNPs
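The genomic control inflation factor quoted in the excerpt (λgc = 1.07) is commonly computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). A minimal sketch, assuming two-sided p-values from 1-df tests and using a hypothetical function name and synthetic p-values:

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed 1-df chi-square divided by the null median (about 0.455)."""
    nd = NormalDist()
    # A two-sided p-value from a 1-df test maps back to chi2 = z**2 with z = Phi^-1(1 - p/2).
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Synthetic example: evenly spaced p-values mimic a perfectly null scan.
uniform_p = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(uniform_p)
print(f"lambda_GC = {lam:.3f}")
```

A value near 1 suggests the bulk of test statistics follow the null, so population stratification or other systematic bias is limited. Values well above 1 indicate inflation across the whole genome rather than a handful of true signals.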

ARTICLES
13 autosomal loci exceeded the threshold for genome-wide significance
(P ranging from 2.8 × 10−8 to 1.4 × 10−22) with allele-specific odds
(r2 < 0.05), and conditional analyses (see below) establish these SNPs
as independent (Fig. 2 and Supplementary Table 4). Further analysis
[Figure 1: Manhattan plots of −log10(P) by chromosome (1–22 and X); top panel unconditional analysis, lower panel conditional analysis.
Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10−5); association in identified or established region (P < 1 × 10−4).
Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPARG.]
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-
analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those
taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and
should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously
established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered
conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10−5), whereas
secondary signals close to already confirmed T2D loci are shown in purple (P < 10−4).
Meta-analyses for T2D:
N>40K and 90K identifies >30 loci among 2,400,000 SNPs

[Figure: regional association plot for the SLC30A8 region. Axes: −log10(P−value), 0 to 10; recombination rate (cM/Mb), 0 to 100. Index SNP: rs3802177. Neighboring gene label: PGCP. Legend follows.]
rs3802177 stage 1
● r^2: 0.8 − 1.0
● r^2: 0.6 − 0.8
● r^2: 0.4 − 0.6
● r^2: 0.2 − 0.4
● r^2: 0.0 − 0.2
● r^2 missing
<− TRPS1
<− EIF3H
UTP23 −>
<− RAD21
LOC441376 −>
SLC30A8 −>
MED30 −>
<− EXT1
<− SAMD12
<− TNFRSF11
COLEC1
117 118 119 120
Position on chromosome 8 (Mb)
CDKN2A/B Region
0
2
4
6
8
10
−log10(P−value)
0
20
40
60
80
100
rs10965250
●● ●● ●
●
●
●
●
●
●
●
●
●
●
● ●●●
●
●
●●
●
●●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
● ●
●●
●
●
● ●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●●
●
●
●
●
●
●●
●● ●
●
●
●●
●
●
●●
● ●●
●
●
●
● ●●
●
●
●
●
●
●
● ●●
●
●●
●
●
●
●
●●
●
●
●
●
●● ●
●
●● ● ●
●
●
●
●
●
●
●
● ●
●
●●
●●
●● ●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
● ●
●●
●
●
●
●
●
● ● ●
●●
●
●
●
●
●●●●
●
●●
●
●●
●
●
●
●●●
●
●●●
● ●
●
● ●●●
●
●●●
●
●
●
●
●●●●
●●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
● ●
●
● ●●
●
●
●
● ●
● ●●●●
●
●●
●
●
●
●
●
● ●●
●
● ●●●●●
●
●●
●
●
●
●
●
● ●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●●●
●
●●
●●
●
●
●●
●●●
●●
●
●
●●
●
●
●●
● ● ●
●
● ●
●●●●●●●●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●●●●●●●
●●●
●
●
● ●●
●
●
●●●●
●
●
●
●●
●
●
●
●
●●●●●
●
●●
●●●●●●
●
●
●
●●
●
●
●●●
●
● ●
●●●
●
●●●●
●
●
●
●●●●
●●
●●●
●●
●●●●●
●●
●●●
●●●●●
●
●●●●
●
●
●
●●
●
●
●
●
●●●
●
●
●●
●●
●●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●●●●●●●
●●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●●●●●●●●●●
●
●●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●●●●
● ●●
●●
●
●
●●●
●●
●
●●
●
● ●
●
●
●●●
●
●●●
●
●●●
●
●
●
●
●●●●●●●●●●●●●
●
●●
●●●
●●●
●●●
●
●
●
●●●●
●●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●●●
●●
●●
●●●●●●●●●●●●●●●
●
●●●
●●●●●
●
●
●
●
●
●
●
●●●●
●●
●
●
●
●
●
●●
●
●●●
●
●
●●
●●●●●
●
●●
●
●
●
●
●●●●●●●
●
●
●
●
●
●●●
●●
●
●●●
●
●●●
●
●●●●●●●●●●●●●●●●
●●●●
●●
●
●●
●●
●●
●
●
●
●
●
●●
●
●●
●
●●●
●
●●●
●
●●●●●
●
●●
●
●●●
●●
●●
●
●
●●●
●●
●●●●
●●
●●
●●
●●
●
●
●
●
●
●
●●●●
●
●●●●●
●
●
●
●●●●
●
●●
●
●
●
●
●●●
●
●●
●
●
●●●●●
●
●
●
●
●
●●
●
●●
● ●●●●●
●
●●
●●●●●
●●
●
●●●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
●●
●
●●
●●●●●●●●●●●●●●
●●
●
●●
●●●
●
●
●
●●
●●
●
●●●
●
●●●●
●
●
●
●
●●
●●
●●
●●●●●●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●●●
●
●
●●
●
●●
●
●
●
●●
●
●●●
●
●●
●
●
●●●
●
●●●●●
●
●
●●●
●●●●●
●●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●●●●●
●
●●●
●●
●
●●●
●
● ● ●
●●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●● ●●●
●
●
●● ●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●●●
● ●●
●
● ●●●●● ●● ●
●●
● ●● ● ●
●
●●
●●
●●
●
● ●● ●
●
●
●●
● ●
●
●●
●
●●
● ●
●
●
● ● ●●●● ●
●
●
●
●●
●
● ●●●●
●●
●●●
●●
●●
●
●
●
●●
●
●
●●●●
●●●
●
● ●●
●●
●
●
●
●●●
●
●
●●●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●●
●
● ●
●
●
●
●
●●
●●
●
●
● ●●●
●
●
●
●
●●
●
●
●
●● ●●
●
●●
●
●
● ● ● ●
●
● ●
●
●●
● ●●●●
●
●
●
●
● ●
●
●
● ●
●
●● ●
●
●
●
● ●●
●●
●●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
● ●●
●
●
●
●
●
● ●
●●
●
●
● ●
●
●
●
●
● ●
●
●●
●
●
●
● ●
●
●
●●●● ●
●
●
●●
●
●
● ●
●●
●
●●
●
●
●
●
●
●●●
●●●●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
● ●●●
●
●
● ●
●●● ●●
●
●
●
●●
●●
●
●●
●●
● ●●●●
● ●
●
●
● ●
● ● ●
●
● ●
●
●
●
●●
●● ●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●● ●●
●● ●
●
●
●
● ●
●
●●
●
●
● ●
●●●●●
●● ●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●● ●●
●
●
● ●●
●
●
●●●
●
●●●●
●
●●
● ●
●
●
●
●●
● ●
●
●
●●●
●●●●●●
●●●●
●● ●●
●●●●
●●●
●●●
●
●
●
●
●● ●●
●
●●●
●● ● ●
●●●
●●●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●●●
●●
●
●
●
● ●●
●
●
●●
●●
●
●
●●
●
●
●●
●
●●
●
●●●●●
●●
●●
●
●
● ●●●
●●
●
●
●●
●
●●
●●
●●●
●
●
●
●●
●
●
●● ● ●●
●●●●●●●●●●●●●●●●
● ●●
●●●
●●
●●●●
●
●
●
●
● ●●
● ●
●
●● ●●●●●
●
● ● ●
●
●● ●●
●
●●
●
●●
●
●
●●●
●●
●
●
●
●
●●●
●
●● ●● ●
●● ●
●
●
●●
●
●
●●●●
●●● ●
●●
●●●●●
●
●
●●●
●
●●
●
●●
●
●
●●●
●●
●●●
●
●
●
●
●
●
●
●●
●
●
●●●● ●●●
●●
●●
●● ●
●●
●
●●
●
●
●●●●●
● ●●
●
●
●●
●
●
●
●●●●
●
●●
●
●●●
●
●
●
●
●
●
●
●●●
●●
● ●
●
● ●●
● ●●●●●
●
●
●●
●
●●
●
●
●●
●
●
●
●●●●●●
●
●
●●●●
●●
● ●●●●● ●
●
●
●
●●
●●
●
●●
●
●
●
●
●●●●●
●
●
●
●●●●
●
●
●
●●●●●● ●
●●
●●
●●●
●●●
●●●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●●●●●
●●● ●
●●●
●
●
●
●
●
●
●●
●
●
●●●●● ●●● ●
●
●
●
●
● ●●●
●
●
●●
●
● ●●
●
●
●
● ●●
●
●
●
●
●
●●●
●
●
●● ● ● ●
●
●● ●
● ●●●
● ●
●
● ●
●
●
●
●
●●
● ● ● ●
●●
●
●
●
●●
●●
●
●●
●
●
●●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
●●
●
●
●
●
●
● ●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●●●●
●
● ●
● ●
●
● ●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●● ●
●
●● ● ●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●● ●●
●
●
● ●
●
●●
●
●●
●●●
●
●
●
●● ●
●●
●●
●
● ●● ●
●
●
●
●
●●
●
●
●●
●●
● ●
●
●
●
●●
●
●
●
●
●●
●●●
●●
●●● ●●
●●
●●●
●●
●●
●
●
●
●
●●
●● ●● ●
●
●
●
●
●
●
●
●●
●
● ●●
●
●●
● ●
●
●●
●●
● ●●
●
●
●
●
● ●●
●
●
●
●
●
●●
●
●
●●
●●●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
●
●
●
●
●●●
●● ●●●●●●
●●
●●●●●●●●
●
●
●
●
●
● ●
●●
●●
●
●●●●
●●
●●
●
●
●●
●
●
●
●
●
●●●●
●
●
●
● ● ●
●
●●
●●●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●●●●●●
●●
●
●
●
●
●●
●
●
●●
●
●
●●●●
●
●●
●●
● ●
●
●
●
● ● ● ●
●●●
●
●
●
●
●
●
●
● ●
●
● ●●
● ●
●●
●
●
●
●●
●
●
●● ●
●
●●
●
●
●
●
●
●
●●●●●
●●
●● ●
●
●
●●
●
●
●
●
●●●●●●●●
●●●
●
●●●●
●●● ●
●
●●
●
●
●●●● ●●●●
●
● ●
●
●
● ●●●●●
●
●
●
●
●
● ●
●
● ●
●●●
●●●
●
●
●
●●
●●
● ●
●
● ●
●●
●
●●
●
●●
●
●
●
●
●
●
● ●●
●
● ●
●
●●●●
●●
●
●
●
●
●
●●● ●
●
●● ●●
●
● ●●●
●
●
●
●
●●
●
●
●●
● ●
●
●
● ●
●
●
●
●
● ●●●
●
●
●
●
●●
●
●
● ●●●●
●
●
●
●
●
● ●●
●
●
●
● ●
●
● ●
● ●●
●●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●● ●●
●
●
●
●
●● ●
●
●
●
●●
●
●●
●
●●●●
●●●
●
●
●
●●● ●
●
●
●
●●●
●
●
●
●
●●
●●
●
●●
●
● ●●●
●
●
●
●●
●● ●
●●
●
● ●
●
●●
●
●
●
●
●
● ●●
●●
●●●
●
●
●
●
●●●
●
●● ●●
●●
●● ●●
●
●●● ●●
●●● ●
●●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●●●●
●
●
●
●●
●●●
●
●
●●●
●●
●●
●●●●●
●
●
●●●●
●
●
●●● ●● ●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●
●
●●
●
●
●●
●
●●
● ●●●●● ●●● ●●●
●
●
●
●
●
● ●●
●
●
●
●
●
●●
●●
● ●●●●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●●
●
● ●
●
●
●
●● ●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●● ●
●
●
●●●●●●●
●●
●●●●
●●
●
●
●●
●●
●
●
●
●●
●
●●●
●
●
●
●●
●
● ●●●● ●●●●●
●●●●●
●●
●
●●●●
●
●
●●
●
●●●
●
●
●●● ●● ●
●
●● ●
●
●
●
●●
●●● ●●
●●
●● ●
●
●
●●
●
●
●
●●
●●
●
●
●
● ●
●
●
●
●●●●●
● ●
● ●
●
●
●●
●
●●
●
●
●
●
● ● ●●● ●
●
● ●● ●
●
●
●●
●
● ●
●●
●
●
● ●
●
●
●●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●●
●
●
●
●
● ●
●●
● ●
● ●
●
●
●
●
●
●● ●
● ●
●
●
●
●
●
●
●
●●●● ●●
●
●
●
●
●
● ●●
●
●
●
● ● ●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●● ●
●
●
● ●
●
●
●●●
●
●●
●●
●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
● ●
●
● ●
●
●●
●●
●●
●
●
●
●
● ●
●
●
●
●
● ●
● ●●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●●
●
●
●
●
●
● ●
● ●
● ●
●
●●
●
●
● ●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
● ●
●
● ●●
●
●
● ●
●
●
●
●●
● ●● ●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
● ●● ●●
●●
●
●
●
●
●
●●
●
●
●● ●
●
●
●
●
●
●●●
●
●●●
● ●
●●
●
●●●●
●
●
●
●
●●
●
●
● ●
● ●●
● ●● ●● ●
●
● ●
●
●
●
●
●
●
● ● ●
●
● ●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
● ●●
● ●
●
● ●●
●
●
●●
●●
● ●
●
●
●
● ●
●
●
● ●●
●
● ●
●
● ● ●
●
● ●
●●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
● ●
● ● ●●●● ●
●
● ●●
●
●
● ●
●
●
●
● ●
●
●●
●
●
●
●
●
● ●● ●
●
●
● ●
●
●
●
● ●
●
●
●
●●
●
●●
●
● ●
●
● ●●
●
●
●
●
●
●
●
●●
● ●
●
●
●●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
● ●
●
●
●
●● ●
● ● ●
●●
●●●
●
●
●
● ●
●
●
●
●
● ●
●
●●
●
● ● ●
●
●
●
●
●
●
● ●
●
●
●● ●
●
●
●
● ●
●
●
●●
●
●
●
●● ●
●
●
●
●
● ●
●
●
● ●●
●
●
● ●
●
●● ● ●
●
● ●●
●● ●
●
● ●
●
●
●●
●●
●
●
● ●● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
● ●
●
● ●●●
●
●
●
●
●
●● ●
●
●
●●
●
●●
●
●
●●● ●
●
●●●●
●●
●
●
●
●
●
●
● ●●●
●
●
●●● ●●
●
●
●
●
●●
●
●
● ●●
● ● ● ●
●
●
●●
●
●
●
●●
●
● ●
● ● ●●●● ●
●●
●
●●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
● ●
●●
● ●●
●
●●
●
●
●●
● ● ●
●
●
● ●●●
●●●
●● ●
●
● ●
●
●●
●
●●● ●
●
●
● ●
●
●
●
●
●
●●
●●
●●
●
●● ●● ●
●●
●
●
●●●
●
●
●
●
●
●●
rs10965250 stage 1
● r^2: 0.8 − 1.0
● r^2: 0.6 − 0.8
● r^2: 0.4 − 0.6
● r^2: 0.2 − 0.4
● r^2: 0.0 − 0.2
● r^2 missing
<− MLLT3
KIAA1797 −>
<− PTPLAD2
<− IFNB1
<− IFNW1
<− IFNA21
<− IFNA4
<− IFNA7
<− IFNA13
MTAP −>
<− CDKN2A
<− CDKN2B
DMRTA1 −>
<− ELAVL2
21 22 23 24
Position on chromosome 9 (Mb)
40
60
80
100
recombinationrate(c
CDC123/CAMK1D Region
4
6
8
10
log10(P−value)
40
60
80
100
recombinationrate(c
rs12779790
●●●
●
●
●●
●
rs12779790 stage 1
● r^2: 0.8 − 1.0
● r^2: 0.6 − 0.8
● r^2: 0.4 − 0.6
● r^2: 0.2 − 0.4
● r^2: 0.0 − 0.2
● r^2 missing
HHEX/IDE Region
10
15
log10(P−value)
40
60
80
100
recombinationrate(c
rs5015480
●
●
●
●
●
●●
●
●
●
●●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●●
rs5015480 stage 1
● r^2: 0.8 − 1.0
● r^2: 0.6 − 0.8
● r^2: 0.4 − 0.6
● r^2: 0.2 − 0.4
● r^2: 0.0 − 0.2
● r^2 missing
.609
Not in a gene... In a gene...
~90% of GWAS hits are non-coding!

Stamatoyannopoulos, Science 2012
Systematic Localization of Common Disease-Associated Variation in Regulatory DNA
Matthew T. Maurano,1* Richard Humbert,1* Eric Rynes,1* Robert E. Thurman,1 Eric Haugen,1 Hao Wang,1 Alex P. Reynolds,1 Richard Sandstrom,1 Hongzhu Qu,1,2 Jennifer Brody,3 Anthony Shafer,1 Fidencio Neri,1 Kristen Lee,1 Tanya Kutyavin,1 Sandra Stehling-Sun,1 Audra K. Johnson,1 Theresa K. Canfield,1 Erika Giste,1 Morgan Diegel,1 Daniel Bates,1 R. Scott Hansen,4 Shane Neph,1 Peter J. Sabo,1 Shelly Heimfeld,5 Antony Raubitschek,6 Steven Ziegler,6 Chris Cotsapas,7,8 Nona Sotoodehnia,3,9 Ian Glass,10 Shamil R. Sunyaev,11 Rajinder Kaul,4 John A. Stamatoyannopoulos1,12†
Genome-wide association studies have identified many noncoding variants associated with common
diseases and traits. We show that these variants are concentrated in regulatory DNA marked by
deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active
during fetal development and are enriched in variants associated with gestational exposure–related
phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain
phenotype associations. Disease-associated variants systematically perturb transcription factor recognition
sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated
tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo
identification of pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram
trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of
regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
Disease- and trait-associated genetic variants are rapidly being identified with genome-wide association studies (GWAS) and related strategies (1). To date, hundreds of GWAS have been conducted, spanning diverse diseases and quantitative phenotypes (2) (fig. S1A). However, the majority (~93%) of disease- and trait-associated variants emerging from these studies lie within noncoding sequence (fig. S1B), complicating their functional evaluation. Several lines of evidence suggest the involvement of a proportion of such variants in transcriptional regulatory mechanisms, including modulation of promoter and enhancer elements (3–6) and enrichment within expression quantitative trait loci (eQTL) (3, 7, 8).

Human regulatory DNA encompasses a variety of cis-regulatory elements within which the cooperative binding of transcription factors creates focal alterations in chromatin structure. Deoxyribonuclease I (DNase I) hypersensitive sites (DHSs) are sensitive and precise markers of this actuated regulatory DNA, and DNase I mapping has been instrumental in the discovery and census of human cis-regulatory elements (9). We performed DNase I mapping genome-wide (10) in 349 cell and tissue samples, including 85 cell types studied under the ENCODE Project (10) and 264 samples studied under the Roadmap Epigenomics Program (11). These encompass several classes […]nome. In total, we identified 3,899,693 distinct DHS positions along the genome (collectively spanning 42.2%), each of which was detected in one or more cell or tissue types (median = 5).

Disease- and trait-associated variants are concentrated in regulatory DNA. We examined the distribution of 5654 noncoding genome-wide significant associations [5134 unique single-nucleotide polymorphisms (SNPs); fig. S1 and table S2] for 207 diseases and 447 quantitative traits (2) with the deep genome-scale maps of regulatory DNA marked by DHSs. This revealed a collective 40% enrichment of GWAS SNPs in DHSs (fig. S1C, P < 10^−55, binomial, compared to the distribution of HapMap SNPs). Fully 76.6% of all noncoding GWAS SNPs either lie within a DHS (57.1%, 2931 SNPs) or are in complete linkage disequilibrium (LD) with SNPs in a nearby DHS (19.5%, 999 SNPs) (Fig. 1A) (12). To confirm this enrichment, we sampled variants from the 1000 Genomes Project (13) with the same genomic feature localization (intronic versus intergenic), distance from the nearest transcriptional start site, and allele frequency in individuals of European ancestry. We confirmed significant enrichment both for SNPs within DHSs (P < 10^−59, simulation) and also including variants in complete LD (r^2 = 1) with SNPs in DHSs (P < 10^−37, simulation) (fig. S2).
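As a rough illustration of the kind of binomial enrichment test mentioned in the excerpt, the sketch below computes the upper-tail probability of seeing at least k of n SNPs inside DHSs when each SNP falls in a DHS with background probability p0. The function name and the toy numbers (n = 1000, k = 480, p0 = 0.40) are made up for illustration and are not taken from the paper; the authors' actual background distribution is not given in the excerpt.

```python
import math


def log10_binom_upper_tail(n, k, p0):
    """Return log10 of P(X >= k) for X ~ Binomial(n, p0), with 0 < p0 < 1.

    The sum is done in log space so that very small tail probabilities
    do not underflow to zero.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p0) + (n - i) * math.log1p(-p0)
        for i in range(k, n + 1)
    ]
    largest = max(log_terms)
    log_tail = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return log_tail / math.log(10)


# Toy example (hypothetical counts): 480 of 1000 SNPs fall in DHSs,
# against an assumed background rate of 40%.
print(f"log10(P) = {log10_binom_upper_tail(1000, 480, 0.40):.1f}")
```

Reporting log10(P) rather than P itself keeps the result readable when the tail probability is far below floating-point range.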
In total, 47.5% of GWAS SNPs fall within gene bodies (fig. S1B); however, only 10.9% of intronic GWAS SNPs within DHSs are in strong LD (r^2 ≥ 0.8) with a coding SNP, indicating that the vast majority of noncoding genic variants are not simply tagging coding sequence. Analogously, only 16.3% of GWAS variants within coding sequences are in strong LD with variants in DHSs. SNPs on widely used genotyping arrays (e.g., Affymetrix) were modestly enriched within DHSs (fig. S2), possibly due to selection of SNPs with robust experimental performance in genotyping assays. However, we found no evidence for sequence composition bias (table S3).

To further examine the enrichment of GWAS SNPs in regulatory DNA, we systematically classified all noncoding GWAS SNPs by the quality
1 Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. 2 Laboratory of Disease Genomics

There have been few, if any, similar bursts of discovery in the
history of medical research.
David Hunter and Peter Kraft, NEJM, 2007

Common claims about GWAS:
Despite its issues, GWAS has yielded many discoveries relative to its cost
[…] to a doubling of the number of associated variants discovered. The proportion of genetic variation explained by significantly associated SNPs is usually low (typically less than 10%) for many complex traits, but for diseases such as CD and multiple sclerosis (MS [MIM 126200]), and for quantitative traits such as height and lipid traits, between […]

Figure 1. GWAS Discoveries over Time
Data obtained from the Published GWAS Catalog (see Web Resources). Only the top SNPs representing loci with association p values < 5 × 10^−8 are included, and so that multiple counting is avoided, SNPs identified for the same traits with LD r^2 > 0.8 estimated from the entire HapMap samples are excluded.
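The counting rule in the caption (keep genome-wide significant top SNPs, drop SNPs for the same trait that are in high LD with one already kept) can be sketched as a simple greedy filter. This is one plausible reading of the caption, not the authors' code: the function name, the tuple layout, the strongest-first ordering, and the rsA to rsD identifiers are all hypothetical.

```python
def independent_hits(hits, ld_r2, p_max=5e-8, r2_max=0.8):
    """Greedy filter over (snp, trait, p_value) tuples.

    hits   : list of (snp_id, trait, p_value)
    ld_r2  : dict mapping frozenset({snp_a, snp_b}) -> r^2 (missing pairs count as 0)
    Returns the hits that pass the p-value threshold and are not in high LD
    with a stronger hit already kept for the same trait.
    """
    kept = []
    for snp, trait, p in sorted(hits, key=lambda h: h[2]):  # strongest first
        if p >= p_max:
            continue  # not genome-wide significant
        redundant = any(
            kept_trait == trait
            and ld_r2.get(frozenset((snp, kept_snp)), 0.0) > r2_max
            for kept_snp, kept_trait, _ in kept
        )
        if not redundant:
            kept.append((snp, trait, p))
    return kept


# Hypothetical example: rsB is dropped because it tags rsA for the same trait,
# and rsC is dropped because it misses the significance threshold.
example_hits = [
    ("rsA", "height", 1e-12),
    ("rsB", "height", 1e-9),
    ("rsC", "height", 1e-6),
    ("rsD", "lipids", 1e-10),
]
example_ld = {frozenset(("rsA", "rsB")): 0.9}
print([snp for snp, _, _ in independent_hits(example_hits, example_ld)])
```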
~500,000 SNP chips × ~$500/chip = $250M
Five years of GWAS Discovery (Visscher, 2012)
$250M / ~2,000 loci = $125K/locus
Candidate genes: >$250M!
100 NIH R01s

Fighter jet

Hadron Collider: $9B
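The back-of-the-envelope cost per locus on this slide can be reproduced with a few lines of arithmetic. The function name is hypothetical; the inputs are the approximate figures quoted on the slide.

```python
def gwas_cost(n_chips, cost_per_chip, n_loci):
    """Return (total cost, cost per discovered locus) in dollars."""
    total = n_chips * cost_per_chip
    return total, total / n_loci


# Approximate slide figures: ~500,000 chips at ~$500 each, ~2,000 loci found.
total, per_locus = gwas_cost(500_000, 500, 2_000)
print(f"total = ${total / 1e6:.0f}M, per locus = ${per_locus / 1e3:.0f}K")
```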

Nothing comparable to elucidate E influence!
We lack high-throughput methods
and data to discover new E in P…
E: ???

A similar paradigm for discovery should exist for E!
Why?

$H^2 = \sigma^2_G / \sigma^2_P$
Heritability (H^2) is the proportion of phenotypic variance (\sigma^2_P)
attributed to genetic variance (\sigma^2_G) in a population
Indicator of the proportion of phenotypic
differences attributed to G.
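A minimal simulation of the heritability ratio, assuming the additive model P = G + E from the start of the lecture with G and E independent and normally distributed. The variance values (0.6 and 0.4), the sample size, and the function names are illustrative assumptions, not estimates for any real trait.

```python
import random
import statistics


def heritability(var_g, var_p):
    """Broad-sense heritability: genetic variance over phenotypic variance."""
    return var_g / var_p


def simulate_heritability(var_g, var_e, n=20000, seed=0):
    """Simulate P = G + E with independent G and E, then estimate H^2."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return heritability(statistics.pvariance(g), statistics.pvariance(p))


# With independent G and E the variances add, so the estimate should land
# near var_g / (var_g + var_e).
estimate = simulate_heritability(var_g=0.6, var_e=0.4)
print(f"estimated H^2 = {estimate:.2f}")
```

In real data G is not observed directly, so the genetic variance has to be inferred from relatives or genotypes; the simulation only illustrates what the ratio means.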

Height is an example of a heritable trait:

Francis Galton shows how it's done (1887)
“mid-height of 205 parents
described 60% of variability of 928
offspring”
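A sketch of the style of analysis described here: regress offspring height on mid-parent height and look at the slope and the fraction of variance explained. The data are simulated, not Galton's; the mean height, the spread of the noise, the true slope of 0.6, and the function names are all assumptions for illustration.

```python
import random


def ols_slope_and_r2(x, y):
    """Least-squares slope of y on x, and the squared correlation R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def simulate_families(n, true_slope=0.6, seed=1):
    """Simulated mid-parent and offspring heights in inches (toy values)."""
    rng = random.Random(seed)
    mean_height = 68.0
    midparent = [rng.gauss(mean_height, 1.8) for _ in range(n)]
    offspring = [
        mean_height + true_slope * (m - mean_height) + rng.gauss(0.0, 2.2)
        for m in midparent
    ]
    return midparent, offspring


midparent, offspring = simulate_families(928)
slope, r2 = ols_slope_and_r2(midparent, offspring)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

The slope and R^2 answer different questions: the slope says how much of a parental deviation from the mean is passed on, on average, while R^2 says how much of the offspring variance the mid-parent value accounts for.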
