A lecture for UW EPI 519 providing background on genome-wide association studies, a few examples of recent papers from the CVD GWAS literature, and some lessons and new directions. The talk was originally given in 2008 (in collaboration with a colleague); this version has been updated slightly for 2010 and includes references for further reading.
Some of the typefaces may have been mangled on conversion; the file download should be more reliable.
Epi519 Gwas Talk
1. Genome-wide Association Studies
EPI 519
21 October 2010
Joshua C. Bis, PhD
University of Washington, Cardiovascular Health Research Unit
The Type 1 Diabetes Genetics Consortium. Nature Genetics, 2009 May 10
6. highly consistent associations*
Trait | Gene | Polymorphism | Frequency
Deep vein thrombosis | F5 | Arg506Gln | 0.015
Graves’ disease | CTLA4 | Thr17Ala | 0.62
Type 1 diabetes | INS | 5’ VNTR | 0.67
HIV infection | CCR5 | 32 bp Ins/Del | 0.05-0.07
Alzheimer’s disease | APOE | Epsilon 2/3/4 | 0.16-0.24
Creutzfeldt-Jakob disease | PRNP | Met129Val | 0.37
* Associations between polymorphisms and disease for which at least 75% of identified studies achieved statistical significance (out of 600 gene-disease studies reviewed).
Hirschhorn et al. Genet Med. 2002;4(2):45-61
7. “genomics”
The field within genetics
concerned with the structure and
function of the entire DNA
sequence of an individual or
population.
-- Thomas Roderick
McDonald’s Raw Bar
1986
8. genome-wide association study
“… a study of common genetic
variation across the entire
human genome designed to
identify genetic associations with
observable traits.”
-- National Institutes of Health,
“Policy for sharing of data obtained in
NIH-sponsored or conducted GWAS”
9. “A major strength of the
genome-wide approach … has
been its freedom from reliance
on prior knowledge.”
-- “A HapMap harvest of insights into the genetics of
common disease”
(Manolio, Brooks, Collins.)
13. haplotypes
The International HapMap Consortium. Nature | Vol 437 | 27 October 2005
14. “… to create a public, genome-
wide database of common
human sequence variation,
providing information needed as
a guide to genetic studies of
clinical phenotypes.”
-- October 2002
16. imputation
Use patterns of variation from HapMap to impute genotypes.
Increases power by allowing for association testing at
untyped markers and allows comparisons across studies and
platforms by using a common set of SNPs.
Li, Willer, Sanna, Abecasis. Annu Rev Genomics Hum Genet. 2009;10:387-406
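A toy sketch of the idea behind imputation, far simpler than the methods reviewed by Li et al.: find reference haplotypes that agree with the study haplotype at the typed markers, then take the most common allele among them at the untyped marker. The function name, the string encoding of haplotypes, and the example data are all hypothetical.

```python
from collections import Counter

def impute_allele(reference_haplotypes, typed, target):
    """Guess the allele at an untyped site from matching reference haplotypes.

    reference_haplotypes: equal-length strings, one allele per site.
    typed: dict mapping site index -> allele observed on the study haplotype.
    target: index of the untyped site.
    Returns (allele, fraction of matching haplotypes carrying it),
    or None if no reference haplotype matches the typed alleles.
    """
    matches = [h for h in reference_haplotypes
               if all(h[i] == a for i, a in typed.items())]
    if not matches:
        return None
    allele, count = Counter(h[target] for h in matches).most_common(1)[0]
    return allele, count / len(matches)

# Hypothetical reference panel of four haplotypes over four SNPs.
reference = ["ACGT", "ACGT", "ATGA", "GTCA"]

# Study haplotype typed at sites 0 and 1 only; site 3 is untyped.
print(impute_allele(reference, {0: "A", 1: "C"}, target=3))
```

Real imputation software models recombination and genotype uncertainty; this exact-match rule only illustrates why correlated markers carry information about untyped ones.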
22. association study
[Figure: grid of individual genotypes (CC, CT, TT) at a single SNP, shown separately for controls and cases]
Odds ratio for C allele: 1.35, p = 6.3 x 10^-7
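A minimal sketch of how an odds ratio for the C allele could be computed from genotype counts in cases and controls. It assumes a simple allelic test (each person contributes two alleles) with a Wald confidence interval and a 1-df chi-square test. The counts below are hypothetical and are not the data from the slide.

```python
import math

def allelic_odds_ratio(case_counts, control_counts):
    """Allelic odds ratio for C, Wald 95% CI, chi-square statistic and p-value.

    Each argument is a dict of genotype counts, e.g. {"CC": 10, "CT": 20, "TT": 30}.
    """
    def allele_counts(g):
        # Every CC carries two C alleles, every CT one C and one T.
        return 2 * g["CC"] + g["CT"], 2 * g["TT"] + g["CT"]

    a, b = allele_counts(case_counts)      # C and T alleles in cases
    c, d = allele_counts(control_counts)   # C and T alleles in controls

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))

    # Pearson chi-square on the 2x2 allele table (1 degree of freedom).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, ci, chi2, p_value

# Hypothetical genotype counts, for illustration only.
cases = {"CC": 30, "CT": 50, "TT": 20}
controls = {"CC": 20, "CT": 50, "TT": 30}
print(allelic_odds_ratio(cases, controls))
```

Counting alleles rather than people assumes Hardy-Weinberg equilibrium; genotype-based or regression models are common alternatives.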
23. Manhattan plot
(McCarthy et al., Nature Reviews Genetics, May 2008)
24. p-value
the probability of seeing your data or more extreme
data if the null hypothesis is true.
By chance, with 1,000,000 statistical tests:
• a threshold of p=0.05
would show 50,000 “significant” associations
360 cases : 360 controls
• a threshold of p = 0.05/1,000,000 (5 x 10^-8)
would show 0.05 “significant” associations
1590 cases: 1590 controls.
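The multiple-testing arithmetic above can be checked with a short sketch. The function names are illustrative, and the family-wise error rate assumes independent tests, which real SNPs in linkage disequilibrium are not.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null tests that pass a per-test threshold alpha."""
    return n_tests * alpha

def familywise_error_rate(n_tests, alpha):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_tests = 1_000_000
bonferroni = 0.05 / n_tests  # Bonferroni-corrected per-test threshold

for alpha in (0.05, bonferroni):
    print(alpha,
          expected_false_positives(n_tests, alpha),
          familywise_error_rate(n_tests, alpha))
```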
25. study design considerations
Case-control or cohort
Sample size
Phenotype definition
Comparability of cases and controls
• Genotyping quality
• Population substructure
• Laboratory procedures, genotyping, data cleaning
26. population stratification
requires that both allele frequency
and disease prevalence differ between subpopulations
Balding. Nature Reviews Genetics. 2006; 7:781-791
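A small worked example of why both differences are needed, using hypothetical allele counts. Within each subpopulation the C allele has the same frequency in cases and controls (odds ratio 1), yet pooling produces a spurious association when the subpopulations differ in both allele frequency and case:control mix.

```python
def odds_ratio(case_c, case_t, ctrl_c, ctrl_t):
    """Allelic odds ratio for C from a 2x2 table of allele counts."""
    return (case_c * ctrl_t) / (case_t * ctrl_c)

def allele_table(n_case_alleles, n_ctrl_alleles, c_freq):
    """Allele counts with the same C frequency in cases and controls (no true effect)."""
    case_c = round(n_case_alleles * c_freq)
    ctrl_c = round(n_ctrl_alleles * c_freq)
    return (case_c, n_case_alleles - case_c, ctrl_c, n_ctrl_alleles - ctrl_c)

def pooled(*tables):
    """Add the tables cell by cell, ignoring subpopulation labels."""
    return tuple(sum(cells) for cells in zip(*tables))

# Hypothetical: population A has high C frequency and supplies mostly cases,
# population B has low C frequency and supplies mostly controls.
pop_a = allele_table(800, 200, c_freq=0.5)
pop_b = allele_table(200, 800, c_freq=0.1)
stratified_or = odds_ratio(*pooled(pop_a, pop_b))

# Same allele frequency difference, but identical case:control mix in both.
pop_a_balanced = allele_table(500, 500, c_freq=0.5)
pop_b_balanced = allele_table(500, 500, c_freq=0.1)
balanced_or = odds_ratio(*pooled(pop_a_balanced, pop_b_balanced))

print(odds_ratio(*pop_a), odds_ratio(*pop_b), stratified_or, balanced_or)
```

The pooled odds ratio departs from 1 only in the first scenario, where both conditions hold.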
27. Q-Q plots
(modified from McCarthy et al., Nature Reviews Genetics, May 2008)
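A sketch of how the points on a Q-Q plot could be computed: observed -log10(p) values, sorted, are paired with the values expected if every test were null. The plotting position (i - 0.5)/n is one common convention and is an assumption here. The simulated p-values are hypothetical, and the plotting itself is left out.

```python
import math
import random

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, largest first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Under the null, the i-th smallest of n p-values is near (i - 0.5) / n.
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

# Hypothetical null p-values: points should fall near the diagonal.
random.seed(1)
null_p = [1 - random.random() for _ in range(10_000)]  # values in (0, 1]
for expected, observed in qq_points(null_p)[:5]:
    print(round(expected, 2), round(observed, 2))
```

Points rising above the diagonal across the whole range suggest systematic inflation (for example, stratification), while departure only in the upper tail is what a few true associations would produce.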
28. Allele frequency & effect size
Feasibility of identifying genetic variants by risk allele
frequency and strength of genetic effect (odds ratio).
TA Manolio et al. Nature 461, 747-753 (2009) doi:10.1038/nature08494
29. reasons for larger sample size:
• More genotypes / tests
• More genotype error or misclassification
• Higher heterogeneity of association
• Lower effect size
• Lower frequency of risk allele
• Lower correlation between marker allele and risk allele
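Several of these factors can be explored with a rough sample-size sketch. It assumes an allelic test comparing risk-allele frequency in equal numbers of cases and controls, using the standard normal-approximation formula for two proportions. The function names and the input values are hypothetical, and genotyping error, heterogeneity, and imperfect marker correlation are not modeled.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))

def cases_needed(control_freq, odds_ratio, alpha, power=0.8):
    """Approximate number of cases (with equal controls) for a two-sided allelic test."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    alleles_per_group = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2 / (p1 - p0) ** 2
    # Each person contributes two alleles.
    return math.ceil(alleles_per_group / 2)

# Hypothetical scenarios: stricter threshold, weaker effect, rarer allele.
print(cases_needed(0.30, 1.2, alpha=0.05))
print(cases_needed(0.30, 1.2, alpha=5e-8))
print(cases_needed(0.30, 1.1, alpha=5e-8))
print(cases_needed(0.05, 1.2, alpha=5e-8))
```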
31. Multi-stage discovery
Carry forward a large number of potential associations through multiple, narrowing stages.
Protect against false positives via replication
Minimize false negative results via permissive early thresholds
From Hoover, R. Epidemiology. 18(1):13-17, January 2007.
32. Meta-analysis
Combine results from several studies to increase power using traditional methods of meta-analysis.
Allows for first-stage discovery of small effect sizes
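One traditional method is fixed-effect, inverse-variance weighting of per-study log odds ratios. The sketch below assumes each study reports an effect estimate and standard error for the same SNP and coded allele. The inputs are hypothetical, and the slide does not specify which meta-analysis method is meant.

```python
import math

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (beta, se) pairs, beta being a log odds ratio.
    Returns pooled beta, pooled se, z statistic, and two-sided p-value.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value

# Hypothetical: three studies, each with a modest effect that is weak alone.
studies = [(0.2, 0.1), (0.2, 0.1), (0.2, 0.1)]
print(fixed_effect_meta(studies[:1]))  # one study on its own
print(fixed_effect_meta(studies))      # all three combined
```

Pooling leaves the effect estimate unchanged here but shrinks its standard error, which is how small effects become detectable.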
34. Wellcome Trust Case Control Consortium
Among the biggest projects undertaken to identify genetic variation that may be associated with disease
£9 million in funding from the Wellcome Trust
GWAS of seven common diseases: 2,000 cases each and 3,000 shared controls
All genotyping data available to the scientific community
www.wtccc.org.uk (Nature, vol 447, 7 June 2007)
40. Coronary Disease GWAS: 9p21
author | McPherson | Helgadottir | Samani | Larson
where | Science | Science | NEJM | BMC Med Gen
when | May 2007 | May 2007 | August 2007 | Sept 2007
design | 3-stage case-control | case-control | case-control | cohort
discovery | OHS 1, OHS 2, ARIC | deCODE: Iceland A | WTCCC | Framingham Heart Study
replication | CCHS, DHS, OHS-3 | Iceland B, 3 U.S. case-control | German Family Study |
case definition | severe premature CHD | MI | MI or revascularization + fhx of CAD | incident MI
age at onset | <60 | <70 males, <75 females | <66 |
41. 9p21 results
study | SNP | locus | hazard/odds ratio | PAR
ARIC | rs10757274 | 9p21 | AB: 1.18 (1.02-1.37); BB: 1.29 (1.09-1.52) | 12-15%
CCHS | rs10757274 | 9p21 | AB: 1.26 (1.12-1.42); BB: 1.38 (1.19-1.60) | 10-13%
deCODE | rs10757278 | 9p21 | AB: 1.26 (1.16-1.36); BB: 1.64 (1.47-1.82) | 21%
deCODE early onset | rs10757278 | 9p21 | AB: 1.49 (1.31-1.69); BB: 2.02 (1.72-2.36) | 31%
Helgadottir, Science 2007
McPherson, Science 2007
43. 9p21 locus
not located within a “gene”
region contains CDKN2A and CDKN2B genes
• role in cell proliferation, cell aging and apoptosis -
important features of atherogenesis
• Sequencing did not reveal obvious candidates
may implicate a previously unrecognized gene or regulatory
element
same region also associated with type 2 diabetes
53. missing heritability (2009)
Trait | number of loci | % of heritability explained
Age-related macular degeneration | 5 | 50%
Crohn’s disease | 32 | 20%
Type-2 diabetes | 18 | 6%
HDL cholesterol | 7 | 5%
Height | 40 | 5%
Early-onset MI | 9 | 2.8%
Fasting glucose | 4 | 1.5%
Manolio, Nature 2009
54. missing heritability
many variants with small effects yet to be found
• larger sample sizes have revealed more loci
true positives below significance threshold
contribution of rare variants
failure to identify true causal variant
structural variants poorly captured by arrays
previous estimates of heritability flawed
GxG or GxE interactions
55. missing heritability (update)
Meta-analysis of > 100,000 individuals discovers 59 new associations
SNPs explain ~12% of trait variability & ~25% of heritability
Some predict MI risk; point to LDL/HDL differences
56. disease prediction
hope: highly predictive and affordable genetic tests
reality: low discriminatory and predictive ability
Manolio, NEJM 2010
57. next steps
Ever larger sample sizes
Studies of non-European ethnic populations
Sequencing implicated genetic regions
More complex genetic models
• Gene x Gene interactions
• pooling of rare variants
Functional biology: work in basic science and animal models
58. summary
GWAS have led to new biology
Small effect sizes
Not useful in prediction
Much yet to be discovered
More complicated than we thought
Don’t forget:
• case definition
• QC measures
• sample size and power
• multiple testing
• independent replication
59. “There have been few, if any,
similar bursts of discovery in the
history of medical research”
-- “Drinking from the fire hose …” (Hunter & Kraft)
64. Sources / References / Reading
1. The International HapMap Consortium.* A haplotype map of the human genome. Nature, 2005. 437(7063): p. 1299-320.[16255080].
2. The Type 1 Diabetes Genetics Consortium.* Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature Genetics, 2009 May 10 [19430480]
3. Myocardial Infarction Genetics Consortium.* Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet. 2009 Mar;41(3):334-41 [19198609]
4. The Wellcome Trust Case Control Consortium.* Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-78.[17554300].
5. Balding, D.J., A tutorial on statistical methods for population association studies. Nat Rev Genet, 2006. 7(10): p. 781-91.[16983374].
6. Christensen, K. and J.C. Murray, What genome-wide association studies can do for medicine. N Engl J Med, 2007. 356(11): p. 1094-7.[17360987].
7. Frazer, K.A., et al., A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007. 449(7164): p. 851-61.[17943122].
8. Hirschhorn, J.N., et al., A comprehensive review of genetic association studies. Genet Med, 2002. 4(2): p. 45-61.[11882781].
9. Hoover, R. The evolution of epidemiologic research: from cottage industry to "big" science. Epidemiology. 2007 Jan;18(1):13-7. [17179754]
10. Hunter, D.J. and P. Kraft, Drinking from the fire hose--statistical issues in genomewide association studies. N Engl J Med, 2007. 357(5): p. 436-9.[17634446].
11. Li Y, Willer C, Sanna S, Abecasis G., Genotype imputation. Annu Rev Genomics Hum Genet. 2009;10:387-406. [19715440]
12. Johnson AD and O’Donnell CJ: Open access database of GWA results, BMC Medical Genetics 2009: 10:6
13. Manolio, T.A., et al., Genetics of ultrasonographic carotid atherosclerosis. Arterioscler Thromb Vasc Biol, 2004. 24(9): p. 1567-77.[15256397].
14. Manolio, T.A., L.D. Brooks, and F.S. Collins, A HapMap harvest of insights into the genetics of common disease. J Clin Invest, 2008. 118(5): p. 1590-605.[18451988].
15. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009 Oct 8;461(7265):747-53. [19812666]
16. Manolio, TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med. 2010 Jul 8;363(2):166-76. [20647212]
17. McCarthy, M.I., et al., Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet, 2008. 9(5): p. 356-69.[18398418].
18. Pearson, T.A. and T.A. Manolio, How to interpret a genome-wide association study. JAMA, 2008. 299(11): p. 1335-44.[18349094].
19. Samani NJ, Erdmann J, Hall AS, et al. Genomewide association analysis of coronary artery disease. N Engl J Med. 2007 Aug 2;357(5):443-53. [17634449]
20. Teslovich TM, Musunuru K, Smith AV, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010 Aug 5;466(7307):707-13. [20686565]
21. NHGRI catalog of published GWA studies (http://genome.gov/GWASstudies)