Simulating Genes in
GWAS
Kevin R. Thornton
Ecology and Evolutionary Biology
UC Irvine
slides will be available at
http://www.slideshare.net/molpopgen
http://www.molpopgen.org
Acknowledgements
Tony Long
Andrew Foran
Jaleal Sanjak
Several genomic regions have been implicated in linkage studies and, recently, replicated evidence implicating specific genes has been reported. Increasing evidence suggests an overlap in genetic susceptibility with schizophrenia, a psychotic disorder with many similarities to BD. In particular association findings have been reported with
[…]
expanded reference group analysis (Supplementary Table 9), it is of interest that the closest gene to the signal at rs1526805 (P = 2.2 × 10^-7) is KCNC2 which encodes the Shaw-related voltage-gated potassium channel. Ion channelopathies are well-recognized as causes of episodic central nervous system disease, including seizures, ataxias
[Figure: seven stacked Manhattan plots, one per disease, of −log10(P) (0 to 15) against chromosome (1 to 22 and X). Panels: Bipolar disorder, Coronary artery disease, Crohn’s disease, Hypertension, Rheumatoid arthritis, Type 1 diabetes, Type 2 diabetes.]
Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases −log10 of the trend test P value for quality-control-positive SNPs, excluding those in each disease that were excluded for having poor clustering after visual inspection, are plotted against position on each chromosome. Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10^-5 highlighted in green. All panels are truncated at −log10(P value) = 15, although some markers (for example, in the MHC in T1D and RA) exceed this significance threshold.
doi:10.1038/nature05911
Burton et al.
the differences observed in their allelic architecture. Some apparent differences may simply be due to differences in the stage of investigation across traits. Studies in several conditions have clearly demonstrated that the number of detected variants increases with increasing sample size [22–24].
Population genetic theory suggests an explanation for the paucity of variants explaining a large proportion of disease predisposition, in that decreased reproductive fitness should typically act to reduce the frequencies of high-risk variants. This might explain the relative lack of variants detected so far for some neuropsychiatric conditions, such as autism spectrum disorders, given their low reproductive fitness [25]. Yet for a condition such as type 1 diabetes, which has a similar prevalence, familial risk, early onset and poor reproductive fitness (at
[…]
yielded intriguing new variants [33,34]. Studies of populations of recent African ancestry in particular is likely to increase the yield of rare variants and narrow the large chromosomal regions of association identified in the ‘younger’ population due to extended linkage disequilibrium, or the tendency for adjacent genetic loci to be inherited together [31]. Isolated populations may also be of value given their potential to be enriched in unique variants [35].
The accuracy of current heritability estimates is also important, because experimentally identified variants could never explain all the variance in an erroneously inflated heritability estimate. Heritability of quantitative traits, formally defined as the proportion of phenotypic variance in a population attributable to additive genetic factors (narrow-sense heritability, h^2 (ref. 36)) is typically estimated from
Table 1 | Estimates of heritability and number of loci for several complex traits
Disease | Number of loci | Proportion of heritability explained | Heritability measure
Age-related macular degeneration [72] | 5 | 50% | Sibling recurrence risk
Crohn’s disease [21] | 32 | 20% | Genetic risk (liability)
Systemic lupus erythematosus [73] | 6 | 15% | Sibling recurrence risk
Type 2 diabetes [74] | 18 | 6% | Sibling recurrence risk
HDL cholesterol [75] | 7 | 5.2% | Residual* phenotypic variance
Height [15] | 40 | 5% | Phenotypic variance
Early onset myocardial infarction [76] | 9 | 2.8% | Phenotypic variance
Fasting glucose [77] | 4 | 1.5% | Phenotypic variance
* Residual is after adjustment for age, gender, diabetes.
doi:10.1038/nature08494
Manolio et al.
NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/
Published Genome-Wide Associations through 12/2012
p ≤ 5 × 10^-8 for 17 trait categories
doi:10.1371/journal.pbio.1000579
Wray et al.
Unsurprisingly, since the GWAS method is primarily powered …
common alleles, risk allele frequencies were well above 5% …
all TASPs (reported index TASs with an association p valu…
5.0 × 10^-8 and all HapMap phase II CEU SNPs in LD [r^2 > 0…
[Figure: odds ratio (log scale; ticks 1, 2, 5, 10, 20, 30) against reported risk allele frequency (%, 0 to 100). Labeled points: OCA2, eye color; MC1R, hair color; LOXL1, exfoliation glaucoma.]
Fig. 1. Published odds ratios for discrete traits by reported risk allele frequencies. Labeled SNP-trait associations are those with the highest ORs. Note that the y axis is on the log scale.
www.pnas.org/cgi/doi/10.1073/pnas.0903103106
Hindorff et al.
…tion explained by rare variants, because natural selection should minimize the frequency of deleterious variants in the population [24]. Therefore, for any phenotype, many causal variants will be rare, and the proportion of population-level genetic variance in complex phenotypes attributable to variants across the allele frequency spectrum will depend upon the strength of selection in our evolutionary past. The problem is that this is something that we do not…
…that the power of detection is proportional to pa^2, but it is clear that for each complex trait, variance is contributed from the entire allele frequency spectrum. This highlights the scarcity of low-frequency variants identified by GWAS for quantitative traits and complex disease in humans. Detecting these variants will require a combination of greater sample size, better genotyping, and improved phenotyping.
[Figure (TRENDS in Genetics): (A) absolute effect (SD units) against minor allele frequency (<0.001 to 0.5); (B) odds ratio against risk allele frequency (<0.001 to 1).]
Figure I. For quantitative traits (A), the absolute effect is plotted against the minor allele frequency, whereas for complex common diseases (B), the odds ratio is plotted against the risk allele frequency. Each of the 38 quantitative traits and 43 disease traits are represented by different colors. Abbreviation: SD, standard deviation.
http://dx.doi.org/10.1016/j.tig.2014.02.003
Robinson et al.
[Figure: "Enrichment/depletion analysis after adjusting for ‘hitchhiking’ effects from non-synonymous sites". Odds ratio (log scale, 1 to 10) by annotation set: Non-synonymous sites, Promoters (1kb), Promoters (5kb), 5’UTRs, 3’UTRs, miRTS, Intronic regions, Intergenic regions, Intergenic TFBSs, CpG islands, PReMod sites, ORegAnno elements, EAR regions, MCSs, HARs, PSGs.]
Fig. 2. Odds ratios for TAS block enrichment/depletion analysis after adjusting for ‘‘hitchhiking’’ effects from nonsynonymous sites. Four annotation sets (Splice sites, Validated enhancers, EvoFold elements, and noncoding RNAs) are not represented here because no TAS blocks mapped to these annotation sets. The blue circle represents the point estimate of the odds ratio (OR) and the red lines represent the 95% CI. Possible ‘‘hitchhiking’’ effects from nonsynonymous sites are reduced by discarding any TASP/control SNP in r^2 > 0.6 with a nonsynonymous SNP. For an explanation of the annotation sets on the x axis, we refer the reader to Table S4. Note that the y axis is on the log scale. Nonsynonymous OR computation is not adjusted for ‘‘hitchhiking’’ effects.
www.pnas.org/cgi/doi/10.1073/pnas.0903103106
Hindorff et al.
Observation | Interpretation
Missing H | Lots
Uniform frequencies of “hits” | Common associations exist
Rare hits have larger OR | Rare alleles may have larger effects
Larger OR in genes | Genes matter

Observation | Interpretation
Rare hits have larger OR | Rare alleles may have larger effects. Disease is harmful with respect to fitness (in the evolutionary sense).
Larger OR in genes | Genes matter
[Figure: panel a, frequency of observations (0 to 0.4) against causal variant frequency (0.05 to 1.0); panel b partly visible.]
Figure 3 | Inconsistency between genome-wide association stu…
a | The frequency distribution of risk allele frequencies (shown in ligh…
doi:10.1038/nrg3118
Gibson
[Figure: panel a, frequency of observations (0 to 0.4) against causal variant frequency (0.05 to 1.0); panel b, causal variant frequency (0.005 to 0.020) against common variant frequency (0.1 to 0.5), shaded by odds ratio (2 to > 9).]
Figure 3 | Inconsistency between genome-wide association study results and rare variant expectations.
a | The frequency distribution of risk allele frequencies (shown in light red) for 414 common variant associations with 17 diseases is only slightly skewed towards lower-frequency variants. By contrast, simulations (in this case, assuming up to nine rare causal variants inducing the common variant association with SNPs at the same frequency as observed on common genotyping platforms; light green bars) result in a marked left-skew with a peak for common variants whose frequency is less than 10%. (The skew is even stronger if only a single causal variant is responsible.) The observed data are thus not immediately consistent with the rare variant model.
b | Part of the problem with synthetic associations is that they would explain too much heritability if they were pervasively responsible for common variant effects. This is due to the relationship between allele frequency, maximum possible linkage disequilibrium (LD) and the amount of variance explained [19]. The plot shows the expected odds ratio due to a rare variant of the indicated frequency (from 0.5% to 2%) if it increases the odds ratio at a common SNP (with which it is in maximum possible LD) by 1.1-fold. Intermediate effect sizes (2 < odds ratio < 5) require combined causal variant frequencies in excess of 1%. As the number of rare variants increases, the likelihood that they are in high LD with the common variant also drops, further…
The multiplicative model
$$G = \prod_i (1 + e_i)$$
Risch & colleagues, Pritchard, countless others
The multiplicative model
$$G = \prod_i (1 + e_i)$$
[Figure: contour plot (levels 0.2 to 1.4); x axis: causative mutations on paternal allele (0 to 10); y axis: causative mutations on maternal allele (0 to 10).]
Risch & colleagues, Pritchard, countless others
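A minimal sketch of the multiplicative model on this slide, assuming each causative mutation i contributes a factor (1 + e_i) to the genetic value G regardless of which parental allele carries it. The function name and the example effect size are illustrative assumptions, not part of the talk.

```python
import numpy as np

def multiplicative_genetic_value(effects):
    """G = prod_i (1 + e_i) over all causative mutations in a diploid."""
    # np.prod of an empty array is 1.0, so a mutation-free diploid has G = 1.
    return float(np.prod(1.0 + np.asarray(effects, dtype=float)))

# Hypothetical example: every mutation has the same effect e = 0.1.
e = 0.1
for n_paternal, n_maternal in [(0, 0), (2, 0), (1, 1), (0, 2)]:
    effects = [e] * (n_paternal + n_maternal)
    print(n_paternal, n_maternal, multiplicative_genetic_value(effects))
```

In this sketch (2, 0), (1, 1) and (0, 2) give the same G: only the total number of mutations matters, not how they are split between the paternal and maternal alleles.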
WWHD?
(What would Haldane do?)
Genotype | AA | Aa | aa
Mating frequency | p^2 | 2pq | q^2
Fitness | 1 | 1 - sh | 1 - 2s
$$\hat{q} = \frac{u}{sh}$$
$$\hat{q} \approx \sqrt{\frac{u}{s}} \quad \text{as } h \to 0$$
DOI: 10.1017/S0305004100015644
Haldane
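The mutation-selection balance on this slide can be checked numerically. The sketch below assumes an infinite, randomly mating population, one-way mutation from A to a at rate u, and genotype fitnesses 1, 1 - sh and 1 - 2s for AA, Aa and aa (as read from the slide's table). The function name and parameter values are illustrative.

```python
def equilibrium_q(u, s, h, generations=20000):
    """Iterate selection then mutation until the frequency q of 'a' settles."""
    q = 0.0
    for _ in range(generations):
        p = 1.0 - q
        # Mean fitness under random mating with fitnesses 1, 1 - sh, 1 - 2s.
        w_bar = p * p + 2 * p * q * (1 - s * h) + q * q * (1 - 2 * s)
        # Frequency of 'a' after selection.
        q_sel = (p * q * (1 - s * h) + q * q * (1 - 2 * s)) / w_bar
        # One-way mutation A -> a at rate u.
        q = q_sel + u * (1.0 - q_sel)
    return q

u, s, h = 1e-5, 0.1, 0.5
print(equilibrium_q(u, s, h), u / (s * h))
```

With partial dominance (h well above zero) the iterated value should sit close to u/(sh); the approximation is expected to degrade as h approaches zero, which is the regime the second formula on the slide addresses.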
Mutation at rate u (per gamete per generation)
[Figure: an “A” allele and an “a” allele, with mutations marked by X.]
“a” allele is heterogeneous in its molecular origin
trans-heterozygotes are at risk.
Phenotype has (weak) effect on individual fitness
doi:10.1371/journal.pgen.1003258
Thornton et al.
Effect sizes ~ Exp(λ)
[Figure: density (0 to 7.5) of effect size (0.0 to 0.9).]
h_i = effect of haplotype.
Additive over causative mutations
doi:10.1371/journal.pgen.1003258
Thornton et al.
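As a rough illustration of this slide, the sketch below draws effect sizes from an exponential distribution and sums them to get a haplotype effect. It assumes λ is the mean of the exponential (later slides label the axis "Mean effect size (λ)"); the function name, seed and parameter values are hypothetical.

```python
import numpy as np

def haplotype_effect(n_mutations, lam, rng):
    """Sum of exponentially distributed effects (mean lam) on one haplotype."""
    # NumPy's `scale` parameter is the mean of the exponential distribution.
    effects = rng.exponential(scale=lam, size=n_mutations)
    return float(effects.sum())

rng = np.random.default_rng(2014)
for n in (0, 1, 3):
    print(n, haplotype_effect(n, lam=0.1, rng=rng))
```

A haplotype with no causative mutations has effect 0 in this sketch, which matters once haplotype effects are combined into a diploid genetic value.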
$$G_{ij} = \sqrt{h_i \times h_j}$$
(geometric mean)
[Figure: contour plot (levels 0.05 to 0.4); x axis: causative mutations on paternal allele (0 to 10); y axis: causative mutations on maternal allele (0 to 10).]
$$P_{i,j} = G_{i,j} + N(0, \sigma)$$
$$w = e^{-\frac{(P_{i,j})^2}{2\sigma_S^2}}$$
S
doi:10.1371/journal.pgen.1003258
Thornton et al.
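The three steps on this slide (genetic value as the geometric mean of the two haplotype effects, phenotype as genetic value plus Gaussian noise, and Gaussian fitness) can be sketched as follows. The function names and the numeric values of the noise and selection standard deviations are assumptions chosen for illustration, not values taken from the slides.

```python
import math
import numpy as np

def genetic_value(h_i, h_j):
    """Geometric mean of the two haplotype effects."""
    return math.sqrt(h_i * h_j)

def phenotype(g, sigma, rng):
    """Genetic value plus Gaussian noise with standard deviation sigma."""
    return g + rng.normal(0.0, sigma)

def fitness(p, sigma_s):
    """Gaussian fitness: w = exp(-P^2 / (2 * sigma_s^2))."""
    return math.exp(-(p * p) / (2.0 * sigma_s ** 2))

rng = np.random.default_rng(7)
sigma, sigma_s = 0.075, 1.0   # illustrative values only
for h_i, h_j in [(0.0, 0.0), (0.5, 0.0), (0.2, 0.8)]:
    g = genetic_value(h_i, h_j)
    p = phenotype(g, sigma, rng)
    print(h_i, h_j, g, fitness(p, sigma_s))
```

Because the geometric mean is zero whenever either haplotype is mutation-free, only diploids carrying causative mutations on both haplotypes have a non-zero genetic value here, which matches the earlier remark that trans-heterozygotes are at risk.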
Aside: simulation tools
• C++ library for rapid forward simulation
• Available from https://github.com/molpopgen/fwdpp
• Preprint on arXiv at http://arxiv.org/abs/1401.3786
[Figure: benchmarks against population size (N diploids: 1000, 10000, 50000) for θ = ρ = 100 and θ = ρ = 500. Panels: mean run time (days, log scale) and mean peak memory use (Mb, log scale). Programs: sfs_code, SLiM, fwdpp (gamete-based), fwdpp (individual-based).]
http://arxiv.org/abs/1401.3786
Thornton
[Figure: mean run time (hours) and mean peak memory use (megabytes) against the proportion of new mutations that are deleterious (0.1, 0.5, 1), in panels for 2Nsh = 1, 2Nsh = 10 and 2Nsh = 100. Simulations: fwdpp (gamete-based), fwdpp (individual-based), SLiM.]
http://arxiv.org/abs/1401.3786
Thornton
Selection is weak
[Figure: relative fitness (0.70 to 1.00) against mean effect size (λ, 0.0 to 0.5). Series: population mean fitness; average fitness of a case; average minimum fitness.]
doi:10.1371/journal.pgen.1003258
Thornton et al.
Heritability plateaus
[Figure: broad-sense heritability (0.00 to 0.06) against mean effect size (λ, 0.0 to 0.5).]
doi:10.1371/journal.pgen.1003258
Thornton et al.
Rare alleles
[Figure: proportion (0.0 to 0.4) against derived allele frequency (axis ticks 1, 5, 10); λ = 0.25.]
doi:10.1371/journal.pgen.1003258
Thornton et al.
GWAS have poor power
[Figure: power (0.0 to 0.8) against mean effect size (λ, 0.0 to 0.5). Series: GWAS; GWAS, no recombination; resequencing; resequencing, no recombination.]
doi:10.1371/journal.pgen.1003258
Thornton et al.
Compare model to data…
[Figure: Gibson, Figure 3, panels a and b, repeated from the earlier slide. Here the caption runs slightly further: “… further reducing the probability that they can explain observed common variant association. Suppose that a disease has a…”]
doi:10.1038/nrg3118 doi:10.1371/journal.pbio.1000579
Gibson Wray et al.
…reveals a pretty good fit
doi:10.1371/journal.pbio.1000579
Wray et al.
[Figure: mean number of markers (0 to 10) against MAF of most significant marker in cases (0 to 0.5); n = 36.899; λ = 0.05. Based on simulating imperfect SNP chips.]
“Burden” tests do badly…
[Figure: two panels of power (0.0 to 1.0) against mean effect size (λ, 0.0 to 0.5). One legend: GWAS; GWAS, no recombination; Resequencing; Resequencing, no recombination. The other legend: 50, 100, 200 and 250 markers, each with and without recombination. Panel labels: Madsen and Browning (2009); Li and Leal (2008).]
doi:10.1371/journal.pgen.1003258
Thornton et al.
…because the model is wrong.
[Figure: mean number of causative mutations per diploid (0 to 8) against mean effect size (λ, 0.0 to 0.5). Series: Controls; Cases; Controls (rares); Cases (rares).]
doi:10.1371/journal.pgen.1003258
Thornton et al.
SKAT does ok
[Figure: power (0.0 to 1.0) against mean effect size (λ, 0.0 to 0.5). Series: Resequencing, default weights and optimal p-values; GWAS, default weights and optimal p-values; Resequencing, Madsen-Browning weights and optimal p-values; GWAS, Madsen-Browning weights and optimal p-values.]
doi:10.1371/journal.pgen.1003258
Thornton et al.
Manhattan plots
[Figure: two simulated Manhattan plots of −log10(p) (0 to 15) against position (0 to 100 kbp). Marker classes: Common; Common, causative; Rare; Rare, causative.]
Methods), and excluded 153 individuals on this basis. We next
[Figure: panel a, −log10(P) (0 to 15) against chromosome (1 to 22 and X); panel b, observed test statistic (0 to 100) against expected chi-squared value.]
Figure 2 | Genome-wide picture of geographic variation. a, P values for the 11-d.f. test for difference in SNP allele frequencies between geographical regions, within the 9 collections. SNPs have been excluded using the project quality control filters described in Methods. Green dots indicate SNPs with a P value < 1 × 10^-5. b, Quantile-quantile plots of these test statistics. SNPs at which the test statistic exceeds 100 are represented by triangles at the top of the plot, and the shaded region is the 95% concentration band (see Methods). Also shown in blue is the quantile-quantile plot resulting from removal of all SNPs in the 13 most differentiated regions (Table 1).
NATURE|Vol 447|7 June 2007
doi:10.1371/journal.pgen.1003258
doi:10.1038/nature05911
Burton et al.
Thornton et al.
A new association test
evolutionary interest, genes showing eviden…
particularly interesting for the biology of tra…
eases; possible targets for selection include N…
tase 1) at 11q13, which could have a role in…
well as TLR1 (toll-like receptor 1) at 4p14…
biology of tuberculosis and leprosy has been…
There may be important population st…
captured by current geographical region…
implementations of strongly model-base…
STRUCTURE [11,12] are impracticable for dat…
reverted to the classical method of principa…
subset of 197,175 SNPs chosen to reduce in…
librium. Nevertheless, four of the first si…
clearly picked up effects attributable to loc…
rather than genome-wide structure. The rem…
show the same predominant geographical t…
perhaps unsurprisingly, London is set some…
tary Fig. 8).
The overall effect of population struc…
results seems to be small, once recent…
Europe are excluded. Estimates of over-disp…
trend test statistics (usually denoted λ; ref. 1…
1.05 for RA and T1D, respectively, to 1.08…
[Figure: Burton et al., Figure 2 (panels a and b), repeated from the previous slide.]
$$\mathrm{ESM}_K = \sum_{i=1}^{K} \left( -\log_{10}(p_i) + \log_{10}\frac{i}{K} \right)$$
doi:10.1371/journal.pgen.1003258
Thornton et al.
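A minimal sketch of the ESM statistic, reading it as the summed excess of the observed −log10(p_i) over the value −log10(i/K) expected for the i-th smallest of K uniformly distributed p-values. The function name and the choice to take the K smallest p-values from a longer list are assumptions made for illustration.

```python
import numpy as np

def esm_statistic(pvalues, K):
    """Sum over the K smallest p-values of -log10(p_i) + log10(i / K)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    if K < 1 or K > len(p):
        raise ValueError("K must be between 1 and the number of markers")
    p = p[:K]                       # K most significant markers, ascending
    ranks = np.arange(1, K + 1)     # i = 1..K
    return float(np.sum(-np.log10(p) + np.log10(ranks / K)))

# p-values sitting exactly at their uniform expectation give no excess.
print(esm_statistic([0.25, 0.5, 0.75, 1.0], K=4))
# One strongly significant marker produces a positive excess.
print(esm_statistic([1e-6, 0.5, 0.75, 1.0], K=4))
```

Larger values indicate more significant markers than expected by chance in the region; the statistic has no closed-form null distribution in this sketch.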
ESM is a more powerful test
[Figure: power (0.0 to 1.0) against mean effect size (λ, 0.0 to 0.5). Series: GWAS; GWAS, no recombination; resequencing; resequencing, no recombination.]
(Caveat: requires permutation to get p-values)
doi:10.1371/journal.pgen.1003258
Thornton et al.
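The caveat about permutation can be illustrated with a small sketch: shuffle case/control labels, recompute per-marker p-values and the ESM statistic, and report how often the permuted statistic reaches the observed one. Everything here is an assumption made for illustration: the per-marker test is a plain 1-d.f. allelic chi-square standing in for whatever single-marker test is actually used, the ESM form is the summed excess of −log10(p_i) over −log10(i/K), and all function names are hypothetical.

```python
import math
import numpy as np

def esm_statistic(pvalues, K):
    """Sum over the K smallest p-values of -log10(p_i) + log10(i / K)."""
    p = np.sort(np.asarray(pvalues, dtype=float))[:K]
    ranks = np.arange(1, len(p) + 1)
    return float(np.sum(-np.log10(p) + np.log10(ranks / K)))

def allelic_pvalues(genotypes, is_case):
    """Per-marker 1-d.f. chi-square test on allele counts, cases vs controls."""
    g = np.asarray(genotypes, dtype=float)   # (individuals, markers), 0/1/2 copies
    case = np.asarray(is_case, dtype=bool)
    n_case, n_ctrl = 2.0 * case.sum(), 2.0 * (~case).sum()   # chromosomes
    a_case, a_ctrl = g[case].sum(axis=0), g[~case].sum(axis=0)
    a_tot, n_tot = a_case + a_ctrl, n_case + n_ctrl
    # 2x2 table chi-square: N (ad - bc)^2 / (product of the four margins).
    num = n_tot * (a_case * (n_ctrl - a_ctrl) - a_ctrl * (n_case - a_case)) ** 2
    den = n_case * n_ctrl * a_tot * (n_tot - a_tot)
    chi2 = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    # Upper tail of chi-square with 1 d.f. is erfc(sqrt(x / 2)).
    p = np.array([math.erfc(math.sqrt(x / 2.0)) for x in chi2])
    return np.clip(p, 1e-300, 1.0)           # avoid log10(0)

def esm_permutation_pvalue(genotypes, is_case, K, n_perm=1000, seed=0):
    """Permutation p-value for ESM_K obtained by shuffling case/control labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(is_case, dtype=bool)
    observed = esm_statistic(allelic_pvalues(genotypes, labels), K)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        if esm_statistic(allelic_pvalues(genotypes, permuted), K) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Permuting labels rather than genotypes keeps the linkage disequilibrium among markers intact in every replicate, which is the reason a permutation scheme is a natural fit for a statistic that pools correlated markers.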
Running ESM on real data
• We think we can implement ESM using a mix of the
PLINK toolkit plus some custom programs.
• We need data to test it out on.
• There are very few modern GWAS available for
reanalysis.
• Lack of data sharing hurts the field.
Rare alleles and missing
heritability
• Current tests are underpowered
• Heterogeneity means that GWAS “hits” tag few
causative mutations
• Causative mutations that are tagged tend to be
(relatively) common. These “common” mutations
have effect sizes much smaller than the typical
causative mutation that segregates
[Figure: mean number of causative singletons per individual (0.0000 to 0.0030) against number of copies of derived allele at focal SNP (0, 1, 2), in ten panels labeled 0.010, 0.025, 0.050, 0.075, 0.100, 0.125, 0.175, 0.250, 0.350 and 0.500. Focal SNP: most significant marker or unassociated SNP.]
doi:10.1371/journal.pgen.1003258
Thornton et al.
Population growth
[Figure: population size against time, from past to present.]
H^2 insensitive to growth
[Figure: mean broad-sense heritability (0.01 to 0.04) against average effect size of new mutation (0.0 to 0.5). Models: constant; growth.]
Unpublished
Consistent with recent
findings from other groups
But despite these substantial shifts in the overall frequency spectrum, the impact on genetic load, namely the mean number of deleterious variants per individual and thus the average fitness, is much more subtle. In the semidominant case, the individual burden is essentially unaffected by these demographic events (Fig. 1c,d). With growth, the increased number of segregating sites is balanced exactly by a decrease in the mean frequency (with the converse being true for the bottleneck model) so that the number of variants per individual stays constant. This kind of balance is predicted by classic mutation-selection balance models [18] and can be shown to hold for general changes in population size, provided that selection is strong and deleterious alleles are at least partially dominant (Supplementary Note).
The behavior of the recessive model is more complicated (Fig. 1e,f). In the bottle…
[Figure: panels a to f. Population size against time (generations) for the bottleneck and growth models; numbers per MB of segregating sites, rare segregating sites, deleterious alleles per individual (load) and rare deleterious alleles per individual under the semidominant and recessive models; for the recessive model, load is also shown as the number of homozygous sites per individual.]
Figure 1 Time course of load and other key aspects of variation through a bottleneck and exponential growth. (a,b) The bottleneck (a) and exponential growth (b). (c–f) The expected number of variants and alleles per MB assuming semidominant mutations (c,d) or recessive mutations (e,f) with s = 1% and a mutation rate per site per generation of 10^-8.
Simons et al.
doi:10.1038/ng.2896
Power is affected
[Figure: two panels. Frequency in population (0.00 to 0.08) against effect size of segregating causative mutation (0.000 to 0.100), for the constant and growth models. Power (0.0 to 0.8) against mean effect size of causative mutation (0.0 to 0.5), for the statistics ESM50, Logit and SKAT under the constant and growth models.]
Unpublished
Excellent fit to empirical
data
[Figure: number of markers (0 to 14) against frequency of most-associated marker (0.0 to 1.0).]
Unpublished
Implications
• Power to detect regions with modest effects on risk
(4-5% contribution to broad-sense heritability) is
very low in growing populations
• The explanatory power of simple models is
probably far from exhausted
Implications
• Much more likely to detect loci
with mutations of modest
effect
• Underlying distribution of
mean effect size across loci is
completely unknown in any
system
[Figure: power (0.0 to 0.8) against mean effect size of causative mutation (0.0 to 0.5), for the statistics ESM50, Logit and SKAT under the constant and growth models.]
Unpublished
Future work
• Multilocus models with epistasis
• Machine learning approaches: do they work?
• Develop new simulation tools
• Make simulation output available
• Implement ESM test for analyzing real GWAS data
Other work in the lab
• Copy number variation in Drosophila: doi: 10.1093/molbev/msu124
• Detecting TE insertions using paired-end data in Drosophila: doi: 10.1093/molbev/mst129
• Modeling experimental evolution: doi: 10.1093/molbev/msu048
• Structural variation and variation in gene
expression

Genetic factors associated with periodontiumGenetic factors associated with periodontium
Genetic factors associated with periodontiumDR. OINAM MONICA DEVI
 
Human Genetic Variation poster
Human Genetic Variation posterHuman Genetic Variation poster
Human Genetic Variation posterRihan Islam
 
Big data and the exposome, Oregon State 040616
Big data and the exposome, Oregon State 040616Big data and the exposome, Oregon State 040616
Big data and the exposome, Oregon State 040616Chirag Patel
 
CKD and Genetics 2015
CKD and Genetics 2015CKD and Genetics 2015
CKD and Genetics 2015Meguid Nahas
 
GENETICS AND PERIODONTAL DISEASES
GENETICS AND PERIODONTAL DISEASES GENETICS AND PERIODONTAL DISEASES
GENETICS AND PERIODONTAL DISEASES Danish Hamid
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyDeepak Kumar
 
Biomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresBiomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresChirag Patel
 
Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryChirag Patel
 
Improved sensitivit
Improved sensitivitImproved sensitivit
Improved sensitivitt7260678
 
Systemic lupus erythematosusssss (2).pptx
Systemic lupus erythematosusssss (2).pptxSystemic lupus erythematosusssss (2).pptx
Systemic lupus erythematosusssss (2).pptxJuan Diego
 
A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicJoaquin Dopazo
 
thesis_final dhwani.docx
thesis_final dhwani.docxthesis_final dhwani.docx
thesis_final dhwani.docxssuser1e2788
 

Semelhante a Simulating Genes in Genome-wide Association Studies (20)

Schizophrenia
SchizophreniaSchizophrenia
Schizophrenia
 
Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701 Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701
 
Genetics In Psychiatry
Genetics In PsychiatryGenetics In Psychiatry
Genetics In Psychiatry
 
Genetic factors associated with periodontium
Genetic factors associated with periodontiumGenetic factors associated with periodontium
Genetic factors associated with periodontium
 
Human Genetic Variation poster
Human Genetic Variation posterHuman Genetic Variation poster
Human Genetic Variation poster
 
Genetic factors
Genetic factorsGenetic factors
Genetic factors
 
Big data and the exposome, Oregon State 040616
Big data and the exposome, Oregon State 040616Big data and the exposome, Oregon State 040616
Big data and the exposome, Oregon State 040616
 
CKD and Genetics 2015
CKD and Genetics 2015CKD and Genetics 2015
CKD and Genetics 2015
 
Cjn.05780616.full lupus
Cjn.05780616.full  lupusCjn.05780616.full  lupus
Cjn.05780616.full lupus
 
GENETICS AND PERIODONTAL DISEASES
GENETICS AND PERIODONTAL DISEASES GENETICS AND PERIODONTAL DISEASES
GENETICS AND PERIODONTAL DISEASES
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacology
 
Genetics in Psychiatry
Genetics in PsychiatryGenetics in Psychiatry
Genetics in Psychiatry
 
Biomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposuresBiomedical Informatics 706: Precision Medicine with exposures
Biomedical Informatics 706: Precision Medicine with exposures
 
Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discovery
 
Improved sensitivit
Improved sensitivitImproved sensitivit
Improved sensitivit
 
Systemic lupus erythematosusssss (2).pptx
Systemic lupus erythematosusssss (2).pptxSystemic lupus erythematosusssss (2).pptx
Systemic lupus erythematosusssss (2).pptx
 
GWAS Study.pdf
GWAS Study.pdfGWAS Study.pdf
GWAS Study.pdf
 
A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The Clinic
 
thesis_final dhwani.docx
thesis_final dhwani.docxthesis_final dhwani.docx
thesis_final dhwani.docx
 
DOOR syndrome
DOOR syndromeDOOR syndrome
DOOR syndrome
 

Último

DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 

Último (20)

DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 

Simulating Genes in Genome-wide Association Studies

  • 1. Simulating Genes in GWAS. Kevin R. Thornton, Ecology and Evolutionary Biology, UC Irvine. Slides will be available at http://www.slideshare.net/molpopgen and http://www.molpopgen.org
  • 3. Several genomic regions have been implicated in linkage studies and, recently, replicated evidence implicating specific genes has been reported. Increasing evidence suggests an overlap in genetic susceptibility with schizophrenia, a psychotic disorder with many similarities to BD. In particular association findings have been reported with [...] expanded reference group analysis (Supplementary Table 9), it is of interest that the closest gene to the signal at rs1526805 (P = 2.2 × 10^-7) is KCNC2 which encodes the Shaw-related voltage-gated potassium channel. Ion channelopathies are well-recognized as causes of episodic central nervous system disease, including seizures, ataxias [...]
    [Figure: seven panels of −log10(P) (0 to 15) against chromosome (1 to 22 and X), for type 2 diabetes, coronary artery disease, Crohn’s disease, hypertension, rheumatoid arthritis, type 1 diabetes and bipolar disorder.]
    Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases, −log10 of the trend test P value for quality-control-positive SNPs, excluding those in each disease that were excluded for having poor clustering after visual inspection, are plotted against position on each chromosome. Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10^-5 highlighted in green. All panels are truncated at −log10(P value) = 15, although some markers (for example, in the MHC in T1D and RA) exceed this significance threshold.
    doi:10.1038/nature05911 Burton et al.
  • 4. [...] the differences observed in their allelic architecture. Some apparent differences may simply be due to differences in the stage of investigation across traits. Studies in several conditions have clearly demonstrated that the number of detected variants increases with increasing sample size (refs 22–24). Population genetic theory suggests an explanation for the paucity of variants explaining a large proportion of disease predisposition, in that decreased reproductive fitness should typically act to reduce the frequencies of high-risk variants. This might explain the relative lack of variants detected so far for some neuropsychiatric conditions, such as autism spectrum disorders, given their low reproductive fitness (ref. 25). Yet for a condition such as type 1 diabetes, which has a similar prevalence, familial risk, early onset and poor reproductive fitness (at [...] yielded intriguing new variants (refs 33, 34). Studies of populations of recent African ancestry in particular is likely to increase the yield of rare variants and narrow the large chromosomal regions of association identified in the ‘younger’ population due to extended linkage disequilibrium, or the tendency for adjacent genetic loci to be inherited together (ref. 31). Isolated populations may also be of value given their potential to be enriched in unique variants (ref. 35). The accuracy of current heritability estimates is also important, because experimentally identified variants could never explain all the variance in an erroneously inflated heritability estimate. Heritability of quantitative traits, formally defined as the proportion of phenotypic variance in a population attributable to additive genetic factors (narrow-sense heritability, h^2 (ref. 36)) is typically estimated from [...]
    Table 1 | Estimates of heritability and number of loci for several complex traits
    Disease | Number of loci | Proportion of heritability explained | Heritability measure
    Age-related macular degeneration (ref. 72) | 5 | 50% | Sibling recurrence risk
    Crohn’s disease (ref. 21) | 32 | 20% | Genetic risk (liability)
    Systemic lupus erythematosus (ref. 73) | 6 | 15% | Sibling recurrence risk
    Type 2 diabetes (ref. 74) | 18 | 6% | Sibling recurrence risk
    HDL cholesterol (ref. 75) | 7 | 5.2% | Residual* phenotypic variance
    Height (ref. 15) | 40 | 5% | Phenotypic variance
    Early onset myocardial infarction (ref. 76) | 9 | 2.8% | Phenotypic variance
    Fasting glucose (ref. 77) | 4 | 1.5% | Phenotypic variance
    * Residual is after adjustment for age, gender, diabetes.
    doi:10.1038/nature08494 Manolio et al.
  • 5. NHGRI GWA Catalog (www.genome.gov/GWAStudies, www.ebi.ac.uk/fgpt/gwas/). Published Genome-Wide Associations through 12/2012, at p ≤ 5 × 10^-8, for 17 trait categories.
  • 7. Unsurprisingly, since the GWAS method is primarily powered [...] common alleles, risk allele frequencies were well above 5% [...] all TASPs (reported index TASs with an association p valu[...] 5.0 × 10^-8 and all HapMap phase II CEU SNPs in LD [r^2 > 0[...]
    [Figure: odds ratio (log scale) against reported risk allele frequency, %. Labeled points: OCA2, eye color; MC1R, hair color; LOXL1, exfoliation glaucoma.]
    [...] 1. Published odds ratios for discrete traits by reported risk allele frequencies. Labeled SNP-trait associations are those with the highest ORs. Note tha[...]is is on the log scale.
    www.pnas.org/cgi/doi/10.1073/pnas.0903103106 Hindorff et al.
  • 8. [...]tion explained by rare variants, because natural selection should minimize the frequency of deleterious variants in the population [24]. Therefore, for any phenotype, many causal variants will be rare, and the proportion of population-level genetic variance in complex phenotypes attributable to variants across the allele frequency spectrum will depend upon the strength of selection in our evolutionary past. The problem is that this is something that we do not [...] that the power of detection is proportional to pa^2, but it is clear that for each complex trait, variance is contributed from the entire allele frequency spectrum. This highlights the scarcity of low-frequency variants identified by GWAS for quantitative traits and complex disease in humans. Detecting these variants will require a combination of greater sample size, better genotyping, and improved phenotyping.
    [Figure: (A) absolute effect (SD units) against minor allele frequency; (B) odds ratio against risk allele frequency. Frequency axes run from <0.001 to 0.5. Source: TRENDS in Genetics.]
    Figure I. For quantitative traits (A), the absolute effect is plotted against the minor allele frequency, whereas for complex common diseases (B), the odds ratio is plotted against the risk allele frequency. Each of the 38 quantitative traits and 43 disease traits are represented by different colors. Abbreviation: SD, standard deviation.
    http://dx.doi.org/10.1016/j.tig.2014.02.003 Robinson et al.
  • 9. [Figure: enrichment/depletion analysis after adjusting for ‘hitchhiking’ effects from non-synonymous sites. y axis: odds ratio (1 to 10, log scale). x axis: annotation set (non-synonymous sites, promoters (1 kb), promoters (5 kb), 5' UTRs, 3' UTRs, miRTS, intronic regions, intergenic regions, intergenic TFBSs, CpG islands, PReMod sites, ORegAnno elements, EAR regions, MCSs, HARs, PSGs).]
    Fig. 2. Odds ratios for TAS block enrichment/depletion analysis after adjusting for “hitchhiking” effects from nonsynonymous sites. Four annotation sets (Splice sites, Validated enhancers, EvoFold elements, and noncoding RNAs) are not represented here because no TAS blocks mapped to these annotation sets. The blue circle represents the point estimate of the odds ratio (OR) and the red lines represent the 95% CI. Possible “hitchhiking” effects from nonsynonymous sites are reduced by discarding any TASP/control SNP in r^2 > 0.6 with a nonsynonymous SNP. For an explanation of the annotation sets on the x axis, we refer the reader to Table S4. Note that the y axis is on the log scale. Nonsynonymous OR computation is not adjusted for “hitchhiking” effects.
    www.pnas.org/cgi/doi/10.1073/pnas.0903103106 Hindorff et al.
  • 10. Observation | Interpretation
    Missing H | Lots
    Uniform frequencies of “hits” | Common associations exist
    Rare hits have larger OR | Rare alleles may have larger effects
    Larger OR in genes | Genes matter
  • 11. Observation | Interpretation
    Rare hits have larger OR | Rare alleles may have larger effects
    Disease is harmful with respect to fitness (in the evolutionary sense).
    Larger OR in genes | Genes matter
  • 12. [Figure: panels a and b; axis labels “Frequency of observations” and “Causal variant frequency”.]
    Figure 3 | Inconsistency between genome-wide association stu[...] a | The frequency distribution of risk allele frequencies (shown in ligh[...]
    doi:10.1038/nrg3118 Gibson
  • 13. [Figure: panels a and b; axis labels “Frequency of observations”, “Causal variant frequency” and “Odds ratio” (2 to > 9).]
    Figure 3 | Inconsistency between genome-wide association study results and rare variant expectations. a | The frequency distribution of risk allele frequencies (shown in light red) for 414 common variant associations with 17 diseases is only slightly skewed towards lower-frequency variants. By contrast, simulations, in this case assuming up to nine rare causal variants inducing the common variant association with SNPs at the same frequency as observed on common genotyping platforms (light green bars), result in a marked left-skew with a peak for common variants whose frequency is less than 10%. (The skew is even stronger if only a single causal variant is responsible.) The observed data are thus not immediately consistent with the rare variant model. b | Part of the problem with synthetic associations is that they would explain too much heritability if they were pervasively responsible for common variant effects. This is due to the relationship between allele frequency, maximum possible linkage disequilibrium (LD) and the amount of variance explained (ref. 19). The plot shows the expected odds ratio due to a rare variant of the indicated frequency (from 0.5% to 2%) if it increases the odds ratio at a common SNP (with which it is in maximum possible LD) by 1.1-fold. Intermediate effect sizes (2 < odds ratio < 5) require combined causal variant frequencies in excess of 1%. As the number of rare variants increases, the likelihood that they are in high LD with the common variant also drops, further [...]
    The multiplicative model: $G = \prod_i (1 + e_i)$
    Risch & colleagues, Pritchard, countless others
  • 14. The multiplicative model: $G = \prod_i (1 + e_i)$
    [Figure: heat map of G for 0 to 10 causative mutations on the paternal allele against 0 to 10 causative mutations on the maternal allele; colour scale from 0.2 to 1.4.]
    Risch & colleagues, Pritchard, countless others
  • 15. WWHD? (What would Haldane do?)
    Genotype | AA | Aa | aa
    Mating frequency | $p^2$ | $2pq$ | $q^2$
    Fitness | $1$ | $1 - sh$ | $1 - 2s$
    $\hat{q} = \frac{u}{sh}$; $\hat{q} \approx \sqrt{\frac{u}{s}}$ as $h \to 0$
    DOI: 10.1017/S0305004100015644 Haldane
  • 16. Mutation at rate u (per gamete per generation). [Diagram: an “A” allele and “a” alleles, with mutations marked X.] The “a” allele is heterogeneous in its molecular origin; trans-heterozygotes are at risk. Phenotype has (weak) effect on individual fitness. doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 17. Effect sizes $\sim \mathrm{Exp}(\lambda)$. [Figure: density against effect size.] $h_i$ = effect of haplotype, additive over causative mutations. doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 18. $G_{ij} = \sqrt{h_i \times h_j}$ (geometric mean)
    [Figure: heat map of G for 0 to 10 causative mutations on the paternal allele against 0 to 10 causative mutations on the maternal allele; colour scale from 0.05 to 0.4.]
    $P_{i,j} = G_{i,j} + N(0, \sigma)$
    $w = e^{-\frac{(P_{i,j})^2}{2\sigma_S^2}}$
    doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 19. Aside: simulation tools
      • C++ library for rapid forward simulation
      • Available from https://github.com/molpopgen/fwdpp
      • Preprint on arXiv at http://arxiv.org/abs/1401.3786
  • 20. [Figure: mean run time (days) and mean peak memory use (Mb) against population size (N diploids: 1000, 10000, 50000), for θ = ρ = 100 and θ = ρ = 500. Programs compared: sfs_code, SLiM, fwdpp (gamete-based), fwdpp (individual-based).] http://arxiv.org/abs/1401.3786 Thornton
  • 21. [Figure: mean run time (hours) and mean peak memory use (megabytes) against the proportion of new mutations that are deleterious (0.1, 0.5, 1), in panels for 2Nsh = 1, 2Nsh = 10 and 2Nsh = 100. Simulations compared: fwdpp (gamete-based), fwdpp (individual-based), SLiM.] http://arxiv.org/abs/1401.3786 Thornton
  • 22. Selection is weak. [Figure: relative fitness (0.70 to 1.00) against mean effect size (λ, 0.0 to 0.5). Series: population mean fitness, average fitness of a case, average minimum fitness.] doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 23. Heritability plateaus. [Figure: broad-sense heritability (0.00 to 0.06) against mean effect size (λ, 0.0 to 0.5).] doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 24. Rare alleles. [Figure: proportion (0.0 to 0.4) against derived allele frequency, for λ = 0.25.] doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 25. GWAS have poor power. [Figure: power (0.0 to 0.8) against mean effect size (λ, 0.0 to 0.5). Series: GWAS; GWAS, no recombination; resequencing; resequencing, no recombination.] doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 26. Compare model to data…
    [Figure: panels a and b; axis labels “Frequency of observations”, “Causal variant frequency” and “Odds ratio” (2 to > 9).]
    Figure 3 | Inconsistency between genome-wide association study results and rare variant expectations. a | The frequency distribution of risk allele frequencies (shown in light red) for 414 common variant associations with 17 diseases is only slightly skewed towards lower-frequency variants. By contrast, simulations, in this case assuming up to nine rare causal variants inducing the common variant association with SNPs at the same frequency as observed on common genotyping platforms (light green bars), result in a marked left-skew with a peak for common variants whose frequency is less than 10%. (The skew is even stronger if only a single causal variant is responsible.) The observed data are thus not immediately consistent with the rare variant model. b | Part of the problem with synthetic associations is that they would explain too much heritability if they were pervasively responsible for common variant effects. This is due to the relationship between allele frequency, maximum possible linkage disequilibrium (LD) and the amount of variance explained (ref. 19). The plot shows the expected odds ratio due to a rare variant of the indicated frequency (from 0.5% to 2%) if it increases the odds ratio at a common SNP (with which it is in maximum possible LD) by 1.1-fold. Intermediate effect sizes (2 < odds ratio < 5) require combined causal variant frequencies in excess of 1%. As the number of rare variants increases, the likelihood that they are in high LD with the common variant also drops, further reducing the probability that they can explain observed common variant association.
    doi:10.1038/nrg3118 Gibson; doi:10.1371/journal.pbio.1000579 Wray et al.
  • 27. …reveals a pretty good fit. [Figure: mean number of markers (0 to 10) against MAF of most significant marker (in cases) (0 to 0.5); annotations: n = 36.899, λ = 0.05.] (Based on simulating imperfect SNP chips.) doi:10.1371/journal.pbio.1000579 Wray et al.
  • 28. “Burden” tests do badly…
    [Figure: two panels of power (0.0 to 1.0) against mean effect size (λ, 0.0 to 0.5). Series in one panel: GWAS; GWAS, no recombination; resequencing; resequencing, no recombination. Series in the other panel: 50, 100, 200 and 250 markers, each with and without recombination. Panel labels: Madsen and Browning (2009); Li and Leal (2008).]
    doi:10.1371/journal.pgen.1003258 Thornton et al.
  • 29. …because the model is wrong. [Figure; x-axis: Mean effect size (λ); y-axis: Mean number of causative mutations per diploid; legend: Controls; Cases; Controls (rares); Cases (rares).] Source: Thornton et al., doi:10.1371/journal.pgen.1003258
  • 30. SKAT does ok [Figure; x-axis: Mean effect size (λ); y-axis: Power; legend: Resequencing, default weights and optimal p-values; GWAS, default weights and optimal p-values; Resequencing, Madsen-Browning weights and optimal p-values; GWAS, Madsen-Browning weights and optimal p-values.] Source: Thornton et al., doi:10.1371/journal.pgen.1003258
  • 31. Manhattan plots [Figure, two panels; x-axis: Position (kbp); y-axis: −log10(p); legend: Common; Common, causative; Rare; Rare, causative. Shown next to Figure 2 of Burton et al., Nature, Vol 447, 7 June 2007; axes there: −log10(P) against Chromosome (a) and Observed test statistic against Expected chi-squared value (b).] Figure 2 | Genome-wide picture of geographic variation. a, P values for the 11-d.f. test for difference in SNP allele frequencies between geographical regions, within the 9 collections. SNPs have been excluded using the project quality control filters described in Methods. Green dots indicate SNPs with a P value < 1 × 10⁻⁵. b, Quantile-quantile plots of these test statistics. SNPs at which the test statistic exceeds 100 are represented by triangles at the top of the plot, and the shaded region is the 95% concentration band (see Methods). Also shown in blue is the quantile-quantile plot resulting from removal of all SNPs in the 13 most differentiated regions (Table 1). Sources: Burton et al., doi:10.1038/nature05911; Thornton et al., doi:10.1371/journal.pgen.1003258
  • 32. A new association test $\mathrm{ESM}_K = \sum_{i=1}^{K} \left( -\log_{10}(p_i) + \log_{10}\frac{i}{K} \right)$ [Background image: cropped excerpt and Figure 2 (Genome-wide picture of geographic variation) from Burton et al., Nature, Vol 447, 7 June 2007.] Source: Thornton et al., doi:10.1371/journal.pgen.1003258
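A minimal sketch of how the ESM statistic on this slide could be computed from per-marker p-values. It rests on assumptions the slide does not spell out: the p-values are sorted in ascending order, so that p_i is the i-th smallest; the first term is −log10(p_i); and K defaults to the number of markers supplied. The function name `esm_statistic` is hypothetical and not taken from the authors' software.

```python
import math


def esm_statistic(p_values, k=None):
    """Sum over i = 1..K of -log10(p_(i)) + log10(i / K).

    Assumptions (not confirmed by the slide): p_(i) is the i-th smallest
    p-value, and K defaults to the total number of markers.
    """
    ordered = sorted(p_values)  # ascending: most significant marker first
    if k is None:
        k = len(ordered)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of markers")
    if ordered[0] <= 0.0:
        raise ValueError("p-values must be strictly positive")
    # Each term compares an observed -log10(p) with the value expected
    # if the i-th smallest p-value were exactly i / K.
    return sum(
        -math.log10(p) + math.log10(i / k)
        for i, p in enumerate(ordered[:k], start=1)
    )


if __name__ == "__main__":
    print(esm_statistic([0.001, 0.01]))
```

Under these assumptions the statistic is zero when the i-th smallest p-value equals i/K, and grows as markers become more significant than that.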
  • 33. ESM is a more powerful test [Figure; x-axis: Mean effect size (λ); y-axis: Power; legend: GWAS; GWAS, no recombination; resequencing; resequencing, no recombination.] (Caveat: requires permutation to get p-values) Source: Thornton et al., doi:10.1371/journal.pgen.1003258
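The permutation caveat can be illustrated with a small sketch: shuffle the case/control labels, recompute the statistic, and report the fraction of permutations at least as extreme as the observed value. Everything here is an assumption made for illustration rather than the authors' procedure: the single-marker test is a plain allelic chi-squared test with 1 d.f., the statistic uses all markers with ascending p-values, and the names `marker_p_value` and `esm_permutation_p` are hypothetical.

```python
import math
import random


def esm_statistic(p_values):
    # Assumed form: sum of -log10(p_(i)) + log10(i / K) over ascending p-values.
    ordered = sorted(p_values)
    k = len(ordered)
    return sum(-math.log10(p) + math.log10(i / k)
               for i, p in enumerate(ordered, start=1))


def marker_p_value(genotypes, labels):
    """Allelic chi-squared test (1 d.f.) for one marker.

    genotypes: allele counts (0, 1 or 2) per individual.
    labels: 1 for cases, 0 for controls.
    """
    case_alt = sum(g for g, y in zip(genotypes, labels) if y == 1)
    ctrl_alt = sum(g for g, y in zip(genotypes, labels) if y == 0)
    n_case = 2 * sum(labels)
    n_ctrl = 2 * (len(labels) - sum(labels))
    table = [[case_alt, n_case - case_alt], [ctrl_alt, n_ctrl - ctrl_alt]]
    rows = [n_case, n_ctrl]
    cols = [case_alt + ctrl_alt, (n_case - case_alt) + (n_ctrl - ctrl_alt)]
    total = n_case + n_ctrl
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = rows[r] * cols[c] / total
            if expected == 0:
                return 1.0  # monomorphic marker or empty group: no evidence
            chi2 += (table[r][c] - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return max(p, 1e-300)  # guard against log10(0)


def esm_permutation_p(markers, labels, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)

    def stat(current_labels):
        return esm_statistic([marker_p_value(m, current_labels) for m in markers])

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy_labels = [1] * 20 + [0] * 20
    toy_markers = [[2] * 20 + [0] * 20 for _ in range(3)]
    print(esm_permutation_p(toy_markers, toy_labels, n_perm=200))
```

Permuting whole label vectors, rather than each marker separately, preserves the correlation among markers in a region, which is one reason a permutation approach suits a multi-marker statistic.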
  • 34. Running ESM on real data • We think we can implement ESM using a mix of the PLINK toolkit plus some custom programs. • We need data to test it out on. • There are very few modern GWAS available for reanalysis. • Lack of data sharing hurts the field.
  • 35. Rare alleles and missing heritability • Current tests are underpowered • Heterogeneity means that GWAS “hits” tag few causative mutations • Causative mutations that are tagged tend to be (relatively) common. These “common” mutations have effect sizes much smaller than the typical causative mutation that segregates
  • 36. [Figure, ten panels labelled 0.010, 0.025, 0.050, 0.075, 0.100, 0.125, 0.175, 0.250, 0.350 and 0.500; x-axis: Number of copies of derived allele at focal SNP (0, 1, 2); y-axis: Mean number of causative singletons per individual; legend (Focal SNP): Most significant marker; Unassociated SNP.] Source: Thornton et al., doi:10.1371/journal.pgen.1003258
  • 38. H^2 insensitive to growth [Figure; x-axis: Average effect size of new mutation; y-axis: Mean broad-sense heritability; legend (model): constant; growth.] Unpublished
  • 39. Consistent with recent findings from other groups [Cropped excerpt from Simons et al.:] …despite these substantial shifts in the overall frequency spectrum, the impact on genetic load (namely, the mean number of deleterious variants per individual and thus average fitness) is much more subtle. In the semidominant case, the individual burden is essentially unaffected by these demographic events (Fig. 1c,d). With growth, the increased number of segregating sites is balanced exactly by a decrease in the mean frequency (with the converse being true for the bottleneck model) so that the number of variants per individual stays constant. This kind of balance is predicted by classic mutation-selection balance models and can be shown to hold for general changes in population size, provided that selection is strong and deleterious alleles are at least partially dominant (Supplementary Note). The behavior of the recessive model is more complicated (Fig. 1e,f). [Figure, panels a–f; x-axes: Time since beginning of bottleneck (generations) and Time since beginning of growth (generations); y-axes: Population size (a,b) and Number per MB (c–f); rows: Semidominant, Recessive; plotted series include number of segregating sites, number of rare segregating sites, number of deleterious alleles per individual, number of rare deleterious alleles per individual, and load (number of deleterious alleles or of homozygous sites per individual).] Figure 1 Time course of load and other key aspects of variation through a bottleneck and exponential growth. (a,b) The bottleneck (a) and exponential growth (b). (c–f) The expected number of variants and alleles per MB assuming semidominant mutations (c,d) or recessive mutations (e,f) with s = 1% and a mutation rate per site per generation of 10⁻⁸. Source: Simons et al., doi:10.1038/ng.2896
  • 40. Power is affected [Figure, two panels. First panel: x-axis: Effect size of segregating causative mutation; y-axis: Frequency in population; legend (Model): Constant; Growth. Second panel: x-axis: Mean effect size of causative mutation; y-axis: Power; legend (Statistic): ESM50; Logit; SKAT; (Model): Constant; Growth.] Unpublished
  • 41. Excellent fit to empirical data [Figure; x-axis: Frequency of most-associated marker; y-axis: No. markers.] Unpublished
  • 42. Implications • Power to detect regions with modest effects on risk (4-5% contribution to broad-sense heritability) is very low in growing populations • The explanatory power of simple models is probably far from exhausted
  • 43. Implications • Much more likely to detect loci with mutations of modest effect • Underlying distribution of mean effect size across loci is completely unknown in any system [Figure; x-axis: Mean effect size of causative mutation; y-axis: Power; legend (Statistic): ESM50; Logit; SKAT; (Model): Constant; Growth.] Unpublished
  • 44. Future work • Multilocus models with epistasis • Machine learning approaches: do they work? • Develop new simulation tools • Make simulation output available • Implement ESM test for analyzing real GWAS data
  • 45. Other work in the lab • Copy number variation in Drosophila: doi:10.1093/molbev/msu124 • Detecting TE insertions using paired-end data in Drosophila: doi:10.1093/molbev/mst129 • Modeling experimental evolution: doi:10.1093/molbev/msu048 • Structural variation and variation in gene expression