University of Agricultural Sciences
Department of Genetics and Plant Breeding
GKVK, Bangalore-65
PG Seminar: GPB 581(0+1)
On
Genome wide association studies.
Submitted By: Varsha Gayatonde
Sr.MSc, PALB 2235
Dept. of Genetics & Plant Breeding
UAS, GKVK, Bangalore
Submitted To: Dr. R.Nandini
Associate Professor
Dept. of Genetics & Plant Breeding
UAS, GKVK, Bangalore
Contents
Sl. No.  Contents
1   Introduction to mapping
2   Terminologies
3   Brief history
4   Association mapping
5   Comparison of GWAS and biparental mapping
6   Concept of linkage disequilibrium (LD)
7   Factors affecting LD and its use in plant systems
8   Genome-wide association studies
9   Methodologies
10  Challenges while conducting GWAS
11  Advantages and disadvantages
12  GWAS studies in Arabidopsis
13  Studies on rice
14  Maize smut studies
15  GWAS studies on MYB-related traits and in other crops
16  Current association challenges
18  Conclusion
19  References
DEPARTMENT OF GENETICS AND PLANT BREEDING
GKVK, UNIVERSITY OF AGRICULTURAL SCIENCES, BANGALORE – 560 065
First Seminar: GPB 581 (0+1)
GENOME WIDE ASSOCIATION STUDIES
Synopsis
Genome-wide association is a study design in which many markers spread across the genome are genotyped, and statistical tests of association with a phenotype are performed locally along the genome. It is also an examination of many common genetic variants in different individuals to see whether any variant is associated with a trait.
The first prospects for whole-genome association studies emerged in early 2002¹. This LD-based association mapping started in human beings and was later applied in Arabidopsis, rice, grapevine, wheat, soybean, maize, tomato and other model organisms. 'HapMap', the multi-country effort to identify and catalog common human genetic variants, marked a milestone in extending the approach to other organisms and in making GWAS powerful. SNPs need to be chosen so that they are widely distributed and reflect the genetic variation. Selection of suitable markers enables fine mapping², and genome-wide chips have increased marker coverage, improving the power of association signals. However, this does not necessarily imply increased power to detect associated loci. Other drawbacks are the need for a large population size, the pooling and cost of preparing DNA samples, and limited knowledge about the risk of a trait.
To overcome these drawbacks, researchers have recently upgraded statistical approaches, improved genotype imputation and developed advanced approaches such as nested association mapping, the candidate gene association approach and web applications for GWAS (GWAPP in Arabidopsis)³. Despite its drawbacks, GWAS remains popular because of falling genotyping costs, which are likely to drive association studies away from candidate-gene-based studies. This will likely involve whole-genome resequencing of all the individuals in a population, which will allow assessment of the effects of point mutations, insertions, deletions and large structural variation.
Many studies worldwide have used GWAS as a tool to study different factors, such as the effect of temperature on cobs, agronomic variants, agroclimatic diversity, flowering and grain yield traits, and disease diversity. A major benefit of GWAS is that one-time genotyping combined with repeated phenotyping in different environmental conditions allows n number of traits to be studied within a short period over a large area. With the rapid development of high-throughput sequencing technology, population choice for GWAS will no longer be restricted to current model organisms and will increasingly be focused on the species most relevant for answering biological questions.
References:
1. Ozoki, K., 2001. A high throughput SNP typing system for GWAS. Springer, 16: 1134-1137.
2. Tohn, P. A., 2009. Validating and refining GWAS signals. Nature, 10: 318-329.
Name: VARSHA
ID: PALB 2235
Date: 29/10/2013
Time: 10:00 AM
Introduction
The level of genetic diversity is pivotal for world food security and for the survival of human civilization on earth. Domestication has resulted in improved cultivars in several crops that produce food for a better supply of the human diet. Of the 150 plant species presently cultivated in agriculture, twelve provide about 75% of human food and four produce 50% of the human diet. According to an FAO report, ∼800 million people suffer from food deficiency. Attention is therefore needed to improve agricultural production in order to eliminate, or at least reduce, feeding problems.
The narrow genetic base of modern crop cultivars is a serious obstacle to sustaining and improving crop productivity, because of their rapid vulnerability to new biotic and abiotic stresses. Plant germplasm resources, comprising wild plant species, modern cultivars and their crop wild relatives, are important reservoirs of natural genetic variation. This variation originated from a number of historical genetic events in response to environmental stresses and to selection during crop domestication.
• The objective of genetic mapping is to identify simply inherited markers
in close proximity to genetic factors affecting quantitative traits
(Quantitative trait loci, or QTL).
• This localization relies on processes that create a statistical association
between marker and QTL alleles and processes that selectively reduce
that association as a function of the marker distance from the QTL.
Why do we need genome mapping?
A gene map is the map of the genes present on our chromosomes. In eukaryotes, genes are tightly condensed inside a compact system, and we need to know which gene is responsible for the trait of interest. Expression of genotypes gives us phenotypes, but we cannot look directly at genes and genotypes, so we have to infer them mathematically. A further aim of mapping technology is to study traits more easily, more cheaply and within a short period of time.
Genome
• The genome is all the DNA in a cell: all the DNA on all the chromosomes, including genes, intergenic sequences and repeats. More specifically, the term can also refer to all the DNA in a single organelle.
• Eukaryotes can have 2-3 genomes: the nuclear genome, the mitochondrial genome and the plastid genome. If not specified, “genome” usually refers to the nuclear genome.
Terminologies
False negative: the declaration of an outcome as statistically non-significant,
when the effect is actually genuine.
False positive: the declaration of an outcome as statistically significant, when
there is no true effect.
Linkage: the co-inheritance of different loci that lie within a short genetic distance on the same chromosome.
Linkage equilibrium (LE): a random association of alleles at different loci, in which the frequency of each haplotype equals the product of the frequencies of its alleles.
Linkage disequilibrium (LD): a non-random association of alleles at different loci, describing the condition in which haplotype frequencies in a population differ from the values expected under LE.
Minor allele frequency (MAF): the frequency of the less common allele of a polymorphic locus. Its value lies between 0 and 0.5 and can vary between populations.
Odds ratio (OR): a measure of association commonly used in case-control studies, defined as the odds of exposure to the susceptible genetic variant in cases compared with that in controls. If the OR is significantly greater than 1, the genetic variant is associated with the disease.
Association Mapping
Association mapping is a high-resolution method for mapping quantitative trait loci based on linkage disequilibrium. Association refers to the covariance of a marker polymorphism and a trait of interest.
The first association study to attempt a genome scan in plants was conducted in sea beet (Beta vulgaris ssp. maritima), a wild relative of sugar beet (Beta vulgaris ssp. vulgaris). The first association study of a quantitative trait based on a candidate gene was the analysis of flowering time and the dwarf8 (d8) gene in maize. Association mapping is based on the principle of linkage disequilibrium (LD) and uses the entire population.
How does it work?
A group of unrelated individuals normally shows variation for many phenotypic aspects, so several traits can be studied in the same population using the same genotypic data. A higher proportion of molecular markers is likely to be polymorphic, providing better genome coverage than any biparental map. Because elite lines are used for the study, multi-year and multi-location phenotypic data may be available at no additional cost.
Goal of association mapping
Identification of susceptibility variants, replication in a different cohort/population, and understanding of genetic function at the cellular level. This can lead to the identification of durable targets, the development of drugs for prevention, and a better understanding of the cellular processes involved in disease treatment.
Association mapping offers many advantages over linkage analysis:
• much higher mapping resolution;
• greater allele number and broader reference population;
• less research time in establishing an association;
• utilizes existing individuals;
• multi-trial phenotypic data stored in databases can be used.
Limitations
• Resources for phenotyping and statistical issues.
• Population structure results in spurious associations.
Two types of association mapping
The success of either method depends on population size and the degree of LD.
• Genome-wide scanning: markers span the entire genome; suited to moderate to extensive LD. If LD is high, GWA is useful but gives low-resolution mapping.
• Candidate gene scanning: only candidate genes are sequenced; suited to low LD.
Flowchart of a gene association study [figure not reproduced]

Comparison of GWAS and biparental mapping
Feature | GWAS | Biparental mapping
Cross | No cross required; works with existing germplasm | Experimental cross required
Phenotypic data | May already be available | Must be collected
Mapping resolution | High | Limited
Alleles tested | More than 2 | Essentially 2
Loci analyzed | Many loci for a single trait analyzed concurrently | Constrained to loci segregating between the parental lines
Detection power | Comparatively low | High

What are genome-wide association studies?
Genome-wide association studies are a relatively new way for scientists to identify genes involved in human disease. This method searches the genome for small variations, called single nucleotide polymorphisms or SNPs (pronounced “snips”), that occur more frequently in people with a particular disease than in people without the disease. Each study can look at hundreds or thousands of SNPs at the same time. Researchers use data from this type of study to pinpoint genes that may contribute to a person’s risk of developing a certain disease.
Because genome-wide association studies examine SNPs across the genome, they represent a promising way to study complex, common diseases in which many genetic variations contribute to a person’s risk. This approach has already identified SNPs related to several complex conditions, including diabetes, heart abnormalities, Parkinson disease and Crohn disease. Researchers hope that future genome-wide association studies will identify more SNPs associated with chronic diseases, as well as variations that affect a person’s response to certain drugs and influence interactions between a person’s genes and the environment.
Synonyms
Genome-wide case-control studies; Genome-wide genetic association analysis.
Genome-wide association studies (GWAS) are projects that investigate the statistical association between phenotypes and a dense set of genetic markers that capture a substantial amount of the genetic variation in the genome, using a large number of matched samples.
Phenotypes can be qualitative traits such as disease status or quantitative traits such as blood pressure. Statistical association between disease status and the alleles of a genetic marker is tested by categorical data analysis.
Genetic markers are usually genotyped with microarray chips. Whether substantial genetic variation in the genome, including common, rare and structural variants, is captured by the set of markers depends on the number of markers and their chromosomal locations.
The typical number of single nucleotide polymorphism (SNP) markers used in a current GWAS …
… societies depends on the exploitation of genetic recombination and allelic diversity for crop improvement, and many of the world’s farmers depend directly on the harvests of the genetic diversity they sow for food and fodder as well as for the next season’s seed (Smale et al., 2004).
The considerable genetic diversity of traditional varieties of crops is the
most immediately useful and economically valuable part of global biodiversity.
Subsistence farmers use landraces as a key component of their cropping
systems. Such farmers account for about 60% of agricultural land use and
provide approximately 15-20% of the world’s food (Francis, 1986). In addition,
landraces are the basic raw materials used by plant breeders for developing
modern varieties. Over the last few decades, awareness of the rich diversity of exotic or wild germplasm has increased. This has led to a more intensive use of this germplasm in breeding, and thereby the yields of many crops have increased dramatically.
Aim: to identify which regions (or SNPs) in the genome are associated with a disease or a certain phenotype.
Design: identify the population structure; select case subjects (those with the disease); select control subjects (healthy); genotype a million SNPs for each subject; determine which SNPs are associated; encode the data; rank the SNPs.
History of GWAS
A successful study was published in 2005, investigating patients with age-related macular degeneration.
Before GWAS (around 2000), inheritance was studied by linkage analysis in families. Then a revolution occurred with HapMap in 2003, which used a variety of sequencing techniques to discover and catalog SNPs in different populations.
Human Genome Project
The Human Genome Project was declared complete in April 2003. An initial rough draft of the human genome was available in June 2000, and by February 2001 a working draft had been completed and published, followed by the final sequencing and mapping of the human genome on April 14, 2003. Although this was reported to cover 99% of the human genome with 99.99% accuracy, a major quality assessment of the human genome sequence published on May 27, 2004 indicated that over 92% of the sampling exceeded 99.99% accuracy, which is within the intended goal. Further analyses and papers on the HGP continue to appear.
HapMap
HapMap is a multi-country effort to identify and catalog common human genetic variants. It was developed to better understand and catalog LD patterns across the genome in several populations, and genotyped ~4 million SNPs on samples of African, East Asian and European ancestry. All genotype data are in a publicly available database from which they can be downloaded. HapMap can be used to examine LD patterns across the genome and to estimate the approximate coverage of a given SNP chip; 80-90% of common SNPs can be represented with ~300,000 tag SNPs for European or Asian samples and ~500,000 tag SNPs for African samples.
Thousand Genomes Project
Another spinoff from the Human Genome Project, the 1000 Genomes Project, was launched in 2008 as a 3-year project covering most of the countries worldwide. It mainly targeted African countries. The 1000 Genomes Project is showing researchers how dynamic the human genome really is, and why.
Concepts Underlying the Study Design
Single Nucleotide Polymorphisms
The modern unit of genetic variation is the single nucleotide
polymorphism or SNP. SNPs are single base-pair changes in the DNA
sequence that occur with high frequency in the human genome. For the
purposes of genetic studies, SNPs are typically used as markers of a genomic
region, with the large majority of them having a minimal impact on biological
systems. SNPs can have functional consequences, however, causing amino acid
changes, changes to mRNA transcript stability, and changes to transcription
factor binding affinity. SNPs are by far the most abundant form of genetic
variation in the human genome. SNPs are notably a type of common genetic
variation; many SNPs are present in a large proportion of human populations.
SNPs typically have two alleles, meaning that within a population there are two commonly occurring base-pair possibilities at a SNP location. The frequency of a SNP is given in terms of the minor allele frequency, i.e. the frequency of the less common allele. For example, a SNP with a minor allele (G) frequency of 0.40 implies that 40% of the chromosomes in a population carry the G allele, versus the more common allele (the major allele), which is found on 60% of the chromosomes.
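The minor allele frequency in the example above can be computed directly from genotype counts. The short Python sketch below assumes diploid individuals and a biallelic SNP with alleles G and T; the function name, the choice of T as the major allele and the genotype counts are hypothetical, picked so that G comes out at 0.40.

```python
def minor_allele_frequency(n_gg: int, n_gt: int, n_tt: int) -> float:
    """Return the minor allele frequency of a biallelic SNP.

    Counts are numbers of diploid individuals per genotype, so the
    total number of alleles is 2 * (n_gg + n_gt + n_tt).
    """
    total_alleles = 2 * (n_gg + n_gt + n_tt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    g_alleles = 2 * n_gg + n_gt  # each GG carries two G alleles, each GT one
    p_g = g_alleles / total_alleles
    return min(p_g, 1.0 - p_g)   # MAF is by definition in [0, 0.5]


# Hypothetical counts for 100 individuals (200 alleles):
# 16 GG, 48 GT, 36 TT -> 80 G alleles -> frequency of G = 0.40
print(minor_allele_frequency(16, 48, 36))  # 0.4
```

Taking the minimum of the two allele frequencies makes the result independent of which allele is labelled first, matching the 0 to 0.5 range given in the terminology section.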
Linkage Disequilibrium
Linkage disequilibrium (LD) is a property of SNPs on a contiguous stretch
of genomic sequence that describes the degree to which an allele of one SNP is
inherited or correlated with an allele of another SNP within a population. The
term linkage disequilibrium was coined by population geneticists in an attempt
to mathematically describe changes in genetic variation within a population
over time. It is related to the concept of chromosomal linkage, where two
markers on a chromosome remain physically joined on a chromosome through
generations of a family. Recombination events within a family from generation
to generation break apart chromosomal segments. This effect is amplified
through generations, and in a population of fixed size undergoing random
mating, repeated random recombination events will break apart segments of
contiguous chromosome (containing linked alleles) until eventually all alleles in
the population are in linkage equilibrium or are independent. Thus, linkage
between markers on a population scale is referred to as linkage disequilibrium.
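The breakdown of LD by recombination described above is often summarized by a classical idealized result: under random mating in a very large population (no drift, selection or mutation), D shrinks by a factor (1 − c) each generation, where c is the recombination fraction between the two markers, so D_t = D_0(1 − c)^t. A minimal Python sketch of that model; the function names and example values are illustrative assumptions, not taken from the studies discussed here.

```python
import math


def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected LD coefficient D after t generations of random mating.

    Idealized model (infinite population, no drift, selection or mutation):
    D_t = D_0 * (1 - c) ** t, where c is the recombination fraction.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    return d0 * (1.0 - c) ** t


def generations_to_halve(c: float) -> float:
    """Number of generations for D to fall to half its starting value."""
    if not 0.0 < c <= 0.5:
        raise ValueError("c must lie in (0, 0.5] for LD to decay")
    return math.log(0.5) / math.log(1.0 - c)


# Unlinked markers (c = 0.5) lose half their LD every generation,
# whereas tightly linked markers (c = 0.01) need roughly 69 generations.
print(ld_after_generations(0.25, 0.5, 1))  # 0.125
print(round(generations_to_halve(0.01)))   # 69
```

The slow decay for small c is why tightly linked markers stay in LD for many generations, and why older populations with more accumulated recombination show shorter LD blocks.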
The rate of LD decay is dependent on multiple factors, including the
population size, the number of founding chromosomes in the population, and
the number of generations for which the population has existed. As such,
different human sub-populations have different degrees and patterns of LD.
African-descent populations are the most ancestral and have smaller regions of
LD due to the accumulation of more recombination events in that group.
European-descent and Asian descent populations were created by founder
events (a sampling of chromosomes from the African population), which altered
the number of founding chromosomes, the population size, and the
generational age of the population. These populations on average have larger
regions of LD than African-descent groups. Many measures of LD have been
proposed, though all are ultimately related to the difference between the
observed frequency of co-occurrence for two alleles (i.e. a two-marker
haplotype) and the frequency expected if the two markers are independent. The
two commonly used measures of linkage disequilibrium are D’ and r².
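Both measures can be computed from a two-marker haplotype frequency and the two allele frequencies. The sketch below uses the standard definitions D = p_AB − p_A·p_B, D′ = |D|/D_max (Lewontin's normalization) and r² = D²/[p_A(1 − p_A)p_B(1 − p_B)]. The function name and the frequencies are hypothetical, and it assumes the haplotype frequency is already known (in practice it is often estimated from unphased genotypes).

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # Deviation of the observed haplotype frequency from the LE expectation
    d = p_ab - p_a * p_b
    # D_max is the largest |D| allowed by the allele frequencies, given D's sign
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: p_A = p_B = 0.6, A-B haplotype at 0.5
d, d_prime, r2 = ld_measures(0.5, 0.6, 0.6)
print(round(d, 3), round(d_prime, 3), round(r2, 3))  # 0.14 0.583 0.34
```

With p_AB = 0.6 and the same allele frequencies, both D′ and r² equal 1 (complete LD). With p_AB = p_A·p_B = 0.36, D is 0 (linkage equilibrium).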
LD in animal systems
LD in humans:
• LD has been studied extensively in humans (Homo sapiens) (Pritchard & Przeworski).
• There is tremendous heterogeneity in human LD estimates because of differences in loci, marker types (microsatellites versus SNPs), sample populations and chromosome type (sex chromosomes versus autosomes).
LD in other animal systems:
• LD studies have also been conducted in cattle and fruit flies (Drosophila melanogaster).
• Extensive LD has been reported in Dutch black-and-white dairy cattle populations.
• Globalization of semen trading.
LD in plant systems
MAIZE: In maize (Zea mays ssp. mays), several studies have been conducted to investigate LD over a wide range of populations and marker types. The patterns of LD vary substantially with the population chosen. Tenaillon et al. investigated sequence diversity at 21 loci on chromosome 1 in a diverse group of maize germplasm.
ARABIDOPSIS: The LD pattern in Arabidopsis is in sharp contrast to the pattern in maize. During the last five years, most studies have described LD in wheat and barley, besides single reports on rice, ryegrass, soybean, sugarcane and sorghum.
The factors that lead to an increase in LD include inbreeding, small population size, genetic isolation between lineages, population subdivision, low recombination rate, population admixture, natural and artificial selection, balancing selection, etc.
The factors that lead to a decrease in (disruption of) LD include outcrossing, high recombination rate, high mutation rate, etc.
Stages of GWAS designs
1) One-stage design: first used in the HapMap project, where the cost involved was higher. All samples are genotyped on all the markers.
2) Two-stage design: reduces the genotyping requirements and the false positive rate.
3) Multistage designs: signals from an initial, first-stage GWA are used to define a subset of SNPs that are retyped in additional second-stage samples. Joint analysis has more power than replication, and the p-value threshold in stage 1 must be liberal; power can be estimated with the CaTS power calculator. The lower cost does not bring a gain in power.
Analysis of GWAS
The most common approach is to look at each SNP one at a time, possibly adding multi-marker information, and then to investigate further or report only the top SNPs (or use backwards replication). The trend test is the most common test; the log-additive model and logistic regression are the foremost methods used to analyze GWAS. Analyses should adjust for potential population stratification.
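The trend test mentioned above can be sketched in a few lines. This is the Cochran-Armitage trend test with weights 0, 1 and 2 for the aa, Aa and AA genotypes, which corresponds to an additive (per-allele) coding. The variance is written in an algebraically equivalent compact form. The function name and counts are hypothetical, and the sketch does not adjust for population stratification, which in practice requires covariates or a mixed model.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    cases[i], controls[i]: numbers of individuals with genotype i (aa, Aa, AA).
    weights (0, 1, 2) score the genotypes additively (copies of allele A).
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    n = [r + s for r, s in zip(cases, controls)]
    # T = sum_i w_i * (r_i * S - s_i * R)
    t_stat = sum(w * (r * s_total - s * r_total)
                 for w, r, s in zip(weights, cases, controls))
    # Var(T) = (R * S / N) * [N * sum_i w_i^2 n_i - (sum_i w_i n_i)^2]
    sum_wn = sum(w * ni for w, ni in zip(weights, n))
    sum_w2n = sum(w * w * ni for w, ni in zip(weights, n))
    var_t = (r_total * s_total / n_total) * (n_total * sum_w2n - sum_wn ** 2)
    if var_t == 0:
        raise ValueError("no variation in genotype scores or phenotype")
    chi2 = t_stat ** 2 / var_t
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square with 1 df > chi2)
    return chi2, p_value


# Hypothetical counts for genotypes (aa, Aa, AA)
chi2, p = cochran_armitage_trend(cases=(20, 40, 40), controls=(40, 40, 20))
print(round(chi2, 2))  # 13.33
```

The statistic equals N times the squared correlation between genotype score and case status. This is why it behaves like a 1-df test of a linear (additive) trend.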
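Basics for GWAS
Calculate the odds ratio
• If two events A and B are considered, the odds ratio of A relative to B is
$$\mathrm{OR} = \frac{\mathrm{Odds}(A)}{\mathrm{Odds}(B)} = \frac{\Pr(A)/\Pr(\bar{A})}{\Pr(B)/\Pr(\bar{B})}$$
• Odds ratio of disease D comparing two genotype groups (G = 1 and G = 0), where $\bar{D}$ denotes absence of disease and RR is the relative risk:
$$\begin{aligned}
\mathrm{OR} &= \frac{\mathrm{Odds}(D \mid G=1)}{\mathrm{Odds}(D \mid G=0)}
= \frac{\Pr(D \mid G=1)/\Pr(\bar{D} \mid G=1)}{\Pr(D \mid G=0)/\Pr(\bar{D} \mid G=0)} \\
&= \frac{\Pr(D \mid G=1)}{\Pr(D \mid G=0)} \times \frac{\Pr(\bar{D} \mid G=0)}{\Pr(\bar{D} \mid G=1)}
= \mathrm{RR} \times \frac{\Pr(\bar{D} \mid G=0)}{\Pr(\bar{D} \mid G=1)}
\end{aligned}$$
• Symmetry of the odds ratio (writing D = 1 for disease and D = 0 for no disease):
$$\mathrm{OR} = \frac{\mathrm{Odds}(D \mid G=1)}{\mathrm{Odds}(D \mid G=0)} = \frac{\mathrm{Odds}(G \mid D=1)}{\mathrm{Odds}(G \mid D=0)}$$
Test significance using the chi-square test and rank SNPs by p-value (statistical test of association).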
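A minimal sketch of the basic test described above computes the odds ratio and the Pearson chi-square (1 df) for a 2 × 2 table of allele counts in cases and controls, then ranks SNPs by p-value. The SNP names and counts are hypothetical. The 1-df p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), so only the Python standard library is needed.

```python
import math


def allelic_test(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Odds ratio, Pearson chi-square (1 df) and p-value for a 2 x 2 allele table.

    Rows are cases and controls; columns are counts of alleles A and B.
    All four counts must be positive (zero cells would need a correction).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [case_a + case_b, ctrl_a + ctrl_b]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Odds of A in cases divided by odds of A in controls; by the symmetry of
    # the odds ratio this also equals odds of being a case given A vs given B.
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return odds_ratio, chi2, p_value


# Hypothetical allele counts: (case A, case B, control A, control B)
snps = {
    "snp_1": (120, 80, 80, 120),
    "snp_2": (105, 95, 100, 100),
}
ranked = sorted(snps, key=lambda name: allelic_test(*snps[name])[2])
print(ranked)                            # ['snp_1', 'snp_2']
print(allelic_test(*snps["snp_1"])[:2])  # (2.25, 16.0)
```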
Challenges to address while conducting GWAS
Multiple hypothesis testing: in GWAS the number of statistical tests is commonly on the order of 10⁶. At a significance level of 0.01 we would expect 10,000 false positives, so individual p-values < 0.01 are no longer significant. Correction for multiple hypothesis testing is therefore critical.
Population structure: confounding structure leads to false positives.
Statistical power and resolution: small samples and a large number of hypotheses limit power; testing compound hypotheses is one way to increase it.
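The false-positive arithmetic in the multiple-testing challenge takes only a few lines to reproduce, as does a standard correction for it. The text does not name a specific correction. The Bonferroni method used here is one common choice. With 10⁶ tests and a family-wise α of 0.05, it gives a per-SNP threshold of 5 × 10⁻⁸, the widely used genome-wide significance level. Function names are illustrative.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test p-value threshold keeping the family-wise error rate <= family_alpha."""
    return family_alpha / n_tests


n_tests = 1_000_000
print(expected_false_positives(n_tests, 0.01))  # 10000.0
print(bonferroni_threshold(n_tests))            # approximately 5e-08
```

Bonferroni treats all tests as independent and is conservative when SNPs are in LD. Less conservative alternatives, such as false discovery rate control or permutation, trade strictness for power.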
Association Test Single Locus Analysis
When a well-defined phenotype has been selected for a study population,
and genotypes are collected using sound techniques, the statistical
analysis of genetic data can begin. The de facto analysis of genome-wide
association data is a series of single-locus statistical tests, examining each
SNP independently for association to the phenotype. The statistical test
conducted depends on a variety of factors, but first and foremost,
statistical tests are different for quantitative traits versus case/control
studies. Quantitative traits are generally analyzed using generalized
linear model (GLM) approaches, most commonly the Analysis of Variance
(ANOVA), which is similar to linear regression with a categorical predictor
variable, in this case genotype classes. The null hypothesis of an ANOVA
using a single SNP is that there is no difference between the trait means
of any genotype group. The assumptions of GLM and ANOVA are 1) the
trait is normally distributed; 2) the trait variance within each group is
the same (the groups are homoscedastic); 3) the groups are independent.
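The single-SNP ANOVA described above compares trait means across genotype classes. A standard-library Python sketch of the F statistic follows; the function name and trait values are hypothetical. Converting F to a p-value needs the F distribution. For example, scipy.stats.f_oneway returns both F and the p-value if SciPy is available.

```python
def anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """One-way ANOVA F statistic for trait values grouped by SNP genotype.

    Returns (F, between-group df, within-group df).
    """
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of genotype-class means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individuals around their own genotype-class mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for the aa, Aa and AA genotype classes
aa, Aa, AA = [10.0, 11.0, 12.0], [12.0, 13.0, 14.0], [14.0, 15.0, 16.0]
print(anova_f([aa, Aa, AA]))  # (12.0, 2, 6)
```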
Dichotomous case/control traits are generally analyzed using either
contingency table methods or logistic regression. Contingency table tests
examine and measure the deviation from independence that is expected
under the null hypothesis that there is no association between the
phenotype and genotype classes. The most ubiquitous form of this test is
the popular chi-square test (and the related Fisher’s exact test). Logistic
regression is an extension of linear regression where the outcome of a
linear model is transformed using a logistic function that predicts the
probability of having case status given a genotype class. Logistic
regression is often the preferred approach because it allows for
adjustment for clinical covariates (and other factors), and can provide
adjusted odds ratios as a measure of effect size. Logistic regression has
been extensively developed, and numerous diagnostic procedures are
available to aid interpretation of the model.
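A minimal sketch of logistic regression for one SNP, fitted by Newton-Raphson with only the Python standard library. To keep it short, it has a single genotype predictor and no clinical covariates (the advantage the text highlights), and the data are hypothetical. With a 0/1 predictor the fitted slope equals the log of the 2 × 2 odds ratio, which gives a simple check. With 0/1/2 allele coding, exp(b1) is the per-allele odds ratio of the log-additive model. Real analyses would normally use a statistics package (for example, statsmodels' Logit) rather than hand-written code.

```python
import math


def logistic_fit(x: list[float], y: list[int], n_iter: int = 25) -> tuple[float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (no covariates).

    x: genotype score per individual (e.g. 0/1/2 copies of the tested allele)
    y: 1 for cases, 0 for controls
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("genotype score has no variation")
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x = 1 for carriers, 0 for non-carriers; y = case status
x = [1] * 50 + [0] * 50
y = [1] * 30 + [0] * 20 + [1] * 20 + [0] * 30
b0, b1 = logistic_fit(x, y)
print(round(math.exp(b1), 4))  # 2.25, the odds ratio (30/20) / (20/30)
```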
For both quantitative and dichotomous trait analysis (regardless
of the analysis method), there are a variety of ways that genotype data
can be encoded or shaped for association tests. The choice of data
encoding can have implications for the statistical power of a test, as the
degrees of freedom for the test may change depending on the number of
genotype-based groups that are formed. Allelic association tests examine
the association between one allele of the SNP and the phenotype.
Genotypic association tests examine the association between genotypes
(or genotype classes) and the phenotype. The genotypes for a SNP can
also be grouped into genotype classes or models, such as dominant,
recessive, multiplicative, or additive models. Each model makes different
assumptions about the genetic effect in the data. Assuming two alleles for a SNP, A and a, a dominant model (for A) assumes that having one or more copies of the A allele increases risk compared to aa (i.e. Aa or AA genotypes have higher risk). The recessive model (for A) assumes that two copies of the A allele are required to alter risk, so individuals with the AA genotype are compared to individuals with Aa and aa genotypes. The multiplicative model (for A) assumes that if there is a 3× risk for having a single A allele, there is a 9× risk for having two copies of the A allele: in this case, if the risk for Aa is k, the risk for AA is k². The additive model (for A) assumes that there is a uniform, linear increase in risk for each copy of the A allele, so if the risk is 3× for Aa, there is a 6× risk for AA; in this case the risk for Aa is k and the risk for AA is 2k. A common practice for GWAS is to examine additive models only, as the additive model has reasonable power to detect both additive and dominant effects, but it is important to note that an additive model may be underpowered to detect some recessive effects. Rather than choosing one model a priori, some studies evaluate multiple genetic models coupled with an appropriate correction for multiple testing.
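The dominant, recessive and additive models above differ only in how each genotype is coded before the test is run. A small Python sketch of those codings follows; the function name and genotype list are hypothetical. The multiplicative model is not a separate coding here. On the odds scale it corresponds to the additive 0/1/2 coding used in logistic regression (the "log-additive" model).

```python
def encode_genotype(n_a: int, model: str) -> int:
    """Code a genotype, given its number of A alleles (0 = aa, 1 = Aa, 2 = AA)."""
    if n_a not in (0, 1, 2):
        raise ValueError("n_a must be 0, 1 or 2")
    if model == "additive":
        return n_a                    # one step of effect per copy of A
    if model == "dominant":
        return 1 if n_a >= 1 else 0   # Aa and AA versus aa
    if model == "recessive":
        return 1 if n_a == 2 else 0   # AA versus Aa and aa
    raise ValueError(f"unknown model: {model!r}")


genotypes = [0, 1, 2, 1, 0]  # hypothetical allele counts for five individuals
for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(g, model) for g in genotypes])
# additive [0, 1, 2, 1, 0]
# dominant [0, 1, 1, 1, 0]
# recessive [0, 0, 1, 0, 0]
```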
Multi-Locus Analysis
In addition to single-locus analyses, genome-wide association studies
provide an enormous opportunity to examine interactions among genetic
variants throughout the genome. Multi-locus analysis, however, is not
nearly as straightforward as conducting single-locus tests, and presents
numerous computational, statistical, and logistical challenges. Because
most GWAS genotype between 500,000 and one million SNPs, examining all
pair-wise combinations of SNPs is a computationally intractable approach,
even for highly efficient algorithms. One approach to this issue is to reduce
or filter the set of genotyped SNPs, eliminating redundant information.
A simple and common way to filter SNPs is to select a set of results
from a single-SNP analysis based on an arbitrary significance threshold and
exhaustively evaluate interactions in that subset. This can be perilous,
however, as selecting SNPs to analyze based on main effects will prevent
certain multi-locus models from being detected – so called ‘‘purely epistatic’’
models with statistically undetectable marginal effects. With these models, a
large component of the heritability is concentrated in the interaction rather
than in the main effects. In other words, a specific combination of markers
incurs a significant change in disease risk. The benefits of this analysis are
that it performs an unbiased analysis for interactions within the selected set
of SNPs. It is also far more computationally and statistically tractable than
analyzing all possible combinations of markers.
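The scale problem and the filtering strategy above can both be sketched briefly. The pair count is n(n − 1)/2. The filter keeps the top k SNPs by single-SNP p-value and enumerates pairs only among them. The SNP names, p-values and function names are hypothetical.

```python
import math
from itertools import combinations


def n_pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs, n(n - 1) / 2."""
    return math.comb(n_snps, 2)


def pairs_from_top_snps(p_values: dict[str, float], top_k: int) -> list[tuple[str, str]]:
    """Keep the top_k SNPs by single-SNP p-value and list every pair among them.

    As the text warns, this filter can miss 'purely epistatic' pairs whose
    members have no detectable marginal effect.
    """
    top = sorted(p_values, key=p_values.get)[:top_k]
    return list(combinations(top, 2))


print(n_pairwise_tests(500_000))    # 124999750000
print(n_pairwise_tests(1_000_000))  # 499999500000
# Hypothetical single-SNP p-values
p = {"rs_a": 1e-6, "rs_b": 0.2, "rs_c": 3e-4, "rs_d": 0.04}
print(pairs_from_top_snps(p, 3))
# [('rs_a', 'rs_c'), ('rs_a', 'rs_d'), ('rs_c', 'rs_d')]
```

Even at 500,000 SNPs there are about 1.25 × 10¹¹ pairs. Keeping only the top k reduces this to k(k − 1)/2 tests, at the cost described in the text.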
Missing heritability
In many complex diseases there are numerous genetic variants which
have been identified. But for many of the recent studies these common
variants only explain a small fraction of the increased risk. Most of those
that have been identified have no established biological relevance to the disease, and often they are not located inside 'active' genes. From recent years of GWAS it is clear that the common variants fail to explain the majority of the genetic heritability of most human diseases. This suggests that the 'common disease, common variant' hypothesis is not as valid as was previously believed. The problem is that the biological reality does not correspond to the study design and assumptions of GWAS, and the solution is not to increase the sample size even further but to improve the study design and statistical methods. One possible explanation for the missing heritability could be some kind of interaction between different genes (epistasis). These interactions could be hard to detect when analyzing one SNP at a time, as the marginal effect of a single SNP will be small. Another explanation is that part of the increased risk can be explained by many rare variants, which are present in less than 1% of the population. This suggests that there could be heterogeneity, where different genetic profiles can cause diseases that are diagnostically the same.
Genetic Interactions
A general definition of genetic interaction (epistasis) is that the effect (penetrance) of one locus varies according to the genotype present at another locus. To detect interactions we need to define how a 'natural' combined effect of two risk loci would be expressed in the organism. The concept of gene-gene interactions is not new, but it is still confusing because the term is used in various ways. Biological interaction, or epistasis, was first defined by Bateson in 1909. In his example, one of the alleles at one locus G prevents the alleles at locus B from being expressed in the organism. This relation does not necessarily have to be symmetric. This definition is similar to the one biologists use to examine biological interaction between proteins, where proteins interact to regulate several cellular processes. In statistics, interaction is usually defined as a deviation from a linear model. In 1918 Fisher gave a statistical definition of epistasis [28] as the deviation from additivity in the effects of alleles at different loci on a quantitative trait.
This definition is closer to the classical statistical definition of interaction and does not quite correspond to the biological definition of epistasis. These definitions become troublesome when the trait is binary; in such cases the mathematical modelling often focuses on the penetrances, and the definitions of epistasis need to be modified. For binary traits, an example could be that both allele A and allele B at two different loci are needed to develop the trait. In this case A is epistatic to B and B is epistatic to A, so the epistasis is symmetric, in contrast to Bateson's definition.
A classic way to represent lack of epistasis has been the heterogeneity model, in which a person gets the trait by possessing (at least) one of the predisposing genotypes. This definition actually falls under Bateson's definition of epistasis: for example, if a person has both risk variants (situated at different loci), the effect of allele A will be masked by allele B, which is another confusing aspect of these genetic interactions. There are two types of genetic heterogeneity. Allelic heterogeneity is when several different mutations at the same locus cause the same disease. Locus heterogeneity means that mutations in several unrelated loci can cause the same disorder.
The above example of locus heterogeneity can be generalized to a
situation without full penetrance, that is, $0 < f_{ij} < 1$ for some of the
penetrances. Mathematically, locus heterogeneity can be expressed as
$$f_{ij} = \alpha_i + \beta_j - \alpha_i\beta_j$$
where $\alpha_i$ and $\beta_j$ are the penetrance factors for the two genetic
variants. Locus heterogeneity is similar to a daisy chain: it is enough
for one of the components to break (caused by having at least one of the risk
variants) for the entire system to malfunction, i.e. to obtain the disease.
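The heterogeneity formula can be checked numerically. The following is a minimal sketch and is not taken from the cited work. The penetrance factors `alpha` and `beta` are hypothetical values, indexed by the number of risk alleles (0, 1, 2) at each locus. The code builds the penetrance table and confirms the "daisy chain" reading. The trait is absent only if neither locus triggers it, so $1 - f_{ij} = (1 - \alpha_i)(1 - \beta_j)$.

```python
def heterogeneity_penetrance(alpha, beta):
    """Two-locus heterogeneity model: f[i][j] = alpha[i] + beta[j] - alpha[i]*beta[j].

    alpha[i], beta[j]: penetrance factors in [0, 1] for genotype i at locus A and
    genotype j at locus B (indexed here by the number of risk alleles, 0..2).
    """
    return [[a + b - a * b for b in beta] for a in alpha]


if __name__ == "__main__":
    # Hypothetical penetrance factors, for illustration only.
    alpha = [0.0, 0.3, 0.6]
    beta = [0.0, 0.2, 0.4]
    table = heterogeneity_penetrance(alpha, beta)
    for i, row in enumerate(table):
        print(f"A genotype {i}:", [round(f, 3) for f in row])

    # "Daisy chain" check: staying unaffected requires that neither locus
    # triggers the trait, so 1 - f_ij = (1 - alpha_i) * (1 - beta_j).
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            assert abs((1 - table[i][j]) - (1 - a) * (1 - b)) < 1e-12
```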
There are two other common two-locus models for binary traits: the
multiplicative model and the additive model. The multiplicative model can
be expressed as
$$f_{ij} = \alpha_i\beta_j,$$
and this model is often considered epistatic. Both the additive model,
$$f_{ij} = \alpha_i + \beta_j,$$
and the heterogeneity model are thought of as non-epistatic by
most authors. However, some authors consider epistasis to be a departure from
the multiplicative model. Further problems appear when considering that
both the multiplicative and the heterogeneity models become additive with
suitable log transformations. It is difficult to model the true
epistatic interactions in complex diseases, and discovered epistatic effects
may contribute little to the understanding of the disease. Still, models
that allow for interactions can improve the statistical power to detect
genetic risk variants. The main issue in finding interactions, however
epistasis is defined, is how to detect it in complex diseases
when analyzing millions of genetic markers. Assume that the disease is
caused by different mutations at different loci in various families, and that these
genes have a strong effect in each of the subpopulations. Then the
heterogeneous risk genes will probably show a very weak marginal effect
when the markers are analyzed one at a time. For epistatic interactions it
is very computationally demanding to examine all possible gene-gene
interactions, in addition to the issue of correcting for testing multiple
hypotheses. One way to handle this is to first test for marginal main effects
of each marker in the sample, and hope that the genes involved in
interactions will also show at least a modest marginal effect. The
results from this analysis are then combined with biological knowledge to suggest
a number of candidates for interaction analysis.
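The claim that the multiplicative and heterogeneity models "become additive with suitable log transformations" can be made concrete with an interaction contrast. This contrast is the double difference $f_{11} - f_{10} - f_{01} + f_{00}$, computed on different scales. The sketch below is an illustrative assumption and does not come from the source. It uses a 2×2 table (risk genotype absent or present at each locus) and hypothetical $\alpha$, $\beta$ values. A zero contrast means no interaction on that scale. The multiplicative model is additive on the $\log f$ scale, and the heterogeneity model is additive on the $\log(1 - f)$ scale, since $1 - f = (1 - \alpha)(1 - \beta)$.

```python
import math


def multiplicative(a, b):
    """f_ij = alpha_i * beta_j"""
    return [[ai * bj for bj in b] for ai in a]


def additive(a, b):
    """f_ij = alpha_i + beta_j"""
    return [[ai + bj for bj in b] for ai in a]


def heterogeneity(a, b):
    """f_ij = alpha_i + beta_j - alpha_i * beta_j"""
    return [[ai + bj - ai * bj for bj in b] for ai in a]


def interaction_contrast(f, transform):
    """Double difference f11 - f10 - f01 + f00 after applying `transform`.

    f is a 2x2 penetrance table (index 0 = no risk genotype, 1 = risk genotype).
    A contrast of zero means the two loci act additively on that scale.
    """
    g = [[transform(x) for x in row] for row in f]
    return g[1][1] - g[1][0] - g[0][1] + g[0][0]


SCALES = {
    "f": lambda x: x,
    "log f": math.log,
    "log(1 - f)": lambda x: math.log(1.0 - x),
}

if __name__ == "__main__":
    # Hypothetical penetrance factors (all > 0 so that log f is defined).
    a, b = [0.05, 0.2], [0.1, 0.3]
    for name, model in [("additive", additive),
                        ("multiplicative", multiplicative),
                        ("heterogeneity", heterogeneity)]:
        f = model(a, b)
        row = {s: round(interaction_contrast(f, t), 4) for s, t in SCALES.items()}
        print(f"{name:15s}", row)
```

Whether two loci "interact" therefore depends on the chosen scale. This is one reason the definitions of epistasis above disagree.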
Data Imputation
To conduct a meta-analysis properly, the effect of the same allele
across multiple distinct studies must be assessed. This can prove difficult if
different studies use different genotyping platforms (which use different SNP
marker sets). As this is often the case, GWAS datasets can be imputed to
generate results for a common set of SNPs across all studies. Genotype
imputation exploits known LD patterns and haplotype frequencies from the
HapMap or 1000 Genomes project to estimate genotypes for SNPs not
directly genotyped in the study.
The concept is similar in principle to haplotype phasing algorithms,
where the contiguous set of alleles lying on a specific chromosome is
estimated. Genotype imputation methods extend this idea to human
populations. First, a collection of shared haplotypes within the study sample
is computed to estimate haplotype frequencies among the genotyped SNPs.
Phased haplotypes from the study sample are compared to reference
haplotypes from a panel of much more dense SNPs, such as the HapMap
data. The matched reference haplotypes contain genotypes for surrounding
markers that were not genotyped in the study sample. Because the study
sample haplotypes may match multiple reference haplotypes, surrounding
genotypes may be given a score or probability of a match based on the
haplotype overlap. For example, rather than assigning an imputed SNP a single
allele A, the probability of each possible allele is reported (0.85 A, 0.12 C,
0.03 T) based on haplotype frequencies. This information can be used in the
analysis of imputed data to take into account uncertainty in the genotype
estimation process, typically using Bayesian analysis approaches. Popular
algorithms for genotype imputation include BimBam, IMPUTE, MaCH and
Beagle. Much like conducting a meta-analysis, genotype imputation must
be conducted with great care. The reference panel (i.e. the 1000 Genomes
data or the HapMap project) must contain haplotypes drawn from the same
population as the study sample in order to facilitate a proper haplotype
match. If a study was conducted using individuals of Asian descent, but
only European descent populations are represented in the reference panel,
the genotype imputation quality will be poor as there is a lower probability
of a haplotype match. Also, the reference allele for each SNP must be
identical in both the study sample and the reference panel. Finally, the
analysis of imputed genotypes should account for the uncertainty in
genotype state generated by the imputation process.
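The haplotype-matching idea can be illustrated with a toy example. It is deliberately simplified. Programs such as those named above use probabilistic models (typically hidden Markov models) that tolerate mismatches and recombination. The sketch below instead accepts only exact matches at the typed SNPs. The SNP names, the key `"untyped"`, and the reference frequencies are all hypothetical. The frequencies were chosen so that the result reproduces the (0.85 A, 0.12 C, 0.03 T) example in the text.

```python
from collections import defaultdict


def impute_untyped_snp(study_hap, reference_panel, target="untyped"):
    """Toy haplotype-matching imputation of one untyped SNP.

    study_hap: {snp_id: allele} for the typed SNPs of one phased study haplotype.
    reference_panel: list of (ref_hap, frequency) pairs; each ref_hap holds the
        typed SNPs plus the untyped SNP under the (hypothetical) key `target`.
    Returns {allele: probability}: the frequencies of reference haplotypes that
    agree with the study haplotype at every typed SNP, renormalised to sum to 1.
    """
    weights = defaultdict(float)
    for ref_hap, freq in reference_panel:
        if all(ref_hap.get(snp) == allele for snp, allele in study_hap.items()):
            weights[ref_hap[target]] += freq
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no reference haplotype matches the study haplotype")
    return {allele: w / total for allele, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical reference haplotypes and their frequencies.
    panel = [
        ({"snp1": "A", "snp2": "G", "untyped": "A"}, 0.170),
        ({"snp1": "A", "snp2": "G", "untyped": "C"}, 0.024),
        ({"snp1": "A", "snp2": "G", "untyped": "T"}, 0.006),
        ({"snp1": "T", "snp2": "G", "untyped": "C"}, 0.500),  # does not match
        ({"snp1": "A", "snp2": "C", "untyped": "T"}, 0.300),  # does not match
    ]
    probs = impute_untyped_snp({"snp1": "A", "snp2": "G"}, panel)
    print({k: round(v, 3) for k, v in probs.items()})
```

Returning a probability per allele, rather than a single best guess, is what lets downstream association tests carry the imputation uncertainty forward.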
Statistical methods in GWAS
If a genetic marker is associated with a particular disease, then the
genotype or allele frequencies will differ between affected and healthy
individuals. A commonly used test for associated SNPs in case-control
studies is a Pearson $\chi^2$ test applied to a 2-by-2 table of allele counts in
the two groups. For complex traits it is commonly assumed that the
contribution of each SNP to the genetic effect is roughly additive, i.e. the
penetrance for heterozygotes lies somewhere between the penetrances for the
two homozygotes. The allelic test is powerful under additive models, which
explains its popularity in these studies. Other common tests include a Pearson
$\chi^2$ test comparing genotype frequencies instead of allele frequencies,
the Cochran-Armitage test for trend in penetrances, and logistic regression. The
Transmission Disequilibrium Test (TDT) is an association test using data from
families with at least one affected child. This test was introduced by Spielman
et al., and it evaluates the transmission of an allele from a heterozygous
parent to the offspring. The TDT is based on the assumption that each of the
two alleles M1 and M2 at a locus is transmitted with equal probability to the
offspring. Hence, for a sample of heterozygous parents, we expect approximately
half of them to transmit allele M1. If one of the alleles is transmitted more
often in families where the children have a genetic disease, we suspect
that the allele is associated with the disease. Let b denote the number of
heterozygous parents who transmit allele M1 to their offspring, and c the
number of heterozygous parents who transmit allele M2. Conditioned on b + c,
b is binomially distributed, but usually the test statistic has the following
form:
$$T = \frac{(b - c)^2}{b + c}.$$
This statistic asymptotically follows a $\chi^2$ distribution with one degree of
freedom and is equivalent to a Pearson $\chi^2$ test.
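The allelic $\chi^2$ test and the TDT can both be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not code from the cited studies. The allele and transmission counts are hypothetical, and the names `allelic_chi2` and `tdt` are illustrative only. The p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ for one degree of freedom.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_chi2(case_alleles, control_alleles):
    """Pearson chi-square on the 2x2 table of allele counts.

    case_alleles, control_alleles: (count of allele 1, count of allele 2).
    """
    table = [list(case_alleles), list(control_alleles)]
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][k] + table[1][k] for k in range(2)]
    stat = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_tot[i] * col_tot[k] / n
            stat += (table[i][k] - expected) ** 2 / expected
    return stat, chi2_1df_pvalue(stat)


def tdt(b, c):
    """TDT: b (c) = heterozygous parents transmitting M1 (M2); T = (b - c)^2 / (b + c)."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2_1df_pvalue(stat)


if __name__ == "__main__":
    # Hypothetical counts: 200 cases and 200 controls (400 alleles per group).
    print(allelic_chi2((240, 160), (200, 200)))
    # Hypothetical transmissions from 100 heterozygous parents.
    print(tdt(60, 40))
```

With these counts the allelic statistic is about 8.08. The TDT statistic is exactly 4.0, with p ≈ 0.046.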
Logistic Regression
Generalized Linear Models (GLMs) extend the ordinary regression model to
response variables that are not normally distributed. GLMs are applicable if
the response variable has a distribution which belongs to the natural
exponential family. One of those distributions is the binomial distribution, and
with logistic regression we model the binomial probability $p(x) = P(Y = 1 \mid x)$ as
$$\log\frac{p(x)}{1 - p(x)} = \alpha + \sum_j \beta_j x_j.$$
Here $x_j$ denotes the value of the $j$th element of the
predictor $x$. In simple logistic regression with one binary predictor $x$, $\beta$ is
equal to the log odds ratio:
$$\beta = \log\frac{p(x = 1)/(1 - p(x = 1))}{p(x = 0)/(1 - p(x = 0))}.$$
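The statement that $\beta$ equals the log odds ratio can be checked by fitting the model directly. The sketch below fits $\operatorname{logit} p = \alpha + \beta x$ by Newton-Raphson, with the 2×2 information matrix inverted by hand. It then compares $\hat\beta$ with the log odds ratio computed from the 2×2 table. The data are hypothetical: 50 carriers (30 cases) and 50 non-carriers (20 cases). The function names are illustrative, not a library API.

```python
import math


def fit_logistic_binary(x, y, iters=25):
    """Fit logit P(Y = 1 | x) = alpha + beta * x by Newton-Raphson.

    x, y: equal-length lists of 0/1 values (binary predictor, case status).
    """
    alpha = beta = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (alpha, beta) += I^{-1} * score, 2x2 inverse written out
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def log_odds_ratio(x, y):
    """Log odds ratio from the 2x2 table of predictor value vs case status."""
    n = {(xi, yi): 0 for xi in (0, 1) for yi in (0, 1)}
    for xi, yi in zip(x, y):
        n[(xi, yi)] += 1
    return math.log((n[1, 1] / n[1, 0]) / (n[0, 1] / n[0, 0]))


if __name__ == "__main__":
    # Hypothetical data: 50 carriers (30 cases) and 50 non-carriers (20 cases).
    x = [1] * 50 + [0] * 50
    y = [1] * 30 + [0] * 20 + [1] * 20 + [0] * 30
    alpha, beta = fit_logistic_binary(x, y)
    print(round(beta, 6), round(log_odds_ratio(x, y), 6))
```

Both numbers agree at log(2.25) ≈ 0.811. Adding further columns to the design (for example a second locus and a product term) follows the same pattern, but needs a general matrix solve.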
In retrospective studies (where individuals are sampled based on their affection
status), the effect parameter $\beta$ will be the same as in the prospective design
(where sampling is based on the predictors), if we assume that the sampling probability is
independent of $x$. This is one of the main reasons for using this method in
biomedical studies. Another advantage of logistic regression is that it is
easy to include several predictors in the analysis and make inferences about
interactions between genes and environment, as well as gene-gene interactions.
Schaid described a univariate method for case-parent data, modelling genotype
relative risks with conditional logistic regression using three pseudo-controls
based on the parents' untransmitted alleles. This method can be generalized to
two loci. For case-control data, logistic regression can be used to analyse
interactions by comparing the saturated model to an additive model
(main effects only, with no interaction term). The additive logistic model is roughly equivalent to the
heterogeneity model if the relative risk (RR) or odds ratio (OR) is of moderate
size. However, North et al. show examples of heterogeneity models which are
marginally recessive (marginal RR ≈ 150). In this case the logistic regression
yields non-zero interaction estimates. Hence, to really examine deviations from
the heterogeneity model (and not the multiplicative or logistic model), more
advanced methods need to be applied.
GWAS studies in various crops:
1. Arabidopsis
Genome-Wide Association Mapping in Arabidopsis Identifies Previously
Known Flowering Time and Pathogen Resistance Genes.
A very large number of spurious genotype–phenotype correlations are
found, especially for traits that vary geographically. For example, plants from
northern latitudes flower later; however, in addition to sharing genetic variants
that make them flower late, they also tend to share variants across the genome,
making it difficult to determine which genes are responsible for flowering. This
notwithstanding, several previously known genes were successfully identified in
this study, and the researchers are optimistic about the prospects for
association mapping in this species.
They checked flowering time and pathogen resistance in a sample of 95
accessions for which genome-wide polymorphism data were available. In spite
of an extremely high rate of false positives due to population structure, they were
able to identify known major genes for all phenotypes tested, thus
demonstrating the potential of genome-wide association mapping in A. thaliana
and other species with similar patterns of variation. The rate of false positives
differed strongly between traits, with more clinal traits showing the highest
rates. However, the false positive rates were always substantial regardless of the
trait, highlighting the necessity of appropriate genomic control in
association studies.
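The source does not say which correction was applied. One widely used form of genomic control is the inflation factor λ, and it is assumed here. λ is the median of the observed 1-df association $\chi^2$ statistics divided by the median of the $\chi^2_1$ distribution (≈ 0.4549). Each statistic is then divided by λ when λ > 1. The sketch below is illustrative, and the function name is hypothetical. Note that a single genome-wide λ deflates all tests uniformly. It may therefore not remove locus-specific false positives at strongly clinal loci, which is why mixed models are often preferred in structured samples.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_control(chi2_stats):
    """Inflation factor lambda = median(observed 1-df chi-square) / expected median.

    Returns (lambda, corrected statistics). Following common practice, the
    statistics are only deflated when lambda > 1.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam_used = max(lam, 1.0)
    return lam, [s / lam_used for s in chi2_stats]


if __name__ == "__main__":
    # Hypothetical association statistics from a structured sample.
    stats = [0.2, 0.7, 0.9, 1.1, 1.4, 2.5, 9.0]
    lam, corrected = genomic_control(stats)
    print(round(lam, 3), [round(s, 2) for s in corrected])
```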
The columns on the left give the genotype and associated phenotype for four
loci, for each of the 95 accessions. The four loci are the flowering time locus
FRI (+, wild-type; 1, Ler null allele; 2, Col null allele), for which the associated
phenotype is flowering time in long-day conditions without vernalization (late
flowering is indicated by height and color of bar), and the three pathogen
resistance loci Rps5, Rpm1, and Rps2 (+, wild-type; −, null allele), for which the
associated phenotypes are hypersensitive response to the appropriate bacterial
avr gene (red indicates resistance, black indicates susceptibility, and missing
data are indicated by a missing bar). The tree on the right illustrates the genetic
relationships between the accessions.
GWAPP: A Web Application for Genome-Wide Association Mapping in
Arabidopsis.
GWAPP is an interactive Web-based application for conducting GWAS in A.
thaliana. Using an efficient implementation of a linear mixed model, traits
measured for a subset of 1386 publicly available ecotypes can be uploaded and
mapped with a mixed model and other methods in just a couple of minutes.
GWAPP features an extensive, interactive, and user-friendly interface that
includes interactive Manhattan plots and linkage disequilibrium plots. It also
facilitates exploratory data analysis by implementing features such as the
inclusion of candidate polymorphisms in the model as cofactors.
Fig 1. (A) The filter box allows the user to exclude specific accessions as well
as change the name and the description of the data set. (B) The data set list
displays information for each accession in the data set. In edit mode, the user
can use the checkbox to add and remove accessions from the data set. (C) A
Google map shows the locations of all accessions in the data set. Clicking on
one marker will show a pop-up with information about the name and ID of the
selected accession. (D) The geographic distribution map (GeoMap) shows the
geographic distribution of the accessions in the data set. Moving the mouse
over a country will show the number of accessions located in that region.
Fig 2. The result view displays GWAS plots for each of the five chromosomes.
Each GWAS plot itself consists of three panels. The top panel (A) contains a
scatterplot. The positions on the chromosome are on the x axis and the score
on the y axis. The dots in the scatterplot represent SNPs (E). A horizontal
dashed line (H) shows the 5% FDR threshold. At the top of the GWAS results
view, a search box for genes is displayed (D). These genes will be displayed as a
colored band (red in the figure). The second panel (B) shows the gene
annotation and is only shown for a specific zoom range (<1.5 Mb). It will display
genes, gene features, and gene names. Moving the mouse over a gene will
display additional information in a pop-up (F), and clicking on a gene will open
the TAIR page for the specific gene. Panel (C) displays various chromosome-
wide statistics. The region highlighted by a yellow band (I) is shown in the
scatterplot and in the gene annotation. The gear icon opens a pop-up (G) with
the available statistics the user can choose from.
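The figure does not describe how GWAPP computes its 5% FDR line. The sketch below assumes the Benjamini-Hochberg procedure and a score axis of −log10(p), which is how such a cutoff is commonly obtained. Sort the m p-values, find the largest $p_{(k)}$ with $p_{(k)} \le k\,q/m$ (here q = 0.05), and draw the line at −log10 of that p-value. The p-values in the example are hypothetical.

```python
import math


def bh_threshold(pvalues, q=0.05):
    """Benjamini-Hochberg: largest p_(k) with p_(k) <= k * q / m, or None if no SNP passes."""
    m = len(pvalues)
    threshold = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k * q / m:
            threshold = p
    return threshold


if __name__ == "__main__":
    # Hypothetical p-values for ten SNPs.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    t = bh_threshold(pvals)
    if t is not None:
        # Cutoff p-value and its height on a -log10(p) score axis.
        print(t, round(-math.log10(t), 3))
```

Here the two smallest p-values pass, so the dashed line would sit at −log10(0.008) ≈ 2.1. Unlike a Bonferroni line, this cutoff depends on the observed p-values and moves from trait to trait.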
Genome-wide association mapping reveals a rich genetic architecture
of complex traits in Oryza sativa.
Earlier work showed that biparental QTL mapping approaches
are not scalable for investigating the genetic potential and tremendous phenotypic
variation of the more than 12,000 accessions available in public germplasm
repositories.
Here the authors took a global collection of 413 diverse O. sativa varieties from 82
countries, genotyped with a high-quality, custom-designed 44,100-oligonucleotide
genotyping array. For these accessions they phenotyped 34 morphological,
developmental and agronomic traits over two consecutive field seasons.
This mapping strategy evaluated variation both within and among four of the
major subgroups of rice, revealing significant heterogeneity of genetic
architecture among groups as well as gene-by-environment effects.
Fig. Phenotypic distribution and genome-wide association scan for plant
height. (a) Quantile-quantile plots for both naïve and mixed model for
plant height in all samples. (b) Boxplot showing the differences in plant
height among subpopulations. Box edges represent the upper and lower
quartiles, with the median value shown as a bold line in the middle of the box.
Whiskers represent 1.5 times the interquartile range of the data. Individuals falling
outside the range of the whiskers are shown as open dots. (c) Histogram of plant
height in all samples. The dashed black line represents the null distribution. (d)
Genome-wide P-values from the mixed model and naïve method. The x axis
shows the SNPs along each chromosome; the y axis is the −log10(P-value) for
the association. Dots in (a) and (c) indicate SNPs with P-values < 1 × 10^−4
in the mixed model and the top 50 SNPs in the naïve method; SNPs within a
200 kb range of known genes are in red; other significant SNPs are in blue.
Candidate gene locations are shown as red vertical dashed lines with names on
top.
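A quantile-quantile plot like panel (a) compares the sorted observed −log10(p) values with those expected if all p-values were Uniform(0, 1), as they are under the null hypothesis. The sketch below computes those point pairs. The (i − 0.5)/n plotting position is one of several common conventions and is an assumption here, not necessarily the one used in the figure. The p-values in the example are hypothetical. Points that rise above the diagonal across the whole range (as typically happens with a naïve test in a structured sample) suggest inflation. A departure confined to the upper tail is what true associations should produce.

```python
import math


def qq_points(pvalues):
    """Pairs (expected, observed) of -log10(p) for a QQ plot against Uniform(0, 1).

    Both lists are sorted from largest to smallest -log10(p); expected values use
    the (i - 0.5) / n plotting position.
    """
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical p-values; plot the pairs and add the y = x diagonal.
    for e, o in qq_points([0.5, 0.01, 0.1, 0.8, 0.0003]):
        print(round(e, 3), round(o, 3))
```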
A Genome-Wide Association Study Identifies Genomic Regions for
Virulence in the Non-Model Organism Heterobasidion annosum s.s
The dense single nucleotide polymorphism (SNP) panels needed for
genome-wide association (GWA) studies have hitherto been expensive to
establish and use in non-model organisms. To overcome this, the authors used a
next-generation sequencing approach both to establish SNPs and to determine
genotypes. They conducted a GWA study on a fungal species, analyzing the
virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its
hosts Picea abies and Pinus sylvestris. From a set of 33,018 single nucleotide
polymorphisms (SNPs) in 23 haploid isolates, twelve SNP markers distributed on
seven contigs were associated with virulence (P < 0.0001). Four of the contigs
harbour known virulence genes from other fungal pathogens and the remaining
three harbour novel candidate genes. Two contigs link closely to virulence
regions recognized previously by QTL mapping in the congeneric hybrid
H. irregulare × H. occidentale. The study demonstrates the efficiency of GWA
studies for dissecting important complex traits of small populations of
non-model haploid organisms with small genomes.
Genome-wide association study (GWAS) of resistance to head smut
in maize
Head smut, caused by the fungus Sphacelotheca reiliana (Kühn)
Clint, is a devastating global disease in maize, leading to severe quality
and yield loss each year. The present study is the first to conduct a
genome-wide association study (GWAS) of head smut resistance using
the Illumina MaizeSNP50 array. Out of 45,868 single nucleotide
polymorphisms in a panel of 144 inbred lines, 18 novel candidate genes
were associated with head smut resistance in maize. These candidate
genes were classified into three groups, namely, resistance genes, disease
response genes, and other genes with possible plant disease resistance
functions. The data suggested a complicated molecular mechanism of
maize resistance against S. reiliana. This study also suggested that
GWAS is a useful approach for identifying causal genetic factors for head
smut resistance in maize.
Fig. Manhattan plots of a mixed linear model (MLM) for resistance to
head smut. Points above the blue horizontal dashed line show
genome-wide significance at a moderately stringent threshold of
−log10(1/45,868). Points above the red horizontal dashed line show
genome-wide significance at the stringent threshold of −log10(0.05/45,868). The
different colors indicate different chromosomes, in the
order chromosome 1 to chromosome 10. Points with −log10(P)
values above 8 are not shown.
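The two lines in the maize figure are Bonferroni-style cutoffs on the −log10(P) scale, with 45,868 SNPs tested. A quick check follows. The helper name `neg_log10_threshold` is hypothetical. The moderately stringent line at 1/45,868 comes out at about 4.66, and the stringent line at 0.05/45,868 at about 5.96.

```python
import math


def neg_log10_threshold(alpha, n_tests):
    """Bonferroni-style genome-wide cutoff on the -log10(P) scale: -log10(alpha / n_tests)."""
    return -math.log10(alpha / n_tests)


if __name__ == "__main__":
    n_snps = 45_868  # SNPs tested in the head smut study
    print(round(neg_log10_threshold(1.0, n_snps), 2))   # moderately stringent (blue line)
    print(round(neg_log10_threshold(0.05, n_snps), 2))  # stringent (red line)
```

Dividing by the number of SNPs treats every test as independent. Because SNPs in linkage disequilibrium are correlated, this makes both lines somewhat conservative.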
Advantages of GWAS
1) The biological pathway of the trait does not have to be known.
2) Potential to discover novel candidate genes not identified through other
methodological approaches.
3) Encourages the formation of collaborative consortia to recruit a sufficient
number of participants for analysis, which tend to continue their
collaboration in subsequent analyses.
4) Rules act at specific genetic association.
5) Provides data on the ancestry of each subject, which assists in matching
case subjects with control subjects.
6) Provides data on two types of variants, sequence variants and copy
number variations, which provides more robust data.
7) Studies are large enough to identify variants explaining a few percent of
phenotypic variance.
Disadvantages
1) Results need replication in independent samples from different populations.
2) A large study population is required.
3) GWAS detect association, not causation.
4) GWAS identify a specific location, not a complete gene. Many variants identified
are nowhere near a protein-coding gene, or are within genes that were not
previously believed to be associated with a trait or condition.
5) GWAS focus on common variants.
6) GWAS detect only variants that are common (>5%) in a population.
7) Typically, for any particular trait, the cumulative effect of multiple SNPs
explains only a small fraction of an individual's risk for the trait.
Why is GWAS still popular?
Dropping genotyping costs are likely to drive association studies
away from candidate genes. Whole-genome resequencing of all
the individuals in a population will allow assessment of point
mutations, insertions, deletions and large structural variation such as copy
number variation (e.g. resequencing of Arabidopsis lyrata). In future this
will also allow RNA-seq data to be included for eQTL mapping in GWAS studies.
Population choice for GWAS studies will no longer be restricted to
model organisms and will slowly become more focused on the species that are
more relevant for answering biological questions. The accuracy of GWAS
depends on one-time genotyping and repeated phenotyping in different
environmental conditions.
Output of GWAS
• To ensure the greatest utility of GWAS results in the future, all phenotype and
genotype data should be made public and deposited in public
databases.
• As such, file formats and minimum information standards should be
established, such as those available for sequence data or microarray
experiments, with priority given to the storage and dissemination of phenotypic and
genotypic data.
Future perspectives
Despite the caveats outlined above, it seems that genome-wide
association studies of the role of common variants in complex disease will be
carried out in the near future. Initial studies will define more accurately the
principal factors, which have been summarized above, that can reduce the
power of such studies. In these studies, large sample sizes should be used,
biases taken into account, multiple-testing issues addressed and replication
studies carried out, therefore optimizing experimental design, statistical power
and cost efficiency. Close evaluation of the yields of true susceptibility loci in
relation to the cost of such rigorously designed studies will determine whether
the genome-wide analyses of common SNPs is a worthwhile approach in the
continuing dissection of the genetic basis of common disease.
Summary and conclusions
The past year has seen a remarkable shift in our capacity to dissect the
genetic basis of common diseases and continuous traits of biomedical
significance. The GWA approach has proven itself extremely well-suited to the
identification of common SNP-based variants with modest to large effects on
phenotype. Careful implementation and appropriate interpretation has resulted
in discoveries that have proven more robust than many had anticipated.
Growing numbers of novel susceptibility loci have been identified, shedding
light on the fundamental mechanisms that influence disease predisposition,
and much is being learned about the complex relationships between changes in
genome sequence and phenotypic variation.
However, we are far from the end of this particular voyage, and recent
discoveries are nothing more than initial forays into the terra incognita of our
genomes. We remain unable to explain more than a small proportion of
observed familial clustering for most multifactorial traits, a fact that
emphasizes the need to extend analysis to a more complete range of potential
susceptibility variants, and to support more explicit modelling of the joint
effects of genes and environment. Many of the greatest challenges to be
faced in the years ahead lie not so much in the identification of the association
signals themselves, but in defining the molecular mechanisms through which
they influence disease risk and/or phenotypic expression.
Reference
Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, et al. (2010)
Integrating common and rare genetic variation in diverse human
populations. Nature 467: 52–58. doi: 10.1038/nature09298.
Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, et al.
(2010) Genome-wide association study of 107 phenotypes in Arabidopsis
thaliana inbred lines. Nature 465: 627–631.
Connelly CF, Akey JM (2012) On the prospects of whole-genome association
mapping in Saccharomyces cerevisiae. Genetics.
Cooper GM, Johnson JA, Langaee TY, Feng H, Stanaway IB, et al. (2008) A
genome-wide scan for common genetic variants with a large influence on
warfarin maintenance dose. Blood 112: 1022–1027. doi: 10.1182/blood-2008-01-134247.
Cumagun CJR, Bowden RL, Jurgenson JE, Leslie JF, Miedaner T (2004)
Genetic mapping of pathogenicity and aggressiveness of Gibberella zeae
(Fusarium graminearum) toward wheat. Phytopathology 94: 520–526.
Edwards AO, Ritter R, III, Abel KJ, Manning A, Panhuysen C, et al. (2005)
Complement factor H polymorphism and age-related macular
degeneration. Science 308: 421–424. doi: 10.1126/science.1110189.
Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, et al. (2011) Population
genomics and local adaptation in wild isolates of a model microbial
eukaryote.
Freedman, M. et al. Assessing the impact of population stratification on
genetic association studies. Nat. Genet. 36, 388–393 (2004).
Freimer, N. & Sabatti, C. The use of pedigree, sib-pair and association studies
of common diseases for genetic mapping and epidemiology. Nature
Genet. 36, 1045–1051 (2004).A clear and unbiased review of the main
current genetic mapping strategies that discusses analyses using
extended pedigrees, affected sib-pairs and association.
Genomes Project Consortium (2010) A map of human genome variation from
population-scale sequencing. Nature 467: 1061–1073. doi: 10.1038/nature09534.
Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, et al. (2008)
ORegAnno: an open-access community-driven resource for regulatory
annotation. Nucleic Acids Res 36: D107-D113. doi: 10.1093/nar/gkm967
Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, et al. (2005)
Complement factor H variant increases the risk of age-related macular
degeneration. Science 308: 419–421. doi: 10.1126/science.1110359.
Hall D, Tegstrom C, Ingvarsson PK (2010) Using association mapping to
dissect the genetic basis of complex traits in plants. Brief Funct Genomics
9: 157–165.
Hawthorne B, Rees-George J, Bowen J, Ball R (1997) A single locus with a
large effect on virulence in Nectria haematococca MPI. Fungal Genet
Newsl 44: 24–26.
Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al.
(2009) Potential etiologic and functional implications of genome-wide
association loci for human diseases and traits. Proc Natl Acad Sci U S A
106: 9362–9367.
IRGSP. The map-based sequence of the rice genome. Nature 436, 793–800
(2005).
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, et al. (2005) Complement
factor H polymorphism in age-related macular degeneration. Science 308:
385–389. doi: 10.1126/science.1109557.
Lander, E.S. & Schork, N.J. Genetic dissection of complex traits. Science
265, 2037–2048 (1994).
Li, Y., Huang, Y.S., Bergelson, J., Nordborg, M., and Borevitz,
J.O. (2010). Association mapping of local climate-sensitive quantitative
trait loci in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 107:
21199–21204.
Lind M, Dalman K, Stenlid J, Karlsson B, Olson A (2007) Identification of
quantitative trait loci affecting virulence in the basidiomycete
Heterobasidion annosum s.l. Curr Genet 52: 35–44.
Lind M, van der Nest M, Olson Å, Brandström-Durling M, Stenlid J (2012)
A 2nd generation linkage map of Heterobasidion annosum s.l. based on in
silico anchoring of AFLP markers. PLoS One 7: e48347.
Liti G, Carter DM, Moses AM, Warringer J, Parts L, et al. (2009) Population
genomics of domestic and wild yeasts. Nature 458: 337–341.
Lohmueller, K. et al. Meta-analysis of genetic association studies supports a
contribution of common variants to susceptibility to common disease.
Nat. Genet. 33, 177–182 (2003).
Muller LAH, Lucas JE, Georgianna DR, McCusker JH (2011) Genome-wide
association analysis of clinical vs. nonclinical origin provides insights
into Saccharomyces cerevisiae pathogenesis. Mol Ecol 20: 4085–4097.
Neafsey DE, Barker BM, Sharpton TJ, Stajich JE, Park DJ, et al.
(2010) Population genomic sequencing of Coccidioides fungi reveals
recent hybridization and transposon control. Genome Res 20: 938–946.
Olson A, Stenlid J (2001) Plant pathogens - Mitochondrial control of
fungal hybrid virulence. Nature 411: 438–438.
Ozoki, K., 2001, A high throughput SNP typing system for GWAS, Springer.
16:1134-1137.
Pandelova I, Ciuffetti LM (2005) A proteomics-based approach for identification
of the Tox D gene. Fungal Genet Newsl 52.
Price, A. L. et al. Principal components analysis corrects for stratification in
genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Santoyo F, Gonzalez AE, Terron MC, Ramirez L, Pisabarro AG
(2008) Quantitative linkage mapping of lignin-degrading enzymatic
activities in Pleurotus ostreatus. Enzyme Microb Technol 43: 137–143.
Tohn, P, A., 2009, Validating and refining GWAS signals, Nature. 10:318-329.
Seren Ü, Vilhjálmsson BJ, et al. (2012) GWAPP: A web application for genome-wide
association mapping in Arabidopsis. Plant Cell 24: 4793–4805.
Yamamoto, T., Yonemaru, J. & Yano, M. Towards the understanding of
complex traits in rice: substantially or superficially? DNA Res. 16, 141–154
(2009).
Yan J, Shah T, Warburton ML, Buckler ES, McMullen MD, et al. (2009)
Genetic characterization and linkage disequilibrium estimation of a
global maize collection using SNP markers. PLoS One 4.
Zhang H, Zhao Q, Liu K, Zhang Z, Wang Y, et al. (2009) MgCRZ1, a
transcription factor of Magnaporthe grisea, controls growth, development
and is involved in full virulence. FEMS Microbiol Lett 293: 160–169.

From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingJoaquin Dopazo
 
Roleoffunctionalgenomicsincropimprovement ashishgautam
Roleoffunctionalgenomicsincropimprovement ashishgautamRoleoffunctionalgenomicsincropimprovement ashishgautam
Roleoffunctionalgenomicsincropimprovement ashishgautamAshish Gautam
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptxAkshitaAwasthi3
 
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...Genetic variability and phylogenetic relationships studies of Aegilops L. usi...
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...Innspub Net
 
Genome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationGenome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationKiranKm11
 
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...Potential for Genomic Selection in indigenous breeds and results of GWAS in G...
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...Superior Animal Genetics (SAG)
 
QTL mapping current status and future prospects
QTL mapping current status and future prospectsQTL mapping current status and future prospects
QTL mapping current status and future prospectsRana Asif Abbas
 
Forward and reverse genetics
Forward and reverse geneticsForward and reverse genetics
Forward and reverse geneticsVinod Pawar
 
Marker assisted selection lecture
Marker assisted selection lectureMarker assisted selection lecture
Marker assisted selection lectureBruno Mmassy
 
Sequencing-based Genotyping Assays
Sequencing-based Genotyping AssaysSequencing-based Genotyping Assays
Sequencing-based Genotyping AssaysKikoGarcia13
 
QTL mapping and analysis.pptx
QTL mapping and analysis.pptxQTL mapping and analysis.pptx
QTL mapping and analysis.pptxSarathS586768
 
Omics for crop improvement (new)
Omics for crop improvement (new)Omics for crop improvement (new)
Omics for crop improvement (new)Gokul Dhana
 
How Can Ngs Forward Research Essay
How Can Ngs Forward Research EssayHow Can Ngs Forward Research Essay
How Can Ngs Forward Research EssayStefanie Yang
 

Semelhante a Report- Genome wide association studies. (20)

Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategies
 
A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The Clinic
 
Molecular markers for measuring genetic diversity
Molecular markers for measuring genetic diversity Molecular markers for measuring genetic diversity
Molecular markers for measuring genetic diversity
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
GWAS Study.pdf
GWAS Study.pdfGWAS Study.pdf
GWAS Study.pdf
 
Pharmacogenomics
PharmacogenomicsPharmacogenomics
Pharmacogenomics
 
Roleoffunctionalgenomicsincropimprovement ashishgautam
Roleoffunctionalgenomicsincropimprovement ashishgautamRoleoffunctionalgenomicsincropimprovement ashishgautam
Roleoffunctionalgenomicsincropimprovement ashishgautam
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptx
 
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...Genetic variability and phylogenetic relationships studies of Aegilops L. usi...
Genetic variability and phylogenetic relationships studies of Aegilops L. usi...
 
Genome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationGenome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome exploration
 
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...Potential for Genomic Selection in indigenous breeds and results of GWAS in G...
Potential for Genomic Selection in indigenous breeds and results of GWAS in G...
 
QTL mapping current status and future prospects
QTL mapping current status and future prospectsQTL mapping current status and future prospects
QTL mapping current status and future prospects
 
Forward and reverse genetics
Forward and reverse geneticsForward and reverse genetics
Forward and reverse genetics
 
Marker assisted selection lecture
Marker assisted selection lectureMarker assisted selection lecture
Marker assisted selection lecture
 
Sequencing-based Genotyping Assays
Sequencing-based Genotyping AssaysSequencing-based Genotyping Assays
Sequencing-based Genotyping Assays
 
QTLS......pptx
QTLS......pptxQTLS......pptx
QTLS......pptx
 
Nikhil ahlawat
Nikhil ahlawatNikhil ahlawat
Nikhil ahlawat
 
QTL mapping and analysis.pptx
QTL mapping and analysis.pptxQTL mapping and analysis.pptx
QTL mapping and analysis.pptx
 
Omics for crop improvement (new)
Omics for crop improvement (new)Omics for crop improvement (new)
Omics for crop improvement (new)
 
How Can Ngs Forward Research Essay
How Can Ngs Forward Research EssayHow Can Ngs Forward Research Essay
How Can Ngs Forward Research Essay
 

Mais de Varsha Gayatonde (20)

Leaf structure and function
Leaf structure and functionLeaf structure and function
Leaf structure and function
 
Tomato
Tomato   Tomato
Tomato
 
Tobacco
TobaccoTobacco
Tobacco
 
Sunflower basavraj t
Sunflower   basavraj tSunflower   basavraj t
Sunflower basavraj t
 
Soyabean
Soyabean   Soyabean
Soyabean
 
Sorghum varu gaitonde.
Sorghum   varu gaitonde.Sorghum   varu gaitonde.
Sorghum varu gaitonde.
 
Sesame
SesameSesame
Sesame
 
Pumpkin
Pumpkin   Pumpkin
Pumpkin
 
Pigeon pea
Pigeon peaPigeon pea
Pigeon pea
 
Pigeon pea
Pigeon peaPigeon pea
Pigeon pea
 
Pea
PeaPea
Pea
 
Okra
OkraOkra
Okra
 
Oats
OatsOats
Oats
 
Maize
MaizeMaize
Maize
 
Groundnut
GroundnutGroundnut
Groundnut
 
Green gram
Green gramGreen gram
Green gram
 
Field bean
Field beanField bean
Field bean
 
Cowpea
CowpeaCowpea
Cowpea
 
Cowpea 12
Cowpea 12Cowpea 12
Cowpea 12
 
Cotton
CottonCotton
Cotton
 

Último

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 

Último (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Report- Genome wide association studies.

DEPARTMENT OF GENETICS AND PLANT BREEDING
GKVK, UNIVERSITY OF AGRICULTURAL SCIENCES, BANGALORE – 560 065
First Seminar: GPB 581 (0+1)
GENOME WIDE ASSOCIATION STUDIES
Synopsis
A genome-wide association study (GWAS) is a study design in which many markers spread across the genome are genotyped, and tests of statistical association with a phenotype are performed locally along the genome. Put another way, it examines many common genetic variants in different individuals to see whether any variant is associated with a trait. The first prospects for whole-genome association studies appeared in early 2002¹. LD-based association mapping started in humans and was later extended to Arabidopsis, rice, grapevine, wheat, soybean, maize, tomato and other organisms. HapMap, the multi-country effort to identify and catalog common human genetic variants, was a milestone that helped extend GWAS to other organisms and make it more powerful. The SNPs chosen need to be widely distributed across the genome so that they reflect its genetic variation. Selecting suitable markers enables fine mapping², and genome-wide chips have increased marker coverage, improving the power of association signals. However, greater coverage does not necessarily increase the power to detect associated loci. Other drawbacks are the need for large population sizes, the cost of pooling and preparing DNA samples, and limited knowledge about the risk associated with a trait. To overcome these drawbacks, researchers have recently improved statistical approaches and genotype imputation and developed advanced approaches such as nested association mapping, the candidate-gene association approach and web applications for GWAS (GWAPP in Arabidopsis)³. Despite its drawbacks, GWAS remains popular because genotyping costs keep falling, which is likely to drive association studies away from candidate-gene-based designs.
This shift will likely involve whole-genome resequencing of all individuals in a population, which will allow assessment of the effects of point mutations, insertions, deletions and large structural variations. GWAS has been used worldwide to study factors such as the effect of temperature on cobs, agronomic variation, agroclimatic diversity, flowering and grain-yield traits, and disease diversity. A major benefit of GWAS is that a population genotyped once can be phenotyped repeatedly under different environmental conditions, so that any number (n) of traits can be studied within a short period over a large area. With the rapid development of high-throughput sequencing technology, the choice of population for GWAS will no longer be restricted to current model organisms and will increasingly focus on the species most relevant for answering biological questions.
References:
1. Ozoki, K., 2001. A high throughput SNP typing system for GWAS. Springer, 16: 1134-1137.
2. Tohn, P. A., 2009. Validating and refining GWAS signals. Nature, 10: 318-329.
Name: VARSHA  ID: PALB 2235  Date: 29/10/2013  Time: 10:00 AM
Introduction
The level of genetic diversity is pivotal for world food security and for the survival of human civilization. Domestication produced improved cultivars in several crops that supply the human diet. Of about 150 plant species presently cultivated in agriculture, twelve provide about 75% of human food and four produce 50% of the human diet. According to an FAO report, about 800 million people suffer from food deficiency, so agricultural production needs to be improved to eliminate, or at least reduce, this problem. The narrow genetic base of modern crop cultivars is a serious obstacle to sustaining and improving crop productivity, because it leaves crops vulnerable to new biotic and abiotic stresses. Plant germplasm resources, comprising wild plant species, modern cultivars and their crop wild relatives, are important reservoirs of natural genetic variation, which arose from a number of historical genetic events in response to environmental stresses and to selection during crop domestication.
• The objective of genetic mapping is to identify simply inherited markers in close proximity to genetic factors affecting quantitative traits (quantitative trait loci, or QTL).
• This localization relies on processes that create a statistical association between marker and QTL alleles, and on processes that selectively reduce that association as a function of the marker's distance from the QTL.
Why do we need genome mapping?
A gene map shows where genes lie on the chromosomes. In eukaryotes, genes are tightly packed within a compact chromatin structure, and we need to know which gene is responsible for the trait of interest. Genotypes are expressed as phenotypes, but we cannot observe genes and genotypes directly, so their positions must be inferred mathematically from marker and phenotype data. Mapping technology is being developed further so that the genes behind traits can be identified more easily, more cheaply and in a shorter time.
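The idea that marker-QTL association weakens with marker distance can be made concrete with two textbook results that this report does not state itself: Haldane's mapping function, r = ½(1 − e^(−2d)), which converts a map distance d (in Morgans) into a recombination fraction r assuming no crossover interference, and the decay of the LD coefficient D by a factor (1 − r) in each generation of random mating. The sketch below is illustrative only; the starting value of D and the 50-generation horizon are arbitrary choices.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane; no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def ld_after(d0: float, r: float, generations: int) -> float:
    """LD coefficient after t generations of random mating: D_t = (1 - r)**t * D_0."""
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    d0 = 0.25  # largest possible D when both loci have allele frequencies of 0.5
    for cm in (1, 10, 50):
        r = haldane_r(cm / 100.0)  # 1 cM = 0.01 Morgan
        print(f"{cm:>2} cM: r = {r:.4f}, D after 50 generations = {ld_after(d0, r, 50):.4f}")
```

A marker 1 cM from the QTL keeps most of its initial association after 50 generations, whereas one 50 cM away loses essentially all of it, which is why only tightly linked markers remain informative.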
Genome
• The genome is all the DNA in a cell: all the DNA on all the chromosomes, including genes, intergenic sequences and repeats. More specifically, a genome is all the DNA in one organelle.
• Eukaryotes can have two or three genomes: the nuclear genome, the mitochondrial genome and (in plants) the plastid genome. If not specified, "genome" usually refers to the nuclear genome.
Terminologies
False negative: declaring an outcome statistically non-significant when the effect is actually genuine.
False positive: declaring an outcome statistically significant when there is no true effect.
Linkage: co-inheritance of loci that lie close together on the same chromosome.
Linkage equilibrium (LE): random association of alleles at different loci; the frequency of each haplotype equals the product of the frequencies of its alleles.
Linkage disequilibrium (LD): non-random association of alleles at different loci, i.e. haplotype frequencies in a population that differ from those expected under LE.
Minor allele frequency (MAF): the frequency of the less common allele at a polymorphic locus. Its value lies between 0 and 0.5 and can vary between populations.
Odds ratio (OR): a measure of association commonly used in case-control studies, defined as the odds of exposure to the susceptibility variant in cases divided by the odds in controls. If the OR is significantly greater than 1, the variant is associated with the disease.
Association Mapping
Association mapping is a high-resolution method for mapping quantitative trait loci based on linkage disequilibrium. Association refers to covariance between a marker polymorphism and a trait of interest. The first association study to attempt a genome scan in plants was conducted in sea beet (Beta vulgaris ssp. maritima), a wild relative of sugar beet (Beta vulgaris ssp. vulgaris). The first association study of a quantitative trait based on a candidate gene was the analysis of flowering time and the dwarf8 (d8) gene in maize. Association mapping is based on the principle of linkage disequilibrium and uses the entire population.
How does it work?
A group of unrelated individuals normally shows variation for many phenotypic traits, so several traits can be studied in the same population using the same genotypic data. A higher proportion of molecular markers is likely to be polymorphic, providing better genome coverage than any biparental map. When elite lines are used, multi-year and multi-location phenotypic data may already be available at no additional cost.
Goals of association mapping
Identification of susceptibility variants, replication in different cohorts or populations, and understanding of genetic function at the cellular level. This can lead to the identification of durable targets, the development of drugs for prevention, and a better understanding of the cellular processes involved in disease and its treatment.
Association mapping offers many advantages over linkage analysis:
• much higher mapping resolution;
• a greater number of alleles and a broader reference population;
• less research time needed to establish an association;
• it uses existing individuals;
• multi-trial phenotypic data stored in databases can be used.
Limitations
• Resources needed for phenotyping, and statistical issues.
• Population structure can cause spurious associations.
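The odds ratio from the terminology list can be computed from a 2×2 table of allele counts in cases and controls. A minimal sketch follows; the counts are invented for illustration, and the 95% confidence interval uses Woolf's log method, which is one common choice rather than a method stated in this report.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio of the risk allele in cases vs controls, with a Woolf (log) confidence interval.

    Arguments are allele counts; z = 1.96 gives an approximate 95% interval.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all cells must be positive (consider adding 0.5 to each cell)")
    odds_ratio = (case_risk / case_other) / (ctrl_risk / ctrl_other)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical data: 200 cases and 200 controls, i.e. 400 alleles per group.
    or_, lo, hi = allelic_odds_ratio(240, 160, 180, 220)
    print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

In this made-up example the odds of the risk allele are 1.5 in cases and about 0.82 in controls, giving an OR of about 1.83; because the whole interval lies above 1, the variant would be reported as associated at roughly the 5% level.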
Two types of association mapping
The success of either method depends on population size and the extent of LD.
• Genome-wide scanning: markers span the whole genome; suited to moderate to extensive LD. If LD is extensive, genome-wide association is useful but gives only low-resolution mapping.
• Candidate-gene scanning: only candidate genes are sequenced; suited to low LD.
Figure: Flowchart of a gene association study
What are genome-wide association studies?
Genome-wide association studies are a relatively new way for scientists to identify genes involved in human disease. This method searches the genome for small variations, called single nucleotide polymorphisms or SNPs (pronounced "snips"), that occur more frequently in people with a particular disease than in people without it. Each study can look at hundreds or thousands of SNPs at the same time. Researchers use data from this type of study to pinpoint genes that may contribute to a person's risk of developing a certain disease. Because genome-wide association studies examine SNPs across the genome, they are a promising way to study complex, common diseases in which many genetic variations contribute to a person's risk. This approach has already identified SNPs related to several complex conditions, including diabetes, heart abnormalities, Parkinson disease and Crohn disease. Researchers hope that future genome-wide association studies will identify more SNPs associated with chronic diseases, as well as variations that affect a person's response to certain drugs and that influence interactions between a person's genes and the environment.
Comparison of GWAS and biparental mapping
1) GWAS: no cross required; works with existing germplasm. Biparental mapping: experimental cross required.
2) GWAS: phenotypic data may already be available. Biparental mapping: phenotypes must be collected.
3) GWAS: high resolution. Biparental mapping: limited mapping resolution.
4) GWAS: more than two alleles are tested. Biparental mapping: essentially two alleles are tested.
5) GWAS: many loci for a single trait are analysed concurrently. Biparental mapping: constrained to loci segregating between the parental lines.
6) GWAS: comparatively low detection power. Biparental mapping: high detection power.
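The core comparison behind "SNPs that occur more frequently in people with a particular disease" is a test of allele counts between cases and controls. One common version is the allelic chi-square test with one degree of freedom, sketched below with invented counts. It assumes each individual contributes two independent alleles (roughly, Hardy-Weinberg equilibrium); trend tests or regression models are frequently preferred in practice.

```python
import math


def allelic_chi2(case_counts, ctrl_counts):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    case_counts, ctrl_counts: (count of allele A, count of allele a) in each group.
    Returns (chi-square statistic, p-value).
    """
    table = (tuple(case_counts), tuple(ctrl_counts))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty group or monomorphic SNP: test is undefined")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, chi2 = z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allelic_chi2((240, 160), (180, 220))  # hypothetical allele counts
    print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
```

Running this test once per SNP across the genome, and then ranking the results, is the basic scan that the GWAS description refers to.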
Synonyms: genome-wide case-control studies; genome-wide genetic association analysis.
Genome-wide association studies (GWAS) are projects that investigate the statistical association between phenotypes and a dense set of genetic markers that capture a substantial amount of the genetic variation in the genome, using a large number of matched samples. Phenotypes can be qualitative traits, such as disease status, or quantitative traits, such as blood pressure. The statistical association between disease status and the alleles of a genetic marker is tested by categorical data analysis. Genetic markers are usually genotyped with microarray chips. Whether substantial genetic variation in the genome, including common, rare and structural variants, is captured by the set of markers depends on the number of markers and their chromosomal locations. The typical number of single nucleotide polymorphism (SNP) markers used in a current GWAS […]
[…] societies depend on the exploitation of genetic recombination and allelic diversity for crop improvement, and many of the world's farmers depend directly on the harvests of the genetic diversity they sow for food and fodder as well as for the next season's seed (Smale et al., 2004). The considerable genetic diversity of traditional crop varieties is the most immediately useful and economically valuable part of global biodiversity. Subsistence farmers use landraces as a key component of their cropping systems; such farmers account for about 60% of agricultural land use and provide approximately 15-20% of the world's food (Francis, 1986). In addition, landraces are the basic raw material used by plant breeders to develop modern varieties. Over the last few decades, awareness of the rich diversity of exotic or wild germplasm has increased. This has led to more intensive use of this germplasm in breeding, and the yields of many crops have increased dramatically as a result.
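For quantitative traits such as blood pressure (or, in crops, yield or flowering time), a simple per-SNP test regresses the phenotype on allele dosage, the number (0, 1 or 2) of copies of one allele. The sketch below fits that additive model by ordinary least squares with the standard library only; the data are invented, and it deliberately omits the covariates (for example, population-structure or kinship terms in mixed models) that real GWAS include to guard against spurious associations caused by population structure.

```python
import math
from statistics import mean


def dosage_regression(dosages, phenotypes):
    """Additive single-SNP model: phenotype = alpha + beta * dosage + error, fitted by OLS.

    dosages: copies (0, 1 or 2) of the coded allele per individual.
    Returns (beta, standard error of beta, t-statistic).
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(dosages), mean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx  # estimated effect per copy of the coded allele
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: t-statistic is undefined")
    return beta, se, beta / se


if __name__ == "__main__":
    dosages = [0, 0, 1, 1, 2, 2]
    heights = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # hypothetical phenotype values
    print(dosage_regression(dosages, heights))
```

The t-statistic would then be converted to a p-value (t distribution with n − 2 degrees of freedom) and fed into the same genome-wide ranking as the case-control test.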
Aim: to identify which regions (or SNPs) in the genome are associated with a disease or a particular phenotype.
Design: identify the population structure; select case subjects (those with the disease); select control subjects (healthy); genotype about a million SNPs for each subject; determine which SNPs are associated; encode the data; rank the SNPs.
History of GWAS
A successful GWAS published in 2005 investigated patients with age-related macular degeneration. Before GWAS (around 2000), inheritance was studied mainly through linkage studies in families. Then a revolution came with HapMap (2003), which used a variety of sequencing techniques to discover and catalog SNPs in different populations.
Human Genome Project
The Human Genome Project was declared complete in April 2003. An initial rough draft of the human genome was available in June 2000, and by February 2001 a working draft had been completed and published, followed by the final sequence map of the human genome on April 14, 2003. Although this was reported to cover 99% of the human genome with 99.99% accuracy, a major quality assessment of the human genome sequence published on May 27, 2004 indicated that over 92% of the sampled sequence exceeded 99.99% accuracy, which is within the intended goal. Further analyses and papers on the HGP continue to appear.
HapMap
HapMap is a multi-country effort to identify and catalog common human genetic variants. It was developed to better understand and catalog LD patterns across the genome in several populations.
• It genotyped ~4 million SNPs in samples of African, East Asian and European ancestry.
• All genotype data are in a publicly available database, from which they can be downloaded.
• It can be used to examine LD patterns across the genome and to estimate the approximate coverage of a given SNP chip.
• About 80-90% of common SNPs can be represented with ~300,000 tag SNPs for European or Asian samples and ~500,000 tag SNPs for African samples.
1000 Genomes Project
Another spin-off of the Human Genome Project, the 1000 Genomes Project, was launched in 2008. This three-year project covered most countries worldwide and mainly targeted African countries. It has shown researchers how dynamic the human genome really is, and why.
Concepts underlying the study design
Single nucleotide polymorphisms
The modern unit of genetic variation is the single nucleotide polymorphism, or SNP. SNPs are single base-pair changes in the DNA sequence that occur with high frequency in the human genome. For the purposes of genetic studies, SNPs are typically used as markers of a genomic region, and the large majority of them have minimal impact on biological systems. SNPs can have functional consequences, however, causing amino acid changes, changes to mRNA transcript stability and changes to transcription factor binding affinity. SNPs are by far the most abundant form of genetic variation in the human genome.
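The design steps of determining which SNPs are associated and ranking them must account for the very large number of tests: with about a million SNPs, a nominal 5% threshold would produce tens of thousands of false positives by chance alone. A widely used convention is a Bonferroni-style genome-wide threshold of 0.05 divided by the number of tests, which gives the familiar 5 × 10⁻⁸ for one million SNPs. The sketch below ranks per-SNP p-values and flags those that pass; the SNP identifiers and p-values are hypothetical, and −log10(p) is included because that is the value usually plotted in Manhattan plots.

```python
import math


def rank_snps(pvalues, n_tests=None, alpha=0.05):
    """Rank SNPs by p-value and flag those at or below a Bonferroni threshold.

    pvalues: dict mapping SNP identifier -> p-value from a per-SNP association test.
    n_tests: number of tests performed genome-wide (defaults to len(pvalues)).
    Returns (threshold, list of (snp, p, -log10 p, significant) sorted by p).
    """
    m = n_tests if n_tests is not None else len(pvalues)
    threshold = alpha / m
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    return threshold, [(snp, p, -math.log10(p), p <= threshold) for snp, p in ranked]


if __name__ == "__main__":
    results = {"snp_A": 3.2e-9, "snp_B": 0.004, "snp_C": 7.5e-7}  # hypothetical
    threshold, table = rank_snps(results, n_tests=1_000_000)
    print(f"genome-wide threshold = {threshold:.1e}")
    for snp, p, score, hit in table:
        print(snp, f"{p:.1e}", f"{score:.2f}", "significant" if hit else "")
```

Bonferroni is conservative because nearby SNPs in LD are correlated tests; it is shown here only as the simplest way to make the ranking step concrete.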
SNPs are notably a type of common genetic variation; many SNPs are present in a large proportion of human populations. SNPs typically have two alleles, meaning within a population there are two
commonly occurring base-pair possibilities at a SNP location. The frequency of a SNP is given in terms of the minor allele frequency, i.e. the frequency of the less common allele. For example, a SNP with a minor allele (G) frequency of 0.40 implies that 40% of the alleles in the population are G, while the more common allele (the major allele) accounts for the remaining 60%. Linkage Disequilibrium: Linkage disequilibrium (LD) is a property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of one SNP is inherited with, or correlated with, an allele of another SNP within a population. The term linkage disequilibrium was coined by population geneticists in an attempt to mathematically describe changes in genetic variation within a population over time. It is related to the concept of chromosomal linkage, where two markers on a chromosome remain physically joined through the generations of a family. Recombination events from generation to generation break apart chromosomal segments. This effect accumulates over generations, and in a population of fixed size undergoing random mating, repeated random recombination events will break apart segments of contiguous chromosome (containing linked alleles) until eventually all alleles in the population are in linkage equilibrium, i.e. independent. Thus, linkage between markers on a population scale is referred to as linkage disequilibrium. The rate of LD decay depends on multiple factors, including the population size, the number of founding chromosomes in the population, and the number of generations for which the population has existed. As such, different human sub-populations have different degrees and patterns of LD. African-descent populations are the most ancestral and have smaller regions of LD due to the accumulation of more recombination events in that group. 
European-descent and Asian-descent populations were created by founder events (a sampling of chromosomes from the African population), which altered the number of founding chromosomes, the population size, and the generational age of the population. These populations on average have larger regions of LD than African-descent groups. Many measures of LD have been proposed, though all are ultimately related to the difference between the observed frequency of co-occurrence of two alleles (i.e. a two-marker haplotype) and the frequency expected if the two markers are independent. The two commonly used measures of linkage disequilibrium are D′ and r². LD in animal systems
LD in humans: LD has been studied extensively in humans (Homo sapiens) (Pritchard & Przeworski). There is tremendous heterogeneity in human LD estimates because of differences in loci, marker types (microsatellites versus SNPs), sample populations, and chromosome type (sex chromosomes versus autosomes). LD in other animal systems: LD studies have also been conducted in cattle and fruit flies (Drosophila melanogaster). Extensive LD has been reported in Dutch black-and-white dairy cattle populations, a relevant factor being the globalization of semen trading. LD in plant systems: Maize: In maize (Zea mays ssp. mays), several studies have investigated LD over a wide range of populations and marker types. The patterns of LD vary substantially with the population chosen. Tenaillon et al. investigated sequence diversity at 21 loci on chromosome 1 in a diverse group of maize germplasm. Arabidopsis: The LD pattern in Arabidopsis is in sharp contrast to the pattern in maize. During the last five years most studies have described LD in wheat and barley, besides single reports on rice, ryegrass, soybean, sugarcane and sorghum. Factors that increase LD include inbreeding, small population size, genetic isolation between lineages, population subdivision, low recombination rate, population admixture, natural and artificial selection, and balancing selection. Factors that decrease or disrupt LD include outcrossing, high recombination rate and high mutation rate.
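The two LD measures named above, D′ and r², can both be computed from the four two-marker haplotype frequencies. The following is a minimal Python sketch, assuming the haplotype frequencies (AB, Ab, aB, ab) are already known, for example from phased data; the function name `ld_measures` is illustrative and not taken from any particular package.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) for two biallelic SNPs (alleles A/a and B/b).

    Inputs are the four two-marker haplotype frequencies, which must sum to 1.
    """
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first SNP
    p_B = p_AB + p_aB  # frequency of allele B at the second SNP

    # D: observed AB haplotype frequency minus the frequency expected
    # if the two SNPs were independent (linkage equilibrium).
    D = p_AB - p_A * p_B

    # D' = |D| / D_max, where D_max is the largest |D| the allele
    # frequencies allow for the observed sign of D.
    if D > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles carried at the two SNPs.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Hypothetical SNP pair: haplotype Ab is never seen, but B is more common than A.
print(ld_measures(0.3, 0.0, 0.2, 0.5))
```

The example shows why both measures are reported: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when just two haplotypes occur, so the two SNPs also have matching allele frequencies. r² is the measure usually used when choosing tag SNPs.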
Stages of GWAS designs: 1) One-stage design: first used in the HapMap project; it genotypes all samples on all markers, which makes it costly. 2) Two-stage design: it reduces the genotyping requirement and the false positive rate. 3) Multistage designs: joint analysis has more power than replication, and the p-value threshold in stage 1 must be liberal (the CaTS power calculator can be used to plan such designs). Signals from an initial, first-stage GWA scan are used to define a subset of SNPs that are retyped in additional second-stage samples; this lowers cost but does not gain power. Analysis of GWAS: The most common approach is to look at each SNP one at a time, possibly adding multi-marker information, and then to investigate further and report only the top SNPs (or to use backwards replication). The trend test is used most commonly, and the log-additive model and logistic regression are the foremost methods for analyzing GWAS. Analyses should be adjusted for potential population stratification. Basics for GWAS: calculate the odds ratio (OR).
• For two events A and B: $\mathrm{OR} = \dfrac{\mathrm{Odds}(A)}{\mathrm{Odds}(B)} = \dfrac{\Pr(A)/\Pr(\neg A)}{\Pr(B)/\Pr(\neg B)}$.
• Comparing the odds of disease D in two genotype groups (G = 1 versus G = 0): $\mathrm{OR} = \dfrac{\mathrm{Odds}(D\mid G=1)}{\mathrm{Odds}(D\mid G=0)} = \dfrac{\Pr(D\mid G=1)/\Pr(\neg D\mid G=1)}{\Pr(D\mid G=0)/\Pr(\neg D\mid G=0)} = \dfrac{\Pr(D\mid G=1)}{\Pr(D\mid G=0)}\times\dfrac{\Pr(\neg D\mid G=0)}{\Pr(\neg D\mid G=1)} = \mathrm{RR}\times\dfrac{\Pr(\neg D\mid G=0)}{\Pr(\neg D\mid G=1)}$, where RR is the relative risk.
• Symmetry of the odds ratio: $\mathrm{OR} = \dfrac{\mathrm{Odds}(D\mid G=1)}{\mathrm{Odds}(D\mid G=0)} = \dfrac{\mathrm{Odds}(G\mid D=1)}{\mathrm{Odds}(G\mid D=0)}$.
Significance is tested with a chi-square test, and SNPs are ranked by p-value (statistical test of association). Challenges to be addressed while conducting GWAS
Multiple hypothesis testing: In a GWAS the number of statistical tests is commonly on the order of 10⁶. At a significance level of 0.01 we would therefore expect 10,000 false positives, so individual p-values < 0.01 are no longer meaningful, and correction for multiple hypothesis testing is critical. Population structure: confounding by population structure leads to false positives. Statistical power and resolution: these are challenged by small samples, the large number of hypotheses tested, the need for increased power, and the testing of compound hypotheses. Association Test. Single Locus Analysis: When a well-defined phenotype has been selected for a study population, and genotypes have been collected using sound techniques, the statistical analysis of genetic data can begin. The de facto analysis of genome-wide association data is a series of single-locus statistical tests, examining each SNP independently for association with the phenotype. The statistical test conducted depends on a variety of factors, but first and foremost, statistical tests differ for quantitative traits versus case/control studies. Quantitative traits are generally analyzed using generalized linear model (GLM) approaches, most commonly analysis of variance (ANOVA), which is similar to linear regression with a categorical predictor variable, in this case genotype classes. The null hypothesis of an ANOVA using a single SNP is that there is no difference between the trait means of any genotype groups. The assumptions of GLM and ANOVA are that 1) the trait is normally distributed; 2) the trait variance within each group is the same (the groups are homoscedastic); and 3) the groups are independent. Dichotomous case/control traits are generally analyzed using either contingency table methods or logistic regression. Contingency table tests measure the deviation from the independence expected under the null hypothesis that there is no association between the phenotype and genotype classes. 
The most ubiquitous form of this test is the popular chi-square test (and the related Fisher’s exact test). Logistic regression is an extension of linear regression where the outcome of a linear model is transformed using a logistic function that predicts the probability of having case status given a genotype class. Logistic regression is often the preferred approach because it allows for adjustment for clinical covariates (and other factors), and can provide adjusted odds ratios as a measure of effect size. Logistic regression has been extensively developed, and numerous diagnostic procedures are available to aid interpretation of the model.
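The single-SNP case/control workflow described above (a 2×2 allele-count table, a chi-square test, an odds ratio as effect size, ranking SNPs by p-value, and a correction for about 10⁶ tests) can be sketched in plain Python. This is a minimal sketch assuming allele counts are already tabulated; the function names and all counts are hypothetical, and Bonferroni correction is used here as one simple choice of multiple-testing correction.

```python
import math


def allelic_association(case_counts, control_counts):
    """Allelic 2x2 chi-square test for one SNP.

    case_counts / control_counts: (copies of allele 1, copies of allele 2);
    each individual contributes two alleles. Returns (odds_ratio, chi2, p).
    """
    a, b = case_counts     # cases: allele 1, allele 2
    c, d = control_counts  # controls: allele 1, allele 2
    n = a + b + c + d
    # Odds of allele 1 in cases divided by odds of allele 1 in controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p


def bonferroni_threshold(alpha, n_tests):
    """Per-SNP threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical allele counts ((cases), (controls)) for three SNPs.
counts = {
    "snp_1": ((300, 700), (200, 800)),
    "snp_2": ((250, 750), (245, 755)),
    "snp_3": ((420, 580), (380, 620)),
}
n_tests = 1_000_000
threshold = bonferroni_threshold(0.05, n_tests)  # 5e-08
print("expected false positives at p < 0.01:", n_tests * 0.01)
results = sorted(
    ((name, *allelic_association(cs, ct)) for name, (cs, ct) in counts.items()),
    key=lambda row: row[3],  # rank SNPs by p-value
)
for name, odds_ratio, chi2, p in results:
    print(f"{name}: OR={odds_ratio:.2f} chi2={chi2:.2f} p={p:.2e} significant={p < threshold}")
```

Note that snp_1 is strongly associated (p of order 10⁻⁷) yet still misses the 5 × 10⁻⁸ threshold, which is exactly the effect of correcting for a million tests.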
For both quantitative and dichotomous trait analysis (regardless of the analysis method), there are a variety of ways in which genotype data can be encoded for association tests. The choice of encoding can affect the statistical power of a test, as the degrees of freedom of the test may change depending on the number of genotype-based groups that are formed. Allelic association tests examine the association between one allele of the SNP and the phenotype. Genotypic association tests examine the association between genotypes (or genotype classes) and the phenotype. The genotypes for a SNP can also be grouped into genotype classes or models, such as dominant, recessive, multiplicative, or additive models. Each model makes different assumptions about the genetic effect in the data. Assuming two alleles for a SNP, A and a, a dominant model (for A) assumes that having one or more copies of the A allele increases risk compared to a (i.e. Aa or AA genotypes have higher risk). The recessive model (for A) assumes that two copies of the A allele are required to alter risk, so individuals with the AA genotype are compared to individuals with Aa and aa genotypes. The multiplicative model (for A) assumes that if there is a 3× risk for having a single A allele, there is a 9× risk for having two copies of the A allele: in this case, if the risk for Aa is k, the risk for AA is k². The additive model (for A) assumes that there is a uniform, linear increase in risk for each copy of the A allele, so if the risk is 3× for Aa, there is a 6× risk for AA: in this case the risk for Aa is k and the risk for AA is 2k. A common practice in GWAS is to examine additive models only, as the additive model has reasonable power to detect both additive and dominant effects, but it is important to note that an additive model may be underpowered to detect some recessive effects. 
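The genotype encodings and risk models just described can be made concrete with a short sketch. It assumes genotypes are stored as the number of copies of allele A (0 = aa, 1 = Aa, 2 = AA), a common convention; the function names are hypothetical, and `genotype_risks` simply follows the k/k² and k/2k relations stated in the text.

```python
def encode(n_A, model):
    """Code a genotype given as the number of copies of allele A (0 = aa, 1 = Aa, 2 = AA)."""
    if n_A not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of A")
    if model == "additive":
        return n_A                   # one step per copy of A
    if model == "dominant":
        return 1 if n_A >= 1 else 0  # Aa and AA versus aa
    if model == "recessive":
        return 1 if n_A == 2 else 0  # AA versus Aa and aa
    raise ValueError(f"unknown model: {model}")


def genotype_risks(k, model):
    """Risk of (Aa, AA) relative to aa when the Aa risk is k, as stated in the text."""
    if model == "multiplicative":
        return (k, k * k)            # e.g. 3x and 9x
    if model == "additive":
        return (k, 2 * k)            # e.g. 3x and 6x
    raise ValueError(f"unknown model: {model}")


genotypes = [0, 1, 2, 1, 0]  # hypothetical genotypes for five individuals
for model in ("additive", "dominant", "recessive"):
    print(model, [encode(g, model) for g in genotypes])
print(genotype_risks(3, "multiplicative"), genotype_risks(3, "additive"))
```

The additive coding keeps three ordered groups in a 1-df test, while the dominant and recessive codings collapse the genotypes into two groups, which is why the choice of encoding changes the degrees of freedom and the power of the test.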
Rather than choosing one model a priori, some studies evaluate multiple genetic models coupled with an appropriate correction for multiple testing. Multi-Locus Analysis In addition to single-locus analyses, genome-wide association studies provide an enormous opportunity to examine interactions among genetic variants throughout the genome. Multi-locus analysis, however, is not nearly as straightforward as conducting single-locus tests, and presents numerous computational, statistical, and logistical challenges. Because most GWAS genotype between 500,000 and one million SNPs, examining all
pair-wise combinations of SNPs is a computationally intractable approach, even for highly efficient algorithms. One approach to this issue is to reduce or filter the set of genotyped SNPs, eliminating redundant information. A simple and common way to filter SNPs is to select a set of results from a single-SNP analysis based on an arbitrary significance threshold and exhaustively evaluate interactions in that subset. This can be perilous, however, as selecting SNPs for analysis based on main effects will prevent certain multi-locus models from being detected: so-called "purely epistatic" models with statistically undetectable marginal effects. In these models, a large component of the heritability is concentrated in the interaction rather than in the main effects; in other words, a specific combination of markers produces a significant change in disease risk. The benefit of this filtering approach is that it performs an unbiased analysis for interactions within the selected set of SNPs, and it is far more computationally and statistically tractable than analyzing all possible combinations of markers. Missing heritability: For many complex diseases numerous genetic variants have been identified, but in many recent studies these common variants explain only a small fraction of the increased risk. Most of the variants identified have no established biological relevance to the disease, and they are often not located inside 'active' genes. The GWAS of recent years have made it clear that common variants fail to explain the majority of the genetic heritability of most human diseases. This suggests that the 'common disease, common variant' hypothesis is not as valid as was previously believed. The problem is that the biological reality does not correspond to the study design and assumptions of GWAS, and the solution is not to increase the sample size even further but to improve the study design and statistical methods. 
One possible explanation for the missing heritability is some kind of interaction between different genes (epistasis). Such interactions can be hard to detect when analyzing one SNP at a time, as the marginal effect of a single SNP will be small. Another explanation is that part of the increased risk is due to many rare variants, each present in less than 1% of the population. This suggests that there could be heterogeneity, where different genetic profiles cause diseases that are diagnostically the same.
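The scale problem behind exhaustive multi-locus analysis, and the main-effect filter used to avoid it, can be illustrated with a few lines of Python. This is a sketch only: the function names, SNP names, p-values and the 10⁻³ filter threshold are hypothetical. For 500,000 and 1,000,000 SNPs the exhaustive scan needs roughly 1.25 × 10¹¹ and 5 × 10¹¹ pair tests, respectively.

```python
from itertools import combinations
from math import comb


def n_pairs(n_snps):
    """Number of distinct SNP pairs in an exhaustive two-locus scan."""
    return comb(n_snps, 2)


def filter_then_pair(p_values, threshold):
    """Keep SNPs whose single-SNP p-value is below `threshold`, then list all pairs among them.

    This mirrors the main-effect filter described above; purely epistatic
    pairs (no marginal signal) are dropped by construction.
    """
    kept = sorted(snp for snp, p in p_values.items() if p < threshold)
    return kept, list(combinations(kept, 2))


for n in (500_000, 1_000_000):
    print(f"{n:,} SNPs -> {n_pairs(n):,} pairs")

# Hypothetical single-SNP p-values.
kept, pairs = filter_then_pair({"snp1": 1e-5, "snp2": 0.2, "snp3": 3e-4, "snp4": 5e-6}, 1e-3)
print(kept, pairs)
```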
Genetic Interactions: A general definition of genetic interaction (epistasis) is that the effect (penetrance) of one locus varies according to the genotype present at another locus. To detect interactions we need to define how a 'natural' combined effect of two risk loci would be expressed in the organism. The concept of gene-gene interaction is not new, but it is still confusing, since the term is used in various ways. Biological interaction, or epistasis, was first defined by Bateson in 1909. In his example, one of the alleles at locus G prevents the alleles at locus B from being expressed in the organism. This relation does not necessarily have to be symmetric. This definition is similar to the one biologists use for a biological interaction between proteins, where proteins interact to regulate several cellular processes. In statistics, interaction is usually defined as a deviation from a linear model. In 1918 Fisher gave a statistical definition of epistasis [28] as a deviation from additivity in the effects of alleles at different loci on a quantitative trait. This definition is closer to the classical statistical definition of interaction and does not quite correspond to the biological definition of epistasis. These definitions become troublesome when the trait is binary; in such cases the mathematical modelling often focuses on the penetrances, so the definitions of epistasis need to be modified. For binary traits, an example could be that both allele A and allele B at two different loci are needed to develop the trait. In this case A is epistatic to B and B is epistatic to A, so the epistasis is symmetric, in contrast to Bateson's definition. A classic way to represent a lack of epistasis has been the heterogeneity model, in which a person gets the trait by possessing (at least) one of the predisposing genotypes. 
This definition actually falls under Bateson's definition of epistasis: for example, if a person has both risk variants (situated at different loci), the effect of allele A will be masked by allele B, which is another confusing aspect of genetic interactions. There are two types of genetic heterogeneity. Allelic heterogeneity is when several different mutations in the same gene (locus) cause the same disease. Locus heterogeneity means that mutations in several unrelated loci can cause the same disorder.
The above example of locus heterogeneity can be generalized to a situation without full penetrance, that is, $0 < f_{ij} < 1$ for some of the penetrances. Mathematically, locus heterogeneity can be expressed as $f_{ij} = \alpha_i + \beta_j - \alpha_i\beta_j$, where $\alpha_i$ and $\beta_j$ are the penetrance factors for the two genetic variants. Locus heterogeneity is similar to a daisy chain, where it is enough for one of the components to break (caused by having at least one of the risk variants) for the entire system to malfunction, i.e. to produce the disease. There are two other common two-locus models for binary traits, the multiplicative model and the additive model. The multiplicative model can be expressed as $f_{ij} = \alpha_i\beta_j$; this model is often considered epistatic. Both the additive model, $f_{ij} = \alpha_i + \beta_j$, and the heterogeneity model are regarded as non-epistatic by most authors; however, some authors consider epistasis to be a departure from the multiplicative model. Further problems appear when considering that both the multiplicative and the heterogeneity models become additive under suitable log transformations. It will be difficult to model the true epistatic interactions in complex diseases, and discovered epistatic effects may contribute little to the understanding of the disease. Still, models that allow for interactions can improve the statistical power to detect genetic risk variants. The main issue in finding interactions, independent of how epistasis is defined, is how to detect it in complex diseases when analyzing millions of genetic markers. Assume that the disease is caused by different mutations at different loci in various families, and that these genes have a strong effect in each of the subpopulations. Then the heterogeneous risk genes will probably show a very weak marginal effect when the markers are analyzed one at a time. 
For epistatic interactions it is very computationally demanding to examine all possible gene-gene interactions, in addition to the issue of correcting for multiple hypothesis testing. One way to handle this is to first test for marginal main effects of each marker in the sample, and hope that the genes involved in interactions will also show at least a modest marginal effect. The results of this analysis are then combined with biological knowledge to suggest a number of candidates for interaction analysis.
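The three two-locus penetrance models, together with the remark that the multiplicative and heterogeneity models become additive after a log transformation, can be checked numerically. The sketch below (hypothetical function names; penetrance factors chosen arbitrarily) computes the interaction contrast $f_{11} - f_{10} - f_{01} + f_{00}$ for two genotype classes per locus on a chosen scale; a zero contrast means no interaction on that scale. For the heterogeneity model, $1 - f_{ij} = (1 - \alpha_i)(1 - \beta_j)$, so the natural scale for it is $\log(1 - f)$.

```python
import math


# Penetrance f_ij from penetrance factors alpha_i (locus 1) and beta_j (locus 2).
def heterogeneity(a, b):
    return a + b - a * b  # equals 1 - (1 - a) * (1 - b)


def multiplicative(a, b):
    return a * b


def additive(a, b):
    return a + b


def interaction_contrast(model, alphas, betas, scale=lambda f: f):
    """f11 - f10 - f01 + f00 after applying `scale` to each penetrance.

    alphas = (alpha_0, alpha_1) and betas = (beta_0, beta_1) are the factors
    for a low-risk (0) and a high-risk (1) genotype class at each locus.
    """
    f = [[scale(model(a, b)) for b in betas] for a in alphas]
    return f[1][1] - f[1][0] - f[0][1] + f[0][0]


alphas, betas = (0.1, 0.4), (0.05, 0.3)  # hypothetical penetrance factors
print("additive, natural scale:", interaction_contrast(additive, alphas, betas))
print("heterogeneity, natural scale:", interaction_contrast(heterogeneity, alphas, betas))
print("heterogeneity, log(1 - f):",
      interaction_contrast(heterogeneity, alphas, betas, lambda f: math.log(1 - f)))
print("multiplicative, natural scale:", interaction_contrast(multiplicative, alphas, betas))
print("multiplicative, log f:", interaction_contrast(multiplicative, alphas, betas, math.log))
```

Whether a pair of loci "interacts" therefore depends on the scale chosen, which is one reason the definitions of epistasis discussed above disagree.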
Data Imputation: To conduct a meta-analysis properly, the effect of the same allele across multiple distinct studies must be assessed. This can prove difficult if different studies use different genotyping platforms (which use different SNP marker sets). As this is often the case, GWAS datasets can be imputed to generate results for a common set of SNPs across all studies. Genotype imputation exploits known LD patterns and haplotype frequencies from the HapMap or 1000 Genomes projects to estimate genotypes for SNPs not directly genotyped in the study. The concept is similar in principle to haplotype phasing algorithms, in which the contiguous set of alleles lying on a specific chromosome is estimated. Genotype imputation methods extend this idea to human populations. First, a collection of shared haplotypes within the study sample is computed to estimate haplotype frequencies among the genotyped SNPs. Phased haplotypes from the study sample are then compared to reference haplotypes from a panel of much denser SNPs, such as the HapMap data. The matched reference haplotypes contain genotypes for surrounding markers that were not genotyped in the study sample. Because the study sample haplotypes may match multiple reference haplotypes, surrounding genotypes may be given a score or probability of a match based on the haplotype overlap. For example, rather than assigning an imputed SNP a single allele A, the probability of each possible allele is reported (0.85 A, 0.12 C, 0.03 T) based on haplotype frequencies. This information can be used in the analysis of imputed data to take into account the uncertainty in the genotype estimation process, typically using Bayesian analysis approaches. Popular algorithms for genotype imputation include BIMBAM, IMPUTE, MaCH and Beagle. Much like conducting a meta-analysis, genotype imputation must be conducted with great care. The reference panel (i.e.
the 1000 Genomes data or the HapMap project) must contain haplotypes drawn from the same population as the study sample in order to facilitate a proper haplotype match. If a study was conducted using individuals of Asian descent, but only European descent populations are represented in the reference panel, the genotype imputation quality will be poor as there is a lower probability of a haplotype match. Also, the reference allele for each SNP must be identical in both the study sample and the reference panel. Finally, the analysis of imputed genotypes should account for the uncertainty in genotype state generated by the imputation process.
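The haplotype-matching idea described above can be shown with a deliberately simplified toy. It assumes phased study haplotypes and a small reference panel, and it only counts exact matches at the typed SNPs; real tools such as IMPUTE, MaCH and Beagle instead use probabilistic (hidden Markov) models that tolerate mismatches and recombination. The function name, panel and haplotypes are all hypothetical.

```python
from collections import Counter


def impute_site(study_hap, reference_panel, target):
    """Toy imputation of one untyped SNP by exact haplotype matching.

    study_hap: observed alleles, with '?' at SNPs not genotyped in the study.
    reference_panel: dense, fully observed reference haplotypes (same length).
    target: index of the untyped SNP to impute.
    Returns {allele: probability}, or {} if no reference haplotype matches.
    """
    typed = [i for i, allele in enumerate(study_hap) if allele != "?"]
    # Keep reference haplotypes that agree with the study haplotype at every typed SNP.
    matches = [ref for ref in reference_panel
               if all(ref[i] == study_hap[i] for i in typed)]
    if not matches:
        return {}
    # Allele probabilities = allele frequencies among the matching reference haplotypes.
    counts = Counter(ref[target] for ref in matches)
    return {allele: n / len(matches) for allele, n in counts.items()}


# Hypothetical reference panel: five haplotypes over five SNPs.
panel = ["ACGTA", "ACGTA", "ACCTA", "TCGTG", "ACGTA"]
print(impute_site("AC?TA", panel, 2))  # {'G': 0.75, 'C': 0.25}
```

As the text stresses, the result is only as good as the panel: if the reference haplotypes come from a different population than the study sample, few or misleading matches are found.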
Statistical methods in GWAS: If a genetic marker is associated with a particular disease, then the genotype or allele frequencies will differ between affected and healthy individuals. A commonly used test for associated SNPs in case-control studies is a Pearson $\chi^2$ test applied to a 2-by-2 table of allele counts in the two groups. For complex traits it is commonly assumed that the contribution of each SNP to the genetic effect is roughly additive, i.e. the penetrance for heterozygotes lies somewhere between the penetrances of the two homozygotes. This test is powerful for additive models, which explains its popularity in these studies. Other common tests include a Pearson $\chi^2$ test comparing genotype frequencies instead of allele frequencies, the Cochran-Armitage test for trend in penetrances, and logistic regression. The Transmission Disequilibrium Test (TDT) is an association test that uses data from families with at least one affected child. It was introduced by Spielman et al. and evaluates the transmission of an allele from a heterozygous parent to the offspring. The TDT is based on the assumption that each of the two alleles M1 and M2 at a locus is transmitted to the offspring with equal probability; hence, in a sample of heterozygous parents we expect approximately half of them to transmit allele M1. If one of the alleles is transmitted more often in families where the children have a genetic disease, we suspect that the allele is associated with the disease. Let b denote the number of heterozygous parents who transmit allele M1 to their offspring, and c the number who transmit allele M2. Conditioned on b + c, b follows a Binomial(b + c, 1/2) distribution under the null hypothesis, but the test statistic usually has the form $T = \dfrac{(b - c)^2}{b + c}$. This statistic asymptotically follows a $\chi^2$ distribution with one degree of freedom and is equivalent to a Pearson $\chi^2$ test. 
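The TDT statistic above is simple to compute from the two transmission counts. The sketch below, with a hypothetical function name `tdt` and hypothetical counts, returns T with its asymptotic $\chi^2$ (1 df) p-value and, for comparison, the exact two-sided binomial p-value implied by b ~ Binomial(b + c, 1/2) under the null hypothesis.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test.

    b: heterozygous parents transmitting M1; c: heterozygous parents transmitting M2.
    Returns (T, asymptotic chi-square p-value, exact two-sided binomial p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("need at least one informative (heterozygous) parent")
    T = (b - c) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(X > T) = erfc(sqrt(T / 2)).
    p_chi2 = math.erfc(math.sqrt(T / 2))
    # Exact test: under H0, b ~ Binomial(n, 1/2); double the larger tail.
    k = max(b, c)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return T, p_chi2, p_exact


# Hypothetical counts: 60 heterozygous parents transmitted M1, 40 transmitted M2.
print(tdt(60, 40))
```

With 100 informative parents the asymptotic and exact p-values are already close (about 0.046 and 0.057); for small families samples the exact version is the safer choice.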
Logistic Regression: Generalized linear models (GLMs) extend the ordinary regression model to response variables that are not normally distributed. GLMs are applicable if the response variable has a distribution belonging to the natural exponential family. One such distribution is the binomial distribution, and with logistic regression we model the binomial probability $p(x) = P(Y = 1 \mid x)$ as
$\log\dfrac{p(x)}{1 - p(x)} = \alpha + \sum_j \beta_j x_j$,
where $x_j$ denotes the value of the $j$th element of the predictor $x$. In simple logistic regression with one binary predictor $x$, $\beta$ is equal to the log odds ratio:
$\beta = \log\dfrac{p(x = 1)/(1 - p(x = 1))}{p(x = 0)/(1 - p(x = 0))}$.
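To see the last identity numerically, the sketch below fits a one-predictor logistic regression by Newton-Raphson in plain Python and compares the fitted $\beta$ with the log odds ratio from the 2×2 table. The function `fit_logistic` and the data (30 of 100 carriers and 20 of 100 non-carriers being cases) are hypothetical; in practice a statistics package would be used.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(Y = 1 | x) = alpha + beta * x by Newton-Raphson (one predictor)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p        # score for alpha
            g1 += (y - p) * x  # score for beta
            h00 += w           # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (alpha, beta) += inverse(information) @ score
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# Hypothetical data: x = 1 for carriers (30/100 cases), x = 0 for non-carriers (20/100 cases).
xs = [1] * 100 + [0] * 100
ys = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80
alpha, beta = fit_logistic(xs, ys)
log_or = math.log((30 / 70) / (20 / 80))
print(f"beta = {beta:.6f}, log OR from the table = {log_or:.6f}")
```

Adding further columns (covariates or an interaction term) only enlarges the score vector and information matrix, which is why logistic regression extends naturally to adjusted and gene-gene analyses.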
In retrospective studies (individuals sampled on the basis of their affection status), the effect parameter $\beta$ will be the same as in the prospective design (sampling based on the predictors), provided that the sampling probability is independent of x. This is one of the main reasons for using this method in biomedical studies. Another advantage of logistic regression is that it is easy to include several predictors in the analysis and to make inferences about interactions between genes and environment, as well as gene-gene interactions. Schaid described a univariate method for case-parent data, modelling genotype relative risks with conditional logistic regression using three pseudo-controls based on the parents' untransmitted alleles. This method can be generalized to two loci. For case-control data, logistic regression can be used to analyse interactions by comparing the saturated model with an additive model on the logit scale (main effects only, no interaction term). The additive logistic model is roughly equivalent to the heterogeneity model if the relative risk (RR) or odds ratio (OR) is of moderate size. However, North et al. show examples of heterogeneity models that are marginally recessive (marginal RR ≈ 150); in this case logistic regression yields non-zero interaction estimates. Hence, to really examine deviations from the heterogeneity model (and not from the multiplicative or logistic model), more advanced methods need to be applied. GWAS studied in various crops: 1. Arabidopsis. Genome-Wide Association Mapping in Arabidopsis Identifies Previously Known Flowering Time and Pathogen Resistance Genes. A very large number of spurious genotype-phenotype correlations are found, especially for traits that vary geographically. For example, plants from northern latitudes flower later; however, in addition to sharing genetic variants that make them flower late, they also tend to share variants across the genome, making it difficult to determine which genes are responsible for flowering. 
Notwithstanding this, several previously known genes were successfully identified in this study, and the researchers are optimistic about the prospects for association mapping in this species. They examined flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, they were able to identify known major genes for all phenotypes tested, demonstrating the potential of genome-wide association mapping in A. thaliana
and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rates. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of appropriate genomic control in association studies. The columns on the left give the genotype and associated phenotype at four loci for each of the 95 accessions. The four loci are the flowering time locus FRI (+, wild type; 1, Ler null allele; 2, Col null allele), for which the associated phenotype is flowering time in long-day conditions without vernalization (late flowering is indicated by the height and color of the bar), and the three pathogen
resistance loci Rps5, Rpm1 and Rps2 (+, wild type; −, null allele), for which the associated phenotypes are hypersensitive responses to the appropriate bacterial avr gene (red indicates resistance, black indicates susceptibility, and missing data are indicated by a missing bar). The tree on the right illustrates the genetic relationships between the accessions. GWAPP: A Web Application for Genome-Wide Association Mapping in Arabidopsis. GWAPP is an interactive web-based application for conducting GWAS in A. thaliana. Using an efficient implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with a mixed model and other methods in just a couple of minutes. GWAPP features an extensive, interactive and user-friendly interface that includes interactive Manhattan plots and linkage disequilibrium plots. It also facilitates exploratory data analysis through features such as the inclusion of candidate polymorphisms in the model as cofactors. Fig. 1: (A) The filter box allows the user to exclude specific accessions as well as to change the name and description of the data set. (B) The data set list displays information for each accession in the data set; in edit mode, the user can use the checkbox to add and remove accessions. (C) A Google map shows the locations of all accessions in the data set; clicking on a marker shows a pop-up with the name and ID of the selected accession. (D) The geographic distribution map (GeoMap) shows the geographic distribution of the accessions in the data set; moving the mouse over a country shows the number of accessions located in that region. Fig. 2: The result view displays GWAS plots for each of the five chromosomes. Each GWAS plot consists of three panels. The top panel (A) contains a scatterplot with the chromosome positions on the x axis and the score on the y axis. 
The dots in the scatterplot represent SNPs (E). A horizontal dashed line (H) shows the 5% FDR threshold. At the top of the GWAS results view, a search box for genes is displayed (D); searched genes are displayed as a colored band (red in the figure). The second panel (B) shows the gene annotation and is only shown within a specific zoom range (< 1.5 Mb). It displays genes, gene features and gene names. Moving the mouse over a gene displays additional information in a pop-up (F), and clicking on a gene opens the TAIR page for that gene. Panel (C) displays various chromosome-wide statistics. The region highlighted by a yellow band (I) is shown in the
scatterplot and in the gene annotation. The gear icon opens a pop-up (G) with the available statistics the user can choose from.
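The Arabidopsis study above stresses genomic control, and the mixed-model versus naïve comparisons in these studies are usually judged from Q-Q plots of test statistics. One standard summary is the genomic-control inflation factor λ: the median observed 1-df $\chi^2$ statistic divided by its expected null median (about 0.455). The sketch below is a minimal standard-library version under that definition; the function names and simulated p-values are hypothetical, and it is not the correction used in any particular study cited here.

```python
import math
from statistics import NormalDist, median


def chi2_from_p(p):
    """1-df chi-square statistic matching a two-sided p-value (0 < p <= 1)."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z


def genomic_inflation(p_values):
    """Genomic-control lambda: median observed 1-df chi-square over its null median."""
    null_median = NormalDist().inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median([chi2_from_p(p) for p in p_values]) / null_median


def gc_adjust(p_values):
    """Divide each 1-df statistic by lambda (never below 1) and return adjusted p-values."""
    lam = max(1.0, genomic_inflation(p_values))
    return [math.erfc(math.sqrt(chi2_from_p(p) / lam / 2)) for p in p_values]


# Hypothetical scan: evenly spread null p-values, then the same statistics inflated by 1.5,
# as unmodelled population structure might do.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
inflated_p = [math.erfc(math.sqrt(1.5 * chi2_from_p(p) / 2)) for p in null_p]
print(round(genomic_inflation(null_p), 3), round(genomic_inflation(inflated_p), 3))
```

A λ well above 1 across the whole genome signals confounding such as population structure; mixed models like the one in GWAPP address the same problem by modelling relatedness directly instead of rescaling the statistics.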
Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Earlier biparental and QTL approaches are not scalable enough to investigate the genetic potential and tremendous phenotypic variation of the more than 12,000 accessions available in public germplasm repositories. The authors therefore took a global collection of 413 diverse O. sativa varieties from 82 countries and genotyped them with a high-quality, custom-designed 44,100-SNP oligonucleotide genotyping array. For these accessions they phenotyped 34 morphological, developmental and agronomic traits over two consecutive field seasons. This mapping strategy evaluated variation both within and among four of the major subgroups of rice, revealing significant heterogeneity of genetic architecture among groups as well as gene-by-environment effects.
Fig. Phenotypic distribution and genome-wide association scan for plant height. (a) Quantile-quantile plots for both the naïve and mixed models for plant height in all samples. (b) Boxplot showing the differences in plant height among subpopulations. Box edges represent the upper and lower quartiles, with the median value shown as a bold line in the middle of the box. Whiskers represent 1.5 times the interquartile range of the data; individuals falling outside the range of the whiskers are shown as open dots. (c) Histogram of plant height in all samples. The dashed black line represents the null distribution. (d) Genome-wide P-values from the mixed model and naïve method; the x axis shows the SNPs along each chromosome, and the y axis is the −log10(P-value) for the association. Dots in (a) and (c) indicate SNPs with P-values < 1 × 10⁻⁴ in the mixed model and the top 50 SNPs in the naïve method; SNPs within 200 kb of known genes are in red, and other significant SNPs are in blue. Candidate gene locations are shown as red vertical dashed lines with names on top. A Genome-Wide Association Study Identifies Genomic Regions for Virulence in the Non-Model Organism Heterobasidion annosum s.s. The dense single nucleotide polymorphism (SNP) panels needed for genome-wide association (GWA) studies have hitherto been expensive to establish and use in non-model organisms. To overcome this, the authors used a next-generation sequencing approach both to establish SNPs and to determine genotypes. They conducted a GWA study on a fungal species, analyzing the virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its hosts Picea abies and Pinus sylvestris. From a set of 33,018 SNPs in 23 haploid isolates, twelve SNP markers distributed on seven contigs were associated with virulence (P < 0.0001). Four of the contigs harbour known virulence genes from other fungal pathogens, and the remaining three harbour novel candidate genes. 
Two contigs link closely to virulence regions recognized previously by QTL mapping in the congeneric hybrid H. irregulare × H. occidentale. The study demonstrates the efficiency of GWA studies for dissecting important complex traits of small populations of non-model haploid organisms with small genomes.
Genome-wide association study (GWAS) of resistance to head smut in maize
Head smut, caused by the fungus Sphacelotheca reiliana (Kühn) Clint, is a devastating global disease of maize, causing severe quality and yield losses each year. The present study is the first to conduct a genome-wide association study (GWAS) of head smut resistance using the Illumina MaizeSNP50 array. Using 45,868 single nucleotide polymorphisms in a panel of 144 inbred lines, 18 novel candidate genes were found to be associated with head smut resistance in maize. These candidate genes were classified into three groups: resistance genes, disease response genes, and other genes with possible plant disease resistance functions. The data suggested a complicated molecular mechanism of maize resistance against S. reiliana. The study also suggested that GWAS is a useful approach for identifying causal genetic factors for head smut resistance in maize.
Fig. Manhattan plots of a mixed linear model (MLM) for resistance to head smut. Points above the blue horizontal dashed line pass a moderately stringent genome-wide threshold of −log10(1/45,868); points above the red horizontal dashed line pass the stringent threshold of −log10(0.05/45,868). Colors distinguish the chromosomes, in order from chromosome 1 to chromosome 10. Points with −log10(P) above 8 are not shown.
Advantages of GWAS
1) The biological pathway of the trait does not have to be known.
2) It can discover novel candidate genes not identified through other methodological approaches.
3) It encourages the formation of collaborative consortia to recruit a sufficient number of participants for analysis, and these consortia tend to continue collaborating in subsequent analyses.
4) Rules act at specific genetic association.
5) It provides data on the ancestry of each subject, which assists in matching case subjects with control subjects.
6) It provides data on two types of structural variation, sequence variation and copy number variation, which gives more robust data.
7) Studies are large enough to identify variants explaining a few percent of the phenotypic variance.
Disadvantages
1) Results need replication in independent samples from different populations.
2) A large study population is required.
3) GWAS detects association, not causation.
4) It identifies a specific location, not the complete gene. Many variants identified are nowhere near a protein-coding gene, or lie within genes that were not previously believed to be associated with a trait or condition.
5) It focuses on common variants.
6) It mainly detects variants that are common (>5%) in a population.
7) Typically, for any particular trait, the cumulative effect of multiple SNPs explains only a small fraction of an individual's risk for the trait.
Still, why is GWAS popular?
Dropping genotyping costs are likely to drive association studies away from candidate genes and towards whole-genome resequencing of all individuals in a population, which will allow assessment of point mutations, insertions, deletions and large structural variation such as copy number variation (e.g. resequencing of Arabidopsis lyrata). In the future this will also allow RNA-seq data to be included for eQTL mapping in GWAS. Population choice for GWAS will no longer be restricted to model organisms and will gradually focus more on the species most relevant to the biological questions being asked. The accuracy of GWAS depends on one-time genotyping combined with repeated phenotyping in different environmental conditions.
Output of GWAS
• To ensure the greatest future utility of GWAS results, all phenotype and genotype data should be made public and deposited in public databases.
• File formats and minimum information standards should therefore be established, such as those available for sequence data or microarray experiments, and priority should be given to the storage and dissemination of phenotypic and genotypic data.
Future perspectives
Despite the caveats outlined above, it seems that genome-wide association studies of the role of common variants in complex disease will be carried out in the near future. Initial studies will define more accurately the principal factors, summarized above, that can reduce the power of such studies. In these studies, large sample sizes should be used, biases taken into account, multiple-testing issues addressed and replication studies carried out, thereby optimizing experimental design, statistical power and cost efficiency. Close evaluation of the yield of true susceptibility loci relative to the cost of such rigorously designed studies will determine whether genome-wide analysis of common SNPs is a worthwhile approach in the continuing dissection of the genetic basis of common disease.
Summary and conclusions
The past year has seen a remarkable shift in our capacity to dissect the genetic basis of common diseases and continuous traits of biomedical significance. The GWA approach has proven extremely well suited to identifying common SNP-based variants with modest to large effects on phenotype. Careful implementation and appropriate interpretation have resulted in discoveries that have proven more robust than many had anticipated. Growing numbers of novel susceptibility loci have been identified, shedding light on the fundamental mechanisms that influence disease predisposition, and much is being learned about the complex relationships between changes in genome sequence and phenotypic variation. However, we are far from the end of this particular voyage, and recent discoveries are nothing more than initial forays into the terra incognita of our genomes.
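The multiple-testing issue raised under Future perspectives is often handled with a Bonferroni-style genome-wide threshold, which is one way to read the two lines in the maize head-smut Manhattan plot: −log10(1/45,868) ≈ 4.66 and −log10(0.05/45,868) ≈ 5.96. The Python sketch below is a simplified illustration, not the method of any study cited here. It computes those thresholds and also applies the common-variant (>5%) minor allele frequency filter listed among the disadvantages, since the number of tests is typically the number of SNPs that survive such filtering. It assumes biallelic SNPs coded as 0/1/2 copies of one allele, with None for missing calls; all function names are hypothetical.

```python
import math


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP coded as 0/1/2 copies of an allele.

    Missing calls are given as None and skipped; at least one call must
    be present. Inbred lines are simply coded 0 or 2.
    """
    calls = [g for g in genotypes if g is not None]
    allele_freq = sum(calls) / (2 * len(calls))
    return min(allele_freq, 1 - allele_freq)


def common_snps(snp_genotypes, min_maf=0.05):
    """Return ids of SNPs whose MAF is above min_maf (common variants)."""
    return [snp for snp, genos in snp_genotypes.items()
            if minor_allele_frequency(genos) > min_maf]


def bonferroni_neglog10(n_tests, alpha=0.05):
    """-log10 of the Bonferroni per-test threshold alpha / n_tests."""
    return -math.log10(alpha / n_tests)


# Thresholds drawn in the maize head-smut Manhattan plot (45,868 SNPs).
moderate = bonferroni_neglog10(45868, alpha=1.0)   # about 4.66
stringent = bonferroni_neglog10(45868)            # about 5.96
```

Bonferroni is conservative when nearby SNPs are in linkage disequilibrium, because correlated markers are counted as independent tests; that is one reason studies often report a less stringent "moderate" line alongside it.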
We remain unable to explain more than a small proportion of the observed familial clustering for most multifactorial traits, a fact that emphasizes the need to extend analysis to a more complete range of potential susceptibility variants and to support more explicit modelling of the joint effects of genes and environment. Many of the greatest challenges to be faced in the years ahead lie not so much in identifying the association signals themselves as in defining the molecular mechanisms through which they influence disease risk and/or phenotypic expression.
References
Altshuler DM, Gibbs RA, Peltonen L, et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58. doi:10.1038/nature09298
Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, et al. (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465: 627–631.
Connelly CF, Akey JM (2012) On the prospects of whole-genome association mapping in Saccharomyces cerevisiae. Genetics.
Cooper GM, Johnson JA, Langaee TY, Feng H, Stanaway IB, et al. (2008) A genome-wide scan for common genetic variants with a large influence on warfarin maintenance dose. Blood 112: 1022–1027. doi:10.1182/blood-2008-01-134247
Cumagun CJR, Bowden RL, Jurgenson JE, Leslie JF, Miedaner T (2004) Genetic mapping of pathogenicity and aggressiveness of Gibberella zeae (Fusarium graminearum) toward wheat. Phytopathology 94: 520–526.
Edwards AO, Ritter R III, Abel KJ, Manning A, Panhuysen C, et al. (2005) Complement factor H polymorphism and age-related macular degeneration. Science 308: 421–424. doi:10.1126/science.1110189
Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, et al. (2011) Population genomics and local adaptation in wild isolates of a model microbial eukaryote.
Freedman M, et al. (2004) Assessing the impact of population stratification on genetic association studies. Nat Genet 36: 388–393.
Freimer N, Sabatti C (2004) The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology. Nat Genet 36: 1045–1051. A clear and unbiased review of the main current genetic mapping strategies that discusses analyses using extended pedigrees, affected sib-pairs and association.
1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073. doi:10.1038/nature09534
Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, et al. (2008) ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res 36: D107–D113. doi:10.1093/nar/gkm967
Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, et al. (2005) Complement factor H variant increases the risk of age-related macular degeneration. Science 308: 419–421. doi:10.1126/science.1110359
Hall D, Tegstrom C, Ingvarsson PK (2010) Using association mapping to dissect the genetic basis of complex traits in plants. Brief Funct Genomics 9: 157–165.
Hawthorne B, Rees-George J, Bowen J, Ball R (1997) A single locus with a large effect on virulence in Nectria haematococca MPI. Fungal Genet Newsl 44: 24–26.
Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362–9367.
IRGSP (2005) The map-based sequence of the rice genome. Nature 436: 793–800.
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, et al. (2005) Complement factor H polymorphism in age-related macular degeneration. Science 308: 385–389. doi:10.1126/science.1109557
Lander ES, Schork NJ (1994) Genetic dissection of complex traits. Science 265: 2037–2048.
Li Y, Huang YS, Bergelson J, Nordborg M, Borevitz JO (2010) Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc Natl Acad Sci U S A 107: 21199–21204.
Lind M, Dalman K, Stenlid J, Karlsson B, Olson A (2007) Identification of quantitative trait loci affecting virulence in the basidiomycete Heterobasidion annosum s.l. Curr Genet 52: 35–44.
Lind M, van der Nest M, Olson Å, Brandström-Durling M, Stenlid J (2012) A 2nd generation linkage map of Heterobasidion annosum s.l. based on in silico anchoring of AFLP markers. PLoS One 7: e48347.
Liti G, Carter DM, Moses AM, Warringer J, Parts L, et al. (2009) Population genomics of domestic and wild yeasts. Nature 458: 337–341.
Lohmueller K, et al. (2003) Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 33: 177–182.
Muller LAH, Lucas JE, Georgianna DR, McCusker JH (2011) Genome-wide association analysis of clinical vs. nonclinical origin provides insights into Saccharomyces cerevisiae pathogenesis. Mol Ecol 20: 4085–4097.
Neafsey DE, Barker BM, Sharpton TJ, Stajich JE, Park DJ, et al. (2010) Population genomic sequencing of Coccidioides fungi reveals recent hybridization and transposon control. Genome Res 20: 938–946.
Olson A, Stenlid J (2001) Plant pathogens: mitochondrial control of fungal hybrid virulence. Nature 411: 438.
Ozoki K (2001) A high throughput SNP typing system for GWAS. Springer 16: 1134–1137.
Pandelova I, Ciuffetti LM (2005) A proteomics-based approach for identification of the ToxD gene. Fungal Genet Newsl 52.
Price AL, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.
Santoyo F, Gonzalez AE, Terron MC, Ramirez L, Pisabarro AG (2008) Quantitative linkage mapping of lignin-degrading enzymatic activities in Pleurotus ostreatus. Enzyme Microb Technol 43: 137–143.
Seren Ü, Vilhjálmsson BJ, et al. (2012) GWAPP: a web application for genome-wide association mapping in Arabidopsis. Plant Cell 24: 4793–4805.
Tohn PA (2009) Validating and refining GWAS signals. Nature 10: 318–329.
Yamamoto T, Yonemaru J, Yano M (2009) Towards the understanding of complex traits in rice: substantially or superficially? DNA Res 16: 141–154.
Yan J, Shah T, Warburton ML, Buckler ES, McMullen MD, et al. (2009) Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS One 4.
Zhang H, Zhao Q, Liu K, Zhang Z, Wang Y, et al. (2009) MgCRZ1, a transcription factor of Magnaporthe grisea, controls growth, development and is involved in full virulence. FEMS Microbiol Lett 293: 160–169.