Introduction
•
All organisms are subject to mutations as
a result of normal cellular operations or
interactions with the environment, leading
to genetic variation (polymorphism).
•
For this variation to be useful, it must be
(1) heritable and (2) discernible.
Introduction
•
Types of genetic variation include:
base substitutions, commonly referred to as
single nucleotide polymorphisms (SNPs).
insertions or deletions of nucleotide sequences
(indels) within a locus.
inversion of a segment of DNA within a locus.
rearrangement of DNA segments around a locus
of interest
DNA-based genetic markers
•
In the past, allozyme and mtDNA markers were used.
•
More recent marker types include:
restriction fragment length polymorphism (RFLP) (1)
randomly amplified polymorphic DNA (RAPD)
amplified fragment length polymorphism (AFLP)
expressed sequence tag (EST) markers
single nucleotide polymorphism (SNP)
microsatellite
Type I (coding) vs. type II (non-coding)
markers
•
Molecular markers are classified into two
categories:
type I are markers associated with genes
of known function.
type II markers are associated with
anonymous genomic segments
Type I vs. type II markers
•
Most RFLP markers are type I markers because
they were identified during analysis of known
genes.
•
Allozyme markers are type I markers because the
protein they encode has known function.
•
RAPD markers are type II markers because RAPD
bands are amplified from anonymous genomic
regions via the polymerase chain reaction (PCR).
Type I vs. type II markers
• AFLP markers are type II because they are also
amplified from anonymous genomic regions.
• EST markers are type I markers because they
represent transcripts of genes; they are most commonly
used in animal and plant research.
• SNP markers are mostly type II markers unless they
are developed from expressed sequences (eSNP or
cSNP) (type I).
• Microsatellite markers are type II markers unless
they are associated with genes of known function
(type I).
Polymorphic
information content (PIC)
• The usefulness of molecular markers can be
measured based on their PIC.
• PIC refers to the value of a marker for detecting
polymorphism in a population.
• PIC depends on the number of detectable alleles
and the distribution of their frequencies.
• The greater the number of alleles, the greater the
PIC
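The dependence of PIC on allele number and allele frequencies can be sketched numerically. The slides do not state a formula, so the example assumes the widely used definition of Botstein et al. (1980), PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i. The function name and the allele frequencies are illustrative only.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content of one locus (Botstein et al. 1980 form).

    freqs: allele frequencies at the locus; they must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical loci, chosen only to illustrate the two dependencies.
biallelic = pic([0.5, 0.5])             # two equally frequent alleles
four_alleles = pic([0.25] * 4)          # four equally frequent alleles
skewed = pic([0.97, 0.01, 0.01, 0.01])  # four alleles, one nearly fixed
```

Under this definition the evenly distributed four-allele locus scores higher than the biallelic one, while the skewed four-allele locus scores lower than both: the number of alleles raises PIC only when the frequencies are reasonably balanced.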
Allozyme markers
• Allozymes are allelic variants of proteins produced
by a single gene locus, and are of interest as
markers because polymorphism exists and
because they represent protein products of genes.
• Amino acid differences in the polypeptide chains of
the different allelic forms of an enzyme reflect
changes in the underlying DNA sequence.
Allozyme markers
• Depending on the nature of the amino acid
changes, the resulting protein products may migrate
at different rates (due to charge and size
differences) when run through a starch gel
subjected to an electrical field.
• Differences in the presence/absence and relative
frequencies of alleles are used to quantify genetic
variation and distinguish among genetic units at the
levels of populations, species, and higher taxonomic
designations.
Allozyme markers
• Disadvantages:
heterozygote deficiencies due to null (enzymatically
inactive) alleles and the amount and quality of tissue
samples required.
some changes in DNA sequence are masked at the
protein level, reducing the level of detectable variation.
some changes in nucleotide sequence do not change
the encoded polypeptide (silent, or synonymous,
substitutions).
some polypeptide changes do not alter the mobility of
the protein in an electrophoretic gel (electrophoretically
silent changes).
Mitochondrial DNA markers
• Sequence divergence accumulates more rapidly in
mitochondrial than in nuclear DNA due to a faster
mutation rate resulting from a lack of repair
mechanisms during replication.
• Due to its non-Mendelian mode of inheritance, the
mtDNA molecule must be considered a single
locus in genetic investigations.
• Disadvantage: mtDNA data may not reflect those of
the nuclear genome.
Restriction fragment length
polymorphism (RFLP)
• Restriction endonucleases are bacterial enzymes that
recognize specific 4, 5, 6, or 8 bp nucleotide sequences
and cut DNA wherever these sequences are
encountered, so that changes in the DNA sequence due
to indels, base substitutions, or rearrangements
involving the restriction sites can result in the gain, loss,
or relocation of a restriction site.
• Digestion of DNA with restriction enzymes results in
fragments whose number and size can vary among
individuals, populations, and species.
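How the loss of a restriction site changes the fragment pattern can be sketched with a toy digestion. The sequences below are invented for illustration; the only enzyme fact used is that EcoRI recognizes GAATTC and cuts after the G. The function name `digest` and its parameters are assumptions of this sketch, and it handles only one strand of a linear molecule.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 84 bp allele carrying two EcoRI sites.
allele_a = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GAATTC" + "ATGC" * 3
# A base substitution (A -> G) destroys the first site in the second allele.
allele_b = allele_a.replace("GAATTC", "GAGTTC", 1)

frags_a = digest(allele_a, "GAATTC", 1)  # three fragments
frags_b = digest(allele_b, "GAATTC", 1)  # two fragments
```

The two alleles give different fragment numbers and sizes from the same stretch of DNA, which is the polymorphism an RFLP assay scores.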
Restriction fragment length
polymorphism (RFLP)
• Traditionally, fragments were detected using Southern
blot analysis; most recent analyses replace it with PCR.
• If flanking sequences are known for a locus, the
segment containing the RFLP region is amplified via
PCR.
• If the length polymorphism is caused by a relatively
large (> approx. 100 bp depending on the size of the
undigested PCR product) deletion or insertion, gel
electrophoresis of the PCR products should reveal the
size difference.
Restriction fragment length
polymorphism (RFLP)
• By using ‘universal’ primers on target DNA, PCR
products can be digested with restriction enzymes and
visualized by simple staining with ethidium bromide due to
the increased amount of DNA produced by the PCR
method.
• Advantage: they are codominant markers and, because the
size difference is often large, scoring is relatively easy.
• Disadvantage: the relatively low level of polymorphism. In
addition, either sequence information (for PCR analysis)
or probes (for Southern blot analysis) are required,
making marker development difficult and time-consuming.
Random amplified polymorphic
DNA (RAPD)
• RAPD procedures use PCR to randomly
amplify anonymous segments of nuclear DNA with
an identical pair of primers 8–10 bp in length.
• Because the primers are short and relatively low
annealing temperatures (often 36–40 °C) are used,
the likelihood of amplifying multiple products is
great, with each product representing a different
locus.
Random amplified polymorphic
DNA (RAPD)
• The potential power is relatively high for detection of
polymorphism; typically, 5–20 bands can be
produced using a given primer pair, and multiple sets
of random primers can be used to scan the entire
genome for differential RAPD bands.
• Because each band is considered a bi-allelic locus
(presence or absence of an amplified product), PIC
values for RAPDs fall below those for microsatellites
and SNPs, and RAPDs may not be as informative as
AFLPs because fewer loci are generated
simultaneously.
Amplified fragment length
polymorphism (AFLP)
• AFLP is a PCR-based, multi-locus fingerprinting
technique that combines the strengths and
overcomes the weaknesses of the RFLP and RAPD
methods.
• Like RFLPs, the molecular basis of AFLP
polymorphisms includes indels between restriction
sites and base substitutions at restriction sites; like
RAPDs, it also includes base substitutions at PCR
primer binding sites.
Amplified fragment length
polymorphism (AFLP)
• The unique feature of the technique is the addition
of adaptors of known sequence to DNA fragments
generated by digestion of whole genomic DNA.
• This allows for the subsequent PCR amplification of
a subset of the total fragments for ease of
separation by gel electrophoresis.
• It is similar in principle to RFLP, but instead of analyzing
one locus at a time, it allows for the analysis of
many loci simultaneously.
Single nucleotide polymorphism
(SNP)
• SNP describes polymorphisms caused by point
mutations that give rise to different alleles
containing alternative bases at a given nucleotide
position within a locus. SNPs are detected by DNA
sequencing.
• SNP markers are inherited as co-dominant
markers.
• Their PIC is not as high as that of multi-allelic microsatellites.
• Random shotgun sequencing, amplicon
sequencing using PCR, and comparative EST
analysis are among the most popular sequencing
methods for SNP discovery.
Expressed sequence tags (ESTs)
• ESTs are single-pass sequences generated from
random sequencing of cDNA clones used for gene
profiling and genomic mapping.
• They offer a rapid and valuable first look at genes
expressed in specific tissue types, under specific
physiological conditions, or during specific
developmental stages.
• ESTs are useful for the development of cDNA
microarrays that allow analysis of differentially
expressed genes.
What are microsatellites & SSRs?
• Microsatellites, or simple sequence repeats (SSRs),
are codominant molecular genetic markers, i.e., both
alleles in an individual are detected in the analysis.
• Microsatellites are stretches of DNA consisting of tandemly
repeated short units of 1–6 base pairs (bp) in length. SSRs
typically span between twenty and a few hundred bases.
• They are easily amplified by PCR using two unique
oligonucleotide primers that flank the microsatellite and
hence define the microsatellite locus.
• Due to their high level of polymorphism, relatively small size,
multiallelic nature, codominant inheritance and rapid
detection protocols, these markers are widely used in a
variety of fundamental and applied fields of life and
medical sciences.
Microsatellites & SSRs
• Applications in biology and medicine include:
forensics, molecular epidemiology, parasitology,
population and conservation genetics, genetic
mapping and genetic dissection of complex traits.
• Microsatellites are considered selectively neutral
markers, found anywhere in the genome, both in
protein-encoding (9-15%) and noncoding DNA.
• SSRs contribute to DNA structure, chromatin
organization, regulation of DNA recombination,
transcription and translation, gene expression and
cell cycle dynamics.
Microsatellites & SSRs
• The majority of microsatellites (30–67%) found are
dinucleotides, mostly represented by poly(A/T)
tracts, which are the most frequent classes of
SSRs, whereas longer motifs (tri-, tetra-, penta- and
hexanucleotides) are about 1.5-fold less common in
genomic DNA.
• In the human genome, one microsatellite was found
every 6 kb and one CA repeat (the most common
type of tandem repeat) occurred every 30 kb of
DNA.
Microsatellites & SSRs
• Di- and tetranucleotide motifs are mostly clustered
in noncoding regions. In vertebrates, they are
distributed 42- and 30-fold less frequently in exons
than in intronic sequences and intergenic regions,
respectively.
• Long dimeric motifs are highly unstable within
expressed sequences, while in noncoding regions
most dinucleotide repeats can have surprisingly
long stretches, probably due to the high tolerance of
noncoding DNA to mutations.
Microsatellites & SSRs
• In contrast, triplets are found in both coding and
non-coding genomic regions with a high frequency.
• In humans, the expansion of trinucleotides,
encoding polyproline (CCG)n, polyarginine
(CGG)n, polyalanine [(GCC)n and (GCG)n] and
polyglutamine (CAG)n tracts within exons has been
described.
• Such expansions can lead to various
neurodegenerative and neuromuscular disorders,
including myotonic dystrophy, fragile X syndrome,
Huntington's disease and spinocerebellar ataxia.
Function of microsatellites
1. DNA & chromosome structure
• Microsatellites are involved in forming a wide variety of
unusual DNA structures with simple and complex loop-
folding patterns.
• Telomeric and centromeric chromosome regions have
been shown to be rich in long arrays of a variety of
mono-, di-, tri-, tetra- and hexanucleotide motifs.
• The (TTAGGG)n hexamer sequence is recognized by
a ribonucleoprotein polymerase, telomerase, which
synthesizes telomere repeats onto the chromosome
ends to overcome the loss of sequences during DNA
replication, whereas other proteins prevent nucleolytic
degradation and confer stability of chromosomes.
Function of microsatellites
2. DNA recombination
• Dinucleotide motifs are preferential sites for
recombination events due to their high affinity for
recombination enzymes.
• Some SSR sequences, such as GT, CA, CT, GA
and others, may influence recombination through
their effects on DNA structure.
• SSRs were shown to be associated with the
assignment of some Rh phenotypes, and to be
involved in the molecular evolution of the human Rh
gene family and its orthologs in other eukaryotes
via replication slippage and recombination (gene
conversion) mechanisms.
Function of microsatellites
3. DNA replication
• Human genes encoding important cell fidelity and
growth factors, such as the B-cell
leukemia/lymphoma 2 (BCL2)-associated X protein,
insulin-like growth factor 2 receptor (IGF2R), breast
cancer early onset protein 2 (BRCA2) and
transforming growth factor beta 2 (TGF-β2), contain
short repeated sequences.
• Frame-shift mutations, resulting from insertions
or deletions of repeat units within these
sequences, affect these genes and could
therefore initiate tumorigenesis; they can also affect
enzymes controlling mutation rate and the cell cycle.
Function of microsatellites
4. Gene expression
• SSRs located in promoter regions can influence drastic
or quantitative variations in gene expression and
change the level of promoter activity. The human insulin
minisatellite is highly polymorphic, and some of its
alleles were shown to regulate the expression of the
insulin gene.
• Intronic SSRs can also affect gene transcription and
mRNA stability, representing binding sites for
translation factors. For example, such an effect was
measured for the tetrameric microsatellite located in
intron 1 of the human tyrosine hydroxylase gene and
the (CA)n dinucleotide repeat in the first intron of the
human epidermal growth factor receptor gene.
Development of type I (coding) and type II
(non-coding) microsatellite markers
• Type I markers are more difficult to develop. While non-
gene sequences are free to mutate, causing higher levels
of polymorphism, sequences within protein-coding regions
generally show lower levels of polymorphism because of
functional selection pressure.
• The most effective and rapid way for producing type I
microsatellites is the sequencing of clones from cDNA
libraries. Both 5′- and 3′-ends of a cDNA clone can be
sequenced to produce expressed sequence tags (ESTs).
• An EST represents a short, usually 200–600 bp-long
nucleotide sequence, which represents a uniquely
expressed region of the genome.
Development of type I (coding) and type II
(non-coding) microsatellite markers
• EST sequences are archived in a special branch of the
GenBank nucleotide database (dbEST). In Nov. 2005, the
EST database contained more than 31.3 million sequence
entries from around 500 species.
• A typical strategy for the development of EST-derived
microsatellite markers (data mining) includes preliminary
analysis of EST sequences from the DNA database to
remove poly(A) and poly(T) stretches which are common in
ESTs developed from the 3′-ends of cDNA clones and
correspond to the poly(A)-tails in eukaryotic mRNA.
• Sequences are further screened for putative SSRs (all
SSR-containing EST sequences). Following the
identification of SSR-containing ESTs, flanking primers
should be designed to amplify the microsatellite.
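The data-mining steps above (trim poly(A)/poly(T) stretches, then screen for putative SSRs) can be sketched with regular expressions. This is a minimal illustration, not a published pipeline: the function names, the thresholds (`min_run`, `min_repeats`) and the EST sequence are all assumptions made for the example, and primer design is left out.

```python
import re


def trim_poly_tails(seq, min_run=10):
    """Remove a trailing poly(A) run and a leading poly(T) run from an EST."""
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    return seq


def find_ssrs(seq, min_repeats=5, max_unit=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    A motif of 1..max_unit bases must occur at least min_repeats times in a row.
    """
    pattern = re.compile(
        r"(([ACGT]{1,%d}?)\2{%d,})" % (max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(2)
        hits.append((m.start(), motif, len(m.group(1)) // len(motif)))
    return hits


# Hypothetical 3'-end EST: a (CA)12 repeat followed by a poly(A) tail.
est = "GGCTAGCTTAGC" + "CA" * 12 + "TTGACCGTAGGCTA" + "A" * 25

raw_hits = find_ssrs(est)                 # the tail is reported as an SSR
hits = find_ssrs(trim_poly_tails(est))    # only the (CA)n repeat remains
```

Without trimming, the poly(A) tail is reported as a mononucleotide repeat, which is why the tails are removed before screening. Primers would then be designed in the unique sequence on either side of each remaining hit.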
Applications of microsatellites
1. Genetic mapping
2. Individual DNA identification and
parentage assignment
3. Phylogeny, population and conservation
genetics
4. Molecular epidemiology and pathology
5. Quantitative trait loci mapping
6. Marker-assisted selection
1. Genetic mapping
• SSRs remain the markers of choice for the
construction of linkage maps, because they are
highly polymorphic (and highly informative) and
require a small amount of DNA for each test.
• However, type II (noncoding) microsatellites are
very helpful for building a dense linkage map
framework into which type I (coding) markers can
then be incorporated (type I markers directly show
the location of genes within the linkage map).
1. Genetic mapping
• Linkage maps are also known as recombination maps and define
the order and distance of loci along a chromosome on the
basis of inheritance in families or populations.
• During meiosis, one random copy of each chromosome
pair is passed on to the gamete. Only genes located next
to each other are tightly linked.
• Crossing over results from physical exchange of
chromosome segments between two homologous
chromosomes during meiosis.
• Recombination results in the exchange of grandparental
alleles of genes further apart on that chromosome.
1. Genetic mapping
• Genetic distance is usually measured in
centimorgans (cM), where 1 cM is equivalent to
1% recombination between markers.
• Linkage map length differs between sexes. In
species with the XY sex determination system,
the female map is usually longer than the male
map because of higher recombination rates in
females compared to males.
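The centimorgan definition can be sketched as a direct conversion from observed recombinants to map distance. The progeny counts below are hypothetical and the function name is an assumption. Treating 1% recombination as exactly 1 cM is reasonable only for closely spaced markers, because double crossovers go undetected over longer intervals.

```python
def map_distance_cm(recombinant, total):
    """Map distance in cM between two markers from informative meioses.

    recombinant: number of recombinant progeny; total: all scored progeny.
    Uses 1 cM = 1% recombination, with no map-function correction.
    """
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return 100.0 * recombinant / total


# Hypothetical counts for the same marker pair scored separately
# in female and male meioses.
female_cm = map_distance_cm(18, 200)
male_cm = map_distance_cm(10, 200)
```

With these invented counts the interval is longer on the female map than on the male map, which is the pattern described for sex-specific linkage maps.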
2. Individual DNA identification and
parentage assignment
• Microsatellites represent codominant single-locus DNA
markers. For each SSR, a progeny inherits one allele from
the father and another allele from the mother.
• Appropriate mathematical tools are available to evaluate
genetic relatedness and inheritance in these systems.
• A suitable methodology should be chosen for accurate and
correct analysis of genotyping data to reconstruct parentage
and pedigree structure.
• Due to the small size of SSRs, they are relatively stable in
degraded DNA. This is one reason why polymorphic SSRs
are widely used in forensic science, as microsatellite loci
remain relatively stable in bone remnants and dental tissue,
providing the basis for the successful analysis of ancient
DNA.
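Because each offspring carries one maternal and one paternal allele at every SSR locus, candidate parents can be screened by simple exclusion. The sketch below assumes genotypes are recorded as pairs of allele sizes in bp. The locus names, the genotypes and the function name are hypothetical, and the check ignores null alleles, mutation and genotyping error, which dedicated parentage software accounts for.

```python
def excluded_loci(child, mother, father):
    """Return loci at which the child cannot descend from this parent pair.

    Each argument maps locus name -> (allele1, allele2), sizes in bp.
    """
    mismatches = []
    for locus, (c1, c2) in child.items():
        m = set(mother[locus])
        f = set(father[locus])
        # One child allele must come from the mother, the other from the father.
        ok = (c1 in m and c2 in f) or (c2 in m and c1 in f)
        if not ok:
            mismatches.append(locus)
    return mismatches


# Hypothetical genotypes at three SSR loci.
child = {"SSR1": (150, 154), "SSR2": (201, 201), "SSR3": (98, 104)}
mother = {"SSR1": (150, 152), "SSR2": (201, 205), "SSR3": (98, 100)}
sire_a = {"SSR1": (154, 158), "SSR2": (201, 203), "SSR3": (104, 106)}
sire_b = {"SSR1": (154, 156), "SSR2": (203, 207), "SSR3": (100, 102)}

mismatch_a = excluded_loci(child, mother, sire_a)  # no exclusions
mismatch_b = excluded_loci(child, mother, sire_b)  # excluded at two loci
```

A candidate with no mismatches is not excluded. Whether that amounts to a confident assignment depends on the number and polymorphism of the loci scored.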
3. Phylogeny, population and
conservation genetics
• Population studies use variability within stretches of tandem
repeats, which evolve significantly more rapidly
than flanking regions.
• Flanking regions of microsatellites have proven
their value in establishing phylogenetic
relationships between species and families,
because they evolve much more slowly than
numbers of tandem repeats.
• Microsatellites are eminently suitable for
phylogeographical applications, where population structure
is observed over a large geographical scale.
4. Molecular epidemiology and
pathology
• Genomic instability of microsatellites has been extensively
evaluated in the field of carcinogenesis, where chromosomal
rearrangements (e.g., translocations, insertions and
deletions of genomic regions) occur.
• Carcinogenic events often happen within a genomic region
harboring a tumour suppressor gene and hence inactivate
the gene.
• Carcinogenic rearrangements are associated with loss of
heterozygosity (LOH) in microsatellites located within the
affected chromosome region.
• Thus, detecting microsatellite LOH in tumour tissues
contributes not only to molecular diagnosis of cancer, but
also points to the possible location of a tumour suppressor
gene.
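One common way to score microsatellite LOH is to compare the allele peak ratio in tumour tissue with the ratio in matched normal tissue from the same individual. The sketch below assumes that convention. The peak heights are invented, the function names are hypothetical, and the cut-off (a ratio below 0.5 or above 2.0) is an illustrative assumption, since laboratories use different thresholds. The locus is informative only if the normal tissue is heterozygous.

```python
def loh_ratio(normal_peaks, tumour_peaks):
    """Allelic imbalance ratio (T1/T2) / (N1/N2) for one heterozygous locus.

    Each argument is (peak height of allele 1, peak height of allele 2).
    """
    n1, n2 = normal_peaks
    t1, t2 = tumour_peaks
    if n1 <= 0 or n2 <= 0:
        raise ValueError("normal tissue must be heterozygous at this locus")
    if t2 == 0:
        return float("inf")  # allele 2 completely lost in the tumour
    return (t1 / t2) / (n1 / n2)


def has_loh(normal_peaks, tumour_peaks, threshold=0.5):
    """Flag LOH when the ratio falls outside [threshold, 1/threshold]."""
    ratio = loh_ratio(normal_peaks, tumour_peaks)
    return ratio < threshold or ratio > 1.0 / threshold


# Hypothetical peak heights (arbitrary fluorescence units).
normal = (1000, 950)
tumour_lost = (980, 300)      # allele 2 strongly reduced
tumour_retained = (900, 870)  # both alleles retained

lost = has_loh(normal, tumour_lost)
retained = has_loh(normal, tumour_retained)
```

Loci flagged in this way along a chromosome arm outline the region in which a tumour suppressor gene may lie.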
5. Quantitative trait loci mapping
• A quantitative trait is one that has measurable
phenotypic variation owing to genetic and/or
environmental influences.
• The variation can be measured numerically (for
example, height, size or blood pressure) and
quantified.
• Generally, quantitative traits are complex
(multifactorial) and influenced by several
polymorphic genes and by environmental
conditions.
6. Marker-assisted selection
• Marker-assisted selection is based on the concept that it
is possible to infer the presence of a gene from the
presence of a marker tightly linked to that gene.
• So, it is important to have high-density and high-resolution
genetic maps, which are saturated by markers in the
vicinity of a target locus (gene) that will be selected.
• The degree of saturation is the proportion of the genome
that will be covered by markers at a density such that
the maximum separation between markers is no greater
than a few centimorgans (usually 1–2 cM), within which
linkage of markers can be detected.
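The idea of map saturation can be sketched by measuring how much of a linkage group lies in intervals no wider than a chosen maximum gap. This is one possible reading of the definition above, not a standard formula. The marker positions, the function name and the 2 cM default are assumptions for illustration.

```python
def saturation(positions_cm, max_gap_cm=2.0):
    """Return (fraction of map length in gaps <= max_gap_cm, largest gap).

    positions_cm: marker positions along one linkage group, in cM.
    """
    pos = sorted(positions_cm)
    if len(pos) < 2 or pos[-1] == pos[0]:
        raise ValueError("need at least two markers at distinct positions")
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    total = pos[-1] - pos[0]
    covered = sum(g for g in gaps if g <= max_gap_cm)
    return covered / total, max(gaps)


# Hypothetical marker positions on one linkage group (cM).
markers = [0.0, 1.5, 3.0, 4.0, 9.0, 10.0]
fraction, largest_gap = saturation(markers)
```

With these invented positions a single 5 cM interval leaves half of the group below the target density. New markers would be sought in that interval before selecting on a gene located there.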