  1. Genetic biomarkers Noha l. ibrahim
  2. Introduction • All organisms are subject to mutations as a result of normal cellular operations or interactions with the environment, leading to genetic variation (polymorphism). • For this variation to be useful, it must be (1) heritable and (2) discernable.
  3. Introduction • Types of genetic variation include: base substitutions, commonly referred to as single nucleotide polymorphisms (SNPs); insertions or deletions of nucleotide sequences (indels) within a locus; inversion of a segment of DNA within a locus; and rearrangement of DNA segments around a locus of interest.
  4. DNA-based genetic markers • In the past, allozyme and mtDNA markers were used. • More recent marker types include: restriction fragment length polymorphism (RFLP); randomly amplified polymorphic DNA (RAPD); amplified fragment length polymorphism (AFLP); expressed sequence tag (EST) markers; single nucleotide polymorphism (SNP); and microsatellites.
  5. Type I (coding) vs. type II (non-coding) markers • Molecular markers are classified into two categories: type I markers are associated with genes of known function; type II markers are associated with anonymous genomic segments.
  6. Type I vs. type II markers • Most RFLP markers are type I markers because they were identified during analysis of known genes. • Allozyme markers are type I markers because the protein they encode has known function. • RAPD markers are type II markers because RAPD bands are amplified from anonymous genomic regions via the polymerase chain reaction (PCR).
  7. Type I vs. type II markers • AFLP markers are type II because they are also amplified from anonymous genomic regions. • EST markers are type I markers because they represent transcripts of genes; they are more common in animal and plant research. • SNP markers are mostly type II markers unless they are developed from expressed sequences (eSNP or cSNP), in which case they are type I. • Microsatellite markers are type II markers unless they are associated with genes of known function, in which case they are type I.
  8. Polymorphic information content (PIC) • The usefulness of molecular markers can be measured based on their PIC. • PIC refers to the value of a marker for detecting polymorphism in a population. • PIC depends on the number of detectable alleles and the distribution of their frequencies. • The greater the number of alleles, the greater the PIC.
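The dependence of PIC on both allele number and allele frequencies can be illustrated with a short sketch. It assumes the commonly used formula PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², which the slide itself does not state; the function name `pic` and the example frequencies are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.

    Assumed formula; `freqs` are the allele frequencies at one locus.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical loci: two equally common alleles, four equally common
# alleles, and four alleles with one allele dominating.
two_alleles = pic([0.5, 0.5])
four_alleles = pic([0.25, 0.25, 0.25, 0.25])
skewed = pic([0.97, 0.01, 0.01, 0.01])

print(two_alleles, four_alleles, skewed)
```

With equal frequencies, four alleles give a higher PIC than two. The skewed four-allele locus scores lower than the balanced two-allele locus, which is why the frequency distribution matters as well as the allele count.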
  9. Allozyme markers • Allozymes are allelic variants of proteins produced by a single gene locus, and are of interest as markers because polymorphism exists and because they represent protein products of genes. • Amino acid differences in the polypeptide chains of the different allelic forms of an enzyme reflect changes in the underlying DNA sequence.
  10. Allozyme markers • Depending on the nature of the amino acid changes, the resulting protein products may migrate at different rates (due to charge and size differences) when run through a starch gel subjected to an electrical field. • Differences in the presence/absence and relative frequencies of alleles are used to quantify genetic variation and distinguish among genetic units at the levels of populations, species, and higher taxonomic designations.
  11. Allozyme markers • Disadvantages: heterozygote deficiencies due to null (enzymatically inactive) alleles, and the amount and quality of tissue samples required; some changes in DNA sequence are masked at the protein level, reducing the level of detectable variation; some changes in nucleotide sequence do not change the encoded polypeptide (silent, i.e. synonymous, substitutions); some amino acid changes do not alter the mobility of the protein in an electrophoretic gel and therefore go undetected.
  12. Mitochondrial DNA markers • Sequence divergence accumulates more rapidly in mitochondrial than in nuclear DNA because of a faster mutation rate, which results from a lack of repair mechanisms during replication. • Due to its non-Mendelian mode of inheritance, the mtDNA molecule must be considered a single locus in genetic investigations. • Disadvantage: mtDNA data may not reflect those of the nuclear genome.
  13. Restriction fragment length polymorphism (RFLP) • Restriction endonucleases are bacterial enzymes that recognize specific 4, 5, 6, or 8 bp nucleotide sequences and cut DNA wherever these sequences are encountered, so that changes in the DNA sequence due to indels, base substitutions, or rearrangements involving the restriction sites can result in the gain, loss, or relocation of a restriction site. • Digestion of DNA with restriction enzymes results in fragments whose number and size can vary among individuals, populations, and species.
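How a base substitution inside a restriction site changes the fragment pattern can be sketched with an in silico digest. This is a minimal illustration that scans one strand of a linear sequence only; the function `digest` and both allele sequences are made up for the example. The enzyme shown is EcoRI, which recognizes GAATTC and cuts after the G.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_positions + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele B carries a C->T substitution that
# destroys the first EcoRI site of allele A.
allele_a = "ATGC" * 5 + "GAATTC" + "CCGG" * 10 + "GAATTC" + "TTAA" * 4
allele_b = allele_a.replace("GAATTC", "GAATTT", 1)

fragments_a = digest(allele_a, "GAATTC", 1)
fragments_b = digest(allele_b, "GAATTC", 1)
print(fragments_a, fragments_b)
```

Allele A yields three fragments and allele B yields two, so the two alleles can be told apart by fragment number and size after electrophoresis.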
  14. Restriction fragment length polymorphism (RFLP) • Traditionally, fragments were separated and detected using Southern blot analysis; most recent analyses replace it with PCR. • If flanking sequences are known for a locus, the segment containing the RFLP region is amplified via PCR. • If the length polymorphism is caused by a relatively large (> approx. 100 bp depending on the size of the undigested PCR product) deletion or insertion, gel electrophoresis of the PCR products should reveal the size difference.
  15. Restriction fragment length polymorphism (RFLP) • By using 'universal' primers on a target DNA, PCR products can be digested with restriction enzymes and visualized by simple staining with ethidium bromide, owing to the increased amount of DNA produced by the PCR method. • Advantages: RFLPs are codominant markers, and because the size difference is often large, scoring is relatively easy. • Disadvantage: the relatively low level of polymorphism. In addition, either sequence information (for PCR analysis) or probes (for Southern blot analysis) are required, making marker development difficult and time-consuming.
  16. Random amplified polymorphic DNA (RAPD) • RAPD procedures use PCR to randomly amplify anonymous segments of nuclear DNA with an identical pair of primers 8–10 bp in length. • Because the primers are short and relatively low annealing temperatures (often 36–40 °C) are used, the likelihood of amplifying multiple products is great, with each product representing a different locus.
  17. Random amplified polymorphic DNA (RAPD) • The potential power for detection of polymorphism is relatively high; typically, 5–20 bands can be produced using a given primer pair, and multiple sets of random primers can be used to scan the entire genome for differential RAPD bands. • Because each band is considered a bi-allelic locus (presence or absence of an amplified product), PIC values for RAPDs fall below those for microsatellites and SNPs, and RAPDs may not be as informative as AFLPs because fewer loci are generated simultaneously.
  18. Amplified fragment length polymorphism (AFLP) • AFLP is a PCR-based, multi-locus fingerprinting technique that combines the strengths and overcomes the weaknesses of the RFLP and RAPD methods. • Like RFLPs, the molecular basis of AFLP polymorphisms includes indels between restriction sites and base substitutions at restriction sites; like RAPDs, it also includes base substitutions at PCR primer binding sites.
  19. Amplified fragment length polymorphism (AFLP) • The unique feature of the technique is the addition of adaptors of known sequence to DNA fragments generated by digestion of whole genomic DNA. • This allows for the subsequent PCR amplification of a subset of the total fragments for ease of separation by gel electrophoresis. • It is similar in principle to RFLP, but instead of analyzing one locus at a time, it allows for the analysis of many loci simultaneously.
  20. Single nucleotide polymorphism (SNP) • SNPs are polymorphisms caused by point mutations that give rise to different alleles containing alternative bases at a given nucleotide position within a locus; they are detected by DNA sequencing. • SNP markers are inherited as co-dominant markers. • Their PIC is not as high as that of multi-allelic microsatellites. • Random shotgun sequencing, amplicon sequencing using PCR, and comparative EST analysis are among the most popular sequencing-based methods for SNP discovery.
  21. Expressed sequence tags (ESTs) • ESTs are single-pass sequences generated from random sequencing of cDNA clones and are used for gene profiling and genomic mapping. • They offer a rapid and valuable first look at genes expressed in specific tissue types, under specific physiological conditions, or during specific developmental stages. • ESTs are useful for the development of cDNA microarrays that allow analysis of differentially expressed genes.
  22. What are microsatellites (SSRs)? • Microsatellites, or simple sequence repeats (SSRs), are codominant molecular genetic markers, i.e., both alleles of an individual are detected in the analysis. • Microsatellites are stretches of DNA consisting of tandemly repeated short units of 1–6 base pairs (bp) in length. SSRs typically span between twenty and a few hundred bases. • Because of their high level of polymorphism, relatively small size, multiallelic nature, codominant inheritance and rapid detection protocols, these markers are widely used in a variety of fundamental and applied fields of life and medical sciences. They are easily amplified by PCR using two unique oligonucleotide primers that flank the microsatellite and hence define the microsatellite locus.
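The definition above (tandem repeats of a 1–6 bp unit) lends itself to a simple regular-expression scan. The sketch below is only an illustration: the function name `find_ssrs`, the minimum of five repeats and the demo sequence are assumptions, not values from the slides. The lazy quantifier makes the pattern prefer the shortest repeating unit, so a (CA)n tract is reported as CA rather than CACA.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats is an arbitrary illustrative threshold.
    """
    # Lazy {1,n}? tries the shortest motif first; \1{k,} requires the
    # motif to be repeated at least k more times directly afterwards.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence with a (CA)12 and a (GATA)6 tract.
demo = "GGATCC" + "CA" * 12 + "TTGACG" + "GATA" * 6 + "CCTAGG"
ssrs = find_ssrs(demo)
print(ssrs)
```

Only perfect repeats are found this way; interrupted or compound microsatellites would need a more tolerant search.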
  23. Microsatellites & SSRs • Applications in biology and medicine include forensics, molecular epidemiology, parasitology, population and conservation genetics, genetic mapping and genetic dissection of complex traits. • Microsatellites are considered selectively neutral markers and are found anywhere in the genome, both in protein-encoding (9–15%) and noncoding DNA. • SSRs contribute to DNA structure, chromatin organization, regulation of DNA recombination, transcription and translation, gene expression and cell cycle dynamics.
  24. Microsatellites & SSRs • The majority of microsatellites (30–67%) found are dinucleotides, mostly represented by poly (A/T) tracts, which are the most frequent classes of SSRs, whereas longer motifs (tri-, tetra-, penta- and hexanucleotides) are about 1.5-fold less common in genomic DNA. • In the human genome, one microsatellite was found every 6 kb and one CA repeat (the most common type of tandem repeat) occurred every 30 kb of DNA.
  25. Microsatellites & SSRs • Di- and tetranucleotide motifs are mostly clustered in noncoding regions. In vertebrates, they are distributed 42- and 30-fold less frequently in exons than in intronic sequences and intergenic regions, respectively. • Long dimeric motifs are highly unstable within expressed sequences, while in noncoding regions most dinucleotide repeats can have surprisingly long stretches, probably due to the high tolerance of noncoding DNA to mutations.
  26. Microsatellites & SSRs • In contrast, triplets are found in both coding and non-coding genomic regions with a high frequency. • In humans, the expansion of trinucleotides, encoding polyproline (CCG)n, polyarginine (CGG)n, polyalanine [(GCC)n and (GCG)n] and polyglutamine (CAG)n tracts within exons has been described. • Such expansions can lead to various neurodegenerative and neuromuscular disorders, including myotonic dystrophy, fragile X syndrome, Huntington's disease and spinocerebellar ataxia.
  27. Function of microsatellites 1. DNA & chromosome structure • Microsatellites are involved in forming a wide variety of unusual DNA structures with simple and complex loop-folding patterns. • Telomeric and centromeric chromosome regions have been shown to be rich in long arrays of a variety of mono-, di-, tri-, tetra- and hexanucleotide motifs. • The (TTAGGG)n hexamer sequence is recognized by telomerase, a ribonucleoprotein polymerase that synthesizes telomere repeats onto the chromosome ends to compensate for the loss of sequences during DNA replication, whereas other proteins prevent nucleolytic degradation and confer stability on chromosomes.
  28. Function of microsatellites 2. DNA recombination • Dinucleotide motifs are preferential sites for recombination events due to their high affinity for recombination enzymes. • Some SSR sequences, such as GT, CA, CT, GA and others, may influence recombination through their effects on DNA structure. • SSRs were shown to be associated with the assignment of some Rh phenotypes, and to be involved in the molecular evolution of the human Rh gene family and its orthologs in other eukaryotes via replication slippage and recombination (gene conversion) mechanisms.
  29. Function of microsatellites 3. DNA replication • Human genes encoding important cell fidelity and growth factors, such as the B-cell leukemia/lymphoma 2 (BCL2)-associated X protein, insulin-like growth factor 2 receptor (IGF2R), breast cancer early onset protein 2 (BRCA2) and transforming growth factor beta 2 (TGF-β2), contain short repeated sequences. • Frame-shift mutations, resulting from insertions and deletions of repeat units within these sequences, affect these genes and could therefore initiate tumorigenesis; they can also affect enzymes controlling mutation rate and the cell cycle.
  30. Function of microsatellites 4. Gene expression • SSRs located in promoter regions can cause drastic or quantitative variations in gene expression and change the level of promoter activity. The human insulin minisatellite is highly polymorphic, and some of its alleles were shown to regulate the expression of the insulin gene. • Intronic SSRs can also affect gene transcription and mRNA stability, and may represent binding sites for translation factors. For example, such an effect was measured for the tetrameric microsatellite located in intron 1 of the human tyrosine hydroxylase gene and the (CA)n dinucleotide repeat in the first intron of the human epidermal growth factor receptor gene.
  31. Development of type I (coding) and type II (non-coding) microsatellite markers • Type I markers are more difficult to develop. While non-gene sequences are free to mutate, causing higher levels of polymorphism, sequences within protein-coding regions generally show lower levels of polymorphism because of functional selection pressure. • The most effective and rapid way of producing type I microsatellites is the sequencing of clones from cDNA libraries. Both 5′- and 3′-ends of a cDNA clone can be sequenced to produce expressed sequence tags (ESTs). • An EST is a short, usually 200–600 bp-long nucleotide sequence, which represents a uniquely expressed region of the genome.
  32. Development of type I (coding) and type II (non-coding) microsatellite markers • EST sequences are archived in a special branch of the GenBank nucleotide database (dbEST). In Nov. 2005, the EST database contained more than 31.3 million sequence entries from around 500 species. • A typical strategy for the development of EST-derived microsatellite markers (data mining) includes preliminary analysis of EST sequences from the DNA database to remove poly(A) and poly(T) stretches, which are common in ESTs developed from the 3′-ends of cDNA clones and correspond to the poly(A)-tails in eukaryotic mRNA. • Sequences are further screened for putative SSRs (all SSR-containing EST sequences). Following the identification of SSR-containing ESTs, flanking primers should be designed to amplify the microsatellite.
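The data-mining steps described here (trim poly(A)/poly(T) stretches, then screen for SSRs) can be sketched as below. The thresholds (at least 10 bases for a terminal poly tract, at least 5 repeats for an SSR), the function names and the three EST records are all hypothetical; real pipelines also handle internal tracts, vector sequence and sequencing errors, which this sketch ignores.

```python
import re

# Assumed thresholds, chosen only for illustration.
POLY_A_TAIL = re.compile(r"A{10,}$")   # poly(A) run at the 3' end
POLY_T_HEAD = re.compile(r"^T{10,}")   # poly(T) run at the 5' end
SSR = re.compile(r"([ACGT]{1,6}?)\1{4,}")  # motif of 1-6 bp, >= 5 repeats


def trim_poly_tracts(seq):
    """Remove a terminal poly(A) tail and a leading poly(T) stretch."""
    seq = POLY_A_TAIL.sub("", seq.upper())
    return POLY_T_HEAD.sub("", seq)


def ssr_containing(ests):
    """Return the trimmed sequences that still contain a putative SSR."""
    keep = {}
    for name, seq in ests.items():
        trimmed = trim_poly_tracts(seq)
        if SSR.search(trimmed):
            keep[name] = trimmed
    return keep


# Hypothetical EST records.
ests = {
    "est_1": "GCTAGC" + "GA" * 9 + "CTTGCA" + "A" * 25,  # (GA)9 plus poly(A) tail
    "est_2": "T" * 20 + "GCATCGTACGATCGTAGCTA",          # poly(T) only, no SSR
    "est_3": "GCATTGCA" + "CAG" * 7 + "TTGACC",          # (CAG)7
}
hits = ssr_containing(ests)
print(sorted(hits))
```

Without the trimming step, the poly(T) stretch of `est_2` would itself be reported as a mononucleotide repeat, which is the reason the strategy removes these stretches before screening.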
  33. Applications of microsatellites 1. Genetic mapping 2. Individual DNA identification and parentage assignment 3. Phylogeny, population and conservation genetics 4. Molecular epidemiology and pathology 5. Quantitative trait loci mapping 6. Marker-assisted selection
  34. 1. Genetic mapping • SSRs remain the markers of choice for the construction of linkage maps, because they are highly polymorphic (and highly informative) and require a small amount of DNA for each test. • In particular, type II (noncoding) microsatellites are very helpful for building a dense linkage map framework into which type I (coding) markers can then be incorporated (type I markers directly show the location of genes within the linkage map).
  35. 1. Genetic mapping • Linkage maps are also known as recombination maps; they define the order and distance of loci along a chromosome on the basis of inheritance in families or populations. • During meiosis, one random copy of each chromosome pair is passed on to the gamete. Only genes located next to each other are tightly linked. • Crossing-over results from the physical exchange of chromosome segments between two homologous chromosomes during meiosis. • Recombination results in the exchange of grandparental alleles of genes further apart on that chromosome.
  36. 1. Genetic mapping • Genetic distance is usually measured in centimorgans (cM), where 1 cM is equivalent to 1% recombination between markers. • Linkage map length differs between sexes. In species with the XY sex determination system, the female map is usually longer than the male map because of higher recombination rates in females compared to males.
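The centimorgan definition above amounts to a simple calculation from offspring counts. The sketch below uses invented counts and treats 1 cM as 1% recombination directly, which is a reasonable approximation only for closely linked markers; over longer intervals a mapping function (for example Haldane's or Kosambi's) is normally applied, and that is not shown here.

```python
def map_distance_cm(recombinant, total):
    """Map distance in cM, taking 1 cM = 1% recombinant offspring."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return 100.0 * recombinant / total


# Hypothetical counts for the same marker pair scored in the progeny of
# female and male parents.
female_cm = map_distance_cm(recombinant=18, total=200)
male_cm = map_distance_cm(recombinant=11, total=200)
print(female_cm, male_cm)
```

In this made-up example the interval is longer on the female map than on the male map, mirroring the sex difference described on the slide.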
  37. 2. Individual DNA identification and parentage assignment • Microsatellites represent codominant single-locus DNA markers. For each SSR, a progeny inherits one allele from the father and another allele from the mother. • Appropriate mathematical tools are available to evaluate genetic relatedness and inheritance in these systems. • A suitable methodology should be chosen for accurate and correct analysis of genotyping data to reconstruct parentage and pedigree structure. • Due to the small size of SSRs, they are relatively stable in degraded DNA. This is one reason why polymorphic SSRs are widely used in forensic science: microsatellite loci remain relatively stable in bone remnants and dental tissue, providing the basis for the successful analysis of ancient DNA.
  38. 3. Phylogeny, population and conservation genetics • Population and conservation genetic studies use variability within stretches of tandem repeats, which evolve significantly more rapidly than flanking regions. • Flanking regions of microsatellites have proven their value in establishing phylogenetic relationships between species and families, because they evolve much more slowly than numbers of tandem repeats. • Microsatellites are eminently suitable for phylogeographical applications, where population structure is observed over a large geographical scale.
  39. 4. Molecular epidemiology and pathology • Genomic instability of microsatellites has been extensively evaluated in the field of carcinogenesis, where chromosomal rearrangements (e.g., translocations, insertions and deletions of genomic regions) occur. • Carcinogenic events often happen within a genomic region harboring a tumour suppressor gene and hence inactivate the gene. • Carcinogenic rearrangements are associated with loss of heterozygosity (LOH) in microsatellites located within the affected chromosome region. • Thus, detecting microsatellite LOH in tumour tissues not only contributes to the molecular diagnosis of cancer, but also points to the possible location of a tumour suppressor gene.
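The rule that a progeny inherits one allele from each parent at every SSR gives a simple exclusion test for a candidate father. The sketch below is a bare illustration with invented locus names and allele sizes (in bp); it ignores mutation, null alleles and genotyping error, which dedicated parentage software accounts for.

```python
def compatible(child, mother, father):
    """True if one child allele can come from the mother and the other
    from the father at a single codominant locus."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def excluded_loci(child, mother, candidate_father):
    """Loci at which the trio violates Mendelian inheritance."""
    return [locus for locus in child
            if not compatible(child[locus], mother[locus], candidate_father[locus])]


# Hypothetical genotypes: allele sizes in bp at three SSR loci.
child = {"SSR1": (152, 160), "SSR2": (201, 201), "SSR3": (98, 104)}
mother = {"SSR1": (152, 156), "SSR2": (201, 205), "SSR3": (98, 100)}
sire_a = {"SSR1": (160, 164), "SSR2": (201, 209), "SSR3": (104, 104)}
sire_b = {"SSR1": (156, 164), "SSR2": (205, 209), "SSR3": (104, 110)}

mismatches_a = excluded_loci(child, mother, sire_a)
mismatches_b = excluded_loci(child, mother, sire_b)
print(mismatches_a, mismatches_b)
```

A candidate with no mismatching loci is not excluded; one with mismatches at several loci, as `sire_b` here, would normally be rejected as the father.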
  40. 5. Quantitative trait loci mapping • A quantitative trait is one that has measurable phenotypic variation owing to genetic and/or environmental influences. • The variation can be measured numerically (for example, height, size or blood pressure) and quantified. • Generally, quantitative traits are complex (multifactorial) and influenced by several polymorphic genes and by environmental conditions.
  41. 6. Marker-assisted selection • Marker-assisted selection is based on the concept that it is possible to infer the presence of a gene from the presence of a marker tightly linked to that gene. • So, it is important to have high-density and high-resolution genetic maps, which are saturated by markers in the vicinity of a target locus (gene) that will be selected. • The degree of saturation is the proportion of the genome that will be covered by markers at the density such that the maximum separation between markers is no greater than a few centimorgans (usually 1–2 cM), within which linkage of markers can be detected.
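The degree of saturation described above can be estimated from marker positions on a linkage group. The sketch below is one possible reading of that definition: it counts the map length lying between adjacent markers that are no more than `max_gap_cm` apart, and treats everything else (including the chromosome ends) as uncovered. The function name, the 2 cM default and the marker positions are assumptions made for the example.

```python
def saturation(positions_cm, map_length_cm, max_gap_cm=2.0):
    """Fraction of the map lying in intervals between adjacent markers
    spaced no more than max_gap_cm apart."""
    if len(positions_cm) < 2 or map_length_cm <= 0:
        return 0.0
    ordered = sorted(positions_cm)
    covered = sum(b - a for a, b in zip(ordered, ordered[1:])
                  if b - a <= max_gap_cm)
    return covered / map_length_cm


# Hypothetical marker positions (cM) on a 12 cM linkage group.
markers = [0.0, 1.5, 3.0, 4.0, 9.0, 10.0, 11.5]
coverage = saturation(markers, map_length_cm=12.0)
widest_gap = max(b - a for a, b in zip(markers, markers[1:]))
print(round(coverage, 3), widest_gap)
```

Here a single 5 cM gap leaves a large part of the group unsaturated, so additional markers would be needed in that interval before selecting on a target locus located there.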