25. Computational Gene Finding. Problem: given a very long DNA sequence, identify coding regions (including intron splice sites) and their predicted protein sequences.
30. GENE STRUCTURE INFORMATION - POSITION ON PHYSICAL MAP: This gene structure corresponds to the position on the physical map.
33. GENE STRUCTURE INFORMATION - PREDICTED GENE STRUCTURE: This gene structure relates to the predicted gene structures. Boxes are exons; thin lines (or springs) are introns.
34. Find the open reading frames
    GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT
Any sequence has 3 potential reading frames (+1, +2, +3). Its complementary strand also has three potential reading frames (-1, -2, -3), giving 6 possible reading frames. The triplet, non-punctuated nature of the genetic code helps us out: of the 64 potential codons, 61 are true codons and 3 are stop codons (TGA, TAA, TAG). With a random distribution, approx. 1 in 21 codons (3/64) will be a stop.
    +1  E K A P A Q S E M V S L S F H R
    +2  K K L L P N L K W L A Y L S T
    +3  K S S C P I * N G * P I F P P
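The six-frame translation on this slide can be reproduced with a short script. This is a minimal sketch assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `six_frames` are illustrative and do not come from the slides.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, offset):
    """Translate seq from the given 0-based offset; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations for frames +1..+3 and -1..-3 (reverse complement)."""
    revcomp = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(revcomp, offset)
    return frames


seq = "GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT"
frames = six_frames(seq)
for name in ("+1", "+2", "+3", "-1", "-2", "-3"):
    print(name, frames[name])
```

In this fragment only frame +3 contains stop codons (`*`) among the forward frames, so +1 and +2 remain open across the whole fragment.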
40. blastn (EST). For raw DNA sequence analysis, blastx is extremely useful: it probes your DNA sequence (translated in all six reading frames) against the protein database. A match (homolog) gives you some idea of function. One problem is the large number of genome sequences: you will get matches to database entries that were themselves identified strictly by sequence homology, so you often need some experimental evidence.
43. New generation of programs to predict gene coding sequences based on a non-random repeat pattern (e.g. Glimmer, GeneMark); actually pretty good. Borodovsky et al., 1999, Organization of the Prokaryotic Genome (Charlebois, ed.), pp. 11-34.
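One reading of the "non-random pattern" is that coding DNA tends to show base composition that depends on the position within the codon. The sketch below only tabulates that position-specific composition for the example sequence from the reading-frame slide. It is not the algorithm used by Glimmer or GeneMark, which rely on Markov models trained on known genes; the function name `codon_position_counts` is an assumption made for illustration.

```python
from collections import Counter


def codon_position_counts(seq, frame=0):
    """Count bases at codon positions 1, 2 and 3 for one reading frame."""
    counts = [Counter(), Counter(), Counter()]
    in_frame = seq[frame:]
    # Drop a trailing partial codon so every position sees the same total.
    in_frame = in_frame[: len(in_frame) - len(in_frame) % 3]
    for i, base in enumerate(in_frame):
        counts[i % 3][base] += 1
    return counts


seq = "GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT"
counts = codon_position_counts(seq)
for position, counter in enumerate(counts, start=1):
    total = sum(counter.values())
    freqs = {base: round(counter[base] / total, 2) for base in "ACGT"}
    print(f"codon position {position}: {freqs}")
```

A 48-nucleotide fragment is far too short for meaningful statistics. Real gene finders estimate such frequencies, and higher-order dependencies, from many known genes.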
79. RNA genes. Besides the 6000 protein-coding genes, there are: 140 ribosomal RNA genes; 275 transfer RNA genes; 40 small nuclear RNA genes; >100 small nucleolar RNA genes. Further examples: pRNA in the φ29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000); cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001); the human Prader-Willi critical region (Cavaille et al. PNAS 97:14035-7, 2000).
84. RNA genes can be hard to detect. Example (miRNA genes): UGAGGUAGUAGGUUGUAUAGU, C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000). Often small. Sometimes multicopy and redundant. Often not polyadenylated (not represented in ESTs). Immune to frameshift and nonsense mutations. No open reading frame, no codon bias. Often evolving rapidly in primary sequence.
86. Let-7 (lethal-7) was also mapped to a ncRNA gene with a 21-nucleotide product. The small let-7 RNA is also thought to be a post-transcriptional negative regulator of lin-41 and lin-42. It is 100% conserved in all bilaterally symmetrical animals (not jellyfish and sponges). Sometimes called stRNAs, small temporal RNAs. Let-7 (Pasquinelli et al. Nature 408:86-89, 2000).
104. Context-free grammars
Basic CFG “production rules”: S -> aS, S -> Sa, S -> aSu, S -> SS
(The rules are schematic: the derivation uses them with any nucleotide in place of a, and any base pair in place of a...u, and ends by dropping S.)
A CFG “derivation”:
    S -> aS
    S -> aaS
    S -> aaSS
    S -> aagScuS
    S -> aagaSucugSc
    S -> aagaSaucuggScc
    S -> aagacSgaucuggcScc
    S -> aagacuSgaucuggcgSccc
    S -> aagacuuSgaucuggcgaSccc
    S -> aagacuucSgaucuggcgacSccc
    S -> aagacuucgSgaucuggcgacaSccc
    S -> aagacuucggaucuggcgacaccc
[Figure: the derived sequence drawn as a secondary structure]
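The derivation above can be replayed as a parse tree. The following is a minimal sketch, assuming one node type per production plus an "end" node for dropping S; the node labels, the helper `unpaired` and the dot-bracket output (a common notation in which parentheses mark paired bases and dots mark unpaired ones) are illustrative choices, not part of the slides.

```python
# Parse-tree node shapes (hypothetical names, chosen for this sketch):
#   ("left", x, sub)      S -> xS    x is an unpaired base on the left
#   ("right", sub, x)     S -> Sx    x is an unpaired base on the right
#   ("pair", x, sub, y)   S -> xSy   x and y form a base pair
#   ("fork", s1, s2)      S -> SS    two substructures side by side
#   ("end",)              S is dropped (derivation ends)
END = ("end",)


def expand(node):
    """Return (sequence, dot-bracket structure) generated by a parse tree."""
    kind = node[0]
    if kind == "end":
        return "", ""
    if kind == "left":
        _, x, sub = node
        seq, db = expand(sub)
        return x + seq, "." + db
    if kind == "right":
        _, sub, x = node
        seq, db = expand(sub)
        return seq + x, db + "."
    if kind == "pair":
        _, x, sub, y = node
        seq, db = expand(sub)
        return x + seq + y, "(" + db + ")"
    if kind == "fork":
        _, first, second = node
        seq1, db1 = expand(first)
        seq2, db2 = expand(second)
        return seq1 + seq2, db1 + db2
    raise ValueError(f"unknown node kind: {kind!r}")


def unpaired(bases, sub=END):
    """Chain of S -> xS steps that emits a run of unpaired bases before sub."""
    for x in reversed(bases):
        sub = ("left", x, sub)
    return sub


# The two S symbols created by S -> SS each grow into one stem-loop.
first_hairpin = ("pair", "g", ("pair", "a",
                 ("right", ("pair", "c", unpaired("uucg"), "g"), "a"),
                 "u"), "c")
second_hairpin = ("left", "u", ("pair", "g", ("pair", "g",
                  ("left", "c", ("pair", "g", unpaired("aca"), "c")),
                  "c"), "c"))
tree = unpaired("aa", ("fork", first_hairpin, second_hairpin))

sequence, structure = expand(tree)
print(sequence)
print(structure)
```

Each pair node contributes one matched pair of parentheses, which is why a context-free grammar can describe nested base pairing.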