Maximizing hidden stop codon on gene design
1. Synthetic gene design with a large number of hidden stops. Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y. Published in: Intl. Journal of Data Mining and Bioinformatics, Vol. 4, No. 4, 2010. Presented by: Khaled Monsoor, Bioinformatics Masters Program, The University of Memphis. Mail: kmonsoor@memphis.edu. Date: Nov 05, 2010.
26. Regular Expression for a Protein: (start)(codon)^k(stop). Start: ATG. Stop: TAA, TAG, or TGA. Codon: any triplet not equal to TAA, TAG, or TGA. Example: ATG.ACC.AAT.CGG.TAA. The TGA spanning the ATG|ACC boundary is a stop codon, but hidden (out of frame).
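The pattern on this slide can be written as an actual regular expression. The following is a minimal sketch, not taken from the paper; the names `ORF_PATTERN` and `is_coding_sequence` are illustrative assumptions, and the dots used as codon separators on the slides are simply stripped.

```python
import re

# (start)(codon)^k(stop): ATG, then any number of in-frame non-stop triplets,
# then exactly one stop codon. The negative lookahead rejects TAA, TAG and TGA
# wherever an ordinary codon is expected.
ORF_PATTERN = re.compile(r"ATG(?:(?!TAA|TAG|TGA)[ACGT]{3})*(?:TAA|TAG|TGA)")


def is_coding_sequence(dna: str) -> bool:
    """Return True if dna has the form (start)(codon)^k(stop) and nothing else."""
    seq = dna.replace(".", "").upper()  # drop the slide's codon separators
    return ORF_PATTERN.fullmatch(seq) is not None


print(is_coding_sequence("ATG.ACC.AAT.CGG.TAA"))  # the slide's example
```

Only in-frame triplets are checked, so the out-of-frame TGA in the example does not prevent a match.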
27. Why is a hidden stop good? Hidden stops can protect against frameshifts by terminating the resulting off-frame translation early. Without hidden stops, a frameshift can produce very long non-functional proteins, which not only wastes time, amino acids (money), and ATP (energy) but can also yield toxic products. Ref: Seligmann and Pollock, DNA and Cell Biology, 2004.
36. Hidden Stops. Consider the protein MSDSKED. Both sequences encode this protein: (1) ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA and (2) ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA. Sequence (1) is better: it has 4 hidden stops, while sequence (2) has none.
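The count on this slide can be checked by scanning the two shifted reading frames. This is a minimal sketch under the assumption that a "hidden stop" means a TAA, TAG, or TGA triplet in the +1 or +2 frame of the coding strand; the function name `count_hidden_stops` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_hidden_stops(dna: str) -> int:
    """Count stop codons in the +1 and +2 frames (offsets 1 and 2) of dna."""
    seq = dna.replace(".", "").upper()  # drop the slide's codon separators
    count = 0
    for offset in (1, 2):
        # Step through complete triplets of the shifted frame.
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                count += 1
    return count


print(count_hidden_stops("ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"))  # 4
print(count_hidden_stops("ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"))  # 0
```

In sequence (1) the hidden stops are the TGA, TGA, TAG, and TAA that span the first four codon boundaries.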
38. Dynamic Programming approach. Idea: the optimal design of the whole sequence is built from optimal designs of partial sequences. $H(i, j)$ = number of hidden stops in the optimal design up to the $i$-th amino acid, $A_i$, when $A_i$ is coded by its $j$-th codon.
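39. Optimal Substructure of the algorithm. $H(i, j) = \max_k \{ H(i-1, k) + I_{kj} \}$, maximizing over all codons $k$ coding the previous amino acid, $A_{i-1}$. $I_{kj} = 1$ if the $k$-th codon of $A_{i-1}$ followed by the $j$-th codon of $A_i$ forms a hidden stop codon across their boundary, and 0 otherwise. Because each amino acid has at most six codons, this recurrence can be computed in linear time, O(n).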
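The recurrence can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard genetic code, writes the terminating stop as `*` in the protein string, and the names `CODONS_FOR`, `junction_stop`, and `design_max_hidden_stops` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR: dict[str, list[str]] = {}
for triplet, aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(triplet)

STOPS = {"TAA", "TAG", "TGA"}


def junction_stop(prev: str, cur: str) -> int:
    """I_kj: 1 if a stop codon spans the boundary between two adjacent codons."""
    return int(prev[1:] + cur[0] in STOPS or prev[2] + cur[:2] in STOPS)


def design_max_hidden_stops(protein: str) -> tuple[int, str]:
    """Return (number of hidden stops, DNA design) for a one-letter protein."""
    # H maps each codon of the current amino acid to (best score, best design).
    H = {c: (0, c) for c in CODONS_FOR[protein[0]]}
    for aa in protein[1:]:
        H = {
            cur: max(
                (score + junction_stop(prev, cur), design + cur)
                for prev, (score, design) in H.items()
            )
            for cur in CODONS_FOR[aa]
        }
    return max(H.values())


score, design = design_max_hidden_stops("MSDSKED*")
print(score, design)
```

For MSDSKED the best achievable score is 4, so sequence (1) from the hidden-stops slide is already optimal. Carrying the partial design string in each cell keeps the sketch short but copies strings at every step; a linear-time version would store back-pointers and reconstruct the design at the end.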
40. Strategy: Back Translation (Protein → DNA). This is a 1-to-many mapping. Back translation should satisfy constraints imposed by host genomes and serve a specific design purpose.
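To see how large the 1-to-many mapping is, the number of back translations can be counted as the product of the codon counts of each amino acid. The sketch below assumes the standard genetic code; `CODONS_FOR`, `count_back_translations`, and `back_translations` are illustrative names.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order; '*' is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR: dict[str, list[str]] = {}
for triplet, aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(triplet)


def count_back_translations(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


def back_translations(protein: str):
    """Lazily yield every DNA sequence encoding the protein."""
    for codons in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(codons)


print(count_back_translations("MSDSKED"))  # 1*6*2*6*2*2*2 = 576
```

The count grows exponentially with protein length, which is why exhaustive enumeration is only practical for very short peptides.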
42. Constrained by GC Content. GC content = number of G and C bases in the sequence. GC content relates to the stability of DNA. Algorithm's objectives: first maximize the number of hidden stops, then match the GC content of the host genome.
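The two-level objective can be illustrated on a small set of candidate designs. This is only a sketch: it ranks given candidates rather than integrating GC content into the dynamic program, GC content is expressed as a fraction of the sequence length, and `target_gc`, `gc_content`, and `pick_design` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_hidden_stops(dna: str) -> int:
    """Count stop codons in the +1 and +2 frames of dna."""
    seq = dna.replace(".", "").upper()
    return sum(
        seq[i:i + 3] in STOP_CODONS
        for offset in (1, 2)
        for i in range(offset, len(seq) - 2, 3)
    )


def gc_content(dna: str) -> float:
    """Fraction of bases in dna that are G or C."""
    seq = dna.replace(".", "").upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pick_design(candidates: list[str], target_gc: float) -> str:
    """Prefer more hidden stops; break ties by closeness to target_gc."""
    return min(
        candidates,
        key=lambda d: (-count_hidden_stops(d), abs(gc_content(d) - target_gc)),
    )


designs = [
    "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA",
    "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA",
]
print(pick_design(designs, target_gc=0.375))
```

Because hidden stops are compared first, the first design is chosen here even though the second one matches the assumed target GC fraction exactly.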
53. Conclusion Synthetic gene design with a large number of hidden stops
54. Comparison. (1) “Wild type” (genes from NCBI); (2) random gene (constrained by codon usage of the “wild type”); (3) “Optimal”: design with no constraint (max stop codons); (4) constrained by GC content of the wild type; (5) constrained by codon usage of the wild type.