2. Peptide recognition modules Many known protein domains recognize and bind short linear peptides. Examples: SH3, SH2, WW and PDZ domains. These domains can be viewed as Lego blocks that evolution can “use” to confer target specificity on whole proteins. They appear to be very important in signal transduction pathways. Figure: SH3 domain of FYN
3. Peptide recognition modules Each domain type binds a general protein “motif” that can be characterised experimentally (for example, with phage display). These motifs are short and degenerate, and therefore very abundant in proteomes.
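To make the point about short, degenerate motifs concrete, here is a minimal sketch of scanning a sequence for a motif written as a regular expression. The pattern used, [RK]xxPxxP, is the commonly cited class I SH3 ligand consensus; it is an illustrative assumption and is not taken from these slides. The function name `find_motif` and the toy sequence are hypothetical.

```python
import re

# Illustrative pattern only: the commonly cited class I SH3 ligand
# consensus [RK]xxPxxP, written as a regular expression ("." = any residue).
CLASS_I_SH3 = "[RK]..P..P"


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every match, overlapping ones included."""
    # Wrapping the pattern in a zero-width lookahead lets matches overlap.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical toy sequence, not a real protein.
toy = "MSRPLPPLPAAKAAPAAPGG"
hits = find_motif(toy, CLASS_I_SH3)
for start, text in hits:
    print(start, text)
```

A pattern that fixes only three of seven positions matches twice in this 20-residue toy sequence, which hints at why such motifs are so abundant proteome-wide.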
4. Peptide recognition modules Each specific SH3 domain has a particular instance of the more general motif. Only a fraction of the motif matches found in the proteome are relevant targets of the domains. Can we use secondary structure and comparative genomics to find which are the relevant targets?
5. Idea for a method Look for conservation of the target binding motif, ignoring all parts of the target protein that are not suitable for binding. 1) Get the pattern from the literature. 2) Map the pattern in all S. cerevisiae proteins. 3) Remove pattern matches outside “disorder” regions (predictor: DisEMBL). 4) Conserved motifs are considered relevant (alignment of protein orthologs).
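The four steps on this slide can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors' implementation: disordered regions are supplied as precomputed (start, end) intervals rather than by calling DisEMBL, orthologs are supplied as sequences already aligned to the S. cerevisiae protein, and a motif counts as conserved in a species when the aligned segment (gaps removed) still matches the pattern. All function names and the toy sequences are hypothetical.

```python
import re


def motif_hits(seq, pattern):
    """Step 2: map the pattern onto a sequence; returns (start, end) spans."""
    regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    return [(m.start(), m.start() + len(m.group(1))) for m in regex.finditer(seq)]


def in_disorder(hit, disordered_regions):
    """Step 3: keep a hit only if it lies fully inside a disordered region."""
    start, end = hit
    return any(d0 <= start and end <= d1 for d0, d1 in disordered_regions)


def is_conserved(hit, aligned_ref, aligned_orthologs, pattern, min_species):
    """Step 4: count orthologs whose aligned segment still matches the pattern."""
    # Map ungapped reference positions to alignment columns.
    cols = [i for i, c in enumerate(aligned_ref) if c != "-"]
    start, end = hit
    col_start, col_end = cols[start], cols[end - 1] + 1
    matching = 0
    for aligned in aligned_orthologs.values():
        segment = aligned[col_start:col_end].replace("-", "")
        if re.fullmatch(pattern, segment):
            matching += 1
    return matching >= min_species


def relevant_motifs(aligned_ref, aligned_orthologs, pattern,
                    disordered_regions, min_species):
    ref = aligned_ref.replace("-", "")
    hits = motif_hits(ref, pattern)
    hits = [h for h in hits if in_disorder(h, disordered_regions)]
    return [h for h in hits
            if is_conserved(h, aligned_ref, aligned_orthologs, pattern, min_species)]


# Step 1 (assumed here): a literature pattern, e.g. class I SH3 [RK]xxPxxP.
PATTERN = "[RK]..P..P"
# Hypothetical toy alignment; "-" marks a gap.
REF = "AARPLPPLPGG-KAAPAAPTT"
ORTHOLOGS = {
    "species_1": "AAKPIPPLPGGSKAAPAAPTT",
    "species_2": "AARPLPALPGG-KAAPAAPTT",
    "species_3": "AARPLAPLAGG-KAAAAAPTT",
}
# Hypothetical disorder prediction: residues 0-9 of the ungapped reference.
result = relevant_motifs(REF, ORTHOLOGS, PATTERN, [(0, 10)], min_species=2)
print(result)
```

In the toy data both motif instances are conserved in two of three species, but only the first survives because the second lies outside the assumed disordered region.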
6. What genomes to use? Too close to S. cerevisiae and we will find everything conserved, even the non-relevant motifs. In species that diverged a long time ago, the binding motif itself will have diverged. Best guess: C. glabrata, K. lactis, C. albicans, D. hansenii, Y. lipolytica
7. Scoring a method Diagram labels: universe of possible interactions; known positive interactions (P); known negative interactions (N); predicted interactions; true positives (TP); false positives (FP). Accuracy = TP/(TP+FP) (probability of picking a true interaction). Coverage = TP/P (how much of the known positives the method recovers).
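The two scores defined on this slide can be computed with simple set operations. A minimal sketch, assuming that interactions are represented as (domain protein, target protein) pairs and that false positives are counted only among the known negatives, so predictions falling outside both known sets are left unscored. The function name and the example pairs are hypothetical.

```python
def score_predictions(predicted, known_positives, known_negatives):
    """Return (accuracy, coverage) as defined on the slide.

    accuracy = TP / (TP + FP)   (often called precision elsewhere)
    coverage = TP / P           (often called recall elsewhere)
    """
    tp = len(predicted & known_positives)
    # Assumption: only predictions that hit a known negative count as FP.
    fp = len(predicted & known_negatives)
    accuracy = tp / (tp + fp) if (tp + fp) else float("nan")
    coverage = tp / len(known_positives) if known_positives else float("nan")
    return accuracy, coverage


# Hypothetical toy data: each interaction is a (domain, target) pair.
positives = {("D1", "a"), ("D1", "b"), ("D1", "e"), ("D1", "f")}
negatives = {("D1", "c"), ("D1", "g")}
predicted = {("D1", "a"), ("D1", "b"), ("D1", "c"), ("D1", "d")}

accuracy, coverage = score_predictions(predicted, positives, negatives)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

Here TP = 2 and FP = 1, so accuracy is 2/3 and coverage is 2/4.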
9. Figure: conservation only vs. conservation and secondary structure filter. >80% accuracy, <25% coverage. Motifs required to be conserved in 4 of the 5 genomes used.
11. Going back to the question: what genomes to use? Is there an optimal divergence time for the species included in the comparative genomics study? Best results were obtained from around 400 My to 950 My.
12. Looking at all possible combinations of four genomes, we could determine that some genomes are significantly more informative than others. Figure axis: increasing divergence time from S. cerevisiae
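One way to compare genomes across all four-genome combinations is sketched below. It is an assumption-laden illustration: the genome list is borrowed from the seven species named later in the deck, the scoring function is a stand-in for whatever accuracy or coverage measure is used, and the "contribution" statistic (mean score with a genome minus mean score without it) is a hypothetical choice that does not reproduce the significance test behind the slide.

```python
from itertools import combinations
from statistics import mean

# Assumed genome set (the seven species listed later in the deck).
GENOMES = [
    "C. glabrata", "K. lactis", "C. albicans", "D. hansenii",
    "Y. lipolytica", "N. crassa", "S. pombe",
]


def genome_contributions(genomes, score_combo, k=4):
    """Score every k-genome combination and summarise each genome's effect.

    score_combo is any callable mapping a tuple of genome names to a number.
    Returns {genome: mean score with it - mean score without it}.
    """
    scores = {combo: score_combo(combo) for combo in combinations(genomes, k)}
    contributions = {}
    for genome in genomes:
        with_g = [s for combo, s in scores.items() if genome in combo]
        without_g = [s for combo, s in scores.items() if genome not in combo]
        contributions[genome] = mean(with_g) - mean(without_g)
    return contributions


def toy_score(combo):
    # Hypothetical stand-in for a real benchmark score; it simply rewards
    # one genome so the ranking is easy to verify by eye.
    return 1.0 if "C. albicans" in combo else 0.0


contributions = genome_contributions(GENOMES, toy_score)
for genome, delta in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{genome:15s} {delta:+.3f}")
```

With seven genomes there are 35 combinations of four, so an exhaustive comparison of this kind is cheap.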
14. Predicting new interactions Look for conservation of target motifs in four of the genomes: C. glabrata, K. lactis, C. albicans, D. hansenii, Y. lipolytica, N. crassa and S. pombe. Red = some experimental evidence found. Thin lines = not same compartment