Protein evolution proceeds through genetic mechanisms, but selection acts on biological assemblies. I define a protodomain as a minimal independently evolving unit with conserved structure. Protodomain rearrangements have minimal impact on biological assemblies, so they represent a valid evolutionary path through fold space.
These slides are from my Candidacy Exam on Jan 28, 2013 at the University of California, San Diego. They discuss my current research in Philip Bourne's lab and propose research for my thesis over the next two years. An audio version is available at http://www.scivee.tv/node/57082
4. CONTINUITY
Sadreyev, R. I., Kim, B.-H., & Grishin, N. V. (2009). Discrete-continuous duality of protein structure space. Current Opinion in Structural Biology, 19(3), 321–328.
Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85
5. MODELS OF FOLD SPACE
[Figure labels: α, β, α/β, α+β]
Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500
Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38
Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603
Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60
Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90
Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61
Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8
6. BIG QUESTIONS
Is fold space discrete or continuous?
Where do new folds come from?
What insights can we gain by studying fold
space?
10. FOLD
Group of domains with
Same major secondary structural elements
Same mutual orientation
Same connectivity
11. PROTODOMAINS
A protodomain is a minimal, independently
evolving protein unit with a conserved
structure.
Defined through evolution, but usually observed
as structural motif
Coined by Philippe Youkharibache
12. PROTODOMAINS
A protodomain is a minimal, independently
evolving protein unit with a conserved structure.
Glyoxalase I from Clostridium acetobutylicum [3HDP]
GTP binding regulator from Thermotoga maritima [1VR8]
Glyoxalase I in E. coli [1F9Z]
Pseudomonas 1,2-dihydroxynaphthalene dioxygenase [2EHZ]
14. SPECIFIC AIMS
1. Improve algorithms to identify conserved
protodomains globally across the PDB.
2. Identify structurally similar and potentially
homologous protodomains across fold space.
3. Integrate protodomain arrangements with
domain and quaternary structure information
to create a parsimonious model of fold evolution
across the tree of life.
4. Apply protodomain principles to understanding
the evolution of specific protein families.
15. AIM 1
Improve algorithms to identify conserved
protodomains globally across the PDB.
Preliminary Research:
a) Circular Permutation with CE-CP
b) Symmetry with CE-Symm
Proposed Research:
a) Improve CE-Symm algorithm
b) Create algorithms for other types of
protodomain rearrangements
c) Run algorithms globally across the PDB
d) Create non-redundant catalogue of
protodomains
16. CIRCULAR PERMUTATION
Spencer Bliven and Andreas Prlić. Circular Permutation in Proteins. PLoS Comput Biol (2012) 8(3): e1002445.
18. CE-CP
A Prlić, S Bliven, P Rose, J Jacobsen, PV Troshin, M Chapman, J
Gao, CH Koh, S Foisy, R Holland, G Rimša, ML Heuer, H.
Brandstätter–Müller, PE Bourne, and S Willis. BioJava: an open-
source framework for bioinformatics in 2012. Bioinformatics (2012).
http://www.rcsb.org/pdb/workbench/workbench.do
[Figure labels: N and C termini]
Molybdate-binding protein [1ATG.A] vs. OpuAC [2B4L.A]
Regulator of G protein signaling 10 [2IHB.A] vs. vaccinia H1-related phosphatase [1VHR.A]
Concanavalin A [1NLS.A] vs. Pea Lectin [1RIN.A+B]
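A common way to look for a circular permutation is to compare one protein against a doubled copy of the other, so that a match can run across the original termini. The sketch below shows only that duplication idea on toy sequences with exact string matching. It is not the CE-CP implementation, which works on structures, and the function name is hypothetical.

```python
from typing import Optional


def find_circular_permutation(seq_a: str, seq_b: str) -> Optional[int]:
    """Return k such that seq_b == seq_a[k:] + seq_a[:k], or None if no such k exists.

    Toy illustration only: exact matching stands in for a real alignment.
    """
    if len(seq_a) != len(seq_b):
        return None
    if not seq_a:
        return 0
    # Doubling seq_a makes every rotation of it appear as a contiguous substring.
    doubled = seq_a + seq_a
    k = doubled.find(seq_b)
    return None if k == -1 else k


# The new N terminus of the permuted sequence sits at position 3 of the original.
offset = find_circular_permutation("ABCDEFG", "DEFGABC")
print(offset)  # 3
```

A real detector would replace the substring search with a scored alignment and then trim the doubled region back to a single copy.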
20. SYMMETRY
Beta Propeller
Goodsell, D. S., & Olson, A. J. (2000).
Structural symmetry and protein function.
Annual Review of Biophysics and Biomolecular
Structure, 29, 105–153.
21. SYMMETRY
Functionally important
Protein evolution (e.g. beta-trefoil)
DNA binding
Allosteric regulation
Cooperativity
Widespread (19% of proteins)
[Figures: FGF-1 [3JUT]; TATA Binding Protein [1TGH]; Hemoglobin [4HHB]]
22. SYMMETRY EVOLUTION
Start with perfectly symmetric homomer
Duplications & Fusions
Symmetry lost to drift
23. INTERMEDIATES TO BETA-TREFOIL
FGF-1 [3JUT]
Lee, J., & Blaber, M. (2011). Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. PNAS, 108(1), 126–130.
24. CE-SYMM WISHLIST
Find alignments for all valid rotations
Refine alignments based on isomorphism constraints
Utilize crystallographic symmetry more efficiently for biological assemblies
Detect multiple axes of symmetry
[Figures: Triose Phosphate Isomerase [8TIM]; 5-enol-pyruvyl shikimate-3-phosphate (EPSP) synthase [1G6S]]
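To make "all valid rotations" concrete, here is a minimal sketch that tests which rotation orders map a set of 2D points onto itself. It is a geometric toy under strong assumptions (points centered on the origin, a single axis, exact symmetry up to a tolerance) and is not how CE-Symm works. All names are hypothetical.

```python
import math


def rotate(point, angle):
    """Rotate a 2D point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def maps_onto_itself(points, angle, tol=1e-6):
    """True if every rotated point lands within `tol` of some original point."""
    for p in points:
        qx, qy = rotate(p, angle)
        if not any(math.hypot(qx - rx, qy - ry) <= tol for rx, ry in points):
            return False
    return True


def valid_rotation_orders(points, max_order=12, tol=1e-6):
    """List every order n in 2..max_order whose rotation by 2*pi/n is a symmetry."""
    return [
        n for n in range(2, max_order + 1)
        if maps_onto_itself(points, 2 * math.pi / n, tol)
    ]


# Six points evenly spaced on a circle, a stand-in for a 6-fold symmetric arrangement.
hexagon = [
    (math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)
]
print(valid_rotation_orders(hexagon))  # [2, 3, 6]
```

The 6-fold toy also admits 2-fold and 3-fold rotations, which is why a detector that reports a single rotation can miss lower or higher orders.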
25. CE-SYMM
Andreas Prlić, Spencer E. Bliven, Peter W.
Rose, Philippe Youkharibache, Douglas Myers-
Turnbull, Philip E. Bourne. On Symmetry and
Pseudo-Symmetry in Proteins. In preparation.
FGF-1 [3JUT] AmtB [3C1G]
28. ADDITIONAL METHODS FOR DETECTING
PROTODOMAINS
Changes in Quaternary Structure
Protodomain searches (Douglas Myers-Turnbull)
Domain Swapping
29. AIM 2
Identify structurally similar and potentially
homologous protodomains across fold space.
Preliminary Research
a) All-vs-all comparison of chains & domains
b) Clustering & network analysis
Proposed Research
a) Run all-vs-all comparison of protodomains
b) Build protodomain similarity network
c) Correlate network with existing properties:
ligand binding, symmetry order, enzymatic
activity, distribution across organisms, etc.
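The network step can be sketched as: keep an edge wherever a pairwise similarity score passes a threshold, then look at the connected components. The scores, identifiers, and threshold below are invented for illustration. The slides do not state which score or cutoff is used.

```python
from collections import defaultdict, deque


def build_network(scores, threshold):
    """Map each node to its neighbours, keeping edges with score >= threshold."""
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        graph[a]  # touch both keys so isolated nodes are still registered
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical pairwise similarity scores between five units.
scores = {
    ("d1", "d2"): 0.9,
    ("d2", "d3"): 0.8,
    ("d3", "d4"): 0.2,
    ("d4", "d5"): 0.7,
}
graph = build_network(scores, threshold=0.5)
main_cluster = max(connected_components(graph), key=len)
node_fraction = len(main_cluster) / len(graph)
print(sorted(main_cluster), node_fraction)  # ['d1', 'd2', 'd3'] 0.6
```

Node properties such as symmetry order or ligand binding could then be attached to each node and compared across components.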
30. ALL-VS-ALL STRUCTURAL ALIGNMENT
Andreas Prlić, Spencer Bliven, Peter W
Rose, Wolfgang F. Bluhm, Chris Bizon, Adam
Godzik, Philip E. Bourne. Precalculated Protein
Structure Alignments at the RCSB PDB website.
Bioinformatics (2010) vol. 26 (23) pp. 2983-2985
31. ALL-VS-ALL STRUCTURAL ALIGNMENT
Use sequence clustering to get representative
chains with <40% sequence identity (currently
23410)
Split into domains by SCOP or PDP
All chains and domains compared using FATCAT
Use Open Science Grid (OSG)
Client/Server architecture for aggregating results
…
Scores
…
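As a rough sense of scale, an all-vs-all comparison of n representatives needs n(n-1)/2 alignments if each unordered pair is aligned once and self-comparisons are skipped (an assumption, since the slide does not say). The sketch below counts the pairs for the 23410 chains quoted above and shows one way to cut the pair list into batches for distributed jobs. The batch size and function names are hypothetical.

```python
from itertools import combinations, islice


def pair_count(n):
    """Number of unordered pairs among n items, excluding self-comparisons."""
    return n * (n - 1) // 2


def batched_pairs(ids, batch_size):
    """Yield lists of at most `batch_size` unordered pairs, generated lazily."""
    pairs = combinations(ids, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


print(pair_count(23410))  # 274002345

# Tiny example: four chains give six pairs, split into batches of four and two.
batches = list(batched_pairs(["a", "b", "c", "d"], batch_size=4))
print([len(b) for b in batches])  # [4, 2]
```

Splitting chains into domains raises n, so the domain-level count is larger than the chain-level figure printed here.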
34. CROSS-CLASS EXAMPLE
3GP6.A
PagP, modifies lipid A
f.4.1 (transmembrane
beta-barrel)
1KT6.A
Retinol-binding
protein
b.60.1 (Lipocalins)
35. AIM 3
Integrate protodomain arrangements with domain
and quaternary structure information to create a
parsimonious model of fold evolution across the tree
of life.
Preliminary Research
a) Classification of biological assemblies by quaternary
symmetry & chain stoichiometry
b) Model for evolution via protodomains
Proposed Research
a) Determine the protodomain content of each
biological assembly
b) Identify BAs with conserved protodomain
architecture but different chain architecture, or vice
versa
c) Integrate data with model of protodomain evolution
36. QUATERNARY STRUCTURE
Find symmetry & pseudosymmetry within
biological assemblies
Functions at chain level
Can use various thresholds to determine
stoichiometry (95% sequence, CE alignment, etc)
Rhinovirus 2 [3DPR]: I (60,60,60,60,60)
GTP Cyclohydrolase I [1A8R]: D5 (10)
Hemoglobin [4HHB]: C2 (2,2)
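The stoichiometry tuples above can be read as counts of chains per group of equivalent chains. A minimal sketch, assuming the chains have already been grouped by some criterion such as 95% sequence identity: the cluster labels and chain IDs below are illustrative inputs, not output from the actual pipeline.

```python
from collections import Counter


def stoichiometry(chain_to_cluster):
    """Count chains per cluster and return the counts, largest first."""
    counts = Counter(chain_to_cluster.values())
    return tuple(sorted(counts.values(), reverse=True))


# Hemoglobin-like assembly: two alpha chains and two beta chains.
hemoglobin = {"A": "alpha", "B": "beta", "C": "alpha", "D": "beta"}
print(stoichiometry(hemoglobin))  # (2, 2)

# Homodecamer: ten chains that all fall into one cluster.
decamer = {f"chain{i}": "same" for i in range(10)}
print(stoichiometry(decamer))  # (10,)
```

A looser grouping threshold merges clusters, so the same assembly can yield different tuples depending on the criterion chosen.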
37. EVOLUTIONARY MODEL
1. Local Mutation
2. Protodomain fusion
3. Protodomain fission
4. Loss of Interface
5. Gain of Interface
6. New Protodomains
38. CONNECTION TO FOLD SPACE
Mostly local mutations = continuous regions
Protodomain creation & rearrangement =
discrete regions
Identifying evolutionary events allows
quantitative comparison of the frequencies of
each mechanism
Biologically rather than geometrically motivated
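Once events are identified, comparing mechanism frequencies is a simple tally. The sketch below uses the six event types from the evolutionary model slide. The observed event list is made up purely to show the bookkeeping.

```python
from collections import Counter
from enum import Enum


class Event(Enum):
    LOCAL_MUTATION = "local mutation"
    PROTODOMAIN_FUSION = "protodomain fusion"
    PROTODOMAIN_FISSION = "protodomain fission"
    LOSS_OF_INTERFACE = "loss of interface"
    GAIN_OF_INTERFACE = "gain of interface"
    NEW_PROTODOMAIN = "new protodomain"


def mechanism_frequencies(events):
    """Return the fraction of observed events belonging to each mechanism."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: counts[event] / total for event in Event}


# Hypothetical observations, not real data.
observed = (
    [Event.LOCAL_MUTATION] * 6
    + [Event.PROTODOMAIN_FUSION] * 2
    + [Event.PROTODOMAIN_FISSION]
    + [Event.GAIN_OF_INTERFACE]
)
frequencies = mechanism_frequencies(observed)
for event, fraction in frequencies.items():
    print(f"{event.value}: {fraction:.2f}")
```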
39. AIM 4
Apply protodomain principles to understanding
the evolution of specific protein families.
Qualities
Have good structural coverage
Contain multiple members with symmetry at either
domain or quaternary structure level.
Contain circularly permuted members
Span a diverse set of folds
Ion Channels
Beta Propellers
AmtB [3C1G]
40. SODIUM/ASPARTATE SYMPORTER FROM PYROCOCCUS HORIKOSHII (GLTPH)
[Figure: GltPh [2NXW], top and side views, cytoplasm labeled]
Forrest, L. R., Krämer, R., & Ziegler, C. (2011). The structural basis of secondary active transport mechanisms. Biochimica et Biophysica Acta, 1807(2), 167–188.
41. CONCLUSIONS
Biological Assemblies are the functional unit of
structure
Protodomains can rearrange without modifying
the biological assembly
Separating changes in biological assembly from
genetic changes can provide evolutionary
perspective on fold space
Local Changes = Continuous Evolution
Protodomain rearrangements = Discrete Transitions
43. PUBLICATIONS
A Prlić, S Bliven, PW Rose, WF Bluhm, C Bizon, A Godzik, PE
Bourne. Precalculated Protein Structure Alignments at the RCSB
PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985
Spencer Bliven and Andreas Prlić. Circular Permutation in Proteins. PLoS Comput Biol (2012) 8(3): e1002445.
A Prlić, S Bliven, P Rose, J Jacobsen, PV Troshin, M Chapman, J
Gao, CH Koh, S Foisy, R Holland, G Rimša, ML Heuer, H
Brandstätter–Müller, PE Bourne, and S Willis. BioJava: an open-
source framework for bioinformatics in 2012. Bioinformatics
(2012).
Intended:
CE-Symm method
Evolutionary model & examples of protodomain evolution
Structural similarity network analysis
Use of model for specific protein family
44. ACKNOWLEDGMENTS
Committee
Philip Bourne
Milton H. Saier
Russell F. Doolittle
Michael K. Gilson
Adam Godzik
Collaborators
Philippe Youkharibache
Jean-Pierre Changeux
BioJava Contributors
Bourne Lab/PDB
Andreas Prlić
Peter Rose
Douglas Myers-Turnbull
Lab & PDB members
The lovely Christine Bliven
45. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0
Unported License.
Editor's Notes
Implications not within scope (e.g. sequence comparisons)
Define motifs
Acknowledge Philippe’s work
Orengo, C A, Flores, T P, Taylor, W R, Thornton, J M. Identification and classification of protein fold families. Protein Eng (1993) vol. 6 (5) pp. 485-500
1. SSAP structure comparison
2. 150 non-redundant reps
3. See 3 clusters by multidimensional scaling
Holm and Sander. Protein structure comparison by alignment of distance matrices. J Mol Biol (1993) vol. 233 (1) pp. 123-38
1. Original DALI
2. 225 representatives
3. Finds 3 clusters by hierarchical clustering
Holm and Sander. Touring protein fold space with Dali/FSSP. Nucleic Acids Res (1998) vol. 26 (1) pp. 316-9
1. Automated classification of fold space
Holm and Sander. Mapping the protein universe. Science (1996) vol. 273 (5275) pp. 595-603
1. Use multivariate scaling to project proteins to 2D
2. Use 287 unique folds as input
3. Find 5 classes
4. DALI
5. Updates: Holm and Sander. Touring protein fold space with Dali/FSSP. Nucleic Acids Res (1998) vol. 26 (1) pp. 316-9
Shindyalov and Bourne. An alternative view of protein fold space. Proteins (2000) vol. 38 (3) pp. 247-60
1. 2016 repr (using fast structural alignment), but only use 75 of them?
2. All-v-all, but no visualization
Hou, Jingtong, Sims, Gregory E, Zhang, Chao, Kim, Sung-Hou H. A global representation of the protein fold space. Proceedings of the National Academy of Sciences of the United States of America (2003) vol. 100 (5) pp. 2386-90
1. 3D projection
2. Incorporates SCOP
3. 498 SCOP fold reprs
Choi and Kim. Evolution of protein structural classes and protein sequence families. Proceedings of the National Academy of Sciences of the United States of America (2006) vol. 103 (38) pp. 14056-61
1. [from Taylor] Common structural ancestors (CSAs) are estimated for protein families and the age of the CSA plotted in fold space. This shows the b/a class to be the most ancient. Although there is debate about the methods used to estimate age, the ancient nature of the b/a proteins is clear but not unexpected, as many other functional properties suggest their antiquity.
Taylor. Evolutionary transitions in protein fold space. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61
1. Concludes that attempts to embed fold space are futile.
2. Previous attempts were able to distinguish 'class' level but failed at finding significant relationships.
3. Contains a nice discussion about CP evolution
4. Cites Orengo, Holm, Hou, Choi.
Sadreyev et al. Discrete-continuous duality of protein structure space. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8
1. Argues for continuous structural space with discrete evolutionary space.
2. Using DALI z-score as metric, cluster 2000 proteins in 2D using CLANS. Find continuous space with some 'mountains' of higher density
Daniels et al. Touring Protein Space with Matt. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM (2011) PREPRINT
1. Abstract claims automated classification at superfamily/fold
Familiar terms but highlight subtleties
Should I mention ongoing Topic Page involvement?
Based on Uliel. Bioinformatics (1999) vol. 15 (11) pp. 930-6
Lee and Blaber. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. PNAS (2011) vol. 108 (1) pp. 126-30
Would already be detected as protodomain
Green: active transporters
Red: channels
Omit legend
a - All alpha proteins
b - All beta proteins
c - Alpha and beta proteins (a/b)
d - Alpha and beta proteins (a+b)
e - Multi-domain proteins (alpha and beta)
f - Membrane and cell surface proteins and peptides
g - Small proteins
h - Coiled coil proteins
i - Low resolution protein structures
j - Peptides
k - Designed proteins
Main cluster: 6000/7467 = 80% nodes, 83780/86878 = 96% edges