Ab initio protein structure prediction uses computational methods to predict a protein's 3D structure from its amino acid sequence. It relies on a conformational search to generate candidate structures (decoys) and on the selection of native-like models from among them. The key factors for success are an accurate energy function, efficient search methods such as molecular dynamics or genetic algorithms, and effective selection of models close to the native structure. Model selection approaches include energy evaluations, sequence-structure compatibility scores, clustering of similar decoys, and identifying the lowest-energy conformations.
2. Protein structure prediction
● Protein structure prediction (PSP) is the
prediction of the three-dimensional structure
of a protein from its amino acid sequence
i.e. the prediction of its tertiary structure
from its primary structure.
3. Protein Structure Prediction: Methods
[Flowchart: the choice of method depends on whether a similar protein structure is available or not available; the methods shown are the template-based method, threading, and ab initio modelling.]
4. ab initio modelling
● ab initio modelling conducts a
conformational search under the guidance of
a designed energy function.
● This procedure usually generates a number of
possible conformations (structure decoys),
and final models are selected from them.
5. ● Successful ab initio modelling depends on
three factors:
➢ an accurate energy function with which the
native structure of a protein corresponds to the
most thermodynamically stable state, compared
to all possible decoy structures;
➢ an efficient search method which can quickly
identify the low-energy states through
conformational search;
➢ selection of native-like models from a pool
of decoy structures.
6. Energy Functions
● Energy functions are classified into two groups:
➔ Physics-based energy functions
➔ Knowledge-based energy functions
7. Physics-Based Energy Functions
“In a strictly-defined physics-based ab initio method,
interactions between atoms should be based on
quantum mechanics and the coulomb potential with
only a few fundamental parameters such as the
electron charge and the Planck constant; all atoms
should be described by their atom types where only
the number of electrons is relevant.”
(Hagler et al. 1974; Weiner et al. 1984)
9. In practice, a compromise force field with a
large number of selected atom types is used.
Within each atom type, the chemical and physical
properties of the atoms are sufficiently alike,
and the parameters are calculated from crystal
packing or quantum mechanical theory.
10. ● Well-known examples of such all-atom
physics-based force fields include:
✔ AMBER
✔ CHARMM
✔ OPLS
✔ GROMOS96
● These potentials contain terms associated with
bond lengths, angles, torsion angles, van der
Waals interactions, and electrostatic interactions.
● The major difference between them lies in the
selection of atom types and the interaction
parameters.
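As a rough illustration of the kinds of terms listed above, the following Python sketch evaluates a harmonic bond term, a 12-6 Lennard-Jones van der Waals term, and a Coulomb electrostatic term. The functional forms are common textbook ones; the function names, the example parameters, and the conversion constant of about 332.06 kcal·Å/(mol·e²) are assumptions for illustration and are not taken from any specific force field.

```python
def bond_energy(r, r0, k):
    """Harmonic bond-stretch term: k * (r - r0)^2."""
    return k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term; the minimum value is -epsilon."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def coulomb(r, q1, q2, k_e=332.06):
    """Coulomb term for charges in units of e and distance in Angstrom.

    k_e is an assumed conversion factor to kcal/mol.
    """
    return k_e * q1 * q2 / r


def pair_energy(r, r0, k, epsilon, sigma, q1, q2):
    """Sum of the three illustrative terms for one atom pair."""
    return (bond_energy(r, r0, k)
            + lennard_jones(r, epsilon, sigma)
            + coulomb(r, q1, q2))
```

Real force fields sum such terms over all bonds, angles, torsions, and non-bonded atom pairs, and differ mainly in the atom types and parameter values they assign.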
11. Knowledge-Based Energy Function
● Refers to empirical energy terms derived from the
statistics of solved structures deposited in the PDB.
● Can be divided into two types:
➢ generic and sequence-independent terms such as the
hydrogen bonding and the local backbone stiffness of a
polypeptide chain
➢ amino-acid or protein-sequence dependent terms, e.g.
pairwise residue contact potential, distance-dependent atomic
contact potential, and secondary structure propensities
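One common way to turn structure statistics into an energy term is the inverse Boltzmann relation, in which a contact seen more often than expected receives a negative (favourable) pseudo-energy. The sketch below assumes this relation; the function name, the choice of reference state behind the `expected` count, and the default `kT` are illustrative assumptions, not details from the slides.

```python
import math


def contact_potential(observed, expected, kT=1.0):
    """Inverse-Boltzmann pseudo-energy for one residue-pair type.

    observed: contact count for the pair in known structures.
    expected: count predicted by a reference state (assumed given).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed / expected)
```

A pair observed twice as often as expected scores `-kT * ln 2`, while a pair observed exactly as often as expected scores zero.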
12. Conformational Search Methods
● Successful ab initio modelling of protein structures
depends on the availability of a powerful conformational
search method which can efficiently find the global
minimum energy structure for a given energy function
with a complicated energy landscape.
● Types:
➔ Monte Carlo Simulations
➔ Molecular Dynamics
➔ Genetic Algorithm
➔ Mathematical Optimization
13. Monte Carlo Simulations
● Its core idea is to use random samples of
parameters or inputs to explore the behavior
of a complex system or process.
14. Steps in MC simulation
➔ Initial configuration of particles in a system
➔ Monte Carlo move is attempted that changes the configuration of the particles
➔ Move is accepted or rejected based on an acceptance criterion
➔ Calculates the value of a property of interest
➔ An accurate average value of this property can be obtained
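The steps above can be sketched as a short Metropolis Monte Carlo loop in Python. The Metropolis rule is used here as one common acceptance criterion (the slides do not name a specific one), and the one-dimensional coordinate, step size, and `kT` value are assumptions chosen to keep the example small.

```python
import math
import random


def metropolis(energy, x0, n_steps=10000, step=0.5, kT=1.0, seed=0):
    """Return the average energy sampled by a 1-D Metropolis walk."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)              # initial configuration
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # attempt a move
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with
        # probability exp(-dE / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        total += e                     # accumulate property of interest
    return total / n_steps             # average over the run


def harmonic(x):
    """Toy energy function used only for illustration."""
    return 0.5 * x * x
```

Calling `metropolis(harmonic, 0.0)` returns an estimate of the average energy of the toy system; in structure prediction the coordinate would be a full conformation and the move a change in, for example, torsion angles.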
15. Molecular Dynamics
● MD simulation solves Newton’s equations of motion at each step of
atom movement, which is probably the most faithful method for
depicting atomistically what is occurring in proteins.
● The method is therefore most often used for the study of protein
folding pathways.
● The long simulation time is one of the major issues of this method,
since the incremental time scale is usually on the order of
femtoseconds (10⁻¹⁵ s) while the fastest folding time of a small
protein (less than 100 residues) is in the millisecond range in nature.
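To make "solving Newton's equations at each step" concrete, here is a minimal Python sketch of one widely used integrator, velocity Verlet, applied to a single particle. The choice of integrator, the function names, and the harmonic test force are assumptions for illustration; the slides do not specify an integration scheme.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle's motion with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


def spring_force(x, k=1.0):
    """Toy harmonic restoring force, used only for illustration."""
    return -k * x
```

With a time step of 1 fs, covering 1 ms of folding time would need about 10⁻³ / 10⁻¹⁵ = 10¹² such steps, which is the cost issue described above.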
16. Genetic Algorithm
● The genetic algorithm is a method for solving problems
that is based on natural selection, the process that drives
biological evolution.
● The genetic algorithm repeatedly modifies a population of
individual solutions.
● At each step, the genetic algorithm selects individuals at
random from the current population to be parents and uses
them to produce the children for the next generation.
● Over successive generations, the population "evolves"
toward an optimal solution.
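The generational loop described above can be sketched in Python as follows. Lower energy is treated as fitter. The tournament selection, one-point crossover, Gaussian mutation, elitism, and all parameter values are assumptions chosen for a compact example; they are not prescribed by the slides.

```python
import random


def genetic_algorithm(energy, n_genes, pop_size=30, generations=50,
                      mutation_rate=0.1, seed=0):
    """Minimise `energy` over real-valued gene vectors."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    history = [min(energy(ind) for ind in pop)]
    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        children = [min(pop, key=energy)[:]]
        while len(children) < pop_size:
            # Tournament selection: draw two at random, keep the fitter.
            p1 = min(rng.sample(pop, 2), key=energy)
            p2 = min(rng.sample(pop, 2), key=energy)
            # One-point crossover of the two parents.
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied gene by gene.
            child = [g + rng.gauss(0.0, 0.3)
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        history.append(min(energy(ind) for ind in pop))
    return min(pop, key=energy), history
```

Because the best individual is always kept, the best energy in `history` never increases from one generation to the next. In structure prediction the genes would encode a conformation, for example a list of torsion angles.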
17. Mathematical Optimization
● Mathematical optimization is the selection of a best
element (with regard to some criteria) from some
set of available alternatives.
18. Model Selection
● The selection of protein models has
emerged as a new field called Model Quality
Assessment Programs (MQAP).
● Model selection approaches can be
classified into two types:
➔ energy based
➔ free-energy based
20. Knowledge-Based Energy Function
● Sippl developed a pairwise residue-distance-based
potential using the statistics of known PDB structures
(Sippl 1990); its newest version is PROSA II (Sippl 1993;
Wiederstein and Sippl 2007).
● A variety of knowledge-based potentials have been
proposed, which include atomic interaction potential,
solvation potential, hydrogen bond potential, torsion angle
potential, etc.
21. Sequence-Structure Compatibility Function
● Best models are selected not purely based on energy functions.
● They are selected based on the compatibility of target sequences
to model structures.
● The earliest and still successful example is that by Luthy et al.
(1992), who used threading scores to evaluate structures.
● Colovos and Yeates (1993) later used a quadratic error function
to describe the non-covalently bonded interactions among CC,
CN, CO, NN, NO and OO, where near-native structures have
fewer errors than other decoys
22. Clustering of Decoy Structures
● Cluster analysis or clustering is the task of grouping a set of objects in such a
way that objects in the same group (called a cluster) are more similar (in
some sense or another) to each other than to those in other groups (clusters).
● The cluster-centre conformation of the largest cluster is considered closer to
native structures than the majority of decoys.
● In the work by Shortle et al. (1998), for all 12 cases tested, the cluster-centre
conformation of the largest cluster was closer to native structures than the
majority of decoys. Cluster-centre structures were ranked as the top 1–5%
closest to their native structures.
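A simple way to pick the centre of the largest cluster is to count, for every decoy, how many other decoys lie within a distance cutoff and to keep the decoy with the most neighbours. The Python sketch below assumes a precomputed pairwise distance matrix (in practice typically RMSD after superposition); the function name and the cutoff are illustrative assumptions and do not reproduce the procedure of Shortle et al.

```python
def largest_cluster_center(dist, cutoff):
    """Return (index of centre decoy, indices of its cluster members).

    dist: square matrix of pairwise decoy distances (assumed given).
    cutoff: maximum distance for two decoys to count as neighbours.
    """
    best_center, best_members = None, []
    for i in range(len(dist)):
        members = [j for j in range(len(dist)) if dist[i][j] <= cutoff]
        if len(members) > len(best_members):
            best_center, best_members = i, members
    return best_center, best_members


# Toy data: five "decoys" placed on a line, distance = absolute difference.
points = [0.0, 0.1, 0.2, 5.0, 9.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
center, members = largest_cluster_center(toy_dist, cutoff=0.15)
```

In the toy data the three closely spaced points form the largest cluster and the middle one is chosen as its centre.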