2. The druggable genome
[Figure: human genome; polysaccharides, lipids, nucleic acids, proteins; proteins with binding site]
Druggable genome: Subset of genes which
express proteins capable of binding small
drug-like molecules
3. Protein Structure Prediction
Why predict protein structure if we can use
experimental tools to determine it?
• Experimental methods are slow and expensive
• Some structures could not be solved experimentally
• One representative structure of a family can suffice to
deduce the structures of the other sequences in that family
4. Outline to modeling
1. Introduction to protein structure and databases
2. Structure prediction approaches
• Ab-initio
• Threading
• Homology modeling
3. Hands on molecular modeling
4. Model evaluation
6. • Pauling built models based on the following
principles, codified by G. N. Ramachandran:
• Bond lengths and angles -should be similar to
those found in individual amino acids and
small peptides
• Peptide bond -should be planar
• Overlaps-not permitted, pairs of non-bonded atoms no
closer than sum of their van der Waals radii
• Stabilization-have sterics that permit
hydrogen bonding
• Two degrees of freedom:
• φ (phi) angle = rotation about the N-Cα bond
• ψ (psi) angle = rotation about the Cα-C bond
• A linear amino acid polymer with some local folds
is an improvement, but it is still not functional and
its packing is not yet energetically favorable.
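A minimal sketch of how a backbone torsion angle can be computed from four atom positions. For φ the four atoms are C(i-1), N, Cα, C; for ψ they are N, Cα, C, N(i+1). The function name `dihedral` and the test coordinates are illustrative assumptions, not part of the original slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points.

    The angle is measured about the p1-p2 axis, between the
    p0-p1-p2 plane and the p1-p2-p3 plane.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Normalise the central bond vector.
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With real structures, the four points would be the Cartesian coordinates read from a PDB file.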
9. Databases
• RCSB-the Protein Data Bank-all deposited structures
• Experimentally-determined structures of proteins, nucleic acids, and
complex assemblies.
• Currently holds 65,000 structures
• UniProt: the main sequence database
o Swiss-Prot
o TrEMBL
Collaborations:
European Bioinformatics Institute (EBI),
Swiss Institute of Bioinformatics (SIB) ,
Protein Information Resource (PIR)
• NCBI: many databases, including sequence and
structure databases
• PDBsum: combines structural and sequence data
10. 2. Structure Prediction Approaches
o Ab-initio fold prediction
• Not based on similarity to a
known sequence or structure
o Threading (Fold Recognition)
• Requires the query to adopt a fold
similar to a known structure
o Homology modeling
• Based on sequence similarity
with a protein for which a
structure has been solved.
11. a. Ab initio modeling
o Structure prediction from
“first principles”
o Success would show that we
understand the folding process.
o Given only the sequence, try
to predict the structure
based on physico-chemical
properties
• The force field
• Molecular dynamics
• Minimal energy
12. Force field
• Mathematical expressions describing the potential energy of a
molecular system
• Each expression describes a different type of physico-chemical
interaction between atoms in the system:
• Van der Waals forces, Covalent bonds, Hydrogen bonds,
Charges, Hydrophobic effects
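As a rough illustration of "a sum of expressions, one per interaction type", the sketch below implements three commonly used functional forms: a harmonic bond term, a Lennard-Jones term for van der Waals interactions, and a Coulomb term for charges. The function names and all parameter values are assumptions chosen for the example; real force fields use carefully fitted parameters and additional terms (angles, torsions), and hydrogen bonding and hydrophobic effects are not modeled explicitly here.

```python
# Approximate conversion factor so that charges in units of e and
# distances in Angstrom give energies in kcal/mol.
COULOMB_CONSTANT = 332.06

def bond_energy(r, r0, k):
    """Harmonic covalent bond term: zero at the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2

def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: repulsive at short range, weakly attractive beyond."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def coulomb(r, q1, q2):
    """Electrostatic term between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / r

def total_energy(bonds, nonbonded):
    """Sum all terms.

    bonds:     iterable of (r, r0, k)
    nonbonded: iterable of (r, epsilon, sigma, q1, q2)
    """
    energy = sum(bond_energy(r, r0, k) for r, r0, k in bonds)
    for r, epsilon, sigma, q1, q2 in nonbonded:
        energy += lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)
    return energy
```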
Molecular dynamics
• Simulates the forces that govern the protein in water.
• Since proteins usually fold spontaneously, a sufficiently long
simulation should, in principle, lead to the native protein structure.
13. Minimal Energy
Assumption: the folded
form is the minimal energy
conformation of a protein
• Use of simplified energy
function
• Search methods for minimal
energy conformation:
• Greedy search
• Simulated annealing
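The two search methods can be contrasted on a toy problem. The sketch below uses a hypothetical one-dimensional "energy" with a local and a global minimum; it is not a protein energy function, and the step sizes, temperatures and cooling rate are arbitrary assumptions. Greedy search only accepts downhill moves and so stops in the nearest minimum, while simulated annealing sometimes accepts uphill moves (with probability exp(-ΔE/T)) and can therefore cross barriers while the temperature is high.

```python
import math
import random

def energy(x):
    """Toy energy: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x

def greedy_search(energy, x, step=0.01, max_iter=10000):
    """Move to the better neighbour until neither neighbour is lower."""
    for _ in range(max_iter):
        best = min((x - step, x + step), key=energy)
        if energy(best) >= energy(x):
            break
        x = best
    return x

def simulated_annealing(energy, x, t_start=2.0, t_end=1e-3,
                        cooling=0.995, step=0.5, seed=0):
    """Metropolis acceptance with a geometric cooling schedule."""
    rng = random.Random(seed)
    best_x = x
    t = t_start
    while t > t_end:
        candidate = x + rng.uniform(-step, step)
        delta = energy(candidate) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / t), which shrinks as t decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
            if energy(x) < energy(best_x):
                best_x = x
        t *= cooling
    return best_x
```

Starting both methods from x = 2.0, the greedy search stops in the local minimum on the right-hand side; the annealing run keeps the lowest-energy point it has visited.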
14. b. Threading (Fold recognition)
Given a sequence and a library of folds, thread the
sequence through each fold. Take the fold with the
highest score (e.g. using I-TASSER).
• Method will fail if new protein does not belong
to any fold in the library.
• Score of the threading is computed based on
known physical chemistry properties and
statistics of amino acids.
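A deliberately simplified sketch of the threading idea, not the algorithm used by I-TASSER or any real tool. Each fold in a hypothetical library is reduced to a string of structural environments ("B" for buried, "E" for exposed), and the score rewards hydrophobic residues in buried positions and polar residues in exposed positions. The residue classification, the scoring values and the fold names are all assumptions for illustration; real methods use gapped alignments and statistically derived potentials.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed classification for this example

def position_score(residue, environment):
    """+1 if the residue type suits the environment, otherwise -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1

def thread(sequence, fold_env):
    """Best ungapped score of placing the fold template along the sequence."""
    n, m = len(sequence), len(fold_env)
    if m > n:
        return None  # template longer than the sequence: cannot be threaded
    return max(
        sum(position_score(sequence[offset + i], fold_env[i]) for i in range(m))
        for offset in range(n - m + 1)
    )

def best_fold(sequence, library):
    """Thread the sequence through every fold and return the top scorer."""
    scores = {}
    for name, env in library.items():
        score = thread(sequence, env)
        if score is not None:
            scores[name] = score
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores
```

If the true fold of the query is absent from `library`, the function still returns the best of the wrong answers, which is the failure mode noted above.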
16. c. Homology Modeling
o A protein structure is
defined by its amino acid
sequence.
o Closely related sequences
adopt highly similar
structures; distantly related
sequences may still fold
into similar structures.
o Three-dimensional structure of proteins from
the same family is more conserved than
their sequences.
[Figure: triosephosphate isomerases, 44.7% sequence identity, 0.95 Å RMSD]
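Sequence identity figures such as the one quoted for the triosephosphate isomerases are computed from a pairwise alignment. A minimal sketch, assuming identity is defined as identical residues divided by the number of aligned (non-gap) columns; other definitions exist (for example dividing by the length of the shorter sequence), so reported values can differ slightly between tools.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)
```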
18. The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
19. Steps in homology modeling
1. Searching for structures related to the query sequence
2. Selecting templates
3. Aligning query sequence with template structures
4. Building a model for the query using information from the
template structures (Modeller 9.10)
5. Evaluating the model
20. 1. Searching For Structures
Get your sequence
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGEL
AGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTT
GFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEII
EAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGF
ATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESG
LFDMRDVLDLNNL*
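Before searching, the FASTA record usually has to be read into a single sequence string. The sketch below is a small hand-written parser, given as an assumption-laden illustration rather than a replacement for an established library; it joins the wrapped lines and drops the trailing "*" terminator seen in the record above.

```python
def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA-formatted string."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated; '*' marks the end of a protein.
            records[header] += line.rstrip("*")
    return records
```

Calling `len(parse_fasta(text)["DAPB_ECOLI"])` on the record above gives the sequence length to compare against the value listed for the query protein.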
28. • Aligning query sequence with template structures
• Building a model for the query using information from the
template structures (Modeller 9.10)
• Modeller 9.10 will generate PDB files with reference to the
template structure.
• Evaluation of the structure in SAVES
29. 4. Model evaluation
Examples of assessment approaches:
1. Assessment of the model’s stereochemistry
2. Prediction of unreliable regions of the model -
“pseudo energy” profile: peaks indicate likely errors
3. Consistency with experimental observations
4. Consistency with evolutionary conservation rates
34. Outline to docking
• Introduction to protein-ligand docking
• Scoring functions
• Assessing performance
• Practical aspects
35. Protein-ligand docking
• A Structure-Based Drug Design (SBDD) method
“structure” means “using protein structure”
• Computational method that mimics the binding of a ligand to a protein
• Given...
• Predicts...
• The pose of the molecule in
the binding site
• The binding affinity or a score
representing the strength of
binding
36. Pose vs. binding site
• Binding site (or “active site”)
• The part of the protein where the ligand
binds
• Generally a cavity on the protein surface
• Can be identified by looking at the
crystal structure of the protein
• Pose (or “binding mode”)
• The geometry of the ligand in the binding
site
• Geometry = location, orientation and
conformation
• Protein-ligand docking is not about
identifying the binding site
38. Components of docking software
• Typically, protein-ligand docking software consists
of two main components which work together:
1. Search algorithm
• Generates a large number of poses of a molecule in the
binding site
2. Scoring function
• Calculates a score or binding affinity for a particular pose
• Together, these provide:
o The pose of the molecule in the binding site
o The binding affinity or a score representing the strength of binding
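The division of labour between the two components can be shown with a toy example. Everything here is a simplifying assumption: the "ligand" is a rigid list of points, the search only samples random translations (a real search also samples orientation and conformation), and the score is simply the negative sum of squared distances between each ligand atom and a paired ideal position in the site. No real docking program works this way; the point is only the loop structure of generate pose, score pose, keep the best.

```python
import random

def score_pose(pose, site_points):
    """Toy scoring function: higher (closer to zero) is better."""
    return -sum(
        sum((a - b) ** 2 for a, b in zip(atom, point))
        for atom, point in zip(pose, site_points)
    )

def random_search(ligand_atoms, site_points, n_poses=2000,
                  max_shift=5.0, seed=0):
    """Toy search algorithm: sample random rigid translations of the ligand."""
    rng = random.Random(seed)
    best_pose = list(ligand_atoms)
    best_score = score_pose(best_pose, site_points)
    for _ in range(n_poses):
        shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        pose = [tuple(c + s for c, s in zip(atom, shift))
                for atom in ligand_atoms]
        score = score_pose(pose, site_points)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```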
39. The perfect scoring function will
• Accurately calculate the binding affinity
o Allows actives to be identified in a virtual screen
o Allows actives to be ranked in terms of affinity
• Score the poses of an active higher than poses of an
inactive
o Ranks actives higher than inactives in a virtual screen
• Score the correct pose of the active higher than an
incorrect pose of the active
o Allows the correct pose of the active to be identified
“actives” = molecules with biological activity
40. Broadly speaking, scoring functions can be
divided into the following classes:
• Forcefield-based
• Based on terms from molecular mechanics force fields
• GoldScore, DOCK, AutoDock
• Empirical
• Parameterised against experimental binding affinities
• ChemScore, PLP, Glide SP/XP
• Knowledge-based potentials
• Based on statistical analysis of observed pairwise
distributions
• PMF, DrugScore, ASP
41. Böhm’s empirical scoring function
• In general, scoring functions assume that the free energy of binding can be written as
a linear sum of terms to reflect the various contributions to binding
• Böhm’s scoring function included contributions from
hydrogen bonding, ionic interactions, lipophilic
interactions and the loss of internal conformational
freedom of the ligand.
• The ∆G values on the right of the equation are all constants
• ∆G0 is a contribution to the binding energy that does not directly depend on any specific
interactions with the protein
• The hydrogen bonding and ionic terms are both dependent on the geometry of the
interaction, with large deviations from ideal geometries (ideal distance R, ideal angle α)
being penalised.
• The lipophilic term is proportional to the contact surface area (Alipo) between protein and
ligand involving non-polar atoms.
• The conformational entropy term is the penalty associated with freezing internal rotations
of the ligand. It is largely entropic in nature. Here the value is directly proportional to the
number of rotatable bonds in the ligand (NROT).
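For reference, Böhm's function is commonly written in the literature in the following form. The symbols follow the bullets above; f(ΔR, Δα) denotes the penalty function for deviations from the ideal hydrogen-bond or ionic geometry. The notation is a reconstruction from the term descriptions and may differ in detail from the original slide.

```latex
\Delta G_{\mathrm{bind}} = \Delta G_{0}
  + \Delta G_{\mathrm{hb}} \sum_{\text{h-bonds}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{ionic}} \sum_{\text{ionic}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{lipo}} \, \lvert A_{\mathrm{lipo}} \rvert
  + \Delta G_{\mathrm{rot}} \, N_{\mathrm{ROT}}
```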
43. Pose prediction accuracy
• Accuracy is measured by RMSD (root mean squared deviation) from the known
crystal structure
RMSD = sqrt( (1/N) Σ d_i² ), where d_i is the distance between atom i in the
crystal structure and the same atom in the pose, and N is the number of atoms
A pose within 2.0 Å RMSD of the crystal structure is generally considered correct
• In general, the best docking software predicts the correct pose about 70% of the
time
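A minimal sketch of the RMSD calculation used to judge a docked pose. It assumes the two coordinate lists contain the same atoms in the same order, and it does not superimpose the structures first, since a docked pose and the crystal ligand already share the protein's coordinate frame. The function names and the 2.0 Å default follow the slide's cut-off; everything else is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for atom_a, atom_b in zip(coords_a, coords_b):
        # Squared distance between the matching atoms.
        total += sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))

def is_correct_pose(pose, crystal, cutoff=2.0):
    """Apply the usual 2.0 Angstrom success criterion."""
    return rmsd(pose, crystal) <= cutoff
```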
• To assess virtual screening, we need a dataset of Nact known actives, together with inactives
• Dock all molecules, and rank each by score
• Ideally, all actives would be at the top of the list
• Define enrichment, E, as the number of actives found (Nfound) in the top X% of
scores (typically 1% or 5%), compared to how many expected by chance
E = Nfound / (Nact * X/100)
E > 1 implies “positive enrichment”, better than random
E < 1 implies “negative enrichment”, worse than random
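The enrichment formula can be applied directly to a ranked list. The sketch below assumes that a higher score means a better predicted binder (some programs use the opposite convention, in which case the sort order must be flipped) and that `is_active` is a list of booleans aligned with `scores`; both names are illustrative.

```python
def enrichment_factor(scores, is_active, top_percent):
    """E = Nfound / (Nact * X / 100) for the top X% of the ranked list."""
    n_total = len(scores)
    n_act = sum(is_active)
    if n_total == 0 or n_act == 0:
        raise ValueError("need at least one molecule and one active")
    # Number of molecules in the top X% (at least one).
    n_top = max(1, int(n_total * top_percent / 100))
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    n_found = sum(active for _, active in ranked[:n_top])
    expected = n_act * top_percent / 100
    return n_found / expected
```

For example, with 100 molecules of which 10 are active, finding 5 actives in the top 10% gives E = 5 / (10 × 0.10) = 5, i.e. five times better than random selection.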
45. Protein preparation
• The Protein Data Bank (PDB) is a repository of protein crystal
structures, often in complexes with inhibitors
• PDB structures often contain water molecules
• In general, all water molecules are removed except where it is known
that they play an important role in coordinating to the ligand
• Most PDB crystal structures lack hydrogen atoms
• Many docking programs require the protein to have explicit hydrogens.
In general these can be added unambiguously, except in the case of
acidic/basic side chains
• An incorrect assignment of protonation
states in the active site will give poor
results
• Glutamate, Aspartate have COO- or COOH
• OH is hydrogen bond donor, O- is not
• Histidine is a base and its neutral form has
two tautomers.
[Figure: imidazole ring of the histidine side chain (R), shown in its protonated form and as its two neutral tautomers]
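The water-removal step can be sketched as a simple filter over PDB records. This is an illustrative helper, not the procedure of any particular docking package: it assumes standard fixed-column PDB formatting (residue name in columns 18-20, residue number in columns 23-26), that waters are named HOH, and that the caller supplies the residue numbers of any waters known to matter for ligand binding. The parameter name `keep_residue_numbers` is hypothetical, and chain identifiers are ignored for brevity.

```python
def strip_waters(pdb_lines, keep_residue_numbers=()):
    """Drop HOH records except those whose residue number is listed to keep."""
    keep = set(keep_residue_numbers)
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 (0-based slice 17:20) hold the residue name.
        if is_atom_record and line[17:20] == "HOH":
            # Columns 23-26 (0-based slice 22:26) hold the residue number.
            residue_number = int(line[22:26])
            if residue_number not in keep:
                continue
        kept.append(line)
    return kept
```

Hydrogens and protonation states would still need to be assigned afterwards with a dedicated preparation tool.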
46. Ligand preparation
A reasonable 3D structure is required as starting point
• Even during flexible docking, bond lengths and angles are held
fixed
The protonation state and tautomeric form of a particular
ligand could influence its hydrogen bonding ability
• Either protonate as expected for physiological pH and use a single
tautomer
• Or generate and dock all possible protonation states and
tautomers, and retain the one with the highest score
[Figure: keto-enol tautomerism, enol (OH) and ketone (O) forms interconverting by H+ transfer]
48. Conclusions
• Computational prediction of protein structure with
modeling tools saves effort and reduces errors.
• Homology modeling can be applied successfully if a structure
with sequence similarity to the query is known.
• Protein-ligand docking is an essential tool for computational
drug design
• Widely used in pharmaceutical companies
• But it’s not a silver bullet
• The perfect scoring function has yet to be found
• The performance varies from target to target, and scoring function
to scoring function
• Care needs to be taken when preparing both the protein and
the ligands
• The more information you have, the better your chances.
49. Useful links
1. SEARCHING FOR STRUCTURES
• PDB-Blast at NCBI- http://blast.ncbi.nlm.nih.gov/Blast.cgi
• Meta server- 3D-Jury http://bioinfo.pl/meta/
• FFAS03- http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl
• HHPRED- http://toolkit.tuebingen.mpg.de/hhpred
2. SELECTING TEMPLATES
3. ALIGNING QUERY SEQUENCE WITH TEMPLATE STRUCTURES
• MSA - MUSCLE, T-coffee and MAFFT at
http://toolkit.tuebingen.mpg.de/sections/alignment
• Alignment editor – Bioedit - http://www.mbio.ncsu.edu/BioEdit/bioedit.html
4. BUILDING A MODEL
• Nest - http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:nest
• Modeller - http://salilab.org/modeller/modeller.html
5. EVALUATING THE MODEL
• ConSurf http://consurf.tau.ac.il
• PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
• WHATCHECK www.cmbi.kun.nl/swift/whatcheck/
• ProSA https://prosa.services.came.sbg.ac.at/prosa.php
• ProQ http://www.sbc.su.se/~bjornw/ProQ/ProQ.cgi