Introduction
In 1981, Fortune magazine published a cover article entitled “Next Industrial Revolution: Designing Drugs by Computer at Merck”.1 This event may be regarded as the beginning of scientists' interest in computer-aided drug design (CADD). In the past decade, CADD has re-emerged as a very effective way to develop new potential drugs. A number of compounds can be screened easily in the search for lead compounds. Compounds predicted to be inactive can be skipped, and those predicted to be active can be prioritized, which reduces the cost and workload of a full high-throughput screening without compromising lead discovery. This feature provides new impetus for drug discovery. In the modern post-genomic era, the number of proteins with a known three-dimensional structure is increasing rapidly. This increase in the number of protein targets is due in part to improvements in techniques for structure determination, such as high-throughput X-ray crystallography. The structures produced by structural genomics initiatives are publicly available. Efforts are being made to generate and analyze, on a large scale, information from the three-dimensional structures and dynamics of proteins through computational as well as experimental methods. Computational generation of targets through modeling and ab initio prediction of protein structures are helpful tools in computational proteomics. Molecular docking of a protein structure with potential interacting partners (ligands) is heavily used in industry for rational drug design. Discovery and development of a new medicine is a long and expensive process. A newly discovered compound must not only produce the desired response with minimal side effects but also be better than the existing remedies.2
Docking is a computational technique that samples conformations of a ligand in a protein binding site. It aims to find an optimized conformation for both the protein and the ligand, and a relative orientation between them, such that the free energy of the overall system is minimized. The interaction between a ligand and its target may be due to non-bonded forces and, in some cases, covalent interactions. Upon binding, many ligands show significant shape complementarity with the region of the macromolecule at the binding site. Ligands often form hydrogen bonds in the active site. Some receptors have a hydrophobic pocket, formed by a group of non-polar amino acids, into which the ligand can place a hydrophobic group of appropriate size. Ligands must be sufficiently lipophilic to partition into the cell membrane, but not so lipophilic that they stay there.
The components of a docking simulation are the molecular representation, the conformational search space and the ranking of the docked complexes. The protein surface provides the toe-hold information about interactions with other molecules. Different mathematical models, viz. geometrical shape descriptors or grid generation, are used to define the protein surface, and they can be modified for rigid and flexible portions of the receptor. Docking starts from non-native folded protein chains and ligand conformations guided by experimental observations. Multipart docking demands a large number of possible arrangements. Also, protein domains are not necessarily stable and may have low population times, which indicates that these domains are more flexible than entire protein chains. Solving these problems requires two components: an efficient search procedure and a good scoring function. Generating ligand conformations and locating the most stable state in the energy landscape is the first step in the docking process. The scoring function should include, and appropriately parameterize, all the energetic ingredients. Each docking program uses a specific search algorithm to predict the possible conformations of the binary complex, and a scoring function to assign a numerical fitness value to each computed protein-ligand conformation. Scoring functions are also valuable for optimizing and ranking the best docking poses. The scoring function should be fast enough to allow its application to a large number of potential solutions. Speed and accuracy are the key features for obtaining a successful result in docking simulations. The objective of the docking algorithm and scoring function is to obtain a fast method that can identify novel lead compounds or reproduce experimental conformations as accurately as possible.
Endocrinology & Metabolism International Journal
Mini Review Open Access
Search algorithms and scoring methods in protein-ligand docking
Volume 6 Issue 6 - 2018
Umesh Yadava
Professor, Department of Physics, Deen Dayal Upadnyaya Gorakhpur University, India
Correspondence: Umesh Yadava, Professor, Department of Physics, Deen Dayal Upadnyaya Gorakhpur University, Gorakhpur–273009, India, Email u_yadava@yahoo.com
Received: September 10, 2018 | Published: November 13, 2018
Citation: Yadava U. Search algorithms and scoring methods in protein-ligand docking. Endocrinol Metab Int J. 2018;6(6):359‒367. DOI: 10.15406/emij.2018.06.00212
© 2018 Yadava. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.
Abstract
Accurate structural modeling and correct prediction of activity are the two aims of docking studies. Identifying molecular features and modifications in compounds that improve potency is a further, difficult issue. Docking is a multi-step process in which each step introduces one or more additional degrees of complexity. Generating ligand conformations and locating the most stable state in the energy landscape is the first step in the docking process. Different search algorithms are used to treat ligand flexibility and, to some extent, protein flexibility. The evaluation and ranking of predicted ligand conformations are carried out by scoring functions. Scoring functions make various assumptions and simplifications and do not fully account for a number of physical phenomena that determine molecular recognition. This chapter focuses on the methodological development of search algorithms and scoring functions, including their limitations and advantages.
Search algorithms
In the conformational search, structural parameters of the ligands, such as torsional (dihedral), translational and rotational degrees of freedom, are incrementally modified. Conformational search algorithms perform this task by applying different methods.3 Identifying molecular features and modifications in compounds that improve potency is a difficult issue. The docking process may be regarded as a multi-step process in which each step introduces one or more additional degrees of complexity. Accurate structural modeling and correct prediction of activity are the aims of docking studies. The search algorithms used to predict plausible conformations of the complex are defined by a set of rules and parameters. In terms of the flexibility of the ligand and/or the receptor, docking algorithms can be categorized into two large sets, rigid-body and flexible docking, which are based on different types of algorithms.
The rigid-body docking method considers only the essential geometric complementarity and treats neither the ligand nor the receptor as flexible, which limits the specificity and accuracy of the results. In many cases, rigid-body docking simulations have nevertheless identified ligand binding sites that are close to those in the crystallographic structures.4 The root mean square deviation (RMSD) between the atomic coordinates obtained from the docking simulation and those of the crystallographic structure is used to compare the structures. In docking simulations, the best results give RMSD values below 1.5 Å. Rigid-body docking is usually used as the fastest way to perform an initial screening of small-molecule databases. A well-known example of a rigid-body docking algorithm is DOCK, which is designed to find molecules with a high degree of shape complementarity to the binding site.5 First, the DOCK program derives a negative image of the binding pocket from the molecular surface of the receptor (protein or nucleic acid). The negative image consists of a collection of overlapping spheres of varying radii, such that each sphere touches the molecular surface at only two points. Ligand atoms are matched to the sphere centers to find matching sets in which all the distances between the sphere centers match the corresponding distances between the ligand atoms (Figure 1). The ligand can then be oriented within the binding site by a least-squares fit of the atoms to the sphere centers. The algorithm also includes steps that check for steric clashes between the ligand and the receptor. If the orientation is unacceptable, the ligand is reoriented within the least-squares fit limit until an acceptable orientation is obtained. The acceptable orientation is then scored on the basis of a computed interaction energy. Subsequently, new orientations are generated by matching sphere centers and ligand atoms, and they are scored using scoring functions. Orientations are ranked by these scores for subsequent analysis.
Figure 1 The DOCK algorithm: a) atoms are matched with sphere centers then molecule is oriented in the binding site. b) docked molecule in the active site
of an enzyme
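The RMSD comparison between a docked pose and a crystallographic pose, as used above to judge rigid-body docking results, can be sketched as follows. This is a minimal illustration only: it assumes that both poses list the same atoms in the same order and already share the receptor's coordinate frame (no superposition is performed). The function name `rmsd` and the toy coordinates are hypothetical and are not taken from any docking program.

```python
import math

def rmsd(pose_a, pose_b):
    """Root mean square deviation (in the units of the input, e.g. angstrom)
    between two poses given as equally ordered lists of (x, y, z) tuples."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(pose_a))

# Hypothetical three-atom ligand: the docked pose is the crystal pose shifted by 1 angstrom along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(crystal, docked))  # 1.0, below the 1.5 angstrom threshold quoted in the text
```

In practice, symmetry-equivalent atoms and hydrogen atoms need extra care, which this sketch ignores.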
After initial screening of ligands through rigid-body docking, flexible docking is used for more specific refinement and lead optimization. Flexible docking requires more computational power. In flexible docking, several possible conformations of the ligand, the receptor, or both are considered at the same time. Rigid-body docking considers only six degrees of freedom (three translational and three rotational), while flexible docking also considers the conformational degrees of freedom of the ligand and receptor. Most methods consider the conformational space of the ligand only and treat the receptor as rigid. Docking algorithms share several common methods for searching conformational space. For instance, docking through Monte Carlo methods often incorporates simulated annealing as well.6 Different search algorithms are used to treat ligand flexibility and, to some extent, protein flexibility. Ligand flexibility search methods can be divided into three basic classes: systematic search methods, random or stochastic methods, and simulation methods.
Systematic search algorithms
Systematic search algorithms apply slight variations to the structural parameters, progressively changing the conformation of the ligand. They try to explore all the degrees of freedom in a molecule, which are dictated by the rotations of the bonds and angles and the size of the increments. The number of possible molecular conformations is given by
$$N_{\mathrm{conf}} = \prod_{i=1}^{N} \prod_{j=1}^{n_{\mathrm{inc}}} \frac{360^{\circ}}{\theta_{i,j}}$$
where $N$ represents the number of rotatable bonds, $n_{\mathrm{inc}}$ is the number of increments and $\theta_{i,j}$ represents the incremental angle $j$ for bond $i$. Because of the large number of conformations, systematic searches ultimately face the combinatorial explosion problem.7 The systematic search method probes the energy landscape of the conformational space and, after numerous search and evaluation cycles, converges to the minimum energy solution corresponding to the most likely binding mode. Although the method is effective in exploring the conformational space, it can converge to a local minimum rather than the global minimum. This drawback can be overcome by performing simultaneous searches starting from different points of the energy landscape.3 Incremental construction, conformational search, database methods, fast shape matching and distance geometry are examples of systematic search algorithms. Systematic search methods can be categorized into exhaustive search algorithms and fragmentation-based algorithms.
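The combinatorial explosion mentioned above can be made concrete with a short sketch. It assumes the simplest case of the conformation count, in which each rotatable bond is sampled with a single fixed increment, so that the count reduces to the product of 360°/increment over the bonds. The function names `count_conformations` and `enumerate_torsions` are hypothetical.

```python
from itertools import product

def count_conformations(increments_deg):
    """Number of grid conformations, assuming one fixed angular increment per
    rotatable bond: the product over bonds of 360 / increment."""
    total = 1
    for step in increments_deg:
        if step <= 0 or 360 % step != 0:
            raise ValueError("each increment must be a positive integer divisor of 360")
        total *= 360 // step
    return total

def enumerate_torsions(increments_deg):
    """Iterate over every combination of torsion angles on the grid."""
    grids = [range(0, 360, step) for step in increments_deg]
    return product(*grids)

print(count_conformations([30] * 5))   # 248832 conformations for 5 bonds at 30 degrees
print(count_conformations([30] * 10))  # 61917364224 conformations for 10 bonds at 30 degrees
print(len(list(enumerate_torsions([120, 180]))))  # 6 explicit torsion combinations
```

Doubling the number of rotatable bonds from 5 to 10 squares the number of grid points, which is why exhaustive enumeration is feasible only for small ligands or coarse increments.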
Exhaustive search algorithms
Exhaustive searches generate ligand conformations by systematically rotating all rotatable bonds at a given interval. The large conformational space often prohibits an exhaustive systematic search. Algorithms such as GLIDE8 use heuristics to focus on regions of conformational space that are likely to contain good-scoring ligand poses. GLIDE precomputes a grid representation of the target's shape and properties, and creates an initial set of low-energy ligand conformations in ligand torsion-angle space. Initial favorable ligand poses are identified by approximate positioning and scoring methods. This initial screening reduces the conformational space over which the high-resolution docking search is applied. The high-resolution search involves minimizing the ligand with a standard molecular mechanics energy function, followed by a Monte Carlo procedure for examining nearby torsional minima.
Fragment based algorithms
Incremental construction:
Fragmentation methods sample ligand conformations by incremental construction from fragments obtained by dividing the ligand of interest. Ligand conformations are obtained either by docking fragments in the binding site one at a time and growing them incrementally, or by docking all fragments into the binding site and linking them covalently. The incremental search is thus carried out by two different techniques. In the first technique, known as de novo ligand design, the various molecular fragments are docked into the active region and then linked covalently. The second technique breaks the docked ligand into a rigid part (core) and flexible parts (side chains); the rigid core is docked first into the active site and the flexible parts are added incrementally.9 The fragments are added within geometric constraints that depend on their steric complementarity and binding affinity to the binding site. An algorithm is also included to remove unfavourable conformations. In another systematic search method, libraries of pre-generated ligand conformations are used. This is a helpful tool in rigid-body docking procedures.
Distance geometry (DG):
The distance geometry systematic algorithm is built on intra- and intermolecular distances.10 Compared with other methods, which tend to work with a large number of constraints, either by imposing additional bounds or by deducing bounds from the given ones, the distance geometry algorithm uses a smaller set of distance constraints. FLOG11 uses distance geometry to generate database conformations, which can then be used in the same manner as in DOCK.12
Fast shape matching (SM):
Fast shape matching algorithms are based on the geometrical overlap between two molecules, derived from their molecular surfaces. Different algorithms are employed to generate several alignments between ligand and receptor. Fast shape matching also predicts the possible conformations of the binding site for ligand suitability. Rigid-body docking applications usually make use of the basic concept of the fast shape matching algorithm. As an illustrative example, ZDOCK13 uses a geometrical surface model that combines shape complementarity, desolvation and electrostatics through a Fast Fourier Transform algorithm. Fast shape matching has demonstrated high accuracy in protein-protein docking. It is also widely used within flexible docking strategies. The first step in the DOCK algorithm is based on a sphere-matching procedure combined with the incremental construction method.
In FlexX, the interaction geometries between the core fragment and receptor groups are taken into account when placing the rigid core.14 FlexX also differs from DOCK in that it uses a pose-clustering algorithm to classify the docked poses. In common with other incremental search algorithms, Hammerhead also divides ligands into fragments and docks them into the binding site. After docking the fragments, it rebuilds the ligand from those with acceptable initial scores using energy minimization criteria. The use of libraries of pre-generated ligand conformations is another method of systematic search. Once the library of acceptable conformations has been calculated, the search problem is reduced to a rigid-body docking procedure.
Stochastic or random search methods
Stochastic or random search methods make random changes to either a single ligand or a population of ligands, and the changes are evaluated with a predefined probability function. Favorable changes are accepted according to the probability criterion. In this way the algorithm generates ensembles of molecular conformations and populates a wide range of the energy landscape. This strategy reduces the risk of trapping the final solution in a local energy minimum and increases the probability of finding the global minimum. Because the algorithm promotes a broad coverage of the energy landscape, the associated computational cost is an important limitation. Genetic algorithms, Monte Carlo simulation and Tabu search are examples of stochastic or random search methods, which use different probability criteria for acceptance.
Monte Carlo algorithm
The Monte Carlo algorithm generates an initial configuration of the ligand in the active site consisting of a random conformation, translation and rotation. This initial configuration is scored on some specific criterion (e.g. energy). Small changes are then made to generate a new configuration, which is scored on the same criterion. If the new configuration gives a better score than the previous one, it is retained; otherwise it is accepted or rejected according to the Metropolis criterion. In the Metropolis criterion, if the configuration is not a new minimum, a Boltzmann-based probability function is applied. If the solution passes the probability function test, it is accepted; if not, the configuration is rejected. The process is repeated until the desired number of configurations is obtained. Monte Carlo algorithms are used as an initial minimization process in some molecular dynamics programs, such as GROMOS15 and GROMACS.16 The docking programs MCDOCK17 and ICM18 use Monte Carlo as a flexible docking algorithm. The strength of the Monte Carlo method is that it provides accurate and precise results under different thermodynamic conditions. Being ensemble-based, the method tries to dock the ligand inside the receptor binding site through numerous random positions and rotations, which reduces the chance of being trapped in local minima. Random orientations and conformations are generated using a random number generator, which decreases the chance of false results.
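The accept/reject loop described above can be illustrated with a minimal sketch of Metropolis Monte Carlo. It is not the implementation of MCDOCK or ICM: the ligand configuration is reduced to a single coordinate, and `toy_energy` is a hypothetical one-dimensional landscape with a local minimum near x = 1.1 and a deeper minimum near x = -1.3. The temperature is expressed in the same (arbitrary) units as the energy.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept a move that does not raise the energy;
    accept an uphill move with Boltzmann probability exp(-delta_e / temperature)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def monte_carlo_search(energy, start, n_steps=5000, step_size=0.5, temperature=1.0, seed=0):
    """Random walk over one coordinate, keeping track of the best configuration seen."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(n_steps):
        trial = current + rng.uniform(-step_size, step_size)  # small random change
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e

def toy_energy(x):
    """Hypothetical landscape: local minimum near x = 1.1, global minimum near x = -1.3."""
    return x ** 4 - 3.0 * x ** 2 + x

best_x, best_e = monte_carlo_search(toy_energy, start=1.1)
print(best_x, best_e)
```

Because uphill moves are sometimes accepted, the walk started in the shallow minimum can cross the barrier and reach the deeper one, which a pure downhill minimization would not do.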
Genetic algorithm
Genetic algorithms are based on the principles of population and biological evolution. A particular arrangement of the ligand and protein is defined by a set of parameters describing the translation, rotation and conformation of the ligand with respect to the protein. These parameters, called ‘state variables’, correspond to genes in the genetic algorithm. The parameters are encoded in a chromosome, varied stochastically and evaluated by a fitness function. In molecular docking, the total interaction energy of the ligand with the protein is taken as the fitness value. Random pairs of chromosomes are combined (mated) through a process called crossover to produce a new chromosome (offspring). Based on the fitness criteria, the new chromosome inherits genes from either parent. Additionally, some offspring undergo random mutation, in which one gene is changed by a random amount. The mutation is accepted only if it gives a better fitness value. Consequently, solutions better suited to their environment reproduce, whereas poorer solutions die. This process is analogous to gene recombination and mutation producing the next generation. These algorithms depend on a number of parameters, such as mutation rates, crossover rates and the number of evolutionary rounds. The genetic algorithm, as implemented in GOLD, requires the approximate size and location of the receptor active site as input. Several methods are used to define the active site of the protein. A Lamarckian genetic algorithm is implemented in AUTODOCK, which switches between genotypic space and phenotypic space. Mutation and crossover take place in genotypic space, while phenotypic space is determined by the energy function, which is optimized. After energy minimization, the phenotypic alterations are mapped back onto the genes through changes in the state variables of the ligand.19
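The steps described above (chromosomes of state variables, crossover, mutation accepted only on improvement, survival of the fitter solutions) can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of GOLD or AUTODOCK: `toy_energy` is a hypothetical stand-in for the interaction energy, seven state variables are assumed (for example three translations, three rotations and one torsion), and crossover is taken to be uniform.

```python
import random

def toy_energy(genes):
    """Hypothetical interaction energy: zero (optimal) when every state variable equals 1.0."""
    return sum((g - 1.0) ** 2 for g in genes)

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is inherited from one parent or the other."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(genes, energy, rng, scale=0.3):
    """Change one gene by a random amount; keep the change only if the energy improves."""
    trial = list(genes)
    i = rng.randrange(len(trial))
    trial[i] += rng.gauss(0.0, scale)
    return trial if energy(trial) < energy(genes) else genes

def genetic_search(energy, n_genes=7, pop_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=energy)              # lower energy means better fitness
        survivors = population[: pop_size // 2]  # the poorer half dies
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            children.append(mutate(crossover(mother, father, rng), energy, rng))
        population = survivors + children
    return min(population, key=energy)

best = genetic_search(toy_energy)
print(toy_energy(best))
```

A Lamarckian variant would additionally run a local energy minimization on each child and write the minimized state variables back into its chromosome; that step is omitted here.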
Tabu search algorithm
In the Tabu search algorithm, a number of small random changes are made to the current configuration of the ligand, and they are ranked according to the fitness function. The best change that is not tabu (a rejected conformation) is accepted. Developed and described by Glover, Tabu search is a meta-heuristic method that has been used to solve hard optimization problems.20 Heuristics, i.e. approximate solution methods, are used to tackle difficult problems. Initially, local search techniques were used to handle combinatorial problems in the search space. These start with an initial feasible solution and improve it progressively by a series of local modifications (or moves). The search terminates when it comes across a local minimum, which is an important limitation of this method. The Tabu search method consists of making n small random changes to the current conformation, which are ranked according to the value of the chosen fitness function. The ‘tabu’ changes (that is, previously rejected conformations) are determined. If the best modification has a lower value than any other accepted so far, it is accepted even if it is ‘tabu’; otherwise, the best ‘non-tabu’ change is accepted and recorded. This process is repeated until the true minimum is obtained. Usually, tabus are stored in a short-term memory of the search. Some Tabu search techniques implement aspiration criteria, which revoke tabus after a specific period because they may prevent attractive moves even when there is no danger of cycling,18 but they are rarely used. The Tabu search docking algorithm has demonstrated high accuracy: it can prevent the simulation from being trapped in local minima and avoids revisiting previously known minimum-energy conformations.
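The move selection rule described above can be sketched on a toy problem. Two assumptions are made and should be read as such: the tabu list is taken to hold recently visited conformations in a fixed-length short-term memory (the usual formulation of Tabu search), and a conformation is reduced to a tuple of integer grid indices so that membership in the tabu list is an exact comparison. `toy_energy` is a hypothetical landscape with a local minimum at index 4 and the global minimum at index -4.

```python
import random
from collections import deque

def tabu_search(energy, start, n_moves=20, n_iterations=200, tabu_size=25, seed=0):
    rng = random.Random(seed)
    current = tuple(start)
    best, best_e = current, energy(current)
    tabu = deque([current], maxlen=tabu_size)  # short-term memory of visited conformations
    for _ in range(n_iterations):
        # Generate n small random changes to the current conformation and rank them.
        moves = []
        for _ in range(n_moves):
            trial = list(current)
            i = rng.randrange(len(trial))
            trial[i] += rng.choice((-1, 1))
            moves.append(tuple(trial))
        moves.sort(key=energy)
        for move in moves:
            is_new_best = energy(move) < best_e
            # Accept a tabu move only if it beats everything accepted so far;
            # otherwise accept the best non-tabu move.
            if is_new_best or move not in tabu:
                current = move
                tabu.append(move)
                if is_new_best:
                    best, best_e = move, energy(move)
                break
    return best, best_e

def toy_energy(conformation):
    """Hypothetical landscape per coordinate: local minimum at 4, global minimum at -4."""
    return sum((x * x - 16) ** 2 / 16 + x for x in conformation)

best, best_e = tabu_search(toy_energy, start=(4,))
print(best, best_e)
```

Starting in the local minimum, every step back is tabu, so the search is pushed uphill over the barrier and then descends into the global minimum, which a plain local search would never reach.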
Simulation methods
Molecular dynamics simulation
The most popular simulation approach for molecular docking is molecular dynamics (MD) simulation, which calculates the trajectory of the system by applying Newtonian mechanics. The force on each atom is calculated from the change in potential energy between its current and new positions,

$$F_i = -\nabla_i U .$$

These atomic forces and the corresponding atomic masses are used to determine the atomic positions over a series of small time steps by integrating Newton's second law of motion,

$$F_i = m_i \frac{d^2 r_i}{dt^2} .$$

Thus, a trajectory of changes in atomic positions over time is obtained. The problem with molecular dynamics simulation is that it is unable to cross high-energy barriers within a practicable simulation time. Hence, molecular dynamics simulation may only locate ligands within local minima. Complementing molecular dynamics simulation with other methods, such as simulated annealing, may provide better results. In 1999, Mangoni and coworkers described an MD protocol for docking small flexible ligands to flexible targets in water.21 The center-of-mass motion of the ligand was separated from its internal and rotational motions, and the two were coupled to different temperature baths. The rigid or flexible character of the ligand and/or receptor was set by appropriate values of the temperatures and coupling constants. Later, the relaxed-complex approach was introduced, which explores binding conformations that may occur only rarely in the unbound target protein. In this approach, an MD simulation of the ligand-free target is carried out for 2 ns, and the ligand is then docked. The relaxed-complex method led to the discovery of the first clinically approved HIV integrase inhibitor, raltegravir.22 Long MD simulations are used to study the binding of drugs to their protein targets. In contrast to MD simulation, energy minimization methods are hardly ever used as stand-alone search techniques. Energy minimization leads only to local energy minima, so these methods are often complemented with other search methods, including Monte Carlo. The DOCK program performs a minimization step after each fragment addition, followed by a final minimization before scoring.
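The time-stepping idea behind molecular dynamics can be illustrated with a minimal integrator. The sketch below uses the velocity Verlet scheme, a common choice in MD programs although the text does not name a specific integrator, and applies it to a single particle in one dimension. The harmonic potential, the force constant `K` and the reduced units are hypothetical choices made only so that the result can be checked against the analytical solution.

```python
import math

def velocity_verlet(position, velocity, force, mass, dt, n_steps):
    """Integrate Newton's second law, F = m * d2r/dt2, for one particle in 1D."""
    trajectory = [position]
    f = force(position)
    for _ in range(n_steps):
        velocity += 0.5 * dt * f / mass  # first half-step velocity update
        position += dt * velocity        # full-step position update
        f = force(position)              # force at the new position, F = -dU/dr
        velocity += 0.5 * dt * f / mass  # second half-step velocity update
        trajectory.append(position)
    return trajectory, velocity

K = 1.0  # hypothetical force constant (reduced units)

def harmonic_force(r):
    """F = -dU/dr for the potential U = 0.5 * K * r**2."""
    return -K * r

traj, v_end = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * K * traj[-1] ** 2
print(traj[-1], math.cos(10.0))  # numerical and analytical position at t = 10
print(total_energy)              # stays close to the initial value of 0.5
```

A real MD docking run applies the same update to every atom, with forces taken from a molecular mechanics force field and time steps of the order of femtoseconds.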
Simulated annealing
Simulated annealing is a probabilistic method for finding the global minimum of a cost function that possesses several local minima. In the simulated annealing search method, the biomolecular system is simulated by a specific kind of molecular dynamics simulation in which every docked conformer is simulated while the temperature is gradually decreased at regular intervals of time. Unlikely solutions may be rejected based on scoring criteria or through approximations for steric clashes and distance matches. Simulated annealing has been used in several conformational analysis, protein structure prediction and molecular docking search methods. Since this method considers the conformations and flexibility of both ligand and protein, it may give more accurate results than Monte Carlo. However, simulated annealing docking may be more time consuming, as the annealing cycle has to be repeated for each ligand positioned inside the binding pocket of the receptor. Simulated annealing is combined with Monte Carlo in AutoDock, where random changes are made to the ligand orientation during each simulated annealing temperature cycle. The energy of the new configuration is compared with that of the previous one; if it is lower, the new configuration is kept and compared with the next one. Otherwise, the configuration is accepted or rejected based on the Metropolis criterion.
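The combination of Monte Carlo moves with a decreasing temperature, as described above, can be sketched as follows. This is an illustrative toy and not the AutoDock implementation: the configuration is a single coordinate, the geometric cooling schedule and all parameter values are assumptions, and `toy_energy` is a hypothetical landscape with a local minimum near x = 1.1 and the global minimum near x = -1.3.

```python
import math
import random

def simulated_annealing(energy, start, t_start=2.0, t_end=0.01, n_cycles=50,
                        steps_per_cycle=100, step_size=0.5, seed=0):
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / (n_cycles - 1))  # geometric schedule (assumed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    temperature = t_start
    for _ in range(n_cycles):
        for _ in range(steps_per_cycle):
            trial = current + rng.uniform(-step_size, step_size)  # random change
            delta = energy(trial) - current_e
            # A lower energy is always kept; a higher one passes the Metropolis test
            # with probability exp(-delta / temperature).
            if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
                current, current_e = trial, current_e + delta
                if current_e < best_e:
                    best, best_e = current, current_e
        temperature *= cooling  # lower the temperature after each cycle
    return best, best_e

def toy_energy(x):
    """Hypothetical landscape: local minimum near x = 1.1, global minimum near x = -1.3."""
    return x ** 4 - 3.0 * x ** 2 + x

best_x, best_e = simulated_annealing(toy_energy, start=1.1)
print(best_x, best_e)
```

At high temperature the walk crosses barriers freely; as the temperature falls, uphill moves become rare and the search settles into the basin it occupies, ideally the global one.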
Several docking algorithms are used by different programs and have demonstrated considerable accuracy in different cases. Table 1 shows various docking programs and the search algorithms implemented in them. Most of the programs are focused on virtual screening. The key point in the development of docking algorithms is how closely the results reproduce experimental values, which mostly depends on the type of target and ligands under study. Methods that are more complex and consider many physicochemical and thermodynamic properties are likely to provide higher accuracy but cost more computational time.
Table 1 Docking algorithms implemented in various docking programs

Program         Algorithm
Autodock        Lamarckian Genetic Algorithm
DOCK            Shape Matching
EUDOCK          Shape Matching
FlexX           Incremental Construction
FLOG            Incremental Construction
GLIDE           Descriptor matching/Monte Carlo
GOLD            Genetic Algorithm
HammerHead      Incremental Construction
MSDOCK          Shape Matching
MCDOCK          Monte Carlo
M-ZDOCK         Shape Matching
ICM             MC minimization
LigandFit       Shape matching
QXP             MC minimization, free searching and pruning
SLIDE           Descriptor matching
Surflex Dock    Surface based molecular similarity
SYSDOCK         Shape Matching
ZDOCK           Shape Matching

Docking algorithms in drug designing and discovery
Drugs are small ligands that are often highly flexible, with a large number of conformations. Drug molecule surfaces are optimally complementary to the receptor pocket. Therefore, docking schemes for drug design are based on enhanced ligand flexibility. Water molecules at the interface play important roles by mediating hydrogen bonds.23 Two approaches are adopted for docking in drug design. In the first approach, fragments are selected combinatorially from a library of chemical fragments and docked into the binding site, and the molecule is grown considering all the permissible degrees of freedom. The docked orientations are investigated to minimize energies and search for the most favourable combinations. The algorithms for such problems are reviewed by Bohacek and others.24 The second approach is based on docking the entire molecule. Instead of ligating fragments step by step, database matching and scoring are carried out. Databases used for this purpose include CMC (Comprehensive Medicinal Chemistry), NCI (National Cancer Institute databases of drugs), ACD (Available Chemicals Directory) and ZINC. Selected compounds are either docked directly one by one, or only compounds containing given pharmacophores are docked. Pharmacophores are the geometrical features or chemical attributes common to the compounds found experimentally to bind to the given receptor. Such features include hydrogen bonds, coordination of hydrophobic atoms, charged groups etc. that are responsible for recognition of the ligand by the receptor. Pharmacophores help in identifying potential ligands from the databases. Algorithms for multiple structural alignment identify the pharmacophores.25 Finn et al.26 developed an algorithm called Randomized Pharmacophore Identification for Drug Design (RAPID), which is designed to find the structural alignment between a pair of molecules. In this method, a triplet is selected randomly from one molecule and a congruent triplet is found in the other molecule. If the yield is larger than required, the transformation is considered a solution. At each iteration, the problem must be solved again between the current solution and the next molecule. In these approaches, the difficulty lies in drug flexibility. An efficient algorithm for flexible 3D structure matching against massive databases of small molecules has been proposed by Rigoutsos et al.27 This method identifies those molecules that share a common substructure with the query molecule, allowing torsional flexibility around rotatable bonds. Pharmacophoric searches are routinely used in the search for novel active compounds.28 Taking into account the intrinsic flexibility of the active site and the entropic penalty associated with ligand binding, Carlson et al.29 developed a dynamic pharmacophore construction algorithm, which has been tested on HIV-1 integrase. In spite of the advantages of using pharmacophores and docking compounds, this method is also limited with respect to drug diversity. Drug diversity is of particular importance because it has been shown that the volume and shape of the binding site can change, and drugs of diverse shape, size and composition can bind at the same site.
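The triplet matching step mentioned above for pharmacophore identification can be sketched in a simplified form. This is not the RAPID algorithm itself: it only shows how a triplet of feature points from one molecule can be compared with every triplet of another molecule by their sorted side lengths, within a tolerance. The function names, the tolerance value and the coordinates are hypothetical, and the chemical type of each feature is ignored.

```python
import math
from itertools import combinations

def side_lengths(triplet):
    """Sorted pairwise distances of three points; equal lists mean congruent triangles."""
    a, b, c = triplet
    return sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c)))

def congruent_triplets(triplet, molecule, tol=0.1):
    """Return every triplet of points in `molecule` congruent to `triplet` within `tol`."""
    target = side_lengths(triplet)
    matches = []
    for candidate in combinations(molecule, 3):
        sides = side_lengths(candidate)
        if all(abs(s - t) <= tol for s, t in zip(sides, target)):
            matches.append(candidate)
    return matches

# Hypothetical feature points: molecule_b contains a translated copy of the query triangle.
query = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
molecule_b = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 5.0, 1.0), (10.0, 10.0, 10.0)]
print(congruent_triplets(query, molecule_b))
```

A full method would go on to compute the rigid transformation that superimposes the matched triplets and count how many further features it brings into alignment.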
Scoring functions
Docking algorithms predict a number of orientations (poses) for the ligand inside the binding site. The evaluation and ranking of the predicted ligand conformations are carried out by approximate mathematical functions known as scoring functions. There are three important applications of scoring functions in molecular docking: ligand binding mode identification, binding affinity prediction, and virtual database screening. An accurate scoring function would perform equally well in each of them. The design of consistent and reliable scoring functions is vital. Generally, free-energy estimation techniques are used in the development of scoring functions for protein-ligand docking complexes. Enthalpic and entropic effects also play important roles in ligand-binding events. The free energy perturbation approaches consider an additive equation of the various components of binding.30 A complete equation of this kind would have the terms appearing in equation (1).
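\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{solvent}} + \Delta G_{\mathrm{conf}} + \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{rot}} + \Delta G_{\mathrm{trans/rot}} + \Delta G_{\mathrm{vib}} \qquad (1)
\]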
where the ΔG_solvent contribution arises from the interaction of
solvent with ligand and protein. ΔG_conf is the effect of conformational
changes in protein and ligand. ΔG_int represents the contribution of the
free energy of specific protein-ligand interactions. ΔG_rot is related to
the free energy loss due to freezing rotatable bonds, generally known
as the entropic contribution. ΔG_trans/rot represents the loss in
translational and rotational free energy caused by the association of two
bodies (protein and ligand) to form a single body (protein-ligand complex).
ΔG_vib is the contribution of free energy due to changes in vibrational
modes. There are a number of approaches for estimating the various terms.
Scoring functions make various assumptions and simplifications of these
terms and do not fully account for a number of physical phenomena
that determine molecular recognition. Usually, three types of scoring
functions are utilized by docking programs: force-field based, knowledge
based and empirical scoring functions.
Force-field based scoring
Force-field based scoring functions are built on a force field that
expresses the energy of the system as a sum of diverse terms involved in
molecular recognition, viz. non-bonded van der Waals (VDW) and
electrostatic interactions together with bonded terms for bond
stretching, bending and torsion. Force field
methods utilize a variety of force-field parameters. Empirical scoring
functions, by contrast, use several intermolecular interaction terms that
are calibrated against as much experimental data as possible. The idea
that binding energies can be approximated by a sum of individual
uncorrelated terms is used in the design of these functions.
A typical semiempirical force field scoring function used in
molecular docking through DOCK is composed of two energy
components, a Lennard-Jones potential and an electrostatic term,
whose energy parameters are taken from the Amber force fields. The
force field as implemented in DOCK can be expressed by equation
(2).
\[
U = \sum_{i}\sum_{j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}\right) \qquad (2)
\]
where r_ij represents the distance between protein atom i and
ligand atom j, A_ij and B_ij are the van der Waals parameters, and q_i
and q_j are the atomic charges. Here, ε(r_ij) is the distance-dependent
dielectric constant, which represents the effect of solvent implicitly. In
spite of its computational efficiency, this DOCK scoring function cannot
account for the desolvation effects that arise when polar and non-polar
groups move between aqueous and non-aqueous environments. In the absence
of desolvation effects, the scoring function is biased
towards Coulombic interactions and favors highly charged ligands.
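As a rough illustration of equation (2), the following Python sketch sums the Lennard-Jones and electrostatic terms over all protein-ligand atom pairs. It is not the DOCK code: the shared parameters `a` and `b`, the dielectric form ε(r) = 4r, and the unit-conversion constant are assumptions chosen only to make the example self-contained.

```python
import math


def force_field_score(protein, ligand, a=1.0e5, b=1.0e2,
                      dielectric_factor=4.0, coulomb_constant=332.0):
    """Sum 12-6 Lennard-Jones and Coulomb terms over protein-ligand pairs.

    protein, ligand: lists of (x, y, z, charge) tuples, distances in Angstrom.
    a, b: hypothetical van der Waals parameters shared by every atom pair
          (a real force field assigns A_ij and B_ij per pair of atom types).
    dielectric_factor: assumed distance-dependent dielectric, eps(r) = factor * r.
    coulomb_constant: assumed conversion to kcal/mol for charges in units of e.
    """
    total = 0.0
    for px, py, pz, pq in protein:
        for lx, ly, lz, lq in ligand:
            r = math.dist((px, py, pz), (lx, ly, lz))
            vdw = a / r**12 - b / r**6
            # eps(r) * r in the denominator becomes dielectric_factor * r * r.
            elec = coulomb_constant * pq * lq / (dielectric_factor * r * r)
            total += vdw + elec
    return total
```

Because no desolvation term is present, increasing the magnitude of opposite charges always lowers the score in this sketch, which is the bias towards highly charged ligands noted above.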
The common way of introducing desolvation terms
is to treat water molecules explicitly. However, these methods are
computationally expensive. The computational cost is reduced
by treating water as a continuum dielectric medium. These models
include Poisson-Boltzmann surface area (PB/SA) and Generalized-Born
surface area (GB/SA),31 which are often used for post-scoring in
docking programs. In some simplified scoring schemes, the solvation
effect in ligand binding free energy calculations is evaluated using a
GB/SA approach. The electrostatic interactions and the electrostatic
desolvation costs are calculated with the GB model, while the
hydrophobic contributions of non-polar atoms are estimated from
the solvent-accessible surface areas (SA) of the atoms. The Lennard-Jones
potential is used to estimate the van der Waals energies.
The parameters of the van der Waals, hydrophobic and electrostatic
contributions are optimized for agreement with experimental affinity
data.
Various force-field scoring functions are based on different force
field parameter sets. For example, G-Score32 is based on the Tripos
force field33 and AutoDock34 on the AMBER force field. However, the
functional forms are usually similar. Standard force-field scoring
functions are limited not only in their treatment of solvation and
entropic terms but also by the cut-off distances used for the treatment of
non-bonded interactions, which complicate the accurate treatment of
long-range effects. Hydrogen-bonding terms are included in different
ways. G-Score includes hydrogen-bonding terms that depend upon
the nature and geometry of the hydrogen-bonding interaction.
AutoDock treats all hydrogen bonds with a directional term and a
12–10 Lennard–Jones potential. The recent semiempirical AutoDock
scoring scheme includes terms for dispersion/repulsion,
hydrogen bonding, electrostatics, and desolvation (equation
(3)).
In equation (3), the weighting constants W have been optimized to
calibrate the empirical free energy against a set of experimentally
determined binding constants. The first term represents the van der Waals
dispersion/repulsion interactions. The second term represents the
directional H-bond energy. C and D are the optimal parameters
for hydrogen bonds. The function E(θ) provides directionality based
on the angle θ from ideal hydrogen-bonding geometry. The Coulomb
electrostatic potential is considered in the third term. The final term is a
desolvation potential based on the volume of atoms (V) that surround
a given atom and shelter it from solvent, weighted by a solvation
parameter (S) and an exponential term with distance-weighting
factor σ=3.5Å.35
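The four terms of equation (3) described above can be sketched for a single atom pair as follows. This is a minimal illustration, not AutoDock's implementation: the default parameter values, the unit weights, and the simple dielectric ε(r) = 4r are placeholders, whereas the real program uses calibrated weights and its own dielectric function.

```python
import math


def pair_energy(r, q_i, q_j, s_i, s_j, v_i, v_j,
                a=1.0e5, b=1.0e2, c=0.0, d=0.0, e_theta=1.0,
                weights=(1.0, 1.0, 1.0, 1.0), sigma=3.5,
                dielectric_factor=4.0):
    """Weighted sum of the four pairwise terms for atoms i and j at distance r.

    a, b: 12-6 dispersion/repulsion parameters (hypothetical values).
    c, d: 12-10 hydrogen-bond parameters; e_theta: directional factor E(theta).
    s_*, v_*: solvation parameters and atomic volumes.
    weights: (W_vdw, W_hbond, W_elec, W_sol), placeholders for fitted values.
    """
    w_vdw, w_hbond, w_elec, w_sol = weights
    vdw = a / r**12 - b / r**6
    hbond = e_theta * (c / r**12 - d / r**10)
    elec = q_i * q_j / (dielectric_factor * r * r)  # assumed eps(r) = factor * r
    desolv = (s_i * v_j + s_j * v_i) * math.exp(-(r**2) / (2.0 * sigma**2))
    return w_vdw * vdw + w_hbond * hbond + w_elec * elec + w_sol * desolv
```

A full score would sum `pair_energy` over all protein-ligand atom pairs, with the parameters looked up per pair of atom types.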
Empirical scoring
Empirical scoring functions consist of several energy terms
whose coefficients (weights) are obtained by regression analysis against
experimentally determined binding energies and X-ray structures.
Because the energy terms are simple, binding scores are calculated
much faster with empirical scoring functions than with force field scoring
functions. Bohm36 developed the first empirical scoring function,
SCORE1, based on the experimental data of forty-five protein-ligand
complexes. It consists of four energy terms: ionic interactions,
hydrogen bonds, the lipophilic contact between protein and ligand, and
the number of rotatable bonds in the ligand. Later, empirical
scoring functions were improved by taking additional parameters
into account. The ChemScore empirical scoring function, presented
by Eldridge and coworkers,37 consists of the following terms: metal atom
interactions, hydrogen bonds, the lipophilic effects of atoms, and the
effective number of rotatable bonds in the ligand. In ChemScore, the
free energy of binding is given by equation (4).
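\[
V = W_{\mathrm{vdw}} \sum_{i,j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right)
  + W_{\mathrm{hbond}} \sum_{i,j} E(\theta)\left(\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right)
  + W_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
  + W_{\mathrm{sol}} \sum_{i,j} \left(S_i V_j + S_j V_i\right) e^{-r_{ij}^{2}/2\sigma^{2}} \qquad (3)
\]
\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{H\_bond}} \sum_{\mathrm{H\_bond}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{metal}} \sum_{\mathrm{metal}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{lipo}} \sum_{\mathrm{lipo}} f(\Delta R)
  + \Delta G_{\mathrm{rot}} \sum_{\mathrm{rot}} f(P_{nl}, P'_{nl})
  + \Delta G_{0} \qquad (4)
\]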

In equation (4), f is a function that depends on angular (Δα) and/or
distance (ΔR) terms, and ΔG_0 is the regression constant. Wang et al.
developed a new empirical function, called X-score, based on a larger set
of 200 protein–ligand complexes. X-score contains VDW
interactions, hydrogen bonds, hydrophobic effects and effective
rotatable bonds in the ligand.38
The terms accounting for non-bonded interactions are included
in empirical scoring functions in various ways. For example, in the
early LUDI formulation,30 the hydrogen-bonding term is separated
into neutral hydrogen bonds and ionic hydrogen bonds, whereas
ChemScore does not differentiate between different types of
hydrogen bonds. Hydrophobic interactions may also be treated in
different ways. F-score has an additional term for aromatic
interactions. Glide-score39 takes into account a number of parameters:
hydrogen bonds (H-bond), hydrophobic contacts (Lipo), van der Waals
(vdW) and Coulombic (Coul) interactions, polar interactions in the
binding site (Site), a metal binding term (Metal), and penalties for
buried polar groups (BuryP) and for freezing rotatable bonds (RotB)
(equation (5)).
\[
\text{G-score} = \mathrm{H_{bond}} + \mathrm{Lipo} + \mathrm{Metal} + \mathrm{Site} + 0.130\,\mathrm{Coul} + 0.065\,\mathrm{vdW} - \mathrm{BuryP} - \mathrm{RotB} \qquad (5)
\]
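Equation (5) is a plain weighted sum, which a few lines of Python make concrete. The function name and the sample values in the usage note are hypothetical; only the two coefficients and the signs are taken from equation (5), and the individual term values would come from the docking program itself.

```python
def glide_like_score(hbond, lipo, metal, site, coul, vdw, bury_p, rot_b):
    """Combine precomputed term values following equation (5).

    All arguments are energies already evaluated for one pose; the two
    penalty terms are subtracted, exactly as written in equation (5).
    """
    return (hbond + lipo + metal + site
            + 0.130 * coul + 0.065 * vdw
            - bury_p - rot_b)
```

For example, `glide_like_score(-1.0, -2.0, 0.0, 0.0, -10.0, -20.0, 0.5, 0.25)` combines made-up term values for a single pose; the small coefficients show that the raw Coulomb and van der Waals energies are strongly down-weighted relative to the other terms.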
In the Extra Precision (XP) docking protocol of Glide, in addition to
unique water desolvation energy terms, protein-ligand structural motifs
leading to enhanced binding affinity are included: (i) hydrophobic
enclosure, where groups of lipophilic ligand atoms are enclosed
on opposite faces by lipophilic protein atoms, (ii) neutral-neutral
single or correlated hydrogen bonds in a hydrophobically enclosed
environment, and (iii) five categories of charged-charged hydrogen
bonds.40 The XP GlideScore contains the following terms (equation (6)).41
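\[
\text{XP GlideScore} = E_{\mathrm{coul}} + E_{\mathrm{vdW}} + E_{\mathrm{bind}} + E_{\mathrm{penalty}} \qquad (6)
\]
where
\[
E_{\mathrm{bind}} = E_{\mathrm{hyd\_enclosure}} + E_{\mathrm{hb\_nn\_motif}} + E_{\mathrm{hb\_cc\_motif}} + E_{\mathrm{PI}} + E_{\mathrm{hb\_pair}} + E_{\mathrm{phobic\_pair}}
\]
and
\[
E_{\mathrm{penalty}} = E_{\mathrm{desolv}} + E_{\mathrm{ligand\_strain}}
\]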
Despite their fast and direct estimation of binding affinities, these
scoring schemes are limited by the penalty term for bad structures and by
their strong dependence on the placement of hydrogen atoms.
Knowledge based scoring
Knowledge-based scoring functions are derived from statistical
observations of intermolecular contacts in structural
databases. They are designed to reproduce structures rather than
binding energies. Potential of Mean Force (PMF) and DrugScore are
widely used implementations of these scoring functions, which utilize
pairwise atomic potentials. In general, like empirical methods,
knowledge-based scoring functions attempt to implicitly capture
binding effects that are difficult to model explicitly.
Compared to the force field and empirical scoring functions, the
knowledge-based scoring functions offer a good balance between
accuracy and speed. Because the potentials employed in these
schemes are extracted from structures rather than fitted
to reproduce known affinities, knowledge-based scoring
functions are quite robust and relatively insensitive to the training set,
being derived from a large and diverse structural database.
Tanaka and Scheraga pioneered knowledge-based scoring,
which led to the development of a number of knowledge-based scoring
functions that have been applied in protein structure prediction and
protein–ligand studies (Li and Liang, 2006). The functional form of
the knowledge-based scoring function SMoG42 is given by equation
(7):
\[
G = \sum_{i,j} g_{ij}\,\Delta_{ij}; \qquad g_{ij} = -kT \log\left(\frac{p_{ij}}{p}\right) \qquad (7)
\]
where Δ_ij = 0 or 1 depending on whether atoms i and j are
more than 5Å apart or within 5Å of each other. p_ij and p represent the
interatomic and averaged interatomic interactions, respectively. k is the
Boltzmann constant and T is the temperature in kelvin. Potential of Mean
Force (PMF)43,44 and DrugScore include solvent-accessibility corrections
to the pair-wise potentials as well. Knowledge-based scoring functions
are computationally simple. A disadvantage is that their derivation
relies on information implicitly encoded in limited sets
of protein–ligand complex structures. Table 2 lists the types
of scoring functions employed in different docking programs.
Table 2 Types of scoring functions employed in docking programs
Force field based scoring: DOCK, AutoDock, GOLD, SYBYL/D-Score, SYBYL/G-Score etc.
Empirical scoring: FlexX, Glide, LUDI, ICM, ChemScore, X-Score, Surflex, SYBYL/F-Score, SCALE, SFCscore, LigScore etc.
Knowledge based scoring: ITScore, PMF, DrugScore, SMoG, BLEEP, MScore, KScore, GOLD/ASP, DFIRE etc.
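The pairwise statistical potential of equation (7) can be sketched in Python as shown below. This is an illustration under stated assumptions rather than the SMoG code: the contact-frequency table `contact_freq`, the reference value `p_ref`, the use of the natural logarithm, and the atom-type labels are all hypothetical inputs.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def pair_potential(p_ij, p_ref, temperature=298.0):
    """g_ij = -kT * log(p_ij / p_ref), using the natural logarithm."""
    return -K_B * temperature * math.log(p_ij / p_ref)


def knowledge_based_score(protein, ligand, contact_freq, p_ref,
                          cutoff=5.0, temperature=298.0):
    """Sum g_ij over all protein-ligand atom pairs within the cutoff.

    protein, ligand: lists of (atom_type, (x, y, z)) entries.
    contact_freq: dict mapping (protein_type, ligand_type) to the observed
                  contact frequency p_ij (hypothetical values in this sketch).
    """
    total = 0.0
    for p_type, p_xyz in protein:
        for l_type, l_xyz in ligand:
            # Delta_ij = 1 inside the cutoff, 0 outside.
            if math.dist(p_xyz, l_xyz) <= cutoff:
                total += pair_potential(contact_freq[(p_type, l_type)],
                                        p_ref, temperature)
    return total
```

Contacts observed more often than the reference (p_ij > p) give a negative, favorable contribution, while rarer contacts are penalized.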
Consensus scoring
Because of the imperfections of individual scoring functions, another
type of scoring, known as consensus scoring, is now utilized. It
combines the information from different scoring schemes to overcome
the limitations of each. This can be very useful, as it combines the
advantages and simultaneously attenuates the shortcomings of each
method. ChemScore, GFscore, X-CSCORE, Gold-like, FlexX, etc. are
examples of scoring functions used in consensus scoring.
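One simple way to combine several scoring schemes is to average the rank each ligand receives from every function. The sketch below shows this rank-averaging idea in Python; it is only one of several possible consensus strategies, and the function name and input layout are assumptions rather than part of any docking program.

```python
def consensus_rank(score_tables):
    """Average each ligand's rank over several scoring functions.

    score_tables: dict mapping a scoring-function name to {ligand: score}.
    Assumes every ligand is scored by every function and that a lower
    (more negative) score means better predicted binding.
    Returns a list of (average_rank, ligand) sorted from best to worst.
    """
    rank_sums = {}
    for scores in score_tables.values():
        ordered = sorted(scores, key=scores.get)  # best score first
        for rank, ligand in enumerate(ordered, start=1):
            rank_sums[ligand] = rank_sums.get(ligand, 0) + rank
    n_functions = len(score_tables)
    return sorted((total / n_functions, ligand)
                  for ligand, total in rank_sums.items())
```

Ranks are used instead of raw scores because different scoring functions report values on different scales, so their numbers cannot be added directly.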
Concluding remarks
Docking simulations aid pharmaceutical research by cutting much of
the cost and effort involved in
traditional drug discovery. Virtual screening on protein templates
provides an opportunity for de novo identification of ligands without
bias from known hits. A large number of search algorithms have
been developed to account for the flexibility of ligands and/or
proteins and to obtain the correct poses of the complexes. The results
obtained with the search methodologies employed by different
docking programs are highly dependent upon the system chosen for
study. Therefore, we should be cautious regarding the choice of the
algorithm for docking. The interplay between docking and scoring
functions is fairly complex, but it is often easier to produce reliable
models of bound ligands than to distinguish true ligands from false
positives. Despite considerable interest and improvements, the current
scoring functions are still far from being universally applicable.
Each scoring function has its own advantages and limitations.
A comparison of scoring functions is not meaningful if they are
tested on different sets. Some comparison studies available online
(http://www.csardock.org, http://dud.docking.org, CCDC/astex etc.)
may be invaluable for the development of new scoring functions and
the improvement of existing ones.
Acknowledgments
None.
Conflict of interest
The author declares there is no conflict of interest.
References
1. Van Drie JH. Computer-aided drug design: the next 20 years. J Comput
Aided Mol Des. 2007;21(10-11):591–601.
2. Kitchen DB, Decornez H, Furr JR, et al. Docking and scoring in virtual
screening for drug discovery: methods and applications. Nat Rev Drug
Discov. 2004;3(11):935–949.
3. Ferreira LG, dos Santos RN, Oliva G, et al. Molecular Docking and
Structure-Based Drug Design Strategies. Molecules. 2015;20(7):13384–
13421.
4. Oliveira JS, Pereira JH, Canduri F, et al. Crystallographic and
pre-steady-state kinetics studies on binding of NADH to wild-type and
isoniazid-resistant enoyl-ACP(CoA) reductase enzymes from Mycobacterium
tuberculosis. J Mol Biol. 2006;359(3):646–666.
5. Kuntz ID, Blaney JM, Oatley SJ, et al. A geometric approach to
macromolecule-ligand interactions. J Mol Biol. 1982;161(2):269–288.
6. Goodsell DS, Olson AJ. Automated Docking of Substrates to Proteins by
Simulated Annealing. Proteins. 1990;8(3):195–202.
7. Leach AR. Molecular Modelling: Principles and Applications, Addison
Wesley Longman Limited, Harlow.
8. Friesner RA, Banks JL, Murphy RB, et al. Glide: a new approach for
rapid, accurate docking and scoring. 1. Method and assessment of docking
accuracy. J Med Chem. 2004;47(7):1739–1749.
9. Rarey M, Kramer B, Lengauer T, et al. A Fast Flexible Docking
Method using an Incremental Construction Algorithm. J Mol Biol.
1996;261(3):470–489.
10. Moré JJ, Wu Z. Distance Geometry Optimization for Protein Structures.
Journal of Global Optimization. 1999;15(3):219–234.
11. Miller MD, Kearsley SK, Underwood DJ, et al. FLOG: a system to
select ‘quasi-flexible’ ligands complementary to a receptor of known
three-dimensional structure. J Comput Aided Mol Des. 1994;8(2):153–174.
12. Ewing TJ, Makino S, Skillman AG, et al. DOCK 4.0: search strategies
for automated molecular docking of flexible molecule databases.
J Comput Aided Mol Des. 2001;15(5):411–428.
13. Chen R, Li L, Weng Z. ZDOCK: An initial-stage protein-docking
algorithm. Proteins. 2003;52(1):80–87.
14. Lorber DM, Shoichet B. Flexible ligand docking using conformational
ensembles. Protein Sci. 1998;7(4):938–950.
15. Van Der Spoel D, Lindahl E, Hess B, et al. GROMACS: fast, flexible,
and free. J Comput Chem. 2005;26(16):1701–1718.
16. Christen M, Hunenberger PH, Bakowies D, et al. The GROMOS
software for biomolecular simulation: GROMOS05. J Comput Chem.
2005;26(16):1719–1751.
17. Liu M, Wang S. MCDOCK: A Monte Carlo simulation approach to the
molecular docking problem. J Comput Aided Mol Des. 1999;13(5):435–
451.
18. Totrov R, Abagyan R. Flexible protein–ligand docking by global energy
optimization in internal coordinates. Proteins. 1997;(Suppl):215–220.
19. Brooijmans N, Kuntz ID. Molecular recognition and docking algorithms.
Annu Rev Biophys Biomol Struct. 2003;32:335–373.
20. Glover F. Future paths for integer programming and links to artificial
intelligence. Computers and Operations Research. 1986;13(5):533–549.
21. Mangoni M, Roccatano D, Di Nola A. Docking of flexible ligands to
flexible receptors in solution by molecular dynamics simulation.
Proteins. 1999;35(2):153–162.
22. Cocohoba J, Dong BJ. Raltegravir: The first HIV integrase inhibitor. Clin
Infect Dis. 2009;48(7):931–939.
23. Lengauer T, Rarey M. Computational methods for biomolecular docking.
Curr Opin Struct Biol. 1996;6(3):402–406.
24. Bohacek RS, McMartin C, Guida WC. The art and practice of
structure-based drug design: a molecular modeling perspective. Med Res Rev.
1996;16(1):3–50.
25. Leibowitz N, Fligelman Z, Nussinov R, et al. An automated multiple
structure alignment and detection of a common substructural motif.
Proteins. 2001;43:235–245.
26. Finn PW, Kavraki LE, Latombe JC, et al. RAPID: randomized
pharmacophore identification for drug design. Computational Geometry.
1997;97:324–333.
27. Rigoutsos I, Platt D, Califano A. Flexible 3D-substructure matching and
novel conformer derivation in very large databases of 3D-molecular
information. IBM Research Division. Yorktown Heights, NY: T.J. Watson
Research Center, 1996.
28. Martin YC. 3D database searching in drug design. J Med Chem.
1992;35(12):2145–2154.
29. Carlson HA, Masukawa KM, Rubins K, et al. Developing a
dynamic pharmacophore model for HIV-1 integrase. J Med Chem.
2000;43(11):2100–2114.
30. Böhm HJ. LUDI: rule-based automatic design of new substituents for
enzyme inhibitor leads. J Comput Aided Mol Des. 1992;6(6):593–606.
31. Rocchia W, Sridharan S, Nicholls A, et al. Rapid grid-based construction
of the molecular surface and the use of induced surface charge to
calculate reaction field energies: applications to the molecular systems and
geometric objects. J Comput Chem. 2002;23(1):128–137.
32. Hawkins GD, Cramer CJ, Truhlar DG. Pairwise solute descreening of
solute charges from a dielectric medium. Chemical Physics Letters.
1995;246:122–129.
33. Kramer B, Rarey M, Lengauer T. Evaluation of the FLEXX
incremental construction algorithm for protein–ligand docking. Proteins.
1999;37(2):228–241.
34. Morris GM, Goodsell DS, Halliday RS, et al. Automated Docking Using
a Lamarckian Genetic Algorithm and Empirical Binding Free Energy
Function. Journal of Computational Chemistry. 1998;19(14):1639–1662.

35. Forli S, Botta M. Lennard-Jones Potential and Dummy Atom Settings to
Overcome the AUTODOCK Limitation in Treating Flexible Ring
Systems. J Chem Inf Model. 2007;47(4):1481–1492.
36. Bohm HJ. The development of a simple empirical scoring function to
estimate the binding constant for a protein-ligand complex of known
three-dimensional structure. J Comput Aided Mol Des. 1994;8(3):243–
256.
37. Eldridge MD, Murray CW, Auton TR, et al. Empirical scoring functions:
I. The development of a fast empirical scoring function to estimate the
binding affinity of ligands in receptor complexes. J Comput Aided Mol
Des. 1997;11(5):425–445.
38. Wang R, Lai L, Wang S. Further development and validation of empirical
scoring functions for structure based binding affinity prediction.
J Comput Aided Mol Des. 2002;16(1):11–26.
39. Friesner RA, Murphy RB, Repasky MP, et al. Extra precision glide:
docking and scoring incorporating a model of hydrophobic enclosure for
protein-ligand complexes. J Med Chem. 2006;49(21):6177–6196.
40. Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to
predict protein-ligand interactions. J Mol Biol. 2000;295(2):337–356.
41. Yadava U, Singh M, Roychoudhury M. Pyrazolo[3,4-d]pyrimidines
as inhibitor of anti-coagulation and inflammation activities of
phospholipase A2: insight from molecular docking studies. J Biol Phys.
2013;39(3):419–438.
42. Muegge I, Martin YC. A general and fast scoring function for
protein-ligand interactions: a simplified potential approach. J Med Chem.
1999;42(5):791–804.
43. Li X, Liang J. Knowledge-based energy functions for computational
studies of proteins in Computational Methods for Protein Structure
Prediction and Modeling. Xu Y, Xu D, Liang J, editors. New York, Springer.
44. Higueruelo AP, Schreyer A, Bickerton GR, et al. What Can We Learn
from Molecular Recognition in Protein–Ligand Complexes for the
Design of New Drugs? Angewandte Chemie International Edition.
1996;35:2588–2614.
