In silico methods in drug discovery and development

In Silico methods in Drug Discovery
and Development
Stephane Acoca
Department of Biochemistry
McGill University
Montreal, Quebec, Canada
Submitted August 2011
A thesis submitted to McGill University in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
© Stephane Acoca, 2011

Abstract
Computational drug design methods have become invaluable in the drug discovery and
development process. This thesis describes the development and application of methods that
are used at every stage of the drug discovery and development pipeline. Chapter 2 examines
the use of computational methods to understand and develop two novel Bcl-2 inhibitors,
Obatoclax and ABT-737, which are being developed for the treatment of cancer. The study
proposes mechanisms through which ABT-737 displays selectivity towards certain targets within
the Bcl-2 family. Additionally, we propose a binding mode for Obatoclax that is in accordance
with experimental data. Chapter 3 addresses the use of virtual screening for the identification
of novel lead compounds. Trypanosoma brucei RNA Editing Ligase 1 was chosen as the target for
the development of treatments against Trypanosoma infections, and C35, a potent novel inhibitor
of the enzyme, was identified. Furthermore, our research shows that the action of C35 extends to
inhibition of several critical enzyme activities required for the RNA editing process and
compromises the integrity of the multiprotein complex that carries it out. Chapter 4 examines
the use of mass spectrometry data to expedite the discovery of bioactive compounds in natural
products. We developed an algorithm that analyses MS/MS data to derive the molecular formula
of the compound. The novel algorithm obtained a 95% success rate on a test set of 91 compounds.
Chapter 5 explores the use of molecular dynamics to generate a conformational ensemble of
targets for virtual screening. Conformational ensembles were generated for a target test set
taken from the Directory of Useful Decoys. The results showed that molecular dynamics-based
conformational ensembles provided remarkable improvements on 2 of the targets tested, owing to
an enhanced capacity to properly dock compounds in otherwise restricted structures. The last
chapter of the thesis is a general discussion of the work and a proposal on how all of its
elements can be integrated within the drug discovery and development pipeline.

Résumé
Modeling methods have become an invaluable tool in the process of discovering and developing
new drugs. This thesis describes the development and application of methods used at each stage
of pharmaceutical discovery and development. Chapter 2 is an overview of the use of
computational methods towards the development of two new inhibitors of the Bcl-2 proteins,
Obatoclax and ABT-737, which are in development for the treatment of cancer. The study proposes
certain mechanisms of ABT-737 that explain its selectivity towards members of the Bcl-2 family.
In addition, we propose a binding mechanism for Obatoclax that conforms to experimental data.
The following chapter addresses the use of virtual screening for the identification of new lead
molecules. The RNA Editing Ligase of Trypanosoma brucei was chosen as the target for the
development of treatments against trypanosome infections, and C35 was identified as a new
inhibitor of the enzyme. Furthermore, our research demonstrates that the action of C35 extends
to the inhibition of several enzymes necessary for the RNA editing mechanism, in addition to
compromising the integrity of the multiprotein complex that carries it out. The following
chapter looks at the use of data derived from mass spectrometry with the aim of accelerating
the discovery of bioactive molecules from natural sources. We developed an algorithm that
analyses MS/MS data in order to derive the molecular formula of the compound. The new algorithm
achieved a success rate of 95% on a test set of 91 molecules. The last chapter of the thesis
explores the use of molecular dynamics simulations to generate a conformational ensemble of
target proteins for use in virtual screening. The conformational ensembles were generated for a
test series obtained from a repository titled the Directory of Useful Decoys. The results
demonstrate that conformational ensembles derived from molecular dynamics brought remarkable
improvements on two of the targets tested, owing to an increased capacity for appropriate
placement of molecules in a site that is otherwise very restricted. The final chapter of this
thesis is a general discussion of the work accomplished and a proposal on how all the elements
are integrated into a protocol for pharmaceutical discovery and development.

Acknowledgements
I would first like to thank Dr Enrico Purisima and Prof. Gordon Shore for their mentorship
and patience throughout the doctoral work leading to this thesis. I am very thankful for the
experiences I had during my tenure in Dr Purisima’s laboratory. I would like to give special
thanks to former members of the laboratory, Dr Sathesh Bhat, Dr Marwen Naim, Herve Hogues,
and Dr Qizhi Cui, whose guidance, friendship, inspiration, assistance and support have been
invaluable during my tenure. My long conversations on modeling with Dr Bhat and Mr Hogues
have been of special value in my learning of computational modeling. I would also like to
extend thanks to current and past members of the laboratory, including Dr Shafinaz Chowdhury,
Dr Christophe Deprez, Dr Edwin Wang and Dr Sheldon Dennis, for creating a positive work
environment. I would like to give special thanks to the members of my Research Advisory
Committee (RAC), Prof. John Silvius and Prof. Albert Berghuis, who have been of great
assistance in guiding me through the completion of the doctoral work. I would also like to
give special recognition to Prof. Imed Gallouzi for his help, and to thank the Chemical
Biology program at McGill, which has partially funded my work. Lastly, I would like to thank
my family for their continued encouragement and support.

Table of Contents
Abstract i
Résumé iii
Acknowledgements v
Table of Contents vi
List of Figures x
List of Tables xiii
Abbreviations xv
Contribution of Authors xvii
Chapter 1. General Introduction
1.1 Drug Discovery and Development 2
1.1.1 Overview & Challenges 2
1.1.2 Thesis Outline 5
1.2 Molecular Modeling 7
1.2.1 Molecular Mechanics 7
1.3 Predicting Binding Free Energies – Scoring 12
1.3.1 Effect of water: Continuum (Implicit) Solvation energy 15
1.3.1.1 Finite difference 18
1.3.1.2 Boundary Element Method 19
1.3.1.3 Desolvation Cost 19
1.3.2 Scoring Functions 21
1.3.2.1 Physical-Chemical 22
1.3.2.2 Empirical function 25
1.3.2.3 Knowledge-based 25
1.3.2.4 Problems 26
1.4 Predicting Binding Modes – Docking 27
1.4.1 Docking Algorithms 28
1.4.1.1 Fast Shape Matching 28
1.4.1.2 Incremental Construction 29
1.4.1.3 Monte Carlo Simulations 30
1.4.1.4 Evolutionary Programming 31
1.5 Molecular Dynamics 32
1.5.1 Newton’s Laws 32
1.5.2 Ensembles 34
1.5.3 Verlet Algorithm 34
1.5.4 Considerations 36
1.5.5 Boundary Conditions 38
1.5.6 Long Range Electrostatic Calculations: The Ewald Summation Method 39
1.6 Virtual Screening 40
1.6.1 Virtual Screening Pipeline 41
1.6.2 The Target 43
1.6.3 The Compound Database 44
1.6.4 The Docking Protocol 44
1.6.5 MD Simulations 45
1.6.6 Conformational Ensembles 45
1.7 Successes of CADD 48
Chapter 2. Molecular Dynamics Study of Small Molecule Inhibitors
of the Bcl-2 Family
Preface 51
2.1 Rationale 52
2.2 Abstract 52
2.3 Introduction 53
2.4 Methods 58
2.4.1 Structure Preparation 58
2.4.2 Force Field Parameters 58
2.4.3 Docking 59
2.4.4 Molecular Dynamics Simulations 60
2.4.5 Binding free energy estimate 61
2.5 Results and Discussion 62
2.5.1 Molecular Modeling of ABT-737 complexes 62
2.5.2 Binding groove structure 64
2.5.3 Chlorobiphenyl group 65
2.5.4 Phenylpiperazine linker 67
2.5.5 Nitrophenylsulfonamide group 69
2.5.6 S-phenyl group 71
2.5.7 Dimethyl group 72
2.5.8 SIE Analysis and Virtual Alanine Mutations 72
2.5.9 Protein structure and dynamics 75
2.5.10 Mcl-1 and obatoclax 78
2.6 Conclusion 80
Chapter 3. Naphthalene-based RNA editing inhibitor blocks RNA
editing activities and editosome assembly in Trypanosoma
brucei
Preface 83
3.1 Rationale 84
3.2 Abstract 84
3.3 Introduction 85
3.4 Experimental Procedures 88
3.4.2 Virtual Screening 89
3.4.3 Solvated Interaction Energy 90
3.4.4 Preparation of mitochondrial extract and tandem affinity purification
of ligase complex 90
3.4.5 Preparation of RNAs 91
3.4.6 Adenylylation and deadenylylation assays 91
3.4.7 In vitro RNA editing assays 92
3.4.8 Gel shift assay 93
3.4.9 Guanylyltransferase labeling 94
3.5 Results 94
3.5.1 Virtual Screening 94
3.5.2 Inhibition of RNA editing by selected compounds 97
3.5.3 Inhibition of ligase adenylylation at low protein concentrations by C35 and S5 100
3.5.4 Inhibition of deadenylylation by C35 and S5 102
3.5.5 Inhibition of different steps of RNA editing by C35 and S5 104
3.5.6 Inhibitory compounds affect the editosome RNA-binding activity 107
3.5.7 20S editosome complex integrity is affected by C35 treatment 110
3.6 Discussion 112
3.7 Acknowledgments 117
Chapter 4. Automated Molecular Formula Determination by
Tandem Mass Spectrometry (MS/MS)
Preface 119
4.1 Rationale 120
4.2 Abstract 120
4.3 Introduction 121
4.4 Experimental 125
4.4.1 Materials 125
4.4.2 Instrumentation 125
4.4.3 MS/MS experiments 126
4.4.4 The algorithm of molecular formula analysis 127
4.4.5 Nitrogen-enriched or oxygen-enriched compounds 132
4.5.1 Risk of assigning incorrect molecular formula 133
4.5.2 Mass accuracy 134
4.5.3 Fragmentation pathways of brefeldin A 135
4.5.4 Molecules with single structural domain 137
4.5.5 Molecules with multiple core structures 140
4.5.6 Analysis of structurally-related compounds 143
4.5.7 Cyclazocine and N-allylnormetazocine 146
4.5.8 Peptides 148
4.5.9 Chloro- or bromo-containing compounds 152
4.6 Conclusion 154
4.7 Acknowledgements 155
Chapter 5. Molecular Dynamics ensemble in Virtual Screening
Preface 157
5.1 Rationale 158
5.2 Abstract 158
5.3 Introduction 159
5.4 Methods 162
5.4.2 Ligand Preparation and Docking 163
5.4.3 Molecular dynamics simulations 164
5.4.4 Force Field Parameters 165
5.4.5 Clustering 165
5.4.6 Test Data Sets 165
5.5.1 Overview of Results 166
5.5.2 Obstructive changes during apo simulations 167
5.5.3 Performance of holo ensemble 170
5.5.4 Structural change in holo ensemble 171
5.5.5 Effect on score distribution 174
5.5.6 Comparison with RCS 177
5.5.7 Use of DUD training set 178
5.6 Conclusion 180
Chapter 6. General Discussion
6.1 Molecular Dynamics Study of Bcl-2 Inhibitors
6.2 Discovery of TbRel1 Inhibitors
6.3 Automated Molecular Formula determination by Tandem Mass Spectrometry
6.4 Ensemble-based Virtual Screening
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
References 222
Original Contributions to Knowledge 250

List of Figures
Chapter 1
Figure 1.01 The pharmaceutical drug discovery and development pipeline
Figure 1.02 Increasing costs in pharmaceutical R&D
Figure 1.03 Pre-approval costs for new drugs
Figure 1.04 The contributions of bonded terms to the potential energy function
Figure 1.05 Cubic grid scheme for the Finite Difference Method
Figure 1.06 Representation of desolvation effects during ligand-protein complex formation
Figure 1.07 Periodic boundary conditions in molecular dynamic simulations
Figure 1.08 The virtual screening pipeline
Chapter 2
Figure 2.01 ABT-737 chemical structure
Figure 2.02 Obatoclax chemical structure
Figure 2.03 Multiple sequence alignment of representative BH3 domains from BH3-Only
proteins.
Figure 2.04 Superposition of ABT-737 and Bim BH3 peptide bound to Bcl-xL.
Figure 2.05 Calculated binding mode of ABT-737 in Bcl-xL, Bcl-2 and Mcl-1.
Figure 2.06 Distance of the ABT-737 biphenyl ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.07 Distance of the ABT-737 linker ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.08 Distance of the ABT-737 nitrophenyl and S-phenyl ring centroids from their
initial positions after superposition of the protein C-alpha atoms to those in the
first snapshot.
Figure 2.09 Calculated binding mode of obatoclax in Mcl-1.
Chapter 3
Figure 3.01 Predicted binding modes of TbREL1
Figure 3.02 Effect of selected compounds that inhibit editosome activity
Figure 3.03 Effect of inhibitory compounds on adenylylation and deadenylylation steps of
RNA editing ligases
Figure 3.04 Effect of inhibitory compounds on different steps of RNA editing
Figure 3.05 Effect of inhibitory compounds on RNA-binding activity of editosome complex
Figure 3.06 Analysis of sedimentation profile and activity of ligase-associated complexes in
the presence of C35
Figure 3.07 Alternative models for the mechanism of action of C35 and S5.
Chapter 4
Figure 4.01 The MS/MS spectrum of brefeldin A
Figure 4.02 Fragmentation pathways of brefeldin A
Figure 4.03 The MS/MS spectrum of prazosin
Figure 4.04 Fragmentation pathways of prazosin
Figure 4.05 The MS/MS spectrum of dihydroergotamine and dihydroergocristine
Figure 4.06 Fragmentation pathways of dihydroergotamine
Figure 4.07 Structures of dihydroergotamine and dihydroergocristine
Figure 4.08 The MS/MS spectrum of cyclazocine and N-allylnormetazocine
Figure 4.09 Fragmentation pathways of cyclazocine
Figure 4.10 The MS/MS spectrum of 5-leucine enkephalin
Figure 4.11 Stepwise analysis of 5-leucine enkephalin sequences
Figure 4.12 Overall detail analysis of 5-leucine enkephalin
Figure 4.13 The MS/MS spectrum of quinacrine
Figure 4.14 Plausible fragmentation pathways of quinacrine
Chapter 5
Figure 5.01 Changes in binding site observed in the apo ensemble in a) COX2, b) AR,
c) GART and d) PARP.
Figure 5.02 Changes in binding site observed in the holo ensemble for COX2.
Figure 5.03 Changes in binding site observed in the holo ensemble for AR.
Figure 5.04 Changes in binding site observed in the holo ensemble for ER.
Figure 5.05 Score distribution of true binders across the crystal structure and selected holo ensemble
structure for a) ER, b) AR, c) EGFR, and d) COX2.
Appendix
A
Figure A.01 Helices surrounding the binding grooves of Bcl-xL, Bcl-2 and Mcl-1.
Figure A.02 Distance between ABT-737 sulfonamide HN and backbone carbonyl O of Bcl-xL
Asn136, Bcl-2 Asn140 and Mcl-1 Asn260.
Figure A.03 Hydrogen bond pair distances between ABT-737 sulfonyl O and side chains
in Bcl-xL, Bcl-2 and Mcl-1
Figure A.04 Hydrogen bond pair distances between ABT-737 dimethylamino HN and side
chain carboxylate O in Bcl-xL and Bcl-2.
Figure A.05 Distance of ABT-737 ring centroids from their initial positions after superposition
of the protein C-alpha atoms to those in the first snapshot.
B
Figure B.01 Inhibitors identified from first round of virtual screening.
Figure B.02 Previously identified inhibitors not retrieved in virtual screening.
D
Figure D.01 Overview of VS results for the crystal structure and apo/holo ensembles
Figure D.02 Ensemble-based VS results for structures generated from apo MDs

List of Tables
Chapter 2
Table 2.1 Solvated interaction energies (SIE) in kcal/mol
Table 2.2 Virtual alanine mutations
Chapter 3
Table 3.1 Virtual hits selected for experimental validation
Chapter 4
Table 4.1 Potential neutral losses in the MS/MS experiment in forward MFA
Table 4.2 Reverse MFA of brefeldin A with correct formula of precursor ion
Table 4.3 Reverse MFA of brefeldin A with incorrect formula of precursor ion
Table 4.4 Molecular formula analysis of prazosin
Table 4.5 Molecular formula analysis of dihydroergotamine
Table 4.6 Molecular formula analysis of dihydroergocristine
Table 4.7 Molecular formula analysis of cyclazocine
Table 4.8 Molecular formula analysis of N-allylnormetazocine
Table 4.9 Molecular formula analysis of quinacrine
Chapter 5
Table 5.1 Targets of the DUD set selected and properties of each set
Appendix
A
Table A.01 Fourier coefficients for ca-s6-n-ca
B
Table B.01 Ranking of selected hits from virtual screen
C
Table C.01 Molecular formula analysis of 5-leucine enkephalin

Abbreviations
ADA Adenosine Deaminase
AR Androgen Receptor
BCL-2 B-Cell Lymphoma 2
BEM Boundary Element Method
CADD Computer-Aided Drug Design
CML Chronic Myelogenous Leukemia
COX2 Cyclooxygenase 2
CRK Cdc2-Related Kinase
DNDi Drugs for Neglected Diseases initiative
DUD Directory of Useful Decoys
EGFR Epidermal Growth Factor Receptor
EP Evolutionary Programming
ER Estrogen Receptor
FDM Finite Difference Method
FXa Factor Xa
GA Genetic Algorithm
GART Glycinamide Ribonucleotide Transformylase
gRNA guide RNA
GSK Glycogen Synthase Kinase
HSP90 Heat Shock Protein 90
IC Incremental Construction
KB Knowledge-based
KBP Knowledge-based potentials
LGA Lamarckian Genetic Algorithm
MAPK Mitogen-Activated Protein Kinase
MC Monte-Carlo
MD Molecular Dynamics
MF Molecular Formula
MFA Molecular Formula Analysis
MM Molecular Mechanics
MW Molecular Weight
NCE New Chemical Entity
NS Nanoseconds
NTD Neglected Tropical Diseases
PARP Poly ADP-Ribose Polymerase
PDB Protein Data Bank
PBSA Poisson-Boltzmann Surface Area
PS Picoseconds
RCS Relaxed Complex Scheme
RMS Root Mean Square
SA Surface Area
SBDD Structure-Based Drug Design
SIE Solvated Interaction Energy
SM Shape Matching
SRC SRC Tyrosine Kinase
TbRel1 Trypanosoma brucei RNA-editing Ligase 1
VDS Virtual Decoy Set
VdW Van der Waals
VS Virtual Screening

Contributions of Authors
This thesis includes the text and figures from three published articles. I am the first author
of one of the manuscripts (Chapter 2) and second author of the remaining two (Chapters 3 and 4).
Additionally, the thesis includes the text and figures from work in progress towards the
publication of a manuscript (Chapter 5). This thesis has been written in manuscript-based format,
and the references of all chapters have been combined into one reference section at the end of the
dissertation. The contributions of the authors for each of the manuscripts are as follows:
Chapter 2:
Acoca S., Cui Q., Shore G.C., Purisima E.O. 2011. Molecular Dynamics Study of Small
Molecule Inhibitors of the Bcl-2 Family. Proteins. 79(9):2624-36.
I performed all original work and completed the first draft of the manuscript. Prior to
submission, Dr Cui reran a number of the simulations and Dr Purisima reworked the manuscript.
Chapter 3:
Moshiri H., Acoca S., Kala S., Najafadabi H.S., Hogues H., Purisima E.O., Salavati R. 2011.
RNA Editing Ligase 1 Inhibitors Blocks RNA Editing Activities and Editosome Assembly in
Trypanosoma Brucei. J Biol Chemistry. 286(16):14178-89.
My contributions to the manuscript involved the virtual screening segment of the work,
specifically a) the Virtual Screening section, b) Figure 1, c) Table 1 and d) all relevant
sections of the Experimental Procedures (Structure Preparation, Virtual Screening and Solvated
Interaction Energy). Prof. Salavati’s group carried out all experimental testing of the
compounds and their inhibitory properties with regard to the 20S editosome activities.
Chapter 4:
Jarussophon S, Acoca S, Gao J.M., Deprez C., Kiyota T., Draghici C., Purisima E., Konishi Y.
2009. Automated Molecular Formula Determination by Tandem Mass Spectrometry (MS/MS).
Analyst 134(4):690-700.
I wrote the code for the software that ran the analysis and collaborated with Dr Konishi in its
development. The algorithm implemented in the software was originally developed by Dr
Konishi and his group. Dr Deprez is responsible for the continued maintenance of the software.
Chapter 5:
Acoca S., Hogues H, Purisima EO. 2010. Molecular dynamics ensembles for virtual screening.
(Manuscript in preparation).
The entirety of the work for this manuscript was carried out by me. The docking scripts for
tailoring the pipeline to ensemble virtual screening were written by Mr Hogues.

Chapter 1
General Introduction

1.1 Drug Discovery and Development
1.1.1 Overview & Challenges
Though the use of foreign substances for the treatment of illnesses has been practiced
since the beginning of time, the use of an isolated, well-defined chemical entity is only
over a century old. Since then, medicinal substances have been natural products isolated
from plants and microbial sources, or products of pure chemical synthesis. Whichever the
case, the modern pipeline for pharmaceutical drug discovery and development is a long,
multi-step process involving the collaborative effort of a multitude of specialties (Figure
1.01).

Figure 1.01 The Pharmaceutical Drug Discovery and Development Pipeline. [Figure: pre-clinical
studies (target identification, identification of lead compounds, lead optimization, in vitro
testing, animal testing) followed by clinical studies (Phase I, Phase II, Phase III, Phase IV).]

However, no venture in pharmaceutical research is without risk, and a positive
outcome of the research is anything but guaranteed. The difficulties inherent in discovery
and development, along with the stringent requirements placed on pharmaceutical drugs, have
created an economic problem in the profitability of such endeavors. Despite some spectacular
successes, more is spent on drug discovery and development every year and less is
delivered in terms of innovation (DiMasi et al., 2003). Figure 1.02 shows the reported
aggregate annual domestic prescription drug R&D expenditures for all members of the
U.S. pharmaceutical industry since 1963, alongside the number of new US drug
approvals by year (DiMasi et al., 2003). The rate of growth of R&D expenditures clearly
outpaces that of new approvals by a large margin. These rising costs have led to an
overwhelming economic problem for R&D within the pharmaceutical industry. In 2003, a study
of 68 new medications estimated a timeline of 10-12 years and cumulative costs averaging
US$897 million for the development and marketing of a new medication (Ezzell, 2003). The
pre-approval R&D costs themselves rose from US$138 million in 1979 to US$318 million in 1991
and to US$802 million in 2000 (Figure 1.03). One result of these higher R&D costs is an
increased trend towards mergers and industry consolidation. Additionally, higher costs
translate into pressure to lower risk: reorganization of R&D sectors in the pharmaceutical
industry aims to optimize the return on investment by carefully selecting the most
profitable research sectors. Together, these effects create an increased need for efficient,
low-cost technologies that bridge the gap between R&D and the economic challenges facing the
pharmaceutical industry.

Figure 1.02 Increasing costs in pharmaceutical R&D. Inflation-adjusted industry R&D
expenditures (2000 dollars) and US new chemical entity (NCE) approvals from 1963 to 2000.
Source of data: Pharmaceutical Research and Manufacturers of America (PhRMA) and Tufts CSDD
Approved NCE database. (Taken from DiMasi et al., 2003)

Figure 1.03 Pre-approval costs for new drugs. Each column indicates the capitalized
pre-clinical, clinical and total costs per approved new drug in 2000 US dollars. (Taken from
DiMasi et al., 2003)

Computer-Aided Drug Design (CADD) approaches have been widely used in the
pharmaceutical industry. By allowing scientists to focus their attention on the most
promising candidate compounds, thereby narrowing the synthetic and biological
testing efforts, CADD approaches play an important role in accelerating pharmaceutical
research. The recent successes of CADD in assisting rational drug design approaches
have proven it to be an essential tool in drug design and development (Kapetanovic,
2008; Mandal et al., 2009; Song et al., 2009).
1.1.2 Thesis Outline
As part of this thesis, several elements of CADD have been incorporated into
research targeting every step of the pharmaceutical drug discovery pipeline. The
following describes the contribution of each chapter to the individual segments
of the pharmaceutical drug design and discovery pipeline.
Chapter 4 explores the lead identification stage of the pipeline and provides an
alternative means of expediting research when an active compound must be identified from
a natural products sample. Natural products (and their semi-synthetic
derivatives) have been major sources of marketed medications. However, lead isolation
and identification from natural product extracts faces the problem of replication, i.e. the
re-discovery of known natural products. Chapter 4 presents the development of a novel
algorithm that uses MS/MS data to derive the correct molecular formula of a
compound, allowing rapid identification of the probable nature of the isolated
compound.
Chapters 3 and 5 look at lead identification from the alternative source: compound
databases. Chapter 3 applies our virtual screening (VS) pipeline to the
Trypanosoma brucei RNA-editing Ligase 1 (TbRel1), where the success of our screen led
to the identification of an inhibitor that allowed a better understanding of the effects
of inhibition. Chapter 5 seeks to further enhance the current VS pipeline by using
MD-generated conformational ensembles. Here the use of conformational ensembles (see
Section 1.6.6) attempts to provide a better, more complete representation of the target’s
conformational dynamics as part of the VS process.
Lastly, Chapter 2 is a representative pre-clinical molecular dynamics (MD) study
of a lead compound in complex with its target. The in silico MD study gives researchers
the opportunity to obtain information on the mechanism of action of the
compound that would be unavailable through the usual experimental means. Our
experiments aimed to identify the specific structural factors responsible for the specificity
of two compounds that target the Bcl-2 family of proteins, which have recently become
key targets for cancer therapeutics.
1.2 Molecular Modeling
The field of Computational Drug Design relies on our understanding of the
underlying mechanisms involved in the interactions of a drug and its
target. As such, the development of Molecular Mechanics (MM) and Quantum
Mechanics (QM) has enabled the study of drug-target recognition events at the
atomic and electronic level. The increasing accuracy of these models, along with the
growth of the computational resources required to compute them, has prompted the
development of computational tools with increasing accuracy in evaluating drug-target
interactions.
1.2.1 Molecular Mechanics
First applied by Westheimer and Mayer in 1946, MM encompasses the
computational techniques that allow the calculation of molecular properties through the
use of classical mechanics and electrostatics (Westheimer and Mayer, 1946). MM
provides a practical means to describe molecular structures and properties
computationally. As opposed to QM, where the primary purpose is the accuracy of the
calculations, MM packages aim to describe molecular structures and properties
accurately, robustly, and within reasonable time frames (Boyd and Lipkowitz,
1982). To do so, MM models (also referred to as force fields) describe molecules as a
collection of atoms held together by elastic or harmonic forces. These forces essentially
represent the structural features of a molecule such as bond lengths, bond angles, dihedral
angles, etc. Functions are used to describe the behavior of these forces, resulting in a
calculated potential energy for each. As such, the total potential energy of a molecule is
calculated as the sum of all energy contributions (Eq. 1.01):
$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdw} + E_{elec} \qquad (1.01)$$
Functional form of the potential energy of a molecule
where Ebond, Eangle, Etorsion, Evdw and Eelec describe the bond length, bond angle, torsion
angle, Van der Waals and electrostatic contributions respectively. Energy contributions
are calculated to describe the deviation of structural features from their ideal values,
which are determined empirically or from high-level QM calculations. While the exact
mathematical functions used to describe these contributions may differ between MM packages,
the functions are chosen to accurately replicate the behavior of each energy contribution
within expected ranges while minimizing the number of calculations, and therefore the
computational time, required. From here on, all discussion of the potential energy function
will refer to that implemented by the AMBER force field (Cornell et al., 1995).
The potential energy function is described as follows:
$$E = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\Phi - \gamma) \right] + \sum_{i < j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon r_{ij}} \right] \qquad (1.02)$$
The potential energy function

where Kr, Kθ and Vn are force constants, θeq and req are the equilibrium bond angle and
length respectively, and γ is the phase for the torsional angle. Aij and Bij are the
non-bonded parameters used to compute the van der Waals energies. qi and qj denote the
atom-centered partial charges on atoms i and j respectively, ε is the dielectric constant
and rij is the distance between atoms i and j. Each term of the potential energy function
will now be discussed briefly.

Figure 1.04 The contributions of bonded terms to the potential energy function.
The torsion angle, bond length/angle, and VdW contact are represented by a Fourier
series, harmonic potential, and Lennard-Jones potential respectively. (Taken from Boas and
Harbury, 2007).

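$$E_{bond} = \sum_{bonds} K_r (r - r_{eq})^2 \qquad (1.03a)$$
$$E_{angle} = \sum_{angles} K_\theta (\theta - \theta_{eq})^2 \qquad (1.03b)$$
The bond length and bond angle contributions to the potential energy function.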
The typical length of an alkane carbon-carbon bond is 1.53 Å. Similarly, a typical
C-C-C bond angle lies between 109° and 114°. Deviations from these
equilibrium values result in an increase in the energy of the system. Thinking of a
molecule as an assembly of point masses held together by springs (the bond
lengths and angles) is therefore a reasonable approximation to its experimental
behavior for small deviations, and the Ebond and Eangle terms are modeled as harmonic
potentials centered on an equilibrium value (Eq. 1.03a,b).
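As a rough illustration of the harmonic bond and angle terms, the sketch below evaluates the penalty for a slightly stretched C-C bond and a slightly widened C-C-C angle. The function name, the force constants and the 109.5° equilibrium angle are illustrative assumptions rather than values from a published parameter file; only the 1.53 Å bond length comes from the text.

```python
import math


def harmonic_energy(value, equilibrium, force_constant):
    """Harmonic penalty K * (x - x_eq)**2, the functional form of the bond and angle terms."""
    return force_constant * (value - equilibrium) ** 2


# Assumed, illustrative parameters (not taken from a real parameter file).
K_R = 310.0                      # bond force constant, kcal/(mol * A^2)
R_EQ = 1.53                      # equilibrium C-C bond length quoted in the text, A
K_THETA = 40.0                   # angle force constant, kcal/(mol * rad^2)
THETA_EQ = math.radians(109.5)   # assumed equilibrium C-C-C angle

# Bond stretched by 0.05 A and angle opened by 5 degrees.
bond_energy = harmonic_energy(1.58, R_EQ, K_R)
angle_energy = harmonic_energy(math.radians(114.5), THETA_EQ, K_THETA)

print(f"bond term:  {bond_energy:.3f} kcal/mol")
print(f"angle term: {angle_energy:.3f} kcal/mol")
```

Because the penalty is quadratic, compressing the bond by 0.05 Å costs the same energy as stretching it by 0.05 Å, which is one reason the harmonic form is only trusted close to the equilibrium value.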
$$E_{torsion} = \sum_{dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\Phi - \gamma) \right] \qquad (1.03c)$$
The dihedral angle contribution to the potential energy function.
The torsion angle is essentially the rotation about bonds. For any set of four
covalently bonded atoms ABCD the torsion angle is described as the angle measured
about the BC axis from the ABC plane to the BCD plane. The periodic nature of the
torsion angle, and of the torsional potential energy, lends itself to be described by
periodic functions such as a Fourier series with the series typically truncated at the third
term (Eq. 1.03c).
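The geometric definition above can be sketched in a few lines: compute the angle between the ABC and BCD planes about the B-C axis, then evaluate a cosine series at that angle. The helper names, the coordinates and the single three-fold term below are hypothetical choices made for illustration, not parameters from the thesis or from AMBER.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_angle(a, b, c, d):
    """Signed angle (radians) about the B-C axis between planes ABC and BCD."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.atan2(y, x)


def torsion_energy(phi, terms):
    """Sum of (V_n / 2) * (1 + cos(n * phi - gamma)) over the supplied terms."""
    return sum(0.5 * v_n * (1.0 + math.cos(n * phi - gamma))
               for v_n, n, gamma in terms)


# Assumed single three-fold term: (V_n in kcal/mol, n, gamma in radians).
TERMS = [(2.8, 3, 0.0)]

anti = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))  # A and D on opposite sides
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))    # A and D eclipsed

print(f"anti: {math.degrees(anti):.1f} deg, E = {torsion_energy(anti, TERMS):.3f} kcal/mol")
print(f"cis:  {math.degrees(cis):.1f} deg, E = {torsion_energy(cis, TERMS):.3f} kcal/mol")
```

With a zero phase, the three-fold term is at its maximum for the eclipsed arrangement and vanishes for the anti arrangement, which is the periodic behavior the cosine series is meant to capture.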

$$E_{vdw} = \sum_{i < j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right] \qquad (1.03d)$$
The VdW contribution to the potential energy function.
The Van der Waals (VdW) energy describes non-bonded interactions between atoms as
a function of the distance between their nuclei. As two atoms approach one another,
London dispersion forces predominate, creating a net attractive force between them. When
the atoms come closer than the sum of their VdW radii, VdW repulsion comes into play. The
attractive and repulsive parts of the potential energy are described by the Lennard-Jones
(6-12) potential (Eq. 1.03d), though the more computationally demanding Buckingham
potential can also be used. Parameters for the VdW energy term are obtained by measuring
non-bonded contact distances in crystals as well as VdW contact data for rare gas atoms,
though other non-experimental sources (simulations) can also be used (Boyd and
Lipkowitz, 1982; Cornell et al., 1995).
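As an illustration of the 6-12 form, the sketch below evaluates the Lennard-Jones energy for a single atom pair. It uses the well-depth/minimum-distance parameterization, which is equivalent to the A/r^12 - B/r^6 form with A = epsilon * r_min^12 and B = 2 * epsilon * r_min^6; the numerical parameters are assumed for illustration.

```python
def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy with well depth epsilon and minimum at r = r_min."""
    ratio6 = (r_min / r) ** 6
    # Repulsive r^-12 term minus attractive r^-6 term.
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)

# Illustrative parameters (assumptions): well depth in kcal/mol, distance in Angstrom.
EPSILON = 0.1
R_MIN = 3.8

for r in (3.0, 3.8, 6.0):
    print(f"r = {r:.1f} A -> E = {lennard_jones(r, EPSILON, R_MIN):+.4f} kcal/mol")
```

The three sample distances show the repulsive wall (positive energy), the minimum (energy equal to -epsilon) and the weakly attractive tail.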
$E_{elec} = \sum_{i<j} \frac{q_i q_j}{\varepsilon r_{ij}}$ (1.03e)
The electrostatics contribution to the potential energy function.
The last term in the potential energy function calculates the electrostatic energy
associated with the interaction of two point charges, as described by Coulomb’s law.
The energy of interaction between two point charges is directly proportional to the
product of the charges and inversely proportional to the distance between them, while
the corresponding force is inversely proportional to the square of that distance (Eq.
1.03e). Applications of MM force fields include, but are not limited to, energy
minimization, scoring, docking, molecular dynamics and Monte Carlo methods.
Over the past decade, the development of techniques such as high-throughput X-ray
crystallography has accelerated macromolecular structure determination, resulting in a
current total of ~70,000 crystallographic or solution structures of proteins
deposited in the Protein Data Bank (Berman et al., 2000). The availability of this wealth
of structural information, along with well-documented successes, has generated
considerable interest in the advancement of structure-based drug design (SBDD) techniques
(Marrone et al., 1997). A number of structure-based screening methods have been
developed to expedite pharmaceutical research. These methods have been used in lead
discovery, to identify novel chemical entities showing strong inhibitory activity towards a
target, and in lead optimization, where the careful selection of an optimized lead within a
set of chemically similar compounds is required. The following sections review the
methods that have been most crucial to the development of molecular modeling in drug
design, namely the prediction of binding free energies and binding modes, followed by a
review of Molecular Dynamics and Virtual Screening methods, which together have
contributed significantly to the advancement of CADD.
1.3 Predicting Binding Free Energies - Scoring
Calculations of binding free energies play an important role in the accuracy of
SBDD techniques (Raha and Merz, 2005). The major function of such techniques is to
provide estimates of binding free energies at a faster rate and lower cost than is
possible by experimental means. As such, the correlation between experimental and
computationally derived binding free energies to a target is a prerequisite to their success
in drug design.
The selective binding of a small molecule to a target protein is the result of
complementary structural and energetic features. The binding reaction is governed by the
standard Gibbs free energy of binding $\Delta G^{\circ}_{bind}$ under standard state conditions
(concentrations of 1 M, temperature of 298 K and pressure of 1 atm). The experimentally
determined association, dissociation and inhibitory constants (KA, KD and Ki respectively)
relate to the standard Gibbs free energy as follows:
$K_A = \frac{1}{K_D} = \frac{1}{K_i} = \frac{[PL]}{[P][L]}$ (1.04)
$\Delta G^{\circ}_{bind} = -RT \ln K_A$ (1.05)
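To give a sense of scale for the relation between binding constants and the standard free energy, the following sketch converts a dissociation constant into a binding free energy. It assumes the standard form of the relation, ΔG° = -RT ln KA = RT ln KD with KD expressed relative to the 1 M standard state; the function name and the sample constants are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(k_d_molar, temperature=298.0):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    # dG = -RT ln(K_A) = RT ln(K_D), with K_D taken relative to the 1 M standard state.
    return R_KCAL * temperature * math.log(k_d_molar)

for k_d in (1e-3, 1e-6, 1e-9):
    print(f"K_D = {k_d:.0e} M -> dG = {binding_free_energy(k_d):.2f} kcal/mol")
```

At 298 K each tenfold improvement in affinity corresponds to roughly 1.4 kcal/mol, so a scoring function must resolve energy differences of about this size to rank compounds whose affinities differ by one order of magnitude.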
As the binding free energy of a system is a state function, theoretical calculations
can approximate it in a direct fashion, by calculating the free energies of the protein
and ligand individually and then of their complex:
$\Delta G_{bind} = G_{complex} - (G_{protein} + G_{ligand})$ (1.06)
where $\Delta G_{bind}$ is the free energy of binding, $G_{complex}$ is the free energy of the complex,
and $G_{protein}$ and $G_{ligand}$ are the free energies of the protein and ligand respectively.
Another form of expression of the binding free energy is its decomposition into different
additive free energy components integrated into a single equation:
$\Delta G_{bind} = \Delta G_{int} + \Delta G_{solv} + \Delta G_{motion} + \Delta G_{conf}$ (1.07)
In Eq. 1.07, $\Delta G_{int}$ is the interaction free energy owing mostly to electrostatic and steric
enthalpic contributions from complex formation, $\Delta G_{solv}$ is the free energy of solvation
which accounts for solvent effects in binding, $\Delta G_{motion}$ is the free energy change
associated with changes in the motion of the components of the system, and $\Delta G_{conf}$
accounts for the free energy due to conformational changes upon complexation. Scoring
functions address these components of the binding free energy differently. Section 1.3.2
is a review of the different methods employed to evaluate them. However, before
addressing the differences between scoring methods, a review of solvation
effects on binding is appropriate, as the development of implicit solvation models has had
a tremendous impact on the calculation of solvation free energies and hence on our ability
to estimate binding free energies (Tomasi and Persico, 1994; Orozco and Luque, 2000).

1.3.1 Effect of water: Continuum (Implicit) Solvation energy
Protein-ligand binding normally occurs within an aqueous environment. Interactions
with the solvent play a significant role in binding energetics and must therefore be
taken into account when making binding free energy predictions. The dielectric
constant of water at 25°C is 78.5 while that of vacuum is 1. The solvation energy of a
charged or polar group is the result of a favorable interaction between its atomic charges
and the high-dielectric environment. As a result of this favorable interaction, there is an
energy penalty when polar parts of the ligand are removed from their contact with water
and exposed instead to the binding site. Additionally, the presence of water results in the
effective screening of charge-charge interactions, as indicated by the dielectric constant
in the Coulomb equation (Eq. 1.03e). However, the interface of a protein-ligand complex
usually excludes water molecules. In order to account for the distance dependence of the
screening of charge-charge interactions by water, a crude screening model that contains a
distance-dependent dielectric constant was introduced. In this model, for all atoms i and j
in Eq. 1.03e the effective dielectric constant is ε = C·rij, where C is a constant and rij is
the interatomic distance. While this model allows for the rapid calculation of one of the
major effects of water, it does not account for the one-body solvation energy of each
atom. Additionally, the electrostatic interaction between two atoms is affected by the
positions of all other protein and ligand atoms, which should also be taken into
account.
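The effect of the screening models discussed above can be compared numerically with a short sketch. It evaluates the Coulomb energy of one charge pair with a constant dielectric and with the distance-dependent dielectric ε = C·rij. The conversion factor of 332.06 kcal·Å/(mol·e²) is the usual electrostatic constant in these units; the choice C = 4 and the sample charges are assumptions made for illustration.

```python
COULOMB_CONSTANT = 332.06  # kcal*A/(mol*e^2): converts e^2/A into kcal/mol

def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Coulomb energy of two point charges in a uniform dielectric."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)

def coulomb_energy_distance_dependent(q_i, q_j, r_ij, c=4.0):
    """Coulomb energy with epsilon = c * r_ij, so the energy decays as 1/r^2."""
    return COULOMB_CONSTANT * q_i * q_j / (c * r_ij * r_ij)

# A unit positive and a unit negative charge separated by 3 Angstrom.
vacuum = coulomb_energy(1.0, -1.0, 3.0)
water = coulomb_energy(1.0, -1.0, 3.0, dielectric=78.5)
screened = coulomb_energy_distance_dependent(1.0, -1.0, 3.0)
print(f"vacuum: {vacuum:.2f}, bulk water: {water:.2f}, eps = 4r: {screened:.2f} kcal/mol")
```

The distance-dependent model gives a value between the vacuum and bulk-water limits and weakens long-range interactions more strongly than short-range ones, but it remains a pairwise approximation and carries none of the one-body solvation energy mentioned in the text.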

Continuum (implicit) solvation models can account for the additional
complexities of electrostatic interactions. The continuum solvation models essentially
treat the solvent as a bulk dielectric medium with a dielectric constant of ~80 (Dout = 78.5
for water at 25°C) and the protein/ligand as low-dielectric regions with enclosed atomic
charges. Numerical solutions to the Poisson-Boltzmann (PB) equation provide an
efficient means of calculating the electrostatic potential produced by a system. The PB
equation relates the electrostatic potential Φ(r) to the charge density ρ(r) as:
$\nabla \cdot [\varepsilon(\mathbf{r}) \nabla \Phi(\mathbf{r})] = -4\pi\rho(\mathbf{r})$ (1.08)
where ε(r) is the dielectric constant. The total free energy of solvation is calculated as
follows:
$\Delta G_{solv} = \Delta G_{elec} + \Delta G_{np}$ (1.09)
The potential Φ(r), obtained from solving the PB equation (Eq. 1.08), allows for the
computation of the electrostatic component of the solvation free energy in Eq. 1.09:
$G_{elec} = \frac{1}{2} \sum_i q_i \Phi(\mathbf{r}_i) = \frac{1}{2} \sum_i q_i [\phi^{C}(\mathbf{r}_i) + \phi^{R}(\mathbf{r}_i)]$ (1.10)
where ϕC and ϕR are respectively the Coulomb and reaction field potentials. The
Coulombic component is calculated as a Coulomb summation over all charges other than
qi:
$\phi^{C}(\mathbf{r}_i) = \frac{1}{D_{in}} \sum_{j \neq i} \frac{q_j}{r_{ij}}$ (1.11)
The reaction field component ϕR of the electrostatic potential is derived from numerical
solutions to the PB equation, using either a finite difference method (FDM) or a boundary
element method (BEM) (Gilson et al., 1988; Honig and Nicholls, 1995; Purisima and
Nilar, 1995). Therefore, once a solution to the PB equation is calculated, the electrostatic
component of the solvation free energy is obtained.
The non-polar component is derived from surface area terms. It contains
contributions from cavity formation and solvent-solute dispersion-repulsion interactions.
These terms are often considered to be proportional to the molecular surface area (Floris
and Tomasi, 1989; Still et al., 1990; Gogonea and Merz, 1999). The general formula for
this term is therefore:
$\Delta G_{np} = \sum_i \tau_i A_i$ (1.12)
where Ai is the surface area of one solute atom and τi is a surface tension parameter
specific for that atom. Typically, the molecular surface will be defined as the solvent-excluded
surface area or the solvent-accessible surface area. The solvent-excluded
surface may however perform better than other surface models (Pitarch et al., 1996).

The following sections will take a closer look at the two most commonly employed
methods for solving the PB equation, namely the Finite Difference (FDM) and Boundary
Element (BEM) methods.
1.3.1.1 Finite Difference Method
While the FDM was first introduced by Warwicker and Watson, the work of
Honig’s group in the development of the Delphi program has widely popularized its use
(Warwicker and Watson, 1982; Gilson et al., 1988). In the FDM, a cubic lattice is first
superimposed onto the solute and surrounding solvent where values of the electrostatic
potential, charge density, dielectric constant and ionic strength are assigned to grid points
within the lattice (Fig. 1.05). As the positions of the atoms of the solute usually fall
between grid points, the allocated charge at each of the eight neighboring grid points is
assigned proportionally to the distance of the grid point to the charge. A finite difference
formula then calculates the derivatives of the PB equation.
Figure 1.05 Cubic grid scheme for the Finite Difference Method. (Taken from Folgaro et al., 2002)
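The charge allocation step of the FDM can be sketched as follows. The example uses trilinear weighting, one common way of sharing an off-grid charge among the eight surrounding grid points so that nearer points receive a larger share; whether a particular PB solver uses exactly this scheme is an assumption, and the function name and grid spacing are illustrative.

```python
import math

def spread_charge(position, charge, spacing=1.0):
    """Distribute a point charge over the eight grid points of its enclosing cell.

    Returns a dict {(ix, iy, iz): partial_charge}; the trilinear weights sum to one,
    so the total charge on the grid equals the original atomic charge.
    """
    base = [math.floor(c / spacing) for c in position]        # lower cell corner
    frac = [c / spacing - b for c, b in zip(position, base)]  # offsets in [0, 1)
    grid = {}
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((frac[0] if dx else 1.0 - frac[0])
                          * (frac[1] if dy else 1.0 - frac[1])
                          * (frac[2] if dz else 1.0 - frac[2]))
                grid[(base[0] + dx, base[1] + dy, base[2] + dz)] = charge * weight
    return grid

# A unit charge located a quarter of the way along x and halfway along y and z.
grid_charges = spread_charge((0.25, 0.5, 0.5), 1.0)
for point, partial in sorted(grid_charges.items()):
    print(point, round(partial, 4))
```

Once every atomic charge has been mapped onto the lattice in this way, the finite difference formula operates only on grid values, which is what makes the method independent of the detailed atomic positions.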
1.3.1.2 Boundary Element Method
The BEM is an alternative approach to FDM for solving the PB equation. In the
BEM, the effect of the solvent is represented by a charge density spread over the
molecular surface (Zauhar and Varnek, 1996). Instead of solving the PB equation directly,
the BEM considers the induced surface charge to develop an integral formulation of the
problem. This is expressed as:
$\Phi_{\sigma}(\mathbf{r}) = \int_S \frac{\sigma(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d^2 r'$ (1.13)
where $\Phi_{\sigma}$ is the electrostatic potential due to the surface charge distribution, σ(r) is the
surface charge density and the integral is taken over the entire molecular surface area
(Zauhar and Morgan, 1985; Purisima and Nilar, 1995). The SIE and SIETRAJ scoring
functions, used throughout this thesis for the computation of binding free energies, compute
the reaction field energy using the BRI-BEM program, which utilizes the BEM (Purisima
and Nilar, 1995; Purisima EO, 1998; Naïm et al, 2007; Cui et al., 2008).
1.3.1.3 Desolvation cost
Continuum solvation studies on the energetics involved in ligand binding have
been conclusive in noting the large, unfavorable effects of solvent screening on the
overall electrostatic change in free energy (Kuhn and Kollman, 2000; Wang et al., 2001;
Hou et al., 2002). Complex formation between a protein and ligand involves the breakage
and formation of several hydrogen bonds, which includes the reorganization of water
molecules around the ligand and target active site (Fig. 1.06). While the gas-phase
interaction between the ligand and protein is favorable, the desolvation of the binding
pocket involved in ligand binding results in an overall large energetic penalty. Hence,
ligand binding is suggested to be primarily driven by short-range (vdW) and long-range
hydrophobic forces (Hünenberger et al., 1999; Kuhn and Kollman, 2000; Wang et al.,
2001; Hou et al., 2002). This phenomenon can be better described by looking at the
electrostatic component of the binding free energy. The electrostatic change in binding
free energy is expressed in Eq. 1.14 as the sum of the change in reaction field and
Coulomb binding free energies:
$\Delta G^{elec}_{bind} = \Delta G^{R}_{bind} + E^{C}_{inter}$ (1.14)
where $\Delta G^{R}_{bind}$ is the change in reaction field energy and $E^{C}_{inter}$ is the change in
intermolecular Coulomb energy. Computational studies have noted that while the
intermolecular Coulomb energy favors binding, the desolvation effects are incompletely
compensated by ligand-target interactions in the bound state, resulting in an unfavorable
effect on the binding free energy (Hendsch and Tidor, 1999; Miyashita et al., 2003; Sims
et al., 2005).

Figure 1.06 Representation of desolvation effects during ligand-protein complex
formation. (Taken from Cozzini et al., 2004)
1.3.2 Scoring Functions
In their most general use, scoring functions are designed to predict binding modes
of inhibitors and provide an accurate discrimination between true and false binding
modes. Virtual screening pipelines have used scoring functions to improve enrichment
rates and improve lead compound identification (Grüneberg et al., 2002; Seihert MH,
2009). The master equation (Eq. 1.07) described above is used as an overall guide for the
development of scoring functions. The following is an overview of scoring functions with
respect to their ability to predict binding affinity and/or affinity ranking, though their
ability to predict binding modes is also of interest (Halperin et al., 2002). Scoring
functions may use all or some of the different terms expressed in Eq. 1.07. Due to the
computational demands in assessing a more rigorous representation of the master
equation, most scoring functions employ a more empirical expression, taking only the
most dominant contributions into account. This provides a fast and accurate means of
predicting binding modes and ranking potential lead compounds in a virtual screening
setting. The three categories of scoring functions that will be reviewed include physical-
chemical, knowledge-based, and empirical functions.
1.3.2.1 Physical-Chemical
The most prominent physical-chemical scoring function is the Molecular
Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) function (Kollman et al.,
2000). The overall format of the function can be summarized as follows:
$\Delta G_{bind} = \Delta E_{MM} + \Delta G^{complex}_{solv} - \Delta G^{protein}_{solv} - \Delta G^{ligand}_{solv} - T\Delta S$ (1.15)
where $\Delta E_{MM}$ is the sum of the Coulomb electrostatic and vdW interaction energies calculated
using MM force field packages such as AMBER and CHARMM (Case et al., 2005; Brooks et
al., 2009). The $T\Delta S$ term is usually evaluated using normal mode analysis of an MD
trajectory. All terms are ensemble averages over snapshots taken from an MD
trajectory. Therefore, the MM-PBSA energy is calculated from averages over a
finite number of snapshots from the ensemble and, as such, the quality of the results is
sensitive to the details of the MD simulation.
The Solvated-Interaction Energy (SIE) scoring function is another example of a
physics-based scoring function. It makes use of force-field parameters and equations to
make estimates of the binding affinity of molecules to a target. The format of the SIE
equation is as follows:
$\Delta G_{bind} = E^{C}_{inter} + \Delta G^{R}_{bind} + E^{vdW}_{inter} + \Delta G^{np}_{bind}$ (1.16)
For the electrostatic component of the free energy ($\Delta G^{elec}_{bind}$), the electrostatic
intermolecular interaction energy $E^{C}_{inter}$ is estimated using Coulomb’s law and $\Delta G^{R}_{bind}$ is
the change in reaction field solvation energy calculated using the BRI-BEM boundary
element method (Purisima and Nilar, 1995; Purisima, 1998). The change in solvation
energy is calculated as the difference between the bound and free states. For the nonpolar
components of the binding free energy, $\Delta G^{np}_{bind}$ is the change in the non-electrostatic
solvation energy and $E^{vdW}_{inter}$ is the vdW interaction energy calculated using the LJ 6-12
potential between the ligand and protein atoms. $\Delta G^{np}_{bind}$ is calculated as the difference
between the bound and free states of the solute-water VdW energy and cavitation cost in
water.
As described in Section 1.3.1, the cavitation cost (nonpolar component of the
solvation energy) is proportional to the surface area (Eq. 1.12). The definition of the
surface area can differ from function to function. In the case of SIE, the molecular
surface area is calculated as the solvent-excluded surface area (Naim et al., 2007).
$\Delta G^{cav}_{bind} = \gamma \cdot \Delta MSA$ (1.17)
However, the loss of intermolecular VdW interactions between solute and solvent is also
added to the cavitation cost. This is accomplished by linearly scaling the solute-solute
intermolecular VdW energy by a factor β, thereby accounting for the loss of solute-solvent
VdW interactions upon complex formation:
$\Delta G^{np}_{bind} = (\beta - 1) E^{vdW}_{inter} + \gamma \cdot \Delta MSA$ (1.18)
The complete parameterization of the SIE scoring function is dependent on a number of
variables that include the solute dielectric constant (Din), solute atomic radii {ri}, SA
scaling coefficient (γ), vdW interaction energy scaling coefficient (β) and fitting constant
(C) (Naim et al., 2007):
$\Delta G_{bind}(\{r_i\}, D_{in}, \beta, \gamma, C) = E^{C}_{inter}(D_{in}) + \Delta G^{R}_{bind}(\{r_i\}, D_{in}) + \beta \cdot E^{vdW}_{inter} + \gamma \cdot \Delta MSA(\{r_i\}) + C$ (1.19)
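How the individual SIE terms combine can be shown with a small sketch. It assumes the additive form described in this section: a Coulomb term, a reaction field term, a vdW term scaled by β, a surface area term scaled by γ and a fitting constant C. All numerical inputs below are hypothetical values chosen for easy arithmetic and are not the published SIE parameters.

```python
def sie_binding_free_energy(e_coulomb, dg_reaction_field, e_vdw, delta_msa,
                            beta, gamma, constant):
    """Combine precomputed energy terms into an SIE-style binding free energy."""
    # Nonpolar desolvation: loss of solute-solvent vdW plus the cavitation cost.
    nonpolar = (beta - 1.0) * e_vdw + gamma * delta_msa
    return e_coulomb + dg_reaction_field + e_vdw + nonpolar + constant

# Hypothetical term values (kcal/mol, and A^2 for the change in surface area).
dg = sie_binding_free_energy(
    e_coulomb=-10.0,        # favorable intermolecular Coulomb energy
    dg_reaction_field=8.0,  # unfavorable electrostatic desolvation
    e_vdw=-40.0,            # intermolecular vdW energy
    delta_msa=-500.0,       # surface area buried upon binding
    beta=0.1, gamma=0.01, constant=-3.0)
print(f"dG_bind = {dg:.2f} kcal/mol")
```

Written this way, the unscaled vdW energy plus the nonpolar term collapses to β·E_vdW + γ·ΔMSA, which makes explicit that the β scaling is only a compact way of accounting for the lost solute-solvent vdW contacts.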
One general issue with empirically parameterized scoring functions is tied to their
training set: the function may be biased towards the targets explored as part of it, since
the set on which it is parameterized fixes the composition and diversity of the complexes
it has been fitted to (Gohlke and Klebe, 2002; Ferrera et al., 2004).

1.3.2.2 Empirical (Regression) Scoring Functions
Empirical scoring functions weight contributions from the different energetic
terms in order to make a binding affinity prediction. These terms may include hydrogen
bonding described by geometric measures as well as FF-based physical potentials. The
linear weighting of the terms is derived from regression methods that fit the computed
terms to experimental affinities using experimental data and structural information. The
regression analysis optimizes the weights to provide a maximal correlation between
computed and experimental binding affinities in the training set (Bohm et al., 1994;
Verkhivker et al., 1995; Head et al., 1996; Naim et al., 2007).
1.3.2.3 Knowledge-based Scoring Functions
Knowledge-based (KB) scoring functions use statistical potentials that are derived
from databases of protein-ligand complexes such as the PDB (Koppensteiner and Sippl,
1998; Muegge et al., 2000; Gohlke and Klebe, 2001). The use of KB potentials for the
scoring of protein-ligand complexes was inspired by the success of such potentials in
predicting protein folding and structure (Sippl, 1990; Sippl, 1993; Sippl et al., 1996). In
KB functions, occurrences of interacting pairs of atoms in a training set of complexes are
used to derive statistical potentials that resemble, but are not, potentials of mean force
(Ben-Naim, 1997). In doing so, certain assumptions are made. The first is that the
protein-ligand complex structures are in a state of thermodynamic
equilibrium, while the second is that the distributions of atoms in the complexes obey
Boltzmann’s law (Sippl et al., 1993; Mullinax and Noid, 2010).
KB potentials are built by first calculating a distance-dependent probability
distribution of atom pairs. The Helmholtz free energy is then calculated per atom pair in
the protein-ligand complex:
$A_{ij}(r) = -k_B T \ln \frac{\rho_{ij}(r)}{\rho^{bulk}_{ij}}$ (1.20)
where ρij(r) is the pair correlation function for an atom pair of type ij at distance r, while
$\rho^{bulk}_{ij}$ is a normalization factor representing the bulk density for the atom pair when the
atoms are not interacting. A few notable examples of KB scoring functions
include the piecewise linear potential (PLP), PMFScore and DrugScore (Verkhivker et
al., 1995; Muegge and Martin, 1999; Gohlke et al., 2000).
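The inverse Boltzmann step used to build such a potential can be sketched in a few lines. The example assumes the standard form A(r) = -kT ln[ρ(r)/ρ_bulk] applied bin by bin to a distance histogram; the density values are made up for illustration, and the handling of empty bins is a simplification that published KB functions treat in various ways.

```python
import math

K_B = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol*K)

def knowledge_based_potential(pair_density, bulk_density, temperature=298.0):
    """Return A(r) = -kT ln(rho(r) / rho_bulk) for each distance bin.

    Bins with no observations carry no statistics and are returned as None.
    """
    kt = K_B * temperature
    return [(-kt * math.log(rho / bulk_density)) if rho > 0 else None
            for rho in pair_density]

# Hypothetical pair densities for one atom-pair type in successive distance bins.
densities = [0.0, 0.4, 2.0, 1.0]
for rho, a in zip(densities, knowledge_based_potential(densities, bulk_density=1.0)):
    print(rho, None if a is None else round(a, 3))
```

Distances observed more often than in the bulk reference receive a negative (favorable) score and under-represented distances a positive one, which is how structural statistics are turned into an energy-like quantity.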
1.3.2.4 Problems
The major shortcoming of most scoring techniques like SIE is that they only
consider a single receptor-compound interaction in estimating binding free energies of
what is in nature a dynamic process resulting from an ensemble of such complexes. The
use of ensembles as part of a Virtual Screening pipeline is explored in Chapter 5 of this
thesis. Nevertheless, despite phenomenal advances in computational power and
technologies, the accurate estimation of binding free energies remains challenging.
1.4 Predicting Binding Modes – Docking
Predicting binding modes of ligands to a target protein structure, also known as
docking, has been a key component of in silico techniques used in structure-based,
rational drug design (Kuntz I, 1992; Cavasotto and Orry, 2007). Docking schemes
attempt to find the optimal matching between a ligand and a targeted protein. In essence,
the problem can be reduced to the following: given the atomic coordinates of these two
molecules, predict the proper conformation of the complex. One assumption that is
usually made in the docking problem is prior knowledge of the binding site targeted by
the ligand.
Docking schemes are typically validated by their ability to reproduce
experimental data through docking studies in which protein-ligand complex conformations
are obtained in silico and compared to structures obtained by experimental means (i.e.
X-ray crystallography or nuclear magnetic resonance). Since predicting the correct bound
conformation of both the protein and ligand is a challenging and computationally
expensive task, the problem is usually reduced to the following: given the proper “bound”
conformation of the protein, predict the proper bound conformation of the ligand and
complex. This problem is the focus of the large majority of docking algorithms, though a
few also incorporate sampling of receptor conformations to optimize the predicted
complex coordinates.
The main purposes of docking algorithms can be divided into two groups, though
the two are not mutually exclusive. The first emphasizes speed and
accuracy, where the main goal is the rapid screening of millions of potential candidate
molecules for the discovery of a few active compounds in virtual screening (see Section
1.6). The second emphasizes accuracy of the complex structure, attempting to narrow the
gap between the predicted complex and the experimental structure.
Docking programs search through a large selection of possible fits between a ligand and
the targeted binding pocket and assess the best fit between them by taking into account
several parameters. These parameters are akin to those used in scoring functions, and the
function that combines them is in essence a scoring function. In this case however, the
scoring scheme is optimized to retrieve the binding mode that is closest to the
experimental structure as measured by the root-mean-square deviation (RMSD).
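The RMSD comparison mentioned above can be written as a short function. The sketch assumes that the docked and experimental ligand coordinates are listed in the same atom order and are already expressed in the same receptor frame, so no superposition is performed; the coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom ligand: experimental pose and a docked pose shifted by 1 A in x.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]
print(f"RMSD = {rmsd(crystal_pose, docked_pose):.2f} A")
```

Symmetry-equivalent atoms and hydrogen atoms are ignored in this sketch; a practical implementation would typically restrict the comparison to heavy atoms and account for symmetric groups.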
1.4.1 Docking Algorithms
1.4.1.1 Fast Shape Matching
Shape matching (SM) algorithms primarily take into account the overall geometrical
overlap between the protein and ligand molecules. SM methods employ a
variety of algorithms in order to assess the conformations of the ligand and binding
site to be matched.
Rigid-body docking applications are mainly SM-based. Examples include
ZDOCK as well as our own internally developed docking program (Chen et al., 2003;
unpublished). Flexible docking algorithms can also use SM methods as part of their
strategy. For instance, DOCK combines incremental construction and a sphere matching
algorithm in order to identify an optimal geometrical alignment (Kuntz et al., 1982).
The development of methods that analytically calculate the solvent-accessible molecular
surface was a key contribution that allowed the development of these SM applications
(Connolly et al., 1983a; Connolly et al., 1983b).
1.4.1.2 Incremental construction
Incremental construction (IC) methods divide the ligand into fragments which are
separately docked onto the surface of the binding site. The fragments identified as rigid
“anchor” regions of the ligand are typically docked first, and the fragments identified as
flexible regions are added sequentially with a systematic scanning of the torsion angles
around the anchors. Following the docking, the fragments are fused together to obtain an
optimal orientation of the molecule. This fragmentation of the molecule is
a means of incorporating ligand flexibility into docking.
The first IC algorithm was part of the DOCK program (Desjarlais et al., 1986). In
DOCK, the rigid fragments were first docked independently, and combinations of the
docked fragments were then joined as in the original compound if the atoms were within
certain distances of each other. Other methods utilizing IC include FlexX, FLOG,
Hammerhead and Surflex (Miller et al., 1994; Rarey et al., 1996; Welch et al., 1996;
Kramer et al., 1999; Ewing et al., 2001; Jain AN, 2007).
1.4.1.3 Monte Carlo Simulations
The Monte Carlo (MC) Method was developed in its present form by Metropolis,
Ulam and von Neumann during their work on the Manhattan project (Metropolis and Ulam,
1949; Metropolis et al., 1953). Historically, it was used to perform the first computer
simulation of a molecular system. MC simulations were later integrated as a means of
adding flexibility to docking algorithms (Liu and Wang, 1999). In docking
algorithms, MC simulations attempt to position the ligand within the binding site through
a number of random translational and rotational changes. The advantage of the added
randomness in the sampling is a decreased likelihood of being trapped in local minima.
The standard (Metropolis) MC methods generate configurations of a system
through random Cartesian changes. Each change to the system is evaluated and then
rejected or accepted based on a Boltzmann probability. One example of an MC-based
docking application is the Internal Coordinates Mechanics (ICM) program (Abagyan et
al., 1994). ICM initially makes a random move of one of three types: a rigid body ligand
move, a torsion move of the ligand or a torsion move of a receptor side chain (Abagyan
and Totrov, 1994; Abagyan et al., 1994). The side chain movement samples the
conformational space defined a priori through a side-chain rotamer library (Ponder and
Richards, 1987). This side chain sampling allows the algorithm to explore with larger
probability the conformational space which is known to be highly populated. Following
each sampling step, a conjugate gradient local minimization is performed with a modified
ECEPP/3 energy function, and the conformation is then accepted or rejected
using the Boltzmann criterion.
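The acceptance test at the heart of a Metropolis MC step can be sketched as follows. This is a generic illustration of the Boltzmann criterion, not the ICM implementation; the function name, the value of kT and the seeded random generator are assumptions made so that the example is reproducible.

```python
import math
import random

def metropolis_accept(delta_e, kt, rng=random):
    """Metropolis criterion: always accept downhill moves, and accept uphill
    moves with probability exp(-delta_e / kt)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)

# Estimate the acceptance rate for an uphill move of kT*ln(2), expected to be near 0.5.
rng = random.Random(0)
kt = 0.6  # kcal/mol, roughly RT at room temperature
trials = 20000
accepted = sum(metropolis_accept(kt * math.log(2.0), kt, rng) for _ in range(trials))
print(f"fraction accepted: {accepted / trials:.3f}")
```

Occasionally accepting moves that raise the energy is what allows the search to climb out of local minima, at the price of spending some steps on unfavorable poses.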
1.4.1.4 Evolutionary Programming
Evolutionary programming (EP) algorithms are computational models that take
their name and concept from biological processes. EP algorithms generally start with
a population of structures characterized by a given set of genes. Parent structures are then
allowed to produce child structures containing a mixture of the structural characteristics of
the parents (as defined by the parents’ genes), during which mutations are allowed to
occur. The individuals of the population displaying the most favorable features are kept
while others are discarded, as per Darwin’s principle of natural selection.
Genetic algorithms (GA) are one example of EP algorithms. In GA, a population
of chromosomes (parents) is used to create new chromosomes (offspring). Crossovers
are used to generate the new chromosomes and a complex set of scoring functions is
then used to select members within each round of selection. DOCK and GOLD are two of
the most notable docking programs utilizing variations on GAs (Ewing et al., 2001;
Verdonk et al., 2003). While EP algorithms can find one of the best solutions to the
docking problem, they, like all heuristic algorithms, can also be trapped in local minima.
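The selection, crossover and mutation cycle described above can be sketched with a toy genetic algorithm. The score function is a stand-in for a docking score (lower is better) with a known minimum, and the population size, mutation rate and one-point crossover are assumptions chosen for brevity; real docking programs encode ligand position, orientation and torsions in the chromosome and use far more elaborate operators.

```python
import random

def toy_score(chromosome):
    """Stand-in for a docking score (lower is better); minimum when all genes are 0."""
    return sum(gene * gene for gene in chromosome)

def evolve(pop_size=30, n_genes=6, generations=40, mutation_rate=0.2, seed=0):
    """Run a minimal GA and return (best initial score, best final score)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    initial_best = min(toy_score(c) for c in population)
    for _ in range(generations):
        population.sort(key=toy_score)
        parents = population[: pop_size // 2]      # selection: keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)        # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else g
                     for g in child]               # random mutation of some genes
            children.append(child)
        population = parents + children
    return initial_best, min(toy_score(c) for c in population)

initial, final = evolve()
print(f"best score: {initial:.3f} -> {final:.3f}")
```

Because the fittest half is carried over unchanged, the best score never worsens from one generation to the next, but nothing guarantees that the population will not converge on a local rather than the global minimum.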

1.5 Molecular Dynamics
Molecular recognition between a protein and its ligand is a dynamic and complex
process. An accurate computational representation of this interaction is a problem of
considerable complexity and interest in CADD. Few techniques address this process and
account for the conformational flexibility of both the ligand and receptor. Even fewer do
so in an accurate and efficient manner. Protein flexibility is a multi-factorial, complex
problem owing to the inter- and intra- molecular interactions involved in the
conformational dynamics. Of all methods commonly used, Molecular Dynamics (MD)
simulations provide the most complete computational representation of the dynamics
involved in this process.
1.5.1 Newton’s Laws
Molecular dynamics methods solve Newton’s equations of motion for atoms on
an energy surface. Newton’s laws of motion provide the means of generating successive
conformations of the system. The result of these successive conformations is a trajectory
that indicates how the positions and velocities of particles within the system vary with
time. Newton’s laws of motion can be summarized as follows:
First law: Every body remains in a state of constant velocity unless acted upon by an
external unbalanced force. Hence, if the resultant force is zero, then the velocity of the
object is constant (Eq. 1.21):
I. $\sum \mathbf{F} = 0 \implies \frac{d\mathbf{v}}{dt} = 0$ (1.21)
Second law: A body of mass m subject to a net force F undergoes an acceleration a that
has the same direction as the force and a magnitude that is directly proportional to the
force and inversely proportional to the mass:
II. $\mathbf{F} = m \frac{d\mathbf{v}}{dt} = m\mathbf{a}$ (1.22)
Third law: The mutual forces of action and reaction between two bodies are equal,
opposite and collinear, i.e. whenever a first body exerts a force F on a second body, the
second body exerts a force –F on the first:
III. $\mathbf{F}_{1,2} = -\mathbf{F}_{2,1}$ (1.23)
In order to obtain an accurate trajectory, the differential equation embodied by Newton’s
second law of motion is solved:
$\frac{d^2 x_i}{dt^2} = \frac{F_{x_i}}{m_i}$ (1.24)
which describes the motion of a particle of mass mi along one coordinate xi with a net force
$F_{x_i}$ along that direction.
1.5.2 Ensembles
MD simulations are characterized with regard to the macroscopic conditions that
are held constant. Statistical mechanics requires that certain macroscopic conditions
be held constant in order to study the collection of all microstates of a system, its
ensemble. Therefore, ensembles can be characterized by different quantities that include:
volume (V), pressure (P), total energy (E), temperature (T) and number of particles (N).
Ensembles are accordingly named and labeled with respect to the fixed quantities: NVT
(canonical), NVE (micro-canonical) and NPT (isothermal-isobaric).
1.5.3 Verlet Algorithm
As discussed above (see Section 1.2.1), nuclei behave to a good approximation as
classical particles. The dynamics of motion can therefore be obtained by solving
Newton’s second equation:
$\mathbf{F} = -\frac{dV}{d\mathbf{r}} = m \frac{d^2 \mathbf{r}}{dt^2}$ (1.25)
Here, V is the potential energy at position r, and r is a vector of length 3N
containing the Cartesian coordinates of all particles. Given a set of particle positions
ri at the current time-step, the positions a small time-step Δt later can be calculated using
a Taylor expansion (Eq. 1.26, 1.27):
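$\mathbf{r}_{i+1} = \mathbf{r}_i + \frac{\partial \mathbf{r}}{\partial t}(\Delta t) + \frac{1}{2} \frac{\partial^2 \mathbf{r}}{\partial t^2}(\Delta t)^2 + \frac{1}{6} \frac{\partial^3 \mathbf{r}}{\partial t^3}(\Delta t)^3 + \cdots$ (1.26)
$\mathbf{r}_{i+1} = \mathbf{r}_i + \mathbf{v}_i(\Delta t) + \frac{1}{2} \mathbf{a}_i(\Delta t)^2 + \frac{1}{6} \mathbf{b}_i(\Delta t)^3 + \cdots$ (1.27)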
where the velocities vi, the accelerations ai and the hyper-accelerations bi are the first,
second and third derivatives of the positions with respect to time. Substituting Δt with -Δt
we obtain the positions ri-1:
$\mathbf{r}_{i-1} = \mathbf{r}_i - \mathbf{v}_i(\Delta t) + \frac{1}{2} \mathbf{a}_i(\Delta t)^2 - \frac{1}{6} \mathbf{b}_i(\Delta t)^3 + \cdots$ (1.28)
By adding the equations for ri+1 and ri-1 we are able to calculate the positions at Δt later from
the current acceleration, and the previous and current positions:
$\mathbf{r}_{i+1} = (2\mathbf{r}_i - \mathbf{r}_{i-1}) + \mathbf{a}_i(\Delta t)^2$ (1.29)
where the current acceleration can be obtained from the force or the derivative of the
potential:
$\mathbf{a}_i = \frac{\mathbf{F}_i}{m_i} = -\frac{1}{m_i} \frac{dV}{d\mathbf{r}_i}$ (1.30)
As the acceleration is re-evaluated at each time step from the forces, the positions are
changed at each time-step, which then creates the resulting trajectory. This, in essence, is
the Verlet algorithm (Verlet, 1967). Certain disadvantages of the Verlet algorithm have
given rise to the use of alternative algorithms for MD simulations. The first disadvantage
is the tendency towards truncation errors due to finite numerical precision. This is a
consequence of adding $\mathbf{a}_i(\Delta t)^{2}$ (a small number) to $2\mathbf{r}_i - \mathbf{r}_{i-1}$ (a difference of large
numbers) when calculating the new positions. The
second is that velocities are not an explicit part of the Verlet algorithm, which creates a
problem in generating constant-temperature ensembles (Cuendet and van Gunsteren,
2007). The velocity Verlet algorithm is a variation that addresses these problems (Martys
and Mountain, 1999).
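The position update of Eq. 1.29 can be illustrated with a minimal Python sketch. The one-dimensional harmonic oscillator, the function name `verlet_trajectory` and the choice to bootstrap the first step from Eq. 1.27 truncated after the acceleration term are assumptions made for illustration; the sketch implements the basic Verlet scheme only, not the velocity Verlet variant, and is not taken from any MD package.

```python
import math

def verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Propagate one particle with the position Verlet scheme (Eq. 1.29)."""
    # Bootstrap r_1 from the Taylor expansion (Eq. 1.27) truncated at second order,
    # since Eq. 1.29 needs both the current and the previous position.
    positions = [x0, x0 + v0 * dt + 0.5 * accel(x0) * dt ** 2]
    for _ in range(n_steps - 1):
        r_prev, r_curr = positions[-2], positions[-1]
        # r_{i+1} = 2 r_i - r_{i-1} + a_i (dt)^2
        positions.append(2.0 * r_curr - r_prev + accel(r_curr) * dt ** 2)
    return positions

# Harmonic oscillator with k = m = 1, so a(x) = -x and the exact solution is cos(t).
dt = 0.01
traj = verlet_trajectory(x0=1.0, v0=0.0, accel=lambda x: -x, dt=dt, n_steps=1000)
max_error = max(abs(x - math.cos(i * dt)) for i, x in enumerate(traj))
```

Note that the loop never touches a velocity: the trajectory is built from positions and accelerations alone, which is exactly why velocity-dependent quantities such as the temperature need extra work in this scheme.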
1.5.4 Considerations
MD simulations require small time-steps and are time-intensive with regard to the
calculation of phenomena such as bond stretching and angle-bending motions. The size
of the chosen time-step is a critical element affecting the accuracy of the trajectory with
smaller time-steps providing a better approximation of the expected dynamics of the
system. This however also increases the computational costs, as more steps are required
for propagating the system for a given total time. Generally, the longest time-step that can
be taken is limited by the rate of the fastest process being sampled in the system.
Typically, the time-step must be one order of magnitude smaller than the period of the
fastest process. In MD simulations, molecular rotations and vibrations occur with
frequencies in the range of $10^{11}$ to $10^{14}$ s$^{-1}$. Therefore, time-steps on the order of $10^{-15}$ s or less are
required for sampling of these molecular motions. A consequence of this limitation is that
an MD simulation of 1 nanosecond (ns) requires ~$10^{6}$ time-steps ($10^{-9}$ s / $10^{-15}$ s), each
involving a full evaluation of the forces. Since simulations typically span nanoseconds or
longer, the required number of force evaluations presents a
significant computational demand. Additionally, biological phenomena such as protein
folding typically occur on even longer timescales of microseconds or more.
One solution to this problem involves the freezing of the fastest molecular
motions. This allows for significantly longer time-steps to be used while affecting the
overall accuracy minimally. This is made possible because the fastest processes, the
stretching vibrations, have a minimal impact on the properties of the trajectory. This is
especially true for bonds involving hydrogen atoms. Therefore, freezing of bond lengths
involving hydrogen atoms results in longer simulation times for a given number of
calculated time-steps. The SHAKE and RATTLE algorithms provide the constraints
necessary to maintain bonds involving hydrogen atoms fixed during the simulation and
typically allow time-steps to be increased two to three fold (Ryckaert et al., 1977;
Andersen, 1983).
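The idea behind such constraints can be sketched for a single bond. The function below applies SHAKE-style iterative corrections to one pair of atoms after an unconstrained position update. The function name, the tolerance and the example coordinates are illustrative assumptions, and production codes handle many coupled constraints at once.

```python
import math

def shake_bond(r1_old, r2_old, r1_new, r2_new, m1, m2, d0, tol=1e-10, max_iter=100):
    """Restore |r1 - r2| = d0 after an unconstrained step (single-bond SHAKE sketch)."""
    r1, r2 = list(r1_new), list(r2_new)
    # The correction acts along the bond vector from before the step.
    bond_old = [a - b for a, b in zip(r1_old, r2_old)]
    inv_mass_sum = 1.0 / m1 + 1.0 / m2
    for _ in range(max_iter):
        bond = [a - b for a, b in zip(r1, r2)]
        diff = sum(c * c for c in bond) - d0 ** 2
        if abs(diff) < tol:
            break
        dot = sum(a * b for a, b in zip(bond, bond_old))
        g = diff / (2.0 * inv_mass_sum * dot)
        # Displace each atom in inverse proportion to its mass.
        r1 = [a - g * c / m1 for a, c in zip(r1, bond_old)]
        r2 = [a + g * c / m2 for a, c in zip(r2, bond_old)]
    return r1, r2

# Illustrative C-H-like pair (masses in amu, lengths in angstroms); values are made up.
c_old, h_old = [0.0, 0.0, 0.0], [1.09, 0.0, 0.0]
c_new, h_new = [0.01, 0.0, 0.0], [1.20, 0.05, 0.0]
c_fix, h_fix = shake_bond(c_old, h_old, c_new, h_new, m1=12.0, m2=1.0, d0=1.09)
bond_length = math.dist(c_fix, h_fix)
```

Because the two displacements are equal and opposite when weighted by mass, the correction leaves the centre of mass of the pair unchanged; the light hydrogen atom absorbs most of the adjustment.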
Lastly, overcoming energy barriers can be a challenging task given that any
motion of a conformational ensemble outside of its minimum in the potential energy
surface will generate a force pulling the system back towards its minimum. A number of
novel algorithms such as Replica-Exchange and MetaDynamics attempt to overcome this
limitation using different means (Sugita and Okamoto, 1999; Laio and Parrinello, 2002).
1.5.5 Boundary Conditions
MD simulations of a solvated system usually involve several hundred or thousand
molecules of solvent. However, in order for macroscopic properties to be realistically
calculated from a limited number of solvent molecules, boundary effects require special
considerations. When considering that a water-filled 1 L cube contains $3.3 \times 10^{25}$
molecules of water at room temperature, $2 \times 10^{19}$ of which will be interacting with the
cube’s boundary, it is easy to see why using a computationally tractable number of
molecules will be insufficient for deriving bulk properties. In a system containing a few
thousand water molecules, most would be under the influence of interactions with the
boundary.
Periodic boundary conditions basically replicate the bulk properties of a fluid
given a limited number of solvent molecules. The system is usually prepared within the
confines of a box having a cubic or other polyhedral geometry (Bekker, 1997). The box is
then replicated in all directions (Fig. 1.07). If a solvent molecule leaves the box during
the simulation it is replaced by an image particle entering the box from the opposite side
(Fig. 1.07). A constant number of solvent molecules within the box is therefore
maintained. This configuration allows for particles within the system to experience forces
as if they were within bulk fluid.
Figure 1.07 Periodic boundary conditions in molecular dynamics simulations. (box
reproduced from www.phy.cmich.edu/people/petkov/isaacs/phys/pbc.html)
1.5.6 Long-Range Electrostatic Calculations: The Ewald Summation Method
The interaction energy between two point charges decays at a rate proportional to
$r^{-1}$. This creates a computational problem in considering the long-range electrostatic
contributions from atoms located outside of the central box. One solution is using
spherical cutoffs, which essentially eliminate electrostatic contributions beyond the
cutoff. However, cutoffs have been documented to result in severe artifacts in MD
simulations of peptides and nucleic acids (Schreiber and Steinhauser, 1992; York et al.,
1995).
Ewald summation methods allow the potential due to the partial charges of a
system and all of their periodic images to be considered. In Ewald summation, the
position of each image box is related to the central box through a vector. Each vector is
therefore an integral multiple of the length of the box. Generally, the contribution of
charge-charge interactions within the central box to the potential energy can be written
as:
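$$V = \frac{1}{2}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j \neq i}}^{N}\frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \qquad (1.31)$$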
where rij is the distance between charge i and j.
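As a rough illustration of the two ideas above, the sketch below wraps a particle that has left a cubic box back in through the opposite face and evaluates the central-box Coulomb sum of Eq. 1.31 for a set of point charges. The function names, the cubic box and the reduced units (4πε0 = 1) are assumptions chosen for brevity; the image-box terms that Ewald summation adds are not included.

```python
import math

def wrap_into_box(position, box_length):
    """Map a coordinate back into [0, box_length) along each axis (cubic box)."""
    return [x % box_length for x in position]

def coulomb_central_box(charges, positions, four_pi_eps0=1.0):
    """Eq. 1.31: V = 1/2 * sum_i sum_{j != i} q_i q_j / (4 pi eps0 r_ij)."""
    energy = 0.0
    for i, (qi, ri) in enumerate(zip(charges, positions)):
        for j, (qj, rj) in enumerate(zip(charges, positions)):
            if i != j:
                energy += qi * qj / (four_pi_eps0 * math.dist(ri, rj))
    return 0.5 * energy  # the factor 1/2 removes the double counting of each pair

# A particle that drifted out through the +x face re-enters through the -x face.
wrapped = wrap_into_box([10.5, 3.0, -0.25], box_length=10.0)

# Two opposite unit charges separated by 2 length units (reduced units).
energy = coulomb_central_box([1.0, -1.0], [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
```

The double loop scales with the square of the number of charges, and extending it naively over image boxes converges slowly, which is the practical motivation for Ewald-type methods.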
1.6 Virtual Screening
As economic pressures mount to deliver target-optimized drugs
at an accelerated pace and minimal cost, computational methods have become an
increasingly important tool in drug discovery efforts. While numerous challenges
persist in the accurate in silico prediction of ligand-target interactions,
computational methods have already proved themselves in the successful development of
numerous pharmaceuticals (see Section 1.7). Of note is the role of virtual
screening (VS) in lead discovery efforts. VS provides the ability to analyze large
compound databases and predict which compounds are most likely to interact
with the desired target and thus become promising lead candidates. These candidates can then
be tested, and successful molecules can go through rounds of optimization. VS
therefore circumvents the expense incurred through large-scale screening efforts and
narrows the search to a few high-potential candidates (Oprea and Matter, 2004).
1.6.1 Virtual Screening Pipeline
A VS pipeline is designed to favor efficiency and speed at the initial stages and
accuracy at the later stages, so that computational resources are spent where they
contribute most to the overall performance of the pipeline. Earlier stages of the
filtering process minimize the use of computational resources by using soft scoring
functions. More
extensive calculations and sampling methods are reserved for the later stages of the
pipeline, where careful selection of the candidates with the highest potential is required
(Fig. 1.08).
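The staged design can be sketched as a simple funnel. In the snippet below, `fast` and `accurate` are hypothetical stand-ins for a soft docking score and a more expensive rescoring method, and the compound "library" is a list of made-up integers; only the control flow is meant to be illustrative.

```python
def run_funnel(compounds, fast_score, accurate_score, keep_fast, keep_final):
    """Two-stage filter: cheap scoring on everything, costly scoring on survivors."""
    # Stage 1: rank the whole library with the inexpensive function (lower = better).
    shortlist = sorted(compounds, key=fast_score)[:keep_fast]
    # Stage 2: spend the expensive calculation only on the shortlist.
    return sorted(shortlist, key=accurate_score)[:keep_final]

# Toy library: each "compound" is just an integer identifier.
library = list(range(1000))
fast = lambda c: abs(c - 500)                  # hypothetical coarse score
accurate = lambda c: abs(c - 500) + (c % 7)    # hypothetical refined score
hits = run_funnel(library, fast, accurate, keep_fast=50, keep_final=5)
```

In this toy setting the expensive function is evaluated for 50 compounds instead of 1000, which is the trade-off the pipeline is built around.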

Figure 1.08 The Virtual Screening Pipeline.
1.6.2 The Target
Target selection is the first step in any structure-based drug discovery project.
Several requirements must be met. The first involves the target’s druggability (Hajduk et
al., 2005). The second involves the availability and choice of the 3D structure used for
the screening. X-ray crystallography or NMR structures are the preferred choices, though
VS projects have been successfully run on homology models as well (Evers and
Klabunde, 2005). Since the majority of VS software gives limited consideration to
target flexibility, the chosen structure should be one in which
the conformation of the binding site is akin to that expected when bound to a small
molecule (Sousa et al., 2006).
Following the careful selection of target and structure, preparation of the target
structure is the next important task. The primary consideration
is the assignment of proper protonation states to active-site residues. Difficulties arise
due to the effects of local electrostatic conditions on the pKa values of side-chain
functional groups. With respect to the success of VS, proper assignment of side-chain
protonation states is crucial in providing an accurate representation of the binding-site
characteristics. A few alternatives exist which integrate the electrostatic effects in
assessing the protonation states of side-chain functional groups. One example is the H++
server which predicts the protonation states of amino-acid side chain functional groups
within the continuum electrostatic framework (Gordon et al., 2005).

1.6.3 The Compound Database
A database should first provide optimal structural diversity so as to maximize
chances of finding numerous scaffolds displaying activity against the target. Generally,
compounds should also adhere to Lipinski’s rule of five (Lipinski et al., 2001).
Several small-molecule databases exist which are routinely used for VS. These include
the ZINC library, the National Cancer Institute compound database, and the Accelrys
Available Chemical Directory and MDDR libraries (Milne et al., 1994; Irwin and
Shoichet, 2005). Most major pharmaceutical companies also have in-house corporate
libraries.
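A rule-of-five filter is straightforward to express in code. The sketch below assumes the four properties have already been computed by some cheminformatics tool; the `Compound` class, the property values and the convention of tolerating at most one violation are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float      # molecular weight in Da
    logp: float            # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(c):
    """Count violations of the rule of five for one compound."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])

def passes_rule_of_five(c, max_violations=1):
    """Keep compounds with no more than `max_violations` violations (assumed cutoff)."""
    return lipinski_violations(c) <= max_violations

# Hypothetical property values, for illustration only.
library = [
    Compound("small_polar", 320.4, 2.1, 2, 5),
    Compound("large_greasy", 812.9, 6.3, 1, 9),
]
kept = [c.name for c in library if passes_rule_of_five(c)]
```

Applied before docking, a filter of this kind costs almost nothing per compound and removes molecules that would be poor starting points regardless of their docking score.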
1.6.4 The Docking Protocol
The docking protocol is at the core of every VS pipeline. Docking algorithms
attempt to predict the structure of the protein-ligand complex as a first, preliminary filter.
The docking must therefore be fast, as an extremely large number of compounds must be
evaluated. While the docking pipeline may not provide absolute accuracy with regards to
selecting all true-positive compounds, it must be robust enough not to discard moderate
to strong binders as false-negatives across a variety of targets. This preliminary docking
step is typically composed of a docking algorithm (see Section 1.4.1) and a scoring
function. The scoring function used at this step is usually optimized for speed rather than
accuracy, and other more extensive and accurate functions are usually used at later stages
of the pipeline, where a more discriminating assessment of the binding potential of a
smaller number of compounds is required.
1.6.5 MD Simulations
MD simulations are used as a final refinement of the most promising candidate
molecules before selection is done. As such, MD simulations on the order of a few
hundred picoseconds to a few nanoseconds are run on the predicted ligand-protein
complex. The goal of MD simulations in this setting is to establish the dynamic
stability of the complex. This is achieved by careful observation of the interactions
between the ligand and target, supported by analysis of the stability of the protein
structure and binding mode. Scoring functions such as SIETRAJ and MM-PBSA can also
be applied to the MD trajectories to obtain a better assessment of the potential binding
affinity of the compounds (Kollman et al., 2000; Cui et al., 2008).
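One simple way to quantify the stability of a binding mode is to track the ligand's root-mean-square deviation (RMSD) from its starting pose along the trajectory. The sketch below assumes the frames have already been superimposed on the protein and uses made-up coordinates; the function name and the 2 Å threshold are illustrative choices rather than values from this thesis.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of atoms."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Three toy "frames" of a two-atom ligand (angstroms).
frames = [
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]],
    [[0.3, 0.0, 0.0], [1.5, 0.4, 0.0]],
    [[3.0, 0.0, 0.0], [4.5, 4.0, 0.0]],
]
profile = [rmsd(frames[0], f) for f in frames]
stable = [value < 2.0 for value in profile]
```

A pose whose RMSD stays low over the simulation is a candidate for rescoring; one that drifts away from the docked position, as in the last toy frame, is a candidate for rejection.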
1.6.6 Conformational Ensembles
The VS pipeline described typically considers the target as a rigid entity during
the docking process. Since the conformational flexibility of the target is seldom fully
considered during such a process, methods that integrate target flexibility through the use
of conformational ensembles have proved successful (Bursavich et al., 2002; Osterberg et
al., 2002; Barril and Morley, 2005; Amaro et al., 2008). Theoretically, conformational
ensembles allow a fully dynamic representation of the target to be presented to the ligand
for fit. This is akin to what is thought to occur in solution, where a ligand binds to a
pre-existing receptor population. The ligand is then exposed to the conformational ensemble
of the receptor and may preferentially bind to conformations that occur infrequently in
the receptor’s dynamics (Ma et al., 2002; Wong and McCammon, 2003). The result is a
shift in the equilibrium population towards that of the preferentially bound conformation
(Ma et al., 2002). In this view, the “lock and key” model of ligand binding represents
binding to one of the rare conformations within this ensemble, and
conformational selection is a driving force in ligand recognition.
One of the most prominent examples of using conformational ensembles
generated from MD simulations for VS has been implemented as part of the Relaxed
Complex Scheme (RCS) (Lin et al., 2002; Lin et al., 2003; Amaro et al., 2008). The RCS
combines the advantages of docking with the dynamic conformational sampling that is
provided by MD simulations. Through this use of MD simulations, the RCS integrates
extensive conformational sampling of the target structure into the VS pipeline. At the
core of the RCS is an all-atom MD simulation of the target where the simulation time
varies from a few ns to tens of ns (Schames et al., 2004; Amaro et al., 2008; Cheng et al.,
2008). With few exceptions, the AutoDock docking program is typically used to carry out
docking and scoring functions (Morris et al., 2009). Since significant conformational
changes to the active site are induced by ligand binding, a ligand-bound structure is
usually preferred. The resulting trajectory is then reduced to a computationally tractable
ensemble. A number of strategies exist to select a representative subset from the full set
of resulting structures where much of the dynamic information of the trajectory remains.
RMSD-based clustering is an obvious choice for selection of the most dominant
configurations within the trajectory. In their study of avian influenza neuraminidase using
the RCS, Cheng et al. applied RMSD clustering on snapshots extracted every 10ps from
40ns trajectories (Cheng et al., 2008). An alternate but equally effective method is that of
QR-factorization (O’Donoghue and Luthey-Schulten, 2005). QR-factorization was
originally designed for the removal of redundant information from structural databases by
identifying a set of structures which represent the evolutionary conformational space of a
protein. In their study of the Trypanosoma brucei RNA-editing Ligase 1 (TbRel1),
Amaro et al. integrated the use of QR factorization into the RCS in order to extract a
representative set of structures from a 20ns trajectory of the target in complex with ATP,
its native substrate (Amaro et al., 2007; Amaro et al., 2008). For the QR factorization,
snapshots were extracted every 50ps resulting in a set of 400 structures which was
reduced to a total of 33. In both cases the RCS proved extremely successful in identifying
true binders from the original database. For Cheng et al., the weighted average score
from docking into the full representative ensemble of the holo trajectory resulted in the
selection of 25 compounds, 10 of which displayed a Ki under 500µM (Cheng et al.,
2008). For Amaro et al., ranking of the mean score from docking into the QR
representative set resulted in the selection of 10 compounds, 5 of which displayed
inhibition at 10µM or better (Amaro et al., 2008).
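The ensemble-based ranking described above reduces to simple bookkeeping once every compound has been docked into every representative structure. The sketch below first reproduces the snapshot count quoted for the QR-factorization example and then ranks compounds by their mean score over an ensemble. The compound names and scores are invented, and a plain unweighted mean is assumed, whereas Cheng et al. used a weighted average.

```python
from statistics import mean

# 20 ns sampled every 50 ps gives the 400 snapshots mentioned in the text.
n_snapshots = (20 * 1000) // 50

def rank_by_mean_score(scores_by_compound):
    """Sort compounds by mean docking score over the ensemble (lower = better)."""
    means = {name: mean(scores) for name, scores in scores_by_compound.items()}
    return sorted(means, key=means.get)

# Hypothetical docking scores (kcal/mol) against three receptor conformations.
scores = {
    "cmpd_A": [-7.1, -6.8, -7.4],
    "cmpd_B": [-9.0, -5.2, -5.5],
    "cmpd_C": [-8.2, -8.0, -8.5],
}
ranking = rank_by_mean_score(scores)
```

In this toy data `cmpd_B` has the best score against a single conformation but ranks last on the ensemble mean, which illustrates how averaging over conformations can change the selection relative to docking into one rigid structure.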

1.7 Successes of CADD
Computer-aided drug design techniques have now become a core component of
modern drug discovery and development pipelines (Jorgensen, 2004). One of the most
prominent successes of rational, structure-based drug design is that of imatinib
(Gleevec®), a tyrosine kinase inhibitor for the treatment of Chronic Myelogenous
Leukemia (CML) (Capdeville et al., 2002). Early drug discovery programs for the
treatment of cancer largely focused on inhibition of DNA synthesis and cell division
through the use of antimetabolites (nucleoside analogs and antifolates), alkylating agents
(classical and newer platinum-based therapeutics), microtubule destabilizers (vinca
alkaloids) and microtubule-stabilizing agents (taxanes) (Scott, 1970; Scagliotti and
Selvaggi, 2006; Zhou and Giannakakou, 2005). The uncovering of the bcr-abl reciprocal
translocation as the pathogenic event in CML established it as an attractive drug target
(Kelliher et al., 1990). Docking studies and X-ray crystallography established the binding
of Gleevec with high-affinity to the inactive form of the ATP-binding pocket (Schindler
et al., 2000; Zimmerman et al., 2001). Additionally, SBDD allowed for the analysis of
mutations in the enzyme which give rise to imatinib resistance. This provided an
opportunity for the design of novel pharmaceuticals that are effective in overcoming
imatinib-resistance (Weisberg et al., 2007).
The first marketed drug whose development was assisted by SBDD was captopril
(Capoten®), an angiotensin-converting enzyme (ACE) inhibitor used for the treatment of
hypertension (Cushman et al., 1977). Early on in the developmental stages a peptidic lead
compound had been identified from a snake venom. However, structural information as to
the binding site of ACE was lacking. This led scientists at Squibb to use the structure of
another zinc protease, the recently crystallized carboxypeptidase A, to model the binding site
of ACE. The modeling led to the development of captopril, the first successful design
based on a molecular model. The structure of ACE was determined in 2002,
revealing that its binding site differed significantly from that of
carboxypeptidase A and leading to the development of newer, more targeted ACE inhibitors
(Natesh et al., 2003).
The success of CADD in properly assessing the potential binding of a compound
to a target is directly related to our ability to correctly predict the binding affinity of a small
molecule. Many of the limitations of current in silico pipelines stem from the difficulties
in properly and reliably predicting the binding of small molecules to a target (Michel and
Essex, 2010).

Chapter 2
Molecular Dynamics Study of Small Molecule
Inhibitors of the Bcl-2 Family

Preface
The contents presented in the following chapter have been published as presented:
Acoca S, Cui Q, Shore GC, Purisima EO. 2011. Molecular dynamics study of small
molecule inhibitors of the Bcl-2 family. Proteins. 79(9):2624-36.

2.1 Rationale
Molecular modeling techniques have taken an important role in drug
development. This is especially true of molecular simulations and scoring functions
which provide useful insights for the optimization of lead compounds. Obatoclax and
ABT-737 are two novel Bcl-2 inhibitors which have different selectivity profiles for
antiapoptotic Bcl-2 members. While numerous studies have examined the selectivity of
BH3 domains for Bcl-2 members, few have provided conclusive evidence as to the
selectivity of ABT-737. With regards to Obatoclax, the lack of structural data on its binding
mode has also left many questions unanswered as to how it mediates its inhibition of Bcl-2
members. This study therefore aimed to provide the grounds on which the selectivity of
both ABT-737 and Obatoclax could be understood while identifying the most probable
binding mode of Obatoclax.
2.2 Abstract
We carried out docking and molecular dynamics simulations on ABT-737 and
Obatoclax, which are inhibitors of the Bcl-2 family of proteins. We modeled the binding
mode of ABT-737 with Bcl-XL, Bcl-2, and Mcl-1 and examined their dynamical
behavior. We found that the binding of the chlorobiphenyl end of ABT-737 was quite
stable across all three proteins. However, the phenylpiperazine linker group was
dramatically more mobile in Mcl-1 compared to either Bcl-XL or Bcl-2. The S-phenyl
group at the p4 binding site was well-anchored in Bcl-XL and Bcl-2 but was somewhat
more mobile in Mcl-1 although the phenyl ring itself on average stayed close to the p4
binding site in Mcl-1. This greater mobility is likely due to the greater openness of the p3
and p4 binding sites on Mcl-1. The calculated binding free energies were consistent with
the much weaker binding affinity of ABT-737 for Mcl-1. Obatoclax was predicted to
bind at the p1 and p2 binding sites of Mcl-1 and the binding mode was quite stable during
the molecular dynamics simulation with Mcl-1 wrapping around the molecule. The
modeled binding mode suggests that Obatoclax is able to inhibit all three proteins
because it makes use of the p1 and p2 binding sites alone, which is a fairly narrow groove
in all three proteins unlike the p4 binding site, which is much broader in Mcl-1.
2.3 Introduction
Cancer is fundamentally a disease of dynamic changes in the genome. It has been
described as a multistep process culminating in the acquisition of six essential
alterations in cellular physiology (Hanahan and Weinberg, 2000). Dysregulation of the
apoptotic process has been recognized as one of these critical alterations required for
progression to the disease phenotype (Hanahan and Weinberg, 2000). As such, research
into the regulation of apoptosis has bloomed in the past decade, directed towards a
better understanding of the
extensive network of protein interactions that regulate it and the potential targets that can
be used to activate it.

At its core, apoptosis is the mechanism responsible for the careful synchrony of
cellular death observed throughout development, the maintenance of homeostasis and
proper immune function (Krammer et al., 1994; Meier et al., 2000; Elmore, 2007).
There are two pathways (intrinsic and extrinsic) which converge towards activation of the
apoptotic machinery. The extrinsic pathway is characterized by activation of members of
the death receptor family (Ashkenazi and Dixit, 1998). Death receptors, which belong to
the tumor necrosis factor (TNF) receptor superfamily, are surface transmembrane
receptors engaged by binding of extracellular “death ligands” such as FasL and TNF
(Ashkenazi and Dixit, 1998). Activation of these receptors leads to the formation of the
death-inducing signaling complex (DISC), which mediates the activation of initiator
caspases thereby committing the cell to apoptotic death (Bao and Shi, 2007). On the other
hand, the intrinsic (mitochondrial) apoptotic pathway is triggered by mainly non-receptor
stimuli. It is unique in its ability to initiate apoptosis in response to DNA damage,
cytotoxic stress and cytokine deprivation though it can be engaged by the extrinsic
pathway as well (Brenner and Mak, 2009). In response to apoptotic stimuli, the intrinsic
pathway triggers the permeabilization of the outer mitochondrial membrane (OMM).
This permeabilization releases cytochrome c and other molecules residing within the
mitochondrial intermembrane space (IMS) into the cytosol, resulting in the formation of
the apoptosome (a complex of cytochrome c, APAF-1 and pro-caspase 9) and activation
of the caspase cascade through caspase 9 (Ow et al., 2008).
At the heart of the intrinsic pathway lies the Bcl-2 family of apoptotic proteins.
Known as the “Gatekeepers of Mitochondrial Apoptosis”, the Bcl-2 family of proteins
is unique in its role of regulating mitochondrial outer membrane integrity in response
to death stimuli (Adams and Cory, 2007). Through heterodimerization, anti-apoptotic
members can neutralize the effects of pro-apoptotic members, the relative balance of
which acts as a regulating switch for initiating mitochondrial apoptosis (Oltersdorf et al.,
2005). The Bcl-2 family is composed of three groups of proteins distinguished through
functional and structural features. The antiapoptotic members (consisting of Bcl-2, Bcl-
XL, Bcl-B, Bcl-W, Mcl-1 and A1) share three to four α-helical regions of high sequence
similarity known as the Bcl-2 Homology (BH) domains (Petros et al., 2004; Adams and
Cory, 2007). Bcl-2 pro-survival proteins inhibit the pro-apoptotic members in part by
sequestering the amphiphilic BH3 helix of the pro-apoptotic members within a long
surface exposed groove. Because the Bcl-2 survival members promote cell survival in
cancer cell lines, they are recognized as a highly relevant target for the treatment of
cancer. They are also implicated in general resistance to chemotherapeutic agents along
with a more aggressive malignant phenotype (Minn et al., 1995; Simonian et al., 1997;
Amundson et al., 2000). Bcl-2 inhibitors show promise as cancer therapeutics, especially
when used in combination therapy (Oltersdorf et al., 2005; Nguyen et al., 2007; Lessene
et al., 2008; Tse et al., 2008; Ackler et al., 2010).
One promising agent is the orally bioavailable compound ABT-263 (navitoclax);
ABT-737 (Figure 2.01) is an analog that is widely used in preclinical studies as a tool
compound (Oltersdorf et al., 2005; Tse et al., 2008). These compounds were developed
using the SAR by NMR methodology and employing stable protein fragments for optimal
NMR study. They display subnanomolar affinity for such recombinant fragments of Bcl-2,
Bcl-XL, and Bcl-W but > 1 µM for Mcl-1 (Shuker et al., 1996; Oltersdorf et al., 2005;
Tse et al., 2008). As predicted by its affinity profile, ABT-737 as well as navitoclax
exhibits limited efficacy in cells where Mcl-1 is expressed (Konopleva et al., 2006; van
Delft et al., 2006; Chen et al., 2007; Tse et al., 2008). Consequently, this selectivity is one
of the key aspects of navitoclax that may limit its chemotherapeutic utility. Several
studies have addressed the selectivity of BH3 peptides for members of the Bcl-2 family;
however, extrapolation to explaining and modifying the selectivity of ABT-
737/navitoclax has not been straightforward (Lee et al., 2008; Lee et al., 2009; Fire et al.,
2010). Furthermore, the Bcl-2 pro-survival proteins are anchored in the mitochondrial
outer membrane where they in fact undergo conformational changes and greater
penetration into the lipid bilayer in response to stress stimuli (Shore et al., 2008). Thus
despite the high affinity binding of these compounds to soluble recombinant protein
fragments in aqueous buffers in vitro, it is not clear how this translates to the efficacy of
binding in intact cells.
Figure 2.01 ABT-737 chemical structure.

A second Bcl-2 inhibitor currently in Phase I & II trials is obatoclax (GX15-070),
a hydrophobic cycloprodigiosin derivative developed by Gemin X Pharmaceuticals
(Nguyen et al., 2007). Obatoclax (Figure 2.02) was found to inhibit the binding of BH3
peptides to recombinant fragments of all pro-survival members of the Bcl-2 family with
low micromolar affinity employing fluorescence polarization assays but its key property
lies in its ability to potently overcome Mcl-1 mediated resistance to chemotherapeutic
agents (Zhai et al., 2006; Nguyen et al., 2007; Perez-Galan et al., 2007). Indeed, in
assays employing native Mcl-1 in intact mitochondrial outer membrane, 10 nM obatoclax
reverses the constitutive interaction between Mcl-1 and pro-apoptotic Bak. Hence, an
understanding of its binding mode to Mcl-1 is of particular interest.
Figure 2.02 Obatoclax chemical structure.
In this chapter we present an extensive analysis of molecular dynamics
simulations performed for obatoclax/Mcl-1 and ABT-737 complexes. The aim of the
current study is to rationalize the binding specificity of ABT-737 and to predict the
binding mode of obatoclax to Mcl-1, for which an experimentally determined three-
dimensional structure of the complex has proven to be elusive.

2.4 Methods
2.4.1 Structure Preparation
The starting structures for the docking and molecular dynamics simulation
experiments of Bcl-2, Bcl-XL and Mcl-1 complexes were taken from the Protein Data
Bank (Codes 1YSW, 2YXJ, and 2PQK respectively). All bound ligands (small molecules
and BH3 peptides), waters, ions and other molecules were removed from the
complexes, except for Bcl-XL for which we kept the ABT-737 ligand. Missing side
chains, terminal residues and hydrogen atoms were added using Sybyl 8.0 (Tripos Inc.,
St. Louis, MO) and XLeap in AMBER (Case et al., 2005). Protonation states were
assigned using the H++ server (Gordon et al., 2005). Visual inspection of all assigned
protonation states was done in Sybyl 8.0 and adjusted as needed.
2.4.2 Force field parameters
The FF99SB force field in the AMBER suite of programs was used for the protein
atoms. The antechamber module of Amber Tools was used to assign GAFF parameters
for obatoclax and ABT-737 (Wang et al., 2004; Case et al., 2005; Hornak et al., 2006). In
the case of ABT-737, we applied the biphenyl parameters of Athri and Wilson (Athri
and Wilson, 2009). Partial charges for the inhibitors were obtained using RESP with
6-31G* electrostatic potentials calculated using GAMESS (Bayly et al., 1993; Schmidt et
al., 1993).

The sulfonamide group in ABT-737 has an imide-like bond (see Figure 2.01) that
is not well-represented by the default GAFF parameters. Hence, we derived force field
torsional parameters for the S–N bond using a model compound with a phenyl ring on
either side of the SO2NHCO group. The covalent geometry was taken from the Cambridge
Structural Database (CSD) entry CEKHIJ (Allen, 2002). A torsional energy profile
around the S–N bond was generated using GAMESS at an MP2/6-31G* level of theory.
A truncated Fourier series was fitted to the residual torsional energy profile after
subtracting out the calculated AMBER energy. The resulting coefficients are listed in
Table S1 (Supplementary Materials).
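The fitting step can be sketched as follows. Assuming the residual energy has been sampled on a uniform grid of dihedral angles covering a full rotation, the coefficients of a truncated Fourier series follow from discrete projections onto cosine and sine terms. The synthetic residual profile, the function names and the truncation at three terms are assumptions for illustration and do not reproduce the values in Table S1.

```python
import math

def fit_fourier(angles, energies, n_terms):
    """Project energies sampled on a uniform full-period grid onto cos/sin terms."""
    n = len(angles)
    a0 = sum(energies) / n
    coeffs = []
    for k in range(1, n_terms + 1):
        a_k = 2.0 / n * sum(e * math.cos(k * phi) for phi, e in zip(angles, energies))
        b_k = 2.0 / n * sum(e * math.sin(k * phi) for phi, e in zip(angles, energies))
        coeffs.append((a_k, b_k))
    return a0, coeffs

def evaluate(a0, coeffs, phi):
    """Evaluate the truncated series at dihedral angle phi (radians)."""
    return a0 + sum(a * math.cos(k * phi) + b * math.sin(k * phi)
                    for k, (a, b) in enumerate(coeffs, start=1))

# Synthetic residual (QM minus force-field) profile sampled every 10 degrees.
angles = [math.radians(10 * i) for i in range(36)]
residual = [1.2 + 0.8 * math.cos(phi) - 2.0 * math.cos(2 * phi) + 0.3 * math.sin(3 * phi)
            for phi in angles]
a0, coeffs = fit_fourier(angles, residual, n_terms=3)
max_dev = max(abs(evaluate(a0, coeffs, p) - e) for p, e in zip(angles, residual))
```

Each cosine and sine pair is equivalent to a single term with an amplitude and a phase, which is the form in which torsional parameters are usually tabulated in force fields. For unevenly spaced scan points a least-squares fit would be needed instead of the direct projection used here.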
2.4.3 Docking
For Bcl-XL the deposited crystal structure of the complex (PDB 2YXJ)
was used directly as the starting point for our calculations. For Bcl-2, the initial docked
pose of ABT-737 was obtained by superposing the Bcl-2 structure with Bcl-XL and
extracting and merging the inhibitor coordinates in the Bcl-XL structure into the Bcl-2
structure. The same procedure was carried out for docking ABT-737 into Mcl-1. For Bcl-
2 and Mcl-1, direct merging of the inhibitor into the binding site resulted in some side
chains being in awkward positions relative to ABT-737. These were initially relieved
using the Sculpt module of PyMOL (Schrödinger, New York) followed by ligand-restrained
energy minimization.

Docking of obatoclax into Mcl-1 was carried out using an in-house docking
program (manuscript in preparation) that does an exhaustive rigid body docking
(translation and rotation) of the ligand on a grid. A rectangular box enclosing the entire
binding groove defined the search region. We used a grid spacing of 0.5 Å and rigid body
rotational angular increments corresponding to atomic displacements of 0.5 Å. Poses
were scored using a weighted combination of van der Waals, Coulomb, surface area,
shape complementarity and hydrogen bonding terms. The weights were previously
calibrated to reproduce binding poses of a training set of protein-ligand complexes.
OMEGA (OpenEye Scientific Software, New Mexico) was used to generate conformers
for the ligand used in the rigid docking. The protein was kept fixed during the docking.
The top-scoring pose was used for the MD simulation.
2.4.4 Molecular Dynamics Simulations
Each system was immersed in a truncated octahedral TIP3P water box (Jorgensen
et al., 1983). The distance between the wall of the box and the closest atom of the solute
was 12 Å. Sodium or chloride counterions were added as required to maintain
electroneutrality of the system. Molecular dynamics (MD) simulations were carried out
using the AMBER program. A 2 fs time step and a 9 Å non-bonded cutoff were used.
SHAKE was employed to constrain the lengths of bonds to hydrogen atoms and the
Particle Mesh Ewald algorithm was used to treat long-range electrostatics (Ryckaert et
al., 1977; Cheatham et al., 1995).
