In Silico methods in Drug Discovery
and Development
Stephane Acoca
Department of Biochemistry
McGill University
Montreal, Quebec, Canada
Submitted August 2011
A thesis submitted to McGill University in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
© Stephane Acoca, 2011
 
 
Abstract
Computational drug design methods have become increasingly valuable in the drug discovery
and development process. This thesis describes the development and application of methods
that are used at every stage of the drug discovery and development pipeline. Chapter 2
examines the use of computational methods towards the understanding and development of two
novel Bcl-2 inhibitors, Obatoclax and ABT-737, being developed for the treatment of cancer.
The study proposes mechanisms through which ABT-737 displays selectivity towards certain
targets within the Bcl-2 family. Additionally, we propose a binding mode for Obatoclax which
is in accordance with experimental data. Chapter 3 addresses the use of virtual screening for
the identification of novel lead compounds. Trypanosoma brucei RNA Editing Ligase 1 was chosen
as the target for the development of treatments against Trypanosoma infections, and C35, a
potent novel inhibitor of the enzyme, was identified. Furthermore, our research shows that the
action of C35 extends to inhibition of several critical enzyme activities required for the RNA
editing process, as well as compromising the integrity of the multiprotein complex which
carries it out. Chapter 4 looks at the use of mass spectrometry data to expedite the discovery
of bioactive compounds in natural products. We developed an algorithm which analyses MS/MS
data in order to derive the molecular formula of the compound. The novel algorithm obtained a
95% success rate on a test set of 91 compounds. Chapter 5 explores the use of molecular
dynamics to generate a conformational ensemble of targets for virtual screening.
Conformational ensembles were generated for a target test set taken from the Directory of
Useful Decoys. The results showed that molecular dynamics-based conformational ensembles
provided remarkable improvements on 2 of the targets tested, due to the enhanced capacity to
properly dock compounds in otherwise restricted structures. The last Chapter of the thesis is
a general discussion of the work of the thesis and a proposal on how all of it can be
integrated within the drug discovery and development pipeline.
 
 
Résumé
Modeling methods have become an invaluable tool in the discovery and development of new
drugs. This thesis describes the development and application of methods used at each stage of
pharmaceutical discovery and development. Chapter 2 gives an overview of the use of
computational methods in the development of two new inhibitors of the Bcl-2 proteins,
Obatoclax and ABT-737, which are in development for the treatment of cancer. The study
proposes certain mechanisms that explain the selectivity of ABT-737 towards members of the
Bcl-2 family. In addition, we propose a binding mode for Obatoclax that is consistent with
experimental data. The following Chapter addresses the use of virtual screening to identify
new lead compounds. Trypanosoma brucei RNA Editing Ligase 1 was chosen as the target for the
development of treatments against Trypanosoma infections, and C35 was identified as a new
inhibitor of the enzyme. Furthermore, our research shows that the action of C35 extends to the
inhibition of several enzymes required for the RNA editing mechanism, in addition to
compromising the integrity of the multiprotein complex that carries it out. The following
Chapter looks at the use of mass spectrometry data to accelerate the discovery of bioactive
molecules from natural sources. We developed an algorithm that analyses MS/MS data to derive
the molecular formula of the compound. The new algorithm achieved a success rate of 95% on a
test set of 91 molecules. The last Chapter of the thesis explores the use of molecular
dynamics simulations to generate a conformational ensemble of target proteins for use in
virtual screening. Conformational ensembles were generated for a test series obtained from the
‘Directory of Useful Decoys’ repository. The results show that conformational ensembles
derived from molecular dynamics brought remarkable improvements on two of the targets tested,
owing to an increased capacity to place molecules appropriately in a site that is otherwise
very restricted. The last Chapter of this thesis is a general discussion of the work
accomplished and a proposal on how all the elements can be integrated into a protocol for
pharmaceutical discovery and development.
 
 
Acknowledgements
I would first like to thank Dr Enrico Purisima and Prof. Gordon Shore for their mentorship
and patience throughout the doctoral work leading to this thesis. I am very thankful for the
experiences I had during my tenure in Dr Purisima’s laboratory. I would like to give
special thanks to former members of the laboratory, Dr Sathesh Bhat, Dr Marwen Naim, Herve
Hogues and Dr Qizhi Cui, whose guidance, friendship, inspiration, assistance and support have
been invaluable during my tenure. My long conversations on modeling with Dr Bhat and Mr
Hogues have been of special value in my learning of computational modeling. I would also like
to extend thanks to current and past members of the laboratory, including Dr Shafinaz
Chowdhury, Dr Christophe Deprez, Dr Edwin Wang and Dr Sheldon Dennis, for creating a
positive work environment. I would like to give special thanks to the members of my Research
Advisory Committee (RAC), Prof. John Silvius and Prof. Albert Berghuis, who have been of
great assistance in guiding me through the completion of the doctoral work. I’d also like to add
special recognition to Prof. Imed Gallouzi for his help. I’d also like to thank the Chemical
Biology program at McGill, which has partially funded my work. Lastly, I’d like to thank my
family for their continued encouragement and support.
 
 
Table of Contents
Abstract i
Résumé iii
Acknowledgements v
Table of Contents vi
List of Figures x
List of Tables xiii
Abbreviations xv
Contribution of Authors xvii
Chapter 1. General Introduction
1.1 Drug Discovery and Development 2
1.1.1 Overview & Challenges 2
1.1.2 Thesis Outline 5
1.2 Molecular Modeling 7
1.2.1 Molecular Mechanics 7
1.3 Predicting Binding Free Energies – Scoring 12
1.3.1 Effect of water: Continuum (Implicit) Solvation energy 15
1.3.1.1 Finite difference 18
1.3.1.2 Boundary Element Method 19
1.3.1.3 Desolvation Cost 19
1.3.2 Scoring Functions 21
1.3.2.1 Physical-Chemical 22
1.3.2.2 Empirical function 25
1.3.2.3 Knowledge-based 25
1.3.2.4 Problems 26
1.4 Predicting Binding Modes – Docking 27
1.4.1 Docking Algorithms 28
1.4.1.1 Fast Shape Matching 28
1.4.1.2 Incremental Construction 29
1.4.1.3 Monte Carlo Simulations 30
1.4.1.4 Evolutionary Programming 31
1.5 Molecular Dynamics 32
1.5.1 Newton’s Laws 32
1.5.2 Ensembles 34
1.5.3 Verlet Algorithm 34
1.5.4 Considerations 36
1.5.5 Boundary Conditions 38
1.5.6 Long Range Electrostatic Calculations: The Ewald Summation Method 39
1.6 Virtual Screening 40
1.6.1 Virtual Screening Pipeline 41
 
 
1.6.2 The Target 43
1.6.3 The Compound Database 44
1.6.4 The Docking Protocol 44
1.6.5 MD Simulations 45
1.6.6 Conformational Ensembles 45
1.7 Successes of CADD 48
Chapter 2. Molecular Dynamics Study of Small Molecule Inhibitors
of the Bcl-2 Family
Preface 51
2.1 Rationale 52
2.2 Abstract 52
2.3 Introduction 53
2.4 Methods 58
2.4.1 Structure Preparation 58
2.4.2 Force Field Parameters 58
2.4.3 Docking 59
2.4.4 Molecular Dynamics Simulations 60
2.4.5 Binding free energy estimate 61
2.5 Results and Discussion 62
2.5.1 Molecular Modeling of ABT-737 complexes 62
2.5.2 Binding groove structure 64
2.5.3 Chlorobiphenyl group 65
2.5.4 Phenylpiperazine linker 67
2.5.5 Nitrophenylsulfonamide group 69
2.5.6 S-phenyl group 71
2.5.7 Dimethyl group 72
2.5.8 SIE Analysis and Virtual Alanine Mutations 72
2.5.9 Protein structure and dynamics 75
2.5.10 Mcl-1 and obatoclax 78
2.6 Conclusion 80
Chapter 3. Naphthalene-based RNA editing inhibitor blocks RNA
editing activities and editosome assembly in Trypanosoma
Brucei
Preface 83
3.1 Rationale 84
3.2 Abstract 84
3.3 Introduction 85
3.4 Experimental Procedures 88
3.4.1 Structure Preparation 88
3.4.2 Virtual Screening 89
3.4.3 Solvated Interaction Energy 90
 
3.4.4 Preparation of mitochondrial extract and tandem affinity purification of ligase complex 90
3.4.5 Preparation of RNAs 91
3.4.6 Adenylylation and deadenylylation assays 91
3.4.7 In vitro RNA editing assays 92
3.4.8 Gel shift assay 93
3.4.9 Guanylyltransferase labeling 94
3.5 Results 94
3.5.1 Virtual Screening 94
3.5.2 Inhibition of RNA editing by selected compounds 97
3.5.3 Inhibition of ligase adenylylation at low protein concentrations by C35 and S5 100
3.5.4 Inhibition of deadenylylation by C35 and S5 102
3.5.5 Inhibition of different steps of RNA editing by C35 and S5 104
3.5.6 Inhibitory compounds affect the editosome RNA-binding activity 107
3.5.7 20S editosome complex integrity is affected by C35 treatment 110
3.6 Discussion 112
3.7 Acknowledgments 117
Chapter 4. Automated Molecular Formula Analysis determination by
Tandem Mass Spectrometry (MS/MS)
Preface 119
4.1 Rationale 120
4.2 Abstract 120
4.3 Introduction 121
4.4 Experimental 125
4.4.1 Materials 125
4.4.2 Instrumentation 125
4.4.3 MS/MS experiments 126
4.4.4 The algorithm of molecular formula analysis 127
4.4.5 Nitrogen-enriched or oxygen-enriched compounds 132
4.5 Results and Discussion 133
4.5.1 Risk of assigning incorrect molecular formula 133
4.5.2 Mass accuracy 134
4.5.3 Fragmentation pathways of brefeldin A 135
4.5.4 Molecules with single structural domain 137
4.5.5 Molecules with multiple core structures 140
4.5.6 Analysis of structurally-related compounds 143
4.5.7 Cyclazocine and N-allylnormetazocine 146
4.5.8 Peptides 148
4.5.9 Chloro- or bromo-containing compounds 152
4.6 Conclusion 154
4.7 Acknowledgements 155
 
 
Chapter 5. Molecular Dynamics ensemble in Virtual Screening
Preface 157
5.1 Rationale 158
5.2 Abstract 158
5.3 Introduction 159
5.4 Methods 162
5.4.1 Structure Preparation 162
5.4.2 Ligand Preparation and Docking 163
5.4.3 Molecular dynamics simulations 164
5.4.4 Force Field Parameters 165
5.4.5 Clustering 165
5.4.6 Test Data Sets 165
5.5 Results and Discussion 166
5.5.1 Overview of Results 166
5.5.2 Obstructive changes during apo simulations 167
5.5.3 Performance of holo ensemble 170
5.5.4 Structural change in holo ensemble 171
5.5.5 Effect on score distribution 174
5.5.6 Comparison with RCS 177
5.5.7 Use of DUD training set 178
5.6 Conclusion 180
Chapter 6. General Discussion
6.1 Molecular Dynamics Study of Bcl-2 Inhibitors
6.2 Discovery of TbRel1 Inhibitors
6.3 Automated Molecular Formula determination by Tandem Mass Spectrometry
6.4 Ensemble-based Virtual Screening
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
References 222
Original Contributions to Knowledge 250
 
 
 
 
List of Figures
Chapter 1
Figure 1.01 The pharmaceutical drug discovery and development pipeline
Figure 1.02 Increasing costs in pharmaceutical R&D
Figure 1.03 Pre-approval costs for new drugs
Figure 1.04 The contributions of bonded terms to the potential energy function
Figure 1.05 Cubic grid scheme for the Finite Difference Method
Figure 1.06 Representation of desolvation effects during ligand-protein complex formation
Figure 1.07 Periodic boundary conditions in molecular dynamic simulations
Figure 1.08 The virtual screening pipeline
Chapter 2
Figure 2.01 ABT-737 chemical structure
Figure 2.02 Obatoclax chemical structure
Figure 2.03 Multiple sequence alignment of representative BH3 domains from BH3-Only
proteins.
Figure 2.04 Superposition of ABT-737 and Bim BH3 peptide bound to Bcl-xL.
Figure 2.05 Calculated binding mode of ABT-737 in Bcl-xL, Bcl-2 and Mcl-1.
Figure 2.06 Distance of the ABT-737 biphenyl ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.07 Distance of the ABT-737 linker ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.08 Distance of the ABT-737 nitrophenyl and S-phenyl ring centroids from their
initial positions after superposition of the protein C-alpha atoms to those in the
first snapshot.
Figure 2.09 Calculated binding mode of obatoclax in Mcl-1.
Chapter 3
Figure 3.01 Predicted binding modes of TbREL1
Figure 3.02 Effect of selected compounds that inhibit editosome activity
Figure 3.03 Effect of inhibitory compounds on adenylylation and deadenylylation steps of
RNA editing ligases
Figure 3.04 Effect of inhibitory compounds on different steps of RNA editing
Figure 3.05 Effect of inhibitory compounds on RNA-binding activity of editosome complex
Figure 3.06 Analysis of sedimentation profile and activity of ligase-associated complexes in
the presence of C35
 
 
Figure 3.07 Alternative models for the mechanism of action of C35 and S5.
Chapter 4
Figure 4.01 The MS/MS spectrum of brefeldin A
Figure 4.02 Fragmentation pathways of brefeldin A
Figure 4.03 The MS/MS spectrum of prazosin
Figure 4.04 Fragmentation pathways of prazosin
Figure 4.05 The MS/MS spectrum of dihydroergotamine and dihydroergocristine
Figure 4.06 Fragmentation pathways of dihydroergotamine
Figure 4.07 Structures of dihydroergotamine and dihydroergocristine
Figure 4.08 The MS/MS spectrum of cyclazocine and N-allylnormetazocine
Figure 4.09 Fragmentation pathways of cyclazocine
Figure 4.10 The MS/MS spectrum of 5-leucine enkephalin
Figure 4.11 Stepwise analysis of 5-leucine enkephalin sequences
Figure 4.12 Overall detail analysis of 5-leucine enkephalin
Figure 4.13 The MS/MS spectrum of quinacrine
Figure 4.14 Plausible fragmentation pathways of quinacrine
Chapter 5
Figure 5.01 Changes in binding site observed in the apo ensemble in a) COX2, b) AR,
c) GART and d) PARP.
Figure 5.02 Changes in binding site observed in the holo ensemble for COX2.
Figure 5.03 Changes in binding site observed in the holo ensemble for AR.
Figure 5.04 Changes in binding site observed in the holo ensemble for ER.
Figure 5.05 Score distribution of true binders across the crystal structure and selected holo ensemble
structure for a) ER, b) AR, c) EGFR, and d) COX2.
Appendix
A
Figure A.01 Helices surrounding the binding grooves of Bcl-xL, Bcl-2 and Mcl-1.
Figure A.02 Distance between ABT-737 sulfonamide HN and backbone carbonyl O of Bcl-xL
Asn136, Bcl-2 Asn140 and Mcl-1 Asn260.
Figure A.03 Hydrogen bond pair distances between ABT-737 sulfonyl O and side chains
in Bcl-xL, Bcl-2 and Mcl-1
 
 
Figure A.04 Hydrogen bond pair distances between ABT-737 dimethylamino HN and side
chain carboxylate O in Bcl-xL and Bcl-2.
Figure A.05 Distance of ABT-737 ring centroids from their initial positions after superposition
of the protein C-alpha atoms to those in the first snapshot.
B
Figure B.01 Inhibitors identified from first round of virtual screening.
Figure B.02 Previously identified inhibitors not retrieved in virtual screening.
D
Figure D.01 Overview of VS results for the crystal structure and apo/holo ensembles
Figure D.02 Ensemble-based VS results for structures generated from apo MDs
 
 
List of Tables
Chapter 2
Table 2.1 Solvated interaction energies (SIE) in kcal/mol
Table 2.2 Virtual alanine mutations
Chapter 3
Table 3.1 Virtual hits selected for experimental validation
Chapter 4
Table 4.1 Potential neutral losses in the MS/MS experiment in forward MFA
Table 4.2 Reverse MFA of brefeldin A with correct formula of precursor ion
Table 4.3 Reverse MFA of brefeldin A with incorrect formula of precursor ion
Table 4.4 Molecular formula analysis of prazosin
Table 4.5 Molecular formula analysis of dihydroergotamine
Table 4.6 Molecular formula analysis of dihydroergocristine
Table 4.7 Molecular formula analysis of cyclazocine
Table 4.8 Molecular formula analysis of N-allylnormetazocine
Table 4.9 Molecular formula analysis of quinacrine
Chapter 5
Table 5.1 Targets of the DUD set selected and properties of each set
Appendix
A
Table A.01 Fourier coefficients for ca-s6-n-ca
 
 
B
Table B.01 Ranking of selected hits from virtual screen
C
Table C.01 Molecular formula analysis of 5-leucine enkephalin
 
 
Abbreviations
ADA Adenosine Deaminase
AR Androgen Receptor
BCL-2 B-Cell Lymphoma 2
BEM Boundary Element Method
CADD Computer-Aided Drug Design
CML Chronic Myelogenous Leukemia
COX2 Cyclooxygenase 2
CRK Cdc2-Related Kinase
DNDi Drugs for Neglected Diseases initiative
DUD Directory of Useful Decoys
EGFR Epidermal Growth Factor Receptor
EP Evolutionary Programming
ER Estrogen Receptor
FDM Finite Difference Method
FXa Factor Xa
GA Genetic Algorithm
GART Glycinamide Ribonucleotide Transformylase
gRNA guide RNA
GSK Glycogen Synthase Kinase
HSP90 Heat Shock Protein 90
IC Incremental Construction
KB Knowledge-based
KBP Knowledge-based potentials
LGA Lamarckian Genetic Algorithm
MAPK Mitogen-Activated Protein Kinase
MC Monte-Carlo
MD Molecular Dynamics
MF Molecular Formula
MFA Molecular Formula Analysis
MM Molecular Mechanics
MW Molecular Weight
NCE New Chemical Entity
NS Nanoseconds
NTD Neglected Tropical Diseases
PARP Poly ADP-Ribose Polymerase
PDB Protein Data Bank
PBSA Poisson-Boltzmann Surface Area
PS Picoseconds
RCS Relaxed Complex Scheme
RMS Root Mean Square
SA Surface Area
SBDD Structure-Based Drug Design
SIE Solvated Interaction Energy
 
 
SM Shape Matching
SRC SRC Tyrosine Kinase
TbRel1 Trypanosoma Brucei RNA-editing Ligase 1
VDS Virtual Decoy Set
VdW Van der Waals
VS Virtual Screening
 
 
Contributions of Authors
This thesis includes the text and figures from 3 published articles. I am the first author of
one of the manuscripts (Chapter 2) and second author of the remaining two (Chapters 3 and 4).
Additionally, the thesis includes the text and figures from work still to be completed towards
the publication of a manuscript (Chapter 5). This thesis has been written in manuscript-based
format, and the references of all chapters have been combined into one reference section at the
end of the dissertation. The contributions of the authors to each of the manuscripts are as follows:
Chapter 2:
Acoca S., Cui Q., Shore G.C., Purisima E.O. 2011. Molecular Dynamics Study of Small
Molecule Inhibitors of the Bcl-2 Family. Proteins. 79(9):2624-36.
I performed all original work and completed the first draft of the manuscript. Prior to
submission, Dr Cui reran a number of the simulations and Dr Purisima reworked the manuscript.
Chapter 3:
Moshiri H., Acoca S., Kala S., Najafadabi H.S., Hogues H., Purisima E.O., Salavati R. 2011.
RNA Editing Ligase 1 Inhibitors Blocks RNA Editing Activities and Editosome Assembly in
Trypanosoma Brucei. J Biol Chemistry. 286(16):14178-89.
My contributions to the manuscript involved the virtual screening segment of the work,
specifically a) the Virtual Screening section, b) Figure 1, c) Table 1 and d) all relevant sections
of the Experimental Procedures (Structure Preparation, Virtual Screening and Solvated
Interaction Energy). Prof. Salavati’s group carried out all experimental testing of the
compounds and their inhibitory properties with regard to the 20S editosome activities.
Chapter 4:
Jarussophon S, Acoca S, Gao J.M., Deprez C., Kiyota T., Draghici C., Purisima E., Konishi Y.
2009. Automated Molecular Formula Determination by Tandem Mass Spectrometry (MS/MS).
Analyst 134(4):690-700.
 
 
I wrote the code for the software that ran the analysis and collaborated with Dr Konishi in its
development. The algorithm implemented in the software was originally developed by Dr
Konishi and his group. Dr Deprez is responsible for the continued maintenance of the software.
Chapter 5:
Acoca S., Hogues H, Purisima EO. 2010. Molecular dynamics ensembles for virtual screening.
(Manuscript in preparation).
The entirety of the work for this manuscript was carried out by me. The docking scripts for
tailoring the pipeline to ensemble virtual screening were written by Mr Hogues.
 
Chapter 1
General Introduction
1.1 Drug Discovery and Development
1.1.1 Overview & Challenges
Though the use of foreign substances for the treatment of illnesses has been practiced
since the beginning of time, the use of an isolated, well-defined chemical entity is only
over a century old. Since then, medicinal substances have been natural products isolated
from plants and microbial sources, or products of pure chemical synthesis. Whichever the
case, the modern pipeline for pharmaceutical drug discovery and development is a long,
multi-step process involving the collaborative effort of a multitude of specialties (Figure
1.01).
Figure 1.01 The Pharmaceutical Drug Discovery and Development Pipeline. Stages shown under
PRE-CLINICAL STUDIES: Target Identification, Identification of Lead Compounds, Lead
Optimization, In Vitro Testing, Animal Testing. Stages shown under CLINICAL STUDIES: Phase I,
Phase II, Phase III, Phase IV.
However, no venture of pharmaceutical research is without risk, and a positive
outcome of the research is anything but guaranteed. The difficulties inherent in discovery and
development, along with the stringent requirements placed on pharmaceutical drugs, have created
an economic problem in the profitability of such endeavors. Despite some spectacular
successes, more is spent on drug discovery and development every year and less is
delivered in terms of innovation (DiMasi et al., 2003). Figure 1.02 shows the reported
aggregate annual domestic prescription drug R&D expenditures for all members of the
U.S. pharmaceutical industry since 1963 alongside the number of new US drug
approvals by year (DiMasi et al., 2003). When compared, the rate of growth of R&D
expenditures clearly outpaces that of new approvals by a large margin. These rising costs
have led to an overwhelming economic R&D problem within the pharmaceutical
industry. In 2003, a study of 68 new medications placed a timeline of 10-12 years and
cumulative costs averaging US$897 million on the development and marketing of a new
medication (Ezzell, 2003). The pre-approval R&D costs themselves are up from US$138
million in 1979 to US$318 million in 1991 to US$802 million in 2000 (Figure 1.03). The
result of these rising R&D costs is an increased trend towards mergers and
industry consolidation. Additionally, higher costs push companies to lower their exposure to risk.
Reorganization of R&D sectors in the pharmaceutical industry aims to optimize the
return on investment by carefully selecting the most profitable research sectors. The sum
of these effects leads to an increased need for efficient, low-cost technologies that bridge
the gap between R&D and the economic challenges facing the pharmaceutical industry.
Figure 1.02 Increasing costs in pharmaceutical R&D. Inflation-adjusted industry R&D
expenditures (2000 dollars) and US new chemical entity (NCE) approvals from 1963 to 2000.
Source of data: Pharmaceutical Research and Manufacturers of America (PhRMA) and Tufts CSDD
Approved NCE database. (Taken from DiMasi et al., 2003)
Figure 1.03 Pre-approval costs for new drugs. Each column indicates the capitalized
pre-clinical, clinical and total costs per approved new drug in 2000 US dollars. (Taken from
DiMasi et al., 2003)
Computer-Aided Drug Design (CADD) approaches have been widely used in the
pharmaceutical industry. By allowing scientists to direct their attention to the most
promising candidate compounds, and thereby narrowing the synthetic and biological
testing efforts, CADD approaches play an important role in accelerating pharmaceutical
research. The recent successes of CADD in assisting rational drug design approaches
have proven it to be an essential tool in drug design and development (Kapetanovic,
2008; Mandal et al., 2009; Song et al., 2009).
1.1.2 Thesis Outline
As part of this thesis, several elements of CADD have been incorporated into
research targeted at every step of the pharmaceutical drug discovery pipeline. The
following is a description of the contributions of each chapter to the individual segments
of the pharmaceutical drug design and discovery pipeline.
Chapter 4 explores the lead identification stage of the pipeline and provides an
alternative means of expediting research when an active compound must be identified from
a natural products sample. Natural products (and their semi-synthetic
derivatives) have been major sources of marketed medications. However, lead isolation
and identification from natural product extracts faces the problem of replication, i.e. the
re-discovery of known natural products. Chapter 4 presents the development of a novel
algorithm which uses MS/MS data to derive the correct molecular formula of a
compound, resulting in rapid identification of the probable nature of the isolated
compound.
Chapters 3 and 5 look at lead identification from the alternative source: compound
databases. Chapter 3 applies our virtual screening (VS) pipeline to
Trypanosoma brucei RNA-editing Ligase 1 (TbRel1), where the success of our screen led
to the identification of an inhibitor which allowed a better understanding of the effects
of inhibition. Chapter 5 seeks to further enhance the current VS pipeline by using
MD-generated conformational ensembles. Here the use of conformational ensembles (see
Section 1.6.6) attempts to provide a better, more complete representation of the target’s
conformational dynamics as part of the VS process.
Lastly, Chapter 2 is a representative pre-clinical molecular dynamics (MD) study
of a lead compound in complex with its target. The in silico MD study gives researchers
the opportunity to obtain information on the mechanism of action of the
compound that would be unavailable through the usual experimental means. Our
experiments aimed at identifying the specific structural factors underlying the specificity
of two compounds which target the Bcl-2 family of proteins, which have recently become
key targets for cancer therapeutics.
1.2 Molecular Modeling
The field of Computational Drug Design relies on our understanding of the
underlying mechanisms involved in the interactions of a drug and its
target. As such, the development of Molecular Mechanics (MM) and Quantum
Mechanics (QM) has enabled the study of drug-target recognition events at the
atomic and electronic level. The increasing accuracy of these models, along with the
growth of the computational resources required to compute them, has prompted the
development of computational tools with increasing accuracy in evaluating drug-target
interactions.
1.2.1 Molecular Mechanics
First applied by Westheimer and Mayer in 1946, MM encompasses the
computational techniques that allow the calculation of molecular properties through the
use of classical mechanics and electrostatics (Westheimer and Mayer, 1946). MM
provides a practical means to describe molecular structures and properties
computationally. As opposed to QM, where the primary purpose is the accuracy of the
calculations, MM packages aim to describe molecular structures and properties
accurately, robustly, and within reasonable time frames (Boyd and Lipkowitz,
1982). To do so, MM (also referred to as Force Fields) describes molecules as a collection
of atoms held together by elastic or harmonic forces. These forces essentially represent
the structural features of a molecule such as bond lengths, bond angles, dihedral angles,
etc. Functions are used to describe the behavior of these forces, resulting in a calculated
potential energy for each. As such, the total potential energy of a molecule is calculated
as the sum of all energy contributions (Eq. 1.01):
\[ E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdw} + E_{elec} \qquad (1.01) \]
Functional form of the potential energy of a molecule
where Ebond, Eangle, Etorsion, Evdw and Eelec describe the bond length, bond angle, torsion
angle, Van der Waals and electrostatic contributions respectively. Energy contributions
are calculated to describe the deviation of structural features from their ideal values,
which are derived empirically or from high-level QM calculations. While the exact
mathematical functions used to describe these contributions may differ between MM packages,
the functions are chosen to accurately replicate the behavior of each energy contribution
within expected ranges while minimizing the number of calculations, and therefore the
computational time, required. From here on, all discussions of the potential energy function
will refer to that implemented by the AMBER force field (Cornell et al., 1995).
The potential energy function is described as follows:
\[ E_{total} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] + \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon r_{ij}}\right] \qquad (1.02) \]
The potential energy function
where Kr, Kθ and Vn are force constants, θeq and req are the equilibrium bond angle and
length respectively, and γ is the phase for the torsional angle. Aij and Bij are the non-
bonded parameters used to compute the van der Waals energies. qi and qj denote the
atom-centered partial charges on atoms i and j respectively, ε is the dielectric constant
and rij is the distance between atoms i and j. Each term of the potential energy function
will now be discussed briefly.
Figure 1.04 The contributions of bonded terms to the potential energy function.
The torsion angle, bond length/angle, and VdW contact are represented by a Fourier
series, harmonic potential, and Lennard-Jones potential respectively. (Taken from Boas and
Harbury, 2007).
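\[ E_{bond} = \sum_{bonds} K_r (r - r_{eq})^2 \qquad (1.03a) \qquad\qquad E_{angle} = \sum_{angles} K_\theta (\theta - \theta_{eq})^2 \qquad (1.03b) \]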
The bond length and bond angle contributions to the potential energy function.
The typical length of an alkane carbon-carbon bond is 1.53 Å. Similarly, a typical
C-C-C bond angle lies between 109° and 114°. Deviations from these
equilibrium values result in an increase in the energy of the system. Thinking of a
molecule as an assembly of point masses held together by springs (the bond
lengths and angles) is therefore a reasonable approximation of its experimental
behavior near equilibrium, and the Ebond and Eangle terms are modeled as harmonic
potentials centered on an equilibrium value (Eq. 1.03a,b).
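The harmonic form of Eq. 1.03a,b can be illustrated with a minimal Python sketch. The force constants and the equilibrium angle below are assumed, illustrative values rather than parameters taken from the AMBER force field files; only the 1.53 Å bond length comes from the text. The helper names (`harmonic_energy`, `bond_energy`, `angle_energy`) are hypothetical.

```python
import math

def harmonic_energy(value, equilibrium, force_constant):
    """Harmonic penalty K * (x - x_eq)**2, the functional form of Eq. 1.03a,b."""
    return force_constant * (value - equilibrium) ** 2

# Illustrative parameters only (assumed, not read from a real parameter file).
K_R = 310.0                      # kcal/(mol*A^2), assumed C-C bond force constant
R_EQ = 1.53                      # A, alkane C-C equilibrium length quoted in the text
K_THETA = 40.0                   # kcal/(mol*rad^2), assumed C-C-C angle force constant
THETA_EQ = math.radians(109.5)   # rad, assumed equilibrium C-C-C angle

def bond_energy(r):
    """Energy (kcal/mol) of stretching or compressing one C-C bond to length r (A)."""
    return harmonic_energy(r, R_EQ, K_R)

def angle_energy(theta_deg):
    """Energy (kcal/mol) of bending one C-C-C angle to theta_deg (degrees)."""
    return harmonic_energy(math.radians(theta_deg), THETA_EQ, K_THETA)

if __name__ == "__main__":
    for r in (1.48, 1.53, 1.58):
        print(f"r = {r:.2f} A      E_bond  = {bond_energy(r):.3f} kcal/mol")
    for theta in (105.0, 109.5, 114.0):
        print(f"theta = {theta:.1f} deg  E_angle = {angle_energy(theta):.3f} kcal/mol")
```

The energy is zero at the equilibrium value and rises symmetrically on either side, which is what makes the spring picture a convenient approximation for small distortions.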
\[ E_{torsion} = \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] \qquad (1.03c) \]
The dihedral angle contribution to the potential energy function.
The torsion angle describes rotation about a bond. For any set of four
covalently bonded atoms ABCD, the torsion angle is the angle measured
about the BC axis from the ABC plane to the BCD plane. The periodic nature of the
torsion angle, and of the torsional potential energy, lends itself to description by
periodic functions such as a Fourier series, with the series typically truncated at the third
term (Eq. 1.03c).
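As a hedged illustration of the torsion term, the Python sketch below computes the torsion angle of four points A, B, C, D from the normals of the ABC and BCD planes and evaluates a truncated Fourier series of the form in Eq. 1.03c. The function names and the single threefold term with V3 = 2.0 kcal/mol are assumptions chosen for illustration; the sign convention of the returned angle is that of this particular formula.

```python
import math

def _sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral_deg(a, b, c, d):
    """Torsion angle A-B-C-D in degrees, measured about the B-C axis."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)            # normal to the ABC plane
    n2 = _cross(b2, b3)            # normal to the BCD plane
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

def torsion_energy(phi_deg, terms):
    """Sum of (V_n / 2) * [1 + cos(n*phi - gamma)] over (n, V_n, gamma_deg) terms."""
    phi = math.radians(phi_deg)
    return sum(0.5 * v_n * (1.0 + math.cos(n * phi - math.radians(gamma_deg)))
               for n, v_n, gamma_deg in terms)

if __name__ == "__main__":
    # Four points arranged in a planar "trans" geometry.
    a, b, c, d = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0)
    phi = dihedral_deg(a, b, c, d)
    threefold = [(3, 2.0, 0.0)]    # assumed: one n = 3 term, V_3 = 2.0 kcal/mol
    print(f"phi = {phi:.1f} deg, E_torsion = {torsion_energy(phi, threefold):.3f} kcal/mol")
```

With a single threefold term and zero phase, the energy peaks at the eclipsed geometry (0°) and vanishes at the staggered geometries (60° and 180°), reproducing the threefold periodicity expected for rotation about an sp3-sp3 bond.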
\[ E_{vdw} = \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] \qquad (1.03d) \]
The VdW contribution to the potential energy function.
The Van der Waals (VdW) energy relates to non-bonded interactions of atoms as
a function of the distance between the nuclei. As two atoms approach one another,
London dispersion forces predominate, creating a net attractive force between them.
When the two atoms come too close, VdW repulsion comes into play. The
attractive and repulsive parts of the potential energy are described by the Lennard-Jones
(6-12) potential, though the more computationally demanding Buckingham potential can also
be used (Eq. 1.03d). Parameters for the VdW energy term are obtained by measuring non-
bonded contact distances in crystals as well as VdW contact data for rare gas atoms,
though other non-experimental sources (simulations) can also be used (Boyd and
Lipkowitz, 1982; Cornell et al., 1995).
$$E_{elec} = \frac{q_i q_j}{\varepsilon\, r_{ij}} \tag{1.03e}$$
The electrostatics contribution to the potential energy function.
The last term in the potential energy function calculates the electrostatic energy
associated with the interaction of two point charges, as described by Coulomb's law.
The magnitude of the electrostatic force between two point charges is directly
proportional to the product of the magnitudes of the charges and inversely proportional
to the square of the distance between them; the corresponding interaction energy is
inversely proportional to the distance itself (Eq. 1.03e). Applications of MM force
fields include, but are not limited to, energy
minimization, scoring, docking, molecular dynamics and Monte Carlo methods.
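The five contributions described above (Eq. 1.03a-e) can be sketched as plain functions. This is only an illustrative sketch: the function names are hypothetical, the harmonic terms are written without a factor of 1/2 (an AMBER-style convention assumed here), and the constant 332.06 assumes charges in units of e, distances in Å and energies in kcal/mol.

```python
import math

COULOMB_K = 332.06  # assumed unit conversion: kcal*Angstrom/(mol*e^2)

def bond_energy(r, r_eq, k_r):
    """Harmonic bond stretching term (Eq. 1.03a)."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, theta_eq, k_theta):
    """Harmonic angle bending term (Eq. 1.03b); angles in radians."""
    return k_theta * (theta - theta_eq) ** 2

def dihedral_energy(phi, v_n, n, gamma):
    """Single Fourier term of the torsional potential (Eq. 1.03c)."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def lennard_jones_energy(r_ij, a_ij, b_ij):
    """Lennard-Jones 6-12 term (Eq. 1.03d): repulsive r^-12, attractive r^-6."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6

def coulomb_energy(q_i, q_j, r_ij, eps=1.0):
    """Coulomb term (Eq. 1.03e) with a constant dielectric eps."""
    return COULOMB_K * q_i * q_j / (eps * r_ij)

# Illustrative values only: a C-C bond stretched by 0.05 Angstrom.
stretched = bond_energy(1.58, 1.53, 310.0)
```

A total force-field energy would sum these terms over all bonds, angles, dihedrals and non-bonded atom pairs of the system.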
Over the past decade, the development of techniques such as high-throughput X-
ray crystallography has expedited the rate of macromolecular structure determination
resulting in a current total of ~70000 crystallographic or solution structures of proteins
deposited in the Protein Data Bank (Berman et al., 2000). The availability of this wealth
of structural information, along with well-documented successes, has generated
considerable interest in the advancement of structure-based drug design (SBDD) techniques
(Marrone et al., 1997). A number of structure-based screening methods have been
developed to expedite pharmaceutical research. These methods have been used in lead
discovery, to identify novel chemical entities showing strong inhibitory activity towards a
target, and in lead optimization, where the careful selection of an optimized lead within a
set of chemically similar compounds is required. The following sections will include an
overall review of the methods which have been most crucial to the development of
molecular modeling in drug design, namely predicting binding free energy and binding
modes, and will be followed by a review of Molecular Dynamics & Virtual Screening
methods which have together significantly contributed to the advancement of CADD.
1.3 Predicting Binding Free Energies - Scoring
Calculations of binding free energies play an important role in the accuracy of
SBDD techniques (Raha and Merz, 2005). The major function of such techniques is in
providing estimates of binding free energies at a faster rate and lower cost than that
possible by experimental means. As such, the correlation between experimental and
computationally derived binding free energies to a target is a prerequisite to their success
in drug design.
The selective binding of a small molecule to a target protein is the result of
complementary structural and energetic features. The binding reaction is governed by the
standard Gibbs free energy of binding, $\Delta G^{\circ}_{bind}$, under standard state conditions
(concentrations of 1 M, a temperature of 298 K and a pressure of 1 atm). The experimentally
determined association, dissociation and inhibitory constants (KA, KD and Ki respectively)
relate to the standard Gibbs free energy as follows:
$$K_A = \frac{1}{K_D} = \frac{1}{K_i} = \frac{[PL]}{[P][L]} \tag{1.04}$$
$$\Delta G^{\circ}_{bind} = -RT \ln K_A \tag{1.05}$$
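As a worked illustration of Eq. 1.05, the sketch below converts a dissociation constant into a standard binding free energy using ΔG° = −RT ln KA = RT ln KD. The function name is hypothetical and the gas constant is expressed in kcal/(mol·K), which is an assumed unit choice.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K), assumed unit convention

def binding_free_energy_from_kd(k_d_molar, temperature=298.0):
    """Standard binding free energy (kcal/mol) from K_D given in mol/L.

    Uses dG = -RT ln(K_A) with K_A = 1/K_D, i.e. dG = RT ln(K_D).
    """
    if k_d_molar <= 0.0:
        raise ValueError("K_D must be positive")
    return GAS_CONSTANT * temperature * math.log(k_d_molar)

# A nanomolar binder versus a ten-fold weaker one.
dg_nanomolar = binding_free_energy_from_kd(1e-9)
dg_ten_nanomolar = binding_free_energy_from_kd(1e-8)
```

Under these assumptions each ten-fold change in KD shifts the binding free energy by RT ln 10, roughly 1.4 kcal/mol at 298 K.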
As the binding free energy of a system is a state function, theoretical calculations of the
binding free energy can approximate the binding free energy in a direct fashion, by
calculating the properties of the protein and ligand individually and then of their
complex:
$$\Delta G_{bind} = G_{complex} - (G_{protein} + G_{ligand}) \tag{1.06}$$
where $\Delta G_{bind}$ is the free energy of binding, $G_{complex}$ is the free energy of the complex,
and $G_{protein}$ and $G_{ligand}$ are the free energies of the protein and ligand respectively. Another
form of expression of the binding free energy used is the decomposition into different
additive free energy components integrated into a single equation:
$$\Delta G_{bind} = \Delta G_{int} + \Delta G_{solv} + \Delta G_{motion} + \Delta G_{conf} \tag{1.07}$$
In Eq. 1.07, $\Delta G_{int}$ is the interaction free energy owing mostly to electrostatic and steric
enthalpic contributions from complex formation, $\Delta G_{solv}$ is the free energy of solvation
which accounts for solvent effects in binding, $\Delta G_{motion}$ is the free energy change
associated with changes in the motion of the components of the system, and $\Delta G_{conf}$
accounts for the free energy due to conformational changes upon complexation. Scoring
functions address these components of the binding free energy differently. Section 1.3.2
reviews the different methods employed to evaluate them. However, before
addressing the differences between scoring methods, an extensive review of solvation
effects on binding is appropriate, as the development of implicit solvation models has had
a tremendous impact in the calculation of solvation free energies and hence of our ability
to estimate binding free energies (Tomasi and Persico, 1994; Orozco and Luque, 2000).
1.3.1 Effect of water: Continuum (Implicit) Solvation energy
Protein-ligand binding is a process that normally occurs within an aqueous
environment. Interactions with water play a significant role in binding energetics and are thus
taken into account when making binding free energy predictions. The dielectric
constant of water at 25°C is 78.5 while that of vacuum is 1. A charged or polar group
therefore gains solvation energy in water, which is the result of a
favorable interaction between the atomic charge and the high-dielectric environment. As
a result of this favorable interaction, there is an energy penalty when polar parts of the
ligand are removed from their contact with water and exposed instead to the binding site.
Additionally, the presence of water results in the effective screening of charge-charge
interactions as indicated by the dielectric constant in the Coulomb equation (Eq. 1.03e).
However, the interface of a protein-ligand complex usually excludes
water molecules. In order to account for the distance dependence of charge-charge
screening by water, a crude screening model that contains a distance-dependent
dielectric constant was introduced. In this model, for all atoms i and j in Eq. 1.03e the
effective dielectric constant is ε = C·rij, where C is a constant and rij is the
interatomic distance. While this model allows for the rapid calculation of one of the
major effects of water, it does not account for the one-body solvation energy of each
atom. Additionally, the electrostatic interaction between two atoms is affected by the
positions of all other protein and ligand atoms, which should also be taken into
account.
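The effect of the distance-dependent dielectric can be seen by comparing it with a constant dielectric in the Coulomb term. The sketch below is illustrative only: the function names are hypothetical, the value C = 4 is an assumed example value, and 332.06 assumes charges in e, distances in Å and energies in kcal/mol.

```python
COULOMB_K = 332.06  # assumed unit conversion: kcal*Angstrom/(mol*e^2)

def coulomb_constant_dielectric(q_i, q_j, r_ij, eps=1.0):
    """Coulomb energy with a constant dielectric: decays as 1/r."""
    return COULOMB_K * q_i * q_j / (eps * r_ij)

def coulomb_distance_dependent(q_i, q_j, r_ij, c=4.0):
    """Coulomb energy with eps = c * r_ij: decays as 1/r^2."""
    eps = c * r_ij
    return COULOMB_K * q_i * q_j / (eps * r_ij)

# Doubling the separation halves the constant-dielectric energy
# but quarters the distance-dependent one.
near = coulomb_distance_dependent(1.0, -1.0, 2.0)
far = coulomb_distance_dependent(1.0, -1.0, 4.0)
```

The faster decay mimics the screening of distant charges by intervening water, without computing any per-atom solvation energy.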
Continuum (implicit) solvation models can account for the additional
complexities of electrostatic interactions. The continuum solvation models essentially
treat the solvent as a bulk dielectric medium with a dielectric constant of ~80 (Dout=78.5
for water at 25°C) and the protein/ligand as low-dielectric regions with enclosed atomic
charges. Numerical solutions to the Poisson-Boltzmann (PB) equation provide an
efficient means of calculating the electrostatic potential produced by a system. The PB
equation relates the electrostatic potential Φ(r) to the charge density ρ(r) as:
$$\nabla \cdot \left[\varepsilon(\mathbf{r}) \nabla \Phi(\mathbf{r})\right] = -4\pi\rho(\mathbf{r}) \tag{1.08}$$
where ε(r) is the dielectric constant. The total free energy of solvation is calculated as
follows:
$$\Delta G_{solv} = \Delta G_{elec} + \Delta G_{np} \tag{1.09}$$
The potential Φ(r), obtained from solving the PB equation (Eq. 1.08), allows for the computation of
the electrostatic component of the solvation free energy in Eq. 1.09:
$$G_{elec} = \frac{1}{2}\sum_i q_i \Phi(\mathbf{r}_i) = \frac{1}{2}\sum_i q_i \left[\phi^{C}(\mathbf{r}_i) + \phi^{R}(\mathbf{r}_i)\right] \tag{1.10}$$
where ϕC and ϕR are respectively the Coulomb and reaction field potentials. The
Coulombic component is calculated as a Coulomb summation over all charges other than
qi:
$$\phi^{C}(\mathbf{r}_i) = \frac{1}{D_{in}} \sum_{j \neq i} \frac{q_j}{r_{ij}} \tag{1.11}$$
The reaction field component ϕR of the electrostatic potential is derived from numerical
solutions to the PB equation, using either a finite difference scheme (FDM) or a boundary
element method (BEM) (Gilson et al., 1988; Honig and Nicholls, 1995; Purisima and
Nilar, 1995). Therefore, once a solution to the PB equation is calculated, the electrostatics
component of the solvation free energy is obtained.
The non-polar component is derived from surface area terms. It contains
contributions from cavity formation and solvent-solute dispersion-repulsion interactions.
These terms are often considered to be proportional to the molecular surface area (Floris
and Tomasi, 1989; Still et al., 1990; Gogonea and Merz, 1999). The general formula for
this term is therefore:
$$\Delta G_{np} = \sum_i \tau_i A_i \tag{1.12}$$
where Ai is the surface area of one solute atom and τi is a surface tension parameter
specific for that atom. Typically, the molecular surface will be defined as the
solvent-excluded surface area or the solvent-accessible surface area. The solvent-excluded
surface may however perform better than other surface models (Pitarch et al., 1996).
The following sections will take a closer look at the two most commonly employed
methods for solving the PB equation, namely the Finite Difference (FDM) and Boundary
Element (BEM) methods.
1.3.1.1 Finite Difference Method
While the FDM was first introduced by Warwicker and Watson, the work of
Honig's group in the development of the Delphi program has widely popularized its use
(Warwicker and Watson, 1982; Gilson et al., 1988). In the FDM, a cubic lattice is first
superimposed onto the solute and surrounding solvent where values of the electrostatic
potential, charge density, dielectric constant and ionic strength are assigned to grid points
within the lattice (Fig. 1.05). As the position of the atoms of the solute usually falls
between grid points, the allocated charge at each of the eight neighboring grid points is
assigned proportionally to the distance of the grid point to the charge. A finite difference
formula then calculates the derivatives of the PB equation.
Figure 1.05 Cubic grid scheme for the Finite Difference Method.
(Taken from Folgaro et al., 2002)
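As a minimal illustration of the finite-difference idea, the sketch below solves the one-dimensional Poisson equation ε φ'' = −4πρ on a uniform grid by Gauss-Seidel relaxation. It is not a description of how the Delphi program works: the function name, the one-dimensional grid, the uniform dielectric and the zero-potential boundaries are all simplifying assumptions.

```python
import math

def solve_poisson_1d(rho, h, eps=1.0, n_iter=2000):
    """Relax eps * phi'' = -4*pi*rho on a grid with spacing h.

    The second derivative is replaced by the central difference
    (phi[k-1] - 2*phi[k] + phi[k+1]) / h^2, and the potential is held
    at zero on both boundary points (assumed boundary condition).
    """
    n = len(rho)
    phi = [0.0] * n
    for _ in range(n_iter):
        for k in range(1, n - 1):
            source = 4.0 * math.pi * rho[k] * h * h / eps
            phi[k] = 0.5 * (phi[k - 1] + phi[k + 1] + source)
    return phi

# Uniform charge density on the interval [0, 1] with 21 grid points.
n_points = 21
spacing = 1.0 / (n_points - 1)
potential = solve_poisson_1d([1.0] * n_points, spacing)
```

For a uniform density the exact solution is the parabola φ(x) = 2πρ x(1 − x), which the central-difference formula reproduces at the grid points once the relaxation has converged.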
1.3.1.2 Boundary Element Method
The BEM is an alternative approach to FDM for solving the PB equation. In the
BEM, the potential is represented as a charge density spread over the molecular surface
(Zauhar and Varnek, 1996). Instead of directly solving the PB equation, the BEM
considers the induced surface charge to develop an integral formulation of the problem.
This is expressed as:
$$\phi^{R}(\mathbf{r}) = \oint_S \frac{\sigma(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d^2\mathbf{r}' \tag{1.13}$$
where $\phi^{R}(\mathbf{r})$ is the electrostatic potential due to the surface charge distribution, σ(r') is the
surface charge density and the integral is taken over the entire molecular surface area
(Zauhar and Morgan, 1985; Purisima and Nilar, 1995). The SIE and SIETRAJ scoring
functions used throughout this thesis for computation of binding free energies, compute
the reaction field energy using the BRI-BEM program which utilizes the BEM (Purisima
and Nilar, 1995; Purisima EO, 1998; Naïm et al, 2007; Cui et al., 2008).
1.3.1.3 Desolvation cost
Continuum solvation studies on the energetics involved in ligand-binding have
been conclusive in noting the large, unfavorable effects of solvent-screening on the
overall electrostatic change in free energy (Kuhn and Kollman, 2000; Wang et al., 2001;
Hou et al., 2002). Complex formation between a protein and ligand involves the breakage
and formation of several hydrogen bonds that includes the reorganization of water
molecules around the ligand and target active site (Fig. 1.06). While the gas-phase
interaction between the ligand and protein is favorable, the desolvation of the binding
pocket involved in ligand-binding results in an overall large energetic penalty. Hence,
ligand-binding is suggested to be primarily driven by short-range (vdW) and long-range
hydrophobic forces (Hünenberger et al., 1999; Kuhn and Kollman, 2000; Wang et al.,
2001; Hou et al., 2002). This phenomenon can be better described by looking at the
electrostatic component of the binding free energy. The electrostatic change in binding
free energy is expressed in Eq. 1.14 as the sum of the change in reaction field and
Coulomb binding free energies:
$$\Delta G_{bind}^{elec} = \Delta G_{bind}^{R} + E_{inter}^{Coul} \tag{1.14}$$
where $\Delta G_{bind}^{R}$ is the change in reaction field energy and $E_{inter}^{Coul}$ is the change in
intermolecular Coulomb energy. Computational studies have noted that while the
intermolecular Coulomb energy favors binding, the desolvation effects are incompletely
compensated by ligand-target interaction in the bound state resulting in an unfavorable
effect on the binding free energy (Hendsch and Tidor, 1999; Miyashita et al., 2003; Sims
et al., 2005).
Figure 1.06 Representation of desolvation effects during ligand-protein complex
formation. (Taken from Cozzini et al., 2004)
1.3.2 Scoring Functions
In their most general use, scoring functions are designed to predict binding modes
of inhibitors and provide an accurate discrimination between true and false binding
modes. Virtual screening pipelines have used scoring functions to improve enrichment
rates and improve lead compound identification (Grüneberg et al., 2002; Seihert MH,
2009). The master equation (Eq. 1.07) described is used as an overall guide for the
development of scoring functions. The following is an overview of scoring functions with
respect to their ability to predict binding affinity and/or affinity ranking, though their
ability to predict binding modes is also of interest (Halperin et al., 2002). Scoring
functions may use all or some of the different terms expressed in Eq. 1.07. Due to the
computational demands in assessing a more rigorous representation of the master
equation, most scoring functions employ a more empirical expression, taking only the
most dominant contributions into account. This provides a fast and accurate means of
predicting binding modes and ranking potential lead compounds in a virtual screening
setting. The three categories of scoring functions that will be reviewed include
physical-chemical, knowledge-based, and empirical functions.
1.3.2.1 Physical-Chemical
The most prominent physical-chemical scoring function is the Molecular
Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) function (Kollman et al.,
2000). The overall format of the function can be summarized as follows:
$$\Delta G_{bind} = \Delta E_{MM} + \Delta G_{solv}^{complex} - \Delta G_{solv}^{protein} - \Delta G_{solv}^{ligand} - T\Delta S \tag{1.15}$$
where $\Delta E_{MM}$ is the sum of the Coulomb electrostatics and vdW interaction energies calculated using
MM force field packages such as AMBER and CHARMM (Case et al., 2005; Brooks et
al., 2009). The $T\Delta S$ term is usually evaluated using normal mode analysis of an MD
trajectory. All calculations are ensemble averages over snapshots taken
from the MD trajectory. The MM/PBSA energy is therefore calculated from averages over a
finite number of snapshots from the ensemble and, as such, the quality of the results is
sensitive to the details of the MD simulation.
The Solvated-Interaction Energy (SIE) scoring function is another example of a
physics-based scoring function. It makes use of force-field parameters and equations to
make estimates of the binding affinity of molecules to a target. The format of the SIE
equation is as follows:
$$\Delta G_{bind} = E_{inter}^{Coul} + \Delta G_{bind}^{R} + E_{inter}^{vdW} + \Delta G_{bind}^{np} \tag{1.16}$$
For the electrostatic component of the free energy ($\Delta G_{bind}^{elec}$), the electrostatic
intermolecular interaction energy $E_{inter}^{Coul}$ is estimated using Coulomb's law and $\Delta G_{bind}^{R}$ is
the change in reaction field solvation energy calculated using the BRI-BEM boundary
element method (Purisima and Nilar, 1995; Purisima, 1998). The change in solvation
energy is calculated as the difference between the bound and free states. For the nonpolar
components of the binding free energy, $\Delta G_{bind}^{np}$ is the change in the non-electrostatic
solvation energy and $E_{inter}^{vdW}$ is the vdW interaction energy calculated using the LJ 6-12
potential between the ligand and protein atoms. $\Delta G_{bind}^{np}$ is calculated as the difference
between the bound and free states of the solute-water VdW energy and cavitation cost in
water.
As described in Section 1.3.1, the cavitation cost (nonpolar component of the
solvation energy) is proportional to the surface area (Eq. 1.12). The definition of the
surface area can be different from function to function. In the case of SIE, the molecular
surface area is calculated as the solvent-excluded surface area (Naim et al., 2007).
$$\Delta G_{bind}^{cav} = \gamma \cdot \Delta SA \tag{1.17}$$
However, the loss of intermolecular VdW interaction between solute and solvent is also
added to the cavitation cost. This is accomplished by linearly scaling the solute-solute
intermolecular VdW energy by a factor β, thereby accounting for the loss of solute-solvent VdW
interactions upon complex formation:
$$\Delta G_{bind}^{np} = (\beta - 1)\, E_{inter}^{vdW} + \gamma \cdot \Delta SA \tag{1.18}$$
The complete parameterization of the SIE scoring function is dependent on a number of
variables that include the solute dielectric constant (Din), solute atomic radii {ri}, SA
scaling coefficient (γ), vdW interaction energy scaling coefficient (β) and fitting constant
(C) (Naim et al., 2007):
$$\Delta G_{bind}(\{r_i\}, D_{in}, \beta, \gamma, C) = E_{inter}^{Coul}(D_{in}) + \Delta G_{bind}^{R}(\{r_i\}, D_{in}) + \beta \cdot E_{inter}^{vdW} + \gamma \cdot \Delta SA(\{r_i\}) + C \tag{1.19}$$
One general issue with empirically parameterized scoring functions is tied to their
training set: its composition and diversity can bias the function towards the targets
that were explored as part of it (Gohlke and Klebe, 2002; Ferrera et al., 2004).
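The way the fitted parameters enter the SIE expression (Eq. 1.19) can be sketched as a simple weighted sum. This is only an illustration of the functional form as described above, not the SIE implementation: the function name is hypothetical, and all input energies and parameter values below are made-up numbers chosen for easy arithmetic.

```python
def sie_style_binding_energy(e_coul, dg_reaction_field, e_vdw, delta_sa,
                             beta, gamma, c):
    """Weighted sum following the form of Eq. 1.19.

    e_coul            intermolecular Coulomb energy (computed with D_in)
    dg_reaction_field change in reaction field energy upon binding
    e_vdw             intermolecular vdW energy, scaled by beta
    delta_sa          change in molecular surface area, scaled by gamma
    c                 fitting constant
    """
    return e_coul + dg_reaction_field + beta * e_vdw + gamma * delta_sa + c

# Hypothetical inputs (kcal/mol and square Angstrom), not fitted values.
estimate = sie_style_binding_energy(
    e_coul=-10.0, dg_reaction_field=8.0, e_vdw=-40.0,
    delta_sa=-500.0, beta=0.5, gamma=0.01, c=-2.0,
)
```

In this made-up case the favorable Coulomb energy is largely cancelled by the reaction field (desolvation) term, leaving the scaled vdW and surface-area terms to dominate the estimate.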
1.3.2.2 Empirical (Regression) Scoring Functions
Empirical scoring functions weight contributions from the different energetic
terms in order to make a binding affinity prediction. These terms may include
hydrogen-bonding terms based on geometric measures as well as FF-based physical potentials.
The linear weighting of the terms is derived from regression methods that fit the computed
terms to experimental affinities using structural information. The
regression analysis optimizes the weights to provide a maximal correlation between
computed and experimental binding affinities in the training set (Bohm et al., 1994;
Verkhivker et al., 1995; Head et al., 1996; Naim et al., 2007).
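The regression step can be illustrated with the smallest possible case: two energetic terms whose weights are fitted by ordinary least squares. The sketch below solves the 2x2 normal equations directly; the function name and the toy data are hypothetical, and real empirical functions use more terms and much larger training sets.

```python
def fit_two_term_weights(term_a, term_b, affinities):
    """Least-squares weights (w_a, w_b) for affinity ~ w_a*a + w_b*b."""
    s_aa = sum(a * a for a in term_a)
    s_bb = sum(b * b for b in term_b)
    s_ab = sum(a * b for a, b in zip(term_a, term_b))
    s_ay = sum(a * y for a, y in zip(term_a, affinities))
    s_by = sum(b * y for b, y in zip(term_b, affinities))
    det = s_aa * s_bb - s_ab * s_ab  # zero if the two terms are collinear
    w_a = (s_ay * s_bb - s_by * s_ab) / det
    w_b = (s_by * s_aa - s_ay * s_ab) / det
    return w_a, w_b

# Toy training set: affinities generated from known weights 0.5 and 2.0.
vdw_terms = [-3.0, -1.0, -4.0, -2.0]
hbond_terms = [-1.0, -2.0, -1.0, -3.0]
measured = [0.5 * a + 2.0 * b for a, b in zip(vdw_terms, hbond_terms)]
weights = fit_two_term_weights(vdw_terms, hbond_terms, measured)
```

Because the fitted weights depend entirely on the complexes in the training set, the same procedure applied to a different set of targets would generally yield different weights.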
1.3.2.3 Knowledge-based Scoring Functions
Knowledge-based (KB) scoring functions use statistical potentials that are derived
from databases of protein-ligand complexes such as the PDB (Koppensteiner and Sippl,
1998; Muegge et al., 2000; Gohlke and Klebe, 2001). The use of KB potentials for the
scoring of protein-ligand complexes was inspired by the success of such potentials in
predicting protein folding and structure (Sippl, 1990; Sippl, 1993; Sippl et al., 1996). In
KB functions, occurrences of interacting pairs of atoms in a training set of complexes are
used to derive statistical potentials that resemble, but are not, potentials of mean force
(Ben-Naim, 1997). In doing so, certain assumptions are made. The first is that the
protein-ligand complex structures are in a state of thermodynamic
equilibrium, while the second is that the distributions of atoms in the complexes obey
Boltzmann's law (Sippl et al., 1993; Mullinax and Noid, 2010).
KB potentials are built by first calculating a distance-dependent probability
distribution of atom pairs. The Helmholtz free energy is then calculated per atom pair in
the protein-ligand complex:
$$A_{ij}(r) = -k_B T \ln\left[\frac{\rho_{ij}(r)}{\rho_{ij}^{bulk}}\right] \tag{1.20}$$
where ρij(r) is the pair correlation function for an atom pair of type ij at distance r while
$\rho_{ij}^{bulk}$ is a normalization factor representing the bulk density for the atom pair when they
are not interacting. A few notable examples of KB scoring functions
include the piecewise linear potential (PLP), PMFScore and DrugScore (Verkhivker et
al., 1995; Muegge and Martin, 1999; Gohlke et al., 2000).
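The inverse-Boltzmann step of Eq. 1.20 can be sketched as follows. The function name, the binned densities and the bulk reference value are hypothetical; published KB functions differ in how they bin distances and define the reference state.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K), assumed units

def kb_pair_potential(pair_density, bulk_density, temperature=298.0):
    """Statistical pair potential per distance bin, following Eq. 1.20.

    Bins in which the pair was never observed are given an infinite
    (fully unfavorable) energy, which is an assumption of this sketch.
    """
    potential = []
    for rho in pair_density:
        if rho > 0.0:
            potential.append(-K_B * temperature * math.log(rho / bulk_density))
        else:
            potential.append(math.inf)
    return potential

# Hypothetical densities: enriched, bulk-like and depleted distance bins.
energies = kb_pair_potential([2.0, 1.0, 0.5], bulk_density=1.0)
```

Distances at which a pair is observed more often than in the bulk receive a favorable (negative) energy, and depleted distances an unfavorable one.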
1.3.2.4 Problems
The major shortcoming of most scoring techniques like SIE is that they only
consider a single receptor-compound structure in estimating binding free energies of
what is in nature a dynamic process resulting from an ensemble of such complexes. The
use of ensembles as part of a Virtual Screening pipeline is explored in Chapter 5 of this
thesis. Despite phenomenal advances in computational power and
technologies, accurate estimation of binding free energies remains challenging.
1.4 Predicting Binding Modes – Docking
Predicting binding modes of ligands to a target protein structure, also known as
docking, has been a key component of in silico techniques used in structure-based,
rational drug design (Kuntz I, 1992; Cavasotto and Orry, 2007). Docking schemes
attempt to find the optimal matching between a ligand and a targeted protein. In essence,
the problem can be reduced to the following: given the atomic coordinates of these two
molecules, predict the proper conformation of the complex. One assumption that is
usually taken into the docking problem is prior knowledge of the binding site targeted by
the ligand.
Docking schemes are typically validated by their ability to reproduce
experimental data through docking studies where protein-ligand complex conformations
are obtained in silico and compared to structures obtained by experimental means (i.e. X-
ray crystallography or nuclear magnetic resonance). Since predicting the correct bound
conformation of both the protein and ligand is a challenging and computationally
expensive task, the problem is usually reduced to the following: given the proper “bound”
conformation of the protein, predict the proper bound conformation of the ligand and
complex. This problem is the focus of the large majority of docking algorithms though a
few incorporate a sampling of receptor conformation as well to optimize the predicted
complex coordinates.
The main purposes of docking algorithms can be divided into two groups, though
the two are not mutually exclusive. The first emphasizes speed and
accuracy, where the main goal is the rapid screening of millions of potential candidate
molecules for the discovery of a few active compounds in virtual screening (see Section
1.6). The second emphasizes accuracy of the complex structure, attempting to narrow the
gap between the predicted complex and the experimental structure.
Docking programs search through a large selection of possible fits between a ligand and
the targeted binding pocket and assess the best fit between them by taking into account
several parameters. These parameters are akin to those used in scoring functions; the
fit assessment is, in essence, a scoring function. In this case however, the scoring scheme is optimized to
retrieve the binding mode that is closest to the experimental structure as measured by
the root-mean-square deviation (RMSD).
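The RMSD between a predicted and an experimental ligand pose can be computed as sketched below. The function name is hypothetical, and the sketch assumes that both poses list the same atoms in the same order and are already expressed in the same receptor coordinate frame, so no superposition is performed.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom pose and a copy shifted by 1 Angstrom along x.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]
deviation = rmsd(crystal_pose, docked_pose)
```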
1.4.1 Docking Algorithms
1.4.1.1 Fast Shape Matching
Shape Matching algorithms primarily take into account the overall geometrical
overlap between the protein and ligand molecules. Shape matching methods employ a
variety of algorithms in order to assess proper conformations of the ligand and binding
site to be matched.
Rigid-body docking applications are mainly based on shape matching (SM). Examples include
ZDOCK as well as our own internally developed docking program (Chen et al., 2003;
unpublished). Flexible docking algorithms can also use SM methods as part of their
strategy. For instance, DOCK combines incremental construction and a sphere matching
algorithm in order to identify an optimal geometrical alignment (Kuntz et al., 1982).
The development of methods that analytically calculate the solvent-accessible molecular
surface was a key contributor that allowed the development of these SM applications
(Connolly et al., 1983a; Connolly et al., 1983b).
1.4.1.2 Incremental construction
Incremental construction methods divide the ligand into fragments which are
separately docked onto the surface of the binding site. The fragments identified as rigid
“anchor” regions of the ligand are typically docked first and the fragments identified as
flexible regions are added sequentially with a systematic scanning of the torsion angles
around the anchors. Following the docking, the fragments are fused together to
obtain an optimal orientation of the molecule. This fragmentation of the molecule is
a means of incorporating ligand flexibility into docking.
The first incremental construction (IC) algorithm was part of the DOCK program
(Desjarlais et al., 1986). In DOCK, the rigid fragments were first docked independently,
and docked fragments were then rejoined as in the original compound if their atoms were within
certain distances of each other. Other methods utilizing IC include FlexX, FLOG,
Hammerhead and Surflex (Miller et al., 1994; Rarey et al., 1996; Welch et al., 1996;
Kramer et al., 1999; Ewing et al., 2001; Jain AN, 2007).
1.4.1.3 Monte Carlo Simulations
The Monte Carlo (MC) method was developed in its present form by Metropolis,
Ulam and von Neumann during their work on the Manhattan Project (Metropolis and Ulam,
1949; Metropolis et al., 1953). Historically, it was used to perform the first computer
simulation of a molecular system. MC simulations were later integrated as a means of
adding flexibility to docking algorithms (Liu and Wang, 1999). With respect to docking
algorithms, MC simulations attempt to position the ligand within the binding site through
a number of random translational and rotational changes. The advantage of the added
randomness to the sampling is a decreased likelihood of being trapped in local minima.
The standard (Metropolis) MC methods generate configurations of a system
through random Cartesian changes. Each change to the system is evaluated and then
rejected or accepted based on a Boltzmann probability. One example of a MC-based
docking application is the Internal Coordinates Mechanics (ICM) program (Abagyan et
al., 1994). ICM initially makes a random move of one of three types: a rigid body ligand
move, a torsion move of the ligand or a torsion move of the receptor side chain (Abagyan
and Totrov, 1994; Abagyan et al., 1994). The side chain movement samples the
conformational space defined a priori through a side-chain rotamer library (Ponder and
Richards, 1987). The side chain sampling allows the algorithm to explore with larger
probability the conformational space which is known to be highly populated. Following
each sampling step, a conjugate gradient local minimization is performed with a modified
ECEPP/3 scoring function, and the resulting conformation is accepted or rejected
using the Boltzmann criterion.
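The Metropolis acceptance rule described above can be sketched on a toy one-dimensional energy surface. This is not the ICM procedure: the function names, the double-well energy function, the step size and the fixed random seed are all assumptions made for illustration, and no local minimization is performed after each move.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K), assumed units

def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))

def monte_carlo_search(energy, x0, n_steps=5000, step=0.5,
                       temperature=300.0, seed=0):
    """Random-walk Metropolis sampling; returns the best state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)  # random move
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, temperature, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def double_well(x):
    """Toy energy with a shallow minimum near x = 1 and a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

best_position, best_energy = monte_carlo_search(double_well, x0=1.0)
```

Because uphill moves are occasionally accepted, the walk started in the shallow well has a chance of crossing the barrier to the deeper one, which a purely downhill search would not.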
1.4.1.4 Evolutionary Programming
Evolutionary programming (EP) algorithms are computational models that take
their name and concept from biological processes. The EP algorithms generally start with
a population of structures characterized by a given set of genes. Parent structures are then
allowed to produce children structures containing a mixture of the structural characteristics of
the parents (as defined by the parents' genes), throughout which mutations are allowed to
occur. The individuals of the population displaying the most favorable features are kept
while others are discarded, as per Darwin's principle of natural selection.
Genetic algorithms (GA) are one example of EP algorithms. In GA, a population
of chromosomes (parents) is used to create new chromosomes (offspring). Crossovers
are used to generate the new chromosomes and a complex set of scoring functions is
then used to select members within each round of selection. DOCK and GOLD are two of
the most notable docking programs utilizing variations on GAs (Ewing et al., 2001;
Verdonk et al., 2003). While EP algorithms can find one of the best solutions to the
docking problem, they, like all heuristic algorithms, can also be trapped in local minima.
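The selection, crossover and mutation cycle can be sketched with a very small genetic algorithm that minimizes a toy scoring function. It does not reproduce the algorithm of any docking program: the function name, the real-valued gene encoding, the elitist selection of the better half, the mutation settings and the fixed random seed are all assumptions of this sketch.

```python
import random

def genetic_search(score, n_genes, pop_size=30, n_generations=40,
                   mutation_rate=0.1, seed=0):
    """Minimize score() over real-valued chromosomes of length n_genes."""
    rng = random.Random(seed)
    population = [[rng.uniform(-3.0, 3.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    initial_best = min(score(ind) for ind in population)
    for _ in range(n_generations):
        population.sort(key=score)
        survivors = population[: pop_size // 2]  # selection: keep the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]  # single-point crossover
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]             # random mutation
            children.append(child)
        population = survivors + children
    best = min(population, key=score)
    return best, score(best), initial_best

def toy_score(genes):
    """Toy stand-in for a docking score: squared distance from the origin."""
    return sum(g * g for g in genes)

best_genes, best_score, start_score = genetic_search(toy_score, n_genes=4)
```

Keeping the surviving parents unchanged guarantees that the best score never gets worse from one generation to the next, but it does not guarantee that the global minimum is found.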
1.5 Molecular Dynamics
Molecular recognition between a protein and its ligand is a dynamic and complex
process. An accurate computational representation of this interaction is a problem of
considerable complexity and interest in CADD. Few techniques address this process and
account for the conformational flexibility of both the ligand and receptor. Even fewer do
so in an accurate and efficient manner. Protein flexibility is a multi-factorial, complex
problem owing to the inter- and intramolecular interactions involved in the
conformational dynamics. Of all methods commonly used, Molecular Dynamics (MD)
simulations provide the most complete computational representation of the dynamics
involved in this process.
1.5.1 Newton’s Laws
Molecular dynamics methods solve Newton's equations of motion for atoms on
an energy surface. Newton's laws of motion provide the means of generating successive
conformations of the system. The result of these successive conformations is a trajectory
that indicates how the positions and velocities of particles within the system vary with
time. Newton’s laws of motion can be summarized as follows:
First law: Every body remains in a state of constant velocity unless acted upon by an
external unbalanced force. Hence, if the resultant force is zero, then the velocity of the
object is constant (Eq. 1.21):
$$\text{I.}\quad \sum \mathbf{F} = 0 \;\Longrightarrow\; \frac{d\mathbf{v}}{dt} = 0 \tag{1.21}$$
Second law: A body of mass m subject to a net force F undergoes an acceleration a that
has the same direction as the force and a magnitude that is directly proportional to the
force and inversely proportional to the mass:
$$\text{II.}\quad \mathbf{F} = m\frac{d\mathbf{v}}{dt} = m\mathbf{a} \tag{1.22}$$
Third law: The mutual forces of action and reaction between two bodies are equal,
opposite and collinear, i.e. whenever a first body exerts a force F on a second body, the
second body exerts a force –F on the first:
$$\text{III.}\quad \mathbf{F}_{1,2} = -\mathbf{F}_{2,1} \tag{1.23}$$
In order to obtain an accurate trajectory, the differential equation embodied by Newton's
second law of motion is solved:
$$\frac{d^2 x_i}{dt^2} = \frac{F_{x_i}}{m_i} \tag{1.24}$$
which describes the motion of a particle of mass mi along one coordinate xi, with $F_{x_i}$ the
net force on the particle along that direction.
1.5.2 Ensembles
MD simulations are characterized with regards to the macroscopic conditions that
are held constant. Statistical mechanics requires that certain macroscopic conditions
be held constant in order to study the collection of all microstates of a system, its
ensemble. Ensembles can therefore be characterized by different quantities that include
volume (V), pressure (P), total energy (E), temperature (T) and number of particles (N).
Ensembles are named and labeled according to the quantities held fixed: NVT
(canonical), NVE (micro-canonical) and NPT (isothermal-isobaric).
1.5.3 Verlet Algorithm
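As discussed above (see Section 1.2.1), nuclei behave to a good approximation as
classical particles. The dynamics of motion can therefore be obtained by solving
Newton’s second law:
$$\mathbf{F} = -\frac{dV}{d\mathbf{x}} = m\frac{d^2\mathbf{x}}{dt^2} \qquad (1.25)$$
Here, V is the potential energy at position x, and x is a vector of length 3N
containing the Cartesian coordinates of all N particles. With the particles initially at
positions $\mathbf{r}_i$ (the coordinates at time step i), the positions a small time-step Δt later can be calculated using a Taylor
expansion (Eq. 1.26, 1.27):
$$\mathbf{r}_{i+1} = \mathbf{r}_i + \frac{\partial \mathbf{r}}{\partial t}(\Delta t) + \frac{1}{2}\frac{\partial^2 \mathbf{r}}{\partial t^2}(\Delta t)^2 + \frac{1}{6}\frac{\partial^3 \mathbf{r}}{\partial t^3}(\Delta t)^3 + \cdots \qquad (1.26)$$
$$\mathbf{r}_{i+1} = \mathbf{r}_i + \mathbf{v}_i(\Delta t) + \frac{1}{2}\mathbf{a}_i(\Delta t)^2 + \frac{1}{6}\mathbf{b}_i(\Delta t)^3 + \cdots \qquad (1.27)$$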
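where the velocities $\mathbf{v}_i$, the accelerations $\mathbf{a}_i$ and the hyper-accelerations $\mathbf{b}_i$ are the first,
second and third derivatives of the positions with respect to time. Substituting Δt with -Δt
we obtain the positions $\mathbf{r}_{i-1}$:
$$\mathbf{r}_{i-1} = \mathbf{r}_i - \mathbf{v}_i(\Delta t) + \frac{1}{2}\mathbf{a}_i(\Delta t)^2 - \frac{1}{6}\mathbf{b}_i(\Delta t)^3 + \cdots \qquad (1.28)$$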
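By adding the equations for $\mathbf{r}_{i+1}$ and $\mathbf{r}_{i-1}$ we are able to calculate the positions a time Δt later from
the current acceleration and the previous and current positions:
$$\mathbf{r}_{i+1} = (2\mathbf{r}_i - \mathbf{r}_{i-1}) + \mathbf{a}_i(\Delta t)^2 \qquad (1.29)$$
where the current acceleration can be obtained from the force or the derivative of the
potential:
$$\mathbf{a}_i = \frac{\mathbf{F}_i}{m_i} = -\frac{1}{m_i}\frac{dV}{d\mathbf{r}_i} \qquad (1.30)$$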
As the acceleration is re-evaluated at each time step from the forces, the positions are
updated at each time-step, which creates the resulting trajectory. This, in essence, is
the Verlet algorithm (Verlet, 1967). Certain disadvantages of the Verlet algorithm have
given rise to the use of alternative algorithms for MD simulations. The first disadvantage
is the tendency towards truncation errors arising from finite numerical precision. This is a
consequence of adding $\mathbf{a}_i(\Delta t)^2$ (a small number) to $2\mathbf{r}_i - \mathbf{r}_{i-1}$ (a difference of two large
numbers) in the calculation of the new positions. The
second is that velocities are not an explicit part of the Verlet algorithm, which creates a
problem in generating constant temperature ensembles (Cuendet and van Gunsteren,
2007). The velocity Verlet algorithm is a variation that addresses these problems (Martys
and Mountain, 1999).
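The two integrators discussed in this section can be sketched in a few lines of Python. This is a minimal illustration for a single coordinate, not the implementation used by any MD package. The harmonic test force, the unit mass and force constant, and all function names are assumptions chosen for the example. The first function applies the position update of the Verlet algorithm, while the second propagates positions and velocities together in the velocity Verlet form.

```python
import math


def harmonic_accel(x, k=1.0, m=1.0):
    """Acceleration a = F/m = -(1/m) dV/dx for V = k x^2 / 2 (illustrative test system)."""
    return -k * x / m


def verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Position Verlet: x_next = 2*x_curr - x_prev + a(x_curr) * dt**2."""
    # The scheme needs a previous position; estimate it with a backward
    # Taylor step truncated after the acceleration term.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt ** 2
    x_curr = x0
    positions = [x_curr]
    for _ in range(n_steps):
        x_next = 2.0 * x_curr - x_prev + accel(x_curr) * dt ** 2
        x_prev, x_curr = x_curr, x_next
        positions.append(x_curr)
    return positions


def velocity_verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Velocity Verlet: velocities are propagated explicitly alongside positions."""
    x, v, a = x0, v0, accel(x0)
    positions, velocities = [x], [v]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2
        a_new = accel(x)                 # forces at the new positions
        v = v + 0.5 * (a + a_new) * dt   # average of old and new accelerations
        a = a_new
        positions.append(x)
        velocities.append(v)
    return positions, velocities
```

For the harmonic test system with unit mass and force constant, the exact solution starting from x = 1, v = 0 is cos(t), which gives a simple check on either integrator. Because velocity Verlet carries the velocities explicitly, the kinetic energy (and hence the temperature) is available at every step.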
1.5.4 Considerations
MD simulations require small time-steps and are time-intensive with regard to the
calculation of phenomena such as bond stretching and angle-bending motions. The size
of the chosen time-step is a critical element affecting the accuracy of the trajectory, with
smaller time-steps providing a better approximation of the expected dynamics of the
system. This, however, also increases the computational cost, as more steps are required
to propagate the system for a given total time. Generally, the longest time-step that can
be taken is limited by the rate of the fastest process being sampled in the system.
Typically, the time-step must be one order of magnitude smaller than the period of the
fastest process. In MD simulations, molecular rotations and vibrations occur with
frequencies in the range of $10^{11}$ to $10^{14}$ s$^{-1}$. Therefore, time-steps on the order of $10^{-15}$ s or less are
required for sampling of these molecular motions. A consequence of this limitation is that
an MD simulation of 1 nanosecond (ns) requires ~$10^{6}$ time-steps to complete, each
involving a full re-evaluation of the forces. Since simulations are typically in the
nanosecond range or longer, this presents a
significant computational demand. Additionally, biological phenomena such as protein
folding typically occur on even longer, microsecond, timescales.
One solution to this problem involves the freezing of the fastest molecular
motions. This allows for significantly longer time-steps to be used while affecting the
overall accuracy minimally. This is made possible because the fastest processes, the
stretching vibrations, have a minimal impact on the properties of the trajectory. This is
especially true for bonds involving hydrogen atoms. Therefore, freezing of bond lengths
involving hydrogen atoms results in longer simulated times for a given number of
calculated time-steps. The SHAKE and RATTLE algorithms provide the constraints
necessary to hold bonds involving hydrogen atoms fixed during the simulation and
typically allow time-steps to be increased two- to three-fold (Ryckaert et al., 1977;
Andersen, 1983).
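The idea behind a bond-length constraint can be illustrated with a minimal SHAKE-style sketch for a single bond between two atoms. It assumes that the positions before the step already satisfy the constraint and that an unconstrained integration step has just been taken. The function name, tolerance and iteration limit are choices made for this example; production codes handle many coupled constraints at once.

```python
import math


def shake_bond(r1_old, r2_old, r1_new, r2_new, m1, m2, bond_length,
               tolerance=1e-10, max_iterations=100):
    """Iteratively correct two unconstrained positions so that their
    separation equals bond_length.

    r1_old, r2_old: positions before the step (assumed to satisfy the constraint).
    r1_new, r2_new: positions after an unconstrained integration step.
    """
    r1, r2 = list(r1_new), list(r2_new)
    # Corrections are applied along the bond vector of the previous step.
    old_bond = [a - b for a, b in zip(r1_old, r2_old)]
    inv_mass_sum = 1.0 / m1 + 1.0 / m2
    target_sq = bond_length ** 2
    for _ in range(max_iterations):
        bond = [a - b for a, b in zip(r1, r2)]
        deviation = sum(c * c for c in bond) - target_sq
        if abs(deviation) <= tolerance * target_sq:
            return tuple(r1), tuple(r2)
        projection = sum(a * b for a, b in zip(bond, old_bond))
        # First-order estimate of the multiplier that removes the deviation.
        g = deviation / (2.0 * inv_mass_sum * projection)
        for k in range(3):
            r1[k] -= g * old_bond[k] / m1   # displacements are mass-weighted,
            r2[k] += g * old_bond[k] / m2   # so the centre of mass is unchanged
    raise RuntimeError("SHAKE did not converge")
```

Because the displacement of each atom is inversely proportional to its mass, a light hydrogen atom absorbs most of the correction while the heavy atom it is bonded to barely moves.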
Lastly, overcoming energy barriers can be a challenging task given that any
motion of a conformational ensemble outside of its minimum in the potential energy
surface will generate a force pulling the system back towards its minimum. A number of
novel algorithms such as Replica-Exchange and MetaDynamics attempt to overcome this
limitation using different means (Sugita and Okamoto, 1999; Laio and Parrinello, 2002).
1.5.5 Boundary Conditions
MD simulations of a solvated system usually involve several hundred or thousand
molecules of solvent. However, in order for macroscopic properties to be realistically
calculated from a limited number of solvent molecules, boundary effects require special
considerations. When considering that a water-filled 1 L cube contains $3.3 \times 10^{25}$
molecules of water at room temperature, $2 \times 10^{19}$ of which will be interacting with the
cube’s boundary, it is easy to see why using a computationally tractable number of
molecules will be insufficient for deriving bulk properties. In a system containing a few
thousand water molecules, most would be under the influence of interactions with the
boundary.
Periodic boundary conditions basically replicate the bulk properties of a fluid
given a limited number of solvent molecules. The system is usually prepared within the
confines of a box having a cubic or other polyhedral geometry (Bekker, 1997). The box is
then replicated in all directions (Fig. 1.07). If a solvent molecule leaves the box during
the simulation it is replaced by an image particle entering the box from the opposite side
(Fig. 1.07). A constant number of solvent molecules within the box is therefore
maintained. This configuration allows for particles within the system to experience forces
as if they were within bulk fluid.
Figure 1.07 Periodic boundary conditions in molecular dynamic simulations. (box
reproduced from www.phy.cmich.edu/people/petkov/isaacs/phys/pbc.html)
1.5.6 Long-Range Electrostatic Calculations: The Ewald Summation Method
The interaction energy between two point charges decays at a rate proportional to
$r^{-1}$. This creates a computational problem in considering the long-range electrostatic
contributions from atoms located outside of the central box. One solution is using
spherical cutoffs, which essentially eliminates electrostatic contributions beyond the
cutoff. However, cutoffs have been documented to result in severe artifacts in MD
simulations of peptides and nucleic acids (Schreiber and Steinhauser, 1992; York et al.,
1995).
Ewald summation methods allow the potential due to the partial charges of a
system and all of their periodic images to be considered. In Ewald summation, the
position of each image box is related to the central box through a vector. Each vector is
therefore an integral multiple of the length of the box. Generally, the contribution of
charge-charge interactions within the central box to the potential energy can be written
as:
$$V = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \qquad (1.31)$$
where $r_{ij}$ is the distance between charges i and j.
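The periodic bookkeeping of Section 1.5.5 and the central-box sum of Eq. 1.31 can be sketched as follows. This is only an illustration for a cubic box. The function names are hypothetical, the Coulomb prefactor is set to 1 (reduced units) as an assumption, and the i = j self-term is skipped because it would otherwise divide by zero. The sketch does not implement the Ewald sum itself, which additionally accounts for all periodic images.

```python
import math

# Assumption: reduced units in which 1 / (4 * pi * epsilon_0) equals 1.
COULOMB_PREFACTOR = 1.0


def wrap_into_box(position, box_length):
    """Map a position back into the central cubic box [0, box_length)."""
    return tuple(c % box_length for c in position)


def minimum_image_vector(r_i, r_j, box_length):
    """Vector from j to the nearest periodic image of i (cubic box)."""
    delta = []
    for a, b in zip(r_i, r_j):
        d = a - b
        d -= box_length * round(d / box_length)
        delta.append(d)
    return tuple(delta)


def central_box_coulomb_energy(positions, charges, prefactor=COULOMB_PREFACTOR):
    """Charge-charge energy within the central box (double sum of Eq. 1.31)."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            r_ij = math.dist(positions[i], positions[j])
            # The factor 1/2 corrects for counting each pair twice.
            energy += 0.5 * prefactor * charges[i] * charges[j] / r_ij
    return energy
```

A particle that leaves through one face is returned by `wrap_into_box` on the opposite side, which is the image replacement described above; `minimum_image_vector` gives the separation that a short-range cutoff scheme would use.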
1.6 Virtual Screening
As economic pressures increase to deliver target-optimized drugs
at an accelerated pace and minimal cost, computational methods have become an
increasingly important tool in drug discovery efforts. While numerous challenges
persist in the accurate in silico prediction of ligand-target interactions,
computational methods have already proved themselves in the successful development of
numerous pharmaceutical medications (see Section 1.7). Of note is the role of virtual
screening (VS) in lead discovery efforts. VS provides the ability to analyze large
compound databases and to predict which compounds are most likely to interact
with the desired target and become promising lead candidates. These candidates can then
be tested, and successful molecules can go through rounds of optimization. VS
therefore circumvents the expense incurred through large-scale screening efforts and
narrows the search to a few high-potential candidates (Oprea and Matter, 2004).
1.6.1 Virtual Screening Pipeline
A VS pipeline is designed to optimize the use of computational resources for
efficiency and speed at the initial stages and for accuracy at the later stages, giving the
best overall performance of the
pipeline. Earlier stages of the filtering process minimize the use of
computational resources, thereby optimizing speed, by using soft scoring functions. More
extensive calculations and sampling methods are reserved for the later stages of the
pipeline, where careful selection of the candidates with the highest potential is required
(Fig. 1.08).
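The funnel logic of such a pipeline can be sketched in Python. This is a schematic illustration only: the compound names, the score values and the function `run_screening_funnel` are hypothetical, and lower scores are assumed to indicate better predicted binding. Each stage ranks the surviving compounds with its own scoring function and passes on only the best few, so that the expensive scoring is applied to a small subset.

```python
def run_screening_funnel(compounds, stages):
    """Apply successive (score_function, keep_count) stages to a compound list.

    Cheap stages come first; each stage keeps the keep_count best-scoring
    compounds (lowest score) for the next, more expensive stage.
    """
    survivors = list(compounds)
    for score_function, keep_count in stages:
        ranked = sorted(survivors, key=score_function)
        survivors = ranked[:keep_count]
    return survivors


# Hypothetical library with precomputed scores standing in for real calculations.
library = [
    {"name": "cpd-A", "fast_score": -7.1, "refined_score": -9.4},
    {"name": "cpd-B", "fast_score": -6.8, "refined_score": -10.2},
    {"name": "cpd-C", "fast_score": -3.0, "refined_score": -11.0},
    {"name": "cpd-D", "fast_score": -6.5, "refined_score": -5.1},
]

stages = [
    (lambda c: c["fast_score"], 3),     # fast, approximate filter
    (lambda c: c["refined_score"], 2),  # slower, more accurate re-scoring
]

hits = run_screening_funnel(library, stages)
```

In this toy data set, cpd-C has the best refined score but is discarded by the fast filter, which illustrates why the early scoring stage must be robust against false negatives.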
Figure 1.08 The Virtual Screening Pipeline.
1.6.2 The Target
Target selection is the first step to any structure-based drug discovery project.
Several requirements must be met. The first involves the target’s druggability (Hajduk et
al., 2005). The second involves the availability and choice of the 3D structure used for
the screening. X-ray crystallography or NMR structures are the preferred choices though
VS projects have been successfully run on homology models as well (Evers and
Klabunde, 2005). Since the majority of VS software has limited considerations with
regards to target flexibility, the choice of structure should be aimed towards one where
the conformation of the binding site is akin to that expected when bound to a small
molecule (Sousa et al., 2006).
Following the careful selection of target and structure, preparation of the target
structure is another important task in the VS preparation steps. The primary consideration
is in the assignment of proper protonation states to active-site residues. Difficulties arise
due to the effects of local electrostatic conditions on the pKa values of side-chain
functional groups. With respect to the success of VS, proper assignment of side-chain
protonation states is crucial in providing an accurate representation of the binding-site
characteristics. A few alternatives exist which integrate the electrostatic effects in
assessing the protonation states of side-chain functional groups. One example is the H++
server which predicts the protonation states of amino-acid side chain functional groups
within the continuum electrostatic framework (Gordon et al., 2005).
1.6.3 The Compound Database
A database should first provide optimal structural diversity so as to maximize the
chances of finding numerous scaffolds displaying activity against the target. Generally,
compounds should also adhere to Lipinski’s rule of five (Lipinski et al., 2001).
Several small-molecule databases exist which are routinely used for VS. These include
the ZINC library, the National Cancer Institute compound database, and the Accelrys
Available Chemical Directory and MDDR libraries (Milne et al., 1994; Irwin and
Shoichet, 2005). Most major pharmaceutical companies also have in-house corporate
libraries.
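A rule-of-five filter is simple to express in code. The sketch below assumes that the four descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` class, the function names and the `max_violations` parameter are illustrative choices rather than part of any particular package. The thresholds are the usual ones: molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors and more than 10 hydrogen-bond acceptors each count as one violation.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed to come from another tool)."""
    molecular_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.molecular_weight > 500.0,
        d.log_p > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the compound has at most max_violations violations."""
    return rule_of_five_violations(d) <= max_violations
```

Setting `max_violations=1` gives the more permissive variant that is often used in practice when pre-filtering screening libraries.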
1.6.4 The Docking Protocol
The docking protocol is at the core of every VS pipeline. Docking algorithms
attempt to predict the structure of the protein-ligand complex as a first, preliminary filter.
The docking must therefore be fast, as an extremely large number of compounds must be
evaluated. While the docking pipeline may not provide absolute accuracy with regards to
selecting all true-positive compounds, it must be robust enough not to discard moderate
to strong binders as false-negatives across a variety of targets. This preliminary docking
step is typically composed of a docking algorithm (see Section 1.4.1) and a scoring
function. The scoring function used at this step is usually optimized for speed rather than
accuracy; more extensive and accurate functions are usually used at later stages
of the pipeline, where a more discriminating assessment of the binding potential of a
smaller number of compounds is required.
1.6.5 MD Simulations
MD simulations are used as a final refinement of the most promising candidate
molecules before selection is made. As such, MD simulations on the order of a few
hundred picoseconds to a few nanoseconds are run on the predicted ligand-protein
complex. The goal of MD simulations in this setting is to establish the dynamic
stability of the complex. This is achieved by careful observation of the interactions
between the ligand and target, supported by analysis of the stability of the protein
structure and binding mode. Scoring functions such as SIETRAJ and MM-PBSA can also
be applied to the MD trajectories to obtain a better assessment of the potential binding
affinity of the compounds (Kollman et al., 2000; Cui et al., 2008).
1.6.6 Conformational Ensembles
The VS pipeline described typically considers the target as a rigid entity during
the docking process. Since the conformational flexibility of the target is seldom fully
considered during such a process, methods that integrate target flexibility through the use
of conformational ensembles have proved successful (Bursavich et al., 2002; Osterberg et
al., 2002; Barril and Morley, 2005; Amaro et al., 2008). Theoretically, conformational
ensembles allow a fully dynamic representation of the target to be presented to the ligand
for fit. This is akin to what is thought to occur in solution, where a ligand binds to a
pre-existing receptor population. The ligand is exposed to the conformational ensemble
of the receptor and may preferentially bind to conformations that occur infrequently in
the receptor’s dynamics (Ma et al., 2002; Wong and McCammon, 2003). The result is a
shift in the equilibrium population towards that of the preferentially bound conformation
(Ma et al., 2002). The “lock and key” model of ligand binding is therefore thought to
represent one of the rare conformations within this ensemble, and conformational
selection is thought to be a driving force in ligand recognition.
One of the most prominent examples of using conformational ensembles
generated from MD simulations for VS has been implemented as part of the Relaxed
Complex Scheme (RCS) (Lin et al., 2002; Lin et al., 2003; Amaro et al., 2008). The RCS
combines the advantages of docking with the dynamic conformational sampling that is
provided by MD simulations. Through this use of MD simulations, the RCS integrates
extensive conformational sampling of the target structure into the VS pipeline. At the
core of the RCS is an all-atom MD simulation of the target where the simulation time
varies from a few ns to tens of ns (Schames et al., 2004; Amaro et al., 2008; Cheng et al.,
2008). With few exceptions, the AutoDock docking program is typically used to carry out
docking and scoring functions (Morris et al., 2009). Since significant conformational
changes to the active site are induced by ligand binding, a ligand-bound structure is
usually preferred. The resulting trajectory is then reduced to a computationally tractable
ensemble. A number of strategies exist to select a representative subset from the full set
of resulting structures where much of the dynamic information of the trajectory remains.
RMSD-based clustering is an obvious choice for selection of the most dominant
configurations within the trajectory. In their study of avian influenza neuraminidase using
the RCS, Cheng et al. applied RMSD clustering on snapshots extracted every 10ps from
40ns trajectories (Cheng et al., 2008). An alternate but equally effective method is that of
QR-factorization (O’Donoghue and Luthey-Schulten, 2005). QR-factorization was
originally designed for the removal of redundant information from structural databases by
identifying a set of structures which represent the evolutionary conformational space of a
protein. In their study of the Trypanosoma brucei RNA-editing Ligase 1 (TbRel1),
Amaro et al. integrated the use of QR factorization into the RCS in order to extract a
representative set of structures from a 20ns trajectory of the target in complex with ATP,
its native substrate (Amaro et al., 2007; Amaro et al., 2008). For the QR factorization,
snapshots were extracted every 50ps resulting in a set of 400 structures which was
reduced to a total of 33. In both cases the RCS proved extremely successful in identifying
true binders from the original database. For Cheng et al., the weighted average score
from docking into the full representative ensemble of the holo trajectory resulted in the
selection of 25 compounds, 10 of which displayed a Ki under 500µM (Cheng et al.,
2008). For Amaro et al., ranking of the mean score from docking into the QR
representative set resulted in the selection of 10 compounds, 5 of which displayed
inhibition at 10µM or better (Amaro et al., 2008).
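To make the idea of RMSD-based reduction of a trajectory concrete, a minimal sketch is given below. It is not the clustering procedure used in the studies cited above: it is a simple "leader" scheme chosen for brevity, it assumes that all snapshots have already been superimposed on a common reference, and the function names and cutoff are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("snapshots must contain the same number of atoms")
    squared = sum((x - y) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for x, y in zip(p, q))
    return math.sqrt(squared / len(coords_a))


def leader_cluster(snapshots, cutoff):
    """Assign each snapshot to the first cluster whose representative lies
    within `cutoff` RMSD; otherwise start a new cluster with it.

    Returns (representative_indices, members_per_cluster).
    """
    representatives = []
    members = []
    for index, snapshot in enumerate(snapshots):
        for cluster, rep_index in enumerate(representatives):
            if rmsd(snapshot, snapshots[rep_index]) <= cutoff:
                members[cluster].append(index)
                break
        else:
            representatives.append(index)
            members.append([index])
    return representatives, members
```

The representatives form the reduced ensemble used for docking, and the cluster sizes provide population weights of the kind that could enter a weighted average docking score.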
1.7 Successes of CADD
Computer-aided drug design techniques have now become a core component of
modern drug discovery and development pipelines (Jorgensen, 2004). One of the most
prominent successes of rational, structure-based drug design is that of imatinib
(Gleevec®), a tyrosine kinase inhibitor for the treatment of Chronic Myelogenous
Leukemia (CML) (Capdeville et al., 2002). Early drug discovery programs for the
treatment of cancer largely focused on inhibition of DNA synthesis and cell division
through the use of antimetabolites (nucleoside analogs and antifolates), alkylating agents
(classical and newer platinum-based therapeutics), microtubule destabilizers (vinca
alkaloids) and microtubule stabilizing agents (taxanes) (Scott, 1970; Scagliotti and
Selvaggi, 2006; Zhou and Giannakakou, 2005). The uncovering of the bcr-abl reciprocal
translocation as the pathogenic event in CML established it as an attractive drug target
(Kelliher et al., 1990). Docking studies and X-ray crystallography established the binding
of Gleevec with high-affinity to the inactive form of the ATP-binding pocket (Schindler
et al., 2000; Zimmerman et al., 2001). Additionally, SBDD allowed for the analysis of
mutations in the enzyme which give rise to imatinib resistance. This provided an
opportunity for the design of novel pharmaceuticals that are effective in overcoming
imatinib-resistance (Weisberg et al., 2007).
The first marketed drug whose development was assisted by SBDD was captopril
(Capoten®), an angiotensin-converting enzyme (ACE) inhibitor used for the treatment of
hypertension (Cushman et al., 1977). Early in development, a peptidic lead
compound had been identified from a snake venom. However, structural information on
the binding site of ACE was lacking. This led scientists at Squibb to use the structure of
another zinc protease, the recently crystallized carboxypeptidase A, to model the binding site
of ACE. The modeling led to the development of captopril, the first successful design
based on a molecular model. The structure of ACE was determined in 2002,
when it was found that the binding site of ACE differed significantly from that of
carboxypeptidase A, leading to the development of newer, more targeted ACE inhibitors
(Natesh et al., 2003).
The success of CADD in properly assessing the potential binding of a compound
to a target is directly related to our ability to correctly predict the binding affinity of a small
molecule. Many of the limitations of current in silico pipelines stem from the difficulties
in properly and reliably predicting the binding of small molecules to a target (Michel and
Essex, 2010).
Chapter 2
Molecular Dynamics Study of Small Molecule
Inhibitors of the Bcl-2 Family
Preface
The contents presented in the following chapter have been published as presented:
Acoca S, Cui Q, Shore GC, Purisima EO. 2011. Molecular dynamics study of small
molecule inhibitors of the Bcl-2 family. Proteins. 79(9):2624-36.
2.1 Rationale
Molecular modeling techniques have taken an important role in drug
development. This is especially true of molecular simulations and scoring functions
which provide useful insights for the optimization of lead compounds. Obatoclax and
ABT-737 are two novel Bcl-2 inhibitors which have different selectivity profiles for
antiapoptotic Bcl-2 members. While numerous studies have examined the selectivity of
BH3 domains for Bcl-2 members, few have provided conclusive evidence as to the
selectivity of ABT-737. With regard to Obatoclax, the lack of structural data on its binding
mode has also left many questions unanswered as to how it mediates its inhibition of Bcl-2
members. This study therefore aimed to provide the grounds on which the selectivity of
both ABT-737 and Obatoclax could be understood while identifying the most probable
binding mode of Obatoclax.
2.2 Abstract
We carried out docking and molecular dynamics simulations on ABT-737 and
Obatoclax, which are inhibitors of the Bcl-2 family of proteins. We modeled the binding
mode of ABT-737 with Bcl-XL, Bcl-2, and Mcl-1 and examined their dynamical
behavior. We found that the binding of the chlorobiphenyl end of ABT-737 was quite
stable across all three proteins. However, the phenylpiperazine linker group was
dramatically more mobile in Mcl-1 compared to either Bcl-XL or Bcl-2. The S-phenyl
group at the p4 binding site was well-anchored in Bcl-XL and Bcl-2 but was somewhat
more mobile in Mcl-1 although the phenyl ring itself on average stayed close to the p4
binding site in Mcl-1. This greater mobility is likely due to the greater openness of the p3
and p4 binding sites on Mcl-1. The calculated binding free energies were consistent with
the much weaker binding affinity of ABT-737 for Mcl-1. Obatoclax was predicted to
bind at the p1 and p2 binding sites of Mcl-1 and the binding mode was quite stable during
the molecular dynamics simulation with Mcl-1 wrapping around the molecule. The
modeled binding mode suggests that Obatoclax is able to inhibit all three proteins
because it makes use of the p1 and p2 binding sites alone, which is a fairly narrow groove
in all three proteins unlike the p4 binding site, which is much broader in Mcl-1.
2.3 Introduction
Cancer is fundamentally a disease of dynamic changes in the genome. It has been
described as a multistep process culminating in the acquisition of six essential
alterations in cellular physiology (Hanahan and Weinberg, 2000). Dysregulation of the
apoptotic process has been recognized as one of these critical alterations required for
progression to the disease phenotype (Hanahan and Weinberg, 2000). As such, research
on the regulation of apoptosis has bloomed in the past decade, directed towards a better
understanding of the
extensive network of protein interactions that regulate it and the potential targets that can
be used to activate it.
At its core, apoptosis is the mechanism responsible for the careful synchrony of
cellular death observed throughout development, the maintenance of homeostasis and
proper immune function (Krammer et al., 1994; Meier et al., 2000; Elmore, 2007).
There are two pathways (intrinsic and extrinsic) which converge towards activation of the
apoptotic machinery. The extrinsic pathway is characterized by activation of members of
the death receptor family (Ashkenazi and Dixit, 1998). Death receptors, which belong to
the tumor necrosis factor (TNF) receptor superfamily, are surface transmembrane
receptors engaged by binding of extracellular “death ligands” such as FasL and TNF
(Ashkenazi and Dixit, 1998). Activation of these receptors leads to the formation of the
death-inducing signaling complex (DISC), which mediates the activation of initiator
caspases thereby committing the cell to apoptotic death (Bao and Shi, 2007). On the other
hand, the intrinsic (mitochondrial) apoptotic pathway is triggered by mainly non-receptor
stimuli. It is unique in its ability to initiate apoptosis in response to DNA damage,
cytotoxic stress and cytokine deprivation though it can be engaged by the extrinsic
pathway as well (Brenner and Mak, 2009). In response to apoptotic stimuli, the intrinsic
pathway triggers the permeabilization of the outer mitochondrial membrane (OMM).
This permeabilization releases Cytochrome C and other molecules residing within the
mitochondrial intermembrane space (IMS) into the cytosol, resulting in the formation of
the apoptosome (a complex of Cytochrome C, APAF-1 and pro-caspase 9) and activation
of the caspase cascade through caspase 9 (Ow et al., 2008).
At the heart of the intrinsic pathway lies the Bcl-2 family of apoptotic proteins.
Known as the “Gatekeepers of Mitochondrial Apoptosis”, the Bcl-2 family of proteins
is unique in its role of regulating mitochondrial outer membrane integrity in response
to death stimuli (Adams and Cory, 2007). Through heterodimerization, anti-apoptotic
members can neutralize the effects of pro-apoptotic members, the relative balance of
which acts as a regulating switch for initiating mitochondrial apoptosis (Oltersdorf et al.,
2005). The Bcl-2 family is composed of three groups of proteins distinguished through
functional and structural features. The antiapoptotic members (consisting of Bcl-2, Bcl-
XL, Bcl-B, Bcl-W, Mcl-1 and A1) share three to four α-helical regions of high sequence
similarity known as the Bcl-2 Homology (BH) domains (Petros et al., 2004; Adams and
Cory, 2007). Bcl-2 pro-survival proteins inhibit the pro-apoptotic members in part by
sequestering the amphiphilic BH3 helix of the pro-apoptotic members within a long
surface exposed groove. Because the Bcl-2 survival members promote cell survival in
cancer cell lines, they are recognized as a highly relevant target for the treatment of
cancer. They are also implicated in general resistance to chemotherapeutic agents along
with a more aggressive malignant phenotype (Minn et al., 1995; Simonian et al., 1997;
Amundson et al., 2000). Bcl-2 inhibitors show promise as cancer therapeutics, especially
when used in combination therapy (Oltersdorf et al., 2005; Nguyen et al., 2007; Lessene
et al., 2008; Tse et al., 2008; Ackler et al., 2010).
One promising agent is the orally bioavailable compound ABT-263 (navitoclax);
ABT-737 (Figure 2.01) is an analog that is widely used in preclinical studies as a tool
compound (Oltersdorf et al., 2005; Tse et al., 2008). These compounds were developed
using the SAR by NMR methodology and employing stable protein fragments for optimal
NMR study. They display subnanomolar affinity for such recombinant fragments of Bcl-2,
Bcl-XL, and Bcl-W but > 1 µM for Mcl-1 (Shuker et al., 1996; Oltersdorf et al., 2005;
Tse et al., 2008). As predicted by its affinity profile, ABT-737, as well as navitoclax,
exhibits limited efficacy in cells where Mcl-1 is expressed (Konopleva et al., 2006; van
Delft et al., 2006; Chen et al., 2007; Tse et al., 2008). Consequently, this selectivity is one
of the key aspects of navitoclax that may limit its chemotherapeutic utility. Several
studies have addressed the selectivity of BH3 peptides for members of the Bcl-2 family;
however, extrapolation to explaining and modifying the selectivity of ABT-
737/navitoclax has not been straightforward (Lee et al., 2008; Lee et al., 2009; Fire et al.,
2010). Furthermore, the Bcl-2 pro-survival proteins are anchored in the mitochondrial
outer membrane where they in fact undergo conformational changes and greater
penetration into the lipid bilayer in response to stress stimuli (Shore et al., 2008). Thus
despite the high affinity binding of these compounds to soluble recombinant protein
fragments in aqueous buffers in vitro, it is not clear how this translates to the efficacy of
binding in intact cells.
Figure 2.01 ABT-737 chemical structure.
A second Bcl-2 inhibitor currently in Phase I & II trials is obatoclax (GX15-070),
a hydrophobic cycloprodigiosin derivative developed by Gemin X Pharmaceuticals
(Nguyen et al., 2007). Obatoclax (Figure 2.02) was found to inhibit the binding of BH3
peptides to recombinant fragments of all pro-survival members of the Bcl-2 family with
low micromolar affinity employing fluorescence polarization assays but its key property
lies in its ability to potently overcome Mcl-1 mediated resistance to chemotherapeutic
agents (Zhai et al., 2006; Nguyen et al., 2007; Perez-Galan et al., 2007). Indeed, in
assays employing native Mcl-1 in intact mitochondrial outer membrane, 10 nM obatoclax
reverses the constitutive interaction between Mcl-1 and pro-apoptotic Bak. Hence, an
understanding of its binding mode to Mcl-1 is of particular interest.
Figure 2.02 Obatoclax chemical structure.
In this chapter we present an extensive analysis of molecular dynamics
simulations performed for obatoclax/Mcl-1 and ABT-737 complexes. The aim of the
current study is to rationalize the binding specificity of ABT-737 and to predict the
binding mode of obatoclax to Mcl-1, for which an experimentally determined three-
dimensional structure of the complex has proven to be elusive.
2.4 Methods
2.4.1 Structure Preparation
The starting structures for the docking and molecular dynamics simulation
experiments of Bcl-2, Bcl-XL and Mcl-1 complexes were taken from the Protein Data
Bank (Codes 1YSW, 2YXJ, and 2PQK respectively). All bound ligands (small molecules
and BH3 peptides), waters and ions and other molecules were removed from the
complexes, except for Bcl-XL for which we kept the ABT-737 ligand. Missing side
chains, terminal residues and hydrogen atoms were added using Sybyl 8.0 (Tripos Inc.,
St. Louis, MO) and XLeap in AMBER (Case et al., 2005). Protonation states were
assigned using the H++ server (Gordon et al., 2005). Visual inspection of all assigned
protonation states was done in Sybyl 8.0 and adjusted as needed.
2.4.2 Force field parameters
The FF99SB force field in the AMBER suite of programs was used for the protein
atoms. The antechamber module of Amber Tools was used to assign GAFF parameters
for obatoclax and ABT-737 (Wang et al., 2004; Case et al., 2005; Hornak et al., 2006). In
the case of ABT-737, we applied the biphenyl parameters of Athri and Wilson (Athri
and Wilson, 2009). Partial charges for the inhibitors were obtained using RESP with 6-31G*
electrostatic potentials calculated using GAMESS (Bayly et al., 1993; Schmidt et
al., 1993).
The sulfonamide group in ABT-737 has an imide-like bond (see Figure 2.01) that
is not well-represented by the default GAFF parameters. Hence, we derived force field
torsional parameters for the S–N bond using a model compound with a phenyl ring on
either side of the SO2NHCO group. The covalent geometry was taken from the Cambridge
Structural Database (CSD) entry CEKHIJ (Allen, 2002). A torsional energy profile
around the S–N bond was generated using GAMESS at the MP2/6-31G* level of theory.
A truncated Fourier series was fitted to the residual torsional energy profile after
subtracting out the calculated AMBER energy. The resulting coefficients are listed in
Table S1 (Supplementary Materials).
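For reference, the fitting step can be written out explicitly. The expression below uses the standard AMBER torsional functional form. The number of Fourier terms N, and the assumption that the AMBER energy was evaluated without an S–N torsional contribution, are not stated in the text and are shown here only as an illustration:

```latex
\Delta E(\phi) \;=\; E_{\mathrm{MP2}}(\phi) - E_{\mathrm{AMBER}}(\phi)
\;\approx\; \sum_{n=1}^{N} \frac{V_n}{2}\left[\,1 + \cos\!\left(n\phi - \gamma_n\right)\right]
```

Here φ is the torsion angle about the S–N bond, and $V_n$ and $\gamma_n$ are the barrier heights and phase offsets obtained from the fit, presumably the coefficients reported in Table S1.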
2.4.3 Docking
For Bcl-XL, the deposited crystal structure of the complex (PDB 2YXJ)
was used directly as the starting point for our calculations. For Bcl-2, the initial docked
pose of ABT-737 was obtained by superposing the Bcl-2 structure onto Bcl-XL and
merging the inhibitor coordinates from the Bcl-XL structure into the Bcl-2
structure. The same procedure was used to dock ABT-737 into Mcl-1. For Bcl-2
and Mcl-1, direct merging of the inhibitor into the binding site left some side
chains in awkward positions relative to ABT-737. These were first relieved
using the Sculpt module of PyMOL (Schrödinger, New York), followed by
ligand-restrained energy minimization.
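The superposition and ligand-transfer step can be illustrated with a least-squares (Kabsch) fit. This is a sketch under stated assumptions rather than the procedure actually used: it assumes that a set of matched protein atoms (for example, Cα atoms of aligned residues) has already been chosen in both structures, and the function names `kabsch_transform` and `transfer_ligand` are hypothetical. It computes the rigid transform that carries the reference protein frame (Bcl-XL) onto the new protein frame (Bcl-2 or Mcl-1) and applies the same transform to the ligand coordinates.

```python
import numpy as np

def kabsch_transform(mobile, target):
    """Return rotation R and translation t so that mobile @ R.T + t best fits target."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    # Covariance matrix of the centered, matched coordinate sets.
    h = (mobile - mob_center).T @ (target - tgt_center)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = tgt_center - rotation @ mob_center
    return rotation, translation

def transfer_ligand(ref_protein_atoms, new_protein_atoms, ligand_atoms):
    """Move ligand coordinates from the reference frame into the new protein frame."""
    rotation, translation = kabsch_transform(ref_protein_atoms, new_protein_atoms)
    return np.asarray(ligand_atoms, dtype=float) @ rotation.T + translation
```

A pose transferred this way ignores differences between the two binding grooves, which is why the side-chain adjustment and restrained minimization described above are still needed.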
Docking of obatoclax into Mcl-1 was carried out using an in-house docking
program (manuscript in preparation) that performs an exhaustive rigid-body search
(translation and rotation) of the ligand on a grid. A rectangular box enclosing the entire
binding groove defined the search region. We used a grid spacing of 0.5 Å and rigid-body
rotational increments corresponding to atomic displacements of 0.5 Å. Poses
were scored using a weighted combination of van der Waals, Coulomb, surface area,
shape complementarity and hydrogen bonding terms. The weights had previously been
calibrated to reproduce the binding poses of a training set of protein-ligand complexes.
OMEGA (OpenEye Scientific Software, New Mexico) was used to generate the ligand
conformers used in the rigid docking. The protein was kept fixed during the docking.
The top-scoring pose was used for the MD simulation.
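The search parameters above can be made concrete with a short sketch. The in-house program is not described in detail here, so everything below is an assumption made for illustration: the function names, the use of the ligand centroid as the rotation center, the chord-length definition of atomic displacement, and the example weights are all hypothetical. The sketch shows how a 0.5 Å displacement criterion translates into a rotation step, how the translational grid over the search box could be enumerated, and how a weighted score combines its terms.

```python
import numpy as np

def angular_step(ligand_coords, displacement=0.5):
    """Rotation step (radians) that moves the outermost ligand atom by `displacement` Å."""
    coords = np.asarray(ligand_coords, dtype=float)
    radii = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
    r_max = radii.max()
    if r_max == 0.0:
        return 2.0 * np.pi  # a single point is unaffected by rotation
    # A rotation by angle a moves an atom at radius r by a chord of 2*r*sin(a/2).
    return 2.0 * np.arcsin(min(1.0, displacement / (2.0 * r_max)))

def translation_grid(box_min, box_max, spacing=0.5):
    """All grid points (N x 3) inside a rectangular box at the given spacing in Å."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(box_min, box_max)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)

def weighted_score(terms, weights):
    """Linear combination of scoring terms keyed by name (weights are placeholders)."""
    return sum(weights[name] * value for name, value in terms.items())
```

Tying the rotation step to the ligand size keeps the sampling density at the ligand surface comparable to the translational grid spacing, so larger ligands get finer angular sampling.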
2.4.4 Molecular Dynamics Simulations
Each system was immersed in a truncated octahedral TIP3P water box (Jorgensen
et al., 1983). The distance between the wall of the box and the closest solute atom
was 12 Å. Sodium or chloride counterions were added as required to maintain
electroneutrality of the system. Molecular dynamics (MD) simulations were carried out
using the AMBER program. A 2 fs time step and a 9 Å non-bonded cutoff were used.
SHAKE was employed to constrain the lengths of bonds to hydrogen atoms, and the
Particle Mesh Ewald algorithm was used to treat long-range electrostatics (Ryckaert et
al., 1977; Cheatham et al., 1995).
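As an illustration of how the settings quoted above map onto an AMBER control file, the sketch below assembles a minimal `&cntrl` namelist. It is not the input used in this work: the helper name `build_md_input` and the step count are hypothetical, and only the parameters stated in the text (time step, cutoff, SHAKE on bonds to hydrogen) are set. Thermostat, pressure control and output options are deliberately left out and would need to be added from the actual protocol.

```python
def build_md_input(n_steps, title="production MD"):
    """Return text for an AMBER &cntrl namelist with the settings quoted in the text.

    n_steps is caller-supplied (hypothetical); simulated time is n_steps * 2 fs.
    """
    settings = {
        "imin": 0,          # run dynamics rather than minimization
        "nstlim": n_steps,  # number of MD steps
        "dt": 0.002,        # time step in ps (2 fs)
        "ntc": 2,           # SHAKE constraints on bonds involving hydrogen
        "ntf": 2,           # skip force evaluation for those constrained bonds
        "cut": 9.0,         # non-bonded cutoff in Å
    }
    body = ",\n".join(f"  {key} = {value}" for key, value in settings.items())
    return f"{title}\n &cntrl\n{body},\n /\n"
```

For example, `build_md_input(500000)` would describe a 1 ns run at the 2 fs time step.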
In silico methods in drug discovery and development

  • 1.         In Silico methods in Drug Discovery and Development Stephane Acoca Department of Biochemistry McGill University Montrea, Quebec, Canada Submitted August 2011 A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Doctor of Philosophy © Stephane Acoca, 2011
  • 2.   i   Abstract Computational drug design methods have become increasingly invaluable in the drug discovery and development process. Throughout this thesis will be described the development and application of methods that are used at every stage of the drug discovery and development pipeline. In Chapter 2 will take a look at the use computational methods towards the understanding and development of two novel Bcl-2 inhibitors, Obatoclax and ABT-737, being developed for the treatment of Cancer. The study proposes certain mechanisms through which ABT-737 displays selectivity towards certain targets within the Bcl-2 family. Additionally, we propose a binding mode for Obatoclax which is in accordance with experimental data. The following Chapter addresses the use of virtual screening for the identification of novel lead compounds. Trypanosoma brucei RNA Editing Ligase 1 was chosen as the target for the development of treatments against Trypanosoma infections and C35, a potent novel inhibitor of the enzyme, was identified. Furthermore, our research shows that the action of C35 extends to inhibition of several critical enzyme activities required for the RNA editing process as well as compromising the integrity of the multiprotein complex which carries it out. The following Chapter takes a look at the use of mass spectrometry data in order to expedite discovery of bioactive compounds in natural products. We developed an algorithm which analyses MS/MS data in order to derive the Molecular Formula of the compound. The novel algorithm obtained a 95% success rate on a test set of 91 compounds. The last Chapter of the thesis explores the use of molecular dynamics to generate a conformational ensemble of targets for virtual screening. Conformational ensembles were generated for a target test set taken from the Directory for Useful Decoys. The results showed that molecular dynamics-based conformational ensembles
  • 3.   ii   provided remarkable improvements on 2 of the targets tested due to the enhanced capacity to properly dock compounds in otherwise restricted structures. The last Chapter of the thesis is a general discussion on the work of the thesis and a proposal on how all can be integrated within the drug discovery and development pipeline.
  • 4.   iii   Résumé Les méthodes the modélisation sont devenues un outil inestimable dans le processus de découverte et de développement de nouveaux médicaments. Au cours de cette thèse va être décrit le développement et l’application de méthodes utilisés à chaque stage de la découverte et du développement de produits pharmaceutiques. Le Chapitre 2 est un aperçu sur l’utilisation de méthodes computationnelles vers le développement de deux nouveaux inhibiteurs des protéines Bcl-2, Obatoclax et ABT-737, en développement pour le traitement du Cancer. L’étude propose certains mécanismes d’ABT-737 qui expliquent ca sélectivité envers les membres de la famille Bcl-2. De plus, nous proposons un mécanisme d’attachement pour Obatoclax qui conforme aux données expérimentales. Le Chapitre suivant adresse l’utilisation du dépistage virtuel pour l’identification de nouvelles molécules mère. La Ligase de l’Edition d’ARN du Trypanosoma brucei a été choisie comme cible pour le développement de traitements contre des infections dû au Trypanosome et C35 a été identifié comme nouvel inhibiteur de l’enzyme. En outre, notre recherche démontre que l’action de C35 s’étends a l’inhibition de plusieurs enzymes nécessaires pour le mécanisme d’édition de l’ARN en plus de compromettre l’intégrité du complexe multi- protéinique qui l’effectue. Le Chapitre suivant prends regard a l’utilisation de donnes dérivant de la spectrométrie de masse pour but d’accélérer la découverte de molécules bioactives venant de sources naturelles. Nous avons développé un algorithme qui analyse les données MS/MS pour but de dériver la formule moléculaire du composé. Le nouvel algorithme a obtenu un taux de succès s’élevant à 95% sur un ensemble test de 91 molécules. Le dernier Chapitre de la thèse explore l’utilisation de simulations de dynamique moléculaire pour générer en ensemble conformationel de protéines cible pour son utilisation dans le dépistage virtuel. Les ensembles
  • 5.   iv   conformationel ont étés généré pour une série test obtenu d’un répertoire attitré ‘Directory for Useful Decoys’. Les résultats démontrent que les ensembles conformationel dérivés de la dynamique moléculaire ont apporté des améliorations remarquables sur deux des cibles testées dû à une capacité accrue de placement approprié des molécules dans un site qui est autrement très restreint. Le dernier Chapitre de cette thèse est une discussion générale sur le travail accomplie et une proposition sur la manière dont tous les éléments sont intégrer dans un protocole de découverte et de développement de produits pharmaceutiques.
  • 6.   v   Acknowledgements I would like to thank first Dr Enrico Purisima and Prof. Gordon Shore for their mentorship and patience throughout the doctoral work leading to this thesis. I am very thankful for the experiences I have during my tenure in Dr Purisima’s laboratory. I would like to show my special thanks to former members of the laboratory Dr Sathesh Bhat, Dr Marwen Naim, Herve Hogues, and Dr Qizhi Cui whose guidance, friendship, inspiration, assistance and support have been invaluable during my tenure. My long conversations on modeling with Dr Bhat and Mr Hogues have been of special value in my learning of computational modeling. I would also like to extend thanks to current and past members of the laboratory which include Dr Shafinaz Chowdhury, Dr Christophe Deprez, Dr Edwin Wang and Dr Sheldon Dennis for creating a positive work environment. I would like to show special thanks members of my Research Advisory Committee (RAC) Prof. John Silvius, Prof. Albert Berghuis who have been of great assistance in guiding me through the completion of the doctoral work. I’d also like to add special recognition to Prof. Imed Gallouzi for his help. I’d also like to thank the Chemical Biology program at McGill, which has partially funded my work. Lastly, I’d like to thank my family for their continued encouragement and support.
  • 7.   vi   Table of Contents
Abstract i
Résumé iii
Acknowledgements v
Table of Contents vi
List of Figures x
List of Tables xiii
Abbreviations xv
Contributions of Authors xvii
Chapter 1. General Introduction
1.1 Drug Discovery and Development 2
1.1.1 Overview & Challenges 2
1.1.2 Thesis Outline 5
1.2 Molecular Modeling 7
1.2.1 Molecular Mechanics 7
1.3 Predicting Binding Free Energies – Scoring 12
1.3.1 Effect of water: Continuum (Implicit) Solvation energy 15
1.3.1.1 Finite difference 18
1.3.1.2 Boundary Element Method 19
1.3.1.3 Desolvation Cost 19
1.3.2 Scoring Functions 21
1.3.2.1 Physical-Chemical 22
1.3.2.2 Empirical function 25
1.3.2.3 Knowledge-based 25
1.3.2.4 Problems 26
1.4 Predicting Binding Modes – Docking 27
1.4.1 Docking Algorithms 28
1.4.1.1 Fast Shape Matching 28
1.4.1.2 Incremental Construction 29
1.4.1.3 Monte Carlo Simulations 30
1.4.1.4 Evolutionary Programming 31
1.5 Molecular Dynamics 32
1.5.1 Newton’s Laws 32
1.5.2 Ensembles 34
1.5.3 Verlet Algorithm 34
1.5.4 Considerations 36
1.5.5 Boundary Conditions 38
1.5.6 Long Range Electrostatic Calculations: The Ewald Summation Method 39
1.6 Virtual Screening 40
1.6.1 Virtual Screening Pipeline 41
  • 8.   vii
1.6.2 The Target 43
1.6.3 The Compound Database 44
1.6.4 The Docking Protocol 44
1.6.5 MD Simulations 45
1.6.6 Conformational Ensembles 45
1.7 Successes of CADD 48
Chapter 2. Molecular Dynamics Study of Small Molecule Inhibitors of the Bcl-2 Family
Preface 51
2.1 Rationale 52
2.2 Abstract 52
2.3 Introduction 53
2.4 Methods 58
2.4.1 Structure Preparation 58
2.4.2 Force Field Parameters 58
2.4.3 Docking 59
2.4.4 Molecular Dynamics Simulations 60
2.4.5 Binding free energy estimate 61
2.5 Results and Discussion 62
2.5.1 Molecular Modeling of ABT-737 complexes 62
2.5.2 Binding groove structure 64
2.5.3 Chlorobiphenyl group 65
2.5.4 Phenylpiperazine linker 67
2.5.5 Nitrophenylsulfonamide group 69
2.5.6 S-phenyl group 71
2.5.7 Dimethyl group 72
2.5.8 SIE Analysis and Virtual Alanine Mutations 72
2.5.9 Protein structure and dynamics 75
2.5.10 Mcl-1 and obatoclax 78
2.6 Conclusion 80
Chapter 3. Naphthalene-based RNA editing inhibitor blocks RNA editing activities and editosome assembly in Trypanosoma brucei
Preface 83
3.1 Rationale 84
3.2 Abstract 84
3.3 Introduction 85
3.4 Experimental Procedures 88
3.4.1 Structure Preparation 88
3.4.2 Virtual Screening 89
3.4.3 Solvated Interaction Energy 90
  • 9.   viii
3.4.4 Preparation of mitochondrial extract and tandem affinity purification of ligase complex 90
3.4.5 Preparation of RNAs 91
3.4.6 Adenylylation and deadenylylation assays 91
3.4.7 In vitro RNA editing assays 92
3.4.8 Gel shift assay 93
3.4.9 Guanylyltransferase labeling 94
3.5 Results 94
3.5.1 Virtual Screening 94
3.5.2 Inhibition of RNA editing by selected compounds 97
3.5.3 Inhibition of ligase adenylylation at low protein concentrations by C35 and S5 100
3.5.4 Inhibition of deadenylylation by C35 and S5 102
3.5.5 Inhibition of different steps of RNA editing by C35 and S5 104
3.5.6 Inhibitory compounds affect the editosome RNA-binding activity 107
3.5.7 20S editosome complex integrity is affected by C35 treatment 110
3.6 Discussion 112
3.7 Acknowledgments 117
Chapter 4. Automated Molecular Formula Determination by Tandem Mass Spectrometry (MS/MS)
Preface 119
4.1 Rationale 120
4.2 Abstract 120
4.3 Introduction 121
4.4 Experimental 125
4.4.1 Materials 125
4.4.2 Instrumentation 125
4.4.3 MS/MS experiments 126
4.4.4 The algorithm of molecular formula analysis 127
4.4.5 Nitrogen-enriched or oxygen-enriched compounds 132
4.5 Results and Discussion 133
4.5.1 Risk of assigning incorrect molecular formula 133
4.5.2 Mass accuracy 134
4.5.3 Fragmentation pathways of brefeldin A 135
4.5.4 Molecules with single structural domain 137
4.5.5 Molecules with multiple core structures 140
4.5.6 Analysis of structurally-related compounds 143
4.5.7 Cyclazocine and N-allylnormetazocine 146
4.5.8 Peptides 148
4.5.9 Chloro- or bromo-containing compounds 152
4.6 Conclusion 154
4.7 Acknowledgements 155
  • 10.   ix
Chapter 5. Molecular Dynamics ensemble in Virtual Screening
Preface 157
5.1 Rationale 158
5.2 Abstract 158
5.3 Introduction 159
5.4 Methods 162
5.4.1 Structure Preparation 162
5.4.2 Ligand Preparation and Docking 163
5.4.3 Molecular dynamics simulations 164
5.4.4 Force Field Parameters 165
5.4.5 Clustering 165
5.4.6 Test Data Sets 165
5.5 Results and Discussion 166
5.5.1 Overview of Results 166
5.5.2 Obstructive changes during apo simulations 167
5.5.3 Performance of holo ensemble 170
5.5.4 Structural change in holo ensemble 171
5.5.5 Effect on score distribution 174
5.5.6 Comparison with RCS 177
5.5.7 Use of DUD training set 178
5.6 Conclusion 180
Chapter 6. General Discussion
6.1 Molecular Dynamics Study of Bcl-2 Inhibitors
6.2 Discovery of TbRel1 Inhibitors
6.3 Automated Molecular Formula determination by Tandem Mass Spectrometry
6.4 Ensemble-based Virtual Screening
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
References 222
Original Contributions to Knowledge 250
  • 11.   x   List of Figures
Chapter 1
Figure 1.01 The pharmaceutical drug discovery and development pipeline
Figure 1.02 Increasing costs in pharmaceutical R&D
Figure 1.03 Pre-approval costs for new drugs
Figure 1.04 The contributions of bonded terms to the potential energy function
Figure 1.05 Cubic grid scheme for the Finite Difference Method
Figure 1.06 Representation of desolvation effects during ligand-protein complex formation
Figure 1.07 Periodic boundary conditions in molecular dynamics simulations
Figure 1.08 The virtual screening pipeline
Chapter 2
Figure 2.01 ABT-737 chemical structure
Figure 2.02 Obatoclax chemical structure
Figure 2.03 Multiple sequence alignment of representative BH3 domains from BH3-Only proteins.
Figure 2.04 Superposition of ABT-737 and Bim BH3 peptide bound to Bcl-xL.
Figure 2.05 Calculated binding mode of ABT-737 in Bcl-xL, Bcl-2 and Mcl-1.
Figure 2.06 Distance of the ABT-737 biphenyl ring centroids from their initial positions after superposition of the protein C-alpha atoms to those in the first snapshot.
Figure 2.07 Distance of the ABT-737 linker ring centroids from their initial positions after superposition of the protein C-alpha atoms to those in the first snapshot.
Figure 2.08 Distance of the ABT-737 nitrophenyl and S-phenyl ring centroids from their initial positions after superposition of the protein C-alpha atoms to those in the first snapshot.
Figure 2.09 Calculated binding mode of obatoclax in Mcl-1.
Chapter 3
Figure 3.01 Predicted binding modes of TbREL1
Figure 3.02 Effect of selected compounds that inhibit editosome activity
Figure 3.03 Effect of inhibitory compounds on adenylylation and deadenylylation steps of RNA editing ligases
Figure 3.04 Effect of inhibitory compounds on different steps of RNA editing
Figure 3.05 Effect of inhibitory compounds on RNA-binding activity of editosome complex
Figure 3.06 Analysis of sedimentation profile and activity of ligase-associated complexes in the presence of C35
  • 12.   xi
Figure 3.07 Alternative models for the mechanism of action of C35 and S5.
Chapter 4
Figure 4.01 The MS/MS spectrum of brefeldin A
Figure 4.02 Fragmentation pathways of brefeldin A
Figure 4.03 The MS/MS spectrum of prazosin
Figure 4.04 Fragmentation pathways of prazosin
Figure 4.05 The MS/MS spectrum of dihydroergotamine and dihydroergocristine
Figure 4.06 Fragmentation pathways of dihydroergotamine
Figure 4.07 Structures of dihydroergotamine and dihydroergocristine
Figure 4.08 The MS/MS spectrum of cyclazocine and N-allylnormetazocine
Figure 4.09 Fragmentation pathways of cyclazocine
Figure 4.10 The MS/MS spectrum of 5-leucine enkephalin
Figure 4.11 Stepwise analysis of 5-leucine enkephalin sequences
Figure 4.12 Overall detail analysis of 5-leucine enkephalin
Figure 4.13 The MS/MS spectrum of quinacrine
Figure 4.14 Plausible fragmentation pathways of quinacrine
Chapter 5
Figure 5.01 Changes in binding site observed in the apo ensemble in a) COX2, b) AR, c) GART and d) PARP.
Figure 5.02 Changes in binding site observed in the holo ensemble for COX2.
Figure 5.03 Changes in binding site observed in the holo ensemble for AR.
Figure 5.04 Changes in binding site observed in the holo ensemble for ER.
Figure 5.05 Score distribution of true binders across the crystal structure and selected holo ensemble structure for a) ER, b) AR, c) EGFR, and d) COX2.
Appendix A
Figure A.01 Helices surrounding the binding grooves of Bcl-xL, Bcl-2 and Mcl-1.
Figure A.02 Distance between ABT-737 sulfonamide HN and backbone carbonyl O of Bcl-xL Asn136, Bcl-2 Asn140 and Mcl-1 Asn260.
Figure A.03 Hydrogen bond pair distances between ABT-737 sulfonyl O and side chains in Bcl-xL, Bcl-2 and Mcl-1
  • 13.   xii
Figure A.04 Hydrogen bond pair distances between ABT-737 dimethylamino HN and side chain carboxylate O in Bcl-xL and Bcl-2.
Figure A.05 Distance of ABT-737 ring centroids from their initial positions after superposition of the protein C-alpha atoms to those in the first snapshot.
Appendix B
Figure B.01 Inhibitors identified from first round of virtual screening.
Figure B.02 Previously identified inhibitors not retrieved in virtual screening.
Appendix D
Figure D.01 Overview of VS results for the crystal structure and apo/holo ensembles
Figure D.02 Ensemble-based VS results for structures generated from apo MDs
  • 14.   xiii   List of Tables
Chapter 2
Table 2.1 Solvated interaction energies (SIE) in kcal/mol
Table 2.2 Virtual alanine mutations
Chapter 3
Table 3.1 Virtual hits selected for experimental validation
Chapter 4
Table 4.1 Potential neutral losses in the MS/MS experiment in forward MFA
Table 4.2 Reverse MFA of brefeldin A with correct formula of precursor ion
Table 4.3 Reverse MFA of brefeldin A with incorrect formula of precursor ion
Table 4.4 Molecular formula analysis of prazosin
Table 4.5 Molecular formula analysis of dihydroergotamine
Table 4.6 Molecular formula analysis of dihydroergocristine
Table 4.7 Molecular formula analysis of cyclazocine
Table 4.8 Molecular formula analysis of N-allylnormetazocine
Table 4.9 Molecular formula analysis of quinacrine
Chapter 5
Table 5.1 Targets of the DUD set selected and properties of each set
Appendix A
Table A.01 Fourier coefficients for ca-s6-n-ca
  • 15.   xiv
Appendix B
Table B.01 Ranking of selected hits from virtual screen
Appendix C
Table C.01 Molecular formula analysis of 5-leucine enkephalin
  • 16.   xv   Abbreviations
ADA Adenosine Deaminase
AR Androgen Receptor
BCL-2 B-Cell Lymphoma 2
BEM Boundary Element Method
CADD Computer-Aided Drug Design
CML Chronic Myelogenous Leukemia
COX2 Cyclooxygenase 2
CRK Cdc2-Related Kinase
DNDi Drugs for Neglected Diseases initiative
DUD Directory of Useful Decoys
EGFR Epidermal Growth Factor Receptor
EP Evolutionary Programming
ER Estrogen Receptor
FDM Finite Difference Method
FXa Factor Xa
GA Genetic Algorithm
GART Glycinamide Ribonucleotide Transformylase
gRNA guide RNA
GSK Glycogen Synthase Kinase
HSP90 Heat Shock Protein 90
IC Incremental Construction
KB Knowledge-based
KBP Knowledge-based potentials
LGA Lamarckian Genetic Algorithm
MAPK Mitogen-Activated Protein Kinase
MC Monte Carlo
MD Molecular Dynamics
MF Molecular Formula
MFA Molecular Formula Analysis
MM Molecular Mechanics
MW Molecular Weight
NCE New Chemical Entity
NS Nanoseconds
NTD Neglected Tropical Diseases
PARP Poly ADP-Ribose Polymerase
PDB Protein Data Bank
PBSA Poisson-Boltzmann Surface Area
PS Picoseconds
RCS Relaxed Complex Scheme
RMS Root Mean Square
SA Surface Area
SBDD Structure-Based Drug Design
SIE Solvated Interaction Energy
  • 17.   xvi
SM Shape Matching
SRC SRC Tyrosine Kinase
TbRel1 Trypanosoma brucei RNA-editing Ligase 1
VDS Virtual Decoy Set
VdW Van der Waals
VS Virtual Screening
  • 18.   xvii   Contributions of Authors This thesis includes the text and figures from three published articles. I am the first author of one of the manuscripts (Chapter 2) and second author of the remaining two (Chapters 3 and 4). Additionally, the thesis includes the text and figures from work still in progress towards the publication of a further manuscript (Chapter 5). This thesis has been written in manuscript-based format, and the references of all chapters have been combined into one reference section at the end of the dissertation. The contributions of the authors for each of the manuscripts are as follows:
Chapter 2: Acoca S., Cui Q., Shore G.C., Purisima E.O. 2011. Molecular Dynamics Study of Small Molecule Inhibitors of the Bcl-2 Family. Proteins. 79(9):2624-36. I performed all original work and completed the first draft of the manuscript. Prior to submission, Dr Cui reran a number of the simulations and Dr Purisima reworked the manuscript.
Chapter 3: Moshiri H., Acoca S., Kala S., Najafadabi H.S., Hogues H., Purisima E.O., Salavati R. 2011. RNA Editing Ligase 1 Inhibitors Blocks RNA Editing Activities and Editosome Assembly in Trypanosoma Brucei. J Biol Chem. 286(16):14178-89. My contributions to the manuscript involved the virtual screening segment of the work, specifically a) the Virtual Screening section, b) Figure 1, c) Table 1 and d) all relevant sections of the Experimental Procedures (Structure Preparation, Virtual Screening and Solvated Interaction Energy). Prof. Salavati’s group carried out all experimental testing of the compounds and their inhibitory properties with regard to the 20S editosome activities.
Chapter 4: Jarussophon S., Acoca S., Gao J.M., Deprez C., Kiyota T., Draghici C., Purisima E., Konishi Y. 2009. Automated Molecular Formula Determination by Tandem Mass Spectrometry (MS/MS). Analyst 134(4):690-700.
  • 19.   xviii   I wrote the code for the software that ran the analysis and collaborated with Dr Konishi in its development. The algorithm implemented in the software was originally developed by Dr Konishi and his group. Dr Deprez is responsible for the continued maintenance of the software.
Chapter 5: Acoca S., Hogues H., Purisima E.O. 2010. Molecular dynamics ensembles for virtual screening. (Manuscript in preparation). The entirety of the work for this manuscript was carried out by me. The docking scripts for the tailoring of the pipeline to ensemble virtual screening were written by Mr Hogues.
  • 21. 2 1.1 Drug Discovery and Development 1.1.1 Overview & Challenges Though the use of foreign substances for the treatment of illnesses has been practiced since the beginning of time, the use of an isolated, well-defined chemical entity is only over a century old. Since then, medicinal substances have been natural products isolated from plants and microbial sources, or products of pure chemical synthesis. Whichever the case, the modern pipeline for pharmaceutical drug discovery and development is a long, multi-step process involving the collaborative effort of a multitude of specialties (Figure 1.01).
Figure 1.01 The Pharmaceutical Drug Discovery and Development Pipeline. [Flowchart. PRE-CLINICAL STUDIES: Target Identification; Identification of Lead Compounds; Lead Optimization; In Vitro Testing; Animal Testing. CLINICAL STUDIES: Phase I; Phase II; Phase III; Phase IV.]
  • 22. 3 However, no venture of pharmaceutical research is without risk, and a positive outcome of the research is far from guaranteed. The difficulties inherent in discovery and development, along with the stringent requirements placed on pharmaceutical drugs, have created an economic problem in the profitability of such endeavors. Despite some spectacular successes, more is spent on drug discovery and development every year and less is delivered in terms of innovation (DiMasi et al., 2003). Figure 1.02 shows the reported aggregate annual domestic prescription drug R&D expenditures for all members of the U.S. pharmaceutical industry since 1963 alongside the number of new US drug approvals by year (DiMasi et al., 2003). When the two are compared, the rate of growth of R&D expenditures clearly outpaces that of new approvals by a large margin. These rising costs have led to an overwhelming economic R&D problem within the pharmaceutical industry. In 2003, a study of 68 new medications placed a timeline of 10-12 years and cumulative costs averaging US$897 million on the development and marketing of a new medication (Ezzell, 2003). The pre-approval R&D costs themselves rose from US$138 million in 1979 to US$318 million in 1991 and to US$802 million in 2000 (Figure 1.03). The result of these increases in R&D costs is a growing trend towards mergers and industry consolidation. Additionally, higher costs push companies to lower their risks. Reorganization of R&D sectors in the pharmaceutical industry aims to optimize the return on investment by carefully selecting the most profitable research sectors. The sum of these effects is an increased need for efficient, low-cost technologies that bridge the gap between R&D and the economic challenges facing the pharmaceutical industry.
  • 23. 4
Figure 1.02 Increasing costs in pharmaceutical R&D. Inflation-adjusted industry R&D expenditures (2000 dollars) and US new chemical entity (NCE) approvals from 1963 to 2000. Source of data: Pharmaceutical Research and Manufacturers of America (PhRMA) and Tufts CSDD Approved NCE database. (Taken from DiMasi et al., 2003)
Figure 1.03 Pre-approval costs for new drugs. Each column indicates the capitalized pre-clinical, clinical and total costs per approved new drug in 2000 US dollars. (Taken from DiMasi et al., 2003)
  • 24. 5 Computer-Assisted Drug Design (CADD) approaches have been widely used in the pharmaceutical industry. By allowing scientists to direct their attention to the most promising candidate compounds, thereby narrowing the synthetic and biological testing efforts, CADD approaches play an important role in accelerating pharmaceutical research. The recent successes of CADD in assisting rational drug design approaches have proven it to be an essential tool in drug design and development (Kapetanovic, 2008; Mandal et al., 2009; Song et al., 2009). 1.1.2 Thesis Outline As part of this thesis, several elements of CADD have been incorporated into research targeted at every step of the pharmaceutical drug discovery pipeline. The following is a description of the contributions of each chapter to the individual segments of the pharmaceutical drug design and discovery pipeline. Chapter 4 explores the lead identification stage of the pipeline and provides an alternative means of expediting research when identification of an active compound from a natural products sample is required. Natural products (and their semi-synthetic derivatives) have been major sources of marketed medications. However, lead isolation and identification from natural product extracts faces the problem of replication, i.e. the re-discovery of known natural products. Chapter 4 presents the development of a novel algorithm which uses MS/MS data to determine the correct molecular formula of a
  • 25. 6 compound, resulting in rapid identification of the probable nature of the isolated compound. Chapters 3 and 5 look at lead identification from the alternative source: compound databases. Chapter 3 applies our virtual screening (VS) pipeline to the Trypanosoma brucei RNA-editing Ligase 1 (TbRel1), where the success of our screen led to the identification of an inhibitor that allowed a better understanding of the effects of inhibition. Chapter 5 seeks to further enhance the current VS pipeline by using MD-generated conformational ensembles. Here the use of conformational ensembles (see Section 1.6.6) aims to provide a better, more complete representation of the target’s conformational dynamics as part of the VS process. Lastly, Chapter 2 is a representative pre-clinical molecular dynamics (MD) study of a lead compound in complex with its target. The in silico MD study gives researchers the opportunity to obtain information on the mechanism of action of the compound that would be unavailable through the usual experimental means. Our simulations aimed at identifying the specific structural factors responsible for the specificity of two compounds targeting the Bcl-2 family of proteins, which have recently become key targets for cancer therapeutics.
  • 26. 7 1.2 Molecular Modeling The field of Computational Drug Design relies on our understanding of the underlying mechanisms involved in the interactions of a drug and its target. As such, the development of Molecular Mechanics (MM) and Quantum Mechanics (QM) has made it possible to study drug-target recognition events at the atomic and electronic level. The increasing accuracy of these models, along with the growth of the computational resources required to compute them, has prompted the development of computational tools that evaluate drug-target interactions with increasing accuracy. 1.2.1 Molecular Mechanics First applied by Westheimer and Mayer in 1946, MM encompasses the computational techniques that allow the calculation of molecular properties through the use of classical mechanics and electrostatics (Westheimer and Mayer, 1946). MM provides a practical means of describing molecular structures and properties computationally. As opposed to QM, where the primary purpose is the accuracy of the calculations, MM packages aim to describe molecular structures and properties accurately, robustly, and within reasonable time frames (Boyd and Lipkowitz, 1982). To do so, MM models (also referred to as force fields) describe molecules as a collection of atoms held together by elastic or harmonic forces. These forces essentially represent the structural features of a molecule such as bond lengths, bond angles, dihedral angles, etc. Functions are used to describe the behavior of these forces, resulting in a calculated
  • 27. 8 potential energy for each. As such, the total potential energy of a molecule is calculated as the sum of all energy contributions (Eq. 1.01):
$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdw} + E_{elec} \tag{1.01}$$
Functional form of the potential energy of a molecule, where $E_{bond}$, $E_{angle}$, $E_{torsion}$, $E_{vdw}$ and $E_{elec}$ describe the bond length, bond angle, torsion angle, Van der Waals and electrostatic contributions respectively. Energy contributions are calculated to describe the deviation of structural features from their ideal values, which are derived empirically or from high-level QM calculations. While the exact mathematical functions used to describe these contributions may differ between MM packages, the functions are chosen to accurately replicate the behavior of each energy contribution within expected ranges while minimizing the amount of calculation, and therefore of computational time, required. From here on, all discussions of the potential energy function will refer to that implemented by the AMBER force field (Cornell et al., 1995). The potential energy function is described as follows:
$$E_{total} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] + \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon r_{ij}}\right] \tag{1.02}$$
The potential energy function
  • 28. 9 where $K_r$, $K_\theta$ and $V_n$ are force constants, $\theta_{eq}$ and $r_{eq}$ are the equilibrium bond angle and length respectively, and $\gamma$ is the phase for the torsional angle. $A_{ij}$ and $B_{ij}$ are the non-bonded parameters used to compute the van der Waals energies. $q_i$ and $q_j$ denote the atom-centered partial charges on atoms i and j respectively, $\varepsilon$ is the dielectric constant and $r_{ij}$ is the distance between atoms i and j. Each term of the potential energy function will now be discussed briefly.
Figure 1.04 The contributions of bonded terms to the potential energy function. The torsion angle, bond length/angle, and VdW contact are represented by a Fourier series, harmonic potential, and Lennard-Jones potential respectively. (Taken from Boas and Harbury, 2007).
  • 29. 10
$$E_{bond} = \sum_{bonds} K_r (r - r_{eq})^2 \tag{1.03a}$$
$$E_{angle} = \sum_{angles} K_\theta (\theta - \theta_{eq})^2 \tag{1.03b}$$
The bond length and bond angle contributions to the potential energy function. The typical length of an alkane carbon-carbon bond is 1.53 Å. Similarly, a typical C-C-C bond angle lies between 109° and 114°. Deviations from these equilibrium values result in an increase in the energy of the system. Thinking of a molecule as an assembly of point masses held together by springs (the bond lengths and angles) is therefore a reasonable approximation to its experimental behavior, and the $E_{bond}$ and $E_{angle}$ terms are modeled as harmonic potentials centered on an equilibrium value (Eq. 1.03a,b).
$$E_{torsion} = \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] \tag{1.03c}$$
The dihedral angle contribution to the potential energy function. The torsion angle essentially describes rotation about bonds. For any set of four covalently bonded atoms ABCD, the torsion angle is defined as the angle measured about the BC axis from the ABC plane to the BCD plane. The periodic nature of the torsion angle, and of the torsional potential energy, lends itself to description by periodic functions such as a Fourier series, with the series typically truncated at the third term (Eq. 1.03c).
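The bonded terms of Eqs. 1.03a-c can be sketched in a few lines of Python. This is a minimal illustration rather than the AMBER implementation: the function names and the numerical parameters (a force constant of 310 kcal/(mol Å²) and a single threefold torsion term of 0.3 kcal/mol) are assumptions chosen only to make the example concrete.

```python
import math


def harmonic_energy(k, x, x_eq):
    """Harmonic term K * (x - x_eq)^2 of Eqs. 1.03a/1.03b (no factor of 1/2)."""
    return k * (x - x_eq) ** 2


def torsion_energy(phi, terms):
    """Truncated Fourier series of Eq. 1.03c.

    phi   : dihedral angle in radians
    terms : iterable of (V_n, n, gamma) tuples, with gamma in radians
    """
    return sum(0.5 * v_n * (1.0 + math.cos(n * phi - gamma))
               for v_n, n, gamma in terms)


# Illustrative (assumed) parameters, not values quoted from the thesis.
K_R = 310.0                      # bond force constant, kcal/(mol A^2)
R_EQ = 1.53                      # equilibrium C-C bond length, A
THREEFOLD = [(0.3, 3, 0.0)]      # one term: V_3 = 0.3 kcal/mol, n = 3, gamma = 0

# Energy cost of stretching the bond by 0.1 A from equilibrium.
stretch_penalty = harmonic_energy(K_R, R_EQ + 0.1, R_EQ)

# Torsional profile sampled every 60 degrees; it repeats every 120 degrees.
profile = [torsion_energy(math.radians(deg), THREEFOLD)
           for deg in range(0, 360, 60)]
```

With these assumed parameters a 0.1 Å stretch costs about 3 kcal/mol, while the whole torsional barrier is only 0.3 kcal/mol, which illustrates why bond lengths stay close to equilibrium whereas dihedral angles rotate comparatively freely.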
  • 30. 11
$$E_{vdw} = \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] \tag{1.03d}$$
The VdW contribution to the potential energy function. The Van der Waals (VdW) energy describes non-bonded interactions between atoms as a function of the distance between their nuclei. As two atoms approach one another, London dispersion forces predominate, creating a net attractive force between them. Once the two nuclei come too close to one another, VdW repulsion comes into play. The attractive and repulsive parts of the potential energy are described by the Lennard-Jones (6-12) potential (Eq. 1.03d), though the more computationally demanding Buckingham potential can also be used. Parameters for the VdW energy term are obtained by measuring non-bonded contact distances in crystals as well as VdW contact data for rare gas atoms, though other non-experimental sources (simulations) can also be used (Boyd and Lipkowitz, 1982; Cornell et al., 1995).
$$E_{elec} = \sum_{i<j} \frac{q_i q_j}{\varepsilon r_{ij}} \tag{1.03e}$$
The electrostatics contribution to the potential energy function. The last term in the potential energy function calculates the electrostatic energy associated with the interaction of two point charges, as described by Coulomb’s law. The magnitude of the electrostatic force between two point charges is directly proportional to the product of the magnitudes of the charges and inversely proportional to the square of the distance between them, while the corresponding interaction energy is inversely proportional to the distance itself (Eq.
  • 31. 12 1.03e). Applications of MM force fields include, but are not limited to, energy minimization, scoring, docking, molecular dynamics and Monte Carlo methods. Over the past decade, the development of techniques such as high-throughput X-ray crystallography has expedited the rate of macromolecular structure determination, resulting in a current total of ~70000 crystallographic or solution structures of proteins deposited in the Protein Data Bank (Berman et al., 2000). The availability of this wealth of structural information, along with well-documented successes, has generated considerable interest in the advancement of structure-based drug design (SBDD) techniques (Marrone et al., 1997). A number of structure-based screening methods have been developed to expedite pharmaceutical research. These methods have been used in lead discovery, to identify novel chemical entities showing strong inhibitory activity towards a target, and in lead optimization, where the careful selection of an optimized lead within a set of chemically similar compounds is required. The following sections review the methods that have been most crucial to the development of molecular modeling in drug design, namely the prediction of binding free energies and binding modes, followed by a review of Molecular Dynamics and Virtual Screening methods, which together have significantly contributed to the advancement of CADD. 1.3 Predicting Binding Free Energies - Scoring Calculations of binding free energies play an important role in the accuracy of SBDD techniques (Raha and Merz, 2005). The major function of such techniques is in
  • 32. 13 providing estimates of binding free energies at a faster rate and lower cost than is possible by experimental means. As such, a good correlation between experimental and computationally derived binding free energies to a target is a prerequisite to their success in drug design. The selective binding of a small molecule to a target protein is the result of complementary structural and energetic features. The binding equilibrium is determined by the standard Gibbs free energy of binding $\Delta G^\circ$, defined under standard state conditions (concentrations of 1 M, temperature of 298 K and pressure of 1 atm). The experimentally determined association, dissociation and inhibitory constants ($K_A$, $K_D$ and $K_i$ respectively) relate to the standard Gibbs free energy as follows:
$$K_A = K_D^{-1} = K_i^{-1} = \frac{[PL]}{[P][L]} \tag{1.04}$$
$$\Delta G^\circ = -RT \ln K_A \tag{1.05}$$
where [P], [L] and [PL] are the equilibrium concentrations of the protein, the ligand and their complex, R is the gas constant and T is the absolute temperature. As the binding free energy of a system is a state function, theoretical calculations can approximate it in a direct fashion, by calculating the properties of the protein and ligand individually and then of their complex:
  • 33. 14
$$\Delta G_{bind} = G_{complex} - (G_{protein} + G_{ligand}) \tag{1.06}$$
where $\Delta G_{bind}$ is the free energy of binding, $G_{complex}$ is the free energy of the complex, and $G_{protein}$ and $G_{ligand}$ are the free energies of the protein and ligand respectively. Another commonly used expression of the binding free energy is its decomposition into different additive free energy components integrated into a single equation:
$$\Delta G_{bind} = \Delta G_{int} + \Delta G_{solv} + \Delta G_{motion} + \Delta G_{conf} \tag{1.07}$$
In Eq. 1.07, $\Delta G_{int}$ is the interaction free energy owing mostly to electrostatic and steric enthalpic contributions from complex formation, $\Delta G_{solv}$ is the free energy of solvation, which accounts for solvent effects in binding, $\Delta G_{motion}$ is the free energy change associated with changes in the motion of the components of the system, and $\Delta G_{conf}$ accounts for the free energy due to conformational changes upon complexation. Scoring functions address these components of the binding free energy differently. Section 1.3.2 reviews the different methods employed to evaluate them. However, before addressing the differences between scoring methods, an extensive review of solvation effects on binding is appropriate, as the development of implicit solvation models has had a tremendous impact on the calculation of solvation free energies and hence on our ability to estimate binding free energies (Tomasi and Persico, 1994; Orozco and Luque, 2000).
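The relationship between measured binding constants and the standard binding free energy (Eqs. 1.04 and 1.05), together with the direct difference of Eq. 1.06, can be illustrated with a short Python sketch. The function names are hypothetical, and the example assumes the dissociation constant is expressed in mol/L relative to the 1 M standard state at 298 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_g_from_kd(kd_molar, temperature=298.0):
    """Standard binding free energy (kcal/mol) from K_D given in mol/L.

    Combines Eq. 1.05 with K_A = 1/K_D (Eq. 1.04): dG = -RT ln K_A = RT ln K_D.
    """
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def binding_free_energy(g_complex, g_protein, g_ligand):
    """Eq. 1.06: free energy of the complex minus that of the free partners."""
    return g_complex - (g_protein + g_ligand)


# A nanomolar binder versus a micromolar binder.
dg_nanomolar = delta_g_from_kd(1e-9)
dg_micromolar = delta_g_from_kd(1e-6)
```

At 298 K each tenfold improvement in K_D corresponds to roughly 1.4 kcal/mol of additional binding free energy, which gives a sense of the accuracy a scoring method needs in order to rank compounds of similar potency.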
  • 34. 15 1.3.1 Effect of water: Continuum (Implicit) Solvation energy Protein-ligand binding is a process that normally occurs within an aqueous environment. Interactions with the solvent play a significant role in binding energetics and must therefore be taken into account when making binding free energy predictions. The effective dielectric constant of water at 25°C is 78.5, while that of vacuum is 1. Transferring a charged or polar group from vacuum into water is therefore accompanied by a solvation energy, which results from the favorable interaction between the atomic charge and the high-dielectric environment. As a consequence of this favorable interaction, there is an energy penalty when polar parts of the ligand are removed from contact with water and exposed instead to the binding site. Additionally, the presence of water results in the effective screening of charge-charge interactions, as indicated by the dielectric constant in the Coulomb equation (Eq. 1.03e). However, the interface of a protein-ligand complex usually excludes water molecules. To account for the distance dependence of charge-charge screening by water, a crude screening model containing a distance-dependent dielectric constant was introduced. In this model, for all atoms i and j in Eq. 1.03e the effective dielectric constant is $\varepsilon = C r_{ij}$, where C is a constant and $r_{ij}$ is the interatomic distance. While this model allows the rapid calculation of one of the major effects of water, it does not account for the one-body solvation energy of each atom. Additionally, the electrostatic interaction between two atoms is affected by the positions of all other protein and ligand atoms, which should also be taken into account.
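The difference between a constant dielectric and the distance-dependent screening model can be made concrete with a small Python sketch of the pair term in Eq. 1.03e. The conversion factor of 332.06 kcal Å/(mol e²), the function names and the choice C = 4 are assumptions made for illustration; the text above does not specify a value for C.

```python
COULOMB_CONSTANT = 332.06  # kcal A / (mol e^2); converts q_i*q_j/r to kcal/mol


def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Pair term of Eq. 1.03e with a constant dielectric (charges in e, r in A)."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)


def coulomb_energy_distance_dependent(q_i, q_j, r_ij, c=4.0):
    """Same pair term with the crude screening model epsilon = C * r_ij."""
    return COULOMB_CONSTANT * q_i * q_j / (c * r_ij * r_ij)


# A +1/-1 ion pair at 4 A and at 8 A under both models (assumed example values).
for r in (4.0, 8.0):
    unscreened = coulomb_energy(1.0, -1.0, r)
    screened = coulomb_energy_distance_dependent(1.0, -1.0, r)
    print(f"r = {r:.1f} A: vacuum {unscreened:.2f}, eps = 4r {screened:.2f} kcal/mol")
```

Because the effective dielectric grows with separation, the screened energy falls off as 1/r² rather than 1/r, so distant charge pairs contribute far less than they would in vacuum. As noted above, this only mimics screening and says nothing about the desolvation of each individual charge.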
  • 35. Continuum (implicit) solvation models can account for the additional complexities of electrostatic interactions. The continuum solvation models essentially treat the solvent as a bulk dielectric medium with a dielectric constant of ~80 ($D_{out} = 78.5$ for water at 25°C) and the protein/ligand as low-dielectric regions with enclosed atomic charges. Numerical solutions to the Poisson-Boltzmann (PB) equation provide an efficient means of calculating the electrostatic potential produced by a system. The PB equation relates the electrostatic potential $\Phi(r)$ to the charge density $\rho(r)$ as:
$$\nabla \cdot \left[ \varepsilon(r) \nabla \Phi(r) \right] = -4\pi\rho(r) \tag{1.08}$$
where $\varepsilon(r)$ is the dielectric constant. The total free energy of solvation is calculated as follows:
$$\Delta G_{solv} = \Delta G_{elec} + \Delta G_{np} \tag{1.09}$$
The potential $\Phi(r)$, obtained from solving the PB equation (Eq. 1.08), allows for the computation of the total electrostatic energy component of the solvation free energy:
$$E_{elec} = \sum_i q_i \Phi(r_i) = \sum_i q_i \left[ \phi_C(r_i) + \phi_R(r_i) \right] \tag{1.10}$$
where $\phi_C$ and $\phi_R$ are respectively the Coulomb and reaction field potentials. The Coulombic component is calculated as a Coulomb summation over all charges other than $q_i$:
  • 36. $$\phi_C(r_i) = \frac{1}{D_{in}} \sum_{j \neq i} \frac{q_j}{r_{ij}} \tag{1.11}$$
The reaction field component $\phi_R$ of the electrostatic potential is derived from numerical solutions to the PB equation, using either a finite difference method (FDM) or a boundary element method (BEM) (Gilson et al., 1988; Honig and Nicholls, 1995; Purisima and Nilar, 1995). Therefore, once a solution to the PB equation is calculated, the electrostatic component of the solvation free energy is obtained. The non-polar component is derived from surface area terms. It contains contributions from cavity formation and solvent-solute dispersion-repulsion interactions. These terms are often considered to be proportional to the molecular surface area (Floris and Tomasi, 1989; Still et al., 1990; Gogonea and Merz, 1999). The general formula for this term is therefore:
$$\Delta G_{np} = \sum_i \tau_i A_i \tag{1.12}$$
where $A_i$ is the surface area of one solute atom and $\tau_i$ is a surface tension parameter specific to that atom. Typically, the molecular surface is defined as either the solvent-excluded surface area or the solvent-accessible surface area. The solvent-excluded surface may however perform better than other surface models (Pitarch et al., 1996).
  • 37. The following sections will take a closer look at the two most commonly employed methods for solving the PB equation, namely the Finite Difference (FDM) and Boundary Element (BEM) methods.

1.3.1.1 Finite Difference Method

While the FDM was first introduced by Warwicker and Watson, the work of Honig's group in the development of the DelPhi program has widely popularized its use (Warwicker and Watson, 1982; Gilson et al., 1988). In the FDM, a cubic lattice is first superimposed onto the solute and surrounding solvent, and values of the electrostatic potential, charge density, dielectric constant and ionic strength are assigned to grid points within the lattice (Fig. 1.05). As the positions of the solute atoms usually fall between grid points, each atomic charge is distributed over the eight neighboring grid points, weighted according to the distance of each grid point from the charge. A finite difference formula then calculates the derivatives of the PB equation.

Figure 1.05 Cubic grid scheme for the Finite Difference Method. (Taken from Folgaro et al., 2002)
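To give a feel for the finite difference idea, the following sketch solves a one-dimensional analogue of Eq. 1.08 by Gauss-Seidel relaxation on a regular grid. It is a toy under stated assumptions: one dimension instead of a cubic lattice, no ionic strength term, zero potential at both ends of the grid, and arithmetic averaging of the dielectric between neighboring grid points. The function name and all numerical values are hypothetical, and this is not how DelPhi is implemented.

```python
import math


def solve_poisson_1d(rho, eps, h, tol=1e-12, max_iter=200000):
    """Relax d/dx(eps * dphi/dx) = -4*pi*rho on a 1D grid with phi = 0 at both ends.

    rho: charge density at each grid point
    eps: dielectric constant at each grid point
    h:   grid spacing
    """
    n = len(rho)
    phi = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(1, n - 1):
            # Dielectric at the midpoints between grid point i and its neighbors.
            eps_right = 0.5 * (eps[i] + eps[i + 1])
            eps_left = 0.5 * (eps[i] + eps[i - 1])
            # Finite difference form of the equation, solved for phi[i].
            new_value = (eps_right * phi[i + 1] + eps_left * phi[i - 1]
                         + 4.0 * math.pi * rho[i] * h * h) / (eps_right + eps_left)
            largest_change = max(largest_change, abs(new_value - phi[i]))
            phi[i] = new_value
        if largest_change < tol:
            break
    return phi


if __name__ == "__main__":
    n_points, spacing = 21, 0.5
    potential = solve_poisson_1d([0.01] * n_points, [80.0] * n_points, spacing)
    print(max(potential))
```

For a uniform charge density and dielectric, the relaxed potential can be checked against the analytic parabola $\Phi(x) = (2\pi\rho/\varepsilon)\,x(L - x)$, which is a convenient sanity test before moving to non-uniform dielectric maps.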
  • 38. 1.3.1.2 Boundary Element Method

The BEM is an alternative approach to the FDM for solving the PB equation. In the BEM, the potential is represented as a charge density spread over the molecular surface (Zauhar and Varnek, 1996). Instead of solving the PB equation directly, the BEM considers the induced surface charge to develop an integral formulation of the problem. This is expressed as:
$$\phi(\mathbf{r}) = \int_S \frac{\sigma(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, dS' \tag{1.13}$$
where $\phi(\mathbf{r})$ is the electrostatic potential due to the surface charge distribution, $\sigma(\mathbf{r}')$ is the surface charge density and the integral is taken over the entire molecular surface area (Zauhar and Morgan, 1985; Purisima and Nilar, 1995). The SIE and SIETRAJ scoring functions used throughout this thesis for the computation of binding free energies compute the reaction field energy using the BRI-BEM program, which utilizes the BEM (Purisima and Nilar, 1995; Purisima, 1998; Naïm et al., 2007; Cui et al., 2008).

1.3.1.3 Desolvation cost

Continuum solvation studies on the energetics involved in ligand binding have been conclusive in noting the large, unfavorable effects of solvent screening on the overall electrostatic change in free energy (Kuhn and Kollman, 2000; Wang et al., 2001; Hou et al., 2002). Complex formation between a protein and ligand involves the breakage
  • 39. and formation of several hydrogen bonds, including the reorganization of water molecules around the ligand and target active site (Fig. 1.06). While the gas-phase interaction between the ligand and protein is favorable, the desolvation of the binding pocket involved in ligand binding results in an overall large energetic penalty. Hence, ligand binding is suggested to be primarily driven by short-range (vdW) and long-range hydrophobic forces (Hünenberger et al., 1999; Kuhn and Kollman, 2000; Wang et al., 2001; Hou et al., 2002). This phenomenon can be better described by looking at the electrostatic component of the binding free energy. The electrostatic change in binding free energy is expressed in Eq. 1.14 as the sum of the change in reaction field energy and the intermolecular Coulomb energy:
$$\Delta G_{elec} = \Delta G_R + E_{Coul} \tag{1.14}$$
where $\Delta G_R$ is the change in reaction field energy and $E_{Coul}$ is the change in intermolecular Coulomb energy. Computational studies have noted that while the intermolecular Coulomb energy favors binding, the desolvation effects are incompletely compensated by ligand-target interactions in the bound state, resulting in an unfavorable net effect on the binding free energy (Hendsch and Tidor, 1999; Miyashita et al., 2003; Sims et al., 2005).
  • 40. Figure 1.06 Representation of desolvation effects during ligand-protein complex formation. (Taken from Cozzini et al., 2004)

1.3.2 Scoring Functions

In their most general use, scoring functions are designed to predict binding modes of inhibitors and provide an accurate discrimination between true and false binding modes. Virtual screening pipelines have used scoring functions to improve enrichment rates and improve lead compound identification (Grüneberg et al., 2002; Seihert MH, 2009). The master equation described above (Eq. 1.07) is used as an overall guide for the development of scoring functions. The following is an overview of scoring functions with respect to their ability to predict binding affinity and/or affinity ranking, though their ability to predict binding modes is also of interest (Halperin et al., 2002). Scoring functions may use all or some of the different terms expressed in Eq. 1.07. Due to the computational demands of a more rigorous representation of the master equation, most scoring functions employ a more empirical expression, taking only the most dominant contributions into account. This provides a fast and accurate means of
  • 41. predicting binding modes and ranking potential lead compounds in a virtual screening setting. The three categories of scoring functions that will be reviewed are physical-chemical, knowledge-based, and empirical functions.

1.3.2.1 Physical-Chemical

The most prominent physical-chemical scoring function is the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) function (Kollman et al., 2000). The overall format of the function can be summarized as follows:
$$\Delta G_{bind} = \Delta E_{MM} + \Delta G_{solv}^{complex} - \Delta G_{solv}^{protein} - \Delta G_{solv}^{ligand} - T\Delta S \tag{1.15}$$
where $\Delta E_{MM}$ is the sum of the Coulomb electrostatic and vdW interaction energies calculated using MM force field packages such as AMBER and CHARMM (Case et al., 2005; Brooks et al., 2009). The $T\Delta S$ term is usually evaluated using normal mode analysis of an MD trajectory. All calculations are ensemble averages over snapshots taken from an MD trajectory. Therefore, the MM-PBSA energy is calculated from averages over a finite number of snapshots from the ensemble and, as such, the quality of the results is sensitive to the details of the MD simulation. The Solvated-Interaction Energy (SIE) scoring function is another example of a physics-based scoring function. It makes use of force-field parameters and equations to
  • 42. make estimates of the binding affinity of molecules to a target. The format of the SIE equation is as follows:
$$\Delta G_{bind} = E_{Coul} + \Delta G_R + E_{vdW} + \Delta G_{np} \tag{1.16}$$
For the electrostatic component of the free energy ($E_{Coul} + \Delta G_R$), the electrostatic intermolecular interaction energy $E_{Coul}$ is estimated using Coulomb's law and $\Delta G_R$ is the change in reaction field solvation energy calculated using the BRI-BEM boundary element method (Purisima and Nilar, 1995; Purisima, 1998). The change in solvation energy is calculated as the difference between the bound and free states. For the nonpolar components of the binding free energy, $\Delta G_{np}$ is the change in the non-electrostatic solvation energy and $E_{vdW}$ is the vdW interaction energy calculated using the LJ 6-12 potential between the ligand and protein atoms. $\Delta G_{np}$ is calculated as the difference between the bound and free states of the solute-water vdW energy and cavitation cost in water. As described in Section 1.3.1, the cavitation cost (nonpolar component of the solvation energy) is proportional to the surface area (Eq. 1.12). The definition of the surface area can differ from function to function. In the case of SIE, the molecular surface area is calculated as the solvent-excluded surface area (Naïm et al., 2007).
  • 43. $$\Delta G_{cav} = \gamma \cdot \Delta MSA \tag{1.17}$$
However, the loss of intermolecular vdW interactions between solute and solvent must also be added to the cavitation cost. This is accomplished by linearly scaling the solute-solute intermolecular vdW energy by a factor $\beta$, thereby accounting for the loss of solute-solvent vdW interactions upon complex formation:
$$\Delta G_{np} = (\beta - 1)\, E_{vdW} + \gamma \cdot \Delta MSA \tag{1.18}$$
The complete parameterization of the SIE scoring function depends on a number of variables that include the solute dielectric constant ($D_{in}$), solute atomic radii $\{r_i\}$, SA scaling coefficient ($\gamma$), vdW interaction energy scaling coefficient ($\beta$) and fitting constant ($C$) (Naïm et al., 2007):
$$\Delta G_{bind}(\{r_i\}, D_{in}, \gamma, \beta, C) = E_{Coul}(D_{in}) + \Delta G_R(\{r_i\}, D_{in}) + \beta \cdot E_{vdW} + \gamma \cdot \Delta MSA(\{r_i\}) + C \tag{1.19}$$
One general issue with empirical scoring functions is tied to their training set: the set on which they are parameterized has a particular composition and diversity, which can bias the function towards the targets it contains (Gohlke and Klebe, 2002; Ferrera et al., 2004).
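As a worked illustration of how the SIE terms combine, the sketch below evaluates the sum described by Eqs. 1.16 to 1.19, reading the nonpolar term as $(\beta - 1)E_{vdW} + \gamma \cdot \Delta MSA$. The function name is hypothetical, the energy terms are assumed to be precomputed elsewhere (for example by a force field and a BEM solver), and the numbers in the usage example are invented for illustration; they are not the published SIE calibration.

```python
def sie_binding_free_energy(e_coul, dg_reaction_field, e_vdw, delta_msa,
                            beta, gamma, c):
    """Combine precomputed SIE terms (all energies in kcal/mol).

    e_coul:            intermolecular Coulomb energy
    dg_reaction_field: change in reaction field energy on binding
    e_vdw:             intermolecular van der Waals energy
    delta_msa:         change in molecular surface area on binding (Angstrom^2)
    beta, gamma, c:    fitted scaling coefficients and constant
    """
    # Nonpolar term: scaled vdW correction plus cavitation cost.
    dg_nonpolar = (beta - 1.0) * e_vdw + gamma * delta_msa
    return e_coul + dg_reaction_field + e_vdw + dg_nonpolar + c


if __name__ == "__main__":
    # Illustrative, made-up inputs; not taken from any real complex.
    estimate = sie_binding_free_energy(
        e_coul=-40.0, dg_reaction_field=35.0, e_vdw=-50.0,
        delta_msa=-600.0, beta=0.8, gamma=0.01, c=-1.0)
    print(f"Estimated binding free energy: {estimate:.2f} kcal/mol")
```

Keeping the nonpolar term explicit makes it easy to see that the favorable Coulomb energy and the unfavorable reaction field change largely cancel, leaving the scaled vdW and surface terms to dominate the estimate.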
  • 44. 1.3.2.2 Empirical (Regression) Scoring Functions

Empirical scoring functions weigh contributions from different energetic terms in order to make a binding affinity prediction. These terms may include hydrogen-bonding terms based on geometric measures as well as FF-based physical potentials. However, the linear weights of the terms are derived from regression methods that fit the computed terms to experimental binding affinities using structural information. The regression analysis optimizes the weights to provide maximal correlation between computed and experimental binding affinities in the training set (Bohm et al., 1994; Verkhivker et al., 1995; Head et al., 1996; Naïm et al., 2007).

1.3.2.3 Knowledge-based Scoring Functions

Knowledge-based (KB) scoring functions use statistical potentials that are derived from databases of protein-ligand complexes such as the PDB (Koppensteiner and Sippl, 1998; Muegge et al., 2000; Gohlke and Klebe, 2001). The use of KB potentials for the scoring of protein-ligand complexes was inspired by the success of such potentials in predicting protein folding and structure (Sippl, 1990; Sippl, 1993; Sippl et al., 1996). In KB functions, occurrences of interacting pairs of atoms in a training set of complexes are used to derive statistical potentials that resemble, but are not, potentials of mean force (Ben-Naim, 1997). In doing so, certain assumptions are made. The first is that the protein-ligand complex structures are assumed to be in a state of thermodynamic
  • 45. equilibrium while the second is that the distributions of atoms in the complexes obey Boltzmann's law (Sippl et al., 1993; Mullinax and Noid, 2010). KB potentials are built by first calculating a distance-dependent probability distribution of atom pairs. The Helmholtz free energy is then calculated per atom pair in the protein-ligand complex:
$$A_{ij}(r) = -k_B T \ln \left[ \frac{\rho_{ij}(r)}{\rho_{ij}^{bulk}} \right] \tag{1.20}$$
where $\rho_{ij}(r)$ is the pair correlation function for an atom pair of type ij at distance r, while $\rho_{ij}^{bulk}$ is a normalization factor representing the bulk density of the atom pair when the two atoms are not interacting. A few notable examples of KB scoring functions include the piecewise linear potential (PLP), PMFScore and DrugScore (Verkhivker et al., 1995; Muegge and Martin, 1999; Gohlke et al., 2000).

1.3.2.4 Problems

The major shortcoming of most scoring techniques like SIE is that they only consider a single receptor-compound complex when estimating binding free energies, whereas binding is a dynamic process that results from an ensemble of such complexes. The use of ensembles as part of a Virtual Screening pipeline is explored in Chapter 5 of this
  • 46. thesis. Nevertheless, despite phenomenal advances in computational power and technologies, accurate estimation of binding free energies remains challenging.

1.4 Predicting Binding Modes – Docking

Predicting binding modes of ligands to a target protein structure, also known as docking, has been a key component of in silico techniques used in structure-based, rational drug design (Kuntz, 1992; Cavasotto and Orry, 2007). Docking schemes attempt to find the optimal matching between a ligand and a targeted protein. In essence, the problem can be reduced to the following: given the atomic coordinates of these two molecules, predict the proper conformation of the complex. One assumption that is usually made in the docking problem is prior knowledge of the binding site targeted by the ligand. Docking schemes are typically validated by their ability to reproduce experimental data through docking studies where protein-ligand complex conformations are obtained in silico and compared to structures obtained by experimental means (i.e. X-ray crystallography or nuclear magnetic resonance). Since predicting the correct bound conformation of both the protein and ligand is a challenging and computationally expensive task, the problem is usually reduced to the following: given the proper “bound” conformation of the protein, predict the proper bound conformation of the ligand and complex. This problem is the focus of the large majority of docking algorithms though a
  • 47. few incorporate a sampling of receptor conformation as well to optimize the predicted complex coordinates. The main purposes of docking algorithms can be divided into two groups, though the two are not mutually exclusive. The first emphasizes speed and accuracy, where the main goal is the rapid screening of millions of potential candidate molecules for the discovery of a few active compounds in virtual screening (see Section 1.6). The second emphasizes accuracy of the complex structure, attempting to progressively close the gap between the predicted complex and the experimental structure. Docking programs search through a large selection of possible fits between a ligand and the targeted binding pocket and assess the best fit between them by taking into account several parameters. These parameters are akin to those used in scoring functions, which is in essence what they are. In this case however, the scoring scheme is optimized to retrieve the binding mode that is closest to the experimental structure as measured by RMSD.

1.4.1 Docking Algorithms

1.4.1.1 Fast Shape Matching

Shape Matching (SM) algorithms primarily take into account the overall geometrical overlap between the protein and ligand molecules. Shape matching methods employ a variety of algorithms in order to assess proper conformations of the ligand and binding site to be matched.
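The RMSD criterion mentioned above for comparing a docked pose with the experimental structure can be sketched as follows. This minimal version assumes that both poses are expressed in the same receptor coordinate frame and list the same atoms in the same order, so no superposition or symmetry correction is performed; the function name is hypothetical.

```python
import math


def ligand_rmsd(predicted, reference):
    """Root-mean-square deviation (Angstrom) between two poses.

    predicted, reference: sequences of (x, y, z) tuples with matching atom order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared_sum = 0.0
    for atom_p, atom_r in zip(predicted, reference):
        squared_sum += sum((p - r) ** 2 for p, r in zip(atom_p, atom_r))
    return math.sqrt(squared_sum / len(predicted))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(0.2, 0.1, 0.0), (1.6, 0.1, 0.1), (1.4, 1.7, 0.0)]
    print(f"RMSD = {ligand_rmsd(docked, crystal):.3f} A")
```

Because the coordinates are not refitted, this measure penalizes both a wrong ligand conformation and a wrong placement in the pocket, which is usually the intent when validating a docking protocol.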
  • 48. Rigid-body docking applications are mainly SM-based. Examples include ZDOCK as well as our own internally developed docking program (Chen et al., 2003; unpublished). Flexible docking algorithms can also use SM methods as part of their strategy. For instance, DOCK combines incremental construction and a sphere matching algorithm in order to identify an optimal geometrical alignment (Kuntz et al., 1982). The development of methods that analytically calculate the solvent-accessible molecular surface was a key contributor to the development of these SM applications (Connolly et al., 1983a; Connolly et al., 1983b).

1.4.1.2 Incremental construction

Incremental construction (IC) methods divide the ligand into fragments which are separately docked onto the surface of the binding site. The fragments identified as rigid “anchor” regions of the ligand are typically docked first and the fragments identified as flexible regions are added sequentially with a systematic scanning of the torsion angles around the anchors. Following the docking, rigid fragments are fused together to obtain an optimal orientation of the molecule. This fragmentation of the molecule is a means of incorporating ligand flexibility into docking. The first IC algorithm was part of the DOCK program (Desjarlais et al., 1986). In DOCK, the rigid fragments were first docked independently and the docked fragments were then rejoined as in the original compound if the atoms were within
  • 49. certain distances of each other. Other methods utilizing IC include FlexX, FLOG, Hammerhead and Surflex (Miller et al., 1994; Rarey et al., 1996; Welch et al., 1996; Kramer et al., 1999; Ewing et al., 2001; Jain, 2007).

1.4.1.3 Monte Carlo Simulations

The Monte Carlo (MC) Method was developed in its present form by Metropolis, Ulam and von Neumann during their work on the Manhattan project (Metropolis and Ulam, 1949; Metropolis et al., 1953). Historically, it was used to perform the first computer simulation of a molecular system. MC simulations were later integrated as a means of adding flexibility to docking algorithms (Liu and Wang, 1999). With respect to docking algorithms, MC simulations attempt to position the ligand within the binding site through a number of random translational and rotational changes. The advantage of the added randomness in the sampling is a decreased likelihood of being trapped in local minima. The standard (Metropolis) MC methods generate configurations of a system through random Cartesian changes. Each change to the system is evaluated and then rejected or accepted based on a Boltzmann probability. One example of an MC-based docking application is the Internal Coordinates Mechanics (ICM) program (Abagyan et al., 1994). ICM initially makes a random move of one of three types: a rigid body ligand move, a torsion move of the ligand or a torsion move of the receptor side chain (Abagyan and Totrov, 1994; Abagyan et al., 1994). The side chain movement samples the conformational space defined a priori through a side-chain rotamer library (Ponder and
  • 50. Richards, 1987). The side chain sampling allows the algorithm to explore with larger probability the conformational space which is known to be highly populated. Following each sampling step, a modified ECEPP/3 scoring function is used to perform a conjugate gradient local minimization and test whether the conformation is accepted or rejected using the Boltzmann criterion.

1.4.1.4 Evolutionary Programming

Evolutionary programming (EP) algorithms are computational models that take their name and concept from biological processes. The EP algorithms generally start with a population of structures characterized by a given set of genes. Parent structures are then allowed to produce children structures containing a mixture of the structural characteristics of the parents (as defined by the parents' genes), during which mutations are allowed to occur. The individuals of the population displaying the most favorable features are kept while others are discarded, as per Darwin's principle of natural selection. Genetic algorithms (GA) are one example of EP algorithms. In GA, a population of chromosomes (parents) is used to create new chromosomes (offspring). Crossovers are used to generate the new chromosomes and a complex set of scoring functions is then used to select members within each round of selection. DOCK and GOLD are two of the most notable docking programs utilizing variations on GAs (Ewing et al., 2001; Verdonk et al., 2003). While EP algorithms can find one of the best solutions to the docking problem, they, like all heuristic algorithms, can also be trapped in local minima.
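The Boltzmann acceptance test used by Metropolis MC methods (Section 1.4.1.3) can be written in a few lines. The sketch below is a generic illustration, not the ICM implementation: the function name, the default kT of about 0.593 kcal/mol (roughly 298 K) and the use of Python's `random.Random` as the random source are all assumptions.

```python
import math
import random


def metropolis_accept(delta_energy, kt=0.593, rng=random):
    """Decide whether to accept a trial move with energy change delta_energy.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-delta_energy / kt). Energies and kt share the same units.
    """
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kt)


if __name__ == "__main__":
    generator = random.Random(42)
    trials = 10000
    uphill = 1.0  # kcal/mol, illustrative value
    accepted = sum(metropolis_accept(uphill, rng=generator) for _ in range(trials))
    print(f"Accepted {accepted} of {trials} uphill moves of {uphill} kcal/mol")
```

The occasional acceptance of uphill moves is what lets the search escape shallow local minima; raising `kt` increases that tolerance at the cost of less focused sampling.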
  • 51. 1.5 Molecular Dynamics

Molecular recognition between a protein and its ligand is a dynamic and complex process. An accurate computational representation of this interaction is a problem of considerable complexity and interest in CADD. Few techniques address this process and account for the conformational flexibility of both the ligand and receptor. Even fewer do so in an accurate and efficient manner. Protein flexibility is a multi-factorial, complex problem owing to the inter- and intramolecular interactions involved in the conformational dynamics. Of all methods commonly used, Molecular Dynamics (MD) simulations provide the most complete computational representation of the dynamics involved in this process.

1.5.1 Newton's Laws

Molecular dynamics methods solve Newton's equations of motion for atoms on an energy surface. Newton's laws of motion provide the means of generating successive conformations of the system. The result of these successive conformations is a trajectory that indicates how the positions and velocities of particles within the system vary with time. Newton's laws of motion can be summarized as follows:
  • 52. First law: Every body remains in a state of constant velocity unless acted upon by an external unbalanced force. Hence, if the resultant force is zero, then the velocity of the object is constant (Eq. 1.21):
$$\sum \mathbf{F} = 0 \;\Longrightarrow\; \frac{d\mathbf{v}}{dt} = 0 \tag{1.21}$$
Second law: A body of mass m subject to a net force F undergoes an acceleration a that has the same direction as the force and a magnitude that is directly proportional to the force and inversely proportional to the mass:
$$\mathbf{F} = m \frac{d\mathbf{v}}{dt} = m\mathbf{a} \tag{1.22}$$
Third law: The mutual forces of action and reaction between two bodies are equal, opposite and collinear, i.e. whenever a first body exerts a force F on a second body, the second body exerts a force –F on the first:
$$\mathbf{F}_{1,2} = -\mathbf{F}_{2,1} \tag{1.23}$$
In order to obtain an accurate trajectory, the differential equation embodied by Newton's second law of motion is solved:
  • 53. $$\frac{d^2 x_i}{dt^2} = \frac{F_{x_i}}{m_i} \tag{1.24}$$
which describes the motion of a particle of mass $m_i$ along one coordinate $x_i$ with a net force $F_{x_i}$ along that direction.

1.5.2 Ensembles

MD simulations are characterized with regard to the macroscopic conditions that are held constant. Statistical mechanics requires that certain macroscopic conditions be held constant in order to study the collection of all microstates of a system, its ensemble. Therefore, ensembles can be characterized by different quantities that include: volume (V), pressure (P), total energy (E), temperature (T) and number of particles (N). Ensembles are accordingly named and labeled with respect to the fixed quantities: NVT (canonical), NVE (micro-canonical) and NPT (isothermal-isobaric).

1.5.3 Verlet Algorithm

As discussed above (see Section 1.2.1), nuclei behave to a good approximation as classical particles. The dynamics of motion can therefore be obtained by solving Newton's second law:
$$\mathbf{F} = -\frac{dV}{d\mathbf{x}} = m \frac{d^2 \mathbf{x}}{dt^2} \tag{1.25}$$
  • 54. Here, V is the potential energy at position x and the vector x is a vector of length 3N containing the Cartesian coordinates for all particles. With an initial set of particles at positions $r_i$, the positions at a small time-step later can be calculated using a Taylor expansion (Eq. 1.26, 1.27):
$$r_{i+1} = r_i + \frac{\partial r}{\partial t}(\Delta t) + \frac{1}{2} \frac{\partial^2 r}{\partial t^2}(\Delta t)^2 + \frac{1}{6} \frac{\partial^3 r}{\partial t^3}(\Delta t)^3 + \cdots \tag{1.26}$$
$$r_{i+1} = r_i + v_i(\Delta t) + \frac{1}{2} a_i(\Delta t)^2 + \frac{1}{6} b_i(\Delta t)^3 + \cdots \tag{1.27}$$
where the velocities $v_i$, the accelerations $a_i$ and the hyper-accelerations $b_i$ are the first, second and third derivatives of the positions with respect to time. Substituting $\Delta t$ with $-\Delta t$ we obtain the positions $r_{i-1}$:
$$r_{i-1} = r_i - v_i(\Delta t) + \frac{1}{2} a_i(\Delta t)^2 - \frac{1}{6} b_i(\Delta t)^3 + \cdots \tag{1.28}$$
Adding the equations for $r_{i+1}$ and $r_{i-1}$ cancels the odd-order terms, so that the position at $\Delta t$ later can be calculated from the current acceleration and the previous and current positions:
$$r_{i+1} = (2 r_i - r_{i-1}) + a_i(\Delta t)^2 \tag{1.29}$$
  • 55. where the current acceleration can be obtained from the force or the derivative of the potential:
$$a_i = \frac{F_i}{m_i} = -\frac{1}{m_i} \frac{dV}{dr_i} \tag{1.30}$$
As the acceleration is re-evaluated at each time-step from the forces, the positions are updated at each time-step, which creates the resulting trajectory. This, in essence, is the Verlet algorithm (Verlet, 1967). Certain disadvantages of the Verlet algorithm have given rise to the use of alternative algorithms for MD simulations. The first disadvantage is the tendency towards truncation errors. This is a consequence of adding $a_i(\Delta t)^2$ (a small number) to $2r_i - r_{i-1}$ (a large number) when calculating the new positions. The second is that velocities are not an explicit part of the Verlet algorithm, which creates a problem in generating constant temperature ensembles (Cuendet and van Gunsteren, 2007). The velocity Verlet algorithm is a variation that addresses these problems (Martys and Mountain, 1999).

1.5.4 Considerations

MD simulations require small time-steps and are time-intensive with regard to the calculation of phenomena such as bond stretching and angle-bending motions. The size of the chosen time-step is a critical element affecting the accuracy of the trajectory, with smaller time-steps providing a better approximation of the expected dynamics of the system. This however also increases the computational cost, as more steps are required to propagate the system for a given total time. Generally, the longest time-step that can
  • 56. be taken is limited by the rate of the fastest process being sampled in the system. Typically, that requires that the time-step be one order of magnitude smaller than the period of the fastest process. In MD simulations, molecular rotations and vibrations occur with frequencies in the $10^{11}$ to $10^{14}$ s$^{-1}$ range. Therefore, time-steps on the order of $10^{-15}$ s or less are required for sampling of these molecular motions. A consequence of this limitation is that an MD simulation of 1 nanosecond (ns) with a time-step of $10^{-15}$ s requires ~$10^6$ time-steps, each involving a full evaluation of the forces. Since simulations typically span several nanoseconds or more, this presents a significant computational demand. Additionally, biological phenomena such as protein folding typically occur on even longer timescales of microseconds or more. One solution to this problem involves the freezing of the fastest molecular motions. This allows for significantly longer time-steps to be used while affecting the overall accuracy minimally. This is made possible because the fastest processes, the stretching vibrations, have a minimal impact on the properties of the trajectory. This is especially true for bonds involving hydrogen atoms. Therefore, freezing the lengths of bonds involving hydrogen atoms results in longer simulated times for a given number of calculated time-steps. The SHAKE and RATTLE algorithms provide the constraints necessary to keep bonds involving hydrogen atoms fixed during the simulation and typically allow time-steps to be increased two to three fold (Ryckaert et al., 1977; Andersen, 1983). Lastly, overcoming energy barriers can be a challenging task given that any motion of a conformational ensemble outside of its minimum in the potential energy
  • 57. surface will generate a force pulling the system back towards its minimum. A number of novel algorithms such as Replica-Exchange and MetaDynamics attempt to overcome this limitation using different means (Sugita and Okamoto, 1999; Laio and Parrinello, 2002).

1.5.5 Boundary Conditions

MD simulations of a solvated system usually involve several hundred or thousand molecules of solvent. However, in order for macroscopic properties to be realistically calculated from a limited number of solvent molecules, boundary effects require special consideration. When considering that a water-filled 1 L cube contains $3.3 \times 10^{25}$ molecules of water at room temperature, $2 \times 10^{19}$ of which will be interacting with the cube's boundary, it is easy to see why using a computationally tractable number of molecules will be insufficient for deriving bulk properties. In a system containing a few thousand water molecules, most would be under the influence of interactions with the boundary. Periodic boundary conditions replicate the bulk properties of a fluid given a limited number of solvent molecules. The system is usually prepared within the confines of a box having a cubic or other polyhedral geometry (Bekker, 1997). The box is then replicated in all directions (Fig. 1.07). If a solvent molecule leaves the box during the simulation it is replaced by an image particle entering the box from the opposite side (Fig. 1.07). A constant number of solvent molecules within the box is therefore
  • 58. maintained. This configuration allows for particles within the system to experience forces as if they were within bulk fluid.

Figure 1.07 Periodic boundary conditions in molecular dynamics simulations. (box reproduced from www.phy.cmich.edu/people/petkov/isaacs/phys/pbc.html)

1.5.6 Long-Range Electrostatic Calculations: The Ewald Summation Method

The interaction energy between two point charges decays at a rate proportional to $r^{-1}$. This creates a computational problem in considering the long-range electrostatic contributions from atoms located outside of the central box. One solution is using spherical cutoffs, which essentially eliminate electrostatic contributions beyond the cutoff. However, cutoffs have been documented to result in severe artifacts in MD simulations of peptides and nucleic acids (Schreiber and Steinhauser, 1992; York et al., 1995).
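The ingredients of Sections 1.5.3 and 1.5.5 can be made concrete with a small sketch: the position Verlet update, which computes the new position as twice the current position minus the previous one plus the acceleration times the squared time-step, together with the two bookkeeping operations behind periodic boundary conditions in a cubic box. All function names are hypothetical, and the demonstration integrates a single one-dimensional harmonic oscillator with unit mass and force constant rather than a molecular system. Positions are propagated unwrapped and only wrapped for reporting, which avoids corrupting the previous-position term of the Verlet update.

```python
import math


def verlet_step(r_curr, r_prev, accel, dt):
    """Position Verlet update: r_next = 2*r_curr - r_prev + a*dt^2 (per coordinate)."""
    return [2.0 * rc - rp + a * dt * dt for rc, rp, a in zip(r_curr, r_prev, accel)]


def wrap(position, box_length):
    """Map coordinates back into the central box [0, box_length)."""
    return [x % box_length for x in position]


def minimum_image(r_i, r_j, box_length):
    """Displacement from j to i using the nearest periodic image of j."""
    return [d - box_length * round(d / box_length)
            for d in (a - b for a, b in zip(r_i, r_j))]


def harmonic_trajectory(x0, dt, n_steps):
    """Integrate x'' = -x from rest at x0; returns the position after n_steps."""
    r_curr = [x0]
    # Previous position from a backward Taylor step with zero initial velocity.
    r_prev = [x0 + 0.5 * (-x0) * dt * dt]
    for _ in range(n_steps):
        accel = [-x for x in r_curr]
        r_curr, r_prev = verlet_step(r_curr, r_prev, accel, dt), r_curr
    return r_curr[0]


if __name__ == "__main__":
    final = harmonic_trajectory(x0=1.0, dt=0.01, n_steps=100)
    print(f"Verlet x(t=1) = {final:.6f}, exact cos(1) = {math.cos(1.0):.6f}")
    print(wrap([10.5, -0.5, 3.0], box_length=10.0))
    print(minimum_image([9.5, 0.0, 0.0], [0.5, 0.0, 0.0], box_length=10.0))
```

In a cutoff scheme, the minimum-image displacement is what would be compared against the cutoff radius; Ewald summation instead accounts for all periodic images rather than discarding those beyond the cutoff.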
  • 59. Ewald summation methods allow the potential due to the partial charges of a system and all of their periodic images to be considered. In Ewald summation, the position of each image box is related to the central box through a vector. Each vector component is therefore an integral multiple of the length of the box. Generally, the contribution of charge-charge interactions within the central box to the potential energy can be written as:
$$V = \frac{1}{2} \sum_{i} \sum_{j} \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} \tag{1.31}$$
where $r_{ij}$ is the distance between charges i and j.

1.6 Virtual Screening

As economic pressures increase to deliver target-optimized drugs at an accelerated pace and minimal cost, computational methods have become an increasingly important tool in drug discovery efforts. While numerous challenges persist in the accurate in silico prediction of ligand-target interactions, computational methods have already proved themselves in the successful development of numerous pharmaceutical medications (see Section 1.7). Of note is the role of virtual screening (VS) in lead discovery efforts. VS provides the ability to analyze large compound databases and to predict which compounds are most likely to interact with the desired target and become promising lead candidates. These candidates can then be tested and successful molecules can then go through rounds of optimization. VS
  • 60. 41 therefore circumvents the expense incurred through large scale screening efforts and narrows the search to a few, high-potential candidates (Oprea and Matter, 2004). 1.6.1 Virtual Screening Pipeline A VS pipeline is designed to optimize the use of computational resources: for efficiency and speed at the initial stages and for accuracy at the later stages. Earlier stages of the filtering process minimize the use of computational resources, thereby optimizing speed, by using soft scoring functions. More extensive calculations and sampling methods are reserved for the later stages of the pipeline where careful selection of the candidates with the highest potential is required (Fig 1.08).
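The staged design described above can be sketched as a simple funnel in which each stage ranks the surviving compounds with its own scoring function and passes only the best ones on. This is a minimal illustration, not the pipeline used in this thesis: the function name `run_funnel`, the toy score tables and the numbers of compounds kept per stage are all hypothetical, and lower scores are assumed to mean more favourable predicted binding.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str], float]


def run_funnel(compounds: List[str], stages: List[Tuple[Scorer, int]]) -> List[str]:
    """Pass compounds through successive (scorer, keep_n) stages.

    Each stage sorts the survivors by its scorer (lower is better) and
    keeps the top keep_n for the next, more expensive stage.
    """
    survivors = list(compounds)
    for scorer, keep_n in stages:
        survivors = sorted(survivors, key=scorer)[:keep_n]
    return survivors


# Toy scores standing in for a fast docking score and a slower rescoring step.
fast_scores = {"a": -5.0, "b": -7.0, "c": -1.0, "d": -6.5}
slow_scores = {"a": -9.0, "b": -6.0, "c": -2.0, "d": -8.0}

hits = run_funnel(
    ["a", "b", "c", "d"],
    [(fast_scores.__getitem__, 3), (slow_scores.__getitem__, 2)],
)
```

In this toy case the cheap stage discards only the weakest compound, and the expensive stage is then applied to three compounds rather than four; the saving grows with the size of the library.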
  • 61. 42 Figure 1.08 The Virtual Screening Pipeline.
  • 62. 43 1.6.2 The Target Target selection is the first step to any structure-based drug discovery project. Several requirements must be met. The first involves the target’s druggability (Hajduk et al., 2005). The second involves the availability and choice of the 3D structure used for the screening. X-ray crystallography or NMR structures are the preferred choices though VS projects have been successfully run on homology models as well (Evers and Klabunde, 2005). Since the majority of VS software has limited considerations with regards to target flexibility, the choice of structure should be aimed towards one where the conformation of the binding site is akin to that expected when bound to a small molecule (Sousa et al., 2006). Following the careful selection of target and structure, preparation of the target structure is another important task in the VS preparation steps. The primary consideration is in the assignment of proper protonation states to active-site residues. Difficulties arise due to the effects of local electrostatic conditions on the pKa values of side-chain functional groups. With respect to the success of VS, proper assignment of side-chain protonation states is crucial in providing an accurate representation of the binding-site characteristics. A few alternatives exist which integrate the electrostatic effects in assessing the protonation states of side-chain functional groups. One example is the H++ server which predicts the protonation states of amino-acid side chain functional groups within the continuum electrostatic framework (Gordon et al., 2005).
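The sensitivity of protonation states to local pKa shifts can be illustrated with the Henderson-Hasselbalch relation. The sketch below is not what the H++ server does (H++ computes the shifted pKa values from continuum electrostatics); it only shows how a given pKa translates into a protonated fraction at a chosen pH. The pKa values used are illustrative assumptions, not results from this work.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable group in its protonated form.

    Follows from the Henderson-Hasselbalch equation for HA <-> A- + H+:
    [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# A group with an unperturbed pKa of 6.0 is mostly deprotonated at pH 7 ...
unshifted = protonated_fraction(pka=6.0, ph=7.0)

# ... but if the local environment raises its pKa to 8.0, it is mostly protonated.
shifted = protonated_fraction(pka=8.0, ph=7.0)
```

A shift of two pKa units is enough to invert the dominant state, which is why the protonation state assigned to an active-site residue can change the charge distribution presented to the ligand.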
  • 63. 44 1.6.3 The Compound Database A database should first provide optimal structural diversity so as to maximize chances of finding numerous scaffolds displaying activity against the target. Generally, compounds should also adhere to Lipinski’s rule of five (Lipinski et al., 2001). Several small molecule databases exist which are routinely used for VS. These include the ZINC library, the National Cancer Institute compound database, and the Accelrys Available Chemical Directory and MDDR libraries (Milne et al., 1994; Irwin and Shoichet, 2005). Most major pharmaceutical companies also have in-house corporate libraries. 1.6.4 The Docking Protocol The docking protocol is at the core of every VS pipeline. Docking algorithms attempt to predict the structure of the protein-ligand complex as a first, preliminary filter. The docking must therefore be fast, as an extremely large number of compounds must be evaluated. While the docking pipeline may not provide absolute accuracy with regards to selecting all true-positive compounds, it must be robust enough not to discard moderate to strong binders as false-negatives across a variety of targets. This preliminary docking step is typically composed of a docking algorithm (see Section 1.4.1) and a scoring function. The scoring function used at this step is usually optimized for speed rather than accuracy, and other more extensive and accurate functions are usually used at later stages
  • 64. 45 of the pipeline where a more discriminate assessment of the binding potential of a smaller number of compounds is required. 1.6.5 MD Simulations MD simulations are used as a final refinement of the most promising candidate molecules before selection is made. As such, MD simulations in the order of a few hundred picoseconds to a few nanoseconds are done on the predicted ligand-protein complex. The goal of MD simulations in this setting is to establish the dynamic stability of the complex. This is achieved by careful observation of the interactions between the ligand and target, supported by analysis of the stability of the protein structure and binding mode. Scoring functions such as SIETRAJ and MM-PBSA can also be used on the MD simulations to obtain a better assessment of the potential binding affinity of the compounds (Kollman et al., 2000; Cui et al., 2008). 1.6.6 Conformational Ensembles The VS pipeline described typically considers the target as a rigid entity during the docking process. Since the conformational flexibility of the target is seldom fully considered during such a process, methods that integrate target flexibility through the use of conformational ensembles have proved successful (Bursavich et al., 2002; Osterberg et al., 2002; Barril and Morley, 2005; Amaro et al., 2008). Theoretically, conformational ensembles allow a fully dynamic representation of the target to be presented to the ligand
  • 65. 46 for fit. This is akin to what is thought to occur in solution where a ligand binds to a pre-existing receptor population. The ligand is then exposed to the conformational ensemble of the receptor and may preferentially bind to conformations that occur infrequently in the receptor’s dynamics (Ma et al., 2002; Wong and McCammon, 2003). The result is a shift in the equilibrium population towards that of the preferentially bound conformation (Ma et al., 2002). The “lock and key” model of ligand binding is therefore thought to represent one of the rare conformations within this ensemble, which implies that conformational selection is a driving force in ligand recognition. One of the most prominent examples of using conformational ensembles generated from MD simulations for VS has been implemented as part of the Relaxed Complex Scheme (RCS) (Lin et al., 2002; Lin et al., 2003; Amaro et al., 2008). The RCS combines the advantages of docking with the dynamic conformational sampling that is provided by MD simulations. Through this use of MD simulations, the RCS integrates extensive conformational sampling of the target structure into the VS pipeline. At the core of the RCS is an all-atom MD simulation of the target where the simulation time varies from a few ns to tens of ns (Schames et al., 2004; Amaro et al., 2008; Cheng et al., 2008). With few exceptions, the AutoDock docking program is typically used to carry out the docking and scoring (Morris et al., 2009). Since significant conformational changes to the active site are induced by ligand binding, a ligand-bound structure is usually preferred. The resulting trajectory is then reduced to a computationally tractable ensemble. A number of strategies exist to select a representative subset from the full set of resulting structures where much of the dynamic information of the trajectory remains.
  • 66. 47 RMSD-based clustering is an obvious choice for selection of the most dominant configurations within the trajectory. In their study of avian influenza neuraminidase using the RCS, Cheng et al. applied RMSD clustering on snapshots extracted every 10ps from 40ns trajectories (Cheng et al., 2008). An alternate but equally effective method is that of QR-factorization (O’Donoghue and Luthey-Schulten, 2005). QR-factorization was originally designed for the removal of redundant information from structural databases by identifying a set of structures which represent the evolutionary conformational space of a protein. In their study of the Trypanosoma brucei RNA-editing Ligase 1 (TbRel1), Amaro et al. integrated the use of QR factorization into the RCS in order to extract a representative set of structures from a 20ns trajectory of the target in complex with ATP, its native substrate (Amaro et al., 2007; Amaro et al., 2008). For the QR factorization, snapshots were extracted every 50ps resulting in a set of 400 structures which was reduced to a total of 33. In both cases the RCS proved extremely successful in identifying true binders from the original database. For Cheng et al., the weighted average score from docking into the full representative ensemble of the holo trajectory resulted in the selection of 25 compounds, 10 of which displayed a Ki under 500µM (Cheng et al., 2008). For Amaro et al., ranking of the mean score from docking into the QR representative set resulted in the selection of 10 compounds, 5 of which displayed inhibition at 10µM or better (Amaro et al., 2008).
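The reduction of a trajectory to a representative ensemble, followed by ensemble-averaged scoring, can be sketched as below. This is a simplified illustration under stated assumptions: the snapshots are taken to be already superposed, a greedy leader algorithm stands in for whatever clustering method the cited studies used, and weighting each cluster's docking score by its population is one plausible reading of a "weighted average score", not a description of the published protocols. All names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Coords = Sequence[Sequence[float]]  # one (x, y, z) triple per atom


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two pre-aligned coordinate sets with identical atom order."""
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def leader_cluster(snapshots: List[Coords], cutoff: float) -> List[List[int]]:
    """Greedy leader clustering of snapshot indices.

    A snapshot joins the first cluster whose leader (its first member)
    lies within `cutoff`; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for index, snapshot in enumerate(snapshots):
        for members in clusters:
            if rmsd(snapshot, snapshots[members[0]]) <= cutoff:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def weighted_score(clusters: List[List[int]], leader_scores: List[float]) -> float:
    """Population-weighted mean of the docking scores obtained per cluster leader."""
    total = sum(len(members) for members in clusters)
    return sum(len(m) * s for m, s in zip(clusters, leader_scores)) / total


# Four single-atom "snapshots": three similar ones and one outlier.
toy_snapshots = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(5.0, 0.0, 0.0)], [(0.2, 0.0, 0.0)]]
toy_clusters = leader_cluster(toy_snapshots, cutoff=1.0)
toy_score = weighted_score(toy_clusters, leader_scores=[-8.0, -4.0])
```

Docking is then needed only against one structure per cluster, while the population weights retain some of the information about how often each conformation was visited in the trajectory.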
  • 67. 48 1.7 Successes of CADD Computer-aided drug design techniques have now become a core component of modern drug discovery and development pipelines (Jorgensen, 2004). One of the most prominent successes of rational, structure-based drug design is that of imatinib (Gleevec®), a tyrosine kinase inhibitor for the treatment of Chronic Myelogenous Leukemia (CML) (Capdeville et al., 2002). Early drug discovery programs for the treatment of cancer largely focused on inhibition of DNA synthesis and cell division through the use of antimetabolites (nucleoside analogs and antifolates), alkylating agents (classical and newer platinum-based therapeutics), microtubule destabilizers (vinca alkaloids) and microtubule stabilizing agents (taxanes) (Scott, 1970; Scagliotti and Selvaggi, 2006; Zhou and Giannakakou, 2005). The uncovering of the bcr-abl reciprocal translocation as the pathogenic event in CML established it as an attractive drug target (Kelliher et al., 1990). Docking studies and X-ray crystallography established the binding of Gleevec with high-affinity to the inactive form of the ATP-binding pocket (Schindler et al., 2000; Zimmerman et al., 2001). Additionally, SBDD allowed for the analysis of mutations in the enzyme which give rise to imatinib resistance. This provided an opportunity for the design of novel pharmaceuticals that are effective in overcoming imatinib-resistance (Weisberg et al., 2007). The first marketed drug whose development was assisted by SBDD was captopril (Capoten®), an angiotensin-converting enzyme (ACE) inhibitor used for the treatment of hypertension (Cushman et al., 1977). Early on in the developmental stages a peptidic lead
  • 68. 49 compound had been identified from a snake venom. However, structural information as to the binding site of ACE was lacking. This led scientists at Squibb to use the structure of another zinc protease, the recently crystallized carboxypeptidase A, to model the binding site of ACE. The modeling led to the development of captopril, the first successful design based on a molecular model. The structural determination of ACE came about in 2002, where it was determined that the binding site of ACE differed significantly from carboxypeptidase A, leading to the development of newer, more targeted ACE inhibitors (Natesh et al., 2003). The success of CADD in properly assessing the potential binding of a compound to a target is directly related to our ability to correctly predict the binding affinity of a small molecule. Many of the limitations of current in silico pipelines stem from the difficulties in properly and reliably predicting the binding of small molecules to a target (Michel and Essex, 2010).
  • 69. 50 Chapter 2 Molecular Dynamics Study of Small Molecule Inhibitors of the Bcl-2 Family
  • 70. 51 Preface The contents presented in the following chapter have been published as presented: Acoca S, Cui Q, Shore GC, Purisima EO. 2011. Molecular dynamics study of small molecule inhibitors of the Bcl-2 family. Proteins. 79(9):2624-36.
  • 71. 52 2.1 Rationale Molecular modeling techniques have taken an important role in drug development. This is especially true of molecular simulations and scoring functions which provide useful insights for the optimization of lead compounds. Obatoclax and ABT-737 are two novel Bcl-2 inhibitors which have different selectivity profiles for antiapoptotic Bcl-2 members. While numerous studies have examined the selectivity of BH3 domains for Bcl-2 members, few have provided conclusive evidence as to the selectivity of ABT-737. With regards to Obatoclax, lack of structural data on its binding mode has also left many questions unanswered as to how it mediates its inhibition of Bcl-2 members. This study therefore aimed to provide the grounds on which the selectivity of both ABT-737 and Obatoclax could be understood while identifying the most probable binding mode of Obatoclax. 2.2 Abstract We carried out docking and molecular dynamics simulations on ABT-737 and Obatoclax, which are inhibitors of the Bcl-2 family of proteins. We modeled the binding mode of ABT-737 with Bcl-XL, Bcl-2, and Mcl-1 and examined their dynamical behavior. We found that the binding of the chlorobiphenyl end of ABT-737 was quite stable across all three proteins. However, the phenylpiperazine linker group was dramatically more mobile in Mcl-1 compared to either Bcl-XL or Bcl-2. The S-phenyl group at the p4 binding site was well-anchored in Bcl-XL and Bcl-2 but was somewhat
  • 72. 53 more mobile in Mcl-1 although the phenyl ring itself on average stayed close to the p4 binding site in Mcl-1. This greater mobility is likely due to the greater openness of the p3 and p4 binding sites on Mcl-1. The calculated binding free energies were consistent with the much weaker binding affinity of ABT-737 for Mcl-1. Obatoclax was predicted to bind at the p1 and p2 binding sites of Mcl-1 and the binding mode was quite stable during the molecular dynamics simulation with Mcl-1 wrapping around the molecule. The modeled binding mode suggests that Obatoclax is able to inhibit all three proteins because it makes use of the p1 and p2 binding sites alone, which form a fairly narrow groove in all three proteins unlike the p4 binding site, which is much broader in Mcl-1. 2.3 Introduction Cancer is fundamentally a disease of dynamic changes in the genome. It has been described as a multistep process culminating in the acquisition of six essential alterations in cellular physiology (Hanahan and Weinberg, 2000). Dysregulation of the apoptotic process has been recognized as one of these critical alterations required for progression to the disease phenotype (Hanahan and Weinberg, 2000). As such, research into the regulation of apoptosis has bloomed in the past decade, directed towards a better understanding of the extensive network of protein interactions that regulate it and the potential targets that can be used to activate it.
  • 73. 54 At its core, apoptosis is the mechanism responsible for the careful synchrony of cellular death observed throughout development, the maintenance of homeostasis and proper immune function (Krammer et al., 1994; Meier et al., 2000; Elmore, 2007). There are two pathways (intrinsic and extrinsic) which converge towards activation of the apoptotic machinery. The extrinsic pathway is characterized by activation of members of the death receptor family (Ashkenazi and Dixit, 1998). Death receptors, which belong to the tumor necrosis factor (TNF) receptor superfamily, are surface transmembrane receptors engaged by binding of extracellular “death ligands” such as FasL and TNF (Ashkenazi and Dixit, 1998). Activation of these receptors leads to the formation of the death-inducing signaling complex (DISC), which mediates the activation of initiator caspases thereby committing the cell to apoptotic death (Bao and Shi, 2007). On the other hand, the intrinsic (mitochondrial) apoptotic pathway is triggered mainly by non-receptor stimuli. It is unique in its ability to initiate apoptosis in response to DNA damage, cytotoxic stress and cytokine deprivation though it can be engaged by the extrinsic pathway as well (Brenner and Mak, 2009). In response to apoptotic stimuli, the intrinsic pathway triggers the permeabilization of the outer mitochondrial membrane (OMM). This permeabilization releases Cytochrome C and other molecules residing within the mitochondrial intermembrane space (IMS) into the cytosol, resulting in the formation of the apoptosome (a complex of Cytochrome C, APAF-1 and pro-caspase 9) and activation of the caspase cascade through caspase 9 (Ow et al., 2008). At the heart of the intrinsic pathway lies the Bcl-2 family of apoptotic proteins. Known as the “Gatekeepers of Mitochondrial Apoptosis”, the Bcl-2 family of proteins
  • 74. 55 are unique in their role of regulating mitochondrial outer membrane integrity in response to death stimuli (Adams and Cory, 2007). Through heterodimerization, anti-apoptotic members can neutralize the effects of pro-apoptotic members, the relative balance of which acts as a regulating switch for initiating mitochondrial apoptosis (Oltersdorf et al., 2005). The Bcl-2 family is composed of three groups of proteins distinguished through functional and structural features. The antiapoptotic members (consisting of Bcl-2, Bcl-XL, Bcl-B, Bcl-W, Mcl-1 and A1) share three to four α-helical regions of high sequence similarity known as the Bcl-2 Homology (BH) domains (Petros et al., 2004; Adams and Cory, 2007). Bcl-2 pro-survival proteins inhibit the pro-apoptotic members in part by sequestering the amphiphilic BH3 helix of the pro-apoptotic members within a long surface exposed groove. Because the Bcl-2 survival members promote cell survival in cancer cell lines, they are recognized as a highly relevant target for the treatment of cancer. They are also implicated in general resistance to chemotherapeutic agents along with a more aggressive malignant phenotype (Minn et al., 1995; Simonian et al., 1997; Amundson et al., 2000). Bcl-2 inhibitors show promise as cancer therapeutics, especially when used in combination therapy (Oltersdorf et al., 2005; Nguyen et al., 2007; Lessene et al., 2008; Tse et al., 2008; Ackler et al., 2010). One promising agent is the orally bioavailable compound ABT-263 (navitoclax); ABT-737 (Figure 2.01) is an analog that is widely used in preclinical studies as a tool compound (Oltersdorf et al., 2005; Tse et al., 2008). These compounds were developed using the SAR by NMR methodology and employing stable protein fragments for optimal NMR study. They display subnanomolar affinity for such recombinant fragments of Bcl-2,
  • 75. 56 Bcl-XL, and Bcl-W but > 1µM for Mcl-1 (Shuker et al., 1996; Oltersdorf et al., 2005; Tse et al., 2008). As predicted by its affinity profile, ABT-737 as well as navitoclax exhibits limited efficacy in cells where Mcl-1 is expressed (Konopleva et al., 2006; van Delft et al., 2006; Chen et al., 2007; Tse et al., 2008). Consequently, this selectivity is one of the key aspects of navitoclax that may limit its chemotherapeutic utility. Several studies have addressed the selectivity of BH3 peptides for members of the Bcl-2 family; however, extrapolation to explaining and modifying the selectivity of ABT-737/navitoclax has not been straightforward (Lee et al., 2008; Lee et al., 2009; Fire et al., 2010). Furthermore, the Bcl-2 pro-survival proteins are anchored in the mitochondrial outer membrane where they in fact undergo conformational changes and greater penetration into the lipid bilayer in response to stress stimuli (Shore et al., 2008). Thus despite the high affinity binding of these compounds to soluble recombinant protein fragments in aqueous buffers in vitro, it is not clear how this translates to the efficacy of binding in intact cells. Figure 2.01 ABT-737 chemical structure.
  • 76. 57 A second Bcl-2 inhibitor currently in Phase I & II trials is obatoclax (GX15-070), a hydrophobic cycloprodigiosin derivative developed by Gemin X Pharmaceuticals (Nguyen et al., 2007). Obatoclax (Figure 2.02) was found to inhibit the binding of BH3 peptides to recombinant fragments of all pro-survival members of the Bcl-2 family with low micromolar affinity employing fluorescence polarization assays, but its key property lies in its ability to potently overcome Mcl-1 mediated resistance to chemotherapeutic agents (Zhai et al., 2006; Nguyen et al., 2007; Perez-Galan et al., 2007). Indeed, in assays employing native Mcl-1 in intact mitochondrial outer membrane, 10 nM obatoclax reverses the constitutive interaction between Mcl-1 and pro-apoptotic Bak. Hence, an understanding of its binding mode to Mcl-1 is of particular interest. Figure 2.02 Obatoclax chemical structure. In this chapter we present an extensive analysis of molecular dynamics simulations performed for obatoclax/Mcl-1 and ABT-737 complexes. The aim of the current study is to rationalize the binding specificity of ABT-737 and to predict the binding mode of obatoclax to Mcl-1, for which an experimentally determined three-dimensional structure of the complex has proven to be elusive.
  • 77. 58 2.4 Methods 2.4.1 Structure Preparation The starting structures for the docking and molecular dynamics simulation experiments of Bcl-2, Bcl-XL and Mcl-1 complexes were taken from the Protein Data Bank (Codes 1YSW, 2YXJ, and 2PQK respectively). All bound ligands (small molecules and BH3 peptides), waters, ions and other molecules were removed from the complexes, except for Bcl-XL for which we kept the ABT-737 ligand. Missing side chains, terminal residues and hydrogen atoms were added using Sybyl 8.0 (Tripos Inc., St. Louis, MO) and XLeap in AMBER (Case et al., 2005). Protonation states were assigned using the H++ server (Gordon et al., 2005). All assigned protonation states were visually inspected in Sybyl 8.0 and adjusted as needed. 2.4.2 Force field parameters The FF99SB force field in the AMBER suite of programs was used for the protein atoms. The antechamber module of AmberTools was used to assign GAFF parameters for obatoclax and ABT-737 (Wang et al., 2004; Case et al., 2005; Hornak et al., 2006). In the case of ABT-737, we applied the biphenyl parameters of Athri and Wilson (Athri and Wilson, 2009). Partial charges for the inhibitors were obtained using RESP with 6-31G* electrostatic potentials calculated using GAMESS (Bayly et al., 1993; Schmidt et al., 1993).
  • 78. 59 The sulfonamide group in ABT-737 has an imide-like bond (see Figure 2.01) that is not well-represented by the default GAFF parameters. Hence, we derived force field torsional parameters for the S–N bond using a model compound with a phenyl ring on either side of the SO2NHCO group. The covalent geometry was taken from the Cambridge Structural Database (CSD) entry CEKHIJ (Allen FH, 2002). A torsional energy profile around the S–N bond was generated using GAMESS at an MP2/6-31G* level of theory. A truncated Fourier series was fitted to the residual torsional energy profile after subtracting out the calculated AMBER energy. The resulting coefficients are listed in Table S1 (Supplementary Materials). 2.4.3 Docking For Bcl-XL the deposited crystal structure of the complex (PDB 2YXJ) was used directly as the starting point for our calculations. For Bcl-2, the initial docked pose of ABT-737 was obtained by superposing the Bcl-2 structure with Bcl-XL and extracting and merging the inhibitor coordinates in the Bcl-XL structure into the Bcl-2 structure. The same procedure was carried out for docking ABT-737 into Mcl-1. For Bcl-2 and Mcl-1, direct merging of the inhibitor into the binding site resulted in some side chains being in awkward positions relative to ABT-737. These clashes were initially relieved using the Sculpt module of PyMOL (Schrödinger, New York) followed by ligand-restrained energy minimization.
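The torsional fitting step described in Section 2.4.2 (fitting a truncated Fourier series to the residual between the quantum mechanical and force field energy profiles) can be sketched as follows. The source does not state the number of terms, the scan spacing or the fitting procedure, so this sketch assumes a uniform scan over one full period and obtains the coefficients by direct projection, which is equivalent to a least-squares fit on such a grid. The residual profile below is synthetic, and all names are illustrative. In AMBER's torsion form, (V_n/2)[1 + cos(nφ − γ)], each fitted amplitude plays the role of V_n/2.

```python
import math
from typing import List, Tuple

Term = Tuple[float, float]  # (amplitude, phase in degrees)


def fit_fourier(angles_deg: List[float], energies: List[float], n_terms: int) -> Tuple[float, List[Term]]:
    """Fit E(phi) ~ offset + sum_n A_n * cos(n*phi - gamma_n) for n = 1..n_terms.

    Assumes angles on a uniform grid covering one period without repeating
    the end point (e.g. 0, 10, ..., 350) and n_terms below half the grid size.
    """
    n_pts = len(angles_deg)
    offset = sum(energies) / n_pts
    terms: List[Term] = []
    for n in range(1, n_terms + 1):
        a = 2.0 / n_pts * sum(e * math.cos(n * math.radians(p)) for p, e in zip(angles_deg, energies))
        b = 2.0 / n_pts * sum(e * math.sin(n * math.radians(p)) for p, e in zip(angles_deg, energies))
        # a*cos(n*phi) + b*sin(n*phi) = A*cos(n*phi - gamma)
        terms.append((math.hypot(a, b), math.degrees(math.atan2(b, a))))
    return offset, terms


def evaluate(offset: float, terms: List[Term], angle_deg: float) -> float:
    """Evaluate the fitted series at a single dihedral angle."""
    phi = math.radians(angle_deg)
    return offset + sum(
        amp * math.cos(n * phi - math.radians(phase))
        for n, (amp, phase) in enumerate(terms, start=1)
    )


def synthetic_residual(angle_deg: float) -> float:
    """Stand-in for (QM energy - force field energy) at a given dihedral."""
    phi = math.radians(angle_deg)
    return 1.5 + 2.0 * math.cos(2 * phi - math.pi) + 0.5 * math.cos(3 * phi)


scan = [10.0 * k for k in range(36)]
offset, terms = fit_fourier(scan, [synthetic_residual(p) for p in scan], n_terms=3)
```

With real data the residual would not be reproduced exactly, and the number of terms retained becomes a trade-off between fit quality and the risk of fitting noise in the scan.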
  • 79. 60 Docking of obatoclax into Mcl-1 was carried out using an in-house docking program (manuscript in preparation) that does an exhaustive rigid body docking (translation and rotation) of the ligand on a grid. A rectangular box enclosing the entire binding groove defined the search region. We used a grid spacing of 0.5 Å and rigid body rotational angular increments corresponding to atomic displacements of 0.5 Å. Poses were scored using a weighted combination of van der Waals, coulomb, surface area, shape complementarity and hydrogen bonding terms. The weights were previously calibrated to reproduce binding poses of a training set of protein-ligand complexes. OMEGA (OpenEye Scientific Software, New Mexico) was used to generate conformers for the ligand used in the rigid docking. The protein was kept fixed during the docking. The top-scoring pose was used for the MD simulation. 2.4.4 Molecular Dynamics Simulations Each system was immersed in a truncated octahedral TIP3P water box (Jorgensen et al., 1983). The distance between the wall of the box and the closest atom of the solute was 12 Å. Sodium or chloride counterions were added as required to maintain electroneutrality of the system. Molecular dynamics (MD) simulations were carried out using the AMBER program. A 2 fs time step and a 9 Å non-bonded cutoff were used. SHAKE was employed to constrain bond lengths of bonds to hydrogen atoms and the Particle Mesh Ewald algorithm was used to treat long-range electrostatics (Ryckaert et al., 1977; Cheatham et al., 1995).