1.
BORREL Alexandre
PockDrug: a new pocket druggability
prediction model able to overcome pocket
estimation uncertainties
ISB Winter School 2014 - Levi
24-11-2014
2. What is druggability?
Definitions
« The viability of a drug target depends on two components: biological relevance and chemical tractability.
The concept of druggability was introduced to describe the second component, and it is defined as the
ability of a target to bind a drug-like molecule with a therapeutically useful level of affinity. »
Perola et al. 2012 Journal of Chemical Information and Modeling, 52(4), 1027–1038.
« The term ‘druggability’ usually refers to the likelihood of finding orally bioavailable small molecules
that bind to a particular target in a disease-modifying way »
Edfeldt et al. 2011 Drug Discovery Today, 16(7-8), 284–7.
Figure: orally bioavailable small molecules binding to a target.
3. What are drug-like molecules?
Drug-like molecule
Orally bioavailable small molecules tend to have properties within certain ranges, e.g. the « Rule of 5 »:
● Mwt <= 500 Da
● LogP <= 5
● H-bond acceptors <= 10
● H-bond donors <= 5
Lipinski et al. 1997 Advanced Drug Delivery Reviews 23 (2-25)
In order to bind such compounds, a protein should have a binding site with complementary properties.
General concept of the model (figure): ligand and pocket
Ligand: Celecoxib (CEL), weight: 381 Da, LogP: 3.9, ...
Pocket (from 1OQ5): volume: 361 Å³, surface: 296 Å², ...
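The Rule of 5 thresholds above can be checked mechanically. Below is a minimal sketch in Python. The `Molecule` class and the function names are hypothetical, and the "at most one violation" tolerance follows the usual reading of Lipinski's rule rather than anything stated on this slide. The example molecules are illustrative values, not data from the presentation.

```python
from dataclasses import dataclass

@dataclass
class Molecule:
    name: str
    mwt: float   # molecular weight, Da
    logp: float  # octanol/water partition coefficient
    hba: int     # H-bond acceptors
    hbd: int     # H-bond donors

def ro5_violations(m: Molecule) -> int:
    """Count how many of the four Rule of 5 thresholds the molecule exceeds."""
    return sum([m.mwt > 500, m.logp > 5, m.hba > 10, m.hbd > 5])

def is_drug_like(m: Molecule, max_violations: int = 1) -> bool:
    """Assumed tolerance: at most one violation is accepted."""
    return ro5_violations(m) <= max_violations

# Hypothetical molecules (illustrative values only).
small = Molecule("compound A", mwt=350.0, logp=2.5, hba=4, hbd=1)
large = Molecule("compound B", mwt=720.0, logp=6.2, hba=12, hbd=3)
print(ro5_violations(small), is_drug_like(small))  # 0 True
print(ro5_violations(large), is_drug_like(large))  # 3 False
```

The same idea, applied to the pocket rather than the ligand, motivates describing binding sites with properties complementary to these ranges.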
4. Importance of assessing druggability
Estimates suggest that around 10–15% of the human genome may be druggable (with a small-molecule
approach) and that there are 600–1,500 potential drug targets.
Adapted from Hopkins, A. L., & Groom, C. R. (2002). Nature Reviews Drug Discovery
Druggability is important to:
- prioritize potential targets
- avoid targets that are unlikely to bind small molecules with high affinity (to optimize experimental
screening)
Brown, D., & Superti-Furga, G. (2003). Drug Discovery Today, 8(23), 1067–1077
Figure: human genome (~30,000 genes), druggable genome, disease-modifying genes (~3,000) and their overlap, the drug targets (~600–1,500).
5. How do you predict pocket druggability?
From target structures, in 3 main steps:
Step 1: Identify cavities or pockets (pocket estimation)
Step 2: Compute pocket properties (polarity, geometry, ...)
Step 3: Apply a statistical model to classify pockets as druggable or less druggable
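The three steps can be chained as a small pipeline. This is a toy sketch in Python, not PockDrug itself. It assumes Step 1 has already been done by an external pocket estimator, reduces each pocket to the names of its lining residues, and uses hypothetical residue classes, weights and a logistic model in place of the real descriptors and LDA models.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical simplification: a pocket is reduced to the residue names lining it.
Pocket = List[str]

HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}  # toy residue classes
POLAR = {"SER", "THR", "ASN", "GLN", "ASP", "GLU", "LYS", "ARG", "HIS"}

def compute_properties(pocket: Pocket) -> Dict[str, float]:
    """Step 2: toy pocket properties (not the PockDrug descriptors)."""
    n = len(pocket)
    return {
        "hydrophobic_fraction": sum(r in HYDROPHOBIC for r in pocket) / n,
        "polar_fraction": sum(r in POLAR for r in pocket) / n,
    }

def apply_model(props: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Step 3: linear score turned into a probability with a logistic function."""
    z = bias + sum(w * props[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

def predict(pockets: List[Pocket], weights, bias, threshold=0.5) -> List[Tuple[float, str]]:
    """Steps 2 and 3 applied to pockets already produced by an estimator (Step 1)."""
    out = []
    for pocket in pockets:
        p = apply_model(compute_properties(pocket), weights, bias)
        out.append((p, "druggable" if p >= threshold else "less druggable"))
    return out

weights = {"hydrophobic_fraction": 6.0, "polar_fraction": -6.0}  # illustrative only
pockets = [["LEU", "ILE", "PHE", "ALA"], ["ASP", "GLU", "LYS", "SER"]]
for prob, label in predict(pockets, weights, bias=0.0):
    print(f"{prob:.3f} {label}")
```

Keeping the three steps as separate functions mirrors the slide: any pocket estimator can feed Step 2, which is what makes it possible to test one model on several estimations.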
6. Dataset
NRDLD (Non-Redundant set of Druggable and Less Druggable binding sites)
71 druggable
44 less druggable
Sources: HTS, NMR screening and PDBBind (experimental data and database screening)
Adapted from Krasowski, A. et al. (2011). Journal of Chemical
Information and Modeling, 51(11), 2829–42.
Widely used by other druggability studies.
Apo (apo protein set)
From the “Druggable Cavity Directory”, 139 apo proteins are extracted.
132 druggable
7 less druggable
7. Estimations
Pocket estimation (step 1)
We used 3 different pocket estimation methods:
prox: uses the ligand information; the pocket is made of the protein atoms close to the ligand (holo pockets only).
fpocket: geometric algorithm based on Voronoi tessellation.
Le Guilloux, V. et al. (2009). BMC Bioinformatics, 10, 168
DoGSite: based on a Difference of Gaussian (DoG) approach, which originates from image processing.
Volkamer, A. et al. (2010). Journal of Chemical Information and Modeling, 50(11), 2041–52.
« ...different pocket detection methods can assign different sizes and/or numbers of pockets for the same structure. »
Gao, M., & Skolnick, J. (2013). Bioinformatics (Oxford, England), 29(5), 597–604.
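The prox definition ("protein atoms close to the ligand") can be sketched as a distance filter. This Python sketch assumes a 4 Å cutoff, which is an assumption here (the "prox4" label on a later backup slide may refer to such a cutoff, but the slides do not say so). Atoms are given as hypothetical `(atom_id, (x, y, z))` tuples. It uses a brute-force search; real structures would normally use a spatial index.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

def prox_pocket(protein_atoms: List[Tuple[str, Point]],
                ligand_atoms: List[Point],
                cutoff: float = 4.0) -> List[str]:
    """Return the ids of protein atoms lying within `cutoff` Å of at least one ligand atom."""
    return [
        atom_id
        for atom_id, xyz in protein_atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms)
    ]

protein = [("A", (0.0, 0.0, 0.0)), ("B", (10.0, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0))]
ligand = [(1.0, 0.0, 0.0)]
print(prox_pocket(protein, ligand))  # ['A', 'C']
```

Because it needs ligand coordinates, this kind of estimation only works on holo structures, which is why fpocket and DoGSite are needed for apo pockets.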
8. Descriptors
Pocket characteristic features (step 2)
A set of 52 descriptors is computed:
- scores of polarity, hydrophobicity, ...
- geometry: distances, volume, shape, ...
- residue composition
Figure: geometric descriptors (e.g. distances d and d', inertia)
Perola et al. 2012 (Chemical information and modeling)
Kyte & Doolittle 1982 (molecular biology)
Petitjean 1996 (Chemical information and modeling)
Pérot et al. 2008 (Drug Discovery Today)
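To make "scores of hydrophobicity" and "residue composition" concrete, here is a Python sketch of two simple residue-based pocket descriptors. It uses the standard Kyte-Doolittle hydropathy values. The exact definitions of the 52 PockDrug descriptors are not given here, so these two functions, and the choice of Phe, Trp and Tyr as aromatic residues, are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values, keyed by three-letter residue code.
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}
AROMATIC = {"PHE", "TRP", "TYR"}  # assumption: His not counted as aromatic here

def hydrophobicity_score(residues):
    """Mean Kyte-Doolittle hydropathy of the residues lining the pocket."""
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)

def aromatic_proportion(residues):
    """Fraction of pocket residues that are aromatic."""
    return sum(r in AROMATIC for r in residues) / len(residues)

pocket = ["ILE", "LEU", "PHE"]
print(round(hydrophobicity_score(pocket), 2), round(aromatic_proportion(pocket), 2))
```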
9. Goal
1. Datasets
- NRDLD: 71 druggable, 44 less druggable
- Apo: 132 druggable, 7 less druggable
- Train: 48 druggable, 26 less druggable
- Test: 23 druggable, 14 less druggable
2. Pocket estimations: prox, fpocket, DoGSite
3. Pocket descriptors: 52 descriptors
10. Goal
A unique generic model applicable to several pocket estimation methods
11. Pocket variability
PCA of the three pocket sets estimated by prox, fpocket and DoGSite, computed from the 52
pocket descriptors.
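A PCA of this kind can be sketched with NumPy: stack the descriptor matrices of the three estimations, standardize each descriptor, and project onto the leading principal components via an SVD. The random matrices below stand in for real descriptor values, and the function name `pca` is hypothetical; this is not the authors' analysis code.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int = 2):
    """PCA on standardized descriptors via SVD.

    X: (n_pockets, n_descriptors). Returns pocket scores on the first
    components and the fraction of variance each component explains.
    Descriptors with zero variance must be removed beforehand."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale each descriptor
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]

# Hypothetical usage: stack the three estimations and keep a label per row.
rng = np.random.default_rng(0)
X_prox, X_fpocket, X_dogsite = (rng.normal(size=(10, 52)) for _ in range(3))
X = np.vstack([X_prox, X_fpocket, X_dogsite])
labels = ["prox"] * 10 + ["fpocket"] * 10 + ["DoGSite"] * 10
scores, explained = pca(X)
print(scores.shape, explained.round(3))
```

Colouring the resulting scores by `labels` (estimation) or by druggability class is what lets one compare estimation variability with druggability on the same plot.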
The same binding site, defined by different estimation methods, has different properties, but ...
The overlap scores are low:
prox-fpocket = 30 % ± 14 %
prox-DoGSite = 28 % ± 14 %
fpocket-DoGSite = 30 % ± 16 %
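The slide does not define the overlap score. As an illustration only, this Python sketch assumes it is the Jaccard index of the atom sets of two estimations of the same site (shared atoms over all atoms), expressed in %, then summarized as mean ± SD over binding sites. The function names and atom sets are hypothetical.

```python
from statistics import mean, stdev

def overlap_score(pocket_a, pocket_b):
    """Hypothetical overlap: shared atoms / all atoms of the two pockets (Jaccard index)."""
    a, b = set(pocket_a), set(pocket_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def summarize(pairs):
    """Mean and standard deviation of the overlap over several binding sites, in %."""
    scores = [100 * overlap_score(a, b) for a, b in pairs]
    return mean(scores), stdev(scores)

# Illustrative atom-id sets for two binding sites, each estimated by two methods.
pairs = [({1, 2, 3, 4}, {3, 4, 5, 6}), ({10, 11, 12}, {12, 13, 14, 15})]
m, sd = summarize(pairs)
print(f"{m:.0f} % ± {sd:.0f} %")
```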
13. Variability VS Druggability
Despite the different estimations, druggable and less druggable pockets are grouped in
different areas.
Overall, the same properties discriminate them regardless of the estimation:
aromaticity, geometry, polarity, hydrophobicity
14. Statistical protocol – Step 3
1. Define the training and test sets for each estimation method; each train pocket is characterized by the 52 pocket descriptors.
15. Statistical protocol – Step 3
2. Compute Linear Discriminant Analysis (LDA) models with n descriptors on the train set:
combinations of n descriptors among 52, starting from n_init = 2.
X LDA models built:
If n = 2, X = 2,652
If n = 3, X = 23,426
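The number of candidate descriptor subsets grows combinatorially with n. A short Python sketch using `math.comb` and `itertools.combinations`; the helper names are hypothetical. For reference, C(52, 2) = 1,326 and C(52, 3) = 22,100, and their sum, 23,426, equals the n = 3 figure above.

```python
from itertools import combinations
from math import comb

N_DESCRIPTORS = 52

def descriptor_subsets(n, p=N_DESCRIPTORS):
    """Yield every combination of n descriptor indices among p (one LDA candidate each)."""
    return combinations(range(p), n)

def count_subsets(n, p=N_DESCRIPTORS):
    """Number of distinct n-descriptor subsets, C(p, n)."""
    return comb(p, n)

for n in (2, 3):
    print(f"n={n}: {count_subsets(n):,} subsets")
# n=2: 1,326 subsets
# n=3: 22,100 subsets
```

This growth is why the protocol starts from small n and only adds a descriptor while performance improves.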
16. Statistical protocol – Step 3
3. Select the best models with a minimal number of descriptors, using the Matthews correlation coefficient (MCC) on the train set and by cross-validation:
- if MCC_loo(n) > MCC_loo(n-1): n = n + 1 and return to step 2;
- else: build the consensus PockDrug model from the best LDA models.
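The selection loop can be sketched as follows. This is a minimal sketch under several assumptions, not the authors' code: a two-class LDA with pooled covariance implemented directly in NumPy, an exhaustive search over n-descriptor subsets scored by leave-one-out (LOO) MCC, and the stopping rule above. All function names are hypothetical, and the final consensus step is not shown.

```python
import itertools
import math
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = druggable)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def fit_lda(X, y):
    """Two-class LDA with a pooled covariance matrix; returns weights w and bias b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((len(X0) - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
              + (len(X1) - 1) * np.atleast_2d(np.cov(X1, rowvar=False))) / (len(X) - 2)
    pooled += 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2 + math.log(len(X1) / len(X0))  # class-prior term
    return w, b

def loo_mcc(X, y):
    """Leave-one-out MCC of the LDA model on the given descriptor columns."""
    pred = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, b = fit_lda(X[keep], y[keep])
        pred[i] = int(X[i] @ w + b > 0)
    return mcc(y, pred)

def best_model(X, y, n):
    """Score every combination of n descriptors and keep the best one."""
    best, best_score = None, -np.inf
    for cols in itertools.combinations(range(X.shape[1]), n):
        score = loo_mcc(X[:, list(cols)], y)
        if score > best_score:
            best, best_score = cols, score
    return best, best_score

def select_descriptors(X, y, n_init=2):
    """Grow n while the LOO MCC of the best n-descriptor model keeps improving."""
    n = n_init
    cols, score = best_model(X, y, n)
    while n < X.shape[1]:
        next_cols, next_score = best_model(X, y, n + 1)
        if next_score <= score:  # MCC_loo(n + 1) does not beat MCC_loo(n): stop
            break
        n, cols, score = n + 1, next_cols, next_score
    return cols, score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))       # 40 toy pockets x 5 toy descriptors
y = np.array([0] * 20 + [1] * 20)  # 1 = druggable
X[y == 1, :2] += 10.0              # descriptors 0 and 1 carry the signal
cols, score = select_descriptors(X, y)
print(cols, round(score, 2))
```

With 52 real descriptors, the exhaustive search at n = 3 means fitting every one of the 22,100 subsets once per left-out pocket, so in practice this loop is the expensive part of the protocol.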
17. Statistical protocol – Step 3
Model built from fpocket pockets:
Set    train   test    LOO
Acc    86 %    86 %    85 %
MCC    0.67    0.71    0.65
MCC is close to 0.70 on the train and test sets and by leave-one-out (LOO) cross-validation.
Can it overcome pocket estimation uncertainties?
Select the model built from fpocket.
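For reference, accuracy (Acc) and the Matthews correlation coefficient (MCC) reported on these slides are standard confusion-matrix metrics, where TP, TN, FP and FN are true/false positives/negatives (MCC is symmetric in the two classes, so either class can be taken as positive). MCC ranges from -1 to 1, with 0 for a prediction no better than chance, which makes it more informative than accuracy when the classes are imbalanced.

```latex
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}
\qquad
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```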
18. Validation – Step 3
1. Test the model on pockets estimated by the other estimation methods (prox-test, DoGSite-test), in addition to fpocket-test.
2. Test the model on the apo pocket set (fpocket-apo, DoGSite-apo).
Figure: protein sets (test, apo) → estimations (prox, fpocket, DoGSite) → pocket descriptors → characterized pocket sets → performances of the consensus PockDrug model
19. Performances
1. Robust across estimations
Consensus PockDrug   prox-test   DoGSite-test   fpocket-test
Acc                  95 %        87 %           87 %
MCC                  0.89        0.73           0.71
Good performance regardless of the pocket estimation method.
20. Performances
2. Comparison with existing druggability scores
       fpocket-score   DoGSiteScorer
Acc    76 %            76 %
MCC    0.51            0.54
PockDrug gains about +0.20 in terms of MCC.
21. Performances
3. Apo pockets
       fpocket-apo   DoGSite-apo
Acc    91 %          94 %
MCC    0.45          0.53
Successful on apo pockets.
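The apo set is strongly imbalanced (132 druggable vs 7 less druggable), so accuracy alone is misleading there. This Python sketch shows it with a hypothetical trivial classifier that calls every pocket druggable: it reaches about 95 % accuracy but an MCC of 0. This classifier is an illustration, not a result from the slides; it is why the MCC values above are the more informative figures.

```python
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when a marginal total is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical trivial classifier on the apo set: every pocket called "druggable"
# (132 druggable, 7 less druggable).
tp, fn, fp, tn = 132, 0, 7, 0
print(f"Acc = {accuracy(tp, tn, fp, fn):.1%}, MCC = {mcc(tp, tn, fp, fn):.2f}")
# Acc = 95.0%, MCC = 0.00
```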
22. Characteristics in PockDrug
4 key properties:
Hydrophobicity ++++
Geometry +++
Contact (H-bond donor-acceptor) ++
Aromaticity +
Figure: correlation radar (prox-fpocket) over hydrophobicity, geometry, aromatic and contact descriptors
PockDrug includes descriptors that are highly correlated between pocket estimations.
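A prox-fpocket correlation of this kind can be computed descriptor by descriptor, assuming the rows of both matrices describe the same binding sites in the same order. This Python sketch uses `numpy.corrcoef` (Pearson correlation). The descriptor names and values are hypothetical, and the slides do not state which correlation coefficient the radar uses.

```python
import numpy as np

def descriptor_correlations(X_a, X_b, names):
    """Pearson correlation, descriptor by descriptor, between two estimations.

    X_a, X_b: arrays of shape (n_sites, n_descriptors) whose rows describe
    the same binding sites in the same order (e.g. prox vs fpocket)."""
    return {
        name: float(np.corrcoef(X_a[:, j], X_b[:, j])[0, 1])
        for j, name in enumerate(names)
    }

rng = np.random.default_rng(1)
X_prox = rng.normal(size=(20, 2))
X_fpocket = np.column_stack([2 * X_prox[:, 0] + 1,   # perfectly correlated descriptor
                             rng.normal(size=20)])   # unrelated descriptor
r = descriptor_correlations(X_prox, X_fpocket, ["hydrophobicity", "aromaticity"])
print(r)
```

Descriptors with a high correlation between estimations are the ones least affected by how the pocket was delimited.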
23. PockDrug results
Acetylcholinesterase complexed with Huprine
Druggability probability: 0.82 +/- 0.09 (figure: geometry, hydrophobicity, aromatic)
Results by pocket: for each pocket, PockDrug gives a druggability probability (mean) with a
confidence index (SD).
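One way to obtain a mean probability with an SD is to combine the probabilities returned by the individual models of a consensus. This Python sketch assumes that reading: the per-model probabilities, the use of the sample SD and the 0.5 threshold are all assumptions, not details given on the slides.

```python
from statistics import mean, stdev

def consensus_prediction(probabilities, threshold=0.5):
    """Combine per-model druggability probabilities for one pocket.

    Returns (mean probability, SD used as a confidence index, label)."""
    m = mean(probabilities)
    sd = stdev(probabilities)  # sample SD; which SD is used is an assumption
    return m, sd, "druggable" if m >= threshold else "less druggable"

# Hypothetical outputs of three models for one pocket (not PockDrug results).
m, sd, label = consensus_prediction([0.60, 0.70, 0.65])
print(f"{m:.2f} +/- {sd:.2f} -> {label}")
```

A small SD means the consensus models agree, so the prediction for that pocket can be trusted more.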
27. Conclusion
Druggability model:
- Efficient
- Less dependent on the pocket estimation method (for the estimations tested)
- Efficient on an apo pocket set
Perspectives:
- Test the model on other estimation methods
- Improve the quality of the pocket characteristic features and propose new pocket descriptors, e.g. pocket solvation, pocket flexibility, …
- Propose a similar protocol for other ligand types, e.g. non-drug-like small molecules, peptides, …
28. Acknowledgments
Supervisors: Pr. Camproux & Dr. Xhaard
Computational Drug Discovery group
Other contributors: Dr. Regad & Dr. Petitjean
Web site: Abi Hussein Hiba, Bécot Jérôme
33. Statistical protocol (2)
Figure: Matthews correlation coefficient (MCC) of the models built on pockets estimated by fpocket
Choose the models with a combination of 3 descriptors.
34. Statistical protocol (3)
2 models are generated and discussed:
- Observed-PockDrug (from pockets estimated by prox4; data not shown)
- Predicted-PockDrug (from pockets estimated by fpocket)
The druggability model is a combination of the best LDA models, i.e. those with:
- the best performances on the other pocket sets
- a minimal number of descriptors
Workflow:
Protein sets: Holo (NRDLD), Apo (Apo139)
Estimators: OPE (prox4); PPE (fpocket, DoGSite)
Pocket sets characterized by the 52 pocket descriptors: prox4-NRDLD, fpocket-NRDLD, DoGSite-NRDLD, fpocket-Apo139, DoGSite-Apo139
Model construction (train set, test set):
- combinations of n descriptors among 52, starting from n_init = 2; X LDA models built (if n = 2, X = 2 652; if n = 3, X = 23 426)
- if MCC_loo(n) > MCC_loo(n-1) and MCC_test(n) > MCC_test(n-1): n = n + 1
- else: build the druggability model from the best of the X LDA models
Validation: best model performances on the test and train sets and by leave-one-out (loo)
35. Dataset
NRDLD (Non-Redundant set of Druggable and Less Druggable binding sites)
71 druggable
44 less druggable
Sources (experimental data and database screening):
- HTS: 43 druggable, 17 non-druggable
- NMR screening: 35 druggable, 37 non-druggable
- PDBBind
Cheng et al. (2007). Nature Biotechnology, 25(1), 71–5
Hajduk et al. (2005). Journal of Medicinal Chemistry, 48(7), 2518–25.
Wang et al. (2005). Journal of Medicinal Chemistry, 48(12), 4111–9.
From Krasowski, A. et al. (2011). Journal of Chemical Information and Modeling, 51(11), 2829–42.
Widely used by other druggability studies.
Apo139
From the DCD (Druggable Cavity Directory) database, 139 apo proteins are extracted.
132 druggable
7 less druggable
36. Druggability predictions: background
« ...different pocket detection
methods can assign different sizes
and/or numbers of pockets for the
same structure. »
Gao, M., & Skolnick, J. (2013). Bioinformatics
(Oxford, England), 29(5), 597–604.
Step 1:
- Many estimation algorithms, based on:
  - energy levels
  - geometric features
  - sequence alignment
Step 2:
- Appropriate descriptor set
Step 3:
- Dataset
- Best statistical protocol
  - machine learning?
  - descriptor selection?
  - validation?