I presented this at the RACI Biomolecular on the Beach conference in December 2011. The talk opens with a correlation inflation teaser, followed by alkane/water logP and SAR/SPR based on relationships between structures. The photograph in the title slide was taken in Asunción.
Molecular design: One step back and two paths forward
1. Molecular Design: One step back and two paths forward
Peter W Kenny (pwk.pub.2008@gmail.com)
2. Some things that are hurting Pharma
• Having to exploit targets that are less well-linked to
human disease
• Inability to predict idiosyncratic toxicity
• Inability to measure free (unbound) physiological
concentrations of drug for remote targets (e.g.
intracellular or within blood brain barrier)
Dans la merde: http://fbdd-lit.blogspot.com/2011/09/dans-la-merde.html
4. Preparation of data sets
Panels: Data set A, Data set B
• Points plotted at constant increment
• Equal numbers of points for each value of x
• Add Normally-distributed noise
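A minimal sketch of how two such data sets could be prepared. The slide does not give the underlying trend, the range of x, or the noise level, so the trend y = x, the range 0 to 10 and the noise standard deviation of 10 are all assumptions, as is the assignment of "equal numbers of points for each value of x" to data set A and "constant increment" to data set B. The point counts (11000 and 10000) follow the N values quoted on the later slides.

```python
import random

def make_dataset_a(n_per_x=1000, noise_sd=10.0, seed=0):
    """Assumed layout: x = 0, 1, ..., 10 with the same number of points at each x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for x in range(11):
        for _ in range(n_per_x):
            xs.append(float(x))
            # Assumed trend y = x, plus Normally-distributed noise
            ys.append(x + rng.gauss(0.0, noise_sd))
    return xs, ys

def make_dataset_b(n=10000, noise_sd=10.0, seed=1):
    """Assumed layout: n values of x spaced at a constant increment on [0, 10]."""
    rng = random.Random(seed)
    xs = [10.0 * i / (n - 1) for i in range(n)]
    ys = [x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

xa, ya = make_dataset_a()
xb, yb = make_dataset_b()
print(len(xa), len(xb))
```

Seeding the generators keeps the sketch reproducible; any other trend or noise level could be substituted.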
5. Data set A: Fit median value of Y to X
r2 = 0.99; RMSE = 0.36
An example of this approach to plotting data can be seen in Leeson & Springthorpe, The influence of
drug-like concepts on decision-making in medicinal chemistry. Nat. Rev. Drug Discov. 2007, 6, 881-890.
6. Data set B: Use value of X to split into three equally-sized groups (Low, Medium, High)
and show mean and associated confidence interval for each
An example of this approach to analysing data can be seen in: Gleeson, Generation of a
Set of Simple, Interpretable ADMET Rules of Thumb. J. Med. Chem. 2008, 51, 817-834.
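The binning procedure described on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the analysis used for the slide: the data (y = x plus Normal noise with standard deviation 10) are hypothetical, and the confidence interval uses a normal approximation (1.96 standard errors). Reporting the standard deviation next to the interval shows how much narrower the interval for the mean is than the spread of the data.

```python
import math
import random
import statistics

def binned_summary(xs, ys, n_bins=3, z=1.96):
    """Sort by x, split into (near) equally-sized groups, summarise y per group."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    rows = []
    for i in range(n_bins):
        start = i * size
        # The last group absorbs any remainder
        stop = (i + 1) * size if i < n_bins - 1 else len(pairs)
        group = [y for _, y in pairs[start:stop]]
        sd = statistics.stdev(group)
        rows.append({
            "n": len(group),
            "mean": statistics.fmean(group),
            "sd": sd,
            # Normal-approximation 95% confidence interval for the mean
            "ci_half_width": z * sd / math.sqrt(len(group)),
        })
    return rows

# Hypothetical data: assumed trend y = x with noise SD of 10
rng = random.Random(1)
xs = [10.0 * i / 9999 for i in range(10000)]
ys = [x + rng.gauss(0.0, 10.0) for x in xs]

rows = binned_summary(xs, ys)
for label, row in zip(("Low", "Medium", "High"), rows):
    print(label, row["n"], round(row["mean"], 2),
          round(row["ci_half_width"], 2), round(row["sd"], 2))
```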
7. What data set A really looks like
Fit to original data
N=11000; r2 = 0.09 ; RMSE = 9.95
Fit to transformed data
N=11; r2 = 0.99 ; RMSE = 0.36
Percentile plot showing 10%, 25%, 50%, 75% and 90% percentiles (see Colclough et al, BMC 2008, 16, 6611-6616)
Residual plot for fit to original data
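The contrast between the fit to original data and the fit to transformed data can be reproduced in a few lines. This is a hedged sketch, not the calculation behind the slide: the trend y = x, the x values 0 to 10 and the noise standard deviation of 10 are assumptions, and RMSE is taken here as the root mean squared residual of an ordinary least-squares line.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (r2, rmse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / syy, (ss_res / n) ** 0.5

# Hypothetical data: 1000 noisy points at each of 11 x values
rng = random.Random(0)
x_values = list(range(11))
groups = {x: [x + rng.gauss(0.0, 10.0) for _ in range(1000)] for x in x_values}

raw_x = [float(x) for x in x_values for _ in groups[x]]
raw_y = [y for x in x_values for y in groups[x]]
med_y = [statistics.median(groups[x]) for x in x_values]

r2_raw, rmse_raw = fit_line(raw_x, raw_y)
r2_med, rmse_med = fit_line([float(x) for x in x_values], med_y)
print("original data: N=%d r2=%.2f RMSE=%.2f" % (len(raw_x), r2_raw, rmse_raw))
print("medians only:  N=%d r2=%.2f RMSE=%.2f" % (len(med_y), r2_med, rmse_med))
```

Replacing each group by its median removes almost all of the variance before the fit, so the correlation for the 11 medians says little about how well Y can be predicted from X for an individual point.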
8. What data set B really looks like
Fit to original data
N=10000; r2 = 0.08; RMSE = 10.0
Residual plot for fit to original data
Mean values of Y and (barely visible) confidence intervals shown with
standard deviations for the Low, Medium and High groups of X
14. Differences in octanol/water and alkane/water logP values
reflect hydrogen bonding between solute and octanol
logPoct = 2.1; logPalk = 1.9; ΔlogP = 0.2
logPoct = 1.5; logPalk = -0.8; ΔlogP = 2.3
logPoct = 2.5; logPalk = -1.8; ΔlogP = 4.3
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
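The three differences quoted on this slide are simply logPoct minus logPalk, which the short check below reproduces from the slide's own values. The function name is illustrative only.

```python
# (logPoct, logPalk) pairs as quoted on the slide
compounds = [(2.1, 1.9), (1.5, -0.8), (2.5, -1.8)]

def delta_logp(logp_oct, logp_alk):
    """Difference between octanol/water and alkane/water logP."""
    return logp_oct - logp_alk

# Round to one decimal place, matching the precision on the slide
deltas = [round(delta_logp(o, a), 1) for o, a in compounds]
print(deltas)
```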
15. Polar Surface Area is not predictive of
hydrogen bond strength
ΔlogP = 0.5; PSA/Å² = 48
ΔlogP = 4.3; PSA/Å² = 22
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
16. Measured values of ΔlogP
1.0 1.1 0.8 1.3 1.7 0.8 1.5 1.6 1.1
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
20. Difficulties in measuring logPalk:
Many compounds poorly soluble in alkanes
Self-association masks polarity
21. Alkane/water partition coefficients: Where next?
• General access to logPalk likely to require predictive models for some time
• Carefully measure logPalk for structurally diverse compounds
• Solvation models: logPalk easier to measure than ΔG(g→aq)
23. (Descriptor-based) QSAR/QSPR:
Some questions
• How valid is methodology (especially for validation)
when distribution of compounds in training/test space
is highly non-uniform?
• Are models predicting activity or locating neighbours?
• Are ‘global’ models ensembles of local models?
• How well do the methods handle ‘activity cliffs’?
• How should we account for sizes of descriptor pools
when comparing models?
24. Measures of Diversity & Coverage
2-Dimensional representation of chemical space is used here to illustrate concepts of diversity
and coverage. Stars indicate compounds selected to sample this region of chemical space.
In this representation, similar compounds are close together.
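One way to put numbers on these two ideas in a 2-dimensional space is sketched below. The slide does not define either measure, so both definitions are assumptions chosen for illustration: diversity as the smallest distance between any two selected compounds, and coverage as the largest distance from any compound to its nearest selected compound. The coordinates are hypothetical.

```python
import math

def diversity(selected):
    """Smallest distance between any two selected compounds (larger = more diverse)."""
    return min(math.dist(p, q)
               for i, p in enumerate(selected)
               for q in selected[i + 1:])

def coverage_radius(compounds, selected):
    """Largest distance from any compound to its nearest selected compound
    (smaller = better coverage)."""
    return max(min(math.dist(c, s) for s in selected) for c in compounds)

# Hypothetical chemical space with two clusters of three compounds
compounds = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
clustered = [(0, 0), (1, 0)]   # both picks from the same cluster
spread = [(0, 0), (5, 5)]      # one pick from each cluster

for name, picks in (("clustered", clustered), ("spread", spread)):
    print(name, round(diversity(picks), 2),
          round(coverage_radius(compounds, picks), 2))
```

Under these definitions the spread selection is both more diverse and covers the space better, since no compound is far from a selected neighbour.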
26. Examples of relationships between structures
Tanimoto coefficient (foyfi) for structures is 0.90
Ester is methyl-substituted acid
Amides are ‘reversed’
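For reference, the Tanimoto coefficient between two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is generic and does not reproduce foyfi fingerprints; the bit sets are hypothetical and chosen only so that the result matches the value quoted on the slide.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for two fingerprints given as collections of set bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical fingerprints: nine shared bits, one bit unique to the first
fp_a = set(range(10))
fp_b = set(range(9))
print(tanimoto(fp_a, fp_b))
```

A high coefficient says that two structures share most features, but not how they are related, which is the gap that explicit relationships between structures are meant to fill.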
27. Leatherface molecular editor
From chain saw to Matched Molecular Pairs
c-[A;!R]
bnd 1 2
c-Br
cul 2
hyd 1 1
[nX2]1c([OH])cccc1
hyd 1 1
hyd 3 -1
bnd 2 3 2
Kenny & Sadowski, Structure modification in chemical databases, Methods and Principles in Medicinal
Chemistry (Chemoinformatics in Drug Discovery) 2005, 23, 271-285.
28. Glycogen Phosphorylase inhibitors:
Series comparison
ΔpIC50         ΔlogFu         ΔlogS
0.38 (0.06)    -0.30 (0.06)   -0.29 (0.13)
0.21 (0.06)    0.13 (0.04)    0.20 (0.09)
0.29 (0.07)    -0.42 (0.08)   -0.62 (0.13)
Standard errors in mean values shown in parentheses; see Birch et al, BMCL 2009, 19, 850-853
29. Effect of bioisosteric replacement
on plasma protein binding
Date of Analysis   N    ΔlogFu   SE     SD     %increase
2003               7    -0.64    0.09   0.23   0
2008               12   -0.60    0.06   0.20   0
Mining the PPB database for carboxylate/tetrazole pairs suggested that bioisosteric
replacement would lead to a decrease in Fu, so the tetrazoles were not synthesised.
Birch et al, BMCL 2009, 19, 850-853
30. Effect of amide N-methylation on aqueous solubility
is dependent on substructural context
Amide                       N     ΔlogS   SE     SD     %Increase
Acyclic (aliphatic amine)   109   0.59    0.07   0.71   76
Cyclic                      9     0.18    0.15   0.47   44
Benzanilides                9     1.49    0.25   0.76   100
Birch et al, BMCL 2009, 19, 850-853
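The columns in these matched molecular pair tables can be computed from a list of property differences, one per pair. The sketch below is illustrative: the example differences are hypothetical, and reading %increase as the percentage of pairs for which the property goes up is an assumption. The final loop checks that the SE values in the amide table agree, to within rounding, with SD divided by the square root of N.

```python
import math
import statistics

def mmp_summary(deltas):
    """Summarise property differences for a set of matched molecular pairs."""
    n = len(deltas)
    sd = statistics.stdev(deltas)
    return {
        "n": n,
        "mean": statistics.fmean(deltas),
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        # Assumed meaning of %increase: share of pairs with a positive change
        "pct_increase": 100.0 * sum(1 for d in deltas if d > 0) / n,
    }

# Hypothetical differences in logS for five matched pairs
example = [0.9, 1.4, -0.2, 0.6, 0.3]
summary = mmp_summary(example)
print(summary)

# (amide class, N, SD, SE) as quoted in the table
table = [
    ("Acyclic (aliphatic amine)", 109, 0.71, 0.07),
    ("Cyclic", 9, 0.47, 0.15),
    ("Benzanilides", 9, 0.76, 0.25),
]
for name, n, sd, se in table:
    print(name, "SD/sqrt(N) =", round(sd / math.sqrt(n), 3), "quoted SE =", se)
```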
31. Relationships between structures
• Discover new bioisosteres
• Prediction of activity & properties
  • Direct prediction (e.g. look up substituent effects)
  • Indirect prediction (e.g. apply correction to existing model)
• Recognise extreme data
  • Bad measurement or interesting effect?
32. Conclusions
• Data can be massaged and correlations can
be enhanced but it won’t extract us from ‘la
merde’
• There is life beyond octanol/water if we
choose to look for it
• Even molecules can have meaningful
relationships
33. Selected references
• Seiler (1974) Interconversion of lipophilicities from hydrocarbon/water systems into the octanol/water
system. Eur. J. Med. Chem. 9, 473–479.
• Toulmin, Wood & Kenny (2008) Toward Prediction of Alkane/Water Partition Coefficients. J. Med. Chem.
51, 3720-3730. http://dx.doi.org/10.1021/jm701549s
• Kenny & Sadowski (2005) Structure modification in chemical databases. Methods and Principles in
Medicinal Chemistry 23 (Chemoinformatics in Drug Discovery), 271-285.
http://dx.doi.org/10.1002/3527603743.ch11
• Leach et al (2006) Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a
Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 49, 6672-6682.
http://dx.doi.org/10.1021/jm0605233
• Birch et al (2009) Matched molecular pair analysis of activity and properties of glycogen phosphorylase
inhibitors. Bioorg. Med. Chem. Lett. 19, 850-853. http://dx.doi.org/10.1016/j.bmcl.2008.12.003
• Wassermann, Wawer & Bajorath (2010) Activity Landscape Representations for Structure−Activity
Relationship Analysis. J. Med. Chem. 53, 8209-8223. http://dx.doi.org/10.1021/jm100933w
Alkane/water partition coefficients
Relationships between structures