1. Molecular Similarity Characterization of ADME Landscapes
ACS Annual Meeting
San Francisco 2010
Bin Chen‡, Rishi Gupta* and Eric Gifford†
‡ School of Informatics and Computing, Indiana University, Bloomington, IN 47408
* Anti Bacterial Research Unit, Pfizer Global R&D, Groton, CT 06340
† Computational Sciences CoE, Pfizer Global R&D, Groton, CT 06340
Pfizer Confidential
3. What has been done so far?
• Much excellent work has been done in the activity space, using a variety of similarity methods and descriptors
• The current work focuses primarily on ADME endpoints and molecular properties, examining various descriptor types and similarity methods
4. Do similar compounds have similar ADME properties?
[Figure: example compound pairs with similarity 0.92 and 0.85. Similar ADME properties? The answer varies with the descriptors used.]
5. Do different ADME endpoints have different landscapes?
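$$\mathrm{Ratio}_{\text{similarity}} = \frac{\#\ \text{neighbors with same class}}{\#\ \text{total neighbors}}$$
[Figure: a probe compound with its high risk and low risk neighbors at similarity cutoffs 0.9, 0.8 and 0.7, for HLM and RRCK]
Endpoint | Ratio at 0.9 | Ratio at 0.8
HLM | 4/5 = 0.8 | 8/12 = 0.67
RRCK | 4/5 = 0.8 | 9/12 = 0.75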
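To make the ratio concrete, here is a minimal Python sketch for a single probe compound. It assumes a neighbor is any compound whose similarity to the probe is at or above the cutoff (the slide does not say whether the cutoff is inclusive). The function name `neighbor_ratio` is hypothetical, and the similarity values are invented so that the counts reproduce the HLM example (4 of 5 neighbors at 0.9, 8 of 12 at 0.8).

```python
from typing import Sequence


def neighbor_ratio(similarities: Sequence[float],
                   same_class: Sequence[bool],
                   cutoff: float) -> float:
    """Fraction of neighbors (similarity >= cutoff) sharing the probe's risk class.

    Returns 0.0 when the probe has no neighbor at this cutoff.
    """
    neighbors = [same for sim, same in zip(similarities, same_class) if sim >= cutoff]
    if not neighbors:
        return 0.0
    return sum(neighbors) / len(neighbors)


# Hypothetical neighbors of one probe compound (values invented for illustration).
sims = [0.95, 0.93, 0.92, 0.91, 0.90,               # 5 neighbors at >= 0.9
        0.88, 0.86, 0.85, 0.84, 0.83, 0.82, 0.81]   # 7 more at >= 0.8
same = [True, True, True, True, False,
        True, True, True, True, False, False, False]

print(neighbor_ratio(sims, same, 0.9))            # 0.8
print(round(neighbor_ratio(sims, same, 0.8), 2))  # 0.67
```

Lowering the cutoff admits more, less similar neighbors, which is why the ratio typically drops from 0.9 to 0.8.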
6. Hypothesis: Visualizing Chemical Landscape
[Figure: hypothesized curves of Ratio (y-axis, 0.5 to 1.0) against similarity cutoff (x-axis, 0.2 to 1) for Endpoint1 to Endpoint4]
• Ratio = f(endpoint, similarity)
• Identical compounds: Ratio ≈ 1.0
• At low similarity cutoffs: Ratio ≈ # high (low) risk compounds / # total compounds
7. Datasets, Assays and Bins
Endpoint | Description | Result unit | Low risk | High risk
RRCK | Passive permeability in RRCK cell line | 10⁻⁶ cm/sec | >10 | <=10
HLM | Metabolic stability using human liver microsomes | µL/min/mg | <20 | >=20
MDR | Pgp-influenced permeability and efflux in MDCK-MDR1 cells | 10⁻⁶ cm/sec | >10 | <=10
CYP1A2 | CYP1A2 inhibition in a substrate cocktail assay | % inhibition | <10 | >=10
CYP3A4 | CYP3A4 inhibition in a substrate cocktail assay | % inhibition | <10 | >=10
CYP2D6 | CYP2D6 inhibition in a substrate cocktail assay | % inhibition | <10 | >=10
CYP2C9 | CYP2C9 inhibition in a substrate cocktail assay | % inhibition | <10 | >=10
*Solubility | ADMET aqueous solubility | Solubility level | >2 | <=2
*cLogP | Logarithm of the octanol-water partition coefficient | unitless | <3 | >=3
• Full matrix of 17787 compounds and 9 endpoints
• *Solubility and cLogP are predicted endpoints from in-house computational models built on datasets of more than 10K compounds; the rest are experimental results
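The bins in this table can be encoded as a small lookup. The sketch below is an illustration, not the authors' code: the names `RULES` and `risk_class` are hypothetical, and it assumes each endpoint's value has already been expressed in the result unit listed in the table.

```python
# Low/high risk bins from the table, as (direction, threshold):
# "gt": values strictly above the threshold are low risk (<= threshold is high risk)
# "lt": values strictly below the threshold are low risk (>= threshold is high risk)
RULES = {
    "RRCK": ("gt", 10.0),        # 10^-6 cm/sec
    "MDR": ("gt", 10.0),         # 10^-6 cm/sec
    "HLM": ("lt", 20.0),         # uL/min/mg
    "CYP1A2": ("lt", 10.0),      # % inhibition
    "CYP3A4": ("lt", 10.0),
    "CYP2D6": ("lt", 10.0),
    "CYP2C9": ("lt", 10.0),
    "Solubility": ("gt", 2.0),   # predicted solubility level
    "cLogP": ("lt", 3.0),        # predicted
}


def risk_class(endpoint: str, value: float) -> str:
    """Return "low" or "high" risk for one endpoint measurement."""
    direction, threshold = RULES[endpoint]
    is_low = value > threshold if direction == "gt" else value < threshold
    return "low" if is_low else "high"


print(risk_class("HLM", 20.0))   # high
print(risk_class("RRCK", 12.5))  # low
print(risk_class("cLogP", 2.4))  # low
```

Keeping the direction next to the threshold makes the boundary cases explicit, e.g. an HLM value of exactly 20 falls in the high risk bin.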
8. Characterize Chemical Landscape: Proposed Workflow*
Workflow for plotting the landscape of an endpoint, using FCFP6 and Tanimoto as the similarity measure:
1) Start from the full matrix (compound × endpoint) and compute the FCFP6 Tanimoto similarity matrix
2) Select all high (or low) risk compounds in an endpoint
3) Select one similarity cutoff; iterate over all cutoffs (14 in total)
4) Select one compound; iterate over all selected compounds
5) Calculate the ratio for each compound:
$$\mathrm{Ratio}_{\text{similarity}} = \frac{\#\ \text{neighbors with same class}}{\text{total}\ \#\ \text{neighbors}}$$
6) Average the ratio over all the compounds
7) Plot the similarity cutoff against the ratio
Parameters evaluated:
• Structure similarity
  • Fingerprint (4): MDL public keys, Atom pairs, FCFP6, ECFC4
  • Coefficient (2): Tanimoto, Cosine
• Risk categorization (2): High risk, Low risk
• Endpoints (9)
• Complexity: 4 × 2 × 2 × 9 = 144 combinations
*Molecular Similarity Characterization of ADME Landscapes; Chen et al., JCIM, Submitted, 2010
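The two similarity coefficients named in the workflow can be sketched for binary fingerprints stored as sets of on-bit indices. This is an illustration under that assumption: the deck does not give its fingerprint representation, count-based fingerprints would need the count-vector forms of these coefficients, and the helper names are hypothetical.

```python
import math
from typing import Callable, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Binary Tanimoto: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: Fingerprint, b: Fingerprint) -> float:
    """Binary cosine: |A & B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def similarity_matrix(fps: List[Fingerprint],
                      metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto
                      ) -> List[List[float]]:
    """Symmetric pairwise matrix; the diagonal (self-similarity) is set to 1.0."""
    n = len(fps)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = metric(fps[i], fps[j])
    return m


a = frozenset({1, 2, 3, 4})
b = frozenset({3, 4, 5})
print(tanimoto(a, b))          # 0.4
print(round(cosine(a, b), 3))  # 0.577
```

For binary fingerprints the Tanimoto value never exceeds the cosine value for the same pair, so the same cutoff is stricter under Tanimoto.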
9. What are we evaluating?
Compound ID | Similarity 0.9 | Similarity 0.8 | Similarity 0.7 | …
PF_1 | 0.9 | 0.9 | 0.7 | …
PF_2 | 1 | 0.5 | 0.7 | …
PF_3 | 0.95 | 0.8 | 0.7 | …
… | … | … | … | …
PF_N | 0.91 | 0.85 | 0.68 | …
Average | 0.95 | 0.85 | 0.7 | …
• Calculate the ratio of each compound individually
• Average the ratios of all the compounds at each similarity threshold, ignoring ratios of 0 (either no same-class neighbor or no neighbor at all)
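The per-compound ratio and the averaging rule can be combined into one function that produces a single landscape curve (one endpoint, one risk class, one similarity matrix). This is a sketch, not the authors' implementation. The function name `landscape_curve` and the toy data are hypothetical. It also assumes the probe itself is excluded from its own neighbors and that a cutoff is inclusive.

```python
from statistics import mean
from typing import Dict, Sequence


def landscape_curve(sim: Sequence[Sequence[float]],
                    labels: Sequence[str],
                    probe_class: str,
                    cutoffs: Sequence[float]) -> Dict[float, float]:
    """Average neighbor ratio of all probe_class compounds at each cutoff.

    sim is a square similarity matrix and labels holds "high" or "low" per
    compound. A ratio of 0 (no neighbor, or no same-class neighbor) is ignored.
    """
    n = len(labels)
    curve: Dict[float, float] = {}
    for cutoff in cutoffs:
        ratios = []
        for i in range(n):
            if labels[i] != probe_class:
                continue
            # Neighbors: every other compound at or above the cutoff (assumption).
            nbrs = [j for j in range(n) if j != i and sim[i][j] >= cutoff]
            same = sum(labels[j] == probe_class for j in nbrs)
            ratio = same / len(nbrs) if nbrs else 0.0
            if ratio > 0:
                ratios.append(ratio)
        # NaN marks a cutoff at which no compound contributed a non-zero ratio.
        curve[cutoff] = mean(ratios) if ratios else float("nan")
    return curve


# Toy data (hypothetical): compounds 0 and 1 are high risk, 2 and 3 are low risk.
sim = [
    [1.0, 0.9, 0.6, 0.3],
    [0.9, 1.0, 0.7, 0.4],
    [0.6, 0.7, 1.0, 0.8],
    [0.3, 0.4, 0.8, 1.0],
]
labels = ["high", "high", "low", "low"]
print(landscape_curve(sim, labels, "high", [0.8, 0.5, 0.2]))
```

On this toy matrix the high-risk curve falls from 1.0 at cutoff 0.8 to 0.5 at 0.5 and to 1/3 at 0.2. At 0.2 every other compound is a neighbor, so the ratio is simply the share of high-risk compounds among them.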
10. Results: Compare Different Endpoints
[Figure: (a) ECFC4, Tanimoto, low risk; (b) ECFC4, Tanimoto, high risk]
• The rate of “fall” of a given curve indicates how easy or difficult it would be to modify a compound and change its property, i.e., to transform a compound from high risk to low risk or vice versa
• MDR compounds are relatively more difficult to move out of the high risk class than HLM compounds, at any given similarity cutoff
• The ratio stays constant beyond a certain similarity threshold (e.g., 0.4 in the case of CYP2C9)
11. Results: Compare Different Fingerprints*
[Figure: (a) RRCK high risk; (b) RRCK low risk]
• The ratio differs among fingerprints; the order is always FCFP6 > Atom pairs > ECFC4 > MDL
*Molecular Similarity Characterization of ADME Landscapes; Chen et al., JCIM, Submitted, 2010
12. Results: Compare different similarity coefficients
[Figure: (a) RRCK low risk; (b) RRCK high risk]
• The ratio differs between similarity coefficients; the order is always Tanimoto > Cosine
13. Use Case: Which one is better to optimize?
[Figure: two compounds with opposite risk profiles: one with MDR: low and RRCK: high, the other with MDR: high and RRCK: low]
Probability of success in optimizing each compound to MDR: low and RRCK: low?
14. Use Case: Data Driven Compound Prioritization?
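$$\text{ADMET score} = \prod_{i}^{l} E_i(\text{ratio}) \times \prod_{j}^{h} \bigl(1 - E_j(\text{ratio})\bigr)$$
where i runs over the l low-risk endpoints and j over the h high-risk endpoints of a compound.

Compound | CYP2D6 | CYP2C9 | CYP1A2 | CYP3A4 | Aq. Sol. | cLogP | RRCK | MDR | HLM | # High risk | Score
Compound1 | - | - | + | - | - | - | - | - | - | 1 | 0.688
Compound2 | + | - | - | - | - | - | - | - | - | 1 | 0.694
Compound3 | - | - | - | - | - | - | - | + | + | 2 | 0.623
Compound4 | - | - | - | + | + | - | + | - | - | 3 | 0.627
+ and - represent high-risk and low-risk endpoints, respectively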
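A minimal sketch of the ADMET score, assuming it multiplies E_i(ratio) over a compound's l low-risk endpoints and (1 − E_j(ratio)) over its h high-risk endpoints, with every E value in [0, 1]. How E is read off the landscape plots (which cutoff, which curve) is not specified on this slide, so the function name `admet_score` and the input values below are hypothetical and are not data from the study.

```python
import math
from typing import Iterable


def admet_score(low_risk_ratios: Iterable[float],
                high_risk_ratios: Iterable[float]) -> float:
    """Product of E_i over low-risk endpoints times (1 - E_j) over high-risk ones.

    Assumes a product form and ratios in [0, 1]; an empty group contributes 1.
    """
    return (math.prod(low_risk_ratios)
            * math.prod(1.0 - e for e in high_risk_ratios))


# Hypothetical E values for one compound with two low-risk and one high-risk endpoint.
score = admet_score(low_risk_ratios=[0.9, 0.8], high_risk_ratios=[0.7])
print(round(score, 3))  # 0.216
```

The product form means a single unfavorable factor can dominate the score, which is one reason a compound with more high-risk endpoints can still outrank one with fewer.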
15. Potential Combinations
• 4 descriptor types are used
• 2 similarity metrics are used
• 9 endpoints, i.e., 512 (2^9) high/low risk combinations
• Overlapping scores mean that some compounds with more high-risk endpoints should be prioritized ahead of those with fewer, e.g., with MDL + Tanimoto coefficient
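Assuming the 512 combinations are all high/low risk profiles across the 9 endpoints, the count follows from 2^9. The short sketch below enumerates them and groups them by the number of high-risk endpoints. The endpoint order is taken from the prioritization table, and the variable names are illustrative only.

```python
import itertools
import math
from collections import Counter

ENDPOINTS = ["CYP2D6", "CYP2C9", "CYP1A2", "CYP3A4", "Aq. Sol.",
             "cLogP", "RRCK", "MDR", "HLM"]

# Every high (1) / low (0) risk assignment across the nine endpoints.
profiles = list(itertools.product((0, 1), repeat=len(ENDPOINTS)))
by_high_count = Counter(sum(p) for p in profiles)

print(len(profiles))  # 512
for k in sorted(by_high_count):
    # The number of profiles with exactly k high-risk endpoints is C(9, k).
    assert by_high_count[k] == math.comb(len(ENDPOINTS), k)
    print(k, by_high_count[k])
```

Most profiles have four or five high-risk endpoints (126 each), so scores within neighboring groups are the most likely to overlap.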
17. Conclusion
• Small structural changes can result in a change of class (high/low risk) within a given endpoint
• Different endpoints behave differently from each other, e.g., MDR may be more difficult to modify than CYP2C9
• Curves are relatively parallel to each other, independent of descriptor and similarity metric
• A scoring function derived from the plots can prioritize compounds (for screening or series selection)
• Ratios could be used to differentiate between “difficult” and “easy” endpoints
[Figure: Ratio (y-axis, 0.5 to 1.0) against similarity cutoff (x-axis, 0.2 to 1); the curve of a “difficult” endpoint lies above that of an “easy” endpoint]
18. Reference
Martin YC, et al. Do Structurally Similar Molecules Have Similar Biological Activity? J. Med. Chem. 2002, 45, 4350-4358.
Medina-Franco JL, et al. Characterization of Activity Landscapes Using 2D and 3D Similarity Methods: Consensus Activity Cliffs. J. Chem. Inf. Model. 2009, 49, 477-491.
Segall MD, et al. Focus on Success: Using a Probabilistic Approach to Achieve an Optimal Balance of Compound Properties in Drug Discovery. Expert Opin. Drug Metab. Toxicol. 2006, 2, 325-337.
19. Acknowledgement
David Wild (School of Informatics and Computing, Indiana University)
Veerabahu Shanmugasundaram (AB RU)
Robyn Ayscue
Hua Gao
21. Results
[Figure: heatmaps of the ratios of all compounds at 14 similarity cutoffs; panels: RRCK, ECFC4, Tanimoto, high risk and RRCK, ECFC4, Tanimoto, low risk]
22. Discussion & further work
• Normal distribution
• Outlier analysis
• Ranking function validation
• Implementation: by virtue of the full matrix and ADME predictive models, any given compound can be assigned a score for prioritization