4. Example Challenges:
● Toxicology: a single toxin may modulate several
different biochemical processes
● Cancer: malignant cells have multiple biochemical
sensitivities that may be targeted
● Spectrum disorders (e.g., autism, Alzheimer's, etc.):
distinct phenotypes produce similar symptoms
Discovery Paradigm:
Chemical screening → prospective hits
Chemical proteomics → prospective targets
How to attain comprehensive understanding?
9. How to make sense of diffuse multimode
data?
Mechanism of Action (MOA) discovery:
find compound subsets that conserve
common mechanism
Excellent (but imperfect) example:
TEST (Toxicology Estimation Software Tool)
http://www.epa.gov/nrmrl/std/qsar/qsar.html
10. TEST
Multiple data sets covering toxicity outcomes
for numerous compounds
Predict toxicity of query compounds via on-the-fly
training to similar pre-characterized analogs
11. TEST
Uses Tanimoto distances over molecular
fingerprints: no validated relevance to specific
outcomes
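A minimal sketch of analog selection by Tanimoto distance, for illustration only. It assumes fingerprints are stored as sets of on-bit indices; the names `tanimoto_distance`, `nearest_analogs`, and the toy library are hypothetical and are not taken from TEST itself.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / union


def tanimoto_distance(fp_a, fp_b):
    """Distance form: 0 = identical bit sets, 1 = no shared bits."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


def nearest_analogs(query_fp, library, k=3):
    """Return the k (name, distance) pairs from the library closest to the query."""
    scored = [(name, tanimoto_distance(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy data (hypothetical): three pre-characterized compounds and one query.
library = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {3, 4, 5, 6}, "cpd_C": {7, 8}}
query = {1, 2, 3, 5}
analogs = nearest_analogs(query, library, k=2)
```

The selected analogs would then serve as the local training set for the query. The ranking reflects structural similarity only, which is the limitation noted above.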
13. Procedure:
1. Assemble Matrix of compounds vs.
activity & features
2. Normalize
MOA method: feature / compound selection
14. Procedure:
3. Fold activity into features as per:
Ci = |Act* - Xi*|
Folded values (Ci): 0 = perfect correlation
1 = perfect anticorrelation
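The normalize and fold steps can be sketched as follows. The slides say only "Normalize", so min-max scaling to [0, 1] is an assumption made here so that the folded values land in the stated 0 to 1 range; `min_max` and `fold_activity` are hypothetical names.

```python
def min_max(values):
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fold_activity(activity, features):
    """Compute C_i = |Act* - X_i*| per compound for every feature column.

    activity: list of activity values, one per compound.
    features: dict mapping feature name -> list of values, one per compound.
    Returns a dict mapping feature name -> list of folded values.
    """
    act_star = min_max(activity)  # Act*, assumed min-max normalized
    folded = {}
    for name, column in features.items():
        x_star = min_max(column)  # X_i*, normalized the same way
        folded[name] = [abs(a - x) for a, x in zip(act_star, x_star)]
    return folded


# Toy data (hypothetical): one feature tracks activity, one opposes it.
activity = [1.0, 2.0, 3.0]
features = {"f_corr": [10.0, 20.0, 30.0], "f_anti": [30.0, 20.0, 10.0]}
folded = fold_activity(activity, features)
```

Under this scaling a feature that tracks activity folds to values near 0, and an opposing feature folds to values near 1 at the extremes of the activity range.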
15. Procedure:
4. Bicluster
16. Procedure:
Clusters: contiguous correlative or anticorrelative
regions of the matrix
Within clusters: molecules may share
MOA; features may correlate with
activity
Confidence: correlative & predictive
quality of the model derived from the cluster
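As an illustration of what a correlative or anticorrelative region means in the folded matrix, the sketch below scores one candidate block of compounds and features by its mean folded value. It is not a biclustering algorithm; the thresholds `low` and `high` and the function names are hypothetical choices.

```python
def block_mean(folded, rows, cols):
    """Mean folded value over the chosen compounds (rows) and features (cols)."""
    cells = [folded[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)


def classify_block(folded, rows, cols, low=0.2, high=0.8):
    """Label a candidate block using illustrative thresholds on its mean."""
    mean_value = block_mean(folded, rows, cols)
    if mean_value <= low:
        return "correlative"      # folded values near 0
    if mean_value >= high:
        return "anticorrelative"  # folded values near 1
    return "mixed"


# Toy folded matrix (hypothetical): rows are compounds, columns are features.
folded_matrix = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 1.0],
    [0.5, 0.6, 0.4],
]
label = classify_block(folded_matrix, rows=[0, 1], cols=[0, 1])
```

A real biclustering step would search for such blocks automatically. The score only shows why low or high folded values mark the features in a block as candidates for correlating with activity.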
17. Example: Oral Bioavailability
Oral uptake depends on:
● Polar solubility
● Membrane permeability
● Interaction with various transporters
Data (from Tingjun Hou): 773 molecules
http://modem.ucsd.edu/adme/databases/databases_bioavailability.htm
Descriptors (from VolSurf and DVS): 298 features
passing information content and linear independence (R < 0.90) filters
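One way to read the linear independence filter is as a pairwise correlation cutoff. The sketch below assumes R is the Pearson correlation between two descriptor columns and uses a greedy first-come selection; both are assumptions, and `filter_redundant` is a hypothetical name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_redundant(features, r_max=0.90):
    """Keep a feature only if |R| < r_max against every feature already kept."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson_r(column, other)) < r_max for other in kept.values()):
            kept[name] = column
    return kept


# Toy descriptors (hypothetical): "b" is nearly a multiple of "a".
descriptors = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [4.0, 1.0, 3.0, 2.0],
}
selected = filter_redundant(descriptors)
```

Because the selection is greedy, the result depends on the order in which descriptors are visited.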
18. Example: Oral Bioavailability
Preliminary Model (Weka: Bootstrap Aggregating / REPTree):
Q² (5-fold) = 0.4712
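For reference, a cross-validated Q² can be computed as 1 - PRESS / SS_tot, where PRESS sums squared errors on held-out compounds. The sketch below uses a one-variable least-squares line as a stand-in model and interleaved folds; the slides used bagged REPTree models in Weka, so the model, the fold assignment, and the use of the overall mean in SS_tot are all assumptions here.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def q2_kfold(xs, ys, fit=fit_line, k=5):
    """Q2 = 1 - PRESS / SS_tot, with PRESS accumulated over k held-out folds."""
    n = len(xs)
    press = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved fold assignment
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        press += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Toy data (hypothetical): a noisy linear trend.
xs = [float(i) for i in range(10)]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0]
q2 = q2_kfold(xs, ys, k=5)
```

Unlike a fitted R², Q² is computed only from predictions on compounds the model did not see, so it can be much lower than the training fit and can even be negative.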
19. Example: Oral Bioavailability
Preliminary Model (Weka: Bootstrap Aggregating / REPTree):
Q² (5-fold) = 0.4712
CFS & RF: reduced to 27 features
Q² (5-fold) = 0.4739
22. Clusters as local training sets:
Condense to 18 high-quality clusters that cover almost the
entire training space (omitting only 10 of 768 compounds)
23. Conclusions
Correlative & predictive performance of subset models
gives strong confidence in MOA conservation in
clusters
Head-to-head comparison with chemical proteomics
data should provide strong basis for target
identification
Questions / Suggestions?
glushington@yahoo.com