Small Molecules and siRNA: Methods to Explore Bioactivity Data Rajarshi Guha NIH Center for Translational Therapeutics August 17, 2011 Pfizer, Groton
Background Cheminformatics methods QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks More recently siRNA screening, high content imaging, combination screening Extensive use of machine learning All tied together with software development Integrate small molecule information & biosystems – systems chemical biology
Outline Exploring the SAR landscape The landscape view of SAR data Quantifying SAR landscapes Extending an SAR landscape Linking small molecule & RNAi HTS Overview of the Trans NIH RNAi Screening Initiative Infrastructure components Linking small molecule & siRNA screens
The Landscape View of Structure Activity Datasets
Structure Activity Relationships Similar molecules will have similar activities Small changes in structure will lead to small changes in activity One implication is that SAR’s are additive This is the basis for QSAR modeling Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
Structure Activity Landscapes Rugged gorges or rolling hills? Small structural changes associated with large activity changes represent steep slopes in the landscape But traditionally, QSAR assumes gentle slopes Machine learning is not very good for special cases Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
Characterizing the Landscape A cliff can be numerically characterized Structure Activity Landscape Index (SALI) Cliffs are characterized by elements of the matrix with very large values Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
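A minimal sketch of the SALI calculation, following the definition in the cited Guha & Van Drie paper: SALI(i,j) = |A_i - A_j| / (1 - sim(i,j)), where A is the activity and sim is a structural similarity. The fingerprints-as-sets representation, the function names, and the handling of identical fingerprints (infinite SALI when the activities differ) are assumptions made for illustration, not details from the slides.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali_matrix(fingerprints, activities):
    """Return the symmetric matrix of pairwise SALI values."""
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        diff = abs(activities[i] - activities[j])
        if sim < 1.0:
            value = diff / (1.0 - sim)
        else:
            # Assumption: identical fingerprints with different activities
            # are treated as an infinitely steep cliff.
            value = float("inf") if diff > 0 else 0.0
        sali[i][j] = sali[j][i] = value
    return sali


# Toy data (hypothetical): three molecules with pIC50-like activities.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]
acts = [5.0, 8.0, 5.5]
matrix = sali_matrix(fps, acts)
```

The first two toy molecules are similar (Tc = 0.6) but differ by 3 log units, so their entry dominates the matrix; the dissimilar pairs get small values even when their activities differ.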
Visualizing SALI Values The SALI graph Compounds are nodes Nodes i,j are connected if SALI(i,j) > X Only display connected nodes
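The graph construction on this slide can be sketched as follows. The input is a precomputed SALI matrix; orienting each edge from the less active to the more active compound is an assumption here (it is what makes "edge orderings" meaningful later in the deck), and the function name is hypothetical.

```python
def sali_graph(sali, activities, cutoff):
    """Build the SALI graph: keep pairs with SALI > cutoff, drop isolated nodes.

    Each edge is (low_activity_node, high_activity_node, sali_value).
    """
    edges = []
    n = len(sali)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                # Assumption: edges point from lower to higher activity.
                lo, hi = (i, j) if activities[i] <= activities[j] else (j, i)
                edges.append((lo, hi, sali[i][j]))
    # Only nodes that take part in at least one edge are displayed.
    nodes = sorted({node for lo, hi, _ in edges for node in (lo, hi)})
    return nodes, edges


# Toy SALI matrix and activities (hypothetical values).
sali = [[0.0, 7.5, 0.5],
        [7.5, 0.0, 2.5],
        [0.5, 2.5, 0.0]]
acts = [5.0, 8.0, 5.5]
nodes, edges = sali_graph(sali, acts, cutoff=5.0)
# nodes == [0, 1]; edges == [(0, 1, 7.5)]
```

Raising the cutoff prunes the graph down to the steepest cliffs; lowering it fills in the gentler parts of the landscape.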
What Can We Do With SALI’s? SALI characterizes cliffs & non-cliffs For a given molecular representation, SALI’s give us an idea of the smoothness of the SAR landscape Models try to encode this landscape Use the landscape to guide descriptor or model selection
Descriptor Space Smoothness Edge count of the SALI graph for varying cutoffs Measures smoothness of the descriptor space Can reduce this to a single number (AUC)
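One way to reduce the edge-count curve to a single number is sketched below. The choice of cutoffs and the trapezoidal rule are assumptions; the slide only states that the curve can be summarized by an AUC.

```python
def edge_count_curve(sali, cutoffs):
    """Number of SALI graph edges (pairs with SALI > cutoff) at each cutoff."""
    n = len(sali)
    pair_values = [sali[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(value > cutoff for value in pair_values) for cutoff in cutoffs]


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(xs)):
        area += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
    return area


# Toy SALI matrix (hypothetical values).
sali = [[0.0, 7.5, 0.5],
        [7.5, 0.0, 2.5],
        [0.5, 2.5, 0.0]]
cutoffs = [0.0, 1.0, 2.0, 3.0, 8.0]
counts = edge_count_curve(sali, cutoffs)   # [3, 2, 2, 1, 0]
auc = trapezoid_auc(cutoffs, counts)       # 8.5
```

To compare descriptor sets on the same data, the cutoffs would need a common scale (for example, fractions of the maximum SALI value); that normalization is left out of this sketch.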
Other Examples Instead of fingerprints, we use molecular descriptors SALI denominator now uses Euclidean distance 2D & 3D random descriptor sets None are really good Too rough, or too flat [Figure panels: 2D, 3D]
Feature Selection Using SALI Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves Not entirely clear what type of curve is desirable
Measuring Model Quality A QSAR model should easily encode the “rolling hills” A good model captures the most significant cliffs Can be formalized as: How many of the edge orderings of a SALI graph does the model predict correctly? Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X Repeat for varying X and obtain the SALI curve
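A hedged sketch of one point on the SALI curve: for a threshold X, count the edges whose activity ordering the model reproduces. Returning the raw count together with the total number of edges follows the slide's wording; any normalization (for example, scaling to a fixed range) would be an additional assumption. Function and variable names are hypothetical.

```python
def sali_curve_point(sali, observed, predicted, cutoff):
    """Return (correctly ordered edges, total edges) at a SALI cutoff."""
    correct = total = 0
    n = len(observed)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] <= cutoff:
                continue  # pair is not an edge at this threshold
            total += 1
            obs_delta = observed[j] - observed[i]
            pred_delta = predicted[j] - predicted[i]
            # The ordering is correct when both differences share a sign.
            if obs_delta * pred_delta > 0:
                correct += 1
    return correct, total


def sali_curve(sali, observed, predicted, cutoffs):
    """Evaluate S(X) over a series of cutoffs X."""
    return [sali_curve_point(sali, observed, predicted, x) for x in cutoffs]


# Toy data (hypothetical): the model gets the steepest cliff right
# but reverses the ordering of molecules 1 and 2.
sali = [[0.0, 7.5, 0.5],
        [7.5, 0.0, 2.5],
        [0.5, 2.5, 0.0]]
observed = [5.0, 8.0, 5.5]
predicted = [5.2, 7.0, 7.5]
curve = sali_curve(sali, observed, predicted, [0.0, 2.0, 5.0])
# curve == [(2, 3), (1, 2), (1, 1)]
```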
SALI Curves
Model Search Using the SCI We’ve used the SALI to retrospectively analyze models Can we use SALI to develop models? Identify a model that captures the cliffs Tricky Cliffs are fundamentally outliers Optimizing for good SALI values implies overfitting Need to trade-off between SALI & generalizability
Predicting the Landscape Rather than predicting activity directly, we can try to predict the SAR landscape Implies that we attempt to directly predict cliffs Observations are now pairs of molecules A more complex problem Choice of features is trickier Still face the problem of cliffs as outliers Somewhat similar to predicting activity differences Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
Motivation Predicting activity cliffs corresponds to extending the SAR landscape Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
Predicting Cliffs Dependent variables are pairwise SALI values, calculated using fingerprints Independent variables are molecular descriptors – but considered pairwise Absolute difference of descriptor pairs, or Geometric mean of descriptor pairs … Develop a model to correlate pairwise descriptors to pairwise SALI values
A Test Case We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50’s Evaluate topological and physicochemical descriptors Developed random forest models On the original observed values (30 obs) On the SALI values (435 observations) Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
Double Counting Structures? The dependent and independent variables both encode structure. But pretty low correlations between individual pairwise descriptors and the SALI values
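The two aggregation functions named on this slide can be sketched as below. Taking the geometric mean as sqrt(x * y) assumes non-negative descriptor values; the function name and the dict-of-pairs output are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def pairwise_features(descriptors, how="absdiff"):
    """Aggregate per-molecule descriptor vectors into per-pair feature vectors.

    descriptors: list of equal-length lists, one per molecule.
    how: "absdiff" (absolute difference) or "geomean" (geometric mean).
    Returns a dict mapping (i, j) with i < j to the pair's feature vector.
    """
    features = {}
    for i, j in combinations(range(len(descriptors)), 2):
        a, b = descriptors[i], descriptors[j]
        if how == "absdiff":
            features[(i, j)] = [abs(x - y) for x, y in zip(a, b)]
        elif how == "geomean":
            # Assumption: descriptor values are non-negative.
            features[(i, j)] = [math.sqrt(x * y) for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown aggregation: {how}")
    return features


# Toy descriptors (hypothetical values) for three molecules.
desc = [[1.0, 4.0], [4.0, 9.0], [9.0, 1.0]]
abs_diff = pairwise_features(desc, "absdiff")   # (0, 1) -> [3.0, 5.0]
geo_mean = pairwise_features(desc, "geomean")   # (0, 1) -> [2.0, 6.0]
```

Because each observation is a pair, n molecules yield n(n-1)/2 rows, so 30 molecules give 435 pairwise observations.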
Model  Summaries Original pIC50 RMSE = 0.97 SALI, AbsDiff RMSE = 1.10 SALI, GeoMean RMSE = 1.04 All models explain similar % of variance of their respective datasets  Using geometric mean as the descriptor aggregation function seems to perform best SALI models are more robust due to larger size of the dataset
Test Case 2 Considered the Holloway docking dataset, 32 molecules with pIC50’s and Einter Similar strategy as before Need to transform SALI values  Descriptors show minimal correlation Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
Model Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 0.48 SALI, GeoMean RMSE = 0.48 The SALI models perform much more poorly in terms of % of variance explained Descriptor aggregation method does not seem to have much effect The SALI models appear to perform decently on the cliffs, but miss the most significant ones
Model  Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 9.76 SALI, GeoMean RMSE = 10.01 With untransformed SALI values, models perform similarly in terms of  % of variance explained The most significant cliffs correspond to stereoisomers
Test Case 3 38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R2 = 0.62) Upper end of SALI range is better predicted Kalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
Test Case 3 Generally, performance is poorer for smaller cliffs For any given hold out molecule, range of error in SALI prediction is large Suggests that some form of domain applicability metric would be useful
Model Caveats Models based on SALI values are dependent on there being an SAR in the original activity data Scrambling results for these models are poorer than the original models but aren’t as random as expected
Conclusions SALI is the first step in characterizing the SAR landscape Allows us to directly analyze the landscape, as opposed to individual molecules Being able to predict the landscape could serve as a useful way to extend an SAR  landscape
Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
RNAi Facility Mission Pathway (Reporter assays, e.g. luciferase, b-lactamase) Simple Phenotypes (Viability, cytotoxicity, oxidative stress, etc) Perform collaborative genome-wide RNAi screening-based projects with intramural investigators Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs. Complex Phenotypes (High-content imaging, cell cycle, translocation, etc) Range of Assays
RNAi Informatics Infrastructure
RNAi Analysis Workflow Raw and Processed Data GO annotations Pathways Interactions Hit List Follow-up
RNAi Informatics Toolset Local databases (screen data, pathways, interactions, etc). Commercial pathway tools.  Custom software for loading, analysis and visualization.
Back End Services Currently all computational analysis performed on the backend R & Bioconductor code Custom R package (ncgcrnai) to support NCGC infrastructure Partly derived from cellHTS2 Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports Some Java tools for Data loading Library and plate registration
User Accessible Tools
User Accessible Tools
RNAi & Small Molecule Screens CAGCATGAGTACTACAGGCCA TACGGGAACTACCATAATTTA What targets mediate activity of siRNA and compound Pathway elucidation, identification of interactions Develop new annotated libraries Target ID and validation Link RNAi generated pathway perturbations to small molecule activities. Could provide insight into polypharmacology Goal: Develop systems level view of small molecule activity
HTS for NF-κB Antagonists NF-κB controls DNA transcription  Involved in cellular responses to stimuli Immune response, memory formation Inflammation, cancer, auto-immune diseases http://www.genego.com
HTS for NF-κB Antagonists ME-180 cell line Stimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporter Identify small molecules and siRNA’s that block the resultant activation
Small Molecule HTS Summary 2,899 FDA-approved compounds screened 55 compounds retested active Which components of the NF-κB pathway do they hit? 17 molecules have target/pathway information in GeneGO Literature searches list a few more Most Potent Actives Proscillaridin A Trabectedin Digoxin Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
RNAi HTS Summary Qiagen HDG library – 6886 genes, 4 siRNA’s per gene A total of 567 genes were knocked down by 1 or more siRNA’s We consider >= 2 as a “reliable” hit 16 reliable hits Added in 66 genes for follow up via triage procedure
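The hit-calling rule on this slide (a gene is a "reliable" hit when at least 2 of its siRNAs are active) can be sketched as follows. The input format, a list of (gene, siRNA id) pairs for active wells, and the placeholder gene names are assumptions for illustration.

```python
from collections import Counter


def call_hits(active_sirnas, min_sirnas=2):
    """Split genes into any-hit and reliable-hit sets.

    active_sirnas: iterable of (gene, sirna_id) pairs that scored as active.
    min_sirnas: number of distinct active siRNAs required for a reliable hit.
    """
    # Deduplicate so replicate wells of the same siRNA count only once.
    distinct = set(active_sirnas)
    counts = Counter(gene for gene, _ in distinct)
    any_hit = set(counts)
    reliable = {gene for gene, count in counts.items() if count >= min_sirnas}
    return any_hit, reliable


# Toy input (hypothetical gene and siRNA labels).
active = [("GENE_A", "si1"), ("GENE_A", "si2"), ("GENE_A", "si2"),
          ("GENE_B", "si1")]
any_hit, reliable = call_hits(active)
# any_hit == {"GENE_A", "GENE_B"}; reliable == {"GENE_A"}
```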
The Obvious Conclusion The active compounds target the 16 hits (at least) from the RNAi screen Useful if the RNAi screen was small & focused But what if we’re investigating a larger system? Is there a way to get more specific? Can compound data suggest RNAi non-hits?
Small Molecule Targets Bortezomib (proteasome inhibitor) Some small molecules interact with core components Daunorubicin (IκBα inhibitor)
Small Molecule Targets Montelukast (LTD4 antagonist) Others are active against upstream targets We also get an idea of off-target effects
Compound Networks - Similarity Evaluate fingerprint-based similarity matrix for the 55 actives Connect pairs that exhibit Tc > 0.7 Edges are weighted by the Tc value Most groupings are obvious
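A minimal sketch of the similarity network described on this slide: compute pairwise Tanimoto coefficients and keep edges with Tc > 0.7, weighted by Tc. Fingerprints are represented as sets of on-bits, and the compound labels are placeholders; both are assumptions, since the slide does not say which fingerprint or toolkit was used.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient (Tc) between two fingerprints stored as sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def similarity_network(fingerprints, threshold=0.7):
    """Return {(name_a, name_b): Tc} for every pair with Tc > threshold."""
    edges = {}
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        tc = tanimoto(fp_a, fp_b)
        if tc > threshold:
            edges[(name_a, name_b)] = tc  # edge weight is the Tc value
    return edges


# Toy fingerprints (hypothetical compounds and bit positions).
fps = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 4, 5},
    "cmpd_C": {9, 10},
}
network = similarity_network(fps)
# network == {("cmpd_A", "cmpd_B"): 0.8}
```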
A “Dictionary” Based Approach Create a small-ish annotated library “Seed” compounds Use it in parallel small molecule/RNAi screens Use a similarity based approach to prioritize larger collections, in terms of anticipated targets Currently, we’d use structural similarity Diversity of prioritized structures is dependent on the diversity of the annotated library
Compound Networks - Targets Predict targets for the actives using SEA Target based compound network maps nearly identically to the similarity based network But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197-206
Gene Networks - Pathways Nodes are 1374 HDG genes contained in the NCI PID  Edge indicates two genes/proteins are involved in the same pathway “Good” hits tend to be very highly connected Wang, L. et al, BMC Genomics, 2009, 10, 220
(Reduced) Gene Networks – Pathways Nodes are 526 genes with >= 1 siRNA showing knockdown  Edge indicates two genes/proteins are involved in the same pathway
Pathway Based Integration Direct matching of targets is not very useful Try and map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway Considering 16 reliable hits, we cover 26 pathways Predicted compound targets cover 131 pathways For 18 out of 41 compounds 3 RNAi-derived pathways not covered by compound-derived pathways	 Rhodopsin, alternative NFkB, FAS
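The pathway-based mapping can be sketched as a set intersection: a compound is linked to the RNAi hits when one of its predicted targets shares a pathway with a hit gene. The pathway membership table, gene names, and compound names below are placeholders, not data from the screens.

```python
def pathways_of(genes, pathway_members):
    """Pathways that contain at least one of the given genes."""
    gene_set = set(genes)
    return {name for name, members in pathway_members.items()
            if members & gene_set}


def link_compounds_to_hits(compound_targets, hit_genes, pathway_members):
    """Map each compound to the pathways it shares with the RNAi hit genes.

    compound_targets: {compound: set of predicted target genes}
    hit_genes: genes called as hits in the RNAi screen
    pathway_members: {pathway: set of member genes}
    Compounds with no shared pathway are omitted from the result.
    """
    hit_pathways = pathways_of(hit_genes, pathway_members)
    links = {}
    for compound, targets in compound_targets.items():
        shared = pathways_of(targets, pathway_members) & hit_pathways
        if shared:
            links[compound] = shared
    return links


# Toy inputs (all names hypothetical).
pathways = {
    "pathway_1": {"G1", "G2", "G3"},
    "pathway_2": {"G4", "G5"},
    "pathway_3": {"G6"},
}
targets = {"cmpd_A": {"G2"}, "cmpd_B": {"G6"}, "cmpd_C": {"G9"}}
links = link_compounds_to_hits(targets, ["G1", "G4"], pathways)
# links == {"cmpd_A": {"pathway_1"}}
```

In this toy case cmpd_A does not hit G1 directly but is still linked to it through pathway_1, which is the kind of match direct target comparison would miss. Compounds whose predicted targets fall outside every annotated pathway (cmpd_C here) stay unmapped.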
Pathway Based Integration Still not completely useful, as it only handled 18 compounds Depending on target predictions is probably not a great idea
Integration Caveats Biggest bottleneck is lack of resolution Currently, both small molecule and RNAi data are 1-D Active or inactive, high/low signal CRC’s for small molecules alleviate this a bit High content screens can provide significantly more information and so better resolution Data size & feature selection are of concern
Integration Caveats Compound annotations are key Currently working on using ChEMBL data to provide target ‘suggestions’ More comprehensive pathway data will be required RNAi and small molecule inhibition do not always lead to the same phenotype Could be indicative of promiscuity Could indicate true biological differences Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
Conclusions Building up a wealth of small molecule and RNAi data “Standard” analysis of RNAi screens relatively straightforward Challenges involve integrating RNAi data with other sources Primary bottleneck is dimensionality of the data Simple fluorescence-based approaches do not provide sufficient resolution High-content is required
Acknowledgements John Van Drie Gerry Maggiora Mic Lajiness Jurgen Bajorath Scott Martin Pinar Tuzmen Carleen Klump Dac Trung Nguyen Ruili Huang Yuhong Wang
CPT Sensitization & “Central” Genes Yves Pommier, Nat. Rev. Cancer, 2006.  TOP1 poisons prevent DNA religation resulting in replication-dependent double strand breaks. Cell activates DNA damage response (e.g. ATR).
Screening Protocol Screen conducted in the human breast cancer cell line MDA-MB-231. Many variables to optimize including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls.
Hit Selection and Follow-Up Dose Response Analysis [Figure: Screen #1 (ATR) and Screen #2 (MAP3K7IP2); sensitization ranked by log2 fold change; viability (%) vs. CPT (log M) for siNeg and gene-specific siRNAs] Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.

 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Último (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Small Molecules and siRNA: Methods to Explore Bioactivity Data

  • 1. Small Molecules and siRNA: Methods to Explore Bioactivity Data Rajarshi Guha NIH Chemical for Translational Therapeutics August 17, 2011 Pfizer, Groton
  • 2. Background Cheminformatics methods QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks More recently siRNA screening, high content imaging, combination screening Extensive use of machine learning All tied together with software development Integrate small molecule information & biosystems – systems chemical biology
  • 3. Outline Exploring the SAR landscape The landscape view of SAR data Quantifying SAR landscapes Extending an SAR landscape Linking small molecule & RNAi HTS Overview of the Trans NIH RNAi Screening Initiative Infrastructure components Linking small molecule & siRNA screens
  • 4. The Landscape View of Structure Activity Datasets
  • 5. Structure Activity Relationships Similar molecules will have similar activities Small changes in structure will lead to small changes in activity One implication is that SAR’s are additive This is the basis for QSAR modeling Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
  • 6. Structure Activity Landscapes Rugged gorges or rolling hills? Small structural changes associated with large activity changes represent steep slopes in the landscape But traditionally, QSAR assumes gentle slopes Machine learning is not very good for special cases Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
  • 7. Characterizing the Landscape A cliff can be numerically characterized Structure Activity Landscape Index (SALI) Cliffs are characterized by elements of the matrix with very large values Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
  • 8. Visualizing SALI Values The SALI graph Compounds are nodes Nodes i,j are connected if SALI(i,j) > X Only display connected nodes
  • 9. What Can We Do With SALI’s? SALI characterizes cliffs & non-cliffs For a given molecular representation, SALI’s give us an idea of the smoothness of the SAR landscape Models try and encode this landscape Use the landscape to guide descriptor or model selection
  • 10. Descriptor Space Smoothness Edge count of the SALI graph for varying cutoffs Measures smoothness of the descriptor space Can reduce this to a single number (AUC)
  • 11. Other Examples Instead of fingerprints, we use molecular descriptors SALI denominator now uses Euclidean distance 2D & 3D random descriptor sets None are really good Too rough, or too flat [Figure panels: 2D, 3D]
  • 12. Feature Selection Using SALI Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves Not entirely clear what type of curve is desirable
  • 13. Measuring Model Quality A QSAR model should easily encode the “rolling hills” A good model captures the most significant cliffs Can be formalized as How many of the edge orderings of a SALI graph does the model predict correctly? Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X Repeat for varying X and obtain the SALI curve
  • 15. Model Search Using the SCI We’ve used the SALI to retrospectively analyze models Can we use SALI to develop models? Identify a model that captures the cliffs Tricky Cliffs are fundamentally outliers Optimizing for good SALI values implies overfitting Need to trade-off between SALI & generalizability
  • 16. Predicting the Landscape Rather than predicting activity directly, we can try to predict the SAR landscape Implies that we attempt to directly predict cliffs Observations are now pairs of molecules A more complex problem Choice of features is trickier Still face the problem of cliffs as outliers Somewhat similar to predicting activity differences Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
  • 17. Motivation Predicting activity cliffs corresponds to extending the SAR landscape Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
  • 18. Predicting Cliffs Dependent variables are pairwise SALI values, calculated using fingerprints Independent variables are molecular descriptors – but considered pairwise Absolute difference of descriptor pairs, or Geometric mean of descriptor pairs … Develop a model to correlate pairwise descriptors to pairwise SALI values
  • 19. A Test Case We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50’s Evaluate topological and physicochemical descriptors Developed random forest models On the original observed values (30 obs) On the SALI values (435 observations) Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
  • 20. Double Counting Structures? The dependent and independent variables both encode structure. But pretty low correlations between individual pairwise descriptors and the SALI values
  • 21. Model Summaries Original pIC50 RMSE = 0.97 SALI, AbsDiff RMSE = 1.10 SALI, GeoMean RMSE = 1.04 All models explain similar % of variance of their respective datasets Using geometric mean as the descriptor aggregation function seems to perform best SALI models are more robust due to larger size of the dataset
  • 22. Test Case 2 Considered the Holloway docking dataset, 32 molecules with pIC50’s and Einter Similar strategy as before Need to transform SALI values Descriptors show minimal correlation Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
  • 23. Model Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 0.48 SALI, GeoMean RMSE = 0.48 The SALI models perform much worse in terms of % of variance explained Descriptor aggregation method does not seem to have much effect The SALI models appear to perform decently on the cliffs – but miss the most significant ones
  • 24. Model Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 9.76 SALI, GeoMean RMSE = 10.01 With untransformed SALI values, models perform similarly in terms of % of variance explained The most significant cliffs correspond to stereoisomers
  • 25. Test Case 3 38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R2 = 0.62) Upper end of SALI range is better predicted Kalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
  • 27. Generally, performance is poorer for smaller cliffs For any given hold out molecule, range of error in SALI prediction is large Suggests that some form of domain applicability metric would be useful
  • 28. Model Caveats Models based on SALI values are dependent on there being an SAR in the original activity data Scrambling results for these models are poorer than the original models but aren’t as random as expected
  • 29. Conclusions SALI is the first step in characterizing the SAR landscape Allows us to directly analyze the landscape, as opposed to individual molecules Being able to predict the landscape could serve as a useful way to extend an SAR landscape
  • 30. Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
  • 31. RNAi Facility Mission Perform collaborative genome-wide RNAi screening-based projects with intramural investigators Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs. Range of Assays: Pathway (reporter assays, e.g. luciferase, β-lactamase); Simple Phenotypes (viability, cytotoxicity, oxidative stress, etc.); Complex Phenotypes (high-content imaging, cell cycle, translocation, etc.)
  • 33. RNAi Analysis Workflow Raw and Processed Data GO annotations Pathways Interactions Hit List Follow-up
  • 34. RNAi Informatics Toolset Local databases (screen data, pathways, interactions, etc). Commercial pathway tools. Custom software for loading, analysis and visualization.
  • 35. Back End Services Currently all computational analysis performed on the backend R & Bioconductor code Custom R package (ncgcrnai) to support NCGC infrastructure Partly derived from cellHTS2 Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports Some Java tools for Data loading Library and plate registration
  • 40. HTS for NF-κB Antagonists NF-κB controls DNA transcription Involved in cellular responses to stimuli Immune response, memory formation Inflammation, cancer, auto-immune diseases http://www.genego.com
  • 41. HTS for NF-κB Antagonists ME-180 cell line Stimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporter Identify small molecules and siRNA’s that block the resultant activation
  • 42. Small Molecule HTS Summary 2,899 FDA-approved compounds screened 55 compounds retested active Which components of the NF-κB pathway do they hit? 17 molecules have target/pathway information in GeneGO Literature searches list a few more Most Potent Actives Proscillaridin A Trabectedin Digoxin Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
  • 43. RNAi HTS Summary Qiagen HDG library – 6886 genes, 4 siRNA’s per gene A total of 567 genes were knocked down by 1 or more siRNA’s We consider >= 2 as a “reliable” hit 16 reliable hits Added in 66 genes for follow up via triage procedure
  • 44. The Obvious Conclusion The active compounds target the 16 hits (at least) from the RNAi screen Useful if the RNAi screen was small & focused But what if we’re investigating a larger system? Is there a way to get more specific? Can compound data suggest RNAi non-hits?
  • 45. Small Molecule Targets Some small molecules interact with core components Bortezomib (proteasome inhibitor) Daunorubicin (IκBα inhibitor)
  • 46. Small Molecule Targets Others are active against upstream targets Montelukast (LTD4 antagonist) We also get an idea of off-target effects
  • 47. Compound Networks - Similarity Evaluate fingerprint-based similarity matrix for the 55 actives Connect pairs that exhibit Tc > 0.7 Edges are weighted by the Tc value Most groupings are obvious
  • 48. A “Dictionary” Based Approach Create a small-ish annotated library “Seed” compounds Use it in parallel small molecule/RNAi screens Use a similarity based approach to prioritize larger collections, in terms of anticipated targets Currently, we’d use structural similarity Diversity of prioritized structures is dependent on the diversity of the annotated library
  • 49. Compound Networks - Targets Predict targets for the actives using SEA Target based compound network maps nearly identically to the similarity based network But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197-206
  • 50. Gene Networks - Pathways Nodes are 1374 HDG genes contained in the NCI PID Edge indicates two genes/proteins are involved in the same pathway “Good” hits tend to be very highly connected Wang, L. et al, BMC Genomics, 2009, 10, 220
  • 51. (Reduced) Gene Networks – Pathways Nodes are 526 genes with >= 1 siRNA showing knockdown Edge indicates two genes/proteins are involved in the same pathway
  • 52. Pathway Based Integration Direct matching of targets is not very useful Try and map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway Considering 16 reliable hits, we cover 26 pathways Predicted compound targets cover 131 pathways For 18 out of 41 compounds 3 RNAi-derived pathways not covered by compound-derived pathways Rhodopsin, alternative NFkB, FAS
  • 53. Pathway Based Integration Still not completely useful, as it only handled 18 compounds Depending on target predictions is probably not a great idea
  • 54. Integration Caveats Biggest bottleneck is lack of resolution Currently, both small molecule and RNAi data are 1-D Active or inactive, high/low signal CRC’s for small molecules alleviate this a bit High content screens can provide significantly more information and so better resolution Data size & feature selection are of concern
  • 55. Integration Caveats Compound annotations are key Currently working on using ChEMBL data to provide target ‘suggestions’ More comprehensive pathway data will be required RNAi and small molecule inhibition do not always lead to the same phenotype Could be indicative of promiscuity Could indicate true biological differences Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
  • 56. Conclusions Building up a wealth of small molecule and RNAi data “Standard” analysis of RNAi screens relatively straightforward Challenges involve integrating RNAi data with other sources Primary bottleneck is dimensionality of the data Simple fluorescence-based approaches do not provide sufficient resolution High-content is required
  • 57. Acknowledgements John Van Drie, Gerry Maggiora, Mic Lajiness, Jurgen Bajorath, Scott Martin, Pinar Tuzmen, Carleen Klump, Dac-Trung Nguyen, Ruili Huang, Yuhong Wang
  • 58. CPT Sensitization & “Central” Genes Yves Pommier, Nat. Rev. Cancer, 2006. TOP1 poisons prevent DNA religation resulting in replication-dependent double strand breaks. Cell activates DNA damage response (e.g. ATR).
  • 59. Screening Protocol Screen conducted in the human breast cancer cell line MDA-MB-231. Many variables to optimize including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls.
  • 60. Hit Selection Follow-Up Dose Response Analysis [Figure: for Screen #1 and Screen #2, sensitization ranked by log2 fold change, with follow-up dose-response curves of viability (%) against CPT (log M) for ATR (siNeg, siATR-A, siATR-B, siATR-C) and MAP3K7IP2 (siNeg, siMAP3K7IP2-A, siMAP3K7IP2-B, siMAP3K7IP2-C, siMAP3K7IP2-D)] Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.
  • 61. Are These Genes Relevant? Some are well known to be CPT-sensitizers Consider a HPRD PPI sub-network corresponding to the Qiagen HDG gene set How “central” are these selected genes? Larger values of betweenness indicate that the node lies on many shortest paths Makes sense - a number of them are stress-related But some of them have very low betweenness values
  • 62. Are These Genes Relevant? Most selected genes are densely connected A few are not Generally did not reconfirm Network metrics could be used to provide confidence in selections
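
The betweenness measure mentioned on slide 61 counts, for each node, how many shortest paths between other node pairs pass through it. The following is a minimal Python sketch of that calculation (Brandes' algorithm on an unweighted, undirected graph). The gene labels, the toy edge list, and the function name are hypothetical and chosen only for illustration; the slides do not say which software was used for the HPRD analysis.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj maps each node to the set of its neighbours in an
    undirected, unweighted graph.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {v: b / 2.0 for v, b in score.items()}

# Hypothetical toy interaction network (not HPRD data).
edges = [("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("g3", "g4")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = betweenness(adj)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

In this toy network the node labelled `hub` lies on the most shortest paths, while the leaf nodes score zero, which mirrors the contrast the slides draw between highly central hits and those with very low betweenness.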

Editor's Notes

  1. Outliers in a cliff prediction model are not as severe, since SALI changes more slowly than activity differences alone
  2. For SALI = 0, had to set log10(SALI) = 0. Performance is similar if we use SALI rather than log10(SALI), and at least more % variance is explained. Still fails on the most significant cliffs
  3. View plates (raw, normalized, adjusted, …). Highlight specific genes, siRNA’s. View assay statistics. View pathway membership (via Wikipathways). Link out to external resources (Entrez, GeneCards, …). Hit selection, follow up (DRC)
  4. Proscillaridin A was not selected in the 20 compounds for further analysis in the paper. 2 cardiac glycosides in the top 3; target appears to be caspase-3 (activating it). CG inhibition of NF-κB is well known; see PNAS 2005, by Pollard. Trabectedin induces lethal DNA strand breaks and blocks cell cycle in G2 phase
  5. PSM* genes code for proteasome subunits, so they likely prevent the ubiquitination of the IκBα complex, so that RelA+cp50 cannot be released from the IκBα complex and enter the nucleus
  6. Size of node indicates potency (larger is more potent). Lanatoside C and A have Tc = 1 and hence the edge was not shown (ideally it should be shown)
  7. Good confirmation that SEA works. Size of node corresponds to SEA confidence score
  8. We consider 41 compounds rather than 55, since a number of them did not have sufficiently confident target predictions. We then get to 18 compounds, since many of the predicted genes did not map to an NCI PID pathway
  9. Phenotypic differences can arise when PPI’s are involved
  10. HPRD subnetwork corresponding to the Qiagen HDG has 6782 genes
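
The slides and notes refer to SALI values, SALI graphs and edge counts at varying cutoffs, but the formula itself did not survive in the transcript. As published by Guha and Van Drie (cited on slide 7), SALI for a pair of molecules is the absolute activity difference divided by one minus their structural similarity. Below is a minimal Python sketch under that definition. Representing fingerprints as sets of on-bits, the toy data, and all function names are assumptions for illustration, not the authors' code. Pairs with Tc = 1 are skipped because the denominator is zero.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def sali_pairs(fingerprints, activities):
    """Return {(i, j): SALI} for all pairs i < j with similarity below 1."""
    out = {}
    for i, j in combinations(range(len(activities)), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        if sim < 1.0:  # SALI is undefined when the denominator is zero
            out[(i, j)] = abs(activities[i] - activities[j]) / (1.0 - sim)
    return out

def sali_graph(sali, cutoff):
    """Edges of the SALI graph: pairs whose SALI exceeds the cutoff."""
    return [pair for pair, value in sali.items() if value > cutoff]

# Hypothetical toy data: four molecules with made-up bits and pIC50 values.
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {8, 9}]
activities = [5.0, 8.0, 5.5, 6.0]

sali = sali_pairs(fingerprints, activities)

# Edge count of the SALI graph for varying cutoffs (as on slide 10).
cutoffs = [0.0, 1.0, 3.0, 5.0, 8.0]
edge_counts = [len(sali_graph(sali, x)) for x in cutoffs]
for x, n in zip(cutoffs, edge_counts):
    print(f"cutoff {x}: {n} edges")
```

Molecules 0 and 1 are structurally similar yet differ by three log units, so their pair has the largest SALI value and is the last edge to remain as the cutoff rises. A smoothly decaying edge-count curve would indicate a smoother landscape for the chosen representation.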