SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
What can your library do for you?
Rajarshi Guha, Dac-Trung Nguyen,
Alexey Zhakarov, Ajit Jadhav
NIH NCATS
ACS Fall Meeting 2016, Philadelphia
August 21, 2016
Library Design
Historical collections and assay data provide information on
how a set of compounds has faired
Use (dis)similarity and machine learning to construct new
collections that show similar behavior
Plus various constraints
Libraries can be designed for certain target families or specific
screening paradigms
If sufficiently annotated, compound behavior
may be correlated to assay and biology char-
acteristics
Library Design
Historical collections and assay data provide information on
how a set of compounds has faired
Use (dis)similarity and machine learning to construct new
collections that show similar behavior
Plus various constraints
Libraries can be designed for certain target families or specific
screening paradigms
If sufficiently annotated, compound behavior
may be correlated to assay and biology char-
acteristics
Two Questions
How likely are compounds, associated with a given annotation,
identified as active?
Given a new set of compounds, what sets of assay conditions (as
implied by the annotations) will they be active in?
BAO 2.0
Assay Modeling
Prior Work
BAO annotated datasets
de Souza et al, 2014; Vempati et al, 2012
Analyzing HTS datasets using BAO
Zander-Balderud et al, 2015; Sch¨urer et al, 2011
Semi-automated annotation of assay descriptions using the
BAO
Clark et al, 2014
Workflow
Extract unique BAO terms and for each term identify
annotated assays
Extract active compounds from this set of assays
Compute fingerprint bit distribution
Use these conditional bit distributions to identify the BAO
terms that describe the assay that they are likely to be active
in
Dataset Overview
Extracted 4010 Pubchem AIDs from BARD
Primary, confirmation, counterscreening assays
154M outcomes
740K compounds
Pubchem 881-bit keys using CDK and NCGC implementations
192 unique BAO terms
Dataset Overview
1e+02
1e+05
1e+08
Active Inactive Inconclusive Probe Unspecified
Numberofoutcomes
0
200
400
0 2 4 6
log Num. Compound
Density(assays)
Outcome
Active
Inactive
0
300
600
900
2 3 4 5 6 7 8 9 10 11 12 13
Term Depth
NumberofTerms
0
1000
2000
1 2 3 4 5 6 7 8 9 13
Number of BAO Terms
NumberofAssays
Class Imbalances
Imbalanced classes are problematic, and some of
the terms with near-balanced classes are not very
specific (e.g., imaging method)
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqq
qqq
qqqqqq
q
q
qqq
q
q
q
qq
q
q
qq
q
q
q
q
VICTOR X2 Multilabel Plate Reader
imaging method
radiometry method
SpectraMax 190 Microplate Reader
reporter protein
thermal shift
0
3
6
9
BAO ID
ClassRatio
Problem formulation
For a given library of compounds X, we would like to calculate a
ranked list relevant T of BAO terms that are most likely associated
with X. Let x ∈ X and t is a BAO term. The list T is an ordered
list based on the following:
argmax
i



j
p(ti |xj )



, (1)
where p(ti |xj ) is the probability that BAO term ti is associated
with compound xj . From Bayes’ rule, we have
p(ti |xj ) =
p(xj |ti )p(ti )
p(xj )
or p(ti |xj ) ∝ p(xj |ti )p(ti ).
Given that BAO terms are annotated at the assay level, we instead
have
p(ti |xj ) ∝ p(ti )
k
p(xj |ak)p(ak|ti ), (2)
where ak is a BAO annotated assay.
A Bayesian Approach for Ranking
Note that p(xj |ak) is the sampling function specified over only
active compounds in assay ak. In our model, xj is defined as
independent Bernoulli distribution with parameter θ, i.e.,
p(xj |ak) =
i
θ
xji
i (1 − θi )1−xji
,
where xji ∈ {0, 1} is the i-th bit of the PubChem substructural
fingerprint.
Learning BAO terms for a library of compounds amounts to
estimating θ, p(ti ), and p(ak|ti ).
Per-Term Activity Classifier
For a given ontology term Ti , predict whether a compound
will be active or not
Model this using Na¨ıve Bayes, where we extract set of actives
and inactives from assays annotated with Ti
Results in a set of models {M1, M2, · · · , MN},
For a new compound in library, obtain probability of being
active for term Ti for all i and take top k terms
Aggregate top k terms from all compounds in library
Represents the set of ontology terms defining an assay
in which these compounds would likely be active
Test Libraries
Considered several libraries to test out the approach
MIPE (1912 compounds) - Approved, investigational drugs,
constructed for functional diversity
LOPAC (1280 compounds) - Diverse library, designed for
enrichment of bioactivity
Natural Products (5000 compounds)
1000 member subset of ChEMBL GPCR collection
1000 member subset of ChEMBL Kinase collection
Test Libraries
In the Pubchem fingerprint space, the libraries are not very different
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
gpcr1gpcr2kinase1kinase2lopacmipenp
Bit Position
NormalizedFrequency
Test Libraries - Distance Matrix
lopac
mipe
np
kinase1
kinase2
gpcr1
gpcr2
lopac
m
ipe
np
kinase1
kinase2
gpcr1
gpcr2
0
1
2
3
4
Euc. Dist.
Prediction Workflow
Bayesian Ranking
Compute liklihood of all
terms for each compound
Aggregate across library
(mean liklihood) and take
top k
Activity Models
For molecules predicted
active, collect corresponding
terms
Retain the top k most
frequent terms across the
library
We take the top k terms as the set of anno-
tations describing an assay in which the library
will show activity in
Result - Bayesian Ranking
BAO_0000001
BAO_0000002
BAO_0000003
BAO_0000004
BAO_0000005
BAO_0000010
BAO_0000015
BAO_0000035
BAO_0000045
BAO_0000046
BAO_0000049
BAO_0000050
BAO_0000051
BAO_0000055
BAO_0000057
BAO_0000062
BAO_0000063
BAO_0000070
BAO_0000079
BAO_0000080
BAO_0000100
BAO_0000123
BAO_0000129
BAO_0000130
BAO_0000131
BAO_0000139
BAO_0000142
BAO_0000152
BAO_0000160
BAO_0000164
BAO_0000166
BAO_0000217
BAO_0000218
BAO_0000219
BAO_0000220
BAO_0000221
BAO_0000223
BAO_0000224
BAO_0000225
BAO_0000249
BAO_0000250
BAO_0000251
BAO_0000254
BAO_0000357
BAO_0000363
BAO_0000366
BAO_0000394
BAO_0000405
BAO_0000450
BAO_0000452
BAO_0000453
BAO_0000508
BAO_0000512
BAO_0000513
BAO_0000515
BAO_0000516
BAO_0000572
BAO_0000577
BAO_0000578
BAO_0000591
BAO_0000593
BAO_0000657
BAO_0000682
BAO_0000691
BAO_0000697
BAO_0000698
BAO_0000699
BAO_0000701
BAO_0000705
BAO_0000706
BAO_0000722
BAO_0000850
BAO_0000884
BAO_0000902
BAO_0000903
BAO_0000904
BAO_0000905
BAO_0000906
BAO_0000913
BAO_0000943
BAO_0000982
BAO_0001019
BAO_0001036
BAO_0001046
BAO_0001047
BAO_0001104
BAO_0002001
BAO_0002041
BAO_0002043
BAO_0002090
BAO_0002100
BAO_0002168
BAO_0002176
BAO_0002182
BAO_0002188
BAO_0002196
BAO_0002424
BAO_0002527
BAO_0002528
BAO_0002530
BAO_0002534
BAO_0002656
BAO_0002989
BAO_0002990
BAO_0002991
BAO_0002993
BAO_0002994
BAO_0002995
BAO_0002996
BAO_0002997
BAO_0002998
BAO_0003000
BAO_0003002
BAO_0003003
BAO_0003004
BAO_0003005
BAO_0003006
BAO_0003007
BAO_0003008
BAO_0003009
BAO_0003010
BAO_0003063
BAO_0003064
BAO_0003069
lopac
m
ipe
N
P
−5e+09
−4e+09
−3e+09
−2e+09
−1e+09
Avg Prob
Result - Per Term Activity Classifier
bioluminescence
molecular redistribution determination method
direct enzyme activity measurement method
whole cell lysate format
BAO_0000722
lopac
m
ipe
npkinase1kinase2
gpcr1
gpcr2
Rank
1
2
3
4
5
What’s Different Between Libraries?
What’s Different Between Libraries?
Term Depth for the ’Differential’ Terms
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qqq
qqq
qq
qq
q
q
qqqq q
q
3
4
5
6
7
8
3 4 5 6 7 8
Term Depth (LOPAC)
TermDepth(MIPE)
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qqq
qqq
qq
qq
q
q
q qqq q
q
3
4
5
6
7
8
3 4 5 6 7 8
Term Depth (NP)
TermDepth(MIPE)
Pitfalls
If sufficiently annotated, compound behavior may
be correlated to assay and biology characteristics
A very abstract, possibly lossy, view of the effect of
compounds on biology
Depends on correct and meaningful annotations
Annotations terms are context dependent, but this may not
be considered when annotating a dataset
BAO terms exhibit hierarchical relationships and ignoring
them is simplistic
Acknowledgements
Qiong Cheng (U. Miami)
Stephan Sch¨urer (U. Miami)
Source code and slides

Mais conteúdo relacionado

Semelhante a What can your library do for you?

Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructureJeremy Besnard
 
Gordon2003
Gordon2003Gordon2003
Gordon2003toluene
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Alex Clark
 
Part 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCPart 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCJoachim Jacob
 
Novel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsNovel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsChristopher Petersen
 
Use of spark for proteomic scoring seattle presentation
Use of spark for  proteomic scoring   seattle presentationUse of spark for  proteomic scoring   seattle presentation
Use of spark for proteomic scoring seattle presentationlordjoe
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyonoDwi Riyono
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
MCNext Sybr qPCR quantification Kit
MCNext Sybr qPCR quantification KitMCNext Sybr qPCR quantification Kit
MCNext Sybr qPCR quantification KitRui Wang
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesYasset Perez-Riverol
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Sheng Wang
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptorsRAJAN ROLTA
 
Analsys Sciences - Introduction to HPLC
Analsys Sciences - Introduction to HPLCAnalsys Sciences - Introduction to HPLC
Analsys Sciences - Introduction to HPLCAnalysys
 
Multi-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorMulti-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorLevi Waldron
 
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...Masahito Ohue
 
Models Can Lie
Models Can LieModels Can Lie
Models Can LieRaju Rimal
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...Vahid Taslimitehrani
 

Semelhante a What can your library do for you? (20)

Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
Gordon2003
Gordon2003Gordon2003
Gordon2003
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...
 
Part 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCPart 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QC
 
Novel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsNovel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity Relationships
 
Use of spark for proteomic scoring seattle presentation
Use of spark for  proteomic scoring   seattle presentationUse of spark for  proteomic scoring   seattle presentation
Use of spark for proteomic scoring seattle presentation
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyono
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
MCNext Sybr qPCR quantification Kit
MCNext Sybr qPCR quantification KitMCNext Sybr qPCR quantification Kit
MCNext Sybr qPCR quantification Kit
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome Coordinates
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Analsys Sciences - Introduction to HPLC
Analsys Sciences - Introduction to HPLCAnalsys Sciences - Introduction to HPLC
Analsys Sciences - Introduction to HPLC
 
Multi-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorMulti-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/Bioconductor
 
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a...
 
Contrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and ClassificationContrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and Classification
 
Models Can Lie
Models Can LieModels Can Lie
Models Can Lie
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 

Mais de Rajarshi Guha

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in contextRajarshi Guha
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomeRajarshi Guha
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMCRajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformRajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?Rajarshi Guha
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsRajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Rajarshi Guha
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the partsRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Rajarshi Guha
 
Pushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesPushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesRajarshi Guha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsRajarshi Guha
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleRajarshi Guha
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in RRajarshi Guha
 

Mais de Rajarshi Guha (20)

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the parts
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
 
Pushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesPushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the Pipes
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of Cheminformatics
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & Reproducible
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in R
 

Último

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 

Último (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 

What can your library do for you?

  • 1. What can your library do for you? Rajarshi Guha, Dac-Trung Nguyen, Alexey Zhakarov, Ajit Jadhav NIH NCATS ACS Fall Meeting 2016, Philadelphia August 21, 2016
  • 2. Library Design Historical collections and assay data provide information on how a set of compounds has faired Use (dis)similarity and machine learning to construct new collections that show similar behavior Plus various constraints Libraries can be designed for certain target families or specific screening paradigms If sufficiently annotated, compound behavior may be correlated to assay and biology char- acteristics
  • 3. Library Design Historical collections and assay data provide information on how a set of compounds has faired Use (dis)similarity and machine learning to construct new collections that show similar behavior Plus various constraints Libraries can be designed for certain target families or specific screening paradigms If sufficiently annotated, compound behavior may be correlated to assay and biology char- acteristics
  • 4. Two Questions How likely are compounds, associated with a given annotation, identified as active? Given a new set of compounds, what sets of assay conditions (as implied by the annotations) will they be active in?
  • 7. Prior Work BAO annotated datasets de Souza et al, 2014; Vempati et al, 2012 Analyzing HTS datasets using BAO Zander-Balderud et al, 2015; Sch¨urer et al, 2011 Semi-automated annotation of assay descriptions using the BAO Clark et al, 2014
  • 8. Workflow Extract unique BAO terms and for each term identify annotated assays Extract active compounds from this set of assays Compute fingerprint bit distribution Use these conditional bit distributions to identify the BAO terms that describe the assay that they are likely to be active in
  • 9. Dataset Overview Extracted 4010 Pubchem AIDs from BARD Primary, confirmation, counterscreening assays 154M outcomes 740K compounds Pubchem 881-bit keys using CDK and NCGC implementations 192 unique BAO terms
  • 10. Dataset Overview 1e+02 1e+05 1e+08 Active Inactive Inconclusive Probe Unspecified Numberofoutcomes 0 200 400 0 2 4 6 log Num. Compound Density(assays) Outcome Active Inactive 0 300 600 900 2 3 4 5 6 7 8 9 10 11 12 13 Term Depth NumberofTerms 0 1000 2000 1 2 3 4 5 6 7 8 9 13 Number of BAO Terms NumberofAssays
  • 11. Class Imbalances Imbalanced classes are problematic, and some of the terms with near-balanced classes are not very specific (e.g., imaging method) qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqq qqq qqqqqq q q qqq q q q qq q q qq q q q q VICTOR X2 Multilabel Plate Reader imaging method radiometry method SpectraMax 190 Microplate Reader reporter protein thermal shift 0 3 6 9 BAO ID ClassRatio
  • 12. Problem formulation For a given library of compounds X, we would like to calculate a ranked list relevant T of BAO terms that are most likely associated with X. Let x ∈ X and t is a BAO term. The list T is an ordered list based on the following: argmax i    j p(ti |xj )    , (1) where p(ti |xj ) is the probability that BAO term ti is associated with compound xj . From Bayes’ rule, we have p(ti |xj ) = p(xj |ti )p(ti ) p(xj ) or p(ti |xj ) ∝ p(xj |ti )p(ti ). Given that BAO terms are annotated at the assay level, we instead have p(ti |xj ) ∝ p(ti ) k p(xj |ak)p(ak|ti ), (2) where ak is a BAO annotated assay.
  • 13. A Bayesian Approach for Ranking Note that p(xj |ak) is the sampling function specified over only active compounds in assay ak. In our model, xj is defined as independent Bernoulli distribution with parameter θ, i.e., p(xj |ak) = i θ xji i (1 − θi )1−xji , where xji ∈ {0, 1} is the i-th bit of the PubChem substructural fingerprint. Learning BAO terms for a library of compounds amounts to estimating θ, p(ti ), and p(ak|ti ).
  • 14. Per-Term Activity Classifier For a given ontology term Ti , predict whether a compound will be active or not Model this using Na¨ıve Bayes, where we extract set of actives and inactives from assays annotated with Ti Results in a set of models {M1, M2, · · · , MN}, For a new compound in library, obtain probability of being active for term Ti for all i and take top k terms Aggregate top k terms from all compounds in library Represents the set of ontology terms defining an assay in which these compounds would likely be active
  • 15. Test Libraries Considered several libraries to test out the approach MIPE (1912 compounds) - Approved, investigational drugs, constructed for functional diversity LOPAC (1280 compounds) - Diverse library, designed for enrichment of bioactivity Natural Products (5000 compounds) 1000 member subset of ChEMBL GPCR collection 1000 member subset of ChEMBL Kinase collection
  • 16. Test Libraries In the Pubchem fingerprint space, the libraries are not very different 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 gpcr1gpcr2kinase1kinase2lopacmipenp Bit Position NormalizedFrequency
  • 17. Test Libraries - Distance Matrix lopac mipe np kinase1 kinase2 gpcr1 gpcr2 lopac m ipe np kinase1 kinase2 gpcr1 gpcr2 0 1 2 3 4 Euc. Dist.
  • 18. Prediction Workflow Bayesian Ranking Compute liklihood of all terms for each compound Aggregate across library (mean liklihood) and take top k Activity Models For molecules predicted active, collect corresponding terms Retain the top k most frequent terms across the library We take the top k terms as the set of anno- tations describing an assay in which the library will show activity in
  • 19. Result - Bayesian Ranking BAO_0000001 BAO_0000002 BAO_0000003 BAO_0000004 BAO_0000005 BAO_0000010 BAO_0000015 BAO_0000035 BAO_0000045 BAO_0000046 BAO_0000049 BAO_0000050 BAO_0000051 BAO_0000055 BAO_0000057 BAO_0000062 BAO_0000063 BAO_0000070 BAO_0000079 BAO_0000080 BAO_0000100 BAO_0000123 BAO_0000129 BAO_0000130 BAO_0000131 BAO_0000139 BAO_0000142 BAO_0000152 BAO_0000160 BAO_0000164 BAO_0000166 BAO_0000217 BAO_0000218 BAO_0000219 BAO_0000220 BAO_0000221 BAO_0000223 BAO_0000224 BAO_0000225 BAO_0000249 BAO_0000250 BAO_0000251 BAO_0000254 BAO_0000357 BAO_0000363 BAO_0000366 BAO_0000394 BAO_0000405 BAO_0000450 BAO_0000452 BAO_0000453 BAO_0000508 BAO_0000512 BAO_0000513 BAO_0000515 BAO_0000516 BAO_0000572 BAO_0000577 BAO_0000578 BAO_0000591 BAO_0000593 BAO_0000657 BAO_0000682 BAO_0000691 BAO_0000697 BAO_0000698 BAO_0000699 BAO_0000701 BAO_0000705 BAO_0000706 BAO_0000722 BAO_0000850 BAO_0000884 BAO_0000902 BAO_0000903 BAO_0000904 BAO_0000905 BAO_0000906 BAO_0000913 BAO_0000943 BAO_0000982 BAO_0001019 BAO_0001036 BAO_0001046 BAO_0001047 BAO_0001104 BAO_0002001 BAO_0002041 BAO_0002043 BAO_0002090 BAO_0002100 BAO_0002168 BAO_0002176 BAO_0002182 BAO_0002188 BAO_0002196 BAO_0002424 BAO_0002527 BAO_0002528 BAO_0002530 BAO_0002534 BAO_0002656 BAO_0002989 BAO_0002990 BAO_0002991 BAO_0002993 BAO_0002994 BAO_0002995 BAO_0002996 BAO_0002997 BAO_0002998 BAO_0003000 BAO_0003002 BAO_0003003 BAO_0003004 BAO_0003005 BAO_0003006 BAO_0003007 BAO_0003008 BAO_0003009 BAO_0003010 BAO_0003063 BAO_0003064 BAO_0003069 lopac m ipe N P −5e+09 −4e+09 −3e+09 −2e+09 −1e+09 Avg Prob
  • 20. Result - Per Term Activity Classifier bioluminescence molecular redistribution determination method direct enzyme activity measurement method whole cell lysate format BAO_0000722 lopac m ipe npkinase1kinase2 gpcr1 gpcr2 Rank 1 2 3 4 5
  • 23. Term Depth for the ’Differential’ Terms q q q q q q q q qq q q q q q q q q q q q q q q q q q q qqq qqq qq qq q q qqqq q q 3 4 5 6 7 8 3 4 5 6 7 8 Term Depth (LOPAC) TermDepth(MIPE) q q q q q q q q q q q q q q q q q q q q q q q q q q q q qqq qqq qq qq q q q qqq q q 3 4 5 6 7 8 3 4 5 6 7 8 Term Depth (NP) TermDepth(MIPE)
  • 24. Pitfalls If sufficiently annotated, compound behavior may be correlated to assay and biology characteristics A very abstract, possibly lossy, view of the effect of compounds on biology Depends on correct and meaningful annotations Annotations terms are context dependent, but this may not be considered when annotating a dataset BAO terms exhibit hierarchical relationships and ignoring them is simplistic
  • 25. Acknowledgements Qiong Cheng (U. Miami) Stephan Sch¨urer (U. Miami) Source code and slides