Enhancing High Throughput Screening for Mycobacterium tuberculosis Drug Discovery Using Bayesian Models
1. Enhancing High Throughput Screening For Mycobacterium
tuberculosis Drug Discovery Using Bayesian Models
Sean Ekins1,2*, Robert C. Reynolds3,4, Baojie Wan5, Scott G. Franzblau5,
Joel S. Freundlich6,7 and Barry A. Bunin1
1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
6 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
2. Applying CDD to Build a disease community for TB
Tuberculosis kills 1.6-1.7 million people per year (~1 every 19 seconds)
One third of the world's population is infected
Multidrug resistance in 4.3% of cases
Extensively drug-resistant TB is increasing in incidence
One new drug in over 40 years
Drug-drug interactions and Co-morbidity with HIV
Collaboration between groups is rare
These groups may work on existing or new targets
Use of computational methods with TB is rare
3. ~ 20 public datasets for TB
Including Novartis data on TB hits
>300,000 cpds
Patents, Papers Annotated by CDD
Open to browse by anyone
http://www.collaborativedrug.com/register
4. Fitting into the drug discovery process
Ekins et al., Trends in Microbiology 19: 65-74, 2011
6. UIC hit rates
Compound provider / library   Number of compounds   Inhibitor concentration (ug/ml or uM)   Readout                    Hit rate (%) at 90% inhibition
ChemBridge Novacore           50,000                30 uM                                   Luminescence (LuxAB)       4.55
Asinex Diverse                59,760                50 uM                                   Luminescence (LuxAB)       1.91
ASDI                          6,811                 30 uM                                   Luminescence (LuxAB)       2.73
Prestwick                     1,120                 20 ug/ml                                Luminescence (ATP)         20.6
                                                                                            Fluorescence (MABA)        16.07
MRCT                          100,000               10 uM                                   Luminescence (LuxABCDE)    0.67
7. Wasting data?
Information from these inefficient and expensive
HTS campaigns does not appear to have been
used to direct “informed” selection of new
libraries in subsequent screens and compound
optimization in TB drug discovery
How can we continuously learn from all the
data?
8. Bayesian machine learning
Bayesian classification is a simple probabilistic classification model. It is based on
Bayes' theorem:
p(h|d) = p(d|h) p(h) / p(d)
where
h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the
observed data d)
A weight is calculated for each feature using a Laplacian-adjusted probability
estimate to account for the different sampling frequencies of different features.
The weights are summed to provide a probability estimate
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
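A minimal sketch of how such per-feature weights might be computed, assuming binary fingerprint features and one commonly described form of the Laplacian correction: weight = log((A_F + 1) / (N_F * p + 1)), where N_F is the number of training molecules containing feature F, A_F the number of those that are active, and p the overall fraction of actives. The function names and the data layout are hypothetical, and this is not the Discovery Studio implementation.

```python
import math
from collections import Counter


def train_laplacian_bayes(samples):
    """Compute one weight per feature.

    samples: non-empty list of (set_of_features, is_active) pairs.
    Assumed weight form: log((A_F + 1) / (N_F * p + 1)).
    """
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total  # p: overall fraction of actives

    feature_total = Counter()   # N_F: molecules containing the feature
    feature_active = Counter()  # A_F: actives containing the feature
    for features, active in samples:
        for feature in features:
            feature_total[feature] += 1
            if active:
                feature_active[feature] += 1

    weights = {}
    for feature, total in feature_total.items():
        corrected = (feature_active[feature] + 1) / (total * base_rate + 1)
        weights[feature] = math.log(corrected)
    return weights


def score(weights, features):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(feature, 0.0) for feature in features)
```

In this form a rarely sampled feature is pulled toward a weight of zero, while a feature seen mostly in actives gets a positive weight and one seen mostly in inactives a negative weight. The summed score is a relative ranking score rather than a calibrated probability.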
9. Process – Bioactivity only
Workflow diagram, in order:
High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency
10. Bayesian Classification TB Models
We can use the public data for machine learning model building
Using Discovery Studio Bayesian model
Leave out 50% x 100
Dataset (number of molecules)                  External ROC Score   Internal ROC Score   Concordance     Specificity     Sensitivity
MLSMR all single point screen (N = 220463)     0.86 ± 0             0.86 ± 0             78.56 ± 1.86    78.59 ± 1.94    77.13 ± 2.26
MLSMR dose response set (N = 2273)             0.73 ± 0.01          0.75 ± 0.01          66.85 ± 4.06    67.21 ± 7.05    65.47 ± 7.96
Ekins et al., Mol BioSyst, 6: 840-851, 2010
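The table columns can be illustrated with a short sketch. It assumes the usual confusion-matrix definitions (concordance as overall accuracy, specificity as the true negative rate, sensitivity as the true positive rate) and reads "Leave out 50% x 100" as 100 random half/half splits; both readings are assumptions, and the function names are hypothetical.

```python
import random


def classification_metrics(y_true, y_pred):
    """Return concordance, specificity and sensitivity as percentages.

    Assumes both classes occur in y_true; otherwise a rate is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "concordance": 100.0 * (tp + tn) / len(pairs),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
    }


def half_splits(n_items, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for repeated random 50% splits."""
    rng = random.Random(seed)
    half = n_items // 2
    for _ in range(repeats):
        shuffled = list(range(n_items))
        rng.shuffle(shuffled)
        yield shuffled[:half], shuffled[half:]
```

A model would be rebuilt on each training half and scored on the held-out half, with the mean and standard deviation over the repeats reported as in the table.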
11. Bayesian Classification Models for TB
Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and
simple descriptors. Two models were built, on 220,000 and >2,000 compounds.
Active compounds were defined as those with MIC < 5 uM.
Good features
Feature   ID             Occurrence             Bayesian Score
G1        1704324327     73 out of 165 good     2.885
G2        -2092491099    57 out of 120 good     2.873
G3        -1230843627    75 out of 188 good     2.811
G4        940811929      35 out of 65 good      2.780
G5        563485513      123 out of 357 good    2.769
Bad features
Feature   ID             Occurrence             Bayesian Score
B1        1444982751     0 out of 1158 good     -3.135
B2        274564616      0 out of 1024 good     -3.018
B3        -1775057221    0 out of 982 good      -2.978
B4        48625803       0 out of 740 good      -2.712
B5        899570811      0 out of 738 good      -2.709
Ekins et al., Mol BioSyst, 6: 840-851, 2010
13. Initial testing of Mtb Bayesian models using NIAID and
GVKbio data
Both models performed substantially better than the random hit rate at
identifying known active compounds (MIC cutoff of 5 uM) within the first 1000
compounds sorted by Bayesian model score.
The number of active compounds was substantially larger in the NIAID
dataset (1871 out of 3748) than in the GVKbio dataset (377 out of 2880).
Ekins et al., Mol BioSyst, 6: 840-851, 2010
14. Additional test sets
100K library: 1702 hits in >100K cpds
Novartis Data: 34 hits in 248 cpds
FDA drugs: 21 hits in 2108 cpds
Suggests models can predict data from the same and independent labs
Enrichments 4-10 fold
Initial enrichment – enables screening few compounds to find actives
Ekins et al., Mol BioSyst, 6: 840-851, 2010 Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
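As an illustration of what an enrichment figure measures, here is a minimal sketch that assumes enrichment is the hit rate among the top-ranked compounds divided by the hit rate of the whole set. The function name and inputs are hypothetical, and the slides do not state the exact formula used.

```python
def enrichment_factor(scores, is_active, top_n):
    """Hit rate in the top_n highest-scoring compounds over the overall hit rate.

    scores: model scores, higher meaning more likely active.
    is_active: matching booleans from the assay; must contain at least one True.
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top = ranked[:top_n]
    hit_rate_top = sum(1 for _, active in top if active) / len(top)
    hit_rate_all = sum(1 for active in is_active if active) / len(is_active)
    return hit_rate_top / hit_rate_all
```

A value of 1 means the ranking does no better than random selection. Under this definition, the 4-10 fold enrichments quoted above would correspond to finding 4 to 10 times as many actives in the top-ranked subset as a random pick of the same size.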
15. Dual-Event models
Become more stringent in what we call an ACTIVE:
IC90 < 10 uM and a selectivity index (SI) greater than ten.
SI was calculated as SI = CC50/IC90, where CC50 is the concentration
that resulted in 50% inhibition of Vero cells.
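The activity definition above can be written as a small labeling function. This is a sketch with hypothetical function names; it assumes IC90 and CC50 are supplied in the same concentration units.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, with both values in the same units."""
    return cc50 / ic90


def is_dual_event_active(ic90, cc50, ic90_cutoff=10.0, si_cutoff=10.0):
    """True only if the compound is both potent and selective.

    Thresholds follow the slide: IC90 < 10 uM and SI > 10.
    """
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > si_cutoff
```

For example, a compound with IC90 = 2 uM and CC50 = 50 uM has SI = 25 and is labeled active, whereas the same IC90 with CC50 = 15 uM gives SI = 7.5 and is labeled inactive despite its potency.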
16. Dual-Event models
Workflow diagram, in order:
High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity (+Cytotoxicity)
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency
17. Bayesian Classification TB Models
Single pt ROC XV AUC = 0.88
Dose resp = 0.78
Dose resp + cyto = 0.86
Dataset (number of molecules)                    External ROC Score   Internal ROC Score   Concordance     Specificity     Sensitivity
MLSMR all single point screen (N = 220463)       0.86 ± 0             0.86 ± 0             78.56 ± 1.86    78.59 ± 1.94    77.13 ± 2.26
MLSMR dose response set (N = 2273)               0.73 ± 0.01          0.75 ± 0.01          66.85 ± 4.06    67.21 ± 7.05    65.47 ± 7.96
NEW dose response and cytotoxicity (N = 2273)    0.82 ± 0.02          0.84 ± 0.02          82.61 ± 4.68    83.91 ± 5.48    65.99 ± 7.47
Ekins et al., PLOSONE, in press 2013
18. MLSMR dual event model
[Figure panels: Good, Bad]
Ekins et al., PLOSONE, in press 2013
20. Models with SRI kinase data
Model 1 ROC XV AUC (N 23797) = 0.89
Model 2 (N 1248) = 0.72
Model 3 (N 1248) = 0.77
Leave out 50% x 100
Dataset (number of molecules)   External ROC Score   Internal ROC Score   Concordance     Specificity     Sensitivity
Model 1 (N = 23797)             0.87 ± 0             0.88 ± 0             76.77 ± 2.14    76.49 ± 2.41    81.7 ± 2.96
Model 2 (N = 1248)              0.65 ± 0.01          0.70 ± 0.01          61.58 ± 1.56    61.85 ± 8.45    61.30 ± 8.24
Model 3 (N = 1248)              0.74 ± 0.02          0.75 ± 0.02          68.67 ± 6.88    69.28 ± 9.84    64.84 ± 12.11
Ekins et al., PLOSONE, in press 2013
21. Testing to date has been retrospective
Can we use our models to select compounds
and influence design?
Prospective prediction
Do it enough times to show robustness
22. Testing prospectively
MLSMR dose response with cytotoxicity and the
TAACF kinase dose response with cytotoxicity
models were used to screen the
Asinex library (N = 25,008)
Maybridge library (N = 57,200)
Selleck Chemicals kinase library (N = 194)
23. Results - Asinex library
94 molecules were selected with the MLSMR dose response and cytotoxicity
model and 88 with the kinase model (built on the library based on kinase
inhibitor scaffolds, with cytotoxicity). All were tested at a single
concentration.
8 hits (MLSMR) and 19 hits (kinase) showed > 90% inhibition at
100 ug/ml (8.5% and 21.5% hit rates)
Results - Maybridge library
50 molecules had greater than or equal to 90% inhibition at
100 ug/ml (28.7% hit rate) - 8 with good SI
Ekins et al., PLOSONE, in press 2013
24. Asinex and MLSMR actives PCA
Ekins et al., PLOSONE, in press 2013
26. An example of the model ranking similar compounds
Maybridge number   Inhibition % MABA at 100 µg/ml   Inhibition % LORA at 100 µg/ml   MIC MABA (µg/ml)   MIC LORA (µg/ml)   CC50 Vero (µg/ml)   MLSMR model score   Kinase model score
JFD02381           98.9                             95                               5.84               10.09              >100                25.27 (0.80)        12.79 (0.5)
JFD02382           91.5                             90.1                             > 100              47.99              >100                18.32 (0.69)        9.78 (0.43)
27. Analysis of SelleckChem Kinase library N=194
47 molecules showed greater than or equal to 90% inhibition of
M. tuberculosis at 100 ug/ml, a hit rate of 24.2%.
Note: the best model was another dual activity model
(Ekins et al., Chem Biol 20: 370-378, 2013)
Ekins et al., PLOSONE, in press 2013
29. A summary of the numbers involved – filtering for hits.
82,403 molecules screened through Bayesian models
550 molecules were tested in vitro
124 actives were identified
22.5 % hit rate
Identified several novel potent lead series with good cytotoxicity
& selectivity
Identified known human kinase inhibitors and FDA approved
drugs as new hits
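The headline figures can be rechecked from the counts above. This is a minimal sketch: the variable names are hypothetical, and the comparison with the UIC HTS hit rates quoted earlier is only rough, since those screens used different libraries and concentrations.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts quoted on this summary slide.
screened_in_silico = 82403
tested_in_vitro = 550
actives = 124

overall_hit_rate = percent(actives, tested_in_vitro)
fraction_tested = percent(tested_in_vitro, screened_in_silico)

# HTS hit rates (%) quoted in the UIC hit-rate table.
hts_hit_rates = {
    "ChemBridge Novacore": 4.55,
    "Asinex Diverse": 1.91,
    "ASDI": 2.73,
    "Prestwick (ATP)": 20.6,
    "Prestwick (MABA)": 16.07,
    "MRCT": 0.67,
}
fold_over_hts = {
    name: overall_hit_rate / rate for name, rate in hts_hit_rates.items()
}

print(f"Hit rate: {overall_hit_rate:.1f}% of tested molecules")
print(f"Tested in vitro: {fraction_tested:.2f}% of molecules scored")
for name, fold in fold_over_hts.items():
    print(f"{name}: {fold:.1f}-fold")
```

On these counts, fewer than 1% of the virtually scored molecules were assayed, which is the sense in which the models let a fraction of the library be screened.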
30. Conclusions
Still difficult to identify molecules with bioactivity and no
cytotoxicity
Models perform differently on different data sets
Need to understand what factors are key
Hit rate much higher than HTS / screen a fraction of
molecules
Computational models should be used prior to HTS
Focus resources
31. Acknowledgments
The project described was supported by Award Number R43 LM011152-01
“Biocomputation across distributed private datasets to enhance drug
discovery” from the National Library of Medicine (PI: S. Ekins)
Accelrys
The CDD TB has been developed thanks to funding from the Bill and
Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for
TB through a novel database of SAR data optimized to promote data
archiving and sharing”)
Allen Casey (IDRI)
32. You can find me @... CDD Booth 205
PAPER ID: 13433
PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and
statistical analyses”
April 8th 8.35am Room 349
PAPER ID: 14750
PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery
Using Bayesian Models”
April 9th 1.30pm Room 353
PAPER ID: 21524
PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and
tools”
April 9th 3.50pm Room 350
PAPER ID: 13358
PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”
April 10th 8.30am Room 357
PAPER ID: 13382
PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided
repurposing candidates”
April 10th 10.20am Room 350
PAPER ID: 13438
PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”
April 10th 3.05 pm Room 350
Editor's Notes
CDD Experienced Team Innovates and Executes Barry Bunin, PhD (Pres. & Cofounder as first Eli Lilly EIR) Libraria (CEO, Pres.-CSO), Arris Pharmaceuticals (Sr. Scientist), Genentech, UC Berkeley (Ellman), Columbia University, author. Moses Hohman, PhD (Director Software Engineering) Northwestern Assoc. Director of Bioinformatics, Thoughtworks, Inc., U of Chicago (PhD), Harvard ( magna cum laude, Physics) Sylvia Ernst, PhD (Director Community Growth & Sales) Left 800-lb Gorillas: Accelrys-Scitegic, MDL-Elsevier-Beilstein Peter Cohan (BOD & Overall Sales Strategy) Symyx (VP Bus Dev & President-Discovery Tools), MDL (VP Customer Marketing), www.secondderivative.com, author. Omidyar Network, Founders Fund, & Lilly (BOD observers) WSGR (Corporate Counsel), Rina Accountancy (GAAP compliance) Partners: Hub Consortium Members, ChemAxon, DNDi, MMV, Sandler Center… CDD SAB: Christopher Lipinski PhD, James McKerrow, MD PhD, David Roos PhD, Adam Renslo PhD, Wes Van Voorhis, MD PhD