Bigger Data to Increase Drug Discovery
Sean Ekins
Phoenix Nest, Inc., Brooklyn, NY.
Collaborations in Chemistry, Inc., Fuquay Varina, NC.
Collaborative Drug Discovery, Inc., Burlingame, CA.
Collaborations Pharmaceuticals, Inc., Fuquay Varina, NC.
In a Perfect World…
• All major diseases cured
• All > 7000 rare diseases have treatments available
• Neglected diseases are eradicated
• Antibiotics, antivirals, vaccines developed to anticipate all
future mutations
• Drug resistance eradicated
• All research coordinated globally
• Governments and individuals collaborate to discover and fund all
research
• Billions of molecules will be available with data for different
targets
• All decisions will involve machine learning
• Life expectancy is infinite
Big DATA
Ebola-related tweets in a 6-week
period, 2014
Robert Moore
Why ‘Bigger’ and not ‘Big’
Just a matter of scale?
Drug Discovery’s
definition of Big data
Everyone else’s definition of Big data
What about Chemistry and Biology -
Pharmacology X.0
• Data Sources
• PubChem
• ChEMBL
• ToxCast over 1800 molecules tested against over 800 endpoints
BUT WHERE ARE THE ‘Big’ Chemistry DBs?
But what about small data?
• In some cases it's all we have
• In vivo data is not high throughput
• Small data builds networks
http://smalldatagroup.com/
The past
• 1996
• Data from low throughput
Drug-drug interaction studies
• E.g. Ki values with CYP 3A4
• A drug company might have
10s of values
• This data was used to build
3D QSAR, pharmacophores
JPET, 290: 429-438, 1999
Process                     Hydrophobic features (HPF)   Hydrogen bond acceptor (HBA)   Hydrogen bond donor (HBD)   Observed vs. predicted IC50 r
Acoustic mediated process   2                            1                              1                           0.92
Tip-based process           0                            2                              1                           0.80
Pharmacophores: Acoustic, Tip based
Generated with Discovery Studio (Accelrys)
Cyan = hydrophobic
Green = hydrogen bond acceptor
Purple = hydrogen bond donor
Each model shows most potent molecule mapping
How you dispense liquids may be important: insights from small data
PLoS ONE 8(5): e62325 (2013)
Ebola inhibitor
Pharmacophore
Ekins S, Freundlich JS and Coffee M
F1000Research 2014, 3:277
Docking FDA approved
compounds in VP35
protein showing overlap
with ligand (yellow)
Proposed that amodiaquine,
chloroquine, clomiphene and toremifene,
which are all active in vitro, may have
common features and bind a common
site / target
A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus
The last 5 years - Present
• 2010
• Data from high
throughput screens at
Pfizer
• E.g. metabolic
stability data ~200K
compounds
• This data was used to
build machine
learning models
• 2015
• Could easily be
double this amount
Drug Metab Dispos, 38: 2083-2090, 2010
Ebola Machine Learning Models
ROC values for models built on a training set of 868 compounds:
Model (validation)                          Ebola replication (actives = 20)   Ebola pseudotype (actives = 41)
RP Forest (out of bag)                      0.70                               0.85
RP Single Tree (5 fold cross validation)    0.78                               0.81
SVM (5 fold cross validation)               0.73                               0.76
Bayesian (5 fold cross validation)          0.86                               0.85
Bayesian (leave out 50% x 100)              0.86                               0.82
Open Bayesian (5 fold cross validation)     0.82                               0.82
Ekins, Freundlich, Madrid and Clark
https://goo.gl/uG8K3P
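The ROC values quoted for these models summarize how well each model ranks actives above inactives. As a minimal sketch of what that number measures (not the software used for the slide), the area under the ROC curve can be computed from the pairwise ranking of actives against inactives. The function name `roc_auc` and the toy labels and scores below are illustrative assumptions, not data from the Ebola models.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for active, 0 for inactive.
    scores: model scores, higher meaning "more likely active".
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0   # active ranked above inactive
            elif a == i:
                wins += 0.5   # ties count as half
    return wins / (len(actives) * len(inactives))


# Toy example (hypothetical values): 3 actives, 3 inactives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"ROC AUC = {toy_auc:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the ROC values above should be read.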
Tuberculosis still kills 1.6-1.7m/yr (~1 every 8 seconds)
1/3rd of the world's population infected!
streptomycin (1943)
para-aminosalicylic acid (1949)
isoniazid (1952)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)
Multi drug resistance in 4.3% of cases
Extensively drug resistant increasing incidence
2 new drugs (bedaquiline, delamanid) in 40 yrs
Tuberculosis – a big disease
Tested >350,000 molecules     Tested ~2M      2M     >300,000
>1500 active and non toxic    Published 177   100s   800
Big Data: Screening for New Tuberculosis Treatments
How many will become a new drug?
How do we learn from this big data?
TBDA screened over 1 million, 1 million 
more to go
TB Alliance + Japanese pharma screens
Over 8000 molecules with dose
response data for Mtb in CDD Public
from NIAID/SRI
https://app.collaborativedrug.com/register
Over 6 years analyzed in vitro data and built models
Top scoring molecules
assayed for
Mtb growth inhibition
Mtb screening
molecule
database/s
High-throughput
phenotypic
Mtb screening
Descriptors + Bioactivity (+Cytotoxicity)
Bayesian Machine Learning classification Mtb Model
Molecule Database
(e.g. GSK malaria
actives)
virtually scored
using Bayesian Models
New bioactivity data
may enhance models
Identify in vitro hits and test models
3 x published prospective tests: ~750 molecules were tested in vitro
198 actives were identified
>20 % hit rate
Multiple retrospective tests 3-10 fold
enrichment
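The hit rate and fold enrichment quoted here are simple ratios. The sketch below reproduces the prospective hit rate from the counts on the slide (198 actives out of roughly 750 tested). The helper names and the baseline hit rate are hypothetical and are included only to show how a fold enrichment would be calculated; the slide does not state the baseline used in the retrospective tests.

```python
def hit_rate(n_active, n_tested):
    """Fraction of tested molecules that were confirmed active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def fold_enrichment(model_rate, baseline_rate):
    """How many times higher the model-selected hit rate is than a baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return model_rate / baseline_rate


# Counts from the slide: 198 actives among ~750 molecules tested in vitro.
prospective_rate = hit_rate(198, 750)
print(f"Prospective hit rate: {prospective_rate:.1%}")

# Hypothetical baseline (an assumption, not a figure from the talk).
hypothetical_baseline = 0.05
enrichment = fold_enrichment(prospective_rate, hypothetical_baseline)
print(f"Fold enrichment vs. hypothetical baseline: {enrichment:.1f}x")
```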
Ekins et al., Pharm Res 31: 414-435, 2014
Ekins, et al., Tuberculosis 94; 162-169, 2014
Ekins, et al., PLOSONE 8; e63240, 2013
Ekins, et al., Chem Biol 20: 370-378, 2013
Ekins, et al., JCIM, 53: 3054−3063, 2013
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Ekins, et al., Mol. Biosyst. 6, 2316-2324, 2010,
5 active compounds vs Mtb in a few months
7 tested, 5 active (~71% hit rate)
Ekins et al., Chem Biol 20: 370-378, 2013
1. Virtually screen
13,533-member GSK
antimalarial hit library
2. Bayesian Model = SRI
TAACF-CB2 dose
response + cytotoxicity
model
3. Top 46 commercially
available compounds
visually inspected
4. 7 compounds chosen
for Mtb testing based
on
- drug-likeness
- chemotype diversity
GSK #          Bayesian score   Mtb H37Rv MIC (µg/mL)   GSK reported % inhibition HepG2 @ 10 µM cmpd
TCMDC-123868   5.73             >32                     40
TCMDC-125802   5.63             0.0625                  5
TCMDC-124192   5.27             2.0                     4
TCMDC-124334   5.20             2.0                     4
TCMDC-123856   5.09             1.0                     83
TCMDC-123640   4.66             >32                     10
TCMDC-124922   4.55             1.0                     9
[Chemical structure column: images]
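The seven tested compounds can be tallied directly from the table values. The sketch below is an illustration only: it assumes a compound counts as active when a MIC was measured below the >32 µg/mL limit, which is consistent with the "7 tested, 5 active" summary but is not a definition given on the slide. The tuple layout and variable names are hypothetical.

```python
# (GSK #, Bayesian score, MIC in ug/mL or None when reported as >32,
#  GSK reported % inhibition of HepG2 at 10 uM)
compounds = [
    ("TCMDC-123868", 5.73, None, 40),
    ("TCMDC-125802", 5.63, 0.0625, 5),
    ("TCMDC-124192", 5.27, 2.0, 4),
    ("TCMDC-124334", 5.20, 2.0, 4),
    ("TCMDC-123856", 5.09, 1.0, 83),
    ("TCMDC-123640", 4.66, None, 10),
    ("TCMDC-124922", 4.55, 1.0, 9),
]

# Assumption: "active" means a MIC was measured (not reported as >32 ug/mL).
actives = [c for c in compounds if c[2] is not None]
rate = len(actives) / len(compounds)

# Lowest MIC = most potent against Mtb H37Rv.
most_potent = min(actives, key=lambda c: c[2])

print(f"{len(actives)} of {len(compounds)} active ({rate:.0%})")
print(f"Most potent: {most_potent[0]} (MIC {most_potent[2]} ug/mL)")
```

Note that a low MIC alone does not make a good hit: TCMDC-123856 is active but also shows 83% HepG2 inhibition, which is why the model combines dose response with cytotoxicity.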
Filling out the triazine matrix using SARtable:
A new kind of map
Green = good activity, Red = bad; colored dots are predictions
No relationship between internal or external ROC and the
number of molecules in the training set?
PCA of combined
data and ARRA(red)
Ekins et al., J Chem Inf Model
54: 2157-2165 (2014)
Internal and leave out 50%x100 ROC track each other
External ROC less correlation
Smaller models do just as well with external testing
~350,000
What matters most >70 years of TB mouse in vivo data – Mind
the gap - 770 molecules
MIND THE TB GAP
Ekins et al.,
J Chem Inf Model 54: 1070-82, 2014
Ekins, Nuermberger & Freundlich
DDT 19: 1279-1282, 2014
In vivo Machine Learning Models
ROC 5 fold cross validation
RP Forest      RP Single Tree   SVM            Bayesian
0.75           0.71             0.77           0.73
External test set
RP Forest      RP Single Tree   SVM            Bayesian
3/11 (27.3%)   4/11 (36.4%)     7/11 (63.6%)   8/11 (72.7%)
Ekins et al.,
J Chem Inf Model 54: 1070-82, 2014
How can we find the in vivo active compounds?
We need a map...
>70 years of TB in vivo data
Green = in vivo mouse active
Empty = in vivo inactive
Yellow = 2013-2015 data
Uses Bayesian fingerprints
and clustering by similarity
Clark and Ekins - unpublished
Clustering in vivo
mouse TB data
Hex plot
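Clustering by similarity, as mentioned for this map, depends on a pairwise similarity measure between fingerprints. The slide does not say which measure or clustering algorithm was used, so the sketch below swaps in two common choices as assumptions: Tanimoto similarity on sets of fingerprint features, and a simple greedy "leader" clustering. The toy fingerprints are made-up feature sets, not real molecules.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.5):
    """Greedy leader clustering.

    Each molecule joins the first cluster whose leader is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of molecule indices.
    """
    clusters = []  # list of (leader_index, member_indices)
    for i, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if tanimoto(fingerprints[leader], fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return [members for _, members in clusters]


# Hypothetical fingerprints: two similar pairs and one singleton.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20}]
groups = leader_cluster(toy_fps, threshold=0.5)
print(groups)
```

On a map like the one described, an active compound that ends up clustered among inactives would be a candidate for closer inspection.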
>70 years of TB in vivo data
Green = in vivo mouse active
Empty = in vivo inactive
Yellow = 2013-2015
Clark and Ekins - unpublished
Clustering in vivo
mouse TB data
Triazine surrounded by
inactives
Issues
High Log P, poor solubility
How do we ‘increase drug discovery’?
• Make data and models more accessible
• Collaborate
• Share
– Create mobile apps
• Encourage engagement from non scientists
MODELS RESIDE IN PAPERS
NOT ACCESSIBLE… THIS IS
UNDESIRABLE
How do we share them?
How do we use them?
• CDD Vision
Uses Bayesian algorithm and FCFP_6 fingerprints
Bayesian models
Clark et al., J Cheminform 6:38 2014
Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6
fingerprints; (b) modified Bayesian estimators for active and inactive compounds;
(c) structures of selected binders.
For each listed target with at least two binders, it is first assumed that all of the
molecules in the collection that do not indicate this as one of their targets are
inactive.
In the app we used ECFP_6 fingerprints
Building Bayesian models for each target in TB Mobile
Clark et al., J Cheminform 6:38 2014
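The per-target procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TB Mobile implementation: it uses the commonly described Laplacian-corrected naive Bayes estimator, log((A_f + 1) / (T_f * P + 1)), where A_f is the number of actives containing feature f, T_f the number of molecules containing it, and P the overall fraction of actives. Fingerprints are represented as plain sets of integer feature ids, and all function names, the second target name and the toy data are hypothetical.

```python
import math
from collections import Counter


def labels_for_target(annotations, target):
    """Label molecules for one target: any molecule not annotated with the
    target is assumed inactive, as stated on the slide."""
    return [1 if target in targets else 0 for targets in annotations]


def train_bayes(fingerprints, labels):
    """Laplacian-corrected naive Bayes: one log-ratio weight per feature."""
    base_rate = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for fp, y in zip(fingerprints, labels):
        for feature in set(fp):
            total[feature] += 1
            if y:
                active[feature] += 1
    return {
        f: math.log((active[f] + 1.0) / (total[f] * base_rate + 1.0))
        for f in total
    }


def score(model, fingerprint):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(model.get(f, 0.0) for f in set(fingerprint))


# Toy collection (hypothetical feature ids and annotations).
toy_fps = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
toy_annotations = [{"InhA"}, {"InhA"}, {"other_target"}, {"other_target"}]

inha_labels = labels_for_target(toy_annotations, "InhA")
inha_model = train_bayes(toy_fps, inha_labels)
print(score(inha_model, {1, 2}), score(inha_model, {4, 5}))
```

A model of this form is just a table of feature weights, which is part of why such models stay small enough to ship inside a mobile app.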
TB Mobile Vers.2
Ekins et al., J Cheminform 5:13, 2013
Clark et al., J Cheminform 6:38 2014
Predict targets
Cluster molecules
http://goo.gl/vPOKS
http://goo.gl/iDJFR
Predictions for 2013-2015 in vivo
molecules
Bayesian models added to mobile apps: MMDS
Bayesian models added to mobile apps:
Approved drugs
Human Microsomal
Intrinsic clearance
Human protein binding Solubility pH 7.4
AZ dataset models >1000 molecules
Models from ChEMBL data
http://molsync.com/bayesian2
What do 2000 ChEMBL models
look like
Folding bit size
Average
ROC
http://molsync.com/bayesian2
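"Folding bit size" refers to how many bits a hashed fingerprint is compressed into before modelling. As a rough sketch of the idea (assuming power-of-two sizes and a simple bit mask, which may differ from the scheme behind the plot), folding maps each feature hash onto a smaller index space; smaller sizes give more compact models but more collisions between unrelated features. The hash values below are arbitrary illustrative integers.

```python
def fold(feature_hashes, n_bits):
    """Fold non-negative integer feature hashes into n_bits bit positions.

    n_bits must be a power of two so that a bit mask can be used.
    Returns the set of bit positions that are switched on.
    """
    if n_bits <= 0 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a positive power of two")
    return {h & (n_bits - 1) for h in feature_hashes}


# Hypothetical feature hashes for one molecule.
hashes = [5, 1029, 70000, 12]

folded_small = fold(hashes, 1024)   # 5 and 1029 collide on bit 5
folded_large = fold(hashes, 4096)   # no collisions at this size

print(sorted(folded_small))
print(sorted(folded_large))
```

Plotting average ROC against folding size, as on this slide, is one way to judge how much compression a model collection can tolerate.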
Bigger datasets and model
collections
• Profiling “big datasets” is going to be the norm.
• A recent study mined PubChem datasets for
compounds that have rat in vivo acute toxicity
data
• This could be used in other big data initiatives
like ToxCast (> 1000 compounds x 800 assays)
and Tox21 etc.
• Kinase screening data (1000s mols x 100s
assays)
• GPCR datasets etc (1000s mols x 100s assays)
Zhang J, Hsieh JH, Zhu H (2014) Profiling Animal
Toxicants by Automatically Mining Public
Bioassay Data: A Big Data Approach for
Computational Toxicology. PLoS ONE 9(6):
e99863. doi:10.1371/journal.pone.0099863
• Data is at your fingertips instantly
• labs add data to a massive corpus
of knowledge
• Instantly available to all
• Algorithms for mining, prediction
• Millions of models accessible
• Making decisions on experiments
needed and running them
• Data visualization, exploration is
real-time, updated
• Data follows you
Sean Ekins, a computational drug discovery consultant at Collaborations in
Chemistry in North Carolina, is much more skeptical. He notes pharma
companies have found hundreds of antimalaria compounds more potent
than TNP-470 and says that he is not convinced Eve can do QSAR. He wants
to see Eve go head-to-head with a real computational chemist. “Eve should
go back to the Garden of Eden and leave drug discovery to scientists who
know what they are doing,” Ekins says.
How close are we?
• Computers and models do not replace scientists
• A tool to help us sift through ideas quickly
• Many examples have led to leads
• Bigger data not needed for good models
• More data becoming public
• Can model ADME, bioactivity and more
• Collaboration and software is important
• Mobile apps have useful cheminformatics features -
aid anyone to do drug discovery
• Models are compact < 1MB and portable
• The age of model sharing is here
Conclusions
Wanted
• “Bigger” small molecule screening datasets
• Preferably > 500,000 – 1,000,000 molecules with data
• To test how machine learning algorithms scale
• Contact ekinssean@yahoo.com
Nadia Litterman, Krishna Dole and all at CDD, Megan Coffee, SRI, MM4TB and many
others …
Funding: Bill and Melinda Gates Foundation (Grant#49852), 1R41AI088893-01,
2R42AI088893-02, R43 LM011152-01, 9R44TR000942-02, 1R41AI108003-01,
1U19AI109713-01, MM4TB
Software: Biovia
Freundlich Lab

Mais conteúdo relacionado

Mais procurados

Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Jean-Claude Bradley
 
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...Kim Solez ,
 
Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Philip Bourne
 
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery CollaborationsCDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery CollaborationsSean Ekins
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbioBenjamin Good
 
Collaborative Database and Computational Models for Tuberculosis Drug Discovery
Collaborative Database and Computational Models for Tuberculosis Drug DiscoveryCollaborative Database and Computational Models for Tuberculosis Drug Discovery
Collaborative Database and Computational Models for Tuberculosis Drug DiscoverySean Ekins
 
MSR david-heckerman_genomics
MSR david-heckerman_genomicsMSR david-heckerman_genomics
MSR david-heckerman_genomicsDaniel Carchedi
 
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...U.S. EPA Office of Research and Development
 
Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...U.S. EPA Office of Research and Development
 
challenges and recommendations for obtaining chemical structures of industry-...
challenges and recommendations for obtaining chemical structures of industry-...challenges and recommendations for obtaining chemical structures of industry-...
challenges and recommendations for obtaining chemical structures of industry-...Sean Ekins
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...ExternalEvents
 

Mais procurados (13)

Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1
 
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...
Jack Tuszynski Accelerating Chemotherapy Drug Discovery with Analytics and Hi...
 
Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases
 
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery CollaborationsCDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio
 
Collaborative Database and Computational Models for Tuberculosis Drug Discovery
Collaborative Database and Computational Models for Tuberculosis Drug DiscoveryCollaborative Database and Computational Models for Tuberculosis Drug Discovery
Collaborative Database and Computational Models for Tuberculosis Drug Discovery
 
MSR david-heckerman_genomics
MSR david-heckerman_genomicsMSR david-heckerman_genomics
MSR david-heckerman_genomics
 
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
 
Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...
 
Computational Toxicity in 21st Century Safety Sciences
Computational Toxicity in 21st Century Safety SciencesComputational Toxicity in 21st Century Safety Sciences
Computational Toxicity in 21st Century Safety Sciences
 
Challenges and recommendations for obtaining chemical structures of industry
Challenges and recommendations for obtaining chemical structures of industryChallenges and recommendations for obtaining chemical structures of industry
Challenges and recommendations for obtaining chemical structures of industry
 
challenges and recommendations for obtaining chemical structures of industry-...
challenges and recommendations for obtaining chemical structures of industry-...challenges and recommendations for obtaining chemical structures of industry-...
challenges and recommendations for obtaining chemical structures of industry-...
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
 

Semelhante a Bigger Data to Increase Drug Discovery

C&E news talk sept 16
C&E news talk sept 16C&E news talk sept 16
C&E news talk sept 16Sean Ekins
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discoverySean Ekins
 
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...Sean Ekins
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Prof. Wim Van Criekinge
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverySean Ekins
 
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...Sean Ekins
 
SMR kinase meeting October 2013
SMR kinase meeting October 2013SMR kinase meeting October 2013
SMR kinase meeting October 2013jpoverington
 
New Target Prediction and Visualization Tools Incorporating Open Source Molec...
New Target Prediction and Visualization Tools Incorporating Open Source Molec...New Target Prediction and Visualization Tools Incorporating Open Source Molec...
New Target Prediction and Visualization Tools Incorporating Open Source Molec...Sean Ekins
 
2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 projectmdragoescu
 
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptx
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptxBasics of QSAR Modeling by Prof Rahul D. Jawarkar.pptx
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptxRahul Jawarkar
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
Indications discovery and drug repurposing
Indications discovery and drug repurposingIndications discovery and drug repurposing
Indications discovery and drug repurposingSean Ekins
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Andrei KUCHARAVY
 
Gene Express Jaima Presentation September 04, 2008 Chiba
Gene Express Jaima Presentation September 04, 2008 ChibaGene Express Jaima Presentation September 04, 2008 Chiba
Gene Express Jaima Presentation September 04, 2008 ChibaDavid Lester
 
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsBioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsProf. Wim Van Criekinge
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekingeProf. Wim Van Criekinge
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Prof. Wim Van Criekinge
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 

Semelhante a Bigger Data to Increase Drug Discovery (20)

C&E news talk sept 16
C&E news talk sept 16C&E news talk sept 16
C&E news talk sept 16
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discovery
 
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discovery
 
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
 
SMR kinase meeting October 2013
SMR kinase meeting October 2013SMR kinase meeting October 2013
SMR kinase meeting October 2013
 
New Target Prediction and Visualization Tools Incorporating Open Source Molec...
New Target Prediction and Visualization Tools Incorporating Open Source Molec...New Target Prediction and Visualization Tools Incorporating Open Source Molec...
New Target Prediction and Visualization Tools Incorporating Open Source Molec...
 
2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project
 
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptx
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptxBasics of QSAR Modeling by Prof Rahul D. Jawarkar.pptx
Basics of QSAR Modeling by Prof Rahul D. Jawarkar.pptx
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
Indications discovery and drug repurposing
Indications discovery and drug repurposingIndications discovery and drug repurposing
Indications discovery and drug repurposing
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...
 
Gene Express Jaima Presentation September 04, 2008 Chiba
Gene Express Jaima Presentation September 04, 2008 ChibaGene Express Jaima Presentation September 04, 2008 Chiba
Gene Express Jaima Presentation September 04, 2008 Chiba
 
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsBioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 

Mais de Sean Ekins

How to Win a small business grant.pptx
How to Win a small business grant.pptxHow to Win a small business grant.pptx
How to Win a small business grant.pptxSean Ekins
 
Evaluating Multiple Machine Learning Models for Biodegradation and Aquatic To...
Evaluating Multiple Machine Learning Models for Biodegradation and Aquatic To...Evaluating Multiple Machine Learning Models for Biodegradation and Aquatic To...
Evaluating Multiple Machine Learning Models for Biodegradation and Aquatic To...Sean Ekins
 
A presentation at the Global Genes rare drug development symposium on governm...
A presentation at the Global Genes rare drug development symposium on governm...A presentation at the Global Genes rare drug development symposium on governm...
A presentation at the Global Genes rare drug development symposium on governm...Sean Ekins
 
Leveraging Science Communication and Social Media to Build Your Brand and Ele...
Leveraging Science Communication and Social Media to Build Your Brand and Ele...Leveraging Science Communication and Social Media to Build Your Brand and Ele...
Leveraging Science Communication and Social Media to Build Your Brand and Ele...Sean Ekins
 
Bayesian Models for Chagas Disease
Bayesian Models for Chagas DiseaseBayesian Models for Chagas Disease
Bayesian Models for Chagas DiseaseSean Ekins
 
Assay Central: A New Approach to Compiling Big Data and Preparing Machine Lea...
Assay Central: A New Approach to Compiling Big Data and Preparing Machine Lea...Assay Central: A New Approach to Compiling Big Data and Preparing Machine Lea...
Assay Central: A New Approach to Compiling Big Data and Preparing Machine Lea...Sean Ekins
 
Bigger Data to Increase Drug Discovery

  • 1. Bigger Data to Increase Drug Discovery. Sean Ekins. Phoenix Nest, Inc., Brooklyn, NY; Collaborations in Chemistry, Inc., Fuquay Varina, NC; Collaborative Drug Discovery, Inc., Burlingame, CA; Collaborations Pharmaceuticals, Inc., Fuquay Varina, NC.
  • 2. In a Perfect World… • All major diseases cured • All >7,000 rare diseases have treatments available • Neglected diseases are eradicated • Antibiotics, antivirals and vaccines developed to anticipate all future mutations • Drug resistance eradicated • All research coordinated globally • Government/individual collaboration discovers and funds all research • Billions of molecules will be available with data for different targets • All decisions will involve machine learning • Life expectancy is infinite
  • 6. Ebola-related tweets in a 6-week period, 2014 (Robert Moore)
  • 7. Why ‘Bigger’ and not ‘Big’
  • 8. Just a matter of scale? Drug discovery’s definition of big data vs. everyone else’s definition of big data
  • 9. What about Chemistry and Biology - Pharmacology X.0 • Data sources • PubChem • ChEMBL • ToxCast: over 1,800 molecules tested against over 800 endpoints
  • 12. But what about small data? • In some cases it’s all we have • In vivo data is not high throughput • Small data builds networks • http://smalldatagroup.com/
  • 13. The past • 1996 • Data from low-throughput drug-drug interaction studies • E.g. Ki values with CYP3A4 • A drug company might have 10s of values • This data was used to build 3D QSAR models and pharmacophores • JPET, 290: 429-438, 1999
  • 14. How you dispense liquids may be important: insights from small data. Pharmacophore features by dispensing process: • Acoustic-mediated process: 2 hydrophobic features (HPF), 1 hydrogen bond acceptor (HBA), 1 hydrogen bond donor (HBD); observed vs. predicted IC50 r = 0.92 • Tip-based process: 0 HPF, 2 HBA, 1 HBD; observed vs. predicted IC50 r = 0.80 • Figure: acoustic and tip-based pharmacophores, generated with Discovery Studio (Accelrys); cyan = hydrophobic, green = hydrogen bond acceptor, purple = hydrogen bond donor; each model shows the most potent molecule mapping • PLoS ONE 8(5): e62325 (2013)
  • 15. Ebola inhibitor pharmacophore • A common-feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus • Proposed that amodiaquine, chloroquine, clomiphene and toremifene, which are all active in vitro, may have common features and bind a common site/target • Docking of FDA-approved compounds in the VP35 protein shows overlap with the ligand (yellow) • Ekins S, Freundlich JS and Coffee M, F1000Research 2014, 3:277
  • 16. The last 5 years to present • 2010 • Data from high-throughput screens at Pfizer • E.g. metabolic stability data for ~200K compounds • This data was used to build machine learning models • 2015 • Could easily be double this amount • Drug Metab Dispos, 38: 2083-2090, 2010
  • 17. Ebola Machine Learning Models (training set 868 compounds), ROC by method • Ebola replication (actives = 20): RP Forest (out-of-bag ROC) 0.70; RP Single Tree (5-fold cross-validation ROC) 0.78; SVM (5-fold cross-validation ROC) 0.73; Bayesian (5-fold cross-validation ROC) 0.86; Bayesian (leave out 50% x 100 ROC) 0.86; Open Bayesian (5-fold cross-validation ROC) 0.82 • Ebola pseudotype (actives = 41): RP Forest 0.85; RP Single Tree 0.81; SVM 0.76; Bayesian (5-fold) 0.85; Bayesian (leave out 50% x 100) 0.82; Open Bayesian 0.82 • Ekins, Freundlich, Madrid and Clark
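The ROC values on slide 17 come from cross-validation, which the transcript does not spell out. The following is a minimal sketch of how a 5-fold cross-validated ROC AUC can be computed, not the authors' pipeline. The function names (`roc_auc`, `k_fold_indices`, `cross_validated_auc`) and the `fit`/`predict` callables are hypothetical, and the interleaved fold assignment is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive count as half a correct pair.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for n in inactives:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k interleaved folds over range(n)."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validated_auc(samples, labels, fit, predict, k=5):
    """Score every sample with a model that never saw it, then pool the scores.

    fit(train_samples, train_labels) -> model and predict(model, sample) -> float
    are placeholders for whichever learner is being evaluated.
    """
    scores = [0.0] * len(samples)
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        for i in test:
            scores[i] = predict(model, samples[i])
    return roc_auc(labels, scores)
```

With only 20 or 41 actives among 868 compounds, a stratified split that keeps actives in every fold would probably be the safer choice. The plain interleaved split is used here only to keep the sketch short.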
  • 19. Tuberculosis: a big disease • Tuberculosis still kills 1.6-1.7m/yr (~1 every 8 seconds) • 1/3rd of the world’s population infected • Drugs: streptomycin (1943), para-aminosalicylic acid (1949), isoniazid (1952), pyrazinamide (1954), cycloserine (1955), ethambutol (1962), rifampicin (1967) • Multi-drug resistance in 4.3% of cases • Extensively drug-resistant TB increasing in incidence • 2 new drugs (bedaquiline, delamanid) in 40 yrs
  • 20. Big Data: Screening for New Tuberculosis Treatments • Figure labels: tested >350,000 molecules; tested ~2M; 2M; >300,000; >1,500 active and non-toxic; published 177; 100s; 800 • How many will become a new drug? • How do we learn from this big data? • TBDA screened over 1 million, 1 million more to go • TB Alliance + Japanese pharma screens
  • 21. Over 8000 molecules with dose response data for Mtb in CDD Public from NIAID/SRI https://app.collaborativedrug.com/register
  • 22. Over 6 years analyzed in vitro data and built models • Workflow: Mtb screening molecule database(s) from high-throughput phenotypic Mtb screening; descriptors + bioactivity (+ cytotoxicity); Bayesian machine learning classification Mtb model; molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models; top-scoring molecules assayed for Mtb growth inhibition; identify in vitro hits and test models; new bioactivity data may enhance models • 3 published prospective tests: ~750 molecules were tested in vitro, 198 actives were identified, >20% hit rate • Multiple retrospective tests: 3-10 fold enrichment • Ekins et al., Pharm Res 31: 414-435, 2014; Ekins et al., Tuberculosis 94: 162-169, 2014; Ekins et al., PLOS ONE 8: e63240, 2013; Ekins et al., Chem Biol 20: 370-378, 2013; Ekins et al., JCIM 53: 3054-3063, 2013; Ekins and Freundlich, Pharm Res 28: 1859-1869, 2011; Ekins et al., Mol BioSyst 6: 840-851, 2010; Ekins et al., Mol BioSyst 6: 2316-2324, 2010
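Slide 22 names a Bayesian classification model built on descriptors and then used to score other libraries, but the transcript gives no formula. As an illustration only, the sketch below uses the Laplacian-corrected naive Bayes estimator commonly described for fingerprint models. That choice is an assumption, as is representing each molecule as a set of integer feature ids. The names `train_bayesian` and `score` are hypothetical.

```python
import math
from collections import Counter


def train_bayesian(fingerprints, labels):
    """Learn one log-weight per feature from labelled fingerprints.

    fingerprints: list of sets of feature ids; labels: 1 = active, 0 = inactive.
    Weight for feature F is ln((A_F + 1) / (T_F * P + 1)), where T_F is the number
    of molecules containing F, A_F the number of those that are active, and P the
    overall fraction of actives (Laplacian correction, assumed here).
    """
    if not labels:
        raise ValueError("empty training set")
    base_rate = sum(labels) / len(labels)
    seen = Counter()
    seen_active = Counter()
    for fp, y in zip(fingerprints, labels):
        for bit in fp:
            seen[bit] += 1
            if y == 1:
                seen_active[bit] += 1
    return {
        bit: math.log((seen_active[bit] + 1.0) / (seen[bit] * base_rate + 1.0))
        for bit in seen
    }


def score(weights, fingerprint):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(bit, 0.0) for bit in fingerprint)


def rank_library(weights, library):
    """Return (name, score) pairs for a dict of name -> fingerprint, best first."""
    scored = [(name, score(weights, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Virtual scoring then amounts to calling `rank_library` on the external collection and sending the top-ranked molecules for assay. The raw score is a relative ranking value, not a calibrated probability.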
  • 23. 5 active compounds vs Mtb in a few months • 7 tested, 5 active (70% hit rate) • 1. Virtually screen 13,533-member GSK antimalarial hit library • 2. Bayesian model = SRI TAACF-CB2 dose response + cytotoxicity model • 3. Top 46 commercially available compounds visually inspected • 4. 7 compounds chosen for Mtb testing based on drug-likeness and chemotype diversity • Table (GSK #: Bayesian score; Mtb H37Rv MIC in µg/mL; GSK-reported % inhibition of HepG2 at 10 µM compound): TCMDC-123868: 5.73; >32; 40 • TCMDC-125802: 5.63; 0.0625; 5 • TCMDC-124192: 5.27; 2.0; 4 • TCMDC-124334: 5.20; 2.0; 4 • TCMDC-123856: 5.09; 1.0; 83 • TCMDC-123640: 4.66; >32; 10 • TCMDC-124922: 4.55; 1.0; 9 • Ekins et al., Chem Biol 20: 370-378, 2013
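The hit rate on slide 23 can be checked directly from the table values. The short sketch below uses the seven Bayesian scores and MICs as listed on the slide. Treating any compound whose MIC was not reported as >32 µg/mL as active is an assumption made for illustration, since the slide does not state the activity cutoff, and `hit_rate` is a hypothetical helper.

```python
# (GSK id, Bayesian score, Mtb H37Rv MIC in ug/mL; None means reported as >32)
tested = [
    ("TCMDC-123868", 5.73, None),
    ("TCMDC-125802", 5.63, 0.0625),
    ("TCMDC-124192", 5.27, 2.0),
    ("TCMDC-124334", 5.20, 2.0),
    ("TCMDC-123856", 5.09, 1.0),
    ("TCMDC-123640", 4.66, None),
    ("TCMDC-124922", 4.55, 1.0),
]


def hit_rate(rows):
    """Fraction of tested compounds with a measured MIC (assumed activity rule)."""
    hits = [row for row in rows if row[2] is not None]
    return len(hits) / len(rows)


rate = hit_rate(tested)
print(f"{sum(r[2] is not None for r in tested)} of {len(tested)} active: {rate:.1%}")
```

Five of seven is about 71%, which the slide reports as a 70% hit rate.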
  • 24. Filling out the triazine matrix using SARtable: A new kind of map Green = good activity, Red = bad; colored dots are predictions
  • 25. No relationship between internal or external ROC and the number of molecules in the training set? • Figure: PCA of combined data and ARRA (red); figure label ~350,000 • Internal and leave-out 50% x 100 ROC track each other • External ROC shows less correlation • Smaller models do just as well with external testing • Ekins et al., J Chem Inf Model 54: 2157-2165 (2014)
  • 26. What matters most • >70 years of TB mouse in vivo data: 770 molecules • Mind the TB gap • Ekins et al., J Chem Inf Model 54: 1070-82, 2014 • Ekins, Nuermberger & Freundlich, DDT 19: 1279-1282, 2014
  • 27. In vivo Machine Learning Models • ROC, 5-fold cross-validation: RP Forest 0.75; RP Single Tree 0.71; SVM 0.77; Bayesian 0.73 • External test set, correct predictions: RP Forest 3/11 (27.3%); RP Single Tree 4/11 (36.4%); SVM 7/11 (63.6%); Bayesian 8/11 (72.7%) • Ekins et al., J Chem Inf Model 54: 1070-82, 2014
  • 28. How can we find the in vivo active compound? We need a map.
  • 29. Clustering in vivo mouse TB data: hex plot • >70 years of TB in vivo data • Green = in vivo mouse active • Empty = in vivo inactive • Yellow = 2013-2015 data • Uses Bayesian fingerprints and clustering by similarity • Clark and Ekins, unpublished
  • 30. Clustering in vivo mouse TB data • >70 years of TB in vivo data • Green = in vivo mouse active • Empty = in vivo inactive • Yellow = 2013-2015 data • Triazine surrounded by inactives • Issues: high log P, poor solubility • Clark and Ekins, unpublished
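Slides 29 and 30 mention clustering by fingerprint similarity, but the work is marked unpublished and the transcript names neither the similarity measure nor the clustering algorithm. The sketch below substitutes two common choices: Tanimoto similarity on feature sets and a simple leader clustering. Both are assumptions for illustration, and the 0.5 threshold is arbitrary.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fingerprints, threshold=0.5):
    """Assign each molecule to the first cluster whose leader is similar enough.

    Returns a list of clusters, each a list of indices into `fingerprints`;
    the first index in each cluster is its leader.
    """
    clusters = []
    for i, fp in enumerate(fingerprints):
        for cluster in clusters:
            if tanimoto(fp, fingerprints[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing leader is close enough, so this molecule starts a cluster.
            clusters.append([i])
    return clusters
```

A map like the one described could then colour each cluster member by its in vivo outcome, so that an active surrounded by inactives (as noted for the triazine) stands out.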
  • 31. How do we ‘increase drug discovery’? • Make data and models more accessible • Collaborate • Share – Create mobile apps • Encourage engagement from non scientists
  • 32. Models reside in papers, not accessible… this is undesirable. How do we share them? How do we use them?
  • 33. Bayesian models • CDD Vision • Uses Bayesian algorithm and FCFP_6 fingerprints • Clark et al., J Cheminform 6:38, 2014
  • 34. Building Bayesian models for each target in TB Mobile • Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6 fingerprints; (b) modified Bayesian estimators for active and inactive compounds; (c) structures of selected binders • For each listed target with at least two binders, it is first assumed that all of the molecules in the collection that do not indicate this as one of their targets are inactive • In the app we used ECFP_6 fingerprints • Clark et al., J Cheminform 6:38, 2014
  • 35. TB Mobile Vers. 2 • Predict targets • Cluster molecules • Ekins et al., J Cheminform 5:13, 2013 • Clark et al., J Cheminform 6:38, 2014 • http://goo.gl/vPOKS http://goo.gl/iDJFR
  • 36. Predictions for 2013-2015 in vivo molecules
  • 37. Bayesian models added to mobile apps: MMDS
  • 38. Bayesian models added to mobile apps: Approved drugs
  • 39. AZ dataset models (>1000 molecules): human microsomal intrinsic clearance; human protein binding; solubility at pH 7.4
  • 40. Models from ChEMBL data http://molsync.com/bayesian2
  • 41. What do 2000 ChEMBL models look like? • Plot axes: folding bit size, average ROC • http://molsync.com/bayesian2
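Slide 41 relates model quality to the fingerprint "folding bit size" without saying what folding does. As a rough sketch, folding maps large hashed feature ids into a fixed number of bits, so smaller sizes give more collisions between unrelated features. The power-of-two masking used here is an assumption about the scheme, and `fold` is a hypothetical helper.

```python
def fold(feature_hashes, n_bits):
    """Fold hashed feature ids into the range [0, n_bits) by masking low bits.

    n_bits must be a positive power of two so that a bit mask can be used.
    Distinct features may land on the same bit; that collision is the
    information lost when the folding size is reduced.
    """
    if n_bits <= 0 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a positive power of two")
    mask = n_bits - 1
    return {h & mask for h in feature_hashes}


features = [5, 1029, 70000]
for size in (1024, 4096, 131072):
    print(size, sorted(fold(features, size)))
```

At 1024 bits the features 5 and 1029 collide, while at 4096 bits or more all three stay distinct. This is the trade-off between model size and resolution that a plot of average ROC against folding size would explore.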
  • 42. Bigger datasets and model collections • Profiling “big datasets” is going to be the norm • A recent study mined PubChem datasets for compounds that have rat in vivo acute toxicity data • This could be used in other big data initiatives like ToxCast (>1000 compounds x 800 assays) and Tox21 etc. • Kinase screening data (1000s of molecules x 100s of assays) • GPCR datasets etc. (1000s of molecules x 100s of assays) • Zhang J, Hsieh JH, Zhu H (2014) Profiling Animal Toxicants by Automatically Mining Public Bioassay Data: A Big Data Approach for Computational Toxicology. PLoS ONE 9(6): e99863. doi:10.1371/journal.pone.0099863
  • 43. How close are we? • Data is at your fingertips instantly • Labs add data to a massive corpus of knowledge • Instantly available to all • Algorithms for mining, prediction • Millions of models accessible • Making decisions on experiments needed and running them • Data visualization, exploration is real-time, updated • Data follows you • Quoted news excerpt: “Sean Ekins, a computational drug discovery consultant at Collaborations in Chemistry in North Carolina, is much more skeptical. He notes pharma companies have found hundreds of antimalaria compounds more potent than TNP-470 and says that he is not convinced Eve can do QSAR. He wants to see Eve go head-to-head with a real computational chemist. ‘Eve should go back to the Garden of Eden and leave drug discovery to scientists who know what they are doing,’ Ekins says.”
  • 44. Conclusions • Computers and models do not replace scientists • A tool to help us sift through ideas quickly • Many examples have led to leads • Bigger data not needed for good models • More data becoming public • Can model ADME, bioactivity and more • Collaboration and software are important • Mobile apps have useful cheminformatics features and can help anyone do drug discovery • Models are compact (<1 MB) and portable • The age of model sharing is here
  • 45. Wanted • “Bigger” small molecule screening datasets • Preferably >500,000 to 1,000,000 molecules with data • To test how machine learning algorithms scale • Contact ekinssean@yahoo.com
  • 46. Acknowledgments: Nadia Litterman, Krishna Dole and all at CDD, Megan Coffee, SRI, MM4TB, Freundlich Lab and many others • Funding: Bill and Melinda Gates Foundation (Grant#49852), 1R41AI088893-01, 2R42AI088893-02, R43 LM011152-01, 9R44TR000942-02, 1R41AI108003-01, 1U19AI109713-01, MM4TB • Software: Biovia

Editor's Notes

  1. You do not need big data to show fundamental observations