Predictive in silico models
Crowd computing:
A new approach to predictive modeling
Jörg Bentzien
Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014
Introduction
Ph.D. in Chemistry, Univ. Münster, Germany, Prof. Martin Klessinger
Photochemical [2+2] Cycloaddition reactions
Post-Doctoral Studies at USC, Los Angeles, CA,
Nobel Laureate Prof Arieh Warshel
Enzymatic Reactions
Xencor, Monrovia, CA
Protein Design
Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT
Computational Chemist, Small Molecule Drug Design
ADMET Modeling
Crowdsourcing with Kaggle, 2012
Bentzien et al. Drug Discovery Today (2013), 18, 472 - 478.
Bentzien et al. J Phys Chem B (1998), 102, 2293 - 2301
Hayes et al. PNAS (2002), 99, 15926 - 15931
Bentzien, Klessinger, J Org Chem (1994), 59, 4887 - 4894
Why are we building predictive in silico Models?
We cannot make and test every compound.
• Reduce drug failure rates, de-risk compounds
• Select and prioritize compounds before synthesis
Predictive in silico models could help to achieve this task.
Lack of efficacy and safety/toxicity are the main reasons why drugs fail in the clinic.
Toxicity is the main reason for attrition in early drug development.
Reasons for attrition in clinical trials (Arrowsmith, Nat. Rev. Drug Disc. 2013, 12, 569):
[Figure: chart of attrition reasons; labeled categories: Efficacy, Safety]
Principle of in silico modeling
[Figure: example chemical structures]
Prediction of (ADMET) observable:
• In vivo effect
• In vitro effect
Code in machine-readable form
Calculate descriptors x1, x2, x3, x4, …:
• Physical chemical descriptors
• Molecular properties
• Fingerprints
• Substructure counts
• etc.
Generate predictive model:
• Random Forest
• SVM
• PLS
• CoMFA
• etc.
Select:
• Training Set
• Test Set
• Validation Set
Pi = f(x1, x2, x3, x4, …)
Regression:
[Figure: scatter plot with axes "pIC50 predicted" and "IC50 exp", both ranging from 0 to 10]
Classification:
                        Positive predicted     Negative predicted
Positive experimental   True Positive (TP)     False Negative (FN)
Negative experimental   False Positive (FP)    True Negative (TN)
Find a relation between the chemical structure and the observable, Pi (e.g.
genotoxicity), by first calculating descriptors, xi (e.g. physchem properties), and then
using a mathematical algorithm that calculates the observable Pi for each structure.
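The relation Pi = f(x1, x2, x3, …) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the descriptor values are invented, the split sizes are arbitrary, and a 1-nearest-neighbour lookup stands in for f (the slide lists Random Forest, SVM, PLS and CoMFA, none of which is implemented here).

```python
import math
import random

# Invented descriptor vectors (x1, x2, x3) paired with a binary observable P
# (1 = active, 0 = inactive). Real descriptors would come from a
# cheminformatics toolkit; these numbers are purely illustrative.
COMPOUNDS = [
    ((0.1, 0.3, 0.2), 0), ((0.4, 0.1, 0.5), 0), ((0.2, 0.6, 0.1), 0),
    ((0.5, 0.4, 0.3), 0), ((0.3, 0.2, 0.6), 0), ((0.6, 0.5, 0.4), 0),
    ((5.1, 4.8, 5.3), 1), ((4.7, 5.2, 5.0), 1), ((5.4, 5.1, 4.6), 1),
    ((4.9, 4.5, 5.2), 1), ((5.2, 5.5, 4.8), 1), ((4.6, 5.0, 5.1), 1),
]


def split(data, n_test, seed=0):
    """Randomly hold out n_test compounds; the rest form the training set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, x):
    """P = f(x1, x2, ...): here f returns the label of the nearest training compound."""
    nearest = min(train, key=lambda item: math.dist(item[0], x))
    return nearest[1]


train_set, test_set = split(COMPOUNDS, n_test=4)
predictions = [predict(train_set, x) for x, _ in test_set]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_set)) / len(test_set)
print(f"test-set accuracy: {accuracy:.2f}")
```

The two invented classes are well separated in descriptor space, so the toy model classifies the held-out compounds correctly; real ADMET endpoints are far less clean.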
Crowd-Sourcing applied to in silico Modeling:
The general idea
Traditional Model Building (single expert modeller) vs. the KAGGLE approach (taking advantage of the “crowd”): one vs. many
1. Define the problem
2. Prepare the Data
3. Generate a Model
4. Find a solution
AM1 predictions:
                             Ames positive predicted   Ames negative predicted   Total
Ames positive experimental   167                       16                        (183)
Ames negative experimental   21                        53                        (74)
[Figure: structures of a potent Ames negative compound and a potent Ames positive compound]
BI 621,079
hCB2 cAMP EC50 = 1.6 nM
[Figure: chemical structure of BI 621,079]
The Kaggle Challenge: The Data Set
Predicting a Biological Response 3/16/2012 – 6/15/2012
Data Set of 6512 compounds from Literature
CADD-BI performed:
• Data Set Clean-Up (6252 compounds: 3401 positive / 2851 negative)
• Random split into:
  Training Set (3751: 2034p/1717n)
  Public Test Set (625: 329p/296n)
  Private Validation Set (1876: 1038p/838n)
• Pre-calculated Descriptors (1776)
Participants had no knowledge of
• the modeled endpoint
• the descriptor types
• the chemical structures
BI offered $20,000 for the best three models
Participants could use any technology they wanted
BI will get the models
Objectives:
• Response to competition
• Quality of the algorithms/models
• Model transfer
Task:
Generate an Ames Classification model
1 = Ames positive
0 = Ames negative
This Challenge does NOT test all aspects of predictive
in silico modeling
Important aspects, e.g. data set selection, descriptor
selection/design, are missing
Study is a machine learning exercise, a proof of concept
Advantage: We know exactly what to expect,
comparative benchmarks available
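The random split described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the compound identifiers are placeholder integers, the seed is arbitrary, and the helper `random_split` is hypothetical. Only the set sizes and the overall class counts come from the slide, so the per-set positive/negative counts printed here will not reproduce the slide's numbers.

```python
import random


def random_split(items, sizes, seed=0):
    """Shuffle once, then cut into consecutive chunks of the requested sizes."""
    if sum(sizes) != len(items):
        raise ValueError("sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    chunks, start = [], 0
    for size in sizes:
        chunks.append(shuffled[start:start + size])
        start += size
    return chunks


# Class counts after clean-up, as given on the slide:
# 3401 Ames positives (1) and 2851 Ames negatives (0).
labels = [1] * 3401 + [0] * 2851
compound_ids = list(range(len(labels)))  # placeholder identifiers

training, public_test, private_validation = random_split(
    compound_ids, [3751, 625, 1876]
)

for name, subset in [
    ("training", training),
    ("public test", public_test),
    ("private validation", private_validation),
]:
    positives = sum(labels[i] for i in subset)
    print(f"{name}: {len(subset)} compounds, {positives}p/{len(subset) - positives}n")
```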
The Kaggle Challenge: The Competition
Predicting a Biological Response 3/16/2012 – 6/15/2012
Overwhelming response to competition!
Best models perform better than standard benchmarks:
Model                                        Rank   Log Loss
Best Model                                   1      0.37356
Random Forest                                352    0.41540
SVM                                          541    0.49503
Each Class Predicted with Probability 0.5    599    0.69250
On average 88 entries per day!
Optimal model generated after ~20 days
796 players (487 first time participants)
703 teams
8841 models submitted
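The log loss values above can be put in context with a small sketch. It assumes the standard binary log loss (mean negative log-likelihood of the true class); the slides do not spell out the formula, and the labels and probabilities below are invented.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true class for binary labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [1, 0, 1, 1, 0]  # invented example labels

# Predicting 0.5 for every compound gives ln 2 regardless of the labels.
uninformative = log_loss(labels, [0.5] * len(labels))
print(f"all 0.5: {uninformative:.5f}")

# Confident and mostly correct probabilities score lower (better).
informative = log_loss(labels, [0.9, 0.2, 0.8, 0.7, 0.1])
print(f"informative: {informative:.5f}")
```

Under this definition a constant 0.5 prediction scores ln 2 ≈ 0.693, close to the 0.69250 listed for the "each class predicted with probability 0.5" benchmark.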
The Kaggle Challenge: Measuring the Performance
Different performance metrics for in silico classification models
Model                 LogLoss   Sensitivity   Specificity   CCR     PPV     NPV     MCC
Benchmarks (leaderboard rank in parentheses):
Random Forest (352)   0.41540   0.855         0.802         0.829   0.843   0.818   0.66
SVM (540)             0.49503   0.792         0.743         0.768   0.793   0.743   0.55
Submitted models:
Rank 1                0.37356   0.841         0.820         0.830   0.853   0.806   0.66
Rank 2                0.37363   0.855         0.803         0.829   0.843   0.818   0.66
Rank 3                0.37407   0.860         0.807         0.833   0.846   0.823   0.67
Rank 10               0.37641   0.860         0.810         0.835   0.849   0.824   0.67
Rank 50               0.38229   0.856         0.805         0.831   0.845   0.819   0.66
Rank 100              0.38958   0.869         0.794         0.831   0.839   0.830   0.67
Differences among the top models in the logloss metric are small.
Different statistical measures lead to different rankings.
The RF benchmark has a high correct classification rate (CCR) and a high Matthews correlation coefficient (MCC).
Confusion matrices:
                        Positive predicted     Negative predicted
Positive experimental   True Positive (TP)     False Negative (FN)
Negative experimental   False Positive (FP)    True Negative (TN)
Model      TP    FN    FP    TN
Winning Teams:
Rank 1     873   165   151   687
Rank 2     888   150   165   673
Rank 3     893   145   162   676
Benchmarks:
RF         888   150   166   672
SVM        822   216   215   623
Other Models:
Rank 17    896   142   169   669
D27        781   257   215   623
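The metrics in the table can be recomputed from the confusion matrix counts. The sketch below uses the Rank 1 counts from the slide; the formulas are the standard definitions, and reading CCR as the mean of sensitivity and specificity is an assumption inferred from the reported numbers rather than something the slide states.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definition: CCR as the mean of sensitivity and specificity.
        "ccr": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Rank 1 confusion matrix as listed on the slide.
rank1 = classification_metrics(tp=873, fn=165, fp=151, tn=687)
for name, value in rank1.items():
    print(f"{name}: {value:.3f}")
```

Rounded to the precision used on the slide, these values agree with the Rank 1 row of the table.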
The Kaggle Challenge: Lessons learned
Technology aspects:
• 1st ranked team: R software, blending of several different RandomForest models, with
special feature selection and weighting techniques. Final models were merged using other
machine learning techniques.
• 2nd ranked team: R software, RandomForest, derived a new response variable depending on
value and observed activity. This may lead to better separation between actives and
inactives.
• 3rd ranked team: R software, RandomForest with special techniques to deal with
imbalanced data sets.
• The challenge was a success
• There was a great response
• Predictive in silico models were generated within a three-month time frame
• Models were at least as good as the literature
• Social aspects of crowd-sourcing were observed
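The blending mentioned for the 1st ranked team can be illustrated in its simplest form: averaging the predicted probabilities of several models. This is a generic sketch and not the winning team's method, which the slide describes only in outline. The labels, the two probability lists and the helper `blend` are invented for illustration.

```python
import math


def log_loss(y_true, y_prob):
    """Binary log loss; assumes all probabilities lie strictly between 0 and 1."""
    return -sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(y_true, y_prob)
    ) / len(y_true)


def blend(prob_lists, weights=None):
    """Weighted average of per-compound probabilities from several models."""
    n_models = len(prob_lists)
    weights = weights or [1 / n_models] * n_models
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists))
        for i in range(len(prob_lists[0]))
    ]


labels = [1, 0, 1, 0]           # invented labels
model_a = [0.9, 0.4, 0.6, 0.1]  # invented predictions of one model
model_b = [0.6, 0.1, 0.9, 0.4]  # invented predictions of another model

blended = blend([model_a, model_b])
for name, probs in [("model A", model_a), ("model B", model_b), ("blend", blended)]:
    print(f"{name}: log loss {log_loss(labels, probs):.4f}")
```

Because the logarithm is concave, the log loss of an averaged prediction is never worse than the average log loss of the individual models, which is one reason blending is popular when log loss is the scoring metric.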
The Kaggle Challenge: Lessons learned (continued)
Performance aspects:
• Model performance on par with best literature models, reached maximum performance for
data set
• Top ranking models are not significantly different from Random Forest benchmark
• Quick turn-around (3 months), code made available
• Model performance plateaued after 20 days
A standard RandomForest model is a good starting point.
In-house technology performs as well as more complex approaches.
Social aspects of competition:
• Very strong response: 703 teams, 8841 models submitted
• People from all over the world participated:
1st place team from the US (Harvard, Travelers Insurance)
2nd place team from Russia: graduate student from Moscow
3rd place team from China: graduate student from Beijing
• Winning teams had no CompChem/Chemistry background
• Formation of teams occurred during competition
Bentzien et al. “Crowd computing: Using competitive dynamics to develop and refine
highly predictive models”, Drug Discovery Today (2013), 18, 472 - 478.
The Kaggle Challenge: Lessons learned (continued)
Important aspects for successful crowdsourcing:
Design the Crowdsourcing Challenge:
Very clearly defined task/objective
Predefined precise metric to measure entries
Provide adequate incentive/prize money for participants
Participants:
Hosting the challenge either through third party or self
Internal/Restricted/Open Challenge
Promote the crowd sourcing challenge among key expert leaders
The Challenge:
Right barrier for participation
Fast turn-around/feedback to participants
Gamification
can provide additional incentive to participants
can lead to synergies amongst participants
After the Challenge:
Clear follow-up of what to do with the results
Does the challenge benefit your network/organization?
Crowd-Sourcing: Other examples
http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?_r=0
Lakhani et al., Nat Biotech,
2013, 31, 108-111.
www.innocentive.com
www.the-dream-project.com
Prill et al., Science Signaling, 2011, 4, 1-6
www.kaggle.com
www.topcoder.com
www.grants4targets.com
Crowd-Sourcing: A new way for solving problems(?)
Will crowd-sourcing solve all the problems? Likely not.
Crowd sourcing offers opportunities but it is not without risks.
For crowd sourcing to be successful/innovative the task needs to be structured right.
Murcko & Walters, “Alpha Shock” J Comput Aided Mol Des 2012, 26, 97-102
Kittur et al. “The Future of Crowd Work” 16th ACM Conference on Computer
Supported Cooperative Work (CSCW 2013)
Will crowd-sourcing be the future way of drug discovery?
Maybe, ….
Drug Discovery will definitely be different from what it is now.
Potential framework for future crowd work.
Requires
• Intelligent work decomposition
• sophisticated workflow design
• high level of collaborative work
• quality assurance.
Simple crowd work
• tendency to be mechanical
• not innovative
• has exploitive tendency
Example:
Amazon Mechanical Turk
Acknowledgements
Business Partners and Collaborators
ADMET-WG:
Jan Kriegl, Bernd Beck, Stefan
Scheuerer, Michael Durawa,
Pierre Bonneau, Sanjay Srivastava,
Michel Garneau, Hassan Kadhim,
Matthias Klemencic, Christian
Klein, Robert Happel, Gerald
Birringer, Dustin Smith, Scott
Oloff, Zheng Yang
Toxicology:
Warren Ku
Patricia Escobar
Ray Kemper
External Collaborators:
Ernst-Walter Knapp
Özgür Demir-Kamuk
Alex Tropsha
Curt Breneman
John Pu
Andy Fant
Zhuo Zhen
Medicinal Chemistry:
Robert Hughes
In silico VPR-team
All the MedChem users
Research IS:
Scott Oloff
David Thompson (PAC)
Zheng Yang
Scott Whalen
Cathy Farrell
Miguel Teodoro
IS-Innovation Team
Alex Renner
Structural Research:
Sandy Farmer
Neil Farrow
Ingo Mügge
All CADD colleagues
SKD:
Will Loging
Kaggle:
Kaggle Team
Kaggle Challenge Participants

Mais conteúdo relacionado

Mais procurados

Probability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning PerspectiveProbability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning Perspective
butest
 

Mais procurados (7)

Haro Pharmaceutical I-Corps@NIH 121014
Haro Pharmaceutical I-Corps@NIH 121014Haro Pharmaceutical I-Corps@NIH 121014
Haro Pharmaceutical I-Corps@NIH 121014
 
CV_of_ArulMurugan (2017_01_18)
CV_of_ArulMurugan (2017_01_18)CV_of_ArulMurugan (2017_01_18)
CV_of_ArulMurugan (2017_01_18)
 
Non-viral ocular delivery
Non-viral ocular deliveryNon-viral ocular delivery
Non-viral ocular delivery
 
Probability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning PerspectiveProbability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning Perspective
 
Walking toward challenging dream
Walking toward challenging dreamWalking toward challenging dream
Walking toward challenging dream
 
Avedro
AvedroAvedro
Avedro
 
Ophthalmology Innovation Showcase 2 - Mynosys Cellular Devices
Ophthalmology Innovation Showcase 2 - Mynosys Cellular DevicesOphthalmology Innovation Showcase 2 - Mynosys Cellular Devices
Ophthalmology Innovation Showcase 2 - Mynosys Cellular Devices
 

Destaque

Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
RobertGHunter
 
Data sources for UK Pharma Sales Analysts - Overview
Data sources for UK Pharma Sales Analysts - OverviewData sources for UK Pharma Sales Analysts - Overview
Data sources for UK Pharma Sales Analysts - Overview
VA Consultancy
 
Adding Value to Your Custom Research With Online Tools
Adding Value to Your Custom Research With Online ToolsAdding Value to Your Custom Research With Online Tools
Adding Value to Your Custom Research With Online Tools
guest6e3d4d
 
GfK - Technology Custom Research
GfK - Technology Custom ResearchGfK - Technology Custom Research
GfK - Technology Custom Research
rickvdw
 
The evolution of Sales
The evolution of SalesThe evolution of Sales
The evolution of Sales
Qorus Software
 

Destaque (11)

Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
Predictive in vitro & in silico Methods for Precision Medicine- Robert G. Hun...
 
Data sources for UK Pharma Sales Analysts - Overview
Data sources for UK Pharma Sales Analysts - OverviewData sources for UK Pharma Sales Analysts - Overview
Data sources for UK Pharma Sales Analysts - Overview
 
BCC Research
BCC ResearchBCC Research
BCC Research
 
Adding Value to Your Custom Research With Online Tools
Adding Value to Your Custom Research With Online ToolsAdding Value to Your Custom Research With Online Tools
Adding Value to Your Custom Research With Online Tools
 
About Lux Research, Inc. 2016
About Lux Research, Inc. 2016About Lux Research, Inc. 2016
About Lux Research, Inc. 2016
 
VONSCH® - Artificial AC mains
VONSCH® - Artificial AC mainsVONSCH® - Artificial AC mains
VONSCH® - Artificial AC mains
 
VONSCH® - Custom research and development of power electronics
VONSCH® - Custom research and development of power electronicsVONSCH® - Custom research and development of power electronics
VONSCH® - Custom research and development of power electronics
 
GfK - Technology Custom Research
GfK - Technology Custom ResearchGfK - Technology Custom Research
GfK - Technology Custom Research
 
Technology Scouting
Technology Scouting Technology Scouting
Technology Scouting
 
Open Source Pharma: The future of drug development
Open Source Pharma: The future of drug developmentOpen Source Pharma: The future of drug development
Open Source Pharma: The future of drug development
 
The evolution of Sales
The evolution of SalesThe evolution of Sales
The evolution of Sales
 

Semelhante a Open Source Pharma: Crowd computing: A new approach to predictive modeling

Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCN
Jeremy Yang
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
Philip Cheung
 
Article IVD March 2006
Article IVD March 2006Article IVD March 2006
Article IVD March 2006
Fabrice Sultan
 

Semelhante a Open Source Pharma: Crowd computing: A new approach to predictive modeling (20)

sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
 
Rules Reduction using Evolutionary Meta-Heuristics
Rules Reduction using  Evolutionary Meta-HeuristicsRules Reduction using  Evolutionary Meta-Heuristics
Rules Reduction using Evolutionary Meta-Heuristics
 
Nesher Tech I-Corps@NIH 121014
Nesher Tech I-Corps@NIH 121014Nesher Tech I-Corps@NIH 121014
Nesher Tech I-Corps@NIH 121014
 
A New Model for Informed Consent: The Impact of Open Science on the Responsib...
A New Model for Informed Consent: The Impact of Open Science on the Responsib...A New Model for Informed Consent: The Impact of Open Science on the Responsib...
A New Model for Informed Consent: The Impact of Open Science on the Responsib...
 
ReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for HealthReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for Health
 
iwatchjr | Next Generation Wearable for Young Generation
iwatchjr | Next Generation Wearable for Young Generationiwatchjr | Next Generation Wearable for Young Generation
iwatchjr | Next Generation Wearable for Young Generation
 
Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCN
 
The Cold Start Problem and Per-Group Personalization in Real-Life Emotion Rec...
The Cold Start Problem and Per-Group Personalization in Real-Life Emotion Rec...The Cold Start Problem and Per-Group Personalization in Real-Life Emotion Rec...
The Cold Start Problem and Per-Group Personalization in Real-Life Emotion Rec...
 
Machine Learning for Molecules
Machine Learning for MoleculesMachine Learning for Molecules
Machine Learning for Molecules
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Workflows supporting drug discovery against malaria
Workflows supporting drug discovery against malariaWorkflows supporting drug discovery against malaria
Workflows supporting drug discovery against malaria
 
Article IVD March 2006
Article IVD March 2006Article IVD March 2006
Article IVD March 2006
 
Digital Biomarkers for Huntington Disease
Digital Biomarkers for Huntington DiseaseDigital Biomarkers for Huntington Disease
Digital Biomarkers for Huntington Disease
 
Power and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesPower and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar Slides
 
AIQC - ISCB 2022.pdf
AIQC - ISCB 2022.pdfAIQC - ISCB 2022.pdf
AIQC - ISCB 2022.pdf
 
resumeliliane
resumelilianeresumeliliane
resumeliliane
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
Innovative Strategies For Successful Trial Design - Webinar Slides
Innovative Strategies For Successful Trial Design - Webinar SlidesInnovative Strategies For Successful Trial Design - Webinar Slides
Innovative Strategies For Successful Trial Design - Webinar Slides
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 
Artificial Intelligence in Medicine.pdf
Artificial Intelligence in Medicine.pdfArtificial Intelligence in Medicine.pdf
Artificial Intelligence in Medicine.pdf
 

Mais de Open Source Pharma

Mais de Open Source Pharma (6)

Open Source Pharma: Crowdsourcing wet labs with undergraduates
Open Source Pharma: Crowdsourcing wet labs with undergraduatesOpen Source Pharma: Crowdsourcing wet labs with undergraduates
Open Source Pharma: Crowdsourcing wet labs with undergraduates
 
Open Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experienceOpen Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experience
 
Open Source Pharma: Anti-tuberculosis drug overview
Open Source Pharma: Anti-tuberculosis drug overviewOpen Source Pharma: Anti-tuberculosis drug overview
Open Source Pharma: Anti-tuberculosis drug overview
 
Open Source Pharma: OSDD: An innovative model for distributed co-creation
Open Source Pharma: OSDD: An innovative model for distributed co-creationOpen Source Pharma: OSDD: An innovative model for distributed co-creation
Open Source Pharma: OSDD: An innovative model for distributed co-creation
 
Open Source Malaria July 2014
Open Source Malaria July 2014Open Source Malaria July 2014
Open Source Malaria July 2014
 
Open Source Pharma: Game changing for innovative medicine
Open Source Pharma: Game changing for innovative medicineOpen Source Pharma: Game changing for innovative medicine
Open Source Pharma: Game changing for innovative medicine
 

Último

Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Silpa
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 

Último (20)

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

Open Source Pharma: Crowd computing: A new approach to predictive modeling

  • 1. Predictive in silico models Crowd computing: A new approach to predictive modeling Jörg Bentzien Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014
  • 2. Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 2 Introduction Ph.D. In Chemistry, Univ. Münster, Germany, Prof Martin Klessinger Photochemical [2+2] Cycloaddition reactions Post-Doctoral Studies at USC, Los Angeles, CA, Nobel Laureate Prof Arieh Warshel Enzymatic Reactions Xencor, Monrovia, CA Protein Design Boehringer Ingelheim Pharmaceuticals, Ridgefield,CT ComputationalChemist, Small Molecule Drug Design ADMETModeling Crowdsourcing with Kaggle, 2012 Bentzien et al. Drug DiscoveryToday (2013), 18, 472 - 478. Bentzien et al. J PhysChem B (1998), 102, 2293 - 2301 Hayes et al. J PNAS (2002), 99, 15926 - 15931 Bentzien, Klessinger J OrgChem (1994), 59, 4887 - 4894
  • 3. 3 Why are we building predictive in silico Models? We cannot make and test every compound. • Reduce drug failure rates, de-risk compounds • Select and prioritize compounds before synthesis Predictive in silico models could help to achieve this task. Lack of efficacy and safety/toxicity are the main reasons why drugs fail in the clinic. Toxicity is the main reason for attrition in early drug development. Reasons for attrition in clinical trials: Arrowsmith, Nat. Rev. Drug Disc. 2013, 12, 569 Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 Efficacy Safety
  • 4. 4 Principal of in silico modeling N S O N H N S O O OH Prediction of (ADMET) Observable In vivo effect In vitro effect Code in Machine readable form Calculate Descriptors: Physical Chemical Descriptors Molecular Properties Fingerprints Substructure Counts etc. Generate predictive Model: Random Forest SVM PLS CoMFA etc. Select: Training Set Test Set Validation Set Pi = f(x1,x2,x3,x4, ….)x1,x2,x3,x4, …. N N H O N N O N + O O OH S SH H Br 0 5 10 0 5 10 pIC50precited IC50 exp Positive Predicted Negative Predicted Positive Exper.i True Positive (TP) False Negative (FN) Negative Experi. False Positive (FP) True Negative (TN) Regression: Classification: Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 Find a relation between the chemical structure and the observable, Pi (e.g. genotoxicity), by first calculating descriptors, xi (e.g. physchem properties), and then using a mathematical algorithm that calculates the observable Pi for each structure.
  • 5. BI 621,079 hCB2 cAMP EC50 = 1.6 nM O N H S Cl O O N N Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 5 Crowd-Sourcing applied to in silico Modeling: The general idea Traditional Model Building The KAGGLE approach Ames Positive predicted AM1 Ames Negative predicted AM1 Ames Positive experimental 167 (183) 16 Ames Negative experimental 21 53 (74) Potent Ames negative compound O F F F N N H Single Expert Modeller 3. Generate a Model 4. Find a solution 3. Generate a Model Taking advantage of the “crowd” one vs. many Potent Ames positive compound O N H 1. Define the problem 2. Prepare the Data
  • 6. Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 6 The Kaggle Challenge:The Data Set Predicting a Biological Response 3/16/2012 – 6/15/2012 Data Set of 6512 compounds from Literature CADD-BI performed: Data Set Clean-Up (6252: 3401p/2851n) Random split into: Training Set (3751: 2034p/1717n) PublicTest Set (625: 329p/296n) Private Validation Set (1876: 1038p/838n) Pre-calculated Descriptors (1776) Participants had no knowledge of • the modeled endpoint • the descriptor types • the chemical structures BI offered $20,000 for the best three models Participants could use any technology they wanted BI will get the models Objectives: • Response to competition • Quality of the algorithms/models • Model transfer Task: Generate an Ames Classification model 1 = Ames positive 0 = Ames negative This Challenge does NOT test all aspects of predictive in silico modeling Important aspects, e.g. data set selection, descriptor selection/design, are missing Study is a machine learning exercise, a proof of concept Advantage: We know exactly what to expect, comparative benchmarks available
  • 7. Open-Source Pharma Bellagio, Italy 7/16/2014 – 7/18/2014 7 The Kaggle Challenge:The Competition Predicting a Biological Response 3/16/2012 – 6/15/2012 Overwhelming response to competition! Best models perform better than standard benchmarks: Rank Log Loss Best Model 1 0.37356 Random Forest 352 0.41540 SVM 541 0.49503 Each Class Predicted with Probability 0.5 599 0.69250 On average 88 entries per day! Optimal model generated after ~20 Days 796 players (487 first time participants) 703 teams 8841 models submitted
  • 8. The Kaggle Challenge: Measuring the Performance
      Different performance metrics for in silico classification models:
        Model                    Rank   LogLoss   Sensitivity   Specificity   CCR     PPV     NPV     MCC
        Random Forest benchmark  352    0.41540   0.855         0.802         0.829   0.843   0.818   0.66
        SVM benchmark            540    0.49503   0.792         0.743         0.768   0.793   0.743   0.55
        Rank 1                   1      0.37356   0.841         0.820         0.830   0.853   0.806   0.66
        Rank 2                   2      0.37363   0.855         0.803         0.829   0.843   0.818   0.66
        Rank 3                   3      0.37407   0.860         0.807         0.833   0.846   0.823   0.67
        Rank 10                  10     0.37641   0.860         0.810         0.835   0.849   0.824   0.67
        Rank 50                  50     0.38229   0.856         0.805         0.831   0.845   0.819   0.66
        Rank 100                 100    0.38958   0.869         0.794         0.831   0.839   0.830   0.67
      • Differences among the top models in the log loss metric are small.
      • Different statistical measures lead to different rankings.
      • The RF benchmark has a high correct classification rate (CCR) and a high Matthews correlation coefficient (MCC).
      Confusion matrix layout:
                                Predicted positive    Predicted negative
        Positive experimental   True Positive (TP)    False Negative (FN)
        Negative experimental   False Positive (FP)   True Negative (TN)
      Confusion matrix counts:
        Model           TP    FN    FP    TN
        Winning teams
          Rank 1        873   165   151   687
          Rank 2        888   150   165   673
          Rank 3        893   145   162   676
        Benchmarks
          RF            888   150   166   672
          SVM           822   216   215   673
        Other models
          Rank 17       896   142   169   669
          D27           781   257   215   623
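The metrics in the table can be recomputed from the confusion matrix counts. The sketch below uses the textbook definitions and the Rank 1 counts from this slide; the function name `classification_metrics` is hypothetical. CCR is taken here as the mean of sensitivity and specificity, which is an assumption, though it reproduces the tabulated Rank 1 value.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    ccr = (sensitivity + specificity) / 2   # assumed: mean of the two rates
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": ccr,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
    }


# Rank 1 counts as listed on the slide.
rank1 = classification_metrics(tp=873, fn=165, fp=151, tn=687)
for name, value in rank1.items():
    print(f"{name}: {value:.3f}")
```

Because each metric weights the four counts differently, two models with nearly identical log loss can swap places when ranked by sensitivity, specificity, or MCC instead.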
  • 9. The Kaggle Challenge: Lessons learned
      • The challenge was a success
      • There was a great response
      • Predictive in silico models were generated within a three-month time frame
      • Models were at least as good as the literature
      • Social aspects of crowd-sourcing were observed
      Technology aspects:
      • 1st ranked team: R software, blending of several different RandomForest models, with special feature selection and weighting techniques. Final models were merged using other machine learning techniques.
      • 2nd ranked team: R software, RandomForest, derived a new response variable depending on value and observed activity. This may lead to better separation between actives and inactives.
      • 3rd ranked team: R software, RandomForest with special techniques to deal with imbalanced data sets.
  • 10. The Kaggle Challenge: Lessons learned (continued)
      Performance aspects:
      • Model performance on par with best literature models, reached maximum performance for data set
      • Top ranking models are not significantly different from the Random Forest benchmark
      • Quick turn-around (3 months), code made available
      • Model performance plateaued after 20 days
      A standard RandomForest model is a good starting point. In-house technology performs as well as more complex approaches.
      Social aspects of competition:
      • Very strong response: 703 teams, 8841 models submitted
      • People from all over the world participated:
          1st place team from the US (Harvard, Travelers Insurance)
          2nd place team from Russia (graduate student from Moscow)
          3rd place from China (graduate student from Beijing)
      • Winning teams had no CompChem/Chemistry background
      • Formation of teams occurred during the competition
      Bentzien et al. “Crowd computing: Using competitive dynamics to develop and refine highly predictive models”, Drug Discovery Today (2013), 18, 472 - 478.
  • 11. The Kaggle Challenge: Lessons learned (continued)
      Important aspects for successful crowdsourcing:
      Design the crowdsourcing challenge:
      • Very clearly defined task/objective
      • Predefined, precise metric to measure entries
      • Provide adequate incentive/prize money for participants
      Participants:
      • Hosting the challenge either through a third party or by yourself
      • Internal/restricted/open challenge
      • Promote the crowdsourcing challenge among key expert leaders
      The challenge:
      • Right barrier for participation
      • Fast turn-around/feedback to participants
      • Gamification can provide additional incentive to participants and can lead to synergies amongst participants
      After the challenge:
      • Clear follow-up of what to do with the results
      • Does the challenge benefit your network/organization?
  • 12. Crowd-Sourcing: Other examples
      • www.kaggle.com
      • www.topcoder.com
      • www.innocentive.com
      • www.the-dream-project.com
      • www.grants4targets.com
      • http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?_r=0
      • Lakhani et al., Nat Biotech, 2013, 31, 108-111.
      • Prill et al., Science Signaling, 2011, 4, 1-6
  • 13. Crowd-Sourcing: A new way for solving problems(?)
      Will crowd-sourcing solve all the problems? Likely not. Crowd-sourcing offers opportunities, but it is not without risks. For crowd-sourcing to be successful/innovative, the task needs to be structured right.
      Will crowd-sourcing be the future way of drug discovery? Maybe, …. Drug discovery will definitely be different from what it is now.
      Simple crowd work (example: Amazon Mechanical Turk):
      • tendency to be mechanical
      • not innovative
      • has exploitive tendency
      Potential framework for future crowd work. Requires:
      • intelligent work decomposition
      • sophisticated workflow design
      • high level of collaborative work
      • quality assurance
      Murcko & Walters, “Alpha Shock”, J Comput Aided Mol Des 2012, 26, 97-102
      Kittur et al., “The Future of Crowd Work”, 16th ACM Conference on Computer Supported Cooperative Work (CSCW 2013)
  • 14. Acknowledgements
      Business Partners and Collaborators
      ADMET-WG: Jan Kriegl, Bernd Beck, Stefan Scheuerer, Michael Durawa, Pierre Bonneau, Sanjay Srivastava, Michel Garneau, Hassan Kadhim, Matthias Klemencic, Christian Klein, Robert Happel, Gerald Birringer, Dustin Smith, Scott Oloff, Zheng Yang
      Toxicology: Warren Ku, Patricia Escobar, Ray Kemper
      External Collaborators: Ernst-Walter Knapp, Özgür Demir-Kamuk, Alex Tropsha, Curt Breneman, John Pu, Andy Fant, Zhuo Zhen
      Medicinal Chemistry: Robert Hughes, In silico VPR-team, all the MedChem users
      Research IS: Scott Oloff, David Thompson (PAC), Zheng Yang, Scott Whalen, Cathy Farrell, Miguel Teodoro, IS-Innovation Team, Alex Renner
      Structural Research: Sandy Farmer, Neil Farrow, Ingo Mügge, all CADD colleagues
      SKD: Will Loging
      Kaggle: Kaggle Team, Kaggle Challenge Participants