SlideShare a Scribd company logo
1 of 1
Download to read offline
U.S. Environmental Protection Agency
Office of Research and Development
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Project (CERAPP)
Kamel Mansouri1,2, Jayaram Kancherla1,2, Ann Richard2 and Richard Judson2
1Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA
2National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, RTP, NC, USA
Abstract
Humans are potentially exposed to tens of thousands of man-made chemicals in the environment. It is well known
that some environmental chemicals mimic natural hormones and thus have the potential to be endocrine disruptors. Most
of these environmental chemicals have never been tested for their ability to disrupt the endocrine system, in particular,
their ability to interact with the estrogen receptor. EPA needs tools to prioritize thousands of chemicals, for instance in the
Endocrine Disruptor Screening Program (EDSP). This project was intended to be a demonstration of the use of predictive
computational models on HTS data including ToxCast and Tox21 assays to prioritize a large chemical universe of 32464
unique structures for one specific molecular target – the estrogen receptor. CERAPP combined multiple computational
models for prediction of estrogen receptor activity, and used the predicted results to build a unique consensus model.
Models were developed in collaboration between 17 groups in the U.S. and Europe and applied to predict the common set
of chemicals. Structure-based techniques such as docking and several QSAR modeling approaches were employed, mostly
using a common training set of 1677 compounds provided by U.S. EPA, to build a total of 42 classification models and 8
regression models for binding, agonist and antagonist activity. All predictions were evaluated on ToxCast data and on an
external validation set collected from the literature. In order to overcome the limitations of single models, a consensus was
built weighting models based on their prediction accuracy scores (including sensitivity and specificity against training and
external sets). Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. The final consensus
predicted 4001 chemicals as actives to be considered as high priority for further testing and 6742 as suspicious chemicals.
Consensus model
Kamel Mansouri l mansouri.kamel@epa.gov l 919-541-0545
Project planning
Conclusions
β€’ Successful collaboration of 17 international research groups resulted in consensus predictions.
β€’ Most individual models performed very well on both ToxCast and literature data.
β€’ Model accuracy differences between ToxCast and literature may be due to noise in the literature data.
β€’ A total number of 4001 out of 32464 chemicals are predicted by the consensus as active binders. For
prioritization purposes, 6742 additional chemicals (down to 0.2 positive concordance) could be
considered as suspicious.
β€’ The consensus model predictions correlate better with literature data from multiple sources, likely due
to greater noise in data with unique or low number of sources.
β€’ Consensus prediction results are being used in the EDSP program. See http://actor.epa.gov/edsp21.
Disclaimer: The views expressed in this poster are those of the authors and do not necessarily reflect the views or policies of the
U.S. Environmental Protection Agency.
Chemical structure curation:
The main steps of this KNIME workflow were:
β€’ Check the validity of the molecular file format
β€’ Retrieve any missing structures from web-services
β€’ Remove the inorganic and metallo-organic structures
β€’ Remove salts and counter ions and fulfill valence
β€’ Standardize stereo-isomers and tautomers
β€’ Remove duplicates
Data preparation
Participants:
DTU: Technical University of Denmark/ National Food
Institute
EPA/NCCT: U.S. Environmental Protection Agency /
National Center for Computational Toxicology
FDA/NCTR/DBB: U.S. Food and Drug Administration/
National Center for Toxicological Research/Division of
Bioinformatics and Biostatistics
FDA/NCTR/DSB: U.S. Food and Drug Administration/
National Center for Toxicological Research/Division of
Systems Biology
Helmholtz/ISB: Helmholtz Zentrum Muenchen/
Institute of Structural Biology
ILS&EPA/NCCT: ILS Inc & EPA/NCCT
IRCSS: Istituto di Ricerche Farmacologiche β€œMario
Negri”
JRC_Ispra: Joint Research Centre of the European
Commission, Ispra.
LockheedMartin&EPA: Lockheed Martin IS&GS/
High Performance Computing
NIH/NCATS: National Institutes of Health/ National
Center for Advancing Translational Sciences
NIH/NCI: National Institutes of Health/ National
Cancer Institute
RIFM: Research Institute for Fragrance Materials, Inc
UMEA/Chemistry: University of UMEA/ Chemistry
department
UNC/MML: University of North Carolina/ Laboratory
for Molecular Modeling
UniBA/Pharma: University of Bari/ Department of
Pharmacy
UNIMIB/Michem: University of Milano-Bicocca/
Milano Chemometrics and QSAR Research Group
UNISTRA/Infochim: University of Strasbourg/
ChemoInformatique
Steps Tasks
1: Structures
curation
- Extract chemical structures from regulatory sources
- Design and document a workflow for structure cleaning
- Deliver the QSAR-ready training set and prediction set
to all participants
2: Experimental
data preparation
- Collect and clean experimental data for the evaluation
set
- Define the strategy used to evaluate models predictions
3: Modeling &
predictions
- Train or refine the models using the training set
- Compiling of predictions and applicability domains for
evaluation
4: Model
evaluation
- Analyze the training and evaluation sets for consistency
- Evaluate the predictions of each model separately
5: Consensus
strategy
- Define and calculate scores for each model based on the
evaluation step
- Define a weighting scheme from the scores
6: Consensus
modeling &
validation
- Combine the individual predictions based on the
weighting scheme and create a consensus prediction
- Validate the consensus model using a new external
dataset
Sources of chemicals:
β€’ EDSP Universe (10K)
β€’ Chemicals with known use (40K) (CPCat & ACToR)
β€’ Canadian Domestic Substances List (DSL) (23K)
β€’ EPA DSSTox – structures of EPA/FDA interest (15K)
β€’ ToxCast and Tox21 (In vitro ER data) (8K)
Sets of unique chemical structures:
β€’ Training set (ToxCast): 1677 Chemicals
β€’ Prediction Set: 32464 Chemicals
Active Inactive Total
Binding 1982 5301 7283
Agonist 350 5969 6319
Antagonist 284 6255 6539
Total 2617 7024 7522
Inactive V. Weak Weak Moderate Strong Total
Binding 5042 685 894 72 77 6770
Agonist 5892 19 179 31 42 6163
Antagonist 6221 76 188 10 10 6505
Total 6892 702 9016 81 93 7253
Models and evaluation
ROC curve of the consensus predictions evaluated using
variable number of literature sources
Box plot of the correlation between the positive concordance of the
categorical models and the potency level predicted by the continuous
models, positive classes predicted by the consensus model
Variation of the balanced accuracy with positive concordance
thresholds at different numbers of literature sources and
corresponding number of activesPlot of the consensus accuracy showing the importance of
using multiple literature sources
ToxCast data
Literature data
(All: 7283)
Literature data
(>6 sources: 1209)
Sensitivity 0.93 0.30 0.87
Specificity 0.97 0.91 0.94
Balanced accuracy 0.95 0.61 0.91
ToxCast data
Literature data
(All: 7283)
ObservedPredicted Actives Inactives Actives Inactives
Actives 83 6 597 1385
Inactives 40 1400 463 4838
Evaluation set for binary categorical models Evaluation set for continuous models
BA: balanced accuracy inside the applicability domain for the unambiguous compounds. % pred: percentage of the predicted within 32464 chemicals.
π‘ π‘π‘œπ‘Ÿπ‘’_1 = 1
3
𝑁𝐸𝑅 π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ βˆ— 𝑁_π‘π‘Ÿπ‘’π‘‘ π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘
𝑁_π‘‘π‘œπ‘‘ π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘
+
𝑁_π‘π‘Ÿπ‘’π‘‘
𝑁_π‘‘π‘œπ‘‘
+ 1
π‘π‘“π‘–π‘™π‘‘π‘’π‘Ÿ
𝑖=1
𝑁 π‘“π‘–π‘™π‘‘π‘’π‘Ÿ
𝑁𝐸𝑅𝑖 βˆ— 𝑁_π‘π‘Ÿπ‘’π‘‘π‘–
𝑁_π‘‘π‘œπ‘‘π‘–
π‘ π‘π‘œπ‘Ÿπ‘’_2 = 1
2 𝑁𝐸𝑅 π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ + 𝑁𝐸𝑅 π‘Žπ‘™π‘™_π‘“π‘–π‘™π‘‘π‘’π‘Ÿπ‘ 
Categorical models:
Binding: 22 models
Agonists: 11 models
Antagonists: 9 models
Continuous models:
Binding: 3 models
Agonists: 3 models
Antagonists: 2 models
Most models predict most
chemicals as inactive
Prioritization
757 chemicals have >75%
positive concordance
DTU EPA_
NCCT
FDA_NCTR
_DBB
FDA_NCTR
_DSB
OCHEM ILS__
EPA
IRCCS_
CART
IRCCS_
Ruleset
JRC__
_Ispra
Lockheed
&Martin_
EPA_
NIH__
NCATS
NIH__
NCI___
GUSAR
NIH_
NCI_
PASS
RIFM UMEA UNC_
MML
UNIBA UNIMIB_
Michem
UNISTRA
BA 0.78 0.69 0.68 0.66 0.72 0.75 0.75 0.62 0.67 0.66 0.65 0.69 0.66 0.65 0.70 0.65 0.73 0.71 0.60
% pred 49.48 100 100 0.62 96.47 99.9 89.20 94.88 100 100 99.97 94.87 97.4 100 99.89 100 46.75 36.44 100
Score_1 0.43 0.82 0.87 - 0.83 0.82 0.78 0.75 0.77 0.75 0.77 0.88 0.78 0.78 0.82 0.80 0.40 0.32 0.80
Score_2 0.80 0.78 0.84 0.69 0.80 0.79 0.77 0.77 0.74 0.75 0.67 0.84 0.76 0.69 0.76 0.73 0.80 0.85 0.73
Project goal:
The aim of CERAPP was to combine multiple
computational models for prediction of estrogen receptor activity
into a unique consensus model in order to prioritize a large
chemical of 32464 structures.
Concordance of binding models on the active compounds of the prediction set.
Statistics of the consensus model
Modelstatistics
Random

More Related Content

What's hot

OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSOPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSKamel Mansouri
Β 
Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...Kamel Mansouri
Β 
presentation on in silico studies
presentation on in silico studiespresentation on in silico studies
presentation on in silico studiesShaik Sana
Β 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...Kamel Mansouri
Β 
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Kamel Mansouri
Β 
Applying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicityApplying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicitySean Ekins
Β 
Alternative methods to animal toxicity testing
Alternative methods to animal toxicity testingAlternative methods to animal toxicity testing
Alternative methods to animal toxicity testingpriyachhikara1
Β 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Kamel Mansouri
Β 
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...Kamel Mansouri
Β 
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Chanin Nantasenamat
Β 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...Andrew McEachran
Β 

What's hot (20)

OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSOPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
Β 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
Β 
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Β 
Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...
Β 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Β 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Β 
presentation on in silico studies
presentation on in silico studiespresentation on in silico studies
presentation on in silico studies
Β 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
Β 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...
Β 
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Β 
The influence of data curation on QSAR Modeling – examining issues of qualit...
 The influence of data curation on QSAR Modeling – examining issues of qualit... The influence of data curation on QSAR Modeling – examining issues of qualit...
The influence of data curation on QSAR Modeling – examining issues of qualit...
Β 
Applying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicityApplying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicity
Β 
Docking
DockingDocking
Docking
Β 
Alternative methods to animal toxicity testing
Alternative methods to animal toxicity testingAlternative methods to animal toxicity testing
Alternative methods to animal toxicity testing
Β 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Β 
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
Β 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Β 
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Β 
High Throughput Screening
High Throughput Screening High Throughput Screening
High Throughput Screening
Β 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
Β 

Viewers also liked

Stem cells biology and their application in clinical medicine
Stem cells biology and their application in clinical medicineStem cells biology and their application in clinical medicine
Stem cells biology and their application in clinical medicineRajesh Shukla
Β 
In vitro fertilization and embryo transfer in humans
In vitro fertilization and embryo transfer in humansIn vitro fertilization and embryo transfer in humans
In vitro fertilization and embryo transfer in humansHasnahana Chetia
Β 
In vitro fertilization embryo transfer
In vitro fertilization embryo transferIn vitro fertilization embryo transfer
In vitro fertilization embryo transferNiyamat Panjesha
Β 
Cloning Powerpoint
Cloning PowerpointCloning Powerpoint
Cloning Powerpointpau_ole_aloha
Β 

Viewers also liked (6)

Stem cells biology and their application in clinical medicine
Stem cells biology and their application in clinical medicineStem cells biology and their application in clinical medicine
Stem cells biology and their application in clinical medicine
Β 
In vitro fertilization and embryo transfer in humans
In vitro fertilization and embryo transfer in humansIn vitro fertilization and embryo transfer in humans
In vitro fertilization and embryo transfer in humans
Β 
I.V.F
I.V.FI.V.F
I.V.F
Β 
Cloning
CloningCloning
Cloning
Β 
In vitro fertilization embryo transfer
In vitro fertilization embryo transferIn vitro fertilization embryo transfer
In vitro fertilization embryo transfer
Β 
Cloning Powerpoint
Cloning PowerpointCloning Powerpoint
Cloning Powerpoint
Β 

Similar to EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). Presented at SOT 2015, San Diego, CA, March 22 - 26, 2015.

CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityKamel Mansouri
Β 
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Kamel Mansouri
Β 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Sunghwan Kim
Β 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Prof. Wim Van Criekinge
Β 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology Sean Ekins
Β 
Crofton Evolution of Toxicology
Crofton Evolution of ToxicologyCrofton Evolution of Toxicology
Crofton Evolution of ToxicologyKevinCrofton
Β 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicologySean Ekins
Β 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekingeProf. Wim Van Criekinge
Β 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Prof. Wim Van Criekinge
Β 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
Β 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
Β 
Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1Ankur Khanna
Β 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsKamel Mansouri
Β 
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Kamel Mansouri
Β 

Similar to EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). Presented at SOT 2015, San Diego, CA, March 22 - 26, 2015. (20)

CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
Β 
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Β 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
Β 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
Β 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology
Β 
Crofton Evolution of Toxicology
Crofton Evolution of ToxicologyCrofton Evolution of Toxicology
Crofton Evolution of Toxicology
Β 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicology
Β 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
Β 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Β 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
Β 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
Β 
Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1
Β 
Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...
Β 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
Β 
Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
Β 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology Problems
Β 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
Β 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Β 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
Β 
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Β 

More from Kamel Mansouri

Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Kamel Mansouri
Β 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
Β 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...Kamel Mansouri
Β 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...Kamel Mansouri
Β 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...Kamel Mansouri
Β 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
Β 

More from Kamel Mansouri (6)

Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...
Β 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
Β 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
Β 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
Β 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
Β 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
Β 

Recently uploaded

Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
Β 
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls AgencyHire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
Β 
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCR
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCRStunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCR
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
Β 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
Β 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
Β 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
Β 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
Β 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...SΓ©rgio Sacani
Β 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
Β 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
Β 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
Β 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
Β 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
Β 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSΓ©rgio Sacani
Β 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
Β 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSΓ©rgio Sacani
Β 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
Β 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
Β 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...SΓ©rgio Sacani
Β 

Recently uploaded (20)

Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
Β 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
Β 
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls AgencyHire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire πŸ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Β 
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCR
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCRStunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCR
Stunning βž₯8448380779β–» Call Girls In Panchshil Enclave Delhi NCR
Β 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Β 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
Β 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
Β 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
Β 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
Β 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
Β 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
Β 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
Β 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Β 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
Β 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Β 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
Β 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Β 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
Β 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
Β 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Β 

EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). Presented at SOT 2015, San Diego, CA, March 22 - 26, 2015.

  • 1. U.S. Environmental Protection Agency Office of Research and Development EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) Kamel Mansouri1,2, Jayaram Kancherla1,2, Ann Richard2 and Richard Judson2 1Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA 2National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, RTP, NC, USA Abstract Humans are potentially exposed to tens of thousands of man-made chemicals in the environment. It is well known that some environmental chemicals mimic natural hormones and thus have the potential to be endocrine disruptors. Most of these environmental chemicals have never been tested for their ability to disrupt the endocrine system, in particular, their ability to interact with the estrogen receptor. EPA needs tools to prioritize thousands of chemicals, for instance in the Endocrine Disruptor Screening Program (EDSP). This project was intended to be a demonstration of the use of predictive computational models on HTS data including ToxCast and Tox21 assays to prioritize a large chemical universe of 32464 unique structures for one specific molecular target – the estrogen receptor. CERAPP combined multiple computational models for prediction of estrogen receptor activity, and used the predicted results to build a unique consensus model. Models were developed in collaboration between 17 groups in the U.S. and Europe and applied to predict the common set of chemicals. Structure-based techniques such as docking and several QSAR modeling approaches were employed, mostly using a common training set of 1677 compounds provided by U.S. EPA, to build a total of 42 classification models and 8 regression models for binding, agonist and antagonist activity. All predictions were evaluated on ToxCast data and on an external validation set collected from the literature. In order to overcome the limitations of single models, a consensus was built weighting models based on their prediction accuracy scores (including sensitivity and specificity against training and external sets). Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. The final consensus predicted 4001 chemicals as actives to be considered as high priority for further testing and 6742 as suspicious chemicals. Consensus model Kamel Mansouri l mansouri.kamel@epa.gov l 919-541-0545 Project planning Conclusions β€’ Successful collaboration of 17 international research groups resulted in consensus predictions. β€’ Most individual models performed very well on both ToxCast and literature data. β€’ Model accuracy differences between ToxCast and literature may be due to noise in the literature data. β€’ A total number of 4001 out of 32464 chemicals are predicted by the consensus as active binders. For prioritization purposes, 6742 additional chemicals (down to 0.2 positive concordance) could be considered as suspicious. β€’ The consensus model predictions correlate better with literature data from multiple sources, likely due to greater noise in data with unique or low number of sources. β€’ Consensus prediction results are being used in the EDSP program. See http://actor.epa.gov/edsp21. Disclaimer: The views expressed in this poster are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Chemical structure curation: The main steps of this KNIME workflow were: β€’ Check the validity of the molecular file format β€’ Retrieve any missing structures from web-services β€’ Remove the inorganic and metallo-organic structures β€’ Remove salts and counter ions and fulfill valence β€’ Standardize stereo-isomers and tautomers β€’ Remove duplicates Data preparation Participants: DTU: Technical University of Denmark/ National Food Institute EPA/NCCT: U.S. Environmental Protection Agency / National Center for Computational Toxicology FDA/NCTR/DBB: U.S. Food and Drug Administration/ National Center for Toxicological Research/Division of Bioinformatics and Biostatistics FDA/NCTR/DSB: U.S. Food and Drug Administration/ National Center for Toxicological Research/Division of Systems Biology Helmholtz/ISB: Helmholtz Zentrum Muenchen/ Institute of Structural Biology ILS&EPA/NCCT: ILS Inc & EPA/NCCT IRCSS: Istituto di Ricerche Farmacologiche β€œMario Negri” JRC_Ispra: Joint Research Centre of the European Commission, Ispra. LockheedMartin&EPA: Lockheed Martin IS&GS/ High Performance Computing NIH/NCATS: National Institutes of Health/ National Center for Advancing Translational Sciences NIH/NCI: National Institutes of Health/ National Cancer Institute RIFM: Research Institute for Fragrance Materials, Inc UMEA/Chemistry: University of UMEA/ Chemistry department UNC/MML: University of North Carolina/ Laboratory for Molecular Modeling UniBA/Pharma: University of Bari/ Department of Pharmacy UNIMIB/Michem: University of Milano-Bicocca/ Milano Chemometrics and QSAR Research Group UNISTRA/Infochim: University of Strasbourg/ ChemoInformatique Steps Tasks 1: Structures curation - Extract chemical structures from regulatory sources - Design and document a workflow for structure cleaning - Deliver the QSAR-ready training set and prediction set to all participants 2: Experimental data preparation - Collect and clean experimental data for the evaluation set - Define the strategy used to evaluate models predictions 3: Modeling & predictions - Train or refine the models using the training set - Compiling of predictions and applicability domains for evaluation 4: Model evaluation - Analyze the training and evaluation sets for consistency - Evaluate the predictions of each model separately 5: Consensus strategy - Define and calculate scores for each model based on the evaluation step - Define a weighting scheme from the scores 6: Consensus modeling & validation - Combine the individual predictions based on the weighting scheme and create a consensus prediction - Validate the consensus model using a new external dataset Sources of chemicals: β€’ EDSP Universe (10K) β€’ Chemicals with known use (40K) (CPCat & ACToR) β€’ Canadian Domestic Substances List (DSL) (23K) β€’ EPA DSSTox – structures of EPA/FDA interest (15K) β€’ ToxCast and Tox21 (In vitro ER data) (8K) Sets of unique chemical structures: β€’ Training set (ToxCast): 1677 Chemicals β€’ Prediction Set: 32464 Chemicals Active Inactive Total Binding 1982 5301 7283 Agonist 350 5969 6319 Antagonist 284 6255 6539 Total 2617 7024 7522 Inactive V. Weak Weak Moderate Strong Total Binding 5042 685 894 72 77 6770 Agonist 5892 19 179 31 42 6163 Antagonist 6221 76 188 10 10 6505 Total 6892 702 9016 81 93 7253 Models and evaluation ROC curve of the consensus predictions evaluated using variable number of literature sources Box plot of the correlation between the positive concordance of the categorical models and the potency level predicted by the continuous models, positive classes predicted by the consensus model Variation of the balanced accuracy with positive concordance thresholds at different numbers of literature sources and corresponding number of activesPlot of the consensus accuracy showing the importance of using multiple literature sources ToxCast data Literature data (All: 7283) Literature data (>6 sources: 1209) Sensitivity 0.93 0.30 0.87 Specificity 0.97 0.91 0.94 Balanced accuracy 0.95 0.61 0.91 ToxCast data Literature data (All: 7283) ObservedPredicted Actives Inactives Actives Inactives Actives 83 6 597 1385 Inactives 40 1400 463 4838 Evaluation set for binary categorical models Evaluation set for continuous models BA: balanced accuracy inside the applicability domain for the unambiguous compounds. % pred: percentage of the predicted within 32464 chemicals. π‘ π‘π‘œπ‘Ÿπ‘’_1 = 1 3 𝑁𝐸𝑅 π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ βˆ— 𝑁_π‘π‘Ÿπ‘’π‘‘ π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ 𝑁_π‘‘π‘œπ‘‘ π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ + 𝑁_π‘π‘Ÿπ‘’π‘‘ 𝑁_π‘‘π‘œπ‘‘ + 1 π‘π‘“π‘–π‘™π‘‘π‘’π‘Ÿ 𝑖=1 𝑁 π‘“π‘–π‘™π‘‘π‘’π‘Ÿ 𝑁𝐸𝑅𝑖 βˆ— 𝑁_π‘π‘Ÿπ‘’π‘‘π‘– 𝑁_π‘‘π‘œπ‘‘π‘– π‘ π‘π‘œπ‘Ÿπ‘’_2 = 1 2 𝑁𝐸𝑅 π‘‡π‘œπ‘₯πΆπ‘Žπ‘ π‘‘ + 𝑁𝐸𝑅 π‘Žπ‘™π‘™_π‘“π‘–π‘™π‘‘π‘’π‘Ÿπ‘  Categorical models: Binding: 22 models Agonists: 11 models Antagonists: 9 models Continuous models: Binding: 3 models Agonists: 3 models Antagonists: 2 models Most models predict most chemicals as inactive Prioritization 757 chemicals have >75% positive concordance DTU EPA_ NCCT FDA_NCTR _DBB FDA_NCTR _DSB OCHEM ILS__ EPA IRCCS_ CART IRCCS_ Ruleset JRC__ _Ispra Lockheed &Martin_ EPA_ NIH__ NCATS NIH__ NCI___ GUSAR NIH_ NCI_ PASS RIFM UMEA UNC_ MML UNIBA UNIMIB_ Michem UNISTRA BA 0.78 0.69 0.68 0.66 0.72 0.75 0.75 0.62 0.67 0.66 0.65 0.69 0.66 0.65 0.70 0.65 0.73 0.71 0.60 % pred 49.48 100 100 0.62 96.47 99.9 89.20 94.88 100 100 99.97 94.87 97.4 100 99.89 100 46.75 36.44 100 Score_1 0.43 0.82 0.87 - 0.83 0.82 0.78 0.75 0.77 0.75 0.77 0.88 0.78 0.78 0.82 0.80 0.40 0.32 0.80 Score_2 0.80 0.78 0.84 0.69 0.80 0.79 0.77 0.77 0.74 0.75 0.67 0.84 0.76 0.69 0.76 0.73 0.80 0.85 0.73 Project goal: The aim of CERAPP was to combine multiple computational models for prediction of estrogen receptor activity into a unique consensus model in order to prioritize a large chemical of 32464 structures. Concordance of binding models on the active compounds of the prediction set. Statistics of the consensus model Modelstatistics Random