SlideShare uma empresa Scribd logo
1 de 75
Delivering The Benefits of Chemical-Biological
Integration in Computational Toxicology
at the EPA
Antony Williams
National Center for Computational Toxicology
U.S. Environmental Protection Agency, RTP, NC
November 7th, 2016
German Cheminformatics Conference, Fulda, Germany
This work was reviewed by the U.S. EPA and approved for presentation but does not necessarily reflect official Agency policy.
http://www.orcid.org/0000-0002-2668-4821
Overview
• An introduction to NCCT
• Our Data and Our Dashboards
• The CompTox Chemistry Dashboard
– Architecture
– Data and Models: e,g, Physicochemical Properties
– Application to Non-Targeted Screening
• Coming Soon
• Future Work
1
Who is NCCT?
• National Center for Computational Toxicology – part of EPA’s
Office of Research and Development
• Research driven by EPA’s Chemical Safety for Sustainability
Research Program
– Develop new approaches to evaluate the safety of chemicals
– Integrate advances in biology, biotechnology, chemistry, exposure
science and computer science
2
• Goal - To identify chemical exposures that
may disrupt biological processes and cause
adverse outcomes.
Number of Chemicals in Commerce
Presents Regulatory Challenges
ONE LIST: EPA’s Endocrine Disruptor Screening Program (EDSP) List # of Compounds
Conventional Active Ingredients 838
Antimicrobial Active Ingredients 324
Biological Pesticide Active Ingredients 287
Non Food Use Inert Ingredients 2,211
Food Use Inert Ingredients 1,536
Fragrances used as Inert Ingredients 1,529
Safe Drinking Water Act Chemicals 3,616
TOTAL 10,341
December, 2014 Panel: “Scientific Issues Associated with
Integrated Endocrine Bioactivity and Exposure-Based Prioritization
and Screening“ DOCKET NUMBER: EPA–HQ–OPP–2014–0614
EDSP
Chemical
Universe
10,000
chemicals
Completed testing for
67 chemicals
Current testing for
107 chemicals
Exposure Data Cannot Keep Pace
with Regulatory Needs
TSCA: > 84,000
P.P. Egeghy et al. Sci Total Environ. 414 (2012) 159–166
We need more data and derivative
models and algorithms
• Our outputs include a lot of data, models, algorithms
and software applications
• We produce Open Data – we want people to
interrogate it, learn from it, develop understanding
5
High-Throughput Bioactivity to
Identify Potential Hazards
 Assays in dose-response format (50% activity concentration
– AC50 – and efficacy if data described by a Hill function)
Concentration
Response
In vitro Assay AC50
Concentration (mM)
Assay AC50
With Uncertainty
All data is made public:
http://actor.epa.gov/dashboard/
New datasets added continuously
(for example, 1060 chemicals
tested by Truong et al., 2014
zebrafish assay)
ToxCast & Tox21:
Chemicals, Data and Release
7
Set Chemicals Assays Endpoints Completion Available
ToxCast Phase I 293 ~600 ~700 2011 Now
ToxCast Phase II 767 ~600 ~700 03/2013 Now
ToxCast E1K 800 ~50 ~120 03/2013 Now
ToxCast Phase III ~8300 ~300 ~300 In progress 2016
Tox21 ~9000 ~80 ~150 In progress ongoing
~900
Chemicals
Assays
~800
0
Pesticides , antimicrobials, food additives, green alternatives, HPV, MPV,
endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS,
FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed
drugs, marketed drugs, fragrances, flame retardants, etc.
~9000
High Throughput Measurement
to Identify Exposure
8
HT Hazard and Exposure Combined
for Risk Assessment
Potential Exposure
from ExpoCast
mg/kg BW/day
Potential Hazard
from ToxCast
Lower
Risk
Medium
Risk
Higher
Risk
Delivering Data via Dashboards
10
ACToR
https://actor.epa.gov/actor/
• Aggregated Computational Toxicology Resource
• Warehouse of available public toxicity data from
>1000 public sources for >500,000 chemicals
11
ACToR
https://actor.epa.gov/actor/
12
CPCat: Chemical and Product Categories
https://actor.epa.gov/cpcat/
13
Chemical and Product Categories is a
database containing information mapping
>43,000 chemicals to a set of terms
categorizing their usage or function.
Toxcast Dashboard
https://actor.epa.gov/dashboard/
• Access and Interrogate chemical screening
data from ToxCast and the Tox21 collaboration
14
Our New Developing Architecture
Our Latest Dashboard
https://comptox.epa.gov
16
About 720,000 chemicals
Almost 15 years of data
An Integration Hub
Approximately 15 Years of Data…
Continually changing
DSSTox_v2
PublicCurated
5. Low
2. Low
6. Untrusted
1. High
3. High
4. Med
7. Incomplete
validated
Public_Untrusted
Public_Low
Public_Medium
Public_High
DSSTox_Low
DSSTox_High 4535
16K
33K
101K
584K
~ 310K pending
~ 150K pending
Bisphenol A
(Accessing DSSTox Data)
18
Physicochemical Properties
19
Data Downloads
20
Data Download: Excel
21
Developing “NCCT Models”
• Physchem properties for exposure modeling,
augmented with ToxCast HTS in vitro data etc.
• Our approach to modeling:
– Obtain high quality training sets
– Apply appropriate modeling approaches
– Validate performance of models
– Define the applicability domain and limitations of the models
– Use models to predict properties across our full datasets
PHYSPROP Data: Available from:
http://esc.syrres.com/interkow/EpiSuiteData.htm
• Water solubility
• Melting Point
• Boiling Point
• LogP (Octanol-water partition coefficient)
• Atmospheric Hydroxylation Rate
• LogBCF (Bioconcentration Factor)
• Biodegradation Half-life
• Ready biodegradability
• Henry's Law Constant
• Fish Biotransformation Half-life
• LogKOA (Octanol/Air Partition Coefficient)
• LogKOC (Soil Adsorption Coefficient)
• Vapor Pressure
Check and Curate Public Data
• Public data should be curated prior to modeling.
• The data files have FOUR representations of a
chemical, plus the property value.
Check and Curate Public Data
Public data should be curated prior to modeling
25
Covalent Halogens Identical Chemicals
Mismatches
The Approach
• Our curation process
– Choose the “chemical” by checking levels of consistency
– Perform initial analysis manually to understand how to
clean the data (chemical structure and ID)
– Automate the process (and test iteratively)
– Process all datasets using final method
– We did NOT validate each measured property value
KNIME Workflow to Evaluate
the Dataset
LogP dataset: 15,809 structures
• CAS Checksum: 12163 valid, 3646 invalid (>23%)
• Invalid names: 555
• Invalid SMILES 133
• Valence errors: 322 Molfile, 3782 SMILES (>24%)
• Duplicates check:
–31 DUPLICATE MOLFILES
–626 DUPLICATE SMILES
–531 DUPLICATE NAMES
• SMILES vs. Molfiles (structure check)
–1279 differ in stereochemistry (~8%)
–362 “Covalent Halogens”
–191 differ as tautomers
–436 are different compounds (~3%)
Property Initial file Curated Data Curated QSAR ready
AOP 818 818 745
BCF 685 618 608
BioHC 175 151 150
Biowin 1265 1196 1171
BP 5890 5591 5436
HL 1829 1758 1711
KM 631 548 541
KOA 308 277 270
LogP 15809 14544 14041
MP 10051 9120 8656
PC 788 750 735
VP 3037 2840 2716
WF 5764 5076 4836
WS 2348 2046 2010
Curation to “QSAR Ready Files”
• “QSAR-Ready Structures”
– Desalt/Neutralize, Desolvate, Remove stereochemistry
Communicating Transparency in
Models to Users of an App
• Too often predicted values just give “numbers”
• Users have no real understanding of model performance
• There are good examples though! ACD/Ilab, T.E.S.T, OCHEM
30
ACD/ILab
EPA T.E.S.T
OCHEM
Prop Vars 5-fold CV (75%) Training (75%) Test (25%)
Q2 RMSE N R2 RMSE N R2 RMSE
BCF 10 0.84 0.55 465 0.85 0.53 161 0.83 0.64
BP 13 0.93 22.46 4077 0.93 22.06 1358 0.93 22.08
LogP 9 0.85 0.69 10531 0.86 0.67 3510 0.86 0.78
MP 15 0.72 51.8 6486 0.74 50.27 2167 0.73 52.72
VP 12 0.91 1.08 2034 0.91 1.08 679 0.92 1
WS 11 0.87 0.81 3158 0.87 0.82 1066 0.86 0.86
HL 9 0.84 1.96 441 0.84 1.91 150 0.85 1.82
NCCT Models: What you would
report in a paper…
Predicted Data
32
Calculation Result
for a chemical
Model Performance
with full QMRF
Nearest Neighbors
from Training Set
QMRF Reports
33
Prediction Details and QMRF
Report
34• Accepted for publication to SAR and QSAR in Environmental Research
EPA T.E.S.T
https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test
35
From physicochemical property
endpoints to toxicity endpoints
Full transparency for each prediction
36
External Links
37
Synonyms
38
Over a million synonyms, different
levels of curation and validation
Functional Use and Composition
(Integrating CPCat Data)
39
Bioassay Screening Data
(Integrating ToxCast Data)
40
Bioassay Screening Data
(Integrating ToxCast Data)
41
Exposure Data – NHANES and
ExpoCast Predictions
42
Environ. Sci. Technol., 2013, 47 (15), pp 8479–8488
PubChem integration
43
Connecting into the Dashboard
• Linkages into the Dashboard are simple: using the
associated identifiers
• For integration use files of structures/identifiers
mapped to DTXSIDs
• Integrated already - PubChem, EBI’s UNICHEM,
ChemSpider and whoever wants the files…
Our OPEN Data is available…
• Various types of data at FTP download site:
ftp://newftp.epa.gov/COMPTOX/Sustainable_Chemistry_
Data/Chemistry_Dashboard
45
ONE Application of the Dashboard
• Targeted Analysis:
- We know exactly what we’re looking for
- 10s – 100s of chemicals
• Suspect Screening Analysis (SSA):
- We have chemicals of interest
- 100s – 1,000s of chemicals
Suspect Screening Results
Suspect Screening in House DustMass
Retention Time
947 Peaks in an American Health Homes Dust Sample
We are now expanding our identity libraries
using reference samples of ToxCast chemicals
Liquid chromatography
peaks corresponds to a
chemical with an accurate
mass and predicted formula:
Multiple chemicals can have
the same mass and formula:
Is chemical A present,
chemical B, or both?
C17H19NO3
ONE Application of the Dashboard
• Targeted Analysis:
- We know exactly what we’re looking for
- 10s – 100s of chemicals
• Suspect Screening Analysis (SSA):
- We have chemicals of interest
- 100s – 1,000s of chemicals
• Non-Targeted Analysis (NTA):
- We have no preconceived lists
- 1,000s – 10,000s of chemicals
- In dust, soil, food, air, water, products,
plants, animals, and…us!!
Previous Work with Suspect-Screening
Rank-Ordering of “Known-Unknowns”
using ChemSpider
51
Advanced MS Searches
52
Does the Dashboard Add Value?
53
721k structures
Dilution Example…
Morphine Skeleton
54
ChemSpider 6982 Results!!!
Search for C15H15N3O2
55
Tacedinaline
Methyl Red
C.I Disperse
Yellow 3
Same top hits – different ranking
90 hits only versus 6926 hits
56
18
17
4Tacedinaline
Methyl Red
C.I Disperse
Yellow 3
Using Meta-Data to Sort Candidates
57
Microbiological
Indicator Dye
Anti-cancer Drug
Textile/Product Dye
Dashboard vs ChemSpider
Ranking Summary
Mass-based Searching Formula Based Searching
Dashboard ChemSpider Dashboard ChemSpider
Cumulative Average
Position 1.3 2.2 1.2 1.4
% in #1 Position 85% 70% 88% 80%
• Selected peer-reviewed publications
• 162 total individual chemicals in search
Coming December 2016
Batch Searching Names/CASRNs
• What are these chemicals?
59
Coming December 2016
Batch Searching…
60
Coming December 2016
Download to Excel
61
Non-Targeted Analysis: In-testing
62
Metadata included for Ranking
63
Need for “MS-Ready Structures”
64
“QSAR-Ready Structures”
• For the purpose of building QSAR Models
we already “standardize” structures
– Desalt/Neutralize
– Desolvate
– Remove stereochemistry
• Some minor tweaks gets us “MS-ready
Structures”. ALREADY in our database.
65
“QSAR-Ready Structures”
• Mass and Formula-based searches will be
based on MS-ready structures but
connected to the original chemical (with
name, CAS, rank ordering)
• MS-ready structures and substance
mappings will be available as Open Data
66
Work in Progress: RAPIDTOX
Predicted Hazards
67
Work in Progress: RAPIDTOX
Structural Analogs for Read-Across
Work in Progress: RAPIDTOX
PubMed “Abstract Sifting”
69
Future Work
• Real-time Predictions
• API/Web Services in development
• Deeper integration to agency databases
• Focus on the challenge “Identify chemical
exposures that may disrupt biological
processes and cause adverse outcomes”
70
Conclusion
• NCCT: delivering data, algorithms, models and
software tools for almost a decade.
• Developing a new flexible architecture to support
multiple both internal and external apps
• A future concept for the CompTox dashboard…
71
Future Search Possibilities
72
Chemicals Products Targets Assays Literature
Conclusion: We are focused on
Integrating Research Efforts
HT research needs:
NTA research needs:ToxCast:
Prioritizations for:
1) Parent chemicals
2) Mixtures
3) Metabolites
ExpoCast:
Measurement data for:
1) Model inputs
2) Model evaluation
3) Model refinement
1) Chemical databases
2) Chemical standards
3) Exposure forecasts
4) Bioactivity data
5) Functional use data
6) Prioritization methods
7) HT workflows
Acknowledgements
EPA-RTP
Chris Grulke
Kamel Mansouri
Ann Richard
Nancy Baker
Katharine Phillips
Kathie Dionisio
Jennifer Smith
Jeff Edwards
John Wambaugh
Grace Patlewicz
Imran Shah
Jon Sobus
Richard Judson

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
 
Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
 
Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...
 
Does bigger mean better in the world of chemistry databases?
Does bigger mean better in the world of chemistry databases? Does bigger mean better in the world of chemistry databases?
Does bigger mean better in the world of chemistry databases?
 
US-EPA CompTox Chemicals Dashboard providing access to experimental and predi...
US-EPA CompTox Chemicals Dashboard providing access to experimental and predi...US-EPA CompTox Chemicals Dashboard providing access to experimental and predi...
US-EPA CompTox Chemicals Dashboard providing access to experimental and predi...
 

Destaque

From Data Availability to Information Accessibility: The WellWiki Project
From Data Availability to Information Accessibility: The WellWiki ProjectFrom Data Availability to Information Accessibility: The WellWiki Project
From Data Availability to Information Accessibility: The WellWiki ProjectJoel Gehman
 
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...Yue Liao
 
SMS Berlin 2016 Cultural Perspectives on Strategic Management
SMS Berlin 2016 Cultural Perspectives on Strategic ManagementSMS Berlin 2016 Cultural Perspectives on Strategic Management
SMS Berlin 2016 Cultural Perspectives on Strategic ManagementJoel Gehman
 
Shaping Expectations: Defining and Refining the Role of Technical Services in...
Shaping Expectations: Defining and Refining the Role of Technical Services in...Shaping Expectations: Defining and Refining the Role of Technical Services in...
Shaping Expectations: Defining and Refining the Role of Technical Services in...NASIG
 
Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...lisbk
 
Going Concerns: A Perspective from the Nexus of Business, Culture and Instit...
Going Concerns:  A Perspective from the Nexus of Business, Culture and Instit...Going Concerns:  A Perspective from the Nexus of Business, Culture and Instit...
Going Concerns: A Perspective from the Nexus of Business, Culture and Instit...Joel Gehman
 

Destaque (19)

Building an Online Profile Using Social Networking and Amplification Tools fo...
Building an Online Profile Using Social Networking and Amplification Tools fo...Building an Online Profile Using Social Networking and Amplification Tools fo...
Building an Online Profile Using Social Networking and Amplification Tools fo...
 
Social Networking Tools for Scientists and Building an Online Profile
Social Networking Tools for Scientists and Building an Online ProfileSocial Networking Tools for Scientists and Building an Online Profile
Social Networking Tools for Scientists and Building an Online Profile
 
Building an Online Profile: Social Networking and Amplification Tools for Sc...
Building an Online Profile:  Social Networking and Amplification Tools for Sc...Building an Online Profile:  Social Networking and Amplification Tools for Sc...
Building an Online Profile: Social Networking and Amplification Tools for Sc...
 
NSF Data Management Requirements 101
NSF Data Management Requirements 101NSF Data Management Requirements 101
NSF Data Management Requirements 101
 
Simple Springshare Mashups: Cross-Platform Strategies for Repurposing Digital...
Simple Springshare Mashups: Cross-Platform Strategies for Repurposing Digital...Simple Springshare Mashups: Cross-Platform Strategies for Repurposing Digital...
Simple Springshare Mashups: Cross-Platform Strategies for Repurposing Digital...
 
How One Monkey on a Typewriter Made a Difference to Online Chemistry
How One Monkey on a Typewriter Made a Difference to Online ChemistryHow One Monkey on a Typewriter Made a Difference to Online Chemistry
How One Monkey on a Typewriter Made a Difference to Online Chemistry
 
From Data Availability to Information Accessibility: The WellWiki Project
From Data Availability to Information Accessibility: The WellWiki ProjectFrom Data Availability to Information Accessibility: The WellWiki Project
From Data Availability to Information Accessibility: The WellWiki Project
 
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...
Using Ecological Momentary Assessment to Examine Post-food Consumption Affect...
 
SMS Berlin 2016 Cultural Perspectives on Strategic Management
SMS Berlin 2016 Cultural Perspectives on Strategic ManagementSMS Berlin 2016 Cultural Perspectives on Strategic Management
SMS Berlin 2016 Cultural Perspectives on Strategic Management
 
Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...
 
A Bird in the Hand: Leveraging ILL Requests to Improve Electronic Resource A...
A Bird in the Hand: Leveraging ILL Requests to Improve Electronic Resource A...A Bird in the Hand: Leveraging ILL Requests to Improve Electronic Resource A...
A Bird in the Hand: Leveraging ILL Requests to Improve Electronic Resource A...
 
Social Media Tools for Scientists and Building an Online Profile
Social Media Tools for Scientists and Building an Online ProfileSocial Media Tools for Scientists and Building an Online Profile
Social Media Tools for Scientists and Building an Online Profile
 
Shaping Expectations: Defining and Refining the Role of Technical Services in...
Shaping Expectations: Defining and Refining the Role of Technical Services in...Shaping Expectations: Defining and Refining the Role of Technical Services in...
Shaping Expectations: Defining and Refining the Role of Technical Services in...
 
Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...
 
Going Concerns: A Perspective from the Nexus of Business, Culture and Instit...
Going Concerns:  A Perspective from the Nexus of Business, Culture and Instit...Going Concerns:  A Perspective from the Nexus of Business, Culture and Instit...
Going Concerns: A Perspective from the Nexus of Business, Culture and Instit...
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 

Semelhante a Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA

The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...Kamel Mansouri
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...Andrew McEachran
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...Kamel Mansouri
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...Kamel Mansouri
 

Semelhante a Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA (20)

Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 
Data Review and Clean-Up Using Crowdsourced Input via the US EPA CompTox Das...
Data Review and Clean-Up Using Crowdsourced Input via the  US EPA CompTox Das...Data Review and Clean-Up Using Crowdsourced Input via the  US EPA CompTox Das...
Data Review and Clean-Up Using Crowdsourced Input via the US EPA CompTox Das...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Chemistry data: Distortion and dissemination in the Internet Era
Chemistry data: Distortion and dissemination in the Internet EraChemistry data: Distortion and dissemination in the Internet Era
Chemistry data: Distortion and dissemination in the Internet Era
 
Delivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applicationsDelivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applications
 
The EPA CompTox Chemistry Dashboard and Underpinning Software Architecture
The EPA CompTox Chemistry Dashboard and Underpinning Software Architecture The EPA CompTox Chemistry Dashboard and Underpinning Software Architecture
The EPA CompTox Chemistry Dashboard and Underpinning Software Architecture
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
 

Último

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Youngkajalvid75
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to VirusesAreesha Ahmad
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 

Último (20)

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 

Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA

  • 1. Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA Antony Williams National Center for Computational Toxicology U.S. Environmental Protection Agency, RTP, NC November 7th, 2016 German Cheminformatics Conference, Fulda, Germany This work was reviewed by the U.S. EPA and approved for presentation but does not necessarily reflect official Agency policy. http://www.orcid.org/0000-0002-2668-4821
  • 2. Overview • An introduction to NCCT • Our Data and Our Dashboards • The CompTox Chemistry Dashboard – Architecture – Data and Models: e,g, Physicochemical Properties – Application to Non-Targeted Screening • Coming Soon • Future Work 1
  • 3. Who is NCCT? • National Center for Computational Toxicology – part of EPA’s Office of Research and Development • Research driven by EPA’s Chemical Safety for Sustainability Research Program – Develop new approaches to evaluate the safety of chemicals – Integrate advances in biology, biotechnology, chemistry, exposure science and computer science 2 • Goal - To identify chemical exposures that may disrupt biological processes and cause adverse outcomes.
  • 4. Number of Chemicals in Commerce Presents Regulatory Challenges ONE LIST: EPA’s Endocrine Disruptor Screening Program (EDSP) List # of Compounds Conventional Active Ingredients 838 Antimicrobial Active Ingredients 324 Biological Pesticide Active Ingredients 287 Non Food Use Inert Ingredients 2,211 Food Use Inert Ingredients 1,536 Fragrances used as Inert Ingredients 1,529 Safe Drinking Water Act Chemicals 3,616 TOTAL 10,341 December, 2014 Panel: “Scientific Issues Associated with Integrated Endocrine Bioactivity and Exposure-Based Prioritization and Screening“ DOCKET NUMBER: EPA–HQ–OPP–2014–0614 EDSP Chemical Universe 10,000 chemicals Completed testing for 67 chemicals Current testing for 107 chemicals
  • 5. Exposure Data Cannot Keep Pace with Regulatory Needs TSCA: > 84,000 P.P. Egeghy et al. Sci Total Environ. 414 (2012) 159–166
  • 6. We need more data and derivative models and algorithms • Our outputs include a lot of data, models, algorithms and software applications • We produce Open Data – we want people to interrogate it, learn from it, develop understanding 5
  • 7. High-Throughput Bioactivity to Identify Potential Hazards  Assays in dose-response format (50% activity concentration – AC50 – and efficacy if data described by a Hill function) Concentration Response In vitro Assay AC50 Concentration (mM) Assay AC50 With Uncertainty All data is made public: http://actor.epa.gov/dashboard/ New datasets added continuously (for example, 1060 chemicals tested by Truong et al., 2014 zebrafish assay)
  • 8. ToxCast & Tox21: Chemicals, Data and Release 7 Set Chemicals Assays Endpoints Completion Available ToxCast Phase I 293 ~600 ~700 2011 Now ToxCast Phase II 767 ~600 ~700 03/2013 Now ToxCast E1K 800 ~50 ~120 03/2013 Now ToxCast Phase III ~8300 ~300 ~300 In progress 2016 Tox21 ~9000 ~80 ~150 In progress ongoing ~900 Chemicals Assays ~800 0 Pesticides , antimicrobials, food additives, green alternatives, HPV, MPV, endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs, marketed drugs, fragrances, flame retardants, etc. ~9000
  • 9. High Throughput Measurement to Identify Exposure 8
  • 10. HT Hazard and Exposure Combined for Risk Assessment Potential Exposure from ExpoCast mg/kg BW/day Potential Hazard from ToxCast Lower Risk Medium Risk Higher Risk
  • 11. Delivering Data via Dashboards 10
  • 12. ACToR https://actor.epa.gov/actor/ • Aggregated Computational Toxicology Resource • Warehouse of available public toxicity data from >1000 public sources for >500,000 chemicals 11
  • 14. CPCat: Chemical and Product Categories https://actor.epa.gov/cpcat/ 13 Chemical and Product Categories is a database containing information mapping >43,000 chemicals to a set of terms categorizing their usage or function.
  • 15. Toxcast Dashboard https://actor.epa.gov/dashboard/ • Access and Interrogate chemical screening data from ToxCast and the Tox21 collaboration 14
  • 16. Our New Developing Architecture
  • 17. Our Latest Dashboard https://comptox.epa.gov 16 About 720,000 chemicals Almost 15 years of data An Integration Hub
  • 18. Approximately 15 Years of Data… Continually changing DSSTox_v2 PublicCurated 5. Low 2. Low 6. Untrusted 1. High 3. High 4. Med 7. Incomplete validated Public_Untrusted Public_Low Public_Medium Public_High DSSTox_Low DSSTox_High 4535 16K 33K 101K 584K ~ 310K pending ~ 150K pending
  • 23. Developing “NCCT Models” • Physchem properties for exposure modeling, augmented with ToxCast HTS in vitro data etc. • Our approach to modeling: – Obtain high quality training sets – Apply appropriate modeling approaches – Validate performance of models – Define the applicability domain and limitations of the models – Use models to predict properties across our full datasets
  • 24. PHYSPROP Data: Available from: http://esc.syrres.com/interkow/EpiSuiteData.htm • Water solubility • Melting Point • Boiling Point • LogP (Octanol-water partition coefficient) • Atmospheric Hydroxylation Rate • LogBCF (Bioconcentration Factor) • Biodegradation Half-life • Ready biodegradability • Henry's Law Constant • Fish Biotransformation Half-life • LogKOA (Octanol/Air Partition Coefficient) • LogKOC (Soil Adsorption Coefficient) • Vapor Pressure
  • 25. Check and Curate Public Data • Public data should be curated prior to modeling. • The data files have FOUR representations of a chemical, plus the property value.
  • 26. Check and Curate Public Data Public data should be curated prior to modeling 25 Covalent Halogens Identical Chemicals Mismatches
  • 27. The Approach • Our curation process – Choose the “chemical” by checking levels of consistency – Perform initial analysis manually to understand how to clean the data (chemical structure and ID) – Automate the process (and test iteratively) – Process all datasets using final method – We did NOT validate each measured property value
  • 28. KNIME Workflow to Evaluate the Dataset
  • 29. LogP dataset: 15,809 structures • CAS Checksum: 12163 valid, 3646 invalid (>23%) • Invalid names: 555 • Invalid SMILES 133 • Valence errors: 322 Molfile, 3782 SMILES (>24%) • Duplicates check: –31 DUPLICATE MOLFILES –626 DUPLICATE SMILES –531 DUPLICATE NAMES • SMILES vs. Molfiles (structure check) –1279 differ in stereochemistry (~8%) –362 “Covalent Halogens” –191 differ as tautomers –436 are different compounds (~3%)
  • 30. Property Initial file Curated Data Curated QSAR ready AOP 818 818 745 BCF 685 618 608 BioHC 175 151 150 Biowin 1265 1196 1171 BP 5890 5591 5436 HL 1829 1758 1711 KM 631 548 541 KOA 308 277 270 LogP 15809 14544 14041 MP 10051 9120 8656 PC 788 750 735 VP 3037 2840 2716 WF 5764 5076 4836 WS 2348 2046 2010 Curation to “QSAR Ready Files” • “QSAR-Ready Structures” – Desalt/Neutralize, Desolvate, Remove stereochemistry
  • 31. Communicating Transparency in Models to Users of an App • Too often predicted values just give “numbers” • Users have no real understanding of model performance • There are good examples though! ACD/Ilab, T.E.S.T, OCHEM 30 ACD/ILab EPA T.E.S.T OCHEM
  • 32. Prop Vars 5-fold CV (75%) Training (75%) Test (25%) Q2 RMSE N R2 RMSE N R2 RMSE BCF 10 0.84 0.55 465 0.85 0.53 161 0.83 0.64 BP 13 0.93 22.46 4077 0.93 22.06 1358 0.93 22.08 LogP 9 0.85 0.69 10531 0.86 0.67 3510 0.86 0.78 MP 15 0.72 51.8 6486 0.74 50.27 2167 0.73 52.72 VP 12 0.91 1.08 2034 0.91 1.08 679 0.92 1 WS 11 0.87 0.81 3158 0.87 0.82 1066 0.86 0.86 HL 9 0.84 1.96 441 0.84 1.91 150 0.85 1.82 NCCT Models: What you would report in a paper…
  • 33. Predicted Data 32 Calculation Result for a chemical Model Performance with full QMRF Nearest Neighbors from Training Set
  • 35. Prediction Details and QMRF Report 34• Accepted for publication to SAR and QSAR in Environmental Research
  • 37. Full transparency for each prediction 36
  • 39. Synonyms 38 Over a million synonyms, different levels of curation and validation
  • 40. Functional Use and Composition (Integrating CPCat Data) 39
  • 43. Exposure Data – NHANES and ExpoCast Predictions 42 Environ. Sci. Technol., 2013, 47 (15), pp 8479–8488
  • 45. Connecting into the Dashboard • Linkages into the Dashboard are simple: using the associated identifiers • For integration use files of structures/identifiers mapped to DTXSIDs • Integrated already - PubChem, EBI’s UNICHEM, ChemSpider and whoever wants the files…
  • 46. Our OPEN Data is available… • Various types of data at FTP download site: ftp://newftp.epa.gov/COMPTOX/Sustainable_Chemistry_ Data/Chemistry_Dashboard 45
  • 47. ONE Application of the Dashboard • Targeted Analysis: - We know exactly what we’re looking for - 10s – 100s of chemicals • Suspect Screening Analysis (SSA): - We have chemicals of interest - 100s – 1,000s of chemicals
  • 49. Suspect Screening in House DustMass Retention Time 947 Peaks in an American Health Homes Dust Sample We are now expanding our identity libraries using reference samples of ToxCast chemicals Liquid chromatography peaks corresponds to a chemical with an accurate mass and predicted formula: Multiple chemicals can have the same mass and formula: Is chemical A present, chemical B, or both? C17H19NO3
  • 50. ONE Application of the Dashboard • Targeted Analysis: - We know exactly what we’re looking for - 10s – 100s of chemicals • Suspect Screening Analysis (SSA): - We have chemicals of interest - 100s – 1,000s of chemicals • Non-Targeted Analysis (NTA): - We have no preconceived lists - 1,000s – 10,000s of chemicals - In dust, soil, food, air, water, products, plants, animals, and…us!!
  • 51. Previous Work with Suspect-Screening
  • 54. Does the Dashboard Add Value? 53 721k structures
  • 56. ChemSpider 6982 Results!!! Search for C15H15N3O2 55 Tacedinaline Methyl Red C.I Disperse Yellow 3
  • 57. Same top hits – different ranking 90 hits only versus 6926 hits 56 18 17 4Tacedinaline Methyl Red C.I Disperse Yellow 3
  • 58. Using Meta-Data to Sort Candidates 57 Microbiological Indicator Dye Anti-cancer Drug Textile/Product Dye
  • 59. Dashboard vs ChemSpider Ranking Summary Mass-based Searching Formula Based Searching Dashboard ChemSpider Dashboard ChemSpider Cumulative Average Position 1.3 2.2 1.2 1.4 % in #1 Position 85% 70% 88% 80% • Selected peer-reviewed publications • 162 total individual chemicals in search
  • 60. Coming December 2016 Batch Searching Names/CASRNs • What are these chemicals? 59
  • 61. Coming December 2016 Batch Searching… 60
  • 64. Metadata included for Ranking 63
  • 65. Need for “MS-Ready Structures” 64
  • 66. “QSAR-Ready Structures” • For the purpose of building QSAR Models we already “standardize” structures – Desalt/Neutralize – Desolvate – Remove stereochemistry • Some minor tweaks gets us “MS-ready Structures”. ALREADY in our database. 65
  • 67. “QSAR-Ready Structures” • Mass and Formula-based searches will be based on MS-ready structures but connected to the original chemical (with name, CAS, rank ordering) • MS-ready structures and substance mappings will be available as Open Data 66
  • 68. Work in Progress: RAPIDTOX Predicted Hazards 67
  • 69. Work in Progress: RAPIDTOX Structural Analogs for Read-Across
  • 70. Work in Progress: RAPIDTOX PubMed “Abstract Sifting” 69
  • 71. Future Work • Real-time Predictions • API/Web Services in development • Deeper integration to agency databases • Focus on the challenge “Identify chemical exposures that may disrupt biological processes and cause adverse outcomes” 70
  • 72. Conclusion • NCCT: delivering data, algorithms, models and software tools for almost a decade. • Developing a new flexible architecture to support multiple both internal and external apps • A future concept for the CompTox dashboard… 71
  • 73. Future Search Possibilities 72 Chemicals Products Targets Assays Literature
  • 74. Conclusion: We are focused on Integrating Research Efforts HT research needs: NTA research needs:ToxCast: Prioritizations for: 1) Parent chemicals 2) Mixtures 3) Metabolites ExpoCast: Measurement data for: 1) Model inputs 2) Model evaluation 3) Model refinement 1) Chemical databases 2) Chemical standards 3) Exposure forecasts 4) Bioactivity data 5) Functional use data 6) Prioritization methods 7) HT workflows
  • 75. Acknowledgements EPA-RTP Chris Grulke Kamel Mansouri Ann Richard Nancy Baker Katharine Phillips Kathie Dionisio Jennifer Smith Jeff Edwards John Wambaugh Grace Patlewicz Imran Shah Jon Sobus Richard Judson