Non-targeted and suspect screening studies using high-resolution mass spectrometry (HRMS) have revolutionized the detection of chemicals in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemicals Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow, including visualization and access via the CompTox Chemicals Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a publicly available software tool to support structure identification and non-targeted analyses. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Chemical identification of unknowns in high resolution mass spectrometry using the CompTox Chemicals Dashboard
1. Applications of the US EPA’s CompTox Chemicals Dashboard to support structure identification and chemical forensics using mass spectrometry
Antony Williams1 and Andrew D. McEachran2,3
1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) Oak Ridge Institute of Science and Education (ORISE) Research Participant, RTP, NC
3) Present Address: Agilent Inc., Santa Clara, CA
March 18th 2019
Pittcon, Philadelphia
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
2. CompTox Chemicals Dashboard
• A publicly accessible website delivering access to:
– ~875,000 chemicals with related property data
– Searchable by chemical, product use, gene and assay (ToxCast)
– Experimental and predicted physicochemical property data
– “Bioactivity data” for the ToxCast/Tox21 project
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– “Batch searching” for thousands of chemicals
– DOWNLOADABLE Open Data for reuse and repurposing
37. Batch Searching
• Singleton searches are useful but we work
with thousands of masses and formulae!
• Typical questions
– What is the list of chemicals for the formula CxHyOz?
– What is the list of chemicals for a mass +/- error?
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
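The "mass +/- error" and formula questions above can be sketched offline. This is a minimal illustration only, assuming a small hypothetical in-memory candidate table rather than the Dashboard's own batch search. The names `Candidate`, `ppm_window`, `search_by_mass`, and `search_by_formula`, and the 5 ppm default tolerance, are assumptions made for the example; the monoisotopic masses are rounded to four decimals.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    formula: str
    monoisotopic_mass: float  # Da, neutral molecule


def ppm_window(mass: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) mass bounds for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def search_by_mass(candidates: List[Candidate], query_mass: float,
                   ppm: float = 5.0) -> List[Candidate]:
    """Return candidates inside the ppm window, closest mass first."""
    low, high = ppm_window(query_mass, ppm)
    hits = [c for c in candidates if low <= c.monoisotopic_mass <= high]
    return sorted(hits, key=lambda c: abs(c.monoisotopic_mass - query_mass))


def search_by_formula(candidates: List[Candidate],
                      formula: str) -> List[Candidate]:
    """Return candidates whose formula string matches exactly."""
    return [c for c in candidates if c.formula == formula]


# Hypothetical toy table standing in for a downloaded chemical list.
TABLE = [
    Candidate("caffeine", "C8H10N4O2", 194.0804),
    Candidate("atrazine", "C8H14ClN5", 215.0938),
    Candidate("bisphenol A", "C15H16O2", 228.1150),
]

if __name__ == "__main__":
    for hit in search_by_mass(TABLE, 194.0805, ppm=5.0):
        print(hit.name, hit.formula, hit.monoisotopic_mass)
```

Expressing the tolerance in ppm rather than as a fixed mass difference keeps the window proportional to the query mass, which is how instrument mass accuracy is usually quoted.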
49. Delivering a Better Database
• An ideal database would provide:
– Curated CAS Number-Name mappings with “correct”
chemical structures
• We have full-time curators checking data
68. MS-Ready Mappings
• 125 chemicals returned in total
– 8 of the 125 are single component chemicals
– 3 of the 8 are isotope-labeled
– 3 are neutral compounds and 2 are charged
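The breakdown above (single component versus multi-component, isotope-labeled, neutral, charged) can be approximated from SMILES strings. The sketch below is a string-level heuristic written for illustration; it is not the Dashboard's MS-Ready workflow, and the function names are hypothetical. It assumes that "." separates components, that a leading number inside a bracket atom is an isotope label, and that formal charges appear only inside bracket atoms.

```python
import re
from typing import List

# Bracket atoms such as [13C], [N+], [O-], [Fe+3]
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A run of +/- signs optionally followed by a magnitude, e.g. "+", "--", "+3"
CHARGE = re.compile(r"([+-]+)(\d*)")


def components(smiles: str) -> List[str]:
    """Split a SMILES string into its dot-separated components."""
    return [part for part in smiles.split(".") if part]


def net_charge(smiles: str) -> int:
    """Sum the formal charges written inside bracket atoms."""
    total = 0
    for atom in BRACKET_ATOM.findall(smiles):
        match = CHARGE.search(atom)
        if match:
            signs, digits = match.groups()
            sign = 1 if signs[0] == "+" else -1
            total += sign * (int(digits) if digits else len(signs))
    return total


def is_isotope_labeled(smiles: str) -> bool:
    """True if any bracket atom starts with a mass number, e.g. [2H]."""
    return any(atom[:1].isdigit() for atom in BRACKET_ATOM.findall(smiles))


def classify(smiles: str) -> str:
    """Assign one of the categories used on the slide (heuristic only)."""
    if len(components(smiles)) > 1:
        return "multi-component"
    if is_isotope_labeled(smiles):
        return "isotope-labeled"
    return "neutral" if net_charge(smiles) == 0 else "charged"


if __name__ == "__main__":
    for smi in ["[Na+].[Cl-]", "C[N+](C)(C)C", "[2H]C([2H])([2H])O", "CC(=O)O"]:
        print(smi, "->", classify(smi))
```

A cheminformatics toolkit would be the more robust choice for real data; the regular expressions here only show which features distinguish the categories.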
93. Work in Progress
• CFM-ID
– Viewing and Downloading pre-predicted spectra
– Search spectra against the database
• Retention Time Index Prediction
• Structure/substructure/similarity search
• Integration of predicted ion mobility data
• API and web services for programmatic access
94. API services and Open Data
• Groups waiting on our API and web services
• Mass Spec companies instrument integration
• Release will be in iterations but for now our
data are available
97. Integration with MetFrag in place
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
98. Conclusion
• Dashboard access to data for ~875,000 chemicals
• MS-Ready data facilitates structure identification
• Related metadata facilitates candidate ranking
• Relationship mappings and
chemical lists of great utility
• Dashboard and contents
are one part of the solution
• Future releases will offer
even more utility
• We are committed to open API development over time
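The point that related metadata facilitates candidate ranking can be illustrated with a small scoring sketch. It assumes hypothetical metadata fields (`data_sources`, `literature_refs`), invented counts, and equal weighting of max-normalized values; it is not the Dashboard's ranking scheme.

```python
from typing import Dict, List


def rank_candidates(candidates: List[Dict], fields: List[str]) -> List[Dict]:
    """Rank candidates by the sum of their max-normalized metadata counts."""
    if not candidates:
        return []
    # Largest value per field; fall back to 1 to avoid dividing by zero.
    maxima = {f: max(c[f] for c in candidates) or 1 for f in fields}

    def score(candidate: Dict) -> float:
        return sum(candidate[f] / maxima[f] for f in fields)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates sharing one formula, with invented counts.
CANDIDATES = [
    {"name": "candidate A", "data_sources": 120, "literature_refs": 40},
    {"name": "candidate B", "data_sources": 15, "literature_refs": 60},
    {"name": "candidate C", "data_sources": 3, "literature_refs": 0},
]

if __name__ == "__main__":
    ranked = rank_candidates(CANDIDATES, ["data_sources", "literature_refs"])
    for position, candidate in enumerate(ranked, start=1):
        print(position, candidate["name"])
```

Normalizing each field by its maximum keeps one large count from dominating the combined score; other weightings are equally plausible.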
99. Acknowledgements
• THANK YOU for the invitation!
• IT Development team – especially Jeff
Edwards and Jeremy Dunne
• Chris Grulke for the ChemReg system
• NERL colleagues – Jon Sobus, Elin Ulrich,
Mark Strynar, Seth Newton
• Emma Schymanski, LCSB, Luxembourg
• The NORMAN Network and all contributors
100. Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology
EMAIL: Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821