Integrating an Analytical Methods and Mass Spectral Database with Cheminformatics Capabilities

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
Innovative Research for a Sustainable Future
www.epa.gov/research
Gregory Janesch1, Erik Carr1, Vicente Samano2, Brian Meyer2 and Antony Williams3
1. ORAU Student Services Contractor to Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2. Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, USA
3. Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
ACS West
San Francisco, CA
August 13-17, 2023
Data
There are three kinds of data contained within the database.
- Fact sheets are results-oriented documents with data associated
with one or more substances, ranging from basic descriptions of
health effects to monographs with NMR, Raman, and IR spectra.
- Methods document an end-to-end analytical procedure for one or
more substances, sometimes hundreds of chemicals. The documents
are curated to extract the chemical compounds and are then
annotated with information such as matrix and methodologies.
- Spectra are stored as lists of m/z-intensity pairs together with
their associated parameters.
In addition to the above information, records have assorted
metadata stored in the database. These data include information
such as experimental conditions, authors, a synopsis for the method
or fact sheet, and other data depending on what kind of record it is.
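The three record kinds and their shared metadata can be pictured with a small data model. This is only an illustrative sketch: the poster does not describe the actual AMOS schema, so the class, field names, and the `records_for` helper below are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecordType(Enum):
    FACT_SHEET = "fact sheet"
    METHOD = "method"
    SPECTRUM = "spectrum"


@dataclass
class Record:
    """One database record (hypothetical layout, not the real AMOS schema)."""
    record_type: RecordType
    source: str                  # where the record came from
    substance_ids: list[str]     # identifiers of the substances it covers
    description: str = ""        # synopsis for a method or fact sheet
    # Extra fields that depend on the record type (authors, conditions, ...)
    metadata: dict[str, str] = field(default_factory=dict)
    # (m/z, intensity) pairs; only populated for spectra
    peaks: list[tuple[float, float]] = field(default_factory=list)


def records_for(substance_id: str, records: list[Record]) -> list[Record]:
    """Return every record that mentions the given substance."""
    return [r for r in records if substance_id in r.substance_ids]
```

Keeping the type-specific fields in a free-form `metadata` mapping is one simple way to let fact sheets, methods, and spectra share a single record shape.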
Data are open access and are derived from a variety of sources.
These include online spectral databases, vendor methods, research
groups, EPA databases and other government agencies.
At the time of writing, the database contains approximately:
- 165,000 spectra (plus 600,000 externally linked spectra)
- more than 700 fact sheets
- more than 3,300 methods
Description
A wide variety of sources for spectra, documented analytical
procedures and methods, and other associated documentation exist
and are, in theory, easily available through an ordinary web search.
In practice, these sources are largely isolated from each other,
hard to find via general searches because of inconsistencies in
chemical names and identifiers, and highly varied in format.
To address these challenges, the Analytical Methods and Open
Spectra (AMOS) web application has been developed. AMOS is a
database and associated web-based application containing several
types of records searchable by common identifiers known to
chemists (i.e., CASRNs, InChIKeys, and chemical names).
Acknowledgements
The authors thank the data curation team for their rigorous work in
annotating and identifying information in the records. Chemical data
extraction, curation, and annotation are an essential part of this work.
General Searching
The primary search function searches all records for a single
chemical substance. One half of the page (Fig. 1) shows the searched
compound (assuming a match) alongside a table of records containing
that substance; each row gives the data source, associated
methodology, and a short description of the record itself.
Selecting a row in that table opens the record for closer viewing,
whether that means opening an analytical method or displaying a
spectrum.
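Because a single search box accepts CASRNs, InChIKeys, and chemical names, the query string has to be recognized before it can be matched to a substance. The sketch below shows one way this could be done; it is not taken from AMOS, and the function names are hypothetical. It relies on two well-known conventions: the last digit of a CASRN is a check digit, and a standard InChIKey has 14, 10, and 1 uppercase letters separated by hyphens.

```python
import re

CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_casrn(text: str) -> bool:
    """Check the CASRN layout and its check digit."""
    match = CASRN_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... from right to left;
    # the weighted sum modulo 10 must equal the check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def classify_query(text: str) -> str:
    """Guess which kind of identifier the user typed (hypothetical helper)."""
    query = text.strip()
    if is_valid_casrn(query):
        return "casrn"
    if INCHIKEY_RE.match(query.upper()):
        return "inchikey"
    # Anything else is treated as a chemical name.
    return "name"
```

For example, `classify_query("1763-23-1")` (the CASRN of PFOS) is recognized as a CASRN, while a string with the same layout but a wrong check digit falls through to a name search.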
Spectrum Search
For spectral data, an additional search option is available. If a
mass range, methodology, and spectrum (as x,y pairs) are supplied,
spectra matching that mass range and methodology are returned,
ranked by their similarity to the user-supplied spectrum (see Fig. 2).
The top table lists the substance associated with each spectrum
found (with its DTXSID), the similarity of that spectrum, and a
description of that spectrum. Below that table is an interactive
plot overlaying the two spectra.
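The spectrum search described above can be sketched in a few lines. The poster does not say which similarity measure AMOS uses, so this example assumes cosine similarity on binned peaks, a common choice for mass spectra. The bin width, the candidate dictionary keys (`id`, `mass`, `methodology`, `peaks`), and the function names are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.1):
    """Sum intensities of (m/z, intensity) pairs into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.1):
    """Cosine of the angle between two binned spectra (0 = no overlap)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_spectra(query_peaks, candidates, mass_min, mass_max, methodology):
    """Filter by mass range and methodology, then rank by similarity."""
    hits = [
        (cosine_similarity(query_peaks, c["peaks"]), c["id"])
        for c in candidates
        if mass_min <= c["mass"] <= mass_max and c["methodology"] == methodology
    ]
    # Highest similarity first.
    return sorted(hits, reverse=True)
```

Filtering before scoring keeps the comparison limited to spectra that could plausibly match, which mirrors the order of inputs the search form asks for.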
Method Searches
AMOS contains two functions for searching for methods. One is a simple
table that lists all methods in the database (not pictured). This list can be
filtered by several fields, including matrix, analyte, and method name,
allowing for quick discovery of methods that cover a known topic.
The other (see Fig. 3) is a search for methods containing similar
substances, which provides a starting point even for chemicals without
methods. A substance is searched for, and if methods exist for it they are
returned. If there are no existing methods for that chemical, AMOS
returns all methods that contain at least one substance with a
sufficiently high Tanimoto structural similarity coefficient. This is
especially useful when a substance has no methods associated with it at
all: in the example in Fig. 3, the drug only became available in 2015, so
there has been relatively little time to develop and publish methods for it.
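The fallback logic above can be illustrated with a short sketch. The Tanimoto coefficient of two fingerprints is the number of shared "on" bits divided by the number of bits set in either. Everything else here is an assumption: fingerprints are represented as plain sets of bit indices (in practice they would come from a cheminformatics toolkit), the 0.5 threshold stands in for the "sufficiently high" cutoff that the poster does not quantify, and `find_methods` is a hypothetical name.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_methods(query_id, fingerprints, methods, threshold=0.5):
    """Return methods for a substance, falling back to similar substances.

    fingerprints: substance id -> set of "on" bit indices
    methods:      method name  -> set of substance ids it covers
    """
    # First choice: methods that list the queried substance directly.
    direct = [name for name, subs in methods.items() if query_id in subs]
    if direct:
        return direct

    # Fallback: methods containing at least one sufficiently similar substance.
    query_fp = fingerprints[query_id]
    similar = []
    for name, subs in methods.items():
        best = max((tanimoto(query_fp, fingerprints[s]) for s in subs), default=0.0)
        if best >= threshold:
            similar.append((best, name))
    # Most similar methods first.
    return [name for best, name in sorted(similar, reverse=True)]
```

Scoring each method by its single most similar substance matches the "at least one substance" wording; a method covering hundreds of chemicals still qualifies if only one of them resembles the query.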
Disclaimers
This tool is currently internal to the US EPA and still under development.
Plans to release it to the public have not been finalized, but the process
is hoped to be complete by early 2024. The data used in this application
have not been thoroughly reviewed by the EPA, and users need to
exercise judgement in their use of the results.
The views expressed in this poster are those of the authors and do not
necessarily reflect the views or policies of the U.S. EPA.
Figure 1: The list of methods and LC-MS or GC-MS spectra associated
with perfluorooctanesulfonic acid (PFOS).
Figure 2: A spectral similarity search result, including the similarity
score for each matching spectrum and the list of associated chemical
compounds.
Figure 3: When a search finds no methods for a chemical, its structure
is passed to a Tanimoto structural similarity search, which returns
methods that contain similar structures.