Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and it has proven to be a very effective platform for developing structure identification tools. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, and these “known unknowns” are now commonly available in aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for structure identification and review our progress in using the platform for natural product dereplication.
Using an online database of chemical compounds for the purpose of structure identification
1. Using an online database of chemical compounds for the purpose of structure identification
Antony Williams, Valery Tkachenko and Alexey Pshenichnov
ACS San Francisco
August 2014
2. Free and Easy
• Everything I will show in terms of ChemSpider is available for free online today
• To make it easy to “take notes” these slides are already available at: www.slideshare.net/AntonyWilliams/
3. Mass Spectrometry for Structure ID
• Many applications of mass spectrometry involve the identification of “knowns”
• Known structures: previously characterized, previously identified and, increasingly, online
• Dereplication, identification of other manufacturers’ materials, metabolites, lipid analysis: all can be supported by existing databases
• What large database could serve mass spectrometry?
4. • ~32 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• Structure-centric hub for web searching
• …and a really big dictionary!!!
12. For Mass Spectrometrists
• Valuable searches for Mass Spec would be:
• Search the database by mass or formula for structure identification
• Search subsets of data, e.g. “metabolism”, pesticides, etc.
• Link structure-based data across the internet
• Provide “programming interfaces” to integrate
• Does ChemSpider provide value to Mass Spectrometrists?
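To make the "search by mass or formula" idea concrete, here is a minimal Python sketch run against a tiny in-memory list. It is not the ChemSpider API: the record layout, the `EXAMPLE_RECORDS` list, the function names and the 5 ppm default tolerance are all assumptions made for illustration. Only the isotope masses are standard reference values.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
    "Cl": 34.968852682,
}

# Hypothetical stand-in for a compound database (illustrative records only).
EXAMPLE_RECORDS = [
    {"name": "caffeine", "formula": "C8H10N4O2"},
    {"name": "theobromine", "formula": "C7H8N4O2"},
    {"name": "theophylline", "formula": "C7H8N4O2"},
    {"name": "aspirin", "formula": "C9H8O4"},
]


def parse_formula(formula):
    """Turn a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(records, target_mass, ppm=5.0):
    """Return records whose monoisotopic mass lies within +/- ppm of the target."""
    tolerance = target_mass * ppm / 1e6
    return [
        r for r in records
        if abs(monoisotopic_mass(r["formula"]) - target_mass) <= tolerance
    ]


def search_by_formula(records, formula):
    """Return records with exactly the same elemental composition."""
    wanted = parse_formula(formula)
    return [r for r in records if parse_formula(r["formula"]) == wanted]


if __name__ == "__main__":
    for hit in search_by_mass(EXAMPLE_RECORDS, 180.0647):
        print(hit["name"], round(monoisotopic_mass(hit["formula"]), 4))
```

In this toy list a mass search returns two isomers (theobromine and theophylline) that share one elemental composition, which is why the hits still need further filtering before a structure can be assigned.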
14. Data Source Selection
• >32 million chemicals include
• Vendor collections
• Government databases
• Individual/Lab data
• Publication data
• All segregated, allowing for data source selection
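Data source selection can be pictured as a filter over candidate hits. The Python sketch below assumes each record carries a hypothetical `sources` list, and the source labels are invented for illustration. Ordering hits by how many sources list them is shown only as one possible refinement heuristic, not as the platform's documented behaviour.

```python
def filter_by_source(records, allowed_sources):
    """Keep records that appear in at least one of the selected data sources."""
    allowed = set(allowed_sources)
    return [r for r in records if allowed & set(r["sources"])]


def rank_by_source_count(records):
    """Order records so that those listed by more sources come first (assumed heuristic)."""
    return sorted(records, key=lambda r: len(r["sources"]), reverse=True)


# Hypothetical candidate hits; names and source labels are illustrative only.
CANDIDATES = [
    {"name": "compound A", "sources": ["vendor"]},
    {"name": "compound B", "sources": ["government", "publication", "vendor"]},
    {"name": "compound C", "sources": ["lab"]},
]

if __name__ == "__main__":
    selected = filter_by_source(CANDIDATES, ["government", "publication"])
    print([r["name"] for r in selected])
    print([r["name"] for r in rank_by_source_count(CANDIDATES)])
```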
43. Identification of “Known Unknowns”
• “Known Unknowns” can be identified by searching in ChemSpider
• Searching of “segregated” datasets can be performed
• Datasets can be expanded for specific projects – for example, natural products ID…
48. What about ID’ing “Unknowns”?
• Bring together various spectroscopic techniques for structure elucidation – primarily NMR and Mass Spectrometry
• Work to identify substructural fragments
• Use Computer-Assisted Structure Elucidation
49. • Index literature related to marine natural products: 26K articles and growing
• Structure searchable database
• Data includes taxonomy, location and literature
• “Spectral features” generated algorithmically
• Utilize the spectral features for dereplication
• Initially NMR and MS
52. Web Services Open Up Collaboration
• Agilent, Bruker, Waters and Thermo all using or investigating our web-based services for compound lookup
• Many academic sites integrating directly – metabonomics, name lookup, mass-based searching
53. Results of the ChemSpider Search in the MarkerLynx Worksheet
57. Thank you
Email: williamsa@rsc.org
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
Editor's Notes
MarinLit is “article-centric”, not compound-centric. Compounds are only indexed when they are newly discovered, revised, or new to marine.
All compound records link to the paper in which they were first mentioned. They are not linked to subsequent articles that describe them.