As scientists, we now have access to a number of domain-specific online chemistry databases. While hundreds of these “compound databases” are available, very few have been developed with the concerns of the analytical scientist in mind. ChemSpider is a free resource from the Royal Society of Chemistry that hosts over 28 million chemicals from over 400 data sources, and it is already used by the mass spectrometry community in particular to aid structure verification. This presentation will give an overview of how ChemSpider has become one of the internet’s primary resources for chemists, providing access to chemicals, experimental and predicted properties, patents, publications and analytical data, and ultimately acting as a structure-centric hub. It will discuss the importance of programming interfaces for integration and how the primary mass spectrometry vendors are already using ChemSpider. It will outline progress towards a dereplication platform for natural products that combines mass spectrometry and NMR spectroscopy data, and it will discuss the path forward to a fully chemical structure-enabled internet and the importance of data interchange standards in enabling it.
1. Utilizing Online Databases for the Purpose
of Structure Identification – Approaches
Utilizing the ChemSpider Resource
Antony Williams
Triangle Chromatography Discussion Group
May 16th, 2013
2. Free and Easy
• Everything I will show in terms of ChemSpider is
available for free online today
• To make it easy to “take notes” these slides will
go online tonight for you to download
www.slideshare.net/AntonyWilliams/
3. Mass Spectrometry for Structure ID
• Many applications of mass spectrometry involve the
identification of “knowns”
• Known structures, previously characterized,
previously identified and, increasingly, online
• Dereplication, identification of “other
manufacturers’” materials, and metabolite and lipid
analysis can all be supported by existing databases
• What large database could serve mass spectrometry?
4. ChemSpider
• > 28 million chemicals with associated data
• Linked out to 400 data sources…
12. For Mass Spectrometrists
• Valuable capabilities for mass spectrometry would be:
– Search the database by mass or formula for
structure identification
– Search subsets of data, e.g. “metabolism”,
pesticides, etc.
– Link structure-based data across the internet
– Provide “programming interfaces” for integration
• Does ChemSpider provide value to mass
spectrometrists?
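Searching by mass, the first capability listed above, comes down to computing each candidate's monoisotopic mass and keeping those within a ppm window of the measured value. The Python sketch below is not ChemSpider's service or API; it searches a small hypothetical local list, assumes a neutral monoisotopic mass (no adduct correction such as removing a proton from an [M+H]+ ion), and handles only simple CHNOPS formulas without brackets or charges.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope; CHNOPS only in this sketch.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula: str) -> dict:
    """Turn a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}.

    Brackets, charges and isotope labels are not handled.
    """
    counts = {}
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula (neutral molecule)."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(candidates, target, ppm=5.0):
    """Return (name, formula, error_ppm) for every candidate within +/- ppm of
    target, closest match first."""
    hits = []
    for name, formula in candidates:
        err = (monoisotopic_mass(formula) - target) / target * 1e6
        if abs(err) <= ppm:
            hits.append((name, formula, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[2]))


# A tiny, hypothetical local candidate list standing in for a database subset.
CANDIDATES = [
    ("caffeine", "C8H10N4O2"),
    ("theobromine", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]

print(search_by_mass(CANDIDATES, 180.0634, ppm=5.0))   # glucose only
print(search_by_mass(CANDIDATES, 180.0634, ppm=10.0))  # glucose, then theobromine
```

Glucose and theobromine differ by only about 7 ppm at this mass, so the chosen tolerance decides whether one or both are returned; formula or other orthogonal data is then needed to choose between them.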
14. Data Source Selection
• The >28 million chemicals include
– Vendor collections
– Government databases
– Individual/Lab data
– Publication data
• All data are segregated by source, allowing for data source selection
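Data source selection can be pictured as tagging each record with the collections that contributed it and filtering on those tags before searching. The sketch below is a hypothetical illustration of that idea, not ChemSpider's internal data model; the `Record` class and the source labels are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical compound record tagged with the data sources that supplied it."""
    name: str
    formula: str
    sources: set = field(default_factory=set)


def select_by_source(records, wanted):
    """Keep records contributed by at least one of the wanted data sources."""
    wanted = set(wanted)
    return [r for r in records if r.sources & wanted]


# Illustrative records; the source labels stand in for vendor, government,
# lab and publication collections.
RECORDS = [
    Record("caffeine", "C8H10N4O2", {"vendor_a", "government_db"}),
    Record("theobromine", "C7H8N4O2", {"publication_data"}),
    Record("glucose", "C6H12O6", {"government_db", "lab_upload"}),
]

print([r.name for r in select_by_source(RECORDS, {"government_db"})])
```

Filtering first and searching second keeps every later search (by mass, formula or structure) unchanged; only the pool of records it runs over shrinks.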
39. What can I find on ChemSpider?
• Experimental properties
• Predicted properties
• Literature links
• Book links
• Database links
• Where to Buy
• Patent links
• Spectral data
• Toxicity data
• Virtual screening data
• …and a hub for searching the entire internet!
45. Identification of “Known Unknowns”
• “Known Unknowns” can be identified by
searching in ChemSpider
• “Segregated” datasets can be searched
• Datasets can be expanded for specific projects,
for example, natural product identification…
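Searching a segregated dataset by formula requires a canonical way of writing formulas, since “C8H10N4O2” and “H10C8O2N4” describe the same composition. A minimal sketch, assuming the Hill system as the canonical form and a hypothetical in-memory subset, is given here; it handles simple formulas only (no brackets, charges or hydrates).

```python
import re
from collections import Counter


def hill_formula(formula: str) -> str:
    """Rewrite a simple formula in Hill order so differently written formulas compare equal.

    Hill order: C first, then H, then other elements alphabetically; if there is
    no carbon, all elements (including H) are alphabetical. A count of 1 is omitted.
    """
    counts = Counter()
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(n) if n else 1
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def search_by_formula(dataset, query):
    """Return names in a (name, formula) dataset whose formula matches the query."""
    key = hill_formula(query)
    return [name for name, formula in dataset if hill_formula(formula) == key]


# Hypothetical "segregated" subset, e.g. compounds assembled for one project.
SUBSET = [
    ("theobromine", "C7H8N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("caffeine", "C8H10N4O2"),
]

print(search_by_formula(SUBSET, "H8C7O2N4"))  # both C7H8N4O2 isomers
```

A formula hit can still return several isomers, as with theobromine and theophylline, which is why further spectral data is needed to distinguish candidates.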
46. FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline
47. The PharmaSea Project
• PharmaSea project for the identification of
natural products – dereplication approaches
– Use MS searches of natural product subsets (“slices”) for identification
– Natural product data include content from RSC databases
(NPU) and ChemSpider data sources
48. What about Identifying “Unknowns”?
• Bring together various spectroscopic techniques
for structure elucidation, primarily NMR and
mass spectrometry
• Identify substructural fragments
• Use Computer-Assisted Structure Elucidation (CASE)
51. The PharmaSea Project
• PharmaSea project for the identification of
natural products – dereplication approaches
– Use MS searches of natural product subsets (“slices”) for identification
– Natural product data include content from RSC databases
(NPU) and ChemSpider data sources
– Pre-fragment compounds and develop fragment-based searches
– Dereplication using NMR data
• NMR features
• Predicted spectra and “Verification approaches”
• CASE-based approaches
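The “pre-fragment compounds” step can be illustrated by ranking candidates according to how many observed fragment ions match their pre-computed fragments. The sketch below assumes fragment lists already exist for each candidate, uses an absolute tolerance in Da, and scores by the fraction of observed peaks explained; the scoring scheme and the toy m/z values are assumptions for illustration, not the PharmaSea method.

```python
def matched_peaks(observed, predicted, tol=0.005):
    """Count observed fragment m/z values lying within tol (Da) of any predicted fragment."""
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


def rank_candidates(observed, library, tol=0.005):
    """Rank candidates by the fraction of observed fragment peaks they explain.

    `library` maps a candidate name to its pre-computed fragment m/z list.
    """
    if not observed:
        return []
    scores = [
        (name, round(matched_peaks(observed, fragments, tol) / len(observed), 3))
        for name, fragments in library.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy fragment lists (made-up m/z values, not real spectra).
LIBRARY = {
    "candidate_A": [100.0, 150.0, 200.0],
    "candidate_B": [100.0, 250.0],
}
OBSERVED = [100.001, 150.002, 300.0]

print(rank_candidates(OBSERVED, LIBRARY))  # [('candidate_A', 0.667), ('candidate_B', 0.333)]
```

Scoring by the fraction of observed peaks rewards candidates that explain the measured spectrum; a real system would likely also weight by intensity and penalize predicted fragments that are never observed.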
52. Web Services Open Up Collaboration
• Agilent, Bruker, Waters and Thermo are all using or
investigating our web-based services for
compound lookup
• Many academic sites are integrating directly for
metabonomics, name lookup and mass-based
searching
56. Calculation of Elemental Composition & ChemSpider Search of Lipid Maps Database Performed via MarkerLynx
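Elemental composition calculation, as performed on this slide via MarkerLynx, can be approximated by brute-force enumeration. The sketch below is not MarkerLynx's algorithm: it enumerates C, H, N and O only, treats the target as a neutral monoisotopic mass, and uses arbitrary element limits (`max_c`, `max_n`, `max_o`) and a ring-plus-double-bond filter that are all assumptions.

```python
from itertools import product

# Monoisotopic masses (u); only C, H, N, O are considered in this sketch.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def format_formula(c, h, n, o):
    """Write counts in Hill order, omitting absent elements and counts of 1."""
    parts = [("C", c), ("H", h), ("N", n), ("O", o)]
    return "".join(el + (str(k) if k > 1 else "") for el, k in parts if k > 0)


def compositions(target, ppm=3.0, max_c=20, max_n=6, max_o=10):
    """Enumerate CHNO formulas whose neutral monoisotopic mass is within +/- ppm of target.

    Candidates must also have a non-negative, whole-number ring-plus-double-bond
    equivalent (RDBE = C - H/2 + N/2 + 1), which removes impossible formulas
    for neutral, even-electron molecules.
    """
    results = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
        # Hydrogen fills the remaining mass; only the nearest whole count can fit a ppm window.
        h = round((target - heavy) / MONO["H"])
        if h < 0:
            continue
        err = (heavy + h * MONO["H"] - target) / target * 1e6
        rdbe = c - h / 2 + n / 2 + 1
        if abs(err) <= ppm and rdbe >= 0 and rdbe == int(rdbe):
            results.append((format_formula(c, h, n, o), round(err, 2)))
    return sorted(results, key=lambda r: abs(r[1]))


print(compositions(194.080376)[:5])  # closest formulas first
```

The resulting formula list is what would then be submitted to a database search, such as the Lipid Maps search shown on the slide.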
57. Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
– Visits = 731,656
– Unique Visitors = 527,008
• Independent servers to support other projects
58. Crowdsourcing ChemSpider
• ChemSpider is crowdsourced
• Community deposition,
annotation and curation
• Anyone can “Leave Feedback”
• Registered users can add data
59. Future Developments
• Support for Multiple Substructures
• Mass to formula conversion
• Expand data sources with MS focus
• Hosting reference data for Metabonomics
• Investigating how to serve chromatographers
• What can we do for you?
• Does anybody in the audience teach spectroscopy?
68. Acknowledgments
• RSC eScience Team
• James Little, Eastman Chemical Company
• Alexey Pshenichnov, University of Leicester –
SpectraSchool
• ACD/Labs – Assigned Spectra Display Widget
• Depositors of data – there are many!