ChemSpider: A Hub for Online Chemical Information Resources

Antony Williams*

*ChemSpider, Royal Society of Chemistry, U.S. Office: Wake Forest, NC-27587

E-mail: williamsa@rsc.org



1. Internet-based chemistry

The World Wide Web continues to have an expanding and profound effect on providing access to
chemical information. A chemist may wish to know a variety of information about a given
chemical compound including physical and chemical properties, molecular structure, spectral
data, synthetic methods, known reactions, safety information, and systematic nomenclature and
chemical names. In the past, having access to this variety of information required a small library
of different reference works, since no one resource contained all this data. This was problematic
both in terms of cost and physical space for storage. Now there is a single web site that not only
provides all this information for millions of compounds but also is free. This website is the Royal
Society of Chemistry’s ChemSpider [1, 2].

2. ChemSpider

As a cheminformatician interested in integrating large amounts of data, specifically
structure-based data, spectral data and large quantities of physicochemical data, the author,
together with a number of software developers, decided to pursue the challenge of integrating
web-based chemistry data. Built on a nominal infrastructure of just three computer servers
with bespoke software based on Microsoft technologies (specifically a .NET architecture
using a SQL Server database), ChemSpider was released to the community as a platform
containing >10.5 million unique chemical structures sourced from the PubChem database [3]
and integrated with a small number of online resources. The original system included both structure and
rudimentary substructure searching. Within a few months of release the ability for users to
register and upload chemical compounds and annotate and curate data was introduced. The
amount of data online continued to grow with depositions from chemical vendors and other
online chemical databases and reached around 20 million chemicals. Within a period of three
years the ChemSpider platform had developed a significant level of popularity with the
community and was acquired by the Royal Society of Chemistry [4].

Today ChemSpider is a free, online chemical database offering access to physical and chemical
properties, molecular structures, spectral data, synthetic methods, safety information, and
nomenclature for over twenty-six million unique chemical compounds, sourced from and linked out to
almost four hundred separate data sources on the web. ChemSpider is fast becoming the primary
chemistry internet portal and it can be very useful for both chemical teaching and research.
ChemSpider is not just a search engine layered on terabytes of chemistry data but is also a
crowdsourcing community for chemists. Registered users can enter information and annotate and
curate the records. Registration and login are required in order to prevent anonymous acts of
vandalism. The chemical community has been forthcoming in adding information including new
chemical structures, associations between structures and publications, addition of analytical data
such as spectra and the curation of chemical identifiers and property data.

ChemSpider has been described as the Google for Chemistry and a Wikipedia for chemists. By
aggregating data and linking them together using a chemical structure as the primary record in the
database, ChemSpider has been able to link together Wikipedia [5], PubChem [6], ChEBI
(Chemical Entities of Biological Interest) [7], KEGG (The Kyoto Encyclopedia of Genes and
Genomes) [8], chemical vendors, a patent database, and both open and closed access chemistry
journals. Where possible, each chemical record retains the links out to the original source of the
material, thereby providing microattribution. These links allow a ChemSpider user to find
information of particular interest, including where to purchase a chemical, as well as toxicity and
metabolism data. Aggregating that level of connected information via a classical search
engine, like Google, would be very time-consuming.

ChemSpider has a number of advantages over a simple Google search. The variety of information
about a compound provided at ChemSpider is hard to match on any other free web site. The data
continue to be validated, updated and expanded by practicing chemists. ChemSpider provides
links to many other online sources for further information. This plethora of links now includes
Google Books, Google Scholar and Google Patents, Microsoft Academic Search, the RSC
databases, books and publishing website, and an ever-increasing number of government,
commercial and academic databases.




Figure 1: The header of the chemical record for Domoic Acid
(http://www.chemspider.com/4445428) in ChemSpider. The entire record spans multiple pages
including links to patents and publications, pre-calculated and experimental properties and links
to many external data sources and informational websites.

ChemSpider aggregated over 25 million unique chemical entities in just over 3 years. New
additions to the database are made daily, especially since it is now integrated into the RSC
publishing process, whereby new compounds identified in prospective RSC articles are deposited
and released to the community as the article is published. Many of the compounds in the current
database have already been curated, and the process is ongoing. In comparison, the Chemical
Abstracts Service (CAS), which has been in the business of aggregating chemistry-related data
for over a century in order to create the CAS registry, recorded its 50 millionth chemical structure
in 2009 [9].

Searching the web using classical search engines is less useful than using ChemSpider, since these
services do not provide structure-based searching of the internet, nor do they systematically
organize data curation. The closest comparison in terms of validated and crowdsourced
contributions to the domain of chemistry is the set of chemical pages in Wikipedia; however,
Wikipedia has information on far fewer compounds and supports only text searching, not structure
searching.

The ChemSpider “web services” provide programmatic access to ChemSpider and allow
instrument vendors to use the data for structure identification. This opportunity
in particular is being used for compound identification by mass spectrometry [10].
The data are also available to the Open PHACTS project [11], a project funded by the Innovative
Medicines Initiative [12], and ChemSpider is one of the key participants in the project. As
ChemSpider continues to expand in scope, capabilities and data, the site is likely to become the
dominant free online resource for chemists, especially as it supports a number of additional
projects as discussed below.
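
The accurate-mass identification mentioned above can be illustrated with a minimal sketch. It does not call the ChemSpider web services. Instead it assumes a small, hypothetical local candidate list (CANDIDATES) and shows only the arithmetic behind a mass-based lookup: compute the monoisotopic mass of each candidate from its elemental composition, then keep the candidates whose mass lies within a parts-per-million tolerance of the measured value. The function names, the 5 ppm default tolerance and the candidate list are assumptions made for illustration, and the measured value is assumed to be a neutral monoisotopic mass.

```python
# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}

# Hypothetical candidate list: name -> elemental composition.
CANDIDATES = {
    "domoic acid": {"C": 15, "H": 21, "N": 1, "O": 6},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "glucose": {"C": 6, "H": 12, "O": 6},
}


def monoisotopic_mass(composition):
    """Sum the monoisotopic masses of all atoms in a composition dict."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in composition.items())


def ppm_error(measured, theoretical):
    """Signed deviation of a measured mass from theory, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_candidates(measured_mass, candidates, tolerance_ppm=5.0):
    """Return (name, theoretical mass, ppm error) for candidates in tolerance.

    Results are sorted so that the closest match comes first.
    """
    hits = []
    for name, composition in candidates.items():
        theoretical = monoisotopic_mass(composition)
        error = ppm_error(measured_mass, theoretical)
        if abs(error) <= tolerance_ppm:
            hits.append((name, theoretical, error))
    return sorted(hits, key=lambda hit: abs(hit[2]))


if __name__ == "__main__":
    # A measured neutral mass close to that of domoic acid (C15H21NO6).
    for name, mass, error in match_candidates(311.1369, CANDIDATES):
        print(f"{name}: {mass:.4f} u ({error:+.2f} ppm)")
```

In practice the candidate list would come from a database query rather than a hard-coded dictionary, and several structures often share one elemental composition, so a mass match narrows the candidates rather than identifying a single compound.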

3. Synthetic Reactions on ChemSpider

The recently added ChemSpider SyntheticPages [13] provides a source of online data regarding
chemical synthesis procedures. This database is created by the community, for the community.
Chemists populate the online database with one or more of their chemical reactions, outlining how
to perform each reaction. ChemSpider SyntheticPages grows as the community continues to
contribute content. What types of reactions are suitable? The reactions could be for a new
compound or a known compound from the literature or from an author's own publications. It does
not matter if a similar prep is already in the database. Submitting has a clear benefit for
early-stage researchers: potential employers have free and direct access to examples of
their work, including the time-consuming "starting material" preps that perhaps did not make it
into the papers or thesis. It is fast to submit an article: certainly less than an hour from start to
finish, and probably a lot less if the author already has the text in electronic format for a report.
The kudos of being a part of a database hosted by the RSC should not be underestimated, and the
issuance of a permanent digital object identifier (DOI) link provides curriculum vitae value. The
value of the database will grow as an increasing number of pages covers an
increasingly broad array of chemical syntheses.

Figure 2: A ChemSpider SyntheticPages article regarding a hydrogenation process

4. Making Chemistry Mobile

Given the unprecedented growth in new ways to access online information using
mobile devices [14, 15] (for example, iPhones and iPads using the iOS operating system, and
Android devices), it made sense to deliver access to ChemSpider and its related projects on such
platforms. Initially the ChemMobi [16] application from Symyx (now part of Accelrys) was
developed using the ChemSpider web services. This was soon followed by mobile website
versions of both ChemSpider and ChemSpider SyntheticPages. Numerous other iOS apps then
made use of the web services. The Royal Society of Chemistry contracted the development of a
ChemSpider Mobile app [17], which has since been downloaded many thousands of times and
runs on both iPhone and iPad.

Figure 3: The ChemSpider website optimized for mobile devices. These screen captures were
obtained from an iPhone.

5. Additional projects integrating ChemSpider

An increasing array of projects is now supported by ChemSpider, which serves up content to
them via the programming interface. ChemSpider is already becoming an important resource for
teaching, learning, and research. Specifically, the spectroscopic data, over 3000 spectra in total,
are the basis for the Spectral Game, which has already been used by over 10000 students [18].
This game allows students to learn how to interpret NMR spectra by matching either 1H or 13C
spectra against two or more structures. The game increases in complexity as it progresses, with
the number of candidate structures to match against the spectrum rising from 2 to 5, and it has
been played by students from almost 100 different countries.

Other RSC resources that integrate ChemSpider data have recently been unveiled. These
include the Learn Chemistry Wiki [19] and SpectraSchool [20], which support the education of
secondary school children. Since ChemSpider offers unrivalled online access to chemistry data
via application programming interfaces, such projects will continue to expand in scope and
capabilities.

Figure 4: The Learn Chemistry wiki: a wiki environment utilizing ChemSpider data on its
compound pages.

Conclusion

ChemSpider is presently one of the richest sources of chemistry data available online. It was
recognized with a number of awards in 2010, including the Bio-IT Best Practices Award for
community service [21] and the ALPSP [22] and i-Expo [23] awards for innovation. The
ChemSpider database is the foundation platform for a series of related websites and applications
and presently serves many hundreds of thousands of requests every day. ChemSpider is likely to
increase in prominence and impact in the coming years as the quantity of data grows and the
diversity of integrated data sources increases.

References

1.    Pence, H.E. and A.J. Williams, ChemSpider: An Online Chemical Information Resource.
      J Chem Educ, 2010. 87(11): p. 1123-1124.
2.    ChemSpider. Available from: http://www.chemspider.com.
3.    Wang, Y., et al., PubChem: a public information system for analyzing bioactivities of
      small molecules. Nucleic Acids Res, 2009. 37(Web Server issue): p. W623-33.
4.    Royal Society of Chemistry acquires ChemSpider. [cited 2011 September 22]; Available
      from: http://www.rsc.org/AboutUs/News/PressReleases/2009/ChemSpider.asp.
5.    Wikipedia Home Page. 2010 [cited 2010 May 12]; Available from: www.wikipedia.org.
6.    PubChem Home Page. 2010 [cited 2010 May 12]; Available from:
      http://pubchem.ncbi.nlm.nih.gov/.
7.    ChEBI Home Page. 2010 [cited 2010 May 12]; Available from:
      http://www.ebi.ac.uk/chebi/.
8.    KEGG Home Page. 2010 [cited 2010 May 12]; Available from:
      http://www.genome.jp/kegg/.
9.    http://www.cas.org/products/scifindr/index.html. Available from:
      http://www.cas.org/products/scifindr/index.html.
10.   Little, J.L., et al., Identification of "Known Unknowns" Utilizing Accurate Mass Data and
      ChemSpider. J Am Soc Mass Spectrom, 2011.
11.   OpenPHACTS Project. 2011 [cited 2011 October 31]; Available from:
      http://www.openphacts.org/.
12.   Kamel, N., et al., The Innovative Medicines Initiative (IMI): a new opportunity for
      scientific collaboration between academia and industry at the European level. Eur Respir
      J, 2008. 31(5): p. 924-6.
13.   http://cssp.chemspider.com. ChemSpider Synthetic Pages. Available from:
      http://cssp.chemspider.com.
14.   Williams, A.J., et al., Mobile Apps for chemistry in the world of drug discovery. Drug Disc
      Today, 2011. 16(21-22): p. 928-939.
15.   Williams, A.J. and H.E. Pence, Smart Phones, a Powerful Tool in the Chemistry Classroom.
      J Chem Educ, 2011. 88: p. 683-686.
16.   http://tinyurl.com/3tpnmpn. ChemMobi. Available from: http://tinyurl.com/3tpnmpn.
17.   ChemSpider Mobile. [cited 2011 January 4]; Available from:
      http://itunes.apple.com/us/app/chemspider/id458878661.
18.   Bradley, J.C., et al., The Spectral Game: leveraging Open Data and crowdsourcing for
      education. J Cheminform, 2009. 1(1): p. 9.
19.   Learn Chemistry Wiki. [cited 2011 January 4]; Available from:
      http://www.rsc.org/learn-chemistry/wiki/Main_Page.
20.   SpectraSchool. [cited 2011 January 4]; Available from: http://spectraschool.rsc.org/.
21.   Williams, A.J. ChemSpider wins Bio-IT Best Practices Award for Community Service. 2010;
      Available from:
      http://www.chemspider.com/blog/chemspider-wins-bio-it-best-practices-award-for-community-service.html.
22.   ChemSpider wins the ALPSP Publishing Innovation Prize. 2010; Available from:
      http://www.chemspider.com/blog/chemspider-wins-the-alpsp-publishing-innovation-prize.html.
23.   ChemSpider wins "Most Innovative Software" Award. 2010; Available from:
      http://www.chemspider.com/blog/chemspider-wins-most-innovative-software-award.html.

Mais conteúdo relacionado

Mais procurados

PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy trainingSunghwan Kim
 
PubChem as an Emerging Toxicological Information Resource
PubChem as an Emerging Toxicological Information ResourcePubChem as an Emerging Toxicological Information Resource
PubChem as an Emerging Toxicological Information ResourceSunghwan Kim
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChemSunghwan Kim
 

Mais procurados (20)

Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Bringing it all together: A Web-based Database for Chemical and Biological Da...
Bringing it all together: A Web-based Database for Chemical and Biological Da...Bringing it all together: A Web-based Database for Chemical and Biological Da...
Bringing it all together: A Web-based Database for Chemical and Biological Da...
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
 
The US-EPA CompTox Chemicals Dashboard – an information hub for over five tho...
The US-EPA CompTox Chemicals Dashboard – an information hub for over five tho...The US-EPA CompTox Chemicals Dashboard – an information hub for over five tho...
The US-EPA CompTox Chemicals Dashboard – an information hub for over five tho...
 
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy training
 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
 
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
 
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
 
Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
 
An integrated data hub for per- and polyfluoroalkyl (PFAS) chemicals to suppo...
An integrated data hub for per- and polyfluoroalkyl (PFAS) chemicals to suppo...An integrated data hub for per- and polyfluoroalkyl (PFAS) chemicals to suppo...
An integrated data hub for per- and polyfluoroalkyl (PFAS) chemicals to suppo...
 
PubChem as an Emerging Toxicological Information Resource
PubChem as an Emerging Toxicological Information ResourcePubChem as an Emerging Toxicological Information Resource
PubChem as an Emerging Toxicological Information Resource
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChem
 
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental C...
 

Destaque

Generalidades virus y_hongos_-obst_2012
Generalidades virus y_hongos_-obst_2012Generalidades virus y_hongos_-obst_2012
Generalidades virus y_hongos_-obst_2012jvillarod
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionIn a Rocket
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer ExperienceYuan Wang
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanPost Planner
 

Destaque (6)

Cuerpo trabajo
Cuerpo trabajoCuerpo trabajo
Cuerpo trabajo
 
Generalidades virus y_hongos_-obst_2012
Generalidades virus y_hongos_-obst_2012Generalidades virus y_hongos_-obst_2012
Generalidades virus y_hongos_-obst_2012
 
Apostila 28.03.13
Apostila 28.03.13Apostila 28.03.13
Apostila 28.03.13
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 

Semelhante a ChemSpider as a hub for online chemical information resources

Chemistryand web2 ma walker 2 5 10
Chemistryand web2 ma walker 2 5 10Chemistryand web2 ma walker 2 5 10
Chemistryand web2 ma walker 2 5 10Elizabeth Brown
 
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureAn Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureDevakumar Jain
 

Semelhante a ChemSpider as a hub for online chemical information resources (20)

ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007
 
The Benefits to Chemical Vendors of Putting their data on ChemSpider
The Benefits to Chemical Vendors of Putting their data on ChemSpiderThe Benefits to Chemical Vendors of Putting their data on ChemSpider
The Benefits to Chemical Vendors of Putting their data on ChemSpider
 
ChemSpider Overview Presentation at Special Libraries Association
ChemSpider Overview Presentation at Special Libraries AssociationChemSpider Overview Presentation at Special Libraries Association
ChemSpider Overview Presentation at Special Libraries Association
 
ChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of TorontoChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of Toronto
 
Crawling Across the Web of Chemistry Using ChemSpider
Crawling Across the Web of Chemistry Using ChemSpider Crawling Across the Web of Chemistry Using ChemSpider
Crawling Across the Web of Chemistry Using ChemSpider
 
Connecting Chemists to the Internet Through ChemSpider
Connecting Chemists to the Internet Through ChemSpiderConnecting Chemists to the Internet Through ChemSpider
Connecting Chemists to the Internet Through ChemSpider
 
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
 
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspnRSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
 
Building A Community Resource For The Life Sciences
Building A Community Resource For The Life SciencesBuilding A Community Resource For The Life Sciences
Building A Community Resource For The Life Sciences
 
A perspective of Publicly Accessible/Open Access Chemistry Databases
A perspective of Publicly Accessible/Open Access Chemistry DatabasesA perspective of Publicly Accessible/Open Access Chemistry Databases
A perspective of Publicly Accessible/Open Access Chemistry Databases
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
 
Public Compound Databases
Public Compound DatabasesPublic Compound Databases
Public Compound Databases
 
Chemistryand web2 ma walker 2 5 10
Chemistryand web2 ma walker 2 5 10Chemistryand web2 ma walker 2 5 10
Chemistryand web2 ma walker 2 5 10
 
Chemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleansChemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleans
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
 
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
 
Current opinions in drug discovery public compound databases
Current opinions in drug discovery public compound databasesCurrent opinions in drug discovery public compound databases
Current opinions in drug discovery public compound databases
 
Navigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpiderNavigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpider
 
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureAn Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
 

ChemSpider as a hub for online chemical information resources

  • 1. ChemSpider: A Hub for Online Chemical Information Resources Antony Williams* *ChemSpider, Royal Society of Chemistry, U.S. Office: Wake Forest, NC-27587 E-mail: williamsa@rsc.org 1. Internet-based chemistry The World Wide Web continues to have an expanding and profound effect on providing access to chemical information. A chemist may wish to know a variety of information about a given chemical compound including physical and chemical properties, molecular structure, spectral data, synthetic methods, known reactions, safety information, and systematic nomenclature and chemical names. In the past, having access to this variety of information required a small library of different reference works, since no one resource contained all this data. This was problematic both in terms of cost and physical space for storage. Now there is a single web site that not only provides all this information for millions of compounds but also is free. This website is the Royal Society of Chemistry’s ChemSpider [1, 2]. 2. ChemSpider As a cheminformatician interested in integrating together large amounts of data, specifically structure-based data, spectral data and large quantities of physicochemical data, the author, together with a number of software developers decided to pursue the challenge of integrating together web-based chemistry data. Using a nominal infrastructure of just three computer servers and developing bespoke software using Microsoft technologies (specifically a .NET architecture using a SQL server database) ChemSpider was released to the community as a platform containing >10.5 million unique chemical structures sourced from the PubChem database [3] integrated to a small number of online resources. The original system included both structure and rudimentary substructure searching. Within a few months of release the ability for users to register and upload chemical compounds and annotate and curate data was introduced. 
The amount of data online continued to grow with depositions from chemical vendors and other online chemical databases and reached around 20 million chemicals. Within a period of three years the ChemSpider platform had developed a significant level of popularity with the community and was acquired by the Royal Society of Chemistry [4]. Today ChemSpider is a free, online chemical database offering access to physical and chemical properties, molecular structures, spectral data, synthetic methods, safety information, and nomenclature for over twenty six million unique chemical compounds, sourced and linked out to almost four hundred separate data sources on the web. ChemSpider is fast becoming the primary chemistry internet portal and it can be very useful for both chemical teaching and research. ChemSpider is not just a search engine layered on terabytes of chemistry data but is also a crowdsourcing community for chemists. Registered users can enter information and annotate and curate the records. The requirement to register and login is to prevent anonymous acts of
  • 2. vandalism. The chemical community has been forthcoming in adding information including new chemical structures, associations between structures and publications, addition of analytical data such as spectra and the curation of chemical identifiers and property data. ChemSpider has been described as the Google for Chemistry and a Wikipedia for chemists. By aggregating data and linking it together using a chemical structure as the primary record in the database, ChemSpider has been able to link together Wikipedia [5], PubChem [6], ChEBI (Chemical Entities of Biological Interest) [7] and KEGG (The Kyoto Encyclopedia of Genes and Genomes) [8], chemical vendors, a patent database, and both open and closed access chemistry journals. Where possible, each chemical record retains the links out to the original source of the material thereby associating a microattribution. These links allow a ChemSpider user to source information of particular interest, including where to purchase a chemical, as well as toxicity and metabolism data and so on. Aggregating that level of connected information via a classical search engine, like Google, would be very time consuming. ChemSpider has a number of advantages over a simple Google search. The variety of information about a compound provided at ChemSpider is hard to match on any other free web site. The data continue to be validated, updated and expanded by practicing chemists. ChemSpider provides links to many other online sources for further information. This plethora of links now includes Google Books, Scholar and Patents, Microsoft Academic Search, RSC Databases, Books and Publishing website and an ever-increasing number of government, commercial and academic databases. Figure 1: The header of the chemical record for Domoic Acid (http://www.chemspider.com/4445428) in ChemSpider. 
The entire record spans multiple pages including links to patents and publications, pre-calculated and experimental properties and links to many external data sources and informational websites.
ChemSpider aggregated over 25 million unique chemical entities in just over 3 years. New additions to the database are made daily, especially since it is now integrated into the RSC publishing process whereby new compounds identified in prospected RSC articles are deposited and released to the community as the article is published. Many of the compounds in the current database have already been curated, and the process is ongoing. In comparison the Chemical Abstracts Service (CAS), which has been in the business of aggregating chemistry-related data for over a century in order to create the CAS registry, recorded its 50 millionth chemical structure in 2009 [9].

Searching the web using classical search engines is less useful than ChemSpider since these services do not provide structure-based searching of the internet nor do they systematically organize data curation. The closest comparison in terms of validated and crowdsourced contributions to the domain of chemistry is the set of chemical pages in Wikipedia; however, Wikipedia has information on far fewer compounds and supports only text searching, not structure searching. The ChemSpider “web services” provide programmatic access to ChemSpider and allow instrument vendors to utilize the data for the purpose of structure identification. This opportunity in particular is being used for the purpose of compound identification by mass spectrometry [10]. The data are also available to the Open PHACTS project [11], a project funded by the Innovative Medicines Initiative [12], and ChemSpider is one of the key participants in the project. As ChemSpider continues to expand in scope, capabilities and data the site is likely to become the dominant free online resource for chemists, especially as it supports a number of additional projects as discussed below.

3. Synthetic Reactions on ChemSpider

The recently added ChemSpider SyntheticPages [13] provides a source of online data regarding chemical synthesis procedures.
This database is created by the community, for the community. Chemists populate the online database with one or more of their chemical reactions, outlining how to perform each reaction. ChemSpider SyntheticPages grows as the community continues to contribute content. What types of reactions are suitable? The reactions could be for a new compound or a known compound from the literature or from an author's own publications. Also, it does not matter if a similar prep is already in the database. Submitting is of particular benefit to early stage researchers, since potential employers have free and direct access to examples of their work, including the time-consuming "starting material" preps that perhaps did not make it into the papers or thesis. It is fast to submit an article: certainly less than an hour from start to finish, and probably a lot less if the author already has the text in electronic format for a report. The kudos of being a part of a database hosted by the RSC should not be underestimated and the issuance of a permanent digital object identifier (DOI) link provides curriculum vitae value. The value of the database will grow exponentially with an increasing number of pages covering an increasingly broad array of chemical syntheses.
Figure 2: A ChemSpider SyntheticPages article regarding a hydrogenation process

4. Making Chemistry Mobile

As there has been an unprecedented growth in new ways to access online information using mobile devices [14, 15] (for example, iPhones and iPads using the iOS operating system and Android devices) it made sense to deliver access to ChemSpider and its related projects on such platforms. Initially the ChemMobi [16] application from Symyx (now part of Accelrys) was developed using the ChemSpider web services. This was soon followed by mobile website versions of both ChemSpider and ChemSpider SyntheticPages. Numerous other iOS apps then made use of the web services. The Royal Society of Chemistry contracted the development of a ChemSpider Mobile app [17], which has since been downloaded many thousands of times and runs on both iPhone and iPad.
Figure 3: The ChemSpider website optimized for mobile devices. These screen captures were obtained from an iPhone.

5. Additional projects integrating ChemSpider

An increasing array of projects is now supported by ChemSpider, which serves up content to them via the programming interface. ChemSpider is already becoming an important resource for teaching, learning, and research. Specifically, the spectroscopic data, over 3000 spectra in total, are the basis for the Spectral Game, which has already been used by over 10000 students [18]. This game allows students to learn how to interpret NMR spectra by matching either 1H or 13C spectra against two or more structures. The game increases in complexity as it progresses, with the number of structures to choose from when matching the spectrum rising from 2 to 5, and it has been played by thousands of students from almost 100 different countries. Other RSC resources have recently been unveiled utilizing integration to ChemSpider data. These include the Learn Chemistry Wiki [19] and SpectraSchool [20] to help in the education of secondary school children. Since ChemSpider offers unrivalled online access to chemistry data via application programming interfaces such projects will continue to expand in scope and capabilities.
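To make the idea of programmatic access more concrete, the following is a minimal sketch of the kind of accurate-mass lookup that underlies the mass spectrometry use mentioned in Section 2. It does not call the ChemSpider web services: the small `COMPOUNDS` table, the function names and the 5 ppm default tolerance are all assumptions made for illustration, standing in for records that a client might retrieve from a chemistry database.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical local stand-in for compound records from an online database.
COMPOUNDS = {
    "caffeine": "C8H10N4O2",
    "aspirin": "C9H8O4",
    "glucose": "C6H12O6",
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def candidates(observed_mass, tolerance_ppm=5.0):
    """Return (name, error in ppm) for compounds within the mass tolerance."""
    hits = []
    for name, formula in COMPOUNDS.items():
        mass = monoisotopic_mass(formula)
        error_ppm = (mass - observed_mass) / observed_mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(error_ppm, 2)))
    # Closest match first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    print(candidates(180.0634))
```

With a tight tolerance only compounds whose calculated mass agrees closely with the measurement are returned; widening `tolerance_ppm` admits more candidates, which is why accurate mass data matter for narrowing down a structure.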
Figure 4: The Learn Chemistry wiki: a wiki environment utilizing ChemSpider data on its compound pages.

Conclusion

ChemSpider is presently one of the richest sources of chemistry data available online. It has been recognized with a number of awards in 2010 including the Bio-IT Best Practices Award for community service [21] and the ALPSP [22] and i-Expo [23] awards for innovation. The ChemSpider database is the foundation platform for a series of related websites and applications and presently serves many hundreds of thousands of requests every day. ChemSpider is likely to increase in prominence and impact in the coming years as the quantity of data grows and the diversity of integrated data sources increases.

References

1. Pence, H.E. and A.J. Williams, ChemSpider: An Online Chemical Information Resource. J. Chem. Educ., 2010. 87(11): p. 1123-1124.
2. ChemSpider. Available from: http://www.chemspider.com.
3. Wang, Y., et al., PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res, 2009. 37(Web Server issue): p. W623-33.
4. Royal Society of Chemistry acquires ChemSpider. [cited September 22nd 2011]; Available from: http://www.rsc.org/AboutUs/News/PressReleases/2009/ChemSpider.asp.
5. Wikipedia Home Page. 2010 [cited 2010 May 12]; Available from: www.wikipedia.org.
6. PubChem Home Page. 2010 [cited 2010 May 12]; Available from: http://pubchem.ncbi.nlm.nih.gov/.
7. ChEBI Home Page. 2010 [cited 2010 May 12]; Available from: http://www.ebi.ac.uk/chebi/.
8. KEGG Home Page. 2010 [cited 2010 May 12]; Available from: http://www.genome.jp/kegg/.
9. http://www.cas.org/products/scifindr/index.html. Available from: http://www.cas.org/products/scifindr/index.html.
10. Little, J.L., et al., Identification of "Known Unknowns" Utilizing Accurate Mass Data and ChemSpider. J Am Soc Mass Spectrom, 2011.
11. OpenPHACTS Project. 2011 [cited 2011 October 31st]; Available from: http://www.openphacts.org/.
12. Kamel, N., et al., The Innovative Medicines Initiative (IMI): a new opportunity for scientific collaboration between academia and industry at the European level. Eur Respir J, 2008. 31(5): p. 924-6.
13. http://cssp.chemspider.com. ChemSpider Synthetic Pages. Available from: http://cssp.chemspider.com.
14. Williams, A.J., et al., Mobile Apps for chemistry in the world of drug discovery. Drug Disc Today, 2011. 16(21-22): p. 928-939.
15. Williams, A.J. and H.E. Pence, Smart Phones, a Powerful Tool in the Chemistry Classroom. J Chem Educ, 2011. 88: p. 683-686.
16. http://tinyurl.com/3tpnmpn. ChemMobi. Available from: http://tinyurl.com/3tpnmpn.
17. ChemSpider Mobile. [cited 2011 January 4th]; Available from: http://itunes.apple.com/us/app/chemspider/id458878661.
18. Bradley, J.C., et al., The Spectral Game: leveraging Open Data and crowdsourcing for education. J Cheminform, 2009. 1(1): p. 9.
19. Learn Chemistry Wiki. [cited 2011 January 4th]; Available from: http://www.rsc.org/learn-chemistry/wiki/Main_Page.
20. SpectraSchool. [cited 2011 January 4th]; Available from: http://spectraschool.rsc.org/.
21. Williams, A.J. ChemSpider wins Bio-IT Best Practices Award for Community Service. 2010; Available from: http://www.chemspider.com/blog/chemspider-wins-bio-it-best-practices-award-for-community-service.html.
22. ChemSpider wins the ALPSP Publishing Innovation Prize. 2010; Available from: http://www.chemspider.com/blog/chemspider-wins-the-alpsp-publishing-innovation-prize.html.
23. ChemSpider wins "Most Innovative Software" Award. 2010; Available from: http://www.chemspider.com/blog/chemspider-wins-most-innovative-software-award.html.