Presentation delivered by Colin Batchelor of the RSC eScience team at the ACS Spring Meeting in New Orleans, April 2013.
Dozens of public compound databases are now available online, some providing access to tens of millions of chemical compounds. However, very little effort has gone into delivering databases of chemical reactions, and the majority of the large resources are commercial. In our five years of delivering chemistry data resources to the community, one of the most frequent requests from chemists has been for information on how to synthesize the chemicals they are researching. This presentation gives an overview of our efforts to improve access to freely available chemistry data and discusses ChemSpider Reactions as an integrating hub for content, including data extracted from US patents, from RSC journals and databases, and from our micro-publishing platform ChemSpider Synthetic Pages (CSSP).
ChemSpider reactions – delivering a free community resource of chemical syntheses
1. ChemSpider Reactions:
Delivering a free community
resource of chemical syntheses
Valery Tkachenko, Colin Batchelor, Daniel Lowe, Ken
Karapetyan, David Sharpe and Antony Williams
ACS New Orleans April 2013
2. Overview
• Motivation
• The RSC and chemical reaction data
• New sources of chemical reaction data
• ChemSpider Reactions: bringing it all together
• Experiments with reaction classification
• The National Chemical Database Service
3. Who needs another reaction
database?
• Those who cannot afford to license access…
• Those who would like to access data that is
not abstracted
• Those who might like to contribute data to a
database
• Anybody wanting to integrate their own systems
and pull data out.
4. RSC and chemical reaction data 1
Graphical abstracting journals:
Methods in Organic Synthesis (monthly, 1990 to present)
Catalysts and Catalysed Reactions (monthly, 2005 to
present)
These constitute a backfile of over 50,000 novel reactions
7. New sources of reaction data
Daniel Lowe’s PhD thesis (Cantab, 2012) was on
extracting reactions from US patent data.
We can apply this technology to the RSC Journal
archive.
8. ChemSpider Reactions
bringing it all together
http://csr.dev.rsc-us.org/
WORK IN PROGRESS
9. Reaction classification 1
Project Prospect has text-mined RSC journal
articles for named reactions and molecular
processes, annotated according to Creative
Commons-licensed ontologies:
See http://rxno.googlecode.com/
11. Reaction InChI
To do for reactions what InChI has done for
structures
• Think online searching
• Deduplication and linking
http://www-rinchi.ch.cam.ac.uk/help.html
12. Reaction InChI
Early work – RInChIs layered on to a few
hundred thousand reactions
• Not generated for a few tens of thousands of
reactions
• Reaction deduplication results differ based on
algorithm: GGA software versus RInChI
• Under investigation
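The observation that deduplication results depend on the algorithm can be illustrated with a small sketch. This is not how RInChI or the GGA software works; it only assumes a hypothetical `Reaction` record whose components are already canonical identifier strings, and shows that the duplicate count changes with the choice of what goes into the key (here, whether agents such as solvents are included).

```python
from collections import defaultdict
from typing import NamedTuple


class Reaction(NamedTuple):
    """Hypothetical record; components are canonical identifier strings."""
    reactants: tuple
    products: tuple
    agents: tuple = ()  # catalysts, solvents and so on


def reaction_key(rxn, include_agents=True):
    """Build an order-independent key by sorting the components of each role."""
    parts = [tuple(sorted(rxn.reactants)), tuple(sorted(rxn.products))]
    if include_agents:
        parts.append(tuple(sorted(rxn.agents)))
    return tuple(parts)


def deduplicate(reactions, include_agents=True):
    """Group reactions sharing a key; return one list per distinct key."""
    groups = defaultdict(list)
    for rxn in reactions:
        groups[reaction_key(rxn, include_agents)].append(rxn)
    return list(groups.values())


# Placeholder identifiers, not real compounds.
reactions = [
    Reaction(("A", "B"), ("C",), ("solvent1",)),
    Reaction(("B", "A"), ("C",), ("solvent1",)),  # same reaction, reactants reordered
    Reaction(("A", "B"), ("C",), ("solvent2",)),  # same transformation, other solvent
]

strict = deduplicate(reactions, include_agents=True)
loose = deduplicate(reactions, include_agents=False)
print(len(strict), len(loose))
```

With agents in the key the three records fall into two groups; without them they collapse into one, so two tools that draw this line differently would report different numbers of duplicates.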
14. What will ChemSpider Reactions serve?
• Chemical Database Service
• Linking back to original
publications/supplementary data
• Underpinning other tools e.g. retrosynthetic
analysis (depends on data quality and
mapping)
15. Chemical Database
Service
National Chemical Database Service
for UK academics
Integrates commercial databases and
services
Chemicals, analytical data, prediction
algorithms
Development of data repository
16. ARChem from SimBioSys 1
Synthesis planning tool which performs rule-
and precedent-based retrosynthetic analysis
back to commercially available starting
materials.
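The idea of working backwards from a target to purchasable starting materials can be sketched as a depth-limited recursive search. This is a toy illustration only, not ARChem's algorithm: the function name `retrosynthesis`, the `precedents` lookup table, and the compound labels are all hypothetical, and real tools apply reaction rules and scoring rather than a fixed table.

```python
def retrosynthesis(target, precedents, purchasable, depth=5):
    """Return synthesis steps as (product, precursors) pairs in forward order,
    or None if no route to purchasable materials is found within `depth`."""
    if target in purchasable:
        return []  # nothing to make, it can be bought
    if depth == 0:
        return None  # give up on overly long routes
    for precursors in precedents.get(target, []):
        route = []
        for precursor in precursors:
            sub_route = retrosynthesis(precursor, precedents, purchasable, depth - 1)
            if sub_route is None:
                break  # this disconnection fails; try the next precedent
            route.extend(sub_route)
        else:
            # Every precursor is reachable: append the final step.
            return route + [(target, precursors)]
    return None


# Hypothetical precedent table: product -> list of known precursor sets.
precedents = {
    "target": [("X", "Y"), ("int1", "B")],
    "int1": [("A", "B")],
}
purchasable = {"A", "B"}

route = retrosynthesis("target", precedents, purchasable)
print(route)
```

Here the first disconnection of `target` fails because `X` is neither purchasable nor covered by a precedent, so the search falls back to the second one and returns a two-step route through `int1`.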
19. But what about data quality?
• Data validation and curation
required
• Encouraging participation with
Rewards and RECOGNITION
20. Manual curation
• Integrated commenting, curating and validation
platform across ALL eScience and Publishing
platforms
• All integrated to a central RSC profile and
feeding the alt-metrics tools
21. The other kind of RDF
(made-up example)
Chemical reactions are unusually well-suited to representation. (Donald
Davidson’s event semantics)
@prefix obo: <http://purl.obolibrary.org/obo/> . # assumed OBO namespace, not shown on the slide
_:r1 a obo:RXNO_0000004 ; # Diels–Alder
    obo:has_participant_ceasing_to_exist _:m1 ; # a diene
    obo:has_participant_ceasing_to_exist _:m2 ; # an olefin
    obo:has_participant_starting_to_exist _:m3 . # a substituted cyclohexene
_:m1 a <http://rdf.chemspider.com/233000> .
_:m2 a <http://rdf.chemspider.com/233001> .
_:m3 a <http://rdf.chemspider.com/233002> .
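The event-style pattern in the made-up example above (a reaction node linked to participants that cease or start to exist) can be generated programmatically. The following is a minimal sketch using only the Python standard library; the `ReactionEvent` class is hypothetical, and the class and property names are copied from the slide rather than from any confirmed ontology release.

```python
from dataclasses import dataclass, field


@dataclass
class ReactionEvent:
    """Hypothetical container mirroring the slide's made-up RDF pattern."""
    reaction_class: str                           # e.g. "obo:RXNO_0000004"
    consumed: list = field(default_factory=list)  # IRIs of reactants
    produced: list = field(default_factory=list)  # IRIs of products

    def triples(self, node="_:r1"):
        """Return (subject, predicate, object) strings for the event."""
        out = [(node, "a", self.reaction_class)]
        typings = []
        participants = (
            [("obo:has_participant_ceasing_to_exist", iri) for iri in self.consumed]
            + [("obo:has_participant_starting_to_exist", iri) for iri in self.produced]
        )
        for index, (predicate, iri) in enumerate(participants, start=1):
            blank = f"_:m{index}"  # one blank node per participant
            out.append((node, predicate, blank))
            typings.append((blank, "a", f"<{iri}>"))
        return out + typings


event = ReactionEvent(
    reaction_class="obo:RXNO_0000004",
    consumed=["http://rdf.chemspider.com/233000", "http://rdf.chemspider.com/233001"],
    produced=["http://rdf.chemspider.com/233002"],
)

for subject, predicate, obj in event.triples():
    print(f"{subject} {predicate} {obj} .")
```

Each printed line is one statement of the graph: one typing the reaction, three linking it to its participants, and three typing the participants by compound.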