The importance of standards for
data exchange and interchange
on the Royal Society of
Chemistry eScience platforms
Valery Tkachenko, Colin Batchelor,
Jon Steele and Antony Williams*
ACS Indianapolis
September 12th
2013
RSC Projects in Action
• Many RSC projects underway, underpinned by
ChemSpider, and very dependent on standards
• ChemSpider
• ChemSpider Reactions
• Open PHACTS
• PharmaSea
• Chemical Database Service
• Open Source Drug Discovery
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using
semantic web technologies
• Open source code, open data and open
standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
Open Source Drug Discovery
Compound Data
• The standards of chemical structure handling
are primarily molfile, SDfile, SMILES, InChI
• We primarily depend on molfiles and SDF
files for data deposition and interchange
• We use InChI a lot – especially for integrated
searching across the web
• There ARE data interchange problems
associated with structures….
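To make the deposition format concrete, here is a minimal sketch of reading an SDfile: each record is a molfile ending in "M  END", followed by "> <FieldName>" data fields, and closed by a "$$$$" line. The function names and the sample record are illustrative assumptions, not RSC code, and real files need more defensive handling.

```python
def _parse_record(lines):
    """Split one SDfile record into its molfile block and a dict of data fields."""
    try:
        end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    except StopIteration:
        raise ValueError("record has no 'M  END' line")
    molblock = "\n".join(lines[: end + 1])
    fields = {}
    name = None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Field header such as "> <SMILES>": take the text between < and >.
            start, stop = line.find("<"), line.rfind(">")
            name = line[start + 1 : stop] if 0 <= start < stop else None
            if name:
                fields[name] = []
        elif not line.strip():
            name = None  # a blank line ends the current field
        elif name is not None:
            fields[name].append(line)
    return molblock, {key: "\n".join(value) for key, value in fields.items()}


def read_sdf_records(text):
    """Return a list of (molblock, fields) tuples, one per '$$$$'-terminated record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Hypothetical one-record SDfile for water, used only to exercise the reader.
SAMPLE_SDF = "\n".join([
    "water",
    "  example",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <SMILES>",
    "O",
    "",
    "> <StdInChI>",
    "InChI=1S/H2O/h1H2",
    "",
    "$$$$",
])

if __name__ == "__main__":
    for molblock, fields in read_sdf_records(SAMPLE_SDF):
        print(molblock.splitlines()[0], fields)
```

Keeping the molfile block and the data fields separate is what makes later cross-checks (for example, comparing a deposited InChI against the structure) possible.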
ChemSpider
Exact Search
Skeleton Search
CVSP: chemical validation
Free chemistry validation platform that performs:
• Structure validation
  • Atoms
  • Bonds
  • Valence
  • Stereo
  • If aromatic, check that the structure can be uniquely dearomatized
  • Flag partially ionized systems where the strongest acid is not ionized first
• Cross-matching of SDF fields
  • Synonyms
  • InChIs
  • SMILES
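As an illustration of what a valence check involves, the sketch below sums bond orders per atom in a V2000 molfile and flags atoms that exceed an assumed maximum. It is not how CVSP is implemented: the valence table is a small illustrative assumption, and charges, radicals, implicit hydrogens and aromatic bonds are ignored.

```python
# Assumed maximum valences for a few neutral elements (illustrative only).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def check_valences(molblock):
    """Return (atom_index, symbol, bond_order_sum, limit) for each over-valent atom."""
    lines = molblock.splitlines()
    counts = lines[3]  # V2000 counts line: atom count in cols 1-3, bond count in cols 4-6
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4 : 4 + n_atoms]
    bond_lines = lines[4 + n_atoms : 4 + n_atoms + n_bonds]

    symbols = [line[31:34].strip() for line in atom_lines]  # symbol in cols 32-34
    bond_sum = [0] * n_atoms
    for line in bond_lines:
        first, second, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if order > 3:
            continue  # aromatic and query bond types are skipped in this sketch
        bond_sum[first - 1] += order
        bond_sum[second - 1] += order

    issues = []
    for index, (symbol, total) in enumerate(zip(symbols, bond_sum), start=1):
        limit = MAX_VALENCE.get(symbol)
        if limit is not None and total > limit:
            issues.append((index, symbol, total, limit))
    return issues


# Deliberately invalid example: a neutral oxygen carrying three hydrogens.
EXAMPLE_MOLFILE = "\n".join([
    "neutral OH3 (deliberately invalid)",
    "  example",
    "",
    "  4  3  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.5000    0.8660    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.5000   -0.8660    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "  1  4  1  0",
    "M  END",
])

if __name__ == "__main__":
    print(check_valences(EXAMPLE_MOLFILE))
```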
Input formats supported:
CDX, MOL, SDF, ZIP, GZ and tab-delimited text files
CVSP: standardization modules
• Custom processing lets users put together a workflow
from a list of pre-defined standardization modules
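The idea of assembling a workflow from a module list can be sketched as a registry of named functions applied in order. The two modules here are toy stand-ins that operate on SMILES strings (they are not CVSP's actual modules), and the names `MODULES` and `build_workflow` are hypothetical.

```python
def strip_whitespace(smiles):
    """Toy module: remove leading and trailing whitespace."""
    return smiles.strip()


def keep_largest_fragment(smiles):
    """Toy module: keep the longest dot-separated SMILES fragment (a crude salt strip)."""
    return max(smiles.split("."), key=len)


# Registry of pre-defined modules a user can choose from (illustrative).
MODULES = {
    "strip_whitespace": strip_whitespace,
    "keep_largest_fragment": keep_largest_fragment,
}


def build_workflow(module_names):
    """Return a function that applies the named modules in the given order."""
    steps = [MODULES[name] for name in module_names]  # KeyError on an unknown module

    def run(smiles):
        for step in steps:
            smiles = step(smiles)
        return smiles

    return run


if __name__ == "__main__":
    workflow = build_workflow(["strip_whitespace", "keep_largest_fragment"])
    print(workflow("  [Na+].CC(=O)[O-] "))
```

Because each module takes and returns the same type, users can reorder or drop steps without changing the modules themselves.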
Reaction Data
• ChemSpider is built for compounds – but
how are they made?
• ChemSpider Reactions is our attempt to
answer that question
• Integrating both commercial and open data
• RSC Databases, data extracted from our
publications on the DERA project and Open
Data sources of reactions
• Molfiles, CDX files, RXN files
RSC and Chemical Reactions
RSC Journal Content
• Tens to hundreds of thousands of reactions
are contained in our journals
• Electronic Supplementary Information (ESI)
contains many more
ChemSpider Reactions
ChemSpider SyntheticPages
Spectral Data
• ChemSpider requires spectral data to be
deposited in standard formats – JCAMP or
images
• All spectra are available at:
http://www.chemspider.com/spectra.aspx
• Data are deposited on a regular basis
• Students
• Chemical vendors
• Growing collection now
Student Submissions
JCAMP NMR Spectra
Data on ChemSpider
Data Interchange
JCAMP file downloads
• When NMR spectra are stored as JCAMP
then downloads into offline packages are
feasible – MestreLabs, ACD/Labs etc
• Open Data – download versus view
• Store spectra locally and reuse
• Java is increasingly a pain!
• Need to move to HTML5 viewing on
ChemSpider, especially for Mobile Viewing
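Part of what makes JCAMP files easy to reuse offline is their simple text layout: each labeled data record starts with "##", a label, "=" and a value, and "$$" begins a comment. The sketch below reads those records into a dictionary. It is a minimal reader with a made-up sample file; it does not decode compressed data tables, and the function names are assumptions.

```python
import re


def normalize_label(label):
    """Compare labels ignoring case, spaces, '-', '/' and '_'."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_jcamp_records(text):
    """Return a dict mapping normalized labels to their (possibly multi-line) values."""
    records = {}
    label = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # drop "$$" comments
        if line.startswith("##"):
            name, _, value = line[2:].partition("=")
            label = normalize_label(name)
            records[label] = value.strip()
        elif label is not None and line:
            records[label] += "\n" + line  # continuation line, e.g. a data table row
    return records


# Made-up sample file; the numbers are not real spectral data.
SAMPLE_JCAMP = """##TITLE=Illustrative spectrum   $$ not real data
##JCAMP-DX=5.01
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##FIRSTX=4000.0
##LASTX=0.0
##NPOINTS=3
##XYDATA=(X++(Y..Y))
4000.0 10.0 25.0 12.0
##END=
"""

if __name__ == "__main__":
    records = parse_jcamp_records(SAMPLE_JCAMP)
    print(records["TITLE"], records["DATATYPE"], records["NPOINTS"])
```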
Spectral Display in the hand
Challenges with Spectra
• JCAMP is good for a lot of spectral data – IR,
Raman, 1D NMR
• MS data are rarely made available in JCAMP
• We would love a ratified JCAMP 6.0 for 2D
data exchange – allows third parties to build
support for download
• ASSIGNED JCAMP spectra can be
supported, but there is no real standard for them
…and images
DERA to digitize documents?
• We want to get data out of our historical archive
• What could we do?
• Find chemical names and generate structures
• Find chemical images and generate structures
• Find reactions – and make a database!
• Find data (MP, BP, LogP) and deposit
• Find figures and database them
• Find spectra (and link to structures)
Text-Mining
ESI – Text Spectra
Do we want to search text spectra?
What do we get when we search:
13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3),
30.11 (CH, benzylic methane), 30.77 (CH,
benzylic methane), 66.12 (CH2), 68.49 (CH2),
117.72, 118.19, 120.29, 122.67, 123.37, 125.69,
125.84, 129.03, 130.00, 130.53 (ArCH), 99.42,
123.60, 134.69, 139.23, 147.21, 147.61,
149.41, 152.62, 154.88 (ArC)
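A text spectrum like the 13C listing above can be turned into searchable numbers with a couple of regular expressions. The sketch below is one rough way to do it, assuming the header follows the "nucleus NMR (solvent, frequency MHz)" pattern and that every decimal number after "δ =" is a chemical shift. That assumption holds for this 13C example but not for 1H listings, where coupling constants also contain decimals. The function name is hypothetical.

```python
import re

HEADER = re.compile(r"(\d+[A-Z][a-z]?) NMR \(([^,]+), (\d+(?:\.\d+)?) MHz\)")
SHIFT = re.compile(r"\d+\.\d+")

TEXT_SPECTRUM = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), "
    "30.11 (CH, benzylic methane), 30.77 (CH, "
    "benzylic methane), 66.12 (CH2), 68.49 (CH2), "
    "117.72, 118.19, 120.29, 122.67, 123.37, 125.69, "
    "125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, "
    "123.60, 134.69, 139.23, 147.21, 147.61, "
    "149.41, 152.62, 154.88 (ArC)"
)


def parse_13c_text(text):
    """Extract nucleus, solvent, frequency and sorted shifts from a 13C text spectrum."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no 'NMR (solvent, frequency MHz)' header found")
    _, _, peaks = text.partition("δ =")
    shifts = sorted(float(value) for value in SHIFT.findall(peaks))
    return {
        "nucleus": header.group(1),
        "solvent": header.group(2),
        "frequency_mhz": float(header.group(3)),
        "shifts_ppm": shifts,
    }


if __name__ == "__main__":
    result = parse_13c_text(TEXT_SPECTRUM)
    print(result["nucleus"], result["solvent"], len(result["shifts_ppm"]))
```

The annotations in parentheses (CH3, ArCH and so on) are discarded here; keeping them would need a proper tokenizer rather than a single regular expression.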
MestreLabs Mnova NMR Beta
1H NMR (CDCl3, 400 MHz):
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H),
4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd,
1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94
(m, 11H, ArH)
ESI – Text and Image Spectra
Extracted JCAMP Spectrum
Prepare CONSISTENT JCAMP
Data onto ChemSpider
It’s exactly the WRONG WAY!
• We should NOT be mining data out of future
publications
• Structures should be submitted “correctly”
• Spectra should be submitted in digital spectral
formats, not as images
• ESI should be RICH and interactive
APIs and Standards
• We follow the standard expectations for how
people want to access our APIs: RESTful
services, JSON handling etc.
• We allow people to pass in queries using
molfiles, SMILES, InChIs, InChIKeys etc.
• Future work will include JCAMP searching
• APIs are in use by MANY organizations and of
value to our Open PHACTS, PharmaSea and
Chemical Database Service projects, and to mobile apps
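Accepting several identifier types in one query field means the service has to work out what it was given. The sketch below shows one possible way to classify a query string and wrap it in a JSON body. The payload field names (`queryType`, `query`) and both function names are hypothetical and do not describe the actual ChemSpider API.

```python
import json
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def classify_query(query):
    """Guess whether a query is a molfile, InChI, InChIKey or SMILES string."""
    if "M  END" in query:
        return "molfile"
    text = query.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY.match(text):
        return "inchikey"
    return "smiles"  # fallback: anything else is treated as SMILES


def build_search_payload(query):
    """Build a JSON request body for a hypothetical structure-search endpoint."""
    kind = classify_query(query)
    # Molfiles are whitespace-sensitive, so only the one-line formats are stripped.
    value = query if kind == "molfile" else query.strip()
    return json.dumps({"queryType": kind, "query": value}, sort_keys=True)


if __name__ == "__main__":
    for example in ("InChI=1S/H2O/h1H2", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", " CCO "):
        print(build_search_payload(example))
```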
Conclusions
• Data Interchange standards are all over our
projects!
• We are grateful to companies, organizations,
contributors who have helped define:
• Structure – Mol, SDF, InChI etc.
• Spectra – JCAMP, SPC, NetCDF etc.
• W3C standards
For the Next ACS hopefully…
• Build out our ChemSpider Reaction collection
• Grab spectral data out of our ESI!
• Get more submissions in STANDARD formats
• Integrate with spectroscopy handling systems
for deposition in JCAMP
• Push molfiles directly into ChemSpider with
improved deposition platform
• Build out the chemical data repository…
Thank You
Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
