SlideShare uma empresa Scribd logo
1 de 64
A Chemistry Data Repository 
to Serve Them All 
Antony Williams
Chemistry for the Community 
• The Royal Society of Chemistry as a 
provider of chemistry for the community: 
• As a charity 
• As a scientific publisher 
• As a host of commercial databases 
• As a partner in grant-based projects 
• As the host of ChemSpider 
• And now in development : the RSC Data 
Repository for Chemistry
• ~30 million chemicals and growing 
• Data sourced from >500 different sources 
• Crowd sourced curation and annotation 
• Ongoing deposition of data from our 
journals and our collaborators 
• Structure centric hub for web-searching 
• …and a really big dictionary!!!
ChemSpider
ChemSpider
Experimental/Predicted Properties
Literature references
Patents references
Books
Vendors and data sources
Aspirin on ChemSpider
Many Names, One Structure
What is the Structure of Vitamin K?
MeSH 
• A lipid cofactor that is required for normal 
blood clotting. 
• Several forms of vitamin K have been 
identified: 
• VITAMIN K 1 (phytomenadione) derived 
from plants, 
• VITAMIN K 2 (menaquinone) from bacteria, 
and synthetic naphthoquinone provitamins, 
• VITAMIN K 3 (menadione).
What is the Structure of Vitamin K?
The ultimate “dictionary” 
• Search all forms of structure IDs 
• Systematic name(s) 
• Trivial Name(s) 
• SMILES 
• InChI Strings 
• InChIKeys 
• Database IDs 
• Registry Number
Linking Names to Structures
Semantic Mark-up of Articles
Crowdsourced “Annotations” 
• Users can add 
• Descriptions, Syntheses and Commentaries 
• Links to PubMed articles 
• Links to articles via DOIs 
• Add spectral data 
• Add Crystallographic Information Files 
• Add photos 
• Add MP3 files 
• Add Videos
Crowdsourced Enhancement 
• The community can clean and enhance the 
database by providing Feedback and direct 
curation 
• Tens of thousands of edits made
ChemSpider 
• ChemSpider allowed the community to 
participate in linking the internet of 
chemistry & crowdsourcing of data 
• Successful experiment in terms of building 
a central hub for integrated web search 
• More people are “users” than “contributors” 
• Yet basic feedback and game-play helps
ChemSpider Spectra
www.SpectralGame.com 
http://www.jcheminf.com/content/1/1/9
An Adventure into the World of 
Small but significant contribution..
ChemSpider SyntheticPages
Micropublishing with Peer Review 
(a chemical synthesis blog?)
Multi-Step Synthesis
Interactive Data
Publications-summary of work 
• Scientific publications are a summary of work 
• Is all work reported? 
• How much science is lost to pruning? 
• What of value sits in notebooks and is lost? 
• Publications offering access to “real data”? 
• How much data is lost? 
• How many compounds never reported? 
• How many syntheses fail or succeed? 
• How many characterization measurements?
Deposition of Research Data 
• If we manage compounds, syntheses and 
analytical data… 
• If we have security and provenance of data… 
• If we deliver user interfaces to satisfy the 
various use cases… 
• Then we have delivered electronic lab 
notebooks for chemistry laboratories. ELNs 
are research data repositories
What are we building? 
• We are building the “RSC Data Repository” 
• Containers for compounds, reactions, analytical 
data, tabular data 
• Algorithms for data validation and standardization 
• Flexible indexing and search technologies 
• A platform for modeling data and hosting existing 
models and predictive algorithms
Deposition of Data
Compounds
Reactions
Analytical data
Crystallography data
Deposition of Data 
• Developing systems that provides 
feedback to users regarding data quality 
• Validate/standardize chemical compounds 
• Check for balanced reactions 
• Checks spectral data 
• EXAMPLE Future work 
• Properties – compare experimental to pred. 
• Automated structure verification - NMR
Can we get historical data? 
• Text and data can be mined 
• Spectra can be extracted and converted 
• SO MUCH Open Source Code available
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- 
thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride 
( 5 ml ) and benzene ( 50 ml ) were charged into a glass 
reaction vessel equipped with a mechanical stirrer , 
thermometer and reflux condenser . 
The reaction mixture was heated at reflux with stirring , for a 
period of about one-half hour . 
After this time the benzene and unreacted thionyl chloride 
were stripped from the reaction mixture under reduced 
pressure to yield the desired product N-(β-chloroethyl)-N-methyl- 
N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a 
solid residue
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- 
thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride 
( 5 ml ) and benzene ( 50 ml ) were charged into a glass 
reaction vessel equipped with a mechanical stirrer , 
thermometer and reflux condenser . 
The reaction mixture was heated at reflux with stirring , for a 
period of about one-half hour . 
After this time the benzene and unreacted thionyl chloride 
were stripped from the reaction mixture under reduced 
pressure to yield the desired product N-(β-chloroethyl)-N-methyl- 
N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a 
solid residue
Text spectra? 
13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 
30.11 (CH, benzylic methane), 30.77 (CH, 
benzylic methane), 66.12 (CH2), 68.49 (CH2), 
117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 
125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 
123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 
152.62, 154.88 (ArC)
1H NMR (CDCl3, 400 MHz): 
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, 
C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, 
C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 
8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
Turn “Figures” Into Data
Make it interactive
SO MANY reactions!
Extracting our Archive 
• What could we get from our archive? 
• Find chemical names and generate structures 
• Find chemical images and generate structures 
• Find reactions 
• Find data (MP, BP, LogP) and deposit 
• Find figures and database them 
• Find spectra (and link to structures)
Models published from data
Text-mining Data to compare
Support grant-based services 
• Multiple European consortium-based grants 
• PharmaSea (FP7 funded) 
• Open PHACTS (IMI funded) 
• UK National Chemical Database Service ( 
http://cds.rsc.org) – developing data repository 
for lab data, integrate Electronic Lab 
Notebooks 
• Open Drug Discovery projects
• 3-year Innovative Medicines Initiative project 
• Integrating chemistry and biology data using 
semantic web technologies 
• Open code, open data, open standards 
• Academics, Pharmas, Publishers… 
• To put medicines in the pipeline…
The Open PHACTS community ecosystem
Open Source Drug Discovery India
UK Chemical Database Service 
• The National Chemical Database Service is for 
UK academics
Vision for the Service PART 1 
• Provide access to databases and services of 
interest to the academic community to serve 
their needs. Access to services to include: 
• Crystallography data – Organic and inorganic 
materials 
• Thermophysical data 
• Reactions Data including retrosynthetic analysis 
• Prediction technologies – name generation, 
physicochemical parameters, NMR prediction
Vision for the Service PART 2 
• Response to the call for proposals included 
our vision for a 21st Century data repository 
• At a time of Open Access, Open Data and 
funding agency requirement to make data 
public – build a data repository 
• Funding is split for licensing content and 
services (VAST MAJORITY) and some 
funding for research and development
An Initial “Vague” Vision Set 
• Manage “all” of the chemistry data associated 
with chemical substances 
• Data to be downloadable, reusable, interactive 
• Build a platform that enables the scientist 
• Data storage, validation, standardization and 
curation 
• Collaborative data sharing 
• Provide data platform that can enable and 
enhance publishing of scientific papers
10 years ago… 
• There was no iPad, no iPhone, no Android 
• Facebook had only just been released
ChemSpider is 7 years old… 
• 10s of thousands of users per day 
• Over 30 million chemical compounds 
• Recognized as one of the worlds primary 
sources of chemistry data 
• But 7 years is a long time in software… 
• The data repository is the new foundation, 
ChemSpider is an interface and brand
Powered by RSC Data 
• What will it be like when we are hosting 
chemistry data that doesn’t get published??? 
• Or hosting all data UNTIL it gets published?? 
• What will it be like when computer models are 
being rebuilt every time there is a new dataset 
– validating the data, flagging data 
• What will it be like when publications are not 
only peer-reviewed but also computer 
reviewed?
The Future 
Internet Data 
Commercial Software 
Pre-competitive Data 
Open Science 
Open Data 
Publishers 
Educators 
Open Databases 
Chemical Vendors 
Small organic molecules 
Undefined materials 
Organometallics 
Nanomaterials 
Polymers 
Minerals 
Particle bound 
Links to Biologicals
A Global Chemistry Network 
• The Global Chemistry Network is much bigger 
than just data - scientific networking, 
micro/publishing, integration hub. 
• The data repository as a handler for data, GCN 
as a submission interface, GCN as a profile 
handler, rewards and recognition platform etc. 
• Data repository architecture designed to deliver 
the underpinning data containers and 
visualization widgets etc.
Thank you 
Email: williamsa@rsc.org 
ORCID: 0000-0002-2668-4821 
Twitter: @ChemConnector 
Personal Blog: www.chemconnector.com 
SLIDES: www.slideshare.net/AntonyWilliams

Mais conteúdo relacionado

Mais procurados

ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Sean Ekins
 

Mais procurados (18)

Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...
 
The application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archiveThe application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archive
 
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
 
Value of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry communityValue of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry community
 
Dealing with the complex challenge of managing diverse analytical chemistry d...
Dealing with the complex challenge of managing diverse analytical chemistry d...Dealing with the complex challenge of managing diverse analytical chemistry d...
Dealing with the complex challenge of managing diverse analytical chemistry d...
 
Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...
 
Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Encouraging undergraduate students to participate as authors of scientific pu...
Encouraging undergraduate students to participate as authors of scientific pu...Encouraging undergraduate students to participate as authors of scientific pu...
Encouraging undergraduate students to participate as authors of scientific pu...
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 

Semelhante a RSC Data Repository to Serve the Chemistry Community

ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
The Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationThe Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationRoyal Society of Chemistry
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineKen Karapetyan
 
Royal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discoveryRoyal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discoveryKen Karapetyan
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsKen Karapetyan
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...Ken Karapetyan
 

Semelhante a RSC Data Repository to Serve the Chemistry Community (20)

Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...
 
ChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry dataChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry data
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
Building a data repository to manage chemistry research data
Building a data repository to manage chemistry research dataBuilding a data repository to manage chemistry research data
Building a data repository to manage chemistry research data
 
Experiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the CommunityExperiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the Community
 
The Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationThe Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovation
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Royal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discoveryRoyal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discovery
 
Royal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discoveryRoyal society of chemistry developments to support open drug discovery
Royal society of chemistry developments to support open drug discovery
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
The expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry communityThe expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry community
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Royal Society of Chemistry projects underpinning open innovation
Royal Society of Chemistry projects underpinning open innovationRoyal Society of Chemistry projects underpinning open innovation
Royal Society of Chemistry projects underpinning open innovation
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 

Último

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 

Último (20)

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

RSC Data Repository to Serve the Chemistry Community

  • 1. A Chemistry Data Repository to Serve Them All Antony Williams
  • 2. Chemistry for the Community • The Royal Society of Chemistry as a provider of chemistry for the community: • As a charity • As a scientific publisher • As a host of commercial databases • As a partner in grant-based projects • As the host of ChemSpider • And now in development : the RSC Data Repository for Chemistry
  • 3. • ~30 million chemicals and growing • Data sourced from >500 different sources • Crowd sourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure centric hub for web-searching • …and a really big dictionary!!!
  • 10. Vendors and data sources
  • 12. Many Names, One Structure
  • 13. What is the Structure of Vitamin K?
  • 14. MeSH • A lipid cofactor that is required for normal blood clotting. • Several forms of vitamin K have been identified: • VITAMIN K 1 (phytomenadione) derived from plants, • VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, • VITAMIN K 3 (menadione).
  • 15. What is the Structure of Vitamin K?
  • 16. The ultimate “dictionary” • Search all forms of structure IDs • Systematic name(s) • Trivial Name(s) • SMILES • InChI Strings • InChIKeys • Database IDs • Registry Number
  • 17. Linking Names to Structures
  • 19. Crowdsourced “Annotations” • Users can add • Descriptions, Syntheses and Commentaries • Links to PubMed articles • Links to articles via DOIs • Add spectral data • Add Crystallographic Information Files • Add photos • Add MP3 files • Add Videos
  • 20. Crowdsourced Enhancement • The community can clean and enhance the database by providing Feedback and direct curation • Tens of thousands of edits made
  • 21. ChemSpider • ChemSpider allowed the community to participate in linking the internet of chemistry & crowdsourcing of data • Successful experiment in terms of building a central hub for integrated web search • More people are “users” than “contributors” • Yet basic feedback and game-play helps
  • 24. An Adventure into the World of Small but significant contribution..
  • 26. Micropublishing with Peer Review (a chemical synthesis blog?)
  • 29. Publications-summary of work • Scientific publications are a summary of work • Is all work reported? • How much science is lost to pruning? • What of value sits in notebooks and is lost? • Publications offering access to “real data”? • How much data is lost? • How many compounds never reported? • How many syntheses fail or succeed? • How many characterization measurements?
  • 30. Deposition of Research Data • If we manage compounds, syntheses and analytical data… • If we have security and provenance of data… • If we deliver user interfaces to satisfy the various use cases… • Then we have delivered electronic lab notebooks for chemistry laboratories. ELNs are research data repositories
  • 31. What are we building? • We are building the “RSC Data Repository” • Containers for compounds, reactions, analytical data, tabular data • Algorithms for data validation and standardization • Flexible indexing and search technologies • A platform for modeling data and hosting existing models and predictive algorithms
  • 37. Deposition of Data • Developing systems that provides feedback to users regarding data quality • Validate/standardize chemical compounds • Check for balanced reactions • Checks spectral data • EXAMPLE Future work • Properties – compare experimental to pred. • Automated structure verification - NMR
  • 38. Can we get historical data? • Text and data can be mined • Spectra can be extracted and converted • SO MUCH Open Source Code available
  • 39. Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser . The reaction mixture was heated at reflux with stirring , for a period of about one-half hour . After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl- N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
  • 40. Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser . The reaction mixture was heated at reflux with stirring , for a period of about one-half hour . After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl- N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
  • 41. Text spectra? 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
  • 42. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  • 46. Extracting our Archive • What could we get from our archive? • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions • Find data (MP, BP, LogP) and deposit • Find figures and database them • Find spectra (and link to structures)
  • 49. Support grant-based services • Multiple European consortium-based grants • PharmaSea (FP7 funded) • Open PHACTS (IMI funded) • UK National Chemical Database Service ( http://cds.rsc.org) – developing data repository for lab data, integrate Electronic Lab Notebooks • Open Drug Discovery projects
  • 50.
  • 51. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open code, open data, open standards • Academics, Pharmas, Publishers… • To put medicines in the pipeline…
  • 52. The Open PHACTS community ecosystem
  • 53. Open Source Drug Discovery India
  • 54. UK Chemical Database Service • The National Chemical Database Service is for UK academics
  • 55. Vision for the Service PART 1 • Provide access to databases and services of interest to the academic community to serve their needs. Access to services to include: • Crystallography data – Organic and inorganic materials • Thermophysical data • Reactions Data including retrosynthetic analysis • Prediction technologies – name generation, physicochemical parameters, NMR prediction
  • 56.
  • 57. Vision for the Service PART 2 • Response to the call for proposals included our vision for a 21st Century data repository • At a time of Open Access, Open Data and funding agency requirement to make data public – build a data repository • Funding is split for licensing content and services (VAST MAJORITY) and some funding for research and development
  • 58. An Initial “Vague” Vision Set • Manage “all” of the chemistry data associated with chemical substances • Data to be downloadable, reusable, interactive • Build a platform that enables the scientist • Data storage, validation, standardization and curation • Collaborative data sharing • Provide data platform that can enable and enhance publishing of scientific papers
  • 59. 10 years ago… • There was no iPad, no iPhone, no Android • Facebook had only just been released
  • 60. ChemSpider is 7 years old… • 10s of thousands of users per day • Over 30 million chemical compounds • Recognized as one of the worlds primary sources of chemistry data • But 7 years is a long time in software… • The data repository is the new foundation, ChemSpider is an interface and brand
  • 61. Powered by RSC Data • What will it be like when we are hosting chemistry data that doesn’t get published??? • Or hosting all data UNTIL it gets published?? • What will it be like when computer models are being rebuilt every time there is a new dataset – validating the data, flagging data • What will it be like when publications are not only peer-reviewed but also computer reviewed?
  • 62. The Future Internet Data Commercial Software Pre-competitive Data Open Science Open Data Publishers Educators Open Databases Chemical Vendors Small organic molecules Undefined materials Organometallics Nanomaterials Polymers Minerals Particle bound Links to Biologicals
  • 63. A Global Chemistry Network • The Global Chemistry Network is much bigger than just data - scientific networking, micro/publishing, integration hub. • The data repository as a handler for data, GCN as a submission interface, GCN as a profile handler, rewards and recognition platform etc. • Data repository architecture designed to deliver the underpinning data containers and visualization widgets etc.
  • 64. Thank you Email: williamsa@rsc.org ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams