SlideShare uma empresa Scribd logo
1 de 42
Connecting Chemistry Across the
     Internet Using ChemSpider

       Antony J Williams and Valery Tkachenko
                     SERMACS, November 15th 2012
Chemistry Data and the Weeds
Tell me about Roundup
So what is Round Up?
The World’s Encyclopedia
Roundup
Where do we Round Up data?
   Where can I find the molfile for Roundup?
   Papers/Patents about Roundup?
   What are the side effects of Roundup?
   Where can I order Roundup?
   What are the physicochemical properties?
   Metabolic pathways?
   Different synonyms of Roundup?
   Synthesis of Roundup?
   Side effects of Roundup?
   Etc….
Where do I Round Up Data?
In an increasing LinkedData map….
But I want to aggregate data? So…
ChemSpider
 Takes on the role of a structure centric hub:

   Connecting, validating, qualifying data
   Enhancing data with connections to services
   Provides access to data and services for others
    to use (Thermo, Agilent, Bruker, Waters,
    ACD/Labs, Accelrys, etc.)
   Uses available services to integrate, connect
    and enhance the offering
Roundup on ChemSpider
What will ChemSpider give us??
What will ChemSpider give us??
What will ChemSpider give us??
What will ChemSpider give us??
What will ChemSpider give us??
What will ChemSpider give us??
ChemSpider is Collapsing Data???
What will ChemSpider give us??
For Glyphosate itself
How did we build it?
 We deal in Molfiles or SDF files – with coordinates
 Deposit anything that has an InChI – we support
  what InChI can handle, good and bad
 Standardization based on “InChI standardization”
 InChIs aggregate (certain) tautomers

 How much of ChemSpider is “on ChemSpider”?
Connecting Chemistry across the web
 So much of what is seen on ChemSpider is
  retrieved in real time using services
Connecting Chemistry across the web
Online Predictions
A Comment on Quality
 For >28 million chemical compounds there are
  some errors:

     “Incorrect” structure representations
     Mismatched name-structure relationships
     Experimental properties (the values, the units)
     Real vs. virtual compounds – text-mining and
      conversion

   We have deprecated a LOT of data…
Downsides of InChI

 Good for small molecules – but no polymers,
  issues with inorganics, organometallics, imperfect
  stereochemistry. ChemSpider is “small molecules”

 InChI used as the “deduplicator” – FIRST version
  of a compound into the database becomes THE
  structure to deduplicate against…
Side Effects of InChI Usage
SMILES by comparison…
Side Effects of InChI Usage
Standardization Issues
Depiction based on molfile
Downsides of Overall Approach
 Meshing data together based on InChIs worked
  for simple molecules

 2D layout errors inherited or limited by algorithm

 Complex molecules that are meant to be the
  same thing were NOT deduplicated. Compounds
  differing by one stereocenter, named the same,
  meant to be the same, are not the same
So much data online is “erroneous”
The confusion of name-structures
Collapsing Data – Standardization
What needs to happen?
 If we could validate
    Catch errors in databases (and clean)
    Proactively catch errors in publications/patents
    Reduce junk in the ether – improve QUALITY!

 If we collectively standardized
    Interlinking between databases should improve


   CVSP – a separate presentation….stick around
Crowdsourcing ChemSpider

 ChemSpider is crowdsourced

 Community deposition, annotation
  and curation

 Anyone can “Leave Feedback”

 Registered users can add data
ChemSpider and Global Chemistry Hub
                      Internet Data




 Small organic molecules              Commercial Software
 Undefined materials                  Pre-competitive Data
 Organometallics                            Open Science
 Nanomaterials                                 Open Data
 Polymers                                      Publishers
 Minerals                                      Educators
 Particle bound                           Open Databases
 Links to Biologicals                   Chemical Vendors
Delivering a Prediction Platform
 Experimental data will be used as the basis of
  model generation – a predictive platform…
The Future of ChemSpider
 Continued focus on quality over quantity –
  but more data is good too!
 ChemSpider Reactions – work in progress
  and includes >300,000 reactions
 Plugging in a validation and standardization
  platform
 Delivering personal and institutional
  repository capabilities
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

Mais conteúdo relacionado

Mais procurados

ICIC 2013 New Product Introductions CAS
ICIC 2013 New Product Introductions CASICIC 2013 New Product Introductions CAS
ICIC 2013 New Product Introductions CAS
Dr. Haxel Consult
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider

Mais procurados (19)

ICIC 2013 New Product Introductions CAS
ICIC 2013 New Product Introductions CASICIC 2013 New Product Introductions CAS
ICIC 2013 New Product Introductions CAS
 
Taming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can HelpTaming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can Help
 
Whitney Symposium Lecture June 2008
Whitney Symposium Lecture June 2008Whitney Symposium Lecture June 2008
Whitney Symposium Lecture June 2008
 
Chemistry in the hand: The delivery of structure databases and spectroscopy g...
Chemistry in the hand: The delivery of structure databases and spectroscopy g...Chemistry in the hand: The delivery of structure databases and spectroscopy g...
Chemistry in the hand: The delivery of structure databases and spectroscopy g...
 
How the web has weaved a web of interlinked chemistry data final
How the web has weaved a web of interlinked chemistry data finalHow the web has weaved a web of interlinked chemistry data final
How the web has weaved a web of interlinked chemistry data final
 
Short Overview of ChemSPider at Drexel University
Short Overview of ChemSPider at Drexel UniversityShort Overview of ChemSPider at Drexel University
Short Overview of ChemSPider at Drexel University
 
Chemistry apps in the wild and at the royal society of chemistry
Chemistry apps in the wild and at the royal society of chemistryChemistry apps in the wild and at the royal society of chemistry
Chemistry apps in the wild and at the royal society of chemistry
 
Improving online chemistry one structure at a time
Improving online chemistry one structure at a timeImproving online chemistry one structure at a time
Improving online chemistry one structure at a time
 
ChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry dataChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry data
 
Gartner Healthcare Supply Chain Roundtable: October 14, Chicago
Gartner Healthcare Supply Chain Roundtable: October 14, ChicagoGartner Healthcare Supply Chain Roundtable: October 14, Chicago
Gartner Healthcare Supply Chain Roundtable: October 14, Chicago
 
Enhancing Discoverability Across Royal Society Of Chemistry Content By Integr...
Enhancing Discoverability Across Royal Society Of Chemistry Content By Integr...Enhancing Discoverability Across Royal Society Of Chemistry Content By Integr...
Enhancing Discoverability Across Royal Society Of Chemistry Content By Integr...
 
Text Mining for Chemistry and Building a Public Platform for Document Markup
Text Mining for Chemistry and Building a Public Platform for Document MarkupText Mining for Chemistry and Building a Public Platform for Document Markup
Text Mining for Chemistry and Building a Public Platform for Document Markup
 
The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...
 
ChemSpider – The Vision and Challenges Associated with Building a Free Online...
ChemSpider – The Vision and Challenges Associated with Building a Free Online...ChemSpider – The Vision and Challenges Associated with Building a Free Online...
ChemSpider – The Vision and Challenges Associated with Building a Free Online...
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
ChemSpider – An Online Database and Registration System Linking the Web
ChemSpider – An Online Database and  Registration System Linking the WebChemSpider – An Online Database and  Registration System Linking the Web
ChemSpider – An Online Database and Registration System Linking the Web
 
How an Online Resource for Chemistry Can Change Our World
How an Online Resource for Chemistry Can Change Our WorldHow an Online Resource for Chemistry Can Change Our World
How an Online Resource for Chemistry Can Change Our World
 

Destaque

Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
Stuart Chalk
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
Sean Ekins
 
ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …
Valery Tkachenko
 

Destaque (9)

Building a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryBuilding a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistry
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
Acs finding promiscuous old drugs for new uses-final
Acs   finding promiscuous old drugs for new uses-finalAcs   finding promiscuous old drugs for new uses-final
Acs finding promiscuous old drugs for new uses-final
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 
ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 

Semelhante a Connecting Chemistry Across the Internet Using ChemSpider

ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Chemspider hosting linking and curating chemistry data for the community
Chemspider hosting linking and curating chemistry data for the communityChemspider hosting linking and curating chemistry data for the community
Chemspider hosting linking and curating chemistry data for the community
Royal Society of Chemistry
 

Semelhante a Connecting Chemistry Across the Internet Using ChemSpider (20)

ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
 
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
 
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
 
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
 
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspnRSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
 
ChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of TorontoChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of Toronto
 
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
 
Feeding and consuming data to support open notebook science via the chem spid...
Feeding and consuming data to support open notebook science via the chem spid...Feeding and consuming data to support open notebook science via the chem spid...
Feeding and consuming data to support open notebook science via the chem spid...
 
Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...
 
Building A Community Resource For The Life Sciences
Building A Community Resource For The Life SciencesBuilding A Community Resource For The Life Sciences
Building A Community Resource For The Life Sciences
 
ChemSpider Overview Presentation at Special Libraries Association
ChemSpider Overview Presentation at Special Libraries AssociationChemSpider Overview Presentation at Special Libraries Association
ChemSpider Overview Presentation at Special Libraries Association
 
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
 
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
 
RSC ChemSpider is the online chemistry database where community contributions...
RSC ChemSpider is the online chemistry database where community contributions...RSC ChemSpider is the online chemistry database where community contributions...
RSC ChemSpider is the online chemistry database where community contributions...
 
ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Comm...
ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Comm...ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Comm...
ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Comm...
 
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
 
Chemspider hosting linking and curating chemistry data for the community
Chemspider hosting linking and curating chemistry data for the communityChemspider hosting linking and curating chemistry data for the community
Chemspider hosting linking and curating chemistry data for the community
 
RSC ChemSpider – Building An Internet Based Community For Chemists
RSC ChemSpider – Building An Internet Based Community For ChemistsRSC ChemSpider – Building An Internet Based Community For Chemists
RSC ChemSpider – Building An Internet Based Community For Chemists
 
Online Public Compound Databases
Online Public Compound DatabasesOnline Public Compound Databases
Online Public Compound Databases
 

Último

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Connecting Chemistry Across the Internet Using ChemSpider

  • 1. Connecting Chemistry Across the Internet Using ChemSpider Antony J Williams and Valery Tkachenko SERMACS, November 15th 2012
  • 2. Chemistry Data and the Weeds
  • 3. Tell me about Roundup
  • 4. So what is Round Up?
  • 7. Where do we Round Up data?  Where can I find the molfile for Roundup?  Papers/Patents about Roundup?  What are the side effects of Roundup?  Where can I order Roundup?  What are the physicochemical properties?  Metabolic pathways?  Different synonyms of Roundup?  Synthesis of Roundup?  Side effects of Roundup?  Etc….
  • 8. Where do I Round Up Data?
  • 9.
  • 10. In an increasing LinkedData map….
  • 11. But I want to aggregate data? So…
  • 12. ChemSpider  Takes on the role of a structure centric hub:  Connecting, validating, qualifying data  Enhancing data with connections to services  Provides access to data and services for others to use (Thermo, Agilent, Bruker, Waters, ACD/Labs, Accelrys, etc.)  Uses available services to integrate, connect and enhance the offering
  • 14. What will ChemSpider give us??
  • 15. What will ChemSpider give us??
  • 16. What will ChemSpider give us??
  • 17. What will ChemSpider give us??
  • 18. What will ChemSpider give us??
  • 19. What will ChemSpider give us??
  • 21. What will ChemSpider give us??
  • 23. How did we build it?  We deal in Molfiles or SDF files – with coordinates  Deposit anything that has an InChI – we support what InChI can handle, good and bad  Standardization based on “InChI standardization”  InChIs aggregate (certain) tautomers  How much of ChemSpider is “on ChemSpider”?
  • 24. Connecting Chemistry across the web  So much of what is seen on ChemSpider is retrieved in real time using services
  • 27. A Comment on Quality  For >28 million chemical compounds there are some errors:  “Incorrect” structure representations  Mismatched name-structure relationships  Experimental properties (the values, the units)  Real vs. virtual compounds – text-mining and conversion  We have deprecated a LOT of data…
  • 28. Downsides of InChI  Good for small molecules – but no polymers, issues with inorganics, organometallics, imperfect stereochemistry. ChemSpider is “small molecules”  InChI used as the “deduplicator” – FIRST version of a compound into the database becomes THE structure to deduplicate against…
  • 29. Side Effects of InChI Usage
  • 31. Side Effects of InChI Usage
  • 33. Downsides of Overall Approach  Meshing data together based on InChIs worked for simple molecules  2D layout errors inherited or limited by algorithm  Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same
  • 34. So much data online is “erroneous”
  • 35. The confusion of name-structures
  • 36. Collapsing Data – Standardization
  • 37. What needs to happen?  If we could validate  Catch errors in databases (and clean)  Proactively catch errors in publications/patents  Reduce junk in the ether – improve QUALITY!  If we collectively standardized  Interlinking between databases should improve  CVSP – a separate presentation….stick around
  • 38. Crowdsourcing ChemSpider  ChemSpider is crowdsourced  Community deposition, annotation and curation  Anyone can “Leave Feedback”  Registered users can add data
  • 39. ChemSpider and Global Chemistry Hub Internet Data Small organic molecules Commercial Software Undefined materials Pre-competitive Data Organometallics Open Science Nanomaterials Open Data Polymers Publishers Minerals Educators Particle bound Open Databases Links to Biologicals Chemical Vendors
  • 40. Delivering a Prediction Platform  Experimental data will be used as the basis of model generation – a predictive platform…
  • 41. The Future of ChemSpider  Continued focus on quality over quantity – but more data is good too!  ChemSpider Reactions – work in progress and includes >300,000 reactions  Plugging in a validation and standardization platform  Delivering personal and institutional repository capabilities
  • 42. Thank you Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams