Great promise of navigating the internet using InChIs

Antony J Williams
ACS San Diego March 2012
Openness and Quality Issues
Williams and Ekins, DDT, 16: 747-750 (2011)

              Science Translational Medicine 2011
Warning…
• This talk is not about Quality… it’s about quantity
                  Drugbank was here
Data quality is a known issue
We ALL have issues!!!
It’s about what’s out there…
How to Link it…
And getting out of overwhelm…
So what is Yohimbine?
Of course it is out there…




Drugbox: 3001/5080 with InChIs
Chembox: 5436/7690 with InChIs
Tell me more…
• Where can I find the molfile for Yohimbine?
• Papers/Patents about Yohimbine?
• What are the side effects of Yohimbine?
• Where can I order Yohimbine?
• What are the physicochemical properties?
• Metabolic pathways?
• Different synonyms of Yohimbine?
• Synthesis of Yohimbine?
• Etc.
Quantity!
Yohimbine on ChemSpider… Quality?
How do we build it?
• We deal in Molfiles or SDF files, with coordinates
• Deposit anything that has an InChI: we support what InChI can handle, good and bad
• Standardization based on “InChI standardization”
• InChIs aggregate (certain) tautomers
• We link out to external sites using their IDs
Downsides of InChI
• InChI was a moving target (multiple versions) but overall worked as planned.
• Good for small molecules, but no polymers, and issues with inorganics, organometallics and imperfect stereochemistry. ChemSpider is “small molecules”.
• InChI is used as the “deduplicator”: the FIRST version of a compound into the database becomes THE structure to deduplicate against…
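The "first one in wins" behaviour described above can be sketched in a few lines. This is only an illustration, not ChemSpider's actual code: the record layout, the source labels and the `deduplicate` helper are all assumptions, and the InChI strings are supplied directly rather than generated from molfiles.

```python
from typing import Dict, Iterable, Tuple

# (source, name, standard InChI); the source labels used below are made up.
Record = Tuple[str, str, str]


def deduplicate(records: Iterable[Record]) -> Dict[str, dict]:
    """Merge records sharing an InChI; the first record seen becomes the reference."""
    registry: Dict[str, dict] = {}
    for source, name, inchi in records:
        entry = registry.get(inchi)
        if entry is None:
            # First deposition of this InChI: it defines the reference entry.
            registry[inchi] = {"first_source": source, "names": [name]}
        elif name not in entry["names"]:
            # Later depositions only contribute their names to the existing entry.
            entry["names"].append(name)
    return registry


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

records = [
    ("vendor_a", "ethanol", ETHANOL),
    ("database_b", "ethyl alcohol", ETHANOL),
    ("database_b", "methanol", METHANOL),
]
registry = deduplicate(records)
print(len(registry))
print(registry[ETHANOL])
```

The consequence is the one the slide warns about: whatever quality the first deposition has, good or bad, is what every later record gets merged into.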
Side Effects of InChI Usage
SMILES by comparison…
Standardization Issues
Depiction based on molfile
Downsides of Overall Approach
• Meshing data together based on InChIs worked for simple molecules
• 2D layout errors inherited or limited by algorithm
• Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same and meant to be the same, are not the same
Yohimbine on ChemSpider… Quality?
So where can we travel???
InChI String Search via Google
Give me InChIKeys…
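A minimal sketch of turning an identifier into a web search link. The fixed-length InChIKey tends to survive search-engine tokenization better than a full InChI string with its slashes, commas and parentheses. The `google_search_url` helper is hypothetical, and aspirin's InChIKey is used purely as an example.

```python
from urllib.parse import quote_plus


def google_search_url(identifier: str) -> str:
    """Build an exact-phrase Google search URL for an InChI or InChIKey."""
    # Wrapping the identifier in double quotes asks for an exact match.
    return "https://www.google.com/search?q=" + quote_plus(f'"{identifier}"')


ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # example identifier, not from the talk
url = google_search_url(ASPIRIN_KEY)
print(url)
```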
And where can we travel???
• ChemSpider: aggregator
• BRENDA: enzymes
• Wikipedia: encyclopedia
• ChEMBL: pharmacology
• ChEBI: curated chemicals
• DrugBank: drugs and drug targets
Recognizing Compound Dilution
• So much chemistry on the web…
• And so much dilution: “structural uniqueness” versus “accidental ambiguity”
• InChI as an easy skeleton search
Vancomycin – Search the Internet
Vancomycin
Search Molecular SKELETON | Search Full Molecule
Full Skeleton Search
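One way to sketch the skeleton search: a standard InChIKey has a 14-character first block hashed from the connectivity, so records that share that block share a skeleton even when stereochemistry or isotopes differ. The helper names are assumptions, and the keys below are made-up placeholders with a valid shape, not real compounds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def skeleton_block(inchikey: str) -> str:
    """Return the 14-character connectivity block of a standard InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a 27-character InChIKey: {inchikey!r}")
    return parts[0]


def group_by_skeleton(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group full InChIKeys under their shared first block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


# Placeholder keys: the first two share a skeleton, the third does not.
keys = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
]
groups = group_by_skeleton(keys)
for block, members in groups.items():
    print(block, len(members))
```

A group with more than one member is where "dilution" shows up: several full structures on the web all claiming the same skeleton.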
All aggregators suffer dilution!
Many Problems Can be Solved…
• Clean up databases: structure validation, structure standardization
• Warn about
    ◦ Valency, charge balance, depiction issues, bond types, absent stereo, and another 100 rules (or so…)
• Standardize
    ◦ Agree on community rules to “Standardize”
Structure Validation
Structure Validation - Fixed
What needs to happen?
• If we could validate
    ◦ Catch errors in databases (and clean)
    ◦ Proactively catch errors in publications/patents
    ◦ Reduce junk in the ether: improve QUALITY!
• If we standardized
    ◦ Interlinking should improve
NPC Browser Set
Download, Deposit, Reprocess
Substructure    # of Hits   # of Correct Hits   No stereochemistry   Incomplete stereochemistry   Complete but incorrect stereochemistry
Gonane          34          5                   8                    21                           0
Gon-4-ene       55          12                  3                    33                           7
Gon-1,4-diene   60          17                  10                   23                           10
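A short worked check of the table, assuming the four stereochemistry columns partition the hits. It confirms that each row adds up and computes the share of fully correct hits; the variable and function names are illustrative only.

```python
from typing import Dict, Tuple

# substructure -> (hits, correct, no stereo, incomplete stereo, complete but incorrect)
rows: Dict[str, Tuple[int, int, int, int, int]] = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(table: Dict[str, Tuple[int, int, int, int, int]]) -> Dict[str, float]:
    """Return the percentage of correct hits per substructure, checking row totals."""
    result = {}
    for name, (hits, correct, none, incomplete, wrong) in table.items():
        if correct + none + incomplete + wrong != hits:
            raise ValueError(f"row {name!r} does not add up to its hit count")
        result[name] = round(100 * correct / hits, 1)
    return result


summary = percent_correct(rows)
for name, pct in summary.items():
    print(f"{name}: {pct}% correct")
```

On these numbers fewer than a third of the hits in any row carry fully correct stereochemistry.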
Structure-Name Validation
[Figure: drawn structures labelled Taxol, Choladine, Cholane and Chlotrimazole]
Standardize
• Use the FDA Substance Registration System (SRS) as a guidance document for standardization
• Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
Ammonium salts
Millions of structures? Lots of Issues
ChemSpider Standardization
• Entire ChemSpider database will be standardized using a modified FDA rule set
• Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, names) will be regenerated
• Standardization procedures will be applied automatically to all future depositions
Identifier Dictionaries
• Reciprocal curation processes… share curation with each other.
• If a database already has a compound, use InChIKeys to match “suggested” validation against the compound.
• A series of “added” and “removed” synonyms against InChIKeys for matching.
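A rough sketch of how a shared curation feed of “added” and “removed” synonyms might be applied to a local dictionary keyed by InChIKey. The feed layout and the `apply_curation` helper are assumptions for illustration; no such format is specified in the talk.

```python
from typing import Dict, Iterable, List, Set, Tuple

FeedItem = Tuple[str, str, str]  # (InChIKey, "added" or "removed", synonym)


def apply_curation(synonyms: Dict[str, Set[str]], feed: Iterable[FeedItem]) -> List[FeedItem]:
    """Apply curated synonym changes in place; return items whose InChIKey is unknown locally."""
    unmatched: List[FeedItem] = []
    for key, action, name in feed:
        if key not in synonyms:
            unmatched.append((key, action, name))  # compound not in this database
        elif action == "added":
            synonyms[key].add(name)
        elif action == "removed":
            synonyms[key].discard(name)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return unmatched


ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
local = {ASPIRIN_KEY: {"aspirin", "asprin"}}
feed = [
    (ASPIRIN_KEY, "added", "acetylsalicylic acid"),
    (ASPIRIN_KEY, "removed", "asprin"),
    ("ZZZZZZZZZZZZZZ-UHFFFAOYSA-N", "added", "placeholder name"),  # made-up key
]
unmatched = apply_curation(local, feed)
print(sorted(local[ASPIRIN_KEY]))
print(len(unmatched))
```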
Proof of Concept Data Curation Sharing
Who wants to work with us?
Structure Validation using feed
• Look for approved synonyms
• Compare feed InChIKey with database InChIKey
• If different, flag for inspection
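The three steps above can be sketched as follows. This is an assumed data model, not the real feed: the feed is taken to be (approved synonym, InChIKey) pairs and the database a name-to-InChIKey lookup, and the mismatching key is a made-up placeholder.

```python
from typing import Dict, Iterable, List, Tuple


def flag_mismatches(
    feed: Iterable[Tuple[str, str]], database: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (name, feed key, database key) for names whose InChIKeys disagree."""
    flagged = []
    for name, feed_key in feed:
        db_key = database.get(name.lower())
        if db_key is None:
            continue  # approved synonym not present locally, nothing to compare
        if db_key != feed_key:
            flagged.append((name, feed_key, db_key))  # needs human inspection
    return flagged


database = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "compound x": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # placeholder key
}
feed = [
    ("Aspirin", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("Compound X", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # differs from the database
    ("Compound Y", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),  # unknown locally
]
flagged = flag_mismatches(feed, database)
print(flagged)
```

The sketch only flags; deciding which side is right is left to a curator, in line with "flag for inspection".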
It is so difficult to navigate…
• IP?
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Competitors?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hub
• Deliver services to support drug discovery programs in pharma and public domain
• 22 partners, 8 pharmaceutical companies, 3 biotechs
• 36-month project

Guiding principle is open access, open usage, open source: key to standards adoption
Chemistry in Open PHACTS
• Selected data slices of ChemSpider carrying pharmacological links into the “linked data cache”
• ChemSpiderIDs and InChIs/InChIKeys will be in Open PHACTS and available for linking
• A structure ID standard to enable further linking across the semantic web of science
ChemSpider and InChI
Internet Data
Small organic molecules      Commercial Software
Undefined materials          Pre-competitive Data
Organometallics              Open Science
Nanomaterials                Open Data
Polymers                     Publishers
Minerals                     Educators
Particle bound               Open Databases
Links to Biologicals         Chemical Vendors
The great promise should be obvious
• InChIs are here to stay
• They will evolve, they will encompass, we will adopt and adapt
• Public and private databases will federate & build a linked environment of validated data!
• Data validation and standardization are needed
• Open Data will continue to proliferate
• InChIs are in the “Semantic Web” already
If InChI had never existed or went away…
• ChemSpider would never have been built
• Database linking would suffer dramatically
• The web would not be “structure searchable”
• Cheminformatics tools would likely not be linking to public domain databases in the same way
• And we would not have the pleasure of today…
Acknowledgments
• The inspiration of the InChI Masters: Steve H., Steve S., Alan, Dmitrii, Igor
• IUPAC, NIST, all adopters, supporters, challengers and users
• The InChI Trust and its supporters for funding continued development
• Al Gore, for enabling us to search InChIs on the web
Steve Heller
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
