Great promise of navigating the
          internet using InChIs

                     Antony J Williams
                 ACS San Diego March 2012
Openness and Quality Issues
Williams and Ekins, DDT, 16: 747-750 (2011)

              Science Translational Medicine 2011
Warning…
 This talk is not about Quality…it’s about quantity




                  Drugbank was here
Data quality is a known issue
We ALL have issues!!!
It’s about what’s out there…
How to Link it…
And getting out of overwhelm…
So what is Yohimbine?
Of course it is out there…




      Drugbox: 3001/5080 with InChIs

      Chembox: 5436/7690 with InChIs
Tell me more…
   Where can I find the molfile for Yohimbine?
   Papers/Patents about Yohimbine?
   What are the side effects of Yohimbine?
   Where can I order Yohimbine?
   What are the physicochemical properties?
   Metabolic pathways?
   Different synonyms of Yohimbine?
   Synthesis of Yohimbine?
   Side effects of Yohimbine?
   Etc….
Quantity!
Yohimbine on ChemSpider… Quality?
How do we build it?
 We deal in Molfiles or SDF files – with coordinates

 Deposit anything that has an InChI – we support
  what InChI can handle, good and bad

 Standardization based on “InChI standardization”

 InChIs aggregate (certain) tautomers

 We link out to external sites using their IDs
Downsides of InChI
 InChI was a moving target (multiple versions) but
  overall worked as planned.

 Good for small molecules – but no polymers,
  issues with inorganics, organometallics, imperfect
  stereochemistry. ChemSpider is “small molecules”

 InChI used as the “deduplicator” – FIRST version
  of a compound into the database becomes THE
  structure to deduplicate against…
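
The "first version in wins" behaviour described above can be sketched in a few lines. This is only an illustration, not ChemSpider's actual code: the function name, the record layout and the source labels (vendor_A and so on) are hypothetical, and the InChI strings are those of ethanol and methanol.

```python
from typing import Dict, List, Tuple


def deduplicate_first_wins(records: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each InChI to the first source that deposited it.

    Later deposits with the same InChI are treated as duplicates and ignored,
    so whatever arrived first becomes the record of reference.
    """
    registry: Dict[str, str] = {}
    for inchi, source in records:
        # setdefault stores the value only when the key is not present yet
        registry.setdefault(inchi, source)
    return registry


# Hypothetical deposits: (InChI, source label)
deposits = [
    ("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "vendor_A"),   # ethanol
    ("InChI=1S/CH4O/c1-2/h2H,1H3", "vendor_B"),          # methanol
    ("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "curated_C"),  # ethanol again
]

registry = deduplicate_first_wins(deposits)
print(registry)
```

Note that the curated ethanol record loses to the earlier vendor record, which is exactly the downside the slide points out.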
Side Effects of InChI Usage
SMILES by comparison…
Side Effects of InChI Usage
Standardization Issues
Depiction based on molfile
Downsides of Overall Approach
 Meshing data together based on InChIs worked
  for simple molecules

 2D layout errors inherited or limited by algorithm

 Complex molecules that are meant to be the
  same thing were NOT deduplicated. Compounds
  differing by one stereocenter, named the same,
  meant to be the same, are not the same
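
One way to spot such pairs is to compare InChIs with their stereo layers removed. The sketch below is a simplification under stated assumptions: it treats any layer starting with b, t, m or s as stereo, ignores the finer points of isotopic and fixed-hydrogen layers, and uses strings that are intended to be the standard InChIs of L- and D-alanine (treat them as illustrative and verify against an authoritative source). The function names are hypothetical.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond, tetrahedral, and stereo-type layers


def strip_stereo_layers(inchi: str) -> str:
    """Return the InChI without its /b, /t, /m and /s layers."""
    parts = inchi.split("/")
    # parts[0] is the version prefix and parts[1] the formula; always keep both
    kept = parts[:2] + [p for p in parts[2:] if not p.startswith(STEREO_PREFIXES)]
    return "/".join(kept)


def differ_only_by_stereo(inchi_a: str, inchi_b: str) -> bool:
    """True when two different InChIs collapse to the same stereo-free string."""
    return inchi_a != inchi_b and strip_stereo_layers(inchi_a) == strip_stereo_layers(inchi_b)


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(strip_stereo_layers(l_alanine))
print(differ_only_by_stereo(l_alanine, d_alanine))
```

Pairs flagged this way are candidates for human review, not automatic merging, since many stereoisomers are genuinely different compounds.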
Yohimbine on ChemSpider… Quality?
So where can we travel???
InChI String Search via Google
Give me InChIKeys…
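
A minimal sketch of turning an InChIKey into a web search follows. It assumes the familiar 14-10-1 uppercase layout of an InChIKey and a plain Google query URL; the function name is hypothetical, and the example key is the well-known one for aspirin rather than a compound from this talk.

```python
import re
from urllib.parse import quote_plus

# 14 letters, hyphen, 10 letters, hyphen, 1 letter
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def inchikey_search_url(inchikey: str) -> str:
    """Build a search URL that looks for the InChIKey as an exact phrase."""
    if not INCHIKEY_PATTERN.fullmatch(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    # Wrapping the key in double quotes requests an exact-phrase match
    return "https://www.google.com/search?q=" + quote_plus(f'"{inchikey}"')


url = inchikey_search_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin
print(url)
```

The fixed length and restricted alphabet are what make the key search-engine friendly, whereas a full InChI string contains punctuation that search engines tend to split on.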
And where can we travel???
 ChemSpider

 BRENDA

 Wikipedia

 ChEMBL

 ChEBI

 DrugBank
 Aggregator

 Enzymes

 Encyclopedia

 Pharmacology

 Curated Chemicals

 Drug-Drug Target
Recognizing Compound Dilution
 So much chemistry on the web….

 And so much dilution – “structural uniqueness”
  versus “accidental ambiguity”

 InChI as an easy skeleton search
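
The skeleton search can be approximated with InChIKeys alone, because the first 14-character block is derived from the connectivity part of the InChI while stereochemistry and isotopes go into the second block. The sketch below groups keys by that first block. The function names are hypothetical, and the alanine keys are given from memory as illustrations, so check them before relying on them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def group_by_skeleton(inchikeys: Iterable[str]) -> Dict[str, List[str]]:
    """Group InChIKeys that share the same first block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


keys = [
    "QNAYBMKLOCPYGJ-REOHZAMASA-N",  # L-alanine (illustrative)
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine (illustrative)
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, no stereo specified
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # aspirin
]

groups = group_by_skeleton(keys)
for block, members in groups.items():
    print(block, len(members))
```

Searching the web for just the first block therefore retrieves every stereo variant of a skeleton, which is both the convenience and the source of the "dilution" described above.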
Vancomycin – Search the Internet
Vancomycin




Search Molecular   Search Full Molecule
  SKELETON
Full Skeleton Search
All aggregators suffer dilution!
Many Problems Can be Solved…
 Clean up databases – structure validation,
  structure standardization

 Warn about
   Valency, charge balance, depiction issues,
    bond types, absent stereo, and another 100
    rules (or so…)

 Standardize
   Agree community rules to “Standardize”
Structure Validation
Structure Validation - Fixed
What needs to happen?
 If we could validate
    Catch errors in databases (and clean)
    Proactively catch errors in publications/patents
    Reduce junk in the ether – improve QUALITY!

 If we standardized
    Interlinking should improve
NPC Browser Set
Download, Deposit, Reprocess
Substructure    # of Hits   # of Correct Hits   No stereochemistry   Incomplete stereochemistry   Complete but incorrect stereochemistry
Gonane          34          5                   8                    21                           0
Gon-4-ene       55          12                  3                    33                           7
Gon-1,4-diene   60          17                  10                   23                           10
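
To put the table in proportion, the share of correct hits per substructure can be worked out directly from the counts. The snippet below uses only the numbers in the table; the variable and function names are hypothetical. It also checks that the four categories add up to the hit count in each row.

```python
# (hits, correct, no stereo, incomplete stereo, complete but incorrect stereo)
rows = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def correct_fraction(row):
    """Fraction of hits whose stereochemistry is correct."""
    hits, correct, *_ = row
    return correct / hits


for name, row in rows.items():
    hits, *categories = row
    # Every hit falls into exactly one of the four categories
    assert sum(categories) == hits
    print(f"{name}: {correct_fraction(row):.1%} correct")

total_hits = sum(row[0] for row in rows.values())
total_correct = sum(row[1] for row in rows.values())
print(f"Overall: {total_correct}/{total_hits} = {total_correct / total_hits:.1%} correct")
```

This gives roughly 14.7%, 21.8% and 28.3% correct for the three substructures, and 34 of 149 hits (about 22.8%) overall.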
Structure-Name Validation
[Figure: structure drawings labeled Taxol, Choladine, Cholane and Chlotrimazole]
Standardize




 Use the FDA Substance Registration System (SRS) as a
  guidance document for standardization
 Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
Ammonium salts
Millions of structures? Lots of Issues
ChemSpider Standardization
 Entire ChemSpider database will be standardized
  using modified FDA rule set

 Original Molfiles will be standardized and all
  properties (predicted properties, SMILES, InChIs,
  Names) will all be regenerated

 Standardization procedures automatically applied
  to all future depositions
Identifier Dictionaries
 Reciprocal curation processes…share curation
  with each other.

 If a database already has a compound, use
  InChIKeys to match the “suggested” validation
  against it.

 A series of “added” and “removed” synonyms
  against InChIKeys for matching.
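
As a rough sketch of what such a shared feed could look like, the code below applies a list of "added" and "removed" synonym actions to a name dictionary keyed by InChIKey. The feed format, the function name and the example entries are assumptions made for illustration; the slides do not specify an exchange format.

```python
from typing import Dict, Iterable, Set, Tuple


def apply_synonym_feed(
    dictionary: Dict[str, Set[str]],
    feed: Iterable[Tuple[str, str, str]],
) -> Dict[str, Set[str]]:
    """Return a copy of the dictionary with the feed's curation actions applied.

    Each feed entry is (InChIKey, action, synonym), where action is
    "added" or "removed".
    """
    updated = {key: set(names) for key, names in dictionary.items()}
    for inchikey, action, synonym in feed:
        names = updated.setdefault(inchikey, set())
        if action == "added":
            names.add(synonym)
        elif action == "removed":
            names.discard(synonym)  # no error if the synonym was never there
        else:
            raise ValueError(f"unknown action: {action!r}")
    return updated


# Hypothetical local dictionary with one wrong synonym for aspirin
local = {"BSYNRYMUTXBXSQ-UHFFFAOYSA-N": {"aspirin", "salicylic acid"}}
feed = [
    ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "removed", "salicylic acid"),
    ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "added", "acetylsalicylic acid"),
]

curated = apply_synonym_feed(local, feed)
print(sorted(curated["BSYNRYMUTXBXSQ-UHFFFAOYSA-N"]))
```

Because the actions are keyed by InChIKey, a receiving database can treat them as suggestions for a compound it already holds and decide whether to accept each one.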
Proof of Concept Data Curation Sharing
Who wants to work with us?
Structure Validation using feed
 Look for approved synonyms

 Compare feed InChIKey with database InChIKey

 If different, flag for inspection
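
The three steps above can be sketched as a simple comparison. This assumes both the feed and the local database can be reduced to a mapping from an approved name to an InChIKey; the function name and the example data are hypothetical, with the local "ethanol" entry deliberately carrying methanol's key.

```python
from typing import Dict, List, Tuple


def flag_mismatches(
    feed: Dict[str, str],
    database: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (name, feed InChIKey, database InChIKey) for every disagreement.

    Names missing from the database are skipped, since there is nothing to compare.
    """
    flagged = []
    for name, feed_key in feed.items():
        db_key = database.get(name)
        if db_key is not None and db_key != feed_key:
            flagged.append((name, feed_key, db_key))
    return flagged


feed = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "ethanol": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
database = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "ethanol": "OKKJLVBELUTLKV-UHFFFAOYSA-N",  # actually methanol: a deliberate error
}

for name, feed_key, db_key in flag_mismatches(feed, database):
    print(f"inspect {name}: feed has {feed_key}, database has {db_key}")
```

A mismatch only says that the two sources disagree, not which one is right, which is why the slide asks for inspection rather than automatic correction.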
It is so difficult to navigate…
 IP?
 What’s the structure?
 Are they in our file?
 What’s similar?
 What’s the target?
 Pharmacology data?
 Known Pathways?
 Competitors?
 Working On Now?
 Connections to disease?
 Expressed in right cell type?
Open PHACTS Project
 Develop a set of robust standards…
 Implement the standards in a semantic integration hub
 Deliver services to support drug discovery programs in
  pharma and public domain
 22 partners, 8 pharmaceutical companies, 3 biotechs
 36-month project

  Guiding principle is open access, open usage, open source
                - Key to standards adoption -
Chemistry in Open PHACTS
 Selected data slices of ChemSpider carrying
  pharmacological links into the “linked data cache”

 ChemSpiderIDs and InChIs/InChIKeys will be in
  Open PHACTS and available for linking

 A structure ID standard to enable further linking
  across the semantic web of science
ChemSpider and InChI
                      Internet Data




 Small organic molecules              Commercial Software
 Undefined materials                  Pre-competitive Data
 Organometallics                            Open Science
 Nanomaterials                                 Open Data
 Polymers                                      Publishers
 Minerals                                      Educators
 Particle bound                           Open Databases
 Links to Biologicals                   Chemical Vendors
The great promise should be obvious
 InChIs are here to stay
 They will evolve, they will encompass, we will
  adopt and adapt
 Public and private databases will federate &
  build a linked environment of validated data!
 Data validation and standardization are
  needed
 Open Data will continue to proliferate
 InChIs are in the “Semantic Web” already
If InChI had never existed or went away…
 ChemSpider would never have been built

 Database linking would suffer dramatically

 The web would not be “structure searchable”

 Cheminformatics tools would likely not be linking
  to public domain databases in the same way

 And we would not have the pleasure of today…
Acknowledgments
 The inspiration of the InChI Masters – Steve H.,
  Steve S., Alan, Dmitrii, Igor

 IUPAC, NIST, all adopters, supporters,
  challengers and users

 The InChI Trust and its supporters for funding
  continued development

 Al Gore – enabling us to search InChIs on the web
Steve Heller
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
