Towards a Gold Standard: Improving The
  Quality of Public Domain Chemistry
               Databases

           Antony J. Williams (1), Sean Ekins (2)


  (1) Royal Society of Chemistry, Wake Forest, NC 27587
  (2) Collaborations in Chemistry, Fuquay Varina, NC 27526
The future: crowdsourced drug discovery




Williams et al., Drug Discovery World, Winter 2009
Chemistry structures are proliferating
                 on the web
   Safety data
   Toxicity data
   Blogs and Wikis
   Property databases
   Experimental results
   Scientific publications
   Compound aggregators
   Open Notebook Science
   Metabolic pathway databases
   Encyclopedic articles (Wikipedia)

    Users take them at face value. They SHOULD NOT!!!

    Immense quantities of scientific information are contained in the
    thousands of databases

    Progress can, however, be inhibited by errors in these databases, which have
    downstream effects when the data are reused.
                                                  http://bit.ly/zWGaps
What is the Structure of Vitamin K1?
What Mechanisms Do We Have to Alert the Community?
   Email database owner and hope for a response
   Blog it
      Tony has been blogging about database quality for years and nobody
       was listening – other than the people at PubChem
      For some databases, when he blogged they listened and would edit!

   Tweet it

   Dec 2010 - We felt something had to be said definitively about structure
    quality
   Publish it – wrote to Science, Nature and then PLoS Computational Biology

    http://bit.ly/qtJF2f

                  Perhaps the phone?
April 27, 2011: then came
       The NPC Browser




               Science Translational Medicine 2011
But wait, hold on – did anyone peer review the
                          database??
Database released and within days ..
A quick analysis of structure quality revealed..
Hundreds of errors found in structures




                                                   Williams and Ekins,
                                                   DDT, 16: 747-750 (2011)
NPC Browser
http://tripod.nih.gov/npc/
Neomycin in NPC Browser
http://tripod.nih.gov/npc/
Neomycin In ChemSpider
How many contribute to
             clean-up?
   Fewer than a dozen contributors to data

   The majority are project members



   The crowd is small…
   This is the same for all cheminformatics crowd-based
    efforts
What Mechanisms Do We Have to Alert the Community?
                  Publishing is too slow


   Tony blogged on April 28th, one day after
    release: http://bit.ly/jn8wLC

   I blogged on April 29th, suggesting the need
    for a gold standard database: http://bit.ly/lXHInG

   After more extensive analysis we sent a
    manuscript to Science Translational
    Medicine - Rejected

   Drug Discovery Today: accepted, 8 months
    after we first pointed out the issue, which
    was even before the NPC Browser release
Responses from Community and NCGC

    Comments on initial blog
    NCGC added a disclaimer which I blogged about May 23rd
     http://bit.ly/m4Tx2b

    Sept 8th 2011: email from Tudor Oprea (cc’ed to 60 others).
     He has also been pointing out database errors for years.
    Followed by one from Chris Austin offering to meet us

    Several individuals thanked us for the alert
More Extensive Analysis and solutions


     More analysis of NPC browser errors

     “analysis of the NPC browser ‘HTS amenable compounds’ subset of
      data for 7600 compounds identified fundamental errors in
      stereochemistry, valency issues and charge imbalances in a few
      minutes work using a rudimentary software tool”

     Analysis of other chemistry databases and errors

     Other types of databases and errors

     Offered solutions
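
As an illustration of how little tooling a "charge imbalance" check needs, here is a minimal Python sketch. It is not the rudimentary software tool mentioned in the quote, which is not named in these slides. It assumes structures are available as SMILES strings and simply sums the formal charges written inside bracket atoms; the function names are hypothetical. A nonzero total is only a prompt for manual review, since a record may legitimately describe an isolated ion.

```python
import re

# Contents of one SMILES bracket atom, e.g. "Na+" from "[Na+]".
BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# A charge token inside a bracket atom: "+", "-", "++", "+2", "-3", ...
CHARGE = re.compile(r"([+-]+)(\d*)")


def net_charge(smiles: str) -> int:
    """Sum the formal charges written in the bracket atoms of a SMILES string."""
    total = 0
    for atom in BRACKET_ATOM.findall(smiles):
        match = CHARGE.search(atom)
        if match is None:
            continue  # bracket atom without a charge, e.g. [C@@H]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        # "+2" carries an explicit magnitude; "++" is counted sign by sign.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


def charge_imbalanced(smiles: str) -> bool:
    """Flag records whose written formal charges do not sum to zero."""
    return net_charge(smiles) != 0


if __name__ == "__main__":
    for smiles in ["CC(=O)[O-].[Na+]", "CC(=O)O.[Na+]", "[Ca+2].[Cl-]"]:
        print(smiles, net_charge(smiles), charge_imbalanced(smiles))
```

Sodium acetate written with both ions passes, while a sodium cation paired with neutral acetic acid, or calcium with a single chloride, is flagged.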

Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving
the Situation Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
Data Errors in the NPC Browser: Analysis of Steroids




       Substructure     # of Hits   # of Correct   No                Incomplete        Complete but incorrect
                                    Hits           stereochemistry   stereochemistry   stereochemistry

       Gonane           34          5              8                 21                0
       Gon-4-ene        55          12             3                 33                7
       Gon-1,4-diene    60          17             10                23                10
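
To put the steroid counts in proportion, the following Python sketch computes the share of correct hits per substructure. The numbers are transcribed from the table above; the percentages are derived here and do not appear in the original slides, and the function names are hypothetical.

```python
# Counts transcribed from the steroid table:
# (hits, correct, no stereo, incomplete stereo, complete but incorrect stereo)
STEROID_COUNTS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(hits: int, correct: int) -> float:
    """Percentage of hits with correct stereochemistry, to one decimal place."""
    return round(100.0 * correct / hits, 1)


def summarize(counts: dict) -> dict:
    """Map each substructure to its percentage of correct hits."""
    summary = {}
    for name, (hits, correct, *errors) in counts.items():
        # Sanity check: the four outcome columns should account for every hit.
        assert correct + sum(errors) == hits, name
        summary[name] = percent_correct(hits, correct)
    return summary


def overall_percent_correct(counts: dict) -> float:
    """Percentage of correct hits pooled over all substructure searches."""
    total_hits = sum(row[0] for row in counts.values())
    total_correct = sum(row[1] for row in counts.values())
    return percent_correct(total_hits, total_correct)


if __name__ == "__main__":
    print(summarize(STEROID_COUNTS))
    print(overall_percent_correct(STEROID_COUNTS))
```

On these counts, fewer than a third of the hits in any of the three searches carry correct stereochemistry.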




Why this matters to us and
   YOU the CROWD?
What You Might Not Know About
    Chemistry Databases On The Internet
   Data-sharing between open databases is cyclic
   This can proliferate errors in the “Linked Data”
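
A rough way to picture this cyclic proliferation is a directed graph in which an edge means "deposits data into". The Python sketch below uses an entirely hypothetical set of databases (the names are placeholders, not real resources) and a breadth-first search to list every database that an error in one source can reach, including the source itself when the sharing loops back.

```python
from collections import deque

# Hypothetical deposit graph: an edge X -> Y means "X deposits data into Y".
DEPOSITS = {
    "source_db": ["aggregator_a"],
    "aggregator_a": ["aggregator_b"],
    "aggregator_b": ["source_db", "wiki"],
    "wiki": [],
}


def reached_by(graph: dict, origin: str) -> set:
    """Return every database that receives data originating in `origin`."""
    seen = set()
    queue = deque(graph.get(origin, []))
    while queue:
        db = queue.popleft()
        if db in seen:
            continue
        seen.add(db)
        queue.extend(graph.get(db, []))
    return seen


def is_in_cycle(graph: dict, db: str) -> bool:
    """True if data deposited by `db` can flow back into `db` itself."""
    return db in reached_by(graph, db)


if __name__ == "__main__":
    print(sorted(reached_by(DEPOSITS, "source_db")))
    print(is_in_cycle(DEPOSITS, "source_db"), is_in_cycle(DEPOSITS, "wiki"))
```

When a source sits on a cycle, its own error can return to it looking like independent confirmation, which is one reason the original source of an error becomes hard to determine.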
Public Domain Databases
   Our databases are a mess…

   Non-curated databases are proliferating errors

   We source and deposit data between databases

   Original sources of errors hard to determine

   Curation is time-consuming and challenging
Molecule Data Quality Impacts
   in silico drug discovery
     vast ligand and protein–protein interaction databases
     develop computational models

     global mapping of pharmacological space

     drug-target networks of approved drugs

     prediction of off-target effects
Different types of
            databases and errors
   Bayer paper on target validation: 2/3 of papers did not live up to claims

   Errors in the MDL Drug Data Report (MDDR)

   Errors in clinical research databases vary from 2.3% to 26.9%

   Multicenter analysis by MS-based proteomics identified generic problems in
    databases when characterizing proteins: search engines could not distinguish
    different identifiers, and many algorithms calculated molecular weight incorrectly

   One database had between 2.1% and 13.6% of annotated Pfam hits unjustified

   Ligand–protein X-ray structures can also have errors with far-reaching
    consequences
Solutions
   Structure Validation and Standardization
   Curation
   Annotation
   Structure filters
        Incorrect valency, atom labels, aromatic bonds, stereochemistry, salts,
         duplication
   Structure standardization guidelines
        Provided by the FDA (Substance Registration System, Unique Ingredient
         Identifier (UNII)):
         http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm


   Need a record of molecule provenance
   Can we track databases and their quality? www.scidbs.com
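
One of the structure filters listed above, incorrect valency, can be sketched in a few lines of Python. This is a toy under stated assumptions: molecules are given as a hypothetical list of (element, formal charge, hydrogen count) atoms plus (i, j, bond order) bonds, and the valence table is deliberately tiny and ignores hypervalent elements. A production pipeline would rely on a cheminformatics toolkit rather than this hand-rolled check.

```python
# Simplified maximum valence for (element, formal charge).
# Deliberately small and incomplete; an assumption for illustration only.
MAX_VALENCE = {
    ("H", 0): 1,
    ("C", 0): 4,
    ("N", 0): 3,
    ("N", 1): 4,
    ("O", 0): 2,
    ("O", -1): 1,
}


def valence_problems(atoms, bonds):
    """Return (index, element, valence, limit) for atoms exceeding their limit.

    atoms: list of (element, formal_charge, hydrogen_count)
    bonds: list of (atom_index_i, atom_index_j, bond_order)
    """
    # Start from the attached hydrogens, then add each bond order to both ends.
    totals = [hydrogens for (_, _, hydrogens) in atoms]
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    problems = []
    for index, ((element, charge, _), total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get((element, charge))
        if limit is not None and total > limit:
            problems.append((index, element, total, limit))
    return problems


# Nitromethane drawn with a neutral, pentavalent nitrogen (a common depiction).
pentavalent = (
    [("C", 0, 3), ("N", 0, 0), ("O", 0, 0), ("O", 0, 0)],
    [(0, 1, 1), (1, 2, 2), (1, 3, 2)],
)
# The same molecule in charge-separated form.
charge_separated = (
    [("C", 0, 3), ("N", 1, 0), ("O", 0, 0), ("O", -1, 0)],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
)

if __name__ == "__main__":
    print(valence_problems(*pentavalent))
    print(valence_problems(*charge_separated))
```

The two nitro depictions show why filters and standardization guidelines belong together: the same compound passes or fails depending on how it was drawn.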
RSC Introduces “Validation Service”
Scidbs.com
   DB logo
   Type of DB
   Contact
   Owner
   Website
   License
   Curation etc
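
The fields on this slide suggest a simple per-database record. The Python sketch below is only a guess at what such a record might look like; the class, field names, and example values are hypothetical and are not the actual scidbs.com schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatabaseRecord:
    """Hypothetical metadata record mirroring the fields named on the slide."""
    name: str
    db_type: str
    contact: str
    owner: str
    website: str
    license: str
    curation: str


def missing_fields(record: DatabaseRecord) -> list:
    """List the fields left blank, in declaration order."""
    return [key for key, value in asdict(record).items() if not value.strip()]


if __name__ == "__main__":
    example = DatabaseRecord(
        name="Example Compound DB",       # placeholder, not a real database
        db_type="chemical structures",
        contact="curator@example.org",
        owner="Example Institute",
        website="https://example.org",
        license="",                        # unknown license left blank
        curation="manual, partial",
    )
    print(missing_fields(example))
```

Recording license and curation status explicitly makes gaps visible before a database is reused or merged into another.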
Data should be:
   Free from structure errors
   Free from data errors
   Free from experimental errors

   Are we asking too much? Is it even possible??

Yet when we alert others:
   When we raise our hands we are ignored
   Our scientific community needs to wake up
Today
   NPC Browser has fewer errors... so do ALL databases!
   More people aware of molecule quality online. Trust is
    earned not just granted!
   The future database user is more informed


                 Tomorrow
   Peer reviewers test the databases that are in manuscripts
   NIH checks databases before release!
   COLLABORATION between government DBs. PLEASE!!!
   We need minimal compound database standards
    (MCDS)
Acknowledgement

We thank the paper reviewers
and blog commenters
for their constructive comments

Chris Lipinski

This work was unfunded
(but was the right thing to do!)


www.scidbs.com

Circulatory Shock, types and stages, compensatory mechanismsMedicoseAcademics
 
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...Sheetaleventcompany
 

Último (20)

Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
 
Chennai ❣️ Call Girl 6378878445 Call Girls in Chennai Escort service book now
Chennai ❣️ Call Girl 6378878445 Call Girls in Chennai Escort service book nowChennai ❣️ Call Girl 6378878445 Call Girls in Chennai Escort service book now
Chennai ❣️ Call Girl 6378878445 Call Girls in Chennai Escort service book now
 
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
 
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
 
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
 
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
 
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
 
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
 
Race Course Road } Book Call Girls in Bangalore | Whatsapp No 6378878445 VIP ...
Race Course Road } Book Call Girls in Bangalore | Whatsapp No 6378878445 VIP ...Race Course Road } Book Call Girls in Bangalore | Whatsapp No 6378878445 VIP ...
Race Course Road } Book Call Girls in Bangalore | Whatsapp No 6378878445 VIP ...
 
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
 
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
 
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
 
Exclusive Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangal...
Exclusive Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangal...Exclusive Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangal...
Exclusive Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangal...
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
 
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
 
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
 
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
 
Circulatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanismsCirculatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanisms
 
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...
Low Cost Call Girls Bangalore {9179660964} ❤️VVIP NISHA Call Girls in Bangalo...
 

Acs towards a gold standard database

  • 1. Towards a Gold Standard: Improving The Quality of Public Domain Chemistry Databases. Antony J. Williams (1), Sean Ekins (2). (1) Royal Society of Chemistry, Wake Forest, NC 27587; (2) Collaborations in Chemistry, Fuquay Varina, NC 27526.
  • 2. The future: crowdsourced drug discovery Williams et al., Drug Discovery World, Winter 2009
  • 3. Chemistry structures are proliferating on the web
      - Safety data
      - Toxicity data
      - Blogs and Wikis
      - Property databases
      - Experimental results
      - Scientific publications
      - Compound aggregators
      - Open Notebook Science
      - Metabolic pathway databases
      - Encyclopedic articles (Wikipedia)
      Users take them at face value. They SHOULD NOT!!!
      Immense quantities of scientific information are contained in the thousands of databases.
      Progress can, however, be inhibited by errors in these databases, with downstream effects when the data is reused. http://bit.ly/zWGaps
  • 4. What is the Structure of Vitamin K1?
  • 5. What Mechanisms Do We Have to Alert the Community?
      - Email database owner and hope for a response
      - Blog it
          - Tony has been blogging about database quality for years and nobody was listening, other than the people at PubChem
          - For some databases, when he blogged they listened and would edit!
      - Tweet it
      - Dec 2010: We felt something had to be said definitively about structure quality
      - Publish it: wrote to Science, Nature and then PLoS Computational Biology http://bit.ly/qtJF2f
      - Perhaps the phone?
  • 6. April 27, 2011: Then came the NPC Browser. Science Translational Medicine 2011
  • 7. But wait, hold on: did anyone peer review the database?? Database released and within days... a quick analysis of structure quality revealed... 100s of errors found in structures. Williams and Ekins, DDT, 16: 747-750 (2011)
  • 9. Neomycin in NPC Browser http://tripod.nih.gov/npc/
  • 11. How many contribute to clean-up?
      - Less than a dozen contributors to data
      - The majority are project members
      - The crowd is small…
      - This is the same for all cheminformatics crowd-based efforts
  • 12. What Mechanisms Do We Have to Alert the Community? Publishing is too slow
      - Tony blogged April 28th, 1 day after release http://bit.ly/jn8wLC
      - I blogged April 29th http://bit.ly/lXHInG suggesting the need for a gold standard database
      - After more extensive analysis we sent a manuscript to Science Translational Medicine: rejected
      - Drug Discovery Today: accepted, 8 months after we pointed out the issue, even before the NPC Browser release
      Williams and Ekins, DDT, 16: 747-750 (2011)
  • 13. Responses from Community and NCGC
      - Comments on initial blog
      - NCGC added a disclaimer, which I blogged about May 23rd http://bit.ly/m4Tx2b
      - Sept 8th 2011: Email from Tudor Oprea (cc'ed to 60 others). He has also been pointing out database errors for years.
      - Followed by one from Chris Austin offering to meet us
      - Several individuals thanked us for the alert
  • 14. More Extensive Analysis and Solutions
      - More analysis of NPC browser errors
          - “analysis of the NPC browser ‘HTS amenable compounds’ subset of data for 7600 compounds identified fundamental errors in stereochemistry, valency issues and charge imbalances in a few minutes work using a rudimentary software tool”
      - Analysis of other chemistry databases and errors
      - Other types of databases and errors
      - Offered solutions
      Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
  • 15. Data Errors in the NPC Browser: Analysis of Steroids

      Substructure    # of Hits   # of Correct Hits   No stereochemistry   Incomplete stereochemistry   Complete but incorrect stereochemistry
      Gonane          34          5                   8                    21                           0
      Gon-4-ene       55          12                  3                    33                           7
      Gon-1,4-diene   60          17                  10                   23                           10

      Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
  • 16. Why this matters to us and YOU, the CROWD?
  • 17. What You Might Not Know About Chemistry Databases On The Internet
      - Data-sharing between open databases is cyclic
      - This can proliferate errors in the “Linked Data”
  • 18. Public Domain Databases
      - Our databases are a mess…
      - Non-curated databases are proliferating errors
      - We source and deposit data between databases
      - Original sources of errors hard to determine
      - Curation is time-consuming and challenging
  • 19. Molecule Data Quality Impacts
      - In silico drug discovery
      - Vast ligand and protein–protein interaction databases
      - Development of computational models
      - Global mapping of pharmacological space
      - Drug-target networks of approved drugs
      - Prediction of off-target effects
  • 20. Different types of databases and errors
      - Bayer paper on target validation: 2/3 of papers did not live up to claims
      - MDL Drug Data Report (MDDR): errors
      - Errors in clinical research databases vary from 2.3% to 26.9%
      - Multicenter analysis by MS-based proteomics identified generic problems in databases when characterizing proteins: search engines could not distinguish different identifiers, and many algorithms calculated molecular weight incorrectly
      - One database had between 2.1% and 13.6% of annotated Pfam hits unjustified
      - Ligand–protein X-ray structures: these can also have errors with far-reaching consequences
  • 21. Solutions
      - Structure validation and standardization
      - Curation
      - Annotation
      - Structure filters
          - Incorrect valency, atom labels, aromatic bonds, stereochemistry, salts, duplication
      - Structure standardization guidelines
          - Provided by the FDA (Substance Registration System Unique Ingredient Identifier (UNII): http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm)
      - Need a record of molecule provenance
      - Can we track databases and quality? www.scidbs.com
  • 23. Scidbs.com Default Body
  • 24. Scidbs.com DB logo Type of DB Contact Owner Default Body Website License Curation etc
  • 25. Data should be:
      - Free from structure errors
      - Free from data errors
      - Free from experimental errors
      - Are we asking too much? Is it even possible??
      Yet when we alert others:
      - When we raise our hands we are ignored
      - Our scientific community needs to wake up
  • 26. Today
      - NPC browser has fewer errors... so do ALL databases!
      - More people aware of molecule quality online. Trust is earned, not just granted!
      - The future database user is more informed
      Tomorrow
      - Peer reviewers test the databases that are in manuscripts
      - NIH checks databases before release!
      - COLLABORATION between government DBs. PLEASE!!!
      - We need minimal compound database standards (MCDS)
  • 27. Acknowledgement
      - We thank the paper reviewers and blog commenters for their constructive comments
      - Chris Lipinski
      - This work was unfunded (but was the right thing to do!)
      www.scidbs.com
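
The structure filters named on slide 21 (incorrect valency, atom labels, duplication) and the charge imbalances mentioned on slide 14 can be illustrated with a minimal sketch. This is not the tool the authors used. The record layout (`key`, `atoms`, `bonds`), the `MAX_VALENCE` table and both function names are assumptions made up for this example, and the valence check is deliberately crude: it only looks at neutral atoms and ignores implicit hydrogens and aromaticity.

```python
from collections import Counter

# Common maximum valences for neutral atoms. A simplifying assumption for
# this sketch; real toolkits use far more complete valence models.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def check_structure(record):
    """Return a list of issue strings for one hypothetical structure record.

    record["atoms"] is a list of (element, formal_charge) tuples.
    record["bonds"] is a list of (atom_index, atom_index, bond_order) tuples.
    """
    issues = []
    atoms = record["atoms"]

    # Sum the bond orders attached to each atom.
    bond_order_sum = Counter()
    for i, j, order in record["bonds"]:
        bond_order_sum[i] += order
        bond_order_sum[j] += order

    for idx, (element, charge) in enumerate(atoms):
        if element not in MAX_VALENCE:
            # Atom-label filter: element symbol not in the lookup table.
            issues.append(f"atom {idx}: unknown atom label {element!r}")
        elif charge == 0 and bond_order_sum[idx] > MAX_VALENCE[element]:
            # Valency filter: a neutral atom with too many bonds.
            issues.append(
                f"atom {idx}: {element} has valence {bond_order_sum[idx]}"
            )

    # Charge-balance filter: formal charges should sum to zero.
    net_charge = sum(charge for _, charge in atoms)
    if net_charge != 0:
        issues.append(f"net charge {net_charge:+d}")
    return issues


def find_duplicates(records):
    """Return the sorted identifier keys that occur more than once."""
    counts = Counter(r["key"] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


if __name__ == "__main__":
    methanol = {"key": "A", "atoms": [("C", 0), ("O", 0)], "bonds": [(0, 1, 1)]}
    # A carbon drawn with two double bonds and one single bond.
    bad_carbon = {
        "key": "B",
        "atoms": [("C", 0), ("O", 0), ("O", 0), ("O", 0)],
        "bonds": [(0, 1, 2), (0, 2, 2), (0, 3, 1)],
    }
    for rec in (methanol, bad_carbon):
        print(rec["key"], check_structure(rec))
```

In practice a curation pipeline would rely on an established cheminformatics toolkit and a canonical identifier for duplicate detection. The point here is only that simple, automated filters of this kind can flag suspect records in bulk before they are passed on to other databases.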