Crowd-Sourcing to Build a Structure
  Centric Community for Chemists

                           Antony Williams
                   Whitney Symposium 2008 - Networks
Social Networking for Chemists




 Building a Structure Centric Community for Chemists
Network Drug Discovery Tools
             www.curehunter.com




Beware the Networks!




Collaborative Authoring in Academia
   Group level collaboration via Wikis




Collaborative Authoring for Drug Discovery

   Pfizerpedia




Collaborative Knowledge Management
for Chemists – Wikipedia, Built by a Network




and biologists…WikiProteins




WikiProteins




                                                         What
                                                           Is
                                                        Tegafur?




Commonly Lacking…

   Approaches generally lack “structural intelligence”
       Structures have properties (Mw, MF, exp. & pred. properties)
       Collections of structures need to be searchable by structure
       Most data collections are “self-contained” and rarely
        connect to other resources via “structure”




A Search Engine for Chemists

   Questions a chemist might ask…
       What is the melting point of n-butanol?
       What is the chemical structure of Xanax?
       Chemically, what is Viagra?
       What are the stereocenters of cholesterol?
       Where can I find publications about Taxol?
       What are the different trade names for Ketoconazole?
       What is the NMR spectrum of Aspirin?
       What are the safety handling issues for Thymol Blue?

       ChemSpider can answer all of these questions

ChemSpider Data Content

   Over 20 million unique chemical structures:
       Online Databases – PubChem, DrugBank, HMDB, Wikipedia
       Chemical Vendors – over 40 different vendors and growing
       Personal Depositions – individual contributions
       Journal Publishers
       Content database vendors
       Analytical data collections
       Patents (9 MILLION Structures to search patents)
       Web scraping

    Content is linked back to the original data sources
A Structure Centric Community for Chemists

   A FREE ACCESS platform for deposition,
    management, curation, annotation and extension of
    information associated with chemical structures
   Semantically connect to other sites providing access to
    knowledge, data and information of determined quality
   Search by alphanumeric text, chemical structure and
    substructure and combination searches
   Predict properties for submitted structures



Tell me about Aspirin




Tell me about Aspirin




Links out to KEGG
Kyoto Encyclopedia of Genes and Genomes




Tell me about Aspirin




Tell me about Aspirin




Tell me about Aspirin




Tell me about Aspirin




Abstract Compounds?

   Is there any information about “Quesnoin”?




   Type in the name (and there may be many) or other
    identifier
   Paste a chemical structure
   Draw the structure


Example Search




Example Search




Example Search 2

   What compounds have a mass of 300+/-0.001?




   or search a combination of intrinsic/predicted properties
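
A mass-window query like the one above can be sketched in a few lines. This is only an illustration, not ChemSpider's implementation: the `Compound` record, the `search_by_mass` helper and the library entries (with made-up masses) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record: a name plus a monoisotopic mass in Da."""
    name: str
    monoisotopic_mass: float


def search_by_mass(compounds, target, tolerance):
    """Return compounds whose mass lies within target +/- tolerance (inclusive)."""
    return [c for c in compounds
            if abs(c.monoisotopic_mass - target) <= tolerance]


# Made-up entries, for illustration only.
LIBRARY = [
    Compound("compound A", 300.0004),
    Compound("compound B", 300.0150),
    Compound("compound C", 299.9995),
    Compound("compound D", 180.0423),
]

hits = search_by_mass(LIBRARY, target=300.0, tolerance=0.001)
print([c.name for c in hits])
```

A production system would presumably run this as an indexed range query in the database rather than a linear scan; the scan is used here only to keep the sketch self-contained.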

Example Search 2




Complex Search




Search Open Access Journals – ChemSpider




Search PubMed – ChemSpider




The Quality of Data Online…
   Aggregating data opens up quality issues
   Structure-identifier associations are “dirty”
   Structures are COMMONLY incorrect – stereochem issues
   Manual curation of small databases is enough work – what
    about millions of structures?
   Structures are far from perfect. What is a “correct structure”?
       Full stereochemistry?
       Historical timeline of structure?
       Who is the authority?



Who holds THE Quality Authority?

   Chemical Abstracts Service is the structural authority
    today. 1400 (?) employees, world standard in chemistry
    information
   101 years of knowledge, process and expertise.
    MANUAL curation is key. Robotic curation is enabling
   How can an online, free access system peacefully
    coexist with the authority?




Quality is a Major Issue – Search Butanol




Crowd-sourcing Database Compilation




Wikipedia – Crowdsourcing Chemistry




Wikipedia Chemistry Curation project

   Only ca. 5000 organic structures, 7000 total structures
   MONTHS of work so far for a team of 6 people
   Many errors removed in the process. Curation process
    is a daily event for users/depositors
   Slow and torturous process for stereo molecules.




Thymol Blue on ChemSpider

   Data online includes:
       UV-vis spectrum
       Measured experimental properties
       Link to Wikipedia article
       Links to chromatography details
       Multiple identifiers/trade names etc.
       Links to vendors/suppliers/other databases
       Safety information




Differences between ChemSpider/Wikipedia

ChemSpider                                                                                   | Wikipedia
---------------------------------------------------------------------------------------------|------------------------------
>20 million unique structures                                                                | ~5000 organics, 2000 others
Complex queries – Properties, Text, structure/substructure, OA publishers, Data Sources, …   | Text
Prediction of properties                                                                     | No
Analytical Data                                                                              | No
Active depositors/curators – 30                                                              | Active editors – about 50 (?)
5000 people/day; 1100 registered                                                             | ????
Compound monographs linked                                                                   | Detailed compound monographs

Differences between Wikipedia/ChemSpider

Wikipedia                                                  | ChemSpider
-----------------------------------------------------------|-----------------------------------------------------------
Supported by tried and tested MediaWiki platform           | Primarily Microsoft .NET technologies with OS components
Established infrastructure and Wikimedia Foundation Team   | “Out of a basement” on three servers and 5 volunteers
Chemistry is a subset of the ‘Pedia                        | Chemistry is the focus of ‘Spider
GFDL licensing for everything                              | Mixed “licensing”
Strong team of WP:Chem advocates, curators and admins      | Growing team of WP:Chem advocates, curators and admins
Worldwide reputation as quality source                     | Growing reputation as focused on quality
Crowd-sourcing Curation

   How to curate data for millions of structures?
   Robot processes can clean up depositions
       Search for Chloride and check molecular formula for Cl
       Check for stereochemistry and remove names with stereo
   Provide a simple-to-use platform to curate, annotate
    and tag data
   Provide curator administration to prevent vandalism
    (Veropedia)
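
The two robot checks named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChemSpider curation robot: `flag_suspect_names`, the `HALIDE_WORDS` table (the slide only mentions chloride; bromide is added for illustration) and the stereo-descriptor pattern are all hypothetical, and the pattern covers only simple prefixes such as (R)-, (2S,3R)-, (E)-, cis- and trans-.

```python
import re

# Name fragment -> element symbol expected in the molecular formula.
# Only "chloride" comes from the slide; "bromide" is an illustrative extra.
HALIDE_WORDS = {"chloride": "Cl", "bromide": "Br"}

# Simple stereo descriptors: (R)-, (S)-, (2S,3R)-, (E)-, (Z)-, cis-, trans-
STEREO_DESCRIPTOR = re.compile(
    r"\((?:\d*[RSEZ])(?:,\d*[RSEZ])*\)-|\b(?:cis|trans)-"
)


def elements_in(formula):
    """Return the set of element symbols in a formula such as 'C4H9Cl'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_suspect_names(names, formula, has_defined_stereo):
    """Return (name, reason) pairs for names that disagree with the structure."""
    elements = elements_in(formula)
    flagged = []
    for name in names:
        lowered = name.lower()
        for word, symbol in HALIDE_WORDS.items():
            if word in lowered and symbol not in elements:
                flagged.append((name, f"mentions {word} but formula lacks {symbol}"))
        if not has_defined_stereo and STEREO_DESCRIPTOR.search(name):
            flagged.append((name, "stereo descriptor on a structure without stereo"))
    return flagged


flags = flag_suspect_names(
    ["2-butanol", "(R)-2-butanol", "butyl chloride"],
    formula="C4H10O",
    has_defined_stereo=False,
)
for name, reason in flags:
    print(f"{name}: {reason}")
```

Flagging rather than deleting keeps a human curator in the loop, which fits the multi-level approval model described in this talk.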



Multi-level Curation and Approval




Post Comments
   Anyone can “Post Comments” associated with a
    structure. To curate data we require login to track




Crowd-sourcing Chemistry

   Crowd-sourced curation: identify and tag errors, edit
    names, synonyms, identify records for deprecation

   ALSO

   Crowd-sourced deposition: anyone can deposit data
    (structures, text, images, analytical data)




But, when registered and logged in…

   Ability to curate and add to the database
       Add structures
       “Clean” structures
       Add data (spectra, CIFs, images)
       Add links to other pages (URLs)
       Add publication details




Adding to the Database - Structure




Adding New Text Data


Add Publication                                                Add URL




                                                          Add Identifier




Adding Supplementary Info to a Structure




Can ChemSpider Enable Discovery?
   Yes, chemists can search by text, structure, substructure or
    properties to look at relationships and probe drug discovery




ChemSpider – Research in Progress

   Supporting Open Notebook Science as a repository –
    JC Bradley at Drexel University
   For the purpose of online virtual screening
   Applying descriptors of various types to filter a
    database of 20 million compounds

   In progress:
       Utilizing SimBioSys’ LASSO Descriptor
       Collaboration based on NISS’ ChemModLab


LASSO
Ligand Activity by Surface Similarity Order




LASSO Descriptors on ChemSpider
     SEMANTIC WEB in action




LASSO Searching Method 1

   Ask the question “What are the top 1000 molecules
    with similar LASSO descriptors to the actives for the
    Estrogen Receptor?”
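
A "top N most similar" query of this kind can be sketched generically. The sketch below does not reproduce the LASSO descriptor or SimBioSys' scoring: it assumes each molecule is represented by a hypothetical fixed-length numeric vector and ranks molecules by Euclidean distance to the nearest known active. The names `top_n_similar`, `database` and `actives` are illustrative.

```python
import heapq
import math


def top_n_similar(database, actives, n):
    """Return the names of the n molecules closest to any known active.

    database: dict mapping molecule name -> descriptor vector (tuple of floats)
    actives:  list of descriptor vectors for known active compounds
    """
    def distance_to_nearest_active(item):
        _, descriptor = item
        return min(math.dist(descriptor, active) for active in actives)

    closest = heapq.nsmallest(n, database.items(), key=distance_to_nearest_active)
    return [name for name, _ in closest]


# Made-up three-component descriptors, for illustration only.
actives = [(1.0, 0.0, 2.0)]
database = {
    "mol-1": (1.0, 0.0, 2.0),
    "mol-2": (5.0, 5.0, 5.0),
    "mol-3": (1.0, 1.0, 2.0),
    "mol-4": (0.0, 0.0, 0.0),
}

print(top_n_similar(database, actives, n=2))
```

`heapq.nsmallest` avoids sorting the whole collection, which matters when N is 1000 and the database holds millions of entries.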




It WORKS - Enrichment Plot




   60% of the actives were recovered in the top 1% of the database.
   “Environmental binders” are weak binders
   The top ranked compounds may well be active ER binders
   Likely candidates for experimental investigation
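
The recovery figure quoted above is the kind of number read off an enrichment plot. A minimal sketch of the calculation, using a made-up ranked list rather than the talk's data (the function name and compound identifiers are hypothetical):

```python
def recovery_and_enrichment(ranked_ids, active_ids, top_percent):
    """Compute recovery and enrichment factor for the top slice of a ranking.

    ranked_ids:  compound ids ordered from best to worst score
    active_ids:  set of ids known to be active
    top_percent: integer percentage of the ranked list to inspect
    """
    cutoff = max(1, len(ranked_ids) * top_percent // 100)
    found = sum(1 for cid in ranked_ids[:cutoff] if cid in active_ids)
    # Fraction of all actives that appear in the top slice.
    recovery = found / len(active_ids)
    # Hit rate in the top slice divided by the hit rate in the whole list.
    enrichment = (found * len(ranked_ids)) / (cutoff * len(active_ids))
    return recovery, enrichment


# Made-up example: 1000 ranked compounds, 5 actives, 3 of them in the top 10.
ranked = [f"cmpd-{i}" for i in range(1000)]
known_actives = {"cmpd-0", "cmpd-3", "cmpd-7", "cmpd-250", "cmpd-900"}

recovery, enrichment = recovery_and_enrichment(ranked, known_actives, top_percent=1)
print(f"recovery in top 1%: {recovery:.0%}, enrichment factor: {enrichment:.0f}")
```

In this toy case three of five actives fall in the top 1%, which gives the same 60% recovery as the slide; a random ordering would be expected to recover about 1%.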

Tipping Point

   Tipping point - the point at
    which a slow gradual change
    becomes irreversible and then
    proceeds with gathering pace




ChemSpider Forums/Blogs

   Forum.chemspider.com
   www.chemspider.com/blog




ChemSpider TouchGraph




What would we most like to do?

   Enable “Collaborative Science”. What would that look
    like?

   Access to chemical supplies when people need them
   Awareness of available literature, patents, databases of
    curated content – whether Open Access or not.
    Transaction fees (or not) are between user and provider
   Host Open Notebook Science exchanges



“ChemSpider Inside”
   Instrument vendors integrated ChemSpider into their
    metabolism ID project – ChemSpider linked to all Mass
    Spec Instruments doing Metabolite ID?

   Wikipedia roundtrip linking to ChemSpider
   Google indexing ChemSpider at “fixed rate”
   Integration to desktop drawing packages
   Members of Microsoft BioIT Alliance
   Discussions on Taverna’s Workflow Sourceforge group
   Hosting Open Access articles shortly…
Where to from here? Short term

   Integrated text and structure/substructure searching of the
    Open Access literature is in development
   Web-based scraping of structure-based information –
    examples in place
   Enhanced web services layer to integrate searches
   Deposit updated Patent Database (9 million structures)
   Reaction handling and deposition




Where to from here? Mid-term

   Spidering for Chemistry – extract data from articles,
    webpages and data sources AND stay within copyright
   WiChempedia project – wiki-layers on top of
    ChemSpider, alongside Wikipedia curation project.
   Deeper integration to text-based searching and
    conversion of chemical names to structures for online
    structure searching:
       Improved integration with NCBI Entrez system
       Deliver “dedicated websites” for specific publishers



Where to from here? Mid-Term

   An extensible datamodel “on the fly” allows us to
    easily expand to integrate abstract data to structures
   Data mine and curate “parameters” – physicochemical
    and physiological parameters to enable QSAR
    analysis, data modeling and provision of models
    online (UNC-Chapel Hill, NISS)




Our Challenges

   There are “no employees”
   ChemSpider is non-funded
   System is hyper-dependent
    on ISP, power and limited
    compute power
   We are upsetting a lot of
    people – evangelists,
    cheminformatics system
    vendors, publishers, data
    content providers


Acknowledgments

   The ChemSpider team of volunteer developers
   ChemSpider Advisory Group
   Our curators, depositors and users
   Suppliers of commercial software – Microsoft,
    ACD/Labs, OpenEye, ChemAxon, SimBioSys
   SureChem – Structure Based Online Patent Searching




Further reading

   www.chemspider.com/blog
   Internet-based tools for communication and
    collaboration in chemistry, Drug Discovery Today,
    Volume 13, Numbers 11/12, June 2008, 502-506,
    doi:10.1016/j.drudis.2008.03.015
   A perspective of publicly accessible/open-access
    chemistry databases, Drug Discovery Today, Volume
    13, Numbers 11/12, June 2008, 495-501,
    doi:10.1016/j.drudis.2008.03.017



Whitney Symposium Lecturejune 2008 1220331644496491 9

  • 1. Crowd-Sourcing to Build a Structure Centric Community for Chemists Antony Williams Whitney Symposium 2008 - Networks
  • 2. Social Networking for Chemists Building a Structure Centric Community for Chemists
  • 3. Network Drug Discovery Tools www.curehunter.com Building a Structure Centric Community for Chemists
  • 4. Beware the Networks! Building a Structure Centric Community for Chemists
  • 5. Collaborative Authoring in Academia  Group level collaboration via Wikis Building a Structure Centric Community for Chemists
  • 6. Collaborative Authoring for Drug Discovery  Pfizerpedia Building a Structure Centric Community for Chemists
  • 7. Collaborative Knowledge Management for Chemists – Wikipedia, Built by a Network Building a Structure Centric Community for Chemists
  • 8. and biologists…WikiProteins Building a Structure Centric Community for Chemists
  • 9. WikiProteins What Is Tegafur? Building a Structure Centric Community for Chemists
  • 10. Commonly Lacking…  Approaches generally lack “structural intelligence”  Structures have properties (Mw, MF, exp. & pred. properties)  Collections of structures need to be searchable by structure  Most data collections are “self-contained” and rarely connecting to other resources via “structure” Building a Structure Centric Community for Chemists
  • 11. A Search Engine for Chemists  Questions a chemist might ask…  What is the melting point of n-butanol?  What is the chemical structure of Xanax?  Chemically, what is viagra?  What are the stereocenters of cholesterol?  Where can I find publications about Taxol?  What are the different trade names for Ketoconazole?  What is the NMR spectrum of Aspirin?  What are the safety handling issues for Thymol Blue?  ChemSpider can answer all of these questions Building a Structure Centric Community for Chemists
  • 12. ChemSpider Data Content  Over 20 million unique chemical structures :  Online Databases –PubChem, Drugbank, HMDB, Wikipedia  Chemical Vendors – over 40 different vendors and growing  Personal Depositions – individual contributions  Journal Publishers  Content database vendors  Analytical data collections  Patents (9 MILLION Structures to search patents)  Web scraping Content is linked back to the original data sources Building a Structure Centric Community for Chemists
  • 13. A Structure Centric Community for Chemists  A FREE ACCESS platform for deposition, management, curation, annotation and extension of information associated with chemical structures  Semantically connect to other sites providing access to knowledge, data and information of determined quality  Search by alphanumeric text, chemical structure and substructure and combination searches  Predict properties for submitted structures Building a Structure Centric Community for Chemists
  • 14. Tell me about Aspirin Building a Structure Centric Community for Chemists
  • 15. Tell me about Aspirin Building a Structure Centric Community for Chemists
  • 16. Links out to KEGG Kyoto Encyclopedia of Genes and Genomes Building a Structure Centric Community for Chemists
  • 17. Tell me about Aspirin Building a Structure Centric Community for Chemists
  • 18. Tell me About Aspirin Building a Structure Centric Community for Chemists
  • 19. Tell me about Aspirin Building a Structure Centric Community for Chemists
  • 20. Tell me about Aspirin Building a Structure Centric Community for Chemists
  • 21. Abstract Compounds?  Is there any information about “Quesnoin”?  Type in the name (and there may be many) or other identifier  Paste a chemical structure  Draw the structure Building a Structure Centric Community for Chemists
  • 22. Example Search Building a Structure Centric Community for Chemists
  • 23. Example Search Building a Structure Centric Community for Chemists
  • 24. Example Search 2  What compounds have a mass of 300+/-0.001?  or search a combination of intrinsic/predicted properties Building a Structure Centric Community for Chemists
  • 25. Example Search 2 Building a Structure Centric Community for Chemists
  • 26. Complex Search Building a Structure Centric Community for Chemists
  • 27. Search Open Access Journals – ChemSpider Building a Structure Centric Community for Chemists
  • 28. Search PubMed – ChemSpider Building a Structure Centric Community for Chemists
  • 29. The Quality of Data Online…  Aggregating data opens up quality issues  Structure-identifier associations are “dirty”  Structures are COMMONLY incorrect – stereochem issues  Manual curation of small databases is enough work – what about millions of structures?  Structures are far from perfect. What is a “correct structure”?  Full stereochemistry?  Historical timeline of structure?  Who is the authority? Building a Structure Centric Community for Chemists
  • 30. Who holds THE Quality Authority?  Chemical Abstracts Service is the structural authority today. 1400 (?) employees, world standard in chemistry information  101 years of knowledge, process and expertise. MANUAL curation is key. Robotic curation is enabling  How can an online, free access system peacefully co- exist with the authority? Building a Structure Centric Community for Chemists
  • 31. Quality is a Major Issue- Search Butanol Building a Structure Centric Community for Chemists
  • 32. Crowd-sourcing Database Compilation Building a Structure Centric Community for Chemists
  • 33. Wikipedia – Crowdsourcing Chemistry Building a Structure Centric Community for Chemists
  • 34. Wikipedia Chemistry Curation project  Only ca. 5000 organic structures, 7000 total structures  MONTHS of work so far for a team of 6 people  Many errors removed in the process. Curation process is a daily event for users/depositors  Slow and torturous process for stereo molecules. Building a Structure Centric Community for Chemists
  • 35. Thymol Blue on ChemSpider  Data online includes:  UV-vis spectrum  Measured experimental properties  Link to Wikipedia article  Links to chromatography details  Multiple identifiers/trade names etc.  Links to vendors/suppliers/other databases  Safety information Building a Structure Centric Community for Chemists
  • 36. Differences between ChemSpider/Wikipedia ChemSpider Wikipedia >20 million unique structures ~5000 organics, 2000 others Complex queries – Properties, Text Text, structure/substructure, OA publishers, Data Sources, … Prediction of properties No Analytical Data No Active depositors/curators – 30 Active editors – about 50 (?) 5000 people/day; 1100 registered ???? Compound monographs linked Detailed compound monographs Building a Structure Centric Community for Chemists
  • 37. Differences between Wikipedia/ChemSpider Wikipedia ChemSpider Supported by tried and tested Primarily Microsoft .NET Media-Wiki platform. technologies with OS components Established infrastructure and “Out of a basement” on three Wikipedia Foundation Team servers and 5 volunteers Chemistry is a subset of the ‘Pedia Chemistry is the focus of ‘Spider GFL licensing for everything Mixed “licensing” Strong team of WP:Chem Growing team of WP:Chem advocates, curators and admins advocates, curators and admins Worldwide reputation as quality Growing reputation as focused on source quality Building a Structure Centric Community for Chemists
  • 38. Crowd-sourcing Curation  How to curate data for millions of structures?  Robot processes can clean up depositions  Search for Chloride and check molecular formula for Cl  Check for stereochemistry and remove names with stereo  Provide a simple-to-use platform to curate, annotate and tag data  Provide curator administration to prevent vandalism (Veropedia) Building a Structure Centric Community for Chemists
  • 39. Multi-level Curation and Approval Building a Structure Centric Community for Chemists
  • 40. Post Comments  Anyone can “Post Comments” associated with a structure. To curate data we require login to track Building a Structure Centric Community for Chemists
  • 41. Crowd-sourcing Chemistry  Crowd-sourced curation: identify and tag errors, edit names, synonyms, identify records for deprecation  ALSO  Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data) Building a Structure Centric Community for Chemists
  • 42. But, when registered and logged in…  Ability to curate and add to the database  Add structures  “Clean” structures  Add data (spectra, CIFs, images)  Add links to other pages (URLs)  Add publication details Building a Structure Centric Community for Chemists
  • 43. Adding to the Database - Structure Building a Structure Centric Community for Chemists
  • 44. Adding New Text Data Add Publication Add URL Add Identifier Building a Structure Centric Community for Chemists
  • 45. Adding Supplementary Info to a Structure Building a Structure Centric Community for Chemists
  • 46. Can ChemSpider Enable Discovery?  Yes, chemists can search by text, structure, substructure or properties to look at relationships and probe drug discovery
  • 47. ChemSpider – Research in Progress  Supporting Open Notebook Science as a repository – JC Bradley at Drexel University  For the purpose of online virtual screening  Applying descriptors of various types to filter a database of 20 million compounds  In progress:  Utilizing SimBioSys’ LASSO Descriptor  Collaboration based on NISS’ ChemModLab
  • 48. LASSO: Ligand Activity by Surface Similarity Order
  • 49. LASSO Descriptors on ChemSpider SEMANTIC WEB in action
  • 50. LASSO Searching Method 1  Ask the question “What are the top 1000 molecules with similar LASSO descriptors to the actives for the Estrogen Receptor?”
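The "top 1000 most similar molecules" question on this slide is a top-N ranking problem. The sketch below shows only that generic ranking step in Python. It assumes each compound is described by a numeric descriptor vector and uses Euclidean distance to the nearest active as a stand-in similarity; the real LASSO scoring is not given in these slides, and the compound ids and vectors are made up.

```python
import heapq
import math


def score(descriptor, active_descriptors):
    """Negative distance to the closest active: higher means more similar.

    Euclidean distance is a placeholder, not the LASSO similarity measure.
    """
    return -min(math.dist(descriptor, active) for active in active_descriptors)


def top_n_similar(database, active_descriptors, n=1000):
    """Return the ids of the n database entries most similar to the actives."""
    return heapq.nlargest(
        n, database, key=lambda cid: score(database[cid], active_descriptors)
    )


# Hypothetical three-compound database of descriptor vectors.
database = {"cmpd-1": (3, 0, 1), "cmpd-2": (9, 9, 9), "cmpd-3": (3, 1, 1)}
actives = [(3, 0, 1)]
top2 = top_n_similar(database, actives, n=2)
```

`heapq.nlargest` avoids sorting the whole collection, which matters when the database holds millions of compounds and only the top 1000 are wanted.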
  • 51. It WORKS - Enrichment Plot  60% of the actives were recovered in the top 1% of the database.  “Environmental binders” are weak binders  The top ranked compounds may well be active ER binders  Likely candidates for experimental investigation
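The recovery figure quoted on this slide is the kind of number read off an enrichment plot. As a hedged illustration, the Python sketch below computes the fraction of known actives found in the top slice of a ranked list, plus the corresponding enrichment factor. The function names and the toy ranking are assumptions, not the data behind the slide.

```python
def fraction_recovered(ranked_ids, active_ids, top_fraction):
    """Fraction of known actives that appear in the top slice of a ranking."""
    actives = set(active_ids)
    cutoff = max(1, round(len(ranked_ids) * top_fraction))
    found = sum(1 for cid in ranked_ids[:cutoff] if cid in actives)
    return found / len(actives)


def enrichment_factor(ranked_ids, active_ids, top_fraction):
    """Recovery in the top slice relative to what random picking would give."""
    return fraction_recovered(ranked_ids, active_ids, top_fraction) / top_fraction


# Toy ranking: 1000 compounds, 5 known actives, 3 of them ranked in the top 10.
ranked = list(range(1000))
known_actives = {0, 4, 9, 500, 900}
recovered = fraction_recovered(ranked, known_actives, 0.01)
factor = enrichment_factor(ranked, known_actives, 0.01)
```

By the same arithmetic, recovering 60% of the actives in the top 1% of a database corresponds to an enrichment factor of about 0.60 / 0.01 = 60 over random selection.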
  • 52. Tipping Point  Tipping point - the point at which a slow gradual change becomes irreversible and then proceeds with gathering pace
  • 53. ChemSpider Forums/Blogs  Forum.chemspider.com  www.chemspider.com/blog
  • 54. ChemSpider TouchGraph
  • 55. What would we most like to do?  Enable “Collaborative Science”. What would that look like?  Access to chemical supplies when people need them  Awareness of available literature, patents, databases of curated content – whether Open Access or not. Transaction fees (or not) are between user and provider  Host Open Notebook Science exchanges
  • 56. “ChemSpider Inside”  Instrument vendors integrated ChemSpider into their metabolism ID project – ChemSpider linked to all Mass Spec Instruments doing Metabolite ID?  Wikipedia roundtrip linking to ChemSpider  Google indexing ChemSpider at “fixed rate”  Integration into desktop drawing packages  Members of Microsoft BioIT Alliance  Discussions on Taverna’s Workflow Sourceforge group  Hosting Open Access articles shortly…
  • 57. Where to from here? Short term  Integrated text and structure/substructure searching of the Open Access literature is in development  Web-based scraping of structure-based information – examples in place  Enhanced web services layer to integrate searches  Deposit updated Patent Database (9 million structures)  Reaction handling and deposition
  • 58. Where to from here? Mid-term  Spidering for Chemistry – extract data from articles, webpages and data sources AND stay within copyright  WiChempedia project – wiki-layers on top of ChemSpider, alongside Wikipedia curation project.  Deeper integration to text-based searching and conversion of chemical names to structures for online structure searching:  Improved integration with NCBI Entrez system  Deliver “dedicated websites” for specific publishers
  • 59. Where to from here? Mid-Term  An extensible datamodel “on the fly” allows us to easily expand to integrate abstract data to structures  Data mine and curate “parameters” – physicochemical and physiological parameters to enable QSAR analysis, data modeling and provision of models online (UNC-Chapel Hill, NISS)
  • 60. Our Challenges  There are “no employees”  ChemSpider is non-funded  System is hyper-dependent on ISP, power and limited compute power  We are upsetting a lot of people – evangelists, cheminformatics system vendors, publishers, data content providers
  • 61. Acknowledgments  The ChemSpider team of volunteer developers  ChemSpider Advisory Group  Our curators, depositors and users  Suppliers of commercial software – Microsoft, ACD/Labs, OpenEye, ChemAxon, SimBioSys  SureChem – Structure Based Online Patent Searching
  • 62. Further reading  www.chemspider.com/blog  Internet-based tools for communication and collaboration in chemistry, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 502-506, doi:10.1016/j.drudis.2008.03.015  A perspective of publicly accessible/open-access chemistry databases, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 495-501, doi:10.1016/j.drudis.2008.03.017