ChemSpider as a Chemical Term Resolver

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

In recent years, in parallel with the broad trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases data have been exchanged freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it has also propagated errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.

ChemSpider as a Chemical Term Resolver

Antony Williams, Valery Tkachenko, Sean Ekins and Andy Fant
ACS San Diego, March 2012

It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Competitors?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• IP?

Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hub
• Deliver services to support drug discovery programs in pharma and public domain
• 22 partners, 8 pharmaceutical companies, 3 biotechs
• 36-month project

Guiding principle is open access, open usage, open source
- Key to standards adoption -

MeSH
• A lipid cofactor that is required for normal blood clotting.
• Several forms of vitamin K have been identified:
  • VITAMIN K 1 (phytomenadione) derived from plants,
  • VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins,
  • VITAMIN K 3 (menadione).

Create an Online “Resolver” as a path to chemistry
• Search all forms of structure IDs
  • Systematic name(s)
  • Trivial name(s)
  • SMILES
  • InChI Strings
  • InChIKeys
  • Database IDs
  • Registry Number
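A resolver that accepts "all forms of structure IDs" first has to guess what kind of identifier it was handed. The following is a minimal sketch of that dispatch step, not ChemSpider's actual logic. The function names and the catch-all `name_or_smiles` category are assumptions made for illustration; telling a SMILES string from a chemical name reliably needs a real parser, so the sketch does not attempt it.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(text: str) -> bool:
    """Return True if text looks like a CAS Registry Number with a valid check digit."""
    match = CAS_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # The right-most digit is weighted 1, the next 2, and so on; sum mod 10 is the check digit.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(text: str) -> str:
    """Guess the identifier type so the query can be routed to the right lookup."""
    text = text.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if cas_checksum_ok(text):
        return "registry_number"
    # Names, SMILES and database IDs are not separated here (illustrative simplification).
    return "name_or_smiles"


if __name__ == "__main__":
    for query in ["InChI=1S/H2O/h1H2", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "50-78-2", "vincristine"]:
        print(query, "->", classify_identifier(query))
```

Validating the check digit keeps strings that merely look like registry numbers from being routed to a registry-number lookup.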

Available Information…
• Linked to vendors, safety data, toxicity, metabolism

Resolving Names for QUALITY
• Searching chemical identifiers should resolve to the correct chemical as much as possible

Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
  • Text-mining (publications, patents)
    • Used to index PubMed and link to Google Patents
  • Linking to other databases – think Biology!
    • When structures are not available drug names link
  • Searching the web
    • Names link to structures link to InChIs

Top 200 Drugs on Wikipedia
http://en.wikipedia.org/wiki/List_of_bestselling_drugs

The Project Challenge PART ONE
• Agree on the set of chemical names to work with
• Independently create an SDF file in each “lab”
• Compare differences and agree on final structures
• Issue “Gold Standard” SDF file to team
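The comparison step above can be sketched in a few lines, assuming each lab's SDF file has already been reduced to one InChIKey per drug name (that conversion needs a cheminformatics toolkit and is not shown). The data layout, function names and placeholder keys are hypothetical and do not describe the tooling the team actually used.

```python
from typing import Dict, List, Optional

# Hypothetical layout: structures[lab][drug_name] = InChIKey derived from that lab's SDF record.
Structures = Dict[str, Dict[str, Optional[str]]]


def find_disagreements(structures: Structures) -> List[str]:
    """Return the drug names for which the labs did not all produce the same key."""
    names = set()
    for records in structures.values():
        names.update(records)
    return sorted(
        name
        for name in names
        # A missing record shows up as None and therefore counts as a disagreement.
        if len({records.get(name) for records in structures.values()}) > 1
    )


def accuracy_against_master(records: Dict[str, Optional[str]], master: Dict[str, str]) -> float:
    """Fraction of master-list entries that one lab reproduced exactly."""
    if not master:
        raise ValueError("master list is empty")
    hits = sum(1 for name, key in master.items() if records.get(name) == key)
    return hits / len(master)


if __name__ == "__main__":
    # Placeholder keys, not real InChIKeys.
    labs = {
        "lab_a": {"drug1": "KEY-1", "drug2": "KEY-2"},
        "lab_b": {"drug1": "KEY-1", "drug2": "KEY-2X"},
    }
    print(find_disagreements(labs))
```

The names returned by `find_disagreements` are the ones that need manual review before a gold-standard file can be issued; `accuracy_against_master` gives one simple per-lab score once the master list is agreed.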

Relative accuracy of groups against final master list

The Project Challenge PART TWO
• Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases
• Two checks
  • Search chemical name – does it return the correct compound? If not correct, how is it different?
  • Search “structure” – SMILES, Molfile, InChIString or InChIKey
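One way to score the name-search check is to compare the InChIKey a database returns with the gold-standard key. The sketch below relies on the standard InChIKey layout, in which the first 14-character block hashes the connectivity and the second block covers the remaining layers such as stereochemistry. The function name and the result labels are assumptions for illustration, not the categories used in the study.

```python
from typing import Optional


def compare_to_gold(found_key: Optional[str], gold_key: str) -> str:
    """Classify a database hit against the gold-standard InChIKey."""
    if found_key is None:
        return "not_found"
    if found_key == gold_key:
        return "exact_match"
    # Same first block: same skeleton, but stereo, isotope or charge layers differ.
    if found_key.split("-")[0] == gold_key.split("-")[0]:
        return "same_connectivity"
    return "different_structure"


if __name__ == "__main__":
    gold = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # aspirin
    # The second block below is made up purely to exercise the partial-match branch.
    print(compare_to_gold("BSYNRYMUTXBXSQ-ZZZZZZZZSA-N", gold))
```

A `same_connectivity` result answers the "how is it different?" question only coarsely; identifying the specific difference (missing stereocentre, wrong salt form, and so on) needs the full InChI strings or the structures themselves.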

Standardize
• Use the SRS as a guidance document for standardization
• Adjust as necessary to our needs

One dictionary look up is never enough…
• ChemSpider does not contain all chemistry
• We are not the only ones curating data
• New chemistry expands daily and goes online

One dictionary look up is never enough…
• Federation is key…
• Check ChemSpider first, if not found then
  • Check PubChem
  • Check NCI resolver
  • Check ChEBI
  • Check … the “network” of open interfaces
• Each resolver will have its own “quantitative confidence”.
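The fallback order above can be sketched as a chain of lookups, each carrying its own confidence value. This is a minimal illustration under stated assumptions: the `Resolver` and `Resolution` types, the in-memory dictionaries standing in for the real services, and the confidence numbers are all hypothetical, and no real ChemSpider, PubChem, NCI or ChEBI API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Resolver:
    name: str
    confidence: float  # placeholder for a per-resolver "quantitative confidence"
    lookup: Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Resolution:
    term: str
    structure: str
    source: str
    confidence: float


def resolve_federated(term: str, resolvers: Sequence[Resolver]) -> Optional[Resolution]:
    """Try each resolver in order and return the first hit, tagged with its source."""
    for resolver in resolvers:
        structure = resolver.lookup(term)
        if structure is not None:
            return Resolution(term, structure, resolver.name, resolver.confidence)
    return None


def dictionary_resolver(name: str, confidence: float, table: Dict[str, str]) -> Resolver:
    """Build a stand-in resolver backed by a case-insensitive name dictionary."""
    lowered = {key.lower(): value for key, value in table.items()}
    return Resolver(name, confidence, lambda term: lowered.get(term.strip().lower()))


if __name__ == "__main__":
    chain = [
        dictionary_resolver("ChemSpider", 0.9, {"aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"}),
        dictionary_resolver("PubChem", 0.8, {"water": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"}),
    ]
    print(resolve_federated("Water", chain))
```

Returning the source and confidence with every hit lets a caller decide whether an answer from a lower-ranked resolver is good enough or needs review.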

Chemical Identifier Resolver (CIR)

Converts a given structure identifier into another representation or structure identifier.

Resolve names, identifiers etc.

http://cactus.nci.nih.gov/chemical/structure
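CIR requests are plain URLs of the form base/identifier/representation, so a conversion request can be built as a string. The sketch below only constructs the URL and makes no network call. The helper name `cir_url` is hypothetical, and the representation keywords used in the example (`smiles`, `stdinchikey`) should be checked against the CIR documentation before relying on them.

```python
from urllib.parse import quote

# Base URL as given on the slide.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL: <base>/<identifier>/<representation>."""
    # Percent-encode the identifier so spaces and other reserved characters survive in the path.
    return f"{CIR_BASE}/{quote(identifier.strip(), safe='')}/{representation}"


if __name__ == "__main__":
    print(cir_url("aspirin", "smiles"))
    print(cir_url("vitamin K1", "stdinchikey"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) returns the converted identifier as text when the service can resolve the input; error handling for unresolved names is left out of this sketch.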

We are building…
• A central federated resolver utilizing available services
• Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)
• “Consensus” decisions and guidance BUT
• Chemicals have timelines!
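A "consensus" decision across several name-to-structure tools could be as simple as a majority vote over the InChIKeys they return. The sketch below assumes each tool's output has already been converted to an InChIKey. The voting rule, the `min_votes` threshold and the function name are illustrative assumptions and not the rule the authors implemented, and the sketch does not address how a name's meaning changes over time.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus_structure(
    candidates: Dict[str, Optional[str]], min_votes: int = 2
) -> Optional[Tuple[str, int]]:
    """Return (structure, votes) if one structure clearly wins, otherwise None.

    candidates maps a tool name to the key it produced, or None if it failed.
    """
    votes = Counter(key for key in candidates.values() if key is not None)
    ranked = votes.most_common(2)
    if not ranked:
        return None
    top_key, top_count = ranked[0]
    # A tie between the two most common answers means there is no consensus.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_votes:
        return None
    return top_key, top_count


if __name__ == "__main__":
    # Tool names come from the slide; the outputs are placeholders, not real results.
    outputs = {"ACD/Labs": "KEY-1", "Lexichem": "KEY-1", "OPSIN": "KEY-2"}
    print(consensus_structure(outputs))
```

Cases that return `None` are the ones a curator would need to look at by hand.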

Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

2. The Web of Chemistry – VERY BIG!

3. Online Databases are “Linking”

7. What is the Structure of Vitamin K?

9. What is the Structure of Vitamin K1?

13. ChemSpider

15. Available Information….

16. Vitamin K1 Names

17. Vitamin K1 on ChemSpider CORRECT

20. I want to know about “Vincristine”

21. Vincristine: Identifiers

22. Vincristine: Patents Linked by Name

23. Many Names, One Structure

26. RSC Process

29. “The First 10”

30. Performance on 150 Drug Names

32. NPC Browser Set

34. Nitro groups

35. Salt and Ionic Bonds

39. What can become a resolver?

41. ORIGINAL FINAL
