NCI/CADD Chemical Identifier Resolver: Indexing and Analysis of Available Chemistry Space
Markus Sitzmann [1], Wolf-Dietrich Ihlenfeldt [2], and Marc C. Nicklaus [1]
[1] Computer-Aided Drug Design Group, Chemical Biology Laboratory, NCI-Frederick, NIH, DHHS
[2] Xemistry GmbH, Auf den Stieden 8, D-35094 Lahntal, Germany
Chemistry Space Analysis
Chemical Identifier Resolver. Chemical structure linked to: NCI/CADD Identifiers, InChI/InChIKey, ChemSpider ID, PubChem SID/CID, chemical names, CAS Registry Number, NSC number, FDA UNII, ChemNavigator SID, SMILES, SD File, Chemical Formula, ChEBI ID, PDB Ligand ID, MRV, CML, SYBYL Line Notation, GIF image
Chemical Identifier Resolver (NCI/CADD Web Resources): http://cactus.nci.nih.gov/chemical/structure
Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier.
First beta release: July 2009. Current release (beta 4): April 2011.
Chemical Identifier Resolver (NCI/CADD Web Resources)
URL scheme: http://cactus.nci.nih.gov/chemical/structure/“identifier”/“representation” (MIME type: text/plain)
Example: http://cactus.nci.nih.gov/chemical/structure/Tamiflu/cas returns 204255-11-8
XML format: http://cactus.nci.nih.gov/chemical/structure/“identifier”/“representation”/xml
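The URL scheme above can be sketched as a small helper. This is only an illustration: the function name `resolver_url` is hypothetical, and percent-encoding the identifier is an assumption made here so that characters such as `#` or `/` in a SMILES string do not break the path. The block only builds URLs and sends no request.

```python
from urllib.parse import quote

# Base URL as given on the slides.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str, xml: bool = False) -> str:
    """Build a resolver URL of the form BASE/identifier/representation[/xml].

    The identifier is percent-encoded (an assumption of this sketch).
    """
    parts = [BASE_URL, quote(identifier, safe=""), representation]
    if xml:
        parts.append("xml")
    return "/".join(parts)


# The Tamiflu example from the slide, in plain-text and XML form.
print(resolver_url("Tamiflu", "cas"))
print(resolver_url("Tamiflu", "cas", xml=True))
```

The resulting string can be passed to any HTTP client; the plain-text form returns the bare value, the `/xml` form a structured response.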
Chemical Identifier Resolver (NCI/CADD Public Web Resources): http://cactus.nci.nih.gov/chemical/structure
“identifier”: chemical names, IUPAC names (by OPSIN), CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ChemSpider ID, ChemNavigator SID, FDA UNII
“representation”: /smiles; /names, /iupac_name; /cas; /inchi, /stdinchi; /inchikey, /stdinchikey; /ficts, /ficus, /uuuuu; /image; /file, /sdf; /mw, /monoisotopic_mass; /formula; /twirl, /3d; /urls; /chemspider_id; /pubchem_sid; /chemnavigator_sid
Chemical Identifier Resolver (NCI/CADD Web Resources): request workflow
http request (identifier, representation) -> detection of the identifier type
If the identifier is a full structure representation (e.g. SMILES, InChI): calculation of the requested structure representation
If the identifier is a hashed structure representation (e.g. InChIKey), a trivial name etc. (e.g. CAS number, chemical name): database lookup
http response with MIME type (structure, e.g. InChI, GIF image)
Components: CACTVS, NCI/CADD Chemical Structure Database (CSDB)
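The detection step can be illustrated with a few pattern checks. This is a rough sketch, not the resolver's actual logic: the regular expressions, the function names and the category labels are assumptions, and telling names from SMILES strings is left out because it needs a real parser. The CAS test uses the standard check-digit rule (last digit times 1, the preceding digit times 2, and so on, modulo 10).

```python
import re

# Heuristic patterns (assumptions of this sketch, not the resolver's own rules).
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
NCICADD_RE = re.compile(r"^[0-9A-F]{16}-(FICTS|FICuS|uuuuu)(-\d{2}-[0-9A-F]{2})?$")
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_cas_number(text: str) -> bool:
    """Check the CAS format and its check digit."""
    match = CAS_RE.match(text)
    if not match:
        return False
    digits = (match.group(1) + match.group(2))[::-1]  # rightmost digit first
    weighted_sum = sum(int(d) * w for w, d in enumerate(digits, start=1))
    return weighted_sum % 10 == int(match.group(3))


def classify(identifier: str) -> str:
    """Return a coarse identifier type."""
    text = identifier.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if NCICADD_RE.match(text):
        return "ncicadd_identifier"
    if is_cas_number(text):
        return "cas"
    return "name_or_smiles"  # would need further parsing


def route(identifier: str) -> str:
    """Full structure representations are calculated; hashed ones are looked up."""
    kind = classify(identifier)
    if kind == "inchi":
        return "calculate"
    if kind in {"inchikey", "ncicadd_identifier", "cas"}:
        return "lookup"
    return "undetermined"


for example in ("204255-11-8", "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "Tamiflu"):
    print(example, classify(example), route(example))
```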
Chemical Identifier Resolver: Chemical Structure Database (CSDB)
Currently: ~150 chemical structure databases; ~120 million structure records; ~81.6 million unique structures by NCI/CADD FICuS Identifier; ~84 million unique structures by Std. InChIKey
Sources: ChemNav. iResearch Lib. ~56%, PubChem ~38%, others ~6%
FICTS, FICuS, uuuuu
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
Example hashcode: 9850FD9F9E2B4E25 [structure drawing omitted]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) -> structure normalization -> parent structure -> hashcode calculation (E_HASHISY) -> NCI/CADD Identifier (FICTS, FICuS, uuuuu)
Output: SDF, SMILES, database
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
An identifier is either sensitive or un-sensitive to each of five structure features. An un-sensitive feature is normalized as follows:
  Fragments: keep only largest organic fragment
  Isotopes: ignore isotope labels
  Charges: uncharge
  Tautomers: find canonical tautomer
  Stereochemistry: discard stereo information

Identifier  Fragments     Isotopes      Charges       Tautomers     Stereochemistry
FICTS       sensitive     sensitive     sensitive     sensitive     sensitive
FICuS       sensitive     sensitive     sensitive     un-sensitive  sensitive
uuuuu       un-sensitive  un-sensitive  un-sensitive  un-sensitive  un-sensitive

FICTS: representation of the exact drawing
FICuS: comes closest to how a chemist perceives a compound
uuuuu: closely related forms of the same compound
[Example structure drawings omitted]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
Each letter of the tag (Fragments, Isotopes, Charges, Tautomers, Stereo) marks a feature as sensitive / not sensitive.
Identifier format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>
Examples for one structure: 4A122D094098B50D-FICTS-01-1D, 0E26B623DF7FAD30-FICuS-01-70, 9850FD9F9E2B4E25-uuuuu-01-27 [structure drawing omitted]
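As an illustration of the identifier format, the following sketch splits an identifier into its four fields and reads the sensitivity flags off the tag letters. The class and function names are hypothetical, and the checksum is only extracted, not verified, because the slides do not state how it is computed.

```python
from typing import Dict, NamedTuple

# Feature order follows the tag letters F-I-C-T-S.
FEATURES = ("Fragments", "Isotopes", "Charges", "Tautomers", "Stereochemistry")
KNOWN_TAGS = ("FICTS", "FICuS", "uuuuu")


class NciCaddIdentifier(NamedTuple):
    hashcode: str
    tag: str
    version: str
    checksum: str


def parse_identifier(text: str) -> NciCaddIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into named fields."""
    hashcode, tag, version, checksum = text.strip().split("-")
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    return NciCaddIdentifier(hashcode, tag, version, checksum)


def sensitivity(tag: str) -> Dict[str, bool]:
    """Map each feature to True (sensitive) or False (un-sensitive, letter 'u')."""
    return {feature: letter != "u" for feature, letter in zip(FEATURES, tag)}


ident = parse_identifier("0E26B623DF7FAD30-FICuS-01-70")
print(ident)
print(sensitivity(ident.tag))
```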
Histidine and related structure records: charged form, tautomer, isotope, salt, stereoisomers, “errors” [structure drawings omitted]
FICTS identifiers of the histidine records (charged form, tautomer, isotope, salt, stereoisomers, “errors”): A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS [structure drawings omitted]
FICuS identifiers of the histidine records (charged form, tautomer, isotope, salt, stereoisomers, “errors”): A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS [structure drawings omitted]
uuuuu identifiers of the histidine records (charged form, tautomer, isotope, stereoisomers, salt, “errors”): all nine records share the same identifier, 9850FD9F9E2B4E25-uuuuu [structure drawings omitted]
Std. InChIKeys of the histidine records (charged form, tautomer, isotope, stereoisomers, salt, “errors”): HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M [structure drawings omitted]
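The Std. InChIKeys above can be compared by their first block, the 14-character hash of the connectivity layers, while the second block covers the remaining layers such as stereochemistry and isotopes. A minimal sketch that groups the distinct keys from the slide by that first block (the function name is hypothetical):

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first (14-character) block of an InChIKey."""
    return inchikey.split("-")[0]


# Distinct Std. InChIKeys listed on the histidine slide.
keys = [
    "HNDVDQJCIGZPNO-UHFFFAOYSA-N",
    "HNDVDQJCIGZPNO-CDYZYAPPSA-N",
    "HNDVDQJCIGZPNO-RXMQYKEDSA-N",
    "HNDVDQJCIGZPNO-YFKPBYRVSA-N",
    "UHPNKBYGGMJTIM-UHFFFAOYSA-M",
]

groups = defaultdict(list)
for key in keys:
    groups[connectivity_block(key)].append(key)

for block, members in groups.items():
    print(block, len(members))
```

Four of the five distinct keys share the first block and differ only in the second, whereas the remaining key differs already in the first block.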
NCI/CADD Chemical Structure Database: Structure Normalization
119.8 million original structure records in CSDB
-> 83.1 million FICTS parent structures
-> 81.6 million FICuS parent structures (tautomer-invariant)
-> 76.2 million uuuuu parent structures
Tautomer Analysis: How much “chemical space” is “just generated” by drawing tautomers?
NCI/CADD Chemical Structure Database: Tautomer Analysis
NCI/CADD Chemical Structure Database: Tautomer Analysis (tautomer rules)
rule 1: 1.3 (thio)keto/(thio)enol
rule 2: 1.5 (thio)keto/(thio)enol
rule 3: simple (aliphatic) imine
rule 4: special imine
rule 5: 1.3 aromatic heteroatom H shift
rule 6: 1.3 heteroatom H shift
rule 7: 1.5 (aromatic) heteroatom H shift (1)
rule 8: 1.5 aromatic heteroatom H shift (2)
rule 9: 1.7 (aromatic) heteroatom H shift
rule 10: 1.9 (aromatic) heteroatom H shift
rule 11: 1.11 (aromatic) heteroatom H shift
rule 12: furanones
rule 13: keten/ynol exchange
rule 14: ionic nitro/aci-nitro
rule 15: pentavalent nitro/aci-nitro
rule 16: oxim/nitroso
rule 17: oxim/nitroso via phenol
rule 18: cyanic/iso-cyanic acids
rule 19: formamidinesulfinic acids
rule 20: isocyanides
rule 21: phosphonic acids
NCI/CADD Chemical Structure Database: Tautomer Analysis (example SMIRKS)
rule 1: 1.3 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
rule 6: 1.3 heteroatom H shift
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
[Atom-mapped structure drawings for both rules omitted]
NCI/CADD Chemical Structure Database: Tautomer Analysis
72.0 million FICTS parent structures -> FICuS normalization -> 70.6 million FICuS parent structures (structure counts are based on the 2009 version of CSDB, 103.9 million structure records)
FICTS parent structures: 8.6% change tautomeric form during FICuS normalization
FICuS parent structures: 98.5% have a one-to-one relationship to a single FICTS parent structure; 1.5% have a one-to-many relationship to several FICTS parent structures (“conflict”)
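A “conflict” in the sense used above can be found by grouping FICTS hashcodes under their FICuS hashcode and keeping the groups with more than one member. The sketch below is an illustration only: the function name is hypothetical, and the hash pairs are taken from the histidine slides by matching positions in the transcript, which is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return FICuS hashcodes that map to more than one FICTS hashcode."""
    by_ficus = defaultdict(set)
    for ficts, ficus in pairs:
        by_ficus[ficus].add(ficts)
    return {
        ficus: sorted(ficts_set)
        for ficus, ficts_set in by_ficus.items()
        if len(ficts_set) > 1
    }


# (FICTS, FICuS) pairs; pairing by slide position is an assumption.
pairs = [
    ("9850FD9F9E2B4E25", "9850FD9F9E2B4E25"),
    ("6C16DE2351F9FF50", "9850FD9F9E2B4E25"),  # tautomer drawn differently
    ("A3DAE0788050DDE4", "A3DAE0788050DDE4"),
    ("E5F83F10C5DB080A", "E5F83F10C5DB080A"),
]

conflicts = find_conflicts(pairs)
ficus_total = len({ficus for _, ficus in pairs})
print(conflicts)
print(f"{len(conflicts)} of {ficus_total} FICuS parents have a conflict")
```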
NCI/CADD Chemical Structure Database: Tautomer Analysis
[Histogram: tautomeric overlap within each individual database release (%), 0.0 to 2.0, against the number of database releases (frequency). Average: ~0.3% of original structure records.]
Databases labeled in the plot: Asinex, ChemBridge, ComGenex, ChemNavigator, Columbia University Molecular Screening Center, EPA DSSTox, Specs, Ambinter, BIND, BindingDB, ChemNavigator, KEGG, NCI Open Database, NIST WebBook, NLM ChemIDplus, NMRShiftDB, Thomson Pharma, Wombat, NCI/DTP, PASS Training Set, SGC-Ox, ChemDB, ZINC, ChEBI, ChemSpider
NCI/CADD Chemical Structure Database: Tautomer Analysis
[Histogram: occurrence of “tautomerism-critical” molecules within each individual database release (%), i.e. the percentage of FICuS parent structures in each database release occurring somewhere in CSDB with a conflict, against the number of database releases (frequency). Average: ~9.5% of FICuS parent structures.]
Example for a Tautomer “Conflict”: HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5) [structure drawing omitted]
He, D.; Li, Z.; Ma, M.; Huang, J.; Yang, Y. Study of extraction characteristics of HPMBP. 1. Tautomer and extraction characteristics. J. Chem. Eng. Data 2009, 54(10), 2944-2947.
Example for a Tautomer “Conflict”: HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5)
CACTVS generates 7 tautomers, one of which is the canonical tautomer by CACTVS. 5 have a potential stereo center on atoms or bonds (R/S or E/Z). [Structure drawings omitted]
Example for a Tautomer “Conflict”: HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5)
3 of the tautomers have CAS Registry Numbers assigned: 4551-69-1 (859 references), 33064-14-1 (49 references), 127117-31-1 (3 references). Stereo labels shown: (no stereo), (Z). [Structure drawings omitted]
Example for a Tautomer “Conflict”: HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5)
Occurrences of the tautomeric forms in databases indexed in CSDB [structure drawings omitted]:
  16 databases (no stereo): ACD 3D, ACX, Ambinter, BioByte QSAR, ChemBank, ChemBridge, ChemDB, ChemSpider, DiscoveryGate, EPA GCES, MLSMR, NCI Open Database, NIST MS-Lib, NLM ChemIDplus, Sigma-Aldrich, Thomson Pharma
  12 databases: ACD 3D, Ambinter, BindingDB, ChemBank, ChemDB, ChemSpider, ChemNavigator, MLSMR, NIAID, Scripps Screening Center, Thomson Pharma, ZINC
  6 databases: Ambinter, ChemDB, ChemSpider, DiscoveryGate, ChemNavigator, Thomson Pharma
  3 databases (R): ChemSpider, ECOTOX, ZINC
  2 databases (S): ChemSpider, ZINC
  1 database (no stereo): ChemDB
NCI/CADD Chemical Structure Database: Tautomer Analysis
Starting from the set of 70.6 million FICuS parent structures, we systematically generated all tautomers based on the 21 SMIRKS rule set available in CACTVS.
This generated 680 million tautomers; for 1.7% of the FICuS parent structures the enumeration was not exhaustive.
NCI/CADD Chemical Structure Database: Tautomer Analysis (generated tautomers per rule)

tautomer rule                                      count          %
rule 1: 1.3 (thio)keto/(thio)enol                  173,002,712    25.4
rule 2: 1.5 (thio)keto/(thio)enol                  11,541,452     1.7
rule 3: simple (aliphatic) imine                   35,917,415     5.3
rule 4: special imine                              4,306,155      0.6
rule 5: 1.3 aromatic heteroatom H shift            25,678,446     3.8
rule 6: 1.3 heteroatom H shift                     250,453,882    36.8
rule 7: 1.5 (aromatic) heteroatom H shift (1)      27,542,770     4.0
rule 8: 1.5 aromatic heteroatom H shift (2)        26,819         <0.1
rule 9: 1.7 (aromatic) heteroatom H shift          57,242,472     8.4
rule 10: 1.9 (aromatic) heteroatom H shift         5,061,731      0.7
rule 11: 1.11 (aromatic) heteroatom H shift        1,374,235      0.2
rule 12: furanones                                 17,860,604     2.6
rule 13: keten/ynol exchange                       57,989         <0.1
rule 14: ionic nitro/aci-nitro                     428,266        <0.1
rule 15: pentavalent nitro/aci-nitro               129            <0.1
rule 16: oxim/nitroso                              505,695        <0.1
rule 17: oxim/nitroso via phenol                   131,502        <0.1
rule 18: cyanic/iso-cyanic acids                   181            <0.1
rule 19: formamidinesulfinic acids                 1,392          <0.1
rule 20: isocyanides                               229            <0.1
rule 21: phosphonic acids                          54,926         <0.1
NCI/CADD Chemical Structure Database: Tautomer Analysis (tautomers per FICuS structure)

FICuS structures with    count         %
no tautomers             9,756,186     13.8
one tautomer             10,721,845    15.2
2-10 tautomers           33,532,284    47.5
11-25 tautomers          10,870,312    15.4
25-50 tautomers          2,622,587     3.7
51-100 tautomers         1,136,066     1.6
101-200 tautomers        565,199       0.8
201-300 tautomers        104,875       0.1
301-400 tautomers        35,144        <0.1
401-500 tautomers        17,241        <0.1
501-600 tautomers        4,323         <0.1
601-700 tautomers        1,400         <0.1
701-800 tautomers        362           <0.1
801-832 tautomers        3             <0.1

Note: many minor tautomeric forms (but you find them in databases)
Tautomer Analysis: Tanimoto Similarities of Tautomers
Fingerprint: PubChem/CACTVS E_SCREEN bitvector (881 bits)

Tanimoto index range    count          %
>0.0-0.2                0              0.0
>0.2-0.3                6              <0.1
>0.3-0.4                6,580          <0.1
>0.4-0.5                369,331        <0.1
>0.5-0.6                6,304,436      0.9
>0.6-0.7                36,448,651     5.3
>0.7-0.8                111,954,384    16.4
>0.8-0.9                214,747,976    31.5
>0.9-1.0                310,725,465    45.6

~23% below 0.8 Tanimoto similarity (although the same molecule)
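For readers who want to retrace the ~23% figure, here is a minimal sketch. It defines the Tanimoto index on sets of "on" bits (a generic definition, not the CACTVS implementation) and sums the tabulated counts for all ranges up to 0.8. The toy bit sets are invented for illustration; the bin counts are those of the table.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


# Toy fingerprints (hypothetical): 2 shared bits out of 4 distinct bits.
print(tanimoto({1, 2, 3}, {2, 3, 4}))

# (lower bound, upper bound, count) per Tanimoto range, from the table.
BINS = [
    (0.0, 0.2, 0),
    (0.2, 0.3, 6),
    (0.3, 0.4, 6_580),
    (0.4, 0.5, 369_331),
    (0.5, 0.6, 6_304_436),
    (0.6, 0.7, 36_448_651),
    (0.7, 0.8, 111_954_384),
    (0.8, 0.9, 214_747_976),
    (0.9, 1.0, 310_725_465),
]

total = sum(count for _, _, count in BINS)
up_to_08 = sum(count for _, upper, count in BINS if upper <= 0.8)
fraction = up_to_08 / total
print(f"{total:,} pairs in total, {fraction:.1%} with similarity up to 0.8")
```

The counts add up to roughly 680 million, matching the number of generated tautomers reported earlier in the deck.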
Scaffold Analysis
NCI/CADD Chemical Structure Database: Scaffold Analysis
molecular scaffold tree (level 1, level 2, ...): Schuffenhauer et al. J. Chem. Inf. Model. 2007, 47, 47-58
archetype scaffold: Bemis et al. J. Med. Chem. 1996, 39, 2887-2893
simple scaffold: Bemis et al. J. Med. Chem. 1996, 39, 2887-2893
[Example structure and scaffold drawings omitted]
NCI/CADD Chemical Structure Database: Scaffold Analysis
CSDB uuuuu compound set: 76.2 million structures
Scaffold types: molecular scaffold tree (level 1, level 2, ...), archetype scaffold, simple scaffold
Scaffold counts shown: 8.1 million scaffolds, 6.8 million scaffolds, 0.8 million scaffolds [scaffold drawings omitted]
NCI/CADD Chemical Structure Database: Scaffold Analysis
CSDB uuuuu compound set: 76.2 million structures; molecular scaffold tree: 8.1 million scaffolds
[Plot: number of unique scaffolds (in millions) per hierarchy level (levels 1 to 10) of the molecular scaffold tree, with the number of unique structures (in millions) on a second axis.]
NCI/CADD Chemical Structure Database: Scaffold Analysis
[Scaffold drawings with substituent positions R1 to R10 and per-position counts omitted]
2,281 uuuuu parent structures: 5334 structure records in 64 databases
2,726 uuuuu parent structures: 6007 structure records in 66 databases
744,469 uuuuu parent structures: 1,069,046 structure records in 66 databases
Atom Neighborhoods
NCI/CADD Chemical Structure Database: Multilevel Neighborhoods of Atoms (MNA)
Filimonov D., Poroikov V., Borodina Yu., Gloriozova T. J. Chem. Inf. Comput. Sci., 1999, 39 (4), 666-670.
MNA level 1 and MNA level 2 descriptors shown for an example structure [drawing omitted]:
  HC, HO, CHCC, CHCN, CCCC, CCOO, NCC, OHC, OC
  C(C(CC-H)C(CC-C)-H(C)), C(C(CC-H)C(CN-H)-H(C)), C(C(CC-H)C(CN-H)-C(C-O-O)), C(C(CC-H)N(CC)-H(C)), C(C(CC-C)N(CC)-H(C)), N(C(CN-H)C(CN-H)), -H(C(CC-H)), -H(C(CN-H)), -H(-O(-H-C)), -C(C(CC-C)-O(-H-C)-O(-C)), -O(-H(-O)-C(C-O-O)), -O(-C(C-O-O))
NCI/CADD Chemical Structure Database: Multilevel Neighborhoods of Atoms (MNA)
CSDB uuuuu compound set: 76.2 million structures
Unique MNAs, level 1: 13,426; 1.3 billion relationships; ~17 per uuuuu parent structure
Unique MNAs, level 2: 918,516; 2.3 billion relationships; ~30 per uuuuu parent structure
424,784 MNAs (level 2) are exclusive to a set of 1.3 million structures in ChemSpider
Chemical Structure Web Services: Indexing Chemical Space
Components: NCI/CADD web services, Chemical Identifier Resolver (http), NCI/CADD Chemical Structure Database (CSDB), CACTVS, external web services, other software packages (e.g. OPSIN)
Chemical Identifier Resolver (NCI/CADD Web Resources): http://cactus.nci.nih.gov/chemical/structure
Blog: http://cactus.nci.nih.gov/blog
Acknowledgments
ChemNavigator: Scott Hutton, Tad Hurst
CADD Group, CBL, NCI: Igor Filippov
University of Cambridge: Daniel Lowe, Peter Murray-Rust
Noel O'Boyle (University College Cork, Ireland)
Richard Apodaca (Metamolecular)
Hans-Juergen Himmler
Thanks to all database providers!
Our web site: http://cactus.nci.nih.gov
Acknowledgments - Software: CACTVS; Python Web Framework; Python SQL Library; Peter Ertl (Novartis); ChemWriter; Javascript library
 

Meet the new FSP 3000 M-Flex800™Adtran
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 

Último (20)

Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 

ICCS9 2011 Talk

  • 1. NCI/CADD Chemical Identifier Resolver: Indexing and Analysis of Available Chemistry Space. Markus Sitzmann [1], Wolf-Dietrich Ihlenfeldt [2], and Marc C. Nicklaus [1]. [1] Computer-Aided Drug Design Group, Chemical Biology Laboratory, NCI-Frederick, NIH, DHHS. [2] Xemistry GmbH, Auf den Stieden 8, D-35094 Lahntal, Germany.
  • 3. Chemical Identifier Resolver. Identifiers and representations linked to a chemical structure: NCI/CADD Identifiers, InChI/InChIKey, ChemSpider ID, PubChem SID/CID, chemical names, CAS Registry Number, NSC number, FDA UNII, ChemNavigator SID, SMILES, SD File, Chemical Formula, ChEBI ID, PDB Ligand ID, MRV, CML, SYBYL Line Notation, GIF image.
  • 4. Chemical Identifier Resolver (NCI/CADD Web Resources): http://cactus.nci.nih.gov/chemical/structure. Works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier. First beta release: July 2009. Current release (beta 4): April 2011.
  • 6. Chemical Identifier Resolver (NCI/CADD Public Web Resources): http://cactus.nci.nih.gov/chemical/structure, followed by an "identifier" and a "representation". Accepted as "identifier": chemical names, IUPAC names (by OPSIN), CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ChemSpider ID, ChemNavigator SID, FDA UNII. Available as "representation": /smiles, /names, /iupac_name, /cas, /inchi, /stdinchi, /inchikey, /stdinchikey, /ficts, /ficus, /uuuuu, /image, /file, /sdf, /mw, /monoisotopic_mass, /formula, /twirl, /3d, /urls, /chemspider_id, /pubchem_sid, /chemnavigator_sid.
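The URL scheme on this slide (base URL, then identifier, then representation, with the optional /xml suffix mentioned earlier in the talk) can be sketched as a small helper. This is only an illustration: the function name `resolver_url` is made up here, and percent-encoding the identifier is an assumption about how special characters in SMILES strings or names should be passed, not something the slides state.

```python
from urllib.parse import quote

# Base URL as given on the slides.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str, xml: bool = False) -> str:
    """Build BASE_URL/<identifier>/<representation>[/xml] (hypothetical helper)."""
    # Assumption: percent-encode the identifier so that characters such as
    # '#', '/' or spaces stay inside a single path segment.
    parts = [BASE_URL, quote(identifier, safe=""), representation.strip("/")]
    if xml:
        parts.append("xml")
    return "/".join(parts)


if __name__ == "__main__":
    print(resolver_url("Tamiflu", "cas"))
    print(resolver_url("C#N", "stdinchikey", xml=True))
```

The resulting string can be passed to any HTTP client; the plain form is described in the talk as returning text/plain.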
  • 7. Chemical Identifier Resolver (NCI/CADD Web Resources), request workflow. The http request carries the identifier and the requested representation; the resolver first detects the identifier type. If the identifier is a full structure representation (e.g. SMILES, InChI), the requested structure representation is calculated. If the identifier is a hashed structure representation (e.g. InChIKey), a trivial name, etc. (e.g. CAS number, chemical name), the structure is obtained by database lookup. The http response returns the result (e.g. InChI, GIF image) with its MIME type. Components shown: CACTVS, NCI/CADD Chemical Structure Database (CSDB).
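A minimal sketch of the "detect the identifier type, then calculate or look up" step might look as follows. It is not the resolver's actual logic: the regular expressions, the names `detect_type` and `route`, and the routing table are assumptions for illustration. Only three easily recognizable types are handled; SMILES strings and chemical names are left as "unknown" because telling them apart needs a real parser.

```python
import re
from typing import Optional

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

# Hypothetical routing table mirroring the two branches on the slide.
ROUTES = {"inchi": "calculate", "inchikey": "lookup", "cas": "lookup"}


def is_valid_cas(text: str) -> bool:
    """Check the CAS layout and its check digit."""
    match = CAS_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def detect_type(identifier: str) -> str:
    """Return 'inchi', 'inchikey', 'cas', or 'unknown'."""
    text = identifier.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if is_valid_cas(text):
        return "cas"
    return "unknown"


def route(identifier: str) -> Optional[str]:
    """Map an identifier to 'calculate' or 'lookup'; None if undecided."""
    return ROUTES.get(detect_type(identifier))


if __name__ == "__main__":
    for example in ("InChI=1S/CH4/h1H4", "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "204255-11-8"):
        print(example, detect_type(example), route(example))
```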
  • 13. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Each identifier is either sensitive or insensitive to five structural features: Fragments, Isotopes, Charges, Tautomers, Stereochemistry. [Figure: example structure drawings]
  • 14. FICTS: sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S). A representation of the exact drawing; all example structures are marked as different (≠). [Figure: example structure drawings]
  • 15. FICuS: sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S); insensitive to Tautomers (u). Comes closest to how a chemist perceives a compound; two pairs of example structures are marked as equal (=), the rest as different (≠). [Figure: example structure drawings]
  • 16. uuuuu: insensitive (u) to Fragments, Isotopes, Charges, Tautomers, and Stereochemistry. Groups closely related forms of the same compound; all example structures are marked as equal (=). [Figure: example structure drawings]
  • 17. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>. The tag (FICTS, FICuS, or uuuuu) marks each of Fragments, Isotopes, Charges, Tautomers, and Stereo as sensitive or not sensitive. Examples: 4A122D094098B50D-FICTS-01-1D, 0E26B623DF7FAD30-FICuS-01-70, 9850FD9F9E2B4E25-uuuuu-01-27. [Figure: example structure drawing]
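The identifier layout on this slide lends itself to a short parsing sketch. The field widths used below (16 hexadecimal digits for the hashcode, 2 digits for the version, 2 hexadecimal digits for the checksum) are inferred from the three examples only and are an assumption; the checksum algorithm is not given in the slides, so it is not verified here. The names `parse_identifier` and `sensitivity` are hypothetical.

```python
import re
from typing import Dict, NamedTuple


class StructureIdentifier(NamedTuple):
    hashcode: str
    tag: str
    version: str
    checksum: str


# Assumed field widths, inferred from the examples on the slide.
PATTERN = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)-(\d{2})-([0-9A-F]{2})$")

# Tag positions F, I, C, T, S as introduced on the preceding slides.
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")


def parse_identifier(text: str) -> StructureIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its fields."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an NCI/CADD structure identifier: {text!r}")
    return StructureIdentifier(*match.groups())


def sensitivity(tag: str) -> Dict[str, bool]:
    """A 'u' at a position means the identifier ignores that feature."""
    return {feature: letter != "u" for feature, letter in zip(FEATURES, tag)}


if __name__ == "__main__":
    parsed = parse_identifier("0E26B623DF7FAD30-FICuS-01-70")
    print(parsed)
    print(sensitivity(parsed.tag))
```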
  • 18. Example: histidine and its variants (charged form, tautomer, isotope, salt, stereoisomers, and "errors"). [Figure: structure drawings of the variants]
  • 19. FICTS identifiers of the histidine variants (charged form, tautomer, isotope, salt, stereoisomers, "errors"), in the order shown: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS. [Figure: structure drawings of the variants]
  • 20. FICuS identifiers of the same variants, in the order shown: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS. [Figure: structure drawings of the variants]
  • 21. uuuuu identifiers of the same variants, in the order shown: 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu. [Figure: structure drawings of the variants]
  • 22. Std. InChIKeys of the same variants, in the order shown: HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M. [Figure: structure drawings of the variants]
  • 23. NCI/CADD Chemical Structure Database: Structure Normalization. 119.8 million original structure records in CSDB; 83.1 million FICTS parent structures; 81.6 million FICuS parent structures; 76.2 million uuuuu parent structures. [Diagram: original records grouped under FICTS parents, FICTS parents under FICuS parents, FICuS parents under uuuuu parents]
  • 24. NCI/CADD Chemical Structure Database: Structure Normalization. Same diagram and counts as slide 23, with the added label "tautomer-invariant".
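The four counts on the normalization slides imply how much each step collapses the set. The short calculation below only rearranges the numbers quoted on the slide; the function name `reductions` is made up for this sketch.

```python
# Counts in millions, as quoted on the Structure Normalization slides.
COUNTS_MILLIONS = [
    ("original structure records", 119.8),
    ("FICTS parent structures", 83.1),
    ("FICuS parent structures", 81.6),
    ("uuuuu parent structures", 76.2),
]


def reductions(counts):
    """Percentage reduction from each level to the next, rounded to 0.1."""
    rows = []
    for (prev_name, prev), (name, cur) in zip(counts, counts[1:]):
        rows.append((prev_name, name, round(100 * (prev - cur) / prev, 1)))
    return rows


if __name__ == "__main__":
    for prev_name, name, percent in reductions(COUNTS_MILLIONS):
        print(f"{prev_name} -> {name}: -{percent}%")
```

Read this way, deduplicating original records into FICTS parents removes by far the largest share, while merging tautomers (FICTS to FICuS) changes the count by under 2%.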
  • 25. Tautomer Analysis How much “chemical space” is “just generated” by drawing tautomers?
  • 28. NCI/CADD Chemical Structure Database: Tautomer Analysis. Rule 1, 1.3 (thio)keto/(thio)enol: [O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3] [#1:4] >> [#1:4] [O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]. Rule 6, 1.3 heteroatom H shift: [N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3] [#1:4] >> [#1:4] [N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]. [Figure: atom-numbered example structures for both rules]
  • 29. NCI/CADD Chemical Structure Database: Tautomer Analysis. 72.0 million FICTS parent structures map to 70.6 million FICuS parent structures; 8.6% of the FICTS parent structures change tautomeric form during FICuS normalization. Of the FICuS parent structures, 98.5% have a one-to-one relationship to a single FICTS parent structure and 1.5% have a one-to-many relationship to several FICTS parent structures ("conflict"). Structure counts are based on the 2009 version of CSDB (103.9 million structure records).
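The one-to-many "conflict" relationship can be illustrated by grouping FICTS hashcodes under their FICuS hashcode. The sketch below reuses the hashcodes from the histidine slides; pairing the FICTS and FICuS lists position by position is an assumption made for this example, and `find_conflicts` is a hypothetical name, not part of CSDB.

```python
from collections import defaultdict

# (FICTS hashcode, FICuS hashcode) pairs. Assumption: the lists on the
# histidine slides correspond position by position; duplicates dropped.
PAIRS = [
    ("A3DAE0788050DDE4", "A3DAE0788050DDE4"),
    ("E5F83F10C5DB080A", "E5F83F10C5DB080A"),
    ("B2FDA68AEDA06DB9", "B2FDA68AEDA06DB9"),
    ("9850FD9F9E2B4E25", "9850FD9F9E2B4E25"),
    ("E92E4BA2869F3611", "E92E4BA2869F3611"),
    ("8A7AD1EB498CC76A", "8A7AD1EB498CC76A"),
    ("6C16DE2351F9FF50", "9850FD9F9E2B4E25"),
]


def find_conflicts(pairs):
    """Return FICuS hashcodes that have more than one distinct FICTS parent."""
    parents = defaultdict(set)
    for ficts, ficus in pairs:
        parents[ficus].add(ficts)
    return {ficus: sorted(group) for ficus, group in parents.items() if len(group) > 1}


if __name__ == "__main__":
    conflicts = find_conflicts(PAIRS)
    total = len({ficus for _, ficus in PAIRS})
    print(f"{len(conflicts)} of {total} FICuS parents have a conflict")
    for ficus, group in conflicts.items():
        print(ficus, "<-", ", ".join(group))
```

In this toy set a single FICuS parent collects two FICTS parents, which is the kind of relationship the slide counts as 1.5% of all FICuS parent structures.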
  • 30. NCI/CADD Chemical Structure Database: Tautomer Analysis. [Histogram: tautomeric overlap within each individual database release (%), x axis 0.0 to 2.0; frequency (number of database releases), y axis 0 to 90] Average: ~0.3% of original structure records.
  • 31. NCI/CADD Chemical Structure Database: Tautomer Analysis. Same histogram as slide 30 (average: ~0.3% of original structure records), annotated with database names: Asinex, ChemBridge, ComGenex, ChemNavigator, Columbia University Molecular Screening Center, EPA DSSTox, Specs, Ambinter, BIND, BindingDB, ChemNavigator, KEGG, NCI Open Database, NIST WebBook, NLM ChemIDplus, NMRShiftDB, Thomson Pharma, Wombat, NCI/DTP, PASS Training Set, SGC-Ox, ChemDB, ZINC, ChEBI, ChemSpider.
  • 32. NCI/CADD Chemical Structure Database: Tautomer Analysis. [Histogram: occurrence of "tautomerism-critical" molecules within each individual database release (%), i.e. the percentage of FICuS parent structures in each database release occurring somewhere in CSDB with a conflict, x axis 0.5 to 24.5; frequency (number of database releases), y axis 0 to 30] Average: ~9.5% of FICuS parent structures.
  • 34. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). CACTVS generates 7 tautomers, one of which is the canonical tautomer by CACTVS; 5 have a potential stereo center on atoms or bonds (R/S or E/Z). [Figure: drawings of the 7 tautomers]
  • 35. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). 3 of the tautomers have CAS Registry Numbers assigned: 4551-69-1, 33064-14-1, 127117-31-1. Reference counts shown: 859 references, 49 references, 3 references. Stereo labels shown: (no stereo), (Z). [Figure: drawings of the 7 tautomers]
  • 36. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). Occurrences in databases indexed in CSDB: 6 databases; 16 databases (no stereo); 3 databases (R); 2 databases (S); 12 databases; 1 database (no stereo). [Figure: drawings of the tautomers]
  • 37. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). Occurrences in databases, by tautomer: 12 databases: ACD 3D, Ambinter, BindingDB, ChemBank, ChemDB, ChemSpider, ChemNavigator, MLSMR, NIAID, Scripps Screening Center, Thomson Pharma, ZINC. 1 database (no stereo): ChemDB. 16 databases (no stereo): ACD 3D, ACX, Ambinter, BioByte QSAR, ChemBank, ChemBridge, ChemDB, ChemSpider, DiscoveryGate, EPA GCES, MLSMR, NCI Open Database, NIST MS-Lib, NLM ChemIDplus, Sigma-Aldrich, Thomson Pharma. 6 databases: Ambinter, ChemDB, ChemSpider, DiscoveryGate, ChemNavigator, Thomson Pharma. 2 databases (S): ChemSpider, ZINC. 3 databases (R): ChemSpider, ECOTOX, ZINC. [Figure: drawings of the tautomers]
  • 45. NCI/CADD Chemical Structure Database: Scaffold Analysis. Three scaffold types, shown for one example structure: molecular scaffold tree with levels 1, 2, and so on (Schuffenhauer et al. J. Chem. Inf. Model. 2007, 47, 47-58); archetype scaffold (Bemis et al. J. Med. Chem. 1996, 39, 2887-2893); simple scaffold (Bemis et al. J. Med. Chem. 1996, 39, 2887-2893). [Figure: example structure and its scaffolds]
  • 46. NCI/CADD Chemical Structure Database: Scaffold Analysis. CSDB uuuuu compound set: 76.2 million. Scaffold types: molecular scaffold tree, archetype scaffold, simple scaffold. Scaffold counts shown: 8.1 million scaffolds, 6.8 million scaffolds, 0.8 million scaffolds. [Figure: example scaffolds, levels 1 and 2]
  • 47. NCI/CADD Chemical Structure Database: Scaffold Analysis. CSDB uuuuu compound set: 76.2 million; molecular scaffold tree: 8.1 million scaffolds. [Chart: number of unique scaffolds per hierarchy level; x axis: hierarchy level 1 to 10; y axes: number of unique scaffolds (in millions, 0 to 8.0) and number of unique structures (in millions, 0 to 80.0)]
  • 48. NCI/CADD Chemical Structure Database: Scaffold Analysis. Three example scaffolds: 2,281 uuuuu parent structures (5334 structure records in 64 databases); 2,726 uuuuu parent structures (6007 structure records in 66 databases); 744,469 uuuuu parent structures (1,069,046 structure records in 66 databases). [Figure: scaffold drawings with substituent positions R1 to R10 and per-position counts]
  • 50. NCI/CADD Chemical Structure Database: Multilevel Neighborhoods of Atoms (MNA). Filimonov D., Poroikov V., Borodina Yu., Gloriozova T. J. Chem. Inf. Comput. Sci., 1999, 39 (4), 666-670. MNA level 1 descriptors of the example structure: HC, HO, CHCC, CHCN, CCCC, CCOO, NCC, OHC, OC. MNA level 2 descriptors: C(C(CC-H)C(CC-C)-H(C)), C(C(CC-H)C(CN-H)-H(C)), C(C(CC-H)C(CN-H)-C(C-O-O)), C(C(CC-H)N(CC)-H(C)), C(C(CC-C)N(CC)-H(C)), N(C(CN-H)C(CN-H)), -H(C(CC-H)), -H(C(CN-H)), -H(-O(-H-C)), -C(C(CC-C)-O(-H-C)-O(-C)), -O(-H(-O)-C(C-O-O)), -O(-C(C-O-O)). [Figure: example structure drawing]
  • 51. NCI/CADD Chemical Structure Database: Multilevel Neighborhoods of Atoms (MNA). CSDB uuuuu compound set: 76.2 million. Unique MNAs: 13,426 (level 1) and 918,516 (level 2). Relationships shown: 2.3 billion and 1.3 billion; per uuuuu parent structure: ~17 and ~30.
  • 52. NCI/CADD Chemical Structure Database: Multilevel Neighborhoods of Atoms (MNA). Same counts as slide 51. In addition, 424,784 MNAs (level 2) are exclusive to a set of 1.3 million structures in ChemSpider.
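The transcript lists the relationship totals and the per-structure averages without saying which belongs to which. Dividing each total by the 76.2 million uuuuu parent structures suggests the pairing; the sketch below is only that arithmetic, with the hypothetical helper `per_structure`, and it does not settle which MNA level each pair refers to.

```python
# Figures quoted on the MNA slides.
UUUUU_PARENTS = 76.2e6
RELATIONSHIP_TOTALS = (1.3e9, 2.3e9)


def per_structure(relationships: float, parents: float = UUUUU_PARENTS) -> int:
    """Average number of MNA relationships per uuuuu parent structure."""
    return round(relationships / parents)


if __name__ == "__main__":
    for total in RELATIONSHIP_TOTALS:
        print(f"{total / 1e9:.1f} billion -> ~{per_structure(total)} per structure")
```

So 1.3 billion relationships correspond to the ~17 figure and 2.3 billion to the ~30 figure.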
  • 53. Chemical Structure Web Services: Indexing Chemical Space. [Diagram: the Chemical Identifier Resolver, accessed over http, together with CACTVS, the NCI/CADD Chemical Structure Database (CSDB), NCI/CADD web services, external web services, and other software packages (e.g. OPSIN)]
  • 54. Chemical Identifier Resolver (NCI/CADD Web Resources): http://cactus.nci.nih.gov/chemical/structure. Blog: http://cactus.nci.nih.gov/blog
  • 55. Acknowledgments. ChemNavigator: Scott Hutton, Tad Hurst. CADD Group, CBL, NCI: Igor Filippov. University of Cambridge: Daniel Lowe, Peter Murray-Rust. Noel O'Boyle (University College Cork, Ireland). Richard Apodaca (Metamolecular). Hans-Juergen Himmler. Thanks to all database providers! Our web site: http://cactus.nci.nih.gov
  • 56. Acknowledgments - Software: CACTVS; Python Web Framework; Python SQL Library; Peter Ertl (Novartis); ChemWriter Javascript library.