www.guidetopharmacology.org
The transformative utility of InChIKey searching in
the Mother of all Databases (a.k.a. Google)
Chris Southan
IUPHAR/BPS Guide to PHARMACOLOGY Web portal Group, Centre for Integrative Physiology,
School of Biomedical Sciences, University of Edinburgh,
Hugh Robson Building, Edinburgh, EH8 9XD, UK.
cdsouthan@hotmail.com
1
Outline
• Introduction: the atorvastatin example
• Chem-to-bio context
• IK stats and estimates
• Extracting IKs from documents
• IK database-to-database
• Open Source malaria drug discovery as a testbed
• Caveats and future prospects
2
The precedent
InChI as a web index for molecules
“We have now discovered, serendipitously, that these InChIs have been
comprehensively and accurately indexed by the Google search engine. From
preliminary exploration it appears that every known document in which an InChI
appears has been indexed and that all are retrievable by standard queries with
virtually 100% precision. This means that standard Web-based indexers, without
any alteration, are capable of acting as completely precise chemical search
engines. Although we have many years of developing chemistry on the web, this
was an unexpected and very welcome finding”
Murray-Rust et al. 2004 http://lists.w3.org/Archives/Public/public-swls-ws/2004Oct/att-0019/
3
IK example: atorvastatin and metabolites
4
Fast and clean results
5
parent
para-hydroxy
ortho-hydroxy
Inner layer XUKUURHRXDUEBC image search
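The "inner layer" search above uses only the first block of the IK. A minimal sketch of how a key could be split into its blocks and turned into exact-string Google queries follows. The function and class names are illustrative, and the full atorvastatin key is quoted from public database records, so it should be checked against PubChem before reuse.

```python
import re
from typing import NamedTuple
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
IK_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton: str      # first block: the molecular skeleton (the "inner layer")
    detail: str        # second block: remaining layers plus standard/version flags
    protonation: str   # final character: protonation indicator


def split_inchikey(ik: str) -> InChIKeyParts:
    """Split a well-formed InChIKey into its three blocks (hypothetical helper)."""
    match = IK_PATTERN.match(ik.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {ik!r}")
    return InChIKeyParts(*match.groups())


def google_search_url(term: str) -> str:
    """Build a search URL; the term is double-quoted so it is matched as an exact string."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{term}"')


# Assumed full key for atorvastatin; only the first block appears in the slides.
atorvastatin = "XUKUURHRXDUEBC-KAYWLYCHSA-N"
parts = split_inchikey(atorvastatin)
print(google_search_url(atorvastatin))    # exact-match search on the full key
print(google_search_url(parts.skeleton))  # inner-layer search for related structures
```

Searching the full key targets one exact structure, while the first block alone widens recall to records that share the same skeleton.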
6
Making the chem < > bio join
Biochemistry
Medicinal chemistry
Toxicology
Chemical biology
Systems pharmacology
Metabolomics
Drug discovery
Pharmacology
Chemogenomics
InChIKey
7
Getting biology out of text-tombs is not easy;
Getting chemistry out is even more difficult
8
Why chem < > bio joining is difficult
• The majority of chemistry embedded in biological reports is specified as
semantic names or images
• The MeSH to PubChem connectivity is patchy
• Biologists use sequence database accession numbers, ontologies and
gene names widely but chemists rarely use open chemical database IDs
• Most bioactive chemistry in text does not have direct connectivity to
databases (unlike GenBank/RefSeq/UniProt < > PubMed)
• Nat.Chem.Biol. is the only bio-journal that mandates PubChem
reciprocal linking
• Most authors don’t engage with surfacing and connectivity (e.g.
becoming PubChem submitters and/or figshare data depositors)
• Chemists and biologists tend not to communicate easily
• GenBank started in 1982, PubChem in 2004
• Inventors/authors under-cite their own medicinal chemistry patents
9
So how many IKs has Google indexed?
• PubChem ~ 50 million
• ChemSpider ~ 30 million
• PubChem from patents (all sources) ~ 15 million
• PubChem journal sources (PubMed + ChEMBL) ~ 1 million
• Web sources outside the above (no idea)
• Open ELNs (no idea)
Guesstimate: 60 million-ish
10
Databases < > documents:
IK Googling facilitates reciprocal linking
Abstracts
Patents
Papers
15 mill
0.2 mill (mainly MeSH)
0.9 mill (ChEMBL)
12K
11
IKs with data-supported bioactivity (>biology)
• GVKBIO Online Structure Activity Relationship Database (GOSTAR) = 6.3
million with SAR data from patents and literature (not tagged in PubChem)
• Thomson Pharma = 4.2 million selected examples from patents and literature
• PubChem BioAssay “active” = 0.93 million
• ChEMBL (in PubChem) = 0.88 million
• Thomson Pharma (2013 only) = 0.27 million
• PubMed = 0.23 million
• MeSH “pharmacology” = 12,719
• INN or USAN = 10,707
• Union of the last two above = 19,334; intersect = 4,092
• Prous (Thomson) Drugs of the Future = 7,218
• DrugBank approved (via SIDs) = 1,504
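The union and intersect figures quoted for the MeSH "pharmacology" and INN/USAN sets can be cross-checked by inclusion-exclusion. A quick sketch, using only the counts from the list above (the variable and function names are illustrative):

```python
def union_size(size_a: int, size_b: int, size_both: int) -> int:
    """Inclusion-exclusion for two sets: |A or B| = |A| + |B| - |A and B|."""
    return size_a + size_b - size_both


mesh_pharmacology = 12_719   # MeSH "pharmacology" IKs, from the slide
inn_or_usan = 10_707         # INN or USAN IKs, from the slide
intersect = 4_092            # IKs present in both sets, from the slide

union = union_size(mesh_pharmacology, inn_or_usan, intersect)
print(union)  # 19334, matching the union quoted on the slide
```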
Guesstimate for chemistry with a useful level of solubility, stability, specificity and
potency (e.g. < 250 nM) in biological systems ~ 0.5 million IKs (but of course we
also need low potency and inactives for controls and SAR)
12
IKs and the representational hextet used in
documents and databases
13
Extracting IKs from documents: OPSIN
14
Extracting IKs from documents: chemicalize.org
15
Extracting IKs from documents: OSRA
16
Extracting IKs from documents: sketchers
17
IK call-outs in dbs: extending the link reach
18
Modified peptides/big stuff:
connection where similarity struggles
http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=2532
19
OSM drug discovery: test bed for open data
surfacing and connecting chem > bio
• Team are exploring chemistry surfacing/sharing in real time (e.g. ELNs,
Wiki, GitHub, ChEMBLMalaria for project updates)
• Converted to IK utility (after the necessary evangelizing)
• Global antimalarial drug R&D (open and closed) exemplifies full range of
connectivity issues that IK surfacing can potentially ameliorate
20
Actively unlocking IK connections
21
Name > structure > biology: missing links
22
Where the IK connects……
23
Chemicalize.org: 413 structures/IKs from WO2011086531
CID 53311393 ->
24
WO2011086531 > chemicalize.org > SAR IC50s
> figshare surfaces and connects (e.g. PubChem)
25
Share structures via open MyNCBI
http://www.ncbi.nlm.nih.gov/sites/myncbi/collections/public/1zWhcobieZbIouGfUdsdbHek5/
26
DIY surfacing of name < > IK connections
27
Caveats and risks for IK Googling
• Ranking heuristics are opaque and change
• Results shift on short time scales (i.e. irreproducible)
• No API (or good search result set parsers)
• Don’t ignore corroborative searches in well-structured databases
• Searching common IKs is not generally useful (but can filter)
• No good for similarity searching on their own (but you can intersperse with
similarity approaches)
• In the relentless war between good and evil (Google versus the SEO Dark
Side) dodgy chemical suppliers are always pushing
• There may be future risks of common chemistry swamping
• Names, SMILES or even IUPAC strings may sometimes give Google hits
where the IK misses (because it's not there)
28
What does the future hold/need?
• For manual searching Googling the IK is the “first stop shop”
• InChI world-domination is proceeding
• Inexorable increase in full-text, open access journals and crawled open
repositories (e.g. figshare)
• Journals must encourage author chemistry mark-up to include the IK
• More biologists getting into chemistry connections and databases
• Boutique bioactive chemistry databases becoming more discoverable
• SureChEMBL will improve image handling and get crawled
• RSC Journal Archive > ChemSpider
• ContentMine (Murray-Rust et al.) 100 million facts, including journal-extracted
chemical structure streaming
• More Open (Source) Drug Discovery > Google crawled ELNs with IKs
• Wider community use of Chemicalize.org for targeted extractions
• New IK via source expansion in ChemSpider and PubChem
29
Thanks and Questions
30
Extras
31
Abstract
Google indexing of the InChIKey (IK) has turned the web into a de facto
chemical database with well over 50 million unique entries (PMID:23399051). The
first block of the IK encodes the molecular skeleton that can be used to give maximum
recall of related structures. For example, Google searching XUKUURHRXDUEBC
from atorvastatin displays ~200 low-redundancy links in ~0.3 sec with a low
false-positive rate. These include most major databases and less familiar but valuable
sources. The simplicity of the IK makes it useful for those less familiar with chemical
searching. Advanced Google Search can be used to filter results, image searching
gives complementary coverage and there are also hits in Google Scholar. IK
searching thus becomes powerfully enabling for reciprocal document-to-database
joins from legacy text tombs including over 50 years of biology < > chemistry. Open
tools such as chemicalize.org can generate IKs from patents, papers, abstracts or
web pages. Open Drug Discovery data on tested, synthesized or even proposed
compounds, can be globally connected in real-time by surfacing IKs in open
laboratory notebooks, Wikis, blogs, Twitter, figshare etc. Following the ChemSpider
precedent the IUPHAR/GTP database offers users IK Google searches from all ligand
entries including peptides.
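
The result filtering mentioned in the abstract can also be expressed directly in the query string. The sketch below assumes Google's `site:` and `-site:` operators and uses illustrative domain names; the function name is hypothetical and nothing here is taken from the original slides. Excluding the largest databases is one way to bring the "less familiar but valuable sources" to the top.

```python
from typing import Iterable, Optional


def filtered_query(term: str,
                   include_site: Optional[str] = None,
                   exclude_sites: Iterable[str] = ()) -> str:
    """Compose an exact-string query with optional site restriction and exclusions."""
    pieces = [f'"{term}"']                             # quotes force an exact match
    if include_site:
        pieces.append(f"site:{include_site}")          # restrict hits to one domain
    pieces.extend(f"-site:{s}" for s in exclude_sites)  # drop hits from these domains
    return " ".join(pieces)


# Inner-layer search with the big databases filtered out (domains are examples only).
print(filtered_query("XUKUURHRXDUEBC",
                     exclude_sites=["pubchem.ncbi.nlm.nih.gov", "chemspider.com"]))

# The same first block, restricted to a single open repository.
print(filtered_query("XUKUURHRXDUEBC", include_site="figshare.com"))
```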
32
Patent SAR from WO2011086531:
Collating activities via SureChemOpen
CID 53311393 >
33
Triaging document or webpage chemistry
• Identify the structure specification types
– Semantic names (all sources)
– Code names (press releases, papers and abstracts)
– IUPAC names (papers, patents and abstracts)
– Images (papers, patents, & Google images)
– SMILES (open lab books)
– InChI strings (open lab books)
– SDF files (open lab books, & GitHub)
Convert these to a structure (e.g. SDF, SMILES, InChI) then:
– Search InChIKey in Google
– Search major databases
– Compare extracted sets for intersects and diffs
– Extend exact match connectivity with similarity searching
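
The "compare extracted sets for intersects and diffs" step can be sketched with a regular expression and set operations. This is an illustration only: it assumes the IKs are already present as text (for example after a chemicalize.org or OPSIN conversion), the helper names are hypothetical, and the second key in the example is a made-up placeholder, not a real compound.

```python
import re
from typing import Dict, Set

# Matches the standard InChIKey shape: 14 letters, 10 letters, 1 letter.
IK_IN_TEXT = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def extract_inchikeys(text: str) -> Set[str]:
    """Return the unique InChIKey-shaped strings found in a block of text."""
    return set(IK_IN_TEXT.findall(text))


def compare_sets(first: Set[str], second: Set[str]) -> Dict[str, Set[str]]:
    """Report the intersect and the two differences between two IK sets."""
    return {
        "intersect": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


placeholder = "A" * 14 + "-" + "B" * 8 + "SA-N"   # hypothetical key, for illustration
patent_text = f"Example 1: XUKUURHRXDUEBC-KAYWLYCHSA-N. Example 2: {placeholder}."
notebook_text = "Tested compound XUKUURHRXDUEBC-KAYWLYCHSA-N in assay."

result = compare_sets(extract_inchikeys(patent_text),
                      extract_inchikeys(notebook_text))
for label, keys in result.items():
    print(label, sorted(keys))
```

Keys in the intersect are exact-match links between the two sources; keys found in only one source are the candidates for follow-up database and similarity searches.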
34
Orthogonal joining
35
Triage example: a new
antimalarial
The MMV390048 code
name is linked to an
image in press reports
but is PubChem and
PubMed -ve
36

Transformative Utility of InChIKey Searching in the Mother of all Databases

  • 1. www.guidetopharmacology.org The transformative utility of InChIKey searching in the Mother of all Databases (a.k.a. Google) Chris Southan IUPHAR/BPS Guide to PHARMACOLOGY Web portal Group, Centre for Integrative Physiology, School of Biomedical Sciences, University of Edinburgh, Hugh Robson Building, Edinburgh, EH8 9XD, UK. cdsouthan@hotmail.com 1
  • 2. Outline • Introduction: the atorvastatin example • Chem-to-bio context • IK stats and estimates • Extracting IKs from documents • IK database-to-database • Open Source malaria drug discovery as a testbed • Caveats and future prospects 2
  • 3. The precedent InChI as a web index for molecules “We have now discovered, serendipitously, that these InChIs have been comprehensively and accurately indexed by the Google search engine. From preliminary exploration it appears that every known document in which an InChI appears has been indexed and that all are retrievable by standard queries with virtually 100% precision. This means that standard Web-based indexers, without any alteration, are capable of acting as completely precise chemical search engines. Although we have many years of developing chemistry on the web, this was an unexpected and very welcome finding” Murray-Rust et al. 2004 http://lists.w3.org/Archives/Public/public-swls- ws/2004Oct/att-0019/ 3
  • 4. IK example: atorvastatin and metabolites 4
  • 5. Fast and clean results 5 parent para-hydroxy ortho-hydroxy
  • 6. Inner layer XUKUURHRXDUEBC image search 6
  • 7. Making the chem < > bio join Biochemistry Medicinal chemistry Toxicology Chemical biology Systems pharmacology Metabolomics Drug discovery Pharmacology Chemogenomics InChIKey 7
  • 8. Getting biology out of text-tombs is not easy; Getting chemistry out is even more difficult 8
  • 9. Why chem < > bio joining is difficult • The majority of chemistry embedded in biological reports is specified as semantic names or images • The MeSH to PubChem connectivity is patchy • Biologists use sequence database accession numbers, ontologies and gene names widely but chemists rarely use open chemical database IDs • Most bioactive chemistry in text does not have direct connectivity to databases (unlike GenBank/RefSeq/UniProt < > PubMed) • Nat.Chem.Biol. is the only bio-journal that mandates PubChem reciprocal linking • Most authors don’t engage with surfacing and connectivity (e.g. becoming PubChem submitters and/or figshare data depositors) • Chemists and biologists tend not to communicate easily • GenBank started in 1982, PubChem in 2004 • Inventors/authors under-cite their own medicinal chemistry patents 9
  • 10. So how many IKs has Google indexed ? • PubChem ~ 50 million • ChemSpider ~ 30 million • PubChem from patents (all sources) ~ 15 million • PubChem journal sources (PubMed + ChEMBL) ~ 1 million • Web sources outside the above (no idea) • Open ELNs (no idea) Guestimate 60 million-ish 10
  • 11. Databases < > documents: IK Googling facilitates reciprocal linking Abstracts Patents Papers 15 mill 0.2 mill (mainly MeSH) 0.9 mill (ChEMBL) 12K 11
  • 12. IKs with data-supported bioactivity (>biology)
      • GVKBIO Online Structure Activity Relationship Database (GOSTAR) = 6.3 million with SAR data from patents and literature (not tagged in PubChem)
      • Thomson Pharma = 4.2 million selected examples from patents and literature
      • PubChem BioAssay “active” = 0.93 million
      • ChEMBL (in PubChem) = 0.88 million
      • Thomson Pharma (2013 only) = 0.27 million
      • PubMed = 0.23 million
      • MeSH “pharmacology” = 12,719
      • INN or USAN = 10,707
      • Union of the last two above = 19,334; intersect = 4,092
      • Prous (Thomson) Drugs of the Future = 7,218
      • DrugBank approved (via SIDs) = 1,504
      Guesstimate for chemistry with a useful level of solubility, stability, specificity and potency (e.g. < 250 nM) in biological systems: ~ 0.5 million IKs (but of course we also need low-potency compounds and inactives for controls and SAR)
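The union and intersect figures quoted on slide 12 for MeSH “pharmacology” and INN/USAN are consistent with simple inclusion-exclusion. The sketch below only re-checks that arithmetic from the numbers on the slide; the function name `union_size` is an illustrative assumption, and nothing is recomputed from the databases themselves.

```python
def union_size(size_a: int, size_b: int, size_intersect: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return size_a + size_b - size_intersect


# Counts as quoted on the slide (not re-queried from any database).
mesh_pharmacology = 12_719
inn_or_usan = 10_707
intersect = 4_092

print(union_size(mesh_pharmacology, inn_or_usan, intersect))  # 19334
```

The result matches the 19,334 union stated on the slide, so the three quoted counts agree with one another.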
  • 13. IKs and the representational hextet used in documents and databases
  • 14. Extracting IKs from documents: OPSIN
  • 15. Extracting IKs from documents: chemicalize.org
  • 16. Extracting IKs from documents: OSRA
  • 17. Extracting IKs from documents: sketchers
  • 18. IK call-outs in dbs: extending the link reach
  • 19. Modified peptides/big stuff: connection where similarity struggles http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=2532
  • 20. OSM drug discovery: test bed for open data surfacing and connecting chem > bio
      • Team are exploring chemistry surfacing/sharing in real time (e.g. ELNs, Wiki, GitHub, ChEMBLMalaria for project updates)
      • Converted to IK utility (after the necessary evangelizing)
      • Global antimalarial drug R&D (open and closed) exemplifies the full range of connectivity issues that IK surfacing can potentially ameliorate
  • 21. Actively unlocking IK connections
  • 22. Name > structure > biology: missing links
  • 23. Where the IK connects…
  • 24. Chemicalize.org: 413 structures/IKs from WO2011086531 (CID 53311393 ->)
  • 25. WO2011086531 > chemicalize.org > SAR IC50s > figshare surfaces and connects (e.g. PubChem)
  • 26. Share structures via open MyNCBI http://www.ncbi.nlm.nih.gov/sites/myncbi/collections/public/1zWhcobieZbIouGfUdsdbHek5/
  • 27. DIY surfacing of name < > IK connections
  • 28. Caveats and risks for IK Googling
      • Ranking heuristics are opaque and change
      • Results shift on short time scales (i.e. irreproducible)
      • No API (or good search result set parsers)
      • Don’t ignore corroborative searches in well-structured databases
      • Searching common IKs is not generally useful (but can filter)
      • IKs are no good for similarity searching on their own (but you can intersperse with similarity approaches)
      • In the relentless war between good and evil (Google versus the SEO Dark Side) dodgy chemical suppliers are always pushing
      • There may be future risks of common chemistry swamping
      • Names, SMILES or even IUPAC strings may sometimes give Google hits where the IK misses (because it’s not there)
  • 29. What does the future hold/need?
      • For manual searching, Googling the IK is the “first stop shop”
      • InChI world-domination is proceeding
      • Inexorable increase in full-text, open access journals and crawled open repositories (e.g. figshare)
      • Journals must encourage author chemistry mark-up to include the IK
      • More biologists getting into chemistry connections and databases
      • Boutique bioactive chemistry databases becoming more discoverable
      • SureChEMBL will improve image handling and get crawled
      • RSC Journal Archive > ChemSpider
      • ContentMine (Murray-Rust et al.): 100 million facts, including journal-extracted chemical structure streaming
      • More Open (Source) Drug Discovery > Google-crawled ELNs with IKs
      • Wider community use of Chemicalize.org for targeted extractions
      • New IKs via source expansion in ChemSpider and PubChem
  • 32. Abstract: Google indexing of the InChIKey (IK) has turned the web into a de facto chemical database with well over 50 million unique entries (PMID:23399051). The first block of the IK encodes the molecular skeleton, which can be used to give maximum recall of related structures. For example, Google searching XUKUURHRXDUEBC from atorvastatin displays ~200 low-redundancy links in ~0.3 sec with a low false-positive rate. These include most major databases and less familiar but valuable sources. The simplicity of the IK makes it useful for those less familiar with chemical searching. Advanced Google Search can be used to filter results, image searching gives complementary coverage and there are also hits in Google Scholar. IK searching thus becomes powerfully enabling for reciprocal document-to-database joins from legacy text tombs, including over 50 years of biology < > chemistry. Open tools such as chemicalize.org can generate IKs from patents, papers, abstracts or web pages. Open Drug Discovery data on tested, synthesized or even proposed compounds can be globally connected in real time by surfacing IKs in open laboratory notebooks, Wikis, blogs, Twitter, figshare etc. Following the ChemSpider precedent, the IUPHAR/GTP database offers users IK Google searches from all ligand entries, including peptides.
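As a rough illustration of the first-block search described in the abstract, the sketch below splits a standard InChIKey into its blocks and builds Google query URLs for the skeleton block and for the full key. The helper names are illustrative assumptions, the second block of the example key is a placeholder (only the first block, XUKUURHRXDUEBC, comes from the slides), and the plain `https://www.google.com/search?q=` form is assumed to keep working as a query URL.

```python
import re
from urllib.parse import urlencode

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def first_block(inchikey: str) -> str:
    """Return the 14-character first block (the molecular skeleton layer)."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return inchikey.split("-")[0]


def google_url(term: str) -> str:
    """Build a plain Google search URL for one search term."""
    return "https://www.google.com/search?" + urlencode({"q": term})


# Placeholder key: the first block is from the slides, the rest is made up.
example_key = "XUKUURHRXDUEBC-AAAAAAAASA-N"

print(google_url(first_block(example_key)))  # skeleton search, maximum recall
print(google_url(example_key))               # exact-key search
```

Searching the first block alone trades precision for recall of related structures, whereas the full key should only hit pages that carry exactly that key.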
  • 33. Patent SAR from WO2011086531: collating activities via SureChemOpen (CID 53311393 >)
  • 34. Triaging document or webpage chemistry
      • Identify the structure specification types
          – Semantic names (all sources)
          – Code names (press releases, papers and abstracts)
          – IUPAC names (papers, patents and abstracts)
          – Images (papers, patents and Google images)
          – SMILES (open lab books)
          – InChI strings (open lab books)
          – SDF files (open lab books and GitHub)
      • Convert these to a structure (e.g. SDF, SMILES, InChI), then:
          – Search InChIKey in Google
          – Search major databases
          – Compare extracted sets for intersects and diffs
          – Extend exact match connectivity with similarity searching
  • 36. Triage example: a new antimalarial. The MMV390048 code name is linked to an image in press reports but is PubChem- and PubMed-negative
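The “compare extracted sets for intersects and diffs” step of the triage on slide 34 can be sketched as follows, assuming the InChIKeys are already present as text in each source (for example an open lab notebook page and a database export). The function names and the two sample texts are illustrative assumptions; the keys in them are used purely as well-formed sample strings.

```python
import re

# Matches a standard InChIKey embedded in free text.
INCHIKEY_IN_TEXT = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def extract_inchikeys(text: str) -> set[str]:
    """Collect the distinct InChIKeys that appear in a block of text."""
    return set(INCHIKEY_IN_TEXT.findall(text))


def compare_sets(first: set[str], second: set[str]) -> dict[str, set[str]]:
    """Report the keys shared by both sources and those unique to each."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


# Hypothetical sample inputs standing in for two extracted sources.
notebook_text = (
    "Tested BSYNRYMUTXBXSQ-UHFFFAOYSA-N and "
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N today."
)
database_text = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N\nXLYOFNOQVPJJNP-UHFFFAOYSA-N"

result = compare_sets(
    extract_inchikeys(notebook_text), extract_inchikeys(database_text)
)
for label, keys in result.items():
    print(label, sorted(keys))
```

Keys that land in only one set are the candidates for follow-up, either by the individual Google search or by a batch lookup in a major database.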

Editor's Notes

  1. InChIKeys: estimate of PubChem + ChemSpider in Google, but PubChem currently has a backlog for Key scraping. The ROF + 250-800 is a very approximate circumscription of the property space that has some possibility of bioactivity. Probably a proportion of vendor structures have never been committed to text. There are some virtuals “out there”, including some patent extractions, but these are difficult to estimate.
  2. Only Nature Chemical Biology and Nature Chemistry have direct links from the journal document to PubChem. Given today's technology, the major patent offices could put links in the PDFs but are unlikely to do so.
  4. The CID links straight through to chemicalize and will just re-extract the whole patent in a few seconds. The 413 gave 358 hits in PubChem.
  5. From manual cross-checking between the individual example structures and the IC50 table the Excel sheet can be populated
  6. Can upload CID lists and download as a saved and public collection
  7. We can start off with patent links. Note that in this case numbered image capture, as opposed to the IUPAC listing, was important to manually collate the structure against the correct IC50.
  8. Need to assess what representational types are being used in the document, e.g. some patents are image-only (but SureChem is pulling most of these out). Then select tools and sources for the job. Decide how to store your structures locally. The default batch search is an upload to PubChem. The default individual search is the InChIKey against Google.
  9. InChIKey search picks up instantly. This was just a choice of one of the actives. So this connects PubChem and figshare.
  10. Self-explanatory. Note my blog post was indexed.