Digging out Structures for Repurposing:
     Non-competitive Intelligence


             PubChem Seminar April 2013

  Christopher Southan, TW2Informatics, Göteborg, Sweden




                                                          [1]
Dr Christopher Southan, Ph.D., M.Sc., B.Sc.
TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan@hotmail.com
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIn: http://www.linkedin.com/in/cdsouthan
Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
Presentations: http://www.slideshare.net/cdsouthan




                                                                                  [2]
Outline


•   Trawling for repurposing-relevant data
•   Code names statistics and name > structure triage
•   The NCATS/MRC challenge
•   Story of JNJ-39393406
•   Scaling-up Code name hunting and x-mapping
•   Code name in clinical trials, MeSH, PubChem
•   Story of PF-04457845
•   Trials, MeSH and PubChem code name intersects
•   Conclusions




                                                        [3]
Intelligence: trawling compound information


Competitive

• Directed towards commercially positioning and/or repurposing own portfolio
• Major big pharma activity
• Mixed commercial/public sources
• Internal specialists
• Typically a closed activity (i.e. little open “best practice”)
• Typically therapeutic area aligned

Non-competitive

• Directed towards repositioning any compound
• Collaborative approaches to IP holders (but new IP possible)
• Can utilise public resources alone
• Different domain expert entry points
• Predominantly an open activity (e.g. OSDD)
• Can be hypothesis-neutral


                                                                               [4]
Structures:
connecting to repurposing-relevant data

•   Code names and synonyms
•   Resolving these to structures
•   Database entries
•   BioAssay results
•   Target/pathway links
•   In vitro & in vivo research papers
•   Clinical trial results and papers
•   Patents for analogues and SAR
•   Comparative in vivo data
•   Mendelian and GWAS disease links
•   Expression data for cpds
•   In silico modeling (including rare or NTDs)
•   Vendor similarity matches

                                                  [5]
Code names: 2-15 year information hole




                       Pharmaprojects
                       2009-10 figures




                                         [6]
Drugs, code names, INN/USANs and structures:
              few congruent hard numbers

•   Pharmaprojects (2013) drug profiles ~ 50,000
•   Thomson Reuters Cortellis (2012) drug monographs = 41,889
•   Pharmaprojects (via ProQuest, 2012) records ~ 35,000
•   Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901
•   Pharmaprojects (2003 structures) = 14,000
•   ChEMBL USANs (2013) = 10,568
•   PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890
•   Pharmaprojects (2010 in development, no structure count) = 9,737
•   GVKBIO Clinical Candidate structures (2008, PMID:20298516) = 8,864
•   Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828




                                                                       [7]
Code names: major repurposing potential – but...
• ~ 95% of the 30K are/will become “parked” or “abandoned”
• Can be repurposed in silico at least
• Obvious hierarchy: leads > development > clinical trials > INN > approved

• Problems
   – New code names < 50% - 70% blinded (i.e. no structures)
   – Some older code names never un-blinded
   – Code naming practices independent and completely ad hoc
   – Publications, conference reports, clinical trials entries, press releases
     and portfolio listings linked to “blinded” code names (no structures)
   – Even for public declarations (e.g. papers) data linked into “the system”
     (e.g. synonym mapping) is patchy
   – Code originators do not provenance public database entries
   – Data supporting non-progression decisions rarely disclosed
   – Hundreds of code stems are listed at http://chembl.blogspot.se/p/research-code-stems.html
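
Because code naming is ad hoc, the first step in digging codes out of free text is usually pattern matching on company stems. A minimal sketch, assuming only four illustrative stems (JNJ, GSK, AZD, PF) and a simple "stem, optional hyphen or space, digits" shape; the stem list, the digit range and the function name `find_codes` are assumptions for illustration, not a curated standard:

```python
import re

# Illustrative stems only; a real list would come from a curated stem table.
STEMS = ("JNJ", "GSK", "AZD", "PF")

# Stem, optional hyphen or space, 3-9 digits, optional single trailing letter.
CODE_RE = re.compile(r"\b(" + "|".join(STEMS) + r")[- ]?(\d{3,9}[A-Z]?)\b")


def find_codes(text):
    """Return code names found in text, normalised to STEM-digits,
    in order of first appearance and without duplicates."""
    found = []
    for stem, digits in CODE_RE.findall(text):
        code = f"{stem}-{digits}"
        if code not in found:
            found.append(code)
    return found


if __name__ == "__main__":
    sample = "JNJ-39393406 and JNJ 40418677 were compared with PF04457845."
    print(find_codes(sample))
```

Normalising to one spelling (here STEM-digits) matters because the same code appears with and without hyphens or spaces in different sources.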

                                                                            [8]
Code name-to-structure mapping triage

Dig out the code names

•   PubChem Substance
•   PubChem Compound
•   PubMed/MeSH
•   Google Scholar
•   Google Images
•   Google open (filtered)

Name/image > structure

•   chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA
•   Cross-checks:
     –   SMILES/SDF/InChI strings in PubChem and ChemSpider
     –   InChIKey in Google
     –   SureChemOpen patent search
     –   Clinicaltrials.gov
     –   Synonym trawling
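
The PubChem step of this triage can also be scripted. A rough sketch, assuming the PubChem PUG REST service and its name-to-CID and InChIKey-to-CID lookups; the helper names (`name_to_cids_url`, `inchikey_to_cids_url`, `parse_cids`, `lookup_cids`) are hypothetical, and an empty result only means the synonym is not indexed (the code may still be "PubChem -ve" while the structure is present under another name):

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name):
    """URL asking PubChem for the CIDs that carry this name or synonym."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def inchikey_to_cids_url(inchikey):
    """URL asking PubChem for the CIDs that match this InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey, safe='')}/cids/JSON"


def parse_cids(payload):
    """Extract the CID list from a PUG REST JSON reply; [] if none present."""
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


def lookup_cids(name, timeout=10):
    """Network call: CIDs for a code name, or [] when the request fails
    with an HTTP error status (treated here as 'no match')."""
    try:
        with urlopen(name_to_cids_url(name), timeout=timeout) as resp:
            return parse_cids(resp.read().decode("utf-8"))
    except HTTPError:
        return []


if __name__ == "__main__":
    print(name_to_cids_url("PF-04457845"))
```

Only the URL builders and the parser run offline; `lookup_cids` needs network access and should be rate-limited when looping over a code list.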

                                                              [9]
The NCATS/MRC industry sponsored
repurposing exercise: the joy of code lists




                                              [10]
NCATS/MRC repurposing candidates




http://cdsouthan.blogspot.se/2012/09/mrc-22-vs-ncats-58-repurposing-lists.html
                                                                                 [11]
NCATS/MRC: summary statistics



                                PMID 23159359




              •   70 code names – no structures
              •   18 INNs & 4 codes-only in PubChem
              •   24 structures “dug out” but PubChem -ve
              •   24 codes remain blinded
                                                   [12]
Sleuthing down a JNJ-39393406 structure:
        from darkness to twilight




                                           [13]
JNJ-39393406: NCATS documentation PubChem -ve




                                               [14]
JNJ-39393406: ClinicalTrials.gov




                                   [15]
JNJ-39393406 in PubMed




                         [16]
JNJ-39393406: open Google




                            [17]
JNJ-39393406: Google Scholar (was) structure -ve




                                                   [18]
JNJ-39393406 in Google images: finally a mapping




But where did these two vendors get their mapping from?
                                                           [19]
(Probable) JNJ-39393406 in PubChem:
CID 1675566 patent-only sources and near-neighbours




                                                  [20]
(Probable) JNJ-39393406:
SureChemOpen patent match
   with corroborative data

   PubChem SID 152835708




                             Cf NCATS data




                                             [21]
More JNJ-39393406 mystery:
InChIKey in Google > ChemSpider > 3rd vendor
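
An InChIKey works as a web search term because it is a fixed-format, 27-character string. A small sketch of a format check and a quoted search string, assuming the standard 14-10-1 uppercase block layout; the helper names are hypothetical and the key in the demo is aspirin's, used only as a well-known example (it is not from these slides):

```python
import re

# Three hyphen-separated blocks of 14, 10 and 1 uppercase letters.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """True if the string has the InChIKey shape (format check only,
    it does not prove the key corresponds to any real structure)."""
    return INCHIKEY_RE.fullmatch(text.strip()) is not None


def connectivity_block(inchikey):
    """First 14 letters: the skeleton hash, shared by stereo/isotope variants."""
    return inchikey.strip().split("-")[0]


def exact_search_term(inchikey):
    """Quote the key so a search engine treats it as one exact token."""
    return f'"{inchikey.strip()}"'


if __name__ == "__main__":
    key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # aspirin, example only
    print(looks_like_inchikey(key), connectivity_block(key), exact_search_term(key))
```

Searching on the first block alone is a looser option when vendors or databases differ in stereochemistry or salt form.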




                                               [22]
Not all JNJ-s are blinded: JNJ-40418677
   IUPAC in abstract but code still PubChem -ve




IUPAC name converted at chemicalize.org for PubChem mapping
                                                              [23]
Scaling-up code name retrieval:
       wild card searches




                                  [24]
Phases & codes in Clinicaltrials.gov:
                           thin on results

• Interventional studies = 115,356; 7,895 with results (7%)

• Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477

• Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004
• Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%)

• Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 |
  Industry = 1640

• Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase
  1, 2, 3 | Industry = 185 (11%)
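
The percentages quoted above are simply results-posted counts divided by study counts. A short check using only the figures on this slide; the dictionary labels and the function name are assumptions for illustration:

```python
# (total studies, studies with posted results), as quoted on the slide
COUNTS = {
    "All interventional studies": (115356, 7895),
    "GSK* | Phase 1, 2, 3 | Industry": (1004, 122),
    "GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry": (1640, 185),
}


def pct_with_results(total, with_results):
    """Share of studies with posted results, rounded to a whole percent."""
    return round(100 * with_results / total)


if __name__ == "__main__":
    for label, (total, with_results) in COUNTS.items():
        print(f"{label}: {pct_with_results(total, with_results)}%")
```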



                                                                             [25]
alltrials.net: public pressure > more results > more
              repurposing opportunities




             http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s

                                                             [26]
Stemming code names in MeSH




                              [27]
Code names in PubChem Compound (CIDs)




       CID:SID ratio 275:1039           [28]
Codes in PubChem: selected matches




                                     [29]
“GSK-” in ChEMBL : 61




                        [30]
Tracking PF-04457845 through the system




                                          [31]
PubMed intersects: finding PF-04457845




                                         [32]
PF-04457845:
   PubMed




           [33]
PF-04457845: ClinicalTrials.gov




                                  [34]
PF-04457845:
  PubChem CID
    24771824

 Substance (SID)
capture of activity,
vendor and patent
     sources




                   [35]
Wikipedia: links to other development compounds




                 But who put them in?



                                                  [36]
PF-04457845: (almost) a total system success

•   Declared efficacy failure > possible repurposing candidate
•   Selection of analogues and a probe [18F]PF-9811 (CID 70679467)
•   The “system” did well because of good publishing practice (e.g. full text)
•   Code, structure, target, papers, trials and patents all connected
•   5mg for $275

But:
• Serendipitous finding (no “efficacy failure” or “study stopped” tags)
• Lack of ClinicalTrials.gov <> PubMed links
• BindingDB using deprecated ChEBI ID
• PMID:21505060 not yet in ChEMBL
• No direct target or patent nos. in CID record because no DrugBank,
  SCRIPDB or IBM capture
• Inconsistent naming: [18F]PF-9811 in PubChem, [(18)F]PF-9811 in PubMed, PF-9811-18F in Books
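
The three spellings of the radiolabelled probe show why synonym matching across sources fails without normalisation. A minimal sketch that reduces such variants to a (core code, isotope) pair; the two regular expressions cover only the prefix and suffix styles seen above, and the function name `split_label` is hypothetical:

```python
import re

# "[18F]PF-9811" or "[(18)F]PF-9811": isotope label in leading brackets
PREFIX_RE = re.compile(r"^\[\(?(\d{1,3})\)?([A-Z][a-z]?)\]\s*(.+)$")
# "PF-9811-18F": isotope label as a trailing suffix
SUFFIX_RE = re.compile(r"^(.+)-(\d{1,3})([A-Z][a-z]?)$")


def split_label(name):
    """Return (core_code, isotope) where isotope is e.g. '18F',
    or (name, None) when no isotope label is recognised."""
    name = name.strip()
    match = PREFIX_RE.match(name)
    if match:
        mass, element, core = match.groups()
        return core, f"{mass}{element}"
    match = SUFFIX_RE.match(name)
    if match:
        core, mass, element = match.groups()
        return core, f"{mass}{element}"
    return name, None


if __name__ == "__main__":
    for variant in ("[18F]PF-9811", "[(18)F]PF-9811", "PF-9811-18F", "PF-04457845"):
        print(variant, "->", split_label(variant))
```

Comparing on the normalised pair lets records from different sources be joined while keeping the labelled and unlabelled compounds distinct.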
                                                                                 [37]
Looking at code name intersects in different
            parts of the system
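
Intersects of this kind are plain set operations once each source has been reduced to a list of normalised code names. A sketch, assuming each source is a Python set; the function name `venn_regions` is hypothetical and the XYZ codes are made-up placeholders, not real compounds or real counts:

```python
def venn_regions(sources):
    """Map each combination of source names (a frozenset) to the codes
    found in exactly that combination, i.e. one Venn diagram region."""
    regions = {}
    all_codes = set().union(*sources.values())
    for code in all_codes:
        present = frozenset(
            name for name, codes in sources.items() if code in codes
        )
        regions.setdefault(present, set()).add(code)
    return regions


# Placeholder data for illustration only.
EXAMPLE_SOURCES = {
    "ClinicalTrials.gov": {"XYZ-0001", "XYZ-0002", "XYZ-0003"},
    "PubChem": {"XYZ-0002", "XYZ-0003"},
    "MeSH": {"XYZ-0003", "XYZ-0004"},
}

if __name__ == "__main__":
    for names, codes in venn_regions(EXAMPLE_SOURCES).items():
        print(sorted(names), len(codes), sorted(codes))
```

The region keyed by a single source name holds the codes unique to that source, which is where the connectivity gaps show up.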




                                               [38]
ClinicalTrials.gov       JNJ* Word cloud




  JNJ-28431754 = Canagliflozin = CID 24812758


                                                [39]
Company Pipelines: GSK codes for 2012




                                        [40]
GSK codes: PubChem vs. 2012 Pipeline




                                       [41]
Clinical Trials, PubChem, MeSH: GSK




                                      [42]
Clinical Trials, PubChem, MeSH: JNJ




                                      [43]
Clinical, PubChem, MeSH, & 2012 Pipeline: GSK




                                               [44]
Conclusions


• Stalled development candidates, designated by company codes,
  constitute a large potential repurposing information estate
• Historical in vitro, pharmacological & clinical data linked to ~ 30K codes
• But only 40-50% have structures assignable from open sources
• An even smaller proportion have code names in PubChem
• Public name > structure > data capture is ad hoc and needs improving
• Repurposing-relevant relationships are not easy to dig out
• Some “non competitive intelligence” approaches are shown here
• The big push for transparency and open access should improve
  disclosure, data capture, linkage and repurposing opportunities

                                  Happy hunting!

              TED Talk: Francis Collins: We need better drugs -- now
         http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html
                                                                                  [45]
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Digging out Structures for Repurposing: Non-competitive Intelligence

  • 1. Digging out Structures for Repurposing: Non-competitive Intelligence PubChem Seminar April 2013 Christopher Southan, TW2Informatics, Göteborg, Sweden [1]
  • 2. Dr Christopher Southan, Ph.D., M.Sc.,B.Sc. TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm Mobile: +46(0)702-530710 Skype: cdsouthan Email: cdsouthan@hotmail.com Twitter: http://twitter.com/#!/cdsouthan Blog: http://cdsouthan.blogspot.com/ LinkedIN: http://www.linkedin.com/in/cdsouthan Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications Presentations: http://www.slideshare.net/cdsouthan [2]
  • 3. Outline • Trawling for repurposing-relevant data • Code names statistics and name > structure triage • The NCATS/MRC challenge • Story of JNJ-39393406 • Scaling-up Code name hunting and x-mapping • Code name in clinical trials, MeSH, PubChem • Story of PF-04457845 • Trials, MeSH and PubChem code name intersects • Conclusions [3]
  • 4. Intelligence: trawling compound information. Competitive: • Directed towards commercially positioning and/or repurposing own portfolio • Major big pharma activity • Mixed commercial/public sources • Internal specialists • Typically a closed activity (i.e. little open “best practice”) • Typically therapeutic area aligned. Non-competitive: • Directed towards repositioning any compound • Collaborative approaches to IP holders (but new IP possible) • Can utilise public resources alone • Different domain expert entry points • Predominantly an open activity (e.g. OSDD) • Can be hypothesis-neutral [4]
  • 5. Structures: connecting to repurposing-relevant data • Code names and synonyms • Resolving these to structures • Database entries • BioAssay results • Target/pathway links • In vitro & in vivo research papers • Clinical trial results and papers • Patents for analogues and SAR • Comparative in vivo data • Mendelian and GWAS disease links • Expression data for cpds • In silico modeling (including rare or NTDs) • Vendor similarity matches [5]
  • 6. Code names: 2-15 year information hole Pharmaprojects 2009-10 figures [6]
  • 7. Drugs, code names, INN/USANs and structures: few congruent hard numbers • Pharmaprojects (2013) drug profiles ~ 50,000 • Thomson Reuters Cortellis (2012) drug monographs = 41,889 • Pharmaprojects (via ProQuest, 2012) records ~ 35,000 • Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901 • Pharmaprojects (2003 structures) = 14,000 • ChEMBL USANs (2013) = 10,568 • PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890 • Pharmaprojects (2010 in development, no structure count) = 9,737 • GVKBIO Clinical Candidate structures (2008, PMID: 20298516) = 8,864 • Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828 [7]
  • 8. Code names: major repurposing potential, but... • ~ 95% of the 30K are/will become “parked” or “abandoned” • Can be repurposed in silico at least • Obvious hierarchy: leads > development > clinical trials > INN > approved • Problems – New code names < 50% - 70% blinded (i.e. no structures) – Some older code names never un-blinded – Code naming practices independent and completely ad hoc – Publications, conference reports, clinical trials entries, press releases and portfolio listings linked to “blinded” code names (no structures) – Even for public declarations (e.g. papers) data linked into “the system” (e.g. synonym mapping) is patchy – Code originators do not provenance public database entries – Data supporting non-progression decisions rarely disclosed – http://chembl.blogspot.se/p/research-code-stems.html 100’s of codes [8]
  • 9. Code name-to-structure mapping triage. Dig out the code names: PubChem Substance, PubChem Compound, PubMed/MeSH, Google Scholar, Google Images, Google open (filtered). Name/image > struc: • chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA • Cross-checks: – SMILES/SDF/InChI strings PubChem and ChemSpider – InChIKey in Google – SureChemOpen patent search – Clinicaltrials.gov – Synonym trawling [9]
  • 10. The NCATS/MRC industry sponsored repurposing exercise: the joy of code lists [10]
  • 12. NCATS/MRC: summary statistics PMID 23159359 • 70 code names – no structures • 18 INNs & 4 codes-only in PubChem • 24 strucs “dug out” but PubChem-ve • 24 codes remain blinded [12]
  • 13. Sleuthing down a JNJ-39393406 structure: from darkness to twilight [13]
  • 18. JNJ-39393406: Google Scholar (was) structure -ve [18]
  • 19. JNJ-39393406 in Google images: finally a mapping But where did these two vendors get their mapping from ? [19]
  • 20. (Probable) JNJ-39393406 in PubChem: CID 1675566 patent-only sources and near-neighbours [20]
  • 21. (Probable) JNJ-39393406: SureChemOpen patent match with corroborative data PubChem SID 152835708 Cf NCATS data [21]
  • 22. More JNJ-39393406 mystery: InChIKey in Google > ChemSpider > 3rd vendor [22]
  • 23. Not all JNJ-s are blinded: JNJ-40418677 IUPAC in abstract but code still PubChem –ve IUPAC name converted at chemicalize.org for PubChem mapping [23]
  • 24. Scaling-up code name retrieval: wild card searches [24]
  • 25. Phases & codes in Clinicaltrials.gov: thin on results • Interventional studies = 115356, of which 7895 with results (7%) • Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477 • Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004 • Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%) • Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 1640 • Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 185 (11%) [25]
  • 26. alltrials.net: public pressure > more results > more repurposing opportunities http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s [26]
  • 27. Stemming code names in MeSH [27]
  • 28. Code names in PubChem Compound (CIDs) CID:SID ratio 275:1039 [28]
  • 29. Codes in PubChem: selected matches [29]
  • 30. “GSK-” in ChEMBL : 61 [30]
  • 31. Tracking PF-04457845 through the system [31]
  • 32. PubMed intersects: finding PF-04457845 [32]
  • 33. PF-04457845: PubMed [33]
  • 35. PF-04457845: PubChem CID 24771824 Substance (SID) capture of activity, vendor and patent sources [35]
  • 36. Wikipedia: links to other development compounds But who put them in ? [36]
  • 37. PF-04457845: (almost) a total system success • Declared efficacy failure > possible repurposing candidate • Selection of analogues and a probe [18F]PF-9811 (CID 70679467) • The “system” did well because of good publishing practice (e.g. full text) • Code, structure, target, papers, trials and patents all connected • 5mg for $275 But: • Serendipitous finding (no “efficacy failure” or “study stopped” tags) • Lack of Clinicaltrials.gov <> PubMed • BindingDB using deprecated ChEBI ID • PMID: 21505060 not yet in ChEMBL • No direct target or patent nos. in CID record because no DrugBank, SCRIPDB or IBM capture • Synonym variants: [18F]PF-9811 in PubChem, [(18)F]PF-9811 in PubMed, PF-9811-18F in Books [37]
  • 38. Looking at code name intersects in different parts of the system [38]
  • 39. Clinicaltrials.gov JNJ* Word cloud JNJ-28431754 = Canagliflozin = CID 24812758 [39]
  • 40. Company Pipelines: GSK codes for 2012 [40]
  • 41. GSK codes: PubChem vs. 2012 Pipeline [41]
  • 42. Clinical Trials, PubChem, MeSH: GSK [42]
  • 43. Clinical Trials, PubChem, MeSH: JNJ [43]
  • 44. Clinical, PubChem, MeSH, & 2012 Pipeline: GSK [44]
  • 45. Conclusions • Stalled development candidates, designated by company codes, constitute a large potential repurposing information estate • Historical in vitro, pharmacological & clinical data linked to ~ 30K codes • But only 40-50% have structures assignable from open sources • An even smaller proportion have code names in PubChem • Public name>struc>data capture is ad hoc and needs improving • Repurposing-relevant relationships are not easy to dig out • Some “non-competitive intelligence” approaches are shown here • The big push for transparency and open access should improve disclosure, data capture, linkage and repurposing opportunities Happy hunting! TED Talk: Francis Collins: We need better drugs -- now http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html [45]
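
The wild-card searches (slide 24) and the code name intersects across Clinical Trials, PubChem and MeSH (slides 38-44) could be approximated along the following lines. This is only a minimal sketch and not the method used for the slides: the stem list, the regular expression, the `STEM-digits` normalisation and the helper names `extract_codes` and `intersects` are all assumptions made for illustration, and the input strings are toy text rather than real database records.

```python
import re

# Assumed stem list, borrowed from the wild-card queries on slide 25.
STEMS = ("GSK", "AZD", "JNJ", "PF")

# Assumed pattern: a stem, an optional hyphen or space, 3 to 8 digits,
# and an optional trailing capital letter. Real codes are more varied.
CODE_RE = re.compile(r"\b(" + "|".join(STEMS) + r")[- ]?(\d{3,8}[A-Z]?)\b")


def extract_codes(text):
    """Return the set of code names in text, normalised to STEM-digits."""
    return {f"{stem}-{num}" for stem, num in CODE_RE.findall(text)}


def intersects(sources):
    """Given {source name: free text}, return (shared, unique).

    shared: codes found in every source.
    unique: for each source, the codes found in that source only.
    """
    sets = {name: extract_codes(text) for name, text in sources.items()}
    shared = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for name, codes in sets.items():
        others = set().union(*(c for n, c in sets.items() if n != name))
        unique[name] = codes - others
    return shared, unique


# Toy strings only: they illustrate spelling variants, not real records.
toy_sources = {
    "trials": "Study of JNJ-28431754; follow-up of JNJ39393406",
    "pubchem": "Synonyms: JNJ-28431754, JNJ-40418677",
    "mesh": "JNJ 28431754",
}
shared, unique = intersects(toy_sources)
```

Normalising before comparing matters because, as slide 37 notes, the same compound can carry differently formatted synonyms in different parts of the system; a pattern this simple will still miss many variants and should be treated as a first-pass filter.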

Editor's Notes

  1. IUPAC in abstract converted by MeSH but not transferred to PubChem. Chemicalize.org used for conversion, matched patent sources. Therefore structure is there but code synonym is not. No one's responsibility to submit the code-to-struc