SlideShare uma empresa Scribd logo
1 de 50
Baixar para ler offline
Patent annotations: From SureChEMBL
to Open PHACTS
George Papadatos, PhD
ChEMBL Group
georgep@ebi.ac.uk
Outline
•  Overview of the SureChEMBL patent resource
•  Pipeline, UI and content
•  Extraction of chemistry and biological entities
•  Integration with the Open PHACTS platform
•  Use cases and examples
Bioactivity data
Compound
Assay/Target
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE
RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT
NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT
TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT
THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY
CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF
EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR
WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR
ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA
NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG
PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
3. Insight, tools and resources for translational drug discovery
2. Organization, integration, curation and standardization of pharmacology data
1. Scientific facts
Ki = 4.5nM
APTT = 11 min.
ChEMBL: Data for drug discovery
Why looking at patent documents?
•  Patent filing and searching
•  Legal, financial and commercial incentives & interests
•  Prior art, novelty, freedom to operate searches
•  Competitive intelligence
•  Unprecedented wealth of knowledge
•  Most of the knowledge will never be disclosed anywhere else
•  Compounds, scaffolds, reactions
•  Biological targets, diseases, indications
•  Average lag of 2-4 years between patent document and journal
publication disclosure for chemistry, 4-5 for biological targets
SureChEMBL data processing
WO	
EP	
Applica,ons
&	Granted	
US	
Applica,ons	
&	granted	
JP	
Abstracts	
Patent
Offices
Chemistry	
Database	
SureChEMBL System
Patent	
PDFs	
(service)	
Applica,on	
Users	
API	
Database	
En,ty	
Recogni,on	
SureChem IP
1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-
propyl-1H-pyrazolo[4,3-d]pyrimidin-5-
yl)phenylsulfonyl]-4-methylpiperazine	
Image	to	
Structure	
(one	method)	
AUachments	
Name	to	
Structure	
(five	methods)	
OCR
Processed	
patents	
(service)	
www.surechembl.org
SureChEMBL Data Content
•  > 17M unique structures
•  ~13M are in a drug-like phys-chem space
•  > 14.5M chemically annotated patents 
•  ~4.5M are life-science relevant (based on patent classification
codes)
•  ~80,000 novel compounds extracted from ~50,000 new
patents monthly
•  2–7 days for a published patent to be chemically annotated
and searchable in SureChEMBL
•  SureChEMBL provides search access to all patents (not just
chemically annotated ones) ~120M patents
SureChEMBL data processing v2
WO	
EP	
Applica,ons
&	Granted	
US	
Applica,ons	
&	granted	
JP	
Abstracts	
Chemistry	
Database	
SureChEMBL System
Database	
En,ty	
Recogni,on	
SureChem IP
Image	to	
Structure	
(one	method)	
Name	to	
Structure	
(five	methods)	
OCR
Processed	
patents	
(service)	
Bio-En,ty	
Recogni,on	
Patent	
PDFs	
(service)	
Applica,on	
Server	
Users	
Patent
Offices
www.surechembl.org
SureChEMBL bioannotation
•  SciBite’s Termite text-mining engine run on 4M life-science patents
from SureChEMBL corpus
•  Gene names (mapped to HGNC symbols) and diseases (mapped to
MeSH IDs) annotated
•  Section/frequency information annotated (e.g., in title, abstract,
claims, total frequency)
•  Relevance score (0-3) to flag important chemical and biological
entities and remove noise
Relevance scoring – genes/diseases
•  Various features used:
•  Term frequency
•  Position (title, abstract, figure, caption, table)
•  Frequency distribution
•  Scores range from 0 – 3
•  3 – most important entities in the patent
•  2 – important entities in the patent
•  1 – mentioned entities in the patent
•  0 – ambiguous entity/likely annotation error
Relevance scoring - compounds
•  Main assumptions for relevance:
1.  Very frequent compounds are irrelevant (but if drug-like then that’s OK)
2.  Compounds with busy chemical space around them are interesting
•  Use distribution of close analogues (NNs) among compounds found in the same
patent family
•  Scores range from 0 – 3
•  3 – highest number of NNs: most important entities in the patent
•  2 – important entities in the patent
•  1 – few NNs: mentioned entities in the patent
•  0 – singletons or trivial entities, most likely errors or reagents, solvents,
substituents
Hatori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686
Tyrchan, C., Boström, J., Giordaneto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489.
doi:10.1021/ci3001293
Data details
•  4 million full-text patent documents annotated
•  USPTO, WIPO, EPO – English language
•  Life-sciences relevant
•  Patents mapped to SureChEMBL IDs (e.g. EP-1339685-A2)
•  Title, publication date, classification codes
•  Compounds mapped to SCHEMBL IDs (e.g. SCHEMBL15064)
•  Genes mapped to HGNC symbols (e.g. FDFT1)
•  Diseases mapped to MeSH terms (e.g. D009765)
•  Relevance scores for compounds, genes and diseases
•  0: ambiguous/trivial to 3: most important (key) entities in the patent
https://wiki.openphacts.org/index.php/SureChEMBL
Gotchas & out of scope
•  No Markush extraction
•  No natural language processing (e.g., ‘compound x is an
inhibitor of target y’)
•  No extraction of bioactivities
•  No chemistry search (yet)
•  Patent coverage stops in April 2015
•  Incremental updates TBD
•  Patent calls still in development
Open PHACTS Patent API
https://dev.openphacts.org/docs/develop
Open PHACTS Patent API
Disease	
Compound	
Target	
Patent	
extracted	
links
Open PHACTS Patent API
Disease	
Compound	
Target	
Patent	
inferred	links	
extracted	
links
Use case #1: Patent to Entities
1.  From a patent get compounds, genes and diseases
2.  Filter to remove noise
•  Frequency and relevance score
3.  Process and visualise
Disease	
Compound	
Target	
Patent	
?
?
?
US-7718693-B2
Use case #1: Patent to Entities
•  Patent URI:
•  http://rdf.ebi.ac.uk/resource/surechembl/patent/US-7718693-B2
•  API call:
•  586 entities back
Use case #1: Patent to Entities
•  Patent URI:
•  http://rdf.ebi.ac.uk/resource/surechembl/patent/US-7718693-B2
•  API call:
•  586 entities back
1) Look at target and disease entities
•  178 target and disease entities
•  Filter: Relevance score >= 2 à 23 remain
•  Visualise in tag cloud by frequency
1) Look at target and disease entities
•  178 target and disease entities
•  Filter: Relevance score >= 2 à 23 remain
•  Visualise in tag cloud by frequency
Does it make sense?
Does it make sense?
US-7718693-B2
2) Look at compound entities
•  408 compound entities
•  Filter: Relevance score >= 1 à 201 remain
•  Calculate properties
2) Look at compound entities
•  408 compound entities
•  Filter: Relevance score >= 1 à 201 remain
•  Calculate properties
Does it make sense?
•  Calculate MCS
Does it make sense?
•  Calculate MCS
US-7718693-B2
Use case #2: Drug targets & indications for compound
1.  Search patents for a compound (approved drug)
2.  Filter to remove noise
•  Frequency, relevance score and classification code
3.  For remaining patents, get disease and target entities
4.  Filter to remove noise
•  Frequency and relevance score
5.  Visualise results
Disease	
Compound	
Target	
Patent
Eluxadoline (JNJ-27018966, VIBERZI)
CHEMBL2159122
FDA Approval: 2015
1) Get patents for Eluxadoline
•  UniChem call à SCHEMBL12971682
•  Compound URI:
•  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682
•  API call:
•  Relevance score >=1 à 17 patents (patentome):
1) Get patents for Eluxadoline
•  UniChem call à SCHEMBL12971682
•  Compound URI:
•  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682
•  API call:
•  Relevance score >=1 à 17 patents (patentome):
2) Get target and disease entities for patents
•  API call:
•  Classification codes for patents
•  Filter: A61 and C07* à 11 patents remaining
•  API call:
•  Filter: Relevance score >= 2, Frequency >= 2 à 2 targets, 21 diseases
•  Visualise with tag clouds by frequency
* http://web2.wipo.int/classifications/ipc/ipcpub/
Results
Relevant targets:
Relevant diseases:
Does it make sense?
http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
Does it make sense?
http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
Take home message
•  It is now possible to extract and interlink the key structures,
scaffolds, targets and diseases from med. chem. patent
corpus automatically
•  By high-throughput text-mining only
•  Thanks to simple heuristics (relevance scores and frequency)
•  Using the Open PHACTS API and KNIME
•  For the first time in a free resource in such scale
Disease	
Compound	
Target	
Patent
Take home message II
•  It is now possible to link compounds to targets and
diseases using the med. chem. patent corpus as evidence
•  By text-mining only
•  Thanks to heuristics (relevance scores and frequencies)
•  Using the Open PHACTS API and KNIME
•  For the first time in a free resource in such scale
•  Inconceivable just 2 years ago!
Disease	
Compound	
Target	
Patent
What next: Other ideas and use cases
•  Target validation / Druggability
•  For a target, get me all related and relevant diseases
•  Compare with DisGeNET / CTTV, etc.
•  Any known patented scaffolds for my target?
•  Start a pharmacophore hypothesis for patent busting
•  Novelty checking / Due diligence
•  What do we know about this scaffold / compound?
•  Add ChEMBL pharmacology, pathway information
•  Large-scale data mining
Availability
•  API calls will be released to production soon
•  KNIME workflows available on request then
•  SureChEMBL annotations licensed under CC BY-SA
•  SciBite annotations licensed under CC BY-NC-SA
Acknowledgements
•  SureChEMBL
•  Anna Gaulton
•  Mark Davies
•  Nathan Dedman
•  James Siddle
•  Anne Hersey
•  SciBite
•  Lee Harland
•  Open PHACTS consortium
•  Stefan Senger
•  Nick Lynch
•  Daniela Digles
•  Antonis Loizou
Technology partners
Patent annotations: From SureChEMBL
to Open PHACTS
George Papadatos, PhD
ChEMBL Group
georgep@ebi.ac.uk

Mais conteúdo relacionado

Mais procurados

Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao
 
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
ChemAxon
 
Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
2014 agbt giab_progress update
2014 agbt giab_progress update2014 agbt giab_progress update
2014 agbt giab_progress update
GenomeInABottle
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Mais procurados (20)

Knowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent SearchingKnowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent Searching
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
 
Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...
 
IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology
 
Semantic Technology: The Basics
Semantic Technology: The BasicsSemantic Technology: The Basics
Semantic Technology: The Basics
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
 
4A2B2C-2013
4A2B2C-20134A2B2C-2013
4A2B2C-2013
 
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
 
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 
2014 agbt giab_progress update
2014 agbt giab_progress update2014 agbt giab_progress update
2014 agbt giab_progress update
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 

Semelhante a Patent annotations: From SureChEMBL to Open PHACTS

ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
Dr. Haxel Consult
 
CINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resourceCINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resource
George Papadatos
 
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
Kamel Mansouri
 

Semelhante a Patent annotations: From SureChEMBL to Open PHACTS (20)

SureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTSSureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTS
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
CINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resourceCINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resource
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTS
 
Crofton Evolution of Toxicology
Crofton Evolution of ToxicologyCrofton Evolution of Toxicology
Crofton Evolution of Toxicology
 
High Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfHigh Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdf
 
Multiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemMultiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChem
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 
Best Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing WorkflowBest Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing Workflow
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
Patent awareness particularly in Bio-science related inventions
Patent awareness particularly in Bio-science related inventionsPatent awareness particularly in Bio-science related inventions
Patent awareness particularly in Bio-science related inventions
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
Overview of SureChEMBL
Overview of SureChEMBLOverview of SureChEMBL
Overview of SureChEMBL
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
 

Mais de open_phacts

Mais de open_phacts (19)

Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow tools
 
Open PHACTS Webinar Series - Chemistry Platform
Open PHACTS Webinar Series - Chemistry PlatformOpen PHACTS Webinar Series - Chemistry Platform
Open PHACTS Webinar Series - Chemistry Platform
 
Open PHACTS webinar June 2016 - Data2Discovery
Open PHACTS webinar June 2016 - Data2DiscoveryOpen PHACTS webinar June 2016 - Data2Discovery
Open PHACTS webinar June 2016 - Data2Discovery
 
Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016
 
Open PHACTS Webinar: Computational Protocols for In Silico Target Validation
Open PHACTS Webinar: Computational Protocols for In Silico Target ValidationOpen PHACTS Webinar: Computational Protocols for In Silico Target Validation
Open PHACTS Webinar: Computational Protocols for In Silico Target Validation
 
2013-12-04 Experimental data guided docking allows to elucidate the molecular...
2013-12-04 Experimental data guided docking allows to elucidate the molecular...2013-12-04 Experimental data guided docking allows to elucidate the molecular...
2013-12-04 Experimental data guided docking allows to elucidate the molecular...
 
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - KNIME
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - KNIME2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - KNIME
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - KNIME
 
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - The API
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - The API2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - The API
2015-05-19 Open PHACTS Drug Discovery Workflow Workshop - The API
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
 
2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHAC...
2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHAC...2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHAC...
2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHAC...
 
2013 Open PHACTS Architecture Poster
2013 Open PHACTS Architecture Poster2013 Open PHACTS Architecture Poster
2013 Open PHACTS Architecture Poster
 
2013 Open PHACTS Scientific Questions Poster
2013 Open PHACTS Scientific Questions Poster2013 Open PHACTS Scientific Questions Poster
2013 Open PHACTS Scientific Questions Poster
 
2013 Open PHACTS Exemplars Poster
2013 Open PHACTS Exemplars Poster2013 Open PHACTS Exemplars Poster
2013 Open PHACTS Exemplars Poster
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
2011-11-07 Open PHACTS Poster
2011-11-07 Open PHACTS Poster2011-11-07 Open PHACTS Poster
2011-11-07 Open PHACTS Poster
 

Último

Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Dipal Arora
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
 
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
 
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
 

Patent annotations: From SureChEMBL to Open PHACTS

  • 1. Patent annotations: From SureChEMBL to Open PHACTS George Papadatos, PhD ChEMBL Group georgep@ebi.ac.uk
  • 2. Outline •  Overview of the SureChEMBL patent resource •  Pipeline, UI and content •  Extraction of chemistry and biological entities •  Integration with the Open PHACTS platform •  Use cases and examples
  • 3. Bioactivity data Compound Assay/Target >Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE 3. Insight, tools and resources for translational drug discovery 2. Organization, integration, curation and standardization of pharmacology data 1. Scientific facts Ki = 4.5nM APTT = 11 min. ChEMBL: Data for drug discovery
  • 4. Why looking at patent documents? •  Patent filing and searching •  Legal, financial and commercial incentives & interests •  Prior art, novelty, freedom to operate searches •  Competitive intelligence •  Unprecedented wealth of knowledge •  Most of the knowledge will never be disclosed anywhere else •  Compounds, scaffolds, reactions •  Biological targets, diseases, indications •  Average lag of 2-4 years between patent document and journal publication disclosure for chemistry, 4-5 for biological targets
  • 5. SureChEMBL data processing WO EP Applica,ons & Granted US Applica,ons & granted JP Abstracts Patent Offices Chemistry Database SureChEMBL System Patent PDFs (service) Applica,on Users API Database En,ty Recogni,on SureChem IP 1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3- propyl-1H-pyrazolo[4,3-d]pyrimidin-5- yl)phenylsulfonyl]-4-methylpiperazine Image to Structure (one method) AUachments Name to Structure (five methods) OCR Processed patents (service) www.surechembl.org
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12. SureChEMBL Data Content •  > 17M unique structures •  ~13M are in a drug-like phys-chem space •  > 14.5M chemically annotated patents  •  ~4.5M are life-science relevant (based on patent classification codes) •  ~80,000 novel compounds extracted from ~50,000 new patents monthly •  2–7 days for a published patent to be chemically annotated and searchable in SureChEMBL •  SureChEMBL provides search access to all patents (not just chemically annotated ones) ~120M patents
  • 13. SureChEMBL data processing v2 WO EP Applica,ons & Granted US Applica,ons & granted JP Abstracts Chemistry Database SureChEMBL System Database En,ty Recogni,on SureChem IP Image to Structure (one method) Name to Structure (five methods) OCR Processed patents (service) Bio-En,ty Recogni,on Patent PDFs (service) Applica,on Server Users Patent Offices www.surechembl.org
  • 14. SureChEMBL bioannotation •  SciBite’s Termite text-mining engine run on 4M life-science patents from SureChEMBL corpus •  Gene names (mapped to HGNC symbols) and diseases (mapped to MeSH IDs) annotated •  Section/frequency information annotated (e.g., in title, abstract, claims, total frequency) •  Relevance score (0-3) to flag important chemical and biological entities and remove noise
  • 15. Relevance scoring – genes/diseases •  Various features used: •  Term frequency •  Position (title, abstract, figure, caption, table) •  Frequency distribution •  Scores range from 0 – 3 •  3 – most important entities in the patent •  2 – important entities in the patent •  1 – mentioned entities in the patent •  0 – ambiguous entity/likely annotation error
  • 16. Relevance scoring - compounds •  Main assumptions for relevance: 1.  Very frequent compounds are irrelevant (but if drug-like then that’s OK) 2.  Compounds with busy chemical space around them are interesting •  Use distribution of close analogues (NNs) among compounds found in the same patent family •  Scores range from 0 – 3 •  3 – highest number of NNs: most important entities in the patent •  2 – important entities in the patent •  1 – few NNs: mentioned entities in the patent •  0 – singletons or trivial entities, most likely errors or reagents, solvents, substituents Hatori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686 Tyrchan, C., Boström, J., Giordaneto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489. doi:10.1021/ci3001293
  • 17. Data details •  4 million full-text patent documents annotated •  USPTO, WIPO, EPO – English language •  Life-sciences relevant •  Patents mapped to SureChEMBL IDs (e.g. EP-1339685-A2) •  Title, publication date, classification codes •  Compounds mapped to SCHEMBL IDs (e.g. SCHEMBL15064) •  Genes mapped to HGNC symbols (e.g. FDFT1) •  Diseases mapped to MeSH terms (e.g. D009765) •  Relevance scores for compounds, genes and diseases •  0: ambiguous/trivial to 3: most important (key) entities in the patent https://wiki.openphacts.org/index.php/SureChEMBL
  • 18. Gotchas & out of scope •  No Markush extraction •  No natural language processing (e.g., ‘compound x is an inhibitor of target y’) •  No extraction of bioactivities •  No chemistry search (yet) •  Patent coverage stops in April 2015 •  Incremental updates TBD •  Patent calls still in development
  • 19. Open PHACTS Patent API https://dev.openphacts.org/docs/develop
  • 20. Open PHACTS Patent API Disease Compound Target Patent extracted links
  • 21. Open PHACTS Patent API Disease Compound Target Patent inferred links extracted links
  • 22. Use case #1: Patent to Entities 1.  From a patent get compounds, genes and diseases 2.  Filter to remove noise •  Frequency and relevance score 3.  Process and visualise Disease Compound Target Patent ? ? ?
  • 24. Use case #1: Patent to Entities •  Patent URI: •  http://rdf.ebi.ac.uk/resource/surechembl/patent/US-7718693-B2 •  API call: •  586 entities back
  • 25. Use case #1: Patent to Entities •  Patent URI: •  http://rdf.ebi.ac.uk/resource/surechembl/patent/US-7718693-B2 •  API call: •  586 entities back
  • 26. 1) Look at target and disease entities •  178 target and disease entities •  Filter: Relevance score >= 2 à 23 remain •  Visualise in tag cloud by frequency
  • 27. 1) Look at target and disease entities •  178 target and disease entities •  Filter: Relevance score >= 2 à 23 remain •  Visualise in tag cloud by frequency
  • 28. Does it make sense?
  • 29. Does it make sense? US-7718693-B2
  • 30. 2) Look at compound entities •  408 compound entities •  Filter: Relevance score >= 1 à 201 remain •  Calculate properties
  • 31. 2) Look at compound entities •  408 compound entities •  Filter: Relevance score >= 1 à 201 remain •  Calculate properties
  • 32. Does it make sense? •  Calculate MCS
  • 33. Does it make sense? •  Calculate MCS US-7718693-B2
  • 34. Use case #2: Drug targets & indications for compound 1.  Search patents for a compound (approved drug) 2.  Filter to remove noise •  Frequency, relevance score and classification code 3.  For remaining patents, get disease and target entities 4.  Filter to remove noise •  Frequency and relevance score 5.  Visualise results Disease Compound Target Patent
  • 36. 1) Get patents for Eluxadoline •  UniChem call à SCHEMBL12971682 •  Compound URI: •  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682 •  API call: •  Relevance score >=1 à 17 patents (patentome):
  • 37. 1) Get patents for Eluxadoline •  UniChem call à SCHEMBL12971682 •  Compound URI: •  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682 •  API call: •  Relevance score >=1 à 17 patents (patentome):
  • 38. 2) Get target and disease entities for patents •  API call: •  Classification codes for patents •  Filter: A61 and C07* à 11 patents remaining •  API call: •  Filter: Relevance score >= 2, Frequency >= 2 à 2 targets, 21 diseases •  Visualise with tag clouds by frequency * http://web2.wipo.int/classifications/ipc/ipcpub/
  • 40. Does it make sense? http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
  • 41. Does it make sense? http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
  • 42. Take home message •  It is now possible to extract and interlink the key structures, scaffolds, targets and diseases from med. chem. patent corpus automatically •  By high-throughput text-mining only •  Thanks to simple heuristics (relevance scores and frequency) •  Using the Open PHACTS API and KNIME •  For the first time in a free resource in such scale Disease Compound Target Patent
  • 43. Take home message II •  It is now possible to link compounds to targets and diseases using the med. chem. patent corpus as evidence •  By text-mining only •  Thanks to heuristics (relevance scores and frequencies) •  Using the Open PHACTS API and KNIME •  For the first time in a free resource in such scale •  Inconceivable just 2 years ago! Disease Compound Target Patent
  • 44. What next: Other ideas and use cases •  Target validation / Druggability •  For a target, get me all related and relevant diseases •  Compare with DisGeNET / CTTV, etc. •  Any known patented scaffolds for my target? •  Start a pharmacophore hypothesis for patent busting •  Novelty checking / Due diligence •  What do we know about this scaffold / compound? •  Add ChEMBL pharmacology, pathway information •  Large-scale data mining
  • 45. Availability •  API calls will be released to production soon •  KNIME workflows available on request then •  SureChEMBL annotations licensed under CC BY-SA •  SciBite annotations licensed under CC BY-NC-SA
  • 46.
  • 47.
  • 48. Acknowledgements •  SureChEMBL •  Anna Gaulton •  Mark Davies •  Nathan Dedman •  James Siddle •  Anne Hersey •  SciBite •  Lee Harland •  Open PHACTS consortium •  Stefan Senger •  Nick Lynch •  Daniela Digles •  Antonis Loizou
  • 50. Patent annotations: From SureChEMBL to Open PHACTS George Papadatos, PhD ChEMBL Group georgep@ebi.ac.uk