Bibliological data science
and drug discovery
Knowing the knowns*
Effectively Harnessing the World’s Literature To Inform Rational Compound Design - ACS National Meeting, Philadelphia, Aug 21-24, 2016
Jeremy J Yang
Translational Informatics Division
School of Medicine
University of New Mexico
Integrative Data Science Lab
School of Informatics & Computing
Indiana University
*phrase borrowed from Edgar Jacoby, Janssen.
In science, luck favors the prepared.
- Louis Pasteur
The main thing was not to . . . "foul up."
- The Right Stuff, by Tom Wolfe, about John Glenn.
Overview of talk
● Formulation of problem
● Resources and examples:
○ TIN-X, Target Importance and Novelty Explorer (& IDG)
○ Chem2Bio2RDF
○ OPDDR, Open Phenotypic Drug Discovery Resource
○ DrugCentral
Formulation of problem
● "World's Literature" redefined by online revolution
● Rational Compound Design = improving our odds
● For given research question, what are the known knowns?
● Connect the dots and weigh the evidence from global
knowledge graph.
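One way to read "connect the dots and weigh the evidence" is as path finding over a weighted graph. The sketch below is only an illustration under stated assumptions: the node names, the edge weights, and the choice to score a path by the product of its edge weights are all hypothetical and do not come from the talk.

```python
def simple_paths(graph, start, goal, max_hops=3):
    """Yield (path, score) for each cycle-free path of at most max_hops edges.

    The score is the product of edge weights along the path (an assumed,
    illustrative way of weighing evidence).
    """
    stack = [(start, [start], 1.0)]
    while stack:
        node, path, score = stack.pop()
        if node == goal:
            yield path, score
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up without reaching the goal
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # avoid revisiting nodes
                stack.append((neighbor, path + [neighbor], score * weight))


# Hypothetical toy graph: node -> {neighbor: evidence weight in (0, 1]}.
GRAPH = {
    "compound:X": {"target:A": 0.9, "target:B": 0.5},
    "target:A": {"disease:Y": 0.8},
    "target:B": {"pathway:P": 0.6},
    "pathway:P": {"disease:Y": 0.5},
}

# Rank every compound-to-disease path, strongest evidence first.
ranked = sorted(
    simple_paths(GRAPH, "compound:X", "disease:Y"),
    key=lambda item: item[1],
    reverse=True,
)
for path, score in ranked:
    print(" -> ".join(path), round(score, 3))
```

The short, well-supported path through target:A ranks above the longer path through target:B and pathway:P, which is the intuition behind weighing evidence rather than merely counting connections.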
TIN-X
Target Importance & Novelty Explorer
● Bibliometric application developed for Illuminating the
Druggable Genome (IDG) project
● Text mining from Novo Nordisk Center for Protein
Research (U. Copenhagen) lab of Lars Juhl Jensen.
● Algorithm and client developed at UNM (Cristian Bologa,
Daniel Cannon)
● Disease Ontology (DO) classification
● Drug Target Ontology (DTO) protein classification
Illuminating the Druggable Genome (IDG)
Knowledge Mgmt Center PI:
Tudor Oprea, MD, PhD
pharos.nih.gov
TIN-X
http://newdrugtargets.org
Target Novelty:
$F_k = 1 / T_k$
● $T_k$ = # targets in paper $k$
● $F_k$ = fractional score of paper $k$
● for papers where $T_k > 0$
$N_i = 1 / \sum_k F_k$
● $N_i$ = novelty, target $i$
● sum over papers where target $i$ mentioned
Target-Disease Importance:
$F_k = 1 / (T_k \cdot D_k)$
● $T_k$ = # targets in paper $k$
● $D_k$ = # diseases in paper $k$
● $F_k$ = fractional score of paper $k$
$I_{ij} = \sum_k F_k$
● $I_{ij}$ = importance, target $i$ for disease $j$
● sum over papers where both mentioned
Target Importance and Novelty Explorer (TIN-X), Daniel Cannon, Jeremy Yang, Stephen Mathias, Oleg Ursu, Subramani
Mani, Anna Waller, Stephan Schürer, Lars Juhl Jensen, Larry Sklar, Cristian Bologa, and Tudor Oprea (manuscript in
preparation).
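The novelty and importance scores defined above can be sketched in a few lines. This is a minimal illustration of the formulas as stated on the slide, not the TIN-X implementation: the function name `tinx_scores`, the input layout (one pair of target and disease sets per paper), and the toy identifiers are assumptions made for the example.

```python
from collections import defaultdict


def tinx_scores(papers):
    """Compute novelty per target and importance per (target, disease) pair.

    papers: iterable of (targets, diseases), each a set of identifiers
    mentioned in one paper (hypothetical input layout).
    """
    fractional_sum = defaultdict(float)  # sum of F_k = 1/T_k per target
    importance = defaultdict(float)      # I_ij = sum of 1/(T_k * D_k)
    for targets, diseases in papers:
        t_k = len(targets)
        if t_k == 0:
            continue  # the formulas only cover papers with T_k > 0
        for target in targets:
            fractional_sum[target] += 1.0 / t_k
        d_k = len(diseases)
        if d_k == 0:
            continue  # no disease mentioned, so no importance contribution
        f_k = 1.0 / (t_k * d_k)
        for target in targets:
            for disease in diseases:
                importance[(target, disease)] += f_k
    novelty = {target: 1.0 / total for target, total in fractional_sum.items()}
    return novelty, dict(importance)


# Toy corpus with made-up identifiers.
papers = [
    ({"T1"}, {"D1"}),
    ({"T1", "T2"}, {"D1", "D2"}),
    ({"T2"}, set()),
]
novelty, importance = tinx_scores(papers)
print(novelty)
print(importance)
```

A target mentioned in many papers, especially papers that focus on it alone, accumulates a large fractional sum and therefore a low novelty score; a paper that mentions many targets and diseases contributes only a small share to each pair's importance.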
TIN-X
Target Importance & Novelty Explorer
● Text mining is a valuable tool for monitoring literature,
filtering and ranking, and detecting trends.
● Automation can infer patterns regarding community
trends and consensus.
● Interactive visualization tools help navigate big data.
● Good big data text miners care about small data too!
TIN-X
Key contributors
Cristian Bologa Daniel Cannon Lars Juhl Jensen
Chem2Bio2RDF
● 24 sources, 52 datasets, 78M triples
● Semantically linked
● Chen, B, et al, BMC Bioinformatics (2010).
● Chen, B, et al, PLoS Comp Bio (2012).
● Fu, G, et al, BMC Bioinformatics (2016).
● Related projects: Bio2RDF, LOD
http://chem2bio2rdf.org
Classes: biological, chemical, chemogenomics, literature, phenotype, systems, disease, pathway, polypharmacology, PPI, side effect
Sources: BindingDB, BindingMOAD, IU, ChEBI, ChEMBL, CTD, DCDB, DIP, DrugBank, HGNC, HPRD, KEGG, MATADOR, OMIM, PDBe, PDSP, PharmGKB, PubChem, PubMed, Reactome, SIDER, TTD, UniProt
Linked Open Data (LOD): http://linkeddata.org/
Chem2Bio2RDF apps: (1) SLAP (2012), (2) Metapaths (2016)
● Data semantics essential for integration of
heterogeneous sources
● Strong evidence requires strong semantics
● Semantic Web Technologies common framework
enabling -- but not assuring -- community progress
● Chem2Bio2RDF v2.0 to leverage major community
advances (esp. Open PHACTS)
● Data ecosystems, coop-tition & prisoner's dilemma
Key contributors
Bin Chen Ying Ding David Wild
OPDDR
Open Phenotypic Drug Discovery Resource
https://ncats.nih.gov/expertise/preclinical/pd2
OPDDR
collaboration
Example: OIDD HeLa cell based assay
Integrated RDF
bioassay:AID1117350
    skos:exactMatch oidd_assay:17 .
bioassay:AID1117350
    dcterms:source source:ID846 ;
    dcterms:title "Increased chromatin condensation in HeLa cells-IC50"@en .
bioassay:AID1117350 rdf:type bao:BAO_0002786 .
bioassay:AID1117350 rdf:type bao:BAO_0000010 .
bioassay:AID1117350 rdf:type bao:BAO_0000219 .
endpoint:SID170464897_AID1117349
    vocabulary:PubChemAssayOutcome vocabulary:active ;
    sio:has-value "0.0656"^^xsd:float ;
    a bao:BAO_0000190 ;
    rdfs:label "IC50"@en .
substance:SID170464897
    skos:exactMatch chembl_molecule:CHEMBL1483 .
chembl_assay:OIDD00017
    cco:hasCellLine chembl_cell_line:CHEMBL3308376 .
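To show how such linked statements might be used, the sketch below stores a subset of the triples from this slide as plain tuples and looks up what an assay or substance is linked to. It is an illustration only: the tuple representation and the helper names `index_by_subject` and `objects` are assumptions for the example, and a real application would more likely load the data into an RDF store and query it there.

```python
from collections import defaultdict

# Subset of the slide's triples as (subject, predicate, object) strings.
TRIPLES = [
    ("bioassay:AID1117350", "skos:exactMatch", "oidd_assay:17"),
    ("bioassay:AID1117350", "dcterms:source", "source:ID846"),
    ("bioassay:AID1117350", "rdf:type", "bao:BAO_0002786"),
    ("bioassay:AID1117350", "rdf:type", "bao:BAO_0000010"),
    ("bioassay:AID1117350", "rdf:type", "bao:BAO_0000219"),
    ("substance:SID170464897", "skos:exactMatch", "chembl_molecule:CHEMBL1483"),
    ("chembl_assay:OIDD00017", "cco:hasCellLine", "chembl_cell_line:CHEMBL3308376"),
]


def index_by_subject(triples):
    """Group triples as subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        index[subject][predicate].append(obj)
    return index


def objects(index, subject, predicate):
    """Return all objects for (subject, predicate); empty list if none."""
    return index.get(subject, {}).get(predicate, [])


index = index_by_subject(TRIPLES)

# Cross-references established by skos:exactMatch.
print(objects(index, "bioassay:AID1117350", "skos:exactMatch"))
print(objects(index, "substance:SID170464897", "skos:exactMatch"))
# BAO classes assigned to the assay, and the cell line of the ChEMBL assay.
print(objects(index, "bioassay:AID1117350", "rdf:type"))
print(objects(index, "chembl_assay:OIDD00017", "cco:hasCellLine"))
```

The skos:exactMatch statements are what let a PubChem assay or substance record be followed to its OIDD or ChEMBL counterpart, and from there to the cell line.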
D2D builds apps, tools and solutions for knowledge discovery powered by fast, scalable network analytics and rigorous semantics.
d2discovery.com
Predictive Phenotypic Profiler (P3) prototype
openphacts.org
OPDDR
● OPDDR phenotypic assays have been linked and
integrated via community semantics to both
phenotypic (cell lines) and molecular
(genomic/protein targets) entities
● New phenotypic knowledge domain offers additional
value in drug discovery and pharmacological
informatics
● Open PHACTS excellent, well suited platform
DrugCentral
● DrugCentral is a free, open, curated resource about
approved drugs, designed for research
● Compounds, products, labels, targets, IDs, names
● DrugCentral developed over several years at UNM
● DrugCentral recently released with new interface
● License: CC-BY-SA
http://drugcentral.org
DrugCentral
● Free, open, accurate, comprehensive drug reference
for biomolecular and biomedical informatics research
Compounds 4444
Products 84787
Synonyms 20522
Structures 4231
Targets 3651
Bioactivities 15620
MoA 3484
SNOMED 45349
"DrugCentral: online drug compendium", Oleg Ursu, Jayme Holmes, Jeffrey Knockel, Cristian Bologa,
Jeremy Yang, Stephen Mathias, Stuart Nelson, Tudor Oprea (manuscript submitted).
In Conclusion
● New resources continue to emerge and evolve, providing
opportunities for knowledge driven drug discovery
● Community standards → more intelligent web
● Adapt to new data environment for success
● Private + public data must be integrated to
○ Be prepared (like Pasteur)
○ Not "foul up" (like Glenn)
