SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
Bringing bioassay protocols to the world of
informatics, using semantic annotations
Alex M. Clark
SAR
◈ The activity column:
▷ TCruzi (chagas)
▷ IC50
▷ nM
◈ Underdefined in a
typical SDfile or table
◈ Only alternative is a
verbose publication
2
Assay Informatics
◈ Measurement of an activity has many details:
▷ target, organism, cell line, measurement type,
units, experimental design, instrumentation, etc.
◈ Usually only available as text...
◈ ... needs to be informatics.
◈ Searching for experiments, comparing datasets,
spotting trends, evaluating completeness
3
Cell-Free Homogeneous Primary HTS to Identify Inhibitors of
GSK3beta Activity
(1) Dispense 1 uL/well of CABPE, 0.5 uL of ATP, and 1 uL of
positive control GW8510 or AB in respective wells according to
plate design to 1536-well assay ready plates (Aurora 29847)
that contain 2.5 nL/well of 10 mM compound using BioRAPTR
(Beckman) to start the reaction. Incubate at room temperature for
60 minutes.
(2) Add 2.5 uL/well of ADP-glo (Promega, V9103) with
BioRAPTR, incubate at room temperature for 40 minutes
(3) Add 5 uL/well of ADP-glo (Promega, V9103) with Combi nL
(Thermo), incubate at room temperature for 30 minutes
Assay Informatics
◈ Measurement of an activity has many details:
▷ target, organism, cell line, measurement type,
units, experimental design, instrumentation, etc.
◈ Usually only available as text...
◈ ... needs to be informatics.
◈ Searching for experiments, comparing datasets,
spotting trends, evaluating completeness
3
Cell-Free Homogeneous Primary HTS to Identify Inhibitors of
GSK3beta Activity
(1) Dispense 1 uL/well of CABPE, 0.5 uL of ATP, and 1 uL of
positive control GW8510 or AB in respective wells according to
plate design to 1536-well assay ready plates (Aurora 29847)
that contain 2.5 nL/well of 10 mM compound using BioRAPTR
(Beckman) to start the reaction. Incubate at room temperature for
60 minutes.
(2) Add 2.5 uL/well of ADP-glo (Promega, V9103) with
BioRAPTR, incubate at room temperature for 40 minutes
(3) Add 5 uL/well of ADP-glo (Promega, V9103) with Combi nL
(Thermo), incubate at room temperature for 30 minutes
http://www.bioassayontology.org/bao#BAO_0002909
http://www.drugtargetontology.org/dto/DTO_03300101
http://purl.obolibrary.org/obo/NCBITaxon_9606
The Mission
4
◈ To make the world's bioassay protocol data
machine readable
◈ Much progress toward making this goal an actual
reality...
◈ With the help of ontology markup:
▷ BAO, DTO, CLO, GO, UO, UniProt...
5
Ontologies
BAO, DTO, CLO,

UO, UniProtLegacy Data
PubMed

PubChem

MLPCN

Tox21

ChEMBL
Common Assay

Template
Fully Annotated

Assays
New Data
ELN
ML
Common Assay Template
◈ A recipe for how to use ontologies
◈ Capture most important information for most assays
6
Legacy Curation
◈ Text-centric data: millions of assays...
▷ private collections (ELNs)
▷ published literature (PubMed)
▷ assay databases (PubChem)
◈ Most accessible sources: MLPCN, Tox21
◈ Data comes in as plain text + several fields + SAR
7
Hybrid machine learning
1. Text extraction
2. Probabilistic:
▷ natural language models
▷ correlation models
3. Axiomatic rules
◈ Each of these accelerates the human curator
8
Text extraction
◈ Optimised for low false positives
9
qHTS Assay for Small Molecule Inhibitors of the
Human hERG Channel Activity
NIH Chemical Genomics Center [NCGC]
National Institutes of Environmental Health Sciences
[NIEHS]
National Toxicology Program [NTP]
NCGC Assay Overview:
We have developed a 1536-well cell-based assay for
quantitative high throughput screening (qHTS) to
determine in vitro hERG channel blockage as a
measure of cardio toxicity with small molecules. This
particular assay uses the U2OS cell line which is
derived from human osteosarcoma.
...
Text extraction
◈ Optimised for low false positives
9
qHTS Assay for Small Molecule Inhibitors of the
Human hERG Channel Activity
NIH Chemical Genomics Center [NCGC]
National Institutes of Environmental Health Sciences
[NIEHS]
National Toxicology Program [NTP]
NCGC Assay Overview:
We have developed a 1536-well cell-based assay for
quantitative high throughput screening (qHTS) to
determine in vitro hERG channel blockage as a
measure of cardio toxicity with small molecules. This
particular assay uses the U2OS cell line which is
derived from human osteosarcoma.
...
Natural language models
◈ Use NLP to create fingerprints for each text document
◈ One Bayesian model for each possible annotation term
10
"(VBN performed)"
"(NP (JJ 20-nL))"
"(NP (DT this) (NN screening))"
"(NN DMSO)"
"(NN confirmation)"
"(NP (DT a) (NN solution))"
"(NNP NADP)"
...
Text
NLP
assay format
→organism-based format
bao:BAO_0000218
01101010000...
10001010100...
01010101010...
10101110010...
11001010111...
...
fingerprints
target
→transcription factor
bao:BAO_0000282
...
Correlation models
◈ Similar principle: existing annotations are fingerprints
11
one model
per term
◈ Combine natural language + correlation = ranking
Metrics
12
Metrics
◈ NLP only: 96.9% ± 4.4
12
Metrics
◈ NLP only: 96.9% ± 4.4
12
Metrics
◈ NLP only: 96.9% ± 4.4
12
◈ + correlation: 97.5% ± 3.8
Metrics
◈ NLP only: 96.9% ± 4.4
12
◈ + correlation: 97.5% ± 3.8
◈ + text mining: 97.6% ± 3.8
Axioms
◈ Ontologies like BAO & DTO have axioms
◈ Convert these into rules:
13
subject impact
limit / exclude
◈ Use to constrain predictions at each step
◈ Like correlation models, except absolute
New data: ELN
◈ Paste text or drag a document...
14
New data: ELN
◈ Paste text or drag a document...
14
New data: ELN
◈ Paste text or drag a document...
14
◈ Or re-use...
Autogenerated text
◈ Annotations-to-text is
much easier
◈ Transliteration
template:
15
Extending ontologies
◈ Science is never complete: diction evolves
◈ Curators can request a provisional term
16
Extending ontologies
◈ Science is never complete: diction evolves
◈ Curators can request a provisional term
16
Upgrading ontologies
◈ Next step: submit provisional terms to OntoloBridge
◈ Joint collaboration between CDD and:
▷ Stanford (BioPortal, CEDAR)
▷ University of Miami (BAO, DTO)
◈ Ontology owner receives new additions via API
◈ Experts vet all new terms, outcome made available
◈ Users of BAE are unimpeded...
17
Detailed templates
◈ The Common Assay Template is
a summary
◈ Can insert extra branch
groups
◈ Ultimate goal:
▷ capture whole experiment
▷ replace text entirely
◈ Syncing with ontology
maintainers...
18
★measurement
• measurement details
★assay components
• buffer concentration
• carrier protein concentration
• chelator concentration
• detergent concentration
• metal salt concentration
• reducing agent concentration
• solvent concentration
★control details
★organism details
• cell line details
• microbe/virus details
★protocol details
• antibody details
• assay molecule
• biochemical assay
• coupled substrate incubation
• enzyme reaction
• ligand incubation
• perturbagen incubation
• substrate incubation
★screened entity details
Content expansion
◈ Public data originally from PubChem subset:
▷ MLPCN: thousands of screening assays
▷ Tox21: hundreds of assays from EPA
◈ Detailed text, and compounds + measurements (SAR)
◈ Done most of them: where's the next goldmine?
19
PubMed
◈ PubMed is massive (1.7M open access), but:
▷ only some have relevant assays
▷ no compounds or other useful metadata
◈ Current project:
▷ rank assays based on likelihood of assay description
▷ pick out relevant text
◈ Hundreds of thousands of candidate assays for BAE
◈ Overlap with ChEMBL: ~600 (1.7M ∩ 70K) - metadata, SAR
20
Crowdsourcing
◈ Anyone can authenticate on beta.bioassayexpress.com
21
Crowdsourcing
◈ Anyone can authenticate on beta.bioassayexpress.com
21
Integration
◈ Basic functionality
integrated into
CDD Vault
◈ ELN users can
insert an assay
◈ Annotations are
semantic, like
standalone BAE
22
Integration
◈ Basic functionality
integrated into
CDD Vault
◈ ELN users can
insert an assay
◈ Annotations are
semantic, like
standalone BAE
22
Acknowledgments
◈ BioAssay Express Team
⇨ Peter Gedeck
⇨ Hande Küçük
⇨ Barry Bunin
⇨ Jason Harris
⇨ Charlie Read
⇨ Samantha Jeschonek
23
◈ Collaborators
⇨ René Witte & Bahar
Sateli (Knowlet)
⇨ Julia Davies (UCSF)
⇨ Mark Musen & John
Graybeal (Stanford)
⇨ Stephan Schürer (U.
of Miami)◈ More information
http://www.bioassayexpress.com
http://collaborativedrug.com
Funding
NIH SBIR

Mais conteúdo relacionado

Mais procurados

Mais procurados (6)

Accurate detection of low frequency genetic variants using novel, molecular t...
Accurate detection of low frequency genetic variants using novel, molecular t...Accurate detection of low frequency genetic variants using novel, molecular t...
Accurate detection of low frequency genetic variants using novel, molecular t...
 
Gold on paper-paper platform for Au-nanoprobe TB detection
Gold on paper-paper platform for Au-nanoprobe TB detectionGold on paper-paper platform for Au-nanoprobe TB detection
Gold on paper-paper platform for Au-nanoprobe TB detection
 
DNA_Services
DNA_ServicesDNA_Services
DNA_Services
 
Rt pcr
Rt pcrRt pcr
Rt pcr
 
Introduction to High Resolution Melt Analysis
Introduction to High Resolution Melt AnalysisIntroduction to High Resolution Melt Analysis
Introduction to High Resolution Melt Analysis
 
Pcr presentation
Pcr presentationPcr presentation
Pcr presentation
 

Semelhante a Bringing bioassay protocols to the world of informatics, using semantic annotations

Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAlex Clark
 
Isolation of rhizobium species from soil and to
Isolation of rhizobium species from soil and toIsolation of rhizobium species from soil and to
Isolation of rhizobium species from soil and totusha madan
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 
Unit 2 Star Activity.pdf
Unit 2 Star Activity.pdfUnit 2 Star Activity.pdf
Unit 2 Star Activity.pdfKhushiDuttVatsa
 
Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Jane Landolin
 
Microarray validation
Microarray validationMicroarray validation
Microarray validationElsa von Licy
 
Gene Expression Analysis by Real Time PCR
Gene Expression Analysis by Real Time PCRGene Expression Analysis by Real Time PCR
Gene Expression Analysis by Real Time PCRSuresh Antre
 
Cox1998-Automated_RNA_selection.
Cox1998-Automated_RNA_selection.Cox1998-Automated_RNA_selection.
Cox1998-Automated_RNA_selection.J. Colin Cox
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
 
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...QIAGEN
 
208 1st lecture
208 1st lecture208 1st lecture
208 1st lectureBulger208
 
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...Andor Kiss
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingIntegrated DNA Technologies
 
Barcode Data Standards
Barcode Data Standards Barcode Data Standards
Barcode Data Standards Leigh Peele
 

Semelhante a Bringing bioassay protocols to the world of informatics, using semantic annotations (20)

Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocols
 
Isolation of rhizobium species from soil and to
Isolation of rhizobium species from soil and toIsolation of rhizobium species from soil and to
Isolation of rhizobium species from soil and to
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
Unit 2 Star Activity.pdf
Unit 2 Star Activity.pdfUnit 2 Star Activity.pdf
Unit 2 Star Activity.pdf
 
Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121
 
Microarray validation
Microarray validationMicroarray validation
Microarray validation
 
Gene Expression Analysis by Real Time PCR
Gene Expression Analysis by Real Time PCRGene Expression Analysis by Real Time PCR
Gene Expression Analysis by Real Time PCR
 
CHEM3204_PRAC_Manual_2016
CHEM3204_PRAC_Manual_2016CHEM3204_PRAC_Manual_2016
CHEM3204_PRAC_Manual_2016
 
Cox1998-Automated_RNA_selection.
Cox1998-Automated_RNA_selection.Cox1998-Automated_RNA_selection.
Cox1998-Automated_RNA_selection.
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
 
PCR lecture.ppt
PCR lecture.pptPCR lecture.ppt
PCR lecture.ppt
 
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...
Challenges of FFPE Sample Materials – Where Does Variation in Quantity of Pur...
 
208 1st lecture
208 1st lecture208 1st lecture
208 1st lecture
 
Group b
Group bGroup b
Group b
 
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
 
Pcr new
Pcr newPcr new
Pcr new
 
Thesis biobix
Thesis biobixThesis biobix
Thesis biobix
 
Dario Lijtmaer - PCR Amplification
Dario Lijtmaer - PCR AmplificationDario Lijtmaer - PCR Amplification
Dario Lijtmaer - PCR Amplification
 
Barcode Data Standards
Barcode Data Standards Barcode Data Standards
Barcode Data Standards
 

Mais de Alex Clark

Mixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsMixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsAlex Clark
 
Mixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsMixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsAlex Clark
 
Mixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsMixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsAlex Clark
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsAlex Clark
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)Alex Clark
 
Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Alex Clark
 
ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)Alex Clark
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsAlex Clark
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...Alex Clark
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay ExpressAlex Clark
 
SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?Alex Clark
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsAlex Clark
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsAlex Clark
 
Green chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designGreen chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designAlex Clark
 
ICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookAlex Clark
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Alex Clark
 
Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Alex Clark
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Alex Clark
 
Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealAlex Clark
 

Mais de Alex Clark (20)

Mixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsMixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicals
 
Mixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsMixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream products
 
Mixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsMixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informatics
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer products
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)
 
Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...
 
ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informatics
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay Express
 
SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithms
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile apps
 
Green chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designGreen chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by design
 
ICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab Notebook
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
 
Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
 
Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health Montreal
 

Último

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 

Último (20)

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

Bringing bioassay protocols to the world of informatics, using semantic annotations

  • 1. Bringing bioassay protocols to the world of informatics, using semantic annotations Alex M. Clark
  • 2. SAR ◈ The activity column: ▷ TCruzi (chagas) ▷ IC50 ▷ nM ◈ Underdefined in a typical SDfile or table ◈ Only alternative is a verbose publication 2
  • 3. Assay Informatics ◈ Measurement of an activity has many details: ▷ target, organism, cell line, measurement type, units, experimental design, instrumentation, etc. ◈ Usually only available as text... ◈ ... needs to be informatics. ◈ Searching for experiments, comparing datasets, spotting trends, evaluating completeness 3 Cell-Free Homogeneous Primary HTS to Identify Inhibitors of GSK3beta Activity (1) Dispense 1 uL/well of CABPE, 0.5 uL of ATP, and 1 uL of positive control GW8510 or AB in respective wells according to plate design to 1536-well assay ready plates (Aurora 29847) that contain 2.5 nL/well of 10 mM compound using BioRAPTR (Beckman) to start the reaction. Incubate at room temperature for 60 minutes. (2) Add 2.5 uL/well of ADP-glo (Promega, V9103) with BioRAPTR, incubate at room temperature for 40 minutes (3) Add 5 uL/well of ADP-glo (Promega, V9103) with Combi nL (Thermo), incubate at room temperature for 30 minutes
  • 4. Assay Informatics ◈ Measurement of an activity has many details: ▷ target, organism, cell line, measurement type, units, experimental design, instrumentation, etc. ◈ Usually only available as text... ◈ ... needs to be informatics. ◈ Searching for experiments, comparing datasets, spotting trends, evaluating completeness 3 Cell-Free Homogeneous Primary HTS to Identify Inhibitors of GSK3beta Activity (1) Dispense 1 uL/well of CABPE, 0.5 uL of ATP, and 1 uL of positive control GW8510 or AB in respective wells according to plate design to 1536-well assay ready plates (Aurora 29847) that contain 2.5 nL/well of 10 mM compound using BioRAPTR (Beckman) to start the reaction. Incubate at room temperature for 60 minutes. (2) Add 2.5 uL/well of ADP-glo (Promega, V9103) with BioRAPTR, incubate at room temperature for 40 minutes (3) Add 5 uL/well of ADP-glo (Promega, V9103) with Combi nL (Thermo), incubate at room temperature for 30 minutes http://www.bioassayontology.org/bao#BAO_0002909 http://www.drugtargetontology.org/dto/DTO_03300101 http://purl.obolibrary.org/obo/NCBITaxon_9606
  • 5. The Mission 4 ◈ To make the world's bioassay protocol data machine readable ◈ Much progress toward making this goal an actual reality... ◈ With the help of ontology markup: ▷ BAO, DTO, CLO, GO, UO, UniProt...
  • 6. 5 Ontologies BAO, DTO, CLO, UO, UniProtLegacy Data PubMed PubChem MLPCN Tox21 ChEMBL Common Assay Template Fully Annotated Assays New Data ELN ML
  • 7. Common Assay Template ◈ A recipe for how to use ontologies ◈ Capture most important information for most assays 6
  • 8. Legacy Curation ◈ Text-centric data: millions of assays... ▷ private collections (ELNs) ▷ published literature (PubMed) ▷ assay databases (PubChem) ◈ Most accessible sources: MLPCN, Tox21 ◈ Data comes in as plain text + several fields + SAR 7
  • 9. Hybrid machine learning 1. Text extraction 2. Probabilistic: ▷ natural language models ▷ correlation models 3. Axiomatic rules ◈ Each of these accelerates the human curator 8
  • 10. Text extraction ◈ Optimised for low false positives 9 qHTS Assay for Small Molecule Inhibitors of the Human hERG Channel Activity NIH Chemical Genomics Center [NCGC] National Institutes of Environmental Health Sciences [NIEHS] National Toxicology Program [NTP] NCGC Assay Overview: We have developed a 1536-well cell-based assay for quantitative high throughput screening (qHTS) to determine in vitro hERG channel blockage as a measure of cardio toxicity with small molecules. This particular assay uses the U2OS cell line which is derived from human osteosarcoma. ...
  • 11. Text extraction ◈ Optimised for low false positives 9 qHTS Assay for Small Molecule Inhibitors of the Human hERG Channel Activity NIH Chemical Genomics Center [NCGC] National Institutes of Environmental Health Sciences [NIEHS] National Toxicology Program [NTP] NCGC Assay Overview: We have developed a 1536-well cell-based assay for quantitative high throughput screening (qHTS) to determine in vitro hERG channel blockage as a measure of cardio toxicity with small molecules. This particular assay uses the U2OS cell line which is derived from human osteosarcoma. ...
  • 12. Natural language models ◈ Use NLP to create fingerprints for each text document ◈ One Bayesian model for each possible annotation term 10 "(VBN performed)" "(NP (JJ 20-nL))" "(NP (DT this) (NN screening))" "(NN DMSO)" "(NN confirmation)" "(NP (DT a) (NN solution))" "(NNP NADP)" ... Text NLP assay format →organism-based format bao:BAO_0000218 01101010000... 10001010100... 01010101010... 10101110010... 11001010111... ... fingerprints target →transcription factor bao:BAO_0000282 ...
  • 13. Correlation models ◈ Similar principle: existing annotations are fingerprints 11 one model per term ◈ Combine natural language + correlation = ranking
  • 15. Metrics ◈ NLP only: 96.9% ± 4.4 12
  • 16. Metrics ◈ NLP only: 96.9% ± 4.4 12
  • 17. Metrics ◈ NLP only: 96.9% ± 4.4 12 ◈ + correlation: 97.5% ± 3.8
  • 18. Metrics ◈ NLP only: 96.9% ± 4.4 12 ◈ + correlation: 97.5% ± 3.8 ◈ + text mining: 97.6% ± 3.8
  • 19. Axioms ◈ Ontologies like BAO & DTO have axioms ◈ Convert these into rules: 13 subject impact limit / exclude ◈ Use to constrain predictions at each step ◈ Like correlation models, except absolute
  • 20. New data: ELN ◈ Paste text or drag a document... 14
  • 21. New data: ELN ◈ Paste text or drag a document... 14
  • 22. New data: ELN ◈ Paste text or drag a document... 14 ◈ Or re-use...
  • 23. Autogenerated text ◈ Annotations-to-text is much easier ◈ Transliteration template: 15
  • 24. Extending ontologies ◈ Science is never complete: diction evolves ◈ Curators can request a provisional term 16
  • 25. Extending ontologies ◈ Science is never complete: diction evolves ◈ Curators can request a provisional term 16
  • 26. Upgrading ontologies ◈ Next step: submit provisional terms to OntoloBridge ◈ Joint collaboration between CDD and: ▷ Stanford (BioPortal, CEDAR) ▷ University of Miami (BAO, DTO) ◈ Ontology owner receives new additions via API ◈ Experts vet all new terms, outcome made available ◈ Users of BAE are unimpeded... 17
  • 27. Detailed templates ◈ The Common Assay Template is a summary ◈ Can insert extra branch groups ◈ Ultimate goal: ▷ capture whole experiment ▷ replace text entirely ◈ Syncing with ontology maintainers... 18 ★measurement • measurement details ★assay components • buffer concentration • carrier protein concentration • chelator concentration • detergent concentration • metal salt concentration • reducing agent concentration • solvent concentration ★control details ★organism details • cell line details • microbe/virus details ★protocol details • antibody details • assay molecule • biochemical assay • coupled substrate incubation • enzyme reaction • ligand incubation • perturbagen incubation • substrate incubation ★screened entity details
  • 28. Content expansion ◈ Public data originally from PubChem subset: ▷ MLPCN: thousands of screening assays ▷ Tox21: hundreds of assays from EPA ◈ Detailed text, and compounds + measurements (SAR) ◈ Done most of them: where's the next goldmine? 19
  • 29. PubMed ◈ PubMed is massive (1.7M open access), but: ▷ only some have relevant assays ▷ no compounds or other useful metadata ◈ Current project: ▷ rank assays based on likelihood of assay description ▷ pick out relevant text ◈ Hundreds of thousands of candidate assays for BAE ◈ Overlap with ChEMBL: ~600 (1.7M ∩ 70K) - metadata, SAR 20
  • 30. Crowdsourcing ◈ Anyone can authenticate on beta.bioassayexpress.com 21
  • 31. Crowdsourcing ◈ Anyone can authenticate on beta.bioassayexpress.com 21
  • 32. Integration ◈ Basic functionality integrated into CDD Vault ◈ ELN users can insert an assay ◈ Annotations are semantic, like standalone BAE 22
  • 33. Integration ◈ Basic functionality integrated into CDD Vault ◈ ELN users can insert an assay ◈ Annotations are semantic, like standalone BAE 22
  • 34. Acknowledgments ◈ BioAssay Express Team ⇨ Peter Gedeck ⇨ Hande Küçük ⇨ Barry Bunin ⇨ Jason Harris ⇨ Charlie Read ⇨ Samantha Jeschonek 23 ◈ Collaborators ⇨ René Witte & Bahar Sateli (Knowlet) ⇨ Julia Davies (UCSF) ⇨ Mark Musen & John Graybeal (Stanford) ⇨ Stephan Schürer (U. of Miami)◈ More information http://www.bioassayexpress.com http://collaborativedrug.com Funding NIH SBIR