
CINF 24: BioAssay express: Creating and
exploiting assay metadata
collaborativedrug.com
bioassayexpress.com
bae@collaborativedrug.com
Philip Cheung
American Chemical Society
Sunday, August 25, 2019 - 3:10 PM -- Grand Ballroom A, Omni San Diego Hotel
A little bit about me…
Graduated from Harvey Mudd College in 1996, B.S. in Biology
SAIC – 1996 – 1997
High Throughput Robotics Systems For ISB (Leroy Hood)
Orillion 1997 – 1999
WorldCom MCI / PlayStation 2 Event Management System
Medibuy.com 1999 – 2001
Global Health Exchange Between Premier Hospital Group
Pfizer Global Research and Development 2001 – 2009
Pfizer Global Crystal Structure Database
Oncology Project Support (Computational Biology)
Ophthalmology Indication Discovery (Machine Learning)
Dart NeuroScience (2009-2018)
Bioinformatics Group Leader
Independent Consultant (2018-Present)
Currently supporting multiple informatics / bioinformatics companies in San
Francisco, Boston, and San Diego
So what is assay informatics and why is it
exciting?
The IDEAL Cycle of Assay Management
• Plan experiments & capture ideas
• Perform experiments & capture data
• Analyze data & identify trends
• Store & protect the results
• Retrieve data & build knowledge -
across Concepts / Time / Projects
The REALISTIC Cycle of Assay Management
• Plan experiments & capture ideas
• Perform experiments & capture data
• Analyze data & identify trends
• Store & organize the results
• Retrieve data & build knowledge –
across Concepts / Time / Projects
• Post-It Note Edits & Lost Attributions
• Incomplete “Data Dump” & Lost Data
• Siloed data & Incomplete information
• Lost & non-reproducible data (crisis!)
• Inaccessible & unusable data leading to…
TIME WASTED & OPPORTUNITIES LOST!
• No Common Vocabulary
• Limited Assay Mining/Searching
Capabilities
Barrier to Collaboration
Failure to Provide Insight
Even “Best Case” Assay Management is Inefficient
Efficiently & Quickly
Organize Assay Data
Machine Readable Format
Common Vocabulary through
Ontology Markup
Introduces Assay Informatics,
Providing New Insight by
Querying Biological, Chemical,
and Assay Metadata
BioAssay Express Leads the Field of Assay Informatics
Capture Metadata Unambiguously with BioAssay Express
Ambiguous, Unstructured:
Cytoxicity test; incubate hek cells for 24hrs. Add 1 compound from library (tox 21); triplicate. 16 pt dilution. Inc. 48hrs at 37C. Standard kit protocol using thermo Glo kit. Fluorescence in triplicate. Report IC50
Ambiguous, Structured:
Cell Line: HEK
Mode of Action: Cytoxcity
Assay Kit: GLO (Thermo)
Detection Method: Fluorescence
Detection Instrument:
Perturbagen Type: Tox 21
Result: EC50
Units:
Unambiguous, Structured:
Cell Line: HEK-293T Cell [BTO_0002181]
Mode of Action: Cytotoxicity [BAO_0000090]
Assay Kit: CytoTox-Glo (Promega) [BAO_0140009]
Detection Method: Luminescence [BAO_0000045]
Detection Instrument: EnVision Multilabel Reader [BAO_0000701]
Perturbagen Type: Tox 21 Compound Library [BAO_0070002]
Result: IC50 [BAO_0000190]
Units: nanomolar [UO_0000065]
Problems flagged in the ambiguous record: Missing / Wrong / Incomplete / Spelling
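To make the contrast between the two structured records concrete, here is a minimal Python sketch. It assumes each annotation is stored as a (label, ontology term ID) pair; the field list and term IDs are copied from the slide, while the `audit` helper and the dictionary layout are hypothetical and are not BioAssay Express's actual schema.

```python
# Field names as listed on the slide; the data layout itself is an assumption.
REQUIRED_FIELDS = [
    "Cell Line", "Mode of Action", "Assay Kit", "Detection Method",
    "Detection Instrument", "Perturbagen Type", "Result", "Units",
]

# Each value is (label, ontology term ID); None means no term was assigned.
ambiguous = {
    "Cell Line": ("HEK", None),
    "Mode of Action": ("Cytoxcity", None),
    "Assay Kit": ("GLO (Thermo)", None),
    "Detection Method": ("Fluorescence", None),
    "Perturbagen Type": ("Tox 21", None),
    "Result": ("EC50", None),
}

unambiguous = {
    "Cell Line": ("HEK-293T Cell", "BTO_0002181"),
    "Mode of Action": ("Cytotoxicity", "BAO_0000090"),
    "Assay Kit": ("CytoTox-Glo (Promega)", "BAO_0140009"),
    "Detection Method": ("Luminescence", "BAO_0000045"),
    "Detection Instrument": ("EnVision Multilabel Reader", "BAO_0000701"),
    "Perturbagen Type": ("Tox 21 Compound Library", "BAO_0070002"),
    "Result": ("IC50", "BAO_0000190"),
    "Units": ("nanomolar", "UO_0000065"),
}


def audit(record):
    """Return (missing fields, fields present but lacking an ontology term)."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    unmapped = [f for f in REQUIRED_FIELDS
                if f in record and record[f][1] is None]
    return missing, unmapped


print(audit(ambiguous))
print(audit(unambiguous))
```

A check like this only catches missing fields and free-text values; wrong values (for example, a fluorescence readout recorded for a luminescence kit) still need a curator or a model to spot.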
BioAssay Express Enables True Assay Informatics
Metadata
Assays
That seems like a lot of work. Can Machine
Learning help?
BioAssay Express - In Action!
Assay Registration is a Breeze!
• Insert Protocol
• Text Mining
• Expert Ontologies
• Predictive Text
• Machine Learning
• NLP
• Correlation Models
• Human Curation
• Accuracy is key
Dose Response assay for agonists of 5-Hydroxytryptamine
(Serotonin) Receptor Subtype 1A (5HT1A)
Assay Description:
Widely expressed in the human brain, 5-hydroxytryptamine (5-HT,
serotonin) receptors have been shown to have an important role in
depression as well as other cognitive and metabolic disorders [1, 2].
Discovering novel modulators of the 5-HT1A serotonin receptor
may not only help probe the function of this receptor, but also help
better understand the complex relationship among the 5-HT
receptor subtypes.
Protocol Summary:
As with the primary HTS assay, a Chinese Hamster Ovary
(CHO) cell line stably transfected with human 5HT1a receptor,
the nuclear factor of activated T-cell-beta lactamase (NFAT-BLA)
reporter construct and the G-alpha-15 promiscuous coupling protein
was used (Invitrogen, part K1083).
Cells were cultured in T-175 sq. cm flasks (Corning, part 431080) at
37 deg C and 95% RH. The assay began by dispensing 10 microliters
of cell suspension to each test well of a 384 well plate.
BioAssay Express: Optimized for Low False Positives
Assay Fingerprints: Bringing Informatics to Metadata
Assay Property Grid (assays versus metadata properties)
• Generate assay fingerprints
• Compare hundreds of assays at a glance
• Find, Share, and Innovate
• Blue boxes = exact match
• Blue lines = match inferred from hierarchy
• Why are my results different from others doing the “same” assay?
• Has anyone studied this disease variant in neurons?
• Did results vary when we switched instrument models?
• Has someone else already done my experiment?
• What other programs have already screened my target? Can I jumpstart my new program with some existing chemistry?
• I want to do some machine learning – can I find some other “appropriate” experiments?
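The difference between an exact match and a match inferred from the hierarchy can be sketched in a few lines of Python. This is only an illustration: the toy `PARENT` table and its term names are made up, and reading "inferred" as "one term is an ancestor of the other" is an assumption about how the grid works.

```python
# Toy child -> parent hierarchy; term names are invented for illustration.
PARENT = {
    "HEK-293T": "HEK-293",
    "HEK-293": "human cell line",
    "CHO-K1": "CHO",
    "CHO": "hamster cell line",
    "luminescence": "optical detection",
    "fluorescence": "optical detection",
}


def ancestors(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def match_kind(a, b):
    """Classify two annotation values as 'exact', 'inferred', or 'none'."""
    if a == b:
        return "exact"
    if a in ancestors(b) or b in ancestors(a):
        return "inferred"
    return "none"


def compare(assay_a, assay_b):
    """Compare two assay fingerprints property by property."""
    shared = sorted(set(assay_a) & set(assay_b))
    return {prop: match_kind(assay_a[prop], assay_b[prop]) for prop in shared}


assay_a = {"cell line": "HEK-293T", "detection": "luminescence", "result": "IC50"}
assay_b = {"cell line": "HEK-293", "detection": "fluorescence", "result": "IC50"}
print(compare(assay_a, assay_b))
```

In this sketch sibling terms (luminescence versus fluorescence) count as no match; a looser rule that accepts any shared ancestor would be an equally defensible choice.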
So, I have a database…
how do I apply this to my data?
So how can I use this technology?
So let’s take a look at the steps required to process a
semi-structured dataset like ClinicalTrials.gov
Structured because there are tables that hold categories of data
Unfortunately, it’s unstructured because the content does not map to an
ontology consistently.
So, how do I FAIR-ify my data?
So how do we go from unstructured to structured data?
Example source: ClinicalTrials.gov
Step 1 – Review the data you’re importing; map your data to ontologies.
Step 2 – Perform the import; some percentage will map perfectly.
Step 3 – Curate the remaining data using BAE’s NLP models.
Step 1 – Map your data to ontologies
Open source BioAssay Template Tool
https://github.com/cdd/bioassay-template
Step 2 – Import: Generate JSON Files
Step 2 – Import: Or import using CSV File
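As a rough illustration of the import step, the sketch below turns CSV rows into JSON records using only the Python standard library. The column names, the `COLUMN_TO_PROPERTY` mapping, the record layout, and the sample IDs are all hypothetical; the slides do not show the actual BAE import format.

```python
import csv
import io
import json

# Hypothetical mapping from CSV columns to annotation properties; in practice
# this would follow the template chosen in Step 1.
COLUMN_TO_PROPERTY = {
    "condition": "disease",
    "intervention": "perturbagen",
}


def csv_to_records(csv_text):
    """Turn each CSV row into a dict with an ID and a list of annotations."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        annotations = [
            {"property": prop, "value": row[col].strip()}
            for col, prop in COLUMN_TO_PROPERTY.items()
            if (row.get(col) or "").strip()  # skip empty or absent cells
        ]
        records.append({"id": row["id"], "annotations": annotations})
    return records


# Made-up rows; the IDs are placeholders, not real study identifiers.
sample = (
    "id,condition,intervention\n"
    "NCT-A,Multiple Myeloma,Celecoxib\n"
    "NCT-B,Malaria,\n"
)
records = csv_to_records(sample)
print(json.dumps(records, indent=2))
```

Empty cells are dropped rather than imported as blank annotations, so they show up later as gaps to curate instead of as misleading values.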
Step 2 – Some percentage will map out of the box
In this example, I imported a small
subset of ClinicalTrials.gov – 18,825 of
the 313,472 (~6%) available studies.
If I were interested in multiple myeloma,
I could write a script that maps
aggressively and assigns all of
these as “multiple myeloma”, or I could
assign only the perfect matches and
let BAE’s NLP help me with the
remaining mappings.
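The trade-off between aggressive and perfect-match mapping can be sketched as follows. The two matching rules and the condition strings are illustrative assumptions, not the script used for the talk; only the study counts come from the slide.

```python
# Study counts quoted on the slide.
imported, available = 18825, 313472
fraction = imported / available  # roughly 0.06, the "~6%" on the slide

TARGET = "multiple myeloma"


def strict_match(text):
    """Accept only a perfect match, ignoring case and surrounding whitespace."""
    return text.strip().lower() == TARGET


def aggressive_match(text):
    """Accept any condition string that mentions the target term."""
    return TARGET in text.lower()


# Made-up condition strings for illustration.
conditions = [
    "Multiple Myeloma",
    "Relapsed Multiple Myeloma",
    "Multiple Myeloma in Relapse",
    "Smoldering multiple myeloma",
    "Malaria",
]

# Perfect matches are mapped automatically; near matches are left for curation.
auto_mapped = [c for c in conditions if strict_match(c)]
needs_curation = [c for c in conditions
                  if aggressive_match(c) and not strict_match(c)]

print(f"{fraction:.1%} of studies imported")
print("auto-mapped:", auto_mapped)
print("needs curation:", needs_curation)
```

Keeping the near matches in a separate queue preserves distinctions (relapsed, smoldering) that an aggressive script would flatten into a single term.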
Step 3 – Machine Assisted Curation
Importing / mapping data allows BAE to build Bayesian models
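The slides say only that the imported and mapped data are used to build Bayesian models. As a generic illustration of the idea, and not BAE's actual model, here is a small multinomial naive Bayes classifier with add-one smoothing that ranks candidate annotations for a new piece of text. The training pairs are made up.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def suggest(text, word_counts, label_counts):
    """Rank labels by naive Bayes log-probability, best first."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return sorted(scores, key=scores.get, reverse=True)


# Made-up curated examples: free text paired with the term a curator chose.
curated = [
    ("relapsed multiple myeloma", "multiple myeloma"),
    ("multiple myeloma in relapse", "multiple myeloma"),
    ("severe malaria", "malaria"),
    ("uncomplicated falciparum malaria", "malaria"),
]
word_counts, label_counts = train(curated)
print(suggest("smoldering multiple myeloma", word_counts, label_counts))
```

Each curated record adds training data, which is why suggestions of this kind would be expected to improve as more of the dataset is mapped.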
Interesting questions you can visualize
What cancer trials was Celecoxib used in? What other combination therapies were also used in those trials?
Interesting questions you can visualize
What new Interventions are being used in Malaria?
We’re picking up steam
* Extensive Proof Of Concept Study
But let’s talk about Robots now!
Current Directions -- Projects
An aspirational goal for our team is to build a metadata schema based on semantic web
vocabularies that is comprehensive to the extent that the text description becomes optional.
There are many challenges involved in creating the ELN-to-robot loop. Here we provide some
insights into our collaborations with UCSF automation experts at the Small Molecule Discovery
Center.
The High level goal – ELN to Robot
GBG by BioSero
Cellario By HiResBio
Adapter/Builder
Director by Wako (Fuji)
- Model the protocols at the step level so
we can export them out to automation
systems
- Import the results back into the system
Simple Serial Protocol
Often they are more complex: branch and merge
Break the protocol down into steps
• Model the Dependencies
  • Equipment dependencies
  • Reagent dependencies
  • Previous step dependencies
• Model the Steps
  • Protocol Steps
Sample Protocol
https://pubchem.ncbi.nlm.nih.gov/bioassay/346#section=Protocol
Plate Layout
Plate Action
(Centrifuge)
Assay Start
(Add Compounds)
Incubation
Readout
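One way to model steps and their dependencies is as a directed acyclic graph, which covers both simple serial protocols and branch-and-merge ones. The sketch below uses the step names listed above and Python's standard `graphlib` module; the linear ordering, the equipment, and the reagents are assumptions for illustration and are not taken from the PubChem protocol.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step names the steps it must follow, plus assumed equipment and reagents.
steps = {
    "Plate Layout": {
        "after": [], "equipment": [], "reagents": ["cells"]},
    "Plate Action (Centrifuge)": {
        "after": ["Plate Layout"], "equipment": ["centrifuge"], "reagents": []},
    "Assay Start (Add Compounds)": {
        "after": ["Plate Action (Centrifuge)"],
        "equipment": ["liquid handler"], "reagents": ["compounds"]},
    "Incubation": {
        "after": ["Assay Start (Add Compounds)"],
        "equipment": ["incubator"], "reagents": []},
    "Readout": {
        "after": ["Incubation"], "equipment": ["plate reader"], "reagents": []},
}


def execution_order(steps):
    """Return an order in which every step follows its predecessors."""
    graph = {name: set(info["after"]) for name, info in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def required(steps, key):
    """Collect the distinct equipment or reagents needed across all steps."""
    return sorted({item for info in steps.values() for item in info[key]})


print(execution_order(steps))
print(required(steps, "equipment"))
print(required(steps, "reagents"))
```

A step with two entries in `after` represents a merge, and two steps sharing a predecessor represent a branch; `TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular, which is a useful sanity check before exporting to an automation system.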
Next Steps
• Continue our work with UCSF with simple proof-of-concept protocols
• Reach out to vendors and see if we can integrate into their simulation platforms
  • Thermo
  • Wako
• Keep reaching out and learning from experts like yourself.
http://www.bioassayexpress.com
Try it out!
• Collaborative Drug Discovery
• Alex Clark
• Hande Kücük McGuinty
• Peter Gedeck
• Samantha Jeschonek
• Barry Bunin
• For more info
• bae@collaborativedrug.com http://www.bioassayexpress.com
Smart Drug Discovery Software Saves Scientists Time
Session: CINF: Sci-Mix
Location: Exhibit Hall B
Date & Time: Monday, Aug 26, 8:00 PM

BioAssay Express: Creating and exploiting assay metadata

  • 1.  CINF 24: BioAssay express: Creating and exploiting assay metadata collaborativedrug.com bioassayexpress.com bae@collaborativedrug.com Philip Cheung American Chemical Society Sunday, August 25, 2019 - 3:10 PM -- Grand Ballroom A, Omni San Diego Hotel
  • 2. A little bit about me… Graduated Harvey Mudd College in 1996 B.S. in Biology SAIC – 1996 – 1997 High Throughput Robotics Systems For ISB (Leroy Hood) Orillion 1997 – 1999 WorldCom MCI / PlayStation 2 Event Management System Medibuy.com 1999 – 2001 Global Health Exchange Between Premier Hospital Group Pfizer Global Research and Development 2001 – 2009 Pfizer Global Crystal Structure Database Oncology Project Support (Computational Biology) Ophthalmology Indication Discovery (Machine Learning) Dart NeuroScience (2009-2018) Bioinformatics Group Leader Independent Consultant (2018-Present) Currently Support multiple informatics / Bioinformatics companies in San Francisco, Boston, and San Diego
  • 3. So what is assay informatics and why is it exciting?
  • 4. The IDEAL Cycle of Assay Management • Plan experiments & capture ideas • Perform experiments & capture data • Analyze data & identify trends • Store & protect the results • Retrieve data & build knowledge - across Concepts / Time / Projects
  • 5. The REALISTIC Cycle of Assay Management • Plan experiments & capture ideas • Perform experiments & capture data • Analyze data & identify trends • Store & organize the results • Retrieve data & build knowledge – across Concepts / Time / Projects • Post-It Note Edits & Lost Attributions • Incomplete “Data Dump” & Lost Data • Siloed data & Incomplete information • Lost & non-reproducible data (crisis!) • Inaccessible & unusable data leading to… TIME WASTED & OPPORTUNIES LOST!
  • 6. • No Common Vocabulary • Limited Assay Mining/Searching Capabilities Barrier to Collaboration Failure to Provide Insight Even “Best Case” Assay Management is Inefficient
  • 7. Efficiently & Quickly Organize Assay Data Machine Readable Format Common Vocabulary through Ontology Markup Introduces Assay Informatics, Providing New Insight by Querying Biologic, Chemical, and Assay Meta Data BioAssay Express Leads the Field of Assay Informatics
  • 8. Capture Metadata Unambiguously with BioAssay Express Ambiguous Unstructured Cytoxicity test; incubate hek cells for 24hrs. Add 1 compound from library (tox 21); triplicate. 16 pt dilution. Inc. 48hrs at 37C. Standard kit protocol using thermo Glo kit. Fluorescence in triplicate. Report IC50 Ambiguous Structured Unambiguous Structured Cell Line: Mode of Action: Assay Kit: Detection Method: Detection Instrument: Perturbagen Type: Result: Units: HEK Cytoxcity GLO (Thermo) Fluorescence Tox 21 EC50 Cell Line: Mode of Action: Assay Kit: Detection Method: Detection Instrument: Perturbagen Type: Result: Units: HEK-293T Cell [BTO_0002181] Cytoxicity [BAO_0000090] CytoTox-Glo (Promega) [BAO_0140009] Luminescence [BAO_0000045] EnVision Multilabel Reader [BAO_0000701] Tox 21 Compound Library [BAO_0070002] IC50 [BAO_0000190] nanomolar [UO_0000065] Missing WrongIncompleteSpelling
  • 9. BioAssay Express Enables True Assay Informatics Metadata Assays
  • 10. That seems like a lot of work? Can Machine Learning help?
  • 11. BioAssay Express - In Action! Assay Registration is a Breeze! • Insert Protocol • Text Mining • Expert Ontologies • Predictive Text • Machine Learning • NLP • Correlation Models • Human Curation • Accuracy is key
  • 12. Dose Response assay for agonists of 5-Hydroxytryptamine (Serotonin) Receptor Subtype 1A (5HT1A) Assay Description: Widely expressed in the human brain, 5-hydroxytryptamine (5-HT, serotonin) receptors have been shown to have an important role in depression as well as other cognitive and metabolic disorders [1, 2]. Discovering novel modulators of the 5-HT1A serotonin receptor may not only help probe the function of this receptor, but also help better understand the complex relationship among the 5-HT receptor subtypes. Protocol Summary: As with the primary HTS assay, a Chinese Hamster Ovary (CHO) cell line stably transfected with human 5HT1a receptor, the nuclear factor of activated T-cell-beta lactamase (NFAT-BLA) reporter construct and the G-alpha-15 promiscuous coupling protein was used (Invitrogen, part K1083). Cells were cultured in T-175 sq. cm flasks (Corning, part 431080) at 37 deg C and 95% RH. The assay began by dispensing 10 microliters of cell suspension to each test well of a 384 well plate. BioAssay Express: Optimized for Low False Positives
  • 13. Assay Fingerprints: Bringing Informatics to Metadata AssayMetadata Assays  Assay Property Grid • Generate assay fingerprints • Compare hundreds of assays at a glance • Find, Share, and Innovate  Blue boxes = exact match - Blue lines = match inferred from hierarchy • Why are my results different from others doing the ”same” assay? • Has anyone studied this disease variant in neurons? • Did results vary when we switched instrument models? • Has someone else already done my experiment • What other programs have already screened my target? Can I jumpstart my new program with some existing chemistry? • I want to do some machine learning – can I find some other “appropriate” experiments.
  • 14. So, I have a database… how do I this apply to my data?
  • 15. So how can I use this technology? So let’s take a look at the steps required to process a semi-structured dataset like clinicaltrials.gov
  • 16. Structured because there are tables that hold categories of data
  • 17. Unfortunately, it's unstructured because the content does not map to an ontology consistently.
  • 18. So, how do I FAIR-ify my data?
  • 19. So how do we go from unstructured to structured data? Step 1 – Review the data you’re importing; Map your data to ontologies ClinicalTrials.gov Step 2 –Perform the import; some percentage will map perfectly. Step 3 -- Curate the remaining data using the BAE’s NLP models.
  • 20. Step 1- Map your data to ontologies Open source BioAssay Template Tool https://github.com/cdd/bioassay-template
  • 21. Step 2- Import: Generate JSON Files
  • 22. Step 2- Import: Or import using CSV File
  • 23. Step 2- Some percentage will map out of the box In this example, I imported a small subset of clinicaltrials.gov: 18825 of the 313472 (~6%) available studies. If I were interested in multiple myeloma, I could write a script that maps aggressively and assigns all of these as "multiple myeloma", or I could assign only the perfect matches and let BAE's NLP help me with the remaining mappings.
  • 24. Step 3 – Machine Assisted Curation Importing / Mapping data allows BAE to build Bayesian Models
  • 25. Interesting questions you can visualize What cancer trials was Celecoxib used in? What other combination therapies were also used in those?
  • 26. Interesting questions you can visualize What new Interventions are being used in Malaria?
  • 27. Interesting questions you can visualize What new Interventions are being used in Malaria?
  • 28. Interesting questions you can visualize What new Interventions are being used in Malaria?
  • 29. We’re picking up steam * Extensive Proof Of Concept Study
  • 30. But let’s talk about Robots now!
  • 31. Current Directions -- Projects An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional There are many challenges involved in creating the ELN-to-robot loop, here we provide some insights into our collaborations with UCSF automation experts at the Small Molecule Discovery Center.
  • 32. The High level goal – ELN to Robot GBG by BioSero Cellario By HiResBio Adapter/Builder Director by Wako (Fuji) - Model the protocols at the step level so we can export them out to automation systems - Import the results back into the system
  • 34. Often times they are more complex -- branch and merge
  • 35. Break the protocol down into steps • Model the Dependencies • Equipment dependencies • Reagent dependencies • Previous step dependencies • Model the Steps • Protocol Steps
  • 36. Sample Protocol https://pubchem.ncbi.nlm.nih.gov/bioassay/346#section=Protocol Plate Layout Plate Action (Centrifuge) Assay Start (Add Compounds) Incubation Readout
  • 37. Sample Protocol https://pubchem.ncbi.nlm.nih.gov/bioassay/346#section=Protocol Plate Layout Plate Action (Centrifuge) Assay Start (Add Compounds) Incubation Readout
  • 40. Next Steps • Continue our work with UCSF with simple proof of concept protocols • Reach out to vendors and see if we can integrate into their simulation platforms • Thermo • Wako • Keep reaching out and learning from experts like yourself. http://www.bioassayexpress.com
  • 41. Try it out! • Collaborative Drug Discovery • Alex Clark • Hande Kücük McGuinty • Peter Gedeck • Samantha Jeschonek • Barry Bunin • For more info • bae@collaborativedrug.com http://www.bioassayexpress.com
  • 42. Smart Drug Discovery Software Saves Scientists Time Session: CINF: Sci-Mix Location: Exhibit Hall B, Date & Time: Monday, Aug 26 8:00 PM
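
The import steps on slides 19 to 24 (map to ontologies, import, then curate what is left) can be sketched roughly as below. This is only an illustration of the "assign only the perfect matches" option from slide 23, not BAE's actual import code: the lookup table, the `EX:` term IDs, and the function names are all hypothetical placeholders.

```python
# Hypothetical lookup: normalized label or synonym -> ontology term ID.
# The "EX:" IDs are placeholders, not real ontology identifiers.
TERM_LOOKUP = {
    "multiple myeloma": "EX:0001",
    "plasma cell myeloma": "EX:0001",
    "malaria": "EX:0002",
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def map_exact(values, lookup):
    """Assign a term only on a perfect (normalized) match.

    Returns (mapped, unmapped); the unmapped values are the ones that
    would be left for machine-assisted curation in Step 3.
    """
    mapped = {}
    unmapped = []
    for value in values:
        term = lookup.get(normalize(value))
        if term is None:
            unmapped.append(value)
        else:
            mapped[value] = term
    return mapped, unmapped


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


conditions = [
    "Multiple Myeloma",
    "Relapsed multiple myeloma",  # not a perfect match, so left for curation
    "Malaria",
    "  plasma cell   myeloma ",
]
mapped, unmapped = map_exact(conditions, TERM_LOOKUP)
imported_share = percent(18825, 313472)  # the subset quoted on slide 23
```

An "aggressive" script would instead match on substrings (so "Relapsed multiple myeloma" would also map), trading curation effort for a higher risk of wrong assignments.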

Editor's Notes

  1. At the outset, assay management seems like it should be straightforward. You plan an experiment and capture ideas, perform the experiment and capture data, analyze and identify trends, store and protect the results, and then retrieve the data to build knowledge.
  2. But in reality, what happens is different. The best plans turn into Post-it note edits and lost attribution, and incomplete data dumps lead to results with missing information, leaving data that is lost and non-reproducible. Failure to capture assays correctly in fact contributes greatly to the scientific reproducibility crisis. And then when you try to retrieve data you're left frustrated, either because you can't find the data, or because you do and it's not descriptive, essentially making it unusable. This just leads to wasted time and lost opportunities.
  3. But even the best-case scenario for assay management is inefficient. It lacks a common vocabulary between scientists (groups or organizations), and there's a limited ability to mine your assays or search efficiently.
  4. Plus, BioAssay Express was built with "FAIR" data in mind, and has scored high marks in that arena. It is a lofty goal, but by capturing assay metadata in an organized, machine-readable, and ontology-driven format, we hope that our software product helps reduce the reproducibility crisis and streamlines assay optimization.
  5. In essence, we provide a solution that turns this human-readable, unorganized text into a structured, machine-readable format, allowing you to take full advantage of your assay metadata for searching and comparisons across all assays.
  6. Now the real beauty of BAE is how convenient it is to use. Annotation of an entire protocol can occur in a matter of minutes, and to demonstrate that I'm going to show you a real-time annotation. An assay is simply pasted into the text box. We click "request suggestions", and almost immediately these annotation fields begin to populate. First the text is mined for key terms, shown in green. Then the other fields begin to populate via predictive text. Using that hybrid machine learning approach, based on the terms entered, BAE predicts associations and suggests additional terms. But we also allow manual curation review. Accuracy is key in the field of assay informatics, and as such we did not want to rely solely on a text mining approach. The hybrid machine learning greatly accelerates human curation.
  7. Let's take a closer look at the text mining aspect. You'll see that the fields from text mining are shown in green, and match exact phrases or synonyms from the full protocol text. In gray are our predictive text functions, and this AI learns and grows with your personal data (of course privately and securely maintained). One easy area to think about this is in the field of detection methods. We can predict that certain assay kits will require a particular type of physical detection or readout, and that that detection will require a particular instrument reader.
  8. Instead of just a list of similar assays, we visually provide you with all assay metadata. You can compare hundreds of assays at a glance. Here, we sorted approx. 4000 assays down to about 40 in less than a minute. At the top are your assays that match the search terms and on the right are your metadata fields. Blue boxes = match, blue lines = match inferred by the hierarchy. Once you create these assay fingerprints you're able to begin quickly querying your data and using it to probe new areas of research. You might use this to find out why your assay results are different from others doing the same assay (sometimes it is as simple as a different detection instrument being used). We might ask to retrieve any data done with a particular kit, or on a particular target. We can easily ask what studies are being done on that new favorite protein that came out in the mass spec screen, or inquire about diversifying our assays (avoiding assay bias). And because we're capturing this data with a common language, in an easy-to-compare way, we can start to reduce the errors in reproducibility caused by incomplete assay reporting.
  9. One of my favorite aspects of BAE, and its real beauty and power, comes from our assay fingerprint. Across the top are all those assays matching your previous search criteria, and on the left are all of those annotation terms. Blue boxes represent the presence of a term and blue lines indicate a presence inferred by the hierarchy. Suddenly that assay metadata becomes a tool to power and query your results. Why are two similar assays giving different results? Why isn't a particular hit present in one assay? Sometimes that answer is as simple as a different cell line being used, or a different detection instrument with a varying level of sensitivity. But the answers SHOULD come that quickly; assay optimization or comparison shouldn't require hours of literature comparison or holding up dozens of notebook entries. It should be quick, painless, and informative. And that's what we hope BAE accomplishes.
  20. CDD Vault is a software platform for your secure data management and registration needs, able to capture all kinds of data, from numeric assay data to biological images. Visit my colleague Janice Darlington at Poster XY for more information.
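
The fingerprint grid described in the notes above (blue boxes for a match, blue lines for a match inferred by the hierarchy) can be illustrated with a minimal sketch. It is not BAE's implementation: the toy term hierarchy, the assay names, and the function names are hypothetical, and it assumes that "inferred" means the assay carries a more specific term that sits below the queried term in the hierarchy.

```python
# Hypothetical toy hierarchy: child term -> parent term.
PARENT = {
    "CHO cell": "mammalian cell line",
    "HEK293 cell": "mammalian cell line",
    "mammalian cell line": "cell line",
}


def ancestors(term, parent):
    """Return every term above `term` in the hierarchy, nearest first."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def match_kind(query, annotations, parent):
    """Classify one grid cell as 'exact', 'inferred', or 'none'."""
    if query in annotations:
        return "exact"  # drawn as a blue box
    for annotation in annotations:
        if query in ancestors(annotation, parent):
            return "inferred"  # drawn as a blue line
    return "none"


def fingerprint(assays, queries, parent):
    """Build one row of match kinds per assay, one column per query term."""
    return {
        name: [match_kind(q, annotations, parent) for q in queries]
        for name, annotations in assays.items()
    }


assays = {
    "assay A": {"CHO cell", "luminescence"},
    "assay B": {"fluorescence"},
}
queries = ["CHO cell", "mammalian cell line", "fluorescence"]
grid = fingerprint(assays, queries, PARENT)
```

Here "assay A" matches "mammalian cell line" only through the hierarchy, because it is annotated with the more specific "CHO cell".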