The challenge of accurately characterizing bioassays is a real pain point for many drug discovery organizations. Research has shown that some organizations have legacy assay collections exceeding 20,000 protocols, the great majority of which are not accurately characterized. This problem is compounded by the fact that many new protocol registrations are still not following FAIR (Findability, Accessibility, Interoperability, and Reusability) Data principles.
BioAssay Express is a tool focused on transforming the traditional protocol description from unstructured free-form text into a well-curated data store based on FAIR Data principles. By using well-defined annotations for assays, the tool enables precise ontology-based searches without resorting to imprecise keyword searches.
This talk explores several important new features designed to help scientists accelerate the drug discovery process. Example use cases include enabling drug repositioning projects, improving SAR models, identifying appropriate machine learning data sets, and fine-tuning integrative-omic pathways.
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional. One of the many possibilities is to take the initial prospective ELN entry for a bioassay protocol and feed it directly to an automated instrument. While there are many challenges involved in creating the ELN-to-robot loop, we will provide some insights into our collaborations with UCSF automation experts.
In summary, the ability to quickly and accurately search or analyze bioassay data (public or internal) is a rate-limiting problem in drug discovery. We will present the latest developments toward removing this bottleneck.
https://plan.core-apps.com/acs_sd2019/abstract/6f58993d-a716-49ad-9b09-609edde5a3f4
BioAssay Express: Creating and exploiting assay metadata
1. CINF 24: BioAssay Express: Creating and exploiting assay metadata
collaborativedrug.com
bioassayexpress.com
bae@collaborativedrug.com
Philip Cheung
American Chemical Society
Sunday, August 25, 2019 - 3:10 PM -- Grand Ballroom A, Omni San Diego Hotel
2. A little bit about me…
Graduated from Harvey Mudd College in 1996, B.S. in Biology
SAIC (1996-1997)
High-Throughput Robotics Systems for ISB (Leroy Hood)
Orillion (1997-1999)
WorldCom MCI / PlayStation 2 Event Management System
Medibuy.com (1999-2001)
Global Health Exchange Between Premier Hospital Group
Pfizer Global Research and Development (2001-2009)
Pfizer Global Crystal Structure Database
Oncology Project Support (Computational Biology)
Ophthalmology Indication Discovery (Machine Learning)
Dart NeuroScience (2009-2018)
Bioinformatics Group Leader
Independent Consultant (2018-Present)
Currently supporting multiple informatics/bioinformatics companies in San Francisco, Boston, and San Diego
3. So what is assay informatics and why is it exciting?
4. The IDEAL Cycle of Assay Management
• Plan experiments & capture ideas
• Perform experiments & capture data
• Analyze data & identify trends
• Store & protect the results
• Retrieve data & build knowledge across concepts / time / projects
5. The REALISTIC Cycle of Assay Management
• Plan experiments & capture ideas → Post-It note edits & lost attributions
• Perform experiments & capture data → Incomplete "data dump" & lost data
• Analyze data & identify trends → Siloed data & incomplete information
• Store & organize the results → Lost & non-reproducible data (crisis!)
• Retrieve data & build knowledge across concepts / time / projects → Inaccessible & unusable data, leading to…
TIME WASTED & OPPORTUNITIES LOST!
6. Even “Best Case” Assay Management Is Inefficient
• No common vocabulary → Barrier to collaboration
• Limited assay mining/searching capabilities → Failure to provide insight
7. BioAssay Express Leads the Field of Assay Informatics
• Efficiently & quickly organize assay data
• Machine-readable format
• Common vocabulary through ontology markup
• Introduces assay informatics, providing new insight by querying biologic, chemical, and assay metadata
11. BioAssay Express - In Action!
Assay Registration is a Breeze!
• Insert Protocol
• Text Mining
• Expert Ontologies
• Predictive Text
• Machine Learning
• NLP
• Correlation Models
• Human Curation
• Accuracy is key
12. BioAssay Express: Optimized for Low False Positives
Example: Dose Response Assay for Agonists of 5-Hydroxytryptamine (Serotonin) Receptor Subtype 1A (5HT1A)
Assay Description:
Widely expressed in the human brain, 5-hydroxytryptamine (5-HT, serotonin) receptors have been shown to have an important role in depression as well as other cognitive and metabolic disorders [1, 2]. Discovering novel modulators of the 5-HT1A serotonin receptor may not only help probe the function of this receptor, but also help better understand the complex relationship among the 5-HT receptor subtypes.
Protocol Summary:
As with the primary HTS assay, a Chinese Hamster Ovary (CHO) cell line stably transfected with human 5HT1a receptor, the nuclear factor of activated T-cell-beta lactamase (NFAT-BLA) reporter construct and the G-alpha-15 promiscuous coupling protein was used (Invitrogen, part K1083). Cells were cultured in T-175 sq. cm flasks (Corning, part 431080) at 37 deg C and 95% RH. The assay began by dispensing 10 microliters of cell suspension to each test well of a 384 well plate.
13. Assay Fingerprints: Bringing Informatics to Metadata
[Figure: Assay Property Grid (axes: Assays, Assay Metadata)]
• Generate assay fingerprints
• Compare hundreds of assays at a glance
• Find, share, and innovate
- Blue boxes = exact match
- Blue lines = match inferred from hierarchy
• Why are my results different from others doing the “same” assay?
• Has anyone studied this disease variant in neurons?
• Did results vary when we switched instrument models?
• Has someone else already done my experiment?
• What other programs have already screened my target? Can I jumpstart my new program with some existing chemistry?
• I want to do some machine learning: can I find some other “appropriate” experiments?
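The property grid can be pictured as a matrix of assays against annotation terms, where each cell is an exact match, a match inferred through the ontology hierarchy, or empty. The sketch below is only an illustration of that idea, not BAE's implementation: the toy hierarchy in `PARENT`, the assay names, and the helpers `ancestors`, `fingerprint`, and `matching` are all hypothetical. Filtering on inferred as well as exact matches is what lets a query for a broad term such as "fluorescence" also return an assay annotated only with a narrower child term.

```python
from typing import Dict, List, Optional, Set

# Hypothetical child -> parent links in a toy ontology (not BAE's vocabulary).
PARENT: Dict[str, Optional[str]] = {
    "detection method": None,
    "fluorescence": "detection method",
    "FRET": "fluorescence",
    "luminescence": "detection method",
}


def ancestors(term: str) -> Set[str]:
    """Collect every ancestor of a term by walking up the parent links."""
    found: Set[str] = set()
    parent = PARENT.get(term)
    while parent is not None:
        found.add(parent)
        parent = PARENT.get(parent)
    return found


def fingerprint(assays: Dict[str, Set[str]], terms: List[str]) -> Dict[str, Dict[str, str]]:
    """Mark each (assay, term) cell as 'exact', 'inferred', or '' for no match."""
    grid: Dict[str, Dict[str, str]] = {}
    for assay, annotations in assays.items():
        # Terms implied by the hierarchy: ancestors of anything annotated.
        implied = set().union(*(ancestors(a) for a in annotations))
        grid[assay] = {
            t: "exact" if t in annotations else "inferred" if t in implied else ""
            for t in terms
        }
    return grid


def matching(grid: Dict[str, Dict[str, str]], query: List[str]) -> List[str]:
    """Keep assays where every query term matches exactly or by inference."""
    return [assay for assay, row in grid.items() if all(row.get(t) for t in query)]


# Hypothetical assays annotated with a single detection term each.
assays = {"assay-A": {"FRET"}, "assay-B": {"luminescence"}, "assay-C": {"fluorescence"}}
grid = fingerprint(assays, ["detection method", "fluorescence", "FRET", "luminescence"])
print(matching(grid, ["fluorescence"]))  # ['assay-A', 'assay-C']
```

Here assay-A matches "fluorescence" only by inference (FRET is a child term), which corresponds to a blue line rather than a blue box in the grid.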
14. So, I have a database… how do I apply this to my data?
15. So how can I use this technology?
Let’s take a look at the steps required to process a semi-structured dataset like ClinicalTrials.gov.
19. So how do we go from unstructured to structured data?
Example dataset: ClinicalTrials.gov
Step 1 – Review the data you’re importing; map your data to ontologies.
Step 2 – Perform the import; some percentage will map perfectly.
Step 3 – Curate the remaining data using BAE’s NLP models.
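The three steps amount to a simple split: values covered by a reviewed mapping are annotated on import, and everything else is set aside for machine-assisted curation. A minimal sketch, assuming records arrive as dictionaries and that Step 1 produced a plain value-to-term lookup; `VALUE_TO_TERM`, `import_records`, and the trial records are hypothetical, and a real mapping would point at ontology URIs rather than labels.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Step 1 (hypothetical): reviewed mappings from raw source values to ontology
# term labels; a real mapping would point at ontology URIs.
VALUE_TO_TERM: Dict[str, str] = {
    "multiple myeloma": "multiple myeloma",
    "breast cancer": "breast carcinoma",
}


def import_records(records: List[Record], field: str) -> Tuple[List[Record], List[Record]]:
    """Step 2: annotate exact mappings; return everything else for Step 3 curation."""
    annotated: List[Record] = []
    needs_curation: List[Record] = []
    for record in records:
        raw = record.get(field, "").strip().lower()
        term = VALUE_TO_TERM.get(raw)
        if term is not None:
            annotated.append({**record, "annotation": term})
        else:
            needs_curation.append(record)
    return annotated, needs_curation


records = [
    {"id": "trial-1", "condition": "Multiple Myeloma"},
    {"id": "trial-2", "condition": "Myeloma, Plasma-Cell"},
    {"id": "trial-3", "condition": "Breast Cancer"},
]
annotated, needs_curation = import_records(records, "condition")
print(len(annotated), "mapped;", len(needs_curation), "left for curation")
```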
20. Step 1 – Map your data to ontologies
Open-source BioAssay Template Tool
https://github.com/cdd/bioassay-template
23. Step 2 – Some percentage will map out of the box
In this example, I imported a small subset of ClinicalTrials.gov: 18,825 of the 313,472 (~6%) available studies.
If I were interested in multiple myeloma, I could write a script that mapped aggressively and assigned all of these as “multiple myeloma”, or I could assign only the perfect matches and let BAE’s NLP help me with the remaining mappings.
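The trade-off between aggressive and conservative mapping can be made concrete with two matching rules. This is a sketch under stated assumptions, not BAE's import logic: "aggressive" is taken here to mean "every word of the term appears somewhere in the value", and the condition strings and helper names are hypothetical.

```python
import re
from typing import Callable, List


def exact_match(value: str, term: str) -> bool:
    """Conservative: accept only a case-insensitive exact match."""
    return value.strip().lower() == term.lower()


def aggressive_match(value: str, term: str) -> bool:
    """Aggressive: accept any value containing every word of the term."""
    text = value.lower()
    return all(re.search(rf"\b{re.escape(w)}\b", text) for w in term.lower().split())


def assign(conditions: List[str], term: str, match: Callable[[str, str], bool]) -> List[str]:
    """Return the condition strings that would be annotated with the term."""
    return [c for c in conditions if match(c, term)]


# Hypothetical free-text condition fields.
conditions = [
    "Multiple Myeloma",
    "Relapsed Multiple Myeloma",
    "Plasma Cell Myeloma",
    "Multiple sclerosis; prior myeloma screening negative",
]
print(assign(conditions, "multiple myeloma", exact_match))
print(assign(conditions, "multiple myeloma", aggressive_match))
```

The aggressive rule picks up the relapsed trial but also mislabels the multiple sclerosis entry, while neither rule catches "Plasma Cell Myeloma". Assigning only exact matches and leaving the rest for NLP-assisted curation avoids baking such false positives into the data.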
24. Step 3 – Machine-Assisted Curation
Importing/mapping data allows BAE to build Bayesian models.
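The slide says only that the mapped data feeds Bayesian models; their details aren't given here. As a stand-in, this is a generic multinomial naive Bayes with Laplace (add-one) smoothing that learns from (text, curated term) pairs and ranks candidate terms for new text. `TermSuggester` and the training examples are hypothetical and are not BAE's model.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple


def tokens(text: str) -> List[str]:
    """Lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermSuggester:
    """Multinomial naive Bayes over words with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.term_docs: Counter = Counter()  # training records per term
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> None:
        """Learn from (free text, curated term) pairs, e.g. already-mapped records."""
        for text, term in examples:
            self.term_docs[term] += 1
            for word in tokens(text):
                self.word_counts[term][word] += 1
                self.vocab.add(word)

    def rank(self, text: str) -> List[Tuple[str, float]]:
        """Return (term, log-score) pairs, best suggestion first."""
        total = sum(self.term_docs.values())
        scores = []
        for term, n_docs in self.term_docs.items():
            counts = self.word_counts[term]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for word in tokens(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            scores.append((term, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical training pairs taken from records that mapped cleanly.
suggester = TermSuggester()
suggester.fit([
    ("relapsed multiple myeloma bortezomib", "multiple myeloma"),
    ("myeloma plasma cell lenalidomide", "multiple myeloma"),
    ("metastatic breast cancer tamoxifen", "breast carcinoma"),
    ("her2 positive breast tumor", "breast carcinoma"),
])
print(suggester.rank("refractory myeloma after bortezomib")[0][0])  # multiple myeloma
```

The ranked output would be presented as suggestions for a curator to accept or reject, not applied automatically.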
25. Interesting questions you can visualize
• What cancer trials was celecoxib used in?
• What other combination therapies were also used in those trials?
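Once trials carry condition and intervention annotations, both questions reduce to a filter followed by a count. A minimal sketch, assuming each trial's conditions already include the broad "cancer" class (for example, inferred from an ontology hierarchy); the `Trial` records and their drug combinations are hypothetical toy data, not ClinicalTrials.gov content.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Trial:
    trial_id: str
    conditions: Set[str] = field(default_factory=set)
    interventions: Set[str] = field(default_factory=set)


def trials_using(trials: List[Trial], drug: str, condition: str) -> List[Trial]:
    """Trials annotated with both the drug and the condition class."""
    return [t for t in trials if drug in t.interventions and condition in t.conditions]


def co_therapies(trials: List[Trial], drug: str) -> Counter:
    """Count the other interventions that appear alongside the drug."""
    counts: Counter = Counter()
    for t in trials:
        counts.update(t.interventions - {drug})
    return counts


# Hypothetical annotated trials.
trials = [
    Trial("T1", {"cancer", "colorectal cancer"}, {"celecoxib", "capecitabine"}),
    Trial("T2", {"cancer"}, {"celecoxib", "docetaxel", "capecitabine"}),
    Trial("T3", {"osteoarthritis"}, {"celecoxib"}),
    Trial("T4", {"cancer"}, {"docetaxel"}),
]
hits = trials_using(trials, "celecoxib", "cancer")
print([t.trial_id for t in hits], co_therapies(hits, "celecoxib").most_common())
```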
31. Current Directions -- Projects
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional.
There are many challenges involved in creating the ELN-to-robot loop; here we provide some insights into our collaborations with UCSF automation experts at the Small Molecule Discovery Center.
32. The High-Level Goal – ELN to Robot
GBG by Biosero
Cellario by HighRes Biosolutions
Adapter/Builder
Director by Wako (Fuji)
- Model the protocols at the step level so we can export them out to automation systems
- Import the results back into the system
35. Break the protocol down into steps
• Model the dependencies
- Equipment dependencies
- Reagent dependencies
- Previous-step dependencies
• Model the steps
- Protocol steps
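Modeling each step with its equipment, reagent, and previous-step dependencies turns a protocol into a directed acyclic graph, and a topological order of that graph is a sequence an automation scheduler could run. A minimal sketch using the standard-library `graphlib` (Python 3.9+); the `Step` class, the output dictionary format, and every step after the cell dispense are hypothetical and do not reflect the GBG, Cellario, or Director import formats.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Union


@dataclass
class Step:
    name: str
    equipment: Set[str] = field(default_factory=set)
    reagents: Set[str] = field(default_factory=set)
    after: Set[str] = field(default_factory=set)  # previous-step dependencies


def export_for_scheduler(steps: List[Step]) -> List[Dict[str, Union[str, List[str]]]]:
    """Validate dependencies and emit steps in an order that respects them."""
    by_name = {s.name: s for s in steps}
    unknown = {dep for s in steps for dep in s.after} - set(by_name)
    if unknown:
        raise ValueError(f"undeclared step dependencies: {sorted(unknown)}")
    # static_order() yields each step after all of its predecessors and raises
    # graphlib.CycleError if the dependencies form a loop.
    order = TopologicalSorter({s.name: s.after for s in steps}).static_order()
    return [
        {
            "step": name,
            "equipment": sorted(by_name[name].equipment),
            "reagents": sorted(by_name[name].reagents),
        }
        for name in order
    ]


# Hypothetical step model loosely based on the 5HT1A example protocol.
steps = [
    Step("dispense cells", {"bulk dispenser"}, {"CHO-5HT1A cell suspension"}),
    Step("prepare compound plate", {"liquid handler"}, {"test compounds"}),
    Step("add compounds", {"pin tool"}, set(), {"dispense cells", "prepare compound plate"}),
    Step("incubate", {"incubator"}, set(), {"add compounds"}),
    Step("read plate", {"plate reader"}, set(), {"incubate"}),
]
for entry in export_for_scheduler(steps):
    print(entry["step"], entry["equipment"])
```

Validating undeclared dependencies before export is a design choice: it surfaces incomplete protocol models in the ELN rather than on the robot.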
40. Next Steps
• Continue our work with UCSF on simple proof-of-concept protocols
• Reach out to vendors and see if we can integrate into their simulation platforms
- Thermo
- Wako
• Keep reaching out and learning from experts like yourself.
http://www.bioassayexpress.com
41. Try it out!
• Collaborative Drug Discovery
• Alex Clark
• Hande Kücük McGuinty
• Peter Gedeck
• Samantha Jeschonek
• Barry Bunin
• For more info
• bae@collaborativedrug.com http://www.bioassayexpress.com
42. Smart Drug Discovery Software Saves Scientists Time
Session: CINF: Sci-Mix
Location: Exhibit Hall B
Date & Time: Monday, Aug 26, 8:00 PM
Editor's Notes
At the outset, assay management seems like it should be straightforward.
You plan an experiment and capture ideas, perform the experiment and capture data, analyze the data and identify trends, store and protect the results, and then retrieve the data to build knowledge.
But in reality, what happens is different. The best plans turn into Post-it note edits and lost attributions, and incomplete data dumps lead to results with missing information, leaving data lost and non-reproducible. In fact, failure to capture assays correctly contributes greatly to the scientific reproducibility crisis. And when you try to retrieve data, you’re left frustrated: either you can’t find the data, or you find it and it isn’t descriptive, which essentially makes it unusable. This just leads to wasted time and lost opportunities.
But even the best-case scenario for assay management is inefficient.
It lacks a common vocabulary between scientists, groups, or organizations.
And there’s a limited ability to mine or efficiently search your assays.
Plus, BioAssay Express was built with “FAIR” data in mind, and it has scored high marks in that arena.
It is a lofty goal, but by capturing assay metadata in an organized, machine-readable, ontology-driven format,
we hope that our software product helps reduce the reproducibility crisis and streamlines assay optimization.
In essence, we provide a solution that turns
this human-readable, unorganized text into
a structured, machine-readable format,
allowing you to take full advantage of your assay metadata for searching and comparisons across all assays.
Now, the real beauty of BAE is how convenient it is to use. Annotation of an entire protocol can occur in a matter of minutes, and to demonstrate that, I’m going to show you a real-time annotation.
An assay is simply pasted into the text box,
we click “request suggestions”,
and almost immediately the annotation fields begin to populate.
First, the text is mined for key terms, shown in green. Then the other fields begin to populate via predictive text. Using that hybrid machine learning approach, BAE predicts associations based on the terms entered and suggests additional terms. But we also allow manual review and curation: accuracy is key in the field of assay informatics, so we did not want to rely solely on a text mining approach. The hybrid machine learning greatly accelerates human curation.
Let’s take a closer look at the Text mining aspect.
You’ll see that the fields from text mining, shown in green, match exact phrases or synonyms from the full protocol text.
In gray are our predictive-text suggestions, and this AI learns and grows with your own data (kept private and secure, of course). Detection methods are an easy area to think about: we can predict that certain assay kits will require a particular type of physical detection or readout, and that the detection will in turn require a particular instrument reader.
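The text-mining pass described above (exact phrases or synonyms found in the protocol text) can be sketched as a dictionary lookup with word-boundary matching. The synonym table and term labels below are hypothetical placeholders rather than BAE's ontology, and real text mining would handle many more variants; the sample text is excerpted from the 5HT1A protocol summary.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym table: term label -> phrases that should trigger it.
SYNONYMS: Dict[str, List[str]] = {
    "CHO cell": ["chinese hamster ovary", "cho"],
    "beta-lactamase reporter": ["beta lactamase", "nfat-bla"],
    "384-well plate": ["384 well plate", "384-well plate"],
}


def mine_terms(text: str) -> List[Tuple[str, str]]:
    """Return (term, matched phrase) pairs for exact phrase or synonym hits."""
    lowered = text.lower()
    hits: List[Tuple[str, str]] = []
    for term, phrases in SYNONYMS.items():
        for phrase in phrases:
            # \b keeps "cho" from matching inside words such as "anchor".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((term, phrase))
                break  # one hit per term is enough
    return hits


protocol = (
    "A Chinese Hamster Ovary (CHO) cell line stably transfected with human 5HT1a "
    "receptor, the nuclear factor of activated T-cell-beta lactamase (NFAT-BLA) "
    "reporter construct ... dispensing 10 microliters of cell suspension to each "
    "test well of a 384 well plate."
)
print(mine_terms(protocol))
```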
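The kit-to-detection-to-instrument example is a chain of associations: once one annotation is known, each implied annotation can in turn imply the next. The sketch below hard-codes a few hypothetical associations where BAE would learn them from data; it illustrates only the chaining, and every suggestion it returns would still be shown in gray for a curator to confirm.

```python
from typing import Dict, List

# Hypothetical learned associations: annotation -> annotation it usually implies.
IMPLIES: Dict[str, str] = {
    "beta-lactamase reporter kit": "fluorescence resonance energy transfer",
    "fluorescence resonance energy transfer": "fluorescence plate reader",
    "luciferase reporter kit": "luminescence",
    "luminescence": "luminescence plate reader",
}


def suggest_chain(start: str) -> List[str]:
    """Follow implied associations from one annotation, stopping at any cycle."""
    chain: List[str] = []
    seen = {start}
    current = start
    while current in IMPLIES and IMPLIES[current] not in seen:
        current = IMPLIES[current]
        seen.add(current)
        chain.append(current)
    return chain


print(suggest_chain("beta-lactamase reporter kit"))
```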
Instead of just a list of similar assays, we visually provide you with all the assay metadata, so you can compare hundreds of assays at a glance. Here, we narrowed approximately 4,000 assays down to about 40 in less than a minute.
At the top are the assays that match your search terms, and on the right are your metadata fields.
Blue boxes = match; blue lines = match inferred by the hierarchy.
Once you create these assay fingerprints, you can begin quickly querying your data and using it to probe new areas of research. You might use this to find out why your assay results differ from others doing the same assay (sometimes the answer is as simple as a different detection instrument being used).
We might ask to retrieve any data generated with a particular kit, or on a particular target.
We can easily ask what studies are being done on that new favorite protein that turned up in the mass spec screen.
Or we can inquire about diversifying our assays (avoiding assay bias).
And because we’re capturing this data with a common language, in an easy-to-compare way, we can start to reduce the reproducibility errors caused by incomplete assay reporting.
One of my favorite aspects of BAE, and the real source of its power, is our assay fingerprint.
Across the top are all the assays matching your previous search criteria, and on the left are all of the annotation terms.
Blue boxes represent the presence of a term, and blue lines indicate a presence inferred by the hierarchy.
Suddenly that assay metadata becomes a tool to power and query your results. Why are two similar assays giving different results? Why isn’t a particular hit present in one assay? Sometimes the answer is as simple as a different cell line being used, or a different detection instrument with a different level of sensitivity. But the answers SHOULD come that quickly: assay optimization or comparison shouldn’t require hours of literature review or cross-checking dozens of notebook entries. It should be quick, painless, and informative, and that’s what we hope BAE accomplishes.
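Answering "why are two similar assays giving different results?" starts with listing where their annotations disagree. A minimal sketch, assuming each assay's annotations can be flattened into a property-to-value dictionary; the property names, values, and the helper `annotation_diff` are hypothetical.

```python
from typing import Dict, List, Tuple

Annotations = Dict[str, str]  # property -> annotated value


def annotation_diff(a: Annotations, b: Annotations) -> List[Tuple[str, str, str]]:
    """List properties whose values differ (or are missing) between two assays."""
    diffs: List[Tuple[str, str, str]] = []
    for prop in sorted(set(a) | set(b)):
        va = a.get(prop, "(not annotated)")
        vb = b.get(prop, "(not annotated)")
        if va != vb:
            diffs.append((prop, va, vb))
    return diffs


# Hypothetical annotations for two "same" assays.
assay_1 = {"target": "5-HT1A", "cell line": "CHO", "detection instrument": "reader A"}
assay_2 = {
    "target": "5-HT1A",
    "cell line": "HEK293",
    "detection instrument": "reader A",
    "assay format": "cell-based",
}
for prop, v1, v2 in annotation_diff(assay_1, assay_2):
    print(f"{prop}: {v1} vs {v2}")
```

Reporting unannotated properties as differences, instead of skipping them, also flags gaps in assay reporting.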
CDD Vault is a software platform for your secure data management and registration needs, able to capture all kinds of data, from numeric assay data to biological images. Visit my colleague Janice Darlington at Poster XY for more information.