EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD” file for bioassay definitions and data: Building the BioAssay Research Database
Building on the success of the Molecular Libraries Program (MLP), the Broad Institute MLP team is co-leading, with the National Center for Advancing Translational Sciences (NCATS), an NIH-sponsored project across 7 institutions to augment the data in PubChem by creating the BioAssay Research Database (BARD). The BARD platform standardizes the representation of bioassays in a next-generation repository and provides a user-friendly interface that supports sophisticated queries and data mining. Data originating from publicly funded chemical biology research efforts will be presented with appropriate context, including structured assay and result annotations. These annotations use relevant ontologies including, for example, the BioAssay Ontology, Gene Ontology, and the Unit Ontology. We simplified the representation of ontologies into a hierarchical data dictionary so that data producers can more easily create and upload projects, assays, and results, and we created two separate user interfaces for data consumers.
The BARD WebQuery Interface offers a Google-like interface with auto-suggest functionality for complex queries, such as retrieving all assays and results for biological pathways such as “DNA repair” or “oxidative stress”, and presents this information in a rich user interface that includes spreadsheet support for structure-activity relationship analyses. Compounds, projects, and assays can be exported into an Amazon-like query cart for refining queries, and additional computations can be executed on datasets via community-developed plug-ins, including promiscuity analyses via the BioActivity Data Associative Promiscuity Pattern Learning Engine (BADAPPLE) and a CYP450 metabolism site prediction plug-in (http://www.farma.ku.dk/smartcyp/) using 2D structure fingerprints. Integration between the WebQuery and Desktop clients enables power users to initiate analyses in WebQuery and gain more insight via the Desktop client.
Lastly, as industry and academia work together to innovate in small-molecule therapeutics, we have created an initial specification for the Assay Definition Standard (ADS). Its file format, the Assay Definition Format (ADF), has been used as the medium of file transfer for data upload. The chemical biology community now has an opportunity to use this standard to routinely transfer assay and result data within and between information systems and organizations.
This presentation will highlight the BARD platform with a focus on representing the cumulative body of work that exploits the ChemAxon toolkit.
1. Andrea de Souza
Director, Informatics, Data Analysis & Finance
Center for the Science of Therapeutics
May 29, 2013
BioAssay Research Database
2. Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea de Souza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar (PI), Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley (PI),
Shaun Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich, John Reed (PI)
Scripps Research Institute – Yasel Cruz, Mark Southern,
Hugh Rosen (PI)
3. BARD: BioAssay Research Database
Mission: Enable biomedical researchers and cheminformatic
scientists to effectively use MLP data to generate new
hypotheses
• Unique collaboration amongst 7 NIH & academic centers
• Develop and adopt an Assay Definition Standard (ADS)
• Provide tools for assay registration, querying &
visualization
o Deploy predictive models
o Foster new methods to interpret chemical biology data
o Enable private data sharing
• Developed as an open-source, industrial-strength
platform to support public translational research
4. BARD: BioAssay Research Database
Mission: Enable biomedical researchers and cheminformatic
scientists to effectively use MLP data to generate new
hypotheses
• Provide tools for assay registration and data querying &
visualization
o Deploy predictive models
o Foster new methods to interpret chemical biology data
o Enable private data sharing
• Developed as an open-source, industrial-strength platform to
support public translational research
Callout labels: Team Science, Research Data Management, Technology, Predictive Models
9. PubChem BioAssay and BARD
PubChem | BARD
Missing or fuzzy assay definitions, experiments and project concepts | Introduce assay definitions, experiments and projects
‘Column header’ centric with concentration details embedded | Result types and concentrations as experimental variables
Extensive use of unstructured text | Transition to structured use of common language
Diagram labels: PubChem MLP-BioAssay; structure the data
10. Structuring the Data
Diagram labels: Entrez, Uniprot, Gene Ontology, Disease Ontology, BioAssay Ontology, Unit Ontology, Chemical Ontology; BARD Dictionary & Term Hierarchy; BARD Assay Definition Hierarchy
• Annotate all assays to a minimum standard
• Integrate and extend ontologies
• Enable assay registration
• Represent assays, results, experiments using ADS
• Exchange information in ADS via ADF
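The "BARD Dictionary & Term Hierarchy" label suggests a tree of standard terms. A minimal Python sketch of such a hierarchy follows. Everything in it is hypothetical: the `Term` class, the `path_to_root` and `descendants` helpers, and the sample terms are illustrative only and are not taken from the BARD dictionary or the ADS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Term:
    """One dictionary term; `parent` and `children` form the hierarchy."""
    label: str
    parent: Optional["Term"] = field(default=None, repr=False)
    children: List["Term"] = field(default_factory=list)

    def add_child(self, label: str) -> "Term":
        """Create a child term, link it to this term, and return it."""
        child = Term(label, parent=self)
        self.children.append(child)
        return child


def path_to_root(term: Term) -> List[str]:
    """Return labels from the root of the hierarchy down to `term`."""
    labels = []
    node: Optional[Term] = term
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return list(reversed(labels))


def descendants(term: Term) -> List[Term]:
    """Return all terms below `term`, depth first."""
    found: List[Term] = []
    for child in term.children:
        found.append(child)
        found.extend(descendants(child))
    return found


# Hypothetical sample terms, for illustration only.
root = Term("detection method type")
fluorescence = root.add_child("fluorescence")
intensity = fluorescence.add_child("fluorescence intensity")
root.add_child("luminescence")

print(" > ".join(path_to_root(intensity)))
```

Walking up via `parent` gives the context for an annotation, while walking down via `children` supports filters that should also match narrower terms.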
11. BARD Technology Components
Define & Register Assays
• Data Dictionary – standard terms
• Catalog of Assay Protocols
High Quality Data & Result Deposition
• Calculations & Results
• Project-experiment association
Query & Interpret Information
• Intuitive Guided Queries
• Cross Assay & SAR centric views
• Advanced applications
Enable Hypothesis Generation (users ranging from Novice to Expert)
13. Web Client
Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
Filter on annotations, such as detection method type
Amazon-like Query Cart: save items of interest for further analysis
16. Sunburst Visualization
Molecular activity against target classes
Target classifications from PantherDB
Huaiyu Mi, Anushya Muruganujan and Paul D. Thomas. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucl. Acids Res. (2012) doi: 10.1093/nar/gks1118
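A sunburst chart needs a total for every ring segment, so activity counts recorded on leaf target classes have to be rolled up to their parent classes. The sketch below shows one way to do that in Python. It is an assumption, not the BARD implementation: the `rollup` function, the nested `name`/`children`/`count` layout, and the class names and counts are all hypothetical placeholders.

```python
from typing import Any, Dict


def rollup(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `node` in which every node carries its subtree total."""
    children = [rollup(child) for child in node.get("children", [])]
    total = node.get("count", 0) + sum(child["total"] for child in children)
    return {"name": node["name"], "total": total, "children": children}


# Hypothetical target-class tree with made-up activity counts.
tree = {
    "name": "all target classes",
    "children": [
        {
            "name": "class A",
            "children": [
                {"name": "class A1", "count": 3},
                {"name": "class A2", "count": 2},
            ],
        },
        {"name": "class B", "count": 5},
    ],
}

summed = rollup(tree)
for child in summed["children"]:
    print(child["name"], child["total"])
```

The input is left unmodified, so the same tree can be rolled up again after a filter changes the leaf counts.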
17. As open source as possible
Components: Web Query & Desktop Clients; Data Warehouse & REST API; Catalog of Assay Protocols
Technologies shown: Jersey, D3.js, JGoodies
Commercial License
MySQL support for CAP coming soon
18. ChemAxon Usage in BARD
UNM Promiscuity Plugin
• JChem for scaffold decomposition
REST API & Warehouse
• JChem for rendering structures and molecule fingerprint generation
http://bard.nih.gov/api/latest/compounds/6915727/image?s=200
http://bard.nih.gov/api/latest/compounds/?filter=n1cccc2ccccc12%5Bstructure%5D&type=sim&cutoff=0.9&expand=true
http://bard.nih.gov/api/latest/plugins/badapple/prom/cid/6915727?expand=true
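The three example URLs above share one base path and differ only in resource path and query parameters. The Python sketch below rebuilds them with the standard library. The helper names are hypothetical, and the parameter meanings are assumptions read off the URLs (`s` as image size, `cutoff` as similarity threshold). It only builds strings and makes no network requests.

```python
from urllib.parse import urlencode

# Base path as it appears in the example URLs on the slide.
BASE = "http://bard.nih.gov/api/latest"


def compound_image_url(cid: int, size: int = 200) -> str:
    """URL of a rendered structure image (assumption: `s` is the image size)."""
    return f"{BASE}/compounds/{cid}/image?{urlencode({'s': size})}"


def similarity_search_url(smiles: str, cutoff: float = 0.9,
                          expand: bool = True) -> str:
    """URL of a similarity search; the query carries a `[structure]` suffix."""
    params = {
        "filter": f"{smiles}[structure]",  # urlencode escapes the brackets
        "type": "sim",
        "cutoff": cutoff,  # assumption: similarity threshold
        "expand": str(expand).lower(),
    }
    return f"{BASE}/compounds/?{urlencode(params)}"


def badapple_url(cid: int, expand: bool = False) -> str:
    """URL of the BADAPPLE promiscuity plugin for one compound ID."""
    url = f"{BASE}/plugins/badapple/prom/cid/{cid}"
    return f"{url}?expand=true" if expand else url


print(compound_image_url(6915727))
print(similarity_search_url("n1cccc2ccccc12"))
print(badapple_url(6915727, expand=True))
```

Using `urlencode` rather than string concatenation keeps the square brackets in the structure filter percent-encoded as `%5B` and `%5D`, matching the slide.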
19. ChemAxon Usage in BARD
Web Query Client
• JChem for rendering structures
Desktop Client
• JChem for rendering structures, molecule import & export
• Marvin for drawing query structures
20. Predictive Models
• BioActivity Data Associative Promiscuity Pattern Learning Engine (BADAPPLE)
• Associations via scaffolds for chemical space navigation
Example URI* | Description
<base>/badapple/prom/cid/752424 | For compound with specified ID, return scaffold IDs and scores.
<base>/badapple/prom/cid/752424?expand=true | Additional statistics, scaffold SMILES, and inDrug flag.
<base>/badapple/prom/scafid/233 | For scaffold with specified ID, return statistics and SMILES.
21. Predictive Models
• Predicts CYP450 isoform metabolism sites from 2D structures
• Patrik Rydberg et al.
• Released under LGPL
• BARD plugin
– Summary HTML view
– Data view