DataFAIRy bioassays pilot project: lessons learned and future outlook
Isabella Feierberg, AstraZeneca
Samantha Jeschonek, Collaborative Drug Discovery
Nick Lynch, Curlew Research
2021-06-02
Why DataFAIRy?
• Substantial investments are being made in AI, ML and FAIR data across the life science industry and academia.
• Available metadata in (public domain) data repositories is often insufficient for answering current and future business questions.
• Pharma companies already pay for curation of partially overlapping public domain data (e.g., ChEMBL, papers, chemistry patents).
• There is a need for FAIR public domain data with high-quality annotations using public ontologies and a common data model.
Example
Predict kinase selectivity → get all kinase activity/selectivity data from ChEMBL, in-house and collaborators → build model → use model → update model with new data
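The example workflow is a cycle: pool data from several sources, build a model, use it, then fold new data back in. A minimal Python sketch of that cycle, assuming a toy "model" that only keeps a running mean pIC50 per (compound, kinase) pair and reports the gap between the two most potent kinases as a selectivity margin. The class name, record layout, source labels and values are all hypothetical; a real selectivity model would be a trained ML model.

```python
from collections import defaultdict

# Hypothetical record layout: (source, compound_id, kinase, pIC50).
# Source labels and values are illustrative, not pilot data.
Record = tuple[str, str, str, float]


class RunningMeanModel:
    """Toy stand-in for a selectivity model: mean pIC50 per (compound, kinase)."""

    def __init__(self) -> None:
        self._sum: dict[tuple[str, str], float] = defaultdict(float)
        self._n: dict[tuple[str, str], int] = defaultdict(int)

    def update(self, records: list[Record]) -> None:
        # For a running mean, "build" and "update" are the same operation,
        # so new data is folded in without refitting from scratch.
        for _source, compound, kinase, pic50 in records:
            self._sum[(compound, kinase)] += pic50
            self._n[(compound, kinase)] += 1

    def profile(self, compound: str) -> dict[str, float]:
        # Mean pIC50 against every kinase seen so far for this compound.
        return {
            kinase: self._sum[(c, kinase)] / self._n[(c, kinase)]
            for (c, kinase) in self._n
            if c == compound
        }

    def selectivity_margin(self, compound: str) -> float:
        # Gap (log units) between the most and second most potent kinase.
        values = sorted(self.profile(compound).values(), reverse=True)
        return values[0] - values[1] if len(values) > 1 else float("inf")


pooled = [
    ("ChEMBL", "cpd-1", "EGFR", 8.0),
    ("in-house", "cpd-1", "EGFR", 7.6),
    ("collaborator", "cpd-1", "ERBB2", 6.1),
]
model = RunningMeanModel()
model.update(pooled)                                 # build model
print(model.selectivity_margin("cpd-1"))             # use model
model.update([("in-house", "cpd-1", "ERBB2", 6.9)])  # update model with new data
print(model.selectivity_margin("cpd-1"))
```

Pooling by key works here only because all three sources already use identical compound IDs and kinase names; in real data, that agreement is what shared ontologies and a common data model have to provide.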
What we want
Well-annotated data
Lots of data
High quality data
Siloed data is not helpful
My organization’s data
Public data
Partner’s data
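Combining the three sources helps only if their annotations line up. The following is a minimal Python sketch that assumes each source labels the same target differently and that a hand-made synonym table maps those labels to one shared identifier. The labels, the synonym table and the `TARGET:EGFR` identifier are hypothetical placeholders, not terms from a real ontology.

```python
from typing import Optional

# Hypothetical per-source records: (source, target label as written, result).
records = [
    ("my_org", "EGFR", 7.9),
    ("public", "Epidermal growth factor receptor", 8.1),
    ("partner", "ErbB1", 7.7),
    ("partner", "egfr kinase", 7.5),
]

# Hypothetical synonym table mapping free-text labels to one shared ID.
# In practice, public ontologies and a common data model play this role.
SYNONYMS = {
    "egfr": "TARGET:EGFR",
    "epidermal growth factor receptor": "TARGET:EGFR",
    "erbb1": "TARGET:EGFR",
}


def harmonize(label: str) -> Optional[str]:
    """Return the shared ID for a label, or None if it cannot be mapped."""
    return SYNONYMS.get(label.strip().lower())


siloed = {label for _, label, _ in records}
pooled: dict[str, list[float]] = {}
unmapped: list[str] = []
for source, label, value in records:
    target_id = harmonize(label)
    if target_id is None:
        unmapped.append(f"{source}: {label}")  # route to a curator, do not guess
        continue
    pooled.setdefault(target_id, []).append(value)

print(len(siloed), "distinct target labels before harmonization")
print(pooled)
print("unmapped:", unmapped)
```

The unmapped label is the point: without an agreed vocabulary, some records silently drop out of the pooled dataset unless someone curates them.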
The proposed DataFAIRy operational model (2018)
Diagram: unstructured public data → curation and QC by independent domain experts → FAIR data (DataFAIRy partners)
Cost-shared, high-quality annotation of public domain bioassay descriptions using an agreed data model, making the data FAIR
FAIR = Findable, Accessible, Interoperable, Reusable
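The slides do not spell out the agreed data model, so the Python sketch below is only an illustration of the idea: one annotated assay as a structured record whose properties carry ontology terms, plus a simple check for required properties, term-style identifiers and a link back to the source protocol. The property names, the `EX:` term IDs and the example URL are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical required properties; the real DataFAIRy model is not shown here.
REQUIRED_PROPERTIES = {"assay_format", "detection_method", "target"}

# Compact URI (CURIE) pattern such as "PREFIX:0000123".
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:[A-Za-z0-9_.\-]+$")


@dataclass
class AssayAnnotation:
    assay_id: str    # persistent identifier of the assay description
    source_url: str  # where the protocol text was taken from
    terms: dict[str, str] = field(default_factory=dict)  # property -> ontology term

    def problems(self) -> list[str]:
        """List reasons this record would fail a simple FAIR-style check."""
        missing = sorted(REQUIRED_PROPERTIES - set(self.terms))
        issues = [f"missing property: {p}" for p in missing]
        issues += [f"not an ontology term: {p}={v}"
                   for p, v in sorted(self.terms.items()) if not CURIE.match(v)]
        if not self.source_url:
            issues.append("no link to source protocol")
        return issues


record = AssayAnnotation(
    assay_id="EXAMPLE-ASSAY-1",
    source_url="https://example.org/protocol.pdf",
    terms={"assay_format": "EX:0000001", "target": "EGFR"},
)
print(record.problems())
```

A free-text value such as "EGFR" is flagged even though a human reads it correctly, because interoperability depends on machine-resolvable terms rather than strings.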
Small molecule bioassays are a good pilot case
• Chemogenomic model building
• Assay development, e.g., assay conditions and tool compounds
• Project planning: what is available in the public domain?
• Enriching public chemogenomics data with FAIR metadata will show impact across the cheminformatics domain
Project team
Roche: Rama Balakrishnan, Martin Romacker
Novartis: Anosha Siripala, Gabriel Backiananthan
BMS: Dana Vanderwall
AstraZeneca: Tim Ikeda, Isabella Feierberg
Collaborative Drug Discovery: Samantha Jeschonek, Jason Harris, Whitney Smith
Pistoia Alliance: Vladimir Makarov, Thomas Liener
Feasibility study, guidance for a larger initiative, example creation
Pilot project (2020) – Summary
10
Curation of 496 public domain assay descriptions were converted
into FAIR information objects using an agreed data model, which
was guided by jointly defined business questions. Upload of the
metadata to PubChem.
Learning points were captured along with recommendations for
future endeavors
Pilot Project - Business questions
26 initial questions, pruned down to 15, across 5 main categories:
1. Biology-oriented literature mining for discovery project planning
2. Assay technology-oriented
3. Chemistry/tool compound-oriented
4. Specific assay conditions
5. Computational chemogenomic modelling (e.g., target activity, "PAINS")
CDD’s BioAssay Express = NLP/ML + Human in the Loop
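The slides do not describe BioAssay Express internals, so the Python sketch below only illustrates the general human-in-the-loop pattern, under stated assumptions: a hypothetical suggester emits (property, term, confidence) triples, and a hypothetical threshold decides which suggestions are pre-filled and which need a curator decision. Because fully automated annotation is not fully accurate (one of the pilot's learnings), nothing is accepted silently, and every routing decision goes to an audit trail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    prop: str          # annotation property, e.g. "target"
    term: str          # suggested ontology term (hypothetical IDs below)
    confidence: float  # model score in [0, 1]


def triage(suggestions: list[Suggestion], threshold: float = 0.9):
    """Split machine suggestions into pre-filled and curator-review queues.

    Pre-filled items still go to expert QC; the threshold only decides
    which ones a curator must actively decide on first.
    """
    prefilled: list[Suggestion] = []
    review: list[Suggestion] = []
    audit: list[tuple[str, str, float, str]] = []
    for s in suggestions:
        route = "prefill" if s.confidence >= threshold else "review"
        (prefilled if route == "prefill" else review).append(s)
        audit.append((s.prop, s.term, s.confidence, route))
    return prefilled, review, audit


suggestions = [
    Suggestion("target", "EX:EGFR", 0.97),
    Suggestion("detection_method", "EX:FRET", 0.62),
    Suggestion("assay_format", "EX:biochemical", 0.91),
]
prefilled, review, audit = triage(suggestions)
print([s.prop for s in prefilled])
print([s.prop for s in review])
```

The threshold value is a tuning knob, not a quality guarantee: raising it shifts work to curators, and lowering it shifts risk to the QC step.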
Pilot Project - Assay selection
1. 245 commercial panel assays: ThermoFisher’s Z’-LYTE kinase selectivity panel
   - Source: the vendor’s PDF document with the assay protocol
2. 42 PubChem NCATS assays: qHTS, large datasets
   - Source: Assay Description and Assay Protocol sections in plain text on the PubChem page
3. 210 publication assays: ChEMBL assays where the target is EGFR and the reference is Open Access
   - Source: paper, supplementary material, references
100 of these 496 annotated assays were subjected to manual QC by project team members.
Pilot: 100 QC’d assays (~20%)
• Learning points are largely extrapolated from the 100 QC’d assays
• 89 ChEMBL assays, 5 NCATS assays
• 6 ThermoFisher panel assays QC’d
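To see how representative the QC sample is, the QC counts can be set against the per-source counts from the assay selection slide. The short Python calculation below uses only numbers from the slides. The per-source counts (245 + 42 + 210) sum to 497 rather than the stated total of 496, so the overall fraction uses the stated total.

```python
# Counts taken from the assay selection and QC slides.
selected = {"ThermoFisher panel": 245, "PubChem NCATS": 42, "ChEMBL (EGFR)": 210}
qcd = {"ThermoFisher panel": 6, "PubChem NCATS": 5, "ChEMBL (EGFR)": 89}
STATED_TOTAL = 496  # total given on the summary slide

overall = sum(qcd.values()) / STATED_TOTAL
print(f"overall QC fraction: {overall:.1%}")

# Per-source QC coverage shows where the learnings come from.
for source, n in selected.items():
    print(f"{source}: {qcd[source]}/{n} = {qcd[source] / n:.1%} QC'd")
```

The overall ~20% hides a strong skew: roughly 42% of the ChEMBL literature assays were QC'd, against about 12% of the NCATS assays and about 2% of the panel assays, so the extrapolated learning points are weighted most heavily toward literature assays.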
How well did the pilot assays get annotated?
Pilot Project – Learnings
• Reviewing supplements and citations is costly, so choose assays wisely.
• No persistent links exist for commercial assay panel protocols.
• Errors propagate between papers.
• Commercial assay panels were the easiest to annotate (low-hanging fruit).
• Fully automated is not fully accurate: annotation benefits from good work practices such as an audit trail, versioning and iterative QC by experts.
• A common community data standard is needed for future assay publications.
• Old assay protocols from the literature are hard and expensive to annotate: published assay protocols need to be well annotated in public databanks and linked to the publication.
Value statement
“Richly annotated FAIR bioassay data has been very valuable for an internal data integration project, where it has provided additional terminology aiding the assimilation of the chemogenomics datasets used by the machine-learning models. The extra annotations better harmonise our dataset with those from external partners, enabling the federated platform to provide superior multi-task predictions across range of panels and safety screens in a privacy preserving way”
Lewis Mervin, Machine Learning and Cheminformatics Expert, Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca
Next steps:
• Optimize process, data sources, tools and QC within quality constraints. Define quality metrics.
• Define and promote a community standard for assay reporting and publishing; align with vendors, publishers and government agencies.
• Attract new project members and sufficient funding to start the next phase.
• Scale up 10-100x in the next phase. Having more partners lowers the cost per partner per assay and the overhead cost per partner.
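The cost argument for scaling up can be made concrete with a simple cost model, which is an assumption rather than a model from the slides. Suppose a fixed overhead O, a curation cost a per assay, N assays and P partners who share the total evenly. The cost per partner per assay is then (O + N·a) / (P·N) = a/P + O/(P·N), so both terms shrink as P grows, and the overhead term also shrinks as N grows. A Python sketch with purely hypothetical numbers in arbitrary units:

```python
def cost_per_partner_per_assay(n_assays: int, n_partners: int,
                               cost_per_assay: float, overhead: float) -> float:
    """Hypothetical cost model: total = overhead + n_assays * cost_per_assay,
    split evenly across partners and assays. Overhead is assumed fixed."""
    if n_assays <= 0 or n_partners <= 0:
        raise ValueError("need at least one assay and one partner")
    total = overhead + n_assays * cost_per_assay
    return total / (n_partners * n_assays)


# Illustrative numbers only (arbitrary units), not DataFAIRy figures.
pilot_like = cost_per_partner_per_assay(500, 5, cost_per_assay=100, overhead=50_000)
scaled_up = cost_per_partner_per_assay(5000, 10, cost_per_assay=100, overhead=50_000)
print(pilot_like, scaled_up)
```

In practice overhead would likely grow somewhat with scale, so the fixed-overhead assumption overstates the savings; the direction of the effect is what the sketch shows.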
Thanks to
AstraZeneca: Nigel Green, David Hayes, Tom Plasterer
BMS: Rick Bishop
Janssen: Herman van Vlijmen
Novartis: Fabien Pernot
MMV: Jeremy Burrows
PubChem: Evan Bolton
ChEMBL: Anna Gaulton, Andrew Leach
Roche: Olivier Roche
Medicines Discovery Catapult: John Overington, Mark Davies
Pangeadata.ai: Vibhor Gupta
University of Miami: Stephan Schürer
BioSci Consulting: Scott Wagers
Collaborative Drug Discovery: Barry Bunin, Frank Cole, Alex Clark, Hande Kücük McGinty (now Univ. of Ohio)
Pistoia Alliance: Carmen Nitsche (now at CCDC), Nick Lynch (now at Curlew Research)

Mais conteúdo relacionado

Mais procurados

2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentToward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentTim Williams
 
Knowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureKnowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureTim Williams
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Sean Ekins
 
How to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in PharmaHow to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in PharmaChris Waller
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
2014 agbt giab_progress update
2014 agbt giab_progress update2014 agbt giab_progress update
2014 agbt giab_progress updateGenomeInABottle
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-upopen_phacts
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...open_phacts
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinarPistoia Alliance
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
 
effective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemeffective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemPaul Houston
 
cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)Pistoia Alliance
 
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris WallerPistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris WallerPistoia Alliance
 
The Next Generation Open Targets Platform
The Next Generation Open Targets PlatformThe Next Generation Open Targets Platform
The Next Generation Open Targets PlatformHelenaCornu
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Juan Antonio Vizcaino
 
Role of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicRole of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicDatabricks
 

Mais procurados (20)

2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentToward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
 
Knowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureKnowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data Future
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
 
How to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in PharmaHow to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in Pharma
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
2014 agbt giab_progress update
2014 agbt giab_progress update2014 agbt giab_progress update
2014 agbt giab_progress update
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* Data
 
effective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemeffective data sharing for a learning healthcare system
effective data sharing for a learning healthcare system
 
cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)
 
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris WallerPistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
 
Analytics in Pharmaceutical Industry
Analytics in Pharmaceutical IndustryAnalytics in Pharmaceutical Industry
Analytics in Pharmaceutical Industry
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
 
The Next Generation Open Targets Platform
The Next Generation Open Targets PlatformThe Next Generation Open Targets Platform
The Next Generation Open Targets Platform
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
Role of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicRole of Data Accessibility During Pandemic
Role of Data Accessibility During Pandemic
 

Semelhante a DataFAIRy bioassays pilot -- lessons learned and future outlook

tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...David Peyruc
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
 
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...brosiusad
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data ManagementCarole Goble
 
How SAP HANA can provide value for Pharma R&D
How SAP HANA can provide value for Pharma R&DHow SAP HANA can provide value for Pharma R&D
How SAP HANA can provide value for Pharma R&DMarc Maurer
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
Precompetitive Collaborations
Precompetitive CollaborationsPrecompetitive Collaborations
Precompetitive CollaborationsChris Waller
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Sage Base
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...David Peyruc
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataPhilip Bourne
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI ConferenceMegan Sawchuk
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiChris Evelo
 

Semelhante a DataFAIRy bioassays pilot -- lessons learned and future outlook (20)

tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
 
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...
Project Focused Activity And Knowledge Tracker A Unified Data Analysis Collab...
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
 
How SAP HANA can provide value for Pharma R&D
How SAP HANA can provide value for Pharma R&DHow SAP HANA can provide value for Pharma R&D
How SAP HANA can provide value for Pharma R&D
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
Precompetitive Collaborations
Precompetitive CollaborationsPrecompetitive Collaborations
Precompetitive Collaborations
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big Data
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs api
 

Último

Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Mohammad Khajehpour
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicinesherlingomez2
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONrouseeyyy
 

Último (20)

Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
DataFAIRy bioassays pilot -- lessons learned and future outlook

  • 1. DataFAIRy bioassays pilot project - lessons learned and future outlook Isabella Feierberg, AstraZeneca Samantha Jeschonek, Collaborative Drug Discovery Nick Lynch, Curlew Research 2021-06-02
  • 2. Why DataFAIRy? Substantial investments are being made in AI, ML and FAIR data across the life science industry and academia. Available metadata in (public domain) data repositories is often insufficient for answering current and future business questions. Pharma companies already pay for curation of partially overlapping public domain data (e.g., ChEMBL, papers, chemistry patents). There is a need for FAIR public domain data with high-quality annotations that use public ontologies and a common data model.
  • 4. What we want: well-annotated data, lots of data, high-quality data.
  • 5. Siloed data is not helpful: my organization’s data, public data, partner’s data.
  • 6. The proposed DataFAIRy operational model (2018): unstructured public data → curation and QC by independent domain experts → FAIR data for the DataFAIRy partners. Cost-shared, high-quality annotation of public domain bioassay descriptions using an agreed data model, making the data FAIR (Findable, Accessible, Interoperable, Reusable).
  • 7. Small molecule bioassays make a good pilot case: chemogenomic model building; assay development (e.g., assay conditions and tool compounds); project planning (what is available in the public domain?). Enriching public chemogenomics data with FAIR metadata will show impact across the cheminformatics domain.
  • 8. Project team: Roche: Rama Balakrishnan, Martin Romacker; Novartis: Anosha Siripala, Gabriel Backiananthan; BMS: Dana Vanderwall; AstraZeneca: Tim Ikeda, Isabella Feierberg; Collaborative Drug Discovery: Samantha Jeschonek, Jason Harris, Whitney Smith; Pistoia Alliance: Vladimir Makarov, Thomas Liener.
  • 9. Pilot project (2020) – Summary: a feasibility study, guidance for a larger initiative, and example creation. 496 public domain assay descriptions were curated and converted into FAIR information objects using an agreed data model, which was guided by jointly defined business questions. The metadata were uploaded to PubChem. Learning points were captured along with recommendations for future endeavors.
  • 10. Pilot Project – Business questions: 26 initial questions, pruned down to 15, across 5 main categories: (1) biology-oriented literature mining for discovery project planning; (2) assay technology oriented; (3) chemistry/tool compound oriented; (4) specific assay conditions; (5) computational chemogenomic modelling (e.g., target activity, “PAINS”).
  • 11. CDD’s BioAssay Express: NLP/ML + human in the loop.
  • 12. Pilot Project – Assay selection: (1) 245 commercial panel assays: ThermoFisher’s Z’-LYTE kinase selectivity panel (the assay protocol was taken from the vendor’s downloadable PDF document); (2) 42 PubChem NCATS assays: qHTS, large datasets (Assay Description and Assay Protocol sections in plain text on the PubChem page); (3) 210 publication assays: ChEMBL assays where the target is EGFR and the reference is Open Access (paper, supplementary material, references). 100 of these 496 annotated assays were subjected to manual QC by project team members.
  • 13. Pilot: 100 QC’d assays (~20%). The learning points largely extrapolate from the 100 QC’d assays: 89 ChEMBL assays, 5 NCATS assays and 6 ThermoFisher panel assays.
  • 14. How well were the pilot assays annotated?
  • 15. Pilot Project – Learnings: old assay protocols from the literature are hard and expensive to annotate, so published assay protocols need to be well annotated in public databanks and linked to the publication; reviewing supplements and citations is costly, so choose assays wisely; no persistent links exist for commercial assay panel protocols; errors propagate between papers; commercial assay panels were the easiest to annotate (low-hanging fruit); fully automated is not fully accurate, so benefit from good work practices (audit trail, versioning, iterative QC by experts); a common community data standard is needed for future assay publications.
  • 16. Value statement: “Richly annotated FAIR bioassay data has been very valuable for an internal data integration project, where it has provided additional terminology aiding the assimilation of the chemogenomics datasets used by the machine-learning models. The extra annotations better harmonise our dataset with those from external partners, enabling the federated platform to provide superior multi-task predictions across a range of panels and safety screens in a privacy preserving way.” (Lewis Mervin, Machine Learning and Cheminformatics Expert, Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca)
  • 17. Next steps: optimize process, data sources, tools and QC within quality constraints, and define quality metrics. Define and promote a community standard for assay reporting and publishing, aligned with vendors, publishers and government agencies. Attract new project members and sufficient funding to start the next phase. Scale up (×10–100) in the next phase; having more partners lowers each partner’s cost per assay and share of the overhead cost.
  • 18. Thanks to: AstraZeneca: Nigel Green, David Hayes, Tom Plasterer; BMS: Rick Bishop; Janssen: Herman van Vlijmen; Novartis: Fabien Pernot; MMV: Jeremy Burrows; PubChem: Evan Bolton; ChEMBL: Anna Gaulton, Andrew Leach; Roche: Olivier Roche; Medicines Discovery Catapult: John Overington, Mark Davies; Pangeadata.ai: Vibhor Gupta; University of Miami: Stephan Schürer; BioSci Consulting: Scott Wagers; Collaborative Drug Discovery: Barry Bunin, Frank Cole, Alex Clark, Hande Küçük McGinty (now Univ. of Ohio); Pistoia Alliance: Carmen Nitsche (now at CCDC), Nick Lynch (now at Curlew Research).
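The assay selection and QC slides give enough numbers to break the ~20% manual QC rate down by source. The Python sketch below uses only those counts; the dictionary keys are illustrative labels chosen here, not identifiers from DataFAIRy, CDD or PubChem. It shows that manual QC covered roughly 42% of the ChEMBL publication assays, about 12% of the NCATS assays and about 2% of the ThermoFisher panel assays.

```python
# Per-source counts taken from the "Assay selection" and "100 QC'd assays" slides.
# The key names are illustrative labels, not identifiers from any real system.
SELECTED = {
    "ThermoFisher Z'-LYTE panel": 245,
    "PubChem NCATS qHTS": 42,
    "ChEMBL EGFR publications": 210,
}
QCD = {
    "ThermoFisher Z'-LYTE panel": 6,
    "PubChem NCATS qHTS": 5,
    "ChEMBL EGFR publications": 89,
}
STATED_TOTAL = 496  # total number of annotated assays as stated in the deck


def qc_coverage(selected: dict, qcd: dict) -> dict:
    """Return, per source, the fraction of annotated assays that were manually QC'd."""
    return {source: qcd[source] / selected[source] for source in selected}


if __name__ == "__main__":
    for source, frac in qc_coverage(SELECTED, QCD).items():
        print(f"{source:30s} {QCD[source]:3d}/{SELECTED[source]:3d} = {frac:6.1%}")
    total_qcd = sum(QCD.values())
    print(f"{'Overall':30s} {total_qcd:3d}/{STATED_TOTAL} = {total_qcd / STATED_TOTAL:6.1%}")
```

Because 89 of the 100 QC’d assays are publication assays, learning points that extrapolate from the QC’d set mostly reflect the ChEMBL literature assays rather than the commercial panel.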