Making Data FAIR (Findable, Accessible, Interoperable, Reusable)

Director, Bioinformatics, Data Science & AI at AstraZeneca
25 Mar 2019

  1. Making Data FAIR*
     Tom Plasterer, PhD, Director, Bioinformatics, Research Bioinformatics
     20 Mar 2019
     * Findable, Accessible, Interoperable and Reusable
  2. What FAIR: Principles at-a-Glance
     Findable:
     • F1 (meta)data are assigned a globally unique and persistent identifier
     • F2 data are described with rich metadata
     • F3 metadata clearly and explicitly include the identifier of the data it describes
     • F4 (meta)data are registered or indexed in a searchable resource
     Accessible:
     • A1 (meta)data are retrievable by their identifier using a standardized communications protocol
     • A1.1 the protocol is open, free, and universally implementable
     • A1.2 the protocol allows for an authentication and authorization procedure, where necessary
     • A2 metadata are accessible, even when the data are no longer available
     Interoperable:
     • I1 (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
     • I2 (meta)data use vocabularies that follow FAIR principles
     • I3 (meta)data include qualified references to other (meta)data
     Reusable:
     • R1 (meta)data are richly described with a plurality of accurate and relevant attributes
     • R1.1 (meta)data are released with a clear and accessible data usage license
     • R1.2 (meta)data are associated with detailed provenance
     • R1.3 (meta)data meet domain-relevant community standards
     Source: The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018 (2016), doi: 10.1038/sdata.2016.18
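A few of these principles can be spot-checked mechanically against a metadata record. The Python sketch below is a hypothetical illustration, not part of the FAIR principles paper or of any FAIR metrics tool: the field names (identifier, data_identifier, license, provenance) and the short list of accepted persistent-identifier prefixes are assumptions, and only F1, F3, R1.1 and R1.2 are tested, in a crude presence-only way.

```python
from typing import Dict, List

# Identifier prefixes accepted here as "globally unique and persistent" (F1).
# This short list is an illustrative assumption, not a FAIR requirement.
PERSISTENT_PREFIXES = ("https://doi.org/", "doi:", "https://identifiers.org/")


def fair_spot_check(record: Dict[str, object]) -> List[str]:
    """Return codes of the principles this metadata record visibly fails.

    Only a few machine-checkable principles are tested (F1, F3, R1.1, R1.2);
    the field names are hypothetical, not a standard schema.
    """
    failures: List[str] = []
    identifier = str(record.get("identifier") or "")
    if not identifier.startswith(PERSISTENT_PREFIXES):
        failures.append("F1")    # no persistent identifier
    if not record.get("data_identifier"):
        failures.append("F3")    # metadata does not name the data it describes
    if not record.get("license"):
        failures.append("R1.1")  # no clear data usage license
    if not record.get("provenance"):
        failures.append("R1.2")  # no provenance information
    return failures


if __name__ == "__main__":
    example = {
        "identifier": "https://doi.org/10.1234/example-metadata",  # made-up DOI
        "data_identifier": "https://doi.org/10.1234/example-data",  # made-up DOI
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "provenance": None,  # provenance deliberately missing
    }
    print(fair_spot_check(example))  # ['R1.2']
```

A presence check like this only catches obvious gaps; whether a license is "clear" or provenance "detailed" still needs human or community-agreed judgement.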
  3. Why FAIR: Biopharma Value Proposition
     Collaborative & Competitive Intelligence:
     • Who do we want to partner with? Are there complementary assets to our portfolio?
     • What space is too crowded and not our area of expertise?
     • Greenfield situations?
     Mergers, Acquisitions, Partnerships:
     • How do we efficiently and deeply absorb data generated elsewhere into our systems? How do we efficiently share?
     • Does this make a smaller biotech/start-up a more viable partner?
     Improved Patient Care:
     • Can we share data and outcomes more efficiently in complicated trial settings (basket trials, adaptive trials) to better engage opinion leaders and foster dialog?
     • Along with Differential Privacy approaches, can we have the broader research community help mine our data?
     • How do we best reuse Real World Evidence (RWE) data in the clinic and in trial design?
     Data (Ir)reproducibility:
     • Can we make preclinical data (more) reproducible?
     • Can we utilize data credentialization? (thanks to Dan Crowther @ Exscientia)
  4. Why FAIR: €26bn Reasons…
  5. When FAIR: A Brief History
     Moving away from Narrative:
     • Nanopublications
     Incubating Standards in Open PHACTS:
     • VoID, PROV-O
     Lorentz Center Workshop:
     • FORCE 11 FAIR Guiding Principles
     • Participants: IMI members, US researchers, Content providers, ELIXIR, European Open Science Cloud, Big Data to Knowledge (BD2K)
     Current Status:
     • FAIR Data Workshops (EU-ELIXIR nodes)
     • Inclusion in Horizon 2020, NIH Advocacy
     • IMI2 Data FAIR-ification Call
     • Vendors getting up to speed
  6. When FAIR: Community Awareness
     Linked Data Community of Practice: How familiar are you with the FAIR principles and metrics?
  7. When FAIR: Getting Started
     Linked Data Community of Practice: What is the maturity level of your organization with respect to implementation of FAIR?
  8. How FAIR: Pistoia FAIR Implementation Group
     • Business challenge:
       - Effective application and analysis of data assets in the life science industry demands that they are made Findable, Accessible, Interoperable and Reusable
     • Update and plans:
       - Workshop at The Hyve, Utrecht NL in June 2018 resulted in a published feature article
       - Workshop at EPAM, Boston US in Dec 2018 contributed to the business case thinking
       - Phase 1 plans for 2019:
         • Develop the business case to define a distinctive role for the project
         • Develop the FAIR Toolkit concept
         • Select a use case, e.g. clinical science, to engage with CROs at a workshop
       - Seeking more funding: join us!
     PM: Ian Harrow
     FAIR Toolkit implementation for the life science (LS) industry:
       1. Metric tools & best practice
       2. Training resources
       3. Culture change process
       4. Use case examples
       5. Cost benefit examples
       • Adapt for the life science industry
       • Leverage existing FAIR resources
  9. How FAIR: Pistoia Ontologies Mapping Project
     • Business challenge:
       - Use of different ontologies within the same data domain hampers interoperability and application. Solve this by mapping between them.
     • Update and plans:
       - Phase 3 completed by end of 2018:
         • Predicted mappings delivered as a prototype Ontology Mapping Service for the phenotype and disease domain
         • Mappings will be available through a public wiki and the OxO mapping repository at EMBL-EBI
         • The mapping algorithm, Paxo, is openly available on GitHub
       - Phase 4 plans for 2019:
         • Extend mapping of biological and chemical ontologies to support laboratory analytics
         • FAIR implementation is planned
       - Seeking more funding: join us!
     PM: Ian Harrow
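The mapping idea can be illustrated with a small Python sketch that re-expresses dataset annotations from one ontology in terms of another, using a table of term-to-term mappings of the kind a mapping service might provide. This is not the Paxo algorithm or the OxO interface: the ontology prefixes ONT_A and ONT_B, all identifiers, and the function names are made up, and every mapping is assumed to be an exact equivalence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(mappings: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index term-to-term mappings in both directions.

    Treating every mapping as symmetric assumes they are exact-match
    (equivalence) mappings; broader/narrower mappings would need more care.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for left, right in mappings:
        index[left].add(right)
        index[right].add(left)
    return index


def translate(terms: Iterable[str], index: Dict[str, Set[str]], target_prefix: str) -> List[str]:
    """Re-express annotations in the target ontology, identified by its ID prefix.

    Terms already in the target ontology are kept; terms with no mapping are
    dropped, so a real pipeline would probably want to log them.
    """
    result: Set[str] = set()
    for term in terms:
        if term.startswith(target_prefix):
            result.add(term)
        else:
            result.update(t for t in index.get(term, set()) if t.startswith(target_prefix))
    return sorted(result)


if __name__ == "__main__":
    # Made-up identifiers for two hypothetical disease ontologies.
    mappings = [("ONT_A:0001", "ONT_B:1001"), ("ONT_A:0002", "ONT_B:1002")]
    index = build_index(mappings)
    # Mixed annotations from two datasets; after mapping both use ONT_B.
    print(translate(["ONT_A:0001", "ONT_B:1002", "ONT_A:9999"], index, "ONT_B:"))
    # ['ONT_B:1001', 'ONT_B:1002']
```

Once annotations share one ontology, datasets described with different vocabularies can be queried together, which is the interoperability gain the project targets.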
  10. How FAIR:
  11. How FAIR: Implementation Networks
  12. How FAIR:
      Overview:
      • ELIXIR: Project Coordinator; Janssen: Project Leader
      • 22 participants: 12 academic, 7 EFPIA, 3 SME
      • €8.23M budget: €4M H2020 EC funding + €4.23M EFPIA in-kind
      • 42 months
      Goals:
      • Establish a value-based process for prioritization and selection of IMI project databases
      • Develop a FAIRification toolkit, e.g. guidelines, tools and metrics (FAIR Cookbook)
      • Apply this toolkit to FAIRify datasets from selected IMI projects and EFPIA companies
      • Deliver training for data handlers (academia, SMEs and pharmaceutical companies) to change and sustain the data management culture
      • Foster an innovation ecosystem on FAIR open data to power future reuse, knowledge generation and societal benefit, e.g. FAIR innovation and SME events
      PM: Serena Scollen
  13. How FAIR: Concept
  14. How FAIR: FAIR Metrics &
  15. Start FAIR: Find me datasets about:
      Projects, Study, Indication/Disease, Technology, Targets, Cohort, Dates, Agent, Therapeutic Area, Drugs
  16. Start FAIR: Findability Starts with Catalogs
      A Dataset Catalog is a collection of Dataset Records:
      • Catalogs are needed to support FAIR (Findable) data
      • Catalogs can and should support Enterprise Master Data Management (MDM) strategies
      • Consumers can be internal or external
      Dataset Catalogs are needed so data consumers can find Datasets:
      • Dataset records need sufficient metadata to support discoverability
      • Dataset terms are NOT the data instance
      Dataset Catalogs surface dataset provenance and enable data access.
      Dataset Catalogs can provide datasets for multiple consumption patterns:
      • Analytics readiness and fit
      • ‘Walking’ across information models
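A catalog of dataset records, searchable by facets such as those on the "Find me datasets about" slide, might look like the Python sketch below. It is a hypothetical in-memory model: the class and field names, the lowercase facet keys (indication, technology, drugs and so on) and the AND-only search are assumptions for illustration, not a product or standard schema. Each record holds metadata about a dataset, not the data instance itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetRecord:
    """Metadata *about* a dataset; the data instance itself lives elsewhere."""
    identifier: str
    title: str
    # Facet name -> values, e.g. {"indication": {"asthma"}}; facet names are assumptions.
    facets: Dict[str, Set[str]] = field(default_factory=dict)
    provenance: str = ""   # who produced the data and how
    access_url: str = ""   # where a consumer can request or retrieve the data


@dataclass
class DatasetCatalog:
    """A collection of dataset records that consumers can search."""
    records: List[DatasetRecord] = field(default_factory=list)

    def add(self, record: DatasetRecord) -> None:
        self.records.append(record)

    def find(self, **criteria: str) -> List[DatasetRecord]:
        """Return records matching every facet=value pair given (logical AND)."""
        return [
            record
            for record in self.records
            if all(value in record.facets.get(facet, set()) for facet, value in criteria.items())
        ]


if __name__ == "__main__":
    catalog = DatasetCatalog()
    catalog.add(DatasetRecord("ds-001", "Hypothetical asthma cohort",
                              {"indication": {"asthma"}, "technology": {"RNA-seq"}}))
    catalog.add(DatasetRecord("ds-002", "Hypothetical COPD trial",
                              {"indication": {"COPD"}, "technology": {"proteomics"}}))
    hits = catalog.find(indication="asthma", technology="RNA-seq")
    print([r.identifier for r in hits])  # ['ds-001']
```

Keeping provenance and access information on the record, rather than copying data into the catalog, is what lets one catalog serve internal and external consumers alike.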
  17. Start FAIR: A DCAT-conformant Data Catalog
      Semantic tagging of datasets with concepts from taxonomies:
      • provides context
      • multi-dimensional & flexible
      • effective for discoverability
      • light-weight semantics
      [Diagram: DCAT-conformant catalog model. Classes: dcat:Catalog, skos:ConceptScheme, skos:Concept, dctypes:Dataset (summary), dctypes:Dataset (version), dcat:Distribution (dctypes:Dataset), void:Dataset. Properties: dcat:dataset, dcat:themeTaxonomy, dcat:theme, dcat:keyword, dcat:distribution, dct:title, dct:description, dct:publisher <foaf:Agent>, dct:creator <foaf:Agent>, dct:created, dct:source, dct:license, dct:format, dct:conformsTo, dct:accrualPeriodicity, dct:isVersionOf, dct:hasPart, foaf:page, pav:version, pav:previousVersion, pav:hasCurrentVersion, pav:retrievedFrom, pav:createdWith, dcat:accessURL, dcat:downloadURL, void:sparqlEndpoint, void:vocabulary, void:exampleResource, and other VoID properties.]
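To make the DCAT vocabulary concrete, the Python sketch below builds a minimal catalog record as JSON-LD using only the standard library. The namespace IRIs are the standard ones for DCAT, Dublin Core terms and types, FOAF, PAV and SKOS; everything under https://example.org/ is a placeholder, the function names are hypothetical, and the summary and version levels from the diagram are collapsed into a single dataset node for brevity. Validating the output against DCAT or loading it into a triple store is not shown.

```python
import json
from typing import Dict, List

# Standard namespace IRIs for vocabularies named on the slide.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "pav": "http://purl.org/pav/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def dataset_node(iri: str, title: str, publisher: str, keywords: List[str],
                 themes: List[str], version: str, download_url: str,
                 license_iri: str, media_format: str) -> Dict[str, object]:
    """One dataset with a single distribution (summary and version levels merged)."""
    return {
        "@id": iri,
        "@type": ["dcat:Dataset", "dctypes:Dataset"],
        "dct:title": title,
        "dct:publisher": {"@id": publisher, "@type": "foaf:Agent"},
        "dcat:keyword": keywords,                    # free-text tags
        "dcat:theme": [{"@id": t} for t in themes],  # skos:Concept IRIs
        "pav:version": version,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dct:license": {"@id": license_iri},
            "dct:format": media_format,
            "dcat:downloadURL": {"@id": download_url},
        },
    }


def catalog_document(iri: str, title: str, taxonomy: str,
                     datasets: List[Dict[str, object]]) -> Dict[str, object]:
    """Wrap dataset nodes in a dcat:Catalog whose themes come from one skos:ConceptScheme."""
    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:themeTaxonomy": {"@id": taxonomy, "@type": "skos:ConceptScheme"},
        "dcat:dataset": datasets,
    }


if __name__ == "__main__":
    base = "https://example.org/"  # placeholder namespace, not a real catalog
    ds = dataset_node(base + "dataset/asthma-cohort", "Hypothetical asthma cohort",
                      base + "org/research-bioinformatics", ["asthma", "RNA-seq"],
                      [base + "taxonomy/respiratory"], "1.0",
                      base + "files/asthma-cohort-v1.csv",
                      "https://creativecommons.org/licenses/by/4.0/", "text/csv")
    doc = catalog_document(base + "catalog", "Hypothetical R&D dataset catalog",
                           base + "taxonomy", [ds])
    print(json.dumps(doc, indent=2))
```

Theme IRIs drawn from a shared skos:ConceptScheme give the light-weight, multi-dimensional tagging the slide describes, while dcat:keyword keeps room for free-text tags.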
  18. Start FAIR: Dataset to Knowledge Graph to Analytics
      Phase 1: Data Catalog filter
      Phase 2: Experiment Metadata filter
      Phase 3: Ad hoc Analyses filtering
      [Diagram labels: Dataset Catalog Descriptions; Statistical Filtering, e.g., clinical trial with > 50 participants; Outbound to Data Analytics; Data Science Tools]
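The phased filtering can be sketched as a chain of predicates applied in order, so cheap catalog-level filters narrow the pool before more detailed experiment-metadata filters run. In this hypothetical Python sketch the record fields (indication, study_type, participants) and function names are assumptions, and placing the "> 50 participants" example at the experiment-metadata step is an interpretation; the slide does not say which phase it belongs to.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Phase = Callable[[Record], bool]


def run_phases(records: Iterable[Record], phases: List[Phase]) -> List[Record]:
    """Apply filtering phases in order; each phase sees only earlier survivors."""
    selected = list(records)
    for phase in phases:
        selected = [r for r in selected if phase(r)]
    return selected


def catalog_filter(record: Record) -> bool:
    """Phase 1: coarse filter on catalog-level descriptions (hypothetical 'indication' field)."""
    return record.get("indication") == "asthma"


def experiment_filter(record: Record) -> bool:
    """Phase 2: experiment metadata, e.g. a clinical trial with > 50 participants."""
    participants = record.get("participants", 0)
    return (record.get("study_type") == "clinical trial"
            and isinstance(participants, int) and participants > 50)


if __name__ == "__main__":
    records = [
        {"id": "ds-1", "indication": "asthma", "study_type": "clinical trial", "participants": 120},
        {"id": "ds-2", "indication": "asthma", "study_type": "clinical trial", "participants": 30},
        {"id": "ds-3", "indication": "COPD", "study_type": "clinical trial", "participants": 200},
    ]
    shortlist = run_phases(records, [catalog_filter, experiment_filter])
    # Phase 3 (ad hoc analyses) would hand this shortlist to data science tools.
    print([r["id"] for r in shortlist])  # ['ds-1']
```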
  19. FAIR-for-Biopharma: Take-aways (R&D | RDI)
      • Why FAIR? Cost avoidance, Business Advantage, Data Stewardship
      • When FAIR? Now! Peers, especially in Europe, are doing it
      • How FAIR? FAIRplus, GO-FAIR, Pistoia FAIR Implementation Group
      • Start FAIR: Findability first; adopt a FAIR-compliant Data Catalog
  20. Thanks
      Key Influencers: David Wood, Tim Berners-Lee, Lee Harland, Jane Lomax, James Malone, Dean Allemang, Barend Mons, Carole Goble, Bernadette Hyland, Bob Stanley, Eric Little, Michel Dumontier, John Wilbanks, Hans Constandt, Filip Pattyn, Tim Hoctor, Kees Van Boche, Serena Scollen
      AstraZeneca/Pistoia FAIR Data Community: Mathew Woodwark, Rajan Desai, Nic Sinibaldi, Chia-Chien Chiang, Kerstin Forsberg, Ola Engkvist, Ian Dix, Colin Wood, Ted Slater, Martin Romacker, Eric Neumann, John Wise, Carmen Nitsche, Ian Harrow, Jeff Saltzman, Kathy Reinold

Editor's Notes

  1. Eric Schulte’s talk: Ready, Set, GO-FAIR:
  2. At least 50% of preclinical research could not be reproduced, at a cost of $28B/year. Pistoia paper: Implementation and relevance of FAIR data principles in biopharmaceutical R&D.
  4. Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020).