Pharmas and academia join forces to make data FAIR
12th Global Summit on Regulatory Science (GSRS22), Bioinformatics Session, Singapore, 20 Oct, 2022
Slides: https://www.slideshare.net/SusannaSansone
Susanna-Assunta Sansone
Professor of Data Readiness
Associate Director, Oxford e-Research Centre
datareadiness.eng.ox.uk
Interoperability Platform Co-Lead, elixir-europe.org
Founding Academic Editor, nature.com/sdata
ORCiD: 0000-0001-5306-5690
Twitter: @SusannaASansone
Open, FAIR and reproducible science
The FAIR Principles
Globally unique and
persistent identifiers
Community defined
descriptive metadata
Community defined
terminologies
Detailed
provenance
Terms of access
Terms of
use
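As a rough illustration of the six elements above, a dataset record could carry one field per element. This is a minimal sketch, not a schema prescribed by the FAIR Principles: the class name, the field names, and every example value (including the placeholder DOI) are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical record with one field per element listed above."""
    identifier: str          # globally unique and persistent identifier
    metadata: dict           # community-defined descriptive metadata
    terminology_terms: list  # terms drawn from community-defined terminologies
    provenance: list         # detailed provenance, as ordered processing steps
    access_terms: str        # terms of access
    use_terms: str           # terms of use (e.g. a licence)

    def missing_elements(self):
        """Return the names of the elements that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; none of them refers to a real dataset.
record = DatasetRecord(
    identifier="https://doi.org/10.xxxx/example",
    metadata={"title": "Example assay dataset"},
    terminology_terms=["http://purl.obolibrary.org/obo/OBI_0000070"],
    provenance=["collected", "normalised"],
    access_terms="open",
    use_terms="",
)
print(record.missing_elements())  # → ['use_terms']
```

A check like `missing_elements` only tells you that a field is empty; whether the values themselves follow community standards needs a separate review.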
Rationale behind the FAIR Principles
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#276a35e6f637
Discoveries are made using shared data, and this
requires data that are:
● Cited and stored to be discoverable
● Retrievable and structured in standard format(s)
● Richly described to be understandable
Data preparation accounts for
about 80% of the work of data
scientists
FAIR as a driver of the digital transformation
in (bio)pharmas
● To improve biopharma R&D productivity
● To enable powerful new AI analytics to
access data for ML and prediction
● Requirements
o financial, technical, training
● Challenges
o change the culture, show business
value, achieve the ‘FAIR enough’ on
an enterprise scale
The FAIR Principles:
a continuum of features, attributes, behaviours
In practice, what do I need to do?
The FAIR Cookbook:
motivations and ambitions
beyond the hype
Motivations
● Large body of generic FAIR guidance
● Non-specific guidance for the life sciences
● Lack of practical examples of ‘how-to’ with different data types and scenarios
Ambitions
● Target specific situations to deliver a guide with applied examples
● Join academia and industry forces to make the case for FAIR data management
● Build capacity for high quality data management in the private and public sectors
What is it?
A collection of recipes that cover the
operational steps of FAIR data management
Who is it for?
Data Managers, Data Stewards, Data Curators, Software Developers, Terminology Managers
• A venue to document and share existing and new approaches or services to support FAIRification
• A way to promote a participatory culture that enables sharing of expertise by getting exposure and credit
Policymakers, Funders, Trainers
• Practical examples to recommend in policies
• To use in educational material to incentivize and guide FAIR in practice
Researchers, Data Scientists, Principal Investigators
• Introductory material
• Hands-on, technical step-by-step examples
● Over 70 recipes released and more
content available
● Covering over 20 data types, incl:
○ omics
○ pre-clinical
○ clinical areas
But not limited to these!
Coverage and learning objectives
Learn how to improve the FAIRness with exemplar datasets
Understand the levels and indicators of FAIRness
Discover open source technologies, tools and services
Find out the required skills
Acknowledge the challenges
Define what your needs are
Goal: improving visibility of content, e.g.:
Goal: semantic integration of datasets from multiple sources, e.g.:
Goal: security and regulatory compliance, e.g.:
https://w3id.org/faircookbook/FCB010
https://w3id.org/faircookbook/FCB007
https://w3id.org/faircookbook/FCB006
https://w3id.org/faircookbook/FCB020 https://w3id.org/faircookbook/FCB004
https://w3id.org/faircookbook/FCB014 https://w3id.org/faircookbook/FCB035
Anatomy of a recipe
components
Ingredients
An idea of tools/skills needed
Step by step process
Guidelines, process, description
Practical
elements, code
snippets
# Python3
# zooma-annotator-script.py file
def get_annotations(propertyType, propertyValues, filters=""):
    ...  # remainder of the function is not shown on the slide
Examples
Conclusions
What should I read next?
CC BY-SA 4.0 International
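The snippet on this slide shows only the signature of `get_annotations`, so the following is a hedged sketch of what the first step of such a ZOOMA annotation helper might look like, not the recipe's actual script. It only builds the query URLs and makes no network call. The endpoint constant, the query parameter names, and the function name `build_annotation_queries` are assumptions based on the public ZOOMA service, not taken from the slide.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public ZOOMA annotation service (not stated on the slide).
ZOOMA_ANNOTATE_URL = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def build_annotation_queries(property_type, property_values, filters=""):
    """Return one ZOOMA query URL per property value (no request is sent)."""
    urls = []
    for value in property_values:
        # Parameter names are assumptions about the ZOOMA query string.
        params = {"propertyValue": value, "propertyType": property_type}
        if filters:
            params["filter"] = filters
        urls.append(f"{ZOOMA_ANNOTATE_URL}?{urlencode(params)}")
    return urls


for url in build_annotation_queries("organism", ["mus musculus", "homo sapiens"]):
    print(url)
```

Keeping URL construction separate from the HTTP call makes the helper easy to test offline; a real script would then fetch each URL and parse the JSON response.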
Standards in the life sciences
More than 1000 standards, developed by standards organizations and grass-roots groups
Guidelines, e.g.: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MISFISHIE, REMARK, CONSORT, …
Formats, e.g.: SRAxml, SDTM, FASTA, DICOM, OMOP, SBRML, SEDML, CDASH, ISA, CML, MITAB, …
Terminologies, e.g.: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Identifiers, e.g.: EC number, URL, PURL, LSID, Handle, ORCID, RRID, InChI, IVOA ID, DOI, …
Category counts: 550, 303, 166, 11
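Some of the identifier schemes named on this slide can be checked by machine. As one example, an ORCID iD ends in a check character computed with ISO 7064 MOD 11-2. The sketch below validates that character; the function names are my own, and it checks only the format and checksum, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 16-character shape and the final check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# The iD given on the title slide passes the checksum.
print(is_valid_orcid("0000-0001-5306-5690"))  # → True
print(is_valid_orcid("0000-0001-5306-5691"))  # → False
```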
Tagging recipes with
‘Dataset Maturity Indicators’
Maturity level and indicators
new feature!
https://fairplus.github.io/Data-Maturity
Provides insight into the FAIR maturity level reached by
applying a specific recipe to improve a
dataset
Who developed it?
Almost 100 life sciences professionals, researchers and data managers
FAIRplus
partners
Industry
+
Academia
ELIXIR
Nodes
represented
Current operations and Editorial Board
Content prioritisation
Identification of topics
Review of drafts
Call for contributions
Monthly book-dash
events
Pre-defined focus areas
Breakout on topics
Housekeeping
Technical platform
Website Martin Cook
Dominique Batista
Office of Data
Science Strategy
Become part of a community of FAIR experts
and write recipes!
1 Identify a chapter and a topic
Findability Accessibility Interoperability Reusability
Infrastructure Applied examples Assessment
2 Choose a way of contributing and see our guidelines
Google Docs
HackMD
Git
Markdown cheat sheet
Get recipe template
Tips and tricks
3 Submit an outline
You can discuss it with the Editorial Board
What has motivated so many people
to contribute?
● To stay engaged in a growing community and up to date with the latest developments
● To prove their FAIR competence (as individuals and as organisations)
● To expand their network of potential collaborators, clients and users
● To address common challenges and find common solutions at the pre-competitive level
Utility and value based on three uses:
what we have learned
● As educational material on FAIR in a training context (as part of the FAIRplus Fellowship Programme):
o showed the validity of the recipes’ content towards the intended (learning) objectives
o confirmed that some recipes require more technical background knowledge and have a steeper learning curve
● As practical guidance to improve day-to-day tasks for FAIRer data
● And as a contributor towards changing the culture in research data management (behind the pharmas’ firewall)
o outcomes were expressed in terms of satisfaction with the value of the recipes, against specific tasks or challenges addressed
o they reported a positive contribution towards their discussion on return on investment to operationalize FAIR
Thanks to
Editorial Board
Section Editors
FAIRplus partners
All book-dash participants
All authors
fairplus-cookbook@elixir-europe.org
faircookbook.elixir-europe.org
fairplus-project.eu
This project has received funding from the Innovative Medicines Initiative Joint Undertaking under grant agreement No 802750. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. This communication reflects the views of the authors and neither IMI nor the European Union, EFPIA or any Associated Partners are liable for any use that may be made of the information contained herein.