ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain-specific public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of the data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based data management, and support researchers across their whole RDM lifecycle as they navigate the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aim to do just that.
FAIR Data Bridging from researcher data management to ELIXIR archives in the RDM lifecycle
1. www.elixir-europe.org/converge
ELIXIR-Converge has received funding from the European Union’s
Horizon 2020 Research and Innovation programme under grant
agreement No 871075.
FAIR Data
Bridging from researcher data management to
ELIXIR archives in the RDM lifecycle
Carole Goble
The University of Manchester, UK
Joint Head of Node ELIXIR-UK
carole.goble@manchester.ac.uk
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
2. Building a distributed European Research
Infrastructure for Life Science Data
Support FAIR data management for the
diversity of life sciences across Europe
Internationally aligned
Scalable to 500 000 life scientists
Long-term sustainable
Cooperation
FAIR data at its heart
23 Nodes
250+
organisations
https://elixir-europe.org
3. Trusted Domain-specific Data Repositories
Specialised, curated, independently funded collections provided by Nodes
Core Data Resources & Deposition Databases
Deposition
Databases
Node Specialised Data Repositories
Criteria: https://f1000research.com/articles/5-2422/v2
4. Supporting Data Repositories to be FAIR
FAIR Data Services, Best Practices, Governance
Text mining
Identifier management,
resolution & best practice
Data citation tracking
Scalable curation
Standards
Domain
Standards
Identifier Services
Schema.org universal
machine processable
metadata mark-up
AI/ML
Identifier mapping
FAIR Services
Impact Analysis
Social and process issues of independent data repositories
FAIRification
FAIR Repository Recommendations &
FAIR indicators
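To make the identifier management and resolution items on this slide concrete, here is a minimal sketch of turning a compact identifier of the form prefix:accession into a resolvable URL. The slide does not name a resolver; using identifiers.org is an assumption, and the function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: the slide names no resolver; identifiers.org is used for illustration.
RESOLVER = "https://identifiers.org/"


def parse_compact_id(compact_id):
    """Split 'prefix:accession' into its two parts, rejecting malformed input."""
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix, accession


def resolver_url(compact_id):
    """Build a resolver URL for a compact identifier (hypothetical helper)."""
    prefix, accession = parse_compact_id(compact_id)
    # Percent-encode anything unusual in the accession, keeping common separators.
    return f"{RESOLVER}{prefix}:{quote(accession, safe=':/.')}"


print(resolver_url("taxonomy:9606"))
```

A resolver of this kind lets a repository change its own URL scheme without breaking the identifiers that others have cited.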
5. Supporting Data Repositories for FAIR finding at scale
Bioschemas
Web-based mark-up of resources
70 sites
60M+ pages
MolecularEntity, Protein, Gene
Sample, Taxon,
ChemicalSubstance…
DataCatalog
Dataset
license & provenance
Google Dataset Discovery
https://bioschemas.org
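As a rough illustration of the web-based mark-up described on this slide, the sketch below builds a schema.org Dataset description as JSON-LD and wraps it in the script tag that a page would embed for crawlers. The helper names, the example.org URLs and all field values are hypothetical; the Bioschemas profiles themselves define which properties are required or recommended.

```python
import json


def dataset_jsonld(name, description, url, license_url, identifier):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,  # licence information, as listed on the slide
        "identifier": identifier,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in the tag used to embed it in a web page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record; none of these values come from a real repository.
example = dataset_jsonld(
    name="Example plant phenotype dataset",
    description="Illustrative record only.",
    url="https://example.org/datasets/1",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    identifier="https://example.org/datasets/1",
)
print(as_script_tag(example))
```

Because the mark-up is plain JSON-LD in the page itself, generic crawlers such as dataset search engines can harvest it without a repository-specific API.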
6. Researcher / Project Data Management Platforms
Supported by Nodes at National & European Levels
Platform for building Project Hubs
organising, cataloguing, sharing
multiple interlinked kinds of
research objects
using multiple repositories
for multi-partner projects.
https://fair-dom.org
7. Researcher / Project Data Management Platforms
Supported by Nodes at National & European Levels
Know the (meta)data flows,
nudge points & added values
Retention beats
out sharing
Embed in practice
Try to be frictionless
*Pasquetto, I. V., Borgman, C. L., & Wofford, M. F. (2019). Uses and Reuses of Scientific Data: The Data Creators’ Advantage. Harvard Data Science Review, 1(2). https://doi.org/10.1162/99608f92.fc14bf2d
8. Supporting FAIR Data throughout the RDM lifecycle
Localised RDM Platforms (maybe Cloud based)
p-ISA
Supported by Nodes at National & European Levels
Processing
Brokering
Norwegian e-Infrastructure for Life Sciences
9. FAIR Data Landscape
enable FAIR data by design,
end to end…
Researchers,
their Projects and
Collaborators
General Repositories
Institutions
Institutional Repositories
Data brokering
My filestore & my
institution’s file store, ELNs
Metadata and
preparation
National Nodes
Specialised RDM Platforms
and Repositories
SARS-CoV-2 Data Hubs
10. How can we help researchers and data
stewards navigate and contribute to this
FAIR data repository landscape?
11. ELIXIR-CONVERGE
distributed local support for data management
A web-based toolkit
for the bioscience community
written by the bioscience community
The European COVID-19 Data Platform
SARS-CoV-2 Data Hubs
Federated European Genome-phenome Archive
SARS-CoV-2 variant surveillance data tracking services and tools
*https://zenodo.org/record/3474630
Data Expert network
Training and Capacity Building
Competency Frameworks*
Professionalising Data Stewardship
Training & Training materials
Data brokering pipelines
From project data platforms
to ELIXIR Deposition Databases
Best practice & examples
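One step a brokering pipeline of the kind mentioned above might include is a metadata completeness check before records leave the project platform. The sketch below is only an illustration under stated assumptions: the required field names and both helper functions are hypothetical and are not taken from any ELIXIR deposition database checklist.

```python
# Hypothetical checklist; real deposition databases publish their own requirements.
REQUIRED_FIELDS = ("sample_id", "taxon_id", "collection_date")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List the required fields that are absent or blank in one metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def split_ready(records):
    """Separate records that are ready to broker from those needing more metadata."""
    ready, blocked = [], []
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            blocked.append((record, gaps))
        else:
            ready.append(record)
    return ready, blocked


records = [
    {"sample_id": "S1", "taxon_id": "9606", "collection_date": "2021-07-27"},
    {"sample_id": "S2", "taxon_id": "", "collection_date": None},
]
ready, blocked = split_ready(records)
print(len(ready), "ready;", [gaps for _, gaps in blocked])
```

Reporting the gaps per record, rather than rejecting the whole batch, gives researchers a concrete list of what to fix at the point where the data is still under their control.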
12. RDMkit
Guides, Pointers, Best Practice, Tools
assembled by the community
https://rdmkit.elixir-europe.org
Horizon Europe Recommended
14. Repositories, Services & Tools placed in narrative context
training materials,
learning paths, events
tools
standards, databases,
policies, organised
into collections
16. Bridging researcher RDM &
public archives in the data lifecycle
End to end FAIR Data takes a village
Infrastructure + Know-how
Communication
Professionalisation
Willingness & responsibility to be a FAIR
data citizen
build capacity and skills for
researchers, stewards and
data providers
support researchers to
know, utilise, enable and
demand FAIR RDM services
pool the expertise of the
community for the
community
17. Acknowledgments
Everyone in the ELIXIR-CONVERGE project
All the RDMkit team & editorial board
Guy Cochrane, Sam Holt, Thomas Keane, Zahra Waheed and the COVID Data Platform team
Alasdair Gray and the Bioschemas Community
Björn Grüning, Frederik Coppens, Wolfgang Maier, and the Galaxy Project
Susanna Sansone, Allyson Lister and FAIRsharing team
The FAIRDOM team
For more information
https://rdmkit.elixir-europe.org