2. FAIRDOM Services for the co-funded call
ERACoBioTech Full proposals
• Cost of data management
clearly budgeted
• Data management template
• Detailed Data Management
Plan (DMP)
• Compliance
– H2020 FAIR DMPs
– National funder DMP
https://www.cobiotech.eu/
4. Data Management Planning Checklist
General
• What data will be collected or created as part of the study (RAW data)?
• What data will be produced by processing the RAW data (Secondary, processed
data)?
• Are existing data being re-used (if any)?
• What is the origin of the data?
• What are the types and formats you plan to use for the data generated/collected
(raw, processed, published)?
• What data will be published as the result of your study?
• What are the cost estimates of making your data FAIR?
• Do you have any national/funder/sectorial/departmental procedures for data
management?
Responsibilities, types of study, data, models
Volume and life cycles, processing and access policies
Documentation and metadata
5. Data Management Planning Checklist
Volume and Life Cycle of the Data
Raw data
• How much RAW data do you think will be produced (estimates per month, per year, and for the full project duration)?
• Will all of the RAW data be kept for the duration of the study or will the RAW data be deleted once it is
processed?
• For large scale RAW data (images, sequence) have you planned the local storage capacity necessary for
processing?
• Do you require help to organise a suitable local management system for RAW data?
• Do you have policies that govern the management and usage of RAW data?
• How long will RAW data be kept?
• Will there be a long-term archive?
Secondary and Published data
• What data processing is foreseen in the project?
• How much processed data will be produced, and stored (can you make estimates per month, year, full
project)?
• How much of this data will be published (estimates per month, per year, and for the full project)?
• Does your institution, or the project funders, have policies governing the access and usage of processed
data?
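The volume questions above come down to simple arithmetic. A minimal sketch, assuming purely illustrative figures (the stream names, monthly volumes, retention fractions and the 36-month duration are all hypothetical, not taken from any project):

```python
from dataclasses import dataclass


@dataclass
class DataStream:
    """One source of data; every figure here is an illustrative assumption."""
    name: str
    gb_per_month: float   # estimated volume produced each month
    keep_fraction: float  # share retained after processing (1.0 = keep all)


def estimate_gb(stream: DataStream, months: int) -> dict:
    """Return per-month, per-year and full-project estimates in GB."""
    produced = stream.gb_per_month * months
    return {
        "per_month": stream.gb_per_month,
        "per_year": stream.gb_per_month * 12,
        "full_project": produced,
        "retained": produced * stream.keep_fraction,
    }


PROJECT_MONTHS = 36  # hypothetical project duration
streams = [
    DataStream("raw images", gb_per_month=200.0, keep_fraction=0.25),
    DataStream("processed tables", gb_per_month=5.0, keep_fraction=1.0),
]

for s in streams:
    print(s.name, estimate_gb(s, PROJECT_MONTHS))

total_retained = sum(estimate_gb(s, PROJECT_MONTHS)["retained"] for s in streams)
print(f"retained at project end: {total_retained:.0f} GB")
```

Keeping the retention fraction explicit makes the "keep or delete RAW data after processing" decision visible in the storage estimate.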
6. Data Management Planning Checklist
Personally sensitive data (e.g. medical data)
Data flow through the project, define what data is:
• aggregated (typically safe to share, if names cannot be recovered)
• anonymized (name cannot be recovered from the data)
• pseudonymized (name can be recovered by some)
• non-anonymized (name linked to data)
Which organisational boundaries have to be traversed by which data?
• Confirm with your local data protection officer and ethics commission that the data can be shared with your partners
along the described flow, at the anonymisation levels described.
Why local?
• Some laws change across surprising boundaries.
• E.g. in Germany, universities and other public organisations are subject to a different data protection law than enterprises.
Why seek advice?
• You may be required to be able to recover the name-data relation, e.g. to enable study participants to *leave* a study.
Secure housekeeping
• What provisions will you have in place for data recovery, secure storage, and transfer of sensitive data?
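The difference between pseudonymized and anonymized data, and why the name-data relation may need to stay recoverable, can be sketched as follows. This is an illustration only, not a compliant implementation: `PseudonymRegistry` and its methods are hypothetical names, and a real key table would need secure storage agreed with your data protection officer.

```python
import secrets
from enum import Enum


class Level(Enum):
    AGGREGATED = "aggregated"
    ANONYMIZED = "anonymized"
    PSEUDONYMIZED = "pseudonymized"
    NON_ANONYMIZED = "non-anonymized"


class PseudonymRegistry:
    """Key table held only by the data custodian (hypothetical helper)."""

    def __init__(self):
        self._name_to_token = {}

    def token_for(self, name: str) -> str:
        # Random token: the name cannot be derived from it without this table.
        if name not in self._name_to_token:
            self._name_to_token[name] = secrets.token_hex(8)
        return self._name_to_token[name]

    def withdraw(self, name: str, records: list) -> list:
        """Drop all records of a participant who leaves the study."""
        token = self._name_to_token.pop(name, None)
        return [r for r in records if r["subject"] != token]


registry = PseudonymRegistry()
raw = [{"name": "Jane Doe", "value": 4.2}, {"name": "Joe Bloggs", "value": 3.9}]

# Pseudonymized copy that may cross an organisational boundary.
shared = [{"subject": registry.token_for(r["name"]), "value": r["value"]} for r in raw]
shared_level = Level.PSEUDONYMIZED

# A participant leaves: only the custodian can find and remove their records.
shared = registry.withdraw("Jane Doe", shared)
print(shared_level.value, shared)
```

If the registry were destroyed, the shared records would become anonymized, and a withdrawal request could no longer be honoured.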
7. FAIR: Findable, Accessible, Interoperable, Reusable
Checklists
Making Data Findable (documentation and metadata management)
• What documentation and metadata will accompany the data to assist its
discoverability (details on methodology, definitions, procedures, SOPs,
vocabularies, units, dependencies, etc.)?
• What information is needed for the data to be read and interpreted in the
future?
• What naming conventions will be used?
• How will you approach versioning your data?
• How will you capture / create this documentation and metadata?
• How do you ensure the completeness of the captured data?
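The naming, versioning and metadata-capture questions above can be answered with a small convention that is checked automatically. A minimal sketch, assuming a hypothetical file-name pattern `project_assay_YYYYMMDD_vN.ext`; the pattern, field names and example values are illustrative, not a FAIRDOM requirement:

```python
import hashlib
import re
from datetime import date

# Hypothetical naming convention: project_assay_YYYYMMDD_vN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<assay>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, assay: str, day: date, version: int, ext: str) -> str:
    """Compose a file name that follows the convention."""
    return f"{project}_{assay}_{day:%Y%m%d}_v{version}.{ext}"


def describe(filename: str, content: bytes, units: str) -> dict:
    """Build a metadata record; reject names that break the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename}")
    record = match.groupdict()
    record["version"] = int(record["version"])
    record["sha256"] = hashlib.sha256(content).hexdigest()  # integrity check
    record["units"] = units
    return record


name = build_name("demo", "rna-seq", date(2016, 9, 1), 2, "csv")
meta = describe(name, b"gene,count\nabc,12\n", units="read counts")
print(name)
print(meta)
```

Storing a checksum with each version gives a simple completeness and integrity check when files are moved between stores.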
Making Data Accessible
Specify which data will be made openly available, taking into consideration:
• What ethics and legal compliance issues do you have if any? Do you need
consent for data preservation and sharing? Do you have to protect
certain data? Is any data sensitive?
• Do you think you might have Intellectual Property Rights issues? Have
you considered ownership of the data, licensing, restrictions on use?
• Do you think you will need to embargo any data?
• How will you make the data available? (consider the platforms you will
use: databases, repositories, etc)
• What methods or software tools are needed to access the data? Should
you include documentation detailing how to access and use the
software needed to access the data? Is it possible to include
this software with the data (e.g. source code, Docker, etc.)?
• If there are any restrictions on accessibility, how will you provide access?
Making Data Interoperable
• What standards (metadata vocabularies, formats,
checklists) or methodologies will you use?
• How do you address data and model quality? What
validation steps do you foresee?
• Will you use standardised vocabulary for all data types
to allow inter-disciplinary interoperability?
• Where you cannot use a standardised vocabulary for all
types of data, can you map to more commonly used
ontologies?
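Mapping local labels to a commonly used ontology can start as a simple lookup that also reports what could not be mapped. A minimal sketch, assuming a hand-maintained table; the three NCBI Taxonomy identifiers are real, while the table itself and the function name are illustrative:

```python
# Hand-maintained mapping from lab-internal labels to NCBI Taxonomy terms.
LOCAL_TO_ONTOLOGY = {
    "human": "NCBITaxon:9606",
    "mouse": "NCBITaxon:10090",
    "e. coli": "NCBITaxon:562",
}


def map_terms(values):
    """Return (mapped, unmapped); unmapped labels need curator attention."""
    mapped, unmapped = {}, []
    for value in values:
        key = value.strip().lower()  # tolerate case and stray whitespace
        if key in LOCAL_TO_ONTOLOGY:
            mapped[value] = LOCAL_TO_ONTOLOGY[key]
        else:
            unmapped.append(value)
    return mapped, unmapped


mapped, unmapped = map_terms(["Human", "mouse ", "zebrafish"])
print(mapped)
print(unmapped)
```

Reporting the unmapped labels, rather than silently dropping them, is what turns the lookup into a validation step.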
Making Data Re-usable
• How will you license your data to permit the widest
re-use possible?
• When will the data be made available for re-use? Does
this include an embargo period? (if so, why?)
• Which data will be available for re-use during/after the
project? If some will not, why not?
• What are your data quality assurance processes?
• How long do you expect your data to remain re-usable?
10. FAIRDOM Services
FAIRDOM Software Platform+Tools
A Central Public Hub
for Projects
Customised Project
Installations
Project Stewardship
Consultancy Services
Community
Activities
70+ Projects 30+ Installations
11. Managing Project Assets
• End to end data management
• Track collection of data and metadata
• Maintain experimental context
• Organise and link assets
• Choose what to keep
• Long-term retention of results beyond a project
• Find and exchange assets
• Share, disseminate and publish assets
• Consistently report for interpretation,
interoperability & comparison
• Support reproducible publications
• Promote standardised metadata practices.
• Reuse public tools and community archives
• Integrate with legacy and home grown systems
• Credit owners
Metadata People Processes
13. FAIRDOM Platform
Built on established software systems
Front end
Project Hub
Back end
Onsite storage & analytics
On site
Tracking, analytic pipelines,
Extract, Transform and Load direct from
the instruments,
Large data management
LIMS, auto-archiving
Web-based portal
Project controlled spaces
Metadata catalogue & Yellow pages
Results repository, dissemination and collaboration
Tool gateway
14. Back end
Instrument Data Management, LIMS, ELN
Samples
Protocols
Experiment
Description
Raw Data
Analysis
Scripts
Results
Laboratory Notebook &
Inventory Manager
ELN
LIMS-like
linking data to biological materials
• samples+protocols management
• data management
• experimental description
Big Data analytics on distributed compute resources
15. Front End Hub: common space, one place
to organise your assets
• Project controlled spaces
– Working space for projects
– Show space for communicating results
– Yellow pages and collaboration
– Upload or link to data
• Catalogue and aggregate
experimental outputs in one place
– Regardless of physical location
– Organised as Investigation-Study-Assay/Analysis
– Standards-compliant
– Shared metadata
• Linked with other systems
– Project on-site (secure) repositories
– Public deposition archives (PRIDE, BioModels, ICE,
etc.)
– Integration with JWS Online modelling tools
16. Front End Hub common space, one place
to organise and report your assets
.org
Nucl. Acids Res. (2016) doi: 10.1093/nar/gkw1032
70+ Projects
30+ Installations
Public & cloud
Subject and Datatype archives
17. Set up to suit your project
.org
Local retention
In flight management,
Private sharing
Customisation
Centres, large projects
National projects
Local skills for admin support
Post-project retention
One stop showcase
Self-managed sharing
Supplementary materials
Off-the-shelf features
Hosted on behalf of users
Delegated admin support
• Trusted
repository
• Guaranteed
until 2029
• Long term
maintenance
• Sustainability
• 1TB per
project stored
centrally.
• Much more
catalogued.
21. Store & Catalogue aggregated across repositories.
Retain context to support decision making and reuse
In House Stores
External Databases
Publishing services
Secure Stores
Model Resources
Your Onsite Store
Institutional Repository
23. Snapshots and Publishing
Work with JWS Online and SED-ML database partners (Snoep and
Waltemath Groups)
• One-click, live figure reproduction using the Hub
• FEBSJ, IET Systems Biology, Metabolomics, and Microbiology
• Molecular Systems Biology in 2016
• Technical model curation service
Author List: Joe Bloggs; Jane Doe
Title: My Investigation
Date: September 2016
DOI: https://doi.org/10.15490/seek##
https://doi.org/10.15490/seek.1.investigation.56
26. Examples: ERASysAPP project
IMOMESIC: Integrating Modelling of Metabolism and Signalling towards an
Application in Liver Cancer https://fairdomhub.org/projects/24
[Adapted from Ursula Klingmüller, Martin Böhm]
Excemplify
Antibody
Database
27.
Programme: overarching research theme (The Digital Salmon)
Project: research grant (DigiSal, GenoSysFat)
Investigation: a particular biological process, phenomenon or thing
(typically corresponds to [plans for] one or more closely related papers)
Study: experiment whose design reflects a specific biological research question
Assay: standardized measurement or diagnostic experiment using a specific protocol
(applied to material from a study)
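The Programme, Project, Investigation, Study, Assay hierarchy above maps naturally onto nested records. A minimal sketch using plain dataclasses; this is not the SEEK data model or API, and the investigation, study, assay and protocol titles are hypothetical examples:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    protocol: str  # the specific protocol applied to material from a study


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


@dataclass
class Project:
    title: str
    investigations: List[Investigation] = field(default_factory=list)


@dataclass
class Programme:
    title: str
    projects: List[Project] = field(default_factory=list)


def count_assays(programme: Programme) -> int:
    """Walk the hierarchy and count every assay below the programme."""
    return sum(
        len(study.assays)
        for project in programme.projects
        for investigation in project.investigations
        for study in investigation.studies
    )


programme = Programme("The Digital Salmon", [
    Project("DigiSal", [
        Investigation("Example investigation (hypothetical)", [
            Study("Example feeding trial (hypothetical)", [
                Assay("Transcript profiling", protocol="example-sop-1"),
                Assay("Lipid measurement", protocol="example-sop-2"),
            ]),
        ]),
    ]),
])
print(count_assays(programme))
```

Because every assay is reachable from its study, investigation and project, the experimental context travels with the data.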
Jon Olav Vik,
Norwegian University of Life Sciences
Integration with Norway’s national
e-infrastructure for Life Science (NeLS)
33. Standard. Pay your way community
activities. DIY local installation. On your
own curation and sustainability.
FAIRDOMHub
Premium. Direct support of projects.
In-house installation support. Full customer
service. Training.
Super-Premium. Extensive tailoring,
integrations and adaptations of platforms.
Custom and dedicated services.
In house installation support.
Project Support Services
Training – Consultancy – Installation – Customisation
per project negotiation
Cost
• local storage and
servers
• Licenses
• Training budgets
~5-10% of total proposal
budget
20-40 days
consultancy/annum
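The rule-of-thumb figures above (roughly 5-10% of the total proposal budget, 20-40 consultancy days per year) translate into a budget range as follows. A minimal sketch; the total budget and project length in the example are hypothetical:

```python
def dm_budget_range(total_budget: float, low_pct: int = 5, high_pct: int = 10):
    """Data management share of the proposal budget, as (low, high)."""
    return total_budget * low_pct / 100, total_budget * high_pct / 100


def consultancy_days(years: int, low_per_year: int = 20, high_per_year: int = 40):
    """Total consultancy days over the project, as (low, high)."""
    return years * low_per_year, years * high_per_year


# Hypothetical proposal: 1,000,000 total budget over 3 years.
low, high = dm_budget_range(1_000_000)
days_low, days_high = consultancy_days(3)
print(f"data management budget: {low:,.0f} to {high:,.0f}")
print(f"consultancy: {days_low} to {days_high} days")
```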
34. Support Service
Phases: Pre-Project, Start-up, In flight, Post-Project
Data Management Planning
Setting up Data Management Plan, Induction
Running Data Management Plan
Support at different levels
Wrap-up and transfer planning
Publishing
Project PALs
• advocates
• champions
• focus group
22 PALs
77 project visits
ERASysAPP
35. FAIRDOM Consortium
FAIRDOM Association
• Legal entity
• German
• Subcontract status, FEC
• Delivery will be through a
combination of preferred or
designated FAIRDOM facilities
• Contribution to the core built in
FAIRDOM Facility
• Institutional entity
• National identity
• Partner/Co-investigator status
• Delivery through that FAIRDOM
Facility
• Contribution to the core by
arrangement
Funded by
• Core grant awards
• Auxiliary grant awards
• Contributions
Manchester
Edinburgh
HITS
Leiden
ETHZ/UZH
ELIXIR Norway
NMBU
ISBE.si
National Institute of Biology
Association e.V.
36. ERACoBioTech Consortium Arrangements
The Rules
• 21 national funders, each with
their own regulations
• Consortia
• 3-6 partners
• 3-8 partners if they include partners from AR, ES, IL,
LV, PT, RO, RU, SI, TR
• 3 different countries
• up to 2 partners from same
country
• Funder principles
• Subcontractors can be included and
are managed under the national or
regional financing regulations of the
eligible participant
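The consortium rules above can be checked mechanically before a proposal is submitted. A minimal sketch, assuming that "3 different countries" means at least three and that one partner from a listed country is enough to raise the limit to 8; both readings are assumptions, and the call text and each national funder's regulations remain authoritative:

```python
from collections import Counter

# Countries whose participation raises the partner limit from 6 to 8.
EXTENDED_LIMIT_COUNTRIES = {"AR", "ES", "IL", "LV", "PT", "RO", "RU", "SI", "TR"}


def check_consortium(partner_countries):
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    counts = Counter(partner_countries)
    n_partners = len(partner_countries)
    max_partners = 8 if EXTENDED_LIMIT_COUNTRIES & set(counts) else 6

    if n_partners < 3:
        problems.append(f"only {n_partners} partners, at least 3 required")
    if n_partners > max_partners:
        problems.append(f"{n_partners} partners, at most {max_partners} allowed")
    if len(counts) < 3:
        problems.append(f"only {len(counts)} countries, at least 3 required")
    for country, count in sorted(counts.items()):
        if count > 2:
            problems.append(f"{count} partners from {country}, at most 2 allowed")
    return problems


print(check_consortium(["DE", "GB", "NO"]))
print(check_consortium(["DE", "DE", "DE", "GB"]))
print(check_consortium(["DE", "GB", "NO", "NL", "FR", "CH", "ES"]))
```

One country code per partner is enough input here; subcontractors are not counted as partners in this sketch.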
Country: Organisation
Romania: UEFISCDI
Argentina: MINCyT
Switzerland: CTI
Spain: MINECO
United Kingdom: BBSRC
Germany: FNR
Belgium: EC
United Kingdom: FAIR-DOM
Germany: JUELICH
Germany: SMWK
Turkey: TÜBITAK
Italy: MIUR
France: ANR
Slovenia: MIZS
Latvia: VIAA
Poland: NCBR
Estonia: ETAG
Netherlands: NWO
Portugal: FCT
Belgium: SPW - DGO
Norway: RCN
Spain: CDTI
United Kingdom: CommBeBiz
Israel: CSO-MoH
openBIS is a data management platform developed by ID-SIS at ETH
Under active development since 2007
Originally developed for management of life science data within SystemsX projects
Generic underlying structure makes it amenable to be used in other disciplines
Currently used in several labs and facilities at ETH, in Switzerland, Europe and USA
SEEK: under active development since 2008 by the Heidelberg Institute for Theoretical Studies, DE and the University of Manchester, UK
Sustainability:
Local
FAIRDOMHub
Community archives
Can start off and migrate
Trusted long-term repository
Repository space during and after project
Project controlled spaces
Working space for projects
Show space for communicating results
Collaboration space for partners
Supp. materials space for publications
Portal to project on-site repositories
Portal to modelling tools + public archives
Organise, find and share all experimental outputs in one place
Organise across on-site, internal, secure and public stores all from one place
Setup on-site or in the cloud
Use national or institutional data storage infrastructure
Use our managed central Hub to upload, to organise, to catalogue and to safely save for the long-term
All this metadata is machine processable
Catalogue spanning repositories: keeping context respects project data solutions and reuses public content in a structured way
SEEK aggregates as well as stores, so encourages domain specific publishing too
Data Management Planning
Tailored Data Management design
Tailored metadata structures and pipelines
Tailored platform install
Tailored showcase and exchange
Requirements priority
Help in DM problem solving
Help in linking data to analytics
Help in compliance
Help during project movements and staff changes
Help at project sunset time
Help for reproducible publication
Build a PALs network
Tailored Training, Workshops, Site Visits
Curation support