FAIRy Stories

FAIRy stories
for Christmas
Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
ELIXIR-UK, FAIRDOM, ISBE,
BioExcel CoE, Software Sustainability Institute
Open PHACTS
SWAT4HCLS 2017, 5th Dec 2017, Rome

Once upon a time in
a land far, far away
lived a King …
Who wanted all data
to be FAIR….

Mark D. Wilkinson,
Michel Dumontier,
IJsbrand Jan Aalbersberg,
Gabrielle Appleton,
Myles Axton,
Arie Baak,
Niklas Blomberg,
Jan-Willem Boiten,
Luiz Bonino da Silva Santos,
Philip E. Bourne,
Jildau Bouwman,
Anthony J. Brookes,
Tim Clark,
Mercè Crosas,
Ingrid Dillo,
Olivier Dumon,
Scott Edmunds,
Chris T. Evelo,
Richard Finkers,
Alejandra Gonzalez-Beltran,
Alasdair J.G. Gray,
Paul Groth,
Carole Goble,
Jeffrey S. Grethe,
Jaap Heringa,
Peter A.C. ’t Hoen,
Rob Hooft,
Tobias Kuhn,
Ruben Kok,
Joost Kok,
Scott J. Lusher,
Maryann E. Martone,
Albert Mons,
Abel L. Packer,
Bengt Persson,
Philippe Rocca-Serra,
Marco Roos,
Rene van Schaik,
Susanna-Assunta Sansone,
Erik Schultes,
Thierry Sengstag,
Ted Slater,
George Strawn,
Morris A. Swertz,
Mark Thompson,
Johan van der Lei,
Erik van Mulligen,
Jan Velterop,
Andra Waagmeester,
Peter Wittenburg,
Katherine Wolstencroft,
Jun Zhao,
Barend Mons
Wilkinson Dumontier Schultes
Scientific Data 3, 160018 (2016)
doi:10.1038/sdata.2016.18

Queens…
And FAIRY GODMOTHERS

Machine Processable Metadata
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing

FAIR spread across the lands ……
VIVO/SciTS Conferences 6-8 August 2014, Austin, TX

FAIR spread across the lands ……

Stakeholder FAIR Awareness
UK Institutional Research Data Management guidance*
* Jisc: Final Report FAIR in Practice, Nov 2017
Government,
Funder,
Publisher,
National &
International
Infrastructures…
Institutional
Researchers
FAIR spread across the lands …… BUT not
necessarily all the peoples

Moral: Names are important
Spinning (metadata) straw
into gold
Be careful what you
promise…

Me Too!
staking claims
we { are | will be | always
have been } FAIR
a rallying flag

http://dx.doi.org/10.1101/225490
http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/
http://fairmetrics.org/

Beware…
beauty is in the
eye of the
beholder
What’s FAIR from a cataloguer’s
perspective may be useless from
a biologist’s viewpoint

My Semantic FAIRy Stories
The Scientist and
the FAIR Commons
The MAGIC
Research Object
little semantics and
the big Web

The Scientists and the
FAIR Research
Commons
Supporting mixed
types and many
researchers
FAIR

The Scientists and the
FAIR Research
Commons
Find:
ID resolution
Faceted Navigation
Search, RDF
SPARQL endpoint, APIs
A Commons for Workflows
myexperiment.org
A Commons for Systems Biology Projects
fairdomhub.org
investigation
study
assay/analysis
data
models
SOPs

Community & Project Commons
Structured
organisation
across standards
and types
Federation over
autonomous
resources
Laissez-Faire
Independent
Users
Ecosystem of
types, stores
and metadata

Own little houses: from straw to bricks
Permission controls
Staged sharing
Licenses
Negotiated access
Embargos
Open

Schema
Dublin Core,
DataCite,
DCAT, Bioschemas
Catalogue
Level
Investigation
Studies
Assay/Analysis
Content
level
Persistent Identifiers
Content level
subject thematic standards
Content
level
Stratified
Linked Data

Getting the best FAIR metadata….
FAIR Access
– myExperiment -> open
– FAIRDOM -> friends and family
– Hand over straw houses to FAIRDOMHub
“The Tragedy of the Commons”*
– Metadata quality and quantity
– Identifier hygiene
– Curation & contributions
– Public good vs personal burden
– Incorporation into processes
– Community socialisation - obligations mismatches. Credit!
*Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/

project PIs, funders
time
burden, distrust
project PIs, funders
PALs – juniors, advocates and
Cinderellas
templates, tools
benefit

Bake in
“Semantic Nudging”
Ontologies stealthily embedded
in Excel spreadsheet templates
Added value -
Model execution
Vanity, guilt, shaming
Automation
rightfield.org.uk

“The Last Mile”* -> The First Mile
FAIR from bench to cloud
Last mile - Infrastructure
view
First mile - researcher /
resource view
* Dimitrios Koureas et al., Community engagement: The ‘last mile’ challenge for European research e-infrastructures,
Research Ideas and Outcomes 2: e9933 (20 Jul 2016)
https://doi.org/10.3897/rio.2.e9933

the generic vs specific zig zag path

The MAGIC Research
OBJECT
GENERIC Framework
For exchange,
reproducibility,
Preservation, active
artefacts
Universal Catering,
bottomless content
FAIR

The FAIR Research Object
import, exchange, portability, maintenance
ISA-TAB
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project,
BMC Bioinformatics 2014, 15:369

workflow engine
Workflow Run
Provenance
Inputs Outputs
Intermediates
Parameters
Configs
Narrative
Exchange between people & platforms
Commons store, catalogue & archive
Reproduce preserve, port, repair
Activate re-compute, mix, compare,
evolve
The FAIR Workflow Research Object

researchobject.org
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Standards-based generic
metadata framework for
bundling internal and external
resources with context
citable reproducible packaging
Data used and results produced in study
Methods employed to produce/analyse data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources:-
understanding & interpretation

Linking across ROs and into the
Linked Open Data Cloud
• Recording & linking together the
components of an experiment
• Linking across experiments.
• Linked ROs
• A Semantic Web of Research
Objects
• Resource References – a
bottomless pot

Technology Independent.
The least possible.
The simplest feasible. Low tech.
Low user overhead and thin client
Graceful degradation.
FAIR ROs Desiderata

Construction Content Profile
Types
Identification
to locate things
Aggregates
to link things together
Annotations
about things & their
relationships
Type Checklists
what should be there
Provenance
where it came from
Versioning
its evolution
Dependencies
what else is needed
Manifest checklist
Type Checklists
describing what
should be there
Container
Metadata
Objects

Construction
http://www.researchobject.org/specifications/
RO Model
Identifiers: URI, RRI,
DOI, ORCID
W3C Web
Annotation Vocabulary
Open Archives Initiative
Object Exchange and Reuse
Aggregation
Annotation
Container

Content
Profiles.
Progression Levels
Container

Profile
http://purl.org/minim/description
W3C
Shape Specs
*Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked
Data", IEEE eScience 2012 (Chicago, USA, October 2012), http://dx.doi.org/10.1109/eScience.2012.6404489
validators / viewers
Minim model for
defining
checklists*
multiple profiles for
different consumers
Generic
Specifics
RO-SHOW
Container
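The idea of multiple checklists for different consumers can be sketched as below. This is not the Minim vocabulary itself: the `check_profile` and `satisfies` helpers, the MUST/SHOULD levels as plain dictionary keys, and the two example profiles are all assumptions made for illustration.

```python
def check_profile(manifest, profile):
    """Return, per requirement level, the resource types the profile
    asks for that the manifest does not contain."""
    present = {res.get("type") for res in manifest.get("aggregates", [])}
    return {level: sorted(set(required) - present)
            for level, required in profile.items()}


def satisfies(report):
    """A manifest satisfies a profile when no MUST item is missing."""
    return not report.get("MUST", [])


# Hypothetical manifest and profiles.
manifest = {"aggregates": [
    {"uri": "workflow.cwl", "type": "workflow"},
    {"uri": "inputs.json", "type": "input"},
]}
PROFILES = {
    "rerun": {"MUST": ["workflow", "input"], "SHOULD": ["provenance"]},
    "archive": {"MUST": ["workflow", "input", "output", "provenance"],
                "SHOULD": ["licence"]},
}

for name, profile in PROFILES.items():
    report = check_profile(manifest, profile)
    print(name, "ok" if satisfies(report) else "incomplete", report)
```

The same object passes one checklist and fails another, which is the point of profiles: completeness depends on who is consuming the object and why.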

Linked Data
Pharmacological
Discovery Platform
Data Releases
Dataset “build”
RO Library
Earth Sciences
Public Health Learning Systems
Asthma Research e-Lab
sharing and
computing statistical
cohort studies
Happy Endings!
ISA based Packaging,
Systems Biology commons
& publishing
Managing distributed
unmovable large datasets
for Biomedical HTS
analytic pipelines *
* Chard et al I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets,
https://doi.org/10.1109/BigData.2016.7840618

Happy Ending – Workflows
Biomedical HTS analytic pipelines
Manifest description of
CWL workflows + rich
context + provenance +
other objects + snapshots
Precision medicine
NGS pipelines regulation*
*Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org,
2017, https://doi.org/10.1101/191783
EDAM
Biomolecular modelling
Portable Workflows

BagIt, JSON(-LD),
schema.org
https://dokie.li/
https://linkedresearch.org/
Manifest: Schema.org,
JSON-LD, RDF
Archive: .tar.gz
Reproducible Document
Stack project
eLife, Substance and Stencila
BagIt data profile +
schema.org JSON-LD
annotations
Many Roads
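One of these roads, a BagIt-style package carrying a JSON-LD manifest, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: `make_bag` is a hypothetical helper, the `metadata/manifest.json` location and the example metadata are illustrative choices, and a complete bag would normally also carry `bag-info.txt` and a tag manifest.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload, metadata):
    """Write payload files under data/, a SHA-256 payload manifest,
    the bagit.txt declaration, and a JSON-LD description."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    lines = []
    for name, content in sorted(payload.items()):
        (data / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        lines.append(f"{digest}  data/{name}")  # "<checksum>  <path>"

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8")

    # Assumed location for the descriptive metadata.
    meta_dir = bag / "metadata"
    meta_dir.mkdir(exist_ok=True)
    (meta_dir / "manifest.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8")
    return bag


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(
        Path(tmp) / "example-bag",
        payload={"results.csv": b"sample,value\nA,1\n"},
        metadata={"@context": "https://schema.org",
                  "@type": "Dataset",
                  "name": "Example results"},  # hypothetical description
    )
    for path in sorted(p for p in bag.rglob("*") if p.is_file()):
        print(path.relative_to(bag))
```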

Morals
Incremental, open frameworks are hard work
– Extensive reuse of standards is tricky
– Too Generic vs Too Specific
– Multi-element type & nesting challenges
– ROs with a Purpose
– Examples & templates
Representational Beauty vs Tools
– Easy to make, hard to consume
– Be specific, be developer friendly
– Profiles & tools critical
Patience is a virtue

Bioschemas:
Little Semantics and
the big web
Being and keeping light,
small and viral
FAIR

Structured data markup for web pages
Schema.org adds simple
structured metadata markup to
web pages & sitemaps for
harvesting, search and summary
snippet making.
Search engines often highlight
websites containing Schema.org
Widespread commercial and
open source infrastructure
creates a low barrier to adoption
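As a rough illustration of what such markup looks like, the sketch below builds a schema.org `Dataset` description and wraps it in the JSON-LD script tag that would be embedded in a web page. The helper names and all example values are assumptions for illustration; `name`, `description`, `url`, `identifier` and `keywords` are standard schema.org properties.

```python
import json

PREFIX = '<script type="application/ld+json">\n'
SUFFIX = "\n</script>"


def dataset_jsonld(name, description, url, identifier, keywords):
    """Describe a dataset with a handful of schema.org properties."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Serialise the description for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    # Escape "</" so a value cannot close the script element early;
    # "<\/" is still valid JSON for the same characters.
    return PREFIX + body.replace("</", "<\\/") + SUFFIX


# Hypothetical dataset.
doc = dataset_jsonld(
    name="Example expression dataset",
    description="Illustrative record used to show the markup.",
    url="https://example.org/datasets/1",
    identifier="https://example.org/id/dataset-1",
    keywords=["transcriptomics", "example"],
)
print(as_script_tag(doc))
```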

Goldilocks & the 3 Use Cases
Standardised
metadata
mark-up
Metadata
published &
harvested
without APIs
or special
feeds
3 Use Cases
1. Finding/Citing,
2. Summary snippets
3. Metadata exchange /
ingest
Goldilocks
• Reuse ubiquitous
commercial platform
• The least possible change,
the max possible reuse
• Minimum properties – 6
• Reuse domain ontologies –
we are not reinventing
them!
Commodity
Off the Shelf tools
App eco-system
Repository Level
Content type level
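Harvesting metadata without APIs or special feeds can be sketched as reading the JSON-LD blocks straight out of a page. This is a minimal sketch using only the Python standard library: the `JsonLdHarvester` class, the `harvest` helper and the sample page are hypothetical, and a real harvester would also fetch pages, follow sitemaps, and handle Microdata or RDFa markup.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than fail the harvest


def harvest(html):
    """Return every JSON-LD document embedded in an HTML string."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page carrying one block of markup.
PAGE = """<html><head><title>Course</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event", "name": "Example workshop"}
</script>
<script>console.log("not metadata");</script>
</head><body><p>Welcome</p></body></html>"""

for doc in harvest(PAGE):
    print(doc["@type"], "-", doc["name"])
```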


Training materials
Events
Organizations
Data
Software
Lab Protocols
schema.org tailored to the Biosciences for FAIR
simple structured metadata markup on web pages & sitemaps
bio.tools

schema.org tailored to the Biosciences
simple structured metadata markup on web pages & sitemaps
• Specific for life sciences
• Extends existing Schema.org types
• Focused on few types and well defined relationships
• Minimum properties for finding and accessing data
• Best practices for selected properties
• Managed by Bioschemas.org
• Generic data model
• Generous list of properties to describe data types
• Managed by Schema.org

Tailored schema.org to improve
Findability and Accessibility in Bioscience
Layer of constraints +
documentation + extensions
Leyla Garcia. Poster & Flashtalk

2-3 Oct 2017, Hinxton, ~50 people
Ideally 6 concepts
Reuse ontologies
schema.org
Real mark-up
Tools
Find, Cite, Snippets,
Metadata exchange
Community

http://www.france-bioinformatique.fr/en/training_material
https://search.google.com/structured-data/testing-tool
Applied Drupal 7 schema.org extension
Took about 2 hours
Included in TeSS in an hour
[Niall Beard]

MORALs
Community Buy-in Worth it
• First specs & main mechanism for training
• Google / Schema & ELIXIR support
• Research Schemas for European Open
Science Cloud pilot
Goldilocks works but is hard work
• Types & Profiles debates
• Elegance vs best for tools
• Reuse domain ontologies
• Validation, mark-up & harvesting tools
Trolls

How are we FAIRing?
Different levels with different emphasis
It’s an ecosystem, not a single solution
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing

smart rebrand launch
Still hard, same stuff
Rally big communities
and grassroots initiatives
Examine our capabilities
There is no magic

FAIRy Land PEST
Political
Economic
Social
Technical

Platform & user buy-in from the get-go
Passionate, dedicated leadership
Seeding critical mass
Community
Tools Driver
Bottom up initiatives fostered by big
umbrellas infrastructures
FAIR Semantic Village*
Simple & Lightweight
Ramps not revolutions
FAIR with a PURPOSE & With PEOPLE
FAIR
Support typical developer –
Familiarity – JSON, APIs
*Deb McGuinness

Research for FAIR
FAIR representation
• The Semantic Web
Automated metadata
• Deep learning, machine learning, AI
• Text Mining, Ontology mapping
Social metadata
• User Experience, Crowd Sourcing
• Choice architecture
FAIR action
• Blockchain
• Virtualised & remote execution
• Image processing
• Preservation & portability
• Provenance tracking, object trajectories
• Engineering & Design, Ethics, Social Sciences
Research +
Developer Practitioner
practices

Mark Robinson
Norman Morrison
Paul Groth
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Caroline Jay
David De Roure
Oscar Corcho
Steve Pettifer
Khalid Belhajjame
Jun Zhao
Phil Crouch
Lilian Gorea
Oluwatomide Fasugba
Stian Soiland-Reyes
Michael Crusoe
Rafael Jimenez
Alasdair Gray
Barend Mons
Sean Bechhofer
Michel Dumontier
Mark Wilkinson
Leyla Garcia
Stuart Owen
Katy Wolstencroft
Finn Bacall
Alan Williams
Wolfgang Mueller
Olga Krebs
Jacky Snoep
Matthew Gamble
Raul Palma
Mark Musen
http://www.researchobject.org
http://www.myexperiment.org
http://wf4ever.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://rightfield.org.uk
http://www.bioschemas.org
http://www.commonwl.org
http://www.bioexcel.eu
http://www.openphacts.org
