Elly Dijk & Peter Doorn present the DANS approach to FAIR metrics
Workshop title: Open Science Monitor
Workshop overview:
What are the measurable components of Open Science? How do we build a trustworthy, global open science monitor? This workshop will discuss a potential framework for measuring Open Science, covering the path from the publication of an open policy (registries of policies and how these are represented or machine-read), through the use of open methodologies, to the opening up of research results and their recording and measurement.
DAY 2 - PARALLEL SESSION 5
OSFair2017 workshop | Monitoring the FAIRness of data sets - Introducing the DANS approach to FAIR metrics
1. Monitoring the FAIRness of
Data Sets - Introducing the
DANS Approach to FAIR Metrics
Elly Dijk & Peter Doorn, DANS
With thanks to Emily Thomas for some slides
www.dans.knaw.nl
Open Science Monitor – A workshop organised by OpenAIRE,
EOSCpilot, Jisc, and DANS at the OPEN SCIENCE FAIR, Athens,
7 Sept 2017
2. What is FAIR data and why is it
important for Open Science?
• Open Science has become a policy priority: not only
publications need to be openly accessible, but also other
research outputs, especially research data
• Principle: open if possible, protected if necessary
• Legitimate restrictions to openness:
• to protect privacy
• data creator must be able to publish first
• Openness itself is not enough, data must also be:
• Findable F
• Accessible A
• Interoperable I
• Reusable R
3. DANS is about keeping data FAIR
• Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
• First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive since 1989
• Mission: to promote and provide permanent access to digital research resources
4. FAIR guiding principles
1. Findable – Easy to find by both humans and computer systems and based
on mandatory description of the metadata that allow the discovery
of interesting datasets;
2. Accessible – Stored for long term such that they can be easily accessed
and/or downloaded with well-defined license and access conditions (Open
Access when possible), whether at the level of metadata, or at the level of
the actual data content;
3. Interoperable – Ready to be combined with other datasets by humans as
well as computer systems;
4. Reusable – Ready to be used for future research and to be processed
further using computational methods.
Source:
7. Our struggle with FAIR operationalization
To be Findable:
• F1. (meta)data are assigned a globally unique and eternally persistent identifier.
• F2. data are described with rich metadata.
• F3. (meta)data are registered or indexed in a searchable resource.
• F4. metadata specify the data identifier.
To be Accessible:
• A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
• A1.1 the protocol is open, free, and universally implementable.
• A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
• A2. metadata are accessible, even when the data are no longer available.
Annotations on the slide:
• No rankings are implied among the principles
• What is the relation between F2 and F4? Between A1 and F2, F3, F4?
• A2: what if the data never were available?
• Do access licenses belong under Reusable?
• The Accessible criteria relate mainly to machine access
Help-page: “The FAIR Data Principles explained”
8. Difficulties in FAIR operationalization II
To be Interoperable:
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles.
• I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
• R1. meta(data) have a plurality of accurate and relevant attributes.
• R1.1. (meta)data are released with a clear and accessible data usage license.
• R1.2. (meta)data are associated with their provenance.
• R1.3. (meta)data meet domain-relevant community standards.
Annotations on the slide:
• I2: Droste effect, self-reference (FAIR vocabularies judged by FAIR principles)
• I3: qualified references to which other (meta)data?
• Is R1 not about Findable again?
• Are licenses (R1.1) not about access conditions? What if a license says “not accessible”?
• R1.2: “their” provenance? Whose?
• Is R1.3 not about Findable again?
For FAIR Help see: https://www.dtls.nl/fair-data/fair-principles-explained/
9. Our evaluation of the original
FAIR principles:
1. Dependencies among dimensions, difficulty to understand and measure the
criteria, no rank order from “low” to “high” FAIRness, grouping of criteria
under dimensions is disputable
2. Do we need or want additional dimensions, principles or criteria?
• Is “openness” a separate dimension, not included in FAIR?
• Is it desirable/possible to say something about “substantive” data quality, such as the
accuracy/precision or correctness of the data?
• What about the long-term access? For how long does data remain FAIR?
• Should data security be included?
3. Several FAIR criteria can be solved at the level of the repository
4. Do we need separate FAIR criteria for different disciplines?
• e.g. machine-actionable data are more important in some fields than in others; note that data accessibility by machines is partly defined by technical specs (A1), partly by licenses (R1.1)
Therefore, DANS started work on a FAIR “light” implementation in the context
of data repositories compliant with the Data Seal of Approval
10. GO-FAIR Initiative:
FAIR Metrics Group
• Aim: to define metrics enabling both qualitative and quantitative
assessment of the degree to which online resources comply with
the 15 Principles of FAIR Data
• Founding Members:
• Mark Wilkinson, Universidad Politécnica de Madrid
• Susanna Sansone, University of Oxford
• Michel Dumontier, Maastricht University
• Peter Doorn, DANS
• Luiz Olavo Bonino, VU/DTL
• Erik Schultes, DTL
• https://www.dtls.nl/fair-data/fair-metrics-group/ and
http://fairmetrics.org/
11. FAIR Metrics Framework
www.eoscpilot.eu
The European Open Science Cloud for Research pilot project is funded by the
European Commission, DG Research & Innovation under contract no. 739563
12. FAIR Metrics Form
• Metric Identifier
• Metric Name
• To which principle does it apply?
• What is being measured?
• Why should we measure it?
• What must be provided?
• How do we measure it?
• What is a valid result?
• For which digital resource(s) is this relevant?
• Examples of their application across types of digital resource
• Comment
• GOALS:
• Develop a broadly useable framework
and tool for FAIR assessment
• Harmonize current ongoing efforts
• Expected outcomes:
• Checklist of clearly defined metrics
• Indication of how these metrics can be readily implemented (e.g. self-reporting, automated)
• Proposal(s) for undertaking a FAIR
assessment of a digital resource
• One or more implementations for
performing an assessment
Can we do it?
FAIR Metrics Group
13. The DANS approach to FAIR
Data Seal of Approval principles (for data repositories, 2006) mapped to FAIR principles (for data sets, 2014):
• data can be found on the internet → Findable
• data are accessible → Accessible
• data are in a usable format → Interoperable
• data are reliable → Reusable
• data can be referred to (citable) → (no direct FAIR counterpart)
The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• reliability (DSA) is a precondition for reusability (FAIR)
• FAIR explicitly addresses machine readability
A DSA-certified repository already guarantees a baseline data quality level
14. All data sets in a Trusted Repository are FAIR, but some are more FAIR than others
15. Findable (defined by metadata (PID included) and documentation)
1. No PID nor metadata/documentation
2. PID without or with insufficient metadata
3. Sufficient/limited metadata without PID
4. PID with sufficient metadata
5. Extensive metadata and rich additional documentation available
Accessible (defined by presence of user license)
1. Neither metadata nor data are accessible
2. Metadata are accessible but data is not accessible (no clear terms of reuse in license)
3. User restrictions apply (i.e. privacy, commercial interests, embargo period)
4. Public access (after registration)
5. Open access unrestricted
Interoperable (defined by data format)
1. Proprietary (privately owned), non-open format data
2. Proprietary format, accepted by Certified Trustworthy Data Repository
3. Non-proprietary, open format = ‘preferred format’
4. In addition to the preferred format, data are standardised using a standard vocabulary (for the research field to which the data pertain)
5. Data additionally linked to other data to provide context
FAIR “Light” assessment
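The three five-level scales above can be encoded as a simple lookup table. The sketch below is a minimal illustration in Python, assuming hypothetical names (`FAIR_LIGHT_SCALES`, `describe`) that are not part of any actual DANS tool; the level texts are condensed from the slide.

```python
# Hypothetical encoding of the DANS FAIR "light" scales (levels 1-5 per dimension).
FAIR_LIGHT_SCALES = {
    "F": {  # Findable: metadata (incl. PID) and documentation
        1: "no PID nor metadata/documentation",
        2: "PID without or with insufficient metadata",
        3: "sufficient/limited metadata without PID",
        4: "PID with sufficient metadata",
        5: "extensive metadata and rich additional documentation",
    },
    "A": {  # Accessible: presence of a user license
        1: "neither metadata nor data are accessible",
        2: "metadata accessible, data not (no clear terms of reuse)",
        3: "user restrictions apply (privacy, commercial interests, embargo)",
        4: "public access after registration",
        5: "open access, unrestricted",
    },
    "I": {  # Interoperable: data format
        1: "proprietary, non-open format",
        2: "proprietary format accepted by a certified trustworthy repository",
        3: "non-proprietary, open ('preferred') format",
        4: "preferred format plus a standard vocabulary for the field",
        5: "additionally linked to other data to provide context",
    },
}

def describe(dimension: str, level: int) -> str:
    """Return the slide's description for a dimension/level pair."""
    return FAIR_LIGHT_SCALES[dimension][level]
```

For example, `describe("F", 4)` returns the Findable level-4 text, "PID with sufficient metadata".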
16. Towards a FAIR Framework?
Analogous to Certification Framework?
[Diagram: layered certification framework with Core, Extended, and Formal levels]
All noses in the same direction?
17. DANS FAIR assessment
profile
• Developing the data assessment
tool: FAIRdat
• We want to create a badge system
using the FAIR principles to assess
data sets in a TDR
• Operationalise the original principles so that dimensions do not interact, to ease scoring
• Consider Reusability as the
resultant of the other three:
• the average FAIRness as an
indicator of data quality
• (F + A + I) / 3 = R
• Manual and automatic scoring
• New principles still in progress
[FAIRdat badge mock-up: F, A, I, R scores; 2 user reviews; 1 archivist assessment; 24 downloads]
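The slide's scoring idea, Reusability as the average of the other three dimensions ((F + A + I) / 3 = R), can be sketched as follows. This is a minimal illustration; the function name is hypothetical and FAIRdat itself may compute the score differently.

```python
def reusability_score(f: int, a: int, i: int) -> float:
    """Average FAIRness over F, A, I (each on the 1-5 scale), taken as R."""
    for name, level in (("F", f), ("A", a), ("I", i)):
        if not 1 <= level <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {level}")
    return (f + a + i) / 3

# Example: PID with sufficient metadata (F=4), open unrestricted access (A=5),
# preferred open format (I=3)
print(reusability_score(4, 5, 3))  # → 4.0
```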
18. Ideas for our operationalization → Solution
• R1. (meta)data are richly described with a plurality of accurate and relevant attributes ≈ F2 (“data are described with rich metadata”) → F
• R1.1. (meta)data are released with a clear and accessible data usage license → A (aren’t licenses about access conditions?)
• R1.2. (meta)data are associated with detailed provenance → F (appears as a criterion for Findable metadata)
• R1.3. (meta)data meet domain-relevant community standards → I (≈ standardized vocabularies)
Attempt to operationalize ‘R’: the proposed solution folds each R criterion into F, A, or I.
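The proposed remapping of the R criteria onto F, A, and I can be written down as a small lookup. The dict below merely encodes the table on this slide; the name `R_REMAPPING` is illustrative, not part of any DANS tool.

```python
# Hypothetical encoding of the slide's proposed R-criteria remapping.
R_REMAPPING = {
    "R1":   "F",  # rich, relevant attributes ≈ F2 (rich metadata)
    "R1.1": "A",  # usage licenses read as access conditions
    "R1.2": "F",  # provenance treated as Findable metadata
    "R1.3": "I",  # community standards ≈ standardized vocabularies
}
```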
22. …FAIR database
• Identifier
• Name of Data Set
• Depositor / “owner”
• Repository
• Discipline
• …
• FAIR scores per item
• Total FAIR score
• Badge
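A record in the proposed FAIR database could take roughly the shape below. This is a sketch only: the field names follow the slide's list, but the `FairRecord` dataclass and the placeholder identifier are hypothetical illustrations, not a published schema.

```python
from dataclasses import dataclass, field

@dataclass
class FairRecord:
    """Hypothetical record shape for the proposed FAIR database."""
    identifier: str   # e.g. a DOI or other persistent identifier
    name: str         # name of the data set
    depositor: str    # depositor / "owner"
    repository: str
    discipline: str
    scores: dict = field(default_factory=dict)  # FAIR scores per item, e.g. {"F": 4, "A": 5, "I": 3}

    @property
    def total_score(self) -> float:
        """Total FAIR score as the mean of the per-dimension scores."""
        return sum(self.scores.values()) / len(self.scores)
```

For instance, a record with scores `{"F": 4, "A": 5, "I": 3}` would have a total FAIR score of 4.0, which could then back a displayed badge.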
23. Display FAIR badges in any repository (Zenodo,
Dataverse, Mendeley Data, figshare, B2SAFE, …)
24. Monitoring Open Science:
research data
•How can we use this tool for Monitoring Open Science?
• Do you think people are willing to assess research data in
repositories?
• And: Who will do the assessment? (depositors, data
experts at repositories, users)
• Using the FAIR metrics
• in own repository – harvested by OpenAIRE
• in FAIR database – harvested by OpenAIRE
25. Thank you for your attention!
• You’re invited to visit the DANS/EOSCpilot workshop and try FAIRdat
• “FAIR metrics: starring your datasets”
• Friday 8 September at 11:30h
• Book Castle Ground Floor
• elly.dijk@dans.knaw.nl
• peter.doorn@dans.knaw.nl
• www.dans.knaw.nl
• www.eoscpilot.eu
• www.openaire.eu