Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions keep most publicly funded, seemingly “open” biomedical data from being used to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those who aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions. This is complicated by the fact that multiple legal entities may be involved on each side of the agreement, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, determining legal compliance with source data licenses and use agreements is difficult, because many types of research groups and users manipulate, share, and redistribute data in varied and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, in which we have created a five-part rubric to evaluate biomedical data sources and their licensing information and determine the degree to which they permit unnegotiated and unrestricted reuse and redistribution. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, receiving 2.5 stars or fewer. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project also aims to encourage data providers to adopt standardized licensing practices that ease reuse.
Reusable data for biomedicine: A data licensing odyssey
Melissa Haendel
Seth Carbon, Julie McMurry, Robin Champieux,
Letisha Wyatt, Lilly Winfree
RDA, September 20th, 2017
@ontowonka #reusabledata
Image: i.pinimg.com
2. THERE ARE >1500 PUBLIC BIOMEDICAL
DATABASES IN THE NUCLEIC ACIDS
RESEARCH DATABASE COLLECTION
https://doi.org/10.1093/nar/gkw1188
3. HOW MANY OF THESE DATA ARE TRULY REUSABLE?
OPENNESS IS AN NAR
REQUIREMENT, BUT …
4. MONARCH & THE NCATS BIOMEDICAL DATA TRANSLATOR
www.ncats.nih.gov/translator
www.monarchinitiative.org
@monarchinit
7. REUSABLEDATA.ORG
Curate, evaluate, and provide guidance on
legal and effective data reuse and
redistribution.
Wanna help? Join us:
bit.ly/reusabledata-forum
github.com/reusabledata
9. CRITERION A:
CLARITY
38% RECEIVE FULL STAR (9/24)
Non-standard license (10/24)
Multiple licenses (3/24)
Missing license (2/24)
10. CRITERION B:
COMPREHENSIVE & FRICTIONLESS
58% RECEIVE FULL STAR (14/24)
Reuse terms not clear (5/24)
Doesn't apply to all data (4/24)
Can’t obtain singly licensed slice (2/24)
Auto-fail due to missing/multiple license (3/24)
11. CRITERION C:
DATA IS ACCESSIBLE
92% RECEIVE FULL STAR (22/24)
No “reasonable good-faith location” or single action (2/24)
12. CRITERION D:
FEW RESTRICTIONS ON TYPES OF REUSE
29% RECEIVE FULL STAR (7/24)
Restrictive but allows academic use (2/24)
Restrictive, no academic provisions (12/24)
13. CRITERION E:
FEW RESTRICTIONS ON TYPES OF USER
32% RECEIVE FULL STAR (7/24)
Restrictive but allows academic use (2/24)
Restrictive, no academic provisions (10/24)
Auto-fail due to missing/multiple license (3/24)
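The percentages on the criterion slides follow from the full-star counts over the 24 evaluated sources. As a quick check, the minimal Python sketch below recomputes them for criteria A through D, rounding halves up (37.5% becomes 38%). The names `FULL_STAR_COUNTS`, `TOTAL_SOURCES`, and `percent_half_up` are hypothetical, introduced only for illustration.

```python
# Hypothetical tally: full-star counts per criterion, taken from slides A-D,
# out of the 24 data sources evaluated.
FULL_STAR_COUNTS = {
    "A: Clarity": 9,
    "B: Comprehensive & frictionless": 14,
    "C: Data is accessible": 22,
    "D: Few restrictions on types of reuse": 7,
}
TOTAL_SOURCES = 24


def percent_half_up(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a whole-number percentage, rounding halves up.

    Computes floor(100 * n / d + 0.5) with integers only, which avoids
    float round-off and Python's round-half-to-even behaviour.
    """
    return (200 * numerator + denominator) // (2 * denominator)


if __name__ == "__main__":
    for criterion, count in FULL_STAR_COUNTS.items():
        pct = percent_half_up(count, TOTAL_SOURCES)
        print(f"{criterion}: {count}/{TOTAL_SOURCES} = {pct}%")
```

Running it prints one line per criterion, for example `A: Clarity: 9/24 = 38%`. Integer arithmetic is used instead of `round()` because Python's `round()` rounds exact halves to the nearest even number.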
18. THANKS TO:
SETH CARBON
JULIE MCMURRY
ROBIN CHAMPIEUX
LETISHA WYATT
LILLY WINFREE
ANDREW SU
CASEY GREENE
JOHN WILBANKS
SEAN MCDONALD
CHRIS AUSTIN
NOEL SOUTHALL
CHRISTINE COLVIS
Clearly stated
A clearly stated, unambiguous, and ideally standard license for data use is critical for any (re)use of data. If no license can be found, the rights are unclear and one must assume the default: all rights reserved.
Comprehensive and non-negotiated
Data that is mixed under different licenses, only partially available, or subject to some form of negotiation creates barriers to its (re)use.
Accessible
Data must be accessible in a reasonable manner to be useful to the broader community.
Avoid restrictions on kinds of (re)use
It should be possible to copy, build upon, edit, and modify data as freely as possible.
Avoid restrictions on who may (re)use
Data should be available to as many people as possible for their (re)use.
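The five criteria can be read as a simple scoring procedure. The Python sketch below is one possible encoding, not the project's own implementation. It assumes each criterion earns 0, 0.5, or 1 star (the 2.5-star cutoff in the abstract implies half stars exist, but per-criterion half-star granularity is an assumption). It forces criteria B and E to zero when a license is missing or multiple, following the auto-fail rows on the B and E slides, and treats a total of 2.5 stars or fewer as ranking poorly, as in the abstract. `LicenseStatus`, `RubricScores`, `ALLOWED_SCORES`, and `ranks_poorly` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class LicenseStatus(Enum):
    """Hypothetical summary of what a source's license information looks like."""
    STANDARD = "standard"
    NON_STANDARD = "non-standard"
    MULTIPLE = "multiple"
    MISSING = "missing"


# Assumption: each criterion is scored in half-star steps.
ALLOWED_SCORES = {0.0, 0.5, 1.0}


@dataclass
class RubricScores:
    """Hypothetical per-criterion star scores (A-E) for one data source."""
    license_status: LicenseStatus
    a_clarity: float
    b_comprehensive: float
    c_accessible: float
    d_reuse_types: float
    e_user_types: float

    def total(self) -> float:
        """Sum the five criteria (0 to 5 stars), applying the B/E auto-fail rule."""
        scores = [
            self.a_clarity,
            self.b_comprehensive,
            self.c_accessible,
            self.d_reuse_types,
            self.e_user_types,
        ]
        for score in scores:
            if score not in ALLOWED_SCORES:
                raise ValueError(f"criterion score {score} is not 0, 0.5, or 1")
        if self.license_status in (LicenseStatus.MISSING, LicenseStatus.MULTIPLE):
            # Missing or multiple licenses auto-fail criteria B and E.
            scores[1] = 0.0
            scores[4] = 0.0
        return sum(scores)


def ranks_poorly(source: RubricScores, cutoff: float = 2.5) -> bool:
    """Return True if the source earns the cutoff (2.5 stars) or fewer."""
    return source.total() <= cutoff


if __name__ == "__main__":
    missing_license = RubricScores(LicenseStatus.MISSING, 0.0, 1.0, 1.0, 0.5, 1.0)
    print(missing_license.total(), ranks_poorly(missing_license))
```

For example, a source with a missing license scored 0, 1, 1, 0.5, and 1 on criteria A to E totals 1.5 stars once the auto-fail zeroes B and E, so `ranks_poorly` returns `True`. Keeping the auto-fail inside `total()` means a caller cannot accidentally award B or E stars to a source whose licensing is missing or ambiguous.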
Note: none got a perfect score on licensing, and the one that won was neither project 1 nor project 2 and got only XXX on licensing.