Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model organisms, and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
2. The Software and Data Licensing Solution
Your Moderator and Panelists
MODERATOR and PANELIST
Arvin Paranjpe, MS, JD, RTTP
Sr. Technology Development Manager
Technology Transfer
Oregon Health & Science University
PANELIST
Melissa Haendel, Ph.D.
Associate Professor
Library & Dept. of Medical Informatics and Clinical Epidemiology
Oregon Health & Science University
PANELIST
Chuck Williams, Ph.D., J.D., C.L.P.
Associate VP for Innovation
University of Oregon
PANELIST
Frank X. Curci, JD
IP Counsel and Partner
Ater Wynne LLP
3. The Software and Data Licensing Solution
You Cannot Share What You Don’t Own Or Have Rights To (Frank X. Curci)
First Presenter: Frank X. Curci
Key IP Issues To Consider When Sharing Data / Data Sets
4. The Software and Data Licensing Solution
You should think about the following when your university considers sharing your Data/Data Sets (and the software that supports & runs that Data) with others:
“You Cannot Share What You Don’t Own Or Have Rights To.”
5. The Software and Data Licensing Solution
• We all know that Data/Data Sets are increasingly critical to R&D
• But all parties may not fully appreciate the nuances of:
– Whether, and to what degree, Data is protectable as an “intellectual property” right (IP or IP Rights)
– Who owns the IP Rights in that Data
– If another party owns the IP Rights in that Data:
• Can that other party try to control your access to/use of that Data?
6. The Software and Data Licensing Solution
• Data/Data Sets can be valuable IP assets for both research
universities and for-profit companies they collaborate with:
– Data/Data Sets could be the “secret ingredient” that’s critical to
fully understanding and implementing the larger discovery
– Analytics-derived data can be particularly important
– Core-underlying data could have enhanced value in an
aggregated form due to your selection criteria & organization
of that data
7. The Software and Data Licensing Solution
• Even if your goal is to allow “open & free” sharing of your Data/Data Sets
– You still need to address the underlying IP Rights / issues in that Data
• But you may still be asking:
– Do you really need to care about these IP Rights if all you want to do is give others in the research community a free right to use your Data/Data Sets?
8. The Software and Data Licensing Solution
ANSWER:
You normally cannot give away (i.e., even openly share) something that you do not first own or have rights in.
9. The Software and Data Licensing Solution
• On the positive side, securing your IP Rights in your Data/Data Sets can actually give you more power / leverage to do what you want with your Data.
– Including any goal/mission to allow open & free access to your Data/Data Sets by others in the research community
• Think of IP Rights as your “tool” to help achieve your goal.
• This “tool” helps you control the “destiny” of your Data, whether your goal is to:
• “Commercialize” your Data, or
• Allow your Data to be openly & freely used by others for more R&D
10. The Software and Data Licensing Solution
• If you need more convincing that you should address IP Rights before sharing Data:
– For-profit companies that collaborate with research universities increasingly view Data as an extremely valuable IP Right which they would like to own and/or control.
11. The Software and Data Licensing Solution
• Before “sharing” any Data/Data Sets, you should first:
– Identify all proprietary Data:
• used in your research
• otherwise involved in any collaboration with others
• otherwise under your control
– Determine if that Data is protectable IP
– Address who owns/controls that Data and each party’s right to access & use that Data
12. The Software and Data Licensing Solution
• Identify all potentially-protectable Data:
– Clinical trial data
– Any other data derived or gathered as part of any research
• Research lab notes
• Data & reports from a confidential pilot project
• Analytics-derived data can be particularly important
• Data from beta tests for software applications
13. The Software and Data Licensing Solution
• How Data is Protectable IP: Trade Secrets
– Trade Secret protection in USA:
• “Trade secret” is confidential / proprietary information of an owner:
• Has commercial value to the owner
• Because that information is not generally available to the public
• Owner takes reasonable measures to keep information secret
– Examples of Data which may be protectable trade secrets:
• Clinical trial data you organize in a special database
• Research-derived knowledge/data (think about your laboratory notes)
• Analytics-derived data from a research project or clinical trial
14. The Software and Data Licensing Solution
• How Data is Protectable IP: Copyrights
– Copyright protection in USA:
• A “work” is protectable under US copyright law if it’s:
– An original “work of authorship”
» i.e., original to that creator/author
– Fixed in a tangible medium of expression
» Not an intangible “idea”
• In other words:
– The original expressions of an author are protected
15. The Software and Data Licensing Solution
• How Data is Protectable IP: Copyrights
– Examples of Data which may be copyright-protectable:
• Lab notes
• Analytics-derived data may have certain copyright protection due to “creativity” in the analytical components
• Databases may have certain copyright protection due to “creativity” in selection criteria & organization of the Data
• Selection criteria & organization of aggregated core-underlying Data could give rise to certain copyright protection
16. The Software and Data Licensing Solution
• IP Ownership & Controlling Access to Data:
– While some are only starting to fully appreciate IP Rights in Data
– For-profit companies, such as those your university collaborates with, already see tremendous IP value in Data which they want to own and/or control.
17. The Software and Data Licensing Solution
• IP Ownership and Controlling Access to Data:
– Don’t underestimate the IP value of Data, because many others, including for-profit companies, are increasingly focusing on:
• Who owns the Data
• Who controls the Data
• Who has the right to use the Data (and under what rules)
18. The Software and Data Licensing Solution
• IP Ownership & Controlling Access to Data:
– For-profit companies are focusing more on ownership of Data and controlling access to Data:
• Increasingly claim they own all IP Rights in the Data
– then try to limit your researchers’ ability to use that Data
– by inserting “rules” about that use
• If your university owns the IP Rights in the Data
– For-profits try to obtain broad access rights to use that Data
» With limited intervention by your university in the company’s use of that Data
– May still try to impose restrictions on your university’s use of its own Data
19. The Software and Data Licensing Solution
• Best Practices when giving your Data to Others:
– Prudent to have a written agreement addressing
• Ownership of Data
• Rights to access and use the Data
– Clearly preserve your institution’s IP ownership in the Data
– You probably only want to give the other party a
non-exclusive right to access/use your Data:
• You do not want to limit your own use of your Data!
20. The Software and Data Licensing Solution
• Best Practices when giving your Data to others:
– Clearly define other party’s right to access/use your Data:
• Will other party get access to all--or just part--of the Data?
• What can the other party do with your Data?
• For how long?
• Does the other party have to return your Data?
21. The Software and Data Licensing Solution
• Best Practices when giving your Data to others:
– You may (again) be asking:
• Why worry about these IP issues when giving “your Data” to another research
university or non-profit research institution?
• What’s wrong if that other university/institution suggests “informality” about how
you share “your Data” with them?
– Think about this:
• Do you have all of the background IP rights and other rights necessary to allow
the other institution to use “your Data”?
• So, you may need to put parameters on the use of “your Data” by the other
institution in order to respect the rights of prior rights-holders who have
contributed to your Data source (“upstream party”).
22. The Software and Data Licensing Solution
• Recommendations when your institution seeks access
to Data from other parties you collaborate with:
– Ask party delivering the Data to clarify the “background IP” rights
in that Data:
• Does delivering party have the authority to give YOU the right to access and
use that Data?
• If not—ask delivering party to get you proper access / usage rights from the
“upstream” parties
• Need to avoid an upstream party trying to deny you the right to use that
Data because delivering party did NOT actually have the proper authority to
share that upstream party’s Data with you.
23. The Software and Data Licensing Solution
• Recommendations when your institution seeks access
to Data from other parties you collaborate with:
– Clarify the scope of your rights to use that Data
• Try to minimize restrictions on your university’s right to use the Data
• Be realistic about your university’s ability to police these restrictions internally
– You want to own IP rights in enhancements you develop while
using that Data, such as:
• Analytics-derived data you develop
• Your selection criteria & organization of the underlying Data
– If the delivering party enhances the Data they previously
delivered to your university:
• Does your university get access rights to use that enhanced Data?
25. We Aren’t The Bad Guys
We all want to be open...
If you build it, they will come...
Information just wants to be free...
Most of what we do as universities is “Open”
26. So how could Open be bad?
Expectations for Attribution, Reputation and Participation
Vary Between Academic Disciplines
28. THE PUBLIC’S TRUST
• Perhaps at an all-time low…
• Distrust of government and science
• High cost of education
29. IT’S ABOUT IMPACT... NOT COMMERCIALIZATION AND PROFIT
Commercialization is one of the channels that can achieve or accelerate impact.
We need to showcase the adoption and use of our work.
Demonstrating the importance of validation and quality control is critical.
30. BSD ATTRIBUTION
Copyright <YEAR> <COPYRIGHT HOLDER>
Redistributions of source code must retain the above copyright notice…….
Redistributions in binary form must reproduce the above copyright notice, …. in the Documentation and/or other materials provided with the distribution.
31. APACHE 2.0 ATTRIBUTION
Copyright [yyyy] [name of copyright owner]
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file….
32. GPL ATTRIBUTION
• One line to give the program's name and a brief idea of what it does.
Copyright (C) <year> <name of author>
• appropriately publish on each copy an appropriate copyright notice
• You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
34. TEACHING (IMPACT)
Sharing of university software often is primarily about teaching what we have done.
Making “Source Available” allows research colleagues to learn and test our work.
Quality control is maintained where use and modification are allowed but distribution remains with the project steward.
Source Available Attribution Language
• You agree to acknowledge the contribution Developers and Software make to your research, and cite appropriate references about the Software in your publications. The current citations for the Software can be found at:
• http:_______________________
35. RESEARCH COMMONS
• A stewardship model that curates software projects.
• Foundation: A unique agreement among member universities. The source code belongs to the Research Commons members and is a collaborative effort among research institutions, a model that promotes shared development and discoveries.
Rosetta Commons
RosettaCommons makes close collaboration between laboratories the norm, even with single code modules. This allows for rapid sharing of enhancements and promotes the values of team science.
36. Academic Commons
• An iconography-based open licensing system that has greater flexibility for academic researchers than CC:
– Attribution:
• Academic Credit Required
• Approve & Update
• No Attribution or reference
– Sharing:
• Research Commons
• Hereditary
• Redistribution allowed
– Use:
• Personal/Lab/Univ
• Cost recovery
• Commercial
• Modification allowed
38. Reusable data for biomedicine:
A data licensing odyssey
Melissa Haendel
Seth Carbon, Julie McMurry, Robin
Champieux, Letisha Wyatt, Lilly Winfree
@ontowonka #reusabledata
Center for Data2Health
39. THERE ARE >1500 PUBLIC BIOMEDICAL DATABASES IN THE NUCLEIC ACIDS RESEARCH DATABASE COLLECTION
https://doi.org/10.1093/nar/gkw1188
40. HOW MANY OF THESE DATA ARE TRULY REUSABLE?
OPENNESS IS AN NAR REQUIREMENT, BUT …
41. MONARCH & THE NCATS BIOMEDICAL DATA TRANSLATOR
www.ncats.nih.gov/translator
www.monarchinitiative.org
44. REUSABLEDATA.ORG
Curate, evaluate, and provide guidance on legal and effective data reuse and redistribution
Wanna help? Join us
bit.ly/reusabledata-forum
github.com/reusabledata
46. CRITERION A: CLARITY
46% RECEIVE FULL STAR (25/54)
Non-standard license: 18/54
Multiple licenses: 4/54
Missing license: 7/54
2018-01-07
47. CRITERION B: COMPREHENSIVE & FRICTIONLESS
59% RECEIVE FULL STAR (32/54)
Reuse terms not clear: 3/54
Doesn't apply to all data: 10/54
Auto-fail due to missing/multiple license: 9/54
48. CRITERION C: DATA IS ACCESSIBLE
87% RECEIVE FULL STAR (47/54)
No “reasonable good-faith location” or single action: 7/54
49. CRITERION D: FEW RESTRICTIONS ON TYPES OF REUSE
35% RECEIVE FULL STAR (19/54)
Restrictive but allows academic use: 7/54
Restrictive, no academic provisions: 19/54
Auto-fail due to missing/multiple license: 9/54
50. CRITERION E: FEW RESTRICTIONS ON TYPES OF USER
35% RECEIVE FULL STAR (19/54)
Restrictive but allows academic use: 11/54
Restrictive, no academic provisions: 15/54
Auto-fail due to missing/multiple license: 9/54
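The full-star percentages quoted on the five criterion slides can be re-derived from the reported counts. The following is a minimal Python sketch: the counts are copied from the slides, while the dictionary and function names are illustrative and not part of the (Re)usable Data Project's own tooling.

```python
# Full-star counts per criterion, as reported on the slides (out of 54 sources).
TOTAL_SOURCES = 54
full_star_counts = {
    "A: clarity": 25,
    "B: comprehensive & frictionless": 32,
    "C: data is accessible": 47,
    "D: few restrictions on types of reuse": 19,
    "E: few restrictions on types of user": 19,
}


def full_star_percent(count: int, total: int = TOTAL_SOURCES) -> int:
    """Return the share of sources with a full star, rounded to a whole percent."""
    return round(100 * count / total)


percentages = {name: full_star_percent(n) for name, n in full_star_counts.items()}

for name, pct in percentages.items():
    print(f"Criterion {name}: {pct}% ({full_star_counts[name]}/{TOTAL_SOURCES})")
```

Rounded to whole percents, these counts reproduce the 46%, 59%, 87%, 35%, and 35% figures shown on the slides.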
54. THANKS TO:
SETH CARBON
JULIE MCMURRY
ROBIN CHAMPIEUX
LETISHA WYATT
LILLY WINFREE
ANDREW SU
CASEY GREENE
JOHN WILBANKS
SEAN MCDONALD
CHRIS AUSTIN
NOEL SOUTHALL
CHRISTINE COLVIS
Center for Data2Health
56. UNIVERSITY DATA LICENSING
• Generally, TTOs do not:
1. Readily share data and related software under
noncommercial licenses,
2. Allow rapid data aggregation,
3. Protect commercial licensing royalty streams,
and
4. Safeguard provenance of research data
57. UNIVERSITY DATA LICENSING
• What’s the root cause?
– Too many data rightsholders (universities)
preventing use of datasets
• Data aggregation is impractical due to multiple license negotiations
and conflicting data sharing restrictions and requirements
58. TRAGEDY OF THE COMMONS
• Overfishing, Atlantic NW Cod
By Epipelagic - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=19281989
59. TRAGEDY OF THE ANTICOMMONS
- The original Robber Barons
- Overprivatization
- Gauntlet of tollbooths
By lo Herodotus – Derivative, Nicolas Lardot - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/wiki/File:Bassin_de_la_Meuse.svg
60. TRAGEDY OF THE ANTICOMMONS
- Too many tolls, too little trade.
By DerHexer; derivate work: Carschten - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=11052758
Château du Haut-Kœnigsbourg, Upper Rhine Plain
61. COMMONS VS. ANTICOMMONS
Common Property
• Overuse
– English Language: ~1620s
Overprivatized Property
• Underuse
– Scrabble Dictionary: 2007
62. FULL SPECTRUM OF PROPERTY
Heller. “The Wealth of the Commons,” http://wealthofthecommons.org/essay/tragedy-anticommons
Commons Anticommons
63. UNIVERSITY DATA LICENSING
CC BY (Attribution): Very permissive
CC BY-SA (Attribution-ShareAlike): Like Copyleft
CC BY-NC (Attribution-NonCommercial): No “commercial use,” but commercial R & D is fine
CC BY-NC-SA (Attribution-NonCommercial-ShareAlike): Copyleft and no commercial use
CC0 (No Rights Reserved): No restrictions
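To see why mixed licenses make aggregation hard, the license summary above can be encoded as flags and combined across sources. This is a rough Python sketch, not legal advice: the `LicenseTerms` class, the `aggregate_obligations` function, and the "needs_review" rule are hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution: bool      # reusers must credit the source
    share_alike: bool      # adaptations must carry the same license
    non_commercial: bool   # commercial use is not permitted


# Flags follow the slide's summary of each license.
CC_LICENSES = {
    "CC0": LicenseTerms(False, False, False),
    "CC BY": LicenseTerms(True, False, False),
    "CC BY-SA": LicenseTerms(True, True, False),
    "CC BY-NC": LicenseTerms(True, False, True),
    "CC BY-NC-SA": LicenseTerms(True, True, True),
}


def aggregate_obligations(source_licenses):
    """Summarize what an aggregator must honor across all source licenses."""
    terms = [CC_LICENSES[name] for name in source_licenses]
    share_alike = {n for n in source_licenses if CC_LICENSES[n].share_alike}
    return {
        "attribution": any(t.attribution for t in terms),
        "non_commercial": any(t.non_commercial for t in terms),
        # Simplifying assumption: two different ShareAlike licenses each demand
        # their own terms for adaptations, so the mix is flagged for review.
        "needs_review": len(share_alike) > 1,
    }


result = aggregate_obligations(["CC0", "CC BY", "CC BY-NC"])
mixed = aggregate_obligations(["CC BY-SA", "CC BY-NC-SA"])
print(result)
print(mixed)
```

In this sketch a single NonCommercial source makes the whole aggregate non-commercial, and combining two different ShareAlike licenses is flagged rather than resolved, which mirrors the negotiation burden described for data integrators.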
64. UNIVERSITY DATA LICENSING
• CC0 – Pros
1. Anything is better than gridlock
2. Relatively quick and easy to implement
3. Digital data in the commons
a. Inexhaustible, cannot be overused
4. Currently supported by the CC Science Commons initiative
5. NIH (tentatively) supports it
65. UNIVERSITY DATA LICENSING
• CC0 – Cons
1. Does not address software licensing
a. Without the related software, the data will often be inaccessible/worthless
b. Bayh-Dole Act implicitly protects software patent rights
2. Loses provenance on research data
3. Abolishes commercial royalty streams on datasets
4. Sustainability issue for data repositories
5. Does not solve the NIH NCBI Database problem of various data rightsholders submitting to those databases (and retaining their copyrights in their submissions)
66. PROPOSED DATA LICENSE
• Universities / non-profit research institutions license to each other under:
– Equivalent to a CC BY-NC-SA v. 4.0 (and limit to such “non-commercial licensees”).
• NOT a Creative Commons license
• de-identified data
• expand non-commercial use to prohibit clinical use
67. DATA & SOFTWARE LICENSING SOLUTION
• Aforementioned uniform data license and…
• Open slot for software license
– GPL v. 3.0, MIT Open Source License, etc. depending on backward compatibility issues
– Whenever possible, use the uniform university software license agreement
• Expandable slot for additional terms
– Like the GPL 3.0 Section 7 on “Additional Terms,” allow custom language on:
» Disclaiming warranties
» Limiting liability
» Declining to grant trademark rights
» Others, TBD
68. DATA & SOFTWARE LICENSING SOLUTION
• Rules of the game
– Distributed aggregate licensors (i.e., various TTOs) authorized to license commercially BUT must share royalties with sources according to certain factors
– Like the UBMTA, institutions enter a “master agreement” that includes the uniform data
license
– The master agreement also includes rules stating:
• Who can claim to lead commercial licensing of a dataset
• How royalties are shared, i.e. which factors determine royalty sharing and how they
are quantified
• NCBI Database data rightsholders (i.e. academic institutions, non-profits and
government labs) hereby license their data rights under the uniform data license
– More plausible than declaring the data CC0 and informing the submitters that
NCBI unilaterally nuked their rights in their data
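The slide leaves the royalty-sharing factors open. Purely as an illustration, the sketch below assumes each source institution is assigned a single numeric weight and that royalties are split pro rata by weight. The function name, the weights, and the institution names are hypothetical; the master agreement would define the real factors and how they are quantified.

```python
def split_royalties(total_royalty, weights):
    """Split total_royalty pro rata by each institution's (hypothetical) weight."""
    if total_royalty < 0:
        raise ValueError("total_royalty must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {
        institution: total_royalty * weight / total_weight
        for institution, weight in weights.items()
    }


# Hypothetical example: University A contributed three times the weight of B.
example_weights = {"University A": 3, "University B": 1}
shares = split_royalties(1000.0, example_weights)
print(shares)
```

A pro rata rule is only one possible design; the point is that whatever factors are chosen, they need to reduce to something computable so that any lead licensor can apply them consistently.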
69. DATA & SOFTWARE LICENSING SOLUTION
• AUTM Data & Software Licensing Committee to finalize and…
– Adopt the master agreement, which includes adopting the:
• uniform data license
• uniform software license
– Promulgate the rules of the game
– Facilitate execution of the master agreement indicating that the institution will implement the uniform data license
71. The Software and Data Licensing Solution
Models To Consider When Sharing Data/Data Sets (Frank X. Curci)
Closing Comments: Frank X. Curci
Models To Consider
When Sharing
Data/Data Sets
72. The Software and Data Licensing Solution
• In view of the IP issues and other critical considerations addressed
today during our panel discussion, the following are some of the
models or approaches that universities should consider when they
want to share their Data/Data Sets (and the underlying software that
supports & runs that Data) with others.
• However, these models/approaches assume you have made sure you
have the appropriate IP rights (as discussed at the beginning of this
presentation) to share / license out your Data to others.
73. The Software and Data Licensing Solution
• #1: UNILATERAL LICENSE OUT FROM YOUR
UNIVERSITY:
– Your university licenses out its Data/Data Sets and underlying
software through your own license that sets your own rules of
use
• possibly royalty free—if you want
– Allen Institute (Seattle) is an example:
• royalty free licenses given in its brain atlas, but subject to their licensing
provisions
74. The Software and Data Licensing Solution
• #2: LICENSE OUT SOME OF YOUR DATA/SOFTWARE BY USING A
THIRD PARTY LICENSING MODEL:
– Could use Creative Commons license
• Sounds attractive—but there are limitations
– Could use GPL or other open source licensing model
• May be too restrictive
– There are pros and cons to relying on the terms & conditions
dictated by a third party license agreement in order to govern the
use and destiny of your Data
75. The Software and Data Licensing Solution
• #3: BILATERAL DATA / SOFTWARE
SHARING AGREEMENT BETWEEN TWO
INSTITUTIONS:
– Two institutions sign a written contract
– Arguably easier to accomplish since only
two institutions.
76. The Software and Data Licensing Solution
• #4: CONSORTIUM BETWEEN MULTIPLE INSTITUTIONS VIA
AGREEMENT:
– Multiple parties sign a written contract
– Need to address background IP/copyrights of the consortium
members and how that is contributed by each member to this
consortium effort
– Also need to address copyright issues of parties who are not in
consortium
• may need to give notice to members that they may need
license from these outsiders
77. The Software and Data Licensing Solution
• #4: CONSORTIUM BETWEEN MULTIPLE INSTITUTIONS VIA
AGREEMENT (continued):
– Need to address the procedures by which the consortium members
develop updates to the Data/Data sets and software
– Need to address the rights of the consortium members to use the
Data/Data sets and software (and updates) for their internal R&D
– Need to address rights of each member (or just a lead member??) of the
consortium to license out the Data/Data sets and software to 3rd parties
and how to share the royalties among the consortium members (possibly
similar to patent pools).
78. The Software and Data Licensing Solution
• #5: CONSORTIUM BECOMES A STAND ALONE
NON-PROFIT CORPORATION / LLC:
– At some point it may make sense to
incorporate this consortium effort into a
standalone nonprofit entity
– Often do this:
• to have a more centralized way of managing the Data/Data Sets and software rights (i.e., a neutral repository)
• for administrative/enforcement purposes
79. The Software and Data Licensing Solution
Question and Answer Period
Doc #3002304
Editor's Notes
Abstract:
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions keep most publicly funded, seemingly “open” biomedical data from reaching its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those who aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse licensing practices by data providers.
Clearly stated
A clearly stated, unambiguous, and hopefully standard, license for data use is critical for any (re)use of data: if there is no license to be found, then rights are unclear and one needs to assume the default: all rights reserved.
Comprehensive and non-negotiated
Data that is mixed under different licenses, only partially available, or must be in some way negotiated creates barriers to the (re)use of data.
Accessible
Data must be accessible in a reasonable manner to be useful to the broader community.
Avoid restrictions on kinds of (re)use
Data should be able to be copied, built upon, edited, and modified as freely as possible.
Avoid restrictions on who may (re)use
Data should be available to as many people as possible for their (re)use.
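The five criteria above feed the zero-to-five star ranking mentioned in the abstract. A minimal Python sketch of that tally follows. It assumes each criterion contributes 0, 0.5, or 1 star (the half-star step is an assumption inferred from the "2.5 stars or less" threshold), and the function names are illustrative rather than the project's actual scoring code.

```python
CRITERIA = ("A", "B", "C", "D", "E")  # the five rubric parts
ALLOWED_SCORES = (0.0, 0.5, 1.0)      # assumption: half-star granularity


def total_stars(scores):
    """Sum the per-criterion stars after checking that all five are present."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        if scores[criterion] not in ALLOWED_SCORES:
            raise ValueError(f"unexpected score for {criterion}: {scores[criterion]}")
    return sum(scores[c] for c in CRITERIA)


def ranks_poorly(scores, threshold=2.5):
    """Apply the abstract's cut-off: 2.5 stars or less counts as ranking poorly."""
    return total_stars(scores) <= threshold


# Hypothetical source: clear license, accessible data, but restrictive reuse terms.
example = {"A": 1.0, "B": 0.5, "C": 1.0, "D": 0.0, "E": 0.0}
print(total_stars(example), ranks_poorly(example))
```

The example source is made up; it only shows how a resource can be clearly licensed and easy to download yet still land at the poorly ranked end because of restrictions on reuse and users.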
Overview, how many at which scores
TTOs are not able to address all four of the above bullets. There are no off the shelf noncommercial licenses that adequately protect our royalty streams, allow for rapid data aggregation between academic institutions, and safeguard the provenance of research data.
In a broad sense, we TTOs are undergoing a coordination breakdown.
The root cause of the coordination breakdown is that there are too many rightsholders.
It is subtle, but the end result is massive underuse of data: commercial R & D that is less efficient and effective. The same goes for university R & D and for clinical informatics.
A coordination breakdown like this is called an anticommons problem. But to unpack how an anticommons works, let’s start with the commons problem.
In 1992, Canada declared a moratorium on the Northern Cod fishery, which for the preceding 500 years had largely shaped the lives and communities of Canada's eastern coast.
It has been over 20 years since the moratorium on fishing Atlantic cod in eastern Canada, but the fish stocks have not replenished. Or in other words, there are no fish.
Commercial overfishing is why there are Somali pirates on the high seas. They used to be fishermen. When we use the term Tragedy, we are not being dramatic.
The Rhine River during the Middle Ages is the best example of a Tragedy of the Anticommons. The Rhine starts in the Swiss Alps, follows the Franco-German border, flows through the Netherlands and empties into the North Sea.
It was an important waterway for commerce and Europe’s principal highway during the Middle Ages. The Holy Roman Emperor protected the great European trade route. Merchant ships paid a modest toll to safeguard their transit. But after the empire weakened in the thirteenth century, German Robber Barons built castles on the Rhine and began collecting their own illegal tolls.
Robber Barons built spectacular castles along the Rhine, including this one. There were hundreds of castles, including 35 within a 90-mile span. Far too many. The gauntlet of tollbooths made shipping impractical and commerce down the river halted.
Today, the hundreds of ruined castles are a lovely tourist destination…But for hundreds of years, everyone suffered—even the barons. The European economic pie shrank. Wealth disappeared. Too many tolls meant too little trade.
This is why I say we Universities are medieval.
As tech transfer professionals, we should be most fearful of gridlock, because it means rampant underuse. In the data sharing world, we have underuse due to high transaction costs.
The above continuum was introduced by Columbia Prof. Michael Heller, a thought leader in this field. He says there’s no effective ownership at the ends of the spectrum… Overfish the cod, and there’s effectively no cod to fish anymore. Impose too many data license restrictions, and data aggregation and integration become impractical and essentially impossible.
So what’s the data licensing solution? Some say, nuke any and all IP rights in the data – essentially the copyrights in the databases. That’s called CC0. That involves waiving all interests in one’s works and thereby placing them as completely as possible in the public domain. We’ll talk about that in a second, but first consider how governments deal with the commons problem. They can’t control a common resource like the ocean, but they can declare moratoria on how you access the ocean, i.e. how and where you fish. What that effectively does is move the resource closer to the middle of the continuum. I think that’s the real solution for data licensing. But first, let’s cover the off-the-shelf data licenses.
CC BY This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.
CC BY-SA This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This license is often compared to “copyleft” free and open source software licenses.
CC BY-NC This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.
CC BY-NC-SA This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
CC0 enables owners of copyright- or database-protected content to waive those interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.
The NIH Scientific Data Council nearly adopted a plan to declare NIH-funded research data as CC0.