1. Some Proposed Principles for Interoperating Cloud Based Data Platforms
Robert L. Grossman
Center for Translational Data Science
University of Chicago and
Open Commons Consortium
NIH Workshop on
Cloud-Based Platforms Interoperability
October 3, 2019
Draft 1.5
2. Josh Denny (Vanderbilt), David Glazer (Verily Life Sciences), Robert L.
Grossman (University of Chicago), Benedict Paten (University of California
at Santa Cruz), Anthony Philippakis (Broad Institute)
3. Data Biosphere Principles
1. modular, composed of functional components with well-specified interfaces;
2. community-driven, created by many groups to foster a diversity of ideas;
3. open, developed under open-source licenses that enable extensibility and reuse, with users able to add custom, proprietary modules as needed; and
4. standards-based
[Figure: Examples of Data Environments (HCA, CRDC, AoU). Labels: Data Generators, Researchers, Portals, Ingest, Store, Explore, Analysis Engine, Methods Repo, Workspaces, Use in cloud.]
Figure: Courtesy of Anthony Philippakis, Broad Institute
4. The question today: how do we go from building data commons to
building data ecosystems of interoperating data resources,
computational resources, and applications that explore, analyze,
visualize and share data and knowledge?
[Figure: from cloud-based platforms to cloud-based data ecosystems of multiple platforms]
6. Some Problems Today
• Platforms that refuse to expose any API and instead require all users
to use their platform or application, usually for competitive reasons.
• Platforms that bring data from other resources and platforms into
their system, but don’t let your data out.
• Platforms that don’t interoperate with other systems that have the same
or greater security and compliance, and that cite security and
compliance as the reason.
7. Incentives / Disincentives for Interoperating
[Figure: stakeholder groups (USG / NFP / For profits; Platform Builders / Platform Operators; Researchers / Research Consortiums; Patients / Data Generators; Patients Partnered Research) marked as having many, some, or fewer incentives to interoperate]
8. Let’s Distinguish: Technical Guidelines vs Operating Principles
• Common vision: we have a common vision of interoperating to
accelerate research, improve patient outcomes and leverage
resources.
• Operating principles include questions about which platforms can
interoperate, whether a platform will expose an API, whether a
platform will be open and support different applications or will be
closed and only support a single application, etc.
• Technical guidelines can follow technical best practices (e.g. use a
persistent digital ID not tied to a particular domain or location
within a domain) or standards (e.g. GA4GH TES).
It may be helpful to think of policies as on an orthogonal axis.
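The digital ID guideline above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory resolver; the `Resolver` class, the ID `dg.EXAMPLE/0001` and the storage URLs are made up for illustration and are not part of any real service. The point is only that applications hold the persistent ID, while the location behind it can change.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resolver:
    """Hypothetical resolver: maps a persistent digital ID to its current locations."""

    _locations: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, digital_id: str, url: str) -> None:
        # Record one more location for the object behind this ID.
        self._locations.setdefault(digital_id, []).append(url)

    def move(self, digital_id: str, old_url: str, new_url: str) -> None:
        # The data moved; the ID that applications hold does not change.
        urls = self._locations[digital_id]
        urls[urls.index(old_url)] = new_url

    def resolve(self, digital_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the resolver's state.
        return list(self._locations.get(digital_id, []))


resolver = Resolver()
resolver.register("dg.EXAMPLE/0001", "s3://bucket-a/sample.bam")
resolver.move("dg.EXAMPLE/0001", "s3://bucket-a/sample.bam", "gs://bucket-b/sample.bam")
print(resolver.resolve("dg.EXAMPLE/0001"))
```

Because the ID is not tied to a domain or a location within a domain, the move from one bucket to another requires no change in any application that stored the ID.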
9. Principles To Support a Data Ecosystem
Please
• Use Digital IDs
• Interoperate with third party authentication and authorization services
• Expose your data through an API
• Expose your data model through an API
• Interoperate with other trusted data platforms with similar security & compliance
• Process authorized queries and computations from other systems and return the results (scatter / gather)
Please don’t
• Refuse to expose any API and instead require all users to use your platform or application
• Bring data from other resources and platforms into your system, but don’t let your data out.
• Refuse to interoperate with other systems with the same or greater security and compliance
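The scatter / gather item in the list above can be sketched as follows. This is a minimal illustration, not a real protocol: the platform names, the record fields and the helper functions `count_where`, `scatter_gather` and `gather_total` are all hypothetical, and the platforms are modeled as in-memory lists rather than remote systems.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, str]
Query = Callable[[List[Record]], int]


def count_where(field_name: str, value: str) -> Query:
    """Build a query that counts records whose field equals the given value."""
    def run(records: List[Record]) -> int:
        return sum(1 for r in records if r.get(field_name) == value)
    return run


def scatter_gather(platforms: Dict[str, List[Record]],
                   authorized: Set[str],
                   query: Query) -> Dict[str, int]:
    """Scatter: run the query where it is authorized. Gather: collect per-platform results."""
    return {name: query(records)
            for name, records in platforms.items()
            if name in authorized}


def gather_total(partial: Dict[str, int]) -> int:
    """Combine the per-platform results into one answer."""
    return sum(partial.values())


platforms = {
    "commons_a": [{"diagnosis": "AML"}, {"diagnosis": "CLL"}],
    "commons_b": [{"diagnosis": "AML"}, {"diagnosis": "AML"}],
    "commons_c": [{"diagnosis": "AML"}],  # the query is not authorized here
}
partial = scatter_gather(platforms, {"commons_a", "commons_b"}, count_where("diagnosis", "AML"))
print(partial, gather_total(partial))
```

Only the results leave each platform; the records themselves stay where they are.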
10. Narrow Middle Architecture
*Robert L. Grossman, Progress Towards Cancer Data Ecosystems, The Cancer Journal: The Journal of Principles & Practice
of Oncology, May/June, 2018.
11. Architectures for Data Ecosystems
• A simple data ecosystem can be built when a data commons exposes an API that can support a collection
of third party applications that can access data from the commons.
• More complex data ecosystems arise when multiple data commons and data clouds can interoperate and
support a collection of third party applications by using a common set of core services (called framework
services) that provide support for authentication, authorization, digital IDs, metadata, importing,
exporting and harmonization of phenotype data, etc.
[Figure labels: bioinformaticians curating and submitting data; researchers analyzing data and making discoveries; cloud-based platforms; container-based workspaces; ML/AI apps; notebooks; data commons. Framework services:
• Authentication
• Authorization
• Digital IDs
• Importing, exporting & harmonization of clinical data
• Can be multiple implementations that trust each other & interop]
12. Towards a Definition of a Trusted Platform
• Before we discuss the operating principles, we need one definition. Let’s say that
Platform A trusts Platform B (so that B is a trusted platform) if
i) Platform B operates with a set of policies, procedures and controls that have been
reviewed and approved by Platform A; and
ii) the organizations associated with Platform A and Platform B have a formal
signed agreement describing any costs, liabilities, intellectual property issues,
data or data use limitations, etc. that may be associated with the interoperation
of the two platforms.
• As an example, two data commons that both operate with FISMA Moderate security
and compliance (or more generally follow NIST 800-53) and are operated by two
different NIH Institutes or Centers would, in general, treat each other as trusted
platforms.
• With this definition, two platforms would directly trust each other. At the end we look
at more general trust relationships among members of a consortium or other larger
organization.
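The two-part definition above can be written as a simple predicate. This is a sketch under stated assumptions: the `TrustRegistry` class and the platform names are hypothetical, and the review and agreement steps are reduced to set membership.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple


@dataclass
class TrustRegistry:
    # (A, B) means: A has reviewed and approved B's policies, procedures and controls.
    approved_controls: Set[Tuple[str, str]] = field(default_factory=set)
    # Each entry is an unordered pair of platforms whose organizations signed an agreement.
    signed_agreements: Set[FrozenSet[str]] = field(default_factory=set)

    def trusts(self, a: str, b: str) -> bool:
        """Platform A trusts Platform B only if both conditions i) and ii) hold."""
        reviewed = (a, b) in self.approved_controls
        agreement = frozenset((a, b)) in self.signed_agreements
        return reviewed and agreement


registry = TrustRegistry()
registry.approved_controls.add(("commons_a", "commons_b"))
registry.signed_agreements.add(frozenset(("commons_a", "commons_b")))
print(registry.trusts("commons_a", "commons_b"))
print(registry.trusts("commons_b", "commons_a"))
```

In this sketch the agreement is shared by both platforms, but the review in condition i) is directional, so the trust becomes mutual only once each platform has reviewed the other.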
13. Proposed Operating Principles (Draft 1.5)
1. Interoperate with other trusted platforms: if another trusted
platform is part of your data ecosystem or wants to create an
ecosystem with you, then interoperate with it.
2. Follow the golden rule of data resources: if you take someone else’s
data, let them have access to your data (assuming you have, or can
establish, a trust relationship with them).
14. 3. Support the principle of least restrictive access: Provide another
trusted platform access to your data in the least restrictive manner
possible.
- With rare exceptions, a data resource should provide an API so
that applications in other trusted platforms can access data directly.
- If this is not possible due to the sensitivity of your data, then
support the ability for approved queries or analyses to be run over
your data and the results returned. Sometimes this is called an
analysis or query gateway.
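The principle above can be sketched as a small decision function plus a toy gateway. All names here (`AccessMode`, `least_restrictive_mode`, `run_at_gateway`, the `count` query) are hypothetical and chosen for illustration; a real gateway would also involve authentication, auditing and review of results.

```python
from enum import Enum
from typing import Callable, Dict, List


class AccessMode(Enum):
    DIRECT_API = "direct_api"
    QUERY_GATEWAY = "query_gateway"
    NO_ACCESS = "no_access"


def least_restrictive_mode(requester_is_trusted: bool, direct_access_allowed: bool) -> AccessMode:
    """Pick the least restrictive mode that the data's sensitivity permits."""
    if not requester_is_trusted:
        return AccessMode.NO_ACCESS
    if direct_access_allowed:
        return AccessMode.DIRECT_API
    # Data too sensitive to hand out: run approved queries and return results only.
    return AccessMode.QUERY_GATEWAY


def run_at_gateway(records: List[dict],
                   query_name: str,
                   approved: Dict[str, Callable[[List[dict]], int]]) -> int:
    """Run a query over local data only if it is on the approved list."""
    if query_name not in approved:
        raise PermissionError(f"query {query_name!r} is not approved")
    return approved[query_name](records)


approved_queries = {"count": len}
records = [{"id": 1}, {"id": 2}, {"id": 3}]
mode = least_restrictive_mode(requester_is_trusted=True, direct_access_allowed=False)
if mode is AccessMode.QUERY_GATEWAY:
    print(run_at_gateway(records, "count", approved_queries))
```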
15. 4. Agree on standards, compete on implementations:
- It is important to open up your ecosystem to competition, lest it stagnate.
- What this principle means is that a platform should expose its data and
resources via APIs so that other applications and systems can be part of its
ecosystem.
- The sponsor of a data resource does not need to fund other systems or
applications, but it is important not to implicitly create a
monopoly by requiring all users of your data to use a particular application or
system.
- Remember that not all researchers have the same requirements, or the same
preferences, and in general a mix of applications, systems and platforms is
better than requiring the use of a single application or system.
16. 5. Support patient partnered research: Support patient partnered
research so that individuals can provide their data and have control
over it within your system. If you cannot do this today, add this to your
platform roadmap.
17. Trusted Platforms
• A trust relationship between two resources in a data ecosystem requires agreements
between two organizations about a number of matters, including: security;
compliance; liability; data egress charges; and infrastructure costs.
• For this reason, a formal agreement between two different organizations or a memo
between two different units within an organization or agency is usually required.
• As an example, an Interconnection Security Agreement (ISA) between two platforms
would serve this purpose.
• A consortium of platforms can also sign formal agreements. An example is the Open
Commons Consortium agreements for the BloodPAC Consortium.
[Figure: isolated platform; bilateral trust relationships; consortium trust relationships; federated trust relationships]
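The difference between bilateral and consortium trust relationships can be sketched as follows. This is a simplified model under stated assumptions: trust is treated as a symmetric pair, a consortium agreement is assumed to cover every pair of its members, federated trust is not modeled, and the function and platform names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple


def trusted_pairs(bilateral: Iterable[Tuple[str, str]],
                  consortia: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Collect every pair of platforms covered by some formal agreement."""
    pairs = {frozenset(p) for p in bilateral}
    for members in consortia:
        # One consortium agreement stands in for an agreement between each pair of members.
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def is_isolated(platform: str, pairs: Set[FrozenSet[str]]) -> bool:
    """A platform with no trust relationship at all is isolated."""
    return not any(platform in p for p in pairs)


pairs = trusted_pairs(
    bilateral=[("commons_a", "commons_b")],
    consortia=[["commons_c", "commons_d", "commons_e"]],
)
print(len(pairs), is_isolated("commons_f", pairs))
```

With n members, a single consortium agreement replaces n(n-1)/2 bilateral agreements, which is one reason consortium agreements scale better as an ecosystem grows.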
19. For More Information
Robert L. Grossman, Some Proposed Principles for
Interoperating Data Commons, Medium, October 1, 2019,
http://bit.ly/222QYY
Robert L. Grossman
robert.grossman@uchicago.edu
@BobGrossman