Tutorial presented at Mini Gateways 2022. Demonstrates how to build data portals and science gateways with the Django Globus Portal Framework.
The broad scope of a typical science gateway (simplifying access to shared data, computing, and other resources) makes building such a gateway from scratch a daunting task. Investigators must be able to stage data from instruments (or other sources), submit compute jobs to analyze data, move data to more persistent storage, describe data products, and provide a means for collaborators to search, discover, reuse, and augment these data products. Myriad tools are available for each of these tasks, but integrating them in a way that hides the complexity from users is a challenge.
In this tutorial we will describe an approach that bootstraps science gateway development based on the Modern Research Data Portal[1] design pattern. The solution uses a set of open source tools that build on the established Django web framework, the ubiquitous OAuth2/OpenID Connect standards for authentication/authorization, the widely deployed Globus service for research data management, and the nascent funcX functions-as-a-service platform. Attendees will learn how to rapidly deploy a science gateway that enables both automated computation at scale and enhanced discovery of the resulting data products. The emphasis will be on automating many of the required tasks so that gateway developers can focus on building differentiated, discipline-specific functionality rather than low-value, yet critical, supporting infrastructure.
We will use the ALCF Community Data Co-Op as an exemplar to illustrate how these tools have been used to support large-scale collaborative research. We will describe the overall solution architecture and introduce attendees to the individual tools. Attendees will then use these tools to deploy and configure their own science gateway to support image analysis, description, indexing and search.
The tutorial will comprise a mix of lectures, demonstration and hands-on exercises. Virtual machines will be provided for computation and for hosting the science gateway. The objective is for attendees to develop a high-level understanding of the various components and leave with working code that can serve as the starting point for their own science gateway implementation.
Advanced Computing Meets Data FAIRness
1. Advanced Computing Meets Data FAIRness
Building Science Gateways with the Django Globus Portal Framework
Vas Vasiliadis – vas@uchicago.edu
Lee Liming – lliming@uchicago.edu
April 5, 2022
3. Agenda
• Introduction and motivation
• The Modern Research Data Portal design pattern
• Deploying a science gateway using the MRDP
• Making data findable with Globus Search
• Customizing the science gateway
• Making data discoverable at scale
• Integrating compute into your science gateway
- Hands-on exercise
- Live demonstration
6. The brilliance “arms race”...
K. Wille, The Physics of Particle Accelerators: An Introduction, Oxford University Press, Oxford, UK (2000); J. B. Parise and G. E. Brown, Jr., Elements, 2, 37-42 (2006)
7. Some challenges…
• Increasing data rates, heterogeneity
• Continuum of computing resources
• Differing workflows across instruments
8. A common data flow pattern
Locations: Instrument Facility, Advanced Computing Facility, Data Portal, Distribution Store
Steps: (1) Imaging, (2) Acquisition, (3) Image Analysis, (4) Description/Identification, (5) Search/Discovery, (6) Science!
9. Globus services for research data management
• Unified Data Access
• Data Transfer and Sharing
• Platform-as-a-Service
• Reliable Automation
• Publication & Discovery
• Remote Execution (future)
12. Why we use portals and science gateways
• Different experiments (beamlines, electron microscopes, biology, etc.) generate data with different types, sizes, and experimental information
• Processing, curation, and cataloguing need to happen as soon as possible so data are not lost
• Standardize secure access between users
• Work toward FAIR datasets to enable more science
13. Benefits
• Make data FAIRer
• Track lots of (heterogeneous) data
• Facilitate discovery
– Free text search in Globus Search
– Filtering on specific values
– User Friendly GUI
• Enforce appropriate access controls
– Public/private, group-, subject-level ACLs
• Integrate with other (Globus) services
• Customize for your research environment
14. MRDP: Key elements
• Science DMZ: fast, clean data path
• Data Transfer Nodes: purpose-built data movers
• Globus Platform: secure, reliable data orchestration
• Globus Connect: storage system enabler
• Globus Portal Framework: data discovery and access
22. Relevant Globus platform capabilities
• Data transfer and sharing
• Data description (metadata) and discovery
• Data (and compute) task orchestration
• Authentication and Authorization
24. Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
25. Several authentication models supported
• Application acting as user with consent
– Auth flow: Authorization code grant
• Application authenticating as itself
– Auth flow: Client credentials grant
– Application (client) has its own identity → apps are people too!
• Application able to manage tokens for offline or long-running tasks
– Refresh tokens
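As a rough illustration of the first model above (application acting as the user with consent), the sketch below builds the browser redirect for the authorization code grant and reads the code back from the callback. It uses only the standard library. The client ID, the scope list, and the helper names are illustrative assumptions, and sending `access_type=offline` to request refresh tokens is my understanding of Globus Auth behaviour rather than something stated on the slide.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Globus Auth authorization endpoint (first leg of the authorization code grant).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, offline=False):
    """Return (url, state) for redirecting the user's browser to Globus Auth."""
    state = secrets.token_urlsafe(16)  # random value used to detect CSRF on return
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(scopes),
        "state": state,
    }
    if offline:
        # Assumption: this asks for refresh tokens for offline/long-running tasks.
        params["access_type"] = "offline"
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


def read_callback(callback_url, expected_state):
    """Extract the authorization code from the redirect, verifying the state."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible CSRF")
    return query["code"][0]
```

The code returned by `read_callback` would then be exchanged for tokens at the token endpoint. In the portal framework this exchange is handled by the authentication backend, so a portal developer would not normally write it by hand.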
26. Data transfer and sharing
• Move data to collection → Submit Transfer task (Transfer service: POST /transfer)
• Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access)
• Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
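To make the access rule step more concrete, here is a minimal sketch of the JSON body that could be sent to POST /endpoint/{endpoint_id}/access. The field names follow the Globus Transfer API access document as I understand it. The helper name and the validation rules are assumptions for illustration, and the UUID in the usage note is a placeholder.

```python
import json

VALID_PRINCIPAL_TYPES = ("identity", "group", "all_authenticated_users", "anonymous")


def access_rule(principal_type, principal, path, permissions="r"):
    """Build an access rule document for a guest collection.

    `path` must be an absolute directory path that begins and ends with "/".
    `permissions` is "r" (read-only) or "rw" (read-write).
    """
    if principal_type not in VALID_PRINCIPAL_TYPES:
        raise ValueError(f"unknown principal_type: {principal_type}")
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


def to_request_body(rule):
    """Serialize the rule as the JSON request body."""
    return json.dumps(rule)
```

For example, `access_rule("group", "00000000-0000-0000-0000-000000000000", "/project/results/")` would describe read-only access for a group, which pairs naturally with the Groups membership check in the third bullet.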
27. Using guest collections in your data portal
• Create a guest collection; requires authentication
– Cannot be completely automated: someone must "log in"
– Create once and automate rest of the steps
• Grant the application Access Manager role
– Allows the application to manage permissions on the collection
– Set for application identity: appclientid@clients.auth.globus.org
• Grant roles for management of endpoint and tasks
29. Globus Search
Evolving the MRDP design pattern
Enabling discoverability: MRDP + Faceted Search
• Metadata sources: input form, automated extraction, bulk ingest
• Ingest metadata, set visibility policies
30. Portal Core Functionality
• User authentication
• Django-based framework
– Portal URL mappings
– Token loading
• Service calls to Globus Search
• Manage request lifecycle
• Post process search requests
31. User authentication
• Scopes are configured in the portal
• Users authenticate with Globus using standard flow
– Python Social Auth used for Authentication backend
• User tokens are saved in the database
• Future requests authorized with user access tokens
– Searches use Search bearer token
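The fragment below sketches how the scopes and the Python Social Auth backend might appear in a portal's Django settings. The setting names and the backend path are given as I recall them from the framework documentation and should be checked against the installed version. The environment variable names are purely illustrative.

```python
import os

# Client ID and secret obtained when registering the application.
# Reading them from the environment keeps secrets out of source control;
# the variable names here are hypothetical.
SOCIAL_AUTH_GLOBUS_KEY = os.environ.get("GLOBUS_CLIENT_ID", "")
SOCIAL_AUTH_GLOBUS_SECRET = os.environ.get("GLOBUS_CLIENT_SECRET", "")

# Scopes requested at login. The Search scope lets the portal call
# Globus Search with the user's own bearer token.
SOCIAL_AUTH_GLOBUS_SCOPE = [
    "openid",
    "profile",
    "email",
    "urn:globus:auth:scope:search.api.globus.org:search",
]

# Assumed backend path for the framework's Globus login; Django's default
# backend is kept for local admin accounts.
AUTHENTICATION_BACKENDS = [
    "globus_portal_framework.auth.GlobusOpenIdConnect",
    "django.contrib.auth.backends.ModelBackend",
]
```

Keeping the scope list short follows the least-privilege idea: the portal only asks for consent to the services it actually calls.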
32. Portal service calls use the Globus SDK
• Globus portal framework loads tokens from database
• Globus service object instantiated with token
• Call to Globus service(s)
• Portal renders result in templates
33. Globus Portal Framework URLs
• URLs span three categories
– Index Selection
– Index Search page
– Search Subject detail page
• Supports multiple Globus Search indices
• Search page links to multiple result subjects
• Each subject has a unique URL
35. An index is configuration driven
• A Search index is configured in portal settings
• Add Globus Search index UUID
• Add a name
• Add facets
• Add fields
• Start searching!
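A configuration along these lines could express the steps listed above. This is a sketch based on my reading of the framework documentation: the index slug, name, field names, and the all-zero UUID are placeholders, and `check_index_config` is a hypothetical helper, not part of the framework.

```python
# Hypothetical portal settings entry for one Globus Search index.
SEARCH_INDEXES = {
    "images": {  # slug used in portal URLs (placeholder)
        "name": "Image Analysis Results",
        "uuid": "00000000-0000-0000-0000-000000000000",  # placeholder index UUID
        # Fields passed through to the templates.
        "fields": ["title", "date", "instrument"],
        # Facets shown on the search page.
        "facets": [
            {
                "name": "Instrument",
                "field_name": "instrument",
                "type": "terms",
                "size": 10,
            },
        ],
    },
}


def check_index_config(indexes):
    """Return a list of problems found in a SEARCH_INDEXES-style dict."""
    problems = []
    for slug, cfg in indexes.items():
        for key in ("name", "uuid"):
            if not cfg.get(key):
                problems.append(f"{slug}: missing '{key}'")
        for facet in cfg.get("facets", []):
            if "field_name" not in facet:
                problems.append(f"{slug}: facet without 'field_name'")
    return problems
```

Because the index is described entirely in settings, adding a second index is a matter of adding another key to the dictionary.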
36. Lifecycle of a request
• User makes a query
• Portal sends request to Globus Search
– Request contains user bearer token
• Portal receives response
• Portal does processing on response
– Parse Dates, build URL for Globus webapp, etc.
• Portal renders data into templates
• User receives a search page
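The lifecycle above can be sketched with the standard library alone. The example builds a Search request that carries the user's bearer token, then post-processes a response by parsing dates and building a Globus web app link. The response shape (`gmeta` → `entries` → `content`) reflects my understanding of the Search API. The content field names `date`, `collection_id`, and `path` are hypothetical, and no network call is made.

```python
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_API = "https://search.api.globus.org/v1"


def build_search_request(index_id, query, access_token, limit=10):
    """Prepare (but do not send) a simple query with the user's bearer token."""
    url = f"{SEARCH_API}/index/{index_id}/search?{urlencode({'q': query, 'limit': limit})}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def webapp_url(collection_id, path):
    """Link to the Globus web app file manager for a collection path."""
    params = urlencode({"origin_id": collection_id, "origin_path": path})
    return f"https://app.globus.org/file-manager?{params}"


def post_process(response):
    """Flatten search results into dicts ready for template rendering."""
    rows = []
    for result in response.get("gmeta", []):
        for entry in result.get("entries", []):
            row = dict(entry.get("content", {}))
            row["subject"] = result.get("subject")
            if row.get("date"):  # hypothetical ISO 8601 date field
                row["date"] = datetime.fromisoformat(row["date"])
            if row.get("collection_id") and row.get("path"):
                row["webapp_url"] = webapp_url(row["collection_id"], row["path"])
            rows.append(row)
    return rows
```

In the portal framework these steps are handled by the framework's own views, so this sketch is only meant to show what happens between the user's query and the rendered page.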
37. Creating your science gateway using the Globus portal framework
bit.ly/minisgci-2022
Source: github.com/globus/django-globus-portal-framework
Docs: django-globus-portal-framework.readthedocs.io/en/stable/
38. Step 0: Application registration
• Set redirect URLs
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
Redirect URLs
https://tutN.globusdemo.org:8443/
https://tutN.globusdemo.org:8443/complete/globus/
39. Accessing your VM
Host: tutN.globusdemo.org
Login user: devN
Password: Globus_2022#
40. Portal deployment
• Install dependent libraries
– For production use, add robust WSGI/ASGI server
• Deploy a portal instance using cookiecutter
• Configure settings
• Run and use!
• Future: containers
42. Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
Search
Index
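For the complex search case, the request is a JSON document rather than a query string. The sketch below assembles one with facets and filters. The keys (`q`, `limit`, `facets`, `filters`, `match_any`, `terms`) follow the Search API request format as I understand it, and the helper name and field names are illustrative assumptions.

```python
import json


def search_request_document(q, facet_fields=(), filters=None, limit=10):
    """Build a search request document for POST /index/{index_id}/search.

    facet_fields: field names to aggregate as term facets.
    filters: mapping of field name -> iterable of accepted values.
    """
    doc = {"q": q, "limit": limit}
    if facet_fields:
        doc["facets"] = [
            {"name": field, "type": "terms", "field_name": field, "size": 10}
            for field in facet_fields
        ]
    if filters:
        doc["filters"] = [
            {"type": "match_any", "field_name": field, "values": list(values)}
            for field, values in filters.items()
        ]
    return doc


# Example: free-text query, one facet, restricted to two (hypothetical) instruments.
example = search_request_document(
    "mitochondria",
    facet_fields=["instrument"],
    filters={"instrument": ["tem-1", "tem-2"]},
)
body = json.dumps(example)
```

The equivalent simple search would put only `q` (and perhaps `limit`) in the URL query string, which is enough for a search box but cannot express facets or filters.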
43. Distinct access policies may be applied to Data and Metadata
• Data: (ideally) using permissions on guest collections
• Metadata: using permissions on metadata elements
44. Data ingest with Globus Search
Search
Index
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
- Bulk create and update
- Task model for ingest at scale
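A portal or ingest script would typically assemble this document programmatically. The following sketch builds a GMetaList from individual entries using the same keys as the example on this slide. The helper names are hypothetical and nothing is sent to the service.

```python
import json


def gmeta_entry(subject, content, visible_to=("public",), entry_id=None):
    """Build one entry: metadata `content` about `subject`, with a visibility list."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        # Optional id distinguishes several entries about the same subject.
        entry["id"] = entry_id
    return entry


def gmeta_list(entries):
    """Wrap entries in the ingest document for bulk create/update."""
    return {
        "ingest_type": "GMetaList",
        "ingest_data": {"gmeta": list(entries)},
    }


document = gmeta_list([
    gmeta_entry(
        "https://search.api.globus.org/abc.txt",
        {"metadata-schema/file#type": "file"},
        entry_id="filetype",
    ),
])
body = json.dumps(document, indent=2)
```

Because the whole list goes in one request, many subjects can be created or updated at once, and the ingest is then tracked as a task.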
45. Data ingest with Globus Search
Search
Index
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "weight",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "37.6",
          "metadata-schema/file#size_human": "<50lb"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
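The `visible_to` values are principal URNs. As a small sketch, the helpers below format them for identities and groups and reject malformed UUIDs. The identity URN form matches the example on this slide; the group URN form is my understanding of the Globus convention, and the function names are hypothetical.

```python
import uuid


def identity_urn(identity_id):
    """URN for a single user or a registered client application identity."""
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id):
    """URN for a Globus Group (assumed format)."""
    return f"urn:globus:groups:id:{uuid.UUID(group_id)}"


def visible_to(identities=(), groups=()):
    """Build a visible_to list limited to the given identities and groups."""
    return [identity_urn(i) for i in identities] + [group_urn(g) for g in groups]
```

A registered client application has its own identity, so it is listed with `identity_urn` using its client ID, in the same way as a user.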
48. Cancer Registry Records for Research (CR3)
• Create network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
– Data owners input/export data sets, apply QC, set access policies
– Registry data remain at the institution where they were generated
– Identities are provided/authenticated by the institution, not Globus
– System scale depends on data owners providing storage resources
49. CR3 requirements
• Search Index
– Only de-identified data in search index
– No record-level access for researchers
• Portal
– Fine-grained access control
– Researchers must use a specific identity
– Access must be logged
– Render graphs based on search results
– Faceted search in real time
50. CR3 Architecture
• Researcher logs in to the CR3 Discovery Portal with UPMC/Pitt credentials
• Authentication: auth initiated to Globus Auth (GA), with UPMC/Pitt identity providers
• Cohort search initiated to Globus Search (GS); cohort aggregate counts returned
• Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging)
• Registry staff: manage authorization
• Data transfer from registrar to researcher mediated by Globus Transfer (GT)
• Other components shown: Elasticsearch, Request Service
51. CR3 Portal (simulated data)
• Source registries: SEER Registry, Medical Center Registry, State Registry
• Federated logon using Globus Auth with Pitt/UPMC as identity providers
• Dynamically updating charts as facets change
• Variable facets based on source registry index
• Google-like text search with facets for filtering
• Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp)
* PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/
56. Globus Automation Capabilities
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
58. The Globus Timer service
• Scheduled/recurring file transfers
• Supports all Globus transfer and sync options
• Service with a command line interface
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
63. Managed automation of tasks
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
64. Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
65. Extending the ecosystem: Action providers
• Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
• Action providers shown (Globus provided and custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
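To give a feel for the operations an Action Provider exposes, here is a toy in-memory model of run, status, cancel, and release. It is not the Action Provider Toolkit and does not speak HTTP or use Globus Auth. The class, its `complete` helper, and the choice of status strings (ACTIVE, SUCCEEDED, FAILED) are assumptions made for illustration, and resume is left out for brevity.

```python
import uuid


class ToyActionProvider:
    """In-memory stand-in for the run/status/cancel/release lifecycle."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its initial status record."""
        action_id = uuid.uuid4().hex
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": {"request": body},
        }
        return dict(self._actions[action_id])

    def status(self, action_id):
        """Report the current state of an action."""
        return dict(self._actions[action_id])

    def complete(self, action_id, result):
        """Hypothetical helper simulating the underlying work finishing."""
        action = self._actions[action_id]
        action["status"] = "SUCCEEDED"
        action["details"] = {"result": result}

    def cancel(self, action_id):
        """Stop an action that is still running; finished actions are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
        return dict(action)

    def release(self, action_id):
        """Forget a finished action so its state no longer has to be kept."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still active")
        return self._actions.pop(action_id)
```

A flow calls run, polls status until the action is no longer active, and then releases it. A real provider would expose the same lifecycle as authenticated HTTP endpoints.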
66. Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
GET /provider_url/action_id/cancel
GET /provider_url/action_id/status
Create Action
Providers
Define and
deploy flows
{ "StartAt": "ToProject",
"States" : {
"ToProject" : { … },
"SetPermission" : { … },
"ProcessData" : { … } … }}
Run flows
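The flow definition on this slide elides the body of each state. As a sketch of what a complete definition might look like, the example below fills in the three named states with placeholder action URLs and empty parameters (all assumptions), then checks that the state machine is well formed and walks it in order. It only handles simple states chained with `Next` and `End`, not choices or loops.

```python
# Placeholder flow: the ActionUrl values are NOT real service URLs.
FLOW = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {
            "Type": "Action",
            "ActionUrl": "https://example.org/transfer",
            "Parameters": {},
            "ResultPath": "$.ToProjectResult",
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://example.org/set-permission",
            "Parameters": {},
            "ResultPath": "$.SetPermissionResult",
            "Next": "ProcessData",
        },
        "ProcessData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/compute",
            "Parameters": {},
            "ResultPath": "$.ProcessDataResult",
            "End": True,
        },
    },
}


def validate_flow(flow):
    """Return a list of structural problems (an empty list means none found)."""
    states = flow.get("States", {})
    problems = []
    if flow.get("StartAt") not in states:
        problems.append("StartAt does not name a state")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is None and not state.get("End"):
            problems.append(f"{name}: needs 'Next' or 'End'")
        elif nxt is not None and nxt not in states:
            problems.append(f"{name}: Next points to unknown state '{nxt}'")
    return problems


def execution_order(flow):
    """Follow Next links from StartAt; stops at End or on a repeated state."""
    order, current = [], flow["StartAt"]
    while current is not None and current not in order:
        order.append(current)
        current = flow["States"][current].get("Next")
    return order
```

Running `execution_order(FLOW)` gives the states in the order they would execute, which is a quick sanity check before deploying a definition to the Flows service.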
67. Working with Globus Flows
Try it: demo.gladier.org/gladier-demo/upload-file
Run flows: app.globus.org/flows/library
Docs: docs.globus.org/globus-automation-services