Advanced Computing Meets Data FAIRness
Building Science Gateways with the Django Globus Portal Framework
Vas Vasiliadis – vas@uchicago.edu
Lee Liming – lliming@uchicago.edu
April 5, 2022
Tutorial materials and handy links
bit.ly/minisgci-2022
Agenda
• Introduction and motivation
• The Modern Research Data Portal design pattern
• Deploying a science gateway using the MRDP
• Making data findable with Globus Search
• Customizing the science gateway
• Making data discoverable at scale
• Integrating compute into your science gateway
- Hands-on exercise
- Live demonstration
Introduction and Motivation
What’s the common theme?
The brilliance “arms race”...
K. Wille, The Physics of Particle Accelerators: An Introduction, Oxford University Press, Oxford, UK (2000); J. B. Parise and G. E. Brown, Jr., Elements, 2, 37-42 (2006)
Some challenges…
• Increasing data rates, heterogeneity
• Continuum of computing resources
• Differing workflows across instruments
A common data flow pattern
[Figure: data flows among an Instrument Facility, an Advanced Computing Facility, a Distribution Store, and a Data Portal in six steps: 1 Imaging, 2 Acquisition, 3 Image Analysis, 4 Description/Identification, 5 Search/Discovery, 6 Science!]
Globus services for research data management
• Unified Data Access
• Data Transfer and Sharing
• Platform-as-a-Service
• Reliable Automation
• Publication & Discovery
• Remote Execution (future)
The Modern Research Data Portal Design Pattern
docs.globus.org/mrdp
Why we use portals and science gateways
• Different experiments (beamlines, electron microscopes, biology, etc.) generate data of different types and sizes, with different experimental information
• Processing, curation, and cataloguing need to happen as soon as possible so data are not lost
• Standardize secure access between users
• Work toward FAIR datasets to enable more science
Benefits
• Make data FAIRer
• Track lots of (heterogeneous) data
• Facilitate discovery
– Free text search in Globus Search
– Filtering on specific values
– User Friendly GUI
• Enforce appropriate access controls
– Public/private, group-, subject-level ACLs
• Integrate with other (Globus) services
• Customize for your research environment
MRDP: Key elements
• Science DMZ: fast, clean data path
• Data Transfer Nodes: purpose-built data movers
• Globus Platform: secure, reliable data orchestration
• Globus Connect: storage system enabler
• Globus Portal Framework: data discovery and access
…makes your storage system a Globus endpoint
Globus Connectors support diverse systems
What’s wrong with my LRDP?
L(egacy)RDP architecture
Source: ESnet Science Engagement team
MRDP network architecture
Source: ESnet Science Engagement team
An exemplar: The ALCF Data Co-op
acdc.alcf.anl.gov
Globus Platform Services
Relevant Globus platform capabilities
• Data transfer and sharing
• Data description (metadata) and discovery
• Data (and compute) task orchestration
• Authentication and Authorization
Brokering Access to Services using Globus Auth
Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
Several authentication models supported
• Application acting as user with consent
– Auth flow: Authorization code grant
• Application authenticating as itself
– Auth flow: Client credentials grant
– Application (client) has its own identity → apps are people too!
• Application able to manage tokens for offline or long-running tasks
– Refresh tokens
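The first step of the authorization code grant can be sketched with the standard library alone. This is a minimal illustration, not the portal framework's own code: the client ID is a placeholder, the redirect URL is the tutorial's, and the scope list is an assumption about what a search portal might request.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Globus Auth authorization endpoint.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state=None):
    """Return (url, state) for step one of the authorization code grant."""
    # The state value is echoed back on the redirect and guards against CSRF.
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "response_type": "code",
        "access_type": "offline",  # also request refresh tokens
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


url, state = build_authorize_url(
    client_id="00000000-0000-0000-0000-000000000000",  # placeholder
    redirect_uri="https://tutN.globusdemo.org:8443/complete/globus/",
    scopes=[
        "openid",
        "profile",
        "email",
        "urn:globus:auth:scope:search.api.globus.org:search",
    ],
)
print(url)
print(parse_qs(urlparse(url).query)["scope"])
```

The user is sent to this URL, consents, and is redirected back with a code that the application exchanges for tokens. In the portal this exchange is handled by the authentication backend rather than by hand.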
Data transfer and sharing
• Move data to collection → Submit Transfer task
• Make data accessible → Set guest collection access rule
• Grant user/app access → Add/confirm Group membership
API calls:
– Transfer service: POST /transfer, POST /endpoint/{endpoint_id}/access
– Groups service: GET /groups/my_groups
Using guest collections in your data portal
• Create a guest collection; requires authentication
– Cannot be completely automated – must "log in"
– Create once and automate rest of the steps
• Grant the application Access Manager role
– Allows the application to manage permissions on the collection
– Set for application identity: appclientid@clients.auth.globus.org
• Grant roles for management of endpoint and tasks
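Once the application holds the Access Manager role, it can set permissions by posting an access rule document to the Transfer service (POST /endpoint/{endpoint_id}/access). The sketch below only builds that document. The helper name and the example IDs are hypothetical, and it covers just the identity and group principal types.

```python
import json


def make_access_rule(principal_id, path, permissions="r", principal_type="identity"):
    """Build an access rule document for a guest collection."""
    if principal_type not in {"identity", "group"}:
        raise ValueError("this sketch only handles identity and group principals")
    if permissions not in {"r", "rw"}:
        raise ValueError("permissions must be 'r' or 'rw'")
    # Access rules apply to directories, so the path must end with a slash.
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal_id,
        "path": path,
        "permissions": permissions,
    }


rule = make_access_rule(
    principal_id="46bd0f56-e24f-11e5-a510-131bef46955c",  # example identity ID
    path="/datasets/run-001/",  # hypothetical directory
)
print(json.dumps(rule, indent=2))
```

The portal would send this document with a token obtained through the client credentials grant, so the rule is created under the application's own identity.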
Deploying a Simple (but fully functional and extensible) Research Data Portal
Globus Search
Evolving the MRDP design pattern
Enabling discoverability:
MRDP + Faceted Search
Input form
Automated
Extraction
Ingest metadata, set
visibility policies
Bulk ingest
MRDP
Portal Core Functionality
• User authentication
• Django-based framework
– Portal URL mappings
– Token loading
• Service calls to Globus Search
• Manage request lifecycle
• Post process search requests
User authentication
• Scopes are configured in the portal
• Users authenticate with Globus using standard flow
– Python Social Auth used for Authentication backend
• User tokens are saved in the database
• Future requests authorized with user access tokens
– Searches use Search bearer token
Portal service calls use the Globus SDK
• Globus portal framework loads tokens from database
• Globus service object instantiated with token
• Call to Globus service(s)
• Portal renders result in templates
Globus Portal Framework URLs
• URLs span three categories
– Index Selection
– Index Search page
– Search Subject detail page
• Supports multiple Globus Search indices
• Search page links to multiple result subjects
• Each subject has a unique URL
Format of a URL
An index is configuration driven
• A Search index is configured in portal settings
• Add Globus Search index UUID
• Add a name
• Add facets
• Add fields
• Start searching!
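An index entry in the portal settings might look like the sketch below. The setting name and keys (SEARCH_INDEXES, uuid, name, fields, facets) are an assumption based on the framework documentation and should be checked against the installed version. The slug, UUID, and field names are placeholders.

```python
# Hypothetical fragment of a Django settings module for the portal.
SEARCH_INDEXES = {
    "my-index": {  # URL slug for this index (placeholder)
        "uuid": "00000000-0000-0000-0000-000000000000",  # Globus Search index UUID
        "name": "My Research Data",
        "fields": ["title", "pubdate"],  # hypothetical metadata fields
        "facets": [
            {
                "name": "Publication Date",
                "field_name": "pubdate",
                "type": "date_histogram",
                "date_interval": "year",
            },
            {"name": "File Type", "field_name": "type", "type": "terms", "size": 10},
        ],
    }
}


def missing_keys(indexes, required=("uuid", "name")):
    """Return {slug: [missing keys]} for index entries lacking required keys."""
    return {
        slug: [key for key in required if key not in config]
        for slug, config in indexes.items()
        if any(key not in config for key in required)
    }


print(missing_keys(SEARCH_INDEXES))
```

Because a portal supports multiple indices, each one gets its own slug and its own entry in this dictionary.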
Lifecycle of a request
• User makes a query
• Portal sends request to Globus Search
– Request contains user bearer token
• Portal receives response
• Portal does processing on response
– Parse Dates, build URL for Globus webapp, etc.
• Portal renders data into templates
• User receives a search page
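The post-processing step can be sketched as a function that flattens a search response into records ready for a template. This is an illustration only: the content fields (title, pubdate, path) and the collection ID are hypothetical, and the framework has its own mechanism for mapping fields.

```python
from datetime import datetime
from urllib.parse import urlencode

# Globus web app file manager, used to link a result to its files.
WEBAPP_FILE_MANAGER = "https://app.globus.org/file-manager"


def postprocess(result, collection_id):
    """Flatten a GSearchResult into a list of template-friendly records."""
    records = []
    for item in result.get("gmeta", []):
        for entry in item.get("entries", []):
            content = entry.get("content", {})
            raw_date = content.get("pubdate")
            path = content.get("path")
            records.append({
                "subject": item["subject"],
                "title": content.get("title", item["subject"]),
                # Parse the date string so templates can format it.
                "pubdate": datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None,
                # Build a link that opens the data in the Globus web app.
                "globus_url": (
                    f"{WEBAPP_FILE_MANAGER}?"
                    + urlencode({"origin_id": collection_id, "origin_path": path})
                ) if path else None,
            })
    return records


sample = {
    "@datatype": "GSearchResult",
    "count": 1,
    "gmeta": [{
        "subject": "https://example.org/abc.h5",
        "entries": [{"content": {"title": "Run 42", "pubdate": "2020-06-01", "path": "/data/abc/"}}],
    }],
    "offset": 0,
    "total": 1,
}
records = postprocess(sample, collection_id="00000000-0000-0000-0000-000000000000")
print(records[0])
```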
Creating your science gateway using the Globus portal framework
bit.ly/minisgci-2022
Source: github.com/globus/django-globus-portal-framework
Docs: django-globus-portal-framework.readthedocs.io/en/stable/
Step 0: Application registration
• Set redirect URLs
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
Redirect URLs
https://tutN.globusdemo.org:8443/
https://tutN.globusdemo.org:8443/complete/globus/
Accessing your VM
Host: tutN.globusdemo.org
Login user: devN
Password: Globus_2022#
Portal deployment
• Install dependent libraries
– For production use, add robust WSGI/ASGI server
• Deploy a portal instance using cookiecutter
• Configure settings
• Run and use!
• Future: containers
Making Data Findable with Globus Search
Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
Distinct access policies may be applied to data and metadata:
• Data: (ideally) using permissions on guest collections
• Metadata: using permissions on metadata elements
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
- Bulk create and update
- Task model for ingest at scale
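A bulk ingest document of this shape can be assembled from per-file metadata before it is posted to the ingest endpoint. The helper names below are hypothetical. The document layout mirrors the GMetaList example, and the second subject is invented for illustration.

```python
import json


def make_entry(subject, content, visible_to=("public",), entry_id=None):
    """Build one GMetaList entry for a subject."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def make_ingest_document(entries):
    """Wrap entries in a GMetaList ingest document for bulk create/update."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


doc = make_ingest_document([
    make_entry(
        "https://search.api.globus.org/abc.txt",
        {"metadata-schema/file#type": "file"},
        entry_id="filetype",
    ),
    make_entry(
        "https://search.api.globus.org/def.txt",  # hypothetical second subject
        {"metadata-schema/file#type": "file"},
        entry_id="filetype",
    ),
])
print(json.dumps(doc, indent=2))
```

Ingest follows a task model, so a portal would normally submit the document and then check the resulting task rather than assume the entries are searchable immediately.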
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "weight",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "37.6",
          "metadata-schema/file#size_human": "<50lb"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
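The values in visible_to are principal URNs. A small sketch of how a portal might build them is shown below. The helper names are hypothetical, and the group ID in the example is a placeholder. A registered client application is addressed by its identity URN, the same way as a user.

```python
import uuid


def identity_urn(identity_id):
    """URN for a single user or a registered client application."""
    # uuid.UUID() rejects malformed IDs before they reach the index.
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id):
    """URN for a Globus Group."""
    return f"urn:globus:groups:id:{uuid.UUID(group_id)}"


visible_to = [
    identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c"),
    group_urn("00000000-0000-0000-0000-000000000000"),  # placeholder group ID
]
print(visible_to)
```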
Data discovery with Globus Search
Simple query
GET /index/{index_id}/search?q=type%3Ahdf5
{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        { ... }
      ],
      "subject": "https://..."
    }
  ],
  "offset": 0,
  "total": 1
}
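The simple query is an ordinary GET with URL-encoded parameters, so the request URL can be built with the standard library. The index ID here is a placeholder, and the limit and offset defaults are illustrative choices.

```python
from urllib.parse import urlencode

# Base URL of the Globus Search API.
SEARCH_API = "https://search.api.globus.org/v1"


def simple_search_url(index_id, q, limit=10, offset=0):
    """Build the URL for a simple query against one index."""
    # urlencode escapes reserved characters, e.g. ':' becomes %3A.
    query = urlencode({"q": q, "limit": limit, "offset": offset})
    return f"{SEARCH_API}/index/{index_id}/search?{query}"


url = simple_search_url("00000000-0000-0000-0000-000000000000", "type:hdf5")
print(url)
```

For results that are not public, the request would also carry the user's Search bearer token in the Authorization header.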
Data discovery with Globus Search
Complex query
POST /index/{index_id}/search
{
  "filters": [
    {
      "type": "range",
      "field_name": "pubdate",
      "values": [
        {
          "from": "*",
          "to": "2020-12-31"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "pubdate",
      ...
    }
  ]
}
Request document options: Filter, Facets, Boosts, Sort, Limit
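A search request document for the complex query can be composed from small builders, one per option. The sketch below covers filters, facets, sort, and limit. The helper names are hypothetical, and the query string and the yearly date histogram are illustrative choices.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Range filter; '*' leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def date_facet(name, field_name, interval="year"):
    """Facet that buckets a date field by the given interval."""
    return {
        "name": name,
        "type": "date_histogram",
        "field_name": field_name,
        "date_interval": interval,
    }


def make_search_request(q, filters=(), facets=(), sort=(), limit=10, offset=0):
    """Assemble a search request document for POST /index/{index_id}/search."""
    doc = {
        "q": q,
        "filters": list(filters),
        "facets": list(facets),
        "limit": limit,
        "offset": offset,
    }
    if sort:
        doc["sort"] = list(sort)
    return doc


request = make_search_request(
    q="hdf5",
    filters=[range_filter("pubdate", end="2020-12-31")],
    facets=[date_facet("Publication Date", "pubdate")],
    sort=[{"field_name": "pubdate", "order": "desc"}],
)
print(json.dumps(request, indent=2))
```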
Cancer Registry Records for Research (CR3)
• Create network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
– Data owners input/export data sets, apply QC, set access policies
– Registry data remain at the institution where they were generated
– Identities are provided/authenticated by the institution, not Globus
– System scale depends on data owners providing storage resources
CR3 requirements
• Search Index
– Only de-identified data in search index
– No record-level access for researchers
• Portal
– Fine-grained access control
– Researchers must use a specific identity
– Access must be logged
– Render graphs based on search results
– Faceted search in real time
CR3 Architecture
[Figure: a Researcher logs in to the CR3 Discovery Portal with UPMC/Pitt credentials. Authentication is initiated to Globus Auth (GA), which works with the UPMC/Pitt identity providers. Cohort searches are initiated to Globus Search (GS), and cohort aggregate counts are returned. Other labeled components: Elasticsearch; Request Service; the Cancer Registry De-identified Data Index (minimal criteria data, e.g., staging); SEER Registry, Medical Center Registry, and State Registry; Registry Staff, who manage authorization; and Globus Transfer (GT), which mediates data transfer from registrar to researcher.]
CR3 Portal (simulated data)
• Federated logon using Globus Auth with Pitt/UPMC as identity providers
• Dynamically updating charts as facets change
• Variable facets based on source registry index
• Google-like text search with facets for filtering
• Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp)
* PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/
Working with Globus Search
jupyter.demo.globus.org
Ingesting search metadata
github.com/globus/searchable-files-demo
Adding a new search index to your portal
Making Data Discoverable at Scale
Globus Automation Capabilities
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
Globus Timer Service
The Globus Timer service
• Scheduled/recurring file transfers
• Supports all Globus transfer and sync options
• Service with a command line interface
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
Scheduled transfers to data portal endpoint(s)
Globus Timer CLI: pypi.org/project/globus-timer-cli
Globus Command Line Interface (CLI)
• Open source, uses the Python SDK
Globus Flows Service
Managed automation of tasks
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
Extending the ecosystem: Action providers
• Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
[Figure: action providers, some Globus provided and some custom built: Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form]
Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
GET /provider_url/action_id/cancel
GET /provider_url/action_id/status
Create Action Providers
Define and deploy flows
{ "StartAt": "ToProject",
  "States": {
    "ToProject": { … },
    "SetPermission": { … },
    "ProcessData": { … } … }}
Run flows
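Before deploying a flow, it can help to check that the state machine is wired together correctly. The sketch below fills in the three states named in the flow definition with placeholder content and runs a simple consistency check. The ActionUrl values, parameters, and result paths are hypothetical, and the checker only understands linear flows (states with Next or End), not choice or other branching states.

```python
def validate_flow(definition):
    """Return a list of wiring problems in a flow definition (empty if none)."""
    problems = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt refers to unknown state {start!r}")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is None:
            # A state without Next must be marked as terminal.
            if not state.get("End", False):
                problems.append(f"state {name!r} has neither Next nor End")
        elif nxt not in states:
            problems.append(f"state {name!r} points to unknown state {nxt!r}")
    return problems


def action(url, next_state=None):
    """Build a placeholder Action state (URL and result path are hypothetical)."""
    state = {"Type": "Action", "ActionUrl": url, "Parameters": {}, "ResultPath": "$.Result"}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


flow = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": action("https://example.org/transfer", "SetPermission"),
        "SetPermission": action("https://example.org/permission", "ProcessData"),
        "ProcessData": action("https://example.org/compute"),
    },
}
print(validate_flow(flow))
```

A check like this catches misspelled state names locally, before the definition is submitted to the Flows service.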
Working with Globus Flows
Try it: demo.gladier.org/gladier-demo/upload-file
Run flows: app.globus.org/flows/library
Docs: docs.globus.org/globus-automation-services
Adding compute to your science gateway
Coming soon: Globus Trigger service
• Trigger–Action platform
• Predefined triggers and actions to create rules
• Globus processes triggers and reliably executes actions
bit.ly/minisgci-2022
docs.globus.org
github.com/globus
outreach@globus.org
support@globus.org

Mais conteúdo relacionado

Semelhante a Advanced Computing Meets Data FAIRness

Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
Peter Haase
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Lucidworks (Archived)
 
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Brigitte Jörg
 

Semelhante a Advanced Computing Meets Data FAIRness (20)

Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
 
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
 
Echoes Project
Echoes ProjectEchoes Project
Echoes Project
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
DOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using SplunkDOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using Splunk
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
 
Building a modern in-house analytics pipeline
Building a modern in-house analytics pipelineBuilding a modern in-house analytics pipeline
Building a modern in-house analytics pipeline
 
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j GraphDay Seattle- Sept19-  in the enterpriseNeo4j GraphDay Seattle- Sept19-  in the enterprise
Neo4j GraphDay Seattle- Sept19- in the enterprise
 
"Data in Context" IG sessions @ RDA 3rd Plenary
"Data in Context" IG sessions @  RDA 3rd Plenary"Data in Context" IG sessions @  RDA 3rd Plenary
"Data in Context" IG sessions @ RDA 3rd Plenary
 
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
Data in Context Interest Group Sessions @ RDA 3rd Plenary, Dublin (March 26-2...
 
BlueBrain Nexus Technical Introduction
BlueBrain Nexus Technical IntroductionBlueBrain Nexus Technical Introduction
BlueBrain Nexus Technical Introduction
 

Mais de Globus

Mais de Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 

Último

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 

Último (20)

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 

Advanced Computing Meets Data FAIRness

  • 1. Advanced Computing Meets Data FAIRness Building Science Gateways with the Django Globus Portal Framework Vas Vasiliadis – vas@uchicago.edu Lee Liming – lliming@uchicago.edu April 5, 2022
  • 2. Tutorial materials and handy links bit.ly/minisgci-2022
  • 3. Agenda • Introduction and motivation • The Modern Research Data Portal design pattern • Deploying a science gateway using the MRDP • Making data findable with Globus Search • Customizing the science gateway • Making data discoverable at scale • Integrating compute into your science gateway - Hands-on exercise - Live demonstration
  • 6. The brilliance “arms race”... K. Wille, The Physics of Particle Accelerators: An Introduction, Oxford University Press, Oxford, UK (2000); J. B. Parise and G. E. Brown, Jr., Elements, 2, 37-42 (2006)
  • 7. Some challenges… • Increasing data rates, heterogeneity • Continuum of computing resources • Differing workflows across instruments
  • 8. Distribution Store Data Portal Advanced Computing Facility Instrument Facility A common data flow pattern Image Analysis 3 Search/Discovery 5 Science! 6 Imaging 1 Acquisition 2 Description/Identification 4 v
  • 9. Globus services for research data management Unified Data Access Data Transfer and Sharing Platform-as-a-Service Reliable Automation Publication & Discovery Remote Execution (future)
  • 10.
  • 11. The Modern Research Data Portal Design Pattern docs.globus.org/mrdp
  • 12. Why we use portals and science gateways • Different experiments (beamlines, electron microscopes, biology, etc) generate data with different types, size and experimental information • Processing, curation, and cataloguing need to happen as soon as possible so data are not lost • Standardize secure access between users • Work toward FAIR datasets to enable more science
  • 13. Benefits • Make data FAIRer • Track lots of (heterogeneous) data • Facilitate discovery – Free text search in Globus Search – Filtering on specific values – User Friendly GUI • Enforce appropriate access controls – Public/private, group-, subject-level ACLs • Integrate with other (Globus) services • Customize for your research environment
  • 14. MRDP: Key elements Science DMZ Fast, clean data path Data Transfer Nodes Purpose-built data movers Globus Platform Secure, reliable data orchestration Globus Connect Storage system enabler 16 Globus Portal Framework Data discovery and access
  • 15. …makes your storage system a Globus endpoint
  • 16. Globus Connectors support diverse systems
  • 17. What’s wrong with my LRDP?
  • 18. L(egacy)RDP architecture Source: ESnet Science Engagement team
  • 20. An exemplar: The ALCF Data Co-op acdc.alcf.anl.gov
  • 22. Relevant Globus platform capabilities • Data transfer and sharing • Data description (metadata) and discovery • Data (and compute) task orchestration • Authentication and Authorization
  • 23. Brokering Access to Services using Globus Auth
  • 24. Globus Auth: Foundational IAM service Brokers authentication and authorization among… – End-users – Identity providers: enterprise, external (federated identities) – Services: resource servers with REST APIs – Apps: web, mobile, desktop, command line clients – Services acting as clients to other services • OAuth 2.0 Authorization Framework (a.k.a. OAuth2) • OpenID Connect Core 1.0 (a.k.a. OIDC)
  • 25. Several authentication models supported • Application acting as user with consent – Auth flow: Authorization code grant • Application authenticating as itself – Auth flow: Client credentials grant – Application (client) has its own identity → apps are people too! • Application able to manage tokens for offline or long-running tasks – Refresh tokens
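To make the first model concrete, here is a minimal sketch of how a portal might build the redirect URL that starts the authorization code grant. It uses only the Python standard library rather than the Globus SDK or Python Social Auth; the client ID is a placeholder, the helper name `build_authorize_url` is hypothetical, and the redirect URI is the tutorial value from the application registration slide.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Globus Auth authorization endpoint (OAuth2 authorization code grant).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state, offline=False):
    """Return the URL the user's browser is sent to in order to log in and consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are space-separated
        "response_type": "code",    # ask for an authorization code
        "state": state,             # echoed back; protects against CSRF
    }
    if offline:
        # Request refresh tokens for offline or long-running tasks.
        params["access_type"] = "offline"
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


state = secrets.token_urlsafe(16)
url = build_authorize_url(
    client_id="00000000-0000-0000-0000-000000000000",  # placeholder client ID
    redirect_uri="https://tutN.globusdemo.org:8443/complete/globus/",
    scopes=["openid", "profile", "email",
            "urn:globus:auth:scope:search.api.globus.org:search"],
    state=state,
    offline=True,
)
query = parse_qs(urlparse(url).query)
print(query["response_type"], query["access_type"])
```

After the user consents, Globus Auth redirects back to the registered redirect URI with a `code` parameter, which the application exchanges for tokens. That exchange needs the client secret and is not shown here.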
  • 26. Data transfer and sharing • Move data to collection → Submit Transfer task (Transfer service: POST /transfer) • Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access) • Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
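As an illustration of the first two calls, the sketch below assembles the JSON documents that would be posted to the Transfer service. It only builds the request bodies and sends nothing; the helper names are hypothetical, all UUIDs and paths are placeholders, and the field names follow the Transfer API document formats as I understand them, so check them against the Transfer API reference before relying on them.

```python
import json


def transfer_document(submission_id, source, destination, items, label=None):
    """Body for POST /transfer. `items` is a list of (src_path, dst_path, recursive)."""
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,  # obtained beforehand from the Transfer service
        "source_endpoint": source,
        "destination_endpoint": destination,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }
    if label:
        doc["label"] = label
    return doc


def access_rule_document(principal_type, principal, path, permissions="r"):
    """Body for POST /endpoint/{endpoint_id}/access on a guest collection."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,  # e.g. "identity" or "group"
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


# Placeholder IDs and paths, for illustration only.
task = transfer_document(
    submission_id="11111111-1111-1111-1111-111111111111",
    source="22222222-2222-2222-2222-222222222222",
    destination="33333333-3333-3333-3333-333333333333",
    items=[("/instrument/run42/", "/portal/run42/", True)],
    label="run42 to portal",
)
rule = access_rule_document(
    "group", "44444444-4444-4444-4444-444444444444", "/portal/run42/"
)
print(json.dumps(rule, indent=2))
```

Granting access to a group rather than to individual identities means later changes in who can read the data become a Groups membership update, with no change to the access rule itself.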
  • 27. Using guest collections in your data portal • Create a guest collection; requires authentication – Cannot be completely automated – must "log in" – Create once and automate rest of the steps • Grant the application Access Manager role – Allows the application to manage permissions on the collection – Set for application identity: appclientid@clients.auth.globus.org • Grant roles for management of endpoint and tasks
  • 28. Deploying a Simple (but fully functional and extensible) Research Data Portal
  • 29. Evolving the MRDP design pattern • Enabling discoverability: MRDP + faceted search (Globus Search) • Metadata sources: input form, automated extraction, bulk ingest • Ingest metadata, set visibility policies
  • 30. Portal Core Functionality • User authentication • Django-based framework – Portal URL mappings – Token loading • Service calls to Globus Search • Manage request lifecycle • Post process search requests
  • 31. User authentication • Scopes are configured in the portal • Users authenticate with Globus using standard flow – Python Social Auth used for Authentication backend • User tokens are saved in the database • Future requests authorized with user access tokens – Searches use Search bearer token
  • 32. Portal service calls use the Globus SDK • Globus portal framework loads tokens from database • Globus service object instantiated with token • Call to Globus service(s) • Portal renders result in templates
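The token-loading step can be pictured with a small standard-library stand-in. The real portal uses the Globus SDK and its own database models; here the `TOKENS` dictionary and the `authorized_request` helper are hypothetical substitutes for the token table and the SDK service object, and the index ID is a placeholder.

```python
from urllib.request import Request

# Hypothetical stand-in for the portal's token table:
# one access token per Globus resource server, saved at login.
TOKENS = {
    "search.api.globus.org": "user-search-token",
    "transfer.api.globus.org": "user-transfer-token",
}


def authorized_request(url, resource_server, tokens):
    """Build (but do not send) a request carrying the user's bearer token."""
    token = tokens.get(resource_server)
    if token is None:
        raise LookupError(f"no token stored for {resource_server}")
    return Request(url, headers={"Authorization": f"Bearer {token}"})


req = authorized_request(
    "https://search.api.globus.org/v1/index/INDEX_ID/search?q=hdf5",  # placeholder index
    "search.api.globus.org",
    TOKENS,
)
print(req.get_header("Authorization"))
```

Keeping one token per resource server mirrors how scopes are consented: a token issued for Search cannot be used against Transfer, so the lookup key matters.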
  • 33. Globus Portal Framework URLs • URLs span three categories – Index Selection – Index Search page – Search Subject detail page • Supports multiple Globus Search indices • Search page links to multiple result subjects • Each subject has a unique URL
  • 34. Format of a URL
  • 35. An index is configuration driven • A Search index is configured in portal settings • Add Globus Search index UUID • Add a name • Add facets • Add fields • Start searching!
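A settings fragment along these lines is roughly what the steps above amount to. Treat it as a sketch: the `SEARCH_INDEXES` layout reflects my understanding of the framework's settings, the index slug, UUID, facet and field names are all placeholders, and `check_index_config` is a hypothetical helper that is not part of the framework. The framework documentation is the authority on the exact keys.

```python
import uuid

# Assumed layout of the portal's index settings; all values are placeholders.
SEARCH_INDEXES = {
    "my-index": {
        "name": "My Science Data",
        "uuid": "00000000-0000-0000-0000-000000000000",
        "facets": [
            {"name": "File Type", "field_name": "type"},
            {"name": "Publication Date", "field_name": "pubdate"},
        ],
        "fields": ["title", "type", "pubdate"],
    }
}


def check_index_config(indexes):
    """Hypothetical sanity check: return the slugs of well-formed index entries."""
    checked = []
    for slug, cfg in indexes.items():
        missing = {"name", "uuid"} - cfg.keys()
        if missing:
            raise ValueError(f"{slug}: missing keys {sorted(missing)}")
        uuid.UUID(cfg["uuid"])  # raises ValueError if the UUID is malformed
        for facet in cfg.get("facets", []):
            if "field_name" not in facet:
                raise ValueError(f"{slug}: facet without field_name")
        checked.append(slug)
    return checked


print(check_index_config(SEARCH_INDEXES))
```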
  • 36. Lifecycle of a request • User makes a query • Portal sends request to Globus Search – Request contains user bearer token • Portal receives response • Portal does processing on response – Parse Dates, build URL for Globus webapp, etc. • Portal renders data into templates • User receives a search page
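The post-processing step might look like the following sketch, which parses a date field and builds a link into the Globus web app for each result. The content field names (`pubdate`, `endpoint`, `path`) are assumptions about the metadata schema rather than anything the framework requires, and `post_process` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode


def webapp_url(endpoint_id, path):
    """Link to the Globus web app file manager for a collection and path."""
    query = urlencode({"origin_id": endpoint_id, "origin_path": path})
    return f"https://app.globus.org/file-manager?{query}"


def post_process(search_response):
    """Flatten a search response into rows that a template can render."""
    rows = []
    for result in search_response.get("gmeta", []):
        for entry in result.get("entries", []):
            content = entry.get("content", {})
            raw_date = content.get("pubdate")  # assumed ISO 8601 date string
            has_location = "endpoint" in content and "path" in content
            rows.append({
                "subject": result["subject"],
                "date": date.fromisoformat(raw_date) if raw_date else None,
                "link": webapp_url(content["endpoint"], content["path"])
                if has_location else None,
            })
    return rows


# Made-up response, shaped like a Globus Search result.
sample = {
    "gmeta": [{
        "subject": "https://example.org/abc.h5",
        "entries": [{"content": {
            "pubdate": "2020-12-31",
            "endpoint": "22222222-2222-2222-2222-222222222222",
            "path": "/data/run1/",
        }}],
    }]
}
rows = post_process(sample)
print(rows[0]["date"], rows[0]["link"])
```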
  • 37. Creating your science gateway using the Globus portal framework bit.ly/minisgci-2022 Source: github.com/globus/django-globus-portal-framework Docs: django-globus-portal-framework.readthedocs.io/en/stable/
  • 38. Step 0: Application registration (developers.globus.org) • Set redirect URLs • Get client ID and secret • Consents implement the principle of least privilege • Redirect URLs: https://tutN.globusdemo.org:8443/ and https://tutN.globusdemo.org:8443/complete/globus/
  • 39. Accessing your VM • Host: tutN.globusdemo.org • Login user: devN • Password: Globus_2022#
  • 40. Portal deployment • Install dependent libraries – For production use, add robust WSGI/ASGI server • Deploy a portal instance using cookiecutter • Configure settings • Run and use! • Future: containers
  • 41. Making Data Findable with Globus Search
  • 42. Data description and discovery • Metadata store with fine-grained visibility controls • Schema agnostic → dynamic schemas • Simple search using URL query parameters • Complex search using search request document • docs.globus.org/api/search
  • 43. Distinct access policies may be applied to Data and Metadata • Data: (ideally) using permissions on guest collections • Metadata: using permissions on metadata elements
  • 44. Data ingest with Globus Search POST /index/{index_id}/ingest { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "filetype", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["public"], "content": { "metadata-schema/file#type": "file" } }, ... ] } } • Bulk create and update • Task model for ingest at scale
  • 45. Data ingest with Globus Search POST /index/{index_id}/ingest { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "weight", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"], "content": { "metadata-schema/file#size": "37.6", "metadata-schema/file#size_human": "<50lb" } }, ... ] } } • Visibility limited to Globus Auth identity: single user, Globus Group, or registered client application
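The two ingest documents above share one shape, so a portal would typically generate them. A minimal sketch follows; the helper names are hypothetical, the values are taken from the slides, and nothing is sent to the service.

```python
import json


def identity_urn(identity_id):
    """Principal URN restricting visibility to one Globus Auth identity."""
    return f"urn:globus:auth:identity:{identity_id}"


def gmeta_entry(subject, content, visible_to=("public",), entry_id=None):
    """One metadata entry about `subject`, visible to the listed principals."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def ingest_document(entries):
    """Body for POST /index/{index_id}/ingest, carrying many entries at once."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


subject = "https://search.api.globus.org/abc.txt"
doc = ingest_document([
    # Public entry: anyone can discover the file type.
    gmeta_entry(subject, {"metadata-schema/file#type": "file"},
                entry_id="filetype"),
    # Restricted entry: only one identity can see the size.
    gmeta_entry(subject, {"metadata-schema/file#size": "37.6"},
                visible_to=[identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")],
                entry_id="weight"),
])
print(json.dumps(doc, indent=2))
```

Both entries describe the same subject but carry different `visible_to` lists, which is how one record can expose some fields publicly while keeping others restricted.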
  • 46. Data discovery with Globus Search • Simple query: GET /index/{index_id}/search?q=type%3Ahdf5 • Response: { "@datatype": "GSearchResult", "@version": "2017-09-01", "count": 1, "gmeta": [ { "@datatype": "GMetaResult", "@version": "2019-08-27", "entries": [ { ... } ], "subject": "https://..." } ], "offset": 0, "total": 1 }
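The `q=type%3Ahdf5` in the simple query is just the percent-encoded form of `type:hdf5`. The sketch below shows that encoding and how the `count`, `offset`, and `total` fields of the response can drive paging. The helper names are hypothetical, and the `offset` and `limit` query parameters are included on the assumption that the simple query accepts them.

```python
from urllib.parse import urlencode


def simple_search_path(index_id, q, offset=0, limit=10):
    """Path and query string for a simple (GET) search."""
    query = urlencode({"q": q, "offset": offset, "limit": limit})
    return f"/index/{index_id}/search?{query}"


def next_offset(response):
    """Offset of the next page, or None when all results have been seen."""
    seen = response["offset"] + response["count"]
    return seen if seen < response["total"] else None


path = simple_search_path("INDEX_ID", "type:hdf5")  # placeholder index ID
print(path)

# Paging over a made-up result set of 25 matches, 10 per page.
first_page = {"count": 10, "offset": 0, "total": 25}
last_page = {"count": 5, "offset": 20, "total": 25}
print(next_offset(first_page), next_offset(last_page))
```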
  • 47. Data discovery with Globus Search • Complex query: POST /index/{index_id}/search { "filters": [ { "type": "range", "field_name": "pubdate", "values": [ { "from": "*", "to": "2020-12-31" } ] } ], "facets": [ { "name": "Publication Date", "field_name": "pubdate", ... } ] } • Request document options: Filter, Facets, Boosts, Sort, Limit
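A complex query is easier to maintain when built from small helpers. In this sketch the range filter matches the slide; the facet is a `terms` facet on a hypothetical `type` field, chosen because the slide elides the facet's remaining fields. All helper names are hypothetical and the facet shape should be checked against the Search API reference.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Keep results whose field lies in [start, end]; "*" means unbounded."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def terms_facet(name, field_name, size=10):
    """Count results per distinct value of a field (assumed facet shape)."""
    return {"name": name, "type": "terms", "field_name": field_name, "size": size}


def search_document(q, filters=(), facets=(), limit=10, offset=0):
    """Body for POST /index/{index_id}/search."""
    return {
        "q": q,
        "filters": list(filters),
        "facets": list(facets),
        "limit": limit,
        "offset": offset,
    }


request_body = search_document(
    "hdf5",
    filters=[range_filter("pubdate", end="2020-12-31")],
    facets=[terms_facet("File Type", "type")],
)
print(json.dumps(request_body, indent=2))
```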
  • 48. Cancer Registry Records for Research (CR3) • Create network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries • Federation via Globus: network scale ↔ local control – Data owners input/export data sets, apply QC, set access policies – Registry data remain at the institution where they were generated – Identities are provided/authenticated by the institution, not Globus – System scale depends on data owners providing storage resources
  • 49. CR3 requirements • Search Index – Only de-identified data in search index – No record-level access for researchers • Portal – Fine-grained access control – Researchers must use a specific identity – Access must be logged – Render graphs based on search results – Faceted search in real time
  • 50. CR3 Architecture • Components: CR3 Discovery Portal, Globus Auth (GA), Globus Search (GS), Globus Transfer (GT), UPMC/Pitt Identity Providers, Elasticsearch, Request Service, Cancer Registry, De-identified Data Index (minimal criteria data: e.g., staging) • Researcher: login with UPMC/Pitt credentials; auth initiated to GA; cohort search initiated to GS; cohort aggregate counts returned • Registry Staff: manage authorization; data transfer from registrar to researcher mediated by GT
  • 51. CR3 Portal (simulated data) • Source registries: SEER Registry, Medical Center Registry, State Registry • Federated logon using Globus Auth with Pitt/UPMC as identity providers • Dynamically updating charts as facets change • Variable facets based on source registry index • Google-like text search with facets for filtering • Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp) * PeerJ article cs-144: https://peerj.com/articles/cs-144/
  • 54. Adding a new search index to your portal
  • 56. Globus Automation Capabilities • Timer Service: scheduled and recurring transfers (a.k.a. Globus cron) • Command Line Interface: ad hoc scripting and integration • Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
  • 58. The Globus Timer service • Scheduled/recurring file transfers • Supports all Globus transfer and sync options • Service with a command line interface • Example: NIH – hpc.nih.gov/storage/globus_cron.html
  • 59. Scheduled transfers to data portal endpoint(s) • Globus Timer CLI: pypi.org/project/globus-timer-cli
  • 61. Globus Command Line Interface Open source, uses the Python SDK
  • 63. Managed automation of tasks • Flows: A platform service for defining, applying, and sharing distributed research automation flows • Flows comprise Actions • Action Providers: Called by Flows to perform tasks • Triggers*: Start flows based on events * In development
  • 64. Automation with Globus Flows • Built on AWS Step Functions – Simple JSON-based state machine language – Conditions, loops, fault tolerance, etc. – Propagates state through the flow • Standardized API for integrating custom event and action services – Actions: synchronous or asynchronous – Custom Web forms prompt for user input • Actions secured with Globus Auth
  • 65. Extending the ecosystem: Action providers • Action Provider is a service endpoint – Run – Status – Cancel – Release – Resume • Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest • Action providers shown (Globus provided and custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
  • 66. Automation services ecosystem • Create Action Providers: GET /provider_url/ POST /provider_url/run GET /provider_url/action_id/status GET /provider_url/action_id/cancel GET /provider_url/action_id/status • Define and deploy flows: { "StartAt": "ToProject", "States": { "ToProject": { … }, "SetPermission": { … }, "ProcessData": { … } … } } • Run flows
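The flow definition on this slide is a state machine: `StartAt` names the first state and each state either points to the next one or ends the flow. The sketch below fills in the elided states with assumed content to show that structure. The `Type`, `ActionUrl`, `Next`, and `End` keys follow the Step Functions style language the Flows service builds on; the processing URL is a placeholder, the parameters are left empty, and `execution_order` is a hypothetical helper, not part of any Globus library.

```python
# Illustrative flow definition; state bodies are assumptions, not the slide's content.
FLOW = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {},  # transfer parameters omitted
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/set_permission",
            "Parameters": {},  # access rule parameters omitted
            "Next": "ProcessData",
        },
        "ProcessData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/process",  # placeholder provider URL
            "Parameters": {},
            "End": True,
        },
    },
}


def execution_order(definition):
    """Walk a linear flow from StartAt to the End state, checking each link."""
    states = definition["States"]
    name = definition["StartAt"]
    order = []
    while True:
        if name not in states:
            raise KeyError(f"unknown state: {name}")
        if name in order:
            raise ValueError(f"cycle detected at state: {name}")
        order.append(name)
        state = states[name]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state {name} has neither Next nor End")
        name = state["Next"]


print(execution_order(FLOW))
```

This walker handles only straight-line flows. Real flows can also branch on conditions and loop, which would require following every outgoing transition of a state.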
  • 67. Working with Globus Flows • Try it: demo.gladier.org/gladier-demo/upload-file • Run flows: app.globus.org/flows/library • Docs: docs.globus.org/globus-automation-services
  • 69. Coming soon: Globus Trigger service • Trigger–Action platform • Predefined triggers and actions to create rules • Globus processes triggers and reliably executes actions