2. Tapis Project
● 5-year, NSF-funded computing framework supporting multi-site computational
research
● Used to manage data and execute code on HPC, HTC and cloud systems
(>51K researcher accounts, 23 tenants and 15 gateways in 2021-2022)
● Agentless, SSH-based communication with storage/compute systems
● Implemented as microservices with REST interfaces
● Users obtain a token by authenticating to Tapis using OAuth2
○ Subsequent API calls are authenticated using the token
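The token flow above can be sketched with the Python standard library. This is a minimal illustration, not the project's client code: the base URL, the `/v3/oauth2/tokens` path, the password-grant body, the `X-Tapis-Token` header and the response envelope are assumptions drawn from the public Tapis v3 documentation, and the helper names are hypothetical. The functions only build requests; nothing is sent.

```python
import json
import urllib.request

# Assumed tenant base URL; replace with the URL of your own tenant.
BASE_URL = "https://tacc.tapis.io"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a password-grant token request."""
    body = json.dumps(
        {"username": username, "password": password, "grant_type": "password"}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v3/oauth2/tokens",  # assumed token endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Pull the token string out of the assumed response envelope."""
    payload = json.loads(response_body)
    return payload["result"]["access_token"]["access_token"]


def build_authenticated_request(path: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries the token in a request header."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-Tapis-Token": access_token},  # assumed header name
        method="GET",
    )
```

To actually call the service, each request object would be passed to `urllib.request.urlopen`, and the body of the first response would be fed to `extract_access_token`.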
https://tapis-project.org
3. Tapis Higher-Level Objectives
Tapis Gives Researchers The Ability to...
● Track your analysis provenance - Tapis records your input and output data along
with application used and settings - so you know what you have done every time.
● Reproduce your analysis - Tapis records all your inputs/outputs/parameters etc so
you can re-run an analysis.
● Share your data, workflows/applications, computational resources with
collaborators or your lab - Tapis enables sharing with access controls for all your
data/resources/applications within Tapis.
Without having to install or support a complicated stack of technology
4. Who Is Using Tapis?
Science Gateways
● CyVerse
● DesignSafe
● VDJServer
● Synergistic Discovery and Design (SD2)
● 3D Electron Microscopy (3DEM) Platform
● iMicrobe
● ʻIke Wai
Labs/Projects
● Planet Texas 2050
● Hawaii Data Science Institute
● iReceptor+
● C-MAIKI
● ECCO
● GenApp
● Acute to Chronic Pain Signatures (A2CPS)
Institutions
● TACC
● CDC
● UH
● NIH
● Compute Canada
Additional collaborations starting
soon...
7. Tapis Services
Tenancy, Authentication and Security
● Tenants
● Sites
● Tokens
● Authenticator
● Security Kernel
● *Postits
Data Management and Code Execution
● Systems
● Files
● Apps
● Jobs
Streaming Data, Events and Functions
● Functions (Actors)
● *Notifications
● Streams
Metadata Management
● Meta
● PgREST
https://tapis-project.github.io/live-docs
8. Data Management and Code Execution APIs
● /systems: Register storage and compute systems
○ Systems have a type, such as Linux, s3, iRODS, etc.
● /files: Ingest, move and transform data files and folders
● /apps: Register application containers on large systems
● /jobs: Launch jobs to invoke applications and capture metadata about the workflow
[Figure: target systems labeled HPC, HTC and Cloud]
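How the four resources relate can be sketched as a small data model. This is an illustration only: the class and field names are hypothetical and are not the Tapis schema. It shows a job tying a registered application to an execution system and keeping a record of its inputs and settings.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """A registered storage or compute system (hypothetical fields)."""
    system_id: str
    system_type: str  # for example "LINUX", "S3" or "IRODS"
    host: str
    can_exec: bool = False  # storage-only systems cannot run jobs


@dataclass
class App:
    """A registered containerized application (hypothetical fields)."""
    app_id: str
    version: str
    container_image: str


@dataclass
class Job:
    """One invocation of an app on an execution system."""
    name: str
    app: App
    exec_system: System
    file_inputs: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject jobs that target a system not enabled for execution.
        if not self.exec_system.can_exec:
            raise ValueError(f"{self.exec_system.system_id} cannot execute jobs")

    def provenance_record(self) -> dict:
        """Summarize what ran, where, and with which inputs and settings."""
        return {
            "job": self.name,
            "app": f"{self.app.app_id}-{self.app.version}",
            "image": self.app.container_image,
            "system": self.exec_system.system_id,
            "inputs": list(self.file_inputs),
            "parameters": dict(self.parameters),
        }
```

Keeping the app version and container image in the record is what makes a later re-run of the same analysis possible.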
9. Globus Integration: Motivation
Globus supports a massive community:
● 20,000+ Globus endpoints; 200K+ users
● Reliable, high-performance transfer
● Support for many storage protocols via connectors: cloud APIs, archival tape
systems.
Many Tapis users are already using Globus, but currently this requires out-of-band
actions that Tapis is unaware of, causing issues:
● Data provenance and history get broken
● Staging data as part of a job cannot be done through Globus with Tapis
10. Globus Integration: Design
Design:
● New Tapis systemType “Globus” (existing types: “s3”, “IRODS”, …)
● New Tapis endpoints that walk the user through the Globus OAuth flow
○ Tapis obtains and manages access and refresh tokens
● New Tapis-Globus Proxy
○ Lightweight service that translates Tapis data transfer requests to Globus API requests
○ Written in Python to take advantage of the Globus Python SDK
● Modify Tapis data transfer agent to utilize Tapis-Globus Proxy
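The translation step performed by the proxy might look like the sketch below. It is not the proxy's actual code. The Tapis request shape (`elements` with `sourceURI` and `destinationURI`) and the `tapis://system/path` URI form are assumptions, and the mapping from Tapis system IDs to Globus endpoint UUIDs is hypothetical. The output follows the Globus Transfer API task document layout; submitting it (for instance through the Globus Python SDK) is left out.

```python
from urllib.parse import urlparse


def split_tapis_uri(uri: str) -> tuple:
    """Split an assumed tapis://<system-id>/<path> URI into (system_id, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "tapis" or not parsed.netloc:
        raise ValueError(f"not a tapis:// URI: {uri}")
    return parsed.netloc, parsed.path or "/"


def to_globus_transfer(tapis_request: dict, endpoint_ids: dict, submission_id: str) -> dict:
    """Translate a Tapis-style transfer request into a Globus transfer document.

    endpoint_ids is a hypothetical map of Tapis system ID -> Globus endpoint UUID.
    """
    items = []
    endpoint_pairs = set()
    for element in tapis_request["elements"]:
        src_system, src_path = split_tapis_uri(element["sourceURI"])
        dst_system, dst_path = split_tapis_uri(element["destinationURI"])
        endpoint_pairs.add((src_system, dst_system))
        items.append(
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src_path,
                "destination_path": dst_path,
                # Treat a trailing slash as a directory transfer (an assumption).
                "recursive": src_path.endswith("/"),
            }
        )
    # A single Globus transfer task has exactly one source and one destination.
    if len(endpoint_pairs) != 1:
        raise ValueError("all elements must share one source and one destination system")
    src_system, dst_system = endpoint_pairs.pop()
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": endpoint_ids[src_system],
        "destination_endpoint": endpoint_ids[dst_system],
        "DATA": items,
    }
```

A request whose elements span several system pairs would have to be split into one Globus task per pair before submission.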
11. Future Work
● OAuth SSH
○ Short-lived tokens, obtained via OAuth flows, that can be used to SSH to systems at TACC
and perhaps other HPC centers in the future.
○ Both Tapis and Globus would be interested in this.
○ Collaboration with TACC Security and HPC groups.
● Support for JWT in Globus Auth Tokens for OAuth SSH
○ Would allow tokens to be validated without an extra API call
○ Modify OAuth SSH to allow configurable policies (JWT and MFA lifetimes, restricted identity
domains, etc.)
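Why a JWT removes the extra API call can be shown with a small local check: the signature and expiry are verified from the token itself, with no request to the issuer. This is a sketch under stated assumptions, not the planned Globus Auth or OAuth SSH design. It uses HS256 with a shared key only because that needs nothing beyond the standard library; a real deployment would more likely verify an asymmetric signature against the issuer's public key. All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def sign_demo_token(claims: dict, key: bytes) -> str:
    """Issue an HS256 JWT for demonstration purposes only."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def validate_locally(token: str, key: bytes, now: float = None) -> dict:
    """Verify signature and expiry without contacting the token issuer."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

Policy settings such as a maximum token lifetime or a restricted identity domain would be additional checks on the returned claims.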
12. Thank You
Links
GitHub: https://github.com/tapis-project/
Reference: https://tapis.readthedocs.io
OpenAPI “live” docs: https://tapis-project.github.io/live-docs/
* Funding
The Tapis Framework: NSF Office of Advanced CyberInfrastructure #1931439 and #1931575
Contact
Joe Stubbs (jstubbs@tacc.utexas.edu)