Shaping the Future: To Globus Compute and Beyond!


  1. Shaping the Future: To Globus Compute and Beyond!
     Kyle Chard, UChicago/ANL
     Ben Galewsky, NCSA/UIUC
     GlobusWorld - May 10, 2022
  2. Purpose of this session
     • Introduce funcX and solicit feedback for our roadmap
     • Identify missing features necessary for adoption (as a user or as an administrator deploying funcX)
  3. Managed (remote) computation
     • Globus has transformed the way that researchers manage data
     • We are taking these lessons and applying them to computation
     • Imagine if running a task on a remote computer were as easy as transferring a file with Globus
  4. Why do we need managed computation?
     • Remote computing is notoriously complicated
       – Authentication
       – Network connections
       – Configuring/managing jobs
       – Interacting with resources (waiting in queues, scaling nodes)
       – Configuring execution environment
       – Getting results back again
     • And we have to overcome the same obstacles each time we move to a new resource
  5. Functions are a natural unit of computation
     • HPC uses jobs, clouds use instances, …
     • Researchers primarily work in high-level languages
     • Our aim: move closer to researchers’ environments
       – Allow researchers to work in a familiar language (Python) from familiar interfaces (e.g., Jupyter) with familiar environments
  6. Function as a Service (FaaS)
     Developers work in Python functions
     1. Register Python function code
     2. Run (and scale) on remote resources
     Low latency, on-demand, elastic scaling, easy to deploy and update
     Example function shown on the slide:
         def compute(input_args):
             # do something
             return results
     (Diagram labels: Nodes, Security, Env, Containers, Modules, Data)
  7. The funcX model: Globus for compute
     • funcX service
       – Highly available cloud-hosted service provides managed fire-and-forget function execution
     • funcX endpoints
       – Abstract access to resources (edge to supercomputer), per-user authentication
       – funcX has only personal endpoints (single agent for a user)
     • SDK
       – Python interface for interacting with funcX, familiar Globus look and feel
     • Security
       – Leverage the Globus model: funcX endpoints are resource servers, users authenticate and access via tokens
  8. Making a computer accessible to funcX
     Pip-installable single-user endpoint
     • Globus Auth for registration
     Parallel execution using local fork or via common schedulers
     • Slurm, PBS, LSF, Cobalt, K8s
  9. Executing functions with funcX
     Users invoke functions as tasks
     • Register Python function body
     • Pass input arguments
     • Select endpoint(s)
  10. Executing functions with funcX
     Users invoke functions as tasks
     • Register Python function body
     • Pass input arguments
     • Select endpoint(s)
     funcX stores tasks in the cloud
  11. Executing functions with funcX
     Users invoke functions as tasks
     • Register Python function body
     • Pass input arguments
     • Select endpoint(s)
     funcX stores tasks in the cloud
     Endpoints fetch waiting tasks (when online), run the task, and return the results (or errors)
     Users retrieve results
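
The register / submit / fetch / retrieve flow described on slides 9 to 11 can be sketched with a small in-memory model. This is not the funcX SDK: `ToyService`, `ToyEndpoint`, and every method name below are hypothetical stand-ins chosen for illustration, and everything runs in one process rather than across a cloud service and a remote endpoint.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Task:
    function_id: str
    endpoint_id: str
    args: tuple
    status: str = "waiting"  # becomes "done" or "failed" once an endpoint runs it
    result: Any = None
    error: Optional[str] = None


class ToyService:
    """Hypothetical in-memory stand-in for the cloud-hosted service."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[..., Any]] = {}
        self.tasks: Dict[str, Task] = {}

    def register_function(self, fn: Callable[..., Any]) -> str:
        function_id = str(uuid.uuid4())
        self.functions[function_id] = fn
        return function_id

    def submit(self, function_id: str, endpoint_id: str, *args: Any) -> str:
        # The task is stored and waits until the chosen endpoint fetches it.
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = Task(function_id, endpoint_id, args)
        return task_id

    def waiting_tasks(self, endpoint_id: str) -> List[str]:
        return [
            task_id
            for task_id, task in self.tasks.items()
            if task.endpoint_id == endpoint_id and task.status == "waiting"
        ]

    def get_result(self, task_id: str) -> Any:
        task = self.tasks[task_id]
        if task.status == "waiting":
            raise RuntimeError("task has not been run yet")
        if task.status == "failed":
            raise RuntimeError(task.error)
        return task.result


class ToyEndpoint:
    """Hypothetical endpoint: fetches its waiting tasks and runs them locally."""

    def __init__(self, service: ToyService, endpoint_id: str) -> None:
        self.service = service
        self.endpoint_id = endpoint_id

    def poll_once(self) -> int:
        ran = 0
        for task_id in self.service.waiting_tasks(self.endpoint_id):
            task = self.service.tasks[task_id]
            fn = self.service.functions[task.function_id]
            try:
                task.result = fn(*task.args)
                task.status = "done"
            except Exception as exc:  # errors are returned to the user, not lost
                task.error = repr(exc)
                task.status = "failed"
            ran += 1
        return ran


def double(x: int) -> int:
    return 2 * x


service = ToyService()
fid = service.register_function(double)       # register the function body
task_id = service.submit(fid, "laptop", 21)   # pass arguments, select an endpoint
endpoint = ToyEndpoint(service, "laptop")
endpoint.poll_once()                          # endpoint comes online and runs the task
print(service.get_result(task_id))            # prints 42
```

Because the task sits in the service until the endpoint polls, the submitting side never needs a live connection to the compute resource, which is the point of the fire-and-forget design.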
  12. Demonstration
  13. How funcX is being used
  14. funcX adoption is growing rapidly
     • 231,000 registered functions
     • 17.2 million function invocations
     • 3,683 registered endpoints
     • 335 users
     • 121 s average function runtime
  15. Use Case: Fitting-as-a-Service
     • Physics at the Large Hadron Collider
       – Search for new physics
       – Make precision measurements
       – Provide constraints on models
     • All require building statistical models and fitting models to data to perform statistical inference
     • Model complexity can be huge
     • Time to fit can be many hours
     Courtesy Matthew Feickert, University of Illinois at Urbana-Champaign
  16. Use Case: Fitting-as-a-Service
     pyhf: pure-Python HEP statistical models
     • Pure Python implementation of ubiquitous high energy physics (HEP) statistical model specification for multi-bin histogram-based analysis
     • Supports multiple computational backends and optimizers
     • JAX, TensorFlow, and PyTorch backends can take advantage of hardware acceleration and automatic differentiation
     • Possible to outperform traditional C++ implementations that are default in HEP
  17. Use Case: Fitting-as-a-Service
     Challenges
     • Need to run hundreds of fits on statistical combinations or perform large dimensional scans
     • Takes hours on user workstations
     • Science benefits from rapid time-to-insight
     • Difficult to configure Python jobs on DOE systems available to researchers
     • Difficult to configure jobs that use specialized processors
     • Orchestrating and then assembling results is complicated
  18. Use Case: Fitting-as-a-Service
     funcX solution
     • Fitting as a service
     • Configure endpoints on
       – Chicago River K8s cluster
       – Blue Waters
       – SDSC Expanse GPU cluster
     • Secure endpoint with Globus Group
     • Simple user code to create workspace and request fits
  19. Use Case: Fitting-as-a-Service
     Scaling of Statistical Inference
     • Fitting all 125 models from pyhf pallet for published ATLAS SUSY 1Lbb analysis
     • Using University of Chicago River cluster:
       – 2 minutes 30 seconds
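
The fan-out pattern behind this use case (submit many independent fits, then assemble the results as they finish) can be sketched locally. The example below uses Python's standard `concurrent.futures` thread pool as a stand-in for a remote endpoint; `fit_model` is a hypothetical placeholder that performs no real fit, and nothing here reflects the actual pyhf or funcX code used in the analysis.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict


def fit_model(model_id: int) -> dict:
    """Hypothetical placeholder for one statistical fit; returns a dummy record."""
    return {"model": model_id, "converged": True}


def run_all_fits(n_models: int, max_workers: int = 8) -> Dict[int, dict]:
    """Submit every fit up front, then collect results in completion order."""
    results: Dict[int, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fit_model, m): m for m in range(n_models)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


fits = run_all_fits(125)
print(len(fits))  # prints 125
```

Keying the results by model id means the order in which fits complete does not matter when the results are assembled.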
  20. Use Case: Inverse Spectroscopy
     Courtesy Eric Jonas, University of Chicago
  21. Use Case: Inverse Spectroscopy
     • Typical run involves 100,000 tasks
     • Average of 40 core-hours per task
     • Would take 7 years on a modern workstation
     • Able to complete analysis in one month at TACC
     • Fire and forget: Launch 100,000 tasks
     “funcX lets us all spend more time on science and less on infrastructure!” Eric Jonas
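
The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes perfect parallel efficiency, a 365-day year, and a 30-day month; the core counts it derives are implied by the slide's numbers under those assumptions and are not stated in the source.

```python
TASKS = 100_000
CORE_HOURS_PER_TASK = 40

# Total work in the run: 100,000 tasks x 40 core-hours each.
total_core_hours = TASKS * CORE_HOURS_PER_TASK

HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

# Cores kept busy continuously to finish in 7 years versus in one month.
workstation_cores = total_core_hours / (7 * HOURS_PER_YEAR)
cluster_cores = total_core_hours / HOURS_PER_MONTH

print(total_core_hours)          # prints 4000000
print(round(workstation_cores))  # prints 65
print(round(cluster_cores))      # prints 5556
```

Under these assumptions, the run amounts to 4 million core-hours, and the one-month turnaround corresponds to roughly 5,500 cores kept busy for the whole month.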
  22. Use Case: Research Automation
     • APS experiments process samples with bright, high-energy x-rays
       – XPCS: studying materials dynamics
       – SSX: solving crystal structures
       – HEDM: studying microstructure evolution
     • Automation allows researchers to process samples faster
     • Most flows require computation
       – Quality control, reconstruction, analysis, machine learning training, transformation, inference, plotting, visualization, metadata extraction, aggregation
     Linking Scientific Instruments and HPC: Patterns, Technologies, Experiences https://arxiv.org/abs/2204.05128
  23. funcX action provider enables seamless integration in flows
     • Globus Flows can invoke arbitrary functions via the funcX action provider
     • Functions may be executed in various locations: at the beamline, local server, cluster, cloud
     (Figure: HEDM)
  24. Common funcX use cases
     • Easily scale from laptop to cluster to cloud to supercomputer
     • Seamlessly move between allocations on different systems
     • Drive compute from a laptop (e.g., via Jupyter)
     • Execute a large batch of tasks and retrieve results at some much later stage
     • Gateways, community accounts via sharing of endpoints and functions
     • Part of automated flows (often to perform actions for which there is no action provider)
  25. Help design and steer the funcX project.
     Join us on the Miro Whiteboard https://bit.ly/gw22-funcx
  26. Questions for the community: users
     • What use cases do you think funcX will be useful for?
     • What barriers do you see to adoption?
     • Do you have use cases for sharing endpoints and/or functions? (e.g., science gateway applications)
     • Imagine a world in which all computing resources had a funcX endpoint: what new use cases would be enabled?
  27. Questions for the community: administrators
     • If we had a multi-tenant (GCS-like) endpoint
       – What questions would you want to ask before deploying?
       – If we adopted the same layered security model as GCS (map user to local account), would this work for you?
       – Where would you consider deploying it? (e.g., login nodes?)
       – Imagine a world in which funcX was the primary way that users interacted with their allocations: what opportunities do you see for your center?
  28. https://funcx.org
     https://funcx.org/binder
     CSSI Frameworks: funcX: A Function Execution Service for Portability and Performance
     NSF 2004894/2004932
