Introduction to the
HACC Simulation Data Portal
Globus World 2019; Chicago, May 1, 2019
Katrin Heitmann (Argonne National Laboratory)
Based on: arXiv:1904.11966
Introduction
• In cosmology we study the origin, evolution, and make-up of the Universe
• Many unsolved questions:
  ○ What is the nature of dark energy and dark matter, making up 95% of the energy-matter budget of our Universe?
  ○ What is the mass of the lightest particle in the Universe, the neutrino?
  ○ How can we learn more about the very first moments of the Universe?
• Upcoming cosmological surveys try to answer these questions and rely on detailed, complex simulations
  ○ Simulations are carried out and analyzed on the largest supercomputers available world-wide
  ○ Cosmological simulations generate large amounts of data (PBs) to capture the evolution of the Universe faithfully
  ○ Given the resources required for these simulations, it is crucial to share them with the community to enable the best possible science outcome
[Image credits: HACC/Galacticus/GalSim; Hubble Ultra Deep Field, NASA]
What is needed ...
A large-scale effort that provides easy access to a range of simulation products to the world’s cosmologists as well as analysis capabilities to established survey collaborations
[Architecture diagram: Public access to cosmological data and computational support for collaborations]
• Storage: O(50 PB total)
• Simulation (HPC allocations, e.g., INCITE, ALCC) and Analysis on Cooley and Theta (10 PF)
• Simulation job description and analysis job description, passed through a job submission/adaptation layer
• Datasets, Globus Online, Petrel: O(1 PB, 100 TB to start)
• Access to Petrel: Portal, Globus
• Phoenix: ALCF-hosted, collaboration-controlled resources on physical/virtual machine(s)
• Collaboration-installed web/data interfaces: LSST DM Butler, Jupyter, PDACS (Galaxy), DESCQA, Visualization, Databases, Globus, Workflows
• User community via web and community-specific clients
In collaboration with Tom Uram, Mike Papka, Ian Foster
Note on HPC storage: temporary storage, expires with allocation; only collaborators on the project have direct access
What exists ...
• Petrel and Phoenix
• Simulations
• First version of web portal using Globus
• Petrel: Data Management and Sharing Pilot, hosted at Argonne
• 1.7 PB parallel filesystem
• Embedded in Argonne’s 100+ Gbps network fabric to allow high-speed data transfers
• Web and API access via Globus
• Federated login
• Self-managed by PIs
• https://press3.mcs.anl.gov/petrel/
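To give a feel for what a 100 Gbps fabric means for data sharing, here is a rough back-of-the-envelope sketch. The function name and the `efficiency` parameter are assumptions introduced for illustration only. Real transfers depend on both endpoints, the file systems, and network load, so these ideal line-rate figures should be read as lower bounds on transfer time.

```python
def ideal_transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_tb (decimal terabytes) over a link.

    efficiency is a hypothetical fudge factor in (0, 1] for the fraction of
    the nominal line rate that is actually achieved.
    """
    if size_tb < 0 or link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("size must be >= 0, link rate > 0, efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600.0


# Sizes taken from the slides: 100 TB starting allocation, 1.7 PB filesystem.
for label, size_tb in [("100 TB", 100.0), ("1.7 PB", 1700.0)]:
    hours = ideal_transfer_hours(size_tb, link_gbps=100.0)
    print(f"{label} at 100 Gbps: about {hours:.1f} hours")
```

At the ideal rate, 100 TB moves in a little over two hours and a full 1.7 PB in roughly a day and a half. Lowering `efficiency` shows how quickly that grows when the link is shared.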
• Web portal for easy access to simulations
• Currently: ~82.5 TB in our project, covering three simulation projects
• Step 0: Register with Globus
• Step 1: Select simulation project
• Step 2: Select data products (information about data size is available)
• Step 3: Transfer with Globus to endpoint of your choice
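The steps above can be sketched as a small data model. This is not the portal's code and makes no Globus calls. The catalog, project name, product names, sizes, and endpoint string are all hypothetical placeholders chosen only to show the flow: pick a project, pick data products, check the total size, then hand a request to Globus.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    name: str
    size_tb: float


# Hypothetical catalog: names and sizes are placeholders, not real portal contents.
CATALOG: Dict[str, List[DataProduct]] = {
    "example_project": [
        DataProduct("particles_snapshot", 2.0),
        DataProduct("halo_catalog", 0.5),
    ],
}


@dataclass
class TransferRequest:
    project: str
    products: List[DataProduct]
    destination_endpoint: str  # endpoint chosen by the user (Step 3)

    @property
    def total_size_tb(self) -> float:
        # Step 2: the size is known before any data moves.
        return sum(p.size_tb for p in self.products)


def build_request(project: str, wanted: List[str], destination_endpoint: str) -> TransferRequest:
    """Steps 1-3: select a project, select products, name a destination."""
    if project not in CATALOG:
        raise KeyError(f"unknown project: {project}")
    available = {p.name: p for p in CATALOG[project]}
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"unknown data products: {missing}")
    return TransferRequest(project, [available[n] for n in wanted], destination_endpoint)


request = build_request(
    "example_project", ["halo_catalog", "particles_snapshot"], "my-destination-endpoint"
)
print(f"{len(request.products)} products, {request.total_size_tb} TB to transfer")
```

In a real deployment, the request would be submitted to the Globus transfer service by a registered user (Step 0). That part is deliberately left out here.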
“The purpose of computing is insight not numbers”
- Richard Hamming
