Numerous scientific teams use the HDF5 format to store very large datasets. Efficient use of this data in a distributed environment depends on client applications being able to read any subset of the data without transferring the entire file to the local machine. The goal of the HDF5-iRODS Project was to develop an HDF5-iRODS module for the iRODS datagrid server that supported this capability, and to apply the technology to an NCSA/SDSC Strategic Applications Program (SAP) project, FLASH.
A joint team from The HDF Group (representing NCSA) and the SDSC SRB group collaborated to accomplish the project goal. The team implemented five HDF5 micro-service functions on the iRODS server and developed an iRODS FLASH slice client application. The client implementation also includes a JNI interface that allows HDFView, a standard tool for browsing HDF5 files, to access HDF5 files stored remotely in iRODS. Finally, three new collection client/server calls were added to the iRODS APIs, making it easier for users to query the content of an iRODS collection.
1. HDF5-iRODS
Peter Cao
The HDF Group
Mike Wan
San Diego Supercomputer Center
HDF and HDF-EOS Workshop XII
October 16, 2008
October 15-17, 2008
HDF and HDF-EOS Workshop XII, Denver, CO
2. Imagine
100 Frames x 1 GB = 100 GB
[Diagram labels: 1 GB, HPSS, DB, HPC]
3. Outline
• HDF5-iRODS module
• Applications
• Demo (if time permits)
4. What is iRODS?
• Stands for Integrated Rule-Oriented Data System.
• Developed by the Storage Resource Broker (SRB) team at the San Diego Supercomputer Center (SDSC).
• A data grid software system that enables a customizable architecture for sharing data distributed across heterogeneous resources.
5. What is iRODS?
[Diagram labels: Distributed Storage, Database System, Rule System]
For more information and download, visit www.irods.org
6. Motivation
• iRODS
− Distributed data system
− Indexing and searching
− Access control, etc.
• HDF5
− Large and diverse data
− High-performance I/O
− Subsetting, etc.
• iRODS + HDF5: High-performance distributed data system
7. Whole File Access
• client: "I need to see the eye of Hurricane Bob!"
• Request to server (HDF5 file): "Get the file"
• Transfer large file – slow!
8. HDF5 Object or Subset Level Access
• client: "I need to see the eye of Hurricane Bob!"
• Request to server (HDF5 file): "Get me the eye of Hurricane Bob"
• Small transfer – fast!
10. HDF5-iRODS Data Flow
[Diagram: client and server (HDF5 Library, HDF5 file)]
• Both sides work with an HDF5 Object or Subset (File, Group, Dataset, Subset of Dataset, Attribute)
• The object or subset travels between them as an iRODS message (pack/unpack)
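The pack/unpack step can be illustrated with a small sketch. The message layout below (a length-prefixed object path followed by per-dimension start and count values) is invented for illustration and is not the iRODS wire format; `pack_subset_request`, `unpack_subset_request`, and the dataset path `/dens` are hypothetical names.

```python
import struct


def pack_subset_request(path, start, count):
    """Pack a dataset path and a start/count selection into bytes."""
    if len(start) != len(count):
        raise ValueError("start and count must have the same rank")
    encoded = path.encode("utf-8")
    rank = len(start)
    # Header: path length and rank as unsigned 32-bit big-endian integers.
    header = struct.pack(">II", len(encoded), rank)
    # Body: all start values, then all count values, unsigned 64-bit each.
    body = struct.pack(f">{2 * rank}Q", *start, *count)
    return header + encoded + body


def unpack_subset_request(message):
    """Reverse pack_subset_request and return (path, start, count)."""
    path_len, rank = struct.unpack_from(">II", message, 0)
    offset = 8
    path = message[offset:offset + path_len].decode("utf-8")
    offset += path_len
    values = struct.unpack_from(f">{2 * rank}Q", message, offset)
    return path, list(values[:rank]), list(values[rank:])


# Hypothetical request: one 2048 x 2048 plane of a 3D dataset.
message = pack_subset_request("/dens", [0, 0, 8], [2048, 2048, 1])
path, start, count = unpack_subset_request(message)
```

Only the request is shown here; a reply carrying the selected values would be packed and unpacked the same way on the return trip.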
11. New iRODS Micro-services
• Five iRODS micro-services
− msiH5File_open
− msiH5File_close
− msiH5Dataset_read (reads entire dataset or subset of dataset)
− msiH5Dataset_read_attribute
− msiH5Group_read_attribute
[Diagram labels: Rule Engine, msiH5Dataset_read, H5Dataset.read(), File]
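To make "entire dataset or subset of dataset" concrete, here is a minimal sketch of a start/count selection. It uses a plain nested list in place of an HDF5 dataset; `read_subset` is a hypothetical helper and does not reflect the signature of msiH5Dataset_read.

```python
def read_subset(dataset, start, count):
    """Return the count[0] x count[1] block of a 2D list beginning at start."""
    row0, col0 = start
    nrows, ncols = count
    if min(row0, col0, nrows, ncols) < 0:
        raise ValueError("start and count must be non-negative")
    rows = dataset[row0:row0 + nrows]
    # Reject selections that run past the edge instead of silently truncating.
    if len(rows) < nrows or any(col0 + ncols > len(row) for row in rows):
        raise IndexError("selection extends past the dataset")
    return [row[col0:col0 + ncols] for row in rows]


# A 4 x 5 stand-in dataset where each value is 10 * row + column.
dataset = [[10 * r + c for c in range(5)] for r in range(4)]

whole = read_subset(dataset, (0, 0), (4, 5))   # entire dataset
block = read_subset(dataset, (1, 2), (2, 3))   # rows 1-2, columns 2-4
```

Only `block` would need to cross the network in the subset case, which is the point of doing the selection on the server.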
12. HDF5-Enabled iRODS Server
• HDF5 library
• Other external libraries (SZIP, ZLIB)
• iRODS version 1.1 or later from https://www.irods.org/index.php/Downloads/
• Follow the README instructions in module/hdf5
13. Client Application Requirements
• HDF5 object header files and client handlers
• iRODS client library and header files
• HDF5-iRODS JNI for Java applications only
• $HOME/.irods/.irodsEnv
irodsHost 'kagiso.hdfgroup.uiuc.edu'
irodsPort 1247
irodsUserName 'rods'
…
For more information and download, visit http://www.hdfgroup.org/projects/irods
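The `.irodsEnv` entries listed above follow a simple "key value" layout with optional single quotes. A minimal sketch of reading such a file into a dictionary follows; `parse_irods_env` is a hypothetical helper, and treating `#` as a comment marker is an assumption rather than something the slide states.

```python
import shlex


def parse_irods_env(text):
    """Parse "key value" lines, stripping quotes around the value."""
    settings = {}
    for line in text.splitlines():
        # shlex handles the single-quoted values; comments=True drops '#' text.
        tokens = shlex.split(line, comments=True)
        if len(tokens) >= 2:
            settings[tokens[0]] = tokens[1]
    return settings


sample = """\
irodsHost 'kagiso.hdfgroup.uiuc.edu'
irodsPort 1247
irodsUserName 'rods'
"""

settings = parse_irods_env(sample)
```

Values are kept as strings, so a caller that needs the port as a number would convert `settings["irodsPort"]` itself.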
16. Example: islice
FLASH is an adaptive-mesh simulation code for astrophysical hydrodynamics problems
• Command-line tool to visualize data produced by FLASH simulation runs
• Data is huge (~ 100 GB)
• Interesting part is small
[Figure label: adaptive mesh, 16*16*16*47531]
For more information, visit flash.uchicago.edu
17. Example: islice
./islice -t flash.pal -m rpv1 -p 2 rundir_055_8km_hdf5_plt_cnt_0424
[Figure: a slice from a 3D simulation of the detonation of a white dwarf star. Labels: Star, Ash Flow, Collision focus point, Breakout point. Slice size: 2048*2048*8 (32MB)]
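The sizes quoted on the islice slides can be checked with a little arithmetic. The sketch below assumes that the trailing 8 in 2048*2048*8 is bytes per value (which matches the 32MB label) and that 16*16*16*47531 counts the mesh cells of a single variable stored at the same 8 bytes per value; both readings are assumptions, not statements from the slides.

```python
BYTES_PER_VALUE = 8   # assumption: double-precision values
MIB = 1024 ** 2

# Slice shown on the islice slide: 2048 x 2048 values.
slice_bytes = 2048 * 2048 * BYTES_PER_VALUE

# Adaptive mesh: 47531 blocks of 16 x 16 x 16 cells (one variable assumed).
mesh_cells = 16 * 16 * 16 * 47531
mesh_bytes = mesh_cells * BYTES_PER_VALUE

print(f"slice:             {slice_bytes / MIB:.0f} MiB")
print(f"one mesh variable: {mesh_bytes / MIB:.0f} MiB")
print(f"mesh / slice:      {mesh_bytes / slice_bytes:.1f}x")
```

Under these assumptions a single variable of one frame is already tens of times larger than the slice the user wants to see, and the full run spans many frames and variables.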
18. Thank You!
This project is sponsored by CIP/NLADR, NSF PACI Project in Support of
the Collaboration between the National Center for Supercomputing
Applications (NCSA) and the San Diego Supercomputer Center (SDSC).
The project is managed under the CyberInfrastructure Partnership (CIP), a
joint effort led by NCSA and SDSC to help scientists and engineers take full
advantage of the high-end CyberInfrastructure resources funded by the
National Science Foundation (NSF).