www.hdfgroup.org
The HDF Group
HDF5 and The HDF Group
May 2014
THE HDF GROUP
Mission
To provide high quality software for
managing large complex data,
to provide outstanding services for users
of these technologies,
and to ensure effective management of
data throughout the data life cycle.
Goals of The HDF Group
• To create, maintain, and evolve software and
services that enable society to manage large
complex data at every stage of the data life
cycle.
• To establish and maintain a sustainable
organization with a highly-skilled and
committed team devoted to accomplishing the
first goal.
The HDF Group
• 1988-2006: Software group at University of Illinois
National Center for Supercomputing Applications
• 2005-present: Non-profit company in Champaign, IL
• Passionate about managing large, complex,
heterogeneous data throughout its life cycle
• Creators and stewards of HDF4 and HDF5
• Own HDF4 and HDF5
• Formats, libraries, and tools are open and free
• Committed to high quality and reliability
• Currently employ 33 staff
Current project list for the HDF Group
• NASA – Earth Observing System (EOS)
• The basis for global climate research
• HDF is the standard archive and distribution format for EOS
• Hundreds of data products, 8 petabyte archive and growing
• NOAA/NASA – JPSS
• Next generation weather satellite system and EOS
• HDF5 is the primary distribution format (6 TB/day)
• Sandia National Laboratory
• High throughput, multi-stream satellite image management
• Synchrotron community
• Scalable solutions for high throughput data acquisition and
management
• ExaHDF5 (Lawrence Berkeley National Lab)
• High end scientific simulations
• Tuning HDF5 for high performance parallel I/O
• FastForward Computing (DOE)
• Solving I/O challenges for exascale computing
The HDF Group Services
• Helpdesk and Mailing Lists
• Available to all users as a first level of support
• Priority Support
• Rapid issue resolution and advice
• Consulting
• Needs assessment, troubleshooting, design reviews, etc.
• Training
• Tutorials and hands-on practical experience
• Enterprise Support
• Coordinating HDF activities across departments
• Special Projects
• Adapting customer applications to HDF
• New features and tools
• Research and Development
WHO USES HDF5?
Who uses HDF5?
• Applications that deal with large or complex
data
• Over 200 different application areas
• >2 million data product users world-wide
• Academia, government agencies, industry
Members of the HDF support community
• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – projects w/LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Army Geospatial Center
• NIH/Geospiza (bio software company)
• Lawrence Berkeley National Lab
• University of Illinois/NCSA
• Sandia National Lab
• A leading U.S. aerospace company
• Projects for petroleum industry, vehicle testing,
weapons research, others
• “In kind” support
New Areas We’re Exploring
• Fusion research data storage
• Submitted proposal for ITER project’s data
management w/large industrial fusion partner
• Astronomy
• Submitted NSF SI2 grant w/NRAO
• Working toward new standard for radioastronomy
data storage
• Electron Microscopy
• Submitted NSF SI2 grant w/LSU, et al.
• Proposing new standard for storing imaging data
• Synthesis of HDF5 and cloud storage w/Microsoft
• Developing “RESTful” API for accessing HDF5
data in Azure cloud
HDF5 SCIENCE
APPLICATIONS
NASA EOS Remote Sensed Data
• HDF format is the standard file format for
storing data from NASA's Earth Observing
System (EOS) mission.
• Petabytes of data stored in HDF4 and HDF5
to support the Global Climate Change
Research Program.
What is JPSS?
• JPSS is the next generation of NOAA's polar-orbiting
environmental satellites.
• JPSS observations enable forecasting severe
weather like hurricanes, tornadoes and blizzards, and
assessing environmental hazards such as droughts,
forest fires, poor air quality and harmful coastal
waters.
• JPSS will provide continuity of critical, global Earth
observations, including our atmosphere, oceans and
land, through 2025.
• During Hurricane Sandy in October 2012, JPSS data
helped forecasters and scientists accurately predict
Sandy's hurricane track and infamous 'left hook'
landfall into New York and New Jersey more than
five days in advance.
CFD General Notation System
What is CFD?
Computational fluid dynamics (CFD) is a
branch of fluid mechanics that uses numerical
methods and algorithms to solve and analyze
problems that involve fluid flows.
This computer-generated CFD image shows a model of
the space shuttle. CFD has taken the place of wind tunnels
for many evaluations of aircraft and, as computing power
increases and computer models become more
sophisticated, CFD will largely replace wind tunnels.
What is CGNS?
• Standard Interface Data Structures (SIDS)
– Collection of conventions and definitions that
defines the intellectual content of CFD-related
data.
• SIDS to ADF Mapping
– Advanced Data Format
• SIDS to HDF5 Mapping
– Defines how the SIDS is represented in HDF5
• CGNS Mid-Level Library (MLL)
– Application Programming Interface (API) which
conforms to the SIDS
– Built on top of ADF/HDF5, which do I/O operations
CGNS and HDF5*
• CGNS was originally built using the ADF format.
• However, ADF does not have parallel I/O or data
compression capabilities, and does not have the
support and tools that HDF5 offers.
• HDF5 has rapidly grown to become a world-wide
format standard for scientific data.
• HDF5 has parallel capability as well as a broader
support base than ADF.
• Therefore, CGNS has adopted HDF5 as the
default (official) data storage mechanism.
* Paraphrased from http://cgns.sourceforge.net/hdf5.html.
ENZO
• An adaptive mesh refinement (AMR), grid-based
hybrid code which is designed to do simulations
of cosmological structure formation.
Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
What is ENZO for?
• At UC San Diego ENZO cosmology is used to
simulate the universe from first principles, starting
near the Big Bang.
• Researchers using ENZO have conducted the
most detailed simulations ever of a region of the
universe more than 1.5 billion light years across.
• “We need to zoom in on these dense regions to
capture the key physical processes -- including
gravitation, flows of normal and ‘dark’ matter, and
shock heating and radiative cooling of the gas,”
said Mike Norman. “This requires ENZO’s
‘adaptive mesh refinement’ capability.”
• “AMR codes begin with a coarse grid spacing, and then
spawn more detailed subgrids as needed to track key
processes in higher density regions.”
• “We achieved unprecedented detail by reaching seven levels
of subgrids throughout the survey volume -- something never
done before -- producing more than 400,000 subgrids,” said
SDSC computational scientist Robert Harkness.
• “Norman is one of the largest users of supercomputing time
in the world, with 16 million computing hours at the TACC,
and millions more on TeraGrid systems at SDSC, PSC, and
NCSA.”
• “The HDF Group provided important support for handling the
output, and SDSC’s data storage environment allowed the
researchers to efficiently store and manage the massive
data.”
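The refinement idea quoted above (start from a coarse grid, then spawn finer subgrids only where density is high) can be illustrated with a toy one-dimensional sketch. This is not ENZO's algorithm or code: the `refine` function, the `density` field, the threshold, and the midpoint refinement test are all hypothetical choices made for illustration. Only the limit of seven levels echoes the figure quoted on the slide.

```python
from typing import Callable, List, Tuple

Cell = Tuple[float, float, int]  # (left edge, right edge, refinement level)


def refine(cells: List[Cell], density: Callable[[float], float],
           threshold: float, max_level: int) -> List[Cell]:
    """Split every cell whose midpoint density exceeds `threshold`, recursively."""
    result: List[Cell] = []
    for left, right, level in cells:
        mid = 0.5 * (left + right)
        if level < max_level and density(mid) > threshold:
            # Spawn two finer subcells and keep refining them.
            halves = [(left, mid, level + 1), (mid, right, level + 1)]
            result.extend(refine(halves, density, threshold, max_level))
        else:
            # Low density (or finest level reached): keep the cell as it is.
            result.append((left, right, level))
    return result


def density(x: float) -> float:
    """Hypothetical density field with a single dense clump near x = 0.3."""
    return 1.0 / (0.01 + (x - 0.3) ** 2)


# Four coarse cells covering the interval [0, 1].
coarse: List[Cell] = [(i / 4, (i + 1) / 4, 0) for i in range(4)]
grid = refine(coarse, density, threshold=50.0, max_level=7)

print(len(coarse), "coarse cells ->", len(grid), "cells after refinement")
print("finest level reached:", max(level for _, _, level in grid))
```

Cells far from the clump stay coarse, while the cells around it are subdivided down to the finest allowed level, which is what keeps the total cell count manageable.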
NeXus
What is NeXus?
• In recent years, scientists and programmers in
neutron and synchrotron facilities around the world
concluded that a common data format would fulfill a
valuable function in the scattering community.
• As instrumentation becomes more complex and data
visualization more challenging, scientists find it
difficult to keep up with new developments.
• A common data format makes it easier to exchange
experimental results and to exchange ideas about
how to analyze them. It promotes greater cooperation
in software development and stimulates the design of
more sophisticated visualization tools.
• The NeXus data format has been developed in
response to these needs.
HDF5 TECHNOLOGIES
Data challenges addressed by HDF5
• Ability to organize complex collections of data
• Efficient and scalable data storage and access
• A growing need to integrate a wide variety of
types of data
• The evolution of data technologies
• Long term preservation of data
HDF is…
• HDF stands for ‘Hierarchical Data Format’
• A file format for storing any kind of data
• Software system to manage data in the format
• Designed for high volume or complex data
• Designed for every size and type of system
• Open format and software library, tools
• There are two HDF’s: HDF4 and HDF5
• Here we focus on HDF5
HDF5 Technology Platform
• HDF5 data model: the “building blocks” for data organization
• HDF5 software: library, language interfaces, tools
• HDF5 file format: byte-level organization of data
Professionally managed
• Source under version control, public access
• Automatic daily testing
• 200+ configurations
• Performance, backward/forward compatibility
• C, C++, Fortran, Java, Python APIs
• Build supports Autoconfigure and CMake
• Sound development, coding practices
• Maintenance releases every May, November
Professionally supported
• Helpdesk
• Forum and mailing lists
• Extensive web documentation – User’s Guide,
Ref Manual, examples, tutorials, other docs
• Community friendly
• Integrate contributions from external
developers
• Solicit feedback on new features and pre-releases
• Collaborate on projects, especially in testing
HDF5 file
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
An HDF5 file is a container that holds data objects.
HDF5 file organization
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Config: Standard 3
[Figure: the root group “/” with groups SimOut and Viz; the data objects shown include Parameters (10;100;1000) and Timestep (36,000).]
HDF5 groups and links organize data objects.
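The organization shown on this slide (a root group, nested groups, and links to data objects) can be mimicked with a small in-memory sketch. It uses plain Python classes rather than the HDF5 library, so `Group`, `Dataset`, `require_group`, and `resolve` are hypothetical names. Reading the slide's group label as two groups, "SimOut" and "Viz", and the placement of each object under them are assumptions as well.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """A data object: a list of values plus descriptive attributes."""
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """A container whose links point to datasets or to other groups."""
    links: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def require_group(self, name: str) -> "Group":
        """Return the child group called `name`, creating it if needed."""
        child = self.links.setdefault(name, Group())
        if not isinstance(child, Group):
            raise TypeError(f"{name!r} is a dataset, not a group")
        return child

    def resolve(self, path: str) -> Union["Group", Dataset]:
        """Follow a slash-separated path such as '/SimOut/Timestep'."""
        node: Union[Group, Dataset] = self
        for part in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(path)
            node = node.links[part]
        return node


root = Group()  # plays the role of "/"
root.attrs["Serial Number"] = 99378920  # experiment notes kept as attributes
root.attrs["Date"] = "3/13/09"
root.attrs["Config"] = "Standard 3"

sim = root.require_group("SimOut")  # assumed group name
sim.links["Parameters"] = Dataset([10, 100, 1000])
sim.links["Timestep"] = Dataset([36000])

viz = root.require_group("Viz")  # assumed group name
viz.links["table"] = Dataset(
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    attrs={"columns": ("lat", "lon", "temp")},
)

print(root.resolve("/SimOut/Parameters").data)
print(root.resolve("/Viz/table").attrs["columns"])
```

The point of the sketch is the addressing scheme: every object is reached by a path from the root, and descriptive metadata travels with the object it describes.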
HDF5 Philosophy
A single platform with multiple uses
• One general data model
• One general format
• One library
• Adaptable for almost any kind of data
• Works on almost any architecture
• Ability to interact well with other technologies
• Attention to past, present, future compatibility
HDF5 Software Layers & Storage
• Tools: h5dump tool, h5repack tool, HDFView tool, High Level APIs, …
• HDF5 Library
– Language Interfaces: C, Fortran, C++
– HDF5 Data Model: Groups, Datasets, Attributes, …
– Internals: Datatype Conversion, Data Compression, Chunked Storage, Version Compatibility, and so on…
– I/O Drivers: Posix I/O, Split Files, Parallel I/O, Custom
• Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other
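Two of the internals named in this layer diagram, chunked storage and data compression, work together: data is split into fixed-size chunks, each chunk is compressed on its own, and a read touches only the chunk it needs. The sketch below shows that idea with Python's standard `zlib` and `struct` modules. It is not the HDF5 library or its on-disk format, and `write_chunks` and `read_value` are hypothetical helper names.

```python
import struct
import zlib
from typing import Dict, List


def write_chunks(values: List[float], chunk_len: int) -> Dict[int, bytes]:
    """Split `values` into fixed-length chunks and compress each one separately."""
    chunks: Dict[int, bytes] = {}
    for start in range(0, len(values), chunk_len):
        block = values[start:start + chunk_len]
        raw = struct.pack(f"<{len(block)}d", *block)  # little-endian float64
        chunks[start // chunk_len] = zlib.compress(raw)
    return chunks


def read_value(chunks: Dict[int, bytes], chunk_len: int, index: int) -> float:
    """Read one element by decompressing only the chunk that contains it."""
    raw = zlib.decompress(chunks[index // chunk_len])
    # Each float64 occupies 8 bytes within the decompressed chunk.
    return struct.unpack_from("<d", raw, (index % chunk_len) * 8)[0]


temps = [float(i % 50) for i in range(10_000)]  # made-up sample data
stored = write_chunks(temps, chunk_len=1_000)

raw_size = len(temps) * 8
packed_size = sum(len(blob) for blob in stored.values())
print(f"{len(stored)} chunks, {raw_size} bytes raw, {packed_size} bytes compressed")
print(read_value(stored, 1_000, 4_321) == temps[4_321])
```

Choosing the chunk size is a trade-off: larger chunks usually compress better, while smaller chunks mean less data has to be decompressed for a small read.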
HDF ecosystem
• Applications: EOS applications, MATLAB, IDL, HDF tools, etc.
• EOS domain data objects (HDF-EOS Library): Swath, Grid, Point
• HDF Library
• Storage
Other Software
• The HDF Group
• HDFView – an HDF4 & HDF5 browser
• Command-line utilities
• Regression and performance testing software
• 3rd Party
• NetCDF-4, IDL, MATLAB, Mathematica,
PyTables, Pandas
• Communities
• EOS, ASC, CGNS, Energistics, NeXus
• Integration with other software
• iRODS, OPeNDAP, MPI
HDF5 and The HDF Group
www.hdfgroup.org
www.hdfgroup.org
HDF5 and The HDF Group

More Related Content

What's hot

Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)The HDF-EOS Tools and Information Center
 
Using Archivemedia to preserve research data
Using Archivemedia to preserve research dataUsing Archivemedia to preserve research data
Using Archivemedia to preserve research dataARDC
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsThe HDF-EOS Tools and Information Center
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 

What's hot (20)

HDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGISHDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGIS
 
Improved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the MassesImproved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the Masses
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
HDF
HDFHDF
HDF
 
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
 
HDF Product Designer
HDF Product DesignerHDF Product Designer
HDF Product Designer
 
Using Archivemedia to preserve research data
Using Archivemedia to preserve research dataUsing Archivemedia to preserve research data
Using Archivemedia to preserve research data
 
Status of HDF-EOS, Related Software and Tools
 Status of HDF-EOS, Related Software and Tools Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5
 
HDF OPeNDAP Project Update and Demo
HDF OPeNDAP Project Update and DemoHDF OPeNDAP Project Update and Demo
HDF OPeNDAP Project Update and Demo
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs ProjectsGES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
Multidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGISMultidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGIS
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Using IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS DataUsing IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS Data
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 

Similar to HDF5 and The HDF Group

ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarFAIRDOM
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorialtutorialsruby
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorialtutorialsruby
 

Similar to HDF5 and The HDF Group (20)

RFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status UpdateRFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status Update
 
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 DataPlans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF
HDFHDF
HDF
 
HDF Software Process - Lessons Learned & Success Factors
HDF Software Process - Lessons Learned & Success FactorsHDF Software Process - Lessons Learned & Success Factors
HDF Software Process - Lessons Learned & Success Factors
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
HDF Updae
HDF UpdaeHDF Updae
HDF Updae
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Hdg geo discussion
Hdg geo discussionHdg geo discussion
Hdg geo discussion
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorial
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorial
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
SEEDS Standards Process
SEEDS Standards ProcessSEEDS Standards Process
SEEDS Standards Process
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Recently uploaded

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

HDF5 and The HDF Group

  • 1. www.hdfgroup.org The HDF Group HDF5 and The HDF Group May 2014
  • 3. Mission: To provide high quality software for managing large complex data, to provide outstanding services for users of these technologies, and to ensure effective management of data throughout the data life cycle.
  • 4. Goals of The HDF Group
      • To create, maintain, and evolve software and services that enable society to manage large complex data at every stage of the data life cycle.
      • To establish and maintain a sustainable organization with a highly-skilled and committed team devoted to accomplishing the first goal.
  • 5. The HDF Group
      • 1988-2006: Software group at University of Illinois National Center for Supercomputing Applications
      • 2005-present: Non-profit company in Champaign, IL
      • Passionate about managing large, complex, heterogeneous data throughout its life cycle
      • Creators and stewards of HDF4 and HDF5
      • Own HDF4 and HDF5
      • Formats, libraries, and tools are open and free
      • Committed to high quality and reliability
      • Currently employ 33 staff
  • 6. Current project list for the HDF Group
      • NASA – Earth Observing System (EOS)
          – The basis for global climate research
          – HDF is the standard archive and distribution format for EOS
          – Hundreds of data products, 8 petabyte archive and growing
      • NOAA/NASA – JPSS
          – Next generation weather satellite system and EOS
          – HDF5 is the primary distribution format (6 TB/day)
      • Sandia National Laboratory
          – High throughput, multi-stream satellite image management
      • Synchrotron community
          – Scalable solutions for high throughput data acquisition and management
      • ExaHDF5 (Lawrence Berkeley National Lab)
          – High end scientific simulations
          – Tuning HDF5 for high performance parallel I/O
      • FastForward Computing (DOE)
          – Solving I/O challenges for exascale computing
  • 7. The HDF Group Services
      • Helpdesk and Mailing Lists
          – Available to all users as a first level of support
      • Priority Support
          – Rapid issue resolution and advice
      • Consulting
          – Needs assessment, troubleshooting, design reviews, etc.
      • Training
          – Tutorials and hands-on practical experience
      • Enterprise Support
          – Coordinating HDF activities across departments
      • Special Projects
          – Adapting customer applications to HDF
          – New features and tools
          – Research and Development
  • 9. Who uses HDF5?
      • Applications that deal with large or complex data
      • Over 200 different application areas
      • >2 million data product users world-wide
      • Academia, government agencies, industry
  • 10. Members of the HDF support community
      • NASA – Earth Observing System
      • NOAA/NASA/Riverside Tech – NPOESS
      • A large financial institution
      • DOE – projects w/LBNL & PNNL, ANL & ORNL
      • Lawrence Livermore National Lab
      • Army Geospatial Center
      • NIH/Geospiza (bio software company)
      • Lawrence Berkeley National Lab
      • University of Illinois/NCSA
      • Sandia National Lab
      • A leading U.S. aerospace company
      • Projects for petroleum industry, vehicle testing, weapons research, others
      • “In kind” support
  • 11. New Areas We’re Exploring
      • Fusion research data storage
          – Submitted proposal for ITER project’s data management w/large industrial fusion partner
      • Astronomy
          – Submitted NSF SI2 grant w/NRAO
          – Working toward new standard for radio astronomy data storage
      • Electron Microscopy
          – Submitted NSF SI2 grant w/LSU, et al.
          – Proposing new standard for storing imaging data
      • Synthesis of HDF5 and cloud storage w/Microsoft
          – Developing “RESTful” API for accessing HDF5 data in Azure cloud
  • 13. NASA EOS Remote Sensed Data
      • HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
      • Petabytes of data are stored in HDF4 and HDF5 to support the Global Climate Change Research Program.
  • 15. What is JPSS?
      • JPSS is the next generation of NOAA's polar-orbiting environmental satellites.
      • JPSS observations enable forecasting severe weather like hurricanes, tornadoes and blizzards, and assessing environmental hazards such as droughts, forest fires, poor air quality and harmful coastal waters.
      • JPSS will provide continuity of critical, global Earth observations, including our atmosphere, oceans and land, through 2025.
      • During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and infamous 'left hook' landfall into New York and New Jersey, more than five days in advance.
  • 16. CFD General Notation System
  • 17. What is CFD? Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.
  • 18. This computer-generated CFD image shows a model of the space shuttle. CFD has taken the place of wind tunnels for many evaluations of aircraft and, as computing power increases and computer models become more sophisticated, CFD will largely replace wind tunnels.
  • 19. What is CGNS?
      • Standard Interface Data Structures (SIDS)
          – Collection of conventions and definitions that defines the intellectual content of CFD-related data.
      • SIDS to ADF Mapping
          – Advanced Data Format
      • SIDS to HDF5 Mapping
          – Defines how the SIDS is represented in HDF5
      • CGNS Mid-Level Library (MLL)
          – Application Programming Interface (API) which conforms to the SIDS
          – Built on top of ADF/HDF5, which perform the I/O operations
  • 20. CGNS and HDF5*
      • CGNS was originally built using the ADF format.
      • However, ADF does not have parallel I/O or data compression capabilities, and does not have the support and tools that HDF5 offers.
      • HDF5 has rapidly grown to become a world-wide format standard for scientific data.
      • HDF5 has parallel capability as well as a broader support base than ADF.
      • Therefore, CGNS has adopted HDF5 as the default (official) data storage mechanism.
      * Paraphrased from http://cgns.sourceforge.net/hdf5.html.
  • 21. ENZO: An adaptive mesh refinement (AMR), grid-based hybrid code designed to simulate cosmological structure formation.
  • 22. Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
  • 23. What is ENZO for?
      • At UC San Diego, the ENZO cosmology code is used to simulate the universe from first principles, starting near the Big Bang.
      • Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light years across.
      • “We need to zoom in on these dense regions to capture the key physical processes -- including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”
  • 24.
      • “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher density regions.
      • “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume -- something never done before -- producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.
      • “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”
      • “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
  • 25. NeXus
  • 26. What is NeXus?
      • In recent years, scientists and programmers in neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.
      • As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.
      • A common data format makes it easier to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.
      • The NeXus data format has been developed in response to these needs.
  • 28. Data challenges addressed by HDF5
      • Ability to organize complex collections of data
      • Efficient and scalable data storage and access
      • A growing need to integrate a wide variety of types of data
      • The evolution of data technologies
      • Long term preservation of data
  • 29. HDF is…
      • HDF stands for ‘Hierarchical Data Format’
      • A file format for storing any kind of data
      • Software system to manage data in the format
      • Designed for high volume or complex data
      • Designed for every size and type of system
      • Open format and software library, tools
      • There are two HDFs: HDF4 and HDF5
      • Here we focus on HDF5
  • 30. HDF5 Technology Platform
      • HDF5 data model
          – The “building blocks” for data organization
      • HDF5 software
          – Library, language interfaces, tools
      • HDF5 file format
          – Byte-level organization of data
  • 31. Professionally managed
      • Source under version control, public access
      • Automatic daily testing
          – 200+ configurations
          – Performance, backward/forward compatibility
      • C, C++, Fortran, Java, Python APIs
      • Build supports Autotools (configure) and CMake
      • Sound development, coding practices
      • Maintenance releases every May, November
  • 32. Professionally supported
      • Helpdesk
      • Forum and mailing lists
      • Extensive web documentation
          – User’s Guide, Reference Manual, examples, tutorials, other docs
      • Community friendly
          – Integrate contributions from external developers
          – Solicit feedback on new features and pre-releases
          – Collaborate on projects, especially in testing
  • 33. HDF5 file
      An HDF5 file is a container that holds data objects.

      lat | lon | temp
      ----|-----|-----
       12 |  23 |  3.1
       15 |  24 |  4.2
       17 |  21 |  3.6
  • 34. HDF5 file organization
      HDF5 groups and links organize data objects.

      Diagram labels: root group “/” with groups “SimOut” and “Viz”, holding these objects:
      • Table:

        lat | lon | temp
        ----|-----|-----
         12 |  23 |  3.1
         15 |  24 |  4.2
         17 |  21 |  3.6

      • Experiment Notes: Serial Number: 99378920; Date: 3/13/09; Config: Standard 3
      • Parameters: 10;100;1000
      • Timestep: 36,000
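The container-and-groups idea on slides 33 and 34 can be sketched with a toy in-memory model. This is only an illustration of the hierarchy, not the HDF5 library API: the `Group` and `Dataset` classes and their methods are hypothetical, and because the slide's diagram layout is lost, placing the table and the timestep under `SimOut` is an assumption.

```python
"""Toy in-memory model of HDF5-style groups, datasets and attributes.

NOT the HDF5 library API: every class and method name here is hypothetical
and exists only to illustrate how groups and links organize data objects.
"""


class Dataset:
    """An array of values plus free-form attributes (metadata)."""

    def __init__(self, data):
        self.data = data
        self.attrs = {}


class Group:
    """A container whose links map names to groups or datasets."""

    def __init__(self):
        self.links = {}
        self.attrs = {}

    def require_group(self, path):
        """Return the group at `path`, creating missing groups on the way."""
        node = self
        for name in filter(None, path.split("/")):
            node = node.links.setdefault(name, Group())
            if not isinstance(node, Group):
                raise TypeError(f"{name!r} is not a group")
        return node

    def create_dataset(self, path, data):
        """Store `data` as a dataset at `path` and return it."""
        parent_path, _, name = path.rpartition("/")
        dataset = Dataset(data)
        self.require_group(parent_path).links[name] = dataset
        return dataset

    def __getitem__(self, path):
        """Resolve a slash-separated path by following links."""
        node = self
        for name in filter(None, path.split("/")):
            node = node.links[name]
        return node


root = Group()  # plays the role of the root group "/"
table = root.create_dataset(
    "/SimOut/table",  # assumed location; the slide's layout is not recoverable
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],  # lat, lon, temp rows
)
table.attrs["columns"] = ("lat", "lon", "temp")
root.require_group("/SimOut").attrs["Timestep"] = 36000  # assumed placement
root.require_group("/Viz")

temps = [row[2] for row in root["/SimOut/table"].data]
```

Looking up `root["/SimOut/table"]` walks one link per path component, which is the sense in which groups and links organize the objects held in the file.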
  • 35. HDF5 Philosophy: A single platform with multiple uses
      • One general data model
      • One general format
      • One library
      • Adaptable for almost any kind of data
      • Works on almost any architecture
      • Ability to interact well with other technologies
      • Attention to past, present, future compatibility
  • 36. HDF5 Software Layers & Storage (diagram labels)
      • Tools: h5dump tool, HDFView tool, h5repack tool, …
      • High Level APIs
      • HDF5 Library
          – Language Interfaces: C, Fortran, C++
          – HDF5 Data Model: Groups, Datasets, Attributes, …
          – Internals: Datatype Conversion, data compression, Chunked Storage, Version Compatibility, and so on…
          – I/O Drivers: Posix I/O, Split Files, Parallel I/O, Custom
      • Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other
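Slide 36 lists chunked storage and data compression among the library internals. As a rough sketch of why the two go together, the following stores an array as independently compressed chunks so that reading one value decompresses only the chunk that holds it. The chunk size, the helper names and the sample values are all assumptions made for illustration, and the byte layout has nothing to do with the real HDF5 file format.

```python
"""Sketch of chunked, compressed storage for a 1-D array of floats.

Illustrative only: the chunk size, layout and helper names are hypothetical
and do not reflect the real HDF5 file format or library internals.
"""
import struct
import zlib

CHUNK_LEN = 4  # values per chunk (assumed, for illustration)


def write_chunks(values):
    """Split `values` into fixed-size chunks and compress each one."""
    chunks = []
    for start in range(0, len(values), CHUNK_LEN):
        part = values[start:start + CHUNK_LEN]
        raw = struct.pack(f"<{len(part)}d", *part)  # little-endian doubles
        chunks.append(zlib.compress(raw))
    return chunks


def read_value(chunks, index):
    """Read one value by decompressing only the chunk that holds it."""
    chunk_no, offset = divmod(index, CHUNK_LEN)
    raw = zlib.decompress(chunks[chunk_no])
    return struct.unpack_from("<d", raw, offset * 8)[0]


temps = [3.1, 4.2, 3.6, 2.9, 5.0, 4.4, 3.3, 3.8, 4.1]  # made-up sample values
stored = write_chunks(temps)
value = read_value(stored, 6)  # touches only the second chunk
```

The design point is that each chunk is compressed on its own, so partial reads stay cheap even when the whole dataset is compressed.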
  • 37. HDF ecosystem (diagram labels)
      • Applications: EOS Applications, MATLAB, IDL, HDF tools
      • EOS Domain Data Objects (HDF-EOS Library): Swath, Grid, Point, etc.
      • HDF Library
      • Storage
  • 38. Other Software
      • The HDF Group
          – HDFView: an HDF4 & HDF5 browser
          – Command-line utilities
          – Regression and performance testing software
      • 3rd Party
          – NetCDF-4, IDL, MATLAB, Mathematica, PyTables, Pandas
      • Communities
          – EOS, ASC, CGNS, Energistics, NeXus
      • Integration with other software
          – iRODS, OPeNDAP, MPI