This presentation summarizes updates from The HDF Group, which was established in 1988 and owns the HDF4 and HDF5 formats and libraries. The Group provides helpdesk, support, consulting, and training services to users, and aims to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies. Recent improvements include new HDF5 and HDF4 releases, tool updates, HDF-Java work, and single-writer/multiple-reader (SWMR) file access. Future work involves parallel I/O and new indexing methods, along with EOS, OPeNDAP, and NPP/NPOESS support.
1. The HDF Group
HDF Update
Mike Folk
The HDF Group
The 13th HDF and HDF-EOS Workshop
November 3-5, 2009
HDF/HDF-EOS Workshop XIII
www.hdfgroup.org
3. The HDF Group
What’s up with The HDF Group?
4. The HDF Group
What is The HDF Group, and why does it exist?
5. The HDF Group
• Established in 1988
• 18 years at the University of Illinois National
Center for Supercomputing Applications
• 4 years as an independent non-profit company,
“The HDF Group”
• The HDF Group owns HDF4 and HDF5
• Basic HDF4 and HDF5 formats, libraries and
tools are open and free
6. Data challenges addressed by HDF
• Our ability to organize complex collections of
data
• Efficient and scalable data storage and access
• A growing need to integrate a wide variety of
types of data
• Long term preservation of data
7. The HDF Group
The HDF Group Mission
To ensure long-term
accessibility of HDF data
through sustainable
development and support of
HDF technologies.
8. Goals
• Maintain and evolve HDF for sponsors and
communities that depend on it
• Provide support to the HDF communities
through consulting, training, tuning,
development, research
• Sustain The HDF Group for the long term to
assure data access over time
9. The HDF Group Services
• Helpdesk and Mailing Lists
• Available to all users as a first level of support
• Standard Support
• Rapid issue resolution and advice
• Consulting
• Needs assessment, troubleshooting, design reviews, etc.
• Training
• Tutorials and hands-on practical experience
• Enterprise Support
• Supporting many HDF activities across organizations
• Special Projects
• Adapting customer applications to HDF
• New features and tools
• Research and Development
10. Members of the HDF support community
• NASA – EOS
• NOAA/NASA/Riverside Tech – NPOESS
• Army Geospatial Center
• A leading U.S. aerospace company
• NIH/Geospiza (bio software company)
• University of Illinois/NCSA
• Sandia National Laboratory (2)
• Lawrence Berkeley National Lab
• Projects for petroleum industry, vehicle testing,
weapons research, others
• “In kind” support
11. Some areas of increased recent interest
• Improvements
• Concurrent access
• Parallel I/O performance
• Real-time write performance
• High level language support
• Life sciences
• Sequencing
• Biomedical imaging
• Database integration
• Microsoft products (HPC, .NET, others)
14. The HDF Group
Basic Library Releases
HDF5
HDF4
15. Time-line of the HDF libraries releases
16. HDF5 1.8.3 minor release (May 09)
• New functions
• Improve flexibility when traversing external links
• Validate object identifier
• Enabled data chunk cache properties to be set
per dataset (per file in previous releases)
• Forward/backward compatibility issues
• Modified library to be able to open files with
corrupt root group symbol table messages
• Also corrects corruption errors if found.
17. HDF5 1.8.4 minor release (Nov 09)
• Modified configure and make process to
properly preserve user's CFLAGS and similar
environment variables.
• Corrected a problem where the library would
rewrite the superblock in a file opened for R/W
access, even when no changes were made
to the file.
18. HDF5 1.6 minor releases
• 1.6.9 May 09
• Minor bug fixes
• Same tools improvements as in 1.8.3
• 1.6.10 Nov 09
• Minor bug fixes
• Ability to embed library information in executable
binaries
• This is the last release of the 1.6 series
• announced in May 2009 – no response
• This is your last chance!
19. HDF 4r2.4 minor release (Feb 09)
• Minor bug fixing, enhancements
• New routines to get size of compressed data
• Support for C shared libraries
• Support for 32-bit version on Mac Intel
• Updated docs in HTML and PDF
20. HDF 4r2.5 minor release (Feb 10)
• Minor bug fixes, enhancements
• Support for 64-bit version on Mac Intel
• Restructured and cleaned up source code for
easier maintenance
• Changes in versioning
• Improves maintainability
• Becomes similar to how HDF5 versioning works
• Will use major, minor, release and sub-release
suffix in the names of the source tar balls
• E.g., hdf-4.2.5, hdf-4.2.5-snap0
• Library string will include suffix
• E.g., "HDF Version 4.2 Release 4-snap3, October 18,
2009"
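The naming scheme above can be illustrated with a short Python sketch that splits a tarball base name into major, minor, release, and sub-release parts. The function name `parse_hdf4_name` and the regular expression are assumptions made for illustration only; they are not part of any HDF tool, and the real naming rules may allow forms not covered here.

```python
import re

# Hypothetical pattern for names such as "hdf-4.2.5" or "hdf-4.2.5-snap0":
# major.minor.release, plus an optional sub-release suffix.
_NAME = re.compile(r"^hdf-(\d+)\.(\d+)\.(\d+)(?:-(\w+))?$")


def parse_hdf4_name(name):
    """Split a source tarball base name into its version components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized name: {name!r}")
    major, minor, release, suffix = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "release": int(release),
        "suffix": suffix,  # None when there is no sub-release suffix
    }
```

For example, `parse_hdf4_name("hdf-4.2.5-snap0")` yields major 4, minor 2, release 5, and the suffix `snap0`.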
21. H4-H5 Conversion Software 2.1 (Feb 09)
• Based on HDF4r2.4 and HDF5-1.8.2
• h4toh5 utility
• Recognizes HDF-EOS2 files (--with-hdfeos2
configuration option)
• Can generate HDF5 files that can be read by
netCDF-4
• h4toh5 library
• Bug fixes
• Performance improvements
• http://hdfgroup.org/h4toh5/
22. H4-H5 Conversion Software 2.2 (Feb 10)
• Based on HDF4r2.5 and HDF5-1.8.4
24. Major Improvements for Existing Tools
• H5dump additions
• Ability to show data pointed to by dataset region references.
• More options for dumping data into ASCII
• Compatible with MS Excel
• Compatible with h5import
• h5diff
• Improvements in accuracy, flexibility, and performance
• Some new flags
• Report non-comparable objects
• Avoid NaN detection
• Option to use system epsilon to compare floating-point numbers
• Compares for strict equality first to improve performance
• Treats two INFINITY values as equal
• Fixed segmentation fault problem on variable length strings.
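The floating-point comparison rules listed for h5diff can be sketched in a few lines of Python. This is an illustration of the general idea (strict equality first, then an epsilon test), not h5diff's actual implementation. The function name `values_differ` is hypothetical, and treating two NaN values as equal is an assumption made here for the sketch.

```python
import math
import sys


def values_differ(a, b, epsilon=sys.float_info.epsilon):
    """Return True when two floats should be reported as different."""
    # Strict equality first: it is cheap, and it also makes two
    # infinities of the same sign compare as equal.
    if a == b:
        return False
    # NaN never equals anything, so it needs explicit handling.
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan and b_nan:
        return False  # assumption of this sketch: NaN matches NaN
    if a_nan or b_nan:
        return True
    # Otherwise fall back to an absolute tolerance, by default the
    # system epsilon for double precision.
    return abs(a - b) > epsilon
```

Checking for equality before computing a difference avoids the subtraction for the common case of identical values, which is where the performance gain comes from.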
25. Major Improvements for Existing Tools
• h5stat
• Fixed incorrect statistics on EOS big data files
with corrupted headers.
• h5repack
• Added ability to preserve group creation order
• When chunk size not specified, uses
heuristics to set chunk size
• Fixed a problem where the 1.8 version failed
on a file created with 1.6.
26. Tool activities in the works
• New tool -- h5tail
• Display new records appended to a dataset
• Improved code quality and testing
• Tools library: general purpose APIs for tools
• Tools library currently only for our developers
• Want to make it public so that people can use it in
their products
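The h5tail idea mentioned above, showing only the records appended since the last look, can be sketched independently of HDF5. The class below is a hypothetical illustration, not the h5tail design: the names `TailReader`, `get_length`, and `read_range` are assumptions, and a plain Python list can stand in for a growing dataset.

```python
class TailReader:
    """Report only the records appended since the previous poll."""

    def __init__(self, get_length, read_range):
        # get_length() returns the current number of records.
        # read_range(start, stop) returns the records in [start, stop).
        self._get_length = get_length
        self._read_range = read_range
        # Start at the current end so only later appends are reported.
        self._seen = get_length()

    def poll(self):
        """Return records added since the last call (possibly empty)."""
        current = self._get_length()
        if current <= self._seen:
            return []
        new_records = list(self._read_range(self._seen, current))
        self._seen = current
        return new_records
```

With `records = [1, 2, 3]` and `TailReader(lambda: len(records), lambda s, e: records[s:e])`, a poll after `records.extend([4, 5])` returns only the two new values.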
27. Conversion Tools
Please send us your comments and requests
regarding HDF5 conversion tools, such as
• HDF4 to HDF5
• HDF5 to jpeg
• HDF5 to XML
• HDF5 to other formats?
29. HDF-Java 2.6 is on the way
• Includes all HDF java products
• Java Wrapper API
• Java Object API
• HDFView
• Adds new features, such as better support for
dataset region references
• Improves performance
• Release schedule
• Beta 1: end of Nov. 09
• Full release: end of Dec. 09
30. Full support of HDF5 1.8.x in hdf-java
• Full HDF5 1.8 support will be added to the
release after version 2.6.
• We are looking for input
• RFC:
http://www.hdfgroup.uiuc.edu/RFC/HDF5/hdf-java/
• Java wrapper will be completed March 2010
• Object API and HDFView update to come later
32. Single-Writer/Multiple-Reader Access
• Situation: A long-running process is modifying
an HDF5 file and simultaneously other
processes want to inspect data in the file.
• Solution: Single-Writer/Multiple-Reader
(SWMR) File Access.
• Allows simultaneous reading of HDF5 file while
the file is being modified by another process
• No inter-process coordination necessary
33. Improved Multi-Threaded Concurrency
• Converting from “big lock” on code (entire
library) to locks on internal library data
structures
• Will improve ability to have multiple threads
performing HDF5 operations simultaneously
34. Other Library Features
• Saving space
• Store Partial Edge Chunks More Efficiently
• Persistent File Free Space tracking/recovery
• Allow a group’s link info to be compressed
• Saving time
• Aggregate neighboring metadata for faster
metadata cache I/O
35. New chunk indexing methods
Dataset type | Index type | Space improvements | Speed improvements
No unlimited dimensions, no filters, no missing chunks | “implicit” (no actual chunk index) | Same storage space as contiguous dataset storage (no index) | Constant time lookups; faster parallel I/O
No unlimited dimensions | “fixed sized” (smaller chunk index) | Smaller index overhead | Constant time lookups
1 unlimited dimension | “extensible array” | Smaller index overhead | Constant time lookups and appends
2+ unlimited dimensions | Improved B-tree* | Smaller index overhead | Faster
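The selection rule implied by the table above can be written as a small Python function. This is a sketch of the decision as the slide presents it, not the library's internal logic; the function name `choose_chunk_index` and its parameters are hypothetical.

```python
def choose_chunk_index(unlimited_dims, has_filters=False,
                       has_missing_chunks=False):
    """Pick an index type name following the table above."""
    if unlimited_dims < 0:
        raise ValueError("unlimited_dims must be non-negative")
    if unlimited_dims == 0:
        # With no filters and every chunk present, chunk addresses can
        # be computed directly, so no index needs to be stored.
        if not has_filters and not has_missing_chunks:
            return "implicit"
        return "fixed sized"
    if unlimited_dims == 1:
        return "extensible array"
    return "improved B-tree"
```

A fixed-shape, unfiltered, fully allocated dataset maps to the “implicit” case, while a dataset that grows along one dimension maps to the “extensible array”.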
36. Parallel I/O Improvements
• Project with Lawrence Berkeley Nat’l Lab to
improve HDF5 performance on parallel
applications
• Up to 6x performance improvements on
certain applications (so far)
38. The HDF Group
HDF-EOS library
39. EOS support
• HDF-EOS2 and HDF-EOS5
• Automatic configuration with szip
enabled/disabled
• Now tested daily with HDF4 and HDF5
development code
• Updated the HDF-EOS website
41. The Main Challenge
• Would like netCDF-4 applications to be able
to read and understand HDF-EOS 5 files
• Problem: NetCDF-4 model follows the HDF5
dimension scale model but HDF-EOS5 does
not.
HDFEOS
GRIDS
CloudFractionAndPressure
Data Fields
CloudFraction (no HDF5 dimension scales are associated with this variable)
CloudPressure (no HDF5 dimension scales are associated with this variable)
42. Our Solution – Augmentation
• Provide dimensions required by netCDF-4
HDFEOS
GRIDS
CloudFractionAndPressure
Data Fields
CloudFraction[XDim][YDim]
CloudPressure[XDim][YDim]
XDim
YDim
43. Special values in HDF5
• There are cases where a user may wish to
specify more than one “special” value to
describe non-standard data.
• We provide several examples (C, Fortran, IDL)
on how to store special values
• http://www.hdfgroup.org/pubs/rfcs/
45. OPeNDAP
• HDF5-OPeNDAP handler
• Served OMI Swath data
• HDF4-OPeNDAP handler
• Tested with some AIRS data and some MODIS
data
• More information in the Thursday morning
session
46. Swath to Grid conversion Tool
• Request from NASA GES DISC
• Convert Swath to Grid
• Support both HDF-EOS2 and TRMM data
• Still in development
[Figure: MODIS Swath and Converted Grid]
47. The HDF Group
Support for NPP/NPOESS
by
The HDF Group
48. Priorities for 2008-2009
• Data accessibility and usability
• Developed library of high level APIs to support
NPP/NPOESS data management
• Modified h5dump to display region references
• Modified HDFView to view object and region
references and quality flags
• System maintenance
• User support
49. NPOESS Project Information
• Project Web site
• http://www.hdfgroup.org/projects/npoess/
51. HDF4 Layout Map Project
• Problem
• Long-term readability of HDF data depends
on long-term availability of software
• Proposed solution
• Create a map of the layout of data objects in
an HDF file, allowing a simple reader to be
written to access the data
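The proposed solution can be illustrated with a minimal Python sketch: given a map entry that records where an array sits in a file, a reader needs nothing more than seek and read. The map entry layout used here (`offset`, `count`, `type_code`, `byte_order`) is a hypothetical stand-in, not the project's actual map format, and the demo builds a small binary file rather than a real HDF4 file.

```python
import os
import struct
import tempfile


def read_mapped_array(path, entry):
    """Read one numeric array using only an offset/count/type map entry."""
    fmt = entry["byte_order"] + str(entry["count"]) + entry["type_code"]
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        raw = f.read(struct.calcsize(fmt))
    return list(struct.unpack(fmt, raw))


def demo():
    # Stand-in file: 16 header bytes, then four big-endian int32 values.
    payload = b"\x00" * 16 + struct.pack(">4i", 10, 20, 30, 40)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        # Hypothetical map entry describing where the array lives.
        entry = {"offset": 16, "count": 4, "type_code": "i",
                 "byte_order": ">"}
        return read_mapped_array(path, entry)
    finally:
        os.remove(path)
```

The point of the sketch is that the reader depends only on the map and on basic file I/O, so the data stays readable even if the original library is no longer available.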
52. A Project with the Army Geospatial Center
TRANSFORMING THE
GEOCOMPUTATIONAL BATTLESPACE
FRAMEWORK WITH HDF5
53. Data Challenges
Military Decision Making
[Figure labels: Wide variety, Satellite, Buckeye, Large scale, Culture, High res., Stream, High efficiency, Accuracy, Time]
54. NIH STTR with Geospiza, Seattle WA
BIOHDF: TOWARD SCALABLE BIOINFORMATICS INFRASTRUCTURES
55. Next Generation DNA Sequencing
NGS is Powerful
“Transforms today’s biology”
“Democratizing genomics”
“Genome center in a mail room”
“Changing the landscape”
56. … And Daunting
“Prepare for the deluge”
“Byte-ing off more than you can chew”
57. BioHDF Project
• Goal: Move bioinformatics problems from organizing
and structuring data to asking questions and
visualizing data
• Develop data models and tools to work with NGS data in HDF5
• Create HDF5 domain-specific extensions and library modules to
support the unique aspects of NGS data (BioHDF)
• Integrate BioHDF technologies into Geospiza products
• Deliver core BioHDF technologies to the community
as open-source software
58. The HDF Group
Thank You All
and
Thank You NASA!
59. Acknowledgements
• This report is based on work supported by
cooperative agreement number NNX08AO77A
from the National Aeronautics and Space
Administration (NASA).
• Any opinions, findings, conclusions, or
recommendations expressed in this material
are those of the author[s] and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.
Why
Increasing need for support, services, quick response
Not a good model for a University R&D project
Who
11 software engineers and several students: develop, maintain HDF software, work on special projects, manage projects
3 tech support staff: helpdesk, doc, sysadmin.
Management team
President
Director of Technical Services and Operations
Director of Software Development
Director of Business Operations
Managers responsible for tools, applications
Other THG staff include seven full-time software engineers, who develop and maintain the HDF software and work on special projects, and three technical support staff, who provide helpdesk support, documentation, and system administration. The HDF Group also generally employs students from the University Computer Science and Engineering departments.
The R&D mission
Maintain and evolve HDF for high end science apps
Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid
Support academic science
Cutting edge data management research
Adapt to leading edge, experimental architectures
Integrate with new middleware technologies, parallel file systems
The “Support and Sustain” mission
Maintain, evolve for communities, sponsors
Provide proprietary consulting, tuning, development
Sustain for long term, maintain data access over time
Please mention here that HDF5 maintenance releases are on a half-yearly basis and HDF4 maintenance releases are on a yearly basis, i.e., the next maintenance release of HDF5 1.6 and 1.8 will be May 2009, and HDF4 in November 2009.
Options to dump data into ASCII (compatible w. h5import and Excel)
- As long as the file system is POSIX compliant
Other processes can be on other systems (as long as shared file system is POSIX compliant)
Store Partial Edge Chunks More Efficiently
Allow application to control whether partially used chunks at edges of datasets are compressed and/or allocated as full chunks in file.
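To see why partial edge chunks matter, the following Python sketch compares the number of elements in a dataset with the number that would be allocated if every chunk, including the partially used ones at the edges, were stored in full. The function name `edge_chunk_overhead` is hypothetical and the calculation is plain arithmetic, not a description of the library's allocation code.

```python
import math


def edge_chunk_overhead(shape, chunk):
    """Return (elements used, elements allocated with full edge chunks)."""
    if len(shape) != len(chunk):
        raise ValueError("shape and chunk must have the same rank")
    used = math.prod(shape)
    # Number of chunks along each dimension, rounding up so that a
    # partially filled edge chunk still counts as one chunk.
    chunks_per_dim = [(s + c - 1) // c for s, c in zip(shape, chunk)]
    allocated = math.prod(n * c for n, c in zip(chunks_per_dim, chunk))
    return used, allocated
```

For a 1000 x 1000 dataset with 300 x 300 chunks, four chunks are needed along each dimension, so full allocation covers 1200 x 1200 = 1,440,000 elements for 1,000,000 elements of actual data.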
Persistent File Free Space tracking
No more “forgetting where all the free space in the file is” when the file is closed
Allow a group’s heaps (which store link info) to be compressed