Using HDF5 tools for performance tuning and troubleshooting
1. Using HDF5 tools for performance tuning and troubleshooting
2/18/2014
HDF and HDF-EOS Workshop X, Landover, MD
2. Introduction
• HDF5 tools may be very useful for performance tuning
and troubleshooting
• Discover objects and their properties in HDF5 files
h5dump -p
• Get file size overhead information
h5stat
• Get locations of the objects in a file
h5ls
• Discover differences
h5diff, h5ls
• Location of raw data
h5ls -vra
3. h5stat
• Prints various statistics about an HDF5 file
• Helps
• To troubleshoot size overhead in HDF5 files
• To choose object properties and storage strategies
• To use
h5stat --help
h5stat file.h5
• Spec can be found
http://www.hdfgroup.org/RFC/h5stat/
• Let us know if you need some “special” type of statistics
4. h5stat
• Reports two types of statistics:
• High-level information about objects (examples):
• Number of different objects (groups, datasets, datatypes) in
a file
• Number of unique datatypes
• Size of raw data in a file
• Information about objects’ structural metadata
• Sizes of structural metadata (total/free)
• Object headers, local and global heaps
• Sizes of B-trees
• Object headers fragmentation
5. h5stat
• Examples of high-level information:
File information
# of unique groups: 10008
# of unique datasets: 30
# of unique named datatypes: 0
……………………
Max. # of links to object: 1
Max. depth of hierarchy: 4
Max. # of objects in group: 19
……………………
Group bins:
# of groups of size 0: 10000
# of groups of size 1 - 9: 7
# of groups of size 10 - 99: 1
……………………
Max. dimension size of 1-D datasets: 1643
……………………
Dataset filters information:
Number of datasets with
………………
SZIP filter: 2
………………
NBIT filter: 10
USER-DEFINED filter: 1
6. h5stat
• Conclusion:
• The file contains many empty groups (10000 of 10008), which makes it a good candidate for the compact group feature
• One dataset uses a “user-defined” filter and may not be readable by the HDF5 library unless that filter is available
• SZIP compression is needed to read two of the datasets
Oh… my application uses buffers of size 1024 to read data, but the largest 1-D dataset has 1643 elements…
No wonder it crashes on reading…
Do I have all the filters needed to read the data?
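The checks behind these conclusions can be scripted. The following is a minimal sketch, assuming the h5stat text output keeps the "label: integer" form shown on the previous slide. The helper `parse_h5stat_counts` and the constant `BUFFER_ELEMENTS` are hypothetical names used only for this illustration, and the sample text is an excerpt of the output above.

```python
import re

# Excerpt of the h5stat output shown on the previous slide.
SAMPLE = """\
File information
    # of unique groups: 10008
    # of unique datasets: 30
Group bins:
    # of groups of size 0: 10000
Max. dimension size of 1-D datasets: 1643
    SZIP filter: 2
    NBIT filter: 10
    USER-DEFINED filter: 1
"""


def parse_h5stat_counts(text):
    """Collect every 'label: integer' line into a dict (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


counts = parse_h5stat_counts(SAMPLE)

BUFFER_ELEMENTS = 1024  # assumption: the buffer size the application uses

# Is the application's buffer smaller than the largest 1-D dataset?
buffer_too_small = counts["Max. dimension size of 1-D datasets"] > BUFFER_ELEMENTS

# What fraction of the groups is empty?
empty_fraction = counts["# of groups of size 0"] / counts["# of unique groups"]

# Does reading the file require filters that may not be installed?
needs_szip = counts.get("SZIP filter", 0) > 0
needs_user_filter = counts.get("USER-DEFINED filter", 0) > 0

print(buffer_too_small, round(empty_fraction, 3), needs_szip, needs_user_filter)
```

Section headings such as "Group bins:" carry no number and are skipped by the pattern, so only the counters end up in the dictionary.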
8. h5stat
• Conclusions
• File size: 6228197
• 1.5% overhead (not bad at all!)
• Some elements are of size 65535 and 32000
Oh… Is that really what I want?
Should I use a different datatype and take advantage of compression?
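To put the overhead figure in perspective, the percentage can be converted back into bytes. This is a small sketch that uses only the two numbers on this slide; `overhead_bytes` is a hypothetical helper, and the result is approximate because the slide reports a rounded percentage.

```python
def overhead_bytes(file_size, overhead_percent):
    """Approximate number of bytes in the file that are not raw data."""
    return round(file_size * overhead_percent / 100)


FILE_SIZE = 6228197      # bytes, as reported on the slide
OVERHEAD_PERCENT = 1.5   # rounded value from the slide

approx_overhead = overhead_bytes(FILE_SIZE, OVERHEAD_PERCENT)
approx_raw_data = FILE_SIZE - approx_overhead

print(f"about {approx_overhead} bytes of overhead, "
      f"about {approx_raw_data} bytes of raw data")
```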
9. Case study: Using HDF5 tools to debug a problem
• My application creates files on Windows with VS2005 and VS2003. I can
read the VS2003 file but not the VS2005 one. h5dump reads both files
OK and shows no differences. What am I doing wrong?
• h5diff good.h5 bad.h5
    Datatype: </Definitions/timespec> and </Definitions/timespec>
    1 differences found
• h5ls -vr good.h5
    /Definitions/timespec    Type
        Location:  0:1:0:900
• h5debug good.h5 900
    Message Information:
        Type class:  compound
        Size:        8 bytes
• h5debug bad.h5 900
    Message Information:
        Type class:  compound
        Size:        16 bytes
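The step from the h5ls output to the h5debug call can be sketched as follows. On this slide the number passed to h5debug (900) matches the last colon-separated field of the "Location" value. The sketch relies on that observation only, so treat it as an assumption about this output format, not a documented rule. `object_address` is a hypothetical helper, and the commands are built but not executed.

```python
def object_address(location):
    """Return the last colon-separated field of an h5ls 'Location' value."""
    return int(location.split(":")[-1])


address = object_address("0:1:0:900")

# Command lines as used on the slide, one per file to compare.
commands = [["h5debug", name, str(address)] for name in ("good.h5", "bad.h5")]

for command in commands:
    print(" ".join(command))
```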
10. Case study: Using HDF5 tools to debug a problem
• Conclusions
• The compound datatype “timespec” requires a different
number of bytes on VS2005 (16 bytes; 2 x 8 bytes) than
on VS2003 (8 bytes; 2 x 4 bytes)
Oh… How do I read my data back?
I assumed that my struct would need only 8 bytes for each element, but
it needs 16 bytes on VS2005. I need the H5Tget_native_type function
to find the type of my data in memory
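The size mismatch itself is easy to reproduce outside HDF5. The sketch below uses Python's `ctypes` to model the two layouts described above. The class names and the field names `tv_sec` and `tv_nsec` are assumptions for illustration; the slide states only that the struct has two 4-byte members in one build and two 8-byte members in the other.

```python
import ctypes


class Timespec32(ctypes.Structure):
    # Assumed layout of the VS2003 build: two 4-byte fields.
    _fields_ = [("tv_sec", ctypes.c_int32), ("tv_nsec", ctypes.c_int32)]


class Timespec64(ctypes.Structure):
    # Assumed layout of the VS2005 build: two 8-byte fields.
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_int64)]


small = ctypes.sizeof(Timespec32)
large = ctypes.sizeof(Timespec64)
print(small, large)

# One element sized the way the application assumed (8 bytes).
record = bytes(small)

# Interpreting it with the 16-byte layout fails because the buffer is too short.
try:
    Timespec64.from_buffer_copy(record)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print("size mismatch:", err)
```

In the HDF5 case the same mismatch is avoided by asking the library for the in-memory type of the stored data, as noted above, instead of hard-coding the element size.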
11. Where is my data?
• h5ls -var be_data.h5:
    Opened "be_data.h5" with sec2 driver.
    /Array                   Dataset {5/5, 6/6}
        Location:  0:1:0:792
        Links:     1
        Modified:  2006-04-07 15:08:39 CDT
        Storage:   240 logical bytes, 240 allocated bytes, 100.00% utilization
        Type:      IEEE 64-bit big-endian float
        Address:   2048
• The 30 8-byte elements (5 x 6) can be read directly from address 2048 by a non-HDF5 application
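A non-HDF5 reader of this kind can be sketched with the Python standard library alone. It assumes the raw data are stored in one unfiltered, contiguous block, which is what the single address and the 100% utilization in the listing suggest. `read_raw_doubles` is a hypothetical helper name.

```python
import struct


def read_raw_doubles(path, address, count):
    """Read `count` big-endian IEEE 64-bit floats starting at byte `address`."""
    with open(path, "rb") as f:
        f.seek(address)
        raw = f.read(count * 8)
    if len(raw) != count * 8:
        raise ValueError("file is shorter than expected")
    # ">" selects big-endian byte order, "d" an 8-byte IEEE float.
    return struct.unpack(f">{count}d", raw)


# Usage with the values from the listing above (5 x 6 = 30 elements):
# values = read_raw_doubles("be_data.h5", 2048, 5 * 6)
```

The address is valid only for this particular file; if the dataset is rewritten, chunked, or compressed, the raw bytes are no longer laid out this way and the address must be looked up again.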