This 2009 tutorial covers basic HDF5 Data Model objects and their properties. It includes an overview of the HDF5 libraries and APIs and describes the HDF5 programming model. Simple programming examples and the HDFView data browser are used to illustrate HDF5 concepts and to help you start developing your own HDF5-based applications.
This tutorial is for new HDF5 users.
1. The HDF Group
Introduction to HDF5
Barbara Jones
The HDF Group
The 13th HDF & HDF-EOS Workshop
November 3-5, 2009
HDF/HDF-EOS Workshop XIII
www.hdfgroup.org
2. Before We Begin …
HDF-EOS Home Page:
http://hdfeos.org/
Workshop Info:
http://hdfeos.org/workshops/ws13/workshop_thirteen.php
The HDF Group Page:
http://hdfgroup.org/
HDF5 Home Page:
http://hdfgroup.org/HDF5/
HDF Helpdesk:
help@hdfgroup.org
HDF Mailing Lists:
http://hdfgroup.org/services/support.html
3. HDF = Hierarchical Data Format
HDF5 is the second HDF format
• Development started in 1996
• First release was in 1998
HDF4 is the first HDF format
• Originally called HDF
• Development started in 1987
• Still supported by The HDF Group
5. HDF5 is designed …
• for high volume and/or complex data
• for every size and type of system (portable)
• for flexible, efficient storage and I/O
• to enable applications to evolve in their use of
HDF5 and to accommodate new models
• to support long-term data preservation
6. HDF5 Technology
HDF5 is a data model, library and file format for
managing data.
7. HDF5 Technology
• HDF5 (Abstract) Data Model
  • Defines the “building blocks” for data organization and specification
  • Files, Groups, Datasets, Attributes, Datatypes, Dataspaces, …
• HDF5 Library (C, Fortran 90, C++ APIs)
  • Also Java Language Interface and High Level Libraries
• HDF5 Binary File Format
  • Bit-level organization of HDF5 file
  • Defined by HDF5 File Format Specification
• Tools For Accessing Data in HDF5 Format
  • h5dump, h5repack, HDFView, …
8. The HDF Group
HDF5 Abstract Data Model
a.k.a. HDF5 Logical Data Model
a.k.a. HDF5 Data Model
9. HDF5 File
An HDF5 file is a container that holds data objects.
[Figure: an HDF5 file holding a table (lat | lon | temp) and a block of experiment notes (Serial Number: 99378920, Date: 3/13/09, Configuration: Standard 3)]
10. HDF5 Groups and Links
HDF5 groups and links organize data objects.
[Figure: root group “/” with the groups SimOut and Viz, the experiment notes, and the table below]
Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Configuration: Standard 3
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
11. HDF5 Objects
The two primary HDF5 objects are:
• HDF5 Group: A grouping structure containing
zero or more HDF5 objects
• HDF5 Dataset: Raw data elements, together
with information that describes them
(There are other HDF5 objects that help support
Groups and Datasets.)
12. HDF5 Groups
• Used to organize collections
• Every file starts with a root group
• Similar to UNIX directories
• Path to object defines it
• Objects can be shared:
/A/k and /B/l are the same temp
[Figure: root group “/” with groups A, B, and C; link k in A and link l in B both lead to the dataset temp. Legend: Group, Dataset]
13. HDF5 Datasets
HDF5 Datasets organize and contain your
“raw data values”. They consist of:
• Your raw data
• Metadata describing the data:
- The information to interpret the data (Datatype)
- The information to describe the logical layout of the
data elements (Dataspace)
- Characteristics of the data (Properties)
- Additional optional information that describes the
data (Attributes)
15. HDF5 Dataspaces
An HDF5 Dataspace describes the logical layout
for the data elements:
• Array
  • multiple elements in dataset organized in a multi-dimensional (rectangular) array
  • maximum number of elements in each dimension may be fixed or unlimited
• NULL
  • no elements in dataset
• Scalar
  • single element in dataset
16. HDF5 Dataspaces
Two roles:
1. Dataspace contains spatial information (logical layout) about a dataset stored in a file
• Rank and dimensions
• Permanent part of dataset definition
Example: Rank = 2, Dimensions = 4x6
2. Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O
Example: Rank = 1, Dimension = 10
17. HDF5 Datatypes
The HDF5 datatype describes how to interpret
individual data elements.
HDF5 datatypes include:
− integer, float, unsigned, bitfield, …
− user-definable (e.g., 13-bit integer)
− variable length types (e.g., strings)
− references to objects/dataset regions
− enumerations - names mapped to integers
− opaque
− compound (similar to C structs)
19. HDF5 Properties
• Properties (also known as Property Lists)
are characteristics of HDF5 objects that can
be modified
• Default properties handle most needs
• By changing properties one can take
advantage of the more powerful features in
HDF5
20. Storage Properties
Contiguous (default): Data elements stored physically adjacent to each other
Chunked: Better access time for subsets; extensible
Chunked & Compressed: Improves storage efficiency, transmission speed
21. HDF5 Attributes (optional)
• An HDF5 attribute has a name and a value
• Attributes typically contain user metadata
• Attributes may be associated with
- HDF5 groups
- HDF5 datasets
- HDF5 named datatypes
• An attribute’s value is described by a datatype and a
dataspace
• Attributes are analogous to datasets except…
- they are NOT extensible
- they do NOT support compression or partial I/O
22. HDF5 Abstract Data Model Summary
• The Objects in the Data Model are the “building
blocks” for data organization and specification
• Files, Groups, Links, Datasets, Datatypes,
Dataspaces, Attributes, …
• Projects using HDF5 “map” their data concepts to
these HDF5 Objects
23. The HDF Group
HDF5 Software
24. HDF5 Software Layers & Storage
[Figure: HDF5 software layers and storage, from top to bottom]
• API: High Level APIs, Language Interfaces (C, Fortran, C++), Java Interface, tools (h5dump, h5repack, HDFView), …
• HDF5 Data Model Objects: Groups, Datasets, Attributes, …
• Tunable Properties: Chunk Size, I/O Driver, …
• Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
• Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
• Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other
25. HDF5 API and Applications
[Figure: layers from top to bottom]
• Applications: a Climate Model, EOS library, MATLAB, …
• Domain Data Objects
• HDF5 Library
• Storage
26. HDF5 Home Page
HDF5 home page: http://hdfgroup.org/HDF5/
• Two releases: HDF5 1.8 and HDF5 1.6
HDF5 source code:
• Written in C, and includes optional C++, Fortran 90 APIs, and High Level APIs
• Contains command-line utilities (h5dump, h5repack, h5diff, …) and compile scripts
HDF pre-built binaries:
• When possible, include C, C++, F90, and High Level libraries. Check ./lib/libhdf5.settings file.
• Built with and require the SZIP and ZLIB external libraries
27. Useful Tools For New Users
h5dump:
Tool to “dump” or display contents of HDF5 files
h5cc, h5c++, h5fc:
Scripts to compile applications
HDFView:
Java browser to view HDF4 and HDF5 files
http://www.hdfgroup.org/hdf-java-html/hdfview/
28. h5dump Utility
h5dump [options] [file]
-H, --header    Display header only – no data
-d <names>      Display the specified dataset(s).
-g <names>      Display the specified group(s) and all members.
-p              Display properties.
<names> is one or more appropriate object names.
30. HDF5 Compile Scripts
• h5cc – HDF5 C compiler command
• h5fc – HDF5 F90 compiler command
• h5c++ – HDF5 C++ compiler command
To compile:
% h5cc h5prog.c
% h5fc h5prog.f90
31. Compile option: -show
-show: displays the compiler commands and options
without executing them
% h5cc -show Sample_c.c
Will show the correct paths and libraries used by
the installed HDF5 library.
Will show the correct flags to specify when
building an application with that HDF5 library.
32. The HDF Group
Browsing HDF5 Files with
HDFView
37. Operations Supported by the API
• Create objects (groups, datasets, attributes, complex data
types, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content,
properties
38. General Programming Paradigm
• Properties of object are optionally defined
Creation properties
Access properties
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
39. Order of Operations
• An order is imposed on operations by
argument dependencies
For Example:
A file must be opened before a dataset
- because the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.
40. The General HDF5 API
• Currently C, Fortran 90, Java, and C++
bindings.
• C routines begin with prefix H5?
? is a character corresponding to the type of
object the function acts on
Example Functions:
H5D : Dataset interface    e.g., H5Dread
H5F : File interface       e.g., H5Fopen
H5S : dataSpace interface  e.g., H5Sclose
41. HDF5 Defined Types
For portability, the HDF5 library has its own defined
types:
hid_t:    object identifiers (native integer)
hsize_t:  size used for dimensions (unsigned long or unsigned long long)
herr_t:   function return value
hvl_t:    variable length datatype
For C, include hdf5.h in your HDF5 application.
42. The HDF5 API
• For flexibility, the API is extensive
[Image: Victorinox Swiss Army CyberTool 34]
300+ functions
• This can be daunting… but there is hope
A few functions can do a lot
Start simple
Build up knowledge as more features are
needed
43. Basic Functions
H5Fcreate (H5Fopen)          create (open) File
H5Screate_simple/H5Screate   create dataSpace
H5Dcreate (H5Dopen)          create (open) Dataset
H5Dread, H5Dwrite            access Dataset
H5Dclose                     close Dataset
H5Sclose                     close dataSpace
H5Fclose                     close File
NOTE: The order specified above is not required.
44. Other Common Functions
DataSpaces:
H5Sselect_hyperslab (Partial I/O)
H5Sselect_elements (Partial I/O)
H5Dget_space
Groups:
H5Gcreate, H5Gopen, H5Gclose
Attributes:
H5Acreate, H5Aopen_name,
H5Aclose, H5Aread, H5Awrite
Property lists:
H5Pcreate, H5Pclose
H5Pset_chunk, H5Pset_deflate
45. High Level APIs
• Included along with the HDF5 library
• Simplify steps for creating, writing, and
reading objects.
• Do not entirely ‘wrap’ HDF5 library
46. The HDF Group
Example HDF5 Code
47. Steps to Create a File
1. Decide on properties the file should have and create them if necessary:
• Creation properties, like size of user block
• Access properties (improve performance)
• Use default properties (H5P_DEFAULT)
2. Create the file
3. Close the file and the property lists, as needed
48. Code: Create a File
hid_t  file_id;
herr_t status;
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
status = H5Fclose (file_id);
“/” (root)
Note: Return codes not checked for errors in code samples.
50. Steps to Create a Dataset
1. Define dataset characteristics
a) Datatype – integer
b) Dataspace - 4x6
c) Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
• Obtain location ID:
- Group ID puts it in a Group
- File ID puts it in Root Group
3. Create dataset in file
4. Close everything
[Figure: “/” (root) containing dataset A]
51. HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of Datatype Identifiers per HDF5 session.
For example:
C Type | HDF5 File Type | HDF5 Memory Type
-------|----------------|-----------------
int | H5T_STD_I32BE, H5T_STD_I32LE | H5T_NATIVE_INT
float | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE
* Value of datatype is NOT fixed
52. Pre-defined File Datatype Identifiers
Examples:
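H5T_IEEE_F64LE  Eight-byte, little-endian, IEEE floating-point
H5T_STD_I32LE   Four-byte, little-endian, signed two's complement integer
Name structure: H5T_<Architecture*>_<Programming Type>, e.g., in H5T_STD_I32LE the Architecture is STD and the Programming Type is I32LE.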
NOTE: What you see in the file. Name is the same everywhere and
explicitly defines a datatype.
*STD= “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
53. Pre-defined Native Datatypes
Examples of predefined native types in C:
H5T_NATIVE_INT    (int)
H5T_NATIVE_FLOAT  (float)
H5T_NATIVE_UINT   (unsigned int)
H5T_NATIVE_LONG   (long)
H5T_NATIVE_CHAR   (char)
NOTE: Memory types.
Different for each machine.
Used for reading/writing.
55. Code: Create a Dataset
hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Define a dataspace: rank 2, current dims 4x6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE,
                        dataspace_id, H5P_DEFAULT,
                        H5P_DEFAULT,
                        H5P_DEFAULT);

status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);
56. Code: Create a Dataset
hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

dataset_id = H5Dcreate (file_id, "A",    /* Where to put it */
                        H5T_STD_I32BE,   /* Datatype */
                        dataspace_id,    /* Size & shape */
                        H5P_DEFAULT,     /* Properties: Link Creation, */
                        H5P_DEFAULT,     /*   Dataset Creation, */
                        H5P_DEFAULT);    /*   and Dataset Access */
58. Example Code - H5Dwrite
status = H5Dwrite (dataset_id,       /* Dataset ID from H5Dcreate/H5Dopen */
                   H5T_NATIVE_INT,   /* Memory Datatype */
                   H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);
59. Partial I/O
status = H5Dwrite (dataset_id, H5T_NATIVE_INT,
                   H5S_ALL,          /* Memory Dataspace */
                   H5S_ALL,          /* File Dataspace (disk) */
                   H5P_DEFAULT, wdata);
To Modify Dataspace:
H5Sselect_hyperslab
H5Sselect_elements
60. Example Code – H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT,
                   H5S_ALL, H5S_ALL,
                   H5P_DEFAULT,      /* Data Transfer Property List (MPI I/O, Transformations, …) */
                   wdata);
61. Example Code – H5Dread
status = H5Dread (dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
62. High Level APIs: HDF5 Lite (H5LT)
#include "hdf5_hl.h"
.
.
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
status = H5LTmake_dataset (file_id, "A", 2, dims,
                           H5T_STD_I32BE, data);
status = H5Fclose (file_id);
63. Steps to Create a Group
1. Decide where to put it – “root group”
• Obtain location ID
2. Define properties or use H5P_DEFAULT
3. Create group in file.
4. Close the group.
64. Example: Create a Group
[Figure: file.h5 with root group “/” containing dataset A (4x6 array of integers) and group B]
65. Code: Create a Group
hid_t  file_id, group_id;
herr_t status;
...
/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR,
                   H5P_DEFAULT);
/* Create group "/B" in file. */
group_id = H5Gcreate (file_id, "B", H5P_DEFAULT,
                      H5P_DEFAULT, H5P_DEFAULT);
/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);
66. HDF5 Tutorial and Examples
HDF5 Tutorial:
http://www.hdfgroup.org/HDF5/Tutor/
HDF5 Example Code:
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/
67. The HDF Group
Thank You!
68. Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.
Data Array is an ordered collection of identically typed data items distinguished by their indices
Metadata:
Dataspace – Rank, dimensions; spatial info about dataset
Datatype – Information on how to interpret your data
Storage Properties – How array is organized
Attributes – User-defined metadata (optional)
To create this file, we would start by creating the file itself. When you create a file, the root group gets created with it. So every file has at least that one group.