This tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties, give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts.
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Introduction to HDF5
1. Introduction to HDF5
HDF and HDF-EOS Workshop XI
November 6-8, 2007
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
1
2. Goals
• Introduce HDF5
• Explain how data can be organized and used
in an application
• Provide example code
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
2
3. For More Information…
All workshop slides will be available from:
http://hdfeos.org/workshops/ws11/workshop_eleven.php
See the Resources handout for where to get software,
Docs, FAQs, etc..
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
3
4. What is HDF5?
HDF = Hierarchical Data Format
• File format for managing any kind of data
• Software (library and tools) for accessing data in
that format
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
4
5. HDF5 Features
• Especially suited for large and/or complex
data collections.
• Platform independent
• C, F90, C++ , Java APIs
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
5
12. HDF5 File
Container for Storing Scientific Data
• Primary Objects
- Datasets
- Groups
• Others Objects
- Attributes
- Property Lists
- Dataspaces
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
12
13. HDF5 Dataset
• Data array
- Ordered collection of identically typed data items
distinguished by their indices
• Metadata
- Dataspace: Rank, dimensions; spatial info about dataset
- Datatype: Information to interpret your data
- Storage Properties: How array is organized
- Attributes: User-defined metadata (optional)
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
13
15. HDF5 Dataset: Dataspace
Spatial Information about a dataset
• Rank and dimensions
- Permanent part of dataset definition
• Subset of points, for partial I/O
- Needed only during I/O operations
Rank = 2
Dimensions = 4x6
• Apply to datasets in memory or in
the file
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
15
16. HDF5 Dataset: Compound Datatype
3
5
Dimensionality: 5 x 3
int8
Datatype:
int4
int16 2x3x2 array of float32
Each Element
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
16
17. HDF5 Dataset: Datatype
Information on how to interpret a data element
• Permanent part of the dataset definition
• HDF5 atomic types
- normal integer & float
- user-definable (e.g. 13-bit integer)
- variable length types (e.g. strings)
- pointers - references to objects/dataset regions
- enumeration - names mapped to integers
- array
• HDF5 compound types
- Comparable to C structs
- Members can be atomic or compound types
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
17
18. HDF5 Dataset: Property List
A collection of values that can be passed to HDF5
functions at lower layers of library
• There are property lists that you can use when:
- creating a file
- accessing a file
- creating a dataset
- reading/writing to a dataset.
• To use the HDF5 library defaults: H5Pdefault
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
18
19. HDF5 Dataset: Storage Layout Properties
• Contiguous: Dataset stored in continuous array of
bytes (Default)
• Chunked: Dataset stored as fixed sized chunks. Each
chunk is read/written with a single I/O operation.
Required for:
- compression
- unlimited dimension dataset (extendible)
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
19
20. HDF5 Dataset: Properties
Better subsetting
access time; extend,
compression
Chunked
Improves storage
efficiency,
transmission speed
Compressed
Arrays can be
extended in any
direction
Extendible
External
file
11/6/07
Dataset “Fred”
File B
File A
Metadata for Fred
Data for Fred
Metadata in one file,
raw data in another.
HDF and HDF-EOS Workshop XI, Landover, MD
20
21. HDF5 Dataset: Attributes
Data of form “name = value” attached to an object
• Scaleddown versions of dataset operations
- Not extendible
- No compression
- No partial I/O
• Optional
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
21
22. HDF5 Dataset (again)
• Data array
- Ordered collection of identically typed data items
distinguished by their indices
• Metadata
- Dataspace:
- Datatype:
- Properties:
- Attributes:
11/6/07
Rank, dimensions; spatial info about dataset
Information to interpret your data
How array is organized
User-defined metadata (optional)
HDF and HDF-EOS Workshop XI, Landover, MD
22
23. HDF5 File: Groups
A mechanism for describing collections of
related objects
• Every file starts with a
root group
“/”
• Can have attributes
• Similar to UNIX
directories
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
23
24. HDF5 objects are identified and
located by their pathnames
/ (root)
/x
/foo
/foo/temp
/foo/bar/temp
11/6/07
foo
temp
“/”
x
bar
temp
HDF and HDF-EOS Workshop XI, Landover, MD
24
26. Structure of HDF5 Library
Applications
Object API (C, F90, C++, Java)
Library internals
Virtual file I/O
File or other “storage”
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
26
27. Virtual File I/O Layer
Allows HDF5 format address space to map to disk, the
network, memory, or a user-defined device
Virtual file I/O drivers
Stdio
File Family
MPI I/O
Memory
Network
…
“Storage”
File
11/6/07
File
Family
Memory
Network
…
HDF and HDF-EOS Workshop XI, Landover, MD
27
29. General API Topics
• General info about HDF5 programming (C )
• Walk through example program
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
29
30. The General HDF5 API
• Currently has C, Fortran 90, Java, C++ bindings.
• C routines begin with prefix H5X, where X is a single
letter indicating the object on which the operation is
to be performed.
Example APIs:
H5F: File Interface:
H5D: Dataset Interface:
H5S: DataSpace Interface:
H5P: Property List Interface:
H5G: Group Interface:
H5A: Attribute Interface:
11/6/07
H5Fopen
H5Dread
H5Screate_simple
H5Pset_chunk
H5Gcreate
H5Acreate
HDF and HDF-EOS Workshop XI, Landover, MD
30
31. The General Paradigm
•
•
•
•
11/6/07
Properties of objects are defined (optional)
Objects are opened or created
Objects then accessed
Objects finally closed
HDF and HDF-EOS Workshop XI, Landover, MD
31
32. Order of Operations
The library imposes an order on the operations by
argument dependencies
Example: A file must be opened before a dataset
because the dataset open call requires a file identifier as
an argument
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
32
33. HDF5 C Programming Issues
For portability, HDF5 library has its own defined types.
For example:
hid_t:
Object identifiers
hsize_t:
Size used for dimensions
herr_t:
Function return value
For C, include #include hdf5.h at the top of your HDF5
application.
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
33
34. H5dump Command-line Utility To View HDF5 File
h5dump [--header] [-a ] [-d <names>] [-g <names>]
[-l <names>] [-t <names>] <file>
--header
Display header only; no data is displayed.
-a <names> Display the specified attribute(s).
-d <names> Display the specified dataset(s).
-g <names> Display the specified group(s) and all the members.
-l <names>
Displays the value(s) of the specified soft link(s).
-t <names> Display the specified named datatype(s).
-p
Display properties.
<names> is one or more appropriate object names.
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
34
36. Example HDF5 Application
1 #include "hdf5.h"
2 #define FILE "dset.h5"
3 int main () {
4 hid_t
file_id, dataset_id, dataspace_id;
5 hsize_t dims[2];
6 herr_t
status;
7 int
i, j, dset_data[4][6];
Steps:
11
Create (or use default) file creation/access
properties
11
Create file w/ above properties
12-15 Create (or use default) dataset
characteristics:
[dataspace, datatype, storage properties]
15
Create dataset using above characteristics
16
Write data to dataset
17-19 Close all interfaces
8
9
10
for (i = 0; i < 4; i++)
for (j = 0; j < 6; j++)
dset_data[i][j] = i * 6 + j + 1;
11
file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
12
13
14
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);
15
dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
H5P_DEFAULT);
16 status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
dset_data);
17
18
19
status = H5Sclose (dataspace_id);
status = H5Dclose (dataset_id);
status = H5Fclose (file_id);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
36
37. Example Code - Dataspace
Rank
Array of Dimension
Sizes (4x6)
NOT used here.
12
13
14
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);
15
dataset_id = H5Dcreate (file_id, “/dset", H5T_STD_I32BE, dataspace_id,
H5P_DEFAULT);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
37
38. Example Code - Datatype
12
13
14
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);
15
dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
H5P_DEFAULT);
Where do you
get the datatype?
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
38
39. HDF5 Pre-defined Datatype Identifiers
HDF5 opens set of Pre-Defined Datatype identifiers.
For example:
C Type
HDF5 File Type
HDF5 Memory Type
int
H5T_STD_I32BE
H5T_STD_I32LE
H5T_NATIVE_INT
float
H5T_IEEE_F32BE
H5T_IEEE_F32LE
H5T_NATIVE_FLOAT
double
H5T_IEEE_F64BE
H5T_IEEE_F64LE
H5T_NATIVE_DOUBLE
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
39
40. Pre-Defined File Datatype Identifiers
Examples:
H5T_IEEE_F64LE Eight-byte, little-endian, IEEE floating-point
H5T_VAX_F32
Four-byte VAX floating point
H5T_STD_I32LE Four-byte, little-endian, signed two's
complement integer
H5T_STD_U16BE Two-byte, big-endian, unsigned integer
Architecture*
Programming
Type
NOTE: What you see in the file. Name is the same everywhere and
explicitly defines a datatype.
*STD= “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
40
41. Pre-defined Native Datatype Identifiers
Examples of predefined native types in C:
H5T_NATIVE_INT
H5T_NATIVE_FLOAT
H5T_NATIVE_UINT
H5T_NATIVE_LONG
H5T_NATIVE_CHAR
(int)
(float )
(unsigned int)
(long )
(char )
NOTE: Memory types.
Different for each machine.
Used for reading/writing.
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
41
42. Example Code - H5Dwrite
Dataset Identifier from
H5Dcreate or H5Dopen
Memory Datatype
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL,
H5S_ALL, H5P_DEFAULT, dset_data);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
42
43. Example Code – H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
H5P_DEFAULT, dset_data);
Data Transfer
Property List
Memory
Dataspace
File
Dataspace
H5S_ALL selects entire
dataspace
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
43
44. Memory and File Dataspaces – Why?
Partial I/O: Selected elements from source are mapped
(read/written) to selected elements in destination
- Selections in memory can differ from selection in file:
- Number of selected elements must be the same in source
and destination
- Selection can be slabs, points, or result of set operations
(union, difference ..) on slabs or points
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
44
45. Example Code To:
• Create dataset in a group other than root
• Open file and dataset and read data
• Create an attribute for the dataset
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
45
46. Example HDF5 Application
1 #include "hdf5.h"
2 #define FILE "dset.h5"
3 int main () {
4 hid_t
file_id, dataset_id, dataspace_id;
5 hsize_t dims[2];
6 herr_t
status;
7 int
i, j, dset_data[4][6];
8
9
10
for (i = 0; i < 4; i++)
for (j = 0; j < 6; j++)
dset_data[i][j] = i * 6 + j + 1;
11
file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
12
13
14
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);
15
dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
H5P_DEFAULT);
16 status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
dset_data);
17
18
19
status = H5Sclose (dataspace_id);
status = H5Dclose (dataset_id);
status = H5Fclose (file_id);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
46
47. How to put Dataset in a Group?
hid_t
…
group_id;
Steps:
a. Create a group
b. Insert the dataset
into the group
c. Close the group
a group_id = H5Gcreate (file_id, "mygroup", 0);
…
b dataset_id = H5Dcreate (group_id, "dset", H5T_STD_I32BE,
dataspace_id, H5P_DEFAULT);
c status = H5Gclose (group_id);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
47
48. h5dump Output w/Dataset in a Group
$ h5dump dset.h5
Note that dataset
HDF5 "dset.h5" {
is in the group
GROUP "/" {
“mygroup”
GROUP "mygroup" {
DATASET "dset" {
DATATYPE H5T_STD_I32BE
“/”
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 1, 2, 3, 4, 5, 6,
mygroup
(1,0): 7, 8, 9, 10, 11, 12,
(2,0): 13, 14, 15, 16, 17, 18,
(3,0): 19, 20, 21, 22, 23, 24
}
}
dset
}
}
}
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
48
49. How to Read an Existing Dataset
file_id = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
dataset_id = H5Dopen (file_id, "/dset");
status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL,
H5S_ALL, H5P_DEFAULT, dset_rdata);
status = H5Dclose (dataset_id);
status = H5Fclose (file_id);
Steps:
- Open Existing File
- Open Existing Dataset
- Read Data
- Close dataset, file ids
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
49
50. How to Create an Attribute in the Dataset?
Steps:
a Create an attribute attached
to already open dataset
b Write data to the attribute
c Close attribute, dataspace
hid_t
aspace_id;
hsize_t dimsa;
int
attr_data[2]= {100, 200};
hid_t
attribute_id;
…
Attach to open dataset
dimsa = 2;
aspace_id = H5Screate_simple(1, &dimsa, NULL);
a
attribute_id = H5Acreate (dataset_id, "Units", H5T_STD_I32BE,
b
aspace_id, H5P_DEFAULT);
status = H5Awrite (attribute_id, H5T_NATIVE_INT, attr_data);
c
c
status = H5Aclose (attribute_id);
status = H5Sclose (aspace_id);
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
50
52. HDF5 High Level APIs
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
52
53. High Level APIs
• Make HDF5 easier to use
• Encourage standard ways to store objects
• Included with HDF5 library
However:
• Still need HDF5 calls (H5Fopen/H5Fclose…)
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
53
54. High Level APIs
• HDF5 Lite (H5LT): Functions that simplify the steps
needed to create/read datasets and attributes
• HDF5 Image (H5IM): Functions for creating images
in HDF5.
• HDF5 Table (H5TB): Functions for creating tables
(collections of records) in HDF5.
• Others … (Dimension Scales, Packet Table)
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
54
55. High Level Programming Model
file_id = H5Fcreate ( "test.h5", ……. );
/* Call some High Level function */
H5LTsome_function ( file_id, ...extra parameters );
status = H5Fclose ( file_id );
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
55
56. HDF5 Hands-On
• Need Secure Shell or putty
• Everyone is logged in as: workshop
• Working directory for each user is specified at
top right of Hands-On page
• Compile Examples with: h5fc, h5cc
• Display Dataset with:
11/6/07
h5dump
HDF and HDF-EOS Workshop XI, Landover, MD
56
57. Thank you!
This presentation is based upon work supported in part by a
Cooperative Agreement with NASA under NASA NNX06AC83A.
Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Aeronautics and Space
Administration.
11/6/07
HDF and HDF-EOS Workshop XI, Landover, MD
57
Editor's Notes
Format and software for scientific data. HDF5 is a different format from earlier versions of HDF, as is the library.
Stores images, multidimensional arrays, tables, etc. That is, you can construct all of these different kinds structures and store them in HDF5. You can also mix and match them in HDF5 files according to your needs.
Emphasis on storage and I/O efficiency Both the library and the format are designed to address this.
Free and commercial software support As far as HDF5 goes, this is just a goal now. There is commercial support for HDF4, but little if any for HDF5 at this time. We are working with vendors to change this.
Emphasis on standards You can store data in HDF5 in a variety of ways, so we try to work with users to encourage them to organize HDF5 files in standard ways.
Users from many engineering and scientific fields
Format and software for scientific data. HDF5 is a different format from earlier versions of HDF, as is the library.
Stores images, multidimensional arrays, tables, etc. That is, you can construct all of these different kinds structures and store them in HDF5. You can also mix and match them in HDF5 files according to your needs.
Emphasis on storage and I/O efficiency Both the library and the format are designed to address this.
Free and commercial software support As far as HDF5 goes, this is just a goal now. There is commercial support for HDF4, but little if any for HDF5 at this time. We are working with vendors to change this.
Emphasis on standards You can store data in HDF5 in a variety of ways, so we try to work with users to encourage them to organize HDF5 files in standard ways.
Users from many engineering and scientific fields
This shows that you can mix objects of different types according to your needs. Typically, there will be metadata stored with objects to indicate what type of object they are.
Like HDF4, HDF5 has a grouping structure. The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
Here is an example of a basic HDF5 object.
Notice that each element in the 3D array is a record with four values in it.
Like HDF4, HDF5 has a grouping structure.
The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
Like HDF4, HDF5 has a grouping structure.
The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.