SlideShare uma empresa Scribd logo
1 de 14
Substituting HDF5 tools with Python/H5py scripts
Daniel Kahn
Science Systems and Applications Inc.

HDF HDF-EOS Workshop XIV, 28 Sep. 2010

1 of 14
What are HDF5 tools?
HDF5 tools are command line programs distributed with the HDF5
library. They allow users to manipulate HDF5 files.
h5dump: dump HDF5 data as ASCII text.
h5import: convert non-HDF5 data to HDF5
h5diff: show differences between HDF5 files.
h5copy: Copy objects between HDF5 files.
h5repack: Copy entire file while changing storage
properties of HDF5 objects.
h5edit: (proposed) add attributes to HDF5 objects.

HDF5 tools have a long history as the first (and for a long time only)
way to manipulate HDF5 files conveniently. I.e. without writing a C or
Java program, or without buying expensive commercial software such
as IDL or Matlab.

2 of 14
The tools can be characterized as having three parts:
Text Processing—Evaluate command arguments, process input
text files, match group names.
Tree Walking – Search HDF5 file hierarchy for objects by name.
Object Level Operations – Operate on the objects: copy, diff,
repack, etc.

The tools are simple to use and convenient as they are
distributed with the HDF5 library.
3 of 14
Disadvantage of HDF5 tools:
The command line arguments limit tool capability.
Adding new features with command line syntax which is both
readable and does not break the legacy syntax becomes difficult.

Development time for designing and implementing new features is
long (weeks...months).
Use cases must be evaluated, a solution proposed in an RFC, the
proposal must be implemented, new code is distributed in next
release.

4 of 14
Here's an example from HDF documentation:
h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array" -d "/array

But suppose we had multiple datasets named arrayNNN
where N is 0–9. We'd like to write something like:
h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/arrayd+{3}”

So that d+{3} would provide a match to all such objects.
Extending the tool syntax to meet this use case, and then
again for the next use case would be a never ending game of
catch up.
A more flexible substitute is desirable...

5 of 14
...Python?

6 of 14
What is Python?
Python is a programming language.
It features dynamic binding of variables, like Perl or shell
scripts, IDL, Matlab, but not C or Fortran.
Unlike Perl, it supports native floating point numbers.
It has scientific array support in the style of IDL or Matlab
(numpy module). Array operations can be programmed using
normal arithmetic operators.
It has access to the HDF5 library (Anderw Collette's
h5py module).
Python is currently the only programming language in wide
spread use to have all these features. They are essential to the
success of the language for easy HDF5 file manipulation.
7 of 14
Real world Experience: Learning Python and h5py is quick.
In the summer of 2010 SSAI hired a summer intern.
Equipped with some Perl programming experience the
intern was able to come up to speed on Python, HDF5,
h5py, and numpy within one to two weeks and, over
the summer, develop a specialized file/dataset merging
tool and a dataset conversion tool.

Python and h5py are the best way to introduce HDF5
because it allows the user to concentrate on the H in
HDF5, rather than the C API syntax.

8 of 14
Python is well suited to HDF5
Python is well suited to HDF5 because the HDF5 array objects
carry the dimensionality, extent, and element data type
information, just as HDF5 datasets do. The object oriented
nature of Python allows these objects to be manipulated at a
high level. C, by contrast, lacks a scientific array object and
the ability to define object methods.

9 of 14
Example: Creating and Writing a Dataset to a New File
Python:

import h5py
import numpy
TestData = numpy.array(range(1,25),dtype='int32').reshape(4,6)
h5py.File("WrittenByH5PY.h5","w")['/TestDataset'] = TestData

Compare to C version:
#include "hdf5.h"
int main() {

hid_t
file_id, dataspace_id, dataset_id; /* identifiers */
herr_t status;
hsize_t dims[2];
const int FirstIndex = 4, SecondIndex = 6;
int
i, j, dset_data[4][6];
for (i = 0; i < 4; i++) /* Initialize the dataset. */
for (j = 0; j < 6; j++)
dset_data[i][j] = i * 6 + j + 1;
dims[0] = FirstIndex;
dims[1] = SecondIndex;
file_id = H5Fcreate("WrittenByC.h5", H5F_ACC_TRUNC, H5P_DEFAULT,H5P_DEFAULT); /* Open an existing file.
*/
dataspace_id = H5Screate_simple(2, dims, NULL);
dataset_id = H5Dcreate(file_id, "/TestDataset", H5T_STD_I32LE, dataspace_id,
H5P_DEFAULT,H5P_DEFAULT,H5P_DEFAULT);
/* Write the dataset. */
status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
status = H5Dclose(dataset_id); /* Close the dataset. */
status = H5Fclose(file_id); /* Close the file. */

10 of 14

}
And here's the output:
h5dump WrittenByH5PY.h5
HDF5 "WrittenByH5PY.h5" {
GROUP "/" {
DATASET "TestDataset" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 1, 2, 3, 4, 5, 6,
(1,0): 7, 8, 9, 10, 11, 12,
(2,0): 13, 14, 15, 16, 17, 18,
(3,0): 19, 20, 21, 22, 23, 24
}
}
}
}

11 of 14
Python and the Three Pillars of HDF5 Tools
Python is well suited to Text Processing
Python has wide range of string manipulation functions, an easyto-use regular expression module, and list and dictionary (hash
table) objects. No segmentation faults!

Python is well suited to Tree Walking. Recursive
functions and loops over lists are easy to write

Object Level Operations...Not so much.
Object Level Operations (e.g. copy, diff) are challenging to write
efficiently and should be provided as part of the API by the HDF
Group, for example h5o_copy. API functions are available to the
Python programmer via h5py.

12 of 14
Why use Python to substitute HDF5 tools?
Python is available now.
Some HDF5 tools are still under development as new use
cases are presented. For example, users have requested a
tool to add attributes to HDF5 files. Such a capability
already exists with h5py:
python -c "import h5py ; fid = h5py.File('FileForAttributeAddition.h5','r+') ;
fid['/TestDataset'].attrs['CmdLine1'] = 'NewValue' ; fid.close()"

It's little ugly, but it is available today.
Python is a full programming language. It can accomplish
tasks which HDF5 tools cannot.
Further Resources:
http://groups.google.com/group/h5py
http://h5py.alfven.org/

13 of 14
Recommendations:
Users should consider Python and H5py to accomplish their HDF5
file manipulation projects.
The HDF Group should concentrate on providing efficient
API functions for object level tasks: object copy, dataset
difference, etc.

The HDF Group should avoid complex enhancements to tools
where Python/h5py could be used instead.
An easily searched contributed application repository on the HDF
Group website with user ratings would be very helpful.

14 of 14

Mais conteúdo relacionado

Mais procurados

超簡単! インストールなしでRedmineを試す
超簡単! インストールなしでRedmineを試す超簡単! インストールなしでRedmineを試す
超簡単! インストールなしでRedmineを試すShin Tanigawa
 
静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発Takuya Ueda
 
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~Junichi Kodama
 
継承やめろマジやめろ。 なぜイケないのか 解説する
継承やめろマジやめろ。 なぜイケないのか 解説する継承やめろマジやめろ。 なぜイケないのか 解説する
継承やめろマジやめろ。 なぜイケないのか 解説するTaishiYamada1
 
Python and Machine Learning
Python and Machine LearningPython and Machine Learning
Python and Machine Learningtrygub
 
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...System x 部 (生!) : しすなま! @ Lenovo Enterprise Solutions Ltd.
 
日本語テストメソッドについて
日本語テストメソッドについて日本語テストメソッドについて
日本語テストメソッドについてkumake
 
ドメイン駆動設計 コアドメインを語り合ってみよう
ドメイン駆動設計 コアドメインを語り合ってみようドメイン駆動設計 コアドメインを語り合ってみよう
ドメイン駆動設計 コアドメインを語り合ってみよう増田 亨
 
BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門Yohei Sasaki
 
Hyper-V、オンプレミスでもコンテナを
Hyper-V、オンプレミスでもコンテナをHyper-V、オンプレミスでもコンテナを
Hyper-V、オンプレミスでもコンテナをTetsuya Yokoyama
 
OpenAI FineTuning を試してみる
OpenAI FineTuning を試してみるOpenAI FineTuning を試してみる
OpenAI FineTuning を試してみるiPride Co., Ltd.
 
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。Suzuki Mitsuhiro
 
DiI/DIコンテナを一から学んでみた
DiI/DIコンテナを一から学んでみたDiI/DIコンテナを一から学んでみた
DiI/DIコンテナを一から学んでみたtak
 
SlideShareをやめて SpeakerDeckに移行します
SlideShareをやめて SpeakerDeckに移行しますSlideShareをやめて SpeakerDeckに移行します
SlideShareをやめて SpeakerDeckに移行しますMamoru Ohashi
 
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャ
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャAI搭載型IP電話 MiiTel を支える組織とアーキテクチャ
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャRevComm Inc
 
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!Teruchika Yamada
 
最近の単体テスト
最近の単体テスト最近の単体テスト
最近の単体テストKen Morishita
 
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞い
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞いiOS/Androidアプリエンジニアが理解すべき「Model」の振る舞い
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞いKen Morishita
 

Mais procurados (20)

超簡単! インストールなしでRedmineを試す
超簡単! インストールなしでRedmineを試す超簡単! インストールなしでRedmineを試す
超簡単! インストールなしでRedmineを試す
 
Python programming : Abstract classes interfaces
Python programming : Abstract classes interfacesPython programming : Abstract classes interfaces
Python programming : Abstract classes interfaces
 
静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発
 
Go入門
Go入門Go入門
Go入門
 
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~
事例から学ぶ!Power Platformガバナンス設計~CoEの話も添えて~
 
継承やめろマジやめろ。 なぜイケないのか 解説する
継承やめろマジやめろ。 なぜイケないのか 解説する継承やめろマジやめろ。 なぜイケないのか 解説する
継承やめろマジやめろ。 なぜイケないのか 解説する
 
Python and Machine Learning
Python and Machine LearningPython and Machine Learning
Python and Machine Learning
 
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...
第22回-第1部「この価格でここまでできる!驚愕のエントリー・ストレージ活用方法」-IBM Storwize V3700-(2012/11/29 on し...
 
日本語テストメソッドについて
日本語テストメソッドについて日本語テストメソッドについて
日本語テストメソッドについて
 
ドメイン駆動設計 コアドメインを語り合ってみよう
ドメイン駆動設計 コアドメインを語り合ってみようドメイン駆動設計 コアドメインを語り合ってみよう
ドメイン駆動設計 コアドメインを語り合ってみよう
 
BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門
 
Hyper-V、オンプレミスでもコンテナを
Hyper-V、オンプレミスでもコンテナをHyper-V、オンプレミスでもコンテナを
Hyper-V、オンプレミスでもコンテナを
 
OpenAI FineTuning を試してみる
OpenAI FineTuning を試してみるOpenAI FineTuning を試してみる
OpenAI FineTuning を試してみる
 
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。
RaspberryPi Zero W で FreeBSD を動かす方法にやっと気がついたのと、USB OTGも動いてよかった。
 
DiI/DIコンテナを一から学んでみた
DiI/DIコンテナを一から学んでみたDiI/DIコンテナを一から学んでみた
DiI/DIコンテナを一から学んでみた
 
SlideShareをやめて SpeakerDeckに移行します
SlideShareをやめて SpeakerDeckに移行しますSlideShareをやめて SpeakerDeckに移行します
SlideShareをやめて SpeakerDeckに移行します
 
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャ
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャAI搭載型IP電話 MiiTel を支える組織とアーキテクチャ
AI搭載型IP電話 MiiTel を支える組織とアーキテクチャ
 
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!
Power BI をアプリに埋め込みたい? ならば Power BI Embedded だ!
 
最近の単体テスト
最近の単体テスト最近の単体テスト
最近の単体テスト
 
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞い
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞いiOS/Androidアプリエンジニアが理解すべき「Model」の振る舞い
iOS/Androidアプリエンジニアが理解すべき「Model」の振る舞い
 

Destaque

Python and HDF5: Overview
Python and HDF5: OverviewPython and HDF5: Overview
Python and HDF5: Overviewandrewcollette
 
Introduction To Programming with Python-1
Introduction To Programming with Python-1Introduction To Programming with Python-1
Introduction To Programming with Python-1Syed Farjad Zia Zaidi
 
Introduction To Programming with Python-5
Introduction To Programming with Python-5Introduction To Programming with Python-5
Introduction To Programming with Python-5Syed Farjad Zia Zaidi
 
An Introduction to Interactive Programming in Python 2013
An Introduction to Interactive Programming in Python 2013An Introduction to Interactive Programming in Python 2013
An Introduction to Interactive Programming in Python 2013Syed Farjad Zia Zaidi
 
Introduction To Programming with Python-4
Introduction To Programming with Python-4Introduction To Programming with Python-4
Introduction To Programming with Python-4Syed Farjad Zia Zaidi
 
Introduction to UBI
Introduction to UBIIntroduction to UBI
Introduction to UBIRoy Lee
 
Python 4 Arc
Python 4 ArcPython 4 Arc
Python 4 Arcabsvis
 
Python programming - Everyday(ish) Examples
Python programming - Everyday(ish) ExamplesPython programming - Everyday(ish) Examples
Python programming - Everyday(ish) ExamplesAshish Sharma
 
Introduction To Programming with Python Lecture 2
Introduction To Programming with Python Lecture 2Introduction To Programming with Python Lecture 2
Introduction To Programming with Python Lecture 2Syed Farjad Zia Zaidi
 
Cyberoam Firewall Presentation
Cyberoam Firewall PresentationCyberoam Firewall Presentation
Cyberoam Firewall PresentationManoj Kumar Mishra
 
introduction to python
introduction to pythonintroduction to python
introduction to pythonSardar Alam
 

Destaque (20)

Using HDF5 and Python: The H5py module
Using HDF5 and Python: The H5py moduleUsing HDF5 and Python: The H5py module
Using HDF5 and Python: The H5py module
 
The Python Programming Language and HDF5: H5Py
The Python Programming Language and HDF5: H5PyThe Python Programming Language and HDF5: H5Py
The Python Programming Language and HDF5: H5Py
 
Python and HDF5: Overview
Python and HDF5: OverviewPython and HDF5: Overview
Python and HDF5: Overview
 
HDF5 Tools
HDF5 ToolsHDF5 Tools
HDF5 Tools
 
Introduction To Programming with Python-1
Introduction To Programming with Python-1Introduction To Programming with Python-1
Introduction To Programming with Python-1
 
Logic Over Language
Logic Over LanguageLogic Over Language
Logic Over Language
 
Logic: Language and Information 1
Logic: Language and Information 1Logic: Language and Information 1
Logic: Language and Information 1
 
Introduction To Programming with Python-5
Introduction To Programming with Python-5Introduction To Programming with Python-5
Introduction To Programming with Python-5
 
An Introduction to Interactive Programming in Python 2013
An Introduction to Interactive Programming in Python 2013An Introduction to Interactive Programming in Python 2013
An Introduction to Interactive Programming in Python 2013
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to Databases
 
Introduction To Programming with Python-4
Introduction To Programming with Python-4Introduction To Programming with Python-4
Introduction To Programming with Python-4
 
Introduction to UBI
Introduction to UBIIntroduction to UBI
Introduction to UBI
 
Python 4 Arc
Python 4 ArcPython 4 Arc
Python 4 Arc
 
Clase 2 estatica
Clase 2 estatica Clase 2 estatica
Clase 2 estatica
 
Using visualization tools to access HDF data via OPeNDAP
Using visualization tools to access HDF data via OPeNDAP Using visualization tools to access HDF data via OPeNDAP
Using visualization tools to access HDF data via OPeNDAP
 
Python programming - Everyday(ish) Examples
Python programming - Everyday(ish) ExamplesPython programming - Everyday(ish) Examples
Python programming - Everyday(ish) Examples
 
Lets learn Python !
Lets learn Python !Lets learn Python !
Lets learn Python !
 
Introduction To Programming with Python Lecture 2
Introduction To Programming with Python Lecture 2Introduction To Programming with Python Lecture 2
Introduction To Programming with Python Lecture 2
 
Cyberoam Firewall Presentation
Cyberoam Firewall PresentationCyberoam Firewall Presentation
Cyberoam Firewall Presentation
 
introduction to python
introduction to pythonintroduction to python
introduction to python
 

Semelhante a Substituting HDF5 tools with Python/H5py scripts

Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallelmfolk
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014Edwin de Jonge
 

Semelhante a Substituting HDF5 tools with Python/H5py scripts (20)

Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Parallel HDF5 Introductory Tutorial
Parallel HDF5 Introductory TutorialParallel HDF5 Introductory Tutorial
Parallel HDF5 Introductory Tutorial
 
Introduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming ModelsIntroduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming Models
 
HDF5 Advanced Topics - Datatypes and Partial I/O
HDF5 Advanced Topics - Datatypes and Partial I/OHDF5 Advanced Topics - Datatypes and Partial I/O
HDF5 Advanced Topics - Datatypes and Partial I/O
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Overview of Parallel HDF5 and Performance Tuning in HDF5 Library
Overview of Parallel HDF5 and Performance Tuning in HDF5 LibraryOverview of Parallel HDF5 and Performance Tuning in HDF5 Library
Overview of Parallel HDF5 and Performance Tuning in HDF5 Library
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallel
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
Overview of Parallel HDF5
Overview of Parallel HDF5Overview of Parallel HDF5
Overview of Parallel HDF5
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
HDF5 Tools in IDL
HDF5 Tools in IDLHDF5 Tools in IDL
HDF5 Tools in IDL
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
 
Overview of Parallel HDF5 and Performance Tuning in HDF5 Library
Overview of Parallel HDF5 and Performance Tuning in HDF5 LibraryOverview of Parallel HDF5 and Performance Tuning in HDF5 Library
Overview of Parallel HDF5 and Performance Tuning in HDF5 Library
 
Implementing HDF5 in MATLAB
Implementing HDF5 in MATLABImplementing HDF5 in MATLAB
Implementing HDF5 in MATLAB
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 

Mais de The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

Mais de The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Substituting HDF5 tools with Python/H5py scripts

  • 1. Substituting HDF5 tools with Python/H5py scripts Daniel Kahn Science Systems and Applications Inc. HDF HDF-EOS Workshop XIV, 28 Sep. 2010 1 of 14
  • 2. What are HDF5 tools? HDF5 tools are command line programs distributed with the HDF5 library. They allow users to manipulate HDF5 files. h5dump: dump HDF5 data as ASCII text. h5import: convert non-HDF5 data to HDF5 h5diff: show differences between HDF5 files. h5copy: Copy objects between HDF5 files. h5repack: Copy entire file while changing storage properties of HDF5 objects. h5edit: (proposed) add attributes to HDF5 objects. HDF5 tools have a long history as the first (and for a long time only) way to manipulate HDF5 files conveniently. I.e. without writing a C or Java program, or without buying expensive commercial software such as IDL or Matlab. 2 of 14
  • 3. The tools can be characterized as having three parts: Text Processing—Evaluate command arguments, process input text files, match group names. Tree Walking – Search HDF5 file hierarchy for objects by name. Object Level Operations – Operate on the objects: copy, diff, repack, etc. The tools are simple to use and convenient as they are distributed with the HDF5 library. 3 of 14
  • 4. Disadvantage of HDF5 tools: The command line arguments limit tool capability. Adding new features with command line syntax which is both readable and does not break the legacy syntax becomes difficult. Development time for designing and implementing new features is long (weeks...months). Use cases must be evaluated, a solution proposed in an RFC, the proposal must be implemented, new code is distributed in next release. 4 of 14
  • 5. Here's an example from HDF documentation: h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array" -d "/array But suppose we had multiple datasets named arrayNNN where N is 0–9. We'd like to write something like: h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/arrayd+{3}” So that d+{3} would provide a match to all such objects. Extending the tool syntax to meet this use case, and then again for the next use case would be a never ending game of catch up. A more flexible substitute is desirable... 5 of 14
  • 7. What is Python? Python is a programming language. It features dynamic binding of variables, like Perl or shell scripts, IDL, Matlab, but not C or Fortran. Unlike Perl, it supports native floating point numbers. It has scientific array support in the style of IDL or Matlab (numpy module). Array operations can be programmed using normal arithmetic operators. It has access to the HDF5 library (Anderw Collette's h5py module). Python is currently the only programming language in wide spread use to have all these features. They are essential to the success of the language for easy HDF5 file manipulation. 7 of 14
  • 8. Real world Experience: Learning Python and h5py is quick. In the summer of 2010 SSAI hired a summer intern. Equipped with some Perl programming experience the intern was able to come up to speed on Python, HDF5, h5py, and numpy within one to two weeks and, over the summer, develop a specialized file/dataset merging tool and a dataset conversion tool. Python and h5py are the best way to introduce HDF5 because it allows the user to concentrate on the H in HDF5, rather than the C API syntax. 8 of 14
  • 9. Python is well suited to HDF5 Python is well suited to HDF5 because the HDF5 array objects carry the dimensionality, extent, and element data type information, just as HDF5 datasets do. The object oriented nature of Python allows these objects to be manipulated at a high level. C, by contrast, lacks a scientific array object and the ability to define object methods. 9 of 14
  • 10. Example: Creating and Writing a Dataset to a New File Python: import h5py import numpy TestData = numpy.array(range(1,25),dtype='int32').reshape(4,6) h5py.File("WrittenByH5PY.h5","w")['/TestDataset'] = TestData Compare to C version: #include "hdf5.h" int main() { hid_t file_id, dataspace_id, dataset_id; /* identifiers */ herr_t status; hsize_t dims[2]; const int FirstIndex = 4, SecondIndex = 6; int i, j, dset_data[4][6]; for (i = 0; i < 4; i++) /* Initialize the dataset. */ for (j = 0; j < 6; j++) dset_data[i][j] = i * 6 + j + 1; dims[0] = FirstIndex; dims[1] = SecondIndex; file_id = H5Fcreate("WrittenByC.h5", H5F_ACC_TRUNC, H5P_DEFAULT,H5P_DEFAULT); /* Open an existing file. */ dataspace_id = H5Screate_simple(2, dims, NULL); dataset_id = H5Dcreate(file_id, "/TestDataset", H5T_STD_I32LE, dataspace_id, H5P_DEFAULT,H5P_DEFAULT,H5P_DEFAULT); /* Write the dataset. */ status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data); status = H5Dclose(dataset_id); /* Close the dataset. */ status = H5Fclose(file_id); /* Close the file. */ 10 of 14 }
  • 11. And here's the output: h5dump WrittenByH5PY.h5 HDF5 "WrittenByH5PY.h5" { GROUP "/" { DATASET "TestDataset" { DATATYPE H5T_STD_I32LE DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) } DATA { (0,0): 1, 2, 3, 4, 5, 6, (1,0): 7, 8, 9, 10, 11, 12, (2,0): 13, 14, 15, 16, 17, 18, (3,0): 19, 20, 21, 22, 23, 24 } } } } 11 of 14
  • 12. Python and the Three Pillars of HDF5 Tools Python is well suited to Text Processing Python has wide range of string manipulation functions, an easyto-use regular expression module, and list and dictionary (hash table) objects. No segmentation faults! Python is well suited to Tree Walking. Recursive functions and loops over lists are easy to write Object Level Operations...Not so much. Object Level Operations (e.g. copy, diff) are challenging to write efficiently and should be provided as part of the API by the HDF Group, for example h5o_copy. API functions are available to the Python programmer via h5py. 12 of 14
  • 13. Why use Python to substitute HDF5 tools? Python is available now. Some HDF5 tools are still under development as new use cases are presented. For example, users have requested a tool to add attributes to HDF5 files. Such a capability already exists with h5py: python -c "import h5py ; fid = h5py.File('FileForAttributeAddition.h5','r+') ; fid['/TestDataset'].attrs['CmdLine1'] = 'NewValue' ; fid.close()" It's little ugly, but it is available today. Python is a full programming language. It can accomplish tasks which HDF5 tools cannot. Further Resources: http://groups.google.com/group/h5py http://h5py.alfven.org/ 13 of 14
  • 14. Recommendations: Users should consider Python and H5py to accomplish their HDF5 file manipulation projects. The HDF Group should concentrate on providing efficient API functions for object level tasks: object copy, dataset difference, etc. The HDF Group should avoid complex enhancements to tools where Python/h5py could be used instead. An easily searched contributed application repository on the HDF Group website with user ratings would be very helpful. 14 of 14