I have several groups in my h5 file: 'group1', 'group2', ...
Each group has 3 different datasets: 'dataset1', 'dataset2', 'dataset3', all of which are arrays of numerical values, but the array sizes differ.
My goal is to save each dataset from each group to a numpy array.
Example:
import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')
Now I can easily iterate over all groups with
for i in range(len(data.keys())):
    group = list(data.keys())[i]
but I can't figure out how to access the datasets within the group. So I am looking for something like MATLAB:
hinfo = h5info(filename);
for i = 1:length(hinfo.Groups())
    datasetname = [hinfo.Groups(i).Name '/dataset1'];
    dset = h5read(filename, datasetname);
end
where dset is now an array of numbers.
Is there a way I could do the same with h5py?
You have the right idea.
But you don't need to loop on range(len(data.keys())). Just iterate over data.keys(); it gives you an iterable of the object names.
Try this:
import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')
for group in data.keys():
    print(group)
    for dset in data[group].keys():
        print(dset)
        ds_data = data[group][dset]  # returns HDF5 dataset object
        print(ds_data)
        print(ds_data.shape, ds_data.dtype)
        arr = data[group][dset][:]  # adding [:] returns a numpy array
        print(arr.shape, arr.dtype)
        print(arr)
Note: the logic above is valid ONLY when the top level contains only groups (no datasets). It does not test whether each object is a group or a dataset.
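If you want to keep the simple two-level loop, one way to guard it is to check each object's type before descending. The following is only a sketch: read_group_datasets is a hypothetical helper name (not part of h5py), and it still assumes datasets sit exactly one level below the top-level groups.

```python
import h5py

def read_group_datasets(filename):
    """Return {group_name: {dataset_name: numpy array}} for top-level groups.

    Hypothetical helper (not part of h5py); handles one level of nesting only.
    """
    arrays = {}
    with h5py.File(filename, 'r') as data:
        for group_name, obj in data.items():
            if not isinstance(obj, h5py.Group):
                continue  # skip anything at the top level that is not a group
            arrays[group_name] = {}
            for dset_name, child in obj.items():
                if isinstance(child, h5py.Dataset):
                    # [()] reads the whole dataset into memory
                    arrays[group_name][dset_name] = child[()]
    return arrays
```

With a file laid out as in the question, read_group_datasets(filename)['group1']['dataset1'] would then be the numpy array for that dataset. Nested subgroups are silently skipped here.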
To avoid these assumptions/limitations, you should investigate .visititems() or write a generator to recursively visit objects. The first 2 answers are examples showing .visititems() usage, and the last 1 uses a generator function:
Use isinstance() as the test. The object is a Group when it tests true for h5py.Group and is a Dataset when it tests true for h5py.Dataset. I consider this more Pythonic than the second example below (IMHO).
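For reference, here is a minimal sketch of how .visititems() and the isinstance() test can be combined to collect every dataset at any depth into numpy arrays. The names collect_datasets and visitor are hypothetical (not part of h5py), and the sketch assumes every dataset fits in memory.

```python
import h5py

def collect_datasets(filename):
    """Return {path: numpy array} for every dataset at any depth.

    Hypothetical helper (not part of h5py).
    """
    arrays = {}

    def visitor(name, obj):
        # visititems passes each object's path relative to the starting group
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]  # read the whole dataset into memory
        # implicitly returning None lets the traversal continue

    with h5py.File(filename, 'r') as data:
        data.visititems(visitor)
    return arrays
```

The keys are paths such as 'group1/dataset1', which is close to how the MATLAB snippet in the question builds its dataset names.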