
Read multiple datasets from same Group in h5 file using h5py


I have several groups in my h5 file: 'group1', 'group2', ... and each group has 3 different datasets: 'dataset1', 'dataset2', 'dataset3'. All of them are arrays of numerical values, but the array sizes differ.

My goal is to load each dataset from each group into a numpy array.

Example:

import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')

Now I can easily iterate over all groups with

for i in range(len(data.keys())):
    group = list(data.keys())[i]

but I can't figure out how to access the datasets within the group. So I am looking for something like MATLAB:

hinfo = h5info(filename);
for i = 1:length(hinfo.Groups())
    datasetname = [hinfo.Groups(i).Name '/dataset1'];
    dset = h5read(filename, datasetname);
end

Where dset is now an array of numbers.

Is there a way I could do the same with h5py?


Solution

  • You have the right idea, but you don't need to loop on range(len(data.keys())). Just iterate over data.keys(); it yields the object names directly. Try this:

    import h5py
    filename = '../Results/someFileName.h5'
    data = h5py.File(filename, 'r')
    for group in data.keys():
        print(group)
        for dset in data[group].keys():
            print(dset)
            ds_data = data[group][dset] # returns HDF5 dataset object
            print(ds_data)
            print(ds_data.shape, ds_data.dtype)
            arr = data[group][dset][:] # adding [:] returns a numpy array
            print(arr.shape, arr.dtype)
            print(arr)
    

    Note: the logic above is valid ONLY when the top level contains only groups (no datasets) and each group contains only datasets. It does not test whether an object is a group or a dataset, so indexing with [:] fails if it reaches a group.
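
    A minimal sketch of the same two-level loop with the type test added. The function name load_two_level and the "group/dataset" dictionary keys are illustrative choices, not part of h5py; it assumes the groups-then-datasets layout described in the question and simply skips anything else.

```python
import h5py


def load_two_level(h5file):
    """Return {"group/dataset": array} for a groups-then-datasets layout.

    Objects that are not of the expected type are skipped instead of
    raising an error.
    """
    arrays = {}
    for group_name, group in h5file.items():
        if not isinstance(group, h5py.Group):
            continue  # e.g. a dataset sitting at the top level
        for dset_name, dset in group.items():
            if isinstance(dset, h5py.Dataset):
                # [()] reads the whole dataset and also works for scalars
                arrays[f"{group_name}/{dset_name}"] = dset[()]
    return arrays
```

    With the file from the question this could be called as load_two_level(data), after which arrays['group1/dataset1'] plays the role of dset in the MATLAB snippet.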

    To avoid these assumptions and limitations, look into .visititems() or write a generator that visits objects recursively. The first two answers below show .visititems() usage, and the third uses a generator function:

    1. Use visititems(-function-) to loop recursively
      This example uses isinstance() as the test. The object is a Group when it tests true for h5py.Group and is a Dataset when it tests true for h5py.Dataset. I consider this more Pythonic than the second example below (IMHO).
    2. Convert hdf5 to raw organised in folders
      It checks the number of objects below the visited object. When there are no subgroups, it is a dataset; when there are subgroups, it is a group.
    3. How can I combine multiple .h5 file?
      This question has multiple answers. This answer uses a generator to merge data from several files with several groups and datasets into a single file.
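
    For reference, a rough sketch of both approaches mentioned above, not a copy of the linked answers. The helper names collect_with_visititems and iter_datasets are made up for illustration; visititems(), items(), h5py.Group and h5py.Dataset are the h5py pieces being relied on.

```python
import h5py


def collect_with_visititems(h5obj):
    """Collect every dataset below h5obj into {path: array} via visititems()."""
    arrays = {}

    def visitor(name, obj):
        # Called once for every group and dataset below h5obj;
        # returning None tells visititems() to keep going.
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]

    h5obj.visititems(visitor)
    return arrays


def iter_datasets(h5obj, prefix=""):
    """Recursively yield (path, dataset) pairs from a file or group."""
    for name, obj in h5obj.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(obj, h5py.Group):
            yield from iter_datasets(obj, path)  # descend into subgroup
        elif isinstance(obj, h5py.Dataset):
            yield path, obj
```

    Either helper takes the open file (or any group inside it). The generator hands back dataset objects rather than arrays, so data is only read when you index them, for example with ds[()].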