
How to save simulation parameters in an HDF5 file with h5py?


I'm using h5py to store the outputs of a large number of simulations. These simulations are parametrized, so I also need to store which parameters were used for each simulation output.

At first, I wanted to give each simulation a codename and keep a separate matrix mapping codenames to parameter values, but I realized that as the number of simulations grows, lookups in it would quickly become expensive.

I then thought about creating a group named after a concatenation of all my parameters' values, but that doesn't address my need to quickly retrieve the parameter values (I would still have to parse the group's name to work out which value belongs to which parameter). I am therefore now contemplating creating one level of groups per parameter in the HDF5 structure, so that I can read the parameter values directly when accessing the simulated time series. However, that would require an ungodly number of subgroups, since my parameters are essentially real-valued (up to floating-point precision, of course).

Does that last proposal sound reasonable (it doesn't to me, but maybe it's not as bad as I imagine), or are there better ways to do this, some good practices I'm unaware of that would address my problem?

Thanks in advance!


Solution

  • I suggest saving the parameters as attributes at the file (root group) level. That way, anyone who opens the file can easily retrieve the parameters and their values. Here is a simple example that shows how to create parameters as attributes:

    import h5py
    import numpy as np

    with h5py.File('sim_file.h5', 'w') as h5f:
        h5f.attrs['param1'] = 10  # an int
        h5f.attrs['param2'] = 125.25  # a float
        h5f.attrs['param3'] = 'Average'  # a string
        h5f.attrs['param4'] = np.array([0.25, 1.5, 0.75])  # an array
    

    You retrieve the values in a similar way.

    # open read-only explicitly
    with h5py.File('sim_file.h5', 'r') as h5f:
        param1 = h5f.attrs['param1']
        param2 = h5f.attrs['param2']
        param3 = h5f.attrs['param3']
        param4 = h5f.attrs['param4']
        # or, using keys to loop and access:
        for k in h5f.attrs.keys():
            print(f'Param {k} = {h5f.attrs[k]}')
    
    

    Here is a previous answer that shows how to create and retrieve attributes for all object types (file, group and dataset): How to read HDF5 attributes (metadata) with Python and h5py
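
If many simulations share one file, the same attribute mechanism can be applied per group instead of at the file level. The following is only a minimal sketch under some assumptions: the layout (one top-level group per run, holding a dataset called `timeseries`), the run names, the parameter names `alpha` and `beta`, and the helpers `save_run` and `find_runs` are all hypothetical and not part of h5py. It also assumes the parameters are numeric scalars, so they can be compared with `np.isclose` rather than exact float equality.

```python
import os
import tempfile

import h5py
import numpy as np


def save_run(h5f, run_name, params, timeseries):
    """Store one simulation output, with its parameters as group attributes."""
    grp = h5f.create_group(run_name)
    grp.create_dataset("timeseries", data=timeseries)
    for key, value in params.items():
        grp.attrs[key] = value
    return grp


def find_runs(h5f, **wanted):
    """Return sorted names of top-level groups whose attributes match `wanted`.

    Assumes numeric scalar parameters; values are compared with np.isclose.
    """
    matches = []
    for name, obj in h5f.items():
        if not isinstance(obj, h5py.Group):
            continue
        if all(
            key in obj.attrs and np.isclose(obj.attrs[key], value)
            for key, value in wanted.items()
        ):
            matches.append(name)
    return sorted(matches)


# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "simulations.h5")
rng = np.random.default_rng(0)

with h5py.File(path, "w") as h5f:
    save_run(h5f, "run_0000", {"alpha": 0.1, "beta": 2.5}, rng.normal(size=100))
    save_run(h5f, "run_0001", {"alpha": 0.3, "beta": 2.5}, rng.normal(size=100))

with h5py.File(path, "r") as h5f:
    selected = find_runs(h5f, alpha=0.3)
    for name in selected:
        params = dict(h5f[name].attrs)          # all parameters of this run
        series = h5f[name]["timeseries"][()]    # the stored output as an array
        print(name, params, series.shape)
```

With this layout the parameters travel with each output, so nothing has to be parsed out of group names. Note that `find_runs` scans every group, so the cost of a lookup grows linearly with the number of runs.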