
How to read a v7.3 mat file via h5py?


I have a struct array created by Matlab and stored in a v7.3-format MAT-file:

% "..." continues the struct() call across lines
struArray = struct('name', {'one', 'two', 'three'}, ...
                   'id', {1, 2, 3}, ...
                   'data', {[1:10], [3:9], [0]});
save('test.mat', 'struArray', '-v7.3')

Now I want to read this file via python using h5py:

import h5py

data = h5py.File('test.mat', 'r')  # open read-only
struArray = data['/struArray']

I have no idea how to get the struct data one by one from struArray:

for index in range(<the size of struArray>):
    elem = <the index th struct in struArray>
    name = <the name of elem>
    id = <the id of elem>
    data = <the data of elem>

Solution

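  • The Matlab v7.3 file format is not particularly easy to work with in h5py, because it relies on HDF5 object references (cf. the h5py documentation on references): each field of the struct array is stored as a dataset of references, and each reference points to the dataset holding the actual value.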

    >>> import h5py
    >>> f = h5py.File('test.mat', 'r')
    >>> list(f.keys())
    ['#refs#', 'struArray']
    >>> struArray = f['struArray']
    >>> struArray['name'][0, 0]  # this is the HDF5 reference
    <HDF5 object reference>
    >>> f[struArray['name'][0, 0]][()]  # [()] reads the actual data (Dataset.value was removed in h5py 3.0)
    array([[111],
           [110],
           [101]], dtype=uint16)
    

    To read struArray(i).id:

    >>> f[struArray['id'][0, 0]][0, 0]
    1.0
    >>> f[struArray['id'][1, 0]][0, 0]
    2.0
    >>> f[struArray['id'][2, 0]][0, 0]
    3.0
    

    Notice that Matlab stores a number as an array of size (1, 1), hence the final [0, 0] to get the number.

    To read struArray(i).data:

    >>> f[struArray['data'][0, 0]][()]
    array([[  1.],
           [  2.],
           [  3.],
           [  4.],
           [  5.],
           [  6.],
           [  7.],
           [  8.],
           [  9.],
           [ 10.]])
    

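    To read struArray(i).name, the array of integers has to be converted to a string. Matlab stores each character as a uint16 (UTF-16) code unit, so for ASCII text on a little-endian machine, taking every other byte of the raw buffer yields the characters:

    >>> f[struArray['name'][0, 0]][()].tobytes()[::2].decode()
    'one'
    >>> f[struArray['name'][1, 0]][()].tobytes()[::2].decode()
    'two'
    >>> f[struArray['name'][2, 0]][()].tobytes()[::2].decode()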
    'three'
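Putting these steps together, the loop from the question could look roughly like the following. It is a sketch, not a general MAT-file reader: it assumes exactly the layout shown above (each field of `struArray` stored as a dataset of object references, with `name` a char array, `id` a scalar and `data` a numeric vector), and `read_struct_array` is a hypothetical helper name, not part of h5py. It decodes names as UTF-16 instead of taking every other byte, so names containing non-ASCII characters also come out intact.

```python
import h5py


def read_struct_array(path, var_name):
    """Return the elements of a Matlab struct array as a list of dicts.

    Sketch only (hypothetical helper): assumes the fields 'name' (char),
    'id' (scalar) and 'data' (numeric vector) from the question, each stored
    as a dataset of HDF5 object references, one reference per element.
    """
    elements = []
    with h5py.File(path, 'r') as f:
        group = f[var_name]
        # Matlab writes arrays transposed, so the 1x3 struct array becomes a
        # (3, 1) dataset of references; ravel() handles either orientation.
        name_refs = group['name'][()].ravel()
        id_refs = group['id'][()].ravel()
        data_refs = group['data'][()].ravel()
        for name_ref, id_ref, data_ref in zip(name_refs, id_refs, data_refs):
            # Dereference each entry and read the whole target dataset.
            name_arr = f[name_ref][()]
            id_arr = f[id_ref][()]
            data_arr = f[data_ref][()]
            elements.append({
                # Chars are UTF-16 code units stored as uint16; decoding them
                # as UTF-16 also handles non-ASCII characters.
                'name': name_arr.astype('<u2').tobytes().decode('utf-16-le'),
                # A scalar is a (1, 1) array; .item() gives a Python float.
                'id': id_arr[0, 0].item(),
                # Vectors come back as (n, 1); flatten to a 1-D array.
                # (For a real matrix, data_arr.T restores Matlab's shape.)
                'data': data_arr.ravel(),
            })
    return elements
```

With the file created in the question, the pseudocode loop then becomes:

    for elem in read_struct_array('test.mat', 'struArray'):
        print(elem['name'], elem['id'], elem['data'])

Because `[()]` reads each dataset into an in-memory NumPy array, the returned values stay usable after the file is closed.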