HDF5 dataset from MATLAB to Pandas DataFrame in Python


I have .mat files containing HDF5 data, and I want to load them into Python as a Pandas DataFrame. I can open the file with h5py:

import h5py
f2 = h5py.File("file.mat", "r")  # open read-only
f2['data']

which is an HDF5 dataset:

<HDF5 dataset "data": shape (9999999, 32), type "<f8">

If I read it with Pandas:

import pandas as pd
g = pd.read_hdf("file.mat", "data")

I get the following error:

cannot create a storer if the object is not existing nor a value are passed

How do I convert this to a Pandas DataFrame?


Solution

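  • AFAIK Pandas can only read HDF5 files that follow the layout Pandas itself writes (via DataFrame.to_hdf or HDFStore, which are built on PyTables). A MATLAB v7.3 .mat file is a valid HDF5 file but does not have that layout, so pd.read_hdf cannot make sense of it.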

    You can read them using one of the following approaches:

    read matlab v7.3 file into python list of numpy arrays via h5py

    Reading ALL variables in a .mat file with python h5py

    http://poquitopicante.blogspot.de/2014/05/loading-matlab-mat-file-into-pandas.html

    Read .mat files in Python
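
    For the case in the question (a single 2-D numeric dataset named 'data'), a minimal sketch of the h5py route could look like the following. The helper name mat_dataset_to_frame and its transpose flag are made up for illustration and are not part of h5py or Pandas. It assumes the dataset is purely numeric and fits in memory.

```python
import h5py
import pandas as pd


def mat_dataset_to_frame(path, name="data", transpose=False):
    """Load one 2-D numeric dataset from an HDF5-based .mat file.

    Hypothetical helper for illustration only.
    """
    with h5py.File(path, "r") as f:
        # Indexing with [()] reads the whole dataset into a NumPy array,
        # so the data stays valid after the file is closed.
        arr = f[name][()]
    if transpose:
        # MATLAB stores arrays column-major, so h5py usually reports the
        # dimensions reversed relative to what MATLAB shows.
        arr = arr.T
    return pd.DataFrame(arr)
```

    Usage would be something like df = mat_dataset_to_frame("file.mat", "data"). Note that a float64 array of shape (9999999, 32) takes roughly 2.5 GB of RAM, so for files of that size it may be worth reading a slice first, e.g. f['data'][:1000].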