python matlab dataset h5py

Read a string dataset from a MATLAB file in Python


While working with the NYUv2 dataset in Python, I am trying to get a list of all 894 classes in the dataset. From the documentation I know that they are listed in the names variable of the nyu_depth_v2_labeled.mat file. I want an array of 894 strings containing the class names, but none of the approaches I found online worked.

Here is my code:

import numpy as np
import h5py

# Load the .mat file
mat_path = '/path/to/nyu_depth_v2_labeled.mat'
mat_file = h5py.File(mat_path, 'r')
# get variable 'names'
names = mat_file.get('names')
print(names)
print('-------------------------------')
print(names[:])

The problem now is that the result looks like this:

<HDF5 dataset "names": shape (1, 894), type "|O">
-------------------------------
[[<HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
...
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
  <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>]]

How do I make the data "readable" as Python strings?


Solution

  • A search turned up pymatreader on PyPI, which is described as a

    Convenient reader for Matlab mat files

    After installing it, you should be able to do

    from pymatreader import read_mat
    
    data = read_mat('/path/to/nyu_depth_v2_labeled.mat')
    

    which should then give you the following:

    data is a python dict containing all variables of the mat file.

    I am not able to test this myself, so please try it and report back whether it does what you need.
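
If you would rather stay with h5py, the output in the question suggests that names holds HDF5 object references, which is how MATLAB v7.3 files usually store cell arrays. A minimal sketch of dereferencing them follows. It assumes each reference points to a dataset of 16-bit character codes, which is the usual layout for MATLAB char arrays but has not been checked against this particular file. The helper names decode_matlab_chars and read_cell_strings are made up for illustration.

```python
def decode_matlab_chars(codes):
    """Turn a sequence of MATLAB character codes into a Python str.

    Assumption: each entry is one UTF-16 code unit, as MATLAB v7.3
    files typically store char arrays.
    """
    return ''.join(chr(int(c)) for c in codes)


def read_cell_strings(mat_path, variable):
    """Read a cell array of strings from a v7.3 .mat file (hypothetical helper)."""
    import h5py  # imported here so the decoder above works without h5py

    with h5py.File(mat_path, 'r') as mat_file:
        # The dataset holds object references, e.g. shape (1, 894); flatten it.
        refs = mat_file[variable][()].ravel()
        # Indexing the file with a reference yields the dataset it points to.
        return [decode_matlab_chars(mat_file[ref][()].ravel()) for ref in refs]
```

With the file from the question, read_cell_strings('/path/to/nyu_depth_v2_labeled.mat', 'names') should return a list with one string per reference. Empty MATLAB strings are stored differently and would need extra handling, which this sketch leaves out.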