Is there a uniform way to comb through the field names of .mat files in Python and extract the corresponding data, regardless of how the files are structured?
These .mat files include:
I don't necessarily know n in advance, but I know n is the same for every variable (field name) extracted.
I want to create a function that takes m field names and a .mat file path and returns an n×m pandas DataFrame.
My code so far:
import scipy.io as sio
import numpy as np
import pandas as pd

def mat2df(mat_file, var_list):
    # mat_file is a file path; var_list is a list of strings naming structure fields
    df = pd.DataFrame()
    surface_mat = sio.loadmat(mat_file)
    for i in surface_mat:
        if "__" not in i and "readme" not in i:  # strip away top dict layer of mat file
            mat = surface_mat[i]  # mat is an ndarray
            if mat.dtype.names is not None:  # mat is a 1xn structure
                for j in mat.dtype.names:
                    if j in var_list:  # variable is named by user
                        karray = np.reshape(np.transpose(mat[j]), (-1))
                        # append dataframe column
                        df[j] = pd.Series(karray, index=range(len(karray)), dtype=mat[j][0][0].dtype)
            elif mat[0][0].dtype.names is not None:  # mat is a 1xn cell array of 1x1 structures
                for j in mat[0][0].dtype.names:
                    if j in var_list:  # variable is named by user
                        karray = np.array([])
                        for k in range(len(mat[0])):
                            karray = np.append(karray, mat[0][k][j][0][0])
                        # append dataframe column; mat itself has no fields in this
                        # branch, so the dtype comes from the first cell's structure
                        df[j] = pd.Series(karray, index=range(len(karray)), dtype=mat[0][0][j][0][0].dtype)
            else:  # any other layout is not handled yet
                raise NotImplementedError(
                    "Current MATLAB data format not yet supported\n"
                    "Current support covers structures and cell arrays of structures")
    return df
This code only covers the first two .mat file types listed. Is there a methodology I can use here so that I don't have to write a new if statement for each possible layer of nested structure or cell array?
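One possible direction, sketched here under assumptions rather than offered as a tested solution, is to recurse over whatever loadmat returns instead of branching per layout. The sketch assumes loadmat's default output, where structures arrive as NumPy arrays with named fields and cell arrays as object arrays. The names collect_fields and unwrap are hypothetical helpers, not part of scipy or pandas.

```python
import numpy as np

def unwrap(value):
    """Peel off the 1x1 array wrappers that loadmat puts around scalars."""
    while (isinstance(value, np.ndarray) and value.size == 1
           and value.dtype.names is None):
        value = value.item()
    return value

def collect_fields(node, wanted, out=None):
    """Walk nested dicts, struct arrays and cell arrays, gathering the
    values of every field whose name appears in `wanted`."""
    if out is None:
        out = {name: [] for name in wanted}
    if isinstance(node, dict):
        # Top-level loadmat dict: skip metadata keys such as __header__.
        for key, value in node.items():
            if not key.startswith("__"):
                collect_fields(value, wanted, out)
    elif isinstance(node, np.ndarray):
        if node.dtype.names is not None:
            # Struct array: either harvest a field or descend into it.
            for name in node.dtype.names:
                column = node[name].ravel()
                if name in wanted:
                    out[name].extend(unwrap(v) for v in column)
                else:
                    for v in column:
                        collect_fields(v, wanted, out)
        elif node.dtype == object:
            # Cell array: descend into every cell.
            for item in node.ravel():
                collect_fields(item, wanted, out)
    return out
```

Assuming every requested field ends up with the same number of values n, something like pd.DataFrame(collect_fields(sio.loadmat(mat_file), var_list)) should then give the n×m frame. Because the function only asks "is this a dict, a struct array, or a cell array?" at each step, deeper nesting needs no extra branches. A field name that appears at several nesting levels would be collected from all of them, so this is only a starting point.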
I made a Python package that does this for me. I hope anyone else stumbling upon this question can use it too.