I have an existing HDF5 file with multiple tables. I want to modify this HDF5 file: in one of the tables I want to drop some rows entirely, and modify values in the remaining rows.
I tried the following code:
import h5py
import numpy as np

with h5py.File("my_file.h5", "r+") as f:
    # Get array
    table = f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]
    arr = np.array(table)
    # Modify array
    arr = arr[arr[:, 1] == 2]
    arr[:, 1] = 1
    # Write array back
    table[...] = arr
This code however results in the following error when run:
Traceback (most recent call last):
File "C:\_Work\test.py", line 10, in <module>
arr[arr[:, 1] == 2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
So one of the problems seems to be that the numpy array arr that I've created is not a two-dimensional array. However, I'm not sure exactly how to create a two-dimensional array out of the HDF5 table (or whether that is even the best approach here).
Would anyone here be able to help put me on the right path?
Output from h5dump on my dataset is as follows:
HDF5 "C:\_Work\my_file.h5" {
DATASET "/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX" {
   DATATYPE H5T_COMPOUND {
      H5T_STD_I64LE "EID";
      H5T_STD_I64LE "PLY";
      H5T_IEEE_F64LE "X1R";
      H5T_IEEE_F64LE "Y1R";
      H5T_IEEE_F64LE "T1R";
      H5T_IEEE_F64LE "L1R";
      H5T_IEEE_F64LE "L2R";
      H5T_IEEE_F64LE "X1I";
      H5T_IEEE_F64LE "Y1I";
      H5T_IEEE_F64LE "T1I";
      H5T_IEEE_F64LE "L1I";
      H5T_IEEE_F64LE "L2I";
      H5T_STD_I64LE "DOMAIN_ID";
   }
   DATASPACE SIMPLE { ( 990 ) / ( H5S_UNLIMITED ) }
   ATTRIBUTE "version" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
   }
}
}
This answer is specifically focused on OP's request in comments to "throw away all rows where the value for PLY is not 2. Then in the remaining rows change the value for PLY from 2 to 1".
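As background for the IndexError in the question: a dataset with a compound datatype is read by h5py as a one-dimensional NumPy structured array, so columns are selected by field name rather than by a second index. The sketch below uses NumPy only, with a made-up three-field subset of the real datatype and invented values (both are assumptions for illustration, not data from the file).

```python
import numpy as np

# Hypothetical three-field subset of the compound type shown by h5dump.
dtype = np.dtype([("EID", "<i8"), ("PLY", "<i8"), ("X1R", "<f8")])
arr = np.array(
    [(101, 1, 0.5), (101, 2, 1.5), (102, 1, 2.5), (102, 2, 3.5)],
    dtype=dtype,
)

# Only one axis: each element is a whole record, so arr[:, 1] is invalid.
print(arr.shape)

# Select a column by field name, then filter rows with a boolean mask.
mask = arr["PLY"] == 2
kept = arr[mask]      # boolean indexing returns a copy, arr is unchanged
kept["PLY"] = 1       # assign to the whole field at once

# Indexing with np.nonzero(mask) selects the same rows as the mask itself.
same = arr[np.nonzero(mask)]
```

There is therefore no need to build a two-dimensional array; the field name plays the role of the column index.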
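The procedure is relatively straightforward if you know the tricks. Steps are summarized here, with matching comments in the code:
1. Open the file in 'r+' mode and create a dataset object for the stress table.
2. Rename/move the original dataset to a saved name.
3. Slice a stress array from the dataset using the indices where PLY==2. np.nonzero() returns the row indices that match the condition stress_ds['PLY']==2, and those indices are then used to slice values from the dataset.
4. Modify the PLY ID from 2 to 1 for all rows in the sliced array.
5. Save the modified array to a new dataset with the original name.
Code below: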
import h5py
import numpy as np

ds_path = '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX'

with h5py.File('quad4_comp_cplx_test.h5', 'r+') as h5f:
    # Create stress dataset object (reading the full array first is not required)
    stress_ds = h5f[ds_path]
    print(stress_ds.shape)
    # Rename/move original output dataset to saved name
    h5f.move(ds_path, ds_path + '_save')
    # Slice a stress array from dataset using indices where PLY==2
    mod_stress_arr = stress_ds[np.nonzero(stress_ds['PLY'] == 2)]
    print(mod_stress_arr.shape)
    # Modify PLY ID from 2 to 1 for all rows
    mod_stress_arr['PLY'] = 1
    # Finally, save the ply stress array to a dataset with the original name
    h5f.create_dataset(ds_path, data=mod_stress_arr)
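A possible alternative, since h5dump reports the dataset's maximum size as H5S_UNLIMITED: the dataset can be shrunk in place with Dataset.resize(), which leaves its attributes (such as "version") attached and avoids the renamed copy. The following is only a sketch under assumptions: the helper name keep_ply_in_place, the reduced three-field dtype, and the sample rows are all hypothetical, and the demonstration runs on a throwaway in-memory file rather than the real one.

```python
import h5py
import numpy as np

def keep_ply_in_place(ds, ply_keep=2, ply_new=1):
    """Keep only rows whose PLY equals ply_keep, relabel them, shrink ds."""
    data = ds[()]                          # read the whole table into memory
    kept = data[data["PLY"] == ply_keep]   # boolean mask returns a copy
    kept["PLY"] = ply_new
    if len(kept):
        ds[: len(kept)] = kept             # overwrite the leading rows
    ds.resize((len(kept),))                # needs a resizable (chunked) dataset
    return len(kept)

# Demonstration on an in-memory file (nothing is written to disk).
dtype = np.dtype([("EID", "<i8"), ("PLY", "<i8"), ("X1R", "<f8")])
rows = np.array(
    [(1, 1, 0.1), (1, 2, 0.2), (2, 1, 0.3), (2, 2, 0.4)], dtype=dtype
)

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as h5f:
    ds = h5f.create_dataset("QUAD4_COMP_CPLX", data=rows, maxshape=(None,))
    ds.attrs["version"] = 1
    n_kept = keep_ply_in_place(ds)
    result = ds[()]
    version = int(ds.attrs["version"])
```

Shrinking a dataset does not by itself return the freed space to the operating system; if file size matters, rewriting the file with a tool such as h5repack is the usual follow-up. It is also worth testing on a copy of the file first, because this variant keeps no saved version of the original rows.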