I am new to Python coding. I want to merge data from 2 H5 files into a main H5 file. My goal is to add all objects in the SRR6XX/SRR630/* groups in each source file (file names in list h5_files) to the main (target) file (main_h5_path). The code below is my attempt to do this. When I run it, I get this exception:
Error occurred during H5 merging: 'Group' object has no attribute 'encode'
I also tried create_group(), but I get the same exception.
What do I need to modify to get my code to work?
try:
    # read the main file dataset
    with h5py.File(main_h5_path, 'r') as h5_main_file_obj:
        # return if H5 doesn't contain any data
        if len(h5_main_file_obj.keys()) == 0:
            return
        main_file_timestamp_dtset_obj = h5_main_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']
        for file in h5_files:
            with h5py.File(file, 'r') as h5_sub_file_obj:
                # skip this file if H5 doesn't contain any data
                if len(h5_sub_file_obj.keys()) == 0:
                    continue
                sub_file_timestamp_dtset_obj = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']
                # h5_main_file_obj.create_dataset(sub_file_timestamp_dtset_obj)
                for ts_key in sub_file_timestamp_dtset_obj.keys():
                    print('ts_key', ts_key)
                    each_ts_ds = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630' + '/' + str(ts_key) + '/']
                    h5_main_file_obj.create_dataset(each_ts_ds)
except (IOError, OSError, Exception) as e:
    print(f"Error occurred during H5 merging: {e}")
    return -1
return 0
My original answer only copied the group names under group '/SRR6XX/SRR630' in the source files to the main (target) file. The OP commented that they want to "copy the group names along with their datasets".
I updated my answer to reflect that request. It only requires a 1-line change. (For reference, the line to create groups is commented out.)
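Here are the changes to your original code required to get this working:
Open the main file in append mode ('a') instead of read mode ('r'), so that groups can be added to it.
ts_key in your loop is the object name (not the object). Use .items() to get names and objects (or just reference the object by name).
Copy each group into the target group object (main_file_timestamp_dtset_obj) instead of passing the group to create_dataset().
Modified code below: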
import h5py

def your_function():
    with h5py.File(main_h5_path, 'a') as h5_main_file_obj:  # need Append mode to add groups
        # return if H5 doesn't contain any data
        if len(h5_main_file_obj.keys()) == 0:
            return
        main_file_timestamp_dtset_obj = h5_main_file_obj['/SRR6XX/SRR630']
        for file in h5_files:
            with h5py.File(file, 'r') as h5_sub_file_obj:
                # skip this file if H5 doesn't contain any data
                if len(h5_sub_file_obj.keys()) == 0:
                    continue
                sub_file_timestamp_dtset_obj = h5_sub_file_obj['/SRR6XX/SRR630']
                # h5_main_file_obj.create_dataset(sub_file_timestamp_dtset_obj)
                for ts_key in sub_file_timestamp_dtset_obj.keys():
                    print('ts_key:', ts_key)
                    # This only creates group:
                    #main_file_timestamp_dtset_obj.create_group(ts_key)
                    # This copies Group and its objects (groups or datasets):
                    grp_path = 'SRR6XX/SRR630/' + ts_key
                    h5_sub_file_obj.copy(h5_sub_file_obj[grp_path], main_file_timestamp_dtset_obj)
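To see the difference between names and objects, here is a minimal sketch that lists the direct children of a group with .items(). The first argument of create_dataset() is a name (a string), which is likely why passing a Group object there fails with the 'encode' error. The helper name list_children and the demo file contents (ts_001, ts_002) are made up for illustration; they are not from your files.

```python
import os
import tempfile

import h5py


def list_children(h5_path, group_path):
    """Return (name, kind) pairs for the direct children of group_path."""
    children = []
    with h5py.File(h5_path, 'r') as h5f:
        # .items() yields (name, object) pairs; .keys() would yield names only
        for name, obj in h5f[group_path].items():
            kind = 'group' if isinstance(obj, h5py.Group) else 'dataset'
            children.append((name, kind))
    return children


# Build a small throwaway file to try it on (hypothetical contents)
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(demo_path, 'w') as h5f:
    h5f.create_group('SRR6XX/SRR630/ts_001')
    h5f.create_dataset('SRR6XX/SRR630/ts_002', data=[1, 2, 3])

print(list_children(demo_path, 'SRR6XX/SRR630'))
```

Each name is a plain str, while each object is an h5py Group or Dataset, so the isinstance() check is only possible when you have the object.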
I wrote another solution that is more compact and checks if source objects are Groups before copying. See below. Another check to consider: conflicts with existing group names in the main (target) file before copying each group. As noted in my comment, consider using External Links to avoid duplicate data.
import h5py

def my_function():
    with h5py.File(main_h5_path, mode='a') as h5ft:
        # return if the target file doesn't contain any data
        if len(h5ft.keys()) == 0:
            return
        for h5_source in h5_files:
            with h5py.File(h5_source, 'r') as h5fs:
                # skip this source file if it doesn't contain any data
                if len(h5fs.keys()) == 0:
                    continue
                for grp_name, h5_obj in h5fs['SRR6XX/SRR630'].items():
                    if isinstance(h5_obj, h5py.Group):
                        # This only creates group:
                        #h5ft['SRR6XX/SRR630'].create_group(grp_name)
                        # This copies Group and its objects (groups or datasets):
                        grp_path = 'SRR6XX/SRR630/' + grp_name
                        h5fs.copy(h5fs[grp_path], h5ft['SRR6XX/SRR630'])
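The two suggestions above (checking for name conflicts and using External Links) could be sketched roughly as follows. This is only one possible approach: the function name merge_child_groups, its parameters, and the choice to skip (rather than overwrite or rename) conflicting groups are my assumptions, not something from your code.

```python
import h5py


def merge_child_groups(source_path, target_path, parent='SRR6XX/SRR630',
                       use_links=False):
    """Add child groups of `parent` from the source file to the target file.

    Names already present under `parent` in the target are skipped.
    With use_links=True an external link is stored instead of a copy.
    Returns (added, skipped) lists of group names.
    """
    added, skipped = [], []
    with h5py.File(source_path, 'r') as h5fs, \
            h5py.File(target_path, 'a') as h5ft:
        if parent not in h5fs:
            return added, skipped          # nothing to merge from this source
        target_parent = h5ft.require_group(parent)
        for name, obj in h5fs[parent].items():
            if not isinstance(obj, h5py.Group):
                continue                   # only groups are merged here
            if name in target_parent:
                skipped.append(name)       # conflict: keep the existing group
                continue
            if use_links:
                # Store a pointer to the group in the source file (no data copied)
                target_parent[name] = h5py.ExternalLink(source_path, obj.name)
            else:
                # Recursive copy of the group and everything below it
                h5fs.copy(obj, target_parent)
            added.append(name)
    return added, skipped
```

You would call it once per source file, e.g. merge_child_groups(h5_source, main_h5_path) inside a loop over h5_files. With use_links=True the main file stays small, but the source files must remain reachable at the stored path for the links to resolve.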