I'm new to HDF5 files and I don't understand how to access chunks in a dataset. I have quite a big dataset of shape (1536, 2048, 11, 18, 2), chunked into (768, 1024, 1, 1, 1), so each chunk covers one quarter of an image (768 x 1024 out of 1536 x 2048). I want to plot the mean value of each (whole) image using matplotlib.
Question: how do I access separate chunks, and how do I work with them? (How does h5py use them?)
This is my code:
import h5py
import numpy as np

bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))
# no f.close() needed: the with block closes the file
I have this to get access to the dataset, but I don't know how to access the chunks:
with h5py.File('test.h5', 'r') as hf:
    for dset in hf['Measurement 1'].keys():
        print(dset)
    ds_hf = hf['Measurement 1']['data']  # returns HDF5 dataset object
    print(ds_hf)
    print(ds_hf.shape, ds_hf.dtype)
    data_f = hf['Measurement 1']['data'][:]  # adding [:] returns a numpy array (loads the whole dataset into RAM)
I need the program to open each chunk, get the mean value and close it again before opening the next one, so my RAM doesn't get full.
Here is some sample code that shows how chunks work in HDF5. I wrote it in a general way, so you can modify it to fit your requirements:
import h5py
import numpy as np
# Generate random data
bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))
# Create the HDF5 file and dataset
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))
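# Open the HDF5 file
with h5py.File('test.h5', 'r') as hf:
    # Access the dataset (no data is read yet)
    ds_hf = hf['Measurement 1']['data']
    print(ds_hf)
    print(ds_hf.shape, ds_hf.dtype, ds_hf.chunks)
    # Iterate over the chunks; iter_chunks() (h5py 3.0+) yields one tuple of slices per chunk
    for chunk_slices in ds_hf.iter_chunks():
        # Indexing with the slices reads only this chunk into memory
        chunk = ds_hf[chunk_slices]
        # Process the chunk
        chunk_mean = np.mean(chunk)
        print(f"Chunk {chunk_slices}: Mean value = {chunk_mean}")
# The with block closes the file automatically, so no hf.close() is needed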
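Since the question asks for the mean of each whole image rather than of each chunk, it may be simpler to slice one image at a time and let h5py read whichever chunks that slice overlaps. Below is a minimal sketch of that idea. It assumes the first two axes are the image axes and every remaining axis indexes a separate image. The helper names `image_means` and `plot_image_means`, the small demo shape, and the temporary file path are all made up for illustration, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def image_means(dataset):
    """Return per-image means as an array of shape dataset.shape[2:].

    Assumes axes 0 and 1 are the image axes (rows, columns) and every
    remaining axis indexes a separate image.
    """
    means = np.empty(dataset.shape[2:], dtype=np.float64)
    for idx in np.ndindex(*dataset.shape[2:]):
        # Read one full image; only the chunks it overlaps are fetched.
        image = dataset[(slice(None), slice(None)) + idx]
        means[idx] = image.mean()
    return means


def plot_image_means(means):
    """Plot the flattened per-image means (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    plt.plot(means.ravel(), marker="o")
    plt.xlabel("image index (flattened)")
    plt.ylabel("mean pixel value")
    plt.show()


# Small stand-in for the (1536, 2048, 11, 18, 2) dataset so the demo runs fast.
demo_shape = (16, 32, 3, 4, 2)
demo_data = np.random.randint(0, 100, demo_shape)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(demo_path, "w") as f:
    grp = f.create_group("Measurement 1")
    grp.create_dataset("data", data=demo_data, chunks=(8, 16, 1, 1, 1))

with h5py.File(demo_path, "r") as hf:
    means = image_means(hf["Measurement 1"]["data"])

print(means.shape)
```

For the real file, open 'test.h5', pass hf['Measurement 1']['data'] to `image_means`, and then call `plot_image_means(means)`. Only one 1536 x 2048 image is held in memory at a time, which is roughly 25 MB if the values are stored as 64-bit integers.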