python numpy hdf5 h5py

Failing to write to an HDF5 file


I am trying to create an HDF5 file, but the output file is empty.

I have written Python code that runs in a loop and writes strings into the datasets it creates. After the file is saved, I find that the output file is always empty.

Below is the piece of code I have written:

h5_file_name = 'sample.h5'
hf = h5py.File(h5_file_name, 'w')
g1 = hf.create_group('Objects')
dt = h5py.special_dtype(vlen=str)
d1 = g1.create_dataset('D1', (2, 10), dtype=dt)
d2 = g1.create_dataset('D2', (3, 10), dtype=dt)
for i in range(10):
    d1[0][i] = 'Sample'
    d1[1][i] = str(i)
    d2[0][i] = 'Hello'
    d2[1][i] = 'World'
    d2[2][i] = str(i)
hf.close()

The output file is empty as mentioned above.

Can anyone please point out what I am missing here? Many thanks in advance!


Solution

  • Your code works for me (in an ipython session):

    In [1]: import h5py                                                                                    
    In [2]: h5_file_name = 'sample.h5' 
       ...: hf = h5py.File(h5_file_name, 'w') 
       ...: g1 = hf.create_group('Objects') 
       ...: dt = h5py.special_dtype(vlen=str) 
       ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
       ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
       ...: for i in range(10): 
       ...:     d1[0][i] = 'Sample' 
       ...:     d1[1][i] = str(i) 
       ...:     d2[0][i] = 'Hello' 
       ...:     d2[1][i] = 'World' 
       ...:     d2[2][i] = str(i) 
       ...: hf.close()   
    

    This runs and creates a file, so the file is not "empty" in the usual sense. If by "empty" you mean that the strings were not written to the datasets, that is correct: every element still holds its initial value, ''.

    In [4]: hf = h5py.File(h5_file_name, 'r')                                                              
    In [5]: hf['Objects/D1']                                                                               
    Out[5]: <HDF5 dataset "D1": shape (2, 10), type "|O">
    In [6]: hf['Objects/D1'][:]                                                                            
    Out[6]: 
    array([['', '', '', '', '', '', '', '', '', ''],
           ['', '', '', '', '', '', '', '', '', '']], dtype=object)
    

    ===

    The problem isn't with the file setup, but rather with how you are trying to set elements:

    In [45]: h5_file_name = 'sample.h5' 
        ...: hf = h5py.File(h5_file_name, 'w') 
        ...: g1 = hf.create_group('Objects') 
        ...: dt = h5py.special_dtype(vlen=str) 
        ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
        ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
        ...:                                                                                               
    In [46]: d1[:]                                                                                         
    Out[46]: 
    array([['', '', '', '', '', '', '', '', '', ''],
           ['', '', '', '', '', '', '', '', '', '']], dtype=object)
    In [47]: d1[0][0] = 'sample'                                                                           
    In [48]: d1[:]                                                                                         
    Out[48]: 
    array([['', '', '', '', '', '', '', '', '', ''],
           ['', '', '', '', '', '', '', '', '', '']], dtype=object)
    

    Use the tuple style of indexing:

    In [49]: d1[0, 0] = 'sample'                                                                           
    In [50]: d1[:]                                                                                         
    Out[50]: 
    array([['sample', '', '', '', '', '', '', '', '', ''],
           ['', '', '', '', '', '', '', '', '', '']], dtype=object)
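
Applied to the loop from the question, the same fix might look like the sketch below. Two things here are assumptions and not part of the original code: the file is written to a temporary directory to keep the example self-contained, and as_text is a hypothetical helper that decodes values on read-back, since some h5py versions return bytes instead of str for variable-length strings.

```python
import os
import tempfile

import h5py


def as_text(value):
    """Hypothetical helper: decode bytes to str, pass str through unchanged."""
    return value.decode('utf-8') if isinstance(value, bytes) else value


# Assumption: write into a fresh temporary directory instead of the working directory.
h5_file_name = os.path.join(tempfile.mkdtemp(), 'sample.h5')

with h5py.File(h5_file_name, 'w') as hf:
    g1 = hf.create_group('Objects')
    dt = h5py.special_dtype(vlen=str)
    d1 = g1.create_dataset('D1', (2, 10), dtype=dt)
    d2 = g1.create_dataset('D2', (3, 10), dtype=dt)
    for i in range(10):
        # One tuple index per write, so each assignment goes to the dataset itself.
        d1[0, i] = 'Sample'
        d1[1, i] = str(i)
        d2[0, i] = 'Hello'
        d2[1, i] = 'World'
        d2[2, i] = str(i)

# Reopen read-only and pull the stored values back as plain Python strings.
with h5py.File(h5_file_name, 'r') as hf:
    d1_values = [[as_text(v) for v in row] for row in hf['Objects/D1'][:]]
    d2_values = [[as_text(v) for v in row] for row in hf['Objects/D2'][:]]
```

The with blocks close the file even if a write raises, which replaces the explicit hf.close() call.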
    

    With a numpy array, d1[0][0] = ... works because d1[0] is a view of d1. h5py (apparently) does not replicate this: indexing a dataset as d1[0] reads the row into a new numpy array, which is a copy and not the dataset itself.
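
For comparison, here is a small sketch of the view behaviour on the numpy side only. The array arr is a stand-in invented for this example, shaped like the D1 dataset; no HDF5 file is involved.

```python
import numpy as np

# A plain NumPy object array shaped like the D1 dataset (illustrative stand-in).
arr = np.full((2, 10), '', dtype=object)

# Basic indexing on an ndarray returns a view that shares memory with arr.
row = arr[0]
shares = np.shares_memory(row, arr)

# Because arr[0] is a view, chained indexing writes through to arr.
arr[0][0] = 'sample'
written = arr[0, 0]
```

Here shares is True and written holds the assigned string, which is exactly the write-through behaviour that chained indexing on an h5py dataset does not provide.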

    Variations on tuple indexing that assign a whole row or column at once (np is numpy):

    In [51]: d1[0, :] = 'sample'                                                                           
    In [52]: d1[1, :] = np.arange(10)                                                                      
    In [53]: d1[:]                                                                                         
    Out[53]: 
    array([['sample', 'sample', 'sample', 'sample', 'sample', 'sample',
            'sample', 'sample', 'sample', 'sample'],
           ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']], dtype=object)
    In [54]: d2[:,0] = ['one','two','three']                                                               
    In [55]: d2[:]                                                                                         
    Out[55]: 
    array([['one', '', '', '', '', '', '', '', '', ''],
           ['two', '', '', '', '', '', '', '', '', ''],
           ['three', '', '', '', '', '', '', '', '', '']], dtype=object)
    

    Verifying the change in type with indexing:

    In [64]: type(d1)                                                                                      
    Out[64]: h5py._hl.dataset.Dataset
    In [65]: type(d1[0])                                                                                   
    Out[65]: numpy.ndarray
    

    d1[0][0] = 'foobar' modifies that temporary d1[0] array and leaves the d1 dataset unchanged.
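
If editing a row as a numpy array is more convenient, one option is to read it, change it, and assign it back explicitly. The sketch below rests on a few assumptions: it uses an in-memory file (driver='core' with backing_store=False) so nothing is written to disk, the file name 'copy_demo.h5' is arbitrary, and as_text is a hypothetical helper that normalises bytes to str on read-back.

```python
import h5py


def as_text(value):
    """Hypothetical helper: decode bytes to str, pass str through unchanged."""
    return value.decode('utf-8') if isinstance(value, bytes) else value


# Assumption: an in-memory HDF5 file, so the example leaves nothing on disk.
with h5py.File('copy_demo.h5', 'w', driver='core', backing_store=False) as hf:
    dt = h5py.special_dtype(vlen=str)
    d1 = hf.create_dataset('D1', (2, 10), dtype=dt)

    row = d1[0]          # reads row 0 into a new NumPy array (a copy)
    row[0] = 'foobar'    # changes only the in-memory copy

    # The dataset has not been touched yet.
    unchanged = as_text(d1[0, 0])

    # Writing the row back is what actually updates the dataset.
    d1[0] = row
    updated = as_text(d1[0, 0])
```

For a single element, d1[0, 0] = 'foobar' is simpler; the read-modify-write pattern mainly pays off when several values in a row change together.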