python numpy progress-bar

Load .npy file with np.load progress bar


I have a really large .npy file (previously saved with np.save) and I am loading it with:

np.load(open('file.npy'))

Is there any way to see the progress of the loading process? I know tqdm and some other libraries for monitoring progress, but I don't know how to use them for this problem.

Thank you!


Solution

  • As far as I am aware, np.load does not provide any callbacks or hooks to monitor progress. However, there is a workaround that may help: np.load can open the file as a memory-mapped file (mmap_mode='r'), which means the data stays on disk and is loaded into memory only on demand. We can abuse this machinery to manually copy the data from the memory-mapped file into actual memory using a loop whose progress can be monitored.

    Here is an example with a crude progress monitor:

    import numpy as np
    
    x = np.random.randn(8096, 4096)
    np.save('file.npy', x)
    
    blocksize = 1024  # rows per block; tune this for performance/granularity
    
    # Open the file before the try block, so the cleanup in `finally`
    # never refers to a name that was not assigned.
    mmap = np.load('file.npy', mmap_mode='r')
    try:
        y = np.empty_like(mmap)
        n_blocks = int(np.ceil(mmap.shape[0] / blocksize))
        for b in range(n_blocks):
            # slicing past the end is safe: the last block is simply shorter
            y[b*blocksize : (b+1) * blocksize] = mmap[b*blocksize : (b+1) * blocksize]
            print('progress: {}/{}'.format(b + 1, n_blocks))  # use any progress indicator
    finally:
        del mmap  # drop the reference so the memory map can be closed again
    
    assert np.array_equal(y, x)
    

    Plugging any progress-bar library into the loop should be straightforward.
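
    As an illustration, here is a minimal sketch of the same loop driven by tqdm. The helper name load_with_progress and the demo file name are made up for this example, and it assumes tqdm is installed and the array has at least one dimension. If tqdm is missing, the fallback simply iterates without showing a bar.

```python
import os
import tempfile

import numpy as np

try:
    from tqdm import tqdm  # third-party; assumed to be installed
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback so the sketch still runs without tqdm (no bar is shown).
        return iterable


def load_with_progress(path, blocksize=1024):
    """Hypothetical helper: copy a .npy file into memory block by block."""
    mm = np.load(path, mmap_mode='r')  # data stays on disk for now
    try:
        # subok=False asks for a plain in-memory ndarray as the destination
        out = np.empty_like(mm, subok=False)
        starts = range(0, mm.shape[0], blocksize)
        for start in tqdm(starts, desc='loading', unit='block'):
            stop = start + blocksize  # slicing past the end is safe
            out[start:stop] = mm[start:stop]
    finally:
        del mm  # drop the reference to the memory map
    return out


# Small demo in a temporary directory (sizes chosen arbitrarily).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.npy')
    x = np.random.randn(4096, 256)
    np.save(path, x)
    y = load_with_progress(path, blocksize=512)

assert np.array_equal(x, y)
```

    Because the range of block starts has a known length, tqdm can show a total and an estimated remaining time without any extra arguments.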

    I was unable to test this with exceptionally large arrays due to memory constraints, so I can't really tell if this approach has any performance issues.
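
    One way to get a rough feel for the overhead is to time both approaches on a file that fits in memory. The sketch below is only that: the helper load_blockwise and the array size are made up for illustration, and the numbers depend heavily on the disk and on whether the operating system has already cached the file, so treat them as indicative at best.

```python
import os
import tempfile
import time

import numpy as np


def load_blockwise(path, blocksize=1024):
    """Hypothetical helper: block-by-block copy from a memory-mapped .npy file."""
    mm = np.load(path, mmap_mode='r')
    try:
        out = np.empty_like(mm, subok=False)
        for start in range(0, mm.shape[0], blocksize):
            out[start:start + blocksize] = mm[start:start + blocksize]
    finally:
        del mm  # drop the reference to the memory map
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'timing_demo.npy')
    np.save(path, np.random.randn(2048, 1024))  # about 16 MB of float64

    t0 = time.perf_counter()
    direct = np.load(path)
    t_direct = time.perf_counter() - t0

    t0 = time.perf_counter()
    blockwise = load_blockwise(path)
    t_block = time.perf_counter() - t0

print('np.load:   {:.4f} s'.format(t_direct))
print('blockwise: {:.4f} s'.format(t_block))
```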