I have a really large .npy file (previously saved with np.save) and I am loading it with:
np.load(open('file.npy'))
Is there any way to see the progress of the loading process? I know tqdm and some other libraries for monitoring progress, but I don't know how to use them for this problem.
Thank you!
As far as I am aware, np.load does not provide any callbacks or hooks to monitor progress. However, there is a workaround that may work: np.load can open the file as a memory-mapped file, which means the data stays on disk and is loaded into memory only on demand. We can abuse this machinery to manually copy the data from the memory-mapped file into actual memory using a loop whose progress can be monitored.
Here is an example with a crude progress monitor:
import numpy as np

x = np.random.randn(8096, 4096)
np.save('file.npy', x)

blocksize = 1024  # tune this for performance/granularity

# open the file before the try block, so `del mmap` never hits an undefined name
mmap = np.load('file.npy', mmap_mode='r')
try:
    y = np.empty_like(mmap)
    n_blocks = int(np.ceil(mmap.shape[0] / blocksize))
    for b in range(n_blocks):
        # copy one block of rows from disk into memory
        y[b*blocksize : (b+1) * blocksize] = mmap[b*blocksize : (b+1) * blocksize]
        print('progress: {}/{}'.format(b + 1, n_blocks))  # use any progress indicator
finally:
    del mmap  # make sure file is closed again

assert np.all(y == x)
Plugging any progress-bar library into the loop should be straightforward.
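For instance, the loop can be wrapped in a small helper that accepts a progress wrapper. This is only a minimal sketch: the function name load_with_progress and its wrap parameter are my own hypothetical names, not part of NumPy, and it assumes the array has at least one dimension.

```python
import numpy as np


def load_with_progress(path, blocksize=1024, wrap=None):
    """Copy a .npy file into memory block by block along the first axis.

    `wrap` is a hypothetical hook: any callable that takes an iterable and
    returns an iterable (for example tqdm.tqdm). With wrap=None, no progress
    is reported. Assumes the stored array is at least 1-dimensional.
    """
    source = np.load(path, mmap_mode='r')  # data stays on disk for now
    try:
        # plain in-memory array with the same shape and dtype as the file
        out = np.empty(source.shape, dtype=source.dtype)
        starts = range(0, source.shape[0], blocksize)
        for start in (wrap(starts) if wrap is not None else starts):
            stop = start + blocksize  # slicing clips at the end of the array
            out[start:stop] = source[start:stop]
    finally:
        del source  # drop the reference to the memory map
    return out
```

With tqdm installed, calling load_with_progress('file.npy', wrap=tqdm.tqdm) should show a bar with one tick per block, since tqdm can take the total from the length of the range.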
I was unable to test this with exceptionally large arrays due to memory constraints, so I can't really tell if this approach has any performance issues.