I'm using cupy in a function that receives a numpy array, moves it to the GPU, does some operations on it and returns a cp.asnumpy copy of it.
The problem: the memory is not freed after the function returns (as seen in nvidia-smi).
I know about the caching and re-using of memory done by cupy. However, this seems to work only per user. When multiple users compute on the same GPU server, each is limited by the memory cached by the others.
I also tried calling cp._default_memory_pool.free_all_blocks() inside the function at the end. This seems to have no effect. Importing cupy in the main code and calling free_all_blocks "manually" works, but I'd like to encapsulate the GPU stuff in the function, not visible to the user.
Can you fully release GPU memory used inside a function so that it's usable by other users?
Minimal example:
Main module:
# don't import cupy here, only numpy
import numpy as np
# module in which cupy is imported and used
from memory_test_module import test_function
# host array
arr = np.arange(1000000)
# out is also on host, gpu stuff happens in test_function
out = test_function(arr)
# GPU memory is not released here, unless manually:
import cupy as cp
cp._default_memory_pool.free_all_blocks()
Function module:
import cupy as cp
def test_function(arr):
    arr_gpu = cp.array(arr)
    arr_gpu += 1
    out_host = cp.asnumpy(arr_gpu)
    # this has no effect
    cp._default_memory_pool.free_all_blocks()
    return out_host
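CuPy uses Python's reference counting to track which arrays are in use. Inside test_function, arr_gpu still references its memory block when free_all_blocks is called, and free_all_blocks only releases blocks that are no longer in use, so the call has nothing to free.
In this case, you should del arr_gpu before calling free_all_blocks in test_function.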
See here for more details: https://docs.cupy.dev/en/latest/user_guide/memory.html