
Clear all cached kernels from CuPy to force kernel compilation


The CuPy documentation states:

"CuPy caches the kernel code sent to GPU device within the process, which reduces the kernel compilation time on further calls."

This means that the first call to a CuPy function pays the kernel compilation cost, while subsequent calls to the same function reuse the cached kernel and are much faster. For example:

import cupy as cp
from timeit import default_timer as timer

mempool = cp.get_default_memory_pool()
pinned_mempool = cp.get_default_pinned_memory_pool()


def multiply():
    rand = cp.random.default_rng()  # Generator API: the fast way to create large arrays with CuPy
    arr = rand.integers(0, 100_000, (10000, 1000))  # Create the array
    y = cp.multiply(arr, 42)  # Multiply by 42, an arbitrarily chosen number
    return y


if __name__ == '__main__':
    times = []
    for i in range(21):
        # Release cached device and pinned memory so every iteration starts from empty pools
        mempool.free_all_blocks()
        pinned_mempool.free_all_blocks()
        # No explicit device synchronization here: this is wall-clock time until multiply() returns
        start = timer()
        multiply()
        times.append(timer() - start)

    print(times)

This prints the following times (in seconds):

[0.17462146899993058, 0.0006819850000283623, 0.0006159440001738403, 0.0006145069999092811, 0.000610309999956371, 0.0006169410000893549, 0.0006062159998236893, 0.0006096620002153941, 0.0006096250001519365, 0.0006106630000886071, 0.0006063629998607212, 0.0006168999998408253, 0.0006058349999875645, 0.0006090080000831222, 0.0005964219999441411, 0.0006113049998930364, 0.0005968339999071759, 0.0005951619998540991, 0.0005980400001135422, 0.0005941219999385794, 0.0006568090000200755]

Only the first call includes the time it takes to compile the kernel.

Is there a way to flush the kernel cache so that every subsequent call to multiply() is forced to compile again?


Solution

  • CuPy currently provides no way to disable kernel caching. The only available option is to disable the persistent on-disk kernel cache (CUPY_CACHE_IN_MEMORY=1); kernels are then still cached in memory, so compilation runs only once per process.

    https://docs.cupy.dev/en/stable/user_guide/performance.html#one-time-overheads
    https://docs.cupy.dev/en/latest/reference/environment.html
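Since the in-memory cache lives only as long as the process, one possible workaround is to run each measurement in a fresh Python process with CUPY_CACHE_IN_MEMORY=1 set, so that neither the on-disk nor the in-memory cache can be reused. The following is a minimal sketch under that assumption, not an official CuPy mechanism; the helper names (cold_env, run_in_fresh_process, demo) and the SNIPPET string are hypothetical and introduced only for illustration. The child process times the work itself, so interpreter start-up and the cupy import are excluded from the measurement, although other one-time overheads such as CUDA context creation are still included.

```python
import os
import subprocess
import sys

# Code executed in each child process; it mirrors multiply() from the question.
# It prints the elapsed time in seconds as the last line of its output.
SNIPPET = """
import cupy as cp
from timeit import default_timer as timer

start = timer()
rng = cp.random.default_rng()
arr = rng.integers(0, 100_000, (10000, 1000))
y = cp.multiply(arr, 42)
cp.cuda.Device().synchronize()  # wait for the GPU before stopping the timer
print(timer() - start)
"""


def cold_env():
    """Return a copy of the environment with CuPy's on-disk cache disabled."""
    env = dict(os.environ)
    env["CUPY_CACHE_IN_MEMORY"] = "1"
    return env


def run_in_fresh_process(snippet, env=None):
    """Run `snippet` in a new interpreter and parse the last printed line as a float."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip().splitlines()[-1])


def demo(runs=3):
    """Collect one cold timing per fresh process (requires CuPy and a GPU)."""
    return [run_in_fresh_process(SNIPPET, cold_env()) for _ in range(runs)]
```

Calling demo() on a machine with CuPy and a GPU should return one timing per process, each of which is expected to include compilation. The cost of this approach is that every measurement also pays for starting a new process.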