
CUDA-Python: How to launch a CUDA kernel in Python (Numba 0.25)?


Could you please help me understand how to write CUDA kernels in Python? AFAIK, numba.vectorize can run on CUDA, on a single CPU, or in parallel on multiple CPUs, depending on the target argument. But target='cuda' seems to require setting up CUDA kernels.

The main issue is that many examples and answers on the Internet relate to the deprecated NumbaPro library, so out-of-date wikis are hard to follow, especially if you're a newbie.

I have:

Here is the error I'm getting:

numba.cuda.cudadrv.driver.CudaAPIError: 1 Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

import numpy as np
import time

from numba import vectorize, cuda

@vectorize(['float32(float32, float32)'], target='cuda')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    start = time.time()
    C = VectorAdd(A, B)
    vector_add_time = time.time() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))

    # Report the elapsed wall-clock time in seconds
    print("VectorAdd took %s seconds" % vector_add_time)

if __name__ == '__main__':
    main()

Solution

  • The code, as posted, is correct and will run on a Python 2 Numbapro/Accelerate system without error.

    The particular system used to run the code was likely limited in capacity and was hitting either a display driver watchdog timeout or a shortage of free memory with 32-million-element vectors. Reducing the size of the input data allowed the code to run correctly.

    [This answer assembled from comments and added as a community wiki entry to get this question off the unanswered list]
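
If shrinking N is not an option, one possible workaround is to keep the full arrays on the host and hand the vectorized function one slice at a time, so that each launch is shorter and needs less device memory. The following is only a minimal sketch under that assumption, not something taken from the original comments: the helper name apply_in_chunks and the chunk size are hypothetical, and np.add stands in for the question's VectorAdd so the sketch can run on a machine without a GPU.

```python
import numpy as np


def apply_in_chunks(func, a, b, chunk_size):
    """Apply an elementwise binary function to a and b one slice at a time.

    Hypothetical helper: func can be any callable that takes two equal-length
    1-D arrays and returns an array of the same length.
    """
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    out = np.empty_like(a)
    for start in range(0, a.shape[0], chunk_size):
        stop = min(start + chunk_size, a.shape[0])
        # Each call only sees at most chunk_size elements, so a GPU-backed
        # func would transfer and process a smaller array per launch.
        out[start:stop] = func(a[start:stop], b[start:stop])
    return out


if __name__ == "__main__":
    N = 32000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    # np.add is a CPU stand-in (assumption); pass the question's VectorAdd
    # instead to run each slice on the device.
    C = apply_in_chunks(np.add, A, B, chunk_size=1000000)
    print(C[:5])
```

A suitable chunk size depends on the GPU and on whether it also drives a display, so the value above is a guess to tune. Smaller chunks mean more launches and more host-to-device transfers, which trades some throughput for staying under the limits described above.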