I am trying to load a compiled *.cubin file with PyCUDA, but I get this error:
func = mod.get_function("doublify")
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found
Content of doublify.cu:
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
I compiled it with the following command:
nvcc --cubin -arch sm_75 doublify.cu
This is my python script:
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)  # copy the input array to the device
mod = cuda.module_from_file("doublify.cubin")
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)  # host buffer for the result
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
Do I need to pass additional flags to nvcc? If I compile the same kernel with PyCUDA's SourceModule, everything works as expected. Loading a compiled *.fatbin fails in the same way.
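I figured it out myself after debugging PyCUDA. If anyone else stumbles upon the same problem, this is the solution: I was missing the extern "C" statement at the beginning of the *.cu file. Without it, nvcc compiles the kernel as C++ and mangles its name (to something like _Z8doublifyPf), so cuModuleGetFunction cannot find a symbol called "doublify". SourceModule does not hit this problem because it wraps the source in extern "C" by default (unless no_extern_c=True is passed).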
extern "C"
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
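If you want to check which name a compiled module actually contains before loading it, a rough diagnostic is to scan the file for the kernel name. The sketch below is only an illustration: find_kernel_symbols is a hypothetical helper (not part of PyCUDA), and it assumes the cubin stores symbol names as NUL-terminated ASCII strings (it is an ELF file) and that mangled names follow the usual _Z<length><name>... pattern for free functions.

```python
import re

def find_kernel_symbols(blob: bytes, kernel: str) -> dict:
    """Scan raw cubin bytes for plain and C++-mangled forms of `kernel`.

    Hypothetical helper for illustration; it does a byte-level search
    and does not parse the ELF symbol table.
    """
    name = re.escape(kernel.encode("ascii"))
    # Mangled free function: _Z<len><name><encoded parameter types>
    mangled_re = re.compile(
        rb"_Z" + str(len(kernel)).encode("ascii") + name + rb"[A-Za-z0-9_]*"
    )
    # Unmangled name: ends a NUL-terminated string and is not preceded by
    # an identifier character (which would mean it is part of a longer name).
    plain_re = re.compile(rb"(?<![A-Za-z0-9_])" + name + rb"\x00")
    mangled = sorted({m.decode("ascii") for m in mangled_re.findall(blob)})
    return {"plain": plain_re.search(blob) is not None, "mangled": mangled}
```

For example, find_kernel_symbols(open("doublify.cubin", "rb").read(), "doublify") should report a mangled name and plain=False for a file built without extern "C". In that case either add extern "C" as above, or pass the reported mangled name to mod.get_function.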