cuda

PTX kernel name mangling


I cannot link my CUDA program when a kernel is compiled from a PTX file.

main.cu:

extern 
__global__ void kernel(int, float*);

int main()
{
    ...
    kernel<<<...>>>(...);
    ...
}

kernel.cu:

__global__
void kernel(int n, float* p)
{
    ...
}

If I compile as follows, I have no problems and I get an executable:

nvcc -dc main.cu kernel.cu --gpu-architecture=sm_70
nvcc -dlink main.o kernel.o --gpu-architecture=sm_70 -o dlink.o
g++ dlink.o main.o kernel.o -lcudart

If I compile as follows (generating PTX first), I get errors:

nvcc -ptx kernel.cu --gpu-architecture=sm_70
nvcc -dc main.cu kernel.ptx --gpu-architecture=sm_70
nvcc -dlink main.o kernel.o --gpu-architecture=sm_70 -o dlink.o
g++ dlink.o main.o kernel.o -lcudart

Error:

main.o: In function `main':
tmpxft_0000b5ce_00000000-5_main.cudafe1.cpp:(.text+0x4789): undefined reference to `kernel(int, float*)'
tmpxft_0000b5ce_00000000-5_main.cudafe1.cpp:(.text+0x497e): undefined reference to `kernel(int, float*)'
collect2: error: ld returned 1 exit status

I am following an example from CUDA_Compiler_Driver_NVCC.pdf.

What do I need to do to fix the error?

(This is CUDA 10.2).


Solution

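  • The link most likely fails because compiling kernel.ptx with nvcc produces device code only. No host-side stub for kernel(int, float*) is generated, so the reference that main.o makes through the <<<...>>> launch cannot be resolved by the host linker.

    If you want to write your own PTX (or modify PTX), the supported approach is the CUDA driver API and its associated compilation flow: load the PTX as a module at run time and launch the kernel through the driver API rather than with the <<<...>>> syntax.

    The CUDA vectorAddDrv sample code has all the plumbing and workflow that you need.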
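As a rough illustration of the driver API flow mentioned above, here is a minimal sketch; it is not taken from vectorAddDrv. It assumes the PTX was produced by `nvcc -ptx kernel.cu`, in which case the entry point carries the C++-mangled name (for `kernel(int, float*)` that is normally `_Z6kerneliPf`, unless the kernel is declared `extern "C"`). The helper names `find_entry_names`, `check` and `launch_from_ptx`, the `WITH_CUDA_DRIVER` macro, and the launch configuration of 256 threads per block are all hypothetical choices made for this example. The driver part is guarded by the macro so that the name-lookup helper can be built without a CUDA installation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Naive scan: collect the identifier that follows each ".entry" directive.
// This is how the (possibly mangled) kernel name can be read out of PTX text.
std::vector<std::string> find_entry_names(const std::string& ptx)
{
    const std::string directive = ".entry";
    std::vector<std::string> names;
    std::size_t pos = 0;
    while ((pos = ptx.find(directive, pos)) != std::string::npos) {
        pos += directive.size();
        // Skip whitespace between the directive and the name.
        while (pos < ptx.size() && std::isspace(static_cast<unsigned char>(ptx[pos])))
            ++pos;
        const std::size_t start = pos;
        while (pos < ptx.size() &&
               (std::isalnum(static_cast<unsigned char>(ptx[pos])) ||
                ptx[pos] == '_' || ptx[pos] == '$'))
            ++pos;
        if (pos > start)
            names.push_back(ptx.substr(start, pos - start));
    }
    return names;
}

#ifdef WITH_CUDA_DRIVER  // opt-in (hypothetical macro): needs cuda.h and -lcuda
#include <cuda.h>
#include <fstream>
#include <sstream>
#include <stdexcept>

// Turn a failed driver call into an exception (sketch: no cleanup on error).
static void check(CUresult r, const char* what)
{
    if (r != CUDA_SUCCESS) throw std::runtime_error(what);
}

// Load a PTX file, look up its first entry point, and run it on n floats.
void launch_from_ptx(const char* ptx_path, int n, float* host_p)
{
    std::ifstream in(ptx_path);
    std::stringstream ss;
    ss << in.rdbuf();
    const std::string ptx = ss.str();
    const std::vector<std::string> names = find_entry_names(ptx);
    if (names.empty()) throw std::runtime_error("no .entry found in PTX");

    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr d_p;
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");
    check(cuModuleLoadData(&mod, ptx.c_str()), "cuModuleLoadData");  // JIT-compiles the PTX
    check(cuModuleGetFunction(&fn, mod, names[0].c_str()), "cuModuleGetFunction");
    check(cuMemAlloc(&d_p, bytes), "cuMemAlloc");
    check(cuMemcpyHtoD(d_p, host_p, bytes), "cuMemcpyHtoD");

    // Kernel arguments are passed as an array of pointers to each argument.
    void* args[] = { &n, &d_p };
    const unsigned threads = 256;  // assumed launch configuration
    const unsigned blocks = (static_cast<unsigned>(n) + threads - 1) / threads;
    check(cuLaunchKernel(fn, blocks, 1, 1, threads, 1, 1, 0, nullptr, args, nullptr),
          "cuLaunchKernel");
    check(cuMemcpyDtoH(host_p, d_p, bytes), "cuMemcpyDtoH");

    cuMemFree(d_p);
    cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
}
#endif
```

With this approach main.cu no longer declares or launches the kernel with `<<<...>>>`, so there is no host-side symbol left for the linker to resolve. The host file would be built with `-DWITH_CUDA_DRIVER` and linked against `-lcuda` (the driver library) using the include and library paths of your CUDA installation, and kernel.ptx is read at run time. Declaring the kernel `extern "C"` in kernel.cu is an alternative that gives the entry point the plain name `kernel`.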