I cannot link my Cuda program when a kernel is compiled from ptx
file.
main.cu:
extern
__global__ void kernel(int, float*);
int main()
{
...
kernel<<<...>>>(...);
...
}
kernel.cu
__global__
void kernel(int n, float* p)
{
...
}
If I compile like below, I have no problems and I get an executable:
nvcc -dc main.cu kernel.cu --gpu-architecture=sm_70
nvcc -dlink main.o kernel.o --gpu-architecture=sm_70 -o dlink.o
g++ dlink.o main.o kernel.o -lcudart
If I compile like below (by generating ptx
), I get errors:
nvcc -ptx kernel.cu --gpu-architecture=sm_70
nvcc -dc main.cu kernel.ptx --gpu-architecture=sm_70
nvcc -dlink main.o kernel.o --gpu-architecture=sm_70 -o dlink.o
g++ dlink.o main.o kernel.o -lcudart
Error:
main.o: In function `main':
tmpxft_0000b5ce_00000000-5_main.cudafe1.cpp:(.text+0x4789): undefined reference to `kernel(int, float*)'
tmpxft_0000b5ce_00000000-5_main.cudafe1.cpp:(.text+0x497e): undefined reference to `kernel(int, float*)'
collect2: error: ld returned 1 exit status
I am following an example from CUDA_Compiler_Driver_NVCC.pdf
.
What do I need to do to fix the error?
(This is CUDA 10.2).
If you want to write your own PTX (or modify PTX), the proper CUDA methodology to use is the CUDA driver API and associated compilation flow.
The CUDA vectorAddDrv
sample code has all the plumbing and workflow that you need.