linux cuda nvcc

How to use CUDA_FORCE_PTX_JIT?


According to the NVIDIA Programming Guide:

Any PTX code loaded by an application at runtime is compiled further to binary code by the device driver. This is called just-in-time compilation. Just-in-time compilation increases application load time, but allows applications to benefit from latest compiler improvements.

...

Setting CUDA_FORCE_PTX_JIT to 1 forces the device driver to ignore any binary code embedded in an application (see Section 3.1.4) and to just-in-time compile embedded PTX code instead; if a kernel does not have embedded PTX code, it will fail to load.

I've compiled my simple vectorAdd using the following flags:

nvcc -o vectorAdd -gencode arch=compute_20,code=sm_20 vectorAdd.cu

When the CUDA_FORCE_PTX_JIT environment variable is unset, I get correct results. But when I set it to 1, I get the following error from cudaGetErrorString:

invalid device function 

How can I fix this and get CUDA_FORCE_PTX_JIT working? Perhaps the way I compile does not embed any PTX code.

Thanks in advance.

Further information:

CUDA Driver Version: 295.41

CUDA Toolkit version: 4.0

OS: Ubuntu 10.04

Hardware: GTX 480 or Tesla C2050


Solution

  • I found a workaround for the issue. When compiling, do not specify the target GPU in any way (remove the -arch and -gencode flags). With no target given, nvcc embeds PTX for its default architecture, and the driver then generates the target binary at runtime.
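
A likely explanation for the original error, and an alternative to dropping the flags entirely: code=sm_20 asks nvcc to embed only the sm_20 binary, whereas code=compute_20 embeds PTX. The sketch below lists both targets so the executable carries PTX for the driver to JIT. It assumes the compute capability 2.0 cards from the question, a toolkit old enough to still accept compute_20, and a vectorAdd.cu in the current directory. The CC and GENCODE variable names are only for illustration, and the build step is skipped if nvcc or the source file is missing.

```shell
#!/bin/sh
# Compute capability 2.0 matches the GTX 480 / Tesla C2050 in the question.
CC=20

# code=sm_XX embeds device binary only; code=compute_XX embeds PTX.
# Listing both keeps the precompiled binary and gives the driver PTX to JIT.
GENCODE="-gencode arch=compute_${CC},code=sm_${CC} -gencode arch=compute_${CC},code=compute_${CC}"

# Show the command that would be run.
echo "nvcc -o vectorAdd ${GENCODE} vectorAdd.cu"

# Only build and run when the toolkit and the source file are present.
if command -v nvcc >/dev/null 2>&1 && [ -f vectorAdd.cu ]; then
    # GENCODE is left unquoted on purpose so it splits into separate flags.
    nvcc -o vectorAdd ${GENCODE} vectorAdd.cu

    # Dump any PTX embedded in the executable (cuobjdump ships with the toolkit).
    cuobjdump -ptx vectorAdd

    # Set the variable for this one run only, without exporting it.
    CUDA_FORCE_PTX_JIT=1 ./vectorAdd
fi
```

If cuobjdump -ptx prints no PTX for the executable, there is nothing for the driver to JIT, which would be consistent with the "invalid device function" error seen when CUDA_FORCE_PTX_JIT is set to 1.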