According to the NVIDIA CUDA Programming Guide:
Any PTX code loaded by an application at runtime is compiled further to binary code by the device driver. This is called just-in-time compilation. Just-in-time compilation increases application load time, but allows applications to benefit from latest compiler improvements.
...
Setting CUDA_FORCE_PTX_JIT to 1 forces the device driver to ignore any binary code embedded in an application (see Section 3.1.4) and to just-in-time compile embedded PTX code instead; if a kernel does not have embedded PTX code, it will fail to load
I compiled my simple vectorAdd using the following flags:
nvcc -o vectorAdd -gencode arch=compute_20,code=sm_20 vectorAdd.cu
When the CUDA_FORCE_PTX_JIT environment variable is unset, I get correct results. But when I set CUDA_FORCE_PTX_JIT to 1, I get the following error from cudaGetErrorString:
invalid device function
How can I fix this issue and get CUDA_FORCE_PTX_JIT working? Perhaps the way I compile does not embed any PTX code.
Thanks in advance.
Further information:
CUDA Driver Version: 295.41
CUDA Toolkit version: 4.0
OS: Ubuntu 10.04
Hardware: GTX 480, or Tesla C2050
I found a workaround for the issue. Compile without specifying the target GPU in any way (remove the -arch or -gencode flags). With nvcc's default settings, PTX code is embedded in the executable, so the driver generates the target binary at runtime.
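An alternative that keeps the target explicit may also work. The original build used code=sm_20, which embeds only binary code for sm_20 and no PTX, so there is nothing for the driver to JIT-compile. The sketch below adds a second -gencode clause with code=compute_20 so that PTX is embedded alongside the binary. It assumes a Fermi-class GPU and the vectorAdd.cu file from the question; the cuobjdump step is an assumption about the installed toolkit and is only there to inspect what was embedded.

```shell
#!/bin/sh
# Flags assumed for a Fermi-class GPU (GTX 480 / Tesla C2050), as in the question.
SASS_ONLY="-gencode arch=compute_20,code=sm_20"      # original build: binary only, no PTX
PTX_ONLY="-gencode arch=compute_20,code=compute_20"  # PTX only: driver JIT-compiles at load
BOTH="$SASS_ONLY $PTX_ONLY"                          # binary plus embedded PTX

if command -v nvcc >/dev/null 2>&1 && [ -f vectorAdd.cu ]; then
    # Build with both the sm_20 binary and compute_20 PTX embedded.
    nvcc -o vectorAdd $BOTH vectorAdd.cu
    # Inspect which PTX, if any, ended up in the executable (assumes cuobjdump is installed).
    cuobjdump -ptx vectorAdd
    # Force the driver to ignore the embedded binary and JIT-compile the PTX.
    CUDA_FORCE_PTX_JIT=1 ./vectorAdd
else
    echo "nvcc or vectorAdd.cu not found; would run: nvcc -o vectorAdd $BOTH vectorAdd.cu"
fi
```

Using $PTX_ONLY instead of $BOTH should give an executable that is always JIT-compiled, at the cost of a longer load time.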