[SOLVED] Trouble compiling/running CUDA code involving dynamic parallelism

I am trying to use dynamic parallelism with CUDA, but I cannot get past the compilation step.

I am working on a GPU with compute capability 3.5 and CUDA 7.5.

Depending on the switches I pass to the compile command, I get different error messages, but with the help of the documentation I arrived at one line that compiles successfully:

nvcc -arch=compute_35 -rdc=true cudaDynamic.cu -o cudaDynamic.out -lcudadevrt

But when the program is launched, the whole program fails. With cuda-memcheck, I get the same error message for each call to an API function:

========= CUDA-MEMCHECK
========= Program hit cudaErrorUnknown (error 30) due to "unknown error" on CUDA API call to ...
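The same error shows up on every API call, and reinstalling CUDA is what eventually fixed it (see the solution further down), so a reasonable first check is whether the toolkit and driver are visible and agree with each other. The script below is only a rough diagnostic sketch, not a definitive procedure. It assumes a POSIX shell and that `nvcc`, `nvidia-smi` and `cuda-memcheck` would normally be on `PATH`. The variable `found` is a hypothetical helper introduced here.

```shell
#!/bin/sh
# Report which CUDA tools are visible on PATH (assumption: a standard install
# puts all three there).
found=0
for tool in nvcc nvidia-smi cuda-memcheck; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
        found=$((found + 1))
    else
        echo "$tool: not found on PATH"
    fi
done

# Toolkit version as seen by the compiler.
if command -v nvcc >/dev/null 2>&1; then nvcc --version; fi

# Driver version and visible GPUs as seen by the driver.
if command -v nvidia-smi >/dev/null 2>&1; then nvidia-smi; fi

echo "tools found: $found of 3"
```

If a tool is missing, or `nvidia-smi` itself reports an error, the problem probably lies in the installation and not in the compile line.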

I have also tried this line (taken from the makefile of the CUDA dynamic parallelism samples):

nvcc -ccbin g++ -I../../common/inc -m64 -dc -gencode arch=compute_35,code=compute_35 -o cudaDynamic.out -c cudaDynamic.cu

But upon execution, I get:

cudaDynamic.out: Permission denied
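The `-dc` and `-c` switches in that line tell nvcc to stop after compilation. That makes `cudaDynamic.out` an object file with relocatable device code, not a linked program, which would explain the "Permission denied" message when it is run. Below is a minimal sketch of the matching two-step build, under these assumptions: `cudaDynamic.cu` is in the current directory, the target is `sm_35` as in the post, and the variable names (`SRC`, `OBJ`, `BIN`, `ARCH`) are hypothetical.

```shell
#!/bin/sh
# Two-step build for a single-file dynamic-parallelism program.
# Assumptions: cudaDynamic.cu is in the current directory and the GPU has
# compute capability 3.5 (as in the post); adjust ARCH for other devices.
SRC=cudaDynamic.cu
OBJ=cudaDynamic.o
BIN=cudaDynamic.out
ARCH=sm_35

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Step 1: compile only. -dc emits an object file with relocatable device code.
    nvcc -arch="$ARCH" -dc "$SRC" -o "$OBJ" &&
    # Step 2: link the object against the device runtime to get an executable.
    nvcc -arch="$ARCH" "$OBJ" -o "$BIN" -lcudadevrt
else
    echo "nvcc or $SRC not found; nothing built"
fi

# An object file is written without the execute bit; a linked program has it.
if [ -x "$BIN" ]; then
    echo "$BIN is executable"
else
    echo "$BIN is missing or not executable"
fi
```

For a single source file, a one-line build with `-rdc=true` and `-lcudadevrt` does the same job. The split form is mainly useful when several translation units share device code.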

I would like to understand how to correctly compile CUDA code that uses dynamic parallelism, because all the other compilation lines that I have tried so far have failed.

Solution

I fixed the problem by fully reinstalling CUDA.

I'm now able to compile both the CUDA samples and my own code.