I am trying to use dynamic parallelism with CUDA, but I cannot get past the compilation step.
I am working on a GPU with Compute Capability 3.5 and CUDA 7.5.
Depending on the switches I pass to the compiler, I get different error messages, but with the help of the documentation
I arrived at one command line that compiles successfully:
nvcc -arch=compute_35 -rdc=true cudaDynamic.cu -o cudaDynamic.out -lcudadevrt
But when I launch the program, everything fails. With cuda-memcheck, every call to an API function reports the same error message:
========= CUDA-MEMCHECK
========= Program hit cudaErrorUnknown (error 30) due to "unknown error" on CUDA API call to ...
I have also tried this line (taken from the makefile of the CUDA dynamic parallelism samples):
nvcc -ccbin g++ -I../../common/inc -m64 -dc -gencode arch=compute_35,code=compute_35 -o cudaDynamic.out -c cudaDynamic.cu
But when I try to run the resulting file, I get:
cudaDynamic.out: Permission denied
I would like to understand how to correctly compile CUDA code that uses dynamic parallelism, because every other command line I have tried so far has failed.
I fixed the problem by fully reinstalling CUDA.
I'm now able to compile both the CUDA samples and my own code.
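
For reference, here is a minimal sketch of a dynamic parallelism program, with build commands that should work on a healthy installation. The file name dynamicSketch.cu and the kernel names are made up for illustration and are not from my original code. Note that -dc (as in the samples makefile line above) only compiles to a relocatable object file, so its output is not an executable and still needs a separate link step. That would be consistent with the "Permission denied" message.

```cuda
// dynamicSketch.cu (hypothetical file name, illustration only)
//
// One-step build, assuming a Compute Capability 3.5 device:
//   nvcc -arch=sm_35 -rdc=true dynamicSketch.cu -o dynamicSketch -lcudadevrt
//
// Two-step build (compile to an object file, then link):
//   nvcc -arch=sm_35 -dc dynamicSketch.cu -o dynamicSketch.o
//   nvcc -arch=sm_35 dynamicSketch.o -o dynamicSketch -lcudadevrt

#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from device code.
__global__ void childKernel(int parentBlock)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parentBlock);
}

// Parent kernel: the first thread of each block launches a child grid.
__global__ void parentKernel()
{
    if (threadIdx.x == 0) {
        childKernel<<<1, 2>>>(blockIdx.x);
    }
}

int main()
{
    parentKernel<<<2, 32>>>();

    // Check the launch itself, then wait for the parent and child grids.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If even a small program like this reports an error on every API call, the problem is more likely in the driver or toolkit installation than in the compile flags, which matches what I saw before reinstalling.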