cuda dynamic-parallelism

CUDA dynamic parallelism with Driver API


I'm trying to compile and link a dynamic kernel and use it with the CUDA driver API on a GK110.

I compile the .cu source file in Visual Studio with the relocatable device code flag and compute_35, sm_35 into a PTX file, and then the CUDA linker adds cudadevrt.lib (at least it tries to, according to the linker invocation). When I call cuModuleLoad on the PTX .obj, it reports unsupported device code. There is also a .device-link.obj, which seems unrealistically small, and none of the driver API functions seem to recognize it as a valid image. When inspecting the PTX file, I can see that it contains a call to the kernel launch function, as described in the CUDA documentation (dynamic parallelism from PTX section).

How can I link the proper device code such that the dynamic kernel invocation works?

(This is CUDA 6.5 on Win64 with VC2013.)
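
For illustration, a kernel of the kind described above might look roughly like the sketch below. The kernel names `parent` and `child`, the file name `kernel.cu`, and the launch dimensions are hypothetical and not taken from the original project. The nvcc command line in the comment is an assumed equivalent of the Visual Studio settings mentioned in the question.

```cuda
// Assumed command-line equivalent of the Visual Studio settings
// (relocatable device code, sm_35, PTX output):
//   nvcc -arch=sm_35 -rdc=true -ptx kernel.cu -o kernel.ptx

// Child grid, launched from the device.
// extern "C" keeps the name unmangled so a driver API lookup by name
// (e.g. cuModuleGetFunction) can use "child" / "parent" directly.
extern "C" __global__ void child(int *data)
{
    // Each thread writes its own index; data must hold at least 32 ints.
    data[threadIdx.x] = threadIdx.x;
}

// Parent kernel: a single thread launches the child grid.
extern "C" __global__ void parent(int *data)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        // Device-side launch. In the generated PTX this shows up as calls
        // into the device runtime, which is why the module still has to be
        // linked against cudadevrt before it can be loaded.
        child<<<1, 32>>>(data);
    }
}
```

The PTX produced from such a file contains unresolved references to the device runtime, so loading it directly with cuModuleLoad cannot succeed until those references have been linked.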


Solution

  • You need to do the linking while loading the PTX file, using the CUDA linker provided by the driver API:

    In your app: