So I found this wikipedia resource
Maximum number of resident grids per device (Concurrent Kernel Execution)
and for each compute capability it says a number of concurrent kernels, which I assume to be the maximum number of concurrent kernels.
Now I am getting a GTX 1060 delivered which according to this nvidia CUDA resource has a compute capability of 6.1. From what I have learned about CUDA so far you can specify the virtual compute capability of your code at compile time in NVCC though with the flag -arch=compute_XX
.
So will my GPU be hardware constrained to 32 concurrent kernels or is it capable of 128 with the -arch=compute_60
flag?
According to table 13 in the NVIDIA CUDA programming guide compute capability 6.1 devices have a maximum of 32 resident grids = 32 concurrent kernels.
Even if you use the -arch=compute_60
flag, you will be limited to the hardware limit of 32 concurrent kernels. Choosing particular architectures to compile for does not allow you to exceed the hardware limits of the machine.