I have a CUDA program that uses Thrust in some places but also normal kernels.
The problem is: when I run the program standalone, everything works fine. When I run it in the profiler (Visual Profiler or nvprof on the command line), the program crashes in a thrust::inclusive_scan operation with a cudaErrorIllegalAddress error. The crash always happens in the profiler and always at the same position. Furthermore, I have multiple iterations like:
void foo(){
    cudaProfilerStart();
    for(...){
        //...
        thrust::inclusive_scan(...);
        //...
    }
    cudaProfilerStop();
}
for(...) foo();
The crash always happens on the first call to inclusive_scan in the 2nd iteration.
I'm using CUDA 6.5 on Windows 7 with a Quadro K5000.
Any ideas what can cause this or how to narrow it down? Maybe a way to get the address of the failed access? cuda-memcheck cannot be used with nvprof AFAIK(?)
If I remove the calls to cudaProfilerStart/Stop, it seems to work OK. Strangely enough, it DID work this morning with them in place, although I did not introduce any changes (I did some code editing but reverted everything via git). Also, the behaviour does not change if I disable/enable profile-from-start (with cudaProfilerStart/Stop in place).
Minimal working example:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <cuda_profiler_api.h>

void foo(){
    thrust::device_vector<int> d_in(100), d_out(100);
    thrust::inclusive_scan(d_in.begin(), d_in.end(), d_out.begin());
    // Wait for the scan to finish and print its error status
    cudaError_t res = cudaDeviceSynchronize();
    std::cout << cudaGetErrorString(res) << std::endl;
}

int main(){
    cudaProfilerStart();
    foo();
    cudaProfilerStop();
    foo(); // Crash here
    cudaDeviceReset();
    return 0;
}
Some more scenarios (Start/Stop are short for cudaProfilerStart/cudaProfilerStop):
Start(); foo(); Stop(); foo(); -> crash
Start(); foo(); Stop(); Start(); foo(); -> OK
Start(); foo(); Stop(); any_other_kernel(); Start(); foo(); -> crash
This behaviour appears to be due to a limitation in the profiler system of CUDA 7.0 and earlier. A fix will be available in the CUDA 7.5 release toolkit.
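Until that toolkit can be used, one possible stopgap (a sketch only, based on the observation in the question that removing the cudaProfilerStart/Stop calls avoids the crash) is to skip the start/stop calls on affected toolkits and profile the whole run instead. The helper names profilerToggleLooksSafe and profileRegion below are hypothetical, and the start/stop actions are passed in as callables so the snippet compiles as plain host C++:

```cpp
// CUDART_VERSION encodes the toolkit as major*1000 + minor*10
// (CUDA 6.5 -> 6050, 7.0 -> 7000, 7.5 -> 7050).
// Assumption: toggling the profiler is only treated as safe from 7.5 on,
// since the answer above states that 7.0 and earlier are affected.
inline bool profilerToggleLooksSafe(int cudartVersion) {
    return cudartVersion >= 7050;
}

// Hypothetical helper: runs `work`, bracketing it with `start`/`stop`
// only when the toolkit version is not one of the affected ones.
// On affected versions `work` still runs, just without the toggling.
template <class Start, class Stop, class Work>
void profileRegion(int cudartVersion, Start start, Stop stop, Work work) {
    const bool toggle = profilerToggleLooksSafe(cudartVersion);
    if (toggle) start();
    work();
    if (toggle) stop();
}
```

In the example from the question this would be called as profileRegion(CUDART_VERSION, []{ cudaProfilerStart(); }, []{ cudaProfilerStop(); }, []{ foo(); }); with the profiler's profile-from-start option left enabled. On CUDA 7.0 and earlier the entire run is then profiled rather than only the selected region, which is the price of avoiding the Start/Stop sequence that triggers the crash.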
[This answer has been assembled from comments and added as a community wiki entry to get the question off the unanswered queue]