I'm having trouble interpreting this CUDA error message. I've used compute-sanitizer to track it down to a memory access in a particular kernel, a batched matrix multiplication in libcublas. I don't understand the error, because it reports both "Address is out of bounds" and "Address is inside the nearest allocation". If it's inside an allocation, how is it out of bounds? What's actually going wrong here?
========= Invalid __global__ write of size 4 bytes
========= at 0x18e0 in void gemv2N_kernel<int, int, float, float, float, float, (int)128, (int)8, (int)4, (int)4, (int)1, (bool)0, cublasGemvParams<cublasGemvTensorStridedBatched<const float>, cublasGemvTensorStridedBatched<const float>, cublasGemvTensorStridedBatched<float>, float>>(T13)
========= by thread (0,0,0) in block (0,0,137)
========= Address 0x7fd8c0000224 is out of bounds
========= and is inside the nearest allocation at 0x7fd8aa000000 of size 503316480 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x209e4a]
========= in /usr/lib/x86_64-linux-gnu/libcuda.so
========= Host Frame: [0x21caf9b]
========= in /usr/local/cuda-11.5.0/lib64/libcublasLt.so.11
========= Host Frame: [0x2224d18]
========= in /usr/local/cuda-11.5.0/lib64/libcublasLt.so.11
========= Host Frame: [0x8c257c]
========= in /usr/local/cuda-11.5.0/lib64/libcublasLt.so.11
========= Host Frame: [0x8c7717]
========= in /usr/local/cuda-11.5.0/lib64/libcublasLt.so.11
========= Host Frame: [0x67d937]
========= in /usr/local/cuda-11.5.0/lib64/libcublasLt.so.11
========= Host Frame:cublasLtSSSMatmul [0x6a3e03]
<a bunch of stack trace of my calling code omitted>
I don't understand the error, because it reports both "Address is out of bounds" and "Address is inside the nearest allocation". If it's inside an allocation, how is it out of bounds?
There is no contradiction here. The address is inside an allocation which you made within the current CUDA context, but it is not inside the allocation you passed to the kernel to operate on.
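You can check the "inside the nearest allocation" half of the report by hand. The following is a minimal sketch in plain Python that only does the address arithmetic on the three numbers printed in the report above. The helper name `locate` is my own, and the 64 MiB buffer at the end is a purely hypothetical size chosen for illustration; nothing in the report tells us how large the buffer passed to the kernel actually was.

```python
def locate(addr, base, size):
    """Return (offset, inside): the byte offset of addr from base,
    and whether addr lies in the half-open range [base, base + size)."""
    offset = addr - base
    return offset, 0 <= offset < size


# Values copied from the compute-sanitizer report.
ADDR = 0x7FD8C0000224        # faulting address
ALLOC_BASE = 0x7FD8AA000000  # start of the nearest allocation
ALLOC_SIZE = 503316480       # its size in bytes (480 MiB)

offset, inside = locate(ADDR, ALLOC_BASE, ALLOC_SIZE)
print(hex(offset), inside)   # 0x16000224 True

# Hypothetical: suppose the buffer handed to the kernel were only the
# first 64 MiB of that allocation. The same address would then be
# outside the buffer while still being inside the allocation.
HYPOTHETICAL_BUFFER_SIZE = 64 * 1024 * 1024
_, inside_buffer = locate(ADDR, ALLOC_BASE, HYPOTHETICAL_BUFFER_SIZE)
print(inside_buffer)         # False
```

So the write lands roughly 352 MiB into a 480 MiB allocation. To find the real cause, compare that offset against the pointers, strides, and batch count you actually passed to the cuBLAS call.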