I have been trying to debug cuda programs that use inline PTX assembly. Specifically, I am debugging at the instruction level, and am trying to determine the values of arguments to the instructions. Occasionally, the disassembly includes a reference to constant memory. I am trying to have gdb print the value of this constant memory, but have not found any documentation that shows how to do this. For instance, a disassembly includes IADD R0, R0, c[0x0] [0x148]
I want to determine how to have gdb print the value of c[0x0] [0x148]. I have tried using print * (@constant) ... but this does not seem to work (I pass 0x148 here and it prints out nothing). Is this possible to do in cuda-gdb?
I have tried to avoid this by passing the compiler option --disable-optimizer-constants during compilation, but this does not work.
The way to do this is to print *(void * @parameter *) addr
where addr is the address inside the constant bank 0 that should be printed.
Suppose we have a simple kernel in a file called foo.cu
:
#include <cuda.h>
#include <stdio.h>
#include <cuda_runtime.h>
__global__ void myKernel(int a, int b, int *d)
{
*d = a + b;
}
int main(int argc, char *argv[]) {
if (argc < 3) {
printf("Requires inputs a and b to be specified\n");
return 0;
}
int * dev_d;
int d;
cudaMalloc(&dev_d, sizeof(*dev_d));
myKernel<<<1, 1>>>(atoi(argv[1]), atoi(argv[2]), dev_d);
cudaMemcpy(&d, dev_d, sizeof(d), cudaMemcpyDeviceToHost);
cudaFree(dev_d);
printf("D is: %d\n", d);
return 0;
}
which is compiled via
$ nvcc foo.cu -o foo.out
Next, suppose we are interested in disassembling this program, so we execute cuda-gdb
with a command-line for our program:
$ cuda-gdb --args ./foo.out 10 15
Inside cuda-gdb
, we get to the kernel by typing
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) start
Temporary breakpoint 1, 0x000055555555b12a in main ()
(cuda-gdb) cont
Inside the kernel, we view the disassembly we are interested in debugging:
(cuda-gdb) x/15i $pc
=> 0x555555b790a8 <_Z8myKerneliiPi+8>: MOV R1, c[0x0][0x20]
0x555555b790b0 <_Z8myKerneliiPi+16>: MOV R0, c[0x0][0x144]
0x555555b790b8 <_Z8myKerneliiPi+24>: MOV R2, c[0x0][0x148]
0x555555b790c0 <_Z8myKerneliiPi+32>:
0x555555b790c8 <_Z8myKerneliiPi+40>: MOV R3, c[0x0][0x14c]
0x555555b790d0 <_Z8myKerneliiPi+48>: IADD R0, R0, c[0x0][0x140]
0x555555b790d8 <_Z8myKerneliiPi+56>: STG.E [R2], R0
0x555555b790e0 <_Z8myKerneliiPi+64>:
0x555555b790e8 <_Z8myKerneliiPi+72>: NOP
0x555555b790f0 <_Z8myKerneliiPi+80>: NOP
0x555555b790f8 <_Z8myKerneliiPi+88>: NOP
0x555555b79100 <_Z8myKerneliiPi+96>:
0x555555b79108 <_Z8myKerneliiPi+104>: EXIT
0x555555b79110 <_Z8myKerneliiPi+112>: BRA 0x70
0x555555b79118 <_Z8myKerneliiPi+120>: NOP
The second argument being passed to the IADD
instruction is in one of the constant memory banks. Let's find out what its value actually is. We advance go to the IADD
instruction:
(cuda-gdb) stepi 4
0x0000555555b790d0 in myKernel(int, int, int*)<<<(1,1,1),(1,1,1)>>> ()
(cuda-gdb) x/i $pc
=> 0x555555b790d0 <_Z8myKerneliiPi+48>: IADD R0, R0, c[0x0][0x140]
We can now obtain the contents of c[0x0][0x140]
as follows:
(cuda-gdb) print (int) *(void * @parameter *) 0x140
$1 = 10
Here, we knew the argument should have 32 bits, so we cast it as an (32-bit) int
. If we hadn't done this, we would get too many bits, e.g.:
(cuda-gdb) print *(void * @parameter *) 0x140
$2 = 0xf0000000a
Note the hexadecimal format can be retained by adding /x after the print
command:
(cuda-gdb) print/x (int) *(void * @parameter *)0x140
$3 = 0xa