__global__ void TEST_prog(int *data_in1, int *data_in2, int *data_out) // employing IF functions
{
unsigned int tid = (blockIdx.x * blockDim.x) + threadIdx.x;
data_out[tid] = data_in1[tid] + data_in2[tid];
}
code for sm_10
Function : _Z9TEST_progPiS_S_
.headerflags @"EF_CUDA_SM10 EF_CUDA_PTX_SM(EF_CUDA_SM10)"
/*0000*/ MOV.U16 R0H, g [0x1].U16; /* 0x0023c78010004205 */
/*0008*/ I2I.U32.U16 R1, R0L; /* 0x04000780a0000005 */
/*0010*/ IMAD.U16 R0, g [0x6].U16, R0H, R1; /* 0x0020478060014c01 */
/*0018*/ SHL R2, R0, 0x2; /* 0xc410078030020009 */
/*0020*/ IADD32 R0, g [0x4], R2; /* 0x2102e800 */
/*0024*/ IADD32 R3, g [0x6], R2; /* 0x2102ec0c */
/*0028*/ GLD.U32 R1, global14[R0]; /* 0x80c00780d00e0005 */
/*0030*/ GLD.U32 R0, global14[R3]; /* 0x80c00780d00e0601 */
/*0038*/ IADD32 R1, R1, R0; /* 0x20008204 */
/*003c*/ IADD32 R0, g [0x8], R2; /* 0x2102f000 */
/*0040*/ GST.U32 global14[R0], R1; /* 0xa0c00781d00e0005 */
...................................
what is stored at g [0x1] ?
g[xx] is shared memory, when the value is set ?
ROH is the high 16 bit of R0 ?
ROL is not assign before, but the second inst read it ?
I guess g [0x4], g [0x6], g [0x4] is the kernel args, but why args is set in shared memory ?
The critical piece of information is
EF_CUDA_SM10 EF_CUDA_PTX_SM(EF_CUDA_SM10)
i.e. this code is SASS for a compute capability 1.0 device. In compute 1.x devices, kernel arguments were stored in shared memory. In compute 2.x devices, a dedicated constant memory bank was introduced for that purpose.
So your reading of the code is completely correct.