A CUDA program should do reduction for double-precision data, I use Julien Demouth's slides named "Shuffle: Tips and Tricks"
the shuffle function is below:
/*for shuffle of double-precision point */
__device__ __inline__ double shfl(double x, int lane)
{
int warpSize = 32;
// Split the double number into 2 32b registers.
int lo, hi;
asm volatile("mov.b32 {%0,%1}, %2;":"=r"(lo),"=r"(hi):"d"(x));
// Shuffle the two 32b registers.
lo = __shfl_xor(lo,lane,warpSize);
hi = __shfl_xor(hi,lane,warpSize);
// Recreate the 64b number.
asm volatile("mov.b64 %0,{%1,%2};":"=d"(x):"r"(lo),"r"(hi));
return x;
}
At present, I got the errors below while compiling the program.
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 71; error : Arguments mismatch for instruction 'mov'
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 271; error : Arguments mismatch for instruction 'mov'
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 287; error : Arguments mismatch for instruction 'mov'
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 302; error : Arguments mismatch for instruction 'mov'
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 317; error : Arguments mismatch for instruction 'mov'
ptxas /tmp/tmpxft_00002cfb_00000000-5_csr_double.ptx, line 332; error : Arguments mismatch for instruction 'mov'
ptxas fatal : Ptx assembly aborted due to errors
make: *** [csr_double] error 255
Could someone give some advice?
There is a syntax error in the inline assembly instruction for the load of the double argument to 32 bit registers. This:
asm volatile("mov.b32 {%0,%1}, %2;":"=r"(lo),"=r"(hi):"d"(x));
should be:
asm volatile("mov.b64 {%0,%1}, %2;":"=r"(lo),"=r"(hi):"d"(x));
Using a "d" (ie 64 bit floating point register) as the source in a 32 bit load is illegal (and a mov.b32 makes no sense here, the code must load 64 bits to two 32 bit registers).