I have some CUDA code which uses the half2
datatype. It should be just two 16 bit floating point numbers packed together in a 32 bit space.
Apparently there are the methods __low2half
and __high2half
which convert a half2
vector into a single half
number. But there is also .x
and .y
on this vector. Most of the time these are the same, so this code works runs through most of the time:
assert(q.x == __low2half(q));
assert(q.y == __high2half(q));
However, there are some kernel runs where this doesn't hold. These are cases where they are not the same:
x=-57344, y=-53376, low=0.234497, high=-0.17041
x=-inf, y=nan, low=0.00634766, high=0.473877
x=nan, y=nan, low=-0.0716553, high=0.540039
x=0, y=0, low=0, high=0
x=3.8147e-05, y=nan, low=nan, high=nan
x=-61440, y=nan, low=-0.999512, high=0.31958
Which way of accessing them is correct, then?
There should be no behavioral difference if properly applied.
Per IEEE definition, nan
values are never equal to anything.
Apart from that, a simple exhaustive test case passes:
# cat t106.cu
#include <cuda_fp16.h>
#include <cstdio>
__global__ void k(){
for (size_t qs = blockIdx.x*blockDim.x+threadIdx.x; qs < 0x100000000ULL; qs += gridDim.x*blockDim.x){
unsigned qu = (unsigned)qs;
half2 q = *(reinterpret_cast<half2 *>(&qu));
if (!__isnan(q.x)) if (q.x != __low2half(q)) printf("fail x: %u\n", qu);
if (!__isnan(q.y)) if (q.y != __high2half(q)) printf("fail y: %u\n", qu);
}
}
int main(){
k<<<1024,256>>>();
cudaDeviceSynchronize();
}
# nvcc -o t106 t106.cu
# ./t106
#
If you want to see e.g. nan
values printed out, omit the first if-test in each case.
I wouldn't be able to respond to a claim about a specific test case unless a complete example of that test case is given. However I claim that your test case is not valid if it produces a result suggesting inequality, subject to above treatment.