__device__ __half2 __h2div ( const __half2 a, const __half2 b )
Description:
Divides half2 input vector a by input vector b in round-to-nearest mode.
__device__ __half2 __hmul2 ( const __half2 a, const __half2 b )
Description:
Performs half2 vector multiplication of inputs a and b, in round-to-nearest-even mode.
Can someone explain exactly what operations are performed by each of these?
Both are elementwise operations. A __half2 is a vector type, meaning it has multiple elements (2) of a simpler type, namely half (i.e. a 16-bit floating-point quantity). These vector types are basically structures whose individual elements are accessed through the members .x, .y, .z, and .w, for vector types of up to 4 elements (a __half2 has only .x and .y).
If we have two items (a, b) that are each of __half2 type, the division operation:
__half2 a,b;
__half2 result = __h2div(a, b);
will create a result where the first element of result is equal to the first element of a divided by the first element of b, and likewise for the second element.
This means when complete, the following statements should "approximately" be correct:
result.x == a.x/b.x;
result.y == a.y/b.y;
The multiplication operation:
__half2 a,b;
__half2 result = __hmul2(a, b);
will create a result where the first element of result is equal to the first element of a multiplied by the first element of b, and likewise for the second element.
This means when complete, the following statements should "approximately" be correct:
result.x == a.x*b.x;
result.y == a.y*b.y;
("approximately" means there may be rounding differences, depending on your exact code and possibly other factors, such as compile switches.)
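To make the elementwise behaviour concrete, here is a minimal sketch of a complete program that applies both intrinsics to one pair of __half2 values and prints the results as floats. The names half2_ops_kernel and run_half2_ops are made up for this illustration, error checking is omitted for brevity, and it assumes a GPU of compute capability 5.3 or higher (needed for half arithmetic) and a matching build such as nvcc -arch=sm_70.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Hypothetical kernel: packs the inputs into __half2, applies both
// intrinsics, and unpacks the results back to float2 for printing.
__global__ void half2_ops_kernel(float2 a, float2 b, float2 *out)
{
    __half2 ha = __floats2half2_rn(a.x, a.y);   // ha.x = a.x, ha.y = a.y
    __half2 hb = __floats2half2_rn(b.x, b.y);
    out[0] = __half22float2(__h2div(ha, hb));   // (a.x / b.x, a.y / b.y)
    out[1] = __half22float2(__hmul2(ha, hb));   // (a.x * b.x, a.y * b.y)
}

// Hypothetical host helper: launches one thread and copies the results out.
void run_half2_ops(float2 a, float2 b, float2 *quot, float2 *prod)
{
    float2 *out;
    cudaMallocManaged(&out, 2 * sizeof(float2));
    half2_ops_kernel<<<1, 1>>>(a, b, out);
    cudaDeviceSynchronize();
    *quot = out[0];
    *prod = out[1];
    cudaFree(out);
}

int main()
{
    float2 q, p;
    run_half2_ops(make_float2(6.0f, 1.0f), make_float2(2.0f, 4.0f), &q, &p);
    printf("quotient: (%g, %g)\n", q.x, q.y);
    printf("product:  (%g, %g)\n", p.x, p.y);
    return 0;
}
```

The inputs were chosen so that every result is exactly representable in half: the quotient should come out as (3, 0.25) and the product as (12, 4), with no rounding involved.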
Regarding rounding, it's no different than when these terms are applied in other (non-CUDA) contexts. Roughly speaking:
"round to nearest" is what I would consider the usual form of rounding. When an arithmetic result is not exactly representable in the type, the nearest representable value of the type is chosen.
"round to nearest even" refines the above for the exact midpoint case, where two representable values are equally close: the one whose least significant digit is even is chosen.
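The midpoint case can be provoked with __hmul2 itself. A half has an 11-bit significand, so between 2048 and 4096 only even integers are representable, and an exact product of 2049 or 2051 sits exactly halfway between two neighbours. The sketch below (tie_kernel and tie_products are hypothetical names, error checking is omitted, and compute capability 5.3 or higher is assumed) multiplies 3 by 683 and 7 by 293 to hit those two midpoints.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Hypothetical kernel: both exact products are ties between two half values.
//   3 * 683 = 2049, halfway between 2048 and 2050
//   7 * 293 = 2051, halfway between 2050 and 2052
__global__ void tie_kernel(float2 *out)
{
    __half2 a = __floats2half2_rn(3.0f, 7.0f);      // all four inputs are
    __half2 b = __floats2half2_rn(683.0f, 293.0f);  // exactly representable
    *out = __half22float2(__hmul2(a, b));
}

// Hypothetical host helper: runs the kernel and returns both products.
float2 tie_products()
{
    float2 *out;
    cudaMallocManaged(&out, sizeof(float2));
    tie_kernel<<<1, 1>>>(out);
    cudaDeviceSynchronize();
    float2 r = *out;
    cudaFree(out);
    return r;
}

int main()
{
    float2 r = tie_products();
    printf("3 * 683 -> %g\n", r.x);
    printf("7 * 293 -> %g\n", r.y);
    return 0;
}
```

With round-to-nearest-even as documented for __hmul2, the first product should round down to 2048 and the second up to 2052, because those are the neighbours whose significands end in an even bit. A rule that always rounded ties upward would give 2050 for the first one instead.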