I've been experimenting with Vector to use HW to parallelise integer arithmetic. Is there any way to enable overflow checking with vector operations?
One example is to add two columns (equal length arrays) of ints together. Here c=a+b
means c[0] = a[0] + b[0]
, c[1] = a[1] + b[1]
, etc.
I suppose I could do something like this:
overflow[i] = b[i] >= 0 ? c[i] < a[i] : c[i] >= a[i];
But this (branching) might be slower than .Net's automatic overflow checking, and might negate the performance benefit of using Vector<T>
.
We also want to optimise our most commonly used operations: multiplication, subtraction, to a lesser extent integer division.
Edit: I thought about this a little more, and came up with this, which is 2.5 times as slow as the unchecked vector addition. Seems like a lot of additional overhead.
public Vector<int> Calc(Vector<int> a, Vector<int> b)
{
var result = a + b;
var overflowFlag = Vector.GreaterThan(b, Vector<int>.Zero) * Vector.LessThan(result,a)
+ Vector.LessThan(b,Vector<int>.Zero) * Vector.GreaterThan(result, a);
// It makes no sense to add the flags to the result, but haven't decided what to do with them yet,
// and don't want the compiler to optimise the overflow calculation away
return result + overflowFlag;
}
Timings: (4k iterations adding a pair of 100k arrays)
Using some trickery borrowed from Hacker's Delight (chapter 2, section Overflow Detection), here are some overflow predicates (not tested):
Signed addition:
var sum = a + b;
var ovf = (sum ^ a) & (sum ^ b);
The result is in the signs, not full masks. Maybe that's enough, maybe not, in which case I would normally recommend a right shift but there's no right shift on Vector<T>
(way too much stuff is missing). You could compare against zero though.
Unsigned addition: For completeness?
var sum = a + b;
var ovf = Vector.LessThan(sum, a);
Multiplication:
As far as I can tell there's just no reasonable way to do it. It's a bit annoying even in native SSE, but with pmuldq
and some shuffling it's not too bad.
In C# SIMD this just seems hopeless. There is no high-mul (also missing from native SSE except for 16bit ints, also annoying), no widening multiplication (and no way to narrow the result anyway), also there is no reasonable way to widen in advance. Even if you could widen (could they please add that to the API, seriously), multiplying 64bit integers with SSE is annoying, so annoying that doing it with scalar arithmetic isn't a bad option, which defeats the point.
So I'm going to recommend not doing that in SIMD, not in C# at least.
That doesn't have to mean that you use the built-in overflow detection. While that is appropriate if overflow is a fatal error, it's disastrously slow if it's common and expected and you just want the overflow status in a boolean flag. In that case, you could use:
Signed multiplication:
long ext_prod = (long)a * b;
int prod = (int)ext_prod;
bool ovf = (prod >> 31) != (int)(ext_prod >> 32);
Unsigned multiplication:
ulong ext_prod = (ulong)a * b;
uint prod = (uint)ext_prod;
bool ovf = (ext_prod >> 32) != 0;
In SIMD it would have worked essentially the same way, ie testing whether the high half is filled with copies of the sign (by definition zero in the unsigned case), but the widening makes it annoying in native SIMD and hopeless in C# SIMD.