There exist workloads for which double precision floating point is not quite adequate, hence a need for quad precision. This is rarely supplied in hardware, so a workaround is to use double-double, where a 128-bit number is represented by a pair of 64-bit numbers. It's not true IEEE-754 quad precision - for one thing you don't get any extra exponent bits - but is for many purposes close enough, and much faster than a pure software implementation.
Many computers provide vector floating-point operations, and it would be desirable to use these for double-double operations. Is this possible? In particular, looking at an implementation of double-double at https://github.com/JuliaMath/DoubleDouble.jl/blob/master/src/DoubleDouble.jl it seems to me that each arithmetic operation requires at least one conditional branch in the middle, which I think means SIMD vector operations cannot be used, unless I am missing something?
I take it you’re thinking of the implementations of addition and subtraction, for example:
# Dekker add2
function +{T}(x::Double{T}, y::Double{T})
    r = x.hi + y.hi
    s = abs(x.hi) > abs(y.hi) ? (((x.hi - r) + y.hi) + y.lo) + x.lo : (((y.hi - r) + x.hi) + x.lo) + y.lo
    Double(r, s)
end
On some architectures, the solution might be to compute both branches in parallel using SIMD instructions, then apply an operation that selects the correct one of the two results. For example, an incorrect result produced by subtracting x.hi + y.hi from the wrong operand might always have a negative sign, so taking the maximum might always extract the correct result. (At this time of night, I won’t guarantee that this is valid in this case, but for some operations, the general approach would be.)
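Another might be to compare the vector {abs(x.hi), abs(y.hi)} > {abs(y.hi), abs(x.hi)} in order to form a bitmask. (That’s pseudocode, not Julia syntax.) The bitwise AND of the bitmask and the pair of potential results will leave the correct result intact and set all bits of the incorrect one to zero. Then, reducing the masked vector with bitwise OR yields the correct result. No branch is required. (If the two magnitudes are equal, neither lane passes a strict comparison, so that case needs care, for example by using the complement of the first lane’s mask for the second lane.)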
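To make the masking idea concrete, here is a minimal scalar sketch in C rather than Julia. The struct dd and the helper names bits_of, double_of and dd_add_masked are hypothetical, chosen for illustration, and are not taken from DoubleDouble.jl. It computes both candidate low words unconditionally and picks one with AND/OR, using the complement of a single mask so that equal magnitudes fall through to the second candidate, as in the original ternary. Whether a compiler turns this into fully branch-free or vectorized machine code is not guaranteed.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical double-double pair; the value represented is hi + lo. */
typedef struct { double hi, lo; } dd;

/* Reinterpret the bits of a double (memcpy avoids aliasing problems). */
static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Same arithmetic as the quoted add2, with the ternary replaced by a mask select. */
static dd dd_add_masked(dd x, dd y)
{
    double r = x.hi + y.hi;

    /* Compute both candidates unconditionally. */
    double s1 = (((x.hi - r) + y.hi) + y.lo) + x.lo; /* used when |x.hi| >  |y.hi| */
    double s2 = (((y.hi - r) + x.hi) + x.lo) + y.lo; /* used when |x.hi| <= |y.hi| */

    /* All ones if the comparison holds, all zeros otherwise. */
    uint64_t mask = (uint64_t)0 - (uint64_t)(fabs(x.hi) > fabs(y.hi));

    /* Keep s1 where the mask is set and s2 where it is clear, then merge. */
    uint64_t s = (bits_of(s1) & mask) | (bits_of(s2) & ~mask);

    dd out = { r, double_of(s) };
    return out;
}
```

In a real SIMD version the same AND/OR pattern would be applied to whole vector registers, one double-double per lane, typically via the ISA's compare and blend/select instructions.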
A given ISA might have other tricks that would work, such as conditional instructions. There are also algorithms other than Dekker’s.
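As one example of an alternative algorithm, Knuth’s TwoSum recovers the rounding error of a floating-point addition without comparing magnitudes, at the cost of a few extra operations. The sketch below is in C, and the names two_sum and dd_add_branchfree are hypothetical. It mirrors the structure of the quoted add2 (high word is the rounded sum, low word is the error plus both low parts) and is meant as an illustration under those assumptions, not as a drop-in replacement for the library code.

```c
/* Knuth's TwoSum: on return, *s is the rounded sum a + b and *e is the
 * rounding error, so that *s + *e equals a + b exactly (barring overflow).
 * No comparison of |a| and |b| is needed. */
static void two_sum(double a, double b, double *s, double *e)
{
    double sum = a + b;
    double bv  = sum - a;   /* the part of b that was absorbed into sum */
    double av  = sum - bv;  /* the part of a that was absorbed into sum */
    *s = sum;
    *e = (a - av) + (b - bv);
}

/* Double-double addition with no conditional: same shape as the quoted
 * add2, but the error term comes from two_sum instead of a branch. */
static void dd_add_branchfree(double xhi, double xlo,
                              double yhi, double ylo,
                              double *rhi, double *rlo)
{
    double r, e;
    two_sum(xhi, yhi, &r, &e);
    *rhi = r;
    *rlo = (e + ylo) + xlo;
}
```

Because every statement is straight-line arithmetic, the same sequence can be applied lane by lane to vectors. Note that the code relies on the operations being evaluated exactly as written, so it should not be compiled with options such as -ffast-math that let the compiler simplify floating-point expressions algebraically.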