floating-pointieee-754powerpcepsilondouble-double-arithmetic

PPC64 long double's machine epsilon calculation


I'm playing around with a PPC64 virtual machine emulated in qemu, trying to mimic a POWER8 CPU.

Here, the long double type is different from the 80-bit float used in x86 for long doubles, and from what I see it does not conform to IEEE754's float128 either, seeing as it has a mantissa with 106 bits according to the C macro LDBL_MANT_DIG (vs. 112 bits of mantissa dictated by IEEE754 for their float128).

Wikipedia says that an IEEE754 float128 should have a machine epsilon of around 1.93e-34, which is much better than that of an 80-bit x86 float (1.08e-19).

When I try to get the machine epsilon in this virtual machine however, I get a rather surprising answer:

#include <iostream>
int main()
{
    long double eps = 1.0l;
    while (1.0l + 0.5l * eps != 1.0l)
        eps = 0.5l * eps;
    std::cout << eps << std::endl;
    return 0;
}

It outputs the following:

4.94066e-324

And I get the same result from LDBL_EPSILON and from std::numeric_limits<long double>::epsilon().

That would make it roughly 10x more precise than what it's supposed to be, which logic tells me should be impossible. Seeing as the mantisa is exactly 2x53 (that of IEEE754's float64), I assume it might be using a double-double struct, which Wikipedia also says should have less precision around small numbers than an IEEE754 float128.

What is happening here?


Solution

  • Firstly, let me assume your operating system is Linux. Until now, all compilers on 64-bit PowerPC uses the ‘double-double’ type for long double by default, whose format is not IEEE-compliant.

    The format is actually a combination of two doubles, so understand it as struct { double high; double low; }. The high part is no different than a normal double, while the low part provides extended mantissa. The exponent of the whole is the same as double, which means its largest representable number is not larger by orders of magnitude than double’s (since double-double has longer mantissa, they’re still different).

    Currently operations of double-double floating numbers don’t have native PowerPC instruction support. The a+b of long double will finally be translated into a call to function __gcc_qadd. (both LLVM’s compiler-rt and GCC’s libgcc have their implementations, see source code of the add function)

    Since POWER ISA 3.0 (Power9), native instruction support for ‘binary128’ (IEEE-compliant 128-bit float type) is supported. You can use -mcpu=power9 -mfloat128 to enable the feature and use __float128 to represent it, or add -mabi=ieeelongdouble to make the compiler see long double as binary128 instead of double-double (along with C library declarations).

    Binary128 is not combination of two doubles, but stored/passed with vector registers, and has much better precision than double-double, with 112 bits of mantissa and 15 bits of exponent. GCC/Clang actually supports binary128 for targets with VSX (Power7 or later) by also using support functions in compiler-rt/libgcc. (for example, a+b produces xsaddqp instruction for Power9 and later, __addkf3 for Power7 and Power8)

    If your toolchain is relatively new (for example, the advanced toolchain 14 or newer), try enabling -mabi=ieeelongdouble with C code. After C++ library support finished, GCC and Clang plans to switch default long double type as binary128 on 64-bit little endian in the future.