c floating-point endianness 128-bit

128 bit floating point binary representation error


Let's say we have a 128-bit floating-point number, for example x = 2.6 (1.3 * 2^1 in IEEE 754). I put it in a union like this:

union flt {
    long double flt;
    int64_t byte8[OCTALC];
} d;
d.flt = x;  // assign through the long double member, not to the union itself

Then I run this to get its hexadecimal representation in memory:

void print_bytes(void *ptr, int size) 
{
    unsigned char *p = ptr;
    int i;
    for (i=0; i<size; i++) {
        printf("%02hhX ", p[i]);
    }
    printf("\n");
}

// somewhere in the code
print_bytes(&d.byte8[0], 16);

And I get something like

66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00

I expected one of the leading (leftmost) bits to be 1, because the exponent of 2.6 is 1, but in fact the set bits are on the right, as if the value were being treated as big-endian. If I flip the sign, the output changes to:

66 66 66 66 66 66 66 A6 00 C0 00 00 00 00 00 00

So it seems the sign bit is further to the right than I thought. And if you count the bytes, it seems only 10 of them are used; the remaining 6 look truncated or something. I'm trying to find out why this happens. Any help?


Solution

  • You have a number of misconceptions.

    First of all, you don't have a 128-bit floating-point number. On x86-64, long double is most likely the x86 extended-precision format: an 80-bit (10-byte) value padded to 16 bytes. (I suspect the padding is for alignment purposes.)
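One way to check which format long double actually uses on a given compiler is to look at the limits in <float.h>. The sketch below is only a heuristic: the function name is made up for illustration, and it assumes that a 64-bit significand together with a maximum exponent of 16384 identifies the x87 extended format.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if long double looks like the x87 80-bit
   extended format (64-bit significand, 15-bit exponent), 0 otherwise.
   IEEE binary128 would report LDBL_MANT_DIG == 113, plain double 53. */
int long_double_is_x87_extended(void)
{
    return LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384;
}
```

If this returns 1 and sizeof(long double) is 16, then 6 of the 16 bytes are padding, which matches the dump in the question.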

    And of course, it's going to be in little-endian byte order (since this is an x86/x86-64). This doesn't refer to the order of the bits in each byte; it refers to the order of the bytes within the whole value.

    And finally, the exponent is biased. An exponent of 1 isn't stored as 1. It's stored as 1+0x3FFF = 0x4000, which shows up in memory as the bytes 00 40. The bias allows for negative exponents.
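The three points above can be put together in a small decoder. This is a minimal sketch, assuming the x87 layout described here (bytes 0..7 hold the significand, bytes 8..9 hold the sign and biased exponent, all little-endian); the struct and function names are made up for illustration. It works on a plain byte array, so it does not depend on how the compiler implements long double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x87 extended value. */
struct x87_fields {
    unsigned sign;         /* 0 = positive, 1 = negative */
    int exponent;          /* unbiased: stored value minus 0x3FFF */
    uint64_t significand;  /* 64 bits, explicit integer bit on top */
};

/* Decode the first 10 bytes of a little-endian x87 extended value. */
struct x87_fields decode_x87(const unsigned char *bytes)
{
    struct x87_fields f;
    unsigned sign_exp;
    int i;

    assert(bytes != NULL);

    /* Bytes 0..7: significand, least significant byte first. */
    f.significand = 0;
    for (i = 7; i >= 0; i--)
        f.significand = (f.significand << 8) | bytes[i];

    /* Bytes 8..9: 15-bit biased exponent, with the sign in the top bit. */
    sign_exp = (unsigned)bytes[8] | ((unsigned)bytes[9] << 8);
    f.sign = sign_exp >> 15;
    f.exponent = (int)(sign_exp & 0x7FFFu) - 0x3FFF;
    return f;
}
```

For the first dump in the question this gives sign 0, exponent 1 and significand 0xA666666666666666. The second dump (00 C0 instead of 00 40) differs only in the sign.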


    So we get the following:

    66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
    

    Demo on Compiler Explorer

    If we remove the padding and reverse the bytes to better match the image in the Wikipedia page, we get

    4000A666666666666666
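The same reordering can be done in code. The following is a sketch with a made-up function name; it assumes the 10 value bytes come first in memory and the 6 padding bytes follow, as in the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the 10 value bytes as 20 hex digits, most significant byte first.
   `out` must have room for 21 chars (20 digits plus the terminator). */
void x87_bytes_to_hex(const unsigned char *bytes, char out[21])
{
    int i;

    assert(bytes != NULL && out != NULL);

    /* Skip the padding (bytes 10..15) and walk from byte 9 down to byte 0. */
    for (i = 0; i < 10; i++)
        snprintf(out + 2 * i, 3, "%02X", (unsigned)bytes[9 - i]);
}
```

Read left to right, the first four hex digits of the result are the sign and exponent, and the remaining sixteen are the significand.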
    

    This translates to

    +0x1.4CCCCCCCCCCCCCCC × 2^(0x4000-0x3FFF)
    

    (0xA66...6 = 0b1010 0110 0110...0110 ⇒ 0b1.0100 1100 1100...110[0] = 0x1.4CC...C)

    or

    +1.29999999999999999995663191310057982263970188796520233154296875 × 2^1
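To go from the decoded fields back to a number: the significand is a 64-bit integer whose binary point sits just after its top bit, so the value is significand × 2^(exponent - 63). A minimal sketch follows (the function name is made up for illustration). The result is exact only where long double has at least a 64-bit significand; on other platforms it is rounded to whatever precision long double provides.

```c
#include <stdint.h>

/* Rebuild the value from sign, unbiased exponent and 64-bit significand.
   Scaling by repeated multiplication or division by 2 is exact as long as
   the result stays in the normal range of long double. */
long double x87_value(unsigned sign, int exponent, uint64_t significand)
{
    long double v = (long double)significand;
    int shift = exponent - 63;  /* binary point sits after the top bit */

    while (shift > 0) { v *= 2.0L; shift--; }
    while (shift < 0) { v /= 2.0L; shift++; }
    return sign ? -v : v;
}
```

Calling x87_value(0, 1, 0xA666666666666666) yields the long double closest to 2.6 that the platform can represent from those bits, and passing sign 1 yields its negation.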
    

    Decimal conversion obtained using

    perl -Mv5.10 -e'
       use Math::BigFloat;
       Math::BigFloat->div_scale( 1000 );
       say
          Math::BigFloat->from_hex( "14CCCCCCCCCCCCCCC" ) /
          Math::BigFloat->from_hex( "10000000000000000" )
    '
    

    or

    perl -Mv5.10 -e'
       use Math::BigFloat;
       Math::BigFloat->div_scale( 1000 );
       say
          Math::BigFloat->from_hex( "A666666666666666" ) /
          Math::BigFloat->from_hex( "8000000000000000" )
    '