
Where to find information about the exact binary representation of floating-point values used by avr-gcc when compiling for 8-bit processors?


I need to find out the exact binary representation of floats and doubles in a C++ project built with PlatformIO for an ATmega328 using the Arduino framework. I don't have access to the actual hardware, so I can't check it myself.

The micro does not have an FPU and is 8-bit, so it's pretty much all up to the compiler (or the framework's libraries?), which in this case seems to be avr-gcc, version 7.3. I've managed to get as far as the avr-gcc documentation telling me that by default double is represented the same way as float, but it does not specify what that representation actually is (the IEEE standard is only mentioned for an optional long double).

So, the question is kinda twofold, really. Most importantly, I need to know which representation float uses in this particular case (I strongly suspect it's IEEE 754, but could use a confirmation). Secondly, I wonder where I can find this information formally, as part of some kind of official documentation.


Solution

  • Floating-Point Format

    In any case, the floating-point format is:

    IEEE-754, binary, little-endian. See also avr-gcc Wiki: Type Layout.
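    To see what "little-endian" means for a float in memory, here is a minimal sketch that copies a float's bytes out in address order. It assumes a 32-bit IEEE-754 float (avr-gcc per this answer, or any typical desktop GCC, so it can be tried without the board). The names FloatBytes and bytesOf are hypothetical. It uses <stdint.h> and <string.h> rather than <cstdint>/<cstring> because the Arduino AVR toolchain generally ships avr-libc's C headers but not the C++ standard library headers.

```cpp
#include <stdint.h>
#include <string.h>

// Bytes of a float in memory order: b[0] is the lowest address.
struct FloatBytes {
    uint8_t b[4];
};

// Copies the object representation of f; memcpy keeps this well-defined
// and independent of pointer-cast tricks.
inline FloatBytes bytesOf(float f) {
    static_assert(sizeof(float) == 4, "expected a 32-bit float");
    FloatBytes out;
    memcpy(out.b, &f, sizeof f);
    return out;
}
```

    For example, 1.0f has the binary32 bit pattern 0x3F800000, so a little-endian target stores the bytes 00 00 80 3F, that is, bytesOf(1.0f).b[3] == 0x3F.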

    In the encoded form, respective parts of the representation will occupy:

    Field               32-bit format     64-bit format
    Sign                1 bit   (31)      1 bit   (63)
    Biased exponent     8 bits  (30-23)   11 bits (62-52)
    Encoded mantissa    23 bits (22-0)    52 bits (51-0)
    Exponent bias       127               1023
    sizeof              4                 8
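    As a sketch of how the 32-bit column maps onto actual bits, the following splits a float into sign, biased exponent and encoded mantissa, and rebuilds the value with the usual binary32 formulas (implicit leading 1 for normal numbers, none for subnormals). It assumes a 32-bit IEEE-754 float; the names Binary32Fields, bitsOf, split and rebuild are hypothetical, and exponent 255 (infinity/NaN) is deliberately not handled by rebuild.

```cpp
#include <stdint.h>
#include <string.h>
#include <math.h>

// Fields of an IEEE-754 binary32 value, as in the table:
// sign = bit 31, biased exponent = bits 30-23, mantissa = bits 22-0.
struct Binary32Fields {
    uint8_t  sign;      // 0 = positive, 1 = negative
    uint8_t  exponent;  // biased exponent (bias 127)
    uint32_t mantissa;  // 23 stored fraction bits, implicit leading 1 not included
};

inline uint32_t bitsOf(float f) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expected a 32-bit float");
    uint32_t u;
    memcpy(&u, &f, sizeof u);  // well-defined way to read the bit pattern
    return u;
}

inline Binary32Fields split(float f) {
    const uint32_t u = bitsOf(f);
    Binary32Fields r;
    r.sign     = static_cast<uint8_t>(u >> 31);
    r.exponent = static_cast<uint8_t>((u >> 23) & 0xFFu);
    r.mantissa = u & 0x7FFFFFu;
    return r;
}

// Rebuilds a finite value from its fields:
//   normal    (e = 1..254): (-1)^s * (1 + m / 2^23) * 2^(e - 127)
//   subnormal (e = 0):      (-1)^s * (m / 2^23)     * 2^(1 - 127)
// Exponent 255 (infinity / NaN) is not handled.
inline double rebuild(const Binary32Fields& r) {
    const double frac = static_cast<double>(r.mantissa) / 8388608.0;  // 2^23
    const double mag = (r.exponent == 0)
        ? ldexp(frac, 1 - 127)
        : ldexp(1.0 + frac, r.exponent - 127);
    return r.sign ? -mag : mag;
}
```

    For example, split(-6.25f) gives sign 1, exponent 129 and mantissa 0x480000 (bit pattern 0xC0C80000), because -6.25 = -1.5625 * 2^2 and 0.5625 * 2^23 = 0x480000. The 64-bit format works the same way with the masks, bit positions and bias from the right-hand column.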

    NaNs are non-signalling.
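    For reference, in the 32-bit layout a NaN has an all-ones exponent (255) and a non-zero mantissa, and under the usual convention recommended by IEEE 754-2008 the top mantissa bit (bit 22) marks a quiet NaN. The sketch below tests for that pattern; floatBits, isNaNBits and isQuietNaN are hypothetical helper names, and the exact NaN payload produced by avr-gcc's soft-float routines is not claimed here.

```cpp
#include <stdint.h>
#include <string.h>

// Raw bit pattern of a 32-bit float.
inline uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expected a 32-bit float");
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

// NaN: exponent bits 30-23 all set and a non-zero mantissa (bits 22-0).
inline bool isNaNBits(uint32_t u) {
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0;
}

// Quiet NaN: a NaN whose most significant mantissa bit (bit 22) is set.
inline bool isQuietNaN(uint32_t u) {
    return isNaNBits(u) && (u & 0x00400000u) != 0;
}
```

    On a typical GCC host, floatBits(__builtin_nanf("")) is 0x7FC00000: exponent all ones, bit 22 set, remaining payload zero.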

    Some of these properties are available as GCC predefined macros. For example, for float, run:

    > echo "" | avr-gcc -xc - -E -dM | grep _FL | sort
    
    #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
    ...
    #define __FLT_HAS_DENORM__ 1
    #define __FLT_HAS_INFINITY__ 1
    #define __FLT_HAS_QUIET_NAN__ 1
    #define __FLT_MANT_DIG__ 24
    #define __FLT_MAX_EXP__ 128
    ...
    #define __FLT_MIN_EXP__ (-125)
    #define __FLT_RADIX__ 2
    #define __SIZEOF_FLOAT__ 4
    

    For double properties, grep for __DBL or DOUBLE.
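    Since the hardware isn't available, the same predefined macros can also be turned into compile-time checks: if any assumption about the layout is wrong, the PlatformIO build for the ATmega328 fails instead of silently misbehaving. This is a sketch meant to sit in a header of the project; it relies only on GCC's predefined macros and C++11 static_assert (no standard library headers), and handles double being either 32-bit, as in avr-gcc's default, or 64-bit.

```cpp
// Compile-time checks of the floating-point layout using GCC's predefined
// macros only, so nothing has to run on the target.

static_assert(__FLT_RADIX__ == 2, "binary floating point expected");
static_assert(__SIZEOF_FLOAT__ == 4, "float should be 32 bits");
static_assert(__FLT_MANT_DIG__ == 24, "23 stored + 1 implicit mantissa bit expected");
static_assert(__FLT_MAX_EXP__ == 128 && __FLT_MIN_EXP__ == -125,
              "binary32 exponent range (bias 127) expected");
static_assert(__FLT_HAS_INFINITY__ && __FLT_HAS_QUIET_NAN__,
              "infinities and quiet NaNs expected");

#if defined(__FLOAT_WORD_ORDER__)
static_assert(__FLOAT_WORD_ORDER__ == __ORDER_LITTLE_ENDIAN__,
              "little-endian float word order expected");
#endif

// double is either an alias of the 32-bit format or the 64-bit format.
#if __SIZEOF_DOUBLE__ == 4
static_assert(__DBL_MANT_DIG__ == 24 && __DBL_MAX_EXP__ == 128,
              "32-bit double should use the binary32 layout");
#elif __SIZEOF_DOUBLE__ == 8
static_assert(__DBL_MANT_DIG__ == 53 && __DBL_MAX_EXP__ == 1024,
              "64-bit double should use the binary64 layout");
#else
#error "unexpected size of double"
#endif
```

    The same file compiles unchanged with a desktop GCC, where the 64-bit branch for double is taken instead.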

  • Floating-Point Availability

  • Floating-Point Implementation