pythonfloating-pointieee-754arbitrary-precision

String of float to IEEE 754 binary128 in Python


I want a function like this:

>>> binary128_to_hex("1.0")
'3fff0000000000000000000000000000'

I currently use C and qemu-aarch64 to do this on my x86 laptop. How can I implement such a function "natively"? I found numpy.float128 and the struct package not helpful.
In addition, thanks to Mark Dickinson’s answer, I figured out to do the reverse conversion (although only normalized values are handled):

import decimal


def hex_to_binary128(x: str):
    with decimal.localcontext() as context:
        context.prec = 34
        x = int(x, 16)
        significand_mask = (1 << 112) - 1
        exponent_mask = (1 << 127) - (1 << 112)
        trailing_significand = x & significand_mask
        significand = 1 + decimal.Decimal(trailing_significand) / (1 << 112)
        biased_exponent = (x & exponent_mask) >> 112
        exponent = biased_exponent - 16383
        f = significand * decimal.Decimal(2) ** exponent
        return f


if __name__ == "__main__":
    print(hex_to_binary128("0000ffffffffffffffffffffffffffff"))
    # 3.362103143112093506262677817321752E-4932

Solution

  • There's a whole spectrum of possible solutions here, depending on how much complexity you want to allow, what kind of performance you need, how much you're prepared to depend on external libraries, and to what extent you need to handle IEEE 754 special cases (overflow, subnormals, signed zeros, etc.).

    Here's some working code that I hope gives a reasonable compromise. It's (a) fairly simple, (b) doesn't depend on anything outside the standard library, (c) handles subnormals, signed zeros, and overflows reasonably well (but doesn't attempt to interpret strings like "inf" or "nan"), and (d) probably has terrible performance. But if you're only interested in casual uses, it may be good enough.

    The idea of the code is to side-step all parsing difficulties by using the fractions.Fraction parser to parse the string input to a Fraction object. We can then pick apart that Fraction object and construct the information we need.

    I'll present the solution in three pieces. First, one of the basic tools we need is the ability to compute the binary exponent (in other words, the floor of the base-2 log) of a positive Fraction. Here's the code for that:

    def exponent(f):
        """
        Binary exponent (IEEE 754 style) of a positive Fraction f.
    
        Returns the unique integer e such that 2**e <= f < 2**(e + 1). Results
        for negative or zero f are not defined.
        """
        n, d = f.numerator, f.denominator
        e = n.bit_length() - d.bit_length()
        if e >= 0:
            adjust = (n >> e) < d     # n / d < 2**e  <=>  floor(n / 2**e) < d
        else:
            adjust = (-d >> -e) < -n  # n / d < 2**e  <=>  floor(-d / 2**-e) < -n
        return e - adjust
    

    It's mostly straightforward: the difference e in bit lengths of the numerator and denominator either gives us the correct exponent, or it's one larger than it should be. To figure out which, we have to compare the fraction's value with 2**e, where e is our test exponent. We could do that directly, computing 2**e as a Fraction and then comparing, but it's a bit more efficient to use some bit-shifting, so that's what we do.

    Next we define some basic and derived constants that describe the IEEE 754 binary128 format. (This makes it easy to test the code below, by replacing those constants with those for the binary64 format and checking that the results are as expected.) The bit width of the format is 128; the precision is 113, and everything else can be derived from those two values.

    # Basic and derived constants for the format.
    WIDTH = 128
    PRECISION = 113
    EMAX = (1 << WIDTH - PRECISION - 1) - 1
    EMIN = 1 - EMAX
    INF = (1 << WIDTH - 1) - (1 << PRECISION - 1)
    SIGN_BIT = 1 << WIDTH - 1
    

    Most of this should be self-explanatory. The constant INF is the bit representation of the positive infinity constant, and we'll use it to handle overflow.

    Finally, here's the main function:

    from fractions import Fraction as F
    
    def binary128_to_hex(s):
        """
        Convert a decimal numeric string to its binary128 representation.
    
        Given a decimal string 's' (for example "1.2", or "-0.13e-123"), find the
        closest representable IEEE 754 binary128 float to the value represented
        by that string, and return a hexadecimal representation of the bits
        of that float in the corresponding IEEE 754 interchange format.
        """
        # Convert absolute value to a Fraction. Capture the sign separately.
        f, negative = abs(F(s)), s.lstrip().startswith("-")
    
        # Find the bits representing the significand and exponent of the result.
        if f == 0:
            bits = 0  # Handle special case of zero.
        else:
            # Find exponent; adjust for possible subnormal.
            exp = max(exponent(f), EMIN)
            if exp > EMAX:
                bits = INF  # Overflow to infinity
            else:
                significand = round(f / F(2) ** (exp + 1 - PRECISION))
                bits = (exp - EMIN << PRECISION - 1) + significand
    
        # Merge sign bit if necessary, then format as a hex string.
        if negative:
            bits |= SIGN_BIT
        return f'{bits:0{WIDTH//4}x}'
    

    There are two pieces of sneaky sleight-of-hand in the above that deserve special mention: first, when constructing bits from exponent and significand using the expression (exp - EMIN << PRECISION - 1) + significand, we don't do anything special to deal with subnormals. The code nevertheless handles subnormals correctly: for the normal case, the exponent value exp - EMIN is actually one smaller than it should be, but the most significant bit of the significand ends up incrementing the exponent field when we perform the addition. (So it's important that we use + and not | to combine the exponent piece with the significand.)

    The other observation is that while the choice of exp ensures that the argument to the round call is strictly smaller than 2**PRECISION, it's possible for the result of the round call to be exactly 2**PRECISION. At that point you might expect us to have to test for that case, and adjust the exponent and significand accordingly. But again, there's no need to handle this case specially - when the fields are combined using (exp - EMIN << PRECISION - 1) + significand, we get an extra increment of the exponent field and everything works out correctly, even in the corner case where we end up overflowing to infinity. The elegant design of the IEEE 754 binary interchange formats makes this sort of trickery possible.

    Here's the result of testing the code above on a few examples:

    >>> binary128_to_hex("1.0")
    '3fff0000000000000000000000000000'
    >>> binary128_to_hex("-1.0")
    'bfff0000000000000000000000000000'
    >>> binary128_to_hex("-0.0")
    '80000000000000000000000000000000'
    >>> binary128_to_hex("3.362103143112093506262677817321752E-4932")
    '0000ffffffffffffffffffffffffffff'
    >>> binary128_to_hex("1.1897314953572317650857593266280071308E+4932")
    '7fff0000000000000000000000000000'
    >>> binary128_to_hex("1.1897314953572317650857593266280070162E+4932")
    '7ffeffffffffffffffffffffffffffff'
    >>> binary128_to_hex("1.2345E-4950")  # subnormal value
    '00000000000000000006c5f6731b03b8'
    >>> binary128_to_hex("3.14159265358979323846264338327950288")
    '4000921fb54442d18469898cc51701b8'