
Parsing floating point number to a uint64_t fails with fast-math


I have some code that parses a floating-point number and returns an unsigned integer if the number can be converted to unsigned without losing precision:

#include <charconv>
#include <string_view>
#include <stdint.h>

// Note: the source number is expected to be within 32-bit UINT_MAX range, not full 64-bit range.
uint64_t read_uint(std::string_view num)
{
    double d;
    auto r = std::from_chars(num.data(), num.data() + num.size(), d);
    if (r.ec == std::errc() && r.ptr == num.data() + num.size())
    {
        uint64_t u = (uint64_t)d;
        if (d == u + 0.0) // conversion back to a double produced identical value
            return u;
    }
    return ~0ull; // error, return -1
}

The expected results are:

assert(read_uint("1.0") == 1);
assert(read_uint("1.0654553e+07") == 10654553);
assert(read_uint("1.1") == ~0ull);  // error
assert(read_uint("-123") == ~0ull); // error

However, this code fails miserably with clang in optimized x86/x64 builds when targeting AVX/AVX2/AVX-512 with -ffast-math. Specifically, parsing negative numbers fails: assert(read_uint("-123") == ~0llu); instead of returning -1, read_uint returns -123 converted to uint64_t. It fails because converting u back to double, which is how the code verifies that the result is identical, produces a different result:

        uint64_t u = (uint64_t)d;
        if (d == u + 0.0)  // u + 0.0 produces different result
            return u;

As a side note, the cast itself also produces a different value when targeting AVX-512:

        uint64_t u = (uint64_t)d; // u might not be exact when targeting avx512

Clearly, this code is riddled with bugs and gotchas, and I have some questions:

Note that with the MS compiler (MSVC) I haven't seen any of these issues: values are always exact/identical regardless of optimization level, floating-point model, or target architecture.

As a side note, this is not the exact code used in production but an extract from it. It parses numbers returned by the polygon.io JSON APIs. Perhaps they carelessly dumped numbers using Python: I've seen values such as "1.0" and "1.0654553e+07" in place of plain integers. As a simple workaround, I have for now changed the cast to uint64_t to:

uint64_t u = (uint64_t)fabs(d);
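A minimal sketch of just the conversion step with the fabs workaround, pulled out of read_uint into a hypothetical helper named to_uint_fabs (parsing omitted). It assumes |d| < 2^64 and that d is not NaN, which holds for the 32-bit-range values the question expects; outside that range the cast is still undefined behavior. Negative input is rejected because u + 0.0 is then non-negative and cannot compare equal to a negative d.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: the conversion step of read_uint with the fabs workaround.
// Assumption: |d| < 2^64 and d is not NaN; otherwise the cast is still UB.
std::uint64_t to_uint_fabs(double d)
{
    // fabs makes the value non-negative, so for in-range d the
    // truncated value is representable in uint64_t.
    std::uint64_t u = static_cast<std::uint64_t>(std::fabs(d));

    // For negative d, u + 0.0 is non-negative and never equals d
    // (the only exception is -0.0 == 0.0, which yields 0).
    if (d == u + 0.0)
        return u;
    return ~0ull; // error
}
```

The workaround keeps the original round-trip check; it only guarantees that the value being cast is non-negative.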

Minimal example: https://godbolt.org/z/cKzrK6ven (if you remove -O2 from the clang command line, the output changes)

Update.

It appears that casting to int64_t first is the best solution: https://godbolt.org/z/cjjzPMPrh

uint64_t u = (uint64_t)(int64_t)d;
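The int64_t-first cast can be sketched the same way, again as a hypothetical helper (to_uint_via_int64). Converting -123.0 to int64_t is well-defined because -123 fits, and the subsequent int64_t to uint64_t conversion is defined to wrap modulo 2^64, so u becomes 2^64 - 123 and the round-trip comparison rejects it. This assumes the truncated value fits in int64_t (roughly ±9.2e18); beyond that range the first cast is undefined behavior.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the update: convert to int64_t first.
// Assumption: the truncated value of d fits in int64_t, otherwise the first cast is UB.
std::uint64_t to_uint_via_int64(double d)
{
    // double -> int64_t: well-defined for in-range values, e.g. -123.0 -> -123
    std::int64_t i = static_cast<std::int64_t>(d);

    // int64_t -> uint64_t: always well-defined, wraps modulo 2^64
    // (-123 becomes 18446744073709551493)
    std::uint64_t u = static_cast<std::uint64_t>(i);

    // For -123.0, u + 0.0 is about 1.8e19, which differs from d, so it is rejected.
    if (d == u + 0.0)
        return u;
    return ~0ull; // error
}
```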

Solution

  • Yes, your code has undefined behavior.

    N4928 [conv.fpint]/1

    A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

    The truncated value is -123, which cannot be represented in the destination type uint64_t (it can only represent nonnegative values) so this is undefined behavior.

    Note this applies whether you use a C-style cast (uint64_t)d or static_cast<uint64_t>(d).

    It's true that converting a value of integer type with the value -123 to uint64_t yields a well-defined result (namely 2^64 - 123 = 18446744073709551493). But this does not apply when converting a value of floating-point type.
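Since the undefined behavior comes from converting a double whose truncated value is out of range, one way to avoid it entirely is to range-check the double before casting. The following is a sketch under that approach, not code from the answer; to_uint_checked is a hypothetical name. It relies on 2^64 being exactly representable as a double, so 0 <= d < 2^64 guarantees the truncated value fits in uint64_t. The static_assert at the end illustrates the contrast drawn above: integer-to-unsigned conversion of -123 is well-defined.

```cpp
#include <cstdint>

// 2^64 is exactly representable as a double.
constexpr double kTwoPow64 = 18446744073709551616.0;

// Hypothetical helper: the conversion step of read_uint without undefined behavior.
std::uint64_t to_uint_checked(double d)
{
    // Reject anything whose truncated value would not fit in uint64_t
    // (negative values, values >= 2^64) before casting. NaN fails both
    // comparisons and is rejected too, unless -ffast-math lets the
    // compiler assume NaNs never occur.
    if (!(d >= 0.0 && d < kTwoPow64))
        return ~0ull;

    // Well-defined: the truncated value is in [0, 2^64 - 1].
    std::uint64_t u = static_cast<std::uint64_t>(d);

    // Round-trip check rejects inputs with a fractional part, such as 1.1.
    if (d == static_cast<double>(u))
        return u;
    return ~0ull;
}

// For contrast: integer -> unsigned conversion is well-defined and wraps modulo 2^64.
static_assert(static_cast<std::uint64_t>(std::int64_t{-123}) == 18446744073709551493ull,
              "-123 wraps to 2^64 - 123");
```

With this ordering, the cast is only executed when the standard guarantees a defined result, so the optimizer has no undefined behavior to exploit.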