Tags: haskell, floating-point, type-conversion, precision, largenumber

A Haskell floating-point calculation anomaly?


2022 Update: This bug was filed as a GHC ticket and has since been fixed (https://gitlab.haskell.org/ghc/ghc/issues/17231), so this is no longer an issue.

Using GHCi 8.6.5

I want to calculate the square root of an Integer input, round it down, and return an Integer.

square :: Integer -> Integer
square m = floor $ sqrt $ fromInteger m

It works. The problem is, for this specific big number as input:

4141414141414141*4141414141414141

I get a wrong result.

Putting my function aside, I test the case in GHCi:

> sqrt $ fromInteger $ 4141414141414141*4141414141414141
4.1414141414141405e15

wrong... right?

BUT SIMPLY

> sqrt $ 4141414141414141*4141414141414141
4.141414141414141e15

which is more like what I expect from the calculation...

In my function I have to make some type conversion, and I reckon fromIntegral is the way to go. So, using that, my function gives a wrong result for the 4141...41 input.

I can't figure out what GHCi does implicitly in terms of type conversion right before running sqrt, because GHCi's conversion allows for a correct calculation.

Why I say this is an anomaly: the problem does not occur with other numbers like 5151515151515151 or 3131313131313131 or 4242424242424242 ...

Is this a Haskell bug?


Solution

  • TLDR

    It comes down to how one converts an Integer value to a Double when that value is not exactly representable. Note that this can happen not only because the Integer is too big (or too small) for Double's range: by design, Float and Double "skip around" integral values as their magnitude grows, so not every integral value within range is exactly representable either. In such a case an implementation has to pick a representable value based on the rounding mode. Unfortunately, there are multiple candidates; and what you are observing is that the candidate picked by Haskell gives you a worse numeric result.
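
    To get a feel for that "skipping", here is a quick GHCi check of my own (not from the original post): 2^53 + 1 is the first positive integer a Double cannot represent, so it lands back on 2^53 under either rounding mode.

    Prelude> (2^53 :: Double) == 2^53 + 1          -- both sides evaluate to 9007199254740992.0
    True
    Prelude> (fromInteger (2^53 + 1) :: Double) == fromInteger (2^53)
    True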

    Expected Result

    Most languages, including Python, use what's known as the "round-to-nearest-ties-to-even" rounding mode, which is the default IEEE 754 rounding mode and is typically what you get unless you explicitly set a different one when issuing floating-point instructions on a compliant processor. Using Python as the "reference" here, we get:

    >>> float(long(4141414141414141)*long(4141414141414141))
    1.7151311090705027e+31
    

    I haven't tried this in other languages that support so-called big integers, but I'd expect most of them to give you this result.

    How Haskell converts Integer to Double

    Haskell, however, uses what's known as truncation, or round-towards-zero. So you get:

    *Main> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
    1.7151311090705025e31
    

    It turns out this is a "worse" approximation in this case (cf. the Python-produced value above), which is why you get the unexpected result in your original example.

    The call to sqrt is really a red herring at this point.
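
    To make the connection concrete, here is my own reconstruction using the two Double values above (assuming, as the Python result suggests, that the exact product rounds to ...027e31 under round-to-nearest): feeding each candidate to sqrt reproduces the two answers from the question.

    Prelude> sqrt 1.7151311090705025e31   -- the truncated conversion; floors to 4141414141414140
    4.1414141414141405e15
    Prelude> sqrt 1.7151311090705027e31   -- the round-to-nearest conversion; floors to 4141414141414141
    4.141414141414141e15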

    Show me the code

    It all originates from this code (https://hackage.haskell.org/package/integer-gmp-1.0.2.0/docs/src/GHC.Integer.Type.html#doubleFromInteger):

    doubleFromInteger :: Integer -> Double#
    doubleFromInteger (S# m#) = int2Double# m#
    doubleFromInteger (Jp# bn@(BN# bn#))
        = c_mpn_get_d bn# (sizeofBigNat# bn) 0#
    doubleFromInteger (Jn# bn@(BN# bn#))
        = c_mpn_get_d bn# (negateInt# (sizeofBigNat# bn)) 0#
    

    which in turn calls (https://github.com/ghc/ghc/blob/master/libraries/integer-gmp/cbits/wrappers.c#L183-L190):

    /* Convert bignum to a `double`, truncating if necessary
     * (i.e. rounding towards zero).
     *
     * sign of mp_size_t argument controls sign of converted double
     */
    HsDouble
    integer_gmp_mpn_get_d (const mp_limb_t sp[], const mp_size_t sn,
                           const HsInt exponent)
    {
    ...
    

    which deliberately states that the conversion is done by rounding toward zero.

    So, that explains the behavior you get.
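
    For a side-by-side illustration of my own (on a pre-fix GHC such as 8.6.x): you can already obtain the round-to-nearest conversion by going through Rational, since, as far as I can tell, GHC.Float.fromRat rounds its scaled mantissa with Haskell's round (which is ties-to-even).

    Prelude> let n = 4141414141414141*4141414141414141 :: Integer
    Prelude> fromIntegral n :: Double                 -- truncated toward zero
    1.7151311090705025e31
    Prelude> fromRational (toRational n) :: Double    -- rounded to nearest
    1.7151311090705027e31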

    Why does Haskell do this?

    None of this explains why Haskell uses round-towards-zero for Integer-to-Double conversion. I'd strongly argue that it should use the default rounding mode, i.e., round-to-nearest-ties-to-even. I can't find any mention of whether this was a conscious choice, and it at least disagrees with what Python does. (Not that I'd consider Python the gold standard, but it does tend to get these things right.)

    My best guess is it was just coded that way, without a conscious choice; but perhaps other people familiar with the history of numeric programming in Haskell can remember better.

    What to do

    Interestingly, I found the following discussion dating all the way back to 2008 as a Python bug: https://bugs.python.org/issue3166. Apparently, Python used to do the wrong thing here as well, but they fixed the behavior. It's hard to track the exact history, but it appears Haskell and Python both made the same mistake; Python recovered, but the mistake went unnoticed in Haskell. If this was a conscious choice, I'd like to know why.

    So, that's where it stands. I'd recommend opening a GHC ticket so it can at least be documented properly that this is the "chosen" behavior; or, better, fix it so that it uses the default rounding mode instead.
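
    In the meantime, a workaround that sidesteps the conversion entirely is to compute the floored square root directly on Integer. Here is a minimal sketch of my own (integerSqrt and the Newton iteration are my illustration, not something from the original post):

    -- Floored integer square root via Newton's method, done entirely in
    -- Integer, so no Double rounding is involved at any point.
    integerSqrt :: Integer -> Integer
    integerSqrt n
      | n < 0     = error "integerSqrt: negative argument"
      | n == 0    = 0
      | otherwise = go n
      where
        go x
          | next >= x = x                     -- stopped improving: x == floor (sqrt n)
          | otherwise = go next
          where next = (x + n `div` x) `div` 2

    Prelude> integerSqrt (4141414141414141*4141414141414141)
    4141414141414141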

    Update:

    GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/issues/17231

    2022 Update:

    This is now fixed in GHC, at least as of GHC 9.2.2 (possibly earlier):

    GHCi, version 9.2.2: https://www.haskell.org/ghc/  :? for help
    Prelude> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
    1.7151311090705027e31
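
    With the round-to-nearest conversion in place (and an IEEE 754-conforming sqrt), the original definition from the question should now give the expected answer as well; I'd expect, for instance:

    Prelude> floor (sqrt (fromInteger (4141414141414141*4141414141414141) :: Double)) :: Integer
    4141414141414141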