rroundingbit-representation

Same function with same inputs returns different values


Lets say I have a function like follows:

testFunction <- function(testInputs){
    print( sum(testInputs)+1 == 2 )
    return( sum(testInputs) == 1 )
}

When I test this on command line with following input: c(0.65, 0.3, 0.05), it prints and returns TRUE as expected.

However when I use c(1-0.3-0.05, 0.3, 0.05) I get TRUE printed and FALSE returned. Which makes no sense because it means sum(testInputs)+1 is 2 but sum(testInputs) is not 1.

Here is what I think: Somehow printed value is not exactly 1 but probably 0.9999999..., and its rounded up on display. But this is only a guess. How does this work exactly?


Solution

  • This is exactly a floating point problem, but the interesting thing about it for me is how it demonstrates that the return value of sum() produces this error, but with + you don't get it.

    See the links about floating point math in the comments. Here is how to deal with it:

    sum(1-0.3-0.5, 0.3, 0.05) == 1
    # [1] FALSE
    dplyr::near(sum(1-0.3-0.05, 0.3, 0.05), 1)
    # [1] TRUE
    

    For me, the fascinating thing is:

    (1 - 0.3 - 0.05 + 0.3 + 0.05) == 1
    # [1] TRUE
    

    Because you can't predict how the various implementations of floating point arithmetic will behave, you need to correct for it. Here, instead of using ==, use dplyr::near(). This problem (floating point math is inexact, and also unpredictable), is found across languages. Different implementations within a language will result in different floating point errors.

    As I discussed in this answer to another floating point question, dplyr::near(), like all.equal(), has a tolerance argument, here tol. It is set to .Machine$double.eps^0.5, by default. .Machine$double.eps is the smallest number that your machine can add to 1 and be able to distinguish it from 1. It's not exact, but it's on that order of magnitude. Taking the square root makes it a little bigger than that, and allows you to identify exactly those values that are off by an amount that would make a failed test for equality likely to be a floating point error.

    NOTE: yes, near() is in dplyr, which i almost always have loaded, so I forgot it wasn't in base... you could use all.equal(), but look at the source code of near(). It's exactly what you need, and nothing you don't:

    near
    # function (x, y, tol = .Machine$double.eps^0.5) 
    # {
    #     abs(x - y) < tol
    # }
    # <environment: namespace:dplyr>