Lets say I have a function like follows:
testFunction <- function(testInputs){
print( sum(testInputs)+1 == 2 )
return( sum(testInputs) == 1 )
}
When I test this on command line with following input: c(0.65, 0.3, 0.05), it prints and returns TRUE as expected.
However when I use c(1-0.3-0.05, 0.3, 0.05) I get TRUE printed and FALSE returned. Which makes no sense because it means sum(testInputs)+1 is 2 but sum(testInputs) is not 1.
Here is what I think: Somehow printed value is not exactly 1 but probably 0.9999999..., and its rounded up on display. But this is only a guess. How does this work exactly?
This is exactly a floating point problem, but the interesting thing about it for me is how it demonstrates that the return value of sum()
produces this error, but with +
you don't get it.
See the links about floating point math in the comments. Here is how to deal with it:
sum(1-0.3-0.5, 0.3, 0.05) == 1
# [1] FALSE
dplyr::near(sum(1-0.3-0.05, 0.3, 0.05), 1)
# [1] TRUE
For me, the fascinating thing is:
(1 - 0.3 - 0.05 + 0.3 + 0.05) == 1
# [1] TRUE
Because you can't predict how the various implementations of floating point arithmetic will behave, you need to correct for it. Here, instead of using ==
, use dplyr::near()
. This problem (floating point math is inexact, and also unpredictable), is found across languages. Different implementations within a language will result in different floating point errors.
As I discussed in this answer to another floating point question, dplyr::near()
, like all.equal()
, has a tolerance argument, here tol
. It is set to .Machine$double.eps^0.5
, by default. .Machine$double.eps
is the smallest number that your machine can add to 1
and be able to distinguish it from 1
. It's not exact, but it's on that order of magnitude. Taking the square root makes it a little bigger than that, and allows you to identify exactly those values that are off by an amount that would make a failed test for equality likely to be a floating point error.
NOTE: yes, near()
is in dplyr, which i almost always have loaded, so I forgot it wasn't in base... you could use all.equal()
, but look at the source code of near()
. It's exactly what you need, and nothing you don't:
near
# function (x, y, tol = .Machine$double.eps^0.5)
# {
# abs(x - y) < tol
# }
# <environment: namespace:dplyr>