In Octave, I have written code for a Sigmoid function that returns values between 0 and 1; in an ideal world, it would only return 0 for -Inf and 1 for +Inf but due to floating point imprecision, values that are extremely close to either of these are rounded.
My question is why the following occurs: The boundary for rounding is clearly different for 0 vs. 1:
>> sigmoid(-709)
ans = 1.2168e-308
>> sigmoid(-710)
ans = 0
>> sigmoid(36)
ans = 1.00000
>> sigmoid(37)
ans = 1
>> (sigmoid(37)-1)==0
ans = 1
>> (sigmoid(36)-1)==0
ans = 0
>> sigmoid(-710)==0
ans = 1
>> sigmoid(-709)==0
ans = 0
In the example, one can see that the value needed to round the output to 1 is MUCH smaller in magnitude than that needed to round to 0. 37 compared with -710 is a very large discrepancy considering they should be the same in magnitude but with opposite signs...
Perhaps it's an issue with my function:
function [z] = sigmoid(x)
z = 1.0 ./(1.0+exp(-x));
endfunction
Another point, is that I changed the function to add 1 to the result (essentially translating the graph up by 1), and the boundaries became +/-37 for 2 and 1 respectively - this makes me think it really is to do with 0 in particular and not just the function and its lower bound in particular.
If it's something to do with my computer then what would cause such a thing?
First of all, review this brilliant answer by gnovice on floating-point representation.
With that out of the way, let's look at what you're seeing here: You can compute a value very close to zero: sigmoid(-709)
is approximately equal to 1.2e-308
, but you cannot compute a value similarly close to one: sigmoid(709)
is exactly equal to 1, rather than 1 - 1.2e-308
, and even sigmoid(36) == 1
, rather than a value slightly smaller than 1.
But when we know how floating-point numbers are stored in memory, we'll realize that 1 - 1.2e-308
cannot be represented exactly. We'd need 308 decimal digits to represent this number exactly. Double-precision floating-point numbers (the default in Octave) have about 15 decimal digits. That is, 1 - 1e-16
can be represented, but 1 - 1e-17
cannot.
The value of eps(1)
is 2.2204e-16
, this is the smallest difference from 1 that we can encode in a double-precision float.
But values close to 0 can be represented much more precisely: eps(0)
is 4.9407e-324
. That is because a value such as 1.2e-308
does not need 308 decimal digits to be represented, but only 2, with the value -308 in the exponent.
In any case, if you depend on exact values of the sigmoid function so far away from the transition location, then there's something wrong with your code logic.
If you want to make this function symmetric, all you can do is reduce precision on the low end. There are two ways to do this:
Simply set to zero very small values, so that z==0
is reached at the same point as z==1
on the other side:
function z = sigmoid(x)
z = 1.0 ./ (1.0+exp(-x));
z(z < eps(1)) = 0;
end
Always computing the right half of the function, then fold back for negative inputs. This makes the computation error on both sides of x=0
symmetric:
function z = sigmoid(x)
z = 1.0 ./ (1.0+exp(-abs(x)));
I = x < 0;
z(I) = 0.5 - z(I);
end