I'm curious as to why PyTorch's binary_cross_entropy function seems to be implemented in such a way that it calculates ln(0) = -100.
From a mathematical point of view, the binary cross entropy function calculates:
H = -[ p_0*log(q_0) + p_1*log(q_1) ]
In PyTorch's binary_cross_entropy function, q is the first argument and p is the second.
Now suppose I take p = [1, 0] and q = [0.25, 0.75]. In this case, F.binary_cross_entropy(q, p) returns, as expected, -ln(0.25) = 1.386.
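For reference, that number can be reproduced by hand (just a quick sketch of the formula above, using only standard math and torch calls):
> import math
> import torch
> import torch.nn.functional as F
> p, q = [1.0, 0.0], [0.25, 0.75]
> -(p[0] * math.log(q[0]) + p[1] * math.log(q[1]))   # H from the formula above
1.3862943611198906
> F.binary_cross_entropy(torch.tensor(q), torch.tensor(p))
tensor(1.3863)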
If we reverse the arguments and try F.binary_cross_entropy(p, q), this should raise an error, since we would be trying to calculate -0.75*ln(0), and ln(0) is -infinity in the limit. Nonetheless, F.binary_cross_entropy(p, q) gives me 75 as the answer (see below):
> import torch
> import torch.nn.functional as F
> pT = torch.Tensor([1, 0])
> qT = torch.Tensor([0.25, 0.75])
> F.binary_cross_entropy(pT, qT)
tensor(75.)
Why was it implemented this way?
It is indeed clamping ln(0) to -100. You can find an example of that here.
This is most likely a hack to avoid errors caused by probabilities that are accidentally rounded down to zero.
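As a sketch of how that produces the 75 seen above (assuming the only change to the textbook element-wise formula is clamping the output of log at -100):
> import torch
> pT = torch.Tensor([1, 0])
> qT = torch.Tensor([0.25, 0.75])
> # element-wise BCE is -[target*log(input) + (1 - target)*log(1 - input)];
> # here log(0) = -inf gets clamped to -100
> log_p = torch.log(pT).clamp(min=-100)         # [0., -100.]
> log_1mp = torch.log1p(-pT).clamp(min=-100)    # [-100., 0.]
> loss = -(qT * log_p + (1 - qT) * log_1mp)
> loss
tensor([75., 75.])
> loss.mean()    # matches F.binary_cross_entropy(pT, qT)
tensor(75.)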
Technically speaking, the input probabilities to binary_cross_entropy are supposed to be generated by a sigmoid function, which is bounded asymptotically between (0, 1). This means the input should never actually be zero, but it may become zero due to numerical precision issues for very small values.
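As a small illustration of that failure mode (the numbers below are made up for the example): a large logit saturates sigmoid to exactly 1.0 in float32, so binary_cross_entropy hits the -100 clamp, while binary_cross_entropy_with_logits works in log-space and returns the correct finite loss:
> import torch
> import torch.nn.functional as F
> logit = torch.tensor([40.0])   # over-confident prediction; sigmoid(40) rounds to exactly 1.0 in float32
> target = torch.tensor([0.0])
> torch.sigmoid(logit)
tensor([1.])
> F.binary_cross_entropy(torch.sigmoid(logit), target)   # log(1 - 1.0) = log(0), clamped to -100
tensor(100.)
> F.binary_cross_entropy_with_logits(logit, target)      # stays in log-space, no clamping needed
tensor(40.)
This is one reason binary_cross_entropy_with_logits (or nn.BCEWithLogitsLoss) is generally preferred when raw logits are available.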