Cross entropy formula:

H(p, q) = -Σ_x p(x) log q(x)

But why does the following give loss = 0.7437 instead of loss = 0 (since 1 * log(1) = 0)?
import torch
import torch.nn as nn

output = torch.tensor([[0., 0., 0., 1.]])  # raw scores for one sample, four classes
target = torch.tensor([3])                 # index of the correct class
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)  # tensor(0.7437)
In your example you are treating the output [0, 0, 0, 1] as probabilities, which is what the mathematical definition of cross entropy expects. But PyTorch treats them as raw scores (logits) that don't need to sum to 1; it first converts them into probabilities, and for that it uses the softmax function.
So H(p, q)
becomes:
H(p, softmax(output))
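Here is a minimal sketch (reusing the output and target from the snippet above; probs and manual_loss are just illustrative variable names) that applies the softmax by hand and then takes the negative log of the probability assigned to the target class, reproducing the value nn.CrossEntropyLoss returns:

import torch
import torch.nn as nn

output = torch.tensor([[0., 0., 0., 1.]])
target = torch.tensor([3])

# Convert raw scores into probabilities, as CrossEntropyLoss does internally.
probs = torch.softmax(output, dim=1)
print(probs)  # tensor([[0.1749, 0.1749, 0.1749, 0.4754]])

# Cross entropy with a one-hot target reduces to -log of the target class's probability.
manual_loss = -torch.log(probs[0, target.item()])
print(manual_loss)                            # tensor(0.7437)
print(nn.CrossEntropyLoss()(output, target))  # tensor(0.7437)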
Translating the output [0, 0, 0, 1]
into probabilities:
softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]
whence:
-log(0.4754) ≈ 0.7437
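As a quick illustration (the scores below are hypothetical, chosen just for the demo), the loss only approaches 0 when the raw score of the correct class dominates the others, because softmax never assigns a probability of exactly 1:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
target = torch.tensor([3])

# The bigger the margin of class 3's score, the closer its softmax
# probability gets to 1 and the closer the loss gets to 0.
for score in (1., 10., 100.):
    output = torch.tensor([[0., 0., 0., score]])
    print(score, criterion(output, target).item())
# 1.0   ≈ 0.7437
# 10.0  ≈ 0.00014
# 100.0 ≈ 0.0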