I'm learning about the CrossEntropyLoss module in PyTorch, and the tutor says you should pass the target value Y 'label encoded', not 'one-hot encoded'. Like this:
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
Y = torch.tensor([0])
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])
Y_pred_bad = torch.tensor([[0.5, 1.0, 0.3]])
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)
print(l1.item())
print(l2.item())
But I learned that cross-entropy loss is calculated with one-hot encoded class information. Does the PyTorch module transform the label-encoded targets into one-hot encoded ones, or is there another way to calculate the CE loss with label-encoded information?
There's a difference between the multi-class CE loss, nn.CrossEntropyLoss, and the binary version, nn.BCEWithLogitsLoss.
For the binary case, the implemented loss allows for "soft labels" and thus requires the binary targets to be floats in the range [0, 1].
In contrast, nn.CrossEntropyLoss works with "hard" labels, and therefore does not need them to be one-hot encoded.
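To make the distinction concrete, a minimal sketch (the logits and targets below are arbitrary example values):

import torch
import torch.nn as nn

# Binary case: targets are floats in [0, 1] with the same shape as the logits,
# so "soft" values such as 0.7 are allowed.
bce = nn.BCEWithLogitsLoss()
bin_logits = torch.tensor([0.8, -1.2, 0.3])
bin_targets = torch.tensor([1.0, 0.0, 0.7])   # last target is a soft label
print(bce(bin_logits, bin_targets).item())

# Multi-class case: targets are integer class indices, no one-hot encoding needed.
ce = nn.CrossEntropyLoss()
mc_logits = torch.tensor([[2.0, 1.0, 0.1]])
mc_targets = torch.tensor([0])                # class index, not a one-hot vector
print(ce(mc_logits, mc_targets).item())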
If you do the math for the multi-class cross-entropy loss, you'll see that a one-hot representation of the targets is inefficient. The loss is -log(p_i), where i is the true label, so one only needs to index the proper entry of the predicted probability vector. This can be done via multiplication by the one-hot encoded targets, but it is much more efficient to index the right entry directly.
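A small check that the two routes give the same number (the shapes and values are just an example; log_softmax is used so the result matches what nn.CrossEntropyLoss computes internally):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])

log_probs = F.log_softmax(logits, dim=1)                 # log p for every class

# One-hot route: zero out everything except the true class, then sum.
one_hot = F.one_hot(target, num_classes=3).float()
loss_onehot = -(one_hot * log_probs).sum(dim=1).mean()

# Indexing route: pick the log-probability of the true class directly.
loss_indexed = -log_probs.gather(1, target.unsqueeze(1)).squeeze(1).mean()

print(loss_onehot.item(), loss_indexed.item())           # identical values
print(F.cross_entropy(logits, target).item())            # same value again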
Note: it seems like recent versions of nn.CrossEntropyLoss also accept probabilistic targets ("soft labels"), i.e. a one-hot (or smoothed) vector per sample instead of a class index.
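A minimal sketch of that usage, assuming a PyTorch version recent enough to accept class probabilities as targets (1.10 or later):

import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 1.0, 0.1]])
soft_targets = torch.tensor([[0.9, 0.05, 0.05]])  # probabilities per class, not indices
print(loss(logits, soft_targets).item())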