I was watching a tutorial, and when the author wanted to compute the probabilities of the predictions from the logits, he used softmax with dim=0. Why? Doesn't dim=0 mean taking the softmax across the rows? Shouldn't we use dim=1 instead? For example, when we want to get the class id we use torch.argmax(..., dim=1), because every row represents the probabilities of the different classes for one sample, so why not use dim=1 here as well?

What is the difference between these two cases (getting the class id with argmax versus getting the probabilities with softmax)?

I read some answers to other questions, but I didn't understand them.
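To make the question concrete, here is a toy example of what I think the two dims do (the numbers are made up):

```python
import torch

# made-up logits for 2 samples and 3 classes: shape (2, 3)
logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.1, 0.3, 2.2]])

probs_dim0 = torch.softmax(logits, dim=0)  # each COLUMN sums to 1 (across samples)
probs_dim1 = torch.softmax(logits, dim=1)  # each ROW sums to 1 (across classes)

print(probs_dim0.sum(dim=0))  # tensor([1., 1., 1.])
print(probs_dim1.sum(dim=1))  # tensor([1., 1.])
```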
It's hard to say why the tutorial does it that way, but usually the input to your network has shape (batch_number, ...your data...), and for classification the output usually has shape (batch_number, number_of_classes). You're right that in that case you should use dim=1 (or, even better, dim=-1, because the output can have a more complicated shape, for example (batch_number, some_more_data, ..., number_of_classes)) to get the model's confidences along the dimension that sums to 1. However, the architecture of a deep network might reshape the data internally for some purpose, so you should check which dimension number_of_classes actually ends up in.
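For example, a minimal sketch (the toy model and all the sizes here are just assumptions for illustration):

```python
import torch
import torch.nn as nn

# toy classifier: 4 input features -> 3 classes (sizes are made up)
model = nn.Linear(4, 3)

x = torch.randn(8, 4)                  # (batch_number, features)
logits = model(x)                      # (batch_number, number_of_classes) = (8, 3)

probs = torch.softmax(logits, dim=-1)  # normalize over the last (class) dimension
print(probs.shape)                     # torch.Size([8, 3])
print(probs.sum(dim=-1))               # every sample's probabilities sum to 1
```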
As for the other part of the question, the difference between softmax and argmax is that softmax returns the confidences (probabilities that sum to 1) over the classes, while argmax returns the index of the single class with the highest value for each sample. Usually you apply softmax and then argmax to get the final class index (although, since softmax is monotonic, applying argmax directly to the raw logits gives the same indices).
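Something like this, reusing made-up logits of shape (batch, classes):

```python
import torch

logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.1, 0.3, 2.2]])   # hypothetical (2 samples, 3 classes)

probs = torch.softmax(logits, dim=-1)      # per-sample class probabilities
preds = torch.argmax(probs, dim=-1)        # index of the most likely class per sample
print(preds)                               # tensor([1, 2])

# softmax is monotonic, so argmax on the raw logits gives the same result
print(torch.argmax(logits, dim=-1))        # tensor([1, 2])
```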
Hope it helps