In TensorFlow, I have a classifier network and unbalanced training classes. For various reasons I cannot use resampling to compensate for the unbalanced data. Therefore I am forced to compensate for the imbalance by other means, specifically by multiplying the logits by weights based on the number of examples in each class. I know this is not the preferred approach, but resampling is not an option. My training loss op is tf.nn.softmax_cross_entropy_with_logits (I might also try tf.nn.sparse_softmax_cross_entropy_with_logits). The TensorFlow docs include the following in the description of these ops:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
My question: Is the warning above referring only to the scaling done by softmax, or does it mean that logit scaling of any kind is forbidden? If the latter, is my class-rebalancing logit scaling causing erroneous results?
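For concreteness, here is a minimal sketch of the kind of scaling I mean (TF 1.x style; the shapes, values, and names like class_weights are just placeholders for my real tensors):

    import tensorflow as tf

    # Toy stand-ins for my real tensors (shapes and values are made up).
    batch_size, num_classes = 4, 3
    logits = tf.random_normal([batch_size, num_classes])          # unscaled network outputs
    one_hot_labels = tf.one_hot([0, 1, 2, 0], depth=num_classes)  # ground-truth classes
    class_weights = tf.constant([0.2, 0.5, 3.0])                  # e.g. inverse class frequencies

    # The scaling in question: per-class multiplicative weights on the logits.
    scaled_logits = logits * class_weights

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels,
                                                logits=scaled_logits))

    with tf.Session() as sess:
        print(sess.run(loss))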
The warning just informs you that tf.nn.softmax_cross_entropy_with_logits applies a softmax to the input logits before computing the cross-entropy. It really seems intended to prevent you from applying softmax twice, since that would produce very different cross-entropy results.
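A toy example of what the warning guards against (hypothetical tensors, TF 1.x style): passing raw logits is fine, while passing the output of a softmax makes the op apply softmax twice:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])
    labels = tf.constant([[1.0, 0.0, 0.0]])

    # Correct: pass the raw logits; the op applies softmax internally.
    loss_ok = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    # Incorrect: softmax has already been applied, so the op softmaxes twice.
    loss_double = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                          logits=tf.nn.softmax(logits))

    with tf.Session() as sess:
        print(sess.run([loss_ok, loss_double]))  # the two losses differ noticeably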
Here is a comment in the relevant source code, about the function that implements tf.nn.softmax_cross_entropy_with_logits:
// NOTE(touts): This duplicates some of the computations in softmax_op
// because we need the intermediate (logits -max(logits)) values to
// avoid a log(exp()) in the computation of the loss.
As the warning states, this implementation is there to improve performance, with the caveat that you should not feed in the output of your own softmax layer (which is somewhat convenient in practice).
If the forced softmax hinders your computation, perhaps another API could help: tf.nn.sigmoid_cross_entropy_with_logits, or maybe tf.nn.weighted_cross_entropy_with_logits.
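For example, here is a rough sketch of the weighted variant (TF 1.x, where the first argument is named targets; note it is sigmoid-based, so each class is treated as an independent binary decision and pos_weight up-weights the positives of each class):

    import tensorflow as tf

    logits = tf.constant([[2.0, -1.0, 0.3]])   # toy unscaled outputs
    targets = tf.constant([[1.0, 0.0, 0.0]])   # one-hot / multi-hot targets
    pos_weight = tf.constant([0.2, 0.5, 3.0])  # e.g. inverse class frequencies (made up)

    # Sigmoid-based: each class is an independent binary problem,
    # with the positives of class k weighted by pos_weight[k].
    loss = tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(targets=targets,
                                                 logits=logits,
                                                 pos_weight=pos_weight))

    with tf.Session() as sess:
        print(sess.run(loss))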
The implementation does not seem to indicate, though, that any scaling will impact the result. My guess is that a linear scaling should be fine, as long as it preserves the original distribution of the logits. But whatever is applied to the input logits, tf.nn.softmax_cross_entropy_with_logits will still apply a softmax before computing the cross-entropy.
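As a rough sanity check (toy tensors, TF 1.x style): whatever you feed in is treated as logits and pushed through softmax internally, matching the manual computation:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])
    labels = tf.constant([[1.0, 0.0, 0.0]])
    class_weights = tf.constant([0.2, 0.5, 3.0])  # made-up rebalancing weights

    scaled = logits * class_weights  # whatever scaling you apply...

    # ...the op still computes -sum(labels * log(softmax(scaled))).
    builtin = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=scaled)
    manual = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(scaled)), axis=1)

    with tf.Session() as sess:
        print(sess.run([builtin, manual]))  # identical up to float precision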