Would it be possible to use the label_smoothing feature from tf.losses.softmax_cross_entropy with tf.contrib.seq2seq.sequence_loss?
I can see that sequence_loss optionally takes a softmax_loss_function as a parameter. However, that function receives the targets as a list of integer ids rather than the one-hot encoded vectors required by tf.losses.softmax_cross_entropy, which is also the only TensorFlow function that supports label_smoothing.

Can you recommend a way of making label_smoothing work with sequence_loss?
This can't be done efficiently.
tf.contrib.seq2seq.sequence_loss is designed to work with very large vocabularies, so it expects a loss function from the sparse family (see this question for details). The main difference is that the labels are given as integer ids rather than one-hot vectors, because the latter take too much memory; the actual one-hot encoding is never computed.
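As a rough illustration (the vocabulary size of 50,000 below is just an assumption made for the example), compare the two label representations:

import tensorflow as tf

# Integer ("sparse") targets, as sequence_loss expects: one id per timestep.
targets = tf.constant([[2, 0, 1]])                           # shape [1, 3]: 3 ints

# The dense form that tf.losses.softmax_cross_entropy needs instead:
# a full distribution over the vocabulary for every timestep.
onehot = tf.one_hot(tf.reshape(targets, [-1]), depth=50000)  # shape [3, 50000] floats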
The label_smoothing parameter of tf.losses.softmax_cross_entropy, on the other hand, is an option that manipulates the one-hot encoding. Here's what it does:
if label_smoothing > 0:
  num_classes = math_ops.cast(
      array_ops.shape(onehot_labels)[1], logits.dtype)
  smooth_positives = 1.0 - label_smoothing
  smooth_negatives = label_smoothing / num_classes
  onehot_labels = onehot_labels * smooth_positives + smooth_negatives
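For example, with num_classes = 4 and label_smoothing = 0.1, a hard target [0, 0, 1, 0] becomes [0.025, 0.025, 0.925, 0.025].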
As you can see, computing this tensor requires onehot_labels to be stored explicitly, which is exactly what the sparse functions try to avoid. That's why neither tf.nn.sparse_softmax_cross_entropy_with_logits nor tf.contrib.seq2seq.sequence_loss provides a similar parameter. Of course, you can do the conversion yourself, but this defeats the whole point of the optimization.
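If you do accept that cost, the natural place to plug the conversion in is a custom softmax_loss_function. The sketch below is only an illustration under a few assumptions: it targets TF 1.x, it assumes a sequence_loss version that calls the hook with labels= and logits= keyword arguments on flattened targets and logits (check the docstring of your version, since older releases used a different calling convention), and smoothed_softmax_loss is just a name made up for the example.

import functools
import tensorflow as tf

def smoothed_softmax_loss(labels, logits, label_smoothing=0.1):
    """Hypothetical replacement loss that re-creates label smoothing.

    labels: int ids of shape [batch * time]; logits: [batch * time, vocab].
    It materializes the full one-hot tensor, i.e. it gives up exactly the
    memory saving that the sparse loss provides.
    """
    num_classes = tf.shape(logits)[-1]
    onehot_labels = tf.one_hot(labels, depth=num_classes, dtype=logits.dtype)
    # Same arithmetic as the snippet from tf.losses.softmax_cross_entropy above.
    smooth_positives = 1.0 - label_smoothing
    smooth_negatives = label_smoothing / tf.cast(num_classes, logits.dtype)
    onehot_labels = onehot_labels * smooth_positives + smooth_negatives
    return tf.nn.softmax_cross_entropy_with_logits(
        labels=onehot_labels, logits=logits)

# Usage sketch (TF 1.x graph mode); shapes are purely illustrative.
batch, time, vocab = 32, 20, 50000
logits = tf.placeholder(tf.float32, [batch, time, vocab])
targets = tf.placeholder(tf.int32, [batch, time])
weights = tf.ones([batch, time])
loss = tf.contrib.seq2seq.sequence_loss(
    logits, targets, weights,
    softmax_loss_function=functools.partial(
        smoothed_softmax_loss, label_smoothing=0.1))

The functools.partial is only there so the smoothing factor can be configured, since sequence_loss itself passes just the labels and logits to the hook.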