Tags: python, tensorflow, softmax, cross-entropy, sequence-to-sequence

TensorFlow sequence_loss with label_smoothing


Would it be possible to use the label_smoothing feature from tf.losses.softmax_cross_entropy with tf.contrib.seq2seq.sequence_loss?

I can see that sequence_loss optionally takes a softmax_loss_function as a parameter. However, this function receives the targets as a list of integer ids rather than the one-hot encoded vectors required by tf.losses.softmax_cross_entropy, which is also the only TensorFlow function that supports label_smoothing.

Can you recommend a way of making label_smoothing work with sequence_loss?


Solution

  • This can't be done efficiently.

    tf.contrib.seq2seq.sequence_loss is designed to work with very large vocabularies, so it expects a loss function from the sparse family (see this question for details). The main difference is that labels use integer (ordinal) encoding instead of one-hot, because the latter takes too much memory; the actual one-hot encoding is never computed.
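
    To put some rough numbers on that difference, here is a small sketch contrasting the two encodings; the batch size, sequence length and vocabulary size below are made up purely for illustration:

    import tensorflow as tf

    # Hypothetical sizes, just to illustrate the memory gap.
    batch_size, seq_len, vocab_size = 32, 50, 50000

    # Sparse targets: one integer id per token -> 32 * 50 int32 values.
    sparse_targets = tf.placeholder(tf.int32, [batch_size, seq_len])

    # Dense targets: a vocab-sized row per token -> 32 * 50 * 50000 float32
    # values, roughly 320 MB for this single tensor.
    onehot_targets = tf.one_hot(sparse_targets, depth=vocab_size)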

    The label_smoothing parameter of tf.losses.softmax_cross_entropy, on the other hand, is an option that manipulates the one-hot encoding. Here's what it does:

    if label_smoothing > 0:
      num_classes = math_ops.cast(
          array_ops.shape(onehot_labels)[1], logits.dtype)
      smooth_positives = 1.0 - label_smoothing
      smooth_negatives = label_smoothing / num_classes
      # The true class ends up with (1 - label_smoothing) + label_smoothing / num_classes,
      # every other class with label_smoothing / num_classes.
      onehot_labels = onehot_labels * smooth_positives + smooth_negatives

    As you can see, computing this tensor requires onehot_labels to be stored explicitly, which is exactly what sparse functions try to avoid. That's why neither tf.nn.sparse_softmax_cross_entropy_with_logits nor tf.contrib.seq2seq.sequence_loss provides a similar parameter. You can, of course, do the conversion yourself, but that defeats the whole optimization; a sketch of that workaround follows.
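
    If your vocabulary is small enough that the extra memory is acceptable, one way to do the conversion is to pass a custom softmax_loss_function that one-hot encodes the flattened targets and feeds them to tf.losses.softmax_cross_entropy with label_smoothing set. This is only a sketch under that assumption; the placeholder shapes, the smoothed_loss_fn name and the 0.1 smoothing value are made up for illustration:

    import tensorflow as tf

    # Assumed shapes for illustration only; use your model's tensors instead.
    batch_size, seq_len, vocab_size = 32, 50, 10000
    logits = tf.placeholder(tf.float32, [batch_size, seq_len, vocab_size])
    targets = tf.placeholder(tf.int32, [batch_size, seq_len])
    weights = tf.placeholder(tf.float32, [batch_size, seq_len])

    def smoothed_loss_fn(labels, logits):
        # sequence_loss flattens its inputs before calling this function, so
        # labels is [batch * time] int ids and logits is [batch * time, vocab_size].
        onehot_labels = tf.one_hot(labels, depth=tf.shape(logits)[1],
                                   dtype=logits.dtype)
        # Return per-element losses (reduction=NONE) so sequence_loss can apply
        # its own weighting and averaging afterwards.
        return tf.losses.softmax_cross_entropy(
            onehot_labels, logits,
            label_smoothing=0.1,                   # placeholder value
            reduction=tf.losses.Reduction.NONE,
            loss_collection=None)                  # don't double-register the loss

    loss = tf.contrib.seq2seq.sequence_loss(
        logits, targets, weights,
        softmax_loss_function=smoothed_loss_fn)

    Note that this materializes the one-hot tensor for every decoding step, so it only makes sense for modest vocabularies; with a very large vocabulary you are back to the memory problem described above.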