I read from the documentation:
tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction="auto", name="sparse_categorical_crossentropy")
Computes the crossentropy loss between the labels and predictions.
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers. If you want to provide labels using one-hot representation, please use CategoricalCrossentropy loss. There should be # classes floating point values per feature for y_pred and a single floating point value per feature for y_true.
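Concretely, the documented shapes look like this. A minimal sketch using the example values from the Keras docs (the printed loss value is approximate):

import tensorflow as tf

# Integer labels: one class index per example, shape (batch,).
y_true = [1, 2]
# Predictions: one floating point value per class per example, shape (batch, num_classes).
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())  # ~1.177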
Why is this called sparse categorical cross entropy? If anything, we are providing a more compact encoding of class labels (integers vs one-hot vectors).
I think the name refers to the labels rather than the encoding itself: a one-hot label vector is sparse binary data (all zeros except a single one), and the integer encoding is simply a more compact way to store that sparse vector. In other words, "sparse" describes the data the integers stand in for, and integer encoding is the better encoding for such sparse binary data.
This can be handy when you have many possible classes (and many samples), in which case a one-hot vector per example is significantly more wasteful than a simple integer per example, as the sketch below illustrates.
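To see that the two losses agree and that only the label encoding differs, here is a minimal sketch (the tf.one_hot conversion and the specific values are just for illustration):

import tensorflow as tf

num_classes = 3
y_true_int = tf.constant([1, 2])                     # one integer per example
y_true_onehot = tf.one_hot(y_true_int, num_classes)  # batch x num_classes, mostly zeros

y_pred = tf.constant([[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])

scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()

# Same loss either way; only the label encoding differs.
print(scce(y_true_int, y_pred).numpy())    # ~1.177
print(cce(y_true_onehot, y_pred).numpy())  # ~1.177

With num_classes in the thousands, the one-hot matrix grows as batch × num_classes while the integer labels stay at one value per example, which is exactly the storage argument above.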