python · tensorflow · machine-learning · keras · computer-vision

Difference between categorical and binary cross entropy


Using Keras, I have to train a model to predict whether an image belongs to class 0 or class 1. I am confused about binary_crossentropy and categorical_crossentropy. I have searched for this but am still confused. Some people mention that we only use categorical cross entropy when predicting multiple classes, and that we should use one-hot-encoded label vectors for it. So that would mean we don't need one-hot-encoded labels when training with binary_crossentropy. Others suggest representing the labels as one-hot vectors, [0. 1.] (if the class is 1) or [1. 0.] (if the class is 0), for binary_crossentropy. I am using one-hot encodings [0 1] or [1 0] with categorical cross entropy. My last layer is

model.add(Dense(num_classes, activation='softmax'))
  
# Compile model
model.compile(loss='categorical_crossentropy', 
              optimizer='adadelta', 
              metrics=['accuracy'])
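
For reference, this is how the one-hot labels mentioned above are typically built. Keras provides `tensorflow.keras.utils.to_categorical` for this; the snippet below (a minimal sketch, with made-up example labels) shows the equivalent in plain NumPy:

```python
import numpy as np

# Hypothetical integer class labels for 4 images (class 0 or class 1)
labels = np.array([0, 1, 1, 0])

# One-hot encode: class 0 -> [1., 0.], class 1 -> [0., 1.]
# (same result as tensorflow.keras.utils.to_categorical(labels, num_classes=2))
one_hot = np.eye(2)[labels]

print(one_hot)
```

These one-hot rows are what `categorical_crossentropy` expects as targets, whereas `binary_crossentropy` with a single sigmoid output takes the integer labels directly.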

Solution

  • They are mathematically identical for 2 classes, hence the name binary. In other words, 2-class categorical cross entropy with a softmax output is the same as single-output binary cross entropy with a sigmoid output. To give a more tangible example, these are identical:

    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', ...)
    # is the same as
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', ...)
    

    Which one to use? If you only have 2 classes, binary cross entropy is easier from a coding perspective, since you can keep the integer labels 0/1 and avoid one-hot encoding the outputs. The binary case might also be computationally more efficient, depending on the implementation.
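
    The equivalence can be checked numerically. This sketch (plain NumPy, with a made-up predicted probability) computes both losses for one sample and shows they agree:

    ```python
    import numpy as np

    def binary_crossentropy(y_true, p):
        # y_true in {0, 1}; p = predicted probability of class 1 (sigmoid output)
        return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def categorical_crossentropy(y_onehot, probs):
        # y_onehot e.g. [0., 1.]; probs = softmax output summing to 1
        return -np.sum(y_onehot * np.log(probs))

    p = 0.7                        # hypothetical sigmoid output: P(class 1)
    probs = np.array([1 - p, p])   # corresponding 2-class softmax output

    bce = binary_crossentropy(1, p)                               # true class is 1
    cce = categorical_crossentropy(np.array([0.0, 1.0]), probs)   # same label, one-hot

    assert np.isclose(bce, cce)    # both equal -log(0.7)
    ```

    Both reduce to -log(p of the true class), which is why Keras reports the same loss for either setup on a 2-class problem.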