I am working on a multi-label image classification problem, using TensorFlow, Keras and Python 3.9. I have built a dataset containing one .csv file with image names and their respective one-hot encoded labels, like so:
I also have an image folder with the associated image files. There are around 17,000 images, and each one can be assigned any of 29 possible labels. The dataset is fairly well balanced. These labels refer to the visual components found in an image; for example, the following image belongs to classes [02, 23, 05]. This method of image labelling is popular in trademark imaging and is known as the Vienna Classification.
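For context, labels like [02, 23, 05] can be turned into the 29-element target vectors the network trains on. A minimal sketch (the function name and the assumption that codes 01–29 map to indices 0–28 are mine, not from the question):

```python
import numpy as np

NUM_CLASSES = 29

def to_multi_hot(labels, num_classes=NUM_CLASSES):
    """Convert a list of 1-based class codes, e.g. [2, 23, 5], to a multi-hot vector."""
    vec = np.zeros(num_classes, dtype=np.float32)
    for label in labels:
        vec[label - 1] = 1.0  # assumes code 01 -> index 0, ..., code 29 -> index 28
    return vec

y = to_multi_hot([2, 23, 5])  # three positions set to 1, the rest 0
```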
Now, my goal is to perform predictions on similar images. For this, I am fine-tuning a VGG19 network with a custom prediction layer defined as follows:

prediction_layer = tf.keras.layers.Dense(29, activation=tf.keras.activations.sigmoid)
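Put together, the transfer-learning setup this describes looks roughly like the sketch below. I use weights=None to keep the snippet self-contained (in practice you would pass weights="imagenet"), and it omits the preprocessing lambda layers visible in the summary further down:

```python
import tensorflow as tf

# Frozen VGG19 convolutional base; weights=None here only so the sketch
# runs offline -- use weights="imagenet" for actual fine-tuning.
base = tf.keras.applications.VGG19(
    include_top=False, weights=None, input_shape=(224, 224, 3)
)
base.trainable = False  # only the head below is trained

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)          # keep BatchNorm-style layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# Sigmoid (not softmax): each of the 29 labels is an independent yes/no decision.
outputs = tf.keras.layers.Dense(29, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```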
All images are properly resized to (224, 224, 3) and their RGB values scaled to [0, 1]. My network summary looks like this:
Model: "model_5"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 input_11 (InputLayer)        [(None, 224, 224, 3)]     0
 tf.__operators__.getitem_1   (None, 224, 224, 3)       0
 (SlicingOpLambda)
 tf.nn.bias_add_1             (None, 224, 224, 3)       0
 (TFOpLambda)
 vgg19 (Functional)           (None, 7, 7, 512)         20024384
 global_average_pooling2d_3   (None, 512)               0
 (GlobalAveragePooling2D)
 dense_12 (Dense)             (None, 29)                14877
=================================================================
Total params: 20,039,261
Trainable params: 14,877
Non-trainable params: 20,024,384
_________________________________________________________________
The problem I am facing is with the actual training of the network. I am using Adam and the binary_crossentropy loss function, which I believe is adequate for multi-label problems. However, after around 5 hours of training, I am fairly disappointed with the accuracy it is achieving:
Epoch 10/10
239/239 [==============================] - 1480s 6s/step - loss: 0.1670 - accuracy: 0.1969 - val_loss: 0.1656 - val_accuracy: 0.1922
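For reference, the compile step this corresponds to would look roughly as below. The tiny stand-in model and the learning rate are my assumptions; in the question this is the frozen-VGG19 model summarized above:

```python
import tensorflow as tf

# Minimal stand-in for the frozen-VGG19 model so the snippet runs on its own.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(512,)),
    tf.keras.layers.Dense(29, activation="sigmoid"),
])

# binary_crossentropy treats each of the 29 sigmoid outputs as an independent
# yes/no decision, the standard pairing for multi-label targets.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed LR
    loss="binary_crossentropy",
    metrics=["accuracy"],  # this is the metric the training log above reports
)
```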
I am somewhat familiar with multi-class classification, but this is my first attempt at solving a multi-label problem. Am I failing at some point before training, is VGG19 not ideal for this task, or did I get my parameters wrong?
Multi-label problems are evaluated differently. Check out this answer. A low accuracy here could mean nothing: a prediction for one sample is only counted as correct if the entire vector of 29 elements is correct, which is hard to achieve. For your example, that is:
[0,1,0,0,1,0,0,0,0...,1,0,0,0,0,0,0]
I recommend using binary accuracy, F1-score, Hamming loss, or coverage to evaluate your model, depending on which aspect of the prediction is most important in your context.
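To see how much these metrics can diverge on the same predictions, here is a NumPy sketch on a toy example (5 labels instead of 29 for readability; the numbers are made up):

```python
import numpy as np

# 4 samples, 5 labels each.
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0]])
y_pred = np.array([[1, 0, 1, 0, 0],   # fully correct
                   [0, 1, 0, 0, 0],   # one label missed
                   [1, 1, 0, 1, 0],   # one extra label
                   [0, 0, 0, 1, 0]])  # fully correct

# Subset (exact-match) accuracy: the whole label vector must match.
subset_acc = np.mean(np.all(y_true == y_pred, axis=1))   # 2/4 = 0.5

# Binary accuracy: fraction of individual label decisions that are correct.
binary_acc = np.mean(y_true == y_pred)                    # 18/20 = 0.9

# Hamming loss: fraction of wrong label decisions.
hamming = np.mean(y_true != y_pred)                       # 2/20 = 0.1

# Micro-averaged F1 over all label decisions.
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Two label flips out of twenty drop the exact-match score to 0.5 while binary accuracy stays at 0.9, which is why a low "accuracy" on 29 labels can still hide a reasonably good model.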