This model is still training, and the validation accuracy is falling below the training accuracy. Does this indicate overfitting? How can I overcome it? I am using the MobileNet model. Can I fix it by reducing the learning rate?
Epoch 10/50
6539/6539 [==============================] - 3386s 518ms/step - loss: 0.8198 - accuracy: 0.7470 - top3_acc: 0.9199 - top5_acc: 0.9645 - val_loss: 1.0399 - val_accuracy: 0.6940 - val_top3_acc: 0.8842 - val_top5_acc: 0.9406
Epoch 11/50
6539/6539 [==============================] - 3377s 516ms/step - loss: 0.7939 - accuracy: 0.7558 - top3_acc: 0.9248 - top5_acc: 0.9669 - val_loss: 1.0379 - val_accuracy: 0.6953 - val_top3_acc: 0.8844 - val_top5_acc: 0.9411
Epoch 12/50
6539/6539 [==============================] - 3386s 518ms/step - loss: 0.7593 - accuracy: 0.7644 - top3_acc: 0.9304 - top5_acc: 0.9702 - val_loss: 1.0454 - val_accuracy: 0.6953 - val_top3_acc: 0.8831 - val_top5_acc: 0.9410
Epoch 13/50
6539/6539 [==============================] - 3394s 519ms/step - loss: 0.7365 - accuracy: 0.7735 - top3_acc: 0.9340 - top5_acc: 0.9713 - val_loss: 1.0476 - val_accuracy: 0.6938 - val_top3_acc: 0.8856 - val_top5_acc: 0.9411
Epoch 14/50
6539/6539 [==============================] - 3386s 518ms/step - loss: 0.7049 - accuracy: 0.7824 - top3_acc: 0.9387 - top5_acc: 0.9739 - val_loss: 1.0561 - val_accuracy: 0.6935 - val_top3_acc: 0.8841 - val_top5_acc: 0.9398
Epoch 15/50
6539/6539 [==============================] - 3390s 518ms/step - loss: 0.6801 - accuracy: 0.7901 - top3_acc: 0.9421 - top5_acc: 0.9755 - val_loss: 1.0673 - val_accuracy: 0.6923 - val_top3_acc: 0.8828 - val_top5_acc: 0.9391
Epoch 16/50
6539/6539 [==============================] - 3635s 556ms/step - loss: 0.6516 - accuracy: 0.7991 - top3_acc: 0.9462 - top5_acc: 0.9772 - val_loss: 1.0747 - val_accuracy: 0.6905 - val_top3_acc: 0.8825 - val_top5_acc: 0.9388
Epoch 17/50
6539/6539 [==============================] - 4070s 622ms/step - loss: 0.6200 - accuracy: 0.8082 - top3_acc: 0.9502 - top5_acc: 0.9805 - val_loss: 1.0859 - val_accuracy: 0.6883 - val_top3_acc: 0.8814 - val_top5_acc: 0.9373
Epoch 18/50
6539/6539 [==============================] - 4092s 626ms/step - loss: 0.5896 - accuracy: 0.8182 - top3_acc: 0.9550 - top5_acc: 0.9822 - val_loss: 1.1029 - val_accuracy: 0.6849 - val_top3_acc: 0.8788 - val_top5_acc: 0.9367
Epoch 19/50
6539/6539 [==============================] - 4087s 625ms/step - loss: 0.5595 - accuracy: 0.8291 - top3_acc: 0.9589 - top5_acc: 0.9834 - val_loss: 1.1147 - val_accuracy: 0.6872 - val_top3_acc: 0.8797 - val_top5_acc: 0.9367
Epoch 20/50
6539/6539 [==============================] - 4015s 614ms/step - loss: 0.5361 - accuracy: 0.8367 - top3_acc: 0.9617 - top5_acc: 0.9852 - val_loss: 1.1325 - val_accuracy: 0.6833 - val_top3_acc: 0.8773 - val_top5_acc: 0.9361
Epoch 21/50
6539/6539 [==============================] - 4093s 626ms/step - loss: 0.5023 - accuracy: 0.8472 - top3_acc: 0.9661 - top5_acc: 0.9870 - val_loss: 1.1484 - val_accuracy: 0.6844 - val_top3_acc: 0.8773 - val_top5_acc: 0.9363
Epoch 22/50
6539/6539 [==============================] - 4094s 626ms/step - loss: 0.4691 - accuracy: 0.8570 - top3_acc: 0.9703 - top5_acc: 0.9892 - val_loss: 1.1730 - val_accuracy: 0.6802 - val_top3_acc: 0.8765 - val_top5_acc: 0.9337
Epoch 23/50
6539/6539 [==============================] - 4091s 626ms/step - loss: 0.4387 - accuracy: 0.8676 - top3_acc: 0.9737 - top5_acc: 0.9904 - val_loss: 1.1986 - val_accuracy: 0.6774 - val_top3_acc: 0.8735 - val_top5_acc: 0.9320
Epoch 24/50
6539/6539 [==============================] - 4033s 617ms/step - loss: 0.4122 - accuracy: 0.8752 - top3_acc: 0.9764 - top5_acc: 0.9915 - val_loss: 1.2157 - val_accuracy: 0.6782 - val_top3_acc: 0.8755 - val_top5_acc: 0.9322
Epoch 25/50
6539/6539 [==============================] - 4105s 628ms/step - loss: 0.3838 - accuracy: 0.8861 - top3_acc: 0.9794 - top5_acc: 0.9927 - val_loss: 1.2419 - val_accuracy: 0.6746 - val_top3_acc: 0.8711 - val_top5_acc: 0.9309
Epoch 26/50
6539/6539 [==============================] - 4098s 627ms/step - loss: 0.3551 - accuracy: 0.8964 - top3_acc: 0.9824 - top5_acc: 0.9938 - val_loss: 1.2719 - val_accuracy: 0.6741 - val_top3_acc: 0.8722 - val_top5_acc: 0.9294
Epoch 27/50
6539/6539 [==============================] - 4101s 627ms/step - loss: 0.3266 - accuracy: 0.9051 - top3_acc: 0.9846 - top5_acc: 0.9950 - val_loss: 1.2877 - val_accuracy: 0.6723 - val_top3_acc: 0.8709 - val_top5_acc: 0.9288
Epoch 28/50
6539/6539 [==============================] - 4007s 613ms/step - loss: 0.3022 - accuracy: 0.9147 - top3_acc: 0.9866 - top5_acc: 0.9955 - val_loss: 1.3156 - val_accuracy: 0.6687 - val_top3_acc: 0.8667 - val_top5_acc: 0.9266
Epoch 29/50
6539/6539 [==============================] - 3410s 521ms/step - loss: 0.2797 - accuracy: 0.9208 - top3_acc: 0.9886 - top5_acc: 0.9962 - val_loss: 1.3409 - val_accuracy: 0.6712 - val_top3_acc: 0.8682 - val_top5_acc: 0.9270
Epoch 30/50
6539/6539 [==============================] - 3398s 520ms/step - loss: 0.2555 - accuracy: 0.9292 - top3_acc: 0.9907 - top5_acc: 0.9969 - val_loss: 1.3703 - val_accuracy: 0.6684 - val_top3_acc: 0.8661 - val_top5_acc: 0.9252
Epoch 31/50
6539/6539 [==============================] - 3401s 520ms/step - loss: 0.2365 - accuracy: 0.9358 - top3_acc: 0.9926 - top5_acc: 0.9975 - val_loss: 1.3945 - val_accuracy: 0.6660 - val_top3_acc: 0.8659 - val_top5_acc: 0.9270
Epoch 32/50
6539/6539 [==============================] - 3387s 518ms/step - loss: 0.2174 - accuracy: 0.9414 - top3_acc: 0.9934 - top5_acc: 0.9979 - val_loss: 1.4218 - val_accuracy: 0.6687 - val_top3_acc: 0.8650 - val_top5_acc: 0.9229
Epoch 33/50
6539/6539 [==============================] - 3397s 519ms/step - loss: 0.1986 - accuracy: 0.9478 - top3_acc: 0.9948 - top5_acc: 0.9983 - val_loss: 1.4513 - val_accuracy: 0.6641 - val_top3_acc: 0.8620 - val_top5_acc: 0.9217
Epoch 34/50
6539/6539 [==============================] - 3394s 519ms/step - loss: 0.1814 - accuracy: 0.9533 - top3_acc: 0.9956 - top5_acc: 0.9986 - val_loss: 1.4752 - val_accuracy: 0.6656 - val_top3_acc: 0.8612 - val_top5_acc: 0.9207
This is my code. I am using the DeepFashion dataset with 209,222 training images, and the SGD optimizer with learning_rate=0.001.
import tensorflow as tf
from tensorflow.keras.layers import Reshape, BatchNormalization, Dense
from tensorflow.keras.models import Model

# Load MobileNet with ImageNet weights and tap the graph before its classifier head
mobile = tf.keras.applications.mobilenet.MobileNet(weights='imagenet')
x = mobile.layers[-6].input

# Insert a multi-head self-attention block (MultiHeadsAttModel is my own helper)
x = Reshape([7 * 7, 1024])(x)
att = MultiHeadsAttModel(l=7 * 7, d=1024, dv=64, dout=1024, nv=16)
x = att([x, x, x])
x = Reshape([7, 7, 1024])(x)
x = BatchNormalization()(x)

# Reuse MobileNet's pooling and classifier layers, then a 50-class softmax head
x = mobile.get_layer('global_average_pooling2d')(x)
x = mobile.get_layer('reshape_1')(x)
x = mobile.get_layer('dropout')(x)
x = mobile.get_layer('conv_preds')(x)
x = mobile.get_layer('reshape_2')(x)
output = Dense(units=50, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=output)
Your validation loss keeps increasing while the training loss shrinks with each epoch. This is a classic case of overfitting.
I am not familiar with the MobileNet model, but it would help if you shared the architecture or a link to its details.
I can blindly suggest adding dropout layers (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) to regularize your model (I am guessing you do not have dropout in it). Honestly, I cannot see how changing the learning rate would help overcome overfitting, so I do not advise that.
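Since I have not seen your architecture, here is only a hypothetical classifier head sketching where a Dropout layer would go, with a made-up 0.5 rate and input shape:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential

# Hypothetical head: Dropout randomly zeroes 50% of the pooled
# activations during training, which discourages co-adaptation.
model = Sequential([
    tf.keras.Input(shape=(7, 7, 1024)),  # assumed feature-map shape
    GlobalAveragePooling2D(),
    Dropout(0.5),                        # active only while training
    Dense(50, activation='softmax'),
])
```

Dropout is a no-op at inference time, so it costs nothing when serving the model.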
Since you did not share any information about the dataset, I am not sure how big or diverse it is. In any case, if the dataset is relatively small, augmenting it gives you a better chance of reducing the overfitting.
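For example, Keras ships an ImageDataGenerator that applies random transformations on the fly; the parameter values below are only illustrative starting points, and the random image stands in for your data:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations, shifts, zooms and flips generate new training
# variants of each image without touching the files on disk.
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# One dummy 224x224 RGB image, batch dimension included
images = np.random.rand(1, 224, 224, 3).astype('float32')
augmented = next(datagen.flow(images, batch_size=1))
print(augmented.shape)  # same shape as the input batch
```

You would normally pass `datagen.flow(...)` (or `flow_from_directory`) straight to `model.fit` so every epoch sees freshly perturbed images.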