python · tensorflow · deep-learning · dataset · training-data

Low validation accuracy during data training using lfw datasets


So I was training on an image dataset with 400 labels and around 900 images in total, split into 80% training and 20% validation. I'm following the TensorFlow transfer-learning guide here (https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub).

This is my dataset (https://drive.google.com/drive/folders/1yIEig6K3g3Y2gFudkE0ca64UzkQtsORA?usp=drive_link).

The dataset was preprocessed with MTCNN.

Should I change my dataset, or is there something else I should do?


Solution

  • In my experience, training with 400 labels but only ~900 images, which means only 2-3 images per label (I see that several labels have just one image in the train or test split), makes it quite challenging for a model to learn effectively and generalize.

    Even if you somehow find the perfect fine-tuning settings, the model is still very likely to overfit, which is a sign of a bad model: it merely memorizes the few training images instead of learning the important features.

    My recommendations are:

    1. Collect more data. In AI, whatever the task, having more (good-quality) data is almost always preferable.
    2. Reduce the number of labels. I don't know how many images per class the pre-trained model was trained on, but keeping a similar number of images per label gives you a more balanced dataset composition.
    3. If gathering more data is a problem, apply augmentation to the images. There are several augmentation options, such as saturation changes, rotation, color-temperature shifts, etc.
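    For recommendation 3, here is a minimal sketch of an augmentation pipeline using Keras preprocessing layers. The image size (160×160), the specific layers, and the factor values are illustrative assumptions, not tuned settings, and the dummy dataset stands in for your real one:

    ```python
    import numpy as np
    import tensorflow as tf

    # Hypothetical augmentation pipeline; layers and factors are illustrative.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),   # rotate up to ±10% of a full turn
        tf.keras.layers.RandomContrast(0.2),   # contrast/saturation-style change
        tf.keras.layers.RandomZoom(0.1),
    ])

    # Dummy stand-in for a real (image, label) training dataset.
    images = np.random.rand(8, 160, 160, 3).astype("float32")
    labels = np.zeros(8, dtype="int32")
    train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

    # Apply augmentation to the training split only (never to validation),
    # so the model sees a slightly different variant of each face every epoch.
    train_ds = train_ds.map(
        lambda x, y: (augment(x, training=True), y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    ```

    Note that the augmented images keep the same shape and labels; augmentation only multiplies the apparent variety of your few images per label, it does not replace collecting real data.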