I have issues fine-tuning the pretrained model deeplabv3_mnv2_pascal_train_aug in Google Colab.
When I visualize the results with vis.py, the predicted masks appear shifted toward the left or upper side of the image whenever the image is not square, that is, when its height or width is larger than the other dimension.
The dataset used for fine-tuning is Look Into Person (LIP). The steps I followed are:
_LIP_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 30462,
        'train_aug': 10582,
        'trainval': 40462,
        'val': 10000,
    },
    num_classes=19,
    ignore_label=255,
)
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'cihp': _CIHP_INFORMATION,
    'lip': _LIP_INFORMATION,
}
!python models/research/deeplab/datasets/build_voc2012_data.py \
--image_folder="/content/drive/MyDrive/TFM/lip_trainval_images/TrainVal_images/train_images" \
--semantic_segmentation_folder="/content/drive/MyDrive/TFM/lip_trainval_segmentations/TrainVal_parsing_annotations/train_segmentations" \
--list_folder="/content/drive/MyDrive/TFM/lip_trainval_images" \
--image_format="jpg" \
--output_dir="train_lip_tfrecord/"
!python models/research/deeplab/datasets/build_voc2012_data.py \
--image_folder="/content/drive/MyDrive/TFM/lip_trainval_images/TrainVal_images/val_images" \
--semantic_segmentation_folder="/content/drive/MyDrive/TFM/lip_trainval_segmentations/TrainVal_parsing_annotations/val_segmentations" \
--list_folder="/content/drive/MyDrive/TFM/lip_trainval_images" \
--image_format="jpg" \
--output_dir="val_lip_tfrecord/"
!python deeplab/train.py --logtostderr \
--training_number_of_steps=40000 \
--train_split="train" \
--model_variant="mobilenet_v2" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_batch_size=1 \
--dataset="lip" \
--train_logdir="/content/drive/MyDrive/TFM/checkpoint_lip_mobilenet" \
--dataset_dir="/content/drive/MyDrive/TFM/trainval_lip_tfrecord/" \
--fine_tune_batch_norm=false \
--initialize_last_layer=false \
--last_layers_contain_logits_only=false
!python deeplab/vis.py --logtostderr \
--vis_split="val" \
--model_variant="mobilenet_v2" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--dataset="lip" \
--checkpoint_dir="/content/drive/MyDrive/TFM/checkpoint_lip_mobilenet" \
--vis_logdir="/content/drive/My Drive/TFM/eval_results_lip" \
--dataset_dir="/content/drive/My Drive/TFM/trainval_lip_tfrecord" \
--max_number_of_iterations=1 \
--eval_interval_secs=0
Following these steps, an example of the problem I'm facing is:
I don't know whether I'm missing something important or whether the model simply needs more training. However, more training does not seem to be the solution, since the loss currently oscillates between roughly 0.5 and 1.5.
Thanks in advance.
After some time, I found a solution to this problem. An important thing to know is that, by default, train_crop_size and vis_crop_size are both 513x513.
The issue was that vis_crop_size was smaller than the input images. vis_crop_size needs to be greater than the largest dimension of the biggest image in the dataset.
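As a rough sketch of how one might pick a suitable value: the snippet below takes a list of (height, width) pairs and returns a crop size that is strictly greater than the largest height and width. The function names and the sample sizes are hypothetical, and rounding up to the form k * output_stride + 1 is an assumption based on the convention that, as far as I know, the DeepLab FAQ recommends for crop sizes. Any value larger than the biggest image should avoid the cropping described above.

```python
import math


def crop_dim(max_dim, output_stride=16):
    """Return the smallest k * output_stride + 1 that is strictly greater than max_dim."""
    k = math.ceil(max_dim / output_stride)
    return k * output_stride + 1


def crop_size_for(sizes, output_stride=16):
    """Given (height, width) pairs, return a (crop_height, crop_width) covering all of them."""
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    return crop_dim(max_h, output_stride), crop_dim(max_w, output_stride)


# Hypothetical image sizes, not taken from the LIP dataset.
example_sizes = [(500, 375), (400, 640)]
print(crop_size_for(example_sizes))
```

The real image sizes would have to be collected from the dataset first, for example by opening each file with an image library and reading its dimensions.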
If you want to use export_model.py, apply the same logic as for vis.py, so that your masks are not cropped to 513 by default.
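How the crop size is passed on the command line seems to depend on the version of the DeepLab code: in some checkouts vis_crop_size is a repeated integer flag (given twice, height then width, like atrous_rates in the commands above), in others it is a single comma-separated list. The flag name crop_size for export_model.py is my understanding of that script, so check the flag definitions at the top of both files before relying on it. The helper below is a hypothetical convenience for building either form, and 529 and 705 are made-up example values.

```python
def crop_flags(flag_name, height, width, style="repeated"):
    """Build command-line arguments for a [height, width] crop-size flag.

    style="repeated": the flag is given twice (height first, then width).
    style="comma":    the flag is given once as "height,width".
    """
    if style == "repeated":
        return [f"--{flag_name}={height}", f"--{flag_name}={width}"]
    if style == "comma":
        return [f"--{flag_name}={height},{width}"]
    raise ValueError(f"unknown style: {style!r}")


# Hypothetical crop size; it should exceed the largest image in the dataset.
print(" ".join(crop_flags("vis_crop_size", 529, 705)))
print(" ".join(crop_flags("vis_crop_size", 529, 705, style="comma")))
print(" ".join(crop_flags("crop_size", 529, 705)))
```

The printed arguments can then be appended to the vis.py or export_model.py command, keeping height before width.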