tensorflow-model-garden

How to increase num_classes in ssd_mobilenet_v1 tensorflow


I am using ssd_mobilenet_v1_coco.config.

I originally trained with 13 classes; after planning to add more, I changed the value of num_classes to 20 and ran:

python model_main.py --alsologtostderr --model_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

I keep launching training with this command, but I get the error below. What should I do to increase num_classes? Should I just set num_classes=100 from the beginning and start over? I need help.

model {
  ssd {
    num_classes: 20
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }


  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py", line 1326, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [126] rhs shape= [84]
         [[node save/Assign_56 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Solution

  • I recently had a similar issue. To solve my problem, I had to do the following:

    python research/object_detection/model_main.py \
      --model_dir=./model/finetune0 \
      --pipeline_config_path=./model/pipeline.config \
      --alsologtostderr
    

    My file structure:

    + models
    -+ model
    --+ checkpoint
    --+ model.ckpt.index
    --+ model.ckpt.meta
    --+ model.ckpt.data-00000-of-00001
    --+ pipeline.config
    --- finetune0 (will be autogenerated)
    
    -- data (tfrecord dataset)
    -- annotations (labels)
    ...
    

    Context

    It looks like when there is already a checkpoint in model_dir, the script tries to resume training from that saved model, but the new configuration in pipeline.config no longer matches it (num_classes differs).
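The numbers in the error line up with exactly that: an SSD class-prediction layer has depth anchors_per_location * (num_classes + 1), the + 1 being the implicit background class. A minimal sketch of the arithmetic, assuming 6 anchors per location for the failing layer (the anchor count is an assumption here; it varies by feature map):

```python
# Depth of an SSD class-prediction conv layer: one logit per anchor for
# each class, plus the implicit background class.
def class_predictor_depth(num_classes, anchors_per_location=6):
    return anchors_per_location * (num_classes + 1)

print(class_predictor_depth(20))  # 126 -> "lhs shape= [126]" (new config, 20 classes)
print(class_predictor_depth(13))  # 84  -> "rhs shape= [84]"  (old checkpoint, 13 classes)
```

So the checkpoint sitting in model_dir was built for 13 classes, while the freshly built graph expects 20.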

    If you instead provide this checkpoint via fine_tune_checkpoint in pipeline.config and point --model_dir at a fresh folder, the script will build the model from the checkpoint variables, adapt it to the new config, and then start training.
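For example, the train_config section of pipeline.config might be edited like this (the checkpoint path is a placeholder matching the file structure above; the remaining fields are elided):

```
train_config {
  # Warm-start from the old checkpoint instead of resuming it in place.
  fine_tune_checkpoint: "./model/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  ...
}
```

Then run the training command with --model_dir pointing at a new, empty directory such as ./model/finetune0, as shown above.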