python tensorflow object-detection single-shot-detector

Reduce Training steps for SSD-300


I am new to deep learning and am trying to train an SSD-300 (Single Shot Detector) model, but training is taking too long. For example, even though I specified 50 epochs, it has been training for 108370+ global steps. I am using the default train_ssd_network.py file from the official GitHub repo: https://github.com/balancap/SSD-Tensorflow

The command I ran for training:

!python train_ssd_network.py --dataset_name=pascalvoc_2007 epochs= 50 --dataset_split_name=train --model_name=ssd_300_vgg --save_summaries_secs=60 --save_interval_secs=600 --weight_decay=0.0005 --optimizer=adam --learning_rate=0.001 --batch_size=6 --gpu_memory_fraction=0.9 --checkpoint_exclude_scopes =ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box

How can I change the number of training steps, and what is the ideal number of training steps?

The train_ssd_network.py script does not provide a specific number for global_steps.


Solution

  • It looks like the script you are using supports a max_number_of_steps flag, which you can add to your command line as --max_number_of_steps=10000. The script relies on TensorFlow flags to take input from the command line. You can see all the supported flags, with short descriptions, in the flag definitions in train_ssd_network.py.

    I see in another answer that you found the relevant flag and changed the second argument, None, to another value. This second argument is the default value. Changing it should work, but is not necessary, since you could also pass that value in through the command line.

    tf.app.flags.DEFINE_integer('max_number_of_steps', None,
                                'The maximum number of training steps.')
    

    The ideal number of training steps depends on your data and application. A common technique for deciding whether you need to train for longer is to track the model's loss during training and stop when it is no longer decreasing substantially.
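
To make the two ideas above concrete, here is a minimal sketch of how you might pick a value for --max_number_of_steps. The helper names steps_for_epochs and loss_has_plateaued are hypothetical and not part of the SSD-Tensorflow repo. The dataset size of 5011 images is an assumption for the Pascal VOC 2007 training split, so check the size of your own split. The window and threshold defaults are arbitrary starting points.

```python
import math


def steps_for_epochs(num_examples, batch_size, epochs):
    """Number of training steps needed to pass over the dataset `epochs` times."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def loss_has_plateaued(losses, window=100, min_rel_improvement=0.01):
    """Return True when the mean loss of the last `window` steps has improved
    by no more than `min_rel_improvement` (a fraction) relative to the mean
    loss of the `window` steps before that."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return previous - recent <= min_rel_improvement * abs(previous)


if __name__ == "__main__":
    # Assumed values: 5011 training images, batch size 6 (from the command
    # in the question), and 50 epochs.
    print(steps_for_epochs(num_examples=5011, batch_size=6, epochs=50))  # 41800
```

Under these assumptions, 50 epochs correspond to about 41800 steps, so you could pass --max_number_of_steps=41800. The plateau check is meant to be applied to loss values you log yourself, for example the ones the training script prints or writes to its summaries.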