
GluonCV inference with finetuned model - “Please make sure source and target networks have the same prefix” error


I used GluonCV to finetune an object detection model in order to recognize some custom classes, mostly following the related tutorial.

I tried using both “ssd_512_resnet50_v1_coco” and “ssd_512_mobilenet1.0_coco” as base models, and the training process ended successfully (the accuracy on the validation dataset is reasonably high).

The problem appears when I try to run inference with the newly trained model, for example with:

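import mxnet as mx
import gluoncv as gcv

ctx = mx.cpu()  # assumption: ctx was not shown originally; use mx.gpu(0) for a GPU
classes = ["CML_mug", "person"]
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                              classes=classes,
                              pretrained_base=False,
                              ctx=ctx)
net.load_parameters("saved_weights/-0070.params", ctx=ctx)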

but I get the error:

AssertionError: Parameter 'mobilenet0_conv0_weight' is missing in file: saved_weights/CML_mobilenet_00/-0000.params, which contains parameters: 'ssd0_ssd0_mobilenet0_conv0_weight', 'ssd0_ssd0_mobilenet0_batchnorm0_gamma', 'ssd0_ssd0_mobilenet0_batchnorm0_beta', ..., 'ssd0_ssd0_ssdanchorgenerator2_anchor_2', 'ssd0_ssd0_ssdanchorgenerator3_anchor_3', 'ssd0_ssd0_ssdanchorgenerator4_anchor_4', 'ssd0_ssd0_ssdanchorgenerator5_anchor_5'. Please make sure source and target networks have the same prefix.

So it seems the network parameters are named differently in the .params file and in the model I’m using for inference. Specifically, in the .params file the parameter names are prefixed with the string “ssd0_ssd0_”, which leads to the error when invoking net.load_parameters. I have done this whole procedure a few times in the past without problems; did anything change? I’m running on Ubuntu 18.04, with mxnet-mkl (1.6.0) and gluoncv (0.7.0).

I tried loading the .params file with:

from mxnet import nd
model = nd.load("saved_weights/-0070.params")

I wanted to modify the loaded dictionary and remove the “ssd0_ssd0_” string that is causing the problem. I’m trying to navigate the dictionary, but among the keys I only found:

ssd0_resnetv10_conv0_weight

which is slightly different from what the error message indicates.

Anyway, fixing the issue this way would be a little cumbersome; I’d prefer a more direct way.
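
For reference, the key-renaming idea described above can be sketched in plain Python. This is only an illustration: the helper name strip_prefix is hypothetical, the second example key is made up, and the sketch does not check which names the target network actually expects.

```python
def strip_prefix(params, prefix):
    """Return a new dict whose keys have one leading `prefix` removed.

    Keys that do not start with `prefix` are kept unchanged.
    """
    renamed = {}
    for name, value in params.items():
        new_name = name[len(prefix):] if name.startswith(prefix) else name
        if new_name in renamed:
            # Two different keys collapsed to the same name; refuse to guess.
            raise ValueError("duplicate key after renaming: %s" % new_name)
        renamed[new_name] = value
    return renamed


# First key is taken from the question; the second is a made-up placeholder.
example = {"ssd0_resnetv10_conv0_weight": 1, "ssd0_example_bias": 2}
print(sorted(strip_prefix(example, "ssd0_")))
```

With MXNet, the input dictionary would come from nd.load(...) and the renamed one could be written back with nd.save(...). Whether the rewritten file then loads depends on the names the inference network expects, so this remains the cumbersome route.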


Solution

  • OK, fixed it. During training I was saving the parameters with:

    net.export(param_file)

    and, as I said, loading them during inference with:

    net.load_parameters(param_file)

    That combination doesn’t work: export writes a symbol file plus a .params file that are meant to be loaded together through gluon.SymbolBlock.imports, not through load_parameters. Loading does work if, instead of export, I save with:

    net.save_parameters(param_file)
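
The save_parameters / load_parameters pairing can be sketched as a round trip. This is a minimal sketch under stated assumptions: a small nn.Dense layer stands in for the SSD network, and the function name roundtrip_demo and the file name demo.params are hypothetical. MXNet is imported inside the function, so calling it requires MXNet to be installed.

```python
import os


def roundtrip_demo(workdir):
    """Save a network with save_parameters and reload it with load_parameters.

    Returns True if the reloaded weights equal the saved ones.
    """
    import mxnet as mx
    from mxnet.gluon import nn

    # Stand-in for the trained detector (assumption: any Gluon Block works here).
    net = nn.Dense(2, in_units=3)
    net.initialize()

    path = os.path.join(workdir, "demo.params")  # hypothetical file name
    net.save_parameters(path)

    # A freshly constructed network with the same architecture.
    fresh = nn.Dense(2, in_units=3)
    fresh.load_parameters(path, ctx=mx.cpu())

    same = (net.weight.data() == fresh.weight.data()).asnumpy().all()
    return bool(same)
```

Calling roundtrip_demo(tempfile.mkdtemp()) should return True. Note that the two Dense layers are constructed separately, so their auto-generated name prefixes differ, yet loading still succeeds because save_parameters stores names relative to the block's structure.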