deep-learningcomputer-visionpytorchobject-recognitiondetectron

How to save a model using DefaultTrainer in Detectron2?


How can I save a checkpoint in Detectron2, using a DefaultTrainer? This is my setup:

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))

cfg.DATASETS.TRAIN = (DatasetLabels.TRAIN,)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 273  # Number of output classes

cfg.OUTPUT_DIR = "outputs"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025#0.00025  # Learning Rate
cfg.SOLVER.MAX_ITER = 10000  # 20000 MAx Iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # Batch Size

trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()


# Save the model
from detectron2.checkpoint import DetectionCheckpointer, Checkpointer
checkpointer = DetectionCheckpointer(trainer, save_dir=cfg.OUTPUT_DIR)
checkpointer.save("mymodel_0")  

I get the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-94-c1116902655a> in <module>()

      4 checkpointer = DetectionCheckpointer(trainer, save_dir=cfg.OUTPUT_DIR)
----> 5 checkpointer.save("mymodel_0")

/usr/local/lib/python3.6/dist-packages/fvcore/common/checkpoint.py in save(self, name, **kwargs)
    102 
    103         data = {}
--> 104         data["model"] = self.model.state_dict()
    105         for key, obj in self.checkpointables.items():
    106             data[key] = obj.state_dict()

AttributeError: 'DefaultTrainer' object has no attribute 'state_dict'

Docs: https://detectron2.readthedocs.io/en/latest/modules/checkpoint.html


Solution

  • checkpointer = DetectionCheckpointer(trainer.model, save_dir=cfg.OUTPUT_DIR)
    

    is the way to go.

    Alternatively:

    torch.save(trainer.model.state_dict(), os.path.join(cfg.OUTPUT_DIR, "mymodel.pth"))