I have a custom-trained detectron2 model for instance segmentation that I load and use for inference as follows:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(config_path)
cfg.MODEL.WEIGHTS = weights_path
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
cfg.MODEL.DEVICE = "cpu"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
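For reference, this is roughly how I inspect the baseline result (standard detectron2 Instances fields, nothing custom):

instances = outputs["instances"].to("cpu")
print(len(instances))               # 1 instance expected for my test image
print(instances.pred_boxes)         # Boxes object, one row per instance
print(instances.scores)             # confidence scores
print(instances.pred_masks.shape)   # (N, H, W) boolean masks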
I have followed this tutorial from OpenVINO in order to convert it to an OV model:
import openvino as ov
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model

model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()
# convert_detectron2_model is the helper defined in the OpenVINO tutorial
ov_model = convert_detectron2_model(model, im)
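# (In between, the converted model is serialized to disk and read back below,
#  presumably via something like ov.save_model(ov_model, "../model/model.xml");
#  the exact path is my assumption.)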
core = ov.Core()
ov_model = core.read_model("../model/model.xml")
compiled_model = ov.compile_model(ov_model)
results = compiled_model(sample_input[0]["image"])
However, I don't obtain the expected result from the compiled OV model, i.e. I should normally obtain one instance in the outputs. This is the case when using the classic detectron2 model, but when using the compiled OV model, no instance is detected:
Instances(num_instances=0, image_height=3024, image_width=4032, fields=[pred_boxes: Boxes(tensor([], size=(0, 4))), scores: [], pred_classes: [], pred_masks: tensor([], size=(0, 3024, 4032), dtype=torch.bool)])
Here is the information about the OV model:
<Model: 'Model4'
inputs[
<ConstOutput: names[args] shape[?,?,?] type: u8>
]
outputs[
<ConstOutput: names[tensor, 2193, 2195, 2190, 2188, 2172, 2167] shape[..100,4] type: f32>,
<ConstOutput: names[] shape[..100] type: i64>,
<ConstOutput: names[] shape[?,1,28,28] type: f32>,
<ConstOutput: names[] shape[..100] type: f32>,
<ConstOutput: names[image_size] shape[2] type: i64>
]>
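(That listing comes from just printing the model; the same port information can also be read programmatically through the OpenVINO runtime API, e.g.:)

# Print each output port's names, partial shape and element type
for port in ov_model.outputs:
    print(port.get_names(), port.get_partial_shape(), port.get_element_type())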
UPDATE: As I really need to convert the model to another format in order to use model serving tools, I also tried to convert it to TorchScript. However, here again I get an empty output.
This time I followed the approach described in this issue:
import torch
from detectron2.export import TracingAdapter

def inference_func(model, image):
    inputs = [{"image": image}]
    return model.inference(inputs, do_postprocess=False)[0]

wrapper = TracingAdapter(model, sample_input[0]["image"], inference_func)
wrapper.eval()
traced_script_module = torch.jit.trace(wrapper, (sample_input[0]["image"],))
traced_script_module.save("torchscript.pt")
The conversion then completes after a LOT of warnings, and here is a snippet of the resulting model:
RecursiveScriptModule(
  original_name=TracingAdapter
  (model): RecursiveScriptModule(
    original_name=GeneralizedRCNN
    (backbone): RecursiveScriptModule(
      original_name=FPN
      (fpn_lateral2): RecursiveScriptModule(original_name=Conv2d)
      (fpn_output2): RecursiveScriptModule(original_name=Conv2d)
      (fpn_lateral3): RecursiveScriptModule(original_name=Conv2d)
      (fpn_output3): RecursiveScriptModule(original_name=Conv2d)
      (fpn_lateral4): RecursiveScriptModule(original_name=Conv2d)
      (fpn_output4): RecursiveScriptModule(original_name=Conv2d)
      (fpn_lateral5): RecursiveScriptModule(original_name=Conv2d)
      (fpn_output5): RecursiveScriptModule(original_name=Conv2d)
      (top_block): RecursiveScriptModule(original_name=LastLevelMaxPool)
      (bottom_up): RecursiveScriptModule(
        original_name=ResNet
        (stem): RecursiveScriptModule(
          original_name=BasicStem
          (conv1): RecursiveScriptModule(
            original_name=Conv2d
            (norm): RecursiveScriptModule(original_name=FrozenBatchNorm2d)
          )
  ...
I then run inference:
torchscript_model = torch.jit.load("torchscript.pt")
outputs = torchscript_model(sample_input[0]["image"])
And the resulting outputs are empty:
(tensor([], size=(0, 4), grad_fn=<ViewBackward0>),
tensor([], dtype=torch.int64),
tensor([], size=(0, 1, 28, 28), grad_fn=<SplitWithSizesBackward0>),
tensor([], grad_fn=<IndexBackward0>),
tensor([3024, 4032]))
I have also tried to export the model directly using the detectron2 functions documented here, but again the resulting model's inference has the same issue.
python ../detectron2-main/tools/deploy/export_model.py --config-file ../detectron2-main/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --output torchscript_model --format torchscript --sample-image ../data/test.JPG --export-method tracing MODEL.DEVICE cpu MODEL.WEIGHTS ../model/custom_trained_model.pth
I'll take any ideas!
Follow the "Prepare input data" steps in the tutorial to apply the same preprocessing if you did not already apply it to your defined "im":
import torch
import detectron2.data.transforms as T
from detectron2.data import detection_utils

image_file = "example_image.jpg"

def get_sample_inputs(image_path, cfg):
    # get a sample data
    original_image = detection_utils.read_image(image_path, format=cfg.INPUT.FORMAT)
    # Do same preprocessing as DefaultPredictor
    aug = T.ResizeShortestEdge([cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST)
    height, width = original_image.shape[:2]
    image = aug.get_transform(original_image).apply_image(original_image)
    image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
    inputs = {"image": image, "height": height, "width": width}
    # Sample ready
    sample_inputs = [inputs]
    return sample_inputs
sample_input = get_sample_inputs(image_file, cfg)
ov_model = convert_detectron2_model(model, sample_input)
core = ov.Core()
ov_model = core.read_model("../model/model.xml")
compiled_model = ov.compile_model(ov_model)
results = compiled_model(sample_input[0]["image"])
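If you then want to compare the raw OpenVINO outputs with the original detectron2 prediction, you can wrap them back into an Instances object and rescale to the original resolution the same way DefaultPredictor does. A minimal sketch, assuming the output order shown in your model info above (boxes, classes, masks, scores, image_size):

import torch
from detectron2.modeling.postprocessing import detector_postprocess
from detectron2.structures import Boxes, Instances

boxes, classes, masks, scores, image_size = [torch.as_tensor(results[i]) for i in range(5)]

# Rebuild a detectron2 Instances object from the raw OpenVINO tensors
instances = Instances(tuple(image_size.tolist()))
instances.pred_boxes = Boxes(boxes)
instances.pred_classes = classes
instances.pred_masks = masks          # (N, 1, 28, 28) ROI masks
instances.scores = scores

# Rescale boxes/masks to the original image size, as DefaultPredictor does
instances = detector_postprocess(instances, sample_input[0]["height"], sample_input[0]["width"])
print(instances)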