Tags: tensorflow, object-detection, bounding-box, post-processing, single-shot-detector

Post-processing of TF2 SSD detection models


I need to use a TensorFlow detection model on iOS. The tflite model file available in [1] is in uint8 format, which doesn't work well with CoreML, so I decided to download a full model and convert it to tflite myself.

All SSD models in the TF model zoo include a non-maximum suppression (NMS) step, and since NMS doesn't work well with tflite, I removed the post-process function from the export_tflite_graph_lib_tf2 code in order to create a tflite model without NMS.

So now I have a working detection model (SSD MobileNetV2, to be exact) which outputs (box_encodings, class_predictions, anchors) instead of (boxes, classes, scores, num_detections).

How do I create a bbox from each box_encoding and its anchor?

I found this formula in [2]:

ycenter = y / y_scale * anchor.h + anchor.y;  
xcenter = x / x_scale * anchor.w + anchor.x;  
half_h = 0.5 * exp(h / h_scale) * anchor.h;  
half_w = 0.5 * exp(w / w_scale) * anchor.w;  
ymin = ycenter - half_h;  
ymax = ycenter + half_h;  
xmin = xcenter - half_w;  
xmax = xcenter + half_w;

But I'm not sure what y_scale, x_scale, etc. are. Is that the image size (320*320)? If so, the numbers don't match.

For example, a box encoding of (0.425, 0.225, 0.399, 0.200) and an anchor of (0.958, 1.315, 1.437, 0.938) create a bbox of (0.496, 0.953, 2.641, 2.099). These numbers don't make sense to me: I was expecting all four values (the corners of the bbox) to be in the range [0, 1]. Can anyone clarify this? Thanks


Solution

  • Found the answer:

    1. My box_encodings and anchors had been mixed up.
    2. The scale factors are 10 for x and y, and 5 for w and h (not 320, the image size).

    With both fixes, the boxes are now decoded correctly.
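For anyone who wants to see the decoding end to end, here is a minimal sketch of the formula from the question using the scale factors above. The names `decode_box` and `decode_boxes` are hypothetical, and the sketch assumes each anchor is given as (ycenter, xcenter, h, w) in normalized coordinates and each encoding as (y, x, h, w). If your exported model emits anchors as corners (ymin, xmin, ymax, xmax), convert them to center/size form first.

```python
import math

# Scale factors from the answer: 10 for y and x, 5 for h and w.
Y_SCALE, X_SCALE, H_SCALE, W_SCALE = 10.0, 10.0, 5.0, 5.0


def decode_box(encoding, anchor):
    """Decode one box encoding against its anchor.

    Assumption: encoding is (y, x, h, w) and anchor is
    (ycenter, xcenter, h, w), both relative to a normalized image.
    Returns (ymin, xmin, ymax, xmax).
    """
    ty, tx, th, tw = encoding
    ay, ax, ah, aw = anchor

    # Center offsets are scaled by the anchor size, then shifted by its center.
    ycenter = ty / Y_SCALE * ah + ay
    xcenter = tx / X_SCALE * aw + ax

    # Height and width are predicted in log space relative to the anchor.
    half_h = 0.5 * math.exp(th / H_SCALE) * ah
    half_w = 0.5 * math.exp(tw / W_SCALE) * aw

    return (ycenter - half_h, xcenter - half_w,
            ycenter + half_h, xcenter + half_w)


def decode_boxes(encodings, anchors):
    """Decode a list of encodings, pairing each with the anchor at the same index."""
    return [decode_box(e, a) for e, a in zip(encodings, anchors)]
```

A quick sanity check: an all-zero encoding should return the anchor's own box, so `decode_box((0, 0, 0, 0), (0.5, 0.5, 0.2, 0.4))` gives approximately `(0.4, 0.3, 0.6, 0.7)`. If the decoded values fall far outside [0, 1], the encodings and anchors are likely swapped or misaligned, as in point 1 above.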