I need to use a TensorFlow detection model on iOS. The tflite model file available in [1] is in uint8 format, which doesn't work well with Core ML, so I decided to download a full model and convert it to tflite myself.
All SSD models in the TF model zoo include a non-maximum suppression (NMS) post-processing step, and since NMS doesn't work well with tflite, I removed the post-process function from the export_tflite_graph_lib_tf2 code in order to create a tflite model without NMS.
So now I have a working detection model (SSD MobileNetV2, to be exact) which outputs (box_encodings, class_predictions, anchors) instead of (boxes, classes, scores, num_detections).
How do I create a bbox out of each box_encoding and its anchor?
I found this formula in [2]:
ycenter = y / y_scale * anchor.h + anchor.y;
xcenter = x / x_scale * anchor.w + anchor.x;
half_h = 0.5 * exp(h / h_scale) * anchor.h;
half_w = 0.5 * exp(w / w_scale) * anchor.w;
ymin = ycenter - half_h;
ymax = ycenter + half_h;
xmin = xcenter - half_w;
xmax = xcenter + half_w;
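The formula above can be vectorized over all anchors at once. Here is a minimal NumPy sketch; the default scale values (10, 10, 5, 5) are an assumption based on the box coder commonly used by the TF Object Detection API, so check your own training config if your model differs:

```python
import numpy as np

def decode_boxes(box_encodings, anchors, scales=(10.0, 10.0, 5.0, 5.0)):
    """Decode SSD box encodings (ty, tx, th, tw) against anchors
    (ycenter, xcenter, h, w) into (ymin, xmin, ymax, xmax) boxes.

    Both inputs have shape (num_anchors, 4). The default scales are
    the faster_rcnn_box_coder values (10, 10, 5, 5) typically used
    by the TF Object Detection API; verify against your model's
    pipeline.config.
    """
    y_scale, x_scale, h_scale, w_scale = scales
    ty, tx, th, tw = np.moveaxis(box_encodings, -1, 0)
    ya, xa, ha, wa = np.moveaxis(anchors, -1, 0)

    # Shift the anchor center by the (rescaled) encoded offsets.
    ycenter = ty / y_scale * ha + ya
    xcenter = tx / x_scale * wa + xa

    # Scale the anchor size by the (rescaled) encoded log-size deltas.
    half_h = 0.5 * np.exp(th / h_scale) * ha
    half_w = 0.5 * np.exp(tw / w_scale) * wa

    return np.stack(
        [ycenter - half_h, xcenter - half_w,
         ycenter + half_h, xcenter + half_w], axis=-1)
```

A useful sanity check: an all-zero encoding must decode to the anchor box itself (center unchanged, `exp(0) = 1`), independent of the scale values.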
But I'm not sure what y_scale, x_scale, etc. are. Is that the image size (320×320)? If so, the numbers don't match.
For example, a box encoding of (0.425, 0.225, 0.399, 0.200) and an anchor of (0.958, 1.315, 1.437, 0.938) produce a bbox of (0.496, 0.953, 2.641, 2.099). These numbers don't make sense to me; I was expecting all four values (the corners of the bbox) to be in the range [0, 1]. Can anyone clarify this? Thanks.
Discovered the answer: the scale values are not the image size; they are constants baked into the box coder the model was trained with. For SSD models from the TF Object Detection API these come from the faster_rcnn_box_coder section of the pipeline config, typically y_scale = 10.0, x_scale = 10.0, height_scale = 5.0 and width_scale = 5.0. With those values plugged into the formula above, the boxes are now created correctly.
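For reference, the relevant snippet from the model's pipeline.config looks like this (protobuf text format; the exact values ship with each model in the zoo, so read them from your own config rather than hard-coding):

```
box_coder {
  faster_rcnn_box_coder {
    y_scale: 10.0
    x_scale: 10.0
    height_scale: 5.0
    width_scale: 5.0
  }
}
```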