I'm trying to create a custom object detection model in TFLite format so that I can use it in a Flutter application with the google_mlkit_object_detection package.
I created the first few models with YOLOv8 and converted them to TFLite. The annotations were made with Roboflow, and I used the Google Colab notebook they provide. The metadata of the converted TFLite model looks like the following image:
With this model I got the error:
Input tensor has type kTfLiteFloat32: it requires specifying NormalizationOptions metadata to preprocess input images.
As the error suggests, I tried to change the metadata and add NormalizationOptions, but failed to do so. My second attempt was to train a model with the official TensorFlow Google Colab notebook for TensorFlow Lite Model Maker, which generated a model with the following metadata:
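For reference, NormalizationOptions is a piece of TFLite metadata that tells the runtime to compute (pixel - mean) / std on each input value. Below is a minimal sketch of writing it with the metadata writers from the tflite-support pip package, assuming a classification-style model. The file paths and the mean/std of 127.5 are placeholders, not values taken from the models above; the right values depend on how the model was trained.

```python
def normalize_pixel(value, mean, std):
    """The preprocessing NormalizationOptions describes: (value - mean) / std."""
    return (value - mean) / std


def add_classifier_metadata(model_path, label_path, out_path, mean=127.5, std=127.5):
    """Write image-classifier metadata, including NormalizationOptions, to a new .tflite file.

    Assumes `pip install tflite-support`. All paths and the default mean/std
    are placeholders chosen for illustration.
    """
    # Imported lazily so normalize_pixel stays usable without the package.
    from tflite_support.metadata_writers import image_classifier, writer_utils

    writer = image_classifier.MetadataWriter.create_for_inference(
        writer_utils.load_file(model_path),  # model bytes
        [mean],                              # input normalization mean
        [std],                               # input normalization std
        [label_path],                        # one label per line
    )
    writer_utils.save_file(writer.populate(), out_path)
```

With mean and std both set to 127.5, a pixel value of 255 maps to 1.0 and a value of 0 maps to -1.0. A model trained on inputs in the [0, 1] range would more likely need mean 0 and std 255.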
For this model the error was:
Unexpected number of dimensions for output index 1: got 3D, expected either 2D (BxN with B=1) or 4D
So I checked the model from the example app of the package I am using, google_mlkit_object_detection, and its metadata looks like this:
So my question is: how can I alter one of the models I already trained, whichever is easier, so that its input and output look like this? Do I have to change the model's architecture, or just the metadata? For the second model, trained with the official TensorFlow notebook, it seems that all I have to do is produce the correct output shape [1, N], but again that might require changing the architecture.
The Custom Models page on the Google documentation says this:
Note: ML Kit only supports custom image classification models. Although AutoML Vision allows training of object detection models, these cannot be used with ML Kit.
So there it is: you can use custom models with the object detector, but only if they are image classification models. I thought this must be impossible, since an object detection model outputs bounding boxes while a classification model outputs class scores. However, when I tried YOLOv8 models, the standard object detection model did not work, but the classification model with the [1, 1000] output shape does work with the Google ML Kit example application, and you can still extract bounding boxes from it.
I'm not 100% sure how this works, but I suspect there is a default object detector bundled with the package that finds where objects could be, and the only part you can replace is the classification model that runs on top of it.
Anyway, the simple answer is: use a classification model with a [1, N] or [1, 1, 1, N] output, where N is the number of classes. If your model has a different architecture, you need to change its output to this format; otherwise it is not expected to work.
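To check a model against this rule before trying it in the app, something like the following could help. It is only a sketch: the helper names are made up for illustration, and output_shapes assumes TensorFlow is installed so that tf.lite.Interpreter can load the file.

```python
def is_classifier_output_shape(shape):
    """Return True if shape is [1, N] or [1, 1, 1, N] with N >= 1."""
    dims = [int(d) for d in shape]
    if len(dims) not in (2, 4) or dims[-1] < 1:
        return False
    # Every dimension except the last (the class count) must be 1.
    return all(d == 1 for d in dims[:-1])


def output_shapes(model_path):
    """Return the output tensor shapes of a .tflite file as lists of ints.

    Assumes TensorFlow is installed; model_path is a placeholder.
    """
    import tensorflow as tf  # lazy import, only needed for this helper

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return [[int(d) for d in detail["shape"]]
            for detail in interpreter.get_output_details()]
```

For example, is_classifier_output_shape([1, 1000]) and is_classifier_output_shape([1, 1, 1, 5]) return True, while a 3D shape such as [1, 25, 4] (a made-up example of a detection-style output) returns False.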