I'm experiencing a significant performance discrepancy between my YOLOv8n object detection model in its original PyTorch format (.pt) and the same model after converting it to CoreML format for deployment on iOS devices. The original model, trained on a custom dataset, detects objects successfully in a given image. However, the converted CoreML model fails to detect any objects in the same image. I also tested some other images: although the converted model does detect objects on iOS and macOS devices, it does not perform the same as the original .pt model.
Details:
Original Model: YOLOv8n, trained on a custom dataset using Ultralytics' implementation, in Google Colab on an A100.
Conversion Tool:
!pip install coremltools

from ultralytics import YOLO

model_path = f"{HOME}/runs/detect/train/weights/best.pt"
model = YOLO(model_path)
model.export(format='coreml', nms=True)
Questions:
Are there specific layers or operations in YOLOv8n that are known to have compatibility issues with CoreML?
What are the recommended steps for debugging such a discrepancy in object detection performance between the original and converted models?
Any insights or suggestions for further troubleshooting?
After digging further into the documentation and doing some R&D on export from Ultralytics, this is what I found:
The images in the dataset used for training the model were 1280×720 (W×H). During training I saw this warning from Ultralytics:
WARNING ⚠️ updating to 'imgsz=1280'. 'train' and 'val' imgsz must be an integer, while 'predict' and 'export' imgsz may be a [h, w] list or an integer, i.e. 'yolo export imgsz=640,480' or 'yolo export imgsz=640'
To export for CoreML it's better to define the exact image size matching the aspect ratio you are going to use for predictions. So I ran this in a Jupyter notebook locally on my MacBook Pro M1:
from ultralytics import YOLO

model_path = "{Path to your .pt model}"
model = YOLO(model_path)
# Note: for 'export', imgsz is given as [height, width]
model.export(format='coreml', nms=True, imgsz=[720, 1280])
And now the predictions are faster and more precise, even on iOS devices (tested on an iPhone 14 Pro Max).
The source images are in 4K resolution, so when developing object detection applications in Swift I used these settings to create a correctly sized CVPixelBuffer for passing to the VNImageRequestHandler (a usage sketch follows the settings below).
let outputSettings: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
    kCVPixelBufferWidthKey as String: 1280,
    kCVPixelBufferHeightKey as String: 720
]
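For completeness, here is a minimal sketch of how such a pixel buffer can be fed to Vision with the exported model. This is an illustration, not the exact code from my app: the yolov8n class name is a placeholder for whatever class Xcode generates from your .mlpackage, and pixelBuffer is assumed to have been produced with the outputSettings above.

import Vision
import CoreML

// Minimal sketch: run the exported CoreML model on a CVPixelBuffer via Vision.
// `yolov8n` is a placeholder for the class Xcode generates from the .mlpackage.
func detectObjects(in pixelBuffer: CVPixelBuffer) {
    guard let mlModel = try? yolov8n(configuration: MLModelConfiguration()).model,
          let visionModel = try? VNCoreMLModel(for: mlModel) else { return }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // With nms=True at export time, results arrive as ready-to-use detections.
        guard let observations = request.results as? [VNRecognizedObjectObservation] else { return }
        for observation in observations {
            let label = observation.labels.first?.identifier ?? "unknown"
            print("\(label) \(observation.confidence) \(observation.boundingBox)")
        }
    }
    // Keep the whole frame; Vision scales it to the model's 1280x720 input.
    request.imageCropAndScaleOption = .scaleFit

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request])
}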