I've just started learning about AI-based object detection. I learned that I needed a dataset, so I took a few screenshots. I labeled the objects with Roboflow and trained a model with YOLOv8. But no matter what I did, I could not correctly detect the fish (small, moving, shadowy) objects. I think the Preprocessing and Augmentation sections in Roboflow are very important. I applied tiling, but I didn't understand what I should do on the code side. I'm so confused :D I need your help.
roboflow dataset:
https://universe.roboflow.com/test-uifst/test55/dataset/3
The model was trained on : https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/train-yolov8-object-detection-on-custom-dataset.ipynb#
Python:
import cv2
from ultralytics import YOLO
import supervision as sv
import mss
import numpy as np
import pyautogui
import time

model = YOLO('best.pt')
sct = mss.mss()
monitor = sct.monitors[1]

while True:
    screenshot = sct.grab(monitor)
    img = np.array(screenshot)
    # RGB (3 CHANNEL)
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
    results = model(img)
    for result in results:
        for bbox in result.boxes:
            if bbox.cls == 0:  # "fish"
                x1, y1, x2, y2 = bbox.xyxy[0]
As discussed in the comments, it seems the resolution of your screen is large enough to make it difficult for the model to detect small objects.
One solution is to take a screenshot of a smaller area and run inference on that instead of the whole screen:
import cv2
from ultralytics import YOLO
import supervision as sv
import mss
import numpy as np
import pyautogui
import time

model = YOLO('best.pt')
sct = mss.mss()
monitor = sct.monitors[1]

while True:
    screenshot = sct.grab((100, 100, 600, 600))
    img = np.array(screenshot)
    # RGB (3 CHANNEL)
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
    results = model(img)
    for result in results:
        for bbox in result.boxes:
            if bbox.cls == 0:  # "fish"
                x1, y1, x2, y2 = bbox.xyxy[0]
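Note that when you pass a tuple to sct.grab() it is interpreted PIL-style as (left, top, right, bottom), so (100, 100, 600, 600) captures a 500x500 region.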
However, it would be a challenge to ensure your objects actually stay inside that area.
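One rough workaround is to re-center the capture region on the last detection so the fish stays inside the crop. This is an untested sketch; the 500x500 region size and class index 0 are assumptions taken from your snippet, and it assumes a single monitor at origin (0, 0):

import cv2
from ultralytics import YOLO
import mss
import numpy as np

model = YOLO('best.pt')
sct = mss.mss()
monitor = sct.monitors[1]

# Start from a fixed region and follow the last detection (sizes are assumptions)
region = [100, 100, 600, 600]  # left, top, right, bottom
half = 250                     # half the region size

while True:
    img = np.array(sct.grab(tuple(region)))
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
    result = model(img)[0]
    if len(result.boxes) > 0:
        # Take the first detection and convert its center to screen coordinates
        x1, y1, x2, y2 = result.boxes.xyxy[0].tolist()
        cx = region[0] + (x1 + x2) / 2
        cy = region[1] + (y1 + y2) / 2
        # Re-center the capture region, clamped to the monitor bounds
        left = int(min(max(cx - half, 0), monitor["width"] - 2 * half))
        top = int(min(max(cy - half, 0), monitor["height"] - 2 * half))
        region = [left, top, left + 2 * half, top + 2 * half]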
You can also use Supervision's InferenceSlicer, which splits the image into overlapping tiles, runs your callback on each tile, and merges the detections with non-max suppression.
There's a nice blog post from Supervision on detecting small objects: https://supervision.roboflow.com/develop/how_to/detect_small_objects/#inference-slicer
Below is your example modified to use the slicer. I knocked it together without running it, so my intention is to give you an idea and a starting point rather than a drop-in solution:
import cv2
from ultralytics import YOLO
import supervision as sv
import mss
import numpy as np
import pyautogui
import time

model = YOLO('best.pt')
sct = mss.mss()
monitor = sct.monitors[1]

def slicer_callback(image_slice: np.ndarray) -> sv.Detections:
    h, w, *_ = image_slice.shape
    result = model(image_slice)[0]
    detections = sv.Detections.from_inference({
        "predictions": [
            {
                "class": model.names[int(bbox.cls)],  # class name looked up from the model
                "class_id": int(bbox.cls),
                "x": float(bbox.xyxy[0][0] + bbox.xyxy[0][2]) / 2,  # center point
                "y": float(bbox.xyxy[0][1] + bbox.xyxy[0][3]) / 2,  # center point
                "width": float(bbox.xyxy[0][2] - bbox.xyxy[0][0]),
                "height": float(bbox.xyxy[0][3] - bbox.xyxy[0][1]),
                "confidence": float(bbox.conf),
            }
            for bbox in result.boxes
        ],
        "image": {"width": w, "height": h}
    })
    return detections

slicer = sv.InferenceSlicer(
    callback=slicer_callback,
    slice_wh=(512, 512),          # tune the tile size to your screen
    overlap_ratio_wh=(0.2, 0.2),  # tune the tile overlap
    iou_threshold=0.5,            # tune the NMS threshold used when merging tiles
    thread_workers=1
)

while True:
    screenshot = sct.grab(monitor)
    img = np.array(screenshot)
    # RGB (3 CHANNEL) -- check if this step is required: cv2.imshow("", img) before and after cvtColor
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
    detections: sv.Detections = slicer(img)
    fishes = detections[detections.class_id == 0]
    print(fishes.xyxy)
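If your supervision version provides sv.Detections.from_ultralytics (it does in recent releases), the whole callback collapses to a couple of lines and you avoid building the prediction dict by hand:

def slicer_callback(image_slice: np.ndarray) -> sv.Detections:
    # Run YOLO on the tile and convert the result directly to Detections
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)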