pytorch object-detection yolo yolov8 label-studio

Dataset format mismatch between Ultralytics YOLOv8 training and Label Studio exports


I have used TensorFlow before and am new to PyTorch and Ultralytics YOLOv8. I recently learned how to train (fine-tune) the YOLOv8 object detection model on my own dataset. However, the official documentation only shows how to train it on the COCO8 dataset, which is described by a YAML file. The dataset exported by Label Studio contains only JSON, not YAML (even when exported in the YOLO format!).

Then I tried replacing model.train(data="coco8.yaml", epochs=100, imgsz=640) with model.train(data="coco/result.json", epochs=100, imgsz=640). Unsurprisingly, it does not work.

So, I have 2 questions:

  1. Does YOLOv8 support other data formats for fine-tuning? If not, what is the problem with the YOLO and YOLOv8 OBB export options in Label Studio?
  2. Can I export a YAML file in the COCO8 format from Label Studio?

Edit:

By looking through the example coco8.yaml file in their GitHub repository, I found that the YAML file can easily be written by hand. However, I still wonder how the dataset exported by Label Studio can be used, and why this format mismatch exists.


Solution

  • Does YOLOv8 support other data formats for fine-tuning?

    No, Ultralytics YOLOv8 supports only datasets in the YOLO format, as described in the official documentation: for object detection https://docs.ultralytics.com/datasets/detect/, for oriented bounding box (OBB) detection https://docs.ultralytics.com/datasets/obb/, and so on.
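To make "YOLO format" concrete for the detection case: each image has a .txt label file with one line per object, written as class_id x_center y_center width height, with the four box values normalized to the 0-1 range relative to the image size. The following is a minimal sketch of reading one such line; the function name yolo_line_to_pixel_box is hypothetical and not part of Ultralytics or Label Studio, and OBB labels use a different layout that is not covered here.

```python
def yolo_line_to_pixel_box(line: str, img_w: int, img_h: int):
    """Convert one YOLO detection label line to a pixel-space box.

    Hypothetical helper for illustration only.
    Returns (class_id, x_min, y_min, x_max, y_max).
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])
    # Box centre and size are stored normalized to the 0-1 range.
    x_center, y_center, width, height = (float(p) for p in parts[1:])

    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # A box centred in a 640x480 image, a quarter of its width and half its height.
    print(yolo_line_to_pixel_box("0 0.5 0.5 0.25 0.5", 640, 480))
```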

    Then what's the problem with the YOLO and YOLOv8 OBB export options in Label Studio?

    The data exported in the YOLO format from Label Studio contains the following: notes.json, classes.txt, and the images and labels folders. You still need to do some preprocessing to turn these files into a YOLO dataset that Ultralytics can train on.

    Can I export a YAML file in the COCO8 format from Label Studio?

    The YAML file here is just a short text description of the YOLO dataset (dataset path and class list); in this case you need to create it manually.

    1. First, split the images and their corresponding labels into training and validation sets, so that you end up with train and val folders, each containing its own images and labels folders.
    2. Create a new text file and describe your dataset in the following format:
    # Train/val/test sets, each given as a directory of images
    path: "../datasets/mydataset" # dataset root dir
    train: "train" # train folder (relative to 'path')
    val: "val" # validation folder (relative to 'path')
    test: # test folder (optional)
    
    # Classes list
    names:
        0: person
        1: bicycle
        2: car
    
    3. Rename this text file to mydataset.yaml. This is the YAML file that is missing from the data exported by Label Studio.
    4. Create the dataset folder and put your train and val folders into it, along with the mydataset.yaml file. You don't need the notes.json or classes.txt files, since the class information is now in the .yaml file.
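The manual steps above could also be scripted. The following is a rough sketch using only the standard library, not an official Label Studio or Ultralytics tool. It assumes the export folder contains images/, labels/ and classes.txt (one class name per line), and that each label file shares its image's base name. The function name build_yolo_dataset, the 80/20 split and the fixed seed are illustrative choices.

```python
import json
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def build_yolo_dataset(export_dir, dataset_dir, val_fraction=0.2, seed=0):
    """Split a Label Studio YOLO export into train/val and write mydataset.yaml.

    Hypothetical helper: the folder layout is an assumption stated above.
    """
    export_dir, dataset_dir = Path(export_dir), Path(dataset_dir)

    images = sorted(
        p for p in (export_dir / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    random.Random(seed).shuffle(images)  # reproducible shuffle

    # Keep at least one validation image when there is more than one image.
    n_val = max(1, int(len(images) * val_fraction)) if len(images) > 1 else 0
    splits = {"val": images[:n_val], "train": images[n_val:]}

    for split, files in splits.items():
        img_out = dataset_dir / split / "images"
        lbl_out = dataset_dir / split / "labels"
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_out / img.name)
            label = export_dir / "labels" / (img.stem + ".txt")
            # An image without a label file is copied as a background image.
            if label.exists():
                shutil.copy2(label, lbl_out / label.name)

    # classes.txt lists one class name per line; the line index is the class id.
    class_names = [
        ln.strip()
        for ln in (export_dir / "classes.txt").read_text(encoding="utf-8").splitlines()
        if ln.strip()
    ]

    # Strings are written as JSON-style double-quoted scalars, which YAML accepts.
    lines = [
        f"path: {json.dumps(dataset_dir.resolve().as_posix())}",
        "train: train",
        "val: val",
        "names:",
    ]
    lines += [f"  {i}: {json.dumps(name)}" for i, name in enumerate(class_names)]

    yaml_path = dataset_dir / "mydataset.yaml"
    yaml_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return yaml_path
```

Calling build_yolo_dataset("path/to/export", "datasets/mydataset") would copy the files rather than move them, so the original export stays intact, and it returns the path of the generated YAML file.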

    After this preprocessing of the exported dataset, you can train the YOLOv8 model on it:

    from ultralytics import YOLO
    
    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)
    
    # Train the model
    results = model.train(data="/path/to/mydataset.yaml", epochs=100, imgsz=640)
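If training complains about labels, or the model learns nothing, it may help to check the label files of each split first. The sketch below is an assumption-laden illustration, not part of Ultralytics: find_label_problems is a hypothetical helper, and it only handles plain detection labels (five fields per line), not OBB labels.

```python
from pathlib import Path


def find_label_problems(split_dir, num_classes):
    """Return human-readable problems found in split_dir/labels/*.txt.

    Hypothetical helper for detection labels of the form:
    class_id x_center y_center width height (box values normalized to 0-1).
    """
    problems = []
    split_dir = Path(split_dir)
    for label_file in sorted((split_dir / "labels").glob("*.txt")):
        text = label_file.read_text(encoding="utf-8")
        for line_no, line in enumerate(text.splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are ignored
            where = f"{label_file.name}:{line_no}"
            if len(parts) != 5:
                problems.append(f"{where}: expected 5 fields, got {len(parts)}")
                continue
            try:
                class_id = int(parts[0])
                coords = [float(p) for p in parts[1:]]
            except ValueError:
                problems.append(f"{where}: non-numeric value")
                continue
            if not 0 <= class_id < num_classes:
                problems.append(f"{where}: class id {class_id} out of range")
            if any(not 0.0 <= c <= 1.0 for c in coords):
                problems.append(f"{where}: coordinate outside the 0-1 range")
    return problems
```

For the three-class example above, find_label_problems("datasets/mydataset/train", num_classes=3) would return an empty list when every label line is well formed.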