Tags: python, numpy, pytorch, conv-neural-network

Understanding Unusual YOLO Label Formats and Their Impact on Training


I’m working on a dataset of stationery objects, where the data is divided into train, test, and validation folders with corresponding images and labels. The labels are in text files with the following format:

2 0.3832013609375 0 0 0.19411217812499998 0 0.614612228125 0.1995640296875 1 0.619265075 1 1 0.8055533171875 1 0.386728209375 0.798922646875 0 0.3832013609375 0

I’m confused because I expected each bounding box to have just 5 numbers:

class_id, x_center, y_center, width, height.
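
For reference, a single label in that format would be one line such as the following (values invented for illustration; all normalized to [0, 1]):

0 0.512 0.430 0.250 0.180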

But here, I see significantly more numbers. Could it be that this format represents something else? Are there additional possibilities for YOLO label formats that I’m unaware of?

Additional Context

The data was sourced from the Roboflow project linked in the data.yaml below, but I couldn’t find clear documentation of this label format.

Here’s the part I don’t understand: when I pass this dataset to YOLO for training using the following code, the training process works without any issues:

import os

from ultralytics import YOLO


def train_yolo(weight_name):
    # weights_folder and data_yaml are defined elsewhere in the script
    weight_path = os.path.join(weights_folder, weight_name)

    model = YOLO(weight_path)

    # Train the model and save the new weights
    results = model.train(data=data_yaml, epochs=100, imgsz=640, batch=16,
                          name=f"yolo_{weight_name.split('.')[0]}", save=True)

    return results
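
For reference, an illustrative call (the weight file name here is just an example):

results = train_yolo("yolov8n.pt")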

My data.yaml file contains:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 4
names: ['pencil', 'rubber', 'ruler', 'sharpner']

roboflow:
  workspace: waqas-hussain
  project: stationary-object-detector
  version: 8
  license: CC BY 4.0
  url: https://universe.roboflow.com/waqas-hussain/stationary-object-detector/dataset/8

There’s no direct reference to bounding box formats in this YAML file, yet YOLO processes the data correctly during training.

Questions:

  1. How does YOLO handle these unusual label formats?
  2. Could it be that my training was incorrect due to this strange bounding box format?
  3. Is there a way to confirm what this format represents and how it’s parsed by YOLO?

Any insights or pointers would be greatly appreciated!


Solution

  • From the picture on the website, I see that some of the annotations are not bounding boxes. They are polygons. A common way to encode a polygon is as a list of x/y pairs.

    So I would guess that the format is

    class_id x1 y1 x2 y2 x3 y3 ...
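
    Consistent with this guess, the sample line in the question has 19 values: one class id followed by nine x/y pairs, and the first pair (0.3832013609375, 0) is repeated as the last, i.e. the polygon is closed.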

    To check this, I downloaded one of the pictures and its associated label (specifically, CamScanner-10-15-2023-14-29_86_jpg.rf.1042acb34a88542b82bbefa27b86569e.jpg). I wrote a program which parsed this label and plotted it.

    Code:

    import numpy as np
    import matplotlib.pyplot as plt

    label_text = """1 0.3855721390625 0.17391304375 0.26533996718749997 0.1273291921875 0.10779436093749999 0.273291925 0.25290215625 0.3354037265625 0.3855721390625 0.17391304375
    0 0.9618573796875 0.381987578125 0.8872305140625001 0.3540372671875 0.327529021875 0.9782608703125 0.45190713125000004 1 0.9618573796875 0.381987578125
    2 0.970149253125 0.034161490625 0.8084577109375 0 0.0165837484375 0.9254658390625 0.0414593703125 0.9937888203125 0.178275290625 1 0.970149253125 0.034161490625"""

    for line in label_text.splitlines():
        tokens = line.split()
        class_id = tokens[0]
        # Everything after the class id is a flat list of x/y pairs
        label_without_id = np.array([float(s) for s in tokens[1:]])
        label_x = label_without_id[::2]
        label_y = label_without_id[1::2]
        plt.plot(label_x, label_y, label=class_id)

    # The convention for image coordinates is that y grows downwards,
    # so invert the y-axis once, after all polygons are plotted
    plt.gca().invert_yaxis()
    plt.legend()
    plt.show()

    Output:

    [plot of the parsed YOLO polygon labels]

    That looks reasonably plausible, given the input. The aspect ratio is off because the coordinates are normalized; you're presumably expected to rescale the x/y values by the image width/height. You can also compare this against the image labels on Roboflow.
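
    If you do want conventional boxes, the reduction is just a min/max over the polygon's points. Below is a minimal sketch, using the third label row from above and an assumed 640×480 image size; as far as I can tell, Ultralytics applies an equivalent segments-to-boxes conversion internally when a detection model is trained on segment-style labels, which would explain why your training ran without complaint.

    import numpy as np

    # One label row in the suspected polygon format: class id + normalized x/y pairs
    row = "2 0.970149253125 0.034161490625 0.8084577109375 0 0.0165837484375 0.9254658390625 0.0414593703125 0.9937888203125 0.178275290625 1 0.970149253125 0.034161490625".split()

    img_w, img_h = 640, 480  # assumed image size; substitute the real one

    class_id = int(row[0])
    # Rescale normalized coordinates to pixels (this also fixes the aspect ratio)
    poly = np.array(row[1:], dtype=float).reshape(-1, 2) * [img_w, img_h]

    # Enclosing axis-aligned box in (x_center, y_center, width, height) form
    x_min, y_min = poly.min(axis=0)
    x_max, y_max = poly.max(axis=0)
    x_center, y_center = (x_min + x_max) / 2, (y_min + y_max) / 2
    width, height = x_max - x_min, y_max - y_min

    print(class_id, x_center, y_center, width, height)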