I’m working on a dataset of stationary objects, where the data is divided into train, test, and validation folders with corresponding images and labels. The labels are in text files with the following format:
2 0.3832013609375 0 0 0.19411217812499998 0 0.614612228125 0.1995640296875 1 0.619265075 1 1 0.8055533171875 1 0.386728209375 0.798922646875 0 0.3832013609375 0
I’m confused because I expected each bounding box to have just 5 numbers:
class_id, x_center, y_center, width, height.
But here, I see significantly more numbers. Could it be that this format represents something else? Are there additional possibilities for YOLO label formats that I’m unaware of?
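One quick way to see that this line cannot be a single detection box is to count the values after the class id. This is a minimal sketch using the label line from above (nothing here is specific to YOLO; it is just token counting):

```python
label_line = ("2 0.3832013609375 0 0 0.19411217812499998 0 0.614612228125 "
              "0.1995640296875 1 0.619265075 1 1 0.8055533171875 1 "
              "0.386728209375 0.798922646875 0 0.3832013609375 0")

tokens = label_line.split()
class_id, coords = tokens[0], [float(t) for t in tokens[1:]]

# A detection box has exactly 4 values after the class id (x, y, w, h);
# a polygon has an even number of x/y values, usually more than 4.
print(len(coords))  # 18 -> nine x/y pairs, so this looks like a polygon
is_box = len(coords) == 4
is_polygon = len(coords) % 2 == 0 and len(coords) > 4

# Note the first and last pair are identical, i.e. the polygon is closed.
print(coords[:2] == coords[-2:])  # True
```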
Additional Context
The data was sourced from this website, but I couldn’t find clear documentation about this label format.
Here’s the part I don’t understand: when I pass this dataset to YOLO for training using the following code, the training process works without any issues:
import os
from ultralytics import YOLO

def train_yolo(weight_name):
    weight_path = os.path.join(weights_folder, weight_name)
    model = YOLO(weight_path)
    # Train the model and save the new weights
    results = model.train(
        data=data_yaml,
        epochs=100,
        imgsz=640,
        batch=16,
        name=f"yolo_{weight_name.split('.')[0]}",
        save=True,
    )
    return results
My data.yaml file contains:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 4
names: ['pencil', 'rubber', 'ruler', 'sharpner']
roboflow:
  workspace: waqas-hussain
  project: stationary-object-detector
  version: 8
  license: CC BY 4.0
  url: https://universe.roboflow.com/waqas-hussain/stationary-object-detector/dataset/8
There’s no direct reference to bounding box formats in this YAML file, yet YOLO processes the data correctly during training.
Any insights or pointers would be greatly appreciated!
From the picture on the website, I see that some of the annotations are not bounding boxes. They are polygons. A common way to encode a polygon is as a list of x/y pairs.
So I would guess that the format is
class_id x1 y1 x2 y2 x3 y3 ...
To check this, I downloaded one of the pictures and its associated label (specifically, CamScanner-10-15-2023-14-29_86_jpg.rf.1042acb34a88542b82bbefa27b86569e.jpg). I wrote a program which parsed this label and plotted it.
Code:
import numpy as np
import matplotlib.pyplot as plt
label_text = """1 0.3855721390625 0.17391304375 0.26533996718749997 0.1273291921875 0.10779436093749999 0.273291925 0.25290215625 0.3354037265625 0.3855721390625 0.17391304375
0 0.9618573796875 0.381987578125 0.8872305140625001 0.3540372671875 0.327529021875 0.9782608703125 0.45190713125000004 1 0.9618573796875 0.381987578125
2 0.970149253125 0.034161490625 0.8084577109375 0 0.0165837484375 0.9254658390625 0.0414593703125 0.9937888203125 0.178275290625 1 0.970149253125 0.034161490625"""
lines = label_text.split('\n')
for line in lines:
    parts = line.split(' ')
    class_id = parts[0]
    label_without_id = np.array([float(s) for s in parts[1:]])
    label_x = label_without_id[::2]   # even positions: x coordinates
    label_y = label_without_id[1::2]  # odd positions: y coordinates
    plt.plot(label_x, label_y, label=class_id)

# The convention when working with image coordinates is that the y-axis
# grows as you move down the image
plt.gca().invert_yaxis()
plt.legend()
plt.show()
Output:
That looks reasonably plausible, given the input. The aspect ratio is wrong, but that's expected: the coordinates are normalized to [0, 1], so you're meant to rescale the x/y values by the image width and height. You can also compare this plot to the image labels on Roboflow.
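To recover the correct aspect ratio, multiply the normalized coordinates by the image dimensions before plotting. A minimal sketch, assuming a hypothetical 640x480 image (substitute your actual image size):

```python
# Hypothetical image size; use the real width/height of your image.
img_w, img_h = 640, 480

# Normalized x/y pairs, as stored in the label file (illustrative values).
norm_points = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.1)]

# YOLO coordinates are normalized to [0, 1]; multiply by the image
# dimensions to get pixel coordinates.
pixel_points = [(x * img_w, y * img_h) for x, y in norm_points]
print(pixel_points)  # [(160.0, 240.0), (480.0, 240.0), (320.0, 48.0)]
```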