How do I prepare a custom keypoints dataset for WongKinYiu/yolov7?
The keypoints format is described here
https://cocodataset.org/#format-data
In particular, this part of the annotation definition
annotation{
"keypoints" : [x1,y1,v1,...],
...
}
says that keypoints are stored as a flat array x1,y1,v1,x2,y2,v2,..., where each v is a visibility flag (0: not labeled, 1: labeled but not visible, 2: labeled and visible).
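For reference, this is roughly what a single COCO person annotation looks like (written here as a Python dict; the ids and coordinates are made up and only the first three of the 17 keypoints are shown):

annotation = {
    "id": 1,                     # made-up ids, purely for illustration
    "image_id": 42,
    "category_id": 1,            # "person" in COCO
    "num_keypoints": 3,
    "keypoints": [
        267, 85, 2,              # nose: x, y, v (v = 2: labeled and visible)
        271, 80, 2,              # left eye
        262, 80, 1,              # right eye: labeled but occluded
        # ... 14 more x, y, v triplets for the remaining COCO keypoints
    ],
    "bbox": [230, 60, 80, 200],  # [top-left x, top-left y, width, height] in pixels
}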
The official yolov7 pose branch on github
https://github.com/WongKinYiu/yolov7/tree/pose
has a link to download the prepared COCO dataset labels
[Keypoints Labels of MS COCO 2017]
Download it, extract it and go to the directory labels\train2017. Open any of the txt files and you will see lines that look something like this:
0 0.671279 0.617945 0.645759 0.726859 0.519751 0.381250 2.000000 0.550936 0.348438 2.000000 0.488565 0.367188 2.000000 0.642412 0.354687 2.000000 0.488565 0.395313 2.000000 0.738046 0.526563 2.000000 0.446985 0.534375 2.000000 0.846154 0.771875 2.000000 0.442827 0.812500 2.000000 0.925156 0.964063 2.000000 0.507277 0.698438 2.000000 0.702703 0.942187 2.000000 0.555094 0.950000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
This line has the following format
class x_center y_center width height kpt1_x kpt1_y kpt1_v kpt2_x kpt2_y kpt2_v ...
where the box is given by its normalized center coordinates and normalized width and height (the code below converts that to top-left/bottom-right corners), and each keypoint is a normalized x, y pair followed by its visibility flag. Values are separated by spaces, one object per line.
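If your own annotations are in COCO JSON, you have to write that conversion yourself. Below is a minimal sketch of the idea, not the repository's own converter; the file names, output directory and the single class id 0 are assumptions you will need to adapt:

import json
from collections import defaultdict
from pathlib import Path

coco = json.load(open("annotations/person_keypoints_train.json"))  # hypothetical path
out_dir = Path("labels/train")
out_dir.mkdir(parents=True, exist_ok=True)

images = {img["id"]: img for img in coco["images"]}
anns_per_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_per_image[ann["image_id"]].append(ann)

for img_id, anns in anns_per_image.items():
    img = images[img_id]
    w, h = img["width"], img["height"]
    lines = []
    for ann in anns:
        # COCO boxes are [top-left x, top-left y, width, height] in pixels;
        # the label file wants normalized center x, center y, width, height
        bx, by, bw, bh = ann["bbox"]
        values = [(bx + bw / 2) / w, (by + bh / 2) / h, bw / w, bh / h]
        kpts = ann["keypoints"]
        for i in range(0, len(kpts), 3):
            # normalize keypoint x and y, keep the visibility flag as it is
            values += [kpts[i] / w, kpts[i + 1] / h, kpts[i + 2]]
        lines.append("0 " + " ".join(f"{v:.6f}" for v in values))  # class 0 = person
    out_file = out_dir / (Path(img["file_name"]).stem + ".txt")
    out_file.write_text("\n".join(lines) + "\n")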
This is the code (from general.py) responsible for converting those normalized values back to pixel coordinates when the labels are loaded
import numpy as np  # imported at the top of general.py
import torch

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0, kpt_label=False):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    # it does the same operation as above for the key-points
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
    if kpt_label:
        num_kpts = (x.shape[1] - 4) // 2
        for kpt in range(num_kpts):
            for kpt_instance in range(y.shape[0]):
                if y[kpt_instance, 2 * kpt + 4] != 0:
                    y[kpt_instance, 2 * kpt + 4] = w * y[kpt_instance, 2 * kpt + 4] + padw
                if y[kpt_instance, 2 * kpt + 1 + 4] != 0:
                    y[kpt_instance, 2 * kpt + 1 + 4] = h * y[kpt_instance, 2 * kpt + 1 + 4] + padh
    return y
which is called from
labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1], kpt_label=self.kpt_label)
Note the offset of 1 in labels[:, 1:], which skips the class label column.
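As a quick sanity check on the box part, you can run the first four numbers of the sample line above through the same formula; this is a standalone numpy sketch, and the 640x640 image size is just an assumption for illustration:

import numpy as np

# x_center, y_center, width, height from the sample label line above (class column dropped)
box = np.array([[0.671279, 0.617945, 0.645759, 0.726859]])
w = h = 640  # assumed image size, purely for illustration

x1 = w * (box[:, 0] - box[:, 2] / 2)  # top-left x in pixels
y1 = h * (box[:, 1] - box[:, 3] / 2)  # top-left y in pixels
x2 = w * (box[:, 0] + box[:, 2] / 2)  # bottom-right x in pixels
y2 = h * (box[:, 1] + box[:, 3] / 2)  # bottom-right y in pixels
print(x1, y1, x2, y2)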
The label coordinates must be normalized to [0, 1], as stated here
assert (l[:, 5::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'
assert (l[:, 6::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'
(the 5::3 and 6::3 strides pick out the keypoint x and y columns while skipping the visibility flags).
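Before training, it can be worth checking your generated label files against those constraints with something like the following sketch (the directory and the 17-keypoint assumption are placeholders; adjust them to your skeleton):

import numpy as np
from pathlib import Path

for txt in Path("labels/train").glob("*.txt"):
    l = np.loadtxt(txt, ndmin=2)
    assert l.shape[1] == 5 + 3 * 17, f"{txt}: unexpected number of columns"
    assert (0 <= l[:, 1:5]).all() and (l[:, 1:5] <= 1).all(), f"{txt}: box not normalized"
    assert (l[:, 5::3] <= 1).all(), f"{txt}: keypoint x not normalized"
    assert (l[:, 6::3] <= 1).all(), f"{txt}: keypoint y not normalized"
    assert np.isin(l[:, 7::3], [0, 1, 2]).all(), f"{txt}: visibility flags must be 0, 1 or 2"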
Getting the format of the labels right is the only tricky part. The rest is just putting the images and labels in the right directories. The structure is
images/
    train/
        file_name1.jpg
        ...
    test/
    val/
labels/
    train/
        file_name1.txt
        ...
    test/
    val/
train.txt
test.txt
val.txt
where train.txt contains the paths to the training images (test.txt and val.txt do the same for their splits). Its contents look like this
./images/train/file_name1.jpg
...
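A small helper along these lines can generate those list files; the layout matches the tree above, but the .jpg extension is an assumption you may need to change:

from pathlib import Path

for split in ("train", "test", "val"):
    image_paths = sorted(Path("images", split).glob("*.jpg"))
    with open(f"{split}.txt", "w") as f:
        f.writelines(f"./{p}\n" for p in image_paths)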