
How to convert bounding box (x1, y1, x2, y2) to YOLO Style (X, Y, W, H)


I'm training a YOLO model and have the bounding boxes in this format:

x1, y1, x2, y2 => e.g. (100, 100, 200, 200)

I need to convert them to YOLO format, something like:

X, Y, W, H => 0.436262 0.474010 0.383663 0.178218

I have already calculated the center point X, Y, the height H, and the width W, but I still need a way to convert them to floating-point numbers like the ones above.


Solution

  • YOLO normalises the image space to run from 0 to 1 in both the x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the maximum coordinates of the image array you are using (in practice, the image width and height in pixels). The same scaling applies to the box size: divide W by XMAX and H by YMAX.

    This assumes both coordinate systems are oriented the same way (YOLO places the origin at the top-left corner of the image).

    Here is a C function that performs the conversion for a single point:

    #include <stdlib.h>
    #include <stdio.h>
    #include <errno.h>
    #include <math.h>
    
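    /* A point in normalised YOLO coordinates; each component lies in [0, 1]. */
    struct yolo {
        float   u;
        float   v;
    };
    
    /* Scale pixel coordinates (x, y) by the image extents XMAX and YMAX.
       On invalid input, both fields are set to INFINITY and errno to ERANGE. */
    struct yolo
    convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)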
    {
        struct yolo point;
    
        if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
        {
            point.u = (float)x / (float)XMAX;
            point.v = (float)y / (float)YMAX;
        }
        else
        {
            point.u = INFINITY;
            point.v = INFINITY;
            errno = ERANGE;
        }
    
        return point;
    }/* convert */
    
    
    int main(void)
    {
        struct yolo P;
    
        P = convert (99, 201, 255, 324);
    
        printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);
    
        exit (EXIT_SUCCESS);
    }/* main */
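
Putting this together for a whole box rather than a single point, a minimal sketch might look like the following. The names `yolo_box` and `corners_to_yolo` are made up for illustration, and the image width and height are passed in by the caller because the question does not state them. It assumes (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner, both in pixels.

```c
#include <stdio.h>

/* Hypothetical type for illustration: one YOLO-style box,
   with every field normalised to the range [0, 1]. */
struct yolo_box {
    double x;   /* box centre x divided by image width  */
    double y;   /* box centre y divided by image height */
    double w;   /* box width divided by image width     */
    double h;   /* box height divided by image height   */
};

/* Convert corner coordinates (x1, y1, x2, y2), in pixels, to a
   normalised centre/size box. img_w and img_h are the image
   dimensions in pixels and must be supplied by the caller.
   Returns 0 on success, -1 if the input is invalid. */
int corners_to_yolo(double x1, double y1, double x2, double y2,
                    double img_w, double img_h, struct yolo_box *out)
{
    if (out == NULL || img_w <= 0.0 || img_h <= 0.0)
        return -1;
    if (x2 < x1 || y2 < y1)
        return -1;   /* corners are not in top-left, bottom-right order */

    out->x = ((x1 + x2) / 2.0) / img_w;
    out->y = ((y1 + y2) / 2.0) / img_h;
    out->w = (x2 - x1) / img_w;
    out->h = (y2 - y1) / img_h;
    return 0;
}
```

For example, the box (100, 100, 200, 200) in an image assumed to be 400x400 pixels comes out as 0.375 0.375 0.25 0.25. In a YOLO label file those four values follow the class index on a single line.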