python computer-vision bounding-box

`[0, 267, 270, 468]` describes a bbox. How do I get it from `[266.67, 0.0, 201.69, 269.58]`?


I have a set of modified annotations for a bunch of COCO images. For example, [0, 267, 270, 468] and [254, 250, 458, 454] are two entries from the set, describing two bboxes for the following image (named 000000173350.jpg in the original dataset).

[image: 000000173350.jpg]

They are not in the form [x, y, width, height], but the following ones, which come from the original dataset, are:

[266.67, 0.0, 201.69, 269.58] and [250.18, 254.3, 203.64, 203.64]

With the original annotations I can draw the bboxes easily.

[image: 000000173350.jpg with the two original bboxes drawn]

I can decode part of the modified annotations: the original ones can be rewritten as [267, 0, 202, 270] (ceiling) and [250, 254, 203, 203] (floor), and the x and y values are swapped.

However, I cannot work out the rest of the modified annotations. How do I get them from the original annotations?


Solution

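  • If I understand correctly, you want the [y0, x0, y1, x1] format, i.e. the top-left corner (x0, y0) and the bottom-right corner (x1, y1) with the y value listed first, starting from the COCO [x, y, w, h] format you have. If so, add the width and height to the top-left corner and round the results. Rounding the sums, rather than rounding each term before adding, is what reproduces 468 instead of 469 for the first box:

    def convert_bbox(box: tuple) -> tuple:
        # COCO order: top-left x, top-left y, width, height
        x, y, w, h = box
    
        # Round after adding so the corners match the modified annotations
        y0 = round(y)
        x0 = round(x)
        y1 = round(y + h)
        x1 = round(x + w)
    
        return y0, x0, y1, x1
    

    which yields:

    >>> convert_bbox([266.67, 0.0, 201.69, 269.58])
    (0, 267, 270, 468)
    
    >>> convert_bbox([250.18, 254.3, 203.64, 203.64])
    (254, 250, 458, 454)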
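
To go the other way, for example to pass the modified boxes to drawing code that expects [x, y, width, height], a minimal sketch could look like the following. It assumes the modified annotations list the top-left and bottom-right corners in whole pixels with y first, i.e. [y0, x0, y1, x1]; `corners_to_xywh` is a hypothetical name used for illustration, not part of any COCO tooling.

```python
def corners_to_xywh(box):
    """Convert an assumed [y0, x0, y1, x1] box to [x, y, width, height]."""
    y0, x0, y1, x1 = box

    # Width and height are the differences between the two corners
    w = x1 - x0
    h = y1 - y0

    return x0, y0, w, h


print(corners_to_xywh([0, 267, 270, 468]))
print(corners_to_xywh([254, 250, 458, 454]))
```

For the two boxes in the question this gives (267, 0, 201, 270) and (250, 254, 204, 204). The sizes can be a pixel off from the original floats because the corners were already rounded.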