python pandas sorting coordinates coordinate-systems

Sorting objects by coordinates from left to right, top to bottom


I am writing a small program that recognizes characters from a webcam. There are two types of symbols: single-line and double-line. It looks something like this:

[Image: first example]

I need to arrange all recognized characters from left to right, top to bottom, that is, like this:

[Image: second example]

A single-line photo is straightforward: we just sort by the x coordinate. Two-line photos are where I run into difficulties. I tried a simple sort like this:

sorted_df = df.sort_values(by=['y', 'x'], ascending=[False, True])

But this approach very often gives the wrong order. A further problem is that the input image with the symbols may be at a slight angle.

The input looks like this. I use pandas to work with it.

import pandas as pd

data = {
    "xmin": [73.728722, 58.541206, 43.370064, 18.349848, 84.141769, 74.219193, 63.876919, 32.109692, 13.477271],
    "ymin": [9.410283, 10.085771, 10.857979, 12.260820, 36.286518, 36.769310, 37.599922, 39.808289, 40.412071],
    "xmax": [85.914436, 70.791809, 56.026375, 33.629444, 92.453529, 82.558533, 72.851395, 47.012421, 27.849062],
    "ymax": [29.401623, 29.874952, 31.069559, 32.480732, 51.482807, 51.720161, 52.238033, 58.858406, 59.132389],
    "name": ["A", "B", "C", "D", "1", "2", "3", "4", "5"]
}

df = pd.DataFrame(data)

Does anyone have a simple and effective solution to this? Which direction should I move in? I will be very grateful!


Solution

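  • You can first sort on ymin, compute the diff between consecutive values, and start a new group whenever that diff exceeds a threshold (a cumsum over the resulting boolean mask numbers the groups). Then sort again on this group and xmin: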

    # Y-axis threshold at which one considers a new row
    threshold = 10
    out = (df.assign(y=df['ymin'].sort_values(ascending=False)
                                 .diff().abs().gt(threshold).cumsum())
             .sort_values(by=['y', 'xmin'])
           )
    

    Output:

            xmin       ymin       xmax       ymax name  y
    8  13.477271  40.412071  27.849062  59.132389    5  0
    7  32.109692  39.808289  47.012421  58.858406    4  0
    6  63.876919  37.599922  72.851395  52.238033    3  0
    5  74.219193  36.769310  82.558533  51.720161    2  0
    4  84.141769  36.286518  92.453529  51.482807    1  0
    3  18.349848  12.260820  33.629444  32.480732    D  1
    2  43.370064  10.857979  56.026375  31.069559    C  1
    1  58.541206  10.085771  70.791809  29.874952    B  1
    0  73.728722   9.410283  85.914436  29.401623    A  1
    

    [Image: graph of the result]

    NB. If you want to invert the Y-axis, so that the row with the smallest ymin values comes first, use df['ymin'].sort_values() instead of df['ymin'].sort_values(ascending=False):

    threshold = 10
    out = (df.assign(y=df['ymin'].sort_values()
                                 .diff().abs().gt(threshold).cumsum())
             .sort_values(by=['y', 'xmin'])
           )
    

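The same idea can be wrapped in a small helper. The following is only a sketch under a few assumptions that are not part of the answer above: the function name `sort_reading_order` and its `top_is_small_y` flag are made up for illustration, rows are split on the vertical center of each box rather than on ymin, and when no threshold is given it falls back to half the median box height. That fallback is a guess that holds for the sample data, not a general rule, so it should be checked against real images, especially tilted ones.

```python
import pandas as pd


def sort_reading_order(df, threshold=None, top_is_small_y=True):
    """Group boxes into rows by a gap in Y, then sort each row by xmin.

    Hypothetical helper: name, parameters and defaults are illustrative.
    """
    # Vertical center of each box (assumption: a bit steadier than ymin
    # when boxes in the same row have different heights).
    y_center = (df["ymin"] + df["ymax"]) / 2
    if threshold is None:
        # Assumed default: half of the median box height.
        threshold = (df["ymax"] - df["ymin"]).median() / 2
    # Walk the boxes in vertical order; a jump larger than the threshold
    # starts a new row, and cumsum turns the jumps into row numbers.
    ordered = y_center.sort_values(ascending=top_is_small_y)
    row = ordered.diff().abs().gt(threshold).cumsum()
    # assign() aligns the row numbers back to the original index.
    return df.assign(row=row).sort_values(by=["row", "xmin"])


data = {
    "xmin": [73.728722, 58.541206, 43.370064, 18.349848, 84.141769,
             74.219193, 63.876919, 32.109692, 13.477271],
    "ymin": [9.410283, 10.085771, 10.857979, 12.260820, 36.286518,
             36.769310, 37.599922, 39.808289, 40.412071],
    "xmax": [85.914436, 70.791809, 56.026375, 33.629444, 92.453529,
             82.558533, 72.851395, 47.012421, 27.849062],
    "ymax": [29.401623, 29.874952, 31.069559, 32.480732, 51.482807,
             51.720161, 52.238033, 58.858406, 59.132389],
    "name": ["A", "B", "C", "D", "1", "2", "3", "4", "5"],
}
df = pd.DataFrame(data)

out = sort_reading_order(df)                               # small Y = first row
flipped = sort_reading_order(df, top_is_small_y=False)     # large Y = first row
print("".join(out["name"]), "".join(flipped["name"]))
```

Setting `top_is_small_y=False` reproduces the grouping of the first snippet in the answer (largest Y values first), while the default corresponds to the inverted-axis variant. If the rows on a tilted image start to overlap vertically, a fixed gap threshold is no longer enough and the rotation would have to be corrected first.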