Tags: python, opencv, image-processing, computer-vision, polygon

drawing from OpenCV fillConvexPoly() does not match the input polygon


I'm trying to follow the solution detailed at this question to prepare a dataset to train a CRNN for HTR (Handwritten Text Recognition). I'm using eScriptorium to adjust text segmentation and transcription, exporting in ALTO format (one XML with text region coordinates for each image) and parsing the ALTO XML to grab the text image regions and export them individually to create a dataset.

The problem I'm finding is that I have the region defined at eScriptorium, like this:

[Image: text region detected in eScriptorium and adjusted manually]

But when I apply this code from the selected solution for the above linked question:

# Initialize mask
mask = np.zeros((img.shape[0], img.shape[1]))

# Create mask that defines the polygon of points
cv2.fillConvexPoly(mask, pts, 1)
mask = mask > 0 # To convert to Boolean

# Create output image (untranslated)
out = np.zeros_like(img)
out[mask] = img[mask]

and display the image, some parts of the text region come out blacked out:

[Image: region resulting from the Python code above]

As you can see, some areas that should be inside the mask are left out of it, so the image pixels in them are not copied. I've made sure the points that make up the polygon are parsed correctly and passed to OpenCV to build the mask. I can't find the reason why those areas are excluded, and I wonder whether anyone has run into a similar problem and found the cause or a way to avoid it.

TIA


Solution

  • You called cv.fillConvexPoly(). Your polygon is not convex. The algorithm assumed it to be convex and took some shortcuts to simplify the drawing code, so it came out wrong.

    Use cv.fillPoly() instead. That will draw non-convex polygons correctly.

    As you point out, the function signatures are not drop-in compatible. fillPoly() works on a list of polygons, while fillConvexPoly() just takes a single polygon.

    cv.fillConvexPoly(img, points, color)
    # would be replaced with
    cv.fillPoly(img, [points], color) # list of one polygon
    

    Each polygon should be a numpy array of shape (N, 1, 2), with an integer dtype. As far as I know, current OpenCV versions expect np.int32 specifically, so convert the points if they come out of your parser as floats or 64-bit integers. It might support floating point dtype in the future.