Tags: python, opencv, computer-vision, ocr, document-layout-analysis

Divide an image into tiles based on text structure in Python OpenCV


I'm a beginner to computer vision and OpenCV, but I do have moderate experience with Python. I am trying to write a program that takes an image and divides it into tiles based on the structural organization of the text. For example, given a menu like the following:

original menu

I want to use computer vision to identify the table formatting of the text and divide it into tiles like the following:

modified pic

As of now, my purpose isn't to extract the text using OCR. All I need to do is identify the (hidden) table structure in the image, divide it into individual cells, and extract them as sub-images. Are there any approaches I can use?

Sorry, I am really new to computer vision. Feel free to let me know if any libraries besides OpenCV are needed.


Solution

  • I see you have mentioned that you do not want OCR. However, let me still go ahead and post this solution here using EasyOCR, since it gives you the bounding boxes you need for tiling.

    import easyocr
    import cv2 as cv
    import numpy as np
    import os
    
    path = "menu.jpg"
    assert os.path.exists(path)
    
    # always a good idea to convert BGR to RGB when using OCR
    img = cv.imread(path)
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    
    viz_img = np.copy(img)
    
    # read the text
    reader = easyocr.Reader(['en'])
    # with paragraph=True each entry is ([box-coords], text);
    # without paragraph mode a third value, the confidence, is returned as well
    text_data = reader.readtext(img, paragraph=True, x_ths=0.5)
    
    print(text_data)
    
    # visualize
    for box, text in text_data:
        top_left, top_right, bottom_right, bottom_left = box
    
        # OpenCV drawing functions expect integer point tuples
        tl = tuple(int(x) for x in top_left)
        br = tuple(int(x) for x in bottom_right)
        cv.rectangle(viz_img, tl, br, (0, 255, 0), 4)
        cv.putText(viz_img, text, br, cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
    
    # convert back to BGR before saving, otherwise the colors come out swapped
    cv.imwrite('viz_with_text.jpg', cv.cvtColor(viz_img, cv.COLOR_RGB2BGR))
    

    The documentation of EasyOCR is here: https://github.com/JaidedAI/EasyOCR

    Let me explain what I did.

    1. Read the image and convert it to RGB. From my own experience, converting to RGB gives better OCR results.
    2. Set up the EasyOCR reader. This reader has three methods: detect for text detection, recognize for recognition, and readtext for the combined detection-and-recognition pipeline.
    3. I have used the last method because it provides functionality to merge nearby bounding boxes into paragraphs. This is what I enabled with paragraph=True when calling the method. FYI, when you enable paragraph you won't get the confidence of the text recognized in the paragraph.
    4. You can get the box details of each section from the box coordinates returned by the EasyOCR reader; the for loop in the code shows how I parse the result. FYI, when paragraph mode is disabled you also get the recognition confidence as a third value. A sketch of cropping those boxes into sub-image tiles follows this list.
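
    Since your end goal is to extract each cell as a sub-image, here is a minimal sketch of cropping one tile per detected box with plain NumPy slicing. It reuses img and text_data from the code above; the tiles/ output folder is just a placeholder name.

    os.makedirs("tiles", exist_ok=True)
    h, w = img.shape[:2]
    
    for i, (box, _) in enumerate(text_data):
        # box is [top_left, top_right, bottom_right, bottom_left];
        # take the extremes so slightly skewed boxes are fully covered
        xs = [int(p[0]) for p in box]
        ys = [int(p[1]) for p in box]
        x0, x1 = max(min(xs), 0), min(max(xs), w)
        y0, y1 = max(min(ys), 0), min(max(ys), h)
    
        tile = img[y0:y1, x0:x1]                     # NumPy slice = sub-image
        tile = cv.cvtColor(tile, cv.COLOR_RGB2BGR)   # back to BGR for saving
        cv.imwrite(os.path.join("tiles", f"tile_{i:03d}.jpg"), tile)

    Clamping the coordinates to the image bounds guards against boxes that extend slightly past the edges.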

    To control the extent to which boxes are merged into paragraphs, you need to play with the parameters x_ths for merging horizontally and y_ths for merging vertically.
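
    For instance (the values below are only illustrative starting points, not tuned for your menu):

    # larger x_ths / y_ths merge boxes that are farther apart into one paragraph
    text_data = reader.readtext(img, paragraph=True, x_ths=1.0, y_ths=0.7)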

    Additional information: if your text is not being detected properly, which can affect the output of the code, you have to play with the parameters text_threshold, low_text and link_threshold.
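
    As a rough sketch of tuning the detection stage (again, these values are only assumptions to start from, not tuned results):

    text_data = reader.readtext(img, paragraph=True,
                                text_threshold=0.6,  # lower = accept weaker text regions
                                low_text=0.3,        # lower bound on the text score map
                                link_threshold=0.3)  # lower = link characters more readily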

    Please refer to the EasyOCR documentation I have linked above for more details on the parameters.

    The result on the image you have provided is as follows.

    Result