Tags: python, ocr, tesseract, detection, python-tesseract

How to get good OCR results using pytesseract


I'm trying to get the data out of this image:

Original Picture

and no matter what I try I can't get a good result.

I have tried ImageEnhance and cv2. I got the most promising result using cv2 and adaptive threshold:

import cv2

# image is the screenshot loaded earlier, e.g. with cv2.imread
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
inverted_image = cv2.bitwise_not(gray)
#_, thresh = cv2.threshold(inverted_image, 95, 255, cv2.THRESH_BINARY)
Test = cv2.adaptiveThreshold(inverted_image, 120, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

Which gives me this picture:

Processed Picture

But Tesseract doesn't seem to be able to "read" it; I get the following output:

text = pytesseract.image_to_string((Test), lang='fra')

845 Vitalité

75 lnteligence +

35 Sagesse

1Poñge 15 Dommages Feu AaTace à

Teva eM

text = pytesseract.image_to_string((Test), lang='eng')

Extracted Text: 345 Vitalité

‘75 Inteligence ~

‘35 Sagesse

“onée. 1S Dommages Feu WBTaole 2s

7 Reve OM.

Doesn't look much better. I also tried blurring it with cv2.GaussianBlur(image, (3, 3), 0), but that just makes it unreadable.

Am I missing something or doing something wrong?

Whole project for better understanding: I want to create a kind of bot that reads the information from a part of my screen, does some calculations, and performs a simple action. There are around 40 different words that can appear in the screenshot. Would it be a solution to create a database with all the words and do a pixel comparison (roughly like the sketch below)? But how would I handle the numbers? I don't want to save all the numbers from 1 to 500.
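
For example, is a pixel comparison along these lines the right direction? (Just a rough sketch; screenshot.png and word_template.png are placeholder file names for a screen capture and one of the saved word images.)

import cv2

# load the screen capture and one stored word image as grayscale
screenshot = cv2.imread('screenshot.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('word_template.png', cv2.IMREAD_GRAYSCALE)

# slide the template over the screenshot and measure the match quality
result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:  # match threshold picked arbitrarily
    print('word found at', max_loc)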

Any help is appreciated!


Solution

  • Use a standard image pre-processing pipeline; see also the Tesseract documentation.

    The binarisation with cv2.adaptiveThreshold proposed in the OP only seems to add more steps to the pipeline (there is no benefit), so a simple mask is used instead.

    import cv2
    import matplotlib.pyplot as plt
    import pytesseract as ocr # <- alias!
    import numpy as np
    
    
    path = ...  # location of the image
    
    im = cv2.imread(path)
    # select a color channel
    gim = im[:,:,1]
    # binary image with mask
    bim = np.zeros_like(gim)
    bim[gim>80] = 255
    # morphological operation
    kernel = np.array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],], dtype=np.uint8)
    dim = cv2.dilate(bim, kernel, iterations=10)
    # detect text-roi
    contours, _ = cv2.findContours(dim, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted( # sort by height of the box
        contours,
        key=lambda cnt: cv2.boundingRect(cnt)[-1],
        reverse=True)[:7] # 7-rows
    
    fig = plt.figure()
    fig.subplots_adjust(hspace=.1)
    
    # loop over the ROI-rows
    rows_text = {}
    for i, cnt in enumerate(contours, 1):
        x, y, w, h = cv2.boundingRect(cnt)
        # reverse the image
        roi = cv2.bitwise_not(gim[y:y+h,x:x+w])
        # add border to improve OCR
        roi = cv2.copyMakeBorder(roi, *[10]*4, cv2.BORDER_CONSTANT, value=int(roi[0,0]))
        # resize
        f = 2.
        rroi = cv2.resize(roi, None, fx=f, fy=f)
        
        # OCR the ROIs
        text = ocr.image_to_string(rroi, lang='fra', config='--psm 6 --oem 3').strip()
        rows_text[y] = text
    
        # check results
        print(text)
        fig.add_subplot(1, len(contours), i).imshow(rroi, cmap='gray')
    
    # order the results per y-coordinates
    row_texts = dict(sorted(rows_text.items())).values()
    print(*row_texts, sep='\n')
    

    Output

    345 Vitalité
    75 Intelligence
    35 Sagesse
    1 Portée
    15 Dommages Feu
    13 Tacle
    7 Retrait PM
    

    Pipeline summary

    Some extra manual tuning of the arguments needs to be done for more general cases. Further, NLP or a similar post-check can be used to verify the correctness of the OCR output.
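
    A minimal sketch of such a post-check, assuming the roughly 40 possible words are known in advance (the vocabulary and the helper correct_line below are made up for illustration):

    import difflib
    import re

    # hypothetical vocabulary: the words that can appear in the screenshot
    KNOWN_WORDS = ['Vitalité', 'Intelligence', 'Sagesse', 'Portée',
                   'Dommages', 'Feu', 'Tacle', 'Retrait', 'PM']

    def correct_line(line, cutoff=0.6):
        # replace each non-numeric token by its closest known word, if any
        corrected = []
        for token in line.split():
            if re.fullmatch(r'\d+', token):  # keep numbers untouched
                corrected.append(token)
                continue
            match = difflib.get_close_matches(token, KNOWN_WORDS, n=1, cutoff=cutoff)
            corrected.append(match[0] if match else token)
        return ' '.join(corrected)

    print(correct_line('75 lnteligence'))  # -> 75 Intelligence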