Tags: python, ocr, tesseract, detection, python-tesseract

How to get good OCR results using pytesseract


I'm trying to get the data out of this image:

Original Picture

and no matter what I try I can't get a good result.

I have tried ImageEnhance and cv2. I got the most promising result using cv2 and adaptive threshold:

import cv2

# image is the screenshot loaded earlier, e.g. with cv2.imread
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
inverted_image = cv2.bitwise_not(gray)
#_, thresh = cv2.threshold(inverted_image, 95, 255, cv2.THRESH_BINARY)
Test = cv2.adaptiveThreshold(inverted_image, 120, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

Which gives me this picture:

Processed Picture

But Tesseract doesn't seem to be able to "read" it; I get the following output:

text = pytesseract.image_to_string((Test), lang='fra')

845 Vitalité

75 lnteligence +

35 Sagesse

1Poñge 15 Dommages Feu AaTace à

Teva eM

text = pytesseract.image_to_string((Test), lang='eng')

Extracted Text: 345 Vitalité

‘75 Inteligence ~

‘35 Sagesse

“onée. 1S Dommages Feu WBTaole 2s

7 Reve OM.

Doesn't look much better. I also tried blurring it with cv2.GaussianBlur(image, (3, 3), 0), but that just makes it unreadable.

Am I missing something or doing something wrong?

Whole project for better understanding: I want to create a kind of bot that reads the information from a part of my screen, does some calculations, and performs a simple action. There are around 40 different words that can appear in the screenshot. Would it be a solution to create a database with all the words and do a pixel comparison (roughly like the sketch below)? But how would I handle the numbers? I don't want to save all the numbers from 1 to 500.
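
For example, is a pixel comparison along these lines the right direction? (Just a rough sketch; screenshot.png and word_template.png are placeholder file names for a screen capture and one of the saved word images.)

import cv2

# load the screen capture and one stored word image as grayscale
screenshot = cv2.imread('screenshot.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('word_template.png', cv2.IMREAD_GRAYSCALE)

# slide the template over the screenshot and measure the match quality
result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:  # match threshold picked arbitrarily
    print('word found at', max_loc)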

Any help is appreciated!


Solution

  • Use a standard image pre-processing pipeline; see also the Tesseract documentation.

    The binarisation with cv2.adaptiveThreshold proposed in the OP only seems to add more steps to the pipeline (there is no benefit), so a simple mask is used instead.

    import cv2
    import matplotlib.pyplot as plt
    import pytesseract as ocr # <- alias!
    import numpy as np
    
    
    path = ...  # location of the image
    
    im = cv2.imread(path)
    # select a color channel
    gim = im[:,:,1]
    # binary image with mask
    bim = np.zeros_like(gim)
    bim[gim>80] = 255
    # morphological operation
    kernel = np.array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],], dtype=np.uint8)
    dim = cv2.dilate(bim, kernel, iterations=10)
    # detect text-roi
    contours, _ = cv2.findContours(dim, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted( # sort by height of the box
        contours,
        key=lambda cnt: cv2.boundingRect(cnt)[-1],
        reverse=True)[:7] # 7-rows
    
    fig = plt.figure()
    fig.subplots_adjust(hspace=.1)
    
    # loop over the ROI-rows
    rows_text = {}
    for i, cnt in enumerate(contours, 1):
        x, y, w, h = cv2.boundingRect(cnt)
        # reverse the image
        roi = cv2.bitwise_not(gim[y:y+h,x:x+w])
        # add border to improve OCR
        roi = cv2.copyMakeBorder(roi, *[10]*4, cv2.BORDER_CONSTANT, value=int(roi[0,0]))
        # resize
        f = 2.
        rroi = cv2.resize(roi, None, fx=f, fy=f)
        
        # OCR the ROIs
        text = ocr.image_to_string(rroi, lang='fra', config='--psm 6 --oem 3').strip()
        rows_text[y] = text
    
        # check results
        print(text)
        fig.add_subplot(1, len(contours), i).imshow(rroi, cmap='gray')
    
    # order the results per y-coordinates
    row_texts = dict(sorted(rows_text.items())).values()
    print(*row_texts, sep='\n')
    

    Output

    345 Vitalité
    75 Intelligence
    35 Sagesse
    1 Portée
    15 Dommages Feu
    13 Tacle
    7 Retrait PM
    

    Pipeline summary

    Some extra manual tuning of the arguments needs to be done for more general cases. Further, NLP or a similar post-check can be used to verify the correctness of the OCR output.
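
    A minimal sketch of such a post-check, assuming the roughly 40 possible words are known in advance (the vocabulary and the helper correct_line below are made up for illustration):

    import difflib
    import re

    # hypothetical vocabulary: the words that can appear in the screenshot
    KNOWN_WORDS = ['Vitalité', 'Intelligence', 'Sagesse', 'Portée',
                   'Dommages', 'Feu', 'Tacle', 'Retrait', 'PM']

    def correct_line(line, cutoff=0.6):
        # replace each non-numeric token by its closest known word, if any
        corrected = []
        for token in line.split():
            if re.fullmatch(r'\d+', token):  # keep numbers untouched
                corrected.append(token)
                continue
            match = difflib.get_close_matches(token, KNOWN_WORDS, n=1, cutoff=cutoff)
            corrected.append(match[0] if match else token)
        return ' '.join(corrected)

    print(correct_line('75 lnteligence'))  # -> 75 Intelligence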