I'm trying to get the data out of this image:
and no matter what I try I can't get a good result.
I have tried ImageEnhance and cv2. I got the most promising result using cv2 and adaptiveThreshold:
import cv2

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
inverted_image = cv2.bitwise_not(gray)
# _, thresh = cv2.threshold(inverted_image, 95, 255, cv2.THRESH_BINARY)
Test = cv2.adaptiveThreshold(inverted_image, 120, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
Which gives me this picture:
But tesseract doesn't seem to be able to "read" it; I get the following output:
text = pytesseract.image_to_string((Test), lang='fra')
845 Vitalité
75 lnteligence +
35 Sagesse
1Poñge 15 Dommages Feu AaTace à
Teva eM
text = pytesseract.image_to_string((Test), lang='eng')
Extracted Text: 345 Vitalité
‘75 Inteligence ~
‘35 Sagesse
“onée. 1S Dommages Feu WBTaole 2s
7 Reve OM.
Doesn't look much better. I also tried blurring it with cv2.GaussianBlur(image, (3, 3), 0), but that just makes it unreadable.
Am I missing something or doing something wrong?
Whole project for better understanding: I want to create a kind of bot that reads the information on a part of my screen, does some calculations, and performs a simple action. There are around 40 different words that can appear in the screenshot. Would it be a solution to create a database with all the words and do a pixel comparison? But how would I manage the numbers? I don't want to save templates for all the numbers from 1 to 500. Something like the sketch below is what I have in mind.
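A minimal sketch of the pixel-comparison idea, assuming each word is already cropped into its own ROI and the ~40 word templates are stored in a dict (all names here are placeholders, not working code from my project); the numbers would not need templates if the OCR is restricted to digits:
import cv2
import pytesseract as ocr

def match_word(word_roi, templates):
    # templates: dict mapping word -> grayscale template image (placeholder data)
    best_word, best_score = None, -1.0
    for word, tpl in templates.items():
        th, tw = tpl.shape[:2]
        if th > word_roi.shape[0] or tw > word_roi.shape[1]:
            continue  # the template must fit inside the ROI
        score = cv2.matchTemplate(word_roi, tpl, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

def read_number(number_roi):
    # restrict tesseract to digits, so no number templates are needed
    return ocr.image_to_string(
        number_roi, config='--psm 7 -c tessedit_char_whitelist=0123456789').strip()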
Any help is appreciated!
Use a standard image pre-processing pipeline; see also the Tesseract documentation on improving the quality of the output.
The binarisation with cv2.adaptiveThreshold proposed in the OP seems just to add more steps to the pipeline (there is no benefit), so a simple mask is used instead.
import cv2
import matplotlib.pyplot as plt
import pytesseract as ocr # <- alias!
import numpy as np
path = ...  # <- location of the image
im = cv2.imread(path)
# select a color channel
gim = im[:,:,1]
# binary image with mask
bim = np.zeros_like(gim)
bim[gim>80] = 255
# morphological operation: dilate horizontally to merge the characters of a row
kernel = np.array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]], dtype=np.uint8)
dim = cv2.dilate(bim, kernel, iterations=10)
# detect text-roi
contours, _ = cv2.findContours(dim, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(  # sort by the height of the bounding box
    contours,
    key=lambda cnt: cv2.boundingRect(cnt)[-1],
    reverse=True)[:7]  # keep the 7 tallest boxes (the 7 text rows)
fig = plt.figure()
fig.subplots_adjust(hspace=.1)
# loop over the ROI-rows
rows_text = {}
for i, cnt in enumerate(contours, 1):
    x, y, w, h = cv2.boundingRect(cnt)
    # invert the ROI: tesseract prefers dark text on a light background
    roi = cv2.bitwise_not(gim[y:y+h, x:x+w])
    # add a border to improve the OCR
    roi = cv2.copyMakeBorder(roi, *[10]*4, cv2.BORDER_CONSTANT, value=int(roi[0, 0]))
    # resize
    f = 2.
    rroi = cv2.resize(roi, None, fx=f, fy=f)
    # OCR the ROI
    text = ocr.image_to_string(rroi, lang='fra', config='--psm 6 --oem 3').strip()
    rows_text[y] = text
    # check the results
    print(text)
    fig.add_subplot(1, len(contours), i).imshow(rroi, cmap='gray')
# order the results by y-coordinate (top to bottom)
row_texts = dict(sorted(rows_text.items())).values()
print(*row_texts, sep='\n')
Output
345 Vitalité
75 Intelligence
35 Sagesse
1 Portée
15 Dommages Feu
13 Tacle
7 Retrait PM
Some extra manual tuning of the arguments needs to be done for more general cases. Further, use NLP or similar to check the correctness of the OCR output, for example as sketched below.
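Since only about 40 different words can occur, a minimal sketch of such a check could snap every OCR token to its closest entry of the known vocabulary with the standard library's difflib (the vocabulary below is a placeholder built from the output above; fill it with the actual 40 words):
import difflib

VOCAB = ['Vitalité', 'Intelligence', 'Sagesse', 'Portée',
         'Dommages', 'Feu', 'Tacle', 'Retrait', 'PM']  # placeholder vocabulary

def correct_token(token, cutoff=0.6):
    # numbers pass through unchanged, words snap to the closest known word
    if token.isdigit():
        return token
    matches = difflib.get_close_matches(token, VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else token

for line in row_texts:  # row_texts from the snippet above
    print(' '.join(correct_token(t) for t in line.split()))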