Please download the PNG file and save it as 'sample.png'.
I want to extract the English characters from the PNG file.
import cv2
import pytesseract
img = cv2.imread("sample.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 23, 100)
bnt = cv2.bitwise_not(thr)
txt = pytesseract.image_to_string(bnt, config="--psm 6")
res = ''.join(i for i in txt if i.isalnum())
print(res)
The output is
ee
Another try:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'/bin/tesseract'
image = cv2.imread('sample.png')
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
inverted_image = cv2.bitwise_not(gray_image)
binary_image = cv2.adaptiveThreshold(inverted_image, 255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2)
denoised_image = cv2.medianBlur(binary_image, 3)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4, 4))
eroded_image = cv2.erode(denoised_image, kernel)
mask = (denoised_image == 255) & (eroded_image == 0)
denoised_image[mask] = 0
cv2.imwrite('preprocessed_image.png', denoised_image)
text = pytesseract.image_to_string(denoised_image, config='--psm 6')
print("result:", text.strip())
This gives a more accurate result than the first attempt:
result:CRSP
However, the character in sample.png is 5, not S. Where did the 5 go, and how can I improve the code?
When an image contains grid lines and noise, effective preprocessing is essential for OCR accuracy. I've added line removal, denoising, and text amplification (dilation). You may need to tweak the parameters a little, in particular the dilation kernel and the line-detection parameters, but I got the expected result from your sample image. I also tried other samples with the same grid pattern, including different fonts and colors, and all of them were recognized correctly. Here's a Google Colab notebook for trial and error
Important note: rizzling's code didn't work for you because of the Tesseract version. I have made sure the notebook linked above uses Tesseract 5.4.1, which is the exact version I'm using on my machine.
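Since the Tesseract version matters here, it may be worth confirming which version is actually installed before running the code below. The following is a minimal sketch that calls the `tesseract` command-line tool and parses its version banner. The helper names `parse_tesseract_version` and `installed_tesseract_version` are hypothetical, and the banner format (`tesseract 5.4.1` or `tesseract v5.3.0...` on the first line) is an assumption based on common builds.

```python
import re
import subprocess


def parse_tesseract_version(banner):
    """Extract (major, minor, patch) from the output of `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_tesseract_version(cmd="tesseract"):
    """Run the CLI and parse its banner; returns None if it cannot be run."""
    try:
        proc = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # Binary not found or not executable.
        return None
    # Some builds print the banner on stderr, so look at both streams.
    return parse_tesseract_version(proc.stdout + proc.stderr)


if __name__ == "__main__":
    print("Tesseract version:", installed_tesseract_version())
```

If `pytesseract.pytesseract.tesseract_cmd` points at a specific binary, pass that same path as `cmd` so the check covers the executable pytesseract will actually call. pytesseract also provides `get_tesseract_version()` for the same purpose.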
import cv2
import numpy as np
import pytesseract
from PIL import Image
import matplotlib.pyplot as plt
def preprocess_image(image_path):
    # Read the image
    img = cv2.imread(image_path)
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Removing gridlines (most important step)
    kernel = np.ones((2, 2), np.uint8)
    gray = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    # Thresholding
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Denoise
    denoised = cv2.fastNlMeansDenoising(thresh)
    # Dilate
    kernel = np.ones((1, 3), np.uint8)
    dilated = cv2.dilate(denoised, kernel, iterations=1)
    return dilated
def perform_ocr(image_path):
    # Preprocess the image
    processed_image = preprocess_image(image_path)
    # Configure Tesseract parameters
    custom_config = r'--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    # Perform OCR
    try:
        text = pytesseract.image_to_string(processed_image, config=custom_config)
        return text.strip()
    except Exception as e:
        print(f"Error during OCR: {str(e)}")
        return None
# Make sure you upload the image to colab for testing
image_path = 'sample.png'
# Perform OCR and get the text
extracted_text = perform_ocr(image_path)
if extracted_text:
    print("Extracted Text:")
    print(extracted_text)
else:
    print("Failed to extract text")
# Optional: Display the processed image
processed = preprocess_image(image_path)
plt.imshow(processed, cmap='gray')
plt.axis('off')
plt.show()
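The line-detection parameters mentioned above can also be explored in isolation. The following is a minimal NumPy-only sketch, and it is not the morphological approach used in the code above: it treats any row or column that is mostly ink as a grid line and blanks it. The function name `remove_grid_lines` and the `min_fraction` threshold are hypothetical, and the input is assumed to be a binarized array with ink at 255, such as `thresh` inside `preprocess_image`.

```python
import numpy as np


def remove_grid_lines(binary, min_fraction=0.8):
    """Blank rows and columns that are mostly ink (likely grid lines).

    `binary` is a 2-D uint8 array with ink = 255 and background = 0.
    `min_fraction` (hypothetical tuning knob) is the share of ink pixels
    a row or column needs before it is treated as a line.
    """
    ink = binary > 0
    line_rows = ink.mean(axis=1) >= min_fraction  # one flag per row
    line_cols = ink.mean(axis=0) >= min_fraction  # one flag per column

    cleaned = binary.copy()  # leave the caller's array untouched
    cleaned[line_rows, :] = 0
    cleaned[:, line_cols] = 0
    return cleaned


if __name__ == "__main__":
    # Synthetic image: one full-width line, one full-height line,
    # and a small glyph-like blob that should survive.
    demo = np.zeros((20, 30), dtype=np.uint8)
    demo[5, :] = 255
    demo[:, 12] = 255
    demo[10:15, 20:24] = 255

    out = remove_grid_lines(demo)
    print("ink left on line row:", int(out[5].sum()))
    print("ink left on line column:", int(out[:, 12].sum()))
    print("blob intact:", bool((out[10:15, 20:24] == 255).all()))
```

Blanking a whole row or column also cuts through any character stroke that the line crosses, so this is only a starting point. Lowering `min_fraction` catches broken or partial lines at the cost of a higher risk of erasing wide horizontal strokes.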