Tags: python, text-recognition

Text Recognition with pytesseract and cv2 or other libs


Please download the PNG file and save it as 'sample.png'.
I want to extract the English characters from the PNG file.

import cv2
import pytesseract

img = cv2.imread("sample.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Adaptive threshold (inverted) to separate the characters from the background
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 23, 100)
bnt = cv2.bitwise_not(thr)
# OCR, treating the image as a single uniform block of text (--psm 6)
txt = pytesseract.image_to_string(bnt, config="--psm 6")
# Keep only alphanumeric characters
res = ''.join(i for i in txt if i.isalnum())
print(res)

The output is:

ee

Another try:

import cv2
import pytesseract

# Point pytesseract at the Tesseract binary
pytesseract.pytesseract.tesseract_cmd = r'/bin/tesseract'

image = cv2.imread('sample.png')
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
inverted_image = cv2.bitwise_not(gray_image)
binary_image = cv2.adaptiveThreshold(inverted_image, 255,
                                     cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)
# Remove salt-and-pepper noise
denoised_image = cv2.medianBlur(binary_image, 3)
# Erode, then zero out the pixels the erosion removed, to thin out clutter
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4, 4))
eroded_image = cv2.erode(denoised_image, kernel)
mask = (denoised_image == 255) & (eroded_image == 0)
denoised_image[mask] = 0
cv2.imwrite('preprocessed_image.png', denoised_image)
text = pytesseract.image_to_string(denoised_image, config='--psm 6')
print("result:", text.strip())

This gives a more accurate result than the first:

result: CRSP

The character in sample.png is a 5, not an S. How can I improve the code?

Where is the number 5 then?


Solution

  • When working with images containing grid lines and noise, it's important to preprocess the image effectively to improve OCR accuracy. I've added grid-line removal, denoising, and text amplification (dilation). You might need to tweak the parameters a little, but I got the expected result from your sample image, and I also tried other samples with the same grid pattern, different fonts, and different colors; all worked correctly. You may need to change the dilation kernel and the line-removal parameters to get a more accurate result (an extra grid-line removal sketch is included after the code below). Here's a Google Colab notebook for trial and error.

    Important Note: The reason that rizzling's code didn't work for you was the Tesseract version. I have made sure the notebook linked above is using Tesseract 5.4.1, which is the exact version I'm using on my machine.
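
    To confirm which Tesseract binary and version pytesseract actually picks up, a quick sanity check with standard pytesseract calls looks like this:

    import pytesseract

    # Path to the Tesseract binary that pytesseract will invoke
    print(pytesseract.pytesseract.tesseract_cmd)
    # Version of the installed Tesseract engine
    print(pytesseract.get_tesseract_version())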


    import cv2
    import numpy as np
    import pytesseract
    import matplotlib.pyplot as plt
    
    def preprocess_image(image_path):
        # Read the image
        img = cv2.imread(image_path)
        
        # Convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        
        # Removing gridlines (most important step)
        kernel = np.ones((2,2), np.uint8)
        gray = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
        
        # Thresholding
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        
        # Denoise
        denoised = cv2.fastNlMeansDenoising(thresh)
        
        # Dilate
        kernel = np.ones((1,3),np.uint8)
        dilated = cv2.dilate(denoised, kernel, iterations=1)
        
        return dilated
    
    def perform_ocr(image_path):
        # Preprocess the image
        processed_image = preprocess_image(image_path)
        
        # Configure Tesseract parameters
        custom_config = r'--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        
        # Perform OCR
        try:
            text = pytesseract.image_to_string(processed_image, config=custom_config)
            return text.strip()
        except Exception as e:
            print(f"Error during OCR: {str(e)}")
            return None
    
    # Make sure you upload the image to colab for testing
    image_path = 'sample.png'
    
    # Perform OCR and get the text
    extracted_text = perform_ocr(image_path)
    
    if extracted_text:
        print("Extracted Text:")
        print(extracted_text)
    else:
        print("Failed to extract text")
    
    # Optional: Display the processed image
    processed = preprocess_image(image_path)
    plt.imshow(processed, cmap='gray')
    plt.axis('off')
    plt.show()
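
    If the MORPH_CLOSE step above doesn't fully remove the grid on other samples, a more explicit alternative is to detect the horizontal and vertical lines with long, thin structuring elements and subtract them before OCR. The sketch below is only a starting point, not part of the pipeline above, and the 40 px kernel length is an assumption you'd tune to the grid spacing of your image:

    import cv2
    import numpy as np

    def remove_grid_lines(gray):
        # Binarize (inverted Otsu) so both the text and the grid are white on black
        _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

        # Long, thin kernels respond to the grid lines but not to the characters
        horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
        vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
        horiz_lines = cv2.morphologyEx(bw, cv2.MORPH_OPEN, horiz_kernel)
        vert_lines = cv2.morphologyEx(bw, cv2.MORPH_OPEN, vert_kernel)

        # Subtract the detected grid from the binarized image
        grid = cv2.add(horiz_lines, vert_lines)
        cleaned = cv2.subtract(bw, grid)

        # Close the small gaps that the subtraction may have cut into the characters
        cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE,
                                   np.ones((2, 2), np.uint8))
        return cleaned

    The returned image is already binary (white text on black), so you would skip the Otsu threshold in preprocess_image and feed this result straight into the denoising, dilation, and OCR steps.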