I'm working on a license plate recognition system in Python using OpenCV for image processing and Tesseract OCR for character recognition. I've written a function to process the image of the license plate and extract text from it, but I'm encountering issues with detecting both large and small characters consistently. The function aims to sharpen the image, increase contrast, and apply various preprocessing steps before using Tesseract OCR.
# Assumes module-level imports: import cv2, import numpy as np, import pytesseract
@staticmethod
def process_license_plate(frame, x1, y1, x2, y2):
    """
    Extracts and processes the license plate area from the frame using enhanced OCR.

    Parameters:
    - frame (numpy.ndarray): The image or video frame containing the license plate.
    - x1, y1, x2, y2 (int): Coordinates of the license plate bounding box.

    Returns:
    - str: The recognized text of the license plate.
    """
    # Step 1: Crop the license plate area
    license_plate_area = frame[y1:y2, x1:x2]

    # Step 2: Sharpen the license plate area using a kernel
    sharpening_kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
    sharpened_license_plate = cv2.filter2D(license_plate_area, -1, sharpening_kernel)

    # Step 3: Increase contrast using CLAHE on the L channel of LAB space
    lab = cv2.cvtColor(sharpened_license_plate, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    limg = cv2.merge([clahe.apply(l), a, b])
    enhanced_license_plate = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)

    # Step 4: Preprocess for OCR - grayscale conversion, Gaussian blur, and Otsu thresholding
    grayscale_license_plate = cv2.cvtColor(enhanced_license_plate, cv2.COLOR_BGR2GRAY)
    blurred_license_plate = cv2.GaussianBlur(grayscale_license_plate, (3, 3), 0)
    _, thresholded_license_plate = cv2.threshold(blurred_license_plate, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Step 5: Morphological opening to clean up the image, then invert for OCR
    morph_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    opened_license_plate = cv2.morphologyEx(thresholded_license_plate, cv2.MORPH_OPEN, morph_kernel, iterations=1)
    inverted_license_plate = cv2.bitwise_not(opened_license_plate)

    # Step 6: Extract text using OCR
    license_plate_text = pytesseract.image_to_string(
        inverted_license_plate, lang='eng',
        config='--psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    )
    return license_plate_text.strip()
I attempted to isolate the larger characters by using the height ratio of the license plate as a reference, but this approach was not successful.
# Step 6 (alternative): Filter out smaller elements and keep larger contours before OCR
contours, _ = cv2.findContours(inverted_license_plate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
filtered_license_plate = np.zeros_like(inverted_license_plate)
# Threshold on contour area (90% of the plate area — debugging value; no single
# character is anywhere near this large, so almost everything gets discarded)
threshold_area = 0.9 * license_plate_area.shape[0] * license_plate_area.shape[1]
for contour in contours:
    if cv2.contourArea(contour) > threshold_area:
        cv2.drawContours(filtered_license_plate, [contour], -1, 255, thickness=cv2.FILLED)
This is the end result with the filter:
And this is the result without the filter:
We see 3 chars, a Garden State icon, then 3 more chars.
Consider writing a routine that puts a bounding box on that NJ icon. With that in hand, you're in a better position to find a bbox for the 6 chars of interest. You might also find that deleting pixels within the icon's bbox improves OCR performance.
We see writing in several places, with large variations in font size.
Tesseract has some odd interactions with resolution. Give it some clean PDF bitmaps of Helvetica characters, and it works great over a smallish range of resolutions, say 1 em being a dozen to a few dozen pixels. Blow things up so we have a few hundred pixels per character, and OCR performance falls apart.
Notice that the small-font text was adequately recognized. Here is some advice so Tesseract will ignore such distractor text:
If you're able to use bounding boxes to blank out distractors, that is fair game as well. But it's more sensitive to variation across plates and automobiles. You might be able to use the vertical extent of connected components to identify giant letters. You'd expect to see at least six such components, with similar baseline and similar height.
The RGB color channels could help you out with relevant masks for your greyscale images. There's a relationship between the "New Jersey" text and the foreground / background colors that can be exploited. So identifying the state, and hence the color scheme, could be a helpful early stage of your pipeline.