Tags: python-3.x, opencv, ocr, tesseract, python-tesseract

How can I recognize digits from a digital weighing scale using OCR?


I need to extract the decimal digits from a digital weighing scale. I am able to generate a mask image, but I am not able to extract the digits from it.

import cv2
import numpy as np
import pytesseract

# Load the image
img = cv2.imread("input.png")

# Color-segmentation to get binary mask
lwr = np.array([43, 0, 71])
upr = np.array([103, 255, 130])
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
msk = cv2.inRange(hsv, lwr, upr)
cv2.imwrite("/Users/ahx/Desktop/msk.png", msk)

# Extract digits
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=5)
res = 255 - cv2.bitwise_and(dlt, msk)
cv2.imwrite("/Users/ahx/Desktop/res.png", res)

# Displaying digits and OCR
txt = pytesseract.image_to_string(res, config="--psm 6 digits")
# Keep digits and the decimal point (isalnum() would drop the '.')
print(''.join(t for t in txt if t.isdigit() or t == '.'))
cv2.imshow("res", res)
cv2.waitKey(0)

[Input image]

[Output image]

Can anyone please help? How can I get the digits out of the output (mask) image?


Solution

  • This question will likely not have a single definitive answer, since there are different ways to approach the problem. Multi-modal LLMs are probably the easiest way forward today if you don't care about latency or cost.
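As a hypothetical sketch of the LLM route, here is how one might build an OpenAI-style vision request for the scale photo. The model name `gpt-4o`, the prompt wording, and the helper name are my assumptions; substitute whatever vision-capable model and provider you actually use.

```python
import base64

def build_vision_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Build a chat-completions payload asking a vision model to read the scale."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Read the number shown on this digital scale. "
                         "Reply with only the digits, e.g. 29.9"},
                # Images are passed inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# req = build_vision_request(open("input.png", "rb").read())
# resp = client.chat.completions.create(**req)  # client: openai.OpenAI()
```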

    I had a similar problem a few years ago, and on my dataset the OpenCV approaches simply didn't work well enough (see the article linked below).

    I ended up training an object detection model that first detects where the display is, and then detects digits on the cropped display image. My problem was more complex, since I wanted it to be rotation-invariant, which is especially tricky for 7-segment display digits.
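The two-stage pipeline described above can be sketched as follows. `detect_display` and `detect_digits` stand in for the trained object-detection models (e.g. YOLO-style networks) and are hypothetical placeholders; only the glue logic is shown.

```python
import numpy as np

def crop(img: np.ndarray, box: tuple) -> np.ndarray:
    """Crop an (x, y, w, h) box out of an image."""
    x, y, w, h = box
    return img[y:y + h, x:x + w]

def read_scale(img, detect_display, detect_digits) -> str:
    # Stage 1: locate the display in the full photo.
    display_box = detect_display(img)        # -> (x, y, w, h)
    display = crop(img, display_box)
    # Stage 2: detect individual digits inside the cropped display.
    detections = detect_digits(display)      # -> [((x, y, w, h), label), ...]
    detections.sort(key=lambda d: d[0][0])   # left-to-right reading order
    return "".join(label for _, label in detections)
```

Cropping to the display before digit detection keeps the digit model's input scale consistent, which is a large part of why the two-stage split helps.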

    The detailed article, with the initial explorations and more or less the algorithm I ended up using: https://agrbin.github.io/snapscale-article/

    Since writing that article I have trained more models, which are both more precise and faster; the final inference time is a few dozen milliseconds. The productionized model can be run from an iOS app: https://snapscale.life/

    I used the app to scan the input image from the question and got the expected result, "29.9", in real time.