[SOLVED] Paddle OCR fails to extract a single digit

I’m using Paddle OCR to extract prices from product price tags. In most cases it gives good results, but I noticed that it fails to extract prices that consist of a single digit.

When I edit the image and add a second digit next to the first one, the algorithm succeeds in extracting the price.

I’m using Python with the basic code taken from the PaddleOCR documentation:

from paddleocr import PaddleOCR

# image_path is the path to the price tag image
ocr = PaddleOCR(use_angle_cls=True, lang='fr')
result = ocr.ocr(image_path, cls=True)
for line in result[0]:
    text = line[1][0]    # recognized text
    position = line[0]   # bounding box of the text
    print(text, position)

Is there an approach or a specific configuration that I can use to extract single-digit prices?

Solution

Increase the max_text_length value, switch to lang='en', and use det=True in the inference call:


from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en', max_text_length=50)
img_path = './p.jpg'
result = ocr.ocr(img_path, cls=True, det=True)

Result:
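
Once the single digit is detected, the price still has to be picked out of the other text on the tag. Below is a minimal sketch of one way to do that, assuming the result has the same shape as in the question (result[0] is a list of entries where line[0] is the position and line[1][0] is the text). The helper name extract_prices, the price pattern, and the sample data are hypothetical and only for illustration; they are not part of PaddleOCR.

```python
import re

# Assumed price pattern: digits with an optional decimal part (dot or comma)
# and an optional currency symbol before or after. Adjust it to your tags.
PRICE_RE = re.compile(r"^\s*[€$£]?\s*(\d+(?:[.,]\d{1,2})?)\s*[€$£]?\s*$")


def extract_prices(result):
    """Return (price_text, position) pairs from a result shaped like ocr.ocr() output."""
    prices = []
    # One entry per image; guard against an empty or None entry
    # in case nothing was detected.
    lines = result[0] if result and result[0] else []
    for position, (text, score) in lines:
        match = PRICE_RE.match(text)
        if match:
            prices.append((match.group(1), position))
    return prices


# Hypothetical sample with the same structure, not real OCR output.
sample = [[
    [[[10, 10], [40, 10], [40, 50], [10, 50]], ("5", 0.98)],
    [[[10, 60], [90, 60], [90, 90], [10, 90]], ("Promo", 0.91)],
    [[[10, 100], [80, 100], [80, 130], [10, 130]], ("12,99 €", 0.95)],
]]

for price, position in extract_prices(sample):
    print(price, position)
```

With the real OCR output you would call extract_prices(result) instead of passing the sample. Entries such as "Promo" are skipped because they do not match the pattern, while a lone "5" is kept.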