Tags: python, ocr, text-recognition, paddleocr

PaddleOCR fails to extract a single digit


I’m using PaddleOCR to extract prices from product price tags. In most cases it gives good results, but I noticed that it fails to extract prices that consist of a single digit.

[Image: price tag with a single-digit price]

When I edit the image and add a second digit next to the first one, PaddleOCR extracts the price successfully.

[Image: the same price tag edited so that the price has two digits]

I’m using Python with the basic code taken from the PaddleOCR documentation:

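    from paddleocr import PaddleOCR

    image_path = "price_tag.jpg"  # placeholder path to the price-tag image
    ocr = PaddleOCR(use_angle_cls=True, lang='fr')
    result = ocr.ocr(image_path, cls=True)
    for line in result[0]:            # result[0] holds the lines of the first image
        text = line[1][0]             # recognized string
        position = line[0]            # four corner points of the text box
        print(text, position)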
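The loop above reads each entry of `result[0]` as `[box, (text, confidence)]`. In some PaddleOCR 2.x versions, when the detector finds no text at all in an image, the entry for that image is `None` rather than an empty list, so iterating over it raises a `TypeError`. A minimal sketch of a defensive helper, assuming that nested layout; the function name `flatten_ocr_page` and the sample data are hypothetical and stand in for real model output:

```python
from typing import List, Optional, Sequence, Tuple

# Assumed layout of one detection: [box, (text, confidence)],
# where box is a list of four [x, y] corner points.
Box = Sequence[Sequence[float]]


def flatten_ocr_page(page: Optional[list]) -> List[Tuple[str, float, Box]]:
    """Return (text, confidence, box) tuples for one image's OCR output.

    `page` is result[0] from ocr.ocr(...). It may be None when nothing
    was detected, so treat that case as "no lines" instead of crashing.
    """
    if page is None:
        return []
    lines = []
    for box, (text, confidence) in page:
        lines.append((text, float(confidence), box))
    return lines


# Hand-written result in the same shape, so no model download is needed.
fake_result = [[
    [[[10, 10], [60, 10], [60, 40], [10, 40]], ("5€", 0.97)],
]]
for text, confidence, box in flatten_ocr_page(fake_result[0]):
    print(text, confidence, box)
print(flatten_ocr_page(None))
```

An empty list from this helper tells you the detection stage found nothing, which is a different failure from the recognizer returning a wrong string.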

Is there an approach or a specific configuration I can use to extract single-digit prices?


Solution

  • Increase the max_text_length value, switch to lang='en', and pass det=True when running inference:

    
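    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang='en', max_text_length=50)  # English model, longer max output
    img_path = './p.jpg'
    result = ocr.ocr(img_path, cls=True, det=True)  # run detection before recognition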
    

    Result:

    [Image: Result from draw_image]
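Once the recognizer returns the lone digit, the recognized strings still have to be turned into prices, and a filter that demands two digits would silently drop single-digit prices again. A minimal sketch of a price parser that accepts a one-digit integer part, assuming tags are printed like `5`, `5€`, or `5,99 €`; the pattern `PRICE_RE` and the function `parse_price` are hypothetical and should be adapted to your tags' actual layout:

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: one or more integer digits (so "5" is allowed),
# an optional "," or "." followed by exactly two cents digits,
# then an optional euro sign, possibly separated by whitespace.
PRICE_RE = re.compile(r"(\d+)(?:[.,](\d{2}))?\s*€?")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the price in `text` as a Decimal, or None if it is not a price."""
    match = PRICE_RE.fullmatch(text.strip())
    if match is None:
        return None
    euros, cents = match.group(1), match.group(2) or "00"
    return Decimal(f"{euros}.{cents}")


for sample in ["5", "5€", "5,99 €", "12.50", "Promo"]:
    print(repr(sample), "->", parse_price(sample))
```

Using `fullmatch` means a line is only treated as a price when the whole recognized string looks like one; if your OCR lines mix labels and prices, `search` with a tighter pattern would be the alternative.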