java ocr tesseract tess4j arabic-support

Tesseract doesn't recognize Arabic characters


I'm working on an application that uses the Tesseract API to recognize number plates, but the plates contain Arabic characters.

Does anyone have an idea how to handle this?

This is an example of a number plate:


Solution

  • Before anything else, preprocess the image and crop it to the area around the plate. Then binarize the cropped image so the OCR engine gets cleaner input.

    Tesseract doesn't recognize Tashkeel (the Arabic diacritical marks). For the characters themselves, use the line below to detect both Arabic and English text. Also remember to choose an appropriate page segmentation mode.

    import pytesseract
    text = pytesseract.image_to_string(image, lang='eng+ara')
    

    You can also run the following command to list the configuration parameters you can tune to improve the results.

    tesseract --print-parameters
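
The steps above (binarize, then OCR with both languages and an explicit page segmentation mode) could be sketched roughly as follows. This is only a minimal illustration, not a tuned pipeline: the helper names `otsu_threshold`, `binarize`, and `ocr_plate` are made up for this example, Otsu's method is just one common choice of binarization, and `--psm 7` (treat the image as a single text line) is an assumption that may not suit every plate layout. It also assumes the image is already cropped to the plate, that Pillow and pytesseract are installed, and that the `ara` traineddata file is available to Tesseract.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 0-255 grey values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of grey values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows):
    """Binarize a greyscale image given as a list of rows of 0-255 ints."""
    t = otsu_threshold(p for row in rows for p in row)
    return [[255 if p > t else 0 for p in row] for row in rows]


def ocr_plate(image_path, psm=7):
    """OCR an already-cropped plate image (hypothetical helper)."""
    # Imported here so the helpers above work without these packages.
    from PIL import Image
    import pytesseract

    grey = Image.open(image_path).convert("L")
    # In mode "L" every byte of tobytes() is one grey value.
    t = otsu_threshold(grey.tobytes())
    bw = grey.point(lambda p: 255 if p > t else 0)
    # psm 7 = single text line; this is an assumption, try other modes too.
    return pytesseract.image_to_string(
        bw, lang="eng+ara", config=f"--psm {psm}"
    )
```

Calling `ocr_plate("plate.png")` (the file name is a placeholder) would return the recognized text as a string. If the result is poor, trying a different `psm` value is usually the first thing to experiment with.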