I created a searchable PDF file by running the following command on one of my images:
tesseract page.jpg test pdf --oem 1 --psm 5 -l urd
This is the image that I converted to a searchable PDF.
The image contains Urdu text, but when I copy the text from the newly created PDF file and paste it into any text editor, this is what I get:
GehbFie”
Can any Tesseract OCR and encoding expert here help me solve this issue? Any help will be highly appreciated. Thanks in advance.
pdf is the name of a config file. It needs to come last in the command, after --oem, --psm, -l, and any other options.
The correct format for the command is the following:
tesseract page.jpg test --oem 1 --psm 5 -l urd pdf
I resolved my issue this way.
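For anyone calling tesseract from a script, the ordering rule from the answer above can be sketched in Python as follows. This is only an illustration: the helper name build_tesseract_command and its parameters are hypothetical, and it simply assembles the argument list so that config file names such as pdf always come after the options. Actually running the command assumes tesseract and the urd language data are installed.

```python
import subprocess


def build_tesseract_command(image, output_base, lang, oem=1, psm=5, configs=("pdf",)):
    """Return a tesseract argument list with config file names placed last.

    Hypothetical helper for illustration; it is not part of tesseract itself.
    """
    # Options (--oem, --psm, -l) go right after the image and output base.
    options = ["--oem", str(oem), "--psm", str(psm), "-l", lang]
    # Config file names (e.g. "pdf") are appended after all options.
    return ["tesseract", image, output_base, *options, *configs]


def run_tesseract(cmd):
    """Run the assembled command; requires tesseract to be installed."""
    subprocess.run(cmd, check=True)


cmd = build_tesseract_command("page.jpg", "test", "urd")
print(" ".join(cmd))
# To execute it for real, uncomment the next line:
# run_tesseract(cmd)
```

Printing the joined list should give the same command as the corrected one in the answer, with pdf as the final argument.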