I am working on a Django application in which I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns nothing, just an empty string.
tesseract sometimes has problems when the text is too small or too big, and it may have other problems as well. See the documentation: Improving the quality of the output | tessdoc
If I resize your image to 200%, then tesseract can read the text.
I used the external program ImageMagick for this, but you could use the Python module pillow (or Wand, which also uses ImageMagick).
$ convert captcha.png -scale 200% captcha-200p.png
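If you prefer to stay in Python, the same resize can be sketched with pillow. This is only a rough equivalent of the convert command, not a tested replacement: the function name upscale_for_ocr is hypothetical, and flattening the RGBA image onto a white background plus the LANCZOS filter are my assumptions (ImageMagick's -scale uses a simpler resampling method).

```python
from PIL import Image


def upscale_for_ocr(image, factor=2):
    """Return a grayscale copy of `image`, enlarged `factor` times."""
    # Assumption: the captcha may have transparency, so paste it onto a
    # white background first; otherwise transparent areas can turn black.
    rgba = image.convert("RGBA")
    background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    flattened = Image.alpha_composite(background, rgba).convert("L")

    # 150 x 30 becomes 300 x 60 with the default factor of 2.
    new_size = (flattened.width * factor, flattened.height * factor)
    return flattened.resize(new_size, Image.LANCZOS)


# Usage (file names taken from the commands in this answer):
#     with Image.open("captcha.png") as original:
#         upscale_for_ocr(original).save("captcha-200p.png")
```

The saved file can then be passed to tesseract exactly like the one produced by convert.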
The file command can show some information about the files:
$ file ca*
captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png: PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
It is strange that you don't get any error message, because when I run tesseract with only the input image, it prints a usage message:
$ tesseract captcha-200p.png
Usage:
tesseract --help | --help-extra | --version
tesseract --list-langs
tesseract imagename outputbase [options...] [configfile...]
OCR options:
-l LANG[+LANG] Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.
Single options:
--help Show this help message.
--help-extra Show extra help for advanced users.
--version Show version information.
--list-langs List available languages for tesseract engine.
It needs an output name without an extension (it adds .txt itself) to write the result to a file:
$ tesseract captcha-200p.png output
Estimating resolution as 308
$ cat output.txt
81+20=?
Or it needs - as the output name to send the result to stdout, so it is shown on screen or can be redirected to another program:
$ tesseract captcha-200p.png -
Estimating resolution as 308
81+20=?
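The stdout variant is also easy to call from Python with the standard subprocess module. The sketch below is only an illustration: build_tesseract_command and ocr_to_string are hypothetical helper names, and it assumes the tesseract binary is on PATH. The argument order follows the usage message shown earlier (imagename, outputbase, then options).

```python
import subprocess


def build_tesseract_command(image_path, lang=None):
    """Build the argument list for `tesseract IMAGE -` (result on stdout)."""
    command = ["tesseract", image_path, "-"]
    if lang:
        # Options such as -l come after the output base.
        command += ["-l", lang]
    return command


def ocr_to_string(image_path, lang=None):
    """Run tesseract and return the recognized text without outer whitespace."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


# Usage:
#     print(ocr_to_string("captcha-200p.png"))
```

pytesseract wraps the same binary, so calling pytesseract.image_to_string(Image.open("captcha-200p.png")) on the resized file should behave similarly, although I only tested the command line.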
Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)