
Pytesseract does not recognize text from an image in Python


I am working on a Django application in which I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns an empty string.



Solution

  • Tesseract can have problems when the text is too small or too large, among other issues. See the documentation: Improving the quality of the output | tessdoc


    If I resize your image to 200%, tesseract can read the text.

    I used the external program ImageMagick for this, but you could use the Python module pillow
    (or Wand, which also uses ImageMagick)

    $ convert captcha.png -scale 200% captcha-200p.png
    

    The file command shows some information about the files

    $ file ca*
    
    captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
    captcha.png:      PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
    

    It is strange that you don't get any error message: when I run tesseract with only an input image, it prints a usage message

    $ tesseract captcha-200p.png
    
    Usage:
      tesseract --help | --help-extra | --version
      tesseract --list-langs
      tesseract imagename outputbase [options...] [configfile...]
    
    OCR options:
      -l LANG[+LANG]        Specify language(s) used for OCR.
    NOTE: These options must occur before any configfile.
    
    Single options:
      --help                Show this help message.
      --help-extra          Show extra help for advanced users.
      --version             Show version information.
      --list-langs          List available languages for tesseract engine.
    

    It needs an output name without an extension (it adds .txt itself) to write the result to a file

    $ tesseract captcha-200p.png output
    
    Estimating resolution as 308
    
    $ cat output.txt
    
    81+20=?
    

    or it needs - to send the output to stdout, which shows it on screen or lets you redirect it to another program

    $ tesseract captcha-200p.png -
    
    Estimating resolution as 308
    81+20=?
    

    Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
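
Since the question is about pytesseract, the same 200% resize can probably be done directly in Python with pillow instead of ImageMagick. This is only a minimal sketch and I have not run it on your captcha: the helper names upscale and read_captcha are made up for illustration, the file name captcha.png is taken from the commands above, and the LANCZOS filter is my own choice (it is not the same filter that convert -scale uses).

```python
from PIL import Image  # third-party: pip install pillow


def upscale(img, factor=2):
    """Return a copy of img enlarged by an integer factor (2 means 200%)."""
    new_size = (img.width * factor, img.height * factor)
    return img.resize(new_size, Image.LANCZOS)


def read_captcha(path, factor=2):
    """Upscale the image at path and return the text Tesseract finds in it."""
    # Imported here so upscale() stays usable without pytesseract installed.
    import pytesseract  # third-party: pip install pytesseract

    with Image.open(path) as img:
        big = upscale(img, factor)
    # image_to_string accepts a PIL image and returns the recognized text.
    return pytesseract.image_to_string(big).strip()


# Example call (hypothetical file name, needs the tesseract binary installed):
# print(read_captcha("captcha.png"))
```

If the result is still empty, it may be worth trying a larger factor or saving the upscaled image and checking it with the tesseract command line first.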