
Image to Text Pytesseract Error


import pytesseract
from PIL import Image, ImageEnhance, ImageFilter
pytesseract.pytesseract.tesseract_cmd="C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
im = Image.open("d:\ss.png") # the second one 
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save('temp2.jpg')
text = pytesseract.image_to_string(Image.open('temp2.jpg'))
print(text)

The code above is meant to convert an image to text, but it raises the following error:

Traceback (most recent call last):
  File "D:\txt14.py", line 10, in <module>
    text = pytesseract.image_to_string(Image.open('temp2.jpg'))
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 111, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Can you please help me figure out why this error is happening?


Solution

  • pytesseract.pytesseract.tesseract_cmd="C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
    

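    That \t in the path isn't a backslash followed by a t; it's a tab character. So tesseract_cmd names a file that doesn't exist, and subprocess.Popen fails with FileNotFoundError when pytesseract tries to launch it.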

    For Windows pathnames in your source code, if you want to use backslashes instead of forward slashes, always use raw string literals, like this:

    pytesseract.pytesseract.tesseract_cmd=r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
    

    In a raw string literal, \t is a backslash and a t, not a tab character.

    You should do the same with 'd:\ss.png'. You get lucky there, because \s happens not to be an escape sequence for anything (at least not as of Python 3.6, which does start deprecating unrecognized escapes like \s), but better safe than sorry.
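A quick way to see the tab for yourself is to compare the two literals directly. This is a minimal sketch using only the standard library; it keeps just the last path component so that no other (invalid) escapes such as \P are involved. PureWindowsPath is an addition not mentioned in the answer, used here only to show that the forward-slash spelling maps to the same backslash path.

```python
# Minimal sketch: how Python parses "\t" in a normal vs. a raw string literal.
# Only the last path component is used, so no invalid escapes (like "\P") are involved.
from pathlib import PureWindowsPath

buggy = "Tesseract-OCR\tesseract.exe"   # "\t" is an escape sequence: one TAB character
fixed = r"Tesseract-OCR\tesseract.exe"  # raw literal: a real backslash, then "t"

print(repr(buggy))  # 'Tesseract-OCR\tesseract.exe'   (contains a tab, no backslash)
print(repr(fixed))  # 'Tesseract-OCR\\tesseract.exe'  (repr doubles the real backslash)

print("\t" in buggy, "\\" in buggy)  # True False
print("\t" in fixed, "\\" in fixed)  # False True

# A raw literal is the same string you get by doubling each backslash by hand.
print(fixed == "Tesseract-OCR\\tesseract.exe")  # True

# Forward slashes avoid escapes entirely; PureWindowsPath shows the backslash form.
print(PureWindowsPath("C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"))
# C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
```

The repr() calls are the useful part when debugging: print() would render the tab as plain whitespace, while repr() shows it as \t.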
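Because the failure only surfaces as a bare [WinError 2] from deep inside subprocess, it can also help to check the configured path yourself before calling pytesseract. The sketch below assumes a hypothetical helper, check_tesseract_cmd, which is not part of pytesseract; it raises FileNotFoundError with the path's repr(), so a stray tab shows up as \t instead of invisible whitespace.

```python
# Hypothetical helper, not part of pytesseract: fail early with a readable message
# when the configured Tesseract path does not name an existing file.
import os


def check_tesseract_cmd(path: str) -> str:
    r"""Return path unchanged if it names an existing file; otherwise raise.

    The message embeds repr(path), so an accidental tab from an unescaped \t
    is displayed as \t rather than as invisible whitespace.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Tesseract executable not found: {path!r}")
    return path


if __name__ == "__main__":
    # Same characters as the question's literal; only "\t" is an escape here,
    # the other backslashes are doubled so no invalid-escape warnings appear.
    bad = "C:\\Program Files (x86)\\Tesseract-OCR\tesseract.exe"
    try:
        check_tesseract_cmd(bad)
    except FileNotFoundError as exc:
        print(exc)  # the tab shows up as \t in the printed repr

    # Typical use, assuming pytesseract is installed:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = check_tesseract_cmd(
    #     r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")
```

This does not change how pytesseract works; it only moves the failure to the line where the path is configured, with a message that makes the escape problem visible.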