python python-imaging-library anaconda python-tesseract pytesser

Getting started with Python OCR on Windows?


I have never used Python before, and I am not sure where to start. My goal is to take images of numbers on multicolored backgrounds and reliably identify the correct characters. I looked into the tools needed for this and found tesseract-ocr, pytesser, and the Anaconda Python distribution, which includes all the packages I might need.

Unfortunately, I'm lost on how to begin. I'm using the PyCharm Community IDE and am simply trying to follow this guide: http://www.manejandodatos.es/2014/11/ocr-python-easy/ to get a grasp of OCR.

This is the code I'm using:

from PIL import Image
from pytesser import *

image_file = 'menu.jpg'
im = Image.open(image_file)
text = image_to_string(im)  # OCR on an already-opened PIL image
text = image_file_to_string(image_file)  # same OCR, reading straight from the file path
text = image_file_to_string(image_file, graceful_errors=True)  # as above, tolerating errors
print("=====output=======\n")
print(text)

I believe the Anaconda distribution I'm using includes PIL, but I'm getting this error:

C:\Users\diego_000\Anaconda\python.exe C:/Users/diego_000/PycharmProjects/untitled/test.py
Traceback (most recent call last):
  File "C:/Users/diego_000/PycharmProjects/untitled/test.py", line 2, in <module>
    from pytesser import *
  File "C:\Users\diego_000\PycharmProjects\untitled\pytesser.py", line 6, in <module>
    import Image
ImportError: No module named Image

Process finished with exit code 1

Can anyone point me in the right direction?


Solution

  • The document you point to says to use

    from PIL import Image
    

    but the traceback shows that line 6 of pytesser.py uses

    import Image
    

    and so the interpreter correctly reports:

    ImportError: No module named Image
    

    It looks as if you reordered the lines

    from PIL import Image
    from pytesser import *
    

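    and that pytesser has an improperly coded dependency on PIL: "import Image" is the legacy import style of the old standalone PIL, which Pillow (the PIL fork that Anaconda most likely ships) does not support; Pillow only accepts "from PIL import Image". I can't be certain from the code you provided, though.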
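A quick way to confirm this diagnosis is to ask your interpreter which of the two import styles it accepts. The sketch below is a small, hypothetical helper (find_image_module is not part of PIL, Pillow, or pytesser) that uses the standard library's importlib to try "PIL.Image" first and the legacy bare "Image" second. If it reports "PIL.Image", changing line 6 of your copy of pytesser.py from "import Image" to "from PIL import Image" should clear this particular ImportError, assuming that copy in your project folder is the one being imported, as the traceback suggests.

```python
import importlib


def find_image_module():
    """Hypothetical helper: report which Image import style this interpreter supports.

    "PIL.Image" is what `from PIL import Image` loads (Pillow and newer PIL);
    a bare "Image" module normally exists only with the old standalone PIL.
    Returns (module_name, module), or (None, None) if neither can be imported.
    """
    for name in ("PIL.Image", "Image"):
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This import style is not available here; try the next one.
            continue
    return None, None


if __name__ == "__main__":
    name, module = find_image_module()
    if name is None:
        print("No Image module found; PIL/Pillow does not appear to be installed.")
    elif name == "PIL.Image":
        print("Use 'from PIL import Image' (e.g. on line 6 of pytesser.py).")
    else:
        print("The legacy 'import Image' style works in this interpreter.")
```

Run it with the same interpreter PyCharm uses (C:\Users\diego_000\Anaconda\python.exe in your traceback), since a different Python installation may have a different set of packages.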