
pytesser - next line of text in image?


I'm using pytesser on simple images with plain text, and it works great! However, when I print the result in Python, each line of text appears on its own line, yet the string it returns seems to have no "\n" or other newline delimiters that I can pull out.

How does it print each line of the image on a separate line in the console? And is there a way to pull out a particular line, or to split the lines myself?

It's more than likely something very simple I'm missing...

from pytesser import *
image = Image.open('image.jpg')

text = image_to_string(image)

print(len(text))
print(text)

Output:

983
BACK RASHER 1.24
T CREAM 250ML 1.19
T COFFEE 200G 1.09
PEANUT BUTTER 1.12
DIET COKE * 2.39

Solution

  • Thanks to dlask for pointing out my mistake. repr() shows the string as the interpreter sees it, including the newline ("\n") characters that print renders as line breaks instead of displaying. Using text.split("\n"), I can then split the output up line by line.

    from pytesser import *
    image = Image.open('image.jpg')  # Open image object using PIL

    text = image_to_string(image)    # Run tesseract.exe on image

    print(repr(text))                # Shows the "\n" characters explicitly
    result = text.split("\n")        # One list entry per line of text

    print(result)
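To see the difference without pytesser installed, here is a minimal sketch that uses a short hard-coded string in place of what image_to_string() returns. The sample text, and the assumption that the OCR output ends with a blank line, are illustrative only. It shows print() versus repr(), why split("\n") can leave empty entries, and one way to pick out a single line.

```python
# Hypothetical OCR result standing in for image_to_string(image);
# the trailing blank line is an assumption, not guaranteed Tesseract output.
text = "BACK RASHER 1.24\nT CREAM 250ML 1.19\nDIET COKE * 2.39\n\n"

# print() turns each "\n" into a line break, so no delimiter is visible...
print(text)
# ...while repr() shows the escape sequences the string really contains.
print(repr(text))

# split("\n") keeps an empty string for every trailing newline.
print(text.split("\n"))

# splitlines() also understands "\r\n"; drop lines that are blank.
lines = [line for line in text.splitlines() if line.strip()]
print(lines)

# Pull out one particular line by its 0-based index.
print(lines[1])  # T CREAM 250ML 1.19
```

Indexing with lines[1] raises IndexError if the OCR returns fewer lines than expected, so check len(lines) first when the image may contain little text.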