
OCR every .png file in a folder


I want to iterate through every .png file in a folder and print all the text contained in the images. The first iteration works fine, but the second raises an error.

Code:

import pytesseract
from PIL import Image
import os

directory = (r'C:\folder...')

for filename in os.listdir(directory):
    if filename.endswith('.png'):
        Image = Image.open(filename)
        im = pytesseract.image_to_string(Image)
        print(im)

Output:

Traceback (most recent call last):
  File "C:\Users\Artur\Desktop\Pytesseract_test.py", line 9, in <module>
    Image = Image.open(filename)
AttributeError: 'PngImageFile' object has no attribute 'open'

What does "'PngImageFile' object has no attribute 'open'" mean? Doesn't Image = Image.open(filename) do exactly that?

Thanks in advance
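The error comes from name shadowing: after the first iteration, the name Image no longer refers to the PIL module but to the PngImageFile object that Image.open returned, so the second call looks up open on an image object. A dependency-free sketch of the same effect, using the standard-library json module as a stand-in for PIL's Image (an illustrative substitution, not part of the original code):

```python
import json

# Before the "loop", `json` names the module, just as `Image` names PIL's module.
print(type(json))  # <class 'module'>

# Rebinding the module's name to the *result* of calling it mirrors
# `Image = Image.open(filename)` from the question.
json = json.loads('{"page": 1}')
print(type(json))  # <class 'dict'>

# A second "iteration" now looks up `loads` on a dict, not on the module.
try:
    json.loads('{"page": 2}')
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'loads'
```

Renaming the variable (for example to img) keeps the module name intact for every iteration.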

Edit:

The initial PngImageFile error is solved, but now another error from the PIL library occurred:

import pytesseract
from PIL import Image
import os

directory = (r'C:\folder...')

for filename in os.listdir(directory):
    if filename.endswith('.png'):
        img = Image.open(filename)
        im = pytesseract.image_to_string(img)
        print(im)

Output (the OCR of 'frame_0000.png' is correct, and then):

Traceback (most recent call last):
  File "C:\Users\Artur\Desktop\Pytesseract_test.py", line 9, in <module>
    img = Image.open(filename)
  File "C:\Users\Artur\AppData\Local\Programs\Python\Python36\lib\site-packages\PIL\Image.py", line 2580, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'frame_0001.png'
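The FileNotFoundError happens because os.listdir returns only file names, not their paths, and Image.open resolves a bare name against the current working directory rather than against directory. A minimal standard-library sketch, using a temporary folder and hypothetical frame files in place of the question's real folder:

```python
import os
import tempfile

# Throwaway folder standing in for r'C:\folder...'; the two file names are
# hypothetical and only mimic the question's frames.
with tempfile.TemporaryDirectory() as directory:
    for name in ("frame_0000.png", "frame_0001.png"):
        with open(os.path.join(directory, name), "wb"):
            pass  # empty placeholder file, not a real PNG

    # os.listdir returns bare file names, without the folder part.
    names = sorted(os.listdir(directory))
    print(names)  # ['frame_0000.png', 'frame_0001.png']

    # Joining each name with the folder gives a path that works from any CWD.
    full_paths = [os.path.join(directory, name) for name in names]
    found = [os.path.isfile(path) for path in full_paths]
    print(found)  # [True, True]
```

In the question's loop, that means passing os.path.join(directory, filename) to Image.open instead of filename.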

Edit2:

This is very strange. When I do this:

for filename in os.listdir(r'folderpath...'):
    print(filename)

it works perfectly fine, iterating through every file and printing every filename.

But when I do this:

for filename in os.listdir(r'folderpath...'):
    print(filename)
    print(pytesseract.image_to_string(Image.open(filename)))

an error is raised:

Bewegung_UHF_Plots.m
Traceback (most recent call last):
  File "C:\Users\Artur\Desktop\Pytesseract_test.py", line 19, in <module>
    print(pytesseract.image_to_string(Image.open(filename)))
  File "C:\Users\Artur\AppData\Local\Programs\Python\Python36\lib\site-packages\PIL\Image.py", line 2580, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Bewegung_UHF_Plots.m'
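Edit2 hits two problems at once: there is no .png filter, so Bewegung_UHF_Plots.m reaches Image.open, and the name is again not joined with the folder (printing a name never touches the file, which is why the print-only loop works). As an alternative approach (swapped in here, not the asker's code), pathlib's Path.glob handles both in one call. The sketch uses a temporary folder and hypothetical file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)  # stands in for Path(r'folderpath...')
    # Hypothetical mix of files, including a non-image like the .m script.
    for name in ("frame_0000.png", "frame_0001.png", "Bewegung_UHF_Plots.m"):
        (folder / name).touch()

    # glob('*.png') skips non-matching files and yields paths that already
    # include the folder, so no separate join step is needed.
    png_paths = sorted(folder.glob("*.png"))
    print([p.name for p in png_paths])  # ['frame_0000.png', 'frame_0001.png']
    print(all(p.parent == folder for p in png_paths))  # True
```

Recent Pillow versions' Image.open accepts a pathlib.Path as well as a string, so each p can be passed to it directly.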

Solution

  • Change the name of the variable Image to something else, like pic or picture, so that it no longer overwrites the imported PIL Image module.
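Renaming fixes the AttributeError; the FileNotFoundError in the edits additionally needs the folder joined onto each name and a .png filter. A sketch combining all three, assuming pytesseract and Pillow are installed; the helper names iter_png_paths and ocr_folder are hypothetical:

```python
import os


def iter_png_paths(directory):
    """Yield the full path of every .png file directly inside `directory`."""
    for filename in sorted(os.listdir(directory)):
        # Case-insensitive check so 'FRAME.PNG' is picked up too; skips e.g. .m files.
        if filename.lower().endswith(".png"):
            # os.listdir returns bare names, so re-attach the folder.
            yield os.path.join(directory, filename)


def ocr_folder(directory):
    """Print the OCR text of every .png in `directory` (needs pytesseract + Pillow)."""
    # Imported inside the function so iter_png_paths can be used (and tested)
    # without the OCR dependencies installed.
    import pytesseract
    from PIL import Image

    for path in iter_png_paths(directory):
        # `img`, not `Image`: the PIL module name must stay intact for the next file.
        with Image.open(path) as img:
            print(path)
            print(pytesseract.image_to_string(img))
```

Usage, with the folder path elided as in the question: ocr_folder(r'C:\folder...')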