[SOLVED] How do I iterate through files in my directory so they can be opened/read using PyPDF2?

How do I iterate through files in my directory so they can be opened/read using PyPDF2?

I am working on an invoice scraper for work, where I have successfully written all the code to scrape the fields that I need using PyPDF2. However, I am having trouble figuring out how to put this code into a for loop so I can iterate through all the invoices stored in my directory. There could be anywhere from 1 to 250+ files depending on which project I am using this for.

I thought I would be able to use "*.pdf" in place of the pdf name, but it does not work for me. I am relatively new to Python and have not used that many loops before, so any guidance would be appreciated!

import re

pdfFileObj = open(r'C:\Users\notylerhere\Desktop\Test Invoices\SampleInvoice.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)

#Print all text on page
#print(pageObj.extractText())

#Grab Account Number Meter Number
accountNumber = re.compile(r'\d\d\d\d\d-\d\d\d\d\d')
meterNumber = re.compile(r'(\d\d\d\d\d\d\d\d)')
moAccountNumber = accountNumber.search(pageObj.extractText())
moMeterNumber = meterNumber.search(pageObj.extractText())
print('Account Number: '+moAccountNumber.group())
print('Meter Number: '+moMeterNumber.group(1))'''

Thanks very much!

Solution

Another option is glob:

import glob

files = glob.glob("c:/mydirectory/*.pdf")

for file in files:

    (Do your processing of file here)

You need to ensure everything past the colon is properly indented.