[SOLVED] Python: For loop only iterates once - also using a with statement

Python: For loop only iterates once - also using a with statement

I am trying to open a zip file and iterate through the PDFs in the zip file. I want to scrape a certain portion of the text in the pdf. I am using the following code:

def get_text(part):
    #Create path
    path = f'C:\\Users\\user\\Data\\Part_{part}.zip'
    
    with zipfile.ZipFile(path) as data:
        listdata = data.namelist()
        onlypdfs = [k for k in listdata if '_2018' in k or '_2019' in k or '_2020' in k or '_2021' in k or '_2022' in k]

        for file in onlypdfs:
            with data.open(file, "r") as f:
                #Get the pdf
                pdffile = pdftotext.PDF(f)
                text = ("\n\n".join(pdffile))

    
                #Remove the newline characters
                text = text.replace('\r\n', ' ')
                text = text.replace('\r', ' ')
                text = text.replace('\n', ' ')
                text = text.replace('\x0c', ' ')

                #Get the text that will talk about what I want
                try:
                    text2 = re.findall(r'FEES (.+?) Types', text, re.IGNORECASE)[-1]

                except:
                    text2 = 'PROBLEM'

                #Return the file name and the text
                return file, text2

Then in the next line I am running:

info = []
for i in range(1,2):
    info.append(get_text(i))
info

My output is only the first file and text. I have 4 PDFs in the zip folder. Ideally, I want it to iterate through the 30+ zip files. But I am having trouble with just one. I've seen this question asked before, but the solutions didn't fit my problem. Is it something with the with statement?

Solution

You need to process all the files and store each of them as you iterate. An example of how you could do this is to store them in a list of tuples:

file_list = []
for file in onlypdfs:
    ...
    file_list.append((file, text2)
return file_list

You could then use this like so:

info = []
for i in range(1,2):
    list = get_text(i)
    for file_text in list:
        info.append(file_text)
print(info)