I'm trying to do something fairly simple: store the text of each text box in a PowerPoint file as an element in one large Python list. I expected this code to get me there:
import glob

from pptx import Presentation

text_array = []
for eachfile in glob.glob(r"master_folder\*.pptx"):
    prs = Presentation(eachfile)
    # print(eachfile)
    # print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            # Skip shapes (e.g. pictures) that have no .text attribute
            if hasattr(shape, "text"):
                text_array.append(shape.text)
However, like some other questions on SO (PPTX Package not Found), I am greeted with the error:
PackageNotFoundError: Package not found at 'master_folder\April_2020.pptx'
What I've tried:
However, the error has persisted.
Can someone with experience using this library point me in the right direction for the simple task of scraping in-document text and storing it in a native Python list (as seen in my code)?
If you can't find anything, maybe provide a sample .pptx that fails to open.
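One thing that might help narrow it down before sharing a file: glob returned the path, so the file exists, which suggests the problem could be with the file's contents rather than its location. A .pptx is a zip archive, and as far as I know python-pptx reports PackageNotFoundError when the path it is given is not a valid zip file. That can happen with Office lock files (names starting with `~$`), empty files, or files that are not really in .pptx format, such as a legacy .ppt that was renamed or a password-protected file. The sketch below uses only the standard library to flag such files. The helper name `classify_pptx_candidates` and the reason strings are made up for illustration, and the list of causes is my assumption, not something confirmed for your files.

```python
import glob
import os
import zipfile


def classify_pptx_candidates(pattern):
    """Split paths matching `pattern` into (ok, suspicious) lists.

    `ok` holds paths that look like zip archives; `suspicious` holds
    (path, reason) tuples. Hypothetical helper, for diagnosis only.
    """
    ok, suspicious = [], []
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if name.startswith("~$"):
            # Temporary lock file that Office creates while a file is open
            suspicious.append((path, "Office lock file"))
        elif os.path.getsize(path) == 0:
            suspicious.append((path, "empty file"))
        elif not zipfile.is_zipfile(path):
            # A real .pptx is a zip archive; anything else cannot be opened
            suspicious.append((path, "not a zip archive"))
        else:
            ok.append(path)
    return ok, suspicious


if __name__ == "__main__":
    pattern = os.path.join("master_folder", "*.pptx")
    ok, suspicious = classify_pptx_candidates(pattern)
    for path, reason in suspicious:
        print(f"{path}: {reason}")
    print(f"{len(ok)} file(s) look like zip archives")
```

If April_2020.pptx shows up as "not a zip archive", opening it in PowerPoint and re-saving it as .pptx may be worth a try. If it passes the check and still fails, a sample file would make it much easier to help.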