python pdf png image-conversion pdf2image

How do I convert a multi-page PDF into one PNG image per page in Python?


Amateur Python developer here. I'm working on a project where I take multiple PDFs, each with a varying number of pages (1-20ish), and turn them into PNG files to use with pytesseract later.

I'm using pdf2image and poppler on a test PDF that has 3 pages. The problem is that it only converts the last page of the PDF to a PNG. I thought, "Maybe the program is giving every PDF page the same file name, so each iteration overwrites the file until only the last page remains." So I tried to make the program change the file name with each iteration. Here's the code:

from pdf2image import convert_from_path
images = convert_from_path('/Users/jacobpatty/vscode_projects/badger_colors/test_ai/10254_Craigs_Plumbing.pdf', 200)

file_name = 'ping_from_ai_test.png'
file_number = 0
for image in images:
    file_number =+ 1
    file_name = 'ping_from_ai_test' + str(file_number) + '.png'
    image.save(file_name)

This failed in 2 ways. It only made 2 PNG files ('ping_from_ai_test.png' and 'ping_from_ai_test1.png') instead of 3, and when I opened the PNG files they were both just the last PDF page again. I don't know what to do at this point. Any ideas?


Solution

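  • As far as I can see, your code only ever writes a single file, 'ping_from_ai_test1.png'. The counter is reset to 1 on every iteration, so each page overwrites the previous one and only the last page survives. The other file, 'ping_from_ai_test.png', is presumably left over from an earlier run. The cause is a typo in your code.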

    The line

    file_number =+ 1

    is actually an assignment:

    file_number = (+1)

    This should probably be

    file_number += 1
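
To make the difference concrete, here is a minimal sketch that reproduces the file names generated by the loop with the typo and with the fix. The function names (names_with_typo, names_fixed, save_pages) are made up for illustration and are not part of pdf2image. The save_pages helper assumes pdf2image and poppler are installed, as in the question.

```python
def names_with_typo(page_count):
    """Reproduce the question's loop: `=+ 1` assigns +1 on every pass."""
    names = []
    file_number = 0
    for _ in range(page_count):
        file_number =+ 1  # parsed as file_number = (+1), so it never grows
        names.append('ping_from_ai_test' + str(file_number) + '.png')
    return names


def names_fixed(page_count):
    """Number the pages 1..page_count so every page gets its own file."""
    return ['ping_from_ai_test' + str(n) + '.png'
            for n in range(1, page_count + 1)]


def save_pages(pdf_path, dpi=200):
    """Hypothetical helper: save each page of pdf_path as its own PNG."""
    # Imported here so the two helpers above run without pdf2image installed.
    from pdf2image import convert_from_path

    images = convert_from_path(pdf_path, dpi)
    names = names_fixed(len(images))
    for image, name in zip(images, names):
        image.save(name)
    return names


print(names_with_typo(3))  # the same name three times
print(names_fixed(3))      # three distinct names
```

Calling save_pages with the path to your PDF should then write one PNG per page into the current working directory. Using enumerate(images, start=1) in the original loop would work just as well and avoids the manual counter altogether.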