python poppler

How to convert a multi-page PDF to a single HTML file


I'm using poppler's pdftohtml tool to convert a PDF to HTML, and I'm running the executable from Python:

import subprocess

subprocess.Popen([r"D:/poppler-0.68.0/bin/pdftohtml.exe", 'name.pdf', 'name.html'])

With the above code I get my HTML file, but also an image (.jpg) for every page in the PDF.

I need only the HTML file, not the images. What arguments should I add or change to get that result?


Solution

  • According to the pdftohtml documentation, two options should help here:

    -i ignore images

    and

    -s generate single HTML that includes all pages

    If these don't give you the result you want, pdftohtml may not offer another way to do it.
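
As a rough sketch, the two flags could be passed from Python like this. The helper names build_pdftohtml_command and convert are made up for illustration, the executable path is the one from the question, and the flags are placed before the file names on the assumption that pdftohtml follows its usual "options first" usage line. I haven't verified the output against poppler 0.68.0, so check the result on your own file.

```python
import subprocess


def build_pdftohtml_command(exe, pdf_path, html_path,
                            ignore_images=True, single_file=True):
    """Build the argument list for pdftohtml (hypothetical helper)."""
    cmd = [str(exe)]
    if ignore_images:
        cmd.append("-i")  # -i: ignore images
    if single_file:
        cmd.append("-s")  # -s: generate single HTML that includes all pages
    # Input PDF and output HTML name go after the options.
    cmd += [str(pdf_path), str(html_path)]
    return cmd


def convert(exe, pdf_path, html_path):
    """Run pdftohtml and wait for it; raises CalledProcessError on failure."""
    cmd = build_pdftohtml_command(exe, pdf_path, html_path)
    return subprocess.run(cmd, check=True)


# Example call, using the path from the question:
# convert(r"D:/poppler-0.68.0/bin/pdftohtml.exe", "name.pdf", "name.html")
```

Unlike the bare subprocess.Popen call in the question, subprocess.run with check=True waits for the conversion to finish and raises an error if pdftohtml exits with a non-zero status, so the script can't carry on with a missing HTML file.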