I'm using Poppler's pdftohtml to convert a PDF to HTML, and I'm running the executable from Python.
import subprocess
subprocess.Popen([r"D:/poppler-0.68.0/bin/pdftohtml.exe", 'name.pdf', 'name.html'])
With the code above I get the HTML file, but also a .jpg image for every page of the PDF.
I need only the HTML file, not the images. Which arguments should I add to get that result?
According to the pdftohtml documentation, two options may help with that:
-i ignore images
and
-s generate single HTML that includes all pages
If neither option does what you need, pdftohtml probably has no other switch for this; running pdftohtml -h lists every option your build supports.
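As a rough sketch of how the -i option could be passed from Python: the helper names build_command and convert below are made up for illustration, and the executable path is simply the one from the question. Options are placed before the file names, following the tool's usual "pdftohtml [options] <PDF-file> [<html-file>]" usage.

```python
import subprocess

# Path taken from the question; adjust it to your own installation.
PDFTOHTML = r"D:/poppler-0.68.0/bin/pdftohtml.exe"


def build_command(pdf_path, html_path, ignore_images=True, single_file=False):
    """Assemble the pdftohtml argument list (hypothetical helper)."""
    cmd = [PDFTOHTML]
    if ignore_images:
        cmd.append("-i")  # -i: ignore images
    if single_file:
        cmd.append("-s")  # -s: generate single HTML that includes all pages
    cmd += [pdf_path, html_path]
    return cmd


def convert(pdf_path, html_path, **options):
    """Run pdftohtml and wait for it to finish (hypothetical helper)."""
    # check=True raises CalledProcessError if pdftohtml exits with an error.
    return subprocess.run(build_command(pdf_path, html_path, **options), check=True)
```

Calling convert('name.pdf', 'name.html') would then run the same command as in the question with -i added. Unlike a bare Popen call, subprocess.run waits for the conversion to finish, so the output file should be complete by the time the call returns.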