I have managed to extract images from several PDF pages with the code below, but the resolution is quite low. Is there a way to adjust that?
import fitz

pdffile = "C:\\Users\\me\\Desktop\\myfile.pdf"
doc = fitz.open(pdffile)
for page_index in range(doc.pageCount):
    page = doc.loadPage(page_index)
    pix = page.getPixmap()
    # writePNG writes PNG data, so give the file a matching .png extension
    output = "image_page_" + str(page_index) + ".png"
    pix.writePNG(output)
I have also tried using the code here, changing "if pix.n < 5" to "if pix.n - pix.alpha < 4", but this didn't output any images in my case.
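For comparison, here is a minimal sketch of the embedded-image approach being described. Unlike rendering, it saves the raster images stored inside the PDF at their native resolution. It assumes a recent PyMuPDF that uses snake_case names (`get_images`, `Pixmap.save`); older releases spell some of these in camelCase. The helper names `needs_rgb_conversion` and `extract_embedded` and the output file naming are hypothetical. One possible reason for getting no output is that `get_images()` only lists raster images referenced by a page, so pages made of text or vector drawings produce nothing.

```python
from typing import List


def needs_rgb_conversion(n: int, alpha: int) -> bool:
    """True if the pixmap has 4+ colour components (e.g. CMYK).

    This is the inverse of the `pix.n - pix.alpha < 4` check: GRAY and RGB
    pixmaps can be saved as PNG directly, anything else is converted first.
    """
    return n - alpha >= 4


def extract_embedded(pdf_path: str) -> List[str]:
    """Save every embedded raster image as PNG and return the file names."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    written = []
    with fitz.open(pdf_path) as doc:
        for page_index in range(doc.page_count):
            # Only raster images referenced by the page are listed here;
            # text and vector graphics are not images and are skipped.
            for img in doc.load_page(page_index).get_images():
                xref = img[0]  # first tuple entry is the image's xref number
                pix = fitz.Pixmap(doc, xref)
                if needs_rgb_conversion(pix.n, pix.alpha):
                    pix = fitz.Pixmap(fitz.csRGB, pix)  # e.g. CMYK -> RGB
                output = f"page{page_index}_xref{xref}.png"
                pix.save(output)
                written.append(output)
    return written
```

If the pages are vector-based, rendering them with a zoom matrix is the more reliable way to get higher-resolution images.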
As stated in this PyMuPDF issue on GitHub, you have to use a matrix.
The example given is:
zoom = 2 # zoom factor
mat = fitz.Matrix(zoom, zoom)
pix = page.getPixmap(matrix = mat, <...>)
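The issue also notes that without a matrix the default resolution is 72 dpi, which likely explains the low resolution you are getting. The zoom factor scales that base resolution, so zoom = 2 renders at 144 dpi; for a target resolution, use zoom = dpi / 72.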
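Putting this together with the code from the question, a minimal sketch might look like the following. It assumes a recent PyMuPDF with snake_case names (`page_count`, `load_page`, `get_pixmap`, `Pixmap.save`); the question's `pageCount`, `loadPage`, `getPixmap` and `writePNG` are the older spellings. The function names `zoom_for_dpi`, `approx_pixel_size` and `render_pages` are hypothetical, and the pixel size is only an estimate because PyMuPDF rounds the scaled page rectangle to whole pixels.

```python
from typing import List, Tuple

BASE_DPI = 72  # PDF user space has 72 points per inch, so zoom 1 means 72 dpi


def zoom_for_dpi(dpi: float) -> float:
    """Zoom factor that turns the 72 dpi default into the requested dpi."""
    return dpi / BASE_DPI


def approx_pixel_size(width_pt: float, height_pt: float, zoom: float) -> Tuple[int, int]:
    """Approximate pixel size of a page rendered with fitz.Matrix(zoom, zoom)."""
    return round(width_pt * zoom), round(height_pt * zoom)


def render_pages(pdf_path: str, dpi: float = 300, prefix: str = "image_page_") -> List[str]:
    """Render every page to a PNG at the given dpi and return the file names."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    zoom = zoom_for_dpi(dpi)
    mat = fitz.Matrix(zoom, zoom)  # scale x and y by the same factor
    written = []
    with fitz.open(pdf_path) as doc:
        for page_index in range(doc.page_count):
            page = doc.load_page(page_index)
            pix = page.get_pixmap(matrix=mat)
            output = f"{prefix}{page_index}.png"  # PNG output, so a .png name
            pix.save(output)
            written.append(output)
    return written
```

For example, a US Letter page (612 × 792 points) comes out at roughly 1224 × 1584 pixels with zoom = 2 and 2550 × 3300 pixels at 300 dpi. Calling `render_pages(pdffile, dpi=300)` would write one PNG per page. Recent PyMuPDF versions also accept `page.get_pixmap(dpi=300)` directly, which does the same zoom calculation for you.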