
Extract images from PDF in high resolution with Python


I have managed to extract images from several PDF pages with the code below, but the resolution is quite low. Is there a way to adjust that?

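import fitz  # PyMuPDF

pdffile = "C:\\Users\\me\\Desktop\\myfile.pdf"
doc = fitz.open(pdffile)
# snake_case names are used by PyMuPDF 1.19 and later;
# older releases called these pageCount, loadPage, getPixmap and writePNG
for page_index in range(doc.page_count):
    page = doc.load_page(page_index)
    pix = page.get_pixmap()  # render the whole page to a raster image
    output = "image_page_" + str(page_index) + ".png"  # the data is PNG, so use a .png name
    pix.save(output)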

I have also tried the code here and changed "if pix.n < 5" to "if pix.n - pix.alpha < 4", but this didn't output any images in my case.


Solution

  • As stated in this PyMuPDF issue on GitHub, you have to pass a matrix to get a higher resolution.

    The example given is:

    zoom = 2    # zoom factor
    mat = fitz.Matrix(zoom, zoom)
    pix = page.getPixmap(matrix = mat, <...>)
    

    The issue also indicates that, if you don't use a matrix, the default resolution is 72 dpi, which likely explains the low resolution you are getting. The zoom factor multiplies that resolution, so zoom = 2 gives 144 dpi.