
Convert a PDF to a PNG with transparency


My goal is to obtain a PNG file with a transparent background from a PDF file. ImageMagick's convert tool can do the job:

$ convert test.pdf test.png 
$ file test.png 
test.png: PNG image data, 595 x 842, 8-bit gray+alpha, non-interlaced

But I would like to do it programmatically in Python, without relying on convert or any other tool. I came across the pdf2image package, but I could not figure out how to get transparency. Here is my attempt:

import pdf2image
with open("test.pdf", "rb") as fd:
    pdf = pdf2image.convert_from_bytes(fd.read(), transparent=True)
pdf[0].save("test.png")

Unfortunately I lose transparency:

$ python test.py
$ file test.png 
test.png: PNG image data, 1654 x 2339, 8-bit/color RGB, non-interlaced

Is there any way to do this with pdf2image or any other Python package, without relying on an external tool?


Solution

  • With PyMuPDF, you can do this:

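    import pymupdf

    doc = pymupdf.open("test.pdf")
    for page in doc:
        # alpha=True adds an alpha channel, so the page background stays transparent
        pix = page.get_pixmap(alpha=True, dpi=150)
        # doc.name is the path passed to open(); page.number is 0-based
        pix.save(f"{doc.name}-{page.number}.png")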
    

    This produces transparent PNG images named "test.pdf-0.png", "test.pdf-1.png", and so on. In the example above, the images are rendered at 150 DPI.

    Note: I am a maintainer and the original creator of PyMuPDF.
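
To confirm from Python that a saved PNG really carries an alpha channel (the check done with `file` in the question), one option is to read the color type from the PNG's IHDR chunk. The following is a minimal sketch using only the standard library. The function name `png_has_alpha` is made up for illustration and is not part of PyMuPDF or pdf2image. It only looks at the declared color type, so it does not detect transparency stored in a tRNS chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# PNG color types that include an alpha channel:
# 4 = grayscale + alpha, 6 = RGB + alpha.
ALPHA_COLOR_TYPES = {4, 6}


def png_has_alpha(data: bytes) -> bool:
    """Return True if the PNG header declares a color type with alpha.

    Hypothetical helper for illustration; ignores tRNS-based transparency.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: IHDR chunk missing")
    # IHDR starts with width, height, bit depth, and color type.
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in ALPHA_COLOR_TYPES
```

Passing it the bytes of an output file, for example `png_has_alpha(open("test.pdf-0.png", "rb").read())`, should return True for a "gray+alpha" or RGBA image and False for the plain "8-bit/color RGB" output shown in the question.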