My goal is to obtain a PNG file with a transparent background from a PDF file.
ImageMagick's convert tool can do the job:
$ convert test.pdf test.png
$ file test.png
test.png: PNG image data, 595 x 842, 8-bit gray+alpha, non-interlaced
But I would like to do it programmatically in Python, without relying on convert
or any other external tool. I came across the pdf2image
package, but I could not figure out how to get transparency. Here is my attempt:
import pdf2image
with open("test.pdf", "rb") as fd:
    pdf = pdf2image.convert_from_bytes(fd.read(), transparent=True)
pdf[0].save("test.png")
Unfortunately I lose transparency:
$ python test.py
$ file test.png
test.png: PNG image data, 1654 x 2339, 8-bit/color RGB, non-interlaced
Is there any way to do this without relying on an external tool, using pdf2image
or any other package?
With PyMuPDF, you can do this:
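import pymupdf
doc = pymupdf.open("test.pdf")
for page in doc:
    # alpha=True renders the page with a transparent background
    pix = page.get_pixmap(alpha=True, dpi=150)
    # doc.name is the file name passed to open(), here "test.pdf"
    pix.save(f"{doc.name}-{page.number}.png")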
This produces transparent PNG images named "test.pdf-0.png", "test.pdf-1.png", etc., one per page. In the example above, the images are rendered at 150 DPI.
Note: I am a maintainer and the original creator of PyMuPDF.
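To confirm from Python that a generated PNG really carries an alpha channel (the same information `file` prints as "gray+alpha" or "RGB"), one option is to read the color type from the PNG's IHDR chunk. The following is a minimal sketch using only the standard library; the helper names `png_info` and `file_has_alpha` are hypothetical and not part of pdf2image or PyMuPDF. It only looks at the color type, so it does not detect palette images that get their transparency from a separate tRNS chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data: bytes):
    """Return (width, height, bit_depth, has_alpha) read from a PNG's IHDR chunk.

    Hypothetical helper for illustration only.
    """
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width (4), height (4), bit depth (1), color type (1), ...
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color types 4 (gray+alpha) and 6 (RGBA) have the alpha bit (value 4) set.
    return width, height, bit_depth, bool(color_type & 4)


def file_has_alpha(path: str) -> bool:
    """Read just the PNG header from disk and report whether it has an alpha channel."""
    with open(path, "rb") as fd:
        return png_info(fd.read(26))[3]
```

Calling `file_has_alpha("test.png")` on the pdf2image output from the question should return False, since `file` reports that image as plain RGB (color type 2), whereas a gray+alpha or RGBA file returns True.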