I'm using Python 3.10's 'pdfkit' (1.0.0), which calls 'wkhtmltopdf' (0.12.6) in the background, to generate a PDF using code-generated HTML as the source.
The HTML has a base64 encoded ttf font embedded and there are six small (<1k) base64 encoded images per page along with text and divs. The PDF document is only 8 A4 pages.
The call to create the PDF is simply:
pdfkit.from_string(source_html, "filename.pdf")
Using Windows 11 the file created is ~450 KB
Using Ubuntu 22 the file created is ~9 MB
The files, when opened, are visually identical, so what is causing this file size discrepancy and how can I fix it?
I solved this today. The image heavy PDF was 3 times bigger on Ubuntu than on Windows.
First, I added the wkhtmltopdf option "image-quality" to the options dict I pass to from_string/from_file.
pdfkit.from_file("templatised.html", "out.pdf", options=options)
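The options dict itself is not shown above, so here is a minimal sketch of what it might look like. The helper names build_pdf_options and render_pdf and the default quality of 75 are illustrative assumptions, not values from my project. The key is the wkhtmltopdf switch with the leading dashes dropped, which is how pdfkit expects option names.

```python
def build_pdf_options(image_quality: int = 75) -> dict:
    """Return a pdfkit options dict that sets wkhtmltopdf's --image-quality."""
    # Assumption: quality is treated as a JPEG-style 0-100 value.
    if not 0 <= image_quality <= 100:
        raise ValueError("image_quality must be between 0 and 100")
    # pdfkit turns each key into a "--key value" command-line argument.
    return {"image-quality": str(image_quality)}


def render_pdf(html_path: str, pdf_path: str, image_quality: int = 75) -> None:
    """Render an HTML file to PDF with the given image quality (hypothetical helper)."""
    # Imported here so build_pdf_options can be used without pdfkit installed.
    import pdfkit

    pdfkit.from_file(html_path, pdf_path, options=build_pdf_options(image_quality))
```

Calling render_pdf("templatised.html", "out.pdf", image_quality=80) would then be equivalent to the from_file call above with options set to {"image-quality": "80"}.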
I regression tested this on Windows and it made no noticeable difference. I set the quality to the same value I used for the constituent images. I dislike the thought of applying lossy encoding twice, but I chose wkhtmltopdf and I will stick with it.
Now, on Linux the option is required, but the python-pdfkit repo warns:
debian/ubuntu repos have reduced functionality (because it compiled without the wkhtmltopdf QT patches)
This doesn't identify image-quality as missing functionality. Quality defaults to 94, which might explain the huge file sizes.
Setting image-quality produces a warning when using the OS-supplied wkhtmltopdf installation:
The switch --image-quality, is not support using unpatched qt, and will be ignored
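Because the option is silently ignored on an unpatched build, it can help to check which build is installed before rendering. The sketch below assumes that the patched builds report "(with patched qt)" in the output of wkhtmltopdf --version; that matches the builds I have seen, but treat the marker string as an assumption. The function names are hypothetical.

```python
import shutil
import subprocess


def is_patched_qt(version_output: str) -> bool:
    """Return True if the --version text mentions the patched Qt build."""
    # Assumption: patched builds print e.g. "wkhtmltopdf 0.12.6 (with patched qt)".
    return "with patched qt" in version_output.lower()


def installed_wkhtmltopdf_is_patched(binary: str = "wkhtmltopdf") -> bool:
    """Run the installed binary and inspect its version string."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary} was not found on PATH")
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return is_patched_qt(result.stdout)
```

If installed_wkhtmltopdf_is_patched() returns False, image-quality will be ignored and the output is likely to stay large.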
The python-pdfkit page mentioned above links to solutions for this, but they are out of date. I could not find a solution for Debian bookworm, so I had to revert to bullseye.
Precompiled wkhtmltopdf binaries for each OS can be found here. wkhtmltopdf was archived in January 2023, and wkhtmltopdf/packaging was archived in August 2023, so I don't expect any updates.
Here is the script from JazzCore to install the QT patched version:
WKHTML2PDF_VERSION='0.12.6-1'
sudo apt install -y build-essential xorg libssl-dev libxrender-dev wget
wget "https://github.com/wkhtmltopdf/packaging/releases/download/${WKHTML2PDF_VERSION}/wkhtmltox_${WKHTML2PDF_VERSION}.bionic_amd64.deb"
sudo apt install -y ./wkhtmltox_${WKHTML2PDF_VERSION}.bionic_amd64.deb
That script targets bionic_amd64 and 0.12.6-1. I wanted Debian bullseye and 0.12.6-2, so I changed the version and distribution name accordingly.
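The download URL in the script follows a fixed pattern, so swapping the release and distribution is a matter of substituting two values. A minimal sketch, assuming the pattern in the script holds for other releases; the function name wkhtmltox_deb_url is hypothetical, and the exact release tag and codename should be checked against the assets listed on the packaging releases page before use.

```python
def wkhtmltox_deb_url(version: str, codename: str, arch: str = "amd64") -> str:
    """Build a .deb download URL using the pattern from the script above."""
    base = "https://github.com/wkhtmltopdf/packaging/releases/download"
    # Assumption: the release tag and the version in the file name are identical.
    return f"{base}/{version}/wkhtmltox_{version}.{codename}_{arch}.deb"


if __name__ == "__main__":
    # Reproduces the URL used in the script above.
    print(wkhtmltox_deb_url("0.12.6-1", "bionic"))
```

Passing a different version and codename gives the URL to wget for another target, provided a matching asset actually exists for that release.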
Several other SO posts validated my approach, such as:
How to install wkhtmltopdf with patched qt?
See also https://pythonspeed.com/articles/base-image-python-docker-images/ for advice on migrating from bullseye-slim Docker images to the more recent Ubuntu Jammy.