
Reducing file sizes of PDFs created using matplotlib by changing font embedding


I'm using matplotlib to produce PDF figures. However, even the simplest figures produce relatively large files; the MWE below produces a file of almost 1 MB. I've learned that the large file size is due to matplotlib fully embedding every font used. Since I'm going to produce quite a few plots and would like to reduce the file sizes, I'm wondering:

Main question:

Is there a way to get matplotlib to embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.

Things considered so far:

Producing files with embedded subsets using external software is easy but labor-intensive. Is it somehow possible to achieve this directly in matplotlib? Any help would be greatly appreciated.

MWE

import matplotlib as mpl
import matplotlib.pyplot as plt

# Setup
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font', family='Arial', size=12)

# Create a figure containing some text
fig, ax = plt.subplots(figsize=(2, 2))
# '\\mathrm' keeps the backslash literal; '\n' is still a real line break
ax.semilogy(1, 1, 's', label='Text\n$M_\\mathrm{ath}$')
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')

Environment: matplotlib 3.1.1


Solution

  • Leaving this here in case anybody else is looking for something similar: in the end, I opted for Ghostscript. Because of the extra step it is not exactly what I was looking for, but at least it can be automated:

    import os
    import subprocess

    def gs_opt(filename):
        # splitext keeps directories and extra dots in the path intact
        filename_tmp = os.path.splitext(filename)[0] + '_tmp.pdf'
        gs = ['gswin64',                        # Ghostscript on 64-bit Windows; adjust for your system
              '-sDEVICE=pdfwrite',
              '-dEmbedAllFonts=false',
              '-dSubsetFonts=true',             # Create font subsets (default)
              '-dPDFSETTINGS=/prepress',        # Quality preset (affects image resolution)
              '-dDetectDuplicateImages=true',   # Embeds images used multiple times only once
              '-dCompressFonts=true',           # Compress fonts in the output (default)
              '-dNOPAUSE',                      # No pause after each page
              '-dQUIET',                        # Suppress output
              '-dBATCH',                        # Automatically exit
              '-sOutputFile=' + filename_tmp,   # Save to temporary output
              filename]                         # Input file

        subprocess.run(gs, check=True)          # Create temporary file; raise if Ghostscript fails
        os.replace(filename_tmp, filename)      # Replace the input file with the temporary file
    

    It is then called like this:

    filename = 'test.pdf'
    plt.savefig(filename)
    gs_opt(filename)
    

    This will save the figure as test.pdf, use Ghostscript to create a temporary, optimized test_tmp.pdf, delete the initial file and rename the optimized file to test.pdf.
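
The executable name 'gswin64' in gs_opt is specific to 64-bit Windows. As a rough sketch of how the lookup could be made portable, the helper below searches PATH for a few common Ghostscript names. The function name find_ghostscript and the candidate list are assumptions for illustration, not part of the original answer; the names actually installed on a given system may differ.

```python
import shutil

# Assumed candidate names: Windows console builds first, then the usual Unix name.
GS_CANDIDATES = ('gswin64c', 'gswin32c', 'gs')

def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

The returned path could be used in place of the hard-coded first element of the gs list; if the result is None, Ghostscript is not on PATH under any of the tried names.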

    Compared to exporting the file with a vector graphics editor, the PDF created by Ghostscript is still typically 4-5 times larger. However, it reduces the file size to between 1/5 and 1/10 of the original. It’s something.
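
To check the reduction on your own figures, the file size can be recorded before and after the in-place optimization. The following is a minimal sketch; measure_size_change is a hypothetical helper name, and it assumes the optimizer is a callable that rewrites the file in place, as gs_opt does.

```python
import os

def measure_size_change(filename, optimizer):
    """Run optimizer(filename) in place and return (bytes_before, bytes_after, ratio)."""
    before = os.path.getsize(filename)   # Size of the file as saved by matplotlib
    optimizer(filename)                  # e.g. gs_opt, which overwrites the file
    after = os.path.getsize(filename)    # Size of the optimized file
    return before, after, after / before
```

For example, `before, after, ratio = measure_size_change('test.pdf', gs_opt)` would give a ratio near 0.1 to 0.2 if the reduction matches the figures quoted above.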