python matplotlib pdf fonts font-embedding

Reducing file sizes of PDFs created using matplotlib by changing font embedding

I'm using matplotlib to produce PDF figures. However, even the simplest figures produce relatively large files, the MWE below produces a file of almost 1 MB. I've become aware that the large file size is due to matplotlib fully embedding all the used fonts. Since I'm going to produce quite a few plots and would like to reduce the file sizes, I'm wondering:

Main question:

Is there a way to get matplotlib to embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.

Things considered so far:

A vector graphics editor can readily be used to export a PDF including font subsets (as well as not including fonts at all), but having to perform this step for every file (revision) appears unnecessarily tedious.
Similarly, I've read about post-processing PDF-files (e.g. using Ghostscript), though the effort seems comparable.
I tried setting 'pdf.fonttype'= 3, which does indeed produces considerably smaller files. However, I'd like to keep the text modifiable in vector graphics editors - which doesn't seem to work in this case (for example minus-signs will not be saved as text).

Since it is easy, though labor-intensive, to produce files with embedded subsets using external software, is it somehow possible to achieve this directly in matplotlib? Any help would be greatly appreciated.

MWE

import matplotlib.pyplot as plt #Setup
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font',family='Arial',size=12)

fig,ax=plt.subplots(figsize=(2,2)) #Create a figure containing some text
ax.semilogy(1,1,'s',label='Text\n$M_\mathrm{ath}$')
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')

Environment: matplotlib 3.1.1

Solution

Leaving this here in case anybody else might be looking for something similar: After all, I decided to opt for Ghostscript. Due to the extra step it is not exactly what I was looking for, but at least it can be automated:

import subprocess
def gs_opt(filename):
    filenameTmp = filename.split('.')[-2]+'_tmp.pdf'
    gs = ['gswin64',
          '-sDEVICE=pdfwrite',
          '-dEmbedAllFonts=false',
          '-dSubsetFonts=true',             # Create font subsets (default)
          '-dPDFSETTINGS=/prepress',        # Image resolution
          '-dDetectDuplicateImages=true',   # Embeds images used multiple times only once
          '-dCompressFonts=true',           # Compress fonts in the output (default)
          '-dNOPAUSE',                      # No pause after each image
          '-dQUIET',                        # Suppress output
          '-dBATCH',                        # Automatically exit
          '-sOutputFile='+filenameTmp,      # Save to temporary output
          filename]                         # Input file

    subprocess.run(gs)                                      # Create temporary file
    subprocess.run(['del', filename],shell=True)            # Delete input file
    subprocess.run(['ren',filenameTmp,filename],shell=True) # Rename temporary to input file

And then calling

filename = 'test.pdf'
plt.savefig(filename)
gs_opt(filename)

This will save the figure as test.pdf, use Ghostscript to create a temporary, optimized test_tmp.pdf, delete the initial file and rename the optimized file to test.pdf.

Compared to exporting the file with a vector graphics editor, the resulting PDF created by Ghostscript is still a few times larger (typically 4-5 times). However, it is decreasing the file size to something between 1/5 and 1/10 of the initial file. It’s something.