I'm using matplotlib to produce PDF figures. However, even the simplest figures produce relatively large files: the MWE below produces a file of almost 1 MB. I've learned that the large file size is due to matplotlib fully embedding all the fonts used. Since I'm going to produce quite a few plots and would like to reduce the file sizes, I'm wondering:
Main question:
Is there a way to get matplotlib to embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.
Things considered so far:
Since it is easy, though labor-intensive, to produce files with embedded subsets using external software, is it somehow possible to achieve this directly in matplotlib? Any help would be greatly appreciated.
MWE
import matplotlib.pyplot as plt #Setup
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font',family='Arial',size=12)
fig,ax=plt.subplots(figsize=(2,2)) #Create a figure containing some text
ax.semilogy(1,1,'s',label='Text\n$M_\\mathrm{ath}$') #Escaped backslash: '\m' is an invalid escape sequence
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')
Environment: matplotlib 3.1.1
Leaving this here in case anybody else is looking for something similar: in the end, I decided to opt for Ghostscript. Due to the extra step it is not exactly what I was looking for, but at least it can be automated:
import os
import subprocess
def gs_opt(filename):
    # Temporary output next to the input, e.g. test.pdf -> test_tmp.pdf
    filenameTmp = os.path.splitext(filename)[0]+'_tmp.pdf'
    gs = ['gswin64',
          '-sDEVICE=pdfwrite',
          '-dEmbedAllFonts=false',
          '-dSubsetFonts=true',             # Create font subsets (default)
          '-dPDFSETTINGS=/prepress',        # Quality preset (controls image resolution)
          '-dDetectDuplicateImages=true',   # Embeds images used multiple times only once
          '-dCompressFonts=true',           # Compress fonts in the output (default)
          '-dNOPAUSE',                      # No pause after each page
          '-dQUIET',                        # Suppress output
          '-dBATCH',                        # Automatically exit
          '-sOutputFile='+filenameTmp,      # Save to temporary output
          filename]                         # Input file
    subprocess.run(gs, check=True)          # Create temporary file; raise if Ghostscript fails
    os.replace(filenameTmp, filename)       # Replace the input file with the optimized file
And then calling
filename = 'test.pdf'
plt.savefig(filename)
gs_opt(filename)
This will save the figure as test.pdf, use Ghostscript to create a temporary, optimized test_tmp.pdf, delete the initial file and rename the optimized file to test.pdf.
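The function above hard-codes the Windows executable name `gswin64`. Below is a minimal cross-platform sketch of the same steps, assuming Ghostscript is on the PATH under one of the usual names (`gs` on Linux/macOS, `gswin64c` or `gswin32c` on Windows). The helper names `find_ghostscript`, `build_gs_command` and `gs_opt_portable` are made up for this illustration; the Ghostscript flags are the ones already used above.

```python
import os
import shutil
import subprocess
import tempfile

# Assumed executable names: 'gs' on Linux/macOS, 'gswin64c'/'gswin32c' on Windows.
GS_CANDIDATES = ('gs', 'gswin64c', 'gswin32c')

def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the path of the first candidate found on the PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

def build_gs_command(executable, src, dst):
    """Build the Ghostscript argument list (same flags as in gs_opt above)."""
    return [executable,
            '-sDEVICE=pdfwrite',
            '-dEmbedAllFonts=false',
            '-dSubsetFonts=true',
            '-dPDFSETTINGS=/prepress',
            '-dDetectDuplicateImages=true',
            '-dCompressFonts=true',
            '-dNOPAUSE', '-dQUIET', '-dBATCH',
            '-sOutputFile=' + dst,
            src]

def gs_opt_portable(filename):
    """Optimize filename in place; raise if Ghostscript is missing or fails."""
    executable = find_ghostscript()
    if executable is None:
        raise FileNotFoundError('No Ghostscript executable found on the PATH')
    # Temporary file in the same directory so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp = tempfile.mkstemp(suffix='.pdf', dir=directory)
    os.close(fd)
    try:
        subprocess.run(build_gs_command(executable, filename, tmp), check=True)
        os.replace(tmp, filename)  # Overwrite the original with the optimized file
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # Clean up if Ghostscript failed
```

Calling `gs_opt_portable('test.pdf')` after `plt.savefig('test.pdf')` should then behave like `gs_opt`, with the difference that the original file is left untouched if Ghostscript exits with an error.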
Compared with exporting the file from a vector graphics editor, the PDF created by Ghostscript is still a few times larger (typically 4-5 times). However, it reduces the file size to between 1/5 and 1/10 of the original. It’s something.
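To check the reduction for a particular figure, the two file sizes can simply be compared. This is a small sketch using only the standard library; `size_ratio` and `format_size` are hypothetical helper names, and it assumes a copy of the unoptimized file was kept under a different name.

```python
import os

def size_ratio(original, optimized):
    """Return optimized size divided by original size (smaller is better)."""
    return os.path.getsize(optimized) / os.path.getsize(original)

def format_size(num_bytes):
    """Format a byte count in kB with one decimal (1 kB = 1000 bytes)."""
    return f'{num_bytes / 1000:.1f} kB'
```

For example, copy `test.pdf` to another name before calling `gs_opt` (e.g. with `shutil.copy`), then pass the copy and the optimized file to `size_ratio`. A result between 0.1 and 0.2 would correspond to the 1/10 to 1/5 range mentioned above.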