I'm creating plots with matplotlib.pyplot and writing them to pdf. Some of these plots have a largeish number of points (up to 100,000), many of which overlap, i.e. certain parts of the chart are just a solid mass. (That's okay - I'm interested in what the sparser parts of the graph look like.)
When I save these plots to pdf, it takes a long time to write, and reading the pdf is even worse. Is there a way to store a "lossy" copy of the plot in the pdf? For example, if I took a screenshot of the plot and embedded it in the pdf, it would load a lot faster.
I recommend trying to plot with the option rasterized:
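import numpy as np
import matplotlib.pyplot as plt
pts = np.random.rand(2, 100000)
plt.scatter(*pts, rasterized=True)  # points are embedded as a bitmap; axes and text stay vector
plt.savefig('tmp_rast.pdf')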
For comparison:
plt.figure()  # start a fresh figure so the earlier scatter is not drawn again
plt.scatter(*pts)
plt.savefig('tmp_reg.pdf')
And the resulting file sizes:
$ ls -lh tmp*.pdf
177K Dec 9 22:03 tmp_rast.pdf
1.5M Dec 9 22:02 tmp_reg.pdf
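If you want to repeat the size comparison from Python rather than with ls, something along these lines should work. It is only a sketch: the helper name compare_pdf_sizes, the fixed random seed, the temporary output directory, and the Agg backend are my own choices for illustration, not part of the snippets above. The dpi argument to savefig sets the resolution of the rasterized points, so a higher value gives a sharper but larger file; the exact sizes will vary with your matplotlib version and settings.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def compare_pdf_sizes(n_points=100_000, dpi=100, out_dir=None):
    """Save one scatter plot as a vector PDF and as a rasterized PDF.

    Returns a dict mapping "reg" and "rast" to the file sizes in bytes.
    (Hypothetical helper, written only for this comparison.)
    """
    out_dir = out_dir or tempfile.mkdtemp()
    rng = np.random.default_rng(0)  # fixed seed so both files show the same points
    x, y = rng.random((2, n_points))

    sizes = {}
    for name, rasterized in (("reg", False), ("rast", True)):
        fig, ax = plt.subplots()  # a new figure for each file
        ax.scatter(x, y, rasterized=rasterized)
        path = os.path.join(out_dir, f"tmp_{name}.pdf")
        fig.savefig(path, dpi=dpi)  # dpi controls the resolution of rasterized artists
        plt.close(fig)  # release the figure's memory
        sizes[name] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    for name, size in compare_pdf_sizes().items():
        print(f"tmp_{name}.pdf: {size / 1024:.0f} KiB")
```

Only the scatter points are rasterized here; the axes, ticks, and labels remain vector, so they stay sharp when you zoom in.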