PDFBox 2 unusual memory consumption


We are rendering images from several PDF files with PDFRenderer's renderImageWithDPI method. For some pages of one particular PDF, the renderer behaves differently.

Rendering these pages takes far longer than rendering other, similar pages, and memory consumption climbs to unusually high values. While the thread is inside renderImageWithDPI, the process's memory grows by about 50 MB every 1 to 2 seconds, until the application process uses around 5 GB of RAM. Once the thread returns from renderImageWithDPI, memory consumption drops by 1.5 to 2 GB almost immediately. Because of the high memory consumption, a java.lang.OutOfMemoryError ("Java heap space") is sometimes thrown.

The pages on which this happens are not visibly different from the others: they have the same width, height and size on disk. The rendering is done at 250 DPI with ImageType.RGB. The application also runs with the "-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider" parameter.
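To put the 5 GB figure in perspective, here is a rough, back-of-the-envelope estimate of how large the rendered bitmap itself is. It assumes an A4 page (595 x 842 points; the question does not give the real page size) and that ImageType.RGB is backed by a BufferedImage.TYPE_INT_RGB, i.e. 4 bytes per pixel. PDFBox's exact rounding of the pixel size may differ slightly from the floor used here.

```java
import java.awt.image.BufferedImage;

/**
 * Rough size of the bitmap produced by a 250 DPI RGB render.
 * Assumption: an A4 page (595 x 842 pt); the real page size is not known.
 */
final class RenderSizeEstimate {

    /** Pixels along one side: PDF points are 1/72 inch, so the scale is dpi / 72. */
    static int pixels(double points, float dpi) {
        return (int) Math.max(Math.floor(points * dpi / 72.0), 1);
    }

    /** Bytes held by a TYPE_INT_RGB image: one 32-bit int per pixel. */
    static long rgbBytes(int width, int height) {
        return (long) width * height * Integer.BYTES;
    }

    public static void main(String[] args) {
        double widthPt = 595, heightPt = 842; // assumed A4 page
        float dpi = 250f;
        int w = pixels(widthPt, dpi);
        int h = pixels(heightPt, dpi);
        long bytes = rgbBytes(w, h);
        System.out.printf("%d x %d px, about %.1f MiB%n", w, h, bytes / (1024.0 * 1024.0));

        // Cross-check: an actual TYPE_INT_RGB image of that size has w * h ints.
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        System.out.println("DataBuffer ints: " + img.getRaster().getDataBuffer().getSize());
    }
}
```

For A4 at 250 DPI this prints `2065 x 2923 px, about 23.0 MiB`, so the output image alone is two orders of magnitude below the observed gigabytes; presumably the growth comes from something else held while the page is being rendered.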

Is this a memory leak or expected behaviour? Also, could somebody explain why some pages use up 2 GB of memory and take 1 minute to render, while others render in a couple of seconds?


Solution

  • Analysis of the PDF shows that page 34 has over 10000 XObject elements, almost all of them CMYK images. You can see this yourself with the PDFDebugger command-line app: go to page 34, then resources, then XObject. Converting CMYK images is not very fast in Java. The memory usage is most likely due to us caching these images; you can observe that the next time the page is shown, it renders much faster. How to disable the cache is shown in the FAQ.

    I also get a speed improvement (21 seconds instead of 89 seconds) by using this option: -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true. However, image quality may be very slightly different; see PDFBOX-3569 for a discussion.
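The caching trade-off described in the answer (a fast second render, paid for with memory) can be illustrated with a small, self-contained sketch. `DecodedImageCache` and its decoder callback are hypothetical stand-ins, not PDFBox classes; the decoder represents the slow CMYK decode. In PDFBox 2 the real cache is, as far as I know, pluggable per document via `PDDocument.setResourceCache`; the FAQ has the actual snippet for disabling it.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical illustration (not PDFBox's API): caching decoded images makes a
 * repeated render fast, but every cached image stays in memory.
 */
final class DecodedImageCache {
    private final Map<Long, BufferedImage> cache = new HashMap<>();
    private final boolean enabled;
    private final LongFunction<BufferedImage> decoder; // stands in for the slow decode
    private int decodeCount;

    DecodedImageCache(boolean enabled, LongFunction<BufferedImage> decoder) {
        this.enabled = enabled;
        this.decoder = decoder;
    }

    /** Returns the image for an object id, decoding only on a cache miss (or always, if disabled). */
    BufferedImage get(long objectId) {
        if (enabled) {
            BufferedImage hit = cache.get(objectId);
            if (hit != null) {
                return hit;
            }
        }
        BufferedImage img = decoder.apply(objectId);
        decodeCount++;
        if (enabled) {
            cache.put(objectId, img);
        }
        return img;
    }

    int decodeCount() {
        return decodeCount;
    }

    /** Approximate bytes pinned by cached pixel data (ignores object overhead). */
    long retainedBytes() {
        long total = 0;
        for (BufferedImage img : cache.values()) {
            DataBuffer buf = img.getRaster().getDataBuffer();
            long bitsPerElement = DataBuffer.getDataTypeSize(buf.getDataType());
            total += (long) buf.getSize() * buf.getNumBanks() * bitsPerElement / 8;
        }
        return total;
    }

    public static void main(String[] args) {
        // A page with many small images, rendered twice (e.g. the page is shown again).
        int imagesOnPage = 1000;
        for (boolean enabled : new boolean[] {true, false}) {
            DecodedImageCache c = new DecodedImageCache(enabled,
                    id -> new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB));
            for (int pass = 0; pass < 2; pass++) {
                for (long id = 0; id < imagesOnPage; id++) {
                    c.get(id);
                }
            }
            System.out.printf("cache=%b: %d decodes, %d bytes retained%n",
                    enabled, c.decodeCount(), c.retainedBytes());
        }
    }
}
```

With the cache on, the second pass costs no decodes but 16,384,000 bytes stay reachable; with it off, every pass decodes again and nothing is retained. With 10000 images on one page, the retained amount scales accordingly.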
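The speed versus quality trade-off behind the pure-Java CMYK option can be shown in miniature with the simplest possible conversion. This is the naive textbook formula, assuming 8-bit interleaved CMYK input; it is NOT the algorithm PDFBox uses for UsePureJavaCMYKConversion. Unlike an ICC-based conversion it ignores the colour profile entirely, so its colours can differ noticeably; it only illustrates why a simpler code path can be much cheaper per pixel without producing identical output.

```java
import java.awt.image.BufferedImage;

/**
 * Naive, profile-free CMYK -> RGB conversion, for illustration only.
 * Not PDFBox's implementation.
 */
final class NaiveCmyk {

    /** Converts one CMYK sample (each component 0..255) to packed 0xRRGGBB. */
    static int toRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    /** Converts interleaved CMYK bytes (4 per pixel, row-major) into a TYPE_INT_RGB image. */
    static BufferedImage toImage(byte[] cmyk, int width, int height) {
        if (cmyk.length != width * height * 4) {
            throw new IllegalArgumentException("expected " + (width * height * 4) + " bytes");
        }
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int[] row = new int[width];
        for (int yPos = 0; yPos < height; yPos++) {
            for (int x = 0; x < width; x++) {
                int i = (yPos * width + x) * 4;
                // & 0xFF turns the signed Java byte into an unsigned 0..255 value
                row[x] = toRgb(cmyk[i] & 0xFF, cmyk[i + 1] & 0xFF,
                               cmyk[i + 2] & 0xFF, cmyk[i + 3] & 0xFF);
            }
            out.setRGB(0, yPos, width, 1, row, 0, width); // write one row at a time
        }
        return out;
    }

    public static void main(String[] args) {
        // Two pixels: pure cyan, then 100% black.
        byte[] cmyk = {(byte) 255, 0, 0, 0, 0, 0, 0, (byte) 255};
        BufferedImage img = toImage(cmyk, 2, 1);
        System.out.printf("%06X %06X%n", img.getRGB(0, 0) & 0xFFFFFF, img.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

Running it prints `00FFFF 000000`.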