optimization pdf-generation ghostscript pdftk qpdf

PDF Optimization - Image Load Before Embedded Text - See Examples Provided


I have been trying to find a way to make our OCRed PDF (bad-uc.pdf) behave the same as the version saved with Infix (good-uc.pdf).

If you open the following two files in Acrobat Reader (any version should show the same problem), you will see that bad-uc.pdf loads the text before the page image (very slowly), whereas good-uc.pdf loads everything together (and seems much faster and more responsive).

good-uc.pdf: https://drive.google.com/file/d/0B-Nxr9ySWJnNX2dZSmVscEZIRmc/view?usp=sharing
bad-uc.pdf: https://drive.google.com/file/d/0B-Nxr9ySWJnNN2t6X2hFNTBxa0U/view?usp=sharing

I have tried pdftk, pdftops, ghostscript, pdf2ps, ps2pdf and qpdf, but still couldn't get the images to load before the text. Can a PDF expert shed some light on why these two PDFs behave differently?

My guess is that Infix restructures the PDF so the images get loaded before the embedded text, but I cannot find a Linux command-line tool that can do this kind of PDF structure optimization.

Greatly appreciated!! Jeffrey


Solution

  • shed some light on why these two PDFs behave differently

    Actually, both of your PDFs take about the same time to be displayed properly by Adobe Reader on my computer. But while your bad-uc.pdf first shows the OCR'ed text and then covers it with the scan, good-uc.pdf first seems to show a blank page and then covers it with the scan.

    The cause of this is that good-uc.pdf paints the OCR'ed text in text rendering mode 3 ("invisible", neither filled nor stroked), while bad-uc.pdf paints it normally in text rendering mode 0 ("fill") with the fill color set to black. As invisible painting may require less time than actually painting black on white, there might even be an objective difference between the rendering times, but I think it is mostly subjective.
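
To make the difference concrete: in a page content stream, the text rendering mode is set by the `Tr` operator (for example `3 Tr`), and mode 0 applies when no `Tr` appears at all. The following is only a rough sketch in Python, assuming you already have an uncompressed content stream as bytes (for instance from a PDF rewritten in an uncompressed form by a tool such as qpdf). The function names and sample streams are made up for illustration, and the regular expressions are naive: they do not skip string literals or inline image data, so a real tool should use a proper PDF parser instead.

```python
import re

# "<mode> Tr" sets the text rendering mode; valid modes are 0 through 7.
# (?<!\S) and (?!\S) require the match to be delimited by whitespace or the stream ends.
TR_OPERATOR = re.compile(rb"(?<!\S)([0-7])\s+Tr(?!\S)")
# "BT" opens a text object.
BT_OPERATOR = re.compile(rb"(?<!\S)BT(?!\S)")


def text_rendering_modes(content: bytes) -> list:
    """Return every mode set explicitly via Tr; an empty list means the default, mode 0."""
    return [int(m.group(1)) for m in TR_OPERATOR.finditer(content)]


def make_text_invisible(content: bytes) -> bytes:
    """Naively force mode 3: rewrite existing Tr operators and add one after each BT."""
    content = TR_OPERATOR.sub(b"3 Tr", content)
    return BT_OPERATOR.sub(b"BT 3 Tr", content)


# Hypothetical sample streams, not taken from the actual files.
good_like = b"BT /F1 12 Tf 3 Tr 72 700 Td (Hello) Tj ET"
bad_like = b"BT /F1 12 Tf 72 700 Td (Hello) Tj ET"

print(text_rendering_modes(good_like))
print(text_rendering_modes(bad_like))
print(make_text_invisible(bad_like))
```

Applied to an OCR layer, a rewrite like `make_text_invisible` would hide all text on the page, so it only makes sense when every text object in the stream belongs to the OCR layer. Whether it also changes the perceived loading speed would have to be checked in the viewer.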