pdf ghostscript printer-control-language

GhostPCL creates invalid pdf


I just downloaded GhostPCL.

Here's how I am calling GhostPCL:

> gpcl6win64.exe -sDEVICE=pdfwrite -o C:\temp\output.pdf C:\temp\input.spl

Input/Output: Get it from my DropBox

The generated pdf seems to be broken.

I cannot select text as expected

and when I copy the selected content to notepad it looks like this:

PDF-Content

Am I missing something, or is there a bug in GhostPCL?


Solution

  • That's because PCL carries very limited information about what a given character code means in terms of another encoding, such as Unicode.

    It's entirely possible for a PCL page to download a custom subset font, and then use character codes which only work 'correctly' with that font.

    For example, say that we embed the font in such a way that we assign character code 1 to the first character we use, character code 2 to the second, and so on. Then we send the text "Hello World".

    That would then be represented in the PCL as

    0x01 0x02 0x03 0x03 0x04 0x05 0x06 0x04 0x07 0x03 0x08

    Obviously, that's not any kind of encoding which makes sense on its own, and PCL doesn't have any means of carrying a Unicode mapping around.

    Now, your PCL file contains several TrueType fonts, and it's 'possible' that there is enough information in the CMAP subtables of the fonts to resurrect some kind of meaning from the 'text', but GhostPCL doesn't have that kind of sophistication.

    So no, you aren't missing anything, and no, there isn't a bug. Please note that the goal for pdfwrite is that the resulting PDF file should be visually the same as the printed output, nothing more. Despite people's wishful thinking, PDF was never designed as an editable format, and the vast majority of PDF files cannot be edited, nor can text be reliably extracted from them. Some will work, many won't.
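
To make the "Hello World" example concrete, here is a minimal Python sketch of the subset encoding described in the answer. It is only an illustration of the idea, not how GhostPCL or any printer driver is actually implemented; the function names `build_subset_encoding` and `encode` are hypothetical, and Latin-1 is just an assumed stand-in for whatever a PDF viewer might guess when no mapping is available.

```python
def build_subset_encoding(text):
    """Assign character codes 1, 2, 3, ... in order of first use (hypothetical helper)."""
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = len(code_for_char) + 1
    return code_for_char


def encode(text, code_for_char):
    """Turn the text into the byte stream that would appear in the print data."""
    return bytes(code_for_char[ch] for ch in text)


text = "Hello World"
code_for_char = build_subset_encoding(text)
encoded = encode(text, code_for_char)

# The same sequence as in the answer above.
hex_codes = " ".join(f"0x{b:02X}" for b in encoded)
print(hex_codes)

# Without the mapping, a consumer can only guess an encoding (Latin-1 assumed
# here), which yields control characters rather than readable text.
naive = encoded.decode("latin-1")
print(repr(naive))

# With the mapping, the text is trivially recoverable. This is the information
# that the PCL stream does not carry.
char_for_code = {code: ch for ch, code in code_for_char.items()}
recovered = "".join(char_for_code[b] for b in encoded)
print(recovered)
```

The glyphs still render correctly because the downloaded font maps code 1 to the shape of "H", code 2 to "e", and so on; only the reverse lookup from code to Unicode is missing, which is why copy and paste produces garbage.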