pdf hyperlink ghostscript

PostScript - Preserve internal hyperlinks in PDF


My original question, "ps2pdf - Unable to open initial device", was thankfully answered by @KenS, but I have now run into another problem: my internal hyperlinks (e.g. "see Figure 1") are lost when I convert my PDF using gswin64. This is my command:

gswin64 -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -o output.pdf input.pdf

I uploaded a minimal example of the original PDF here (it will be deleted after 2 weeks) and the converted version here. I found this answer, also from @KenS, on the possibility of extracting or preserving the links, but it just says "some PostScript programming". Is there another, "simple" way of achieving this? I found online PDF converters that are able to do so, so there must be a way.


Solution

  • The 'some PostScript programming' refers to extracting the Link information. As that answer says, the PDF interpreter already does this for the benefit of the pdfwrite device.

    Your problem is that the Link annotation uses a Named Destination:

    30 0 obj
    <<
      /Type /Annot
      /Subtype /Link
      /Border [ 0 0 1 ]
      /H /I
      /C [ 1 0 0 ]
      /Rect [ 387.470001 700.413025 394.916992 713.314026 ]
      /A <<
        /S /GoTo
        /D (figure.caption.1)
      >>
    >>
    endobj
    

    The Names tree contains:

    52 0 obj
    <<
      /Names [ (Doc-Start) 34 0 R (figure.caption.1) 36 0 R (page.1) 
        33 0 R ]
      /Limits [ (Doc-Start) (page.1) ]
    >>
    endobj
    

    The named destination figure.caption.1 points to object 36:

    36 0 obj
    <<
      /D [ 29 0 R /XYZ 117.828003 696.228027 null ]
    >>
    endobj
    

    Now that could instead have been written much more simply by putting the content of object 36 in place of figure.caption.1 in the original destination, e.g.:

    30 0 obj
    <<
      /Type /Annot
      /Subtype /Link
      /Border [ 0 0 1 ]
      /H /I
      /C [ 1 0 0 ]
      /Rect [ 387.470001 700.413025 394.916992 713.314026 ]
      /A <<
        /S /GoTo
        /D [ 29 0 R /XYZ 117.828003 696.228027 null ]
      >>
    >>
    endobj
    

    I think that the latter, simpler construct would work, whereas indirection through the Names tree does not. I think this is because the pdfwrite device doesn't preserve the Names tree, so it can't preserve any links which rely on it.
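
    To make the two constructions concrete, here is a minimal Python sketch of the lookup a viewer has to do for the named form, and of what "inlining" the destination means. It is only a toy model of the objects quoted above: the dictionaries, the string object references and the function names are assumptions made for illustration, not the API of Ghostscript or of any PDF library.

```python
# Toy model of the objects quoted above. The data layout is hypothetical.

# Flat /Names array from object 52: alternating name, object reference.
NAMES = ["Doc-Start", "34 0 R", "figure.caption.1", "36 0 R", "page.1", "33 0 R"]

# Only object 36 is quoted in the answer, so it is the only one modelled here.
OBJECTS = {"36 0 R": {"D": ["29 0 R", "/XYZ", 117.828003, 696.228027, None]}}


def resolve_named_destination(name, names, objects):
    """Return the explicit destination array that a named destination stands for."""
    pairs = dict(zip(names[0::2], names[1::2]))  # name -> object reference
    return objects[pairs[name]]["D"]


def inline_destination(annotation, names, objects):
    """Return a copy of a Link annotation whose named /D is replaced by the explicit array."""
    action = dict(annotation["A"])
    if isinstance(action["D"], str):  # a string means "look it up in the Names tree"
        action["D"] = resolve_named_destination(action["D"], names, objects)
    return {**annotation, "A": action}


link = {"Subtype": "/Link", "A": {"S": "/GoTo", "D": "figure.caption.1"}}
resolved = inline_destination(link, NAMES, OBJECTS)
print(resolved["A"]["D"])
```

    The resolved annotation carries the same array as the hand-written second version of object 30, so it no longer depends on the Names tree being present.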

    In fact, I'm not convinced the current code preserves Link annotations at all, which it should, so I'm looking at that now.

    [EDIT]

    So this is a wrinkle I had forgotten....

    The PDF interpreter has to treat annotations in two different ways, depending on whether the PDF is being printed or not. See the PDF 1.7 Reference, section 8.4.2 Annotation Flags, bit position 3.

    If the file is being 'Printed' then there is no point in preserving Link annotations (how on earth would you click a link on the printed output?).

    So when Printed is true, which is the default value, then the PDF interpreter doesn't preserve certain kinds of annotations. You can alter this quite easily by setting -dPrinted=false on the command line.
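
    Applied to the command from the question, that amounts to adding one switch. The sketch below only assembles the argument list in Python, so it can be inspected without Ghostscript installed. The helper name build_gs_command is made up for this example, and the file names are the ones from the question.

```python
import shlex


def build_gs_command(src, dst, printed=False):
    """Build the gswin64 argument list from the question, with -dPrinted set explicitly."""
    return [
        "gswin64",
        "-dPDFSETTINGS=/ebook",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPrinted={'true' if printed else 'false'}",
        "-o", dst,
        src,
    ]


cmd = build_gs_command("input.pdf", "output.pdf")
print(shlex.join(cmd))

# To actually run the conversion (requires Ghostscript on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```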

    NOTE: Some annotations have the 'Print' flag set, which is what this is all about. If you set Printed to false, annotations which have the Print flag set will not be preserved. If you set Printed to true, those annotations will be preserved, but annotations whose Print flag is 0 will not be. There is currently no way to have the PDF interpreter preserve both annotations with Print true and ones with Print false. This is likely to change in a future release because people do ask for it.
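
    The rule in the note can be written down as a small Python sketch. It models only the behaviour as stated here, not Ghostscript's actual code, and the function names are hypothetical. Bit position 3 of the annotation's /F entry (counting the low-order bit as position 1) is the Print flag, and an annotation with no /F entry, like object 30 above, has a flags value of 0.

```python
PRINT_FLAG = 1 << 2  # bit position 3 of /F, counting the low-order bit as position 1


def has_print_flag(flags):
    """True if the Print bit is set in an annotation's /F value."""
    return bool(flags & PRINT_FLAG)


def is_preserved(flags, printed):
    """Rule as described in the note: kept only when the Print flag matches -dPrinted."""
    return has_print_flag(flags) == printed


link_flags = 0  # object 30 has no /F entry, so its flags default to 0

print(is_preserved(link_flags, printed=True))   # default setting
print(is_preserved(link_flags, printed=False))  # with -dPrinted=false
```

    Under this model the Link annotation from the question is dropped with the default setting and kept with -dPrinted=false, which matches the behaviour described.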

    If you set -dPrinted=false, your Link annotation will be preserved. I should note that it will not be the same construction as was in your original PDF file. It will use the simpler construction where the destination is explicitly stated in the Link annotation itself, rather than indirecting through the Names tree.

    The effect is the same, but it's an example of the kind of thing which is described in the documentation. I presume this won't be a problem for you though.

    Given the way the original file is constructed, I'm not surprised that the pdfwrite output is smaller! For some reason this file contains eight Forms, eight shadings and two colour spaces (one of which is empty) none of which appear to actually be used....