
Create destinations for all bookmarks in a PDF file with iText API


I'd like to write some Java code that takes a PDF document and creates named destinations from all of its bookmarks. I think the iText API is the easiest way to do this, but I have never used it before.

How would you go about writing this sort of code with the iText API? Can iText do the parsing needed to manipulate existing PDFs by itself? The kind of manipulations I am thinking of are:

Or is there a different API that would be better?


Solution

  • I'll warn you up front that you may be disappointed with this. iText isn't really intended to be used as a parser; it's geared more toward creating entirely new PDF documents. Still, you can take a whack at it.

    To start with, iText won't let you modify the existing PDF document in place. What you can do, though, is make a copy with the additional features that you want. (If somebody else knows better, please let me know; this drives me crazy.)

    Create a PdfReader object from an input stream on your source file. Then create a PdfCopy object for your destination; PdfCopy is just an extended PdfWriter that makes getting data from an existing source more convenient.

    As far as I can tell, the bookmarks cannot be obtained from iText at all, so another library may be needed. I think jpedal may be able to extract them (it can return them as an XML document, which you may then have to parse to get what you want). However you get them, you can then add them to a java.util.List and set that list as the outline on the PdfCopy. The bookmarks themselves are just HashMaps with a particular set of keys. I'm not sure what all of the keys are, but they include "Title", "Action" (which seems to be where you'd specify that this is a named destination, though I don't know what that value would be), and "URI" (which is used if this is an external link; I suspect that this would specify the name of the named destination that you're linking to). Again, the specifics are hard to find.

    Then iterate over the pages of the reader, importing each page into the PdfCopy. This page may help you.

    Sorry I'm not more helpful to you. Good luck.

    P.S. If anybody else knows of a better tool that's either (L)GPL or BSD licensed, I'd love to hear about it.
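
To make the bookmark structure described above a little more concrete, here is a minimal sketch in plain Java (no iText calls) that walks a list of bookmark maps, invents a destination name for each one from its "Title", and collects a name-to-target table. Only "Title" comes from the answer; the "Page", "Named", and "Kids" keys, the class name `BookmarkDestinations`, and the naming scheme are assumptions made for illustration and should be checked against whatever library actually supplies the maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of iText. */
final class BookmarkDestinations {

    /**
     * Walks the bookmark tree depth-first. Every bookmark that carries a
     * "Page" entry (assumed key) gets a generated name stored under "Named"
     * (assumed key), and the name -> page target pair is recorded.
     * The bookmark maps are modified in place.
     */
    static Map<String, String> toNamedDestinations(List<Map<String, Object>> bookmarks) {
        Map<String, String> destinations = new LinkedHashMap<>();
        convert(bookmarks, destinations);
        return destinations;
    }

    @SuppressWarnings("unchecked")
    private static void convert(List<Map<String, Object>> bookmarks,
                                Map<String, String> destinations) {
        for (Map<String, Object> bookmark : bookmarks) {
            Object page = bookmark.remove("Page");
            if (page != null) {
                String name = uniqueName(bookmark.get("Title"), destinations);
                destinations.put(name, page.toString());
                bookmark.put("Named", name);
            }
            // Nested bookmarks are assumed to live in a list under "Kids".
            Object kids = bookmark.get("Kids");
            if (kids instanceof List) {
                convert((List<Map<String, Object>>) kids, destinations);
            }
        }
    }

    /** Builds a name from the title and appends _2, _3, ... on collisions. */
    private static String uniqueName(Object title, Map<String, String> taken) {
        String base = (title == null) ? "" : title.toString().trim();
        base = base.replaceAll("[^A-Za-z0-9]+", "_");
        if (base.isEmpty()) {
            base = "bookmark";
        }
        String name = base;
        for (int i = 2; taken.containsKey(name); i++) {
            name = base + "_" + i;
        }
        return name;
    }
}
```

Calling `BookmarkDestinations.toNamedDestinations(bookmarks)` returns the table of generated names and leaves each bookmark pointing at its name instead of a page. The rewritten list is what you would then set as the outline on the copy, and the returned table is what would have to be registered as named destinations by whichever writer you end up using.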