Tags: pdf, itext, acrobat, aspose, acrobat-sdk

How to associate a search catalog file (.pdx) with a PDF document


Using a .NET application, I am trying to create a PDF "table of contents" that references other files, such as one would distribute on a DVD.

For this purpose, I need a search index and catalog, so full-text search will work across documents. I have been able to automate the construction of the index by copying an "old" .pdx file (the directory structure is always the same) and then calling JavaScript from C#:

var js = $@"catalog.getIndex(""{pdxFilePath}"").build('alert(""Hello"")', true)";

formFields.ExecuteThisJavascript(js);

But how can I associate the .pdx file with my .pdf document, so it gets loaded automatically?

In Acrobat, this is set in the "advanced" document properties:

[Screenshot: Acrobat document properties]

However, this is not accessible via the info or metadata properties of the document. Apparently this is stored somewhere else, but I don't know enough about the PDF format to figure out how to access this data:

[Screenshot: PDF structure]

Any help would be highly appreciated. I could use either the Adobe SDK/JavaScript API or some other library (for instance, I know we already have an Aspose license).


Solution

  • Answering my own question here... I was able to solve this using PdfSharp.

    The following code is compatible with PdfSharp 1.50.4845-RC2a.

    pdxFile should be the name of the .pdx file including the file extension (e.g. "catalog.pdx"). I have only tested this with .pdx files located in the same folder as the PDF document, but I would assume that relative paths in general should work.

    No guarantees that this is a perfect solution as I lack a deeper understanding of the PDF format, but this seems to work at least.

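        // Requires: using PdfSharp.Pdf;
        // Adds this entry to the document catalog:
        //   /Search << /Indexes [ << /Name /PDX /Index << /Type /Filespec /F (pdxFile) >> >> ] >>
        private void SetSearchCatalog(PdfDocument doc, string pdxFile)
        {
            // File specification that points at the .pdx file.
            var indexDict = new PdfDictionary(doc);
            indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
            indexDict.Elements["/Type"] = new PdfName("/Filespec");

            // One entry of the index list: the index type (/PDX) plus its file specification.
            var indexArrayItemDict = new PdfDictionary(doc);
            indexArrayItemDict.Elements["/Index"] = indexDict;
            indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");

            // The list of indexes, here containing a single entry.
            var indexArray = new PdfArray(doc, indexArrayItemDict);

            var searchDict = new PdfDictionary(doc);
            searchDict.Elements["/Indexes"] = indexArray;

            // Attach the search dictionary to the document catalog (the root object).
            doc.Internals.Catalog.Elements["/Search"] = searchDict;
        }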
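
    For completeness, here is a rough sketch of how the method above might be called. The class name CatalogLinker, the method AttachCatalog and its parameters are made up for illustration; only PdfReader.Open, the Elements accessors and Save are PdfSharp calls. It assumes the .pdx file will sit next to the saved PDF, since that is the only layout I tested.

    ```csharp
    using System;
    using PdfSharp.Pdf;
    using PdfSharp.Pdf.IO;

    public class CatalogLinker   // hypothetical class name
    {
        // Hypothetical helper: opens inputPath, links pdxFile, writes the result to outputPath.
        public void AttachCatalog(string inputPath, string outputPath, string pdxFile)
        {
            // Open the existing PDF in Modify mode so it can be changed and saved again.
            PdfDocument doc = PdfReader.Open(inputPath, PdfDocumentOpenMode.Modify);

            SetSearchCatalog(doc, pdxFile);

            // Optional sanity check: read the entry back from the catalog before saving.
            PdfDictionary search = doc.Internals.Catalog.Elements.GetDictionary("/Search");
            if (search == null || search.Elements.GetArray("/Indexes") == null)
                throw new InvalidOperationException("The /Search entry was not written.");

            doc.Save(outputPath);
        }

        // SetSearchCatalog(PdfDocument, string) from the listing above goes here.
    }
    ```

    A call might then look like new CatalogLinker().AttachCatalog("contents.pdf", "contents-linked.pdf", "catalog.pdx"), where all three file names are placeholders. Opening the saved file in Acrobat and checking the "advanced" document properties is still the real test of whether the catalog was picked up.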