Using a .NET application, I am trying to create a PDF "table of contents" that references other files, like one would distribute on a DVD etc.
For this purpose, I need a search index and catalog, so full-text search will work across documents. I have been able to automate the construction of the index by copying an "old" .pdx file (the directory structure is always the same) and then calling JavaScript from C#:
var js = $@"catalog.getIndex(""{pdxFilePath}"").build('alert(""Hello"")', true)";
formFields.ExecuteThisJavascript(js);
But how can I associate the .pdx file with my .pdf document, so it gets loaded automatically?
In Acrobat, this is set in the "advanced" document properties.
However, this is not accessible via the info or metadata properties of the document.
Apparently this is stored somewhere else, but I don't know enough about the PDF format to figure out how to access this data.
Any help would be highly appreciated. I could use both the Adobe SDK/JavaScript API or some other library (for instance, I know we already have an Aspose license).
Answering my own question here... I was able to solve this using PdfSharp.
The following code is compatible with PdfSharp 1.50.4845-RC2a.
pdxFile should be the name of the .pdx file including the file extension (e.g. "catalog.pdx"). I have only tested this with .pdx files located in the same folder as the PDF document, but I would assume that relative paths in general should work.
No guarantees that this is a perfect solution as I lack a deeper understanding of the PDF format, but this seems to work at least.
private void SetSearchCatalog(PdfDocument doc, string pdxFile)
{
    // File specification dictionary pointing at the .pdx file.
    var indexDict = new PdfDictionary(doc);
    indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
    indexDict.Elements["/Type"] = new PdfName("/Filespec");

    // One entry of the index list: the file specification plus its type name (/PDX).
    var indexArrayItemDict = new PdfDictionary(doc);
    indexArrayItemDict.Elements["/Index"] = indexDict;
    indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");

    // The /Search dictionary holds an /Indexes array with that single entry.
    var indexArray = new PdfArray(doc, indexArrayItemDict);
    var searchDict = new PdfDictionary(doc);
    searchDict.Elements["/Indexes"] = indexArray;

    // Attach the /Search dictionary to the document catalog.
    doc.Internals.Catalog.Elements["/Search"] = searchDict;
}
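For completeness, here is a rough sketch of how the method might be called. The class name PdfTocBuilder, the method name AttachIndexAndSave and its parameters are made up for illustration; the sketch assumes SetSearchCatalog is declared in the same (partial) class and that the PdfSharp package is referenced. It has only been reasoned through against PdfSharp 1.50, not tested beyond that.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical class name; SetSearchCatalog from above is assumed to be
// declared in another part of this same class.
public partial class PdfTocBuilder
{
    // Hypothetical helper: all three parameters are illustrative.
    public void AttachIndexAndSave(string inputPdf, string outputPdf, string pdxFile)
    {
        // Modify mode loads the document so that its catalog can be changed.
        PdfDocument doc = PdfReader.Open(inputPdf, PdfDocumentOpenMode.Modify);
        SetSearchCatalog(doc, pdxFile);
        doc.Save(outputPdf);

        // Reopen the result and check that the catalog now has a /Search entry.
        // Based on the dictionaries built in SetSearchCatalog, it should
        // look roughly like:
        //   /Search << /Indexes [ << /Name /PDX
        //                            /Index << /Type /Filespec /F (catalog.pdx) >> >> ] >>
        PdfDocument check = PdfReader.Open(outputPdf, PdfDocumentOpenMode.ReadOnly);
        bool hasSearch = check.Internals.Catalog.Elements.ContainsKey("/Search");
        Console.WriteLine(hasSearch ? "Search index attached." : "No /Search entry found.");
    }
}
```

Writing to a separate output file keeps the original PDF untouched, which makes it easy to compare the two in Acrobat if the index does not load as expected.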