java pdf accessibility pdfbox

Why are structure elements not tagged on new pages in my PDF, even though the first page works?


I'm trying to make my PDF accessible and pass the PAC test using Apache PDFBox. Thanks to some helpful guidance here, especially from Tilman Hausherr, I got structure tagging working on the first page (see the earlier answer), but I'm running into a problem with subsequent pages.

The structure elements on the first page are tagged properly, but when I create new pages, they are not tagged, even though I'm following the same logic to create the structure elements and associate them with the page.

Here's my code to create a new page:

this.page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
writeHinweis(font, hinweis);
// Set the parent struct count according to the element type (page, image, annotation)
setParentStructCount(page);
addFontResource();

For creating the structure element, I'm using the following:

private PDStructureElement createPDStructureElement(String childElementType, PDStructureElement parentElement) {
    PDStructureElement pdStructureElement = new PDStructureElement(childElementType, parentElement);
    pdStructureElement.setPage(page);
    parentElement.appendKid(pdStructureElement);
    return pdStructureElement;
}

And for adding references to the structure element:

private PDMarkedContentReference erstellPDMarkedContentReference() {
    PDMarkedContentReference mcr = new PDMarkedContentReference();
    mcr.setPage(page); // added in an attempt to fix the problem; a single page works without it
    mcr.setMCID(mcidCounter);
    return mcr;
}

I also use the following code to create the Marked Content Dictionary:

private COSDictionary erstellMarkedContentDictionary(String altText) {
    COSDictionary markedContentDictionary = new COSDictionary();
    markedContentDictionary.setInt(COSName.MCID, mcidCounter);
    markedContentDictionary.setString(COSName.ALT, altText);
    markedContentDictionary.setName(COSName.S, "span"); // also tried this in an attempt to fix the issue
    markedContentDictionary.setItem(COSName.PAGE, page); // also tried this in an attempt to fix the issue
    return markedContentDictionary;
}

In the content stream, I start and end marked content like this:

contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(dictionary));
// End content stream
contentStream.endMarkedContent();

Additionally, I'm using:

structureTreeRoot.setParentTreeNextKey(mcidCounter);
mcidCounter++;
structureElementArray.add(element);

Finally, I bind the structure elements to the structure tree:

private void bindStructureTreeRootMitStructureElements() {
    COSDictionary parentTreeRoot = new COSDictionary();
    PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);

    Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
    parentTreeMap.put(0, structureElementArray);

    parentTree.setNumbers(parentTreeMap);
    structureTreeRoot.setParentTree(parentTree);
}

Everything looks fine in PAC: the structure tree appears correctly. However, when I click a marked-content item on the second or third page (the text I expect to be tagged), PAC shows a blank page instead of the tagged text.

What am I missing?

The first page works, but the structure tagging doesn't seem to apply properly to subsequent pages. Any guidance on where the issue might be?

Here is a link to the generated PDF (GeneratedPDFFile). The footnote word "test" on the first page is tagged, but the second footnote, on the second page, is not.


Solution

  • Your pages have /StructParents 0 and /StructParents 1, but your parent tree only has an entry for key 0, so nothing maps the second page's marked content back to its structure elements:

    <</Limits[0 0]/Nums[0[12 0 R 29 0 R]]>>
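
To make the lookup concrete, here is a minimal plain-Java model of a parent tree. It is only a sketch and does not use PDFBox: the class ParentTreeSketch and its methods are hypothetical names invented for illustration, and structure elements are represented as plain strings. It assumes the usual PDF rule that a page's /StructParents value is the key into /Nums, and that the array stored under that key is indexed by MCID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Plain-Java model of a PDF parent tree (/Nums). Illustrative only; not PDFBox code. */
final class ParentTreeSketch {
    // Key: a page's /StructParents value.
    // Value: that page's structure elements, indexed by MCID.
    private final TreeMap<Integer, List<String>> nums = new TreeMap<>();

    /** Registers the structure elements of one page under its /StructParents key. */
    void put(int structParentsKey, List<String> elementsByMcid) {
        nums.put(structParentsKey, new ArrayList<>(elementsByMcid));
    }

    /**
     * Looks up the structure element that owns a piece of marked content.
     * Returns null when the page has no entry in the tree or the MCID is out of range.
     */
    String resolve(int structParentsKey, int mcid) {
        List<String> elements = nums.get(structParentsKey);
        if (elements == null || mcid < 0 || mcid >= elements.size()) {
            return null;
        }
        return elements.get(mcid);
    }

    /** Returns the next unused key, i.e. one more than the largest key stored so far. */
    int nextKey() {
        return nums.isEmpty() ? 0 : nums.lastKey() + 1;
    }
}
```

With only key 0 registered, as in the dump above, resolve(1, 0) returns null: that is the situation on the second page. Once a second entry is added under key 1, the same lookup succeeds. In the question's code this would presumably mean putting one entry per page into the map passed to parentTree.setNumbers(...), keyed by the same number the page carries as /StructParents, rather than a single entry under 0.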