java pdf accessibility pdfbox tagged-pdf

Issue Creating PDF with PDF/UA Compliance - PAC Checker Shows "Test Object Not Tagged"

I'm currently working on generating a PDF file that is PDF/UA-compliant. My main goal is to ensure that it meets accessibility standards and passes the PAC (PDF Accessibility Checker) tool.

The problem I'm facing is that the PAC checker consistently flags my PDF with an error saying: "Test Object Not Tagged." This suggests that the necessary tagging might be missing or improperly implemented, but I’m not sure what I’m missing to address this.

I've made sure to:

Define document structure elements
Use the right tags in the PDF content generation code
Apply what I believe to be the correct metadata and settings for accessibility

However, it seems I'm still missing something fundamental for the tagging to be recognized. What steps might I be overlooking to achieve the proper tagging in a PDF for PDF/UA compliance?

Any insights on where I might be going wrong, or pointers on critical tagging elements to check, would be greatly appreciated.

Below is my code.

public void newPdf() throws IOException {
        int mcidCounter = 0; // start at 0
        int structParentCounter = 0; // start at 0
        PDDocument document = new PDDocument();
        PDPage page = new PDPage(PDRectangle.A4);
        document.addPage(page);

        // Set the StructParents entry on the page
        page.setStructParents(structParentCounter); // structParentCounter is 0

        PDPageContentStream contentStream = new PDPageContentStream(document, page);
        PDType0Font font = loadFont(FontEnum.BUNDES_SANS_WEB_REGULAR, document);

        // Add the font to the page resources
        PDResources resources = page.getResources();
        if (resources == null) {
            resources = new PDResources();
            page.setResources(resources);
        }
        resources.add(font);

        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
        catalog.setStructureTreeRoot(structureTreeRoot);

        catalog.setLanguage("de-DE"); // sets the document language to German

        PDMarkInfo markInfo = new PDMarkInfo();
        markInfo.setMarked(true);
        catalog.setMarkInfo(markInfo);

        // Create the document structure element
        PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
        structureTreeRoot.appendKid(documentElement);

        // Create the paragraph structure element
        PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
        paragraphElement.setPage(page);
        documentElement.appendKid(paragraphElement);

        // Prepare the marked content with an MCID
        COSDictionary markedContentDictionary = new COSDictionary();
        markedContentDictionary.setInt(COSName.MCID, mcidCounter);

        // Begin the marked content
        contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
        contentStream.setFont(font, 12);
        contentStream.beginText();
        contentStream.newLineAtOffset(50, 700);
        contentStream.showText("Hallo Welt");
        contentStream.endText();
        contentStream.endMarkedContent();

        // Close the content stream
        contentStream.close();

        // Create the parent tree and link it to the structure element
        COSDictionary parentTreeRoot = new COSDictionary();
        PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);

        // Map the StructParents value to the structure element
        Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
        parentTreeMap.put(structParentCounter, paragraphElement);
        parentTree.setNumbers(parentTreeMap);
        structureTreeRoot.setParentTree(parentTree);

        // Save the document
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        document.save(outputStream);
        byte[] pdfBytes = outputStream.toByteArray();
        document.close();


        Path actualPath = Path.of("src/test/resources/pdf/small_test.pdf");
        Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

And this is what the PAC tool shows: the "Test Object Not Tagged" error.

Solution

Here are the changes I made to get it to pass PAC:

Set DisplayDocTitle in the viewer preferences:

PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
prefs.setDisplayDocTitle(true);
catalog.setViewerPreferences(prefs);

Add a marked-content reference carrying the MCID as a kid of the paragraph element:

PDMarkedContentReference mcr = new PDMarkedContentReference();
mcr.setMCID(mcidCounter);
paragraphElement.appendKid(mcr);
// alternative:
//paragraphElement.appendKid(PDMarkedContent.create(null, mcr.getCOSObject()));

This was the most difficult problem. Replace

 parentTreeMap.put(structParentCounter, paragraphElement);

with

COSArray ar = new COSArray();
ar.add(paragraphElement);
parentTreeMap.put(structParentCounter, ar); // must be array here, despite only 1 element

Also set the next free key of the parent tree:

structureTreeRoot.setParentTreeNextKey(structParentCounter + 1);

Add the code from answer 79106974 to include the metadata.
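
To illustrate why the marked-content reference and the array in the parent tree both matter, here is a minimal plain-Java sketch of the two links a PDF consumer follows. It does not use PDFBox; the class and method names are hypothetical and the structures are reduced to maps and lists. The forward link is the structure element's kids (which MCIDs it claims). The reverse link is the parent tree: the page's StructParents value selects an entry, and for a page content stream that entry is an array indexed by MCID.

```java
import java.util.List;
import java.util.Map;

// Simplified model of tagged-PDF links; not PDFBox API, names are hypothetical.
final class TaggingLinksModel {

    // Forward link: does the structure element list this MCID among its kids?
    static boolean elementReferences(Map<String, List<Integer>> kidsByElement,
                                     String element, int mcid) {
        return kidsByElement.getOrDefault(element, List.of()).contains(mcid);
    }

    // Reverse link: StructParents picks the parent tree entry,
    // then the MCID is used as an index into that entry's array.
    static String findParent(Map<Integer, List<String>> parentTree,
                             int structParents, int mcid) {
        List<String> parents = parentTree.get(structParents);
        if (parents == null || mcid < 0 || mcid >= parents.size()) {
            return null; // no structure element found for this content
        }
        return parents.get(mcid);
    }

    public static void main(String[] args) {
        // One page (StructParents 0) with one marked-content sequence (MCID 0) owned by "P".
        Map<String, List<Integer>> kids = Map.of("P", List.of(0));
        Map<Integer, List<String>> parentTree = Map.of(0, List.of("P"));

        System.out.println(elementReferences(kids, "P", 0));
        System.out.println(findParent(parentTree, 0, 0));
    }
}
```

In this model, a paragraph element without kids claims no content, and a parent tree entry that is a single element instead of an array gives a consumer nothing to index with the MCID.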

It now passes the PAC test. However, it doesn't display properly in PDF-XChange, and it's unclear whether this is a PDF-XChange bug. The cause is that PDFBox doesn't put the MCID directly in the content stream; it references the resources instead. I have made a change in PDFBox (PDFBOX-5890) so that the MCID is put in the content stream when possible, and also added a simplified method to pass it as a parameter. The changes will be in 2.0.33 and 3.0.4; a snapshot will be available within a few hours.
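
For reference, the two ways a content stream can carry the MCID look roughly like this. The sketch below only builds the operator text as strings so the difference is visible; it is not how PDFBox writes streams, and the resource name "Prop1" is a hypothetical example of a key in the page's /Properties resource dictionary.

```java
// Illustrative only: formats the BDC operator in both forms; not PDFBox API.
final class MarkedContentOperators {

    // Inline form: the property dictionary with the MCID is written in the content stream.
    static String inlineForm(String tag, int mcid) {
        return "/" + tag + " <</MCID " + mcid + ">> BDC";
    }

    // Indirect form: the content stream names an entry in the page's /Properties
    // resources, and that entry holds the dictionary with the MCID.
    static String resourceForm(String tag, String propertyName) {
        return "/" + tag + " /" + propertyName + " BDC";
    }

    public static void main(String[] args) {
        System.out.println(inlineForm("P", 0));          // /P <</MCID 0>> BDC
        System.out.println(resourceForm("P", "Prop1"));  // /P /Prop1 BDC
    }
}
```

A viewer has to resolve the second form through the page resources before it knows the MCID, which is the extra step described above.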