I'm currently working on generating a PDF file that is PDF/UA-compliant. My main goal is to ensure that it meets accessibility standards and passes the PAC (PDF Accessibility Checker) tool.
The problem I'm facing is that the PAC checker consistently flags my PDF with an error saying: "Test Object Not Tagged." This suggests that the necessary tagging might be missing or improperly implemented, but I’m not sure what I’m missing to address this.
I've made sure to:
- Define document structure elements
- Use the right tags in the PDF content generation code
- Apply what I believe to be the correct metadata and settings for accessibility
However, it seems I'm still missing something fundamental for the tagging to be recognized. What steps might I be overlooking to achieve the proper tagging in a PDF for PDF/UA compliance?
Any insights on where I might be going wrong, or pointers on critical tagging elements to check, would be greatly appreciated. Below is my code.
public void newPdf() throws IOException {
    int mcidCounter = 0; // start at 0
    int structParentCounter = 0; // start at 0
    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDRectangle.A4);
    document.addPage(page);
    // Set the StructParents entry on the page
    page.setStructParents(structParentCounter); // structParentCounter is 0
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    PDType0Font font = loadFont(FontEnum.BUNDES_SANS_WEB_REGULAR, document);
    // Add the font to the page resources
    PDResources resources = page.getResources();
    if (resources == null) {
        resources = new PDResources();
        page.setResources(resources);
    }
    resources.add(font);
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
    catalog.setStructureTreeRoot(structureTreeRoot);
    catalog.setLanguage("de-DE"); // Sets the document language to German
    PDMarkInfo markInfo = new PDMarkInfo();
    markInfo.setMarked(true);
    catalog.setMarkInfo(markInfo);
    // Create the Document structure element
    PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
    structureTreeRoot.appendKid(documentElement);
    // Create the paragraph structure element
    PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
    paragraphElement.setPage(page);
    documentElement.appendKid(paragraphElement);
    // Prepare the marked content properties with the MCID
    COSDictionary markedContentDictionary = new COSDictionary();
    markedContentDictionary.setInt(COSName.MCID, mcidCounter);
    // Begin the marked content
    contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
    contentStream.setFont(font, 12);
    contentStream.beginText();
    contentStream.newLineAtOffset(50, 700);
    contentStream.showText("Hallo Welt");
    contentStream.endText();
    contentStream.endMarkedContent();
    // Close the content stream
    contentStream.close();
    // Create the parent tree and link it to the structure element
    COSDictionary parentTreeRoot = new COSDictionary();
    PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);
    // Map the StructParents key to the structure element
    Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
    parentTreeMap.put(structParentCounter, paragraphElement);
    parentTree.setNumbers(parentTreeMap);
    structureTreeRoot.setParentTree(parentTree);
    // Save the document
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    document.save(outputStream);
    byte[] pdfBytes = outputStream.toByteArray();
    document.close();
    Path actualPath = Path.of("src/test/resources/pdf/small_test.pdf");
    Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
}
Here are the changes I made to get it to pass PAC:
1) Set DisplayDocTitle in the viewer preferences:
    PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
    prefs.setDisplayDocTitle(true);
    catalog.setViewerPreferences(prefs);
2) Attach the marked content to the paragraph element through a marked-content reference:
    PDMarkedContentReference mcr = new PDMarkedContentReference();
    mcr.setMCID(mcidCounter);
    paragraphElement.appendKid(mcr);
    // alternative:
    //paragraphElement.appendKid(PDMarkedContent.create(null, mcr.getCOSObject()));
3) (This was the most difficult problem.) Replace
    parentTreeMap.put(structParentCounter, paragraphElement);
with
    COSArray ar = new COSArray();
    ar.add(paragraphElement);
    parentTreeMap.put(structParentCounter, ar); // must be array here, despite only 1 element
    structureTreeRoot.setParentTreeNextKey(structParentCounter + 1);
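To illustrate why the parent tree value for a page has to be an array, here is a minimal plain-Java model of the lookup a checker performs: the page's StructParents key selects an array in the parent tree, and the MCID of each marked-content sequence is used as a zero-based index into that array. This is only a sketch of the data structure, not PDFBox code; the class `ParentTreeModel` and its methods are made up for illustration, and structure elements are represented by their type names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (not PDFBox API) of resolving marked content to its structure element.
class ParentTreeModel {
    // StructParents key of a page -> structure element names, indexed by MCID.
    private final Map<Integer, List<String>> parentTree = new HashMap<>();

    void put(int structParentsKey, List<String> elementsByMcid) {
        parentTree.put(structParentsKey, elementsByMcid);
    }

    // Returns the structure element owning the marked content, or null if it is not tagged.
    String resolve(int structParentsKey, int mcid) {
        List<String> elements = parentTree.get(structParentsKey);
        if (elements == null || mcid < 0 || mcid >= elements.size()) {
            return null;
        }
        return elements.get(mcid);
    }

    public static void main(String[] args) {
        ParentTreeModel tree = new ParentTreeModel();
        tree.put(0, List.of("P"));              // page key 0: MCID 0 belongs to the P element
        System.out.println(tree.resolve(0, 0)); // prints "P"
        System.out.println(tree.resolve(0, 1)); // prints "null": no entry for MCID 1
    }
}
```

Even with a single marked-content sequence on the page, the lookup goes through the index, which is why a bare structure element in place of the array does not resolve.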
It now passes the PAC test. However, it doesn't display properly in PDF-XChange; it's unclear whether this is a PDF-XChange bug or not. This is because PDFBox doesn't put the MCID directly in the content stream; instead, the content stream references a property list stored in the page resources. I have made a change in PDFBox (PDFBOX-5890) so that the MCID is put in the content stream when possible, and I also added a simplified method to pass it as a parameter. The changes will be in 2.0.33 and 3.0.4; a snapshot will be available here within a few hours.
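For readers unfamiliar with the two ways a BDC operator can carry the MCID, the following plain-Java sketch prints both forms. It does not call PDFBox; the class `BdcForms`, its methods, and the resource name `Prop1` are illustrative assumptions, not the exact output of any PDFBox version.

```java
import java.util.Map;

// Plain-Java illustration (not PDFBox API) of the two operand forms of the BDC operator.
class BdcForms {
    // Inline form: the property list dictionary is written directly in the content stream.
    static String inline(String tag, int mcid) {
        return "/" + tag + " <</MCID " + mcid + ">> BDC";
    }

    // Indirect form: the content stream only names an entry in the page's /Properties resources.
    static String viaResources(String tag, String propertyName) {
        return "/" + tag + " /" + propertyName + " BDC";
    }

    // With the indirect form, a consumer has to look the name up in the resources to find the MCID.
    static int lookupMcid(Map<String, Integer> properties, String propertyName) {
        Integer mcid = properties.get(propertyName);
        if (mcid == null) {
            throw new IllegalArgumentException("No property list named " + propertyName);
        }
        return mcid;
    }

    public static void main(String[] args) {
        System.out.println(inline("P", 0));                        // /P <</MCID 0>> BDC
        System.out.println(viaResources("P", "Prop1"));            // /P /Prop1 BDC
        System.out.println(lookupMcid(Map.of("Prop1", 0), "Prop1")); // 0
    }
}
```

Both forms are allowed by the PDF specification; the indirect one simply requires the viewer to resolve one more reference.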