I'm trying to create a PDF/UA document using PDFBox 3.2, and I’ve followed the solution suggested by @Tilman Hausherr in this Stack Overflow post. I managed to tag both text elements and images, and the image appears correctly tagged in PAC. However, I’m still getting an error in PAC indicating a missing bounding box for the image.
Here’s what I’ve tried so far to resolve this issue:
1. Marked Content for Rectangle: I created marked content for the image’s rectangle and added it to the document. (No success; the error persisted.)
2. Adding COSName.BBOX to Figure Structure Element: I added a new item COSName.BBOX with a Rectangle(x, y, width, height) to the figure structure element. (Resulted in a corrupted PDF.)
3. Adding COSName.BBOX to Figure Reference: I added a new item with COSName.BBOX in the figure reference, similar to step 2. (Also resulted in a corrupted PDF.)
Despite these efforts, I still don’t see anything in the structure representing the bounding box when I attempt to convert my PDF to PDF/UA. Any guidance on what I might be missing to correctly define the bounding box for the image in PDFBox 3.2 would be greatly appreciated!
Here is the code for image creation and tagging:
COSDictionary markedContentDictionary3 = new COSDictionary();
markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");
PDMarkedContentReference mcr3 = new PDMarkedContentReference();
mcr3.setMCID(mcidCounter + 2);
//COSDictionary markedContentDictionary4 = new COSDictionary();
//markedContentDictionary4.setInt(COSName.MCID, mcidCounter + 3);
//PDMarkedContentReference mcr4 = new PDMarkedContentReference();
//mcr4.setMCID(mcidCounter + 3);
contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
contentStream.drawImage(image, x, y, width, height);
contentStream.endMarkedContent();
// Close the content stream
contentStream.close();
PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
figureElement.setPage(page);
figureElement.setAlternateDescription("This image shows: <your_tag_or_caption>");
figureElement.appendKid(mcr3);
documentElement.appendKid(figureElement);
Assign a struct parent number to the image:
image.setStructParent(structParentCounter + 1);
Include the figure element in the parent tree, and assign it a layout attribute that carries the bounding box:
PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
PDLayoutAttributeObject attributeObject = new PDLayoutAttributeObject();
attributeObject.setBBox(new PDRectangle(x, y, width, height));
attributeObject.setPlacement(PDLayoutAttributeObject.PLACEMENT_BLOCK);
figureElement.addAttribute(attributeObject);
figureElement.setPage(page);
figureElement.setAlternateDescription("This image shows: <your_tag_or_caption>");
PDMarkedContentReference mcr3 = new PDMarkedContentReference();
mcr3.setMCID(mcidCounter + 2);
figureElement.appendKid(mcr3);
documentElement.appendKid(figureElement);
parentTreeMap.put(structParentCounter + 1, figureElement);
// add to the array from SO 79126664, the 0-based index = MCID
ar.add(null); // because you have an MCID "1" that I know nothing about
ar.add(figureElement);
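A note on the coordinates, in case it helps when inspecting the result: a PDF rectangle such as BBox is stored as two corners, [llx lly urx ury], not as x, y, width, height. new PDRectangle(x, y, width, height) does that conversion itself, so the values passed to drawImage can be reused as they are. Whether this is what broke the hand-written BBOX attempts in the question, I can't tell from what is shown. The sketch below only illustrates the arithmetic in plain Java; BBoxSketch and toBBoxArray are hypothetical names for this illustration and are not part of PDFBox.

```java
import java.util.Arrays;

// Hypothetical helper, NOT PDFBox API: shows how (x, y, width, height)
// maps to the four numbers stored in a PDF rectangle.
final class BBoxSketch {

    /** Returns { llx, lly, urx, ury } for a box given by its origin and size. */
    static float[] toBBoxArray(float x, float y, float width, float height) {
        return new float[] { x, y, x + width, y + height };
    }

    public static void main(String[] args) {
        // Example values only; use the same numbers you pass to drawImage.
        float[] bbox = toBBoxArray(50f, 600f, 200f, 100f);
        System.out.println(Arrays.toString(bbox)); // prints [50.0, 600.0, 250.0, 700.0]
    }
}
```

In a viewer that shows the structure tree, the Figure's BBox should therefore show the upper-right corner in the last two numbers, not the width and height.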
Also, don't forget to call
structureTreeRoot.setParentTreeNextKey()
with the highest key used in the parent tree plus 1.
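A minimal sketch of how that value could be computed, assuming the parent tree entries are collected in a map keyed by Integer, as parentTreeMap above appears to be. ParentTreeKeySketch and nextKey are hypothetical names for this illustration; only setParentTreeNextKey belongs to PDFBox.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT PDFBox API: derives the value for
// setParentTreeNextKey from the keys already placed in the parent tree.
final class ParentTreeKeySketch {

    /** Returns the highest key plus 1, or 0 when there are no entries yet. */
    static int nextKey(Map<Integer, ?> parentTreeEntries) {
        int highest = -1;
        for (int key : parentTreeEntries.keySet()) {
            highest = Math.max(highest, key);
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        // Stand-in values; in the real code these would be the page's array
        // and the figure element.
        Map<Integer, String> entries = new TreeMap<>();
        entries.put(0, "page content array");
        entries.put(1, "figure element");
        System.out.println(nextKey(entries)); // prints 2
        // With PDFBox: structureTreeRoot.setParentTreeNextKey(nextKey(parentTreeMap));
    }
}
```

Computing the value from the map instead of hard-coding it keeps it correct when more images or pages are added later.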